Production-Ready AI Chatbot Development

Most chatbots deflect. Few resolve. The bots customers actually like are the ones that complete the task they came for: book the appointment, change the order, update the account, refund the charge. We build chatbots that connect to your real systems, ground every answer in approved data and hand off cleanly when they hit the boundary of what they should do.

Pixelfield is for Heads of CX, Heads of Operations and CTOs at companies where support volume is growing faster than headcount. The seniors who scope the work are the seniors who write the code. 50+ AI features in production, fastest deployment two months from kickoff to live, and a reputation for telling you when Intercom is enough.

  • Resolution, not just deflection
  • Connected to your real systems
  • Handoff designed as a feature
  • Flat run-rate, no per-resolution surprises
Veolia, Universal Studios, Mercedes, Vienna Insurance Group, Raiffeisen Bank, Geometry, Wagestream, Cinestar, WMC | GREY, NOAH, Ogilvy, Ameli
4.9/5 on Google
4.8/5 on Trustpilot
5.0/5 on Clutch

Shipping AI and support automation for scaleups and enterprises across the UK, Europe and the US since 2013.

A London-based engineering team that will tell you when Intercom is enough, not bill you to build a custom bot anyway.

/ Deliverables
What We Build (and What We Don't)
01
Resolution Bots, Not FAQ Deflectors
Most chatbots send users to a help article and call it a 'deflection'. The customer still has the unresolved problem. We build bots that complete the task: cancel the order, update the booking, escalate the dispute with full context, refund the charge inside the policy. Each action is grounded in your real data and validated before it runs. Resolution rate is the metric we track, not deflection rate.
Task completion
Account-aware bots
Grounded answers
Resolution rate
02
Real System Integration
A bot that can only answer FAQs is a search box with extra steps. Useful bots read and write to your CRM, helpdesk, billing system, booking engine and inventory with scoped permissions and audit logging. We build the bot and the integration layer, so the model can actually do things rather than just talk about them. Standard integrations: Zendesk, Intercom, Salesforce, HubSpot, Stripe, Shopify, Twilio, custom APIs. Where you need a system we don't already know, we'll spike it during discovery.
CRM and helpdesk
Billing and bookings
Custom APIs
Audit logging
03
Handoff Designed as a Feature
The single biggest CSAT driver in any chatbot system is what happens when the bot can't help. We design the handoff first: multi-signal escalation triggers, structured context transfer to the agent, no 'please tell me your problem again'. Bot-to-agent CSAT in well-handled handoffs runs around 92%; in poorly designed ones, CSAT collapses fast. The bot's job ends with the agent reading a short summary, not digging through a transcript dump.
Escalation triggers
Structured context
Agent-side payload
CSAT-first design
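
A minimal Python sketch of how multi-signal escalation triggers might be evaluated. The signal names, thresholds and topic list are illustrative assumptions, not a fixed policy; in practice they would be tuned against real transcripts during discovery.

```python
from dataclasses import dataclass


@dataclass
class TurnSignals:
    """Signals collected for the current turn; all field names are hypothetical."""
    asked_for_human: bool = False
    answer_confidence: float = 1.0   # 0.0-1.0, from whatever scorer is in use
    frustration_score: float = 0.0   # 0.0-1.0, from a sentiment model
    rephrase_count: int = 0          # times the user restated the same request
    topic: str = "general"


# Assumed policy values for illustration only.
MIN_CONFIDENCE = 0.6
MAX_FRUSTRATION = 0.7
MAX_REPHRASES = 2
ALWAYS_HUMAN_TOPICS = {"legal_complaint", "chargeback_dispute"}


def escalation_reasons(s: TurnSignals) -> list[str]:
    """Return every trigger that fired, so the agent can see why the handoff happened."""
    reasons = []
    if s.asked_for_human:
        reasons.append("explicit_request")
    if s.answer_confidence < MIN_CONFIDENCE:
        reasons.append("low_confidence")
    if s.frustration_score > MAX_FRUSTRATION:
        reasons.append("frustration")
    if s.rephrase_count >= MAX_REPHRASES:
        reasons.append("repeated_rephrasing")
    if s.topic in ALWAYS_HUMAN_TOPICS:
        reasons.append("topic_rule")
    return reasons


def should_escalate(s: TurnSignals) -> bool:
    """Escalate as soon as any single trigger fires."""
    return bool(escalation_reasons(s))
```

Returning the list of reasons rather than a bare boolean means the same result can travel with the handoff, so the agent knows whether the customer asked for a human or the bot simply ran out of confidence.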
04
Knowledge Grounding and Hallucination Mitigation
A doctor-appointment bot claimed to have scheduled appointments that never happened. A delivery bot said 'your order is on time' while the driver hadn't moved. Confident wrong answers create more tickets than they close. We engineer against this with retrieval grounded in approved sources only, output validation against typed contracts, forced tool calls before any state-changing claim, and faithfulness scoring tracked per release. The bot says 'I don't have that information' instead of guessing.
Grounded retrieval
Typed output contracts
Forced tool calls
Faithfulness scoring
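
One way to picture "forced tool calls before any state-changing claim" is a validation step that refuses to release a reply unless a successful backend call backs it. The sketch below is an assumption-laden illustration in Python: `DraftReply`, `ToolResult` and `validate_reply` are hypothetical names, and a real system would derive `claims_state_change` from a typed model output rather than set it by hand.

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK = "I don't have that information, let me get someone who does."


@dataclass
class ToolResult:
    """Outcome of a real backend call; field names are hypothetical."""
    tool: str
    ok: bool
    reference: Optional[str] = None  # e.g. a booking ID returned by the system


@dataclass
class DraftReply:
    """Typed contract the model output must fit before it is shown to the customer."""
    text: str
    claims_state_change: bool            # True if the text says something was done
    required_tool: Optional[str] = None  # tool that must have succeeded first


def validate_reply(draft: DraftReply, results: list[ToolResult]) -> str:
    """Release the draft only if any state-changing claim is backed by a successful tool call."""
    if not draft.claims_state_change:
        return draft.text
    backed = any(
        r.ok and r.tool == draft.required_tool and r.reference
        for r in results
    )
    return draft.text if backed else FALLBACK
```

With this gate in place, a reply such as "Your appointment is booked" cannot reach the customer unless the booking system actually returned a reference for it.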
05
Channel Implementation (Web, WhatsApp, Telegram, Voice)
Each channel has its own session model, formatting and rate limits, so we implement them properly rather than wrapping a generic bot. WhatsApp template approvals, conversation windows, voice latency budgets and platform-specific compliance rules are part of the channel scope, not an afterthought. Web widgets, in-app chat, WhatsApp Business API, Telegram, voice via Twilio or your CCaaS of choice.
Web and in-app
WhatsApp Business
Telegram
Voice and CCaaS
06
Cost Architecture: Flat Run-Rate, No Per-Resolution Surprises
Per-resolution pricing looks fine at small scale. At 10,000 conversations a month and 50% AI resolution, Intercom Fin runs around $5,000 in resolution fees alone, before seats. We design with flat infrastructure cost, model routing (cheap model for simple queries, frontier only when it matters) and semantic caching. You get a quoted run rate before you commit to the build.
Flat infrastructure
Model routing
Semantic caching
Quoted run rate
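
A rough Python sketch of model routing with a cache in front of it. The model names, keyword hints and word limit are placeholders, and the cache here is deliberately simpler than a true semantic cache: it matches on normalized text only, where a production version would typically compare embeddings.

```python
import re
from typing import Optional

# Hypothetical model tiers; substitute whichever models are actually in use.
CHEAP_MODEL = "small-model"
FRONTIER_MODEL = "frontier-model"

# Assumed heuristics for "this query needs the stronger model".
COMPLEX_HINTS = ("refund", "dispute", "cancel", "legal", "complaint")
LONG_QUERY_WORDS = 40


def normalize(query: str) -> str:
    """Lowercase and collapse punctuation and whitespace so trivial variants share a key."""
    return re.sub(r"[^a-z0-9]+", " ", query.lower()).strip()


def pick_model(query: str) -> str:
    """Route simple queries to the cheap tier and send the rest to the frontier tier."""
    text = normalize(query)
    if len(text.split()) > LONG_QUERY_WORDS or any(h in text for h in COMPLEX_HINTS):
        return FRONTIER_MODEL
    return CHEAP_MODEL


class NormalizedCache:
    """Stand-in for a semantic cache: exact match on normalized text, not embeddings."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, query: str) -> Optional[str]:
        return self._store.get(normalize(query))

    def put(self, query: str, answer: str) -> None:
        self._store[normalize(query)] = answer
```

A cache hit costs no inference at all, and a routed miss only pays frontier prices when the query looks like it needs them, which is what keeps the run rate flat as traffic grows.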
07
Measurement, Iteration and the Feedback Loop
Most chatbots are launched and never measured properly. We ship every bot with a dashboard tracking resolution rate by topic, handoff rate, post-handoff CSAT, repeat contact rate and cost per resolution. Every two weeks we review where the bot is failing and feed those cases back as KB updates, prompt changes or escalation tweaks. The bot improves because someone is actually watching it.
Resolution by topic
Handoff CSAT
Cost per resolution
Two-week iteration
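
The dashboard metrics named above reduce to a few aggregations over closed conversations. This is a simplified Python sketch; the `Conversation` record shape and the cost attribution are assumptions, and a real pipeline would read from the helpdesk and telemetry stores rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Conversation:
    """One closed conversation; the record shape is an assumption."""
    topic: str
    resolved_by_bot: bool
    handed_off: bool
    cost: float  # model and infrastructure cost attributed to this conversation


def resolution_rate_by_topic(convs: list[Conversation]) -> dict[str, float]:
    """Share of conversations per topic that the bot resolved without a human."""
    totals: dict[str, int] = defaultdict(int)
    resolved: dict[str, int] = defaultdict(int)
    for c in convs:
        totals[c.topic] += 1
        resolved[c.topic] += int(c.resolved_by_bot)
    return {t: resolved[t] / totals[t] for t in totals}


def handoff_rate(convs: list[Conversation]) -> float:
    """Share of all conversations that were escalated to an agent."""
    return sum(c.handed_off for c in convs) / len(convs) if convs else 0.0


def cost_per_resolution(convs: list[Conversation]) -> float:
    """Total spend divided by bot-resolved conversations; inf if nothing was resolved."""
    resolved = sum(c.resolved_by_bot for c in convs)
    total = sum(c.cost for c in convs)
    return total / resolved if resolved else float("inf")
```

Dividing total spend by resolved conversations, rather than by all conversations, keeps the cost figure honest: unresolved chats still cost money and should make the number worse.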
08
IP Ownership and Handover
You own everything we deliver. Source code, prompts, knowledge base configurations, integration code, evaluation harnesses, infrastructure-as-code, fine-tuning recipes where applicable, and documentation. No vendor lock-in. No rented layer we hold behind a retainer. At the end of the engagement we can hand the system to your internal team with a runbook and a training session, or continue with monitoring and iteration. Your call.
Full IP transfer
No lock-in
Runbook + training
Take in-house anytime

When a Custom Chatbot Is the Right Call

Off-the-Shelf Platforms Have Stopped Working

Intercom Fin, Zendesk AI, Drift before it shut down. These are good for moderate volume, mostly-FAQ workflows where deflection to a help article is acceptable. They stop working when you need account-aware actions, custom integration, predictable flat pricing at higher volume, or a resolution rate higher than what their boilerplate model can deliver. If a platform is enough, use it. If it isn't, the next step is custom.

You've Tried a Chatbot and Customers Hated It

Loops. Confident wrong answers. 'Let me connect you with a human' followed by a blank ticket where the agent has to ask the customer to start over. A bad chatbot is worse than no chatbot. We start by acknowledging that and re-architecting around the things that actually drive CSAT: grounded answers, structured handoff, escalation that triggers on confidence rather than retries, honest 'I don't know' responses.

The Bot Needs to Do Things, Not Just Answer Things

Booking changes, refund processing, account updates, order edits, password resets, plan upgrades. Transactional bots are most of what we build. The bot reads and writes to your real systems via OAuth-scoped permissions. Irreversible actions go through human approval, every action is logged, and there is a kill switch. This is the work hosted platforms can't do without significant custom development on top.
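
As an illustration of those controls working together, here is a small Python sketch of a gate that every bot action would pass through. The action names, scope strings and the `ActionGate` class are hypothetical; real scopes would come from the OAuth grants on each connected system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical set of actions that cannot be undone once executed.
IRREVERSIBLE = {"issue_refund", "cancel_order"}


@dataclass
class ActionGate:
    granted_scopes: set[str]
    kill_switch_on: bool = False
    audit_log: list[dict] = field(default_factory=list)

    def request(self, action: str, scope: str, approved_by: Optional[str] = None) -> str:
        """Decide whether an action may run, and log the decision either way."""
        if self.kill_switch_on:
            decision = "blocked_kill_switch"
        elif scope not in self.granted_scopes:
            decision = "denied_scope"
        elif action in IRREVERSIBLE and approved_by is None:
            decision = "pending_human_approval"
        else:
            decision = "allowed"
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "scope": scope,
            "approved_by": approved_by,
            "decision": decision,
        })
        return decision
```

Denied and blocked requests are logged alongside allowed ones, so the audit trail shows what the bot attempted as well as what it did.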

Volume Is Making Per-Resolution Pricing Painful

Around 10,000 conversations a month is where most platforms become the most expensive option. $0.99 per Fin resolution at 50% resolution rate is roughly $5,000 a month in resolution fees alone, before seats and add-ons. A flat-cost custom build with model routing and caching becomes the cheaper option, and the quote doesn't move when your traffic does.
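
The arithmetic behind that figure, as a short Python check. The inputs are the ones quoted above (10,000 conversations, 50% resolution, $0.99 per resolution); the function name is just for illustration, and seats and add-ons are left out.

```python
# Figures from the text: 10,000 conversations, 50% resolution, $0.99 per resolution.
CONVERSATIONS_PER_MONTH = 10_000
RESOLUTION_RATE_PERCENT = 50
FEE_PER_RESOLUTION_CENTS = 99


def monthly_resolution_fees_usd(conversations: int, rate_percent: int, fee_cents: int) -> float:
    """Per-resolution fees only; seats and add-ons are excluded."""
    resolved = conversations * rate_percent // 100
    return resolved * fee_cents / 100


fees = monthly_resolution_fees_usd(
    CONVERSATIONS_PER_MONTH, RESOLUTION_RATE_PERCENT, FEE_PER_RESOLUTION_CENTS
)
print(f"${fees:,.0f} per month")  # prints "$4,950 per month"
```

Because the fee scales linearly with resolved conversations, doubling traffic to 20,000 conversations at the same resolution rate doubles the bill to $9,900.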

How the Engagement Runs, Week by Week

01

Discovery and Conversation Mapping (Weeks 1-2)

We work with your real ticket data, transcripts and knowledge base to map which queries are bot-shaped, which are deterministic and which should not be automated. Discovery includes a technical spike on the systems the bot has to talk to, because integration is where the timeline actually lives.

You receive: a conversation flow map, escalation policy, integration plan, projected resolution rate based on your real traffic, and a fixed-scope build quote.

02

Paid Pilot on Your Real Data (optional, 2-4 weeks)

For higher-stakes deployments we run a paid pilot on a narrow, well-scoped slice of your support volume. The pilot measures real resolution rate, handoff CSAT, hidden cleanup hours and customer reaction. We report numbers, not vibes.

We have stopped pilots when the resolution economics didn't justify the build. That's what the pilot is for.

03

Build, Integrate, Guardrail (Weeks 3-10+)

Build runs as a small senior team led by Michal Vavra, with AI and integration engineers embedded with your CX team. Each release ships behind feature flags with conversation regression tests, output validation, escalation logging, drift detection and cost telemetry in place from week one.

You receive: a working bot in staging, then production. Knowledge grounding is treated as a first-class concern. The bot cites sources or admits uncertainty.

04

Launch, Measure, Iterate (optional retainer)

Live deployment with monitoring on resolution by topic, handoff rate, CSAT delta, cost per resolution and knowledge freshness. Every two weeks we review where the bot is failing and feed those cases back as KB updates, prompt changes or escalation tweaks.

Industry benchmark: 90 to 120 days from launch to first measurable ROI lift. We size the retainer to that horizon. Take it in-house whenever you're ready.

Engagement Shapes and Pricing Bands

We publish pricing bands because nobody else does and buyers deserve better than "it depends". Exact numbers are defined in your proposal, but the shapes below are the shapes we actually sell. Discovery starts from £2K. Production builds typically start around £8K and run six to ten weeks to a live system. Flat monthly pricing on the run side. No per-resolution surprises.
Discovery (from £2K)
Weeks 1-2. Real ticket analysis, integration spike, projected resolution rate, fixed-scope quote.
Paid Pilot (optional)
2-4 weeks on a narrow slice with real metrics: resolution, CSAT, hidden cleanup hours.
Production Build
Typically £8K+, 6-10 weeks to a live bot with grounded retrieval, integrations, handoff and observability.
Maintenance and Iteration Retainer
Monthly. KB sync, prompt tuning, drift checks and on-call. Optional, priced in bands.

Frequently Asked Questions

Direct answers to the questions Heads of CX and CTOs ask us on every scoping call.

Deflection means the bot kept the conversation away from a human, often by sending a link. Resolution means the customer's task was actually completed. Industry benchmarks across 220M+ AI interactions land around 45% fully AI-resolved without human handoff for general support, rising to 70-90% for narrow, well-scoped use cases (Comm100, Botpress, Gartner). We size the proposal to your traffic mix and tell you honestly which band you're likely to land in.

If your queries are mostly FAQ-shaped and your volume is moderate, those platforms are usually the right choice and we'll say so. Custom development pays off when you need account-aware actions, deeper integration, flat predictable pricing at higher volume, or resolution rates above what the platform's boilerplate can deliver. At 10,000 conversations a month, per-resolution pricing on a hosted bot can easily run $5,000+ in fees alone. Our flat retainer becomes the cheaper option.

Layered mitigations: retrieval grounded in approved sources only, output validation against typed contracts, forced tool calls before any state-changing claim, confidence thresholds that escalate instead of guess, and faithfulness scoring tracked per release. The bot is configured to say 'I don't have that information, let me get someone who does' rather than improvise. We measure the actual hallucination rate and ship mitigations until it's acceptable for your use case.

Multi-signal escalation: explicit request, low confidence, frustration sentiment, repeated rephrasing, complexity rules, topic-based routing. The agent receives a structured handoff payload: full transcript, AI-generated 2-3 sentence summary, intent, sentiment flag, customer/account state, suggested next action. The agent opens with 'I see you were asking about order #123' rather than 'how can I help you?' That single design choice is the largest CSAT lever in a chatbot system.
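
The handoff payload described above could be modelled roughly as follows. This is a Python sketch with assumed field names; `escalation_reasons` and the `agent_opening_line` helper are additions for illustration, and the actual schema depends on what the helpdesk can ingest.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class HandoffPayload:
    """Fields follow the payload listed above; exact names are assumptions."""
    transcript: list[str]
    summary: str                   # AI-generated, 2-3 sentences
    intent: str
    sentiment_flag: str            # e.g. "neutral" or "frustrated"
    account_state: dict
    suggested_next_action: str
    escalation_reasons: list[str]  # hypothetical extra: which triggers fired


def agent_opening_line(p: HandoffPayload) -> str:
    """Draft a greeting that references the known context instead of starting over."""
    order = p.account_state.get("order_id")
    topic = f"order #{order}" if order else p.intent.replace("_", " ")
    return f"I see you were asking about {topic}."


payload = HandoffPayload(
    transcript=["user: where is my order?", "bot: let me check that for you"],
    summary="Customer asked about a late delivery. Tracking shows no recent movement.",
    intent="order_status",
    sentiment_flag="frustrated",
    account_state={"order_id": "123"},
    suggested_next_action="Contact the courier and offer redelivery",
    escalation_reasons=["low_confidence"],
)

print(agent_opening_line(payload))  # prints "I see you were asking about order #123."
helpdesk_json = json.dumps(asdict(payload))  # serialized form to attach to the ticket
```

Keeping the payload as a typed structure rather than free text lets the helpdesk render each field where the agent expects it, with the full transcript available but not in the way.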

Discovery is fixed-fee from £2K. Production builds typically start at £8K and run six to ten weeks. Run rate is flat monthly: hosting, model inference (with routing and caching), retainer for monitoring and KB sync. We quote both numbers in the proposal so there are no per-resolution surprises later.

Yes. Transactional bots are most of what we build. The bot can read and write to your CRM, helpdesk, billing system, booking engine and inventory via OAuth-scoped permissions. Irreversible actions go through human approval or an explicit policy check. Every action is logged. There is a kill switch.

Industry data puts the first measurable lift between 90 and 120 days after launch (data accumulation, prompt tuning, knowledge base maturation). Full ROI payback typically arrives within 6 to 14 months in well-scoped deployments. We design the retainer and the review cadence around that horizon. Roughly 95% of GenAI pilots fail to show P&L impact (MIT 2025); the difference is narrow scoping, real measurement and disciplined iteration, all of which we treat as part of the build.