Picture the scenario. A UK founder uploads 50 documents into ChatGPT Projects, configures a Custom GPT with the full product catalogue, or subscribes to Claude Pro and creates a Project containing every policy document the company has ever written. Then she asks a specific question about her own business.
The AI responds with confidence. The answer is factually wrong. Or generic. Or based on a policy rewritten last year that never actually made it into the upload.
This is not a failure of artificial intelligence. It is a retrieval failure. The AI never accessed the correct document. It looked at a fragment, missed the point, and defaulted to its general training data instead. The technical name for this is naive RAG failure, and most off-the-shelf “knowledge base” products available in 2026 are little more than naive RAG dressed up with a polished interface.
This guide explains what is actually happening when AI does not know your business, when off-the-shelf tools are enough, when they are not, what a proper RAG system costs in pounds, and how to tell which side of the line your business sits on. Fifteen minutes, plain English, written for a founder rather than an engineer.
Making AI “know your business” means giving it access to your specific information (products, customers, policies, processes) at the moment it answers a question, instead of relying only on what it learned during initial training.
The most common implementation is retrieval-augmented generation (RAG): a separate system retrieves the relevant passage from your documents, then prompts the AI to answer using that passage. Done correctly, this turns a generic model into one that responds with your actual data. Done badly, it hallucinates with confidence.
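For readers who want to see the moving parts, here is a minimal sketch of that retrieve-then-prompt flow. It is illustrative only: the word-overlap retriever stands in for a real search step, and the function names and sample passages are assumptions made for this example, not any platform's API.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, passages: list[str]) -> str:
    """Return the passage sharing the most words with the question."""
    q = tokenize(question)
    return max(passages, key=lambda p: len(q & tokenize(p)))

def build_prompt(question: str, passage: str) -> str:
    """Tell the model to answer from the retrieved passage only."""
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{passage}\n\nQuestion: {question}"
    )

# Hypothetical sample data for illustration.
passages = [
    "Refunds are available within 30 days of delivery.",
    "Our office is open Monday to Friday, 9am to 5pm.",
]
question = "How many days do customers have to request refunds?"
prompt = build_prompt(question, retrieve(question, passages))
print(prompt)
```

The prompt is what actually reaches the model. If the retriever picks the wrong passage, the model never sees the right one, which is the failure described at the start of this guide.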
Every system that makes AI knowledgeable about a specific business falls into one of three layers. Knowing which one you are using, or need, is the first step to fixing it.

You paste the information directly into the conversation, or set it up as a system prompt. ChatGPT Custom Instructions, Claude Projects, and the instructions field of a Custom GPT all live here.
You store your information separately, and a retrieval system fetches the relevant pieces when a question arrives. The AI answers only from what was retrieved.
You do not add knowledge. You change how the model behaves. Fine-tuning updates the model weights to make it write in your tone, follow your specific format, or classify things the way you do.
The trap most SMEs fall into is hearing “AI agent” or “knowledge base” and jumping straight to Layer 2 without checking whether Layer 1 would have done the job for £20 per user per month. The opposite mistake is also common: brute-forcing Layer 1 by pasting 80 PDFs into a Claude Project and being surprised when context collapses.
If you want to know which layer is right for your business, our RAG development team starts every engagement by answering exactly that question.
The most common workflow we see at Pixelfield: a founder spends a week setting up a Custom GPT with 30 to 50 documents, briefs the team to use it, and within a fortnight the team has quietly stopped using it.
Reasons vary by person. The underlying problem is always one of three things.
Uploading more files makes off-the-shelf knowledge bases worse, not better.
“Builders who upload 100 documents into a Custom GPT think they have built a knowledge base. They have built a haystack.”
// PRACTITIONER ON X
Signal-to-noise collapses. Ten well-curated documents beat 100 mediocre ones, and most platforms do not let you tune what gets retrieved or how.
When you upload a document to ChatGPT or Claude, the platform splits it into chunks for retrieval. You cannot see how. You cannot change the chunk size. You cannot add overlap between chunks.
What this means in practice:
Production RAG systems use multiple retrieval steps, rerank the results, and rewrite vague queries into more precise ones. Off-the-shelf platforms do one shallow retrieval pass and hand the output to the model.
When the right chunk is at position 6 and the platform only sends positions 1 to 3, you get a confident wrong answer.
None of this means off-the-shelf tools are bad. For the right use case, they are excellent.
ChatGPT Projects and Claude Projects work well when:
They fall down when they are used for the wrong job. If you are trying to give your customer service team instant answers from 400 product specifications, three years of support tickets, and a constantly updating returns policy, you have outgrown the off-the-shelf layer. You need actual retrieval architecture, which is what the rest of this guide covers.
Once SMEs move past off-the-shelf tools and commission a custom RAG build, most of what gets built is what practitioners call naive RAG.
It works in demos. It impresses stakeholders. It breaks in production, often quietly, in ways nobody catches for weeks. Understanding the difference between naive and practical RAG is the single most useful thing a non-technical founder can learn before commissioning a build.
Across the 50+ AI features we have shipped and the practitioner data we have reviewed, naive RAG fails in five repeating ways. If a system you are evaluating shows any one of these, it is not production-ready.

The answer to a customer’s question lives across two chunks, A and B. The retriever returns chunk A. The model answers using only chunk A. The information in chunk B never reaches the answer.
“You blamed the model. You should have blamed the chunking.”
// BUILDER ON X
Retrieval returns 10 chunks. The right one is at position 6. The model, given a long context, pays most attention to the start and end of what it has been given. Position 6 effectively gets skipped, even though it contains the correct answer.
A customer asks about “Policy 4.2”. Vector search returns Policy 4.3 because the embeddings are semantically similar. Pure vector search is bad at exact phrases, codes, dates, and abbreviations, which is exactly what business knowledge contains.
The database updated five minutes ago. The vector index updated last week. The agent answers confidently from last week’s version. Users assume the system is current. Nobody catches it until a customer complains.
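A stale index is one of the easier failures to detect automatically. The sketch below compares when each source document last changed with when it was last indexed. The function name, the 15-minute tolerance, and the sample ids are assumptions for illustration, and a real system would read both timestamps from its own database and vector store.

```python
from datetime import datetime, timedelta

def stale_documents(source_updated, index_updated, tolerance=timedelta(minutes=15)):
    """Return ids of documents whose source is newer than the indexed copy.

    Both arguments map document id -> datetime. A document missing from the
    index counts as stale.
    """
    stale = []
    for doc_id, updated_at in source_updated.items():
        indexed_at = index_updated.get(doc_id)
        if indexed_at is None or updated_at - indexed_at > tolerance:
            stale.append(doc_id)
    return sorted(stale)

# Hypothetical timestamps: the returns policy changed five minutes ago,
# but the index was last rebuilt a week ago.
now = datetime(2026, 3, 10, 12, 0)
source = {"returns-policy": now - timedelta(minutes=5), "faq": now - timedelta(days=30)}
index = {"returns-policy": now - timedelta(days=7), "faq": now - timedelta(days=7)}
print(stale_documents(source, index))
```

Run on a schedule, a check like this turns a silent failure into an alert.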
Retrieval returns the right chunk. The model ignores it and answers from training data anyway. This is the most embarrassing failure mode, and the one nobody talks about. Even with perfect retrieval, the model can decide it knows better. Mitigations exist, but they require active engineering.
Practical RAG is not different technology. It is the same RAG with five engineering layers added to address each of the five failure modes above.
When a vendor says “we use RAG” without specifying which of these layers they have built, you are almost certainly looking at naive RAG with marketing on top. Let us walk through the layers that make the difference.
The honest version of “how to build RAG” is that 80% of the work is data engineering and only 20% is machine learning. The five layers below are what separate a demo from a production system, and they map directly onto the five failure modes above.
Every production-ready RAG system we build, and every one we have audited that actually works, has all five of these layers. Missing any one is a leading indicator of failure within six months.

Documents are split into pieces that preserve meaning, not arbitrary length.
This is the layer most off-the-shelf tools skip entirely, and the reason their results plateau quickly.
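One simple way to preserve meaning is to split on paragraph boundaries and repeat the last paragraph of each chunk at the start of the next. The sketch below assumes paragraphs are separated by blank lines. The function name and default sizes are illustrative, and production chunkers usually also respect headings, tables, and sentence boundaries.

```python
def chunk_paragraphs(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Group whole paragraphs into chunks of roughly max_chars characters.

    The last `overlap` paragraphs of each chunk are repeated at the start of
    the next, so an answer that straddles a boundary appears intact in at
    least one chunk. Overlap or a single long paragraph can push a chunk
    over max_chars; paragraphs are never cut in half.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = current + [para]
        if current and len("\n\n".join(candidate)) > max_chars:
            # Close the current chunk and carry the overlap forward.
            chunks.append("\n\n".join(current))
            current = current[-overlap:] if overlap else []
            candidate = current + [para]
        current = candidate
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Hypothetical policy text: the condition and its exception sit in
# neighbouring paragraphs.
policy = (
    "Refunds are available within 30 days.\n\n"
    "Exception: sale items can only be exchanged.\n\n"
    "Contact support to start a return."
)
for chunk in chunk_paragraphs(policy, max_chars=90):
    print(repr(chunk))
```

With overlap, the exception paragraph appears in both chunks, so whichever chunk is retrieved carries it along.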
Pure vector search is good at meaning but bad at exact matching. Pure keyword search is the opposite. Practical RAG combines both: vector search plus BM25 (a keyword algorithm used in proper search engines) plus metadata filtering, with the results fused together.
This single change typically moves retrieval accuracy from around 70% to around 90% on real business data.
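The fusing step can be sketched with reciprocal rank fusion (RRF), one widely used way to merge ranked lists. The guide does not say which fusion method any particular system uses, so treat RRF as an assumption chosen for illustration. The chunk ids are hypothetical, and `k = 60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores 1 / (k + rank) for every list it appears in (rank starts
    at 1), so chunks that rank well in more than one list rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Vector search ranks the near-miss first; keyword search finds the exact code.
vector_hits = ["policy-4.3", "policy-4.2", "policy-1.0"]
keyword_hits = ["policy-4.2", "policy-9.1"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because "policy-4.2" appears in both lists, it overtakes "policy-4.3", which only the vector search returned. That is the exact-phrase failure from earlier being corrected by the keyword side.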
Hybrid retrieval might return 30 candidate chunks. A reranker (usually a cross-encoder model) reorders them based on actual relevance to the query before the top 3 to 5 are sent to the model.
Reranking is the single highest-leverage upgrade for most naive RAG systems. It is also the layer skipped most often, because it adds latency and infrastructure complexity that no demo ever shows.
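The shape of a reranking step is simple even though the scoring model is not. In the sketch below, `overlap_score` is a deliberately crude stand-in for a cross-encoder, used only so the example runs without a model download. A real system would pass in a scoring function backed by an actual reranking model. All names and sample chunks are illustrative.

```python
import re
from typing import Callable

def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-score every candidate against the query and keep the best top_n."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_n]

def overlap_score(query: str, chunk: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query words in the chunk."""
    def words(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))
    q = words(query)
    return len(q & words(chunk)) / len(q) if q else 0.0

# Hypothetical candidates as they might arrive from hybrid retrieval.
candidates = [
    "Delivery takes 3 to 5 working days.",
    "The warranty period for the X200 is 24 months.",
    "Warranty claims need proof of purchase.",
]
top = rerank("What is the warranty period for the X200?", candidates,
             overlap_score, top_n=2)
print(top)
```

The latency cost mentioned above comes from calling the scoring model once per candidate, which is why the candidate list is capped before reranking.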
Real users ask vague, ambiguous, and context-dependent questions (“Is this still valid?” where “this” was mentioned three turns ago). Query rewriting takes the original question, expands it with conversation context, and converts it into one or more cleaner search queries before retrieval runs.
This is where most “I asked X but it answered Y” complaints get fixed.
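In practice the rewriting is usually done by a language model. The sketch below shows only the plumbing: it builds the rewrite instruction and accepts any `llm` callable that takes a prompt and returns text. That callable, the prompt wording, and the sample conversation are assumptions for illustration, and no specific vendor API is implied.

```python
from typing import Callable

def build_rewrite_prompt(history: list[str], question: str) -> str:
    """Ask a model to turn a context-dependent question into a standalone one."""
    turns = "\n".join(f"- {turn}" for turn in history)
    return (
        "Rewrite the final question so it can be understood without the "
        "conversation. Replace pronouns with what they refer to. "
        "Return only the rewritten question.\n\n"
        f"Conversation so far:\n{turns}\n\nFinal question: {question}"
    )

def rewrite_query(history: list[str], question: str,
                  llm: Callable[[str], str]) -> str:
    """Return a standalone search query; fall back to the original if empty."""
    rewritten = llm(build_rewrite_prompt(history, question)).strip()
    return rewritten or question

# A canned response stands in for a real model call here.
def fake_llm(prompt: str) -> str:
    return "Is the 2024 returns policy still valid?"

history = ["User: Tell me about the 2024 returns policy.", "Assistant: It allows 30 days."]
print(rewrite_query(history, "Is this still valid?", fake_llm))
```

The fallback matters: if the rewrite call fails or returns nothing, searching with the original question is better than searching with an empty string.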
Production RAG needs a permanent evaluation set: a list of real questions with known correct answers that the system is tested against continuously. When retrieval accuracy drifts, you find out within hours, not months.
Frameworks like RAGAS measure:
Without this layer, you cannot tell whether your RAG is improving or quietly degrading.
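The simplest retrieval check you can run yourself is a hit rate: for each test question, did the chunk containing the answer make it into the top results? The sketch below is a hand-rolled illustration and not the RAGAS API. The field names, the `retrieve` callable, and the sample data are all assumptions.

```python
from typing import Callable

def hit_rate_at_k(eval_set: list[dict], retrieve: Callable[[str], list[str]],
                  k: int = 3) -> float:
    """Share of questions whose expected chunk id is in the top-k results."""
    if not eval_set:
        raise ValueError("evaluation set is empty")
    hits = sum(
        1 for case in eval_set
        if case["expected_chunk"] in retrieve(case["question"])[:k]
    )
    return hits / len(eval_set)

# Canned retrieval results stand in for a real retriever.
fake_results = {
    "What is the warranty on the X200?": ["c7", "c2", "c9", "c4", "c5", "c1"],
    "Can I return a sale item?": ["c3", "c8", "c6"],
}
eval_set = [
    {"question": "What is the warranty on the X200?", "expected_chunk": "c1"},
    {"question": "Can I return a sale item?", "expected_chunk": "c3"},
]
print(hit_rate_at_k(eval_set, fake_results.__getitem__, k=3))
print(hit_rate_at_k(eval_set, fake_results.__getitem__, k=6))
```

In the first question the right chunk sits at position 6, so it counts as a miss when only the top 3 are sent to the model. Tracking this number over time is what makes drift visible.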
These layers are not optional. They are what separates a demo that impresses your board from a system that holds up when real customers use it. A vendor or agency that cannot walk you through their plan for all five has not built production RAG. They have built a chatbot with extra steps.
If you want to see all five layers in practice, this is exactly the work behind our RAG development service.
The single most common SME mistake we see is reaching for the wrong tool. Fine-tuning a model to know the products. Stuffing the entire company handbook into a prompt. Commissioning a £30k RAG build for 20 documents that change once a quarter.
Here is the actual decision rule.
Three questions, in order. The answers tell you which approach you need.

In our experience, 95% of SME use cases are solved by prompt engineering (small, stable knowledge) or RAG (large or changing knowledge). The remaining 5%, almost always high-volume classification, structured output generation, or specific tone preservation at scale, are the only cases where fine-tuning genuinely earns its place.
Most “we need to fine-tune a custom model” conversations end with us building RAG instead. Most “we need a RAG system” conversations end with us recommending a better system prompt for a quarter of the price. The right answer is usually one layer simpler than what the founder originally asked for.
“It depends” without a number is what vendors say when they do not want you to compare. The benchmark below reflects what we have seen across UK SME RAG projects in 2026, at Pixelfield and across the wider market.
| Tier | Setup | Monthly | What you get |
|---|---|---|---|
| Off the shelf | £0 | £20 to £60 per user | ChatGPT Enterprise, Claude Teams, Microsoft Copilot. Fine for small, stable knowledge. Hard ceiling around 50 documents. No retrieval control. |
| No-code RAG | £500 to £2,000 | £200 to £800 | Voiceflow, Stack AI, Make plus OpenAI API. Good for one well-defined use case. Maintenance is on you. Limited retrieval tuning. |
| Custom engineered RAG | £8,000 to £20,000 | £100 to £500 (API) | Production RAG with all 5 layers (chunking, hybrid retrieval, reranking, query rewriting, evaluation). Where Pixelfield works. |
| Multi-source enterprise RAG | £20,000 to £60,000+ | £500 to £2,500 | Multiple data sources, role-based access, audit logging, multi-tenant. Justified when knowledge spans CRM plus ERP plus document store plus custom database. |
The £8,000 to £20,000 tier is the floor for SME RAG that will not quietly fail in production.
In a custom RAG build, the model and embedding API costs are tiny, usually under £100 per month for an SME. The real cost is engineering hours, distributed roughly as follows:
If a quote skews heavily toward “model selection” or “prompt engineering” with little time on data and evaluation, that is naive RAG with consultancy decoration.
If you want to test where your project sits on this benchmark before committing, a scoped AI proof of concept typically lands in the £3,000 to £6,000 range and answers the question without locking you in.
The costliest RAG project is the one you did not need to build. In our experience, about 40% of “we need AI to know our business” enquiries actually need proper RAG. The other 60% need something simpler.
Here is how to tell which side of the line you are on.
If your business knowledge is under roughly 30,000 tokens (broadly, under 50 pages of structured information), changes less than once a month, and does not include sensitive customer-specific data, a well-written system prompt in ChatGPT Teams or Claude for Work will outperform most custom RAG builds.
Cost comparison: £20 to £60 per user per month versus £8,000 to £20,000. Test this first.
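The three checks above can be written down as a rule of thumb. The sketch below encodes them exactly as stated (size, change rate, sensitivity). The function and parameter names are illustrative, and the 30,000-token figure is the rough guide from this section, not a hard limit.

```python
def system_prompt_is_enough(token_count: int, changes_per_month: float,
                            has_sensitive_customer_data: bool) -> bool:
    """Apply the three checks from the text: size, change rate, sensitivity."""
    return (
        token_count < 30_000
        and changes_per_month < 1
        and not has_sensitive_customer_data
    )

# A 20,000-token handbook that changes about once a quarter.
print(system_prompt_is_enough(20_000, changes_per_month=1 / 3,
                              has_sensitive_customer_data=False))
# 400 product specifications updated weekly.
print(system_prompt_is_enough(250_000, changes_per_month=4,
                              has_sensitive_customer_data=False))
```

If any one check fails, that is a reason to look harder at the other options in this guide, not proof that a custom build is needed.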
Some businesses think they need AI to “know” their documents when what they actually need is better internal search.
If your team’s complaint is “I cannot find anything in our wiki”, a properly configured search tool (Notion AI, Glean, Mem) might solve it for £8 to £20 per user per month, without any RAG build at all.
RAG amplifies whatever is in your data. If your knowledge is in someone’s head, scattered across 47 unorganised folders, or contradicts itself across versions, RAG will return the same chaos faster.
Fix the underlying data organisation first, then revisit the build question.
“I want AI to update our CRM when a customer emails us” is not a RAG problem. It is an agent problem.
The line between knowledge and action matters more than most founders realise:
Many enquiries we receive labelled “RAG” are actually AI agent projects in disguise. The right diagnosis saves months.
The fastest way to waste £15,000 on RAG is to commission a build before working through five steps. Here is the framework we use with every SME engagement.
Not “answer customer questions”, because that is not a question. Something like “What is the warranty period for product SKU X?” or “What is our refund policy for orders placed before Date Y?”
Specific. Measurable. Pick the question that costs you the most when answered wrong or slowly today.
Before any code is written, list 30 to 50 real questions with their correct answers. This is your benchmark.
Anyone proposing to build RAG without committing to an evaluation set is either inexperienced or selling you naive RAG with extra steps.
Set up ChatGPT Teams or Claude for Work, load the relevant documents into a Project, and run your evaluation set against it.
If accuracy hits 80% or above, you are done. Total cost: £20 to £60 per user per month.
We have stopped projects at this step more than once and saved the client a five-figure sum.
Custom RAG needs all 5 layers from earlier: chunking, hybrid retrieval, reranking, query rewriting, evaluation. Ask any vendor to walk you through their plan for each layer. If they hedge on any of them, walk away.
Run your evaluation set weekly for the first month, monthly after that. Watch for accuracy drift.
RAG quality degrades as your data changes, which is normal. Catching it early is the difference between a system that works for years and one that quietly fails for six months before anyone notices.
That is essentially what our free AI readiness audit does in a single conversation, applying these five steps to your business.
When your documents become vector embeddings, those embeddings are still personal data if the underlying documents contained personal data. UK GDPR rules on storage location, processor agreements, and data retention all apply.
Three practical implications:
A customer asks to be forgotten under GDPR. You need to delete every chunk of data that mentions them, including embeddings derived from that data.
Most off-the-shelf knowledge bases do not make this easy. Some make it impossible. If you are processing customer data through RAG, your architecture has to support targeted deletion, or you have a legal liability waiting for the day someone exercises their rights.
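Targeted deletion is only possible if each chunk records whose data it contains at the time it is indexed. The sketch below uses an in-memory store as a stand-in for a real vector database. The class names, the `subject_ids` field, and the sample records are assumptions for illustration. A production system would also need to purge backups and any cached answers.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    chunk_id: str
    text: str
    embedding: list[float]
    # Ids of the people whose personal data appears in this chunk.
    subject_ids: set[str] = field(default_factory=set)

class ChunkStore:
    """In-memory stand-in for a vector store that supports erasure."""

    def __init__(self) -> None:
        self._chunks: dict[str, Chunk] = {}

    def add(self, chunk: Chunk) -> None:
        self._chunks[chunk.chunk_id] = chunk

    def erase_subject(self, subject_id: str) -> list[str]:
        """Delete every chunk (text and embedding) tagged with subject_id."""
        doomed = [cid for cid, c in self._chunks.items()
                  if subject_id in c.subject_ids]
        for cid in doomed:
            del self._chunks[cid]
        return sorted(doomed)

    def __len__(self) -> int:
        return len(self._chunks)

store = ChunkStore()
store.add(Chunk("t1", "Ticket from customer 42 about a refund", [0.1, 0.2], {"cust-42"}))
store.add(Chunk("t2", "Call notes: customers 42 and 77", [0.3, 0.1], {"cust-42", "cust-77"}))
store.add(Chunk("p1", "Returns policy, section 2", [0.5, 0.9]))
print(store.erase_subject("cust-42"), len(store))
```

The returned list of deleted chunk ids doubles as evidence that the erasure request was carried out.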
The EU AI Act has applied in stages since 2025, and most of its remaining obligations are scheduled to apply from August 2026.
Off-the-shelf platforms typically log generation but not retrieval. If you operate in a regulated industry, your RAG needs to log which chunks were retrieved for which query, and why. Our enterprise AI solutions work builds this in from the architecture stage rather than bolting it on later.
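A retrieval log can be as simple as one structured line per query. The sketch below records which chunks were retrieved, in what order, and with what score (the score being the recorded reason a chunk was chosen). The field names and sample ids are assumptions for illustration, not a regulatory schema. Note that the query text may itself contain personal data, so the log needs the same retention rules as the rest of the system.

```python
import json
from datetime import datetime, timezone

def retrieval_log_entry(query: str, results: list[tuple[str, float]],
                        user_id: str) -> str:
    """Serialise one retrieval event as a JSON line for an audit log."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,
        "retrieved": [
            {"rank": rank, "chunk_id": chunk_id, "score": round(score, 4)}
            for rank, (chunk_id, score) in enumerate(results, start=1)
        ],
    }
    return json.dumps(entry)

# Hypothetical retrieval result: (chunk id, relevance score) pairs.
line = retrieval_log_entry(
    "What is the refund window?",
    [("returns-policy#2", 0.91234), ("faq#14", 0.40211)],
    user_id="agent-7",
)
print(line)
```

Appending these lines to a file or log service gives an auditor the answer to which sources a given response was based on.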
It means the AI has access to your specific information (products, customers, policies, processes) at the moment it answers a question, through a system prompt, retrieval-augmented generation (RAG), or fine-tuning. For UK SMEs whose knowledge is too large or changes too often for a system prompt, the right approach is usually RAG: a system that retrieves relevant information from your documents and feeds it to the AI in real time. Pure model training (fine-tuning) is rarely the right answer for business knowledge.
Naive RAG is the basic version of retrieval augmented generation: split documents into chunks, store them as vectors, retrieve the closest match, ask the AI to answer. It works in demos but fails in production in five common ways: bad chunking, lost-in-the-middle retrieval, exact-phrase failures, stale data, and the AI ignoring retrieved context entirely. Practical RAG adds five engineering layers that prevent each of those failures.
For SME use cases, yes, significantly. Custom RAG typically costs £8,000 to £20,000 to build and £100 to £500 per month to run. Fine-tuning a model costs more upfront, requires substantial training data, and locks you into a snapshot of your knowledge that goes stale within weeks. RAG is also much faster to update: change a document and the system uses the new version as soon as the index is refreshed.
Three legitimate cases. First, very high query volume (10,000+ per day) where fine-tuning a small model is cheaper per call than running RAG. Second, behaviour change rather than knowledge change, making the AI write in your specific tone or format. Third, very narrow classification tasks where consistency matters more than flexibility. For most SMEs, none of these apply.
Three main ones. RAG only works if your data is well organised, because garbage in equals garbage out. RAG adds latency (typically 1 to 3 seconds per query) because of the retrieval step. RAG can fail silently when the wrong chunk is retrieved and the model answers confidently anyway, which is why production RAG needs continuous evaluation rather than one-off testing.
A scoped proof of concept can be production-ready in 2 to 4 weeks. A first proper RAG system for one workflow typically takes 6 to 10 weeks from kickoff. Anything quoted under 2 weeks is either using a no-code platform or skipping the evaluation and tuning layers that make RAG actually reliable.
Yes, with the right architecture. Practical RAG can include role-based access (so the AI only sees what the asking user is authorised to see), data residency controls (embeddings stored in UK or EU regions), and right-to-erasure support (targeted deletion when customers exercise GDPR rights). Off-the-shelf “AI knowledge base” tools usually do not offer these, which is the main reason regulated UK SMEs end up commissioning custom builds.
The hardest part of giving your AI access to your business knowledge is not the engineering. It is deciding which knowledge actually needs to be there, in what form, and whether you need RAG at all.
Most enquiries we receive labelled “we need a knowledge base AI” turn out to need something simpler: better document organisation, a tighter system prompt, or a different tool entirely. About 40% genuinely need proper RAG.
The fastest way to find out which side of the line your business is on is to walk through your data, your questions, and your team’s actual workflow with someone who has built both kinds of systems. We do this as our free AI readiness audit: a structured conversation with no sales deck and no commitment.
If you have not already read it, the companion piece to this guide is our Complete Beginner’s Guide to AI Agents for SMEs, which covers the action side of business AI. RAG answers questions. Agents do things. Most growing companies eventually need both.