Architecture patterns

Four patterns, one right answer for your use case.

Each pattern has different cost drivers, operational requirements, and failure modes. The wizard above scores all four against your specific inputs.

RAG·For fresh data + cited answers

Retrieval-Augmented Generation

RAG wins when your data changes weekly or faster, citations are mandatory, and your ML team is early-stage. It keeps the base model frozen, embeds your corpus into a vector store, and fetches only the relevant chunks at query time — giving you verifiable outputs and straightforward data governance without a training run.
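
To make the retrieval step concrete, here is a minimal sketch of the query-time flow: embed the question, pull the closest chunks from the corpus, and prompt the frozen model with labelled sources it can cite. The embedding model, corpus snippets, and prompt wording are illustrative assumptions, not part of the wizard's output.

```python
# Minimal RAG sketch: embed the corpus once, retrieve the top-k chunks per
# query, and prompt the frozen base model with those chunks plus source labels.
# Embedding model name and corpus contents are illustrative placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

corpus = {
    "policy-2024.pdf#p3": "Claims above 10,000 EUR require dual sign-off.",
    "handbook.md#refunds": "Refunds are processed within 14 business days.",
}
doc_ids = list(corpus)
doc_vecs = embedder.encode([corpus[d] for d in doc_ids], normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k most similar (source_id, chunk) pairs for the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q                     # cosine similarity (normalized vectors)
    top = np.argsort(scores)[::-1][:k]
    return [(doc_ids[i], corpus[doc_ids[i]]) for i in top]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt; the model is asked to cite source ids."""
    chunks = "\n".join(f"[{sid}] {text}" for sid, text in retrieve(query))
    return f"Answer using only the sources below and cite them.\n{chunks}\n\nQ: {query}"

print(build_prompt("How long do refunds take?"))
```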

Pick this when

  • Data updates daily or faster
  • Audit-grade citations are required
  • Corpus exceeds 10 K documents
Read pattern deep-dive
Fine-Tune·For tight latency + domain voice

Parameter-Efficient Fine-Tuning (LoRA / QLoRA)

Fine-tuning shines when your domain has highly specialised vocabulary, a strict output format, or latency requirements below 300 ms. LoRA and QLoRA adapt only a small fraction of model weights, keeping training costs manageable ($1 K–$25 K per run). The resulting model is faster at inference and requires no retrieval hop, but it cannot incorporate new information without a retraining cycle.
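
For a rough idea of what a LoRA run looks like with the Hugging Face peft library, see the sketch below. The base model, target modules, and hyperparameters are placeholder choices, not a tuned recipe.

```python
# LoRA sketch with Hugging Face peft: only small adapter matrices are trained,
# the base weights stay frozen. Model name, target modules, and hyperparameters
# are illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

lora = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()          # typically well under 1% of total weights
```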

Pick this when

  • Domain vocabulary is highly specialised (medical, legal, financial jargon)
  • Consistent output format or tone is required
  • Latency SLA < 300 ms and a retrieval hop is unacceptable
Read pattern deep-dive
Long-Ctx·For small, static corpora

Long-Context Prompting

Long-context prompting stuffs your entire relevant document set into the model's context window: up to 1 M tokens with Gemini 1.5 Pro, or around 200 K with Claude 3.5 Sonnet. It requires zero training, zero vector infrastructure, and delivers an answer in a single API call. It is the right default for small corpora (< 500 documents) with low query volumes, where simplicity outweighs per-query token cost.
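
The sketch below shows the whole pattern: read the corpus, concatenate it into one prompt, and make a single call. `call_model` is a stand-in for whichever long-context API you use; it is a placeholder, not a real client.

```python
# Long-context sketch: concatenate the whole corpus into one prompt and make a
# single model call. `call_model` is a hypothetical placeholder for your
# provider's API; the directory layout and prompt wording are assumptions.
from pathlib import Path

def build_long_context_prompt(doc_dir: str, question: str) -> str:
    """Concatenate every document into a single grounded prompt."""
    parts = []
    for path in sorted(Path(doc_dir).glob("*.md")):
        parts.append(f"### {path.name}\n{path.read_text()}")
    corpus = "\n\n".join(parts)
    return (
        "You are answering from the documents below. Quote the file name "
        f"you relied on.\n\n{corpus}\n\nQuestion: {question}"
    )

prompt = build_long_context_prompt("docs/", "What is our refund window?")
# answer = call_model(prompt, max_output_tokens=1024)   # placeholder API call
```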

Pick this when

  • Corpus fits in < 200 K tokens (a few hundred documents)
  • Query volume < 50 K/month
  • No ML team — zero setup beyond an API key
Read pattern deep-dive
Hybrid·For accuracy + consistent style

Hybrid / Retrieval-Augmented Fine-Tuning (RAFT)

Hybrid (also called RAFT) combines RAG's real-time retrieval with fine-tuning's domain adaptation. The model is trained to reason over retrieved documents — significantly reducing hallucination compared to RAG alone while preserving the ability to incorporate new information. It is the highest-performance option but also the most expensive and operationally complex. Recommended only when capability ≥ 3 and budget ≥ $15 K/mo.
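
As an illustration of the training side, the sketch below assembles one RAFT-style training example: the oracle chunk mixed with distractors, plus an answer that cites the oracle. Field names and the record layout are assumptions, not the canonical RAFT format.

```python
# Sketch of assembling a RAFT-style training example: the target answer cites
# the oracle document while distractor chunks teach the model to ignore
# irrelevant context. Field names and layout are illustrative assumptions.
import json
import random

def make_raft_example(question, oracle_id, oracle_text, distractors, answer):
    """Mix the oracle chunk with distractors so the model learns to select it."""
    docs = [{"id": oracle_id, "text": oracle_text}] + [
        {"id": d_id, "text": d_text} for d_id, d_text in distractors
    ]
    random.shuffle(docs)          # the oracle's position should not be a cue
    return {
        "question": question,
        "documents": docs,
        "answer": f"{answer} [cite: {oracle_id}]",
    }

example = make_raft_example(
    question="What is the dual sign-off threshold?",
    oracle_id="policy-2024.pdf#p3",
    oracle_text="Claims above 10,000 EUR require dual sign-off.",
    distractors=[("handbook.md#refunds", "Refunds are processed within 14 days.")],
    answer="Claims above 10,000 EUR need two approvers.",
)
print(json.dumps(example, indent=2))
```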

Pick this when

  • Both citation accuracy and domain vocabulary are critical
  • Query volume > 1 M/month (justifies the training investment)
  • Strong in-house ML team (capability ≥ 3)
Read pattern deep-dive

FAQ

Frequently asked questions

Common questions about how the decision engine works and how to interpret your recommendation.

It asks 9 questions about your data freshness, query volume, citation needs, latency SLA, data sensitivity, domain specificity, ML team capability, and budget, then returns a deterministic recommendation — RAG, Fine-Tuning, Long-Context, or Hybrid — plus a four-way cost comparison, an architecture diagram, a risk register, and a CFO-ready PDF.
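
For intuition on what "deterministic" means here, the sketch below shows one way such a rule-based scorer could work: each answer adds weighted points to one or more patterns and the highest total wins. The question keys and weights are hypothetical, not the engine's actual scoring.

```python
# Illustrative rule-based scorer: each answer adds weighted points to one or
# more patterns; the highest total is recommended. Keys and weights are
# hypothetical examples, not the wizard's real logic.
PATTERNS = ["RAG", "Fine-Tuning", "Long-Context", "Hybrid"]

RULES = {
    ("data_freshness", "daily_or_faster"): {"RAG": 3, "Hybrid": 2},
    ("citations", "required"):             {"RAG": 3, "Hybrid": 2},
    ("latency_sla_ms", "under_300"):       {"Fine-Tuning": 3},
    ("corpus_tokens", "under_200k"):       {"Long-Context": 3},
    ("ml_capability", "3_or_higher"):      {"Hybrid": 2, "Fine-Tuning": 1},
}

def recommend(answers: dict[str, str]) -> str:
    """Score every pattern from the questionnaire answers and return the winner."""
    scores = {p: 0 for p in PATTERNS}
    for key, value in answers.items():
        for pattern, weight in RULES.get((key, value), {}).items():
            scores[pattern] += weight
    # Deterministic: the same answers always produce the same ranking.
    return max(scores, key=scores.get)

print(recommend({"data_freshness": "daily_or_faster", "citations": "required"}))
# -> RAG
```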

Get help deciding

Want a second opinion on the recommendation?

Book a 20-minute architecture review with our team. We check the scoring against your constraints and share practical implementation notes.