Architecture Pattern

Hybrid / Retrieval-Augmented Fine-Tuning (RAFT)

Hybrid (also called RAFT) combines RAG's real-time retrieval with fine-tuning's domain adaptation. The model is trained to reason over retrieved documents — significantly reducing hallucination compared to RAG alone while preserving the ability to incorporate new information. It is the highest-performance option but also the most expensive and operationally complex. Recommended only when capability ≥ 3 and budget ≥ $15 K/mo.


Cost model

  • All RAG costs (embeddings, vector DB, retrieval inference)
  • Fine-tuning training run ($5 K–$25 K amortised)
  • Hosted fine-tuned + RAG inference (30–40% premium over RAG alone)
  • RAFT dataset construction (often 2–4 weeks of ML engineering time)
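As a rough guide, the cost components above can be combined into a monthly estimate. The sketch below is illustrative only: the function name, the amortisation window, and the example inputs are assumptions; only the $5 K–$25 K training range and the 30–40% inference premium come from the cost model above.

```python
# Illustrative monthly-cost sketch for the Hybrid/RAFT pattern.
# Only the quoted ranges (training run, inference premium) come from
# the cost model above; everything else is an assumed input.

def raft_monthly_cost(
    rag_monthly: float,               # embeddings + vector DB + retrieval inference ($/mo)
    training_run: float,              # one-off fine-tuning run, $5K-$25K per the cost model
    amortise_months: int,             # months to spread the training cost over (assumption)
    inference_premium: float = 0.35,  # 30-40% premium over RAG-alone inference
) -> float:
    """Rough monthly cost of Hybrid/RAFT relative to RAG alone."""
    amortised_training = training_run / amortise_months
    inference_surcharge = rag_monthly * inference_premium
    return rag_monthly + amortised_training + inference_surcharge

# e.g. an $8K/mo RAG baseline, $15K training run amortised over 12 months:
cost = raft_monthly_cost(8_000, 15_000, 12)  # 8000 + 1250 + 2800 = 12050
```

Note that the amortised training cost shrinks as query volume grows, which is why the pattern only pays off at high volume.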

When to pick this pattern

  • Both citation accuracy and domain vocabulary are critical
  • Query volume > 1 M/month (justifies the training investment)
  • Strong in-house ML team (capability ≥ 3)
  • Budget ≥ $15 K/month
  • Regulatory requirement for both provenance and specialised terminology

When to avoid it

  • ML team capability < 3
  • Budget < $10 K/month
  • Time-to-production < 4 weeks
  • Corpus changes faster than weekly (retraining cadence cannot keep up)
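The pick and avoid criteria above amount to a deterministic rule: every "pick" condition must hold and no "avoid" condition may fire. A minimal sketch, with parameter names and the 0–5 capability scale assumed for illustration:

```python
def raft_is_a_fit(
    capability: int,                 # in-house ML team capability (assumed 0-5 scale)
    budget_per_month: float,         # USD
    queries_per_month: int,
    weeks_to_production: int,
    corpus_changes_per_week: float,  # how often the document corpus is updated
    needs_citations_and_vocab: bool, # both citation accuracy and domain vocabulary critical
) -> bool:
    """True only if every 'pick' criterion holds and no 'avoid' criterion fires."""
    if capability < 3 or budget_per_month < 15_000:
        return False
    if weeks_to_production < 4:
        return False
    if corpus_changes_per_week > 1:  # faster than a weekly retraining cadence
        return False
    return needs_citations_and_vocab and queries_per_month > 1_000_000
```

Because the "pick" budget floor ($15 K) is stricter than the "avoid" ceiling ($10 K), the sketch only checks the stricter bound.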

Common pitfalls

  • RAFT dataset construction is labour-intensive — requires paired (document, query, answer) triples
  • Two failure modes to debug instead of one (retrieval failures + model failures)
  • Training run must be repeated when base model is deprecated
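To make the first pitfall concrete, a single RAFT training example pairs a query and answer with the retrieved documents the model should ground its reasoning in, usually including distractors. The record below is purely illustrative: the field names, document IDs, and contents are assumptions, not a standard schema.

```python
import json

# One illustrative (document, query, answer) training record.
# Field names and contents are assumptions, not a standard RAFT schema.
record = {
    "query": "What is the standard maintenance interval for pump model Z-40?",
    "documents": [
        {"id": "doc-412", "text": "The Z-40 requires inspection every 500 hours...",
         "relevant": True},
        {"id": "doc-087", "text": "The Z-10 series uses a sealed bearing...",
         "relevant": False},  # distractor: trains the model to ignore near-misses
    ],
    "answer": "Every 500 operating hours, per doc-412.",
}

line = json.dumps(record)  # datasets are commonly stored as one JSON object per line
```

Building thousands of such triples (with good distractors) is what consumes the 2–4 weeks of ML engineering time noted above.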
