Fine-tuning shines when your domain has highly specialised vocabulary, a strict output format, or latency requirements below 300 ms. LoRA and QLoRA adapt only a small fraction of model weights, keeping training costs manageable ($1K–$25K per run). The resulting model is faster at inference and requires no retrieval hop, but it cannot incorporate new information without a retraining cycle.
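To see why LoRA keeps costs down, it helps to count parameters: instead of updating a full d × k weight matrix, LoRA trains two low-rank factors A (d × r) and B (r × k), so only r(d + k) parameters change. A minimal sketch, using illustrative dimensions rather than any specific model:

```python
# Back-of-envelope LoRA parameter count. Adapting a d x k matrix at
# rank r trains only r * (d + k) parameters; the original weights
# stay frozen. Dimensions below are illustrative assumptions.

def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted d x k matrix at rank r."""
    return r * (d + k)

# Example: a 4096 x 4096 attention projection adapted at rank 8
full = 4096 * 4096                            # 16,777,216 frozen weights
lora = lora_trainable_params(4096, 4096, 8)   # 65,536 trainable weights
print(f"trainable fraction: {lora / full:.4%}")  # -> trainable fraction: 0.3906%
```

At rank 8, under 0.4% of that matrix's parameters are trained, which is the arithmetic behind the modest per-run cost quoted above.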