🎻 Fine-tuning

Teaching an expert new tricks

The Specialist Training Analogy

A general doctor knows medicine broadly. A cardiologist went through additional, specialized training to become an expert in heart conditions.

Fine-tuning is specialized training for AI models.

A base model like GPT or Llama knows language broadly. Fine-tuning trains it further on specific data - medical records, legal documents, your company's style guide - making it an expert in that domain.


Pre-training vs Fine-tuning

Pre-training

Trained from scratch on massive, general data:

  • Trillions of tokens from the internet
  • Costs millions of dollars
  • Takes months on thousands of GPUs
  • Learns language, facts, reasoning

Fine-tuning

Additional training on smaller, specific data:

  • Thousands to millions of examples
  • Costs hundreds to thousands of dollars
  • Takes hours to days
  • Learns domain knowledge, style, format
Pre-trained Model (general)
         ↓
    Fine-tuning
         ↓
Specialized Model (expert)

When to Fine-tune

Good Reasons

Use Case                  Example
Consistent style/format   More consistent JSON output
Domain expertise          Medical, legal terminology
Proprietary knowledge     Company-specific processes
Behavior modification     Customer service tone
Language/jargon           Industry-specific terms

Not Great Reasons

Instead Use   Why
RAG           For accessing current or changing knowledge
Prompting     For simple format or style changes
Few-shot      When a few examples are enough

Fine-tuning Process

1. Prepare Data

Format as chat-style input-output examples. The OpenAI API expects JSONL, one JSON object per line; a single example, pretty-printed here for readability:

{
  "messages": [
    { "role": "system", "content": "You are a helpful legal assistant." },
    { "role": "user", "content": "What is consideration in contract law?" },
    { "role": "assistant", "content": "Consideration is something of value..." }
  ]
}

The real file holds hundreds or thousands of these, one example per line.
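The preparation step can be sketched in a few lines of Python. A minimal example that writes a toy dataset to the `training_data.jsonl` file used in the next step (the example contents are placeholders):

```python
import json

# A couple of toy examples; a real dataset needs hundreds or thousands.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful legal assistant."},
            {"role": "user", "content": "What is consideration in contract law?"},
            {"role": "assistant", "content": "Consideration is something of value..."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are a helpful legal assistant."},
            {"role": "user", "content": "What is promissory estoppel?"},
            {"role": "assistant", "content": "Promissory estoppel lets a court enforce..."},
        ]
    },
]

# Write one JSON object per line -- the JSONL format the upload step expects.
with open("training_data.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```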

2. Fine-tune

from openai import OpenAI

client = OpenAI()

# Upload the training file
file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune"
)

# Start the fine-tuning job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="<base-model>"
)

3. Use the Model

response = client.chat.completions.create(
    model="<fine-tuned-model-id>",
    messages=[
        {"role": "user", "content": "What is promissory estoppel?"}
    ]
)

print(response.choices[0].message.content)

Fine-tuning Techniques

Full Fine-tuning

Update all model weights. Most thorough but expensive.

LoRA (Low-Rank Adaptation)

Train small adapter layers, freeze main weights:

Original:     Large model (billions of params) - frozen
LoRA:         Small adapters (millions of params) - trained
Final:        Original + LoRA = Specialized model

Benefits:

  • Much smaller file size
  • Faster training
  • Can swap adapters
  • Less risk of catastrophic forgetting
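The parameter savings are easy to see with a little arithmetic. A toy sketch, with layer shapes that are illustrative rather than taken from any real model:

```python
# Toy LoRA parameter count: for a weight matrix of shape (d_out, d_in),
# full fine-tuning updates d_out * d_in parameters, while LoRA trains
# two small adapters A (r x d_in) and B (d_out x r) with rank r << d.

d_out, d_in, rank = 4096, 4096, 8  # illustrative transformer-layer shapes

full_params = d_out * d_in                # updated by full fine-tuning
lora_params = rank * d_in + d_out * rank  # updated by LoRA

print(f"full: {full_params:,}")   # 16,777,216
print(f"lora: {lora_params:,}")   # 65,536
print(f"{full_params // lora_params}x fewer trained parameters")  # 256x
```

At inference time the adapter product B @ A is simply added to the frozen weight matrix, which is why adapters can be swapped without touching the base model.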

QLoRA

LoRA with quantized base model. Even more memory efficient.


Fine-tuning vs RAG

Aspect             Fine-tuning               RAG
Knowledge update   Retrain required          Update documents
What it changes    Model behavior            Model context
Cost               Higher (GPU time)         Lower (retrieval)
Often used for     Style, format, behavior   Facts, current info
Hallucinations     Still possible            Reduced (grounded)

Use both: Fine-tune for style/behavior, RAG for knowledge.


Common Mistakes and Gotchas

Not Enough Quality Data

Bad:  100 examples, varying quality
Good: 1000+ examples, carefully curated

Quality matters more than quantity. Clean, consistent examples produce better results.
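A quick sanity check on the dataset catches many quality problems before any GPU time is spent. A minimal validator for the chat-JSONL format (the specific checks are illustrative, not a provider's official validation):

```python
import json

def validate_jsonl(lines):
    """Return a list of (line_number, problem) for malformed training examples."""
    problems = []
    for i, line in enumerate(lines, start=1):
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            problems.append((i, "not valid JSON"))
            continue
        messages = example.get("messages")
        if not isinstance(messages, list) or not messages:
            problems.append((i, "missing 'messages' list"))
            continue
        if messages[-1].get("role") != "assistant":
            problems.append((i, "last message must be from the assistant"))
    return problems

good = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "content": "Hello"}]}'
bad = '{"messages": [{"role": "user", "content": "Hi"}]}'
print(validate_jsonl([good, bad]))  # flags line 2 only
```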

Overfitting

Model memorizes training data instead of learning patterns:

# Signs of overfitting:
# - Training loss near zero
# - Validation loss still high
# - Model repeats training examples verbatim

# Solutions:
# - More diverse data
# - Early stopping
# - Regularization
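Early stopping, one of the fixes listed above, can be sketched in a few lines: halt once validation loss stops improving for a set number of epochs (the patience value here is an arbitrary choice):

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch (0-indexed) at which training should stop,
    i.e. once validation loss has not improved for `patience` epochs."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered; trained to the end

# Validation loss improves, then plateaus -- training stops at epoch 6.
losses = [1.0, 0.7, 0.5, 0.45, 0.46, 0.47, 0.48, 0.49]
print(early_stopping(losses))  # 6
```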

Catastrophic Forgetting

Model forgets general capabilities while learning specific ones. Use:

  • LoRA instead of full fine-tuning
  • Mix general examples with specialized ones

Wrong Expectations

Fine-tuning teaches style and format, not facts. For factual knowledge, use RAG. Fine-tuning can even reinforce factual errors if they're in training data.


FAQ

Q: How much data do I need?

Minimum: ~100 examples. Better results: 500-5000 examples. Quality and diversity matter more than raw quantity.

Q: How long does fine-tuning take?

Depends on model size and data. Small models with small datasets: minutes. Large models: hours to days.

Q: Can I fine-tune any model?

It depends on the provider and the model. Some models are typically fine-tuned via a hosted API, while many open models can be fine-tuned locally. It's a good idea to check model licenses and provider terms for commercial use.

Q: What is the cost?

Pricing varies by provider, model size, and dataset size. For up-to-date costs, check the provider's pricing page, or estimate from your training token count (roughly, tokens per pass times the number of epochs) and the per-token training price or hardware cost.
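A rough estimate usually starts from the training token count. A back-of-the-envelope sketch, where the whitespace "tokenizer" and the per-token price are placeholder assumptions (real tokenizers and real prices differ):

```python
def estimate_training_tokens(examples, epochs=3):
    """Very rough token count, using whitespace words as a stand-in for real tokens."""
    tokens_per_pass = sum(
        len(message["content"].split())
        for example in examples
        for message in example["messages"]
    )
    return tokens_per_pass * epochs

examples = [
    {"messages": [
        {"role": "user", "content": "What is consideration in contract law?"},
        {"role": "assistant", "content": "Consideration is something of value exchanged by both parties."},
    ]}
]

tokens = estimate_training_tokens(examples, epochs=3)
price_per_million = 8.00  # placeholder USD rate -- check your provider's pricing page
print(tokens, "tokens, approx. cost:", tokens / 1_000_000 * price_per_million)
```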

Q: Can fine-tuning make models smarter?

Not really. Fine-tuning adjusts existing capabilities rather than adding new reasoning ability. It changes what the model does, not what it can do.

Q: What is instruction tuning?

A specific type of fine-tuning that teaches models to follow instructions. ChatGPT, for example, descends from instruction-tuned GPT-3-family models.


Summary

Fine-tuning can often help adapt pre-trained models to specific domains, styles, or behaviors. It can be powerful, but it's one option among several.

Key Points:

  • Fine-tuning = specialized training on specific data
  • Good for: style, format, domain expertise
  • Use RAG for: factual knowledge, current information
  • LoRA is more efficient than full fine-tuning
  • Quality data > quantity of data
  • Beware overfitting and catastrophic forgetting
  • Often combined with RAG for better results

Fine-tuning is a tool, not a magic solution. Understand when it helps and when alternatives (prompting, RAG) are better.
