Apologies if this seems like a survey post. I’m just learning about tuning and want to get a lay of the land. I don’t think I have the money to tune locally so might have to rent some VRAM, but curious how much better tuning is vs something like RAG.

What model? What was your use case? What tuning tool did you use? What is your hardware setup? How large was your training set, and how did you create it? How effective was the model at tasks pre- and post-tuning?

Thanks!

  • pyr0ball@lemmy.world
    5 days ago

    Yeah, I've done two separate things in this space.

    Cover letter fine-tuning: Llama-3.2-3B-Instruct as the base, QLoRA via Unsloth (rank 16, 10 epochs). Trained on ~62 of my own cover letters, exported to GGUF, loaded into Ollama. Fits comfortably on 8GB VRAM with 4-bit quantisation. Noticeably more consistent than a prompted generic model at matching my voice and style.
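    A rough sketch of what the data-prep step for that kind of run might look like: packaging each letter as an instruction/response pair in JSONL, which is the shape most QLoRA tooling (Unsloth included) can consume. The field names and prompt wording here are my own illustration, not the poster's actual pipeline:

    ```python
    import json

    def build_record(job_title: str, company: str, letter_text: str) -> dict:
        """One training example: a synthetic prompt plus the real letter as the target.
        (Field names 'instruction'/'output' are an assumption; match your trainer's schema.)"""
        return {
            "instruction": f"Write a cover letter for the {job_title} role at {company}.",
            "output": letter_text,
        }

    def write_jsonl(records, path):
        """Write one JSON object per line, the usual fine-tuning dataset format."""
        with open(path, "w", encoding="utf-8") as f:
            for r in records:
                f.write(json.dumps(r, ensure_ascii=False) + "\n")
    ```

    With only ~62 examples, keeping the prompt template identical across every record matters a lot; the model learns the mapping from that fixed framing to your voice.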

    Email classification: completely different story. Classifier models for routing emails into categories (rejection, interview scheduled, offer, etc.) don’t need a GPU at all. DeBERTa-small runs on CPU in milliseconds. The hard part is the labeling pipeline. We bootstrapped with deterministic heuristics to auto-label high-confidence cases, then routed uncertain ones to a human review queue. Around 2,000 labeled examples was enough for meaningful accuracy.

    vs RAG: for classification, fine-tuning wins cleanly. RAG is better when you need to reason over retrieved documents. If you’re making a consistent categorical judgment, you want it baked into the weights, not reconstructed from context at inference time.


    I build local-first process pipeline tooling at circuitforge.tech

    • venusaur@lemmy.worldOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      4 days ago

      Oh that’s really interesting! I’m also interested in the classification case. Can you tell me more, or direct me to where to learn more about DeBERTa? Do you train it the same way, with prompt-and-response sets? Does it work on any open-source model? I can only run up to 4B right now.