SFT of Instruct Model Collapse

Even when you run SFT on an Instruct model, the resulting weights are usually perfectly normal.

However, an Instruct model usually comes with its own quirks around special tokens and chat templates that a Base model does not have; if your fine-tuning data does not reproduce them, the result can look broken. Also note that older models may lack a chat template altogether or ship with an inappropriate one.


Short, direct answer:
No, it is not a rule that SFT on an already-Instruct model will “collapse” into gibberish. Lots of people successfully SFT Instruct/chat models every day. What does happen quite often is that SFT pipelines that are slightly wrong break Instruct models much more visibly than base models.

Your symptoms:

  • “absolute gibberish”
  • “not even getting to eos token”

match technical / configuration problems, not a fundamental “you SFTed an Instruct model, therefore it died” law.

Below I’ll put your experience in context, then connect it to concrete online reports and the real failure modes.


1. Background: what you did and why the model reacted badly

You thought you were fine-tuning a pretrained/base model.
In reality, the checkpoint was already instruction-tuned and chat-templated.

That matters because:

  1. A base model:

    • has no special chat tokens,
    • has no strong expectations about formatting,
    • is relatively tolerant of you just feeding it “text in, text out.”
  2. An Instruct/chat model usually:

    • expects a very specific chat template (<|user|> ... <|assistant|> ... <eos> or similar),
    • has special tokens baked into its vocabulary and training,
    • may have been trained with assistant-only loss on top of that format.

If you fine-tune it as if it were a plain LM (no template, wrong special tokens, wrong EOS handling, wrong masking, wrong tokenizer), you can:

  • distort the probability distribution over tokens so that:

    • language degenerates into nonsense,
    • EOS is never produced,
  • and/or break things at save/load or deployment time so the weights no longer match the tokenizer.

That looks like “the model collapsed because it was Instruct,” but the underlying cause is that you violated the assumptions the Instruct model was trained under. The short sketch below illustrates the formatting difference.
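To make the difference concrete, here is a minimal sketch using the standard transformers API. The checkpoint name is only an example (it appears in the reports below and may require accepting a license), and the exact role markers and end-of-turn tokens differ per model family:

```python
from transformers import AutoTokenizer

# Example Instruct checkpoint; any *-it / *-Instruct / *-Chat model behaves similarly.
tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

messages = [
    {"role": "user", "content": "Summarize SFT in one sentence."},
    {"role": "assistant", "content": "SFT fine-tunes a model on labeled demonstrations."},
]

# What many broken pipelines feed an Instruct model: raw concatenated text,
# no role markers, no end-of-turn/EOS token.
naive_text = messages[0]["content"] + "\n" + messages[1]["content"]

# What the Instruct model actually expects: its own chat template rendered
# around the turns, including the special tokens it was trained with.
templated_text = tok.apply_chat_template(messages, tokenize=False)

print(naive_text)
print(templated_text)
```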


2. Has anyone else seen this? Yes – a lot. But the causes are concrete.

Here are closely matching real reports.

2.1 Gemma-3-1B-IT: Instruct model → gibberish after instruction SFT

On the official Gemma 3 Instruct (google/gemma-3-1b-it), a user fine-tunes for instruction tasks and gets gibberish outputs. Google’s response:

  • likely mismatch between tokenizer and chat template used in fine-tuning vs what the Instruct model expects,
  • plus possible problems with fp16 numerics or data formatting. (Hugging Face)

They explicitly note that the base Gemma model (no chat special tokens) didn’t break under the same pipeline, exactly like your experience.

So: same pattern you saw, and the root cause is format/tokenizer mismatch, not a general rule that “instruct models collapse.”


2.2 Llama-2-7B-chat + softprompt: base OK, Instruct+prompt = pure gibberish

In a Hugging Face forum thread “Softprompt for Llama generating gibberish output”, a user:

  • trains a soft prompt on top of meta-llama/Llama-2-7b-chat-hf,
  • base chat model alone is fine,
  • but with the softprompt attached, generations are nonsensical character salad. (Hugging Face Forums)

The diagnosis revolves around:

  • mis-handling of input formatting with the softprompt,
  • incorrect placement of the prompt embeddings,
  • and mismatched tokenization.

Again, it’s an Instruct model, and it collapses only when the additional SFT / PEFT layer is wired wrong.


2.3 GPT-2 persona-chat: base LM → fine-tune → weird chat gibberish

Older, non-instruct example, but exact same symptom:

  • “Fine tuning GPT2 on persona chat dataset outputs gibberish” (HF forums). (Hugging Face Forums)

  • Training loss looks reasonable, but responses are a mess.

  • Discussion points to:

    • how dialogue is concatenated,
    • tokenization,
    • truncation and decoding config.

This shows: you can get the same kind of collapse without any Instruct model at all if you mishandle the data/format.


2.4 Gemma-3-1B-PT: fine-tuned model OK, but gibberish after quantization

Another closely related pattern:

  • User fine-tunes google/gemma-3-1b-pt (pretrained/base) with Unsloth + LoRA using ChatML.
  • Full-precision merged model responds correctly.
  • After GPTQ/AWQ/BitsAndBytes quantization, all quantized versions produce gibberish or empty outputs. (Hugging Face)

Here the SFT is fine, but the deployment step (quantization / conversion) breaks the model, making it look collapsed.


2.5 BART: fine-tune → gibberish until decoder is fixed

Seq2seq but the same underlying pattern:

  • “What can cause model.generate (BART) output to be gibberish after fine-tuning?” (HF forums). (Hugging Face Forums)

  • After fine-tuning, generate() gives nonsense.

  • Official answer: their decoder inputs were wrong:

    • not shifting decoder labels,
    • not using correct decoder_start_token_id.
  • Once they fix decoding, outputs become normal.

Again: the training isn’t inherently doomed; the mechanics were off.


3. What’s actually going on when you see “gibberish + no EOS”

Based on those reports and your description, the failure is almost always one (or a stack) of these:

  1. Tokenizer / vocab / EOS mismatch

    • Using a tokenizer that doesn’t match the checkpoint.
    • Changing special tokens (e.g., adding chat tokens) and not resizing or retraining the final layer correctly.
    • Accidentally changing EOS or pad IDs so they no longer line up.
    • Result: token IDs at inference don’t match what the model learned → gibberish, EOS never sampled (see the sanity-check sketch after this list). (Hugging Face)
  2. Ignoring the model’s chat template

    • Instruct model expects something like user: ... assistant: ... <eos>.
    • SFT data is fed as raw text or in a different homemade format.
    • Inference still uses the “official” chat template (or vice versa).
    • Result: the model’s learned structure and your training structure clash → poor logits, nonsense, or strange termination. (Hugging Face)
  3. Over-aggressive learning rate / training schedule

    • Especially on already-aligned Instruct models, large LR or too many epochs can push weights far from a good optimum.
    • T5 fine-tuning threads show that lr=1e-4 for 10 epochs is enough to turn a fluent T5 into gibberish. (Hugging Face Forums)
  4. Broken save/load or quantization

    • Checkpoints saved without the right PEFT adapters,
    • LoRA merges done incorrectly (see the merge-and-save sketch after this list),
    • conversions to GGUF/AWQ/GPTQ that scramble weights.
    • That’s exactly what you see in the Gemma-3-1B-PT quantization thread and Unsloth GGUF issues. (Hugging Face)
  5. NaNs / numerical instability

    • bf16/fp16 with unstable kernels, mis-configured FlashAttention, or bad gradient scales can silently corrupt weights.
    • The model will still run but outputs nonsense and EOS probabilities are off.
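To make failure modes 1 and 2 concrete, here is a small pre-flight sanity check. It is a sketch against the standard transformers API; the checkpoint name is a placeholder, and the commented-out lines show what you would do only if you deliberately add new special tokens:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "google/gemma-3-1b-it"  # placeholder: the checkpoint you are fine-tuning
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# 1) EOS / pad ids should line up (config.eos_token_id can be an int or a list).
print("tokenizer eos:", tok.eos_token_id, "| model config eos:", model.config.eos_token_id)
if tok.pad_token_id is None:
    tok.pad_token = tok.eos_token  # common convention; check the model card first

# 2) Only if you add special tokens (e.g. extra role markers) do you resize embeddings:
# tok.add_special_tokens({"additional_special_tokens": ["<|myrole|>"]})  # hypothetical token
# model.resize_token_embeddings(len(tok))

# 3) The tokenizer vocab must fit inside the embedding / LM-head size.
embed_rows = model.get_input_embeddings().weight.shape[0]
assert len(tok) <= embed_rows, f"vocab ({len(tok)}) exceeds embeddings ({embed_rows})"

# 4) Training data and inference prompts should both go through the chat template.
msgs = [{"role": "user", "content": "ping"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
```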
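And for failure mode 4, a sketch of merging a LoRA adapter and saving model and tokenizer together so they cannot drift apart. It assumes the adapter was trained with PEFT; the base checkpoint and all paths are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "google/gemma-3-1b-it"   # placeholder: the checkpoint you trained on
ADAPTER_DIR = "lora-out"           # placeholder: where the trainer saved the LoRA adapter

base = AutoModelForCausalLM.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

# If you added special tokens during training, load the tokenizer from the
# adapter directory instead, so the saved vocab matches the merged weights.
tok = AutoTokenizer.from_pretrained(BASE_ID)

# Save weights and tokenizer side by side so downstream loading or quantization
# can never pair the model with the wrong tokenizer.
merged.save_pretrained("merged-model")
tok.save_pretrained("merged-model")
```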

None of these are about “the model already being Instruct” as a fundamental problem. They are about a technical mismatch between your pipeline and the model’s assumptions.


4. Why Instruct models feel more fragile than base ones

This is probably the core of your intuition.

  1. More structure baked in.
    Instruct/chat models are trained on very regular patterns:

    • role tokens and separators,
    • explicit EOS at turn boundaries,
    • often assistant-only loss (see the masking sketch at the end of this section).

    That structure becomes part of their internal distribution. If you fine-tune with a different structure (no roles, different separators, missing EOS), you’re effectively telling it to unlearn what it previously knew, and it can go off the rails faster than a base model that never had that structure.

  2. Less “room to move” without breaking behavior.
    A base model is a general LM; you can often push it a bit and it’ll remain a fluent LM. An Instruct model has already been optimized for:

    • politeness,
    • format,
    • safety constraints.

    Aggressive SFT or wrong formatting pushes against that alignment and can degrade behavior more visibly.

  3. But they are routinely SFTed successfully.
    At the same time:

    • TRL’s SFT examples use Instruct models like Llama-3-Instruct, Gemma-it, Qwen-chat as standard starting points for domain SFT. (Hugging Face Forums)
    • There are public tutorials specifically for fine-tuning Gemma-3-1B-IT for tasks like financial sentiment, which work fine when the pipeline is correct. (Medium)

    So empirically, there is no “instruct → instant collapse” rule; there is “instruct → more sensitive to mistakes.”
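For reference, “assistant-only loss” usually just means that the label ids for everything before the assistant’s reply are set to -100 so they are ignored by the loss. A minimal hand-rolled sketch (libraries such as TRL ship collators that do this for you; the prefix assumption below holds for most chat templates but is worth verifying for yours):

```python
import torch
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")  # placeholder checkpoint

messages = [
    {"role": "user", "content": "Name one failure mode of SFT."},
    {"role": "assistant", "content": "Training without the model's chat template."},
]

# Token ids for the prompt part (up to where the assistant starts) and the full turn.
prompt_ids = tok.apply_chat_template(messages[:-1], add_generation_prompt=True)
full_ids = tok.apply_chat_template(messages)

input_ids = torch.tensor(full_ids)
labels = input_ids.clone()
labels[: len(prompt_ids)] = -100  # user/system tokens contribute nothing to the loss

# The loss is now computed only on the assistant's tokens, including the
# end-of-turn/EOS token, which is what teaches the model to stop.
```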


5. What this means for your rule-question

Your question:

Is it a rule that if we try to SFT a model that is already Instruct model that it will collapse at the end?

Clear answer:

  • No, there is no such rule.

  • There is a strong pattern that:

    • if you SFT an Instruct model as if it were a base model,
    • ignore its chat template,
    • mis-handle special tokens/EOS,
    • or use aggressive LR / bad save-load / bad quantization,

    then you can absolutely get hard collapse (gibberish, no EOS) like you saw.

But this is a consequence of pipeline mismatch, not an inherent property of “instructness.”


6. Practical takeaway (if you do this again)

To avoid repeating the collapse you saw:

  1. Always confirm what the checkpoint actually is.

    • Check the model card: is it *-it, *-Instruct, *-Chat, etc.?
    • If yes, assume it needs its documented chat template and special tokens.
  2. Use the official SFT pattern for that family.

    • For Gemma, follow Google/HF + TRL SFT guides. (Google AI for Developers)
    • For Llama, Qwen, etc., use tokenizer.apply_chat_template and TRL’s SFTTrainer as shown in their examples (a minimal sketch follows this list).
  3. Keep tokenizer + model bound together.

    • Same repo ID and revision for both.
    • Save and reload them together.
    • After loading, check vocab size vs LM head size.
  4. Be gentle with hyperparameters, especially on Instruct.

    • Lower LR than you’d use for a base model.
    • Fewer epochs / early stopping.
  5. Test mid-training and immediately after saving.

    • If mid-training generations look fine but the final checkpoint is broken, suspect save/load or quantization, not SFT itself (see the reload smoke test below).
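As a starting point for items 2–4, here is a minimal TRL sketch. It is not the official recipe for any particular family: the dataset is a toy placeholder, the checkpoint is only an example, the hyperparameters are deliberately conservative, and argument names can differ between TRL versions (recent versions apply the chat template to “messages”-format data automatically):

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Toy placeholder dataset in the conversational ("messages") format.
train_ds = Dataset.from_list([
    {"messages": [
        {"role": "user", "content": "What is SFT?"},
        {"role": "assistant", "content": "Supervised fine-tuning on labeled demonstrations."},
    ]},
])

args = SFTConfig(
    output_dir="sft-out",
    learning_rate=1e-5,              # lower than you would use on a base model
    num_train_epochs=1,              # fewer epochs; add early stopping if you can
    per_device_train_batch_size=1,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",    # example Instruct checkpoint; TRL loads the matching tokenizer
    args=args,
    train_dataset=train_ds,
)
trainer.train()
trainer.save_model("sft-out")  # make sure the tokenizer ends up in sft-out as well
```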
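And for item 5, a reload smoke test: generate from the saved checkpoint through the chat template and check that the output is fluent and actually terminates. A sketch, assuming the directory written above contains both weights and tokenizer:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CKPT = "sft-out"  # placeholder: the directory saved above
tok = AutoTokenizer.from_pretrained(CKPT)
model = AutoModelForCausalLM.from_pretrained(CKPT)

msgs = [{"role": "user", "content": "Say hello and stop."}]
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt")

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=64)

new_tokens = out[0, inputs.shape[-1]:]
print(tok.decode(new_tokens, skip_special_tokens=False))

# Healthy signs: readable text, the end-of-turn/EOS token shows up, and
# generation stops well before max_new_tokens.
print("ended with EOS:", new_tokens[-1].item() == tok.eos_token_id)
```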

If you line those up, SFT on an Instruct model is just another normal step in the stack (pretrain → SFT → preference optimization), not something that “must” end in collapse.
