Hi there,
You can expand it, but the "turn 3 sentences into 300 words" trick mostly gives you more tokens, not more learning. It teaches the model to waffle confidently.
If you want to multiply data without changing the ideas, do this instead (simple + practical):
Don't just expand words, expand coverage. For each "ground truth" text, generate lots of different tasks (rough sketch after this list):
yes/no Qs + the correct label
“is this supported by the text? (yes/no/not enough info)”
extract the exact line that proves it (evidence span)
rewrite the claim in 5 different ways (hard rephrasing)
“spot what’s wrong with this answer” (hallucination / overclaim / missing evidence)
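
A minimal sketch of that fan-out in code. The template text and field names are mine, not a standard schema, and the actual questions, labels and evidence spans would come from your generator (an LLM or annotators), not hard-coded strings:

```python
# One source text -> many task types. Templates are placeholders (<...>);
# the generator fills in the real questions, claims and targets.
import json

TASK_TEMPLATES = {
    "yes_no":        "Text: {text}\nQ: <a yes/no question about the text>\nA (yes/no):",
    "support_check": "Text: {text}\nClaim: <a claim>\nSupported? (yes / no / not enough info):",
    "evidence_span": "Text: {text}\nQuote the exact sentence that proves the claim, or say 'none':",
    "paraphrase":    "Rewrite the claim below in 5 different ways, keeping the meaning exact:\nClaim: <a claim>",
    "critique":      "Text: {text}\nCandidate answer: <an answer with a planted flaw>\nWhat is wrong with it?",
}

def expand_source(source_id: str, text: str) -> list[dict]:
    """Fan one ground-truth text out into several task records
    (targets left blank for the generator to fill)."""
    return [
        {"source_id": source_id, "task_type": t, "prompt": tmpl.format(text=text), "target": ""}
        for t, tmpl in TASK_TEMPLATES.items()
    ]

if __name__ == "__main__":
    for rec in expand_source("doc-001", "Example ground-truth text goes here."):
        print(json.dumps(rec, ensure_ascii=False))
```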
For controversial stuff, generate both sides. If you only generate supporting arguments, you’re basically training it to rationalise. Better format:
Question → Answer → Evidence → Strongest counterpoint → What we don’t know
That keeps the “idea” but trains honesty and restraint.
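
Concretely that's just a record with five mandatory fields, plus a check that rejects examples which skip the counterpoint or the unknowns. The key names here are my own, nothing standard:

```python
# Records that skip the counterpoint or the unknowns are exactly the ones
# that train rationalisation instead of restraint, so filter them out.
REQUIRED_FIELDS = ("question", "answer", "evidence", "strongest_counterpoint", "what_we_dont_know")

def is_valid(record: dict) -> bool:
    return all(record.get(f, "").strip() for f in REQUIRED_FIELDS)

example = {
    "question": "Does policy X reduce Y?",
    "answer": "The text argues yes, with caveats.",
    "evidence": "Exact quote from the source goes here.",
    "strongest_counterpoint": "The effect size comes from one small study.",
    "what_we_dont_know": "No long-term data; confounders not controlled.",
}
assert is_valid(example)
```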
Add “not enough info” examples on purpose. This is the big one. Most models fail by answering anyway.
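
One cheap way to manufacture those on purpose (just one option, not the only way): attach each claim to a document it did not come from, so the honest label is "not enough info":

```python
# Mismatch claims and documents so "not enough info" is the correct answer.
# Purely illustrative pairing logic; adapt to your own data layout.
import random

def make_unanswerable(claims_by_doc: dict[str, list[str]], seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    doc_ids = list(claims_by_doc)
    examples = []
    for doc_id, claims in claims_by_doc.items():
        others = [d for d in doc_ids if d != doc_id]
        if not others:
            continue
        for claim in claims:
            examples.append({
                "doc_id": rng.choice(others),   # deliberately the wrong document
                "claim": claim,
                "label": "not enough info",
            })
    return examples
```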
Dedup hard. Synthetic data repeats itself fast. If your outputs look similar, the model just memorises the style.
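
A rough sketch of a near-duplicate filter: word-shingle Jaccard with greedy keep/drop. The threshold and shingle size are guesses you'd tune on your own data, and at scale you'd reach for MinHash or embeddings instead:

```python
# Drop any output that is too similar to something already kept.
def shingles(text: str, n: int = 5) -> set:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a or b else 1.0

def dedup(texts: list[str], threshold: float = 0.7) -> list[str]:
    kept, kept_shingles = [], []
    for t in texts:
        s = shingles(t)
        if all(jaccard(s, ks) < threshold for ks in kept_shingles):
            kept.append(t)
            kept_shingles.append(s)
    return kept
```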
RL / rewards: don’t reward “aligned ideas”. Reward good behaviour: cites evidence, admits uncertainty, doesn’t invent facts, fairly summarises the other side. Otherwise RL just makes it more stubborn.
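
To make that concrete, here's a toy reward that scores behaviours rather than agreement. Every check below is a crude stand-in; in practice each one would be a trained classifier or an LLM judge, not string matching:

```python
# Reward behaviours (evidence, uncertainty, no overclaiming), not "aligned" conclusions.
def behaviour_reward(response: str, source_text: str) -> float:
    score = 0.0
    # cites evidence: quotes something actually present in the source
    quoted_chunks = response.split('"')[1::2]
    if any(chunk and chunk in source_text for chunk in quoted_chunks):
        score += 1.0
    # admits uncertainty when it should
    if "not enough info" in response.lower() or "uncertain" in response.lower():
        score += 0.5
    # cheap proxy for overclaiming: confident universals
    if any(w in response.lower() for w in ("definitely", "proves", "always", "never")):
        score -= 1.0
    return score
```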
That’s basically the efficient path: more labels + constraints + adversarial cases, not longer essays.
Hope this is helpful,
RFTSystems, Liam.