Fine-Tuning GPT-2: PAD Token Pitfalls, Label Masking & Prompt Token Mistakes to Avoid
Avoid silent training bugs when fine-tuning GPT-2: learn how PAD/EOS token conflicts, data collator label masking, and prompt tokens can corrupt your loss signal.
William Briggs
3/3/2026
1 min read