Compressing Large Language Generation Models with Sequence-Level Knowledge Distillation