Compressing Large Language Generation Models with Sequence-Level Knowledge Distillation
Learning • a year ago • 9 min read