Learning

Compressing Large Language Generation Models with Sequence-Level Knowledge Distillation

2 years ago • 9 min read