Learning

Compressing Large Language Generation Models with Sequence-Level Knowledge Distillation

a month ago • 9 min read