kd - a ByRookie Collection

kd

updated 9 days ago

Aligning Teacher with Student Preferences for Tailored Training Data Generation
Paper • arXiv 2406.19227 • Published Jun 27 • 24 upvotes

Pre-training Distillation for Large Language Models: A Design Space Exploration
Paper • arXiv 2410.16215 • Published 11 days ago • 15 upvotes

Baichuan Alignment Technical Report
Paper • arXiv 2410.14940 • Published 13 days ago • 46 upvotes

MiniPLM: Knowledge Distillation for Pre-Training Language Models
Paper • arXiv 2410.17215 • Published 10 days ago • 12 upvotes