kalo-team/llama3-4x8b-pythonT2_step_final

kalomaze committed on May 22

Commit e7b159b (parent: e7766e1): Update README.md

Files changed (1): README.md +7 -0

@@ -1,3 +1,10 @@
+---
+license: llama3
+language:
+- en
+tags:
+- code
+---
 # 70b Distillation Experiment
 This is not the full-fledged run that I plan to do for a large scale distillation of Llama3 70b.
 Instead, it's a preliminary test train of the custom distillation trainer, where we target KL divergence from the larger Llama3 70b teacher model onto 4x8b (the student).
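The README does not say which direction of KL divergence the custom trainer minimizes, how it reduces the loss, or whether it applies a temperature. The sketch below is a minimal illustration under stated assumptions, not the trainer's actual code. It assumes forward KL(teacher || student) computed per token over next-token distributions, which works because Llama 3 70B and the 8B models share the same tokenizer vocabulary. It also assumes a softmax temperature `T` and the common `T**2` scaling convention from Hinton et al. The function names `log_softmax`, `kl_from_log_probs`, and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax over one vocabulary-sized logit vector."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # log-sum-exp with the max subtracted to avoid overflow
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def kl_from_log_probs(log_p: Sequence[float], log_q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * (log p_i - log q_i), computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def distillation_loss(
    teacher_logits: Sequence[Sequence[float]],
    student_logits: Sequence[Sequence[float]],
    temperature: float = 1.0,
) -> float:
    """Mean per-token forward KL(teacher || student) over one sequence.

    Both arguments are lists of per-position logit vectors with the same
    sequence length and the same vocabulary size (assumption: shared tokenizer).
    """
    if len(teacher_logits) != len(student_logits) or not teacher_logits:
        raise ValueError("teacher and student must cover the same non-empty sequence")
    per_token = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        if len(t_row) != len(s_row):
            raise ValueError("teacher and student vocabularies differ in size")
        log_p = log_softmax(t_row, temperature)  # teacher (target) distribution
        log_q = log_softmax(s_row, temperature)  # student distribution
        per_token.append(kl_from_log_probs(log_p, log_q))
    # T**2 scaling keeps gradient magnitudes comparable across temperatures
    return (temperature ** 2) * sum(per_token) / len(per_token)


if __name__ == "__main__":
    teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 3.0]]
    student = [[1.5, 0.7, -0.5], [0.2, 0.9, 2.5]]
    print(distillation_loss(teacher, student, temperature=2.0))
```

Working from log-probabilities rather than dividing probabilities avoids a division by zero when a student probability underflows. The loss is zero only when the student reproduces the teacher's distribution at every position. In a real trainer the same computation would run batched on GPU tensors, for example with PyTorch's `torch.nn.functional.kl_div`.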