Update README.md
README.md
CHANGED
@@ -25,7 +25,9 @@ base_model: meta-llama/LLaMA-2-7B
 
 ### Overview
 
-This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
+It was trained using a mix of OpenWebText and WikiText Raw V1 datasets.
+Knowledge distillation was employed to transfer knowledge from a larger "teacher" model (Meta’s 7B LLaMA 2) to help this smaller model mimic the behavior of the teacher.
 This version is the latest version of DistilLlama, which has gone through 5 days of training using two Nvidia A100 80G GPUs.
 
 ### Model Architecture
@@ -98,4 +100,4 @@ The architecture is based on LLaMA 2, with the following parameters:
 url={https://arxiv.org/abs/2308.02019},
 }
 
-*Note: The repository will be updated as training progresses. Last update 2024-
+*Note: The repository will be updated as training progresses. Last update 2024-11-01*
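For readers unfamiliar with the distillation objective referenced in the added Overview lines, the sketch below shows one common way such a setup is implemented: the student's next-token predictions are pulled toward the teacher's softened output distribution while still being trained on the ground-truth tokens. This is a minimal, hypothetical illustration, not the repository's actual training code; the function name `distillation_loss` and the `temperature` and `alpha` hyperparameters are illustrative assumptions, and only the teacher/student roles (LLaMA 2 7B distilled into the ~80M-parameter student) come from the README itself.

```python
# Minimal, hypothetical sketch of a knowledge-distillation loss for causal LMs.
# Not the DistilLlama repository's code; names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-target KL term (student vs. teacher) with the usual
    cross-entropy term against the ground-truth next tokens."""
    # Soften both distributions; the KL term pulls the student's token
    # distribution toward the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Standard language-modeling loss on the hard labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
    )
    return alpha * soft + (1.0 - alpha) * hard


# Dummy usage with arbitrary shapes (batch=2, seq=16, vocab=32000).
if __name__ == "__main__":
    student = torch.randn(2, 16, 32000, requires_grad=True)
    teacher = torch.randn(2, 16, 32000)  # would come from the frozen 7B teacher
    labels = torch.randint(0, 32000, (2, 16))
    print(distillation_loss(student, teacher, labels).item())
```

In this formulation, `temperature` softens both distributions so the student can learn from the teacher's relative preferences over tokens rather than only its top choice, and `alpha` balances imitation of the teacher against the standard language-modeling objective.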