frankminors123 committed
Commit 2b6173b • 1 Parent(s): a3a74c7
Update README.md
README.md CHANGED
@@ -8,9 +8,9 @@ tags:
 ---
 # Chinese-CodeLlama-7B-PT
 
-We have further expanded the vocabulary based on Chinese-LLaMA-2-7B, from 55296 to 75548 tokens.
+We have further expanded the vocabulary based on Chinese-LLaMA-2-7B, from 55296 to 75548 tokens; it is worth noting that most of the added tokens are code tokens. We pre-trained the model with LoRA, including the `embed_tokens` and `lm_head` layers.
 
-The training data contains approximately 400 million tokens.
+The training data contains approximately 400 million tokens, drawn from a high-quality code dataset on Hugging Face.
 
 In addition, we applied `memory_efficient_attention` during pre-training, which saves a significant amount of GPU memory.
 
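On the vocabulary expansion the updated README describes: below is a minimal sketch, assuming the `hfl/chinese-llama-2-7b` base checkpoint and the Hugging Face `transformers` library, of adding new code tokens to the tokenizer and resizing the embedding matrices so `embed_tokens` and `lm_head` cover the enlarged vocabulary. The token list is hypothetical; the actual expansion grew the vocabulary by roughly 20,000 tokens (55296 → 75548).

```python
# A minimal sketch (not the author's actual script): add hypothetical code
# tokens to a Chinese-LLaMA-2 tokenizer and resize the model's embeddings.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "hfl/chinese-llama-2-7b"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical additions; the real expansion added mostly code tokens.
new_tokens = ["<|indent|>", "<|dedent|>", "</code>"]
num_added = tokenizer.add_tokens(new_tokens)

# Resize embed_tokens and lm_head so their rows match the new vocabulary size.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```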
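On LoRA pre-training that includes `embed_tokens` and `lm_head`: with the PEFT library, `modules_to_save` trains and saves those layers in full alongside the low-rank adapters, which matters when newly added vocabulary rows must be learned. The rank, alpha, and target modules below are illustrative assumptions, not the author's published hyperparameters.

```python
# A sketch of a LoRA setup that also fully trains embed_tokens and lm_head.
# Rank/alpha/target modules are assumptions, not published hyperparameters.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("hfl/chinese-llama-2-7b")  # assumed base

lora_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    # Fully train (and save) these so the newly added token rows get learned.
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```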
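On `memory_efficient_attention`: the name matches the xFormers kernel, which computes attention without materializing the full sequence-by-sequence score matrix, hence the memory savings the README mentions. A sketch of the call, assuming xFormers is installed and using illustrative shapes:

```python
# Sketch: xFormers memory-efficient attention over (batch, seq, heads, head_dim)
# tensors; it avoids building the full seq x seq attention matrix in memory.
import torch
import xformers.ops as xops

B, S, H, D = 2, 4096, 32, 128  # illustrative shapes
q = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, S, H, D, device="cuda", dtype=torch.float16)

# Causal mask, as used for autoregressive pre-training.
out = xops.memory_efficient_attention(
    q, k, v, attn_bias=xops.LowerTriangularMask()
)
print(out.shape)  # (B, S, H, D)
```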