frankminors123 committed a3a74c7 (1 parent: eb15d06)
Create README.md

README.md (added)

---
license: apache-2.0
language:
- zh
- en
tags:
- code
---
# Chinese-CodeLlama-7B-PT

We further expanded the vocabulary of Chinese-LLaMA-2-7B from 55296 to 75548 tokens, and we continued pre-training the model with LoRA, including the `embed_tokens` and `lm_head` layers in training.
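
To give a sense of scale for the vocabulary expansion, here is a minimal arithmetic sketch of how many parameters the new tokens add to `embed_tokens` and `lm_head`. It assumes a base vocabulary of 55296 for Chinese-LLaMA-2-7B, a hidden size of 4096 (the usual LLaMA-2-7B value), and untied input and output embeddings; none of these are stated in this README. The helper name `vocab_matrix_params` is hypothetical.

```python
# Vocabulary sizes; OLD_VOCAB and HIDDEN_SIZE are assumptions, not from the README.
OLD_VOCAB = 55296      # assumed Chinese-LLaMA-2-7B vocabulary size
NEW_VOCAB = 75548      # expanded vocabulary size given in the README
HIDDEN_SIZE = 4096     # assumed hidden dimension of a LLaMA-2-7B model


def vocab_matrix_params(vocab_size, hidden_size=HIDDEN_SIZE):
    """Number of parameters in one (vocab_size x hidden_size) matrix."""
    return vocab_size * hidden_size


# Tokens added by the expansion.
added_tokens = NEW_VOCAB - OLD_VOCAB

# Both embed_tokens and lm_head gain one row per new token (assuming untied weights).
added_params = 2 * vocab_matrix_params(added_tokens)

# Total size of the two vocabulary-dependent matrices after the expansion.
full_params = 2 * vocab_matrix_params(NEW_VOCAB)

print(f"added tokens: {added_tokens}")
print(f"added parameters: {added_params}")
print(f"embed_tokens + lm_head parameters: {full_params}")
```

Under these assumptions, the rows for the new tokens account for roughly 166 million of about 619 million parameters in the two matrices. Those rows carry no pre-trained values, which is one plausible reason to train `embed_tokens` and `lm_head` alongside the LoRA adapters.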

The training data contains approximately 400 million tokens, drawn from high-quality code datasets on HuggingFace.

In addition, we applied `memory_efficient_attention` during pre-training, which saved a substantial amount of GPU memory.
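
The name `memory_efficient_attention` presumably refers to the xFormers operator (an assumption; the README does not say). The sketch below is not that kernel. It is a pure-Python illustration of the general idea behind memory-efficient attention: processing keys and values in chunks with a running softmax, so the full row of attention scores is never held at once. The function names and the toy data are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Single-query attention that materializes every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


def chunked_attention(q, keys, values, chunk_size=2):
    """Same result, but only `chunk_size` scores are held at any time."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = float("-inf")   # largest score seen so far
    running_sum = 0.0             # softmax denominator, relative to running_max
    acc = [0.0] * dim             # weighted sum of values, relative to running_max
    for start in range(0, len(keys), chunk_size):
        k_chunk = keys[start:start + chunk_size]
        v_chunk = values[start:start + chunk_size]
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in k_chunk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_chunk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


# Toy data: one query, five keys and values.
q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.2, -0.3, 1.5], [-1.0, 2.0, 0.0],
        [0.7, 0.7, 0.7], [3.0, -1.0, 0.1]]
values = [[1.0, 2.0], [0.0, -1.0], [4.0, 0.5], [-2.0, 3.0], [0.3, 0.3]]

print(naive_attention(q, keys, values))
print(chunked_attention(q, keys, values, chunk_size=2))
```

Both functions should print the same vector up to floating-point rounding. The memory saving comes from the chunked version keeping only a running maximum, a running denominator, and an accumulator, instead of one score per key.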

The model can be used as a base for SFT (supervised fine-tuning), and we hope to contribute more valuable work to the Chinese-language field.