frankminors123 committed
Commit: ad6211d · Parent(s): 6a38a01
Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 ---
 # Chinese-CodeLlama-7B-PT
 
-We have further expanded the vocabulary of Chinese-LLaMA-2-7B from 55296 to 75548; it is worth noting that most of the added tokens are code tokens. On [MBPP](https://huggingface.co/datasets/mbpp), we calculated the compression rate of the tokenizer to be
+We have further expanded the vocabulary of Chinese-LLaMA-2-7B from 55296 to 75548; it is worth noting that most of the added tokens are code tokens. On [MBPP](https://huggingface.co/datasets/mbpp), we calculated the compression rate of the tokenizer to be 4.509 `bytes/token`, and we will reduce this value in future work to improve training and inference efficiency.
 
 We pre-trained the model with LoRA at rank 8; the trainable LoRA layers are `q_proj` and `v_proj`, while the `embed_tokens` and `lm_head` layers were trained with full parameters. All trainable parameters are float32.
 
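
As a rough illustration of how a `bytes/token` compression rate like the 4.509 figure above can be computed, the sketch below tokenizes MBPP samples and divides raw UTF-8 bytes by token count. It is not the authors' script; the model repo id and the MBPP field names (`text`, `code`) are assumptions.

```python
# Minimal sketch: estimate tokenizer compression rate (bytes/token) on MBPP.
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed repo id, based on this model card's name.
tokenizer = AutoTokenizer.from_pretrained("frankminors123/Chinese-CodeLlama-7B-PT")
mbpp = load_dataset("mbpp", split="test")

total_bytes = 0
total_tokens = 0
for example in mbpp:
    # Concatenate the natural-language prompt and the reference solution.
    sample = example["text"] + "\n" + example["code"]
    total_bytes += len(sample.encode("utf-8"))
    total_tokens += len(tokenizer(sample, add_special_tokens=False)["input_ids"])

print(f"compression rate: {total_bytes / total_tokens:.3f} bytes/token")
```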
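
The LoRA setup described in the diff above (rank-8 adapters on `q_proj` and `v_proj`, with `embed_tokens` and `lm_head` trained in full, all trainable parameters in float32) could be expressed with PEFT roughly as in the following sketch. This is not the actual training script; the base model id and `lora_alpha` are assumptions.

```python
# Minimal sketch of the described trainable-parameter setup using PEFT.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "hfl/chinese-llama-2-7b",      # assumed base model id (Chinese-LLaMA-2-7B)
    torch_dtype=torch.float32,     # trainable parameters kept in float32
)
model.resize_token_embeddings(75548)  # expanded vocabulary size from the README

lora_config = LoraConfig(
    r=8,                                          # LoRA rank stated in the README
    lora_alpha=16,                                # assumption, not stated
    target_modules=["q_proj", "v_proj"],          # trainable LoRA layers
    modules_to_save=["embed_tokens", "lm_head"],  # trained with full parameters
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```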