frankminors123 committed
Commit: ad6211d · Parent(s): 6a38a01
Update README.md

README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 ---
 # Chinese-CodeLlama-7B-PT
 
-We have further expanded the vocabulary of Chinese-LLaMA-2-7B from 55296 to 75548; it is worth noting that most of the added tokens are code tokens. On [MBPP](https://huggingface.co/datasets/mbpp), we calculated the compression rate of the tokenizer to be
+We have further expanded the vocabulary of Chinese-LLaMA-2-7B from 55296 to 75548; it is worth noting that most of the added tokens are code tokens. On [MBPP](https://huggingface.co/datasets/mbpp), we calculated the compression rate of the tokenizer to be 4.509 `bytes/token`, and we will reduce this value in future work to improve training and inference efficiency.
 
 We pre-trained the model with LoRA at rank 8; the trainable LoRA layers are `q_proj` and `v_proj`, while the `embed_tokens` and `lm_head` layers were trained with full parameters. All trainable parameters are float32.
 
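
As a rough illustration of how a `bytes/token` compression rate like the 4.509 figure above can be computed, the sketch below tokenizes MBPP samples and divides raw UTF-8 bytes by token count. It is not the authors' script; the model repo id and the MBPP field names (`text`, `code`) are assumptions.

```python
# Minimal sketch: estimate tokenizer compression rate (bytes/token) on MBPP.
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed repo id, based on this model card's name.
tokenizer = AutoTokenizer.from_pretrained("frankminors123/Chinese-CodeLlama-7B-PT")
mbpp = load_dataset("mbpp", split="test")

total_bytes = 0
total_tokens = 0
for example in mbpp:
    # Concatenate the natural-language prompt and the reference solution.
    sample = example["text"] + "\n" + example["code"]
    total_bytes += len(sample.encode("utf-8"))
    total_tokens += len(tokenizer(sample, add_special_tokens=False)["input_ids"])

print(f"compression rate: {total_bytes / total_tokens:.3f} bytes/token")
```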
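
The LoRA setup described in the diff above (rank-8 adapters on `q_proj` and `v_proj`, with `embed_tokens` and `lm_head` trained in full, all trainable parameters in float32) could be expressed with PEFT roughly as in the following sketch. This is not the actual training script; the base model id and `lora_alpha` are assumptions.

```python
# Minimal sketch of the described trainable-parameter setup using PEFT.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "hfl/chinese-llama-2-7b",      # assumed base model id (Chinese-LLaMA-2-7B)
    torch_dtype=torch.float32,     # trainable parameters kept in float32
)
model.resize_token_embeddings(75548)  # expanded vocabulary size from the README

lora_config = LoraConfig(
    r=8,                                          # LoRA rank stated in the README
    lora_alpha=16,                                # assumption, not stated
    target_modules=["q_proj", "v_proj"],          # trainable LoRA layers
    modules_to_save=["embed_tokens", "lm_head"],  # trained with full parameters
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```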