---
license: apache-2.0
language:
  - zh
  - en
tags:
  - code
---

# Chinese-CodeLlama-7B-PT

We have further expanded the vocabulary of Chinese-LLaMA-2-7B from 55,296 to 75,548 tokens; most of the new tokens are code tokens. We pre-trained the model with LoRA at rank 8, where the trainable LoRA layers are q_proj and v_proj, while the embed_tokens and lm_head layers were trained with full parameters.
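
As a rough illustration, this setup maps onto the Hugging Face PEFT library as in the sketch below. The rank, target modules, fully trained layers, and vocabulary size come from this card; the base checkpoint id, lora_alpha, and dtype are assumptions for illustration only.

```python
# Minimal LoRA sketch with PEFT, mirroring the setup described above.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumption: base checkpoint id; substitute the actual Chinese-LLaMA-2-7B path.
base = AutoModelForCausalLM.from_pretrained(
    "hfl/chinese-llama-2-7b", torch_dtype=torch.float16
)
# After extending the tokenizer to 75,548 entries, resize the embeddings to match.
base.resize_token_embeddings(75548)

lora_config = LoraConfig(
    r=8,                                          # LoRA rank used for pre-training
    lora_alpha=16,                                # assumption: not stated in this card
    target_modules=["q_proj", "v_proj"],          # trainable LoRA layers
    modules_to_save=["embed_tokens", "lm_head"],  # trained with full parameters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```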

The training data contains approximately 400 million tokens drawn from high-quality code datasets on Hugging Face.

In addition, we applied memory_efficient_attention during pre-training, which saves a significant amount of GPU memory. If you want to quickly apply this technique to your own LLaMA model, see my GitHub repository: https://github.com/FrankMinions/memory_efficient_adapter.
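
For reference, the sketch below shows the underlying xFormers op (xformers.ops.memory_efficient_attention) on dummy tensors; it is not taken from the linked repository, and the shapes are illustrative assumptions.

```python
# Minimal sketch of xFormers memory-efficient attention on dummy data.
import torch
import xformers.ops as xops

batch, seq_len, n_heads, head_dim = 2, 1024, 32, 128  # illustrative shapes
q = torch.randn(batch, seq_len, n_heads, head_dim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Computes softmax(QK^T / sqrt(d)) V without materializing the full
# seq_len x seq_len attention matrix, which is where the memory savings come from.
out = xops.memory_efficient_attention(
    q, k, v,
    attn_bias=xops.LowerTriangularMask(),  # causal mask for autoregressive pre-training
)
```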

Our model can be used as a base for SFT, and we hope to contribute more valuable work to the Chinese NLP field.
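
A minimal loading sketch follows, assuming the checkpoint is published under the repo id used here; since this is a pre-trained rather than instruction-tuned model, treat it as a base for SFT, not a chat model.

```python
# Minimal sketch: load the checkpoint as a starting point for SFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "frankminors123/Chinese-CodeLlama-7B-PT"  # assumption: inferred repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Simple completion check; for downstream use, fine-tune (SFT) on your own data.
prompt = "def quick_sort(arr):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```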