frankminors123 committed
Commit 517e389 (1 parent: ad6211d)

Update README.md

Files changed (1): README.md (+3, -1)
README.md CHANGED
@@ -16,4 +16,6 @@ The training data contains approximately 400 million tokens which from high-qual
 
 In addition, we applied `memory_efficient_attention` to the pre-training, which saves us a lot of GPU memory space. If you want to quickly use this technology in your LLaMA model, you can refer to my GitHub: https://github.com/FrankMinions/memory_efficient_adapter.
 
-Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+
+The second version of our fine-tuned model, [Chinese-CodeLlama-7B-SFT-V2](https://huggingface.co/frankminors123/Chinese-CodeLlama-7B-SFT-V2), has been launched. We use a sequence length of 1k for pre-training (this model) and continue training at this length during the fine-tuning stage. Based on a larger base period of rotary positional embeddings, it can support up to 15k context length extrapolation at inference time.