frankminors123
committed
Commit • 517e389 • 1 Parent(s): ad6211d
Update README.md

README.md CHANGED
@@ -16,4 +16,6 @@ The training data contains approximately 400 million tokens which from high-qual
 
 In addition, we applied `memory_efficient_attention` to the pre-training, which saves us a lot of GPU memory space. If you want to quickly use this technology in your LLaMA model, you can refer to my GitHub: https://github.com/FrankMinions/memory_efficient_adapter.
 
-Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+
+The second version of our fine-tuned model, named [Chinese-CodeLlama-7B-SFT-V2](https://huggingface.co/frankminors123/Chinese-CodeLlama-7B-SFT-V2), has been launched. We use a sequence length of 1k for pre-training (this model) and continue training with this length during the fine-tuning stage. Thanks to a larger base period for the rotary positional embeddings, it can support context length extrapolation of up to 15k at inference time.
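
For readers curious what the `memory_efficient_attention` swap looks like in practice, below is a minimal sketch of dropping xFormers' `memory_efficient_attention` into a LLaMA-style attention forward. It is illustrative only: the actual patch lives in the linked memory_efficient_adapter repository, and the tensor shapes, dtype, and causal-mask choice here are assumptions rather than that repository's code.

```python
# Minimal sketch (not the exact patch from the linked repo): replace vanilla
# softmax attention in a LLaMA-style block with xFormers' memory-efficient
# attention kernel. Shapes and dtype below are illustrative assumptions.
import torch
import xformers.ops as xops

def attention_forward(q, k, v):
    # q, k, v: [batch, seq_len, num_heads, head_dim] (xFormers layout)
    # LowerTriangularMask provides the causal masking used by decoder-only LMs.
    return xops.memory_efficient_attention(
        q, k, v,
        attn_bias=xops.LowerTriangularMask(),
        p=0.0,  # no attention dropout in this sketch
    )

# Example with 7B-style head shapes, bf16 on GPU
q = k = v = torch.randn(1, 1024, 32, 128, dtype=torch.bfloat16, device="cuda")
out = attention_forward(q, k, v)  # [1, 1024, 32, 128]
```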
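The context-length extrapolation mentioned for the V2 model comes from enlarging the base period of the rotary positional embeddings. The sketch below only illustrates the idea: a larger base shrinks the inverse frequencies, so positions rotate more slowly and the model degrades less beyond its training length. The base values and the `rope_theta` lookup are assumptions for illustration, not the exact configuration of Chinese-CodeLlama-7B-SFT-V2.

```python
# Sketch of how the RoPE base period affects the rotation frequencies.
# The bases 10_000 and 1_000_000 are illustrative, not the model's values.
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: base ** (-2i / head_dim), i = 0..head_dim/2 - 1
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

print(rope_inv_freq(128, 10_000.0)[:4])     # common LLaMA default base
print(rope_inv_freq(128, 1_000_000.0)[:4])  # larger base -> slower rotation

# With Hugging Face transformers, the base is exposed on the config as
# `rope_theta`; inspecting it (value not asserted here) looks like:
# from transformers import AutoConfig
# cfg = AutoConfig.from_pretrained("frankminors123/Chinese-CodeLlama-7B-SFT-V2")
# print(cfg.rope_theta)
```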