frankminors123
committed
Commit • 517e389 • 1 Parent(s): ad6211d
Update README.md

README.md CHANGED
@@ -16,4 +16,6 @@ The training data contains approximately 400 million tokens which from high-qual
 
 In addition, we applied `memory_efficient_attention` to the pre-training, which saves us a lot of GPU memory space. If you want to quickly use this technology in your LLaMA model, you can refer to my GitHub: https://github.com/FrankMinions/memory_efficient_adapter.
 
-Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
+
+The second version of our fine-tuned model, named [Chinese-CodeLlama-7B-SFT-V2](https://huggingface.co/frankminors123/Chinese-CodeLlama-7B-SFT-V2), has been launched. We use a sequence length of 1k for pre-training (this model) and continue training with this length during the fine-tuning stage. Thanks to a larger base period for the rotary positional embeddings, it can support context length extrapolation of up to 15k at inference time.
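
For readers curious what the `memory_efficient_attention` swap looks like in practice, below is a minimal sketch of dropping xFormers' `memory_efficient_attention` into a LLaMA-style attention forward. It is illustrative only: the actual patch lives in the linked memory_efficient_adapter repository, and the tensor shapes, dtype, and causal-mask choice here are assumptions rather than that repository's code.

```python
# Minimal sketch (not the exact patch from the linked repo): replace vanilla
# softmax attention in a LLaMA-style block with xFormers' memory-efficient
# attention kernel. Shapes and dtype below are illustrative assumptions.
import torch
import xformers.ops as xops

def attention_forward(q, k, v):
    # q, k, v: [batch, seq_len, num_heads, head_dim] (xFormers layout)
    # LowerTriangularMask provides the causal masking used by decoder-only LMs.
    return xops.memory_efficient_attention(
        q, k, v,
        attn_bias=xops.LowerTriangularMask(),
        p=0.0,  # no attention dropout in this sketch
    )

# Example with 7B-style head shapes, bf16 on GPU
q = k = v = torch.randn(1, 1024, 32, 128, dtype=torch.bfloat16, device="cuda")
out = attention_forward(q, k, v)  # [1, 1024, 32, 128]
```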
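The context-length extrapolation mentioned for the V2 model comes from enlarging the base period of the rotary positional embeddings. The sketch below only illustrates the idea: a larger base shrinks the inverse frequencies, so positions rotate more slowly and the model degrades less beyond its training length. The base values and the `rope_theta` lookup are assumptions for illustration, not the exact configuration of Chinese-CodeLlama-7B-SFT-V2.

```python
# Sketch of how the RoPE base period affects the rotation frequencies.
# The bases 10_000 and 1_000_000 are illustrative, not the model's values.
import torch

def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    # Standard RoPE inverse frequencies: base ** (-2i / head_dim), i = 0..head_dim/2 - 1
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

print(rope_inv_freq(128, 10_000.0)[:4])     # common LLaMA default base
print(rope_inv_freq(128, 1_000_000.0)[:4])  # larger base -> slower rotation

# With Hugging Face transformers, the base is exposed on the config as
# `rope_theta`; inspecting it (value not asserted here) looks like:
# from transformers import AutoConfig
# cfg = AutoConfig.from_pretrained("frankminors123/Chinese-CodeLlama-7B-SFT-V2")
# print(cfg.rope_theta)
```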