frankminors123 committed
Commit b5a333d
Parent(s): 0671c1f
Update README.md
README.md CHANGED
@@ -12,6 +12,6 @@ tags:
 
 The training data contains approximately 400 million tokens which come from high-quality code datasets on HuggingFace.
 
-In addition, we applied `memory_efficient_attention` to the pre-training, which saves us a significant amount of GPU memory.
+In addition, we applied `memory_efficient_attention` to the pre-training, which saves us a significant amount of GPU memory. If you want to quickly apply this technique to your LLaMA model, you can refer to my GitHub: https://github.com/FrankMinions/memory_efficient_adapter.
 
 Our model can be used for SFT, and we hope to contribute more valuable work in the Chinese field.
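For readers who want a concrete picture of what `memory_efficient_attention` refers to in a LLaMA-style model, below is a minimal sketch assuming the xformers implementation (`xformers.ops.memory_efficient_attention`); the adapter in the linked repository may wire this in differently, and the shapes shown are only illustrative.

```python
# Minimal sketch (not the author's adapter): replacing standard scaled-dot-product
# attention in a LLaMA-style attention layer with xformers' memory-efficient kernel.
import torch
import xformers.ops as xops


def memory_efficient_llama_attention(q, k, v, dropout_p: float = 0.0):
    """q, k, v: [batch, seq_len, num_heads, head_dim], the layout xformers expects.

    A naive implementation materializes a [seq_len, seq_len] attention matrix per
    head; memory_efficient_attention computes the same softmax(QK^T / sqrt(d)) V
    result in tiles, so that matrix is never stored, which is where the GPU
    memory saving comes from.
    """
    return xops.memory_efficient_attention(
        q, k, v,
        attn_bias=xops.LowerTriangularMask(),  # causal mask for a decoder-only model
        p=dropout_p,
    )


# Example usage with illustrative LLaMA-7B-like shapes (requires a CUDA device):
if torch.cuda.is_available():
    q = k = v = torch.randn(1, 2048, 32, 128, device="cuda", dtype=torch.float16)
    out = memory_efficient_llama_attention(q, k, v)  # -> [1, 2048, 32, 128]
```

In a full model, a call like this would replace only the attention score computation inside each attention layer's forward pass, leaving the projections and the rest of the architecture untouched.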