OpenNLPLab committed
Commit 39c0658 • 1 Parent(s): f3bfc9d
Update README.md
README.md CHANGED
@@ -20,7 +20,7 @@ This official repository unveils the TransNormerLLM3 model along with its open-s
[TransNormerLLM](https://arxiv.org/abs/2307.14995), which evolved from [TransNormer](https://arxiv.org/abs/2210.10340), stands out as the first LLM within the linear transformer architecture. It also distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformers and other efficient Transformer models (such as RetNet and Mamba) in speed and performance.

- > @opennlplab: We plan to
+ > @opennlplab: We plan to scale the sequence length in the pre-training stage to **10 million**: https://twitter.com/opennlplab/status/1776894730015789300

# TransNormerLLM3
- **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has a total **embedding size of 5120**.
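The spec line above fixes the TransNormerLLM3-15B architecture (14.83B parameters, 42 layers, 40 attention heads, 5120-dim embeddings). As a rough illustration of how such a checkpoint is typically loaded, here is a minimal sketch using Hugging Face transformers; the repository id `OpenNLPLab/TransNormerLLM3-15B` and the use of `trust_remote_code=True` for custom linear-attention modeling code are assumptions for illustration, not something this commit confirms.

```python
# Minimal sketch: loading a TransNormerLLM3-15B-style checkpoint with Hugging Face
# transformers. The repo id below is a hypothetical placeholder; the actual Hub
# path and remote-code requirements may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenNLPLab/TransNormerLLM3-15B"  # assumed repo id

# Non-standard-Transformer architectures on the Hub usually ship their own
# modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # ~14.83B params -> roughly 30 GB in bf16
    device_map="auto",
    trust_remote_code=True,
)

prompt = "TransNormerLLM3 is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```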