OpenNLPLab committed
Commit 39c0658 • 1 Parent(s): f3bfc9d
Update README.md
README.md CHANGED
@@ -20,7 +20,7 @@ This official repository unveils the TransNormerLLM3 model along with its open-s
[TransNormerLLM](https://arxiv.org/abs/2307.14995), which evolved from [TransNormer](https://arxiv.org/abs/2210.10340), stands out as the first LLM within the linear transformer architecture. It also distinguishes itself as the first non-Transformer LLM to exceed both traditional Transformers and other efficient Transformer models (such as RetNet and Mamba) in speed and performance.

- > @opennlplab: We plan to
+ > @opennlplab: We plan to scale the sequence length in the pre-training stage to **10 million**: https://twitter.com/opennlplab/status/1776894730015789300

# TransNormerLLM3
- **TransNormerLLM3-15B** features **14.83 billion** parameters. It is structured with **42 layers**, includes **40 attention heads**, and has a total **embedding size of 5120**.
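The spec line above fixes the TransNormerLLM3-15B architecture (14.83B parameters, 42 layers, 40 attention heads, 5120-dim embeddings). As a rough illustration of how such a checkpoint is typically loaded, here is a minimal sketch using Hugging Face transformers; the repository id `OpenNLPLab/TransNormerLLM3-15B` and the use of `trust_remote_code=True` for custom linear-attention modeling code are assumptions for illustration, not something this commit confirms.

```python
# Minimal sketch: loading a TransNormerLLM3-15B-style checkpoint with Hugging Face
# transformers. The repo id below is a hypothetical placeholder; the actual Hub
# path and remote-code requirements may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OpenNLPLab/TransNormerLLM3-15B"  # assumed repo id

# Non-standard-Transformer architectures on the Hub usually ship their own
# modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # ~14.83B params -> roughly 30 GB in bf16
    device_map="auto",
    trust_remote_code=True,
)

prompt = "TransNormerLLM3 is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```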