--- license: apache-2.0 language: - en - zh pipeline_tag: text-generation tags: - ' TransNormerLLM' ---
### Pre-training Logbook * Realtime Track: https://api.wandb.ai/links/opennlplab/kip314lq * Join to dicussion: [discord](https://discord.gg/JEU3nTcWKC) <<<>>> [wechat group](https://github.com/OpenNLPLab/TransnormerLLM/blob/main/images/contact_me_qr.png) > --23.12.25-- startup: [WeChat - 预训练启航](https://mp.weixin.qq.com/s/YjUY-uy89WkF75_-rBTuKw) <<<>>> [Twitter - Pre-training Commences ](https://twitter.com/opennlplab/status/1739568669502611825) <<<>>> [YouTube Recording](https://t.co/wk7svS4o5r) <<<>>> [bilibili 回放](https://www.bilibili.com/video/BV11j411J7Dy) > --24.01.02-- first week review: [WeChat - 第一周概览](https://mp.weixin.qq.com/s/zwGnZZI3itNPoxzzXkuU2w) <<<>>> [Twitter - Week 1 Review](https://twitter.com/opennlplab/status/1742187694078501038) > --24.01.09-- second week review: [WeChat - 第二周概览](https://mp.weixin.qq.com/s/6D0qi-0aBier05OKuHfPEA) <<<>>> [Twitter - Week 2 Review](https://twitter.com/opennlplab/status/1744720007299523063) > --24.01.15-- third week review: [WeChat - 第三周概览](https://mp.weixin.qq.com/s/EQg8evZ2cNtAk4HruwCXPA) <<<>>> [Twitter - Week 3 Review](https://twitter.com/opennlplab/status/1746920293069910190) > --24.01.23-- third week review: [WeChat - 第四周概览](https://mp.weixin.qq.com/s/l7LrFGQKkPU38exUtSF4cw) <<<>>> [Twitter - Week 4 Review](https://twitter.com/opennlplab/status/1749821039360840001) > --24.01.30-- third week review: [WeChat - 第五周概览](https://mp.weixin.qq.com/s/OgtQIb749IbX6y5C01bLFg) <<<>>> [Twitter - Week 5 Review](https://twitter.com/opennlplab/status/1752366090754425283) # Released Weights | param | token | Hugging Face | Model Scope | Wisemodel | | :-----: | :---: | :-----------------------------------------------------------------------------------------------------------------------: | :---------: | :-------: | | **15B** | 50B | 🤗[step13000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step13000-50Btokens) | 🤖 | 🐯 | | **15B** | 100B | 🤗[step26000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step26000-100Btokens) | 🤖 | 🐯 | | **15B** | 150B | 🤗[step39000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step39000-150Btokens) | 🤖 | 🐯 | | **15B** | 200B | 🤗[step52000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step52000-200Btokens) | 🤖 | 🐯 | | **15B** | 250B | 🤗[step65000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step65000-250Btokens) | 🤖 | 🐯 | | **15B** | 300B | 🤗[step78000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step78000-300Btokens) | 🤖 | 🐯 | | **15B** | 350B | 🤗[step92000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step92000-350Btokens) | 🤖 | 🐯 | | **15B** | 400B | 🤗[step105000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step105000-400Btokens) | 🤖 | 🐯 | | **15B** | 450B | 🤗[step118000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step118000-450Btokens) | 🤖 | 🐯 | | **15B** | 500B | 🤗[step131000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step131000-500Btokens) | 🤖 | 🐯 | | **15B** | 550B | 🤗[step144000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step144000-550Btokens) | 🤖 | 🐯 | | **15B** | 600B | 🤗[step157000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step157000-600Btokens) | 🤖 | 🐯 | | **15B** | 650B | 🤗[step170000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step170000-650Btokens) | 🤖 | 🐯 | | **15B** | 700B | 🤗[step183000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step183000-700Btokens) | 🤖 | 🐯 | | **15B** | 750B | 🤗[step195500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step195500-750Btokens) | 🤖 | 🐯 | | **15B** | 800B | 🤗[step209000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step209000-800Btokens) | 🤖 | 🐯 | | **15B** | 850B | 🤗[step222000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step222000-850Btokens) | 🤖 | 🐯 | | **15B** | 900B | 🤗[step235000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step235000-900Btokens) | 🤖 | 🐯 | | **15B** | 950B | 🤗[step248000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step248000-950Btokens) | 🤖 | 🐯 | | **15B** | 1000B | 🤗[step261000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step261000-1000Btokens) | 🤖 | 🐯 | | **15B** | 1050B | 🤗[step274000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step274000-1050Btokens) | 🤖 | 🐯 | | **15B** | 1100B | 🤗[step287000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step287000-1100Btokens) | 🤖 | 🐯 | | **15B** | 1150B | 🤗[step300000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step300000-1150Btokens) | 🤖 | 🐯 | | **15B** | 1200B | 🤗[step313500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step313500-1200Btokens) | 🤖 | 🐯 | | **15B** | 1250B | 🤗[step326000](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step287000-1250Btokens) | 🤖 | 🐯 | | **15B** | 1300B | 🤗[step339500](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/step326000-1300Btokens) | 🤖 | 🐯 | | **15B** | 1345B | 🤗[stage1](https://huggingface.co/OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints/tree/stage1-1345Btokens) | 🤖 | 🐯 | ```python from transformers import AutoModelForCausalLM, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints", revision='step235000-900Btokens', trust_remote_code=True) model = AutoModelForCausalLM.from_pretrained("OpenNLPLab/TransNormerLLM3-15B-Intermediate-Checkpoints", torch_dtype=torch.bfloat16, revision='step235000-900Btokens', device_map="auto", trust_remote_code=True) ``` # Benchmark Results The evaluations of all models are conducted using the official settings and the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) framework. | Model | P | T | BoolQ | PIQA | HS | WG | ARC-e | ARC-c | OBQA | C-Eval | MMLU | | ----------------------- | --- | ------ | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ | ----- | | **TransNormerLLM3-15B** | 15 | 0.05 | 62.08 | 72.52 | 55.55 | 57.14 | 62.12 | 31.14 | 32.40 | 26.18 | 27.50 | | **TransNormerLLM3-15B** | 15 | 0.10 | 63.98 | 74.70 | 61.09 | 61.33 | 65.95 | 34.64 | 35.60 | 25.38 | 27.40 | | **TransNormerLLM3-15B** | 15 | 0.15 | 60.34 | 75.08 | 63.99 | 62.04 | 64.56 | 34.90 | 35.20 | 22.64 | 26.60 | | **TransNormerLLM3-15B** | 15 | 0.20 | 52.05 | 74.48 | 64.72 | 62.75 | 66.16 | 35.15 | 36.80 | 27.25 | 30.80 | | **TransNormerLLM3-15B** | 15 | 0.25 | 66.70 | 76.50 | 66.51 | 64.80 | 66.84 | 36.18 | 39.40 | 30.87 | 36.10 | | **TransNormerLLM3-15B** | 15 | 0.30 | 67.00 | 76.50 | 67.17 | 64.40 | 66.29 | 36.77 | 38.80 | 33.99 | 37.60 | | **TransNormerLLM3-15B** | 15 | 0.35 | 65.78 | 75.46 | 67.88 | 66.54 | 67.34 | 38.57 | 39.60 | 36.02 | 39.20 | | **TransNormerLLM3-15B** | 15 | 0.40 | 67.34 | 75.24 | 68.51 | 66.22 | 68.94 | 40.10 | 39.20 | 36.91 | 41.10 | | **TransNormerLLM3-15B** | 15 | 0.45 | 69.02 | 76.28 | 69.11 | 63.77 | 65.82 | 36.01 | 39.40 | 37.17 | 42.80 | | **TransNormerLLM3-15B** | 15 | 0.50 | 66.15 | 77.09 | 69.75 | 65.11 | 68.56 | 35.84 | 39.60 | 39.81 | 42.00 | | **TransNormerLLM3-15B** | 15 | 0.55 | 70.24 | 74.05 | 69.96 | 65.75 | 65.61 | 36.69 | 38.60 | 40.08 | 44.00 | | **TransNormerLLM3-15B** | 15 | 0.60 | 74.34 | 75.68 | 70.44 | 66.22 | 69.36 | 38.40 | 38.40 | 41.05 | 45.30 | | **TransNormerLLM3-15B** | 15 | 0.65 | 73.15 | 76.55 | 71.60 | 66.46 | 69.65 | 39.68 | 40.80 | 41.20 | 44.90 | | **TransNormerLLM3-15B** | 15 | 0.70 | 73.79 | 78.18 | 73.26 | 67.56 | 71.21 | 43.60 | 40.80 | 43.46 | 47.00 | | **TransNormerLLM3-15B** | 15 | 0.75 | 76.45 | 78.07 | 74.22 | 69.30 | 71.21 | 43.43 | 42.20 | 43.46 | 47.80 | | **TransNormerLLM3-15B** | 15 | 0.80 | 76.97 | 78.84 | 74.95 | 69.85 | 72.14 | 43.52 | 41.20 | 45.21 | 49.41 | | **TransNormerLLM3-15B** | 15 | 0.85 | 72.75 | 78.35 | 75.91 | 70.48 | 74.58 | 45.22 | 41.20 | 46.27 | 49.36 | | **TransNormerLLM3-15B** | 15 | 0.90 | 76.09 | 77.91 | 76.49 | 70.88 | 72.14 | 42.92 | 40.20 | 45.70 | 50.15 | | **TransNormerLLM3-15B** | 15 | 0.95 | 74.28 | 78.24 | 76.63 | 72.22 | 74.12 | 44.11 | 42.40 | 46.25 | 51.43 | | **TransNormerLLM3-15B** | 15 | 1.00 | 74.62 | 79.16 | 77.35 | 72.22 | 73.86 | 45.14 | 43.40 | 47.90 | 51.65 | | **TransNormerLLM3-15B** | 15 | 1.05 | 76.36 | 78.94 | 77.15 | 71.35 | 74.66 | 44.45 | 42.80 | 45.87 | 52.28 | | **TransNormerLLM3-15B** | 15 | 1.10 | 76.88 | 78.73 | 77.62 | 70.88 | 74.41 | 45.48 | 42.80 | 49.78 | 53.01 | | **TransNormerLLM3-15B** | 15 | 1.15 | 72.87 | 79.43 | 78.12 | 72.85 | 74.75 | 46.16 | 43.20 | 49.80 | 53.04 | | **TransNormerLLM3-15B** | 15 | 1.20 | 79.48 | 78.67 | 78.45 | 72.93 | 75.42 | 44.37 | 43.60 | 49.33 | 53.80 | | **TransNormerLLM3-15B** | 15 | 1.25 | 79.17 | 79.16 | 78.81 | 72.93 | 75.13 | 45.99 | 43.60 | 50.44 | 54.19 | | **TransNormerLLM3-15B** | 15 | 1.30 | 78.41 | 79.00 | 78.39 | 71.90 | 74.33 | 45.05 | 42.80 | 52.24 | 54.41 | | **TransNormerLLM3-15B** | 15 | stage1 | 78.75 | 79.27 | 78.33 | 71.35 | 75.97 | 46.42 | 45.00 | 50.25 | 54.50 | > **P**: parameter size (billion). **T**: tokens (trillion). **BoolQ**: acc. **PIQA**: acc. **HellaSwag**: acc_norm. **WinoGrande**: acc. **ARC-easy**: acc. **ARC-challenge**: acc_norm. **OpenBookQA**: acc_norm. **MMLU**: 5-shot acc. **C-Eval**: 5-shot acc. # Acknowledgments and Citation ## Acknowledgments Our project is developed based on the following open source projects: - [tiktoken](https://github.com/openai/tiktoken) for the tokenizer. - [metaseq](https://github.com/facebookresearch/metaseq) for training. - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) for evaluation. ## Citation If you wish to cite our work, please use the following reference: ``` @misc{qin2024transnormerllm, title={TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer}, author={Zhen Qin and Dong Li and Weigao Sun and Weixuan Sun and Xuyang Shen and Xiaodong Han and Yunshen Wei and Baohong Lv and Xiao Luo and Yu Qiao and Yiran Zhong}, year={2024}, eprint={2307.14995}, archivePrefix={arXiv}, primaryClass={cs.CL} } @misc{qin2024lightning, title={Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models}, author={Zhen Qin and Weigao Sun and Dong Li and Xuyang Shen and Weixuan Sun and Yiran Zhong}, year={2024}, eprint={2401.04658}, archivePrefix={arXiv}, primaryClass={cs.CL} } @misc{sun2024linear, title={Linear Attention Sequence Parallelism}, author={Weigao Sun and Zhen Qin and Dong Li and Xuyang Shen and Yu Qiao and Yiran Zhong}, year={2024}, eprint={2404.02882}, archivePrefix={arXiv}, primaryClass={cs.LG} } ```
- OpenNLPLab @2024 -