Abstract
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model that achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture that combines a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generating long sequences. Zamba is pretrained in two phases: the first is based on existing web datasets, while the second anneals the model over high-quality instruct and synthetic datasets and is characterized by a rapid learning-rate decay. We open-source the weights and all checkpoints for Zamba, from both phase 1 and the annealing phase.
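As a rough illustration of the shared-attention idea described above, the sketch below interleaves a stack of residual blocks standing in for Mamba layers with a single attention module whose parameters are reused at several depths. This is a minimal sketch, not Zamba's actual implementation: the `PlaceholderMambaBlock` is a hypothetical gated-MLP stand-in rather than a real Mamba SSM block, and the layer count, model dimension, and every-N-layers placement of the shared attention are assumptions chosen for readability.

```python
# Minimal sketch of a Mamba-style backbone with ONE shared attention module.
# The PlaceholderMambaBlock is an illustrative stand-in, NOT the real Mamba SSM;
# the point is only to show how reusing a single attention module keeps the
# parameter cost of attention to that of one block.
import torch
import torch.nn as nn


class PlaceholderMambaBlock(nn.Module):
    """Hypothetical stand-in for a Mamba SSM block (for illustration only)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.SiLU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mixer(self.norm(x))


class SharedAttentionHybrid(nn.Module):
    """Backbone of placeholder blocks with one attention module shared across depth."""

    def __init__(self, d_model: int = 512, n_layers: int = 12,
                 every: int = 6, n_heads: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            PlaceholderMambaBlock(d_model) for _ in range(n_layers)
        )
        # A single attention module: its parameters are reused each time it is
        # applied, so attention is paid for only once in the parameter budget.
        self.shared_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.every = every

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.blocks):
            x = block(x)
            # Apply the same shared attention module every `every` layers.
            if (i + 1) % self.every == 0:
                h = self.attn_norm(x)
                attn_out, _ = self.shared_attn(h, h, h, need_weights=False)
                x = x + attn_out
        return x


if __name__ == "__main__":
    model = SharedAttentionHybrid()
    tokens = torch.randn(2, 16, 512)   # (batch, sequence, d_model)
    print(model(tokens).shape)         # torch.Size([2, 16, 512])
```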
Community
The paper seems interesting, but the fact that, already on the second page, Figs. 1c and 1d have y-axes that do not start at 0 is kind of annoying (and slightly misleading). The same goes for Fig. 4c.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Linearizing Large Language Models (2024)
- JetMoE: Reaching Llama2 Performance with 0.1M Dollars (2024)
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length (2024)
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence (2024)
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (2024)