Wow, this is amazing! 🤯 Samba is a powerful hybrid model with unlimited context length. It combines Mamba, MLP, and Sliding Window Attention through layer-wise stacking. Its largest version, Samba-3.8B, trained on 3.2 trillion tokens, excels on benchmarks like MMLU, GSM8K, and HumanEval, and shines in long-context tasks with minimal tuning.

---

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

GitHub: https://github.com/microsoft/Samba
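
If you're curious what the layer-wise hybrid looks like in practice, here's a minimal PyTorch sketch of one Samba unit (Mamba → MLP → Sliding Window Attention → MLP). This is not the official code: `MambaBlock` below is a stand-in for a real selective SSM layer (e.g. from the `mamba_ssm` package), and the dimensions, head count, and window size are illustrative.

```python
# Minimal sketch (NOT the official implementation) of Samba's layer-wise
# hybrid stacking: Mamba -> MLP -> SWA -> MLP, each sub-block wrapped in a
# pre-norm residual connection. All hyperparameters here are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Plain feed-forward block with GELU activation."""
    def __init__(self, d_model: int, expand: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(d_model, expand * d_model)
        self.fc2 = nn.Linear(expand * d_model, d_model)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))


class SlidingWindowAttention(nn.Module):
    """Causal self-attention restricted to a fixed backward window via masking."""
    def __init__(self, d_model: int, n_heads: int = 8, window: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.window = window

    def forward(self, x):
        T = x.size(1)
        i = torch.arange(T, device=x.device)
        # Token t may attend to positions (t - window, t]; True = masked out.
        mask = (i[None, :] > i[:, None]) | (i[:, None] - i[None, :] >= self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class MambaBlock(nn.Module):
    """Placeholder for a real Mamba (selective SSM) layer -- NOT the real thing."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proxy = nn.Linear(d_model, d_model)  # stand-in only

    def forward(self, x):
        return self.proxy(x)


class SambaLayer(nn.Module):
    """One hybrid unit: Mamba -> MLP -> SWA -> MLP, each with residual + norm."""
    def __init__(self, d_model: int, window: int = 2048):
        super().__init__()
        self.blocks = nn.ModuleList([
            MambaBlock(d_model),
            MLP(d_model),
            SlidingWindowAttention(d_model, window=window),
            MLP(d_model),
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in self.blocks])

    def forward(self, x):
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))  # pre-norm residual around each sub-block
        return x


x = torch.randn(2, 16, 256)            # (batch, seq_len, d_model)
layer = SambaLayer(d_model=256, window=8)
print(layer(x).shape)                  # torch.Size([2, 16, 256])
```

The intuition behind the split: the Mamba (SSM) layers carry long-range state across unbounded context, while the windowed attention handles precise local retrieval, so neither component needs full quadratic attention over the whole sequence.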