Jamba-1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated Aug 22 • 82
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39
models to evaluate Collection collecting models I want to evaluate on shadereval-task2: https://github.com/bigcode-project/bigcode-evaluation-harness/pull/173 at fp16!! • 39 items • Updated 13 days ago • 2
Code Evaluation Collection Collection of Papers on Code Evaluation (from code generation language models) • 45 items • Updated Oct 29 • 14
Eurus Collection Advancing LLM Reasoning Generalists with Preference Trees • 11 items • Updated Oct 22 • 24
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11 • 43
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170