COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training Paper • 2410.19313 • Published Oct 25 • 18 • 5
On Memorization of Large Language Models in Logical Reasoning Paper • 2410.23123 • Published 29 days ago • 16
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated about 3 hours ago • 392
Common 7B Language Models Already Possess Strong Math Capabilities Paper • 2403.04706 • Published Mar 7 • 16
🏆 Open LLM Leaderboard 2 • Track, rank and evaluate open LLMs and chatbots • 11.9k