Submitted by akhaliq · 24 · WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild · 9 authors
Submitted by akhaliq · 6 · Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? · 9 authors
Submitted by akhaliq · 4 · Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach · 25 authors