WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines • Paper • 2410.12705 • Published Oct 16 • 29
AutoTrain: No-code training for state-of-the-art models • Paper • 2410.15735 • Published 26 days ago • 56
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation • Paper • 2410.01731 • Published Oct 2 • 15
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation • Paper • 2410.01171 • Published Oct 2 • 5
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning • Paper • 2409.12183 • Published Sep 18 • 36
Challenges and Responses in the Practice of Large Language Models • Paper • 2408.09416 • Published Aug 18 • 1
Characterizing Prompt Compression Methods for Long Context Inference • Paper • 2407.08892 • Published Jul 11 • 9
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering • Paper • 2409.06595 • Published Sep 10 • 37
LiveBench: A Challenging, Contamination-Free LLM Benchmark • Paper • 2406.19314 • Published Jun 27 • 19
Efficient Detection of Toxic Prompts in Large Language Models • Paper • 2408.11727 • Published Aug 21 • 11
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval • Paper • 2407.12883 • Published Jul 16 • 8
Writing in the Margins: Better Inference Pattern for Long Context Retrieval • Paper • 2408.14906 • Published Aug 27 • 138
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets • Paper • 2406.18518 • Published Jun 26 • 23
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models • Paper • 2309.03883 • Published Sep 7, 2023 • 33