CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 114
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19 • 42
MIBench: Evaluating Multimodal Large Language Models over Multiple Images Paper • 2407.15272 • Published Jul 21 • 9
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions Paper • 2401.13313 • Published Jan 24 • 5
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 122