In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 28
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning Paper • 2310.09478 • Published Oct 14, 2023 • 19
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules Paper • 2310.08992 • Published Oct 13, 2023 • 10
Toward Joint Language Modeling for Speech Units and Text Paper • 2310.08715 • Published Oct 12, 2023 • 7
MotionDirector: Motion Customization of Text-to-Video Diffusion Models Paper • 2310.08465 • Published Oct 12, 2023 • 14
Lemur: Harmonizing Natural Language and Code for Language Agents Paper • 2310.06830 • Published Oct 10, 2023 • 30
Octopus: Embodied Vision-Language Programmer from Environmental Feedback Paper • 2310.08588 • Published Oct 12, 2023 • 34
HeaP: Hierarchical Policies for Web Actions using LLMs Paper • 2310.03720 • Published Oct 5, 2023 • 6
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency Paper • 2310.03734 • Published Oct 5, 2023 • 14
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 31
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning Paper • 2310.03731 • Published Oct 5, 2023 • 29
FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation Paper • 2310.03214 • Published Oct 5, 2023 • 17
UniAudio: An Audio Foundation Model Toward Universal Audio Generation Paper • 2310.00704 • Published Oct 1, 2023 • 19
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion Paper • 2310.03502 • Published Oct 5, 2023 • 77
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30, 2024 • 23
A Large Encoder-Decoder Family of Foundation Models For Chemical Language Paper • 2407.20267 • Published Jul 24, 2024 • 31
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset Paper • 2402.05937 • Published Feb 8, 2024 • 11
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models Paper • 2402.05935 • Published Feb 8, 2024 • 15
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification Paper • 2407.19340 • Published Jul 27, 2024 • 55
SHIC: Shape-Image Correspondences with no Keypoint Supervision Paper • 2407.18907 • Published Jul 26, 2024 • 38
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models Paper • 2407.19914 • Published Jul 29, 2024 • 12
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models Paper • 2407.12327 • Published Jul 17, 2024 • 75
E5-V: Universal Embeddings with Multimodal Large Language Models Paper • 2407.12580 • Published Jul 17, 2024 • 38
Compact Language Models via Pruning and Knowledge Distillation Paper • 2407.14679 • Published Jul 19, 2024 • 35
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16, 2024 • 52
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism Paper • 2407.10457 • Published Jul 15, 2024 • 22
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16, 2024 • 43
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024 • 41
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11, 2024 • 30
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist Paper • 2407.08733 • Published Jul 11, 2024 • 20
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published Jul 9, 2024 • 26
AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published Jul 3, 2024 • 43
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 82
LongIns: A Challenging Long-context Instruction-based Exam for LLMs Paper • 2406.17588 • Published Jun 25, 2024 • 20
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24, 2024 • 54
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 45
Understanding Alignment in Multimodal LLMs: A Comprehensive Study Paper • 2407.02477 • Published Jul 2, 2024 • 21
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 75
ColPali: Efficient Document Retrieval with Vision Language Models Paper • 2407.01449 • Published Jun 27, 2024 • 36
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild Paper • 2406.04770 • Published Jun 7, 2024 • 26
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 63
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach Paper • 2405.15613 • Published May 24, 2024 • 13
Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024 • 26
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30, 2024 • 22
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning Paper • 2405.12130 • Published May 20, 2024 • 45