Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models Paper • 2408.15518 • Published Aug 28 • 41
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs Paper • 2408.13467 • Published Aug 24 • 23
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
A Web-Based Solution for Federated Learning with LLM-Based Automation Paper • 2408.13010 • Published Aug 23 • 8
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 111
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 56
PhysBERT: A Text Embedding Model for Physics Scientific Literature Paper • 2408.09574 • Published Aug 18 • 6
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 37
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12 • 58
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI Paper • 2408.03361 • Published Aug 6 • 85
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1 • 41
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods Paper • 2407.21630 • Published Jul 31 • 8
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 60
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models Paper • 2407.19474 • Published Jul 28 • 22
Theia: Distilling Diverse Vision Foundation Models for Robot Learning Paper • 2407.20179 • Published Jul 29 • 45
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents Paper • 2407.17490 • Published Jul 3 • 30
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 33
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases Paper • 2407.12784 • Published Jul 17 • 48
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? Paper • 2407.11963 • Published Jul 16 • 43
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus Paper • 2407.11144 • Published Jul 15 • 7
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models Paper • 2407.09025 • Published Jul 12 • 125
Inference Performance Optimization for Large Language Models on CPUs Paper • 2407.07304 • Published Jul 10 • 52
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages Paper • 2407.05975 • Published Jul 8 • 34
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
TabReD: A Benchmark of Tabular Machine Learning in-the-Wild Paper • 2406.19380 • Published Jun 27 • 47
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge Paper • 2407.00088 • Published Jun 25 • 9
Towards Robust Speech Representation Learning for Thousands of Languages Paper • 2407.00837 • Published Jun 30 • 10
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1 • 42
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published Jun 28 • 59
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27 • 59
MatchTime: Towards Automatic Soccer Game Commentary Generation Paper • 2406.18530 • Published Jun 26 • 12
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published Jun 26 • 25
Octo-planner: On-device Language Model for Planner-Action Agents Paper • 2406.18082 • Published Jun 26 • 47
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51
YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals Paper • 2406.16273 • Published Jun 24 • 40
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published Jun 20 • 99
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15 • 65