Papers - a Imotech Collection

Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Imotech 's Collections

Inbox

Papers

Papers

updated about 21 hours ago

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3 • 32
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17 • 25
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 121
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 21
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Paper • 2408.10198 • Published Aug 19 • 32
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20 • 51
ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 52
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12 • 61
DC3DO: Diffusion Classifier for 3D Objects

Paper • 2408.06693 • Published Aug 13 • 10
Learning Task Decomposition to Assist Humans in Competitive Programming

Paper • 2406.04604 • Published Jun 7 • 4
Task-oriented Sequential Grounding in 3D Scenes

Paper • 2408.04034 • Published Aug 7 • 8
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches

Paper • 2408.04567 • Published Aug 8 • 24
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

Paper • 2406.13897 • Published May 30 • 12
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

Paper • 2407.13759 • Published Jul 18 • 17
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation

Paper • 2407.14931 • Published Jul 20 • 20
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Paper • 2407.16224 • Published Jul 23 • 25
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection

Paper • 2406.00856 • Published Jun 2 • 10
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

Paper • 2407.16741 • Published Jul 23 • 68
3D Question Answering for City Scene Understanding

Paper • 2407.17398 • Published Jul 24 • 22
Improving 2D Feature Representations by 3D-Aware Fine-Tuning

Paper • 2407.20229 • Published Jul 29 • 7
SAM 2: Segment Anything in Images and Videos

Paper • 2408.00714 • Published Aug 1 • 108
RelBench: A Benchmark for Deep Learning on Relational Databases

Paper • 2407.20060 • Published Jul 29 • 7
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5 • 33
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

Paper • 2408.02555 • Published Aug 5 • 28
Synthesizing Text-to-SQL Data from Weak and Strong LLMs

Paper • 2408.03256 • Published Aug 6 • 10
LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59
Transformer Explainer: Interactive Learning of Text-Generative Models

Paper • 2408.04619 • Published Aug 8 • 155
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework

Paper • 2408.06190 • Published Aug 12 • 17
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

Paper • 2408.07060 • Published Aug 13 • 40
Imagen 3

Paper • 2408.07009 • Published Aug 13 • 61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17 • 51
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

Paper • 2408.13252 • Published Aug 23 • 23
MuCodec: Ultra Low-Bitrate Music Codec

Paper • 2409.13216 • Published Sep 20 • 22
Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 135
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution

Paper • 2409.12961 • Published Sep 19 • 24
FlexiTex: Enhancing Texture Generation with Visual Guidance

Paper • 2409.12431 • Published Sep 19 • 11
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt

Paper • 2409.12892 • Published Sep 19 • 5
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

Paper • 2409.13926 • Published Sep 20 • 5
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Paper • 2409.15278 • Published Sep 23 • 22
Improvements to SDXL in NovelAI Diffusion V3

Paper • 2409.15997 • Published Sep 24 • 11
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published Sep 25 • 59
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published Sep 26 • 33
Game4Loc: A UAV Geo-Localization Benchmark from Game Data

Paper • 2409.16925 • Published Sep 25 • 6
DressRecon: Freeform 4D Human Reconstruction from Monocular Video

Paper • 2409.20563 • Published Sep 30 • 7
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration

Paper • 2410.00418 • Published Oct 1 • 9
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs

Paper • 2410.00337 • Published Oct 1 • 10
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

Paper • 2410.00890 • Published Oct 1 • 18
Law of the Weakest Link: Cross Capabilities of Large Language Models

Paper • 2409.19951 • Published Sep 30 • 53
Illustrious: an Open Advanced Illustration Model

Paper • 2409.19946 • Published Sep 30 • 13
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2 • 30
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection

Paper • 2410.01647 • Published Oct 2 • 28
Addition is All You Need for Energy-efficient Language Models

Paper • 2410.00907 • Published Oct 1 • 144
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction

Paper • 2410.02241 • Published Oct 3 • 6
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction

Paper • 2410.01273 • Published Oct 2 • 8
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation

Paper • 2410.01912 • Published Oct 2 • 13
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search

Paper • 2410.03864 • Published Oct 4 • 10
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

Paper • 2410.06949 • Published Oct 9 • 5
Data Selection via Optimal Control for Language Models

Paper • 2410.07064 • Published Oct 9 • 8
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9 • 41
Does Spatial Cognition Emerge in Frontier Models?

Paper • 2410.06468 • Published Oct 9 • 2
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Paper • 2410.03450 • Published Oct 4 • 36
Agent S: An Open Agentic Framework that Uses Computers Like a Human

Paper • 2410.08164 • Published Oct 10 • 24
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

Paper • 2410.05265 • Published Oct 7 • 29
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Paper • 2410.09732 • Published Oct 13 • 54
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation

Paper • 2410.09584 • Published Oct 12 • 45
Animate-X: Universal Character Image Animation with Enhanced Motion Representation

Paper • 2410.10306 • Published Oct 14 • 52
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14 • 37
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14 • 26
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published Oct 14 • 22
Rethinking Data Selection at Scale: Random Selection is Almost All You Need

Paper • 2410.09335 • Published Oct 12 • 15
Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11 • 84
Mentor-KD: Making Small Language Models Better Multi-step Reasoners

Paper • 2410.09037 • Published Oct 11 • 4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights

Paper • 2410.09008 • Published Oct 11 • 16
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining

Paper • 2410.08102 • Published Oct 10 • 19
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper • 2410.08815 • Published Oct 11 • 42
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning

Paper • 2410.06456 • Published Oct 9 • 35
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10 • 49
FlatQuant: Flatness Matters for LLM Quantization

Paper • 2410.09426 • Published Oct 12 • 12
Harnessing Webpage UIs for Text-Rich Visual Understanding

Paper • 2410.13824 • Published Oct 17 • 29
MobA: A Two-Level Agent System for Efficient Mobile Task Automation

Paper • 2410.13757 • Published Oct 17 • 31
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant

Paper • 2410.13360 • Published Oct 17 • 8
AERO: Softmax-Only LLMs for Efficient Private Inference

Paper • 2410.13060 • Published Oct 16 • 4
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published Oct 17 • 74
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models

Paper • 2410.13370 • Published Oct 17 • 35
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Paper • 2410.13674 • Published Oct 17 • 15
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Paper • 2410.13726 • Published Oct 17 • 10
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Paper • 2410.10812 • Published Oct 14 • 14
AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21 • 57
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17 • 53
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation

Paper • 2410.14745 • Published Oct 17 • 45
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages

Paper • 2410.16153 • Published Oct 21 • 42
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception

Paper • 2410.12788 • Published Oct 16 • 21
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes

Paper • 2410.18084 • Published 29 days ago • 12
Lightweight Neural App Control

Paper • 2410.17883 • Published 29 days ago • 9
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding

Paper • 2410.13924 • Published Oct 17 • 6
LOGO -- Long cOntext aliGnment via efficient preference Optimization

Paper • 2410.18533 • Published 29 days ago • 42
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published 28 days ago • 40
Framer: Interactive Frame Interpolation

Paper • 2410.18978 • Published 28 days ago • 36
Unbounded: A Generative Infinite Game of Character Life Simulation

Paper • 2410.18975 • Published 28 days ago • 34
Distill Visual Chart Reasoning Ability from LLMs to MLLMs

Paper • 2410.18798 • Published 28 days ago • 19
WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published 29 days ago • 11
mistralai/Pixtral-12B-Base-2409

Updated 22 days ago • 69
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning

Paper • 2410.19290 • Published 28 days ago • 10
Continuous Speech Synthesis using per-token Latent Diffusion

Paper • 2410.16048 • Published Oct 21 • 28
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

Paper • 2410.19168 • Published 28 days ago • 19
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding

Paper • 2404.16710 • Published Apr 25 • 74
GPT-4o System Card

Paper • 2410.21276 • Published 27 days ago • 79
Neural Fields in Robotics: A Survey

Paper • 2410.20220 • Published 26 days ago • 4
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Paper • 2410.21220 • Published 24 days ago • 8
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant

Paper • 2410.18603 • Published 29 days ago • 30
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Paper • 2410.18666 • Published 28 days ago • 18
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Paper • 2410.21465 • Published 24 days ago • 10
CLEAR: Character Unlearning in Textual and Visual Modalities

Paper • 2410.18057 • Published 29 days ago • 199
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions

Paper • 2410.20424 • Published 25 days ago • 37
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Paper • 2410.23168 • Published 22 days ago • 22
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Paper • 2410.20650 • Published 25 days ago • 16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published 24 days ago • 73
Language Models can Self-Lengthen to Generate Long Texts

Paper • 2410.23933 • Published 21 days ago • 16
SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published 21 days ago • 20
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks

Paper • 2410.24032 • Published 21 days ago • 8
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Paper • 2410.21157 • Published 24 days ago • 6
Face Anonymization Made Simple

Paper • 2411.00762 • Published 20 days ago • 7
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models

Paper • 2410.22901 • Published 23 days ago • 8
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes

Paper • 2411.00771 • Published 20 days ago • 9
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents

Paper • 2410.24024 • Published 21 days ago • 48
Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published 17 days ago • 23
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization

Paper • 2411.02355 • Published 17 days ago • 44
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

Paper • 2411.02336 • Published 17 days ago • 23
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details

Paper • 2411.03047 • Published 16 days ago • 7
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems

Paper • 2411.02959 • Published 17 days ago • 62
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Paper • 2411.04709 • Published 16 days ago • 25
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published 14 days ago • 108
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

Paper • 2411.04752 • Published 14 days ago • 16
SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Paper • 2411.05007 • Published 14 days ago • 16
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published 14 days ago • 27
BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published 14 days ago • 63
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework

Paper • 2411.06176 • Published 12 days ago • 44
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

Paper • 2411.07126 • Published 10 days ago • 28
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Paper • 2411.07199 • Published 10 days ago • 42
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM

Paper • 2411.04954 • Published 14 days ago • 7
PramaLLC/BEN

Image Segmentation • Updated about 23 hours ago • 116 • 65
SAMPart3D: Segment Any Part in 3D Objects

Paper • 2411.07184 • Published 10 days ago • 25
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Paper • 2411.09595 • Published 7 days ago • 65
MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published 7 days ago • 50
Large Language Models Can Self-Improve in Long-context Reasoning

Paper • 2411.08147 • Published 9 days ago • 59
GenXD: Generating Any 3D and 4D Scenes

Paper • 2411.02319 • Published 17 days ago • 20
LLaVA-o1: Let Vision Language Models Reason Step-by-Step

Paper • 2411.10440 • Published 6 days ago • 87
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers

Paper • 2411.10510 • Published 6 days ago • 8
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts

Paper • 2411.10669 • Published 6 days ago • 9
SlimLM: An Efficient Small Language Model for On-Device Document Assistance

Paper • 2411.09944 • Published 7 days ago • 12
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published 6 days ago • 37
Continuous Speculative Decoding for Autoregressive Image Generation

Paper • 2411.11925 • Published 4 days ago • 13
RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published 3 days ago • 36
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements

Paper • 2411.12044 • Published 3 days ago • 12
Building Trust: Foundations of Security, Safety and Transparency in AI

Paper • 2411.12275 • Published 3 days ago • 10
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning

Paper • 2411.10161 • Published 6 days ago • 6

Collection guide
Browse collections

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs