Vince

bolerovt

AI & ML interests

None yet

Organizations

None yet

bolerovt's activity

upvoted a paper 3 days ago

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published 14 days ago • 34

upvoted 8 papers 14 days ago

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published 16 days ago • 37

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

Paper • 2409.10516 • Published 16 days ago • 32

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 19 days ago • 44

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published 15 days ago • 26

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published 15 days ago • 65

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published 15 days ago • 80

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 14 days ago • 69

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 14 days ago • 119

upvoted an article 15 days ago

Scaling robotics datasets with video encoding

Article • Aug 27 • 33

upvoted 3 papers 15 days ago

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 27 days ago • 38

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Paper • 2409.08264 • Published 20 days ago • 42

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Paper • 2409.07703 • Published 21 days ago • 62

upvoted 14 papers 20 days ago

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published about 1 month ago • 70

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published 29 days ago • 76

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published 28 days ago • 54

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 28 days ago • 85

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published about 1 month ago • 95

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 27 days ago • 85

Evaluating Multiview Object Consistency in Humans and Image Models

Paper • 2409.05862 • Published 23 days ago • 8

POINTS: Improving Your Vision-language Model with Affordable Strategies

Paper • 2409.04828 • Published 25 days ago • 22

Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance

Paper • 2409.04593 • Published 26 days ago • 20

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published 28 days ago • 71

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 22 days ago • 54

Agent Workflow Memory

Paper • 2409.07429 • Published 21 days ago • 27

MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications

Paper • 2409.07314 • Published 21 days ago • 50

PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Paper • 2409.06820 • Published 22 days ago • 58

upvoted 33 papers about 1 month ago

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

Paper • 2408.16293 • Published Aug 29 • 23

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29 • 26

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Paper • 2408.16532 • Published Aug 29 • 46

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Paper • 2408.16767 • Published Aug 29 • 29

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29 • 56

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29 • 92

The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31 • 101

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 110

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26 • 59

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Paper • 2408.14819 • Published Aug 27 • 19

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 36

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 120

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 137

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 83

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 41

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

Paper • 2408.15079 • Published Aug 27 • 51

FocusLLM: Scaling LLM's Context by Parallel Decoding

Paper • 2408.11745 • Published Aug 21 • 23

TWLV-I: Analysis and Insights from Holistic Evaluation on Video Foundation Models

Paper • 2408.11318 • Published Aug 21 • 54

Hermes 3 Technical Report

Paper • 2408.11857 • Published Aug 15 • 35

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22 • 29

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20 • 49

Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22 • 23

DreamCinema: Cinematic Transfer with Free Camera and 3D Character

Paper • 2408.12601 • Published Aug 22 • 28

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22 • 33

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 61

Sapiens: Foundation for Human Vision Models

Paper • 2408.12569 • Published Aug 22 • 86

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation

Paper • 2408.12528 • Published Aug 22 • 50

Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning

Paper • 2408.07931 • Published Aug 15 • 18

Automated Design of Agentic Systems

Paper • 2408.08435 • Published Aug 15 • 38

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 96

SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

Paper • 2408.10195 • Published Aug 19 • 12

Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17 • 20

MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

Paper • 2408.10198 • Published Aug 19 • 32