MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published 2 days ago • 33
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models Paper • 2409.17146 • Published 7 days ago • 87
RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning Paper • 2409.14674 • Published 10 days ago • 40
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 13 days ago • 127
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 15 days ago • 12
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published 22 days ago • 54
Article: Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • 26 days ago • 15
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture Paper • 2409.02889 • Published 28 days ago • 54
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published 28 days ago • 85
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 50
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27 • 36
MobileQuant: Mobile-friendly Quantization for On-device Language Models Paper • 2408.13933 • Published Aug 25 • 13
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement Paper • 2408.14211 • Published Aug 26 • 8
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23 • 25
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22 • 13
DreamCinema: Cinematic Transfer with Free Camera and 3D Character Paper • 2408.12601 • Published Aug 22 • 28
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion Paper • 2408.04785 • Published Aug 8 • 6
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 Paper • 2408.05147 • Published Aug 9 • 36
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper • 2408.02657 • Published Aug 5 • 32
Gemma 2: Improving Open Language Models at a Practical Size Paper • 2408.00118 • Published Jul 31 • 73
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning Paper • 2407.20798 • Published Jul 30 • 23
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26 • 31
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26 • 30
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model Paper • 2407.16982 • Published Jul 24 • 40
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 28
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore Paper • 2407.12854 • Published Jul 9 • 29
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians Paper • 2407.11793 • Published Jul 16 • 3
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11 • 31
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper • 2407.06135 • Published Jul 8 • 20
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs Paper • 2407.04051 • Published Jul 4 • 35
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams Paper • 2406.08085 • Published Jun 12 • 13
Revealing Fine-Grained Values and Opinions in Large Language Models Paper • 2406.19238 • Published Jun 27 • 14
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2 • 23
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published Jul 2 • 49
GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality Paper • 2406.18462 • Published Jun 26 • 11
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding Paper • 2406.19389 • Published Jun 27 • 51
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs Paper • 2406.18521 • Published Jun 26 • 25
A Closer Look into Mixture-of-Experts in Large Language Models Paper • 2406.18219 • Published Jun 26 • 15
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs Paper • 2406.16860 • Published Jun 24 • 55
∇²DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials Paper • 2406.14347 • Published Jun 20 • 99
The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing Paper • 2406.10601 • Published Jun 15 • 65
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts Paper • 2406.12034 • Published Jun 17 • 13