31 81

yono

ramu0e

AI & ML interests

IfO (Imitation from Observation), RL, Foundation Models

Organizations

ramu0e's activity

upvoted a paper 3 days ago

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Paper • 2403.12943 • Published Mar 19 • 14

upvoted a paper 17 days ago

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 20 days ago • 44

upvoted 4 papers about 2 months ago

Pandora: Towards General World Model with Natural Language Actions and Video States

Paper • 2406.09455 • Published Jun 12 • 14

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Paper • 2407.03471 • Published Jul 3 • 27

Berkeley Humanoid: A Research Platform for Learning-based Control

Paper • 2407.21781 • Published Jul 31 • 8

Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks

Paper • 2408.03615 • Published Aug 7 • 30

upvoted a paper 4 months ago

SF-V: Single Forward Video Generation Model

Paper • 2406.04324 • Published Jun 6 • 23

upvoted a collection 4 months ago

Paper List

1 item • Updated Jun 7 • 1

upvoted 18 papers 4 months ago

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

Incremental FastPitch: Chunk-based High Quality Text to Speech

Paper • 2401.01755 • Published Jan 3 • 8

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 14

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24 • 71

An Interactive Agent Foundation Model

Paper • 2402.05929 • Published Feb 8 • 26

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25 • 55

Sora Generates Videos with Stunning Geometrical Consistency

Paper • 2402.17403 • Published Feb 27 • 16

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29 • 49

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Paper • 2403.02709 • Published Mar 5 • 7

Pix2Gif: Motion-Guided Diffusion for GIF Generation

Paper • 2403.04634 • Published Mar 7 • 14

StableDrag: Stable Dragging for Point-based Image Editing

Paper • 2403.04437 • Published Mar 7 • 25

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182

3D Diffusion Policy

Paper • 2403.03954 • Published Mar 6 • 11

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14 • 24

LightIt: Illumination Modeling and Control for Diffusion Models

Paper • 2403.10615 • Published Mar 15 • 16

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

Paper • 2403.13044 • Published Mar 19 • 14

ThemeStation: Generating Theme-Aware 3D Assets from Few Exemplars

Paper • 2403.15383 • Published Mar 22 • 13

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 64

upvoted a paper 5 months ago

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published May 16 • 19

upvoted a paper 9 months ago

Verbosity Bias in Preference Labeling by Large Language Models

Paper • 2310.10076 • Published Oct 16, 2023 • 2

upvoted 2 papers 10 months ago

VideoLCM: Video Latent Consistency Model

Paper • 2312.09109 • Published Dec 14, 2023 • 22

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 69

upvoted a paper over 1 year ago

Textbooks Are All You Need

Paper • 2306.11644 • Published Jun 20, 2023 • 143