adakoda

AI & ML interests

CV

Organizations

None yet

adakoda's activity

upvoted a paper 21 days ago

TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder

Paper • 2409.08248 • Published 22 days ago • 12

upvoted a paper about 1 month ago

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28 • 32

upvoted a paper 3 months ago

DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

Paper • 2406.11896 • Published Jun 14 • 18

upvoted 6 papers 4 months ago

RVT-2: Learning Precise Manipulation from Few Demonstrations

Paper • 2406.08545 • Published Jun 12 • 7

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13 • 36

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 91

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Paper • 2406.02523 • Published Jun 4 • 9

pOps: Photo-Inspired Diffusion Operators

Paper • 2406.01300 • Published Jun 3 • 16

2BP: 2-Stage Backpropagation

Paper • 2405.18047 • Published May 28 • 23

upvoted 5 papers 5 months ago

Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20 • 23

Toon3D: Seeing Cartoons from a New Perspective

Paper • 2405.10320 • Published May 16 • 19

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Paper • 2405.10314 • Published May 16 • 43

LogoMotion: Visually Grounded Code Generation for Content-Aware Animation

Paper • 2405.07065 • Published May 11 • 16

A Multimodal Automated Interpretability Agent

Paper • 2404.14394 • Published Apr 22 • 20

upvoted 15 papers 6 months ago

AniClipart: Clipart Animation with Text-to-Video Priors

Paper • 2404.12347 • Published Apr 18 • 12

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Paper • 2404.09833 • Published Apr 15 • 29

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

Paper • 2404.07973 • Published Apr 11 • 30

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11 • 43

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47

RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion

Paper • 2404.07199 • Published Apr 10 • 25

YaART: Yet Another ART Rendering Technology

Paper • 2404.05666 • Published Apr 8 • 15

Aligning Diffusion Models by Optimizing Human Utility

Paper • 2404.04465 • Published Apr 6 • 13

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 25

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

Paper • 2404.03118 • Published Apr 3 • 23

Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1 • 16

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

Paper • 2403.20331 • Published Mar 29 • 14

DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion

Paper • 2403.17237 • Published Mar 25 • 8

Garment3DGen: 3D Garment Stylization and Texture Generation

Paper • 2403.18816 • Published Mar 27 • 20

LLM Agent Operating System

Paper • 2403.16971 • Published Mar 25 • 64

upvoted 12 papers 7 months ago

Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers

Paper • 2403.12943 • Published Mar 19 • 14

Gemma: Open Models Based on Gemini Research and Technology

Paper • 2403.08295 • Published Mar 13 • 47

Design2Code: How Far Are We From Automating Front-End Engineering?

Paper • 2403.03163 • Published Mar 5 • 93

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182

3D Diffusion Policy

Paper • 2403.03954 • Published Mar 6 • 11

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

Paper • 2403.02709 • Published Mar 5 • 7

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5 • 56

Twisting Lids Off with Two Hands

Paper • 2403.02338 • Published Mar 4 • 5

RT-H: Action Hierarchies Using Language

Paper • 2403.01823 • Published Mar 4 • 7

VisionLLaMA: A Unified LLaMA Interface for Vision Tasks

Paper • 2403.00522 • Published Mar 1 • 44

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 592

CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

Paper • 2402.14795 • Published Feb 22 • 5

upvoted 19 papers 8 months ago

DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

Paper • 2402.11929 • Published Feb 19 • 9

Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Paper • 2402.11450 • Published Feb 18 • 20

Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots

Paper • 2402.10329 • Published Feb 15 • 13

LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing

Paper • 2402.10294 • Published Feb 15 • 22

Rolling Diffusion Models

Paper • 2402.09470 • Published Feb 12 • 9

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Paper • 2402.09812 • Published Feb 15 • 12

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Paper • 2402.10210 • Published Feb 15 • 29

UFO: A UI-Focused Agent for Windows OS Interaction

Paper • 2402.07939 • Published Feb 8 • 13

PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models

Paper • 2402.08714 • Published Feb 13 • 10

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Paper • 2402.09052 • Published Feb 14 • 16

Animated Stickers: Bringing Stickers to Life with Video Diffusion

Paper • 2402.06088 • Published Feb 8 • 9

InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Paper • 2402.05937 • Published Feb 8 • 11

An Interactive Agent Foundation Model

Paper • 2402.05929 • Published Feb 8 • 26

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Paper • 2402.05930 • Published Feb 8 • 39

IMUSIC: IMU-based Facial Expression Capture

Paper • 2402.03944 • Published Feb 3 • 8

MusicRL: Aligning Music Generation to Human Preferences

Paper • 2402.04229 • Published Feb 6 • 16

Training-Free Consistent Text-to-Image Generation

Paper • 2402.03286 • Published Feb 5 • 64

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Paper • 2402.03040 • Published Feb 5 • 17

AToM: Amortized Text-to-Mesh using 2D Diffusion

Paper • 2402.00867 • Published Feb 1 • 10