GuoLiangTang

Tommy930

https://github.com/TommyTang930

AI & ML interests

LLM, NLP, ML

Organizations

None yet

Tommy930's activity

upvoted a paper about 11 hours ago

Illustrious: an Open Advanced Illustration Model

Paper • 2409.19946 • Published 3 days ago • 6

upvoted 4 papers about 15 hours ago

ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer

Paper • 2410.00086 • Published 2 days ago • 6

Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models

Paper • 2410.00231 • Published 2 days ago • 3

Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation

Paper • 2410.00890 • Published 1 day ago • 7

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published 1 day ago • 15

upvoted a paper about 16 hours ago

Visual Question Decomposition on Multimodal Large Language Models

Paper • 2409.19339 • Published 4 days ago • 5

upvoted 7 papers 1 day ago

Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers

Paper • 2409.20537 • Published 2 days ago • 10

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Paper • 2409.20551 • Published 2 days ago • 10

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

Paper • 2409.18943 • Published 5 days ago • 22

Image Copy Detection for Diffusion Models

Paper • 2409.19952 • Published 3 days ago • 8

Cottention: Linear Transformers With Cosine Attention

Paper • 2409.18747 • Published 5 days ago • 11

Hyper-Connections

Paper • 2409.19606 • Published 3 days ago • 13

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning

Paper • 2409.20566 • Published 2 days ago • 33

upvoted 7 papers 4 days ago

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Paper • 2409.10819 • Published 16 days ago • 17

A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B

Paper • 2409.11055 • Published 15 days ago • 16

Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts

Paper • 2409.13449 • Published 12 days ago • 7

Enhancing Structured-Data Retrieval with GraphRAG: Soccer Data Case Study

Paper • 2409.17580 • Published 6 days ago • 6

The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends

Paper • 2409.14195 • Published 11 days ago • 10

Instruction Following without Instruction Tuning

Paper • 2409.14254 • Published 11 days ago • 24

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published 6 days ago • 23

upvoted 7 papers 5 days ago

Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

Paper • 2409.18121 • Published 6 days ago • 7

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Paper • 2409.17280 • Published 7 days ago • 8

Pixel-Space Post-Training of Latent Diffusion Models

Paper • 2409.17565 • Published 6 days ago • 18

Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction

Paper • 2409.17422 • Published 7 days ago • 22

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Paper • 2409.18042 • Published 6 days ago • 30

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

Paper • 2409.18125 • Published 6 days ago • 32

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

Paper • 2409.17481 • Published 7 days ago • 42

upvoted 2 papers 6 days ago

NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models

Paper • 2409.16493 • Published 8 days ago • 7

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale

Paper • 2409.16299 • Published 23 days ago • 9

upvoted 10 papers 7 days ago

TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans

Paper • 2409.16666 • Published 7 days ago • 5

Synchronize Dual Hands for Physics-Based Dexterous Guitar Playing

Paper • 2409.16629 • Published 8 days ago • 9

DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

Paper • 2409.17145 • Published 7 days ago • 11

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

Paper • 2409.17146 • Published 7 days ago • 87

Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

Paper • 2409.17115 • Published 7 days ago • 57

RRM: Robust Reward Model Training Mitigates Reward Hacking

Paper • 2409.13156 • Published 13 days ago • 3

Improvements to SDXL in NovelAI Diffusion V3

Paper • 2409.15997 • Published 8 days ago • 10

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts

Paper • 2409.16040 • Published 8 days ago • 9

Reward-Robust RLHF in LLMs

Paper • 2409.15360 • Published 15 days ago • 4

OmniBench: Towards The Future of Universal Omni-Language Models

Paper • 2409.15272 • Published 9 days ago • 24

upvoted 7 papers 8 days ago

Making Text Embedders Few-Shot Learners

Paper • 2409.15700 • Published 9 days ago • 27

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Paper • 2409.12192 • Published 14 days ago • 4

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation

Paper • 2409.16283 • Published 8 days ago • 6

MonoFormer: One Transformer for Both Diffusion and Autoregression

Paper • 2409.16280 • Published 8 days ago • 17

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published 8 days ago • 39

SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending

Paper • 2409.13926 • Published 12 days ago • 4

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions

Paper • 2409.15278 • Published 9 days ago • 21

upvoted 7 papers 9 days ago

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published 10 days ago • 26

A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?

Paper • 2409.15277 • Published 9 days ago • 34

Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments

Paper • 2409.11276 • Published 15 days ago • 6

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Paper • 2409.12941 • Published 13 days ago • 19

Colorful Diffuse Intrinsic Image Decomposition in the Wild

Paper • 2409.13690 • Published 12 days ago • 12

Portrait Video Editing Empowered by Multimodal Generative Priors

Paper • 2409.13591 • Published 12 days ago • 15

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published 12 days ago • 64

upvoted 7 papers 11 days ago

Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

Paper • 2409.11242 • Published 15 days ago • 4

Human-like Affective Cognition in Foundation Models

Paper • 2409.11733 • Published 14 days ago • 4

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

Paper • 2409.11363 • Published 15 days ago • 2

RoMath: A Mathematical Reasoning Benchmark in Romanian

Paper • 2409.11074 • Published 15 days ago • 3

Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning

Paper • 2409.12001 • Published 14 days ago • 3

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Paper • 2409.08425 • Published 20 days ago • 9

Vista3D: Unravel the 3D Darkside of a Single Image

Paper • 2409.12193 • Published 14 days ago • 8