vincentyang

vincent88

Recent Activity

liked a model about 18 hours ago

auffusion/auffusion-full-no-adapter

liked a model about 20 hours ago

ymzhang319/FoleyCrafter

liked a Space about 20 hours ago

haoheliu/AudioLDM_48K_Text-to-HiFiAudio_Generation

vincent88's activity

upvoted a paper about 1 month ago

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published Oct 11 • 84

upvoted 2 papers 2 months ago

DITTO: Diffusion Inference-Time T-Optimization for Music Generation

Paper • 2401.12179 • Published Jan 22 • 20

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published Sep 4 • 89

upvoted a paper 3 months ago

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 83

upvoted a paper 4 months ago

VILA²: VILA Augmented VILA

Paper • 2407.17453 • Published Jul 24 • 39

upvoted a paper 6 months ago

An Introduction to Vision-Language Modeling

Paper • 2405.17247 • Published May 27 • 85

upvoted a paper 7 months ago

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Paper • 2404.16821 • Published Apr 25 • 53

upvoted a collection 7 months ago

GPT-4 generated datasets

Collection

Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16 • 8

upvoted 2 papers 8 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 90

VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis

Paper • 2403.08764 • Published Mar 13 • 36

upvoted a collection 10 months ago

Awesome SDXL LoRAs

Collection

A curated set of amazing Stable Diffusion XL LoRAs (they power the LoRA the Explorer Space) • 36 items • Updated Jun 24 • 20

upvoted 3 papers 11 months ago

LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2 • 54

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

Paper • 2312.11396 • Published Dec 18, 2023 • 10

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023 • 26

upvoted 3 papers 12 months ago

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models

Paper • 2312.06109 • Published Dec 11, 2023 • 20

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

Paper • 2312.02969 • Published Dec 5, 2023 • 12

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

Paper • 2312.00330 • Published Dec 1, 2023 • 10

upvoted a paper over 1 year ago

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Paper • 2306.07954 • Published Jun 13, 2023 • 113