victor (Victor Mustar)

upvoted a collection 2 days ago

Llama 3.2

This collection hosts the transformers and original repos of Llama 3.2 and Llama Guard 3 • 11 items • Updated 3 days ago • 277

upvoted 3 collections 3 days ago

upvoted 2 papers 4 days ago

RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning

Paper • 2409.14674 • Published 6 days ago • 39

jina-embeddings-v3: Multilingual Embeddings With Task LoRA

Paper • 2409.10173 • Published 13 days ago • 20

upvoted an article 4 days ago

Article

Exploring the Daily Papers Page on Hugging Face

6 days ago

• 18

upvoted a collection 5 days ago

Loradex Highlights

Collection

This collection features awesome open-source LoRAs trained by members of the Glif Community during Loradex Early Access! • 12 items • Updated 5 days ago • 16

upvoted 8 papers 6 days ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published 9 days ago • 119

Temporally Aligned Audio for Video with Autoregression

Paper • 2409.13689 • Published 8 days ago • 7

Portrait Video Editing Empowered by Multimodal Generative Priors

Paper • 2409.13591 • Published 8 days ago • 15

Colorful Diffuse Intrinsic Image Decomposition in the Wild

Paper • 2409.13690 • Published 8 days ago • 11

Prithvi WxC: Foundation Model for Weather and Climate

Paper • 2409.13598 • Published 8 days ago • 31

MuCodec: Ultra Low-Bitrate Music Codec

Paper • 2409.13216 • Published 9 days ago • 20

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published 9 days ago • 64

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Paper • 2409.12191 • Published 10 days ago • 65

upvoted a collection 8 days ago

NIM Serverless Inference API

Collection

Models in this collection are available for inference via a serverless API powered by NVIDIA NIM. • 8 items • Updated 1 day ago • 17

upvoted a paper 10 days ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published 10 days ago • 115

upvoted a collection 10 days ago

Qwen2.5

Collection

Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated 10 days ago • 204

upvoted a paper 10 days ago

QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

Paper • 2405.15863 • Published May 24 • 3

upvoted 3 collections 10 days ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated 10 days ago • 192

Qwen2.5-Math

Collection

Math-specific model series based on Qwen2.5 • 9 items • Updated 6 days ago • 33

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 14 items • Updated 4 days ago • 66

upvoted an article 11 days ago

Article

"Diffusers Image Fill" guide

15 days ago

• 26

upvoted a paper 11 days ago

Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

Paper • 2409.09214 • Published 15 days ago • 44

upvoted a paper 12 days ago

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems

Paper • 2402.12875 • Published Feb 20 • 12

upvoted 4 articles 13 days ago

Article

Introducing Community Tools on HuggingChat

13 days ago

• 26

Article

Fine-tuning a token classification model for legal data using Argilla and AutoTrain

22 days ago

• 11

Article

Training Flux Locally on Mac

17 days ago

• 10

Article

Fine-tuning Parler TTS on a Specific Language

13 days ago

• 19

upvoted 3 papers 16 days ago

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Paper • 2409.06666 • Published 18 days ago • 53

PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Paper • 2409.06820 • Published 18 days ago • 57

Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers

Paper • 2409.04109 • Published 23 days ago • 37

upvoted 2 articles 17 days ago

Article

In-browser LLM app in pure Python: Gemini Nano + Gradio-Lite

Jul 12

• 9

Article

Exploring a Public Domain dataset with Visual Topic Modeling

Feb 22

• 3

upvoted 2 papers 18 days ago

Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming

Paper • 2408.16725 • Published about 1 month ago • 50

Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27 • 120

upvoted a collection 18 days ago

TTS / Audio

Collection

102 items • Updated 23 days ago • 3

upvoted a collection 19 days ago

RPMax Models

Collection

RPMax series of models with higher creativity and reduced repetition for "classic" RP chats. • 8 items • Updated 2 days ago • 7

upvoted 3 papers 19 days ago

Towards a Unified View of Preference Learning for Large Language Models: A Survey

Paper • 2409.02795 • Published 24 days ago • 71

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published 25 days ago • 31

How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

Paper • 2409.03810 • Published 23 days ago • 30

upvoted 4 papers 22 days ago

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published 26 days ago • 95

FuzzCoder: Byte-level Fuzzing Test via Large Language Model

Paper • 2409.01944 • Published 25 days ago • 44

Attention Heads of Large Language Models: A Survey

Paper • 2409.03752 • Published 23 days ago • 85

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

Paper • 2310.11716 • Published Oct 18, 2023 • 5

upvoted 3 papers 23 days ago

Kvasir-VQA: A Text-Image Pair GI Tract Dataset

Paper • 2409.01437 • Published 26 days ago • 70

LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture

Paper • 2409.02889 • Published 24 days ago • 54

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

Paper • 2409.02634 • Published 25 days ago • 85

upvoted 4 papers 26 days ago

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Paper • 2408.15914 • Published Aug 28 • 21

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios

Paper • 2408.17267 • Published 29 days ago • 22

SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding

Paper • 2408.15545 • Published Aug 28 • 32

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published about 1 month ago • 92

upvoted 2 collections about 1 month ago

Qwen2-VL

Collection

Vision-language model series based on Qwen2 • 15 items • Updated 11 days ago • 125

Video Generation models

Collection

The domain of video generation is booming. Here is a list of selected open-access video generation (T2V) models. • 14 items • Updated Aug 27 • 12

upvoted 3 papers about 1 month ago

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

Paper • 2408.14906 • Published Aug 27 • 137

Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

Paper • 2408.15518 • Published Aug 28 • 41

Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

Paper • 2408.15998 • Published Aug 28 • 81

upvoted 2 articles about 1 month ago

Article

Recommendation to Revisit the Diffuser Default LoRA Parameters

Jun 21

• 11

Article

Enhancing Image Model Dreambooth Training Through Effective Captioning: Key Observations

Jun 19

• 17
