Ross Wightman

rwightman

AI & ML interests

Computer vision, transfer learning, semi-/self-supervised learning, robotics.

Recent Activity

Reacted to jeffboudier's post with 🤗 about 23 hours ago

New: add your Bluesky account to your HF profile: https://huggingface.co/settings/profile
Is the grass greener, the sky bluer? Will try and figure it out at https://bsky.app/profile/jeffboudier.bsky.social
By the way, HF people starter pack: https://bsky.app/starter-pack/huggingface.bsky.social/3laz5x7naiz22

upvoted an article about 23 hours ago

🤗 Serve Anything with Inference Endpoints + Custom Handlers

Reacted to merve's post with 🔥 1 day ago

What a week! A recap of everything you missed ❄️ https://huggingface.co/collections/merve/nov-22-releases-673fbbcfc1c97c4f411def07

Multimodal ✨
> Mistral AI released Pixtral 124B, a gigantic open vision language model
> Llava-CoT (formerly known as Llava-o1) was released, a multimodal reproduction of the o1 model by PKU
> OpenGVLab released MMPR, a new multimodal reasoning dataset
> Jina released Jina-CLIP-v2, 0.98B multilingual multimodal embeddings
> Apple released AIMv2, new SotA vision encoders

LLMs 🦙
> AllenAI dropped a huge release of models, datasets and scripts for Tülu, a family of models based on Llama 3.1 aligned with SFT, DPO and a new technique they developed called RLVR
> Jina released embeddings-v3: new multilingual embeddings with longer context
> Hugging Face released SmolTalk: the synthetic dataset used to align SmolLM2 with supervised fine-tuning
> Microsoft released orca-agentinstruct-1M-v1: a gigantic instruction dataset of 1M synthetic instruction pairs

Image Generation 🖼️
> Black Forest Labs released FLUX.1 Tools: four new models for different image modifications and two LoRAs for image conditioning and better steering of generations

Lastly, Hugging Face released a new library, Observers: a lightweight SDK for monitoring interactions with AI APIs and easily storing and browsing them 📚
$ pip install observers

Articles

Trick or ResNet Treat

Mamba Out

Tiny Test Models

Searching for better (Full) ImageNet ViT Baselines

MobileNet Baselines

MobileNet-V4 (now in timm)

rwightman's activity

upvoted an article about 23 hours ago

Article

🤗 Serve Anything with Inference Endpoints + Custom Handlers

1 day ago

• 2

upvoted a collection about 2 months ago

RDNet

DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16 • 3

upvoted a collection 2 months ago

timm tiny test models

A collection of very small (~300-500k parameter) models at 160x160 resolution, for testing purposes. Trained on ImageNet-1k. • 13 items • Updated Oct 2 • 3

upvoted 2 articles 4 months ago

Article

MobileNet Baselines

Jul 26

• 23

Article

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Jul 25

• 18

upvoted a collection 4 months ago

🍃 MINT-1T

Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 54

upvoted 2 papers 5 months ago

PaliGemma: A versatile 3B VLM for transfer

Paper • 2407.07726 • Published Jul 10 • 67

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24 • 57

upvoted a collection 5 months ago

Cambrian Data

3 items • Updated Jun 25 • 9

upvoted a paper 5 months ago

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11 • 55

upvoted 2 collections 5 months ago

MobileCLIP Models + DataCompDR Data

MobileCLIP: Mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: Improved datasets for training image-text SOTA models. • 22 items • Updated Oct 4 • 25

MobileNetV4 pretrained weights

Weights for MobileNet-V4 pretrained in timm • 17 items • Updated Sep 22 • 17

upvoted 2 papers 6 months ago

MobileNetV4 -- Universal Models for the Mobile Ecosystem

Paper • 2404.10518 • Published Apr 16 • 2

On the Efficiency of Convolutional Neural Networks

Paper • 2404.03617 • Published Apr 4 • 4

upvoted 3 articles 6 months ago

Article

MobileNet-V4 (now in timm)

Jun 17

• 39

Article

Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task

May 16

• 17

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14

• 210

upvoted 2 collections 6 months ago

PaliGemma Release

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137

PaliGemma FT Models

108 items • Updated Jul 31 • 27

upvoted a collection 7 months ago

Searching for Better ViT Baselines

Exploring ViT hparams and model shapes for the GPU poor (between tiny and base). • 25 items • Updated Aug 21 • 13