
Aritra Roy Gosthipaty

ariG23498

https://arig23498.github.io/

AI & ML interests

Deep Representation Learning

Recent Activity

liked a dataset about 22 hours ago

data-is-better-together/image-preferences

upvoted a collection 1 day ago

AIMv2

updated a model 1 day ago

AIDC-AI/Marco-o1



ariG23498's activity

Reacted to thomwolf's post with 🔥 3 days ago

Post


Very exciting new mistralai/Pixtral-Large-Instruct-2411 model from Mistral-AI

Impressive performance, huge congrats to @patrickvonplaten @sgvaze @pandora-s @devendrachaplot @sophiamyang and team!

Very nice to have SOTA multilingual OCR and chart understanding in an open-weights model.

posted an update 11 days ago

Post


Qwen/qwen25-66e81a666513e518adb90d9e

Qwen/Qwen2.5-Coder-Artifacts

Qwen/Qwen2.5-Coder-demo

Reacted to m-ric's post with 🚀 30 days ago

Post


🌟🌎 Cohere releases Aya 8B & 32B: SOTA multilingual models for 23 languages!

How did they manage to beat top contenders while also adding 23 languages?

🔄 𝗧𝗿𝗮𝗶𝗻 𝗼𝗻 𝘀𝘆𝗻𝘁𝗵𝗲𝘁𝗶𝗰 𝗱𝗮𝘁𝗮:
• Synthetic data has been reported to cause model collapse after too much training
• Cohere has introduced "data arbitrage" to prevent this by strategically sampling from a pool of several teacher models instead of a single teacher
• First, train a pool of models for each group of languages, and employ an internal reward model named "Arbiter" to evaluate and select the optimal generation. Only the best generation is kept as the final completion for each prompt
➡️ This process is particularly effective in multilingual settings, where no single teacher model performs well in all languages: here "Multilingual Arbitrage" single-handedly improves the win rate of the 8B model vs Gemma-2-9B by 10 points!
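The arbitrage step can be sketched roughly as below. This is a simplified illustration, not Cohere's pipeline: `teachers` and `arbiter` are hypothetical stand-ins for the teacher model pool and the internal "Arbiter" reward model, and a single pool is used for all prompts rather than one pool per language group.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a teacher maps a prompt to a completion, and the
# arbiter (reward model) scores a (prompt, completion) pair.
Teacher = Callable[[str], str]
Arbiter = Callable[[str, str], float]


def arbitrage(
    prompts: List[str],
    teachers: Dict[str, Teacher],
    arbiter: Arbiter,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, best_completion, teacher_name) for every prompt."""
    dataset = []
    for prompt in prompts:
        # Sample one candidate completion from every teacher in the pool.
        candidates = [(name, teach(prompt)) for name, teach in teachers.items()]
        # Keep only the candidate the arbiter scores highest.
        best_name, best_completion = max(
            candidates, key=lambda cand: arbiter(prompt, cand[1])
        )
        dataset.append((prompt, best_completion, best_name))
    return dataset


# Toy usage: two "teachers" and an arbiter that simply prefers longer answers.
def shout(prompt: str) -> str:
    return prompt.upper()


def excite(prompt: str) -> str:
    return prompt + "!!!"


def length_arbiter(prompt: str, completion: str) -> float:
    return float(len(completion))


toy_teachers = {"teacher_a": shout, "teacher_b": excite}
print(arbitrage(["hola"], toy_teachers, length_arbiter))
# [('hola', 'hola!!!', 'teacher_b')]
```

In a real setup the arbiter would be a learned reward model and each teacher an LLM; only the selection logic is shown here.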

🧩 𝗨𝘀𝗲 𝗺𝗼𝗱𝗲𝗹 𝗺𝗲𝗿𝗴𝗶𝗻𝗴: Rather than struggling to find the right data mix for training a single multilingual model, just train language-specific models, then merge them!
• Maximize diversity between merged checkpoints by training each on a different language family.
• Experimented with fancy techniques (SLERP, TIES, DARE-TIES) but found weighted averaging to be the most consistent!
➡️ Merging brought 3x more gains at the 35B scale than at the 8B scale, consistent with findings in the literature that merging is more effective at scale
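Weighted averaging itself is simple. A minimal sketch, assuming each checkpoint is a hypothetical dict mapping parameter names to flat lists of floats (real checkpoints would hold tensors) and that all checkpoints share the same architecture:

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def weighted_average_merge(
    checkpoints: Sequence[Checkpoint], weights: Sequence[float]
) -> Checkpoint:
    """Merge same-architecture checkpoints by a weighted average of each parameter."""
    if len(checkpoints) != len(weights):
        raise ValueError("need exactly one weight per checkpoint")
    total = sum(weights)
    # Normalize the weights so they sum to 1.
    norm = [w / total for w in weights]
    merged: Checkpoint = {}
    for name in checkpoints[0]:
        values = [ckpt[name] for ckpt in checkpoints]
        merged[name] = [
            sum(w * v[i] for w, v in zip(norm, values))
            for i in range(len(values[0]))
        ]
    return merged


# Two toy "language-family" checkpoints with a single parameter each.
ckpt_a = {"layer.weight": [1.0, 2.0]}
ckpt_b = {"layer.weight": [3.0, 6.0]}
merged = weighted_average_merge([ckpt_a, ckpt_b], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 4.0]}
```

Unequal weights let one language family dominate the merge; how Cohere chose its weights is not stated in the post.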

⚡️ 𝗚𝗿𝗲𝗮𝘁 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲: Automatic evaluations on Arena-Hard-Auto dataset:
➡️ Aya Expanse 8B beats models from its weight class such as Gemma 2 9B, Llama 3.1 8B, and the recent Ministral 8B, with win rates ranging from 60.4% to 70.6%
➡️ Aya Expanse 32B outperforms Gemma 2 27B, Mixtral 8x22B, and Llama 3.1 70B (2x its size)
• ⚠️ But this performance eval comes from only one benchmark! Let's wait for Open LLM Leaderboard evals.

🔒 CC-BY-NC license

Blog post here: https://huggingface.co/blog/aya-expanse

posted an update 30 days ago

Post


Cohere drops two new multilingual models!

CohereForAI/aya-expanse-8b
CohereForAI/aya-expanse-32b

Try them out here:

CohereForAI/aya_expanse

Reacted to reach-vb's post with 🔥 about 1 month ago

Post


Multimodal Ichigo Llama 3.1 - Real Time Voice AI 🔥

> WhisperSpeech X Llama 3.1 8B
> Trained on 50K hours of speech (7 languages)
> Continually trained on 45hrs 10x A1000s
> MLS -> WhisperVQ tokens -> Llama 3.1
> Instruction tuned on 1.89M samples
> 70% speech, 20% transcription, 10% text
> Apache 2.0 licensed ⚡

Architecture:
> WhisperSpeech/VQ for semantic tokens
> Llama 3.1 8B Instruct for Text backbone
> Early fusion (Chameleon)
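Early fusion means the speech tokens and the text tokens share one vocabulary and are fed to a single decoder. A rough sketch of how discrete VQ codes could be shifted into an extended Llama vocabulary; the vocabulary size, codebook size, and start/end marker IDs below are assumptions for illustration, not values taken from the Ichigo release.

```python
from typing import List

# Assumed sizes for illustration only (not taken from the Ichigo release).
TEXT_VOCAB_SIZE = 128_256  # assumed size of the text tokenizer's vocabulary
NUM_SOUND_CODES = 512      # hypothetical size of the VQ codebook

# Hypothetical special tokens appended right after the sound codes.
SOUND_START = TEXT_VOCAB_SIZE + NUM_SOUND_CODES
SOUND_END = SOUND_START + 1
EXTENDED_VOCAB_SIZE = SOUND_END + 1


def sound_codes_to_ids(codes: List[int]) -> List[int]:
    """Shift VQ codes past the text vocabulary and wrap them in start/end markers."""
    for code in codes:
        if not 0 <= code < NUM_SOUND_CODES:
            raise ValueError(f"code {code} is outside the codebook")
    return [SOUND_START] + [TEXT_VOCAB_SIZE + c for c in codes] + [SOUND_END]


def early_fusion_sequence(
    text_before: List[int], codes: List[int], text_after: List[int]
) -> List[int]:
    """Build one flat token sequence mixing text IDs and sound IDs for a single decoder."""
    return text_before + sound_codes_to_ids(codes) + text_after


print(early_fusion_sequence([1, 2], [0, 3], [5]))
# [1, 2, 128768, 128256, 128259, 128769, 5]
```

Under this scheme the model's embedding and output layers would need to be resized to `EXTENDED_VOCAB_SIZE` before training.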

I'm super bullish on HomeBrew/ Jan and early fusion, audio and text, multimodal models!

(P.S. Play with the demo on Hugging Face: jan-hq/Ichigo-llama3.1-s-instruct)

Reacted to merve's post with 🤗 3 months ago

Post


amazing leaderboard by @rwightman: compare all the image backbones on various metrics against model performance

for example, you can plot top-k accuracy against inference samples per second
timm/leaderboard

Reacted to joaogante's post with 🤗 3 months ago

Post


New sampling strategy dropped in 🤗 transformers -- Min P sampling 🔥

Are you tired of having top_k arbitrarily discarding high-quality continuations? Or top_p forgetting to exclude low-probability tokens, derailing your generation? Try out the new min_p flag in generate, fresh from a PR merged today! 🥬

Min P is a dynamic token filter, as opposed to Top K, which keeps the K most likely tokens, and Top P, which keeps the most likely tokens up to a fixed cumulative probability; both of those are static filters. Min P takes a base probability (set via the min_p flag) and multiplies it by the probability of the most likely next token. All tokens less likely than the resulting value are filtered out. What happens with this strategy?
👉 High probability token present -> aggressive filter (we don't want to miss on that high-probability case and risk derailing generation)
👉 No high probability token present -> relaxed filter (there are many continuation possibilities that the model finds plausible)

You should set min_p to a low value, between 0.05 and 0.1. It behaves particularly well for creative text generation when paired up with temperature > 1.
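A minimal pure-Python sketch of the filter described above, operating on a hypothetical dict of next-token probabilities (the function name `min_p_filter` is ours, not a transformers API). It keeps every token whose probability is at least `min_p` times the top probability and renormalizes the survivors; transformers instead masks the filtered tokens in the logits, which has the same effect after the softmax.

```python
from typing import Dict


def min_p_filter(probs: Dict[str, float], min_p: float) -> Dict[str, float]:
    """Drop tokens below min_p * max probability, then renormalize the rest."""
    # The threshold scales with the most likely token: this is what makes it dynamic.
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


# Confident distribution: the threshold is high, so the filter is aggressive.
confident = {"Paris": 0.9, "Lyon": 0.06, "the": 0.04}
print(min_p_filter(confident, 0.1))  # {'Paris': 1.0}

# Flat distribution: the threshold is low, so every candidate survives.
flat = {"a": 0.3, "b": 0.25, "c": 0.25, "d": 0.2}
print(sorted(min_p_filter(flat, 0.1)))  # ['a', 'b', 'c', 'd']
```

With 🤗 transformers, the equivalent is passing `min_p` (e.g. `min_p=0.05`) to `generate()` together with `do_sample=True`.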

Kudos to @kalomaze and @menhguin for creating this technique 🔥 Read their discussion in the original issue for benchmarks (https://github.com/huggingface/transformers/issues/27670)

A copy-pasteable version of the example is here: https://pastebin.com/VqXNtuxd

Have fun experimenting! 😎

Reacted to merve's post with 😎 3 months ago

Post


a new shape-optimized SigLIP just dropped 👀 google/siglip-so400m-patch14-224

posted an update 3 months ago

Post


You can now use DoRA for your embedding layers!

PR: https://github.com/huggingface/peft/pull/2006

I have documented my journey with this PR in a blog post for everyone to read. The highlight was when the first author of DoRA reviewed my code.

Blog Post: https://huggingface.co/blog/ariG23498/peft-dora

Huge thanks to @BenjaminB for all the help I needed.
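For context, DoRA decomposes a weight into a magnitude vector m and a direction, W' = m · (W0 + ΔW) / ‖W0 + ΔW‖, where ΔW is the usual low-rank LoRA update. Below is a plain-Python sketch of that recomposition for an embedding matrix. The helper names are hypothetical, and taking the norm per row (one vector per token) is an assumption made for illustration; PEFT's actual implementation may use a different axis and layout.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def row_norms(w: Matrix) -> List[float]:
    """L2 norm of every row (assumed here: one row per token embedding)."""
    return [math.sqrt(sum(x * x for x in row)) for row in w]


def dora_embedding_weight(
    w0: Matrix, lora_a: Matrix, lora_b: Matrix, magnitude: List[float], scaling: float = 1.0
) -> Matrix:
    """Recompose W' = m * V / ||V|| with V = W0 + scaling * (A @ B), normalized per row.

    Assumes no row of V is all zeros.
    """
    delta = matmul(lora_a, lora_b)
    v = [[w + scaling * d for w, d in zip(wr, dr)] for wr, dr in zip(w0, delta)]
    norms = row_norms(v)
    return [[m * x / n for x in row] for m, n, row in zip(magnitude, norms, v)]


# With a zero low-rank update (as at initialization, where one factor starts at
# zero) and m set to the norms of W0, DoRA reproduces the base weight exactly.
w0 = [[3.0, 4.0], [0.0, 2.0]]       # 2 tokens, embedding dim 2
lora_a = [[0.5], [1.0]]             # hypothetical rank-1 factor (tokens x r)
lora_b = [[0.0, 0.0]]               # hypothetical rank-1 factor (r x dim), zero here
print(dora_embedding_weight(w0, lora_a, lora_b, row_norms(w0)))
# [[3.0, 4.0], [0.0, 2.0]]
```

Whatever the update, each recomposed row has norm equal to its entry in `magnitude`, which is what lets DoRA train magnitude and direction separately.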

Reacted to rishiraj's post with 🤗 11 months ago

Post

Hugging Face 🤗



Articles

Faster Text Generation with Self-Speculative Decoding

Hugging Face Welcomes the Qwen2.5-Coder Series

PyTorchModelHubMixin: Bridging the Gap for Custom AI Models on Hugging Face

Hugging Face welcomes the Aya Expanse family of multilingual models

🧨 Diffusers welcomes Stable Diffusion 3.5 Large

Llama can now see and run on your device - welcome Llama 3.2

Understanding Vector Quantization in VQ-VAE

Building DoRA Support for Embedding Layers in PEFT

How to communicate in a Pull Request?

The Workflow of PEFT

Announcing New Hugging Face and KerasHub integration

Conditional Probability

What is Probability?

Counting 'n' objects
