
Sourab Mangrulkar

smangrul

https://www.linkedin.com/in/sourab-m/

pacman100

AI & ML interests

Machine Learning, Deep Learning, Natural Language Processing, Natural Language Generation, Computer Vision, Reinforcement Learning

Recent Activity

updated a Space 10 days ago

smangrul/PEFT-Docs-QA-Chatbot


Articles

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023

• 93

Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU

Mar 9, 2023

• 34

🤗 PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware

Feb 10, 2023

• 37

Accelerate Large Model Training using DeepSpeed

Jun 28, 2022

• 2

Accelerate Large Model Training using PyTorch Fully Sharded Data Parallel

May 2, 2022

• 1

Organizations

smangrul's activity

posted an update 7 months ago

Post

3195

Unlocking the Power of locally running Llama-3 8B Model Agents with Chat-UI! 🔥🚀✨

I'm thrilled to share my hackathon-style side project:
1. Fine-tuning Llama-3 8B for function calling using PEFT QLoRA, as the instruct Llama-3 model doesn't support this. The Colab notebook for it is here: https://lnkd.in/ggJMzqh2. 🛠️
2. The fine-tuned model along with the 4-bit quants is here: https://lnkd.in/gNpFKY6V ✨
3. Clone Hugging Face https://lnkd.in/gKBKuUBQ and make it compatible with function calling by building upon the PR https://lnkd.in/gnqFuAd4 for my model and a local inference use case using Ollama. This was a steep learning curve, and I stayed awake the whole night to get it working. 💪🏽
4. For the above, I used SerpAPI for web browsing and the MongoDB Atlas free tier for persisting conversations and assistant configs. 🔎
5. More work is required on switching between using tools and responding directly, which is where I see the model break. 🧐

How cool is it that we are approaching an experience akin to ChatGPT with a locally hosted agent model running on your laptop! 💻
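
To illustrate the routing problem in point 5, here is a minimal sketch of deciding between running a tool and answering directly. It assumes the model emits tool calls as a JSON object with "name" and "arguments" keys. That format, the web_search stub and the route helper are all hypothetical and are not taken from the notebook or from Chat-UI.

```python
import json

# Hypothetical tool; a real setup would call a search API here.
def web_search(query: str) -> str:
    return f"(stub) top results for: {query}"

# Hypothetical registry mapping tool names to callables.
TOOLS = {"web_search": web_search}

def route(model_output: str) -> str:
    """Run a tool if the output is a recognised JSON tool call, else return the text."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain-text answer, no tool needed
    if not isinstance(call, dict):
        return model_output  # valid JSON, but not a tool call
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return model_output  # unknown tool: fall back to the raw text
    return TOOLS[name](**call.get("arguments", {}))
```

A model that mixes prose and JSON in one reply would fall through to the plain-text branch, which is one way the tool/direct-answer switch can break.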

1 reply

Reacted to vikhyatk's post with ❤️ 8 months ago

Post

2228

Just released a dataset with 1.5M image question/answers! vikhyatk/lnqa

Reacted to Titus-von-Koeller's post with ❤️🤗 8 months ago

Post

We just released bitsandbytes==0.43.0 📦 , with these significant new additions:

‣ 🛫 FSDP+QLoRA support (alpha release)
◦ now anyone with 2 powerful gaming GPUs can fine-tune 70B param models at home!
◦ in collab with Jeremy Howard + team @ answer.ai
◦ answer.ai blogpost: https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html
◦ example repo: https://github.com/AnswerDotAI/fsdp_qlora/

‣ 🌈⊞ Official Windows support
◦ now via a simple pip install "bitsandbytes>=0.43.0"

‣ 📄 Huge docs update:
◦ https://huggingface.co/docs/bitsandbytes/main
◦ Be sure to check out the optimizers and the API docs
◦ ... even more upcoming ...

Under the hood, there are many other improvements thanks to extensive maintenance activity, community contributions by super active + knowledgeable volunteers ✨ 🚀 and the official sponsorship by Hugging Face that makes all this possible 🤗 ❤️ 🌍

We would greatly appreciate any further community contributions, be it help with refactorings, exterminating flaky tests, writing docstrings, tutorials, or new features. Don't be shy, just contact us and we'll see where this leads:
https://github.com/TimDettmers/bitsandbytes/discussions

Have a great weekend everyone!

1 reply

posted an update 8 months ago

Post

2810

🤗 PEFT v0.10.0 release! 🔥🚀✨

Some highlights: 📝
1. FSDP+QLoRA and DeepSpeed Stage-3+QLoRA
2. Layer expansion + LoRA
3. DoRA support for Conv2D layers and quantized bitsandbytes layers
4. New LoftQ utility
5. Batched inference for mixed LoRA adapters.

The Answer.AI team, in collaboration with bitsandbytes and Hugging Face 🤗, open-sourced code enabling the use of FSDP+QLoRA and explained the whole process in their insightful blog post https://lnkd.in/g6jgfXyv. This is now integrated into the Hugging Face ecosystem.

For an end-to-end example of FSDP+QLoRA, please refer to https://lnkd.in/gT3yY-Rx.

For an end-to-end example of DeepSpeed Stage-3+QLoRA, please refer to https://lnkd.in/gkt-xZRE.

With the PR https://lnkd.in/g5F348MN, these changes are now upstreamed in https://lnkd.in/g5_MxYtY thanks to Wing Lian! 🚀

Kudos to the Answer.AI team, Titus von Köller, Younes Belkada, Benjamin Bossan and Zachary Mueller for all the help, without which this wouldn't have been possible. 🤗

For efficient depthwise layer expansion, akin to the passthrough method of mergekit but without using additional memory, and for attaching LoRAs to the expanded layers, refer to the details below! 🔥 https://lnkd.in/ge95ztjA

DoRA is now supported for Conv2D layers as well as bitsandbytes quantized layers ✨. For more details, please refer to the thread below.
https://lnkd.in/gsJbuWPD

Now you can mix different LoRA adapters in a batch during inference, which speeds up inference by avoiding computing the base model multiple times, as would be the case for adaptive inference with batch_size=1! ⚡️
Details below. https://lnkd.in/gD-pcX_B
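
To make the saving concrete, here is a minimal pure-Python sketch of the idea for a single linear layer: the base projection is computed once for the whole batch, and each sample then adds only the low-rank update of its own adapter. The helper names and the use of None for "no adapter" are illustrative assumptions, the LoRA scaling factor is omitted, and this is not PEFT's implementation.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_delta(adapter, x):
    """Low-rank LoRA update B @ (A @ x) for one sample (scaling omitted)."""
    return matvec(adapter["B"], matvec(adapter["A"], x))

def mixed_batch_forward(base_weight, adapters, batch, adapter_names):
    """Apply a different adapter to each sample of one batch."""
    # Base projection: one pass over the whole batch, shared by every adapter.
    base_out = [matvec(base_weight, x) for x in batch]
    outputs = []
    for x, y, name in zip(batch, base_out, adapter_names):
        if name is None:  # this sample uses the base model only
            outputs.append(y)
        else:  # add only the cheap low-rank update for this sample's adapter
            delta = lora_delta(adapters[name], x)
            outputs.append([a + b for a, b in zip(y, delta)])
    return outputs
```

With batch_size=1, the same three requests would need three separate passes through the base weights instead of one.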

LoftQ reduces quantization error by appropriately initializing the LoRA adapter weights. Normally, this is a two-step process. Benjamin Bossan added a new utility, replace_lora_weights_loftq, so that LoftQ can be used on the fly with bnb.

For more details, refer to the release notes: https://lnkd.in/gg7-AmHA 📝
As always, make sure losses go down and be happy to watch your model train!

1 reply

Reacted to lewtun's post with ❤️ 9 months ago

Post

Can we align code generation models to be good at chat without compromising their base capabilities 🤔?

This was the question the H4 team asked itself when BigCode released StarCoder2 a bit over a week ago. We knew that code models like deepseek-ai/deepseek-coder-6.7b-instruct and m-a-p/OpenCodeInterpreter-DS-33B get impressive scores on code benchmarks like HumanEval, but they tend to score poorly on chat benchmarks like MT Bench and IFEval. We also knew that the Zephyr recipe we applied to Mistral 7B produced a strong chat model, so we wondered: could it be tweaked to produce a strong coding assistant?

It turns out the answer is yes and I'm happy to share StarChat2, a DPO fine-tune of StarCoder2 15B that scores highly on both HumanEval and MT Bench / IFEval 🌟!

The most interesting lesson for me was that you get better models by blending in more code/math data than chat during the SFT step - in terms of tokens, we found a ratio of 3:1 worked best.

Anyway, here's a demo of the model, along with all the code and datasets we used to train it:

* Demo: HuggingFaceH4/starchat2-playground
* Collection: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
* Recipe: https://github.com/huggingface/alignment-handbook

Hope it's useful to others!

3 replies

replied to their post 9 months ago

cc @ybelkada for this question.

posted an update 9 months ago

Post

🚨 Now you can run Starcoder-2 models locally on your Mac M1 Pro Apple Silicon with 16GB memory! 🧑🏽‍💻 ⚡️✨

Below is the UX with the Twinny extension, using bigcode/starcoder2-3b for FIM and codellama/CodeLlama-7b-Instruct-hf for chat. Dev tools shows the prompt being sent to the Ollama server.

Starcoder-2 is now supported in llama.cpp https://github.com/ggerganov/llama.cpp/pull/5795!

cd llama.cpp
python convert-hf-to-gguf.py ../starcoder2-3b/ --outfile models/starcoder2-3b.gguf --outtype "f16"
./quantize models/starcoder2-3b.gguf models/starcoder2-3b-Q4_K_M.gguf Q4_K_M

For more details, please go through the following tweet thread: https://x.com/sourab_m/status/1764583139798823235?s=20

Reacted to loubnabnl's post with ❤️🤯🤗 9 months ago

Post

⭐ Today we’re releasing The Stack v2 & StarCoder2: a series of 3B, 7B & 15B code generation models trained on 3.3 to 4.5 trillion tokens of code:

- StarCoder2-15B matches or outperforms CodeLlama 34B, and approaches DeepSeek-33B on multiple benchmarks.
- StarCoder2-3B outperforms StarCoderBase-15B and similar sized models.
- The Stack v2 is a 4x larger dataset than The Stack v1, resulting in 900B unique code tokens 🚀
As always, we released everything from models and datasets to curation code. Enjoy!

🔗 StarCoder2 collection: bigcode/starcoder2-65de6da6e87db3383572be1a
🔗 Paper: https://drive.google.com/file/d/17iGn3c-sYNiLyRSY-A85QOzgzGnGiVI3/view
🔗 BlogPost: https://huggingface.co/blog/starcoder2
🔗 Code Leaderboard: bigcode/bigcode-models-leaderboard

posted an update 9 months ago

Post

🚨 New Release of 🤗PEFT!

1. New methods for merging LoRA weights. Refer to this HF post for more details: https://huggingface.co/posts/smangrul/850816632583824

2. AWQ and AQLM support for LoRA. You can now:
- Train adapters on top of 2-bit quantized models with AQLM
- Train adapters on top of powerful AWQ quantized models
Note: for inference, you can't merge the LoRA weights into the base model!

3. DoRA support: Enabling DoRA is as easy as adding use_dora=True to your LoraConfig. Find out more about this method here: https://arxiv.org/abs/2402.09353

4. Improved documentation, particularly docs regarding PEFT LoRA+DeepSpeed and PEFT LoRA+FSDP! 📄 Check out the docs at https://huggingface.co/docs/peft/index.

5. Full Release Notes: https://github.com/huggingface/peft/releases/tag/v0.9.0
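
As background for item 3, here is a minimal pure-Python sketch of the weight decomposition described in the DoRA paper: the adapted weight is a learned magnitude times the column-normalized direction of the base weight plus the low-rank update. The function names are illustrative, the update is passed in as a ready-made matrix instead of a B·A product, and this is not PEFT's implementation.

```python
import math

def column_norms(matrix):
    """Euclidean norm of each column of a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]

def dora_weight(w0, delta, magnitude):
    """Return magnitude * (w0 + delta) / ||w0 + delta||, normalized per column."""
    # Direction component: base weight plus the (LoRA-style) update.
    v = [[a + b for a, b in zip(r0, rd)] for r0, rd in zip(w0, delta)]
    norms = column_norms(v)
    # Rescale every column to the learned magnitude for that column.
    return [[magnitude[j] * v_ij / norms[j] for j, v_ij in enumerate(row)] for row in v]
```

If the magnitude is initialized to the column norms of the base weight and the update starts at zero, the adapted weight equals the base weight, so training starts from the pretrained model.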

4 replies

Reacted to merve's post with 🤝👍 9 months ago

Post

I've tried DoRA (https://arxiv.org/abs/2402.09353) with SDXL using PEFT, and the outputs are quite detailed 🤩🌟
As usual, I trained on the Lego dataset I compiled and compared the results with the previously trained pivotal-tuned model and the normal DreamBooth model before that 😊

Notebook by @linoyts https://colab.research.google.com/drive/134mt7bCMKtCYyYzETfEGKXT1J6J50ydT?usp=sharing
Integration to PEFT by @BenjaminB https://github.com/huggingface/peft/pull/1474 (more info in the PR)

posted an update 9 months ago

Post

Exciting news for Indic LLMs! 🚀

Sarvam AI just released a high-quality, curated dataset with multi-turn conversations in English, Hindi, and Hinglish! 💎 With a whopping 100K samples! 🤯
Check it out: sarvamai/samvaad-hi-v1

Who's going to fine-tune high-quality SFT models on this dataset? ✨
If you are interested in pushing the boundaries of Indic LLMs, join the Discord channel: https://discord.gg/hugging-face-879548962464493619

posted an update 9 months ago

Post

🚀 Exciting news from 🤗 PEFT!

We are introducing new merging methods for LoRA adapters. These methods allow for retaining the unique capabilities of individual LoRAs while enabling them to combine their strengths: https://huggingface.co/blog/peft_merging

We previously explored merging LoRA adapters in the context of a personal code copilot 🚀👾✨. Please go through the thread on it: https://x.com/sourab_m/status/1718008115726283004?s=20

The new merging methods ties, dare, and magnitude_prune are introduced alongside the existing methods cat, linear, and svd. The blog post details each method. These methods can be applied on the fly at inference time instead of merging offline, enabling a great developer UX. ✨

How do I merge my LoRA adapters?
Easy: use the class method add_weighted_adapter(). For example, below you can see how we can combine three LoRA adapters using the ties method. We can observe that the merged adapter retains the capabilities of the individual adapters!
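
For intuition about what the ties method does, here is a minimal pure-Python sketch on flat lists of adapter deltas: trim each vector to its largest-magnitude entries, elect a sign per parameter, then average only the values that agree with that sign. The helper names and the exact handling of weights and threshold ties are simplifying assumptions, and this is not PEFT's implementation.

```python
def trim(vector, density):
    """Keep roughly the top `density` fraction of entries by magnitude, zero the rest."""
    k = max(1, int(round(density * len(vector))))
    threshold = sorted((abs(v) for v in vector), reverse=True)[k - 1]
    # Entries tied with the threshold are all kept, so slightly more than k may survive.
    return [v if abs(v) >= threshold else 0.0 for v in vector]

def ties_merge(task_vectors, weights, density):
    """Merge several delta vectors with a simplified TIES procedure."""
    # Step 1: trim each vector, then apply its merge weight.
    trimmed = [[w * v for v in trim(vec, density)] for vec, w in zip(task_vectors, weights)]
    merged = []
    for values in zip(*trimmed):
        # Step 2: elect the sign with the larger total mass for this parameter.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Step 3: average only the non-zero values that agree with the elected sign.
        agreeing = [v for v in values if v * sign > 0]
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged
```

In PEFT itself, the merge is requested through add_weighted_adapter() by passing the adapter names, their weights, a name for the merged adapter, a combination type such as "ties", and a density value; see the PEFT documentation for the exact parameters.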

Now that we have seen that a merged adapter can retain the capabilities of individual LoRAs, how about use cases that require the capabilities of multiple LoRAs to be merged/combined? Below is an application of it in the text-to-image domain. 🖼️

Kudos to @prateeky2806 (TIES author) and Le Yu (DARE author) for their kind and generous guidance on the PRs! Also, if you want to explore full model merging, refer to super cool projects like https://github.com/arcee-ai/mergekit/tree/main, https://github.com/Gryphe/BlockMerge_Gradient and https://github.com/yule-BUAA/MergeLM/tree/main.

Excited to see what the community creates on top of this! 🚀✨ #LetsBuildTogether

Reacted to dvilasuero's post with 🤯🤗❤️ 9 months ago

Post

🤗 Data is better together!

Data is essential for training good AI systems. We believe that the amazing community built around open machine learning can also work on developing amazing datasets together.

To explore how this can be done, Argilla and Hugging Face are thrilled to announce a collaborative project where we’re asking Hugging Face community members to build a dataset consisting of LLM prompts collectively.

What are we doing?
Using an instance of Argilla — a powerful open-source data collaboration tool — hosted on the Hugging Face Hub, we are collecting ratings of prompts based on their quality.

How Can You Contribute?
It’s super simple to start contributing:

1. Sign up if you don’t have a Hugging Face account

2. Go to this Argilla Space and sign in: https://huggingface.co/spaces/DIBT/prompt-collective

3. Read the guidelines and start rating prompts!

You can also join the #data-is-better-together channel in the Hugging Face Discord.

Finally, to track the community progress we'll be updating this Gradio dashboard:

https://huggingface.co/spaces/DIBT/prompt-collective-dashboard

5 replies

Reacted to satpalsr's post with ❤️ 10 months ago

Post

Introducing Gajendra!

An early release of our 7B Hindi-Hinglish-English Instruction fine-tuned language model.

Model: BhabhaAI/Gajendra-v0.1

We additionally explore ways to filter examples that can be translated from English to Hindi and are releasing initial versions of both dataset and model for it.

Model: BhabhaAI/Mistral-translation-classify
Dataset: BhabhaAI/translation-classify

Looking forward to collaborating with the open-source community to accelerate and release Hindi LLMs.

2 replies

Articles

GaLore: Advancing Large Model Training on Consumer-grade Hardware

🤗 PEFT welcomes new merging methods

Mixture of Experts Explained

Personal Copilot: Train Your Own Coding Assistant

Fine-tuning Llama 2 70B using PyTorch FSDP

The Falcon has landed in the Hugging Face ecosystem
