@VictorSanh on Hugging Face: "Can't wait to see multimodal LLama 3! We released a resource that might come…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

VictorSanh

posted an update Apr 19

Post

2538

Can't wait to see multimodal LLama 3!

We released a resource that might come in handy: The Cauldron 🍯

The Cauldron is a massive manually-curated collection of 50 vision-language sets for instruction fine-tuning. 3.6M images, 30.3M query/answer pairs.

It covers a large variety of downstream uses: visual question answering on natural images, OCR, document/charts/figures/tables understanding, textbooks/academic question, reasoning, captioning, spotting differences between 2 images, and screenshot-to-code.

HuggingFaceM4/the_cauldron

Nitral-AI

Apr 22

•

edited Apr 22

weizhiwang/LLaVA-Llama-3-8B

First llava 1.5 llama 3 pretrain, managed to make a projector file out of it that works with any llama 3 8b model. (this can be used with any backend that supports llava mmproject.)

ChaoticNeutrals/Llava_1.5_Llama3_mmproj

In this post

VictorSanh Victor Sanh
Nitral-AI Nitral