---
library_name: transformers
pipeline_tag: image-text-to-text
datasets:
- Vikhrmodels/LLaVA-Instruct-ru
language:
- ru
license: apache-2.0
tags:
- multimodal
- vision
- image-text-to-text
---
# Model Card for ruIdefics2-ruLLaVA-merged
A Russian-language version of Idefics2, trained on a Russified subset of LLaVA.

First version: the ability to hold a multi-turn dialogue is weak, and image splitting (the LLaVA-Next-like approach) does not work. SFT was done without any text-only data, so a quality drop on text-only inputs is quite possible.

Training was done in int4 with QLoRA on consumer-grade hardware; a minimal sketch of such a setup is shown below. Future iterations are planned to add more data and train on larger hardware.

Training/inference scripts will be added later.
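Until those scripts are published, here is a minimal sketch of what an int4 QLoRA setup like the one described above could look like. The base checkpoint name, LoRA rank, and target modules are illustrative assumptions, not the actual training configuration.

```python
# Hypothetical sketch: the checkpoint name and hyperparameters below are assumptions
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# int4 (NF4) quantization of the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceM4/idefics2-8b",  # assumed base checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Small trainable low-rank adapters on top of the quantized weights
lora_config = LoraConfig(
    r=16,                  # assumed rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```

Because only the low-rank adapter weights are trained while the int4 base stays frozen, the memory footprint fits on consumer-grade GPUs.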
## Model Details

### Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Model type: ruIdefics2
- Language(s) (NLP): Russian
- License: Apache-2.0
- Finetuned from model: Idefics2
## How to Get Started

This section shows a code snippet for generation with `ruIdefics2-ruLLaVA-merged`; the input formatting follows the base `idefics2-8b` checkpoints. Let's first define some common imports and inputs.
```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq
from transformers.image_utils import load_image

DEVICE = "cuda:0"

# Load the example images directly from URLs
image1 = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
image2 = load_image("https://cdn.britannica.com/59/94459-050-DBA42467/Skyline-Chicago.jpg")
image3 = load_image("https://cdn.britannica.com/68/170868-050-8DDE8263/Golden-Gate-Bridge-San-Francisco.jpg")

processor = AutoProcessor.from_pretrained("GeorgeBredis/ruIdefics2-ruLLaVA-merged")
model = AutoModelForVision2Seq.from_pretrained(
    "GeorgeBredis/ruIdefics2-ruLLaVA-merged",
).to(DEVICE)
# Create inputs: one image placeholder in the message, so pass exactly one image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Что изображено на данной картинке?"},  # "What is shown in this picture?"
        ]
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

generated_ids = model.generate(**inputs, max_new_tokens=500)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(generated_texts)
```
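The processor also accepts several images per turn: the number of `{"type": "image"}` placeholders in the message must match the number of images passed to the processor, in the same order. A minimal sketch reusing the images loaded above (the question text is just an example):

```python
# Two image placeholders -> pass exactly two images, in the same order
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "image"},
            {"type": "text", "text": "Чем отличаются эти две картинки?"},  # "What is different between these two pictures?"
        ]
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image1, image2], return_tensors="pt")
inputs = {k: v.to(DEVICE) for k, v in inputs.items()}

generated_ids = model.generate(**inputs, max_new_tokens=500)
# Slice off the prompt so only the model's answer is decoded
answer_ids = generated_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(answer_ids, skip_special_tokens=True))
```

Slicing off the prompt tokens before decoding avoids echoing the whole chat template in the output.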