|
--- |
|
library_name: peft |
|
base_model: ybelkada/blip2-opt-2.7b-fp16-sharded |
|
license: apache-2.0 |
|
pipeline_tag: image-to-text |
|
--- |
|
|
|
# Model Card for blip2-OCR-QA-generation
|
|
|
|
A LoRA adapter for BLIP-2 that generates question-answer (QA) pairs from an input image.
|
|
|
|
|
|
|
## Inference Demo
|
|
|
```python
from datasets import load_dataset
from peft import PeftModel
import torch
from transformers import AutoProcessor, Blip2ForConditionalGeneration

# Load the base BLIP-2 model in 8-bit and apply the LoRA adapter
processor = AutoProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "ybelkada/blip2-opt-2.7b-fp16-sharded", device_map="auto", load_in_8bit=True
)
model = PeftModel.from_pretrained(model, "curlyfu/blip2-OCR-QA-generation")

# Prepare an example image from the OCR-VQA test split
dataset = load_dataset("howard-hou/OCR-VQA", split="test")
example = dataset[10]
image = example["image"]

inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
pixel_values = inputs.pixel_values

# Generate a QA pair from the image
generated_ids = model.generate(pixel_values=pixel_values, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)
```
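
If you prefer a standalone checkpoint, the adapter can be folded into the base model with PEFT's `merge_and_unload`. A minimal sketch, assuming the base model is loaded in fp16 rather than 8-bit (merging LoRA weights into quantized layers is not supported); the output directory name is illustrative:

```python
from peft import PeftModel
import torch
from transformers import Blip2ForConditionalGeneration

# Load the base model in fp16; skip load_in_8bit here because merging
# adapter weights into 8-bit quantized layers is not supported.
base = Blip2ForConditionalGeneration.from_pretrained(
    "ybelkada/blip2-opt-2.7b-fp16-sharded",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "curlyfu/blip2-OCR-QA-generation")

# Fold the adapter weights into the base model and drop the PEFT wrapper
merged = model.merge_and_unload()

# Hypothetical output directory; the result reloads as a plain
# Blip2ForConditionalGeneration via from_pretrained.
merged.save_pretrained("blip2-ocr-qa-merged")
```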
|
|
|
## Thanks |
|
[huggingface/notebooks](https://github.com/huggingface/notebooks)