---
license: apache-2.0
tags:
- image-captioning
languages:
- en
pipeline_tag: image-to-text
datasets:
- michelecafagna26/hl-narratives
language:
- en
metrics:
- sacrebleu
- rouge
library_name: transformers
---

## BLIP-base fine-tuned for Narrative Image Captioning

[BLIP](https://arxiv.org/abs/2201.12086) base model fine-tuned on the [HL Narratives](https://huggingface.co/datasets/michelecafagna26/hl-narratives) dataset for **high-level narrative description generation**.

## Model fine-tuning 🏋️

- Trained for 3 epochs
- lr: 5e-5
- Adam optimizer
- half-precision (fp16)
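
The settings above can be collected into a small configuration object. This is only a minimal sketch: the class `FineTuneConfig` and the helper `total_optimizer_steps` are hypothetical names, and the dataset size and batch size in the usage line are illustrative placeholders, since neither is stated in this card.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this model card (hypothetical container)."""
    epochs: int = 3
    learning_rate: float = 5e-5
    optimizer: str = "adam"
    fp16: bool = True


def total_optimizer_steps(cfg: FineTuneConfig, num_examples: int, batch_size: int) -> int:
    """Number of optimizer updates, assuming one update per batch
    (no gradient accumulation)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return cfg.epochs * steps_per_epoch


cfg = FineTuneConfig()
# Placeholder numbers: the card gives neither the dataset size nor the batch size.
print(total_optimizer_steps(cfg, num_examples=1000, batch_size=32))
```

With these placeholder values, one epoch takes 32 batches, so three epochs amount to 96 updates.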

## Test set metrics 🧾

| CIDEr | SacreBLEU | ROUGE-L |
|-------|-----------|---------|
| 79.39 | 11.70     | 26.17   |
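
To give an intuition for what the ROUGE-L column measures, here is a simplified sketch of the metric as an F-measure over the longest common subsequence (LCS) of tokens. It is not the scorer used to produce the numbers above: tokenization here is plain lowercase whitespace splitting, the reference sentence is made up for illustration, and the result is on a 0 to 1 scale, whereas the table values appear to be on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L: F1 of LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical candidate/reference pair, for illustration only.
score = rouge_l_f1(
    "she is holding an umbrella near a lake",
    "she is on vacation near a lake",
)
print(round(score, 4))  # 0.6667
```

The two sentences share the subsequence "she is near a lake" (5 tokens), giving a precision of 5/8 and a recall of 5/7.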

## Model in Action

```python
import requests
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Model id as given in this card; prefix it with the owner's Hub namespace if needed.
model_id = "blip-base-captioning-ft-hl-narratives"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)

img_url = 'https://datasets-server.huggingface.co/assets/michelecafagna26/hl/--/default/train/0/image/image.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')

# Preprocess the image into model-ready pixel values.
inputs = processor(raw_image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

# Sample one caption with combined top-k and nucleus (top-p) sampling.
generated_ids = model.generate(pixel_values=pixel_values, max_length=50,
                               do_sample=True,
                               top_k=120,
                               top_p=0.9,
                               early_stopping=True,
                               num_return_sequences=1)

# batch_decode returns a list with one string per generated sequence.
captions = processor.batch_decode(generated_ids, skip_special_tokens=True)
print(captions[0])
# Example output (sampling is random, so results vary):
# she is holding an umbrella near a lake and is on vacation.
```
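
The `top_k=120` and `top_p=0.9` arguments restrict which tokens can be sampled at each decoding step. The toy sketch below illustrates the idea on a four-token distribution: keep the `top_k` most probable tokens, then keep the smallest prefix of them whose renormalized probability mass reaches `top_p`. The function `top_k_top_p_filter` is a hypothetical helper written for this illustration; it is not the `transformers` implementation, which operates on logits.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Return {token_index: renormalized probability} for the tokens kept."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Top-k: keep at most k candidates.
    ranked = ranked[:top_k]
    total_k = sum(probs[i] for i in ranked)
    # Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / total_k
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy next-token distribution over a 4-token vocabulary (illustrative values).
toy_probs = [0.6, 0.25, 0.1, 0.05]
filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(sorted(filtered))  # [0, 1]

# Draw the next token from the filtered, renormalized distribution.
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
```

Here top-k drops token 3, and top-p then drops token 2 because tokens 0 and 1 already cover more than 80% of the remaining mass.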

## BibTeX and citation info

```BibTeX
@inproceedings{cafagna2023hl,
  title={{HL} {D}ataset: {V}isually-grounded {D}escription of {S}cenes, {A}ctions and {R}ationales},
  author={Cafagna, Michele and van Deemter, Kees and Gatt, Albert},
  booktitle={Proceedings of the 16th International Natural Language Generation Conference (INLG'23)},
  address={Prague, Czech Republic},
  year={2023}
}
```