Official ICC model [ACL 2024 Findings]
The official checkpoint of the ICC model, introduced in "ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation".
Usage
The ICC model quantifies the concreteness of image captions. Its intended use is selecting the best captions in a noisy multimodal dataset: run the model over the captions and filter out samples with low scores. It works best in conjunction with CLIP-based filtering.
Running the model
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the ICC checkpoint from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("moranyanuka/icc")
model = AutoModelForSequenceClassification.from_pretrained("moranyanuka/icc").to("cuda")

captions = ["a great method of quantifying concreteness", "a man with a white shirt"]
# Tokenize the batch, padding to the longest caption
text_ids = tokenizer(captions, padding=True, return_tensors="pt", truncation=True).to("cuda")

with torch.inference_mode():
    # One logit per caption; a higher score indicates a more concrete caption
    icc_scores = model(**text_ids)["logits"]
# tensor([[0.0339], [1.0068]])
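The filtering step described under Usage can be sketched in plain Python once the scores are available. This is a minimal illustration, not part of the official release: the helper `filter_captions`, the ICC threshold of 0.5, the CLIP threshold of 0.3, and the CLIP similarity values are all hypothetical and would need tuning for a real dataset. The ICC scores are the two values from the example output above.

```python
from typing import Optional, Sequence


def filter_captions(
    captions: Sequence[str],
    icc_scores: Sequence[float],
    icc_threshold: float = 0.5,  # assumed cutoff, not from the paper
    clip_scores: Optional[Sequence[float]] = None,
    clip_threshold: float = 0.3,  # assumed cutoff, not from the paper
) -> list:
    """Keep captions whose ICC score (and, if given, CLIP score) meets its threshold."""
    if len(captions) != len(icc_scores):
        raise ValueError("captions and icc_scores must have the same length")
    if clip_scores is not None and len(clip_scores) != len(captions):
        raise ValueError("clip_scores must match captions in length")

    kept = []
    for i, caption in enumerate(captions):
        if icc_scores[i] < icc_threshold:
            continue  # caption is too abstract
        if clip_scores is not None and clip_scores[i] < clip_threshold:
            continue  # caption does not match its image well enough
        kept.append(caption)
    return kept


captions = ["a great method of quantifying concreteness", "a man with a white shirt"]
# Scores copied from the example output above, e.g. icc_scores.squeeze(-1).tolist()
scores = [0.0339, 1.0068]

icc_only = filter_captions(captions, scores)
# Hypothetical CLIP image-text similarities, for illustration only
icc_and_clip = filter_captions(captions, scores, clip_scores=[0.35, 0.10])
```

With these inputs, ICC-only filtering keeps just the second caption, while the combined filter also drops it because its hypothetical CLIP similarity falls below the assumed cutoff.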
BibTeX:
@misc{yanuka2024icc,
title={ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation},
author={Moran Yanuka and Morris Alper and Hadar Averbuch-Elor and Raja Giryes},
year={2024},
eprint={2403.01306},
archivePrefix={arXiv},
primaryClass={cs.LG}
}