arxiv:2108.08688

Contrastive Language-Image Pre-training for the Italian Language

Published on Aug 19, 2021

Authors: Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Gabriele Sarti, Sri Lakshmi

Abstract

CLIP (Contrastive Language-Image Pre-training) is a recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model on a different language is not trivial, since the data available in other languages may be insufficient and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
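To make the zero-shot classification task mentioned in the abstract concrete, the following is a minimal sketch of how a CLIP-style model typically assigns a label: the image embedding is compared with the embedding of each candidate caption by cosine similarity, and the similarities are turned into probabilities with a softmax. The embeddings below are hand-made toy vectors, the function names are hypothetical, and the temperature value is an assumption; none of this is taken from the CLIP-Italian code.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over temperature-scaled scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, label_embs, temperature=0.07):
    """Pick the caption whose embedding is closest to the image embedding.

    temperature=0.07 is an assumed value chosen for illustration.
    """
    labels = list(label_embs)
    sims = [cosine(image_emb, label_embs[name]) for name in labels]
    probs = softmax(sims, temperature)
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Toy 3-dimensional embeddings standing in for real encoder outputs (assumed).
label_embs = {
    "un gatto": [0.9, 0.1, 0.0],
    "un cane": [0.1, 0.9, 0.0],
    "una pizza": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.2, 0.1]

label, probs = zero_shot_classify(image_emb, label_embs)
print(label, probs)
```

Image retrieval uses the same similarity scores in the other direction: a single caption embedding is compared against many image embeddings and the images are ranked by similarity.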
