Commit acfaaf8 (parent 5fa6a85) by Silvia Terragni: Update README.md
# Italian CLIP

With a few tricks, we have been able to fine-tune a competitive Italian CLIP model with **only 1.4 million** training samples.

In building this project we kept in mind the following principles:

+ **Novel Contributions**: We created a dataset of ~1.4 million Italian image-text pairs and, to the best of our knowledge, we trained the best Italian CLIP model currently in existence;
+ **Scientific Validity**: Claims are easy, facts are hard. That's why validation is important to assess the real impact of a model. We thoroughly evaluated our models on several tasks and made the validation reproducible for everybody.
+ **Broader Outlook**: We always kept in mind the possible uses of this model.

We put our **hearts** and **souls** into the project during this week! Not only did we work on a cool project, but we were also able to make new friends and learn a lot from each other while working towards a common goal!

Thank you for this amazing opportunity, we hope you will like the results. :heart:
# Novel Contributions

The original CLIP model was trained on 400 million image-text pairs; this amount of data is not available for Italian. We indeed worked in a **low-resource setting**. The only captioning datasets in the literature are MSCOCO-IT (a translated version of MSCOCO) and WIT.

To get competitive results we followed three strategies:

1. more data;
2. better augmentations;
3. better training.
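CLIP-style fine-tuning optimizes a symmetric contrastive objective over a batch: each image embedding should match the text embedding at the same row index, with every other pair acting as a negative. As a rough illustration only (not this project's actual training code, and with an assumed temperature value), a NumPy sketch of that loss might look like:

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss as in CLIP: matching image/text pairs share
    the same row index; all other pairs in the batch are negatives."""
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    # Pairwise similarity matrix, scaled by the temperature.
    logits = image_emb @ text_emb.T / temperature
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()  # diagonal = true pairs

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs are penalized, which is what makes the quality of the image-text pairs (the three strategies above) so important.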
## More Data

Thus, we tried to add as much data as possible while keeping the data quality as high as possible.

We considered three main sources of data:

+ WIT. Most of these captions describe ontological knowledge and encyclopedic facts (e.g., Roberto Baggio in 1994). However, this kind of text, without more information, is not useful to learn a good mapping between images and captions. On the other hand, this text is written in Italian and it is of good quality. To prevent polluting the data with captions that are not meaningful, we used POS tagging on the data and removed all the captions that were composed of 80% or more proper nouns (PROPN).

  Example: ....

+ MSCOCO-IT.
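The PROPN filter described above can be sketched as follows. The function name, threshold parameter, and example captions are hypothetical; the sketch assumes each caption has already been tagged with Universal Dependencies POS labels (e.g. by spaCy's Italian pipeline):

```python
def mostly_proper_nouns(pos_tags, threshold=0.8):
    """True when >= threshold of the tokens are proper nouns (PROPN),
    i.e. the caption is likely a bare entity name and should be dropped."""
    if not pos_tags:
        return True  # an empty caption carries no learning signal
    propn = sum(1 for tag in pos_tags if tag == "PROPN")
    return propn / len(pos_tags) >= threshold

# Hypothetical captions paired with precomputed UD POS tags.
captions = [
    ("Giovanni Battista Tiepolo", ["PROPN", "PROPN", "PROPN"]),
    ("Un cane corre sulla spiaggia", ["DET", "NOUN", "VERB", "ADP", "NOUN"]),
]
kept = [text for text, tags in captions if not mostly_proper_nouns(tags)]
# kept == ["Un cane corre sulla spiaggia"]
```

The first caption is pure entity name (100% PROPN) and is filtered out; the second actually describes a scene and survives.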