Training:
- Download the TSV files from https://github.com/google-research-datasets/wit/blob/main/DATA.md
- Use `prepare_wit.py` to download images from Wikipedia as annotated in each TSV file.
- Use `scale_converter.py` to remove corrupt images and resize suitable images to 224x224.
- Use `join_datasets_custom_split.py` to group all JSONs from different subsets of the dataset together.
- Use `discard_incorrect_files.py` to filter out images that we were not able to convert.
- Finally, use `run-clip.sh` to train.
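
The joining and filtering steps above could look roughly like the following. This is a minimal sketch and not the code in `join_datasets_custom_split.py` or `discard_incorrect_files.py`: the function name `join_and_filter`, the list-of-records JSON layout, and the `image_path` key are all assumptions made for illustration, so adjust them to match the files the scripts actually produce.

```python
import json
from pathlib import Path


def join_and_filter(json_paths, image_root):
    """Merge per-subset JSON files and keep only records whose image exists.

    Assumption (hypothetical layout): each JSON file holds a list of
    records, and each record stores a relative image path under the
    "image_path" key.
    """
    image_root = Path(image_root)
    kept = []
    dropped = 0
    for json_path in json_paths:
        with open(json_path, encoding="utf-8") as f:
            records = json.load(f)
        for record in records:
            image_file = image_root / record["image_path"]
            # Treat a missing or empty file as a failed conversion.
            if image_file.is_file() and image_file.stat().st_size > 0:
                kept.append(record)
            else:
                dropped += 1
    return kept, dropped
```

Calling `join_and_filter(["subset1.json", "subset2.json"], "images/")` (placeholder paths) returns the merged records that still have a usable image, plus a count of the records that were discarded, which is handy for checking how much data the conversion step lost.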