Download datasets:

Download and decompress the TSV file from here: https://github.com/google-research-datasets/wit/blob/main/DATA.md
Use prepare_wit.py to download images from Wikipedia.
Use discard_incorrect_files to filter out corrupt files. TODO: Some corrupt files are still being kept. TODO: Make it a CLI.
Finally, use run-clip.sh to train.
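
For the corrupt-file step above, a cheap extra check can help catch truncated downloads. The following is a minimal sketch, not the logic of discard_incorrect_files: it only tests whether a file starts and ends with the standard JPEG or PNG markers, without decoding the image. The names looks_complete and find_suspect_files are hypothetical, and the sketch assumes the downloaded images are JPEG or PNG; files of any other type are reported as suspect.

```python
from __future__ import annotations

from pathlib import Path

# Standard format markers (not specific to this repo).
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC
JPEG_START = b"\xff\xd8"         # start-of-image marker
JPEG_END = b"\xff\xd9"           # end-of-image marker


def looks_complete(path: Path) -> bool:
    """Return True if the file begins and ends like a complete JPEG or PNG.

    This is a heuristic: it does not decode pixel data, so a file can pass
    and still be damaged in the middle.
    """
    data = path.read_bytes()
    if data.startswith(JPEG_START):
        return data.endswith(JPEG_END)
    if data.startswith(PNG_SIGNATURE):
        return data.endswith(PNG_TRAILER)
    # Unknown format (or an HTML error page saved under an image name).
    return False


def find_suspect_files(image_dir: Path) -> list[Path]:
    """List every regular file under image_dir that fails looks_complete."""
    return sorted(
        p for p in image_dir.rglob("*") if p.is_file() and not looks_complete(p)
    )
```

Calling find_suspect_files(Path("images")) on the download directory (the directory name here is an assumption) returns the paths to review or delete. Because valid JPEGs with trailing padding would also be flagged, it is safer to inspect the list before removing anything; a full decode with an image library would be a stricter test.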