# Download datasets

* Download and decompress the TSV file from https://github.com/google-research-datasets/wit/blob/main/DATA.md
* Use `prepare_wit.py` to download the images from Wikipedia.
* Use `discard_incorrect_files` to filter out corrupt files.
  * `TODO: Some corrupt files are still being kept.`
  * `TODO: Make it a CLI.`
* Finally, use `run-clip.sh` to train.
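
Since some corrupt files still slip through the filtering step, a quick independent check can help spot truncated downloads. The snippet below is a minimal sketch and not the implementation of `discard_incorrect_files`: the names `looks_intact` and `find_suspect_files` are hypothetical, and it assumes the images are JPEG or PNG files on disk. It only compares the leading and trailing bytes of each file against the format signatures, so files in other formats are reported as suspect, and a file with damage in the middle would still pass.

```python
from pathlib import Path

# (leading bytes, trailing bytes) expected in a complete file of each format.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff", b"\xff\xd9"),
    "png": (b"\x89PNG\r\n\x1a\n", b"IEND\xaeB`\x82"),
}


def looks_intact(path):
    """Return True if the file starts and ends like a complete JPEG or PNG."""
    data = Path(path).read_bytes()
    return any(
        data.startswith(head) and data.endswith(tail)
        for head, tail in SIGNATURES.values()
    )


def find_suspect_files(directory):
    """List files under `directory` that fail the signature check (nothing is deleted)."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and not looks_intact(p)
    )
```

Calling `find_suspect_files("images/")` (the directory name here is a placeholder) returns the paths to review. A stricter check would decode each image fully with an imaging library, at the cost of an extra dependency and more time per file.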