# Download datasets:
* Download and decompress the TSV files from here: https://github.com/google-research-datasets/wit/blob/main/DATA.md
* Use `prepare_wit.py` to download the images from Wikipedia as annotated in each TSV file (see the first sketch after this list).
* Use `scale_converter.py` to remove corrupt images and resize the remaining images to 224x224 (see the second sketch after this list).
* Use `join_datasets_custom_split.py` to group all the JSONs from the different subsets of the dataset together.
* Use `discard_incorrect_files.py` to filter out the images that we were not able to convert (see the third sketch after this list).
* Finally, use `run-clip.sh` to train.
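
The download step is handled by `prepare_wit.py`; as a rough illustration only, the sketch below reads one WIT TSV and fetches each referenced image. The `image_url` column name, the output file naming, and the User-Agent string are assumptions, not the script's actual interface.

```python
# Hedged sketch of the image-download step (not the real prepare_wit.py).
import csv
import os
import urllib.request

def download_images(tsv_path: str, out_dir: str) -> None:
    """Download every image referenced in a WIT TSV file into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    csv.field_size_limit(10_000_000)  # WIT rows contain long caption fields
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for i, row in enumerate(reader):
            url = row["image_url"]  # assumed column name, per the WIT docs
            ext = os.path.splitext(url)[1] or ".jpg"
            dest = os.path.join(out_dir, f"{i:08d}{ext}")
            try:
                # Wikimedia rejects default Python user agents, so set one.
                req = urllib.request.Request(
                    url, headers={"User-Agent": "wit-downloader/0.1"})
                with urllib.request.urlopen(req, timeout=30) as resp, \
                        open(dest, "wb") as out:
                    out.write(resp.read())
            except Exception as err:  # 404s, timeouts, malformed URLs, ...
                print(f"skipping {url}: {err}")
```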
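Likewise, a minimal sketch of the conversion step: open each downloaded file with Pillow, drop anything that cannot be decoded, and resize the rest to 224x224. The directory layout and JPEG output are assumptions; `scale_converter.py` defines the actual behavior.

```python
# Hedged sketch of the corrupt-image check and resize (not scale_converter.py).
import os
from PIL import Image

def convert_images(src_dir: str, dst_dir: str, size: int = 224) -> None:
    """Resize readable images to size x size; skip files Pillow cannot decode."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        try:
            with Image.open(src) as img:
                img = img.convert("RGB").resize((size, size), Image.BICUBIC)
                img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".jpg"))
        except (OSError, Image.DecompressionBombError):
            # Corrupt or unreadable file: leave it out so the later
            # filtering step can discard its annotation as well.
            print(f"discarding unreadable file: {src}")
```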
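Finally, a sketch of the filtering step: keep only the annotation records whose image actually survived conversion. The JSON layout assumed here (a list of records with an `image_path` field) is an illustration; `discard_incorrect_files.py` defines the real format.

```python
# Hedged sketch of the annotation filtering (not discard_incorrect_files.py).
import json
import os

def filter_annotations(json_path: str, images_dir: str, out_path: str) -> None:
    """Keep only records whose converted image exists in images_dir."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    kept = [
        r for r in records
        if os.path.exists(os.path.join(images_dir, os.path.basename(r["image_path"])))
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False)
    print(f"kept {len(kept)} of {len(records)} records")
```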