# Download datasets:
* Download and decompress the TSV files from here: https://github.com/google-research-datasets/wit/blob/main/DATA.md
* Use `prepare_wit.py` to download the images from Wikipedia annotated in each TSV file.
* Use `scale_converter.py` to remove corrupt images and resize the remaining images to 224x224.
* Use `join_datasets_custom_split.py` to group the JSONs from the different subsets of the dataset together.
* Use `discard_incorrect_files.py` to filter out images that could not be converted.
* Finally, use `run-clip.sh` to train.
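
The join-and-split step above could be sketched roughly as follows. This is an illustrative stand-in for `join_datasets_custom_split.py`, not its actual contents: the `image` key, the validation fraction, and the function name are assumptions.

```python
import json
import random


def join_and_split(json_paths, valid_fraction=0.1, seed=0):
    """Merge per-subset JSON annotation files and make a train/valid split.

    Hypothetical sketch: the real split logic lives in
    join_datasets_custom_split.py. Assumes each JSON file holds a list
    of annotation records.
    """
    records = []
    for path in json_paths:
        with open(path) as f:
            records.extend(json.load(f))
    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(records)
    n_valid = int(len(records) * valid_fraction)
    return records[n_valid:], records[:n_valid]
```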
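
The filtering step can be pictured as dropping every annotation record whose image file is missing on disk. This is a minimal sketch only, assuming records store their file name under an `image` key; the actual checks are in `discard_incorrect_files.py`.

```python
from pathlib import Path


def discard_missing_images(records, image_dir):
    """Keep only records whose image file exists in image_dir.

    Hypothetical stand-in for discard_incorrect_files.py; the 'image'
    key is an assumed record field.
    """
    image_dir = Path(image_dir)
    return [r for r in records if (image_dir / r["image"]).is_file()]
```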