# Jina CLIP

The Jina CLIP implementation is hosted in this repository. The model uses:

* the EVA-02 architecture as the vision tower
* the Jina BERT with Flash Attention model as the text tower
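Assuming the model is published on the Hugging Face Hub (the repo id below is an assumption; check the Hub for the exact identifier), both towers can be loaded together through `transformers` with `trust_remote_code`. A minimal sketch:

```python
def load_jina_clip(repo_id: str = "jinaai/jina-clip-v1"):
    """Sketch: load the combined vision/text model from the Hub.

    trust_remote_code=True lets transformers fetch the custom
    EVA-02 vision tower and Jina BERT text tower code shipped
    with the checkpoint. The repo id is an assumption.
    """
    # Lazy import so the sketch can be read without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```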

To use the Jina CLIP model, the following packages are required:

* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers` to use memory-efficient attention
* `flash-attn` to use flash attention
* `apex` to use fused layer normalization
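A quick way to verify the dependency list above before loading the model; this is a minimal sketch using only the standard library (the helper name is hypothetical):

```python
import importlib.util

REQUIRED = ["torch", "timm", "transformers", "einops"]
# Optional accelerators from the list above and the feature each enables.
OPTIONAL = {
    "xformers": "memory-efficient attention",
    "flash_attn": "flash attention",
    "apex": "fused layer normalization",
}

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("missing required packages:", ", ".join(missing))
for mod, feature in OPTIONAL.items():
    if importlib.util.find_spec(mod) is None:
        print(f"optional: install `{mod}` to enable {feature}")
```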