---
tags:
- transformers
- xlm-roberta
- eva02
- clip
library_name: transformers
license: cc-by-nc-4.0
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- 'no'
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
---

# Jina CLIP

Core implementation of Jina CLIP. The model uses:

* the [EVA 02](https://github.com/baaivision/EVA/tree/master/EVA-CLIP/rei/eva_clip) architecture for the vision tower
* the [Jina XLM RoBERTa with Flash Attention](https://huggingface.co/jinaai/xlm-roberta-flash-implementation) model as the text tower

## Models that use this implementation

- [jinaai/jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2)
- [jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1)

## Requirements

To use the Jina CLIP source code, the following packages are required:

* `torch`
* `timm`
* `transformers`
* `einops`
* `xformers` to use xFormers attention
* `flash-attn` to use flash attention
* `apex` to use fused layer normalization
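
## Usage

As a minimal usage sketch (loading one of the model IDs listed above requires `trust_remote_code=True`, since the modeling code lives in this repository rather than in `transformers`, and downloads the model weights; `encode_text` and `encode_image` follow the API documented in the jina-clip model cards):

```python
from transformers import AutoModel

# trust_remote_code=True lets transformers load the custom Jina CLIP
# implementation shipped with the checkpoint.
model = AutoModel.from_pretrained("jinaai/jina-clip-v1", trust_remote_code=True)

# Encode text and images into a shared embedding space.
text_embeddings = model.encode_text(["A photo of a blue cat"])
image_embeddings = model.encode_image(["cat.png"])  # local path or URL
```

The optional packages above (`xformers`, `flash-attn`, `apex`) are picked up automatically when installed; without them the model falls back to standard PyTorch attention and layer normalization.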