About extracting embedding vectors of images and texts.
#10 by iceleaf97tech - opened
Is it possible to extract embedding vectors of images and texts using these models?
If so, how should I do that?
Could you provide a code template? Thanks.
Multimodal VQA models are not recommended for embedding extraction:
- VQA models are designed primarily for visual question answering, so their architectures and training objectives differ from those of an embedding model.
- CLIP, by contrast, is trained specifically to produce aligned image and text embeddings, so it generally performs better on this task and adapts more easily to downstream uses.
Using CLIP for embedding extraction is more efficient and better suited to the practical requirements of embedding tasks.
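As a rough template for the CLIP route, here is a minimal sketch using the Hugging Face `transformers` CLIP classes. The checkpoint name `openai/clip-vit-base-patch32`, the helper names `extract_clip_embeddings` and `cosine_similarity`, and the example file name are assumptions chosen for illustration, not something specific to this repository. It also assumes `torch`, `transformers`, and `Pillow` are installed.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors given as float lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def extract_clip_embeddings(image_paths, texts,
                            model_name="openai/clip-vit-base-patch32"):
    """Return (image_embeddings, text_embeddings) as lists of float lists.

    model_name is an assumed public CLIP checkpoint; swap in any other
    CLIP checkpoint that works with CLIPModel / CLIPProcessor.
    """
    # Heavy dependencies are imported here so that the pure-Python helper
    # above can be used without them.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    model.eval()

    images = [Image.open(path).convert("RGB") for path in image_paths]
    inputs = processor(text=texts, images=images, return_tensors="pt",
                       padding=True, truncation=True)

    with torch.no_grad():
        outputs = model(**inputs)

    # image_embeds and text_embeds live in the shared projection space.
    # Normalize explicitly so a dot product equals cosine similarity.
    image_emb = torch.nn.functional.normalize(outputs.image_embeds, dim=-1)
    text_emb = torch.nn.functional.normalize(outputs.text_embeds, dim=-1)
    return image_emb.tolist(), text_emb.tolist()


if __name__ == "__main__":
    # "cat.jpg" is a placeholder path; replace it with a real image file.
    img_vecs, txt_vecs = extract_clip_embeddings(
        ["cat.jpg"], ["a photo of a cat", "a photo of a dog"]
    )
    for text_vec in txt_vecs:
        print(cosine_similarity(img_vecs[0], text_vec))
```

Because both sets of vectors share one embedding space, the text with the highest cosine similarity to an image vector is the closest match, which is the usual basis for retrieval or zero-shot classification.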
zRzRzRzRzRzRzR changed discussion status to closed