How to serve a jina-embeddings-v3 classification task using onnxruntime
Can you provide more detailed sample code? The example in the README is a bit unclear; in particular, the output format is not explained.
I got an output with shapes ([batch_size, seq_len, 1024], [batch_size, 1024]). My questions are:
- How can I get the label for each input text?
- How can I get the embedding representation of each input text?
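For reference, here is roughly how I produce that output with onnxruntime. The model path, the input names (input_ids, attention_mask, task_id), and the task_id lookup follow the pattern from the README, so they are worth double-checking against session.get_inputs():

```python
import numpy as np
import onnxruntime
from transformers import AutoTokenizer, PretrainedConfig

# Tokenizer and config for jina-embeddings-v3
tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
config = PretrainedConfig.from_pretrained("jinaai/jina-embeddings-v3")

# Path to the exported ONNX model (adjust to your local copy)
session = onnxruntime.InferenceSession("jina-embeddings-v3/onnx/model.onnx")

texts = ["I love this product", "This is terrible"]
encoded = tokenizer(texts, padding=True, return_tensors="np")

# Select the LoRA adapter for the classification task by its index in the config
task_id = np.array(config.lora_adaptations.index("classification"), dtype=np.int64)

inputs = {
    "input_ids": encoded["input_ids"],
    "attention_mask": encoded["attention_mask"],
    "task_id": task_id,
}

# Two outputs, matching the shapes above:
# token-level embeddings [batch_size, seq_len, 1024] and a pooled tensor [batch_size, 1024]
token_embeddings, pooled_embeddings = session.run(None, inputs)
print(token_embeddings.shape, pooled_embeddings.shape)
```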
Hi @luozhouyang
How can I get the embedding representation of each input text?
You should apply mean pooling to the token outputs ([batch_size, seq_len, 1024]). You can use this function:
https://huggingface.co/jinaai/xlm-roberta-flash-implementation/blob/12700ba4972d9e900313a85ae855f5a76fb9500e/modeling_xlm_roberta.py#L630
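A minimal numpy version of that pooling might look like this (same idea as the linked torch function; it works directly on the onnxruntime outputs, with token_embeddings and encoded taken from your snippet above):

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings over the sequence, ignoring padding positions.

    token_embeddings: [batch_size, seq_len, hidden]
    attention_mask:   [batch_size, seq_len], 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)  # [B, S, 1]
    summed = (token_embeddings * mask).sum(axis=1)                         # [B, H]
    counts = np.clip(mask.sum(axis=1), a_min=1e-9, a_max=None)             # avoid division by zero
    return summed / counts

# One 1024-dim embedding per input text
sentence_embeddings = mean_pooling(token_embeddings, encoded["attention_mask"])
```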
How can I get the label for each input text?
If you want to get labels, you should train a classifier on top, or use some kind of zero-shot classification technique. jina-embeddings-v3 only produces embeddings and does not directly output a label.
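For the first option, a rough sketch with a scikit-learn classifier on top of the sentence embeddings (just one possible choice; train_embeddings and train_labels are placeholders for your own labelled data):

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: embed your labelled texts as in the snippets above
train_embeddings = sentence_embeddings      # [n_train, 1024]
train_labels = ["positive", "negative"]     # one gold label per training text

clf = LogisticRegression(max_iter=1000)
clf.fit(train_embeddings, train_labels)

# At serving time, embed incoming texts the same way, then predict
predicted_labels = clf.predict(sentence_embeddings)
confidence_scores = clf.predict_proba(sentence_embeddings)
```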
@jupyterjazz
Thanks!
Can I get the embedding representations of both the input texts and the labels using jina-embeddings-v3 (with task=classification), and then compute the cosine similarities as confidence scores?
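Concretely, something like this is what I have in mind, reusing tokenizer, session, task_id, and mean_pooling from the snippets above (the label texts are hypothetical, and phrasing labels as short descriptions may work better than single words):

```python
import numpy as np

def embed(texts):
    """Embed texts with the classification LoRA adapter and L2-normalize the result."""
    enc = tokenizer(texts, padding=True, return_tensors="np")
    outputs = session.run(None, {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
        "task_id": task_id,
    })
    emb = mean_pooling(outputs[0], enc["attention_mask"])   # token-level output -> sentence embedding
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

labels = ["positive sentiment", "negative sentiment"]       # hypothetical label descriptions
text_emb = embed(["I love this product", "This is terrible"])
label_emb = embed(labels)

# Cosine similarity of normalized vectors = dot product; shape [n_texts, n_labels]
scores = text_emb @ label_emb.T
predictions = [labels[i] for i in scores.argmax(axis=1)]
print(predictions, scores)
```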