Text context length?

by dingo-actual - opened May 31

Discussion

dingo-actual

May 31

What's the text context length for jina-clip-v1?

paulmaksimovich

Jun 2

•

edited Jun 2

Could be 8192? That's what the tokenizer config says anyways.
https://huggingface.co/jinaai/jina-clip-v1/blob/1bae0621529ced998c73bca234a8cb9da997f33c/tokenizer_config.json#L49

For comparison, CLIP patch32 has a text context length of 77 tokens.
Curious to know where jina-clip-v1 stands.
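
The tokenizer limit cited above can be read straight from the config file. A minimal sketch, assuming a local copy of `tokenizer_config.json` from the model repo and assuming the limit is stored under the `model_max_length` key (the usual key in Hugging Face tokenizer configs); `read_model_max_length` is a hypothetical helper name, not part of any library.

```python
import json
from pathlib import Path


def read_model_max_length(config_path):
    """Return the `model_max_length` entry of a tokenizer_config.json file.

    Returns None if the key is absent. The key name is an assumption
    based on the usual Hugging Face tokenizer config layout.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get("model_max_length")
```

Note that this value only says what the tokenizer will accept before truncating. It says nothing about the sequence lengths the model actually saw during training.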

paulmaksimovich

Jun 2

•

edited Jun 2

Ah, scratch that. The paper (https://arxiv.org/pdf/2405.20204) states:

"For stage 2, Ctext pairs is used again. However, text values are truncated to 512 tokens in this case, and as a result a smaller batch size of 8,192 is used."

So looks like it's 512, if I'm reading that right?

FremyCompany

Jun 3

I had the same question.

The largest text length that is well aligned with images during training seems to be 512, not 8192. However, the model might generalize beyond that, for example if the third and last stage of finetuning allows for longer text-only sequences (this unfortunately isn't mentioned in the paper). It might also weakly generalize simply because the initial BERT model supports longer input texts (8192, it seems, per the config file), but this would have to be tested.
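
One rough way to probe this is to check whether tokens beyond a given length still change the embedding at all. The following is only a sketch under stated assumptions: `embed` is a hypothetical callable that maps a list of tokens to a vector (for example a thin wrapper around the model's text encoder), and `truncation_sensitivity` is an illustrative name. A similarity near 1.0 for a prefix would suggest the remaining tokens contribute little. It does not measure embedding quality, which would need a retrieval benchmark.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def truncation_sensitivity(embed, tokens, lengths):
    """Compare the full-input embedding with embeddings of its prefixes.

    `embed` is a hypothetical callable: list of tokens -> sequence of floats.
    Returns {n: cosine similarity of embed(tokens[:n]) and embed(tokens)}.
    """
    full = embed(tokens)
    return {n: cosine_similarity(embed(tokens[:n]), full) for n in lengths}
```

For instance, calling it with `lengths=[512, 1000, 2000]` on a long document would show how much each prefix embedding differs from the embedding of the full text.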

I would love to get some clarity on that. Any thoughts, @gmastrapas or @bwang0911 ?

bwang0911

Jina AI org Jun 3

Hi all, our backbone model, JinaBERT, supports very long sequences (we say up to 8192, but it should be unlimited).

We contrastively train the model with a sequence length of 512 on embedding tasks, but this does not mean the model can only handle 512 tokens. It should be able to handle much longer sequences, same as jina-embeddings-v2.

However, our experience tells us that the best sequence length for sentence embeddings is around 512-1000 tokens. My suggestion is to keep documents below 1000 tokens, but the model will definitely still work well beyond 1000.
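
The suggestion above amounts to a per-document token budget. A minimal sketch, assuming the text has already been tokenized into a list of ids (for example with the model's tokenizer); `split_into_chunks` and the default of 1000 are illustrative and not part of any Jina API.

```python
def split_into_chunks(token_ids, max_tokens=1000):
    """Split a list of token ids into consecutive chunks of at most `max_tokens` ids."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [
        token_ids[start:start + max_tokens]
        for start in range(0, len(token_ids), max_tokens)
    ]
```

Each chunk can then be embedded separately. Splitting on raw token boundaries ignores sentence boundaries and special tokens, so a real pipeline would likely split on sentences or paragraphs first and only use the token count as a cap.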

@dingo-actual @paulmaksimovich @FremyCompany

bwang0911 changed discussion status to closed Jun 4
