SGPT models' sequence length
Hi, thanks for the great benchmark. Quick question: the sequence lengths of the SGPT models are stated here as between 2048 and 4096 tokens. However, the SGPT paper states that they were trained with sequence lengths of up to 300 tokens (see e.g. Section 4.2.1, https://arxiv.org/pdf/2202.08904.pdf). Is the 300-token number from the paper correct? Are the models nonetheless expected to perform well on sequences of 2048 tokens? Thanks a lot!
(CC: @Muennighoff)
Great point. The numbers are correct: the models were only trained on short sequences, but you can theoretically use them with much longer sequences. I haven't done any extensive testing on longer sequences, so I'm not sure how well they would perform.
Also see this issue: https://github.com/Muennighoff/sgpt/issues/23
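For anyone who wants to try it, here is a minimal sketch (not from this thread) of how longer inputs could be run through an SGPT checkpoint via sentence-transformers. The checkpoint name is just an illustrative example, and embedding quality beyond the ~300 tokens used in training is untested, as noted above:

```python
# Hypothetical sketch: load an SGPT checkpoint and raise the maximum sequence
# length beyond the ~300 tokens used during training. The backbone's context
# window (2048 tokens here, per the benchmark's listed lengths) is the hard limit.
from sentence_transformers import SentenceTransformer

# Example checkpoint name; any SGPT model from the Hub is loaded the same way.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

# Allow inputs up to 2048 tokens instead of the default truncation length.
model.max_seq_length = 2048

# Encode a long document; quality on such inputs has not been extensively tested.
embeddings = model.encode(["A very long document ..."])
print(embeddings.shape)
```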
Thanks a lot for the reply. I'll try to test it out.