mteb/leaderboard · Allow setting seq_len/size/dim for gated models

tomaarsen

Massive Text Embedding Benchmark org Jun 5


edited Jun 5

Hello!

Pull Request overview

Allow setting seq_len/size/dim for gated models

Details

By throwing exceptions earlier when data can't be gathered from the model automatically, we can use model_meta.yaml to specify some of these parameters for models that are not marked as external. This fixes the missing model sizes, memory usage, and max tokens for https://huggingface.co/nvidia/NV-Embed-v1 and https://huggingface.co/Linq-AI-Research/Linq-Embed-Mistral.
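A minimal sketch of the fail-fast-then-fall-back pattern described above. Everything here is hypothetical and not the leaderboard's actual code: the names fetch_metadata_from_hub, get_model_metadata, and GatedModelError are invented, and the MODEL_META dict stands in for entries parsed from model_meta.yaml, with made-up placeholder values. The "size" field is assumed to be in millions of parameters.

```python
from typing import Any


class GatedModelError(Exception):
    """Hypothetical error raised when a model's files cannot be read automatically."""


# Hypothetical stand-in for entries parsed from model_meta.yaml.
# Placeholder values only; "size" is assumed to be in millions of parameters.
MODEL_META: dict[str, dict[str, Any]] = {
    "gated-model": {"seq_len": 4096, "size": 7000, "dim": 4096},
}


def fetch_metadata_from_hub(model_id: str) -> dict[str, Any]:
    # Hypothetical: simulate a gated model whose config cannot be downloaded,
    # so automatic gathering raises immediately instead of returning partial data.
    raise GatedModelError(f"{model_id} is gated; config cannot be downloaded")


def get_model_metadata(model_id: str) -> dict[str, Any]:
    """Try automatic gathering first; fall back to manually specified metadata."""
    try:
        return fetch_metadata_from_hub(model_id)
    except GatedModelError:
        # Look up the manually specified values by model name without the org prefix.
        name_without_org = model_id.split("/")[-1]
        return MODEL_META.get(name_without_org, {})


print(get_model_metadata("example-org/gated-model"))
```

Raising early is what makes the fallback reachable: if automatic gathering silently returned empty values instead, the manually specified entries would never be consulted.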

cc @nada5 @linqresearch as this affects your models.

Tom Aarsen

Allow setting seq_len/size/dim for gated models (9bce65fe)

tomaarsen changed pull request status to open Jun 5

Update edge case where model is not specified (a9153ccd)

Linq-Embed-Mistral is now integrated with Sentence Transformers (0769964b)

Muennighoff

Massive Text Embedding Benchmark org Jun 5

Looks great! A comment in the code explaining EXTERNAL_MODEL_TO_SIZE[name_without_org] * 1e6 * 4 / 1024**3 could help, but otherwise feel free to merge!
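One plausible reading of that expression, assuming EXTERNAL_MODEL_TO_SIZE stores parameter counts in millions and that weights are stored in fp32 (4 bytes per parameter). The function name memory_usage_gb is hypothetical, and the estimate covers model weights only, not activations.

```python
def memory_usage_gb(size_in_millions: float, bytes_per_param: int = 4) -> float:
    """Estimate weight memory in GB (GiB) from a parameter count given in millions."""
    num_params = size_in_millions * 1e6        # millions of parameters -> parameters
    num_bytes = num_params * bytes_per_param   # 4 bytes per fp32 parameter by default
    return num_bytes / 1024**3                 # bytes -> GiB (1024**3 bytes)


print(round(memory_usage_gb(1000), 2))  # 3.73
```

Passing bytes_per_param=2 gives the corresponding fp16/bf16 estimate, which is half the fp32 figure.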

Clarify math for memory usage (d8b28e21)

Merge commit 'refs/pr/121' of https://huggingface.co/spaces/mteb/leaderboard into pr/121 (53b23bde)

tomaarsen

Massive Text Embedding Benchmark org Jun 5

Clarified the math via https://huggingface.co/spaces/mteb/leaderboard/commit/d8b28e21231e4146fda8321f753e80a172cfd169, will merge now :)

tomaarsen changed pull request status to merged Jun 5