How much GPU memory is required for 32k context embedding?
#13 by Labmem009 - opened
I tried to use this model to get embeddings of long text, but I failed many times with OOM errors even on 6×A100 with data parallelism (DP). Is there any suggestion for allocating memory for long inputs?
Try:

```python
with torch.no_grad():  # skip gradient tracking; activations are freed as inference proceeds
    outputs = model(**tokens)
```
With fp16, I can do 4K tokens with room to spare on 2x 16GB GPUs.
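For reference, a minimal sketch of that setup (the `texts` input is a placeholder, and the last-token pooling helper is my assumption based on the usual Mistral-embedding recipe; check the model card for the exact pooling):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Salesforce/SFR-Embedding-Mistral"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers often ship without a pad token

model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves weight memory vs. fp32
    device_map="auto",          # shard the weights across visible GPUs
)
model.eval()

def last_token_pool(hidden, attention_mask):
    # Take the hidden state of the last non-padding token in each sequence.
    seq_lens = attention_mask.sum(dim=1) - 1
    return hidden[torch.arange(hidden.size(0), device=hidden.device), seq_lens]

texts = ["a long document ..."]  # placeholder input
tokens = tokenizer(texts, max_length=4096, truncation=True,
                   padding=True, return_tensors="pt").to(model.device)

with torch.no_grad():  # the key memory saver for pure inference
    outputs = model(**tokens)

embeddings = last_token_pool(outputs.last_hidden_state, tokens["attention_mask"])
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```

Activation memory grows with sequence length, so at the full 32k context you may still need batch size 1 or more aggressive sharding.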
Is there any way to do this while using sentence-transformers? Every time I try to load it, it tries to allocate 96 GB of VRAM.

```python
embedding = HuggingFaceEmbeddings(
    model_name="Salesforce/SFR-Embedding-Mistral",
    model_kwargs={"device": f"cuda:{device_num}"},
)
```
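A likely culprit: sentence-transformers loads the weights in fp32 by default, which is roughly 28 GB for a 7B model before any activations, and at 32k context the activations dwarf the rest. A sketch of loading in fp16 instead (this assumes a recent sentence-transformers release whose constructor accepts a `model_kwargs` dict forwarded to transformers; verify against your installed version):

```python
import torch
from sentence_transformers import SentenceTransformer

# Sketch: fp16 weights plus a capped sequence length to bound activation memory.
# The inner model_kwargs is forwarded to transformers' from_pretrained
# (available in recent sentence-transformers releases; an assumption, check your version).
model = SentenceTransformer(
    "Salesforce/SFR-Embedding-Mistral",
    device=f"cuda:{device_num}",                  # device_num as in the snippet above
    model_kwargs={"torch_dtype": torch.float16},
)
model.max_seq_length = 4096                       # lower this first if you still OOM

embeddings = model.encode(["a long document ..."], batch_size=1)
```

If you stay on LangChain's HuggingFaceEmbeddings wrapper, its `model_kwargs` are passed straight to the SentenceTransformer constructor, so the same nested dict, `model_kwargs={'device': f"cuda:{device_num}", 'model_kwargs': {'torch_dtype': torch.float16}}`, should pass through (untested on my end).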