Any way to 'drop' the model to save GPU RAM?
Hi, thank you for the reranker. It's working wonderfully in my RAG process.
One thing I am curious about is whether there is a way to unload the model in Python. In one of my RAG applications I use Streamlit as a front end, and I noticed that my GPU memory steadily climbed the longer I used the application.
To illustrate: at app initiation the code below is executed to load the reranker, and my GPU RAM usage increases by 1.5 GB.
model = FlagReranker('BAAI/bge-reranker-v2-m3', use_fp16=True)
I then use the app by entering a prompt, and my GPU RAM usage increases by a further 2.5 GB when this part of the code is executed:
ranks = model.compute_score(sentence_pairs)
The problem is that whenever a new prompt is entered, GPU RAM usage increases again when the code reaches the model.compute_score call. Eventually all the GPU RAM is consumed and the application becomes unusable.
One way I could deal with this is to 'drop' the model to free its GPU RAM once the model.compute_score call has finished, but I don't know how. Any advice would be greatly appreciated, thanks!
All good. Just realised that torch.cuda.empty_cache(), coupled with aggressive variable deletion and gc.collect(), will do the trick.
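For anyone landing here later, a minimal sketch of that cleanup, wrapped in a helper. This assumes the standard pattern of dropping every Python reference to the model first, then collecting, then releasing PyTorch's cached CUDA blocks; the helper name and the guard around the torch import are my own additions, not part of FlagEmbedding.

```python
import gc


def free_gpu_memory():
    """Run garbage collection, then ask PyTorch to return its cached
    CUDA allocations to the driver.

    Safe to call on CPU-only machines: empty_cache() is only invoked
    when CUDA is actually available.
    """
    gc.collect()
    try:
        import torch  # only relevant if torch is installed
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    except ImportError:
        pass
    return True


# Usage sketch (assumes `model` is the FlagReranker loaded earlier):
#   ranks = model.compute_score(sentence_pairs)
#   del model          # drop the last Python reference first,
#   free_gpu_memory()  # otherwise the tensors cannot be reclaimed
```

Note the order matters: `torch.cuda.empty_cache()` can only release memory whose tensors are no longer referenced, so the `del` and `gc.collect()` have to happen before it does any good.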