Issues when using in Colab & Sentence Transformers
I get a
TypeError: __init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'
when trying to load the model using sentence-transformers in a Google Colab Pro notebook. I am not sure how to resolve the issue.
Thanks for your question!
You may try installing the customized sentence-transformers from here: https://github.com/HKUNLP/instructor-embedding/tree/main/sentence-transformers, together with transformers 4.20.0.
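To confirm which versions the Colab runtime actually sees before and after installing, a small check can help. This is a minimal sketch using only the standard library's importlib.metadata; the helper names installed_version and check_pin are hypothetical, and the 4.20.0 pin is the one suggested in the reply above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package, expected):
    """Hypothetical helper: report whether `package` is installed at exactly `expected`."""
    found = installed_version(package)
    if found is None:
        return f"{package}: not installed (expected {expected})"
    status = "OK" if found == expected else "MISMATCH"
    return f"{package}: {found} (expected {expected}) {status}"


# Versions as seen by the current Python process (e.g. the Colab runtime).
print(check_pin("transformers", "4.20.0"))
print("sentence-transformers:", installed_version("sentence-transformers"))
```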
Feel free to leave further questions!
How does one go about doing that specifically? What is the command one must run?
Thanks for the question!
Specifically, you may first clone the repository:
git clone https://github.com/HKUNLP/instructor-embedding
Then go to the sentence-transformers folder:
cd instructor-embedding/sentence-transformers
Finally, you can install the customized package in editable mode:
pip install -e .
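Because the editable install has to take precedence over any sentence-transformers already present in the Colab image, it may be worth checking which copy Python will actually import. A sketch using importlib.util.find_spec; module_origin is a hypothetical helper name. If sentence_transformers was already imported in the running session, the old copy stays loaded (Python caches modules in sys.modules), so you may need to restart the runtime before the path points into instructor-embedding/sentence-transformers.

```python
import importlib.util


def module_origin(name):
    """Return the file Python would import for top-level module `name`, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Expect a path inside instructor-embedding/sentence-transformers after `pip install -e .`
print(module_origin("sentence_transformers"))
```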
Feel free to leave your further questions here.
Has this been integrated into the main Hugging Face sentence-transformers yet?
No, because we have overwritten several classes in the sentence-transformers library to incorporate instructions.
Hi, I cannot find the path to sentence_transformers
Hi, you may want to install sentence-transformers via pip install sentence-transformers.
There aren't many results on Google related to this issue except this thread. I have sentence-transformers installed and I'm still getting the error from the original post.
TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'
Here's the script that I've used:
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from InstructorEmbedding import INSTRUCTOR
model = INSTRUCTOR('hkunlp/instructor-xl')
pdf_path = "./document.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()
embeddings = HuggingFaceEmbeddings(model_name="hkunlp/instructor-xl")
db = Chroma.from_documents(documents=pages, embedding=embeddings, persist_directory="./chroma_db")
db.persist()
Any idea how to solve this? Thanks!
EDIT: It works fine with "sentence-transformers/all-MiniLM-L6-v2" model for example.
EDIT 2: This seems to work https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453
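The TypeError in this post appears to come from how the pooling module is rebuilt from its saved config: the keys in 1_Pooling/config.json are passed as keyword arguments to the pooling constructor (sentence-transformers' Pooling.load does essentially Pooling(**config)). If the Pooling class being used predates an option such as pooling_mode_weightedmean_tokens, Python rejects the unknown keyword. The self-contained reproduction below uses a hypothetical OldPooling stand-in, not the real class.

```python
import json

# Same keys as the instructor 1_Pooling/config.json quoted in this thread.
CONFIG_TEXT = """{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false
}"""


class OldPooling:
    """Hypothetical stand-in for a Pooling class that predates the
    weightedmean/lasttoken options. NOT the real sentence-transformers code."""

    def __init__(self, word_embedding_dimension, pooling_mode_cls_token=False,
                 pooling_mode_mean_tokens=True, pooling_mode_max_tokens=False,
                 pooling_mode_mean_sqrt_len_tokens=False):
        self.word_embedding_dimension = word_embedding_dimension


def loading_error(pooling_cls, config_text):
    """Rebuild the module as cls(**config); return the TypeError text, or None."""
    config = json.loads(config_text)
    try:
        pooling_cls(**config)
    except TypeError as err:
        return str(err)
    return None


print(loading_error(OldPooling, CONFIG_TEXT))
```

With the two newer keys removed from the config, the same call succeeds, which is why editing config.json, as suggested later in the thread, works around the error.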
You may try to install sentence-transformers 2.2.2.
Despite trying that, I still get the exact same error.
Here's my hack to solve it (until there's an official fix):
Let INSTR be the instructor model you use: instructor-xl, instructor-large, or instructor-base.
Edit the pooling config file at models/hkunlp/INSTR/1_Pooling/config.json.
Remove the offending lines with "pooling_mode_weightedmean_tokens" and "pooling_mode_lasttoken".
Change this:
{
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
}
to this:
{
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false
}
Remember to also remove the trailing "," from the line that is now last, i.e. change
"pooling_mode_mean_sqrt_len_tokens": false,
to
"pooling_mode_mean_sqrt_len_tokens": false
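The same edit can be scripted instead of done by hand, which also sidesteps the trailing-comma pitfall because json.dumps always writes valid JSON. A minimal sketch: strip_pooling_keys is a hypothetical helper, and the path is wherever the 1_Pooling/config.json of your INSTR model lives (no specific cache location is assumed). It only drops the two keys when they are false, since removing a key that is set to true would silently change the pooling mode.

```python
import json
from pathlib import Path

# Keys the older Pooling.__init__ does not accept (per the error in this thread).
UNSUPPORTED_KEYS = ("pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken")


def strip_pooling_keys(config_path, keys=UNSUPPORTED_KEYS):
    """Remove `keys` from a pooling config.json if they are present and false.

    Returns the list of removed keys. Raises ValueError if one of the keys is
    set to true, because dropping it would change the pooling behavior.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    removed = []
    for key in keys:
        if key not in config:
            continue
        if config[key]:
            raise ValueError(f"{key} is enabled; removing it would change pooling")
        del config[key]
        removed.append(key)
    # json.dumps emits valid JSON, so no dangling comma is left behind.
    path.write_text(json.dumps(config, indent=2) + "\n")
    return removed


# Example, with INSTR replaced by your model directory:
# strip_pooling_keys("models/hkunlp/INSTR/1_Pooling/config.json")
```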
Hope this works for you!
Alternatively, you could edit Pooling.py in your installed copy of sentence-transformers, as suggested in the linked comment: https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453
Thank you, it worked for me!
This issue can be solved by updating sentence-transformers (e.g. pip install -U sentence-transformers).
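After upgrading (and restarting the runtime), one way to confirm the fix is to check whether the installed Pooling constructor accepts the keyword from the error. A sketch using inspect.signature; accepts_kwarg is a hypothetical helper, and it assumes the Pooling in the error is sentence_transformers.models.Pooling.

```python
import inspect


def accepts_kwarg(cls, name):
    """True if cls(...) accepts keyword `name`, explicitly or via **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


try:
    from sentence_transformers.models import Pooling
except ImportError:
    Pooling = None  # sentence-transformers is not installed in this environment

if Pooling is not None:
    print("weightedmean supported:",
          accepts_kwarg(Pooling, "pooling_mode_weightedmean_tokens"))
```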