hkunlp/instructor-base

Issues when using in Colab & Sentence Transformers

by Vvkishere - opened Jan 13, 2023

I get the following error when trying to load the model using sentence-transformers in a Google Colab Pro notebook:

TypeError: __init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

I am not sure how to resolve the issue.

multi-train

NLP Group of The University of Hong Kong org Jan 13, 2023

Thanks for your question!

You may try installing the customized sentence-transformers from https://github.com/HKUNLP/instructor-embedding/tree/main/sentence-transformers, and use transformers 4.20.0.

Feel free to leave further questions!

Vvkishere

Jan 14, 2023

How does one go about doing that specifically? What is the command one must run?

multi-train

NLP Group of The University of Hong Kong org Jan 14, 2023

Thanks for the question!

Specifically, you may first clone the repository:

git clone https://github.com/HKUNLP/instructor-embedding

Then go to the sentence-transformers folder:

cd instructor-embedding/sentence-transformers

Finally you will be able to install the customized package:

pip install -e .

Feel free to leave your further questions here.

yahma

Mar 26, 2023

Has this been integrated into the main Hugging Face sentence-transformers yet?

multi-train

NLP Group of The University of Hong Kong org Mar 27, 2023

No, because we have overwritten several classes in the sentence-transformers library to incorporate instructions.

zeph3152

Jul 4, 2023

Hi, I cannot find the path to sentence_transformers

multi-train

NLP Group of The University of Hong Kong org Jul 4, 2023

Hi, you may want to install sentence-transformers via pip install sentence-transformers.

Dujma

Jul 4, 2023

edited Jul 4, 2023

There aren't many results on Google related to this issue except this thread. I have sentence-transformers installed and I'm still getting the error from the original post.

TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

Here's the script that I've used:

from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR('hkunlp/instructor-xl')

pdf_path = "./document.pdf"
loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()

embeddings = HuggingFaceEmbeddings(model_name="hkunlp/instructor-xl")

db = Chroma.from_documents(documents=pages, embedding=embeddings, persist_directory="./chroma_db")
db.persist()

Any idea how to solve this? Thanks!

EDIT: It works fine with "sentence-transformers/all-MiniLM-L6-v2" model for example.
EDIT 2: This seems to work https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453

multi-train

NLP Group of The University of Hong Kong org Jul 6, 2023

You may try to install sentence-transformers 2.2.2.
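Before pinning or upgrading, it can help to confirm which versions are actually installed in the notebook. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and the distribution names listed are assumptions about what is installed in your environment.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your environment.
    for name in ("sentence-transformers", "transformers", "InstructorEmbedding"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

In Colab, remember to restart the runtime after changing a package version, otherwise the previously imported version may stay loaded.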

karim0

Jul 18, 2023

"You may try to install sentence-transformers 2.2.2."

Despite trying that, I still get the exact same error.
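If the error persists, one way to narrow it down is to compare the keys in the model's pooling config with the parameters the installed Pooling class accepts. As far as I can tell, sentence-transformers passes the entries of 1_Pooling/config.json to the Pooling constructor as keyword arguments, so any key the constructor does not know raises this TypeError. The sketch below illustrates the check; unsupported_keys and OldPooling are hypothetical names, and OldPooling is only a stand-in for an older constructor, not the real library code.

```python
import inspect


def unsupported_keys(cls, config):
    """Return config keys that cls.__init__ would reject as unexpected keyword arguments."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs parameter means every key is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [key for key in config if key not in params]


class OldPooling:
    """Hypothetical stand-in for an older Pooling class (not the real library code)."""

    def __init__(self, word_embedding_dimension, pooling_mode_cls_token=False,
                 pooling_mode_mean_tokens=True, pooling_mode_max_tokens=False,
                 pooling_mode_mean_sqrt_len_tokens=False):
        self.word_embedding_dimension = word_embedding_dimension


# Keys copied from the pooling config quoted in this thread.
config = {
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": False,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
    "pooling_mode_weightedmean_tokens": False,
    "pooling_mode_lasttoken": False,
}

print(unsupported_keys(OldPooling, config))
# → ['pooling_mode_weightedmean_tokens', 'pooling_mode_lasttoken']
```

To run the same check against your own installation, pass sentence_transformers.models.Pooling instead of the stand-in class, along with the config loaded from the model's 1_Pooling/config.json.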

bobse

Jul 24, 2023

edited Jul 24, 2023

Here's my hack to solve it (until there's an official fix):

Pick your instructor model. Below, INSTR stands for one of instructor-xl, instructor-large, or instructor-base.

Edit the pooling config file in models/hkunlp/INSTR/1_Pooling/config.json.
Remove the offending lines with "pooling_mode_weightedmean_tokens" and "pooling_mode_lasttoken".
Change this:
{
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false,
"pooling_mode_weightedmean_tokens": false,
"pooling_mode_lasttoken": false
}
to this:
{
"word_embedding_dimension": 768,
"pooling_mode_cls_token": false,
"pooling_mode_mean_tokens": true,
"pooling_mode_max_tokens": false,
"pooling_mode_mean_sqrt_len_tokens": false
}
Remember to also remove the "," at the end of the last remaining line, that is, change
"pooling_mode_mean_sqrt_len_tokens": false,
to
"pooling_mode_mean_sqrt_len_tokens": false

Hope this works for you!
Or you could edit the Pooling.py in your installed version of sentence-transformers, as the original author suggested: https://github.com/Muennighoff/sgpt/issues/14#issuecomment-1405205453
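
The config edit described above can also be scripted, which avoids the trailing-comma mistake because the file is rewritten by a JSON serializer. This is a minimal sketch, not an official fix: the function name strip_pooling_keys and the ".bak" backup suffix are hypothetical choices, and the path in the usage note is the one from this post, which may differ on your machine.

```python
import json
from pathlib import Path

# The two keys that older Pooling constructors reject, per this thread.
UNSUPPORTED_KEYS = ("pooling_mode_weightedmean_tokens", "pooling_mode_lasttoken")


def strip_pooling_keys(config_path, keys=UNSUPPORTED_KEYS):
    """Remove the given keys from a pooling config.json and return those removed."""
    path = Path(config_path)
    original_text = path.read_text(encoding="utf-8")
    config = json.loads(original_text)

    removed = [key for key in keys if key in config]
    if removed:
        # Keep a copy of the original file next to it before rewriting.
        path.with_name(path.name + ".bak").write_text(original_text, encoding="utf-8")
        for key in removed:
            del config[key]
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return removed
```

For example, strip_pooling_keys("models/hkunlp/instructor-base/1_Pooling/config.json") would return the list of keys it removed, or an empty list if the file was already clean.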

saikrishnaKas

Aug 22, 2023

Thank you, it worked for me.

shrijayan

Feb 26

This issue can be solved by updating sentence-transformers.
