Input Length
Hi, I'm new to this model. I'm trying to do sentiment analysis on a Japanese text, and I'm getting the following error:
Input is too long, try to truncate or use a parameter to handle this: The size of tensor a (534) must match the size of tensor b (512) at non-singleton dimension 1
Is there a way to increase the input length temporarily through parameters?
Hi,
We can't temporarily increase the model's sequence length (the maximum input length for this DistilBERT model is 512 tokens).
The easiest solution is to truncate longer sequences. Here's a code snippet that demonstrates this approach for sentiment analysis:
fn_kwargs={"padding": "max_length", "truncation": True, "max_length": 512}
distilled_student_sentiment_classifier = pipeline(
model="lxyuan/distilbert-base-multilingual-cased-sentiments-student",
return_all_scores=True
)
output = distilled_student_sentiment_classifier(jpn_article, **fn_kwargs)
I haven't had the chance to run this code yet, so please let me know if you encounter any issues or errors while executing it.
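For reference, with return_all_scores=True the pipeline should return a list of per-label score dicts for each input, roughly like this (the scores below are illustrative, not real results):

# [[{'label': 'positive', 'score': 0.98},
#   {'label': 'neutral', 'score': 0.01},
#   {'label': 'negative', 'score': 0.01}]]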
I'm running it through REST API requests. Is there a way to add these parameters to the headers of the request?
Sir, when I checked the API using Postman, it shows:

{
  "error": "You need to specify either `text` or `text_target`.",
  "warnings": [
    "There was an inference error: You need to specify either `text` or `text_target`."
  ]
}

Am I not supposed to give the input in JSON format?
Could you please share your code with me? It would make it easier to assist with debugging.
model = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
hf_token = "your token from env file"
API_URL = "https://api-inference.huggingface.co/models/" + model
headers = {"Authorization": "Bearer %s" % (hf_token)}
async def analysis(session, data, index):
default = [[{'label': 'negative', 'score': 999},
{'label': 'neutral', 'score': 999},
{'label': 'positive', 'score': 999}]] #replace with empty value
payload = dict(inputs=data, options=dict(wait_for_model=True))
async with session.post(API_URL, headers=headers, json=payload) as response:
if response.status != 200:
print('found an error', response)
if response.status == 400:
print('input length error >> ', index)
try:
return await response.json()
except:
print('broken', index)
response = default
return response
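For completeness, here is a minimal sketch of how this coroutine might be driven with asyncio (the sample texts and the main() wrapper are my assumptions, not part of the original code):

import asyncio
import aiohttp

async def main():
    texts = ["とても良い商品でした。", "最悪の体験でした。"]  # hypothetical sample inputs
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(analysis(session, text, i) for i, text in enumerate(texts))
        )
    print(results)

asyncio.run(main())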
Strangely, it seems we can't define the truncation or max_length parameters through the Hugging Face Inference API. One potential workaround, though it might be slower, is to preprocess the text with the Hugging Face tokenizer, truncating it to the 512-token limit, before passing it to the API.
Reference:
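Something along these lines, as a rough sketch (the helper name truncate_to_model_max is mine; adjust to your setup):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
)

def truncate_to_model_max(text, max_length=510):
    # Truncate to the model limit, leaving room for the [CLS]/[SEP] special tokens
    ids = tokenizer.encode(text, add_special_tokens=False,
                           truncation=True, max_length=max_length)
    return tokenizer.decode(ids)

safe_text = truncate_to_model_max(jpn_article)  # then send safe_text to the API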
Thanks for the suggestion. I ended up using NLTK to tokenise the text and remove stop words before feeding it to the Hugging Face API.
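For anyone landing here later, a rough sketch of that preprocessing step (note that NLTK's stopword corpus doesn't include Japanese, so English is used here for illustration; Japanese text would need a dedicated tokenizer):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt")
nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def shorten(text):
    # Drop stop words to reduce the token count before calling the API
    tokens = word_tokenize(text)
    return " ".join(t for t in tokens if t.lower() not in stop_words)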