Using Gibberish-Detector
Hi madhurjindal,
I tried to use your code to detect gibberish, but the output I received seemed strange and I didn't understand it. What I did was the following:
- installed transformers, etc.
- from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
tokenizer = AutoTokenizer.from_pretrained("madhurjindal/autonlp-Gibberish-Detector-492513457")
- inputs = tokenizer("I like apples.", return_tensors="pt")
outputs = model(**inputs)
outputs
The output is: SequenceClassifierOutput(loss=None, logits=tensor([[ 2.0615, 0.1996, -2.1773, -0.5643]], grad_fn=), hidden_states=None, attentions=None)
Now, if I understood correctly, the tensor gives the weights/numbers for [clean, mild gibberish, word salad, noise]? What I don't understand is the meaning of the positive and negative signs and the magnitude of each number. Also, I wanted to ask whether this code can identify a gibberish word within a sentence or document, instead of producing general numbers for the full sentence.
Thank you in advance for your reply.
Kind regards,
M
Well, the output is the pre-softmax output (logits), so its range is not fixed. A larger logit means the model favors that class more; a negative value is just a low score and has no separate meaning. Apply the softmax function on top of the output to convert the logits to probabilities in the range (0, 1), then select the class with the highest probability. Hopefully this helps.
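To make the softmax step concrete, here is a minimal sketch in plain Python that uses the logits posted in the question. The `softmax` helper and the variable names are illustrative, not part of the model's code. With the loaded model, the equivalent call would be `torch.softmax(outputs.logits, dim=-1)`, and the class names are best read from `model.config.id2label` rather than assumed from the order guessed in the question.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    # Subtract the maximum first for numerical stability; the result is unchanged.
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Logits copied from the output posted in the question.
logits = [2.0615, 0.1996, -2.1773, -0.5643]

probs = softmax(logits)

# Index of the most probable class. Map it to a name with
# model.config.id2label when the model is loaded (label order is
# not assumed here).
best_index = max(range(len(probs)), key=lambda i: probs[i])

print([round(p, 3) for p in probs])
print("predicted class index:", best_index)
```

For this input the first class gets roughly 80% of the probability mass, so it is the predicted class for "I like apples."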