TuPy-Bert-Large-Binary-Classifier / README.md

Update README.md (victoriadreis, commit 10df29f, 11 months ago)


	---
	license: mit
	datasets:
	- Silly-Machine/TuPyE-Dataset
	language:
	- pt

	pipeline_tag: text-classification
	base_model: neuralmind/bert-large-portuguese-cased
	widget:
	- text: 'Bom dia, flor do dia!!'

	model-index:
	- name: TuPy-Bert-Large-Binary-Classifier
	  results:
	  - task:
	      type: text-classification
	    dataset:
	      name: TuPyE-Dataset
	      type: Silly-Machine/TuPyE-Dataset
	    metrics:
	    - type: accuracy
	      value: 0.907
	      name: Accuracy
	      verified: true
	    - type: f1
	      value: 0.903
	      name: F1-score
	      verified: true
	    - type: precision
	      value: 0.901
	      name: Precision
	      verified: true
	    - type: recall
	      value: 0.907
	      name: Recall
	      verified: true
	---

	## Introduction


	TuPy-Bert-Large-Binary-Classifier is a fine-tuned BERT model for binary classification of hate speech in Portuguese: each input text is labeled as hate or not hate.
	It is derived from [BERTimbau Large](https://huggingface.co/neuralmind/bert-large-portuguese-cased).
	For more details on the base model, see the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).

	The performance of language models can vary notably when the training and test data come from different domains.
	To build a Portuguese language model specialized for hate speech classification,
	the original BERTimbau model was fine-tuned on
	the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), which is sourced from diverse social networks.

	## Available models

	| Model | Arch. | #Layers | #Params |
	| ---------------------------------------- | ---------- | ------- | ------- |
	| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier` | BERT-Base | 12 | 109M |
	| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24 | 334M |
	| `Silly-Machine/TuPy-Bert-Base-Multilabel` | BERT-Base | 12 | 109M |
	| `Silly-Machine/TuPy-Bert-Large-Multilabel` | BERT-Large | 24 | 334M |
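
When a script needs to switch between these checkpoints, a small lookup keyed by size and task can keep the names in one place. The sketch below is only an illustration: the checkpoint names are copied from the table above, while `CHECKPOINTS`, `pick_checkpoint`, and the key strings (`"base"`, `"large"`, `"binary"`, `"multilabel"`) are hypothetical and not part of any library.

```python
# Checkpoint names copied from the table above.
# The (size, task) keys are illustrative assumptions, not an official API.
CHECKPOINTS = {
    ("base", "binary"): "Silly-Machine/TuPy-Bert-Base-Binary-Classifier",
    ("large", "binary"): "Silly-Machine/TuPy-Bert-Large-Binary-Classifier",
    ("base", "multilabel"): "Silly-Machine/TuPy-Bert-Base-Multilabel",
    ("large", "multilabel"): "Silly-Machine/TuPy-Bert-Large-Multilabel",
}


def pick_checkpoint(size: str, task: str) -> str:
    """Hypothetical helper: return the Hub name for a (size, task) pair."""
    key = (size.lower(), task.lower())
    if key not in CHECKPOINTS:
        raise ValueError(f"No checkpoint for size={size!r}, task={task!r}")
    return CHECKPOINTS[key]


model_name = pick_checkpoint("large", "binary")
```

The returned string can be passed to `from_pretrained` in the same way as a hard-coded model name.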

	## Example usage

	```python
	from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
	import torch
	import numpy as np
	from scipy.special import softmax

	def classify_hate_speech(model_name, text):
	    # Load the fine-tuned model, its tokenizer, and its config (for label names)
	    model = AutoModelForSequenceClassification.from_pretrained(model_name)
	    tokenizer = AutoTokenizer.from_pretrained(model_name)
	    config = AutoConfig.from_pretrained(model_name)

	    # Tokenize input text and prepare model input;
	    # truncation keeps long texts within the model's maximum length
	    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")

	    # Get model output scores without tracking gradients
	    with torch.no_grad():
	        output = model(**model_input)

	    # Convert logits to probabilities and rank labels from most to least likely
	    scores = softmax(output.logits.numpy(), axis=1)
	    ranking = np.argsort(scores[0])[::-1]

	    # Print the results
	    for i, rank in enumerate(ranking):
	        label = config.id2label[int(rank)]
	        score = scores[0, rank]
	        print(f"{i + 1}) Label: {label} Score: {score:.4f}")

	# Example usage
	model_name = "Silly-Machine/TuPy-Bert-Large-Binary-Classifier"
	text = "Bom dia, flor do dia!!"
	classify_hate_speech(model_name, text)

	```
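
The post-processing step in the example (softmax over the logits, then ranking labels by score) can also be looked at in isolation. The following is a minimal sketch in plain Python, assuming a single row of logits; `rank_labels`, the logit values, and the label names are hypothetical and chosen only for illustration. The real label names come from the loaded model's `config.id2label`.

```python
import math


def rank_labels(logits, id2label):
    """Apply softmax to one row of logits and return (label, score) pairs,
    sorted from highest to lowest score."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    scores = [e / total for e in exps]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(id2label[i], scores[i]) for i in order]


# Hypothetical logits and label names, for illustration only.
example = rank_labels([2.0, 0.0], {0: "not hate", 1: "hate"})
for position, (label, score) in enumerate(example, start=1):
    print(f"{position}) Label: {label} Score: {score:.4f}")
```

Because the scores come from a softmax, they sum to 1 across the labels, so the top score can be read as the model's relative confidence between the two classes.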