---
license: mit
language:
- ru
pipeline_tag: text-classification
tags:
- safetensors
- text-classification
- tensorflow
- russian
library_name: tf-keras
widget:
- text: Мне нравится этот фильм!
  output:
  - label: POSITIVE
    score: 0.98
  - label: NEGATIVE
    score: 0.02
- text: Какой же ты идиот..
  output:
  - label: POSITIVE
    score: 0.01
  - label: NEGATIVE
    score: 0.99
- text: Паша, купи уже TURMS
  output:
  - label: POSITIVE
    score: 0.82
  - label: NEGATIVE
    score: 0.18
- text: Дp пошtл ты, идиот
  output:
  - label: POSITIVE
    score: 0.01
  - label: NEGATIVE
    score: 0.99
---
#### 1D-CNN-MC-toxicity-classifier-ru
(One-Dimensional Convolutional Neural Network with Multi-Channel input)
Architecture visualization:
![](https://i.imgur.com/skbLM6w.png)
Total parameters: 503249
##### Test Accuracy: 94.44%
##### Training Accuracy: 97.46%
This model was developed for binary toxicity classification of Cyrillic text.
##### A dataset of 75093 negative rows and 75093 positive rows was used for training.
##### Recommended length of the input sequence: 25 - 400 Cyrillic characters.
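A minimal sketch of how the recommended length could be checked before calling the model. The helper names `count_cyrillic` and `in_recommended_range` are hypothetical, and the sketch assumes that "Cyrillic characters" means Russian letters only (spaces and punctuation are not counted); the model card does not specify this.

```python
import re

# Bounds taken from the recommendation above; the helpers are hypothetical.
MIN_CHARS = 25
MAX_CHARS = 400


def count_cyrillic(text: str) -> int:
    """Count Russian Cyrillic letters (а-я and ё, in either case)."""
    return len(re.findall(r"[а-яё]", text.lower()))


def in_recommended_range(text: str) -> bool:
    """Return True if the Cyrillic letter count lies within 25..400."""
    return MIN_CHARS <= count_cyrillic(text) <= MAX_CHARS
```

Inputs outside this range can still be passed to the model; the check only flags texts for which the reported accuracy may not hold.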
##### Simplifications of the dataset strings:
- Removing extra spaces.
- Converting uppercase letters to lowercase (Я -> я).
- Removing all non-Cyrillic characters, including prefixes (for example: z, !, ., #, 4, &).
- Replacing ё with е.
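The simplifications listed above can be sketched as a single function. This is an illustration, not the author's preprocessing code: the function name `simplify` is hypothetical, and the order of the steps and the choice to delete removed characters (rather than replace them with spaces) are assumptions.

```python
import re


def simplify(text: str) -> str:
    """Apply the listed simplifications to one string (assumed order)."""
    text = text.lower()                    # Я -> я
    text = text.replace("ё", "е")          # ё -> е (Cyrillic е)
    text = re.sub(r"\s+", " ", text)       # tabs and newlines become spaces
    text = re.sub(r"[^а-я ]", "", text)    # drop everything except а-я and space
    text = re.sub(r" +", " ", text)        # collapse repeated spaces
    return text.strip()


print(simplify("Мне нравится этот фильм!"))
```

The model card does not say whether the same simplifications should be applied at inference time; applying them before tokenization keeps inputs closer to the training data, but that is a judgment call.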
### Example of use:
import os
import re

import numpy as np
from safetensors.numpy import load_file
from tensorflow import keras
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import tokenizer_from_json

# Name of the folder where the model is stored
model_dir = 'model'
max_len = 400

# Load the model architecture
with open(os.path.join(model_dir, 'model_architecture.json'), 'r', encoding='utf-8') as json_file:
    model_json = json_file.read()
model = keras.models.model_from_json(model_json)

# Load the weights from safetensors
state_dict = load_file(os.path.join(model_dir, 'tf_model.safetensors'))
weights = [state_dict[f'weight_{i}'] for i in range(len(state_dict))]
model.set_weights(weights)

# Load the tokenizer
with open(os.path.join(model_dir, 'tokenizer.json'), 'r', encoding='utf-8') as f:
    tokenizer_json = f.read()
tokenizer = tokenizer_from_json(tokenizer_json)

def predict_toxicity(text):
    # Tokenize, pad or truncate to max_len, and read the single output probability
    sequences = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(sequences, maxlen=max_len, padding='post', truncating='post')
    probability = model.predict(padded)[0][0]
    class_label = "toxic" if probability >= 0.5 else "normal"
    return class_label, probability

# Usage example
text = "Да какой идиот сделал эту НС?"
class_label, probability = predict_toxicity(text)
print(f"Text: {text}")
print(f"Class: {class_label} ({probability:.2%})")
###### Output:
Text: Да какой идиот сделал эту НС?
Class: toxic (99.35%)
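The widget examples in the metadata report a POSITIVE and a NEGATIVE score, while `predict_toxicity` returns a single probability. A minimal sketch of a conversion between the two, assuming that NEGATIVE corresponds to the toxic class (consistent with the widget examples) and that the POSITIVE score is the complement of the toxic probability. The function name `to_label_scores` is hypothetical.

```python
def to_label_scores(probability):
    """Convert a toxic-class probability into widget-style label scores."""
    p = float(probability)  # also accepts NumPy scalars such as np.float32
    return [
        {"label": "POSITIVE", "score": round(1.0 - p, 2)},  # assumed: non-toxic
        {"label": "NEGATIVE", "score": round(p, 2)},        # assumed: toxic
    ]


print(to_label_scores(0.99))
```

With the example above, the `probability` value returned by `predict_toxicity` could be passed to this helper directly.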