---
license: apache-2.0
datasets:
- rigonsallauka/polish_ner_dataset
language:
- pl
metrics:
- f1
- recall
- precision
- confusion_matrix
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
tags:
- NER
- medical
- extraction
- symptom
- polish
---
Polish Medical NER
Use
- Primary Use Case: This model is designed to extract medical entities such as symptoms, diagnostic tests, and treatments from clinical text in the Polish language.
- Applications: Intended for healthcare professionals, and for clinical data analysis and research on medical text processing.
- Supported Entity Types:
  - PROBLEM: Diseases, symptoms, and medical conditions.
  - TEST: Diagnostic procedures and laboratory tests.
  - TREATMENT: Medications, therapies, and other medical interventions.
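The card lists three entity types but does not state which tagging scheme the model uses. Token-classification models commonly use BIO tags, where `B-` marks the first token of an entity, `I-` marks a continuation, and `O` marks everything else. The sketch below assumes that scheme and builds the label list together with the `label2id`/`id2label` maps a token-classification head needs. The helper name `build_bio_labels` is hypothetical. The authoritative label names are the ones stored in the checkpoint's `model.config.id2label`, so check those before relying on this list.

```python
ENTITY_TYPES = ["PROBLEM", "TEST", "TREATMENT"]

def build_bio_labels(entity_types):
    """Return an assumed BIO label list: "O" followed by B-/I- tags per entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels

labels = build_bio_labels(ENTITY_TYPES)
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(labels)
# ['O', 'B-PROBLEM', 'I-PROBLEM', 'B-TEST', 'I-TEST', 'B-TREATMENT', 'I-TREATMENT']
```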
Training Data
- Data Sources: Annotated datasets, including clinical data and translations of English medical text into Polish.
- Data Augmentation: Data augmentation was applied to the training dataset to improve the model's ability to generalize to different text structures.
- Dataset Split:
  - Training Set: 80%
  - Validation Set: 10%
  - Test Set: 10%
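The card gives the split proportions but not the procedure, for example whether it is random and at what granularity. Here is a minimal sketch of an 80/10/10 split. It assumes a random, sentence-level shuffle with a fixed seed, which keeps every sentence's tokens in a single partition. The function `split_dataset` and the seed value are illustrative, not taken from the original pipeline.

```python
import random

def split_dataset(examples, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists.

    The test set receives whatever remains after the train and validation cuts.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

sentences = [f"sentence {i}" for i in range(100)]
train, val, test = split_dataset(sentences)
print(len(train), len(val), len(test))  # 80 10 10
```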
Model Training
- Training Configuration:
  - Optimizer: AdamW
  - Learning Rate: 3e-5
  - Batch Size: 64
  - Epochs: 200
  - Loss Function: Focal Loss to handle class imbalance
- Frameworks: PyTorch, Hugging Face Transformers, SimpleTransformers
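The card says focal loss was used to handle class imbalance, but it does not give the focusing parameter γ, any class weights α, or the implementation. The sketch below shows the standard multi-class form from Lin et al. (2017), FL(p_t) = -(1 - p_t)^γ · log(p_t). The factor (1 - p_t)^γ shrinks the loss on tokens the model already classifies confidently, which are mostly the abundant `O` tokens in NER. The sketch makes three assumptions: γ = 2.0, no α weighting, and the label -100 marking tokens to skip (the ignore value used by Hugging Face token-classification code). It is written in plain Python for readability; a training loop would compute the same quantity with PyTorch tensors.

```python
import math

IGNORE_INDEX = -100  # assumed label for special/sub-word tokens that carry no loss

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]

def focal_loss(token_logits, labels, gamma=2.0):
    """Mean multi-class focal loss over tokens whose label is not IGNORE_INDEX.

    For each token, p_t is the predicted probability of the true class and the
    loss is -(1 - p_t) ** gamma * log(p_t). With gamma = 0 it equals cross-entropy.
    """
    losses = []
    for logits, label in zip(token_logits, labels):
        if label == IGNORE_INDEX:
            continue
        log_p_t = log_softmax(logits)[label]
        p_t = math.exp(log_p_t)
        losses.append(-((1.0 - p_t) ** gamma) * log_p_t)
    return sum(losses) / len(losses) if losses else 0.0

# Toy batch: three tokens, three classes; the middle token is ignored.
token_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3], [0.0, 3.0, 0.0]]
labels = [0, IGNORE_INDEX, 1]

ce = focal_loss(token_logits, labels, gamma=0.0)  # plain cross-entropy
fl = focal_loss(token_logits, labels, gamma=2.0)  # confident tokens down-weighted
print(ce, fl)
```

Setting `gamma=0.0` recovers ordinary cross-entropy, which makes it easy to check the implementation against a known baseline before tuning γ.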
Evaluation metrics
- eval_loss = 0.3968946770636102
- f1_score = 0.7556232119891866
- precision = 0.7552069671056083
- recall = 0.7560399159663865
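As a sanity check, the reported F1 score matches the harmonic mean of the reported precision and recall to floating-point precision, which is how micro-averaged F1 is defined. This short calculation uses only the three numbers above:

```python
precision = 0.7552069671056083
recall = 0.7560399159663865

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.6f}")  # 0.755623
```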
For more information, see the HUMADEX/Weekly-Supervised-NER-pipline repository.
How to Use
You can use this model with the Hugging Face transformers library. Here is an example of how to load the model and use it for inference:
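```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
model_name = "rigonsallauka/polish_medical_ner"
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
# Sample text for inference (English: "The patient complained of severe headaches and nausea that lasted
# two days. To relieve the symptoms, he was prescribed paracetamol and advised to rest and drink plenty of fluids.")
text = "Pacjent skarżył się na silne bóle głowy i nudności, które utrzymywały się przez dwa dni. W celu złagodzenia objawów przepisano mu paracetamol oraz zalecono odpoczynek i picie dużej ilości płynów."
# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")
# Run the model without gradients and take the highest-scoring label for each sub-word token
with torch.no_grad():
    logits = model(**inputs).logits
predicted_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, label_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[label_id])
```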
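Per-token predictions usually need to be merged into entity spans. The transformers library can do this directly through `pipeline("token-classification", model=model_name, aggregation_strategy="simple")`. The sketch below shows the underlying idea, assuming three things: the checkpoint emits BIO-style labels (for example `B-PROBLEM`, `I-PROBLEM`, `O`; confirm against `model.config.id2label`), BERT WordPiece continuation pieces start with `##`, and the label on a word's first piece decides that word's tag. The function `group_entities` is hypothetical, and the labels in the demo are invented for illustration, not real model predictions.

```python
def group_entities(tokens, labels):
    """Merge WordPiece tokens tagged with BIO labels into (entity_type, text) spans.

    "##"-prefixed pieces are glued onto whatever entity is open, so each word
    follows the label of its first piece. Special tokens should be labelled "O".
    """
    entities = []
    current_type, current_text = None, ""
    for token, label in zip(tokens, labels):
        if token.startswith("##"):
            if current_type is not None:
                current_text += token[2:]  # continue the current word inside the entity
            continue
        if label.startswith("B-") or (label.startswith("I-") and label[2:] != current_type):
            # A new entity starts (an I- tag of a different type is treated as a start).
            if current_type is not None:
                entities.append((current_type, current_text))
            current_type, current_text = label[2:], token
        elif label.startswith("I-"):
            current_text += " " + token  # next word of the same entity
        else:
            # "O" closes any open entity.
            if current_type is not None:
                entities.append((current_type, current_text))
            current_type, current_text = None, ""
    if current_type is not None:
        entities.append((current_type, current_text))
    return entities

# Invented tokens and labels, only to illustrate the merging logic.
tokens = ["[CLS]", "Pacjent", "ma", "silne", "b", "##óle", "głowy", "i",
          "para", "##ceta", "##mol", "[SEP]"]
labels = ["O", "O", "O", "B-PROBLEM", "I-PROBLEM", "I-PROBLEM", "I-PROBLEM", "O",
          "B-TREATMENT", "I-TREATMENT", "I-TREATMENT", "O"]
print(group_entities(tokens, labels))
# [('PROBLEM', 'silne bóle głowy'), ('TREATMENT', 'paracetamol')]
```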