SMM4H-2024 Task 2 Japanese NER
Overview
This is a named entity extraction model created by fine-tuning daisaku-s/medtxt_ner_roberta on the SMM4H 2024 Task 2a corpus.
Tag set (IOB2 format):
- DRUG
- DISORDER
- FUNCTION
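In IOB2, each entity type gets a `B-` (begin) tag and an `I-` (inside) tag, and all tokens outside an entity share a single `O` tag. The sketch below expands the three types into that label inventory. The helper name `iob2_labels` and the label order are illustrative assumptions; the authoritative mapping is whatever the checkpoint stores in `model.config.id2label`.

```python
# Illustrative only: expand entity types into an IOB2 label inventory.
# The real label names and their order come from the checkpoint's config.
ENTITY_TYPES = ["DRUG", "DISORDER", "FUNCTION"]


def iob2_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


LABELS = iob2_labels(ENTITY_TYPES)
print(LABELS)
```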
Usage
from transformers import BertForTokenClassification, AutoTokenizer
import torch
text = "サンプルテキスト"
model_name = "yseop/SMM4H2024_Task2a_ja"
with torch.inference_mode():
    # Load the fine-tuned model and its tokenizer
    model = BertForTokenClassification.from_pretrained(model_name).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    idx2tag = model.config.id2label
    vecs = tokenizer(text,
                     padding=True,
                     truncation=True,
                     return_tensors="pt")
    ner_logits = model(input_ids=vecs["input_ids"],
                       attention_mask=vecs["attention_mask"])
    # Highest-scoring label id per token, for the single input sentence
    idx = torch.argmax(ner_logits.logits, dim=2).detach().cpu().numpy().tolist()[0]
    # Drop the special tokens at both ends ([1:-1]) so tokens and tags align
    token = [tokenizer.convert_ids_to_tokens(v) for v in vecs["input_ids"]][0][1:-1]
    pred_tag = [idx2tag[x] for x in idx][1:-1]
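The snippet above leaves the result as two parallel lists, `token` and `pred_tag`. One way to turn them into entity spans is sketched below. It is not part of the original model card: the function name `group_entities` is hypothetical, it assumes tags of the form `B-DRUG` / `I-DRUG` / `O`, and it assumes WordPiece-style `##` continuation markers on subword tokens. The sample input is hand-written for illustration and is not model output.

```python
def group_entities(tokens, tags):
    """Merge IOB2-tagged tokens into (entity_type, text) pairs.

    Assumes tags look like "B-DRUG", "I-DRUG" or "O", and that subword
    continuations carry a "##" prefix (both are assumptions).
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.append((current_type, "".join(current_tokens)))
        current_type, current_tokens = None, []

    for tok, tag in zip(tokens, tags):
        piece = tok[2:] if tok.startswith("##") else tok
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [piece]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(piece)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the current span and skip this token.
            flush()
    flush()
    return entities


# Hand-written example input (not produced by the model).
sample_tokens = ["頭", "##痛", "に", "ロキソニン", "を", "飲ん", "だ"]
sample_tags = ["B-DISORDER", "I-DISORDER", "O", "B-DRUG", "O", "O", "O"]
print(group_entities(sample_tokens, sample_tags))
```

With the variables from the usage snippet, the call would be `group_entities(token, pred_tag)`. Dropping stray `I-` tags is a design choice in this sketch; a more lenient decoder could start a new entity there instead.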
Results
Entity type | TP | FP | FN | Precision | Recall | F1 |
---|---|---|---|---|---|---|
DISORDER | 588 | 409 | 330 | 0.5898 | 0.6405 | 0.6141 |
DRUG | 307 | 143 | 169 | 0.6822 | 0.6450 | 0.6631 |
FUNCTION | 69 | 160 | 170 | 0.3013 | 0.2887 | 0.2949 |
all | 964 | 712 | 669 | 0.5752 | 0.5903 | 0.5827 |
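The precision, recall, and F1 columns follow directly from the true positive, false positive, and false negative counts, and the "all" row is the sum of the per-type counts (a micro-average). The short sketch below recomputes the scores from the counts in the table; the helper name `prf` is illustrative.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# (tp, fp, fn) per entity type, copied from the table above.
counts = {
    "DISORDER": (588, 409, 330),
    "DRUG": (307, 143, 169),
    "FUNCTION": (69, 160, 170),
}

# Micro-average: sum the counts first, then compute the scores.
total = tuple(sum(c[i] for c in counts.values()) for i in range(3))

for name, c in {**counts, "all": total}.items():
    p, r, f = prf(*c)
    print(f"{name:9s} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```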