Japanese Named Entity Recognition (NER)

This model uses XLM-RoBERTa (xlm-roberta-base) to recognize named entities in Japanese text, such as persons, organizations, locations, and other categories. It can be used for a variety of tasks that require entity extraction from Japanese documents or conversations.

Overview
NER Tags
Model Details
Sample Input and Output
Visualization with Gradio and spaCy
Model Performance Metrics

Overview

Named Entity Recognition (NER) is a core task in natural language processing (NLP): it identifies spans of text that name entities and classifies them into categories. This model performs NER on Japanese text, which makes it suitable for applications such as document analysis, chatbots, and information retrieval in Japanese.

NER Tags

The model identifies the following tags:

Class ID	Tag	Description
0	O	Outside any entity
1	PER	Person names
2	ORG	Organizations
3	ORG-P	Political organizations
4	ORG-O	Other organizations
5	LOC	Locations
6	INS	Institutions
7	PRD	Products
8	EVT	Events
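The table can be written as an ID-to-tag mapping for decoding the model's predicted class IDs. This is a minimal sketch that assumes the order in the table matches the model's config.id2label; in practice, reading the mapping from the model configuration is safer than hard-coding it. The names ID2LABEL and decode_ids are hypothetical.

```python
# ID-to-tag mapping transcribed from the table above.
# Assumption: this order matches the model's config.id2label.
ID2LABEL = {
    0: "O",      # Outside any entity
    1: "PER",    # Person names
    2: "ORG",    # Organizations
    3: "ORG-P",  # Political organizations
    4: "ORG-O",  # Other organizations
    5: "LOC",    # Locations
    6: "INS",    # Institutions
    7: "PRD",    # Products
    8: "EVT",    # Events
}
# Reverse lookup, e.g. for building training labels.
LABEL2ID = {tag: idx for idx, tag in ID2LABEL.items()}


def decode_ids(class_ids):
    """Convert a sequence of predicted class IDs into tag strings."""
    return [ID2LABEL[i] for i in class_ids]


if __name__ == "__main__":
    # IDs matching the tags of the sample sentence shown later in this README.
    print(decode_ids([5, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0]))
```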

Model Details

Base Model: xlm-roberta-base
Task: Token Classification (NER)
Language: Japanese
Input: Japanese text
Output: Tokenized text with NER tags
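A minimal inference sketch using the Hugging Face transformers token-classification pipeline. It assumes the model is available on the Hub as sabaridsnfuji/xlm-roberta-name-entity-recognition-japanese and that its configuration stores tag names such as LOC (rather than generic LABEL_n names) in id2label. The helpers summarize, get_ner_pipeline, and extract_entities are hypothetical names introduced for illustration.

```python
from functools import lru_cache

# Hub ID of this model (assumed from the repository name).
MODEL_ID = "sabaridsnfuji/xlm-roberta-name-entity-recognition-japanese"


def summarize(text, entities):
    """Convert aggregated pipeline output into (span, tag, score) tuples.

    Spans are sliced from the input with the 'start'/'end' character offsets,
    which avoids tokenizer artifacts that can appear in the 'word' field.
    """
    return [
        (text[e["start"]:e["end"]], e["entity_group"], round(float(e["score"]), 3))
        for e in entities
    ]


@lru_cache(maxsize=1)
def get_ner_pipeline():
    """Load the pipeline once and reuse it (downloads the model on first call)."""
    from transformers import pipeline  # imported lazily; only needed for inference

    return pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # merge adjacent tokens that share a tag
    )


def extract_entities(text):
    """Run NER on `text` and return (span, tag, score) tuples."""
    return summarize(text, get_ner_pipeline()(text))


# Example usage (requires transformers, a PyTorch backend, and network access):
# for span, tag, score in extract_entities("中国では、中国共産党による一党統治が続く。"):
#     print(tag, span, score)
```

Because these tags carry no B-/I- prefixes, the "simple" aggregation should treat a repeated tag as a continuation, so two distinct entities of the same type written back to back would likely be returned as one span.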

Sample Input and Output

Here’s an example input sentence and the expected NER output.

Input

中国では、中国共産党による一党統治が続く。
("In China, one-party rule by the Chinese Communist Party continues.")

Output

Token	Predicted Tag
中国	LOC
では	O
、	O
中国	ORG-P
共産党	ORG-P
による	O
一党	O
統治	O
が	O
続く	O
。	O
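The token-level tags above become the entity list shown in the next section by merging consecutive tokens that share the same non-O tag. The following is a minimal sketch of that step with a hypothetical group_tokens helper; since the tags have no B-/I- prefixes, it cannot separate two distinct entities of the same type that appear back to back.

```python
def group_tokens(tokens, tags):
    """Merge consecutive tokens that share the same non-O tag into entity spans.

    Japanese is written without spaces, so tokens are concatenated directly.
    Returns a list of (entity_text, tag) tuples.
    """
    spans = []
    current_text, current_tag = "", None
    for token, tag in zip(tokens, tags):
        if tag != "O" and tag == current_tag:
            current_text += token  # extend the open entity
            continue
        if current_tag is not None:
            spans.append((current_text, current_tag))  # close the open entity
        # Start a new entity, or reset on an O tag.
        current_text, current_tag = (token, tag) if tag != "O" else ("", None)
    if current_tag is not None:
        spans.append((current_text, current_tag))  # close a trailing entity
    return spans


if __name__ == "__main__":
    tokens = ["中国", "では", "、", "中国", "共産党", "による", "一党", "統治", "が", "続く", "。"]
    tags = ["LOC", "O", "O", "ORG-P", "ORG-P", "O", "O", "O", "O", "O", "O"]
    print(group_tokens(tokens, tags))  # [('中国', 'LOC'), ('中国共産党', 'ORG-P')]
```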

Visualization with Gradio and spaCy

The NER output is also visualized in a color-coded format for easier interpretation. The entities extracted from the sample sentence are:

LOC (Location): China (中国)
ORG-P (Political Organization): Chinese Communist Party (中国共産党)
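The Gradio/spaCy visualization code itself is not included here. As a sketch under that caveat, spaCy's displaCy can render the same entities from character offsets in manual mode, without loading a spaCy pipeline; to_displacy and render_entities_html are hypothetical helper names, and the colors are arbitrary choices. The returned HTML string could then be shown, for example, in a Gradio HTML component.

```python
def to_displacy(text, spans):
    """Build displaCy's manual-mode input from (start, end, tag) character spans."""
    return {
        "text": text,
        "ents": [{"start": s, "end": e, "label": tag} for s, e, tag in sorted(spans)],
        "title": None,
    }


def render_entities_html(text, spans, colors=None):
    """Return color-coded HTML for the entities using spaCy's displaCy."""
    from spacy import displacy  # imported lazily; only needed for rendering

    return displacy.render(
        to_displacy(text, spans),
        style="ent",
        manual=True,    # input is a plain dict, not a spaCy Doc
        jupyter=False,  # always return the HTML string
        options={"colors": colors or {}},  # per-label background colors
    )


if __name__ == "__main__":
    sample = "中国では、中国共産党による一党統治が続く。"
    # Character offsets of the two entities in the sample sentence.
    spans = [(0, 2, "LOC"), (5, 10, "ORG-P")]
    print(to_displacy(sample, spans)["ents"])
    # html = render_entities_html(sample, spans, colors={"LOC": "#7aecec", "ORG-P": "#feca74"})
```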


Model Performance Metrics

The following performance metrics were achieved by the model during evaluation:

Overall Metrics:

Total Accuracy: 98.42%
Total F1-score: 99.33%

Class-wise Metrics:

Class	Recall	Precision
O	99.94%	99.00%
PER	97.53%	98.80%
ORG	99.22%	96.23%
ORG-P	95.30%	99.71%
ORG-O	97.80%	98.26%
LOC	99.03%	96.71%
INS	98.88%	99.07%
PRD	99.31%	99.67%
EVT	98.96%	98.31%
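Per-class F1-scores are not listed, but they follow from the recall and precision above, since F1 is the harmonic mean of the two. This sketch recomputes them, assuming the table reports per-class figures; the results are not necessarily comparable to the overall F1-score, whose averaging method is not stated. CLASS_METRICS and f1 are hypothetical names.

```python
# Recall and precision (in %) copied from the class-wise table above.
CLASS_METRICS = {
    "O": (99.94, 99.00),
    "PER": (97.53, 98.80),
    "ORG": (99.22, 96.23),
    "ORG-P": (95.30, 99.71),
    "ORG-O": (97.80, 98.26),
    "LOC": (99.03, 96.71),
    "INS": (98.88, 99.07),
    "PRD": (99.31, 99.67),
    "EVT": (98.96, 98.31),
}


def f1(recall, precision):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for tag, (r, p) in CLASS_METRICS.items():
        print(f"{tag}\tF1 = {f1(r, p):.2f}%")
```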

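The model performs strongly overall, with an overall F1-score of 99.33%, and recall and precision both exceed 95% for every class. The largest gaps between the two are for ORG-P (95.30% recall vs. 99.71% precision) and ORG (99.22% recall vs. 96.23% precision).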

Hugging Face model: sabaridsnfuji/xlm-roberta-name-entity-recognition-japanese