---
language:
- "lzh"
tags:
- "classical chinese"
- "literary chinese"
- "ancient chinese"
- "sentence segmentation"
license: "apache-2.0"
pipeline_tag: "token-classification"
widget:
- text: "子曰學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
---

# roberta-classical-chinese-base-sentence-segmentation

## Model Description

This is a RoBERTa model pre-trained on Classical Chinese texts for sentence segmentation, derived from [roberta-classical-chinese-base-char](https://huggingface.co/KoichiYasuoka/roberta-classical-chinese-base-char).

## How to Use

```py
import torch
from transformers import AutoTokenizer,AutoModelForTokenClassification
tokenizer=AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-sentence-segmentation")
model=AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-classical-chinese-base-sentence-segmentation")
s="子曰學而時習之不亦說乎有朋自遠方來不亦樂乎人不知而不慍不亦君子乎"
# Predict one label per character (dropping the [CLS]/[SEP] positions)
p=[model.config.id2label[q] for q in torch.argmax(model(tokenizer.encode(s,return_tensors="pt"))[0],dim=2)[0].tolist()[1:-1]]
# Append 。 after each sentence-final character
print("".join(c+"。" if q=="E" or q=="S" else c for c,q in zip(s,p)))
```
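The final `print` step can be understood in isolation: given one label per character, a 。 is appended after each character the model marks as sentence-final. The sketch below uses a short hypothetical label sequence (no model download required), assuming, as in the code above, that `"E"` and `"S"` mark sentence-ending characters.

```py
# Standalone sketch of the label-to-punctuation step.
# The label list is a hypothetical example for illustration only;
# real labels come from model.config.id2label as shown above.
s="子曰學而時習之"
p=["B","E","B","M","M","M","E"]
print("".join(c+"。" if q in ("E","S") else c for c,q in zip(s,p)))
# → 子曰。學而時習之。
```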