Commit f5442ab by lIlBrother (parent: 603198b)
Update: Finalize the README for the completed model

README.md CHANGED
````diff
@@ -20,25 +20,25 @@ model-index:
     name: text2text-generation # Optional. Example: Speech Recognition
   metrics:
   - type: bleu # Required. Example: wer. Use metric id from https://hf.co/metrics
-    value: 0.
+    value: 0.9313276940897475 # Required. Example: 20.90
     name: eval_bleu # Optional. Example: Test WER
-    verified:
+    verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
   - type: rouge1 # Required. Example: wer. Use metric id from https://hf.co/metrics
-    value: 0.
+    value: 0.9607081256861959 # Required. Example: 20.90
     name: eval_rouge1 # Optional. Example: Test WER
-    verified:
+    verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
   - type: rouge2 # Required. Example: wer. Use metric id from https://hf.co/metrics
-    value: 0.
+    value: 0.9394649136169404 # Required. Example: 20.90
     name: eval_rouge2 # Optional. Example: Test WER
-    verified:
+    verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
   - type: rougeL # Required. Example: wer. Use metric id from https://hf.co/metrics
-    value: 0.
+    value: 0.9605735834651536 # Required. Example: 20.90
     name: eval_rougeL # Optional. Example: Test WER
-    verified:
+    verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
   - type: rougeLsum # Required. Example: wer. Use metric id from https://hf.co/metrics
-    value: 0.
+    value: 0.9605993760190767 # Required. Example: 20.90
     name: eval_rougeLsum # Optional. Example: Test WER
-    verified:
+    verified: false # Optional. If true, indicates that evaluation was generated by Hugging Face (vs. self-reported).
 ---
 
 # ko-barTNumText(TNT Model🧨): Try Number To Korean Reading(숫자를 한글로 바꾸는 모델)
@@ -78,33 +78,12 @@ aihub에서 데이터를 받으실 분은 한국인일 것이므로, 한글로
 
 
 ## Uses
-This Model is inferenced token BACKWARD. so, you have to `flip` before `tokenizer.decode()` <br />
-해당 모델은 inference시 역순으로 예측합니다. (밥을 6시에 먹었어 -> 어 먹었 시에 여섯 을 밥) <br />
-때문에 `tokenizer.decode`를 수행하기 전에, `flip`으로 역순으로 치환해주세요.
-
 Want see more detail follow this URL [KoGPT_num_converter](https://github.com/ddobokki/KoGPT_num_converter) <br /> and see `bart_inference.py` and `bart_train.py`
-
-class BartText2TextGenerationPipeline(Text2TextGenerationPipeline):
-    def postprocess(self, model_outputs, return_type=ReturnType.TEXT, clean_up_tokenization_spaces=False):
-        records = []
-        reversed_model_outputs = torch.flip(model_outputs["output_ids"][0], dims=[-1])
-        for output_ids in reversed_model_outputs:
-            if return_type == ReturnType.TENSORS:
-                record = {f"{self.return_name}_token_ids": output_ids}
-            elif return_type == ReturnType.TEXT:
-                record = {
-                    f"{self.return_name}_text": self.tokenizer.decode(
-                        output_ids,
-                        skip_special_tokens=True,
-                        clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-                    )
-                }
-            records.append(record)
-        return records
-```
+
 ## Evaluation
 Just using `evaluate-metric/bleu` and `evaluate-metric/rouge` in huggingface `evaluate` library <br />
-[Training wanDB URL](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/
+[Training wanDB URL](https://wandb.ai/bart_tadev/BartForConditionalGeneration/runs/326xgytt?workspace=user-bart_tadev)
+
 ## How to Get Started With the Model
 ```python
 from transformers.pipelines import Text2TextGenerationPipeline
@@ -112,8 +91,7 @@ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 texts = ["그러게 누가 6시까지 술을 마시래?"]
 tokenizer = AutoTokenizer.from_pretrained("lIlBrother/ko-barTNumText")
 model = AutoModelForSeq2SeqLM.from_pretrained("lIlBrother/ko-barTNumText")
-
-seq2seqlm_pipeline = BartText2TextGenerationPipeline(model=model, tokenizer=tokenizer)
+seq2seqlm_pipeline = Text2TextGenerationPipeline(model=model, tokenizer=tokenizer)
 kwargs = {
     "min_length": 0,
     "max_length": 1206,
````