---
language:
- de
tags:
- question-generation
- german
- text2text-generation
- generated_from_trainer
datasets:
- lmqg/qg_dequad
metrics:
- bleu4
- f1
- rouge
- exact_match
model-index:
- name: german-jeopardy-mt5-large-256
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: lmqg/qg_dequad
      type: default
      args: default
    metrics:
    - name: BLEU-4
      type: bleu4
      value: 16.43
    - name: F1
      type: f1
      value: 42.48
    - name: ROUGE-1
      type: rouge1
      value: 43.56
    - name: ROUGE-2
      type: rouge2
      value: 23.78
    - name: ROUGE-L
      type: rougel
      value: 41.81
    - name: ROUGE-Lsum
      type: rougelsum
      value: 41.80
    - name: Exact Match
      type: exact_match
      value: 3.13
---
# german-jeopardy-mt5-large-256
This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
It achieves the following results on the evaluation set:
- Loss: 1.3943
- Brevity Penalty: 0.9201
- System Length: 19195
- Reference Length: 20793
- ROUGE-1: 43.56
- ROUGE-2: 23.78
- ROUGE-L: 41.81
- ROUGE-Lsum: 41.80
- Exact Match: 3.13
- BLEU: 16.43
- F1: 42.48
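
The Brevity Penalty listed above can be recovered from the System Length and Reference Length. The following is a minimal sketch, assuming the standard BLEU definition of the penalty; the function name `brevity_penalty` is illustrative and not taken from the training code.

```python
import math


def brevity_penalty(system_length: int, reference_length: int) -> float:
    """Standard BLEU brevity penalty.

    Returns 1.0 when the system output is at least as long as the
    reference, otherwise exp(1 - reference_length / system_length).
    """
    if system_length >= reference_length:
        return 1.0
    return math.exp(1.0 - reference_length / system_length)


# Lengths taken from the evaluation results listed above.
bp = brevity_penalty(system_length=19195, reference_length=20793)
print(round(bp, 4))
```

Because the generated questions are shorter in total than the references, the penalty is below 1 and lowers the BLEU score accordingly.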
## Model description
See [google/mt5-large](https://huggingface.co/google/mt5-large) for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.
## Intended uses & limitations
This model can be used for question generation on German text.
## Training and evaluation data
See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 7
- gradient_accumulation_steps: 256
- total_train_batch_size: 256
- optimizer: Adafactor
- lr_scheduler_type: constant
- num_epochs: 20
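
The relationship between these values can be checked with a little arithmetic. The sketch below assumes single-device training (as stated in the model description) and that the Step column of the training results table counts optimizer updates; the per-epoch example count it derives is only an estimate from the table, not a figure reported by the Trainer.

```python
# Hyperparameters from the list above.
train_batch_size = 1
gradient_accumulation_steps = 256

# With one device, each optimizer update sees this many examples.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Final row of the training results table: step 720 at epoch 19.79.
final_step, final_epoch = 720, 19.79
steps_per_epoch = final_step / final_epoch

# Rough estimate of the number of training examples seen per epoch.
approx_examples_per_epoch = steps_per_epoch * effective_batch_size

print(effective_batch_size)
print(round(steps_per_epoch, 1))
print(round(approx_examples_per_epoch))
```

Accumulating gradients over 256 single-example batches keeps memory use low enough for a 24 GB GPU while still updating the weights with a batch of 256.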
### Training results
In the table below, the ROUGE, Exact Match, and F1 columns are fractions between 0 and 1, whereas the BLEU and Precisions columns are percentages. The summary at the top of this card reports all scores as percentages.
| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:------------:|:------------:|:------------:|:------------:|:---------------:|:-------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-----------:|:-------:|:---------------------:|:------:|
| 5.932 | 0.99 | 36 | 2.4510 | 5614 | 1426 | 527 | 204 | 28835 | 26631 | 24427 | 22223 | 19.4694 | 5.3547 | 2.1574 | 0.918 | 1.0 | 28835 | 21250 | 0.1946 | 0.0763 | 0.1843 | 0.1843 | 0.0 | 3.7906 | 11.4306 | 0.2127 |
| 2.3089 | 1.98 | 72 | 1.3964 | 7578 | 2696 | 1244 | 580 | 17203 | 14999 | 12795 | 10591 | 44.0505 | 17.9745 | 9.7225 | 5.4763 | 0.7904 | 17203 | 21250 | 0.3312 | 0.1655 | 0.316 | 0.3162 | 0.01 | 11.3254 | 12.6583 | 0.3246 |
| 1.6778 | 3.0 | 109 | 1.2660 | 7961 | 3020 | 1480 | 747 | 17067 | 14863 | 12659 | 10455 | 46.6456 | 20.3189 | 11.6913 | 7.1449 | 0.7826 | 17067 | 21250 | 0.3608 | 0.1881 | 0.3456 | 0.3454 | 0.0195 | 13.128 | 12.4682 | 0.3517 |
| 1.5383 | 3.99 | 145 | 1.2212 | 7948 | 3121 | 1558 | 796 | 16694 | 14490 | 12286 | 10082 | 47.6099 | 21.539 | 12.6811 | 7.8953 | 0.7612 | 16694 | 21250 | 0.3663 | 0.1989 | 0.3523 | 0.352 | 0.024 | 13.625 | 12.221 | 0.3554 |
| 1.423 | 4.97 | 181 | 1.1706 | 8746 | 3590 | 1840 | 963 | 17765 | 15561 | 13357 | 11153 | 49.2316 | 23.0705 | 13.7755 | 8.6344 | 0.8219 | 17765 | 21250 | 0.4033 | 0.2224 | 0.3876 | 0.3874 | 0.0304 | 15.7567 | 13.0277 | 0.3941 |
| 1.2861 | 5.99 | 218 | 1.1327 | 8885 | 3646 | 1864 | 1005 | 17406 | 15202 | 12998 | 10794 | 51.0456 | 23.9837 | 14.3407 | 9.3107 | 0.8018 | 17406 | 21250 | 0.4181 | 0.2295 | 0.4022 | 0.402 | 0.0331 | 16.123 | 12.9142 | 0.4092 |
| 1.2372 | 6.98 | 254 | 1.1248 | 9122 | 3824 | 1997 | 1084 | 17310 | 15106 | 12902 | 10698 | 52.6979 | 25.3144 | 15.4782 | 10.1327 | 0.7964 | 17310 | 21250 | 0.4313 | 0.239 | 0.4175 | 0.4172 | 0.0358 | 17.0334 | 12.8412 | 0.4236 |
| 1.1307 | 8.0 | 291 | 1.0998 | 9423 | 4019 | 2136 | 1190 | 18074 | 15870 | 13666 | 11462 | 52.1357 | 25.3245 | 15.63 | 10.3821 | 0.8389 | 18074 | 21250 | 0.441 | 0.249 | 0.4255 | 0.4252 | 0.0404 | 18.0474 | 13.4138 | 0.4327 |
| 1.0982 | 8.99 | 327 | 1.1052 | 9450 | 4003 | 2147 | 1184 | 18145 | 15941 | 13737 | 11533 | 52.0805 | 25.1113 | 15.6293 | 10.2662 | 0.8427 | 18145 | 21250 | 0.4427 | 0.2492 | 0.4266 | 0.4261 | 0.0426 | 18.0367 | 13.4465 | 0.4344 |
| 1.0449 | 9.98 | 363 | 1.0996 | 9471 | 4036 | 2149 | 1180 | 18067 | 15863 | 13659 | 11455 | 52.4215 | 25.4429 | 15.7332 | 10.3012 | 0.8385 | 18067 | 21250 | 0.4422 | 0.2477 | 0.4261 | 0.4257 | 0.0404 | 18.0793 | 13.333 | 0.4341 |
| 0.9686 | 10.99 | 400 | 1.1012 | 9612 | 4165 | 2240 | 1233 | 17983 | 15779 | 13575 | 11371 | 53.4505 | 26.3958 | 16.5009 | 10.8434 | 0.8339 | 17983 | 21250 | 0.4534 | 0.2591 | 0.4381 | 0.4378 | 0.0449 | 18.6914 | 13.3534 | 0.4458 |
| 0.9465 | 11.98 | 436 | 1.1027 | 9670 | 4154 | 2229 | 1239 | 18217 | 16013 | 13809 | 11605 | 53.0823 | 25.9414 | 16.1416 | 10.6764 | 0.8466 | 18217 | 21250 | 0.4531 | 0.258 | 0.4377 | 0.4374 | 0.0445 | 18.6863 | 13.5912 | 0.4452 |
| 0.9025 | 12.97 | 472 | 1.1124 | 9627 | 4155 | 2241 | 1247 | 18076 | 15872 | 13668 | 11464 | 53.2585 | 26.1782 | 16.396 | 10.8775 | 0.839 | 18076 | 21250 | 0.4531 | 0.2583 | 0.4386 | 0.4382 | 0.0436 | 18.7344 | 13.5259 | 0.4452 |
| 0.8402 | 13.99 | 509 | 1.1392 | 9425 | 4071 | 2176 | 1207 | 17339 | 15135 | 12931 | 10727 | 54.3572 | 26.8979 | 16.8278 | 11.252 | 0.7981 | 17339 | 21250 | 0.4495 | 0.2568 | 0.4365 | 0.4358 | 0.0445 | 18.3062 | 12.9129 | 0.4417 |
| 0.8282 | 14.98 | 545 | 1.1227 | 9803 | 4274 | 2316 | 1305 | 18652 | 16448 | 14244 | 12040 | 52.5574 | 25.9849 | 16.2595 | 10.8389 | 0.87 | 18652 | 21250 | 0.4573 | 0.2627 | 0.4418 | 0.4414 | 0.0463 | 19.2695 | 14.0104 | 0.4496 |
| 0.7694 | 16.0 | 582 | 1.1394 | 9740 | 4240 | 2299 | 1296 | 18281 | 16077 | 13873 | 11669 | 53.2794 | 26.3731 | 16.5718 | 11.1064 | 0.8501 | 18281 | 21250 | 0.4572 | 0.2629 | 0.4411 | 0.4412 | 0.0476 | 19.1704 | 13.6475 | 0.4492 |
| 0.7589 | 16.99 | 618 | 1.1497 | 9663 | 4140 | 2214 | 1232 | 18412 | 16208 | 14004 | 11800 | 52.4821 | 25.5429 | 15.8098 | 10.4407 | 0.8572 | 18412 | 21250 | 0.4515 | 0.2561 | 0.4359 | 0.4358 | 0.044 | 18.5906 | 13.7926 | 0.4432 |
| 0.724 | 17.98 | 654 | 1.1680 | 9743 | 4246 | 2316 | 1300 | 18402 | 16198 | 13994 | 11790 | 52.9453 | 26.2131 | 16.5499 | 11.0263 | 0.8566 | 18402 | 21250 | 0.4562 | 0.2625 | 0.4408 | 0.441 | 0.0472 | 19.2167 | 13.7214 | 0.4474 |
| 0.6755 | 18.99 | 691 | 1.1874 | 9722 | 4266 | 2351 | 1341 | 18272 | 16068 | 13864 | 11660 | 53.2071 | 26.5497 | 16.9576 | 11.5009 | 0.8496 | 18272 | 21250 | 0.4559 | 0.2639 | 0.4417 | 0.4413 | 0.0495 | 19.4647 | 13.6071 | 0.4469 |
| 0.657 | 19.79 | 720 | 1.1845 | 9920 | 4361 | 2402 | 1373 | 18884 | 16680 | 14476 | 12272 | 52.5312 | 26.1451 | 16.593 | 11.1881 | 0.8822 | 18884 | 21250 | 0.4594 | 0.2647 | 0.4423 | 0.4421 | 0.0467 | 19.8248 | 14.2001 | 0.4508 |
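
The Counts, Totals, Precisions, Brevity Penalty, and BLEU columns are related through the usual BLEU formula. The sketch below assumes the standard definition (brevity penalty times the geometric mean of the 1- to 4-gram precisions) and uses the values from the final table row; the helper `bleu_from_counts` is hypothetical and not part of the training code.

```python
import math


def bleu_from_counts(counts, totals, brevity_penalty):
    """Combine n-gram match counts and totals into a BLEU score (0-100)."""
    # Modified n-gram precision for each order n = 1..4.
    precisions = [count / total for count, total in zip(counts, totals)]
    if min(precisions) == 0.0:
        return 0.0  # the geometric mean is zero if any precision is zero
    # Geometric mean, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return 100.0 * brevity_penalty * math.exp(log_mean)


# Values from the last row of the table above (step 720).
bleu = bleu_from_counts(
    counts=[9920, 4361, 2402, 1373],
    totals=[18884, 16680, 14476, 12272],
    brevity_penalty=0.8822,
)
print(f"{bleu:.2f}")
```

The result agrees with the BLEU column for that row up to the rounding of the brevity penalty.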
### Framework versions
- Transformers 4.32.1
- PyTorch 2.1.0
- Datasets 2.12.0
- Tokenizers 0.13.3