---
language:
- de
tags:
- question-generation
- german
- text2text-generation
- generated_from_trainer
datasets:
- lmqg/qg_dequad
metrics:
- bleu4
- f1
- rouge
- exact_match
model-index:
- name: german-jeopardy-mt5-large-256
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: lmqg/qg_dequad
      type: default
      args: default
    metrics:
    - name: BLEU-4
      type: bleu4
      value: 16.43
    - name: F1
      type: f1
      value: 42.48
    - name: ROUGE-1
      type: rouge1
      value: 43.56
    - name: ROUGE-2
      type: rouge2
      value: 23.78
    - name: ROUGE-L
      type: rougel
      value: 41.81
    - name: ROUGE-Lsum
      type: rougelsum
      value: 41.80
    - name: Exact Match
      type: exact_match
      value: 3.13
---
# german-jeopardy-mt5-large-256

This model is a fine-tuned version of [google/mt5-large](https://huggingface.co/google/mt5-large) on the [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad) dataset.
It achieves the following results on the evaluation set (a sketch of how such metrics can be computed follows the list):
- Loss: 1.3943
- Brevity Penalty: 0.9201
- System Length: 19195
- Reference Length: 20793
- ROUGE-1: 43.56
- ROUGE-2: 23.78
- ROUGE-L: 41.81
- ROUGE-Lsum: 41.80
- Exact Match: 3.13
- BLEU: 16.43
- F1: 42.48
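
The BLEU score above is reported together with its brevity penalty and the system/reference lengths, which is the shape of a sacreBLEU result. Below is a minimal sketch of how such scores can be reproduced with the `evaluate` library (an assumption; the actual evaluation script is not part of this card, and the F1 value is presumably a token-overlap F1 rather than a classification F1):

```python
import evaluate

# Hypothetical generated questions and gold references, for illustration only.
predictions = ["Wann fiel die Berliner Mauer?"]
references = ["An welchem Tag fiel die Berliner Mauer?"]

# sacreBLEU returns the corpus BLEU score plus the brevity penalty ("bp")
# and the system/reference token counts ("sys_len", "ref_len").
sacrebleu = evaluate.load("sacrebleu")
bleu = sacrebleu.compute(predictions=predictions, references=[[r] for r in references])
print(bleu["score"], bleu["bp"], bleu["sys_len"], bleu["ref_len"])

# ROUGE-1/2/L/Lsum; evaluate returns values in [0, 1], while the summary
# above reports them scaled by 100.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# Exact match: the fraction of predictions identical to their reference.
exact_match = evaluate.load("exact_match")
print(exact_match.compute(predictions=predictions, references=references))
```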

## Model description

See [google/mt5-large](https://huggingface.co/google/mt5-large) for the model architecture.
The model was trained on a single NVIDIA RTX 3090 GPU with 24 GB of VRAM.

## Intended uses & limitations

This model can be used for question generation on German text.
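
A minimal usage sketch with Transformers is shown below. The repository id is a placeholder, and the `<hl>` answer-highlighting format is an assumption borrowed from the lmqg question-generation convention; both should be checked against the actual checkpoint and training preprocessing.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repository id; replace with the actual path of this checkpoint.
model_id = "german-jeopardy-mt5-large-256"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: the answer span inside the context is marked with <hl> tokens,
# following the lmqg question-generation convention.
context = (
    "Die Berliner Mauer fiel am <hl> 9. November 1989 <hl> "
    "nach wochenlangen friedlichen Protesten."
)

inputs = tokenizer(context, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```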

## Training and evaluation data

See [lmqg/qg_dequad](https://huggingface.co/datasets/lmqg/qg_dequad).
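
The splits can be pulled directly from the Hugging Face Hub with the `datasets` library:

```python
from datasets import load_dataset

# Download the German question-generation splits from the Hub.
dataset = load_dataset("lmqg/qg_dequad")

print(dataset)              # split names and sizes
print(dataset["train"][0])  # one raw training example
```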

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto `transformers` training arguments):
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 7
- gradient_accumulation_steps: 256
- total_train_batch_size: 256
- optimizer: Adafactor
- lr_scheduler_type: constant
- num_epochs: 20
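
As a rough sketch, these values map onto `Seq2SeqTrainingArguments` as follows; the `output_dir` is a placeholder, and the surrounding training script is not part of this card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="german-jeopardy-mt5-large-256",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=256,  # effective batch size: 1 * 256 = 256
    optim="adafactor",                # Adafactor optimizer, as listed above
    lr_scheduler_type="constant",
    num_train_epochs=20,
    seed=7,
)
```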

### Training results

| Training Loss | Epoch | Step | Validation Loss | Counts 1 | Counts 2 | Counts 3 | Counts 4 | Totals 1 | Totals 2 | Totals 3 | Totals 4 | Precisions 1 | Precisions 2 | Precisions 3 | Precisions 4 | Brevity Penalty | System Length | Reference Length | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Exact Match | BLEU | Mean Generated Length | F1 |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:--------:|:------------:|:------------:|:------------:|:------------:|:---------------:|:-------------:|:----------------:|:-------:|:-------:|:-------:|:----------:|:-----------:|:-------:|:---------------------:|:------:| |
| 5.932 | 0.99 | 36 | 2.4510 | 5614 | 1426 | 527 | 204 | 28835 | 26631 | 24427 | 22223 | 19.4694 | 5.3547 | 2.1574 | 0.918 | 1.0 | 28835 | 21250 | 0.1946 | 0.0763 | 0.1843 | 0.1843 | 0.0 | 3.7906 | 11.4306 | 0.2127 | |
| 2.3089 | 1.98 | 72 | 1.3964 | 7578 | 2696 | 1244 | 580 | 17203 | 14999 | 12795 | 10591 | 44.0505 | 17.9745 | 9.7225 | 5.4763 | 0.7904 | 17203 | 21250 | 0.3312 | 0.1655 | 0.316 | 0.3162 | 0.01 | 11.3254 | 12.6583 | 0.3246 | |
| 1.6778 | 3.0 | 109 | 1.2660 | 7961 | 3020 | 1480 | 747 | 17067 | 14863 | 12659 | 10455 | 46.6456 | 20.3189 | 11.6913 | 7.1449 | 0.7826 | 17067 | 21250 | 0.3608 | 0.1881 | 0.3456 | 0.3454 | 0.0195 | 13.128 | 12.4682 | 0.3517 | |
| 1.5383 | 3.99 | 145 | 1.2212 | 7948 | 3121 | 1558 | 796 | 16694 | 14490 | 12286 | 10082 | 47.6099 | 21.539 | 12.6811 | 7.8953 | 0.7612 | 16694 | 21250 | 0.3663 | 0.1989 | 0.3523 | 0.352 | 0.024 | 13.625 | 12.221 | 0.3554 | |
| 1.423 | 4.97 | 181 | 1.1706 | 8746 | 3590 | 1840 | 963 | 17765 | 15561 | 13357 | 11153 | 49.2316 | 23.0705 | 13.7755 | 8.6344 | 0.8219 | 17765 | 21250 | 0.4033 | 0.2224 | 0.3876 | 0.3874 | 0.0304 | 15.7567 | 13.0277 | 0.3941 | |
| 1.2861 | 5.99 | 218 | 1.1327 | 8885 | 3646 | 1864 | 1005 | 17406 | 15202 | 12998 | 10794 | 51.0456 | 23.9837 | 14.3407 | 9.3107 | 0.8018 | 17406 | 21250 | 0.4181 | 0.2295 | 0.4022 | 0.402 | 0.0331 | 16.123 | 12.9142 | 0.4092 | |
| 1.2372 | 6.98 | 254 | 1.1248 | 9122 | 3824 | 1997 | 1084 | 17310 | 15106 | 12902 | 10698 | 52.6979 | 25.3144 | 15.4782 | 10.1327 | 0.7964 | 17310 | 21250 | 0.4313 | 0.239 | 0.4175 | 0.4172 | 0.0358 | 17.0334 | 12.8412 | 0.4236 | |
| 1.1307 | 8.0 | 291 | 1.0998 | 9423 | 4019 | 2136 | 1190 | 18074 | 15870 | 13666 | 11462 | 52.1357 | 25.3245 | 15.63 | 10.3821 | 0.8389 | 18074 | 21250 | 0.441 | 0.249 | 0.4255 | 0.4252 | 0.0404 | 18.0474 | 13.4138 | 0.4327 | |
| 1.0982 | 8.99 | 327 | 1.1052 | 9450 | 4003 | 2147 | 1184 | 18145 | 15941 | 13737 | 11533 | 52.0805 | 25.1113 | 15.6293 | 10.2662 | 0.8427 | 18145 | 21250 | 0.4427 | 0.2492 | 0.4266 | 0.4261 | 0.0426 | 18.0367 | 13.4465 | 0.4344 | |
| 1.0449 | 9.98 | 363 | 1.0996 | 9471 | 4036 | 2149 | 1180 | 18067 | 15863 | 13659 | 11455 | 52.4215 | 25.4429 | 15.7332 | 10.3012 | 0.8385 | 18067 | 21250 | 0.4422 | 0.2477 | 0.4261 | 0.4257 | 0.0404 | 18.0793 | 13.333 | 0.4341 | |
| 0.9686 | 10.99 | 400 | 1.1012 | 9612 | 4165 | 2240 | 1233 | 17983 | 15779 | 13575 | 11371 | 53.4505 | 26.3958 | 16.5009 | 10.8434 | 0.8339 | 17983 | 21250 | 0.4534 | 0.2591 | 0.4381 | 0.4378 | 0.0449 | 18.6914 | 13.3534 | 0.4458 | |
| 0.9465 | 11.98 | 436 | 1.1027 | 9670 | 4154 | 2229 | 1239 | 18217 | 16013 | 13809 | 11605 | 53.0823 | 25.9414 | 16.1416 | 10.6764 | 0.8466 | 18217 | 21250 | 0.4531 | 0.258 | 0.4377 | 0.4374 | 0.0445 | 18.6863 | 13.5912 | 0.4452 | |
| 0.9025 | 12.97 | 472 | 1.1124 | 9627 | 4155 | 2241 | 1247 | 18076 | 15872 | 13668 | 11464 | 53.2585 | 26.1782 | 16.396 | 10.8775 | 0.839 | 18076 | 21250 | 0.4531 | 0.2583 | 0.4386 | 0.4382 | 0.0436 | 18.7344 | 13.5259 | 0.4452 | |
| 0.8402 | 13.99 | 509 | 1.1392 | 9425 | 4071 | 2176 | 1207 | 17339 | 15135 | 12931 | 10727 | 54.3572 | 26.8979 | 16.8278 | 11.252 | 0.7981 | 17339 | 21250 | 0.4495 | 0.2568 | 0.4365 | 0.4358 | 0.0445 | 18.3062 | 12.9129 | 0.4417 | |
| 0.8282 | 14.98 | 545 | 1.1227 | 9803 | 4274 | 2316 | 1305 | 18652 | 16448 | 14244 | 12040 | 52.5574 | 25.9849 | 16.2595 | 10.8389 | 0.87 | 18652 | 21250 | 0.4573 | 0.2627 | 0.4418 | 0.4414 | 0.0463 | 19.2695 | 14.0104 | 0.4496 | |
| 0.7694 | 16.0 | 582 | 1.1394 | 9740 | 4240 | 2299 | 1296 | 18281 | 16077 | 13873 | 11669 | 53.2794 | 26.3731 | 16.5718 | 11.1064 | 0.8501 | 18281 | 21250 | 0.4572 | 0.2629 | 0.4411 | 0.4412 | 0.0476 | 19.1704 | 13.6475 | 0.4492 | |
| 0.7589 | 16.99 | 618 | 1.1497 | 9663 | 4140 | 2214 | 1232 | 18412 | 16208 | 14004 | 11800 | 52.4821 | 25.5429 | 15.8098 | 10.4407 | 0.8572 | 18412 | 21250 | 0.4515 | 0.2561 | 0.4359 | 0.4358 | 0.044 | 18.5906 | 13.7926 | 0.4432 | |
| 0.724 | 17.98 | 654 | 1.1680 | 9743 | 4246 | 2316 | 1300 | 18402 | 16198 | 13994 | 11790 | 52.9453 | 26.2131 | 16.5499 | 11.0263 | 0.8566 | 18402 | 21250 | 0.4562 | 0.2625 | 0.4408 | 0.441 | 0.0472 | 19.2167 | 13.7214 | 0.4474 | |
| 0.6755 | 18.99 | 691 | 1.1874 | 9722 | 4266 | 2351 | 1341 | 18272 | 16068 | 13864 | 11660 | 53.2071 | 26.5497 | 16.9576 | 11.5009 | 0.8496 | 18272 | 21250 | 0.4559 | 0.2639 | 0.4417 | 0.4413 | 0.0495 | 19.4647 | 13.6071 | 0.4469 | |
| 0.657 | 19.79 | 720 | 1.1845 | 9920 | 4361 | 2402 | 1373 | 18884 | 16680 | 14476 | 12272 | 52.5312 | 26.1451 | 16.593 | 11.1881 | 0.8822 | 18884 | 21250 | 0.4594 | 0.2647 | 0.4423 | 0.4421 | 0.0467 | 19.8248 | 14.2001 | 0.4508 | |

### Framework versions

- Transformers 4.32.1
- PyTorch 2.1.0
- Datasets 2.12.0
- Tokenizers 0.13.3