chunwoolee0/mt5_small_wmt16_de_en


---
license: apache-2.0
base_model: google/mt5-small
tags:
- generated_from_trainer
datasets:
- wmt16
metrics:
- rouge
- sacrebleu
model-index:
- name: mt5_small_wmt16_de_en
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: wmt16
      type: wmt16
      config: de-en
      split: validation
      args: de-en
    metrics:
    - name: Rouge1
      type: rouge
      value: 0.3666
    - name: Sacrebleu
      type: sacrebleu
      value: 6.4622
---


# mt5_small_wmt16_de_en

This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on the de-en configuration of the WMT16 dataset.
It achieves the following results on the validation split (ROUGE values are on a 0 to 1 scale, SacreBLEU on a 0 to 100 scale):
- Loss: 2.4612
- Rouge1: 0.3666
- Rouge2: 0.147
- RougeL: 0.3362
- SacreBLEU: 6.4622
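Rouge1 measures unigram overlap between a generated translation and its reference. The sketch below computes a simplified per-sentence ROUGE-1 F-measure, assuming lowercased whitespace tokenization; the `rouge_score` package (which the Hugging Face `evaluate` ROUGE metric wraps) applies its own tokenizer, so its numbers can differ slightly. The 0.3666 figure above is an aggregate over the whole validation split, not a single sentence.

```python
from collections import Counter


def rouge1_f(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure: clipped unigram overlap on whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Each unigram is credited at most as often as it appears in BOTH strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


print(round(rouge1_f("the cat is on the mat", "the cat sat on the mat"), 4))  # 0.8333
```

Here five of the six unigrams match in each direction, so precision and recall are both 5/6 and the F-measure is 5/6 as well.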

## Model description

Multilingual T5 (mT5) is a massively multilingual pretrained text-to-text transformer model,
trained following a recipe similar to that of T5.
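Because mT5 is text-to-text, using a fine-tuned checkpoint amounts to tokenizing a source sentence and calling `generate`. This is a minimal sketch, assuming the Hub id `chunwoolee0/mt5_small_wmt16_de_en` (taken from this card's repository name), raw German input with no task prefix, and English output; the card does not state the translation direction or input format, so treat those as assumptions. `max_new_tokens=64` is an arbitrary illustrative value.

```python
# Hub repository id of this checkpoint (from the card's repository name).
MODEL_ID = "chunwoolee0/mt5_small_wmt16_de_en"


def translate(sentences: list[str], max_new_tokens: int = 64) -> list[str]:
    """Translate a batch of sentences with the fine-tuned mT5 checkpoint.

    Assumption: raw source sentences (no task prefix) are the expected input.
    """
    # Imported here so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # Pad the batch to a common length and return PyTorch tensors.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

For example, `translate(["Das Haus ist wunderbar."])` returns a list with one decoded string. Given the small training subset described under "Training and evaluation data", expect rough translations.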

## Intended uses & limitations

This model was trained as an exercise to become familiar with mT5, with the eventual goal of using it for English-to-Korean translation.

## Training and evaluation data

This work was done as an exercise in preparation for English-Korean translation,
so only a very small part of the very large original dataset was selected for training.
Therefore, the translation quality is not expected to be very good.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
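Two of these values follow from the others. `total_train_batch_size` is the per-device batch size times the gradient accumulation steps times the number of devices (a single device is assumed, since 8 × 8 = 64), and a `linear` scheduler with no warmup (none is listed) decays the learning rate from 0.0005 to zero over the run. The sketch below mirrors the formula used by Transformers' `get_linear_schedule_with_warmup`; `TOTAL_STEPS = 1560` is a hypothetical value chosen to approximate 5 epochs of about 312 updates each.

```python
def total_train_batch_size(per_device: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    # Examples consumed per optimizer update.
    return per_device * grad_accum_steps * num_devices


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Learning rate at optimizer step `step`: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    # Clamp at zero once training runs past total_steps.
    return base_lr * max(0.0, remaining)


BASE_LR = 5e-4
TOTAL_STEPS = 1560  # hypothetical: about 312 updates per epoch x 5 epochs

print(total_train_batch_size(8, 8))  # 64
for step in (0, 500, 1000, 1500, TOTAL_STEPS):
    print(step, f"{linear_lr(step, BASE_LR, TOTAL_STEPS):.6f}")
```

With no warmup, the rate starts at the full 0.0005 and reaches half of it at the midpoint of training.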

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | SacreBLEU |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|
| 3.3059 | 1.6 | 500 | 2.5597 | 0.3398 | 0.1261 | 0.3068 | 5.5524 |
| 2.4093 | 3.2 | 1000 | 2.4996 | 0.3609 | 0.144 | 0.3304 | 6.2002 |
| 2.2322 | 4.8 | 1500 | 2.4612 | 0.3666 | 0.147 | 0.3362 | 6.4622 |
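The Step and Epoch columns give a rough idea of how small the training subset was. Assuming the Step column counts optimizer updates of `total_train_batch_size` = 64 examples each on a single device, 500 steps at epoch 1.6 implies about 312.5 updates per epoch, or roughly 20,000 training pairs. This is an estimate from rounded logged values, not a figure stated by the author.

```python
# (optimizer step, epoch) pairs copied from the Training results table.
LOGGED = [(500, 1.6), (1000, 3.2), (1500, 4.8)]
TOTAL_TRAIN_BATCH_SIZE = 64  # from the hyperparameter list


def estimate_train_examples(logged, batch_size):
    """Estimate training-set size as (steps per epoch) x (examples per step)."""
    estimates = [step / epoch * batch_size for step, epoch in logged]
    # Average the per-row estimates to smooth out rounding in the logged epochs.
    return sum(estimates) / len(estimates)


print(round(estimate_train_examples(LOGGED, TOTAL_TRAIN_BATCH_SIZE)))  # 20000
```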


### Framework versions

- Transformers 4.32.0
- PyTorch 2.0.1+cu118
- Datasets 2.14.4
- Tokenizers 0.13.3
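To compare a local environment against the versions listed above, a small sketch using the standard library's `importlib.metadata` follows. The mapping from the names on this card to pip distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) is an assumption about how the packages were installed, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by (assumed) pip distribution name.
CARD_VERSIONS = {
    "transformers": "4.32.0",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def compare_versions(expected=CARD_VERSIONS, lookup=version):
    """Return {package: (expected, installed or None)} for every mismatch."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in compare_versions().items():
    print(f"{name}: card lists {wanted}, found {installed}")
```

The `lookup` parameter exists so the comparison can be exercised with a stand-in function instead of the real installed packages.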