Upload with huggingface_hub
- README.md +92 -0
- config.json +61 -0
- pytorch_model.bin +3 -0
- special_tokens_map.json +107 -0
- spiece.model +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +113 -0
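
For reproducibility, a commit like this can be made with the `huggingface_hub` Python client. A minimal sketch, assuming the files above sit in a local folder; the repo id below is a placeholder, not the actual repository:

```python
# Minimal sketch of uploading a model folder with huggingface_hub.
# "your-username/t0-alltasksv2-t2" is a placeholder repo id.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token from `huggingface-cli login` by default
api.upload_folder(
    folder_path="./t0-alltasksv2-t2",        # local directory with the files listed above
    repo_id="your-username/t0-alltasksv2-t2",
    repo_type="model",
    commit_message="Upload with huggingface_hub",
)
```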
README.md
ADDED
@@ -0,0 +1,92 @@
---
license: apache-2.0
tags:
- generated_from_trainer
model-index:
- name: t0-alltasksv2-t2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# t0-alltasksv2-t2

This model is a fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.2461
- Train Runtime: 54501.5741
- Train Samples Per Second: 17.607
- Train Steps Per Second: 0.196
- Train Loss: 1.2518
- Train Samples: 239899
- Gen Len: 9.0377

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 3
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 5
- total_train_batch_size: 90
- total_eval_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Accuracy | F1 | Recall | Precision | Bleu 1 | Bleu 2 | Bleu 3 | Bleu 4 | Rouge L | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:------:|:-------:|:---------:|:--------:|:-------:|:-------:|:---------:|:------:|:------:|:------:|:------:|:-------:|:-------:|
| 1.5099 | 0.15 | 400 | 1.3184 | 62.9956 | 6.6989 | 62.5464 | 62.7932 | 73.1148 | 73.1148 | 73.1148 | 73.1148 | 0.6509 | 0.0005 | 0.0001 | 0.0 | 0.6033 | 6.266 |
| 1.4929 | 0.3 | 800 | 1.2843 | 64.2143 | 6.6289 | 63.7888 | 63.9982 | 75.1756 | 75.1756 | 75.1756 | 75.1756 | 0.6556 | 0.0005 | 0.0001 | 0.0 | 0.6128 | 6.5423 |
| 1.4285 | 0.45 | 1200 | 1.2717 | 64.3279 | 6.63 | 63.9036 | 64.1136 | 75.4567 | 75.4567 | 75.4567 | 75.4567 | 0.6591 | 0.0005 | 0.0001 | 0.0 | 0.6153 | 6.4407 |
| 1.4419 | 0.6 | 1600 | 1.2583 | 64.9421 | 6.5473 | 64.445 | 64.6707 | 76.1593 | 76.1593 | 76.1593 | 76.1593 | 0.6682 | 0.0005 | 0.0001 | 0.0 | 0.6218 | 6.378 |
| 1.3967 | 0.75 | 2000 | 1.2497 | 65.9619 | 6.8233 | 65.4883 | 65.6707 | 77.2834 | 77.2834 | 77.2834 | 77.2834 | 0.6764 | 0.0005 | 0.0001 | 0.0 | 0.6297 | 6.4027 |
| 1.4062 | 0.9 | 2400 | 1.2440 | 65.8936 | 6.6978 | 65.4631 | 65.6771 | 77.2365 | 77.2365 | 77.2365 | 77.2365 | 0.6753 | 0.0005 | 0.0001 | 0.0 | 0.6291 | 6.4097 |
| 1.2708 | 1.05 | 2800 | 1.2460 | 66.4183 | 6.7096 | 66.0049 | 66.191 | 78.1265 | 78.1265 | 78.1265 | 78.1265 | 0.6781 | 0.0006 | 0.0001 | 0.0 | 0.6321 | 6.45 |
| 1.2593 | 1.2 | 3200 | 1.2467 | 66.985 | 6.4882 | 66.596 | 66.7582 | 79.1569 | 79.1569 | 79.1569 | 79.1569 | 0.683 | 0.0006 | 0.0001 | 0.0 | 0.6395 | 6.495 |
| 1.2623 | 1.35 | 3600 | 1.2461 | 66.8681 | 6.6985 | 66.4877 | 66.6731 | 78.7354 | 78.7354 | 78.7354 | 78.7354 | 0.6821 | 0.0006 | 0.0001 | 0.0 | 0.6369 | 6.4727 |
| 1.2579 | 1.5 | 4000 | 1.2447 | 67.8146 | 6.7285 | 67.3078 | 67.5351 | 79.8126 | 79.8126 | 79.8126 | 79.8126 | 0.6937 | 0.0006 | 0.0001 | 0.0 | 0.6455 | 6.3997 |
| 1.3159 | 1.65 | 4400 | 1.2281 | 67.8172 | 6.7857 | 67.3662 | 67.5871 | 79.9532 | 79.9532 | 79.9532 | 79.9532 | 0.694 | 0.0006 | 0.0001 | 0.0 | 0.6461 | 6.448 |
| 1.2492 | 1.8 | 4800 | 1.2310 | 68.3986 | 6.8058 | 67.9201 | 68.1741 | 80.7963 | 80.7963 | 80.7963 | 80.7963 | 0.6991 | 0.0006 | 0.0001 | 0.0 | 0.6516 | 6.4757 |
| 1.2338 | 1.95 | 5200 | 1.2253 | 67.9938 | 6.9092 | 67.5083 | 67.6913 | 80.0 | 80.0 | 80.0 | 80.0 | 0.695 | 0.0006 | 0.0001 | 0.0 | 0.6466 | 6.4163 |
| 1.1788 | 2.1 | 5600 | 1.2499 | 67.9377 | 6.8965 | 67.4813 | 67.674 | 80.0937 | 80.0937 | 80.0937 | 80.0937 | 0.6917 | 0.0006 | 0.0001 | 0.0 | 0.6434 | 6.5223 |
| 1.1833 | 2.25 | 6000 | 1.2401 | 68.1452 | 6.8628 | 67.7046 | 67.9493 | 80.3279 | 80.3279 | 80.3279 | 80.3279 | 0.6985 | 0.0006 | 0.0001 | 0.0 | 0.6497 | 6.3733 |
| 1.193 | 2.4 | 6400 | 1.2418 | 68.252 | 6.8999 | 67.7704 | 68.023 | 80.4684 | 80.4684 | 80.4684 | 80.4684 | 0.6985 | 0.0006 | 0.0001 | 0.0 | 0.6506 | 6.3623 |
| 1.1649 | 2.55 | 6800 | 1.2403 | 68.5799 | 6.7505 | 68.0777 | 68.3039 | 80.9368 | 80.9368 | 80.9368 | 80.9368 | 0.6993 | 0.0006 | 0.0001 | 0.0 | 0.6515 | 6.4743 |
| 1.1488 | 2.7 | 7200 | 1.2401 | 68.6737 | 6.9022 | 68.2238 | 68.4258 | 81.0304 | 81.0304 | 81.0304 | 81.0304 | 0.7001 | 0.0006 | 0.0001 | 0.0 | 0.6523 | 6.4573 |
| 1.1703 | 2.85 | 7600 | 1.2384 | 68.9471 | 6.7776 | 68.4667 | 68.6837 | 81.5457 | 81.5457 | 81.5457 | 81.5457 | 0.7004 | 0.0006 | 0.0001 | 0.0 | 0.6544 | 6.5267 |
| 1.1763 | 3.0 | 8000 | 1.2348 | 68.7204 | 6.8157 | 68.2282 | 68.4466 | 81.1241 | 81.1241 | 81.1241 | 81.1241 | 0.7006 | 0.0006 | 0.0001 | 0.0 | 0.6527 | 6.4737 |
| 1.0858 | 3.15 | 8400 | 1.2560 | 68.9519 | 6.9424 | 68.4982 | 68.6892 | 81.2646 | 81.2646 | 81.2646 | 81.2646 | 0.7025 | 0.0006 | 0.0001 | 0.0 | 0.6536 | 6.4767 |
| 1.103 | 3.3 | 8800 | 1.2461 | 69.2609 | 6.8552 | 68.8244 | 69.0129 | 81.8267 | 81.8267 | 81.8267 | 81.8267 | 0.7064 | 0.0006 | 0.0001 | 0.0 | 0.6582 | 6.4533 |
| 1.1183 | 3.45 | 9200 | 1.2507 | 68.8282 | 6.903 | 68.3923 | 68.5846 | 81.2178 | 81.2178 | 81.2178 | 81.2178 | 0.7018 | 0.0006 | 0.0001 | 0.0 | 0.6533 | 6.4797 |
| 1.07 | 3.6 | 9600 | 1.2511 | 69.1742 | 6.8362 | 68.7377 | 68.906 | 81.733 | 81.733 | 81.733 | 81.733 | 0.7061 | 0.0006 | 0.0001 | 0.0 | 0.6576 | 6.4547 |
| 1.0723 | 3.75 | 10000 | 1.2527 | 69.1098 | 6.7762 | 68.6426 | 68.8416 | 81.6862 | 81.6862 | 81.6862 | 81.6862 | 0.7052 | 0.0006 | 0.0001 | 0.0 | 0.6573 | 6.4493 |
| 1.117 | 3.9 | 10400 | 1.2512 | 69.1469 | 6.7883 | 68.6792 | 68.9055 | 81.733 | 81.733 | 81.733 | 81.733 | 0.7051 | 0.0006 | 0.0001 | 0.0 | 0.6573 | 6.469 |

### Framework versions

- Transformers 4.21.1
- Pytorch 1.12.0
- Datasets 2.3.2
- Tokenizers 0.12.1
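
The card itself carries no usage example; below is a minimal inference sketch with `transformers`. The repo id is a placeholder (the card does not state the owning namespace), and the prompt format is illustrative only:

```python
# Minimal sketch of running inference with this checkpoint.
# "your-username/t0-alltasksv2-t2" is a placeholder; substitute the real repo id.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("your-username/t0-alltasksv2-t2")
model = AutoModelForSeq2SeqLM.from_pretrained("your-username/t0-alltasksv2-t2")

inputs = tokenizer(
    "Is this review positive or negative? Review: great movie!",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```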
config.json
ADDED
@@ -0,0 +1,61 @@
{
  "_name_or_path": "google/flan-t5-xl",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "d_ff": 5120,
  "d_kv": 64,
  "d_model": 2048,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gelu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 24,
  "num_heads": 32,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to German: "
    },
    "translation_en_to_fr": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to French: "
    },
    "translation_en_to_ro": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to Romanian: "
    }
  },
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.21.1",
  "use_cache": false,
  "vocab_size": 32100
}
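
For a quick sanity check, the configuration above can be loaded and inspected with `transformers`; a sketch, assuming the files live in a local directory named `t0-alltasksv2-t2`:

```python
# Sketch: load the config above from a local directory and check a few fields.
from transformers import T5Config

config = T5Config.from_pretrained("./t0-alltasksv2-t2")  # directory containing config.json
assert config.model_type == "t5"
print(config.d_model, config.num_layers, config.num_heads)     # 2048 24 32
print(config.task_specific_params["summarization"]["prefix"])  # "summarize: "
```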
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d366bb6cfc17a7ba795b7e09795f971c89786785267f3f78976aaa9e1b1ff08e
size 5699366427
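
Note that this is a Git LFS pointer, not the weights themselves: the actual ~5.7 GB payload is fetched from LFS storage, and the `oid` records its SHA-256 so a download can be verified. A minimal sketch:

```python
# Sketch: verify a downloaded weight file against the LFS pointer's oid.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so the 5.7 GB payload never sits in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "d366bb6cfc17a7ba795b7e09795f971c89786785267f3f78976aaa9e1b1ff08e"
assert sha256_of("pytorch_model.bin") == expected
```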
special_tokens_map.json
ADDED
@@ -0,0 +1,107 @@
{
  "additional_special_tokens": [
    "<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>",
    "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>",
    "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>",
    "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>",
    "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>",
    "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>",
    "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>",
    "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>",
    "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>",
    "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>",
    "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>",
    "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>",
    "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>",
    "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>",
    "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>",
    "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>",
    "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>",
    "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>",
    "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>",
    "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"
  ],
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>"
}
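
The 100 `<extra_id_*>` entries are T5's sentinel tokens for span corruption, and the list is fully regular; a sketch showing it can be generated rather than typed out:

```python
# Sketch: the additional_special_tokens list above is exactly this.
additional_special_tokens = [f"<extra_id_{i}>" for i in range(100)]
assert additional_special_tokens[0] == "<extra_id_0>"
assert additional_special_tokens[-1] == "<extra_id_99>"
```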
spiece.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,113 @@
{
  "additional_special_tokens": [
    "<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>",
    "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>",
    "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>",
    "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>",
    "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>",
    "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>",
    "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>",
    "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>",
    "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>",
    "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>",
    "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>",
    "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>",
    "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>",
    "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>",
    "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>",
    "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>",
    "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>",
    "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>",
    "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>",
    "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"
  ],
  "eos_token": "</s>",
  "extra_ids": 100,
  "model_max_length": 512,
  "name_or_path": "google/flan-t5-xl",
  "pad_token": "<pad>",
  "sp_model_kwargs": {},
  "special_tokens_map_file": "/home/arthur_huggingface_co/.cache/huggingface/hub/models--google--t5-v1_1-small/snapshots/fb7e6cba609f7bab11c614294bc04f82f613c7b1/special_tokens_map.json",
  "tokenizer_class": "T5Tokenizer",
  "unk_token": "<unk>"
}
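
Finally, a sketch of loading the tokenizer these files define (spiece.model, tokenizer.json, and the two JSON configs) from a local checkout; the directory name is a placeholder:

```python
# Sketch: load the T5 tokenizer from the files in this commit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./t0-alltasksv2-t2")
print(tokenizer.model_max_length)                       # 512, from tokenizer_config.json
print(tokenizer.convert_tokens_to_ids("<extra_id_0>"))  # sentinel id; typically 32099 for T5
```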