Calc loss only on prompts, add special tokens, remove grouping

Files changed (11)

README.md +23 -37
added_tokens.json +4 -0
all_results.json +13 -13
config.json +1 -1
eval_results.json +8 -8
pytorch_model.bin +2 -2
special_tokens_map.json +2 -0
tokenizer.json +18 -0
train_results.json +6 -6
trainer_state.json +0 -0
training_args.bin +1 -1

README.md CHANGED

@@ -5,19 +5,19 @@ tags:
 metrics:
 - accuracy
 model-index:
-- name: gpt2-large-finetuned
+- name: gpt2-sweep
   results: []
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# gpt2-large-finetuned
-This model is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on an unknown dataset.
+# gpt2-sweep
+This model is a fine-tuned version of [gpt2-large](https://huggingface.co/gpt2-large) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.2374
-- Accuracy: 0.5978
+- Loss: 2.0773
+- Accuracy: 0.8482
 ## Model description
@@ -36,48 +36,34 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 2.294e-05
-- train_batch_size: 4
-- eval_batch_size: 4
+- learning_rate: 2.294477077303931e-05
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: linear
-- num_epochs: 6.0
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 2.0
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Accuracy |
-|:-------------:|:-----:|:----:|:---------------:|:--------:|
-| 2.5507        | 0.23  | 100  | 2.5061          | 0.5568   |
-| 2.465         | 0.46  | 200  | 2.4254          | 0.5672   |
-| 2.3919        | 0.7   | 300  | 2.3827          | 0.5726   |
-| 2.4222        | 0.93  | 400  | 2.3489          | 0.5760   |
-| 2.1958        | 1.16  | 500  | 2.3302          | 0.5793   |
-| 2.2087        | 1.39  | 600  | 2.3123          | 0.5818   |
-| 2.2436        | 1.62  | 700  | 2.2960          | 0.5841   |
-| 2.1737        | 1.86  | 800  | 2.2810          | 0.5866   |
-| 2.0763        | 2.09  | 900  | 2.2779          | 0.5876   |
-| 2.0852        | 2.32  | 1000 | 2.2678          | 0.5894   |
-| 2.0946        | 2.55  | 1100 | 2.2594          | 0.5906   |
-| 2.0497        | 2.78  | 1200 | 2.2516          | 0.5920   |
-| 2.0141        | 3.02  | 1300 | 2.2513          | 0.5928   |
-| 2.0316        | 3.25  | 1400 | 2.2505          | 0.5932   |
-| 1.9783        | 3.48  | 1500 | 2.2430          | 0.5938   |
-| 1.9917        | 3.71  | 1600 | 2.2386          | 0.5948   |
-| 2.0152        | 3.94  | 1700 | 2.2315          | 0.5960   |
-| 1.886         | 4.18  | 1800 | 2.2420          | 0.5957   |
-| 1.9151        | 4.41  | 1900 | 2.2409          | 0.5967   |
-| 1.9538        | 4.64  | 2000 | 2.2379          | 0.5971   |
-| 1.8886        | 4.87  | 2100 | 2.2349          | 0.5976   |
-| 1.9408        | 5.1   | 2200 | 2.2410          | 0.5975   |
-| 1.9168        | 5.34  | 2300 | 2.2394          | 0.5976   |
-| 1.8002        | 5.57  | 2400 | 2.2381          | 0.5977   |
-| 1.8888        | 5.8   | 2500 | 2.2367          | 0.5978   |
+| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
+|:-------------:|:-----:|:-----:|:---------------:|:--------:|
+| 2.4891        | 0.19  | 1000  | 2.4467          | 0.8446   |
+| 2.7019        | 0.37  | 2000  | 2.3208          | 0.8456   |
+| 2.5278        | 0.56  | 3000  | 2.2470          | 0.8464   |
+| 2.0687        | 0.74  | 4000  | 2.1953          | 0.8468   |
+| 2.1738        | 0.93  | 5000  | 2.1543          | 0.8472   |
+| 1.8554        | 1.12  | 6000  | 2.1500          | 0.8475   |
+| 1.9276        | 1.3   | 7000  | 2.1223          | 0.8477   |
+| 1.7988        | 1.49  | 8000  | 2.1120          | 0.8479   |
+| 2.0632        | 1.67  | 9000  | 2.0973          | 0.8480   |
+| 1.9586        | 1.86  | 10000 | 2.0826          | 0.8481   |
 ### Framework versions
 - Transformers 4.26.0
-- Pytorch 1.13.1
+- Pytorch 2.0.0+cu117
 - Datasets 2.9.0
 - Tokenizers 0.13.2
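The new card adds `lr_scheduler_warmup_steps: 1000` to the linear schedule. As a sketch, the per-step learning rate can be reproduced with the same multiplier that Transformers' `get_linear_schedule_with_warmup` applies (linear ramp from 0 to the peak, then linear decay to 0). The total of 10,752 optimizer steps is an assumption: ceil(43003 / 8) = 5376 steps per epoch over 2 epochs, on one device with no gradient accumulation. It agrees with the training table, where step 10000 falls at epoch 1.86.

```python
import math

# Values from the model card; TOTAL_STEPS assumes one device and no
# gradient accumulation: ceil(43003 / 8) = 5376 steps/epoch * 2 epochs.
PEAK_LR = 2.294477077303931e-05
WARMUP_STEPS = 1000
TOTAL_STEPS = math.ceil(43003 / 8) * 2


def linear_warmup_lr(step: int, peak: float = PEAK_LR,
                     warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps (linear warmup, then linear decay to 0)."""
    if step < warmup:
        # Warmup: ramp linearly from 0 to the peak.
        return peak * step / max(1, warmup)
    # Decay: fall linearly from the peak at `warmup` to 0 at `total`.
    return peak * max(0.0, (total - step) / max(1, total - warmup))


for s in (0, 500, 1000, 5000, 10000, TOTAL_STEPS):
    print(f"step {s:>5}: lr = {linear_warmup_lr(s):.3e}")
```

The peak is reached exactly at step 1000 and the rate returns to 0 at the last step.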

added_tokens.json ADDED

@@ -0,0 +1,4 @@
+{
+  "</s>": 50258,
+  "[PAD]": 50257
+}
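The two new special tokens take the first free IDs after GPT-2's 50,257-entry base vocabulary, which is why `vocab_size` in config.json goes from 50257 to 50259. Files like these are what Transformers writes after `tokenizer.add_special_tokens({"pad_token": "[PAD]", "sep_token": "</s>"})` followed by `model.resize_token_embeddings(len(tokenizer))`; the training script is not part of this commit, so that is an assumption. A stdlib-only consistency check over the values in this diff (the helper name is hypothetical):

```python
import json

# Values copied from this commit. BASE_VOCAB is the old config.json vocab_size.
BASE_VOCAB = 50257
NEW_VOCAB_SIZE = 50259
ADDED_TOKENS_JSON = '{"</s>": 50258, "[PAD]": 50257}'


def check_added_tokens(added: dict, base: int, new_size: int) -> list:
    """Return the added tokens ordered by ID, or raise if the IDs are inconsistent."""
    ordered = sorted(added.items(), key=lambda kv: kv[1])
    ids = [tok_id for _, tok_id in ordered]
    # Added tokens should occupy the IDs directly after the base vocabulary...
    if ids != list(range(base, base + len(ids))):
        raise ValueError(f"non-contiguous added-token IDs: {ids}")
    # ...so the embedding matrix grows by exactly that many rows.
    if new_size != base + len(ids):
        raise ValueError(f"vocab_size {new_size} != {base} + {len(ids)}")
    return [tok for tok, _ in ordered]


print(check_added_tokens(json.loads(ADDED_TOKENS_JSON), BASE_VOCAB, NEW_VOCAB_SIZE))
# prints ['[PAD]', '</s>']
```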

all_results.json CHANGED

@@ -1,15 +1,15 @@
 {
-    "epoch": 6.0,
-    "eval_accuracy": 0.5978312760008184,
-    "eval_loss": 2.2373838424682617,
-    "eval_runtime": 58.1232,
-    "eval_samples": 430,
-    "eval_samples_per_second": 7.398,
-    "eval_steps_per_second": 1.858,
-    "perplexity": 9.368788970328781,
-    "train_loss": 2.103886123784088,
-    "train_runtime": 6222.0683,
-    "train_samples": 1723,
-    "train_samples_per_second": 1.662,
-    "train_steps_per_second": 0.416
+    "epoch": 2.0,
+    "eval_accuracy": 0.8481798046914326,
+    "eval_loss": 2.0772647857666016,
+    "eval_runtime": 129.398,
+    "eval_samples": 10750,
+    "eval_samples_per_second": 83.077,
+    "eval_steps_per_second": 10.387,
+    "perplexity": 7.982604892014763,
+    "train_loss": 2.1184611846914603,
+    "train_runtime": 4663.4223,
+    "train_samples": 43003,
+    "train_samples_per_second": 18.443,
+    "train_steps_per_second": 2.306
 }
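The derived throughput fields can be recomputed from the raw counts, which also confirms the batch-size change from 4 to 8. A minimal sketch, assuming a single device, no gradient accumulation and the Trainer's default of keeping the final partial batch; the runtimes are the rounded values stored in the JSON, so agreement is only to the three decimals shown.

```python
import math

# Raw counts and runtimes from all_results.json and the model card.
train_samples, train_bs, epochs, train_runtime = 43003, 8, 2, 4663.4223
eval_samples, eval_bs, eval_runtime = 10750, 8, 129.398

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(train_samples / train_bs)  # 5376
total_steps = steps_per_epoch * epochs                 # 10752
eval_steps = math.ceil(eval_samples / eval_bs)         # 1344

metrics = {
    "train_samples_per_second": round(train_samples * epochs / train_runtime, 3),
    "train_steps_per_second": round(total_steps / train_runtime, 3),
    "eval_samples_per_second": round(eval_samples / eval_runtime, 3),
    "eval_steps_per_second": round(eval_steps / eval_runtime, 3),
}
print(metrics)
```

Each value matches the stored field, and the same step count explains the epoch column of the training table (1000 / 5376 ≈ 0.19).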

config.json CHANGED

@@ -35,5 +35,5 @@
   "torch_dtype": "float32",
   "transformers_version": "4.26.0",
   "use_cache": true,
-  "vocab_size": 50257
+  "vocab_size": 50259
 }

eval_results.json CHANGED

@@ -1,10 +1,10 @@
 {
-    "epoch": 6.0,
-    "eval_accuracy": 0.5978312760008184,
-    "eval_loss": 2.2373838424682617,
-    "eval_runtime": 58.1232,
-    "eval_samples": 430,
-    "eval_samples_per_second": 7.398,
-    "eval_steps_per_second": 1.858,
-    "perplexity": 9.368788970328781
+    "epoch": 2.0,
+    "eval_accuracy": 0.8481798046914326,
+    "eval_loss": 2.0772647857666016,
+    "eval_runtime": 129.398,
+    "eval_samples": 10750,
+    "eval_samples_per_second": 83.077,
+    "eval_steps_per_second": 10.387,
+    "perplexity": 7.982604892014763
 }
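The `perplexity` field is the exponential of `eval_loss` (the mean per-token cross-entropy in nats), which is how the Transformers `run_clm.py` example computes it; that this run used that script is an assumption. Recomputing both sides of the diff:

```python
import math


def perplexity(eval_loss: float) -> float:
    """Perplexity as exp(mean cross-entropy in nats), with overflow mapped to inf."""
    try:
        return math.exp(eval_loss)
    except OverflowError:
        return float("inf")


old = perplexity(2.2373838424682617)  # eval_loss before this commit
new = perplexity(2.0772647857666016)  # eval_loss after this commit
print(f"{old:.4f} -> {new:.4f}")  # prints 9.3688 -> 7.9826
```

The evaluation set also changed (430 vs 10,750 samples) and, per the commit title, the loss now covers only prompt tokens, so the two values are not a like-for-like comparison.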

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1ec777bb140f4ef1ab2fd98a9c3619d22b5b322b3ca101fbb85dbb1169259bdc
-size 3134035005
+oid sha256:4c68c6a4031f800eda2b9d1b40fe6faa1bd0d8c017a068f2c3c79fc7f83d4eca
+size 3134045245
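The checkpoint grows by exactly 10,240 bytes. That matches two extra rows in the token-embedding matrix, assuming gpt2-large's hidden size of 1280 (not shown in this diff) and the float32 dtype recorded in config.json. GPT-2 ties its output layer to the input embeddings, which would explain why the new rows appear only once. A quick arithmetic check:

```python
# Sizes from the Git LFS pointers above.
OLD_SIZE = 3134035005
NEW_SIZE = 3134045245
# Assumptions: gpt2-large's hidden size (n_embd) is 1280; weights are float32
# (torch_dtype in config.json), i.e. 4 bytes per value.
N_EMBD = 1280
BYTES_PER_VALUE = 4
NEW_ROWS = 50259 - 50257  # vocab_size after minus before

delta = NEW_SIZE - OLD_SIZE
expected = NEW_ROWS * N_EMBD * BYTES_PER_VALUE
print(delta, expected)  # 10240 10240
```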

special_tokens_map.json CHANGED

@@ -1,5 +1,7 @@
 {
   "bos_token": "<|endoftext|>",
   "eos_token": "<|endoftext|>",
+  "pad_token": "[PAD]",
+  "sep_token": "</s>",
   "unk_token": "<|endoftext|>"
 }
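The commit title says the loss is now calculated only on prompts, and the new `[PAD]` and `</s>` tokens give a way to mark padding and a boundary inside each sequence. The training code is not part of this commit, so the following is a hypothetical sketch of the usual label-masking approach: label positions set to -100 are skipped because -100 is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, which `GPT2LMHeadModel` uses for its loss. The layout (prompt, then `</s>`, then the rest, then `[PAD]` padding), the helper name `build_labels`, and the token IDs in the example are all assumptions.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss
PAD_ID = 50257       # "[PAD]", from added_tokens.json
SEP_ID = 50258       # "</s>", from added_tokens.json


def build_labels(input_ids: List[int]) -> List[int]:
    """Hypothetical: keep labels for tokens before the first SEP; mask SEP, everything after it, and padding."""
    labels = []
    seen_sep = False
    for tok in input_ids:
        if tok == SEP_ID:
            seen_sep = True
        if seen_sep or tok == PAD_ID:
            labels.append(IGNORE_INDEX)
        else:
            labels.append(tok)
    return labels


# Placeholder IDs for illustration only, not a real tokenization.
ids = [464, 3797, 318, SEP_ID, 2266, 13, PAD_ID, PAD_ID]
print(build_labels(ids))  # [464, 3797, 318, -100, -100, -100, -100, -100]
```

Using a dedicated `[PAD]` instead of reusing `<|endoftext|>` for padding means masking the padding cannot also mask genuine end-of-text tokens.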

tokenizer.json CHANGED

@@ -11,6 +11,24 @@
       "rstrip": false,
       "normalized": false,
       "special": true
+    },
+    {
+      "id": 50257,
+      "content": "[PAD]",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    {
+      "id": 50258,
+      "content": "</s>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
     }
   ],
   "normalizer": null,

train_results.json CHANGED

@@ -1,8 +1,8 @@
 {
-    "epoch": 6.0,
-    "train_loss": 2.103886123784088,
-    "train_runtime": 6222.0683,
-    "train_samples": 1723,
-    "train_samples_per_second": 1.662,
-    "train_steps_per_second": 0.416
+    "epoch": 2.0,
+    "train_loss": 2.1184611846914603,
+    "train_runtime": 4663.4223,
+    "train_samples": 43003,
+    "train_samples_per_second": 18.443,
+    "train_steps_per_second": 2.306
 }

trainer_state.json CHANGED

The diff for this file is too large to render.

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f6b5dab74e476d5b6df53fd4c78148028a02e2501b93ecd2e51ad20482829c3e
+oid sha256:2354ff3ec9e59058b3289218560a957a3cbc5faa14c40080e2daf94d9e8c0c3f
 size 3451