NebulaByte/hindi_gpt2

@@ -1,28 +1,25 @@
 ---
-license: apache-2.0
-widget:
-- text: "अपने अनुप्रयोग को पहुंचनीयता व्यायाम"
-- text: "जनतंत्र की सफलता केवल इस बात से नहीं हो सकती है कि हर"
-- text: "अगर इसके बाद भी वे फैसले पर कायम रहते हैं और"
-- text: "मामले का खुलासा होने के बाद"
-- text: "My name is Julien and I like to"
-- text: "My name is Thomas and my main"
-inference:
-  parameters:
-    max_length: 200
 ---
-# Model Overview:
-This language generation model extends GPT-2 to support Hindi alongside the languages the original model already supports. It was fine-tuned on Hindi [Wikipedia](https://www.kaggle.com/datasets/disisbig/hindi-wikipedia-articles-55k) articles.
-# Model Architecture and Parameters:
-The model architecture is based on the GPT-2 framework, specifically using the parameters of the small version of the original OpenAI GPT2 model. It employs a Byte Pair Encoding (BPE) tokenizer.
-# Corpus:
-The training corpus for Hindi GPT2 consists of Wikipedia articles.
-# Tokenizer:
-A tokenizer was trained on the Hindi Wikipedia corpus, and its vocabulary (5000 tokens) was merged with the existing tokenizer. Hindi GPT2 uses byte-level Byte Pair Encoding (BPE) to tokenize Hindi text, including Unicode characters. The merged tokenizer has a vocabulary of 53497 tokens, which lets it represent Hindi text alongside the original vocabulary. Input sequences are formed by splitting the text into consecutive tokens, with a maximum length of 1024 tokens.
 ## Intended uses & limitations
@@ -34,38 +31,34 @@ More information needed
 ## Training procedure
-More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 0.0005
-- train_batch_size: 64
-- eval_batch_size: 64
 - seed: 42
 - gradient_accumulation_steps: 4
-- total_train_batch_size: 256
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_steps: 500
 - num_epochs: 1
-- mixed_precision_training: Native AMP
 ### Training results
-| Step | Training Loss | Validation Loss |
-| :---- | :------------- | :--------------- |
-| 500  | 2.0016        | 1.066703        |
-| 1000 | 1.0314        | 0.959653        |
-| 1500 | 0.9593        | 0.918827        |
-| 2000 | 0.922         | 0.889607        |
-| 2500 | 0.8983        | 0.872523        |
-| 3000 | 0.8852        | 0.863592        |
 ### Framework versions
-- Transformers 4.30.2
-- torch 1.13.1
-- Datasets 2.13.1
 - Tokenizers 0.13.3
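
The removed Tokenizer section describes merging a 5000-token Hindi vocabulary into the existing tokenizer and reports 53497 entries afterwards. The sketch below illustrates that kind of merge on toy data and works through the implied arithmetic. It rests on two assumptions that the card does not state: that the existing tokenizer is the standard GPT-2 one with 50257 entries, and that tokens already present are skipped rather than duplicated. `merge_vocab` is a hypothetical helper, not the author's code, and it only covers id assignment (a real BPE tokenizer also carries merge rules).

```python
def merge_vocab(base_vocab, new_tokens):
    """Return a copy of base_vocab with unseen tokens appended at the next free ids."""
    merged = dict(base_vocab)
    next_id = max(merged.values(), default=-1) + 1
    for token in new_tokens:
        if token not in merged:  # tokens already in the base vocabulary keep their id
            merged[token] = next_id
            next_id += 1
    return merged


# Toy vocabularies (hypothetical data, for illustration only).
base = {"a": 0, "b": 1, "the": 2}
hindi_tokens = ["के", "the", "है"]
merged = merge_vocab(base, hindi_tokens)

# Arithmetic implied by the sizes reported in the card.
GPT2_VOCAB_SIZE = 50257      # assumption: standard GPT-2 vocabulary size
TRAINED_HINDI_TOKENS = 5000  # from the card
MERGED_VOCAB_SIZE = 53497    # from the card

newly_added = MERGED_VOCAB_SIZE - GPT2_VOCAB_SIZE
already_present = TRAINED_HINDI_TOKENS - newly_added
```

Under those assumptions, 3240 of the 5000 trained tokens were new to the vocabulary and 1760 overlapped with entries GPT-2 already had.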

 ---
+license: mit
+base_model: gpt2
+tags:
+- generated_from_trainer
+model-index:
+- name: hindi_gpt2
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# hindi_gpt2
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.9187
+## Model description
+More information needed
 ## Intended uses & limitations
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - learning_rate: 0.0005
+- train_batch_size: 40
+- eval_batch_size: 40
 - seed: 42
 - gradient_accumulation_steps: 4
+- total_train_batch_size: 160
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 400
 - num_epochs: 1
 ### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 4.694         | 0.18  | 400  | 2.7361          |
+| 2.3952        | 0.35  | 800  | 2.1608          |
+| 2.1311        | 0.53  | 1200 | 2.0237          |
+| 2.0282        | 0.71  | 1600 | 1.9518          |
+| 1.9731        | 0.89  | 2000 | 1.9187          |
 ### Framework versions
+- Transformers 4.31.0
+- Pytorch 2.0.1+cu118
+- Datasets 2.14.2
 - Tokenizers 0.13.3
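
The hyperparameter lists on both sides of this diff can be cross-checked with a little arithmetic. The sketch below recomputes the total train batch size from the per-device batch size and the gradient accumulation steps, and shows the usual shape of a cosine schedule with linear warmup. It assumes a single device (the card does not say how many were used), and the `total_steps` value is a hypothetical estimate derived from the results table (step 2000 at epoch 0.89), not a figure the card reports. Both function names are illustrative.

```python
import math


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def cosine_lr_with_warmup(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then half-cosine decay towards 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Values taken from the card: new side (40, 4) and old side (64, 4).
new_total = effective_batch_size(40, 4)
old_total = effective_batch_size(64, 4)

# Hypothetical total step count, estimated as 2000 / 0.89 and rounded.
ASSUMED_TOTAL_STEPS = 2250
lr_at_warmup_end = cosine_lr_with_warmup(400, 0.0005, 400, ASSUMED_TOTAL_STEPS)
lr_at_last_eval = cosine_lr_with_warmup(2000, 0.0005, 400, ASSUMED_TOTAL_STEPS)
```

With one device, the products come to 160 and 256, matching the `total_train_batch_size` values on the new and old sides respectively. The learning rate peaks at 0.0005 when warmup ends at step 400 and decays from there.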