---
license: apache-2.0
---

# PL-BERT Fine-Tuned on Hindi Wikipedia Dataset

This model is a fine-tuned version of **PL-BERT**, trained on the Hindi subset of the Wiki40b dataset. The model has been optimized to understand and generate high-quality Hindi text, making it suitable for a range of Hindi NLP tasks. For more information about this model, see the [GitHub repository](https://github.com/Ionio-io/PL-BERT-Fine-Tuned-hi-).

## Model Overview

- **Model Name:** PL-BERT (fine-tuned on Hindi)
- **Base Model:** PL-BERT (multilingual BERT variant)
- **Dataset:** Hindi subset of Wiki40b (51,000 cleaned Wikipedia articles)
- **Precision:** Mixed precision (FP16)

The fine-tuning process focused on improving the model's handling of Hindi text by leveraging a large, cleaned corpus of Hindi Wikipedia articles.

## Training Details

- **Model:** PL-BERT
- **Dataset:** Hindi subset of Wiki40b
- **Batch Size:** 64
- **Mixed Precision:** FP16
- **Optimizer:** AdamW
- **Training Steps:** 15,000

### Training Progress

- **Final Loss:** 1.879
- **Vocabulary Loss:** 0.49
- **Token Loss:** 1.465

### Validation Results

We monitored performance on a validation set during training:

- **Validation Loss:** 1.879
- **Vocabulary Accuracy:** 78.54%
- **Token Accuracy:** 82.30%
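The Wiki40b Hindi split is distributed through the Hugging Face `datasets` hub. As a minimal sketch, assuming the `wiki40b` builder and its `hi` configuration load directly (older versions of the dataset required a manual preprocessing step), the corpus can be pulled and its structure markers stripped as shown below; the cleaning step here is an assumption, not necessarily the exact pipeline used for this model:

```python
from datasets import load_dataset

# Pull the Hindi split of Wiki40b (~51k cleaned Wikipedia articles).
wiki = load_dataset("wiki40b", "hi", split="train")

def strip_markers(example):
    # Wiki40b embeds structure markers in the raw text; replace them with
    # newlines so only plain Hindi prose remains (assumed preprocessing).
    text = example["text"]
    for marker in ("_START_ARTICLE_", "_START_SECTION_", "_START_PARAGRAPH_"):
        text = text.replace(marker, "\n")
    return {"text": text.replace("_NEWLINE_", "\n").strip()}

wiki = wiki.map(strip_markers)
print(wiki[0]["text"][:200])  # preview the first cleaned article
```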
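The training configuration above (AdamW, batch size 64, FP16, 15,000 steps) maps onto a standard PyTorch mixed-precision loop. The sketch below is illustrative only: `TinyPLBERT`, the learning rate, the random input batches, and the equal weighting of the two losses are all assumptions standing in for the actual PL-BERT code, which additionally masks its inputs before prediction.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.cuda.amp import GradScaler, autocast

# Placeholder network standing in for PL-BERT: it emits a token prediction
# and a vocabulary prediction per position, mirroring the two losses above.
class TinyPLBERT(nn.Module):
    def __init__(self, vocab_size=1000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.token_head = nn.Linear(hidden, vocab_size)
        self.vocab_head = nn.Linear(hidden, vocab_size)

    def forward(self, ids):
        h = self.encoder(self.embed(ids))
        return self.token_head(h), self.vocab_head(h)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyPLBERT().to(device)
optimizer = AdamW(model.parameters(), lr=1e-4)   # learning rate assumed
scaler = GradScaler(enabled=(device == "cuda"))  # FP16 loss scaling
criterion = nn.CrossEntropyLoss()

for step in range(15_000):  # training steps from the table above
    # Random batch of 64 sequences as a stand-in for tokenized Hindi text.
    ids = torch.randint(0, 1000, (64, 128), device=device)
    optimizer.zero_grad()
    with autocast(enabled=(device == "cuda")):   # run the forward pass in FP16
        token_logits, vocab_logits = model(ids)
        token_loss = criterion(token_logits.transpose(1, 2), ids)
        vocab_loss = criterion(vocab_logits.transpose(1, 2), ids)
        loss = token_loss + vocab_loss           # combined objective (assumed weighting)
    scaler.scale(loss).backward()                # scale up to avoid FP16 underflow
    scaler.step(optimizer)                       # unscales gradients, then steps
    scaler.update()
```

`GradScaler` is the piece that makes FP16 training stable: it multiplies the loss before `backward()` so small gradients do not underflow in half precision, then unscales them before the optimizer step.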