---
license: apache-2.0
---

# PL-BERT Fine-Tuned on Hindi Wikipedia Dataset

This model is a fine-tuned version of **PL-BERT**, trained on the Hindi subset of the Wiki40b dataset. The model has been optimized to understand and generate high-quality Hindi text, making it suitable for a range of Hindi NLP tasks. For more information about this model, see the [GitHub repository](https://github.com/Ionio-io/PL-BERT-Fine-Tuned-hi-).

## Model Overview

- **Model Name:** PL-BERT (fine-tuned on Hindi)
- **Base Model:** PL-BERT (multilingual BERT variant)
- **Dataset:** Hindi subset of Wiki40b (51,000 cleaned Wikipedia articles)
- **Precision:** Mixed precision (FP16)

The fine-tuning process focused on improving the model's handling of Hindi text by leveraging a large, cleaned corpus of Hindi Wikipedia articles.

## Training Details

- **Model:** PL-BERT
- **Dataset:** Hindi subset of Wiki40b
- **Batch Size:** 64
- **Mixed Precision:** FP16
- **Optimizer:** AdamW
- **Training Steps:** 15,000

### Training Progress

- **Final Loss:** 1.879
- **Vocabulary Loss:** 0.49
- **Token Loss:** 1.465

### Validation Results

We monitored performance on a validation set during training:

- **Validation Loss:** 1.879
- **Vocabulary Accuracy:** 78.54%
- **Token Accuracy:** 82.30%
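The Wiki40b Hindi split is distributed through the Hugging Face `datasets` hub. As a minimal sketch, assuming the `wiki40b` builder and its `hi` configuration load directly (older versions of the dataset required a manual preprocessing step), the corpus can be pulled and its structure markers stripped as shown below; the cleaning step here is an assumption, not necessarily the exact pipeline used for this model:

```python
from datasets import load_dataset

# Pull the Hindi split of Wiki40b (~51k cleaned Wikipedia articles).
wiki = load_dataset("wiki40b", "hi", split="train")

def strip_markers(example):
    # Wiki40b embeds structure markers in the raw text; replace them with
    # newlines so only plain Hindi prose remains (assumed preprocessing).
    text = example["text"]
    for marker in ("_START_ARTICLE_", "_START_SECTION_", "_START_PARAGRAPH_"):
        text = text.replace(marker, "\n")
    return {"text": text.replace("_NEWLINE_", "\n").strip()}

wiki = wiki.map(strip_markers)
print(wiki[0]["text"][:200])  # preview the first cleaned article
```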
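The training configuration above (AdamW, batch size 64, FP16, 15,000 steps) maps onto a standard PyTorch mixed-precision loop. The sketch below is illustrative only: `TinyPLBERT`, the learning rate, the random input batches, and the equal weighting of the two losses are all assumptions standing in for the actual PL-BERT code, which additionally masks its inputs before prediction.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.cuda.amp import GradScaler, autocast

# Placeholder network standing in for PL-BERT: it emits a token prediction
# and a vocabulary prediction per position, mirroring the two losses above.
class TinyPLBERT(nn.Module):
    def __init__(self, vocab_size=1000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.token_head = nn.Linear(hidden, vocab_size)
        self.vocab_head = nn.Linear(hidden, vocab_size)

    def forward(self, ids):
        h = self.encoder(self.embed(ids))
        return self.token_head(h), self.vocab_head(h)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyPLBERT().to(device)
optimizer = AdamW(model.parameters(), lr=1e-4)   # learning rate assumed
scaler = GradScaler(enabled=(device == "cuda"))  # FP16 loss scaling
criterion = nn.CrossEntropyLoss()

for step in range(15_000):  # training steps from the table above
    # Random batch of 64 sequences as a stand-in for tokenized Hindi text.
    ids = torch.randint(0, 1000, (64, 128), device=device)
    optimizer.zero_grad()
    with autocast(enabled=(device == "cuda")):   # run the forward pass in FP16
        token_logits, vocab_logits = model(ids)
        token_loss = criterion(token_logits.transpose(1, 2), ids)
        vocab_loss = criterion(vocab_logits.transpose(1, 2), ids)
        loss = token_loss + vocab_loss           # combined objective (assumed weighting)
    scaler.scale(loss).backward()                # scale up to avoid FP16 underflow
    scaler.step(optimizer)                       # unscales gradients, then steps
    scaler.update()
```

`GradScaler` is the piece that makes FP16 training stable: it multiplies the loss before `backward()` so small gradients do not underflow in half precision, then unscales them before the optimizer step.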