---
license: mit
language:
- ne
metrics:
- rouge
tags:
- Nepali summary
- Nepali bart
- Nepali
- summary
- text
- nepali text summary
pipeline_tag: text2text-generation
widget:
- text: "अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए । नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ । नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ ।"
  example_title: "Example 1"
---
# Nep_Summ_BART:

<!-- Provide a quick summary of what the model is/does. -->

This is a BART model pre-trained on a Nepali corpus and then fine-tuned on Nepali summarization data.
<br>Given an input text, the model generates a summary.

The model has 101M parameters.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

The model is pre-trained using BART noising techniques such as sentence permutation, token deletion, and random token masking.
<br>The noisy text is fed into the transformer encoder, and the decoder fulfils the denoising objective by reconstructing the original text.
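
The three noising steps named above can be sketched on whitespace tokens as follows. This is a minimal illustration, not the authors' pre-training code: the function names, the `<mask>` token, the probabilities, and the sentence splitting on the danda (`।`) are all assumptions made for the example.

```python
import random

MASK = "<mask>"  # assumed mask token; the real tokenizer's mask token may differ


def permute_sentences(text, rng, sep="।"):
    """Shuffle sentence order; sentences are split on the Devanagari danda."""
    sentences = [s.strip() for s in text.split(sep) if s.strip()]
    rng.shuffle(sentences)
    return f" {sep} ".join(sentences) + f" {sep}"


def delete_tokens(tokens, rng, p=0.1):
    """Drop each token independently with probability p (assumed rate)."""
    return [t for t in tokens if rng.random() >= p]


def mask_tokens(tokens, rng, p=0.15):
    """Replace each token with the mask token with probability p (assumed rate)."""
    return [MASK if rng.random() < p else t for t in tokens]


def add_noise(text, seed=0):
    """Apply the three noising steps in sequence to whitespace tokens."""
    rng = random.Random(seed)
    tokens = permute_sentences(text, rng).split()
    tokens = delete_tokens(tokens, rng)
    tokens = mask_tokens(tokens, rng)
    return " ".join(tokens)


clean = "चिनीको अभाव छ । सरकारले निर्देशन दियो । उद्योगीसँग छलफल हुनेछ ।"
noisy = add_noise(clean)
print("encoder input :", noisy)   # corrupted text
print("decoder target:", clean)   # original text to reconstruct
```

In a real pipeline the noise would be applied to subword token IDs rather than whitespace-split words.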

Cross-entropy loss is used for both the pre-training and fine-tuning of the model.
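
For reference, token-level cross-entropy can be written in a few lines of plain Python. This is a sketch of the standard formulation only; the card does not state details such as padding handling or label smoothing, and the function name and toy numbers are made up for illustration.

```python
import math


def token_cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of the target tokens.

    logits: one list of unnormalised scores per position (length = vocab size)
    target_ids: index of the correct token at each position
    """
    total = 0.0
    for scores, target in zip(logits, target_ids):
        # Log of the softmax normaliser, with the max subtracted for stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]
    return total / len(target_ids)


# Toy example: 2 decoder positions over a 3-token vocabulary.
logits = [[2.0, 0.5, 0.1], [0.2, 0.1, 3.0]]
targets = [0, 2]
print(round(token_cross_entropy(logits, targets), 4))
```

With uniform scores the loss equals the log of the vocabulary size, which is a handy sanity check when reading loss curves.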

The pre-training loss is as follows:

| Epoch | Training Loss | Validation Loss |
|----------|:-------------:|------:|
| 1 |  0.8137 | 0.8010 |
| 2 |  0.7861 | 0.7524 |
| 3 |  0.7495 | 0.7290 |

After fine-tuning, the ROUGE scores on the BBC XLSum Nepali test set are:

ROUGE1 : 0.177

ROUGE2 : 0.059

ROUGEL : 0.154
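
To make the metrics concrete, here is a minimal sketch of ROUGE-N and ROUGE-L as F1 scores over whitespace tokens. The card does not say which ROUGE implementation or tokenization produced the numbers above, so this simplified version is an assumption and will not necessarily reproduce them.

```python
from collections import Counter


def _f1(overlap, pred_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / pred_total, overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(pred, ref, n=1):
    """F1 over overlapping n-grams of whitespace tokens."""
    def ngrams(text):
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    p, r = ngrams(pred), ngrams(ref)
    overlap = sum((p & r).values())  # clipped n-gram matches
    return _f1(overlap, sum(p.values()), sum(r.values()))


def rouge_l(pred, ref):
    """F1 based on the longest common subsequence (LCS) of tokens."""
    a, b = pred.split(), ref.split()
    # Classic dynamic-programming LCS table.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return _f1(dp[-1][-1], len(a), len(b))


pred = "नेपाल प्रहरीले चिनी उद्योगको स्टक चेक गर्ने"
ref = "नेपाल प्रहरीले सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने"
print("ROUGE1:", rouge_n(pred, ref, 1))
print("ROUGE2:", rouge_n(pred, ref, 2))
print("ROUGEL:", rouge_l(pred, ref))
```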

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
You can use this model for text summarization.
<br>It could also be used for classification by loading it with `BartForSequenceClassification`.

## How to Get Started with the Model

Use the code below to get started with the model.
```python
# Install the dependencies below, or from requirements.txt
# pip install transformers==4.35
# pip install huggingface_hub==0.23.0

import torch

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pascalrai/nep_summ_BART")
model = AutoModelForSeq2SeqLM.from_pretrained("pascalrai/nep_summ_BART")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

sentence = """अत्यधिक माग भएका बेला दसैंमा चिनीको हाहाकार भएको थियो । उपत्यकाबाहिरका केही जिल्लामा चिनी पाइए पनि काठमाडौंमा भने अभाव नै कायम रहेको छ । प्रधानमन्त्री पुष्पकमल दाहालले बिहीबार बिहान उद्योग तथा वाणिज्य मन्त्री तथा मुख्यसचिवलाई चिनीको अभाव सिर्जना हुन नदिन सबै उपायको खोजी गर्न निर्देशन दिएका थिए ।

नेपाली चिनी उद्योगहरूले आम उपभोक्तालाई सहज हुने किसिमले बजारमा चिनी नपठाइ ठूला उद्योगलाई आपूर्ति गर्न गोदाममै राख्ने गरेको पनि भेटिएको छ । वाणिज्य विभागको तथ्यांक अनुसार, नेपालमा उत्पादन हुने चिनीको सत्तरी प्रतिशत चिनी बिभिन्न पेय पदार्थ, मिठाइ, चकलेट, विस्कुटलगायतका उद्योगहरुमा आपूर्ति हुने गर्दछ ।

नेपाल प्रहरीले नेपालमा रहेका सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने तथा सो आधारमा बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गरिने विभागले जनाएको छ"""

# truncation=True is needed for max_length to take effect on long inputs
inputs = tokenizer(sentence, max_length=1000, truncation=True, return_tensors="pt")
model = model.to(device)
summary_ids = model.generate(inputs["input_ids"].to(device))

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
print(summary)

# Example output:
# दशैंको मुखमा चिनीको चरम अभाव भएको भन्दै नेपाल प्रहरीले सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने र बजारमा चिनी पठाउन उद्योगीहरूसँग छलफल गर्ने जनाएको छ।

```
#### Hardware

The model was pre-trained in one continuous run on a single A10G GPU on an AWS instance for 133 hours, with each epoch taking about 45 hours, using bf16 precision.

#### Possible Future Directions:

1. Use a decoder-only model for pre-training and summarization.
<br>When the deleted token spans are not very large, the model seems to learn to copy tokens from the encoder context through cross-attention during decoder generation.
<br>This hurts performance on the abstractive summarization task.
<br>The issue does not arise in a decoder-only model, because the model never sees the next token it has to predict.

2. We pre-trained our model on approximately 16 GB of data and compared its classification results on the <a href='https://www.kaggle.com/datasets/ashokpant/nepali-news-dataset-large/data'>Nepali News Dataset (Large)</a> with a couple of Nepali transformer-based models available on Hugging Face.
<br> Our model seems to do better than the others, with a validation accuracy of 0.58, but this is still low.
<br> There could be two reasons for this:

   - There is still room for improving the quality of the data (test against HLP).
     <br>If HLP >> 0.58, try the point below.
   - We still do not have enough data for generalization, as Transformer models only perform well with large amounts of pre-training data, compared with classical sequential models.
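
One way to probe the copying behaviour described in item 1 is to measure how many n-grams of a generated summary appear verbatim in the source text. The helper below is a hypothetical diagnostic written for illustration; it is not part of the authors' evaluation, and its name and whitespace tokenization are assumptions.

```python
def copied_ngram_ratio(source, summary, n=2):
    """Fraction of summary n-grams that also occur verbatim in the source."""
    def ngrams(text):
        toks = text.split()
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]

    source_ngrams = set(ngrams(source))
    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0  # summary shorter than n tokens
    copied = sum(g in source_ngrams for g in summary_ngrams)
    return copied / len(summary_ngrams)


source = "नेपाल प्रहरीले सबै चिनी उद्योगको स्टक रेकर्ड चेक गर्ने"
summary = "नेपाल प्रहरीले चिनी उद्योगको स्टक चेक गर्ने"
print(copied_ngram_ratio(source, summary, n=2))
```

A ratio close to 1.0 across many examples would suggest mostly extractive output, while lower values indicate more abstractive phrasing.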

#### Authors:

<a href="https://www.linkedin.com/in/bijaya-bhatta-69536018a/">Vijaya Bhatta</a>
<br><a href="https://www.linkedin.com/in/pascal-rai/">Pascal Rai</a>
<br><a href="https://www.linkedin.com/in/niranjan-shrestha-gem/">Niranjan Shrestha</a>
<br><a href="https://www.linkedin.com/in/dristi-sigdel-3120131b1/">Dristi Sigdel</a>
<br><a href="https://www.linkedin.com/in/sujan-neupane-596964211/">Sujan Neupane</a>
<br><a href="https://www.linkedin.com/in/sagar-kafle-a1b84b185/">Sagar Kafle</a>