---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---
# sahajBERT News Article Classification
## Model description
[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
The model is trained to classify articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
|0 | kolkata|
|1 | state|
|2 | national|
|3 | sports|
|4 | entertainment|
|5 | international|
## Intended uses & limitations
#### How to use
You can use this model directly with a text classification pipeline:
```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
classifier = TextClassificationPipeline(tokenizer=tokenizer, model=model)

# Bengali input ("There are 3 mouzas and 10 villages in this union."); replace with your own text
raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"
output = classifier(raw_text)  # list of {"label": ..., "score": ...} dicts
print(output)
```
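
The pipeline returns a list of `{"label": ..., "score": ...}` dicts. If the checkpoint's config does not define human-readable class names, transformers falls back to generic labels of the form `LABEL_<id>`. The helper below is a minimal sketch, assuming the label ids follow the table above; `to_class_name` and `ID2NAME` are hypothetical names, not part of the model repository.

```python
# Class names by label id, copied from the label table in this model card.
# Assumption: the checkpoint uses these ids in this order.
ID2NAME = {
    0: "kolkata",
    1: "state",
    2: "national",
    3: "sports",
    4: "entertainment",
    5: "international",
}


def to_class_name(label: str) -> str:
    """Map a generic pipeline label such as "LABEL_3" to its class name.

    Labels that do not follow the "LABEL_<id>" pattern (for example, names
    that are already human-readable) or whose id is unknown are returned unchanged.
    """
    prefix = "LABEL_"
    suffix = label[len(prefix):]
    if label.startswith(prefix) and suffix.isdigit():
        return ID2NAME.get(int(suffix), label)
    return label


print(to_class_name("LABEL_3"))  # sports
```

Applied to the pipeline result from the snippet above: `[to_class_name(d["label"]) for d in output]`.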
#### Limitations and bias
<!-- Provide examples of latent issues and potential remediations. -->
WIP
## Training data
The model was initialized with pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 18149 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).
## Training procedure
Coming soon!
## Eval results

| Metric | Value |
|:-------|------:|
| accuracy | 0.920623671155209 |
| loss | 0.2719293534755707 |
| macro_f1 | 0.8924089161713425 |
| macro_precision | 0.891858452957785 |
| macro_recall | 0.8978917764271065 |
| micro_f1 | 0.920623671155209 |
| micro_precision | 0.920623671155209 |
| micro_recall | 0.920623671155209 |
| weighted_f1 | 0.9205158122362266 |
| weighted_precision | 0.9236142214371135 |
| weighted_recall | 0.920623671155209 |
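
For single-label classification such as this task, micro-averaged precision, recall and F1 all reduce to accuracy, and so does the support-weighted recall, which is why those values coincide above. The macro average weights each of the six classes equally, while the weighted average weights each class by its number of true examples. The pure-Python sketch below illustrates the three averaging schemes on toy labels; it is an assumption for illustration only, not the evaluation script used for this model, and `averaged_scores` is a hypothetical name.

```python
from collections import Counter


def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, fn)} for single-label predictions."""
    counts = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def safe_div(a, b):
    return a / b if b else 0.0


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, f1


def averaged_scores(y_true, y_pred, labels):
    """Accuracy plus (precision, recall, F1) under macro, micro and weighted averaging."""
    labels = list(labels)
    counts = per_class_counts(y_true, y_pred, labels)
    support = Counter(y_true)
    n = len(y_true)
    per_class = {c: prf(*counts[c]) for c in labels}

    # Macro: unweighted mean of per-class scores.
    macro = tuple(sum(per_class[c][i] for c in labels) / len(labels) for i in range(3))
    # Weighted: per-class scores weighted by class support.
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n for i in range(3))
    # Micro: pool tp/fp/fn over all classes, then compute the scores once.
    micro = prf(*(sum(counts[c][i] for c in labels) for i in range(3)))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "macro": macro, "micro": micro, "weighted": weighted}


# Toy example with 3 classes; micro precision/recall/F1 all equal accuracy (4/6) here.
scores = averaged_scores([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], range(3))
print(scores)
```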
### BibTeX entry and citation info
Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->