---
license: afl-3.0
datasets:
- iqballx/indonesian_news_datasets
language:
- id
metrics:
- accuracy
library_name: transformers
---

# Model Card for Indonesian News Classification Model

## Model Description

This model is fine-tuned to classify Indonesian news articles from iqballx/indonesian_news_datasets into predefined categories. Its training labels were generated automatically: the Indonesian articles were translated into English with a Neural Machine Translation (NMT) system, and the English translations were then labeled with niksmer/ManiBERT, a model trained to classify political texts. The resulting dataset pairs each Indonesian article with its English translation and the assigned category.

## Training Data

The training data consists of articles from iqballx/indonesian_news_datasets that were translated into English and then labeled with the niksmer/ManiBERT model. The labels are drawn from the 55 categories listed under Label Mapping and cover a wide range of topics.

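The labeling pipeline described above can be sketched as a translate-then-label loop. This is a minimal illustration, not the author's actual script: `build_silver_dataset`, `fake_translate`, and `fake_label` are hypothetical names, and the two stand-in functions replace the real NMT system and niksmer/ManiBERT so that the sketch runs without downloading any model.

```python
from typing import Callable, Dict, Iterable, List


def build_silver_dataset(
    articles: Iterable[str],
    translate: Callable[[str], str],
    label_english: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Translate each Indonesian article, label the translation, keep both texts."""
    rows = []
    for text_id in articles:
        text_en = translate(text_id)        # Indonesian -> English (NMT step)
        category = label_english(text_en)   # labeler runs on the English text
        rows.append({"text_id": text_id, "text_en": text_en, "label": category})
    return rows


# Toy stand-ins (assumptions): a lookup "translator" and a keyword "labeler".
def fake_translate(text: str) -> str:
    return {"Harga beras naik.": "Rice prices are rising."}.get(text, text)


def fake_label(text: str) -> str:
    return "Agriculture and Farmers" if "Rice" in text else "None"


dataset = build_silver_dataset(["Harga beras naik."], fake_translate, fake_label)
print(dataset[0])
```

In a real run, `translate` and `label_english` would wrap the actual translation and classification models. Each row keeps the Indonesian text next to its label, so the classifier can be trained directly on Indonesian input.
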
## Evaluation

The model was evaluated on a held-out test set, with accuracy as the metric. Accuracy was 61.71% after the first epoch, 64.62% after the second, 65.64% after the third, and 65.27% after the fourth, so it rose through the third epoch and dipped slightly in the fourth. With 55 possible categories, an accuracy of about 65% means the model matches the reference label for roughly two out of three test articles.

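The per-epoch figures can be checked with a few lines of arithmetic. The sketch below only reuses the four accuracy values reported in this section. It does not indicate which checkpoint was actually published, and the variable names are illustrative.

```python
# Accuracy (%) on the held-out test set after each epoch, as reported in this section.
epoch_accuracy = {1: 61.71, 2: 64.62, 3: 65.64, 4: 65.27}

# Epoch with the highest reported accuracy.
best_epoch = max(epoch_accuracy, key=epoch_accuracy.get)

# Change from the previous epoch, in percentage points.
epochs = sorted(epoch_accuracy)
deltas = {
    epoch: round(epoch_accuracy[epoch] - epoch_accuracy[prev], 2)
    for prev, epoch in zip(epochs, epochs[1:])
}

print("Best epoch:", best_epoch, "accuracy:", epoch_accuracy[best_epoch])
print("Change per epoch:", deltas)
```

The gains shrink from epoch to epoch and turn slightly negative in the fourth, which is the usual signal to stop training or keep the earlier checkpoint.
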
## Limitations and Bias

As with any machine learning model, this one has limitations and potential biases. The translation step can introduce errors or lose nuance, which affects the accuracy of the labels. In addition, the niksmer/ManiBERT model used for labeling was trained on political texts, so its labels may be less reliable for non-political news and may introduce political bias.

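One simple way to probe this kind of bias is to look at how labels are distributed over a sample of articles, for example how often the "None" category appears. The sketch below is an assumption-laden illustration: `label_shares` is a hypothetical helper, and the sample labels are made up rather than taken from the dataset.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def label_shares(labels: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (label, share of all labels) pairs, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n / total) for label, n in counts.most_common()]


# Made-up labels for illustration only; not taken from the actual dataset.
sample = ["None", "None", "Law and Order", "Economic Goals", "None"]
shares = label_shares(sample)
for label, share in shares:
    print(f"{label}: {share:.0%}")
```

A distribution dominated by one or two categories would suggest that the political label scheme fits the news domain poorly for many articles.
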
## How to Use the Model

To classify an Indonesian news article, you can use the script below:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "YagiASAFAS/indonesia-news-classification-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Replace the placeholder with the Indonesian text to classify
inputs = tokenizer("[Indonesian Text]", return_tensors="pt", truncation=True)

# Run the model without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class
predictions = torch.nn.functional.softmax(outputs.logits, dim=1)
predicted_class_index = torch.argmax(predictions, dim=1).item()

# Map the class index to its label text
id2label = model.config.id2label
predicted_category = id2label.get(predicted_class_index)

print("Predicted Category:", predicted_category)
```

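Since top-1 accuracy is around 65%, it can help to inspect the few most probable categories instead of only the single best one. The helper below is a sketch under stated assumptions: `top_k_categories` is a hypothetical function, not part of `transformers`, and it works on a plain list of logits so it can be tried without loading the model.

```python
import math
from typing import Dict, List, Tuple


def top_k_categories(
    logits: List[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one article."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy example with three classes and made-up logits.
toy_labels = {0: "Agriculture and Farmers", 1: "Democracy", 2: "None"}
result = top_k_categories([2.0, 0.5, 1.0], toy_labels, k=2)
for label, prob in result:
    print(f"{label}: {prob:.3f}")
```

With the real model, the same function could be called as `top_k_categories(outputs.logits[0].tolist(), model.config.id2label)`, using the variables from the script above.
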
## Label Mapping

| Label ID | Label Text |
|----------|------------|
| 0 | Agriculture and Farmers |
| 1 | Anti-Growth Economy and Sustainability |
| 2 | Anti-Imperialism |
| 3 | Centralisation: Positive |
| 4 | Civic Mindedness: Positive |
| 5 | Constitutionalism: Negative |
| 6 | Constitutionalism: Positive |
| 7 | Controlled Economy |
| 8 | Corporatism/ Mixed Economy |
| 9 | Culture: Positive |
| 10 | Decentralisation: Positive |
| 11 | Democracy |
| 12 | Economic Goals |
| 13 | Economic Growth: Positive |
| 14 | Economic Orthodoxy |
| 15 | Economic Planning |
| 16 | Education Expansion |
| 17 | Education Limitation |
| 18 | Environmental Protection |
| 19 | Equality: Positive |
| 20 | European Community/Union or Latin America Integration: Negative |
| 21 | European Community/Union or Latin America Integration: Positive |
| 22 | Foreign Special Relationships: Negative |
| 23 | Foreign Special Relationships: Positive |
| 24 | Free Market Economy |
| 25 | Freedom and Human Rights |
| 26 | Governmental and Administrative Efficiency |
| 27 | Incentives: Positive |
| 28 | Internationalism: Negative |
| 29 | Internationalism: Positive |
| 30 | Labour Groups: Negative |
| 31 | Labour Groups: Positive |
| 32 | Law and Order |
| 33 | Market Regulation |
| 34 | Marxist Analysis: Positive |
| 35 | Military: Negative |
| 36 | Military: Positive |
| 37 | Multiculturalism: Negative |
| 38 | Multiculturalism: Positive |
| 39 | National Way of Life: Negative |
| 40 | National Way of Life: Positive |
| 41 | Nationalisation |
| 42 | Non-economic Demographic Groups |
| 43 | None |
| 44 | Peace |
| 45 | Political Authority |
| 46 | Political Corruption |
| 47 | Protectionism: Negative |
| 48 | Protectionism: Positive |
| 49 | Technology and Infrastructure: Positive |
| 50 | Traditional Morality: Negative |
| 51 | Traditional Morality: Positive |
| 52 | Underprivileged Minority Groups |
| 53 | Welfare State Expansion |
| 54 | Welfare State Limitation |