YagiASAFAS committed
Commit c04024e • 1 Parent(s): 0a0e3a5
Add tokenizer files

- README.md +27 -73
- config.json +36 -36
- pytorch_model.bin +1 -1
- tokenizer.json +0 -0
- training_args.bin +1 -1
README.md
CHANGED
@@ -5,58 +5,33 @@ tags:
metrics:
- accuracy
model-index:
-- name: malaysia-news-classification-bert-english
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

-# malaysia-news-classification-bert-english

-This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on
It achieves the following results on the evaluation set:
-- Loss: 1.
-- Accuracy: 0.

## Model description

-

## Intended uses & limitations

-
-While the model is optimized for Malaysian news content, it has several limitations:
-
-- Cultural and Contextual Specificity: The model is specifically trained to interpret and categorize news based on Malaysia's unique cultural and contextual framework. As a result, its accuracy and relevance drop significantly when applied to news content from other countries or in different languages.
-- Generalizability: The model's training on Malaysian-specific news content limits its generalizability to broader, international contexts. It may not perform well with news from other regions, as it may not correctly interpret cultural nuances, idiomatic expressions, or context-specific references not present in Malaysian news.
-- Dynamic News Landscape: The model may require frequent retraining to stay relevant due to the dynamic nature of news and ongoing cultural and societal changes. What is considered an important category or context might evolve, necessitating updates to both the model and its training data.
-- Bias and Sensitivity: Like any data-driven model, there is a risk of inheriting biases from the training dataset. Careful consideration and continuous monitoring are needed to ensure that the model does not perpetuate any form of cultural bias or insensitivity.

## Training and evaluation data

-
-
-- Training Data: 80% of the dataset (approximately 14,402 articles) was used for training. This substantial portion ensures that the model has ample examples to learn from, encompassing a wide array of topics and linguistic nuances.
-- Validation Data: The remaining 20% (approximately 3,601 articles) was used for validation. This set is crucial for gauging the model's performance and generalization on unseen data, helping to mitigate overfitting and bias.
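The split described in these two bullets is a standard 80/20 hold-out. A minimal sketch of how such a split could be produced with the `datasets` library; the file name and column layout are assumptions, since the card never names the underlying corpus:

```python
from datasets import load_dataset

# Hypothetical local file; the actual Malaysian news corpus is not published with the card.
raw = load_dataset("csv", data_files="malaysia_news.csv")["train"]

# 80% train / 20% validation -- roughly 14,402 / 3,601 articles for a ~18,003-article corpus.
split = raw.train_test_split(test_size=0.2, seed=42)
train_ds, val_ds = split["train"], split["test"]
print(len(train_ds), len(val_ds))
```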
-
-### Evaluation Strategy:
-Regular evaluations were conducted at set intervals of training steps to monitor the model’s performance and adjust parameters if necessary. This frequent evaluation helps identify the best model configuration during training and limits extended periods of potential overfitting.
-
-### Performance Metrics:
-The model's performance was assessed using the loss and accuracy metrics:
-
-- Training Loss: Showed a consistent decrease from 0.6359 in the first epoch to effectively zero by the last epoch, indicating good learning progress.
-- Validation Loss: Increased over epochs, suggesting issues with model generalization despite the decreasing training loss.
-- Accuracy: Increased to approximately 89.4751% by the end of training, reflecting the model's ability to correctly classify a high percentage of the validation set.
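Accuracy here is the usual argmax-match rate over the validation set. A Trainer-style `compute_metrics` hook that reports it could look like the sketch below (an illustration, not the exact code used for this card):

```python
import numpy as np

def compute_metrics(eval_pred):
    """Fraction of validation examples whose argmax prediction matches the label."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}
```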

## Training procedure
-Training was conducted over 16 epochs, with the following parameters configured to optimize learning:
-
-- Batch Size: An instantaneous batch size of 8 was used per device, with no parallel or distributed settings applied, resulting in a total train batch size of 8.
-- Optimization Steps: The model completed a total of 28,816 optimization steps, aligning with the batch size and data volume to ensure comprehensive exposure to the training data.
-- Optimizer: The model used the AdamW optimizer, a variant of Adam that handles weight decay more effectively. However, a FutureWarning was raised suggesting a transition to PyTorch’s native AdamW implementation for future runs.
-- Gradient Accumulation: Training used a gradient accumulation strategy, accumulating gradients over multiple steps to improve stability and performance with small batch sizes.
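Taken together, these settings map onto a fairly ordinary `transformers` Trainer configuration. A minimal sketch under those assumptions; the learning rate, output path, and dataset objects are placeholders rather than values taken from this card:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=18)

args = TrainingArguments(
    output_dir="malaysia-news-classification-bert-english",  # placeholder path
    num_train_epochs=16,                # 16 epochs, as stated above
    per_device_train_batch_size=8,      # instantaneous batch size of 8
    gradient_accumulation_steps=1,      # total train batch size of 8
    learning_rate=5e-5,                 # assumption; not stated in this hunk
    fp16=True,                          # "Native AMP" mixed precision (needs a CUDA device)
    evaluation_strategy="steps",        # periodic evaluation during training
    optim="adamw_torch",                # sidesteps the AdamW FutureWarning noted above
)

train_ds = val_ds = None  # replace with tokenized train/validation datasets
trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=train_ds, eval_dataset=val_ds)
# trainer.train()
```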

### Training hyperparameters

@@ -70,52 +45,31 @@ The following hyperparameters were used during training:
- num_epochs: 16
- mixed_precision_training: Native AMP

-## Label Mappings
-This model can predict the following labels:
-- `0`: Election
-- `1`: Political Issue
-- `2`: Corruption
-- `3`: Democracy
-- `4`: Economic Growth
-- `5`: Economic Disparity
-- `6`: Economic Subsidy
-- `7`: Ethnic Discrimination
-- `8`: Ethnic Relation
-- `9`: Ethnic Culture
-- `10`: Religious Issue
-- `11`: Business and Finance
-- `12`: Sport
-- `13`: Food
-- `14`: Entertainment
-- `15`: Environmental Issue
-- `16`: Domestic News
-- `17`: World News
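Because the accompanying config.json (see the diff further down) only stores generic `LABEL_0` … `LABEL_17` names, this mapping has to be supplied by hand when interpreting predictions. A minimal inference sketch under that assumption; the repo id is inferred from the model name and may differ:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["Election", "Political Issue", "Corruption", "Democracy",
          "Economic Growth", "Economic Disparity", "Economic Subsidy",
          "Ethnic Discrimination", "Ethnic Relation", "Ethnic Culture",
          "Religious Issue", "Business and Finance", "Sport", "Food",
          "Entertainment", "Environmental Issue", "Domestic News", "World News"]

repo = "YagiASAFAS/malaysia-news-classification-bert-english"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(
    repo,
    id2label={i: name for i, name in enumerate(labels)},
    label2id={name: i for i, name in enumerate(labels)},
)

text = "Example headline about an upcoming state election in Malaysia."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    pred_id = model(**inputs).logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```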
-
### Training results

-| Training Loss | Epoch | Step
-
-
-
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.


### Framework versions

- Transformers 4.18.0
- Pytorch 2.2.1+cu121
-- Datasets 2.
- Tokenizers 0.12.1

metrics:
- accuracy
model-index:
+- name: malaysia-news-classification-bert-english-skewness-fixed
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

+# malaysia-news-classification-bert-english-skewness-fixed

+This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on an unknown dataset.
It achieves the following results on the evaluation set:
+- Loss: 1.2051
+- Accuracy: 0.8436

## Model description

+More information needed

## Intended uses & limitations

+More information needed

## Training and evaluation data

+More information needed

## Training procedure

### Training hyperparameters

- num_epochs: 16
- mixed_precision_training: Native AMP

### Training results

+| Training Loss | Epoch | Step | Validation Loss | Accuracy |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|
+| No log | 1.0 | 358 | 0.9357 | 0.7486 |
+| 1.3554 | 2.0 | 716 | 0.9041 | 0.7807 |
+| 0.4851 | 3.0 | 1074 | 0.7842 | 0.8282 |
+| 0.4851 | 4.0 | 1432 | 0.9478 | 0.8226 |
+| 0.2558 | 5.0 | 1790 | 1.0765 | 0.8282 |
+| 0.1084 | 6.0 | 2148 | 1.1310 | 0.8380 |
+| 0.0625 | 7.0 | 2506 | 1.0999 | 0.8464 |
+| 0.0625 | 8.0 | 2864 | 1.1391 | 0.8408 |
+| 0.0301 | 9.0 | 3222 | 1.1036 | 0.8506 |
+| 0.0171 | 10.0 | 3580 | 1.0765 | 0.8534 |
+| 0.0171 | 11.0 | 3938 | 1.1291 | 0.8506 |
+| 0.0129 | 12.0 | 4296 | 1.1360 | 0.8520 |
+| 0.0035 | 13.0 | 4654 | 1.1619 | 0.8450 |
+| 0.0039 | 14.0 | 5012 | 1.1727 | 0.8534 |
+| 0.0039 | 15.0 | 5370 | 1.2079 | 0.8408 |
+| 0.0031 | 16.0 | 5728 | 1.2051 | 0.8436 |


### Framework versions

- Transformers 4.18.0
- Pytorch 2.2.1+cu121
+- Datasets 2.19.0
- Tokenizers 0.12.1
config.json
CHANGED
@@ -10,46 +10,46 @@
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
-    "0": "
-    "1": "
-    "2": "
-    "3": "
-    "4": "
-    "5": "
-    "6": "
-    "7": "
-    "8": "
-    "9": "
-    "10": "
-    "11": "
-    "12": "
-    "13": "
-    "14": "
-    "15": "
-    "16": "
-    "17": "
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
-    "
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,

  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5",
+    "6": "LABEL_6",
+    "7": "LABEL_7",
+    "8": "LABEL_8",
+    "9": "LABEL_9",
+    "10": "LABEL_10",
+    "11": "LABEL_11",
+    "12": "LABEL_12",
+    "13": "LABEL_13",
+    "14": "LABEL_14",
+    "15": "LABEL_15",
+    "16": "LABEL_16",
+    "17": "LABEL_17"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_10": 10,
+    "LABEL_11": 11,
+    "LABEL_12": 12,
+    "LABEL_13": 13,
+    "LABEL_14": 14,
+    "LABEL_15": 15,
+    "LABEL_16": 16,
+    "LABEL_17": 17,
+    "LABEL_2": 2,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5,
+    "LABEL_6": 6,
+    "LABEL_7": 7,
+    "LABEL_8": 8,
+    "LABEL_9": 9
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
pytorch_model.bin
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
size 438057586

version https://git-lfs.github.com/spec/v1
+oid sha256:116462010c7257bebb94dcae1576e4d27d367feae196641b5b0d56bf006d7f67
size 438057586
tokenizer.json
ADDED
The diff for this file is too large to render.
See raw diff
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
size 3576

version https://git-lfs.github.com/spec/v1
+oid sha256:e339ada0cb016077ae43fbacd80614bba57b4f9d5d2915bee4bddc1956e4108b
size 3576