Update README.md
README.md
CHANGED
@@ -34,12 +34,14 @@ should probably proofread and complete it, then remove this comment. -->
|
|
34 |
|
35 |
This model (distilbert-base-uncased-finetuned-greenplastics-3) classifies patents into "green plastics" or "no green plastics" by their abstracts.
|
36 |
|
37 |
-
The model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [green plastics dataset](https://huggingface.co/datasets/cwinkler/patents_green_plastics). The green
|
38 |
The model achieves the following results on the evaluation set:
|
39 |
|
40 |
- Accuracy: 0.8574
|
41 |
- F1: 0.8573
|
42 |
|
|
|
|
|
43 |
## EPO - CodeFest on Green Plastics
|
44 |
|
45 |
The model has been developed for submission to the [CodeFest on Green Plastics](https://www.epo.org/news-events/in-focus/codefest.html) by the European Patent Office (EPO).
|
|
|
34 |
|
35 |
This model (distilbert-base-uncased-finetuned-greenplastics-3) classifies patents into "green plastics" or "no green plastics" by their abstracts.
|
36 |
|
37 |
+
The model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [green plastics dataset](https://huggingface.co/datasets/cwinkler/patents_green_plastics) (11,196 samples of patent abstracts). The dataset was split into 70% training data and 30% test data using `.train_test_split(test_size=0.3)`.
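
To make the 70/30 split concrete, here is a minimal standard-library sketch of what such a split does to a dataset of 11196 rows. It is not the code used to build the model: the helper names `split_sizes` and `train_test_split_indices` and the `seed` parameter are hypothetical, and rounding the test share up is an assumption about how `train_test_split(test_size=0.3)` sizes the two parts.

```python
import math
import random


def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up and the training set
    receives the remaining rows.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


def train_test_split_indices(n_samples, test_size=0.3, seed=0):
    """Shuffle row indices and cut them into train and test index lists."""
    n_train, _ = split_sizes(n_samples, test_size)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the sketch is reproducible
    return indices[:n_train], indices[n_train:]


N_SAMPLES = 11196  # sample count quoted in the model card

train_idx, test_idx = train_test_split_indices(N_SAMPLES)
print(len(train_idx), len(test_idx))
```

Under these assumptions the split yields 7837 training rows and 3359 test rows, and no row appears in both parts.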
|
38 |
The model achieves the following results on the evaluation set:
|
39 |
|
40 |
- Accuracy: 0.8574
|
41 |
- F1: 0.8573
|
42 |
|
43 |
+
The maximum number of training steps was set to 200 to avoid overfitting. I considered an accuracy of 0.8574 to be suitable for the task. Further training would lead to a higher accuracy, but testing the resulting model with random examples was not satisfying. That is why I chose to limit the training steps.
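
As a rough illustration of how short a 200-step run is, the sketch below estimates what fraction of the training split the model sees. The batch sizes are hypothetical (this section does not state the one used), the training-set size of 7837 rows assumes roughly 70% of 11196 samples, and the calculation ignores gradient accumulation and multi-device training.

```python
def epochs_covered(max_steps, batch_size, n_train):
    """Fraction of the training set seen after max_steps optimizer steps."""
    return max_steps * batch_size / n_train


MAX_STEPS = 200  # step cap stated in the model card
N_TRAIN = 7837   # assumption: about 70% of 11196 samples

# Hypothetical batch sizes; the real value is not given in this section.
for batch_size in (8, 16, 32):
    fraction = epochs_covered(MAX_STEPS, batch_size, N_TRAIN)
    print(f"batch_size={batch_size}: {fraction:.2f} epochs")
```

For each of these batch sizes the run stops before one full pass over the training data. If training used the Hugging Face `Trainer`, such a cap is typically set through the `max_steps` argument of `TrainingArguments`, though the text does not say which mechanism was used.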
|
44 |
+
|
45 |
## EPO - CodeFest on Green Plastics
|
46 |
|
47 |
The model has been developed for submission to the [CodeFest on Green Plastics](https://www.epo.org/news-events/in-focus/codefest.html) by the European Patent Office (EPO).
|