---
license: apache-2.0
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: distilbert-base-uncased-finetuned-greenpatent
  results: []
widget:
- text: A method for recycling waste
- text: A method of reducing pollution
- text: An apparatus to improve environmental aspects
- text: A method to improve waste management
- text: A device to use renewable energy sources
datasets:
- cwinkler/green_patents
language:
- en
pipeline_tag: text-classification
---

# Classification of patent titles: "green" or "no green"

This model classifies patents as "green patents" or "no green patents" based on their titles.

### Examples of "green patents" titles

- "A method for recycling waste" - score: 0.714
- "A method of reducing pollution" - score: 0.786
- "An apparatus to improve environmental aspects" - score: 0.570
- "A method to improve waste management" - score: 0.813
- "A device to use renewable energy sources" - score: 0.98
- "A technology for efficient electrical power generation" - score: 0.975
- "A method for the production of fuel of non-fossil origin" - score: 0.975
- "Biofuels from waste" - score: 0.88
- "A combustion technology with mitigation potential" - score: 0.947
- "A device to capture greenhouse gases" - score: 0.871
- "A method to reduce the greenhouse effect" - score: 0.887
- "A device to improve the climate" - score: 0.650
- "A device to stop climate change" - score: 0.55

### Examples of "no green patents" titles

- "A device to destroy the nature" - score: 0.19
- "A method to produce smoke" - score: 0.386

### Examples of the model's limitations

- "A method to avoid trash" - score: 0.165
- "A method to reduce trash" - score: 0.333
- "A method to burn the Amazonas" - score: 0.501
- "A method to burn wood" - score: 0.408
- "Green plastics" - score: 0.126
- "Greta Thunberg" - score: 0.313 (How dare you, model?). But: "A method of using Greta Thunberg to stop climate change" - score: 0.715

The examples were inspired by
https://www.epo.org/news-events/in-focus/classification/classification.html

# distilbert-base-uncased-finetuned-greenpatent

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [green patent dataset](https://huggingface.co/datasets/cwinkler/green_patents). The green patent dataset was split into 70% training data and 30% test data (using `.train_test_split(test_size=0.3)`).

The model achieves the following results on the evaluation set:

- Loss: 0.3148
- Accuracy: 0.8776
- F1: 0.8770

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.4342        | 1.0   | 101  | 0.3256          | 0.8721   | 0.8712 |
| 0.3229        | 2.0   | 202  | 0.3148          | 0.8776   | 0.8770 |

### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.1+cpu
- Datasets 2.8.0
- Tokenizers 0.13.2
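### Reading the example scores

The scores listed for the example titles appear to be the model's probability for the "green" class: every "green" example scores above 0.5 and every "no green" example scores below it. With two classes, picking the more probable label is the same as thresholding that probability at 0.5. The sketch below applies that rule to a few scores copied from this card. The 0.5 threshold, the tie handling, and the helper name `label_from_score` are assumptions for illustration, not part of the model's published configuration.

```python
# Sketch: map a "green" probability to a label. Assumes the card's scores are
# P(green) from a two-class softmax; this is an interpretation, not documented.
GREEN_THRESHOLD = 0.5  # hypothetical; equivalent to argmax over two classes


def label_from_score(score: float, threshold: float = GREEN_THRESHOLD) -> str:
    """Return "green" if the score exceeds the threshold, else "no green".

    A score exactly equal to the threshold is treated as "no green"
    (an arbitrary choice for this sketch).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be a probability, got {score}")
    return "green" if score > threshold else "no green"


# A few (title, score) pairs copied from the examples in this card.
EXAMPLES = [
    ("A device to use renewable energy sources", 0.98),
    ("A device to stop climate change", 0.55),
    ("A method to produce smoke", 0.386),
    ("A method to reduce trash", 0.333),
    ("A method to burn the Amazonas", 0.501),
]

if __name__ == "__main__":
    for title, score in EXAMPLES:
        print(f"{label_from_score(score):>8}  {score:.3f}  {title}")
```

Under this reading, "A method to burn the Amazonas" (0.501) lands just on the "green" side while "A method to reduce trash" (0.333) falls on the "no green" side, which is why both appear among the limitations.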
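### Sketch of the 70/30 split

The card states that the green patent dataset was split 70/30 with `.train_test_split(test_size=0.3)`. The call quoted in the card passes no seed (`seed: 42` is listed only among the training hyperparameters), so the exact test rows cannot be recovered from the card alone. As a rough illustration of what such a split does, here is a stdlib-only sketch that shuffles row indices with a fixed seed and holds out 30% for testing. The helper `split_indices`, its seed, and rounding the test size up are assumptions made for this sketch; it does not reproduce the rows chosen by the Datasets library.

```python
import math
import random


def split_indices(n_rows: int, test_size: float = 0.3, seed: int = 42):
    """Shuffle row indices and split them into (train, test) lists.

    Hypothetical helper: the test share is rounded up with math.ceil and
    the remaining rows go to training.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is repeatable
    n_test = math.ceil(test_size * n_rows)
    return indices[n_test:], indices[:n_test]


if __name__ == "__main__":
    train, test = split_indices(1000)
    print(len(train), len(test))  # 700 300
```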
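### Computing accuracy and F1

The card reports accuracy and F1 but does not say how F1 was averaged across the two classes. The sketch below computes accuracy, the F1 of a single class, and a support-weighted average of per-class F1 in plain Python. Which variant produced the reported 0.8770 is not stated, and the function names are hypothetical.

```python
def _validate(y_true, y_pred):
    """Reject empty or mismatched label sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    _validate(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of its precision and recall."""
    _validate(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_weighted(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    _validate(y_true, y_pred)
    n = len(y_true)
    return sum(
        f1_binary(y_true, y_pred, positive=c) * list(y_true).count(c) / n
        for c in sorted(set(y_true))
    )
```

With balanced per-class errors the weighted F1 sits close to accuracy, so the small gap between the reported accuracy and F1 alone does not identify the averaging method.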
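### Linear learning-rate schedule

The hyperparameters list `lr_scheduler_type: linear` with `learning_rate: 2e-05`, and the training results table shows 101 optimizer steps per epoch, so two epochs give 202 steps. The sketch below is a stdlib re-implementation of a linear warmup-then-decay schedule of the same shape as the one Transformers provides (`get_linear_schedule_with_warmup`). No warmup is listed on the card, so zero warmup steps is an assumption here, as is the helper name `linear_lr`.

```python
STEPS_PER_EPOCH = 101  # from the "Step" column after epoch 1
NUM_EPOCHS = 2
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 202, matching the last table row


def linear_lr(step: int, base_lr: float = 2e-5, total_steps: int = TOTAL_STEPS,
              warmup_steps: int = 0) -> float:
    """Learning rate at a given optimizer step.

    Ramps up linearly during warmup (assumed to be 0 steps here), then
    decays linearly from base_lr to 0 at total_steps and stays at 0.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    for s in (0, 50, 101, 150, 202):
        print(s, linear_lr(s))
```

Under these assumptions the learning rate is halved (about 1e-05) at the end of the first epoch and reaches 0 at step 202.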
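### Adam update

The optimizer line lists Adam with betas=(0.9,0.999) and epsilon=1e-08. To show where each of those values enters, here is a scalar sketch of one textbook Adam update. It is illustrative only: the function name `adam_step` is hypothetical, and it leaves out details such as weight decay that a particular training framework may add.

```python
import math


def adam_step(param, grad, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.

    m and v are the running first and second moment estimates; t is the
    1-based step count used for bias correction. Returns (param, m, v).
    """
    m = beta1 * m + (1 - beta1) * grad          # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    p, m, v = adam_step(1.0, 0.5, 0.0, 0.0, t=1)
    print(p, m, v)
```

At t=1 the bias corrections cancel the (1 - beta) factors, so m_hat / sqrt(v_hat) is the sign of the gradient and the parameter moves by almost exactly the learning rate, whatever the gradient's magnitude.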