metadata
license: apache-2.0
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
model-index:
  - name: distilbert-base-uncased-finetuned-greenplastics-3
    results: []
widget:
  - text: >-
      The present disclosure relates to a process for recycling of plastic waste
      comprising: segregating plastic waste collected from various sources
      followed by cleaning of the segregated plastic waste to obtain segregated
      cleaned waste; grinding of the segregated cleaned waste to obtain grinded
      waste; introducing the grinded waste into an extrusion line having a
      venting extruder component as part of the extrusion line, to obtain molten
      plastic; and removing the impurities by vacuum venting of the molten
      plastic to obtained recycled plastic free from impurities. The present
      disclosure further relates to various articles like Industrial Post
      Recycled (IPR) plastic tubes, blow moulded bottles, pallates, manufactured
      from the recycled plastic waste.
language:
  - en
pipeline_tag: text-classification
library_name: transformers
datasets:
  - cwinkler/patents_green_plastics

Classification of patent abstracts - "Green Plastics" or "No Green Plastics"

This model (distilbert-base-uncased-finetuned-greenplastics-3) classifies patents as "green plastics" or "no green plastics" based on their abstracts.

The model is a fine-tuned version of distilbert-base-uncased, trained on the green plastics dataset (cwinkler/patents_green_plastics, 11,196 patent abstracts). The dataset was split into 70 % training data and 30 % test data (using ".train_test_split(test_size=0.3)"). The model achieves the following results on the evaluation set:

  • Accuracy: 0.8574
  • F1: 0.8573
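What the two metrics measure can be illustrated with a short plain-Python sketch. The card does not say how F1 was averaged across the two classes; the sketch assumes a support-weighted average, and the labels in it are hypothetical toy data, not predictions from this model.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Binary F1 score, treating `cls` as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's support (assumed averaging)."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * k / n for c, k in support.items())


# Hypothetical toy labels: 1 = green plastics, 0 = no green plastics
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(accuracy(y_true, y_pred))
print(round(weighted_f1(y_true, y_pred), 4))
```

With a support-weighted average, F1 tends to stay close to accuracy when both classes are predicted about equally well.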

The maximum number of training steps was set to 200 to avoid overfitting. I considered an accuracy of 0.8574 suitable for the task. Further training would lead to higher accuracy, but testing the resulting model on random examples gave unsatisfactory results. That is why I chose to limit the number of training steps.

EPO - CodeFest on Green Plastics

The model has been developed for submission to the CodeFest on Green Plastics by the European Patent Office (EPO).

The task:

"To develop creative and reliable artificial intelligence (AI) models for automating the identification of patents related to green plastics."

How to use the model

from transformers import pipeline

model_id = "cwinkler/distilbert-base-uncased-finetuned-greenplastics-3"
classifier = pipeline("text-classification", model=model_id)

# Replace this example abstract with the abstract you want to classify (as a string)
your_abstract = "The present disclosure relates to a process for recycling of plastic waste comprising: segregating plastic waste collected from various sources followed by cleaning of the segregated plastic waste to obtain segregated cleaned waste; grinding of the segregated cleaned waste to obtain grinded waste; introducing the grinded waste into an extrusion line having a venting extruder component as part of the extrusion line, to obtain molten plastic; and removing the impurities by vacuum venting of the molten plastic to obtained recycled plastic free from impurities. The present disclosure further relates to various articles like Industrial Post Recycled (IPR) plastic tubes, blow moulded bottles, pallates, manufactured from the recycled plastic waste."
# top_k=None returns the scores for all labels (it replaces the deprecated return_all_scores=True)
preds = classifier(your_abstract, top_k=None)
print(preds)
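The pipeline returns one score per label. To keep only the most likely label, a small helper such as the hypothetical top_prediction below can be used. It is a sketch that accepts both output shapes: a flat list of {"label", "score"} dicts, or that list wrapped in an outer list, as returned for a single string with return_all_scores=True. The label names and scores in the example are placeholders; the real label names come from the model's configuration.

```python
def top_prediction(preds):
    """Return the (label, score) pair with the highest score.

    Accepts a flat list of {"label": ..., "score": ...} dicts, or a list
    wrapping one such list (nested output for a single input string).
    """
    if preds and isinstance(preds[0], list):
        preds = preds[0]  # unwrap the nested shape
    best = max(preds, key=lambda d: d["score"])
    return best["label"], best["score"]


# Hypothetical pipeline output: placeholder label names and scores
example = [[{"label": "LABEL_0", "score": 0.31}, {"label": "LABEL_1", "score": 0.69}]]
print(top_prediction(example))
```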

Examples

The following examples are randomly chosen abstracts from the patent literature.

"Green Plastics"

  • "A process and plant for the thermolytic conversion of waste organic materials into desirable hydrocarbon products involve the steps of delivering melted waste material 11 5 to one or more pyrolysis chambers 26 via heated and valved manifolds 22 and effecting pyrolysis of the waste material into a gaseous state in an oxygen purged and pressure controlled environment. Pyrolytic gases are then trans erred to one or more condensers 30a to distil and cool product hydrocarbons into their respective fractions. Includes the melting of waste (plastic) material 11 before delivery ir to any of the pyrolysis chambers 10 26, making the recovery of hydrocarbon material semi-continuous, directing melted waste material into one or more, but preferably four, pyrolysis chambers 26ab,c,d, making each chamber capable of independent operation. Included is mechanically removing waste char from walls of the pyrolysis chamber 107 by rotary blades and removal of char from the chamber 107 by use of an inernal auger 112 or other suitable 15 means."
    score: 0.696

  • "A trash sorting and recycling method, a trash sorting device and a trash sorting and recycling system are provided. The trash sorting and recycling method includes: acquiring a detection image of trash to be sorted; processing the detection image with a deep learning neural network to judge whether or not the trash to be sorted belongs to recyclable trash; if yes, sending a first control signal, to control to deliver the trash to be sorted into a recycling region; if no, sending a second control signal, to control to deliver the trash to be sorted into a non-recycling region."
    score: 0.717

  • "A biologically degradable polymer mixture containing at least one biopolymer made from renewable raw materials and a polymer selected from the following materials: an aromatic polyester; a polyester-copolymer with both aliphatic and aromatic blocks; a polyesteramide; a polyglycol; a polyester urethane; and/or mixtures of these components. The preferred renewable raw material is starch, more preferably native starch, most preferably native starch that has been predried."
    score: 0.912

  • "The present invention relates to a liquid, cleaning and/or cleansing composition comprising biodegradable abrasive cleaning particles."
    score: 0.87

  • "The present disclosure relates to a process for recycling of plastic waste comprising: segregating plastic waste collected from various sources followed by cleaning of the segregated plastic waste to obtain segregated cleaned waste; grinding of the segregated cleaned waste to obtain grinded waste; introducing the grinded waste into an extrusion line having a venting extruder component as part of the extrusion line, to obtain molten plastic; and removing the impurities by vacuum venting of the molten plastic to obtained recycled plastic free from impurities. The present disclosure further relates to various articles like Industrial Post Recycled (IPR) plastic tubes, blow moulded bottles, pallates, manufactured from the recycled plastic waste."
    score: 0.69

  • "An integrated process for the conversion of waste plastics to high value products. The integrated process allows for operation with a hydroprocessing reactor which provides simultaneous hydrogenation, dechlorination, and hydrocracking of components of a hydrocarbon stream to specifications which meet steam cracker requirements."
    score: 0.54

  • "A process for producing benzene and xylenes comprising introducing hydrocarbon liquid stream to hydroprocessor to yield first gas stream and hydrocarbon product (C5+); optionally introducing hydrocarbon product to first aromatics separating unit to produce saturated hydrocarbons (C5+) and first aromatics stream (C6+); feeding hydrocarbon product and/or saturated hydrocarbons to reformer to produce reformer product, second gas stream, and hydrogen stream; introducing reformer product to second aromatics separating unit to produce a non-aromatics recycle stream and second aromatics stream comprising C6+ aromatics; recycling non-aromatics recycle stream to reformer; introducing first aromatics stream and/or second aromatics stream to third aromatics separating unit to produce first C6 aromatics (benzene), C7 aromatics (toluene), C8 aromatics (xylenes&ethylbenzene), C9 aromatics, C10 aromatics, and C11+ aromatics; introducing C7 aromatics, C9 aromatics, C10 aromatics, or combinations thereof to disproportionation and transalkylation unit to yield third aromatics stream (benzene and xylenes); and conveying C11+ aromatics to hydroprocessor."
    score: 0.827

"No Green Plastics"

  • "A process for processing mixed plastics comprising simultaneous pyrolysis and dechlorination of the mixed plastics, the process comprising contacting the mixed plastics with a zeolitic catalyst in a pyrolysis unit to produce a hydrocarbon product comprising a gas phase and a liquid phase; and separating the hydrocarbon product into a hydrocarbon gas stream and a hydrocarbon liquid stream, wherein the hydrocarbon gas stream comprises at least a portion of the gas phase of the hydrocarbon product, wherein the hydrocarbon liquid stream comprises at least a portion of the liquid phase of the hydrocarbon product, wherein the hydrocarbon liquid stream comprises one or more chloride compounds in an amount of less than about 100 ppmw chloride, based on the total weight of the hydrocarbon liquid stream, and wherein the hydrocarbon liquid stream is characterized by a viscosity of less than about 400 cP at a temperature of 300° C."
    score: 0.38

  • "The present invention is related to improved phthalate-free polyvinyl chloride plastisol compositions for the production of decorative surface coverings, in particular floor and wall coverings with low emission of volatile organic compounds, to a method for the preparation of said phthalate-free PVC plastisols and to a process for the production of said surface coverings."
    score: 0.48

  • "An apparatus for shaping plastic preforms into plastic containers is disclosed. Said apparatus comprises a conveying device on which a plurality of blowing stations are arranged. Each of said blowing stations encompasses a blow mold, within which a plastic preform can be shaped into a plastic container. The apparatus further comprises a clean chamber, within which the plastic preforms can be conveyed. According to the invention, the zone of the conveying device in which the blowing stations are arranged is located in the clean chamber, and at least one additional zone of the conveying device is located outside the clean chamber."
    score: 0.136
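Every score listed under "Green Plastics" is above 0.5 and every score under "No Green Plastics" is below it. That would be consistent with the score being the model's probability for the "green plastics" class and the examples being grouped at a 0.5 threshold, but the card does not state this. The sketch below only checks that reading against the listed scores; the 0.5 threshold is an assumption.

```python
# Scores copied from the examples above
GREEN_SCORES = [0.696, 0.717, 0.912, 0.87, 0.69, 0.54, 0.827]
NO_GREEN_SCORES = [0.38, 0.48, 0.136]


def group(score: float, threshold: float = 0.5) -> str:
    """Map a (presumed) green-plastics score to one of the two example groups."""
    return "Green Plastics" if score >= threshold else "No Green Plastics"


consistent = all(group(s) == "Green Plastics" for s in GREEN_SCORES) and all(
    group(s) == "No Green Plastics" for s in NO_GREEN_SCORES
)
print(consistent)  # → True
```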

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 200
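With lr_scheduler_type: linear, the learning rate decays linearly from 1e-05 toward 0 over the 200 training steps. The sketch below reproduces that shape in plain Python, mirroring the step-to-multiplier rule of the linear warmup schedule in transformers. It assumes zero warmup steps, since no warmup is listed above (0 is the Trainer default).

```python
def linear_lr(step: int, base_lr: float = 1e-5, total_steps: int = 200,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` for a linear warmup-then-decay schedule."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup (unused when warmup_steps=0)
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 50, 100, 150, 200):
    print(step, linear_lr(step))
```

Halfway through training (step 100) the learning rate is therefore about 5e-06, and it reaches 0 at step 200.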

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| No log | 0.2 | 200 | 0.3435 | 0.8574 | 0.8573 |
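The epoch value of 0.2 is consistent with the dataset size, split and batch size given above. The sketch below assumes that ".train_test_split(test_size=0.3)" rounds the test share up and the train share down, that the last partial batch is kept, and that training ran on a single device without gradient accumulation; none of these details is stated in the card.

```python
import math

N_SAMPLES = 11_196  # patent abstracts in the dataset
TEST_SIZE = 0.3     # .train_test_split(test_size=0.3)
BATCH_SIZE = 8      # train_batch_size
MAX_STEPS = 200     # training_steps

# Assumed rounding: test share rounded up, train share rounded down
n_test = math.ceil(TEST_SIZE * N_SAMPLES)
n_train = math.floor((1 - TEST_SIZE) * N_SAMPLES)

# Batches per epoch, keeping the last partial batch (assumed)
steps_per_epoch = math.ceil(n_train / BATCH_SIZE)
epochs = MAX_STEPS / steps_per_epoch

print(n_train, n_test, steps_per_epoch)
print(round(epochs, 2))
```

Under these assumptions, the 200 steps cover about one fifth of the training split, so the model saw each training abstract at most once.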

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2