---
license: apache-2.0
tags:
- ESG
- finance
language:
- en
---
# About ESGify
**ESGify** is a model for multilabel classification of news with respect to ESG risks. Our custom methodology includes 46 ESG classes and one class for texts not relevant to ESG, giving 47 classes in total:
| E | S | G |
| ----------- | ----------- | ----------- |
| **Biodiversity** | **Communities Health and Safety** | **Legal Proceedings & Law Violations** |
| **Emergencies (Environmental)** | **Land Acquisition and Resettlement (S)** | **Corporate Governance** |
| **Hazardous Materials Management** | **Emergencies (Social)** | **Responsible Investment & Greenwashing** |
| **Environmental Management** | **Human Rights** | **Economic Crime** |
| **Landscape Transformation** | **Labor Relations Management** | **Disclosure** |
| **Climate Risks** | **Freedom of Association and Right to Organise** | **Values and Ethics** |
| **Surface Water Pollution** | **Employee Health and Safety** | **Risk Management and Internal Control** |
| **Animal Welfare** | **Product Safety and Quality** | **Strategy Implementation** |
| **Water Consumption** | **Indigenous People** | **Supply Chain (Economic / Governance)** |
| **Greenhouse Gas Emissions** | **Cultural Heritage** ||
| **Air Pollution** | **Forced Labour** ||
| **Waste Management** | **Supply Chain (Social)** ||
| **Soil and Groundwater Impact** | **Discrimination** ||
| **Wastewater Management** | **Minimum Age and Child Labour** ||
| **Natural Resources** | **Data Safety** ||
| **Physical Impacts** | **Retrenchment** ||
| **Supply Chain (Environmental)** |||
| **Planning Limitations** |||
| **Energy Efficiency and Renewables** |||
| **Land Acquisition and Resettlement (E)** |||
| **Land Rehabilitation** |||
# Usage
ESGify is based on the MPNet architecture, with a custom classification head. The ESGify class is defined as follows.
```python
from collections import OrderedDict
from transformers import MPNetPreTrainedModel, MPNetModel, AutoTokenizer
import torch


# Mean pooling: average the token embeddings, using the attention mask to ignore padding
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output  # last hidden state, shape (batch, seq_len, hidden)
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# The ESGify class is defined explicitly because it uses a custom,
# sentence-transformers-like mean pooling function and a custom classifier head
class ESGify(MPNetPreTrainedModel):
    """Model for classifying ESG risks from text."""

    def __init__(self, config):  # tuning only the head
        super().__init__(config)
        # Instantiate the parts of the model
        self.mpnet = MPNetModel(config, add_pooling_layer=False)
        self.id2label = config.id2label
        self.label2id = config.label2id
        self.classifier = torch.nn.Sequential(OrderedDict([
            ('norm', torch.nn.BatchNorm1d(768)),
            ('linear', torch.nn.Linear(768, 512)),
            ('act', torch.nn.ReLU()),
            ('batch_n', torch.nn.BatchNorm1d(512)),
            ('drop_class', torch.nn.Dropout(0.2)),
            ('class_l', torch.nn.Linear(512, 47)),
        ]))

    def forward(self, input_ids, attention_mask):
        # Feed the input to the MPNet model
        outputs = self.mpnet(input_ids=input_ids,
                             attention_mask=attention_mask)
        # Mean-pool the token embeddings and feed them to the classifier to compute logits
        logits = self.classifier(mean_pooling(outputs['last_hidden_state'], attention_mask))
        # Apply sigmoid to get an independent score in [0, 1] for each class
        logits = 1.0 / (1.0 + torch.exp(-logits))
        return logits
```
After defining the model class, we initialize ESGify and the tokenizer with the pre-trained weights:
```python
model = ESGify.from_pretrained('ai-lab/ESGify')
tokenizer = AutoTokenizer.from_pretrained('ai-lab/ESGify')
```
Getting results from the model:
```python
texts = ['text1', 'text2']
to_model = tokenizer.batch_encode_plus(
    texts,
    add_special_tokens=True,
    max_length=512,
    return_token_type_ids=False,
    padding="max_length",
    truncation=True,
    return_attention_mask=True,
    return_tensors='pt',
)
results = model(**to_model)
```
To identify the top-3 classes by relevance and their scores for the first text in the batch:
```python
import numpy as np

# Indices of the three highest-scoring classes for the first text
for i in torch.topk(results, k=3).indices.tolist()[0]:
    print(f"{model.id2label[i]}: {np.round(results[0, i].item(), 3)}")
```
For example, for the news "She faced employment rejection because of her gender", we get the following top-3 labels:
```
Discrimination: 0.944
Strategy Implementation: 0.82
Indigenous People: 0.499
```
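Since the model is multilabel and every class receives its own sigmoid score, a fixed top-k is not the only way to read the output; one may instead keep every class whose score clears a threshold. The following is a minimal sketch of that idea. The helper name `labels_above_threshold`, the `0.5` default, and the toy scores are all illustrative assumptions; the authors do not state a recommended threshold.

```python
def labels_above_threshold(scores, id2label, threshold=0.5):
    """Return (label, score) pairs with score >= threshold, highest first.

    `scores` is a flat sequence of per-class scores for ONE text.
    The 0.5 default is an illustrative assumption, not an author recommendation.
    """
    picked = [(id2label[i], s) for i, s in enumerate(scores) if s >= threshold]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Toy 3-class mapping and made-up scores, for illustration only
# (the real model has 47 classes).
toy_id2label = {0: "Discrimination", 1: "Human Rights", 2: "Disclosure"}
toy_scores = [0.61, 0.944, 0.12]

selected = labels_above_threshold(toy_scores, toy_id2label)
print(selected)
```

With the real model, the same helper could presumably be called as `labels_above_threshold(results[0].tolist(), model.id2label)`, assuming `model.id2label` is keyed by integer class index.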
Before training our model, we masked words related to Organisation, Date, Country, and Person to prevent false associations between these entities and risks. Hence, we recommend processing text with the FLAIR NER model before inference.
An example of such preprocessing is given in https://colab.research.google.com/drive/15YcTW9KPSWesZ6_L4BUayqW_omzars0l?usp=sharing.
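The masking step itself can be sketched independently of the NER library. The example below assumes entity spans have already been extracted as `(start, end, tag)` character offsets, for instance from an NER model's output. The helper names `mask_entities` and `span_of`, the `<TAG>` placeholder format, and the tag names are hypothetical; the mask tokens actually used during training should be taken from the notebook linked above.

```python
def mask_entities(text, spans, placeholder="<{tag}>"):
    """Replace each (start, end, tag) character span in `text` with a placeholder.

    Spans are assumed to be non-overlapping; the placeholder format is an
    illustrative assumption, not necessarily what ESGify was trained with.
    """
    pieces = []
    cursor = 0
    for start, end, tag in sorted(spans):
        pieces.append(text[cursor:start])            # text before the entity
        pieces.append(placeholder.format(tag=tag))   # masked entity
        cursor = end
    pieces.append(text[cursor:])                     # remaining tail
    return "".join(pieces)


def span_of(text, phrase, tag):
    """Hypothetical stand-in for NER output: locate `phrase` and tag it."""
    start = text.index(phrase)
    return (start, start + len(phrase), tag)


text = "Acme Corp was fined in Germany in 2021."
spans = [
    span_of(text, "Acme Corp", "ORG"),
    span_of(text, "Germany", "LOC"),
    span_of(text, "2021", "DATE"),
]
masked = mask_entities(text, spans)
print(masked)
```

Building the result from slices in a single left-to-right pass avoids the offset drift that occurs when replacements are applied one at a time to an already modified string.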
# Training procedure
We start from the pretrained [`microsoft/mpnet-base`](https://huggingface.co/microsoft/mpnet-base) model.
Next, we perform domain adaptation via masked language modeling on texts from ESG reports.
Finally, we fine-tune the model on 2000 texts manually annotated by ESG specialists.