---
language: en
tags: Text Classification
license: apache-2.0
datasets:
- batterydata/paper-abstracts
metrics: glue
---

# BatteryBERT-base-cased for Battery Abstract Classification
**Language model:** batterybert-cased
**Language:** English
**Downstream-task:** Text Classification
**Training data:** training\_data.csv
**Eval data:** val\_data.csv
**Code:** See [example](https://github.com/ShuHuang/batterybert)
**Infrastructure:** 8x DGX A100
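The training and evaluation CSVs correspond to the `batterydata/paper-abstracts` dataset listed in the metadata above. A minimal sketch for pulling that dataset from the Hugging Face Hub with the `datasets` library (only the dataset id comes from this card; the split and column names are assumptions to check against the dataset card):

```python
# Hedged sketch: load the abstracts dataset referenced in the metadata above.
# Only the dataset id comes from this card; splits and column names should be
# checked against the dataset card itself.
from datasets import load_dataset

dataset = load_dataset("batterydata/paper-abstracts")
print(dataset)              # lists the available splits and their columns
print(dataset["train"][0])  # inspect one record (assumes a "train" split exists)
```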
## Hyperparameters
```
batch_size = 32
n_epochs = 11
base_LM_model = "batterybert-cased"
learning_rate = 2e-5
```
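These values slot directly into a standard `Trainer` run. The sketch below is illustrative rather than the authors' training script: the base checkpoint id `batterydata/batterybert-cased`, the `text`/`label` column names, the `train`/`validation` split names, and `num_labels=2` are all assumptions to adapt.

```python
# Hedged sketch of a fine-tuning run using the hyperparameters listed above.
# Checkpoint id, column names, split names, and num_labels are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "batterydata/batterybert-cased"  # assumed Hub id of the base LM
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

dataset = load_dataset("batterydata/paper-abstracts")

def tokenize(batch):
    # "text" is an assumed column name for the abstract field
    return tokenizer(batch["text"], truncation=True, padding="max_length")

encoded = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="batterybert-cased-abstract",
    learning_rate=2e-5,              # learning_rate = 2e-5
    per_device_train_batch_size=32,  # batch_size = 32
    num_train_epochs=11,             # n_epochs = 11
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],      # assumed split name
    eval_dataset=encoded["validation"],  # assumed split name
)
trainer.train()
```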
## Performance
```
"Validation accuracy": 97.29,
"Test accuracy": 96.85,
```
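To sanity-check these numbers, the released classifier can be scored against a labeled split. This is a rough sketch, not the authors' evaluation code: the dataset id, the `test` split, and the `text`/`label` columns are assumptions, and it expects integer labels aligned with the model's `label2id` mapping.

```python
# Hedged sketch: score the released classifier on a labeled split.
# Dataset id, split name, and column names are assumptions.
from datasets import load_dataset
from transformers import pipeline

model_name = "batterydata/batterybert-cased-abstract"
clf = pipeline("text-classification", model=model_name, tokenizer=model_name)

test = load_dataset("batterydata/paper-abstracts", split="test")  # assumed split
preds = clf(test["text"], truncation=True)

# Map predicted label strings back to ids and compare against the gold labels
# (assumes the dataset stores integer labels matching the model's label2id).
pred_ids = [clf.model.config.label2id[p["label"]] for p in preds]
accuracy = sum(int(p == y) for p, y in zip(pred_ids, test["label"])) / len(test)
print(f"test accuracy: {accuracy:.4f}")
```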

## Usage
### In Transformers
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
model_name = "batterydata/batterybert-cased-abstract"

# a) Get predictions
nlp = pipeline('text-classification', model=model_name, tokenizer=model_name)
text = 'The typical non-aqueous electrolyte for commercial Li-ion cells is a solution of LiPF6 in linear and cyclic carbonates.'
res = nlp(text)

# b) Load model & tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
## Authors
Shu Huang: `sh2009 [at] cam.ac.uk`

Jacqueline Cole: `jmc61 [at] cam.ac.uk`

## Citation
BatteryBERT: A Pre-trained Language Model for Battery Database Enhancement