DeDeckerThomas committed
Commit 910d2eb • 1 Parent(s): ebe7d3c

Update README.md

Files changed (1)
  1. README.md +19 -20
README.md CHANGED
@@ -35,11 +35,25 @@ Now with the recent innovations in deep learning methods (such as recurrent neur
 
 
  ## 📓 Model Description
- This model is a KBIR pre-trained model fine-tuned on the Inspec dataset. KBIR
- Keyphrase Boundary Infilling with Replacement (KBIR) which utilizes a multi-task learning setup for optimizing a combined loss of Masked Language Modeling (MLM), Keyphrase Boundary Infilling (KBI) and Keyphrase Replacement Classification (KRC).
- Paper: https://arxiv.org/abs/2112.08547
+ This model is a fine-tuned KBIR model on the Inspec dataset. KBIR or Keyphrase Boundary Infilling with Replacement is a pre-trained model which utilizes a multi-task learning setup for optimizing a combined loss of Masked Language Modeling (MLM), Keyphrase Boundary Infilling (KBI) and Keyphrase Replacement Classification (KRC).
+ You can find more information about the architecture in this paper: https://arxiv.org/abs/2112.08547.
+
+ The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
+ | Label | Description |
+ | ----- | ------------------------------- |
+ | B | At the beginning of a keyphrase |
+ | I | Inside a keyphrase |
+ | O | Outside a keyphrase |
+
+ Kulkarni, Mayank, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. "Learning Rich Representation of Keyphrases from Text." arXiv preprint arXiv:2112.08547 (2021).
+ Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah, and Roger Zimmermann. "Keyphrase extraction as sequence labeling using contextualized embeddings." In European Conference on Information Retrieval, pp. 328-335. Springer, Cham, 2020.
 
  ## ✋ Intended uses & limitations
+ ### 🛑 Limitations
+ * This keyphrase extraction model is very domain-specific and will perform very well on abstracts of scientific papers.
+ * Only works for English documents.
+ * For a custom model, please consult the training notebook (link incoming).
+
  ### ❓ How to use
  ```python
  # Define post_process functions
@@ -131,18 +145,10 @@ and context of a document, which is quite an improvement.
  'semantics' 'statistics' 'text analysis' 'transformers']
  ```
 
- ### 🛑 Limitations
- * The model performs very well on abstracts of scientific papers. Please be aware that this model very domain-specific.
- * Only works in English.
-
  ## [📚 Training Dataset](https://huggingface.co/datasets/midas/inspec)
  ## 👷‍♂️ Training procedure
- The model is fine-tuned as a token classification problem where the text is labeled using the BIO scheme.
- - B => Begin of a keyphrase
- - I => Inside of a keyphrase
- - O => Ouside of a keyphrase
 
- For more information, you can take a look at the training notebook.
+ For more information, you can take a look at the training notebook (link incoming).
 
  ### Preprocessing
  ```python
@@ -189,11 +195,4 @@ The model achieves the following results on the Inspec test set:
  |:-----------------:|:----:|:----:|:----:|:----:|:----:|:-----:|:----:|:----:|:----:|
  | Inspec Test Set | 0.53 | 0.47 | 0.46 | 0.36 | 0.58 | 0.41 | 0.58 | 0.60 | 0.56 |
 
- For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
-
- ### Bibliography
- Debanjan Mahata, Navneet Agarwal, Dibya Gautam, Amardeep Kumar, Sagar Dhiman, Anish Acharya, & Rajiv Ratn Shah. (2021). LDkp Dataset [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5501744
-
- Kulkarni, Mayank, Debanjan Mahata, Ravneet Arora, and Rajarshi Bhowmik. "Learning Rich Representation of Keyphrases from Text." arXiv preprint arXiv:2112.08547 (2021).
-
- Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma, Rakesh Gosangi, Amanda Stent, Yaman Kumar, Rajiv Ratn Shah, and Roger Zimmermann. "Keyphrase extraction as sequence labeling using contextualized embeddings." In European Conference on Information Retrieval, pp. 328-335. Springer, Cham, 2020.
+ For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
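
Note on the BIO scheme referenced in the Model Description: the README states that the text is labeled with B, I and O tags but does not show how such tags map back to keyphrases. The sketch below is one possible illustration, not the model card's own post-processing code. The function name `bio_to_keyphrases`, the sample tokens, and the choice to drop an `I` tag that has no preceding `B` are all assumptions made for this example.

```python
from typing import List


def bio_to_keyphrases(tokens: List[str], labels: List[str]) -> List[str]:
    """Group tokens tagged B/I into keyphrases; tokens tagged O are skipped.

    Hypothetical helper for illustration only; it is not part of the README.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    keyphrases: List[str] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            # A new keyphrase starts; flush the one in progress, if any.
            if current:
                keyphrases.append(" ".join(current))
            current = [token]
        elif label == "I" and current:
            # Continue the keyphrase that is currently open.
            current.append(token)
        else:
            # "O", or a stray "I" with no open keyphrase (assumed to be dropped).
            if current:
                keyphrases.append(" ".join(current))
            current = []
    if current:
        keyphrases.append(" ".join(current))
    return keyphrases


tokens = ["Keyphrase", "extraction", "is", "a", "technique", "in", "text", "analysis"]
labels = ["B", "I", "O", "O", "O", "O", "B", "I"]
print(bio_to_keyphrases(tokens, labels))
# → ['Keyphrase extraction', 'text analysis']
```

In practice the model predicts labels per sub-word token, so a real pipeline would also need to merge sub-word pieces before or after this grouping step; that part is left out here.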