DeDeckerThomas
committed on
Commit
•
8ca8633
1
Parent(s):
6d08934
Update README.md
README.md
CHANGED
@@ -84,18 +84,19 @@ class KeyphraseExtractionPipeline(TokenClassificationPipeline):
 
 ```python
 # Load pipeline
-model_name = "
+model_name = "ml6team/keyphrase-extraction-kbir-inspec"
 extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
 Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
-Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics,
-The fact that these methods have been widely used in the community
-
-
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics,
+are widely used for the extraction process. The fact that these methods have been widely used in the community
+has the advantage that there are many easy-to-use libraries. Now with the recent innovations in NLP,
+transformers can be used to improve keyphrase extraction. Transformers also focus on the semantics
+and context of a document, which is quite an improvement.
 """.replace(
 "\n", ""
 )
@@ -107,10 +108,10 @@ print(keyphrases)
 
 ```
 # Output
-['Artificial Intelligence' '
-'classical machine learning' '
-'
-'
+['Artificial Intelligence', 'Keyphrase extraction', 'NLP',
+'classical machine learning', 'keyphrase extraction',
+'linguistics', 'semantics', 'statistics', 'text analysis',
+'transformers']
 ```
 
 ## 📚 Training Dataset
@@ -173,7 +174,7 @@ def preprocess_fuction(all_samples_per_split):
 ```
 
 ### Postprocessing
-For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive
+For the post-processing, you will need to filter out the B and I labeled tokens and concat the consecutive Bs and Is. As last you strip the keyphrase to ensure all spaces are removed.
 ```python
 # Define post_process functions
 def concat_tokens_by_tag(keyphrases):
@@ -217,4 +218,4 @@ The model achieves the following results on the Inspec test set:
 For more information on the evaluation process, you can take a look at the keyphrase extraction evaluation notebook.
 
 ## 🚨 Issues
-Please feel free to
+Please feel free to start discussions in the Community Tab.
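The Postprocessing hunk describes three steps: keep the B- and I-labeled tokens, concatenate consecutive ones, and strip the result. A minimal sketch of that idea follows. The function name `extract_keyphrases` and the plain `"B"`/`"I"`/`"O"` tag strings are assumptions for illustration; the README's own `concat_tokens_by_tag` is only partly visible in this diff and may work differently.

```python
def extract_keyphrases(tokens, tags):
    """Group B/I-tagged tokens into keyphrases.

    Hypothetical helper for illustration, not the README's implementation.
    `tokens` and `tags` are parallel lists; tags are "B", "I", or "O".
    """
    phrases = []
    current = []  # tokens of the keyphrase currently being built

    for token, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            # A "B" always starts a new phrase; a stray "I" with no open
            # phrase is treated leniently as a start as well (assumption).
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I":
            # Continue the open phrase.
            current.append(token)
        else:
            # Any other tag ("O") closes the open phrase, if there is one.
            if current:
                phrases.append(" ".join(current))
            current = []

    if current:
        phrases.append(" ".join(current))

    # Final step from the README text: strip surrounding whitespace.
    return [phrase.strip() for phrase in phrases]


tokens = ["Keyphrase", "extraction", "uses", "machine", "learning", "."]
tags = ["B", "I", "O", "B", "I", "O"]
print(extract_keyphrases(tokens, tags))
# ['Keyphrase extraction', 'machine learning']
```

Joining with a single space is a simplification; a real pipeline working on subword tokens would need to merge word pieces before joining.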