DeDeckerThomas committed
Commit 4f8e3db
1 Parent(s): 3b64112

Update inference process

Files changed (1)
  1. README.md +27 -66
README.md CHANGED
@@ -58,89 +58,50 @@ Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma,
 
 ### ❓ How to use
 ```python
-# Define post_process functions
-def concat_tokens_by_tag(keyphrases):
-    keyphrase_tokens = []
-    for id, label in keyphrases:
-        if label == "B":
-            keyphrase_tokens.append([id])
-        elif label == "I":
-            if len(keyphrase_tokens) > 0:
-                keyphrase_tokens[len(keyphrase_tokens) - 1].append(id)
-    return keyphrase_tokens
-
-
-def extract_keyphrases(example, predictions, tokenizer, index=0):
-    keyphrases_list = [
-        (id, idx2label[label])
-        for id, label in zip(
-            np.array(example["input_ids"]).squeeze().tolist(), predictions[index]
+# Define keyphrase extraction pipeline
+class KeyphraseExtractionPipeline(TokenClassificationPipeline):
+    def __init__(self, model, *args, **kwargs):
+        super().__init__(
+            model=AutoModelForTokenClassification.from_pretrained(model),
+            tokenizer=AutoTokenizer.from_pretrained(model),
+            *args,
+            **kwargs
         )
-        if idx2label[label] in ["B", "I"]
-    ]
 
-    processed_keyphrases = concat_tokens_by_tag(keyphrases_list)
-    extracted_kps = tokenizer.batch_decode(
-        processed_keyphrases,
-        skip_special_tokens=True,
-        clean_up_tokenization_spaces=True,
-    )
-    return np.unique([kp.strip() for kp in extracted_kps])
+    def postprocess(self, model_outputs):
+        results = super().postprocess(
+            model_outputs=model_outputs,
+            aggregation_strategy=AggregationStrategy.SIMPLE,
+        )
+        return np.unique([result.get("word").strip() for result in results])
 
 ```
 
 ```python
-# Load model and tokenizer
+# Load pipeline
 model_name = "DeDeckerThomas/keyphrase-extraction-kbir-inspec"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForTokenClassification.from_pretrained(model_name)
+extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
-Keyphrase extraction is a technique in text analysis where you extract the important keyphrases
-from a text. Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used
-for the extraction process. The fact that these methods have been widely used in the community has
-the advantage that there are many easy-to-use libraries. Now with the recent innovations in
-deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics
-and context of a document, which is quite an improvement.
-""".replace("\n", "")
-
-encoded_input = tokenizer(
-    text,
-    truncation=True,
-    padding="max_length",
-    max_length=max_length,
-    return_tensors="pt",
+Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
+The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
+Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
+keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
+""".replace(
+    "\n", ""
 )
 
-output = model(**encoded_input)
-logits = output.logits.detach().numpy()
-predictions = np.argmax(logits, axis=2)
-
-extracted_kps = extract_keyphrases(encoded_input, predictions, tokenizer)
-
-print("***** Input Document *****")
-print(text)
+keyphrases = extractor(text)
 
-print("***** Prediction *****")
-print(extracted_kps)
+print(keyphrases)
 ```
 
 ```
-***** Input Document *****
-Keyphrase extraction is a technique in text analysis where you extract the important keyphrases
-from a text. Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used
-for the extraction process. The fact that these methods have been widely used in the community has
-the advantage that there are many easy-to-use libraries. Now with the recent innovations in
-deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics
-and context of a document, which is quite an improvement.
-
-***** Prediction *****
+# Output
 ['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
  'classical machine learning' 'deep learning methods'
  'keyphrase extraction' 'linguistics' 'recurrent neural networks'
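
Note on the removed post-processing: the following is a minimal, dependency-free sketch of what the deleted `concat_tokens_by_tag` helper and the final `np.unique` call appear to do, run on toy data. The names `group_bi_tokens` and `unique_sorted` and the token ids are hypothetical, chosen for illustration only. `sorted(set(...))` stands in for `np.unique` on the assumption that both return unique strings in ascending code-point order. It is not the pipeline's actual implementation.

```python
from typing import List, Tuple


def group_bi_tokens(tagged: List[Tuple[int, str]]) -> List[List[int]]:
    """Group token ids into phrases: "B" opens a phrase, "I" extends the latest one."""
    phrases: List[List[int]] = []
    for token_id, label in tagged:
        if label == "B":
            phrases.append([token_id])
        elif label == "I" and phrases:
            phrases[-1].append(token_id)
        # Any other label (e.g. "O"), or an "I" with no open phrase, is skipped.
    return phrases


def unique_sorted(phrases: List[str]) -> List[str]:
    """Strip whitespace, drop duplicates, and sort (stand-in for np.unique)."""
    return sorted({phrase.strip() for phrase in phrases})


# Toy ids and labels, not real tokenizer or model output.
tagged = [(11, "B"), (12, "I"), (13, "O"), (14, "B"), (15, "I"), (16, "I")]
print(group_bi_tokens(tagged))

# Hypothetical decoded phrases with stray spaces and a duplicate.
decoded = [" keyphrase extraction", "Keyphrase extraction", "linguistics ", "linguistics"]
print(unique_sorted(decoded))
```

Because the comparison is case-sensitive, 'Keyphrase extraction' and 'keyphrase extraction' both survive de-duplication, which is consistent with both appearing in the output shown in the diff.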