ml6team/keyphrase-extraction-kbir-inspec

@@ -58,89 +58,50 @@ Sahrawat, Dhruva, Debanjan Mahata, Haimin Zhang, Mayank Kulkarni, Agniv Sharma,
 ### ❓ How to use
 ```python
-# Define post_process functions
-def concat_tokens_by_tag(keyphrases):
-    keyphrase_tokens = []
-    for id, label in keyphrases:
-        if label == "B":
-            keyphrase_tokens.append([id])
-        elif label == "I":
-            if len(keyphrase_tokens) > 0:
-                keyphrase_tokens[len(keyphrase_tokens) - 1].append(id)
-    return keyphrase_tokens
-def extract_keyphrases(example, predictions, tokenizer, index=0):
-    keyphrases_list = [
-        (id, idx2label[label])
-        for id, label in zip(
-            np.array(example["input_ids"]).squeeze().tolist(), predictions[index]
         )
-        if idx2label[label] in ["B", "I"]
-    ]
-    processed_keyphrases = concat_tokens_by_tag(keyphrases_list)
-    extracted_kps = tokenizer.batch_decode(
-        processed_keyphrases,
-        skip_special_tokens=True,
-        clean_up_tokenization_spaces=True,
-    )
-    return np.unique([kp.strip() for kp in extracted_kps])
 ```
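The removed `concat_tokens_by_tag` helper groups token ids into keyphrases by their tags: a `B` starts a new phrase and an `I` extends the most recent one. The following is a minimal standalone sketch of that grouping, using plain word strings in place of token ids. The function name `group_by_bi_tags` and the sample data are illustrative assumptions, not part of the model card.

```python
def group_by_bi_tags(tagged_tokens):
    """Group (token, tag) pairs into phrases: "B" opens a phrase, "I" extends it."""
    phrases = []
    for token, tag in tagged_tokens:
        if tag == "B":
            phrases.append([token])
        elif tag == "I" and phrases:
            # An "I" with no preceding "B" is dropped, as in the removed helper.
            phrases[-1].append(token)
    return phrases


# Hypothetical tagged input; a real run would take the tags from model predictions.
sample = [
    ("Keyphrase", "B"),
    ("extraction", "I"),
    ("is", "O"),
    ("a", "O"),
    ("technique", "O"),
    ("linguistics", "B"),
]
grouped = group_by_bi_tags(sample)
print(grouped)
```

Tokens tagged anything other than `B` or `I` are skipped, which mirrors the filtering that `extract_keyphrases` applies before calling the helper.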
 ```python
-# Load model and tokenizer
 model_name = "DeDeckerThomas/keyphrase-extraction-kbir-inspec"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForTokenClassification.from_pretrained(model_name)
 ```
 ```python
 # Inference
 text = """
-Keyphrase extraction is a technique in text analysis where you extract the important keyphrases
-from a text. Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used
-for the extraction process. The fact that these methods have been widely used in the community has
-the advantage that there are many easy-to-use libraries. Now with the recent innovations in
-deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics
-and context of a document, which is quite an improvement.
-""".replace("\n", "")
-encoded_input = tokenizer(
-    text,
-    truncation=True,
-    padding="max_length",
-    max_length=max_length,
-    return_tensors="pt",
 )
-output = model(**encoded_input)
-logits = output.logits.detach().numpy()
-predictions = np.argmax(logits, axis=2)
-extracted_kps = extract_keyphrases(encoded_input, predictions, tokenizer)
-print("***** Input Document *****")
-print(text)
-print("***** Prediction *****")
-print(extracted_kps)
 ```
 ```
-***** Input Document *****
-Keyphrase extraction is a technique in text analysis where you extract the important keyphrases
-from a text. Since this is a time-consuming process, Artificial Intelligence is used to automate it.
-Currently, classical machine learning methods, that use statistics and linguistics, are widely used
-for the extraction process. The fact that these methods have been widely used in the community has
-the advantage that there are many easy-to-use libraries. Now with the recent innovations in
-deep learning methods (such as recurrent neural networks and transformers, GANS, …),
-keyphrase extraction can be improved. These new methods also focus on the semantics
-and context of a document, which is quite an improvement.
-***** Prediction *****
 ['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
  'classical machine learning' 'deep learning methods'
  'keyphrase extraction' 'linguistics' 'recurrent neural networks'

 ### ❓ How to use
 ```python
+# Define keyphrase extraction pipeline
+class KeyphraseExtractionPipeline(TokenClassificationPipeline):
+    def __init__(self, model, *args, **kwargs):
+        super().__init__(
+            model=AutoModelForTokenClassification.from_pretrained(model),
+            tokenizer=AutoTokenizer.from_pretrained(model),
+            *args,
+            **kwargs
         )
+    def postprocess(self, model_outputs):
+        results = super().postprocess(
+            model_outputs=model_outputs,
+            aggregation_strategy=AggregationStrategy.SIMPLE,
+        )
+        return np.unique([result.get("word").strip() for result in results])
 ```
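The last line of `postprocess` reduces the aggregated results to their `word` fields, strips surrounding whitespace, and deduplicates them. The sketch below mimics that step with the standard library only, assuming each result is a dict with a `"word"` key as in the code above. Here `sorted(set(...))` stands in for `np.unique`, and both the helper name `unique_keyphrases` and the sample results are made up for illustration.

```python
def unique_keyphrases(results):
    """Strip each result's "word" and return the distinct values in sorted order."""
    return sorted({result.get("word").strip() for result in results})


# Hypothetical aggregated results; a real pipeline returns more fields per entry.
sample_results = [
    {"word": " keyphrase extraction"},
    {"word": "Keyphrase extraction"},
    {"word": " linguistics"},
    {"word": "keyphrase extraction "},
]
keyphrases = unique_keyphrases(sample_results)
print(keyphrases)
```

Deduplication is case-sensitive, which would explain why the sample output in this card lists both 'Keyphrase extraction' and 'keyphrase extraction'.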
 ```python
+# Load pipeline
 model_name = "DeDeckerThomas/keyphrase-extraction-kbir-inspec"
+extractor = KeyphraseExtractionPipeline(model=model_name)
 ```
 ```python
 # Inference
 text = """
+Keyphrase extraction is a technique in text analysis where you extract the important keyphrases from a text.
+Since this is a time-consuming process, Artificial Intelligence is used to automate it.
+Currently, classical machine learning methods, that use statistics and linguistics, are widely used for the extraction process.
+The fact that these methods have been widely used in the community has the advantage that there are many easy-to-use libraries.
+Now with the recent innovations in deep learning methods (such as recurrent neural networks and transformers, GANS, …),
+keyphrase extraction can be improved. These new methods also focus on the semantics and context of a document, which is quite an improvement.
+""".replace(
+    "\n", ""
 )
+keyphrases = extractor(text)
+print(keyphrases)
 ```
 ```
+# Output
 ['Artificial Intelligence' 'GANS' 'Keyphrase extraction'
  'classical machine learning' 'deep learning methods'
  'keyphrase extraction' 'linguistics' 'recurrent neural networks'