Ihor committed on
Commit 2781497
1 Parent(s): 22a7f85

Update README.md

Files changed (1): README.md +113 -0
README.md CHANGED
@@ -199,6 +199,119 @@ results = process(text, prompt)
  print(results)
  ```
 
+ ### How to run with utca
+ First of all, you need to install the package:
+ ```bash
+ pip install utca -U
+ ```
+
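+ Optionally, you can verify the installation with a quick import check (just a sanity check, not part of the documented utca workflow):
+ ```python
+ # Quick sanity check that the package is importable after installation
+ import utca
+ print("utca imported successfully")
+ ```
+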
+ After that, you need to create a predictor that will run the UTC model:
+ ```python
+ from utca.core import (
+     AddData,
+     RenameAttribute,
+     Flush
+ )
+ from utca.implementation.predictors import (
+     TokenSearcherPredictor, TokenSearcherPredictorConfig
+ )
+ from utca.implementation.tasks import (
+     TokenSearcherNER,
+     TokenSearcherNERPostprocessor,
+ )
+
+ predictor = TokenSearcherPredictor(
+     TokenSearcherPredictorConfig(
+         device="cuda:0",
+         model="knowledgator/UTC-DeBERTa-small-v2"
+     )
+ )
+ ```
+
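+ If no GPU is available, the predictor can also be created on CPU. A minimal sketch, assuming only the config options shown above (the device string follows the usual PyTorch convention):
+
+ ```python
+ import torch  # used here only to detect whether a GPU is available
+
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+
+ predictor = TokenSearcherPredictor(
+     TokenSearcherPredictorConfig(
+         device=device,
+         model="knowledgator/UTC-DeBERTa-small-v2"
+     )
+ )
+ ```
+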
+ For the NER model, you should create the following pipeline:
+
+ ```python
+ ner_task = TokenSearcherNER(
+     predictor=predictor,
+     postprocess=[TokenSearcherNERPostprocessor(
+         threshold=0.5
+     )]
+ )
+
+ # Alternatively, a task with default settings can be created:
+ # ner_task = TokenSearcherNER()
+
+ pipeline = (
+     AddData({"labels": ["scientist", "university", "city"]})
+     | ner_task
+     | Flush(keys=["labels"])
+     | RenameAttribute("output", "entities")
+ )
+ ```
+
+ After that, you can pass your text for prediction and run the pipeline:
+
+ ```python
+ res = pipeline.run({
+     "text": """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience".
+ His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect.
+ Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment."""
+ })
+ ```
+
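+ The extracted entities end up under the "entities" key (renamed from "output" by the pipeline above). A minimal sketch for inspecting them; the exact fields of each entity dict depend on the installed utca version:
+
+ ```python
+ import json
+
+ # Pretty-print the extracted entities; default=str guards against values
+ # that are not natively JSON-serializable (e.g., numpy floats).
+ print(json.dumps(res["entities"], indent=2, default=str))
+ ```
+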
+ To use `utca` for relation extraction, construct the following pipeline:
+
+ ```python
+ from utca.implementation.tasks import (
+     TokenSearcherNER,
+     TokenSearcherNERPostprocessor,
+     TokenSearcherRelationExtraction,
+     TokenSearcherRelationExtractionPostprocessor,
+ )
+
+ pipe = (
+     TokenSearcherNER( # The TokenSearcherNER task produces classified entities under the "output" key.
+         predictor=predictor,
+         postprocess=TokenSearcherNERPostprocessor(
+             threshold=0.5 # Entity threshold
+         )
+     )
+     | RenameAttribute("output", "entities") # Rename the entities from the TokenSearcherNER task so they can be used as inputs to TokenSearcherRelationExtraction
+     | TokenSearcherRelationExtraction( # TokenSearcherRelationExtraction is used for relation extraction.
+         predictor=predictor,
+         postprocess=TokenSearcherRelationExtractionPostprocessor(
+             threshold=0.5 # Relation threshold
+         )
+     )
+ )
+ ```
+
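+ The run example below refers to a `text` variable, so define it first; here we simply reuse the example passage from the NER section:
+
+ ```python
+ # The document to analyze; reusing the example passage from above.
+ text = """Dr. Paul Hammond, a renowned neurologist at Johns Hopkins University, has recently published a paper in the prestigious journal "Nature Neuroscience".
+ His research focuses on a rare genetic mutation, found in less than 0.01% of the population, that appears to prevent the development of Alzheimer's disease. Collaborating with researchers at the University of California, San Francisco, the team is now working to understand the mechanism by which this mutation confers its protective effect.
+ Funded by the National Institutes of Health, their research could potentially open new avenues for Alzheimer's treatment."""
+ ```
+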
+ To run the pipeline, you need to specify parameters for entities and relations:
+
+ ```python
+ r = pipe.run({
+     "text": text, # Text to process
+     "labels": [ # Labels used by TokenSearcherNER for entity extraction
+         "scientist",
+         "university",
+         "city",
+         "research",
+         "journal",
+     ],
+     "relations": [{ # Relation parameters
+         "relation": "published at", # Relation label. Required parameter.
+         "pairs_filter": [("scientist", "journal")], # Optional parameter. It specifies possible members of relations by their entity labels.
+         # Here, "scientist" is the entity label of the source, and "journal" is the target's entity label.
+         # If provided, only specified pairs will be returned.
+     }, {
+         "relation": "worked at",
+         "pairs_filter": [("scientist", "university"), ("scientist", "other")],
+         "distance_threshold": 100, # Optional parameter. It specifies the max distance between spans in the text (i.e., between the end of the span closer to the start of the text and the start of the next span).
+     }]
+ })
+
+ print(r["output"])
+ ```
+
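+ If you want to keep the extracted relations for later inspection, you can dump them to a file. A minimal sketch; `relations.json` is just an arbitrary file name, and `default=str` guards against values that are not natively JSON-serializable:
+
+ ```python
+ import json
+
+ # Persist the extracted relations to disk for later analysis.
+ with open("relations.json", "w", encoding="utf-8") as f:
+     json.dump(r["output"], f, indent=2, default=str)
+ ```
+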
  ### Benchmarking
  Below is a table that highlights the performance of UTC models on the [CrossNER](https://huggingface.co/datasets/DFKI-SLT/cross_ner) dataset. The values represent the Micro F1 scores, with the estimation done at the word level.