## UIE (Universal Information Extraction)
### Introduction
UIE (Universal Information Extraction) is a state-of-the-art (SOTA) information extraction method available in PaddleNLP; see the details [here](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie).
The paper is available [here](https://arxiv.org/pdf/2203.12277.pdf).
### Usage
I saved the UIE model as an entire model (ERNIE 3.0 backbone + start/end layers), so you need to load it as follows:
#### 1. Clone this model to your local path
```sh
git lfs install
git clone https://huggingface.co/xyj125/uie-base-chinese
```
If you don't have `git-lfs` installed, you can instead:
* Download the files manually by clicking `Files and versions` at the top of this card.
#### 2. Load this model from the local path
```python
import os

import torch
from transformers import AutoTokenizer

uie_model = 'uie-base-zh'  # local directory that contains pytorch_model.bin
# The checkpoint is a pickled full model (not a state_dict), so torch.load returns the module itself.
# On PyTorch >= 2.6, pass weights_only=False to unpickle a full model.
model = torch.load(os.path.join(uie_model, 'pytorch_model.bin'))  # load UIE model
tokenizer = AutoTokenizer.from_pretrained('uie-base')  # load tokenizer
...
start_prob, end_prob = model(input_ids=batch['input_ids'],
                             token_type_ids=batch['token_type_ids'],
                             attention_mask=batch['attention_mask'])
print(f'start_prob ({type(start_prob)}): {start_prob.size()}')  # start_prob
print(f'end_prob ({type(end_prob)}): {end_prob.size()}')  # end_prob
...
```
Here is the model output (with batch_size=16, max_seq_len=256):
```python
start_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
end_prob (<class 'torch.Tensor'>): torch.Size([16, 256])
```
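The shapes above suggest one start score and one end score per token. As a minimal sketch of how such outputs might be turned into spans, assuming each value is a probability and that a span runs from a start token to the nearest end token at or after it, the hypothetical helper `decode_spans` below pairs the positions. The 0.5 threshold, the pairing rule, and the toy values are illustrative assumptions, not taken from this model card.

```python
from typing import List, Tuple


def decode_spans(start_prob: List[float],
                 end_prob: List[float],
                 threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Pair each confident start index with the nearest confident end index.

    `threshold` is an assumed cut-off, not a value from the model card.
    """
    starts = [i for i, p in enumerate(start_prob) if p > threshold]
    ends = [i for i, p in enumerate(end_prob) if p > threshold]
    spans = []
    for s in starts:
        # Keep only end positions that do not precede the start position.
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, candidates[0]))
    return spans


# Toy probabilities for a single sequence of 8 tokens (made-up values).
start = [0.0, 0.9, 0.1, 0.0, 0.0, 0.8, 0.0, 0.0]
end = [0.0, 0.0, 0.0, 0.7, 0.0, 0.0, 0.95, 0.0]
print(decode_spans(start, end))  # [(1, 3), (5, 6)]
```

With real model outputs, one row of the batch can be passed as `decode_spans(start_prob[0].tolist(), end_prob[0].tolist())`. The returned indices are token positions and still have to be mapped back to the input text through the tokenizer.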