---
language:
- en
---

# LED_para document simplification model

This is a pretrained version of the document simplification model presented in the Findings of ACL 2023 paper ["Context-Aware Document Simplification"](https://arxiv.org/abs/2305.06274). 

It is an end-to-end system based on the [Longformer encoder-decoder](https://huggingface.co/allenai/led-base-16384) (LED) that simplifies documents one paragraph at a time.

Target reading levels (1-4) should be indicated via a control token prepended to each input sequence ("\<RL_1\>", "\<RL_2\>", "\<RL_3\>", "\<RL_4\>"). If using the terminal interface, this will be handled automatically.
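
When calling the model directly rather than through the terminal interface, the control token has to be added by hand. A minimal sketch of one way to do this follows; the helper name `add_reading_level` and the constant `VALID_LEVELS` are hypothetical and not part of plan_simp, and the single space after the token is an assumption based on the usage example in this card.

```python
# Hypothetical helper, not part of plan_simp: builds model inputs by
# prepending the reading-level control token described above.
VALID_LEVELS = (1, 2, 3, 4)


def add_reading_level(text: str, level: int) -> str:
    """Return `text` prefixed with the <RL_n> control token for `level`."""
    if level not in VALID_LEVELS:
        raise ValueError(f"reading level must be one of {VALID_LEVELS}, got {level!r}")
    # Assumption: token and text are separated by a single space.
    return f"<RL_{level}> {text.strip()}"


paragraphs = [
    "Turing has an extensive legacy with statues of him and many things named after him.",
    "He appears on the current Bank of England £50 note.",
]
# Each paragraph is a separate input sequence, so each gets its own token.
model_inputs = [add_reading_level(p, 3) for p in paragraphs]
```

Each string in `model_inputs` can then be tokenized and passed to the model as a separate input sequence.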

## How to use
It is recommended to use the [plan_simp](https://github.com/liamcripwell/plan_simp/tree/main) library to interface with the model.

Here is how to use this model in PyTorch:

```python
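from plan_simp.models.bart import load_simplifier

# Load the pretrained simplifier, its tokenizer, and the training hyperparameters.
simplifier, tokenizer, hparams = load_simplifier("liamcripwell/ledpara")

# The input paragraph starts with the control token for the target reading level.
text = "<RL_3> Turing has an extensive legacy with statues of him and many things named after him, including an annual award for computer science innovations. He appears on the current Bank of England £50 note, which was released on 23 June 2021, to coincide with his birthday. A 2019 BBC series, as voted by the audience, named him the greatest person of the 20th century."
inputs = tokenizer(text, return_tensors="pt")

# Generate the simplified paragraph with beam search, then decode it to text.
outputs = simplifier.generate(**inputs, num_beams=5)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])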
```

Generation and evaluation can also be run from the terminal.

```bash
python plan_simp/scripts/generate.py inference \
    --model_ckpt=liamcripwell/ledpara \
    --test_file=<test_data> \
    --reading_lvl=s_level \
    --out_file=<output_csv>

python plan_simp/scripts/eval_simp.py \
    --input_data=newselaauto_docs_test.csv \
    --output_data=test_out_ledpara.csv \
    --x_col=complex_str \
    --r_col=simple_str \
    --y_col=pred \
    --doc_id_col=pair_id \
    --prepro=True \
    --sent_level=True
```