---
license: mit
language: en
---

# BART-SLED (SLiding-Encoder and Decoder, base-sized model)
SLED models use pretrained, short-range encoder-decoder models and apply them to long-text inputs by splitting the input into multiple overlapping chunks, encoding each chunk independently, and performing fusion-in-decoder.

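
To make the chunking idea concrete, here is a minimal, schematic sketch of splitting a long token sequence into overlapping windows (this is only an illustration, not py-sled's actual implementation; the `make_chunks` helper and the `chunk_size`/`overlap` values are hypothetical):

```python
# Hypothetical illustration of SLED-style chunking (not part of py-sled):
# a long token sequence is split into overlapping windows that are short enough for the
# underlying encoder; each window is encoded independently, and the decoder then attends
# over all of the encoded chunks together (fusion-in-decoder).
def make_chunks(input_ids, chunk_size=256, overlap=64):
    """Return overlapping windows over a 1-D list of token ids."""
    stride = chunk_size - overlap
    return [input_ids[start:start + chunk_size]
            for start in range(0, max(len(input_ids) - overlap, 1), stride)]

chunks = make_chunks(list(range(1000)))
print(len(chunks), [len(c) for c in chunks])  # 5 windows, each at most 256 tokens long
```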
## Model description

This SLED model is based on the BART model, which is described in its [model card](https://huggingface.co/facebook/bart-base).
BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works
well for comprehension tasks (e.g. text classification, question answering). When used as a BART-SLED model, it can be applied to long-text tasks.

## Intended uses & limitations

You can use the raw model for text infilling. However, the model is mostly meant to be fine-tuned on a supervised dataset.
### How to use

To use the model, you first need to install `py-sled` in your environment (or clone the code from the [official repository](https://github.com/Mivg/SLED/blob/main/README.md)):
```
pip install py-sled
```
For more installation instructions, see [here](https://github.com/Mivg/SLED#Installation).

Once installed, SLED is fully compatible with HuggingFace's AutoClasses (AutoTokenizer, AutoConfig, AutoModel
and AutoModelForCausalLM) and can be loaded using the from_pretrained methods:
```python
from transformers import AutoModel

import sled  # *** required so that SledModels will be registered for the AutoClasses ***

model = AutoModel.from_pretrained('tau/bart-base-sled')
```

Here is how to use this model in PyTorch:
```python
from sled import SledTokenizer, SledModel

tokenizer = SledTokenizer.from_pretrained('tau/bart-base-sled')
model = SledModel.from_pretrained('tau/bart-base-sled')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
last_hidden_states = outputs.last_hidden_state
```
You can also replace SledModel with SledModelForConditionalGeneration for Seq2Seq generation:
```python
model = SledModelForConditionalGeneration.from_pretrained('tau/bart-base-sled')
```
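For example, here is a minimal generation sketch (assuming the standard HuggingFace `generate()` API; the input text and decoding parameters below are only illustrative):
```python
from transformers import AutoTokenizer
from sled import SledModelForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
model = SledModelForConditionalGeneration.from_pretrained('tau/bart-base-sled')

# Encode a (potentially long) document and generate output text with beam search.
inputs = tokenizer("A very long document to summarize ...", return_tensors="pt")
generated_ids = model.generate(inputs.input_ids, num_beams=4, max_length=64)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```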
In case you wish to apply SLED to a task containing a prefix (e.g. a question) that should be given as context to
every chunk, you can also pass the `prefix_length` tensor input (a LongTensor of length equal to the batch size).
```python
import torch
from transformers import AutoTokenizer, AutoModel

import sled  # *** required so that SledModels will be registered for the AutoClasses ***

tokenizer = AutoTokenizer.from_pretrained('tau/bart-base-sled')
model = AutoModel.from_pretrained('tau/bart-base-sled')

document_input_ids = tokenizer("Dogs are great for you.", return_tensors="pt").input_ids
prefix_input_ids = tokenizer("Are dogs good for you?", return_tensors="pt").input_ids
input_ids = torch.cat((prefix_input_ids, document_input_ids), dim=-1)
attention_mask = torch.ones_like(input_ids)
prefix_length = torch.LongTensor([[prefix_input_ids.size(1)]])

outputs = model(input_ids=input_ids, attention_mask=attention_mask, prefix_length=prefix_length)
last_hidden_states = outputs.last_hidden_state
```
### BibTeX entry and citation info

Please cite both the SLED [paper](https://arxiv.org/abs/2208.00748) and the BART [paper](https://arxiv.org/abs/1910.13461) by Lewis et al.
```bibtex
@inproceedings{Ivgi2022EfficientLU,
  title={Efficient Long-Text Understanding with Short-Text Models},
  author={Maor Ivgi and Uri Shaham and Jonathan Berant},
  year={2022}
}
```

```bibtex
@article{DBLP:journals/corr/abs-1910-13461,
  author     = {Mike Lewis and
                Yinhan Liu and
                Naman Goyal and
                Marjan Ghazvininejad and
                Abdelrahman Mohamed and
                Omer Levy and
                Veselin Stoyanov and
                Luke Zettlemoyer},
  title      = {{BART:} Denoising Sequence-to-Sequence Pre-training for Natural Language
                Generation, Translation, and Comprehension},
  journal    = {CoRR},
  volume     = {abs/1910.13461},
  year       = {2019},
  url        = {http://arxiv.org/abs/1910.13461},
  eprinttype = {arXiv},
  eprint     = {1910.13461},
  timestamp  = {Thu, 31 Oct 2019 14:02:26 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-1910-13461.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```