---
license: apache-2.0
---
# XGen-7B-8K-Inst
Official research release for the family of **XGen** models (`7B`) by Salesforce AI Research:
*Title*: [Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length](https://arxiv.org/abs/2309.03450)
*Authors*: [Erik Nijkamp](https://eriknijkamp.com)\*, Tian Xie\*, [Hiroaki Hayashi](https://hiroakih.me/)\*, [Bo Pang](https://scholar.google.com/citations?user=s9fNEVEAAAAJ&hl=en)\*, Congying Xia\*, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryscinski, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, [Chien-Sheng Wu](https://jasonwu0731.github.io/), Silvio Savarese, [Yingbo Zhou](https://scholar.google.com/citations?user=H_6RQ7oAAAAJ&hl=en), [Shafiq Rayhan Joty](https://raihanjoty.github.io/), [Caiming Xiong](http://cmxiong.com/).
(* indicates equal contribution)
Correspondence to: [Shafiq Rayhan Joty](mailto:[email protected]), [Caiming Xiong](mailto:[email protected])
## Models
### Base models
* [XGen-7B-4K-Base](https://huggingface.co/Salesforce/xgen-7b-4k-base): XGen-7B model pre-trained under 4K sequence length.
* License: Apache-2.0
* [XGen-7B-8K-Base](https://huggingface.co/Salesforce/xgen-7b-8k-base): XGen-7B model pre-trained under 8K sequence length.
* License: Apache-2.0
### Instruction-finetuned models
Model supervised-finetuned on public-domain instructional data. Released for ***research purposes*** only.
* [XGen-7B-8K-Inst](https://huggingface.co/Salesforce/xgen-7b-8k-inst)
## How to run
The training data for the models is tokenized with the OpenAI Tiktoken library.
To use this model, install the `tiktoken` package via `pip`:
```sh
pip install tiktoken
```
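As a quick sanity check that the install worked, the illustrative sketch below loads just the tokenizer and round-trips a short string (the example text is arbitrary):
```python
from transformers import AutoTokenizer

# trust_remote_code=True is needed because the XGen tokenizer is a custom
# wrapper around tiktoken rather than a standard tokenizer class.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-inst", trust_remote_code=True)

text = "Hello, XGen!"
ids = tokenizer(text).input_ids   # encode to token ids
print(ids)
print(tokenizer.decode(ids))      # should print the original text back
```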
The models can be used as auto-regressive samplers as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# The XGen tokenizer is a custom wrapper around tiktoken, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-inst", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-inst", torch_dtype=torch.bfloat16)

# Conversation header expected by the instruction-tuned model.
header = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions.\n\n"
)
article = ""  # insert a document here
prompt = f"### Human: Please summarize the following article.\n\n{article}.\n###"

inputs = tokenizer(header + prompt, return_tensors="pt")
# 50256 is the <|endoftext|> id in the underlying tiktoken encoding.
sample = model.generate(**inputs, do_sample=True, max_new_tokens=2048, top_k=100, eos_token_id=50256)
output = tokenizer.decode(sample[0])
print(output.strip().replace("Assistant:", ""))
```
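The same `### Human: ... \n###` prompt template can be used for tasks other than summarization. The sketch below is an illustration only, not an official recipe: it reuses `tokenizer`, `model`, and `header` from the snippet above, asks a free-form question, and decodes greedily; the question text and `max_new_tokens` value are arbitrary choices.
```python
# Re-using `tokenizer`, `model`, and `header` from the snippet above.
question = "What are the main challenges of long-sequence language modeling?"
prompt = f"### Human: {question}\n###"

inputs = tokenizer(header + prompt, return_tensors="pt")
# Greedy decoding instead of sampling; 50256 is the <|endoftext|> id as above.
sample = model.generate(**inputs, do_sample=False, max_new_tokens=512, eos_token_id=50256)

# Keep only the newly generated tokens and strip the "Assistant:" prefix.
completion = tokenizer.decode(sample[0][inputs.input_ids.shape[1]:])
print(completion.strip().replace("Assistant:", "").strip())
```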
## Citation
```bibtex
@misc{XGen,
  title={Long Sequence Modeling with XGen: A 7B LLM Trained on 8K Input Sequence Length},
  author={Erik Nijkamp and Tian Xie and Hiroaki Hayashi and Bo Pang and Congying Xia and Chen Xing and Jesse Vig and Semih Yavuz and Philippe Laban and Ben Krause and Senthil Purushwalkam and Tong Niu and Wojciech Kryscinski and Lidiya Murakhovs'ka and Prafulla Kumar Choubey and Alex Fabbri and Ye Liu and Rui Meng and Lifu Tu and Meghana Bhat and Chien-Sheng Wu and Silvio Savarese and Yingbo Zhou and Shafiq Rayhan Joty and Caiming Xiong},
  howpublished={arXiv},
  year={2023},
  url={https://arxiv.org/abs/2309.03450}
}
```