---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: source
    dtype: string
  - name: subset
    dtype: string
  splits:
  - name: train
    num_bytes: 63759065
    num_examples: 23652
  - name: validation
    num_bytes: 6190242
    num_examples: 2042
  - name: test
    num_bytes: 6080212
    num_examples: 2045
  download_size: 45525146
  dataset_size: 76029519
task_categories:
- text2text-generation
- text-generation
- question-answering
- conversational
- summarization
- table-question-answering
language:
- en
tags:
- instruction-tuning
pretty_name: longform
size_categories:
- 10K<n<100K
---

President Donald Trump tweeted Monday that his nominee for HHS secretary, Alex Azar will “be a star for … lower drug prices!” But the record of the former top executive for Eli Lilly, which tripled the price of a top-selling insulin drug while he led its U.S. operation, suggests a different story.
Lilly is one of three drug companies targeted by a class-action lawsuit that accuses the company[...]
“This nominee has impeccable big pharma credentials but non-existent credentials when it comes to concerns on drug prices,” said Peter Welch, co-chair of the House Democrats Drug Pricing Task Force.

| Source | Instruction | Output |
|---|---|---|
| Wikipedia | "Where is San Saba located in Rome, Italy?" Respond in 2 sentences. | San Saba is an ancient basilica church in Rome, Italy. It lies on the so-called "Piccolo Aventino", which is an area close to the ancient Aurelian Walls next to the Aventine Hill and Caelian Hill. |

## Dataset Description
- **Repository:** https://github.com/akoksal/LongForm
- **Paper:** https://arxiv.org/abs/2304.08460
- **Version:** v1.0 - April 18, 2023
- **Contact:** [Abdullatif Köksal](https://twitter.com/akoksal_)
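
A minimal sketch of loading one split and tallying examples by their `source` field follows. It assumes the data is published on the Hugging Face Hub under the ID `akoksal/LongForm` (an assumption: only the GitHub repository is listed above) and that the `datasets` package is installed. `load_longform` and `count_by_source` are hypothetical helper names, not part of the LongForm codebase.

```python
from collections import Counter
from typing import Iterable, Mapping

# Assumed Hub ID; only the GitHub repository is confirmed by this card.
HUB_ID = "akoksal/LongForm"


def load_longform(split: str = "train"):
    """Hypothetical helper: load one split ("train", "validation", or "test")."""
    # Imported lazily so count_by_source works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(HUB_ID, split=split)


def count_by_source(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Hypothetical helper: count examples per value of the `source` field."""
    return Counter(row["source"] for row in rows)
```

For example, `count_by_source(load_longform("validation"))` should return a `Counter` whose values sum to the 2,042 validation examples listed in the metadata, assuming the Hub copy matches that metadata.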
## License
The LongForm project is released under an MIT License with custom limitations reflecting restrictions imposed by OpenAI (for the instruction-generation part) and by the licenses of the language models used (OPT, LLaMA, and T5).
## Citation
```bibtex
@misc{koksal2023longform,
  title={LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction},
  author={Abdullatif Köksal and Timo Schick and Anna Korhonen and Hinrich Schütze},
  year={2023},
  eprint={2304.08460},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```