---
license: other
license_name: nv-ai-foundation-models-license
license_link: >-
  https://developer.download.nvidia.com/ai-foundation-models/nvidia-ai-foundation-models-license-10Nov2023.pdf
language:
- en
pipeline_tag: text-generation
tags:
- nvidia
- Megatron-LM
- Retro
- InstructRetro
- 8B
library_name: Megatron-LM
---

# InstructRetro

[Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro)   [Paper](https://arxiv.org/abs/2310.07713)   [Evaluation Data](https://drive.google.com/drive/folders/1xw-N0LJR_lIWnH6BKzHIb49quVCS_V72?usp=drive_link)   [Model Weights](https://huggingface.co/collections/nvidia/instructretro-65837ea76b60651e01faec8d)

InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) scales up the size of Retro to 48B, featuring the largest LLM pretrained with retrieval (as of December 2023). The obtained foundation model, Retro 48B, largely outperforms the GPT counterpart in terms of perplexity. With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and directly use the InstructRetro decoder backbone as GPT, while achieving comparable results.

**For more information about InstructRetro, check the [Documentation](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro/tools/retro)!**

## Background

Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval augmentation. Retro features practical scalability to support large-scale pretraining from scratch by retrieving from trillions of tokens.
Pretraining with retrieval provides a more efficient storage mechanism for factual knowledge than storing it implicitly within the network's parameters, thus largely reducing model parameters while achieving lower perplexity than standard GPT. Retro also provides the flexibility to update the knowledge stored in LMs [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762) by updating the retrieval database without retraining the LM.

## Overview

### License

The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).

### Supported Hardware

- H100
- A100 80GB, A100 40GB

### Model Version(s)

`retro-8b-instruct-4k`: Pretrained Retro 8B LM with instruction tuning.

### Toolkit

[Megatron-LM Framework](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro)

## Environment

We recommend using a Docker environment to run the code.

### Docker image

We provide a Docker build file in [Dockerfile](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/examples/Dockerfile) for reproduction. The Docker image is based on `nvcr.io/nvidia/pytorch:23.09-py3`.

### Install dependencies

Clone the Megatron repo:

```bash
git clone --branch InstructRetro https://github.com/NVIDIA/Megatron-LM.git
```

If Docker is not available, we recommend starting from a clean conda environment with the following runtime dependencies:

- Python 3.10
- NVIDIA CUDA® 12.2.1
- NVIDIA cuBLAS 12.2.5.6
- NVIDIA cuDNN 8.9.5
- NVIDIA NCCL 2.18.5
- PyTorch 2.1.0a0+32f93b1

Then install Retro-specific dependencies, including:

```bash
pip install -U faiss-gpu
pip install -U transformers
pip install -U sentencepiece
pip install -U h5py
pip install -U nltk
pip install -U einops
```

## Evaluation Command

Download our model checkpoint and tokenizer.
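
Before running generation, it can help to confirm that the Retro-specific dependencies listed under "Install dependencies" are importable in the active environment. The following is a minimal sketch, not part of the Megatron-LM repo. The helper names `check_modules` and `missing_modules` are hypothetical, and the import names are assumed to correspond to the pip packages in that list (for example, `faiss-gpu` is imported as `faiss`).

```python
from importlib.util import find_spec

# Assumed import names for the pip packages listed under "Install dependencies"
# (the `faiss-gpu` package is imported as `faiss`).
RETRO_MODULES = ["faiss", "transformers", "sentencepiece", "h5py", "nltk", "einops"]


def check_modules(module_names):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: find_spec(name) is not None for name in module_names}


def missing_modules(module_names):
    """Return the sorted list of module names that cannot be located."""
    status = check_modules(module_names)
    return sorted(name for name, found in status.items() if not found)


if __name__ == "__main__":
    missing = missing_modules(RETRO_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All Retro-specific dependencies are importable.")
```

The check only looks up each module without importing it, so it does not verify GPU support or package versions.
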
Specify the blank args in the [tools/retro/text_generation/retro_generate.sh](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/text_generation/retro_generate.sh) script, including the model path, Retro workdir, and model-related parameters.

| Parameter | Value | Explanation                       |
|-----------|-------|-----------------------------------|
| mod_par   | 4     | Tensor parallelism                |
| layers    | 32    | Number of layers in the model     |
| hid_dim   | 4096  | Hidden dimension size             |
| heads     | 32    | Number of attention heads         |
| pip_par   | 1     | Pipeline parallelism              |

We present an example command to run Retro generation with the InstructRetro checkpoints for the Natural Questions (NQ) task. The example command is for the 8B InstructRetro. Please specify the directory for the NQ dataset and update the command accordingly for other checkpoints.

```bash
bash tools/retro/text_generation/retro_generate.sh nq 8b greedy test 0 20000 1000 5 pp1 2
```

The generated responses will be saved in the corresponding checkpoint directory. For example, for the 8B InstructRetro, they will be saved to `/retro-generate-nq_5_2_8b_test_greedy_0_20000_1000.txt`.

To evaluate the F1 / Exact Match (EM) scores of the generated responses, we provide an example script to run the evaluation on the NQ dataset. Please specify the directory for the NQ dataset and update the command accordingly for other checkpoints and downstream tasks.

```bash
python3 tools/retro/text_generation/evaluate.py
```

# Citations

See more details from our papers:

[Shall we Pretrain Autoregressive Language Models with Retrieval?
A Comprehensive Study.](https://arxiv.org/abs/2304.06762)

_Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro._ (EMNLP 2023)

[InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.](https://arxiv.org/abs/2310.07713)

_Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro._ (ICML 2024)

Please cite the papers as follows if you use the data or code from this repo:

```bibtex
@inproceedings{wang2023shall,
  title     = {Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study},
  author    = {Boxin Wang and Wei Ping and Peng Xu and Lawrence McAfee and Zihan Liu and Mohammad Shoeybi and Yi Dong and Oleksii Kuchaiev and Bo Li and Chaowei Xiao and Anima Anandkumar and Bryan Catanzaro},
  booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing},
  year      = {2023}
}

@article{wang2023instructretro,
  title   = {InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining},
  author  = {Boxin Wang and Wei Ping and Lawrence McAfee and Peng Xu and Bo Li and Mohammad Shoeybi and Bryan Catanzaro},
  year    = {2023},
  journal = {arXiv preprint arXiv:2310.07713}
}
```