metadata
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
tags:
  - llama
  - open-llama
  - mpt
  - model-fusion
library_name: transformers

Knowledge Fusion of Large Language Models

| 📑 Paper | 🤗 Model | 🐱 GitHub Repo |

Fanqi Wan, Xinting Huang, Deng Cai, Xiaojun Quan, Wei Bi, Shuming Shi

Sun Yat-sen University, Tencent AI Lab

Overview

In this study, we explore knowledge fusion for LLMs: creating a unified model that combines the capabilities and distinctive strengths of multiple structurally diverse LLMs. To achieve this, we introduce FuseLLM, which first leverages the generative distributions of the source LLMs to externalize both their collective knowledge and their individual strengths, and then transfers them to the target LLM through lightweight continual training.

Unlike model ensembling, which requires deploying multiple LLMs in parallel, or weight merging, which is generally limited to LLMs with identical architectures, FuseLLM supports the fusion of multiple LLMs with diverse architectures by explicitly transferring their knowledge and capabilities to a single target LLM.
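To make the idea of fusing generative distributions concrete, here is a minimal, dependency-free sketch for a single token position. It rests on several assumptions that this README does not state: the source distributions are taken to be already aligned to one shared vocabulary, the fused distribution is chosen as the source with the lowest cross-entropy against the gold token, and the weight `lam` between the language-modeling loss and the fusion loss is illustrative. All function names are hypothetical; this is not the released training code.

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold token under one distribution."""
    return -math.log(probs[gold_index])

def fuse_min_ce(source_probs, gold_index):
    """Assumed fusion rule: keep the source distribution that predicts
    the gold token best (lowest cross-entropy)."""
    return min(source_probs, key=lambda p: cross_entropy(p, gold_index))

def soft_cross_entropy(teacher, student):
    """Cross-entropy of the student against a soft teacher target:
    -sum_i teacher_i * log(student_i)."""
    return -sum(t * math.log(s) for t, s in zip(teacher, student) if t > 0)

def fusion_training_loss(source_probs, target_probs, gold_index, lam=0.9):
    """Weighted sum of the usual causal-LM loss and a fusion loss that
    pulls the target distribution toward the fused source distribution.
    The weight `lam` is an illustrative assumption."""
    fused = fuse_min_ce(source_probs, gold_index)
    clm_loss = cross_entropy(target_probs, gold_index)
    fuse_loss = soft_cross_entropy(fused, target_probs)
    return lam * clm_loss + (1 - lam) * fuse_loss

# Toy example: three source models, a vocabulary of three tokens, gold token 0.
sources = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
target = [0.4, 0.4, 0.2]
loss = fusion_training_loss(sources, target, gold_index=0)
```

In this toy setup the loss shrinks as the target distribution moves toward the best source distribution, which is the behaviour continual training would exploit.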


Model Release

We release FuseLLM-7B on Hugging Face Models. It is the fusion of three popular open-source LLMs with distinct architectures and functionalities: Llama-2-7B, OpenLLaMA-7B, and MPT-7B.
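Since the model card lists `transformers` as its library and `text-generation` as its pipeline tag, loading the released checkpoint might look like the sketch below. The repository id `Wanfq/FuseLLM-7B` is an assumption inferred from the model card's owner and name, and the helper names `load_fusellm` and `generate_text` are hypothetical. The default generation settings are illustrative rather than recommended values.

```python
MODEL_ID = "Wanfq/FuseLLM-7B"  # assumed Hub repository id, not confirmed by this README

def load_fusellm(model_id: str = MODEL_ID):
    """Load the tokenizer and model. The first call downloads the weights."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return tokenizer, model

def generate_text(tokenizer, model, prompt: str, max_new_tokens: int = 64) -> str:
    """Tokenize the prompt, generate a continuation, and decode it to text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_fusellm()` followed by `generate_text(tokenizer, model, "Knowledge fusion is")`. Because FuseLLM-7B is a base model obtained by continual training, plain-text prompts without a chat template are the natural input.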

Evaluations across three benchmarks, comprising 42 tasks in total that span reasoning, commonsense, and code generation, confirm that the target model trained with our method outperforms each source LLM and the causal language model baseline on most tasks.


To further illustrate the effectiveness of FuseLLM, we incorporate additional generative benchmarks related to knowledge-based question-answering, reading comprehension, content analysis, machine translation, and theorem application. The results highlight FuseLLM’s superiority over all source LLMs and the baseline.


Since FuseLLM is also applicable to instruction-tuned models, we assess instruction-following performance on the Vicuna Benchmark using GPT-4 as the evaluator. The results show that FuseLLM surpasses each individual source instruction-tuned LLM and the baseline, achieving the best performance under GPT-4 judgment.


Citation

If you find this work relevant to your research or applications, please feel free to cite it!

@misc{wan2024knowledge,
   title={Knowledge Fusion of Large Language Models},
   author={Wan, Fanqi and Huang, Xinting and Cai, Deng and Quan, Xiaojun and Bi, Wei and Shi, Shuming},
   year={2024},
   eprint={xxxx.xxxxx},
   archivePrefix={arXiv},
   primaryClass={cs.CL}
}

Acknowledgments

This repo benefits from Stanford-Alpaca and Explore-Instruct. Thanks for their wonderful work!