arxiv:2409.02877

Configurable Foundation Models: Building LLMs from a Modular Perspective

Published on Sep 4
Submitted by fengyao1909 on Sep 9
#2 Paper of the day
Abstract

Recent advancements in LLMs have revealed challenges in computational efficiency and continual scalability stemming from their enormous parameter counts, making it increasingly cumbersome to apply and evolve these models on devices with limited computational resources and in scenarios requiring diverse abilities. Inspired by the modularity of the human brain, there is a growing tendency to decompose LLMs into numerous functional modules, allowing inference with only a subset of modules and dynamic assembly of modules to tackle complex tasks, as in mixture-of-experts. To highlight the inherent efficiency and composability of the modular approach, we coin the term brick to represent each functional module and designate the modularized structure as configurable foundation models. In this paper, we offer a comprehensive overview and investigation of the construction, utilization, and limitations of configurable foundation models. We first formalize modules into emergent bricks (functional neuron partitions that emerge during the pre-training phase) and customized bricks (modules constructed via additional post-training to improve the capabilities and knowledge of LLMs). Based on diverse functional bricks, we further present four brick-oriented operations: retrieval and routing, merging, updating, and growing. These operations allow for dynamic configuration of LLMs based on instructions to handle complex tasks. To verify our perspective, we conduct an empirical analysis of widely used LLMs and find that FFN layers follow modular patterns, with functional specialization of neurons and functional neuron partitions. Finally, we highlight several open issues and directions for future research. Overall, this paper aims to offer a fresh modular perspective on existing LLM research and to inspire the future creation of more efficient and scalable foundation models.
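
The abstract's empirical claim, that FFN layers show functional specialization of neurons, can be pictured with a small probing experiment. The sketch below is not the paper's methodology; it assumes a small causal LM (GPT-2 here), an arbitrary middle layer, a toy activation threshold, and two hand-picked prompt categories, and simply checks which FFN neurons fire for one category but not the other:

```python
# Hedged sketch (not from the paper): probe FFN neuron specialization by
# recording which intermediate FFN neurons activate for prompts from
# different task categories. Model, layer index, threshold, and prompt
# sets are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any small causal LM works for illustration
model = AutoModelForCausalLM.from_pretrained(MODEL)
tok = AutoTokenizer.from_pretrained(MODEL)
model.eval()

prompts = {
    "math": ["Compute 17 * 24.", "What is the derivative of x^2?"],
    "code": ["Write a Python function that reverses a list.", "Fix this off-by-one bug."],
}

# Record post-activation FFN outputs of one layer via a forward hook.
acts = []
layer = model.transformer.h[6].mlp  # GPT-2 layout; other models differ
hook = layer.act.register_forward_hook(lambda m, i, o: acts.append(o.detach()))

active_fraction = {}
with torch.no_grad():
    for category, texts in prompts.items():
        acts.clear()
        for t in texts:
            model(**tok(t, return_tensors="pt"))
        # A neuron counts as "active" if its activation exceeds a small threshold.
        a = torch.cat([x.reshape(-1, x.shape[-1]) for x in acts], dim=0)
        active_fraction[category] = (a > 0.1).float().mean(dim=0)

hook.remove()
# Neurons that fire for "math" prompts but rarely for "code" prompts (and vice
# versa) hint at the functional partitions the paper calls emergent bricks.
math_only = ((active_fraction["math"] > 0.5) & (active_fraction["code"] < 0.1)).sum()
print("neurons preferring math prompts:", int(math_only))
```

Neurons that consistently activate for only one category are the kind of functional partition the paper formalizes as emergent bricks.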

Community

Paper author · Paper submitter

Inspired by the human brain's functional specialization, we propose the concept of the Configurable Foundation Model, a modular approach to building LLMs.

Key Concepts:

  • Emergent bricks: Functional neuron partitions that emerge during pre-training
  • Customized bricks: Post-training modules to enhance LLM capabilities
  • Brick operations: Retrieval and routing, merging, updating, and growing (see the sketch after this list)
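
A minimal, hypothetical sketch of the "retrieval and routing" and "merging" operations (not the paper's implementation): bricks are modeled as small bottleneck adapters, a learned gate retrieves the top-k bricks for each input, and their outputs are merged with softmax weights on top of the base hidden state. All class names and sizes below are invented for illustration.

```python
# Toy illustration of brick retrieval/routing and merging; assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Brick(nn.Module):
    """A customized brick: a small bottleneck module added post-training."""
    def __init__(self, d_model: int, d_bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, x):
        return self.up(F.relu(self.down(x)))

class BrickRouter(nn.Module):
    """Retrieval and routing: score all bricks, keep top-k, merge their outputs."""
    def __init__(self, d_model: int, n_bricks: int = 8, k: int = 2):
        super().__init__()
        self.bricks = nn.ModuleList(Brick(d_model) for _ in range(n_bricks))
        self.gate = nn.Linear(d_model, n_bricks)
        self.k = k

    def forward(self, x):                       # x: (batch, d_model)
        scores = self.gate(x)                   # (batch, n_bricks)
        topk, idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk, dim=-1)       # merge weights for chosen bricks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for b, brick in enumerate(self.bricks):
                mask = (idx[:, slot] == b)      # inputs routed to brick b in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * brick(x[mask])
        return x + out                          # residual: base hidden state + merged bricks

router = BrickRouter(d_model=64)
hidden = torch.randn(4, 64)
print(router(hidden).shape)  # torch.Size([4, 64])
```

The residual form keeps the base model untouched; only the retrieved bricks add task-specific computation, which is what makes inference with a subset of modules cheap.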

Benefits of our approach:
✅ Efficient inference on resource-limited devices
✅ Dynamic assembly of modules for complex tasks
✅ Scalable capabilities through modular design
✅ Potential for continuous model updates and improvements

Our paper aims to offer a fresh perspective on LLM research and inspire more efficient, scalable foundation models. We also discuss open issues and future research directions in this emerging field.

