metadata
title: Llmlingua 2
emoji: 💻
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
license: cc-by-nc-sa-4.0

LLMLingua-2 is part of the following project:

LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression

| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |

Check the links above for more information.

Brief Introduction

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
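
To make the idea of removing non-essential tokens concrete, here is a minimal toy sketch. It is not the LLMLingua implementation. It assumes token importance can be approximated by surprisal (negative log-probability) under a small language model. A hypothetical hand-written table, `TOY_PROBS`, stands in for that model, and `drop_low_information` and its `threshold` parameter are names invented for this illustration.

```python
import math

# Hypothetical stand-in for a small language model: token -> probability.
# A real system would obtain these values from a model such as GPT2-small.
TOY_PROBS = {
    "the": 0.20, "of": 0.15, "a": 0.15, "is": 0.10,
    "capital": 0.01, "france": 0.005, "paris": 0.005,
}
DEFAULT_PROB = 0.001  # assumed probability for tokens missing from the table


def surprisal(token: str) -> float:
    """Return -log2 p(token); rarer tokens carry more information."""
    return -math.log2(TOY_PROBS.get(token.lower(), DEFAULT_PROB))


def drop_low_information(tokens: list[str], threshold: float) -> list[str]:
    """Keep only tokens whose surprisal is at least `threshold` bits."""
    return [t for t in tokens if surprisal(t) >= threshold]


prompt = "the capital of France is Paris".split()
compressed = drop_low_information(prompt, threshold=4.0)
print(" ".join(compressed))  # prints "capital France Paris"
```

In this toy setting, frequent function words fall below the threshold and are dropped, while the content words survive. Raising the threshold compresses more aggressively.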

LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.

LLMLingua-2 is a small yet powerful prompt compression method. It is trained via data distillation from GPT-4 to perform token classification with a BERT-level encoder, and it excels at task-agnostic compression. It surpasses LLMLingua on out-of-domain data while running 3x-6x faster.
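
Framing compression as token classification can be sketched as follows. This is a minimal illustration under stated assumptions, not the released model: the per-token keep probabilities are hard-coded where a trained encoder would supply them, and `compress_by_rate` and its `rate` parameter are hypothetical names chosen for this example.

```python
def compress_by_rate(tokens, keep_probs, rate):
    """Keep the highest-scoring tokens, in their original order.

    tokens:     list of token strings
    keep_probs: per-token probability of the "keep" label (assumed to
                come from a token classifier; hard-coded in this sketch)
    rate:       fraction of tokens to retain, 0 < rate <= 1
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    n_keep = max(1, round(len(tokens) * rate))
    # Rank positions by keep probability, highest first (ties: earlier token wins).
    ranked = sorted(range(len(tokens)), key=lambda i: (-keep_probs[i], i))
    kept = sorted(ranked[:n_keep])  # restore original word order
    return [tokens[i] for i in kept]


tokens = "Please could you kindly summarize the attached meeting transcript".split()
keep_probs = [0.10, 0.05, 0.20, 0.02, 0.95, 0.15, 0.40, 0.90, 0.92]
print(compress_by_rate(tokens, keep_probs, rate=1 / 3))
# prints ['summarize', 'meeting', 'transcript']
```

Because every token is scored independently of the downstream task, the same selection step applies to any prompt, which is one way to read "task-agnostic" here.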