title: Llmlingua 2
emoji: 💻
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
license: cc-by-nc-sa-4.0

LLMLingua-2 is part of the following project:

LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression

| Project Page | LLMLingua | LongLLMLingua | LLMLingua-2 | LLMLingua Demo | LLMLingua-2 Demo |

Check the links above for more information!

Brief Introduction 📚

LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.

LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.

LLMLingua-2 is a small yet powerful prompt compression method. It is trained via data distillation from GPT-4 and treats compression as token classification with a BERT-level encoder, which makes it well suited to task-agnostic compression. It surpasses LLMLingua in handling out-of-domain data and is 3x-6x faster.
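
To make the token-classification idea concrete, here is a minimal sketch of the selection step only. It assumes an encoder has already assigned each token a "keep" probability; the helper name `compress_by_keep_scores`, the `rate` parameter as used here, and the scores themselves are hypothetical and made up for illustration. This is not the LLMLingua-2 implementation or its API.

```python
from typing import List, Sequence


def compress_by_keep_scores(
    tokens: Sequence[str],
    keep_probs: Sequence[float],
    rate: float,
) -> List[str]:
    """Keep roughly the top `rate` fraction of tokens by score, in original order.

    Hypothetical helper: in a real system the scores would come from a
    trained token classifier; here they are supplied by the caller.
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    if not tokens:
        return []

    # Number of tokens to retain (always at least one).
    n_keep = max(1, round(len(tokens) * rate))
    # Rank indices by score, highest first; ties favor the earlier token.
    ranked = sorted(range(len(tokens)), key=lambda i: (-keep_probs[i], i))
    # Restore the original word order among the retained tokens.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


tokens = ["Please", "kindly", "summarize", "the",
          "quarterly", "sales", "report", "now"]
# Made-up scores standing in for classifier output.
keep_probs = [0.20, 0.05, 0.95, 0.10, 0.80, 0.90, 0.85, 0.15]

compressed = compress_by_keep_scores(tokens, keep_probs, rate=0.5)
print(" ".join(compressed))  # summarize quarterly sales report
```

Because each token is scored once and then filtered, the selection itself is a single sort over the prompt; the cost of a real compressor is dominated by the encoder's forward pass.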