SeungJu Mun

nebchi

NUMCHCOMCH

AI & ML interests

LLM & Multimodal

nebchi's activity

liked 2 models 5 days ago

deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw

Text Generation • Updated Aug 23 • 45 • 1

Mineru/gemma-2-9b-finance-it-tools

Text Generation • Updated 28 days ago • 38 • 1

liked a dataset 5 days ago

nayohan/finance-alpaca-ko

Viewer • Updated Jul 17 • 68.9k • 138 • 4

liked a model 5 days ago

BCCard/Llama-3-Kor-BCCard-Finance-8B

Text Generation • Updated Jul 25 • 110 • 3

liked a model 9 days ago

HIT-TMG/KaLM-embedding-multilingual-mini-instruct-v1

liked a model 16 days ago

jinaai/jina-embeddings-v3

Feature Extraction • Updated 1 day ago • 963k • 522

liked a model about 1 month ago

rtzr/ko-gemma-2-9b-it

Text Generation • Updated Jul 15 • 42k • 76

Reacted to beomi's post with 👍 about 1 month ago

🚀 **InfiniTransformer, Gemma/Llama3 based Implementation!** 🌌

> Update @ 2024.04.19: It now supports Llama-3!

> Note: this implementation is unofficial

This implementation is designed to handle virtually infinite context lengths.

Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer

📄 **Read the original Paper:** https://arxiv.org/abs/2404.07143

## **Focus on Infini-Attention**

- **2 Types of Implementation Available:** an attention-layer-only implementation and a model- and training-level implementation.
- **Fixed (Segment-Dependent) Memory Usage:** Enables training larger models on longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite Context Capability:** Train with very long sequences. Imagine handling a sequence length of up to 1 million on standard hardware!
  - For example, you can train Gemma-2B with a 1M sequence length and a 2K segment size on a single H100 GPU.
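
To make the fixed-memory point concrete, here is a minimal single-head NumPy sketch of the segment-wise recurrence described in the Infini-attention paper. It is not the code from the repo: the function names (`elu_plus_one`, `causal_softmax_attention`, `infini_attention`), the scalar gate `beta`, and the small `eps` added to the denominator are assumptions made for illustration. The thing to notice is that the compressive memory has shape `(d_k, d_v)` regardless of how many segments are processed.

```python
import numpy as np

def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, which keeps every activation strictly positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def causal_softmax_attention(q, k, v):
    # Standard causal dot-product attention inside one segment.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def infini_attention(q, k, v, segment_len, beta=0.0, eps=1e-6):
    # q, k: (seq_len, d_k); v: (seq_len, d_v). Single head, no batch dimension.
    d_k, d_v = q.shape[-1], v.shape[-1]
    memory = np.zeros((d_k, d_v))   # compressive memory, fixed size
    norm = np.zeros(d_k)            # normalization term, fixed size
    gate = 1.0 / (1.0 + np.exp(-beta))  # sigmoid of the (assumed scalar) gate
    outputs = []
    for start in range(0, q.shape[0], segment_len):
        qs = q[start:start + segment_len]
        ks = k[start:start + segment_len]
        vs = v[start:start + segment_len]
        # 1) Local attention over the current segment only.
        local = causal_softmax_attention(qs, ks, vs)
        # 2) Retrieve from the memory built from all previous segments.
        sq = elu_plus_one(qs)
        retrieved = (sq @ memory) / ((sq @ norm)[:, None] + eps)
        # 3) Mix long-term (memory) and local context with the gate.
        outputs.append(gate * retrieved + (1.0 - gate) * local)
        # 4) Fold the current segment into the memory (linear update).
        sk = elu_plus_one(ks)
        memory = memory + sk.T @ vs
        norm = norm + sk.sum(axis=0)
    return np.concatenate(outputs, axis=0), memory, norm

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(16, 4)) for _ in range(3))
out, memory, norm = infini_attention(q, k, v, segment_len=4)
print(out.shape, memory.shape, norm.shape)
```

Only one segment's attention matrix is materialized at a time, so peak attention memory depends on the segment length rather than the full sequence length. With a 2K segment, a 1M sequence is processed as on the order of 500 such steps.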

## **Try InfiniTransformer**

1. **Clone the repository:**

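   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   # Enter the cloned directory so the next step can find requirements.txt
   cd InfiniTransformer
   ```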

2. **Install necessary tools:**

   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```

3. **Dive Deep into Custom Training:**
   - Train with extensive sequence lengths using scripts such as `./train.gemma.infini.noclm.1Mseq.sh`.

For more details, please visit the repo: https://github.com/Beomi/InfiniTransformer

Looking forward to your feedback! 😊

P.S. The training loss plot is here 😉