Update README.md
#5
by
Chaitanya890
- opened
README.md
CHANGED
@@ -1,30 +1,19 @@
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
pipeline_tag: text-generation
|
4 |
-
datasets:
|
5 |
-
- aiplanet/buddhi-dataset
|
6 |
-
language:
|
7 |
-
- en
|
8 |
---
|
9 |
|
10 |
<p align="center" style="font-size:34px;"><b>Buddhi-128K-Chat</b></p>
|
11 |
|
12 |
# Buddhi-128K-Chat (7B) vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
|
13 |
|
14 |
-
# Read release article: [🔗 Introducing Buddhi: Open-Source Chat Model with a 128K Context Window 🔗 ](https://medium.aiplanet.com/introducing-buddhi-open-source-chat-model-with-a-128k-context-window-06a1848121d0)
|
15 |
-
|
16 |
-
![4.png](https://cdn-uploads.huggingface.co/production/uploads/630f3058236215d0b7078806/VUY0c4xOGpH9jTNmf6XNU.png)
|
17 |
-
|
18 |
## Model Description
|
19 |
|
20 |
-
Buddhi
|
21 |
-
|
22 |
-
## Architecture
|
23 |
-
The Buddhi-128K-Chat model is fine-tuned from the Mistral-7B Instruct base model. We selected Mistral 7B Instruct v0.2 as the parent model for its reasoning capabilities. The Mistral-7B architecture includes features such as Grouped-Query Attention and a byte-fallback BPE tokenizer. The parent model has a maximum of 32,768 position embeddings. To increase the context size to 128K, we needed to modify the positional embeddings, which is where YaRN comes into play.
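
The arithmetic behind that extension can be sketched in a few lines. This is only an illustration, not the configuration actually used to train Buddhi: it assumes that "128K" means 128 × 1024 = 131,072 tokens, and it uses the attention scaling formula recommended in the YaRN paper, sqrt(1/t) = 0.1 · ln(s) + 1. The function and constant names are hypothetical.

```python
import math

# 32,768 comes from the paragraph above; 128K is ASSUMED to mean 128 * 1024 tokens.
ORIGINAL_MAX_POSITIONS = 32_768
TARGET_MAX_POSITIONS = 128 * 1024


def yarn_scale_factor(original: int, target: int) -> float:
    """Ratio s by which the context window is extended."""
    return target / original


def yarn_attention_scale(factor: float) -> float:
    """Attention scaling sqrt(1/t) = 0.1 * ln(s) + 1 from the YaRN paper.

    The guard for factor <= 1 (no extension, so no scaling) is an
    illustrative choice of this sketch.
    """
    if factor <= 1.0:
        return 1.0
    return 0.1 * math.log(factor) + 1.0


factor = yarn_scale_factor(ORIGINAL_MAX_POSITIONS, TARGET_MAX_POSITIONS)
attention_scale = yarn_attention_scale(factor)
print(f"scale factor s = {factor}")
print(f"attention scale = {attention_scale:.4f}")
```

Under these assumptions the scale factor works out to 4, meaning the rotary position embeddings must cover four times the parent model's original range.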
|
24 |
|
25 |
-
|
26 |
|
27 |
-
|
28 |
|
29 |
### Hardware requirements:
|
30 |
> For 128k Context Length
|
@@ -134,6 +123,13 @@ Why don't scientists trust atoms?
|
|
134 |
Because they make up everything.
|
135 |
```
|
136 |
|
137 |
|
138 |
## Prompt Template for Buddhi-128K-Chat
|
139 |
|
@@ -146,37 +142,6 @@ In order to leverage instruction fine-tuning, your prompt should be surrounded b
|
|
146 |
|
147 |
```
|
148 |
|
149 |
-
# Benchmarks
|
150 |
-
|
151 |
-
### Long Context Benchmark
|
152 |
-
|
153 |
-
<strong>LongICLBench Banking77</strong>
|
154 |
-
<div>
|
155 |
-
|
156 |
-
| Model | 1R/2k | 2R/4K | 3R/7K | 4R/9K | 5R/14K |
|
157 |
-
|-----------------------------------------|-------|-------|-------|-------|--------|
|
158 |
-
| aiplanet/buddhi-128k-chat-7b | 47.8 | 60.8 | 57.8 | 62.4 | 57.2 |
|
159 |
-
| NousResearch/Yarn-Mistral-7b-128k | 31.6 | 68.6 | 68 | 47 | 65.6 |
|
160 |
-
| CallComply/zephyr-7b-beta-128k | 40.2 | 41.2 | 33.6 | 03 | 0 |
|
161 |
-
| Eric111/Yarn-Mistral-7b-128k-DPO | 28.6 | 62.8 | 58 | 41.6 | 59.8 |
|
162 |
-
|
163 |
-
</div>
|
164 |
-
|
165 |
-
<strong>Short Context Benchmark</strong>
|
166 |
-
<div>
|
167 |
-
|
168 |
-
| Model | # Params | Average | ARC (25-shot) | HellaSwag (10-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | MMLU (5-shot) |
|
169 |
-
|-----------------------------------|----------|---------|---------------|---------------------|---------------------|---------------------|---------------|
|
170 |
-
| aiplanet/buddhi-128k-chat-7b | 7B | 64.42 | 60.84 | 84 | 77.27 | 65.72 | 60.42 |
|
171 |
-
| migtissera/Tess-XS-vl-3-yarn-128K | 7B | 62.66 | 61.09 | 82.95 | 74.43 | 50.13 | 62.15 |
|
172 |
-
| migtissera/Tess-XS-v1-3-yarn-128K | 7B | 62.49 | 61.6 | 82.96 | 74.74 | 50.2 | 62.1 |
|
173 |
-
| Eric111/Yarn-Mistral-7b-128k-DPO | 7B | 60.15 | 60.84 | 82.99 | 78.3 | 43.55 | 63.09 |
|
174 |
-
| NousResearch/Yarn-Mistral-7b-128k | 7B | 59.42 | 59.64 | 82.5 | 76.95 | 41.78 | 63.02 |
|
175 |
-
| CallComply/openchat-3.5-0106-128k | 7B | 59.38 | 64.25 | 77.31 | 77.66 | 46.5 | 57.58 |
|
176 |
-
| CallComply/zephyr-7b-beta-128k | 7B | 54.45 | 58.28 | 81 | 74.74 | 46.1 | 53.57 |
|
177 |
-
|
178 |
-
</div>
|
179 |
-
|
180 |
## Get in Touch
|
181 |
|
182 |
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
pipeline_tag: text-generation
|
4 |
---
|
5 |
|
6 |
<p align="center" style="font-size:34px;"><b>Buddhi-128K-Chat</b></p>
|
7 |
|
8 |
# Buddhi-128K-Chat (7B) vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
|
9 |
|
10 |
## Model Description
|
11 |
|
12 |
+
Buddhi is a general-purpose chat model, fine-tuned from Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The longer context window lets Buddhi retain more of a long document or conversation, which suits tasks such as full-document summarization, long-form narrative generation, and question answering over long inputs.
|
13 |
|
14 |
+
## Dataset Creation
|
15 |
|
16 |
+
## Architecture
|
17 |
|
18 |
### Hardware requirements:
|
19 |
> For 128k Context Length
|
|
|
123 |
Because they make up everything.
|
124 |
```
|
125 |
|
126 |
+
## Evaluation
|
127 |
+
|
128 |
+
| Model | HellaSWAG | ARC-Challenge | MMLU | TruthfulQA | Winogrande |
|
129 |
+
|--------------------------------------|-----------|---------------|-------|------------|------------|
|
130 |
+
| Buddhi-128K-Chat | 82.78 | 57.51 | 57.39 | 55.44 | 78.37 |
|
131 |
+
| NousResearch/Yarn-Mistral-7b-128k | 80.58 | 58.87 | 60.64 | 42.46 | 72.85 |
|
132 |
+
|
133 |
|
134 |
## Prompt Template for Buddhi-128K-Chat
|
135 |
|
|
|
142 |
|
143 |
```
|
144 |
|
145 |
## Get in Touch
|
146 |
|
147 |
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
|