Files changed (1)
  1. README.md +10 -45
README.md CHANGED
@@ -1,30 +1,19 @@
1
  ---
2
  license: apache-2.0
3
  pipeline_tag: text-generation
4
- datasets:
5
- - aiplanet/buddhi-dataset
6
- language:
7
- - en
8
  ---
9
 
10
  <p align="center" style="font-size:34px;"><b>Buddhi-128K-Chat</b></p>
11
 
12
  # Buddhi-128K-Chat (7B) vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
13
 
14
- # Read release article: [🔗 Introducing Buddhi: Open-Source Chat Model with a 128K Context Window 🔗 ](https://medium.aiplanet.com/introducing-buddhi-open-source-chat-model-with-a-128k-context-window-06a1848121d0)
15
-
16
- ![4.png](https://cdn-uploads.huggingface.co/production/uploads/630f3058236215d0b7078806/VUY0c4xOGpH9jTNmf6XNU.png)
17
-
18
  ## Model Description
19
 
20
- Buddhi-128K-Chat is a general-purpose chat model with a 128K context window. It is fine-tuned from Mistral 7B Instruct and optimised to handle a context length of up to 128,000 tokens using the YaRN (Yet another RoPE extensioN) technique. This allows Buddhi to retain context across long documents or conversations, which suits tasks such as document summarization, narrative generation, and question answering over long inputs.
21
-
22
- ## Architecture
23
- The Buddhi-128K-Chat model is fine-tuned on the Mistral-7B Instruct base model. We selected the Mistral 7B Instruct v0.2 as the parent model due to its superior reasoning capabilities. The architecture of the Mistral-7B model includes features like Grouped-Query Attention and Byte-fallback BPE tokenizer. Originally, this model has 32,768 maximum position embeddings. To increase the context size to 128K, we needed to modify the positional embeddings, which is where YaRN comes into play.
24
 
25
- In our approach, we used the NTK-aware technique, an alternative to plain positional interpolation. One experiment involved Dynamic-YaRN, which sets the scale factor 's' dynamically, because during inference the sequence length grows by 1 after every token prediction. By integrating these position embeddings with the Mistral-7B Instruct base model, we obtained the 128K model.
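The dynamic scale factor described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used to build Buddhi: it shows only the dynamic scale factor and the NTK-aware base adjustment, not the full YaRN method. The function names, the RoPE base of 10000.0, and the head dimension of 128 are illustrative assumptions; only the 32,768 original position limit comes from the text.

```python
def dynamic_scale(seq_len: int, original_max_len: int = 32768) -> float:
    """Scale factor s, recomputed from the current sequence length.

    s stays at 1 while the sequence fits in the original window and
    grows linearly once the sequence exceeds it.
    """
    return max(1.0, seq_len / original_max_len)


def ntk_aware_base(rope_base: float, scale: float, head_dim: int) -> float:
    """NTK-aware adjustment: enlarge the RoPE base instead of shrinking positions."""
    return rope_base * scale ** (head_dim / (head_dim - 2))


if __name__ == "__main__":
    # rope_base and head_dim below are assumed values for illustration only.
    for seq_len in (4096, 32768, 65536, 131072):
        s = dynamic_scale(seq_len)
        print(seq_len, s, round(ntk_aware_base(10000.0, s, 128)))
```

Because `dynamic_scale` depends on the current length, short prompts keep the original embeddings unchanged and only longer sequences are rescaled.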
26
 
27
- Additionally, we fine-tuned the model on our dataset, contributing one of the very few 128K chat models available in the open-source community, with stronger reasoning capabilities than the others.
28
 
29
  ### Hardware requirements:
30
  > For 128k Context Length
@@ -134,6 +123,13 @@ Why don't scientists trust atoms?
134
  Because they make up everything.
135
  ```
136
 
137
 
138
  ## Prompt Template for Buddhi-128K-Chat
139
 
@@ -146,37 +142,6 @@ In order to leverage instruction fine-tuning, your prompt should be surrounded b
146
 
147
  ```
148
 
149
- # Benchmarks
150
-
151
- ### Long Context Benchmark
152
-
153
- <strong>LongICLBench Banking77</strong>
154
- <div>
155
-
156
- | Model | 1R/2k | 2R/4K | 3R/7K | 4R/9K | 5R/14K |
157
- |-----------------------------------------|-------|-------|-------|-------|--------|
158
- | aiplanet/buddhi-128k-chat-7b | 47.8 | 60.8 | 57.8 | 62.4 | 57.2 |
159
- | NousResearch/Yarn-Mistral-7b-128k | 31.6 | 68.6 | 68 | 47 | 65.6 |
160
- | CallComply/zephyr-7b-beta-128k | 40.2 | 41.2 | 33.6 | 03 | 0 |
161
- | Eric111/Yarn-Mistral-7b-128k-DPO | 28.6 | 62.8 | 58 | 41.6 | 59.8 |
162
-
163
- </div>
164
-
165
- <strong>Short Context Benchmark</strong>
166
- <div>
167
-
168
- | Model | # Params | Average | ARC (25-shot) | HellaSwag (10-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | MMLU (5-shot) |
169
- |-----------------------------------|----------|---------|---------------|---------------------|---------------------|---------------------|---------------|
170
- | aiplanet/buddhi-128k-chat-7b | 7B | 64.42 | 60.84 | 84 | 77.27 | 65.72 | 60.42 |
171
- | migtissera/Tess-XS-v1-3-yarn-128K | 7B | 62.66 | 61.09 | 82.95 | 74.43 | 50.13 | 62.15 |
172
- | migtissera/Tess-XS-v1-3-yarn-128K | 7B | 62.49 | 61.6 | 82.96 | 74.74 | 50.2 | 62.1 |
173
- | Eric111/Yarn-Mistral-7b-128k-DPO | 7B | 60.15 | 60.84 | 82.99 | 78.3 | 43.55 | 63.09 |
174
- | NousResearch/Yarn-Mistral-7b-128k | 7B | 59.42 | 59.64 | 82.5 | 76.95 | 41.78 | 63.02 |
175
- | CallComply/openchat-3.5-0106-128k | 7B | 59.38 | 64.25 | 77.31 | 77.66 | 46.5 | 57.58 |
176
- | CallComply/zephyr-7b-beta-128k | 7B | 54.45 | 58.28 | 81 | 74.74 | 46.1 | 53.57 |
177
-
178
- </div>
179
-
180
  ## Get in Touch
181
 
182
  You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
 
1
  ---
2
  license: apache-2.0
3
  pipeline_tag: text-generation
 
 
 
 
4
  ---
5
 
6
  <p align="center" style="font-size:34px;"><b>Buddhi-128K-Chat</b></p>
7
 
8
  # Buddhi-128K-Chat (7B) vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
9
 
 
 
 
 
10
  ## Model Description
11
 
12
+ Buddhi is a general-purpose chat model, fine-tuned from Mistral 7B Instruct and optimised to handle a context length of up to 128,000 tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. This allows Buddhi to retain context across long documents or conversations, which suits tasks such as document summarization, narrative generation, and question answering over long inputs.
 
 
 
13
 
14
+ ## Dataset Creation
15
 
16
+ ## Architecture
17
 
18
  ### Hardware requirements:
19
  > For 128k Context Length
 
123
  Because they make up everything.
124
  ```
125
 
126
+ ## Evaluation
127
+
128
+ | Model | HellaSWAG | ARC-Challenge | MMLU | TruthfulQA | Winogrande |
129
+ |--------------------------------------|-----------|---------------|-------|------------|------------|
130
+ | Buddhi-128K-Chat | 82.78 | 57.51 | 57.39 | 55.44 | 78.37 |
131
+ | NousResearch/Yarn-Mistral-7b-128k | 80.58 | 58.87 | 60.64 | 42.46 | 72.85 |
132
+
133
 
134
  ## Prompt Template for Buddhi-128K-Chat
135
 
 
142
 
143
  ```
144
 
145
  ## Get in Touch
146
 
147
  You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)