lucifertrj committed
Commit: 151c18f
Parent(s): 6442a22
Update README.md
README.md
CHANGED
@@ -9,8 +9,6 @@ pipeline_tag: text-generation

## Model Description

- <!-- Provide a quick summary of what the model is/does. -->
-

Buddhi is a general-purpose chat model, fine-tuned on Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. This enhancement allows Buddhi to maintain a deeper understanding of context in long documents or conversations, making it particularly adept at tasks requiring extensive context retention, such as comprehensive document summarization, detailed narrative generation, and intricate question-answering.

## Dataset Creation
@@ -36,13 +34,18 @@ Please check out [Flash Attention 2](https://github.com/Dao-AILab/flash-attentio

**Implementation**:

+ > Note: Running the model at its full context length requires roughly 70 GB of VRAM. For experimentation, we limit the context length to 75K instead of 128K, which makes the model testable on 30-35 GB of VRAM.
+

```python
from vllm import LLM, SamplingParams

llm = LLM(
-
-
-
+    model='aiplanet/buddhi-128k-chat-7b',
+    trust_remote_code=True,
+    download_dir='aiplanet/buddhi-128k-chat-7b',
+    dtype='bfloat16',
+    gpu_memory_utilization=1,
+    max_model_len=75000
)

prompts = [
@@ -63,8 +66,12 @@ for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
    print("\n\n")
+
+ # We have also attached a Colab notebook with two more experiments: a long essay and an entire book.
```

+ For the output, check out the Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
+

### Transformers - Basic Implementation

```python
@@ -155,7 +162,7 @@ In order to leverage instruction fine-tuning, your prompt should be surrounded b

```
@misc {Chaitanya890, lucifertrj ,
-  author = {
+  author = { Chaitanya Singhal, Tarun Jain },
  title = { Buddhi-128k-Chat by AI Planet},
  year = 2024,
  url = { https://huggingface.co/aiplanet//Buddhi-128K-Chat },
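The changed hunks above show the new vLLM setup but truncate the prompt and sampling code around it. The sketch below is one plausible way the pieces fit together: it reuses the `LLM` arguments from the diff and wraps the prompt in the `[INST] ... [/INST]` markers that Mistral 7B Instruct models expect (the last hunk header references the README's instruction-format section). The prompt text and sampling values are illustrative assumptions, not taken from the README.

```python
from vllm import LLM, SamplingParams

# Reduced context (75K instead of 128K) so the model fits in roughly 30-35 GB of VRAM,
# as described in the note added by this commit.
llm = LLM(
    model='aiplanet/buddhi-128k-chat-7b',
    trust_remote_code=True,
    dtype='bfloat16',
    gpu_memory_utilization=1,
    max_model_len=75000,
)

# Mistral-style instruction wrapping; the document text here is a placeholder.
prompts = [
    "[INST] Summarise the following report in five bullet points:\n<long document goes here> [/INST]"
]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=512)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
```

Setting `gpu_memory_utilization=1` lets vLLM claim the entire GPU; lowering it (for example to 0.9) leaves headroom for other processes on the same device.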
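The `### Transformers - Basic Implementation` section named in the diff lies outside the changed hunks, so its code does not appear above. As a minimal sketch (an assumption for illustration, not the README's actual snippet), a standard Hugging Face Transformers load of this checkpoint would look roughly like the following; `torch_dtype` and `device_map` are common choices rather than documented values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aiplanet/buddhi-128k-chat-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the bfloat16 used in the vLLM example
    device_map="auto",
    trust_remote_code=True,
)

# Same Mistral-style instruction wrapping as above; the prompt is a placeholder.
prompt = "[INST] Write a short poem about long context windows. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```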