macadeliccc committed on
Commit 2a74d95
1 Parent(s): 97061bc

Update README.md

Files changed (1)
  1. README.md +20 -61
README.md CHANGED
@@ -51,87 +51,46 @@ Please give ideas and a detailed plan about how to assemble and train an army of
 Switch the commented model definition to use in 4-bit. Should work with 9GB and still exceed the single 7B model by 5-6 points roughly

 ```python
-# Import necessary libraries
-from transformers import AutoTokenizer, AutoModelForCausalLM
+from transformers import AutoModelForCausalLM, AutoTokenizer

-# Load tokenizer and model
-tokenizer = AutoTokenizer.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo")
-model = AutoModelForCausalLM.from_pretrained("macadeliccc/laser-dolphin-mixtral-2x7b-dpo")
-
-# Define a function to generate responses with adjustable hyperparameters
-def generate_response(messages, max_length=50, num_return_sequences=1, temperature=1.0, top_k=50, top_p=1.0):
+def generate_response(prompt):
     """
-    Generate a response from the model based on the input chat messages and hyperparameters.
+    Generate a response from the model based on the input prompt.

     Args:
-    messages (list): List of message dictionaries with 'role' and 'content'.
-    max_length (int): Maximum length of the model's response.
-    num_return_sequences (int): Number of response sequences to generate.
-    temperature (float): Sampling temperature for model generation.
-    top_k (int): The number of highest probability vocabulary tokens to keep for top-k filtering.
-    top_p (float): If set to float < 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
+    prompt (str): Prompt for the model.

     Returns:
     str: The generated response from the model.
     """
-    # Apply chat template to input messages
-    gen_input = tokenizer.apply_chat_template(messages, return_tensors="pt")
+    # Tokenize the input prompt
+    inputs = tokenizer(prompt, return_tensors="pt")

-    # Generate a response
-    output = model.generate(**gen_input,
-                            max_length=max_length,
-                            num_return_sequences=num_return_sequences,
-                            temperature=temperature,
-                            top_k=top_k,
-                            top_p=top_p)
+    # Generate output tokens
+    outputs = model.generate(**inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id, pad_token_id=tokenizer.pad_token_id)

     # Decode the generated tokens to a string
-    response = tokenizer.decode(output[0], skip_special_tokens=True)
-
+    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
     return response

-# Example chat messages
-messages = [
-    {"role": "system", "content": "You are Dolphin, an AI assistant."},
-    {"role": "user", "content": "Write a quicksort algorithm in python"}
-]
+# Load the model and tokenizer
+model_id = "macadeliccc/piccolo-2x7b"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
+
+prompt = "Write a quicksort algorithm in python"

-# Generate and print the response
-response = generate_response(messages, max_length=100, temperature=0.8)
-print("Response:\n", response)
+# Generate and print responses for each language
+print("Response:")
+print(generate_response(prompt), "\n")
 ```

 [colab](https://colab.research.google.com/drive/1cmRhAkDWItV7utHNqNANVZnqDqQNsTUr?usp=sharing) with usage example

 ## Eval

-**Full Precision**
-
-| Tasks    |Version|Filter|n-shot| Metric |Value | |Stderr|
-|----------|-------|------|-----:|--------|-----:|---|-----:|
-|arc_easy  |Yaml   |none  |     0|acc     |0.8413|± |0.0075|
-|          |       |none  |     0|acc_norm|0.8056|± |0.0081|
-|boolq     |Yaml   |none  |     0|acc     |0.8694|± |0.0059|
-|hellaswag |Yaml   |none  |     0|acc     |0.6484|± |0.0048|
-|          |       |none  |     0|acc_norm|0.8354|± |0.0037|
-|openbookqa|Yaml   |none  |     0|acc     |0.3500|± |0.0214|
-|          |       |none  |     0|acc_norm|0.4660|± |0.0223|
-|piqa      |Yaml   |none  |     0|acc     |0.8210|± |0.0089|
-|          |       |none  |     0|acc_norm|0.8303|± |0.0088|
-|winogrande|Yaml   |none  |     0|acc     |0.7577|± |0.0120|
-
-**4-bit (bnb)**
-
-| Tasks    |Version|Filter|n-shot| Metric |Value | |Stderr|
-|----------|-------|------|-----:|--------|-----:|---|-----:|
-|boolq     |Yaml   |none  |     0|acc     |0.8700|± |0.0059|
-|hellaswag |Yaml   |none  |     0|acc     |0.6356|± |0.0048|
-|          |       |none  |     0|acc_norm|0.8270|± |0.0038|
-|openbookqa|Yaml   |none  |     0|acc     |0.3320|± |0.0211|
-|          |       |none  |     0|acc_norm|0.4620|± |0.0223|
-|piqa      |Yaml   |none  |     0|acc     |0.8123|± |0.0091|
-|          |       |none  |     0|acc_norm|0.8259|± |0.0088|
-|winogrande|Yaml   |none  |     0|acc     |0.7490|± |0.0122|
+TODO


 evaluation [colab](https://colab.research.google.com/drive/1FpwgsGzCR4tORTxAwUxpN3PcP22En2xk?usp=sharing)
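The note at the top of the hunk says switching the model definition to 4-bit should fit in roughly 9 GB, and the new snippet relies on the `load_in_4bit=True` shortcut in `from_pretrained`. A minimal sketch of the same load with an explicit `BitsAndBytesConfig`, assuming `bitsandbytes` and a CUDA GPU are available; the NF4/bfloat16 settings are illustrative choices, not part of the commit:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "macadeliccc/piccolo-2x7b"  # repo id taken from the diff above

# Illustrative 4-bit settings: NF4 weights with bfloat16 compute
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place the quantized weights on the available GPU(s)
)

# Rough check against the ~9 GB figure quoted in the README
print(f"weights: {model.get_memory_footprint() / 1e9:.1f} GB")
```

`get_memory_footprint()` counts only parameters and buffers, so actual usage will sit somewhat above it once the KV cache and activations are included.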
 
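The removed version of the snippet formatted its input with `tokenizer.apply_chat_template` and a system/user message list, whereas the new one passes a raw string. If `piccolo-2x7b` also ships a chat template in its tokenizer config (an assumption, not something the diff states), the same prompt could be wrapped that way; the system message below is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/piccolo-2x7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)

# System/user pair mirroring the example removed by this commit
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Write a quicksort algorithm in python"},
]

# Render the conversation with the tokenizer's chat template (assumed to exist)
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(input_ids, max_new_tokens=256, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```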
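The Eval section is left as TODO, and the tables this commit removes are in lm-evaluation-harness output format. A sketch of how comparable zero-shot numbers could be collected for the new model with the harness's Python API, assuming the `lm_eval` package (lm-evaluation-harness v0.4+) is installed; the task list simply mirrors the removed tables:

```python
import lm_eval

# Evaluate the new model on the same zero-shot tasks the old tables reported
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=macadeliccc/piccolo-2x7b,load_in_4bit=True",
    tasks=["arc_easy", "boolq", "hellaswag", "openbookqa", "piqa", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (acc / acc_norm with stderr), the same fields as the old tables
for task, metrics in results["results"].items():
    print(task, metrics)
```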