---
tags:
- merge
- mergekit
- moe
- frankenmoe
- abacusai/Llama-3-Smaug-8B
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
base_model:
- abacusai/Llama-3-Smaug-8B
- cognitivecomputations/dolphin-2.9-llama3-8b
- Weyaxi/Einstein-v6.1-Llama3-8B
- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
license: apache-2.0
---

![](https://raw.githubusercontent.com/saucam/models/main/skyro.png)

# 🚀 Skyro-4X8B

Skyro-4X8B is a Mixture of Experts (MoE) model built from the following models using [Mergekit](https://github.com/arcee-ai/mergekit):

* [abacusai/Llama-3-Smaug-8B](https://huggingface.co/abacusai/Llama-3-Smaug-8B)
* [cognitivecomputations/dolphin-2.9-llama3-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b)
* [Weyaxi/Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
* [dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2](https://huggingface.co/dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2)

## 🧩 Configuration

```yaml
name: "Skyro-4X8B"
base_model: meta-llama/Meta-Llama-3-8B
gate_mode: hidden
experts:
  - source_model: abacusai/Llama-3-Smaug-8B
    positive_prompts:
      - "chat"
      - "assistant"
      - "tell me"
      - "explain"
      - "I want"
  - source_model: cognitivecomputations/dolphin-2.9-llama3-8b
    positive_prompts:
      - "math"
      - "mathematics"
      - "code"
      - "engineering"
      - "solve"
      - "logic"
      - "rationality"
      - "puzzle"
      - "solve"
  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
    positive_prompts:
      - "science"
      - "medical"
      - "physics"
      - "engineering"
      - "math"
      - "logic"
      - "rationality"
      - "mathematics"
      - "solve"
  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
    positive_prompts:
      - "story"
      - "roleplay"
      - "role-play"
      - "storywriting"
      - "character"
      - "narrative"
      - "creative"
```

## Evaluation

|Average|ARC|HellaSwag|MMLU|TruthfulQA|Winogrande|GSM8K|
|-------|---|---------|----|----------|----------|-----|
|66.39|61.26|82.38|66.67|50.15|77.66|60.2|

## 💻 Usage

```python
# Install the dependencies first, e.g. from a shell:
#   pip install -qU transformers accelerate
# (in a Jupyter notebook, prefix the command with "!")

from transformers import AutoTokenizer
import transformers
import torch

model = "saucam/Skyro-4X8B"
messages = [{"role": "user", "content": "In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?"}]

# Format the conversation with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Load the model in half precision and spread it across the available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

## Sample output

```
config.json: 100%|██████████████████████████████████████████████████████████████| 878/878 [00:00<00:00, 4.18MB/s]
model.safetensors.index.json: 100%|██████████████████████████████████████████| 53.5k/53.5k [00:00<00:00, 101MB/s]
model-00001-of-00006.safetensors: 100%|█████████████████████████████████████| 9.89G/9.89G [03:47<00:00, 43.4MB/s]
model-00002-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:23<00:00, 49.0MB/s]
model-00003-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:44<00:00, 44.5MB/s]
model-00004-of-00006.safetensors: 100%|█████████████████████████████████████| 9.90G/9.90G [03:30<00:00, 46.9MB/s]
model-00005-of-00006.safetensors: 100%|█████████████████████████████████████| 9.08G/9.08G [03:08<00:00, 48.1MB/s]
model-00006-of-00006.safetensors: 100%|█████████████████████████████████████| 1.05G/1.05G [00:20<00:00, 51.3MB/s]
Downloading shards: 100%|█████████████████████████████████████████████████████████| 6/6 [17:58<00:00, 179.78s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 6/6 [01:27<00:00, 14.59s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
<|im_start|>user
In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?<|im_end|>
<|im_start|>assistant
Let's denote the number of votes candidate A got as \( A \).

Candidate B got 50% more votes than candidate A, so candidate B got \( A + 0.5A = 1.5A \) votes.

Candidate C got the rest of the votes, which means \( C = 100 - (A + 1.5A) \).

We know that candidate A got 20% of the votes, so \( A = 20\% \times 100 = 20 \).

Now we can calculate candidate C's votes:

\( C = 100 - (20 + 1.5 \times 20) \)
\( C = 100 - (20 + 30) \)
\( C = 100 - 50 \)
\( C = 50 \).

Therefore, candidate C got 50 votes.<|im_end|>
```
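The arithmetic in the sample answer can be checked independently with a few lines of plain Python that follow the same steps: A receives 20% of the 100 votes, B receives 50% more than A, and C receives whatever is left. The helper name `votes_for_c` and its percentage-based parameters are hypothetical, chosen only for this illustration; `fractions.Fraction` is used so the percentages are computed exactly rather than with floating-point rounding.

```python
from fractions import Fraction


def votes_for_c(total_voters: int, a_pct: int, b_extra_pct: int) -> Fraction:
    """Hypothetical helper: votes left for C after A and B are counted.

    a_pct       -- A's share of all votes, in percent (20 in the sample prompt)
    b_extra_pct -- how many percent more votes B got than A (50 in the sample)
    """
    a = Fraction(a_pct, 100) * total_voters        # A = 20% of 100 = 20
    b = a * Fraction(100 + b_extra_pct, 100)       # B = 1.5 * A = 30
    return total_voters - (a + b)                  # C gets the remainder


print(votes_for_c(100, 20, 50))
```

This agrees with the model's final answer of 50 votes for candidate C.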