---
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- Himitsui/Kaiju-11B
- Sao10K/Fimbulvetr-11B-v2
- decapoda-research/Antares-11b-v2
- beberik/Nyxene-v3-11B
base_model:
- Himitsui/Kaiju-11B
- Sao10K/Fimbulvetr-11B-v2
- decapoda-research/Antares-11b-v2
- beberik/Nyxene-v3-11B
---
|
|
|
# Umbra-v3-MoE-4x11b

Umbra-v3-MoE-4x11b is a Mixture of Experts (MoE) model built from the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [Himitsui/Kaiju-11B](https://huggingface.co/Himitsui/Kaiju-11B)
* [Sao10K/Fimbulvetr-11B-v2](https://huggingface.co/Sao10K/Fimbulvetr-11B-v2)
* [decapoda-research/Antares-11b-v2](https://huggingface.co/decapoda-research/Antares-11b-v2)
* [beberik/Nyxene-v3-11B](https://huggingface.co/beberik/Nyxene-v3-11B)
|
|
|
## 🧩 Configuration

```yaml
base_model: vicgalle/CarbonBeagle-11B-truthy
gate_mode: hidden
dtype: bfloat16
experts_per_token: 4
experts:
  - source_model: Himitsui/Kaiju-11B
    positive_prompts:
      - "Imagine"
      - "Create"
      - "Envision"
      - "Fantasize"
      - "Invent"
      - "Narrate"
      - "Plot"
      - "Portray"
      - "Storytell"
      - "Visualize"
      - "Describe"
      - "Develop"
      - "Forge"
      - "Craft"
      - "Conceptualize"
      - "Dream"
      - "Concoct"
      - "Characterize"
    negative_prompts:
      - "Recite"
      - "Report"
      - "Summarize"
      - "Enumerate"
      - "List"
      - "Cite"

  - source_model: Sao10K/Fimbulvetr-11B-v2
    positive_prompts:
      - "Dramatize"
      - "Embody"
      - "Illustrate"
      - "Perform"
      - "Roleplay"
      - "Simulate"
      - "Stage"
      - "Unfold"
      - "Weave"
      - "Design"
      - "Outline"
      - "Script"
      - "Sketch"
      - "Spin"
      - "Depict"
      - "Render"
      - "Fashion"
      - "Conceive"
    negative_prompts:
      - "Analyze"
      - "Critique"
      - "Dissect"
      - "Explain"
      - "Clarify"
      - "Interpret"

  - source_model: decapoda-research/Antares-11b-v2
    positive_prompts:
      - "Solve"
      - "Respond"
      - "Convey"
      - "Disclose"
      - "Expound"
      - "Narrate"
      - "Present"
      - "Reveal"
      - "Specify"
      - "Uncover"
      - "Decode"
      - "Examine"
      - "Report"
      - "Survey"
      - "Validate"
      - "Verify"
      - "Question"
      - "Query"
    negative_prompts:
      - "Divert"
      - "Obscure"
      - "Overstate"
      - "Undermine"
      - "Misinterpret"
      - "Skew"

  - source_model: beberik/Nyxene-v3-11B
    positive_prompts:
      - "Explain"
      - "Instruct"
      - "Clarify"
      - "Educate"
      - "Guide"
      - "Inform"
      - "Teach"
      - "Detail"
      - "Elaborate"
      - "Enlighten"
      - "Advise"
      - "Interpret"
      - "Analyze"
      - "Define"
      - "Demonstrate"
      - "Illustrate"
      - "Simplify"
      - "Summarize"
    negative_prompts:
      - "Speculate"
      - "Fabricate"
      - "Exaggerate"
      - "Mislead"
      - "Confuse"
      - "Distort"
```
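
With `gate_mode: hidden`, mergekit initializes each expert's router weights from hidden-state embeddings of its positive and negative prompts, so inputs that resemble an expert's positive list route toward that expert. Here is a toy sketch of the idea only — the `embed` function and the prompt subsets below are invented for illustration and are not mergekit's actual implementation:

```python
import numpy as np

HIDDEN = 16  # toy hidden size for illustration; the real model's is far larger

def embed(prompt):
    # Invented stand-in for a hidden-state embedding; in mergekit's "hidden"
    # gate_mode these vectors come from the base model's hidden states.
    raw = prompt.encode().ljust(HIDDEN, b" ")[:HIDDEN]
    return np.frombuffer(raw, dtype=np.uint8).astype(np.float64)

# A small subset of the prompts from the config above, for two of the experts.
experts = {
    "Himitsui/Kaiju-11B": (["Imagine", "Create"], ["Recite", "List"]),
    "Sao10K/Fimbulvetr-11B-v2": (["Roleplay", "Dramatize"], ["Analyze", "Explain"]),
}

# One router row per expert: mean positive embedding minus mean negative one,
# so prompts similar to the positives score high and to the negatives score low.
gate = np.stack([
    np.mean([embed(p) for p in pos], axis=0) - np.mean([embed(n) for n in neg], axis=0)
    for pos, neg in experts.values()
])

def route(prompt, top_k=1):
    """Return the top_k experts a router initialized this way would pick."""
    logits = gate @ embed(prompt)
    names = list(experts)
    return [names[i] for i in np.argsort(logits)[::-1][:top_k]]
```

Note that with `experts_per_token: 4` and four experts, every expert is active for each token, so the gate blends expert outputs rather than selecting a subset.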
|
|
|
## 💻 Usage

```python
!pip install -qU transformers bitsandbytes accelerate

import torch
import transformers
from transformers import AutoTokenizer

model = "Steelskull/Umbra-v3-MoE-4x11b"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```