|
--- |
|
base_model: |
|
- nothingiisreal/L3.1-8B-Celeste-V1.5 |
|
- Sao10K/Llama-3.1-8B-Stheno-v3.4 |
|
- Sao10K/L3.1-8B-Niitama-v1.1 |
|
- arcee-ai/Llama-3.1-SuperNova-Lite |
|
- akjindal53244/Llama-3.1-Storm-8B |
|
- arcee-ai/Llama-Spark |
|
- grimjim/Llama-3-Instruct-abliteration-LoRA-8B |
|
- crestf411/sunfall-peft |
|
tags: |
|
- llama |
|
- merge |
|
- llama3 |
|
- mixtral |
|
library_name: transformers |
|
--- |
|
|
|
|
|
> [!WARNING]
> **Content:**<br>
> This model's outputs can be a bit unhinged.
|
|
|
# Llama-3.1-Celestial-Stone-2x8B (BF16) |
|
|
|
* *Mixture of Experts (14B).* |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f74b6e6389380c77562762/lBrXRa3sVRinE3cabs-oQ.png) |
|
|
|
Both experts are used in tandem when generating a token. |
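This routing setup is visible in the shipped config. A minimal sketch, assuming the repo id is `v000000/L3.1-Celestial-Stone-2x8B` (adjust if the actual repository name differs):

```python
# Minimal sketch: inspect the MoE routing settings from the model config.
# The repo id is an assumption; substitute the actual repository if it differs.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("v000000/L3.1-Celestial-Stone-2x8B")

print(config.model_type)               # expected: "mixtral"
print(config.num_local_experts)        # expected: 2
print(config.num_experts_per_tok)      # expected: 2 -> both experts fire on every token
print(config.max_position_embeddings)  # expected: 131072
```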
|
|
|
------------------------------------------------------------------------------ |
|
|
|
* *llama.cpp - GGUF.*
|
|
|
# Thank you, mradermacher, for the quants!
|
|
|
----> [GGUF iMatrix](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-i1-GGUF) |
|
|
|
----> [GGUF static](https://huggingface.co/mradermacher/L3.1-Celestial-Stone-2x8B-GGUF) |
|
|
|
# Thank you, QuantFactory, for the quants!
|
|
|
----> [GGUF static](https://huggingface.co/QuantFactory/L3.1-Celestial-Stone-2x8B-GGUF) |
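If you want to run one of these quants directly, here is a minimal sketch using llama-cpp-python with the QuantFactory repo linked above; the quant filename pattern and sampling settings are assumptions, so check the repo's file list for the exact names:

```python
# Minimal sketch: run one of the GGUF quants above via llama-cpp-python.
# The filename pattern is an assumption -- check the repo for the exact quant names.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantFactory/L3.1-Celestial-Stone-2x8B-GGUF",
    filename="*Q4_K_M*",   # pick whichever quant size fits your hardware
    n_ctx=8192,            # the full 131072 context needs far more memory
    n_gpu_layers=-1,       # offload all layers to GPU if available
)

prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Write a short, moody scene set in a lighthouse.<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
out = llm(prompt, max_tokens=256, temperature=0.8)
print(out["choices"][0]["text"])
```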
|
|
|
------------------------------------------------------------------------------ |
|
|
|
*The first expert* is an Instruct 405B distillation/RP vector merge <b>(SuperNova-Lite, Niitama 1.1, Storm)</b>.
|
|
|
*The second expert* is an ERP/Reddit-data merge <b>(Celeste 1.5, Stheno 3.4, Storm)</b>.
|
|
|
------------------------------------------------------------------------------- |
|
|
|
*The base model* is <b>Sao10K/Llama-3.1-8B-Stheno-v3.4</b> with the <b>Sunfall LoRA v0.6.1</b> applied, to make it understand SillyTavern prompts and story writing better.
|
|
|
------------------------------------------------------------------------------- |
|
|
|
# Prompt Template: |
|
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{output}<|eot_id|>
``` |
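As a concrete illustration, here is a minimal transformers sketch that fills this template by hand and generates from it; the repo id is an assumption, and the system prompt is only an example:

```python
# Minimal sketch: build a prompt with the template above and generate.
# The repo id is an assumption; swap in the actual repository if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "v000000/L3.1-Celestial-Stone-2x8B"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

# Example persona-style system prompt (avoids 'assistant/helpful/kind' phrasing).
system_prompt = "You are a vivid, unfiltered storyteller."
user_input = "Continue the scene: the lighthouse keeper hears a knock at midnight."

# Fill the template manually; {output} is left empty so the model completes it.
prompt = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    f"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
    f"{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
)

# The template already includes <|begin_of_text|>, so skip the extra BOS token.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```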
|
|
|
* *Other Details:* |
|
|
|
*The model has a 131,072-token context length and combines Llama-3.1 weights with the Mixtral (MoE) architecture.*
|
|
|
*I did not abliterate the base model at all, so it will refuse zero-shot unethical questions. I recommend avoiding keywords like 'assistant', 'helpful', and 'kind'.*
|
|
|
# Recipe (I'm sorry...): |
|
```yaml |
|
# Each block below is a separate mergekit config;
# the final block is the mergekit-moe config that builds the 2x8B model.

# nearswap: Niitama-v1.1 (+ Instruct-abliteration LoRA) x Storm
slices:
  - sources:
      - model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: nearswap
base_model: Sao10K/L3.1-8B-Niitama-v1.1+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
parameters:
  t:
    - value: 0.0001
dtype: bfloat16
out_type: float16

# slerp: Stheno-v3.4-abliterated x Storm
slices:
  - sources:
      - model: v000000/Llama-3.1-8B-Stheno-v3.4-abliterated
        layer_range: [0, 32]
      - model: akjindal53244/Llama-3.1-Storm-8B
        layer_range: [0, 32]
merge_method: slerp
base_model: v000000/Llama-3.1-8B-Stheno-v3.4-abliterated
parameters:
  t:
    - filter: self_attn
      value: [0.1, 0.6, 0.3, 0.8, 0.5]
    - filter: mlp
      value: [0.9, 0.4, 0.7, 0.2, 0.5]
    - value: 0.5
dtype: float32

# task_arithmetic: SuperNova-Lite (weight 1.0) + Niitorm (weight 0.4)
models:
  - model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      weight: 1.0
  - model: v000000/L3.1-Niitorm-8B-t0.0001
    parameters:
      weight: 0.4
merge_method: task_arithmetic
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
parameters:
  normalize: false
dtype: float16

# task_arithmetic: SuperNova-Lite (weight 0.0) + Niitorm (weight 1.25)
models:
  - model: arcee-ai/Llama-3.1-SuperNova-Lite
    parameters:
      weight: 0.0
  - model: v000000/L3.1-Niitorm-8B-t0.0001
    parameters:
      weight: 1.25
merge_method: task_arithmetic
base_model: arcee-ai/Llama-3.1-SuperNova-Lite
parameters:
  normalize: false
dtype: float16

# slerp: RP-Test-003 onto RP-Test-002 (+ Instruct-abliteration LoRA)
models:
  - model: v000000/L3.1-8B-RP-Test-003-Task_Arithmetic
merge_method: slerp
base_model: v000000/L3.1-8B-RP-Test-002-Task_Arithmetic+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
parameters:
  t:
    - value: [0, 0, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0, 0]
dtype: float16

# task_arithmetic: Celeste-V1.5 (+ Instruct-abliteration LoRA) + Sthenorm + Celeste-V1.5
base_model: nothingiisreal/L3.1-8B-Celeste-V1.5+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
dtype: bfloat16
merge_method: task_arithmetic
parameters:
  normalize: false
slices:
  - sources:
      - layer_range: [0, 32]
        model: nothingiisreal/L3.1-8B-Celeste-V1.5+grimjim/Llama-3-Instruct-abliteration-LoRA-8B
        parameters:
          weight: 0.7
      - layer_range: [0, 32]
        model: v000000/L3.1-Sthenorm-8B
        parameters:
          weight: 0.2
      - layer_range: [0, 32]
        model: nothingiisreal/L3.1-8B-Celeste-V1.5
        parameters:
          weight: 0.2

# mergekit-moe: combine the two experts over the sunfall-stheno base
base_model: crestf411/L3.1-8B-sunfall-stheno-v0.6.1
experts_per_token: 2
local_experts: 2
gate_mode: random
dtype: bfloat16
experts:
  - source_model: v000000/L3.1-Storniitova-8B
  - source_model: x0000001/l3.1-part_aaa
``` |
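Each block in the recipe is a standalone mergekit config. A minimal sketch of how configs like these are typically driven, with hypothetical file names for the saved steps (ordinary merges go through `mergekit-yaml`, while the final expert-combination step uses `mergekit-moe`):

```python
# Hedged sketch: how configs like the ones above are usually run with mergekit.
# The YAML file names are hypothetical -- each block would be saved to its own file.
import subprocess

# Ordinary merges (nearswap / slerp / task_arithmetic) use the mergekit-yaml CLI.
subprocess.run(["mergekit-yaml", "step1_nearswap.yaml", "out/step1"], check=True)

# The final 2x8B step is a mixture-of-experts config, handled by mergekit-moe.
subprocess.run(["mergekit-moe", "moe_final.yaml", "out/celestial-stone-2x8b"], check=True)
```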