---
base_model: allura-org/G2-9B-Aletheia-v1
library_name: transformers
tags:
- mergekit
- merge
- llama-cpp
- gguf-my-repo
license: gemma
---

# Triangle104/G2-9B-Aletheia-v1-Q5_K_S-GGUF
This model was converted to GGUF format from [`allura-org/G2-9B-Aletheia-v1`](https://huggingface.co/allura-org/G2-9B-Aletheia-v1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allura-org/G2-9B-Aletheia-v1) for more details on the model.

---
Model details:
-
A merge of Sugarquill and Sunfall. I wanted to combine Sugarquill's more novel-like writing style with something that would improve its RP performance and make it more steerable, without adding superfluous synthetic writing patterns.

I quite like Crestfall's Sunfall models, and I felt that the Gemma version of Sunfall would steer the model in this direction when merged in. To keep more of Gemma-2-9B-it-SPPO-iter3's smarts, I decided to apply the Sunfall LoRA on top of it instead of using the published Sunfall model.

I'm generally pleased with the result: this model has a nice, fresh writing style, good character card adherence, and good system prompt following. It should still work well for raw completion storywriting, as that is a trained feature in both merged models.

Made by Auri.

Thanks to Prodeus, Inflatebot and ShotMisser for testing and giving feedback.

### Format
The model responds to Gemma instruct formatting, exactly like its base model.

    <bos><start_of_turn>user
    {user message}<end_of_turn>
    <start_of_turn>model
    {response}<end_of_turn><eos>
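
As a minimal sketch of the template above, the shell function below wraps a single user message in Gemma turn markers and leaves the model turn open for generation. The name `gemma_prompt` is hypothetical, and the leading `<bos>` is left out on the assumption that the inference runtime adds it during tokenization; prepend it yourself if yours does not.

```bash
# Hypothetical helper: wrap one user message in the Gemma instruct template.
# Assumption: the runtime inserts <bos> on its own, so it is not printed here.
gemma_prompt() {
  # $1 is the user message; the model turn is left open so generation continues it.
  printf '<start_of_turn>user\n%s<end_of_turn>\n<start_of_turn>model\n' "$1"
}

# Example: print the formatted prompt for a single message.
gemma_prompt "Describe the harbor at dawn."
```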

### Mergekit config
The following YAML configuration was used to produce this model:

    models:
      - model: allura-org/G2-9B-Sugarquill-v0
        parameters:
          weight: 0.55
          density: 0.4
      - model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3+AuriAetherwiing/sunfall-g2-lora
        parameters:
          weight: 0.45
          density: 0.3
    merge_method: ties
    base_model: UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
    parameters:
      normalize: true
    dtype: bfloat16

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux):

```bash
brew install llama.cpp
```
Invoke the llama.cpp server or the CLI.

### CLI:
```bash
llama-cli --hf-repo Triangle104/G2-9B-Aletheia-v1-Q5_K_S-GGUF --hf-file g2-9b-aletheia-v1-q5_k_s.gguf -p "The meaning to life and the universe is"
```

### Server:
```bash
llama-server --hf-repo Triangle104/G2-9B-Aletheia-v1-Q5_K_S-GGUF --hf-file g2-9b-aletheia-v1-q5_k_s.gguf -c 2048
```
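
Once the server is running, it can be queried over HTTP. The sketch below assumes the server listens on its default local port (8080) and exposes the OpenAI-compatible `/v1/chat/completions` endpoint, with the chat template applied server-side; the helper names `chat_payload` and `ask` and the `SERVER_URL` variable are hypothetical.

```bash
# Assumption: llama-server is reachable locally on its default port.
SERVER_URL="${SERVER_URL:-http://localhost:8080}"

# Hypothetical helper: build the JSON body for a single user message.
# Escapes backslashes and double quotes only; multi-line messages are not handled.
chat_payload() {
  printf '{"messages":[{"role":"user","content":"%s"}],"max_tokens":%d}' \
    "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')" \
    "${2:-128}"
}

# Hypothetical helper: send the request with curl and print the raw JSON reply.
ask() {
  curl -s "$SERVER_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "${2:-128}")"
}
```

With the server from the command above running, `ask "Write a two-sentence scene set in a lighthouse." 200` should return a JSON response containing the generated text.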

Note: You can also use this checkpoint directly by following the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.
```bash
git clone https://github.com/ggerganov/llama.cpp
```

Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag, along with any hardware-specific flags (for example, `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
```bash
cd llama.cpp && LLAMA_CURL=1 make
```

Step 3: Run inference through the main binary.
```bash
./llama-cli --hf-repo Triangle104/G2-9B-Aletheia-v1-Q5_K_S-GGUF --hf-file g2-9b-aletheia-v1-q5_k_s.gguf -p "The meaning to life and the universe is"
```
or
```bash
./llama-server --hf-repo Triangle104/G2-9B-Aletheia-v1-Q5_K_S-GGUF --hf-file g2-9b-aletheia-v1-q5_k_s.gguf -c 2048
```