Update README.md

README.md (CHANGED)
software. Both of them are included in a single file, which can be
downloaded and run as follows:

```
wget https://huggingface.co/Mozilla/gemma-2-27b-it-llamafile/resolve/main/gemma-2-27b-it.Q6_K.llamafile
chmod +x gemma-2-27b-it.Q6_K.llamafile
./gemma-2-27b-it.Q6_K.llamafile
```
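If you script the setup instead of typing the commands, the `chmod +x` step can be reproduced from Python's standard library. This is a sketch only; the helper name and the throwaway demo file are ours, not part of llamafile:

```python
import os
import stat
import tempfile

def make_executable(path):
    # Add the user/group/other execute bits, like `chmod +x`.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Demo on a throwaway file; a real run would target the downloaded llamafile.
with tempfile.NamedTemporaryFile(suffix=".llamafile", delete=False) as f:
    demo_path = f.name
make_executable(demo_path)
is_executable = os.access(demo_path, os.X_OK)
os.unlink(demo_path)
```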

The default mode of operation for these llamafiles is our new command

To instruct Gemma to do role playing, you can customize the system
prompt as follows:

```
./gemma-2-27b-it.Q6_K.llamafile --chat -p "you are mosaic's godzilla"
```
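If you launch the chat mode from a wrapper script, quoting a multi-word system prompt is the usual pitfall. A sketch of building the same invocation as an argument list in Python, where `shlex` takes care of the quoting:

```python
import shlex

# The command from the fence above, as an argv list (no shell quoting needed).
system_prompt = "you are mosaic's godzilla"
cmd = ["./gemma-2-27b-it.Q6_K.llamafile", "--chat", "-p", system_prompt]

# shlex.join renders a copy-pasteable shell line; shlex.split inverts it.
shell_line = shlex.join(cmd)
round_trip = shlex.split(shell_line)
```

A list like `cmd` can be passed directly to `subprocess.run` without involving a shell at all.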

To view the man page, run:

```
./gemma-2-27b-it.Q6_K.llamafile --help
```

To send a request to the OpenAI API compatible llamafile server, try:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-27b-it",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.0
  }'
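The same request can be composed with Python's standard library. This sketch only builds the request object matching the curl example above; it does not send it, since that requires the server to be running:

```python
import json
import urllib.request

# Same body as the curl example: an OpenAI-compatible chat completion.
payload = {
    "model": "gemma-27b-it",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.0,
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# With the server running, you would then do:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
```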

If you don't want the chatbot and you only want to run the server:

```
./gemma-2-27b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
```
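A script that starts the server usually wants to wait until its port accepts connections before sending requests. A minimal sketch (the helper name is ours; 8080 is the port used in the curl example above):

```python
import socket

def port_is_open(host, port, timeout=0.25):
    # Try a TCP connect; success means something is listening there.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After launching the server, poll until e.g.
# port_is_open("127.0.0.1", 8080) returns True.
```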

An advanced CLI mode is provided that's useful for shell scripting. You
can use it by passing the `--cli` flag. For additional help on how it
may be used, pass the `--help` flag.

```
./gemma-2-27b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
```

You then need to fill out the prompt / history template (see below).
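As an illustration of filling out such a template, here is a sketch using Gemma's published `<start_of_turn>` / `<end_of_turn>` turn markers; verify against the template the llamafile itself reports before relying on this exact format:

```python
def fill_template(history):
    # history: list of (role, text) pairs, role in {"user", "model"}.
    # Follows Gemma's turn markers as we understand them; check the
    # template printed by the llamafile before depending on this.
    parts = []
    for role, text in history:
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(parts)

prompt = fill_template([("user", "four score and seven")])
```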

instead downloading the official llamafile release binary from
have the .exe file extension, and then saying:

```
.\llamafile-0.8.15.exe -m gemma-2-27b-it.Q6_K.llamafile
```
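Whether the `-m` workaround is needed can be decided programmatically by comparing the weights file against the 4GB executable ceiling. A sketch with a hypothetical helper name of ours:

```python
import os
import tempfile

WINDOWS_EXE_LIMIT = 4 * 1024**3  # 4 GiB ceiling for a Windows executable

def needs_external_weights(path):
    # Hypothetical helper: True when a file is too large to run directly
    # as a Windows .exe, so it should be loaded with -m instead.
    return os.path.getsize(path) > WINDOWS_EXE_LIMIT

# Demo on a tiny placeholder file rather than real multi-gigabyte weights.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"stub")
    small = f.name
fits = not needs_external_weights(small)
os.unlink(small)
```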

That will overcome the Windows 4GB file size limit, allowing you to

AMD64.

## About Quantization Formats

This model works well with any quantization format. Q6_K is the best
choice overall here.

## Testing

We tested that the gemma2 27b q6_k llamafile produces nearly identical
responses to the Gemma2 model hosted by Google on aistudio.google.com
when temperature is set to zero.

![screenshot of llamafile producing same output as google's hosted gemma service](gemma-proof.png)
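A zero-temperature comparison like the one above can be automated by diffing the two transcripts, since greedy decoding should make faithful implementations agree almost verbatim. A sketch using `difflib`, with illustrative strings standing in for real model outputs:

```python
import difflib

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the two transcripts are identical.
    return difflib.SequenceMatcher(None, a, b).ratio()

# Illustrative placeholders only; real inputs would be the llamafile
# output and the aistudio.google.com output for the same prompt.
local = "The answer is 42."
hosted = "The answer is 42."
score = similarity(local, hosted)
```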

Therefore, it is our belief that the llamafile software faithfully
implements the Gemma model. If you should encounter any divergences,
then try using the BF16 weights, which have the original fidelity.

## See Also