Update README.md

README.md (CHANGED)
software. Both of them are included in a single file, which can be
downloaded and run as follows:

```
wget https://huggingface.co/Mozilla/gemma-2-27b-it-llamafile/resolve/main/gemma-2-27b-it.Q6_K.llamafile
chmod +x gemma-2-27b-it.Q6_K.llamafile
./gemma-2-27b-it.Q6_K.llamafile
```
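If you script the setup instead of typing the commands, the `chmod +x` step can be reproduced from Python's standard library. This is a sketch only; the helper name and the throwaway demo file are ours, not part of llamafile:

```python
import os
import stat
import tempfile

def make_executable(path):
    # Add the user/group/other execute bits, like `chmod +x`.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

# Demo on a throwaway file; a real run would target the downloaded llamafile.
with tempfile.NamedTemporaryFile(suffix=".llamafile", delete=False) as f:
    demo_path = f.name
make_executable(demo_path)
is_executable = os.access(demo_path, os.X_OK)
os.unlink(demo_path)
```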

The default mode of operation for these llamafiles is our new command

To instruct Gemma to do role playing, you can customize the system
prompt as follows:

```
./gemma-2-27b-it.Q6_K.llamafile --chat -p "you are mosaic's godzilla"
```
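If you launch the chat mode from a wrapper script, quoting a multi-word system prompt is the usual pitfall. A sketch of building the same invocation as an argument list in Python, where `shlex` takes care of the quoting:

```python
import shlex

# The command from the fence above, as an argv list (no shell quoting needed).
system_prompt = "you are mosaic's godzilla"
cmd = ["./gemma-2-27b-it.Q6_K.llamafile", "--chat", "-p", system_prompt]

# shlex.join renders a copy-pasteable shell line; shlex.split inverts it.
shell_line = shlex.join(cmd)
round_trip = shlex.split(shell_line)
```

A list like `cmd` can be passed directly to `subprocess.run` without involving a shell at all.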

To view the man page, run:

```
./gemma-2-27b-it.Q6_K.llamafile --help
```

To send a request to the OpenAI API compatible llamafile server, try:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-27b-it",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.0
  }'
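The same request can be composed with Python's standard library. This sketch only builds the request object matching the curl example above; it does not send it, since that requires the server to be running:

```python
import json
import urllib.request

# Same body as the curl example: an OpenAI-compatible chat completion.
payload = {
    "model": "gemma-27b-it",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.0,
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# With the server running, you would then do:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)
```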

If you don't want the chatbot and you only want to run the server:

```
./gemma-2-27b-it.Q6_K.llamafile --server --nobrowser --host 0.0.0.0
```
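A script that starts the server usually wants to wait until its port accepts connections before sending requests. A minimal sketch (the helper name is ours; 8080 is the port used in the curl example above):

```python
import socket

def port_is_open(host, port, timeout=0.25):
    # Try a TCP connect; success means something is listening there.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After launching the server, poll until e.g.
# port_is_open("127.0.0.1", 8080) returns True.
```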

An advanced CLI mode is provided that's useful for shell scripting. You
can use it by passing the `--cli` flag. For additional help on how it
may be used, pass the `--help` flag.

```
./gemma-2-27b-it.Q6_K.llamafile --cli -p 'four score and seven' --log-disable
```

You then need to fill out the prompt / history template (see below).
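As an illustration of filling out such a template, here is a sketch using Gemma's published `<start_of_turn>` / `<end_of_turn>` turn markers; verify against the template the llamafile itself reports before relying on this exact format:

```python
def fill_template(history):
    # history: list of (role, text) pairs, role in {"user", "model"}.
    # Follows Gemma's turn markers as we understand them; check the
    # template printed by the llamafile before depending on this.
    parts = []
    for role, text in history:
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # cue the model to respond
    return "".join(parts)

prompt = fill_template([("user", "four score and seven")])
```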

instead downloading the official llamafile release binary from
have the .exe file extension, and then saying:

```
.\llamafile-0.8.15.exe -m gemma-2-27b-it.Q6_K.llamafile
```
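Whether the `-m` workaround is needed can be decided programmatically by comparing the weights file against the 4GB executable ceiling. A sketch with a hypothetical helper name of ours:

```python
import os
import tempfile

WINDOWS_EXE_LIMIT = 4 * 1024**3  # 4 GiB ceiling for a Windows executable

def needs_external_weights(path):
    # Hypothetical helper: True when a file is too large to run directly
    # as a Windows .exe, so it should be loaded with -m instead.
    return os.path.getsize(path) > WINDOWS_EXE_LIMIT

# Demo on a tiny placeholder file rather than real multi-gigabyte weights.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"stub")
    small = f.name
fits = not needs_external_weights(small)
os.unlink(small)
```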

That will overcome the Windows 4GB file size limit, allowing you to

AMD64.

## About Quantization Formats

This model works well with any quantization format. Q6_K is the best
choice overall here.

## Testing

We tested that the gemma2 27b q6_k llamafile produces nearly identical
responses to the Gemma2 model hosted by Google on aistudio.google.com
when temperature is set to zero.

![screenshot of llamafile producing same output as google's hosted gemma service](gemma-proof.png)
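A zero-temperature comparison like the one above can be automated by diffing the two transcripts, since greedy decoding should make faithful implementations agree almost verbatim. A sketch using `difflib`, with illustrative strings standing in for real model outputs:

```python
import difflib

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the two transcripts are identical.
    return difflib.SequenceMatcher(None, a, b).ratio()

# Illustrative placeholders only; real inputs would be the llamafile
# output and the aistudio.google.com output for the same prompt.
local = "The answer is 42."
hosted = "The answer is 42."
score = similarity(local, hosted)
```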

Therefore, it is our belief that the llamafile software faithfully
implements the Gemma model. If you should encounter any divergences,
then try using the BF16 weights, which have the original fidelity.

## See Also