---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
tags:
- gemma
- gguf
---
# Gemma 7B Instruct GGUF
Contains Q4 and Q8 quantized GGUF files of [google/gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b).
## Performance
| Variant | Device | Throughput |
| --- | --- | --- |
| Q4 | RTX 2070S | 22 tok/s |
| Q4 | M1 Pro 10-core GPU | 28 tok/s |
| Q8 | RTX 2070S | 7 tok/s (only 23/29 layers fit on the GPU) |
| Q8 | M1 Pro 10-core GPU | 17 tok/s |
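GGUF files like these are typically run with llama.cpp. A minimal sketch, assuming a built llama.cpp checkout with GPU support; the filename and prompt below are illustrative, not actual file names from this repository:

```shell
# Run a Q4 quant with llama.cpp, offloading all 29 layers to the GPU.
# Filename is hypothetical -- substitute the actual .gguf file from this repo.
./main -m gemma-7b-it.Q4.gguf \
  -p "Why is the sky blue?" \
  -n 256 \
  -ngl 29
```

For the Q8 quant on an 8 GB card such as the RTX 2070S, a lower `-ngl` value (e.g. 23, matching the table above) may be needed so the layers fit in VRAM; the remaining layers then run on the CPU, which accounts for the lower throughput.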