---
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
tags:
- gemma
- gguf
---

# Gemma 7B Instruct GGUF

Contains Q4 & Q8 quantized GGUFs for [google/gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b).
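
One way to run these files locally is with [llama-cpp-python](https://github.com/abetlen/llama-cpp-python); any llama.cpp-based runtime will also work. A minimal sketch, assuming the Q4 file is named `gemma-7b-it-Q4.gguf` (substitute the actual filename from this repo):

```python
from llama_cpp import Llama

# The filename is a placeholder; point model_path at the actual Q4 or Q8
# file downloaded from this repo.
llm = Llama(
    model_path="./gemma-7b-it-Q4.gguf",
    n_ctx=4096,       # context window size
    n_gpu_layers=-1,  # offload every layer to the GPU; reduce if VRAM is tight
)

# Gemma instruction-tuned models use the <start_of_turn>/<end_of_turn> chat format.
prompt = (
    "<start_of_turn>user\n"
    "Explain in one sentence what a GGUF file is.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

result = llm(prompt, max_tokens=128, stop=["<end_of_turn>"])
print(result["choices"][0]["text"])
```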
|
|
|
## Performance

| Variant | Device | Throughput |
| - | - | - |
| Q4 | RTX 2070S | 22 tok/s |
| Q4 | M1 Pro 10-core GPU | 28 tok/s |
| Q8 | RTX 2070S | 7 tok/s (could only offload 23/29 layers to GPU) |
| Q8 | M1 Pro 10-core GPU | 17 tok/s |
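
The Q8 figure on the RTX 2070S reflects partial offload: only 23 of the 29 layers fit, presumably because the Q8 weights exceed the card's 8 GB of VRAM, so the rest run on the CPU. If you want to sanity-check a tok/s number yourself, the sketch below (same assumptions as above: llama-cpp-python and a placeholder filename) times a single completion; the measurement includes prompt processing, so it will read slightly low.

```python
import time

from llama_cpp import Llama

# Filename is a placeholder; 23 GPU layers matches the Q8 / RTX 2070S row above.
llm = Llama(
    model_path="./gemma-7b-it-Q8.gguf",
    n_ctx=4096,
    n_gpu_layers=23,  # layers that don't fit on the GPU run on the CPU
)

prompt = (
    "<start_of_turn>user\n"
    "Write a short paragraph about llamas.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

start = time.perf_counter()
result = llm(prompt, max_tokens=256, stop=["<end_of_turn>"])
elapsed = time.perf_counter() - start

generated = result["usage"]["completion_tokens"]
print(f"{generated / elapsed:.1f} tok/s (includes prompt processing time)")
```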