StableLM 2 Zephyr 1.6B

Model Details

Model Name: StableLM 2 Zephyr 1.6B (GGUF format)

Quantization Options:

  • F16 (16-bit float)
  • Q8_0 (8-bit integer)

This repository hosts quantized versions of the StabilityAI StableLM 2 Zephyr 1.6B model for efficient inference with the llama.cpp library. The quantized files trade a small amount of precision for lower memory usage and faster inference, making them suitable for a variety of platforms, including constrained hardware such as ARM64 boards and low-memory x86 machines.

Core Libraries

The original StableLM 2 Zephyr 1.6B model has been converted to the GGUF format used by llama.cpp, so it can be loaded directly by llama.cpp and the inference tools built on top of it.
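If you want to reproduce the conversion yourself, the sketch below builds llama.cpp and converts the original Hugging Face checkpoint to a 16-bit GGUF file. The checkpoint path is a placeholder, and the converter script name and flags vary between llama.cpp versions, so treat this as an outline rather than an exact recipe:

# Build llama.cpp (assumes git and a C/C++ toolchain are available)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the original HF checkpoint to 16-bit GGUF
# (named convert-hf-to-gguf.py in recent llama.cpp versions)
python convert-hf-to-gguf.py /path/to/stablelm-2-zephyr-1_6b --outtype f16 --outfile ggml-model-f16.gguf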

Quantized Model Files

Format  File Name             Size     Description
F16     ggml-model-f16.gguf   ~3.2 GB  16-bit float precision for balanced speed and accuracy
Q8_0    ggml-model-q8_0.gguf  ~1.8 GB  8-bit integer precision for reduced memory usage

The F16 format provides high precision and is ideal for scenarios where maintaining output quality is crucial. The Q8_0 format significantly reduces the model size, making it suitable for deployments where memory is a key constraint.
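The Q8_0 file can be produced from the F16 file with llama.cpp's quantize tool. A minimal sketch (in newer builds the binary is named llama-quantize instead):

# Quantize the f16 GGUF file down to 8-bit integer weights
./quantize ggml-model-f16.gguf ggml-model-q8_0.gguf q8_0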

Hardware Recommendations

Format  Minimum RAM  Recommended GPU
F16     8 GB         16 GB (with offloading)
Q8_0    4 GB         8 GB (CPU-only use recommended)

For the F16 file, GPU offloading is recommended for optimal performance. The Q8_0 variant runs well on CPU and has the lowest RAM requirements.
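For CPU-only runs with the Q8_0 file, setting an explicit thread count usually helps; the thread count below is an example value and should match your physical core count:

# -t sets the number of CPU threads
./main -m ggml-model-q8_0.gguf -t 4 --n-predict 256 --prompt "Hello"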

Usage Example

You can use the following commands to run the quantized models with llama.cpp's main example. In each case, --n-predict -1 lets generation continue until the model emits an end-of-sequence token:

Running with f16 Format

./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Write a Python function that computes the Fibonacci sequence."

Running with q8_0 Format

./main -m ggml-model-q8_0.gguf --n-predict -1 --prompt "Explain the concept of machine learning in simple terms."

GPU Offloading (for f16 Models)

To leverage GPU offloading for the f16 model, you can use the following command:

./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Summarize the impact of quantum computing on cryptography." --n-gpu-layers 32
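llama.cpp also ships an HTTP server that can serve either file; a minimal sketch (flag names may differ slightly across llama.cpp versions):

# Serve the model over HTTP on port 8080 with a 2048-token context
./server -m ggml-model-q8_0.gguf -c 2048 --port 8080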

Safety and Responsible Use

The Zephyr 1.6B model has been trained using a combination of instruction-tuning and synthetic data to enhance safety and ensure coherent responses. However, as with any large language model, there may be scenarios where outputs are not fully aligned with user expectations. It is recommended to always supervise outputs, particularly when used in sensitive applications.

For more details, refer to the original model card.

License

The quantized models in this repository are released under the CC-BY-NC-SA-4.0 license.
For details, see the license file.

Citation

If you use the Zephyr 1.6B models in your research or applications, please cite the original authors:

@misc{stablelm2zephyr,
  title={StableLM 2 Zephyr 1.6B},
  author={Stability AI},
  year={2023}
}