StableLM 2 Zephyr 1.6B

Model Details

Model Name: StableLM 2 Zephyr 1.6B (GGUF format)

Quantization Options:

  • F16 (16-bit float)
  • Q8_0 (8-bit integer)

This repository hosts quantized versions of the StabilityAI StableLM 2 Zephyr 1.6B model for efficient inference with the llama.cpp library. The quantized files trade a small amount of precision for lower memory usage and faster inference, making them suitable for a variety of platforms, including constrained hardware such as ARM64 boards and low-memory x86 machines.

Core Libraries

The original StableLM 2 Zephyr 1.6B model has been converted to the GGUF format used by llama.cpp, so it can be loaded directly by llama.cpp and the inference tools built on top of it.
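If you want to reproduce the conversion yourself, the sketch below builds llama.cpp and converts the original Hugging Face checkpoint to a 16-bit GGUF file. The checkpoint path is a placeholder, and the converter script name and flags vary between llama.cpp versions, so treat this as an outline rather than an exact recipe:

# Build llama.cpp (assumes git and a C/C++ toolchain are available)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the original HF checkpoint to 16-bit GGUF
# (named convert-hf-to-gguf.py in recent llama.cpp versions)
python convert-hf-to-gguf.py /path/to/stablelm-2-zephyr-1_6b --outtype f16 --outfile ggml-model-f16.gguf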

Quantized Model Files

Format  File Name             Size     Description
F16     ggml-model-f16.gguf   ~3.2 GB  16-bit float precision for balanced speed and accuracy
Q8_0    ggml-model-q8_0.gguf  ~1.8 GB  8-bit integer precision for reduced memory usage

The F16 format provides high precision and is ideal for scenarios where maintaining output quality is crucial. The Q8_0 format significantly reduces the model size, making it suitable for deployments where memory is a key constraint.
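The Q8_0 file can be produced from the F16 file with llama.cpp's quantize tool. A minimal sketch (in newer builds the binary is named llama-quantize instead):

# Quantize the f16 GGUF file down to 8-bit integer weights
./quantize ggml-model-f16.gguf ggml-model-q8_0.gguf q8_0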

Hardware Recommendations

Format  Minimum RAM  Recommended GPU
F16     8 GB         16 GB (with offloading)
Q8_0    4 GB         8 GB (CPU-only use recommended)

For the F16 file, GPU offloading is recommended for optimal performance. The Q8_0 variant runs well on CPU and has the lowest RAM requirements.
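For CPU-only runs with the Q8_0 file, setting an explicit thread count usually helps; the thread count below is an example value and should match your physical core count:

# -t sets the number of CPU threads
./main -m ggml-model-q8_0.gguf -t 4 --n-predict 256 --prompt "Hello"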

Usage Example

You can use the following commands to run the quantized models with llama.cpp's main example. In each case, --n-predict -1 lets generation continue until the model emits an end-of-sequence token:

Running with f16 Format

./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Write a Python function that computes the Fibonacci sequence."

Running with q8_0 Format

./main -m ggml-model-q8_0.gguf --n-predict -1 --prompt "Explain the concept of machine learning in simple terms."

GPU Offloading (for f16 Models)

To leverage GPU offloading for the f16 model, you can use the following command:

./main -m ggml-model-f16.gguf --n-predict -1 --prompt "Summarize the impact of quantum computing on cryptography." --n-gpu-layers 32
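llama.cpp also ships an HTTP server that can serve either file; a minimal sketch (flag names may differ slightly across llama.cpp versions):

# Serve the model over HTTP on port 8080 with a 2048-token context
./server -m ggml-model-q8_0.gguf -c 2048 --port 8080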

Safety and Responsible Use

The Zephyr 1.6B model has been trained using a combination of instruction-tuning and synthetic data to enhance safety and ensure coherent responses. However, as with any large language model, there may be scenarios where outputs are not fully aligned with user expectations. It is recommended to always supervise outputs, particularly when used in sensitive applications.

For more details, refer to the original model card.

License

The quantized models in this repository are released under the CC-BY-NC-SA-4.0 license.
For details, see the license file.

Citation

If you use the Zephyr 1.6B models in your research or applications, please cite the original authors:

@misc{stablelm2zephyr,
  title={StableLM 2 Zephyr 1.6B},
  author={Stability AI},
  year={2023}
}