DeepSparse Sparse LLMs
Collection
Useful LLMs for DeepSparse where we've pruned at least 50% of the weights!
•
10 items
•
Updated
•
5
Chat-aligned MPT 7b model pruned to 50% and quantized using SparseGPT for inference with DeepSparse
from deepsparse import TextGeneration
model = TextGeneration(model="hf:neuralmagic/mpt-7b-chat-pruned50-quant")
model("Tell me a joke.", max_new_tokens=50)