Nemotron models that have been converted and/or quantized to work well in vLLM
Michael Goin (mgoin)
AI & ML interests: LLM inference optimization, compression, quantization, pruning, distillation
Models (70):
mgoin/llava-1.5-7b-hf-FP8-Dynamic • 26
mgoin/DeepSeek-Coder-V2-Lite-Instruct-FP8 • 9
mgoin/Mixtral-8x7B-Instruct-v0.1-FP8 • 5
mgoin/Qwen2-VL-7B-Instruct-FP8-Dynamic • 9
mgoin/Nemotron-nemo-checkpoints
mgoin/Minitron-4B-Base-FP8 • Text Generation • 1.31k • 3
mgoin/Nemotron-4-340B-Base-hf • Text Generation • 2 • 1
mgoin/Nemotron-4-340B-Instruct-hf-FP8 • Text Generation • 199 • 2
mgoin/Nemotron-4-340B-Base-hf-FP8 • Text Generation • 18 • 2
mgoin/Nemotron-4-340B-Instruct-hf • Text Generation • 72 • 2
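Several of the checkpoints above carry FP8 or FP8-Dynamic suffixes. The core idea behind dynamic FP8 quantization is to compute a scale from the tensor's runtime absolute maximum so values fit the E4M3 range (largest finite value 448). Below is a minimal pure-Python sketch of that scaling step; the function names are illustrative, and real vLLM kernels additionally round and cast values to the 8-bit format:

```python
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_dynamic_quant(x):
    """Sketch of per-tensor dynamic quantization: derive a scale from the
    runtime amax so the whole tensor fits the E4M3 range, then clamp.
    (Illustrative only; real kernels also cast to the 8-bit format.)"""
    amax = max(abs(v) for v in x)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    q = [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in x]
    return q, scale


def fp8_dequant(q, scale):
    """Recover approximate original values by undoing the scale."""
    return [v * scale for v in q]
```

In checkpoints without the "-Dynamic" suffix, weight scales are computed ahead of time and stored in the model; the "-Dynamic" variants compute activation scales like this on the fly at inference time.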