SandLogic Technologies - Quantized Nxcode-CQ-7B-orpo Models
Model Description
We have quantized the Nxcode-CQ-7B-orpo model into two variants:
- Q5_K_M
- Q4_K_M
These quantized models reduce file size and memory use, and typically speed up inference, while aiming to keep output quality close to that of the original model.
Discover our full range of quantized language models by visiting our SandLogic Lexicon GitHub. To learn more about our company and services, check out our website at SandLogic.
Original Model Information
- Name: Nxcode-CQ-7B-orpo
- Base Model: Qwen/CodeQwen1.5-7B
- Fine-tuning Approach: ORPO (Monolithic Preference Optimization without Reference Model)
- Fine-tuning Data: 100k samples of high-quality ranking data
- Model Type: Transformer-based decoder-only language model
- Parameters: 7 billion
- Context Length: 64K tokens
- Supported Languages: 92 coding languages
Model Capabilities
Nxcode-CQ-7B-orpo is designed for code-related tasks, with strong performance in:
- Code generation
- Long context understanding and generation
- Text-to-SQL conversion
- Bug fixing
Performance
Evalplus benchmark results:
- HumanEval pass@1: 86.6
- HumanEval+ pass@1: 83.5
- MBPP (v0.2.0) pass@1: 82.3
- MBPP+ (v0.2.0) pass@1: 70.4
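For readers unfamiliar with the metric, pass@1 is the share of problems for which a single generated solution passes all tests. The sketch below uses the commonly cited unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k). This is an assumption for illustration: the card does not say how the scores above were computed, and the per-problem counts here are made up.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical per-problem results: (samples drawn, samples that passed).
results = [(10, 10), (10, 5), (10, 0)]

# The benchmark score is the mean of the per-problem estimates.
score = sum(pass_at_k(n, c, 1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")
```

With k = 1 the estimator reduces to c / n, so a benchmark run with one greedy sample per problem simply reports the fraction of problems solved.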
Use Cases
- Code Generation: Create Python code based on function descriptions or partial implementations
- Code Completion: Suggest completions for partially written code
- Error Understanding: Help identify and explain coding errors
- Programming Education: Provide explanations and examples of coding concepts and patterns
Model Variants
We offer two quantized versions of the Nxcode-CQ-7B-orpo model:
- Q5_K_M: 5-bit quantization using llama.cpp's K_M (k-quant, medium) method
- Q4_K_M: 4-bit quantization using llama.cpp's K_M (k-quant, medium) method
These quantized models aim to reduce model size and improve inference speed while maintaining performance as close to the original model as possible.
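As a rough guide to the size reduction, the sketch below estimates weight storage from the nominal bit width alone. It assumes the card's nominal figure of 7 billion parameters and ignores the per-block scale metadata that k-quant formats store, so the real GGUF files will be somewhat larger than these numbers.

```python
def nominal_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight storage in gigabytes (10^9 bytes) at a nominal bit width."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # nominal "7 billion" from the model card, not an exact count

# FP16 is included as a reference point for an unquantized checkpoint.
for name, bits in [("FP16", 16), ("Q5_K_M", 5), ("Q4_K_M", 4)]:
    print(f"{name}: ~{nominal_size_gb(N_PARAMS, bits):.2f} GB")
```

These figures are lower bounds for the quantized variants; check the file listing in the repository for actual sizes.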
Input and Output
- Input: Text string (e.g., function descriptions, partial code implementations)
- Output: Generated code, completions, or explanations based on the input
Usage
pip install llama-cpp-python
Please refer to the llama-cpp-python documentation to install with GPU support.
Basic Text Completion
Here's an example that uses the high-level API's chat-completion interface to generate code from a prompt:
from llama_cpp import Llama

# Load the quantized GGUF model from a local path.
llm = Llama(
    model_path="./models/7B/Nxcode-CQ-7b.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,  # Uncomment to increase the context window
)

# Send a system prompt and a user request through the chat-completion API.
output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You're an AI coding assistant who helps solve coding questions"},
        {
            "role": "user",
            "content": "Write Python code to find prime numbers"
        }
    ]
)

# The reply text is in the first choice's message.
print(output["choices"][0]["message"]["content"])
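For longer generations it can be more pleasant to print tokens as they arrive. The following is a minimal sketch, assuming the same local model path as above and the `stream=True` option of `create_chat_completion`, which yields OpenAI-style chunks carrying a `delta`. The helper names `stream_reply` and `run` are hypothetical and not part of llama-cpp-python.

```python
from typing import Any, Dict, Iterable


def stream_reply(chunks: Iterable[Dict[str, Any]]) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        # The first chunk usually carries only the role; the last may be empty.
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)


def run(model_path: str = "./models/7B/Nxcode-CQ-7b.gguf") -> str:
    # Imported here so stream_reply can be used without the package installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, verbose=False)
    chunks = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write Python code to find prime numbers"}],
        stream=True,
    )
    return stream_reply(chunks)
```

Calling `run()` loads the model and streams the answer to the terminal; `stream_reply` works on any iterable of chunks with this shape.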
Download
You can download Llama models in gguf format directly from Hugging Face using the from_pretrained method. This feature requires the huggingface-hub package.
To install it, run: pip install huggingface-hub
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="SandLogicTechnologies/Nxcode-CQ-7B-orpo-GGUF",
    filename="*Nxcode-CQ-7B-orpo-Q5_K_M.gguf",
    verbose=False
)
By default, from_pretrained will download the model to the Hugging Face cache directory. You can manage installed model files using the huggingface-cli tool.
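To switch between the two variants, the filename pattern can be built from the variant name. This is a sketch under one assumption: that the Q4_K_M file follows the same naming pattern as the Q5_K_M file shown above, which this card does not confirm. The helpers `variant_pattern` and `load_variant` are hypothetical names.

```python
REPO_ID = "SandLogicTechnologies/Nxcode-CQ-7B-orpo-GGUF"
VARIANTS = ("Q5_K_M", "Q4_K_M")


def variant_pattern(variant: str) -> str:
    """Build the glob pattern for a variant, mirroring the Q5_K_M example."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    # Assumption: both files share the "Nxcode-CQ-7B-orpo-<variant>.gguf" name.
    return f"*Nxcode-CQ-7B-orpo-{variant}.gguf"


def load_variant(variant: str):
    # Imported here so variant_pattern can be used without the package installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=variant_pattern(variant),
        verbose=False,
    )
```

For example, `load_variant("Q4_K_M")` would download the smaller file on first use and reuse the cached copy afterwards, provided the file name matches the assumed pattern.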
Acknowledgements
We thank the original developers of Nxcode-CQ-7B-orpo and Qwen/CodeQwen1.5-7B for their contributions to the field. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.
Contact
For any inquiries or support, please contact us at [email protected] or visit our support page.