
SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4

Model Overview

  • Model Architecture: Llama-2
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Pruned: 50% 2:4
  • Release Date: 7/2/2024
  • Version: 1.0
  • Model Developers: Neural Magic

A compressed version of Llama-2-7b specialized for code generation. This model was obtained by fine-tuning the Sparse Foundational model SparseLlama-2-7b-pruned_50.2of4 on the evol-codealpaca-v1 dataset. SquareHead knowledge distillation was used with Llama-2-7b-evolcodealpaca as the teacher. It achieves a HumanEval pass@1 of 34.58%, whereas the dense Llama-2-7b-evolcodealpaca model achieves 32.03%.
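
As a usage sketch (assumed, not an official recipe from this card), the checkpoint can be loaded through Hugging Face transformers like any other Llama-2 model; the prompt below is purely illustrative.

```python
# Minimal inference sketch (assumed usage; not the official evaluation setup).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "neuralmagic/SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Illustrative code-generation prompt.
prompt = "Write a Python function that returns the n-th Fibonacci number.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```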

This model was produced as part of Neural Magic's Sparse Foundational Models initiative, and demonstrates the capability of Sparse Foundational Models to transfer to the code-generation domain.

Model Optimizations

This model is derived from the Sparse Foundational model SparseLlama-2-7b-pruned_50.2of4, which was obtained by applying the SparseGPT algorithm to prune Llama-2-7b to 50% sparsity with a 2:4 mask. This optimization removes 50% of the weights, reducing disk size and FLOPs by roughly the same amount.
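
To illustrate what a 2:4 mask means (this is an explanatory sketch, not Neural Magic's SparseGPT implementation), the snippet below zeroes the two smallest-magnitude weights in every contiguous group of four, which yields exactly 50% sparsity:

```python
# Explanatory sketch of 2:4 (semi-structured) sparsity; not the SparseGPT algorithm itself.
import torch

def apply_2of4_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude entries in every group of 4, zero the rest."""
    w = weight.reshape(-1, 4)                 # group weights into blocks of four
    keep = w.abs().topk(k=2, dim=1).indices   # indices of the two largest magnitudes
    mask = torch.zeros_like(w).scatter_(1, keep, 1.0)
    return (w * mask).reshape(weight.shape)

w = torch.randn(8, 8)                         # numel must be divisible by 4
sparse_w = apply_2of4_mask(w)
print((sparse_w == 0).float().mean())         # ~0.5, i.e. 50% sparsity
```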

Evaluation

This model was evaluated on the HumanEval benchmark using the bigcode-evaluation-harness.
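
For reference, pass@k here is the standard unbiased estimator from the HumanEval/Codex evaluation protocol; the sketch below shows the formula, with per-problem sample counts given as hypothetical (n, c) pairs:

```python
# Unbiased pass@k estimator from the HumanEval/Codex evaluation protocol.
# n = samples generated per problem, c = samples that pass the unit tests.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Benchmark pass@1 is the mean of the per-problem estimates; with n = 1 it
# reduces to the fraction of problems solved by a single completion.
problems = [(1, 1), (1, 0), (1, 1)]            # hypothetical (n, c) pairs
print(sum(pass_at_k(n, c, k=1) for n, c in problems) / len(problems))
```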

Accuracy

| Model                                          | HumanEval pass@1 | Recovery |
| ---------------------------------------------- | ---------------- | -------- |
| Llama-2-7b-evolcodealpaca                      | 32.03%           | --       |
| SparseLlama-2-7b-evolcodealpaca-pruned_50.2of4 | 34.58%           | 108%     |
