# llama-161M
Trained on 100B tokens.
- Learning rate: 1e-3
- Weight decay: 0.1
- WSD scheduler with 10% decay (see the sketch after this list)
- Data mix: 80% code, 10% natural language, 10% instruction data
- Dataset decontaminated against popular benchmarks following the BigCode approach
- Trained on 8x RTX 3090s for ~110 hours
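A minimal sketch of a warmup-stable-decay (WSD) learning-rate schedule, assuming "10% decay" means the final 10% of steps decay linearly from the peak LR; the warmup fraction and linear decay shape are illustrative assumptions, not the exact training configuration.

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 1e-3,
           warmup_frac: float = 0.01, decay_frac: float = 0.10) -> float:
    """Warmup-stable-decay schedule: linear warmup, flat peak, linear decay."""
    warmup_steps = max(1, int(total_steps * warmup_frac))  # assumed warmup fraction
    decay_start = int(total_steps * (1.0 - decay_frac))    # decay over the last 10%
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps          # linear warmup
    if step < decay_start:
        return peak_lr                                      # stable phase at peak LR
    remaining = total_steps - decay_start
    return peak_lr * max(0.0, (total_steps - step) / remaining)  # linear decay to 0
```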
This is a base pretrained model and requires further fine-tuning to be useful.
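As a quick sanity check, the checkpoint can be loaded like any other Llama-architecture causal LM with `transformers`; the repo id below is a placeholder, and greedy decoding here simply mirrors the evaluation setting reported under Model Details.

```python
# Minimal sketch: load the checkpoint and run greedy generation.
# The repo id is a placeholder — replace it with the actual model repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<namespace>/llama-161M"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.float16)

prompt = "def fizzbuzz(n):"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```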
## Model Details
| openai/openai_humaneval (greedy) | mbpp (greedy) |
|---|---|
| 9.2% | 9.8% |