language: en
datasets:
- 37 popular Python code repositories
- See princeton-nlp/SWE-bench train split
- See the make_datasets documentation on SWE-bench's GitHub for details on formatting input.
SWE-Llama
The SWE-Llama models are variants of CodeLlama fine-tuned on software engineering tasks extracted from real-world GitHub issues and pull requests. They were introduced and evaluated on the SWE-bench benchmark in the SWE-bench paper (see the BibTeX entry below).
Model Details
- Architecture: Transformer, based on CodeLlama architecture
- Parameters: 7 billion for SWE-Llama-7b, 13 billion for SWE-Llama-13b
- Objective: Generating patches to resolve GitHub issues, conditioned on issue description and code context
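As a rough illustration of what "conditioned on issue description and code context" can look like in practice, the sketch below joins an issue and a set of source files into a single prompt string. The function name build_prompt and the tag layout are hypothetical and chosen only for this example; the actual input format is the one produced by SWE-bench's make_datasets tooling.

```python
def build_prompt(issue_text: str, context_files: dict[str, str]) -> str:
    """Join an issue description and code context into one prompt string.

    Hypothetical layout for illustration only; the real input format is
    defined by SWE-bench's make_datasets tooling.
    """
    parts = ["<issue>", issue_text.strip(), "</issue>", "<code>"]
    for path, source in context_files.items():
        # Delimit each file so the model can tell where it starts and ends.
        parts.append(f"[start of {path}]")
        parts.append(source.rstrip("\n"))
        parts.append(f"[end of {path}]")
    parts.append("</code>")
    return "\n".join(parts)
```

The model would then be expected to continue such a prompt with a patch; how that patch is parsed and applied is outside the scope of this sketch.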
Training Data
SWE-Llama was fine-tuned on 19,000 issues and pull requests collected from 37 popular Python code repositories on GitHub, disjoint from those used in SWE-bench.
Training Procedure
- Fine-tuned only the attention matrices using the LoRA method
- Trained for 4 epochs with a batch size of 32
- Selected best checkpoint based on validation perplexity
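The checkpoint selection step can be sketched as follows, assuming perplexity is computed as the exponential of the mean per-token negative log-likelihood on a validation set. The function names and the checkpoint labels are illustrative assumptions, not taken from the SWE-Llama training code.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood) over validation tokens."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def select_best_checkpoint(val_nlls_by_checkpoint):
    """Pick the checkpoint with the lowest validation perplexity.

    val_nlls_by_checkpoint maps a checkpoint label (hypothetical, e.g.
    "epoch-2") to the per-token losses measured for that checkpoint.
    """
    scores = {
        name: perplexity(nlls)
        for name, nlls in val_nlls_by_checkpoint.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Called with one loss list per saved checkpoint, select_best_checkpoint returns the label with the lowest perplexity together with all scores, so the comparison can be logged.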
Evaluation Results
When evaluated on the SWE-bench benchmark:
- SWE-Llama-7b achieved a 3.0% issue resolution rate using oracle context retrieval
- SWE-Llama-13b achieved a 4.0% issue resolution rate using oracle context retrieval
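For readers reproducing these figures, a minimal sketch of the metric follows, assuming the resolution rate is the percentage of evaluated issues counted as resolved. The helper name resolution_rate is hypothetical, and any counts passed to it are the caller's own; none are taken from the paper.

```python
def resolution_rate(resolved: int, total: int) -> float:
    """Return the percentage of evaluated issues counted as resolved."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= resolved <= total:
        raise ValueError("resolved must be between 0 and total")
    return 100.0 * resolved / total
```

For example, 3 resolved issues out of 100 evaluated would give a rate of 3.0 under this definition.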
BibTeX Entry
@misc{jimenez2023swebench,
title={SWE-bench: Can Language Models Resolve Real-World GitHub Issues?},
author={Carlos E. Jimenez and John Yang
and Alexander Wettig and Shunyu Yao
and Kexin Pei and Ofir Press and Karthik Narasimhan},
year={2023},
eprint={2310.06770},
archivePrefix={arXiv},
primaryClass={cs.CL}
}