	---
	pipeline_tag: text-generation
	license: apache-2.0
	tags:
	- text generation
	programming_language:
	- Java
	- JavaScript
	- Python
	metrics:
	- code_eval
	inference: true
	widget:
	- text: 'def print_hello_world():'
	  example_title: Hello world
	  group: Python
	model-index:
	- name: DeciCoder-1b
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      type: nuprl/MultiPL-E
	      name: MultiPL-HumanEval (Python)
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 0.191
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: nuprl/MultiPL-E
	      name: MultiPL-HumanEval (JavaScript)
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 0.184
	      verified: false
	  - task:
	      type: text-generation
	    dataset:
	      type: nuprl/MultiPL-E
	      name: MultiPL-HumanEval (Java)
	    metrics:
	    - name: pass@1
	      type: pass@1
	      value: 0.166
	      verified: false
	datasets:
	- bigcode/starcoderdata
	---

	# Model Card for DeciCoder 1B

	DeciCoder 1B is a 1-billion-parameter decoder-only code completion model
	trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).
	The model uses Grouped Query Attention and has a context window of 2048
	tokens. It was trained using a Fill-in-the-Middle training objective. The model's
	architecture was generated by Deci's proprietary Neural Architecture
	Search-based technology, AutoNAC.
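A Fill-in-the-Middle (FIM) objective rearranges a training document so the model learns to generate a missing middle span conditioned on both the code before it and the code after it. The sketch below shows the common prefix-suffix-middle (PSM) arrangement. The sentinel strings `<fim_prefix>`, `<fim_suffix>`, and `<fim_middle>` follow the StarCoder convention and are an assumption here: check the DeciCoder tokenizer's special tokens before relying on them. This card does not state the FIM rate or exact format DeciCoder was trained with, and the helper names are illustrative.

```python
import random

# Assumed StarCoder-style sentinel strings; verify against the actual tokenizer.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def split_document(doc: str, rng: random.Random) -> tuple[str, str, str]:
    """Cut a document at two random character positions into (prefix, middle, suffix)."""
    lo, hi = sorted(rng.randint(0, len(doc)) for _ in range(2))
    return doc[:lo], doc[lo:hi], doc[hi:]


def to_psm_training_example(doc: str, rng: random.Random) -> str:
    """Rearrange a document into prefix-suffix-middle order for FIM training."""
    prefix, middle, suffix = split_document(doc, rng)
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def to_psm_inference_prompt(prefix: str, suffix: str) -> str:
    """Build an infilling prompt; the model is expected to generate the middle."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    rng = random.Random(0)
    source = "def add(a, b):\n    return a + b\n"
    print(to_psm_training_example(source, rng))
    print(to_psm_inference_prompt("def add(a, b):\n", "\n"))
```

At inference time, the infilled code is whatever the model generates after the middle sentinel.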

	## Model Details

	- Developed by: Deci
	- Model type: DeciCoder is an auto-regressive language model based on the transformer decoder architecture, using Grouped Query Attention.
	- Language(s): Python, Java, JavaScript
	- License: Model checkpoints are licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) license.

	## Model Architecture

	| Parameters | Layers | Heads | Sequence Length | GQA num_key_value_heads | Hidden Size |
	|:----------|:----------|:----------|:----------|:----------|:----------|
	| 1.1B | 20 | 32 | 2048 | 4 | 2048 |
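The `GQA num_key_value_heads` value means the 32 query heads share 4 key/value heads, so each key/value head serves 32 / 4 = 8 query heads. The arithmetic below sketches what that saves in key/value cache memory at the full 2048-token context. It assumes a per-head dimension of hidden size / heads = 2048 / 32 = 64 and 16-bit (2-byte) cache entries; neither is stated in this card, and the helper names are illustrative.

```python
def head_dim(hidden_size: int, num_heads: int) -> int:
    """Per-head dimension, assuming the hidden size is split evenly across query heads."""
    assert hidden_size % num_heads == 0
    return hidden_size // num_heads


def kv_head_for_query_head(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Index of the shared key/value head used by a given query head under GQA."""
    group_size = num_heads // num_kv_heads
    return q_head // group_size


def kv_cache_bytes(layers: int, num_kv_heads: int, dim: int, seq_len: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values (hence the factor 2) for one sequence."""
    return 2 * layers * num_kv_heads * dim * seq_len * bytes_per_value


LAYERS, HEADS, KV_HEADS, HIDDEN, SEQ = 20, 32, 4, 2048, 2048
DIM = head_dim(HIDDEN, HEADS)  # 64 under the even-split assumption

gqa = kv_cache_bytes(LAYERS, KV_HEADS, DIM, SEQ)
mha = kv_cache_bytes(LAYERS, HEADS, DIM, SEQ)  # hypothetical: one KV head per query head

print(f"GQA KV cache: {gqa / 2**20:.0f} MiB")  # 40 MiB
print(f"MHA KV cache: {mha / 2**20:.0f} MiB")  # 320 MiB
print("query heads 0-7 share KV head", kv_head_for_query_head(7, HEADS, KV_HEADS))
```

Under these assumptions, sharing key/value heads shrinks the per-sequence cache by the group size of 8.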


	- Decoder layer: Grouped Query Attention [Ainslie et al., 2023](https://arxiv.org/abs/2305.13245)
	- Position Embeddings: Rotary Position Embeddings [Su et al., 2021](https://arxiv.org/abs/2104.09864)
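Rotary position embeddings encode a token's position by rotating pairs of query/key features through position-dependent angles, so the dot product of a rotated query and a rotated key depends only on their relative offset. The pure-Python sketch below follows the interleaved pairing from Su et al. with the commonly used base of 10000. The base and the pairing layout DeciCoder actually uses are not stated here (many Hugging Face implementations pair the first and second halves of the vector instead), so treat both as assumptions.

```python
import math


def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate each feature pair (x[2i], x[2i+1]) by position * base**(-2i/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even feature dimension"
    out = [0.0] * d
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# The same relative offset (3) at different absolute positions gives the same score.
print(dot(rope(q, 5), rope(k, 2)))
print(dot(rope(q, 105), rope(k, 102)))
```

Because each step is a pure rotation, vector norms are unchanged; only the relative angle between query and key carries position information.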

	## Uses

	The model is intended for single-line and multi-line code completion from a
	context window of up to 2048 tokens. It is not an instruction-tuned model,
	so commands like "Write a function that computes the absolute value of
	an integer" won't yield the desired results. A more effective approach
	is to frame instructions as source code comments (e.g.
	`# this function calculates the absolute value of an integer`) or to present
	a function signature and docstring, letting the model complete the
	function's body.
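The two prompting styles described above can be sketched as plain strings. The helpers below are hypothetical (not part of any library): one builds a signature-plus-docstring prompt, and the other trims the oldest prompt tokens so that a chosen number of new tokens still fits, since prompt and completion share the 2048-token window. Token IDs are plain integers here; in practice they would come from the model's tokenizer, and dropping the oldest tokens is only one possible truncation policy.

```python
CONTEXT_WINDOW = 2048  # DeciCoder's maximum sequence length

# Style 1: describe the task as a source-code comment.
comment_prompt = (
    "# this function calculates the absolute value of an integer\n"
    "def absolute_value(n: int) -> int:\n"
)


def docstring_prompt(signature: str, docstring: str) -> str:
    """Style 2: a function signature plus docstring; the model completes the body."""
    return f'{signature}\n    """{docstring}"""\n'


def fit_to_context(prompt_ids: list[int], max_new_tokens: int,
                   window: int = CONTEXT_WINDOW) -> list[int]:
    """Keep only the most recent prompt tokens so prompt + completion fit in the window."""
    if not 0 < max_new_tokens < window:
        raise ValueError("max_new_tokens must be between 1 and window - 1")
    budget = window - max_new_tokens
    return prompt_ids[-budget:]


print(comment_prompt)
print(docstring_prompt("def absolute_value(n: int) -> int:", "Return the absolute value of n."))
```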

	### How to Use

	```python
	# pip install -q transformers
	from transformers import AutoModelForCausalLM, AutoTokenizer
	
	checkpoint = "Deci/DeciCoder-1b"
	device = "cuda"  # "cuda" for GPU usage or "cpu" for CPU usage
	
	tokenizer = AutoTokenizer.from_pretrained(checkpoint)
	# trust_remote_code=True lets transformers load the custom model code shipped in the repository
	model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)
	
	inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
	# Without max_new_tokens, generate() may stop after a short default length
	outputs = model.generate(inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0]))
	```

	### Attribution

	DeciCoder was trained on the StarCoder Training Dataset, filtered for
	Python, Java, and JavaScript code. For additional information, please
	refer to [https://huggingface.co/datasets/bigcode/starcoderdata](https://huggingface.co/datasets/bigcode/starcoderdata).

	### Limitations

	The model was trained on Python, Java, and JavaScript source code.
	While the source is primarily English, it also contains other natural
	languages. The model can produce code snippets given some context,
	but there is no assurance that the resulting code will function as
	expected: it might be suboptimal, contain bugs, or even contain
	exploits.
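Because generated code carries no correctness guarantee, one lightweight safeguard is to reject completions that do not parse and to run the rest against a few known test cases before use. The sketch below uses only the Python standard library; the function names are illustrative. Note that `exec` is not a sandbox: evaluation harnesses typically run untrusted code in an isolated process or container, so only use this pattern on code you are prepared to execute.

```python
import ast


def parses(code: str) -> bool:
    """Return True if the completion is syntactically valid Python."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


def passes_tests(code: str, test_code: str) -> bool:
    """Execute the completion, then assertion-based tests, in a fresh namespace.

    WARNING: exec runs the code with full interpreter privileges; this is NOT sandboxed.
    """
    if not parses(code):
        return False
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(test_code, namespace)
    except Exception:
        return False
    return True


completion = "def absolute_value(n):\n    return n if n >= 0 else -n\n"
tests = "assert absolute_value(-3) == 3\nassert absolute_value(4) == 4\n"
print(passes_tests(completion, tests))
```

Passing a handful of tests only raises confidence; it does not prove the code is correct or safe.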

	## Training Details

	### Training Data

	DeciCoder was trained on the Python, Java, and JavaScript subsets of the [StarCoder Training Dataset](https://huggingface.co/datasets/bigcode/starcoderdata).


	### Training Procedure

	- Warm-Up Steps: 9000
	- Total Training Steps: 284k
	- Total Tokens: 446B
	- Global Batch Size: 768
	- Optimizer: AdamW
	- Optimizer Parameters: beta1=0.9, beta2=0.95
	- Weight Decay: 0.1
	- Learning Rate: 4e-4
	- Learning Rate Schedule: cosine
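The listed hyperparameters can be cross-checked and the schedule sketched. 446B tokens over 284k steps at a global batch of 768 sequences works out to about 2,045 tokens per sequence, consistent with the 2048-token sequence length (the step count is rounded). The schedule below assumes a linear warm-up to the peak rate over the 9,000 warm-up steps followed by cosine decay; the card specifies neither the warm-up shape nor the final learning rate, so `min_lr = 0.0` is an assumption.

```python
import math

PEAK_LR = 4e-4
WARMUP_STEPS = 9_000
TOTAL_STEPS = 284_000
GLOBAL_BATCH = 768
TOTAL_TOKENS = 446e9


def learning_rate(step: int, min_lr: float = 0.0) -> float:
    """Linear warm-up, then cosine decay from PEAK_LR to min_lr (min_lr is an assumption)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


tokens_per_sequence = TOTAL_TOKENS / (TOTAL_STEPS * GLOBAL_BATCH)
print(f"tokens per sequence ~ {tokens_per_sequence:.0f}")

for step in (0, 4_500, 9_000, 146_500, 284_000):
    print(step, f"{learning_rate(step):.2e}")
```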

	## Evaluation

	Below are DeciCoder's pass@1 scores on MultiPL-HumanEval:

	| Python | JavaScript | Java |
	|:----------|:----------|:----------|
	| 19.1% | 18.4% | 16.6% |
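pass@1 is the expected fraction of problems for which a single sampled completion passes all unit tests. This card does not say how many samples per problem were drawn. When n samples are drawn for a problem and c of them pass, the unbiased estimator from the Codex paper (Chen et al., 2021) is pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems; for k = 1 it reduces to c / n. A minimal sketch, with made-up per-problem results:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results: (samples drawn, samples passing).
results = [(20, 4), (20, 0), (20, 20), (20, 1)]
print(f"pass@1 = {benchmark_pass_at_k(results, 1):.3f}")
```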


	### Runtime Benchmarks

	| Inference Tool/Hardware | A10G (tokens/sec) | A100 (tokens/sec) |
	|:----------|:----------|:----------|
	| HF Inference Endpoints | 1,364.2 | 3,244.4 |
	| Infery LLM | 3,889.3 | 11,676.8 |
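Read as ratios, the throughput table implies that Infery LLM delivers about 2.85× the HF Inference Endpoints throughput on an A10G and about 3.60× on an A100. The snippet below computes those ratios and the implied time to emit a hypothetical 100,000 tokens. The card does not state the batch size, sequence lengths, or precision behind these measurements, so the figures only compare the rows under whatever conditions were used.

```python
# Throughput figures (tokens/sec) copied from the runtime table.
THROUGHPUT = {
    "HF Inference Endpoints": {"A10G": 1364.2, "A100": 3244.4},
    "Infery LLM": {"A10G": 3889.3, "A100": 11676.8},
}


def speedup(tool: str, baseline: str, gpu: str) -> float:
    """Throughput of `tool` relative to `baseline` on the same GPU."""
    return THROUGHPUT[tool][gpu] / THROUGHPUT[baseline][gpu]


def seconds_for(tokens: int, tool: str, gpu: str) -> float:
    """Wall-clock time to emit `tokens` at the measured aggregate throughput."""
    return tokens / THROUGHPUT[tool][gpu]


for gpu in ("A10G", "A100"):
    ratio = speedup("Infery LLM", "HF Inference Endpoints", gpu)
    secs = seconds_for(100_000, "Infery LLM", gpu)
    print(f"{gpu}: {ratio:.2f}x, 100k tokens in {secs:.1f} s")
```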

	## Documentation

	- [Notebook](https://colab.research.google.com/drive/1JCxvBsWCZKHfIcHSMVf7GZCs3ClMQPjs)
	- Blog post: [Introducing DeciCoder: The New Gold Standard in Efficient and Accurate Code Generation](https://deci.ai/blog/decicoder-efficient-and-accurate-code-generation-llm/)
	- Questions: Feel free to contact us via our [Discord Community](https://discord.com/invite/p9ecgRhDR8/)!

	## How to Cite

	Please cite this model using the following format:

	```bibtex
	@misc{DeciFoundationModels,
	title = {DeciCoder},
	author = {DeciAI Research Team},
	year = {2023},
	url = {https://huggingface.co/deci/decicoder-1b},
	}
	```