# Fast Inference with CTranslate2
Speed up inference while reducing memory usage by 2x-4x using int8 inference in C++ on CPU or GPU.
Quantized version of Salesforce/codet5p-770m.
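As a back-of-the-envelope illustration of where a 2x-4x memory reduction can come from, the sketch below compares weight storage at different precisions. It assumes the baseline is float32 or float16 weights and takes the nominal 770M parameter count from the model name; the exact count, activations, and quantization scales are ignored, so the numbers are rough estimates only.

```python
# Rough weight-only memory estimate at different precisions.
# Assumption: 770M is the nominal size from the model name, not an exact count.
NUM_PARAMS = 770_000_000

# Storage cost per parameter for each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")
```

Under these assumptions, int8 weights take a quarter of the float32 footprint and half of the float16 footprint.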
pip install "hf-hub-ctranslate2>=2.0.8"
Converted on 2023-05-20 using
ct2-transformers-converter --model Salesforce/codet5p-770m --output_dir /home/michael/tmp-ct2fast-codet5p-770m --force --copy_files merges.txt README.md tokenizer_config.json vocab.json special_tokens_map.json added_tokens.json .gitattributes --quantization float16
Checkpoint compatible with ctranslate2>=3.13.0 and hf-hub-ctranslate2>=2.0.8:
- compute_type=int8_float16 for device="cuda"
- compute_type=int8 for device="cpu"
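The device-to-compute-type pairing above can be captured in a small helper. This is only a sketch: `COMPUTE_TYPE_BY_DEVICE` and `loader_kwargs` are hypothetical names introduced here for illustration and are not part of hf-hub-ctranslate2.

```python
# Hypothetical helper (not part of hf-hub-ctranslate2): picks the compute
# type recommended in this card for a given device.
COMPUTE_TYPE_BY_DEVICE = {"cuda": "int8_float16", "cpu": "int8"}


def loader_kwargs(device, model_name="michaelfeil/ct2fast-codet5p-770m"):
    """Build keyword arguments for loading the checkpoint on `device`."""
    if device not in COMPUTE_TYPE_BY_DEVICE:
        raise ValueError(f"unsupported device: {device!r}")
    return {
        "model_name_or_path": model_name,
        "device": device,
        "compute_type": COMPUTE_TYPE_BY_DEVICE[device],
    }


print(loader_kwargs("cpu"))
```

The resulting dictionary could then be unpacked into the loader call together with a tokenizer, for example `TranslatorCT2fromHfHub(**loader_kwargs("cpu"), tokenizer=...)`.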
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer
model_name = "michaelfeil/ct2fast-codet5p-770m"
# Use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model.
model = TranslatorCT2fromHfHub(
    # load in int8 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    tokenizer=AutoTokenizer.from_pretrained("Salesforce/codet5p-770m"),
)
outputs = model.generate(
    text=["def print_hello_world():", "def hello_name(name:"],
    decode_tok_kwargs=dict(skip_special_tokens=True),
    max_decoding_length=64,
    end_token=["def"],
)
print(outputs)
# Licence and other remarks
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repository.
# Original description
# CodeT5+ 770M
## Model description
CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. encoder-only, decoder-only, and encoder-decoder) to support a wide range of code understanding and generation tasks. It is introduced in the paper:
CodeT5+: Open Code Large Language Models for Code Understanding and Generation by Yue Wang*, Hung Le*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi (* indicates equal contribution).
Compared to the original CodeT5 family (CodeT5-base: 220M, CodeT5-large: 770M), CodeT5+ is pretrained with a diverse set of pretraining tasks including span denoising, causal language modeling, contrastive learning, and text-code matching to learn rich representations from both unimodal code data and bimodal code-text data.
Additionally, it employs a simple yet effective compute-efficient pretraining method to initialize the model components with frozen off-the-shelf LLMs such as CodeGen to efficiently scale up the model (i.e. 2B, 6B, 16B), and adopts a "shallow encoder and deep decoder" architecture.
Furthermore, it is instruction-tuned to align with natural language instructions (see our InstructCodeT5+ 16B) following Code Alpaca.
## How to use
This model can be easily loaded using the T5ForConditionalGeneration functionality and employs the same tokenizer as the original CodeT5.
from transformers import T5ForConditionalGeneration, AutoTokenizer
checkpoint = "Salesforce/codet5p-770m"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("def print_hello_world():<extra_id_0>", return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# ==> print "Hello World"
## Pretraining data
This checkpoint is trained on the stricter permissive subset of the deduplicated version of the github-code dataset.
The data is preprocessed by keeping only permissively licensed code ("mit", "apache-2", "bsd-3-clause", "bsd-2-clause", "cc0-1.0", "unlicense", "isc").
Supported languages (9 in total) are as follows: c, c++, c-sharp, go, java, javascript, php, python, ruby.
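The licence and language filtering described above can be sketched as a simple predicate over dataset records. The record layout (`license` and `language` fields) and the function name `keep` are assumptions made for illustration; the actual preprocessing pipeline and the labels used in the github-code dataset are not specified here and may differ.

```python
# Licences and languages exactly as listed in this card.
PERMISSIVE_LICENSES = {
    "mit", "apache-2", "bsd-3-clause", "bsd-2-clause",
    "cc0-1.0", "unlicense", "isc",
}
SUPPORTED_LANGUAGES = {
    "c", "c++", "c-sharp", "go", "java",
    "javascript", "php", "python", "ruby",
}


def keep(record):
    """Return True if a record passes both the licence and language filter.

    Assumption: each record is a dict with hypothetical "license" and
    "language" string fields.
    """
    return (
        record["license"].lower() in PERMISSIVE_LICENSES
        and record["language"].lower() in SUPPORTED_LANGUAGES
    )


# Toy records, invented for this example.
records = [
    {"license": "mit", "language": "python", "code": "print('hi')"},
    {"license": "gpl-3.0", "language": "python", "code": "print('no')"},
    {"license": "isc", "language": "haskell", "code": "main = pure ()"},
]
filtered = [r for r in records if keep(r)]
print(len(filtered), "of", len(records), "records kept")
```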
## Training procedure
This checkpoint is trained on the unimodal code data in the first stage of pretraining, which covers a diverse set of pretraining tasks including span denoising and two variants of causal language modeling. Please refer to the paper for more details.
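To make the span denoising objective more concrete, here is a minimal sketch of how a T5-style input/target pair is built by replacing spans with `<extra_id_N>` sentinel tokens (the same sentinel format used in the usage example). The function name `span_denoise`, the whitespace tokenization, and the hand-picked spans are illustrative assumptions; the sampling strategy and exact format used to train CodeT5+ are described in the paper and may differ.

```python
def span_denoise(tokens, spans):
    """Build a T5-style (input, target) pair by masking the given spans.

    tokens: list of string tokens.
    spans: sorted, non-overlapping (start, end) index pairs, end exclusive.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked prefix
        inputs.append(sentinel)              # replace the span in the input
        targets.append(sentinel)             # announce the span in the target
        targets.extend(tokens[start:end])    # followed by the masked tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep any remaining tokens
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inputs, targets


tokens = "def add ( a , b ) : return a + b".split()
model_input, model_target = span_denoise(tokens, [(1, 2), (9, 12)])
print(" ".join(model_input))
print(" ".join(model_target))
```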
## Evaluation results
CodeT5+ models have been comprehensively evaluated on a wide range of code understanding and generation tasks in various settings: zero-shot, finetuning, and instruction-tuning. Specifically, CodeT5+ yields substantial performance gains on many downstream tasks compared to their SoTA baselines, e.g., 8 text-to-code retrieval tasks (+3.2 avg. MRR), 2 line-level code completion tasks (+2.1 avg. Exact Match), and 2 retrieval-augmented code generation tasks (+5.8 avg. BLEU-4). In 2 math programming tasks on MathQA-Python and GSM8K-Python, CodeT5+ models with fewer than one billion parameters significantly outperform many LLMs of up to 137B parameters. In particular, in the zero-shot text-to-code generation task on the HumanEval benchmark, InstructCodeT5+ 16B sets new SoTA results of 35.0% pass@1 and 54.5% pass@10 against other open code LLMs, even surpassing the closed-source OpenAI code-cushman-001 model. Please refer to the paper for more details.
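For readers unfamiliar with the metrics quoted above, the sketch below shows how pass@k and MRR are commonly computed. It uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c are correct, and the standard mean reciprocal rank. Whether the paper's evaluation harness uses exactly these formulations is an assumption; the function names are chosen for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples, c: number of correct samples, k <= n.
    """
    if n - c < k:
        # Fewer than k incorrect samples: any k draws include a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_reciprocal_rank(ranks):
    """MRR over queries, given the 1-based rank of the first relevant result."""
    return sum(1.0 / r for r in ranks) / len(ranks)


print(f"pass@1  with 3/10 correct: {pass_at_k(10, 3, 1):.3f}")
print(f"pass@5  with 3/10 correct: {pass_at_k(10, 3, 5):.3f}")
print(f"MRR for ranks [1, 2, 4]:   {mean_reciprocal_rank([1, 2, 4]):.3f}")
```

A benchmark-level score is typically the average of the per-problem pass@k values.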
## BibTeX entry and citation info
@article{wang2023codet5plus,
title={CodeT5+: Open Code Large Language Models for Code Understanding and Generation},
author={Wang, Yue and Le, Hung and Gotmare, Akhilesh Deepak and Bui, Nghi D.Q. and Li, Junnan and Hoi, Steven C. H.},
journal={arXiv preprint},
year={2023}
}