CodeT5+ 220M Bimodal Models
Model description
CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. encoder-only, decoder-only, and encoder-decoder) to support a wide range of code understanding and generation tasks. It is introduced in the paper:
CodeT5+: Open Code Large Language Models for Code Understanding and Generation by Yue Wang*, Hung Le*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi (* indicates equal contribution).
This repo provides the tokenizer, which includes the special tokens used by the pretraining tasks and has also been pretrained on Verilog code.
How to use
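from transformers import AutoTokenizer

# Load the tokenizer from the Hugging Face Hub.
# trust_remote_code=True allows any custom code shipped in the repo to run.
tokenizer_path = "zacharyxxxxcr/codet5p-220m-bimodal-verilog-word"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)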
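Once the tokenizer is loaded, a quick way to check how it handles Verilog is to encode a short snippet and look at the resulting tokens. The following is a minimal sketch, not part of the original repo: the helper name `inspect_tokenization`, the `max_length` default of 512, and the sample Verilog in the usage note are illustrative assumptions. It relies only on the standard `transformers` tokenizer calls (`__call__`, `convert_ids_to_tokens`, `decode`).

```python
def inspect_tokenization(tokenizer, source, max_length=512):
    """Encode `source` and return (ids, tokens, decoded_text).

    `tokenizer` is expected to behave like a Hugging Face tokenizer.
    The name of this helper and the max_length default are assumptions
    made for illustration only.
    """
    # Encode the text, truncating anything beyond max_length tokens.
    encoding = tokenizer(source, truncation=True, max_length=max_length)
    ids = encoding["input_ids"]

    # Map ids back to token strings to see how the code was split.
    tokens = tokenizer.convert_ids_to_tokens(ids)

    # Decode without special tokens to compare against the input.
    decoded = tokenizer.decode(ids, skip_special_tokens=True)
    return ids, tokens, decoded
```

For example, with the `tokenizer` loaded above, `ids, tokens, decoded = inspect_tokenization(tokenizer, "module and2(input a, input b, output y); assign y = a & b; endmodule")` returns the token ids, the token strings, and the decoded text, so you can see how Verilog keywords and identifiers are split.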
BibTeX entry and citation info
@article{wang2023codet5plus,
title={CodeT5+: Open Code Large Language Models for Code Understanding and Generation},
author={Wang, Yue and Le, Hung and Gotmare, Akhilesh Deepak and Bui, Nghi D.Q. and Li, Junnan and Hoi, Steven C. H.},
journal={arXiv preprint},
year={2023}
}