CodeT5+ 220M Bimodal Models

Model description

CodeT5+ is a new family of open code large language models with an encoder-decoder architecture that can flexibly operate in different modes (i.e. encoder-only, decoder-only, and encoder-decoder) to support a wide range of code understanding and generation tasks. It is introduced in the paper:

CodeT5+: Open Code Large Language Models for Code Understanding and Generation by Yue Wang*, Hung Le*, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi (* indicates equal contribution).

This repository contains the tokenizer, which includes the special tokens used by the pretraining tasks and has also been trained on Verilog code.

How to use

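from transformers import AutoTokenizer

# Hub repository that hosts this tokenizer
tokenizer_path = "zacharyxxxxcr/codet5p-220m-bimodal-verilog-word"

# trust_remote_code=True allows any custom code shipped with the repository to run
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)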
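
Once the tokenizer is loaded, a typical next step is to encode a Verilog snippet and look at the resulting tokens and the registered special tokens. The following is a minimal sketch, not part of the original repository: the helper names `load_tokenizer`, `encode_snippet`, and `list_special_tokens`, the `max_length` default of 512, and the sample `VERILOG_SNIPPET` are all assumptions chosen for illustration. Only standard `transformers` tokenizer calls are used.

```python
from typing import Any, Dict, List


def load_tokenizer(path: str = "zacharyxxxxcr/codet5p-220m-bimodal-verilog-word") -> Any:
    """Load the tokenizer from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the helpers below can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(path, trust_remote_code=True)


def encode_snippet(tokenizer: Any, code: str, max_length: int = 512) -> Dict[str, Any]:
    """Tokenize one code snippet; return its ids, token strings, and decoded text."""
    # max_length=512 is an assumed limit, not a value documented by this repository.
    encoded = tokenizer(code, truncation=True, max_length=max_length)
    input_ids = list(encoded["input_ids"])
    return {
        "input_ids": input_ids,
        "tokens": tokenizer.convert_ids_to_tokens(input_ids),
        "decoded": tokenizer.decode(input_ids, skip_special_tokens=True),
    }


def list_special_tokens(tokenizer: Any) -> List[str]:
    """Return every special token registered on the tokenizer."""
    return list(tokenizer.all_special_tokens)


# Hypothetical Verilog input, used only for illustration.
VERILOG_SNIPPET = (
    "module and_gate(input a, input b, output y);\n"
    "  assign y = a & b;\n"
    "endmodule\n"
)
```

To try it, call `tokenizer = load_tokenizer()`, then `encode_snippet(tokenizer, VERILOG_SNIPPET)` and `list_special_tokens(tokenizer)`. The exact tokens and special tokens returned depend on the vocabulary shipped with this repository, so no expected output is shown here.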

BibTeX entry and citation info

@article{wang2023codet5plus,
  title={CodeT5+: Open Code Large Language Models for Code Understanding and Generation},
  author={Wang, Yue and Le, Hung and Gotmare, Akhilesh Deepak and Bui, Nghi D.Q. and Li, Junnan and Hoi, Steven C. H.},
  journal={arXiv preprint},
  year={2023}
}