---
license: bsd-3-clause
---

# codegen-16B-action

codegen-16B-action is a 16-billion-parameter model for API-based action generation. It is instruction tuned from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) on API-based action generation datasets.

## Model Details

### Model Description

- **Developed by:** [SambaNova Systems](https://sambanova.ai/)
- **Model type:** Language Model
- **Language(s):** English
- **License:** bsd-3-clause
- **Finetuned from model:** [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono)

### Basic Information

- **Paper**: [Link]
- **Github**: [Link]

### Licensing

TBD

## Uses
### Direct Use

This model is intended for commercial and research use.

### Out-of-Scope Use

codegen-16B-action should NOT be used for purposes other than API-based action generation.

### Recommendations

Users should be made aware of the risks, biases, limitations, and restrictions of the model, which are listed at the bottom of the page.
---

## How to Get Started with the Model
### Loading the model with Hugging Face

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-action")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/codegen-16b-action", device_map="auto", torch_dtype="auto")
```

### Suggested Inference Parameters

- do_sample: False

### Suggested Prompts To Try in GPU Tutorial

```
Input text: Fenglu, can you add some?
```

```
Input text: What color is the wind at seventeen?
```
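With the model and tokenizer loaded as above, generation can look like the following minimal sketch. It uses greedy decoding per the suggested inference parameters; the example prompt and the `max_new_tokens` value are illustrative assumptions, since the suggested prompts are still to be added.

```python
import torch

# A hypothetical user request; the prompting style used for training is not
# documented above, so this prompt format is only illustrative.
prompt = "Create a calendar event for a meeting with the design team tomorrow at 3pm."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=False,          # greedy decoding, per the suggested inference parameters
        max_new_tokens=256,       # assumption; adjust to the expected action length
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (the model's proposed API action)
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```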
---

## Training Details
### Training Data

- [Fenglu to add](https://huggingface.co/datasets/laion/OIG)

### Training Procedure

We trained codegen-16b-action on four 80GB A100 GPUs, starting from [codegen-16B-mono](https://huggingface.co/Salesforce/codegen-16B-mono) and fine-tuning it on the XXX dataset. All of the code used to prepare the datasets and the scripts to run training and inference are open-sourced and freely available at [githublink here](dummy link).

### Prompting Style Used For Training

```
```

### Hyperparameters

- Hardware: A100 GPU
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 8
- Global Batch size: 16
- Batch tokens: 16 * 2048 = 32,768 tokens
- Learning Rate: 1e-5
- Learning Rate Scheduler: Fixed LR
- Weight decay: 0.1

**Instruction-tuned Training on Dolly 2.0 and Oasst1**

- Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
- Optimizer: AdamW
- Grad accumulation: 1
- Epochs: 3
- Global Batch size: 128
- Batch tokens: 128 * 2048 = 262,144 tokens
- Learning Rate: 1e-5
- Learning Rate Scheduler: Cosine Schedule with Warmup
- Warmup Steps: 0
- End Learning Ratio: 0.1
- Weight decay: 0.1
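For reference, the first hyperparameter block above corresponds roughly to the following Hugging Face `TrainingArguments` sketch. This is not the actual training script (which will be linked from the repository above); the per-device batch split and the bf16 setting are assumptions made to match the listed global batch size on four A100s.

```python
from transformers import TrainingArguments

# Sketch of the A100 fine-tuning configuration listed above.
training_args = TrainingArguments(
    output_dir="codegen-16b-action",
    num_train_epochs=8,
    per_device_train_batch_size=4,   # assumption: 4 GPUs * 4 = global batch size 16
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    lr_scheduler_type="constant",    # "Fixed LR"
    weight_decay=0.1,
    optim="adamw_torch",
    bf16=True,                       # assumption: common choice on A100s, not stated above
)
```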
## Acknowledgment

## Cite codegen-16b-action

```
@software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
  author = {SambaNova Systems, Together Computer},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
  month = {5},
  year = {2023},
  version = {1.0},
}
```