license: bsd-3-clause

codegen-16B-action

codegen-16B-action is a 16-billion-parameter model for API-based action generation. It is instruction-tuned from codegen-16B-mono on API-based action generation datasets.

Model Details

Model Description

Developed by: SambaNova Systems
Model type: Language Model
Language(s): English
License:
Finetuned from model: codegen-16B-mono

Basic Information

Paper: [Link]
Github: [Link]

Licensing

TBD

Uses

Direct Use

This model is intended for commercial and research use.

Out-of-Scope Use

codegen-16B-action should NOT be used for purposes other than API-based action generation.

Recommendations

Users should be made aware of the model's risks, biases, limitations, and restrictions, which are listed at the bottom of this page.

How to Get Started with the Model

Loading the model with Hugging Face

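from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the model weights from the Hugging Face Hub.
# device_map="auto" requires the accelerate package to be installed.
tokenizer = AutoTokenizer.from_pretrained("sambanovasystems/codegen-16b-action")
model = AutoModelForCausalLM.from_pretrained("sambanovasystems/codegen-16b-action", device_map="auto", torch_dtype="auto")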

Suggested Inference Parameters

do_sample: False
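
Setting do_sample to False turns off sampling. With the default single beam, generation then picks the highest-scoring token at each step (greedy decoding), so the output is deterministic for a given prompt. The selection rule can be sketched in plain Python as below; the `greedy_pick` helper and the logits are made up for illustration and are not the library's implementation.

```python
def greedy_pick(logits):
    """Return the index of the largest logit (ties go to the lowest index)."""
    best_index = 0
    for index, value in enumerate(logits):
        if value > logits[best_index]:
            best_index = index
    return best_index


# Made-up scores over a toy 4-token vocabulary, for illustration only.
toy_logits = [0.1, 2.3, -1.0, 2.1]
chosen = greedy_pick(toy_logits)
print(chosen)
```

With the model loaded as shown earlier, the corresponding setting is passing `do_sample=False` to `model.generate`.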

Training Details

Training Data

Fenglu to add

Training Procedure

We trained codegen-16b-action on four 80GB A100 GPUs, starting from codegen-16B-mono and fine-tuning it on the XXX dataset. All of the code used to prepare the datasets, along with the training and inference scripts, is open-sourced and freely available at [githublink here](dummy link).

Prompting Style Used For Training

Hyperparameters

Hardware: A100 GPU
Optimizer: AdamW
Grad accumulation: 1
Epochs: 8
Global Batch size: 16
Batch tokens: 16 * 2048 = 32,768 tokens
Learning Rate: 1e-5
Learning Rate Scheduler: Fixed LR
Weight decay: 0.1
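
The batch-token figure above is the global batch size multiplied by 2048, which is assumed here to be the per-example sequence length in tokens. A small sketch of that arithmetic, with a hypothetical `tokens_per_batch` helper:

```python
def tokens_per_batch(global_batch_size, sequence_length):
    """Tokens processed per optimizer step when every example is full length."""
    return global_batch_size * sequence_length


# 2048 is taken from the "16 * 2048" figure; reading it as the sequence
# length is an assumption.
finetune_tokens = tokens_per_batch(16, 2048)
print(finetune_tokens)
```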

Instruction-tuned Training on Dolly 2.0 and Oasst1

Hardware: SambaNova Reconfigurable Dataflow Unit (RDU)
Optimizer: AdamW
Grad accumulation: 1
Epochs: 3
Global Batch size: 128
Batch tokens: 128 * 2048 = 262,144 tokens
Learning Rate: 1e-5
Learning Rate Scheduler: Cosine Schedule with Warmup
Warmup Steps: 0
End Learning Ratio: 0.1
Weight decay: 0.1
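
To make the schedule settings above concrete, here is a minimal sketch of a cosine schedule with warmup. It assumes "End Learning Ratio: 0.1" means the learning rate decays to 0.1 times its peak value. The `cosine_lr` function and the total step count are hypothetical; the actual training code may differ.

```python
import math


def cosine_lr(step, total_steps, peak_lr=1e-5, warmup_steps=0, end_ratio=0.1):
    """Learning rate at `step`: linear warmup, then cosine decay to a floor."""
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp up to the peak; skipped entirely when warmup_steps is 0.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = peak_lr * end_ratio  # assumed meaning of "End Learning Ratio"
    return floor + (peak_lr - floor) * cosine


TOTAL_STEPS = 1000  # hypothetical; the real step count is not stated
start_lr = cosine_lr(0, TOTAL_STEPS)
end_lr = cosine_lr(TOTAL_STEPS, TOTAL_STEPS)
```

Under these assumptions the rate starts at the peak of 1e-5 (there is no warmup ramp because Warmup Steps is 0) and ends at about 1e-6.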

Acknowledgment

Cite codegen-16b-action

@software{bloomchat,
  title = {{BLOOMChat: a New Open Multilingual Chat LLM}},
  author = {SambaNova Systems, Together Computer},
  url = {https://huggingface.co/sambanovasystems/BLOOMChat-176B-v1},
  month = {5},
  year = {2023},
  version = {1.0},
}
