---
language:
- en
license: mit
tags:
- code
- data science
datasets:
- ed001/ds-coder-instruct-v2
pipeline_tag: text-generation
---

# datagemma-2b
datagemma-2b is a model for data science code generation from natural language instructions. It is fine-tuned from the codegemma-2b model on the [ed001/ds-coder-instruct-v2](https://huggingface.co/datasets/ed001/ds-coder-instruct-v2) dataset, which was constructed by filtering publicly available datasets on Hugging Face.
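If you want to inspect the training data directly, it can be loaded with the `datasets` library. This is a small sketch, assuming the dataset exposes a default `train` split; the column names are printed rather than assumed:

```python
from datasets import load_dataset

# Load the instruction-tuning dataset used for fine-tuning (assuming a "train" split).
ds = load_dataset("ed001/ds-coder-instruct-v2", split="train")

# Inspect the schema and one sample record.
print(ds.column_names)
print(ds[0])
```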
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

# Load the model on GPU.
model = AutoModelForCausalLM.from_pretrained(
    "ed001/datagemma-2b",
    low_cpu_mem_usage=True
).cuda()

# Load the tokenizer.
tokenizer = AutoTokenizer.from_pretrained("ed001/datagemma-2b", trust_remote_code=True)
tokenizer.padding_side = "right"

# Use the question/answer prompt format the model was fine-tuned with.
prompt_template = "### Question: {}\n ### Answer: "
generation_config = GenerationConfig(max_new_tokens=512, top_p=0.5, do_sample=True, repetition_penalty=1)

prompt = "How can I profile the speed of my neural network using PyTorch?"
input_ids = tokenizer(prompt_template.format(prompt), return_tensors="pt").to(model.device)["input_ids"]

print(tokenizer.decode(model.generate(input_ids, generation_config=generation_config)[0]))
```
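The call above prints the full sequence, prompt included. If you only want the model's answer, one option (a small convenience sketch continuing the snippet above, not part of the original example) is to slice off the prompt tokens before decoding:

```python
# Keep only the newly generated tokens (everything after the prompt).
output = model.generate(input_ids, generation_config=generation_config)
answer = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(answer)
```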
## Training Details

The model was fine-tuned with LoRA using the following hyperparameters:

- lora_r: 32
- lora_alpha: 16
- lora_dropout: 0.05
- target_modules: q, k, v, o, gate_proj, down_proj, up_proj
- weight_decay: 0
- optimizer: paged_adamw_8bit
- lr: 1e-4
- lr_scheduler: cosine
- max_seq_len: 1536
- batch_size: 1
- grad_acc: 4
- max_grad_norm: 0.5
- warmup_ratio: 0.05
- num_epochs: 1
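For readers who want to reproduce a similar setup, the sketch below shows how these hyperparameters could be expressed with `peft` and `transformers`. It is an illustration under assumptions, not the original training script: the expanded module names (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the output directory are guesses, and `max_seq_len` would be passed to whichever trainer is used (for example TRL's `SFTTrainer`).

```python
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA adapter configuration (module names expanded from q, k, v, o above; assumed).
lora_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "down_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

# Optimization settings matching the list above.
training_args = TrainingArguments(
    output_dir="datagemma-2b-lora",  # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.0,
    max_grad_norm=0.5,
    num_train_epochs=1,
    optim="paged_adamw_8bit",
)
```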
## Contact

GitHub: [Ea0011](https://github.com/Ea0011)