---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek-coder-1.3b-typescript
  results: []
datasets:
- bigcode/the-stack-dedup
widget:
- text: "class Person {\n constructor(public name:"
  example_title: "class"
- text: "function quickSort"
  example_title: "function"
---

CodeGPT: DeepSeek Coder - TypeScript

[CodeGPT.co] | [šŸ¦™ Ollama] | [Discord] | [VSCode Extension]


[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config

axolotl version: `0.3.0`

```yaml
base_model: deepseek-ai/deepseek-coder-1.3b-base
model_type: AutoModelForCausalLM
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: CodeGPTPlus/typescript-0-500000-seq1024
    type: completion
    field: text
val_set_size: 0.001
output_dir: ./fft-out

sequence_len: 1024

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: deepseek_1.3_fft
wandb_entity:
wandb_watch:
wandb_name: aws_a10g
wandb_log_model: end

gradient_accumulation_steps: 2
micro_batch_size: 20
num_epochs: 1
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 0.000001
max_grad_norm: 1.0
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
hub_strategy: every_save
warmup_ratio: 0.01
evals_per_epoch: 20
saves_per_epoch: 3
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<ļ½œbeginā–ofā–sentenceļ½œ>"
  eos_token: "<ļ½œendā–ofā–sentenceļ½œ>"
  pad_token: "<ļ½œendā–ofā–sentenceļ½œ>"
```
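
For reference, the training corpus named in the config (`CodeGPTPlus/typescript-0-500000-seq1024`) can be previewed with the `datasets` library. The snippet below is a small sketch, assuming the dataset exposes a `train` split and a `text` column (as `field: text` in the config suggests):

```python
from datasets import load_dataset

# Stream a couple of examples from the fine-tuning corpus referenced in the
# config above, without downloading the full dataset.
# Note: the "train" split name is an assumption.
ds = load_dataset("CodeGPTPlus/typescript-0-500000-seq1024", split="train", streaming=True)

for example in ds.take(2):
    print(example["text"][:300])
    print("-" * 40)
```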

# deepseek-coder-1.3b-typescript

CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), crafted by the CodeGPT team to generate expert TypeScript code. Fine-tuned specifically for TypeScript on a 0.5B-token dataset, the model produces precise and efficient solutions in this language. A 16K window size and an additional fill-in-the-middle task are employed to deliver project-level code completion. It is a strong choice for anyone seeking a TypeScript-specialized code generator backed by the CodeGPT team.

It achieves the following results on the evaluation set:
- Loss: 0.7681

**Model Developers** CodeGPT Team

**Variations** 1.3B

**Input** Models input text only.

**Output** Models generate text only.

## How to Use

This model is for completion purposes only. Here are some examples of how to use it.

#### Running the model on a GPU

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                             trust_remote_code=True).cuda()

input_text = """<ļ½œfimā–beginļ½œ>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<ļ½œfimā–holeļ½œ>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<ļ½œfimā–endļ½œ>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Fill In the Middle (FIM)

For infilling, wrap the code before and after the cursor in the FIM special tokens, with `<ļ½œfimā–holeļ½œ>` marking where the completion should be generated (a minimal helper sketch for building this prompt programmatically is included at the end of this card):

```
<ļ½œfimā–beginļ½œ>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<ļ½œfimā–holeļ½œ>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<ļ½œfimā–endļ½œ>
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745        | 0.0   | 1     | 0.8681          |
| 1.2267        | 0.05  | 1308  | 0.8130          |
| 1.1594        | 0.1   | 2616  | 0.8018          |
| 0.7674        | 0.15  | 3924  | 0.7942          |
| 0.6443        | 0.2   | 5232  | 0.7889          |
| 0.9155        | 0.25  | 6540  | 0.7847          |
| 0.7501        | 0.3   | 7848  | 0.7819          |
| 0.8835        | 0.35  | 9156  | 0.7792          |
| 0.7261        | 0.4   | 10464 | 0.7769          |
| 0.9746        | 0.45  | 11772 | 0.7748          |
| 0.6884        | 0.5   | 13080 | 0.7734          |
| 0.6104        | 0.55  | 14388 | 0.7722          |
| 0.8876        | 0.6   | 15696 | 0.7710          |
| 0.9567        | 0.65  | 17004 | 0.7703          |
| 0.6915        | 0.7   | 18312 | 0.7696          |
| 0.8874        | 0.75  | 19620 | 0.7691          |
| 0.6124        | 0.8   | 20928 | 0.7686          |
| 0.8147        | 0.85  | 22236 | 0.7684          |
| 0.8021        | 0.9   | 23544 | 0.7683          |
| 0.8665        | 0.95  | 24852 | 0.7681          |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0
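
## FIM prompt helper (sketch)

As referenced in the Fill In the Middle section above, the snippet below is a minimal sketch of how a FIM prompt can be assembled from a prefix and suffix and how the generated infill can be extracted. The helper name `fim_complete`, the `greet` example, and the generation settings are illustrative choices, not part of an official API.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "CodeGPTPlus/deepseek-coder-1.3b-typescript"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True).cuda()


def fim_complete(prefix: str, suffix: str, max_new_tokens: int = 128) -> str:
    """Illustrative helper: wrap prefix/suffix in the FIM special tokens and
    return only the text generated for the hole."""
    prompt = f"<ļ½œfimā–beginļ½œ>{prefix}<ļ½œfimā–holeļ½œ>{suffix}<ļ½œfimā–endļ½œ>"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so only the completion for the hole remains.
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(completion, skip_special_tokens=True)


# Hypothetical usage: complete the body of a TypeScript function.
prefix = "function greet(name: string): string {\n"
suffix = "\n}"
print(fim_complete(prefix, suffix))
```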