---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek_coder_1.3b_typescript
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: deepseek-ai/deepseek-coder-1.3b-base
model_type: AutoModelForCausalLM
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: CodeGPTPlus/typescript-0-500000-seq1024
    type: completion
    field: text

# dataset_prepared_path:
# pretraining_dataset: CodeGPTPlus/typescript-0-500000-seq1024
val_set_size: 0.001
output_dir: ./fft-out

sequence_len: 1024

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: deepseek_1.3_fft
wandb_entity:
wandb_watch:
wandb_name: aws_a10g
wandb_log_model: end

gradient_accumulation_steps: 2
micro_batch_size: 20
num_epochs: 1
# max_steps: 1  # REMOVE IT
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 0.000001
max_grad_norm: 1.0
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
hub_strategy: every_save

warmup_ratio: 0.01
evals_per_epoch: 20
saves_per_epoch: 3
debug:
deepspeed:
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
  pad_token: "<|end▁of▁sentence|>"
  # fim_prefix: "<|fim▁begin|>"
  # fim_middle: "<|fim▁hole|>"
  # fim_suffix: "<|fim▁end|>"
```

</details><br>
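For reference, the training corpus declared under `datasets:` above can be inspected with the Hugging Face `datasets` library. This is a minimal sketch and not part of the original training pipeline; the `train` split name is an assumption.

```python
# Minimal sketch: inspect the dataset referenced in the axolotl config above.
# Assumes the dataset exposes a "train" split; the "text" column corresponds to
# the `field: text` setting in the config.
from datasets import load_dataset

ds = load_dataset("CodeGPTPlus/typescript-0-500000-seq1024", split="train")
print(ds)                    # number of rows and column names
print(ds[0]["text"][:500])   # preview the first sample's text
```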

# deepseek_coder_1.3b_typescript

This model is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base) on the [CodeGPTPlus/typescript-0-500000-seq1024](https://huggingface.co/datasets/CodeGPTPlus/typescript-0-500000-seq1024) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7681

## Model description

A full-parameter fine-tune of DeepSeek Coder 1.3B for TypeScript code completion (no LoRA adapter is configured; see the axolotl config above), trained with a 1024-token sequence length.

## Intended uses & limitations

The model is intended for TypeScript code completion; a usage sketch is included at the end of this card. More information needed on limitations.

## Training and evaluation data

The training data is the [CodeGPTPlus/typescript-0-500000-seq1024](https://huggingface.co/datasets/CodeGPTPlus/typescript-0-500000-seq1024) dataset, with 0.1% of samples held out as the evaluation set (`val_set_size: 0.001`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745        | 0.0   | 1     | 0.8681          |
| 1.2267        | 0.05  | 1308  | 0.8130          |
| 1.1594        | 0.1   | 2616  | 0.8018          |
| 0.7674        | 0.15  | 3924  | 0.7942          |
| 0.6443        | 0.2   | 5232  | 0.7889          |
| 0.9155        | 0.25  | 6540  | 0.7847          |
| 0.7501        | 0.3   | 7848  | 0.7819          |
| 0.8835        | 0.35  | 9156  | 0.7792          |
| 0.7261        | 0.4   | 10464 | 0.7769          |
| 0.9746        | 0.45  | 11772 | 0.7748          |
| 0.6884        | 0.5   | 13080 | 0.7734          |
| 0.6104        | 0.55  | 14388 | 0.7722          |
| 0.8876        | 0.6   | 15696 | 0.7710          |
| 0.9567        | 0.65  | 17004 | 0.7703          |
| 0.6915        | 0.7   | 18312 | 0.7696          |
| 0.8874        | 0.75  | 19620 | 0.7691          |
| 0.6124        | 0.8   | 20928 | 0.7686          |
| 0.8147        | 0.85  | 22236 | 0.7684          |
| 0.8021        | 0.9   | 23544 | 0.7683          |
| 0.8665        | 0.95  | 24852 | 0.7681          |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0
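## Example usage

The snippet below is a minimal inference sketch using the standard `transformers` API, with `trust_remote_code=True` mirroring the training config above. The prompt and generation settings are illustrative and not taken from the original training setup.

```python
# Minimal inference sketch (illustrative prompt and generation settings).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CodeGPTPlus/deepseek_coder_1.3b_typescript"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Ask the model to complete a TypeScript function body.
prompt = "function mergeSorted(a: number[], b: number[]): number[] {\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```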