Vedx04 committed on
Commit
f74d553
1 Parent(s): 31438ef

End of training

Files changed (1)
  1. README.md +19 -46
README.md CHANGED
@@ -5,71 +5,44 @@ tags:
5
  - generated_from_trainer
6
  base_model: bigcode/starcoderbase-1b
7
  model-index:
8
- - name: Peft-starcoder-lora-P100
9
  results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
- # Peft-starcoder-1b-Lora-P100
16
-
17
- We will fine-tune the StarCoderBase-1B model, which is trained on 80+ programming languages.
18
-
19
- Because these models have a large number of trainable parameters (**1B** for this model), fully fine-tuning them requires substantial compute and a powerful GPU. To work around this we use **PEFT** (Parameter-Efficient Fine-Tuning) with a **LoRA** (Low-Rank Adaptation) config, which drastically reduces the number of trainable parameters, so the model can be tuned in an ordinary Colab notebook.
20
- Below is a brief walkthrough of how to build your own copilot that can autocomplete your code.
21
-
22
- To learn more about LoRA and PEFT, see the conceptual guide:
23
- (https://huggingface.co/docs/peft/conceptual_guides/lora)
24
-
25
- You can access the notebook here: (https://colab.research.google.com/drive/1GS4p4bFGhwq3JpHU2GhCyRFSfmujIG9n)
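As a quick, hedged illustration (a minimal sketch based on the `peft` API, not the notebook's exact code), a LoRA adapter can be attached to the base model like this:

```python
# Minimal sketch: attach a LoRA adapter to StarCoderBase-1B with peft.
# Loading the base model may require a Hugging Face token if the repo is gated.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b")

lora_config = LoraConfig(
    r=4,               # LORA_R from the hyperparameters listed below
    lora_alpha=8,      # LORA_ALPHA
    lora_dropout=0.1,  # LORA_DROPOUT
    target_modules=["c_proj", "c_attn", "q_attn", "c_fc"],  # as listed below (duplicate removed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```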
26
 
27
 
28
  ## Model description
29
 
30
- Peft-starcoder-lora-P100 is a fine-tuned version of starcoderbase-1b, trained on the 'smangrul/hf-stack-v1' dataset.
31
- The model is essentially a small replica of GitHub Copilot: it can autocomplete your Python code (machine-learning related); see the usage sketch below.
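A hedged usage sketch (not part of the original card): load the LoRA adapter on top of the base model and let it complete a Python snippet. The adapter repo id below is an assumption derived from the model name and may differ.

```python
# Usage sketch; ADAPTER_ID is an assumption, replace it with the actual adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "bigcode/starcoderbase-1b"
ADAPTER_ID = "Vedx04/peft-starcoder-lora-T4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

prompt = "def train_test_split_df(df, test_size=0.2):\n    "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```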
 
32
 
 
33
 
34
  ## Training and evaluation data
35
 
36
- As mentioned above, the model is trained on the 'smangrul/hf-stack-v1' dataset. The dataset contains about 24k rows, of which 4k were used for evaluation; the remaining rows were shuffled
37
- and used to train the model (a sketch of this split is shown below).
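A sketch of that split, assuming the Hugging Face `datasets` library (the notebook may stream or split the data differently):

```python
# Assumed reconstruction of the train/eval split described above.
from datasets import load_dataset

dataset = load_dataset("smangrul/hf-stack-v1", split="train")  # ~24k rows
dataset = dataset.shuffle(seed=42)

eval_data = dataset.select(range(4000))                 # 4k rows held out for evaluation
train_data = dataset.select(range(4000, len(dataset)))  # remaining rows used for training
print(len(train_data), len(eval_data))
```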
38
 
39
  ## Training procedure
40
 
41
- The training procedure used the base model "bigcode/starcoderbase-1b" and fine-tuned it on the "smangrul/hf-stack-v1" dataset. Due to limited computational resources, PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation) was employed: instead of updating all of the base model's weights, LoRA trains small low-rank adapter matrices injected into selected attention and projection layers, which greatly reduces memory and compute requirements. After setting the hyperparameters and LoRA configuration below, the model was trained on a P100 GPU.
42
-
43
  ### Training hyperparameters
44
 
45
- The following hyperparameters were used during training (a sketch mapping them to `transformers` settings follows the list):
46
-
47
- **Training arguments**
48
- - SEQ_LENGTH = 2048
49
- - MAX_STEPS = 100
50
- - BATCH_SIZE = 4
51
- - GR_ACC_STEPS = 1
52
- - LR = 5e-4
53
- - LR_SCHEDULER_TYPE = "cosine"
54
- - WEIGHT_DECAY = 0.01
55
- - NUM_WARMUP_STEPS = 30
56
- - EVAL_FREQ = 100
57
- - SAVE_FREQ = 100
58
- - LOG_FREQ = 25
59
- - OUTPUT_DIR = "peft-starcoder-lora-P100"
60
-
61
- **Precision** (set `BF16` to `True` on an A100)
62
- - BF16 = False
63
- - FP16 = False
64
-
65
- - FIM_RATE=0.5
66
- - FIM_SPM_RATE=0.5
67
-
68
- **LoRA config**
69
- - LORA_R = 4
70
- - LORA_ALPHA = 8
71
- - LORA_DROPOUT = 0.1
72
- - LORA_TARGET_MODULES = "c_proj,c_attn,q_attn,c_fc,c_proj"
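As a hedged illustration (not the notebook's exact code), the constants above map onto `transformers` settings roughly as follows:

```python
# Rough, assumed mapping of the constants above to TrainingArguments.
# Argument names follow recent transformers releases; some versions rename
# evaluation_strategy to eval_strategy.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="peft-starcoder-lora-P100",  # OUTPUT_DIR
    max_steps=100,                          # MAX_STEPS
    per_device_train_batch_size=4,          # BATCH_SIZE
    gradient_accumulation_steps=1,          # GR_ACC_STEPS
    learning_rate=5e-4,                     # LR
    lr_scheduler_type="cosine",             # LR_SCHEDULER_TYPE
    weight_decay=0.01,                      # WEIGHT_DECAY
    warmup_steps=30,                        # NUM_WARMUP_STEPS
    evaluation_strategy="steps",
    eval_steps=100,                         # EVAL_FREQ
    save_steps=100,                         # SAVE_FREQ
    logging_steps=25,                       # LOG_FREQ
    bf16=False,                             # set True on an A100
    fp16=False,
)
```

SEQ_LENGTH, FIM_RATE and FIM_SPM_RATE are not `TrainingArguments`; they belong to the data preparation step (packing examples into 2048-token sequences and applying fill-in-the-middle augmentation), and the LoRA constants are the values used in the adapter sketch earlier in this card.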
73
 
74
  ### Training results
75
 
 
5
  - generated_from_trainer
6
  base_model: bigcode/starcoderbase-1b
7
  model-index:
8
+ - name: peft-starcoder-lora-T4
9
  results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
+ # peft-starcoder-lora-T4
16
 
17
+ This model is a fine-tuned version of [bigcode/starcoderbase-1b](https://huggingface.co/bigcode/starcoderbase-1b) on an unknown dataset.
18
+ It achieves the following results on the evaluation set:
19
+ - Loss: 0.9165
20
 
21
  ## Model description
22
 
23
+ More information needed
24
+
25
+ ## Intended uses & limitations
26
 
27
+ More information needed
28
 
29
  ## Training and evaluation data
30
 
31
+ More information needed
 
32
 
33
  ## Training procedure
34
 
 
 
35
  ### Training hyperparameters
36
 
37
+ The following hyperparameters were used during training:
38
+ - learning_rate: 0.0005
39
+ - train_batch_size: 4
40
+ - eval_batch_size: 4
41
+ - seed: 42
42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
43
+ - lr_scheduler_type: cosine
44
+ - lr_scheduler_warmup_steps: 30
45
+ - training_steps: 100
46
 
47
  ### Training results
48