Vedx04 committed on
Commit
f74d553
1 Parent(s): 31438ef

End of training

Files changed (1)
  1. README.md +19 -46
README.md CHANGED
@@ -5,71 +5,44 @@ tags:
5
  - generated_from_trainer
6
  base_model: bigcode/starcoderbase-1b
7
  model-index:
8
- - name: Peft-starcoder-lora-P100
9
  results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
- # Peft-starcoder-1b-Lora-P100
16
-
17
- We will fine-tune the StarCoderBase-1B model, which is trained on 80+ programming languages.
18
-
19
- Because these models have a large number of trainable parameters (**1B** for this model), fully fine-tuning them requires substantial compute and a powerful GPU. To work around this we use **PEFT** (Parameter-Efficient Fine-Tuning) with a **LoRA** (Low-Rank Adaptation) config, which drastically reduces the number of trainable parameters, so the model can be tuned in an ordinary Colab notebook.
20
- Below is a brief walkthrough of how to build your own copilot that can autocomplete your code.
21
-
22
- To learn more about LoRA and PEFT, see the conceptual guide:
23
- (https://huggingface.co/docs/peft/conceptual_guides/lora)
24
-
25
- You can access the notebook here: (https://colab.research.google.com/drive/1GS4p4bFGhwq3JpHU2GhCyRFSfmujIG9n)
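As a quick, hedged illustration (a minimal sketch based on the `peft` API, not the notebook's exact code), a LoRA adapter can be attached to the base model like this:

```python
# Minimal sketch: attach a LoRA adapter to StarCoderBase-1B with peft.
# Loading the base model may require a Hugging Face token if the repo is gated.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b")

lora_config = LoraConfig(
    r=4,               # LORA_R from the hyperparameters listed below
    lora_alpha=8,      # LORA_ALPHA
    lora_dropout=0.1,  # LORA_DROPOUT
    target_modules=["c_proj", "c_attn", "q_attn", "c_fc"],  # as listed below (duplicate removed)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```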
26
 
27
 
28
  ## Model description
29
 
30
- Peft-starcoder-lora-P100 is a fine-tuned version of starcoderbase-1b, trained on the 'smangrul/hf-stack-v1' dataset.
31
- The model is essentially a small replica of GitHub Copilot: it can autocomplete your Python code (machine-learning related); see the usage sketch below.
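A hedged usage sketch (not part of the original card): load the LoRA adapter on top of the base model and let it complete a Python snippet. The adapter repo id below is an assumption derived from the model name and may differ.

```python
# Usage sketch; ADAPTER_ID is an assumption, replace it with the actual adapter repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "bigcode/starcoderbase-1b"
ADAPTER_ID = "Vedx04/peft-starcoder-lora-T4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_ID)
model.eval()

prompt = "def train_test_split_df(df, test_size=0.2):\n    "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```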
 
32
 
 
33
 
34
  ## Training and evaluation data
35
 
36
- As mentioned above, the model is trained on the 'smangrul/hf-stack-v1' dataset. The dataset contains about 24k rows, of which 4k were used for evaluation; the remaining rows were shuffled
37
- and used to train the model (a sketch of this split is shown below).
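A sketch of that split, assuming the Hugging Face `datasets` library (the notebook may stream or split the data differently):

```python
# Assumed reconstruction of the train/eval split described above.
from datasets import load_dataset

dataset = load_dataset("smangrul/hf-stack-v1", split="train")  # ~24k rows
dataset = dataset.shuffle(seed=42)

eval_data = dataset.select(range(4000))                 # 4k rows held out for evaluation
train_data = dataset.select(range(4000, len(dataset)))  # remaining rows used for training
print(len(train_data), len(eval_data))
```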
38
 
39
  ## Training procedure
40
 
41
- The training procedure used the base model "bigcode/starcoderbase-1b" and fine-tuned it on the "smangrul/hf-stack-v1" dataset. Due to limited computational resources, PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation) was employed: instead of updating all of the base model's weights, LoRA trains small low-rank adapter matrices injected into selected attention and projection layers, which greatly reduces memory and compute requirements. After setting the hyperparameters and LoRA configuration below, the model was trained on a P100 GPU.
42
-
43
  ### Training hyperparameters
44
 
45
- The following hyperparameters were used during training (a sketch mapping them to `transformers` settings follows the list):
46
-
47
- **Training arguments**
48
- - SEQ_LENGTH = 2048
49
- - MAX_STEPS = 100
50
- - BATCH_SIZE = 4
51
- - GR_ACC_STEPS = 1
52
- - LR = 5e-4
53
- - LR_SCHEDULER_TYPE = "cosine"
54
- - WEIGHT_DECAY = 0.01
55
- - NUM_WARMUP_STEPS = 30
56
- - EVAL_FREQ = 100
57
- - SAVE_FREQ = 100
58
- - LOG_FREQ = 25
59
- - OUTPUT_DIR = "peft-starcoder-lora-P100"
60
-
61
- **Precision** (set `BF16` to `True` on an A100)
62
- - BF16 = False
63
- - FP16 = False
64
-
65
- - FIM_RATE=0.5
66
- - FIM_SPM_RATE=0.5
67
-
68
- **LoRA config**
69
- - LORA_R = 4
70
- - LORA_ALPHA = 8
71
- - LORA_DROPOUT = 0.1
72
- - LORA_TARGET_MODULES = "c_proj,c_attn,q_attn,c_fc,c_proj"
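As a hedged illustration (not the notebook's exact code), the constants above map onto `transformers` settings roughly as follows:

```python
# Rough, assumed mapping of the constants above to TrainingArguments.
# Argument names follow recent transformers releases; some versions rename
# evaluation_strategy to eval_strategy.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="peft-starcoder-lora-P100",  # OUTPUT_DIR
    max_steps=100,                          # MAX_STEPS
    per_device_train_batch_size=4,          # BATCH_SIZE
    gradient_accumulation_steps=1,          # GR_ACC_STEPS
    learning_rate=5e-4,                     # LR
    lr_scheduler_type="cosine",             # LR_SCHEDULER_TYPE
    weight_decay=0.01,                      # WEIGHT_DECAY
    warmup_steps=30,                        # NUM_WARMUP_STEPS
    evaluation_strategy="steps",
    eval_steps=100,                         # EVAL_FREQ
    save_steps=100,                         # SAVE_FREQ
    logging_steps=25,                       # LOG_FREQ
    bf16=False,                             # set True on an A100
    fp16=False,
)
```

SEQ_LENGTH, FIM_RATE and FIM_SPM_RATE are not `TrainingArguments`; they belong to the data preparation step (packing examples into 2048-token sequences and applying fill-in-the-middle augmentation), and the LoRA constants are the values used in the adapter sketch earlier in this card.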
73
 
74
  ### Training results
75
 
 
5
  - generated_from_trainer
6
  base_model: bigcode/starcoderbase-1b
7
  model-index:
8
+ - name: peft-starcoder-lora-T4
9
  results: []
10
  ---
11
 
12
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
13
  should probably proofread and complete it, then remove this comment. -->
14
 
15
+ # peft-starcoder-lora-T4
16
 
17
+ This model is a fine-tuned version of [bigcode/starcoderbase-1b](https://huggingface.co/bigcode/starcoderbase-1b) on an unknown dataset.
18
+ It achieves the following results on the evaluation set:
19
+ - Loss: 0.9165
20
 
21
  ## Model description
22
 
23
+ More information needed
24
+
25
+ ## Intended uses & limitations
26
 
27
+ More information needed
28
 
29
  ## Training and evaluation data
30
 
31
+ More information needed
 
32
 
33
  ## Training procedure
34
 
 
 
35
  ### Training hyperparameters
36
 
37
+ The following hyperparameters were used during training:
38
+ - learning_rate: 0.0005
39
+ - train_batch_size: 4
40
+ - eval_batch_size: 4
41
+ - seed: 42
42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
43
+ - lr_scheduler_type: cosine
44
+ - lr_scheduler_warmup_steps: 30
45
+ - training_steps: 100
46
 
47
  ### Training results
48