End of training
README.md CHANGED
@@ -5,71 +5,44 @@ tags:
- generated_from_trainer
base_model: bigcode/starcoderbase-1b
model-index:
- name:
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

#

We will fine-tune the StarCoderBase-1B model, which was trained on 80+ programming languages.

Because these models have a lot of trainable parameters (**1B** for this one), full fine-tuning needs significant compute and a powerful GPU. To get around this we use **PEFT** (Parameter-Efficient Fine-Tuning) with a **LoRA** (Low-Rank Adaptation) config, which cuts the number of trainable parameters dramatically, so the model can be tuned in an ordinary Colab notebook.

Below is a brief walkthrough of how to make your own copilot that can autocomplete your code.

To find out more about LoRA and PEFT, see the [conceptual guide](https://huggingface.co/docs/peft/conceptual_guides/lora).

You can access the notebook here: https://colab.research.google.com/drive/1GS4p4bFGhwq3JpHU2GhCyRFSfmujIG9n
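As a first step, here is a minimal sketch of the PEFT + LoRA setup (not the exact notebook code; the LoRA values mirror the hyperparameter list further down this card):

```python
# Minimal sketch: attach a LoRA adapter to StarCoderBase-1B and check how few
# parameters are actually trainable. The checkpoint is gated, so you may need
# to accept its licence on the Hub and run `huggingface-cli login` first.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "bigcode/starcoderbase-1b"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

lora_config = LoraConfig(
    r=4,
    lora_alpha=8,
    lora_dropout=0.1,
    target_modules=["c_proj", "c_attn", "q_attn", "c_fc"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```

Only the small adapter matrices are trained; the 1B base weights stay frozen, which is what makes the run fit on a single modest GPU.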
## Model description
## Training and evaluation data

The dataset `smangrul/hf-stack-v1` was used to train the model.
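As an illustration, the dataset can be pulled straight from the Hub (a minimal sketch; the notebook may stream, filter, or split it differently):

```python
# Minimal sketch: load the training corpus from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("smangrul/hf-stack-v1", split="train")
print(dataset)     # shows the available columns and number of rows
print(dataset[0])  # inspect one raw example before tokenization
```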
## Training procedure

The model was trained by fine-tuning the base model `bigcode/starcoderbase-1b` on the dataset `smangrul/hf-stack-v1`. Because compute was limited, PEFT (Parameter-Efficient Fine-Tuning) with LoRA (Low-Rank Adaptation) was used: instead of updating all of the base model's weights, only a small set of low-rank adapter matrices injected into selected layers is trained. With the hyperparameters and LoRA configuration below, training ran on a single P100 GPU.
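As a rough sketch of that run (the notebook may use a custom training loop instead; the learning rate and batch size are taken from the regenerated card below, the other values from the list that follows), the pieces could be wired into the Hugging Face `Trainer` like this:

```python
# Rough sketch of the fine-tuning run on a P100. `model` is the LoRA-wrapped
# model from the earlier snippet; `train_ds` and `eval_ds` are assumed to be
# tokenized splits of smangrul/hf-stack-v1 with fixed-length input_ids/labels.
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="peft-starcoder-lora-P100",
    per_device_train_batch_size=4,   # from the regenerated card below
    learning_rate=5e-4,              # from the regenerated card below
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    warmup_steps=30,
    max_steps=100,
    logging_steps=25,
    save_steps=100,
    bf16=False,                      # set to True on an A100
    fp16=False,
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
trainer.train()
print(trainer.evaluate())            # reports eval_loss on the evaluation split
```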
### Training hyperparameters

The following hyperparameters were used during training:

- WEIGHT_DECAY = 0.01
- NUM_WARMUP_STEPS = 30
- EVAL_FREQ = 100
- SAVE_FREQ = 100
- LOG_FREQ = 25
- OUTPUT_DIR = "peft-starcoder-lora-P100"

Set BF16 = True on an A100:
- BF16 = False
- FP16 = False

Fill-in-the-middle settings (see the FIM sketch after this list):
- FIM_RATE = 0.5
- FIM_SPM_RATE = 0.5

LoRA settings:
- LORA_R = 4
- LORA_ALPHA = 8
- LORA_DROPOUT = 0.1
- LORA_TARGET_MODULES = "c_proj,c_attn,q_attn,c_fc,c_proj"
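FIM_RATE and FIM_SPM_RATE control how often a training sample is rearranged into StarCoder's fill-in-the-middle (FIM) format. Purely as an illustration (the exact permutation logic lives in the notebook's data-preparation code; the special-token names are the ones used by the StarCoder tokenizer):

```python
# Illustration of prefix-suffix-middle (PSM) FIM formatting: the model learns
# to generate the missing `middle` given the code before and after it.
def to_fim_psm(prefix: str, middle: str, suffix: str) -> str:
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"

sample = to_fim_psm(
    prefix="def add(a, b):\n    ",
    middle="return a + b",
    suffix="\n\nprint(add(2, 3))\n",
)
print(sample)
```

With FIM_RATE = 0.5 roughly half of the samples get this treatment, and FIM_SPM_RATE chooses between the PSM ordering above and the alternative SPM (suffix-prefix-middle) ordering.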
### Training results
- generated_from_trainer
base_model: bigcode/starcoderbase-1b
model-index:
- name: peft-starcoder-lora-T4
  results: []
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# peft-starcoder-lora-T4
This model is a fine-tuned version of [bigcode/starcoderbase-1b](https://huggingface.co/bigcode/starcoderbase-1b) on the `smangrul/hf-stack-v1` dataset.
It achieves the following results on the evaluation set:
- Loss: 0.9165
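To try the adapter, something along these lines should work (a minimal sketch; the adapter repo id is a placeholder for this model's actual Hub id):

```python
# Minimal inference sketch: load the LoRA adapter on top of the frozen base
# model and autocomplete a code snippet.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "bigcode/starcoderbase-1b"
ADAPTER_ID = "<your-username>/peft-starcoder-lora-T4"  # placeholder repo id

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL).to(device)
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

prompt = "def fibonacci(n):\n    "
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=48, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```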
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the optimizer and scheduler sketch after this list):
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 30
- training_steps: 100
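For reference, a sketch of the optimizer and schedule those settings correspond to when built by hand (the `Trainer` constructs the equivalent objects internally; `model` stands for the LoRA-wrapped model):

```python
# Optimizer and LR schedule matching the listed hyperparameters.
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(   # Trainer's default AdamW, with the listed betas/epsilon
    model.parameters(),
    lr=5e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=30,
    num_training_steps=100,
)
```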
### Training results