Update README.md

*Update: Having a bit of an issue, still figuring things out.

Reproduce Vicuna, but based on Yi-6B. The training data I used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
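
I haven't re-verified the exact schema of this particular file, but ShareGPT-style data is typically a list of multi-turn records, each holding `conversations` made of `from`/`value` turns. A rough sketch of the assumed layout, plus a quick sanity check before kicking off a multi-day run:

```python
import json

# Assumed ShareGPT-style record layout (not verified against this exact file):
# a list of conversations, each a sequence of {"from", "value"} turns.
example_record = {
    "id": "example-0",
    "conversations": [
        {"from": "human", "value": "How do I reverse a list in Python?"},
        {"from": "gpt", "value": "Use my_list[::-1] or list(reversed(my_list))."},
    ],
}

# Quick sanity check on the actual file before launching training.
with open("ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json", encoding="utf-8") as f:
    data = json.load(f)
print(f"{len(data)} conversations loaded")
print(json.dumps(data[0], indent=2, ensure_ascii=False)[:500])
```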

Hyperparameters:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --train_file_dir ../data/finetune/vicuna/ \
    --per_device_train_batch_size 2 \
    --do_train \
    --max_train_samples -1 \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --bf16 \
    --use_peft False \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../outputs/20240106_yi6B_vicuna \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype bfloat16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 4096 \
    --deepspeed ../deepspeed_zero_stage2_config_no16.json \
    --template_name yi
```
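
The DeepSpeed config passed to `--deepspeed` is not reproduced here. Purely as an illustration (the actual `deepspeed_zero_stage2_config_no16.json` may well differ), a ZeRO stage-2 config with no fp16/bf16 section, leaving the "auto" values for the HF Trainer integration to fill in at launch time, could be generated like this:

```python
import json

# A guess at a ZeRO stage-2 DeepSpeed config with no fp16/bf16 section ("no16");
# mixed precision is driven by the --bf16 flag instead. The real file may differ.
ds_config = {
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_scatter": True,
        "allgather_partitions": True,
    },
    "steps_per_print": 2000,
}

with open("deepspeed_zero_stage2_config_no16.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```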

The training used 5×A800 GPUs for 3 epochs:
```
***** train metrics *****
  epoch                     =                3.0
  train_loss                =             0.3785
  train_runtime             = 1 day, 10:01:13.95
  train_samples             =              93204
  train_samples_per_second  =               2.24
  train_steps_per_second    =              0.224
```
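
As a rough cross-check, these numbers line up with the effective batch size implied by the command above (5 GPUs × 2 per device × 1 gradient accumulation step = 10 samples per optimizer step):

```python
# Approximate consistency check of the reported throughput; small differences
# are expected since the trainer does its own accounting of runtime and steps.
samples, epochs = 93204, 3
world_size, per_device_bs, grad_accum = 5, 2, 1   # GPUs 0,1,2,3,5

effective_bs = world_size * per_device_bs * grad_accum      # 10
steps = samples * epochs // effective_bs                    # ~27961 optimizer steps
runtime_s = 1 * 86400 + 10 * 3600 + 1 * 60 + 13.95          # "1 day, 10:01:13.95"

print(f"effective batch size : {effective_bs}")
print(f"optimizer steps      : {steps}")
print(f"samples per second   : {samples * epochs / runtime_s:.2f}")  # ~2.28 (2.24 reported)
print(f"steps per second     : {steps / runtime_s:.3f}")             # ~0.228 (0.224 reported)
```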

From some preliminary results, we can see that the conversations are natural and informative (unsurprisingly), and we also observe that the unfiltering seems to be working!
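
How those preliminary conversations were produced isn't shown above. Below is a minimal sketch of chatting with the resulting checkpoint through plain `transformers`; the checkpoint path reuses `--output_dir` from the command above, and the Vicuna-style prompt is my assumption (the `--template_name yi` template used in training may format turns differently):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint path, taken from --output_dir above; adjust to the final save.
ckpt = "../outputs/20240106_yi6B_vicuna"

tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Vicuna-style prompt as an assumption; the training-time "yi" template may differ.
prompt = (
    "A chat between a curious user and an artificial intelligence assistant.\n"
    "USER: Tell me something interesting about the Yi-6B base model. ASSISTANT:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```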