lorinma committed dd7eb79 (1 parent: adcf38b): Update README.md

README.md CHANGED (+48 -1)
@@ -10,7 +10,54 @@ language:
 *Update: Having a bit of an issue, still figuring things out.


- Reproduce Vicuna, but based on yi-6B.
 
+ Reproduce Vicuna, but based on Yi-6B. The training data used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
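For orientation, ShareGPT-style training files are typically a JSON list of multi-turn records. A minimal sketch of one record, assuming the common ShareGPT schema (the exact fields of ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json are an assumption, not read from the file itself):

```python
import json

# A minimal record in the common ShareGPT schema. The field names below are
# assumed from the usual ShareGPT convention, not verified against this file.
record = {
    "id": "example_0",
    "conversations": [
        {"from": "human", "value": "What is Vicuna?"},
        {"from": "gpt", "value": "Vicuna is a chat model fine-tuned on ShareGPT conversations."},
    ],
}

# --train_file_dir below would point at a directory of JSON files of such records.
print(json.loads(json.dumps(record))["conversations"][0]["from"])  # human
```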
+
+ Hyperparameters:
+ ```
+ CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
+ --model_type auto \
+ --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
+ --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
+ --train_file_dir ../data/finetune/vicuna/ \
+ --per_device_train_batch_size 2 \
+ --do_train \
+ --max_train_samples -1 \
+ --num_train_epochs 3 \
+ --learning_rate 2e-5 \
+ --weight_decay 0. \
+ --bf16 \
+ --use_peft False \
+ --logging_strategy steps \
+ --logging_steps 10 \
+ --save_strategy epoch \
+ --save_total_limit 5 \
+ --gradient_accumulation_steps 1 \
+ --preprocessing_num_workers 8 \
+ --output_dir ../outputs/20240106_yi6B_vicuna \
+ --overwrite_output_dir \
+ --ddp_timeout 30000 \
+ --logging_first_step True \
+ --torch_dtype bfloat16 \
+ --device_map auto \
+ --report_to tensorboard \
+ --ddp_find_unused_parameters False \
+ --gradient_checkpointing True \
+ --cache_dir ./cache \
+ --model_max_length 4096 \
+ --deepspeed ../deepspeed_zero_stage2_config_no16.json \
+ --template_name yi
+ ```
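The flags above imply the effective global batch size directly; a quick sketch of the arithmetic (all values taken from the command above):

```python
# Effective global batch size implied by the training command:
# 5 GPUs (CUDA_VISIBLE_DEVICES lists 5 devices, --nproc_per_node 5)
# x 2 samples per device (--per_device_train_batch_size 2)
# x 1 accumulation step (--gradient_accumulation_steps 1)
num_gpus = 5
per_device_batch = 2
grad_accum = 1

global_batch = num_gpus * per_device_batch * grad_accum
print(global_batch)  # 10
```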
+
+ Training used 5 × A800 GPUs for 3 epochs:
+ ```
+ ***** train metrics *****
+ epoch                    = 3.0
+ train_loss               = 0.3785
+ train_runtime            = 1 day, 10:01:13.95
+ train_samples            = 93204
+ train_samples_per_second = 2.24
+ train_steps_per_second   = 0.224
+ ```
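As a sanity check, the reported throughput numbers are mutually consistent with a global batch size of 10 (5 GPUs × 2 per device × 1 accumulation step); the numbers below are copied from the log above:

```python
# Cross-check the reported train metrics.
train_samples = 93204
epochs = 3.0
samples_per_second = 2.24
steps_per_second = 0.224

# samples/sec divided by steps/sec recovers the effective global batch size
global_batch = round(samples_per_second / steps_per_second)
print(global_batch)  # 10

# implied number of optimizer steps over the whole 3-epoch run
total_steps = round(train_samples * epochs / global_batch)
print(total_steps)  # 27961
```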

 We can see from these preliminary results that the conversation is natural and informative (unsurprisingly), and the unfiltering seems to be working!
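For completeness, inference-time prompts must be built with the same conversation template used in training (`--template_name yi`). A minimal sketch, assuming the `yi` template follows the ChatML convention used by the official Yi chat models; whether the training repo's template matches exactly is an assumption, so check its template registry:

```python
# Build a ChatML-style prompt. The control tokens below follow the convention
# of the official Yi chat models; the repo's "yi" template is assumed to match.
def build_prompt(turns):
    parts = [f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns]
    parts.append("<|im_start|>assistant\n")  # the model generates from here
    return "\n".join(parts)

prompt = build_prompt([("user", "Hello, who are you?")])
print(prompt)
```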