lorinma committed dd7eb79 (1 parent: adcf38b): Update README.md

README.md CHANGED (+48 -1)
@@ -10,7 +10,54 @@ language:
 *Update: Having a bit of an issue, still figuring things out.


- Reproduce Vicuna, but based on yi-6B.
 
+ Reproduce Vicuna, but based on Yi-6B. The training data used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
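For orientation, ShareGPT-style training files are typically a JSON list of multi-turn records. A minimal sketch of one record, assuming the common ShareGPT schema (the exact fields of ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json are an assumption, not read from the file itself):

```python
import json

# A minimal record in the common ShareGPT schema. The field names below are
# assumed from the usual ShareGPT convention, not verified against this file.
record = {
    "id": "example_0",
    "conversations": [
        {"from": "human", "value": "What is Vicuna?"},
        {"from": "gpt", "value": "Vicuna is a chat model fine-tuned on ShareGPT conversations."},
    ],
}

# --train_file_dir below would point at a directory of JSON files of such records.
print(json.loads(json.dumps(record))["conversations"][0]["from"])  # human
```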
+
+ Hyperparameters:
+ ```
+ CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
+ --model_type auto \
+ --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
+ --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
+ --train_file_dir ../data/finetune/vicuna/ \
+ --per_device_train_batch_size 2 \
+ --do_train \
+ --max_train_samples -1 \
+ --num_train_epochs 3 \
+ --learning_rate 2e-5 \
+ --weight_decay 0. \
+ --bf16 \
+ --use_peft False \
+ --logging_strategy steps \
+ --logging_steps 10 \
+ --save_strategy epoch \
+ --save_total_limit 5 \
+ --gradient_accumulation_steps 1 \
+ --preprocessing_num_workers 8 \
+ --output_dir ../outputs/20240106_yi6B_vicuna \
+ --overwrite_output_dir \
+ --ddp_timeout 30000 \
+ --logging_first_step True \
+ --torch_dtype bfloat16 \
+ --device_map auto \
+ --report_to tensorboard \
+ --ddp_find_unused_parameters False \
+ --gradient_checkpointing True \
+ --cache_dir ./cache \
+ --model_max_length 4096 \
+ --deepspeed ../deepspeed_zero_stage2_config_no16.json \
+ --template_name yi
+ ```
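The flags above imply the effective global batch size directly; a quick sketch of the arithmetic (all values taken from the command above):

```python
# Effective global batch size implied by the training command:
# 5 GPUs (CUDA_VISIBLE_DEVICES lists 5 devices, --nproc_per_node 5)
# x 2 samples per device (--per_device_train_batch_size 2)
# x 1 accumulation step (--gradient_accumulation_steps 1)
num_gpus = 5
per_device_batch = 2
grad_accum = 1

global_batch = num_gpus * per_device_batch * grad_accum
print(global_batch)  # 10
```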
+
+ Training used 5 × A800 GPUs for 3 epochs:
+ ```
+ ***** train metrics *****
+ epoch                    = 3.0
+ train_loss               = 0.3785
+ train_runtime            = 1 day, 10:01:13.95
+ train_samples            = 93204
+ train_samples_per_second = 2.24
+ train_steps_per_second   = 0.224
+ ```
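As a sanity check, the reported throughput numbers are mutually consistent with a global batch size of 10 (5 GPUs × 2 per device × 1 accumulation step); the numbers below are copied from the log above:

```python
# Cross-check the reported train metrics.
train_samples = 93204
epochs = 3.0
samples_per_second = 2.24
steps_per_second = 0.224

# samples/sec divided by steps/sec recovers the effective global batch size
global_batch = round(samples_per_second / steps_per_second)
print(global_batch)  # 10

# implied number of optimizer steps over the whole 3-epoch run
total_steps = round(train_samples * epochs / global_batch)
print(total_steps)  # 27961
```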

 We can see from these preliminary results that the conversation is natural and informative (unsurprisingly), and the unfiltering seems to be working!
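For completeness, inference-time prompts must be built with the same conversation template used in training (`--template_name yi`). A minimal sketch, assuming the `yi` template follows the ChatML convention used by the official Yi chat models; whether the training repo's template matches exactly is an assumption, so check its template registry:

```python
# Build a ChatML-style prompt. The control tokens below follow the convention
# of the official Yi chat models; the repo's "yi" template is assumed to match.
def build_prompt(turns):
    parts = [f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns]
    parts.append("<|im_start|>assistant\n")  # the model generates from here
    return "\n".join(parts)

prompt = build_prompt([("user", "Hello, who are you?")])
print(prompt)
```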