---
frameworks:
- Pytorch
license: apache-2.0
tasks:
- text-generation

#model-type:
##e.g. gpt, phi, llama, chatglm, baichuan, etc.
#- gpt

#domain:
##e.g. nlp, cv, audio, multi-modal
#- nlp

#language:
##list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn 

#metrics:
##e.g. CIDEr, BLEU, ROUGE, etc.
#- CIDEr

#tags:
##custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained

#tools:
##e.g. vllm, fastchat, llamacpp, AdaSeq, etc.
#- vllm
---
The llama3-8b-instruct model was fine-tuned on the [msagent-pro](https://modelscope.cn/datasets/iic/MSAgent-Pro/summary) dataset with [swift](https://github.com/modelscope/swift), using the loss_scale technique. The training script is as follows:
```bash
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MASTER_PORT=29500 \
swift sft \
    --model_type llama3-8b-instruct \
    --learning_rate 2e-5 \
    --sft_type lora \
    --dataset msagent-pro \
    --gradient_checkpointing true \
    --gradient_accumulation_steps 8 \
    --deepspeed default-zero3 \
    --lora_target_modules ALL \
    --use_loss_scale true \
    --save_strategy epoch \
    --batch_size 1 \
    --num_train_epochs 2 \
    --max_length 4096 \
    --preprocess_num_proc 4 \
    --loss_scale_config_path agent-flan \
    --ddp_backend nccl
```
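As a quick sanity check of the configuration above, the effective (global) batch size per optimizer step is the number of data-parallel processes times the per-device batch size times the gradient accumulation steps. The sketch below assumes all 8 processes act as data-parallel ranks, which is the case under DeepSpeed ZeRO-3; the variable names are illustrative only and are not swift options.

```bash
# Values copied from the swift sft command above (illustrative names, not swift options)
NPROC_PER_NODE=8
BATCH_SIZE=1
GRAD_ACCUM_STEPS=8
MAX_LENGTH=4096

# Samples per optimizer step: data-parallel ranks x per-device batch x accumulation steps
EFFECTIVE_BATCH=$((NPROC_PER_NODE * BATCH_SIZE * GRAD_ACCUM_STEPS))
# Upper bound on tokens per optimizer step, since no sample exceeds MAX_LENGTH tokens
MAX_TOKENS_PER_STEP=$((EFFECTIVE_BATCH * MAX_LENGTH))

echo "Effective batch size: ${EFFECTIVE_BATCH}"
echo "Max tokens per optimizer step: ${MAX_TOKENS_PER_STEP}"
```

With these settings each optimizer step sees 64 samples and at most 262,144 tokens.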

Comparison with the Original Model on the ToolBench Evaluation Set

| Model                       | ToolBench (in-domain) |           |                             |           |           | ToolBench (out-of-domain) |           |                             |           |           |
|-----------------------------|-----------------------|-----------|-----------------------------|-----------|-----------|---------------------------|-----------|-----------------------------|-----------|-----------|
|                             | Plan.EM               | Act.EM    | HalluRate (lower is better) | Avg.F1    | R-L       | Plan.EM                   | Act.EM    | HalluRate (lower is better) | Avg.F1    | R-L       |
| llama3-8b-instruct          | 74.22                 | 36.17     | 15.68                       | 20.0      | 12.14     | 69.47                     | 34.21     | 14.72                       | 20.25     | 14.07     |
| llama3-8b-agent-instruct-v2 | **85.15**             | **58.1**  | **1.57**                    | **52.10** | **26.02** | **85.79**                 | **59.43** | **2.56**                    | **52.19** | **31.43** |
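The per-metric gains follow directly from the table. The shell sketch below only subtracts the llama3-8b-instruct scores from the llama3-8b-agent-instruct-v2 scores; the `delta` helper and the variable names are hypothetical, written for this illustration. For HalluRate, where lower is better, a negative change is an improvement.

```bash
# Scores copied from the table, in column order: Plan.EM Act.EM HalluRate Avg.F1 R-L
BASE_ID="74.22 36.17 15.68 20.0 12.14"
V2_ID="85.15 58.1 1.57 52.10 26.02"
BASE_OOD="69.47 34.21 14.72 20.25 14.07"
V2_OOD="85.79 59.43 2.56 52.19 31.43"

# Hypothetical helper: print "<metric> <v2 minus base>" for each column
delta() {
  awk -v base="$1" -v v2="$2" 'BEGIN {
    n = split(base, b, " "); split(v2, v, " ")
    split("Plan.EM Act.EM HalluRate Avg.F1 R-L", name, " ")
    for (i = 1; i <= n; i++) printf "%s %+.2f\n", name[i], v[i] - b[i]
  }'
}

echo "in-domain:";     delta "$BASE_ID" "$V2_ID"
echo "out-of-domain:"; delta "$BASE_OOD" "$V2_OOD"
```

In-domain, for example, Avg.F1 rises by 32.10 points and HalluRate falls by 14.11 points.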

For detailed explanations of the evaluation metrics, please refer to the [eval-scope ToolBench documentation](https://github.com/modelscope/eval-scope/tree/main/llmuses/third_party/toolbench_static).