---
frameworks:
- Pytorch
license: apache-2.0
tasks:
- text-generation
#model-type:
## e.g. gpt, phi, llama, chatglm, baichuan
#- gpt
#domain:
## e.g. nlp, cv, audio, multi-modal
#- nlp
#language:
## list of language codes: https://help.aliyun.com/document_detail/215387.html?spm=a2c4g.11186623.0.0.9f8d7467kni6Aa
#- cn
#metrics:
## e.g. CIDEr, BLEU, ROUGE
#- CIDEr
#tags:
## arbitrary custom tags, including training methods such as pretrained, fine-tuned, instruction-tuned, RL-tuned, and others
#- pretrained
#tools:
## e.g. vllm, fastchat, llamacpp, AdaSeq
#- vllm
---
This model was obtained by fine-tuning llama3-8b-instruct on the [msagent-pro](https://modelscope.cn/datasets/iic/MSAgent-Pro/summary) dataset with the loss_scale technique, using [swift](https://github.com/modelscope/swift). The training script is as follows:
```bash
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MASTER_PORT=29500 \
swift sft \
--model_type llama3-8b-instruct \
--learning_rate 2e-5 \
--sft_type lora \
--dataset msagent-pro \
--gradient_checkpointing true \
--gradient_accumulation_steps 8 \
--deepspeed default-zero3 \
--lora_target_modules ALL \
--use_loss_scale true \
--save_strategy epoch \
--batch_size 1 \
--num_train_epochs 2 \
--max_length 4096 \
--preprocess_num_proc 4 \
--loss_scale_config_path agent-flan \
--ddp_backend nccl
```
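After training, the resulting LoRA checkpoint can be used for inference. Below is a minimal sketch using swift's `infer` command; the checkpoint directory is a placeholder and must be replaced with the path that `swift sft` actually wrote under its output folder:
```bash
# Minimal inference sketch (placeholder paths): merge the LoRA adapter into
# the base weights and start an interactive inference session.
# Replace --ckpt_dir with your actual checkpoint directory.
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--ckpt_dir output/llama3-8b-instruct/vx-xxx/checkpoint-xxx \
--merge_lora true
```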
Comparison with the Original Model on the ToolBench Evaluation Set
| Model | ToolBench (in-domain) | | | | | ToolBench (out-of-domain) | | | | |
|-------|---------|--------|----------------------------|--------|-----|---------|--------|----------------------------|--------|-----|
| | Plan.EM | Act.EM | HalluRate (lower is better) | Avg.F1 | R-L | Plan.EM | Act.EM | HalluRate (lower is better) | Avg.F1 | R-L |
| llama3-8b-instruct | 74.22 | 36.17 | 15.68 | 20.0 | 12.14 | 69.47 | 34.21 | 14.72 | 20.25 | 14.07 |
| llama3-8b-agent-instruct-v2 | **85.15** | **58.1** | **1.57** | **52.10** | **26.02** | **85.79** | **59.43** | **2.56** | **52.19** | **31.43** |
For detailed explanations of the evaluation metrics, please refer to the [ToolBench static evaluation documentation](https://github.com/modelscope/eval-scope/tree/main/llmuses/third_party/toolbench_static).