Below is an example YAML for mixed-precision training using DeepSpeed ZeRO Stage-3 with CPU offloading on 8 GPUs.
```
compute_environment: LOCAL_MACHINE
+deepspeed_config:
+ gradient_accumulation_steps: 1
+ gradient_clipping: 1.0
+ offload_optimizer_device: cpu
+ offload_param_device: cpu
+ zero3_init_flag: true
+ zero3_save_16bit_model: true
+ zero_stage: 3
+distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
machine_rank: 0
main_training_function: main
megatron_lm_config: {}
mixed_precision: fp16
+num_machines: 1
+num_processes: 8
rdzv_backend: static
same_network: true
use_cpu: false
```
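The same settings can also be supplied programmatically through a `DeepSpeedPlugin`. The snippet below is a rough sketch of an equivalent setup, assuming the keyword names mirror the YAML fields above (verify them against your installed `accelerate` version); multi-GPU runs still need to be started through a launcher such as `accelerate launch`.

```
# Sketch: programmatic counterpart of the YAML config above.
# Keyword names are assumed to match accelerate's DeepSpeedPlugin API.
from accelerate import Accelerator, DeepSpeedPlugin

deepspeed_plugin = DeepSpeedPlugin(
    zero_stage=3,                       # ZeRO Stage-3
    gradient_accumulation_steps=1,
    gradient_clipping=1.0,
    offload_optimizer_device="cpu",     # CPU offloading of optimizer states
    offload_param_device="cpu",         # CPU offloading of parameters
    zero3_init_flag=True,
    zero3_save_16bit_model=True,
)

accelerator = Accelerator(mixed_precision="fp16", deepspeed_plugin=deepspeed_plugin)
```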
Assume that `model` is created using the `transformers` library.

```
from accelerate import Accelerator

def main():
    accelerator = Accelerator()

    model, optimizer, training_dataloader, scheduler = accelerator.prepare(
        model, optimizer, training_dataloader, scheduler
    )

    generated_tokens = accelerator.unwrap_model(model).generate(
        batch["input_ids"],
        attention_mask=batch["attention_mask"],
        **gen_kwargs,
+       synced_gpus=True
    )
    ...
    accelerator.unwrap_model(model).save_pretrained(
        args.output_dir,
        is_main_process=accelerator.is_main_process,
        save_function=accelerator.save,
+       state_dict=accelerator.get_state_dict(model)
    )
    ...
```

If the YAML was generated through the `accelerate config` command:

```
accelerate launch {script_name.py} {--arg1} {--arg2} ...
```

If the YAML is saved to a `~/config.yaml` file:

```
accelerate launch --config_file ~/config.yaml {script_name.py} {--arg1} {--arg2} ...
```

Or you can use `accelerate launch` with the right configuration parameters and have no `config.yaml` file at all:

```
accelerate launch \
--use_deepspeed \
--num_processes=8 \
--mixed_precision=fp16 \
--zero_stage=3 \
--gradient_accumulation_steps=1 \
--gradient_clipping=1 \
--zero3_init_flag=True \
--zero3_save_16bit_model=True \
--offload_optimizer_device=cpu \
--offload_param_device=cpu \
{script_name.py} {--arg1} {--arg2} ...
```

For core DeepSpeed features (ZeRO stages 1 and 2), Accelerate requires no code changes. For ZeRO Stage-3, `transformers`' `generate` function requires `synced_gpus=True` and `save_pretrained` requires the `state_dict` param because the model parameters are sharded across the GPUs. You can also set most of the fields in the DeepSpeed config file to `auto`, and they will be filled in automatically when performing `accelerate launch` (a minimal sketch of such a config follows the documentation links below).

To learn more, check out the related documentation:

- How to use DeepSpeed
- DeepSpeed Config File
- Accelerate Large Model Training using DeepSpeed
- DeepSpeed Utilities
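As referenced above, here is a minimal sketch of what an `auto`-filled DeepSpeed config could look like when passed as a plain dict. The `hf_ds_config` argument and the exact set of keys that accept `auto` are assumptions to double-check against the Accelerate and DeepSpeed documentation for your versions.

```
# Hypothetical illustration: a DeepSpeed config with `auto` placeholders.
# Accelerate is expected to resolve the `auto` values (batch size, gradient
# accumulation, etc.) from the objects passed to `accelerator.prepare()`.
from accelerate import Accelerator, DeepSpeedPlugin

ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
        "stage3_gather_16bit_weights_on_model_save": "auto",
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

accelerator = Accelerator(
    mixed_precision="fp16",
    # hf_ds_config is assumed to accept a dict or a path to a .json config file
    deepspeed_plugin=DeepSpeedPlugin(hf_ds_config=ds_config),
)
```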