/usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. 
warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead. warnings.warn( /usr/local/lib/python3.9/dist-packages/torchvision/transforms/functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional. 
warnings.warn( /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please 
using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") [2024-03-15 05:24:52,773] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:52,808] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:52,840] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:52,922] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:53,113] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:53,135] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:53,160] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:53,249] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect) [2024-03-15 05:24:53,333] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-15 05:24:53,333] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-15 05:24:53,368] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-15 05:24:53,369] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`. 
[2024-03-15 05:24:53,451] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-15 05:24:53,644] [INFO] [comm.py:637:init_distributed] cdb=None [2024-03-15 05:24:53,660] [INFO] [comm.py:637:init_distributed] cdb=None Loading checkpoint shards: 0%| | 0/4 [00:00 2024-03-15 05:25:20.229 n213-019-134:3514423:3514423 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.238 n213-019-134:3514423:3514423 [0] NCCL INFO cudaDriverVersion 12010 NCCL version 2.19.3+cuda12.1 2024-03-15 05:25:20.242 n213-019-134:3514428:3514428 [4] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.242 n213-019-134:3514428:3514428 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.243 n213-019-134:3514428:3514428 [4] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.246 n213-019-134:3514426:3514426 [3] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.246 n213-019-134:3514426:3514426 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.247 n213-019-134:3514423:3515732 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-15 05:25:20.247 n213-019-134:3514426:3514426 [3] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.250 n213-019-134:3514431:3514431 [7] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.250 n213-019-134:3514431:3514431 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.251 n213-019-134:3514431:3514431 [7] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.257 n213-019-134:3514429:3514429 [5] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.257 n213-019-134:3514429:3514429 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.258 n213-019-134:3514429:3514429 [5] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.259 n213-019-134:3514425:3514425 [2] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.259 n213-019-134:3514425:3514425 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.260 n213-019-134:3514425:3514425 [2] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.269 n213-019-134:3514429:3514429 [5] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.271 n213-019-134:3514428:3514428 [4] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.277 n213-019-134:3514431:3514431 [7] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.279 n213-019-134:3514429:3515734 [5] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-15 05:25:20.282 n213-019-134:3514425:3514425 [2] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.284 n213-019-134:3514424:3514424 [1] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.284 n213-019-134:3514424:3514424 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.285 n213-019-134:3514424:3514424 [1] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.285 n213-019-134:3514428:3515735 [4] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-15 05:25:20.289 n213-019-134:3514430:3514430 [6] NCCL INFO cudaDriverVersion 12010 2024-03-15 05:25:20.289 n213-019-134:3514430:3514430 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.289 n213-019-134:3514430:3514430 [6] NCCL INFO Bootstrap : Using eth0:10.213.19.134<0> 2024-03-15 05:25:20.291 n213-019-134:3514431:3515736 [7] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-15 05:25:20.291 n213-019-134:3514426:3514426 [3] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.293 n213-019-134:3514425:3515737 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-15 05:25:20.299 n213-019-134:3514426:3515738 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 
2024-03-15 05:25:20.307 n213-019-134:3514424:3514424 [1] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.311 n213-019-134:3514430:3514430 [6] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation 2024-03-15 05:25:20.314 n213-019-134:3514423:3515732 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.315 n213-019-134:3514423:3515732 [0] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.317 n213-019-134:3514424:3515739 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-15 05:25:20.326 n213-019-134:3514430:3515743 [6] NCCL INFO NCCL_IB_DISABLE set by environment to 0. 2024-03-15 05:25:20.327 n213-019-134:3514423:3515732 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.328 n213-019-134:3514423:3515732 [0] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.328 n213-019-134:3514423:3515732 [0] NCCL INFO Using network IB 2024-03-15 05:25:20.353 n213-019-134:3514429:3515734 [5] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.354 n213-019-134:3514429:3515734 [5] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.357 n213-019-134:3514425:3515737 [2] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.357 n213-019-134:3514425:3515737 [2] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.365 n213-019-134:3514431:3515736 [7] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.365 n213-019-134:3514431:3515736 [7] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.371 n213-019-134:3514426:3515738 [3] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.371 
n213-019-134:3514426:3515738 [3] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.372 n213-019-134:3514428:3515735 [4] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.373 n213-019-134:3514428:3515735 [4] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.373 n213-019-134:3514425:3515737 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.373 n213-019-134:3514425:3515737 [2] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.373 n213-019-134:3514425:3515737 [2] NCCL INFO Using network IB 2024-03-15 05:25:20.373 n213-019-134:3514429:3515734 [5] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.374 n213-019-134:3514429:3515734 [5] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.374 n213-019-134:3514429:3515734 [5] NCCL INFO Using network IB 2024-03-15 05:25:20.377 n213-019-134:3514424:3515739 [1] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.378 n213-019-134:3514424:3515739 [1] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.378 n213-019-134:3514431:3515736 [7] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.378 n213-019-134:3514431:3515736 [7] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.378 n213-019-134:3514431:3515736 [7] NCCL INFO Using network IB 2024-03-15 05:25:20.379 n213-019-134:3514430:3515743 [6] NCCL INFO NCCL_SOCKET_IFNAME set by environment to eth0 2024-03-15 05:25:20.380 n213-019-134:3514430:3515743 [6] NCCL INFO NCCL_IB_HCA set to mlx5 2024-03-15 05:25:20.383 n213-019-134:3514426:3515738 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.383 
n213-019-134:3514426:3515738 [3] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.383 n213-019-134:3514426:3515738 [3] NCCL INFO Using network IB 2024-03-15 05:25:20.385 n213-019-134:3514428:3515735 [4] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.385 n213-019-134:3514428:3515735 [4] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.385 n213-019-134:3514428:3515735 [4] NCCL INFO Using network IB 2024-03-15 05:25:20.390 n213-019-134:3514424:3515739 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.390 n213-019-134:3514424:3515739 [1] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.390 n213-019-134:3514424:3515739 [1] NCCL INFO Using network IB 2024-03-15 05:25:20.391 n213-019-134:3514430:3515743 [6] NCCL INFO NET/IB : Using [0]mlx5_0:1/RoCE [1]mlx5_1:1/RoCE [2]mlx5_2:1/RoCE [3]mlx5_3:1/RoCE [RO]; OOB eth0:10.213.19.134<0> 2024-03-15 05:25:20.392 n213-019-134:3514430:3515743 [6] NCCL INFO Using non-device net plugin version 0 2024-03-15 05:25:20.392 n213-019-134:3514430:3515743 [6] NCCL INFO Using network IB 2024-03-15 05:25:20.618 n213-019-134:3514428:3515735 [4] NCCL INFO comm 0x802210a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514430:3515743 [6] NCCL INFO comm 0x7f781d60 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514426:3515738 [3] NCCL INFO comm 0x815a0e40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514431:3515736 [7] NCCL INFO comm 0x80ffcdc0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 
n213-019-134:3514429:3515734 [5] NCCL INFO comm 0x81b8f850 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514424:3515739 [1] NCCL INFO comm 0x81192a70 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514425:3515737 [2] NCCL INFO comm 0x81464080 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:20.618 n213-019-134:3514423:3515732 [0] NCCL INFO comm 0xb35d8480 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0xe5c84d4bea5d91da - Init START 2024-03-15 05:25:22.607 n213-019-134:3514426:3515738 [3] NCCL INFO Setting affinity for GPU 3 to ffffffff,00000000,ffffffff 2024-03-15 05:25:22.607 n213-019-134:3514426:3515738 [3] NCCL INFO NVLS multicast support is not available on dev 3 2024-03-15 05:25:22.608 n213-019-134:3514431:3515736 [7] NCCL INFO Setting affinity for GPU 7 to ffffffff,00000000,ffffffff,00000000 2024-03-15 05:25:22.608 n213-019-134:3514431:3515736 [7] NCCL INFO NVLS multicast support is not available on dev 7 2024-03-15 05:25:22.609 n213-019-134:3514428:3515735 [4] NCCL INFO Setting affinity for GPU 4 to ffffffff,00000000,ffffffff,00000000 2024-03-15 05:25:22.609 n213-019-134:3514428:3515735 [4] NCCL INFO NVLS multicast support is not available on dev 4 2024-03-15 05:25:22.609 n213-019-134:3514429:3515734 [5] NCCL INFO Setting affinity for GPU 5 to ffffffff,00000000,ffffffff,00000000 2024-03-15 05:25:22.610 n213-019-134:3514429:3515734 [5] NCCL INFO NVLS multicast support is not available on dev 5 2024-03-15 05:25:22.610 n213-019-134:3514430:3515743 [6] NCCL INFO Setting affinity for GPU 6 to ffffffff,00000000,ffffffff,00000000 2024-03-15 05:25:22.610 n213-019-134:3514430:3515743 [6] NCCL INFO NVLS multicast support is not available on dev 6 2024-03-15 05:25:22.611 n213-019-134:3514424:3515739 [1] NCCL INFO Setting affinity for GPU 1 to 
ffffffff,00000000,ffffffff 2024-03-15 05:25:22.611 n213-019-134:3514424:3515739 [1] NCCL INFO NVLS multicast support is not available on dev 1 2024-03-15 05:25:22.614 n213-019-134:3514425:3515737 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,00000000,ffffffff 2024-03-15 05:25:22.614 n213-019-134:3514425:3515737 [2] NCCL INFO NVLS multicast support is not available on dev 2 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,00000000,ffffffff 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO NVLS multicast support is not available on dev 0 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514425:3515737 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-15 05:25:22.615 n213-019-134:3514424:3515739 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 
[21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 04/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514431:3515736 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-15 05:25:22.615 n213-019-134:3514428:3515735 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-03-15 05:25:22.615 n213-019-134:3514425:3515737 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514430:3515743 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-15 05:25:22.615 n213-019-134:3514424:3515739 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO 
Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514431:3515736 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514428:3515735 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514429:3515734 [5] NCCL INFO Trees [0] 6/-1/-1->5->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514430:3515743 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514429:3515734 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514426:3515738 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 [8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514426:3515738 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] 
NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-15 05:25:22.615 n213-019-134:3514423:3515732 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:23.097 
n213-019-134:3514425:3515737 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.098 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.100 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.100 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.101 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.103 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.103 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.103 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 02/0 : 
4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.103 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.104 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.104 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.104 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.104 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.106 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.106 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.106 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.106 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.107 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.107 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.107 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.108 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.109 
n213-019-134:3514431:3515736 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.109 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.111 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.112 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.114 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 06/0 : 
7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.115 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.117 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.118 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.120 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.121 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.121 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.121 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.122 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.122 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.122 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.122 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.124 
n213-019-134:3514425:3515737 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.124 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.125 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.127 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.128 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.128 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.128 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.128 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.129 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.129 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.129 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.131 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.131 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 11/0 : 
1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.131 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.131 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.132 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.132 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.132 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.132 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.134 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.134 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.134 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.134 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.135 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.135 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.135 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.135 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.137 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.137 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.137 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.137 
n213-019-134:3514430:3515743 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.138 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.138 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.138 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.140 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.140 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.140 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.140 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.141 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.141 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.141 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.141 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.143 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 14/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.144 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.145 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.147 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.147 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.148 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.151 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.152 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.152 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.152 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.153 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.153 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 16/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.153 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.153 
n213-019-134:3514431:3515736 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.157 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.158 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.158 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.158 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.158 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.159 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.159 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.159 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.162 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.164 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.164 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.164 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.164 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.168 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.168 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.168 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.173 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 20/0 : 
2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.173 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.173 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 19/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.173 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.173 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.174 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.174 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.174 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.180 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.180 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.181 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.184 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.185 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.185 
n213-019-134:3514426:3515738 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.185 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.185 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.185 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.186 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.186 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.188 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514423:3515732 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:23.189 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:23.192 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:23.192 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:23.192 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.005 n213-019-134:3514424:3515739 [1] NCCL INFO Connected all 
rings 2024-03-15 05:25:24.014 n213-019-134:3514425:3515737 [2] NCCL INFO Connected all rings 2024-03-15 05:25:24.023 n213-019-134:3514423:3515732 [0] NCCL INFO Connected all rings 2024-03-15 05:25:24.042 n213-019-134:3514426:3515738 [3] NCCL INFO Connected all rings 2024-03-15 05:25:24.042 n213-019-134:3514428:3515735 [4] NCCL INFO Connected all rings 2024-03-15 05:25:24.055 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.058 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.060 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.062 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.064 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.066 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.067 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.068 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.069 n213-019-134:3514431:3515736 [7] NCCL INFO Connected all rings 2024-03-15 05:25:24.069 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.069 n213-019-134:3514430:3515743 [6] NCCL INFO Connected all rings 2024-03-15 05:25:24.069 n213-019-134:3514429:3515734 [5] NCCL INFO Connected all rings 2024-03-15 05:25:24.069 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.071 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.071 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 
05:25:24.072 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.074 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.074 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.075 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.076 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.077 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.077 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.079 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.079 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.079 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.081 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.081 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.082 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.083 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.084 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.084 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.086 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.086 n213-019-134:3514431:3515736 [7] NCCL INFO 
Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.087 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.089 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.089 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.090 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.091 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.092 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.092 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.094 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.094 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.095 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.097 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 17/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.097 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.097 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.098 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.099 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.100 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.100 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 
05:25:24.100 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.102 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.103 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 02/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.103 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.104 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.104 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.105 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.105 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.106 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.107 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.107 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.107 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.108 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.108 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.109 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.110 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.110 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.110 n213-019-134:3514428:3515735 [4] NCCL INFO 
Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.111 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.112 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.112 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.113 n213-019-134:3514424:3515739 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:24.113 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.114 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.115 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.115 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.116 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.116 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.117 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.118 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.118 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.119 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.120 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.120 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.121 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 
05:25:24.121 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.122 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.123 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.124 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.124 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.125 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.125 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.126 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.127 n213-019-134:3514425:3515737 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:24.129 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.129 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.129 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.131 n213-019-134:3514431:3515736 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:24.131 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.132 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.132 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.132 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 00/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.134 n213-019-134:3514426:3515738 [3] NCCL INFO 
Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.134 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.134 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.135 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.136 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.137 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.137 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.137 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.138 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.139 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.139 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.139 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.142 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.142 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.142 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.143 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.144 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.144 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 
05:25:24.144 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.145 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.146 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.146 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.147 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.148 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.148 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.149 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.149 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.150 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.151 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.151 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.152 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.153 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.154 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.154 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.154 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.156 n213-019-134:3514426:3515738 [3] NCCL INFO 
Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.156 n213-019-134:3514428:3515735 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:24.156 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.157 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.158 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.158 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.159 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.160 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.160 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.161 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.163 n213-019-134:3514426:3515738 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:24.163 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.163 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.165 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.165 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.167 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.167 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.169 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 
05:25:24.169 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.170 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.171 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.172 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.173 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.174 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.174 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.176 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.176 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 19/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.179 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.179 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.181 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.181 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.183 n213-019-134:3514429:3515734 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:24.184 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.186 n213-019-134:3514430:3515743 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:24.649 n213-019-134:3514423:3515732 [0] NCCL INFO Connected all trees 2024-03-15 05:25:24.649 n213-019-134:3514423:3515732 [0] NCCL INFO threadThresholds 8/8/64 | 
64/8/64 | 512 | 512 2024-03-15 05:25:24.649 n213-019-134:3514423:3515732 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.743 n213-019-134:3514424:3515739 [1] NCCL INFO Connected all trees 2024-03-15 05:25:24.743 n213-019-134:3514424:3515739 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.743 n213-019-134:3514424:3515739 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.766 n213-019-134:3514425:3515737 [2] NCCL INFO Connected all trees 2024-03-15 05:25:24.766 n213-019-134:3514425:3515737 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.766 n213-019-134:3514425:3515737 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.788 n213-019-134:3514426:3515738 [3] NCCL INFO Connected all trees 2024-03-15 05:25:24.788 n213-019-134:3514426:3515738 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.788 n213-019-134:3514426:3515738 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.800 n213-019-134:3514428:3515735 [4] NCCL INFO Connected all trees 2024-03-15 05:25:24.800 n213-019-134:3514428:3515735 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.800 n213-019-134:3514428:3515735 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.802 n213-019-134:3514429:3515734 [5] NCCL INFO Connected all trees 2024-03-15 05:25:24.802 n213-019-134:3514429:3515734 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.802 n213-019-134:3514429:3515734 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.802 n213-019-134:3514431:3515736 [7] NCCL INFO Connected all trees 2024-03-15 05:25:24.802 
n213-019-134:3514430:3515743 [6] NCCL INFO Connected all trees 2024-03-15 05:25:24.802 n213-019-134:3514431:3515736 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.802 n213-019-134:3514431:3515736 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.802 n213-019-134:3514430:3515743 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:25:24.802 n213-019-134:3514430:3515743 [6] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:25:24.829 n213-019-134:3514431:3515736 [7] NCCL INFO comm 0x80ffcdc0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514428:3515735 [4] NCCL INFO comm 0x802210a0 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514430:3515743 [6] NCCL INFO comm 0x7f781d60 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514426:3515738 [3] NCCL INFO comm 0x815a0e40 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514425:3515737 [2] NCCL INFO comm 0x81464080 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514423:3515732 [0] NCCL INFO comm 0xb35d8480 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514424:3515739 [1] NCCL INFO comm 0x81192a70 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0xe5c84d4bea5d91da - Init COMPLETE 2024-03-15 05:25:24.829 n213-019-134:3514429:3515734 [5] NCCL INFO comm 0x81b8f850 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0xe5c84d4bea5d91da - Init COMPLETE [2024-03-15 05:25:24,914] [INFO] 
[logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2024-03-15 05:25:24,920] [INFO] [logging.py:96:log_dist] [Rank 0] Creating BF16 optimizer [2024-03-15 05:25:25,113] [INFO] [utils.py:802:see_memory_usage] begin bf16_optimizer [2024-03-15 05:25:25,114] [INFO] [utils.py:803:see_memory_usage] MA 15.29 GB Max_MA 15.29 GB CA 15.31 GB Max_CA 15 GB [2024-03-15 05:25:25,114] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 231.83 GB, percent = 11.5% [2024-03-15 05:25:25,266] [INFO] [utils.py:802:see_memory_usage] end bf16_optimizer [2024-03-15 05:25:25,266] [INFO] [utils.py:803:see_memory_usage] MA 15.29 GB Max_MA 15.29 GB CA 15.31 GB Max_CA 15 GB [2024-03-15 05:25:25,267] [INFO] [utils.py:810:see_memory_usage] CPU Virtual Memory: used = 229.29 GB, percent = 11.4% [2024-03-15 05:25:25,269] [INFO] [config.py:972:print] DeepSpeedEngine configuration: [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] amp_enabled .................. False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] amp_params ................... False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] autotuning_config ............ 
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] bfloat16_enabled ............. True [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] checkpoint_parallel_write_pipeline False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] checkpoint_tag_validation_enabled True [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] checkpoint_tag_validation_fail False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] comms_config ................. [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] communication_data_type ...... None [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] curriculum_enabled_legacy .... False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] curriculum_params_legacy ..... False [2024-03-15 05:25:25,269] [INFO] [config.py:976:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] data_efficiency_enabled ...... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] dataloader_drop_last ......... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] disable_allgather ............ False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] dump_state ................... 
False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] dynamic_loss_scale_args ...... None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_enabled ........... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_gas_boundary_resolution 1 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_layer_name ........ bert.encoder.layer [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_layer_num ......... 0 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_max_iter .......... 100 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_stability ......... 1e-06 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_tol ............... 0.01 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] eigenvalue_verbose ........... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] elasticity_enabled ........... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] flops_profiler_config ........ { "enabled": false, "recompute_fwd_factor": 0.0, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] fp16_auto_cast ............... None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] fp16_enabled ................. False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] fp16_master_weights_and_gradients False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] global_rank .................. 0 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] grad_accum_dtype ............. None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] gradient_accumulation_steps .. 4 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] gradient_clipping ............ 0.0 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] gradient_predivide_factor .... 1.0 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] initial_dynamic_scale ........ 1 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] load_universal_checkpoint .... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] loss_scale ................... 1.0 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] memory_breakdown ............. False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] mics_hierarchial_params_gather False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] mics_shard_size .............. -1 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] optimizer_legacy_fusion ...... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] optimizer_name ............... None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] optimizer_params ............. None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] pld_enabled .................. False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] pld_params ................... 
False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] prescale_gradients ........... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] scheduler_name ............... None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] scheduler_params ............. None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] seq_parallel_communication_data_type torch.float32 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] sparse_attention ............. None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] sparse_gradients_enabled ..... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] steps_per_print .............. inf [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] train_batch_size ............. 128 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] train_micro_batch_size_per_gpu 4 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] use_node_local_storage ....... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] wall_clock_breakdown ......... False [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] weight_quantization_config ... None [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] world_size ................... 8 [2024-03-15 05:25:25,270] [INFO] [config.py:976:print] zero_allow_untested_optimizer False [2024-03-15 05:25:25,271] [INFO] [config.py:976:print] zero_config .................. 
stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1000000000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True [2024-03-15 05:25:25,271] [INFO] [config.py:976:print] zero_enabled ................. False [2024-03-15 05:25:25,271] [INFO] [config.py:976:print] zero_force_ds_cpu_optimizer .. True [2024-03-15 05:25:25,271] [INFO] [config.py:976:print] zero_optimization_stage ...... 
0 [2024-03-15 05:25:25,271] [INFO] [config.py:962:print_user_config] json = { "fp16": { "enabled": false, "loss_scale": 0, "loss_scale_window": 1000, "initial_scale_power": 16, "hysteresis": 2, "min_loss_scale": 1 }, "bf16": { "enabled": true }, "train_micro_batch_size_per_gpu": 4, "train_batch_size": 128, "gradient_accumulation_steps": 4, "zero_optimization": { "stage": 0, "overlap_comm": true, "contiguous_gradients": true, "sub_group_size": 1.000000e+09, "reduce_bucket_size": "auto" }, "steps_per_print": inf } /usr/local/lib/python3.9/dist-packages/bytedmetrics/__init__.py:10: UserWarning: bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics` warnings.warn("bytedmetrics is renamed to bytedance.metrics, please using `bytedance.metrics` instead of `bytedmetrics`") wandb: ⭐️ View project at https://ml.byteintl.net/experiment/tracking/detail?Id=project_20230126_e9daa974 wandb: 🚀 View run at https://ml.byteintl.net/experiment/tracking/detail?Id=project_20230126_e9daa974&selectedTrial=run_20240315_562c49c8 wandb: - Waiting for wandb.init()... wandb: \ Waiting for wandb.init()... wandb: | Waiting for wandb.init()... wandb: Tracking run with wandb version 0.13.69 wandb: Run data is saved locally in /mnt/bn/liangkeg/ruohongz/vllm/dpo_experiment/wandb/run-20240315_052545-run_20240315_562c49c8 wandb: Run `wandb offline` to turn off syncing. 
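Note on the batch settings printed above: the config reports `train_micro_batch_size_per_gpu` 4, `gradient_accumulation_steps` 4, `world_size` 8 and `train_batch_size` 128. A minimal sketch of the consistency check these values should satisfy is given here. The helper name `effective_batch_size` is hypothetical, and reading the `396` in the progress bar as a count of optimizer steps is an assumption, not something the log states.

```python
# Values copied from the DeepSpeed configuration printed in this log.
train_micro_batch_size_per_gpu = 4
gradient_accumulation_steps = 4
world_size = 8
train_batch_size = 128


def effective_batch_size(micro_batch: int, grad_accum: int, world: int) -> int:
    """Hypothetical helper: global batch size per optimizer step.

    DeepSpeed expects train_batch_size to equal
    micro_batch_per_gpu * gradient_accumulation_steps * world_size.
    """
    return micro_batch * grad_accum * world


computed = effective_batch_size(
    train_micro_batch_size_per_gpu, gradient_accumulation_steps, world_size
)
# The logged values are mutually consistent: 4 * 4 * 8 == 128.
assert computed == train_batch_size

# ASSUMPTION: the progress bar total (396) counts optimizer steps.
# Under that assumption, this is an upper bound on samples consumed.
assumed_total_steps = 396
max_samples_seen = assumed_total_steps * train_batch_size
print(computed, max_samples_seen)
```

If the three factors did not multiply to `train_batch_size`, DeepSpeed would be expected to reject the config at initialization, so a check like this is mainly useful when editing the JSON by hand.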
0%| | 0/396 [00:005->4 [1] 6/-1/-1->5->4 [2] 6/-1/-1->5->4 [3] 6/-1/-1->5->4 [4] 6/-1/-1->5->4 [5] 6/-1/-1->5->4 [6] 6/-1/-1->5->4 [7] 6/-1/-1->5->4 [8] 6/-1/-1->5->4 [9] 6/-1/-1->5->4 [10] 6/-1/-1->5->4 [11] 6/-1/-1->5->4 [12] 6/-1/-1->5->4 [13] 6/-1/-1->5->4 [14] 6/-1/-1->5->4 [15] 6/-1/-1->5->4 [16] 6/-1/-1->5->4 [17] 6/-1/-1->5->4 [18] 6/-1/-1->5->4 [19] 6/-1/-1->5->4 [20] 6/-1/-1->5->4 [21] 6/-1/-1->5->4 [22] 6/-1/-1->5->4 [23] 6/-1/-1->5->4 2024-03-15 05:25:56.131 n213-019-134:3514429:3516089 [5] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 00/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514428:3516091 [4] NCCL INFO Trees [0] 5/-1/-1->4->3 [1] 5/-1/-1->4->3 [2] 5/-1/-1->4->3 [3] 5/-1/-1->4->3 [4] 5/-1/-1->4->3 [5] 5/-1/-1->4->3 [6] 5/-1/-1->4->3 [7] 5/-1/-1->4->3 [8] 5/-1/-1->4->3 [9] 5/-1/-1->4->3 [10] 5/-1/-1->4->3 [11] 5/-1/-1->4->3 [12] 5/-1/-1->4->3 [13] 5/-1/-1->4->3 [14] 5/-1/-1->4->3 [15] 5/-1/-1->4->3 [16] 5/-1/-1->4->3 [17] 5/-1/-1->4->3 [18] 5/-1/-1->4->3 [19] 5/-1/-1->4->3 [20] 5/-1/-1->4->3 [21] 5/-1/-1->4->3 [22] 5/-1/-1->4->3 [23] 5/-1/-1->4->3 2024-03-15 05:25:56.136 n213-019-134:3514430:3516093 [6] NCCL INFO Trees [0] 7/-1/-1->6->5 [1] 7/-1/-1->6->5 [2] 7/-1/-1->6->5 [3] 7/-1/-1->6->5 [4] 7/-1/-1->6->5 [5] 7/-1/-1->6->5 [6] 7/-1/-1->6->5 [7] 7/-1/-1->6->5 [8] 7/-1/-1->6->5 [9] 7/-1/-1->6->5 [10] 7/-1/-1->6->5 [11] 7/-1/-1->6->5 [12] 7/-1/-1->6->5 [13] 7/-1/-1->6->5 [14] 7/-1/-1->6->5 [15] 7/-1/-1->6->5 [16] 7/-1/-1->6->5 [17] 7/-1/-1->6->5 [18] 7/-1/-1->6->5 [19] 7/-1/-1->6->5 [20] 7/-1/-1->6->5 [21] 7/-1/-1->6->5 [22] 7/-1/-1->6->5 [23] 7/-1/-1->6->5 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 01/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514426:3516092 [3] NCCL INFO Trees [0] 4/-1/-1->3->2 [1] 4/-1/-1->3->2 [2] 4/-1/-1->3->2 [3] 4/-1/-1->3->2 [4] 4/-1/-1->3->2 [5] 4/-1/-1->3->2 [6] 4/-1/-1->3->2 [7] 4/-1/-1->3->2 
[8] 4/-1/-1->3->2 [9] 4/-1/-1->3->2 [10] 4/-1/-1->3->2 [11] 4/-1/-1->3->2 [12] 4/-1/-1->3->2 [13] 4/-1/-1->3->2 [14] 4/-1/-1->3->2 [15] 4/-1/-1->3->2 [16] 4/-1/-1->3->2 [17] 4/-1/-1->3->2 [18] 4/-1/-1->3->2 [19] 4/-1/-1->3->2 [20] 4/-1/-1->3->2 [21] 4/-1/-1->3->2 [22] 4/-1/-1->3->2 [23] 4/-1/-1->3->2 2024-03-15 05:25:56.136 n213-019-134:3514425:3516087 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1 [2] 3/-1/-1->2->1 [3] 3/-1/-1->2->1 [4] 3/-1/-1->2->1 [5] 3/-1/-1->2->1 [6] 3/-1/-1->2->1 [7] 3/-1/-1->2->1 [8] 3/-1/-1->2->1 [9] 3/-1/-1->2->1 [10] 3/-1/-1->2->1 [11] 3/-1/-1->2->1 [12] 3/-1/-1->2->1 [13] 3/-1/-1->2->1 [14] 3/-1/-1->2->1 [15] 3/-1/-1->2->1 [16] 3/-1/-1->2->1 [17] 3/-1/-1->2->1 [18] 3/-1/-1->2->1 [19] 3/-1/-1->2->1 [20] 3/-1/-1->2->1 [21] 3/-1/-1->2->1 [22] 3/-1/-1->2->1 [23] 3/-1/-1->2->1 2024-03-15 05:25:56.136 n213-019-134:3514428:3516091 [4] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 02/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514430:3516093 [6] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514426:3516092 [3] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514425:3516087 [2] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 03/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514424:3516088 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0 [4] 2/-1/-1->1->0 [5] 2/-1/-1->1->0 [6] 2/-1/-1->1->0 [7] 2/-1/-1->1->0 [8] 2/-1/-1->1->0 [9] 2/-1/-1->1->0 [10] 2/-1/-1->1->0 [11] 2/-1/-1->1->0 [12] 2/-1/-1->1->0 [13] 2/-1/-1->1->0 [14] 2/-1/-1->1->0 [15] 2/-1/-1->1->0 [16] 2/-1/-1->1->0 [17] 2/-1/-1->1->0 [18] 2/-1/-1->1->0 [19] 2/-1/-1->1->0 [20] 2/-1/-1->1->0 [21] 2/-1/-1->1->0 [22] 2/-1/-1->1->0 [23] 2/-1/-1->1->0 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 04/24 
: 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514431:3516090 [7] NCCL INFO Trees [0] -1/-1/-1->7->6 [1] -1/-1/-1->7->6 [2] -1/-1/-1->7->6 [3] -1/-1/-1->7->6 [4] -1/-1/-1->7->6 [5] -1/-1/-1->7->6 [6] -1/-1/-1->7->6 [7] -1/-1/-1->7->6 [8] -1/-1/-1->7->6 [9] -1/-1/-1->7->6 [10] -1/-1/-1->7->6 [11] -1/-1/-1->7->6 [12] -1/-1/-1->7->6 [13] -1/-1/-1->7->6 [14] -1/-1/-1->7->6 [15] -1/-1/-1->7->6 [16] -1/-1/-1->7->6 [17] -1/-1/-1->7->6 [18] -1/-1/-1->7->6 [19] -1/-1/-1->7->6 [20] -1/-1/-1->7->6 [21] -1/-1/-1->7->6 [22] -1/-1/-1->7->6 [23] -1/-1/-1->7->6 2024-03-15 05:25:56.136 n213-019-134:3514424:3516088 [1] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 05/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514431:3516090 [7] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 06/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 07/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 08/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 09/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 10/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 11/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 12/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 13/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 14/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 15/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 16/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] 
NCCL INFO Channel 17/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 18/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 19/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 20/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 21/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 22/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 23/24 : 0 1 2 3 4 5 6 7 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1 [4] 1/-1/-1->0->-1 [5] 1/-1/-1->0->-1 [6] 1/-1/-1->0->-1 [7] 1/-1/-1->0->-1 [8] 1/-1/-1->0->-1 [9] 1/-1/-1->0->-1 [10] 1/-1/-1->0->-1 [11] 1/-1/-1->0->-1 [12] 1/-1/-1->0->-1 [13] 1/-1/-1->0->-1 [14] 1/-1/-1->0->-1 [15] 1/-1/-1->0->-1 [16] 1/-1/-1->0->-1 [17] 1/-1/-1->0->-1 [18] 1/-1/-1->0->-1 [19] 1/-1/-1->0->-1 [20] 1/-1/-1->0->-1 [21] 1/-1/-1->0->-1 [22] 1/-1/-1->0->-1 [23] 1/-1/-1->0->-1 2024-03-15 05:25:56.136 n213-019-134:3514423:3516086 [0] NCCL INFO P2P Chunksize set to 524288 2024-03-15 05:25:56.518 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.518 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 00/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.518 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 00/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.519 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.519 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 00/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.519 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 00/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.520 
n213-019-134:3514423:3516086 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.520 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.521 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 01/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.521 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 01/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.521 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 00/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.522 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.522 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 01/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.522 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 01/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.522 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 01/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.522 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.523 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 02/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.523 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 02/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.523 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 01/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.524 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.524 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 02/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.524 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 02/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.524 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.525 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 03/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.525 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 03/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.525 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 03/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.525 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 02/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.526 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.526 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 03/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.526 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 03/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.526 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.527 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.527 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 04/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.527 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 04/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.528 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 03/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.528 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.528 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 04/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.528 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 04/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.528 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.529 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.530 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 05/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.530 
n213-019-134:3514425:3516087 [2] NCCL INFO Channel 05/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.530 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 04/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.530 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.530 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 05/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.531 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 05/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.531 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.531 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.532 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 06/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.532 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 06/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.532 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 05/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.533 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.533 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 06/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.533 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.534 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.534 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 06/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.535 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 07/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.535 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 07/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.535 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 06/0 : 
7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.535 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.536 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 07/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.536 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.536 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.537 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 07/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.537 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 08/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.537 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 08/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.538 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 07/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.538 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.538 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 08/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.538 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.539 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.539 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 08/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.540 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 09/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.540 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 09/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.540 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 08/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.541 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.541 
n213-019-134:3514430:3516093 [6] NCCL INFO Channel 09/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.542 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 09/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.543 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.543 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 09/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.544 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 10/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.544 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 10/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.544 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 09/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.545 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.545 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 10/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.545 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.548 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.548 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 10/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.548 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 11/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.548 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 11/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.549 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 10/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.549 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.550 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 11/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.550 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 11/0 : 
0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.551 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.551 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 11/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.552 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 12/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.552 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 12/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.553 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 11/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.554 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.555 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 12/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.555 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.556 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.557 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 12/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.557 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 13/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.557 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 13/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.557 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 12/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:56.558 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.558 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 13/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.558 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.561 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:56.560 
n213-019-134:3514424:3516088 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:56.559 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:56.560 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 14/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:56.561 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 14/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:56.559 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 13/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:56.560 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 14/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:56.560 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 13/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 15/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 15/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 14/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 15/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.854 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 14/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.857 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 16/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.857 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 16/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.857 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 16/0 : 
5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.857 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 16/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.858 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 16/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.858 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 15/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.858 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 16/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.859 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 15/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.860 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 17/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.860 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 17/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.861 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 17/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.861 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 17/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.861 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 17/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.861 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 16/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.861 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 17/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.862 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 16/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.863 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 18/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.864 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 18/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.864 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 18/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.864 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 18/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.864 
n213-019-134:3514430:3516093 [6] NCCL INFO Channel 18/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.864 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 17/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.864 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 18/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.866 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 17/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 19/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 19/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 19/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 19/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 19/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 18/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.867 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 19/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.869 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 18/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 20/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 20/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 20/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 20/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 20/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 19/0 : 
3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.870 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 20/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.872 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 19/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 21/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 21/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 21/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 21/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 21/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 20/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.873 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 21/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.875 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 20/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.876 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 22/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.876 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 22/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.876 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 22/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.876 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 22/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.877 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 22/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.877 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 21/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.877 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 22/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.878 
n213-019-134:3514431:3516090 [7] NCCL INFO Channel 21/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.879 n213-019-134:3514423:3516086 [0] NCCL INFO Channel 23/0 : 0[0] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 23/0 : 1[1] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 23/0 : 5[5] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 23/0 : 2[2] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 23/0 : 6[6] -> 7[7] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 22/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.880 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 23/0 : 4[4] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:58.882 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 22/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:58.883 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 23/0 : 3[3] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:58.884 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 23/0 : 7[7] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.622 n213-019-134:3514425:3516087 [2] NCCL INFO Connected all rings 2024-03-15 05:25:59.627 n213-019-134:3514426:3516092 [3] NCCL INFO Connected all rings 2024-03-15 05:25:59.651 n213-019-134:3514424:3516088 [1] NCCL INFO Connected all rings 2024-03-15 05:25:59.651 n213-019-134:3514423:3516086 [0] NCCL INFO Connected all rings 2024-03-15 05:25:59.658 n213-019-134:3514428:3516091 [4] NCCL INFO Connected all rings 2024-03-15 05:25:59.697 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 00/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.697 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.699 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 01/0 : 3[3] -> 2[2] via 
P2P/CUMEM/read 2024-03-15 05:25:59.699 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.701 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 02/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.701 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.703 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 03/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.703 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.706 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 04/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.706 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.708 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 05/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.708 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.710 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 06/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.710 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.713 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 07/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.713 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.713 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 00/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.715 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 08/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.715 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.715 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.717 
n213-019-134:3514426:3516092 [3] NCCL INFO Channel 09/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.717 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.717 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 02/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.720 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 10/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.720 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.720 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 03/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.722 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 11/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.722 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.722 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 04/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.722 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 00/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.724 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 12/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.724 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.724 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 05/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.724 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 01/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.726 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 13/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.726 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.726 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 06/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.727 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 02/0 : 
4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.728 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 14/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.728 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.728 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 07/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.729 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 03/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.730 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 15/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.730 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.731 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 08/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.731 n213-019-134:3514431:3516090 [7] NCCL INFO Connected all rings 2024-03-15 05:25:59.731 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 00/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.731 n213-019-134:3514429:3516089 [5] NCCL INFO Connected all rings 2024-03-15 05:25:59.731 n213-019-134:3514430:3516093 [6] NCCL INFO Connected all rings 2024-03-15 05:25:59.731 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 04/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.733 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 16/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.733 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 09/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.733 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 16/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.735 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 01/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.735 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 05/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.736 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 17/0 : 2[2] -> 1[1] via 
P2P/CUMEM/read 2024-03-15 05:25:59.737 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 10/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.737 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 17/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.737 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 02/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.738 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 06/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.739 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 18/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.739 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 11/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.740 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 18/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.740 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 03/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.740 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 07/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.742 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 19/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.742 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 12/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.743 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 19/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.743 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 08/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.744 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 20/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.745 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 04/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.745 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 13/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.745 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 20/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.746 
n213-019-134:3514428:3516091 [4] NCCL INFO Channel 09/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.748 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 21/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.749 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 05/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.749 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 14/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.749 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 21/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.750 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 10/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.755 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 22/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.755 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 06/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.755 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 15/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.755 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 22/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.757 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 11/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.761 n213-019-134:3514425:3516087 [2] NCCL INFO Channel 23/0 : 2[2] -> 1[1] via P2P/CUMEM/read 2024-03-15 05:25:59.761 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 07/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.761 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 16/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.761 n213-019-134:3514426:3516092 [3] NCCL INFO Channel 23/0 : 3[3] -> 2[2] via P2P/CUMEM/read 2024-03-15 05:25:59.765 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 12/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.768 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 08/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.768 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 17/0 : 
1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.769 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 13/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.770 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 09/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.770 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 18/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.770 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 14/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.772 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 10/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.772 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 19/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.773 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 15/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.774 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 11/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.774 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 20/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.775 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 16/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.776 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 12/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.776 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 21/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.777 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 17/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.778 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 13/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.778 n213-019-134:3514424:3516088 [1] NCCL INFO Channel 22/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.779 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 18/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.780 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 14/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.780 
n213-019-134:3514424:3516088 [1] NCCL INFO Channel 23/0 : 1[1] -> 0[0] via P2P/CUMEM/read 2024-03-15 05:25:59.781 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 19/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.782 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 15/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.783 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 20/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.784 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 16/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.785 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 21/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.787 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 17/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.788 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 22/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.789 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 18/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.790 n213-019-134:3514428:3516091 [4] NCCL INFO Channel 23/0 : 4[4] -> 3[3] via P2P/CUMEM/read 2024-03-15 05:25:59.793 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 19/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.799 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 20/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.802 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 21/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.809 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 22/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.812 n213-019-134:3514431:3516090 [7] NCCL INFO Channel 23/0 : 7[7] -> 6[6] via P2P/CUMEM/read 2024-03-15 05:25:59.814 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.816 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.816 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 00/0 : 
5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.818 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.818 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 01/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.821 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.821 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 02/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.823 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.823 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 03/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.825 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.826 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 04/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.828 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.828 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 05/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.830 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.830 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 06/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.833 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.834 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 07/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.838 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.839 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 08/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.842 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.843 
n213-019-134:3514429:3516089 [5] NCCL INFO Channel 09/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.847 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.847 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 10/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.849 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.849 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 11/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.851 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.851 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 12/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.853 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.854 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 13/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.856 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.856 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 14/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.858 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 16/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.858 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 15/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.859 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 17/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.859 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 16/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.862 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 18/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.862 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 17/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.864 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 19/0 : 
6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.864 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 18/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.866 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 20/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.866 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 19/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.868 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 21/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.868 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 20/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.870 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 22/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.870 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 21/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.872 n213-019-134:3514430:3516093 [6] NCCL INFO Channel 23/0 : 6[6] -> 5[5] via P2P/CUMEM/read 2024-03-15 05:25:59.872 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 22/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:25:59.875 n213-019-134:3514429:3516089 [5] NCCL INFO Channel 23/0 : 5[5] -> 4[4] via P2P/CUMEM/read 2024-03-15 05:26:00.461 n213-019-134:3514423:3516086 [0] NCCL INFO Connected all trees 2024-03-15 05:26:00.461 n213-019-134:3514423:3516086 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.461 n213-019-134:3514423:3516086 [0] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.484 n213-019-134:3514424:3516088 [1] NCCL INFO Connected all trees 2024-03-15 05:26:00.484 n213-019-134:3514424:3516088 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.484 n213-019-134:3514424:3516088 [1] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.518 n213-019-134:3514425:3516087 [2] NCCL INFO Connected all trees 2024-03-15 05:26:00.519 n213-019-134:3514425:3516087 
[2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.519 n213-019-134:3514425:3516087 [2] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.550 n213-019-134:3514426:3516092 [3] NCCL INFO Connected all trees 2024-03-15 05:26:00.550 n213-019-134:3514426:3516092 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.550 n213-019-134:3514426:3516092 [3] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.551 n213-019-134:3514431:3516090 [7] NCCL INFO Connected all trees 2024-03-15 05:26:00.551 n213-019-134:3514431:3516090 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.551 n213-019-134:3514431:3516090 [7] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.553 n213-019-134:3514428:3516091 [4] NCCL INFO Connected all trees 2024-03-15 05:26:00.553 n213-019-134:3514428:3516091 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.553 n213-019-134:3514428:3516091 [4] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.553 n213-019-134:3514430:3516093 [6] NCCL INFO Connected all trees 2024-03-15 05:26:00.553 n213-019-134:3514429:3516089 [5] NCCL INFO Connected all trees 2024-03-15 05:26:00.554 n213-019-134:3514430:3516093 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.554 n213-019-134:3514429:3516089 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 2024-03-15 05:26:00.554 n213-019-134:3514430:3516093 [6] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.554 n213-019-134:3514429:3516089 [5] NCCL INFO 24 coll channels, 0 nvls channels, 32 p2p channels, 32 p2p channels per peer 2024-03-15 05:26:00.576 n213-019-134:3514426:3516092 [3] NCCL INFO comm 
0x80ceaa80 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 4e000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514424:3516088 [1] NCCL INFO comm 0x811abec0 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 16000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514425:3516087 [2] NCCL INFO comm 0x8124a250 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 4a000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514428:3516091 [4] NCCL INFO comm 0x80215290 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 89000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514431:3516090 [7] NCCL INFO comm 0x7fdb9640 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId c9000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514430:3516093 [6] NCCL INFO comm 0xbf0872e0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c5000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514423:3516086 [0] NCCL INFO comm 0xb195a5c0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId 10000 commId 0xb9123779b59a0657 - Init COMPLETE 2024-03-15 05:26:00.576 n213-019-134:3514429:3516089 [5] NCCL INFO comm 0x80f34d80 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId 8e000 commId 0xb9123779b59a0657 - Init COMPLETE /mnt/bn/liangkeg/ruohongz/vllm/dpo_experiment/trl/trainer/dpo_trainer.py:1138: UserWarning: compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator warnings.warn(
/usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn(
Could not estimate the number of tokens of the input, floating-point operations will not be computed /usr/local/lib/python3.9/dist-packages/deepspeed/runtime/zero/stage_1_and_2.py:1586: UserWarning: The torch.cuda.*DtypeTensor constructors are no longer recommended. It's best to use methods such as torch.tensor(data, dtype=*, device='cuda') to create tensors. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:85.)
total_norm_cuda = get_accelerator().FloatTensor([float(total_norm)]) 0%| | 1/396 [00:39<4:17:45, 39.15s/it] {'loss': 0.6931, 'learning_rate': 1.25e-08, 'losses/dpo': 0.6931471824645996, 'losses/sft': 0.7711470723152161, 'losses/total': 0.6931471824645996, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -25.36812400817871, 'logps/chosen': -22.472335815429688, 'ref_logps/rejected': -25.36812400817871, 'ref_logps/chosen': -22.472335815429688, 'epoch': 0.01} 0%| | 1/396 [00:39<4:17:45, 39.15s/it] 1%| | 2/396 [01:08<3:40:14, 33.54s/it] {'loss': 0.6931, 'learning_rate': 2.5e-08, 'losses/dpo': 0.6931471824645996, 'losses/sft': 0.8523496985435486, 'losses/total': 0.6931471824645996, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -25.130128860473633, 'logps/chosen': -21.278339385986328, 'ref_logps/rejected': -25.130128860473633, 'ref_logps/chosen': -21.278339385986328, 'epoch': 0.02} 1%| | 2/396 [01:08<3:40:14, 33.54s/it] 1%| | 3/396 [01:37<3:25:38, 31.39s/it] {'loss': 0.693, 'learning_rate': 3.75e-08, 'losses/dpo': 0.6928481459617615, 'losses/sft': 0.6631997227668762, 'losses/total': 0.6928481459617615, 'rewards/chosen': 0.001451290212571621, 'rewards/rejected': 0.0010142631363123655, 'rewards/accuracies': 0.515625, 'rewards/margins': 0.00043702672701328993, 'logps/rejected': -26.44188690185547, 'logps/chosen': -21.53506851196289, 'ref_logps/rejected': -26.452028274536133, 'ref_logps/chosen': -21.54958152770996, 'epoch': 0.02} 1%| | 3/396 [01:37<3:25:38, 31.39s/it] 1%| | 4/396 [02:07<3:20:18, 30.66s/it] {'loss': 0.6935, 'learning_rate': 5e-08, 'losses/dpo': 0.6933612823486328, 'losses/sft': 0.819932758808136, 'losses/total': 0.6933612823486328, 'rewards/chosen': -0.00046504498459398746, 'rewards/rejected': 0.0001981628010980785, 'rewards/accuracies': 0.4609375, 'rewards/margins': -0.000663207727484405, 'logps/rejected': 
-26.232192993164062, 'logps/chosen': -21.846920013427734, 'ref_logps/rejected': -26.234174728393555, 'ref_logps/chosen': -21.842269897460938, 'epoch': 0.03} 1%| | 4/396 [02:07<3:20:18, 30.66s/it] 1%|▏ | 5/396 [02:36<3:17:32, 30.31s/it] {'loss': 0.693, 'learning_rate': 6.25e-08, 'losses/dpo': 0.6929464340209961, 'losses/sft': 0.7624120712280273, 'losses/total': 0.6929464340209961, 'rewards/chosen': -0.00025857496075332165, 'rewards/rejected': -0.0007417035521939397, 'rewards/accuracies': 0.5234375, 'rewards/margins': 0.0004831284750252962, 'logps/rejected': -26.558738708496094, 'logps/chosen': -23.82025146484375, 'ref_logps/rejected': -26.55132293701172, 'ref_logps/chosen': -23.817665100097656, 'epoch': 0.04} 1%|▏ | 5/396 [02:36<3:17:32, 30.31s/it] 2%|▏ | 6/396 [03:06<3:15:49, 30.13s/it] {'loss': 0.6923, 'learning_rate': 7.5e-08, 'losses/dpo': 0.6934427618980408, 'losses/sft': 0.7273141741752625, 'losses/total': 0.6934427618980408, 'rewards/chosen': 0.0010370061499997973, 'rewards/rejected': -0.0008292265702039003, 'rewards/accuracies': 0.6015625, 'rewards/margins': 0.0018662326037883759, 'logps/rejected': -29.653806686401367, 'logps/chosen': -25.088871002197266, 'ref_logps/rejected': -29.64551544189453, 'ref_logps/chosen': -25.0992431640625, 'epoch': 0.05} 2%|▏ | 6/396 [03:06<3:15:49, 30.13s/it] 2%|▏ | 7/396 [03:35<3:12:47, 29.74s/it] {'loss': 0.693, 'learning_rate': 8.75e-08, 'losses/dpo': 0.6948896646499634, 'losses/sft': 0.6432714462280273, 'losses/total': 0.6948896646499634, 'rewards/chosen': -0.0008082209387794137, 'rewards/rejected': -0.001252580899745226, 'rewards/accuracies': 0.4765625, 'rewards/margins': 0.0004443599027581513, 'logps/rejected': -27.50556182861328, 'logps/chosen': -23.075027465820312, 'ref_logps/rejected': -27.4930362701416, 'ref_logps/chosen': -23.066946029663086, 'epoch': 0.05} 2%|▏ | 7/396 [03:35<3:12:47, 29.74s/it] 2%|▏ | 8/396 [04:04<3:11:26, 29.60s/it] {'loss': 0.6933, 'learning_rate': 1e-07, 'losses/dpo': 0.6911635398864746, 
'losses/sft': 0.8042243123054504, 'losses/total': 0.6911635398864746, 'rewards/chosen': 0.0013606649590656161, 'rewards/rejected': 0.0014805227983742952, 'rewards/accuracies': 0.4921875, 'rewards/margins': -0.00011985772289335728, 'logps/rejected': -29.949260711669922, 'logps/chosen': -21.430335998535156, 'ref_logps/rejected': -29.96406364440918, 'ref_logps/chosen': -21.44394302368164, 'epoch': 0.06} 2%|▏ | 8/396 [04:04<3:11:26, 29.60s/it] 2%|▏ | 9/396 [04:33<3:09:26, 29.37s/it] {'loss': 0.6923, 'learning_rate': 1.125e-07, 'losses/dpo': 0.6914368271827698, 'losses/sft': 0.8787165284156799, 'losses/total': 0.6914368271827698, 'rewards/chosen': 0.0006745259161107242, 'rewards/rejected': -0.0010740034049376845, 'rewards/accuracies': 0.5546875, 'rewards/margins': 0.0017485294956713915, 'logps/rejected': -27.866111755371094, 'logps/chosen': -23.053390502929688, 'ref_logps/rejected': -27.85537338256836, 'ref_logps/chosen': -23.060134887695312, 'epoch': 0.07} 2%|▏ | 9/396 [04:33<3:09:26, 29.37s/it] 3%|▎ | 10/396 [05:02<3:08:04, 29.23s/it] {'loss': 0.6922, 'learning_rate': 1.25e-07, 'losses/dpo': 0.690066397190094, 'losses/sft': 1.0419297218322754, 'losses/total': 0.690066397190094, 'rewards/chosen': 0.0011563875013962388, 'rewards/rejected': -0.0007934823515824974, 'rewards/accuracies': 0.546875, 'rewards/margins': 0.0019498697947710752, 'logps/rejected': -29.587308883666992, 'logps/chosen': -23.637466430664062, 'ref_logps/rejected': -29.579374313354492, 'ref_logps/chosen': -23.649028778076172, 'epoch': 0.08} 3%|▎ | 10/396 [05:02<3:08:04, 29.23s/it] 3%|▎ | 11/396 [05:31<3:07:02, 29.15s/it] {'loss': 0.6926, 'learning_rate': 1.375e-07, 'losses/dpo': 0.6951523423194885, 'losses/sft': 0.9443475008010864, 'losses/total': 0.6951523423194885, 'rewards/chosen': 0.000978996278718114, 'rewards/rejected': -0.00014103890862315893, 'rewards/accuracies': 0.5078125, 'rewards/margins': 0.0011200353037565947, 'logps/rejected': -24.971160888671875, 'logps/chosen': -22.38899040222168, 
'ref_logps/rejected': -24.969751358032227, 'ref_logps/chosen': -22.398780822753906, 'epoch': 0.08} 3%|▎ | 11/396 [05:31<3:07:02, 29.15s/it] 3%|▎ | 12/396 [06:00<3:06:36, 29.16s/it] {'loss': 0.6946, 'learning_rate': 1.5e-07, 'losses/dpo': 0.6987805962562561, 'losses/sft': 0.876471221446991, 'losses/total': 0.6987805962562561, 'rewards/chosen': -0.001627539866603911, 'rewards/rejected': 0.0011855755001306534, 'rewards/accuracies': 0.421875, 'rewards/margins': -0.002813115483149886, 'logps/rejected': -26.619457244873047, 'logps/chosen': -20.165252685546875, 'ref_logps/rejected': -26.63131332397461, 'ref_logps/chosen': -20.14897918701172, 'epoch': 0.09} 3%|▎ | 12/396 [06:00<3:06:36, 29.16s/it] 3%|▎ | 13/396 [06:29<3:05:43, 29.10s/it] {'loss': 0.6936, 'learning_rate': 1.625e-07, 'losses/dpo': 0.6952416896820068, 'losses/sft': 0.9322817325592041, 'losses/total': 0.6952416896820068, 'rewards/chosen': -5.185510963201523e-05, 'rewards/rejected': 0.0007665121229365468, 'rewards/accuracies': 0.515625, 'rewards/margins': -0.0008183673489838839, 'logps/rejected': -25.939855575561523, 'logps/chosen': -25.07573699951172, 'ref_logps/rejected': -25.947521209716797, 'ref_logps/chosen': -25.075220108032227, 'epoch': 0.1} 3%|▎ | 13/396 [06:29<3:05:43, 29.10s/it] 4%|▎ | 14/396 [06:59<3:05:35, 29.15s/it] {'loss': 0.692, 'learning_rate': 1.75e-07, 'losses/dpo': 0.6901522874832153, 'losses/sft': 0.8234641551971436, 'losses/total': 0.6901522874832153, 'rewards/chosen': 0.00040407240157946944, 'rewards/rejected': -0.0020239197183400393, 'rewards/accuracies': 0.578125, 'rewards/margins': 0.002427991945296526, 'logps/rejected': -27.590843200683594, 'logps/chosen': -22.58213233947754, 'ref_logps/rejected': -27.570602416992188, 'ref_logps/chosen': -22.58617401123047, 'epoch': 0.11} 4%|▎ | 14/396 [06:59<3:05:35, 29.15s/it] 4%|▍ | 15/396 [07:27<3:04:37, 29.08s/it] {'loss': 0.6925, 'learning_rate': 1.875e-07, 'losses/dpo': 0.6923660039901733, 'losses/sft': 0.7345502376556396, 'losses/total': 
0.6923660039901733, 'rewards/chosen': 0.0006403709994629025, 'rewards/rejected': -0.0007500239298678935, 'rewards/accuracies': 0.515625, 'rewards/margins': 0.0013903947547078133, 'logps/rejected': -25.858173370361328, 'logps/chosen': -23.004196166992188, 'ref_logps/rejected': -25.85067367553711, 'ref_logps/chosen': -23.010601043701172, 'epoch': 0.11} 4%|▍ | 15/396 [07:27<3:04:37, 29.08s/it] 4%|▍ | 16/396 [07:56<3:03:37, 28.99s/it] {'loss': 0.6931, 'learning_rate': 2e-07, 'losses/dpo': 0.6901232004165649, 'losses/sft': 0.8039647936820984, 'losses/total': 0.6901232004165649, 'rewards/chosen': -0.0007656853413209319, 'rewards/rejected': -0.0010594006162136793, 'rewards/accuracies': 0.5390625, 'rewards/margins': 0.0002937153331004083, 'logps/rejected': -25.777360916137695, 'logps/chosen': -21.546062469482422, 'ref_logps/rejected': -25.766767501831055, 'ref_logps/chosen': -21.53840446472168, 'epoch': 0.12} 4%|▍ | 16/396 [07:56<3:03:37, 28.99s/it] 4%|▍ | 17/396 [08:25<3:03:40, 29.08s/it] {'loss': 0.6937, 'learning_rate': 2.1249999999999998e-07, 'losses/dpo': 0.6932737827301025, 'losses/sft': 0.7667961716651917, 'losses/total': 0.6932737827301025, 'rewards/chosen': -0.0009270801674574614, 'rewards/rejected': 0.0001852509449236095, 'rewards/accuracies': 0.4453125, 'rewards/margins': -0.0011123311705887318, 'logps/rejected': -27.877731323242188, 'logps/chosen': -22.206989288330078, 'ref_logps/rejected': -27.87958335876465, 'ref_logps/chosen': -22.19771957397461, 'epoch': 0.13} 4%|▍ | 17/396 [08:25<3:03:40, 29.08s/it] 5%|▍ | 18/396 [08:55<3:03:09, 29.07s/it] {'loss': 0.693, 'learning_rate': 2.25e-07, 'losses/dpo': 0.6932240724563599, 'losses/sft': 0.736687421798706, 'losses/total': 0.6932240724563599, 'rewards/chosen': -0.00027507508639246225, 'rewards/rejected': -0.0007486287504434586, 'rewards/accuracies': 0.5, 'rewards/margins': 0.000473553518531844, 'logps/rejected': -25.75381088256836, 'logps/chosen': -21.215139389038086, 'ref_logps/rejected': -25.746326446533203, 
'ref_logps/chosen': -21.212387084960938, 'epoch': 0.14} 5%|▍ | 18/396 [08:55<3:03:09, 29.07s/it] 5%|▍ | 19/396 [09:24<3:02:29, 29.04s/it] {'loss': 0.6932, 'learning_rate': 2.3749999999999998e-07, 'losses/dpo': 0.6942628622055054, 'losses/sft': 0.7466978430747986, 'losses/total': 0.6942628622055054, 'rewards/chosen': -0.0003368390607647598, 'rewards/rejected': -0.0003901191521435976, 'rewards/accuracies': 0.484375, 'rewards/margins': 5.3280091378837824e-05, 'logps/rejected': -26.145751953125, 'logps/chosen': -22.499832153320312, 'ref_logps/rejected': -26.141849517822266, 'ref_logps/chosen': -22.496463775634766, 'epoch': 0.14} 5%|▍ | 19/396 [09:24<3:02:29, 29.04s/it] 5%|▌ | 20/396 [09:53<3:02:06, 29.06s/it] {'loss': 0.693, 'learning_rate': 2.5e-07, 'losses/dpo': 0.688271164894104, 'losses/sft': 0.8725596070289612, 'losses/total': 0.688271164894104, 'rewards/chosen': 0.0007566105341538787, 'rewards/rejected': 0.0002616568235680461, 'rewards/accuracies': 0.5, 'rewards/margins': 0.0004949538852088153, 'logps/rejected': -25.036113739013672, 'logps/chosen': -21.5505428314209, 'ref_logps/rejected': -25.038726806640625, 'ref_logps/chosen': -21.558109283447266, 'epoch': 0.15} 5%|▌ | 20/396 [09:53<3:02:06, 29.06s/it] 5%|▌ | 21/396 [10:22<3:01:36, 29.06s/it] {'loss': 0.6917, 'learning_rate': 2.625e-07, 'losses/dpo': 0.6939514875411987, 'losses/sft': 0.7525328993797302, 'losses/total': 0.6939514875411987, 'rewards/chosen': 0.0016953760059550405, 'rewards/rejected': -0.0013215603539720178, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.003016936592757702, 'logps/rejected': -25.741392135620117, 'logps/chosen': -21.649169921875, 'ref_logps/rejected': -25.72817611694336, 'ref_logps/chosen': -21.666126251220703, 'epoch': 0.16} 5%|▌ | 21/396 [10:22<3:01:36, 29.06s/it] 6%|▌ | 22/396 [10:52<3:03:05, 29.37s/it] {'loss': 0.695, 'learning_rate': 2.75e-07, 'losses/dpo': 0.699163019657135, 'losses/sft': 0.7248706221580505, 'losses/total': 0.699163019657135, 'rewards/chosen': 
-0.002646287204697728, 'rewards/rejected': 0.0010232668137177825, 'rewards/accuracies': 0.4296875, 'rewards/margins': -0.003669553902000189, 'logps/rejected': -26.453773498535156, 'logps/chosen': -21.422496795654297, 'ref_logps/rejected': -26.464006423950195, 'ref_logps/chosen': -21.396032333374023, 'epoch': 0.17} 6%|▌ | 22/396 [10:52<3:03:05, 29.37s/it] 6%|▌ | 23/396 [11:21<3:01:52, 29.26s/it] {'loss': 0.6929, 'learning_rate': 2.8749999999999995e-07, 'losses/dpo': 0.6908746957778931, 'losses/sft': 0.7899657487869263, 'losses/total': 0.6908746957778931, 'rewards/chosen': 0.0003229643334634602, 'rewards/rejected': -0.00031191116431728005, 'rewards/accuracies': 0.515625, 'rewards/margins': 0.0006348754977807403, 'logps/rejected': -25.13469886779785, 'logps/chosen': -21.21988868713379, 'ref_logps/rejected': -25.131580352783203, 'ref_logps/chosen': -21.22311782836914, 'epoch': 0.17} 6%|▌ | 23/396 [11:21<3:01:52, 29.26s/it] 6%|▌ | 24/396 [11:50<3:00:52, 29.17s/it] {'loss': 0.6936, 'learning_rate': 3e-07, 'losses/dpo': 0.6931849718093872, 'losses/sft': 0.7270597219467163, 'losses/total': 0.6931849718093872, 'rewards/chosen': -0.000760397466365248, 'rewards/rejected': 0.00016163833788596094, 'rewards/accuracies': 0.4765625, 'rewards/margins': -0.0009220357751473784, 'logps/rejected': -27.93877410888672, 'logps/chosen': -24.172225952148438, 'ref_logps/rejected': -27.940391540527344, 'ref_logps/chosen': -24.16461944580078, 'epoch': 0.18} 6%|▌ | 24/396 [11:50<3:00:52, 29.17s/it] 6%|▋ | 25/396 [12:19<3:00:23, 29.17s/it] {'loss': 0.6929, 'learning_rate': 3.1249999999999997e-07, 'losses/dpo': 0.6930486559867859, 'losses/sft': 0.779391884803772, 'losses/total': 0.6930486559867859, 'rewards/chosen': 0.0005005812272429466, 'rewards/rejected': -6.23615924268961e-05, 'rewards/accuracies': 0.546875, 'rewards/margins': 0.0005629429360851645, 'logps/rejected': -23.77918243408203, 'logps/chosen': -23.023677825927734, 'ref_logps/rejected': -23.778560638427734, 'ref_logps/chosen': 
-23.028684616088867, 'epoch': 0.19} 6%|▋ | 25/396 [12:19<3:00:23, 29.17s/it] 7%|▋ | 26/396 [12:48<2:59:29, 29.11s/it] {'loss': 0.6923, 'learning_rate': 3.25e-07, 'losses/dpo': 0.6919558644294739, 'losses/sft': 0.8828473091125488, 'losses/total': 0.6919558644294739, 'rewards/chosen': 0.001289202249608934, 'rewards/rejected': -0.0005525298183783889, 'rewards/accuracies': 0.5078125, 'rewards/margins': 0.0018417320679873228, 'logps/rejected': -30.183570861816406, 'logps/chosen': -24.240978240966797, 'ref_logps/rejected': -30.17804718017578, 'ref_logps/chosen': -24.253870010375977, 'epoch': 0.2} 7%|▋ | 26/396 [12:48<2:59:29, 29.11s/it] 7%|▋ | 27/396 [13:17<2:58:53, 29.09s/it] {'loss': 0.6911, 'learning_rate': 3.375e-07, 'losses/dpo': 0.6919010281562805, 'losses/sft': 0.9361266493797302, 'losses/total': 0.6919010281562805, 'rewards/chosen': 0.0030831946060061455, 'rewards/rejected': -0.001077780150808394, 'rewards/accuracies': 0.59375, 'rewards/margins': 0.004160974640399218, 'logps/rejected': -28.10503387451172, 'logps/chosen': -22.371261596679688, 'ref_logps/rejected': -28.094257354736328, 'ref_logps/chosen': -22.4020938873291, 'epoch': 0.2} 7%|▋ | 27/396 [13:17<2:58:53, 29.09s/it] 7%|▋ | 28/396 [13:46<2:58:39, 29.13s/it] {'loss': 0.6921, 'learning_rate': 3.5e-07, 'losses/dpo': 0.6916664838790894, 'losses/sft': 0.8491181135177612, 'losses/total': 0.6916664838790894, 'rewards/chosen': 7.087946869432926e-06, 'rewards/rejected': -0.00214599072933197, 'rewards/accuracies': 0.5390625, 'rewards/margins': 0.0021530785597860813, 'logps/rejected': -27.053752899169922, 'logps/chosen': -21.107967376708984, 'ref_logps/rejected': -27.03229331970215, 'ref_logps/chosen': -21.1080379486084, 'epoch': 0.21} 7%|▋ | 28/396 [13:46<2:58:39, 29.13s/it] 7%|▋ | 29/396 [14:15<2:57:53, 29.08s/it] {'loss': 0.6906, 'learning_rate': 3.6249999999999997e-07, 'losses/dpo': 0.6926239728927612, 'losses/sft': 0.7789149284362793, 'losses/total': 0.6926239728927612, 'rewards/chosen': 0.0037725979927927256, 
'rewards/rejected': -0.0013389625819399953, 'rewards/accuracies': 0.59375, 'rewards/margins': 0.005111560225486755, 'logps/rejected': -27.092483520507812, 'logps/chosen': -23.424461364746094, 'ref_logps/rejected': -27.07909393310547, 'ref_logps/chosen': -23.46218490600586, 'epoch': 0.22} 7%|▋ | 29/396 [14:15<2:57:53, 29.08s/it] 8%|▊ | 30/396 [14:44<2:57:52, 29.16s/it] {'loss': 0.6933, 'learning_rate': 3.75e-07, 'losses/dpo': 0.6948127746582031, 'losses/sft': 0.7969105243682861, 'losses/total': 0.6948127746582031, 'rewards/chosen': 0.0009542852640151978, 'rewards/rejected': 0.001077009947039187, 'rewards/accuracies': 0.515625, 'rewards/margins': -0.00012272456660866737, 'logps/rejected': -27.201662063598633, 'logps/chosen': -22.859556198120117, 'ref_logps/rejected': -27.212430953979492, 'ref_logps/chosen': -22.869096755981445, 'epoch': 0.23} 8%|▊ | 30/396 [14:44<2:57:52, 29.16s/it] 8%|▊ | 31/396 [15:13<2:56:57, 29.09s/it] {'loss': 0.6918, 'learning_rate': 3.875e-07, 'losses/dpo': 0.6922581195831299, 'losses/sft': 0.7759775519371033, 'losses/total': 0.6922581195831299, 'rewards/chosen': 0.001409594900906086, 'rewards/rejected': -0.001307606347836554, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.002717201365157962, 'logps/rejected': -25.310596466064453, 'logps/chosen': -22.666168212890625, 'ref_logps/rejected': -25.297521591186523, 'ref_logps/chosen': -22.68026351928711, 'epoch': 0.23} 8%|▊ | 31/396 [15:13<2:56:57, 29.09s/it] 8%|▊ | 32/396 [15:42<2:56:22, 29.07s/it] {'loss': 0.693, 'learning_rate': 4e-07, 'losses/dpo': 0.6980300545692444, 'losses/sft': 0.7636886835098267, 'losses/total': 0.6980300545692444, 'rewards/chosen': 0.0018782642437145114, 'rewards/rejected': 0.0014139798004180193, 'rewards/accuracies': 0.5390625, 'rewards/margins': 0.000464284501504153, 'logps/rejected': -28.84569549560547, 'logps/chosen': -23.281084060668945, 'ref_logps/rejected': -28.859834671020508, 'ref_logps/chosen': -23.299869537353516, 'epoch': 0.24} 8%|▊ | 32/396 [15:42<2:56:22, 
29.07s/it] 8%|▊ | 33/396 [16:12<2:56:20, 29.15s/it] {'loss': 0.6914, 'learning_rate': 4.1249999999999997e-07, 'losses/dpo': 0.6892759799957275, 'losses/sft': 0.7832686901092529, 'losses/total': 0.6892759799957275, 'rewards/chosen': 0.0027261325158178806, 'rewards/rejected': -0.0007671635248698294, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.003493295982480049, 'logps/rejected': -27.139453887939453, 'logps/chosen': -20.922544479370117, 'ref_logps/rejected': -27.13178253173828, 'ref_logps/chosen': -20.949806213378906, 'epoch': 0.25} 8%|▊ | 33/396 [16:12<2:56:20, 29.15s/it] 9%|▊ | 34/396 [16:41<2:56:33, 29.27s/it] {'loss': 0.6926, 'learning_rate': 4.2499999999999995e-07, 'losses/dpo': 0.6938276290893555, 'losses/sft': 0.7895969152450562, 'losses/total': 0.6938276290893555, 'rewards/chosen': 0.0004745282931253314, 'rewards/rejected': -0.0007089540013112128, 'rewards/accuracies': 0.53125, 'rewards/margins': 0.0011834825854748487, 'logps/rejected': -26.6143798828125, 'logps/chosen': -22.535436630249023, 'ref_logps/rejected': -26.607288360595703, 'ref_logps/chosen': -22.540180206298828, 'epoch': 0.26} 9%|▊ | 34/396 [16:41<2:56:33, 29.27s/it] 9%|▉ | 35/396 [17:10<2:55:35, 29.19s/it] {'loss': 0.6928, 'learning_rate': 4.375e-07, 'losses/dpo': 0.6910836100578308, 'losses/sft': 0.7998620271682739, 'losses/total': 0.6910836100578308, 'rewards/chosen': 0.0015797324012964964, 'rewards/rejected': 0.0007564900442957878, 'rewards/accuracies': 0.515625, 'rewards/margins': 0.0008232423570007086, 'logps/rejected': -27.329378128051758, 'logps/chosen': -21.444934844970703, 'ref_logps/rejected': -27.336944580078125, 'ref_logps/chosen': -21.460729598999023, 'epoch': 0.26} 9%|▉ | 35/396 [17:10<2:55:35, 29.19s/it] 9%|▉ | 36/396 [17:40<2:55:15, 29.21s/it] {'loss': 0.6938, 'learning_rate': 4.5e-07, 'losses/dpo': 0.6915764808654785, 'losses/sft': 0.7927474975585938, 'losses/total': 0.6915764808654785, 'rewards/chosen': 0.00022311191423796117, 'rewards/rejected': 0.0013838279992341995, 
'rewards/accuracies': 0.4375, 'rewards/margins': -0.001160716055892408, 'logps/rejected': -26.22686195373535, 'logps/chosen': -22.847640991210938, 'ref_logps/rejected': -26.240699768066406, 'ref_logps/chosen': -22.84987449645996, 'epoch': 0.27} 9%|▉ | 36/396 [17:40<2:55:15, 29.21s/it] 9%|▉ | 37/396 [18:09<2:54:43, 29.20s/it] {'loss': 0.6925, 'learning_rate': 4.625e-07, 'losses/dpo': 0.6903287768363953, 'losses/sft': 0.8005999326705933, 'losses/total': 0.6903287768363953, 'rewards/chosen': 0.000591703865211457, 'rewards/rejected': -0.000813445309177041, 'rewards/accuracies': 0.4921875, 'rewards/margins': 0.001405149232596159, 'logps/rejected': -25.179964065551758, 'logps/chosen': -23.097599029541016, 'ref_logps/rejected': -25.171833038330078, 'ref_logps/chosen': -23.103515625, 'epoch': 0.28} 9%|▉ | 37/396 [18:09<2:54:43, 29.20s/it] 10%|▉ | 38/396 [18:38<2:53:40, 29.11s/it] {'loss': 0.6912, 'learning_rate': 4.7499999999999995e-07, 'losses/dpo': 0.6978300213813782, 'losses/sft': 0.7380209565162659, 'losses/total': 0.6978300213813782, 'rewards/chosen': 0.003861566074192524, 'rewards/rejected': -0.00012432527728378773, 'rewards/accuracies': 0.578125, 'rewards/margins': 0.003985891118645668, 'logps/rejected': -26.14615821838379, 'logps/chosen': -23.07529640197754, 'ref_logps/rejected': -26.144914627075195, 'ref_logps/chosen': -23.113910675048828, 'epoch': 0.29} 10%|▉ | 38/396 [18:38<2:53:40, 29.11s/it] 10%|▉ | 39/396 [19:07<2:52:59, 29.07s/it] {'loss': 0.6933, 'learning_rate': 4.875e-07, 'losses/dpo': 0.6926023960113525, 'losses/sft': 0.7966833710670471, 'losses/total': 0.6926023960113525, 'rewards/chosen': 0.0010607184376567602, 'rewards/rejected': 0.0012419875711202621, 'rewards/accuracies': 0.5234375, 'rewards/margins': -0.00018126872600987554, 'logps/rejected': -28.207073211669922, 'logps/chosen': -23.091575622558594, 'ref_logps/rejected': -28.21949577331543, 'ref_logps/chosen': -23.102182388305664, 'epoch': 0.29} 10%|▉ | 39/396 [19:07<2:52:59, 29.07s/it] 10%|█ | 
40/396 [19:35<2:52:09, 29.02s/it] {'loss': 0.6903, 'learning_rate': 5e-07, 'losses/dpo': 0.6866278648376465, 'losses/sft': 0.887488842010498, 'losses/total': 0.6866278648376465, 'rewards/chosen': 0.003097555134445429, 'rewards/rejected': -0.0027624592185020447, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.0058600143529474735, 'logps/rejected': -27.111900329589844, 'logps/chosen': -21.683151245117188, 'ref_logps/rejected': -27.08427619934082, 'ref_logps/chosen': -21.714126586914062, 'epoch': 0.3} 10%|█ | 40/396 [19:35<2:52:09, 29.02s/it] 10%|█ | 41/396 [20:05<2:51:47, 29.04s/it] {'loss': 0.6924, 'learning_rate': 4.985955056179775e-07, 'losses/dpo': 0.6903232336044312, 'losses/sft': 0.7454457879066467, 'losses/total': 0.6903232336044312, 'rewards/chosen': 0.0019934140145778656, 'rewards/rejected': 0.0002888469025492668, 'rewards/accuracies': 0.4765625, 'rewards/margins': 0.0017045673448592424, 'logps/rejected': -24.057823181152344, 'logps/chosen': -23.24443817138672, 'ref_logps/rejected': -24.060710906982422, 'ref_logps/chosen': -23.264373779296875, 'epoch': 0.31} 10%|█ | 41/396 [20:05<2:51:47, 29.04s/it] 11%|█ | 42/396 [20:34<2:51:27, 29.06s/it] {'loss': 0.692, 'learning_rate': 4.97191011235955e-07, 'losses/dpo': 0.6917561292648315, 'losses/sft': 0.8527467846870422, 'losses/total': 0.6917561292648315, 'rewards/chosen': 0.0005830166628584266, 'rewards/rejected': -0.0017978833056986332, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.0023808996193110943, 'logps/rejected': -23.993690490722656, 'logps/chosen': -22.751291275024414, 'ref_logps/rejected': -23.975711822509766, 'ref_logps/chosen': -22.75712013244629, 'epoch': 0.32} 11%|█ | 42/396 [20:34<2:51:27, 29.06s/it] 11%|█ | 43/396 [21:03<2:50:46, 29.03s/it] {'loss': 0.6922, 'learning_rate': 4.957865168539325e-07, 'losses/dpo': 0.6912024021148682, 'losses/sft': 0.8869270086288452, 'losses/total': 0.6912024021148682, 'rewards/chosen': 0.0030817545484751463, 'rewards/rejected': 0.0009638546616770327, 
'rewards/accuracies': 0.5234375, 'rewards/margins': 0.002117899712175131, 'logps/rejected': -27.22784996032715, 'logps/chosen': -24.575613021850586, 'ref_logps/rejected': -27.23748779296875, 'ref_logps/chosen': -24.60643196105957, 'epoch': 0.32} 11%|█ | 43/396 [21:03<2:50:46, 29.03s/it] 11%|█ | 44/396 [21:32<2:50:13, 29.01s/it] {'loss': 0.6913, 'learning_rate': 4.943820224719101e-07, 'losses/dpo': 0.690817654132843, 'losses/sft': 0.7518939971923828, 'losses/total': 0.690817654132843, 'rewards/chosen': 0.0029130401089787483, 'rewards/rejected': -0.0009577819146215916, 'rewards/accuracies': 0.5546875, 'rewards/margins': 0.0038708222564309835, 'logps/rejected': -29.683177947998047, 'logps/chosen': -23.449739456176758, 'ref_logps/rejected': -29.673599243164062, 'ref_logps/chosen': -23.47886848449707, 'epoch': 0.33} 11%|█ | 44/396 [21:32<2:50:13, 29.01s/it] 11%|█▏ | 45/396 [22:01<2:50:31, 29.15s/it] {'loss': 0.6923, 'learning_rate': 4.929775280898877e-07, 'losses/dpo': 0.6913425922393799, 'losses/sft': 0.6940815448760986, 'losses/total': 0.6913425922393799, 'rewards/chosen': 0.0035258703865110874, 'rewards/rejected': 0.0016616008942946792, 'rewards/accuracies': 0.5078125, 'rewards/margins': 0.0018642698414623737, 'logps/rejected': -26.939178466796875, 'logps/chosen': -21.53199577331543, 'ref_logps/rejected': -26.955793380737305, 'ref_logps/chosen': -21.567256927490234, 'epoch': 0.34} 11%|█▏ | 45/396 [22:01<2:50:31, 29.15s/it] 12%|█▏ | 46/396 [22:30<2:50:04, 29.16s/it] {'loss': 0.6903, 'learning_rate': 4.915730337078651e-07, 'losses/dpo': 0.6909126043319702, 'losses/sft': 0.9766503572463989, 'losses/total': 0.6909126043319702, 'rewards/chosen': 0.005850302986800671, 'rewards/rejected': -7.754407124593854e-05, 'rewards/accuracies': 0.5703125, 'rewards/margins': 0.005927846767008305, 'logps/rejected': -28.62994956970215, 'logps/chosen': -25.476314544677734, 'ref_logps/rejected': -28.629175186157227, 'ref_logps/chosen': -25.53481674194336, 'epoch': 0.35} 12%|█▏ | 46/396 
[22:30<2:50:04, 29.16s/it] 12%|█▏ | 47/396 [23:00<2:50:10, 29.26s/it] {'loss': 0.6919, 'learning_rate': 4.901685393258427e-07, 'losses/dpo': 0.6921358704566956, 'losses/sft': 0.8468361496925354, 'losses/total': 0.6921358704566956, 'rewards/chosen': 0.004584114067256451, 'rewards/rejected': 0.0020373575389385223, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.0025467565283179283, 'logps/rejected': -25.871768951416016, 'logps/chosen': -24.225303649902344, 'ref_logps/rejected': -25.892141342163086, 'ref_logps/chosen': -24.27114486694336, 'epoch': 0.35} 12%|█▏ | 47/396 [23:00<2:50:10, 29.26s/it] 12%|█▏ | 48/396 [23:29<2:48:57, 29.13s/it] {'loss': 0.6902, 'learning_rate': 4.887640449438202e-07, 'losses/dpo': 0.6886686086654663, 'losses/sft': 0.7169030904769897, 'losses/total': 0.6886686086654663, 'rewards/chosen': 0.003501205239444971, 'rewards/rejected': -0.00263836607336998, 'rewards/accuracies': 0.5703125, 'rewards/margins': 0.006139571778476238, 'logps/rejected': -27.604530334472656, 'logps/chosen': -21.586902618408203, 'ref_logps/rejected': -27.578147888183594, 'ref_logps/chosen': -21.621915817260742, 'epoch': 0.36} 12%|█▏ | 48/396 [23:29<2:48:57, 29.13s/it] 12%|█▏ | 49/396 [23:58<2:49:37, 29.33s/it] {'loss': 0.6909, 'learning_rate': 4.873595505617978e-07, 'losses/dpo': 0.6952996850013733, 'losses/sft': 0.7813842296600342, 'losses/total': 0.6952996850013733, 'rewards/chosen': 0.004394051153212786, 'rewards/rejected': -0.0003633448213804513, 'rewards/accuracies': 0.5859375, 'rewards/margins': 0.004757395945489407, 'logps/rejected': -30.10862922668457, 'logps/chosen': -24.600910186767578, 'ref_logps/rejected': -30.104995727539062, 'ref_logps/chosen': -24.644847869873047, 'epoch': 0.37} 12%|█▏ | 49/396 [23:58<2:49:37, 29.33s/it] 13%|█▎ | 50/396 [24:28<2:48:49, 29.28s/it] {'loss': 0.6914, 'learning_rate': 4.859550561797752e-07, 'losses/dpo': 0.6889626979827881, 'losses/sft': 0.8148602843284607, 'losses/total': 0.6889626979827881, 'rewards/chosen': 
0.006161023862659931, 'rewards/rejected': 0.0024394195061177015, 'rewards/accuracies': 0.5, 'rewards/margins': 0.0037216043565422297, 'logps/rejected': -24.876815795898438, 'logps/chosen': -20.754308700561523, 'ref_logps/rejected': -24.90121078491211, 'ref_logps/chosen': -20.81591796875, 'epoch': 0.38} 13%|█▎ | 50/396 [24:28<2:48:49, 29.28s/it] 13%|█▎ | 51/396 [24:57<2:47:52, 29.20s/it] {'loss': 0.6915, 'learning_rate': 4.845505617977528e-07, 'losses/dpo': 0.6886854767799377, 'losses/sft': 0.8582803010940552, 'losses/total': 0.6886854767799377, 'rewards/chosen': 0.00511885154992342, 'rewards/rejected': 0.0015903504099696875, 'rewards/accuracies': 0.5546875, 'rewards/margins': 0.00352850160561502, 'logps/rejected': -24.949783325195312, 'logps/chosen': -23.585115432739258, 'ref_logps/rejected': -24.965686798095703, 'ref_logps/chosen': -23.63630485534668, 'epoch': 0.38} 13%|█▎ | 51/396 [24:57<2:47:52, 29.20s/it] 13%|█▎ | 52/396 [25:26<2:47:54, 29.29s/it] {'loss': 0.6916, 'learning_rate': 4.831460674157303e-07, 'losses/dpo': 0.6899633407592773, 'losses/sft': 0.6870510578155518, 'losses/total': 0.6899633407592773, 'rewards/chosen': 0.0026545142754912376, 'rewards/rejected': -0.0005432275356724858, 'rewards/accuracies': 0.5546875, 'rewards/margins': 0.0031977419275790453, 'logps/rejected': -24.87842559814453, 'logps/chosen': -20.576318740844727, 'ref_logps/rejected': -24.87299346923828, 'ref_logps/chosen': -20.60286521911621, 'epoch': 0.39} 13%|█▎ | 52/396 [25:26<2:47:54, 29.29s/it] 13%|█▎ | 53/396 [25:55<2:46:51, 29.19s/it] {'loss': 0.6887, 'learning_rate': 4.817415730337078e-07, 'losses/dpo': 0.6841185092926025, 'losses/sft': 0.833280622959137, 'losses/total': 0.6841185092926025, 'rewards/chosen': 0.005483907647430897, 'rewards/rejected': -0.003772540483623743, 'rewards/accuracies': 0.6171875, 'rewards/margins': 0.009256447665393353, 'logps/rejected': -25.128353118896484, 'logps/chosen': -24.051544189453125, 'ref_logps/rejected': -25.090627670288086, 
'ref_logps/chosen': -24.10638427734375, 'epoch': 0.4} 13%|█▎ | 53/396 [25:55<2:46:51, 29.19s/it] 14%|█▎ | 54/396 [26:24<2:45:42, 29.07s/it] {'loss': 0.6914, 'learning_rate': 4.803370786516854e-07, 'losses/dpo': 0.687272846698761, 'losses/sft': 0.7218018770217896, 'losses/total': 0.687272846698761, 'rewards/chosen': 0.00562882237136364, 'rewards/rejected': 0.002009942661970854, 'rewards/accuracies': 0.546875, 'rewards/margins': 0.003618879709392786, 'logps/rejected': -26.20134735107422, 'logps/chosen': -21.564958572387695, 'ref_logps/rejected': -26.221445083618164, 'ref_logps/chosen': -21.621246337890625, 'epoch': 0.41} 14%|█▎ | 54/396 [26:24<2:45:42, 29.07s/it] 14%|█▍ | 55/396 [26:53<2:45:09, 29.06s/it] {'loss': 0.6884, 'learning_rate': 4.789325842696629e-07, 'losses/dpo': 0.6862033605575562, 'losses/sft': 0.9426325559616089, 'losses/total': 0.6862033605575562, 'rewards/chosen': 0.007655493449419737, 'rewards/rejected': -0.0021762042306363583, 'rewards/accuracies': 0.5859375, 'rewards/margins': 0.009831697680056095, 'logps/rejected': -26.1567325592041, 'logps/chosen': -23.699432373046875, 'ref_logps/rejected': -26.134971618652344, 'ref_logps/chosen': -23.775989532470703, 'epoch': 0.42} 14%|█▍ | 55/396 [26:53<2:45:09, 29.06s/it] 14%|█▍ | 56/396 [27:23<2:45:53, 29.28s/it] {'loss': 0.6881, 'learning_rate': 4.775280898876405e-07, 'losses/dpo': 0.6900283098220825, 'losses/sft': 0.8505688905715942, 'losses/total': 0.6900283098220825, 'rewards/chosen': 0.005787987262010574, 'rewards/rejected': -0.004667359404265881, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.010455346666276455, 'logps/rejected': -27.695213317871094, 'logps/chosen': -23.076374053955078, 'ref_logps/rejected': -27.64853858947754, 'ref_logps/chosen': -23.134254455566406, 'epoch': 0.42} 14%|█▍ | 56/396 [27:23<2:45:53, 29.28s/it] 14%|█▍ | 57/396 [27:52<2:44:51, 29.18s/it] {'loss': 0.6911, 'learning_rate': 4.7612359550561797e-07, 'losses/dpo': 0.6942879557609558, 'losses/sft': 0.7311047911643982, 
'losses/total': 0.6942879557609558, 'rewards/chosen': 0.0051962630823254585, 'rewards/rejected': 0.0010057740146294236, 'rewards/accuracies': 0.6015625, 'rewards/margins': 0.004190489184111357, 'logps/rejected': -24.36727523803711, 'logps/chosen': -21.54006576538086, 'ref_logps/rejected': -24.37733268737793, 'ref_logps/chosen': -21.592029571533203, 'epoch': 0.43} 14%|█▍ | 57/396 [27:52<2:44:51, 29.18s/it] 15%|█▍ | 58/396 [28:20<2:43:51, 29.09s/it] {'loss': 0.6906, 'learning_rate': 4.747191011235955e-07, 'losses/dpo': 0.6889323592185974, 'losses/sft': 0.7590615749359131, 'losses/total': 0.6889323592185974, 'rewards/chosen': 0.004648969508707523, 'rewards/rejected': -0.0007569912704639137, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.005405961070209742, 'logps/rejected': -28.501548767089844, 'logps/chosen': -21.678865432739258, 'ref_logps/rejected': -28.493976593017578, 'ref_logps/chosen': -21.72535514831543, 'epoch': 0.44} 15%|█▍ | 58/396 [28:20<2:43:51, 29.09s/it] 15%|█▍ | 59/396 [28:50<2:43:38, 29.13s/it] {'loss': 0.684, 'learning_rate': 4.7331460674157303e-07, 'losses/dpo': 0.6820257902145386, 'losses/sft': 0.8394409418106079, 'losses/total': 0.6820257902145386, 'rewards/chosen': 0.009928906336426735, 'rewards/rejected': -0.008745552971959114, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.01867445930838585, 'logps/rejected': -26.515047073364258, 'logps/chosen': -23.9781436920166, 'ref_logps/rejected': -26.427589416503906, 'ref_logps/chosen': -24.077434539794922, 'epoch': 0.45} 15%|█▍ | 59/396 [28:50<2:43:38, 29.13s/it] 15%|█▌ | 60/396 [29:19<2:42:43, 29.06s/it] {'loss': 0.6894, 'learning_rate': 4.7191011235955054e-07, 'losses/dpo': 0.6909818053245544, 'losses/sft': 0.7433596253395081, 'losses/total': 0.6909818053245544, 'rewards/chosen': 0.00875765923410654, 'rewards/rejected': 0.0009556080331094563, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.007802051026374102, 'logps/rejected': -30.391559600830078, 'logps/chosen': -22.162433624267578, 
'ref_logps/rejected': -30.40111541748047, 'ref_logps/chosen': -22.250009536743164, 'epoch': 0.45} 15%|█▌ | 60/396 [29:19<2:42:43, 29.06s/it] 15%|█▌ | 61/396 [29:47<2:42:01, 29.02s/it] {'loss': 0.6888, 'learning_rate': 4.705056179775281e-07, 'losses/dpo': 0.6858267188072205, 'losses/sft': 0.6961312294006348, 'losses/total': 0.6858267188072205, 'rewards/chosen': 0.007471038028597832, 'rewards/rejected': -0.0013920125784352422, 'rewards/accuracies': 0.625, 'rewards/margins': 0.008863050490617752, 'logps/rejected': -26.851608276367188, 'logps/chosen': -24.088329315185547, 'ref_logps/rejected': -26.837688446044922, 'ref_logps/chosen': -24.163042068481445, 'epoch': 0.46} 15%|█▌ | 61/396 [29:47<2:42:01, 29.02s/it] 16%|█▌ | 62/396 [30:17<2:41:39, 29.04s/it] {'loss': 0.6896, 'learning_rate': 4.691011235955056e-07, 'losses/dpo': 0.6952353715896606, 'losses/sft': 0.8425909280776978, 'losses/total': 0.6952353715896606, 'rewards/chosen': 0.006925276480615139, 'rewards/rejected': -0.00042251107515767217, 'rewards/accuracies': 0.578125, 'rewards/margins': 0.007347787730395794, 'logps/rejected': -28.607454299926758, 'logps/chosen': -23.13729476928711, 'ref_logps/rejected': -28.603229522705078, 'ref_logps/chosen': -23.206546783447266, 'epoch': 0.47} 16%|█▌ | 62/396 [30:17<2:41:39, 29.04s/it] 16%|█▌ | 63/396 [30:45<2:40:54, 28.99s/it] {'loss': 0.6882, 'learning_rate': 4.6769662921348315e-07, 'losses/dpo': 0.690306544303894, 'losses/sft': 0.7292711734771729, 'losses/total': 0.690306544303894, 'rewards/chosen': 0.010831332765519619, 'rewards/rejected': 0.0005220210296101868, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.010309312492609024, 'logps/rejected': -25.503629684448242, 'logps/chosen': -22.758800506591797, 'ref_logps/rejected': -25.50885009765625, 'ref_logps/chosen': -22.867115020751953, 'epoch': 0.48} 16%|█▌ | 63/396 [30:45<2:40:54, 28.99s/it] 16%|█▌ | 64/396 [31:14<2:40:14, 28.96s/it] {'loss': 0.6868, 'learning_rate': 4.662921348314606e-07, 'losses/dpo': 
0.6876275539398193, 'losses/sft': 0.9537997245788574, 'losses/total': 0.6876275539398193, 'rewards/chosen': 0.012752560898661613, 'rewards/rejected': -0.0001996997743844986, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.012952261604368687, 'logps/rejected': -27.15595245361328, 'logps/chosen': -22.957290649414062, 'ref_logps/rejected': -27.15395736694336, 'ref_logps/chosen': -23.08481788635254, 'epoch': 0.48} 16%|█▌ | 64/396 [31:14<2:40:14, 28.96s/it] 16%|█▋ | 65/396 [31:44<2:40:19, 29.06s/it] {'loss': 0.688, 'learning_rate': 4.6488764044943816e-07, 'losses/dpo': 0.6866365075111389, 'losses/sft': 0.748786211013794, 'losses/total': 0.6866365075111389, 'rewards/chosen': 0.009048780426383018, 'rewards/rejected': -0.0016076482133939862, 'rewards/accuracies': 0.5546875, 'rewards/margins': 0.010656429454684258, 'logps/rejected': -28.90016746520996, 'logps/chosen': -21.856212615966797, 'ref_logps/rejected': -28.884090423583984, 'ref_logps/chosen': -21.946701049804688, 'epoch': 0.49} 16%|█▋ | 65/396 [31:44<2:40:19, 29.06s/it] 17%|█▋ | 66/396 [32:13<2:39:43, 29.04s/it] {'loss': 0.6866, 'learning_rate': 4.634831460674157e-07, 'losses/dpo': 0.6858303546905518, 'losses/sft': 0.7428255677223206, 'losses/total': 0.6858303546905518, 'rewards/chosen': 0.009956244379281998, 'rewards/rejected': -0.0034284412395209074, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.013384684920310974, 'logps/rejected': -24.484195709228516, 'logps/chosen': -21.727970123291016, 'ref_logps/rejected': -24.44991111755371, 'ref_logps/chosen': -21.827533721923828, 'epoch': 0.5} 17%|█▋ | 66/396 [32:13<2:39:43, 29.04s/it] 17%|█▋ | 67/396 [32:42<2:39:12, 29.03s/it] {'loss': 0.685, 'learning_rate': 4.620786516853932e-07, 'losses/dpo': 0.6789939403533936, 'losses/sft': 0.718001127243042, 'losses/total': 0.6789939403533936, 'rewards/chosen': 0.013434557244181633, 'rewards/rejected': -0.0033441600389778614, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.016778716817498207, 'logps/rejected': 
-25.03292465209961, 'logps/chosen': -23.145030975341797, 'ref_logps/rejected': -24.999483108520508, 'ref_logps/chosen': -23.27937889099121, 'epoch': 0.51} 17%|█▋ | 67/396 [32:42<2:39:12, 29.03s/it] 17%|█▋ | 68/396 [33:11<2:38:30, 29.00s/it] {'loss': 0.6852, 'learning_rate': 4.606741573033708e-07, 'losses/dpo': 0.6921157836914062, 'losses/sft': 0.8621765971183777, 'losses/total': 0.6921157836914062, 'rewards/chosen': 0.011711984872817993, 'rewards/rejected': -0.0046123419888317585, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.016324326395988464, 'logps/rejected': -25.74646759033203, 'logps/chosen': -21.208370208740234, 'ref_logps/rejected': -25.700342178344727, 'ref_logps/chosen': -21.325489044189453, 'epoch': 0.51} 17%|█▋ | 68/396 [33:11<2:38:30, 29.00s/it] 17%|█▋ | 69/396 [33:40<2:38:04, 29.00s/it] {'loss': 0.6885, 'learning_rate': 4.592696629213483e-07, 'losses/dpo': 0.689292848110199, 'losses/sft': 0.7215853929519653, 'losses/total': 0.689292848110199, 'rewards/chosen': 0.009869576431810856, 'rewards/rejected': 5.913013592362404e-05, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.009810445830225945, 'logps/rejected': -28.81465721130371, 'logps/chosen': -22.621421813964844, 'ref_logps/rejected': -28.81524658203125, 'ref_logps/chosen': -22.720117568969727, 'epoch': 0.52} 17%|█▋ | 69/396 [33:40<2:38:04, 29.00s/it] 18%|█▊ | 70/396 [34:09<2:37:58, 29.08s/it] {'loss': 0.6872, 'learning_rate': 4.5786516853932584e-07, 'losses/dpo': 0.6876038312911987, 'losses/sft': 0.7616434097290039, 'losses/total': 0.6876038312911987, 'rewards/chosen': 0.01009867899119854, 'rewards/rejected': -0.002220625290647149, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.01231930311769247, 'logps/rejected': -28.595046997070312, 'logps/chosen': -22.636703491210938, 'ref_logps/rejected': -28.57284164428711, 'ref_logps/chosen': -22.73769187927246, 'epoch': 0.53} 18%|█▊ | 70/396 [34:09<2:37:58, 29.08s/it] 18%|█▊ | 71/396 [34:38<2:37:23, 29.06s/it] {'loss': 0.6849, 
'learning_rate': 4.5646067415730334e-07, 'losses/dpo': 0.6818934082984924, 'losses/sft': 0.8828948736190796, 'losses/total': 0.6818934082984924, 'rewards/chosen': 0.012628016993403435, 'rewards/rejected': -0.004456131719052792, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.01708414778113365, 'logps/rejected': -28.524490356445312, 'logps/chosen': -23.055517196655273, 'ref_logps/rejected': -28.479928970336914, 'ref_logps/chosen': -23.18179702758789, 'epoch': 0.54} 18%|█▊ | 71/396 [34:38<2:37:23, 29.06s/it] 18%|█▊ | 72/396 [35:07<2:37:02, 29.08s/it] {'loss': 0.682, 'learning_rate': 4.550561797752809e-07, 'losses/dpo': 0.6922101974487305, 'losses/sft': 0.7417640089988708, 'losses/total': 0.6922101974487305, 'rewards/chosen': 0.016913428902626038, 'rewards/rejected': -0.006055259145796299, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.02296869084239006, 'logps/rejected': -29.403223037719727, 'logps/chosen': -25.802350997924805, 'ref_logps/rejected': -29.342666625976562, 'ref_logps/chosen': -25.971485137939453, 'epoch': 0.54} 18%|█▊ | 72/396 [35:07<2:37:02, 29.08s/it] 18%|█▊ | 73/396 [35:36<2:36:53, 29.15s/it] {'loss': 0.6849, 'learning_rate': 4.536516853932584e-07, 'losses/dpo': 0.6843876242637634, 'losses/sft': 0.6335030198097229, 'losses/total': 0.6843876242637634, 'rewards/chosen': 0.010656386613845825, 'rewards/rejected': -0.006535808090120554, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.017192194238305092, 'logps/rejected': -31.861392974853516, 'logps/chosen': -22.979541778564453, 'ref_logps/rejected': -31.796035766601562, 'ref_logps/chosen': -23.086105346679688, 'epoch': 0.55} 18%|█▊ | 73/396 [35:36<2:36:53, 29.15s/it] 19%|█▊ | 74/396 [36:05<2:35:59, 29.07s/it] {'loss': 0.6842, 'learning_rate': 4.522471910112359e-07, 'losses/dpo': 0.6832489967346191, 'losses/sft': 0.8737274408340454, 'losses/total': 0.6832489967346191, 'rewards/chosen': 0.012302841059863567, 'rewards/rejected': -0.006098220124840736, 'rewards/accuracies': 0.703125, 
'rewards/margins': 0.01840106211602688, 'logps/rejected': -25.32451629638672, 'logps/chosen': -21.333240509033203, 'ref_logps/rejected': -25.263538360595703, 'ref_logps/chosen': -21.456268310546875, 'epoch': 0.56} 19%|█▊ | 74/396 [36:05<2:35:59, 29.07s/it] 19%|█▉ | 75/396 [36:34<2:35:26, 29.05s/it] {'loss': 0.6845, 'learning_rate': 4.5084269662921347e-07, 'losses/dpo': 0.6803750991821289, 'losses/sft': 0.7227590084075928, 'losses/total': 0.6803750991821289, 'rewards/chosen': 0.009546317160129547, 'rewards/rejected': -0.008210576139390469, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.01775689423084259, 'logps/rejected': -25.504837036132812, 'logps/chosen': -21.905548095703125, 'ref_logps/rejected': -25.422731399536133, 'ref_logps/chosen': -22.001012802124023, 'epoch': 0.57} 19%|█▉ | 75/396 [36:34<2:35:26, 29.05s/it] 19%|█▉ | 76/396 [37:03<2:34:51, 29.04s/it] {'loss': 0.6845, 'learning_rate': 4.4943820224719097e-07, 'losses/dpo': 0.6879241466522217, 'losses/sft': 0.936349093914032, 'losses/total': 0.6879241466522217, 'rewards/chosen': 0.012477071955800056, 'rewards/rejected': -0.005634305067360401, 'rewards/accuracies': 0.625, 'rewards/margins': 0.018111376091837883, 'logps/rejected': -25.56966209411621, 'logps/chosen': -22.212453842163086, 'ref_logps/rejected': -25.51331901550293, 'ref_logps/chosen': -22.337223052978516, 'epoch': 0.57} 19%|█▉ | 76/396 [37:03<2:34:51, 29.04s/it] 19%|█▉ | 77/396 [37:32<2:34:48, 29.12s/it] {'loss': 0.6818, 'learning_rate': 4.4803370786516853e-07, 'losses/dpo': 0.6872521638870239, 'losses/sft': 0.6872013211250305, 'losses/total': 0.6872521638870239, 'rewards/chosen': 0.016955075785517693, 'rewards/rejected': -0.006456219125539064, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.02341129444539547, 'logps/rejected': -26.30996322631836, 'logps/chosen': -20.199138641357422, 'ref_logps/rejected': -26.24540138244629, 'ref_logps/chosen': -20.368690490722656, 'epoch': 0.58} 19%|█▉ | 77/396 [37:32<2:34:48, 29.12s/it] 20%|█▉ | 78/396 
[38:02<2:34:25, 29.14s/it] {'loss': 0.6813, 'learning_rate': 4.4662921348314603e-07, 'losses/dpo': 0.6833238005638123, 'losses/sft': 0.7775546312332153, 'losses/total': 0.6833238005638123, 'rewards/chosen': 0.01314287818968296, 'rewards/rejected': -0.011400324292480946, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.02454320341348648, 'logps/rejected': -26.07961082458496, 'logps/chosen': -22.031774520874023, 'ref_logps/rejected': -25.965608596801758, 'ref_logps/chosen': -22.163204193115234, 'epoch': 0.59} 20%|█▉ | 78/396 [38:02<2:34:25, 29.14s/it] 20%|█▉ | 79/396 [38:31<2:33:39, 29.08s/it] {'loss': 0.6801, 'learning_rate': 4.452247191011236e-07, 'losses/dpo': 0.6835525035858154, 'losses/sft': 0.7558909058570862, 'losses/total': 0.6835525035858154, 'rewards/chosen': 0.013441269285976887, 'rewards/rejected': -0.013675847090780735, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.027117114514112473, 'logps/rejected': -26.621906280517578, 'logps/chosen': -22.522083282470703, 'ref_logps/rejected': -26.48514747619629, 'ref_logps/chosen': -22.656497955322266, 'epoch': 0.6} 20%|█▉ | 79/396 [38:31<2:33:39, 29.08s/it] 20%|██ | 80/396 [39:00<2:33:08, 29.08s/it] {'loss': 0.6836, 'learning_rate': 4.438202247191011e-07, 'losses/dpo': 0.6767468452453613, 'losses/sft': 0.8101401329040527, 'losses/total': 0.6767468452453613, 'rewards/chosen': 0.015375516377389431, 'rewards/rejected': -0.004492484033107758, 'rewards/accuracies': 0.59375, 'rewards/margins': 0.019868001341819763, 'logps/rejected': -26.428781509399414, 'logps/chosen': -22.05775260925293, 'ref_logps/rejected': -26.38385772705078, 'ref_logps/chosen': -22.211511611938477, 'epoch': 0.6} 20%|██ | 80/396 [39:00<2:33:08, 29.08s/it] 20%|██ | 81/396 [39:29<2:32:32, 29.06s/it] {'loss': 0.6803, 'learning_rate': 4.4241573033707865e-07, 'losses/dpo': 0.6830211281776428, 'losses/sft': 0.7352213263511658, 'losses/total': 0.6830211281776428, 'rewards/chosen': 0.013045946136116982, 'rewards/rejected': -0.013641366735100746, 
'rewards/accuracies': 0.6484375, 'rewards/margins': 0.026687312871217728, 'logps/rejected': -27.90719985961914, 'logps/chosen': -22.327136993408203, 'ref_logps/rejected': -27.77078628540039, 'ref_logps/chosen': -22.457595825195312, 'epoch': 0.61} 20%|██ | 81/396 [39:29<2:32:32, 29.06s/it] 21%|██ | 82/396 [39:58<2:31:56, 29.03s/it] {'loss': 0.6802, 'learning_rate': 4.410112359550562e-07, 'losses/dpo': 0.673937976360321, 'losses/sft': 0.7962872385978699, 'losses/total': 0.673937976360321, 'rewards/chosen': 0.015645721927285194, 'rewards/rejected': -0.011427570134401321, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.027073292061686516, 'logps/rejected': -26.4810791015625, 'logps/chosen': -23.738140106201172, 'ref_logps/rejected': -26.366804122924805, 'ref_logps/chosen': -23.89459991455078, 'epoch': 0.62} 21%|██ | 82/396 [39:58<2:31:56, 29.03s/it] 21%|██ | 83/396 [40:27<2:32:10, 29.17s/it] {'loss': 0.6805, 'learning_rate': 4.3960674157303366e-07, 'losses/dpo': 0.6789628863334656, 'losses/sft': 0.9124815464019775, 'losses/total': 0.6789628863334656, 'rewards/chosen': 0.01077171228826046, 'rewards/rejected': -0.015632012858986855, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.026403725147247314, 'logps/rejected': -24.34069061279297, 'logps/chosen': -21.008014678955078, 'ref_logps/rejected': -24.184371948242188, 'ref_logps/chosen': -21.115734100341797, 'epoch': 0.63} 21%|██ | 83/396 [40:27<2:32:10, 29.17s/it] 21%|██ | 84/396 [40:56<2:31:42, 29.17s/it] {'loss': 0.6833, 'learning_rate': 4.382022471910112e-07, 'losses/dpo': 0.6907744407653809, 'losses/sft': 0.7639827728271484, 'losses/total': 0.6907744407653809, 'rewards/chosen': 0.011834252625703812, 'rewards/rejected': -0.008949968963861465, 'rewards/accuracies': 0.625, 'rewards/margins': 0.020784219726920128, 'logps/rejected': -26.963245391845703, 'logps/chosen': -20.62143325805664, 'ref_logps/rejected': -26.87374496459961, 'ref_logps/chosen': -20.739776611328125, 'epoch': 0.63} 21%|██ | 84/396 [40:56<2:31:42, 
29.17s/it] 21%|██▏ | 85/396 [41:25<2:30:50, 29.10s/it] {'loss': 0.6846, 'learning_rate': 4.367977528089887e-07, 'losses/dpo': 0.6878204345703125, 'losses/sft': 0.6917088627815247, 'losses/total': 0.6878204345703125, 'rewards/chosen': 0.005295174196362495, 'rewards/rejected': -0.012819863855838776, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.01811503805220127, 'logps/rejected': -24.5494384765625, 'logps/chosen': -21.591964721679688, 'ref_logps/rejected': -24.421239852905273, 'ref_logps/chosen': -21.644916534423828, 'epoch': 0.64} 21%|██▏ | 85/396 [41:25<2:30:50, 29.10s/it] 22%|██▏ | 86/396 [41:54<2:30:28, 29.12s/it] {'loss': 0.6825, 'learning_rate': 4.353932584269663e-07, 'losses/dpo': 0.6937445402145386, 'losses/sft': 0.9424384832382202, 'losses/total': 0.6937445402145386, 'rewards/chosen': 0.013164759613573551, 'rewards/rejected': -0.009081022813916206, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.022245781496167183, 'logps/rejected': -28.227123260498047, 'logps/chosen': -24.759811401367188, 'ref_logps/rejected': -28.136310577392578, 'ref_logps/chosen': -24.891460418701172, 'epoch': 0.65} 22%|██▏ | 86/396 [41:54<2:30:28, 29.12s/it] 22%|██▏ | 87/396 [42:23<2:29:30, 29.03s/it] {'loss': 0.6795, 'learning_rate': 4.339887640449438e-07, 'losses/dpo': 0.6909404993057251, 'losses/sft': 0.8603497743606567, 'losses/total': 0.6909404993057251, 'rewards/chosen': 0.01660749316215515, 'rewards/rejected': -0.012220719829201698, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.028828214854002, 'logps/rejected': -26.10009002685547, 'logps/chosen': -22.8006591796875, 'ref_logps/rejected': -25.977882385253906, 'ref_logps/chosen': -22.96673583984375, 'epoch': 0.66} 22%|██▏ | 87/396 [42:23<2:29:30, 29.03s/it] 22%|██▏ | 88/396 [42:52<2:29:10, 29.06s/it] {'loss': 0.6752, 'learning_rate': 4.3258426966292134e-07, 'losses/dpo': 0.6638558506965637, 'losses/sft': 0.8455443382263184, 'losses/total': 0.6638558506965637, 'rewards/chosen': 0.018375899642705917, 
'rewards/rejected': -0.019228998571634293, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.03760489821434021, 'logps/rejected': -28.13039779663086, 'logps/chosen': -24.15732765197754, 'ref_logps/rejected': -27.938106536865234, 'ref_logps/chosen': -24.341087341308594, 'epoch': 0.66} 22%|██▏ | 88/396 [42:52<2:29:10, 29.06s/it] 22%|██▏ | 89/396 [43:21<2:28:25, 29.01s/it] {'loss': 0.6771, 'learning_rate': 4.311797752808989e-07, 'losses/dpo': 0.6774411797523499, 'losses/sft': 0.9257520437240601, 'losses/total': 0.6774411797523499, 'rewards/chosen': 0.015900880098342896, 'rewards/rejected': -0.017474830150604248, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.033375710248947144, 'logps/rejected': -25.207626342773438, 'logps/chosen': -21.290430068969727, 'ref_logps/rejected': -25.032873153686523, 'ref_logps/chosen': -21.449438095092773, 'epoch': 0.67} 22%|██▏ | 89/396 [43:21<2:28:25, 29.01s/it] 23%|██▎ | 90/396 [43:50<2:28:09, 29.05s/it] {'loss': 0.681, 'learning_rate': 4.297752808988764e-07, 'losses/dpo': 0.6869298219680786, 'losses/sft': 0.8004887104034424, 'losses/total': 0.6869298219680786, 'rewards/chosen': 0.015849877148866653, 'rewards/rejected': -0.009937671013176441, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.02578754723072052, 'logps/rejected': -27.57483673095703, 'logps/chosen': -24.241390228271484, 'ref_logps/rejected': -27.475460052490234, 'ref_logps/chosen': -24.399887084960938, 'epoch': 0.68} 23%|██▎ | 90/396 [43:50<2:28:09, 29.05s/it] 23%|██▎ | 91/396 [44:20<2:28:01, 29.12s/it] {'loss': 0.6843, 'learning_rate': 4.2837078651685396e-07, 'losses/dpo': 0.6896719336509705, 'losses/sft': 0.7865870594978333, 'losses/total': 0.6896719336509705, 'rewards/chosen': 0.010422902181744576, 'rewards/rejected': -0.00904794316738844, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.01947084441781044, 'logps/rejected': -25.188884735107422, 'logps/chosen': -21.290605545043945, 'ref_logps/rejected': -25.098407745361328, 'ref_logps/chosen': 
-21.39483642578125, 'epoch': 0.69} 23%|██▎ | 91/396 [44:20<2:28:01, 29.12s/it] 23%|██▎ | 92/396 [44:49<2:27:45, 29.16s/it] {'loss': 0.6863, 'learning_rate': 4.269662921348314e-07, 'losses/dpo': 0.6820717453956604, 'losses/sft': 0.8161361813545227, 'losses/total': 0.6820717453956604, 'rewards/chosen': 0.009964808821678162, 'rewards/rejected': -0.00545014813542366, 'rewards/accuracies': 0.53125, 'rewards/margins': 0.015414956025779247, 'logps/rejected': -24.005056381225586, 'logps/chosen': -21.395389556884766, 'ref_logps/rejected': -23.9505558013916, 'ref_logps/chosen': -21.495037078857422, 'epoch': 0.69} 23%|██▎ | 92/396 [44:49<2:27:45, 29.16s/it] 23%|██▎ | 93/396 [45:18<2:26:56, 29.10s/it] {'loss': 0.6786, 'learning_rate': 4.2556179775280896e-07, 'losses/dpo': 0.6868577599525452, 'losses/sft': 0.7177249193191528, 'losses/total': 0.6868577599525452, 'rewards/chosen': 0.012338603846728802, 'rewards/rejected': -0.018008096143603325, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.030346699059009552, 'logps/rejected': -24.735366821289062, 'logps/chosen': -20.948806762695312, 'ref_logps/rejected': -24.555286407470703, 'ref_logps/chosen': -21.072193145751953, 'epoch': 0.7} 23%|██▎ | 93/396 [45:18<2:26:56, 29.10s/it] 24%|██▎ | 94/396 [45:47<2:26:41, 29.14s/it] {'loss': 0.6783, 'learning_rate': 4.2415730337078647e-07, 'losses/dpo': 0.6721839904785156, 'losses/sft': 0.816402018070221, 'losses/total': 0.6721839904785156, 'rewards/chosen': 0.01632346771657467, 'rewards/rejected': -0.01513681747019291, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.03146028146147728, 'logps/rejected': -28.811023712158203, 'logps/chosen': -24.245830535888672, 'ref_logps/rejected': -28.65966033935547, 'ref_logps/chosen': -24.40906524658203, 'epoch': 0.71} 24%|██▎ | 94/396 [45:47<2:26:41, 29.14s/it] 24%|██▍ | 95/396 [46:16<2:26:05, 29.12s/it] {'loss': 0.6709, 'learning_rate': 4.22752808988764e-07, 'losses/dpo': 0.6718644499778748, 'losses/sft': 0.823063313961029, 'losses/total': 
0.6718644499778748, 'rewards/chosen': 0.01504128985106945, 'rewards/rejected': -0.03145648539066315, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.04649777710437775, 'logps/rejected': -29.088359832763672, 'logps/chosen': -22.48372459411621, 'ref_logps/rejected': -28.77379608154297, 'ref_logps/chosen': -22.634136199951172, 'epoch': 0.72} 24%|██▍ | 95/396 [46:16<2:26:05, 29.12s/it] 24%|██▍ | 96/396 [46:45<2:25:16, 29.05s/it] {'loss': 0.6785, 'learning_rate': 4.2134831460674153e-07, 'losses/dpo': 0.6842025518417358, 'losses/sft': 0.8330531120300293, 'losses/total': 0.6842025518417358, 'rewards/chosen': 0.009463240392506123, 'rewards/rejected': -0.02177419885993004, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.031237438321113586, 'logps/rejected': -27.790451049804688, 'logps/chosen': -20.869436264038086, 'ref_logps/rejected': -27.572711944580078, 'ref_logps/chosen': -20.964067459106445, 'epoch': 0.72} 24%|██▍ | 96/396 [46:45<2:25:16, 29.05s/it] 24%|██▍ | 97/396 [47:14<2:25:10, 29.13s/it] {'loss': 0.6772, 'learning_rate': 4.199438202247191e-07, 'losses/dpo': 0.6933009028434753, 'losses/sft': 0.7342395186424255, 'losses/total': 0.6933009028434753, 'rewards/chosen': 0.012448801659047604, 'rewards/rejected': -0.02156267873942852, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.03401148319244385, 'logps/rejected': -28.644880294799805, 'logps/chosen': -22.02164077758789, 'ref_logps/rejected': -28.42925262451172, 'ref_logps/chosen': -22.146129608154297, 'epoch': 0.73} 24%|██▍ | 97/396 [47:14<2:25:10, 29.13s/it] 25%|██▍ | 98/396 [47:43<2:24:20, 29.06s/it] {'loss': 0.6834, 'learning_rate': 4.1853932584269664e-07, 'losses/dpo': 0.7061095833778381, 'losses/sft': 0.6976662278175354, 'losses/total': 0.7061095833778381, 'rewards/chosen': 0.015375564806163311, 'rewards/rejected': -0.0058290609158575535, 'rewards/accuracies': 0.5625, 'rewards/margins': 0.021204624325037003, 'logps/rejected': -23.74181365966797, 'logps/chosen': -21.086360931396484, 
'ref_logps/rejected': -23.683523178100586, 'ref_logps/chosen': -21.240116119384766, 'epoch': 0.74} 25%|██▍ | 98/396 [47:43<2:24:20, 29.06s/it] 25%|██▌ | 99/396 [48:12<2:23:37, 29.02s/it] {'loss': 0.6749, 'learning_rate': 4.1713483146067415e-07, 'losses/dpo': 0.6546899080276489, 'losses/sft': 0.8132616281509399, 'losses/total': 0.6546899080276489, 'rewards/chosen': 0.014806646853685379, 'rewards/rejected': -0.02421729266643524, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.03902393952012062, 'logps/rejected': -28.555763244628906, 'logps/chosen': -21.535640716552734, 'ref_logps/rejected': -28.313589096069336, 'ref_logps/chosen': -21.68370819091797, 'epoch': 0.75} 25%|██▌ | 99/396 [48:12<2:23:37, 29.02s/it] 25%|██▌ | 100/396 [48:41<2:23:01, 28.99s/it] {'loss': 0.6777, 'learning_rate': 4.157303370786517e-07, 'losses/dpo': 0.6830233931541443, 'losses/sft': 0.7298552393913269, 'losses/total': 0.6830233931541443, 'rewards/chosen': 0.012851729989051819, 'rewards/rejected': -0.020514097064733505, 'rewards/accuracies': 0.625, 'rewards/margins': 0.033365827053785324, 'logps/rejected': -26.403512954711914, 'logps/chosen': -22.314010620117188, 'ref_logps/rejected': -26.1983699798584, 'ref_logps/chosen': -22.442527770996094, 'epoch': 0.75} 25%|██▌ | 100/396 [48:41<2:23:01, 28.99s/it] 26%|██▌ | 101/396 [49:10<2:22:18, 28.94s/it] {'loss': 0.6787, 'learning_rate': 4.1432584269662915e-07, 'losses/dpo': 0.66861492395401, 'losses/sft': 0.7538549900054932, 'losses/total': 0.66861492395401, 'rewards/chosen': 0.008682135492563248, 'rewards/rejected': -0.022655250504612923, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.03133738413453102, 'logps/rejected': -27.6639461517334, 'logps/chosen': -23.65606117248535, 'ref_logps/rejected': -27.43739128112793, 'ref_logps/chosen': -23.742881774902344, 'epoch': 0.76} 26%|██▌ | 101/396 [49:10<2:22:18, 28.94s/it] 26%|██▌ | 102/396 [49:39<2:21:58, 28.98s/it] {'loss': 0.6736, 'learning_rate': 4.129213483146067e-07, 'losses/dpo': 
0.6594799757003784, 'losses/sft': 0.7625675201416016, 'losses/total': 0.6594799757003784, 'rewards/chosen': 0.015918483957648277, 'rewards/rejected': -0.02568456158041954, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.04160304740071297, 'logps/rejected': -27.045516967773438, 'logps/chosen': -21.20174789428711, 'ref_logps/rejected': -26.788671493530273, 'ref_logps/chosen': -21.360929489135742, 'epoch': 0.77} 26%|██▌ | 102/396 [49:39<2:21:58, 28.98s/it] 26%|██▌ | 103/396 [50:08<2:21:28, 28.97s/it] {'loss': 0.6789, 'learning_rate': 4.115168539325842e-07, 'losses/dpo': 0.6871756315231323, 'losses/sft': 0.7897288799285889, 'losses/total': 0.6871756315231323, 'rewards/chosen': 0.010980643332004547, 'rewards/rejected': -0.020153000950813293, 'rewards/accuracies': 0.6015625, 'rewards/margins': 0.03113364614546299, 'logps/rejected': -27.158187866210938, 'logps/chosen': -25.287567138671875, 'ref_logps/rejected': -26.95665740966797, 'ref_logps/chosen': -25.39737319946289, 'epoch': 0.78} 26%|██▌ | 103/396 [50:08<2:21:28, 28.97s/it] 26%|██▋ | 104/396 [50:37<2:20:50, 28.94s/it] {'loss': 0.6766, 'learning_rate': 4.1011235955056177e-07, 'losses/dpo': 0.6560062170028687, 'losses/sft': 0.7211654186248779, 'losses/total': 0.6560062170028687, 'rewards/chosen': 0.010623706504702568, 'rewards/rejected': -0.02510090172290802, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.03572461009025574, 'logps/rejected': -27.055557250976562, 'logps/chosen': -20.239051818847656, 'ref_logps/rejected': -26.804546356201172, 'ref_logps/chosen': -20.345287322998047, 'epoch': 0.78} 26%|██▋ | 104/396 [50:37<2:20:50, 28.94s/it] 27%|██▋ | 105/396 [51:06<2:20:26, 28.96s/it] {'loss': 0.6728, 'learning_rate': 4.0870786516853933e-07, 'losses/dpo': 0.6975245475769043, 'losses/sft': 0.8287545442581177, 'losses/total': 0.6975245475769043, 'rewards/chosen': 0.014618270099163055, 'rewards/rejected': -0.02913743630051613, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.043755702674388885, 
'logps/rejected': -28.331439971923828, 'logps/chosen': -22.816429138183594, 'ref_logps/rejected': -28.04006576538086, 'ref_logps/chosen': -22.96261215209961, 'epoch': 0.79} 27%|██▋ | 105/396 [51:06<2:20:26, 28.96s/it] 27%|██▋ | 106/396 [51:35<2:20:00, 28.97s/it] {'loss': 0.6776, 'learning_rate': 4.0730337078651683e-07, 'losses/dpo': 0.6524635553359985, 'losses/sft': 0.8967273235321045, 'losses/total': 0.6524635553359985, 'rewards/chosen': 0.006961943581700325, 'rewards/rejected': -0.026723740622401237, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.03368568420410156, 'logps/rejected': -27.868162155151367, 'logps/chosen': -22.864845275878906, 'ref_logps/rejected': -27.60092544555664, 'ref_logps/chosen': -22.934465408325195, 'epoch': 0.8} 27%|██▋ | 106/396 [51:35<2:20:00, 28.97s/it] 27%|██▋ | 107/396 [52:04<2:19:36, 28.98s/it] {'loss': 0.6785, 'learning_rate': 4.058988764044944e-07, 'losses/dpo': 0.6883168816566467, 'losses/sft': 0.9007142782211304, 'losses/total': 0.6883168816566467, 'rewards/chosen': 0.0025312139187008142, 'rewards/rejected': -0.029197873547673225, 'rewards/accuracies': 0.625, 'rewards/margins': 0.031729087233543396, 'logps/rejected': -29.40836524963379, 'logps/chosen': -26.633420944213867, 'ref_logps/rejected': -29.11638641357422, 'ref_logps/chosen': -26.658733367919922, 'epoch': 0.81} 27%|██▋ | 107/396 [52:04<2:19:36, 28.98s/it] 27%|██▋ | 108/396 [52:33<2:19:33, 29.08s/it] {'loss': 0.6678, 'learning_rate': 4.044943820224719e-07, 'losses/dpo': 0.6620572805404663, 'losses/sft': 0.7277075052261353, 'losses/total': 0.6620572805404663, 'rewards/chosen': 0.02055862732231617, 'rewards/rejected': -0.03368859738111496, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.05424723029136658, 'logps/rejected': -26.78734016418457, 'logps/chosen': -21.93716049194336, 'ref_logps/rejected': -26.450454711914062, 'ref_logps/chosen': -22.14274787902832, 'epoch': 0.82} 27%|██▋ | 108/396 [52:33<2:19:33, 29.08s/it] 28%|██▊ | 109/396 [53:03<2:19:40, 29.20s/it] 
{'loss': 0.6732, 'learning_rate': 4.0308988764044945e-07, 'losses/dpo': 0.6536136865615845, 'losses/sft': 0.793202817440033, 'losses/total': 0.6536136865615845, 'rewards/chosen': 0.014916517771780491, 'rewards/rejected': -0.02867007628083229, 'rewards/accuracies': 0.6171875, 'rewards/margins': 0.043586596846580505, 'logps/rejected': -25.321468353271484, 'logps/chosen': -23.479236602783203, 'ref_logps/rejected': -25.03476905822754, 'ref_logps/chosen': -23.628402709960938, 'epoch': 0.82} 28%|██▊ | 109/396 [53:03<2:19:40, 29.20s/it] 28%|██▊ | 110/396 [53:32<2:19:19, 29.23s/it] {'loss': 0.6677, 'learning_rate': 4.0168539325842696e-07, 'losses/dpo': 0.658541202545166, 'losses/sft': 0.6240718364715576, 'losses/total': 0.658541202545166, 'rewards/chosen': 0.01637459173798561, 'rewards/rejected': -0.03908466175198555, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.05545924976468086, 'logps/rejected': -26.808046340942383, 'logps/chosen': -21.36187744140625, 'ref_logps/rejected': -26.417198181152344, 'ref_logps/chosen': -21.525625228881836, 'epoch': 0.83} 28%|██▊ | 110/396 [53:32<2:19:19, 29.23s/it] 28%|██▊ | 111/396 [54:01<2:19:11, 29.30s/it] {'loss': 0.6732, 'learning_rate': 4.0028089887640446e-07, 'losses/dpo': 0.6707695126533508, 'losses/sft': 0.8353971838951111, 'losses/total': 0.6707695126533508, 'rewards/chosen': 0.014854478649795055, 'rewards/rejected': -0.029122481122612953, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.043976958841085434, 'logps/rejected': -26.035858154296875, 'logps/chosen': -22.143728256225586, 'ref_logps/rejected': -25.744632720947266, 'ref_logps/chosen': -22.292274475097656, 'epoch': 0.84} 28%|██▊ | 111/396 [54:01<2:19:11, 29.30s/it] 28%|██▊ | 112/396 [54:30<2:18:04, 29.17s/it] {'loss': 0.6688, 'learning_rate': 3.9887640449438196e-07, 'losses/dpo': 0.6656994819641113, 'losses/sft': 0.8727293014526367, 'losses/total': 0.6656994819641113, 'rewards/chosen': 0.0060077933594584465, 'rewards/rejected': -0.046595096588134766, 
'rewards/accuracies': 0.7109375, 'rewards/margins': 0.05260289087891579, 'logps/rejected': -24.53826332092285, 'logps/chosen': -22.15041732788086, 'ref_logps/rejected': -24.07231330871582, 'ref_logps/chosen': -22.210494995117188, 'epoch': 0.85} 28%|██▊ | 112/396 [54:30<2:18:04, 29.17s/it] 29%|██▊ | 113/396 [55:00<2:18:07, 29.28s/it] {'loss': 0.675, 'learning_rate': 3.974719101123595e-07, 'losses/dpo': 0.6690158247947693, 'losses/sft': 0.7370929718017578, 'losses/total': 0.6690158247947693, 'rewards/chosen': 0.008148876950144768, 'rewards/rejected': -0.031473446637392044, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.03962232545018196, 'logps/rejected': -27.797752380371094, 'logps/chosen': -23.314592361450195, 'ref_logps/rejected': -27.483016967773438, 'ref_logps/chosen': -23.396080017089844, 'epoch': 0.85} 29%|██▊ | 113/396 [55:00<2:18:07, 29.28s/it] 29%|██▉ | 114/396 [55:29<2:16:59, 29.15s/it] {'loss': 0.6706, 'learning_rate': 3.960674157303371e-07, 'losses/dpo': 0.645140528678894, 'losses/sft': 0.77164226770401, 'losses/total': 0.645140528678894, 'rewards/chosen': 0.009551877155900002, 'rewards/rejected': -0.03965820372104645, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.0492100827395916, 'logps/rejected': -26.652328491210938, 'logps/chosen': -21.854373931884766, 'ref_logps/rejected': -26.255746841430664, 'ref_logps/chosen': -21.949893951416016, 'epoch': 0.86} 29%|██▉ | 114/396 [55:29<2:16:59, 29.15s/it] 29%|██▉ | 115/396 [55:58<2:16:12, 29.08s/it] {'loss': 0.6634, 'learning_rate': 3.946629213483146e-07, 'losses/dpo': 0.6699668169021606, 'losses/sft': 0.8002771139144897, 'losses/total': 0.6699668169021606, 'rewards/chosen': 0.0125090591609478, 'rewards/rejected': -0.050823770463466644, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.06333282589912415, 'logps/rejected': -28.40381622314453, 'logps/chosen': -23.778413772583008, 'ref_logps/rejected': -27.89557647705078, 'ref_logps/chosen': -23.903501510620117, 'epoch': 0.87} 29%|██▉ | 115/396 
[55:58<2:16:12, 29.08s/it] 29%|██▉ | 116/396 [56:27<2:15:51, 29.11s/it] {'loss': 0.6658, 'learning_rate': 3.9325842696629214e-07, 'losses/dpo': 0.6745936870574951, 'losses/sft': 0.8017398715019226, 'losses/total': 0.6745936870574951, 'rewards/chosen': 0.018472209572792053, 'rewards/rejected': -0.04103565216064453, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.059507861733436584, 'logps/rejected': -28.184139251708984, 'logps/chosen': -24.59353256225586, 'ref_logps/rejected': -27.77378273010254, 'ref_logps/chosen': -24.77825164794922, 'epoch': 0.88} 29%|██▉ | 116/396 [56:27<2:15:51, 29.11s/it] 30%|██▉ | 117/396 [56:56<2:14:58, 29.03s/it] {'loss': 0.6641, 'learning_rate': 3.9185393258426964e-07, 'losses/dpo': 0.6748782396316528, 'losses/sft': 0.6509857177734375, 'losses/total': 0.6748782396316528, 'rewards/chosen': 0.014955190010368824, 'rewards/rejected': -0.04809773340821266, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.06305292248725891, 'logps/rejected': -25.704240798950195, 'logps/chosen': -20.781490325927734, 'ref_logps/rejected': -25.223262786865234, 'ref_logps/chosen': -20.93104362487793, 'epoch': 0.88} 30%|██▉ | 117/396 [56:56<2:14:58, 29.03s/it] 30%|██▉ | 118/396 [57:25<2:14:42, 29.08s/it] {'loss': 0.6719, 'learning_rate': 3.904494382022472e-07, 'losses/dpo': 0.6790695190429688, 'losses/sft': 0.7899962663650513, 'losses/total': 0.6790695190429688, 'rewards/chosen': 0.010911967605352402, 'rewards/rejected': -0.03575696796178818, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.04666893184185028, 'logps/rejected': -28.954145431518555, 'logps/chosen': -22.889171600341797, 'ref_logps/rejected': -28.596576690673828, 'ref_logps/chosen': -22.998294830322266, 'epoch': 0.89} 30%|██▉ | 118/396 [57:25<2:14:42, 29.08s/it] 30%|███ | 119/396 [57:54<2:14:34, 29.15s/it] {'loss': 0.6713, 'learning_rate': 3.890449438202247e-07, 'losses/dpo': 0.6665077209472656, 'losses/sft': 0.8753491044044495, 'losses/total': 0.6665077209472656, 'rewards/chosen': 
0.01732712611556053, 'rewards/rejected': -0.032384805381298065, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.0497119314968586, 'logps/rejected': -24.892658233642578, 'logps/chosen': -22.229143142700195, 'ref_logps/rejected': -24.568809509277344, 'ref_logps/chosen': -22.402416229248047, 'epoch': 0.9} 30%|███ | 119/396 [57:54<2:14:34, 29.15s/it] 30%|███ | 120/396 [58:23<2:13:49, 29.09s/it] {'loss': 0.6637, 'learning_rate': 3.876404494382022e-07, 'losses/dpo': 0.6545946002006531, 'losses/sft': 0.8056938052177429, 'losses/total': 0.6545946002006531, 'rewards/chosen': 0.009503833949565887, 'rewards/rejected': -0.05418993532657623, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.06369376927614212, 'logps/rejected': -29.53872299194336, 'logps/chosen': -22.233783721923828, 'ref_logps/rejected': -28.996824264526367, 'ref_logps/chosen': -22.328821182250977, 'epoch': 0.91} 30%|███ | 120/396 [58:23<2:13:49, 29.09s/it] 31%|███ | 121/396 [58:52<2:13:21, 29.10s/it] {'loss': 0.6778, 'learning_rate': 3.8623595505617977e-07, 'losses/dpo': 0.6500009298324585, 'losses/sft': 0.9210071563720703, 'losses/total': 0.6500009298324585, 'rewards/chosen': 0.00556858628988266, 'rewards/rejected': -0.03058495745062828, 'rewards/accuracies': 0.5859375, 'rewards/margins': 0.03615354374051094, 'logps/rejected': -27.632476806640625, 'logps/chosen': -24.073867797851562, 'ref_logps/rejected': -27.32662582397461, 'ref_logps/chosen': -24.12955093383789, 'epoch': 0.91} 31%|███ | 121/396 [58:52<2:13:21, 29.10s/it] 31%|███ | 122/396 [59:21<2:12:44, 29.07s/it] {'loss': 0.6601, 'learning_rate': 3.8483146067415727e-07, 'losses/dpo': 0.6630659103393555, 'losses/sft': 0.8758641481399536, 'losses/total': 0.6630659103393555, 'rewards/chosen': 0.01558714546263218, 'rewards/rejected': -0.056019626557826996, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.07160677015781403, 'logps/rejected': -31.358665466308594, 'logps/chosen': -21.38442039489746, 'ref_logps/rejected': -30.79846954345703, 
'ref_logps/chosen': -21.540292739868164, 'epoch': 0.92} 31%|███ | 122/396 [59:21<2:12:44, 29.07s/it] 31%|███ | 123/396 [59:50<2:12:29, 29.12s/it] {'loss': 0.6622, 'learning_rate': 3.834269662921348e-07, 'losses/dpo': 0.6400080919265747, 'losses/sft': 0.8849148750305176, 'losses/total': 0.6400080919265747, 'rewards/chosen': 0.00868179090321064, 'rewards/rejected': -0.05847976729273796, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.06716156005859375, 'logps/rejected': -25.64166831970215, 'logps/chosen': -21.09262466430664, 'ref_logps/rejected': -25.056869506835938, 'ref_logps/chosen': -21.179445266723633, 'epoch': 0.93} 31%|███ | 123/396 [59:50<2:12:29, 29.12s/it] 31%|███▏ | 124/396 [1:00:20<2:12:09, 29.15s/it] {'loss': 0.6765, 'learning_rate': 3.8202247191011233e-07, 'losses/dpo': 0.6927012205123901, 'losses/sft': 0.8673559427261353, 'losses/total': 0.6927012205123901, 'rewards/chosen': -0.004251426085829735, 'rewards/rejected': -0.04244796186685562, 'rewards/accuracies': 0.59375, 'rewards/margins': 0.038196537643671036, 'logps/rejected': -28.025104522705078, 'logps/chosen': -25.65859603881836, 'ref_logps/rejected': -27.600624084472656, 'ref_logps/chosen': -25.61608123779297, 'epoch': 0.94} 31%|███▏ | 124/396 [1:00:20<2:12:09, 29.15s/it] 32%|███▏ | 125/396 [1:00:49<2:11:26, 29.10s/it] {'loss': 0.6647, 'learning_rate': 3.806179775280899e-07, 'losses/dpo': 0.7150436639785767, 'losses/sft': 0.9468034505844116, 'losses/total': 0.7150436639785767, 'rewards/chosen': 0.004623853601515293, 'rewards/rejected': -0.058716922998428345, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.06334076821804047, 'logps/rejected': -29.840375900268555, 'logps/chosen': -23.93341636657715, 'ref_logps/rejected': -29.25320816040039, 'ref_logps/chosen': -23.979652404785156, 'epoch': 0.94} 32%|███▏ | 125/396 [1:00:49<2:11:26, 29.10s/it] 32%|███▏ | 126/396 [1:01:18<2:11:32, 29.23s/it] {'loss': 0.6559, 'learning_rate': 3.792134831460674e-07, 'losses/dpo': 0.6770719289779663, 
'losses/sft': 0.9255229234695435, 'losses/total': 0.6770719289779663, 'rewards/chosen': 0.019628863781690598, 'rewards/rejected': -0.06242916360497475, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.08205802738666534, 'logps/rejected': -28.292198181152344, 'logps/chosen': -25.031259536743164, 'ref_logps/rejected': -27.667905807495117, 'ref_logps/chosen': -25.22754669189453, 'epoch': 0.95} 32%|███▏ | 126/396 [1:01:18<2:11:32, 29.23s/it] 32%|███▏ | 127/396 [1:01:48<2:11:16, 29.28s/it] {'loss': 0.6765, 'learning_rate': 3.7780898876404495e-07, 'losses/dpo': 0.635480523109436, 'losses/sft': 0.7413178086280823, 'losses/total': 0.635480523109436, 'rewards/chosen': -0.004689330700784922, 'rewards/rejected': -0.04539068788290024, 'rewards/accuracies': 0.5390625, 'rewards/margins': 0.04070135951042175, 'logps/rejected': -26.84676742553711, 'logps/chosen': -21.68558692932129, 'ref_logps/rejected': -26.392860412597656, 'ref_logps/chosen': -21.638694763183594, 'epoch': 0.96} 32%|███▏ | 127/396 [1:01:48<2:11:16, 29.28s/it] 32%|███▏ | 128/396 [1:02:17<2:10:17, 29.17s/it] {'loss': 0.6587, 'learning_rate': 3.7640449438202245e-07, 'losses/dpo': 0.6835530400276184, 'losses/sft': 0.9732310771942139, 'losses/total': 0.6835530400276184, 'rewards/chosen': 0.010786494240164757, 'rewards/rejected': -0.06560088694095612, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.07638738304376602, 'logps/rejected': -26.53976821899414, 'logps/chosen': -22.910152435302734, 'ref_logps/rejected': -25.88375473022461, 'ref_logps/chosen': -23.018016815185547, 'epoch': 0.97} 32%|███▏ | 128/396 [1:02:17<2:10:17, 29.17s/it] 33%|███▎ | 129/396 [1:02:46<2:10:08, 29.25s/it] {'loss': 0.6617, 'learning_rate': 3.75e-07, 'losses/dpo': 0.6463422775268555, 'losses/sft': 0.7454620599746704, 'losses/total': 0.6463422775268555, 'rewards/chosen': 0.012755412608385086, 'rewards/rejected': -0.057515427470207214, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.0702708438038826, 'logps/rejected': 
-26.875211715698242, 'logps/chosen': -23.20888900756836, 'ref_logps/rejected': -26.300058364868164, 'ref_logps/chosen': -23.336442947387695, 'epoch': 0.97} 33%|███▎ | 129/396 [1:02:46<2:10:08, 29.25s/it] 33%|███▎ | 130/396 [1:03:15<2:09:21, 29.18s/it] {'loss': 0.6784, 'learning_rate': 3.735955056179775e-07, 'losses/dpo': 0.6625787019729614, 'losses/sft': 0.7854889631271362, 'losses/total': 0.6625787019729614, 'rewards/chosen': -0.01194157637655735, 'rewards/rejected': -0.047412216663360596, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.03547064587473869, 'logps/rejected': -29.472164154052734, 'logps/chosen': -22.396747589111328, 'ref_logps/rejected': -28.998043060302734, 'ref_logps/chosen': -22.277332305908203, 'epoch': 0.98} 33%|███▎ | 130/396 [1:03:15<2:09:21, 29.18s/it] 33%|███▎ | 131/396 [1:03:44<2:09:15, 29.27s/it] {'loss': 0.6612, 'learning_rate': 3.72191011235955e-07, 'losses/dpo': 0.6598723530769348, 'losses/sft': 0.8644169569015503, 'losses/total': 0.6598723530769348, 'rewards/chosen': 0.0076522137969732285, 'rewards/rejected': -0.06372006982564926, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.07137227803468704, 'logps/rejected': -24.600296020507812, 'logps/chosen': -18.81739044189453, 'ref_logps/rejected': -23.96309471130371, 'ref_logps/chosen': -18.89391326904297, 'epoch': 0.99} 33%|███▎ | 131/396 [1:03:44<2:09:15, 29.27s/it] 33%|███▎ | 132/396 [1:04:14<2:08:46, 29.27s/it] {'loss': 0.6576, 'learning_rate': 3.707865168539326e-07, 'losses/dpo': 0.6264052391052246, 'losses/sft': 0.7484258413314819, 'losses/total': 0.6264052391052246, 'rewards/chosen': -0.00039180926978588104, 'rewards/rejected': -0.0802639052271843, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.07987209409475327, 'logps/rejected': -29.2607364654541, 'logps/chosen': -25.24700927734375, 'ref_logps/rejected': -28.458097457885742, 'ref_logps/chosen': -25.243091583251953, 'epoch': 1.0} 33%|███▎ | 132/396 [1:04:14<2:08:46, 29.27s/it] 34%|███▎ | 133/396 [1:04:43<2:08:01, 
29.21s/it] {'loss': 0.6596, 'learning_rate': 3.693820224719101e-07, 'losses/dpo': 0.6850643157958984, 'losses/sft': 0.7063156366348267, 'losses/total': 0.6850643157958984, 'rewards/chosen': -0.008415229618549347, 'rewards/rejected': -0.08627899736166, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.07786377519369125, 'logps/rejected': -29.071331024169922, 'logps/chosen': -24.664264678955078, 'ref_logps/rejected': -28.208541870117188, 'ref_logps/chosen': -24.58011245727539, 'epoch': 1.0} 34%|███▎ | 133/396 [1:04:43<2:08:01, 29.21s/it] 34%|███▍ | 134/396 [1:05:12<2:07:48, 29.27s/it] {'loss': 0.6529, 'learning_rate': 3.6797752808988764e-07, 'losses/dpo': 0.6567816734313965, 'losses/sft': 0.8528650403022766, 'losses/total': 0.6567816734313965, 'rewards/chosen': 0.00864771381020546, 'rewards/rejected': -0.08150242269039154, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.0901501327753067, 'logps/rejected': -25.79207992553711, 'logps/chosen': -21.803192138671875, 'ref_logps/rejected': -24.97705841064453, 'ref_logps/chosen': -21.88966941833496, 'epoch': 1.01} 34%|███▍ | 134/396 [1:05:12<2:07:48, 29.27s/it] 34%|███▍ | 135/396 [1:05:41<2:06:48, 29.15s/it] {'loss': 0.6442, 'learning_rate': 3.6657303370786514e-07, 'losses/dpo': 0.6402660608291626, 'losses/sft': 0.7653439044952393, 'losses/total': 0.6402660608291626, 'rewards/chosen': 0.012921325862407684, 'rewards/rejected': -0.0943225771188736, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.10724389553070068, 'logps/rejected': -27.048810958862305, 'logps/chosen': -20.78626823425293, 'ref_logps/rejected': -26.105587005615234, 'ref_logps/chosen': -20.915481567382812, 'epoch': 1.02} 34%|███▍ | 135/396 [1:05:41<2:06:48, 29.15s/it] 34%|███▍ | 136/396 [1:06:10<2:06:25, 29.18s/it] {'loss': 0.6563, 'learning_rate': 3.651685393258427e-07, 'losses/dpo': 0.6588989496231079, 'losses/sft': 0.8334387540817261, 'losses/total': 0.6588989496231079, 'rewards/chosen': 0.0020102611742913723, 'rewards/rejected': 
-0.08420848101377487, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.08621874451637268, 'logps/rejected': -26.884532928466797, 'logps/chosen': -23.661598205566406, 'ref_logps/rejected': -26.042449951171875, 'ref_logps/chosen': -23.68170166015625, 'epoch': 1.03} 34%|███▍ | 136/396 [1:06:10<2:06:25, 29.18s/it] 35%|███▍ | 137/396 [1:06:39<2:05:46, 29.14s/it] {'loss': 0.6414, 'learning_rate': 3.637640449438202e-07, 'losses/dpo': 0.610801100730896, 'losses/sft': 0.6104759573936462, 'losses/total': 0.610801100730896, 'rewards/chosen': 0.005712391808629036, 'rewards/rejected': -0.10849656164646149, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.11420895159244537, 'logps/rejected': -26.843595504760742, 'logps/chosen': -21.846914291381836, 'ref_logps/rejected': -25.758628845214844, 'ref_logps/chosen': -21.904037475585938, 'epoch': 1.03} 35%|███▍ | 137/396 [1:06:39<2:05:46, 29.14s/it] 35%|███▍ | 138/396 [1:07:08<2:05:04, 29.09s/it] {'loss': 0.6507, 'learning_rate': 3.6235955056179776e-07, 'losses/dpo': 0.6711180806159973, 'losses/sft': 0.8334028720855713, 'losses/total': 0.6711180806159973, 'rewards/chosen': 0.009335671551525593, 'rewards/rejected': -0.08555103838443756, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.09488671272993088, 'logps/rejected': -26.24932861328125, 'logps/chosen': -23.79953384399414, 'ref_logps/rejected': -25.393817901611328, 'ref_logps/chosen': -23.89289093017578, 'epoch': 1.04} 35%|███▍ | 138/396 [1:07:08<2:05:04, 29.09s/it] 35%|███▌ | 139/396 [1:07:38<2:05:17, 29.25s/it] {'loss': 0.6393, 'learning_rate': 3.6095505617977526e-07, 'losses/dpo': 0.6086191534996033, 'losses/sft': 0.7045127749443054, 'losses/total': 0.6086191534996033, 'rewards/chosen': 0.0177919864654541, 'rewards/rejected': -0.10370529443025589, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.12149728834629059, 'logps/rejected': -28.091732025146484, 'logps/chosen': -20.413612365722656, 'ref_logps/rejected': -27.054677963256836, 'ref_logps/chosen': 
-20.591529846191406, 'epoch': 1.05} 35%|███▌ | 139/396 [1:07:38<2:05:17, 29.25s/it] 35%|███▌ | 140/396 [1:08:07<2:04:20, 29.14s/it] {'loss': 0.6574, 'learning_rate': 3.5955056179775277e-07, 'losses/dpo': 0.6771029233932495, 'losses/sft': 0.8275946378707886, 'losses/total': 0.6771029233932495, 'rewards/chosen': -0.012759597972035408, 'rewards/rejected': -0.09519506990909576, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.0824354737997055, 'logps/rejected': -25.42624282836914, 'logps/chosen': -23.96946907043457, 'ref_logps/rejected': -24.474294662475586, 'ref_logps/chosen': -23.84187126159668, 'epoch': 1.06} 35%|███▌ | 140/396 [1:08:07<2:04:20, 29.14s/it] 36%|███▌ | 141/396 [1:08:36<2:03:57, 29.17s/it] {'loss': 0.6403, 'learning_rate': 3.581460674157303e-07, 'losses/dpo': 0.60587477684021, 'losses/sft': 0.7718257904052734, 'losses/total': 0.60587477684021, 'rewards/chosen': 0.013069930486381054, 'rewards/rejected': -0.10329011082649231, 'rewards/accuracies': 0.75, 'rewards/margins': 0.11636004596948624, 'logps/rejected': -26.33192253112793, 'logps/chosen': -20.24493980407715, 'ref_logps/rejected': -25.299020767211914, 'ref_logps/chosen': -20.375638961791992, 'epoch': 1.06} 36%|███▌ | 141/396 [1:08:36<2:03:57, 29.17s/it] 36%|███▌ | 142/396 [1:09:05<2:03:09, 29.09s/it] {'loss': 0.6384, 'learning_rate': 3.5674157303370783e-07, 'losses/dpo': 0.6827423572540283, 'losses/sft': 0.8567611575126648, 'losses/total': 0.6827423572540283, 'rewards/chosen': 0.01699351891875267, 'rewards/rejected': -0.10577632486820221, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.12276984751224518, 'logps/rejected': -28.200380325317383, 'logps/chosen': -22.9414119720459, 'ref_logps/rejected': -27.142616271972656, 'ref_logps/chosen': -23.111347198486328, 'epoch': 1.07} 36%|███▌ | 142/396 [1:09:05<2:03:09, 29.09s/it] 36%|███▌ | 143/396 [1:09:34<2:02:56, 29.16s/it] {'loss': 0.6624, 'learning_rate': 3.553370786516854e-07, 'losses/dpo': 0.6864386796951294, 'losses/sft': 
0.8041479587554932, 'losses/total': 0.6864386796951294, 'rewards/chosen': -0.029866419732570648, 'rewards/rejected': -0.10425157845020294, 'rewards/accuracies': 0.625, 'rewards/margins': 0.07438516616821289, 'logps/rejected': -27.77198028564453, 'logps/chosen': -23.226070404052734, 'ref_logps/rejected': -26.72946548461914, 'ref_logps/chosen': -22.92740249633789, 'epoch': 1.08} 36%|███▌ | 143/396 [1:09:34<2:02:56, 29.16s/it] 36%|███▋ | 144/396 [1:10:03<2:02:13, 29.10s/it] {'loss': 0.6455, 'learning_rate': 3.539325842696629e-07, 'losses/dpo': 0.6347097158432007, 'losses/sft': 0.6569658517837524, 'losses/total': 0.6347097158432007, 'rewards/chosen': 0.011630430817604065, 'rewards/rejected': -0.09969674795866013, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.11132718622684479, 'logps/rejected': -28.53704833984375, 'logps/chosen': -21.75617027282715, 'ref_logps/rejected': -27.540082931518555, 'ref_logps/chosen': -21.872474670410156, 'epoch': 1.09} 36%|███▋ | 144/396 [1:10:03<2:02:13, 29.10s/it] 37%|███▋ | 145/396 [1:10:32<2:01:51, 29.13s/it] {'loss': 0.6407, 'learning_rate': 3.5252808988764045e-07, 'losses/dpo': 0.6530706286430359, 'losses/sft': 0.8703383207321167, 'losses/total': 0.6530706286430359, 'rewards/chosen': -0.005126964300870895, 'rewards/rejected': -0.12409258633852005, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.11896562576293945, 'logps/rejected': -29.736862182617188, 'logps/chosen': -24.18975830078125, 'ref_logps/rejected': -28.495933532714844, 'ref_logps/chosen': -24.138484954833984, 'epoch': 1.09} 37%|███▋ | 145/396 [1:10:32<2:01:51, 29.13s/it] 37%|███▋ | 146/396 [1:11:01<2:01:11, 29.09s/it] {'loss': 0.647, 'learning_rate': 3.51123595505618e-07, 'losses/dpo': 0.6477080583572388, 'losses/sft': 0.8653473854064941, 'losses/total': 0.6477080583572388, 'rewards/chosen': -0.008922239765524864, 'rewards/rejected': -0.11375758051872253, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.10483534634113312, 'logps/rejected': 
-29.576303482055664, 'logps/chosen': -24.84428596496582, 'ref_logps/rejected': -28.43872833251953, 'ref_logps/chosen': -24.755064010620117, 'epoch': 1.1} 37%|███▋ | 146/396 [1:11:01<2:01:11, 29.09s/it] 37%|███▋ | 147/396 [1:11:30<2:00:25, 29.02s/it] {'loss': 0.6095, 'learning_rate': 3.497191011235955e-07, 'losses/dpo': 0.6273882389068604, 'losses/sft': 0.8987213373184204, 'losses/total': 0.6273882389068604, 'rewards/chosen': 0.019050076603889465, 'rewards/rejected': -0.16993993520736694, 'rewards/accuracies': 0.8125, 'rewards/margins': 0.1889900416135788, 'logps/rejected': -27.753063201904297, 'logps/chosen': -24.983165740966797, 'ref_logps/rejected': -26.05366325378418, 'ref_logps/chosen': -25.17366600036621, 'epoch': 1.11} 37%|███▋ | 147/396 [1:11:30<2:00:25, 29.02s/it] 37%|███▋ | 148/396 [1:11:59<2:00:07, 29.06s/it] {'loss': 0.6583, 'learning_rate': 3.48314606741573e-07, 'losses/dpo': 0.6790063381195068, 'losses/sft': 0.7648496627807617, 'losses/total': 0.6790063381195068, 'rewards/chosen': -0.02102772891521454, 'rewards/rejected': -0.1063925176858902, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.08536479622125626, 'logps/rejected': -27.743179321289062, 'logps/chosen': -22.61692237854004, 'ref_logps/rejected': -26.67925262451172, 'ref_logps/chosen': -22.40664291381836, 'epoch': 1.12} 37%|███▋ | 148/396 [1:11:59<2:00:07, 29.06s/it] 38%|███▊ | 149/396 [1:12:28<1:59:27, 29.02s/it] {'loss': 0.6261, 'learning_rate': 3.469101123595505e-07, 'losses/dpo': 0.6479306221008301, 'losses/sft': 0.8049210906028748, 'losses/total': 0.6479306221008301, 'rewards/chosen': 0.016479745507240295, 'rewards/rejected': -0.13337840139865875, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.14985813200473785, 'logps/rejected': -29.590002059936523, 'logps/chosen': -22.846782684326172, 'ref_logps/rejected': -28.2562198638916, 'ref_logps/chosen': -23.011579513549805, 'epoch': 1.12} 38%|███▊ | 149/396 [1:12:28<1:59:27, 29.02s/it] 38%|███▊ | 150/396 [1:12:57<1:58:53, 29.00s/it] 
{'loss': 0.6277, 'learning_rate': 3.4550561797752807e-07, 'losses/dpo': 0.6358213424682617, 'losses/sft': 0.8344307541847229, 'losses/total': 0.6358213424682617, 'rewards/chosen': -0.00012012943625450134, 'rewards/rejected': -0.14871028065681458, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.14859014749526978, 'logps/rejected': -27.46141815185547, 'logps/chosen': -21.699583053588867, 'ref_logps/rejected': -25.974313735961914, 'ref_logps/chosen': -21.698383331298828, 'epoch': 1.13} 38%|███▊ | 150/396 [1:12:57<1:58:53, 29.00s/it] 38%|███▊ | 151/396 [1:13:26<1:58:18, 28.97s/it] {'loss': 0.654, 'learning_rate': 3.441011235955056e-07, 'losses/dpo': 0.6406779289245605, 'losses/sft': 0.8018806576728821, 'losses/total': 0.6406779289245605, 'rewards/chosen': -0.01747327297925949, 'rewards/rejected': -0.10612466931343079, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.08865140378475189, 'logps/rejected': -25.436817169189453, 'logps/chosen': -20.88718032836914, 'ref_logps/rejected': -24.37557029724121, 'ref_logps/chosen': -20.712448120117188, 'epoch': 1.14} 38%|███▊ | 151/396 [1:13:26<1:58:18, 28.97s/it] 38%|███▊ | 152/396 [1:13:55<1:57:42, 28.95s/it] {'loss': 0.6355, 'learning_rate': 3.4269662921348313e-07, 'losses/dpo': 0.5932921171188354, 'losses/sft': 0.6528638005256653, 'losses/total': 0.5932921171188354, 'rewards/chosen': -0.010168392211198807, 'rewards/rejected': -0.1429746299982071, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.13280624151229858, 'logps/rejected': -30.142927169799805, 'logps/chosen': -22.312236785888672, 'ref_logps/rejected': -28.713180541992188, 'ref_logps/chosen': -22.210554122924805, 'epoch': 1.15} 38%|███▊ | 152/396 [1:13:55<1:57:42, 28.95s/it] 39%|███▊ | 153/396 [1:14:24<1:57:05, 28.91s/it] {'loss': 0.6359, 'learning_rate': 3.4129213483146064e-07, 'losses/dpo': 0.6205468773841858, 'losses/sft': 0.8744308352470398, 'losses/total': 0.6205468773841858, 'rewards/chosen': -0.013774631544947624, 'rewards/rejected': 
-0.14487791061401367, 'rewards/accuracies': 0.7890625, 'rewards/margins': 0.1311032772064209, 'logps/rejected': -29.10406494140625, 'logps/chosen': -26.28810691833496, 'ref_logps/rejected': -27.655288696289062, 'ref_logps/chosen': -26.150360107421875, 'epoch': 1.15} 39%|███▊ | 153/396 [1:14:24<1:57:05, 28.91s/it] 39%|███▉ | 154/396 [1:14:53<1:56:50, 28.97s/it] {'loss': 0.6679, 'learning_rate': 3.398876404494382e-07, 'losses/dpo': 0.6655905246734619, 'losses/sft': 0.8864909410476685, 'losses/total': 0.6655905246734619, 'rewards/chosen': -0.04157543182373047, 'rewards/rejected': -0.10723182559013367, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.0656563863158226, 'logps/rejected': -26.302614212036133, 'logps/chosen': -22.283679962158203, 'ref_logps/rejected': -25.230295181274414, 'ref_logps/chosen': -21.867923736572266, 'epoch': 1.16} 39%|███▉ | 154/396 [1:14:53<1:56:50, 28.97s/it] 39%|███▉ | 155/396 [1:15:22<1:56:34, 29.02s/it] {'loss': 0.6559, 'learning_rate': 3.3848314606741575e-07, 'losses/dpo': 0.6645406484603882, 'losses/sft': 0.794353723526001, 'losses/total': 0.6645406484603882, 'rewards/chosen': -0.024232013151049614, 'rewards/rejected': -0.11909983307123184, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.09486782550811768, 'logps/rejected': -28.45652961730957, 'logps/chosen': -22.68756103515625, 'ref_logps/rejected': -27.26552963256836, 'ref_logps/chosen': -22.445241928100586, 'epoch': 1.17} 39%|███▉ | 155/396 [1:15:22<1:56:34, 29.02s/it] 39%|███▉ | 156/396 [1:15:51<1:55:57, 28.99s/it] {'loss': 0.6194, 'learning_rate': 3.3707865168539325e-07, 'losses/dpo': 0.5837043523788452, 'losses/sft': 0.9716494083404541, 'losses/total': 0.5837043523788452, 'rewards/chosen': 0.001832372508943081, 'rewards/rejected': -0.16736885905265808, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.16920123994350433, 'logps/rejected': -27.09580421447754, 'logps/chosen': -22.336397171020508, 'ref_logps/rejected': -25.422115325927734, 'ref_logps/chosen': 
-22.35472297668457, 'epoch': 1.18} 39%|███▉ | 156/396 [1:15:51<1:55:57, 28.99s/it] 40%|███▉ | 157/396 [1:16:20<1:55:24, 28.97s/it] {'loss': 0.6425, 'learning_rate': 3.356741573033708e-07, 'losses/dpo': 0.6696836948394775, 'losses/sft': 0.773880660533905, 'losses/total': 0.6696836948394775, 'rewards/chosen': -0.032960131764411926, 'rewards/rejected': -0.15058889985084534, 'rewards/accuracies': 0.75, 'rewards/margins': 0.11762877553701401, 'logps/rejected': -28.435253143310547, 'logps/chosen': -22.49996566772461, 'ref_logps/rejected': -26.929363250732422, 'ref_logps/chosen': -22.170368194580078, 'epoch': 1.18} 40%|███▉ | 157/396 [1:16:20<1:55:24, 28.97s/it] 40%|███▉ | 158/396 [1:16:49<1:54:57, 28.98s/it] {'loss': 0.6295, 'learning_rate': 3.3426966292134826e-07, 'losses/dpo': 0.6400988101959229, 'losses/sft': 0.724359929561615, 'losses/total': 0.6400988101959229, 'rewards/chosen': -0.029904408380389214, 'rewards/rejected': -0.17668592929840088, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.1467815339565277, 'logps/rejected': -30.868057250976562, 'logps/chosen': -22.498619079589844, 'ref_logps/rejected': -29.1011962890625, 'ref_logps/chosen': -22.199575424194336, 'epoch': 1.19} 40%|███▉ | 158/396 [1:16:49<1:54:57, 28.98s/it] 40%|████ | 159/396 [1:17:18<1:54:23, 28.96s/it] {'loss': 0.6331, 'learning_rate': 3.328651685393258e-07, 'losses/dpo': 0.6349748373031616, 'losses/sft': 0.7728020548820496, 'losses/total': 0.6349748373031616, 'rewards/chosen': -0.027167750522494316, 'rewards/rejected': -0.16941101849079132, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.14224328100681305, 'logps/rejected': -29.327089309692383, 'logps/chosen': -24.872241973876953, 'ref_logps/rejected': -27.632978439331055, 'ref_logps/chosen': -24.600563049316406, 'epoch': 1.2} 40%|████ | 159/396 [1:17:18<1:54:23, 28.96s/it] 40%|████ | 160/396 [1:17:47<1:54:18, 29.06s/it] {'loss': 0.6269, 'learning_rate': 3.314606741573033e-07, 'losses/dpo': 0.6175022721290588, 'losses/sft': 
0.8887324929237366, 'losses/total': 0.6175022721290588, 'rewards/chosen': -0.013628311455249786, 'rewards/rejected': -0.1763123720884323, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.16268408298492432, 'logps/rejected': -28.384960174560547, 'logps/chosen': -25.719676971435547, 'ref_logps/rejected': -26.621837615966797, 'ref_logps/chosen': -25.583393096923828, 'epoch': 1.21} 40%|████ | 160/396 [1:17:47<1:54:18, 29.06s/it] 41%|████ | 161/396 [1:18:16<1:53:33, 28.99s/it] {'loss': 0.6418, 'learning_rate': 3.300561797752809e-07, 'losses/dpo': 0.604182243347168, 'losses/sft': 0.63340824842453, 'losses/total': 0.604182243347168, 'rewards/chosen': -0.027542442083358765, 'rewards/rejected': -0.14988556504249573, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.12234312295913696, 'logps/rejected': -26.39871597290039, 'logps/chosen': -20.547767639160156, 'ref_logps/rejected': -24.899860382080078, 'ref_logps/chosen': -20.272342681884766, 'epoch': 1.22} 41%|████ | 161/396 [1:18:16<1:53:33, 28.99s/it] 41%|████ | 162/396 [1:18:45<1:52:52, 28.94s/it] {'loss': 0.6111, 'learning_rate': 3.2865168539325844e-07, 'losses/dpo': 0.5942946672439575, 'losses/sft': 0.9472201466560364, 'losses/total': 0.5942946672439575, 'rewards/chosen': -0.029076654464006424, 'rewards/rejected': -0.22233551740646362, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.1932588517665863, 'logps/rejected': -27.69287872314453, 'logps/chosen': -22.42629623413086, 'ref_logps/rejected': -25.469520568847656, 'ref_logps/chosen': -22.135528564453125, 'epoch': 1.22} 41%|████ | 162/396 [1:18:45<1:52:52, 28.94s/it] 41%|████ | 163/396 [1:19:14<1:52:26, 28.95s/it] {'loss': 0.6467, 'learning_rate': 3.2724719101123594e-07, 'losses/dpo': 0.6821013689041138, 'losses/sft': 0.9050745368003845, 'losses/total': 0.6821013689041138, 'rewards/chosen': -0.04406279698014259, 'rewards/rejected': -0.1547047346830368, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.11064193397760391, 'logps/rejected': 
-28.64287567138672, 'logps/chosen': -23.306896209716797, 'ref_logps/rejected': -27.0958251953125, 'ref_logps/chosen': -22.86626625061035, 'epoch': 1.23} 41%|████ | 163/396 [1:19:14<1:52:26, 28.95s/it] 41%|████▏ | 164/396 [1:19:43<1:51:44, 28.90s/it] {'loss': 0.6214, 'learning_rate': 3.258426966292135e-07, 'losses/dpo': 0.6081950664520264, 'losses/sft': 0.827450692653656, 'losses/total': 0.6081950664520264, 'rewards/chosen': -0.01924710161983967, 'rewards/rejected': -0.19279725849628448, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.17355017364025116, 'logps/rejected': -26.020713806152344, 'logps/chosen': -24.126543045043945, 'ref_logps/rejected': -24.092741012573242, 'ref_logps/chosen': -23.934072494506836, 'epoch': 1.24} 41%|████▏ | 164/396 [1:19:43<1:51:44, 28.90s/it] 42%|████▏ | 165/396 [1:20:12<1:51:15, 28.90s/it] {'loss': 0.6401, 'learning_rate': 3.24438202247191e-07, 'losses/dpo': 0.6096771955490112, 'losses/sft': 0.7951339483261108, 'losses/total': 0.6096771955490112, 'rewards/chosen': -0.05192602425813675, 'rewards/rejected': -0.1801932007074356, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.12826718389987946, 'logps/rejected': -29.666513442993164, 'logps/chosen': -23.07083511352539, 'ref_logps/rejected': -27.86458396911621, 'ref_logps/chosen': -22.55157470703125, 'epoch': 1.25} 42%|████▏ | 165/396 [1:20:12<1:51:15, 28.90s/it] 42%|████▏ | 166/396 [1:20:41<1:51:26, 29.07s/it] {'loss': 0.6543, 'learning_rate': 3.2303370786516856e-07, 'losses/dpo': 0.5806229710578918, 'losses/sft': 0.9021787047386169, 'losses/total': 0.5806229710578918, 'rewards/chosen': -0.0881301686167717, 'rewards/rejected': -0.18992936611175537, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.10179921984672546, 'logps/rejected': -26.387611389160156, 'logps/chosen': -23.97926139831543, 'ref_logps/rejected': -24.48831558227539, 'ref_logps/chosen': -23.097957611083984, 'epoch': 1.25} 42%|████▏ | 166/396 [1:20:41<1:51:26, 29.07s/it] 42%|████▏ | 167/396 [1:21:10<1:50:48, 
29.03s/it] {'loss': 0.6439, 'learning_rate': 3.21629213483146e-07, 'losses/dpo': 0.5786381959915161, 'losses/sft': 0.9020153284072876, 'losses/total': 0.5786381959915161, 'rewards/chosen': -0.04694243520498276, 'rewards/rejected': -0.16664597392082214, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.11970352381467819, 'logps/rejected': -28.20893669128418, 'logps/chosen': -24.496349334716797, 'ref_logps/rejected': -26.54248046875, 'ref_logps/chosen': -24.026926040649414, 'epoch': 1.26} 42%|████▏ | 167/396 [1:21:10<1:50:48, 29.03s/it] 42%|████▏ | 168/396 [1:21:39<1:50:10, 28.99s/it] {'loss': 0.6389, 'learning_rate': 3.2022471910112357e-07, 'losses/dpo': 0.6521559953689575, 'losses/sft': 0.9907703399658203, 'losses/total': 0.6521559953689575, 'rewards/chosen': -0.042305897921323776, 'rewards/rejected': -0.18308192491531372, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.14077602326869965, 'logps/rejected': -29.19955825805664, 'logps/chosen': -23.227306365966797, 'ref_logps/rejected': -27.36874008178711, 'ref_logps/chosen': -22.804248809814453, 'epoch': 1.27} 42%|████▏ | 168/396 [1:21:39<1:50:10, 28.99s/it] 43%|████▎ | 169/396 [1:22:08<1:49:33, 28.96s/it] {'loss': 0.6512, 'learning_rate': 3.1882022471910107e-07, 'losses/dpo': 0.6903020143508911, 'losses/sft': 0.8463045358657837, 'losses/total': 0.6903020143508911, 'rewards/chosen': -0.05812288075685501, 'rewards/rejected': -0.1624196618795395, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.10429678112268448, 'logps/rejected': -27.533721923828125, 'logps/chosen': -22.211841583251953, 'ref_logps/rejected': -25.909526824951172, 'ref_logps/chosen': -21.630611419677734, 'epoch': 1.28} 43%|████▎ | 169/396 [1:22:08<1:49:33, 28.96s/it] 43%|████▎ | 170/396 [1:22:37<1:49:07, 28.97s/it] {'loss': 0.6155, 'learning_rate': 3.1741573033707863e-07, 'losses/dpo': 0.6296464204788208, 'losses/sft': 0.6626120805740356, 'losses/total': 0.6296464204788208, 'rewards/chosen': -0.020640213042497635, 'rewards/rejected': 
-0.19855400919914246, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.17791378498077393, 'logps/rejected': -28.400074005126953, 'logps/chosen': -22.332489013671875, 'ref_logps/rejected': -26.414535522460938, 'ref_logps/chosen': -22.126087188720703, 'epoch': 1.28} 43%|████▎ | 170/396 [1:22:37<1:49:07, 28.97s/it] 43%|████▎ | 171/396 [1:23:06<1:48:34, 28.96s/it] {'loss': 0.5971, 'learning_rate': 3.160112359550562e-07, 'losses/dpo': 0.6422166228294373, 'losses/sft': 0.7472187876701355, 'losses/total': 0.6422166228294373, 'rewards/chosen': -0.008293594233691692, 'rewards/rejected': -0.24323543906211853, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.23494186997413635, 'logps/rejected': -30.088207244873047, 'logps/chosen': -23.771900177001953, 'ref_logps/rejected': -27.655853271484375, 'ref_logps/chosen': -23.688966751098633, 'epoch': 1.29} 43%|████▎ | 171/396 [1:23:06<1:48:34, 28.96s/it] 43%|████▎ | 172/396 [1:23:35<1:48:49, 29.15s/it] {'loss': 0.6459, 'learning_rate': 3.146067415730337e-07, 'losses/dpo': 0.6455183029174805, 'losses/sft': 0.8395851850509644, 'losses/total': 0.6455183029174805, 'rewards/chosen': -0.0709431990981102, 'rewards/rejected': -0.19281914830207825, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.12187594175338745, 'logps/rejected': -27.53687286376953, 'logps/chosen': -23.348037719726562, 'ref_logps/rejected': -25.60868263244629, 'ref_logps/chosen': -22.63860511779785, 'epoch': 1.3} 43%|████▎ | 172/396 [1:23:35<1:48:49, 29.15s/it] 44%|████▎ | 173/396 [1:24:04<1:48:04, 29.08s/it] {'loss': 0.627, 'learning_rate': 3.1320224719101125e-07, 'losses/dpo': 0.6627662181854248, 'losses/sft': 0.9079832434654236, 'losses/total': 0.6627662181854248, 'rewards/chosen': -0.07765418291091919, 'rewards/rejected': -0.24502840638160706, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.16737422347068787, 'logps/rejected': -30.49142074584961, 'logps/chosen': -24.17770767211914, 'ref_logps/rejected': -28.0411376953125, 'ref_logps/chosen': 
-23.401166915893555, 'epoch': 1.31} 44%|████▎ | 173/396 [1:24:04<1:48:04, 29.08s/it] 44%|████▍ | 174/396 [1:24:33<1:47:23, 29.02s/it] {'loss': 0.6251, 'learning_rate': 3.1179775280898875e-07, 'losses/dpo': 0.6143248081207275, 'losses/sft': 0.6558141112327576, 'losses/total': 0.6143248081207275, 'rewards/chosen': -0.047265198081731796, 'rewards/rejected': -0.21331676840782166, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.16605158150196075, 'logps/rejected': -27.670101165771484, 'logps/chosen': -24.392324447631836, 'ref_logps/rejected': -25.53693389892578, 'ref_logps/chosen': -23.9196720123291, 'epoch': 1.31} 44%|████▍ | 174/396 [1:24:33<1:47:23, 29.02s/it] 44%|████▍ | 175/396 [1:25:02<1:47:00, 29.05s/it] {'loss': 0.6157, 'learning_rate': 3.103932584269663e-07, 'losses/dpo': 0.5933184623718262, 'losses/sft': 0.9941530227661133, 'losses/total': 0.5933184623718262, 'rewards/chosen': -0.05922209471464157, 'rewards/rejected': -0.24407947063446045, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.1848573535680771, 'logps/rejected': -33.37188720703125, 'logps/chosen': -24.742660522460938, 'ref_logps/rejected': -30.931093215942383, 'ref_logps/chosen': -24.150442123413086, 'epoch': 1.32} 44%|████▍ | 175/396 [1:25:02<1:47:00, 29.05s/it] 44%|████▍ | 176/396 [1:25:31<1:46:27, 29.03s/it] {'loss': 0.6428, 'learning_rate': 3.0898876404494376e-07, 'losses/dpo': 0.6548395156860352, 'losses/sft': 0.9564076066017151, 'losses/total': 0.6548395156860352, 'rewards/chosen': -0.09491994976997375, 'rewards/rejected': -0.21877314150333405, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.12385320663452148, 'logps/rejected': -28.516223907470703, 'logps/chosen': -23.68863296508789, 'ref_logps/rejected': -26.3284912109375, 'ref_logps/chosen': -22.73943519592285, 'epoch': 1.33} 44%|████▍ | 176/396 [1:25:31<1:46:27, 29.03s/it] 45%|████▍ | 177/396 [1:26:00<1:45:50, 29.00s/it] {'loss': 0.6179, 'learning_rate': 3.075842696629213e-07, 'losses/dpo': 0.5700336694717407, 'losses/sft': 
0.8869008421897888, 'losses/total': 0.5700336694717407, 'rewards/chosen': -0.07905411720275879, 'rewards/rejected': -0.2682107090950012, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.18915657699108124, 'logps/rejected': -31.241607666015625, 'logps/chosen': -22.527427673339844, 'ref_logps/rejected': -28.55950164794922, 'ref_logps/chosen': -21.73688507080078, 'epoch': 1.34} 45%|████▍ | 177/396 [1:26:00<1:45:50, 29.00s/it] 45%|████▍ | 178/396 [1:26:30<1:45:54, 29.15s/it] {'loss': 0.6425, 'learning_rate': 3.0617977528089887e-07, 'losses/dpo': 0.651595413684845, 'losses/sft': 0.8127326369285583, 'losses/total': 0.651595413684845, 'rewards/chosen': -0.07222998142242432, 'rewards/rejected': -0.20315957069396973, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.1309295892715454, 'logps/rejected': -26.000946044921875, 'logps/chosen': -22.916969299316406, 'ref_logps/rejected': -23.969348907470703, 'ref_logps/chosen': -22.194671630859375, 'epoch': 1.34} 45%|████▍ | 178/396 [1:26:30<1:45:54, 29.15s/it] 45%|████▌ | 179/396 [1:26:59<1:45:52, 29.27s/it] {'loss': 0.6217, 'learning_rate': 3.047752808988764e-07, 'losses/dpo': 0.7334872484207153, 'losses/sft': 0.9430239200592041, 'losses/total': 0.7334872484207153, 'rewards/chosen': -0.08968427777290344, 'rewards/rejected': -0.2699398398399353, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.18025556206703186, 'logps/rejected': -27.46251106262207, 'logps/chosen': -20.320987701416016, 'ref_logps/rejected': -24.763113021850586, 'ref_logps/chosen': -19.424144744873047, 'epoch': 1.35} 45%|████▌ | 179/396 [1:26:59<1:45:52, 29.27s/it] 45%|████▌ | 180/396 [1:27:28<1:45:03, 29.18s/it] {'loss': 0.6381, 'learning_rate': 3.0337078651685393e-07, 'losses/dpo': 0.6393001079559326, 'losses/sft': 0.766620397567749, 'losses/total': 0.6393001079559326, 'rewards/chosen': -0.10556241869926453, 'rewards/rejected': -0.2442680299282074, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.13870559632778168, 'logps/rejected': 
-27.422889709472656, 'logps/chosen': -23.853857040405273, 'ref_logps/rejected': -24.980205535888672, 'ref_logps/chosen': -22.798233032226562, 'epoch': 1.36} 45%|████▌ | 180/396 [1:27:28<1:45:03, 29.18s/it] 46%|████▌ | 181/396 [1:27:57<1:44:24, 29.14s/it] {'loss': 0.6234, 'learning_rate': 3.0196629213483144e-07, 'losses/dpo': 0.6311055421829224, 'losses/sft': 0.9324018955230713, 'losses/total': 0.6311055421829224, 'rewards/chosen': -0.04702185466885567, 'rewards/rejected': -0.2207161784172058, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.17369432747364044, 'logps/rejected': -27.110477447509766, 'logps/chosen': -23.360549926757812, 'ref_logps/rejected': -24.903316497802734, 'ref_logps/chosen': -22.890331268310547, 'epoch': 1.37} 46%|████▌ | 181/396 [1:27:57<1:44:24, 29.14s/it] 46%|████▌ | 182/396 [1:28:26<1:43:49, 29.11s/it] {'loss': 0.5926, 'learning_rate': 3.00561797752809e-07, 'losses/dpo': 0.6243355870246887, 'losses/sft': 0.8456003665924072, 'losses/total': 0.6243355870246887, 'rewards/chosen': -0.04487309604883194, 'rewards/rejected': -0.28946632146835327, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.24459321796894073, 'logps/rejected': -31.04292106628418, 'logps/chosen': -23.004093170166016, 'ref_logps/rejected': -28.14826011657715, 'ref_logps/chosen': -22.555362701416016, 'epoch': 1.37} 46%|████▌ | 182/396 [1:28:26<1:43:49, 29.11s/it] 46%|████▌ | 183/396 [1:28:55<1:43:10, 29.07s/it] {'loss': 0.6099, 'learning_rate': 2.991573033707865e-07, 'losses/dpo': 0.6743872761726379, 'losses/sft': 0.836949348449707, 'losses/total': 0.6743872761726379, 'rewards/chosen': -0.09876400232315063, 'rewards/rejected': -0.31580623984336853, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.2170422226190567, 'logps/rejected': -30.511489868164062, 'logps/chosen': -26.668237686157227, 'ref_logps/rejected': -27.35342788696289, 'ref_logps/chosen': -25.680599212646484, 'epoch': 1.38} 46%|████▌ | 183/396 [1:28:55<1:43:10, 29.07s/it] 46%|████▋ | 184/396 
[1:29:25<1:43:08, 29.19s/it] {'loss': 0.6119, 'learning_rate': 2.9775280898876406e-07, 'losses/dpo': 0.5823447704315186, 'losses/sft': 0.8065779805183411, 'losses/total': 0.5823447704315186, 'rewards/chosen': -0.07939236611127853, 'rewards/rejected': -0.2866936922073364, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.20730134844779968, 'logps/rejected': -28.162975311279297, 'logps/chosen': -23.974590301513672, 'ref_logps/rejected': -25.296037673950195, 'ref_logps/chosen': -23.180667877197266, 'epoch': 1.39} 46%|████▋ | 184/396 [1:29:25<1:43:08, 29.19s/it] 47%|████▋ | 185/396 [1:29:54<1:42:28, 29.14s/it] {'loss': 0.6203, 'learning_rate': 2.9634831460674156e-07, 'losses/dpo': 0.5889841318130493, 'losses/sft': 0.8877280354499817, 'losses/total': 0.5889841318130493, 'rewards/chosen': -0.12300599366426468, 'rewards/rejected': -0.3046685457229614, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.18166252970695496, 'logps/rejected': -30.05943489074707, 'logps/chosen': -24.01116943359375, 'ref_logps/rejected': -27.01274871826172, 'ref_logps/chosen': -22.781108856201172, 'epoch': 1.4} 47%|████▋ | 185/396 [1:29:54<1:42:28, 29.14s/it] 47%|████▋ | 186/396 [1:30:23<1:41:59, 29.14s/it] {'loss': 0.6198, 'learning_rate': 2.9494382022471906e-07, 'losses/dpo': 0.6025291681289673, 'losses/sft': 0.93308424949646, 'losses/total': 0.6025291681289673, 'rewards/chosen': -0.12049318104982376, 'rewards/rejected': -0.3076876401901245, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.18719442188739777, 'logps/rejected': -28.1258544921875, 'logps/chosen': -22.79621124267578, 'ref_logps/rejected': -25.04897689819336, 'ref_logps/chosen': -21.591278076171875, 'epoch': 1.4} 47%|████▋ | 186/396 [1:30:23<1:41:59, 29.14s/it] 47%|████▋ | 187/396 [1:30:52<1:41:24, 29.11s/it] {'loss': 0.6277, 'learning_rate': 2.935393258426966e-07, 'losses/dpo': 0.5978178977966309, 'losses/sft': 0.7778979539871216, 'losses/total': 0.5978178977966309, 'rewards/chosen': -0.11195877939462662, 
'rewards/rejected': -0.2878290116786957, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.17587023973464966, 'logps/rejected': -30.58446502685547, 'logps/chosen': -24.246837615966797, 'ref_logps/rejected': -27.7061767578125, 'ref_logps/chosen': -23.127248764038086, 'epoch': 1.41} 47%|████▋ | 187/396 [1:30:52<1:41:24, 29.11s/it] 47%|████▋ | 188/396 [1:31:21<1:40:42, 29.05s/it] {'loss': 0.6458, 'learning_rate': 2.921348314606741e-07, 'losses/dpo': 0.6147331595420837, 'losses/sft': 0.8299495577812195, 'losses/total': 0.6147331595420837, 'rewards/chosen': -0.1553977131843567, 'rewards/rejected': -0.30890583992004395, 'rewards/accuracies': 0.59375, 'rewards/margins': 0.15350814163684845, 'logps/rejected': -29.098743438720703, 'logps/chosen': -24.55533218383789, 'ref_logps/rejected': -26.009681701660156, 'ref_logps/chosen': -23.001358032226562, 'epoch': 1.42} 47%|████▋ | 188/396 [1:31:21<1:40:42, 29.05s/it] 48%|████▊ | 189/396 [1:31:50<1:40:12, 29.05s/it] {'loss': 0.5968, 'learning_rate': 2.907303370786517e-07, 'losses/dpo': 0.5409806370735168, 'losses/sft': 0.8110998272895813, 'losses/total': 0.5409806370735168, 'rewards/chosen': -0.08966411650180817, 'rewards/rejected': -0.32687509059906006, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.23721098899841309, 'logps/rejected': -30.61502456665039, 'logps/chosen': -22.7973690032959, 'ref_logps/rejected': -27.346271514892578, 'ref_logps/chosen': -21.900728225708008, 'epoch': 1.43} 48%|████▊ | 189/396 [1:31:50<1:40:12, 29.05s/it] 48%|████▊ | 190/396 [1:32:19<1:39:53, 29.09s/it] {'loss': 0.636, 'learning_rate': 2.893258426966292e-07, 'losses/dpo': 0.6395488977432251, 'losses/sft': 0.8838689923286438, 'losses/total': 0.6395488977432251, 'rewards/chosen': -0.14787010848522186, 'rewards/rejected': -0.3065232038497925, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.15865309536457062, 'logps/rejected': -28.09313201904297, 'logps/chosen': -21.656837463378906, 'ref_logps/rejected': -25.027902603149414, 
'ref_logps/chosen': -20.17813491821289, 'epoch': 1.43} 48%|████▊ | 190/396 [1:32:19<1:39:53, 29.09s/it] 48%|████▊ | 191/396 [1:32:48<1:39:20, 29.08s/it] {'loss': 0.6131, 'learning_rate': 2.8792134831460674e-07, 'losses/dpo': 0.6822565197944641, 'losses/sft': 0.7876338362693787, 'losses/total': 0.6822565197944641, 'rewards/chosen': -0.11451825499534607, 'rewards/rejected': -0.33027884364128113, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.21576061844825745, 'logps/rejected': -32.06410217285156, 'logps/chosen': -23.13861083984375, 'ref_logps/rejected': -28.761310577392578, 'ref_logps/chosen': -21.99342918395996, 'epoch': 1.44} 48%|████▊ | 191/396 [1:32:48<1:39:20, 29.08s/it] 48%|████▊ | 192/396 [1:33:17<1:39:09, 29.17s/it] {'loss': 0.6132, 'learning_rate': 2.8651685393258425e-07, 'losses/dpo': 0.5694007873535156, 'losses/sft': 0.7940797805786133, 'losses/total': 0.5694007873535156, 'rewards/chosen': -0.07051999121904373, 'rewards/rejected': -0.2690831422805786, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.19856315851211548, 'logps/rejected': -27.791099548339844, 'logps/chosen': -22.36726951599121, 'ref_logps/rejected': -25.100269317626953, 'ref_logps/chosen': -21.662071228027344, 'epoch': 1.45} 48%|████▊ | 192/396 [1:33:17<1:39:09, 29.17s/it] 49%|████▊ | 193/396 [1:33:47<1:38:45, 29.19s/it] {'loss': 0.6251, 'learning_rate': 2.851123595505618e-07, 'losses/dpo': 0.6676912307739258, 'losses/sft': 0.8101266026496887, 'losses/total': 0.6676912307739258, 'rewards/chosen': -0.11661653220653534, 'rewards/rejected': -0.2914498448371887, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.1748332977294922, 'logps/rejected': -30.479598999023438, 'logps/chosen': -24.541927337646484, 'ref_logps/rejected': -27.565099716186523, 'ref_logps/chosen': -23.375761032104492, 'epoch': 1.46} 49%|████▊ | 193/396 [1:33:47<1:38:45, 29.19s/it] 49%|████▉ | 194/396 [1:34:16<1:37:57, 29.10s/it] {'loss': 0.6289, 'learning_rate': 2.8370786516853936e-07, 'losses/dpo': 
0.6359354257583618, 'losses/sft': 0.846460223197937, 'losses/total': 0.6359354257583618, 'rewards/chosen': -0.12630482017993927, 'rewards/rejected': -0.30416491627693176, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.1778600960969925, 'logps/rejected': -30.262849807739258, 'logps/chosen': -23.954505920410156, 'ref_logps/rejected': -27.221202850341797, 'ref_logps/chosen': -22.69145965576172, 'epoch': 1.46} 49%|████▉ | 194/396 [1:34:16<1:37:57, 29.10s/it] 49%|████▉ | 195/396 [1:34:44<1:37:13, 29.02s/it] {'loss': 0.6017, 'learning_rate': 2.823033707865168e-07, 'losses/dpo': 0.6264960765838623, 'losses/sft': 0.906339704990387, 'losses/total': 0.6264960765838623, 'rewards/chosen': -0.09929438680410385, 'rewards/rejected': -0.3245629370212555, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.22526855766773224, 'logps/rejected': -32.26765823364258, 'logps/chosen': -25.615474700927734, 'ref_logps/rejected': -29.02202796936035, 'ref_logps/chosen': -24.62253189086914, 'epoch': 1.47} 49%|████▉ | 195/396 [1:34:44<1:37:13, 29.02s/it] 49%|████▉ | 196/396 [1:35:13<1:36:30, 28.95s/it] {'loss': 0.6191, 'learning_rate': 2.8089887640449437e-07, 'losses/dpo': 0.6483104825019836, 'losses/sft': 0.9074235558509827, 'losses/total': 0.6483104825019836, 'rewards/chosen': -0.14234672486782074, 'rewards/rejected': -0.3314274847507477, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.18908075988292694, 'logps/rejected': -28.347021102905273, 'logps/chosen': -22.84251594543457, 'ref_logps/rejected': -25.032745361328125, 'ref_logps/chosen': -21.419048309326172, 'epoch': 1.48} 49%|████▉ | 196/396 [1:35:13<1:36:30, 28.95s/it] 50%|████▉ | 197/396 [1:35:42<1:35:54, 28.92s/it] {'loss': 0.6238, 'learning_rate': 2.794943820224719e-07, 'losses/dpo': 0.6014984250068665, 'losses/sft': 0.773016631603241, 'losses/total': 0.6014984250068665, 'rewards/chosen': -0.13099724054336548, 'rewards/rejected': -0.323010116815567, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.19201286137104034, 
'logps/rejected': -28.329975128173828, 'logps/chosen': -23.73548126220703, 'ref_logps/rejected': -25.099872589111328, 'ref_logps/chosen': -22.425506591796875, 'epoch': 1.49} 50%|████▉ | 197/396 [1:35:42<1:35:54, 28.92s/it] 50%|█████ | 198/396 [1:36:11<1:35:31, 28.95s/it] {'loss': 0.625, 'learning_rate': 2.7808988764044943e-07, 'losses/dpo': 0.6309884190559387, 'losses/sft': 0.8918415307998657, 'losses/total': 0.6309884190559387, 'rewards/chosen': -0.14819550514221191, 'rewards/rejected': -0.32420486211776733, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.17600935697555542, 'logps/rejected': -30.921403884887695, 'logps/chosen': -26.183156967163086, 'ref_logps/rejected': -27.679357528686523, 'ref_logps/chosen': -24.701202392578125, 'epoch': 1.49} 50%|█████ | 198/396 [1:36:11<1:35:31, 28.95s/it] 50%|█████ | 199/396 [1:36:40<1:34:57, 28.92s/it] {'loss': 0.6156, 'learning_rate': 2.7668539325842694e-07, 'losses/dpo': 0.6188192367553711, 'losses/sft': 0.8410817384719849, 'losses/total': 0.6188192367553711, 'rewards/chosen': -0.13816949725151062, 'rewards/rejected': -0.3350033462047577, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.19683387875556946, 'logps/rejected': -29.73432731628418, 'logps/chosen': -23.88658905029297, 'ref_logps/rejected': -26.384294509887695, 'ref_logps/chosen': -22.504894256591797, 'epoch': 1.5} 50%|█████ | 199/396 [1:36:40<1:34:57, 28.92s/it] 51%|█████ | 200/396 [1:37:09<1:34:25, 28.91s/it] {'loss': 0.637, 'learning_rate': 2.752808988764045e-07, 'losses/dpo': 0.6995939612388611, 'losses/sft': 0.9283435344696045, 'losses/total': 0.6995939612388611, 'rewards/chosen': -0.1690514236688614, 'rewards/rejected': -0.32909804582595825, 'rewards/accuracies': 0.625, 'rewards/margins': 0.16004663705825806, 'logps/rejected': -29.627685546875, 'logps/chosen': -23.145811080932617, 'ref_logps/rejected': -26.336702346801758, 'ref_logps/chosen': -21.45529556274414, 'epoch': 1.51} 51%|█████ | 200/396 [1:37:09<1:34:25, 
28.91s/it]/mnt/bn/liangkeg/ruohongz/vllm/dpo_experiment/trl/trainer/dpo_trainer.py:1138: UserWarning: compute_loss is only implemented for DPODataCollatorWithPadding, and you passed a datacollator that is different than DPODataCollatorWithPadding - you might see unexpected behavior. Alternatively, you can implement your own prediction_step method if you are using a custom data collator warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants. warnings.warn( /usr/local/lib/python3.9/dist-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None warnings.warn( 51%|█████ | 201/396 [1:38:07<2:02:23, 37.66s/it] {'loss': 0.608, 'learning_rate': 2.73876404494382e-07, 'losses/dpo': 0.6513813734054565, 'losses/sft': 0.9403305649757385, 'losses/total': 0.6513813734054565, 'rewards/chosen': -0.15345513820648193, 'rewards/rejected': -0.3899462819099426, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.23649117350578308, 'logps/rejected': -30.04849624633789, 'logps/chosen': -22.545406341552734, 'ref_logps/rejected': -26.149032592773438, 'ref_logps/chosen': -21.010854721069336, 'epoch': 1.52} 51%|█████ | 201/396 [1:38:07<2:02:23, 37.66s/it] 51%|█████ | 202/396 [1:38:36<1:53:04, 34.97s/it] {'loss': 0.6007, 'learning_rate': 2.7247191011235955e-07, 'losses/dpo': 0.5443820357322693, 'losses/sft': 0.8517413139343262, 'losses/total': 0.5443820357322693, 'rewards/chosen': -0.1342916190624237, 'rewards/rejected': -0.37330782413482666, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.23901620507240295, 'logps/rejected': 
-28.583681106567383, 'logps/chosen': -22.640438079833984, 'ref_logps/rejected': -24.850605010986328, 'ref_logps/chosen': -21.29751968383789, 'epoch': 1.52} 51%|█████ | 202/396 [1:38:36<1:53:04, 34.97s/it] 51%|█████▏ | 203/396 [1:39:05<1:46:46, 33.19s/it] {'loss': 0.6029, 'learning_rate': 2.710674157303371e-07, 'losses/dpo': 0.5749891996383667, 'losses/sft': 0.9417051672935486, 'losses/total': 0.5749891996383667, 'rewards/chosen': -0.17907381057739258, 'rewards/rejected': -0.41839560866355896, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.23932181298732758, 'logps/rejected': -32.96052551269531, 'logps/chosen': -25.259624481201172, 'ref_logps/rejected': -28.776565551757812, 'ref_logps/chosen': -23.468887329101562, 'epoch': 1.53} 51%|█████▏ | 203/396 [1:39:05<1:46:46, 33.19s/it] 52%|█████▏ | 204/396 [1:39:34<1:42:05, 31.91s/it] {'loss': 0.6256, 'learning_rate': 2.6966292134831456e-07, 'losses/dpo': 0.6045551896095276, 'losses/sft': 0.8162484169006348, 'losses/total': 0.6045551896095276, 'rewards/chosen': -0.1712397187948227, 'rewards/rejected': -0.36638063192367554, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.19514092803001404, 'logps/rejected': -31.409852981567383, 'logps/chosen': -24.431142807006836, 'ref_logps/rejected': -27.74604606628418, 'ref_logps/chosen': -22.7187442779541, 'epoch': 1.54} 52%|█████▏ | 204/396 [1:39:34<1:42:05, 31.91s/it] 52%|█████▏ | 205/396 [1:40:02<1:38:40, 31.00s/it] {'loss': 0.6093, 'learning_rate': 2.682584269662921e-07, 'losses/dpo': 0.630817711353302, 'losses/sft': 0.907343327999115, 'losses/total': 0.630817711353302, 'rewards/chosen': -0.16267219185829163, 'rewards/rejected': -0.38944315910339355, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.22677099704742432, 'logps/rejected': -30.418426513671875, 'logps/chosen': -22.776988983154297, 'ref_logps/rejected': -26.52399444580078, 'ref_logps/chosen': -21.150266647338867, 'epoch': 1.55} 52%|█████▏ | 205/396 [1:40:02<1:38:40, 31.00s/it] 52%|█████▏ | 206/396 
[1:40:31<1:36:07, 30.36s/it] {'loss': 0.5834, 'learning_rate': 2.668539325842696e-07, 'losses/dpo': 0.5977815389633179, 'losses/sft': 0.8870611190795898, 'losses/total': 0.5977815389633179, 'rewards/chosen': -0.13771943747997284, 'rewards/rejected': -0.4336281716823578, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.29590874910354614, 'logps/rejected': -32.63694763183594, 'logps/chosen': -24.300395965576172, 'ref_logps/rejected': -28.30066680908203, 'ref_logps/chosen': -22.923202514648438, 'epoch': 1.55} 52%|█████▏ | 206/396 [1:40:31<1:36:07, 30.36s/it] 52%|█████▏ | 207/396 [1:41:00<1:34:25, 29.97s/it] {'loss': 0.6248, 'learning_rate': 2.654494382022472e-07, 'losses/dpo': 0.593975841999054, 'losses/sft': 0.8298511505126953, 'losses/total': 0.593975841999054, 'rewards/chosen': -0.16984564065933228, 'rewards/rejected': -0.3705544173717499, 'rewards/accuracies': 0.6171875, 'rewards/margins': 0.2007087767124176, 'logps/rejected': -28.686279296875, 'logps/chosen': -25.562063217163086, 'ref_logps/rejected': -24.980735778808594, 'ref_logps/chosen': -23.863605499267578, 'epoch': 1.56} 52%|█████▏ | 207/396 [1:41:00<1:34:25, 29.97s/it] 53%|█████▎ | 208/396 [1:41:29<1:32:58, 29.67s/it] {'loss': 0.6072, 'learning_rate': 2.640449438202247e-07, 'losses/dpo': 0.5785881280899048, 'losses/sft': 0.9283973574638367, 'losses/total': 0.5785881280899048, 'rewards/chosen': -0.158656507730484, 'rewards/rejected': -0.3880493640899658, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.2293928861618042, 'logps/rejected': -32.86896514892578, 'logps/chosen': -24.133087158203125, 'ref_logps/rejected': -28.988473892211914, 'ref_logps/chosen': -22.546520233154297, 'epoch': 1.57} 53%|█████▎ | 208/396 [1:41:29<1:32:58, 29.67s/it] 53%|█████▎ | 209/396 [1:41:59<1:32:06, 29.55s/it] {'loss': 0.5743, 'learning_rate': 2.6264044943820224e-07, 'losses/dpo': 0.5111271142959595, 'losses/sft': 0.7807843685150146, 'losses/total': 0.5111271142959595, 'rewards/chosen': -0.1349155306816101, 
'rewards/rejected': -0.43871009349823, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.3037945628166199, 'logps/rejected': -33.77753448486328, 'logps/chosen': -21.610166549682617, 'ref_logps/rejected': -29.390432357788086, 'ref_logps/chosen': -20.26101303100586, 'epoch': 1.58} 53%|█████▎ | 209/396 [1:41:59<1:32:06, 29.55s/it] 53%|█████▎ | 210/396 [1:42:28<1:31:11, 29.41s/it] {'loss': 0.621, 'learning_rate': 2.612359550561798e-07, 'losses/dpo': 0.6254321336746216, 'losses/sft': 0.7647839188575745, 'losses/total': 0.6254321336746216, 'rewards/chosen': -0.1761387437582016, 'rewards/rejected': -0.38204440474510193, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.20590564608573914, 'logps/rejected': -28.993408203125, 'logps/chosen': -24.600027084350586, 'ref_logps/rejected': -25.172962188720703, 'ref_logps/chosen': -22.838638305664062, 'epoch': 1.58} 53%|█████▎ | 210/396 [1:42:28<1:31:11, 29.41s/it] 53%|█████▎ | 211/396 [1:42:57<1:30:25, 29.33s/it] {'loss': 0.6078, 'learning_rate': 2.598314606741573e-07, 'losses/dpo': 0.6571998000144958, 'losses/sft': 0.8880329728126526, 'losses/total': 0.6571998000144958, 'rewards/chosen': -0.19707328081130981, 'rewards/rejected': -0.4442688822746277, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.24719560146331787, 'logps/rejected': -32.02477264404297, 'logps/chosen': -25.24309730529785, 'ref_logps/rejected': -27.582080841064453, 'ref_logps/chosen': -23.272363662719727, 'epoch': 1.59} 53%|█████▎ | 211/396 [1:42:57<1:30:25, 29.33s/it] 54%|█████▎ | 212/396 [1:43:26<1:29:34, 29.21s/it] {'loss': 0.5954, 'learning_rate': 2.5842696629213486e-07, 'losses/dpo': 0.6153095960617065, 'losses/sft': 0.7867841720581055, 'losses/total': 0.6153095960617065, 'rewards/chosen': -0.19892916083335876, 'rewards/rejected': -0.45600906014442444, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.2570798993110657, 'logps/rejected': -31.662994384765625, 'logps/chosen': -23.570541381835938, 'ref_logps/rejected': -27.1029052734375, 
'ref_logps/chosen': -21.58125114440918, 'epoch': 1.6} 54%|█████▎ | 212/396 [1:43:26<1:29:34, 29.21s/it] 54%|█████▍ | 213/396 [1:43:55<1:28:52, 29.14s/it] {'loss': 0.5944, 'learning_rate': 2.5702247191011236e-07, 'losses/dpo': 0.559239387512207, 'losses/sft': 0.8030417561531067, 'losses/total': 0.559239387512207, 'rewards/chosen': -0.18111974000930786, 'rewards/rejected': -0.44782739877700806, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.2667076587677002, 'logps/rejected': -33.26690673828125, 'logps/chosen': -26.515090942382812, 'ref_logps/rejected': -28.788631439208984, 'ref_logps/chosen': -24.70389175415039, 'epoch': 1.61} 54%|█████▍ | 213/396 [1:43:55<1:28:52, 29.14s/it] 54%|█████▍ | 214/396 [1:44:24<1:28:05, 29.04s/it] {'loss': 0.6028, 'learning_rate': 2.5561797752808987e-07, 'losses/dpo': 0.6463332772254944, 'losses/sft': 0.867030918598175, 'losses/total': 0.6463332772254944, 'rewards/chosen': -0.15802377462387085, 'rewards/rejected': -0.3997907340526581, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.24176692962646484, 'logps/rejected': -30.950822830200195, 'logps/chosen': -23.109725952148438, 'ref_logps/rejected': -26.95291519165039, 'ref_logps/chosen': -21.529489517211914, 'epoch': 1.62} 54%|█████▍ | 214/396 [1:44:24<1:28:05, 29.04s/it] 54%|█████▍ | 215/396 [1:44:52<1:27:32, 29.02s/it] {'loss': 0.548, 'learning_rate': 2.5421348314606737e-07, 'losses/dpo': 0.49787038564682007, 'losses/sft': 0.9076435565948486, 'losses/total': 0.49787038564682007, 'rewards/chosen': -0.07684363424777985, 'rewards/rejected': -0.449706107378006, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.37286245822906494, 'logps/rejected': -30.6645450592041, 'logps/chosen': -22.45772933959961, 'ref_logps/rejected': -26.167482376098633, 'ref_logps/chosen': -21.689294815063477, 'epoch': 1.62} 54%|█████▍ | 215/396 [1:44:52<1:27:32, 29.02s/it] 55%|█████▍ | 216/396 [1:45:21<1:26:48, 28.93s/it] {'loss': 0.5791, 'learning_rate': 2.5280898876404493e-07, 'losses/dpo': 
0.6228358745574951, 'losses/sft': 0.894844651222229, 'losses/total': 0.6228358745574951, 'rewards/chosen': -0.19276437163352966, 'rewards/rejected': -0.4936027228832245, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.3008383512496948, 'logps/rejected': -31.34885597229004, 'logps/chosen': -23.930644989013672, 'ref_logps/rejected': -26.41282844543457, 'ref_logps/chosen': -22.003002166748047, 'epoch': 1.63} 55%|█████▍ | 216/396 [1:45:21<1:26:48, 28.93s/it] 55%|█████▍ | 217/396 [1:45:50<1:26:15, 28.91s/it] {'loss': 0.5571, 'learning_rate': 2.5140449438202243e-07, 'losses/dpo': 0.5233840942382812, 'losses/sft': 0.8860921263694763, 'losses/total': 0.5233840942382812, 'rewards/chosen': -0.16936028003692627, 'rewards/rejected': -0.5292361378669739, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.35987579822540283, 'logps/rejected': -30.82415199279785, 'logps/chosen': -25.59225082397461, 'ref_logps/rejected': -25.53179168701172, 'ref_logps/chosen': -23.89864730834961, 'epoch': 1.64} 55%|█████▍ | 217/396 [1:45:50<1:26:15, 28.91s/it] 55%|█████▌ | 218/396 [1:46:19<1:25:50, 28.93s/it] {'loss': 0.5821, 'learning_rate': 2.5e-07, 'losses/dpo': 0.5345016121864319, 'losses/sft': 0.9819333553314209, 'losses/total': 0.5345016121864319, 'rewards/chosen': -0.19385257363319397, 'rewards/rejected': -0.5041571855545044, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.3103046417236328, 'logps/rejected': -32.64814376831055, 'logps/chosen': -26.896615982055664, 'ref_logps/rejected': -27.606571197509766, 'ref_logps/chosen': -24.95808982849121, 'epoch': 1.65} 55%|█████▌ | 218/396 [1:46:19<1:25:50, 28.93s/it] 55%|█████▌ | 219/396 [1:46:48<1:25:29, 28.98s/it] {'loss': 0.5621, 'learning_rate': 2.485955056179775e-07, 'losses/dpo': 0.5603345632553101, 'losses/sft': 0.7855640649795532, 'losses/total': 0.5603345632553101, 'rewards/chosen': -0.13391147553920746, 'rewards/rejected': -0.503510594367981, 'rewards/accuracies': 0.75, 'rewards/margins': 0.3695991039276123, 'logps/rejected': 
-29.887657165527344, 'logps/chosen': -21.461519241333008, 'ref_logps/rejected': -24.85255241394043, 'ref_logps/chosen': -20.122406005859375, 'epoch': 1.65} 55%|█████▌ | 219/396 [1:46:48<1:25:29, 28.98s/it] 56%|█████▌ | 220/396 [1:47:17<1:24:59, 28.97s/it] {'loss': 0.5923, 'learning_rate': 2.4719101123595505e-07, 'losses/dpo': 0.5465586185455322, 'losses/sft': 1.051912546157837, 'losses/total': 0.5465586185455322, 'rewards/chosen': -0.16892319917678833, 'rewards/rejected': -0.4443698525428772, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.27544665336608887, 'logps/rejected': -34.4566764831543, 'logps/chosen': -22.143098831176758, 'ref_logps/rejected': -30.012981414794922, 'ref_logps/chosen': -20.453866958618164, 'epoch': 1.66} 56%|█████▌ | 220/396 [1:47:17<1:24:59, 28.97s/it] 56%|█████▌ | 221/396 [1:47:47<1:24:57, 29.13s/it] {'loss': 0.6149, 'learning_rate': 2.4578651685393255e-07, 'losses/dpo': 0.6469910144805908, 'losses/sft': 1.0151987075805664, 'losses/total': 0.6469910144805908, 'rewards/chosen': -0.220640629529953, 'rewards/rejected': -0.4377601146697998, 'rewards/accuracies': 0.6328125, 'rewards/margins': 0.2171194702386856, 'logps/rejected': -29.772445678710938, 'logps/chosen': -24.042566299438477, 'ref_logps/rejected': -25.394845962524414, 'ref_logps/chosen': -21.836162567138672, 'epoch': 1.67} 56%|█████▌ | 221/396 [1:47:47<1:24:57, 29.13s/it] 56%|█████▌ | 222/396 [1:48:16<1:24:37, 29.18s/it] {'loss': 0.5676, 'learning_rate': 2.443820224719101e-07, 'losses/dpo': 0.6051491498947144, 'losses/sft': 0.8380707502365112, 'losses/total': 0.6051491498947144, 'rewards/chosen': -0.18938273191452026, 'rewards/rejected': -0.5339796543121338, 'rewards/accuracies': 0.75, 'rewards/margins': 0.3445969223976135, 'logps/rejected': -33.834083557128906, 'logps/chosen': -24.834793090820312, 'ref_logps/rejected': -28.49428939819336, 'ref_logps/chosen': -22.940967559814453, 'epoch': 1.68} 56%|█████▌ | 222/396 [1:48:16<1:24:37, 29.18s/it] 56%|█████▋ | 223/396 
[1:48:45<1:24:12, 29.20s/it] {'loss': 0.6089, 'learning_rate': 2.429775280898876e-07, 'losses/dpo': 0.5853685140609741, 'losses/sft': 0.6926910877227783, 'losses/total': 0.5853685140609741, 'rewards/chosen': -0.23944953083992004, 'rewards/rejected': -0.47545361518859863, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.2360040545463562, 'logps/rejected': -30.429113388061523, 'logps/chosen': -25.5327091217041, 'ref_logps/rejected': -25.674575805664062, 'ref_logps/chosen': -23.138214111328125, 'epoch': 1.68} 56%|█████▋ | 223/396 [1:48:45<1:24:12, 29.20s/it] 57%|█████▋ | 224/396 [1:49:14<1:23:29, 29.12s/it] {'loss': 0.6134, 'learning_rate': 2.4157303370786517e-07, 'losses/dpo': 0.7566800117492676, 'losses/sft': 0.9139145612716675, 'losses/total': 0.7566800117492676, 'rewards/chosen': -0.20579975843429565, 'rewards/rejected': -0.45619648694992065, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.250396728515625, 'logps/rejected': -29.51090431213379, 'logps/chosen': -24.123153686523438, 'ref_logps/rejected': -24.94894027709961, 'ref_logps/chosen': -22.065155029296875, 'epoch': 1.69} 57%|█████▋ | 224/396 [1:49:14<1:23:29, 29.12s/it] 57%|█████▋ | 225/396 [1:49:43<1:22:53, 29.08s/it] {'loss': 0.6145, 'learning_rate': 2.401685393258427e-07, 'losses/dpo': 0.6078730225563049, 'losses/sft': 1.1017650365829468, 'losses/total': 0.6078730225563049, 'rewards/chosen': -0.2715725004673004, 'rewards/rejected': -0.5027437210083008, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.23117120563983917, 'logps/rejected': -32.90815734863281, 'logps/chosen': -26.274799346923828, 'ref_logps/rejected': -27.880718231201172, 'ref_logps/chosen': -23.55907440185547, 'epoch': 1.7} 57%|█████▋ | 225/396 [1:49:43<1:22:53, 29.08s/it] 57%|█████▋ | 226/396 [1:50:12<1:22:28, 29.11s/it] {'loss': 0.6292, 'learning_rate': 2.3876404494382023e-07, 'losses/dpo': 0.6031284332275391, 'losses/sft': 0.7834776639938354, 'losses/total': 0.6031284332275391, 'rewards/chosen': -0.22974896430969238, 
'rewards/rejected': -0.4386328458786011, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.20888389647006989, 'logps/rejected': -30.410335540771484, 'logps/chosen': -25.727689743041992, 'ref_logps/rejected': -26.02400779724121, 'ref_logps/chosen': -23.430198669433594, 'epoch': 1.71} 57%|█████▋ | 226/396 [1:50:12<1:22:28, 29.11s/it] 57%|█████▋ | 227/396 [1:50:41<1:21:51, 29.06s/it] {'loss': 0.5928, 'learning_rate': 2.3735955056179774e-07, 'losses/dpo': 0.5714601874351501, 'losses/sft': 0.8888335227966309, 'losses/total': 0.5714601874351501, 'rewards/chosen': -0.24810227751731873, 'rewards/rejected': -0.5386180877685547, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.29051584005355835, 'logps/rejected': -31.14261245727539, 'logps/chosen': -25.917598724365234, 'ref_logps/rejected': -25.756431579589844, 'ref_logps/chosen': -23.436574935913086, 'epoch': 1.71} 57%|█████▋ | 227/396 [1:50:41<1:21:51, 29.06s/it] 58%|█████▊ | 228/396 [1:51:10<1:21:16, 29.03s/it] {'loss': 0.5505, 'learning_rate': 2.3595505617977527e-07, 'losses/dpo': 0.5715539455413818, 'losses/sft': 0.8663308620452881, 'losses/total': 0.5715539455413818, 'rewards/chosen': -0.20894566178321838, 'rewards/rejected': -0.625501275062561, 'rewards/accuracies': 0.8203125, 'rewards/margins': 0.41655558347702026, 'logps/rejected': -34.945220947265625, 'logps/chosen': -25.50743865966797, 'ref_logps/rejected': -28.690208435058594, 'ref_logps/chosen': -23.417984008789062, 'epoch': 1.72} 58%|█████▊ | 228/396 [1:51:10<1:21:16, 29.03s/it] 58%|█████▊ | 229/396 [1:51:39<1:20:43, 29.00s/it] {'loss': 0.571, 'learning_rate': 2.345505617977528e-07, 'losses/dpo': 0.6053493022918701, 'losses/sft': 0.8246825933456421, 'losses/total': 0.6053493022918701, 'rewards/chosen': -0.23506540060043335, 'rewards/rejected': -0.585770845413208, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.35070547461509705, 'logps/rejected': -34.89327621459961, 'logps/chosen': -23.620698928833008, 'ref_logps/rejected': -29.035568237304688, 
'ref_logps/chosen': -21.27004623413086, 'epoch': 1.73} 58%|█████▊ | 229/396 [1:51:39<1:20:43, 29.00s/it] 58%|█████▊ | 230/396 [1:52:08<1:20:15, 29.01s/it] {'loss': 0.5745, 'learning_rate': 2.331460674157303e-07, 'losses/dpo': 0.5964910984039307, 'losses/sft': 0.842921793460846, 'losses/total': 0.5964910984039307, 'rewards/chosen': -0.23740598559379578, 'rewards/rejected': -0.5817204713821411, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.34431448578834534, 'logps/rejected': -34.58841323852539, 'logps/chosen': -21.874225616455078, 'ref_logps/rejected': -28.771209716796875, 'ref_logps/chosen': -19.500164031982422, 'epoch': 1.74} 58%|█████▊ | 230/396 [1:52:08<1:20:15, 29.01s/it] 58%|█████▊ | 231/396 [1:52:37<1:20:04, 29.12s/it] {'loss': 0.6064, 'learning_rate': 2.3174157303370786e-07, 'losses/dpo': 0.5861349105834961, 'losses/sft': 0.9263943433761597, 'losses/total': 0.5861349105834961, 'rewards/chosen': -0.29586488008499146, 'rewards/rejected': -0.5531748533248901, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.2573099732398987, 'logps/rejected': -32.233497619628906, 'logps/chosen': -24.84224510192871, 'ref_logps/rejected': -26.701745986938477, 'ref_logps/chosen': -21.88359832763672, 'epoch': 1.74} 58%|█████▊ | 231/396 [1:52:37<1:20:04, 29.12s/it] 59%|█████▊ | 232/396 [1:53:06<1:19:23, 29.04s/it] {'loss': 0.5747, 'learning_rate': 2.303370786516854e-07, 'losses/dpo': 0.5563768744468689, 'losses/sft': 0.9355225563049316, 'losses/total': 0.5563768744468689, 'rewards/chosen': -0.26525697112083435, 'rewards/rejected': -0.6057769656181335, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.3405200242996216, 'logps/rejected': -34.96025085449219, 'logps/chosen': -25.4254207611084, 'ref_logps/rejected': -28.902484893798828, 'ref_logps/chosen': -22.772850036621094, 'epoch': 1.75} 59%|█████▊ | 232/396 [1:53:06<1:19:23, 29.04s/it] 59%|█████▉ | 233/396 [1:53:36<1:19:00, 29.08s/it] {'loss': 0.6228, 'learning_rate': 2.2893258426966292e-07, 'losses/dpo': 
0.6681157946586609, 'losses/sft': 1.0442770719528198, 'losses/total': 0.6681157946586609, 'rewards/chosen': -0.32294073700904846, 'rewards/rejected': -0.5585595369338989, 'rewards/accuracies': 0.640625, 'rewards/margins': 0.23561875522136688, 'logps/rejected': -34.212364196777344, 'logps/chosen': -26.856834411621094, 'ref_logps/rejected': -28.62677001953125, 'ref_logps/chosen': -23.627426147460938, 'epoch': 1.76} 59%|█████▉ | 233/396 [1:53:36<1:19:00, 29.08s/it] 59%|█████▉ | 234/396 [1:54:05<1:18:34, 29.10s/it] {'loss': 0.6217, 'learning_rate': 2.2752808988764045e-07, 'losses/dpo': 0.6866650581359863, 'losses/sft': 0.8693393468856812, 'losses/total': 0.6866650581359863, 'rewards/chosen': -0.31775763630867004, 'rewards/rejected': -0.5735958218574524, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.25583818554878235, 'logps/rejected': -33.41276550292969, 'logps/chosen': -26.366958618164062, 'ref_logps/rejected': -27.676807403564453, 'ref_logps/chosen': -23.189382553100586, 'epoch': 1.77} 59%|█████▉ | 234/396 [1:54:05<1:18:34, 29.10s/it] 59%|█████▉ | 235/396 [1:54:34<1:18:14, 29.16s/it] {'loss': 0.604, 'learning_rate': 2.2612359550561795e-07, 'losses/dpo': 0.5642524361610413, 'losses/sft': 0.9980260133743286, 'losses/total': 0.5642524361610413, 'rewards/chosen': -0.28369593620300293, 'rewards/rejected': -0.5602643489837646, 'rewards/accuracies': 0.609375, 'rewards/margins': 0.2765684127807617, 'logps/rejected': -32.4229736328125, 'logps/chosen': -24.26227569580078, 'ref_logps/rejected': -26.820331573486328, 'ref_logps/chosen': -21.425315856933594, 'epoch': 1.77} 59%|█████▉ | 235/396 [1:54:34<1:18:14, 29.16s/it] 60%|█████▉ | 236/396 [1:55:03<1:17:30, 29.07s/it] {'loss': 0.6448, 'learning_rate': 2.2471910112359549e-07, 'losses/dpo': 0.5940742492675781, 'losses/sft': 0.969171404838562, 'losses/total': 0.5940742492675781, 'rewards/chosen': -0.3312861919403076, 'rewards/rejected': -0.5064890384674072, 'rewards/accuracies': 0.65625, 'rewards/margins': 
0.17520278692245483, 'logps/rejected': -31.85492706298828, 'logps/chosen': -27.912431716918945, 'ref_logps/rejected': -26.790037155151367, 'ref_logps/chosen': -24.59956932067871, 'epoch': 1.78} 60%|█████▉ | 236/396 [1:55:03<1:17:30, 29.07s/it] 60%|█████▉ | 237/396 [1:55:32<1:17:14, 29.15s/it] {'loss': 0.5545, 'learning_rate': 2.2331460674157302e-07, 'losses/dpo': 0.5936781764030457, 'losses/sft': 1.015429139137268, 'losses/total': 0.5936781764030457, 'rewards/chosen': -0.27928683161735535, 'rewards/rejected': -0.7101750373840332, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.4308881163597107, 'logps/rejected': -37.65882110595703, 'logps/chosen': -27.303508758544922, 'ref_logps/rejected': -30.55707550048828, 'ref_logps/chosen': -24.510639190673828, 'epoch': 1.79} 60%|█████▉ | 237/396 [1:55:32<1:17:14, 29.15s/it] 60%|██████ | 238/396 [1:56:01<1:16:43, 29.14s/it] {'loss': 0.6034, 'learning_rate': 2.2191011235955055e-07, 'losses/dpo': 0.608791172504425, 'losses/sft': 0.9114975929260254, 'losses/total': 0.608791172504425, 'rewards/chosen': -0.2915502190589905, 'rewards/rejected': -0.5476702451705933, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.2561199963092804, 'logps/rejected': -30.256423950195312, 'logps/chosen': -24.99541473388672, 'ref_logps/rejected': -24.779722213745117, 'ref_logps/chosen': -22.079914093017578, 'epoch': 1.8} 60%|██████ | 238/396 [1:56:01<1:16:43, 29.14s/it] 60%|██████ | 239/396 [1:56:31<1:16:27, 29.22s/it] {'loss': 0.574, 'learning_rate': 2.205056179775281e-07, 'losses/dpo': 0.5037014484405518, 'losses/sft': 0.8922078609466553, 'losses/total': 0.5037014484405518, 'rewards/chosen': -0.2768429219722748, 'rewards/rejected': -0.6418864727020264, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.3650434911251068, 'logps/rejected': -34.17859649658203, 'logps/chosen': -27.542556762695312, 'ref_logps/rejected': -27.759735107421875, 'ref_logps/chosen': -24.774127960205078, 'epoch': 1.8} 60%|██████ | 239/396 [1:56:31<1:16:27, 29.22s/it] 
61%|██████ | 240/396 [1:57:00<1:15:56, 29.21s/it] {'loss': 0.6117, 'learning_rate': 2.191011235955056e-07, 'losses/dpo': 0.7050824165344238, 'losses/sft': 0.9497538208961487, 'losses/total': 0.7050824165344238, 'rewards/chosen': -0.27249252796173096, 'rewards/rejected': -0.5582915544509888, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.2857990562915802, 'logps/rejected': -34.46807861328125, 'logps/chosen': -25.87149429321289, 'ref_logps/rejected': -28.885162353515625, 'ref_logps/chosen': -23.14657211303711, 'epoch': 1.81} 61%|██████ | 240/396 [1:57:00<1:15:56, 29.21s/it] 61%|██████ | 241/396 [1:57:29<1:15:12, 29.11s/it] {'loss': 0.5808, 'learning_rate': 2.1769662921348314e-07, 'losses/dpo': 0.5883455276489258, 'losses/sft': 0.9948925375938416, 'losses/total': 0.5883455276489258, 'rewards/chosen': -0.3374406695365906, 'rewards/rejected': -0.66876620054245, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.3313255310058594, 'logps/rejected': -33.622718811035156, 'logps/chosen': -24.89635467529297, 'ref_logps/rejected': -26.93505859375, 'ref_logps/chosen': -21.52194595336914, 'epoch': 1.82} 61%|██████ | 241/396 [1:57:29<1:15:12, 29.11s/it] 61%|██████ | 242/396 [1:57:58<1:14:34, 29.05s/it] {'loss': 0.6136, 'learning_rate': 2.1629213483146067e-07, 'losses/dpo': 0.6376237869262695, 'losses/sft': 0.9374114274978638, 'losses/total': 0.6376237869262695, 'rewards/chosen': -0.30624428391456604, 'rewards/rejected': -0.5729751586914062, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.2667309045791626, 'logps/rejected': -30.974327087402344, 'logps/chosen': -24.833309173583984, 'ref_logps/rejected': -25.24457359313965, 'ref_logps/chosen': -21.7708683013916, 'epoch': 1.83} 61%|██████ | 242/396 [1:57:58<1:14:34, 29.05s/it] 61%|██████▏ | 243/396 [1:58:27<1:14:00, 29.02s/it] {'loss': 0.6133, 'learning_rate': 2.148876404494382e-07, 'losses/dpo': 0.645912766456604, 'losses/sft': 0.9913955926895142, 'losses/total': 0.645912766456604, 'rewards/chosen': -0.32100653648376465, 
'rewards/rejected': -0.5834212899208069, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.26241475343704224, 'logps/rejected': -34.610633850097656, 'logps/chosen': -24.04471778869629, 'ref_logps/rejected': -28.77642059326172, 'ref_logps/chosen': -20.834651947021484, 'epoch': 1.83} 61%|██████▏ | 243/396 [1:58:27<1:14:00, 29.02s/it] 62%|██████▏ | 244/396 [1:58:56<1:14:04, 29.24s/it] {'loss': 0.5713, 'learning_rate': 2.134831460674157e-07, 'losses/dpo': 0.6227866411209106, 'losses/sft': 0.9809292554855347, 'losses/total': 0.6227866411209106, 'rewards/chosen': -0.3140770494937897, 'rewards/rejected': -0.6970926523208618, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.38301563262939453, 'logps/rejected': -34.56787109375, 'logps/chosen': -26.419416427612305, 'ref_logps/rejected': -27.596946716308594, 'ref_logps/chosen': -23.278644561767578, 'epoch': 1.84} 62%|██████▏ | 244/396 [1:58:56<1:14:04, 29.24s/it] 62%|██████▏ | 245/396 [1:59:25<1:13:19, 29.13s/it] {'loss': 0.59, 'learning_rate': 2.1207865168539323e-07, 'losses/dpo': 0.6351089477539062, 'losses/sft': 0.9912072420120239, 'losses/total': 0.6351089477539062, 'rewards/chosen': -0.3366050124168396, 'rewards/rejected': -0.6638398170471191, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.32723480463027954, 'logps/rejected': -33.21559524536133, 'logps/chosen': -26.64739990234375, 'ref_logps/rejected': -26.577198028564453, 'ref_logps/chosen': -23.281349182128906, 'epoch': 1.85} 62%|██████▏ | 245/396 [1:59:25<1:13:19, 29.13s/it] 62%|██████▏ | 246/396 [1:59:54<1:12:43, 29.09s/it] {'loss': 0.6064, 'learning_rate': 2.1067415730337076e-07, 'losses/dpo': 0.5233859419822693, 'losses/sft': 0.8136109709739685, 'losses/total': 0.5233859419822693, 'rewards/chosen': -0.3147951364517212, 'rewards/rejected': -0.6298030614852905, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.3150079846382141, 'logps/rejected': -35.08824920654297, 'logps/chosen': -27.422582626342773, 'ref_logps/rejected': -28.790220260620117, 
'ref_logps/chosen': -24.274629592895508, 'epoch': 1.86} 62%|██████▏ | 246/396 [1:59:54<1:12:43, 29.09s/it] 62%|██████▏ | 247/396 [2:00:23<1:12:06, 29.04s/it] {'loss': 0.5829, 'learning_rate': 2.0926966292134832e-07, 'losses/dpo': 0.5970532894134521, 'losses/sft': 0.8552703261375427, 'losses/total': 0.5970532894134521, 'rewards/chosen': -0.32263678312301636, 'rewards/rejected': -0.6629844903945923, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.34034764766693115, 'logps/rejected': -31.576181411743164, 'logps/chosen': -26.381507873535156, 'ref_logps/rejected': -24.94633674621582, 'ref_logps/chosen': -23.155136108398438, 'epoch': 1.86} 62%|██████▏ | 247/396 [2:00:23<1:12:06, 29.04s/it] 63%|██████▎ | 248/396 [2:00:52<1:11:51, 29.13s/it] {'loss': 0.6137, 'learning_rate': 2.0786516853932585e-07, 'losses/dpo': 0.6248607039451599, 'losses/sft': 0.8072177767753601, 'losses/total': 0.6248607039451599, 'rewards/chosen': -0.35640496015548706, 'rewards/rejected': -0.6030134558677673, 'rewards/accuracies': 0.625, 'rewards/margins': 0.24660846590995789, 'logps/rejected': -29.508312225341797, 'logps/chosen': -24.061811447143555, 'ref_logps/rejected': -23.47817611694336, 'ref_logps/chosen': -20.497760772705078, 'epoch': 1.87} 63%|██████▎ | 248/396 [2:00:52<1:11:51, 29.13s/it] 63%|██████▎ | 249/396 [2:01:22<1:11:33, 29.21s/it] {'loss': 0.5826, 'learning_rate': 2.0646067415730336e-07, 'losses/dpo': 0.5271694660186768, 'losses/sft': 1.0120395421981812, 'losses/total': 0.5271694660186768, 'rewards/chosen': -0.3308315873146057, 'rewards/rejected': -0.667506217956543, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.33667463064193726, 'logps/rejected': -35.16246032714844, 'logps/chosen': -29.165149688720703, 'ref_logps/rejected': -28.48740005493164, 'ref_logps/chosen': -25.856834411621094, 'epoch': 1.88} 63%|██████▎ | 249/396 [2:01:22<1:11:33, 29.21s/it] 63%|██████▎ | 250/396 [2:01:51<1:11:00, 29.18s/it] {'loss': 0.5345, 'learning_rate': 2.0505617977528089e-07, 'losses/dpo': 
0.5425952076911926, 'losses/sft': 0.9156839847564697, 'losses/total': 0.5425952076911926, 'rewards/chosen': -0.2904941737651825, 'rewards/rejected': -0.7790582776069641, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.48856407403945923, 'logps/rejected': -36.45195770263672, 'logps/chosen': -26.1055965423584, 'ref_logps/rejected': -28.661373138427734, 'ref_logps/chosen': -23.200654983520508, 'epoch': 1.89} 63%|██████▎ | 250/396 [2:01:51<1:11:00, 29.18s/it] 63%|██████▎ | 251/396 [2:02:20<1:10:19, 29.10s/it] {'loss': 0.5622, 'learning_rate': 2.0365168539325842e-07, 'losses/dpo': 0.6595858335494995, 'losses/sft': 0.8320033550262451, 'losses/total': 0.6595858335494995, 'rewards/chosen': -0.3515721559524536, 'rewards/rejected': -0.7586992383003235, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.4071270823478699, 'logps/rejected': -36.00947570800781, 'logps/chosen': -24.59746551513672, 'ref_logps/rejected': -28.422481536865234, 'ref_logps/chosen': -21.081745147705078, 'epoch': 1.89} 63%|██████▎ | 251/396 [2:02:20<1:10:19, 29.10s/it] 64%|██████▎ | 252/396 [2:02:49<1:09:45, 29.07s/it] {'loss': 0.5892, 'learning_rate': 2.0224719101123595e-07, 'losses/dpo': 0.5324288606643677, 'losses/sft': 1.0311552286148071, 'losses/total': 0.5324288606643677, 'rewards/chosen': -0.3219751715660095, 'rewards/rejected': -0.6442771553993225, 'rewards/accuracies': 0.65625, 'rewards/margins': 0.322301983833313, 'logps/rejected': -33.07604217529297, 'logps/chosen': -25.407838821411133, 'ref_logps/rejected': -26.633270263671875, 'ref_logps/chosen': -22.188087463378906, 'epoch': 1.9} 64%|██████▎ | 252/396 [2:02:49<1:09:45, 29.07s/it] 64%|██████▍ | 253/396 [2:03:18<1:09:09, 29.02s/it] {'loss': 0.5861, 'learning_rate': 2.0084269662921348e-07, 'losses/dpo': 0.6612842082977295, 'losses/sft': 0.8551939129829407, 'losses/total': 0.6612842082977295, 'rewards/chosen': -0.345088392496109, 'rewards/rejected': -0.6638616919517517, 'rewards/accuracies': 0.671875, 'rewards/margins': 
0.3187733292579651, 'logps/rejected': -33.34137725830078, 'logps/chosen': -26.190311431884766, 'ref_logps/rejected': -26.702760696411133, 'ref_logps/chosen': -22.73942756652832, 'epoch': 1.91} 64%|██████▍ | 253/396 [2:03:18<1:09:09, 29.02s/it] 64%|██████▍ | 254/396 [2:03:47<1:08:35, 28.98s/it] {'loss': 0.5511, 'learning_rate': 1.9943820224719098e-07, 'losses/dpo': 0.6082693338394165, 'losses/sft': 1.0973209142684937, 'losses/total': 0.6082693338394165, 'rewards/chosen': -0.3315678834915161, 'rewards/rejected': -0.7560808658599854, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.42451295256614685, 'logps/rejected': -33.776695251464844, 'logps/chosen': -27.615928649902344, 'ref_logps/rejected': -26.21588897705078, 'ref_logps/chosen': -24.300251007080078, 'epoch': 1.92} 64%|██████▍ | 254/396 [2:03:47<1:08:35, 28.98s/it] 64%|██████▍ | 255/396 [2:04:16<1:08:28, 29.14s/it] {'loss': 0.5919, 'learning_rate': 1.9803370786516854e-07, 'losses/dpo': 0.6389520168304443, 'losses/sft': 1.087360143661499, 'losses/total': 0.6389520168304443, 'rewards/chosen': -0.4226321578025818, 'rewards/rejected': -0.7562973499298096, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.33366525173187256, 'logps/rejected': -35.67947769165039, 'logps/chosen': -28.257335662841797, 'ref_logps/rejected': -28.11650276184082, 'ref_logps/chosen': -24.031015396118164, 'epoch': 1.92} 64%|██████▍ | 255/396 [2:04:16<1:08:28, 29.14s/it] 65%|██████▍ | 256/396 [2:04:45<1:07:49, 29.07s/it] {'loss': 0.5884, 'learning_rate': 1.9662921348314607e-07, 'losses/dpo': 0.5772832632064819, 'losses/sft': 1.0057258605957031, 'losses/total': 0.5772832632064819, 'rewards/chosen': -0.4189784526824951, 'rewards/rejected': -0.747002124786377, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.32802364230155945, 'logps/rejected': -33.91853713989258, 'logps/chosen': -27.326435089111328, 'ref_logps/rejected': -26.448516845703125, 'ref_logps/chosen': -23.13665008544922, 'epoch': 1.93} 65%|██████▍ | 256/396 
[2:04:45<1:07:49, 29.07s/it] 65%|██████▍ | 257/396 [2:05:14<1:07:27, 29.12s/it] {'loss': 0.5245, 'learning_rate': 1.952247191011236e-07, 'losses/dpo': 0.5826983451843262, 'losses/sft': 0.7670709490776062, 'losses/total': 0.5826983451843262, 'rewards/chosen': -0.2777270972728729, 'rewards/rejected': -0.7696323394775391, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.49190521240234375, 'logps/rejected': -34.923095703125, 'logps/chosen': -24.134462356567383, 'ref_logps/rejected': -27.226768493652344, 'ref_logps/chosen': -21.35719108581543, 'epoch': 1.94} 65%|██████▍ | 257/396 [2:05:14<1:07:27, 29.12s/it] 65%|██████▌ | 258/396 [2:05:43<1:06:49, 29.05s/it] {'loss': 0.5654, 'learning_rate': 1.938202247191011e-07, 'losses/dpo': 0.5832593441009521, 'losses/sft': 0.8260340094566345, 'losses/total': 0.5832593441009521, 'rewards/chosen': -0.3385101854801178, 'rewards/rejected': -0.7442533373832703, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.4057431221008301, 'logps/rejected': -36.34782791137695, 'logps/chosen': -24.894744873046875, 'ref_logps/rejected': -28.905296325683594, 'ref_logps/chosen': -21.5096435546875, 'epoch': 1.95} 65%|██████▌ | 258/396 [2:05:43<1:06:49, 29.05s/it] 65%|██████▌ | 259/396 [2:06:12<1:06:12, 28.99s/it] {'loss': 0.6189, 'learning_rate': 1.9241573033707863e-07, 'losses/dpo': 0.5586456060409546, 'losses/sft': 1.1363164186477661, 'losses/total': 0.5586456060409546, 'rewards/chosen': -0.430155873298645, 'rewards/rejected': -0.7084795236587524, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.27832359075546265, 'logps/rejected': -33.72222900390625, 'logps/chosen': -29.12051773071289, 'ref_logps/rejected': -26.637435913085938, 'ref_logps/chosen': -24.818958282470703, 'epoch': 1.95} 65%|██████▌ | 259/396 [2:06:12<1:06:12, 28.99s/it] 66%|██████▌ | 260/396 [2:06:41<1:05:44, 29.00s/it] {'loss': 0.5647, 'learning_rate': 1.9101123595505617e-07, 'losses/dpo': 0.6132915616035461, 'losses/sft': 0.8355939984321594, 'losses/total': 
0.6132915616035461, 'rewards/chosen': -0.37273097038269043, 'rewards/rejected': -0.758247971534729, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.3855169415473938, 'logps/rejected': -32.80144119262695, 'logps/chosen': -25.77654266357422, 'ref_logps/rejected': -25.218961715698242, 'ref_logps/chosen': -22.049232482910156, 'epoch': 1.96} 66%|██████▌ | 260/396 [2:06:41<1:05:44, 29.00s/it] 66%|██████▌ | 261/396 [2:07:11<1:05:30, 29.12s/it] {'loss': 0.5757, 'learning_rate': 1.896067415730337e-07, 'losses/dpo': 0.6402326822280884, 'losses/sft': 0.9358000159263611, 'losses/total': 0.6402326822280884, 'rewards/chosen': -0.3122865557670593, 'rewards/rejected': -0.6630844473838806, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.3507978618144989, 'logps/rejected': -33.18789291381836, 'logps/chosen': -27.173105239868164, 'ref_logps/rejected': -26.557050704956055, 'ref_logps/chosen': -24.05023956298828, 'epoch': 1.97} 66%|██████▌ | 261/396 [2:07:11<1:05:30, 29.12s/it] 66%|██████▌ | 262/396 [2:07:40<1:05:05, 29.14s/it] {'loss': 0.5844, 'learning_rate': 1.8820224719101123e-07, 'losses/dpo': 0.576771080493927, 'losses/sft': 0.8823024034500122, 'losses/total': 0.576771080493927, 'rewards/chosen': -0.35744667053222656, 'rewards/rejected': -0.7043651938438416, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.346918523311615, 'logps/rejected': -34.0608024597168, 'logps/chosen': -25.127092361450195, 'ref_logps/rejected': -27.01715087890625, 'ref_logps/chosen': -21.552627563476562, 'epoch': 1.98} 66%|██████▌ | 262/396 [2:07:40<1:05:05, 29.14s/it] 66%|██████▋ | 263/396 [2:08:09<1:04:26, 29.07s/it] {'loss': 0.5675, 'learning_rate': 1.8679775280898876e-07, 'losses/dpo': 0.5643225312232971, 'losses/sft': 0.7924672365188599, 'losses/total': 0.5643225312232971, 'rewards/chosen': -0.3473738133907318, 'rewards/rejected': -0.7239543199539185, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.37658050656318665, 'logps/rejected': -34.11262893676758, 'logps/chosen': 
-25.840179443359375, 'ref_logps/rejected': -26.873088836669922, 'ref_logps/chosen': -22.366439819335938, 'epoch': 1.98} 66%|██████▋ | 263/396 [2:08:09<1:04:26, 29.07s/it] 67%|██████▋ | 264/396 [2:08:38<1:04:00, 29.10s/it] {'loss': 0.5768, 'learning_rate': 1.853932584269663e-07, 'losses/dpo': 0.6149911880493164, 'losses/sft': 0.9512190222740173, 'losses/total': 0.6149911880493164, 'rewards/chosen': -0.33720535039901733, 'rewards/rejected': -0.6770392656326294, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.3398338854312897, 'logps/rejected': -30.944011688232422, 'logps/chosen': -24.64289093017578, 'ref_logps/rejected': -24.173620223999023, 'ref_logps/chosen': -21.270837783813477, 'epoch': 1.99} 67%|██████▋ | 264/396 [2:08:38<1:04:00, 29.10s/it] 67%|██████▋ | 265/396 [2:09:07<1:03:17, 28.99s/it] {'loss': 0.5407, 'learning_rate': 1.8398876404494382e-07, 'losses/dpo': 0.49823397397994995, 'losses/sft': 0.8145182132720947, 'losses/total': 0.49823397397994995, 'rewards/chosen': -0.3005535304546356, 'rewards/rejected': -0.7626761794090271, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.4621226191520691, 'logps/rejected': -35.905609130859375, 'logps/chosen': -26.05956268310547, 'ref_logps/rejected': -28.27884864807129, 'ref_logps/chosen': -23.054027557373047, 'epoch': 2.0} 67%|██████▋ | 265/396 [2:09:07<1:03:17, 28.99s/it] 67%|██████▋ | 266/396 [2:09:36<1:03:10, 29.16s/it] {'loss': 0.5301, 'learning_rate': 1.8258426966292135e-07, 'losses/dpo': 0.49924543499946594, 'losses/sft': 0.9444026350975037, 'losses/total': 0.49924543499946594, 'rewards/chosen': -0.3344343304634094, 'rewards/rejected': -0.8015338778495789, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.4670996069908142, 'logps/rejected': -33.71357345581055, 'logps/chosen': -24.907108306884766, 'ref_logps/rejected': -25.69823455810547, 'ref_logps/chosen': -21.562763214111328, 'epoch': 2.01} 67%|██████▋ | 266/396 [2:09:36<1:03:10, 29.16s/it] 67%|██████▋ | 267/396 [2:10:05<1:02:31, 29.08s/it] {'loss': 
0.5843, 'learning_rate': 1.8117977528089888e-07, 'losses/dpo': 0.6827691793441772, 'losses/sft': 0.9820384979248047, 'losses/total': 0.6827691793441772, 'rewards/chosen': -0.3843753933906555, 'rewards/rejected': -0.7438246607780457, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.35944926738739014, 'logps/rejected': -33.178436279296875, 'logps/chosen': -24.61281967163086, 'ref_logps/rejected': -25.740190505981445, 'ref_logps/chosen': -20.769065856933594, 'epoch': 2.02} 67%|██████▋ | 267/396 [2:10:05<1:02:31, 29.08s/it] 68%|██████▊ | 268/396 [2:10:34<1:01:58, 29.05s/it] {'loss': 0.5765, 'learning_rate': 1.7977528089887638e-07, 'losses/dpo': 0.48391562700271606, 'losses/sft': 0.9694733619689941, 'losses/total': 0.48391562700271606, 'rewards/chosen': -0.34718072414398193, 'rewards/rejected': -0.7052706480026245, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.3580899238586426, 'logps/rejected': -31.92254638671875, 'logps/chosen': -25.742042541503906, 'ref_logps/rejected': -24.869842529296875, 'ref_logps/chosen': -22.27023696899414, 'epoch': 2.02} 68%|██████▊ | 268/396 [2:10:34<1:01:58, 29.05s/it] 68%|██████▊ | 269/396 [2:11:03<1:01:26, 29.03s/it] {'loss': 0.5197, 'learning_rate': 1.7837078651685391e-07, 'losses/dpo': 0.566383957862854, 'losses/sft': 1.056198239326477, 'losses/total': 0.566383957862854, 'rewards/chosen': -0.2972065806388855, 'rewards/rejected': -0.7786551713943481, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.48144853115081787, 'logps/rejected': -33.92596435546875, 'logps/chosen': -24.76668930053711, 'ref_logps/rejected': -26.1394100189209, 'ref_logps/chosen': -21.79462432861328, 'epoch': 2.03} 68%|██████▊ | 269/396 [2:11:03<1:01:26, 29.03s/it] 68%|██████▊ | 270/396 [2:11:32<1:01:16, 29.18s/it] {'loss': 0.554, 'learning_rate': 1.7696629213483144e-07, 'losses/dpo': 0.5455434322357178, 'losses/sft': 0.9091237783432007, 'losses/total': 0.5455434322357178, 'rewards/chosen': -0.3816927969455719, 'rewards/rejected': -0.7982731461524963, 
'rewards/accuracies': 0.765625, 'rewards/margins': 0.4165803790092468, 'logps/rejected': -32.83625030517578, 'logps/chosen': -25.022621154785156, 'ref_logps/rejected': -24.853519439697266, 'ref_logps/chosen': -21.205692291259766, 'epoch': 2.04} 68%|██████▊ | 270/396 [2:11:32<1:01:16, 29.18s/it] 68%|██████▊ | 271/396 [2:12:01<1:00:37, 29.10s/it] {'loss': 0.5526, 'learning_rate': 1.75561797752809e-07, 'losses/dpo': 0.7876778841018677, 'losses/sft': 1.1023296117782593, 'losses/total': 0.7876778841018677, 'rewards/chosen': -0.39987578988075256, 'rewards/rejected': -0.8122704029083252, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.41239458322525024, 'logps/rejected': -35.007415771484375, 'logps/chosen': -27.038639068603516, 'ref_logps/rejected': -26.884708404541016, 'ref_logps/chosen': -23.039878845214844, 'epoch': 2.05} 68%|██████▊ | 271/396 [2:12:01<1:00:37, 29.10s/it] 69%|██████▊ | 272/396 [2:12:30<1:00:06, 29.09s/it] {'loss': 0.5444, 'learning_rate': 1.741573033707865e-07, 'losses/dpo': 0.4805631637573242, 'losses/sft': 0.8787716031074524, 'losses/total': 0.4805631637573242, 'rewards/chosen': -0.3746855556964874, 'rewards/rejected': -0.8824833631515503, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.5077978372573853, 'logps/rejected': -36.53676223754883, 'logps/chosen': -26.300579071044922, 'ref_logps/rejected': -27.711929321289062, 'ref_logps/chosen': -22.55372428894043, 'epoch': 2.05} 69%|██████▊ | 272/396 [2:12:30<1:00:06, 29.09s/it] 69%|██████▉ | 273/396 [2:13:00<59:37, 29.08s/it] {'loss': 0.4883, 'learning_rate': 1.7275280898876404e-07, 'losses/dpo': 0.5499591827392578, 'losses/sft': 1.1995720863342285, 'losses/total': 0.5499591827392578, 'rewards/chosen': -0.37463241815567017, 'rewards/rejected': -0.9740023612976074, 'rewards/accuracies': 0.828125, 'rewards/margins': 0.5993699431419373, 'logps/rejected': -39.48854064941406, 'logps/chosen': -27.612911224365234, 'ref_logps/rejected': -29.748516082763672, 'ref_logps/chosen': -23.866586685180664, 
'epoch': 2.06} 69%|██████▉ | 273/396 [2:13:00<59:37, 29.08s/it] 69%|██████▉ | 274/396 [2:13:28<59:02, 29.03s/it] {'loss': 0.5223, 'learning_rate': 1.7134831460674157e-07, 'losses/dpo': 0.5853086113929749, 'losses/sft': 0.9450937509536743, 'losses/total': 0.5853086113929749, 'rewards/chosen': -0.4134795069694519, 'rewards/rejected': -0.9367015957832336, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.5232220888137817, 'logps/rejected': -38.46211242675781, 'logps/chosen': -28.848485946655273, 'ref_logps/rejected': -29.095096588134766, 'ref_logps/chosen': -24.71368980407715, 'epoch': 2.07} 69%|██████▉ | 274/396 [2:13:28<59:02, 29.03s/it] 69%|██████▉ | 275/396 [2:13:57<58:25, 28.97s/it] {'loss': 0.5583, 'learning_rate': 1.699438202247191e-07, 'losses/dpo': 0.6550332307815552, 'losses/sft': 0.844421923160553, 'losses/total': 0.6550332307815552, 'rewards/chosen': -0.4007226824760437, 'rewards/rejected': -0.8187602758407593, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.4180375933647156, 'logps/rejected': -33.2642707824707, 'logps/chosen': -26.53584861755371, 'ref_logps/rejected': -25.07666778564453, 'ref_logps/chosen': -22.528621673583984, 'epoch': 2.08} 69%|██████▉ | 275/396 [2:13:57<58:25, 28.97s/it] 70%|██████▉ | 276/396 [2:14:26<57:53, 28.95s/it] {'loss': 0.5267, 'learning_rate': 1.6853932584269663e-07, 'losses/dpo': 0.37509262561798096, 'losses/sft': 0.9286944270133972, 'losses/total': 0.37509262561798096, 'rewards/chosen': -0.39673954248428345, 'rewards/rejected': -0.8922891616821289, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.4955495595932007, 'logps/rejected': -36.43919372558594, 'logps/chosen': -26.93305778503418, 'ref_logps/rejected': -27.516300201416016, 'ref_logps/chosen': -22.965662002563477, 'epoch': 2.08} 70%|██████▉ | 276/396 [2:14:26<57:53, 28.95s/it] 70%|██████▉ | 277/396 [2:14:55<57:24, 28.94s/it] {'loss': 0.585, 'learning_rate': 1.6713483146067413e-07, 'losses/dpo': 0.45891374349594116, 'losses/sft': 0.8818660378456116, 
'losses/total': 0.45891374349594116, 'rewards/chosen': -0.4142143726348877, 'rewards/rejected': -0.7912107110023499, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.37699633836746216, 'logps/rejected': -33.23160934448242, 'logps/chosen': -27.517230987548828, 'ref_logps/rejected': -25.319503784179688, 'ref_logps/chosen': -23.37508773803711, 'epoch': 2.09} 70%|██████▉ | 277/396 [2:14:55<57:24, 28.94s/it] 70%|███████ | 278/396 [2:15:24<56:55, 28.94s/it] {'loss': 0.5569, 'learning_rate': 1.6573033707865166e-07, 'losses/dpo': 0.6695871353149414, 'losses/sft': 1.1478632688522339, 'losses/total': 0.6695871353149414, 'rewards/chosen': -0.4053817391395569, 'rewards/rejected': -0.8484681844711304, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.4430864751338959, 'logps/rejected': -37.009605407714844, 'logps/chosen': -29.848949432373047, 'ref_logps/rejected': -28.52492332458496, 'ref_logps/chosen': -25.79513168334961, 'epoch': 2.1} 70%|███████ | 278/396 [2:15:24<56:55, 28.94s/it] 70%|███████ | 279/396 [2:15:53<56:23, 28.92s/it] {'loss': 0.5853, 'learning_rate': 1.6432584269662922e-07, 'losses/dpo': 0.6266674995422363, 'losses/sft': 0.9419240951538086, 'losses/total': 0.6266674995422363, 'rewards/chosen': -0.3720143437385559, 'rewards/rejected': -0.734940767288208, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.3629264533519745, 'logps/rejected': -33.84664535522461, 'logps/chosen': -26.847461700439453, 'ref_logps/rejected': -26.497238159179688, 'ref_logps/chosen': -23.1273193359375, 'epoch': 2.11} 70%|███████ | 279/396 [2:15:53<56:23, 28.92s/it] 71%|███████ | 280/396 [2:16:22<55:53, 28.91s/it] {'loss': 0.5277, 'learning_rate': 1.6292134831460675e-07, 'losses/dpo': 0.5965819358825684, 'losses/sft': 1.0364360809326172, 'losses/total': 0.5965819358825684, 'rewards/chosen': -0.43689805269241333, 'rewards/rejected': -0.9301656484603882, 'rewards/accuracies': 0.7890625, 'rewards/margins': 0.49326756596565247, 'logps/rejected': -37.79768371582031, 'logps/chosen': 
-25.227951049804688, 'ref_logps/rejected': -28.49602508544922, 'ref_logps/chosen': -20.85896873474121, 'epoch': 2.11} 71%|███████ | 280/396 [2:16:22<55:53, 28.91s/it] 71%|███████ | 281/396 [2:16:51<55:33, 28.99s/it] {'loss': 0.5305, 'learning_rate': 1.6151685393258428e-07, 'losses/dpo': 0.5456879138946533, 'losses/sft': 0.8692267537117004, 'losses/total': 0.5456879138946533, 'rewards/chosen': -0.36411628127098083, 'rewards/rejected': -0.850979208946228, 'rewards/accuracies': 0.75, 'rewards/margins': 0.4868628680706024, 'logps/rejected': -35.28973388671875, 'logps/chosen': -25.945070266723633, 'ref_logps/rejected': -26.77994155883789, 'ref_logps/chosen': -22.303909301757812, 'epoch': 2.12} 71%|███████ | 281/396 [2:16:51<55:33, 28.99s/it] 71%|███████ | 282/396 [2:17:20<55:04, 28.99s/it] {'loss': 0.5766, 'learning_rate': 1.6011235955056178e-07, 'losses/dpo': 0.6054384708404541, 'losses/sft': 0.9599564671516418, 'losses/total': 0.6054384708404541, 'rewards/chosen': -0.4504254460334778, 'rewards/rejected': -0.8170950412750244, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.36666956543922424, 'logps/rejected': -36.41142272949219, 'logps/chosen': -28.660266876220703, 'ref_logps/rejected': -28.24047088623047, 'ref_logps/chosen': -24.156015396118164, 'epoch': 2.13} 71%|███████ | 282/396 [2:17:20<55:04, 28.99s/it] 71%|███████▏ | 283/396 [2:17:49<54:32, 28.96s/it] {'loss': 0.5215, 'learning_rate': 1.5870786516853931e-07, 'losses/dpo': 0.463223397731781, 'losses/sft': 1.041387915611267, 'losses/total': 0.463223397731781, 'rewards/chosen': -0.4138358533382416, 'rewards/rejected': -0.9703004360198975, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5564644932746887, 'logps/rejected': -38.50691604614258, 'logps/chosen': -27.74228858947754, 'ref_logps/rejected': -28.803909301757812, 'ref_logps/chosen': -23.603931427001953, 'epoch': 2.14} 71%|███████▏ | 283/396 [2:17:49<54:32, 28.96s/it] 72%|███████▏ | 284/396 [2:18:18<54:02, 28.95s/it] {'loss': 0.5266, 'learning_rate': 
1.5730337078651685e-07, 'losses/dpo': 0.6279169321060181, 'losses/sft': 0.8709256052970886, 'losses/total': 0.6279169321060181, 'rewards/chosen': -0.4162542223930359, 'rewards/rejected': -0.9086115956306458, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.4923573136329651, 'logps/rejected': -35.91729736328125, 'logps/chosen': -24.93131446838379, 'ref_logps/rejected': -26.83118438720703, 'ref_logps/chosen': -20.768774032592773, 'epoch': 2.14} 72%|███████▏ | 284/396 [2:18:18<54:02, 28.95s/it] 72%|███████▏ | 285/396 [2:18:47<53:31, 28.93s/it] {'loss': 0.5687, 'learning_rate': 1.5589887640449438e-07, 'losses/dpo': 0.5966840386390686, 'losses/sft': 0.9412966966629028, 'losses/total': 0.5966840386390686, 'rewards/chosen': -0.48891347646713257, 'rewards/rejected': -0.898935079574585, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.4100216031074524, 'logps/rejected': -38.57915496826172, 'logps/chosen': -27.571338653564453, 'ref_logps/rejected': -29.58980369567871, 'ref_logps/chosen': -22.682205200195312, 'epoch': 2.15} 72%|███████▏ | 285/396 [2:18:47<53:31, 28.93s/it] 72%|███████▏ | 286/396 [2:19:16<53:04, 28.95s/it] {'loss': 0.5902, 'learning_rate': 1.5449438202247188e-07, 'losses/dpo': 0.7199227213859558, 'losses/sft': 0.9989073276519775, 'losses/total': 0.7199227213859558, 'rewards/chosen': -0.4336766302585602, 'rewards/rejected': -0.7782501578330994, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.3445735573768616, 'logps/rejected': -33.395118713378906, 'logps/chosen': -25.46674346923828, 'ref_logps/rejected': -25.61261749267578, 'ref_logps/chosen': -21.129976272583008, 'epoch': 2.16} 72%|███████▏ | 286/396 [2:19:16<53:04, 28.95s/it] 72%|███████▏ | 287/396 [2:19:45<52:53, 29.11s/it] {'loss': 0.5101, 'learning_rate': 1.5308988764044944e-07, 'losses/dpo': 0.42156773805618286, 'losses/sft': 0.824786365032196, 'losses/total': 0.42156773805618286, 'rewards/chosen': -0.46706122159957886, 'rewards/rejected': -1.0249050855636597, 'rewards/accuracies': 0.8046875, 
'rewards/margins': 0.557843804359436, 'logps/rejected': -39.50514221191406, 'logps/chosen': -26.559568405151367, 'ref_logps/rejected': -29.256093978881836, 'ref_logps/chosen': -21.88895606994629, 'epoch': 2.17} 72%|███████▏ | 287/396 [2:19:45<52:53, 29.11s/it] 73%|███████▎ | 288/396 [2:20:14<52:15, 29.03s/it] {'loss': 0.5852, 'learning_rate': 1.5168539325842697e-07, 'losses/dpo': 0.7073966264724731, 'losses/sft': 0.959773600101471, 'losses/total': 0.7073966264724731, 'rewards/chosen': -0.509972333908081, 'rewards/rejected': -0.8988049030303955, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.38883259892463684, 'logps/rejected': -35.846214294433594, 'logps/chosen': -27.430574417114258, 'ref_logps/rejected': -26.85816192626953, 'ref_logps/chosen': -22.33085060119629, 'epoch': 2.17} 73%|███████▎ | 288/396 [2:20:14<52:15, 29.03s/it] 73%|███████▎ | 289/396 [2:20:43<51:45, 29.02s/it] {'loss': 0.5553, 'learning_rate': 1.502808988764045e-07, 'losses/dpo': 0.5419721603393555, 'losses/sft': 0.940202534198761, 'losses/total': 0.5419721603393555, 'rewards/chosen': -0.4574929475784302, 'rewards/rejected': -0.9506438970565796, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.4931509494781494, 'logps/rejected': -37.50861358642578, 'logps/chosen': -25.799861907958984, 'ref_logps/rejected': -28.002174377441406, 'ref_logps/chosen': -21.224933624267578, 'epoch': 2.18} 73%|███████▎ | 289/396 [2:20:43<51:45, 29.02s/it] 73%|███████▎ | 290/396 [2:21:12<51:19, 29.05s/it] {'loss': 0.5921, 'learning_rate': 1.4887640449438203e-07, 'losses/dpo': 0.6595107913017273, 'losses/sft': 1.0057413578033447, 'losses/total': 0.6595107913017273, 'rewards/chosen': -0.4718025326728821, 'rewards/rejected': -0.8120721578598022, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.34026968479156494, 'logps/rejected': -34.21430206298828, 'logps/chosen': -27.79110336303711, 'ref_logps/rejected': -26.093578338623047, 'ref_logps/chosen': -23.073078155517578, 'epoch': 2.19} 73%|███████▎ | 290/396 
[2:21:12<51:19, 29.05s/it] 73%|███████▎ | 291/396 [2:21:41<50:55, 29.10s/it] {'loss': 0.5263, 'learning_rate': 1.4747191011235953e-07, 'losses/dpo': 0.47728973627090454, 'losses/sft': 1.0133030414581299, 'losses/total': 0.47728973627090454, 'rewards/chosen': -0.38914480805397034, 'rewards/rejected': -0.9225608110427856, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.5334160327911377, 'logps/rejected': -37.56932830810547, 'logps/chosen': -26.380355834960938, 'ref_logps/rejected': -28.34372329711914, 'ref_logps/chosen': -22.488906860351562, 'epoch': 2.2} 73%|███████▎ | 291/396 [2:21:41<50:55, 29.10s/it] 74%|███████▎ | 292/396 [2:22:10<50:18, 29.02s/it] {'loss': 0.5417, 'learning_rate': 1.4606741573033706e-07, 'losses/dpo': 0.7257384061813354, 'losses/sft': 1.2120591402053833, 'losses/total': 0.7257384061813354, 'rewards/chosen': -0.48753368854522705, 'rewards/rejected': -0.9631930589675903, 'rewards/accuracies': 0.75, 'rewards/margins': 0.47565943002700806, 'logps/rejected': -37.092594146728516, 'logps/chosen': -27.006134033203125, 'ref_logps/rejected': -27.460662841796875, 'ref_logps/chosen': -22.130794525146484, 'epoch': 2.2} 74%|███████▎ | 292/396 [2:22:10<50:18, 29.02s/it] 74%|███████▍ | 293/396 [2:22:39<49:47, 29.01s/it] {'loss': 0.5381, 'learning_rate': 1.446629213483146e-07, 'losses/dpo': 0.6313049793243408, 'losses/sft': 0.9201721549034119, 'losses/total': 0.6313049793243408, 'rewards/chosen': -0.3674505054950714, 'rewards/rejected': -0.8836201429367065, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.5161697268486023, 'logps/rejected': -37.16014099121094, 'logps/chosen': -25.57880210876465, 'ref_logps/rejected': -28.32394027709961, 'ref_logps/chosen': -21.904296875, 'epoch': 2.21} 74%|███████▍ | 293/396 [2:22:39<49:47, 29.01s/it] 74%|███████▍ | 294/396 [2:23:08<49:15, 28.97s/it] {'loss': 0.5429, 'learning_rate': 1.4325842696629212e-07, 'losses/dpo': 0.5216307044029236, 'losses/sft': 1.0138075351715088, 'losses/total': 0.5216307044029236, 
'rewards/chosen': -0.4237874746322632, 'rewards/rejected': -0.9241256713867188, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.5003381967544556, 'logps/rejected': -37.283538818359375, 'logps/chosen': -24.601829528808594, 'ref_logps/rejected': -28.042282104492188, 'ref_logps/chosen': -20.36395263671875, 'epoch': 2.22} 74%|███████▍ | 294/396 [2:23:08<49:15, 28.97s/it] 74%|███████▍ | 295/396 [2:23:37<48:52, 29.03s/it] {'loss': 0.5894, 'learning_rate': 1.4185393258426968e-07, 'losses/dpo': 0.43838924169540405, 'losses/sft': 1.2099077701568604, 'losses/total': 0.43838924169540405, 'rewards/chosen': -0.5356131792068481, 'rewards/rejected': -0.9591414332389832, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.423528254032135, 'logps/rejected': -36.47199249267578, 'logps/chosen': -28.58258819580078, 'ref_logps/rejected': -26.880578994750977, 'ref_logps/chosen': -23.226455688476562, 'epoch': 2.23} 74%|███████▍ | 295/396 [2:23:37<48:52, 29.03s/it] 75%|███████▍ | 296/396 [2:24:06<48:21, 29.01s/it] {'loss': 0.5472, 'learning_rate': 1.4044943820224718e-07, 'losses/dpo': 0.6286274790763855, 'losses/sft': 1.1655751466751099, 'losses/total': 0.6286274790763855, 'rewards/chosen': -0.4692240357398987, 'rewards/rejected': -0.9575324058532715, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.48830845952033997, 'logps/rejected': -36.5651741027832, 'logps/chosen': -28.39300537109375, 'ref_logps/rejected': -26.989849090576172, 'ref_logps/chosen': -23.700767517089844, 'epoch': 2.23} 75%|███████▍ | 296/396 [2:24:06<48:21, 29.01s/it] 75%|███████▌ | 297/396 [2:24:36<48:07, 29.17s/it] {'loss': 0.5581, 'learning_rate': 1.3904494382022472e-07, 'losses/dpo': 0.425361692905426, 'losses/sft': 1.129596471786499, 'losses/total': 0.425361692905426, 'rewards/chosen': -0.47656214237213135, 'rewards/rejected': -0.9194119572639465, 'rewards/accuracies': 0.671875, 'rewards/margins': 0.4428498148918152, 'logps/rejected': -34.147830963134766, 'logps/chosen': -27.01761245727539, 
'ref_logps/rejected': -24.953710556030273, 'ref_logps/chosen': -22.251991271972656, 'epoch': 2.24} 75%|███████▌ | 297/396 [2:24:36<48:07, 29.17s/it] 75%|███████▌ | 298/396 [2:25:05<47:53, 29.32s/it] {'loss': 0.5111, 'learning_rate': 1.3764044943820225e-07, 'losses/dpo': 0.5578055381774902, 'losses/sft': 1.1197444200515747, 'losses/total': 0.5578055381774902, 'rewards/chosen': -0.4615376591682434, 'rewards/rejected': -1.0309226512908936, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.5693849325180054, 'logps/rejected': -39.81676483154297, 'logps/chosen': -27.92938995361328, 'ref_logps/rejected': -29.507539749145508, 'ref_logps/chosen': -23.314014434814453, 'epoch': 2.25} 75%|███████▌ | 298/396 [2:25:05<47:53, 29.32s/it] 76%|███████▌ | 299/396 [2:25:34<47:17, 29.25s/it] {'loss': 0.5499, 'learning_rate': 1.3623595505617978e-07, 'losses/dpo': 0.4847102165222168, 'losses/sft': 0.989621639251709, 'losses/total': 0.4847102165222168, 'rewards/chosen': -0.45832377672195435, 'rewards/rejected': -0.8928664922714233, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.43454277515411377, 'logps/rejected': -35.69133758544922, 'logps/chosen': -27.910144805908203, 'ref_logps/rejected': -26.762676239013672, 'ref_logps/chosen': -23.326908111572266, 'epoch': 2.26} 76%|███████▌ | 299/396 [2:25:34<47:17, 29.25s/it] 76%|███████▌ | 300/396 [2:26:04<46:49, 29.27s/it] {'loss': 0.5935, 'learning_rate': 1.3483146067415728e-07, 'losses/dpo': 0.5905570983886719, 'losses/sft': 1.0464057922363281, 'losses/total': 0.5905570983886719, 'rewards/chosen': -0.5023964643478394, 'rewards/rejected': -0.8857850432395935, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.38338857889175415, 'logps/rejected': -37.3542366027832, 'logps/chosen': -28.233020782470703, 'ref_logps/rejected': -28.49638557434082, 'ref_logps/chosen': -23.20905303955078, 'epoch': 2.26} 76%|███████▌ | 300/396 [2:26:04<46:49, 29.27s/it] 76%|███████▌ | 301/396 [2:26:33<46:10, 29.16s/it] {'loss': 0.5608, 'learning_rate': 
1.334269662921348e-07, 'losses/dpo': 0.5518324375152588, 'losses/sft': 0.9761526584625244, 'losses/total': 0.5518324375152588, 'rewards/chosen': -0.4891367554664612, 'rewards/rejected': -0.9409228563308716, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.451786071062088, 'logps/rejected': -36.25569152832031, 'logps/chosen': -29.44438934326172, 'ref_logps/rejected': -26.846464157104492, 'ref_logps/chosen': -24.553022384643555, 'epoch': 2.27} 76%|███████▌ | 301/396 [2:26:33<46:10, 29.16s/it] 76%|███████▋ | 302/396 [2:27:02<45:32, 29.07s/it] {'loss': 0.5463, 'learning_rate': 1.3202247191011234e-07, 'losses/dpo': 0.5125599503517151, 'losses/sft': 0.9747940897941589, 'losses/total': 0.5125599503517151, 'rewards/chosen': -0.5118017196655273, 'rewards/rejected': -0.9915178418159485, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.47971609234809875, 'logps/rejected': -37.83529281616211, 'logps/chosen': -29.32979393005371, 'ref_logps/rejected': -27.92011260986328, 'ref_logps/chosen': -24.211776733398438, 'epoch': 2.28} 76%|███████▋ | 302/396 [2:27:02<45:32, 29.07s/it] 77%|███████▋ | 303/396 [2:27:31<45:01, 29.05s/it] {'loss': 0.575, 'learning_rate': 1.306179775280899e-07, 'losses/dpo': 0.5703020095825195, 'losses/sft': 0.9395530223846436, 'losses/total': 0.5703020095825195, 'rewards/chosen': -0.49919426441192627, 'rewards/rejected': -0.9130042791366577, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.4138101041316986, 'logps/rejected': -35.418338775634766, 'logps/chosen': -27.11697769165039, 'ref_logps/rejected': -26.28829574584961, 'ref_logps/chosen': -22.12503433227539, 'epoch': 2.29} 77%|███████▋ | 303/396 [2:27:31<45:01, 29.05s/it] 77%|███████▋ | 304/396 [2:27:59<44:30, 29.02s/it] {'loss': 0.5843, 'learning_rate': 1.2921348314606743e-07, 'losses/dpo': 0.4914831221103668, 'losses/sft': 0.9517439603805542, 'losses/total': 0.4914831221103668, 'rewards/chosen': -0.5896263122558594, 'rewards/rejected': -0.9848057627677917, 'rewards/accuracies': 0.734375, 
'rewards/margins': 0.3951793909072876, 'logps/rejected': -38.8961181640625, 'logps/chosen': -31.24747085571289, 'ref_logps/rejected': -29.048057556152344, 'ref_logps/chosen': -25.351207733154297, 'epoch': 2.29} 77%|███████▋ | 304/396 [2:27:59<44:30, 29.02s/it] 77%|███████▋ | 305/396 [2:28:29<44:02, 29.04s/it] {'loss': 0.5261, 'learning_rate': 1.2780898876404493e-07, 'losses/dpo': 0.4620394706726074, 'losses/sft': 1.0134756565093994, 'losses/total': 0.4620394706726074, 'rewards/chosen': -0.4622839689254761, 'rewards/rejected': -0.9988985061645508, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5366144776344299, 'logps/rejected': -37.481964111328125, 'logps/chosen': -27.757152557373047, 'ref_logps/rejected': -27.492977142333984, 'ref_logps/chosen': -23.13431167602539, 'epoch': 2.3} 77%|███████▋ | 305/396 [2:28:29<44:02, 29.04s/it] 77%|███████▋ | 306/396 [2:28:58<43:33, 29.04s/it] {'loss': 0.5636, 'learning_rate': 1.2640449438202246e-07, 'losses/dpo': 0.5714951753616333, 'losses/sft': 0.9859296679496765, 'losses/total': 0.5714951753616333, 'rewards/chosen': -0.452495813369751, 'rewards/rejected': -0.882956862449646, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.4304611086845398, 'logps/rejected': -34.95043182373047, 'logps/chosen': -26.990705490112305, 'ref_logps/rejected': -26.120864868164062, 'ref_logps/chosen': -22.465744018554688, 'epoch': 2.31} 77%|███████▋ | 306/396 [2:28:58<43:33, 29.04s/it] 78%|███████▊ | 307/396 [2:29:27<43:08, 29.09s/it] {'loss': 0.5508, 'learning_rate': 1.25e-07, 'losses/dpo': 0.5849748253822327, 'losses/sft': 0.9925932288169861, 'losses/total': 0.5849748253822327, 'rewards/chosen': -0.5496042370796204, 'rewards/rejected': -1.0864741802215576, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5368699431419373, 'logps/rejected': -40.363712310791016, 'logps/chosen': -29.839576721191406, 'ref_logps/rejected': -29.49897003173828, 'ref_logps/chosen': -24.343534469604492, 'epoch': 2.32} 78%|███████▊ | 307/396 [2:29:27<43:08, 29.09s/it] 
78%|███████▊ | 308/396 [2:29:56<42:37, 29.06s/it] {'loss': 0.5537, 'learning_rate': 1.2359550561797752e-07, 'losses/dpo': 0.43898260593414307, 'losses/sft': 0.8520787954330444, 'losses/total': 0.43898260593414307, 'rewards/chosen': -0.4556504487991333, 'rewards/rejected': -0.9059818387031555, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.450331449508667, 'logps/rejected': -35.485931396484375, 'logps/chosen': -26.925609588623047, 'ref_logps/rejected': -26.426111221313477, 'ref_logps/chosen': -22.369102478027344, 'epoch': 2.32} 78%|███████▊ | 308/396 [2:29:56<42:37, 29.06s/it] 78%|███████▊ | 309/396 [2:30:25<42:09, 29.08s/it] {'loss': 0.6099, 'learning_rate': 1.2219101123595506e-07, 'losses/dpo': 0.6877168416976929, 'losses/sft': 0.8925371766090393, 'losses/total': 0.6877168416976929, 'rewards/chosen': -0.5292750597000122, 'rewards/rejected': -0.8839576840400696, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.3546826243400574, 'logps/rejected': -34.86455535888672, 'logps/chosen': -27.308425903320312, 'ref_logps/rejected': -26.024978637695312, 'ref_logps/chosen': -22.015674591064453, 'epoch': 2.33} 78%|███████▊ | 309/396 [2:30:25<42:09, 29.08s/it] 78%|███████▊ | 310/396 [2:30:54<41:48, 29.17s/it] {'loss': 0.5451, 'learning_rate': 1.2078651685393259e-07, 'losses/dpo': 0.4608323574066162, 'losses/sft': 1.068372130393982, 'losses/total': 0.4608323574066162, 'rewards/chosen': -0.4308522939682007, 'rewards/rejected': -0.9059160947799683, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.4750638008117676, 'logps/rejected': -34.355613708496094, 'logps/chosen': -27.23873519897461, 'ref_logps/rejected': -25.29645538330078, 'ref_logps/chosen': -22.93021011352539, 'epoch': 2.34} 78%|███████▊ | 310/396 [2:30:54<41:48, 29.17s/it] 79%|███████▊ | 311/396 [2:31:23<41:14, 29.11s/it] {'loss': 0.5435, 'learning_rate': 1.1938202247191012e-07, 'losses/dpo': 0.49991002678871155, 'losses/sft': 0.9416501522064209, 'losses/total': 0.49991002678871155, 'rewards/chosen': 
-0.4453369379043579, 'rewards/rejected': -0.9242523312568665, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.47891533374786377, 'logps/rejected': -36.60034942626953, 'logps/chosen': -27.410297393798828, 'ref_logps/rejected': -27.357826232910156, 'ref_logps/chosen': -22.95693016052246, 'epoch': 2.35} 79%|███████▊ | 311/396 [2:31:23<41:14, 29.11s/it] 79%|███████▉ | 312/396 [2:31:52<40:39, 29.04s/it] {'loss': 0.5108, 'learning_rate': 1.1797752808988763e-07, 'losses/dpo': 0.49980589747428894, 'losses/sft': 0.8830540776252747, 'losses/total': 0.49980589747428894, 'rewards/chosen': -0.4101608395576477, 'rewards/rejected': -0.9781473875045776, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.5679866075515747, 'logps/rejected': -35.989906311035156, 'logps/chosen': -26.075031280517578, 'ref_logps/rejected': -26.20843505859375, 'ref_logps/chosen': -21.97342300415039, 'epoch': 2.35} 79%|███████▉ | 312/396 [2:31:52<40:39, 29.04s/it] 79%|███████▉ | 313/396 [2:32:21<40:05, 28.99s/it] {'loss': 0.535, 'learning_rate': 1.1657303370786515e-07, 'losses/dpo': 0.5506036281585693, 'losses/sft': 0.842628002166748, 'losses/total': 0.5506036281585693, 'rewards/chosen': -0.4628047049045563, 'rewards/rejected': -0.9872736930847168, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.5244689583778381, 'logps/rejected': -36.756141662597656, 'logps/chosen': -27.034866333007812, 'ref_logps/rejected': -26.883403778076172, 'ref_logps/chosen': -22.406818389892578, 'epoch': 2.36} 79%|███████▉ | 313/396 [2:32:21<40:05, 28.99s/it] 79%|███████▉ | 314/396 [2:32:50<39:42, 29.06s/it] {'loss': 0.6002, 'learning_rate': 1.151685393258427e-07, 'losses/dpo': 0.643724262714386, 'losses/sft': 0.86636883020401, 'losses/total': 0.643724262714386, 'rewards/chosen': -0.4378345012664795, 'rewards/rejected': -0.7820788621902466, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.3442443907260895, 'logps/rejected': -32.58570861816406, 'logps/chosen': -24.555809020996094, 'ref_logps/rejected': 
-24.76491928100586, 'ref_logps/chosen': -20.17746353149414, 'epoch': 2.37} 79%|███████▉ | 314/396 [2:32:50<39:42, 29.06s/it] 80%|███████▉ | 315/396 [2:33:19<39:13, 29.05s/it] {'loss': 0.529, 'learning_rate': 1.1376404494382023e-07, 'losses/dpo': 0.5676740407943726, 'losses/sft': 0.8977797627449036, 'losses/total': 0.5676740407943726, 'rewards/chosen': -0.4626464247703552, 'rewards/rejected': -0.9720257520675659, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.5093792676925659, 'logps/rejected': -34.9397087097168, 'logps/chosen': -23.88672637939453, 'ref_logps/rejected': -25.219451904296875, 'ref_logps/chosen': -19.26026153564453, 'epoch': 2.38} 80%|███████▉ | 315/396 [2:33:19<39:13, 29.05s/it] 80%|███████▉ | 316/396 [2:33:48<38:46, 29.08s/it] {'loss': 0.5376, 'learning_rate': 1.1235955056179774e-07, 'losses/dpo': 0.4975647032260895, 'losses/sft': 1.098832130432129, 'losses/total': 0.4975647032260895, 'rewards/chosen': -0.5325409173965454, 'rewards/rejected': -1.0453509092330933, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5128099918365479, 'logps/rejected': -38.62759017944336, 'logps/chosen': -28.308603286743164, 'ref_logps/rejected': -28.174081802368164, 'ref_logps/chosen': -22.983192443847656, 'epoch': 2.38} 80%|███████▉ | 316/396 [2:33:48<38:46, 29.08s/it] 80%|████████ | 317/396 [2:34:17<38:11, 29.00s/it] {'loss': 0.6085, 'learning_rate': 1.1095505617977527e-07, 'losses/dpo': 0.6205140352249146, 'losses/sft': 1.0714130401611328, 'losses/total': 0.6205140352249146, 'rewards/chosen': -0.5933888554573059, 'rewards/rejected': -0.9132209420204163, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.31983205676078796, 'logps/rejected': -34.92308807373047, 'logps/chosen': -29.64281463623047, 'ref_logps/rejected': -25.790882110595703, 'ref_logps/chosen': -23.70892333984375, 'epoch': 2.39} 80%|████████ | 317/396 [2:34:17<38:11, 29.00s/it] 80%|████████ | 318/396 [2:34:46<37:38, 28.95s/it] {'loss': 0.5594, 'learning_rate': 1.095505617977528e-07, 'losses/dpo': 
0.46756136417388916, 'losses/sft': 1.0184146165847778, 'losses/total': 0.46756136417388916, 'rewards/chosen': -0.5094340443611145, 'rewards/rejected': -0.9681426882743835, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.45870864391326904, 'logps/rejected': -35.160831451416016, 'logps/chosen': -27.25480079650879, 'ref_logps/rejected': -25.47940444946289, 'ref_logps/chosen': -22.160459518432617, 'epoch': 2.4} 80%|████████ | 318/396 [2:34:46<37:38, 28.95s/it] 81%|████████ | 319/396 [2:35:15<37:07, 28.93s/it] {'loss': 0.5957, 'learning_rate': 1.0814606741573033e-07, 'losses/dpo': 0.6158527135848999, 'losses/sft': 0.9492118954658508, 'losses/total': 0.6158527135848999, 'rewards/chosen': -0.5499600172042847, 'rewards/rejected': -0.9219518899917603, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.3719918131828308, 'logps/rejected': -35.89076232910156, 'logps/chosen': -28.060367584228516, 'ref_logps/rejected': -26.671247482299805, 'ref_logps/chosen': -22.560768127441406, 'epoch': 2.41} 81%|████████ | 319/396 [2:35:15<37:07, 28.93s/it] 81%|████████ | 320/396 [2:35:44<36:34, 28.88s/it] {'loss': 0.579, 'learning_rate': 1.0674157303370785e-07, 'losses/dpo': 0.5990191698074341, 'losses/sft': 1.0173970460891724, 'losses/total': 0.5990191698074341, 'rewards/chosen': -0.4988323450088501, 'rewards/rejected': -0.8880510330200195, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.38921868801116943, 'logps/rejected': -32.179466247558594, 'logps/chosen': -28.077362060546875, 'ref_logps/rejected': -23.298954010009766, 'ref_logps/chosen': -23.08903694152832, 'epoch': 2.42} 81%|████████ | 320/396 [2:35:44<36:34, 28.88s/it] 81%|████████ | 321/396 [2:36:13<36:10, 28.94s/it] {'loss': 0.5694, 'learning_rate': 1.0533707865168538e-07, 'losses/dpo': 0.6682005524635315, 'losses/sft': 0.9579723477363586, 'losses/total': 0.6682005524635315, 'rewards/chosen': -0.519504964351654, 'rewards/rejected': -0.942169189453125, 'rewards/accuracies': 0.7265625, 'rewards/margins': 
0.42266416549682617, 'logps/rejected': -34.72178268432617, 'logps/chosen': -28.690523147583008, 'ref_logps/rejected': -25.300090789794922, 'ref_logps/chosen': -23.495473861694336, 'epoch': 2.42} 81%|████████ | 321/396 [2:36:13<36:10, 28.94s/it] 81%|████████▏ | 322/396 [2:36:42<35:42, 28.95s/it] {'loss': 0.5471, 'learning_rate': 1.0393258426966293e-07, 'losses/dpo': 0.6572707891464233, 'losses/sft': 1.028795599937439, 'losses/total': 0.6572707891464233, 'rewards/chosen': -0.4386903643608093, 'rewards/rejected': -0.8888803124427795, 'rewards/accuracies': 0.75, 'rewards/margins': 0.45018988847732544, 'logps/rejected': -33.412723541259766, 'logps/chosen': -25.57655143737793, 'ref_logps/rejected': -24.523921966552734, 'ref_logps/chosen': -21.189647674560547, 'epoch': 2.43} 81%|████████▏ | 322/396 [2:36:42<35:42, 28.95s/it] 82%|████████▏ | 323/396 [2:37:11<35:13, 28.95s/it] {'loss': 0.5813, 'learning_rate': 1.0252808988764044e-07, 'losses/dpo': 0.5923129916191101, 'losses/sft': 0.9457908272743225, 'losses/total': 0.5923129916191101, 'rewards/chosen': -0.5013381838798523, 'rewards/rejected': -0.8999974727630615, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.3986593186855316, 'logps/rejected': -33.21003723144531, 'logps/chosen': -29.36073112487793, 'ref_logps/rejected': -24.210060119628906, 'ref_logps/chosen': -24.347349166870117, 'epoch': 2.44} 82%|████████▏ | 323/396 [2:37:11<35:13, 28.95s/it] 82%|████████▏ | 324/396 [2:37:40<34:43, 28.94s/it] {'loss': 0.5458, 'learning_rate': 1.0112359550561797e-07, 'losses/dpo': 0.5657480359077454, 'losses/sft': 0.9646883606910706, 'losses/total': 0.5657480359077454, 'rewards/chosen': -0.44619300961494446, 'rewards/rejected': -0.9597463607788086, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.5135533809661865, 'logps/rejected': -36.034000396728516, 'logps/chosen': -25.595857620239258, 'ref_logps/rejected': -26.436534881591797, 'ref_logps/chosen': -21.133930206298828, 'epoch': 2.45} 82%|████████▏ | 324/396 [2:37:40<34:43, 
28.94s/it] 82%|████████▏ | 325/396 [2:38:09<34:13, 28.92s/it] {'loss': 0.5137, 'learning_rate': 9.971910112359549e-08, 'losses/dpo': 0.5698142051696777, 'losses/sft': 0.9736462235450745, 'losses/total': 0.5698142051696777, 'rewards/chosen': -0.370680034160614, 'rewards/rejected': -0.9339342713356018, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.5632542371749878, 'logps/rejected': -35.9984130859375, 'logps/chosen': -25.68283462524414, 'ref_logps/rejected': -26.659069061279297, 'ref_logps/chosen': -21.976032257080078, 'epoch': 2.45} 82%|████████▏ | 325/396 [2:38:09<34:13, 28.92s/it] 82%|████████▏ | 326/396 [2:38:37<33:43, 28.90s/it] {'loss': 0.5574, 'learning_rate': 9.831460674157303e-08, 'losses/dpo': 0.6440725326538086, 'losses/sft': 0.963034987449646, 'losses/total': 0.6440725326538086, 'rewards/chosen': -0.47606179118156433, 'rewards/rejected': -0.9283559918403625, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.45229417085647583, 'logps/rejected': -37.614471435546875, 'logps/chosen': -27.503202438354492, 'ref_logps/rejected': -28.33091163635254, 'ref_logps/chosen': -22.742584228515625, 'epoch': 2.46} 82%|████████▏ | 326/396 [2:38:37<33:43, 28.90s/it] 83%|████████▎ | 327/396 [2:39:06<33:17, 28.95s/it] {'loss': 0.5561, 'learning_rate': 9.691011235955055e-08, 'losses/dpo': 0.6037241816520691, 'losses/sft': 0.9915317296981812, 'losses/total': 0.6037241816520691, 'rewards/chosen': -0.47994428873062134, 'rewards/rejected': -0.9253543019294739, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.44541001319885254, 'logps/rejected': -34.62038803100586, 'logps/chosen': -27.817134857177734, 'ref_logps/rejected': -25.366844177246094, 'ref_logps/chosen': -23.01769256591797, 'epoch': 2.47} 83%|████████▎ | 327/396 [2:39:06<33:17, 28.95s/it] 83%|████████▎ | 328/396 [2:39:36<32:51, 28.99s/it] {'loss': 0.5811, 'learning_rate': 9.550561797752808e-08, 'losses/dpo': 0.5880983471870422, 'losses/sft': 1.1540213823318481, 'losses/total': 0.5880983471870422, 
'rewards/chosen': -0.5365712642669678, 'rewards/rejected': -0.9574541449546814, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.42088285088539124, 'logps/rejected': -36.763641357421875, 'logps/chosen': -29.053791046142578, 'ref_logps/rejected': -27.189102172851562, 'ref_logps/chosen': -23.688079833984375, 'epoch': 2.48} 83%|████████▎ | 328/396 [2:39:36<32:51, 28.99s/it] 83%|████████▎ | 329/396 [2:40:04<32:19, 28.96s/it] {'loss': 0.56, 'learning_rate': 9.410112359550561e-08, 'losses/dpo': 0.4375653862953186, 'losses/sft': 1.0353739261627197, 'losses/total': 0.4375653862953186, 'rewards/chosen': -0.43260854482650757, 'rewards/rejected': -0.8912783861160278, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.45866984128952026, 'logps/rejected': -34.453372955322266, 'logps/chosen': -25.49301528930664, 'ref_logps/rejected': -25.54058837890625, 'ref_logps/chosen': -21.16693115234375, 'epoch': 2.48} 83%|████████▎ | 329/396 [2:40:04<32:19, 28.96s/it] 83%|████████▎ | 330/396 [2:40:33<31:50, 28.94s/it] {'loss': 0.5787, 'learning_rate': 9.269662921348314e-08, 'losses/dpo': 0.6422601938247681, 'losses/sft': 0.9122541546821594, 'losses/total': 0.6422601938247681, 'rewards/chosen': -0.45880773663520813, 'rewards/rejected': -0.8279755115509033, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.3691678047180176, 'logps/rejected': -31.897401809692383, 'logps/chosen': -27.841381072998047, 'ref_logps/rejected': -23.617647171020508, 'ref_logps/chosen': -23.25330352783203, 'epoch': 2.49} 83%|████████▎ | 330/396 [2:40:33<31:50, 28.94s/it] 84%|████████▎ | 331/396 [2:41:03<31:30, 29.09s/it] {'loss': 0.5962, 'learning_rate': 9.129213483146067e-08, 'losses/dpo': 0.7063708901405334, 'losses/sft': 1.0378127098083496, 'losses/total': 0.7063708901405334, 'rewards/chosen': -0.558958113193512, 'rewards/rejected': -0.956405520439148, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.3974474370479584, 'logps/rejected': -37.258323669433594, 'logps/chosen': -28.4913330078125, 
'ref_logps/rejected': -27.694263458251953, 'ref_logps/chosen': -22.901752471923828, 'epoch': 2.5} 84%|████████▎ | 331/396 [2:41:03<31:30, 29.09s/it] 84%|████████▍ | 332/396 [2:41:32<30:58, 29.04s/it] {'loss': 0.5382, 'learning_rate': 8.988764044943819e-08, 'losses/dpo': 0.6065940856933594, 'losses/sft': 1.061606526374817, 'losses/total': 0.6065940856933594, 'rewards/chosen': -0.5570348501205444, 'rewards/rejected': -1.0411648750305176, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.48413002490997314, 'logps/rejected': -36.76737976074219, 'logps/chosen': -28.342041015625, 'ref_logps/rejected': -26.355728149414062, 'ref_logps/chosen': -22.77169418334961, 'epoch': 2.51} 84%|████████▍ | 332/396 [2:41:32<30:58, 29.04s/it] 84%|████████▍ | 333/396 [2:42:01<30:27, 29.00s/it] {'loss': 0.542, 'learning_rate': 8.848314606741572e-08, 'losses/dpo': 0.5146865844726562, 'losses/sft': 0.82643723487854, 'losses/total': 0.5146865844726562, 'rewards/chosen': -0.5214601755142212, 'rewards/rejected': -0.9904604554176331, 'rewards/accuracies': 0.75, 'rewards/margins': 0.4690002501010895, 'logps/rejected': -36.757789611816406, 'logps/chosen': -28.1925048828125, 'ref_logps/rejected': -26.85318374633789, 'ref_logps/chosen': -22.9779052734375, 'epoch': 2.51} 84%|████████▍ | 333/396 [2:42:01<30:27, 29.00s/it] 84%|████████▍ | 334/396 [2:42:30<30:00, 29.05s/it] {'loss': 0.5288, 'learning_rate': 8.707865168539325e-08, 'losses/dpo': 0.554874062538147, 'losses/sft': 0.9589724540710449, 'losses/total': 0.554874062538147, 'rewards/chosen': -0.4253048598766327, 'rewards/rejected': -0.9129682183265686, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.4876634180545807, 'logps/rejected': -34.3861083984375, 'logps/chosen': -26.62852668762207, 'ref_logps/rejected': -25.256423950195312, 'ref_logps/chosen': -22.37548065185547, 'epoch': 2.52} 84%|████████▍ | 334/396 [2:42:30<30:00, 29.05s/it] 85%|████████▍ | 335/396 [2:42:59<29:28, 29.00s/it] {'loss': 0.564, 'learning_rate': 
8.567415730337078e-08, 'losses/dpo': 0.5628042817115784, 'losses/sft': 0.9816582202911377, 'losses/total': 0.5628042817115784, 'rewards/chosen': -0.5088753700256348, 'rewards/rejected': -0.9001079797744751, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.3912326395511627, 'logps/rejected': -35.13941192626953, 'logps/chosen': -30.657304763793945, 'ref_logps/rejected': -26.13833236694336, 'ref_logps/chosen': -25.568553924560547, 'epoch': 2.53} 85%|████████▍ | 335/396 [2:42:59<29:28, 29.00s/it] 85%|████████▍ | 336/396 [2:43:28<29:02, 29.05s/it] {'loss': 0.5751, 'learning_rate': 8.426966292134831e-08, 'losses/dpo': 0.5604207515716553, 'losses/sft': 0.9308174848556519, 'losses/total': 0.5604207515716553, 'rewards/chosen': -0.5098517537117004, 'rewards/rejected': -0.8933306932449341, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.38347893953323364, 'logps/rejected': -35.52781677246094, 'logps/chosen': -27.922801971435547, 'ref_logps/rejected': -26.594511032104492, 'ref_logps/chosen': -22.82427978515625, 'epoch': 2.54} 85%|████████▍ | 336/396 [2:43:28<29:02, 29.05s/it] 85%|████████▌ | 337/396 [2:43:57<28:36, 29.09s/it] {'loss': 0.5092, 'learning_rate': 8.286516853932583e-08, 'losses/dpo': 0.5189211368560791, 'losses/sft': 0.9888613224029541, 'losses/total': 0.5189211368560791, 'rewards/chosen': -0.45166367292404175, 'rewards/rejected': -1.0342316627502441, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5825679302215576, 'logps/rejected': -37.19468688964844, 'logps/chosen': -26.58924674987793, 'ref_logps/rejected': -26.852371215820312, 'ref_logps/chosen': -22.07261085510254, 'epoch': 2.54} 85%|████████▌ | 337/396 [2:43:57<28:36, 29.09s/it] 85%|████████▌ | 338/396 [2:44:26<28:08, 29.12s/it] {'loss': 0.5636, 'learning_rate': 8.146067415730337e-08, 'losses/dpo': 0.5064201951026917, 'losses/sft': 1.0053503513336182, 'losses/total': 0.5064201951026917, 'rewards/chosen': -0.47179120779037476, 'rewards/rejected': -0.8839888572692871, 'rewards/accuracies': 0.7265625, 
'rewards/margins': 0.41219767928123474, 'logps/rejected': -34.435699462890625, 'logps/chosen': -26.25957489013672, 'ref_logps/rejected': -25.595813751220703, 'ref_logps/chosen': -21.541664123535156, 'epoch': 2.55} 85%|████████▌ | 338/396 [2:44:26<28:08, 29.12s/it] 86%|████████▌ | 339/396 [2:44:55<27:38, 29.09s/it] {'loss': 0.5099, 'learning_rate': 8.005617977528089e-08, 'losses/dpo': 0.5300096273422241, 'losses/sft': 0.9853606224060059, 'losses/total': 0.5300096273422241, 'rewards/chosen': -0.4589604139328003, 'rewards/rejected': -1.0446293354034424, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.5856689810752869, 'logps/rejected': -37.7884635925293, 'logps/chosen': -28.839031219482422, 'ref_logps/rejected': -27.3421688079834, 'ref_logps/chosen': -24.249427795410156, 'epoch': 2.56} 86%|████████▌ | 339/396 [2:44:55<27:38, 29.09s/it] 86%|████████▌ | 340/396 [2:45:24<27:06, 29.05s/it] {'loss': 0.5489, 'learning_rate': 7.865168539325842e-08, 'losses/dpo': 0.5961363315582275, 'losses/sft': 1.0056707859039307, 'losses/total': 0.5961363315582275, 'rewards/chosen': -0.4802970886230469, 'rewards/rejected': -0.9193466305732727, 'rewards/accuracies': 0.75, 'rewards/margins': 0.43904954195022583, 'logps/rejected': -35.306766510009766, 'logps/chosen': -24.501375198364258, 'ref_logps/rejected': -26.113300323486328, 'ref_logps/chosen': -19.69840431213379, 'epoch': 2.57} 86%|████████▌ | 340/396 [2:45:24<27:06, 29.05s/it] 86%|████████▌ | 341/396 [2:45:53<26:42, 29.14s/it] {'loss': 0.5047, 'learning_rate': 7.724719101123594e-08, 'losses/dpo': 0.584464430809021, 'losses/sft': 1.1327065229415894, 'losses/total': 0.584464430809021, 'rewards/chosen': -0.49795451760292053, 'rewards/rejected': -1.0785142183303833, 'rewards/accuracies': 0.7734375, 'rewards/margins': 0.5805596113204956, 'logps/rejected': -37.83360290527344, 'logps/chosen': -28.589330673217773, 'ref_logps/rejected': -27.048458099365234, 'ref_logps/chosen': -23.609786987304688, 'epoch': 2.57} 86%|████████▌ | 341/396 
[2:45:53<26:42, 29.14s/it] 86%|████████▋ | 342/396 [2:46:22<26:08, 29.04s/it] {'loss': 0.5664, 'learning_rate': 7.584269662921348e-08, 'losses/dpo': 0.5289937257766724, 'losses/sft': 0.9366389513015747, 'losses/total': 0.5289937257766724, 'rewards/chosen': -0.5207412838935852, 'rewards/rejected': -0.9300122261047363, 'rewards/accuracies': 0.6953125, 'rewards/margins': 0.40927091240882874, 'logps/rejected': -36.51441192626953, 'logps/chosen': -26.895156860351562, 'ref_logps/rejected': -27.214290618896484, 'ref_logps/chosen': -21.687744140625, 'epoch': 2.58} 86%|████████▋ | 342/396 [2:46:22<26:08, 29.04s/it] 87%|████████▋ | 343/396 [2:46:51<25:41, 29.09s/it] {'loss': 0.5614, 'learning_rate': 7.443820224719101e-08, 'losses/dpo': 0.5385686159133911, 'losses/sft': 1.026196002960205, 'losses/total': 0.5385686159133911, 'rewards/chosen': -0.5171737670898438, 'rewards/rejected': -0.9449986815452576, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.4278249144554138, 'logps/rejected': -37.010292053222656, 'logps/chosen': -26.770729064941406, 'ref_logps/rejected': -27.5603084564209, 'ref_logps/chosen': -21.59899139404297, 'epoch': 2.59} 87%|████████▋ | 343/396 [2:46:51<25:41, 29.09s/it] 87%|████████▋ | 344/396 [2:47:20<25:10, 29.05s/it] {'loss': 0.5574, 'learning_rate': 7.303370786516853e-08, 'losses/dpo': 0.4933924973011017, 'losses/sft': 1.0346543788909912, 'losses/total': 0.4933924973011017, 'rewards/chosen': -0.5176920294761658, 'rewards/rejected': -1.0000420808792114, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.48235008120536804, 'logps/rejected': -37.94123840332031, 'logps/chosen': -28.282766342163086, 'ref_logps/rejected': -27.940820693969727, 'ref_logps/chosen': -23.10584831237793, 'epoch': 2.6} 87%|████████▋ | 344/396 [2:47:20<25:10, 29.05s/it] 87%|████████▋ | 345/396 [2:47:50<24:41, 29.05s/it] {'loss': 0.5095, 'learning_rate': 7.162921348314606e-08, 'losses/dpo': 0.46729788184165955, 'losses/sft': 1.0185084342956543, 'losses/total': 0.46729788184165955, 
'rewards/chosen': -0.4950796663761139, 'rewards/rejected': -1.0654358863830566, 'rewards/accuracies': 0.8046875, 'rewards/margins': 0.5703563690185547, 'logps/rejected': -39.72615432739258, 'logps/chosen': -28.72567367553711, 'ref_logps/rejected': -29.071794509887695, 'ref_logps/chosen': -23.77487564086914, 'epoch': 2.6} 87%|████████▋ | 345/396 [2:47:50<24:41, 29.05s/it] 87%|████████▋ | 346/396 [2:48:19<24:13, 29.06s/it] {'loss': 0.5312, 'learning_rate': 7.022471910112359e-08, 'losses/dpo': 0.5006756782531738, 'losses/sft': 0.9233719110488892, 'losses/total': 0.5006756782531738, 'rewards/chosen': -0.4666164517402649, 'rewards/rejected': -0.9660595059394836, 'rewards/accuracies': 0.78125, 'rewards/margins': 0.49944305419921875, 'logps/rejected': -38.04795455932617, 'logps/chosen': -25.868453979492188, 'ref_logps/rejected': -28.387357711791992, 'ref_logps/chosen': -21.202287673950195, 'epoch': 2.61} 87%|████████▋ | 346/396 [2:48:19<24:13, 29.06s/it] 88%|████████▊ | 347/396 [2:48:48<23:43, 29.04s/it] {'loss': 0.5609, 'learning_rate': 6.882022471910112e-08, 'losses/dpo': 0.53383469581604, 'losses/sft': 1.0966167449951172, 'losses/total': 0.53383469581604, 'rewards/chosen': -0.5395419001579285, 'rewards/rejected': -0.9824715852737427, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.4429297149181366, 'logps/rejected': -37.18801498413086, 'logps/chosen': -30.674976348876953, 'ref_logps/rejected': -27.363300323486328, 'ref_logps/chosen': -25.279560089111328, 'epoch': 2.62} 88%|████████▊ | 347/396 [2:48:48<23:43, 29.04s/it] 88%|████████▊ | 348/396 [2:49:16<23:12, 29.00s/it] {'loss': 0.5459, 'learning_rate': 6.741573033707864e-08, 'losses/dpo': 0.5303448438644409, 'losses/sft': 1.0059340000152588, 'losses/total': 0.5303448438644409, 'rewards/chosen': -0.4470424950122833, 'rewards/rejected': -0.9374332427978516, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.49039074778556824, 'logps/rejected': -36.4505729675293, 'logps/chosen': -27.533077239990234, 
'ref_logps/rejected': -27.07624053955078, 'ref_logps/chosen': -23.062650680541992, 'epoch': 2.63} 88%|████████▊ | 348/396 [2:49:16<23:12, 29.00s/it] 88%|████████▊ | 349/396 [2:49:45<22:41, 28.98s/it] {'loss': 0.5395, 'learning_rate': 6.601123595505617e-08, 'losses/dpo': 0.46134790778160095, 'losses/sft': 1.0326218605041504, 'losses/total': 0.46134790778160095, 'rewards/chosen': -0.5758577585220337, 'rewards/rejected': -1.0862531661987305, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.5103954076766968, 'logps/rejected': -37.460845947265625, 'logps/chosen': -27.371315002441406, 'ref_logps/rejected': -26.598316192626953, 'ref_logps/chosen': -21.61273765563965, 'epoch': 2.63} 88%|████████▊ | 349/396 [2:49:45<22:41, 28.98s/it] 88%|████████▊ | 350/396 [2:50:15<22:15, 29.02s/it] {'loss': 0.531, 'learning_rate': 6.460674157303371e-08, 'losses/dpo': 0.5929858684539795, 'losses/sft': 0.8796969056129456, 'losses/total': 0.5929858684539795, 'rewards/chosen': -0.48361673951148987, 'rewards/rejected': -1.0177090167999268, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.5340923070907593, 'logps/rejected': -35.51959228515625, 'logps/chosen': -26.683940887451172, 'ref_logps/rejected': -25.34250259399414, 'ref_logps/chosen': -21.84777069091797, 'epoch': 2.64} 88%|████████▊ | 350/396 [2:50:15<22:15, 29.02s/it] 89%|████████▊ | 351/396 [2:50:43<21:44, 29.00s/it] {'loss': 0.5638, 'learning_rate': 6.320224719101123e-08, 'losses/dpo': 0.46039754152297974, 'losses/sft': 1.014696478843689, 'losses/total': 0.46039754152297974, 'rewards/chosen': -0.4954075217247009, 'rewards/rejected': -0.9326165914535522, 'rewards/accuracies': 0.6484375, 'rewards/margins': 0.4372091293334961, 'logps/rejected': -36.31642150878906, 'logps/chosen': -27.784767150878906, 'ref_logps/rejected': -26.99026107788086, 'ref_logps/chosen': -22.830692291259766, 'epoch': 2.65} 89%|████████▊ | 351/396 [2:50:43<21:44, 29.00s/it] 89%|████████▉ | 352/396 [2:51:12<21:13, 28.95s/it] {'loss': 0.5307, 
'learning_rate': 6.179775280898876e-08, 'losses/dpo': 0.5120245218276978, 'losses/sft': 0.9590541124343872, 'losses/total': 0.5120245218276978, 'rewards/chosen': -0.3967083692550659, 'rewards/rejected': -0.9047658443450928, 'rewards/accuracies': 0.7890625, 'rewards/margins': 0.5080575346946716, 'logps/rejected': -32.802425384521484, 'logps/chosen': -23.8892765045166, 'ref_logps/rejected': -23.754770278930664, 'ref_logps/chosen': -19.922191619873047, 'epoch': 2.66} 89%|████████▉ | 352/396 [2:51:12<21:13, 28.95s/it] 89%|████████▉ | 353/396 [2:51:42<20:47, 29.02s/it] {'loss': 0.5216, 'learning_rate': 6.039325842696629e-08, 'losses/dpo': 0.5157948136329651, 'losses/sft': 0.8797988891601562, 'losses/total': 0.5157948136329651, 'rewards/chosen': -0.5669355988502502, 'rewards/rejected': -1.1337225437164307, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.5667868852615356, 'logps/rejected': -39.759193420410156, 'logps/chosen': -27.902587890625, 'ref_logps/rejected': -28.421966552734375, 'ref_logps/chosen': -22.233232498168945, 'epoch': 2.66} 89%|████████▉ | 353/396 [2:51:42<20:47, 29.02s/it] 89%|████████▉ | 354/396 [2:52:11<20:21, 29.08s/it] {'loss': 0.5154, 'learning_rate': 5.898876404494382e-08, 'losses/dpo': 0.6272658705711365, 'losses/sft': 0.901512086391449, 'losses/total': 0.6272658705711365, 'rewards/chosen': -0.49750083684921265, 'rewards/rejected': -1.0629794597625732, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.5654786825180054, 'logps/rejected': -38.05504608154297, 'logps/chosen': -27.200105667114258, 'ref_logps/rejected': -27.425247192382812, 'ref_logps/chosen': -22.22509765625, 'epoch': 2.67} 89%|████████▉ | 354/396 [2:52:11<20:21, 29.08s/it] 90%|████████▉ | 355/396 [2:52:40<19:50, 29.03s/it] {'loss': 0.5507, 'learning_rate': 5.758426966292135e-08, 'losses/dpo': 0.4642670750617981, 'losses/sft': 1.0486382246017456, 'losses/total': 0.4642670750617981, 'rewards/chosen': -0.5843857526779175, 'rewards/rejected': -1.0911391973495483, 
'rewards/accuracies': 0.78125, 'rewards/margins': 0.5067534446716309, 'logps/rejected': -38.597557067871094, 'logps/chosen': -29.658336639404297, 'ref_logps/rejected': -27.686166763305664, 'ref_logps/chosen': -23.814481735229492, 'epoch': 2.68} 90%|████████▉ | 355/396 [2:52:40<19:50, 29.03s/it] 90%|████████▉ | 356/396 [2:53:09<19:19, 28.98s/it] {'loss': 0.5631, 'learning_rate': 5.617977528089887e-08, 'losses/dpo': 0.5865851640701294, 'losses/sft': 1.1602400541305542, 'losses/total': 0.5865851640701294, 'rewards/chosen': -0.4928884506225586, 'rewards/rejected': -0.9124425649642944, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.4195541441440582, 'logps/rejected': -33.06968688964844, 'logps/chosen': -23.939620971679688, 'ref_logps/rejected': -23.945262908935547, 'ref_logps/chosen': -19.01073455810547, 'epoch': 2.69} 90%|████████▉ | 356/396 [2:53:09<19:19, 28.98s/it] 90%|█████████ | 357/396 [2:53:38<18:51, 29.00s/it] {'loss': 0.5461, 'learning_rate': 5.47752808988764e-08, 'losses/dpo': 0.6400465369224548, 'losses/sft': 1.0134565830230713, 'losses/total': 0.6400465369224548, 'rewards/chosen': -0.43580204248428345, 'rewards/rejected': -0.9310164451599121, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.4952143728733063, 'logps/rejected': -36.65930938720703, 'logps/chosen': -27.94991683959961, 'ref_logps/rejected': -27.34914779663086, 'ref_logps/chosen': -23.591896057128906, 'epoch': 2.69} 90%|█████████ | 357/396 [2:53:38<18:51, 29.00s/it] 90%|█████████ | 358/396 [2:54:06<18:20, 28.96s/it] {'loss': 0.561, 'learning_rate': 5.3370786516853926e-08, 'losses/dpo': 0.47015029191970825, 'losses/sft': 0.923213005065918, 'losses/total': 0.47015029191970825, 'rewards/chosen': -0.4857832193374634, 'rewards/rejected': -0.9034351110458374, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.417651891708374, 'logps/rejected': -34.08583450317383, 'logps/chosen': -28.148937225341797, 'ref_logps/rejected': -25.051483154296875, 'ref_logps/chosen': -23.29110336303711, 
'epoch': 2.7} 90%|█████████ | 358/396 [2:54:06<18:20, 28.96s/it] 91%|█████████ | 359/396 [2:54:36<17:56, 29.08s/it] {'loss': 0.5196, 'learning_rate': 5.196629213483146e-08, 'losses/dpo': 0.4919354021549225, 'losses/sft': 0.9875601530075073, 'losses/total': 0.4919354021549225, 'rewards/chosen': -0.4711495637893677, 'rewards/rejected': -0.9898011684417725, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.51865154504776, 'logps/rejected': -35.78190994262695, 'logps/chosen': -26.931848526000977, 'ref_logps/rejected': -25.88389778137207, 'ref_logps/chosen': -22.220352172851562, 'epoch': 2.71} 91%|█████████ | 359/396 [2:54:36<17:56, 29.08s/it] 91%|█████████ | 360/396 [2:55:05<17:25, 29.03s/it] {'loss': 0.5141, 'learning_rate': 5.056179775280899e-08, 'losses/dpo': 0.5143895745277405, 'losses/sft': 0.8888437747955322, 'losses/total': 0.5143895745277405, 'rewards/chosen': -0.4738062918186188, 'rewards/rejected': -1.0374642610549927, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.5636579394340515, 'logps/rejected': -36.244728088378906, 'logps/chosen': -26.860830307006836, 'ref_logps/rejected': -25.870086669921875, 'ref_logps/chosen': -22.12276840209961, 'epoch': 2.72} 91%|█████████ | 360/396 [2:55:05<17:25, 29.03s/it] 91%|█████████ | 361/396 [2:55:34<16:55, 29.02s/it] {'loss': 0.5612, 'learning_rate': 4.915730337078652e-08, 'losses/dpo': 0.5186240077018738, 'losses/sft': 1.109127402305603, 'losses/total': 0.5186240077018738, 'rewards/chosen': -0.5114448070526123, 'rewards/rejected': -1.0037946701049805, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.4923498034477234, 'logps/rejected': -35.34681701660156, 'logps/chosen': -27.645713806152344, 'ref_logps/rejected': -25.30887222290039, 'ref_logps/chosen': -22.531267166137695, 'epoch': 2.72} 91%|█████████ | 361/396 [2:55:34<16:55, 29.02s/it] 91%|█████████▏| 362/396 [2:56:03<16:26, 29.02s/it] {'loss': 0.5701, 'learning_rate': 4.775280898876404e-08, 'losses/dpo': 0.5167029500007629, 'losses/sft': 1.1346383094787598, 
'losses/total': 0.5167029500007629, 'rewards/chosen': -0.5677646398544312, 'rewards/rejected': -0.9773082137107849, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.40954357385635376, 'logps/rejected': -37.111507415771484, 'logps/chosen': -29.528343200683594, 'ref_logps/rejected': -27.338424682617188, 'ref_logps/chosen': -23.850698471069336, 'epoch': 2.73} 91%|█████████▏| 362/396 [2:56:03<16:26, 29.02s/it] 92%|█████████▏| 363/396 [2:56:32<15:56, 28.99s/it] {'loss': 0.5367, 'learning_rate': 4.634831460674157e-08, 'losses/dpo': 0.6075611114501953, 'losses/sft': 1.0922847986221313, 'losses/total': 0.6075611114501953, 'rewards/chosen': -0.5451704859733582, 'rewards/rejected': -1.0743281841278076, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.5291576385498047, 'logps/rejected': -38.734046936035156, 'logps/chosen': -27.099462509155273, 'ref_logps/rejected': -27.990768432617188, 'ref_logps/chosen': -21.647756576538086, 'epoch': 2.74} 92%|█████████▏| 363/396 [2:56:32<15:56, 28.99s/it] 92%|█████████▏| 364/396 [2:57:01<15:28, 29.02s/it] {'loss': 0.5448, 'learning_rate': 4.4943820224719096e-08, 'losses/dpo': 0.5679644346237183, 'losses/sft': 1.123085618019104, 'losses/total': 0.5679644346237183, 'rewards/chosen': -0.5748672485351562, 'rewards/rejected': -1.1037259101867676, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5288586020469666, 'logps/rejected': -38.87983703613281, 'logps/chosen': -28.804433822631836, 'ref_logps/rejected': -27.842578887939453, 'ref_logps/chosen': -23.055761337280273, 'epoch': 2.75} 92%|█████████▏| 364/396 [2:57:01<15:28, 29.02s/it] 92%|█████████▏| 365/396 [2:57:30<15:06, 29.24s/it] {'loss': 0.5544, 'learning_rate': 4.3539325842696626e-08, 'losses/dpo': 0.4389882981777191, 'losses/sft': 0.9757397174835205, 'losses/total': 0.4389882981777191, 'rewards/chosen': -0.5145817995071411, 'rewards/rejected': -1.0139836072921753, 'rewards/accuracies': 0.6640625, 'rewards/margins': 0.4994018077850342, 'logps/rejected': -37.742164611816406, 
'logps/chosen': -29.942031860351562, 'ref_logps/rejected': -27.602325439453125, 'ref_logps/chosen': -24.796215057373047, 'epoch': 2.75} 92%|█████████▏| 365/396 [2:57:30<15:06, 29.24s/it] 92%|█████████▏| 366/396 [2:58:00<14:35, 29.19s/it] {'loss': 0.57, 'learning_rate': 4.213483146067416e-08, 'losses/dpo': 0.571212887763977, 'losses/sft': 0.8268208503723145, 'losses/total': 0.571212887763977, 'rewards/chosen': -0.5733895897865295, 'rewards/rejected': -0.9933279752731323, 'rewards/accuracies': 0.75, 'rewards/margins': 0.4199383854866028, 'logps/rejected': -35.81608581542969, 'logps/chosen': -30.154991149902344, 'ref_logps/rejected': -25.88280487060547, 'ref_logps/chosen': -24.421096801757812, 'epoch': 2.76} 92%|█████████▏| 366/396 [2:58:00<14:35, 29.19s/it] 93%|█████████▎| 367/396 [2:58:29<14:09, 29.30s/it] {'loss': 0.5874, 'learning_rate': 4.073033707865169e-08, 'losses/dpo': 0.4875527620315552, 'losses/sft': 0.8703315854072571, 'losses/total': 0.4875527620315552, 'rewards/chosen': -0.45097634196281433, 'rewards/rejected': -0.8424156308174133, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.3914392292499542, 'logps/rejected': -33.205955505371094, 'logps/chosen': -27.25971794128418, 'ref_logps/rejected': -24.781803131103516, 'ref_logps/chosen': -22.749954223632812, 'epoch': 2.77} 93%|█████████▎| 367/396 [2:58:29<14:09, 29.30s/it] 93%|█████████▎| 368/396 [2:58:58<13:37, 29.21s/it] {'loss': 0.508, 'learning_rate': 3.932584269662921e-08, 'losses/dpo': 0.4668968617916107, 'losses/sft': 1.1078698635101318, 'losses/total': 0.4668968617916107, 'rewards/chosen': -0.47762107849121094, 'rewards/rejected': -1.0654823780059814, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.5878612995147705, 'logps/rejected': -37.86750030517578, 'logps/chosen': -28.230928421020508, 'ref_logps/rejected': -27.212678909301758, 'ref_logps/chosen': -23.454715728759766, 'epoch': 2.78} 93%|█████████▎| 368/396 [2:58:58<13:37, 29.21s/it] 93%|█████████▎| 369/396 [2:59:27<13:06, 29.11s/it] 
{'loss': 0.5722, 'learning_rate': 3.792134831460674e-08, 'losses/dpo': 0.5119404196739197, 'losses/sft': 1.0701940059661865, 'losses/total': 0.5119404196739197, 'rewards/chosen': -0.5569244623184204, 'rewards/rejected': -0.9840888977050781, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.42716455459594727, 'logps/rejected': -39.07720184326172, 'logps/chosen': -28.5417423248291, 'ref_logps/rejected': -29.236312866210938, 'ref_logps/chosen': -22.97249984741211, 'epoch': 2.78} 93%|█████████▎| 369/396 [2:59:27<13:06, 29.11s/it] 93%|█████████▎| 370/396 [2:59:56<12:35, 29.05s/it] {'loss': 0.5144, 'learning_rate': 3.6516853932584266e-08, 'losses/dpo': 0.39502987265586853, 'losses/sft': 1.0756311416625977, 'losses/total': 0.39502987265586853, 'rewards/chosen': -0.474331796169281, 'rewards/rejected': -1.055091381072998, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.5807597041130066, 'logps/rejected': -35.577354431152344, 'logps/chosen': -24.37343406677246, 'ref_logps/rejected': -25.026439666748047, 'ref_logps/chosen': -19.630115509033203, 'epoch': 2.79} 93%|█████████▎| 370/396 [2:59:56<12:35, 29.05s/it] 94%|█████████▎| 371/396 [3:00:25<12:06, 29.05s/it] {'loss': 0.5757, 'learning_rate': 3.5112359550561796e-08, 'losses/dpo': 0.5865879058837891, 'losses/sft': 1.0159986019134521, 'losses/total': 0.5865879058837891, 'rewards/chosen': -0.4642760157585144, 'rewards/rejected': -0.8929827213287354, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.42870670557022095, 'logps/rejected': -35.49622344970703, 'logps/chosen': -25.75430679321289, 'ref_logps/rejected': -26.56639862060547, 'ref_logps/chosen': -21.11154556274414, 'epoch': 2.8} 94%|█████████▎| 371/396 [3:00:25<12:06, 29.05s/it] 94%|█████████▍| 372/396 [3:00:54<11:35, 29.00s/it] {'loss': 0.507, 'learning_rate': 3.370786516853932e-08, 'losses/dpo': 0.6495200395584106, 'losses/sft': 1.097916841506958, 'losses/total': 0.6495200395584106, 'rewards/chosen': -0.45352903008461, 'rewards/rejected': -1.0481172800064087, 
'rewards/accuracies': 0.7734375, 'rewards/margins': 0.5945882797241211, 'logps/rejected': -36.934810638427734, 'logps/chosen': -28.015663146972656, 'ref_logps/rejected': -26.453636169433594, 'ref_logps/chosen': -23.48037338256836, 'epoch': 2.81} 94%|█████████▍| 372/396 [3:00:54<11:35, 29.00s/it] 94%|█████████▍| 373/396 [3:01:23<11:08, 29.05s/it] {'loss': 0.5557, 'learning_rate': 3.230337078651686e-08, 'losses/dpo': 0.3998969793319702, 'losses/sft': 0.9329382181167603, 'losses/total': 0.3998969793319702, 'rewards/chosen': -0.5536371469497681, 'rewards/rejected': -1.0455635786056519, 'rewards/accuracies': 0.75, 'rewards/margins': 0.49192649126052856, 'logps/rejected': -35.190895080566406, 'logps/chosen': -29.112939834594727, 'ref_logps/rejected': -24.735258102416992, 'ref_logps/chosen': -23.576570510864258, 'epoch': 2.82} 94%|█████████▍| 373/396 [3:01:23<11:08, 29.05s/it] 94%|█████████▍| 374/396 [3:01:52<10:38, 29.02s/it] {'loss': 0.5536, 'learning_rate': 3.089887640449438e-08, 'losses/dpo': 0.548796534538269, 'losses/sft': 1.0410091876983643, 'losses/total': 0.548796534538269, 'rewards/chosen': -0.49513280391693115, 'rewards/rejected': -0.9486956000328064, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.45356276631355286, 'logps/rejected': -37.038963317871094, 'logps/chosen': -28.713830947875977, 'ref_logps/rejected': -27.5520076751709, 'ref_logps/chosen': -23.76250457763672, 'epoch': 2.82} 94%|█████████▍| 374/396 [3:01:52<10:38, 29.02s/it] 95%|█████████▍| 375/396 [3:02:22<10:14, 29.27s/it] {'loss': 0.5343, 'learning_rate': 2.949438202247191e-08, 'losses/dpo': 0.6983579397201538, 'losses/sft': 1.0986469984054565, 'losses/total': 0.6983579397201538, 'rewards/chosen': -0.5046992897987366, 'rewards/rejected': -1.050853967666626, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5461547374725342, 'logps/rejected': -39.406578063964844, 'logps/chosen': -29.465068817138672, 'ref_logps/rejected': -28.89803695678711, 'ref_logps/chosen': -24.418071746826172, 'epoch': 2.83} 
95%|█████████▍| 375/396 [3:02:22<10:14, 29.27s/it] 95%|█████████▍| 376/396 [3:02:51<09:43, 29.16s/it] {'loss': 0.5838, 'learning_rate': 2.8089887640449436e-08, 'losses/dpo': 0.615436851978302, 'losses/sft': 1.064025640487671, 'losses/total': 0.615436851978302, 'rewards/chosen': -0.5222063660621643, 'rewards/rejected': -0.8897100687026978, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.36750373244285583, 'logps/rejected': -35.225887298583984, 'logps/chosen': -27.72464370727539, 'ref_logps/rejected': -26.32878875732422, 'ref_logps/chosen': -22.502582550048828, 'epoch': 2.84} 95%|█████████▍| 376/396 [3:02:51<09:43, 29.16s/it] 95%|█████████▌| 377/396 [3:03:20<09:12, 29.09s/it] {'loss': 0.5351, 'learning_rate': 2.6685393258426963e-08, 'losses/dpo': 0.5552591681480408, 'losses/sft': 0.8796924352645874, 'losses/total': 0.5552591681480408, 'rewards/chosen': -0.49976038932800293, 'rewards/rejected': -1.0156128406524658, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.5158523917198181, 'logps/rejected': -35.305564880371094, 'logps/chosen': -26.368640899658203, 'ref_logps/rejected': -25.149438858032227, 'ref_logps/chosen': -21.371036529541016, 'epoch': 2.85} 95%|█████████▌| 377/396 [3:03:20<09:12, 29.09s/it] 95%|█████████▌| 378/396 [3:03:49<08:42, 29.03s/it] {'loss': 0.5338, 'learning_rate': 2.5280898876404493e-08, 'losses/dpo': 0.4755927324295044, 'losses/sft': 0.9763241410255432, 'losses/total': 0.4755927324295044, 'rewards/chosen': -0.4919911026954651, 'rewards/rejected': -1.0006005764007568, 'rewards/accuracies': 0.734375, 'rewards/margins': 0.5086094737052917, 'logps/rejected': -38.05325698852539, 'logps/chosen': -29.838565826416016, 'ref_logps/rejected': -28.04724884033203, 'ref_logps/chosen': -24.918655395507812, 'epoch': 2.85} 95%|█████████▌| 378/396 [3:03:49<08:42, 29.03s/it] 96%|█████████▌| 379/396 [3:04:17<08:12, 28.99s/it] {'loss': 0.5335, 'learning_rate': 2.387640449438202e-08, 'losses/dpo': 0.5385127067565918, 'losses/sft': 1.245056390762329, 
'losses/total': 0.5385127067565918, 'rewards/chosen': -0.5107479095458984, 'rewards/rejected': -1.0612382888793945, 'rewards/accuracies': 0.7421875, 'rewards/margins': 0.5504903793334961, 'logps/rejected': -35.4906005859375, 'logps/chosen': -29.036991119384766, 'ref_logps/rejected': -24.878217697143555, 'ref_logps/chosen': -23.929513931274414, 'epoch': 2.86} 96%|█████████▌| 379/396 [3:04:17<08:12, 28.99s/it] 96%|█████████▌| 380/396 [3:04:46<07:43, 28.97s/it] {'loss': 0.545, 'learning_rate': 2.2471910112359548e-08, 'losses/dpo': 0.43943360447883606, 'losses/sft': 1.023887276649475, 'losses/total': 0.43943360447883606, 'rewards/chosen': -0.5317370891571045, 'rewards/rejected': -1.0256378650665283, 'rewards/accuracies': 0.75, 'rewards/margins': 0.49390077590942383, 'logps/rejected': -38.68418884277344, 'logps/chosen': -29.392702102661133, 'ref_logps/rejected': -28.427806854248047, 'ref_logps/chosen': -24.07533073425293, 'epoch': 2.87} 96%|█████████▌| 380/396 [3:04:46<07:43, 28.97s/it] 96%|█████████▌| 381/396 [3:05:15<07:13, 28.90s/it] {'loss': 0.56, 'learning_rate': 2.106741573033708e-08, 'losses/dpo': 0.6935892701148987, 'losses/sft': 1.0011663436889648, 'losses/total': 0.6935892701148987, 'rewards/chosen': -0.3994261920452118, 'rewards/rejected': -0.8521024584770203, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.45267629623413086, 'logps/rejected': -32.384376525878906, 'logps/chosen': -25.038589477539062, 'ref_logps/rejected': -23.86334991455078, 'ref_logps/chosen': -21.044326782226562, 'epoch': 2.88} 96%|█████████▌| 381/396 [3:05:15<07:13, 28.90s/it] 96%|█████████▋| 382/396 [3:05:44<06:45, 28.93s/it] {'loss': 0.5936, 'learning_rate': 1.9662921348314606e-08, 'losses/dpo': 0.547340989112854, 'losses/sft': 1.0020110607147217, 'losses/total': 0.547340989112854, 'rewards/chosen': -0.6141834259033203, 'rewards/rejected': -1.0012922286987305, 'rewards/accuracies': 0.6171875, 'rewards/margins': 0.38710883259773254, 'logps/rejected': -37.0654411315918, 'logps/chosen': 
-30.07229995727539, 'ref_logps/rejected': -27.052518844604492, 'ref_logps/chosen': -23.930465698242188, 'epoch': 2.88} 96%|█████████▋| 382/396 [3:05:44<06:45, 28.93s/it] 97%|█████████▋| 383/396 [3:06:13<06:16, 28.95s/it] {'loss': 0.537, 'learning_rate': 1.8258426966292133e-08, 'losses/dpo': 0.5175353288650513, 'losses/sft': 0.8916615843772888, 'losses/total': 0.5175353288650513, 'rewards/chosen': -0.5025948286056519, 'rewards/rejected': -1.0740119218826294, 'rewards/accuracies': 0.6796875, 'rewards/margins': 0.5714170932769775, 'logps/rejected': -40.710792541503906, 'logps/chosen': -30.305606842041016, 'ref_logps/rejected': -29.970672607421875, 'ref_logps/chosen': -25.279661178588867, 'epoch': 2.89} 97%|█████████▋| 383/396 [3:06:13<06:16, 28.95s/it] 97%|█████████▋| 384/396 [3:06:43<05:50, 29.19s/it] {'loss': 0.5598, 'learning_rate': 1.685393258426966e-08, 'losses/dpo': 0.4781198799610138, 'losses/sft': 1.0425841808319092, 'losses/total': 0.4781198799610138, 'rewards/chosen': -0.6009576916694641, 'rewards/rejected': -1.0742262601852417, 'rewards/accuracies': 0.7109375, 'rewards/margins': 0.4732685387134552, 'logps/rejected': -39.89691162109375, 'logps/chosen': -29.87887191772461, 'ref_logps/rejected': -29.154647827148438, 'ref_logps/chosen': -23.869295120239258, 'epoch': 2.9} 97%|█████████▋| 384/396 [3:06:43<05:50, 29.19s/it] 97%|█████████▋| 385/396 [3:07:12<05:19, 29.08s/it] {'loss': 0.5186, 'learning_rate': 1.544943820224719e-08, 'losses/dpo': 0.5135948657989502, 'losses/sft': 0.9224843978881836, 'losses/total': 0.5135948657989502, 'rewards/chosen': -0.48452913761138916, 'rewards/rejected': -1.0585482120513916, 'rewards/accuracies': 0.75, 'rewards/margins': 0.5740190744400024, 'logps/rejected': -39.657188415527344, 'logps/chosen': -26.600048065185547, 'ref_logps/rejected': -29.07170867919922, 'ref_logps/chosen': -21.754756927490234, 'epoch': 2.91} 97%|█████████▋| 385/396 [3:07:12<05:19, 29.08s/it] 97%|█████████▋| 386/396 [3:07:41<04:50, 29.02s/it] {'loss': 0.5551, 
'learning_rate': 1.4044943820224718e-08, 'losses/dpo': 0.5367317199707031, 'losses/sft': 1.0271828174591064, 'losses/total': 0.5367317199707031, 'rewards/chosen': -0.5466164350509644, 'rewards/rejected': -1.0410076379776, 'rewards/accuracies': 0.7578125, 'rewards/margins': 0.4943912625312805, 'logps/rejected': -37.88126754760742, 'logps/chosen': -27.77488136291504, 'ref_logps/rejected': -27.471187591552734, 'ref_logps/chosen': -22.3087158203125, 'epoch': 2.91} 97%|█████████▋| 386/396 [3:07:41<04:50, 29.02s/it] 98%|█████████▊| 387/396 [3:08:10<04:21, 29.06s/it] {'loss': 0.5438, 'learning_rate': 1.2640449438202247e-08, 'losses/dpo': 0.5493422746658325, 'losses/sft': 0.9023943543434143, 'losses/total': 0.5493422746658325, 'rewards/chosen': -0.526377260684967, 'rewards/rejected': -1.0052706003189087, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.47889336943626404, 'logps/rejected': -36.061241149902344, 'logps/chosen': -28.72400665283203, 'ref_logps/rejected': -26.00853729248047, 'ref_logps/chosen': -23.460235595703125, 'epoch': 2.92} 98%|█████████▊| 387/396 [3:08:10<04:21, 29.06s/it] 98%|█████████▊| 388/396 [3:08:39<03:52, 29.10s/it] {'loss': 0.5852, 'learning_rate': 1.1235955056179774e-08, 'losses/dpo': 0.5147813558578491, 'losses/sft': 0.8766761422157288, 'losses/total': 0.5147813558578491, 'rewards/chosen': -0.5916341543197632, 'rewards/rejected': -1.0122039318084717, 'rewards/accuracies': 0.6875, 'rewards/margins': 0.4205697774887085, 'logps/rejected': -37.490928649902344, 'logps/chosen': -27.819026947021484, 'ref_logps/rejected': -27.36888885498047, 'ref_logps/chosen': -21.90268898010254, 'epoch': 2.93} 98%|█████████▊| 388/396 [3:08:39<03:52, 29.10s/it] 98%|█████████▊| 389/396 [3:09:08<03:23, 29.07s/it] {'loss': 0.524, 'learning_rate': 9.831460674157303e-09, 'losses/dpo': 0.5489503741264343, 'losses/sft': 0.9560513496398926, 'losses/total': 0.5489503741264343, 'rewards/chosen': -0.5011276006698608, 'rewards/rejected': -1.059302806854248, 
'rewards/accuracies': 0.765625, 'rewards/margins': 0.5581751465797424, 'logps/rejected': -37.83194351196289, 'logps/chosen': -26.303754806518555, 'ref_logps/rejected': -27.238914489746094, 'ref_logps/chosen': -21.29248046875, 'epoch': 2.94} 98%|█████████▊| 389/396 [3:09:08<03:23, 29.07s/it] 98%|█████████▊| 390/396 [3:09:37<02:55, 29.19s/it] {'loss': 0.5441, 'learning_rate': 8.42696629213483e-09, 'losses/dpo': 0.5884628295898438, 'losses/sft': 0.9961035251617432, 'losses/total': 0.5884628295898438, 'rewards/chosen': -0.5088227391242981, 'rewards/rejected': -1.0503180027008057, 'rewards/accuracies': 0.7265625, 'rewards/margins': 0.5414952635765076, 'logps/rejected': -38.258975982666016, 'logps/chosen': -26.287546157836914, 'ref_logps/rejected': -27.755794525146484, 'ref_logps/chosen': -21.199317932128906, 'epoch': 2.94} 98%|█████████▊| 390/396 [3:09:37<02:55, 29.19s/it] 99%|█████████▊| 391/396 [3:10:06<02:25, 29.15s/it] {'loss': 0.5711, 'learning_rate': 7.022471910112359e-09, 'losses/dpo': 0.6062160730361938, 'losses/sft': 0.9891349673271179, 'losses/total': 0.6062160730361938, 'rewards/chosen': -0.5031411051750183, 'rewards/rejected': -0.9469910264015198, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.44384992122650146, 'logps/rejected': -35.26235580444336, 'logps/chosen': -29.260208129882812, 'ref_logps/rejected': -25.79244613647461, 'ref_logps/chosen': -24.228797912597656, 'epoch': 2.95} 99%|█████████▊| 391/396 [3:10:06<02:25, 29.15s/it] 99%|█████████▉| 392/396 [3:10:36<01:57, 29.28s/it] {'loss': 0.5188, 'learning_rate': 5.617977528089887e-09, 'losses/dpo': 0.48179134726524353, 'losses/sft': 1.0057315826416016, 'losses/total': 0.48179134726524353, 'rewards/chosen': -0.49385106563568115, 'rewards/rejected': -1.0613962411880493, 'rewards/accuracies': 0.796875, 'rewards/margins': 0.5675452351570129, 'logps/rejected': -38.197296142578125, 'logps/chosen': -26.954505920410156, 'ref_logps/rejected': -27.583335876464844, 'ref_logps/chosen': -22.015995025634766, 
'epoch': 2.96} 99%|█████████▉| 392/396 [3:10:36<01:57, 29.28s/it] 99%|█████████▉| 393/396 [3:11:05<01:27, 29.15s/it] {'loss': 0.5126, 'learning_rate': 4.213483146067415e-09, 'losses/dpo': 0.47302547097206116, 'losses/sft': 1.0042707920074463, 'losses/total': 0.47302547097206116, 'rewards/chosen': -0.45173099637031555, 'rewards/rejected': -1.0621416568756104, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.610410749912262, 'logps/rejected': -37.711891174316406, 'logps/chosen': -25.941349029541016, 'ref_logps/rejected': -27.090473175048828, 'ref_logps/chosen': -21.42403793334961, 'epoch': 2.97} 99%|█████████▉| 393/396 [3:11:05<01:27, 29.15s/it] 99%|█████████▉| 394/396 [3:11:34<00:58, 29.20s/it] {'loss': 0.5282, 'learning_rate': 2.8089887640449435e-09, 'losses/dpo': 0.47439950704574585, 'losses/sft': 1.004162073135376, 'losses/total': 0.47439950704574585, 'rewards/chosen': -0.5086286067962646, 'rewards/rejected': -1.0870963335037231, 'rewards/accuracies': 0.765625, 'rewards/margins': 0.5784677267074585, 'logps/rejected': -40.14276123046875, 'logps/chosen': -27.533342361450195, 'ref_logps/rejected': -29.271793365478516, 'ref_logps/chosen': -22.44705581665039, 'epoch': 2.97} 99%|█████████▉| 394/396 [3:11:34<00:58, 29.20s/it] 100%|█████████▉| 395/396 [3:12:04<00:29, 29.33s/it] {'loss': 0.5377, 'learning_rate': 1.4044943820224717e-09, 'losses/dpo': 0.5113502740859985, 'losses/sft': 1.0710563659667969, 'losses/total': 0.5113502740859985, 'rewards/chosen': -0.4875642657279968, 'rewards/rejected': -1.0270556211471558, 'rewards/accuracies': 0.71875, 'rewards/margins': 0.5394913554191589, 'logps/rejected': -38.508323669433594, 'logps/chosen': -27.44398307800293, 'ref_logps/rejected': -28.23776626586914, 'ref_logps/chosen': -22.568340301513672, 'epoch': 2.98} 100%|█████████▉| 395/396 [3:12:04<00:29, 29.33s/it] 100%|██████████| 396/396 [3:12:33<00:00, 29.24s/it] {'loss': 0.5692, 'learning_rate': 0.0, 'losses/dpo': 0.7008877992630005, 'losses/sft': 1.1200252771377563, 
'losses/total': 0.7008877992630005, 'rewards/chosen': -0.5250504612922668, 'rewards/rejected': -0.9568085670471191, 'rewards/accuracies': 0.703125, 'rewards/margins': 0.4317581057548523, 'logps/rejected': -36.77953338623047, 'logps/chosen': -28.845203399658203, 'ref_logps/rejected': -27.211450576782227, 'ref_logps/chosen': -23.59469985961914, 'epoch': 2.99} 100%|██████████| 396/396 [3:12:33<00:00, 29.24s/it] {'train_runtime': 11562.7876, 'train_samples_per_second': 4.4, 'train_steps_per_second': 0.034, 'train_loss': 0.6152852120423558, 'epoch': 2.99} 100%|██████████| 396/396 [3:12:33<00:00, 29.24s/it] 100%|██████████| 396/396 [3:12:33<00:00, 29.18s/it] 2024-03-15 08:38:35.647 n213-019-134:3514428:3516109 [4] NCCL INFO [Service thread] Connection closed by localRank 4 2024-03-15 08:38:35.647 n213-019-134:3514426:3516113 [3] NCCL INFO [Service thread] Connection closed by localRank 4 2024-03-15 08:38:35.647 n213-019-134:3514429:3516108 [5] NCCL INFO [Service thread] Connection closed by localRank 4 2024-03-15 08:38:35.794 n213-019-134:3514423:3516115 [0] NCCL INFO [Service thread] Connection closed by localRank 7 2024-03-15 08:38:35.794 n213-019-134:3514431:3516114 [7] NCCL INFO [Service thread] Connection closed by localRank 7 2024-03-15 08:38:35.794 n213-019-134:3514430:3516111 [6] NCCL INFO [Service thread] Connection closed by localRank 7 2024-03-15 08:38:35.838 n213-019-134:3514425:3516110 [2] NCCL INFO [Service thread] Connection closed by localRank 3 2024-03-15 08:38:35.838 n213-019-134:3514428:3516109 [4] NCCL INFO [Service thread] Connection closed by localRank 3 2024-03-15 08:38:35.838 n213-019-134:3514426:3516113 [3] NCCL INFO [Service thread] Connection closed by localRank 3 2024-03-15 08:38:36.050 n213-019-134:3514425:3516110 [2] NCCL INFO [Service thread] Connection closed by localRank 2 2024-03-15 08:38:36.050 n213-019-134:3514424:3516112 [1] NCCL INFO [Service thread] Connection closed by localRank 2 2024-03-15 08:38:36.050 
n213-019-134:3514426:3516113 [3] NCCL INFO [Service thread] Connection closed by localRank 2 2024-03-15 08:38:36.124 n213-019-134:3514426:3514426 [3] NCCL INFO comm 0x80ceaa80 rank 3 nranks 8 cudaDev 3 busId 4e000 - Abort COMPLETE 2024-03-15 08:38:36.150 n213-019-134:3514423:3516115 [0] NCCL INFO [Service thread] Connection closed by localRank 1 2024-03-15 08:38:36.150 n213-019-134:3514424:3516112 [1] NCCL INFO [Service thread] Connection closed by localRank 1 2024-03-15 08:38:36.150 n213-019-134:3514425:3516110 [2] NCCL INFO [Service thread] Connection closed by localRank 1 2024-03-15 08:38:36.165 n213-019-134:3514428:3516109 [4] NCCL INFO [Service thread] Connection closed by localRank 5 2024-03-15 08:38:36.165 n213-019-134:3514430:3516111 [6] NCCL INFO [Service thread] Connection closed by localRank 5 2024-03-15 08:38:36.165 n213-019-134:3514429:3516108 [5] NCCL INFO [Service thread] Connection closed by localRank 5 2024-03-15 08:38:36.171 n213-019-134:3514429:3516108 [5] NCCL INFO [Service thread] Connection closed by localRank 6 2024-03-15 08:38:36.171 n213-019-134:3514430:3516111 [6] NCCL INFO [Service thread] Connection closed by localRank 6 2024-03-15 08:38:36.171 n213-019-134:3514431:3516114 [7] NCCL INFO [Service thread] Connection closed by localRank 6 2024-03-15 08:38:36.442 n213-019-134:3514425:3514425 [2] NCCL INFO comm 0x8124a250 rank 2 nranks 8 cudaDev 2 busId 4a000 - Abort COMPLETE 2024-03-15 08:38:36.453 n213-019-134:3514428:3514428 [4] NCCL INFO comm 0x80215290 rank 4 nranks 8 cudaDev 4 busId 89000 - Abort COMPLETE 2024-03-15 08:38:36.456 n213-019-134:3514429:3514429 [5] NCCL INFO comm 0x80f34d80 rank 5 nranks 8 cudaDev 5 busId 8e000 - Abort COMPLETE 2024-03-15 08:38:36.838 n213-019-134:3514430:3514430 [6] NCCL INFO comm 0xbf0872e0 rank 6 nranks 8 cudaDev 6 busId c5000 - Abort COMPLETE 2024-03-15 08:38:37.692 n213-019-134:3514424:3515796 [1] NCCL INFO [Service thread] Connection closed by localRank 2 2024-03-15 08:38:38.126 
n213-019-134:3514431:3515798 [7] NCCL INFO [Service thread] Connection closed by localRank 6 2024-03-15 08:38:39.958 n213-019-134:3514431:3514431 [7] NCCL INFO comm 0x7fdb9640 rank 7 nranks 8 cudaDev 7 busId c9000 - Abort COMPLETE 2024-03-15 08:38:40.910 n213-019-134:3514424:3514424 [1] NCCL INFO comm 0x811abec0 rank 1 nranks 8 cudaDev 1 busId 16000 - Abort COMPLETE 2024-03-15 08:38:41.147 n213-019-134:3514423:3515802 [0] NCCL INFO [Service thread] Connection closed by localRank 7 2024-03-15 08:38:42.356 n213-019-134:3514423:3515802 [0] NCCL INFO [Service thread] Connection closed by localRank 1 wandb: Waiting for W&B process to finish... (success). wandb: - 0.017 MB of 0.017 MB uploaded (0.000 MB deduped) wandb: \ 0.017 MB of 0.094 MB uploaded (0.000 MB deduped) wandb: wandb: Run history: wandb: train/epoch ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇▇███ wandb: train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███ wandb: train/learning_rate ▂▃▅▇███▇▇▇▇▇▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▃▃▃▃▃▂▂▂▂▂▁▁▁ wandb: train/logps/chosen ▇█▅▆▆█▆▄█▇█▇▅▅▇▇▅▆▆▇▅▆▆▅▄▅▅▄▅▄▁▄▅▃▃▆▃▃▂▃ wandb: train/logps/rejected ▇▇▆▆▅█▇▆▇█▇█▆▆▆▅▇▆▆▆▅▅▆▃▃▄▅▂▃▂▁▃▃▂▂▄▁▂▁▂ wandb: train/loss ██████████▇▇▇▇▆▆▅▆▅▅▅▄▅▄▃▄▄▂▂▂▄▁▂▂▃▂▂▁▃▃ wandb: train/losses/dpo ▆▇▆▇▆▆▆▆▆▆▅▆▆▅▅▄▄▅▄▃▄▄█▃▅▂▄▁▅▇▁▂▃▄▁▂▃▅▁▇ wandb: train/losses/sft ▃▄▂▂▂▁▅▂▂▃▂▄▄▂▁▁▃▃▃▃▃▃▄▅▅▆▅▄▄█▅▄▅▆▆▅▇▇▆▇ wandb: train/losses/total ▆▇▆▇▆▆▆▆▆▆▅▆▆▅▅▄▄▅▄▃▄▄█▃▅▂▄▁▅▇▁▂▃▄▁▂▃▅▁▇ wandb: train/ref_logps/chosen ▆█▃▄▄▇▄▁▇▆█▅▁▂▆▅▃▅▄▆▅▆▆▅▄▅▆▅▇▅▂▆▇▅▄█▄▄▃▄ wandb: train/ref_logps/rejected ▅▅▃▂▁▇▄▁▄█▄█▃▂▄▂█▆▆▆▃▄▇▂▃▅█▃▄▄▂▅▅▅▃█▃▅▂▄ wandb: train/rewards/accuracies ▂▁▂▃▄▄▅▆▅▃▅▇▄▅▇▆█▆▇▇▆█▇▇▇▅▇▇█▇▇█▇▇▇█▇█▇▆ wandb: train/rewards/chosen █████████████████▇▇▇▆▆▅▅▄▄▄▄▃▂▁▃▃▁▂▃▁▃▁▂ wandb: train/rewards/margins ▁▁▁▁▁▁▁▁▁▁▁▂▁▂▂▃▃▂▃▃▃▄▄▅▆▅▅▇▇▇▆█▇▇▇▇▇█▇▆ wandb: train/rewards/rejected █████████████▇▇▇▇▇▆▆▆▅▅▄▄▄▄▂▂▂▂▂▂▁▂▂▁▁▁▂ wandb: train/total_flos ▁ wandb: train/train_loss ▁ wandb: train/train_runtime ▁ wandb: train/train_samples_per_second ▁ wandb: train/train_steps_per_second ▁ wandb: wandb: Run summary: 
wandb: train/epoch 2.99
wandb: train/global_step 396
wandb: train/learning_rate 0.0
wandb: train/logps/chosen -28.8452
wandb: train/logps/rejected -36.77953
wandb: train/loss 0.5692
wandb: train/losses/dpo 0.70089
wandb: train/losses/sft 1.12003
wandb: train/losses/total 0.70089
wandb: train/ref_logps/chosen -23.5947
wandb: train/ref_logps/rejected -27.21145
wandb: train/rewards/accuracies 0.70312
wandb: train/rewards/chosen -0.52505
wandb: train/rewards/margins 0.43176
wandb: train/rewards/rejected -0.95681
wandb: train/total_flos 0.0
wandb: train/train_loss 0.61529
wandb: train/train_runtime 11562.7876
wandb: train/train_samples_per_second 4.4
wandb: train/train_steps_per_second 0.034
wandb:
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240315_052545-run_20240315_562c49c8/logs
2024-03-15 08:39:00.546 n213-019-134:3514423:3516115 [0] NCCL INFO [Service thread] Connection closed by localRank 0
2024-03-15 08:39:01.195 n213-019-134:3514423:3514423 [0] NCCL INFO comm 0xb195a5c0 rank 0 nranks 8 cudaDev 0 busId 10000 - Abort COMPLETE