#!/bin/bash
#SBATCH -p g24
#SBATCH --job-name=myjob_shareGPT
#SBATCH --qos=high
#SBATCH --nodes=1                        # Number of nodes
#SBATCH --ntasks=1                       # Number of tasks (one per script)
#SBATCH --cpus-per-task=60
#SBATCH --gres=gpu:6
#SBATCH --array=1-1                      # Array range (one task per config below)
# #SBATCH --output=./slurm_outputs/run_clm_job_%A_task_%a.out  # Standard output
#SBATCH --output=/dev/null               # Discard Slurm's own stdout; the run is captured in log.txt via tee below
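# Example submission (the --output_dir value is illustrative; without it, a
# timestamped directory is created automatically below):
#
#   sbatch submit_job.sh --output_dir ./training_outputs_manual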

# Get the current date and time
current_time=$(date +"%d-%m_%H-%M")
OUTPUT_DIR="./training_outputs_job_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}_${current_time}"
export DEFAULT_CONFIG_FILE="./config/config1.yaml"

# Record the original arguments before the loop consumes them (written to cmd.txt below)
ARGS="$*"

while test $# -gt 0; do
    echo "$1"
    case "$1" in
        --output_dir)
            shift
            OUTPUT_DIR=$1
            shift
            ;;
        *)
            shift          # Skip unrecognized arguments so the loop always terminates
            ;;
    esac
done

# Create a directory unless it already exists
mkdir_if_not_exists() {
    if [ -d "$1" ]; then
        echo "Directory '$1' already exists."
    else
        mkdir -p "$1"
        echo "Directory '$1' created."
    fi
}

mkdir_if_not_exists "$OUTPUT_DIR"
mkdir_if_not_exists "$OUTPUT_DIR/experiment_code"
# Snapshot the code, environment, and invocation for reproducibility
git log -n 1 > "$OUTPUT_DIR/commit.txt"
pip freeze > "$OUTPUT_DIR/pip_freeze.txt"
echo "$0 $ARGS $current_time" > "$OUTPUT_DIR/cmd.txt"
cp ./run_clm.py "$OUTPUT_DIR/experiment_code"
cp ./prepare_sharegpt.py "$OUTPUT_DIR/experiment_code"
cp -r config "$OUTPUT_DIR/experiment_code"
cp ./submit_job.sh "$OUTPUT_DIR/experiment_code"
cp ./requirements.txt "$OUTPUT_DIR/experiment_code"

# Define the Python scripts and their corresponding input files
declare -A scripts_and_inputs=(
    ["1"]="./config/config1.yaml"
    # ["2"]="./config/config_redpajama.yaml"
    # ["3"]="./config/config1.yaml"
    # ["4"]="./config/config1.yaml"
    # ["5"]="./config/config1.yaml"
)
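# To sweep several configs, uncomment entries above and widen the array range in
# the header to match (e.g. `#SBATCH --array=1-5`); each array task then selects
# its own config via $SLURM_ARRAY_TASK_ID:
#
#   config="${scripts_and_inputs[$SLURM_ARRAY_TASK_ID]}"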

# Launch each script with its corresponding input file as a separate task
echo "Starting job array task: $SLURM_ARRAY_TASK_ID"
PARAMS="--output_dir $OUTPUT_DIR --logging_dir $OUTPUT_DIR --config_file ${scripts_and_inputs[$SLURM_ARRAY_TASK_ID]}"

srun --exclusive python run_clm.py $PARAMS 2>&1 | tee "$OUTPUT_DIR/log.txt"


# srun runs in the foreground, so there are no background jobs to wait for;
# `wait` is kept as a harmless safeguard in case background steps are added later.
wait

# Print a message indicating completion
echo "Job array task $SLURM_ARRAY_TASK_ID has finished."