End of training | 23b78c8 | verified
attn_norm=None, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=None, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | End of training
attn_norm=None, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=None, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=batchnorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | End of training
attn_norm=instancenorm, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | End of training
attn_norm=instancenorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | End of training
attn_norm=instancenorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=instancenorm_teacher_only, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=layernorm, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=layernorm, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=layernorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=layernorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | End of training
attn_norm=layernorm_teacher_only, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=layernorm_teacher_only, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=layernorm_teacher_only, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 70000
attn_norm=layernorm_teacher_only, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=rmsnorm, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=rmsnorm, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=rmsnorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=rmsnorm, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=rmsnorm_teacher_only, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | End of training
attn_norm=rmsnorm_teacher_only, attn_projector=mlp, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
attn_norm=rmsnorm_teacher_only, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=4, warmup_ratio=0 | Training in progress, step 99000
attn_norm=rmsnorm_teacher_only, attn_projector=orthogonal, attn_weight=5, learning_rate=0.0002, per_device_train_batch_size=8, warmup_ratio=0 | Training in progress, step 49500
0 Bytes | End of training