FINGU-AI/FinguEm7b

@@ -237,160 +237,6 @@ You can finetune this model on your own dataset.
 *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
 -->
-## Training Details
-### Training Hyperparameters
-#### Non-Default Hyperparameters
-- `eval_strategy`: steps
-- `per_device_train_batch_size`: 2
-- `per_device_eval_batch_size`: 2
-- `gradient_accumulation_steps`: 8
-- `learning_rate`: 2e-05
-- `num_train_epochs`: 1
-- `lr_scheduler_type`: cosine
-- `warmup_ratio`: 0.1
-- `warmup_steps`: 5
-- `bf16`: True
-- `tf32`: True
-- `optim`: adamw_torch_fused
-- `gradient_checkpointing`: True
-- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
-- `batch_sampler`: no_duplicates
-#### All Hyperparameters
-<details><summary>Click to expand</summary>
-- `overwrite_output_dir`: False
-- `do_predict`: False
-- `eval_strategy`: steps
-- `prediction_loss_only`: True
-- `per_device_train_batch_size`: 2
-- `per_device_eval_batch_size`: 2
-- `per_gpu_train_batch_size`: None
-- `per_gpu_eval_batch_size`: None
-- `gradient_accumulation_steps`: 8
-- `eval_accumulation_steps`: None
-- `learning_rate`: 2e-05
-- `weight_decay`: 0.0
-- `adam_beta1`: 0.9
-- `adam_beta2`: 0.999
-- `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 1
-- `max_steps`: -1
-- `lr_scheduler_type`: cosine
-- `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.1
-- `warmup_steps`: 5
-- `log_level`: passive
-- `log_level_replica`: warning
-- `log_on_each_node`: True
-- `logging_nan_inf_filter`: True
-- `save_safetensors`: True
-- `save_on_each_node`: False
-- `save_only_model`: False
-- `restore_callback_states_from_checkpoint`: False
-- `no_cuda`: False
-- `use_cpu`: False
-- `use_mps_device`: False
-- `seed`: 42
-- `data_seed`: None
-- `jit_mode_eval`: False
-- `use_ipex`: False
-- `bf16`: True
-- `fp16`: False
-- `fp16_opt_level`: O1
-- `half_precision_backend`: auto
-- `bf16_full_eval`: False
-- `fp16_full_eval`: False
-- `tf32`: True
-- `local_rank`: 3
-- `ddp_backend`: None
-- `tpu_num_cores`: None
-- `tpu_metrics_debug`: False
-- `debug`: []
-- `dataloader_drop_last`: True
-- `dataloader_num_workers`: 0
-- `dataloader_prefetch_factor`: None
-- `past_index`: -1
-- `disable_tqdm`: False
-- `remove_unused_columns`: True
-- `label_names`: None
-- `load_best_model_at_end`: False
-- `ignore_data_skip`: False
-- `fsdp`: []
-- `fsdp_min_num_params`: 0
-- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
-- `fsdp_transformer_layer_cls_to_wrap`: None
-- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
-- `deepspeed`: None
-- `label_smoothing_factor`: 0.0
-- `optim`: adamw_torch_fused
-- `optim_args`: None
-- `adafactor`: False
-- `group_by_length`: False
-- `length_column_name`: length
-- `ddp_find_unused_parameters`: None
-- `ddp_bucket_cap_mb`: None
-- `ddp_broadcast_buffers`: False
-- `dataloader_pin_memory`: True
-- `dataloader_persistent_workers`: False
-- `skip_memory_metrics`: True
-- `use_legacy_prediction_loop`: False
-- `push_to_hub`: False
-- `resume_from_checkpoint`: None
-- `hub_model_id`: None
-- `hub_strategy`: every_save
-- `hub_private_repo`: False
-- `hub_always_push`: False
-- `gradient_checkpointing`: True
-- `gradient_checkpointing_kwargs`: {'use_reentrant': False}
-- `include_inputs_for_metrics`: False
-- `eval_do_concat_batches`: True
-- `fp16_backend`: auto
-- `push_to_hub_model_id`: None
-- `push_to_hub_organization`: None
-- `mp_parameters`:
-- `auto_find_batch_size`: False
-- `full_determinism`: False
-- `torchdynamo`: None
-- `ray_scope`: last
-- `ddp_timeout`: 1800
-- `torch_compile`: False
-- `torch_compile_backend`: None
-- `torch_compile_mode`: None
-- `dispatch_batches`: None
-- `split_batches`: None
-- `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`: False
-- `neftune_noise_alpha`: None
-- `optim_target_modules`: None
-- `batch_eval_metrics`: False
-- `batch_sampler`: no_duplicates
-- `multi_dataset_batch_sampler`: proportional
-</details>
-### Training Logs
-| Epoch  | Step | Training Loss | reranking loss | retrieval loss | sts loss |
-|:------:|:----:|:-------------:|:--------------:|:--------------:|:--------:|
-| 0.1958 | 500  | 0.5225        | 0.3536         | 0.0413         | 0.5239   |
-| 0.3916 | 1000 | 0.2167        | 0.2598         | 0.0386         | 0.4230   |
-| 0.5875 | 1500 | 0.1924        | 0.2372         | 0.0320         | 0.4046   |
-| 0.7833 | 2000 | 0.1795        | 0.2292         | 0.0310         | 0.4005   |
-| 0.9791 | 2500 | 0.1755        | 0.2276         | 0.0306         | 0.3995   |
-### Framework Versions
-- Python: 3.10.12
-- Sentence Transformers: 3.0.1
-- Transformers: 4.41.2
-- PyTorch: 2.2.0+cu121
-- Accelerate: 0.32.1
-- Datasets: 2.20.0
-- Tokenizers: 0.19.1
 ## Citation
 ### BibTeX

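
Note on the removed hyperparameters: the list sets both `warmup_ratio` (0.1) and `warmup_steps` (5). As far as I know, the Transformers `Trainer` uses an explicit `warmup_steps` greater than zero in preference to `warmup_ratio`, so the run most likely warmed up for 5 steps rather than 10% of training. The sketch below is plain Python and not the authors' training script. It reproduces that rule and derives a few quantities from the removed values. The helper names and the world size of 4 are assumptions: the card only records `local_rank`: 3, which implies at least four processes.

```python
import math

# Values copied from the removed "Non-Default Hyperparameters" list.
hparams = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "warmup_ratio": 0.1,
    "warmup_steps": 5,
}


def samples_per_optimizer_step(hp, world_size):
    """Samples consumed per optimizer update, summed over all processes."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * world_size
    )


def effective_warmup_steps(hp, total_steps):
    """Hypothetical helper: an explicit warmup_steps > 0 wins over warmup_ratio."""
    if hp["warmup_steps"] > 0:
        return hp["warmup_steps"]
    return math.ceil(total_steps * hp["warmup_ratio"])


# Rough epoch length inferred from the first log row (step 500 at epoch 0.1958).
approx_steps_per_epoch = round(500 / 0.1958)

# ASSUMPTION: 4 processes. The card does not state the actual world size.
ASSUMED_WORLD_SIZE = 4

batch = samples_per_optimizer_step(hparams, ASSUMED_WORLD_SIZE)
warmup = effective_warmup_steps(hparams, approx_steps_per_epoch)
print(f"samples per optimizer step (assumed world size): {batch}")
print(f"approximate steps per epoch: {approx_steps_per_epoch}")
print(f"effective warmup steps: {warmup}")
```

If the 10% warmup was the intended setting, `warmup_steps` would need to be left at 0 when retraining.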