LLM Fine-Tuning Parameters
class autotrain.trainers.clm.params.LLMTrainingParams
< source >( model: str = 'gpt2' project_name: str = 'project-name' data_path: str = 'data' train_split: str = 'train' valid_split: Optional = None add_eos_token: bool = True block_size: Union = -1 model_max_length: int = 2048 padding: Optional = 'right' trainer: str = 'default' use_flash_attention_2: bool = False log: str = 'none' disable_gradient_checkpointing: bool = False logging_steps: int = -1 eval_strategy: str = 'epoch' save_total_limit: int = 1 auto_find_batch_size: bool = False mixed_precision: Optional = None lr: float = 3e-05 epochs: int = 1 batch_size: int = 2 warmup_ratio: float = 0.1 gradient_accumulation: int = 4 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 chat_template: Optional = None quantization: Optional = 'int4' target_modules: Optional = 'all-linear' merge_adapter: bool = False peft: bool = False lora_r: int = 16 lora_alpha: int = 32 lora_dropout: float = 0.05 model_ref: Optional = None dpo_beta: float = 0.1 max_prompt_length: int = 128 max_completion_length: Optional = None prompt_text_column: Optional = None text_column: str = 'text' rejected_text_column: Optional = None push_to_hub: bool = False username: Optional = None token: Optional = None unsloth: bool = False distributed_backend: Optional = None )
Parameters
- model (str) — Model name to be used for training. Default is “gpt2”.
- project_name (str) — Name of the project and output directory. Default is “project-name”.
- data_path (str) — Path to the dataset. Default is “data”.
- train_split (str) — Configuration for the training data split. Default is “train”.
- valid_split (Optional[str]) — Configuration for the validation data split. Default is None.
- add_eos_token (bool) — Whether to add an EOS token at the end of sequences. Default is True.
- block_size (Union[int, List[int]]) — Size of the blocks for training, can be a single integer or a list of integers. Default is -1.
- model_max_length (int) — Maximum length of the model input. Default is 2048.
- padding (Optional[str]) — Side on which to pad sequences (left or right). Default is “right”.
- trainer (str) — Type of trainer to use. Default is “default”.
- use_flash_attention_2 (bool) — Whether to use flash attention version 2. Default is False.
- log (str) — Logging method for experiment tracking. Default is “none”.
- disable_gradient_checkpointing (bool) — Whether to disable gradient checkpointing. Default is False.
- logging_steps (int) — Number of steps between logging events. Default is -1.
- eval_strategy (str) — Strategy for evaluation (e.g., ‘epoch’). Default is “epoch”.
- save_total_limit (int) — Maximum number of checkpoints to keep. Default is 1.
- auto_find_batch_size (bool) — Whether to automatically find the optimal batch size. Default is False.
- mixed_precision (Optional[str]) — Type of mixed precision to use (e.g., ‘fp16’, ‘bf16’, or None). Default is None.
- lr (float) — Learning rate for training. Default is 3e-5.
- epochs (int) — Number of training epochs. Default is 1.
- batch_size (int) — Batch size for training. Default is 2.
- warmup_ratio (float) — Proportion of training to perform learning rate warmup. Default is 0.1.
- gradient_accumulation (int) — Number of steps to accumulate gradients before updating. Default is 4.
- optimizer (str) — Optimizer to use for training. Default is “adamw_torch”.
- scheduler (str) — Learning rate scheduler to use. Default is “linear”.
- weight_decay (float) — Weight decay to apply to the optimizer. Default is 0.0.
- max_grad_norm (float) — Maximum norm for gradient clipping. Default is 1.0.
- seed (int) — Random seed for reproducibility. Default is 42.
- chat_template (Optional[str]) — Template for chat-based models, options include: None, zephyr, chatml, or tokenizer. Default is None.
- quantization (Optional[str]) — Quantization method to use (e.g., ‘int4’, ‘int8’, or None). Default is “int4”.
- target_modules (Optional[str]) — Target modules for quantization or fine-tuning. Default is “all-linear”.
- merge_adapter (bool) — Whether to merge the adapter layers. Default is False.
- peft (bool) — Whether to use Parameter-Efficient Fine-Tuning (PEFT). Default is False.
- lora_r (int) — Rank of the LoRA matrices. Default is 16.
- lora_alpha (int) — Alpha parameter for LoRA. Default is 32.
- lora_dropout (float) — Dropout rate for LoRA. Default is 0.05.
- model_ref (Optional[str]) — Reference model for DPO trainer. Default is None.
- dpo_beta (float) — Beta parameter for DPO trainer. Default is 0.1.
- max_prompt_length (int) — Maximum length of the prompt. Default is 128.
- max_completion_length (Optional[int]) — Maximum length of the completion. Default is None.
- prompt_text_column (Optional[str]) — Column name for the prompt text. Default is None.
- text_column (str) — Column name for the text data. Default is “text”.
- rejected_text_column (Optional[str]) — Column name for the rejected text data. Default is None.
- push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
- username (Optional[str]) — Hugging Face username for authentication. Default is None.
- token (Optional[str]) — Hugging Face token for authentication. Default is None.
- unsloth (bool) — Whether to use the unsloth library. Default is False.
- distributed_backend (Optional[str]) — Backend to use for distributed training. Default is None.
LLMTrainingParams: Parameters for training a language model using the autotrain library.
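For orientation, the snippet below sketches how these fields can be set when constructing the config object directly in Python. It only uses parameters documented above; the model id, dataset path, and the lowercase trainer name `"sft"` are assumptions/placeholders to adapt to your own run, not a canonical recipe.

```python
from autotrain.trainers.clm.params import LLMTrainingParams

# Hypothetical SFT-style configuration with PEFT/LoRA and int4 quantization.
# Field names come from the parameter list above; the concrete values
# (model id, data path, project name) are placeholders.
params = LLMTrainingParams(
    model="meta-llama/Llama-3.2-1B",   # placeholder model id
    project_name="my-llm-finetune",    # also used as the output directory
    data_path="data",                  # local folder or Hub dataset id
    train_split="train",
    text_column="text",
    trainer="sft",                     # assumed lowercase trainer name
    block_size=1024,
    model_max_length=2048,
    epochs=1,
    batch_size=2,
    lr=3e-5,
    gradient_accumulation=4,
    mixed_precision="bf16",
    peft=True,
    quantization="int4",
    lora_r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
```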
Task-specific parameters
The length parameters differ across trainers, since some trainers require more context than others.
- block_size: This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
- model_max_length: Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
- max_prompt_length: Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input. Used only for the `orpo` and `dpo` trainers.
- max_completion_length: Completion length to use. For `orpo`, this applies to encoder-decoder models only; for `dpo`, it is the length of the completion text.
NOTE:
- block_size cannot be greater than model_max_length!
- max_prompt_length cannot be greater than model_max_length!
- max_prompt_length cannot be greater than block_size!
- max_completion_length cannot be greater than model_max_length!
- max_completion_length cannot be greater than block_size!
NOTE: Not following these constraints will result in an error or NaN losses.
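A cheap way to catch these misconfigurations before launching a run is a small pre-flight check. The helper below is not part of the autotrain API; it is a hypothetical sketch written against the fields documented above, and it assumes `block_size` is a single integer (-1 meaning "determine automatically").

```python
def check_length_constraints(params) -> None:
    """Hypothetical pre-flight check for the length constraints above.

    `params` is an LLMTrainingParams instance; a block_size of -1 means
    "determine automatically" and is therefore skipped.
    """
    block = params.block_size
    if isinstance(block, int) and block != -1:
        if block > params.model_max_length:
            raise ValueError("block_size cannot be greater than model_max_length")
        if params.max_prompt_length > block:
            raise ValueError("max_prompt_length cannot be greater than block_size")
        if params.max_completion_length is not None and params.max_completion_length > block:
            raise ValueError("max_completion_length cannot be greater than block_size")
    if params.max_prompt_length > params.model_max_length:
        raise ValueError("max_prompt_length cannot be greater than model_max_length")
    if params.max_completion_length is not None and params.max_completion_length > params.model_max_length:
        raise ValueError("max_completion_length cannot be greater than model_max_length")
```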
Generic Trainer
--add_eos_token, --add-eos-token
Toggle whether to automatically add an end-of-sequence (EOS) token at the end of texts, which can be critical for certain types of models, such as language models. Only used for the `default` trainer.
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
Specify the block size for processing sequences. This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
SFT Trainer
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
Specify the block size for processing sequences. This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
Reward Trainer
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
Specify the block size for processing sequences. This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
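Putting the reward-trainer parameters together, a configuration might look like the sketch below. It assumes the reward trainer reads the accepted response from `text_column` and the rejected response from `rejected_text_column`, and that `"reward"` is the lowercase trainer name; the model id and column names are placeholders.

```python
from autotrain.trainers.clm.params import LLMTrainingParams

# Hypothetical reward-modeling configuration. Column names are placeholders
# for a chosen/rejected preference dataset.
reward_params = LLMTrainingParams(
    model="gpt2",                      # placeholder model id
    project_name="my-reward-model",
    data_path="data",
    train_split="train",
    trainer="reward",                  # assumed lowercase trainer name
    text_column="chosen",              # accepted response
    rejected_text_column="rejected",   # rejected response
    block_size=1024,
    model_max_length=2048,
)
```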
DPO Trainer
--dpo-beta DPO_BETA
Beta parameter for the DPO trainer.
--model-ref MODEL_REF
Reference model to use for DPO when not using PEFT.
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
Specify the block size for processing sequences. This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
--max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input.
Used only for the `orpo` and `dpo` trainers.
--max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
Completion length to use. For `orpo`, this applies to encoder-decoder models only; for `dpo`, it is the length of the completion text.
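For the DPO trainer, the length flags above combine with the preference-specific parameters (`model_ref`, `dpo_beta`, `prompt_text_column`, `rejected_text_column`). The sketch below is a hypothetical configuration under the same assumptions as the earlier examples: model ids and column names are placeholders, `"dpo"` is assumed to be the lowercase trainer name, and `model_ref` is only set because PEFT is not enabled.

```python
from autotrain.trainers.clm.params import LLMTrainingParams

# Hypothetical DPO configuration. The reference model is only needed when
# not training with PEFT; values are placeholders.
dpo_params = LLMTrainingParams(
    model="gpt2",                      # policy model (placeholder)
    model_ref="gpt2",                  # reference model (placeholder)
    project_name="my-dpo-run",
    data_path="data",
    trainer="dpo",                     # assumed lowercase trainer name
    prompt_text_column="prompt",       # prompt column (placeholder)
    text_column="chosen",              # preferred completion (placeholder)
    rejected_text_column="rejected",   # rejected completion (placeholder)
    dpo_beta=0.1,
    block_size=1024,
    model_max_length=2048,
    max_prompt_length=128,
    max_completion_length=256,         # must stay <= block_size and model_max_length
)
```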
ORPO Trainer
--block_size BLOCK_SIZE, --block-size BLOCK_SIZE
Specify the block size for processing sequences. This is the maximum sequence length, or the length of one block of text. Setting it to -1 determines the block size automatically. Default is -1.
--model_max_length MODEL_MAX_LENGTH, --model-max-length MODEL_MAX_LENGTH
Set the maximum length for the model to process in a single batch, which can affect both performance and memory usage. Default is 2048.
--max_prompt_length MAX_PROMPT_LENGTH, --max-prompt-length MAX_PROMPT_LENGTH
Specify the maximum length for prompts used in training, particularly relevant for tasks requiring initial contextual input.
Used only for the `orpo` and `dpo` trainers.
--max_completion_length MAX_COMPLETION_LENGTH, --max-completion-length MAX_COMPLETION_LENGTH
Completion length to use. For `orpo`, this applies to encoder-decoder models only; for `dpo`, it is the length of the completion text.
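An ORPO configuration looks much like the DPO one, except that no `model_ref` is needed (the parameter list above documents `model_ref` only for the DPO trainer). The sketch below is hypothetical: values are placeholders and `"orpo"` is assumed to be the lowercase trainer name.

```python
from autotrain.trainers.clm.params import LLMTrainingParams

# Hypothetical ORPO configuration: no reference model is set.
orpo_params = LLMTrainingParams(
    model="gpt2",                      # placeholder model id
    project_name="my-orpo-run",
    data_path="data",
    trainer="orpo",                    # assumed lowercase trainer name
    prompt_text_column="prompt",
    text_column="chosen",
    rejected_text_column="rejected",
    block_size=1024,
    model_max_length=2048,
    max_prompt_length=128,
    max_completion_length=256,
)
```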