|
--- |
|
library_name: transformers |
|
tags: [] |
|
--- |
|
|
|
# Model Card for Mistral-7B-v0.1 Fine-Tuned with NEFTune on CoT-Collection
|
|
|
This model is a fine-tune of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1), a pretrained generative text Large Language Model (LLM) with 7 billion parameters. It was fine-tuned with PEFT (LoRA) and the NEFTune method for improved robustness.
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).
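
Since the result of this fine-tuning is a PEFT (LoRA) adapter, inference typically loads the base model first and then attaches the adapter. Below is a minimal sketch; `adapter_id` is a placeholder for this repository's ID, and the `float16`/`device_map` choices are assumptions, not taken from this card:

```python
# Minimal inference sketch: load the base model, then attach the adapter.
# "adapter_id" is a placeholder, not a real repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "<this-repo-id>"  # placeholder: replace with this model's repo ID

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the LoRA weights

prompt = "Q: A train travels 120 km in 1.5 hours. What is its average speed?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```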
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
This model was fine-tuned on the [kaist-ai/CoT-Collection](https://huggingface.co/datasets/kaist-ai/CoT-Collection) dataset.
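
The dataset can be inspected with the `datasets` library. A quick sketch (the split name and whether `trust_remote_code` is needed depend on the dataset and library versions):

```python
# Quick look at the training data (split name is an assumption; some
# versions of this dataset use a loading script and may require
# trust_remote_code=True).
from datasets import load_dataset

dataset = load_dataset("kaist-ai/CoT-Collection", split="train")
print(dataset)     # row count and column names
print(dataset[0])  # one chain-of-thought example
```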
|
|
|
### Training Procedure |
|
|
|
This model was trained with the `SFTTrainer` and the [NEFTune](https://arxiv.org/abs/2310.05914) method. According to the paper, NEFTune adds random noise to the embedding vectors during training, which acts as a regularizer and improves instruction-tuning quality.
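
For intuition, here is a standalone sketch of the noise step the paper describes: uniform noise scaled by `alpha / sqrt(seq_len * hidden_dim)` is added to the token embeddings during training. In TRL's `SFTTrainer` this is enabled via the `neftune_noise_alpha` argument; the `alpha` used for this model is not stated in the card, so `5.0` below is just a common setting from the paper:

```python
# Illustrative sketch of the NEFTune noise step (not the training script):
# uniform noise scaled by alpha / sqrt(seq_len * hidden_dim) is added to the
# token embeddings on each forward pass during training.
import torch

def neftune_noise(embeddings: torch.Tensor, alpha: float = 5.0) -> torch.Tensor:
    # embeddings: (batch, seq_len, hidden_dim)
    _, seq_len, hidden_dim = embeddings.shape
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    return embeddings + torch.empty_like(embeddings).uniform_(-scale, scale)

emb = torch.randn(2, 16, 4096)         # dummy token embeddings
noisy = neftune_noise(emb, alpha=5.0)  # alpha=5.0 is an assumed value
```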
|
|
|
#### Training Hyperparameters |
|
|
|
- `lora_alpha`: 16
- `lora_r`: 64
- `lora_dropout`: 0.05
- `max_seq_length`: 4096
- `learning_rate`: 2e-4
- `max_grad_norm`: 0.3
- `weight_decay`: 0.001
- `gradient_checkpointing`: True
- `optim`: paged_adamw_32bit
- `use_bf16`: True
- `use_4bit`: True
- `use_nested_quant`: False
- `bnb_4bit_compute_dtype`: float16
- `bnb_4bit_quant_type`: nf4
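
For reference, here is a sketch of how the quantization and LoRA settings above map onto `BitsAndBytesConfig` and `LoraConfig`. The target modules and overall trainer wiring are assumptions, not taken from this card:

```python
# Sketch reconstructing the configs from the hyperparameters above
# (anything not listed in the card, e.g. target modules, is an assumption).
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit: True
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type: nf4
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype: float16
    bnb_4bit_use_double_quant=False,       # use_nested_quant: False
)

peft_config = LoraConfig(
    r=64,               # lora_r: 64
    lora_alpha=16,      # lora_alpha: 16
    lora_dropout=0.05,  # lora_dropout: 0.05
    task_type="CAUSAL_LM",
)
```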
|
|