---
license: gemma
base_model: google/gemma-2-2b
tags:
- trl
- sft
- generated_from_trainer
model-index:
- name: collapse_gemma-2-2b_hs2_replace_iter2_sftsd2
  results: []
---


# collapse_gemma-2-2b_hs2_replace_iter2_sftsd2

This model is a fine-tuned version of [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b), trained with TRL supervised fine-tuning (SFT) on a dataset that is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 1.5087
- Num Input Tokens Seen: 7865232
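
For readers who prefer perplexity to raw loss, the sketch below converts the reported evaluation loss. It assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language modeling but is not stated in this card.

```python
import math

# Value copied from the evaluation results above.
eval_loss = 1.5087

# Assumption: eval_loss is the mean per-token cross-entropy in nats,
# in which case perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")
```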

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-06
- train_batch_size: 8
- eval_batch_size: 16
- seed: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant_with_warmup
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1
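
The relationship between these values can be checked with a few lines of arithmetic. The sketch below is illustrative only: `num_devices` and `total_steps` are assumptions (a single device is consistent with 8 × 16 = 128, and roughly 145 optimizer steps is estimated from step 140 falling at epoch 0.9659 in the training results), and `lr_at` is a hypothetical helper that mimics a constant-with-warmup schedule rather than the Trainer's own code.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 16
learning_rate = 8e-06
warmup_ratio = 0.05

# Assumption: training ran on a single device.
num_devices = 1

# Sequences consumed per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 145 optimizer steps in one epoch (140 / 0.9659).
total_steps = 145
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear ramp from 0 to learning_rate during warmup, then constant."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    return learning_rate


print(total_train_batch_size, warmup_steps, lr_at(4), lr_at(100))
```

Under these assumptions the learning rate reaches its full value of 8e-06 after about 8 steps and stays there for the rest of the epoch.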

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3956          | 0                 |
| 1.7513        | 0.0345 | 5    | 1.3040          | 274624            |
| 1.4363        | 0.0690 | 10   | 1.1967          | 546200            |
| 1.2021        | 0.1035 | 15   | 1.1697          | 817584            |
| 1.0165        | 0.1380 | 20   | 1.1870          | 1088176           |
| 0.9588        | 0.1725 | 25   | 1.2338          | 1358336           |
| 0.8560        | 0.2070 | 30   | 1.3377          | 1637992           |
| 0.6142        | 0.2415 | 35   | 1.3806          | 1911376           |
| 0.5705        | 0.2760 | 40   | 1.4600          | 2181176           |
| 0.5098        | 0.3105 | 45   | 1.5034          | 2462856           |
| 0.3225        | 0.3450 | 50   | 1.5081          | 2737752           |
| 0.3129        | 0.3795 | 55   | 1.5481          | 3012656           |
| 0.3444        | 0.4140 | 60   | 1.4783          | 3279744           |
| 0.2324        | 0.4485 | 65   | 1.4703          | 3547808           |
| 0.2340        | 0.4830 | 70   | 1.4699          | 3817328           |
| 0.2621        | 0.5175 | 75   | 1.4305          | 4097184           |
| 0.1199        | 0.5520 | 80   | 1.4580          | 4367848           |
| 0.1915        | 0.5865 | 85   | 1.4274          | 4640592           |
| 0.2214        | 0.6210 | 90   | 1.4877          | 4922032           |
| 0.1506        | 0.6555 | 95   | 1.4413          | 5193088           |
| 0.1584        | 0.6900 | 100  | 1.4564          | 5464864           |
| 0.2169        | 0.7245 | 105  | 1.4504          | 5739032           |
| 0.1219        | 0.7589 | 110  | 1.4286          | 6012736           |
| 0.1687        | 0.7934 | 115  | 1.4840          | 6274808           |
| 0.1776        | 0.8279 | 120  | 1.4578          | 6548312           |
| 0.1197        | 0.8624 | 125  | 1.4703          | 6821112           |
| 0.1035        | 0.8969 | 130  | 1.4563          | 7098736           |
| 0.1298        | 0.9314 | 135  | 1.4510          | 7369552           |
| 0.0958        | 0.9659 | 140  | 1.4814          | 7640632           |
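
The table can be scanned programmatically to locate the checkpoint with the lowest validation loss. The sketch below hard-codes the validation losses from the table (evaluated every 5 steps); the variable names are illustrative. In this run the minimum falls early, at step 15, while the training loss keeps decreasing afterwards, a pattern that is often read as overfitting, although the card itself does not comment on it.

```python
# Validation losses copied from the table above, one per evaluation.
val_losses = [
    1.3956, 1.3040, 1.1967, 1.1697, 1.1870, 1.2338, 1.3377, 1.3806,
    1.4600, 1.5034, 1.5081, 1.5481, 1.4783, 1.4703, 1.4699, 1.4305,
    1.4580, 1.4274, 1.4877, 1.4413, 1.4564, 1.4504, 1.4286, 1.4840,
    1.4578, 1.4703, 1.4563, 1.4510, 1.4814,
]

# Evaluations were logged at steps 0, 5, 10, ..., 140.
steps = list(range(0, 145, 5))
assert len(steps) == len(val_losses)

# Pick the step with the lowest validation loss.
best_step, best_loss = min(zip(steps, val_losses), key=lambda pair: pair[1])

# Compare the last logged validation loss with the pre-training baseline.
initial_loss = val_losses[0]
last_logged_loss = val_losses[-1]

print(f"best step: {best_step}, validation loss: {best_loss}")
print(f"change from step 0 to step 140: {last_logged_loss - initial_loss:+.4f}")
```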


### Framework versions

- Transformers 4.44.0
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1