---
language:
- en
license: apache-2.0
base_model: pszemraj/tFINE-base-300m
tags:
- generated_from_trainer
datasets:
- samsum
metrics:
- rouge
model-index:
- name: tFINE-base-300m-samsum
  results:
  - task:
      name: Summarization
      type: summarization
    dataset:
      name: samsum
      type: samsum
      config: samsum
      split: None
      args: samsum
    metrics:
    - name: Rouge1
      type: rouge
      value: 42.3629
library_name: transformers
pipeline_tag: summarization
---

# tFINE-base-300m-samsum

An example fine-tune of [pszemraj/tFINE-base-300m](https://hf.co/pszemraj/tFINE-base-300m) for summarization using the samsum dataset.
It achieves the following results on the evaluation set:
- Loss: 1.9820
- Rouge1: 42.3629
- Rouge2: 18.4285
- Rougel: 34.6339
- Rougelsum: 38.7792
- Gen Len: 27.8033

> [!NOTE]
> The base model was pre-trained with a context length of 1024 tokens, and this fine-tune likewise uses 1024-token inputs from samsum.
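
Below is a minimal inference sketch using the `transformers` summarization pipeline; the repo id `pszemraj/tFINE-base-300m-samsum`, the example dialogue, and the generation settings are illustrative assumptions rather than a prescribed recipe.

```python
# minimal inference sketch (assumption: the checkpoint is published as
# pszemraj/tFINE-base-300m-samsum and works with the standard pipeline API)
from transformers import pipeline

summarizer = pipeline("summarization", model="pszemraj/tFINE-base-300m-samsum")

# hypothetical samsum-style dialogue
dialogue = (
    "Anna: Are we still on for lunch tomorrow?\n"
    "Ben: Yes, 12:30 at the usual spot.\n"
    "Anna: Great, see you then!"
)

# truncate anything longer than the 1024-token context window
result = summarizer(dialogue, max_length=64, truncation=True)
print(result[0]["summary_text"])
```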

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged `Seq2SeqTrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 16
- seed: 17868
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 4.0
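
As a rough sketch only, the list above could map onto `Seq2SeqTrainingArguments` as follows; `output_dir`, `predict_with_generate`, and anything else not listed above are assumptions, not the original training script.

```python
# hedged sketch of the hyperparameters above as Seq2SeqTrainingArguments
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="tFINE-base-300m-samsum",  # assumed output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,       # 8 * 16 = 128 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=4.0,
    seed=17868,
    predict_with_generate=True,           # assumed, so eval can report ROUGE / gen len
    # Adam betas (0.9, 0.999) and epsilon 1e-08 are the optimizer defaults
)
```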

### Training results

> The epoch-3 checkpoint was kept as the final model.


| Training Loss | Epoch  | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:------:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.9528        | 0.9989 | 115  | 1.9189          | 40.093  | 18.2018 | 33.9749 | 36.9071   | 29.3333 |
| 1.5346        | 1.9978 | 230  | 1.8827          | 41.4676 | 18.3467 | 34.1909 | 38.2131   | 27.6633 |
| 1.1696        | 2.9967 | 345  | 1.9820          | 42.3629 | 18.4285 | 34.6339 | 38.7792   | 27.8033 |
| 0.9359        | 3.9957 | 460  | 2.1588          | 41.2237 | 17.8161 | 33.7101 | 37.9569   | 30.18   |
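
For reference, the ROUGE numbers above could be approximated with the `evaluate` library roughly as follows; the test slice, generation settings, and repo id are assumptions, so the scores will not match the table exactly.

```python
# minimal sketch of re-scoring the model with the `evaluate` library;
# the split slice and generation settings are assumptions, not the original setup
import evaluate
from datasets import load_dataset
from transformers import pipeline

rouge = evaluate.load("rouge")
summarizer = pipeline("summarization", model="pszemraj/tFINE-base-300m-samsum")

# small test slice for a quick sanity check (the samsum loader needs py7zr installed)
samsum = load_dataset("samsum", split="test[:32]")
predictions = [
    out["summary_text"]
    for out in summarizer(samsum["dialogue"], max_length=64, truncation=True)
]
scores = rouge.compute(predictions=predictions, references=samsum["summary"])
print({name: round(value * 100, 4) for name, value in scores.items()})
```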