---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
---

# Model Card for y36340hc-z89079mb-AV

<!-- Provide a quick summary of what the model is/does. -->

This is a binary classification model, trained with prompt inputs, that detects whether two pieces of text were written by the same author.


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This model is based on Llama 2, fine-tuned on 30K pairs of texts for authorship verification. It is trained with prompt inputs so that it can exploit the base model's linguistic knowledge. Demo code for running the model is provided in the submitted demo.ipynb; for best results, use the pre-processing and post-processing functions from demo.ipynb together with the model.
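
For orientation only (demo.ipynb remains the reference), the sketch below shows one plausible way to load the LoRA adaptor on top of the gated Llama-2-7b-hf base model and query it with a prompt. The adaptor path, the prompt wording, and the assumption of a causal-LM (prompt-completion) head are placeholders rather than the authors' exact setup.

```python
# Minimal inference sketch, assuming the LoRA adaptor sits on top of
# meta-llama/Llama-2-7b-hf and the task is framed as prompt completion.
# The adaptor path and prompt wording are placeholders; the exact prompt
# template and pre/post-processing are defined in demo.ipynb.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Llama-2-7b-hf"   # gated: request access first
ADAPTOR_PATH = "./lora-adaptor"        # hypothetical local adaptor directory

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTOR_PATH)
model.eval()

text_a = "First example text ..."
text_b = "Second example text ..."
# Hypothetical prompt; the real template is in demo.ipynb.
prompt = f"Text 1: {text_a}\nText 2: {text_b}\nWritten by the same author? Answer yes or no:"

inputs = tokenizer(prompt, return_tensors="pt",
                   truncation=True, max_length=4096).to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                          skip_special_tokens=True)
print(answer.strip())
```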

- **Developed by:** Hei Chan and Mehedi Bari
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model:** meta-llama/Llama-2-7b-hf

### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288

## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

30K pairs of texts drawn from emails, news articles and blog posts.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->


- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
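
For reference, these values map onto the usual PEFT/Transformers configuration objects roughly as sketched below; the LoRA target modules, the task type, and the output directory are assumptions, since they are not listed in this card.

```python
# Sketch of how the listed hyperparameters could be expressed with
# peft.LoraConfig and transformers.TrainingArguments. Any value not listed
# in the card (target_modules, task_type, output_dir) is an assumption.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    use_rslora=True,                      # RSLoRA: True
    target_modules=["q_proj", "v_proj"],  # assumption: not stated in the card
    task_type="CAUSAL_LM",                # assumption
)

training_args = TrainingArguments(
    output_dir="./av-lora",               # hypothetical output directory
    learning_rate=1e-5,
    weight_decay=0.001,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    max_grad_norm=0.3,
    num_train_epochs=1,
)
```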

#### Speeds, Sizes, Times

<!-- This section provides information about how roughly how long it takes to train the model and the size of the resulting model. -->


- trained on: V100 16GB
- overall training time: 59 hours
- duration per training epoch: 59 minutes
- model size: ~27 GB
- LoRA adaptor size: 192 MB

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development set provided, amounting to 6K pairs.

#### Metrics

<!-- These are the evaluation metrics being used. -->


- Precision
- Recall
- F1-score
- Accuracy
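
A minimal sketch of how these metrics can be computed with scikit-learn is given below; the labels and predictions are placeholders, and the card does not state whether the reported figures are macro-averaged or computed on the positive class.

```python
# Sketch of the evaluation metrics using scikit-learn; y_true / y_pred are
# placeholders for the development-set labels and the model's predictions.
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0]   # hypothetical gold labels (same author = 1)
y_pred = [1, 0, 1, 0, 0]   # hypothetical model predictions

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```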

### Results


- Precision: 80.6%
- Recall: 80.4%
- F1 score: 80.3%
- Accuracy: 80.4%

## Technical Specifications

### Hardware


- Mode: inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
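
Since the full-precision checkpoint (~27 GB) far exceeds 6 GB of VRAM, the figures above presumably assume quantized loading. A hedged sketch using 4-bit quantization via bitsandbytes follows; the actual loading code is in demo.ipynb and may differ, and note that `BitsAndBytesConfig` requires a considerably newer Transformers release than the version listed under Software.

```python
# Sketch of 4-bit quantized loading so the 7B base model fits in ~6 GB VRAM.
# This is an assumption about how the VRAM figure is achieved, not the
# authors' confirmed setup; see demo.ipynb for the actual loading code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```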

### Software


- Transformers 4.18.0
- PyTorch 1.11.0+cu113

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any input (the concatenation of the two texts plus the prompt words) longer than 4,096 subword tokens will be truncated by the model.
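
To check whether a given pair will be truncated before it is fed to the model, the token count can be measured with the tokenizer, as in the sketch below (the prompt string is a placeholder):

```python
# Length check: anything beyond the 4,096-token context window is truncated.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder for the assembled input (two texts plus the prompt words).
prompt = "Text 1: ...\nText 2: ...\nWritten by the same author? Answer yes or no:"

n_tokens = len(tokenizer(prompt)["input_ids"])
if n_tokens > 4096:
    print(f"Warning: {n_tokens} tokens; the input will be truncated to 4096.")
```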

## Additional Information

<!-- Any other information that would be useful for other people to know. -->

The hyperparameters were determined by experimenting with different values so that the model could successfully train on the V100 with a gradual decrease in training loss. Because LoRA is used, the Llama 2 base model must also be loaded for the model to function. Access to the pre-trained Llama 2 model must be requested; it can be applied for at https://huggingface.co/meta-llama.
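
Once access to the base model has been granted, authenticating the local environment (for example with huggingface_hub) allows the gated weights to be downloaded; a minimal sketch:

```python
# Authenticate so the gated meta-llama/Llama-2-7b-hf weights can be downloaded.
# Assumes access has already been granted on the Hugging Face Hub.
from huggingface_hub import login

login()  # prompts for a Hugging Face access token
```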