Cyrus1020 committed on
Commit
a855aa9
1 Parent(s): 8df1548

Upload my_model_card.md

Files changed (1)
  1. my_model_card.md +144 -0
my_model_card.md ADDED
@@ -0,0 +1,144 @@
---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
---

# Model Card for y36340hc-z89079mb-AV

<!-- Provide a quick summary of what the model is/does. -->

This is a binary classification model that was trained with prompt inputs to
detect whether two pieces of text were written by the same author.
20
+ ## Model Details
21
+
22
+ ### Model Description
23
+
24
+ <!-- Provide a longer summary of what this model is. -->
25
+
26
+ This model is based upon a Llama2 model that was fine-tuned
27
+ on 30K pairs of texts for authorship verification. The model is trained with prompt inputs to utilize the model's linguistic knowledge.
28
+ To run the model, the demo code is provided in demo.ipynb submitted.
29
+ It is advised to use the pre-processing and post-processing functions (provided in demo.ipynb) along with the model for best results.
30
+
31
+ - **Developed by:** Hei Chan and Mehedi Bari
32
+ - **Language(s):** English
33
+ - **Model type:** Supervised
34
+ - **Model architecture:** Transformers
35
+ - **Finetuned from model [optional]:** meta-llama/Llama-2-7b-hf
36
+
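
The snippet below is a minimal sketch, not the submitted demo.ipynb, of how the LoRA adapter might be combined with the gated Llama-2 base model for inference. It assumes the model is used as a causal LM that generates a Yes/No answer; the adapter path and prompt wording are illustrative placeholders, and the authoritative pre-/post-processing lives in demo.ipynb.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # gated; access must be granted first
ADAPTER_PATH = "path/to/lora_adapter"     # hypothetical local adapter directory

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_PATH)
model.eval()

def same_author(text_a: str, text_b: str) -> str:
    # Illustrative prompt; the actual template is defined in demo.ipynb.
    prompt = (
        "Determine whether the following two texts were written by the same "
        f"author. Answer Yes or No.\n\nText 1: {text_a}\n\nText 2: {text_b}\n\nAnswer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt",
                       truncation=True, max_length=4096).to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=3)
    answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
    return answer.strip()
```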

### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288

## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

30K pairs of texts drawn from emails, news articles and blog posts.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->

- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
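
These values map fairly directly onto the peft and transformers APIs. The sketch below is an illustrative reconstruction, not the submitted training script: dataset handling, prompt construction, and any quantisation needed to fit the 16 GB V100 are omitted, the output directory name is hypothetical, and `use_rslora` requires a reasonably recent peft release.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# LoRA configuration matching the values listed above.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    use_rslora=True,        # rank-stabilised LoRA
    task_type="CAUSAL_LM",
)

# Optimiser and schedule settings matching the values listed above.
training_args = TrainingArguments(
    output_dir="av-llama2-lora",       # hypothetical output directory
    learning_rate=1e-5,
    weight_decay=0.001,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",          # requires bitsandbytes
    max_grad_norm=0.3,
    num_train_epochs=1,
)

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable
```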

#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

- trained on: V100 16GB
- overall training time: 59 hours
- duration per training epoch: 59 minutes
- model size: ~27 GB
- LoRA adapter size: 192 MB

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development set provided, amounting to 6K pairs.

#### Metrics

<!-- These are the evaluation metrics being used. -->

- Precision
- Recall
- F1-score
- Accuracy
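
For reference, the sketch below shows one way these metrics could be computed with scikit-learn. The averaging scheme behind the reported figures is not stated in this card, so macro averaging is only an assumption, and the label arrays are toy placeholders.

```python
# Toy illustration of the listed metrics; not the submitted evaluation code.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0]  # gold same-author labels (placeholder values)
y_pred = [1, 0, 0, 1, 0]  # model predictions (placeholder values)

print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall:   ", recall_score(y_true, y_pred, average="macro"))
print("F1-score: ", f1_score(y_true, y_pred, average="macro"))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```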

### Results

- Precision: 80.6%
- Recall: 80.4%
- F1-score: 80.3%
- Accuracy: 80.4%

## Technical Specifications

### Hardware

- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060

### Software

- Transformers 4.18.0
- PyTorch 1.11.0+cu113

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any inputs (concatenation of two sequences plus prompt words) longer than
4096 subwords will be truncated by the model.
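
A quick way to check whether a given pair would hit this limit is sketched below; the prompt template is an illustrative placeholder (the real one is defined in demo.ipynb), and the tokenizer is assumed to be the Llama-2 one.

```python
# Rough length check against the 4096-token context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # gated repo
MAX_LEN = 4096

def will_be_truncated(text_a: str, text_b: str) -> bool:
    # Placeholder prompt; the real template is defined in demo.ipynb.
    prompt = f"Text 1: {text_a}\n\nText 2: {text_b}\n\nAnswer:"
    return len(tokenizer(prompt)["input_ids"]) > MAX_LEN
```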

## Additional Information

<!-- Any other information that would be useful for other people to know. -->

The hyperparameters were determined by experimentation
with different values, such that the model could successfully train on the V100 with a gradual decrease in training loss. Since LoRA is used, the Llama2 base model must also
be loaded for the model to function. Access to the pre-trained Llama2 model must be requested; applications can be made at https://huggingface.co/meta-llama.
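
As one possible workflow (not part of the submitted materials), authentication with a Hugging Face access token can be performed before downloading the gated base checkpoint:

```python
# Authenticate with the Hub before downloading the gated base model.
# The token string below is a placeholder; use your own access token
# once access to the Llama-2 weights has been approved.
from huggingface_hub import login

login(token="hf_...")  # placeholder token

# The base model can then be downloaded as usual, e.g.:
# AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```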