---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
---
|
|
|
# Model Card for llama2-promt-av-binary-lora

<!-- Provide a quick summary of what the model is/does. -->
|
|
|
This model was trained as part of the coursework for COMP34812.
|
|
|
This is a binary classification model that was trained with prompt-formatted inputs to
detect whether two pieces of text were written by the same author.
|
|
|
|
|
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
|
|
|
This model is based on Llama 2 and was fine-tuned on 30K pairs of texts for
authorship verification. It is fine-tuned with prompt-formatted inputs so that the
base model's linguistic knowledge can be utilized for the task. Demo code is
provided in the submitted demo.ipynb; for best results, use the pre-processing and
post-processing functions provided there alongside the model.
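
A minimal usage sketch is shown below; demo.ipynb remains the authoritative reference. The sketch assumes a generative setup in which the label is produced as text, that the LoRA adapter is stored locally under a directory named `llama2-promt-av-binary-lora`, and that the prompt wording shown is only a placeholder for the template defined in demo.ipynb. If the adapter was instead trained with a sequence-classification head, load it with `AutoModelForSequenceClassification`.

```python
# Hedged sketch: load the Llama-2-7b base model in 4-bit, attach the LoRA adapter,
# and classify one pair of texts. The adapter path and prompt wording are
# illustrative assumptions; the exact template lives in demo.ipynb.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "llama2-promt-av-binary-lora"  # assumed local path to the LoRA adapter

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

text1, text2 = "First document ...", "Second document ..."
# Placeholder prompt; use the pre-processing function from demo.ipynb in practice.
prompt = (
    "Do the following two texts share the same author? Answer True or False.\n"
    f"Text 1: {text1}\nText 2: {text2}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())  # map to a 0/1 label with demo.ipynb's post-processing function
```

Loading the base model in 4-bit via bitsandbytes should keep inference within the ~6 GB of VRAM noted under Hardware below.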
|
|
|
- **Developed by:** Hei Chan and Mehedi Bari
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Fine-tuned from model:** meta-llama/Llama-2-7b-hf
|
|
|
### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288
|
|
|
## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

30K pairs of texts drawn from emails, news articles and blog posts.
|
|
|
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->
|
|
|
|
|
- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
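
For reference, the values above map onto a PEFT/Transformers configuration roughly as sketched below. This is a reconstruction rather than the actual training script; the output directory is a placeholder and the LoRA target modules are left at PEFT's defaults.

```python
# Sketch of a LoRA + TrainingArguments setup matching the hyperparameters listed above.
# Paths and unspecified options (e.g. target modules) are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    use_rslora=True,       # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    weight_decay=0.001,
    max_grad_norm=0.3,
    optim="paged_adamw_8bit",
)
```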
|
|
|
#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

- trained on: NVIDIA V100 (16 GB)
- overall training time: 59 hours
- duration per training epoch: 59 hours
- model size: ~27 GB
- LoRA adapter size: 192 MB
|
|
|
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development set provided, amounting to 6K pairs.
|
|
|
#### Metrics

<!-- These are the evaluation metrics being used. -->

- Precision
- Recall
- F1-score
- Accuracy
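
The figures reported under Results below can be computed from gold labels and post-processed binary predictions, for example with scikit-learn as sketched here; the "macro" averaging mode is an assumption, since the averaging used for the reported numbers is not stated.

```python
# Hedged sketch: computing the four metrics from binary predictions with scikit-learn.
# The "macro" averaging mode is an assumption, not taken from the evaluation code.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1]  # gold same-author labels
y_pred = [0, 1, 0, 0, 1]  # model predictions after post-processing

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}  Acc={accuracy:.3f}")
```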
|
|
|
### Results

- Precision: 80.6%
- Recall: 80.4%
- F1-score: 80.3%
- Accuracy: 80.4%
|
|
|
## Technical Specifications

### Hardware

- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
|
|
|
### Software

- Transformers
- PyTorch
- bitsandbytes
- Accelerate
|
|
|
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any input (the concatenation of the two texts plus the prompt wording) longer than
4096 subword tokens will be truncated by the model.
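
Because of this, it can be worth checking the tokenized length of a text pair before running inference. The sketch below uses the same placeholder prompt as the usage example above; the real template is defined in demo.ipynb.

```python
# Sketch: warn if the prompt-formatted input would exceed Llama 2's 4096-token context.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def exceeds_context(text1: str, text2: str, max_len: int = 4096) -> bool:
    # Placeholder prompt; use the pre-processing function from demo.ipynb in practice.
    prompt = (
        "Do the following two texts share the same author? Answer True or False.\n"
        f"Text 1: {text1}\nText 2: {text2}\nAnswer:"
    )
    return len(tokenizer(prompt)["input_ids"]) > max_len

print(exceeds_context("short text", "another short text"))  # False for short inputs
```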
|
|
|
|