language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
Model Card for llama2-promt-av-binary-lora
This is a binary classification model, trained on prompt-formatted inputs, that predicts whether two pieces of text were written by the same author.
Model Details
Model Description
This model is based on Llama2 and was fine-tuned with LoRA on 30K text pairs for authorship verification. The inputs are wrapped in a prompt so that the model can draw on its pre-trained linguistic knowledge. Demo code for running the model is provided in the submitted demo.ipynb notebook; for best results, use the pre-processing and post-processing functions defined there together with the model.
- Developed by: Hei Chan and Mehedi Bari
- Language(s): English
- Model type: Supervised
- Model architecture: Transformer (decoder-only Llama2) with LoRA adapters
- Finetuned from model: meta-llama/Llama-2-7b-hf
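The card recommends pairing the model with the pre- and post-processing functions from demo.ipynb, which are not reproduced here. As a rough illustration only, the sketch below shows what such a pair of functions might look like. The prompt wording, the names `build_prompt` and `logits_to_label`, the two-logit output, and the label mapping (index 1 = same author) are all hypothetical assumptions, not the notebook's actual code.

```python
import math

# Hypothetical prompt template; the wording actually used in training is in demo.ipynb.
PROMPT_TEMPLATE = (
    "Were the following two texts written by the same author?\n"
    "Text 1: {text1}\n"
    "Text 2: {text2}\n"
    "Answer:"
)

# Hypothetical label mapping (assumption): index 1 means "same author".
LABELS = {0: "different author", 1: "same author"}


def build_prompt(text1: str, text2: str) -> str:
    """Pre-processing: collapse whitespace and insert both texts into the prompt."""
    clean1 = " ".join(text1.split())
    clean2 = " ".join(text2.split())
    return PROMPT_TEMPLATE.format(text1=clean1, text2=clean2)


def logits_to_label(logits: list[float]) -> tuple[str, float]:
    """Post-processing: softmax over the class logits, return best label and its probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


prompt = build_prompt("Hi  all,\nsee you at 5.", "Thanks, see you then.")
label, prob = logits_to_label([0.2, 1.5])
print(label, round(prob, 3))  # same author 0.786
```

Normalising whitespace before building the prompt keeps formatting noise (double spaces, line breaks) from consuming tokens; whether the real notebook does this is unknown.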
Model Resources
- Base model repository: https://huggingface.co/meta-llama/Llama-2-7b-hf
- Base model paper: https://arxiv.org/abs/2307.09288
Training Details
Training Data
30K pairs of texts drawn from emails, news articles and blog posts.
Training Procedure
Training Hyperparameters
- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- lora_r: 64
- lora_alpha: 128
- lora_dropout: 0.05
- use_rslora: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
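A few quantities follow directly from these hyperparameters. The sketch below works them out, assuming training ran on a single GPU and that "30K pairs" means exactly 30,000 examples (both assumptions). It also shows the LoRA scaling factor: standard LoRA multiplies the low-rank update by alpha / r, whereas rsLoRA, which is enabled in the list above, uses alpha / sqrt(r).

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 2
gradient_accumulation_steps = 4
num_epochs = 1
lora_r = 64
lora_alpha = 128

# Assumption: "30K pairs" is treated as exactly 30,000 training examples.
num_train_examples = 30_000

# Assumption: single GPU, so no extra multiplication by the number of devices.
# Gradients from several micro-batches are accumulated before each optimizer update.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Optimizer updates over training (a partial final batch still counts as one step).
optimizer_steps = math.ceil(num_train_examples / effective_batch_size) * num_epochs

# Standard LoRA scales the update B @ A by alpha / r; rsLoRA uses alpha / sqrt(r).
standard_scaling = lora_alpha / lora_r
rslora_scaling = lora_alpha / math.sqrt(lora_r)

print(effective_batch_size, optimizer_steps, standard_scaling, rslora_scaling)
# 8 3750 2.0 16.0
```

With r = 64, rsLoRA's scaling (16.0) is eight times larger than the standard one (2.0), which is the motivation for rsLoRA: it keeps the update from shrinking as the rank grows.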
Speeds, Sizes, Times
- trained on: V100 16GB
- overall training time: 59 hours
- duration per training epoch: 59 minutes
- model size: ~27 GB
- LoRA adapter size: 192 MB
Evaluation
Testing Data & Metrics
Testing Data
The provided development set, comprising 6K pairs.
Metrics
- Precision
- Recall
- F1-score
- Accuracy
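The card does not say how precision, recall and F1 were averaged over the two classes. The sketch below assumes macro averaging (the unweighted mean over the "same author" and "different author" classes), which is one common choice for binary authorship verification; the function name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Macro-averaged precision, recall and F1, plus accuracy, for labels 0/1.

    Assumption: macro averaging; the card does not state the averaging method.
    """
    per_class = []
    for cls in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class.append((precision, recall, f1))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "precision": sum(c[0] for c in per_class) / 2,
        "recall": sum(c[1] for c in per_class) / 2,
        "f1": sum(c[2] for c in per_class) / 2,
        "accuracy": accuracy,
    }


# Toy labels: 1 = same author, 0 = different author.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

This should agree with scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")`. Under macro averaging, F1 can fall slightly below both precision and recall, as in the reported results.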
Results
- Precision: 80.6%
- Recall: 80.4%
- F1 score: 80.3%
- Accuracy: 80.4%
Technical Specifications
Hardware
- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
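A back-of-the-envelope estimate helps relate the sizes above to numeric precision. The sketch below counts weight memory only (no activations, KV cache, or CUDA overhead) and assumes Llama-2-7b has roughly 6.74 billion parameters. At fp32 this gives about 27 GB, in line with the model size listed earlier. The card does not say how the base model is loaded for inference; the 6 GB VRAM figure would only be reachable with a quantized format such as 4-bit, which is an assumption here.

```python
# Assumption: Llama-2-7b has roughly 6.74 billion parameters.
NUM_PARAMS = 6.74e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    print(f"{precision:>10}: {weight_memory_gb(NUM_PARAMS, nbytes):5.1f} GB")
```

Even at int8 the weights alone (about 6.7 GB) exceed 6 GB, so the stated minimum suggests 4-bit loading plus headroom for activations.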
Software
- Transformers 4.18.0
- Pytorch 1.11.0+cu113
Bias, Risks, and Limitations
Inputs (the two texts concatenated with the prompt words) longer than 4096 subword tokens are truncated, so any content beyond that limit is not seen by the model.
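With right-side truncation (typically the Hugging Face tokenizer default), an over-long concatenated input loses the end of the second text first, which can leave the model comparing a full text against a fragment. One possible mitigation, sketched below, is to share the token budget between the two texts before concatenation. The helper `fit_pair` is hypothetical and not part of the released code; it operates on already-tokenized ID lists.

```python
MAX_TOKENS = 4096  # input limit stated above


def fit_pair(
    prompt_ids: list[int],
    text1_ids: list[int],
    text2_ids: list[int],
    max_tokens: int = MAX_TOKENS,
) -> tuple[list[int], list[int]]:
    """Trim two tokenized texts so prompt + text1 + text2 fits in max_tokens.

    Hypothetical helper: each text gets half of the remaining budget, and if one
    text is shorter than its half, the other text receives the unused tokens.
    """
    budget = max_tokens - len(prompt_ids)
    if budget < 0:
        raise ValueError("prompt alone exceeds the token limit")
    if len(text1_ids) + len(text2_ids) <= budget:
        return text1_ids, text2_ids  # nothing to trim
    half = budget // 2
    keep1 = min(len(text1_ids), max(half, budget - len(text2_ids)))
    keep2 = min(len(text2_ids), budget - keep1)
    return text1_ids[:keep1], text2_ids[:keep2]


prompt = [0] * 96  # stand-in for tokenized prompt words
t1, t2 = fit_pair(prompt, [1] * 500, [2] * 5000)
print(len(t1), len(t2))  # 500 3500
```

Keeping the start of each text is a simple choice; keeping the end, or a sampled window, would be equally valid variants.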
Additional Information
The hyperparameters were determined experimentally so that the model could train successfully on the V100 with a gradual decrease in training loss. Because the model is a LoRA adapter, the Llama2 base model must also be loaded for it to function. Access to the pre-trained Llama2 weights must be requested, which can be done at https://huggingface.co/meta-llama.
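The reason the base model is required can be seen from how a LoRA layer computes its weight: the adapter stores only two small matrices, A (r x d_in) and B (d_out x r), and the layer actually uses W + scaling * (B @ A), where W comes from the pre-trained model. The toy sketch below uses 2x2 matrices and rank 1 purely for illustration; real Llama2 layers are far larger, and the card does not say which modules were adapted. The scaling follows rsLoRA (alpha / sqrt(r)), matching the training setup.

```python
import math


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def effective_weight(w, lora_a, lora_b, alpha, r, use_rslora=True):
    """Return W + scaling * (B @ A), the weight a LoRA-adapted layer uses.

    The adapter provides only A and B; W must come from the base model.
    """
    scaling = alpha / math.sqrt(r) if use_rslora else alpha / r
    delta = matmul(lora_b, lora_a)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [
        [w_ij + scaling * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight (from the pre-trained model)
a = [[0.5, 0.5]]              # toy LoRA A, rank 1
b = [[0.25], [0.0]]           # toy LoRA B
print(effective_weight(w, a, b, alpha=2, r=1))  # [[1.25, 0.25], [0.0, 1.0]]
```

In practice, PEFT's `PeftModel.from_pretrained(base_model, adapter_path)` attaches a saved adapter to an already-loaded base model, which is why both downloads are needed.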