---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
---
# Model Card for llama2-promt-av-binary-lora
<!-- Provide a quick summary of what the model is/does. -->
This model was trained as part of the COMP34812 coursework.
It is a binary classification model trained with prompt-formatted inputs to
detect whether two pieces of text were written by the same author.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This model is based on Llama 2, fine-tuned on 30K pairs of texts for authorship verification.
It is fine-tuned with prompt inputs to make use of the base model's linguistic knowledge.
Demo code for running the model is provided in the submitted demo.ipynb; a rough loading sketch is also shown after the details below.
For best results, it is advised to use the pre-processing and post-processing functions (provided in demo.ipynb) alongside the model.
- **Developed by:** Hei Chan and Mehedi Bari
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model:** meta-llama/Llama-2-7b-hf
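
The submitted demo.ipynb is not reproduced in this card. As a rough sketch only, the snippet below shows one way the LoRA adapter could be loaded on top of the base model and queried with a prompt. The adapter repo ID, the prompt template, and the assumption that the model generates the label token (rather than using a classification head) are illustrative and may differ from the actual demo code.

```python
# Minimal sketch, NOT the submitted demo.ipynb. Assumes a causal-LM formulation
# where the model generates the label token; adapter ID and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

BASE_ID = "meta-llama/Llama-2-7b-hf"
ADAPTER_ID = "Cyrus1020/llama2-promt-av-binary-lora"  # assumed adapter repo ID

# 4-bit loading keeps inference within the ~6 GB VRAM noted under Hardware.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()

text1 = "First piece of text ..."
text2 = "Second piece of text ..."

# Illustrative prompt; the real template and post-processing live in demo.ipynb.
prompt = (
    "Do the following two texts share the same author? Answer 1 for yes, 0 for no.\n"
    f"Text 1: {text1}\nText 2: {text2}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=1)
answer = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
print(answer)  # expected "1" (same author) or "0" (different authors)
```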
### Model Resources
<!-- Provide links where applicable. -->
- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288
## Training Details
### Training Data
<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->
30K pairs of texts drawn from emails, news articles and blog posts.
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
#### Training Hyperparameters
<!-- This is a summary of the values of hyperparameters used in training the model. -->
- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
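
As a rough guide only (not the actual training script), the hyperparameters above map approximately onto the following PEFT/Transformers configuration. The target modules, task type, and output directory are assumptions not stated in this card.

```python
# Approximate mapping of the hyperparameters above onto peft/transformers
# configuration objects. Target modules, task type, and output_dir are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    use_rslora=True,                      # rank-stabilised LoRA
    target_modules=["q_proj", "v_proj"],  # assumed; not stated in this card
    task_type="CAUSAL_LM",                # assumed prompt-based formulation
)

training_args = TrainingArguments(
    output_dir="llama2-av-lora",          # illustrative path
    learning_rate=1e-5,
    weight_decay=0.001,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    max_grad_norm=0.3,
    num_train_epochs=1,
)
```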
#### Speeds, Sizes, Times
<!-- This section provides information about how roughly how long it takes to train the model and the size of the resulting model. -->
- trained on: V100 16GB
- overall training time: 59 hours
- duration per training epoch: 59 hours
- model size: ~27 GB
- LoRA adapter size: 192 MB
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data & Metrics
#### Testing Data
<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->
The provided development set, amounting to 6K text pairs.
#### Metrics
<!-- These are the evaluation metrics being used. -->
- Precision
- Recall
- F1-score
- Accuracy
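
For reference, these metrics can be computed from gold and predicted labels with scikit-learn as sketched below; the label vectors are placeholders, not the actual evaluation data.

```python
# Illustrative computation of the reported metrics with scikit-learn;
# y_true and y_pred stand in for the gold and predicted 0/1 labels.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0]   # placeholder gold labels
y_pred = [1, 0, 0, 1, 0]   # placeholder model predictions

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
```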
### Results
- Precision: 80.6%
- Recall: 80.4%
- F1 score: 80.3%
- Accuracy: 80.4%
## Technical Specifications
### Hardware
- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
### Software
- Transformers
- PyTorch
- bitsandbytes
- Accelerate
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Any input (the concatenation of the two texts plus the prompt wording) longer than
4096 subword tokens will be truncated by the model.
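
As a rough illustration, the check below shows how one might verify that a concatenated input fits within the 4096-token context window before inference; the prompt string is an assumed example, not the template from demo.ipynb.

```python
# Check whether a full prompt fits the 4096-token context window before
# sending it to the model; anything beyond that limit will be truncated.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def fits_context(prompt: str, max_len: int = 4096) -> bool:
    """Return True if the tokenised prompt fits within the model's context window."""
    return len(tokenizer(prompt)["input_ids"]) <= max_len

# Example: warn before running inference on an over-long pair of texts.
prompt = "Do the following two texts share the same author?\nText 1: ...\nText 2: ...\nAnswer:"
if not fits_context(prompt):
    print("Input exceeds 4096 tokens and will be truncated; consider shortening the texts.")
```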