---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.

---

# Model Card for llama2-promt-av-binary-lora

<!-- Provide a quick summary of what the model is/does. -->

This model was trained as part of the coursework for COMP34812.

This is a binary classification model, trained on prompted inputs, that detects whether two pieces of text were written by the same author.


## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This model is based on Llama 2 (meta-llama/Llama-2-7b-hf), fine-tuned with LoRA on 30K pairs of texts for authorship verification. Inputs are wrapped in a prompt so that fine-tuning can draw on the base model's linguistic knowledge.
Demo code for running the model is provided in the submitted demo.ipynb.
For best results, use the pre-processing and post-processing functions from demo.ipynb together with the model.

- **Developed by:** Hei Chan and Mehedi Bari
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Finetuned from model:** meta-llama/Llama-2-7b-hf
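
The actual prompt template and the pre- and post-processing functions live in demo.ipynb and are not reproduced in this card. As a rough sketch of what prompt-based input and output handling for authorship verification can look like, the example below uses a hypothetical template and hypothetical helper names (`build_prompt`, `parse_label`); none of these come from the original code.

```python
# Hypothetical template: the real one is defined in demo.ipynb, not here.
PROMPT_TEMPLATE = (
    "Decide whether the two texts were written by the same author.\n"
    "Text 1: {text_1}\n"
    "Text 2: {text_2}\n"
    "Answer (yes or no):"
)


def build_prompt(text_1: str, text_2: str) -> str:
    """Pre-processing: collapse whitespace and fill the prompt template."""
    clean_1 = " ".join(text_1.split())
    clean_2 = " ".join(text_2.split())
    return PROMPT_TEMPLATE.format(text_1=clean_1, text_2=clean_2)


def parse_label(generated: str) -> int:
    """Post-processing: map the generated answer to a binary label.

    Returns 1 for "same author" and 0 otherwise, including unparseable output.
    """
    answer = generated.strip().lower()
    return 1 if answer.startswith("yes") else 0


prompt = build_prompt("Hello   there,\nJohn.", "Hi John, hello again.")
label = parse_label(" Yes")
```

Treating unparseable output as the negative class is only one possible design choice; the notebook may handle such cases differently.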

### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288

## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

30K pairs of texts drawn from emails, news articles and blog posts.

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->


- learning_rate: 1e-05
- weight decay: 0.001
- train_batch_size: 2
- gradient accumulation steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max grad norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
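
Two derived quantities help when reading these settings: the effective batch size implied by gradient accumulation, and the scaling factor applied to the LoRA update. The sketch below works them out, assuming a single GPU and that all 30K training pairs are seen once; rank-stabilised LoRA scales the update by alpha / sqrt(r) rather than the classic alpha / r.

```python
import math

train_pairs = 30_000          # training set size stated in this card
per_device_batch_size = 2     # train_batch_size
grad_accum_steps = 4          # gradient accumulation steps
num_epochs = 1
num_devices = 1               # assumption: training ran on a single GPU

# Number of pairs contributing to each optimizer update.
effective_batch = per_device_batch_size * grad_accum_steps * num_devices
optimizer_steps = math.ceil(train_pairs / effective_batch) * num_epochs

lora_r = 64
lora_alpha = 128
standard_scale = lora_alpha / lora_r            # classic LoRA: alpha / r
rslora_scale = lora_alpha / math.sqrt(lora_r)   # rsLoRA: alpha / sqrt(r)

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps:      {optimizer_steps}")
print(f"LoRA scale (classic): {standard_scale}")
print(f"LoRA scale (rsLoRA):  {rslora_scale}")
```

With r = 64, the rank-stabilised scale is eight times larger than the classic one, which is worth keeping in mind when comparing the learning rate with runs that do not use RSLoRA.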

#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->


- trained on: V100 16GB
- overall training time: 59 hours
- duration per training epoch: 59 hours
- model size: ~27GB
- LoRA adapter size: 192 MB
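
For a sense of throughput, the reported training time can be turned into a per-pair and per-update cost. This is back-of-the-envelope arithmetic only, assuming the 59 hours are dominated by training steps and that every one of the 30K pairs is processed exactly once.

```python
training_hours = 59
train_pairs = 30_000
effective_batch = 2 * 4  # train_batch_size * gradient accumulation steps

total_seconds = training_hours * 3600
seconds_per_pair = total_seconds / train_pairs
seconds_per_update = seconds_per_pair * effective_batch

print(f"seconds per text pair:        {seconds_per_pair:.2f}")
print(f"seconds per optimizer update: {seconds_per_update:.2f}")
```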

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development set provided, amounting to 6K pairs.

#### Metrics

<!-- These are the evaluation metrics being used. -->


- Precision
- Recall
- F1-score
- Accuracy

### Results


- Precision: 80.6%
- Recall: 80.4%
- F1-score: 80.3%
- Accuracy: 80.4%
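
The card does not state how precision, recall and F1 are averaged over the two classes. Support-weighted averaging is one possibility: under it, recall always equals accuracy, which is consistent with the matching 80.4% figures. The sketch below computes the four metrics that way in plain Python; the function name `weighted_metrics` and the toy labels are illustrative, not taken from the evaluation code.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_l = tp / predicted if predicted else 0.0
        r_l = tp / actual if actual else 0.0
        f_l = 2 * p_l * r_l / (p_l + r_l) if (p_l + r_l) else 0.0
        weight = actual / total  # share of true examples with this label
        precision += weight * p_l
        recall += weight * r_l
        f1 += weight * f_l
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels for illustration only, not the development set.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```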

## Technical Specifications

### Hardware


- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
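
The storage and VRAM figures can be sanity-checked against the size of the weights at different precisions. The sketch below assumes roughly 6.74 billion parameters for Llama-2-7B (a figure not stated in this card) and counts weights only, ignoring activations and the KV cache. Under that assumption, 32-bit weights come to about 27 GB, and 4-bit quantised weights come to well under 6 GB. The card does not state which precision is used at inference, so the 4-bit reading is a guess.

```python
# Approximate parameter count of Llama-2-7B (assumption, not from this card).
NUM_PARAMS = 6.74e9

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_footprint_gb(num_params: float, bytes_per_param: float) -> float:
    """Size of the weights alone in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


footprints = {
    name: weight_footprint_gb(NUM_PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}
for name, gb in footprints.items():
    print(f"{name:>8}: {gb:5.1f} GB")
```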

### Software


- Transformers
- PyTorch
- bitsandbytes
- Accelerate

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any input (the two texts concatenated with the prompt words) longer than 4096 subword tokens will be truncated by the model.
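
If truncation simply cuts the tail of the concatenated input, the second text can lose most of its content while the first stays intact. One way to avoid that is to trim both texts before building the prompt so that the pair fits the context window. The following is a minimal sketch, not the method used in demo.ipynb: `truncate_pair` is a hypothetical helper, it operates on already-tokenised lists, and the 96-token prompt overhead in the usage lines is an arbitrary placeholder.

```python
def truncate_pair(tokens_1, tokens_2, prompt_tokens, max_len=4096):
    """Trim two token lists so that prompt + both texts fit in max_len.

    A short text is kept whole; when both are long, each gets about half
    of the remaining budget.
    """
    budget = max_len - prompt_tokens
    if budget < 0:
        raise ValueError("prompt alone exceeds the context window")
    half = budget // 2
    # Give text 1 at least half the budget, or more if text 2 is short.
    keep_1 = min(len(tokens_1), max(half, budget - len(tokens_2)))
    keep_2 = min(len(tokens_2), budget - keep_1)
    return list(tokens_1[:keep_1]), list(tokens_2[:keep_2])


# Stand-in tokens; a real pipeline would use the Llama 2 tokenizer.
text_1 = ["a"] * 3000
text_2 = ["b"] * 2000
short_1, short_2 = truncate_pair(text_1, text_2, prompt_tokens=96)
print(len(short_1), len(short_2))
```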