---
language: en
license: cc-by-4.0
tags:
- text-classification
repo: N.A.
---
|
|
|
# Model Card for llama2-promt-av-binary-lora

<!-- Provide a quick summary of what the model is/does. -->
|
|
|
This model was trained as part of the coursework for COMP34812.
|
|
|
This is a binary classification model that was trained with prompt-formatted inputs to
detect whether two pieces of text were written by the same author.
|
|
|
|
|
## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->
|
|
|
This model is based on Llama 2 and was fine-tuned on 30K pairs of texts for
authorship verification. It is fine-tuned with prompt-formatted inputs so that the
base model's linguistic knowledge can be utilized for the task. Demo code is
provided in the submitted demo.ipynb; for best results, use the pre-processing and
post-processing functions provided there alongside the model.
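
A minimal usage sketch is shown below; demo.ipynb remains the authoritative reference. The sketch assumes a generative setup in which the label is produced as text, that the LoRA adapter is stored locally under a directory named `llama2-promt-av-binary-lora`, and that the prompt wording shown is only a placeholder for the template defined in demo.ipynb. If the adapter was instead trained with a sequence-classification head, load it with `AutoModelForSequenceClassification`.

```python
# Hedged sketch: load the Llama-2-7b base model in 4-bit, attach the LoRA adapter,
# and classify one pair of texts. The adapter path and prompt wording are
# illustrative assumptions; the exact template lives in demo.ipynb.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_path = "llama2-promt-av-binary-lora"  # assumed local path to the LoRA adapter

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

text1, text2 = "First document ...", "Second document ..."
# Placeholder prompt; use the pre-processing function from demo.ipynb in practice.
prompt = (
    "Do the following two texts share the same author? Answer True or False.\n"
    f"Text 1: {text1}\nText 2: {text2}\nAnswer:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5)
answer = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())  # map to a 0/1 label with demo.ipynb's post-processing function
```

Loading the base model in 4-bit via bitsandbytes should keep inference within the ~6 GB of VRAM noted under Hardware below.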
|
|
|
- **Developed by:** Hei Chan and Mehedi Bari
- **Language(s):** English
- **Model type:** Supervised
- **Model architecture:** Transformers
- **Fine-tuned from model:** meta-llama/Llama-2-7b-hf
|
|
|
### Model Resources

<!-- Provide links where applicable. -->

- **Repository:** https://huggingface.co/meta-llama/Llama-2-7b-hf
- **Paper or documentation:** https://arxiv.org/abs/2307.09288
|
|
|
## Training Details

### Training Data

<!-- This is a short stub of information on the training data that was used, and documentation related to data pre-processing or additional filtering (if applicable). -->

30K pairs of texts drawn from emails, news articles and blog posts.
|
|
|
### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Training Hyperparameters

<!-- This is a summary of the values of hyperparameters used in training the model. -->
|
|
|
|
|
- learning_rate: 1e-05
- weight_decay: 0.001
- train_batch_size: 2
- gradient_accumulation_steps: 4
- optimizer: paged_adamw_8bit
- LoRA r: 64
- LoRA alpha: 128
- LoRA dropout: 0.05
- RSLoRA: True
- max_grad_norm: 0.3
- eval_batch_size: 1
- num_epochs: 1
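
For reference, the values above map onto a PEFT/Transformers configuration roughly as sketched below. This is a reconstruction rather than the actual training script; the output directory is a placeholder and the LoRA target modules are left at PEFT's defaults.

```python
# Sketch of a LoRA + TrainingArguments setup matching the hyperparameters listed above.
# Paths and unspecified options (e.g. target modules) are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    use_rslora=True,       # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="outputs",  # placeholder
    num_train_epochs=1,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=1e-5,
    weight_decay=0.001,
    max_grad_norm=0.3,
    optim="paged_adamw_8bit",
)
```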
|
|
|
#### Speeds, Sizes, Times

<!-- This section provides information about roughly how long it takes to train the model and the size of the resulting model. -->

- trained on: NVIDIA V100 (16 GB)
- overall training time: 59 hours
- duration per training epoch: 59 hours
- model size: ~27 GB
- LoRA adapter size: 192 MB
|
|
|
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data & Metrics

#### Testing Data

<!-- This should describe any evaluation data used (e.g., the development/validation set provided). -->

The development set provided, amounting to 6K pairs.
|
|
|
#### Metrics

<!-- These are the evaluation metrics being used. -->

- Precision
- Recall
- F1-score
- Accuracy
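
The figures reported under Results below can be computed from gold labels and post-processed binary predictions, for example with scikit-learn as sketched here; the "macro" averaging mode is an assumption, since the averaging used for the reported numbers is not stated.

```python
# Hedged sketch: computing the four metrics from binary predictions with scikit-learn.
# The "macro" averaging mode is an assumption, not taken from the evaluation code.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 1, 0, 1]  # gold same-author labels
y_pred = [0, 1, 0, 0, 1]  # model predictions after post-processing

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
accuracy = accuracy_score(y_true, y_pred)
print(f"P={precision:.3f}  R={recall:.3f}  F1={f1:.3f}  Acc={accuracy:.3f}")
```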
|
|
|
### Results

- Precision: 80.6%
- Recall: 80.4%
- F1-score: 80.3%
- Accuracy: 80.4%
|
|
|
## Technical Specifications

### Hardware

- Mode: Inference
- VRAM: at least 6 GB
- Storage: at least 30 GB
- GPU: RTX 3060
|
|
|
### Software

- Transformers
- PyTorch
- bitsandbytes
- Accelerate
|
|
|
## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Any input (the concatenation of the two texts plus the prompt wording) longer than
4096 subword tokens will be truncated by the model.
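
Because of this, it can be worth checking the tokenized length of a text pair before running inference. The sketch below uses the same placeholder prompt as the usage example above; the real template is defined in demo.ipynb.

```python
# Sketch: warn if the prompt-formatted input would exceed Llama 2's 4096-token context.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def exceeds_context(text1: str, text2: str, max_len: int = 4096) -> bool:
    # Placeholder prompt; use the pre-processing function from demo.ipynb in practice.
    prompt = (
        "Do the following two texts share the same author? Answer True or False.\n"
        f"Text 1: {text1}\nText 2: {text2}\nAnswer:"
    )
    return len(tokenizer(prompt)["input_ids"]) > max_len

print(exceeds_context("short text", "another short text"))  # False for short inputs
```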
|
|
|
|