# instructblip-vicuna-7b-peft-lora
This model is a fine-tuned version of Salesforce/instructblip-vicuna-7b on the pantheon-prompts-dataset. It achieves the following results on the evaluation set:
- Loss: 5.3583
## Model Description

### Project Overview
This model is part of a two-phase project aimed at automatic prompt engineering for text-to-image generation.
### Current Phase: Supervised Fine-Tuning
- Status: Completed
- Input: Base prompt and an image
- Output: Enhanced prompt for image generation
- Purpose: Adapt the base model to generate improved prompts
### Future Phase: Reinforcement Learning Fine-Tuning
- Status: Planned
- Method: Proximal Policy Optimization (PPO)
- Purpose: Further refine prompt quality (see the sketch after this list)
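Since this phase is only planned, the following is a minimal sketch of what such a PPO loop could look like with the TRL library. It is an assumption, not project code: `policy_model` (a value-head-wrapped policy), `dataloader`, and `reward_fn` (e.g. a scorer comparing the regenerated image to the preferred one) are all hypothetical names.

```python
# Hedged sketch only: the PPO phase is planned, not implemented here.
# TRL's PPO loop expects a policy wrapped with a value head; applying
# that to InstructBLIP is an assumption, and `reward_fn` is hypothetical.
import torch
from trl import PPOConfig, PPOTrainer

ppo_config = PPOConfig(batch_size=8, mini_batch_size=2, learning_rate=1e-5)
ppo_trainer = PPOTrainer(
    config=ppo_config,
    model=policy_model,              # value-head-wrapped policy (assumed)
    tokenizer=processor.tokenizer,
)

for batch in dataloader:             # yields tokenized base prompts
    query_tensors = [q for q in batch["input_ids"]]
    # Sample candidate enhanced prompts from the current policy
    response_tensors = ppo_trainer.generate(
        query_tensors, return_prompt=False, max_new_tokens=64
    )
    texts = [processor.tokenizer.decode(r, skip_special_tokens=True)
             for r in response_tensors]
    # Score each enhanced prompt; higher reward = better downstream image
    rewards = [torch.tensor(reward_fn(t)) for t in texts]
    # One PPO update on (query, response, reward) triples
    ppo_trainer.step(query_tensors, response_tensors, rewards)
```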
### Ultimate Objective
- Accept a base prompt and a preferred generated image as input
- Automatically engineer an enhanced prompt
- Use the enhanced prompt to generate higher-quality images with the same text-to-image model (an illustrative sketch follows this list)
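The card does not name the text-to-image model, so the sketch below is illustrative only: it assumes a Stable Diffusion checkpoint via diffusers and a hypothetical `enhance_prompt` helper that wraps the generation code from the "How to use" section further down.

```python
# Illustrative only: the card does not specify the text-to-image model.
# `enhance_prompt(base_prompt, image)` is assumed to wrap the generation
# code from the "How to use" section below.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

base_prompt = "a castle on a hill at dusk"
draft_image = pipe(base_prompt).images[0]                   # first pass
enhanced_prompt = enhance_prompt(base_prompt, draft_image)  # this model
final_image = pipe(enhanced_prompt).images[0]               # regenerated
```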
### Checkpoint Information
This model checkpoint represents the completion of the Supervised Fine-Tuning phase (Phase 1) in the overall project.
### Training Limitations
- Dataset Size: The model was trained on a limited dataset of 1,600 examples.
- Resource Constraints: Due to computational resource limitations, we were unable to use a larger training set.
- Potential Issues:
  - The model may not have fully generalized to a wide range of inputs.
  - There is a risk of overfitting to the training data.
- Caution: Users should be aware that the model's performance might be inconsistent on inputs that significantly differ from the training set.
## Training and evaluation data
## Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` reconstruction follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_ratio: 0.1
- training_steps: 1000
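For reference, the settings above correspond roughly to the transformers `TrainingArguments` below. This is a reconstruction for readability, not the published training script; the `output_dir` is a placeholder.

```python
# Reconstruction of the reported hyperparameters as TrainingArguments;
# the actual training script is not published with this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="instructblip-vicuna-7b-peft-lora",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,   # 8 * 4 = total train batch size 32
    max_steps=1000,
    lr_scheduler_type="cosine_with_restarts",
    warmup_ratio=0.1,
    seed=42,
    # Adam settings below match the reported betas=(0.9, 0.999), eps=1e-8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```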
## How to use
```python
import torch
from PIL import Image
from peft import PeftModelForCausalLM
from transformers import (
    BitsAndBytesConfig,
    InstructBlipProcessor,
    InstructBlipForConditionalGeneration,
)

# Define the 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the processor and the quantized base model
processor = InstructBlipProcessor.from_pretrained("Salesforce/instructblip-vicuna-7b", legacy=False)
processor.padding_side = "right"
processor.tokenizer.padding_side = "right"

model = InstructBlipForConditionalGeneration.from_pretrained(
    "Salesforce/instructblip-vicuna-7b", quantization_config=bnb_config, device_map="auto"
)

# Attach the fine-tuned LoRA adapter (inference only)
model = PeftModelForCausalLM.from_pretrained(
    model,
    "NoyHanan/instructblip-vicuna-7b-peft-lora",
    is_trainable=False,
    adapter_name="lora_policy",
)

prompt = "<Base_Prompt>"
image = Image.open("<Image_Path>")  # a PIL image

inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda")
res = model.generate(
    **inputs,
    do_sample=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    top_p=1.0,
    top_k=0,
    temperature=0.5,
)
enhanced_prompt = processor.decode(res[0], skip_special_tokens=True)
```
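Note that with `top_p=1.0` and `top_k=0`, neither nucleus nor top-k filtering truncates the distribution, so output diversity is governed entirely by `temperature`.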
## Framework versions
- PEFT 0.11.1
- Transformers 4.41.2
- PyTorch 2.3.1+cu121
- Datasets 2.19.2
- Tokenizers 0.19.1