License
The license is inherited from Salesforce/blip-image-captioning-base.
Overview
ifmain/blip-image2promt-stable-diffusion-base is a model based on Salesforce/blip-image-captioning-base, trained on the Ar4ikov/civitai-sd-337k dataset (2K images). It generates text descriptions of images in the style of prompts for Stable Diffusion models.
I trained it with my own BLIP training code, BLIP-Easy-Trainer.
Example Usage
import torch
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
import re
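def prepare(text):
    # Remove spaces around punctuation in the decoded text.
    text = text.replace('. ','.').replace(' .','.')
    text = text.replace('( ','(').replace(' (','(')
    text = text.replace(') ',')').replace(' )',')')
    text = text.replace(': ',':').replace(' :',':')
    text = text.replace('_ ','_').replace(' _','_')
    # Drop empty emphasis groups next to a comma.
    text = text.replace(',(())','').replace('(()),','')
    # Shrink longer runs of parentheses toward two (up to 10 passes).
    for i in range(10):
        text = text.replace(')))','))').replace('(((','((')
    # Strip anything that looks like an HTML/XML tag.
    text = re.sub(r'<[^>]*>', '', text)
    return text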
path_to_model = "ifmain/blip-image2promt-stable-diffusion-base"
processor = BlipProcessor.from_pretrained(path_to_model)
# Load the model in half precision on the GPU (requires CUDA).
model = BlipForConditionalGeneration.from_pretrained(path_to_model, torch_dtype=torch.float16).to("cuda")
img_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/demo.jpg'
raw_image = Image.open(requests.get(img_url, stream=True).raw).convert('RGB')
# unconditional image captioning
inputs = processor(raw_image, return_tensors="pt").to("cuda", torch.float16)
out = model.generate(**inputs, max_new_tokens=100)
out_txt = processor.decode(out[0], skip_special_tokens=True)
print(prepare(out_txt)) # woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),((
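The sample output above contains a repeated tag, an empty `(())` group, and a trailing `((` that was never closed. The following is a minimal sketch of an optional extra cleanup pass, not part of the model or of `prepare`. The function name `tidy_tags` is hypothetical, and the sketch assumes the prompt is a flat comma-separated list of tags.

```python
import re

def tidy_tags(prompt: str) -> str:
    """Drop empty, unbalanced, and repeated tags from a comma-separated prompt.

    Hypothetical helper for illustration; assumes tags never contain commas.
    """
    seen = set()
    kept = []
    for tag in prompt.split(','):
        tag = tag.strip()
        # Skip fragments with no word characters, such as "(())" or a trailing "((".
        if not re.search(r'\w', tag):
            continue
        # Skip tags whose parentheses do not pair up (for example, cut-off output).
        if tag.count('(') != tag.count(')'):
            continue
        # Keep only the first occurrence of each tag.
        if tag in seen:
            continue
        seen.add(tag)
        kept.append(tag)
    return ', '.join(kept)

sample = "woman sitting on the beach at sunset, rear view,((happy)),((happy)),((dog)),((mixed)),(()),(("
print(tidy_tags(sample))
# woman sitting on the beach at sunset, rear view, ((happy)), ((dog)), ((mixed))
```

Because the sketch splits on every comma, a group that contains a comma inside parentheses, such as `(a, b)`, would be split and both halves discarded as unbalanced. Adjust the splitting if your prompts use that form.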
Additional Information
This model supports both SFW and NSFW content.