---
library_name: transformers
license: bsd-3-clause
pipeline_tag: image-to-text
---

# BrainBLIP
**This model is not ready for production use and is in the preliminary stages of training. Use at your own risk.**

### Model Description

BrainBLIP is a BLIP image-captioning model finetuned to produce more natural captions for text-to-image training datasets. It emphasizes natural language and adds only a minimal number of tags for context.

## How to Get Started with the Model

```py
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, BlipForConditionalGeneration

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# The processor is loaded from the base BLIP checkpoint; the weights are BrainBLIP's.
processor = AutoProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("braintacles/brainblip").to(device)

# Replace with a local file path or an http(s) URL.
image_path_or_url = r"https://imagePath_or_url.jpg"
if image_path_or_url.startswith("http"):
    raw_image = Image.open(requests.get(image_path_or_url, stream=True).raw)
else:
    raw_image = Image.open(image_path_or_url)

inputs = processor(raw_image, return_tensors="pt").to(device)
out = model.generate(**inputs, min_length=40, max_new_tokens=75, num_beams=5, repetition_penalty=1.40)
caption = processor.decode(out[0], skip_special_tokens=True)
print(caption)
```
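
When captioning many images, the URL-or-path branch in the snippet above can be factored into a small helper. The following is a minimal sketch, not part of the model's own code: the names `is_remote` and `load_image` are hypothetical, and the 30-second timeout and the RGB conversion are assumptions chosen for illustration.

```py
from io import BytesIO
from urllib.parse import urlparse


def is_remote(source: str) -> bool:
    """Return True when `source` is an http(s) URL rather than a local path."""
    return urlparse(source).scheme in ("http", "https")


def load_image(source: str):
    """Open an image from a URL or a local path and convert it to RGB.

    Hypothetical helper: the timeout value and RGB conversion are assumptions.
    """
    # Imported lazily so is_remote() can be used without these packages installed.
    import requests
    from PIL import Image

    if is_remote(source):
        response = requests.get(source, timeout=30)
        response.raise_for_status()  # fail early on HTTP errors
        return Image.open(BytesIO(response.content)).convert("RGB")
    return Image.open(source).convert("RGB")
```

With this in place, `raw_image = load_image(image_path_or_url)` could stand in for the conditional expression, and the result is passed to `processor` as before.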

## Training Details

### Training Data

I wrote all captions for this data by hand, with occasional help from GPT-4.
Very special thanks to the following people, who also contributed a huge amount of time hand-captioning data:
- [Temporarium](https://civitai.com/user/Temporarium)
- [HailoKnight](https://civitai.com/user/HailoKnight)