File size: 3,687 Bytes
51cd4df
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
d9ae534
51cd4df
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
---
language:
- ja
tags:
- heron
- vision
- image-captioning
- VQA
pipeline_tag: image-to-text
license:
- cc-by-nc-4.0
inference: false
---
# Heron BLIP Japanese StableLM Base 7B v1


## Model Details
Heron BLIP Japanese StableLM Base 7B is a vision-language model that can converse about input images.<br>
This model was trained using [the heron library](https://github.com/turingmotors/heron). Please refer to the code for details.


## Usage

Follow [the installation guide](https://github.com/turingmotors/heron/). 

```python
import torch
from heron.models.video_blip import VideoBlipForConditionalGeneration, VideoBlipProcessor
from transformers import LlamaTokenizer

device_id = 0
device = f"cuda:{device_id}"

MODEL_NAME = "turing-motors/heron-chat-blip-ja-stablelm-base-7b-v1"
    
model = VideoBlipForConditionalGeneration.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)

model = model.half()
model.eval()
model.to(device)

# prepare a processor
processor = VideoBlipProcessor.from_pretrained("Salesforce/blip2-opt-2.7b")
tokenizer = LlamaTokenizer.from_pretrained("novelai/nerdstash-tokenizer-v1", additional_special_tokens=['▁▁'])
processor.tokenizer = tokenizer

import requests
from PIL import Image

# prepare inputs
url = "https://www.barnorama.com/wp-content/uploads/2016/12/03-Confusing-Pictures.jpg"
image = Image.open(requests.get(url, stream=True).raw)

text = f"##human: この画像の面白い点は何ですか?\n##gpt: "

# do preprocessing
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    truncation=True,
)

inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(device, torch.float16)

# set eos token
eos_token_id_list = [
    processor.tokenizer.pad_token_id,
    processor.tokenizer.eos_token_id,
    int(tokenizer.convert_tokens_to_ids("##"))
]

# do inference
with torch.no_grad():
    out = model.generate(**inputs, max_length=256, do_sample=False, temperature=0., eos_token_id=eos_token_id_list, no_repeat_ngram_size=2)

# print result
print(processor.tokenizer.batch_decode(out))
```


## Model Details
* **Developed by**: [Turing Inc.](https://www.turing-motors.com/)
* **Adaptor type**: [BLIP2](https://arxiv.org/abs/2301.12597)
* **Lamguage Model**: [Japanese StableLM Base Alpha](https://huggingface.co/stabilityai/japanese-stablelm-base-alpha-7b)
* **Language(s)**: Japanese

### Training
This model was fully fine-tuned with [LLaVA-Instruct-150K-JA](https://huggingface.co/datasets/turing-motors/LLaVA-Instruct-150K-JA).

### Training Dataset

- [LLaVA-Instruct-150K-JA](https://huggingface.co/datasets/turing-motors/LLaVA-Instruct-150K-JA)


## Use and Limitations

### Intended Use

This model is intended for use in chat-like applications and for research purposes.

### Limitations

The model may produce inaccurate or false information, and its accuracy is not guaranteed. It is still in the research and development stage.

## How to cite
```bibtex
@misc{BlipJapaneseStableLM, 
    url    = {[https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0](https://huggingface.co/turing-motors/heron-chat-blip-ja-stablelm-base-7b-v0)}, 
    title  = {Heron BLIP Japanese StableLM Base 7B}, 
    author = {Kotaro Tanahashi, Yuichi Inoue, and Yu Yamaguchi}
}
```

## Citations

```bibtex
@misc{JapaneseInstructBLIPAlpha, 
    url    = {[https://huggingface.co/stabilityai/japanese-instructblip-alpha](https://huggingface.co/stabilityai/japanese-instructblip-alpha)}, 
    title  = {Japanese InstructBLIP Alpha}, 
    author = {Shing, Makoto and Akiba, Takuya}
}
```

---
license: cc-by-nc-4.0
---