Update README.md
# Quickstart
## Hardware requirements

Model inference needs roughly 40 GB of GPU VRAM. If no single GPU has more than 40 GB of VRAM, you will need the accelerate library to dispatch the model across multiple GPUs with smaller VRAM.
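To see whether a single visible GPU meets this requirement, a quick check using only plain `torch.cuda` calls (nothing CogVLM-specific) is:

```python
import torch

# Print each visible GPU's total VRAM; if none reaches ~40 GiB,
# use the multi-GPU dispatch shown later in this README.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} ({props.name}): {props.total_memory / 1024**3:.1f} GiB")
```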
## Dependencies

```bash
pip install torch==2.1.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0
```
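If inference misbehaves, it is worth confirming that the pinned versions above are the ones actually installed; a minimal check using only the standard library:

```python
from importlib.metadata import version

# Compare installed versions against the pip command above.
for pkg in ('torch', 'transformers', 'accelerate', 'sentencepiece',
            'einops', 'xformers', 'triton'):
    print(pkg, version(pkg))
```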
## Example
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to('cuda').eval()

# chat example
query = 'Describe this image'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

# vqa example
query = 'How many houses are there in this cartoon?'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/3.jpg?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image], template_version='vqa')  # vqa mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
# 4</s>
```
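The chat template also supports multi-turn conversations. A sketch of a follow-up turn, assuming `history` takes `(query, response)` pairs as in the CogVLM repository's demo code (the follow-up question here is hypothetical):

```python
# continue the chat: feed earlier turns back in via `history`
response = tokenizer.decode(outputs[0])
history = [(query, response)]
inputs = model.build_conversation_input_ids(
    tokenizer,
    query='What color are the houses?',  # hypothetical follow-up question
    history=history,
    images=[image],
)
# batch, move to 'cuda', and generate exactly as above
```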
When a single GPU's VRAM is not enough, you can dispatch the model across multiple GPUs with smaller VRAM:
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')

# build the model structure on the meta device, without allocating weights
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        'THUDM/cogvlm-chat-hf',
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# plan a split across two 20 GiB GPUs, keeping each decoder layer on one device
device_map = infer_auto_device_map(
    model,
    max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'},
    no_split_module_classes=['CogVLMDecoderLayer'],
)
model = load_checkpoint_and_dispatch(
    model,
    'local/path/to/hf/version/chat/model',  # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/<snapshot-hash>'
    device_map=device_map,
)
model = model.eval()

# optionally, check which device each weight landed on
for n, p in model.named_parameters():
    print(f"{n}: {p.device}")

# chat example
query = 'Describe this image'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
```
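Since `infer_auto_device_map` returns a plain dict mapping module names to devices, you can inspect the planned split and pin entries by hand before calling `load_checkpoint_and_dispatch`; the module name below is hypothetical, check your own printout:

```python
# inspect the planned placement: module name -> device ('cpu', 0, 1, ...)
for name, device in device_map.items():
    print(name, '->', device)

# e.g. force a (hypothetical) submodule onto GPU 0 if the split is unbalanced
device_map['model.vision'] = 0
```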
# Method

The CogVLM model consists of four basic components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a **visual expert module**. See the [Paper](https://github.com/THUDM/CogVLM/blob/main/assets/cogvlm-paper.pdf) for more details.
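The visual expert gives image tokens their own attention projections while text tokens keep the frozen language-model weights, so both modalities attend jointly in one sequence. A toy sketch of the attention half of this idea (an illustration under those assumptions, not the repository's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualExpertAttention(nn.Module):
    """Toy sketch: image tokens are projected with their own QKV weights
    (the visual expert), text tokens with the original LLM weights."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv_text = nn.Linear(hidden_size, 3 * hidden_size)   # stands in for the frozen LLM projection
        self.qkv_image = nn.Linear(hidden_size, 3 * hidden_size)  # the trainable visual expert projection
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor, token_type_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden); token_type_ids: (batch, seq), 1 marks image tokens
        b, s, h = hidden.shape
        is_image = (token_type_ids == 1).unsqueeze(-1)
        # per-token choice of projection, then ordinary joint self-attention
        qkv = torch.where(is_image, self.qkv_image(hidden), self.qkv_text(hidden))
        q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in qkv.chunk(3, dim=-1))
        out = F.scaled_dot_product_attention(q, k, v)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, h))
```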