Update README.md
# Quickstart
## Hardware requirements

Model inference needs roughly 40 GB of GPU VRAM. If no single GPU has more than 40 GB of VRAM, you will need the accelerate library to dispatch the model across multiple GPUs with smaller VRAM.
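To see whether a single visible GPU meets this requirement, a quick check using only plain `torch.cuda` calls (nothing CogVLM-specific) is:

```python
import torch

# Print each visible GPU's total VRAM; if none reaches ~40 GiB,
# use the multi-GPU dispatch shown later in this README.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i} ({props.name}): {props.total_memory / 1024**3:.1f} GiB")
```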
## Dependencies

```bash
pip install torch==2.1.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0
```
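If inference misbehaves, it is worth confirming that the pinned versions above are the ones actually installed; a minimal check using only the standard library:

```python
from importlib.metadata import version

# Compare installed versions against the pip command above.
for pkg in ('torch', 'transformers', 'accelerate', 'sentencepiece',
            'einops', 'xformers', 'triton'):
    print(pkg, version(pkg))
```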
## Example
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
model = AutoModelForCausalLM.from_pretrained(
    'THUDM/cogvlm-chat-hf',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to('cuda').eval()

# chat example
query = 'Describe this image'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))

# vqa example
query = 'How many houses are there in this cartoon?'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/3.jpg?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image], template_version='vqa')  # vqa mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
# 4</s>
```
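The chat template also supports multi-turn conversations. A sketch of a follow-up turn, assuming `history` takes `(query, response)` pairs as in the CogVLM repository's demo code (the follow-up question here is hypothetical):

```python
# continue the chat: feed earlier turns back in via `history`
response = tokenizer.decode(outputs[0])
history = [(query, response)]
inputs = model.build_conversation_input_ids(
    tokenizer,
    query='What color are the houses?',  # hypothetical follow-up question
    history=history,
    images=[image],
)
# batch, move to 'cuda', and generate exactly as above
```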
When a single GPU's VRAM is not enough, you can dispatch the model across multiple GPUs with smaller VRAM:
```python
import torch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, LlamaTokenizer
from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch

tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')

# build the model structure on the meta device, without allocating weights
with init_empty_weights():
    model = AutoModelForCausalLM.from_pretrained(
        'THUDM/cogvlm-chat-hf',
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

# plan a split across two 20 GiB GPUs, keeping each decoder layer on one device
device_map = infer_auto_device_map(
    model,
    max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'},
    no_split_module_classes=['CogVLMDecoderLayer'],
)
model = load_checkpoint_and_dispatch(
    model,
    'local/path/to/hf/version/chat/model',  # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/<snapshot-hash>'
    device_map=device_map,
)
model = model.eval()

# optionally, check which device each weight landed on
for n, p in model.named_parameters():
    print(f"{n}: {p.device}")

# chat example
query = 'Describe this image'
image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image])  # chat mode
inputs = {
    'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
    'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
    'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
    'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
}
gen_kwargs = {"max_length": 2048, "do_sample": False}

with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
```
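Since `infer_auto_device_map` returns a plain dict mapping module names to devices, you can inspect the planned split and pin entries by hand before calling `load_checkpoint_and_dispatch`; the module name below is hypothetical, check your own printout:

```python
# inspect the planned placement: module name -> device ('cpu', 0, 1, ...)
for name, device in device_map.items():
    print(name, '->', device)

# e.g. force a (hypothetical) submodule onto GPU 0 if the split is unbalanced
device_map['model.vision'] = 0
```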
# Method

The CogVLM model consists of four basic components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a **visual expert module**. See the [Paper](https://github.com/THUDM/CogVLM/blob/main/assets/cogvlm-paper.pdf) for more details.
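The visual expert gives image tokens their own attention projections while text tokens keep the frozen language-model weights, so both modalities attend jointly in one sequence. A toy sketch of the attention half of this idea (an illustration under those assumptions, not the repository's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualExpertAttention(nn.Module):
    """Toy sketch: image tokens are projected with their own QKV weights
    (the visual expert), text tokens with the original LLM weights."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv_text = nn.Linear(hidden_size, 3 * hidden_size)   # stands in for the frozen LLM projection
        self.qkv_image = nn.Linear(hidden_size, 3 * hidden_size)  # the trainable visual expert projection
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden: torch.Tensor, token_type_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden); token_type_ids: (batch, seq), 1 marks image tokens
        b, s, h = hidden.shape
        is_image = (token_type_ids == 1).unsqueeze(-1)
        # per-token choice of projection, then ordinary joint self-attention
        qkv = torch.where(is_image, self.qkv_image(hidden), self.qkv_text(hidden))
        q, k, v = (t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in qkv.chunk(3, dim=-1))
        out = F.scaled_dot_product_attention(q, k, v)
        return self.out_proj(out.transpose(1, 2).reshape(b, s, h))
```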