---
license: other
pipeline_tag: text-generation
---

<p align="center">
    <img src="logo_en.png" width="400"/>
</p>

<p align="center">
    <b><font size="6">InternLM-XComposer2</font></b>
</p>

<div align="center">

[💻 GitHub Repo](https://github.com/InternLM/InternLM-XComposer)

[Paper](https://arxiv.org/abs/2401.16420)

</div>

**InternLM-XComposer2** is a vision-language large model (VLLM) based on [InternLM2](https://github.com/InternLM/InternLM) for advanced text-image comprehension and composition.

We release the InternLM-XComposer2 series in two versions:

- InternLM-XComposer2-VL: the pretrained VLLM, initialized with InternLM2 as its LLM, which achieves strong performance on various multimodal benchmarks.
- InternLM-XComposer2: the VLLM finetuned for *Free-form Interleaved Text-Image Composition*.

This is the 4-bit quantized version of InternLM-XComposer2-VL.

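To give some intuition for what "4-bit" means, here is a minimal sketch of asymmetric round-to-nearest (RTN) quantization. Each weight is stored as an integer code from 0 to 15, together with a scale and a zero point shared by the group. This is a simplified stand-in, not GPTQ itself, which additionally compensates rounding error using second-order information. The functions `quantize_4bit` and `dequantize_4bit` are hypothetical and not part of any library.

```python
# Simplified 4-bit round-to-nearest quantization (NOT the GPTQ algorithm).

def quantize_4bit(weights):
    """Map floats to 4-bit codes (0..15) with a shared scale and zero point."""
    lo, hi = min(weights), max(weights)
    # 16 levels span 15 intervals; fall back to 1.0 if all weights are equal.
    scale = (hi - lo) / 15 or 1.0
    # Integer code that represents the value 0.0.
    zero = round(-lo / scale)
    # Round each weight to the nearest level and clamp to the 4-bit range.
    codes = [min(15, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_4bit(codes, scale, zero):
    """Recover approximate float weights from 4-bit codes."""
    return [(c - zero) * scale for c in codes]


weights = [-0.42, 0.13, 0.0, 0.37, -0.05, 0.21]
codes, scale, zero = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale, zero)
print(codes)
print([round(r, 3) for r in restored])
```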
## Quickstart

The following simple example shows how to run the 4-bit InternLM-XComposer2-VL with 🤗 Transformers and AutoGPTQ.

```python
import torch
import auto_gptq
from auto_gptq.modeling import BaseGPTQForCausalLM
from transformers import AutoTokenizer

# Override AutoGPTQ's list of supported model types so it accepts this checkpoint.
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
# Inference only: disable gradient tracking globally.
torch.set_grad_enabled(False)


class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    # Stack of decoder blocks that contain the quantized linear layers.
    layers_block_name = "model.layers"
    # Modules outside the decoder blocks (vision encoder, projection, embeddings, norm, head).
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output',
    ]
    # Linear layers inside each decoder block that are stored in 4-bit.
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]


# Init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
    'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)

# `<ImageHere>` marks where the image is placed in the prompt.
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'  # path to a local image file
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
# The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets."
# The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset.
# They appear to be hiking or climbing, as one of them is holding a walking stick.
# The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
```
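In AutoGPTQ, `layers_block_name` points at the stack of decoder blocks. `inside_layer_modules` names the linear layers inside each block that are quantized. `outside_layer_modules` lists the modules outside those blocks, which are not quantized; here these are the vision encoder, vision projection, token embeddings, final norm, and output head. The hypothetical helper `is_quantized` below mirrors that config to show which module names end up in 4-bit. The example module names are illustrative assumptions, and AutoGPTQ's own matching logic may differ in detail.

```python
# Hypothetical helper mirroring the Quickstart's AutoGPTQ config; not part of AutoGPTQ.
LAYERS_BLOCK_NAME = "model.layers"
INSIDE_LAYER_MODULES = [
    ["attention.wqkv.linear"],
    ["attention.wo.linear"],
    ["feed_forward.w1.linear", "feed_forward.w3.linear"],
    ["feed_forward.w2.linear"],
]
# Flatten the groups into one set of block-relative names.
QUANTIZED_NAMES = {name for group in INSIDE_LAYER_MODULES for name in group}


def is_quantized(module_name: str) -> bool:
    """Return True if `module_name` is one of the 4-bit linear layers in the config."""
    prefix = LAYERS_BLOCK_NAME + "."
    if not module_name.startswith(prefix):
        # Outside the decoder blocks: vision encoder, projection, embeddings, norm, head.
        return False
    # "model.layers.<index>.<relative name>": check the index and the relative name.
    index, _, relative = module_name[len(prefix):].partition(".")
    return index.isdigit() and relative in QUANTIZED_NAMES


# Illustrative module names (assumed, not read from the checkpoint).
for name in [
    "model.layers.0.attention.wqkv.linear",
    "model.layers.31.feed_forward.w2.linear",
    "model.layers.0.attention_norm",
    "vision_proj",
    "output",
]:
    print(f"{name:40s} -> {'4-bit' if is_quantized(name) else 'not quantized'}")
```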
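The query passed to `model.chat` uses the `<ImageHere>` placeholder to mark where the image belongs. The real handling lives in the model's remote code. As a rough illustration only, assume the placeholder splits the prompt into text segments with image slots between them, where the model would insert its visual embeddings. The function `split_prompt` and its output format are hypothetical and not part of the model's API.

```python
# Hypothetical illustration only; the real handling lives in the model's remote code.
IMAGE_PLACEHOLDER = "<ImageHere>"


def split_prompt(query: str, num_images: int) -> list:
    """Split a prompt into ("text", str) and ("image", index) items, in order."""
    parts = query.split(IMAGE_PLACEHOLDER)
    found = len(parts) - 1
    if found != num_images:
        raise ValueError(f"expected {num_images} placeholder(s), found {found}")
    sequence = []
    for i, text in enumerate(parts):
        if text:  # skip empty strings, e.g. when the prompt starts with the placeholder
            sequence.append(("text", text))
        if i < num_images:  # an image slot follows every segment except the last
            sequence.append(("image", i))
    return sequence


print(split_prompt("<ImageHere>Please describe this image in detail.", num_images=1))
```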
### Load with Transformers

To load the original (non-quantized) InternLM-XComposer2-VL-7B model with Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

ckpt_path = "internlm/internlm-xcomposer2-vl-7b"
# Tokenizers run on the CPU, so no `.cuda()` call is needed here.
tokenizer = AutoTokenizer.from_pretrained(ckpt_path, trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load the model in float16; otherwise it is loaded
# in float32, which may cause an out-of-memory (OOM) error.
model = AutoModelForCausalLM.from_pretrained(ckpt_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
model = model.eval()
```

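The float16 advice follows from simple arithmetic. The sketch below estimates weight memory only. It assumes a nominal 7 billion parameters (taken from the model name) and ignores activations, the KV cache, CUDA overhead, quantization scales, and any modules kept in higher precision, so real usage is higher. The helper `weight_memory_gib` is hypothetical.

```python
# Rough weight-memory estimate; all numbers below are assumptions, not measurements.
NUM_PARAMS = 7e9  # nominal size taken from the "7b" in the model name
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "4-bit": 0.5}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


for dtype, nbytes in BYTES_PER_PARAM.items():
    print(f"{dtype:8s} ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```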
### Open Source License

The code is licensed under Apache-2.0, while the model weights are fully open for academic research and also allow free commercial use. To apply for a commercial license, please fill in the application form (available in English and Chinese). For other questions or collaborations, please contact [email protected].