metadata
license: other
pipeline_tag: text-generation
InternLM-XComposer2
InternLM-XComposer2 is a vision-language large model (VLLM) based on InternLM2 for advanced text-image comprehension and composition.
We release the InternLM-XComposer2 series in two versions:
- InternLM-XComposer2-VL: The pretrained VLLM model with InternLM2 as the initialization of the LLM, achieving strong performance on various multimodal benchmarks.
- InternLM-XComposer2: The finetuned VLLM for Free-form Interleaved Text-Image Composition.
This is the 4-bit version of InternLM-XComposer2-VL. Install the latest version of auto_gptq before using it.
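Before loading the quantized checkpoint, it can help to confirm that auto_gptq is actually present in the current environment. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name `auto-gptq` is assumed to be the name the package is published under.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "auto-gptq" is an assumed distribution name; adjust it if your install differs.
    version = installed_version("auto-gptq")
    if version is None:
        print("auto_gptq is not installed; install it before loading the 4-bit model.")
    else:
        print(f"auto_gptq {version} is installed.")
```

The check only reports whether a version is installed. It does not decide whether that version is recent enough, since this card does not state a minimum version.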
Quickstart
We provide a simple example to show how to use InternLM-XComposer with 🤗 Transformers.
import torch, auto_gptq
from transformers import AutoModel, AutoTokenizer
from auto_gptq.modeling import BaseGPTQForCausalLM
auto_gptq.modeling._base.SUPPORTED_MODELS = ["internlm"]
torch.set_grad_enabled(False)
class InternLMXComposer2QForCausalLM(BaseGPTQForCausalLM):
    layers_block_name = "model.layers"
    outside_layer_modules = [
        'vit', 'vision_proj', 'model.tok_embeddings', 'model.norm', 'output',
    ]
    inside_layer_modules = [
        ["attention.wqkv.linear"],
        ["attention.wo.linear"],
        ["feed_forward.w1.linear", "feed_forward.w3.linear"],
        ["feed_forward.w2.linear"],
    ]
# init model and tokenizer
model = InternLMXComposer2QForCausalLM.from_quantized(
'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True, device="cuda:0").eval()
tokenizer = AutoTokenizer.from_pretrained(
'internlm/internlm-xcomposer2-vl-7b-4bit', trust_remote_code=True)
# the <ImageHere> placeholder marks where the image is inserted into the prompt
query = '<ImageHere>Please describe this image in detail.'
image = 'examples/image1.webp'
with torch.cuda.amp.autocast():
    response, _ = model.chat(tokenizer, query=query, image=image, history=[], do_sample=False)
print(response)
#The image features a quote by Oscar Wilde, "Live life with no excuses, travel with no regrets."
#The quote is displayed in white text against a dark background. In the foreground, there are two silhouettes of people standing on a hill at sunset.
#They appear to be hiking or climbing, as one of them is holding a walking stick.
#The sky behind them is painted with hues of orange and purple, creating a beautiful contrast with the dark figures.
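To reuse the call above for several images, the query construction and the chat call can be wrapped in small helpers. This is only a sketch under stated assumptions: `build_query` and `describe_image` are hypothetical names that are not part of the model's API, and the `model.chat` keyword arguments are copied from the example above rather than from separate documentation.

```python
IMAGE_TOKEN = "<ImageHere>"  # placeholder used in the quickstart query


def build_query(prompt: str) -> str:
    """Prepend the image placeholder unless the prompt already contains it."""
    if IMAGE_TOKEN in prompt:
        return prompt
    return IMAGE_TOKEN + prompt


def describe_image(model, tokenizer, image_path: str,
                   prompt: str = "Please describe this image in detail.") -> str:
    """Run a single-turn chat about one image and return the text response.

    Assumes model.chat accepts the same keyword arguments as in the quickstart
    and returns a (response, second_value) pair.
    """
    response, _ = model.chat(
        tokenizer,
        query=build_query(prompt),
        image=image_path,
        history=[],        # start each call with an empty conversation
        do_sample=False,   # greedy decoding, as in the quickstart
    )
    return response
```

With the model and tokenizer loaded as in the quickstart, a call such as `describe_image(model, tokenizer, 'examples/image1.webp')` inside the same `torch.cuda.amp.autocast()` context would be expected to behave like the example above.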
Open Source License
The code is licensed under Apache-2.0. The model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English) / application form (Chinese). For other questions or collaborations, please contact [email protected].