---
license: apache-2.0
datasets:
- ai4colonoscopy/ColonINST-v1
language:
- en
metrics:
- accuracy
base_model:
- microsoft/phi-1_5
library_name: adapter-transformers
tags:
- medical
- colonoscopy
- polyp
---
# ColonGPT (A colonoscopy-specific multimodal language model)

<p align="center">
<img src="./assert/ColonGPT.gif" width="666px"/> <br />
<em>Details of our multimodal language model, ColonGPT.</em>
</p>
[Paper](https://arxiv.org) | [Home](https://github.com/ai4colonoscopy/IntelliScope)

> These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora](https://drive.google.com/drive/folders/1Emi7o7DpN0zlCPIYqsCfNMr9LTPt3SCT?usp=sharing), including the vision encoder (SigLIP), the language model (Phi-1.5), and the other weights fine-tuned on our ColonINST.

Our ColonGPT is a standard multimodal language model that contains four basic components: a language tokenizer, a visual encoder ([SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model ([Phi-1.5](https://huggingface.co/microsoft/phi-1_5)). On this Hugging Face page, we provide a quick start for the convenience of new users. For further details about ColonGPT, we highly recommend visiting our [homepage](https://github.com/ai4colonoscopy/IntelliScope). There, you will find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.
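
To make the roles of these components concrete, here is a conceptual sketch of the data flow, written by us for illustration only (it is not ColonGPT's actual implementation, and the dimensions are assumed): the vision encoder turns the image into patch features, the connector projects them into the language model's embedding space, and the projected features are spliced into the token sequence that the language model decodes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real values come from the released checkpoints.
# SigLIP-SO400M/14@384 emits 729 patch tokens of width 1152; Phi-1.5 uses a 2048-d hidden state.
NUM_PATCHES, VISION_DIM, LM_DIM = 729, 1152, 2048

# A multimodal connector maps vision features into the language model's embedding space.
# ColonGPT's actual connector may differ; a single linear layer is the simplest stand-in.
connector = nn.Linear(VISION_DIM, LM_DIM)

vision_features = torch.randn(1, NUM_PATCHES, VISION_DIM)  # stand-in for the SigLIP encoder output
image_embeds = connector(vision_features)                   # (1, 729, 2048), now LM-compatible

text_embeds = torch.randn(1, 16, LM_DIM)                    # stand-in for the embedded prompt tokens
# The image embeddings are inserted where the <image> placeholder sits in the prompt,
# and the combined sequence is decoded autoregressively by the language model.
inputs_embeds = torch.cat([text_embeds[:, :8], image_embeds, text_embeds[:, 8:]], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 745, 2048])
```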
# Quick start

Here is a code snippet showing how to quickly try out our ColonGPT model with transformers. For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick start; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.

- Before running the snippet, you only need to install the following minimal dependencies.

  ```shell
  conda create -n quickstart python=3.10
  conda activate quickstart
  pip install torch transformers accelerate pillow
  ```

- Then you can use `python script/quick_start/quickstart.py` to start.
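
Optionally, you can run a quick sanity check before launching the script; this check is our addition for convenience (it is not part of the official quickstart) and only confirms that the core packages import correctly and whether a GPU is visible.

```python
# Optional environment sanity check (not part of the official quickstart).
import torch
import transformers

print(f"torch {torch.__version__}, transformers {transformers.__version__}")
print("CUDA available:", torch.cuda.is_available())
```

The full content of the quickstart script is shown below.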
```python
import warnings

import torch
import transformers
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria

# Silence non-essential logging for a cleaner console.
transformers.logging.set_verbosity_error()
transformers.logging.disable_progress_bar()
warnings.filterwarnings('ignore')

device = 'cuda'  # or 'cpu'
torch.set_default_device(device)

model_name = "ai4colonoscopy/ColonGPT-v1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # or torch.float32 for cpu
    device_map='auto',
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

class KeywordsStoppingCriteria(StoppingCriteria):
    """Stops generation once the stop keyword (e.g. <|endoftext|>) appears in the output."""
    def __init__(self, keyword, tokenizer, input_ids):
        self.keyword_id = tokenizer(keyword).input_ids
        self.tokenizer = tokenizer
        self.start_len = input_ids.shape[1]

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        for keyword_id in self.keyword_id:
            if keyword_id in input_ids[0, -len(self.keyword_id):]:
                return True
        return False

# Build the multimodal prompt: the text is split around the <image> placeholder,
# and the special index -200 marks where the image features will be inserted.
prompt = "Describe what you see in the image."
text = f"USER: <image>\n{prompt} ASSISTANT:"
text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)

# Preprocess the input image with the model's own image processor.
image = Image.open('cache/examples/example2.png')
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

stop_str = "<|endoftext|>"
stopping_criteria = KeywordsStoppingCriteria(stop_str, tokenizer, input_ids)

output_ids = model.generate(
    input_ids,
    images=image_tensor,
    do_sample=False,
    temperature=0,
    max_new_tokens=512,
    use_cache=True,
    stopping_criteria=[stopping_criteria]
)

# Drop the prompt tokens and the stop string, keeping only the generated answer.
outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).replace("<|endoftext|>", "").strip()
print(outputs)
```
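
If you want to query several images or prompts in a row, the steps above can be wrapped into a small helper. The sketch below is our own convenience wrapper (the function name and example usage are ours, not part of the official API) and reuses the `model`, `tokenizer`, `device`, and `KeywordsStoppingCriteria` already defined in the snippet.

```python
def ask_colongpt(image_path, prompt, max_new_tokens=512):
    # Build the multimodal prompt with the image placeholder index (-200).
    text = f"USER: <image>\n{prompt} ASSISTANT:"
    chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
    input_ids = torch.tensor(chunks[0] + [-200] + chunks[1], dtype=torch.long).unsqueeze(0).to(device)

    # Preprocess the image with the model's own image processor.
    image = Image.open(image_path)
    image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

    stopping = KeywordsStoppingCriteria("<|endoftext|>", tokenizer, input_ids)
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        do_sample=False,
        temperature=0,
        max_new_tokens=max_new_tokens,
        use_cache=True,
        stopping_criteria=[stopping],
    )
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:]).replace("<|endoftext|>", "").strip()

# Example usage (image path and prompt are placeholders):
# print(ask_colongpt('cache/examples/example2.png', 'Describe what you see in the image.'))
```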
# License

This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.

The content of this project itself is licensed under the Apache License 2.0.