JingyiLiu committed on
Commit 10e8d6b • 1 Parent(s): 5b8a059

update README

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +97 -2
  3. assert/ColonGPT.gif +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assert/ColonGPT.gif filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -15,7 +15,102 @@ tags:
  - polyp
  ---
 
- # ColonGPT
-
- A colonoscopy-specifc multimodal language model with token-efficient designs.
+ # ColonGPT (A colonoscopy-specific multimodal language model)
+
+ <p align="center">
+ <img src="./assert/ColonGPT.gif" width="666px"/> <br />
+ <em>Details of our multimodal language model, ColonGPT.</em>
+ </p>
+
+ 📖 [Paper](https://arxiv.org) | 🏠 [Home](https://github.com/ai4colonoscopy/IntelliScope)
+
+
+ These are the merged weights of [ColonGPT-v1-phi1.5-siglip-lora](https://drive.google.com/drive/folders/1Emi7o7DpN0zlCPIYqsCfNMr9LTPt3SCT?usp=sharing).
+
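+ As background, "merged weights" means the LoRA adapter deltas have been folded back into the base model, so no adapter loading is needed at inference time. The snippet below is only a generic illustration of such a merge using the peft library, not ColonGPT's actual merging script; the adapter path is a placeholder.
+
+ ```python
+ # Hypothetical LoRA merge with peft (illustrative only; paths are placeholders).
+ from transformers import AutoModelForCausalLM
+ from peft import PeftModel
+
+ base = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
+ lora = PeftModel.from_pretrained(base, "path/to/ColonGPT-v1-phi1.5-siglip-lora")
+ merged = lora.merge_and_unload()  # fold the LoRA deltas into the base weights
+ merged.save_pretrained("ColonGPT-v1-merged")
+ ```
+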
+ Our ColonGPT is a standard multimodal language model, which contains four basic components: a language tokenizer, a visual encoder (🤗 [SigLIP-SO](https://huggingface.co/google/siglip-so400m-patch14-384)), a multimodal connector, and a language model (🤗 [Phi1.5](https://huggingface.co/microsoft/phi-1_5)).
+
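+ For intuition, the multimodal connector in models of this kind is typically a small projector that maps visual-encoder features into the language model's embedding space. The sketch below illustrates that idea only; the two-layer MLP design and the dimensions (1152 for SigLIP-SO features, 2048 for Phi-1.5 hidden states) are assumptions, not ColonGPT's published connector.
+
+ ```python
+ import torch.nn as nn
+
+ # Illustrative projector: vision features -> language-model embedding space.
+ # Dimensions and MLP depth are assumptions, not ColonGPT's actual design.
+ connector = nn.Sequential(
+     nn.Linear(1152, 2048),  # SigLIP-SO feature dim -> Phi-1.5 hidden dim
+     nn.GELU(),
+     nn.Linear(2048, 2048),
+ )
+ ```
+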
+ For further details about ColonGPT, we highly recommend visiting our [home page](https://github.com/ai4colonoscopy/IntelliScope). There, you'll find comprehensive usage instructions for our model and the latest advancements in intelligent colonoscopy technology.
+
+
+
+ # Quick start
+ Here is a code snippet showing how to quickly try out our ColonGPT model with transformers. For convenience, we manually combined some configuration and code files and merged the weights. Please note that this is only a quick-start script; we recommend installing [ColonGPT's source code](https://github.com/ai4colonoscopy/IntelliScope/blob/main/docs/guideline-for-ColonGPT.md) to explore more.
+
+ - Before running the snippet, you only need to install the following minimal dependencies.
+ ```shell
+ conda create -n quickstart python=3.10
+ conda activate quickstart
+ pip install torch transformers accelerate pillow
+ ```
+ - Then run `python script/quick_start/quickstart.py` to start.
+
+
+ ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria
+ from PIL import Image
+ import warnings
+
+ transformers.logging.set_verbosity_error()
+ transformers.logging.disable_progress_bar()
+ warnings.filterwarnings('ignore')
+
+ device = 'cuda'  # or 'cpu'
+ torch.set_default_device(device)
+
+ model_name = "ai4colonoscopy/ColonGPT-v1"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.float16,  # or torch.float32 for cpu
+     device_map='auto',
+     trust_remote_code=True
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name,
+     trust_remote_code=True
+ )
+
+ # Stop generation as soon as the stop keyword appears among the newly generated tokens.
+ class KeywordsStoppingCriteria(StoppingCriteria):
+     def __init__(self, keyword, tokenizer, input_ids):
+         self.keyword_id = tokenizer(keyword).input_ids
+         self.tokenizer = tokenizer
+         self.start_len = input_ids.shape[1]
+
+     def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
+         for keyword_id in self.keyword_id:
+             if keyword_id in input_ids[0, -len(self.keyword_id):]:
+                 return True
+         return False
+
+ prompt = "Describe what you see in the image."
+ text = f"USER: <image>\n{prompt} ASSISTANT:"
+ # Tokenize around the <image> placeholder and splice in the image token id (-200).
+ text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
+ input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0).to(device)
+
+ image = Image.open('cache/examples/example2.png')
+ image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)
+
+ stop_str = "<|endoftext|>"
+ stopping_criteria = KeywordsStoppingCriteria(stop_str, tokenizer, input_ids)
+
+ output_ids = model.generate(
+     input_ids,
+     images=image_tensor,
+     do_sample=False,
+     temperature=0,
+     max_new_tokens=512,
+     use_cache=True,
+     stopping_criteria=[stopping_criteria]
+ )
+
+ outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).replace("<|endoftext|>", "").strip()
+ print(outputs)
+ ```
+
+ # License
+ This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses.
+ The content of this project itself is licensed under the Apache License 2.0.
assert/ColonGPT.gif ADDED

Git LFS Details

  • SHA256: e3d1435d26943229dbc60a054434d366449f8665e402d3d2090ea3d2b4d250dd
  • Pointer size: 132 Bytes
  • Size of remote file: 5.02 MB
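A Git LFS pointer file records only the SHA256 and size of the real object. As a minimal sketch (assuming the GIF has been downloaded to assert/ColonGPT.gif locally), you can confirm that a local copy matches the pointer above:

```python
import hashlib

# Compare a local copy of the GIF against the SHA256 from the LFS pointer above.
expected = "e3d1435d26943229dbc60a054434d366449f8665e402d3d2090ea3d2b4d250dd"
with open("assert/ColonGPT.gif", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")
```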