benchang1110 committed
Commit b2a917f
1 Parent(s): cdf3527

Update README.md

Files changed (1)
  1. README.md +35 -6
README.md CHANGED
@@ -9,7 +9,7 @@ pipeline_tag: image-text-to-text

# Model Card for Model ID

-
+ ![TaivisionLM]()
## Model Details

## English
@@ -22,7 +22,7 @@ pipeline_tag: image-text-to-text
Ready to experience the Traditional Chinese visual language model? Let's go! 🖼️🤖


- ## Traditional Chinese
+ ## 繁體中文
# 臺視: 首創獨一無二的視覺語言模型!! 🚀

🌟 TaiVisionLM 是一個小型的視覺語言模型(僅有 12 億參數),可以根據圖像輸入來回覆繁體中文指令!🌟
@@ -57,7 +57,7 @@ Here's the summary of the development process:
- **Model type:** [Image-Text-to-Text](https://huggingface.co/tasks/image-text-to-text)
- **Language(s) (NLP):** *Traditional Chinese*

- ## 中文
+ ## 繁體中文
這個模型是一個多模態的語言模型,結合了 [SigLIP](https://huggingface.co/docs/transformers/en/model_doc/siglip) 作為其視覺編碼器,並使用 [Tinyllama](https://huggingface.co/benchang1110/Taiwan-tinyllama-v1.0-chat) 作為語言模型。視覺投影器將這兩種模態結合在一起。
其架構與 [PaliGemma](https://huggingface.co/docs/transformers/v4.44.0/model_doc/paligemma) 非常相似。
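To make the architecture described in the hunk above concrete (a SigLIP vision encoder, a projector into the TinyLlama embedding space, PaliGemma-style fusion), here is a minimal illustrative sketch. The module names and dimensions are assumptions for illustration only; the repo's `modeling_taivisionlm.py` defines the actual classes:

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Maps SigLIP patch embeddings into the language model's embedding space.

    Illustrative only: the real projector lives in modeling_taivisionlm.py.
    """
    def __init__(self, vision_dim: int, text_dim: int):
        super().__init__()
        self.proj = nn.Linear(vision_dim, text_dim)

    def forward(self, patch_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, text_dim)
        return self.proj(patch_embeds)

def build_multimodal_inputs(image_embeds: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
    # PaliGemma-style fusion: projected image tokens are prepended to the
    # text token embeddings, and the decoder attends over the joint sequence.
    return torch.cat([image_embeds, text_embeds], dim=1)
```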
@@ -79,6 +79,8 @@ Here's the summary of the development process:

## How to Get Started with the Model

+ ## English
+
In Transformers, you can load the model and do inference as follows:

**IMPORTANT NOTE:** The TaiVisionLM model is not yet integrated natively into the Transformers library, so you need to set ```trust_remote_code=True``` when loading the model. It will download the ```configuration_taivisionlm.py```, ```modeling_taivisionlm.py``` and ```processing_taivisionlm.py``` files from the repo. You can check out the content of these files under the *Files and Versions* tab and pin the specific versions if you have any concerns regarding malicious code.
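Since `trust_remote_code=True` executes Python files fetched from the repo, it is worth pinning the exact revision you audited. `from_pretrained` accepts a `revision` argument for this; the hash below is illustrative, not a recommendation of a specific commit:

```python
from transformers import AutoModelForCausalLM
import torch

# Pin the audited commit so later pushes to the repo cannot change
# the remote code that gets executed ("b2a917f" is illustrative; use
# the full commit hash you have actually reviewed).
model = AutoModelForCausalLM.from_pretrained(
    "benchang1110/TaiVision-base",
    trust_remote_code=True,
    revision="b2a917f",
    torch_dtype=torch.float16,
)
```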
@@ -101,15 +103,42 @@ outputs = processor.tokenizer.decode(model.generate(**inputs,max_length=512)[0])
print(outputs)
```

+ ## 中文
+ 利用 transformers,可以用下面的程式碼進行推論:
+
+ **重要通知:** 台視 (TaiVisionLM) 還沒被整合進 transformers,因此在下載模型時要使用 ```trust_remote_code=True```。下載模型時會一併下載 ```configuration_taivisionlm.py```、```modeling_taivisionlm.py``` 和 ```processing_taivisionlm.py``` 這三個檔案;若擔心有惡意程式碼,請先點選右方 *Files and Versions* 查看程式碼內容。
+
+ ```python
+ from transformers import AutoProcessor, AutoModelForCausalLM, AutoConfig
+ from PIL import Image
+ import requests
+ import torch
+
+ config = AutoConfig.from_pretrained("benchang1110/TaiVision-base",trust_remote_code=True)
+ processor = AutoProcessor.from_pretrained("benchang1110/TaiVision-base",trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("benchang1110/TaiVision-base",trust_remote_code=True,torch_dtype=torch.float16,attn_implementation="sdpa").to('cuda')
+ model.eval()
+ url = "https://media.wired.com/photos/598e35fb99d76447c4eb1f28/master/pass/phonepicutres-TA.jpg"
+ image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
+ text = "描述圖片"
+ inputs = processor(text=text,images=image, return_tensors="pt",padding=False).to('cuda')
+ outputs = processor.tokenizer.decode(model.generate(**inputs,max_length=512)[0])
+ print(outputs)
+ ```
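A note on the generation call added above: `max_length=512` bounds prompt and reply together, and decoding the full output sequence echoes the prompt. If you only want the model's answer, `max_new_tokens` plus slicing off the prompt does it. A small variant, assuming the processor returns `input_ids` as Transformers processors typically do:

```python
# Reuse model/processor/inputs from the block above; bound only the
# newly generated tokens and decode just the continuation.
generated = model.generate(**inputs, max_new_tokens=256)
prompt_len = inputs["input_ids"].shape[1]
reply = processor.tokenizer.decode(generated[0][prompt_len:], skip_special_tokens=True)
print(reply)
```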
+
### Training Procedure
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
The following training hyperparameters are used in the feature alignment and task-specific training stages respectively:

- **Feature Alignment**

| Data size | Global Batch Size | Learning Rate | Epochs | Max Length | Weight Decay |
|--------------|-------------------|---------------|--------|------------|--------------|
- | 1B | 16 | 5e-4 | 1 | 2048 | 1e-5 |
+ | 1B | 16 | 5e-5 | 1 | 2048 | 1e-5 |

- We use full-parameter finetuning for the projector and apply LoRA to the language model.
+ We use full-parameter finetuning for the projector and apply LoRA to the language model.
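The recipe in the line above, full-parameter finetuning for the projector with LoRA on the language model, maps naturally onto the peft library. A minimal sketch of that setup; the target module names and the "projector" substring are assumptions about the parameter naming, not confirmed by the repo:

```python
from peft import LoraConfig, get_peft_model

# LoRA adapters on the language model's attention projections
# (q_proj/v_proj are typical TinyLlama/Llama names; confirm against the model).
lora_config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_config)

# get_peft_model freezes everything outside the adapters, so re-enable
# gradients on the projector for full-parameter finetuning.
for name, param in model.named_parameters():
    if "projector" in name:
        param.requires_grad = True
```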

+ ### Compute Infrastructure
+ - **Feature Alignment**
+   - 1xV100 (32GB), took approximately 16 GPU hours.