Phi-3 Vision visual encoder
Was a paper published about this vision model?
Which visual encoder was used?
Looks like a CLIP encoder.
We use CLIP-L. The paper will be released later today.
What is the resolution of the image input?
The resolution is dynamic based on the input image aspect ratio. The max resolution is 1344x1344.
Where is the paper? Please share a link.
Does the visual model freeze during training?
Are you going to release the paper and the fine-tuning code?
'img_processor': {'image_dim_out': 1024, 'model_name': 'openai/clip-vit-large-patch14-336', 'name': 'clip_vision_model', 'num_img_tokens': 144}
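For anyone trying to relate the config above to the stated 1344x1344 maximum, here is a rough arithmetic sketch in Python. It only uses numbers that appear in this thread: patch size 14 and crop size 336 from the model name, `num_img_tokens` 144, and the 1344 maximum side. The reading that the 4x ratio between patches and tokens comes from merging neighboring patches is my assumption, not something confirmed here, and the helper names are made up for illustration.

```python
# Values copied from the img_processor config quoted in this thread.
IMG_PROCESSOR = {
    "image_dim_out": 1024,
    "model_name": "openai/clip-vit-large-patch14-336",
    "name": "clip_vision_model",
    "num_img_tokens": 144,
}

PATCH_SIZE = 14   # from "patch14" in the model name
CROP_SIZE = 336   # from "336" in the model name
MAX_SIDE = 1344   # maximum resolution stated in this thread


def patches_per_crop(crop_size: int = CROP_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of ViT patches in one square crop (hypothetical helper)."""
    side = crop_size // patch_size
    return side * side


def max_crops(max_side: int = MAX_SIDE, crop_size: int = CROP_SIZE) -> int:
    """How many crop-sized tiles fit in the largest square input (hypothetical helper)."""
    per_side = max_side // crop_size
    return per_side * per_side


def patch_to_token_ratio() -> int:
    """Ratio of raw patches to num_img_tokens for one crop.

    Assumption: a ratio of 4 would be consistent with merging 2x2
    neighboring patches, but the thread does not confirm this.
    """
    return patches_per_crop() // IMG_PROCESSOR["num_img_tokens"]


if __name__ == "__main__":
    print("patches per 336x336 crop:", patches_per_crop())
    print("crop-sized tiles in 1344x1344:", max_crops())
    print("patches per image token:", patch_to_token_ratio())
```

So a 336x336 crop has 24x24 patches, and 1344 is exactly four crop widths per side. How the crops are actually selected for a given aspect ratio is not described in this thread, so treat the above as arithmetic only.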
Please share the paper URL here or in the model card.