SanghyukChun
/

PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M

pytorch_model_hub_mixin

model_hub_mixin

SanghyukChun committed on May 30

Commit

a668fdf

•

1 Parent(s): be6e736

Update README.md

Files changed (1)

README.md +49 -3

README.md CHANGED

@@ -4,6 +4,52 @@ tags:
 - model_hub_mixin
 ---
-This model has been pushed to the Hub using ****:
-- Repo: [More Information Needed]
-- Docs: [More Information Needed]

+### Official PCME++ model pre-trained on CC3M, CC12M, and RedCaps
+Zero-shot ImageNet-1k top-1 accuracy: 41.812% (trained for more iterations than the previous version)
+- Paper: https://openreview.net/forum?id=ft1mr3WlGM
+- GitHub: https://github.com/naver-ai/pcmepp
+- The previous official checkpoint (ImageNet-1k top-1 accuracy 34.642%, mean-only zero-shot classification) is available at [SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps](https://huggingface.co/SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps)
+```python
+import requests
+from PIL import Image
+import torch
+from transformers import CLIPProcessor
+# hf_models is not a pip package; get it from https://github.com/naver-ai/pcmepp/tree/main/hf_models
+from hf_models import HfPCMEPPModel, tokenize
+# Image preprocessing comes from the CLIP ViT-B/16 processor
+processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")
+# Previous checkpoint, IN-top1: 34.64%
+# model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps")
+# This checkpoint, IN-top1: 41.81%
+model = HfPCMEPPModel.from_pretrained("SanghyukChun/PCMEPP-ViT-B-16-CC3M-12M-RedCaps-256M")
+# Download a sample COCO validation image
+url = "http://images.cocodataset.org/val2017/000000039769.jpg"
+image = Image.open(requests.get(url, stream=True).raw)
+inputs = processor(images=image, return_tensors="pt", padding=True)
+# Candidate captions, tokenized with the repo's own tokenizer helper
+texts = ["a photo of a cat", "a photo of a dog"]
+texts = tokenize(texts)
+outputs = model(images=inputs["pixel_values"], texts=texts)
+# Image-text similarity: dot product of the mean embeddings
+print("Logits:", outputs["image_features"] @ outputs["text_features"].T)
+# Exponentiate the std outputs, then average over the embedding dimension
+print("Image uncertainty: ", torch.exp(outputs["image_stds"]).mean(dim=-1))
+print("Text uncertainty: ", torch.exp(outputs["text_stds"]).mean(dim=-1))
+```
+```
+@inproceedings{
+chun2024pcmepp,
+title={Improved Probabilistic Image-Text Representations},
+author={Sanghyuk Chun},
+booktitle={The Twelfth International Conference on Learning Representations},
+year={2024},
+url={https://openreview.net/forum?id=ft1mr3WlGM}
+}
+```
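
The usage snippet added in this commit prints raw similarity logits and per-sample uncertainties without showing how to read them. The sketch below is one possible interpretation, not part of the PCME++ code base: it applies a softmax over the text candidates to pick a caption, and averages exponentiated values the way the README does with `torch.exp(...).mean(dim=-1)`. The helper names (`softmax`, `mean_uncertainty`), the `temperature` parameter, and all numbers are hypothetical stand-ins. Treating the `*_stds` outputs as log-scale is an assumption inferred from the `torch.exp` call. Plain Python lists are used so the example runs without downloading the model.

```python
import math


def softmax(row, temperature=1.0):
    """Turn one row of similarity logits into probabilities.

    `temperature` is a hypothetical knob; the README prints raw logits
    and does not specify any scaling.
    """
    scaled = [v / temperature for v in row]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_uncertainty(log_stds):
    """Average exp(value) over the embedding dimension.

    Mirrors torch.exp(x).mean(dim=-1) for a single sample, assuming the
    model's std outputs are stored in log space.
    """
    return sum(math.exp(v) for v in log_stds) / len(log_stds)


# Made-up values standing in for the model outputs of one image.
labels = ["a photo of a cat", "a photo of a dog"]
logits = [[0.31, 0.12]]            # shape: (num_images, num_texts)
image_log_stds = [-2.0, -1.5, -2.5]  # one image, three embedding dims

probs = [softmax(row) for row in logits]
predictions = [labels[max(range(len(p)), key=p.__getitem__)] for p in probs]

print("Probabilities:", probs)
print("Predicted captions:", predictions)
print("Image uncertainty:", mean_uncertainty(image_log_stds))
```

With real model outputs, the same steps correspond to `torch.softmax(logits, dim=-1)` and `argmax(dim=-1)` on the printed logits tensor.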