kr-manish/fine-tune-image-caption-pokemon

Tags: Image-Text-to-Text, Generated from Trainer, Inference Endpoints


kr-manish committed on Apr 17: commit b9c544b (parent: cc87729), "Update README.md"

Files changed (1): README.md +36 -0


<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# git-base-pokemon
This model is a fine-tuned version of [microsoft/git-base](https://huggingface.co/microsoft/git-base) on the `polinaeterna/pokemon-blip-captions` dataset.

## Dataset

Fine-tuned on `polinaeterna/pokemon-blip-captions`.

## Usage

Generate a caption for an image with the fine-tuned model:
```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

# The processor handles both modalities: it resizes and normalizes images
# and tokenizes/decodes captions. It is loaded from the base GIT checkpoint.
checkpoint = "microsoft/git-base"
processor = AutoProcessor.from_pretrained(checkpoint)

# Fine-tuned captioning model (replace with your own Hub ID if you trained a copy)
model_name = "kr-manish/git-base-pokemon"
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
model.eval()

# Download an example image (replace the URL with your own image)
url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/pokemon.png"
response = requests.get(url, stream=True, timeout=30)
response.raise_for_status()
image = Image.open(response.raw).convert("RGB")  # drop any alpha channel

# Only pixel_values are needed: the caption is generated from the image alone
inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
# Output reported by the author: a pink and purple pokemon character with big eyes
```
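The example above covers inference only. During fine-tuning, each training example pairs an image with a caption, so the processor has to encode both. Below is a minimal sketch of a batch transform in the style of the Transformers image-captioning guide. It is not the author's training script. It assumes the dataset exposes `image` and `text` columns, and `make_caption_transform` is a hypothetical helper name. A stand-in processor is used so the sketch runs without downloading a model.

```python
from typing import Any, Callable, Dict, List


# Hypothetical helper: builds a transform that encodes a batch of
# (image, caption) pairs with a GIT-style processor for training.
def make_caption_transform(
    processor: Callable[..., Dict[str, Any]],
) -> Callable[[Dict[str, List[Any]]], Dict[str, Any]]:
    def transform(batch: Dict[str, List[Any]]) -> Dict[str, Any]:
        # Assumed column names: "image" (PIL images) and "text" (captions)
        inputs = processor(
            images=batch["image"],
            text=batch["text"],
            padding="max_length",  # fixed-length captions so batches stack cleanly
        )
        # GIT is a causal LM: the caption token ids double as the labels
        # (the model shifts them internally; padding is not masked here)
        inputs["labels"] = inputs["input_ids"]
        return inputs

    return transform


if __name__ == "__main__":
    # Tiny stand-in processor so the sketch runs without downloading a model
    def fake_processor(images, text, padding):
        return {"pixel_values": images, "input_ids": [[len(t)] for t in text]}

    transform = make_caption_transform(fake_processor)
    out = transform({"image": ["img0", "img1"], "text": ["a", "bb"]})
    print(out["labels"])  # [[1], [2]]
```

With the real processor, this could be attached lazily to a `datasets.Dataset` named `ds` via `ds.set_transform(make_caption_transform(AutoProcessor.from_pretrained("microsoft/git-base")))`. Reusing `input_ids` as `labels` works because the GIT causal-LM head shifts the labels internally when computing the loss. A fuller setup might also replace padding positions in the labels with `-100` so that they are ignored.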