Adjusted readme, changed response, removed log.
- README.md +52 −0
- handler.py +2 −4
README.md CHANGED

```diff
@@ -28,6 +28,58 @@ You can use this model for conditional and un-conditional image captioning
 
 ### Using the Pytorch model
 
+#### Running inference
+
+JSON Payload:
+<details>
+<summary> Click to expand </summary>
+
+```json
+{
+  "secret_token": "optional",
+  "inputs": {
+    "texts": [
+      [
+        "Is it a person?",
+        "What skin color?",
+        "What person wears?",
+        "Is person solo?",
+        "What is person mood?",
+        "What is person doing?"
+      ]
+    ],
+    "images": [
+      {
+        "url": "https://example.com"
+      }
+    ]
+  }
+}
+```
+</details>
+
+JSON Response:
+<details>
+<summary> Click to expand </summary>
+
+```json
+{
+  "captions": [
+    {
+      "image_results": [
+        "yes",
+        "white",
+        "naked",
+        "yes",
+        "happy",
+        "taking selfie"
+      ]
+    }
+  ]
+}
+```
+</details>
+
 #### Running the model on CPU
 
 <details>
```
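For reference, here is a minimal sketch of calling a deployed endpoint with the payload documented above, using Python's `requests`. The endpoint URL is a placeholder, and whether `secret_token` is actually required depends on the deployment.

```python
import requests

# Placeholder URL -- replace with your own deployed inference endpoint.
ENDPOINT_URL = "https://your-endpoint.example.com"

# Payload shape taken from the README above: one list of questions per image.
payload = {
    "secret_token": "optional",
    "inputs": {
        "texts": [
            ["Is it a person?", "What is person doing?"]
        ],
        "images": [
            {"url": "https://example.com"}
        ],
    },
}

response = requests.post(ENDPOINT_URL, json=payload)
response.raise_for_status()

# Expected response shape: {"captions": [{"image_results": [...]}]}
for caption in response.json()["captions"]:
    print(caption["image_results"])
```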
handler.py CHANGED

```diff
@@ -52,8 +52,6 @@ class EndpointHandler():
         image_captions = []  # Store answers for each image
 
         for question in questions:
-            print(f"Question: {question}")
-
             # Process the image and question
             processed_input = self.processor(image, question, return_tensors="pt").to(device)
 
@@ -61,10 +59,10 @@ class EndpointHandler():
             out = self.model.generate(**processed_input)
 
             # Decode the answer
-
+            answer = self.processor.batch_decode(out, skip_special_tokens=True)[0]
 
             # Add the answer to the list for the current image
-            image_captions.append(
+            image_captions.append(answer)
 
         # Store results for the current image
         results.append({"image_results": image_captions})
```
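For context, a standalone sketch of how the question loop reads after this commit. It assumes a BLIP-style VQA checkpoint from `transformers`; the checkpoint name, image loading, and the scaffolding around the loop are assumptions, and only the loop body mirrors the diff.

```python
import requests
import torch
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

# Assumed checkpoint -- the repo's handler may load a different model.
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

# Hypothetical inputs mirroring the README payload.
image = Image.open(requests.get("https://example.com", stream=True).raw).convert("RGB")
questions = ["Is it a person?", "What is person doing?"]

results = []
image_captions = []  # Store answers for each image

for question in questions:
    # Process the image and question
    processed_input = processor(image, question, return_tensors="pt").to(device)

    # Generate the answer tokens
    out = model.generate(**processed_input)

    # Decode the answer; batch_decode returns a list of strings, [0] is the first
    answer = processor.batch_decode(out, skip_special_tokens=True)[0]

    # Add the answer to the list for the current image
    image_captions.append(answer)

# Store results for the current image
results.append({"image_results": image_captions})
print(results)  # e.g. [{"image_results": ["yes", "taking selfie"]}]
```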