How to extract the masked image part?
I want to extract the masked part of the image. How can I do that?
Also, is this the best text-to-mask method available at the moment?
Thank you so much @nielsr.
Perhaps you can show a modified version of this script to do this.
input image
expected output
import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

image = Image.open("0.png")
processor = CLIPSegProcessor.from_pretrained("./clipseg-rd64-refined")
model = CLIPSegForImageSegmentation.from_pretrained("./clipseg-rd64-refined")

prompts = ["a blue block", "an orange block", "a yellow block", "a red block"]
inputs = processor(text=prompts, images=[image] * len(prompts), padding="max_length", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
print(logits.shape)
preds = outputs.logits.unsqueeze(1)
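For the extraction itself, one approach (a sketch, independent of the model code above) is to threshold the predicted heatmap and use it as the alpha channel, so everything outside the mask becomes transparent. It assumes you have already applied a sigmoid to the logits and have a heatmap in [0, 1]; the function name and the synthetic example data are just for illustration.

```python
import numpy as np
from PIL import Image

def extract_masked_region(image, heatmap, threshold=0.5):
    """Keep only the pixels where `heatmap` exceeds `threshold`.

    `image` is a PIL image; `heatmap` is a 2-D float array in [0, 1]
    (e.g. a sigmoid-ed CLIPSeg prediction). The mask is resized to the
    image size, then used as the alpha channel of an RGBA copy, so the
    background becomes fully transparent.
    """
    mask = (np.asarray(heatmap) > threshold).astype(np.uint8) * 255
    alpha = Image.fromarray(mask, mode="L").resize(image.size)
    rgba = image.convert("RGBA")
    rgba.putalpha(alpha)
    return rgba

# Example with a synthetic image and heatmap:
img = Image.new("RGB", (4, 4), (10, 20, 30))
heat = np.zeros((4, 4))
heat[:2, :] = 0.9  # pretend the object is in the top half
out = extract_masked_region(img, heat)
```

The resulting RGBA image can be saved as a PNG to preserve the transparency.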
Hi,
See my demo notebook: https://github.com/NielsRogge/Transformers-Tutorials/tree/master/CLIPSeg
@nielsr hi,
thank you so much for the answer.
I tested it on the Colab and it is not working.
I provided images with dimensions divisible by 16; even an exactly 352x352 pixel image still does not work.
Here is the error for 352x352:
used image
I also tested this image, which is divisible by 16, and it didn't work either.
After I loaded a PNG it passed that part.
I loaded this URL: https://cdn-uploads.huggingface.co/production/uploads/1676308992250-6345bd89fe134dfd7a0dba40.png
Now I got this error:
@MonsterMMORPG I have a fork of this space that produces bounding boxes and masks. You can try it here. You might find it helpful.
@taesiri it works great.
How can I extract the image based on the alpha? I want to extract only the alpha part, not the rest.
e.g.
this
into this
Also, I tested different strengths, and when a clothes prompt is used it includes the head as well.
Any ideas on how to discard the head?
@MonsterMMORPG About the prompt, you can try different ones; for example, "jacket and jeans" works well on your image, but it is not perfect.
As for alpha, you can use the mask output and some PIL magic to get the image you want. You can find a tutorial here: https://note.nkmk.me/en/python-pillow-composite/
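For reference, the core of that tutorial is `Image.composite(fg, bg, mask)`: where the mask is white the foreground pixel is taken, where it is black the background pixel is taken. A minimal sketch with synthetic images (the sizes and colors here are just for illustration):

```python
from PIL import Image

# Foreground: the original photo; background: a fully transparent canvas
fg = Image.new("RGBA", (4, 4), (200, 0, 0, 255))
bg = Image.new("RGBA", (4, 4), (0, 0, 0, 0))

# Mask: white where we keep the foreground, black elsewhere
mask = Image.new("L", (4, 4), 0)
mask.paste(255, (0, 0, 2, 4))  # keep the left half

result = Image.composite(fg, bg, mask)
```

With a real segmentation mask in place of the synthetic one, `result` contains only the masked region over transparency.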
@taesiri thank you so much.
Would it be too hard for you to add that to your demo page? I would like it to also output the masked part.
I am a total noob at Python :/
@MonsterMMORPG Okay :-), I will try to add it later today (after I am done with my daily tasks :D)
Awesome thank you so much. Looking forward to it.
@MonsterMMORPG Pushed an update.
Awesome ty so much will test asap
@taesiri
Thank you so much again. Since I am very new to Python, I am getting an error.
I made a Colab and I want to run the code without the Gradio interface. I am running the command below but can't show and save the 3 returned images.
Here is the Colab link: https://colab.research.google.com/drive/1Eain9Tri7HUa90qUBn3kuEQy9D6A-w-e?usp=sharing
Could you help me show and save the 3 returned images?
input_image = Image.open("/content/a.jpg")
input_prompt = "clothes"
thresholdVal = 1.0
alpha_val = 0.5
draw_rectangles = False
outputs = process_image(input_image, input_prompt, thresholdVal, alpha_val, draw_rectangles)
outputs[0].show()  # only this works
outputs[1].show()
outputs[2].show()
The first returned variable is a matplotlib figure, the second is a numpy array, and the last one is a PIL image containing the region of the image you are interested in. We cannot call .show() on a numpy array.
If you want to save the figure: outputs[0].savefig('fig.png')
If you want to save the mask: Image.fromarray(np.uint8(outputs[1] * 255), "L").save('mask.png')
If you want to save the region of interest: outputs[2].save('clothes.png')
If you want to show them side by side:
import io
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Converting matplotlib figure to PIL Image, which is completely unnecessary!
buf = io.BytesIO()
outputs[0].savefig(buf, format='png', bbox_inches='tight', pad_inches=0)
pil_image = Image.open(io.BytesIO(buf.getvalue()))

fig, axes = plt.subplots(1, 3, figsize=(10, 4))
axes[0].imshow(pil_image)
axes[1].imshow(Image.fromarray(np.uint8(outputs[1] * 255), "L"), cmap='jet')
axes[2].imshow(outputs[2])
plt.show()
Hope this helps.
@taesiri it works awesome, thank you so much.
Can't we give a negative mask word, like "do not take face", as a negative prompt?
Also, can we make an OR?
I mean something like "clothes or pants or shirts or boots", etc.
Here is an example:
clothes
pants
None of the below work:
clothes or pants
clothes , pants
clothes pants
@taesiri would it be possible to merge multiple masks, like clothes, pants, and shirts, and perhaps zero out negatives such as head and face?
Like: mask = mask_clothes + mask_pants + mask_shirts - mask_face - mask_head
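That mask arithmetic can be sketched directly in NumPy, clipping back into [0, 1] so the result stays a valid mask (the function name and the toy 2x2 masks are hypothetical stand-ins for per-prompt CLIPSeg outputs):

```python
import numpy as np

def combine_masks(positive_masks, negative_masks):
    """Union of positive masks minus the union of negative masks.

    Each mask is a 2-D float array in [0, 1]; sums are clipped so the
    combined result remains a valid alpha mask in [0, 1].
    """
    pos = np.clip(np.sum(positive_masks, axis=0), 0.0, 1.0)
    neg = np.clip(np.sum(negative_masks, axis=0), 0.0, 1.0)
    return np.clip(pos - neg, 0.0, 1.0)

# Toy masks: clothes in the top row, pants bottom-left, face top-left
mask_clothes = np.array([[1.0, 1.0], [0.0, 0.0]])
mask_pants = np.array([[0.0, 0.0], [1.0, 0.0]])
mask_face = np.array([[1.0, 0.0], [0.0, 0.0]])

mask = combine_masks([mask_clothes, mask_pants], [mask_face])
# The face region is zeroed out of the clothes+pants union
```

The combined mask can then be fed into the same alpha-compositing step as a single-prompt mask.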
@MonsterMMORPG This is actually a very cool idea, but I am swamped with my ongoing tasks for the next couple of days. Maybe I will have some time for this next week.
awesome looking forward to that.
Hello again. Any chance you have had time to look at this?
@MonsterMMORPG You can try it here: https://huggingface.co/spaces/taesiri/CLIPSeg2
Not very efficient at the moment, but it does what you need. When entering prompts, use a comma to separate multiple objects.
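For reference, splitting a comma-separated prompt string into individual prompts can be sketched as below (the function name is hypothetical; the Space's own parsing may differ):

```python
def parse_prompts(prompt_string):
    """Split a comma-separated prompt string into clean, non-empty parts."""
    return [p.strip() for p in prompt_string.split(",") if p.strip()]

parse_prompts("clothes, pants, shirts")  # ['clothes', 'pants', 'shirts']
```

Each resulting prompt would then be run through the model separately to get its own mask.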
Awesome, thank you so much. You are amazing.
It works great. But we can't see the confidence as before, I presume?
I edited the previous code as below:
input_image = Image.open(file_name)
positive_prompts = "clothes, shirts, pants"
negative_prompts = "face, head"
thresholdVal = 0.5
alpha_val = 0.5
draw_rectangles = False
outputs = extract_image(positive_prompts, negative_prompts, input_image, thresholdVal)
outputs[0].show()
outputs[1].show()
outputs[0].save(folder_path + "/" + names_no_ext[irCounter] + '_clothes.png')
outputs[1].save(folder_path + "/" + names_no_ext[irCounter] + '_clothes_mask.png')
Here are the results:
input
outputs
@MonsterMMORPG I removed unnecessary visualization as each word has its own confidence information and I assumed that is not required for your task :-)
Yep, not necessary. Your help has been tremendous. Do you think a better model will be released for this task soon? Like one with better accuracy.
@MonsterMMORPG This really depends on your use case; you might need to fine-tune some of these models. Have you tried other models as well, like MaskCut?
Thanks a lot for the reply. I didn't know about MaskCut, but CLIPSeg looks better for our task.
What would you suggest for calculating the similarity of clothing? We want to calculate clothing similarity, so that, say, you show cloth A and we return clothes similar to it. So this is image similarity calculation, I presume.
Sorry about the hidden comments; they were duplicate posts due to a Hugging Face error.
@MonsterMMORPG What have you tried so far? I reckon simple cosine similarity of CLIP image embeddings would give decent results out of the box.
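The idea can be sketched without any model: cosine similarity is just the dot product of the two vectors divided by the product of their norms. With real CLIP embeddings you would pass the vectors produced by the model instead of these toy ones:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([1, 0], [2, 0]))  # 1.0 (same direction)
print(cosine_similarity([1, 0], [0, 3]))  # 0.0 (orthogonal)
```

For a search over many items, you would precompute the embedding of every clothing image once and rank by similarity to the query embedding.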
This was also on my mind.
How can I calculate it? Unfortunately I am very bad at Python. Let's say I want to add it to this script, which is your script, so it should be easy for you :)
https://gist.github.com/FurkanGozukara/09dd8a80d72546bd51ef73b2171e8338
Also, do you have any ideas?
@MonsterMMORPG English is the most popular programming language these days, you can try ChatGPT + Copilot and do anything you like 😁😁
You are absolutely right.
I made a simple logical code block but am getting an error, and I can't figure out where it is.
Also, I need it to handle images of any dimension.
import torch
from transformers import CLIPProcessor, CLIPModel
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model_ID = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(model_ID).to(device)
processor = CLIPProcessor.from_pretrained(model_ID)

def load_and_preprocess_image(image_path):
    # The processor handles resizing and normalization, so any input size works
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    return inputs["pixel_values"].to(device)

image_a = load_and_preprocess_image('/content/a.png')
image_b = load_and_preprocess_image('/content/b.png')

with torch.no_grad():
    # The Hugging Face CLIPModel exposes get_image_features (not encode_image)
    embedding_a = model.get_image_features(pixel_values=image_a)
    embedding_b = model.get_image_features(pixel_values=image_b)

similarity_score = torch.nn.functional.cosine_similarity(embedding_a, embedding_b)
print('Similarity score:', similarity_score.item())