The model "depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf" doesn't work as expected, produces a fully black depth map
Hi, so I'm trying to use the model as described here, but I keep getting blank, pure-black depth maps.
When I go back to the original model "depth-anything/Depth-Anything-V2-Small-hf" I get an accurate depth map.
I tried running inference both with the high-level pipeline API and the manual way, and the result is the same.
What could be the issue?
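For reference, the high-level pipeline path I mean looks roughly like this (a minimal sketch; the model id is taken from the title and the image path is just a placeholder):

from transformers import pipeline
from PIL import Image

# placeholder path to a local test image
image = Image.open("my_image.jpg")

# depth-estimation pipeline with the metric outdoor checkpoint
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
result = pipe(image)
result["depth"]  # PIL image of the depth map; this is where I get the all-black output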
Hi @Abbasid, it looks like the "metric depth" feature for DepthAnything is not in the 4.44.0 release yet, but you can use it if you update transformers to the latest main as follows:
# update transformers to latest main
!pip install git+https://github.com/huggingface/transformers
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image_processor = AutoImageProcessor.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth
# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
# visualize the output
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
depth
I believe this is the issue you are facing, since it is working for me.
Hello @bthia97, thanks! I managed to get it working with the help of !pip install git+https://github.com/huggingface/transformers
Another question: these values are supposed to represent metric depth, right? So if I get the value 60 at a pixel, does that mean the object is 60 m away? I am getting values that are way off for objects of known size, or is something wrong with my understanding?
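For reference, this is roughly how I'm checking a pixel value (a minimal sketch; it assumes the prediction tensor from the interpolation step in your snippet, read before the 0-255 scaling used to build the PIL visualization):

# read the raw depth value at one pixel (assuming the metric checkpoint outputs metres)
depth_map = prediction.squeeze().cpu().numpy()  # (H, W) array of raw depth values
y, x = 240, 320                                 # example pixel coordinates
print(f"raw depth at ({x}, {y}): {depth_map[y, x]:.2f}")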
Also, when we infer with pipe, what is the difference between depth and predicted_depth?
depth is a PIL image with the depth...
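A minimal sketch of inspecting both keys (this assumes the depth-estimation pipeline with the same checkpoint as above; predicted_depth is the raw model output tensor, while depth is that output rescaled to 0-255 and converted to a PIL image for display):

from transformers import pipeline
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf")
out = pipe(image)

print(type(out["depth"]))            # PIL.Image.Image, rescaled for visualization
print(type(out["predicted_depth"]))  # torch.Tensor, the raw model output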