Edit model card

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Model Card for Sapiens

Sapiens is a family of vision transformers pretrained on 300 million human images at 1024 x 1024 image resolution. The pretrained models when finetuned for human-centric vision tasks generalize to in-the-wild conditions.

Model Description

Sapiens, a family of models for four fundamental human-centric vision tasks - 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. Our models natively support 1K high-resolution inference and are extremely easy to adapt for individual tasks by simply fine-tuning models pretrained on over 300 million in-the-wild human images. The resulting models exhibit remarkable generalization to in-the-wild data, even when labeled data is scarce or entirely synthetic. Our simple model design also brings scalability - model performance across tasks improves as we scale the parameters from 0.3 to 2 billion. Sapiens consistently surpasses existing baselines across various human-centric benchmarks.

  • Model type: Vision Transformers
  • License: Creative Commons Attribution-NonCommercial 4.0

More Resources

Uses

pose estimation (keypoints 17, keypoints 133). COCO and COCO-WholeBody dataset compatible.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference API
Unable to determine this model's library. Check the docs .