tags:
  - image-classification
  - timm
library_name: timm
license: apache-2.0
datasets:
  - imagenet-1k
  - imagenet-12k

Model card for mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k

A MobileNet-V4 image classification model, pretrained on ImageNet-12k and fine-tuned on ImageNet-1k by Ross Wightman.

Model Details

Model Usage

Image Classification

from urllib.request import urlopen
from PIL import Image
import timm
import torch  # needed for torch.topk below

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model('mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k', pretrained=True)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
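
To read the top-5 result, the two tensors returned by `torch.topk` can be paired up element by element. The sketch below uses random logits as a stand-in for `output` so that it runs without downloading weights; the 1000-class width is an assumption based on the ImageNet-1k fine-tune. Mapping indices to label names is left out because it depends on a label file not shown here.

```python
import torch

# Stand-in for `output = model(...)`: random logits for a batch of 1 over
# 1000 classes (assumption: the real logits have this shape).
torch.manual_seed(0)
output = torch.randn(1, 1000)

# Same call as in the snippet above: softmax over classes, scaled to percent.
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)

# Both results have shape (batch_size, 5); row 0 belongs to the single image.
for prob, idx in zip(top5_probabilities[0].tolist(), top5_class_indices[0].tolist()):
    print(f"class index {idx:4d}: {prob:.2f}%")
```

`torch.topk` returns values in descending order, so the first printed line is the most likely class.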

Feature Map Extraction

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k',
    pretrained=True,
    features_only=True,
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1

for o in output:
    # print shape of each feature map in output
    # e.g.:
    #  torch.Size([1, 32, 128, 128])
    #  torch.Size([1, 48, 64, 64])
    #  torch.Size([1, 80, 32, 32])
    #  torch.Size([1, 160, 16, 16])
    #  torch.Size([1, 960, 8, 8])

    print(o.shape)
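
The shapes in the comments above follow a regular pattern: each feature map halves the spatial size of the previous one. A minimal sketch, assuming a 256x256 input and taking the channel counts and reduction factors (2, 4, 8, 16, 32) from those example shapes; `expected_feature_shapes` is a hypothetical helper, not part of timm.

```python
def expected_feature_shapes(input_size, channels, reductions, batch_size=1):
    """Return (batch, channels, height, width) for each feature level."""
    return [
        (batch_size, c, input_size // r, input_size // r)
        for c, r in zip(channels, reductions)
    ]

# Values read off the example output above (assumption: 256x256 input).
channels = [32, 48, 80, 160, 960]
reductions = [2, 4, 8, 16, 32]

shapes = expected_feature_shapes(256, channels, reductions)
for shape in shapes:
    print(shape)
```

Passing a different `input_size` to this helper shows how the spatial dimensions would scale if the same reduction factors hold at other resolutions.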

Image Embeddings

from urllib.request import urlopen
from PIL import Image
import timm

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

model = timm.create_model(
    'mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k',
    pretrained=True,
    num_classes=0,  # remove classifier nn.Linear
)
model = model.eval()

# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)

output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor

# or equivalently (without needing to set num_classes=0)

output = model.forward_features(transforms(img).unsqueeze(0))
# output is unpooled, a (1, 960, 8, 8) shaped tensor

output = model.forward_head(output, pre_logits=True)
# output is a (1, num_features) shaped tensor
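
A common use of such embeddings is comparing two images by cosine similarity. The sketch below is one way to do that, not something prescribed by this model card; it uses random placeholder tensors instead of real model outputs, and `EMBED_DIM` is an arbitrary illustrative width.

```python
import torch
import torch.nn.functional as F

# Placeholder embeddings standing in for two `model(...)` outputs.
EMBED_DIM = 8  # arbitrary; the real width is whatever the model returns
torch.manual_seed(0)
emb_a = torch.randn(1, EMBED_DIM)
emb_b = torch.randn(1, EMBED_DIM)

# L2-normalise each row so the dot product equals the cosine similarity.
emb_a_n = F.normalize(emb_a, dim=1)
emb_b_n = F.normalize(emb_b, dim=1)

similarity = (emb_a_n * emb_b_n).sum(dim=1)       # shape (1,)
self_similarity = (emb_a_n * emb_a_n).sum(dim=1)  # 1.0 up to rounding
print(similarity.item(), self_similarity.item())
```

With real embeddings, `emb_a` and `emb_b` would be the pooled outputs for two different images passed through the same transforms.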

Model Comparison

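By Top-1

Rows are sorted by descending top-1 accuracy. Accuracies and error rates are percentages on the ImageNet-1k validation set, param_count is in millions, and img_size is the evaluation resolution in pixels.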

model top1 top1_err top5 top5_err param_count img_size
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.99 15.01 97.294 2.706 32.59 544
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.772 15.228 97.344 2.656 32.59 480
mobilenetv4_conv_aa_large.e230_r448_in12k_ft_in1k 84.64 15.36 97.114 2.886 32.59 448
mobilenetv4_hybrid_large.ix_e600_r384_in1k 84.356 15.644 96.892 3.108 37.76 448
mobilenetv4_conv_aa_large.e230_r384_in12k_ft_in1k 84.314 15.686 97.102 2.898 32.59 384
mobilenetv4_hybrid_large.e600_r384_in1k 84.266 15.734 96.936 3.064 37.76 448
mobilenetv4_hybrid_large.ix_e600_r384_in1k 83.990 16.010 96.702 3.298 37.76 384
mobilenetv4_conv_aa_large.e600_r384_in1k 83.824 16.176 96.734 3.266 32.59 480
mobilenetv4_hybrid_large.e600_r384_in1k 83.800 16.200 96.770 3.230 37.76 384
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 83.394 16.606 96.760 3.240 11.07 448
mobilenetv4_conv_large.e600_r384_in1k 83.392 16.608 96.622 3.378 32.59 448
mobilenetv4_conv_aa_large.e600_r384_in1k 83.244 16.756 96.392 3.608 32.59 384
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.99 17.01 96.67 3.33 11.07 320
mobilenetv4_hybrid_medium.ix_e550_r384_in1k 82.968 17.032 96.474 3.526 11.07 384
mobilenetv4_conv_large.e600_r384_in1k 82.952 17.048 96.266 3.734 32.59 384
mobilenetv4_conv_large.e500_r256_in1k 82.674 17.326 96.31 3.69 32.59 320
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 82.492 17.508 96.278 3.722 11.07 320
mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k 82.364 17.636 96.256 3.744 11.07 256
mobilenetv4_conv_large.e500_r256_in1k 81.862 18.138 95.69 4.31 32.59 256
mobilenetv4_hybrid_medium.ix_e550_r256_in1k 81.446 18.554 95.704 4.296 11.07 256
mobilenetv4_hybrid_medium.e500_r224_in1k 81.276 18.724 95.742 4.258 11.07 256
mobilenetv4_conv_medium.e500_r256_in1k 80.858 19.142 95.768 4.232 9.72 320
mobilenet_edgetpu_v2_m.ra4_e3600_r224_in1k 80.680 19.320 95.442 4.558 8.46 256
mobilenetv4_hybrid_medium.e500_r224_in1k 80.442 19.558 95.38 4.62 11.07 224
mobilenetv4_conv_blur_medium.e500_r224_in1k 80.142 19.858 95.298 4.702 9.72 256
mobilenet_edgetpu_v2_m.ra4_e3600_r224_in1k 80.130 19.870 95.002 4.998 8.46 224
mobilenetv4_conv_medium.e500_r256_in1k 79.928 20.072 95.184 4.816 9.72 256
mobilenetv4_conv_medium.e500_r224_in1k 79.808 20.192 95.186 4.814 9.72 256
mobilenetv4_conv_blur_medium.e500_r224_in1k 79.438 20.562 94.932 5.068 9.72 224
efficientnet_b0.ra4_e3600_r224_in1k 79.364 20.636 94.754 5.246 5.29 256
mobilenetv4_conv_medium.e500_r224_in1k 79.094 20.906 94.77 5.23 9.72 224
efficientnet_b0.ra4_e3600_r224_in1k 78.584 21.416 94.338 5.662 5.29 224
mobilenetv1_100h.ra4_e3600_r224_in1k 76.596 23.404 93.272 6.728 5.28 256
mobilenetv1_100.ra4_e3600_r224_in1k 76.094 23.906 93.004 6.996 4.23 256
mobilenetv1_100h.ra4_e3600_r224_in1k 75.662 24.338 92.504 7.496 5.28 224
mobilenetv1_100.ra4_e3600_r224_in1k 75.382 24.618 92.312 7.688 4.23 224
mobilenetv4_conv_small.e2400_r224_in1k 74.616 25.384 92.072 7.928 3.77 256
mobilenetv4_conv_small.e1200_r224_in1k 74.292 25.708 92.116 7.884 3.77 256
mobilenetv4_conv_small.e2400_r224_in1k 73.756 26.244 91.422 8.578 3.77 224
mobilenetv4_conv_small.e1200_r224_in1k 73.454 26.546 91.34 8.66 3.77 224
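
Several models appear twice in the table because they were evaluated at two image sizes. A small sketch of reading the table this way, using the two rows for this card's model; the numbers are copied from the table above, and the dictionary layout is just an illustrative choice.

```python
# Rows for mobilenetv4_hybrid_medium.e200_r256_in12k_ft_in1k,
# copied from the comparison table above.
rows = [
    {"img_size": 320, "top1": 82.99, "top1_err": 17.01},
    {"img_size": 256, "top1": 82.364, "top1_err": 17.636},
]

# Each error column is the complement of its accuracy column.
for row in rows:
    assert abs(row["top1"] + row["top1_err"] - 100.0) < 1e-6

# Difference in top-1 accuracy between the two evaluation resolutions.
by_size = {row["img_size"]: row["top1"] for row in rows}
gain = by_size[320] - by_size[256]
print(f"top-1 gain from 256 px to 320 px evaluation: {gain:.3f} points")
```

The same comparison applies to any other model that is listed at two values of img_size.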

Citation

@article{qin2024mobilenetv4,
  title={MobileNetV4-Universal Models for the Mobile Ecosystem},
  author={Qin, Danfeng and Leichner, Chas and Delakis, Manolis and Fornoni, Marco and Luo, Shixin and Yang, Fan and Wang, Weijun and Banbury, Colby and Ye, Chengxi and Akin, Berkin and others},
  journal={arXiv preprint arXiv:2404.10518},
  year={2024}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
}