metadata

tags:
  - vision
  - image-segmentation
datasets:
  - segments/sidewalk-semantic
finetuned_from:
  - nvidia/mit-b5
widget:
  - src: >-
      https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/admin-tobias/439f6843-80c5-47ce-9b17-0b2a1d54dbeb.jpg
    example_title: Brugge

SegFormer (b5-sized) model fine-tuned on sidewalk-semantic dataset.

SegFormer model fine-tuned on the SegmentsAI sidewalk-semantic dataset. The SegFormer architecture was introduced in the paper SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Xie et al. and first released in this repository.

Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the full model is fine-tuned on a downstream dataset.
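
To make the "hierarchical" part concrete, the sketch below computes the feature-map shapes that each encoder stage would produce for a given input size. The strides and channel widths are assumptions taken from the MiT-b5 configuration described in the SegFormer paper, not values read from this checkpoint, and the helper names are hypothetical. Integer division is only exact when the input size is divisible by 32.

```python
# Assumed MiT-b5 values from the SegFormer paper; check model.config
# (e.g. hidden_sizes) on the actual checkpoint before relying on them.
STRIDES = (4, 8, 16, 32)            # cumulative downsampling per encoder stage
HIDDEN_SIZES = (64, 128, 320, 512)  # channel width per encoder stage


def feature_map_shapes(height, width):
    """Return (channels, h, w) for each encoder stage, given the input size."""
    return [(c, height // s, width // s) for c, s in zip(HIDDEN_SIZES, STRIDES)]


def decode_head_output_shape(height, width, num_labels):
    """The all-MLP decode head fuses every stage at the stride-4 resolution."""
    return (num_labels, height // STRIDES[0], width // STRIDES[0])


shapes = feature_map_shapes(512, 512)
for stage, shape in enumerate(shapes, start=1):
    print(f"stage {stage}: {shape}")
print("logits:", decode_head_output_shape(512, 512, num_labels=35))
```

The last line shows why the raw logits are smaller than the input image: they come out at one quarter of the input height and width and usually need to be upsampled before use.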

Code and Notebook

Here is how to use this model to segment an image from the sidewalk dataset:

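from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image
import requests
import torch

url = "https://segmentsai-prod.s3.eu-west-2.amazonaws.com/assets/admin-tobias/439f6843-80c5-47ce-9b17-0b2a1d54dbeb.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = SegformerFeatureExtractor.from_pretrained("zoheb/mit-b5-finetuned-sidewalk-semantic")
model = SegformerForSemanticSegmentation.from_pretrained("zoheb/mit-b5-finetuned-sidewalk-semantic")

inputs = feature_extractor(images=image, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)

# upsample the logits to the original image size (PIL reports size as (width, height))
upsampled_logits = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)

# model predicts one of the 35 Sidewalk classes for every pixel
predicted_map = upsampled_logits.argmax(dim=1)[0]  # shape (height, width)
predicted_ids = predicted_map.unique().tolist()
print("Predicted classes:", [model.config.id2label[i] for i in predicted_ids])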
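
A semantic segmentation head returns one score per class per pixel, so a common follow-up step is to reduce the logits to a label map and summarize how much of the image each class covers. The snippet below is a minimal NumPy sketch of that step. The `summarize_segmentation` helper and the three toy class names are hypothetical stand-ins, not part of `transformers` or of this model's label set; with real model output you would pass a single image's logits (for example `logits[0].detach().cpu().numpy()`) together with `model.config.id2label`.

```python
import numpy as np


def summarize_segmentation(logits, id2label):
    """Return {label: fraction of pixels} from logits shaped (num_labels, H, W)."""
    label_map = logits.argmax(axis=0)  # per-pixel class index, shape (H, W)
    ids, counts = np.unique(label_map, return_counts=True)
    total = label_map.size
    return {id2label[int(i)]: int(c) / total for i, c in zip(ids, counts)}


# Toy stand-in for model output: 3 hypothetical classes on a 2x2 image.
id2label = {0: "background", 1: "road", 2: "sidewalk"}
logits = np.zeros((3, 2, 2))
logits[1, 0, :] = 5.0  # top row scores highest for "road"
logits[2, 1, 0] = 5.0  # bottom-left pixel scores highest for "sidewalk"

coverage = summarize_segmentation(logits, id2label)
print(coverage)
```

Classes that never win the per-pixel argmax do not appear in the result, which keeps the summary short for a label set as large as this one.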

You can go through its detailed notebook here.

For more code examples, refer to the documentation.

License

The license for this model can be found here.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2105-15203,
  author    = {Enze Xie and
               Wenhai Wang and
               Zhiding Yu and
               Anima Anandkumar and
               Jose M. Alvarez and
               Ping Luo},
  title     = {SegFormer: Simple and Efficient Design for Semantic Segmentation with
               Transformers},
  journal   = {CoRR},
  volume    = {abs/2105.15203},
  year      = {2021},
  url       = {https://arxiv.org/abs/2105.15203},
  eprinttype = {arXiv},
  eprint    = {2105.15203},
  timestamp = {Wed, 02 Jun 2021 11:46:42 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2105-15203.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}