library_name: transformers
license: other
base_model: nvidia/mit-b0
tags:
- generated_from_trainer
datasets:
- scene_parse_150
model-index:
- name: segformer-b0-scene-parse-150
results: []
metrics:
- mean_iou
pipeline_tag: image-segmentation
Segformer-b0-scene-parse-150
This model is a fine-tuned version of nvidia/mit-b0, trained on the scene_parse_150 dataset. It performs semantic segmentation for scene parsing tasks.
Evaluation Results:
The model achieved the following results on the evaluation dataset:
- Loss: 1.8435
- Mean IoU: 0.0881
- Mean Accuracy: 0.1619
- Overall Accuracy: 0.6663
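Per-category IoU and per-category accuracy values are available but sparse, which indicates that performance varies considerably across categories. The gap between overall accuracy (0.6663) and mean IoU (0.0881) is consistent with this: a few frequent categories are likely segmented reasonably well while many others are not.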
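To make the relationship between these metrics concrete, here is a minimal sketch of how mean IoU, mean accuracy, and overall accuracy can be derived from a confusion matrix. It follows the usual textbook definitions and is not necessarily identical to the implementation behind the numbers above. The function name `segmentation_metrics` and the 3-class toy matrix are illustrative assumptions, not data from this model.

```python
def segmentation_metrics(confusion):
    """Compute summary metrics from a square confusion matrix.

    confusion[i][j] = number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))

    ious, accs = [], []
    for i in range(n):
        tp = confusion[i][i]
        gt = sum(confusion[i])                          # pixels labeled i
        pred = sum(confusion[r][i] for r in range(n))   # pixels predicted as i
        union = gt + pred - tp
        if union > 0:      # skip classes that never occur at all
            ious.append(tp / union)
        if gt > 0:         # accuracy is undefined without ground-truth pixels
            accs.append(tp / gt)

    return {
        "mean_iou": sum(ious) / len(ious),
        "mean_accuracy": sum(accs) / len(accs),
        "overall_accuracy": correct / total,
    }


# Toy example (made-up numbers): one dominant class and two rare classes.
toy_confusion = [
    [90, 5, 5],
    [8, 1, 1],
    [9, 0, 1],
]
metrics = segmentation_metrics(toy_confusion)
print(metrics)
```

In the toy matrix the dominant class is predicted well, so overall accuracy stays high, while the two rare classes pull mean IoU and mean accuracy down. The same mechanism may explain part of the gap reported above.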
Model Description
SegFormer-b0 pairs a hierarchical Transformer encoder (Mix Transformer, MiT-b0) with a lightweight all-MLP decoder. The encoder produces multi-scale features, which the decoder fuses into a segmentation map.
More detailed model descriptions, including architectural adjustments or preprocessing requirements, are needed.
Intended Uses & Limitations
- Use Cases: Suitable for scene parsing and segmentation tasks in environments with diverse visual categories.
- Limitations: Performance varies significantly between categories, as the sparse per-category accuracy and IoU metrics show. The model may struggle with underrepresented classes and with categories that are visually hard to distinguish.
- Further details on intended domains and limitations are needed.
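For the use cases listed above, inference might look like the following sketch. It assumes the checkpoint is published on the Hugging Face Hub together with an image processor configuration. The function name `segment_image` and its `model_id` argument are hypothetical, and the full repository id is not stated in this card, so it has to be supplied by the caller.

```python
def segment_image(image_path, model_id):
    """Return an (H, W) tensor of predicted class ids for one image.

    model_id is the Hub repository id or local path of the checkpoint
    (an assumption: this card does not give the full repository id).
    """
    # Imports live inside the function so the heavy dependencies are
    # only required when the function is actually called.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, SegformerForSemanticSegmentation

    processor = AutoImageProcessor.from_pretrained(model_id)
    model = SegformerForSemanticSegmentation.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # The logits are lower resolution than the input image. The processor
    # upsamples them to the requested size and takes the per-pixel argmax.
    # target_sizes expects (height, width); PIL reports (width, height).
    segmentation = processor.post_process_semantic_segmentation(
        outputs, target_sizes=[image.size[::-1]]
    )[0]
    return segmentation


# Example call (placeholder path and repository id):
# mask = segment_image("scene.jpg", "<namespace>/segformer-b0-scene-parse-150")
```

Given the low mean IoU reported above, the predicted masks should be inspected visually before being relied on.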
Training and Evaluation Data
The model was trained on the scene_parse_150 dataset, which contains diverse visual scenes annotated with 150 semantic categories. Further information on dataset specifics and any preprocessing steps is needed.
Training Procedure
Hyperparameters:
- Learning Rate: 6e-05
- Training Batch Size: 2
- Evaluation Batch Size: 2
- Seed: 42
- Optimizer: Adam (betas=(0.9, 0.999), epsilon=1e-08)
- Learning Rate Scheduler: Linear
- Number of Epochs: 50
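As an illustration of the linear learning rate scheduler listed above, the sketch below decays the rate from 6e-05 to zero over training. It assumes no warmup phase, which this card does not specify, and the total step count is a hypothetical value, since the number of training samples is not stated. `linear_lr` is an illustrative helper, not a library function.

```python
def linear_lr(step, total_steps, base_lr=6e-05, warmup_steps=0):
    """Learning rate at a given optimizer step under linear decay.

    warmup_steps=0 is an assumption; the card does not mention warmup.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run length: steps = epochs * ceil(num_samples / batch_size).
total_steps = 1000
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps))
```

With a batch size of 2 and 50 epochs, the real run takes 25 optimizer steps per training sample, so the per-step decrease of the learning rate is small.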
Training Results:
The model was trained for 50 epochs. Details on convergence behavior, training duration, and hardware environment are not yet documented.
Framework Versions:
- Transformers 4.44.2
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1