license: apache-2.0
datasets:
  - fashion_mnist
language:
  - en
metrics:
  - accuracy
pipeline_tag: image-classification

Fashion-MNIST Baseline Classifier

Model Details

  • Model Name: fashion-mnist-base
  • Framework: Custom implementation in Python
  • Version: 0.1
  • License: Apache-2.0

Model Description

This is a neural network built from the ground up to classify images from the Fashion-MNIST dataset. The dataset comprises 70,000 grayscale images of 28x28 pixels, each labeled with one of 10 classes: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.

Intended Use

This model is intended for educational purposes and as a baseline for more complex implementations. It can be used by students and AI enthusiasts to understand the workings of neural networks and their application in image classification.

Training Data

The model was trained on the Fashion-MNIST dataset, which contains 60,000 training images and 10,000 test images. Each image is a 28x28 grayscale picture labeled with one of 10 classes of clothing and accessories.

Architecture Details:

  • Input layer: 784 neurons (flattened 28x28 image)
  • Hidden layer 1: 256 neurons, ReLU activation, Dropout
  • Hidden layer 2: 64 neurons, ReLU activation, Dropout
  • Output layer: 10 neurons, logits
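The layer list above can be sketched as a plain-Python forward pass. This is only an illustration, not the author's implementation: the dropout rate, the weight initialisation, the presence of bias terms, and all function names are assumptions that the README does not specify.

```python
import random

LAYER_SIZES = [784, 256, 64, 10]  # taken from the architecture list
DROPOUT_RATE = 0.2                # assumption: the README does not state the rate


def init_layer(n_in, n_out, rng):
    """Create one dense layer (He-style init and zero biases are assumptions)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one dot product plus bias per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def dropout(x, rate, rng, training):
    """Inverted dropout: active only in training, rescales the kept units."""
    if not training or rate == 0.0:
        return x
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def forward(x, params, rng, training=False):
    """784 -> 256 -> 64 -> 10; the output is raw logits (no softmax)."""
    *hidden, output = params
    for weights, biases in hidden:
        x = dropout(relu(dense(x, weights, biases)), DROPOUT_RATE, rng, training)
    weights, biases = output
    return dense(x, weights, biases)


rng = random.Random(0)
params = [init_layer(i, o, rng) for i, o in zip(LAYER_SIZES, LAYER_SIZES[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in params)
logits = forward([0.5] * 784, params, rng)  # one flattened 28x28 image
print(n_params, len(logits))
```

With bias terms included, a network of this shape has 218,058 trainable parameters, most of them in the first hidden layer.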

Hyperparameters:

  • Learning rate: 0.005
  • Batch size: 32
  • Epochs: 25

The model uses a self-implemented stochastic gradient descent (SGD) optimizer.
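As a rough sketch of what such an optimizer involves, the snippet below pairs shuffled mini-batches with a plain SGD update, using the hyperparameters listed above. The function names are hypothetical, and the absence of momentum or weight decay is an assumption; the README does not describe the optimizer beyond calling it SGD.

```python
import random

LEARNING_RATE = 0.005
BATCH_SIZE = 32
EPOCHS = 25
N_TRAIN = 60_000  # size of the Fashion-MNIST training split


def sgd_step(params, grads, lr=LEARNING_RATE):
    """Plain SGD update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


def minibatches(n_samples, batch_size, rng):
    """Yield shuffled index batches that cover the data once (one epoch)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]


steps_per_epoch = sum(1 for _ in minibatches(N_TRAIN, BATCH_SIZE, random.Random(0)))
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)

# Toy update on two parameters with made-up gradients
print(sgd_step([1.0, -2.0], [10.0, -4.0]))
```

With these settings each epoch performs 1,875 parameter updates, or 46,875 over the full 25 epochs.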

Evaluation Results

The model achieved the following performance on the test set:

  • Accuracy: 86.7%
  • Precision, Recall, and F1-Score:
| Label | Precision | Recall | F1-score |
|---|---|---|---|
| T-shirt/Top | 0.847514 | 0.767 | 0.805249 |
| Trouser | 0.982618 | 0.961 | 0.971689 |
| Pullover | 0.800000 | 0.748 | 0.773127 |
| Dress | 0.861868 | 0.886 | 0.873767 |
| Coat | 0.776278 | 0.805 | 0.790378 |
| Sandal | 0.957958 | 0.957 | 0.957479 |
| Shirt | 0.638587 | 0.705 | 0.670152 |
| Sneaker | 0.935743 | 0.932 | 0.933868 |
| Bag | 0.952381 | 0.960 | 0.956175 |
| Ankle-Boot | 0.944554 | 0.954 | 0.949254 |
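For readers new to these metrics, the F1-score is the harmonic mean of precision and recall. The short check below recomputes it from the reported precision and recall values; the helper name `f1_score` is ours. It also averages the per-class recalls, which should match overall accuracy because the Fashion-MNIST test split holds 1,000 images per class.

```python
# (label, precision, recall, reported F1) copied from the results above
REPORTED = [
    ("T-shirt/Top", 0.847514, 0.767, 0.805249),
    ("Trouser", 0.982618, 0.961, 0.971689),
    ("Pullover", 0.800000, 0.748, 0.773127),
    ("Dress", 0.861868, 0.886, 0.873767),
    ("Coat", 0.776278, 0.805, 0.790378),
    ("Sandal", 0.957958, 0.957, 0.957479),
    ("Shirt", 0.638587, 0.705, 0.670152),
    ("Sneaker", 0.935743, 0.932, 0.933868),
    ("Bag", 0.952381, 0.960, 0.956175),
    ("Ankle-Boot", 0.944554, 0.954, 0.949254),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Largest difference between the recomputed and the reported F1 values
max_gap = max(abs(f1_score(p, r) - f1) for _, p, r, f1 in REPORTED)

# With equally sized classes, mean recall equals overall accuracy
macro_recall = sum(r for _, _, r, _ in REPORTED) / len(REPORTED)

print(f"max F1 gap: {max_gap:.1e}, mean recall: {macro_recall:.4f}")
```

The mean recall comes out at 0.8675, in line with the reported accuracy of 86.7%. Shirt is the weakest class on every metric.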

Limitations and Biases

Due to the nature of the training dataset, the model may not capture the full complexity of fashion items in diverse real-world scenarios. In practice, we found that it is sensitive to the background color of the image and to the proportions of the article within it.

How to Use

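import torch
import torchvision.transforms as transforms
from PIL import Image

# The checkpoint is a pickled model object, so the class that defines it must
# be importable. PyTorch 2.6 and later default to weights_only=True, which
# rejects such files; only pass weights_only=False for checkpoints you trust.
model = torch.load('fashion-mnist-base.pt', weights_only=False)

# Images need to be transformed to the Fashion-MNIST dataset format
transform = transforms.Compose(
    [
        transforms.Resize((28, 28)),
        transforms.Grayscale(),
        transforms.ToTensor(),                        # floats in [0, 1], shape (1, 28, 28)
        transforms.Normalize((0.5,), (0.5,)),         # Normalization: (x - 0.5) / 0.5
        transforms.Lambda(lambda x: 1.0 - x),         # Invert colors
        transforms.Lambda(lambda x: x[0]),            # drop the channel axis -> (28, 28)
        transforms.Lambda(lambda x: x.unsqueeze(0)),  # restore a leading axis -> (1, 28, 28)
    ]
)

img = Image.open('fashion/dress.png')
img = transform(img)
model.predictions(img)  # returns per-class scores, see Sample Output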

Sample Output

{'Dress': 84.437744,
 'Coat': 7.631796,
 'Pullover': 4.2272186,
 'Shirt': 1.297625,
 'T-shirt/Top': 1.2237197,
 'Bag': 0.9053432,
 'Trouser/Jeans': 0.27268794,
 'Sneaker': 0.0031491981,
 'Ankle-Boot': 0.00063403655,
 'Sandal': 8.5103806e-05}
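The scores in the sample output sum to roughly 100, which suggests that `predictions` applies a softmax to the output logits and reports percentages. The sketch below shows how that conversion could work; it is an assumption about the behaviour, not the model's actual code, and the logits and the helper names `softmax_percent` and `rank` are made up for illustration.

```python
import math

# Label names as they appear in the sample output, in Fashion-MNIST index order
LABELS = ["T-shirt/Top", "Trouser/Jeans", "Pullover", "Dress", "Coat",
          "Sandal", "Shirt", "Sneaker", "Bag", "Ankle-Boot"]


def softmax_percent(logits):
    """Numerically stable softmax, scaled so the scores sum to 100."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [100.0 * e / total for e in exps]


def rank(logits, labels=LABELS):
    """Map labels to percentage scores, highest score first."""
    scores = softmax_percent(logits)
    pairs = sorted(zip(labels, scores), key=lambda kv: kv[1], reverse=True)
    return dict(pairs)


# Hypothetical logits for a single image, not taken from the real model
example_logits = [1.0, -0.5, 2.2, 5.2, 2.8, -8.0, 1.1, -5.0, 0.7, -6.5]
ranking = rank(example_logits)
print(ranking)
```

Sorting by score puts the most likely class first, matching the layout of the sample output.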