metadata
tags:
- image-classification
- timm
library_name: timm
license: apache-2.0
Model card for kat_base_patch16_224.vitft
A KAT model trained on ImageNet-1k (about 1.28 million training images, 1,000 classes) at a resolution of 224x224. It was introduced in the paper Kolmogorov–Arnold Transformer.
Model description
KAT is a model that replaces the channel mixer in transformers with a Group-Rational Kolmogorov–Arnold Network (GR-KAN).
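To make the idea concrete, here is a minimal NumPy sketch of a group-wise rational activation, the kind of nonlinearity the GR-KAN name refers to. It assumes a "safe" Padé-style form, F(x) = P(x) / (1 + |Q(x)|), with one set of coefficients shared by all channels in a group. The function name `group_rational`, the coefficient layout, and the toy degrees are illustrative assumptions, not the code in katransformer.py.

```python
import numpy as np

def group_rational(x, a, b):
    """Apply one rational function per channel group (illustrative sketch).

    x: (..., channels) activations
    a: (groups, m + 1) numerator coefficients, lowest degree first
    b: (groups, n) denominator coefficients for x^1 .. x^n
    """
    groups = a.shape[0]
    channels = x.shape[-1]
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    # Split the channel axis into (groups, channels_per_group).
    xg = x.reshape(*x.shape[:-1], groups, channels // groups)
    # Numerator P(x) = sum_i a[g, i] * x^i, with coefficients shared per group.
    num = sum(a[:, i, None] * xg**i for i in range(a.shape[1]))
    # Denominator 1 + |sum_j b[g, j] * x^(j+1)| is always >= 1, so there are no poles.
    den = 1.0 + np.abs(sum(b[:, j, None] * xg**(j + 1) for j in range(b.shape[1])))
    return (num / den).reshape(x.shape)

# Toy example: one token with 4 channels split into 2 groups.
x = np.array([[-1.0, 0.0, 1.0, 2.0]])
a = np.array([[0.0, 1.0], [1.0, 0.0]])  # group 0: P(x) = x, group 1: P(x) = 1
b = np.array([[0.0], [1.0]])            # group 0: Q(x) = 0, group 1: Q(x) = x
y = group_rational(x, a, b)
print(y)
```

In this toy setup the first two channels pass through unchanged (x / 1) and the last two are mapped to 1 / (1 + |x|). In a trained model the coefficients would be learned parameters, and sharing them within a group keeps the parameter count small.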
Usage
The model definition is in katransformer.py at https://github.com/Adamdad/kat.
from urllib.request import urlopen
from PIL import Image
import timm
import torch
import katransformer  # local katransformer.py from the repository above; needed so timm can build the KAT architecture
img = Image.open(urlopen(
'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
# Use CUDA if available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = timm.create_model("hf_hub:adamdad/kat_base_patch16_224.vitft", pretrained=True)
model = model.to(device)
model = model.eval()
# get model specific transforms (normalization, resize)
data_config = timm.data.resolve_model_data_config(model)
transforms = timm.data.create_transform(**data_config, is_training=False)
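with torch.no_grad():  # inference only, so skip gradient tracking
    output = model(transforms(img).unsqueeze(0).to(device))  # unsqueeze single image into batch of 1
# softmax gives probabilities; multiplying by 100 expresses them as percentages
top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)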
print(top5_probabilities)
print(top5_class_indices)
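The two printed tensors are easier to read when each class index is paired with its percentage. The helper below is a small sketch of that step; `pair_top_k` is a hypothetical name, and the numbers in the example are stand-in values, not real model output.

```python
def pair_top_k(probabilities, class_indices):
    """Pair each class index with its percentage, keeping the given order."""
    return [(int(idx), round(float(p), 2))
            for idx, p in zip(class_indices, probabilities)]

# Stand-in values for illustration only, not real model output.
example_probs = [62.5, 20.25, 9.0, 5.0, 3.25]
example_indices = [3, 17, 42, 7, 99]
for idx, p in pair_top_k(example_probs, example_indices):
    print(f"class {idx}: {p:.2f}%")
```

With the tensors from the snippet above, the call would be `pair_top_k(top5_probabilities[0].tolist(), top5_class_indices[0].tolist())`, where `[0]` selects the single image in the batch. The indices are positions among the 1,000 ImageNet-1k classes.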
BibTeX
@misc{yang2024compositional,
title={Kolmogorov–Arnold Transformer},
author={Xingyi Yang and Xinchao Wang},
year={2024},
eprint={XXXX},
archivePrefix={arXiv},
primaryClass={cs.CV}
}