## MiVOLO: Multi-input Transformer for Age and Gender Estimation

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mivolo-multi-input-transformer-for-age-and/age-estimation-on-utkface)](https://paperswithcode.com/sota/age-estimation-on-utkface?p=mivolo-multi-input-transformer-for-age-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/age-estimation-on-imdb-clean)](https://paperswithcode.com/sota/age-estimation-on-imdb-clean?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/facial-attribute-classification-on-fairface)](https://paperswithcode.com/sota/facial-attribute-classification-on-fairface?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/age-and-gender-classification-on-adience)](https://paperswithcode.com/sota/age-and-gender-classification-on-adience?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/age-and-gender-classification-on-adience-age)](https://paperswithcode.com/sota/age-and-gender-classification-on-adience-age?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/age-and-gender-estimation-on-lagenda-age)](https://paperswithcode.com/sota/age-and-gender-estimation-on-lagenda-age?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/gender-prediction-on-lagenda)](https://paperswithcode.com/sota/gender-prediction-on-lagenda?p=beyond-specialization-assessing-the-1)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mivolo-multi-input-transformer-for-age-and/age-estimation-on-agedb)](https://paperswithcode.com/sota/age-estimation-on-agedb?p=mivolo-multi-input-transformer-for-age-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mivolo-multi-input-transformer-for-age-and/gender-prediction-on-agedb)](https://paperswithcode.com/sota/gender-prediction-on-agedb?p=mivolo-multi-input-transformer-for-age-and)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/beyond-specialization-assessing-the-1/age-estimation-on-cacd)](https://paperswithcode.com/sota/age-estimation-on-cacd?p=beyond-specialization-assessing-the-1)

> [**MiVOLO: Multi-input Transformer for Age and Gender Estimation**](https://arxiv.org/abs/2307.04616),
> Maksim Kuprashevich, Irina Tolstykh,
> *2023 [arXiv 2307.04616](https://arxiv.org/abs/2307.04616)*

> [**Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation**](https://arxiv.org/abs/2403.02302),
> Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh,
> *2024 [arXiv 2403.02302](https://arxiv.org/abs/2403.02302)*

[[`Paper 2023`](https://arxiv.org/abs/2307.04616)] [[`Paper 2024`](https://arxiv.org/abs/2403.02302)] [[`Demo`](https://huggingface.co/spaces/iitolstykh/age_gender_estimation_demo)] [[`Telegram Bot`](https://t.me/AnyAgeBot)] [[`BibTex`](#citing)] [[`Data`](https://wildchlamydia.github.io/lagenda/)]

## MiVOLO pretrained models

Gender & Age recognition performance.
| Model | Type | Dataset (train and test) | Age MAE | Age CS@5 | Gender Accuracy | download |
| ----- | ---- | ------------------------ | ------- | -------- | --------------- | -------- |
| volo_d1 | face_only, age | IMDB-cleaned | 4.29 | 67.71 | - | checkpoint |
| volo_d1 | face_only, age, gender | IMDB-cleaned | 4.22 | 68.68 | 99.38 | checkpoint |
| mivolo_d1 | face_body, age, gender | IMDB-cleaned | 4.24 [face+body]<br>6.87 [body] | 68.32 [face+body]<br>46.32 [body] | 99.46 [face+body]<br>96.48 [body] | model_imdb_cross_person_4.24_99.46.pth.tar |
| volo_d1 | face_only, age | UTKFace | 4.23 | 69.72 | - | checkpoint |
| volo_d1 | face_only, age, gender | UTKFace | 4.23 | 69.78 | 97.69 | checkpoint |
| mivolo_d1 | face_body, age, gender | Lagenda | 3.99 [face+body] | 71.27 [face+body] | 97.36 [face+body] | demo |
| mivolov2_d1_384x384 | face_body, age, gender | Lagenda | 3.65 [face+body] | 74.48 [face+body] | 97.99 [face+body] | telegram bot |
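
In the tables here and below, Age MAE is the mean absolute error in years and Age CS@5 is the cumulative score at a 5-year threshold, i.e. the percentage of predictions that fall within 5 years of the label. A minimal NumPy sketch of these two metrics, independent of the repository's own evaluation code (the function names and toy numbers are illustrative):

```python
import numpy as np

def age_mae(pred_ages: np.ndarray, true_ages: np.ndarray) -> float:
    """Mean absolute error between predicted and ground-truth ages, in years."""
    return float(np.mean(np.abs(pred_ages - true_ages)))

def age_cs(pred_ages: np.ndarray, true_ages: np.ndarray, threshold: int = 5) -> float:
    """Cumulative score CS@threshold: percentage of predictions within `threshold` years."""
    return float(np.mean(np.abs(pred_ages - true_ages) <= threshold) * 100.0)

# Toy example with made-up predictions, not results from any checkpoint:
preds = np.array([23.4, 31.0, 67.2, 45.5])
labels = np.array([25.0, 30.0, 60.0, 44.0])
print(f"MAE: {age_mae(preds, labels):.2f}, CS@5: {age_cs(preds, labels):.2f}%")
```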
## MiVOLO regression benchmarks

Gender & Age recognition performance. Use [valid_age_gender.sh](scripts/valid_age_gender.sh) to reproduce these results with our checkpoints.
| Model | Type | Train Dataset | Test Dataset | Age MAE | Age CS@5 | Gender Accuracy | download |
| ----- | ---- | ------------- | ------------ | ------- | -------- | --------------- | -------- |
| mivolo_d1 | face_body, age, gender | Lagenda | AgeDB | 5.55 [face] | 55.08 [face] | 98.3 [face] | demo |
| mivolo_d1 | face_body, age, gender | IMDB-cleaned | AgeDB | 5.58 [face] | 55.54 [face] | 97.93 [face] | model_imdb_cross_person_4.24_99.46.pth.tar |
## MiVOLO classification benchmarks

Gender & Age recognition performance.
| Model | Type | Train Dataset | Test Dataset | Age Accuracy | Gender Accuracy |
| ----- | ---- | ------------- | ------------ | ------------ | --------------- |
| mivolo_d1 | face_body, age, gender | Lagenda | FairFace | 61.07 [face+body] | 95.73 [face+body] |
| mivolo_d1 | face_body, age, gender | Lagenda | Adience | 68.69 [face] | 96.51 [face] |
| mivolov2_d1_384 | face_body, age, gender | Lagenda | Adience | 69.43 [face] | 97.39 [face] |
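
MiVOLO predicts age as a continuous value, so reporting Age Accuracy on classification benchmarks requires assigning each prediction to one of the benchmark's age intervals (8 classes for Adience, 9 for FairFace, listed in the Dataset section below). The sketch below shows one plausible assignment by distance to each interval; the repository's own evaluation code may differ, and `age_to_class`, the 70+ upper bound, and the sample value are illustrative:

```python
# Age intervals as listed in the Dataset section below; the 120 upper bound
# for FairFace's open-ended "70+" class is an assumption for this sketch.
ADIENCE_CLASSES = [(0, 2), (4, 6), (8, 12), (15, 20), (25, 32), (38, 43), (48, 53), (60, 100)]
FAIRFACE_CLASSES = [(0, 2), (3, 9), (10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 120)]

def age_to_class(pred_age: float, classes: list[tuple[int, int]]) -> int:
    """Index of the interval containing the prediction, or the closest one otherwise."""
    def distance(interval: tuple[int, int]) -> float:
        low, high = interval
        if low <= pred_age <= high:
            return 0.0
        return min(abs(pred_age - low), abs(pred_age - high))
    return min(range(len(classes)), key=lambda i: distance(classes[i]))

print(age_to_class(21.0, ADIENCE_CLASSES))  # 3, i.e. the 15-20 bucket
```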
## Dataset

**Please, [cite our papers](#citing) if you use any of this data!**

- Lagenda dataset: [images](https://drive.google.com/file/d/1QXO0NlkABPZT6x1_0Uc2i6KAtdcrpTbG/view?usp=sharing) and [annotation](https://drive.google.com/file/d/1mNYjYFb3MuKg-OL1UISoYsKObMUllbJx/view?usp=sharing).
- IMDB-clean: follow [these instructions](https://github.com/yiminglin-ai/imdb-clean) to get images and [download](https://drive.google.com/file/d/17uEqyU3uQ5trWZ5vRJKzh41yeuDe5hyL/view?usp=sharing) our annotations.
- UTK dataset: [original full images](https://susanqq.github.io/UTKFace/) and our annotations: [split from the article](https://drive.google.com/file/d/1Fo1vPWrKtC5bPtnnVWNTdD4ZTKRXL9kv/view?usp=sharing), [random full split](https://drive.google.com/file/d/177AV631C3SIfi5nrmZA8CEihIt29cznJ/view?usp=sharing).
- Adience dataset: follow [these instructions](https://talhassner.github.io/home/projects/Adience/Adience-data.html) to get images and [download](https://drive.google.com/file/d/1wS1Q4FpksxnCR88A1tGLsLIr91xHwcVv/view?usp=sharing) our annotations.
<details>
<summary>Click to expand!</summary>

After downloading them, your `data` directory should look something like this:

```console
data
└── Adience
    ├── annotations (folder with our annotations)
    ├── aligned (will not be used)
    ├── faces
    ├── fold_0_data.txt
    ├── fold_1_data.txt
    ├── fold_2_data.txt
    ├── fold_3_data.txt
    └── fold_4_data.txt
```

We use coarsely aligned images from the `faces/` dir. Using our detector, we found a face bbox for each image (see [tools/prepare_adience.py](tools/prepare_adience.py)).

This dataset has five folds; the performance metric is accuracy on five-fold cross-validation (a small aggregation sketch follows this block).

| images before removal | fold 0 | fold 1 | fold 2 | fold 3 | fold 4 |
| --------------------- | ------ | ------ | ------ | ------ | ------ |
| 19,370                | 4,484  | 3,730  | 3,894  | 3,446  | 3,816  |

Incomplete data

| only age not found | only gender not found | SUM           |
| ------------------ | --------------------- | ------------- |
| 40                 | 1,170                 | 1,210 (6.2 %) |

Removed data

| failed to process image | age and gender not found | SUM         |
| ----------------------- | ------------------------ | ----------- |
| 0                       | 708                      | 708 (3.6 %) |

Genders

| female | male  |
| ------ | ----- |
| 9,372  | 8,120 |

Ages (8 classes) after mapping to non-intersecting age intervals

| 0-2   | 4-6   | 8-12  | 15-20 | 25-32 | 38-43 | 48-53 | 60-100 |
| ----- | ----- | ----- | ----- | ----- | ----- | ----- | ------ |
| 2,509 | 2,140 | 2,293 | 1,791 | 5,589 | 2,490 | 909   | 901    |

</details>
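
As noted in the Adience details above, the reported number is accuracy aggregated over the five folds. A small sketch of that aggregation, showing both a plain and a fold-size-weighted mean (the per-fold accuracies are placeholders, not results from this repository; whether the reported numbers use a plain or weighted average is determined by the evaluation code):

```python
# Fold sizes from the "images before removal" table above.
fold_sizes = [4484, 3730, 3894, 3446, 3816]
# Placeholder per-fold accuracies; substitute your own evaluation results.
fold_accuracies = [0.69, 0.68, 0.70, 0.67, 0.69]

plain_mean = sum(fold_accuracies) / len(fold_accuracies)
weighted_mean = sum(a * n for a, n in zip(fold_accuracies, fold_sizes)) / sum(fold_sizes)
print(f"plain mean: {plain_mean:.4f}, size-weighted mean: {weighted_mean:.4f}")
```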
- FairFace dataset: follow [these instructions](https://github.com/joojs/fairface) to get images and [download](https://drive.google.com/file/d/1EdY30A1SQmox96Y39VhBxdgALYhbkzdm/view?usp=drive_link) our annotations.
<details>
<summary>Click to expand!</summary>

After downloading them, your `data` directory should look something like this:

```console
data
└── FairFace
    ├── annotations (folder with our annotations)
    ├── fairface-img-margin025-trainval (will not be used)
    │   ├── train
    │   └── val
    ├── fairface-img-margin125-trainval
    │   ├── train
    │   └── val
    ├── fairface_label_train.csv
    └── fairface_label_val.csv
```

We use aligned images from the `fairface-img-margin125-trainval/` dir. Using our detector, we found a face bbox for each image and added a person bbox where possible (see [tools/prepare_fairface.py](tools/prepare_fairface.py)).

This dataset has two splits: train and val. The performance metric is accuracy on the validation split.

| images train | images val |
| ------------ | ---------- |
| 86,744       | 10,954     |

Genders for **validation**

| female | male  |
| ------ | ----- |
| 5,162  | 5,792 |

Ages for **validation** (9 classes)

| 0-2 | 3-9   | 10-19 | 20-29 | 30-39 | 40-49 | 50-59 | 60-69 | 70+ |
| --- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | --- |
| 199 | 1,356 | 1,181 | 3,300 | 2,330 | 1,353 | 796   | 321   | 118 |

</details>
- AgeDB dataset: follow [these instructions](https://ibug.doc.ic.ac.uk/resources/agedb/) to get images and [download](https://drive.google.com/file/d/1Dp72BUlAsyUKeSoyE_DOsFRS1x6ZBJen/view) our annotations.
<details>
<summary>Click to expand!</summary>

**Ages**: 1 - 101

**Genders**: 9,788 faces of `M`, 6,700 faces of `F`

Number of images per split:

| images 0 | images 1 | images 2 | images 3 | images 4 | images 5 | images 6 | images 7 | images 8 | images 9 |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| 1701     | 1721     | 1615     | 1619     | 1626     | 1643     | 1634     | 1596     | 1676     | 1657     |

Data splits were taken from [here](https://github.com/paplhjak/Facial-Age-Estimation-Benchmark-Databases).

**All splits (the entire dataset) were used for model evaluation.**

</details>
## Install

Install pytorch 1.13+ and other requirements:

```
pip install -r requirements.txt
pip install .
```

## Demo

1. [Download](https://drive.google.com/file/d/1CGNCkZQNj5WkP3rLpENWAOgrBQkUWRdw/view) the body + face detector model to `models/yolov8x_person_face.pt`
2. [Download](https://drive.google.com/file/d/11i8pKctxz3wVkDBlWKvhYIh7kpVFXSZ4/view) the MiVOLO checkpoint to `models/mivolo_imbd.pth.tar`

```bash
wget https://variety.com/wp-content/uploads/2023/04/MCDNOHA_SP001.jpg -O jennifer_lawrence.jpg

python3 demo.py \
  --input "jennifer_lawrence.jpg" \
  --output "output" \
  --detector-weights "models/yolov8x_person_face.pt" \
  --checkpoint "models/mivolo_imbd.pth.tar" \
  --device "cuda:0" \
  --with-persons \
  --draw
```

To run the demo on a YouTube video:

```bash
python3 demo.py \
  --input "https://www.youtube.com/shorts/pVh32k0hGEI" \
  --output "output" \
  --detector-weights "models/yolov8x_person_face.pt" \
  --checkpoint "models/mivolo_imbd.pth.tar" \
  --device "cuda:0" \
  --draw \
  --with-persons
```

## Validation

To reproduce the validation metrics:

1. Download the prepared annotations for imdb-clean / utk / adience / lagenda / fairface.
2. Download a checkpoint.
3. Run validation:

```bash
python3 eval_pretrained.py \
  --dataset_images /path/to/dataset/utk/images \
  --dataset_annotations /path/to/dataset/utk/annotation \
  --dataset_name utk \
  --split valid \
  --batch-size 512 \
  --checkpoint models/mivolo_imbd.pth.tar \
  --half \
  --with-persons \
  --device "cuda:0"
```

Supported dataset names: "utk", "imdb", "lagenda", "fairface", "adience".

## Changelog

See [CHANGELOG.md](CHANGELOG.md).

## ONNX and TensorRT export

As of now (11.08.2023), ONNX export is technically feasible but not advisable due to the poor performance of the resulting model with batch processing. **TensorRT** and **OpenVINO** export is impossible due to their lack of support for col2im.

If you remain absolutely committed to ONNX export, you can refer to [these instructions](https://github.com/WildChlamydia/MiVOLO/issues/14#issuecomment-1675245889).

The most highly recommended export method at present **is using TorchScript**. You can achieve this with a single call:

```python
# example_input: a sample input batch matching the model's expected inputs
traced_model = torch.jit.trace(model, example_input)
```

This approach provides you with a model that maintains its original speed and only requires a single file for usage, eliminating the need for additional code.

## License

Please, see [here](./license).

## Citing

If you use our models, code or dataset, we kindly request you to cite the following papers and give the repository a :star:

```bibtex
@article{mivolo2023,
   Author = {Maksim Kuprashevich and Irina Tolstykh},
   Title = {MiVOLO: Multi-input Transformer for Age and Gender Estimation},
   Year = {2023},
   Eprint = {arXiv:2307.04616},
}
```

```bibtex
@article{mivolo2024,
   Author = {Maksim Kuprashevich and Grigorii Alekseenko and Irina Tolstykh},
   Title = {Beyond Specialization: Assessing the Capabilities of MLLMs in Age and Gender Estimation},
   Year = {2024},
   Eprint = {arXiv:2403.02302},
}
```