---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
datasets:
- markury/AndroAtlas
language:
- en
---

# AndroGemma-alpha Model Card

**Model page:** [AndroGemma-alpha](https://huggingface.co/markury/androgemma-alpha)

AndroGemma-alpha is a fine-tuned Vision-Language Model (VLM) based on Google's PaliGemma. The model aims to enhance the representation and understanding of male anatomy, specifically the penis, in AI models. It is fine-tuned on the AndroAtlas dataset, which consists of text/image pairs collected for this purpose.

**Resources and technical documentation:**

* [AndroAtlas Dataset](https://huggingface.co/datasets/markury/androatlas)

**Authors:** Markury

**Contributors:** Members of The Bulge Discord server, including enkie, Zellian, and SilasAI6609, who provided general support and detailed contributions to the system prompts and image sourcing.

## Model information

### Model summary

#### Description

AndroGemma-alpha is a fine-tuned version of PaliGemma that focuses on male anatomy to improve the model's understanding and representation of this underrepresented area. The fine-tuning dataset is a mix of text/image pairs sourced from Reddit and from non-public sources, chosen to provide detailed and diverse examples.

#### Model architecture

AndroGemma-alpha builds on the PaliGemma model, which comprises a Transformer decoder and a Vision Transformer image encoder, and is fine-tuned with AndroAtlas. The model supports tasks such as image captioning and visual question answering, specific to male anatomy.

#### Inputs and outputs

* **Input:** Image and text string, such as a prompt to caption the image, or a question.
* **Output:** Generated text in response to the input, such as a caption of the image or an answer to a question.

## How to Use

AndroGemma-alpha is best used through the MPIC (Markury's Paligemma Image Captioner) application for practical inference and integration into projects.
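
For readers who want to call the model directly from Python, the following is a minimal sketch of the input/output flow described above (image plus text prompt in, generated text out). It is not the MPIC implementation. It assumes the checkpoint loads as a standard PaliGemma model through the `transformers` library, and that a generic PaliGemma-style prompt such as `caption en` is appropriate; the prompt format this fine-tune actually expects may differ. The `caption_image` helper and its defaults are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Repo id taken from the model page URL in this card.
MODEL_ID = "markury/androgemma-alpha"


def caption_image(image_path, prompt="caption en", max_new_tokens=128):
    """Generate text for one image.

    Assumption: the checkpoint is loadable as a standard PaliGemma model in
    transformers. The default prompt is also an assumption, not a documented
    requirement of this fine-tune.
    """
    # Heavy dependencies are imported lazily so importing this module stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(Path(image_path)).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # generate() returns the prompt tokens followed by the new tokens;
    # keep only the newly generated part before decoding.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(caption_image(sys.argv[1]))
```

Saved as a script, this could be run with an image path as its only argument. It reloads the model on every call, which keeps the sketch short but is wasteful for batch captioning; MPIC remains the recommended route for that.
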
For Python inference code, refer to the MPIC source code and adapt it to fit your needs.

### Using MPIC CLI

The MPIC CLI is the preferred method for using the AndroGemma-alpha model. For details on installation and usage, visit the [MPIC repository](https://github.com/markuryy/paligemma-image-captioner).

## Training Details

### Training Data

The AndroAtlas dataset was used for training. It includes:

- **Text and Image Pairs:** Curated from Reddit, ensuring diverse and representative samples.
- **Annotations:** Detailed labels to enhance model training and understanding.
- **Focus:** Male anatomy, with an emphasis on the penis.

### Training Procedure

Fine-tuning used the first 5 batches of AndroAtlas (243 text/image pairs), supplemented with approximately 150 additional image/text pairs carrying detailed human-captioned annotations on circumcision and erection status. The captions were generated with GPT-4o using a specialized system prompt and later refined with Llama3-70B for consistency.

For full details on the training process, refer to the training notebook provided in the [repository](https://github.com/markuryy/paligemma-image-captioner/blob/main/finetuning/Paligemma_448_Finetune_JAX.ipynb).

## Model Card Authors

- **Markury**

## Model Card Contact

- **Markury**

This model card provides an overview of the AndroGemma-alpha model, including its purpose, usage, and training details. By using this model, you contribute to the development of more inclusive and representative AI systems.