Model Card for PaliGemma Dermatology Model
Model Details
Model Description
This model, based on the PaliGemma-3B architecture, has been fine-tuned for dermatology-related image and text processing tasks. The model is designed to assist in the identification of various skin conditions using a combination of image analysis and natural language processing.
- Developed by: Bruce_Wayne
- Model type: Vision-language model
- Finetuned from model: https://huggingface.co/google/paligemma-3b-pt-224
- LoRA adapters used: Yes
- Intended use: Medical image analysis, specifically for dermatology
Feedback: please let me know how the model works for you via this form: https://forms.gle/cBA6apSevTyiEbp46. Thank you!
Uses
Direct Use
The model can be used directly to analyze dermatology images and suggest potential skin conditions (see How to Get Started with the Model below).
Bias, Risks, and Limitations
- Skin tone bias: The model may have been trained on a dataset that does not adequately represent all skin tones, potentially leading to biased results.
- Geographic bias: The model's performance may vary depending on the prevalence of certain conditions in different geographic regions.
How to Get Started with the Model
import torch
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
from PIL import Image

# Select the device (GPU if available, otherwise CPU)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and processor
model_id = "brucewayne0459/paligemma_derm"
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).to(device)
model.eval()

# Load a sample image and text prompt
input_text = "Identify the skin condition?"
input_image_path = "path/to/your/image.jpg"  # Replace with your actual image path
input_image = Image.open(input_image_path).convert("RGB")

# Process the inputs and move them to the same device as the model
inputs = processor(text=input_text, images=input_image, return_tensors="pt", padding="longest").to(device)

# Set the maximum number of new tokens to generate
max_new_tokens = 50

# Run inference
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

# Decode the output
decoded_output = processor.decode(outputs[0], skip_special_tokens=True)
print("Model Output:", decoded_output)
Training Details
Training Data
The model was fine-tuned on a dataset of dermatological images paired with their corresponding disease names.
Training Procedure
The model was fine-tuned using LoRA (Low-Rank Adaptation) for parameter-efficient training, and mixed precision (bfloat16) was used to speed up training and reduce memory usage; an illustrative configuration sketch follows the hyperparameters below.
Training Hyperparameters
- Training regime: Mixed precision (bfloat16)
- Epochs: 10
- Learning rate: 2e-5
- Batch size: 6
- Gradient accumulation steps: 4
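For illustration only, the sketch below shows how a LoRA fine-tuning run with these hyperparameters could be configured using the `peft` library and the Hugging Face `Trainer`. The LoRA rank, scaling factor, target modules, and the commented-out dataset/collator are assumptions, not the exact setup used for this model.

```python
# Illustrative sketch only: a LoRA + bfloat16 fine-tuning setup mirroring the
# hyperparameters listed above. The LoRA rank/alpha, target modules, and the
# dataset/collator are assumptions, not the exact recipe used for this model.
import torch
from peft import LoraConfig, get_peft_model
from transformers import PaliGemmaForConditionalGeneration, TrainingArguments, Trainer

base_model = PaliGemmaForConditionalGeneration.from_pretrained(
    "google/paligemma-3b-pt-224", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=8,                                  # assumed rank
    lora_alpha=16,                        # assumed scaling factor
    target_modules=["q_proj", "v_proj"],  # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="paligemma_derm_lora",
    num_train_epochs=10,
    learning_rate=2e-5,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=4,
    bf16=True,  # mixed precision (bfloat16)
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset,  # your processed image/text pairs
#                   data_collator=collate_fn)     # your collator
# trainer.train()
```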
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a separate validation set of dermatological images and disease names, distinct from the training data.
Metrics
- Validation Loss: The loss was tracked throughout the training process to evaluate model performance.
- Accuracy: The primary metric for assessing model predictions.
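Because the model produces free-text disease names, one straightforward way to compute such an accuracy is exact-match comparison between the generated text and the reference label. The sketch below illustrates this with hypothetical `predictions` and `references` lists and may not match the exact metric used here.

```python
# Sketch of an exact-match accuracy computation over a validation set.
# `predictions` and `references` are hypothetical lists of generated and
# ground-truth disease names; the actual metric used may differ.
def exact_match_accuracy(predictions, references):
    matches = sum(
        pred.strip().lower() == ref.strip().lower()
        for pred, ref in zip(predictions, references)
    )
    return matches / len(references)

print(exact_match_accuracy(["Psoriasis", "Eczema"], ["psoriasis", "acne"]))  # -> 0.5
```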
Results
The model achieved a final validation loss of approximately 0.2214, indicating reasonable performance in predicting skin conditions based on the dataset used.
Summary
Environmental Impact
- Hardware Type: 1 x L4 GPU
- Hours used: ~22
- Cloud Provider: Lightning AI
- Compute Region: USA
- Carbon Emitted: ~0.9 kg CO2 eq.
Technical Specifications
Model Architecture and Objective
- Architecture: Vision-Language model based on PaliGemma-3B
- Objective: To classify and diagnose dermatological conditions from images and text
Compute Infrastructure
Hardware
- GPU: 1 x NVIDIA L4
Model Card Authors
Bruce_Wayne