---
base_model: microsoft/Florence-2-base-ft
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: visual-question-answering
metrics:
- accuracy
tags:
- deepfake detection
---

# FLODA: FLorence-2 Optimized for Deepfake Assessment

## Model Description

FLODA (FLorence-2 Optimized for Deepfake Assessment) is a deepfake detection model built on a Vision-Language Model (VLM). It is designed to surpass existing deepfake detectors by integrating image captioning and authenticity assessment into a single end-to-end architecture.

## Key Features

- Uses Florence-2 as the base VLM for both caption generation and deepfake detection
- Reframes deepfake detection as a Visual Question Answering (VQA) task
- Incorporates image caption information for richer contextual understanding
- Employs rsLoRA (rank-stabilized Low-Rank Adaptation) for efficient fine-tuning
- Demonstrates strong generalization across diverse scenarios
- Shows robustness against adversarial attacks

## Model Architecture

FLODA is based on the Florence-2 model and consists of two main components:

1. Vision Encoder: DaViT (Dual Attention Vision Transformer)
2. Multi-modality Encoder-Decoder: a standard transformer encoder-decoder

The model is fine-tuned with rsLoRA using the following configuration (see the sketch after this list):

- Rank (r): 8
- Alpha (α): 8
- Dropout: 0.05
- Target Modules: q_proj, k_proj, v_proj, out_proj, lm_head
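The configuration above maps directly onto PEFT's `LoraConfig`. The following is a minimal sketch of how such a setup could be reproduced from the card's metadata and the hyperparameters listed above; it is an illustration, not the authors' released training script, and any training details beyond the adapter config are assumptions.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base checkpoint taken from this card's `base_model` field.
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

# Adapter hyperparameters as listed in the Model Architecture section.
lora_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.05,
    use_rslora=True,  # rank-stabilized LoRA: scales updates by alpha / sqrt(r) instead of alpha / r
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "lm_head"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```

Setting `use_rslora=True` changes the adapter scaling from α/r to α/√r, which the rsLoRA paper argues keeps fine-tuning stable as the rank grows.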
## Performance

FLODA achieves state-of-the-art performance in deepfake detection:

- Average accuracy across all evaluation datasets: 97.14%
- Strong performance on both real and fake image datasets
- 100% accuracy on several fake datasets and on all attacked datasets

## Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM
from PIL import Image
import torch

# Load the model and processor
model_path = "path/to/floda/model"
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to("cuda").eval()
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

def detect_deepfake(image_path):
    image = Image.open(image_path).convert("RGB")

    # Deepfake detection framed as VQA: no Florence-2 task token,
    # the question itself is the prompt.
    task_prompt = ""
    text_input = "Is this photo real?"
    inputs = processor(text=task_prompt + text_input, images=image, return_tensors="pt").to("cuda")

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )

    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    result = processor.post_process_generation(
        generated_text, task=task_prompt, image_size=(image.width, image.height)
    )[task_prompt]

    return "Real" if result.strip().lower() == "yes" else "Fake"

# Example usage
result = detect_deepfake("path/to/image.jpg")
print(f"The image is: {result}")
```

## Training Data

FLODA was trained on a dataset consisting of:

- Real images: MS COCO
- Fake images: generated by SD2 and LaMa

## Evaluation Data

The model was evaluated on 16 datasets:

- 2 real image datasets: MS COCO, Flickr30k
- 14 fake image datasets generated by various models (e.g., SD2, SDXL, DeepFloyd IF, DALLE-2, SGXL)
- Includes datasets with stylized images, inpainting, resolution changes, and face swapping
- Adversarial, backdoor, and data poisoning attack datasets

## Limitations

- Accuracy on the ControlNet dataset (77.07%) is lower than that of some competing models
- Effectiveness on very recent or future image generation techniques not covered by the training or evaluation datasets is uncertain

## Ethical Considerations

While FLODA shows promising results in deepfake detection, it is important to consider:

- The potential for false positives or negatives, which could have significant consequences depending on the use case
- The need for continuous updating as new image generation techniques emerge
- Privacy considerations when processing user-submitted images

## Model Card Authors

- Youngho Bae (Hanyang University)
- Gunhui Han (Yonsei University)
- Seunghyeon Park (Yonsei University)

## Model Card Contact

For inquiries about this model card or the FLODA model, please contact:

Youngho Bae
Email: byh711@gmail.com

### Framework versions

- PEFT 0.12.0
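Since this card lists `library_name: peft` with `base_model: microsoft/Florence-2-base-ft`, the checkpoint is presumably distributed as a PEFT adapter rather than a merged model. If so, it can also be attached explicitly with `peft`; the sketch below assumes that layout, and the adapter path is a placeholder, not a confirmed repo id.

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel

# Placeholder path: the published adapter location is not stated in this card.
adapter_path = "path/to/floda/adapter"

base = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base-ft", trust_remote_code=True
)

# Attach the rsLoRA adapter on top of the base Florence-2 weights.
model = PeftModel.from_pretrained(base, adapter_path).eval()

# Optionally fold the adapter into the base weights for standalone inference:
# model = model.merge_and_unload()
```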