NikshepShetty
/

Florence-2-Recap-DataComp

image-captioning

NikshepShetty committed on Aug 3

Commit

eecbbe1

1 Parent(s): eea5f05

Update README.md

README.md +35 -1

@@ -10,6 +10,26 @@ tags:
   - adapter
   - image-captioning
   - peft
 ---
 # Florence-2 Recap-DataComp LoRA Adapter
@@ -57,4 +77,18 @@ This code demonstrates how to:
 2. Load the LoRA adapter
 3. Process an image and generate a detailed caption
-Note: Make sure you have the required libraries installed: transformers, peft, einops, flash_attn, timm, Pillow, and requests.

   - adapter
   - image-captioning
   - peft
+model-index:
+- name: Florence-2-DOCCI-FT
+  results:
+  - task:
+      type: image-to-text
+      name: Image Captioning
+    dataset:
+      name: foundation-multimodal-models/DetailCaps-4870
+      type: other
+    metrics:
+    - type: meteor
+      value: 0.240
+    - type: bleu
+      value: 0.150
+    - type: cider
+      value: 0.035
+    - type: capture
+      value: 0.553
+    - type: rouge-l
+      value: 0.294
 ---
 # Florence-2 Recap-DataComp LoRA Adapter
 2. Load the LoRA adapter
 3. Process an image and generate a detailed caption
+Note: Make sure you have the required libraries installed: transformers, peft, einops, flash_attn, timm, Pillow, and requests.
+## Evaluation results
+Our LoRA adapter improves on the base Florence-2 model across all metrics when captioning with the MORE_DETAILED_CAPTION tag, evaluated on 1000 images from the foundation-multimodal-models/DetailCaps-4870 dataset:
+| Metric  | Base Model | Adapted Model | Relative Improvement |
+|---------|------------|---------------|----------------------|
+| METEOR  | 0.213      | 0.240         | +12.7%               |
+| BLEU    | 0.110      | 0.150         | +36.4%               |
+| CIDEr   | 0.031      | 0.035         | +12.9%               |
+| CAPTURE | 0.546      | 0.553         | +1.3%                |
+| ROUGE-L | 0.275      | 0.294         | +6.9%                |
+These results indicate that our LoRA adapter improves the image captioning of the Florence-2 base model on every metric measured, with the largest relative gain in BLEU (+36.4%) and the smallest in CAPTURE (+1.3%).
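
The percentages in the table can be reproduced from the two score columns. The snippet below is a minimal sketch, not part of the commit: it assumes the improvement figures are relative changes, `(adapted - base) / base`, rounded to one decimal place, and the names `SCORES`, `relative_improvement`, and `improvement_table` are hypothetical helpers introduced only for illustration.

```python
# Scores copied from the evaluation table: metric -> (base model, adapted model).
SCORES = {
    "METEOR": (0.213, 0.240),
    "BLEU": (0.110, 0.150),
    "CIDEr": (0.031, 0.035),
    "CAPTURE": (0.546, 0.553),
    "ROUGE-L": (0.275, 0.294),
}


def relative_improvement(base: float, adapted: float) -> float:
    """Return the percentage change from the base score to the adapted score."""
    return (adapted - base) / base * 100.0


def improvement_table(scores: dict) -> dict:
    """Map each metric to its relative improvement, rounded to one decimal."""
    return {
        metric: round(relative_improvement(base, adapted), 1)
        for metric, (base, adapted) in scores.items()
    }


if __name__ == "__main__":
    for metric, pct in improvement_table(SCORES).items():
        print(f"{metric}: {pct:+.1f}%")
```

Under that assumption the computed values agree with the table, which suggests the column reports relative rather than absolute gains.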