metadata
title: MultiModal Phi2
emoji: 🚀
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: mit

Phi2: Multimodal Finetuning

Details

  1. LLM Backbone: Phi2
  2. Vision Tower: clip-vit-large-patch14-336
  3. Audio Model: Whisper
  4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
  5. Finetuning Dataset: Instruct 150k dataset based on COCO
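
The components listed above imply some simple shape arithmetic, sketched below. This is a rough illustration and not the project's actual code. The image size (336) and patch size (14) come from the vision tower's name. The feature widths (1024 for CLIP ViT-L, 2560 for Phi2) are taken from the public model configs and are assumptions as far as this README goes. The single linear projector and the choice to drop the CLS token are also assumptions, and the names `VisionTowerSpec`, `projector_weight_shape`, and `sequence_length` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionTowerSpec:
    """Shape facts for clip-vit-large-patch14-336."""
    image_size: int = 336    # input resolution in pixels (from the model name)
    patch_size: int = 14     # side of each square patch (from the model name)
    hidden_size: int = 1024  # assumed ViT-L feature width (public config)

    def num_patches(self) -> int:
        # Patches per side, squared; the CLS token is assumed to be dropped.
        per_side = self.image_size // self.patch_size
        return per_side * per_side


# Assumption: Phi2 embedding width, taken from the public Phi-2 config.
PHI2_HIDDEN_SIZE = 2560


def projector_weight_shape(vision: VisionTowerSpec,
                           llm_hidden: int = PHI2_HIDDEN_SIZE) -> tuple:
    """(in_features, out_features) of a hypothetical single linear projector."""
    return (vision.hidden_size, llm_hidden)


def sequence_length(vision: VisionTowerSpec, text_tokens: int) -> int:
    """Tokens seen by the LLM when one image is prepended to a text prompt."""
    return vision.num_patches() + text_tokens


if __name__ == "__main__":
    spec = VisionTowerSpec()
    print("image tokens:", spec.num_patches())
    print("projector shape:", projector_weight_shape(spec))
    print("sequence length with 32 text tokens:", sequence_length(spec, 32))
```

Under these assumptions each image contributes a fixed number of tokens regardless of content, which is worth keeping in mind when budgeting context length for multi-turn conversations.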

Design

image

Pretraining

Training Loss Curve

image

Learning Rate

image

Training Logs

image

Finetuning

Training Loss Curve

image

Learning Rate

image

Training Logs

image

Results

image