---
title: MultiModal Phi2
emoji: π
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: mit
---
## Phi2: Multimodal Finetuning
### Details
1. LLM Backbone: Phi2
2. Vision Tower: clip-vit-large-patch14-336
3. Audio Model: Whisper
4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
5. Finetuning Dataset: Instruct 150k dataset based on COCO
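
The vision tower's name encodes its input geometry, which determines how many image patch tokens it produces per image. The snippet below is a minimal sketch of that arithmetic, assuming the usual ViT naming convention in which `patch14` is the patch side in pixels and the trailing `336` is the input resolution. The helper names (`VisionTowerSpec`, `parse_clip_name`) are illustrative and not part of this repository. Whether a class token is kept, and how many of these tokens are actually passed on to Phi2, depends on the projection design, which this README does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisionTowerSpec:
    """Geometry of a ViT-style vision tower, read off the checkpoint name."""

    image_size: int  # input resolution in pixels (the "336" in the name)
    patch_size: int  # side of one square patch in pixels (the "14")

    @property
    def patches_per_side(self) -> int:
        # The image is cut into a regular grid of non-overlapping patches.
        if self.image_size % self.patch_size != 0:
            raise ValueError("image_size must be a multiple of patch_size")
        return self.image_size // self.patch_size

    @property
    def num_patch_tokens(self) -> int:
        # One token per patch; a class token, if used, is not counted here.
        return self.patches_per_side ** 2


def parse_clip_name(name: str) -> VisionTowerSpec:
    """Parse names of the form 'clip-vit-large-patch14-336' (assumed convention)."""
    parts = name.split("-")
    patch_part = next(p for p in parts if p.startswith("patch"))
    patch_size = int(patch_part[len("patch"):])
    image_size = int(parts[-1])
    return VisionTowerSpec(image_size=image_size, patch_size=patch_size)


spec = parse_clip_name("clip-vit-large-patch14-336")
print(spec.patches_per_side, spec.num_patch_tokens)  # 24 576
```

Under these assumptions, each 336x336 image yields a 24x24 grid, i.e. 576 patch tokens before any projection or pooling.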
### Design
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/56df24cd-2681-4e17-ab64-9652f609b15f)
### Pretraining
#### Training Loss Curve
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/b6c37a95-0a56-4b52-8719-3ff56dc1b703)
#### Learning Rate
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/44d9a11b-b28d-47e1-ba1d-d6dc22ebe748)
#### Training Logs
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/76543d98-d9fe-4c1a-ac47-3d06e48053ad)
### Finetuning
#### Training Loss Curve
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/45ef40bd-fae5-4cfe-a522-c0eed2833230)
#### Learning Rate
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/df60ee62-a537-4e36-a7f7-f7111e101162)
#### Training Logs
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/2747acce-bc99-4c37-a05a-d5e81cb9aa9d)
### Results
![image](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/f12a9f04-df32-413e-b957-774c30381b2b)