SuperAGI/Veagle

abhaykondi committed on Jan 22

Commit 0dffa44 • 1 Parent(s): 46e34b5

Update README.md

README.md +69 -0

@@ -2,8 +2,77 @@
 license: apache-2.0
 ---

+# Model Card
+Veagle significantly improves the textual understanding and interpretation of images. Its distinguishing feature
+is an architectural change that combines several components: a vision abstractor from mPLUG-Owl, the Q-Former
+from InstructBLIP, and the Mistral language model. This combination allows Veagle to better understand and
+interpret the connection between text and images, achieving state-of-the-art results. Veagle starts from a pre-trained
+vision encoder and language model and is trained in two stages, which helps the model use information from images
+and text together effectively.
+Further details about Veagle can be found in this blog post: https://superagi.com/superagi-veagle/
+## Key Contributions
+- Veagle surpasses most state-of-the-art (SOTA) models on major benchmarks and outperforms competing models
+  across a range of tasks and domains.
+- Veagle is trained on a meticulously curated dataset of 3.5 million examples, specifically tailored to enhance
+  visual representation learning. Achieving high accuracy and efficiency with this optimized dataset demonstrates
+  that the model learns effectively from limited data.
+- Veagle's architecture is a unique blend of components, including a vision abstractor inspired by mPLUG-Owl,
+  the Q-Former module from InstructBLIP, and the Mistral language model. This architecture, complemented by an
+  additional projection layer and architectural refinements, enables Veagle to excel in multimodal tasks.
+## Training
+- Trained by: SuperAGI Team
+- Hardware: 8 x NVIDIA A100 SXM (80GB)
+- LLM: Mistral 7B
+- Vision Encoder: mPLUG-Owl2
+- Duration of pretraining: 12 hours
+- Duration of finetuning: 25 hours
+- Number of epochs in pretraining: 3
+- Number of epochs in finetuning: 2
+- Batch size in pretraining: 8
+- Batch size in finetuning: 10
+- Learning Rate: 1e-5
+- Weight Decay: 0.05
+- Optimizer: AdamW
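For reference, the settings listed above can be gathered into a small config object. This is only a sketch: the `TrainingConfig` class and its field names are hypothetical and do not come from the Veagle codebase, and the GPU-hour figure assumes that all 8 GPUs were in use for the full reported duration of both stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the Training list; the field names are illustrative."""

    num_gpus: int = 8
    pretrain_hours: float = 12.0
    finetune_hours: float = 25.0
    pretrain_epochs: int = 3
    finetune_epochs: int = 2
    pretrain_batch_size: int = 8
    finetune_batch_size: int = 10
    learning_rate: float = 1e-5
    weight_decay: float = 0.05
    optimizer: str = "AdamW"

    def total_gpu_hours(self) -> float:
        # Assumption: every GPU was busy for the whole wall-clock duration.
        return self.num_gpus * (self.pretrain_hours + self.finetune_hours)


config = TrainingConfig()
print(config.total_gpu_hours())  # 8 * (12 + 25) = 296.0
```

Under that assumption, the two stages together come to roughly 296 A100 GPU-hours.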
+## Steps to try
+  ```bash
+  # 1. Clone the repository
+  git clone https://github.com/superagi/Veagle
+  cd Veagle
+  ```
+  ```bash
+  # 2. Activate the virtual environment (assumes one already exists at ./venv), then run the installation script
+  source venv/bin/activate
+  chmod +x install.sh
+  ./install.sh
+  ```
+  ```bash
+  # 3. Ask a question about an image
+  python evaluate.py --answer_qs \
+    --model_name veagle_mistral \
+    --img_path images/food.jpeg \
+    --question "Is the food in the image healthy or not?"
+  ```
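To ask several questions from a script, the `evaluate.py` call from the last step can be assembled in Python. This is a minimal sketch: `build_eval_command` is a hypothetical helper, not part of the Veagle repository, and it uses only the flags shown in the command above. It assumes the script is run from the repository root with the environment already installed.

```python
import shlex
import sys


def build_eval_command(img_path: str, question: str,
                       model_name: str = "veagle_mistral") -> list[str]:
    """Return the argument list for the evaluate.py call shown in the steps."""
    return [
        sys.executable, "evaluate.py",
        "--answer_qs",
        "--model_name", model_name,
        "--img_path", img_path,
        "--question", question,
    ]


cmd = build_eval_command("images/food.jpeg", "What dish is shown in the image?")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when a question contains quotes or other special characters.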
+## Evaluation
+![Image 18-01-24 at 3.39 PM.jpg](https://cdn-uploads.huggingface.co/production/uploads/65a8fe900dba6b99a0164a47/bBBFaYI6maW_DKci9nl6L.jpeg)
+## The SuperAGI team
+Rajat Chawla, Arkajith Dutta, Tushar Jha, Anmol Gautam, Ayush Vatsal,
+Sukrit Chatterji, Adarsh Jha, Mukunda NS, Ishaan Bhola