import streamlit as st
from streamlit_extras.switch_page_button import switch_page
st.title("Florence-2")
st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1803769866878623819) (June 20, 2024)""", icon="ℹ️")
st.markdown(""" """)
st.markdown("""Florence-2 is a new vision foundation model by Microsoft capable of a wide variety of tasks 🤯
Let's unpack! 🧶
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_1.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
This model can handle tasks that range from document understanding to semantic segmentation 🤩
[Demo](https://t.co/7YJZvjhw84) | [Collection](https://t.co/Ub7FGazDz1)
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_2.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
What sets it apart from previous models is the dataset: the authors compiled 126M images with 5.4B annotations, labelled with their own data engine.
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_3.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
The dataset also offers more variety in annotations than other datasets: it has region-level and image-level annotations, with more variety in semantic granularity as well!
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_4.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
The model has a similar architecture to previous models: an image encoder, followed by a multimodal encoder with a text decoder.
The authors compiled the multitask dataset with a prompt for each task, which makes the model trainable on multiple tasks.
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_5.jpg", use_column_width=True)
st.markdown(""" """)
st.markdown("""
You can also fine-tune this model on any task of your choice. The authors report results on downstream tasks, both with the vision encoder frozen and with it unfrozen.
They have released fine-tuned models too; you can find them in the collection above.
""")
st.markdown(""" """)
st.image("pages/Florence-2/image_6.jpg", use_column_width=True)
st.markdown(""" """)
st.info("""
Resources:
- [Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://arxiv.org/abs/2311.06242)
by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan (2023)
- [Hugging Face blog post](https://huggingface.co/blog/finetune-florence2)
- [All the fine-tuned Florence-2 models](https://huggingface.co/models?search=florence-2)
""", icon="📚")
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("Depth Anything V2")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("4M-21")