import streamlit as st
from streamlit_extras.switch_page_button import switch_page

st.title("Florence-2")

st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1803769866878623819) (June 20, 2024)""", icon="ℹ️")
st.markdown(""" """)

st.markdown("""Florence-2 is a new vision foundation model by Microsoft capable of a wide variety of tasks 🤯
Let's unpack! 🧶
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_1.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
This model can handle tasks that range from document understanding to semantic segmentation 🤩
[Demo](https://t.co/7YJZvjhw84) | [Collection](https://t.co/Ub7FGazDz1)
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_2.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
The difference from previous models is that the authors have compiled a dataset of 126M images with 5.4B annotations, labelled with their own data engine ↓↓
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_3.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
The dataset also offers more variety in annotations than other datasets: it has region-level and image-level annotations, with more variety in semantic granularity as well!
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_4.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder.
The authors have compiled the multitask dataset with prompts for each task, which makes the model trainable on multiple tasks 🤗
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_5.jpg", use_column_width=True)
st.markdown(""" """)

st.markdown("""
You can also fine-tune this model on any task of choice. The authors released results on downstream tasks and report them with the vision encoder both frozen and unfrozen 🤓📉
They have released fine-tuned models too, you can find them in the collection above 🤗
""")
st.markdown(""" """)

st.image("pages/Florence-2/image_6.jpg", use_column_width=True)
st.markdown(""" """)

st.info("""
Resources:

[Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks](https://arxiv.org/abs/2311.06242)
by Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan (2023)

[Hugging Face blog post](https://huggingface.co/blog/finetune-florence2)""", icon="📚")

st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)

# Navigation buttons to the neighbouring pages
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("Depth Anything V2")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("4M-21")