MTTR committed on
Commit 765e386
1 Parent(s): 3b01dc7

Update app.py

Files changed (1):
  app.py +9 -12
app.py CHANGED
@@ -11,12 +11,9 @@ Given a text query and a short clip based on a YouTube video, we demonstrate how
 This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever.
 Hence, the model's performance may be limited, especially on instances from unseen categories.
 
-Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to Colab's limited computational resources.
-
-Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed, and discourage the users from copyright infringment of YouTube videos. <br><br>
-
-And now, with all formalities aside, let's begin!
+Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately).
 
+Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed.
 """
 
 import gradio as gr
@@ -70,13 +67,13 @@ def process(text_query, full_video_path):
     subclip = video.subclip(start_pt, end_pt)
     subclip.write_videofile(input_clip_path)
 
-    # checkpoint_path ='./refer-youtube-vos_window-12.pth.tar'
-    model, postprocessor = torch.hub.load('Randl/MTTR:main','mttr_refer_youtube_vos', get_weights=True)
+    checkpoint_path ='./refer-youtube-vos_window-12.pth.tar'
+    model, postprocessor = torch.hub.load('Randl/MTTR:main','mttr_refer_youtube_vos', get_weights=False)
 
-    # model_state_dict = torch.load(checkpoint_path, map_location='cpu')
-    # if 'model_state_dict' in model_state_dict.keys():
-    #     model_state_dict = model_state_dict['model_state_dict']
-    # model.load_state_dict(model_state_dict, strict=True)
+    model_state_dict = torch.load(checkpoint_path, map_location='cpu')
+    if 'model_state_dict' in model_state_dict.keys():
+        model_state_dict = model_state_dict['model_state_dict']
+    model.load_state_dict(model_state_dict, strict=True)
 
 
     text_queries= [text_query]
@@ -156,7 +153,7 @@ def process(text_query, full_video_path):
 
 title = "End-to-End Referring Video Object Segmentation with Multimodal Transformers - Interactive Demo"
 
-description = "This notebook provides a (limited) hands-on demonstration of MTTR. Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video. To use it, upload an .mp4 video file and input a text query which describes one of the instances in that video."
+description = "This notebook provides a (limited) hands-on demonstration of MTTR.\n Given a text query and a short clip based on a YouTube video, we demonstrate how MTTR can be used to segment the referred object instance throughout the video. To use it, upload an .mp4 video file and input a text query which describes one of the instances in that video. \n Disclaimer: \n This is a **limited** demonstration of MTTR's performance. The model used here was trained **exclusively** on Refer-YouTube-VOS with window size `w=12` (as described in our paper). No additional training data was used whatsoever. Hence, the model's performance may be limited, especially on instances from unseen categories. Additionally, slow processing times may be encountered, depending on the input clip length and/or resolution, and due to HuggingFace's limited computational resources (no GPU acceleration unfortunately).Finally, we emphasize that this demonstration is intended to be used for academic purposes only. We do not take any responsibility for how the created content is used or distributed. "
 
 article = "<p style='text-align: center'><a href='https://github.com/mttr2021/MTTR'>Github Repo</a></p>"
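
For reference, the heart of this commit is the weight-loading switch: instead of letting torch.hub download pretrained weights (`get_weights=True`), the Space now builds the bare model and loads the checkpoint file stored alongside app.py. A minimal standalone sketch of the new path, using only the calls that appear in the diff (the `.get()` unwrapping is an equivalent restatement of the committed `if` check, assuming the checkpoint is a plain dict):

```python
import torch

# Build the MTTR model and postprocessor without downloading weights.
model, postprocessor = torch.hub.load(
    'Randl/MTTR:main', 'mttr_refer_youtube_vos', get_weights=False)

# Load the checkpoint bundled with the Space; map_location='cpu' keeps the
# tensors on CPU, matching the Space's lack of GPU acceleration.
checkpoint = torch.load('./refer-youtube-vos_window-12.pth.tar',
                        map_location='cpu')

# Training checkpoints often wrap the weights under a 'model_state_dict'
# key; unwrap when present, then load strictly so mismatches fail loudly.
state_dict = checkpoint.get('model_state_dict', checkpoint)
model.load_state_dict(state_dict, strict=True)
```

Loading with `strict=True` mirrors the committed code: since the checkpoint was produced for exactly this architecture, any missing or unexpected key should surface immediately rather than degrade results silently.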
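
The `title`, `description`, and `article` strings in the last hunk are the standard metadata arguments of `gr.Interface`. The diff does not show how app.py wires them up, so the following is a hypothetical sketch, with the description shortened and placeholder input/output components:

```python
import gradio as gr

def process(text_query, full_video_path):
    # Placeholder for the real process() in app.py, which segments the
    # referred instance and returns the annotated clip.
    ...

title = "End-to-End Referring Video Object Segmentation with Multimodal Transformers - Interactive Demo"
description = "This notebook provides a (limited) hands-on demonstration of MTTR."  # shortened
article = "<p style='text-align: center'><a href='https://github.com/mttr2021/MTTR'>Github Repo</a></p>"

# String shortcuts stand in for the text-query box and video upload widget;
# the actual components used by the Space are an assumption here.
gr.Interface(fn=process,
             inputs=["text", "video"],
             outputs="video",
             title=title,
             description=description,
             article=article).launch()
```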