
How to process video in a data loader

We assume that the video has been preprocessed into image files in advance. Usually, we do not use all frames in a clip but sample a fixed number of consecutive frames (e.g. 16 frames). The pipeline we assume for each chunk is the following.

  • Get the list of image paths for a clip, e.g. ["./video/clip1/frame0.jpg",...,"./video/clip1/frame101.jpg"]
  • Sample the span of frames we want to use, e.g. ["./video/clip1/frame11.jpg",...,"./video/clip1/frame26.jpg"]
  • Load the frames into a tensor shaped (T, H, W, C). H and W can be changed later.
  • Use torchvision's built-in utilities to crop, flip, etc.

Note that the first part differs from what the official PyTorch repository ( https://github.com/pytorch/vision/tree/master/references/video_classification ) does: we do not use the VideoClip class.
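
The first part, listing the frame files and sampling a window directly from them, could be sketched as follows. This is only an illustration: the helper names frame_index, list_frames and sample_clip are hypothetical, and it assumes every file name ends in a frame number such as frame11.jpg.

```python
import random
import re
from pathlib import Path


def frame_index(path):
    """Return the last integer in the file stem, e.g. 'frame101.jpg' -> 101."""
    return int(re.findall(r"\d+", Path(path).stem)[-1])


def list_frames(clip_dir):
    """List a clip's frame paths in temporal order.

    A plain string sort would put frame101.jpg before frame11.jpg,
    so the paths are sorted by their numeric frame index instead.
    """
    return sorted((str(p) for p in Path(clip_dir).glob("*.jpg")), key=frame_index)


def sample_clip(frame_paths, num_frames=16, rng=random):
    """Pick a random window of `num_frames` consecutive frames."""
    if len(frame_paths) < num_frames:
        raise ValueError(f"need at least {num_frames} frames, got {len(frame_paths)}")
    start = rng.randint(0, len(frame_paths) - num_frames)  # inclusive on both ends
    return frame_paths[start:start + num_frames]
```

For example, sample_clip(list_frames("./video/clip1")) would return 16 consecutive frame paths from that clip. Passing a seeded random.Random as rng makes the sampling reproducible.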