通常的做法是将视频视为一系列图像(帧),并使用仅在图像上训练的深度神经网络模型来执行视频上的相似分析任务。在这篇论文中,我们表明,这种在图像上运行良好的深度学习模型在视频上也会运行良好的“信念飞跃”实际上是有缺陷的。
It is a common practice to think of a video as a sequence of images (frames), andre-use deep neural network models that are trained only on images for similaranalytics tasks on videos. In this paper, we show that this “leap of faith” t