Video Pose Distillation

Abstract

Human pose is a useful feature for fine-grained sports action understanding. However, pose estimators are often unreliable when run on sports video due to domain shift and factors such as motion blur and occlusions. This leads to poor accuracy when downstream tasks, such as action recognition, depend on pose. End-to-end learning circumvents pose, but requires more labels to generalize.

We introduce Video Pose Distillation (VPD), a weakly-supervised technique to learn features for new video domains, such as individual sports that challenge pose estimation. Under VPD, a student network learns to extract robust pose features from RGB frames in the sports video, such that, whenever pose is considered reliable, the features match the output of a pretrained teacher pose detector. Our strategy retains the best of both the pose-based and end-to-end approaches: it exploits the rich visual patterns in raw video frames while learning features that agree with the athletes' pose and motion in the target video domain, which avoids over-fitting to patterns unrelated to athletes' motion.
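
The distillation objective described above can be illustrated with a small sketch. This is not the authors' implementation: the mean-squared-error loss, the per-frame boolean `reliable` flag, and the function name `masked_distillation_loss` are all assumptions made for illustration. The abstract only states that student features should match the teacher's output whenever pose is considered reliable.

```python
from typing import Sequence


def masked_distillation_loss(
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    reliable: Sequence[bool],
) -> float:
    """Mean squared error between student and teacher features,
    averaged over the frames flagged as reliable (hypothetical loss)."""
    total = 0.0
    count = 0
    for s, t, ok in zip(student_feats, teacher_feats, reliable):
        if not ok:
            # The teacher's pose is not trusted on this frame,
            # so it contributes no supervision.
            continue
        total += sum((a - b) ** 2 for a, b in zip(s, t))
        count += len(s)
    # With no reliable frames there is nothing to match.
    return total / count if count else 0.0


# Toy example: three frames with two-dimensional features.
student = [[0.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
teacher = [[0.0, 0.0], [2.0, 4.0], [0.0, 0.0]]
reliable = [True, True, False]  # the third frame's pose is treated as unreliable

loss = masked_distillation_loss(student, teacher, reliable)
print(loss)  # 1.25
```

In this toy example the third frame is excluded, so the large mismatch there does not affect the loss. In a real training setup the student would be a neural network over RGB frames and the loss would be minimized by gradient descent; both are outside the scope of this sketch.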

VPD features improve performance on few-shot, fine-grained action recognition, retrieval, and detection tasks in four real-world sports video datasets, without requiring additional ground-truth pose annotations.

Materials

Paper

Supplemental Materials

arXiv, CVF Open Access

Code

Video

Citation

@inproceedings{hong_vpd_iccv21,
    author={Hong, James and Fisher, Matthew and Gharbi, Micha\"{e}l and Fatahalian, Kayvon},
    title={Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition},
    booktitle={ICCV},
    year={2021}
}

Acknowledgements

This work is supported by the National Science Foundation (NSF) under III-1908727 and Adobe Research.

Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition

James Hong¹ Matthew Fisher² Michaël Gharbi² Kayvon Fatahalian¹

¹Stanford University ²Adobe Research

ICCV 2021
