First-person View Gesture Recognition Based on Region Convolutional 3D Network

  • Shentao Wang, Shang Zhang

Abstract

Gesture is a visual and effective method of interaction with VR/AR devices. However, video-based first-person view gesture recognition still faces many challenges. This paper proposes a method based on the Region Convolutional 3D Network (R-C3D) to detect and recognize gestures in untrimmed RGB-D video. RGB and depth features are extracted with a 3D convolutional model and fused; candidate temporal regions likely to contain gestures are then generated, and the selected candidate regions are classified into specific gestures. Exploiting the characteristics of first-person view gestures, the number and sizes of the anchor boxes are optimized to accelerate the generation of gesture candidate regions, improving their quality while reducing computation. To verify the method under different backgrounds and illumination conditions, we evaluated it on the EgoGesture dataset, and the results confirm its validity. The method shows advantages for interaction with wearable VR/AR devices.
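The pipeline described in the abstract first fuses RGB and depth features, then generates anchor-based candidate temporal regions and classifies them. The core of the proposal step can be illustrated with a minimal, self-contained sketch of temporal anchor generation and ground-truth matching; the stride, anchor scales, and IoU threshold below are illustrative assumptions, not the paper's tuned values:

```python
# Illustrative sketch of R-C3D-style temporal proposals.
# Stride/scales/threshold are assumptions, not the paper's settings.

def generate_temporal_anchors(num_frames, stride=8, scales=(16, 32, 64)):
    """Place candidate temporal segments (start, end) at regular
    positions along the video, one segment per scale per position."""
    anchors = []
    for center in range(stride // 2, num_frames, stride):
        for scale in scales:
            start = max(0, center - scale // 2)
            end = min(num_frames, center + scale // 2)
            anchors.append((start, end))
    return anchors

def temporal_iou(a, b):
    """Intersection-over-union of two temporal segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def positive_anchors(anchors, gt_segment, iou_thresh=0.5):
    """Keep anchors overlapping a ground-truth gesture segment enough
    to serve as positive proposals during training."""
    return [a for a in anchors if temporal_iou(a, gt_segment) >= iou_thresh]
```

Reducing the number of anchor scales to those matching typical first-person gesture durations, as the paper proposes, directly shrinks the candidate set this step produces and hence the downstream classification cost.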

Published
2020-11-30
How to Cite
Shentao Wang, Shang Zhang. (2020). First-person View Gesture Recognition Based on Region Convolutional 3D Network. Design Engineering, 714 - 722. https://doi.org/10.17762/de.vi.940
Section
Articles