Tracking hand articulations: Relying on 3D visual hulls versus relying on multiple 2D cues

Published in ISUVR, 2013

  1. Full citation

    Oikonomidis, I., Kyriazis, N., Tzevanidis, K., & Argyros, A. A. (2013). Tracking hand articulations: Relying on 3D visual hulls versus relying on multiple 2D cues. ISUVR, 7–10. https://doi.org/10.1109/ISUVR.2013.13

    Abstract

    We present a method for articulated hand tracking that relies on visual input acquired by a calibrated multicamera system. A state-of-the-art result on this problem was presented in [12]. In that work, hand tracking is formulated as the minimization of an objective function that quantifies the discrepancy between a hand pose hypothesis and the observations. That objective function treats the observations from each camera view independently. We follow the same general optimization framework but employ the visual hull [10] as the main observation cue; the visual hull integrates information from all available views prior to optimization. We investigate the behavior of the resulting method in extensive experiments and compare it with that of [12]. The obtained results demonstrate that for low levels of noise contamination, the two methods perform comparably, regardless of the number of cameras. The situation changes when observations are noisy or when as few as two cameras with short baselines are employed. In these cases, the proposed method is more accurate than that of [12]. Thus, the proposed method is preferable in real-world scenarios with noisy observations obtained from easy-to-deploy stereo camera setups.
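
    To make the abstract's two key ingredients concrete, here is a minimal sketch in Python: space-carving a voxel grid into a visual hull from calibrated silhouettes, and scoring a pose hypothesis against that hull. All names (`visual_hull`, `hull_discrepancy`, `render_hand`) and the voxel-grid representation are illustrative assumptions, not the paper's implementation; the paper follows the optimization framework of [12], whose optimizer and hand model are not detailed in the abstract.

    ```python
    import numpy as np

    def visual_hull(silhouettes, projections, voxels, threshold=0.5):
        """Carve a voxel grid: a voxel survives only if it projects inside
        the foreground silhouette of every calibrated view (assumption:
        silhouettes are 2D float arrays, projections are 3x4 camera
        matrices, voxels is an (N, 3) array of voxel centers in world
        coordinates)."""
        n = len(voxels)
        occupied = np.ones(n, dtype=bool)
        homog = np.hstack([voxels, np.ones((n, 1))])   # (N, 4) homogeneous
        for sil, P in zip(silhouettes, projections):
            uvw = homog @ P.T                          # project to image plane
            z = uvw[:, 2]
            safe = z > 1e-9                            # points in front of camera
            u = np.zeros(n, dtype=int)
            v = np.zeros(n, dtype=int)
            u[safe] = (uvw[safe, 0] / z[safe]).round().astype(int)
            v[safe] = (uvw[safe, 1] / z[safe]).round().astype(int)
            h, w = sil.shape
            inside = safe & (u >= 0) & (u < w) & (v >= 0) & (v < h)
            fg = np.zeros(n, dtype=bool)
            fg[inside] = sil[v[inside], u[inside]] > threshold
            occupied &= fg                             # intersect across views
        return occupied

    def hull_discrepancy(pose, observed_hull, voxels, render_hand):
        """Score a hand-pose hypothesis against the observed visual hull.
        render_hand(pose, voxels) -> boolean occupancy of the hypothesized
        hand model over the same voxel grid (hypothetical helper)."""
        hypothesized = render_hand(pose, voxels)
        # symmetric difference: voxels explained by exactly one of the two
        return int(np.logical_xor(hypothesized, observed_hull).sum())
    ```

    In the paper's framework, a black-box optimizer would minimize such a discrepancy over hand-pose hypotheses; the XOR count above merely stands in for whatever discrepancy measure the actual objective function uses.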