Sean Ryan Fanello1,2  Cem Keskin1  Shahram Izadi1  Pushmeet Kohli1  David Kim1  David Sweeney1
Antonio Criminisi1  Jamie Shotton1  Sing Bing Kang1  Tim Paek1
1 Microsoft Research
2 iCub Facility - Istituto Italiano di Tecnologia
Figure 1: (a, b) Our approach turns any 2D camera into a cheap depth sensor for close-range human capture and 3D interaction scenarios. (c, d) Simple hardware modifications allow actively illuminated near-infrared images to be captured from the camera. (e, f) These images are used as input to our machine learning algorithm for depth estimation. (g, h) Our algorithm outputs dense metric depth maps of hands or faces in real-time.
Abstract
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form-factor.
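As a rough illustration of the light fall-off baseline that the abstract compares against: under an inverse-square model, the intensity returned from an actively illuminated surface falls off as 1/z², so depth can be recovered from a single calibration pair (the intensity observed at a known depth). This is a minimal sketch under that assumption; the function name and calibration values are hypothetical and not taken from the paper.

```python
import math

def falloff_depth(intensity, i_ref=1.0, z_ref=0.5):
    """Estimate depth from observed NIR intensity via inverse-square
    light fall-off: I is proportional to 1/z^2, so
    z = z_ref * sqrt(i_ref / I).

    i_ref is the intensity observed at a known calibration depth z_ref
    (both values here are illustrative, not from the paper).
    """
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    return z_ref * math.sqrt(i_ref / intensity)

# A surface returning a quarter of the calibration intensity lies at
# twice the calibration depth: 0.5 m -> 1.0 m.
print(falloff_depth(0.25))  # 1.0
```

Such a baseline ignores surface albedo and orientation, which also modulate observed intensity; this is one reason a learned per-pixel mapping can outperform it, as the abstract reports.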
1 Introduction

While range sensing technologies have existed for a long time, consumer depth cameras such as the Microsoft Kinect have begun to make real-time depth acquisition a commodity. This in turn has opened up many exciting new applications for gaming, 3D scanning and fabrication, natural user interfaces, augmented reality, and robotics. One important domain where depth cameras have had clear impact is in human-computer interaction. In particular, the ability to