Direct image mono-camera localization using deep learning


  • Matthew Powelson
  • Stephen L. Canfield


Feature-based localization is a common avenue of robotics research. Historically it has been carried out in 2D space using sensors such as lidar, but with the rise of highly mobile sensor packages, for example cell phones or UAVs, the use of 3D feature maps is an area of increasing interest. This poster presents a deep learning approach that estimates camera pose directly from a single monocular image. Transfer learning is used to leverage pretrained models at minimal computational cost: a common convolutional neural network architecture typically used for image classification is adapted to act as a regressor that predicts pose directly from raw RGB pixel values. Preliminary tests show effectiveness in 1D, with computational speeds sufficient for real-time application.
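
The idea of reusing a fixed feature extractor and training only a new regression output can be illustrated with a minimal, standard-library sketch. It is not the poster's implementation. The toy one-row "image", the hand-written `frozen_features` function (a stand-in for a pretrained convolutional backbone, not a CNN), and every name and hyperparameter below are assumptions chosen for illustration.

```python
import math
import random

IMAGE_WIDTH = 32  # hypothetical width of a toy one-row "image"


def render_image(pose):
    """Toy camera model (assumption): a bright blob whose column tracks a 1D pose in [0, 1]."""
    centre = pose * (IMAGE_WIDTH - 1)
    return [0.05 + math.exp(-((i - centre) ** 2) / 8.0) for i in range(IMAGE_WIDTH)]


def frozen_features(pixels):
    """Stand-in for a pretrained backbone: fixed, with no trainable parameters."""
    total = sum(pixels)
    centroid = sum(i * p for i, p in enumerate(pixels)) / (total * (IMAGE_WIDTH - 1))
    return [1.0, centroid]  # constant bias term plus one image feature


def train_head(images, poses, lr=1.0, epochs=2000):
    """Fit only the linear regression head by gradient descent on mean squared error."""
    # Features are computed once because the feature extractor is never updated.
    feats = [frozen_features(img) for img in images]
    weights = [0.0, 0.0]
    n = len(feats)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for f, y in zip(feats, poses):
            err = sum(w * x for w, x in zip(weights, f)) - y
            for j in range(2):
                grad[j] += err * f[j] / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


def predict_pose(weights, pixels):
    """Regress a continuous pose value from raw pixels via the frozen features."""
    return sum(w * x for w, x in zip(weights, frozen_features(pixels)))


random.seed(0)
train_poses = [random.uniform(0.2, 0.8) for _ in range(100)]
head = train_head([render_image(p) for p in train_poses], train_poses)
```

Calling `predict_pose(head, render_image(0.5))` returns an estimate of the 1D pose for an unseen image. In a deep learning setting the same structure would presumably apply, with the feature function replaced by the pretrained convolutional layers and the classification output replaced by a layer with one unit per pose dimension.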