Proceedings of Materials for Sustainable Development Conference (MAT-SUS) (NFM22)
DOI: https://doi.org/10.29363/nanoge.nfm.2022.009
Publication date: 11th July 2022
Event cameras mimic the workings of the biological human visual pathway by sending pulses that encode image-intensity changes to the neural system. They are a promising alternative to conventional frame-based cameras for detecting ultra-fast motion, offering low latency, robustness to changes in illumination conditions, and low power consumption. These characteristics make them ideal for mobile robotic tasks. However, efficiently exploiting their unconventional sparse and asynchronous spatio-temporal data flow to its full capacity still challenges the computer vision community.
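Concretely, the intensity-change pulses described above are commonly formalized by the standard event-generation model from the event-camera literature; the formulation below is the conventional one and is included here only as background:

\[
e_k = (x_k, y_k, t_k, p_k), \qquad
p_k \big(\log I(x_k, t_k) - \log I(x_k, t_k - \Delta t_k)\big) \ge C,
\]

where a pixel at image location $(x_k, y_k)$ emits an event with polarity $p_k \in \{+1, -1\}$ at time $t_k$ whenever the change in log intensity since its previous event (a time $\Delta t_k$ earlier) reaches the contrast threshold $C$.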
Deep Artificial Neural Networks (ANNs), especially the recent Vision Transformer (ViT) architecture, have achieved state-of-the-art performance on various visual tasks [1]. However, the straightforward use of ANNs on event input data requires a preprocessing step that constrains its sparse and asynchronous nature (a minimal example of such preprocessing is sketched below). Inspired by computational neuroscience, Spiking Neural Networks (SNNs) turn out to be a natural match for event cameras due to their sparse, event-driven, and temporal processing characteristics. SNNs have mostly been applied to classification tasks [2]. Other works address regression tasks such as optical flow estimation [3], [4], depth estimation [5], angular velocity estimation [6], and video reconstruction [7]. However, limited work has been done to incorporate SNNs for full 3D ego-motion estimation.
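As an illustration of the preprocessing step mentioned above, the following minimal sketch discretizes an asynchronous event stream into a dense voxel grid, one common way of preparing event data for an ANN. The function name, tensor layout, and binning scheme are illustrative assumptions rather than a description of the specific pipeline used in this work.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events (x, y, t, p) into a spatio-temporal voxel grid.

    events : (N, 4) array of events, columns = pixel x, pixel y, timestamp, polarity.
    Returns a dense (num_bins, height, width) tensor; this discretization is the
    kind of step that sacrifices the sparse, asynchronous structure of the data.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    # Normalize timestamps to [0, num_bins - 1] and assign each event to a temporal bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    bins = np.round(t_norm).astype(int)
    # Accumulate signed polarity into each (bin, y, x) cell.
    np.add.at(voxel, (bins, y, x), np.where(p > 0, 1.0, -1.0))
    return voxel
```

Every call collapses the precise timing of individual events into a fixed number of temporal slices, which is precisely the loss of asynchrony referred to above; an SNN, by contrast, can consume the events one by one.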
We first present an optimization-based ego-motion estimation framework that exploits the event-based optical flow outputs of a trained SNN model [8]. Our method successfully estimates pure rotation and pure translation motion from input events only and shows the potential of SNNs for continuous ego-motion estimation tasks (a simplified sketch of flow-based ego-motion estimation is given below). Secondly, we present our hybrid RNN-ViT architecture for optical flow estimation, which uses a ViT to learn global context. We further present preliminary results for its spiking counterpart, which uses SNNs to process the event data directly.
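To make the flow-to-ego-motion step more concrete, the sketch below shows one generic way of recovering angular velocity from an optical flow field under a pure-rotation motion model via linear least squares. It is a simplified illustration under standard pinhole-camera assumptions (calibrated, normalized coordinates) and is not the optimization framework of [8] itself, which also handles translation and operates on event-based flow produced by the trained SNN.

```python
import numpy as np

def estimate_angular_velocity(flow, coords):
    """Least-squares angular velocity from optical flow under pure rotation.

    flow   : (N, 2) optical flow vectors (u, v) at calibrated pixel locations.
    coords : (N, 2) normalized image coordinates (x, y) of those locations.

    Under pure rotation the differential motion field is linear in the angular
    velocity w, i.e. [u, v]^T = B(x, y) @ w with B a 2x3 matrix per pixel, so
    stacking all pixels yields an overdetermined linear system.
    """
    x, y = coords[:, 0], coords[:, 1]
    # Rotational part of the motion field for a pinhole camera with unit focal length.
    B = np.stack([
        np.stack([x * y, -(1.0 + x**2), y], axis=-1),   # u-component row
        np.stack([1.0 + y**2, -x * y, -x], axis=-1),    # v-component row
    ], axis=1)                          # shape (N, 2, 3)
    A = B.reshape(-1, 3)                # shape (2N, 3)
    b = flow.reshape(-1)                # shape (2N,)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w                            # estimated angular velocity (wx, wy, wz)
```

The pure-rotation case is shown because the flow there does not depend on scene depth, so ego-motion reduces to a single linear solve; translational motion additionally involves scene depth and typically requires a more involved optimization.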
This work was supported by projects EBSLAM DPI2017-89564-P and EBCON PID2020-119244GB-I00 funded by MCIN/AEI/10.13039/501100011033 and by an FI AGAUR PhD grant to Yi Tian.