B5L-D: Neuromorphic Perception & Action - Invited Special Session
Session Type: Lecture
Session Code: B5L-D
Location: Room 4
Date & Time: Thursday March 23, 2023 (15:20-16:20)
Chair: Cornelia Fermüller
Track: 12
Paper ID | Paper Title | Authors | Abstract |
---|---|---|---|
3066 | EVHuman: Event-Based Human Detection with Track Point Trajectories | Jingxi Chen, Vishaal Sivakumar, Cornelia Fermüller, Yiannis Aloimonos | Current event-based human detection approaches rely on the implicit assumption of a static camera or very sparse background events to reliably detect humans. This assumption makes event-based human detection less effective in practical application scenarios, especially in robotics, where the camera itself is mounted on a moving platform and the environment can generate dense background events in addition to the events from humans. We propose an end-to-end two-stage learning approach to moving-human detection with a moving event camera. In the first stage, our approach detects candidate bounding boxes containing humans using accumulated event images. This usually yields a large number of detections, including false-positive bounding boxes from background objects. In the second stage, we refine the detections by tracking points in the event stream over extended periods. A classifier trained on the track points' trajectories, velocities, and accelerations detects human-like motion, so that false-positive detection boxes can be filtered out. This two-stage approach overcomes the sparse-background limitation of current event-based human detection methods. (A sketch of the second-stage trajectory features follows the table.) |
3075 | Towards Event Based Egomotion Using Normal Flow | Levi Burner, Chenqi Zhu, Cornelia Fermüller, Yiannis Aloimonos | Estimation of egomotion is an essential problem in computer vision, and numerous methods exist to solve it. However, significant work remains on the robustness and efficiency of existing methods. Towards this, we propose a method for estimating normal flow with an event camera that will support future research in egomotion estimation and detection of independently moving objects. While optical flow is traditionally used for these tasks, we focus on normal flow due to its compelling theoretical advantages, such as being uniquely defined. Event sensors, which measure the per-pixel time of brightness change, provide unprecedented temporal resolution, known to be essential for accurate flow estimation. However, event sensors do not measure absolute brightness, which prevents the straightforward application of existing normal flow estimation methods. Our work consists of three parts. First, we prove theoretically that the non-unique 3D surface normal of the event cloud determines the unique 2D normal flow through a simple relation. Next, we construct an efficient estimator of the surface normal using PCA with an outlier rejection step. Finally, we propose a small, 4-layer CNN to filter the normal flow and produce a reliable estimate. Our results are evaluated on the EVIMO2 dataset, the only event camera dataset with real cameras and ground-truth normal flow. (A sketch of the PCA step follows the table.) |
3088 | Towards an Improved Hyperdimensional Classifier for Event-Based Data | Neal Anwar, Chethan Parameshwara, Cornelia Fermüller, Yiannis Aloimonos | Hyperdimensional Computing (HDC) is an emerging neuroscience-inspired framework in which data of various modalities can be represented uniformly in high-dimensional space as long, redundant holographic vectors. When equipped with the proper Vector Symbolic Architecture (VSA) and applied to neuromorphic hardware, HDC-based networks have been shown to solve complex visual tasks with substantial energy-efficiency gains and increased robustness to noise compared to standard Artificial Neural Networks (ANNs). HDC has shown potential for learning from spatiotemporal data produced by neuromorphic sensors such as the Dynamic Vision Sensor (DVS), but prior work in this arena has been limited by the complexity and unconventional nature of such data and by the difficulty of choosing an appropriate VSA to hypervectorize spatiotemporal information. We present a bipolar HD encoding mechanism for spatiotemporal data that captures the contours of DVS-generated time surfaces created by moving objects: local surfaces are fitted to the time surface, individually encoded into HD vectors, and bundled into descriptive high-dimensional representations. We conclude with a sketch of the structure and training/inference pipelines of an HD classifier, built on our proposed encoding scheme and trained for the complex real-world task of pose estimation from event camera data. (A sketch of the bipolar binding and bundling primitives follows the table.) |
3124 | When Do Neuromorphic Sensors Outperform Cameras? Learning from Dynamic Features | Daniel Deniz{1}, Eduardo Ros{1}, Cornelia Fermüller{2}, Francisco Barranco{1} | Visual event sensors output data only when changes happen in the scene, and do so at very high frequency. This smartly compresses the scene and thus enables real-time operation. Despite these advantages, works in the literature have struggled to show a niche for event-driven approaches compared to conventional sensors, especially with respect to accuracy. In this work, we show a case that fully exploits the advantages of event sensors: for manipulation action recognition, learning from events achieves superior accuracy and time performance. Recognizing manipulation actions requires extracting and learning features from the hand pose and trajectory and from the interaction with the object. As shown in our work, approaches based on event sensors are the best fit for extracting these dynamic features, in contrast to conventional frame-based approaches, which mostly extract spatial features and must reconstruct the dynamics from sequences of frames. Finally, we show that by using a tracker to extract the features to be learned only around the hand, we obtain an approach that is scene- and almost object-agnostic and achieves good time performance with very limited impact on accuracy. (A sketch of this event-cropping step follows the table.) |
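Paper 3066's second stage classifies track points by their trajectories, velocities, and accelerations. Below is a minimal sketch of one plausible way to turn a tracked point into a fixed-length feature vector for such a classifier; the function name, summary statistics, and feature layout are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming (N, 2) track-point positions and (N,) timestamps;
# trajectory_features and its summary-statistics layout are illustrative.
import numpy as np

def trajectory_features(xy: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Fixed-length motion descriptor from one track point's trajectory."""
    dt = np.diff(t)[:, None]                 # (N-1, 1) time steps
    vel = np.diff(xy, axis=0) / dt           # (N-1, 2) velocities
    acc = np.diff(vel, axis=0) / dt[1:]      # (N-2, 2) accelerations
    parts = [xy - xy.mean(axis=0), vel, acc]
    # Mean/std summaries keep the vector fixed-length for any trajectory
    # length (an assumed design choice; the paper does not specify this).
    return np.concatenate([np.r_[p.mean(axis=0), p.std(axis=0)] for p in parts])

# Example: an oscillating, human-like track vs. a straight background track.
t = np.linspace(0.0, 1.0, 50)
human_like = np.c_[100 * t, 10 * np.sin(12 * t)]
background = np.c_[100 * t, 100 * t]
print(trajectory_features(human_like, t))    # distinct from...
print(trajectory_features(background, t))    # ...the rigid background motion
```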
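Paper 3075 estimates the surface normal of the event cloud with PCA plus an outlier rejection step and maps it to 2D normal flow. The sketch below assumes the standard time-surface relation (normal flow = ∇t / ‖∇t‖²) for that mapping; the authors' exact relation, thresholds, and 4-layer CNN filtering stage are not reproduced here.

```python
# A minimal sketch, assuming events as (N, 3) rows of (x, y, t) from a small
# spatiotemporal window. The mapping uses flow = grad(t) / |grad(t)|^2; the
# residual threshold and the single rejection pass are illustrative.
import numpy as np

def normal_flow_from_events(events: np.ndarray, resid_thresh: float = 1e-3):
    def plane_normal(pts):
        centered = pts - pts.mean(axis=0)
        # Eigenvector with the smallest eigenvalue: the (sign-ambiguous,
        # hence non-unique) surface normal of the local event cloud.
        _, vecs = np.linalg.eigh(np.cov(centered.T))
        return vecs[:, 0], pts.mean(axis=0)

    n, c = plane_normal(events)
    resid = np.abs((events - c) @ n)          # point-to-plane distances
    inliers = events[resid < resid_thresh]    # one outlier-rejection pass
    if len(inliers) >= 3:
        n, _ = plane_normal(inliers)

    nx, ny, nt = n
    if abs(nt) < 1e-9:
        return None                           # surface parallel to time axis
    grad = -np.array([nx, ny]) / nt           # gradient of the time surface
    g2 = grad @ grad
    if g2 < 1e-12:
        return None                           # flat time surface: speed unbounded
    return grad / g2                          # unique 2D normal flow (px/s)

# Example: events on the plane t = 0.01 * x, i.e. an edge moving along +x.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
ev = np.c_[xs.ravel(), ys.ravel(), 0.01 * xs.ravel()]
print(normal_flow_from_events(ev))            # ~ [100, 0] pixels per second
```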
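Paper 3088 builds on two bipolar-hypervector primitives: binding, to associate a patch location with the local-surface descriptor measured there, and bundling, to superimpose many bound patches into one representation. The sketch below shows these primitives under assumed codebooks and quantization; the local-surface fitting itself is not modeled here.

```python
# A minimal sketch, assuming quantized patch positions and local-surface
# slopes as the symbols to encode; dimensionality, codebooks, and the
# quantization are illustrative choices, not the authors' exact scheme.
import numpy as np

D = 10_000                                    # hypervector dimensionality
rng = np.random.default_rng(0)

def random_hv() -> np.ndarray:
    return rng.choice([-1, 1], size=D)        # random bipolar hypervector

pos_book = {p: random_hv() for p in range(16)}    # quantized patch positions
slope_book = {s: random_hv() for s in range(8)}   # quantized local slopes

def encode_patch(pos: int, slope: int) -> np.ndarray:
    # Binding (elementwise multiply) associates a position with the
    # local-surface descriptor measured there.
    return pos_book[pos] * slope_book[slope]

def bundle(hvs) -> np.ndarray:
    # Bundling (majority sign) superimposes many bound patch vectors into
    # one descriptive vector; an odd count avoids ties in the sign.
    return np.sign(np.sum(hvs, axis=0))

scene = bundle([encode_patch(p, (3 * p) % 8) for p in range(15)])
probe = encode_patch(3, 1)                    # a constituent (3*3 % 8 = 1)
print(scene @ probe / D)                      # ~0.2: well above chance (~0.0)
```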
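Paper 3124 restricts feature extraction to a tracker-supplied region around the hand, which is what makes the approach largely scene- and object-agnostic. A minimal sketch of that cropping step, assuming an (x, y, t, polarity) event array and an illustrative bounding-box format with margin:

```python
# A minimal sketch, assuming events as (N, 4) rows of (x, y, t, polarity)
# and a tracker-supplied (x0, y0, x1, y1) hand box; the box format and
# margin are illustrative assumptions.
import numpy as np

def crop_events(events: np.ndarray, box: tuple, margin: int = 8) -> np.ndarray:
    """Keep only events inside the (padded) hand bounding box."""
    x0, y0, x1, y1 = box
    keep = ((events[:, 0] >= x0 - margin) & (events[:, 0] <= x1 + margin) &
            (events[:, 1] >= y0 - margin) & (events[:, 1] <= y1 + margin))
    return events[keep]

# Example: downstream learning then sees the hand/object interaction only,
# not the background scene.
rng = np.random.default_rng(0)
ev = np.c_[rng.integers(0, 640, 5000), rng.integers(0, 480, 5000),
           np.sort(rng.random(5000)), rng.integers(0, 2, 5000)]
print(crop_events(ev, (100, 100, 140, 140)).shape)
```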