ECCV 2014 - LNCS 8689-8695

Stixmantics: A Medium-Level Model for Real-Time Semantic Scene Understanding

Timo Scharwächter¹,², Markus Enzweiler¹, Uwe Franke¹, and Stefan Roth²

¹Environment Perception, Daimler R&D, Sindelfingen, Germany

²Department of Computer Science, TU Darmstadt, Germany

Abstract. In this paper we present Stixmantics, a novel medium-level scene representation for real-time visual semantic scene understanding. Relevant scene structure, motion and object class information is encoded using so-called Stixels as primitive elements. Sparse feature-point trajectories are used to estimate the 3D motion field and to enforce temporal consistency of semantic labels. Spatial label coherency is obtained by using a CRF framework.

The proposed model abstracts and aggregates low-level pixel information to gain robustness and efficiency, yet retains enough flexibility to adequately model complex scenes, such as urban traffic. Our experimental evaluation focuses on semantic scene segmentation using a recently introduced dataset for urban traffic scenes. Compared with our best baseline approach, we demonstrate state-of-the-art performance while reducing inference time by a factor of more than 2,000, requiring only 50 ms per image.

Keywords: semantic scene understanding, bag-of-features, region classification, real-time, stereo vision, stixels

LNCS 8693, p. 533 ff.
