Tutorials
Saturday, November 1
Mining Image and Video Data
Organizers: Junsong Yuan, Nanyang Technological University, Singapore; Ying Wu, Northwestern University, USA
Time: 0900-1230, November 1
Location: Global Learning Room, Stephen Riady Centre at U Town, NUS
Abstract
The previous success in mining structured data (e.g., transaction data) and semi-structured data (e.g., text) has aroused interest in mining meaningful patterns in unstructured multimedia data such as images and videos. Although the discovery of visual patterns from images and videos is exciting, data mining techniques that succeed on business and text data cannot simply be applied to image and video data, which contain high-dimensional features and have spatial or spatio-temporal structure. Unlike transaction and text data, which are composed of discrete elements with little ambiguity (i.e., predefined items and vocabularies), visual patterns generally exhibit large variability in their visual appearance and thus challenge existing data mining and pattern discovery algorithms. This tutorial will discuss the state of the art of image and video data mining and provide in-depth studies of some recently developed techniques. The topics cover co-occurrence visual pattern discovery, context-aware clustering of visual primitives, and topic models for pattern discovery, as well as their applications in image search and recognition, scene understanding, video summarization and anomaly detection, intelligent video surveillance, etc.
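To make the contrast with transaction mining concrete, here is a minimal sketch (illustrative only, not code from the tutorial) of the naive transaction-style baseline: local descriptors are quantized into "visual words" and frequently co-occurring word pairs are mined. All data and parameters below are toy assumptions.

```python
# Minimal sketch: mine co-occurring "visual word" pairs from local features.
# Toy random descriptors stand in for real local features such as SIFT.
import numpy as np
from collections import Counter
from itertools import combinations

rng = np.random.default_rng(0)

# Fake local descriptors for 50 images, 30 features each (128-D, SIFT-like).
images = [rng.normal(size=(30, 128)) for _ in range(50)]

# Quantize descriptors into K visual words with a few k-means iterations.
K = 20
all_desc = np.vstack(images)
centers = all_desc[rng.choice(len(all_desc), K, replace=False)]
for _ in range(10):
    labels = np.argmin(((all_desc[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(labels == k):
            centers[k] = all_desc[labels == k].mean(0)

# Count word-pair co-occurrences within each image ("transaction").
pair_counts = Counter()
start = 0
for img in images:
    words = set(labels[start:start + len(img)])
    start += len(img)
    pair_counts.update(combinations(sorted(words), 2))

# Frequent pairs are candidate visual patterns.
print(pair_counts.most_common(5))
```

The appearance variability discussed above surfaces here as quantization ambiguity: similar patterns can fall into different words, which is exactly the weakness the tutorial's techniques are designed to address.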
Riemannian Geometry in Computer Vision
Organizers: Fatih Porikli, NICTA and Australian National University; Mehrtash Harandi, NICTA, Australia; Conrad Sanderson, NICTA, Australia
Time: 1330-1700, November 1
Location: LT53, Stephen Riady Centre at U Town, NUS
Abstract
In computer vision, it is common practice to adopt constraints and models that make a problem tractable. Traces of such simplifying assumptions appear everywhere, from the front end (the pinhole camera model or the Lambertian reflectance model) to data modelling and decision making. One of the biggest, yet unjustified, assumptions in computer vision is the notion of flat (non-curved) spaces: most of the time we solve our tasks in the traditional Euclidean space.
Recent research in machine learning and computer vision shows that improved discrimination accuracy (lower error rates) can be achieved by explicitly taking into account the curved nature of many representations. This includes applications such as texture classification, action recognition, video-based face recognition, person re-identification, and object tracking.
The use of Riemannian geometry to handle curved spaces has been fundamental in physics since Einstein, and perhaps earlier. Traditionally, 'manifold-learning' methods have been at the forefront of applications where an analytical characterisation of non-flat spaces cannot be found. In the past few years, computer vision researchers have made significant advances in the analytical and geometric understanding of non-flat spaces. This marks an important development in computer vision: a move away from purely data-driven approaches towards incorporating more prior information via geometry-based methods. This tutorial will address the key points in developing efficient learning methods and tools for image/video content analysis using Riemannian manifolds.
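As a small illustration of computing on a curved space (a sketch on assumed toy data, not code from the tutorial): symmetric positive-definite matrices such as region covariance descriptors form a Riemannian manifold, and the log-Euclidean metric respects that curvature where the flat Frobenius distance does not.

```python
# Minimal sketch: compare region covariance descriptors on the SPD manifold
# using the log-Euclidean metric, versus the (flat) Euclidean distance.
import numpy as np

def spd_log(X):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, U = np.linalg.eigh(X)
    return (U * np.log(w)) @ U.T

def log_euclidean_dist(X, Y):
    """Manifold distance d(X, Y) = ||log(X) - log(Y)||_F."""
    return np.linalg.norm(spd_log(X) - spd_log(Y), "fro")

rng = np.random.default_rng(0)

# Toy covariance descriptors built from random feature patches (d = 5).
def random_cov(n=200, d=5):
    F = rng.normal(size=(n, d))
    return np.cov(F, rowvar=False)

A, B = random_cov(), random_cov()
print("Euclidean:    ", np.linalg.norm(A - B, "fro"))
print("log-Euclidean:", log_euclidean_dist(A, B))
```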
Color Transfer
Organizers: Sira Ferradans, Duke University; Marcelo Bertalmio, Universitat Pompeu Fabra
Time: 1330-1700, November 1
Location: SR7, Stephen Riady Centre at U Town, NUS
Abstract
Color transfer is the problem of imposing the color palette of an image I1 on an image I2 without changing the spatial geometry of I2. The target color distribution or characteristics may be predefined, such as an equalized histogram or an image without a colored illuminant. Both problems are computationally challenging, since they must take into account both color and spatial-domain information.
This tutorial will present a state-of-the-art, unifying perspective on the color transfer problem between two or more images. We will also relate it to other important problems such as illuminant change and object recolorization. Our goal is to give a rigorous treatment that makes it possible to understand the challenges of these problems, how the different assumptions made in the literature affect the final result, and how these problems are related to one another. Moreover, after the tutorial the audience should have an intuition of when and why the different algorithms in the literature can be applied, and of the issues that remain to be solved.
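As a concrete baseline, here is a minimal sketch of global color transfer by matching per-channel means and standard deviations, in the spirit of Reinhard et al.'s statistics matching; the data and function names below are illustrative, not the tutorial's code.

```python
# Minimal sketch of global color transfer: match each channel's mean and
# standard deviation of the target image to those of the source palette.
import numpy as np

def transfer_palette(I1, I2, eps=1e-8):
    """Impose the per-channel color statistics of I1 onto I2.

    I1, I2: float arrays of shape (H, W, 3) with values in [0, 1].
    """
    mu1, std1 = I1.mean(axis=(0, 1)), I1.std(axis=(0, 1))
    mu2, std2 = I2.mean(axis=(0, 1)), I2.std(axis=(0, 1))
    out = (I2 - mu2) / (std2 + eps) * std1 + mu1
    return np.clip(out, 0.0, 1.0)

# Toy usage with random "images"; real code would load RGB files
# (and often work in a decorrelated color space rather than RGB).
rng = np.random.default_rng(0)
palette_img = rng.random((64, 64, 3)) * 0.5        # dark palette source
content_img = rng.random((64, 64, 3)) * 0.5 + 0.5  # bright content image
result = transfer_palette(palette_img, content_img)
print(result.mean(axis=(0, 1)), palette_img.mean(axis=(0, 1)))
```

Such global methods ignore spatial information entirely, which is precisely why the spatially aware formulations surveyed in the tutorial are needed.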
Sunday, November 2
Advanced Sparse Representation Models for Image and Video Analysis
Organizers: Shenghua Gao, ShanghaiTech University, China; Kui Jia, Advanced Digital Sciences Center, Singapore; Tianzhu Zhang, Advanced Digital Sciences Center, Singapore; Weisheng Dong, Xidian University, China
Time: 0900-1230, November 2
Location: LT53, Stephen Riady Centre at U Town, NUS
Abstract
The tutorial will start by introducing the basic theory of sparse/low-rank recovery, with generalizations to other low-complexity structures in vector/matrix spaces. Following the introduction of the basic theory, we will present algorithms that cope well with the non-smoothness and scale issues commonly occurring in the large-scale optimization problems of sparsity/low-rank models. The tutorial will continue by presenting very recent breakthroughs in image processing. These results are essentially obtained by properly harnessing the rich low-dimensional structures prevailing in natural images, using carefully designed sparsity/low-rank tools, for various low-level vision tasks including image denoising, deblurring, super-resolution, and compressive sensing. In the third session of this tutorial, we will present striking results recently obtained in computer vision research, covering a variety of mainstream vision applications ranging from face/object recognition, object alignment, feature correspondence/matching, and tracking to unsupervised object discovery and ambiguous learning. We will introduce the nature of each of these problems and explain how sparsity/low-rank models can be designed to exploit that nature and achieve striking performance.
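As a small taste of the optimization machinery, the following sketch (illustrative, not the tutorial's code) solves the lasso formulation of sparse coding with ISTA, a classic shrinkage-thresholding method for exactly the kind of non-smooth objective mentioned above; the dictionary and signal are randomly generated toys.

```python
# Minimal sketch of sparse recovery: ISTA for the lasso problem
#   min_x  0.5 * ||D @ x - y||^2 + lam * ||x||_1,
# a prototypical non-smooth sparsity objective.
import numpy as np

def ista(D, y, lam=0.1, n_iter=500):
    """Iterative shrinkage-thresholding for sparse coding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)           # gradient of the smooth part
        z = x - grad / L                   # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 50, 120, 5                       # measurements, atoms, sparsity
D = rng.normal(size=(n, m)) / np.sqrt(n)   # random dictionary
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.normal(size=k)
y = D @ x_true + 0.01 * rng.normal(size=n)

x_hat = ista(D, y, lam=0.05)
print("nonzeros recovered:", np.sum(np.abs(x_hat) > 1e-3))
```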
Metric Learning for Visual Recognition
Organizers: Jiwen Lu, Advanced Digital Sciences Center, Singapore; Ruiping Wang, Chinese Academy of Sciences; Weihong Deng, Beijing University of Posts and Telecommunications
Time: 1330-1700, November 2
Location: LT53, Stephen Riady Centre at U Town, NUS
Abstract
One of the fundamental issues in visual recognition is how to measure the similarity, or compute the distance, between pairs of examples. While the conventional distance metrics are convenient and well-defined, they ignore the fact that the semantic meaning of "similarity" is inherently task-dependent and data-dependent. This simple observation has led to the idea that the distance metric should adapt to better suit the problem. With this strategy, even simple classifiers can be competitive with the state of the art, because the distance measure locally adapts to the structure of the data. During the past two decades, we have witnessed metric learning techniques significantly improve the state of the art on many important visual recognition tasks. In this tutorial, we will overview the trends in metric learning and discuss how these techniques advance different visual recognition tasks. First, we briefly introduce the basic concept of metric learning and show how it has been used to improve the performance of different visual recognition tasks in previous work. Second, we introduce some of our newly proposed metric learning methods from two aspects, single-metric learning and multi-metric learning, which differ in whether the learned metric is a single unified one or changes smoothly across different regions of the feature space. Lastly, we will present how these proposed metric learning methods improve different computer vision tasks and discuss some open problems, to understand how to develop more advanced metric learning algorithms for visual recognition in the future.
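To make the single-metric idea concrete, here is a minimal sketch (a generic RCA-style baseline, not the presenters' method) of a Mahalanobis metric estimated from the within-class scatter, so that directions with large within-class variation contribute less to the distance; the two-class data are synthetic.

```python
# Minimal sketch of single-metric learning: estimate a Mahalanobis metric
# M = S_w^{-1} from the within-class scatter of labeled data.
import numpy as np

def learn_metric(X, y, reg=1e-3):
    """Return M defining d(a, b)^2 = (a - b)^T M (a - b)."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(0)
        Sw += Xc.T @ Xc
    Sw /= len(X)
    return np.linalg.inv(Sw + reg * np.eye(d))

def mahalanobis(a, b, M):
    diff = a - b
    return float(np.sqrt(diff @ M @ diff))

# Toy data: two classes separated along axis 0, noisy along axis 1.
rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], [0.2, 2.0], size=(100, 2))
X1 = rng.normal([1, 0], [0.2, 2.0], size=(100, 2))
X, y = np.vstack([X0, X1]), np.r_[np.zeros(100), np.ones(100)]

M = learn_metric(X, y)
a, b = X0[0], X1[0]
print("Euclidean:  ", np.linalg.norm(a - b))
print("Mahalanobis:", mahalanobis(a, b, M))
```

The learned metric downweights the noisy axis, so even a nearest-neighbor classifier using it would separate the classes better than the plain Euclidean distance, which is the effect described above.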
Essence of Geometric Algebra
Organizers: Kenichi Kanatani, Professor Emeritus, Okayama University
Time: 1330-1700, November 2
Location: SR2, Stephen Riady Centre at U Town, NUS
Abstract
I introduce “geometric algebra”, which has recently been attracting the attention of many computer vision and graphics researchers. Many books and articles on geometric algebra start with definitions of symbols and terminology, followed by identities and relationships among them. This often makes beginners shy away. My tutorial takes an alternative approach: the emphasis is on the background mathematics, including the Hamilton algebra, the Grassmann algebra, and the Clifford algebra. In the end, I show how these are combined into geometric algebra.
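For a taste of that background mathematics, here is a minimal sketch (toy code, not from the tutorial) of the Hamilton algebra in action: quaternion multiplication and the rotation p ↦ q p q* of a 3-D point.

```python
# Minimal sketch of the Hamilton algebra (quaternions), one of the building
# blocks combined into geometric algebra: rotating a 3-D point via q p q*.
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotate(p, axis, angle):
    """Rotate point p about a unit axis by angle (radians)."""
    axis = np.asarray(axis) / np.linalg.norm(axis)
    q = np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])
    q_conj = q * np.array([1, -1, -1, -1])
    return qmul(qmul(q, np.concatenate([[0.0], p])), q_conj)[1:]

# A 90-degree rotation of (1, 0, 0) about the z-axis gives (0, 1, 0).
print(rotate(np.array([1.0, 0.0, 0.0]), [0, 0, 1], np.pi / 2))
```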
To illustrate the close connection to computer vision applications, I also describe the imaging geometry of fisheye lenses and of omnidirectional cameras using parabolic, hyperbolic, and elliptic mirrors, which play increasingly important roles in computer vision and robotics as their prices fall.