Our framework for 3D object modeling consists of six major steps: 1) multiple pairs of stereo images are captured by two calibrated cameras while the object moves freely with respect to the cameras; 2) a SIFT-based feature extraction algorithm establishes correspondences between points in each sampled stereo pair; 3) the intersection between the sets of points from two consecutive pairs of images is determined, that is, the common feature points present in both the left-right image pair at camera-object position i and the subsequent left-right image pair at camera-object position i+1 are identified; 4) the 3D coordinates of every point in this intersection are calculated; 5) the transformations between camera-object poses are estimated using the 3D coordinates of the intersection; and 6) these transformations are used to create virtual poses of the camera, which are fed into a patch-based multi-view software package to construct the 3D model of the object.
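Step 5 can be illustrated with a minimal sketch. It assumes the matched 3D points from positions i and i+1 are available as two arrays and uses the standard SVD-based least-squares fit (often called the Kabsch algorithm). The text does not say which estimator the framework actually uses, so the method and the function name `estimate_rigid_transform` are assumptions made for illustration.

```python
import numpy as np


def estimate_rigid_transform(points_a, points_b):
    """Least-squares R, t such that points_b ~= points_a @ R.T + t.

    points_a, points_b: (N, 3) arrays of matched 3D points, N >= 3,
    not all collinear. Hypothetical helper, not the authors' code.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    centroid_a = a.mean(axis=0)
    centroid_b = b.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (a - centroid_a).T @ (b - centroid_b)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection (determinant -1) in the SVD solution.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])
    rotation = vt.T @ correction @ u.T
    translation = centroid_b - rotation @ centroid_a
    return rotation, translation
```

In practice the matched points would contain outliers from wrong SIFT matches, so a robust wrapper such as RANSAC around this fit would likely be needed; that part is omitted here.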
Foreground object detection is an essential task in many image processing and image understanding algorithms, in particular for video surveillance. Background subtraction is a commonly used approach to segment foreground objects from their background. In real-world applications, temporal and spatial changes in pixel values caused by shadows, gradual or sudden changes in illumination, and similar effects make background modeling a difficult task.
In our work, we propose an algorithm for adaptive learning of multiple subspaces (ALPCA) that handles both sudden and gradual illumination variations in background subtraction.
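To make the subspace idea concrete, here is a minimal sketch of background subtraction with a single, fixed PCA subspace. It is not ALPCA: there is no adaptation and only one subspace, and the function names and the `threshold` parameter are assumptions chosen for illustration. A pixel is marked as foreground when the background subspace cannot reconstruct it well.

```python
import numpy as np


def fit_background_subspace(frames, num_components):
    """Fit a PCA subspace to (T, H, W) grayscale background frames.

    Returns the mean frame (flattened) and a (num_components, H*W) basis.
    """
    data = np.asarray(frames, dtype=float).reshape(len(frames), -1)
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centred frames.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:num_components]


def foreground_mask(frame, mean, basis, threshold):
    """Mark pixels whose reconstruction error exceeds `threshold`."""
    x = np.asarray(frame, dtype=float).ravel() - mean
    # Project onto the background subspace and back.
    reconstruction = basis.T @ (basis @ x)
    residual = np.abs(x - reconstruction)
    return (residual > threshold).reshape(np.shape(frame))
```

A global brightness change moves the frame along a direction the subspace already spans, so it leaves a small residual, while a new object does not. Handling sudden changes that the training frames never covered is where an adaptive, multi-subspace scheme would come in.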
Human motion capture has numerous applications in human-robot interaction, law enforcement, surveillance, entertainment, sports, medicine, etc. Various methods have been developed to date, and they can be categorized as marker-based or markerless; articulated model-based or appearance-based; single-view or multiple-view; and so on. Marker-based methods are the simplest and therefore the most successful so far. However, it is not always possible to attach markers to human subjects, so markerless approaches are the more general and desirable option.
In our work, we explore a markerless method for human motion capture. We propose a method based on Bayesian estimation that falls into the single-view, articulated model-based category. The estimator, derived from particle filters, was extended to a hierarchical model by introducing a new coarse-to-fine framework that addresses the computational complexity inherent to particle filters.
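For readers unfamiliar with particle filters, the sketch below shows one predict-weight-resample cycle of a plain bootstrap filter on a 1D state. It is a generic illustration, not the hierarchical coarse-to-fine estimator described above; the random-walk motion model, the Gaussian observation model, and all names are assumptions.

```python
import math
import random


def particle_filter_step(particles, observation, motion_std, obs_std, rng):
    """One predict-weight-resample cycle for a 1D state.

    Assumes random-walk motion and a Gaussian observation model.
    `rng` is a random.Random instance.
    """
    # Predict: diffuse each particle with the motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: likelihood of the observation given each particle.
    weights = [
        math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted
    ]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))
```

The number of particles needed to cover the state space grows quickly with its dimension, which is severe for an articulated body model with many joint angles. This is the computational cost that a coarse-to-fine hierarchy aims to reduce.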
Any control system using visual-sensory feedback loops falls into one of four categories. These categories are derived from choices made regarding two criteria: the coordinate space of the error function, and the hierarchical structure of the control system. These choices determine whether the system is position-based or image-based, and whether it is a dynamic look-and-move system or a direct visual servo.
In our work, we present an image-based, dynamic look-and-move visual servoing system. The difference between our approach and other popular ones is the use of a quaternion representation, which eliminates the potential singularities introduced by a rotation matrix representation.
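As a small illustration of working with orientations as quaternions, the sketch below computes the rotation that takes a current orientation to a desired one. The text does not describe how the quaternion enters the servoing error function, so this formulation, the (w, x, y, z) ordering, and the function names are assumptions.

```python
import math


def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` rad about `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def quat_multiply(q, r):
    """Hamilton product q * r."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )


def quat_conjugate(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotation_error(q_current, q_desired):
    """Quaternion that rotates the current orientation onto the desired one."""
    return quat_multiply(q_desired, quat_conjugate(q_current))
```

For example, with a current orientation of 30 degrees about the z axis and a desired orientation of 90 degrees about the same axis, `rotation_error` yields the quaternion for a 60 degree rotation about z.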