The dissertation, "Content-Based Retrieval of Visual Information," was chosen as the best thesis from the computer science department (LIACS) and was therefore nominated for the best thesis award at Leiden University. The research was guided by Prof. Joost N. Kok and Dr. Michael S. Lew.
In his PhD thesis, Ard Oerlemans makes three fundamental scientific contributions: the MOD paradigm; constructed texture patterns; and multi-dimensional maximum likelihood similarity measures.
In the field of visual information retrieval, the most challenging problem of the past decade has been visual concept detection, where the computer is asked to automatically annotate an image with relevant keywords. On a fundamental level, this means that the computer is being endowed with semantically meaningful vision: when it sees a beach, building, face, or sunset, it can recognize the concept from the pictorial content as a human would. Until recently, this problem was considered nearly impossible, as it would seem to require solving the problem of artificial intelligence as well. His first contribution is a new algorithm for visual concept detection using salient points. In his thesis, he proposes a new paradigm called MOD, in which salient points are selected by maximizing a distinctiveness criterion. The MOD approach was tested on the most challenging international scientific test sets and was found to give significantly improved results over the top method from the research literature in visual concept detection. Furthermore, unlike all of the competing research on salient points for image understanding, his work generalizes to any type of pictorial content and imagery.
The second contribution is in computationally efficient texture descriptors. Texture features are arguably the most dominant visual feature used in computer vision. Currently, the most prevalent texture feature in the research literature is local binary patterns (LBP), a 256-dimensional feature descriptor composed of 3x3 patterns. It has repeatedly been found to have high accuracy, but it also requires significant computational effort to use. In his thesis, he found that it was possible to construct larger patterns with accuracy similar to LBP while requiring only 2 dimensions instead of 256. His contribution increases the computational efficiency of texture classification a hundredfold and reduces the required memory similarly.
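To make the size of the baseline descriptor concrete, the following is a minimal sketch of the basic 3x3 LBP histogram, not the constructed patterns proposed in the thesis. The function names, the neighbour ordering, and the use of `>=` for the threshold are assumptions chosen for illustration; the image is assumed to be a grayscale list of rows.

```python
from typing import List

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# This ordering is a convention assumed for this sketch.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of the 3x3 patch centred at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # Each neighbour contributes one bit: 1 if it is at least as bright
        # as the centre pixel, 0 otherwise.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """Count LBP codes over all interior pixels: a 256-bin descriptor."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

Eight binary comparisons yield 2^8 = 256 possible codes, which is where the 256 dimensions come from. Comparing two such histograms touches 256 bins per image region, whereas a 2-dimensional descriptor needs only two values, which is consistent with the roughly hundredfold saving described above.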
The third contribution is in multi-dimensional maximum likelihood matching of visual imagery. Currently, in the field of computer vision, the most prevalent similarity measure is the sum of squared differences (SSD, also referred to as L2 or Euclidean distance). Using the theory of maximum likelihood, he argues that the SSD is optimal only under specific assumptions about the noise distribution, in particular that the noise is Gaussian. In his thesis, he examines the common computer vision problem of two-view matching, where one searches for the point in one image that corresponds to a given point in another. With the correspondences, one can then directly compute the 3D structure of the image contents and/or the motion of the camera. Two-view matching has arguably been the most important and challenging problem in computer vision over the past two decades. He finds that the noise distributions are not Gaussian. Furthermore, he investigates several approaches to estimating the true noise distribution and finds that the single-dimensional formulation rests on a second fundamental assumption: that the difference (x - y) contains all the information necessary to model the similarity distribution. He finds that the multi-dimensional approach overcomes this second assumption and gives significantly improved results on the most credible and respected international test sets in two-view matching. Overall, his work advances the theory of computational similarity in fundamental ways that can potentially improve all areas of pattern recognition and computer vision.
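The link between the noise model and the similarity measure can be illustrated with a small sketch. It covers only the single-dimensional formulation, in which each difference x - y is treated as an independent noise sample; it is not the multi-dimensional method of the thesis. All function names, the Laplacian alternative, and the toy vectors are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def ssd(x: Vector, y: Vector) -> float:
    """Sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sad(x: Vector, y: Vector) -> float:
    """Sum of absolute differences (L1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def gaussian_nll(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Negative log-likelihood of x - y under i.i.d. Gaussian noise.

    This is SSD scaled by a constant plus a constant offset, so it ranks
    candidates exactly as SSD does.
    """
    offset = len(x) * math.log(sigma * math.sqrt(2 * math.pi))
    return ssd(x, y) / (2 * sigma ** 2) + offset


def laplacian_nll(x: Vector, y: Vector, b: float = 1.0) -> float:
    """Negative log-likelihood of x - y under i.i.d. Laplacian noise.

    Here the data-dependent term is the L1 distance rather than SSD.
    """
    return sad(x, y) / b + len(x) * math.log(2 * b)


def best_match(query: Vector, candidates: List[Vector],
               cost: Callable[[Vector, Vector], float]) -> int:
    """Index of the candidate with the lowest cost, i.e. highest likelihood."""
    return min(range(len(candidates)), key=lambda i: cost(query, candidates[i]))


# Toy data: candidate 0 agrees with the query except for one large outlier,
# candidate 1 is slightly off everywhere.
query = [0.0, 0.0, 0.0, 0.0]
candidates = [[0.0, 0.0, 0.0, 3.0], [1.0, 1.0, 1.0, 1.0]]
```

With these toy vectors, the Gaussian model (equivalently SSD) prefers candidate 1, while the heavier-tailed Laplacian model prefers candidate 0. The choice of noise distribution therefore changes which match is considered most likely, which is why estimating the true distribution matters.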