by Floris Kleyn, firstname.lastname@example.org, Leiden University, The Netherlands
Finding most similar items in large datasets is a popular task. Examples are searching for similar images, scientific articles, songs and movies. Similarity search is also called nearest neighbor search and it is computationally expensive, especially when data is high dimensional. Many algorithms exists to perform nearest neighbor search, but the performance decreases enormously when the dimensionality of the data increases: the curse of dimensionality. To overcome this, it is also possible to perform an approximate nearest neighbor search which, as opposed to exact nearest neighbor search, does not always yield in the best neighbor(s) but gives good results quicker than currently possible with exact nearest neighbor search algorithms. In this paper we propose a new algorithm for approximate nearest neighbor search and compare it with the current state of the art.