Iterative random projections for high-dimensional data clustering
Keywords: Clustering, K-means, High-dimensional data, Random projections
Authors: Ângelo Cardoso, Andreas Wichert
Journal: Pattern Recognition Letters
Pages: 1749–1755
HEIDI (High Dimensional Indexing), or how to index one billion vectors
Traditional indexing of multimedia data leads to a dilemma: either the number of features has to be reduced, leaving the quality of the results unsatisfactory, or approximate queries are performed, introducing a relative error during retrieval. The promise of the recently introduced subspace-tree [1-5] is logarithmic retrieval complexity for extremely high-dimensional features. The subspace-tree suggests that the conjectured "curse of dimensionality" does not hold in this setting. The search in such a structure starts in the subspace with the lowest dimension, where the set of all possibly similar objects is determined. In the next subspace, additional metric information corresponding to a higher dimension is used to reduce this set, and the process is repeated until the original space is reached. The theoretical estimate of the temporal complexity of the subspace-tree is logarithmic for evenly distributed data. Consequently, depending on the distribution of the data, we have to choose an ideal projection into the subspaces, leading to an ideal hierarchy. In this project we will perform experiments on different databases, with dimensionality up to several thousand and size up to 1 billion vectors.
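The coarse-to-fine filtering idea above can be sketched in a few lines. This is a minimal illustration, not the subspace-tree implementation itself: it assumes the subspaces are simply prefixes of the feature vector (the first 4, 16, then all 64 coordinates), so that the distance in each subspace is a lower bound on the full Euclidean distance and pruning never discards a true match. The names `filter_search` and `dist_prefix` are illustrative, not from the cited papers.

```python
import math
import random

random.seed(0)
DIM = 64
# Toy database: 500 random 64-dimensional vectors plus one query.
data = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]
query = [random.gauss(0, 1) for _ in range(DIM)]

def dist_prefix(a, b, d):
    # Euclidean distance restricted to the first d coordinates
    # (an orthogonal projection onto a d-dimensional subspace).
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(d)))

def filter_search(data, query, dims, radius):
    # Coarse-to-fine search: start in the lowest-dimensional subspace,
    # keep only candidates within `radius`, then refine the surviving
    # set with the metric information of each higher-dimensional subspace.
    # Because each projected distance lower-bounds the full distance,
    # pruning produces no false dismissals.
    candidates = list(range(len(data)))
    for d in dims:
        candidates = [i for i in candidates
                      if dist_prefix(data[i], query, d) <= radius]
    return candidates

result = filter_search(data, query, [4, 16, DIM], radius=8.0)
# Sanity check: identical to a brute-force scan in the full space.
brute = [i for i in range(len(data))
         if dist_prefix(data[i], query, DIM) <= 8.0]
assert result == brute
```

Most candidates are eliminated cheaply in the 4- and 16-dimensional subspaces, so the expensive full-dimensional distance is computed for only a small surviving set; the actual subspace-tree chooses the projections and hierarchy to make this reduction as effective as possible for the given data distribution.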