Multiscale principal component analysis
thesisposted on 2016-02-09, 10:10 authored by Ayodeji Akinwumi Akinduko
The problem of approximating multidimensional data with objects of lower dimension is a classical problem in complexity reduction. It is important that data approximation capture the structure(s) and dynamics of the data, however distortion to data by many methods during approximation implies that some geometric structure(s) of the data may not be preserved during data approximation. For methods that model the manifold of the data, the quality of approximation depends crucially on the initialization of the method. The first part of this thesis investigates the effect of initialization on manifold modelling methods. Using Self Organising Maps (SOM) as a case study, we compared the quality of learning of manifold methods for two popular initialization methods; random initialization and principal component initialization. To further understand the dynamics of manifold learning, datasets were further classified into linear, quasilinear and nonlinear. The second part of this thesis focuses on revealing geometric structure(s) in high dimension data using an extension of Principal Component Analysis (PCA). Feature extraction using (PCA) favours direction with large variance which could obfuscate other interesting geometric structure(s) that could be present in the data. To reveal these intrinsic structures, we analysed the local PCA structures of the dataset. An equivalent definition of PCA is that it seeks subspaces that maximize the sum of pairwise distances of data projection; extending this definition we define localization in term of scale as maximizing the sum of weighted squared pairwise distances between data projections for various distributions of weights (scales). Since for complex data various regions of the dataspace could have different PCA structures, we also define localization with regards to dataspace. The resulting local PCA structures were represented by the projection matrix corresponding to the subspaces and analysed to reveal some structures in the data at various localizations.
Date of award2016-02-01
Author affiliationDepartment of Mathematics
Awarding institutionUniversity of Leicester