Consistent estimation of low-dimensional latent structure in high-dimensional data


We consider the problem of extracting a low-dimensional, linear latent variable structure from high-dimensional random variables. Specifically, we show that under mild conditions and when this structure manifests itself as a linear space that spans the conditional means, it is possible to consistently recover the structure using only information up to the second moments of these random variables. This finding, specialized to one-parameter exponential families whose variance function is quadratic in their means, allows for the derivation of an explicit estimator of such latent structure. This approach serves as a latent variable model estimator and as a tool for dimension reduction for a high-dimensional matrix of data composed of many related variables. Our theoretical results are verified by simulation studies and an application to genomic data.

arXiv, 1510.03497