lolR - Linear Optimal Low-Rank Projection
Supervised learning techniques designed for the situation
when the dimensionality exceeds the sample size have a tendency
to overfit as the dimensionality of the data increases. To
remedy this high dimensionality, low sample size (HDLSS)
situation, we attempt to learn a lower-dimensional
representation of the data before learning a classifier. That
is, we project the data to a space where the dimensionality
is more manageable, so that standard classification or
clustering techniques can be applied with fewer dimensions
to overfit. A number of previous works have
focused on how to strategically reduce dimensionality in the
unsupervised case, yet in the supervised HDLSS regime, few
works have attempted to devise dimensionality reduction
techniques that leverage the labels associated with the data.
In this package and the associated manuscript Vogelstein et al.
(2017) <arXiv:1709.01233>, we provide several methods for
feature extraction, some utilizing labels and some not, along
with easily extensible utilities that simplify using
cross-validation to identify the best feature extraction method.
Additionally, we include a series of adaptable benchmark
simulations to serve as a standard for future investigations
of the supervised HDLSS regime. Finally, we produce a
comprehensive comparison of the included algorithms across a
range of benchmark simulations and real data applications.