Targeted projection pursuit is a type of statistical technique used for exploratory data analysis, information visualization, and feature selection. It allows the user to interactively explore very complex data (typically having tens to hundreds of attributes) to find features or patterns of potential interest.
Conventional, or 'blind', projection pursuit finds the most "interesting" possible projections in multidimensional data, using a search algorithm that optimizes some fixed criterion of "interestingness", such as deviation from a normal distribution. In contrast, targeted projection pursuit allows the user to explore the space of projections by manipulating data points directly in an interactive scatter plot.
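The article does not say how a projection is derived from the user's manipulation of the points. As a minimal sketch, assume the user's dragging defines a target 2-D position for each point, and that a linear projection is then fitted to bring the displayed view as close to that target as possible. The function names `project`, `view_error` and `fit_projection`, the squared-error criterion, and the use of plain gradient descent are all assumptions made for illustration. They are not taken from the WEKA package or from the cited paper.

```python
import random


def project(X, P):
    """Map each p-attribute row of X to 2-D view coordinates through the p x 2 matrix P."""
    return [[sum(row[k] * P[k][j] for k in range(len(P))) for j in range(2)]
            for row in X]


def view_error(view, target):
    """Mean squared distance between the current view and the target view."""
    return sum((v[j] - t[j]) ** 2
               for v, t in zip(view, target) for j in range(2)) / len(view)


def fit_projection(X, target, steps=1000, lr=0.1):
    """Gradient descent on view_error, starting from the all-zero projection.

    Assumed fitting procedure for illustration only.
    """
    n, p = len(X), len(X[0])
    P = [[0.0, 0.0] for _ in range(p)]
    for _ in range(steps):
        view = project(X, P)  # point positions under the current projection
        for k in range(p):
            for j in range(2):
                # Partial derivative of view_error with respect to P[k][j].
                grad = 2.0 / n * sum((view[i][j] - target[i][j]) * X[i][k]
                                     for i in range(n))
                P[k][j] -= lr * grad
    return P


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 40 points, 5 attributes; the two groups differ in attribute 0.
    groups = [i % 2 for i in range(40)]
    X = [[(1.0 if g else -1.0) + rng.uniform(-0.5, 0.5)]
         + [rng.uniform(-1, 1) for _ in range(4)] for g in groups]
    # Hypothetical target view: the user drags one group left and the other right.
    target = [[2.0, 0.0] if g else [-2.0, 0.0] for g in groups]
    P = fit_projection(X, target)
    print("remaining error:", view_error(project(X, P), target))
```

In this sketch the fixed "interestingness" criterion of blind projection pursuit is replaced by the distance to a view the user has asked for, so the search is steered by the user rather than by a preset index. The magnitudes of the fitted weights in `P` also hint at which attributes contribute most to the requested view.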
Targeted projection pursuit has found applications in DNA microarray data analysis, protein sequence analysis, graph layout, and digital signal processing. It is available as a package for the WEKA machine learning toolkit.
Further reading
* Joe Faith (2007). "Targeted Projection Pursuit for Interactive Exploration of High-Dimensional Data Sets". ''Proceedings of 11th International Conference on Information Visualisation''.
External links
* imDEV: free Excel add-in for targeted projection pursuit using feature selection coupled with PLS and PLS-DA
* Targeted Projection Pursuit project page