Targeted projection pursuit is a type of statistical technique used for exploratory data analysis, information visualization, and feature selection. It allows the user to interactively explore very complex data (typically having tens to hundreds of attributes) to find features or patterns of potential interest.
Conventional, or 'blind', projection pursuit finds the most "interesting" possible projections in multidimensional data, using a search algorithm that optimizes some fixed criterion of "interestingness", such as deviation from a normal distribution. In contrast, targeted projection pursuit allows the user to explore the space of projections by manipulating data points directly in an interactive scatter plot.
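The contrast between the two approaches can be sketched in plain Python. This is an illustration under stated assumptions, not Faith's implementation or the WEKA package. The blind search uses the absolute excess kurtosis of a one-dimensional projection as its "interestingness" index (one simple measure of deviation from normality) and plain random search over unit directions. The targeted version assumes the positions the user dragged points to are available as a target matrix T and fits a linear projection P by ordinary least squares. All function names and the synthetic data are hypothetical.

```python
import math
import random

# --- Blind projection pursuit (illustrative index, not Faith's) ---------------

def excess_kurtosis(values):
    """Sample excess kurtosis: about 0 for normally distributed values."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / var ** 2 - 3.0

def project(X, w):
    """Project every row of X onto the direction w (a dot product per row)."""
    return [sum(x * c for x, c in zip(row, w)) for row in X]

def blind_pursuit(X, trials=1000, seed=0):
    """Random search for the unit direction whose 1-D projection has the
    largest |excess kurtosis|, i.e. deviates most from a normal shape."""
    rng = random.Random(seed)
    d = len(X[0])
    best_w, best_score = None, -1.0
    for _ in range(trials):
        w = [rng.gauss(0, 1) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]  # normalise to a unit direction
        score = abs(excess_kurtosis(project(X, w)))
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# --- Targeted projection pursuit (assumed least-squares formulation) ---------

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def targeted_pursuit(X, T):
    """Fit a d x 2 projection P minimising ||X P - T||^2, where T holds the
    2-D positions the user dragged the points to (via the normal equations)."""
    d = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    cols = []
    for k in range(2):  # one least-squares fit per screen axis
        Xty = [sum(row[i] * t[k] for row, t in zip(X, T)) for i in range(d)]
        cols.append(solve(XtX, Xty))
    return [[cols[0][i], cols[1][i]] for i in range(d)]

# --- Toy usage on synthetic data (hypothetical) ------------------------------

rng = random.Random(1)
X, T = [], []
for i in range(200):
    side = -1.0 if i < 100 else 1.0  # two clusters separated along attribute 0
    X.append([2.0 * side + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1)])
    T.append([side, 0.0])  # the "user" drags one cluster left, the other right

w, score = blind_pursuit(X)
P = targeted_pursuit(X, T)
print("blind direction:", [round(c, 2) for c in w], "index:", round(score, 2))
print("targeted projection rows:", [[round(v, 2) for v in row] for row in P])
```

On this toy data both approaches should weight attribute 0 most heavily. The difference is in who decides what is interesting: the blind search optimizes its fixed index on its own, while the targeted fit only follows where the user moved the points. The least-squares fit is an assumption made for brevity; orthonormalizing P, adding regularization, or using a different fitting method would be equally plausible choices.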
Targeted projection pursuit has found applications in DNA microarray data analysis, protein sequence analysis, graph layout and digital signal processing. It is available as a package for the WEKA machine learning toolkit.
Further reading
* Joe Faith (2007). "Targeted Projection Pursuit for Interactive Exploration of High-Dimensional Data Sets". ''Proceedings of 11th International Conference on Information Visualisation''.
External links
* imDEV: free Excel add-in for targeted projection pursuit using feature selection coupled with PLS and PLS-DA
* Targeted Projection Pursuit project page