Targeted projection pursuit is a statistical technique used for exploratory data analysis, information visualization, and feature selection. It allows the user to interactively explore very complex data (typically having tens to hundreds of attributes) to find features or patterns of potential interest. Conventional, or 'blind', projection pursuit finds the most "interesting" possible projections in multidimensional data, using a search algorithm that optimizes some fixed criterion of "interestingness", such as deviation from a normal distribution. In contrast, targeted projection pursuit allows the user to explore the space of projections by manipulating data points directly in an interactive scatter plot.

Targeted projection pursuit has found applications in DNA microarray data analysis, protein sequence analysis, graph layout and digital signal processing. It is available as a package for the WEKA machine learning toolkit.
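
The contrast between the two approaches can be illustrated with a minimal Python sketch, assuming NumPy is available. The function names `blind_pursuit` and `targeted_pursuit` are hypothetical and are not part of the WEKA package. The blind variant uses a plain random search scored by absolute excess kurtosis as one possible measure of deviation from normality. The targeted variant uses an ordinary least-squares fit as a stand-in for whatever optimisation the actual software performs, and the "target" view simulates a user dragging each class of points towards a chosen screen position.

```python
import numpy as np


def blind_pursuit(X, n_trials=2000, seed=0):
    """Random search for a 1-D projection that deviates most from normality.

    "Interestingness" is scored here as |excess kurtosis|, which is only
    one of many possible fixed criteria (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # centre the data
    best_w, best_score = None, -np.inf
    for _ in range(n_trials):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)              # unit-length direction
        z = Xc @ w
        z = z / z.std()                     # standardise the projected values
        score = abs(np.mean(z ** 4) - 3.0)  # a normal distribution scores 0
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score


def targeted_pursuit(X, target):
    """Linear projection P (d x 2) whose view best matches a target layout.

    Solves min_P ||Xc @ P - Tc|| by least squares; this is a stand-in,
    not necessarily the optimisation used by the original software.
    """
    Xc = X - X.mean(axis=0)
    Tc = target - target.mean(axis=0)
    P, *_ = np.linalg.lstsq(Xc, Tc, rcond=None)
    return P


# Synthetic data: three classes in 20 dimensions (illustrative only).
rng = np.random.default_rng(1)
n_per_class, d = 50, 20
labels = np.repeat(np.arange(3), n_per_class)
centers = rng.normal(scale=2.0, size=(3, d))
X = centers[labels] + rng.normal(size=(labels.size, d))

# Simulated user input: each class is "dragged" to its own corner of the plot.
corners = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.5]])
target = corners[labels]

P = targeted_pursuit(X, target)
view = (X - X.mean(axis=0)) @ P             # 2-D coordinates to plot
centered_target = target - target.mean(axis=0)
residual = np.linalg.norm(view - centered_target)

w, score = blind_pursuit(X, n_trials=200)
print("targeted residual:", residual)
print("blind |excess kurtosis|:", score)
```

In the targeted case the search is driven by the layout the user asks for rather than by a fixed criterion, so the same data can yield different projections depending on which points are moved. An interactive tool would presumably recompute the projection each time the user moves points.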


Further reading

* Joe Faith (2007). "Targeted Projection Pursuit for Interactive Exploration of High-Dimensional Data Sets". ''Proceedings of 11th International Conference on Information Visualisation''.


External links


* imDEV, a free Excel add-in for targeted projection pursuit using feature selection coupled with PLS and PLS-DA
* Targeted Projection Pursuit project page