
Targeted projection pursuit is a statistical technique used for exploratory data analysis, information visualization, and feature selection. It allows the user to interactively explore very complex data (typically having tens to hundreds of attributes) to find features or patterns of potential interest. Conventional, or 'blind', projection pursuit finds the most "interesting" possible projections in multidimensional data, using a search algorithm that optimizes some fixed criterion of "interestingness", such as deviation from a normal distribution. In contrast, targeted projection pursuit allows the user to explore the space of projections by manipulating data points directly in an interactive scatter plot. Targeted projection pursuit has found applications in DNA microarray data analysis, protein sequence analysis, graph layout and digital signal processing. It is available as a package for the WEKA machine learning toolkit.
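
The idea of steering a projection toward a user-chosen view can be illustrated with a minimal sketch. It assumes the user's manipulation of the scatter plot is summarised as a target 2-D position for every data point, and it uses ordinary least squares to find the linear projection whose output lies closest to that target. Least squares is a stand-in chosen for brevity, not necessarily the search used by the published method or the WEKA package, and the names `fit_projection`, `centres`, and `target` are hypothetical.

```python
import numpy as np

def fit_projection(X_centred, target_centred):
    """Return the (attributes x 2) matrix P minimising ||X_centred @ P - target_centred||."""
    P, *_ = np.linalg.lstsq(X_centred, target_centred, rcond=None)
    return P

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 60 points, 20 attributes, 3 classes.
labels = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 20))
X[:, 0] += 3.0 * (labels == 1)   # class 1 differs in attribute 0
X[:, 1] += 3.0 * (labels == 2)   # class 2 differs in attribute 1

# Target view: as if the user had dragged each class toward its own location.
centres = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = centres[labels]

# Centre both sides so the linear map does not have to model an offset.
Xc = X - X.mean(axis=0)
Tc = target - target.mean(axis=0)

P = fit_projection(Xc, Tc)   # projection matrix, shape (20, 2)
view = Xc @ P                # 2-D coordinates to draw in the scatter plot

# Attributes with large weights in P contribute most to the chosen view,
# which is one way such a projection can inform feature selection.
weights = np.linalg.norm(P, axis=1)
print("most influential attributes:", np.argsort(weights)[::-1][:3])
```

In an interactive tool this fit would be repeated each time the user moves points, so the displayed view follows the manipulation while remaining a linear projection of the original attributes.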


Further reading

* Joe Faith (2007). "Targeted Projection Pursuit for Interactive Exploration of High-Dimensional Data Sets". ''Proceedings of 11th International Conference on Information Visualisation''.


External links


* imDEV, a free Excel add-in for targeted projection pursuit using feature selection coupled with PLS and PLS-DA
* Targeted Projection Pursuit project page