




Feature Selection Toolbox (FST) is a software package written in C++, primarily for feature selection in the machine learning domain. It was developed at the Institute of Information Theory and Automation (UTIA) of the Czech Academy of Sciences.


Version 1

The first generation of Feature Selection Toolbox (FST1) was a Windows application with a user interface that allowed users to apply several sub-optimal, optimal and mixture-based feature selection methods to data stored in a simple proprietary plain-text flat-file format.


Version 3

The third generation of Feature Selection Toolbox (FST3) was a library without a user interface, written to be more efficient and versatile than the original FST1. FST3 supports several standard data mining tasks, specifically data preprocessing and classification, but its main focus is on feature selection. In the feature selection context, it implements several common as well as less usual techniques, with particular emphasis on threaded implementations of various sequential search methods (a form of hill climbing). Implemented methods include individual feature ranking, floating search, oscillating search (suitable for very-high-dimensional problems) in randomized or deterministic form, optimal methods of the branch and bound type, probabilistic class distance criteria, various classifier accuracy estimators, feature subset size optimization, feature selection with pre-specified feature weights, criteria ensembles, hybrid methods, detection of all equivalent solutions, and two-criterion optimization. FST3 is more narrowly specialized than popular software such as the Waikato Environment for Knowledge Analysis (Weka), RapidMiner or PRTools.
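The sequential search methods mentioned above share one greedy idea. The following minimal C++ sketch shows plain Sequential Forward Selection (SFS), the simplest member of that family; floating and oscillating search add backtracking steps that are not shown here. The function name `sequential_forward_selection` and the `SubsetCriterion` type are hypothetical illustrations, not part of the FST3 API, and the criterion (for example a classifier accuracy estimate) is assumed to be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical criterion type (not the FST3 API): scores a candidate subset
// of feature indices, where a higher value means a better subset.
using SubsetCriterion = std::function<double(const std::vector<std::size_t>&)>;

// Sequential Forward Selection: start from the empty set and, at each step,
// add the single unused feature whose inclusion gives the highest criterion
// value. This is greedy hill climbing: earlier choices are never revisited,
// so the search can stop in a local optimum.
std::vector<std::size_t> sequential_forward_selection(std::size_t num_features,
                                                      std::size_t target_size,
                                                      const SubsetCriterion& criterion) {
    assert(target_size <= num_features);
    std::vector<std::size_t> selected;
    std::vector<bool> used(num_features, false);

    while (selected.size() < target_size) {
        double best_score = -std::numeric_limits<double>::infinity();
        std::size_t best_feature = num_features;  // sentinel: nothing chosen yet

        // Try every unused feature as the next addition.
        for (std::size_t f = 0; f < num_features; ++f) {
            if (used[f]) continue;
            std::vector<std::size_t> candidate = selected;
            candidate.push_back(f);
            const double score = criterion(candidate);
            // Always accept the first candidate so a -inf score cannot
            // leave the sentinel in place.
            if (best_feature == num_features || score > best_score) {
                best_score = score;
                best_feature = f;
            }
        }
        used[best_feature] = true;
        selected.push_back(best_feature);
    }
    return selected;
}
```

With an additive criterion (the sum of fixed per-feature weights), SFS simply picks the heaviest features. A criterion that penalizes a redundant pair shows where greedy search starts to matter, and floating search (SFFS) extends this loop with conditional backward steps that can remove previously added features.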
By default, the techniques implemented in the toolbox assume that the data is available as a single flat file, either in a simple proprietary format or in Weka's ARFF format, where each data point is described by a fixed number of numeric attributes. FST3 is provided without a user interface and is meant to be used by users familiar with both machine learning and C++ programming. The older FST1 software is more suitable for simple experimenting or educational purposes because it can be used without writing any C++ code.
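To make the ARFF input assumption concrete, the following C++ sketch reads a dense ARFF file in which every attribute is numeric: a header of `@relation` and `@attribute` lines, then an `@data` section with one comma-separated row per data point. It is a hypothetical illustration, not FST3's own loader; `NumericDataset` and `parse_numeric_arff` are invented names, and the sketch assumes no quoted names, no missing values (`?`), no sparse rows and no nominal attributes (real data sets usually carry a nominal class attribute, which this sketch would reject).

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical dense numeric dataset: one row per data point,
// one column per attribute.
struct NumericDataset {
    std::vector<std::string> attribute_names;
    std::vector<std::vector<double>> rows;
};

// Lowercase copy, because ARFF keywords and type names are case-insensitive.
static std::string to_lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Minimal reader for dense ARFF input whose attributes are all numeric.
NumericDataset parse_numeric_arff(std::istream& in) {
    NumericDataset ds;
    std::string line;
    bool in_data = false;
    while (std::getline(in, line)) {
        // Trim leading whitespace; skip blank lines and '%' comment lines.
        const auto start = line.find_first_not_of(" \t\r");
        if (start == std::string::npos || line[start] == '%') continue;
        line = line.substr(start);

        if (!in_data) {
            std::istringstream header(line);
            std::string keyword;
            header >> keyword;
            keyword = to_lower(keyword);
            if (keyword == "@attribute") {
                std::string name, type;
                header >> name >> type;
                type = to_lower(type);
                if (type != "numeric" && type != "real" && type != "integer")
                    throw std::runtime_error("unsupported attribute type: " + type);
                ds.attribute_names.push_back(name);
            } else if (keyword == "@data") {
                in_data = true;
            }
            // @relation and any other header lines are ignored here.
            continue;
        }

        // Data line: one numeric value per declared attribute.
        std::vector<double> row;
        std::istringstream values(line);
        std::string cell;
        while (std::getline(values, cell, ',')) row.push_back(std::stod(cell));
        if (row.size() != ds.attribute_names.size())
            throw std::runtime_error("row has the wrong number of values");
        ds.rows.push_back(row);
    }
    return ds;
}
```

Checking every row against the declared attribute count enforces the "fixed number of numeric attributes" assumption at load time instead of letting a malformed row reach the feature selection code.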


History

* In 1999, development of the first Feature Selection Toolbox version started at UTIA as part of a PhD thesis. It was originally developed in the Optima++ (later renamed Power++) RAD C++ environment.
* In 2002, development of the first FST generation was suspended, mainly because Sybase ended support for the development environment then in use.
* From 2002 to 2008, the FST kernel was recoded and used for research experimentation within UTIA only.
* In 2009, a third recoding of the FST kernel from scratch began.
* In 2010, FST3 was made publicly available as a C++ library without a GUI. The accompanying webpage collects feature-selection-related links, references and documentation, and offers the original FST1 for download.
* In 2011, an update of FST3 to version 3.1 added new methods (in particular a novel dependency-aware feature ranking suitable for very-high-dimensional recognition problems) and core code improvements.


See also

* Feature selection
* Pattern recognition
* Machine learning
* Data mining
* OpenNN, Open neural networks library for predictive analytics
* Weka, comprehensive and popular Java open-source software from the University of Waikato
* RapidMiner, formerly Yet Another Learning Environment (YALE), a commercial machine learning framework
* PRTools of the Delft University of Technology
* Infosel++, specialized in information-theory-based feature selection
* Tooldiag, a C++ pattern recognition toolbox
* List of numerical analysis software




External links

* Official website: fst.utia.cz