Feature Selection Toolbox
   HOME

TheInfoList



OR:

Feature Selection Toolbox (FST) is software primarily for
feature selection In machine learning, feature selection is the process of selecting a subset of relevant Feature (machine learning), features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: * sim ...
in the
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
domain, written in C++, developed at the Institute of Information Theory and Automation (UTIA), of the
Czech Academy of Sciences The Czech Academy of Sciences (abbr. CAS, , abbr. AV ČR) was established in 1992 by the Czech National Council as the Czech successor of the former Czechoslovak Academy of Sciences and its tradition goes back to the Royal Bohemian Society of Sc ...
.


Version 1

The first generation of Feature Selection Toolbox (FST1) was a Windows application with user interface allowing users to apply several sub-optimal, optimal and mixture-based feature selection methods on data stored in a trivial proprietary textual flat file format.


Version 3

The third generation of Feature Selection Toolbox (FST3) was a
library A library is a collection of Book, books, and possibly other Document, materials and Media (communication), media, that is accessible for use by its members and members of allied institutions. Libraries provide physical (hard copies) or electron ...
without user interface, written to be more efficient and versatile than the original FST1. FST3 supports several standard
data mining Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
tasks, more specifically, data preprocessing and
classification Classification is the activity of assigning objects to some pre-existing classes or categories. This is distinct from the task of establishing the classes themselves (for example through cluster analysis). Examples include diagnostic tests, identif ...
, but its main focus is on
feature selection In machine learning, feature selection is the process of selecting a subset of relevant Feature (machine learning), features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: * sim ...
. In feature selection context, it implements several common as well as less usual techniques, with particular emphasis put on threaded implementation of various sequential search methods (a form of hill-climbing). Implemented methods include individual feature ranking, floating search, oscillating search (suitable for very high-dimension problems) in randomized or deterministic form, optimal methods of
branch and bound Branch and bound (BB, B&B, or BnB) is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution. It is an algorithm ...
type, probabilistic class distance criteria, various classifier accuracy estimators, feature subset size optimization, feature selection with pre-specified feature weights, criteria ensembles, hybrid methods, detection of all equivalent solutions, or two-criterion optimization. FST3 is more narrowly specialized than popular software like the Waikato Environment for Knowledge Analysis
Weka The weka, also known as the Māori hen or woodhen (''Gallirallus australis'') is a flightless bird species of the rail family. It is endemic to New Zealand. Some authorities consider it as the only extant member of the genus '' Gallirallus''. ...
,
RapidMiner RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022. History RapidMiner, formerly known as YALE (Yet Another Learning Environment), was deve ...
or PRTools. By default, techniques implemented in the toolbox are predicated on the assumption that the data is available as a single flat file in a simple proprietary format or in Weka format ARFF, where each data point is described by a fixed number of numeric attributes. FST3 is provided without
user interface In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine fro ...
, and is meant to be used by users familiar both with
machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
and C++ programming. The older FST1 software is more suitable for simple experimenting or educational purposes because it can be used with no need to code in C++.


History

* In 1999, development of the first Feature Selection Toolbox version started at UTIA as part of a PhD thesis. It was originally developed in Optima++ (later renamed Power++) RAD C++ environment. * In 2002, the development of the first FST generation has been suspended, mainly due to end of
Sybase Sybase, Inc. was an enterprise software and services company. The company produced software relating to relational databases, with facilities located in California and Massachusetts. Sybase was acquired by SAP in 2010; SAP ceased using the Syba ...
's support of the then used development environment. * In 2002–2008, FST kernel was recoded and used for research experimentation within UTIA only. * In 2009, 3rd FST kernel recoding from scratch begun. * In 2010, FST3 was made publicly available in form of a C++ library without GUI. The accompanying webpage collects feature selection related links, references, documentation and the original FST1 available for download. * In 2011, an update of FST3 to version 3.1 included new methods (particularly a novel dependency-aware feature ranking suitable for very-high-dimension recognition problems) and core code improvements.


See also

*
Feature selection In machine learning, feature selection is the process of selecting a subset of relevant Feature (machine learning), features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: * sim ...
*
Pattern recognition Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM) which may possess PR capabilities but their p ...
*
Machine learning Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of Computational statistics, statistical algorithms that can learn from data and generalise to unseen data, and thus perform Task ( ...
*
Data mining Data mining is the process of extracting and finding patterns in massive data sets involving methods at the intersection of machine learning, statistics, and database systems. Data mining is an interdisciplinary subfield of computer science and ...
*
OpenNN OpenNN (Open Neural Networks Library) is a software library written in the C++ programming language which implements neural networks, a main area of deep learning research. The library is open-source, licensed under the GNU Lesser General Public ...
, Open neural networks library for
predictive analytics Predictive analytics encompasses a variety of Statistics, statistical techniques from data mining, Predictive modelling, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or other ...
*
Weka The weka, also known as the Māori hen or woodhen (''Gallirallus australis'') is a flightless bird species of the rail family. It is endemic to New Zealand. Some authorities consider it as the only extant member of the genus '' Gallirallus''. ...
, comprehensive and popular
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
open-source software Open-source software (OSS) is Software, computer software that is released under a Open-source license, license in which the copyright holder grants users the rights to use, study, change, and Software distribution, distribute the software an ...
from
University of Waikato The University of Waikato (), established in 1964, is a Public university, public research university located in Hamilton, New Zealand, Hamilton, New Zealand. An additional campus is located in Tauranga. The university performs research in nume ...
*
RapidMiner RapidMiner is a data science platform that analyses the collective impact of an organization's data. It was acquired by Altair Engineering in September 2022. History RapidMiner, formerly known as YALE (Yet Another Learning Environment), was deve ...
, formerly ''Yet Another Learning Environment'' (YALE) a commercial machine learning framework * of the Delft University of Technology
Infosel++
specialized in
information theory Information theory is the mathematical study of the quantification (science), quantification, Data storage, storage, and telecommunications, communication of information. The field was established and formalized by Claude Shannon in the 1940s, ...
based feature selection
Tooldiag
a C++ pattern recognition toolbox *
List of numerical analysis software Listed here are notable end-user computer applications intended for use with numerical or data analysis: Numerical-software packages * Analytica is a widely used proprietary software tool for building and analyzing numerical models. It is a de ...


References


External links

{{Official website, fst.utia.cz Classification algorithms Data mining and machine learning software C++ software Data modeling tools Computer libraries Cross-platform software