HOME

TheInfoList



OR:

Waikato Environment for Knowledge Analysis (Weka), developed at the
University of Waikato The University of Waikato ( mi, Te Whare Wānanga o Waikato), is a Public university, public research university in Hamilton, New Zealand, Hamilton, New Zealand established in 1964. An additional campus is located in Tauranga. The university perfo ...
,
New Zealand New Zealand ( mi, Aotearoa ) is an island country in the southwestern Pacific Ocean. It consists of two main landmasses—the North Island () and the South Island ()—and over 700 smaller islands. It is the sixth-largest island count ...
, is
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
licensed under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...
, and the companion software to the book "Data Mining: Practical Machine Learning Tools and Techniques".


Description

Weka contains a collection of visualization tools and algorithms for
data analysis Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, enco ...
and
predictive modeling Predictive modelling uses statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive mod ...
, together with graphical user interfaces for easy access to these functions. The original non-Java version of Weka was a
Tcl TCL or Tcl or TCLs may refer to: Business * TCL Technology, a Chinese consumer electronics and appliance company **TCL Electronics, a subsidiary of TCL Technology * Texas Collegiate League, a collegiate baseball league * Trade Centre Limited ...
/ Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus
data preprocessing Data preprocessing can refer to manipulation or dropping of data before it is used in order to ensure or enhance performance, and is an important step in the data mining process. The phrase "garbage in, garbage out" is particularly applicable to ...
utilities in C, and a
makefile In software development, Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called ''Makefiles'' which specify how to derive the target program. Though integrated develo ...
-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. Advantages of Weka include: * Free availability under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the Four Freedoms (Free software), four freedoms to run, study, share, and modify the software. The license was th ...
. * Portability, since it is fully implemented in the
Java programming language Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers ''write once, run anywh ...
and thus runs on almost any modern computing platform. * A comprehensive collection of data preprocessing and modeling techniques. * Ease of use due to its graphical user interfaces. Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering,
classification Classification is a process related to categorization, the process in which ideas and objects are recognized, differentiated and understood. Classification is the grouping of related facts into classes. It may also refer to: Business, organizat ...
, regression,
visualization Visualization or visualisation may refer to: *Visualization (graphics), the physical or imagining creation of images, diagrams, or animations to communicate a message * Data visualization, the graphic representation of data * Information visualiz ...
, and
feature selection In machine learning and statistics, feature selection, also known as variable selection, attribute selection or variable subset selection, is the process of selecting a subset of relevant features (variables, predictors) for use in model construc ...
. Input to Weka is expected to be formatted according the Attribute-Relational File Format and with the filename bearing the extension. All of Weka's techniques are predicated on the assumption that the data is available as one flat file or relation, where each data point is described by a fixed number of attributes (normally, numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
s using
Java Database Connectivity Java Database Connectivity (JDBC) is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity. ...
and can process the result returned by a database query. Weka provides access to
deep learning Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. De ...
with
Deeplearning4j Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, d ...
. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka. Another important area that is currently not covered by the algorithms included in the Weka distribution is sequence modeling.


Extension packages

In version 3.7.2, a package manager was added to allow the easier installation of extension packages. Some functionality that used to be included with Weka prior to this version has since been moved into such extension packages, but this change also makes it easier for others to contribute extensions to Weka and to maintain the software, as this modular architecture allows independent updates of the Weka core and individual extensions.


History

* In 1993, the
University of Waikato The University of Waikato ( mi, Te Whare Wānanga o Waikato), is a Public university, public research university in Hamilton, New Zealand, Hamilton, New Zealand established in 1964. An additional campus is located in Tauranga. The university perfo ...
in
New Zealand New Zealand ( mi, Aotearoa ) is an island country in the southwestern Pacific Ocean. It consists of two main landmasses—the North Island () and the South Island ()—and over 700 smaller islands. It is the sixth-largest island count ...
began development of the original version of Weka, which became a mix of Tcl/Tk, C, and makefiles. * In 1997, the decision was made to redevelop Weka from scratch in Java, including implementations of modeling algorithms. * In 2005, Weka received the
SIGKDD SIGKDD, representing the Association for Computing Machinery's (ACM) Special Interest Group (SIG) on Knowledge Discovery and Data Mining, hosts an influential annual conference. Conference history The KDD Conference grew from KDD (Knowledge Di ...
Data Mining and Knowledge Discovery Service Award. * In 2006,
Pentaho Pentaho is business intelligence (BI) software that provides data integration, OLAP services, reporting, information dashboards, data mining and extract, transform, load (ETL) capabilities. Its headquarters are in Orlando, Florida. Pentaho was ...
Corporation acquired an exclusive licence to use Weka for
business intelligence Business intelligence (BI) comprises the strategies and technologies used by enterprises for the data analysis and management of business information. Common functions of business intelligence technologies include reporting, online analytical pr ...
. It forms the data mining and
predictive analytics Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events. In business ...
component of the Pentaho business intelligence suite. Pentaho has since been acquired by Hitachi Vantara, and Weka now underpins the PMI (Plugin for Machine Intelligence) open source component.


Related tools

* Auto-WEKA is an
automated machine learning Automated machine learning (AutoML) is the process of automating the tasks of applying machine learning to real-world problems. AutoML potentially includes every stage from beginning with a raw dataset to building a machine learning model ready ...
system for Weka. * Environment for DeveLoping KDD-Applications Supported by Index-Structures (
ELKI ELKI (for ''Environment for DeveLoping KDD-Applications Supported by Index-Structures'') is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teaching. It was originally at the database s ...
) is a similar project to Weka with a focus on
cluster analysis Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of ...
, i.e., unsupervised methods. * H2O.ai is an open-source data science and machine learning platform *
KNIME KNIME (), the Konstanz Information Miner, is a free and open-source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining "Building Blocks ...
is a machine learning and data mining software implemented in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. *
Massive Online Analysis Massive Online Analysis (MOA) is a free open-source software project specific for data stream mining with concept drift. It is written in Java and developed at the University of Waikato, New Zealand. Description MOA is an open-source framework ...
(MOA) is an open-source project for large scale mining of data streams, also developed at the
University of Waikato The University of Waikato ( mi, Te Whare Wānanga o Waikato), is a Public university, public research university in Hamilton, New Zealand, Hamilton, New Zealand established in 1964. An additional campus is located in Tauranga. The university perfo ...
in
New Zealand New Zealand ( mi, Aotearoa ) is an island country in the southwestern Pacific Ocean. It consists of two main landmasses—the North Island () and the South Island ()—and over 700 smaller islands. It is the sixth-largest island count ...
. * Neural Designer is a data mining software based on
deep learning Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. De ...
techniques written in
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
. *
Orange Orange most often refers to: *Orange (fruit), the fruit of the tree species '' Citrus'' × ''sinensis'' ** Orange blossom, its fragrant flower *Orange (colour), from the color of an orange, occurs between red and yellow in the visible spectrum * ...
is a similar open-source project for data mining, machine learning and visualization based on
scikit-learn scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector ...
. *
RapidMiner RapidMiner is a data science platform designed for enterprises that analyses the collective impact of organizations’ employees, expertise and data. Rapid Miner's data science platform is intended to support many analytics users across a broad AI ...
is a commercial
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
framework implemented in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
which integrates Weka. *
scikit-learn scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support-vector ...
is a popular machine learning library in Python.


See also

*
List of numerical-analysis software Listed here are notable end-user computer applications intended for use with numerical or data analysis: Numerical-software packages General-purpose computer algebra systems Interface-oriented Language-oriented Historically signific ...


References


External links

* at University of Waikato in New Zealand {{DEFAULTSORT:Weka (machine learning) Data mining and machine learning software Free artificial intelligence applications Free data analysis software Free science software Free software programmed in Java (programming language) Software using the GPL license