List Of Text Mining Software

Text mining computer programs are available from many commercial and open source companies and sources.


Commercial

* Angoss – Angoss Text Analytics provides entity and theme extraction, topic categorization, sentiment analysis and document summarization capabilities via the embedded Lexalytics Salience Engine. The software can merge the output of unstructured, text-based analysis with structured data to provide additional predictive variables for improved predictive models and association analysis.
* AUTINDEX – a commercial text mining software package based on sophisticated linguistics by IAI (Institute for Applied Information Sciences), Saarbrücken.
* DigitalMR – social media listening and text and image analytics tool for market research.
* FICO Score – provider of analytics.
* General Sentiment – social intelligence platform that uses natural language processing to discover affinities between fans of brands and fans of traditional television shows in social media. Its standalone text analytics capture a social knowledge base on billions of topics, going back to 2004.
* IBM LanguageWare – the IBM suite for text analytics (tools and runtime).
* IBM SPSS – provider of Modeler Premium (previously called IBM SPSS Modeler and IBM SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multilingual sentiment, event and fact extraction) that can be used in conjunction with predictive modeling. Text Analytics for Surveys categorizes survey responses using NLP-based capabilities for further analysis or reporting.
* Inxight – provider of text analytics, search, and unstructured visualization technologies. (Inxight was bought by Business Objects, which was in turn bought by SAP AG in 2008.)
* Language Computer Corporation – text extraction and analysis tools, available in multiple languages.
* Lexalytics – provider of a text analytics engine used in social media monitoring, voice of customer, survey analysis, and other applications.
* Linguamatics – provider of natural language processing (NLP) based enterprise text mining and text analytics software, I2E, for high-value knowledge discovery and decision support.
* Mathematica – provides built-in tools for text alignment, pattern matching, clustering and semantic analysis. See Wolfram Language, the programming language of Mathematica.
* MATLAB – offers Text Analytics Toolbox for importing text data and converting it to numeric form for use in machine and deep learning, sentiment analysis and classification tasks.
* Medallia – offers one system of record for survey, social, text, written and online feedback.
* NetOwl – suite of multilingual text and entity analytics products, including entity extraction, link and event extraction, sentiment analysis, geotagging, name translation, name matching, and identity resolution, among others.
* PolyAnalyst – text analytics environment.
* PoolParty Semantic Suite – graph-based text mining platform.
* RapidMiner with its Text Processing Extension – data and text mining software.
* SAS – SAS Text Miner and Teragram; commercial text analytics, natural language processing, and taxonomy software used for information management.
* Sketch Engine – a corpus manager and analysis software that supports creating text corpora from uploaded texts, the Web, or a particular website, including part-of-speech tagging and lemmatization.
* Sysomos – provider of a social media analytics software platform, including text analytics and sentiment analysis on online consumer conversations.
* WordStat – content analysis and text mining add-on module of QDA Miner for analyzing large amounts of text data.


Open source

* Carrot2 – text and search results clustering framework.
* GATE – General Architecture for Text Engineering, an open-source toolbox for natural language processing and language engineering.
* Gensim – large-scale topic modelling and extraction of semantic information from unstructured text (Python).
* Haystack by deepset – a framework for building search systems with document retrieval and question answering capabilities.
* KH Coder – for quantitative content analysis or text mining.
* The KNIME Text Processing extension.
* Natural Language Toolkit (NLTK) – a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language.
* OpenNLP – natural language processing.
* Orange with its text mining add-on.
* The PLOS Text Mining Collection.
* The programming language R provides a framework for text mining applications in the package ''tm''. The Natural Language Processing task view contains ''tm'' and other text mining library packages.
* spaCy – open-source natural language processing library for Python.
* Stanbol – an open source text mining engine targeted at semantic content management.
* Voyant Tools – a web-based text analysis environment, created as a scholarly project.
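
As a rough illustration of the kind of low-level processing that toolkits such as those listed above automate, the following sketch builds a small term-document count matrix using only the Python standard library. It does not use the API of any listed package; the function names `tokenize` and `term_document_matrix` and the sample documents are hypothetical and chosen only for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] counts vocabulary[j] in docs[i]."""
    # One bag-of-words counter per document.
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    rows = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, rows


# Hypothetical sample documents, for illustration only.
docs = [
    "Text mining derives information from text.",
    "Topic models group documents by topic.",
]
vocabulary, rows = term_document_matrix(docs)
for doc, row in zip(docs, rows):
    print(doc, dict(zip(vocabulary, row)))
```

Real toolkits typically add steps this sketch leaves out, such as stop-word removal, stemming or lemmatization, and weighting schemes.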


External links


Text Mining APIs on Mashape

Text Mining APIs on Programmable Web

Text Mining APIs at the Text Analysis Portal for Research