Rexer's Annual Data Miner Survey

Rexer Analytics’s Annual Data Miner Survey is the largest survey of data mining, data science, and analytics professionals in the industry. It consists of approximately 50 multiple choice and open-ended questions that cover seven general areas of data mining science and practice: (1) Field and goals, (2) Algorithms, (3) Models, (4) Tools (software packages used), (5) Technology, (6) Challenges, and (7) Future. It is conducted as a service (without corporate sponsorship) to the data mining community, and the results are usually announced at the PAW (Predictive Analytics World) conferences and shared via freely available summary reports. In the 2013 survey, 1,259 data miners from 75 countries participated. (Karl Rexer, Heather Allen, & Paul Gearan (2011), ''2011 Data Miner Survey Summary'', presented at Predictive Analytics World, Oct. 2011.) After 2011, Rexer Analytics moved to a biennial schedule, running the survey every two years.


Surveys

# 2015 Survey: 1,220 participants from 72 countries.
# 2013 Survey: 68-item survey; 1,259 participants from 75 countries.
# 2011 Survey: 52-item survey; 1,319 participants from over 60 countries. Citations include: Bob Thompson (2012), ''Big Data and Analytics in a Customer-Focused Enterprise: Inside Scoop with Karl Rexer'', CustomerThink, August 7, 2012; Selena Welz (2012), ''Meet R: a programming language that makes sense of Big Data'', Technology @ Work, Tendo Communications, November 2012.
# 2010 Survey: 50-item survey; 735 participants from 60 countries. (Karl Rexer, Heather Allen, & Paul Gearan (2010), presented at Predictive Analytics World, Oct. 2010; Karl Rexer, Heather Allen, & Paul Gearan (2011), ''Understanding Data Miners'', Analytics Magazine, May/June 2011 (INFORMS: Institute for Operations Research and the Management Sciences).) Citations include: Emilia Mikołajewska and Dariusz Mikołajewski (2011), ''System eksploracji danych na potrzeby obronności państwa'', Kwartalnik Bellona, 2011, Volume 3, pages 119-129 (''Data Mining system for national security purposes'', Bellona Quarterly, Scientific Journal of the Polish Ministry of National Defense; article is in Polish); Tomasz Ząbkowski (2011), ''Data Mining - Current State and Future Trends'', Information Systems in Management XIII, Business Intelligence and Knowledge Management, Warsaw University of Life Sciences Press, Warsaw, 2011, pages 122-130; Tuba Islam (2011), ''How to use Analytics to Improve Your Business: Real Practices'', SAS Business Analytics Series, Istanbul, Turkey, April 2011 (presentation is in Turkish); Shawn Hessinger (2011), ''CRM & Marketing Top Fields for Data Miners'', All Analytics, November 9, 2011; Gustavo Valencia (2012), ''Minería de Datos: Sesión 0'', Universidad Pontificia Bolivariana, graduate class Data mining and Information visualization, 2012 (presentation is in Spanish); Robert A. Muenchen (2012), ''The Popularity of Data Analysis Software''.
# 2009 Survey: 40-item survey; 710 participants from 58 countries. (Karl Rexer, Heather Allen, & Paul Gearan (2009), presented at SPSS Directions Conference, Oct. 2009.) Citations include: M. Arthur Munson (2011), ''A Study on the Importance of and Time Spent on Different Modeling Steps'', ACM SIGKDD Explorations, Volume 13, Issue 2, December 2011, pages 65-71; Ervina Çergani (2009), ''Data Mining Survey'', Survey of Businesses in Tirana, Albania, July 2009 (originally in Albanian, translated into English); Valerie Valentine (2010), Information Management, March 25, 2010; Ajay Ohri (2009), ''Interview Karl Rexer - Rexer Analytics''.
# 2008 Survey: 34-item survey; 348 participants from 44 countries. (Karl Rexer, Paul Gearan, & Heather Allen (2008), presented at SPSS Directions Conference, Oct. 2008, and Oracle BIWA (Business Intelligence, Data Warehousing and Advanced Analytics) Summit, Nov. 2008.) Citations include: Mayato (2008), ''Mayato Study: Data Mining Software 2009'', November 2008 (available in German and English).
# 2007 Survey: 27-item survey; 314 participants from 35 countries. (Karl Rexer, Paul Gearan, & Heather Allen (2007), presented at SPSS Directions Conference, Oct. 2007, and Oracle BIWA Summit, Oct. 2007; Karl Rexer, Paul Gearan, & Heather Allen (2008), ''Portrait of a data miner'', Quirk's Marketing Research Media, March 2008.)


Recent survey results

While the Data Miner surveys have covered many data mining topics, the three topics that get the most attention in citations and at conference presentations are:
* Algorithms: The surveys have consistently shown that decision trees, regression, and cluster analysis form a triad of core algorithms for most data miners. However, a wide variety of algorithms are being used. This is consistent with independent polls of data miners conducted by KDnuggets over the years. (Gregory Piatetsky-Shapiro (2011), ''Algorithms for Data Analysis / Data Mining'', KDnuggets, 2011; Gregory Piatetsky-Shapiro (2007), KDnuggets, 2007.)
* Data Mining Tools: Data miners report using an average of four software tools to conduct their analyses. Over the survey years, R has risen in popularity. In 2010 it overtook SPSS Statistics and SAS to become the tool used by the most data miners, and the 2011 survey showed R being used by close to half of all data miners (47%). STATISTICA has also grown in popularity. From 2007 to 2009, more data miners indicated that SPSS Clementine (now IBM SPSS Modeler) was their primary data mining tool than any other tool. However, in 2010 and 2011, STATISTICA was cited most frequently as data miners' primary tool. In terms of satisfaction with their tools, in the past few years STATISTICA, SPSS Modeler, R, KNIME, RapidMiner and Salford Systems have received the strongest satisfaction ratings from data miners in these surveys. The growing popularity of R is consistent with independent polls of data miners conducted by KDnuggets, but the KDnuggets polls show a different picture regarding the popularity of commercial data mining software. (David Smith (2012), ''R Tops Data Mining Software Poll'', Java Developers Journal, May 31, 2012; Gregory Piatetsky-Shapiro (2011), KDnuggets, 2011; Gregory Piatetsky-Shapiro (2010), KDnuggets, 2010.) Robert Muenchen has taken a multi-faceted approach to assessing the popularity of data analysis software, an approach that includes blog post counts, Google Scholar data, listserv subscribers, use in competitions, book publications, Google PageRank, and more. His analyses are consistent with the Rexer Analytics surveys and KDnuggets in outlining the growth of R, but Muenchen illustrates that the popularity of software is more nuanced and that one's conclusions will differ depending on which measure of popularity is used. The Rexer Analytics survey summary reports include analyses of the data miners' satisfaction with 20 dimensions of their software. Haughton et al. and Nisbet have also produced reviews of data mining software. (Nisbet, Robert A. (2006), ''Data Mining Tools: Which One is Best for CRM? Part 1'', Information Management Special Reports, January 2006.)
* Challenges: Consistently across the years, dirty data, explaining data mining to others, and difficult access to data are the top challenges data miners report facing. Participants in the 2010 survey shared best practices for overcoming these challenges. (Karl Rexer, Paul Gearan, & Heather Allen (2010); verbatim responses are available online.)




External links


Rexer Analytics home page



2009 Decisionstats interview of Karl Rexer, President of Rexer Analytics

The Popularity of Data Analysis Software

Predictive Analytics World


Many single-item polls of data miners conducted from 2000 to the present.