Kaggle

Kaggle, a subsidiary of Google LLC, is an online community of data scientists and machine learning practitioners. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Kaggle launched in 2010 by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and artificial intelligence education. Its key personnel were Anthony Goldbloom and Jeremy Howard. Nicholas Gruen was the founding chair, succeeded by Max Levchin. Equity was raised in 2011, valuing the company at $25.2 million. On 8 March 2017, Google announced that it was acquiring Kaggle.


Kaggle community

In June 2017, Kaggle claimed to have surpassed 1 million registered users, and as of 2021 it had over 8 million. The users come from 194 countries. By March 2017, the Two Sigma Investments fund was running a competition on Kaggle to code a trading algorithm.


Overview

# The competition host prepares the data and a description of the problem; the host may choose whether the competition carries a monetary prize or is unpaid.
# Participants experiment with different techniques and compete against each other to produce the best models. Work is shared publicly through Kaggle Kernels to achieve a better benchmark and to inspire new ideas. Submissions can be made through Kaggle Kernels, through manual upload or using the Kaggle API. For most competitions, submissions are scored immediately (based on their predictive accuracy relative to a hidden solution file) and summarized on a live leaderboard.
# After the deadline passes, the competition host pays the prize money in exchange for "a worldwide, perpetual, irrevocable and royalty-free license ... to use the winning Entry", i.e. the algorithm, software and related intellectual property developed, which is "non-exclusive unless otherwise specified".

Alongside its public competitions, Kaggle also offers private competitions limited to Kaggle's top participants. Kaggle offers a free tool for data science teachers to run academic machine-learning competitions. Kaggle also hosts recruiting competitions in which data scientists compete for a chance to interview at leading data science companies like Facebook, Winton Capital, and Walmart.
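
The scoring step described above, in which each submission is compared against a hidden solution file and the results feed a live leaderboard, can be illustrated with a minimal sketch. This is not Kaggle's implementation: the evaluation metric differs from competition to competition, and plain classification accuracy is assumed here only for simplicity. The names `Submission`, `accuracy`, `leaderboard` and `HIDDEN_SOLUTION`, along with the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    team: str
    predictions: dict  # row id -> predicted label


def accuracy(predictions, solution):
    """Fraction of solution rows predicted correctly.

    Rows missing from the submission count as wrong.
    """
    correct = sum(
        1 for row_id, label in solution.items()
        if predictions.get(row_id) == label
    )
    return correct / len(solution)


def leaderboard(submissions, solution):
    """Keep each team's best score and rank teams from best to worst."""
    best = {}
    for sub in submissions:
        score = accuracy(sub.predictions, solution)
        if score > best.get(sub.team, -1.0):
            best[sub.team] = score
    # Sort by descending score; ties are broken alphabetically by team name.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical hidden solution, visible only to the scoring system.
HIDDEN_SOLUTION = {"r1": 1, "r2": 0, "r3": 1, "r4": 1}

submissions = [
    Submission("team_a", {"r1": 1, "r2": 1, "r3": 1, "r4": 0}),
    Submission("team_b", {"r1": 1, "r2": 0, "r3": 1, "r4": 0}),
    Submission("team_a", {"r1": 1, "r2": 0, "r3": 1, "r4": 1}),
]

board = leaderboard(submissions, HIDDEN_SOLUTION)
for rank, (team, score) in enumerate(board, start=1):
    print(f"{rank}. {team}: {score:.2f}")
```

Because participants only ever see the resulting score and never the solution itself, they can iterate on their models without direct access to the held-out labels.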


Competitions

Hundreds of machine-learning competitions have been run on Kaggle since the company was founded. Competitions have ranged from improving gesture recognition for Microsoft Kinect to making a football AI for Manchester City to improving the search for the Higgs boson at CERN. Competitions have resulted in many successful projects, including furthering the state of the art in HIV research, chess ratings and traffic forecasting. Geoffrey Hinton and George Dahl used deep neural networks to win a competition hosted by Merck. Vlad Mnih (one of Hinton's students) also used deep neural networks to win a competition hosted by Adzuna. This resulted in the technique being taken up by others in the Kaggle community. Tianqi Chen from the University of Washington also used Kaggle to show the power of XGBoost, which has since taken over from Random Forest as one of the main methods used to win Kaggle competitions. Several academic papers have been published on the basis of findings made in Kaggle competitions. A key to this is the effect of the live leaderboard, which encourages participants to continue innovating beyond existing best practices. The winning methods are frequently written up on the Kaggle blog (Kaggle Winner's Blog).


Financials

In March 2017, Fei-Fei Li, Chief Scientist at Google, announced during her keynote at Google Next that Google was acquiring Kaggle.


See also

* Data science competition platform
* Anthony Goldbloom


Further reading


* "Competition shines light on dark matter", Office of Science and Technology Policy, White House website, June 2011
* "May the best algorithm win...", The Wall Street Journal, March 2011
* "Verification of systems biology research in the age of collaborative competition", Nature Biotechnology, September 2011, http://www.nature.com/nbt/journal/v29/n9/full/nbt.1968.html