Doug Cutting

Douglass Read Cutting is a software designer, advocate, and creator of open-source search technology. He founded two technology projects, Lucene and Nutch, with Mike Cafarella. Both projects are now managed through the Apache Software Foundation. Cutting and Cafarella are also the co-founders of Apache Hadoop.


Education and early career

Cutting graduated from Stanford University in 1985 with a bachelor's degree. Prior to developing Lucene, Cutting held search technology positions at Xerox PARC, where he worked on the Scatter/Gather algorithm and on computational stylistics. He also worked at Excite, where he was one of the chief designers of the search engine, and at Apple Inc., where he was the primary author of the V-Twin text search framework.

The Scatter/Gather work is documented in the following sources:

* Cutting, Douglass R., David R. Karger, Jan O. Pedersen, and John W. Tukey. "Scatter/gather: A cluster-based approach to browsing large document collections." SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval. (Reprinted in ACM SIGIR Forum, vol. 51, no. 2, pp. 148-159. ACM, 2017.)
* Pedersen, Jan O., David Karger, Douglass R. Cutting, and John W. Tukey. "Scatter-gather: a cluster-based method and apparatus for browsing large document collections." U.S. Patent 5,442,778, issued August 15, 1995.


Open source projects

Lucene, a search indexer, and Nutch, a spider or crawler, are the two key components of an open-source general search platform, which first crawls the Web for content and then structures it into a searchable index. Cutting's leadership of these two projects extended the concepts and capabilities of general open-source software projects such as Linux and MySQL into the vertical domain of search. In a 2017 article, Cutting was quoted as saying, "open source is a requirement for business."
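
The crawl-then-index flow described above can be illustrated with a toy inverted index. This is only a minimal sketch under stated assumptions: it is not Lucene's or Nutch's actual implementation or API, and the names `tokenize`, `build_index`, `search`, and the `crawled_pages` data are hypothetical, standing in for the text a crawler might have fetched.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids adding empty entries to the defaultdict for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical output of a crawl: URL -> extracted page text.
crawled_pages = {
    "http://example.com/a": "Open source search engine library",
    "http://example.com/b": "A web crawler fetches pages for the search index",
    "http://example.com/c": "Distributed storage on commodity hardware",
}

index = build_index(crawled_pages)
print(search(index, "search"))
print(search(index, "search crawler"))
```

The point of the structure is that a query never scans page text: it looks up each term's set of pages and intersects those sets, which is why the index is built once after crawling and reused for every search.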


Use of MapReduce paradigm

In December 2004, Google Research published a paper on the MapReduce algorithm, which allows very large-scale computations to be trivially parallelized across large clusters of servers. Cutting and Mike Cafarella, realizing the importance of this paper to extending Lucene into the realm of extremely large search problems, created the open-source Hadoop framework, which allows applications based on the MapReduce paradigm to be run on large clusters of commodity hardware. Cutting was an employee of Yahoo!, where he led the Hadoop project full-time; he later went on to work for Cloudera.
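
To make the MapReduce paradigm concrete, the following is a minimal single-process sketch of a word count, the usual introductory example. It is an illustration only and does not use Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs on one machine rather than on a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call depends on only one document, so a real framework
    # could run these calls on different machines at the same time.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Each reduce call depends on only one key, so these are independent too.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


documents = ["the quick brown fox", "the lazy dog", "the fox"]
counts = word_count(documents)
print(counts["the"], counts["fox"])  # prints "3 2"
```

Because no map call shares state with another, and no reduce call shares state with another, the work can be split across many servers without coordination beyond the shuffle step. That independence is what the phrase "trivially parallelized" refers to.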


Open source foundations and awards

In July 2009, Cutting was elected to the board of directors of the Apache Software Foundation, and in September 2010 he was elected its chairman. In 2015, Cutting received an O'Reilly Open Source Award.


External links

* An interview with Doug Cutting
* Video interview of Doug Cutting
* Audio interview with Doug Cutting
* Note that this post was written while Hadoop was still an unnamed spinoff of Nutch. Tom updates his earlier post with the Hadoop name here.

* Article co-authored by Doug Cutting in ACM Queue, 'Building Nutch: Open Source Search'