Wikimedia Discovery

Knowledge Engine (KE) was a search engine project initiated in 2015 by the Wikimedia Foundation (WMF) to locate and display verifiable and trustworthy information from public-information sources in a way that was less reliant on traditional search engines. It aimed to allow readers to stay on Wikipedia.org and other Wikipedia-related projects when looking for additional information rather than returning to proprietary search engines. Its goals were to protect user privacy, to be open and transparent about how a piece of information originates, and to allow access to related metadata. The development of the project was controversial internally, and it was not pursued after 2016. Related ideas were applied to the internal cross-wiki search engine for Wikimedia projects.


History

In 2015, the WMF applied for a $250,000 grant from the Knight Foundation to support development of the Knowledge Engine. Its grant proposal noted: "Commercial search engines dominate search-engine use of the Internet, and they're employing proprietary technologies to consolidate channels of access to the Internet's knowledge and information." The project was designed in four stages, each scheduled to take about 18 months. It planned to draw information from Wikipedia-related projects and eventually to search other sources of public information such as the U.S. Census Bureau.

Leaked internal WMF documents stated the "Knowledge Engine By Wikipedia will democratize the discovery of media, news and information—it will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Our new site will be the Internet's first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation." The new search engine was not expected to immediately replace a general-purpose search engine, because at first it would draw only on information from Wikipedia and its other free knowledge projects, though it might in time also have included academic and open access sources in its search results.

Matt Southern in ''Search Engine Journal'' attributed media confusion about the Knowledge Engine's scope to the fact that later WMF statements clarifying the organization's intentions were "quite a contrast to the original grant application documents". The project was not discussed publicly with the Wikipedia community while the concept was being developed, nor was it part of the existing annual plan. This secrecy was mirrored by a degree of confusion within the organization, and was seen as at odds with the goal of transparency. An initial blog post by WMF Executive Director Lila Tretikov about the project did not address why the original proposal was so much broader than an internal search engine. Some staff and WMF board members felt the WMF was still not being straightforward with the Wikipedia community. This led to a crisis for the organization that ended with Tretikov's resignation in February 2016.


Design

The goal of the Knowledge Engine was to make readers and editors less reliant on proprietary search engines when looking for new information. The project proposal asked, "Would users go to Wikipedia if it were an open channel beyond an encyclopedia?" The Knowledge Engine was designed to be open and transparent about how a piece of information originates and to allow access to metadata. It would have no advertisements, protect user privacy, and emphasize community building and sharing of information. It would draw information from Wikipedia-related projects and eventually perhaps search other sources of public information such as the U.S. Census Bureau, OpenStreetMap, the Digital Public Library of America, and external sources like Fox News.

Jimmy Wales and the WMF stated that the project would focus on improving search on Wikipedia and related Wikimedia projects. The grant application stated that it would "create a model for surfacing high quality, public information on the internet." It also noted that "commercial search engines dominate search-engine use of the internet" and stated that "Google, Yahoo, or another big commercial search engine could suddenly devote resources to a similar project, which could reduce the success of the project."


Development timeline

Information about the project became public only gradually. As early as May 2015, community members asked about the concentration of staff in a new "Search and Discovery" department, though public plans made little or no reference to this work. The grant was applied for in mid-2015 and awarded in September, but was publicly announced only in a January 2016 press release. The project plan had four stages, each scheduled to take about 18 months: Discovery, Advisory, Community and Extension. The initial stage of the project was budgeted to cost $2.5 million, with the whole project running to the tens of millions. After a year, the WMF was to evaluate development to date and, at the close of the grant, set plans for the project to continue to the second stage.


Motivation and scope

A central source of confusion for the project was the extent to which it would directly compete with traditional search engines as a place to search the Web. According to ''Vice'', "the Wikimedia Foundation, the nonprofit that finances and founded Wikipedia, is interested in creating a search engine that appears squarely aimed at competing with Google." According to ''The Guardian'', "there was considerable doubt over what the tool was actually intended to be: a search engine aimed at halting a decline in Wikipedia traffic sent by Google, or simply a service for searching within Wikipedia?" Since 2012, Google Search and other search engines had started highlighting brief informational summaries from Wikipedia in knowledge panels alongside search results, reducing traffic to Wikipedia from those search engines. According to Search Engine Watch, this led to a battle for attention, and this project could have recouped some of that traffic.

Leaked internal documents from early concepts framed the plan more boldly than the final public description. They said the "Knowledge Engine By Wikipedia will democratize the discovery of media, news and information—it will make the Internet's most relevant information more accessible and openly curated, and it will create an open data engine that's completely free of commercial interests. Our new site will be the Internet's first transparent search engine, and the first one that carries the reputation of Wikipedia and the Wikimedia Foundation." The apparent contradiction between different descriptions of the purpose led to confusion in the media and in the community.

In response to speculation, the WMF published a statement clarifying its intentions: "We're not building a global crawler search engine ... Despite headlines, we are not trying to compete with other platforms, including Google. As a non-profit we are noncommercial and support open knowledge. Our focus is on the knowledge contributed on the Wikimedia projects. ... We intend to research how Wikimedia users seek, find, and engage with content. This essential information will allow us to make critical improvements to discovery on the Wikimedia projects." Director of Discovery Tomasz Finc added, "we are building an internal search engine, and we are not building a broad one." Jimmy Wales stated that suggestions that the WMF was creating a rival to Google were "trolling", "completely and utterly false", and "a total lie", while allowing that the Knowledge Engine might in time include academic and open access sources in its search results. Matt Southern in ''Search Engine Journal'' attributed media confusion about the KE's scope to the fact that this was "quite a contrast to the original grant application documents", an assessment echoed by James Vincent in ''The Verge'', Matt McGee in ''Search Engine Land'', and Jason Koebler in ''Vice''.


Controversy

Large-scale WMF projects are almost always discussed publicly with the Wikipedia community, but this did not happen with the Knowledge Engine development. Wikipedians were unaware of the existence of the project as a concept, and the KE project was not mentioned in the WMF's annual plan. According to the English Wikipedia's community newsletter, ''The Signpost'', some community members expressed outrage at the perceived secrecy around it and their inability to give input, and this raised questions about the WMF's commitment to transparency with the Wikipedia community.

James Heilman, a member of the WMF's Board of Trustees, noted in ''The Signpost'' that while on the Board, he had insisted multiple times that the grant documentation be made public, without success. He was dismissed from the Board in December 2015, and it was suggested that his push for transparency concerning the grant had been a factor in his dismissal, a suggestion rejected by Jimmy Wales. The Wikipedia community re-elected Heilman to the Board in 2017.

Ruth McCambridge said in ''Nonprofit Quarterly'', "Wikipedia editors have been requesting from December for the grant proposal and grant letter for a project that many surmise is a bid to remain technologically cutting-edge by the Wikimedia Foundation, but which may divert resources and attention from other pressing needs of the community." Commenting on the reluctance to share the grant documents with the community, which had been justified by reference to privacy concerns, McCambridge saw "a major difference in culture and values assumptions" compared to previous Wikimedia practice. McCambridge said that "the power of important strategic decisions" here seemed to rest "between funders and the top of the organizational hierarchy" and was "not shared with volunteer editors."

The WMF initially published only portions of the grant documentation, later making the full grant agreement available in February 2016. Further internal documents were leaked shortly after. The full agreement clarified the initial concept for the first stage of the project. Tretikov said she regretted being so late in informing the Wikipedia editing community about the grant. Longtime Wikipedia editor and journalist William Beutler told ''Vice'' magazine's Jason Koebler, "Leaving aside whether a search engine is a good idea, let alone feasible, the core issue here is about transparency. The irony is that the Wikimedia Foundation failed to observe one of the movement's own core values ...." UK Wikipedia editor Ashley van Haeften told ''Ars Technica'' via e-mail that "Lila, Jimmy, and the rest chose to keep the project and the Knight Foundation application and grant a secret until the projects were underway for six months, and even then this only came to light because it was leaked."

Tretikov's initial public post about the Knowledge Engine project did not explain why the original grant proposal had a much grander vision than the later public plan to develop an internal search engine. Staff who had been uncomfortable about the project's development felt the WMF was not being sufficiently straightforward with the community. According to statements from an internal meeting posted on the WMF's website, a member of the Discovery team said to Tretikov, "My concern is that we still aren't communicating it clearly enough. This morning's blog post is the truth, but not all of the truth. Namely that we had big plans in the past. It would have been much easier to say that we did have big plans, but they were ditched ... we still haven't acknowledged it. We can't deny it." Erik Möller, Deputy Director of the WMF until April 2015, portrayed the events as "very much out of control" and "a crisis." Disagreements about the project, and the response to the resulting controversy, led to many WMF staff members departing, culminating in Tretikov resigning on February 25, 2016.


External links


Discovery homepage on MediaWiki.org