WebCrow

WebCrow is a research project carried out at the Information Engineering Department of the University of Siena with the purpose of automatically solving crosswords.


The Project

The scientific relevance of the project lies in the fact that cracking crosswords requires human-level knowledge: unlike chess and related games, there is no closed-world configuration space. A first nucleus of technologies, such as search engines, information retrieval, and machine learning techniques, enables computers to enrich real-life concepts with semantics. The project is based on a software system whose major assumption is to attack crosswords by using the Web as its primary source of knowledge. WebCrow is very fast and often thrashes human challengers in competitions (G. Angelini, M. Ernandes, E. Di Iorio, "WebCrow: Previous competitions"), especially on multi-language crossword schemes. A distinct feature of the WebCrow software system is its combination of
natural language processing (NLP) techniques, the Google web search engine, and constraint satisfaction algorithms from artificial intelligence to acquire knowledge and to fill the schema. The most important component of WebCrow is the Web Search Module (WSM), which implements a domain-specific, web-based question answering algorithm.

The way WebCrow approaches crossword solving is quite different from the way humans do (John S. Quarterman, "Google as AI").
Whereas humans tend to first answer the clues they are sure of and then fill the rest of the schema by using the already answered clues as hints, WebCrow works in two clearly distinct stages. In the first stage, it processes all the clues and tries to answer every one of them: for each clue it finds many possible candidates and sorts them according to complex ranking models based mainly on probabilistic criteria. In the second stage, WebCrow uses constraint satisfaction algorithms to fill the grid with the overall most likely combination of clue answers.

To interact with Google, WebCrow first needs to compose queries from the given clues. This is done by query expansion, whose purpose is to convert the clue into a query expressed in a simplified language more appropriate for Google. The retrieved documents are then parsed to extract a list of candidate words that satisfy the crossword's length constraints.

Crosswords can hardly be tackled with encyclopedic knowledge alone, since many clues are wordplay or are otherwise purposefully ambiguous. This enigmatic component of crosswords is handled through massive use of a database of solved crosswords and through automatic reasoning on a properly organized knowledge base of hard-wired rules. Last but not least, the final constraint satisfaction step is very effective at selecting the correct candidates, even though, unlike humans, the system cannot rely on very high confidence in the correctness of each answer.
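
The two stages described above can be illustrated with a minimal Python sketch. It is not WebCrow's actual implementation: the function names `candidates_from_text` and `fill_grid`, the slot labels, and the data layout are all hypothetical. Relative word frequency stands in for the project's ranking models, and a plain backtracking search stands in for its constraint satisfaction algorithms.

```python
import re
from collections import Counter


def candidates_from_text(documents, length):
    """Stage 1 (sketch): keep only words of the required length and
    score each one by its relative frequency in the retrieved text.
    Frequency is an assumed stand-in for the real ranking models."""
    counts = Counter(
        word.lower()
        for doc in documents
        for word in re.findall(r"[A-Za-z]+", doc)
        if len(word) == length
    )
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def fill_grid(slots, candidates):
    """Stage 2 (sketch): find the consistent assignment of words to
    slots that maximizes the product of candidate probabilities.

    slots:      slot name -> list of (row, col) cells, in reading order
    candidates: slot name -> {word: probability}
    """
    order = sorted(slots, key=lambda s: len(candidates[s]))
    best = {"score": 0.0, "assignment": None}

    def search(i, grid, assignment, score):
        # Probabilities are at most 1, so the score can only shrink:
        # abandon branches that cannot beat the best solution so far.
        if score <= best["score"]:
            return
        if i == len(order):
            best["score"], best["assignment"] = score, dict(assignment)
            return
        slot = order[i]
        cells = slots[slot]
        ranked = sorted(candidates[slot].items(), key=lambda kv: -kv[1])
        for word, prob in ranked:
            if len(word) != len(cells):
                continue
            # Crossing constraint: shared cells must hold the same letter.
            if any(grid.get(c, ch) != ch for c, ch in zip(cells, word)):
                continue
            added = [c for c in cells if c not in grid]
            grid.update(zip(cells, word))
            assignment[slot] = word
            search(i + 1, grid, assignment, score * prob)
            del assignment[slot]
            for c in added:
                del grid[c]

    search(0, {}, {}, 1.0)
    return best["assignment"], best["score"]


# Two three-letter slots that share their first cell (hypothetical data).
slots = {
    "1A": [(0, 0), (0, 1), (0, 2)],
    "1D": [(0, 0), (1, 0), (2, 0)],
}
candidates = {
    "1A": {"cat": 0.6, "dog": 0.4},
    "1D": {"dot": 0.7, "cab": 0.3},
}
assignment, score = fill_grid(slots, candidates)
print(assignment)
```

In this toy grid the top-ranked answers for each clue ("cat" and "dot") clash on the shared cell, so the search settles on "dog" and "dot", the consistent pair with the highest joint probability. This is the sense in which the grid-filling stage can pick the right candidate even when no single answer is known with high confidence.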


Competitions

WebCrow's speed and effectiveness (Tom Simonite, "Crossword Software Thrashes Human Challengers", New Scientist) have been tested many times in man-machine competitions on Italian, English and multi-language crosswords. The outcome of the tests is that WebCrow can successfully compete with average human players on single-language schemes and reaches expert-level performance on multi-language crosswords. However, WebCrow has not yet reached expert level on single-language crosswords.


ECAI-06 Competition

On August 30, 2006, at the European Conference on Artificial Intelligence (ECAI 2006), 25 conference attendees and 53 internet-connected crossword lovers competed against WebCrow in an official challenge organized within the conference program. The challenge consisted of five different crosswords (two in Italian, two in English and one multi-language in Italian and English), with 15 minutes allotted for each crossword. WebCrow ranked 21st out of 74 participants in the Italian competition and won both the bilingual and English competitions.


Other Competitions

Several competitions were held in Florence, Italy, within the Creativity Festival in December 2006, and another official conference competition took place in Hyderabad, India, in January 2007, within the International Joint Conference on Artificial Intelligence (IJCAI-07), where WebCrow ranked second out of 25 participants.




External links


* The WebCrow Website
* Crosswords at the crossroads with “il computer enigmista?”, Blogos - news and views on languages and technologies: http://www.multilingualblog.com/index.php/crosswords_at_the_crossroads_with_il_computer_enigmista
* cbc.ca radio
* cruciverb.com

