HOME

TheInfoList



OR:

deepset is a startup that provides software developers with the tools to build production-ready natural language processing (NLP) systems. It was founded in 2018 in
Berlin Berlin ( , ) is the capital and largest city of Germany by both area and population. Its 3.7 million inhabitants make it the European Union's most populous city, according to population within city limits. One of Germany's sixteen constitue ...
by Milos Rusic, Malte Pietsch, and Timo Möller. deepset authored and maintains the
open source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open ...
Haystack and its commercial
SaaS Software as a service (SaaS ) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS is also known as "on-demand software" and Web-based/Web-hosted software. SaaS is cons ...
offering deepset Cloud.


History

In June 2018, Milos Rusic, Malte Pietsch, and Timo Möller co-founded deepset in
Berlin Berlin ( , ) is the capital and largest city of Germany by both area and population. Its 3.7 million inhabitants make it the European Union's most populous city, according to population within city limits. One of Germany's sixteen constitue ...
,
Germany Germany,, officially the Federal Republic of Germany, is a country in Central Europe. It is the second most populous country in Europe after Russia, and the most populous member state of the European Union. Germany is situated betwe ...
. In the same year, the company served first customers who wanted to implement NLP services by tailoring
BERT Bert or BERT may refer to: Persons, characters, or animals known as Bert *Bert (name), commonly an abbreviated forename and sometimes a surname *Bert, a character in the poem "Bert the Wombat" by The Wiggles; from their 1992 album Here Comes a Son ...
language models to their domain. In July 2019, the company released the initial version of the
open source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open ...
FARM. In November 2019, the company released the initial version of the
open source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open ...
Haystack. Throughout 2020 and 2021 deepset published several applied research papers at EMNLP, COLING and ACL, the leading conferences in the area of NLP. In 2020, the research contributions comprised German language models named GBERT and GELECTRA, and a
question answering Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural l ...
dataset addressing the
COVID-19 pandemic The COVID-19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The novel virus was first identif ...
called COVID-QA, which was created in collaboration with
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
and has been annotated by biomedical experts. In 2021, the research contributions comprised German models and datasets for
question answering Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural l ...
and passage retrieval named GermanQuAD and GermanDPR, a semantic answer similarity metric, and an approach for multimodal retrieval of texts and tables to enable
question answering Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural l ...
on tabular data. Haystack contains implementations of all three contributions, enabling the use of the research through the open source framework. In November 2021, the development of the FARM framework was discontinued and its main features were integrated into the Haystack framework. In April 2022, the company announced its commercial
SaaS Software as a service (SaaS ) is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS is also known as "on-demand software" and Web-based/Web-hosted software. SaaS is cons ...
offering deepset Cloud. As of October 2022, the most popular finetuned language model created by deepset was downloaded more than 7 million times.


Products and Applications

Haystack is an end-to-end
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
framework for building
semantic search Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query. Semantic search seek ...
solutions. With its modular building blocks, software developers can implement pipelines to address various search tasks over large document collections, such as
question answering Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural l ...
,
document retrieval Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly natural language, unstructured text, such as newspaper articles, real estate records or paragraphs ...
or summarization. It integrates with Hugging Face Transformers,
Elasticsearch Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is dual-l ...
,
OpenSearch OpenSearch is a collection of technologies that allow the publishing of search results in a format suitable for syndication and aggregation. Introduced in 2005, it is a way for websites and search engines to publish search results in a standard ...
and others. The
framework A framework is a generic term commonly referring to an essential supporting structure which other things are built on top of. Framework may refer to: Computing * Application framework, used to implement the structure of an application for an op ...
has an active community on
GitHub GitHub, Inc. () is an Internet hosting service for software development and version control using Git. It provides the distributed version control of Git plus access control, bug tracking, software feature requests, task management, continuous ...
, where so far more than 140 people contributed to its continuous development and it also enjoys a vibrant community on
Meetup Meetup is a social media platform for hosting and organizing in-person and virtual activities, gatherings, and events for people and communities of similar interests, hobbies, and professions. It was founded in 2002 by Scott Heiferman and four ot ...
Thousands of organizations use the
framework A framework is a generic term commonly referring to an essential supporting structure which other things are built on top of. Framework may refer to: Computing * Application framework, used to implement the structure of an application for an op ...
, including Global 500 enterprises like
Airbus Airbus SE (; ; ; ) is a European Multinational corporation, multinational aerospace corporation. Airbus designs, manufactures and sells civil and military aerospace manufacturer, aerospace products worldwide and manufactures aircraft througho ...
, or
Infineon Infineon Technologies AG is a German semiconductor manufacturer founded in 1999, when the semiconductor operations of the former parent company Siemens AG were spun off. Infineon has about 50,280 employees and is one of the ten largest semicond ...
,
Alcatel-Lucent Enterprise ALE International SAS, trading as Alcatel-Lucent Enterprise, is a French software company headquartered in Colombes, France, providing communication equipment and services to telecommunications companies, ISPs and data providers. Since March 20 ...
, BetterUp, Etalab, and Sooth.ai. The deepset Cloud platform supports customers at building scalable NLP applications by covering the entire process of prototyping, experimentation, deployment, and monitoring. It is built on Haystack. FARM was a
framework A framework is a generic term commonly referring to an essential supporting structure which other things are built on top of. Framework may refer to: Computing * Application framework, used to implement the structure of an application for an op ...
for adapting representation models. One of its core concepts was the implementation of adaptive models, which comprised language models and an arbitrary number of prediction heads. FARM supported domain-adaptation and finetuning of these models with advanced options, for example gradient accumulation, cross-validation or automatic mixed-precision training. Its main features were integrated into Haystack in November 2021 and its development was discontinued at that time.


Funding

On April 28, 2022, deepset announced a Series A investment round of $14 million led by GV, with the participation of Harpoon Ventures, Acequia Capital and a team of experienced commercial
open source software Open-source software (OSS) is computer software that is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software and its source code to anyone and for any purpose. Open ...
and
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
founders, such as Alex Ratner (Snorkel AI),
Mustafa Suleyman Mustafa Suleyman (born August 1984) is the co-founder and former head of applied AI at DeepMind, an artificial intelligence company acquired by Google and now owned by Alphabet. His current venture is Inflection AI. Early life Suleyman's fath ...
(
Deepmind DeepMind Technologies is a British artificial intelligence subsidiary of Alphabet Inc. and research laboratory founded in 2010. DeepMind was List of mergers and acquisitions by Google, acquired by Google in 2014 and became a wholly owned subsid ...
), Spencer Kimball (
Cockroach Labs CockroachDB is a commercial distributed SQL database management system, developed by Cockroach Labs. History Cockroach Labs was founded in 2015 by ex-Google employees Spencer Kimball, Peter Mattis, and Ben Darnell. Cockroach Labs founders Kim ...
),
Jeff Hammerbacher Jeff Hammerbacher is a data scientist. He was chief scientist and cofounder at Cloudera and later served on the faculty of the Icahn School of Medicine at Mount Sinai. Early life Hammerbacher grew up in Fort Wayne, Indiana. His father worked at th ...
(
Cloudera Cloudera, Inc. is an American software company providing enterprise data management systems that make significant use of Apache Hadoop. As of January 31, 2021, the company had approximately 1,800 customers. History Cloudera, Inc. was formed on J ...
) and Emil Eifrem (
Neo4j Neo4j is a graph database management system developed by Neo4j, Inc. Described by its developers as an ACID-compliant transactional database with native graph storage and processing, Neo4j is available in a non-open-source "community edition" ...
). A previous pre-seed investment round of $1.6 million on March 8, 2021, was led by System.One and Lunar Ventures, who also participated in the subsequent Series A round.


References

{{reflist Natural language processing software Companies of Germany