Trino is an
open-source
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
distributed
SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources. Trino can query
datalakes that contain
open
Open or OPEN may refer to:
Music
* Open (band), Australian pop/rock band
* The Open (band), English indie rock band
* ''Open'' (Blues Image album), 1969
* ''Open'' (Gotthard album), 1999
* ''Open'' (Cowboy Junkies album), 2001
* ''Open'' (YF ...
column-oriented data file formats like
ORC
An Orc (or Ork) is a fictional humanoid monster like a goblin. Orcs were brought into modern usage by the fantasy writings of J. R. R. Tolkien, especially ''The Lord of the Rings''. In Tolkien's works, Orcs are a brutish, aggressive, ugly, a ...
or
Parquet residing on different storage systems like
HDFS
Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage a ...
,
AWS S3
Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon (company), Amazon.com u ...
,
Google Cloud Storage
Google Cloud Storage is a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure. The service combines the performance and scalability of Google's cloud with advanced security and sharing ...
, or
Azure Blob Storage using the
Hive
A hive may refer to a beehive, an enclosed structure in which some honey bee species live and raise their young.
Hive or hives may also refer to:
Arts
* ''Hive'' (game), an abstract-strategy board game published in 2001
* "Hive" (song), a 201 ...
and
Iceberg
An iceberg is a piece of freshwater ice more than 15 m long that has broken off a glacier or an ice shelf and is floating freely in open (salt) water. Smaller chunks of floating glacially-derived ice are called "growlers" or "bergy bits". The ...
table formats. Trino also has the ability to run federated queries that query tables in different data sources such as
MySQL
MySQL () is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database o ...
,
PostgreSQL
PostgreSQL (, ), also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. It was originally named POSTGRES, referring to its origins as a successor to the In ...
,
Cassandra
Cassandra or Kassandra (; Ancient Greek: Κασσάνδρα, , also , and sometimes referred to as Alexandra) in Greek mythology was a Trojan priestess dedicated to the god Apollo and fated by him to utter true prophecies but never to be believe ...
,
Kafka
Franz Kafka (3 July 1883 – 3 June 1924) was a German-speaking Bohemian novelist and short-story writer, widely regarded as one of the major figures of 20th-century literature. His work fuses elements of realism and the fantastic. It typ ...
,
MongoDB
MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and licensed under the Serve ...
and
Elasticsearch
Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is dual-l ...
. Trino is released under the
Apache License.
History
In January 2019, the original creators of
Presto, Martin Traverso, Dain Sundstrom, and David Phillips, created a
fork
In cutlery or kitchenware, a fork (from la, furca 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods ei ...
of the Presto project. They initially kept the name Presto and used the PrestoSQL web handle to distinguish it from the original PrestoDB project. Simultaneously, they announced the Presto Software Foundation. The foundation is a not-for-profit organization dedicated to the advancement of the Presto open source distributed SQL query engine.
In December 2020, PrestoSQL was rebranded as Trino. The Trino Software Foundation, code base, and all other PrestoSQL assets were renamed as part of the rebrand.
Presto and Trino were originally designed and developed by Martin, Dain, David, and Eric Hwang at
Facebook
Facebook is an online social media and social networking service owned by American company Meta Platforms. Founded in 2004 by Mark Zuckerberg with fellow Harvard College students and roommates Eduardo Saverin, Andrew McCollum, Dustin M ...
to allow data analysts to run interactive queries on its large
data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for Business reporting, reporting and data analysis and is considered a core component of business intelligence. DWs are central Repos ...
in
Apache Hadoop
Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage ...
. Trino shares the first six years of development with the Presto project.
To learn more about the earlier history of Trino, you can reference
the Presto history section.
Architecture

Trino is written in
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
.
It contains two types of nodes, a coordinator and a worker.
* The coordinator is responsible for parsing, analyzing, optimizing, planning, and scheduling a query submitted by a client. The coordinator interacts with the
service provider interface Service provider interface (SPI) is an API intended to be implemented or extended by a third party. It can be used to enable framework extension and replaceable components.
Details
From Java documentation:
The concept can be extended to other pla ...
(SPI) to obtain the available tables, table statistics, and other information needed to carry out its tasks.
* The workers are responsible for executing the tasks and operators fed to them by the scheduler. These tasks process rows from the data sources which produce results that are returned to the coordinator and ultimately back to the client.
Trino adheres to the
ANSI
The American National Standards Institute (ANSI ) is a private non-profit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States. The organi ...
SQL standard and includes various parts of the following ANSI specifications:
SQL-92
SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it was a major revision of the standard. Aside from a few minor incompatibilities, the SQL-89 standard is forward-compatible with SQL-92.
The standard specificatio ...
,
SQL:1999,
SQL:2003,
SQL:2008,
SQL:2011,
SQL:2016.
Trino supports the separation of compute and storage
and may be deployed both on-premises and in the
cloud
In meteorology, a cloud is an aerosol consisting of a visible mass of miniature liquid droplets, frozen crystals, or other particles suspended in the atmosphere of a planetary body or similar space. Water or various other chemicals may co ...
.
Trino has a
Distributed computing
A distributed system is a system whose components are located on different computer network, networked computers, which communicate and coordinate their actions by message passing, passing messages to one another from any system. Distributed com ...
MPP
MPP or M.P.P. may refer to:
* Marginal physical product
* Master of Public Policy, an academic degree
* Member of Provincial Parliament (Ontario), Canada
* Member of Provincial Parliament (Western Cape), South Africa
* ''Merriweather Post Pavilion ...
architecture.
Trino first distributes work over multiple workers by running ad-hoc partitioning operations or relying on existing partitions in the data of the underlying data store. Once this data has reached the worker, the data is processed over pipelined operators carried out on multiple threads.
See also
*
Presto (SQL query engine)
Presto (including PrestoDB, and PrestoSQL which was re-branded to Trino) is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Allu ...
*
Big data
Though used sometimes loosely partly because of a lack of formal definition, the interpretation that seems to best describe Big data is the one associated with large body of information that we could not comprehend when used only in smaller am ...
*
Data Intensive Computing
Data-intensive computing is a class of parallel computing applications which use a data parallel approach to process large volumes of data typically terabytes or petabytes in size and typically referred to as big data. Computing applications which ...
*
Apache Drill
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's D ...
*
Computer cluster
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.
The comp ...
References
{{Reflist
External links
Trino Software Foundation (formerly Presto Software Foundation)
SQL
Free system software
Hadoop
Cloud platforms
Java platform