Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database (allowing for high dimensionality) built using an HTTP pull model, with flexible queries and real-time alerting.
The project is written in Go and licensed under the Apache License 2.0, with source code available on GitHub.
History
Prometheus was developed at SoundCloud starting in 2012, when the company discovered that its existing metrics and monitoring tools (using StatsD and Graphite) were insufficient for its needs. Specifically, the developers identified needs that Prometheus was built to meet, including a multi-dimensional data model, operational simplicity, scalable data collection, and a powerful query language, all in a single tool.
The project was open-source from the beginning and began to be used by Boxever and Docker users as well, despite not being explicitly announced. Prometheus was inspired by the monitoring tool Borgmon used at Google.
By 2013, Prometheus had been introduced for production monitoring at SoundCloud. The official public announcement was made in January 2015.
In May 2016, the Cloud Native Computing Foundation (CNCF) accepted Prometheus as its second incubated project, after Kubernetes. In August 2018, the CNCF announced that the Prometheus project had graduated.
Versions
Prometheus 1.0 was released in July 2016. Subsequent versions were released through 2016 and 2017, leading to Prometheus 2.0 in November 2017.
Architecture
A typical monitoring platform with Prometheus is composed of multiple tools:
* Multiple ''exporters'' typically run on the monitored host to export local metrics.
* Prometheus to centralize and store the metrics.
* ''Alertmanager'' to trigger alerts based on those metrics.
* ''Grafana'' to produce dashboards.
* ''PromQL'', the query language used to create dashboards and alerts.
Data storage format
Prometheus data is stored in the form of metrics, with each metric having a name that is used for referencing and querying it. Each metric can be broken down by an arbitrary number of key=value pairs (labels). Labels can include information on the data source (which server the data is coming from) and other application-specific breakdown information such as the HTTP status code (for metrics related to HTTP responses), query method (GET versus POST), endpoint, etc. The ability to specify an arbitrary list of labels and to query based on these in real time is why Prometheus' data model is called multi-dimensional.
Prometheus stores data locally on disk, which allows fast storage and fast querying. Metrics can also be stored in remote storage.
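The multi-dimensional model described above can be illustrated with a minimal sketch. It is not Prometheus code: the `Series` class, the `select` helper, and the metric and label names are all hypothetical, chosen only to show how a name plus a label set identifies a series and how label matchers narrow a selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Series:
    """One time series: a metric name plus a set of key=value labels."""
    name: str
    labels: frozenset  # frozenset of (key, value) pairs

    @classmethod
    def of(cls, name, **labels):
        return cls(name, frozenset(labels.items()))


def select(series_list, name, **matchers):
    """Return the series with the given name whose labels include every matcher."""
    wanted = set(matchers.items())
    return [s for s in series_list if s.name == name and wanted <= s.labels]


# Hypothetical series for an HTTP service; names and label values are illustrative.
all_series = [
    Series.of("http_requests_total", instance="web-1", method="GET", status="200"),
    Series.of("http_requests_total", instance="web-1", method="POST", status="500"),
    Series.of("http_requests_total", instance="web-2", method="GET", status="200"),
]

# Slice along one dimension, then along two dimensions at once.
get_series = select(all_series, "http_requests_total", method="GET")
web1_errors = select(all_series, "http_requests_total", instance="web-1", status="500")
```

Each additional label is another dimension along which the same metric can be filtered, which is the sense in which the data model is called multi-dimensional.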
Data collection
Prometheus collects data in the form of time series. The time series are built through a pull model: the Prometheus server queries a list of data sources (sometimes called exporters) at a specific polling frequency. Each data source serves the current values of its metrics at the endpoint queried by Prometheus. The Prometheus server then aggregates data across the data sources. Prometheus has a number of mechanisms to automatically discover resources that should be used as data sources.
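As a rough sketch of the pull model, the following code stands in for one scrape cycle. It makes several assumptions: each target is a plain Python callable rather than a real HTTP endpoint, the text format is a simplified `name{label="value"} number` line without comments, escapes, or timestamps, and the names `parse`, `scrape_all`, and the target addresses are hypothetical.

```python
import re

# Simplified line pattern: metric_name{label="value",...} number
# Real exposition text has more features, which this sketch ignores.
LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Parse simplified exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def scrape_all(targets, now):
    """Pull the current values from every target and stamp them with the scrape time."""
    store = []
    for instance, fetch in targets.items():
        for name, labels, value in parse(fetch()):
            labels = {**labels, "instance": instance}
            store.append((name, labels, now, value))
    return store


# Hypothetical targets: each callable stands in for an HTTP GET of a metrics endpoint.
targets = {
    "web-1:9100": lambda: 'http_requests_total{method="GET"} 1027\nup 1',
    "web-2:9100": lambda: 'http_requests_total{method="GET"} 3',
}

samples = scrape_all(targets, now=1700000000)
```

Calling `scrape_all` repeatedly at a fixed interval, each time with a new timestamp, would build up one time series per distinct combination of metric name and labels.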
PromQL
Prometheus provides its own query language, PromQL (Prometheus Query Language), which lets users select and aggregate data. PromQL is designed to work with a time series database and therefore provides time-related query functionality. Examples include the rate() function, the instant vector, and the range vector, which can provide many samples for each queried time series. Prometheus has four clearly defined metric types around which the PromQL components revolve. The four types are:
* Gauge
* Counter
* Histogram
* Summary
Example code
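The difference between an instant selection and a range selection can be sketched in a few lines. This is an illustration of the concepts only, not PromQL itself and not how Prometheus evaluates queries: the sample data, the function names, and the window boundaries are assumptions, and `simple_rate` deliberately omits counter-reset handling and extrapolation.

```python
# Samples for one hypothetical counter series, as (unix_time, value) pairs.
samples = [(0, 100.0), (15, 130.0), (30, 160.0), (45, 220.0), (60, 250.0)]


def instant(samples, at):
    """Instant selection: the most recent sample at or before the evaluation time."""
    candidates = [s for s in samples if s[0] <= at]
    return candidates[-1] if candidates else None


def range_vector(samples, at, window):
    """Range selection: every sample in the interval (at - window, at]."""
    return [s for s in samples if at - window < s[0] <= at]


def simple_rate(samples, at, window):
    """Per-second increase over the window, from the first and last samples.

    Simplification: no counter-reset handling and no extrapolation to the
    window edges.
    """
    points = range_vector(samples, at, window)
    if len(points) < 2:
        return None
    (t0, v0), (t1, v1) = points[0], points[-1]
    return (v1 - v0) / (t1 - t0)


latest = instant(samples, at=50)                         # one sample per series
window_points = range_vector(samples, at=60, window=45)  # many samples per series
per_second = simple_rate(samples, at=60, window=45)
```

A counter only ever increases, so a per-second rate over a window is usually more informative than its raw value; a gauge, by contrast, can be read directly from an instant selection.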
Alerts and monitoring
Alert rules are configured in Prometheus. Each rule specifies a condition that must hold for a specific duration before the alert triggers. When alerts trigger, they are forwarded to the Alertmanager service. Alertmanager can include logic to silence alerts and to forward them to email, Slack, or notification services such as PagerDuty. Some other messaging systems, such as Microsoft Teams, can be configured using the Alertmanager Webhook Receiver as a mechanism for external integrations. In addition, Prometheus Alerts can be used to receive alerts directly on Android devices without any target configuration in Alertmanager.
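The "condition held for a duration" rule can be sketched as a small state machine. This is an illustrative model under stated assumptions rather than Prometheus's rule evaluator: the function name, the threshold, the 120-second hold time, and the readings are all hypothetical.

```python
def alert_states(observations, threshold, hold_for):
    """Yield (time, state) for each observation.

    observations: (unix_time, value) pairs in time order.
    The condition is value > threshold; the alert fires once the condition
    has held continuously for at least hold_for seconds.
    """
    active_since = None
    for t, value in observations:
        if value > threshold:
            if active_since is None:
                active_since = t  # condition just became true
            state = "firing" if t - active_since >= hold_for else "pending"
        else:
            active_since = None  # any break resets the timer
            state = "inactive"
        yield t, state


# Hypothetical error-rate readings taken every 60 seconds.
readings = [
    (0, 0.01), (60, 0.09), (120, 0.12), (180, 0.02),
    (240, 0.08), (300, 0.08), (360, 0.10), (420, 0.11),
]
states = list(alert_states(readings, threshold=0.05, hold_for=120))
```

In this run the first excursion above the threshold ends before 120 seconds have passed, so it never leaves the pending state; only the second, sustained excursion reaches firing. Requiring a hold time in this way filters out short spikes before anything is forwarded for notification.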
Time Series Database
Prometheus has its own implementation of a time series database. It stores recent data (1-3 hours of data by default) in a combination of memory and mmap-ed files from disk, and persists older data in the form of blocks with an inverted index. An inverted index is well suited to the Prometheus data format and querying patterns. As part of background maintenance, smaller blocks are merged into bigger blocks in a process called compaction, which improves query efficiency by leaving fewer blocks to read. Prometheus also uses a write-ahead log (WAL) to provide durability against crashes.
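To see why an inverted index fits label-based queries, consider the following minimal sketch. It is not the on-disk format Prometheus uses: the `LabelIndex` class, the integer series IDs, and the label values are hypothetical, and the index lives entirely in memory.

```python
from collections import defaultdict


class LabelIndex:
    """Inverted index from (label, value) pairs to the IDs of series carrying them."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, series_id, labels):
        """Register a series under each of its label pairs."""
        for pair in labels.items():
            self.postings[pair].add(series_id)

    def lookup(self, **matchers):
        """Intersect the postings of every requested label pair."""
        sets = [self.postings.get(pair, set()) for pair in matchers.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


index = LabelIndex()
index.add(1, {"method": "GET", "status": "200"})
index.add(2, {"method": "POST", "status": "500"})
index.add(3, {"method": "GET", "status": "500"})

get_ids = index.lookup(method="GET")
get_errors = index.lookup(method="GET", status="500")
missing = index.lookup(method="PUT")
```

A query with several label matchers becomes an intersection of small ID sets, so the cost depends on how many series match each label rather than on the total number of series stored.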
Dashboards
Prometheus is not intended as a full-fledged dashboard. Although it can graph specific queries, it needs to be connected to Grafana to generate dashboards; this has been cited as a disadvantage because of the additional setup complexity.
Interoperability
Prometheus favors white-box monitoring. Applications are encouraged to publish (export) internal metrics to be collected periodically by Prometheus. Some exporters and agents for various applications are available to provide metrics. Prometheus supports some monitoring and administration protocols to allow interoperability for transitioning: Graphite, StatsD, SNMP, JMX, and CollectD.
Prometheus focuses on the availability of the platform and basic operations. The metrics are typically stored for a few weeks. For long-term storage, the metrics can be streamed to remote storage.
Standardization into OpenMetrics
There is an effort to promote the Prometheus exposition format into a standard known as OpenMetrics. Some products have adopted the format: InfluxData's TICK suite, InfluxDB, Google Cloud Platform, DataDog and New Relic.
See also
* Check MK
* Ganglia (software)
* Zabbix
* Comparison of network monitoring systems
* List of systems management systems
External links
* Prometheus: The Documentary on YouTube (video ID rT4fJNbfe14)