
Checkmk is software developed in Python and C++ for IT infrastructure monitoring. It is used to monitor servers, applications, networks, cloud infrastructures (public, private and hybrid), containers, storage, databases and environmental sensors. Checkmk is available in three editions: an open source edition ("Checkmk Raw Edition – CRE"), a commercial enterprise edition ("Checkmk Enterprise Edition – CEE") and a commercial edition for managed service providers ("Checkmk Managed Services Edition – CME"). These editions are available for a range of platforms, in particular various versions of Debian, Ubuntu, SLES and Red Hat/CentOS, and also as a Docker image. In addition, physical appliances of various sizes as well as a virtual appliance are offered; they simplify administration of the underlying operating system through a graphical user interface and enable high-availability setups. The agents that Checkmk uses to collect data are available for 11 platforms, including Windows.


History

Checkmk originated in 2008 as an agent-substituting shell script for inetd, and was published in April 2009 under the GPL. It was initially based on Nagios and extended it with a number of new components. The open source edition (Checkmk Raw Edition) continues to be based on the Nagios core and bundles it with additional open source components into a complete system. Over many years, Checkmk's commercial editions have evolved into a self-contained monitoring system that has replaced all of the essential Nagios components with its own, including its own monitoring core. The majority of the developments for the commercial editions, in particular all plug-ins, are also available in the Checkmk Raw Edition. While Checkmk was originally designed for monitoring large and heterogeneous on-premises environments, from version 1.5 (1.5p12) onward it also supports monitoring of AWS, Azure, Docker and Kubernetes services. Checkmk is developed by tribe29 GmbH in Munich, Germany, which until 16 April 2019 operated under the name Mathias Kettner GmbH. Together with the company name change, the product name "Check_MK" was changed to "Checkmk". tribe29 GmbH follows an open core business model. The open source edition is available under various open source licenses, mostly GPLv2, while large parts of the commercial editions are licensed under the proprietary "Checkmk Enterprise License".


The Product

Checkmk combines three types of IT monitoring:
* Status-based monitoring, which uses thresholds to record the "health" of a device or application.
* Metric-based monitoring, which enables the recording and analysis of time series graphs using an HTML5-based graphing system. An integration with Grafana is available as well.
* Log-based and event-based monitoring, in which key events can be filtered out and actions can be triggered based on these events.

To provide broad coverage, Checkmk currently ships more than 1,700 plug-ins in each edition, all of which are licensed under GPLv2. These plug-ins are maintained as part of the product and are regularly supplemented with new plug-ins or extensions. Existing legacy Nagios plug-ins can be connected as well.

To simplify setup and operation, all components of Checkmk are delivered fully integrated. A rule-based 1:n configuration and a high degree of automation significantly accelerate workflows. This includes:
* Auto-discovery of hosts (where applicable)
* Auto-discovery of services
* Automated configuration of plug-ins via preconfigured thresholds and rules
* Automated agent updates (a CEE feature)
* Automatic and dynamic configuration that enables the monitoring of volatile services with a lifespan of just a few seconds, such as in Kubernetes environments (starting from CEE v1.6)
* Automated discovery of tags and labels from sources such as Kubernetes, AWS and Azure (starting from CEE v1.6)

In addition, there are playbooks for configuration and deployment tools such as Ansible or Salt.

Checkmk is often used in very large distributed environments where a high number of sites (e.g. 300 locations of Faurecia) and/or well over 100,000 devices (e.g. Edeka) are monitored. This is possible, among other things, because Checkmk's microcore consumes far fewer CPU resources than, for example, Nagios, and therefore offers significantly higher performance on the same hardware. Furthermore, non-persistent data is held in RAM, which significantly improves access time.
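The threshold logic behind status-based monitoring can be illustrated with a small sketch. It assumes simple upper warning/critical levels and the Nagios-style state codes (0 OK, 1 WARN, 2 CRIT, 3 UNKNOWN); the helper name `evaluate_upper_levels` is hypothetical and not part of Checkmk's plug-in API.

```python
from enum import IntEnum
from typing import Optional


class State(IntEnum):
    """Nagios-style state codes, as also used for Checkmk services."""
    OK = 0
    WARN = 1
    CRIT = 2
    UNKNOWN = 3


def evaluate_upper_levels(value: Optional[float], warn: float, crit: float) -> State:
    """Map a measured value to a state using upper warning/critical levels.

    Hypothetical helper: values at or above `crit` are CRIT, values at or
    above `warn` are WARN, everything else is OK. A missing measurement
    (None) yields UNKNOWN.
    """
    if value is None:
        return State.UNKNOWN
    if value >= crit:
        return State.CRIT
    if value >= warn:
        return State.WARN
    return State.OK
```

For example, a filesystem at 85.5 % usage with levels of 80/90 % would be evaluated as `State.WARN` by `evaluate_upper_levels(85.5, warn=80.0, crit=90.0)`.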


Components


Monitoring core

Checkmk RE uses the Nagios monitoring core. It does not offer container monitoring and requires a restart of the core to apply configuration changes. The commercial Checkmk editions use the proprietary "Checkmk Microcore" (CMC), written in C++. It performs better than the Nagios core used by Checkmk RE, supports recording of objects with a short lifespan, such as containers, and does not require a restart to apply configuration changes.


Configuration & Check Engine

Checkmk performs its own service discovery and generates its own configuration. It also uses its own method for carrying out the checks: in each check interval, every host is contacted only once, and the results are transmitted to the monitoring core as passive checks. This significantly improves performance both on the monitoring server and on the monitored hosts.

Checkmk uses different methods to access the data on the target systems. These include agents installed on the target system, "special agents" that run on the monitoring server and communicate with the API of the target system, SNMP for monitoring, for example, network devices and printers, and HTTP/TCP for communicating with web and internet services. By default, Checkmk follows the "pull principle": the monitoring system explicitly queries the data, so that it quickly notices when a system suddenly fails and does not respond to a query. Alternatively, a "push" mode can be configured in which the system sends its data directly to Checkmk or to an intermediate host.
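Contacting each host only once works because a single agent response carries the data for many checks at once, split into sections. The sketch below assumes the plain-text layout in which each section starts with a header such as `<<<df>>>`, and splits a response into sections of whitespace-separated rows; the function name is hypothetical, and real agent output has features (separators, piggyback data) that this simplification ignores.

```python
import re
from typing import Dict, List, Optional

# A section header looks like "<<<df>>>"; options after a colon, as in
# "<<<df:sep(9)>>>", are accepted but ignored by this sketch.
SECTION_HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>$")


def parse_agent_output(text: str) -> Dict[str, List[List[str]]]:
    """Split one agent response into {section name: list of whitespace-split rows}."""
    sections: Dict[str, List[List[str]]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        match = SECTION_HEADER.match(stripped)
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        elif current is not None and stripped:
            # Data rows belong to the most recent header; lines before the
            # first header and empty lines are skipped.
            sections[current].append(stripped.split())
    return sections
```

Each section can then be handed to the check plug-in responsible for it, so one round trip to the host feeds all of its services.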


Data Interface ("Livestatus")

Livestatus is the main interface in Checkmk. It provides live access to all data from the monitored hosts and services. The data is read directly from RAM, which avoids slow hard disk access and gives fast access to the information without placing much load on the system. Access uses a simple text-based protocol and is possible from all programming languages without requiring a special library.
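Because the protocol is plain text, a query needs nothing beyond the standard library. The sketch below assumes the Livestatus query language (a `GET` line plus header lines such as `Columns:`, `Filter:` and `OutputFormat:`, terminated by a blank line) sent over a UNIX socket; the helper names are hypothetical, and the socket path of a real site should be taken from that site's documentation.

```python
import json
import socket
from typing import Any, Iterable, List


def build_lql_query(table: str, columns: Iterable[str], filters: Iterable[str] = ()) -> str:
    """Assemble a Livestatus query; a blank line terminates the request."""
    lines = [f"GET {table}"]
    cols = list(columns)
    if cols:
        lines.append("Columns: " + " ".join(cols))
    lines.extend(f"Filter: {f}" for f in filters)
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"


def query_livestatus(socket_path: str, query: str) -> Any:
    """Send the query over a UNIX socket and decode the JSON response."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # tell the server the request is complete
        chunks: List[bytes] = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

A call might look like `query_livestatus(path, build_lql_query("hosts", ["name", "state"], ["state != 0"]))`, where `path` is the site's Livestatus socket (commonly `tmp/run/live` inside the site directory, though this is an assumption to verify per installation).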


Web-GUI ("Multisite")

Multisite is Checkmk's web GUI. In addition to fast-loading pages, it offers user-definable views and dashboards, distributed monitoring by integrating multiple monitoring instances via Livestatus, integration of NagVis, an integrated LDAP connection, access to status data via web services, and much more. Dashboards and views can be tailored to different users or groups of users, for example vSphere-specific views for VMware admins. The web GUI is available in English and German.


Setup

Checkmk can be administered entirely in the browser through its Setup module. This includes managing users, roles, groups, time periods and more. Permissions can be granted at a fine-grained level using a role concept, and existing role-based access controls from LDAP or Active Directory can be reused for this. Checkmk's configuration is rule-based, so it remains manageable even in complex environments and requires little effort. Automatic discovery and configuration, as well as automatic agent updates, further accelerate the configuration process. An HTTP API can also be used to integrate CMDBs for faster configuration.
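The idea of a rule-based 1:n configuration is that one rule sets a parameter for every host it matches, instead of configuring each host individually. The sketch below is a hypothetical model of that idea, assuming rules are conditioned on host tags or explicit host names and that the first matching rule wins; `Rule` and `effective_value` are illustrative names, and Checkmk's actual rule semantics (some rulesets merge all matching rules) may differ.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: applies `value` to hosts carrying all `required_tags`."""
    value: Any
    required_tags: FrozenSet[str] = frozenset()
    host_names: Optional[FrozenSet[str]] = None  # None means "any host"

    def matches(self, host: str, tags: FrozenSet[str]) -> bool:
        if self.host_names is not None and host not in self.host_names:
            return False
        return self.required_tags <= tags


def effective_value(rules: Iterable[Rule], host: str, tags: Iterable[str], default: Any) -> Any:
    """First matching rule wins; fall back to a default when none matches."""
    tag_set = frozenset(tags)
    for rule in rules:
        if rule.matches(host, tag_set):
            return rule.value
    return default
```

Ordering the rules from most to least specific (here, production database hosts before all production hosts) lets a handful of rules cover thousands of hosts.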


Alert System

Several notification channels can be set up and configured with different rules for each user. For example, emails can be sent at any time of day, while SMS notifications are sent only for important issues during on-call hours. Notifications can be set for all teams or for specific ones, e.g. notifying only the storage admins about a failed hard drive. Duplicate notifications are merged so that no user is notified twice through the same channel. Users can also configure their own notifications. In distributed environments, alerts can be managed centrally. For detected issues, actions can be triggered automatically via scripts (alert handlers). Checkmk includes integrations with email and SMS gateways as well as with communication and IT service management solutions such as Slack, Jira, PagerDuty, OpsGenie, VictorOps and ServiceNow.
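The interplay of per-user rules, team filters, on-call restrictions and duplicate suppression can be sketched as follows. The rule fields and function names are hypothetical and do not reflect Checkmk's actual notification rule format; the severity order (OK < WARN < UNKNOWN < CRIT) is likewise an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Assumed ordering from least to most severe: OK < WARN < UNKNOWN < CRIT.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


@dataclass(frozen=True)
class NotificationRule:
    """Hypothetical per-user rule; not Checkmk's actual rule format."""
    user: str
    channel: str                # e.g. "email" or "sms"
    min_state: int              # least severe state that triggers this rule
    team: Optional[str] = None  # None means "any team"
    on_call_only: bool = False  # only fire during on-call hours


def plan_notifications(rules: List[NotificationRule], state: int, team: str,
                       on_call: bool) -> List[Tuple[str, str]]:
    """Return the unique (user, channel) pairs to notify for one problem."""
    seen: Set[Tuple[str, str]] = set()
    plan: List[Tuple[str, str]] = []
    for rule in rules:
        if rule.team is not None and rule.team != team:
            continue
        if rule.on_call_only and not on_call:
            continue
        if SEVERITY[state] < SEVERITY[rule.min_state]:
            continue
        key = (rule.user, rule.channel)
        if key not in seen:  # several matching rules, one message per channel
            seen.add(key)
            plan.append(key)
    return plan
```

In this model, a user matched by two rules for the same channel still receives only one message, mirroring the duplicate merging described above.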


Business Intelligence

The BI module is integrated into the graphical user interface. Using rules, it aggregates the states of many individual hosts and services into the overall status of business processes, reflecting their dependencies on complex applications and IT infrastructure elements. It can also be used to represent applications made up of microservices, which in turn consist of Kubernetes pods and deployments. In addition, worst-case scenarios can be simulated in real time, and historical data can be analyzed to understand the causes of performance degradation.
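A simple way to picture the aggregation is a tree whose leaves are hosts or services and whose inner nodes take the worst state of their children; forcing selected nodes into a given state then acts as a what-if simulation. This is a hypothetical sketch assuming a pure "worst state" aggregation and the severity order OK < WARN < UNKNOWN < CRIT; Checkmk BI supports other aggregation functions (for example for redundant components) that are not modelled here.

```python
from typing import Iterable, List, Mapping, Optional

# Assumed severity ranking for "worst state": OK < WARN < UNKNOWN < CRIT.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def worst_state(states: Iterable[int]) -> int:
    """Return the most severe state; an empty input counts as OK (0)."""
    return max(states, key=SEVERITY.__getitem__, default=0)


def aggregate(node: str, tree: Mapping[str, List[str]], leaf_states: Mapping[str, int],
              overrides: Optional[Mapping[str, int]] = None) -> int:
    """Compute a node's state as the worst state of its children.

    `overrides` forces chosen nodes into a given state, which simulates a
    what-if scenario without touching the real states.
    """
    overrides = overrides or {}
    if node in overrides:
        return overrides[node]
    children = tree.get(node)
    if not children:  # leaf: an individual host or service
        return leaf_states.get(node, 3)  # a leaf without data is UNKNOWN
    return worst_state(aggregate(child, tree, leaf_states, overrides) for child in children)
```

Calling `aggregate("webshop", tree, states, overrides={"db01/MySQL": 2})` would answer the question "what happens to the web shop if the database fails?" without changing any real monitoring data.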


Event Console

The Event Console integrates the processing of log messages and SNMP traps into the monitoring. It is configured via a flexible set of rules that decide whether incoming messages are discarded or how they are classified. It can count, correlate and rewrite messages, expect messages to arrive, and more. Similar entries can be grouped into a single event (e.g. multiple failed logins) to keep track of them. It also has a built-in syslog daemon that receives messages directly on port 514, and an SNMP trap receiver that receives traps on port 162.
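Rule-based classification and the grouping of similar messages can be sketched with regular expressions. The `EventRule` structure and both functions are hypothetical illustrations, not the Event Console's rule format; the failed-login pattern assumes OpenSSH-style syslog lines ("Failed password for ...").

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class EventRule:
    """Hypothetical rule: a regex plus what to do with matching messages."""
    pattern: re.Pattern
    drop: bool = False
    state: int = 0  # state assigned to a kept event (1 = WARN, 2 = CRIT)


def classify(message: str, rules: List[EventRule]) -> Optional[int]:
    """Apply the first matching rule; return the event state or None if discarded."""
    for rule in rules:
        if rule.pattern.search(message):
            return None if rule.drop else rule.state
    return None  # unmatched messages are discarded in this sketch


FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+)")


def group_failed_logins(messages: Iterable[str], threshold: int) -> Dict[str, int]:
    """Count failed logins per user and keep users at or above `threshold`."""
    counts = Counter(m.group(1) for m in map(FAILED_LOGIN.search, messages) if m)
    return {user: n for user, n in counts.items() if n >= threshold}
```

Grouping by user turns a burst of individual syslog lines into one event per account, which is easier to track than the raw messages.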


Metrics Graphing

The commercial Checkmk editions use their own metrics and graphing system. Time series metrics can be analysed over long intervals using interactive HTML5 graphs, with a maximum resolution of one second. Data can be imported from a variety of data sources and metric formats (JSON, XML, SNMP etc.) and stored on disk for long-term retention. Alternatively, Graphite or InfluxDB can be connected via an export interface. From CEE version 1.5p16, a plug-in is also available for passing data directly from Checkmk to Grafana for visualization. The Checkmk Raw Edition currently uses PNP4Nagios as its graphing system.
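To illustrate what an export to Graphite involves, the sketch below produces lines in Graphite's plaintext protocol (`<metric path> <value> <timestamp>`, one per line, usually sent to TCP port 2003). The path layout `prefix.host.service.metric` and the function names are hypothetical choices for this example; Checkmk's own exporter may name metrics differently.

```python
import re
import socket
from typing import Iterable, Tuple


def sanitize(part: str) -> str:
    """Replace characters that would break a dotted Graphite path."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", part)


def graphite_lines(prefix: str, host: str, service: str,
                   metrics: Iterable[Tuple[str, float, int]]) -> str:
    """Render (name, value, unix_timestamp) tuples as plaintext-protocol lines."""
    out = []
    for name, value, ts in metrics:
        path = ".".join(sanitize(p) for p in (prefix, host, service, name))
        out.append(f"{path} {value} {ts}\n")
    return "".join(out)


def send_to_graphite(payload: str, host: str = "localhost", port: int = 2003) -> None:
    """Push the rendered lines to a Graphite (carbon) plaintext listener."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload.encode("ascii"))
```

Sanitizing each path component matters because dots in host names such as `web01.example.com` would otherwise create extra levels in the Graphite hierarchy.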


Reporting

Reporting enables the direct delivery of PDF reports, either ad hoc or automatically at regular intervals. It includes availability analysis, in which the history of states over any desired time period can be displayed with a click. Availability calculations can exclude unmonitored times, adjust the resolution, or ignore short intervals. In addition to availability calculations, reporting includes SLA reporting, with which complex SLAs can be monitored. Reporting is only available in the commercial editions of Checkmk.
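An availability figure that excludes unmonitored time and ignores short interruptions can be computed from a list of state intervals. This is a simplified, hypothetical sketch: it counts non-OK intervals shorter than a threshold as OK, whereas Checkmk's own handling of short intervals may instead attribute them to the surrounding state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    start: float          # seconds since some reference point
    end: float
    state: Optional[int]  # 0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN, None = not monitored

    @property
    def duration(self) -> float:
        return self.end - self.start


def availability(intervals: List[Interval], ignore_shorter_than: float = 0.0) -> float:
    """Percentage of monitored time spent in OK state.

    Unmonitored intervals are excluded from the total; non-OK intervals
    shorter than `ignore_shorter_than` seconds are counted as OK.
    """
    monitored = 0.0
    ok = 0.0
    for iv in intervals:
        if iv.state is None:
            continue
        monitored += iv.duration
        if iv.state == 0 or iv.duration < ignore_shorter_than:
            ok += iv.duration
    return 100.0 * ok / monitored if monitored else 0.0
```

With two hours of monitored time containing a 30-second CRIT blip, the sketch reports about 99.58 % availability, or 100 % if blips under one minute are ignored.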


Hardware/Software Inventory

The hardware/software inventory can be used, for example, to track hardware and software changes, to verify that security updates are installed, and to update static data with dynamic parameters (for example, updating current disk usage statistics from monitoring data). The configuration management database (CMDB) i-doit has a deep integration that enables the exchange of CMDB data with monitoring data.
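Tracking hardware and software changes boils down to comparing two inventory snapshots. The sketch assumes a hypothetical flattened representation with dotted keys such as `software.packages.openssl`; Checkmk's real inventory is a nested tree, so this only illustrates the comparison step.

```python
from typing import Any, Dict, Tuple


def diff_inventory(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for added, removed or changed entries.

    A missing entry is reported as None on the side where it is absent.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes
```

An entry such as `("software.packages.openssl", ("3.0.2", "3.0.13"))` in the result would, for instance, indicate that a package update has been installed between two inventory runs.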


See also

* Comparison of network monitoring systems




External links

* Official website: https://checkmk.com/
* Computer monitoring with the Open Monitoring Distribution (Kelvin Vanderlip, 2012-03-01)
* Using the Open Monitoring Distribution (Nagios) to Monitor Complex Hardware/Software Systems (Joe VanAndel, 2012-03-29)