Artificial Intelligence For IT Operations

Artificial Intelligence for IT Operations (AIOps) is a term coined by Gartner in 2016 as an industry category for machine learning analytics technology that enhances IT operations analytics. AIOps is an abbreviation of "Artificial Intelligence Operations". The operations tasks involved include automation, performance monitoring and event correlation, among others. There are two main aspects of an AIOps platform: machine learning and big data. Collecting the observational data and engagement data held inside a big data platform requires a shift away from sectionally segregated IT data, so a holistic machine learning and analytics strategy is implemented against the combined IT data. The goal is to enable IT transformation and to receive continuous insights that drive continuous fixes and improvements via automation. This is why AIOps can be viewed as CI/CD for core IT functions. Given the inherent nature of IT operations, which is closely tied to cloud deployment and the management of distributed applications, AIOps has increasingly led to the coalescence of machine learning and cloud research.


Process

The normalized data can be processed by machine learning algorithms that automatically reduce noise and identify the probable root cause of incidents. The main output of this stage is the detection of abnormal behavior by users, devices or applications. Noise reduction can be done by various methods, but most of the research in the field points to the following actions:
# Analysis of all incoming alerts;
# Removal of duplicates;
# Identification of false positives;
# Early anomaly, fault and failure (AFF) detection and analysis.
Anomaly detection, another step in any AIOps process, is based on analysis of the past behavior of users, equipment and applications. Anything that strays from that behavior baseline is considered unusual and flagged as abnormal. Root cause determination is usually done by passing incoming alerts through algorithms that take into account correlated events as well as topology dependencies. The algorithms underlying an AIOps platform can be influenced directly, essentially by "training" them.
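The steps described above can be illustrated with a minimal Python sketch. It is not taken from any particular AIOps product: the alert fields (`source`, `message`), the z-score threshold and the dependency map are all hypothetical, and a simple statistical baseline stands in for the trained models a real platform would use. The sketch removes duplicate alerts, flags a metric value that strays from its historical baseline, and narrows a set of alerting components down to those with no alerting dependency of their own.

```python
from statistics import mean, stdev


def deduplicate(alerts):
    """Noise reduction: keep the first alert for each (source, message) pair.

    The 'source' and 'message' keys are assumed field names.
    """
    seen = set()
    unique = []
    for alert in alerts:
        key = (alert["source"], alert["message"])
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def is_anomalous(history, value, threshold=3.0):
    """Anomaly detection: compare a new value with the past behavior baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations from the mean of `history` (a plain z-score test, chosen
    here only for illustration).
    """
    if len(history) < 2:
        return False  # not enough past behavior to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def probable_root_causes(alerting, depends_on):
    """Root cause heuristic using topology dependencies.

    `depends_on` maps a component to the components it depends on.
    An alerting component is reported as a probable root cause when
    none of its direct dependencies is alerting as well.
    """
    alerting = set(alerting)
    return sorted(
        node
        for node in alerting
        if not any(dep in alerting for dep in depends_on.get(node, ()))
    )
```

For instance, with a hypothetical topology in which `web-1` depends on `api-1` and `api-1` depends on `db-1`, alerts on all three components are reduced to `db-1` as the single probable root cause, because it is the only alerting component without an alerting dependency.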


Use

An important use of AIOps platforms is the analysis of large and unconnected datasets, such as the Johns Hopkins COVID-19 data published through GitHub. The data in this example is pulled from a large number of un-normalized databases: aggregated data (10 sources), US regional data (113 sources) and non-US data (37 sources), which traditional analysis models cannot process within the response time an emergency requires. Generally, the main areas of use for AIOps platforms and principles are (UPC.edu, "Top 10 Artificial Intelligence Trends in 2019"):
* Automation of tasks (DevOps)
* Machine learning platforms
* Augmented reality
* Agent-based simulations
* Internet of things (IoT)
* AI Optimized Hardware
* Natural language generation
* Streaming data platforms
* Conversational BI and analytics
* Deployment and integration testing
* System configuration
* Service quality monitoring and anomaly detection
* Resource scheduling and optimization
* Capacity/workload management and prediction
* Hardware/software failure prediction
* Auto-diagnosis and problem localization
* Incident management
* Auto service healing
* Data center management
* Customer support
* Security
* Privacy

