Event Management (ITIL)

Event Management, as defined by ITIL (Information Technology Infrastructure Library), is the process that monitors all events that occur throughout the IT infrastructure. It allows for normal operation while detecting changes of state and escalating exception events and other priorities. An event can be defined as any detectable or discernible occurrence that has significance for the management of the IT infrastructure or the delivery of an IT service, together with the evaluation of the impact a deviation might cause to the services. Events are typically notifications created by an IT service, Configuration Item (CI) or monitoring tool.


Purpose/scope

* The purpose is to provide the ability to detect events, investigate them and determine the correct control action.
* Events (warnings and exceptions) can be used to automate many routine activities.
* Event Management can be applied to any aspect of Service Management that can be controlled and automated (Configuration Items).
* It provides mechanisms for early detection of incidents.
* Some types of automated activities can be monitored by exception, reducing downtime.


Event handling


Event notification and detection

Event notifications can be proprietary, in which case only certain management tools can detect them. Most Configuration Items (CIs) generate event notifications using the open SNMP protocol (Simple Network Management Protocol).
The CIs are configured to generate a set of events based on the designer's experience.
Once an event notification has been generated, it is detected (read and interpreted) by the specific tool.


Event filtering

Filtering decides whether an event notification is ignored or communicated to the management tool. If ignored, the event will usually be recorded in a log file on the device, but no further action will be taken.
During the filtering step, the event is assigned a level of correlation (type: informational, warning, or exception).
The filtering step is not always mandatory: some CIs have significant events that are communicated directly to the management tool, even if they are duplicated.
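
The filtering decision described above can be sketched in a few lines of Python. This is an illustration only: the `Event` class, the `classify` keyword rule, the `filter_event` function and the `always_forward` set of CI names are hypothetical and are not defined by ITIL. The sketch also assumes that informational events are the ones ignored by the filter, which a real tool would make configurable.

```python
from dataclasses import dataclass


@dataclass
class Event:
    ci: str          # name of the Configuration Item that raised the event
    message: str     # free-text notification
    level: str = ""  # assigned during filtering


def classify(message: str) -> str:
    """Toy rule (an assumption): derive the level from keywords in the message."""
    text = message.lower()
    if "down" in text or "failed" in text:
        return "exception"
    if "threshold" in text:
        return "warning"
    return "informational"


def filter_event(event, device_log, always_forward=frozenset()):
    """Return the event if it should reach the management tool, else None."""
    event.level = classify(event.message)
    if event.ci in always_forward:
        # Some CIs bypass filtering and report directly to the management tool.
        return event
    if event.level == "informational":
        # Ignored events are still recorded in the device's own log.
        device_log.append(event)
        return None
    return event
```

For example, `filter_event(Event("web-1", "web server down"), [])` returns the event with its level set to `"exception"`, while a `"user login"` message is appended to the device log and `None` is returned.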


Significance of event

Standard categorization based on the significance of an event:
* Informational (INFO): the event does not require any immediate action and does not represent an exception. Such events are recorded in the log files and kept for a predetermined period. This type of event is used to check the status of a device or service, to confirm the state of an activity, or to generate statistics (user login, batch job completed, device power-up, number of users logged into an application).
* Warning (WARN / ALERT): the event is generated when a device or service (application or utility) is approaching an agreed threshold (KPI). Warnings notify the responsible group, process or tool so that it can take the necessary actions to prevent an exception from occurring.
* Exception (ERROR): a service or device is currently operating below the predefined normal parameters or indicators. This means that the business service is impacted and the device or service shows a failure, performance degradation or loss of functionality (web server down, CS coverage lost for several sites). A device failure is an error.
The following is not an event type but an analysis that can be carried out on the event logs:
* Trend analysis: the event logs should be analyzed regularly for event patterns (INFO, WARN, ALERT, ERROR) that may indicate an underlying Problem, which can then be addressed in advance of a serious service disruption.
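
As a rough illustration of the categorization and of trend analysis, the following Python sketch classifies a metric reading against an agreed limit and then looks for CIs that warn repeatedly. The function names, the `warn_fraction` margin of 0.9 and the `min_count` cut-off are assumptions made for the example, and the metric is assumed to be one where higher values are worse (such as disk usage in percent).

```python
from collections import Counter


def categorize(value, agreed_limit, warn_fraction=0.9):
    """Map a metric reading to INFO, WARN or ERROR.

    Assumption: WARN starts at warn_fraction * agreed_limit, i.e. when the
    reading is approaching the agreed threshold; ERROR starts at the limit.
    """
    if value >= agreed_limit:
        return "ERROR"
    if value >= agreed_limit * warn_fraction:
        return "WARN"
    return "INFO"


def recurring_warnings(event_log, min_count=3):
    """Trend analysis: list CIs with at least min_count WARN entries.

    event_log is a list of (ci_name, level) tuples.
    """
    counts = Counter(ci for ci, level in event_log if level == "WARN")
    return sorted(ci for ci, n in counts.items() if n >= min_count)
```

A CI that appears in the result of `recurring_warnings` has not failed yet, but the repeated warnings are the kind of pattern that may point to an underlying Problem.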


Response

At this point in the process, a number of response options are available, including:
* Event logging: regardless of the event type, it is good practice to record the event and the actions taken. The event can be logged as an Event Record or left as an entry in the system log of the device.
* Alert and human intervention: events that require human intervention need to be escalated. The purpose of the alert is to notify the correct resource (person) to handle the event.
* Incident Record: an incident can be generated when an exception is detected.
* RFC (Request for Change): an RFC can be raised in two scenarios:
** For an exception (for example, two new network devices have been added without the necessary authorization)
** For a change (for example, a server needs to be upgraded in order to prevent a file system failure; it may take a while for the change to take effect)
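
One possible way to picture how these response options combine is the small Python sketch below. It is a simplification under stated assumptions: `choose_responses`, its flags and the response labels are hypothetical names, and real tools usually drive this selection from configurable rules rather than fixed code.

```python
def choose_responses(level, needs_human=False, change_required=False):
    """Return the list of responses to trigger for one event.

    level is "informational", "warning" or "exception".
    needs_human and change_required are hypothetical flags that a
    correlation rule would set; they are not part of ITIL itself.
    """
    responses = ["log"]  # every event is recorded, whatever its type
    if needs_human:
        responses.append("alert")     # notify the person who must act
    if level == "exception":
        responses.append("incident")  # exceptions can open an Incident Record
    if change_required:
        responses.append("rfc")       # e.g. unauthorized device, preventive upgrade
    return responses
```

For instance, a warning that a file system is filling up and needs a server upgrade would yield `["log", "rfc"]`, while an informational event yields only `["log"]`.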


Close event

* Events that generated an incident, problem or change should be formally closed with a link to the appropriate record from the other process.
* Informational events are simply logged and then used as input to other processes, such as Backup and Storage Management.
* Auto-response events will typically be closed by the generation of a second event.
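
The two closure paths for non-informational events can be sketched as follows. The `EventRecord` fields and both function names are hypothetical; the sketch assumes that a "second event" closes an open event when it refers to the same CI and the same condition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventRecord:
    event_id: str
    ci: str                               # Configuration Item concerned
    condition: str                        # e.g. "link-down"
    status: str = "open"
    linked_record: Optional[str] = None   # incident, problem or change reference
    closed_by: Optional[str] = None       # id of the event that cleared this one


def close_with_link(record, linked_record):
    """Formally close an event and keep the link to the other process's record."""
    record.linked_record = linked_record
    record.status = "closed"


def close_by_second_event(records, ci, condition, clearing_event_id):
    """Close open auto-response events cleared by a second event.

    Returns the records that were closed.
    """
    closed = []
    for record in records:
        if record.status == "open" and record.ci == ci and record.condition == condition:
            record.status = "closed"
            record.closed_by = clearing_event_id
            closed.append(record)
    return closed
```

Keeping `linked_record` on the closed event preserves the trail from the original notification to the incident, problem or change that handled it.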


See also

* Information Technology Infrastructure Library
* Incident management (ITSM)

