Glossary Term

Incident management

Incident management is an IT practice designed to restore normal service as quickly as possible after an unplanned IT disruption.

By IT Brew Staff

Definition:

Incident management is an IT practice designed to restore normal service as quickly as possible after an unplanned IT disruption, with the ultimate goal of minimizing business impact.

Key Takeaways

Incident management involves a set of structured processes designed to quickly identify, log, categorize, and resolve IT service disruptions. Incident management pros are expected to return service to an acceptable level of quality.

Incident management is often the responsibility of IT operations (ITOps) and DevOps teams. IT service management (ITSM) frameworks such as ITIL define a set of routines for handling unexpected incidents. At its core, incident management is reactive and features the following steps:

  • Incident identification: finding the error
  • Incident logging: recording the error
  • Incident containment: deciding how to quickly prevent the problem from becoming worse, especially for incidents such as a DDoS attack
  • Incident diagnosis: determining exactly what is going on
  • Incident resolution: rectifying the core issue and restoring service
  • Incident review: communicating the particulars with other teams to reduce the chances of the underlying issues recurring
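
To make the ordering of the steps above concrete, here is a minimal Python sketch that models an incident moving through them one at a time. The `Stage` and `Incident` names and the `advance` method are hypothetical, chosen for illustration only; real ITSM tools and ITIL guidance define their own states and allow more flexible transitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names mirroring the six steps listed above.
    IDENTIFIED = 1
    LOGGED = 2
    CONTAINED = 3
    DIAGNOSED = 4
    RESOLVED = 5
    REVIEWED = 6


@dataclass
class Incident:
    summary: str
    stage: Stage = Stage.IDENTIFIED
    history: list = field(default_factory=list)

    def advance(self, note: str) -> Stage:
        """Move to the next stage in order and record a timestamped note."""
        if self.stage is Stage.REVIEWED:
            raise ValueError("incident has already been reviewed")
        self.stage = Stage(self.stage.value + 1)
        self.history.append((datetime.now(timezone.utc), self.stage, note))
        return self.stage


# Example: walk a made-up incident through the first two transitions.
incident = Incident("Checkout API returning errors")
incident.advance("Ticket opened by on-call engineer")
incident.advance("Traffic shifted away from the failing node")
print(incident.stage.name)
```

Forcing a strict order is a simplification. In practice, teams often contain and diagnose in parallel, so treat this as an illustration of the sequence rather than a prescribed workflow.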

As a best practice, teams involved in incident management should maintain clear lines of communication with impacted parties, which means relying heavily on tools such as alert systems, incident trackers, and video and text chat. A status page that can be quickly spun up to provide updates to internal and external stakeholders can also prove valuable.
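
As a rough illustration of the status-page idea, the sketch below renders a single update as a timestamped line of text that could be posted for stakeholders. The function name `format_status_update`, its parameters, and the output format are all assumptions made for this example and do not correspond to any particular status-page product.

```python
from datetime import datetime, timezone


def format_status_update(service: str, state: str, message: str, when: datetime) -> str:
    """Render one status-page entry as a single plain-text line (hypothetical format)."""
    # Normalize to UTC so readers in different time zones see one reference time.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"[{stamp}] {service}: {state.upper()} - {message}"


# Example with made-up values.
update = format_status_update(
    "Checkout API",
    "investigating",
    "We are seeing elevated error rates.",
    datetime.now(timezone.utc),
)
print(update)
```

Keeping each update short and timestamped makes it easy to reuse the same text across chat, email, and the status page itself.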