Incident Management Practice
What is incident management practice?
Incident management practice is the process of identifying and resolving unplanned incidents (often referred to as major incidents by teams following ITIL or ITSM practices).
Types of incidents vary – from unplanned customer issues and service interruptions to events that degrade service quality. Some incidents may have a widespread impact on an entire user base (e.g., when a website crashes), while others may impact a handful of users. Incident management also entails mitigating issues before they impact users.
The incident is considered resolved when the impacted service returns to its intended functionality.
Why does incident management matter?
No one wishes for bad things to happen, but when they do, organizations need a plan to mobilize and get things back to center fast. Should things go sideways, a clearly defined incident management process enables an organization to take the proper steps to respond efficiently and resolve the issue as quickly as possible.
On the digital transformation blog, Erika Flora writes: “The incident management practice is about ensuring that when things fail, we have a solid way of getting our customers back to ‘normal’ quickly. It’s worth noting that ‘normal’ does not mean diagnosing and implementing the final, permanent solution, but rather getting our customers to a state where they can continue their work.”
Think of incident management as a team of firefighters poised to respond to fires and put them out as quickly and efficiently as possible. Moreover, no two fires are exactly the same: some are large, some are small, and some are complex. The firefighters address them all.
But why not just avoid fires in the first place? Sure, obvious best practices can and should be in place to prevent fires. In our fire analogy, the equivalent would be not playing with matches or leaving a candle burning unsupervised. But what about the fires beyond our control, like a lightning strike or malfunctioning electrical wiring?
The costs of interruptions and downtime
Service outages, interruptions, and reduced quality all bring a heavy cost to an organization. If handled poorly, customer loyalty and trust immediately suffer, leading to churn and losses that directly impact the bottom line.
Here are a few well-known examples that illustrate the true cost of downtime:
- A 12-hour downtime for the Apple Store in 2015 cost the company an estimated $25 million.
- A 5-hour downtime in an operation center in 2016 cost Delta Air Lines nearly $150 million due to 2,000+ canceled flights.
- A 14-hour downtime for Facebook in 2019 cost the company just shy of $90 million.
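To put these figures on a common footing, the short sketch below divides each estimated loss by the outage duration to get a rough cost per hour. It is only an illustration: the names `OUTAGES`, `cost_per_hour`, and `hourly_costs` are made up for this example, and "nearly $150 million" and "just shy of $90 million" are rounded to 150 and 90 million.

```python
# Estimated losses quoted in the list above. The Delta and Facebook figures
# are rounded assumptions ("nearly" / "just shy of"), not exact values.
OUTAGES = [
    ("Apple Store, 2015", 12, 25_000_000),
    ("Delta, 2016", 5, 150_000_000),
    ("Facebook, 2019", 14, 90_000_000),
]


def cost_per_hour(total_cost: float, hours: float) -> float:
    """Return the average cost of one hour of downtime."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return total_cost / hours


def hourly_costs(outages):
    """Map each outage name to its average hourly cost."""
    return {name: cost_per_hour(cost, hours) for name, hours, cost in outages}


if __name__ == "__main__":
    for name, rate in hourly_costs(OUTAGES).items():
        print(f"{name}: about ${rate / 1_000_000:.1f} million per hour")
```

Under these rounded assumptions, the averages work out to roughly $2.1 million, $30 million, and $6.4 million per hour, respectively.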
Forrester’s survey of IT directors in large U.S. enterprises indicates that the cost of downtime comes from lost revenue (53%), lost productivity (47%), and lost brand equity or trust (41%). In terms of customer churn, “One in three customers will leave a brand they love after just one bad experience, while 92% would completely abandon a company after two or three negative interactions,” according to PwC’s Future of CX report.
Surprisingly, incidents that interrupt or degrade service quality are far from uncommon.
Because incidents are more common than one might assume and tend to carry significant costs, creating an effective incident management practice that enables quick response and resolution should be a top priority.
How to improve your incident management practice
Optimizing your incident management practice can help minimize the impact on customers and your organization. Here are 9 best practices from the Invensis blog for fine-tuning your process:
- Accurately define the incident (e.g., urgency, impact, and severity).
  “An issue can cause a huge business impact on several users. Thus, it is essential to categorize the issue as a significant incident.”
- Implement workflows that lead to a fast resolution.
  “Implementing a dynamic work process encourages you to re-establish a disrupted service rapidly.”
- Match the right resources for the incident.
  “Ensure that your best resources are implemented to work on significant incidents.”
- Provide the proper training and tools to the incident management team.
  “A major incident can occur at your IT, yet the initial step to taking care of it is being prepared.”
- Keep stakeholders up to date.
  “Ensure that the stakeholders are kept informed about the incident management throughout the life cycle of significant incidents.”
- Connect incidents with ITIL processes.
  “Once the major incidents are resolved, perform a root cause analysis by utilizing problem management strategies.”
- Grow your knowledge base.
  “Articulate an information base editorial template that captures critical details.”
- Look for opportunities to improve.
  “Document and analyze all major incidents with the goal that you can distinguish the areas to improve.”
- Document processes for continual service improvement.
  Documenting major incident processes “can help rectify flaws and serve for continual service improvement.”
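The first practice above, defining an incident by its impact and urgency, is often handled with a simple priority matrix. The sketch below is one possible way to express that idea, not a method taken from the Invensis blog: the `Level` scale, the `Incident` class, the 1-to-5 priority formula, and the rule that only priority 1 counts as a major incident are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; lower number means more serious."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


@dataclass(frozen=True)
class Incident:
    title: str
    impact: Level   # how many users or services are affected
    urgency: Level  # how quickly the effect becomes unacceptable

    @property
    def priority(self) -> int:
        # Assumed matrix: HIGH/HIGH gives 1 (most urgent), LOW/LOW gives 5.
        return int(self.impact) + int(self.urgency) - 1

    @property
    def is_major(self) -> bool:
        # Assumption: only priority 1 is treated as a major incident.
        return self.priority == 1


if __name__ == "__main__":
    outage = Incident("Website down for all users", Level.HIGH, Level.HIGH)
    glitch = Incident("Report export slow for one team", Level.MEDIUM, Level.LOW)
    for incident in (outage, glitch):
        print(incident.title, incident.priority, incident.is_major)
```

Keeping the matrix in one place means the same definition can drive later steps, such as which workflow to start or which stakeholders to notify. A real team would tune the scale and thresholds to its own service commitments.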