Incident Management in a DevOps Environment

Blogs· 4min June 21, 2023

In this post, David discusses how Incident Management fits within the DevOps engineering culture. He shares how DevOps structures benefit the Incident Management process and how DevOps environments benefit the whole of the Service Delivery process.

Incident Management refresher

Incident Management in its shortest form is the team responsible for managing the formal response of the company when something goes wrong. When something is impacting our customers or we suspect it could, the Incident Management team hold the responsibility for ensuring our processes are followed and we respond in the best possible way.

If you're not familiar with the basics of Incident Management, make sure to check out David's previous blog post "What is the life of an Incident Manager?". It includes a quick introduction to the day-to-day life and responsibilities of Incident Managers.

Processes

In a more traditional model outside of the DevOps world, many teams can be siloed. They are in no way blocked from talking, but there is rarely a need to beyond a short handover of information. This can be a very effective method for ensuring responsibilities are clear and roles are firmly set. Its disadvantage, however, can be a lack of collaborative knowledge sharing.

In the DevOps world, the developers are much more involved in the live operational elements of the delivery of our services. Often the developers of the tool are the ones on-call at 3am when that tool needs support. It's not handed off to an operations team. This focus on keeping a close connection between Development and Operations leads to a much higher level of knowledge and expertise within the development teams of the incident process, and a much higher investment in the day-to-day business of the company. At Form3, we have a SecDevOps engineering culture, which is a variation of DevOps that integrates security into the development process.

DevOps' nature depends on innovative approaches to handling issues. DevOps is a culture that values openness, visibility, and quick learning: a culture of open, blameless communication between Development and Operations teams.

This can be difficult for those who have come from a more segmented framework background, with firm processes and procedures mixed with tasks and checklists. I know it was an adjustment for me when I first saw it! The basic purpose of DevOps is to give commercial value to a company by promoting open communication between operations teams and development teams, dismantling old organisational barriers and enhancing transparency.

DevOps & Incident Management

Now we get into the meat of this post! Where making formal changes to incident response processes can be a lengthy endeavour elsewhere, DevOps incident management processes reflect the development methodology and are ever changing to adapt to the business need.

DevOps is about a continuous and iterative approach, with speed and efficiency at its core. This same approach should be felt in Incident Management processes. Effective and efficient automation should not be confined to the Development process or to detection; it should be used throughout the Incident Management process. One of the most common delays in Incident resolution is bringing in the required expertise. Working with the Development team on structure and on-call automation can resolve this delay in an instant.
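As a rough illustration of on-call automation, the sketch below looks up who to page for an affected service. It is a minimal example under stated assumptions, not a description of Form3's tooling: the service names, team names, engineers, rota and the on_call_for function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Shift:
    engineer: str
    start: datetime  # inclusive
    end: datetime    # exclusive


UTC = timezone.utc

# Hypothetical ownership map: which development team owns which service.
SERVICE_OWNERS = {"payments-api": "payments", "notifications": "platform"}

# Hypothetical on-call rota per team.
ROTA = {
    "payments": [
        Shift("alice", datetime(2023, 6, 21, 0, tzinfo=UTC), datetime(2023, 6, 21, 12, tzinfo=UTC)),
        Shift("bob", datetime(2023, 6, 21, 12, tzinfo=UTC), datetime(2023, 6, 22, 0, tzinfo=UTC)),
    ],
    "platform": [
        Shift("carol", datetime(2023, 6, 21, 0, tzinfo=UTC), datetime(2023, 6, 22, 0, tzinfo=UTC)),
    ],
}


def on_call_for(service, at, owners=SERVICE_OWNERS, rota=ROTA, fallback="incident-manager"):
    """Return the engineer to page for `service` at time `at`.

    Falls back to the incident manager when the service has no known
    owner or no shift covers the given time.
    """
    team = owners.get(service)
    for shift in rota.get(team, []):
        if shift.start <= at < shift.end:
            return shift.engineer
    return fallback


# An alert on the payments service at 3am UTC pages its developer on-call.
print(on_call_for("payments-api", datetime(2023, 6, 21, 3, tzinfo=UTC)))  # -> alice
```

In this sketch the fallback keeps a person in the loop: when the ownership map or rota has a gap, the page goes to the incident manager rather than to nobody.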

Running effective Incident Management practices within a DevOps environment relies on the effective use of tooling, communication and positioning of resources. The biggest of those is truly communication. Incident Management are far more involved, not just in the changes being made at every moment of every day within Development, but in the work of the other Operations teams as well.

DevOps problem solving

The blameless culture of DevOps enables us to communicate better and more frequently, removing the rigid silo of teams while still maintaining separation of responsibilities. Collaboration is at the heart of continuous improvement. The blameless post-mortem is the link in the chain holding it all together. As such, everyone should be involved. All parties can offer a perspective on the incident and on the future changes needed for prevention. DevOps does not leave Root Cause and Continuous Improvement solely in the hands of a Problem Manager. While the Problem Manager manages the next step in the process, incident managers take on an investigative and advisory role after the urgency has subsided, assisting in problem management and prevention.

This shifting of teams into other areas' responsibilities relies on clear definitions of those responsibilities. Problem Management within DevOps can suffer the same fate as it can within traditional models when given less importance than Incident Management. It is less urgent but has vast potential for adding long-term value and improvement. Change Management can also be left in a state of stagnation, with approval processes and assessment methods not being updated.

Where DevOps models differ in this way is the involvement and investment of all parties in the overall goal and process through exposure. When an Incident Manager is involved in the Problem process, they appreciate and understand the steps needed and can improve the prior stages. Where Change Managers and Development are involved in both Incident and Problem stages, they can better understand the impact from the initial point. Equally the value is felt by Development in seeing how the products developed are being used and in getting valuable feedback on areas of improvement.

Conclusions

Here are some of the highlights we have covered in this post:

  • Incident Management in a DevOps environment comes with a wealth of benefits. Collaboration and knowledge sharing are at the heart of all aspects.
  • DevOps makes Incident Management processes part of the bigger picture, interlinked and involved in all aspects of the business, and equally does the same for all aspects of Development and Operations.
  • DevOps is not a magic button to solve all problems. It takes hard work and by its nature is never perfect. No system or framework is; DevOps companies know this and accept it.

Written by

David Macarthur, Incident Management Specialist

David is one of our Incident Managers at Form3. He has a focus on continual improvement in our processes across the whole company. He is also passionate about accessibility, diversity, inclusion and leadership.

You can find David on LinkedIn where he has several articles on various topics.