Monitor workflow - Improve

Improve

This page describes the GitLab Improve workflow vision, part of our Monitor stage.

Why Improve?

Improve is the process of reviewing all events that happened around an incident (before, during, and after it) and identifying how to change processes, system behavior, and human behavior to prevent future incidents. Conducting an effective Post Incident Review requires preparation.

User Journey

Preparation

Preparing for a Post Incident Review begins during the Triage process, by documenting events and actions as they take place. A timeline captured in the moment makes the later review much more effective than one reconstructed from memory. In addition to capturing events and actions taken by team members, it is helpful to collect metric visualizations that show when and how the system changed at the time of the incident.

Post Incident Review

Effective Post Incident Reviews are blameless. State at the beginning of the review that everyone involved acted with good intent and made the best decisions they could with the information they had. Setting this tone at the start helps the team uncover all system flaws and potential improvements. The review then walks through the incident's event timeline, asking "why" at each step. The Five Whys method, an iterative interrogative technique used by the GitLab Infrastructure Team, helps uncover the true root cause.

Action Items

Once the root causes of all critical events during the incident have been uncovered and understood, the team brainstorms improvements that change or prevent those events. The goal is to keep the incident from happening again or, if a similar incident does occur, to have a better response plan ready. Action items should be written down, prioritized, and scheduled, and each one should be assigned a DRI (directly responsible individual) to ensure completion.

Follow-up

Action items are no good if the team does not follow up with each DRI to ask about progress and completion. Follow-up can happen during daily or weekly stand-ups.

Today

What's possible

We have not yet enabled the entire workflow detailed above. However, a couple of features are available today to simplify your Improve process: