Great Circle Waypoints Blog

Items of interest to Great Circle clients and friends

When The Boss should not be The Boss

Incident Commanders serve a crucial role in protecting company operations, so clearly they should be drawn from senior management due to their authority and experience, right? Wrong.

When I see clients whose pool of Incident Commanders is largely made up of senior managers, directors, and even vice presidents, I am impressed by their commitment to incident response but worried about their results.

[…]


Why is it important to separate the Incident Commander and Tech Lead roles?

There are many roles in high-tech incident response, such as incident commander, tech lead, communications lead, subject matter expert, and so forth. Individuals often fill multiple roles simultaneously, especially in the early stages of an incident; generally, this is OK, and particular roles can be handed off to other individuals as more people join the response. However, in my experience with incident response at Google and elsewhere, having one person trying to act as both the incident commander (IC) and the tech lead (TL) is a recipe for trouble. […]


What’s the most interesting question in a blameless postmortem?

“How did we get lucky?”

I find that this is often the most interesting section of an incident postmortem. In other words, what might have happened, but didn’t? What could have happened that would have been worse? Incidents often open your eyes to new and frightening possibilities that you hadn’t previously considered, and the postmortem is a good place to explore them.


How often should your engineers be on call?

In one of the Slack channels that I frequent, someone recently asked what a reasonable duty cycle is for engineers in a 24/7 on-call rotation with a single-digit number of pages per week. In other words, under those circumstances, is it reasonable for a given individual to be on call one week in three, one week in four, or what? At least according to Google SRE, it’s a lot less often than that. […]


How to improve your incident response times

How do you measure and improve the effectiveness of your incident responses? You can start by looking at the times associated with your responses. You can set targets for these times, and evaluate how well a given incident response met your targets. Over multiple incidents, you may be able to identify trends, and take steps to tune your response methods based on those trends. In this blog post, learn what the key times are, and how to improve each of them. […]



Great Circle Associates, Inc.

www.greatcircle.com
info@greatcircle.com
International: +1 415 861 3588
USA Toll Free: 877 GRT CRCL
