Was your last outage like Apollo 13, or more like Titanic?

Was your last outage a triumph, or a tragedy? Was it like Apollo 13, or more like Titanic?

Are you ready for your next outage? Are you prepared to respond quickly, smoothly, effectively, and efficiently? Users expect uptime, all the time, and it’s critical to your success. Whether it’s a massive web service like Google Search, or the file server in your 6-person office, if the service is unavailable, nothing else about it matters.

Mastering outages makes a critical difference by reducing downtime and disruption. This protects your users, your reputation, your staff, and your bottom line.

Preparation is key to mastering outages. Is your organization prepared to respond effectively to outages, or do you simply react haphazardly? A quick, smooth, effective, and efficient response keeps an emergency from becoming a crisis, and reliably solves the problem with minimum fuss and disruption.

Learn how to reduce downtime and the impact of outages on your company, customers, and staff at our Mastering Outages 1-day class in the San Francisco Bay Area on Friday 18 May 2018.

Sign up now at greatcircle.com/class.

A limited number of discounted Early Bird seats are available now, and you can get a further $100 off with code “Apollo13”.

Can’t make this date? Join our list to hear about future dates and locations for this class.

Interested in consulting and in-house training for your company? Questions about this class? Want to suggest your city for a future class? Contact us!

Mastering Outages full-day class scheduled

I’m excited to share that I just signed the venue contract for a full-day “Mastering Outages” public class on incident management, here in the SF Bay Area on Friday 18 May 2018. I’ll be posting full details shortly, as I get everything else set up, but you should join my mailing list so that you don’t miss any announcements (or discounts!)…

Why does incident management matter?

Effective incident management matters because it both reduces downtime for your service and reduces the impact of dealing with that downtime on your staff.

If you’re a service provider, uptime is critical for your success. Whether it’s a massive web service like Google Search, or the print service in your 6-person office: if the service is not available, nothing else about it matters. Your users expect uptime, and they notice downtime.

Besides the direct impact of outages on users, outages also have a huge impact on your staff and management. Outages are disruptive to your organization’s ongoing work, whatever that might be. Projects and other deliverables are delayed when staff are interrupted to deal with outages, and both the quality of their work and their own quality of life suffer.

In the worst cases, organizations risk entering a toxic cycle, as the resources involved in dealing with outages are drawn away from other work, which in turn causes delays, missed deadlines, and more stress for all concerned. In the wake of each outage, the work that was neglected while dealing with the outage still needs to get done and therefore becomes more urgent, while work to identify and address the root causes of the outage gets short shrift. The root causes remain unaddressed, which in turn leads to more outages, thus perpetuating the toxic cycle.

There are two basic ways to reduce the impact of downtime: reduce the number of outages, and reduce the time and disruption of dealing with outages. You reduce the number of outages by making the service more resilient through architectural and infrastructure changes, and you reduce both the duration and impact of each outage by improving your incident management practices.

By adopting proven incident management methods, you can both shorten the duration and user impact of outages and reduce the disruptions that outages inflict on your own staff and their ongoing development work.


I’ll be presenting a 3-hour workshop on incident management at the USENIX SREcon18 Americas conference in Silicon Valley on 27 March 2018:

Incident Command for IT—What We’ve Learned from the Fire Department

Don’t delay in registering for the conference; it has sold out each of the past several years, often before the Early Bird registration deadline arrives! This conference is one of my favorites every year, and I highly recommend it.

I’ve also got an extended full-day version of this training available; contact me if you’d be interested in a private presentation for your organization.

Finally, I’m working on scheduling a public presentation of this full-day extended training for mid-May in San Francisco. If you want to be notified when that is scheduled, please sign up for our mailing list.