Mastering Outages – One‑day Class
Friday 18 May 2018, San Francisco Bay Area
How do you prevent an emergency from becoming a crisis?
Effective Incident Management
Incident Management is Critical for Effective Operations
You do your best to avoid problems and outages, but sometimes things go wrong, and your organization will be judged by how well you respond. When things go wrong, incident management is the key to a response which is quick, smooth, effective, and efficient.
Does your organization respond quickly, and immediately get to work on solving the problem? Or are your responses slow and disorganized?
Does your organization respond smoothly when a response is needed? Or are your responses disjointed and haphazard?
Does your organization respond effectively, and reliably get the problem solved with a minimum of fuss and bother? Or are your responses unpredictable and erratic?
Does your organization respond efficiently, by involving only folks who can actually help solve the problem? Or do your responses disrupt your entire operation?
Brent can help your organization strengthen this critical capability
Preparation and Continuous Improvement are the Keys to Effective Incident Management
Is your organization prepared to respond?
Who is going to respond? How are they going to be alerted to respond? What steps are they going to take when alerted? How can they enlist help from others if needed? How are all the responders going to organize and coordinate their activities? How are they going to communicate with each other, and with stakeholders beyond the response? How do you scale up and scale down the response, as the situation unfolds? How do you wrap up the response and return to normal operations?
How does your organization improve your responses, over time?
What information do you need to gather during a response, in order to conduct a useful critique of the response after it is concluded? What worked? What didn’t? What lucky breaks did you catch? What could have gone wrong, but didn’t? What needs to be addressed before next time? How do you conduct a blameless postmortem, to examine both the original problem (why was a response needed?), and the response itself?
Public Safety Agencies Understand Effective Incident Management
Every day, public safety agencies (fire, medical, law enforcement, etc.) manage a wide range of emergencies, large and small. They have developed an effective, standardized way of managing those emergencies called the Incident Command System (ICS). Great Circle principal and founder Brent Chapman has many years of experience with ICS in his public safety work. Brent has worked with leading companies such as Google to develop and refine successful incident management practices based on ICS.
Bring Proven, Effective Incident Management to Your Organization
Through Great Circle’s incident management consulting and training services
Great Circle can strengthen your organization’s incident management capabilities. We can help you develop the capabilities you need to quickly, effectively, and efficiently respond to incidents.
- Conduct a gap analysis of your incident management capabilities:
  - What incident management capabilities do you need?
  - What incident management capabilities do you already have?
  - How do you get from where you are, to where you want to be?
- Develop and refine your organization’s incident management plan
- Train your staff in incident management principles and practices
- Educate your senior management about their role in effective incident management (i.e., how best to support and enable it, rather than inadvertently disrupt it)
- Identify incident management tools and technology that are compatible with your organization’s environment and culture
- Create, conduct, and critique incident management exercises
- Facilitate a blameless postmortem review of a past incident
Our one-day Incident Management Tutorial is an effective way to bring your team up to speed on incident management, and get everyone onto the same page. The tutorial covers:
- The basic principles of Incident Command
- How to prepare for an effective response
- How to launch and manage an effective response
- How to evolve your response on the fly, scaling it up and down, as both the situation and your resources change
- How to communicate effectively among responders
- How to communicate beyond the responders, to management, customers, investors, regulators, the public, and others
- When and how to conclude a response and return to normal operations
- How to follow up effectively with a blameless postmortem
In addition, the tutorial includes:
- A guided incident management exercise, so that participants can practice applying these principles and methods
- Discussions of how best to adapt and apply these lessons to your organization
Bring Effective Incident Management to your organization
I was the lucky beneficiary of Brent’s work on Incident Management at Google. His leadership and direct effort resulted in nothing short of a full reformulation of this key competency of SRE. […] Through Brent’s work and training we were able to build structure and automation around this critical area of expertise, which has resulted in significant reductions in MTTR and improvements in the org’s ability to learn and grow from its service interruptions (e.g. through postmortems).
Marc Alvidrez
Brent Chapman is an expert at emergency management, and at guiding organizations to prepare for and learn from emergencies, working from a strong background in IT infrastructure and site reliability engineering (SRE).
As a leader in Google’s legendary SRE organization, Brent convinced senior management of the need to strengthen and standardize the company’s incident management practices, and created the Incident Management at Google (IMAG) system that is now used throughout the company. He also helped refine the Postmortems at Google (PMAG) system that the company uses to learn from incidents large and small.
Brent brings a unique perspective to his work in IT, as a former air search and rescue pilot and incident commander, an emergency dispatcher and dispatch supervisor for major art & music festivals and events, and a Community Emergency Response Team (CERT) member and instructor.
Throughout his career, Brent has designed, built, managed, and scaled IT infrastructure and teams for everything from embryonic startups to giants such as Google, Apple, and Netflix. He is the coauthor of the highly regarded O’Reilly book Building Internet Firewalls, the developer of widely used open source software, and a popular speaker at conferences worldwide. He has worked with dozens of organizations both in Silicon Valley and around the world, as well as with a variety of non-profit and government entities.