Free Tech Talks on Incident Management
Would your team or professional group like a free tech talk on incident management? I deliver a limited number of these 1-hour talks every month to companies and user groups. Join the growing list of organizations that have benefited from these free talks, including LinkedIn, Atlassian, Pivotal, Okta, and BayLISA!
Learning from the Fire Department:
Experiences with Incident Command for IT
Google, Slack, PagerDuty, and many other leading companies have developed successful incident management practices based on the public safety world’s Incident Command System (ICS).
Brent Chapman presents a talk that offers key lessons learned by those organizations (and a few war stories!), including:
- Incident response is a critical IT capability
- It pays to explicitly distinguish between “normal” and “emergency” operations
- ICS principles apply well to IT incidents
- Simple modifications to public safety ICS practices make them better suited to IT needs
- Certain communications tools are more effective for incident response
- Checklists are very powerful and under-appreciated tools
- Blameless postmortems are a key to improving incident response
- Senior managers can inadvertently disrupt incident response just by showing up
This talk targets IT professionals of all types (including developers, sysadmins, DBAs, DevOps engineers, test/QA engineers, managers, and executives) as well as related specialists such as product managers, program managers, project managers, and customer/user support staff. Anyone involved in or affected by incident management will likely find this talk interesting and valuable.
Brent delivered a fascinating talk that sparked a number of important followup conversations within and between our teams, easily a top 5 guest tech talk.
Schedule your free tech talk today!
I am an expert in emergency management for IT services, guiding companies to prevent, prepare for, respond to, and learn from emergencies. I work from a strong background in IT infrastructure, site reliability engineering (SRE), and public safety emergency management.
Slack recruited me to lead incident response and incident management for their Engineering organization and the company as a whole. I designed and built Slack’s incident management capabilities; kept incident management running smoothly day-to-day; helped the company learn from, prevent, and prepare for incidents; and shared Slack’s incident management story with Slack’s customers and the industry.
As a leader in Google’s legendary SRE organization, I convinced senior management of the need to strengthen and standardize the company’s incident management practices, and created the Incident Management at Google (IMAG) system used throughout the company. I also helped refine the Postmortems at Google (PMAG) system that the company uses to learn from incidents large and small.
I bring a unique perspective to my work in IT, as a former air search and rescue pilot and incident commander, an emergency dispatcher and dispatch supervisor for major art & music festivals and events, and a Community Emergency Response Team (CERT) member and instructor.
Throughout my career, I have designed, built, managed, and scaled IT infrastructure and teams for everything from embryonic startups to giants such as Google, Apple, and Netflix. I have worked with dozens of organizations in Silicon Valley and globally, and with various non-profit and government entities. I am the co-author of the highly regarded O’Reilly book Building Internet Firewalls, the developer of widely used open-source software, and a sought-after speaker at conferences worldwide.
I have a rare combination of experience as an emergency manager, technology manager, people manager, software developer, network/systems engineer, and educator. My extensive experience enables me to quickly and effectively dive in, assess a situation, and deliver results.