Founder and Principal
Brent Chapman is an expert in emergency management and in guiding organizations to prepare for and learn from emergencies, drawing on a strong background in IT infrastructure and site reliability engineering (SRE).
As a leader in Google’s legendary SRE organization, Brent convinced senior management of the need to strengthen and standardize the company’s incident management practices, and created the Incident Management at Google (IMAG) system that is now used throughout the company. He also helped refine the Postmortems at Google (PMAG) system that the company uses to learn from incidents large and small.
Brent brings a unique perspective to his work in IT, as a former air search and rescue pilot and incident commander, an emergency dispatcher and dispatch supervisor for major art & music festivals and events, and a Community Emergency Response Team (CERT) member and instructor.
Throughout his career, Brent has designed, built, managed, and scaled IT infrastructure and teams for everything from embryonic startups to giants such as Google, Apple, and Netflix. He is the coauthor of the highly regarded O’Reilly book Building Internet Firewalls, the developer of widely used open source software, and a popular speaker at conferences worldwide. He has worked with dozens of organizations both in Silicon Valley and around the world, as well as with a variety of non-profit and government entities.
Brent has a rare combination of experience as an emergency manager, technology manager, people manager, software developer, network/systems engineer, and educator. Now, he shares that expertise with clients worldwide as the founder and principal of Great Circle Associates, Inc. His extensive experience enables him to dive in quickly, assess a situation, and deliver results.
Outages and other IT emergencies are expensive in many different ways, including lost sales, lost productivity, damaged reputation, and damaged morale.
I’m an expert in site reliability engineering, which is about avoiding these problems in the first place. However, despite everyone’s best efforts, sometimes things still go wrong, so I specialize in incident management, which is about resolving these problems quickly and effectively when they do occur.
It’s essential to be prepared, and to learn from each incident so that you’re better prepared for next time.
I can guide your organization to develop these critical incident management capabilities.