Site Reliability

Our top priority is service reliability

Reliability is everyone’s responsibility. It cannot be “bolted on” to a product or service after the fact. Reliability is the foundation on which quality solutions are built and delivered.

We will also never be “done” with reliability. Computers fail in unexpected and sometimes spectacular ways, so we must sustain our energy and discipline in our reliability efforts. When something fails, we have an obligation to examine the failure in detail and find ways to prevent it from recurring.

An important aspect of reliability is dealing with fragility in real time. When problems occur, we must have a defined process for responding promptly and effectively. Every problem will be unique in some way, but the process of identifying the problem, forming hypotheses about potential fixes, and implementing those fixes safely will be largely the same.