Site Reliability Engineering (SRE) originated at Google, where a team was tasked with making the company's already highly reliable services even more reliable. It is a discipline that applies software engineering practices to infrastructure and operations problems. Its main goal is to create scalable and highly reliable software systems.

Key Principles of SRE

Reliability First

The foremost goal is to ensure that services are reliable and available to users. However, SREs recognize that striving for 100% availability is often not the best use of resources. Instead, they focus on ensuring that a service meets its Service Level Objective (SLO), a reliability target deliberately set below 100% (for example, 99.9% availability).
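
To make the gap between an SLO and 100% concrete, the following minimal sketch converts an availability target into the downtime it permits. The function name `allowed_downtime_minutes`, the 30-day window, and the sample targets are illustrative assumptions, not values from the original text.

```python
def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability SLO permits in a window.

    `slo` is a fraction such as 0.999; the 30-day default is an assumption.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    # Whatever the SLO does not promise is the tolerated unavailability.
    return (1.0 - slo) * window_minutes


if __name__ == "__main__":
    # Hypothetical targets, chosen only to show how quickly the allowance shrinks.
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"SLO {target:.2%}: about {minutes:.1f} minutes of downtime per 30 days")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is one way to see why chasing 100% quickly becomes expensive.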

Embrace Risk

SREs accept that no system can achieve 100% uptime. Instead, they define an acceptable level of risk for each service and manage its availability to stay within that threshold.
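
One common way to express an acceptable level of risk is as an error budget: the share of requests that may fail while the service still meets its SLO. The sketch below is a minimal illustration under that assumption; the function name `error_budget_remaining` and the request counts are hypothetical.

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    1.0 means no budget has been used, 0.0 means it is exactly exhausted,
    and a negative value means the SLO has been missed.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over the measured traffic.
    allowed_failures = (1.0 - slo) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical traffic: one million requests, 250 of which failed.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

A team might use a value like this to decide whether there is room for riskier changes or whether work should shift toward stability; that policy is a judgment call and is not prescribed by the code.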
