Service Level Objectives (SLOs) - noob to pro in under 30 minutes!

Welcome to a beginner-friendly but comprehensive overview of Service Level Objectives (SLOs). It is based on real-world experience managing multiple SRE teams at scale.
We'll start by explaining what SLOs are and how they fit into the software development life cycle. Once we have a foundation, we'll then jump right into building an example SLO for a hypothetical ecommerce checkout flow.
After our example is complete, we'll discuss Service Level Indicators (SLIs) and cover the four basic categories of SLIs: latency, availability, consistency, and throughput. We'll then discuss how you can actually measure your SLOs using SLIs, and provide a few code examples of what that looks like in production.
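As a rough illustration of what an SLI measurement can look like (a minimal Python sketch, not the code shown in the video), the snippet below computes an availability SLI and a latency SLI from a handful of request records. The `Request` type, the 5xx-means-failure rule, and the 300 ms threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; field names are assumptions for this sketch."""
    status: int        # HTTP status code returned to the client
    latency_ms: float  # time taken to serve the request, in milliseconds


def availability_sli(requests):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        raise ValueError("need at least one request to compute an SLI")
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_sli(requests, threshold_ms):
    """Fraction of requests served at or under the latency threshold."""
    if not requests:
        raise ValueError("need at least one request to compute an SLI")
    fast = sum(1 for r in requests if r.latency_ms <= threshold_ms)
    return fast / len(requests)


# Made-up sample data: one server error, one slow request.
sample = [
    Request(200, 120.0),
    Request(200, 340.0),
    Request(500, 95.0),
    Request(200, 80.0),
]

print(availability_sli(sample))    # 0.75 (3 of 4 requests succeeded)
print(latency_sli(sample, 300.0))  # 0.75 (3 of 4 requests were fast enough)
```

Both SLIs follow the same "good events divided by total events" shape, which is what makes them easy to compare against an SLO target later.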
Once we understand how to integrate SLIs into our code, we'll then cover a range of development tools you can use to hook up your SLIs to dashboards and other monitoring solutions so you can start to visualize your SLOs.
If you'd rather build something custom than use existing tools to implement your SLOs, we also show how to calculate SLO performance manually, using our earlier ecommerce SLO example to demonstrate basic SLO math.
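For a sense of what basic SLO math involves, here is a small Python sketch. The numbers are invented for illustration and are not the figures used in the video: we assume a 99.9% success target for checkout attempts over a 30-day window.

```python
def slo_compliance(good_events, total_events):
    """Observed success ratio over the measurement window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def meets_slo(good_events, total_events, target):
    """True when the observed ratio is at or above the SLO target."""
    return slo_compliance(good_events, total_events) >= target


def allowed_downtime_minutes(target, window_days):
    """Minutes of full outage a time-based SLO tolerates in the window."""
    return (1 - target) * window_days * 24 * 60


# Hypothetical month: 1,000,000 checkout attempts, 999,200 succeeded.
TARGET = 0.999
print(slo_compliance(999_200, 1_000_000))     # observed ratio, about 99.92%
print(meets_slo(999_200, 1_000_000, TARGET))  # True, since 99.92% >= 99.9%
print(allowed_downtime_minutes(TARGET, 30))   # roughly 43.2 minutes
```

The same two inputs, good events and total events, drive everything here, so a custom implementation mostly comes down to counting them reliably.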
We'll then discuss the importance of reviewing your SLOs' real-world performance and using that understanding to influence future designs and strategy. We also briefly discuss SLO Burn Rates and Error Budgets, but these are more advanced concepts that you shouldn't worry about when first getting started with SLOs.
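If you're curious how those two advanced concepts are usually defined, the Python sketch below uses the common definitions: the error budget is the share of events the SLO allows to fail, and the burn rate is the observed error rate divided by that allowed rate. The function names and sample numbers are assumptions for this example, not taken from the video.

```python
def error_budget(total_events, target):
    """Number of bad events the SLO tolerates over the window."""
    return (1 - target) * total_events


def budget_remaining(bad_events, total_events, target):
    """Fraction of the error budget still unspent (negative if overspent)."""
    return 1 - bad_events / error_budget(total_events, target)


def burn_rate(bad_events, total_events, target):
    """Observed error rate relative to the rate the SLO allows.

    A burn rate of 1 spends the budget exactly by the end of the window;
    a burn rate of 2 spends it in half the window.
    """
    return (bad_events / total_events) / (1 - target)


# Hypothetical window: 1,000,000 requests, 800 failures, 99.9% target.
print(error_budget(1_000_000, 0.999))           # about 1,000 allowed failures
print(budget_remaining(800, 1_000_000, 0.999))  # about 0.2, i.e. 20% left
print(burn_rate(800, 1_000_000, 0.999))         # about 0.8, under budget
```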
Finally, we'll handle some common objections and share our final thoughts on the value of using SLOs to measure the reliability of your systems.
TIMESTAMPS
00:00 Intro
01:10 What are SLOs and where do we start?
03:10 Building an Ecommerce SLO Example
06:32 Service Level Indicators (SLIs)
09:58 Implementing SLOs with SLIs
14:00 Visualize SLOs with Dashboards and Other Tools
19:30 Basic SLO Math for our Ecommerce Example
22:03 Understanding our SLO Data
23:01 SLO Burn Rate and Error Budgets
24:10 Final Thoughts and Objection Handling
25:55 Outro
LINKS FOR RESOURCES MENTIONED
📚 Implementing Service Level Objectives by Alex Hidalgo - www.alex-hidalgo.com/the-slo-...
📚 Database Reliability Engineering by Laine Campbell and Charity Majors - www.oreilly.com/library/view/...
📚 Google's SRE Teams Recommended Book List - sre.google/books/
🛠️ Sloth - An open-source Prometheus SLO Generator - sloth.dev/
🛠️ Prometheus - An open-source monitoring solution - prometheus.io/
🛠️ Grafana - Versatile observability and monitoring dashboard solutions - grafana.com/
🛠️ Grafana Loki - An open-source log aggregation system - grafana.com/oss/loki/
🛠️ Apache Log4j 2 - An open-source logging library that integrates with Prometheus - logging.apache.org/log4j/2.x/
🛠️ Thanos - An open-source, highly available Prometheus setup - thanos.io/
🛠️ Nobl9 - A turn-key platform that lets you build SLOs from your existing monitoring - www.nobl9.com/
🌐 Setting SLOs with Custom Metrics by Google Cloud - cloud.google.com/blog/product...
🌐 Using Cloud Load Balancing Metrics by Google Cloud - cloud.google.com/stackdriver/...
🌐 A Great Summary of Google's SRE Lecture Series by Alexander S. Augenstein - asa55.github.io/class-sre-imp...

Comments: 5

  • @juliairvin3396
    8 months ago

    Exactly what I needed when I needed it! Well done! Important concepts communicated efficiently!

  • @darkmodeclub
    7 months ago

    Thanks so much! Glad to help :)

  • @sbhsilent7864
    5 months ago

    This is amazing. I love this, seriously. More code examples plz! Subbed.

  • @lawrencecuneaz3348
    7 months ago

    phenomenal video 🔥instant sub

  • @darkmodeclub
    7 months ago

    Glad you enjoyed it, thanks for your support!
