
Content provided by Miko Pawlikowski. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Miko Pawlikowski or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://uk.player.fm/legal.

Why you will fail without Chaos Engineering, with Kolton Andrus - HS#25

43:18
 

Introduction

Welcome to episode 25 of the HockeyStick podcast, where we delve into breakthroughs in tech, business, and performance.

In today's episode, Miko Pawlikowski sits down with Kolton Andrus, a well-known figure in the SRE and chaos engineering space. As the founder of Gremlin and a seasoned engineer, Kolton shares his insights into the evolution of chaos engineering, the challenges it faces, and his thoughts on the future of the industry.

The Journey of Chaos Engineering

Kolton Andrus begins by discussing the foundational ideas of chaos engineering. "It's about taming the chaos," he explains. The primary goal is to find the edges of a system, the points where it starts to fail, and handle them efficiently so that the system stays reliable. Kolton emphasizes that organizations should invest in reliability because unreliability is often a multimillion-dollar problem.

Shifting Roles at Gremlin

Kolton moved from CEO to CTO of Gremlin. "It's been a journey," he reflects, noting that he felt his talents were best used in a technical role. The shift allowed him to work on product development and address the problems within chaos engineering more thoroughly.

The Importance of Chaos Engineering

Chaos engineering is an emotional topic for many SREs, including Miko Pawlikowski. The practice involves intentionally injecting failures to test how resilient a system is. Kolton highlights that the engineering part is crucial: "because whenever you tell someone I do chaos engineering, they think you're the joker… And that's the mistake."
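
To make "intentionally injecting failures" concrete, here is a minimal sketch in Python. It is not Gremlin's implementation or anything described in the episode; the names `InjectedFault`, `with_fault_injection`, `fetch_price`, and `price_with_fallback` are hypothetical and exist only for illustration. The idea is to wrap a dependency so that a controlled fraction of calls fail, then check that the calling code degrades gracefully instead of crashing.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real dependency failure during an experiment."""


def with_fault_injection(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("simulated dependency failure")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item):
    """Hypothetical downstream dependency (stands in for a network call)."""
    return {"apple": 1.25}.get(item, 0.0)


def price_with_fallback(fetch, item, fallback=None):
    """Code under test: return a fallback when the dependency fails."""
    try:
        return fetch(item)
    except InjectedFault:
        return fallback


def run_experiment(calls=1000, failure_rate=0.3, seed=42):
    """Run the experiment and count normal versus degraded responses."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_fault_injection(fetch_price, failure_rate, rng)
    served, degraded = 0, 0
    for _ in range(calls):
        result = price_with_fallback(flaky_fetch, "apple")
        if result is None:
            degraded += 1
        else:
            served += 1
    return served, degraded


if __name__ == "__main__":
    served, degraded = run_experiment()
    print(f"served={served} degraded={degraded}")
```

The fixed failure rate and the seeded random generator are what separate this kind of experiment from random breakage: the failure is scoped, repeatable, and measured.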

The Branding Dilemma

While the concept and technique of chaos engineering are sound, its branding remains a challenge. The term "chaos" doesn't sit well with corporate executives. Kolton shares that although they leaned into the fun branding with Gremlin, it sometimes backfired. Executives want maturity and reliability, not something perceived as "immature."

Marketing and Acceptance

Marketing has always played a significant role in the adoption of chaos engineering. Many organizations found the name off-putting. Kolton notes that "reliability engineering" or "resilience engineering" might be better terms. The focus is on explaining the benefits and necessity of these practices to stakeholders.

Gamification in Engineering

One of the challenges in chaos engineering is getting organizations to adopt it systematically. Kolton mentions creating a rubric and scoring system for services, helping teams see their progress. "If you want people to do the right thing, you need to make it easy," he asserts.
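
A rubric-based score of this kind could be sketched as follows. This is an illustration only: the episode summary does not describe how Gremlin computes its scores, and the check names, weights, and the `ServiceReport` type below are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical rubric: check names and weights are illustrative only.
RUBRIC = {
    "survives_dependency_outage": 40,
    "survives_host_failure": 30,
    "has_health_check": 20,
    "tested_in_last_30_days": 10,
}


@dataclass
class ServiceReport:
    name: str
    passed_checks: set


def reliability_score(report, rubric=RUBRIC):
    """Return a 0-100 score: the weighted share of rubric checks passed."""
    total = sum(rubric.values())
    earned = sum(
        weight for check, weight in rubric.items()
        if check in report.passed_checks
    )
    return round(100 * earned / total)


def missing_checks(report, rubric=RUBRIC):
    """List checks not yet passed, highest weight first."""
    missing = [check for check in rubric if check not in report.passed_checks]
    return sorted(missing, key=lambda check: rubric[check], reverse=True)


if __name__ == "__main__":
    report = ServiceReport("checkout", {"has_health_check", "survives_host_failure"})
    print(report.name, reliability_score(report), missing_checks(report))
```

Returning the missing checks alongside the number gives a team an ordered list of next steps, which is one way to make the right thing easy.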

The Evolving Landscape

Kolton acknowledges that the gaming industry, despite its need for reliable systems, often lags in adopting such practices. He points out that people generally resist change, especially when it seems complex or unnecessary.

Lessons Learned and Future Prospects

Over the eight years of Gremlin's journey, Kolton has faced numerous ups and downs. From being told they had product-market fit to being told they did not during the pandemic, it has been a learning experience. "It's super hard when it's your baby," Kolton admits, but the key is to keep iterating and improving.

Intelligent Health Checks

Gremlin's latest features focus on intelligent health checks, enabling even those without robust monitoring systems to understand their system's health. "How do we take the expertise that me and a lot of the engineers on my team have learned…and embed it into the product?" Kolton asks.

AI in Reliability

The conversation also touches on the role of AI in reliability engineering. Kolton is skeptical of current AI capabilities. He believes AI can assist with guidance and analysis but cannot replace the need for deterministic solutions in complex distributed systems.

Kolton's Philosophy

Kolton's closing thoughts are reflective and grounded. He advocates for incremental improvement: "do a little better every day." This philosophy, he believes, applies not only to engineering but also to personal development.

Conclusion

Kolton Andrus's journey through chaos engineering and reliability offers valuable insights for anyone in the tech industry. His experiences underscore the importance of resilience, not just in systems but also in navigating the challenges of innovation and acceptance. Tune in to the full episode for an in-depth discussion on the future of chaos engineering and much more.

00:00 Introduction to Chaos Engineering

01:07 About Kolton Andrus

01:25 The Journey of Gremlin

02:01 The Evolution of Chaos Engineering

04:55 Challenges and Misconceptions

11:20 Real-World Examples and Impact

17:08 The Future of Chaos Engineering

21:04 The Expert vs. The Easy Button

21:19 Aligning Incentives for Reliability

22:51 Scoring and Gamification in Reliability

25:25 Industry Adoption and Challenges

28:02 The Human Element in Reliability Engineering

30:44 Reflections on Gremlin's Journey

35:04 Future Directions and AI in Reliability

41:44 Final Thoughts and Philosophy


This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.hockeystick.show

