Thank you for the response to last week’s post on Paying it Forward! I’m glad to have helped some of y’all out & I hope to continue doing it for the rest of February. As of now, I do not have plans to extend it. Please consider amplifying if you believe somebody might benefit from this initiative.


With that out of the way, moving on to my ramblings for this week. Over the past couple of years, I've developed an interest in the disciplines of Chaos Engineering, Reliability, and Observability. While I spent most of last year trying to find my feet, my goal for the next couple of months is to dive deeper into this space. Coming from a production support background, I found it somewhat easier to grasp things & that sort of kept me on track with the learning bit (I’m famous for losing motivation very easily, just ask anybody who knows me!). Transitioning, however, is still very much a work in progress.


Like I said, having transferable skills from my current job role simplified the process a whole lot. However, a lot of the “practical” part of learning did not come as easily because my day job dealt (and to date, still deals) with administration & support of middleware infrastructure that is on-premises. That made it difficult to be autodidactic, so here’s how I managed to come this far with resources on the internet that are easily & freely available to everyone:


  • Free credits on GCP & AWS cloud: Everybody who signs up gets these & honestly, if you do not want to start off by installing stuff on your desktop/laptop, they are pretty good for trying out managed services.

  • Intermediate proficiency in at least one programming/scripting language: This was one thing I struggled with (& still struggle with). I love Python & Bash scripting; it doesn’t need to be the case with everyone. Most of my resources in this section are ones you probably already know of - HackerRank, LeetCode. These, of course, are websites you should use to practice only after you have understood the basics of the language.

  • Basic-Intermediate proficiency in an Operating System: I was lucky to be in a profession that helped me gain this without any explicit effort on my part. One great blog that I’d love to pass on for advancing those Linux chops is the Linux Kernels Internals blog.

  • Container & container orchestration: While this whole section was clubbed together in my initial search (Yes, I was one of the people who searched how to Kubernetes!), you can start one notch higher than I did. How? By reading the docs, here & here

  • Everything by Charity Majors on Observability & SRE: I was lucky enough to find Charity’s blog + Twitter account pretty early on in my search for great content on Observability & Reliability. Not one to mince words, I remember her saying that Chaos Engineering without observability is just Chaos & (almost) a year later I don’t think I could agree more.

  • Choose an observability/monitoring tool & dig in: I chose to experiment with Dynatrace due to my familiarity with the platform & due to a sample DevOps pipeline being readily available to fork on GitHub. This might seem counterintuitive to a lot of people & I agree that there are open-source options out there to experiment with. As a newbie, this seemed to be the project with the lowest barrier to entry for me to implement & hence, the decision. Prometheus, Grafana, and Fluentd are all great options that I have basic knowledge of & am trying to learn more about this year.

  • Chaos Engineering book on O’Reilly: This is one of the first books I read last year after I heard Adrian Hornsby talk at the AWS Community Day in Pune. This book has very little tech jargon, making it an accessible & great introductory read for folks who are just getting started.

  • Experimenting with a Chaos Engineering tool (or just create your own!): There's a very wide range of cloud native chaos tools available in the market, both proprietary & open source. This link provides a very good overview of some of them. AWS Fault Injection Simulator is another offering I'm very excited to try out this year when it becomes fully available. Of course, if you have the programming finesse (I do not!) to create your own version of a Chaos/Fault injector & integrate it with an observability/monitoring tool, there's nothing like it!

  • Following folks on Twitter & LinkedIn: I absorb a lot of knowledge by reading & engaging with what folks post on Twitter & LinkedIn. By curating my feed, I have built a whole stash of bookmarks that I keep adding to for learning more. I’ve also made it a rule to periodically review them & clear out the ones that no longer serve their purpose. This is not limited to just the topics I mentioned above, but is a general practice I follow to feed my curiosity.

  • Attending conferences* : With conferences going virtual last year, I had the opportunity to attend a few amazing ones on this theme. I was also extremely fortunate to speak about my ongoing journey in this space at four different venues last year. You can find them listed here, here, and here

* A lot of virtual conferences had slashed their attendance fees last year for greater accessibility & inclusion. Some of them were free for everyone.
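
For anyone curious about the "create your own" route from the chaos tooling bullet above, here is a minimal sketch in Python of what a tiny fault injector could look like. Everything in it is an assumption made for illustration: the names `inject_faults` and `InjectedFault` are hypothetical, and the plain `Counter` only stands in for whatever observability/monitoring tool you would actually send metrics to. It is a starting point to tinker with, not how any real chaos tool works.

```python
import random
import time
from collections import Counter
from functools import wraps


class InjectedFault(Exception):
    """Hypothetical error type raised when the injector fails a call on purpose."""


def inject_faults(failure_rate=0.2, max_delay_s=0.0, rng=None, metrics=None):
    """Wrap a function so that some calls are delayed or fail deliberately.

    failure_rate: probability (0.0 to 1.0) that a call raises InjectedFault.
    max_delay_s:  upper bound for a random delay added before each call.
    rng:          random.Random instance; pass a seeded one for repeatable runs.
    metrics:      a Counter used here as a stand-in for a real monitoring tool.
    """
    rng = rng or random.Random()
    metrics = metrics if metrics is not None else Counter()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            metrics["calls"] += 1
            if max_delay_s > 0:
                # Simulate latency, for example a slow network hop.
                time.sleep(rng.uniform(0, max_delay_s))
            if rng.random() < failure_rate:
                # Record the injected failure so it shows up in the metrics.
                metrics["injected_failures"] += 1
                raise InjectedFault(f"injected failure in {func.__name__}")
            result = func(*args, **kwargs)
            metrics["successes"] += 1
            return result

        # Expose the counters so the experiment can be inspected afterwards.
        wrapper.metrics = metrics
        return wrapper

    return decorator
```

To try it, decorate any function (say, a hypothetical `fetch_order()`) with `@inject_faults(failure_rate=0.3)`, call it in a loop while catching `InjectedFault`, and then look at `fetch_order.metrics`. Recording every injected failure alongside the normal calls is the point of the exercise: chaos without observability is just chaos.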

I'm still very much in the process of collating my learning resources for this year & shall be sharing that separately. Keep watching this space for more details!


©2021 by