
Introduction to Site Reliability Engineering (Azure)

I'm exploring different avenues of networking besides the straightforward "Work on a network, stay late every day, no peace of mind, fight with higher-ups to get resources."

You may have noticed that there's a smattering of UX and HTML/JavaScript coding on here. I like coding, but I'm not too knowledgeable yet. UX is what I really enjoy (and I'm even working with a major company to improve theirs).

Site Reliability Engineering seems to be a nice merging of the two. I imagine it's who you call when your eCommerce checkout service craps the bed.

Not naming any names.



Site Reliability Engineering (SRE) is "Let's make sure our systems stay up and functional for operations". To the end user, it's not much different from networking: "Does the service stay up long enough for me to use it? Can I depend on it to stay up?"

A good point to remember is that most systems don't need to be up all of the time. You don't even want that. It may not be fine to sleep through 5 calls about an outage at 2 PM on a Tuesday, but it's okay to accept that some failure is normal.

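There are even 'error budgets': if a system only needs to be up 85% of the time (that works for some machines), the other 15% is your budget for failure, and you can tinker a bit with new features. If something goes down, you're still within 'budget' as long as the total downtime stays under that 15%.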
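To put numbers on the error budget idea, here's a minimal sketch in Python. The 85% target comes from the example above; the 30-day window and the function names (`error_budget`, `budget_remaining`) are my own assumptions for illustration, not anything from the course.

```python
from datetime import timedelta


def error_budget(slo_percent: float, window: timedelta) -> timedelta:
    """Downtime allowed in `window` for an availability target of `slo_percent`."""
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    # The budget is whatever share of the window the target does not cover.
    return window * ((100 - slo_percent) / 100)


def budget_remaining(
    slo_percent: float, window: timedelta, downtime_so_far: timedelta
) -> timedelta:
    """Budget left after subtracting downtime already used (negative = overspent)."""
    return error_budget(slo_percent, window) - downtime_so_far


# Assumed measurement window: 30 days (hypothetical, pick whatever fits your service).
window = timedelta(days=30)

budget = error_budget(85, window)
print(budget)  # 4.5 days of allowed downtime

# Say an outage already ate one day of the window.
print(budget_remaining(85, window, timedelta(days=1)))
```

With an 85% target, that's four and a half days of "allowed" downtime in a 30-day window. If the remaining budget goes negative, that's the signal to stop tinkering with new features and work on stability instead.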

Everyone needs rest, even machines. If you're afraid to turn something off because "We might lose everything!", you have not put decent backup or maintenance practices in place.

SRE was started with a software engineering mindset, which surprised me...at first. In retrospect, it makes sense. Software developers have to make sure their programs don't have vulnerabilities that impact gathered data, underlying hardware, and operations.

A big part of SRE is "Well, is this system stable enough that people have time to improve it, or are they always putting out fires?"

It's a short segment, and I encourage you to read it!
