The Murphy-Beyer Effect

Following a discussion with my colleague Betsy the other day, I arrived at the following observation:

The end state of an SRE team that acquires new work and automates this new work as much as possible is to have only non-automatable (or practically non-automatable) work remaining.

What do I mean here? It's a little like the operational analogue of Amdahl's law. In parallel computing, Amdahl's law describes the limit on how much a computation can be sped up, given the proportion of it that is not parallelizable. There are a number of underlying reasons for this, of course, and it turns out to be much less of an effective limit on the benefits of parallelization than it sounds, but it came to mind when I was thinking about SRE teams' workstreams the other day.
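For anyone who wants the law itself in front of them, here is a minimal sketch of the standard formula: with a parallelizable fraction p and n workers, the speedup is 1 / ((1 - p) + p / n). The function name and arguments are my own choices for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's law.

    parallel_fraction: share of the computation that can be parallelized (0..1).
    workers: number of parallel workers applied to that share.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% of the work parallelizable, adding workers helps less and less:
    # the serial 10% puts a ceiling of 1 / 0.1 = 10x on the speedup.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The point to carry over is the ceiling: however many workers you add, the speedup never exceeds 1 / (1 - p), so the serial part ends up dominating.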

To put it another way, the analogue for operations is the observation that the non-automatable proportion of SRE work for a service comes, over time, to dominate what an SRE team does.

Time and again, SRE chips away at the automatable portions, and builds new things that are automatable where the old things were not (for example, the Maglev project).

In the Google environment, this automates you out of the old job, and into a new one, where you apply the same techniques again. But if there is non-automatable work, and the SRE team retains it as its scope grows, that portion of the work will only grow. You need a way to say no to more things in order to control how large it gets.
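A toy model makes the dynamic easier to see. This is only a sketch under assumptions I have made up for illustration: a constant amount of new work arrives each quarter, a fixed fraction of it can never be automated, and the team automates away a fixed fraction of the automatable backlog each quarter. None of the numbers come from real teams.

```python
def non_automatable_share(quarters, new_work=10.0,
                          residual_fraction=0.2, automation_rate=0.5):
    """Return, per quarter, the share of remaining work that is non-automatable.

    All parameters are hypothetical:
      new_work           units of work acquired each quarter
      residual_fraction  fraction of new work that can never be automated
      automation_rate    fraction of the automatable backlog removed per quarter
    """
    automatable = 0.0
    residual = 0.0
    shares = []
    for _ in range(quarters):
        # New work arrives; part of it is permanently manual.
        automatable += new_work * (1.0 - residual_fraction)
        residual += new_work * residual_fraction
        # The team automates away part of the automatable backlog.
        automatable *= (1.0 - automation_rate)
        shares.append(residual / (residual + automatable))
    return shares


if __name__ == "__main__":
    for quarter, share in enumerate(non_automatable_share(12), start=1):
        print(quarter, round(share, 3))
```

In this model the automatable backlog settles at a steady level while the manual residue keeps accumulating, so the manual share climbs toward 100% unless the team stops taking on work or hands some of it back.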

Therefore, from that observation, we can derive the necessity of the hard limit that reserves 50% of the team's time for engineering (or at least a limit on toil).
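One way to picture "saying no" is as a simple admission check against a toil cap. The sketch below is hypothetical: the function name, the hour-based accounting, and the way toil is estimated are all assumptions of mine, not a description of how any particular team does it.

```python
def can_accept(current_toil_hours, new_toil_hours, capacity_hours, toil_cap=0.5):
    """Return True if taking on new work keeps toil within the cap.

    Hypothetical accounting: all values are hours per quarter for the whole
    team, and toil_cap is the maximum fraction of capacity spent on toil.
    """
    projected_toil = current_toil_hours + new_toil_hours
    return projected_toil <= toil_cap * capacity_hours


if __name__ == "__main__":
    # A team with 300 hours of capacity and a 50% cap can carry 150 hours of toil.
    print(can_accept(100, 20, 300))  # 120 hours projected, within the cap
    print(can_accept(140, 20, 300))  # 160 hours projected, over the cap
```

The useful property of a hard cap is that the answer does not depend on how worthy the new work is, only on whether there is room for its manual residue.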

Another nuance is the question of practically automatable versus theoretically automatable. If it is possible to automate something, but you never get the time to do it, this rule applies just as strongly. That is something to bear in mind if teams get stuck in local minima.

