Cognitive Bias in On-Call: the evidence

On August 27th 2017, I tweeted a link to what I described as a "survey about attitudes etc to oncall/risk". I urged SWEs/SREs to take it and asserted it should take < 5m to complete.

As later revealed, this was something of a white lie: it was a survey, and it did indeed have time-limited questions, which meant it could plausibly take only a few minutes, but there was a lot more going on than any single test-taker could discern.

In fact, it was a cognitive bias detector -- a survey designed to gather evidence to confirm or reject the hypothesis that cognitive bias of various kinds might operate in on-call related situations, or in SRE thinking generally. In constructing this survey, I was strongly inspired by Kahneman & Tversky's work on cognitive bias in general, as related in the wonderful Thinking, Fast and Slow, which I recommend everyone read.

Specifically, the survey looked at participants att…

The Murphy-Beyer Effect

Related to a discussion my colleague Betsy and I had the other day, I was led to the following observation:

The end state of an SRE team that acquires new work and automates this new work as much as possible is to have only non-automatable (or practically non-automatable) work remaining.

What do I mean here? It's a little like the operational analogue of Amdahl's law. From parallel computing, Amdahl's law describes the limit on how much a computation can be sped up, given the proportion of it that is not parallelizable. There are a number of underlying reasons for this, of course, and it turns out that it is much less of an effective limit on the benefits of parallelization than it sounds, but it came to mind when I was thinking about SRE teams' workstreams the other day.
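For readers who haven't met Amdahl's law, here is a minimal sketch of the usual textbook form, in which the speedup on n workers is 1 / (s + (1 - s) / n) for a serial (non-parallelizable) fraction s. The function names and the 10% figure are mine, chosen purely for illustration; nothing here is measured from a real system.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction.

    serial_fraction: share of the work that cannot be parallelized (0..1).
    workers: number of parallel workers (>= 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_ceiling(serial_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative assumption: 10% of the work is inherently serial.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.1, n), 2))
    print("ceiling:", speedup_ceiling(0.1))
```

However many workers you add, the speedup never exceeds the ceiling set by the serial fraction, because the part you cannot parallelize ends up dominating the total.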

To put it another way, a similar analogue for operations is the observation that the proportion of SRE work for a service which is non-automatable comes, over time, to dominate wh…

Why programming is like writing poetry -- and vice versa

I am somewhat unusual in having a degree not only in CS & Maths, but also in Poetry Studies.

Yet there's more in common between poetry and programming than practitioners from either side usually recognise.

For example:

I'm wrestling with some problem. I'm not completely sure how to characterise it, or what the solution is. I might have a good first-order guess at how to tackle it, or I might not. Part of the function of writing is to firm up this first-order idea of what the right way is.

I open an editor partially to let my mind run cerebral digits around the edges of the problem. I open it partially for the pleasure of writing, of crafting something that might work, that might be the best I've done yet.

While I write, I think precisely and painstakingly about the power of the words I use, and the higher-level constructions. There are many consequences to the words and the forms we choose: some obvious, some very much less so. I think hard about how each word relates …

On-call is broken: Kahneman & Tversky told me so

On-call is broken. It's used to paper over the cracks in systems, instead of as a reliable, reluctantly-used failsafe. The whole way we do it as an industry is just wrong.

Perhaps you disagree, but I think I have some pretty compelling evidence that it's true. I also think we can fix it -- or at least give it a good try.