Cognitive Bias in On-Call: the evidence
On August 27th 2017, I tweeted a link to what I described as a "survey about attitudes etc. to on-call/risk". I urged SWEs/SREs to take it and asserted it should take under five minutes to complete. As later became clear, this was somewhat of a white lie: it was a survey, and it did indeed have time-limited questions which meant it could plausibly take only a few minutes, but there was a lot more going on than any single test-taker could discern.
In fact, it was a cognitive bias detector: a survey designed to gather evidence to confirm or reject the hypothesis that cognitive biases of various kinds might operate in on-call-related situations, or in SRE thinking generally. In constructing the survey, I was strongly inspired by Kahneman & Tversky's work on cognitive bias, as related in the wonderful Thinking, Fast and Slow, which I strongly recommend everyone read.
Specifically, the survey looked at participants' attitudes to risk, both within the SRE service-management context and outside of it, together with questions on anchoring and availability. Leaving aside the psychological terms for the moment, the portion of the survey I found most engaging was the question I designed to try to provoke anchoring effects: it involved creating a question with a time limit and using the value of the time limit itself to see if that could provoke participants into reading a graph incorrectly. (I chose graph-reading because, in many ways, that's what our profession does for a living!)
I apologise for using people's time under a somewhat false pretence, but the time consumed was voluntary and for the most part quite short, and the results are, in my opinion, quite interesting.
Results
Around 200 people took the survey, with the following distribution of roles:

Around 150 of the participants declared an average of 7.6 years of experience, with a standard deviation of 2.96, so we do not seem to be looking at a population of junior engineers.
Baseline
The next set of questions established a baseline for the majority of participants: in a context which had nothing to do with SRE, how would their choices compare to the population as a whole?

"You have a chance to win a prize by choosing to draw a marble from one of two urns. Prize-winning marbles are coloured red. Which urn do you choose: Urn A, containing 10 marbles, 1 of which is coloured red, or Urn B, containing 100 marbles, of which 8 are coloured red?"
This is just a baseline statistics question, to see whether the "background rate" of statistics knowledge is what you'd expect. (The bias it probes is called denominator neglect.) It is not a particularly sophisticated question, of course: 10 marbles, 1 of which is coloured red, is just like 100 marbles, 10 of which are coloured red, i.e. a 10% chance of success, while 100 marbles, 8 of which are coloured red, is just an 8% chance, so urn A is universally better.
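The urn arithmetic can be checked mechanically; a minimal sketch using exact fractions:

```python
from fractions import Fraction

# Probability of drawing a prize-winning (red) marble from each urn
urn_a = Fraction(1, 10)    # 1 red out of 10 marbles  -> 10%
urn_b = Fraction(8, 100)   # 8 red out of 100 marbles ->  8%

# Urn A is strictly better, despite containing fewer red marbles in total
assert urn_a > urn_b
print(float(urn_a), float(urn_b))  # 0.1 0.08
```

Denominator neglect is precisely the pull towards the urn with "more winning marbles" (8 versus 1) while ignoring the denominators.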
From a baseline comparison point of view, 30-40% of students taking this test are recorded as getting it wrong, whereas only 8.26% of us did, so there is some evidence that our understanding is more sophisticated.
Things get more interesting from here on in.
Prospect Theory I: Gain
There are a bunch of ways that psychological kinks in our assessment of risk might show themselves. One theory that attempts to explain them is prospect theory.

"Do you prefer a 61% chance to win $520,000 or a 63% chance to win $500,000? (This would be a one-off event, not a repeated sequence.)"
From a purely econo-rational point of view, i.e. simple maximisation of returns, rational utility theory says we should calculate the expected yield by multiplying the probability of the event by the gain of the event: 0.61 × 520,000 = 317,200, whereas 0.63 × 500,000 = 315,000. Therefore picking the first, although it's less likely to result in a payout, will maximise your outcome.
However, if I understand prospect theory correctly, it turns out that humans don't think like this. Instead, how you frame the situation changes what you do. In particular, expected gains matter, and when you're getting a life-changing amount of money, the marginal utility of 520k versus 500k is essentially zero. We therefore maximise the probability of getting any large sum, and so we pick the highest chance of getting something. That is indeed what we see here.
Another way to put this is that we are risk averse; we want the largest chance of getting something, no matter what it is.
I had an intuition that SRE as a profession might well be risk averse, so I wasn't surprised to see this outcome. (This is risk aversion with respect to a gain, of course; we see what happens with a loss later.) I was, however, surprised at the response to the next question:
"Do you prefer a 98% chance to win $520,000 or a 100% chance to win $500,000? (This would be a one-off event, not a repeated sequence.)"
Again, the econo-rational thing to do is to multiply the probability by the gain: the expected gain in the 98% case is $509.6k, and in the 100% case it is $500k ... but we overweight the 2% possibility of gaining nothing at all, and become risk averse.
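The econo-rational arithmetic for both gain questions can be sketched in a few lines (amounts and probabilities as given above):

```python
def expected_value(p, amount):
    """Expected payout of a single-shot gamble."""
    return p * amount

# First gain question: 61% chance of $520k vs 63% chance of $500k
q_far = (expected_value(0.61, 520_000), expected_value(0.63, 500_000))
# ~$317,200 vs $315,000: the 61%/$520k option maximises expected value
assert q_far[0] > q_far[1]

# Second gain question: 98% chance of $520k vs a certain $500k
q_near = (expected_value(0.98, 520_000), expected_value(1.00, 500_000))
# ~$509,600 vs $500,000: again the gamble maximises expected value
assert q_near[0] > q_near[1]
```

In both cases the gamble on the larger sum has the higher expected value, yet most respondents took the safer option.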
The most interesting thing here, however, is the size of the disparity in choices: in the previous question the split was about 93/25, whereas here it is 113/10, i.e. roughly 26% versus 8%. The disparity is presumably greater here because the psychological gap between 61-63 and 98-100 is also greater, even though the numerical difference is precisely the same.
Prospect Theory II: Loss
As discussed above, we now move on to the highly relevant question of loss rather than gain. My intuition here was that SREs' reactions might be distinct from the general population's, because the domain we work in cares deeply about loss but not so much about gain. Of course, if an SRE is working with an e-commerce facility of some kind, it could be argued that more purchasing (due to higher availability) would allow for more gain, but I don't think this is a first-order way of thinking in our community.
This question used approximately the same dollar amounts and exactly the same probabilities as before, in order to make the answers as comparable as possible:
"You run a very popular web service with a large number of users and paying customers. If there was an outage, would you prefer a 61% chance to lose $500,000 or a 63% chance to lose $480,000?"
This is much more evenly distributed, about 70/30, but again the econo-rational calculation says that 0.61 × 500k is a 305k expected loss, versus 0.63 × 480k, which is a 302.4k expected loss, so the 63% choice is better. Yet we don't pick it: instead, we pick the option with the fractionally higher chance of losing nothing, even though it stands to lose us more. Our behaviour has actually flipped around here: we have no choice but to lose something in this scenario, but we are risk seeking, because we take the option which actually loses us more in the steady-state case. Fascinating.
The contrast is illustrated even more starkly in the next question:

"You run a very popular web service with a large number of users and paying customers. If there was an outage, would you prefer a 98% chance of losing $20k, or a 100% chance to lose $18k?"
As before, this is effectively the same question, except that we are moving to the edge of the probability distribution, where it is known that humans overweight probabilities, and indeed we do.
This is another increment more evenly distributed than before, approximately 66/33, but the econo-rational choice is not necessarily what you'd expect intuitively. It seems as if the first option gives you a chance of getting off scot-free, but when you multiply it out, 0.98 × 20k is an expected loss of 19.6k, versus a sure loss of 18k. In this case the sure loss of a smaller amount is better, so picking the second would be best. The fact that we don't means that we are gambling on the 2% chance to lose nothing: this is risk-seeking behaviour in the hope of avoiding loss.
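As a quick sanity check on the arithmetic, here is a sketch computing the expected losses for both loss questions:

```python
def expected_loss(p, amount):
    """Expected loss of a single-shot gamble (positive = dollars lost)."""
    return p * amount

# 61% chance to lose $500k vs 63% chance to lose $480k
a, b = expected_loss(0.61, 500_000), expected_loss(0.63, 480_000)
assert b < a  # ~$302.4k < ~$305k: the 63%/$480k option loses less on average

# 98% chance to lose $20k vs a certain loss of $18k
c, d = expected_loss(0.98, 20_000), expected_loss(1.00, 18_000)
assert d < c  # $18k < ~$19.6k: the sure, smaller loss is econo-rationally better
```

In both loss questions the econo-rational choice is the one with the higher probability of losing, which is exactly the option most respondents avoided.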
Anchoring I
Anchoring is a really powerful and sometimes subtle, sometimes obvious effect. It turns out that it's possible to "anchor" the mind on a particular point in a particular context, making it mentally hard to break away from that point: for example, just mentioning a price before haggling is enough to change the range of prices the negotiator will tend to stay within. (It works across many contexts, not just for numbers, but it is particularly pernicious for numbers and estimation tasks.)
For this section, I wrestled with how to demonstrate this effect in on-call contexts. I settled upon the idea of showing test-takers a graph and asking them when it would hit some critical threshold (in this case, 1.5), but I decided the anchoring would be more effective if it was a little out-of-band. Here is the graph in question:
With the luxury of time, it's pretty clear that the most plausible answer as to when the metric would hit ~1.5 again (08:30) rests on the following observation: since 05:30, the metric rises for approximately an hour and then drops shortly after the half hour, i.e. 05:30 → 06:30, 06:30 → 07:30, 07:30 → 08:30. It's the kind of thing that's easy to see after the fact, but in the kind of stressful situations on-callers operate within, it's not necessarily easy to see what's going on.
For that reason, in order to try to reproduce this effect, the test-takers were divided into two cohorts who were shown different initial texts. The first text was this:
"You are in charge of a very important logs-processing system used by 900-1000 people in your large company. On the following page you will be shown a graph for a small number of seconds and asked to estimate, if the graph were continued, when the graph metric will hit a particular value.
EMPHASIS: THIS IS A TIME-LIMITED QUESTION. YOU WILL HAVE 45 SECONDS TO ANSWER. Only move forwards when you are prepared to work quickly!"
The other cohort was shown the same thing, except without the 900-1000 figure, and with the time limit set to 30 seconds instead.
For the 30-second cohort, the choice of answers was relatively simple:
Most people (69.49% of 59 takers) picked the correct answer, 08:30, with 09:00 and 10:00 picking up 8.47% and 5.08% respectively. Note the small but noticeable peak at 09:30 (16.95%), which might or might not be due to anchoring effects from the 30-second time limit.
The 45-second/900-1000 cohort, however, showed a very different pattern.
From 86 takers, the most interesting thing is that, apart from a simple majority (53.49%) picking the correct answer, the two distributions are nothing like each other: there is a sizeable peak at 08:45 (~15.12%), not matched by a similar peak at 09:45, plus peaks at 09:00 (11.63%) and 10:00 (10.47%), with a lower count of test-takers selecting the intermediate value 09:30.
Anchoring & Availability II
I had further questions on outage durations, which attempted to anchor two cohorts of participants on outage durations of either 43 minutes or 107 minutes. While the data do show a longer average in the longer-priming case, they seem a bit too noisy to derive much from:
Type (43)    Min   Max   Mean    Stddev   Count
Minutes      0     120   44.71   32.34    24
Hours        0     24    3.92    6.91     12

Type (107)   Min   Max   Mean    Stddev   Count
Minutes      0     120   56.48   25.62    21
Hours        0     39    10.71   14.01    7
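To illustrate why the minutes-denominated numbers look too noisy to lean on, one can compute Welch's t statistic directly from the summary statistics above. (This is my own back-of-envelope sketch, not part of the original survey analysis.)

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic for two independent samples, from summary stats."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / se

# Minutes-denominated answers: 43-minute anchor vs 107-minute anchor
t = welch_t(44.71, 32.34, 24, 56.48, 25.62, 21)
print(round(t, 2))  # 1.36: well short of the ~2.0 needed for p < 0.05
```

With |t| ≈ 1.36 on these sample sizes, the difference in means is consistent with noise, which matches the caution above.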
The Linda Problem (Conjunction Fallacy)
The Linda Problem, also known as the conjunction fallacy, is a famous effect in cognitive science in which a majority of those sampled choose something logically incorrect: to wit, that it is more likely that something has two properties than just one. (The result is controversial but definitely reproducible.) I set out to see if SREs would suffer from the same error in thinking, by asking the following question:
"For almost half a year, your team has been troubleshooting a persistent but intermittent problem: throughput between hosts dramatically decreases for seconds or minutes, then eventually fixes itself. It can happen several times a day, but most often happens a few times a week. You believe you have tracked it down to a network problem. You found a new firmware version for your switches, whose update notes referenced an obscure packet-loss condition. After upgrading, the problem has not recurred (2-3 days). Which is more likely, in your opinion, considering everything you know above: that the issue is a network problem, or that it is a network problem related to packet loss?"
So we do much better than the general population on this (up to 85% of those sampled get it wrong; we're at ~41%), but the interesting thing is that when the domain of the question moves to something considerably more abstract, we get it a little more wrong, though not much. The next question is actually the Linda problem in disguise:
"Consider a regular six-sided die with four green faces and two red faces. The die will be rolled 20 times and the sequence of greens (G) and reds (R) will be recorded. You are asked to select one sequence from a set of three, and you will win $25 if the sequence you choose appears on successive rolls of the die. Do you pick RGRRR, GRGRRR, or GRRRRR?"
Of course, the second sequence is one roll longer than the first (it is the first sequence with a G prepended), so it is strictly less probable, but it matches the proportions of the die a little better. In the outside world, apparently 66% of respondents pick it, but we avoid the wrong answer by about 55% to 45%.
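The per-window probabilities are straightforward to compute with exact fractions (this ignores the appears-within-20-rolls framing, which does not change the ordering):

```python
from fractions import Fraction

P = {"G": Fraction(4, 6), "R": Fraction(2, 6)}  # four green faces, two red

def seq_probability(seq):
    """Probability that a given window of rolls matches `seq` exactly."""
    p = Fraction(1)
    for face in seq:
        p *= P[face]
    return p

probs = {s: seq_probability(s) for s in ("RGRRR", "GRGRRR", "GRRRRR")}
# GRGRRR is RGRRR with a G prepended, so it is strictly less likely:
assert probs["GRGRRR"] == probs["RGRRR"] * P["G"]
assert probs["RGRRR"] > probs["GRGRRR"] > probs["GRRRRR"]
```

The pull towards GRGRRR is the conjunction fallacy: the extra G makes the sequence look more representative of a die that is two-thirds green, while making it strictly less probable.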
Conclusions
The first interesting fact is that, according to our answers, we follow the standard prospect theory pattern: we are risk averse with gains and risk seeking with losses. It is an open question whether this result is reproducible, and whether or not it has anything to do with our profession's attitude and exposure to risk management generally.
Another interesting fact is that we have a better-than-general-population understanding of statistics, perhaps befitting our more scientific training, although it could be argued that this is mostly in domains with which we are familiar. (We still make mistakes, though.)
The final point is that there is evidence that we are vulnerable to anchoring, particularly in stressful situations.
Further work on this would seem appropriate. Would any statisticians or psychologists like to help? (It seems like there should at least be a paper in it.)