The following is ripped off wholesale from Charles Perrow’s Normal Accidents: Living with High-Risk Technologies. It’s very rare that I resort to being a mere copyist, but for one, I haven’t felt like writing anything lately and wanted to write something. And two, the concepts in Perrow’s book are important enough to write about. If you have a Kindle you can download a sample of his book and get a better idea of what I’m talking about here.
Today’s an important day. You’ve stayed home from work for a job interview that could put you years ahead in your career. You’re pretty sure you can get the job, but you do have to show up for the interview.
When you stumble into the kitchen you see that your spouse left the burner under the coffee pot on. The coffee has boiled away and the glass pot has cracked. You’re a coffee addict, so you clean up the mess and root around for an old drip coffeemaker. You find it, make coffee, and drink it down.
Now you’re in a hurry, and pissed off and distracted enough that you lock yourself out of the apartment with the car keys inside. No problem, because you’ve hidden a spare key in the hallway for just such an emergency. This is a safety device called a redundancy. Then you remember you loaned that key to a friend so he could return some books of yours he’d borrowed.
Now it’s getting late. But there’s a nice old man next door who drives his car once a month. He’s let you borrow it before. When you ask him, he says he’d like to help you, but his car broke down and won’t be fixed for a few days. And BTW, he adds, the bus strike you’ve been hearing about has started.
No matter. You call a cab, but find out that since the bus strike started there are no cabs to be had, because everybody’s taking one. You call to reschedule the interview, and they don’t hire you because they think you’re a flake who can’t even keep an appointment.
Now. What was the primary cause of this “accident”?
1. Human error, such as leaving the coffeepot on or forgetting the keys.
2. Mechanical failure, such as the old man’s car breaking down.
3. The environment, such as the bus strike and cab overload.
4. Design of the system, in which you can lock yourself out rather than having to use a door key to lock the door.
5. Procedures, such as warming coffee in a glass pot or not getting up extra early.
6. None of the above.
If you answered 6, you’re probably right.
If you answered 1, human error, you’re taking a stand on multiple-failure accidents that resembles that of the President’s Commission to Investigate the Accident at the Three Mile Island nuclear power plant. The Commission blamed everybody, but primarily the operators. The builders of the equipment, Babcock and Wilcox, blamed only the operators.
If you answered 2, mechanical failure, you can join Metropolitan Edison, which ran the plant. They held that the accident was caused by a faulty valve, and sued Babcock and Wilcox.
If you answered 4, design of the system, you are in the company of Essex Corporation, who did a study for the NRC of the control room.
The cause of your inability to get to the most important interview of your life is found in the complexity of the system. Each of the failures (design, equipment, operators, procedures, environment) was in itself trivial. Such things happen, and since we know the world is not perfect, we rarely even notice them. The bus strike wouldn’t be important if you had your car key or your neighbor’s car. The lack of the neighbor’s car wouldn’t matter if you could get a taxi. And if all this had happened on any day but today, none of it would matter; you’d just go to work late, or call in sick.
On any other morning the broken coffeepot would have been merely annoying; it probably wouldn’t have made you forget your keys. So the failures were trivial in themselves, and each had a redundant backup. Only when the backups were blocked did the failures interact and become serious. It’s the interaction of multiple failures that was to blame. What you don’t expect is for all of these things to happen at once.
The fault lay not in the discrete failures themselves but in what’s called tight coupling. The bus strike and the lack of cabs are obviously tightly coupled (interdependent), whereas your keys and the neighbor’s car are not, but they all failed at the same time.
This is a good example for instructional purposes because everybody’s had days like these, and it’s easy to see what happened because the events were so linear. It’s a bad example because truly disastrous failures occur in systems so complex that an operator cannot know what is happening when things go wrong. Any part of, say, a nuclear plant can interact with any other part, not necessarily in an operational sequence.
And I’ll leave it at that, because accidents such as these are too complex to go into here. The key point is that they are “normal” accidents in the sense that if the system is operated long enough, any accident that can happen, will.
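That last point can be made concrete with a little back-of-the-envelope arithmetic. The sketch below is my own illustration, not Perrow’s, and the three daily failure rates are made-up numbers: even when each failure is rare and their coincidence on any given day is vanishingly unlikely, the chance of them all landing on the same day approaches certainty as the days pile up.

```python
# Hypothetical daily probabilities for three independent failures
# (invented rates for illustration, not from Perrow's book).
p_coffee = 0.01     # burner left on under the pot
p_keys = 0.02       # locked out with no spare available
p_transport = 0.01  # no car, no bus, no cab

# Probability that all three coincide on one particular day.
p_joint = p_coffee * p_keys * p_transport  # 2e-6: two in a million

# Expected wait until the first "everything at once" day.
expected_days = 1 / p_joint  # 500,000 days, roughly 1,370 years

def prob_within(days: int) -> float:
    """Chance of at least one all-failures day within the given span."""
    return 1.0 - (1.0 - p_joint) ** days

print(f"joint daily probability: {p_joint:.1e}")
print(f"expected wait: {expected_days / 365:.0f} years")
print(f"chance within 10,000 years: {prob_within(365 * 10_000):.3f}")
```

For one apartment dweller, 1,370 years is effectively never. But run thousands of plants, planes, or refineries in parallel for decades and the same arithmetic makes the coincidence routine, which is the sense in which these accidents are “normal.”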