Survivorship Bias — A Danger Zone (Think again, do you really want to miss what is missing!)

Junaid Qazi, PhD
Aug 17, 2019

By Dr. Junaid Qazi, PhD (www.scienceacademy.ca)

“I want to be as successful as the Facebook CEO, Mark Zuckerberg. School is boring and I don’t want to attend. Even Mark Zuckerberg did not finish school, he dropped out!” This kind of thinking has taken hold among many of our kids over the years. And it is not only the young generation: most of us have probably had this bug in our minds at least once in our lives.

Well, it is true that there are people who dropped out of school and became very successful millionaires and billionaires. Just google “successful college dropouts” and you will find big names including Bill Gates, Steve Jobs, Michael Dell, James Cameron, and Oprah Winfrey[i],[ii],[iii]. OK, these are the successful ones. But did you think about those who could not make it? What if they were even more creative and talented than the names above? The list of such people could be very long, and we don’t have that list. Now the question is: should we follow the path of these successful people without knowing how many did not make it? This is how survivorship bias skews our perception. In the absence of a big chunk of very useful information, our decisions can be biased. We are in “a danger zone”, right?

Moving forward, let’s ask some questions and try to understand the skew and the danger posed by survivorship bias (or, as I would put it, by opinions that are biased because of missing data!).

Bill Gates, Steve Jobs, and Mark Zuckerberg are college-dropout multimillionaires, so will I be one too? Well, it is a fact that these big names are college dropouts. Their ideas won; they took a leap and became remarkably successful. Indeed, they worked hard. However, we ignore a very important factor when we discuss these big names. For every successful college dropout there are hundreds, if not thousands, who were not lucky enough. Unlike the big names, circumstances were not in their favor and they could not pave their way to success. In fact, a published study found that “a majority of the most successful businesspeople graduated college in the United States” (~94%) [iv]. So, should we conclude that a college degree is not important for becoming a successful businessperson? That conclusion is an example of survivorship bias, and here comes the danger zone. Associating dropping out with success may not be right for everyone. All the available facts, along with the possible circumstances, must be considered before we come to a decision.

They don’t make them like they used to. This is a very common comment that we come across in manufacturing and goods production. Do we consider all the poor-quality goods of the past when we make this comment? Did all products, poor and good quality alike, survive over time? Well, most of the time we don’t even know how much failed to survive. Our comments are based only on the best-made items of the past, the ones that survived until today, right? Our perception is skewed again, and there is a bias, a survivorship bias, which arises from the fact that historical goods of poor quality are no longer visible.

In the past, people used to build better and more beautiful buildings. This is a common public impression, and I have heard this or a similar comment at every historical site. Is it correct? Did we consider the fact that older buildings and structures are constantly torn down? The ones that survived may simply be the most structurally sound, the most useful, and/or the most beautiful. Do we have any information on how many were demolished, whether they were ugly, and whether we needed them? There is a range of important, related questions to answer before we take the comment seriously. There is a high probability that the surviving buildings are only a small fraction of all that were built in the past. The perception is again skewed: “a danger zone and a biased impression because of survivorship bias”.

Music was much better in the 1970s? This is another very common comment. In fact, my parents make it all the time and, frankly speaking, I like that music as well. It really is amazing work. Let’s look at the evolution of music over time with a few facts. Today, the available technology has made it much easier for the music industry to create a large quantity of music. One can create music at home with very limited resources. So, the more music is created, the higher the share of bad music that reaches us along with the good. It is also true that only the best and most popular music is remembered, shared, and played over the years, so we are comparing all of today’s music with only the best music from the 70s. There lies the danger: did we think about the bad music of that time while making the comparison? How much bad music was there? This is another prime example of survivorship bias. Unintentionally, we are ignoring a very important piece of information (data), and that skews our perception.
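The comparison above can be sketched as a tiny simulation. This is only an illustration under invented assumptions: the quality scores, the normal distribution, the catalogue size, and the "top 2% is remembered" rule are all hypothetical, not measurements of real music. Both eras are given exactly the same quality distribution, so any gap that shows up comes purely from which songs we still get to hear.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Assumption: song quality follows the same distribution in both eras.
songs_1970s = [rng.gauss(50, 15) for _ in range(5000)]
songs_today = [rng.gauss(50, 15) for _ in range(5000)]

# Assumption: only the top 2% of 1970s songs are still remembered and played.
remembered = sorted(songs_1970s, reverse=True)[:100]

print(f"all 1970s songs:        {mean(songs_1970s):.1f}")
print(f"remembered 1970s songs: {mean(remembered):.1f}")
print(f"all songs today:        {mean(songs_today):.1f}")
```

The two full catalogues average out almost the same, while the remembered 1970s songs score far higher. Comparing the survivors of one era with everything from another is what produces the impression that music used to be better.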

Now, let’s move on and talk about some well-known published studies on the possible effects of survivorship bias.

High-rise Syndrome in cats[v]. This is a very popular study from 1987. The results showed that cats that fall from fewer than six stories and survive have more injuries than cats that fall from more than six stories.

Source: todayifoundout.com (fair use policy)

The study proposed a possible reason: “cats reach terminal velocity after righting themselves at about five stories, at this point they are no longer accelerating and can no longer sense that they are falling, this makes them relax which leads to less serious injuries in the cats who have fallen from six or more stories.” Well, did they record the cats that died after falling from the higher stories? Later on, it was proposed that “another possible explanation for this phenomenon would be survivorship bias. Cats that die in falls are less likely to be brought to a veterinarian than injured cats, and thus many of the cats killed in falls from higher buildings are not reported in studies of the subject” [vi].
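To see how unreported deaths can distort such records, here is a minimal sketch. It does not reproduce the 1987 study: the severity model, the fatality threshold, and the rule that a dead cat never reaches the veterinarian are all hypothetical assumptions chosen for illustration. In this toy world, injuries really do get worse with height, yet the vet's records for high falls look milder than the truth.

```python
import random
from statistics import mean

def simulate_falls(n_cats=20000, seed=0):
    """Return (all_falls, vet_records); each entry is (stories, severity)."""
    rng = random.Random(seed)
    all_falls, vet_records = [], []
    for _ in range(n_cats):
        stories = rng.randint(1, 12)
        # Hypothetical model: severity grows with height, plus random noise.
        severity = 0.1 * stories + rng.gauss(0, 0.3)
        all_falls.append((stories, severity))
        # Assumption: severity above 1.0 is fatal, and dead cats
        # are never brought to the vet, so they leave no record.
        if severity <= 1.0:
            vet_records.append((stories, severity))
    return all_falls, vet_records

def mean_severity(records, min_stories):
    """Average severity over falls from at least min_stories stories."""
    return mean(sev for stories, sev in records if stories >= min_stories)

all_falls, vet_records = simulate_falls()
true_high = mean_severity(all_falls, 7)
seen_high = mean_severity(vet_records, 7)
print(f"falls from 7+ stories, all cats:    {true_high:.2f}")
print(f"falls from 7+ stories, vet records: {seen_high:.2f}")
```

The vet only ever sees the second number. The worst outcomes are missing from the records, so the average injury for high falls is understated.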

Where should we put armor? The planes coming home from the battle have bullet holes everywhere but the engine and the cockpit!

Source: Wikipedia (fair use policy)

This may not be a good business example; however, it is often considered the birthplace of the idea of survivorship bias. During World War II, the Allied forces wanted to add armor to their warplanes. Resources were strained, and armoring the whole plane was not a good idea. The obvious first step was to find out which areas were most vulnerable to attack and would benefit most from the additional protection. For this purpose, the experts studied warplanes that had been shot but had successfully made it back home. The planes under study had no bullet holes in the engine or cockpit, and this led the experts to decide that the armor should be placed everywhere on the planes except the cockpit and engine. What about the missing planes? Did they get hit? If yes, then where? What if the bullets hit the engine or the cockpit? A danger zone, and a skewed decision based only on the planes that survived!

At that time, Abraham Wald, a Hungarian mathematician, was a member of the Statistical Research Group at Columbia University, where he applied his statistical skills to various wartime problems. He pointed out a big flaw in the study: the experts were only analyzing the planes that had made it home safely. If a plane came home with bullet holes, those bullets had not been fatal to the plane. Wald suggested that the military instead add armor to the areas where the surviving aircraft had no bullet holes [vii],[viii]. This reasoning about the warplanes that could not come back home gave rise to the idea of survivorship bias.
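Wald’s reasoning can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reconstruction of his actual analysis: the four sections, the per-hit loss probabilities, and the number of hits per plane are all hypothetical. Every section is equally likely to be hit, but a hit to the engine or cockpit is far more likely to bring the plane down.

```python
import random
from collections import Counter

# Hypothetical chance that a single hit in a section brings the plane down.
LOSS_PROB = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05}
SECTIONS = list(LOSS_PROB)

def fly_missions(n_planes=10000, hits_per_plane=3, seed=1):
    """Return (all_hits, returned_hits) as per-section hit counters."""
    rng = random.Random(seed)
    all_hits, returned_hits = Counter(), Counter()
    for _ in range(n_planes):
        # Assumption: each hit lands in a uniformly random section.
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]
        all_hits.update(hits)
        # A plane returns only if it survives every hit it takes.
        if all(rng.random() > LOSS_PROB[s] for s in hits):
            returned_hits.update(hits)
    return all_hits, returned_hits

all_hits, returned_hits = fly_missions()
for section in SECTIONS:
    print(f"{section:9s} all hits: {all_hits[section]:6d}   "
          f"on returned planes: {returned_hits[section]:6d}")
```

The hits across all planes are spread roughly evenly, but on the planes that returned, engine and cockpit holes are rare. An analyst who only inspects the returned planes sees the second column and could wrongly conclude that those sections are seldom hit, when in fact they are the ones that most need armor.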

With all the above examples, and many more that you can think of, this is a good point to introduce a general definition of survivorship bias, which skews our perception and can have a significant effect on our decisions.

“Survivorship bias or survival bias is a logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. It can lead to false conclusions in several different ways and is a form of selection bias.”

When we ignore the failures, overly optimistic beliefs about the successes can lead to a false and biased perception. We may come to believe that successful people or groups have some special characteristics, when their success may be just a coincidence.

Indeed, survivorship bias is not easy to notice, and it can have significant effects on our analytics and hence on our decisions. A perception skewed by missing data can cost any organization a lot. Hence, it is very important to carefully estimate what is missing from our data. Machine learning models trained on such data can give biased or false predictions and continuously harm the business.
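One simple way to start estimating what is missing is to compare the records we currently hold against the group we started with. The helper below is a minimal sketch; the function name `survivorship_report` and the customer IDs are hypothetical, and a real audit would depend on how an organization tracks its starting cohort.

```python
def survivorship_report(cohort_ids, observed_ids):
    """Compare the starting cohort with the records still observed today."""
    cohort = set(cohort_ids)
    observed = set(observed_ids) & cohort  # ignore IDs outside the cohort
    missing = cohort - observed
    return {
        "cohort_size": len(cohort),
        "observed": len(observed),
        "missing": len(missing),
        "missing_share": len(missing) / len(cohort) if cohort else 0.0,
    }

# Hypothetical example: 10 customers signed up,
# but only 4 of them still appear in the current table.
signed_up = range(1, 11)
still_active = [2, 3, 5, 8]

report = survivorship_report(signed_up, still_active)
print(report)
# {'cohort_size': 10, 'observed': 4, 'missing': 6, 'missing_share': 0.6}
```

A large missing share is a warning sign: any statistic or model built only on the observed rows describes the survivors, not the whole cohort.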

Data scientists have a critical role and a big responsibility on their shoulders, because information that is missing due to survivorship bias could be life-threatening for someone (as in the warplane example)! Remember, the missing data could be the best data. Think again: do you really want to miss what is missing?

About Dr. Junaid Qazi:

Dr. Junaid Qazi is a Subject Matter Specialist and a Data Science & Machine Learning Consultant. He is a Professional Development Coach, Mentor, Author, and Invited Speaker. He can be reached for consulting projects and/or professional development training via LinkedIn (https://www.linkedin.com/in/jqazi/) or through his company website (www.scienceacademy.ca).

Furthermore, Dr. Qazi has a bestselling course on Data Science and Machine Learning on e-learning platforms including Udemy (https://www.udemy.com/course/data-science-and-machine-learning-using-python-bootcamp-qazi/?couponCode=AUG2019) and SkillShare (https://www.skillshare.com/r/user/junaidqazi).

https://www.udemy.com/course/data-science-and-machine-learning-using-python-bootcamp-qazi/?couponCode=OCT0719
