BS in number’s clothing

Yogesh Upadhyaya
9 min read · Feb 17, 2024

We ‘analyst types’ love numbers. We see the deluge of BS that comes at us and we expect numbers to protect us. Numbers are what separates truth from BS. Numbers are precise. Or so we believe. We should be cautious about this belief. People use numbers (and charts) all the time to manipulate our feelings. And yes, we ‘analyst types’ too have feelings. I describe three types of pseudo-quantification: assigning numbers to give a false sense of precision.

Image by Sarah Richter from Pixabay

Let us begin with a case where a pseudo-quantification was itself misrepresented.

From the “It is not what it looks like” department.

This chart came to me via WhatsApp a few days ago and I verified and downloaded it from the source, Statista.

Source: https://www.statista.com/chart/31605/rank-of-misinformation-disinformation-among-selected-countries/

Misinformation was a bigger problem in India than anywhere else in the world: that is what I understood from the title. But that is not what the chart says. The World Economic Forum (WEF) reached out to 1,490 experts around the world and asked them to pick out the top risks from a list of predefined risks. Only the experts in India picked ‘Misinformation / disinformation’ as the biggest risk. I was surprised. After all, it was in the US that President Donald Trump popularized the term ‘fake news’. I waded through the 124-page report to find out what experts in the US and other countries thought.

Appendix C on page 103 has the country-wise top five list. Country after country has ‘Economic Downturn’ as the first risk. These countries range from Australia to Belgium to the United States. Many other countries have economy-related risks such as inflation or unemployment in their top 5. The Indian economy is doing well and perhaps that is why the selected experts did not think Economic Downturn was a big risk. They did not think Inflation and Unemployment were big risks either. So, in their infinite wisdom, they chose Misinformation. Perhaps a better title for the chart is ‘India among very few countries in the world where experts are not worried about Economic Downturn’.

The report from WEF raises many other questions. How many experts were there from India? Who were these experts? How was the global ranking of risks arrived at from individual country risks? Why is China not in the list of countries? Why is Israel missing? Is it because the survey was done in September and the Israeli experts did not point out armed conflict as a top risk at that time?

There are many other methodological questions that I can raise from a 2+ hour reading of the report. The biggest of them is that rankings arrived at by using surveys are conceptually flawed. As Vaclav Smil said in his book ‘How the World Really Works’,

“Where it matters, ranking is impossible — or, at least, inadvisable. The heart is not more important than the brain; Vitamin C is no less indispensable for human health than Vitamin D.”

Ranking of different factors by experts is at best unusable and at worst silly. Nowhere is this more obvious than in the erstwhile Ease of Doing Business rankings of the World Bank.

From the Ministry of ‘Not even wrong’

A couple of years ago, I did a deep dive into the World Bank’s Ease of Doing Business (EODB) ranking. What I found was shocking. The World Bank itself said the report and ranking were not meant to be a guide for making investments. The Ease of Doing Business was not for business people making business decisions! More shockingly, the ranking depended mainly on a survey of people who were not business people. As the World Bank says, “…most of the respondents are legal professionals such as lawyers, judges and notaries.” The factors that they used for their survey did not include costs of doing business. Any business leader would tell you that running a business is much easier when costs are low. Each of these is reason enough not to take the results of such surveys at face value. However, even if these issues were addressed, the EODB ranking would be fatally flawed, and that flaw exists in nearly all methodologies that convert qualitative factors into numbers.

The ratings at the heart of the EODB rankings were a weighted average of 11 factors (with 41 subfactors). The weights are relative ranks of the importance of each factor and they do not change with the region, the sector of the business or the stage of the business. In reality, the importance of different factors is very different for different businesses. Let us take an example. The speed of getting an electricity connection could be very important for a factory being set up in a rural area. But it would be irrelevant for a software startup which could begin at a coworking place. It would also be meaningless for a factory that is already running. These fixed weightages are a serious problem for every attempt to ‘quantify’ what are essentially unquantifiable problems and turn them into an ‘index’.

Taking Smil’s analogy, Vitamin C and Vitamin D and many other micronutrients are important for a healthy body. Taking a weighted average of the level of essential substances in your body tells you absolutely nothing.
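The fixed-weight problem can be sketched in a few lines. This is a toy illustration and not the World Bank’s method: the two locations, the three factors, the scores and all the weights are numbers I made up. It only shows that the same scores produce a different ‘best’ location depending on whose weights are used.

```python
# Hypothetical scores (0-100) for two made-up locations on three factors.
# All numbers are illustrative assumptions, not World Bank data.
scores = {
    "Location A": {"electricity": 90, "contracts": 50, "permits": 60},
    "Location B": {"electricity": 40, "contracts": 85, "permits": 80},
}


def index(factor_scores, weights):
    """Weighted average of factor scores; weights are normalised to sum to 1."""
    total = sum(weights.values())
    return sum(factor_scores[f] * w for f, w in weights.items()) / total


def rank(weights):
    """Return location names ordered from best to worst under these weights."""
    return sorted(scores, key=lambda name: index(scores[name], weights), reverse=True)


# A rural factory cares a lot about getting an electricity connection quickly.
factory_weights = {"electricity": 0.6, "contracts": 0.2, "permits": 0.2}
# A software startup in a coworking space does not care about it at all.
startup_weights = {"electricity": 0.0, "contracts": 0.6, "permits": 0.4}
# One fixed set of weights applied to everyone, as an index would do.
fixed_weights = {"electricity": 1, "contracts": 1, "permits": 1}

for label, w in [("factory", factory_weights),
                 ("startup", startup_weights),
                 ("fixed index", fixed_weights)]:
    print(label, rank(w))
```

With these made-up numbers the factory would prefer Location A, while the startup and the fixed index both put Location B first. The single published ranking is simply wrong for the factory.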

Converting answers to survey questions to numbers is a simple pseudo-quantification. A more sophisticated pseudo-quantification happens when models convert underlying numbers to probabilities which cannot be verified.

When numbers are someone’s feelings

In the 2023 Cricket World Cup, Afghanistan and Australia played a very exciting match. Australia was in deep trouble at 91/7 chasing Afghanistan’s score of 291. Australia eventually beat Afghanistan because of an astonishing 201 from 128 balls from Glenn Maxwell. This chart purports to give the probability of Australia winning over the course of the match.

Source: WhatsApp forward. Reputed websites routinely publish winning ‘probabilities’.

The chart says that at 91/7, Australia had less than 1% chance of winning. Over the course of Maxwell’s innings, this number kept increasing till it became 100% on the last ball of the match. What do these numbers mean?

When we say that a fair coin has a 50% chance of landing heads, what we mean is that if the same coin is tossed 100 times, we expect that it would show heads 50 times. More accurately, if we toss the coin in many sets of 100, the mean number of heads across these sets would be 50. How does this translate to a cricket match? When someone says that Team A has a 60% chance of beating team B, do they mean that if 100 matches were played between Team A and Team B, presumably under identical conditions, then Team A would win 60 of those matches?

Clearly that is absurd. Australia and Afghanistan would not play 100 World Cup matches at the Wankhede in India. Even if they somehow did it over 100 consecutive days, neither the teams nor the conditions would be the same. The absurdity compounds when the probability is updated after every over. In the 19th over of the second innings, when Australia’s 7th wicket fell, its probability of winning fell to less than 1%. What did that mean? That if millions of matches were played by the same teams under the same conditions, and if Australia were chasing 291 runs and had lost 7 wickets in the 19th over in 100 of these matches, Afghanistan would win at least 99 of them?

Some people argue that the 1% is not the probability of winning but the confidence that Australia would win from that situation. In which case, is the 1% just another way of saying ‘extremely unlikely’? If another wicket had fallen, would we have said it is supremely extremely unlikely that Australia will win? Are we using numbers because we are running out of words? There is no danger of running out of numbers between 0 and 100 as long as you are not shy of using 0s after the decimal point. The problem is that these numbers are as ambiguous as the words. There is no definition of what is meant by the 1% probability (or confidence).

The infographic says that the probability calculation was powered by Smart Stats. Presumably this is Smart Stats from ESPNcricinfo. Forecaster is a tool from the same package. The write-up on the T20 Forecaster says that “Over the last three seasons of the IPL (2016 to 2018), the Forecaster had a 60% success rate in correctly predicting the winning team at the start of the run-chase…”. Really, 60%? I feel that any reasonable cricket fan could do as well as that. But there is a bigger problem. The creators assume that the past success of their model (and 60% is not really success) predicts future success. That is the case only for situations where there is stationarity.

India sent a rocket to the moon’s orbit. A rover emerged from the rocket and landed on the moon. Imagine the complexity of this endeavor. The Earth revolves around the sun and rotates around its own axis, and the moon revolves around the Earth. How could the scientists predict where the moon would be a month after the rocket launch? They could do that because the laws of physics that govern the motion of the Earth, moon, rocket and rover are well defined and do not change. Data obtained from the past can tell us exactly what will happen in the future. In their book Radical Uncertainty: Decision Making for an Unknowable Future, Mervyn King and John Kay have called this property stationarity. Stationarity is not a property of cricket or other sporting contests, and it is also mostly absent in fields such as economics.

There has been an attempt in the last two centuries or so to tame uncertainty. The attempt has succeeded in fields as diverse as insurance and short-term weather prediction. As King and Kay would say, mysteries in these fields have become puzzles. Puzzles that have one correct solution. If the forecast is a 60% chance of rain on each of ten days, it is likely to rain on about 6 of those days. However, in too many other fields, there has not been any success. Lack of stationarity is one reason for this. However, the lack of success has not stopped people from throwing around numbers in fields far more consequential than cricket matches.
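What makes the rain forecast a puzzle and not a mystery is that it can be checked against what happened. A minimal sketch of such a check follows. It uses simulated data in a deliberately stationary world, not real weather records, and the function name calibration_table is my own.

```python
import random
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group outcomes by forecast probability and return observed frequencies."""
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[p].append(happened)
    return {p: sum(v) / len(v) for p, v in sorted(buckets.items())}


# Simulated, stationary world: it rains with exactly the forecast probability.
rng = random.Random(0)  # fixed seed so the run is repeatable
forecasts = [rng.choice([0.2, 0.6, 0.9]) for _ in range(30_000)]
outcomes = [rng.random() < p for p in forecasts]

for p, freq in calibration_table(forecasts, outcomes).items():
    print(f"forecast {p:.0%} -> rained on {freq:.1%} of those days")
```

A forecaster passes this check only when thousands of comparable days sit behind each forecast value. A one-off match, with 91/7 in the 19th over while chasing 291, has no such bucket to be compared against.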

In 2007–2008 the world was hit by a financial crisis. Financial institutions with long histories such as Bear Stearns and Lehman Brothers faced bankruptcy. Doing a comprehensive analysis of the financial crisis is probably beyond the scope of many PhDs, let alone this little blog. However, what we can definitely judge is some of the commentary on the crisis.

When a few of his funds underperformed, the CFO of Goldman Sachs, David Viniar, said “We were seeing things that were 25-standard deviation moves, several days in a row.” A 25-standard-deviation move is extremely rare. As some researchers from different universities in the U.K. remarked, “If we observe a profit or loss once a day, then a mere 8-sigma event should occur less than once in the entire history of the universe… a 20-sigma event corresponds to an expected occurrence period measured in years that is 10 times larger than the higher of the estimates of the number of particles in the Universe.”
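The arithmetic behind that remark can be roughly reproduced. The sketch below assumes that daily profit and loss follows a normal distribution, which is what talking in standard deviations implies, and that there are 250 trading days in a year, which is my assumption. The function names are mine.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # assumption: one observation per trading day


def tail_probability(k):
    """One-sided probability that a standard normal variable exceeds k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def expected_wait_years(k):
    """Average number of years between k-sigma daily losses under normality."""
    return 1 / (tail_probability(k) * TRADING_DAYS_PER_YEAR)


for k in (2, 5, 8, 20, 25):
    print(f"{k:>2}-sigma: p = {tail_probability(k):.3e}, "
          f"about one day in {expected_wait_years(k):.3e} years")
```

Under these assumptions the expected wait for a single 8-sigma day already exceeds the age of the universe, which is about 13.8 billion years. Several 25-sigma days in a row says more about the model than about the market.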

Mr. Viniar either did not understand probability or he was using numbers to manipulate the feelings of us analytical types. His tactics worked. The Federal Reserve and the US government stepped in to support financial institutions and most executives in these institutions did not suffer any penalties whatsoever.

Numbers come to us, often in the form of charts and tables, all the time. When we take time to figure out how the numbers were arrived at, we find that the methodologies are shockingly poor. As we do not usually have the time to dig deeper, it may be wise to use the dictum, “BS unless proven otherwise”.

This article is part of the series — Tips for early / mid career analytical types.

“How to spot a bad expert” in this series has received a lot of attention.

Follow me on Medium. Choose the option to get the stories delivered to your inbox.

You can follow AskHow India (@AskHowIndia) or me (@YogeshUpadh) on Twitter or on LinkedIn.

Yogesh Upadhyaya

Entrepreneur. Economist. Investor. Actor. Technophile. Policy wonk. Comedian. I love to explore places where these worlds intersect.