What can data tell us about modern India?

Yogesh Upadhyaya
Mar 5, 2022

I know I am going to get some dirty looks for saying this, but I read non-fiction books with a pen in hand. I underline the sentences that I like. I mark references for the future. Worst of all, I make notes in the margins. I recently read the book Whole Numbers and Half Truths, and my copy is one of my more underlined and scribbled-upon books. There are appreciative ‘Aha!’s, there are sentences double-underlined for extra emphasis, and there are questions. A lot of questions. The book does justice to the title and there is a wealth of data in it. However, I am not sure that I always agree with the author when it comes to the subtitle: What data can and cannot tell us about modern India. In this post, I will take up some of the things I liked about the book and also a few of the places where I had questions or disagreements. Let us begin with the part where the book really shines.

The Aha! moments

I am a fan of the author, Rukmini. I first came across her work in The Hindu. In a three-part series, she analysed nearly 600 judgements in cases involving rape in Delhi’s seven district courts. The insights she gained from this painstaking work are shared in the first chapter of the book and are easily worth its price. So many enlightening passages! For example, why do we see headlines of girls abducted using ‘sedative laced cold drinks’? It turns out that many of the rape cases are actually cases of consensual elopement. The family of the girl objects to the relationship, often for caste or community reasons. To show that the girl was taken away against her will, the family members allege that she was given a ‘sedative laced cold drink’!

I get immense pleasure whenever I understand something about the world that I could not even have guessed. Rukmini’s work has many such insights. Clearly, she read the Delhi court judgements closely and talked to many people. This told her what the right questions could be, and she created the data categories to answer them. For example, how many of the rape cases are actually cases of a family objecting to a consensual elopement?

Unfortunately, the right questions have not always been asked in the book. Let us take urbanisation as an example of this.

When data is not asked the right question

Early in the chapter on urbanisation is the statement

‘Even today, the median Indian lives in a village’.

This is true by the very strict definition that the Indian census uses to classify a habitation as a town. According to the census, a habitation is a town only if it meets three conditions together: more than 5,000 people live in it, the density of population is more than 400 per square kilometre, and at least 75% of the male workforce is engaged in non-farm work. Such a strict definition is not used everywhere in the world.
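To make the three-part test concrete, here is a minimal Python sketch. The thresholds are the ones stated in this post; the exact handling of boundary cases in the official census rules may differ. The `Habitation` class, the function name and the sample habitation are all hypothetical, made up purely for illustration, as is the "looser" definition that simply drops the workforce condition.

```python
from dataclasses import dataclass


@dataclass
class Habitation:
    """A hypothetical record for one settlement (illustrative only)."""
    name: str
    population: int
    density_per_sq_km: float
    male_nonfarm_share: float  # fraction between 0 and 1


def is_town(h: Habitation,
            min_population: int = 5000,
            min_density: float = 400.0,
            min_nonfarm_share: float = 0.75) -> bool:
    """Return True only if all three conditions hold together.

    Defaults follow the thresholds as described in this post.
    """
    return (h.population > min_population
            and h.density_per_sq_km > min_density
            and h.male_nonfarm_share >= min_nonfarm_share)


# A made-up large, dense settlement where most men still farm.
big_village = Habitation("Example", population=8000,
                         density_per_sq_km=600.0,
                         male_nonfarm_share=0.60)

strict = is_town(big_village)                        # all three conditions
loose = is_town(big_village, min_nonfarm_share=0.0)  # hypothetical looser rule
print(strict, loose)
```

The same settlement is a village under one definition and a town under the other, which is why the share of "urban" India moves so much with the choice of definition.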

In a paper published by IDFC Institute, Tandel and others have used less strict definitions and found that more than 50% of India already lives in urban habitations. So, which definition should we use to assess the extent of urbanisation in the country? It depends on why you are interested in urbanisation.

If you are interested in economic development, you would note that urban habitations are generally more prosperous. Among other reasons, this is because

· Many more people live in an urban habitation than they do in a rural habitation. What is more, these people live closer together. This allows more businesses to become viable in an urban area. For example, a chai shop in a densely packed slum will have more potential customers than a similar shop in a well spread out village with very few families.

· Many people in a village are occupied in agriculture. In general, agriculture pays less than occupations that are found in urban settings. This makes the business of our tea stall owner in a village even less viable.

What is the lowest population point at which businesses are viable? Of course, there is no one answer. But as income levels increase, the same business becomes more viable at lower population levels and density. So from an economic development point of view, we should be using less strict definitions than we have been doing for decades.

The complexity does not end here. If you were interested in the employment opportunities that urban centres provide, you could consider how far an individual lives from them. Partha and others did just that and found that even in 2011, more than 50% of Indians lived less than an hour away from an urban centre. Incidentally, the statistic was only 36% in China at the same time. I have covered urbanisation in India here.

Let us take another example of questions not asked. The issue of female foeticide.

India has a very low Sex Ratio at Birth (SRB). The ‘natural’ SRB is between 942 and 953 female births for every 1,000 male births. Any ratio lower than that, and it is likely that there is sex-selective abortion in that society. Some couples go to doctors to determine the sex of their foetus and abort it in case it is a girl. Technologies for determining the sex of the foetus, such as ultrasound, started becoming widely available in the mid-1980s. Since then, the SRB in India has fallen below 910. This practice is illegal but widespread across the country. States in the north of India are much worse than those in the south, and Haryana was a particularly bad case. NITI Aayog reported that in 2014-2016, the SRB in Haryana was 832!
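For readers who like to see the arithmetic, here is a small Python sketch of how an SRB figure is computed and compared with the ‘natural’ range quoted above. The function names are my own, and the only numbers used are the ones already cited in this post (942, 953 and Haryana's 832).

```python
# Lower and upper bounds of the 'natural' SRB quoted in the post
# (female births per 1,000 male births).
NATURAL_SRB_LOW = 942
NATURAL_SRB_HIGH = 953


def sex_ratio_at_birth(female_births: float, male_births: float) -> float:
    """Female births per 1,000 male births."""
    if male_births <= 0:
        raise ValueError("male_births must be positive")
    return 1000 * female_births / male_births


def shortfall_per_1000_male_births(srb: float,
                                   natural_low: float = NATURAL_SRB_LOW) -> float:
    """How far an SRB falls below the lower natural bound (0 if it does not)."""
    return max(0, natural_low - srb)


haryana_srb = 832  # NITI Aayog figure for 2014-2016, as cited above
print(shortfall_per_1000_male_births(haryana_srb))  # 110
```

Measured against even the lower end of the natural range, Haryana's figure implies a gap of about 110 girls for every 1,000 boys born.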

In recent years, the government of Haryana has claimed that it has started reversing the decades of low SRB in the state. The data that the government uses to back its claims comes from the Civil Registration System (CRS). Not everyone trusts it. Whether the state has actually made progress or not is a debated topic. I have covered it in three posts which can be found here, here and here. My conclusion, based on my interactions with government officers and neutral observers, was optimistic. However, I do not have a high degree of confidence in my optimism. Hence, I would have loved to know Rukmini’s view on this, as she has studied the CRS closely in the context of Covid deaths. The book covers female foeticide but does not talk about Haryana.

In the two examples, we saw that it is important to ask the right questions if we want answers from data. There are some situations, however, when data cannot answer even if we ask the right questions. Let us take the issue of employment as an example.

When data cannot answer important questions

At AskHow India, we have spent a lot of time trying to understand employment. If I were to share one insight, it would be this: people look for Acchi Naukri (a regular salaried job with ‘good’ pay and status) and not just kaam (employment). Most of the country’s employment measures don’t tell us anything about Acchi Naukri. There are other problems too. Let me explain with a made-up example.

A farming family has a small piece of unirrigated land. They grow one or two crops a year depending on rainfall. The family grows enough grain for their yearly needs but has very little surplus for sale. They also have a small kirana (grocery) shop in their home in the village. The woman of the house tends to the shop when the man is in the field. When he comes back, he sits in the shop. The shop does not make a profit. The couple has two sons and one daughter. One of the sons helps his father in the field, although the father does not need much help on most days as the land holding is very small. The other son goes to Mumbai and works at construction sites for a few months every year. He works extraordinarily hard for those few months and, for the rest of the time, he relaxes at home. The daughter helps her mother with domestic chores and with collecting firewood and animal feed. She also helps in the field at times. Most of the cash income of the family comes from the son’s construction work. And yet, our employment statistics would consider the father fully employed and the construction worker son partially employed. Collection of firewood and animal feed would not be considered employment even though the cattle make money for the family. The work of running the household is also not considered employment, as it is not in most countries.

This example barely begins to illustrate what is hidden behind the headlines that talk of quarterly employment numbers being higher or lower. If the construction worker son cannot get opportunities in Mumbai he would stay in the village. If a surveyor asks him about employment, he may reply that he works in the field or the shop to avoid embarrassment. The same field and the shop that did just fine without his help when he was in Mumbai. The data from the survey would show that his employment level has actually increased! This is when the cash income of the family has fallen sharply. How widespread is this kind of thing? We can’t say.

Before the pandemic, I went around the country and met around fifty business leaders. These leaders were owners or managers of businesses of different sizes in different sectors. Many of them complained of high levels of attrition in their blue-collar workforce. One retail manager said that in her Mumbai warehouse, 40% of employees left their jobs every month! Her understanding was that many employees come with a target saving in mind and leave their job as soon as they achieve the target. This is rational thinking. Most blue-collar jobs are not careers. They involve a lot of hard work for long hours. It makes sense for many people to think of them in terms of short-term gigs. This is especially true for migrant workers because life in big cities can be very hard for them. And costly. Life in villages is easier and cheaper. So it makes perfect sense for someone to work hard for a few months and then go back home. This thinking is so well accepted that many business owners employ older, married folk in their companies, knowing that they are likely to stay longer than younger people with fewer responsibilities. Again, how widespread is this? We don’t know and the data can’t tell us.
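To get a feel for what a 40% monthly attrition rate means, here is a rough back-of-the-envelope sketch in Python. It rests on an assumption that the manager did not state: that every worker has the same 40% chance of leaving in any given month, regardless of how long they have been there. Real workplaces will not behave this neatly, and the function names are my own.

```python
def share_remaining(monthly_attrition: float, months: int) -> float:
    """Share of a joining cohort still employed after `months` months,
    assuming the same fraction leaves every month (a simplification)."""
    return (1 - monthly_attrition) ** months


def expected_tenure_months(monthly_attrition: float) -> float:
    """Average stay in months under the same constant-rate assumption."""
    return 1 / monthly_attrition


rate = 0.40  # the figure quoted by the retail manager
for m in (1, 3, 6):
    print(m, round(share_remaining(rate, m), 3))
print(expected_tenure_months(rate))
```

Under this simplification, roughly a fifth of a cohort is left after three months and the average stay is about two and a half months, which fits the picture of people working hard for a few months and then going home.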

This is not all. For many people, salary is not the sole determinant of what is Acchi Naukri. For example, my interactions with skill trainers suggest that people prefer to train as data entry operators even when only one in a hundred applicants gets that job. In comparison, advertisements for sales jobs, even in good companies, do not get many applicants. Security and facilities management staffing companies note that the job of a security guard is preferred to any job in facilities management. To add to the complexity, people want a much higher pay for a job in a metro than for the same job in a smaller town.

The chapter on employment acknowledges some of these limitations of employment data. However, to me this acknowledgement does not correctly state how little our employment statistics tell us. The chapter starts with a discussion of the Modi government holding back the release of employment statistics prior to the 2019 elections. It ends with this sentence.

‘India’s jobs crisis is twofold — not enough jobs and suppressed data.’

No. India’s jobs crisis is a lack of Acchi Naukri. Data in its current form, however timely, will not help anyone solve it.

Urbanisation, births and employment are tough to measure. However, measuring beliefs is even tougher. Let us look at that next.

Where data gets really unreliable — measuring beliefs

The chapter on beliefs in my copy of the book is filled with questions. For example, here is the author on the results of a survey on attitudes towards democracy and authoritarianism:

‘India, along with Pakistan and Russia, featured below the global average on the importance accorded to democracy.’

The immediate question that came to my mind was: how is it that, with similar attitudes, India has done so much better than Pakistan or Russia in preserving democracy? Of all the countries that became independent after the Second World War, India is one of the very few that has remained democratic. The others are Belize, Jamaica, Mauritius, Papua New Guinea, Solomon Islands and Vanuatu, all tiny compared to India. So either the attitudes have not been measured well or they are not important. Or there is something different about India and it would be useful to know what that is.

Elsewhere, the author discusses the issue of tolerance and suggests that India is more intolerant than news headlines suggest. The following is a sentence from a section on a Pew survey finding that is supposed to show that Indians may not be very tolerant.

‘A majority of Hindus in India see themselves as very different from their Muslim compatriots (66 percent) and most Muslims feel the same way, saying they are very different from Hindus (64 percent).’

It is not clear to me why the acceptance of differences is an indication of intolerance. In fact, you could argue that the question of tolerance does not arise unless there are differences! After all, what would you be tolerating if the person that you are tolerating is the same as you? The author has other data points in support of her assertion, but then we come to another question: how well were those data points collected? Take this one, for example:

‘Even among Sikhs and Jains, who each form small minorities, a large majority said their friends came mainly or entirely from their small religious community.’

As I read this, I thought: where were the Sikhs polled? In Punjab or outside the state? If it was in Punjab, then isn’t it natural that Sikhs said the majority of their friends were Sikhs? After all, in large parts of the state, Sikhs form the overwhelming majority of the local population, so most of them would have mainly Sikh friends.

To be clear, I am not commenting on the author’s point about the comparative tolerance of Indians and people of other countries. All I am saying is that some of the data she cites does not support her assertion. Additionally, I take survey data with a healthy dose of scepticism.

India, as everyone and their uncle will tell you, is a huge and extremely complex country. To draw a representative sample is very tough. If I talk only to kids in South Mumbai schools, I would believe that football is the most popular game in the country! Of course, people conducting surveys know the importance of drawing representative samples, but their best intentions are defeated by the country’s size and complexity, and by the cost and understanding needed to tackle them. Pre-election and even post-poll surveys provide a great example of the challenge of drawing inferences based on surveys. If these cannot correctly predict the result of a single action like a vote, how can we trust them to tell us about such complex things as beliefs?

When the surveys are global, the complexities multiply. The reputation of the body conducting the survey is absolutely no guarantee that the survey is any good. When I dived deep into the Ease of Doing Business surveys conducted by The World Bank, I was shocked to find how ordinary they were. Apart from a multitude of other ills, it was funny that the Ease of Doing Business survey did not even talk to businessmen!

Maybe the author has gone through all the surveys she has cited and has thought through all the questions that I have asked but has made the choice not to address them in the book for reasons of space. The book has ten chapters, each one very different, and hence covers a lot of ground. If the author had to answer all the questions that I am asking, perhaps the book would have become much bigger than she wanted. As a reader however, I would have preferred depth to breadth.

Buy the book. As I said, the insights from Rukmini’s own work are worth the price and your time. I realise that I have devoted far fewer sentences to the good work than I have to the work I have criticised, but that is because Rukmini’s work on crime and courts speaks for itself far better than I could. The rest of the book is also an easy read and is sure to tell readers things about India that they may not be aware of. There is a chance, though, that you will also mark many of the same passages in your copy as I have. Data can tell us things about India only if we ask it the right questions, and sometimes not even then.

If you have reached the end of this long post, then the chances are you are also interested in understanding India and in the use of data to improve that understanding. You could follow me on Medium (consider pressing the envelope icon to get my posts in your email).

You can follow AskHow India (@AskHowIndia) or me (@YogeshUpadh) on Twitter, or me on LinkedIn.

I have written many posts on using data to understand India. The posts relating to population can be found here. Posts relating to employment are in this list.


