High-speed supercomputing, the Information Age and digitization have come together to enable the processing and analysis of phenomenal amounts of data, opening the world to both positive and negative fallout. Viktor Mayer-Schonberger, Professor of Internet Governance and Regulation at Oxford University, and Kenneth Cukier, Data Editor of The Economist, explain the implications of this new reality in their recently released book Big Data: A Revolution That Will Transform How We Live, Work and Think. The following piece is an extract from the first chapter.

Big Data: A Revolution That Will Transform How We Live, Work and Think
Author: Viktor Mayer-Schonberger and Kenneth Cukier
Publisher: John Murray
Price: Rs. 499
Pages: 242

In 2009 a new flu virus was discovered. Combining elements of the viruses that cause bird flu and swine flu, this new strain, dubbed H1N1, spread quickly. Within weeks, public health agencies around the world feared a terrible pandemic was under way. Some commentators warned of an outbreak on the scale of the 1918 Spanish flu that had infected half a billion people and killed tens of millions. Worse, no vaccine against the new virus was readily available. The only hope public authorities had was to slow its speed. But to do that, they needed to know where it already was.

In the United States, the Centers for Disease Control and Prevention (CDC) requested that doctors inform them of new flu cases. Yet the picture of the pandemic that emerged was always a week or two out of date. People might feel sick for days but wait before consulting a doctor.

Relaying the information back to the central organizations took time, and the CDC only tabulated the numbers once a week. With a rapidly spreading disease, a two-week lag is an eternity. This delay completely blinded public health agencies at the most crucial moments.

As it happened, a few weeks before the H1N1 virus made headlines, engineers at the Internet giant Google published a remarkable paper in the scientific journal Nature. It created a splash among health officials and computer scientists but was otherwise overlooked. The authors explained how Google could “predict” the spread of the winter flu in the United States, not just nationally, but down to specific regions and even states. The company could achieve this by looking at what people were searching for on the Internet. Since Google receives more than three billion search queries every day and saves them all, it has plenty of data to work with.

Google took the 50 million most common search terms that Americans type and compared the list with CDC data on the spread of seasonal flu between 2003 and 2008. The idea was to identify areas infected by the flu virus by what people searched for on the Internet. Others had tried to do this with Internet search terms, but no one else had as much data, processing power, and statistical know-how as Google.

While the Googlers guessed that the searches might be aimed at getting flu information (typing phrases like “medicine for cough and fever”), that wasn’t the point: they didn’t know, and they designed a system that didn’t care. All their system did was look for correlations between the frequency of certain search queries and the spread of the flu over time and space. In total, they processed a staggering 450 million different mathematical models in order to test the search terms, comparing their predictions against actual flu cases from the CDC in 2007 and 2008. And they struck gold: their software found a combination of 45 search terms that, when used together in a mathematical model, had a strong correlation between their prediction and the official figures nationwide. Like the CDC, they could tell where the flu had spread, but unlike the CDC, they could tell it in near real time, not a week or two after the fact.
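To make the idea of "looking for correlations" concrete, here is a minimal sketch in Python. It is not Google's actual method, which the extract does not describe in detail. It only assumes that each search term has a weekly frequency series and that official flu counts exist for the same weeks, and it ranks terms by how closely the two move together. The function names (`pearson`, `rank_terms`) and all the numbers are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / math.sqrt(var_x * var_y)


def rank_terms(query_freq, flu_cases):
    """Sort search terms by correlation with flu counts, strongest first."""
    scores = {term: pearson(freqs, flu_cases)
              for term, freqs in query_freq.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weekly data, invented for this example only.
flu_cases = [10, 20, 40, 80, 60, 30]
query_freq = {
    "medicine for cough and fever": [1, 2, 4, 8, 6, 3],
    "cheap flights": [5, 5, 4, 5, 6, 5],
    "sunscreen": [8, 6, 4, 1, 3, 6],
}

ranking = rank_terms(query_freq, flu_cases)
for term, score in ranking:
    print(f"{term}: {score:.2f}")
```

Note that the ranking never asks why a term tracks the flu. As the passage says, the system does not need to know; it keeps whichever terms correlate best with the official figures.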

Thus when the H1N1 crisis struck in 2009, Google’s system proved to be a more useful and timely indicator than government statistics with their natural reporting lags. Public health officials were armed with valuable information.

Strikingly, Google’s method does not involve distributing mouth swabs or contacting physicians’ offices. Instead, it is built on “big data”: the ability of society to harness information in novel ways to produce useful insights or goods and services of significant value.

With it, by the time the next pandemic comes around, the world will have a better tool at its disposal to predict and thus prevent its spread.

Public health is only one area where big data is making a big difference. Entire business sectors are being reshaped by big data as well.

