Big Data is redefining journalism – telling the truth, removing bias, and opening new vistas for the future of investigative reporting
In 2021, Megha Rajagopalan, Alison Killing and Christo Buschek, journalists working with BuzzFeed News, the news division of the New York-based digital media company BuzzFeed, won the Pulitzer Prize for exposing how China had built massive new prisons and internment camps over the previous three years, where thousands of Uighur Muslims were incarcerated. Rajagopalan, a BuzzFeed News reporter, and Killing and Buschek, both contributors, did not visit the camps while reporting the award-winning story. Instead, they analysed thousands of publicly available satellite images and interviewed dozens of former detainees who had left China, piecing together how the country had carried out the largest-scale detention of ethnic and religious minorities since World War II.
Pandemic breaking news came from Big Data
In December 2019, BlueDot, a Canadian outbreak-analytics company, flagged a cluster of unusual pneumonia cases in the Chinese city of Wuhan. Its software scans sources such as foreign-language news reports and official statements, and the company used global airline ticketing data to predict which cities the infection was likely to reach next. The illness baffled local doctors, and many of the early cases were linked to a wet market that sold live animals. BlueDot alerted its clients on 31 December 2019, about a month before the World Health Organization declared the outbreak a public health emergency of international concern on 30 January 2020.
Data and storytelling come together
Journalists, analysts, investigators and intelligence agencies the world over are using publicly available Big Data to extract intelligence from patterns in the data, connect the dots and report the story. This is data journalism: making the numbers make sense. In 2009, The Guardian in the UK launched Datablog, a small blog offering the full datasets behind its news stories.
Today, it has grown into an entire news section with data-driven stories, searchable databases, data visualizations and tools for exploring data. Datablog journalists use Google spreadsheets to share the full data behind their work, visualize and analyse it, and produce stories for the newspaper and the website. The Guardian has integrated data into its news production process in a way that sets it apart from many other newspapers: through Datablog, it gives readers access to the raw statistics behind the news and lets them export the data in whatever format they need.
Big Data enables journalists both to tell stories they could not tell before and to tell them more effectively. Technology gives access to large amounts of information, much of it published by government agencies, and computing power that was not available before lets algorithms unearth previously unseen patterns in these datasets, which journalists then weave into powerful stories using compelling storytelling techniques.
COVID-19 makes the bell curve a buzzword
The COVID-19 pandemic has made all of us aware of the power of data-driven storytelling. Broadcast, online and print media alike carried the bar charts and curves that tracked each surge of the virus, and “flattening the curve” became a phrase everyone understood. Data journalism about COVID-19 also changed how audiences relate to numbers: the public no longer think of the statistics as dry and abstract, because they now understand how much depends on them. People are learning about methodologies, bell curves and logarithmic scales because they no longer look away from the screen when those words appear. That creates huge opportunities for people to learn, and to change their (usually negative!) relationship with numbers.
Big Data journalism has also become a powerful tool for reporting on the most dangerous places on earth without putting journalists’ lives at risk. The raw story is built from available data, which is analysed to identify patterns and anomalies.
The future of journalism?
At the beginning of 2016, a small group of journalists decided to trace the journey of a chocolate bar, a banana and a cup of coffee from the plantations where they were grown to their desks. Their investigation was prompted by data showing that all of these products were produced in poor countries and mostly consumed in rich ones.
Starting from that data, they asked a series of questions: What are labour conditions like on these plantations? Is land ownership concentrated in the hands of a small group? What kinds of environmental damage do these crops cause in the producing countries? To answer them, El Diario and El Faro, independent digital media outlets based in Spain and El Salvador respectively, joined forces to investigate the dark side of the agribusiness model in developing countries.
Image: Worldwide imports and exports of coffee in 2014. Courtesy: datajournalism.com
The resulting “Enslaved Land” project was a year-long, cross-border, data-driven investigation whose subheading went straight to the point: “This is how poor countries are used to feed rich countries”. Colonialism is, in fact, the project’s central theme. The journalists did not want to tell the story of poor indigenous communities without examining the wider system behind it. They wanted to explain how land ownership, corruption, organized crime, local conflicts and the supply chains of certain products were still part of a colonial system.
The project investigated five crops widely consumed in Europe and the US: sugar, coffee, cocoa, bananas and palm oil, produced in Guatemala, Colombia, Ivory Coast and Honduras. Being data-driven, the investigation used the data to move from pattern to story: the crops and countries were chosen on the basis of an earlier analysis of 68 million United Nations world trade records.
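To give a feel for how trade records can point reporters toward particular crops and countries, here is a minimal Python sketch of one possible first pass: sum the export value for each crop and exporting country, then rank the exporters of each crop. This is a hypothetical illustration, not the method the Enslaved Land team actually used. The field names `commodity`, `exporter` and `value_usd`, and the function name `top_exporters`, are assumptions made for this example and do not reflect the schema of the UN trade data.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def top_exporters(
    records: Iterable[Mapping[str, object]], top_n: int = 3
) -> dict[str, list[tuple[str, float]]]:
    """Rank exporting countries by total export value for each commodity.

    Hypothetical sketch: each record is assumed to be a mapping with the
    illustrative keys "commodity", "exporter" and "value_usd". These are
    NOT the field names of any real UN trade database.
    """
    # Sum export value per (commodity, exporter) pair.
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for rec in records:
        commodity = str(rec["commodity"])
        exporter = str(rec["exporter"])
        totals[commodity][exporter] += float(rec["value_usd"])

    # For each commodity, keep the top_n exporters, largest total first.
    ranking: dict[str, list[tuple[str, float]]] = {}
    for commodity, by_exporter in totals.items():
        ordered = sorted(by_exporter.items(), key=lambda kv: kv[1], reverse=True)
        ranking[commodity] = ordered[:top_n]
    return ranking
```

With tens of millions of rows, a database or a dataframe library would be the more practical tool; the plain-dictionary version only shows the aggregation logic that turns raw trade flows into a shortlist of crops and producer countries worth investigating.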
The future of journalism will be shaped by Big Data analytics; this is the only way to bridge the trust deficit between the media and its audiences. As Lee Baker put it: “Data doesn’t lie. People do. If your data is biased, it is because it has been sampled incorrectly or you asked the wrong question (whether deliberately or otherwise).”