Data Journalism: How to Create Compelling Content from Data

Page 1 of 2


The risk of violence increases when an armed guard is present during a bank robbery. That's what reporters Shoshana Walter and Ryan Gabrielson uncovered while working on an assignment for The Center for Investigative Reporting, a nonprofit media organization.

Gabrielson wrested that finding from a hard-to-crack source through some smart analysis. The source was a detailed dataset of 31,640 bank crimes in the U.S. reported to the Federal Bureau of Investigation (FBI) from 2007 to 2011.

Being able to use the FBI's data meant the reporters didn't need to rely solely on anecdotes and expert opinions. "With data journalism you can actually measure how much something is occurring, and it just elevates your story," says Walter. The piece, "FBI Bank Robbery Data Shows Armed Guards Increase Risk of Violence," is part of the December 2014 Hired Guns series produced with CNN. The series is posted on the organization's website, Reveal.

Journalists have long relied on numbers to support narratives. Some in the field have even been doing deep dives into the data to find the stories buried inside; that's not new. But fresh interest, new data sources, and cheaper tools are mainstreaming what was once an advanced and specialized area of the newsroom. Today, even more reporters are translating columns and rows of information into compelling text and impactful visuals.

"Working with data is simply communication, and it's storytelling, and it's starting from a spreadsheet and turning it into something that someone will want to read and someone will take something away from," says Jonathan Soma, director of Columbia Journalism School's The Lede Program, which focuses on computing, data analysis, and data visualization.

These days, the data that journalists can use to drive stories is plentiful. In fact, more and more data is being made freely available all the time, Soma says. However, some datasets are easier to access and analyze than others.

"The point of this is to find stories that are important to the public, and sometimes, you can find that in a fairly small, easy-to-get dataset," says Mark Horvit, executive director of the nonprofit organization Investigative Reporters & Editors (IRE). "Sometimes, it demands fighting with the government or scraping something from a website. Sometimes, it demands wrestling huge piles of data into something that's meaningful, and journalists are doing all of that."

Gabrielson, now a staff reporter at ProPublica, says the open source movement, which makes software available at no cost, is putting the methods and tools of data journalism within reach of more reporters. "Data reporters long toiled with glitchy software like [Microsoft] Access to manage and analyze data, unaware they had options," he says. However, in the past 10 years, journalists with some coding expertise have been able to tap into alternative programming languages such as SQL, R, and Python, which are cheaper and more powerful, he says.

Gathering and Analyzing Data in Different Ways

AJ Vicens, a reporter at Mother Jones, used a free SQL database (a Firefox add-on called SQLite) to organize data and run queries for his piece, "How Dark Money Is Taking Over Judicial Elections." He told the story using visuals and text, drawing from figures he gathered from three sources.

One was the National Institute on Money in State Politics (followthe?), an archive of contributions to political campaigns in 50 states. He also relied on data from The Brennan Center for Justice (a law and policy institute) and The Center for Responsive Politics (a research group that tracks money in U.S. politics and its effect on elections and public policy).

Loading the data into the database wasn't automated. The dataset from the National Institute on Money in State Politics had 700,000 rows, and Vicens loaded only parts of the file. Information from The Brennan Center was locked in PDFs, so he had to manually input it into the database. He also entered data he queried directly from the website of The Center for Responsive Politics.
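
For readers curious what loading part of a large file into SQLite and querying it can look like, here is a minimal sketch using Python's built-in sqlite3 module. It is not Vicens' actual workflow: the column names, the sample rows, and the helper functions load_judicial_rows and totals_by_state are all invented for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a much larger contributions file.
# Column names and values are invented for illustration.
SAMPLE_CSV = """state,year,recipient_type,amount
OH,2012,judicial,2500.00
OH,2012,legislative,1000.00
MI,2012,judicial,4000.00
MI,2010,judicial,1500.00
TX,2012,judicial,750.00
"""


def load_judicial_rows(conn, csv_text):
    """Insert only the judicial-race rows, skipping the rest of the file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contributions "
        "(state TEXT, year INTEGER, amount REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (r["state"], int(r["year"]), float(r["amount"]))
        for r in reader
        if r["recipient_type"] == "judicial"
    ]
    conn.executemany("INSERT INTO contributions VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def totals_by_state(conn):
    """Return (state, total) pairs, largest total first."""
    cur = conn.execute(
        "SELECT state, SUM(amount) AS total FROM contributions "
        "GROUP BY state ORDER BY total DESC"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
loaded = load_judicial_rows(conn, SAMPLE_CSV)
state_totals = totals_by_state(conn)
print(loaded, state_totals)
```

Filtering before the insert keeps the database small, which matters when only a slice of a 700,000-row file is relevant to the story.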

"We could talk to judges around the country about what it's like to be in these elections and we did do that," he says, "but to say, 'Here's the hard data on how much it's increasingly an independent spending game and that's problematic in these ways and this is how much money is involved,' it adds a backbone to it that I feel is worth the time and investment." His article is part of a data-driven package on judicial elections he wrote with reporter Andy Kroll for the November/December 2014 issue.

Walter and Gabrielson, meanwhile, obtained their data for the bank story by filing a Freedom of Information Act (FOIA) request with the FBI. Gabrielson used the statistical software SPSS to do a number of analyses.

Gabrielson first conducted a cross-tab analysis to see if there was a higher incidence of violence, injuries, and death with armed guards present. All these increased when security carried guns. But because two variables can move in concert yet have absolutely nothing to do with each other, he then used logistic regression, he says.
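
A cross-tab of this kind simply counts how often each combination of two variables occurs. The sketch below shows the idea in plain Python. The records are invented for illustration and are not the FBI data, and the helper violence_rate is a hypothetical name.

```python
from collections import Counter

# Invented records for illustration only, NOT the FBI data.
# Each record is (armed_guard_present, violent_event).
records = (
    [(True, True)] * 3 + [(True, False)] * 7
    + [(False, True)] * 2 + [(False, False)] * 18
)

# The cross-tab: one count per combination of the two variables.
counts = Counter(records)


def violence_rate(counts, guard_present):
    """Share of crimes with a violent event, for one guard condition."""
    violent = counts[(guard_present, True)]
    total = violent + counts[(guard_present, False)]
    return violent / total


rate_with_guard = violence_rate(counts, True)
rate_without_guard = violence_rate(counts, False)
print(rate_with_guard, rate_without_guard)
```

Comparing the two rates shows whether the variables move together, but, as Gabrielson notes, it cannot rule out other explanations on its own.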

The regression involved testing a number of variables against the outcome of whether the bank crime resulted in a violent event. Those variables included the presence of an armed security guard, the race and gender of criminal subjects, and whether the financial institution had bullet-resistant glass. The analysis showed the presence of an armed guard was the strongest predictor of violent events, which the FBI defines as a discharge of a firearm, use of another weapon, or an assault or physical altercation. "We also tested to determine if there was a relationship between armed guards and the likelihood of injuries and deaths, but the results were not statistically significant," Gabrielson says.
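
To make the technique concrete, here is a rough sketch of logistic regression with two yes/no predictors, fitted by plain gradient ascent in Python. It is not a reconstruction of Gabrielson's SPSS analysis: the counts are invented, only two of the variables mentioned above are included, and the sketch does not compute the significance tests a statistical package would report.

```python
import math

# Invented counts for illustration only, NOT the FBI data:
# (armed_guard, bullet_resistant_glass) -> (violent_events, total_crimes)
CELLS = {
    (0, 0): (4, 40),
    (0, 1): (4, 40),
    (1, 0): (16, 40),
    (1, 1): (16, 40),
}


def expand(cells):
    """Turn cell counts into one (features, outcome) row per crime."""
    rows = []
    for (guard, glass), (violent, total) in cells.items():
        features = [1.0, float(guard), float(glass)]  # 1.0 is the intercept
        rows += [(features, 1)] * violent
        rows += [(features, 0)] * (total - violent)
    return rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, steps=2000, learning_rate=1.0):
    """Fit coefficients by gradient ascent on the mean log-likelihood."""
    weights = [0.0] * len(rows[0][0])
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for features, outcome in rows:
            predicted = sigmoid(sum(w * x for w, x in zip(weights, features)))
            for j, x in enumerate(features):
                gradient[j] += (outcome - predicted) * x
        weights = [
            w + learning_rate * g / len(rows)
            for w, g in zip(weights, gradient)
        ]
    return weights


intercept, guard_coef, glass_coef = fit_logistic(expand(CELLS))
guard_odds_ratio = math.exp(guard_coef)  # odds multiplier when a guard is armed
print(round(guard_coef, 3), round(glass_coef, 3), round(guard_odds_ratio, 2))
```

In this made-up data the glass variable has no effect by construction, so its coefficient lands near zero while the armed-guard coefficient is clearly positive. That is the sense in which a regression can single out one variable as the strongest predictor while holding the others constant.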
