Content Creators and Algorithmic Opacity


BEST PRACTICES SERIES

Big Data and journalism are becoming inextricably linked. Even in the early days of online research databases, newspapers were using data such as “X number of mentions in LexisNexis” as evidence regarding various phenomena. Those articles explained thoroughly how the word counts were used and what they meant. Today, we seem to be starting down the slippery slope of presenting statistics and data analysis as solid evidence, without clarification or context. Even Nate Silver of the FiveThirtyEight website, after putting Donald Trump’s chances of becoming the Republican nominee at 12% to 13% back in January 2016, waited to clarify the rationale behind the prediction. To some, a U.S. presidential election feels life-altering. The stakes are literally life-altering when data are run through algorithms and used to make decisions such as mortgage qualification or university admission.

Cathy O’Neil’s book, Weapons of Math Destruction, examines data science initiatives used by banks, human resources departments, law enforcement, school systems, and more. She offers important warnings about relying on data without further investigation. O’Neil’s analysis of criminal recidivism models, which are used by 24 states and relied on by courts in sentencing, offers one of the starkest examples of the dangers of mathematical models and how they have “encoded human prejudice, misunderstanding, and bias into the software systems that increasingly managed our lives.” Any early involvement with police, along with other factors such as criminal records of friends and family, exposure to drugs or alcohol, and neighborhood crime rates, can lead to higher assessment scores, resulting in longer sentences. Individuals subjected to exceptionally harsh treatment because of this scoring find themselves in the same situation as teachers who are fired because of rigid scoring criteria and credit card users whose frequent visits to certain stores result in lower spending limits.

One of the most dangerous characteristics of these algorithms, according to O’Neil, is their opacity: we are often unaware that computer programs are determining our future. O’Neil believes several steps can be taken to use Big Data for good. These steps are directed toward data scientists but apply to journalists as well. They include measuring the impact of programming results, conducting algorithmic audits, and taking a Hippocratic-like oath that “focuses on the possible misuses and misinterpretations” of mathematical models.
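
As a rough illustration of what one small piece of an algorithmic audit might look like, the sketch below compares how often an automated decision favors each of two groups. It is not taken from O’Neil’s book; the group labels, the sample decisions, and the function names selection_rates and disparity_ratio are all hypothetical, and a real audit would involve far more than a single ratio.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favorable outcomes for each group.

    records is an iterable of (group, outcome) pairs, where outcome is
    True when the algorithm's decision favored the person.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Divide the lowest group rate by the highest.

    A value of 1.0 means every group is favored equally often; values
    well below 1.0 suggest the outcomes deserve a closer look.
    """
    highest = max(rates.values())
    lowest = min(rates.values())
    return lowest / highest if highest > 0 else 1.0


# Hypothetical decisions, invented purely for illustration.
decisions = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(decisions)
ratio = disparity_ratio(rates)
print(rates)
print(round(ratio, 2))
```

A low ratio does not prove bias on its own. It is a prompt for the follow-up questions journalists already know how to ask: what data went in, who built the model, and who is affected by its output.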

The nature of journalism lends itself to clarifying the opaque. This makes content creators the ideal practitioners to analyze the results of algorithms and shed light on how they operate. Journalists are likely to exercise a healthy skepticism toward the research findings they report on, explaining thoroughly how conclusions were reached and detailing the likelihood of far-reaching consequences for those whose data were manipulated.

Journalists can also enlist the public’s help in striking the balance between data science and the public good. The Moral Machine is an online platform that seeks to gather opinions from people all over the world regarding ethical decisions made by algorithmic models. For example, should a self-driving car kill two passengers or five pedestrians? An article in The New York Times (“Whose Life Should Your Car Save?”) discussed a study in the journal Science, which found that a majority of respondents felt cars should minimize the overall number of casualties, but also indicated they would prefer to buy a self-protective car. This raises questions: Should the government monitor and regulate algorithms? How will corporations respond to the will of the consumer? We need help in considering not only the technical but also the philosophical challenges of data science. As O’Neil says, “The technology already exists. It’s only the will we’re lacking.”

