AI: The Good News and the Fake News


BEST PRACTICES SERIES

A previous installment of this column asked, “Can Machine Learning Help Fight Fake News?” That 2017 column examined why automated fact-checking is such a complex problem. In the interim, the malaise of misinformation has only multiplied, and recent advances in machine learning have added new dimensions to the problem.

As you know, success in machine learning projects comes from having reliable data to train the algorithms, but, alas, data is usually in short supply. This shortage motivated researchers to use AI to generate the data itself; in the last 5 years, we have seen dramatic improvements in a class of deep learning approaches called generative models. There are several types of generative models, and one of them, the generative adversarial network (GAN), has become quite sophisticated at generating synthetic data from existing input data.

Simply put, a GAN is a pair of artificial neural networks, a generator and a discriminator, pitted against each other. The generator creates synthetic data, while the discriminator tries to determine whether a given sample is original or fake. The two networks try to outwit each other in this cat-and-mouse dance; in the process, the quality of the synthetic data keeps improving, to the point that humans often can’t distinguish it from real data.
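The cat-and-mouse dance can be sketched in code. The following is a deliberately tiny illustration, not a real GAN architecture: the "generator" is a single scale-and-shift of random noise, the "discriminator" is logistic regression, and the target distribution, learning rate, and step count are all made-up toy values. It only shows the alternating updates described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clip the argument to avoid overflow warnings in np.exp.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -30.0, 30.0)))

# Toy 1-D GAN: the generator learns to turn standard-normal noise into
# samples resembling "real" data drawn from N(4, 1); the discriminator
# is a logistic-regression classifier.
g_w, g_b = 1.0, 0.0   # generator: fake = g_w * z + g_b
d_w, d_b = 0.0, 0.0   # discriminator: p(real) = sigmoid(d_w * x + d_b)
lr, batch = 0.05, 64

for step in range(2000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = g_w * z + g_b

    # Discriminator ascent step: push p(real) toward 1, p(fake) toward 0.
    p_real = sigmoid(d_w * real + d_b)
    p_fake = sigmoid(d_w * fake + d_b)
    d_w += lr * np.mean((1 - p_real) * real - p_fake * fake)
    d_b += lr * np.mean((1 - p_real) - p_fake)

    # Generator ascent step: push the discriminator's p(fake) toward 1.
    p_fake = sigmoid(d_w * fake + d_b)
    g_w += lr * np.mean((1 - p_fake) * d_w * z)
    g_b += lr * np.mean((1 - p_fake) * d_w)

# After training, fake samples should cluster near the real data's mean.
fake_mean = float(np.mean(g_w * rng.normal(0.0, 1.0, 10000) + g_b))
print(round(fake_mean, 1))
```

Notice that neither player "wins": the discriminator's feedback is what drags the generator's output toward the real distribution, which is exactly why the synthetic data keeps getting better.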

GANs enable more efficient training of AI models, and they have several other useful applications as well. Tools built on different GAN architectures are already quite good at creating realistic images, and they are getting better at approximating human text, speech, and video. You can generate high-resolution images from low-resolution versions, preserve the voice of patients with speech conditions, and customize text-to-speech tools. That’s the good news.

AI is a dual-use technology, which means you can use it for good or for bad purposes. GANs exemplify this dichotomy. Synthetic data is, by definition, fake data; that’s a feature, but it can also be a bug. Tools that use GAN techniques are not only widely available, but they also significantly reduce the cost and effort required to create fake media artifacts (or manipulate existing ones), the so-called deep fakes.

Deep-fake images and videos can be used to malign the reputations of private individuals and public figures alike. All of a sudden, we are staring at a disinformation dystopia at a scale we’ve not seen before. If we can’t be sure that seeing is believing, what happens to our trust in information and institutions? As citizens and organizations, how do we protect ourselves from deep-fake attacks? Unfortunately, there is no silver bullet. Combating deep fakes goes beyond technology and requires social and institutional responses.

We can use AI techniques, such as the previously mentioned discriminator models, to try to detect whether data was created by a generative model. But that approach is obviously not foolproof, and it relies on detection models staying a step ahead of generator models. For each content type, creating a set of benchmarks for generator and discriminator models can be very helpful.
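A benchmark of this kind ultimately reduces to a labeled set of real and generated samples plus a scoring rule for the detector. As a minimal sketch, here is one common scoring rule, area under the ROC curve, computed from scratch; the "detector scores" below are synthetic placeholders, not output from any real deep-fake detector.

```python
import numpy as np

def detection_auc(scores, labels):
    """Area under the ROC curve for a fake-content detector.

    scores: higher = "more likely fake"; labels: 1 = fake, 0 = real.
    Uses the rank-sum (Mann-Whitney) formulation: the fraction of
    (fake, real) pairs the detector orders correctly, ties half-counted.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    fake, real = scores[labels == 1], scores[labels == 0]
    wins = (fake[:, None] > real[None, :]).sum()
    ties = (fake[:, None] == real[None, :]).sum()
    return (wins + 0.5 * ties) / (len(fake) * len(real))

# Placeholder benchmark: a detector that separates the classes imperfectly.
rng = np.random.default_rng(1)
labels = np.array([0] * 500 + [1] * 500)
scores = np.where(labels == 1,
                  rng.normal(1.0, 1.0, 1000),   # fakes score higher on average
                  rng.normal(0.0, 1.0, 1000))
auc = detection_auc(scores, labels)
print(round(auc, 2))
```

Tracking a number like this per content type, as generators improve release over release, is one concrete way to know whether detection is still "a step ahead."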

We also need to examine metadata and details such as user activity and location, which could help us arrive at a confidence measure for the content. Content distribution platforms and social media networks can display details related to the provenance of images and videos. As users and citizens, we will have to change some of our current trust behaviors and mental models as well.
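One simple way to picture such a confidence measure is as a weighted checklist of provenance signals. The signal names and weights below are entirely hypothetical, invented for illustration; a real platform's scoring model would be far richer.

```python
# Hypothetical provenance signals; names and weights are illustrative,
# not any real platform's scoring model.
SIGNAL_WEIGHTS = {
    "account_age_ok": 0.25,        # uploader account older than a threshold
    "location_consistent": 0.25,   # claimed vs. inferred location agree
    "camera_metadata_present": 0.20,
    "prior_activity_normal": 0.30,
}

def provenance_confidence(signals):
    """Weighted score in [0, 1] from boolean provenance checks."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))

score = provenance_confidence({
    "account_age_ok": True,
    "location_consistent": True,
    "camera_metadata_present": False,  # e.g., EXIF data was stripped
    "prior_activity_normal": True,
})
print(round(score, 2))
```

The point is not the particular arithmetic but the idea: several weak signals about the uploader and the upload, none conclusive alone, can be combined into a single confidence figure a platform could display alongside the content.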

As the threat of deep fakes grows, expect veracity and authentication mechanisms to become part of content and media platforms. Perhaps, as blockchain technology matures, it will become a system of truth for certain types of content. I won’t be surprised if content forensics emerges as a new segment in the digital content industry.
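A building block such authentication mechanisms tend to share, whether or not a blockchain sits underneath, is a cryptographic fingerprint of the content recorded at publication time. The sketch below uses a SHA-256 digest; the in-memory dictionary is only a stand-in for whatever tamper-evident ledger a real system would use.

```python
import hashlib

registry = {}  # stand-in for a tamper-evident ledger (e.g., a blockchain)

def register(content, source):
    """Record a fingerprint of the content at publication time."""
    digest = hashlib.sha256(content).hexdigest()
    registry[digest] = source
    return digest

def verify(content):
    """Return the registered source if the content is unaltered, else None."""
    return registry.get(hashlib.sha256(content).hexdigest())

original = b"frame data of a news video"
register(original, "News Agency X")

print(verify(original))                 # matches the registered source
print(verify(original + b" tampered"))  # None: any edit changes the hash
```

This is why "content forensics" is plausible as a segment: the cryptography is routine, and the hard work lies in capturing fingerprints at the point of creation and getting platforms to check them.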

The fight against fake news fueled by deep-fake content is a rapidly evolving field with high stakes. We can expect several companies to take a stab at this challenge. If you have an innovative approach and succeed, I’d wager you’ll find yourself ahead of the competition.

