AI-Generated Books: Blueprint for the Future?


AI leadership in Europe is lagging behind the U.S. and China, according to a 2018 report from the European Commission. So at the beginning of April, I was pleased to read that scientific publisher Springer Nature had produced what it claimed was the first machine-generated academic book.

The prototype book, Lithium-Ion Batteries: A Machine-Generated Summary of Current Research, gives an overview of the latest research on lithium-ion batteries. It aims to help researchers manage information overload in the rapidly growing field by summarizing more than 150 research articles published between 2016 and 2018. In a press release, Springer Nature describes the content as “a cross-corpus auto-summarization of a large number of current research articles.” It was automatically compiled by an algorithm developed in collaboration with the Applied Computational Linguistics lab at Goethe University in Frankfurt.

How does the creation process work? The algorithm, dubbed Beta Writer, selects and processes relevant publications from Springer Nature’s content platform SpringerLink. It uses a similarity-based clustering routine to organize the source material into chapters and sections, creating summaries of the articles. Hyperlinks to the extracted material allow readers to refer back to the original source documents if they wish.
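To make the idea of similarity-based clustering concrete, here is a minimal sketch in Python. It is not Beta Writer's actual method, which the press release does not spell out at this level. It assumes a simple bag-of-words cosine similarity and a greedy grouping rule, and the function names, the 0.3 threshold, and the toy abstracts are all invented for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_into_chapters(abstracts, threshold=0.3):
    """Greedy clustering (an assumption, not Beta Writer's routine):
    put each abstract into the first chapter whose first member is
    similar enough, otherwise start a new chapter."""
    vectors = [bag_of_words(text) for text in abstracts]
    chapters = []  # each chapter is a list of indices into abstracts
    for index, vector in enumerate(vectors):
        for chapter in chapters:
            if cosine_similarity(vector, vectors[chapter[0]]) >= threshold:
                chapter.append(index)
                break
        else:
            chapters.append([index])
    return chapters


# Hypothetical toy abstracts, invented for illustration only.
abstracts = [
    "anode materials silicon anode capacity",
    "cathode materials nickel cathode stability",
    "silicon anode capacity fading",
    "nickel cathode coating stability",
]
chapters = cluster_into_chapters(abstracts)
print(chapters)
```

In this toy run, the two anode texts end up in one group and the two cathode texts in another. A production system would presumably use richer text representations and a more robust clustering algorithm, but the principle of grouping documents by measured similarity is the same.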

Henning Schoenenberger, Springer Nature’s director of product data and metadata management, says in a press release that “research articles and books written by researchers and authors will continue to play a crucial role in scientific publishing,” but that the company foresees many different types of content creation in the future, including entirely human-generated content, blended man-machine text generation, and material that is completely machine-generated.

Springer Nature has been upfront about its desire to use the project to initiate discussion about the opportunities, challenges, and limitations of the technology. With this in mind, the publisher decided not to manually copyedit the text before publication. A quick glance through the text suggests that, as Schoenenberger notes in the book’s introduction, the process of summarization is still imperfect, and “paraphrased texts, syntax and phrase association still seem clunky at times.” Then again, the purpose of this sort of publication is to convey a comprehensive range of information succinctly and accurately; elegant phrasing probably isn’t the main concern.

At first glance, it’s hard to imagine that Beta Writer will turn its hand to creating compelling literature anytime soon. That said, AI algorithms are already being put to work to automate the production of journalism by converting data into news stories. Topics with frequent data updates (such as company earnings reports or weather updates) particularly lend themselves to this approach. Bloomberg News, for example, uses its Cyborg automated program to generate financial news stories; reportedly, about one-third of Bloomberg News content is produced using automated technology.
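As a rough illustration of how structured data can be turned into a news sentence, here is a minimal template-based sketch in Python. It does not describe how Cyborg or any other newsroom system actually works. The function name, the field names, and the figures are hypothetical, chosen only to show the general data-to-text idea.

```python
def earnings_story(company, quarter, revenue_musd, prior_revenue_musd):
    """Fill a fixed sentence template from a few earnings figures.

    Revenue values are in millions of U.S. dollars (an assumption
    made for this example).
    """
    change = (revenue_musd - prior_revenue_musd) * 100 / prior_revenue_musd
    lead = f"{company} reported {quarter} revenue of ${revenue_musd:,.0f} million"
    if change == 0:
        return f"{lead}, unchanged from a year earlier."
    verb = "rose" if change > 0 else "fell"
    return f"{lead}, which {verb} {abs(change):.1f}% from a year earlier."


# Hypothetical figures, invented for illustration only.
story = earnings_story("Example Corp", "first-quarter", 1200, 1000)
print(story)
```

Because the template only needs fresh numbers, the same function can produce a new sentence every time the underlying data is updated, which is why frequently refreshed topics suit this approach.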

Speaking at a 2017 conference in Texas, Joey Marburger, The Washington Post’s director of product, talked about how the newspaper was using almost 100 different bots. Perhaps the best known is Heliograf, “an intelligent, automated storytelling agent” that generates stories from real-time data sources, enabling the delivery of channel-specific and personalized stories. Marburger described it as “a bot … helping us do better journalism.”

On a smaller scale, news providers can put AI to work not just to generate content, but also as a productivity tool to assist journalists in their day-to-day activities. The Italian newspaper Il Secolo XIX is using an AI-powered digital assistant in its newsroom. It uses algorithms to analyze and classify content in real time. When a journalist starts writing a story, the digital assistant checks the text for data consistency and suggests links to other internal and external sources.

There are already plenty of examples of AI creating content. But as Schoenenberger notes, machine-generated research text may become an entirely new kind of content with specific features not yet foreseen. Although the academic publishing world occupies its own specialized content niche, the wider world of content creation should watch this experiment with interest as it unfolds.   

