CAS Seeks Formula for Managing Information Overload

Every 2.6 seconds, the CAS (Chemical Abstracts Service) REGISTRY logs a new entry that defines and describes a recently discovered chemical substance. This statistic boggles the mind both for what it says about the pace of progress in scientific research and for what it says about CAS's ability to keep up with that pace.

The CAS REGISTRY is a comprehensive and authoritative source of information on chemicals for the scientific community. Each entry for a chemical substance in the registry is meticulously curated and quality controlled by CAS's scientific staff. Entries include the name, registry number, and literature references for each substance, along with experimental and predicted property data (such as melting and boiling points), commercial availability, preparation details, and regulatory information. The registry is available online to scientists and researchers, who pay subscription fees to access it.

In early September, CAS registered its 50 millionth unique substance, a mere 9 months after it had reached the 40 million milestone. By way of contrast, it took the service 33 years to register its first 10 million substances.
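The contrast between those two milestones can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes each interval covers exactly 10 million substances and uses an average year of 365.25 days, and the function and variable names are invented for the example. The resulting average for the latest interval is close to, but not identical to, the 2.6-second figure cited above; the article does not say which period that figure was measured over.

```python
# Illustrative arithmetic only; the 365.25-day year and all names are assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
BATCH = 10_000_000  # substances registered in each interval


def seconds_per_substance(years: float, substances: int = BATCH) -> float:
    """Average time between registrations over an interval of the given length."""
    return years * SECONDS_PER_YEAR / substances


first_batch = seconds_per_substance(33)       # first 10 million: 33 years
latest_batch = seconds_per_substance(9 / 12)  # 40 to 50 million: 9 months
speedup = first_batch / latest_batch

print(f"First 10 million:  one substance every {first_batch:.0f} seconds")
print(f"Latest 10 million: one substance every {latest_batch:.1f} seconds")
print(f"Pace increased roughly {speedup:.0f}-fold")
```

Under these assumptions, the average falls from about 104 seconds per substance to about 2.4 seconds, a roughly 44-fold increase in pace.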

Matt Toussant, Ph.D., SVP of editorial operations at CAS, which is a division of the American Chemical Society, has been at the epicenter of this wildly exponential explosion of information and has sought to make sense of it.

"What we've been seeing over the past five or six years," Toussant says, "is the monetization of chemistry. There has been a real push to describe individual compounds in patents to try to establish some intellectual property protection that may never be commercialized. It's been something of a gold rush."

At CAS's headquarters in Columbus, Ohio, a staff of nearly 400 scientists, all of whom hold a master's degree or a Ph.D. in chemistry, combs through a constant stream of patent applications and scientific journal articles in search of new chemical substances. CAS reviews and analyzes on the order of 1.2 million published sources every year.

A scientist draws a digital picture of each novel substance in an information source, creating a kind of fingerprint for the chemical. That fingerprint is then automatically compared to CAS's entire database of known chemical compounds to make sure that it is truly a new discovery or creation.
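The general idea of checking a fingerprint against a database of known substances can be sketched in a few lines. This is a minimal illustration, not a description of CAS's actual system, which the article does not detail. It assumes each structure has already been reduced to a canonical text string (so the same substance always yields the same string), and it uses a SHA-256 hash as a stand-in fingerprint. The `Registry` class, the `fingerprint` function, and the sample strings are all hypothetical.

```python
import hashlib


def fingerprint(canonical_structure: str) -> str:
    """Hash a canonical structure string into a fixed-length key.

    Assumption: the input is already canonical, so identical substances
    always produce identical strings.
    """
    return hashlib.sha256(canonical_structure.encode("utf-8")).hexdigest()


class Registry:
    """Hypothetical in-memory registry keyed by fingerprint."""

    def __init__(self) -> None:
        self._by_fingerprint: dict[str, int] = {}
        self._next_id = 1

    def register(self, canonical_structure: str) -> tuple[int, bool]:
        """Return (substance_id, is_new) for the given structure."""
        key = fingerprint(canonical_structure)
        existing = self._by_fingerprint.get(key)
        if existing is not None:
            return existing, False  # already known: reuse the existing ID
        substance_id = self._next_id
        self._by_fingerprint[key] = substance_id
        self._next_id += 1
        return substance_id, True


registry = Registry()
print(registry.register("CCO"))  # (1, True): first time this string is seen
print(registry.register("O"))    # (2, True): a different substance
print(registry.register("CCO"))  # (1, False): duplicate, not a new discovery
```

An exact-match lookup like this only works if the canonical form is reliable; real chemical registration also has to handle issues such as stereochemistry and alternative representations, which this sketch ignores.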

Most new chemical substances are initially registered within 2 days of the publication of the patent with which they are associated. Within a month, a full-value edition of their registry entry is complete.

The volume of information that CAS processes and synthesizes is staggering, and managing it effectively takes substantial technological support.

"The last few years have been associated with a lot of internal development," Toussant says. "Our technology has had to scale up rapidly to keep up with the pace of scientific discovery. We're constantly looking for new technologies (new software, new systems, new server approaches) to help us move faster and evaluate our literature more effectively."

"We put a lot of effort into maintenance so that our process is as perfect as we can make it," Toussant continues. "We don't want to be slogging through all of this information. We can't stay static. One of the keys to our ongoing success is the dynamic flow of our 45 different workflows for entering information in our processing operation."

CAS views efficient information management as an obligation due to the registry's stature as a source of chemical information in the scientific community. Hideaki Chihara, Ph.D., a former president of the Japan Association for International Chemical Information, called CAS a resource "that every scientist in the world relies on either directly or indirectly."

Though the task is daunting, Toussant and the staff of CAS are enthusiastic about their work and game for the challenge of keeping up with the rapidly accelerating pace of scientific discovery. "No other organization catalogs information the way we do," he says. "It's been a 100-year passion. Maybe even a love affair."