Semantic Tagging in Published Indexes
"Semantic information … enables publishers to distinguish their content from their competitors’," explains Bill Kasdorf of Apex CoVantage, organizer/moderator of a preconference seminar on semantic tagging at the Society for Scholarly Publishing’s (SSP) annual conference this May in Boston. "In addition, great progress has been made recently in moving semantics beyond the theoretical: Actual publishers are actually doing it, and they're actually getting real benefits from it."
Some people would argue that semantic tagging is nothing new. It can be defined as the assigning of selected controlled vocabulary (aka taxonomy) terms, especially by trained indexers, to content items, such as articles, images, or other documents, to reflect the meaning of the content. Human subject indexing is inherently semantic, because human indexers can discern the meaning of content. Periodical and other database index publishers have been doing this for decades. Once the domain of large database publishing companies (H.W. Wilson, ProQuest, Gale, EBSCO, etc.), this form of semantic indexing is now within reach of publishers of all sizes, thanks to more affordable client/server and desktop software for taxonomy management, indexing, and web database publishing. Meanwhile, the growing popularity of social tagging has made users more aware of the value of subject terms that reflect the meaning of a piece of content, in comparison to free-text word/phrase search.
Nevertheless, there are publishers that consider semantic tagging to be something more than mere controlled vocabulary-based human indexing; they are pursuing new techniques. This was evident in the participation in the SSP Boston conference’s semantic tagging seminar, Say What You Mean: How Semantic Tagging Makes Content More Discoverable, More Useful, and More Valuable.
One way that semantic indexing is distinguished from traditional subject indexing of documents is that it focuses on concepts rather than the documents as a whole. Panel presenter Stephen Rhind-Tutt, president of Alexander Street Press, LLC, explained that semantic indexing can answer complex questions of who, what, and when, such as "What battles during the Civil War resulted in more than 1,000 deaths?" Regular indexing merely answers the question "What documents discuss this battle?"
Specialized and multilevel facets (or metadata, depending on your perspective) of controlled vocabularies can be implemented to support semantically complex user queries, as done by humanities publisher Alexander Street Press. Its database of theatrical plays is indexed by top-level facets, including playwright data, theater data, specific production data, theater company information, character characteristics, scene data, and play text data. Its Early Encounters in North America history database has nine controlled vocabularies, including author, source, year, place, environment, flora, fauna, encounter, people, personal event, and cultural event. Setting up the controlled vocabulary and facets requires one to "go into the data and ask ‘what are the latent semantic issues that will be asked’ … This needs to be discipline specific," according to Rhind-Tutt. Finally, the content searched with faceted taxonomies and supporting interfaces needs to be sufficiently structured with metadata, tagging, or indexing that precisely captures each subject in its appropriate facet.
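To make the idea of faceted controlled vocabularies concrete, here is a minimal Python sketch of a query that combines terms from several facets. It is an illustration only, not Alexander Street Press's implementation: the facet names (place, flora, fauna, encounter) come from the list above, but the Record class, the search function, and every document title and term value are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One indexed content item; each facet holds a set of controlled terms."""
    title: str
    facets: dict = field(default_factory=dict)  # facet name -> set of terms


def search(records, **criteria):
    """Return the records that carry every requested (facet, term) pair."""
    return [
        rec for rec in records
        if all(term in rec.facets.get(facet, set())
               for facet, term in criteria.items())
    ]


# Hypothetical sample data; the titles and term values are placeholders.
records = [
    Record("Document A", {"place": {"Coast"}, "fauna": {"Beaver"}, "encounter": {"Trade"}}),
    Record("Document B", {"place": {"Coast"}, "flora": {"Maize"}, "encounter": {"Conflict"}}),
    Record("Document C", {"place": {"Interior"}, "fauna": {"Beaver"}, "encounter": {"Trade"}}),
]

# A multi-facet question: which items describe trade encounters on the coast?
coastal_trade = [r.title for r in search(records, place="Coast", encounter="Trade")]
print(coastal_trade)
```

Because each subject is stored under its own facet, the query can combine facets freely. A single undifferentiated list of subject terms could not tell a place from an encounter type.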
Another way that semantic indexing is distinguished from traditional subject indexing of documents is that it focuses on pieces of content at a finer, granular level rather than the documents as a whole. This is an approach taken by medical research database developer Silverchair, as explained by its CTO Jake Zarnegar: "We apply semantic tags at any change of topic or concept in the data at any level, including articles, sections, paragraphs, tables, figures, equations, sidebars, videos, etc. Many taxonomic tagging systems deal with the entire data entity as one unit." Using its internally developed TOTEM taxonomy management platform, Silverchair inserts taxonomy tags into the XML content. According to Zarnegar, "Tagging should be done at the smallest ‘atomic’ level that can stand on its own if taken out." Whether the original source is a book, article, or pamphlet, subject indexing is often done to the paragraph level.
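As a rough illustration of tagging below the document level, the following Python sketch attaches concept labels to individual paragraphs and tables inside an XML document. It does not reproduce TOTEM or Silverchair's actual markup: the concepts attribute, the tiny taxonomy, the sample XML, and the phrase-matching rule are all assumptions made for the example, with simple phrase matching standing in for real indexing.

```python
import xml.etree.ElementTree as ET

# Hypothetical mini-taxonomy: concept label -> trigger phrases (illustrative only).
TAXONOMY = {
    "Hypertension": ["high blood pressure", "hypertension"],
    "Diabetes": ["diabetes", "insulin"],
}

# Hypothetical source document with several self-contained units.
SOURCE = """<article>
  <section>
    <p>This paragraph discusses hypertension.</p>
    <p>This paragraph discusses insulin.</p>
    <table><caption>High blood pressure and diabetes by age group</caption></table>
  </section>
</article>"""


def tag_units(root, unit_names=("p", "table", "figure")):
    """Set a 'concepts' attribute on each unit whose text mentions a concept."""
    for elem in root.iter():
        if elem.tag not in unit_names:
            continue
        text = "".join(elem.itertext()).lower()
        concepts = sorted(
            label for label, phrases in TAXONOMY.items()
            if any(phrase in text for phrase in phrases)
        )
        if concepts:
            elem.set("concepts", "|".join(concepts))
    return root


root = tag_units(ET.fromstring(SOURCE))
tagged = [(e.tag, e.get("concepts")) for e in root.iter() if e.get("concepts")]
print(tagged)
```

Each paragraph and table carries its own tags, so any one of them could be retrieved or reused on its own. The enclosing article and section elements are left untagged in this sketch.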