At Your Service Architecture
The ill-defined world of Web Services will surely drown publishers in an alphabet soup of open-standard protocols that sit at the heart of most plug-and-play systems. Technologies like SOAP (Simple Object Access Protocol) allow highly interactive information exchanges between sites, and the XQuery query language can pull specific sections and subsets of documents out of well-tagged XML data. Others like REST and XML-RPC are what Abbott calls "protocols that point to the future," although sifting through the choices will be daunting. "They have to get their head around what kind of formats make documents more intelligent and capable of wider use," says O'Reilly, and in most cases that means starting, as he did, with an ambitious program of densely tagging XML.
"XML is the method of choice for Web sites that wish to make their data available to the Web," and it has been critical to the successes of open content sites like Flickr and Blogger.com," says Abbott. "It seems the most portable and malleable." Open standards have multiple positive business effects, says David Spenhoff, VP of marketing at MarkLogic, which makes the XML database server behind O'Reilly's Safari U. Free from proprietary CMS approaches, XML-based approaches become interoperable with other sites and databases; publishers are not locked into specific vendors and don't have to waste time developing content models or query languages, let alone build custom point-to-point connections between partners that need to be rebuilt whenever one aspect of each system changes. Most important, open standards future-proof content by changing slowly and in unison. Best of all, "you don't have to know all the places the content needs to go," says Spenhoff. Tagged XML can be pulled, parsed, queried, and delivered endlessly among sites and to just about any imaginable device.
More than liberating data to flow more freely, open source and intelligent document protocols may arm publishers for a looming war against the commoditization of content by the proliferation of ad-supported search. At the New England Journal of Medicine site, doctors used to run Google-like search algorithms on case records from Massachusetts General Hospital. The unwieldy reams of full-document results required manual browsing to find cases directly related to their current patient. Now that the Web site has been re-engineered on a well-tagged XML base, an XQuery-style search provides a dense array of possible filters up front, so queries can return only pieces of documents, or cases with specific citations and diagnoses, or cases with links to MRI scans.
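To make the idea of filtered, piece-by-piece retrieval concrete, here is a minimal sketch in Python. It uses the standard library's ElementTree and its limited XPath support as a stand-in for XQuery, and the case records, element names, and the find_summaries helper are all invented for illustration; nothing here reflects the NEJM's actual schema or tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified case records. Every element and
# attribute name below is an assumption made for this example.
CASES_XML = """
<cases>
  <case id="c1">
    <diagnosis>sarcoidosis</diagnosis>
    <citation ref="smith-1998"/>
    <summary>Persistent cough and fatigue.</summary>
    <scan type="MRI" href="scans/c1.img"/>
  </case>
  <case id="c2">
    <diagnosis>lymphoma</diagnosis>
    <citation ref="jones-2001"/>
    <summary>Night sweats and weight loss.</summary>
  </case>
  <case id="c3">
    <diagnosis>sarcoidosis</diagnosis>
    <citation ref="jones-2001"/>
    <summary>Skin lesions and joint pain.</summary>
  </case>
</cases>
"""


def find_summaries(root, diagnosis=None, citation=None, needs_mri=False):
    """Return (case id, summary text) for cases that pass every filter.

    Only the summary is returned, not the whole record, which mirrors
    the idea of rendering pieces of documents rather than full ones.
    """
    results = []
    for case in root.findall("case"):
        # Filter on the tagged diagnosis, if one was requested.
        if diagnosis is not None and case.findtext("diagnosis") != diagnosis:
            continue
        # Filter on a specific citation reference, if one was requested.
        if citation is not None and case.find(f"citation[@ref='{citation}']") is None:
            continue
        # Optionally keep only cases that link to an MRI scan.
        if needs_mri and case.find("scan[@type='MRI']") is None:
            continue
        results.append((case.get("id"), case.findtext("summary")))
    return results


root = ET.fromstring(CASES_XML)
by_diagnosis = find_summaries(root, diagnosis="sarcoidosis")
with_mri = find_summaries(root, diagnosis="sarcoidosis", needs_mri=True)
by_citation = find_summaries(root, citation="jones-2001")
```

The filters work only because the diagnosis, citation, and scan are tagged as separate elements; a full-text search over untagged prose would have nothing comparable to match against.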
Efforts like the NEJM's "are not about bringing specific documents, but information contained in documents," says Spenhoff, whose company, MarkLogic, provides the XQuery tools for the project. Now doctors can run sophisticated queries and assemble dossiers of cases joined by common citations or symptoms. Google and Yahoo! can't even approach such value-added results, and that is exactly how publishers want to keep it.
"We're hedging our bets against Google itself and against content becoming a commodity," says Corey Podolsky, director of business tech services at Oxford University Press. The OUP's upcoming "Oxford Publishing Platform" converts its massive library into XML and rich meta-tagging to provide digital subscriptions to libraries. Search engines now deliver so much free information, including digitized print product, that even venerable brands feel pressured. "Publishers have to find ways to add value to the process," he says, "and one of the most significant things we can do is add the context around the content, add the metadata and the environment that allows them to find and engage the content rather than a simple search." Ultimately, Oxford foresees mixed and matched document pieces that link to related data and multimedia, timelines that make sense of the result, historical background—all triggered by metatags and honed databases a search engine can't touch, he argues.
For O'Reilly Media, an XML-based service approach lets subscribers to its Safari online book library do data collection, rather than search. "It can show me all the code snippets with specific functions in any book with Java or Perl as the title," says Rayhill. "You can't do that in Google Print." In a service-oriented world, publishers hope that the content itself represents only one kind of value. How content gets parsed and reassembled by users, pushed into other applications, even combined and published with user-generated additions will represent the critical extra layer of value. According to Rayhill, "It allows us to have a leg up on search."
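The kind of "data collection" Rayhill describes might look something like the following sketch, again in Python with the standard library's ElementTree. The book titles, tag names, and the collect_snippets helper are hypothetical and do not represent how Safari is actually built; the point is only that tagged code blocks can be gathered across books by title and by the function they call.

```python
import xml.etree.ElementTree as ET

# Invented mini-library. Titles, tags, and snippets are assumptions
# made purely for illustration.
LIBRARY_XML = """
<library>
  <book>
    <title>Perl Basics</title>
    <chapter n="1">
      <para>Splitting a line into fields.</para>
      <code lang="perl">my @f = split(/,/, $line);</code>
    </chapter>
  </book>
  <book>
    <title>Java in Practice</title>
    <chapter n="4">
      <code lang="java">String[] f = line.split(",");</code>
      <code lang="java">int n = line.length();</code>
    </chapter>
  </book>
  <book>
    <title>Python Recipes</title>
    <chapter n="2">
      <code lang="python">f = line.split(",")</code>
    </chapter>
  </book>
</library>
"""


def collect_snippets(root, title_words, function_name):
    """Gather (book title, snippet) pairs across the whole library.

    A book qualifies if any of title_words appears as a word in its
    title; a snippet qualifies if it contains a call to function_name.
    """
    hits = []
    for book in root.findall("book"):
        title = book.findtext("title", default="")
        if not any(word in title.split() for word in title_words):
            continue
        # iter() walks every <code> element at any depth in the book.
        for code in book.iter("code"):
            if function_name + "(" in (code.text or ""):
                hits.append((title, code.text))
    return hits


snippets = collect_snippets(ET.fromstring(LIBRARY_XML), ["Java", "Perl"], "split")
```

Here the result is a collection assembled from two different books, with the Python title and the unrelated Java snippet left out, rather than a ranked list of whole documents.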