Question What You Read—With XQuery

Not long ago, talking to invisible people was a sign of madness. These days, however, it's not uncommon to see a shopper with a Bluetooth cell phone earpiece talking alone in the supermarket. The next big thing may be talking back to your books.

It seems like an eternity since the initial promises of XML, and many have faded from memory. Remember any? Pay the considerable cost of using XML to structure your documents, and they would pay you back by providing ways to convert, reuse, reassemble, or analyze them. Invest in structure now; get dividends soon. It's been a long wait, but interacting with documents may be just around the corner.

It turned out that simple document conversions, such as the "printer friendly" link on a Web page, were easy without XML. Reusing content, such as the boilerplate common to marketing brochures and technical manuals, paid off less than promised. Such reuse remains useful but has proved to be of limited applicability: most documents don't reuse chunks of information because that information changes too quickly. Besides, structured technical documentation requires a discipline beyond the skills or inclinations of most authors. By contrast, the office documents we churn out daily are laissez-faire, which always makes them an easy sell.

So should we hang up our XML hats, concede that XML is a great tool for Web applications, and forget about content-centric applications other than RSS or technical documentation? If we invest a lot of effort developing a cookbook authored in XML, will it ever be easy to query the cookbook and say, "Create a special dessert edition with recipes that use chocolate chips"? If this example of reuse could become a mainstream business process, imagine the revenue streams for publishers like Amazon, with its 49-cent ebooks. And consider the power of analyzing libraries of office documents the way we mine data in structured data warehouses.
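To make the cookbook question concrete, here is a minimal sketch of the "dessert edition" request. It is written in Python rather than XQuery, purely as an illustration of what such a query has to do; the markup (a `recipe` element with a `course` attribute and `ingredient` children) and the function name `dessert_edition` are assumptions invented for this example, not any publisher's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical cookbook markup; the element and attribute names are
# assumptions made for this sketch, not a real publisher's schema.
COOKBOOK = """
<cookbook>
  <recipe course="dessert">
    <title>Chocolate Chip Cookies</title>
    <ingredient>flour</ingredient>
    <ingredient>chocolate chips</ingredient>
  </recipe>
  <recipe course="main">
    <title>Mole Chicken</title>
    <ingredient>chicken</ingredient>
    <ingredient>chocolate chips</ingredient>
  </recipe>
  <recipe course="dessert">
    <title>Lemon Tart</title>
    <ingredient>lemons</ingredient>
    <ingredient>butter</ingredient>
  </recipe>
</cookbook>
"""


def dessert_edition(cookbook_xml, ingredient):
    """Build a new <cookbook> holding only desserts that use `ingredient`."""
    source = ET.fromstring(cookbook_xml)
    edition = ET.Element("cookbook", {"edition": "dessert"})
    # ElementTree supports a limited XPath subset; this predicate keeps
    # only recipes whose course attribute is "dessert".
    for recipe in source.findall("recipe[@course='dessert']"):
        used = [(i.text or "").strip().lower() for i in recipe.findall("ingredient")]
        if ingredient.lower() in used:
            edition.append(recipe)
    return edition


special = dessert_edition(COOKBOOK, "chocolate chips")
titles = [r.findtext("title") for r in special.findall("recipe")]
print(titles)
```

The point of the sketch is that the structure does the work: because each recipe declares its course and its ingredients, assembling a new edition is a filter over the markup, not a rewrite of the book. In XQuery, the same request would typically be expressed as a single query over the cookbook.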

Don't give up this XML dream. There is momentum, developers a quarter million strong, behind an XML query standard whose roots date back to 1998, the same year that the XML standard itself was released. In Web-years, XQuery has taken forever, and as of this writing the best guess is a mid-2006 approval as a W3C "Recommendation." Still, I've seen enough vendor interest and products claiming practical use of XQuery today that I contacted some vendors to sift the hype from the reality. I see signs that XQuery is already providing applications like those I mentioned above, and many more.

"If only there were more XML documents to begin with," you say? Here's where XQuery gets even more exciting. The standard has been nurtured by authors who are very familiar with both XML and conventional data structures, including the simple comma-separated value, or "CSV," format available as an export from Microsoft Excel and other data sources. As a result, you can use XQuery even on structured data that is not XML. On the XML side, Microsoft Office is long overdue for an upgrade, and in 2006 Microsoft is hinting at significantly enhanced support for XML. If Microsoft can bridge the gap between ordinary office documents and XML, we will have not only structured information outside XML but also plenty of XML documents to analyze. Using XQuery for our "document warehouses" could become as common as querying or transforming information in our data warehouses.

I interviewed several XQuery tool vendors, and here is a sampling of what I learned. I asked why anyone should invest dollars and mental energy in a standard that is still not finished. Tim Matthews, president and co-founder of Ipedo, an enterprise information integration product vendor, answered: "The version of XQuery that is in last call now is not likely to change much going forward. And in our product, we make sure to make things forward-compatible. Again, looking at the SQL world, there are always additions to the standard." The comparison is apt: SQL is the "XQuery" of databases, and it too keeps evolving.

I asked the same question of representatives at Data Direct, a subsidiary of Progress Software (which developed Stylus Studio software and Data Direct's XQuery for Java). Larry Kim, XML product manager at Data Direct, demonstrated how to use Stylus Studio's XQuery to analyze Web logs in CSV format and ferret out the name of a site and of its users trying to access a Web server. Larry also pointed out that XQuery is not only a very compact language but can also be applied to groups of XML files, unlike the usual file-at-a-time orientation of other XML standards. Jonathan Robie, Data Direct's XML program manager and editor of several XQuery-related standards at the W3C, repeated this theme: "As one of the designers of XQuery, I felt it was important to allow queries to work on both XML and other data sources, in any environment."
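For a feel of what a Web-log query of this kind involves, here is a rough sketch in Python, not the Stylus Studio demonstration itself. The log layout (columns named `site`, `user`, and `status`) and the helper `users_by_site` are assumptions made up for illustration; real server logs will differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log export; the column names (site, user, status) are
# assumptions for this sketch, not the format used in the demonstration.
LOG_CSV = """site,user,status
www.example.com,alice,200
www.example.com,bob,403
intranet.example.com,alice,200
www.example.com,alice,200
"""


def users_by_site(log_text):
    """Group the distinct users seen in the log under each site name."""
    seen = defaultdict(set)
    for row in csv.DictReader(io.StringIO(log_text)):
        seen[row["site"]].add(row["user"])
    # Sort the user names so the report is stable from run to run.
    return {site: sorted(users) for site, users in seen.items()}


report = users_by_site(LOG_CSV)
for site, users in sorted(report.items()):
    print(site, users)
```

The sketch treats each CSV row as a small record with named fields, which is essentially what lets a query language designed for XML work on comma-separated data as well.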

So start thinking of questions to ask your books, your data warehouse, and your content repositories. Ask what XQuery can do for you.