Now that XML has moved beyond being the latest cool thing and is being widely adopted and deployed, practical questions are being asked about it, and they are only starting to be answered. Perhaps the biggest question about XML is, "Now that I've got it, where am I supposed to keep it?" Some of the big database players think they've got the answer. Organizations are replete with storage technologies: relational databases, file servers, and document management systems, to name a few. Perhaps to no one's surprise, XML data turns up in all of these places and more. There are also newer technologies designed specifically for native XML storage.
Yet, for most organizations, relational databases are the dominant mechanism for storing and managing data. The relational database market is also highly concentrated, with technology from Oracle, IBM, and Microsoft dominating. Given this concentration of technology and vendors, it's worth looking at what these vendors plan to do about XML, and specifically at each vendor's flagship database product: Oracle's 9i Database, IBM's DB2, and Microsoft's SQL Server.
It's clear why these key players are taking XML seriously: The market for XML storage is a big one. The analyst firm ZapThink projects that this market will grow from $75 million in 2000 to over $4.1 billion in 2005. And while the relational database vendors currently hold only 15% of the XML storage market, ZapThink expects that share to grow to 65% by 2005. That still leaves plenty of money for the specialized XML vendors to make, but it also means that relational databases will be storing plenty of XML for years to come.
XML Versus Relational Data
The distinctions between XML and relational data are by now widely discussed and, for the most part, well understood. But regardless of what a salesperson may be telling you this week, the differences are fundamental. Relational data is all about tidy rows and columns of well-understood, predefined chunks of information, like names, addresses, prices, and product codes. People have come to use the word "structured" to refer to relational data, and the term makes sense.
XML data can also be somewhat structured. A set of names and addresses can be represented, perhaps equally well, as either relational data or XML data. But XML differs in two fundamental ways: 1) XML nests hierarchies of parent-child relationships directly inside a document, whereas relational data must spread them across separate tables linked by keys; and 2) XML doesn't care a lick how long or complex a given "field" or "record" is, while relational data is all about defining, up front, how long and complex the fields and records are.
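To make the first difference concrete, here is a minimal sketch in Python that assumes a hypothetical order document and uses SQLite purely as a convenient stand-in for a relational database. It flattens (or "shreds") one nested XML record into two tables linked by a foreign key. The table and column names are illustrative, not any vendor's schema, and getting the hierarchy back requires a join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document: one parent with an open-ended number of children.
ORDER_XML = """<order id="1001">
  <customer>Acme Corp</customer>
  <item sku="A-1" qty="2"/>
  <item sku="B-7" qty="5"/>
</order>"""


def shred_order(xml_text: str, conn: sqlite3.Connection) -> None:
    """Flatten the nested <order> hierarchy into two related tables."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
        (order_id, root.findtext("customer")),
    )
    # Each child <item> becomes its own row, linked back to its parent by key.
    for item in root.findall("item"):
        conn.execute(
            "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
            (order_id, item.get("sku"), int(item.get("qty"))),
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(order_id),
        sku TEXT,
        qty INTEGER
    );
""")
shred_order(ORDER_XML, conn)

# Reassembling the parent-child view requires a join across the two tables.
rows = conn.execute("""
    SELECT o.customer, i.sku, i.qty
    FROM orders o JOIN order_items i ON o.order_id = i.order_id
    ORDER BY i.sku
""").fetchall()
print(rows)  # [('Acme Corp', 'A-1', 2), ('Acme Corp', 'B-7', 5)]
```

In the XML, the number of `<item>` children can vary freely from one order to the next; on the relational side, the same nesting has to be expressed through a second table and a key column.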
Take the extreme (but not all that unusual) case of a lengthy technical document coded in XML. The entire "record," or XML document, can be megabytes in length and can consist of many nested parent and child nodes. Such a document is unlikely to fit neatly into rows and columns, which is why XML data can be an odd fit in a relational database. So Oracle, Microsoft, and IBM have been working hard to extend their products to better ingest, store, manage, and manipulate XML data.
To begin with, all of the major vendors have built on an existing method for storing large chunks of data as a way to better support XML. A relational database's large-object columns, traditionally the so-called BLOB (Binary Large Object), can be used to hold large XML documents, and the vendors have refined this support to distinguish BLOBs from CLOBs (Character Large Objects), which suit text formats such as XML because the database handles their contents as characters rather than raw bytes. Using BLOBs or CLOBs, whole XML documents can be moved in and out of a database as a unit. Because the database treats each stored document as a single opaque value, however, secondary tools, such as an XML parser, are needed to work with the XML's internal structure as it is moved in and out.
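The whole-document approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's API: SQLite's `TEXT` column stands in for a CLOB, the manual and the `documents` table are hypothetical, and the standard library's `xml.etree.ElementTree` plays the role of the secondary parser.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large technical document (real ones may run to megabytes).
MANUAL_XML = (
    "<manual title='Pump Service Guide'>"
    "<chapter n='1'><title>Safety</title><para>Disconnect power.</para></chapter>"
    "<chapter n='2'><title>Disassembly</title><para>Remove the housing.</para></chapter>"
    "</manual>"
)

conn = sqlite3.connect(":memory:")
# Assumption: SQLite's TEXT type stands in for a CLOB column; Oracle, DB2,
# and SQL Server each have their own large-object types and syntax.
conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, body TEXT)")

# Store: the whole document goes in as one opaque value.
conn.execute("INSERT INTO documents (doc_id, body) VALUES (?, ?)", (1, MANUAL_XML))

# Retrieve: the database hands back exactly the string it was given...
(body,) = conn.execute("SELECT body FROM documents WHERE doc_id = ?", (1,)).fetchone()

# ...and a separate XML parser is needed to get at its structure.
root = ET.fromstring(body)
titles = [chapter.findtext("title") for chapter in root.iter("chapter")]
print(root.get("title"), titles)  # Pump Service Guide ['Safety', 'Disassembly']

# Changing one element means parsing, editing, and writing back the whole document.
root.find("chapter/para").text = "Disconnect power and lock out the breaker."
conn.execute(
    "UPDATE documents SET body = ? WHERE doc_id = ?",
    (ET.tostring(root, encoding="unicode"), 1),
)
```

The trade-off shows up in the last step: because the database sees only a string, even a one-element edit round-trips the entire document through the parser.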
For Robert Shimp, vice president for Oracle 9i Database marketing, the emergence of XML is part of a broader problem enterprises face as the growth and importance of "unstructured data" begin to rival those of structured data. "Organizations are looking for a unified view of their data, both structured and unstructured," said Shimp. According to Shimp, organizations also suffer from a proliferation of data sources, many of which are too loosely managed. That sprawl is bad for companies because it makes it difficult for them to manage and act on their intellectual capital efficiently. "It would be analogous to the CFO of a company handing out $100 of the company's money for each employee to manage," noted Shimp, "with no controls on how each employee would do it."
Solving the "Single Source" Publishing Problem
For organizations with significant amounts of content, managing XML data becomes even more important. Publishers and others with large content stores are trying to solve the "single source" publishing problem: rendering both XML and structured data into HTML, WAP, and other formats, often on the fly. Such automation can already involve tying together many repositories, so that a single rendered HTML page is derived from both structured and unstructured sources. In a manufacturing application, this could be a parts catalog in which price and inventory data come from a relational database and product descriptions come from a document management system. In a magazine publishing application, article content might originate in a content management system while a related directory listing comes from a relational database.
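A parts-catalog page of the kind described above might be assembled roughly as follows. This is a hedged sketch: the `parts` table, the `DESCRIPTIONS` dict standing in for a document management system, and `render_part_html` are all hypothetical, and a production system would more likely use a templating layer and render to other targets (such as WAP) as well.

```python
import html
import sqlite3
import xml.etree.ElementTree as ET

# Structured source: price and inventory live in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, price REAL, on_hand INTEGER)")
conn.execute("INSERT INTO parts VALUES ('P-100', 12.5, 40)")

# Unstructured source: a hypothetical document store of XML descriptions,
# modeled here as a plain dict keyed by part number.
DESCRIPTIONS = {
    "P-100": "<part><name>Impeller</name><desc>Bronze, 4-inch, balanced.</desc></part>",
}


def render_part_html(part_no: str) -> str:
    """Merge one part's relational data and XML description into an HTML fragment."""
    price, on_hand = conn.execute(
        "SELECT price, on_hand FROM parts WHERE part_no = ?", (part_no,)
    ).fetchone()
    doc = ET.fromstring(DESCRIPTIONS[part_no])
    # Escape text pulled from the XML before embedding it in HTML.
    name = html.escape(doc.findtext("name"))
    desc = html.escape(doc.findtext("desc"))
    return (
        f"<div class='part'><h2>{name}</h2><p>{desc}</p>"
        f"<p>${price:.2f} ({on_hand} in stock)</p></div>"
    )


print(render_part_html("P-100"))
```

Keeping each fact in the repository best suited to it (prices in rows, prose in documents) and merging only at render time is the essence of the single-source approach; swapping the final f-string for a different output template would yield another format from the same sources.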
Creating such a unified view of both unstructured and structured data is precisely where the major vendors see their offerings headed. Oracle, for instance, talks about "unifying...business data...and XML content," and IBM talks about "combining XML...and the power of data integration." And all of them, Microsoft included, are embracing the broader notion of Web services, in which content and data are integrated over the Internet using loosely coupled components, with XML as the all-purpose glue.
Besides the single-source publishing problem, other factors are driving the need for XML storage. ZapThink's research points to the growth of Web services, the increased use of XML for messaging, and the need for better searching and querying of XML data. Taken together, these drivers suggest a growing need for storage technologies that can manage XML data in more sophisticated ways.