XML Hits the Big Time--Major Database Players Get into XML

Page 2 of 2


Looking Under the Hood
Database vendors are working to make their products function better with XML. The products and approaches differ, and the big three all have both new commercial offerings and significant R&D under wraps, but the approaches have some things in common.

To varying extents, they all rely on technologies for mapping XML data to relational fields and back again. For example, IBM is developing "Extenders" for its DB2 database that will allow developers to map XML data to and from DB2 tables, and Microsoft has a programming facility called SQLXML for mapping and querying between, as the name suggests, SQL and XML. Oracle would argue, and industry analysts would tend to agree, that its mapping technologies are more deeply embedded in the product, especially with Oracle9i Release 2, which is now generally available.
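To make the idea of mapping "and back again" concrete, here is a minimal sketch in Python. It does not use any of the vendor facilities named above; it assumes a made-up `orders` document and table, and uses only the standard library's `sqlite3` and `xml.etree.ElementTree` modules to shred XML into rows and rebuild XML from them.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and column names are illustrative only.
XML_DOC = """
<orders>
  <order id="1"><customer>Acme</customer><total>19.50</total></order>
  <order id="2"><customer>Globex</customer><total>7.25</total></order>
</orders>
"""

def xml_to_rows(xml_text):
    """Shred each <order> element into a flat (id, customer, total) tuple."""
    root = ET.fromstring(xml_text)
    return [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in root.findall("order")
    ]

def rows_to_xml(rows):
    """Rebuild an <orders> document from relational rows."""
    root = ET.Element("orders")
    for order_id, customer, total in rows:
        order = ET.SubElement(root, "order", id=str(order_id))
        ET.SubElement(order, "customer").text = customer
        ET.SubElement(order, "total").text = f"{total:.2f}"
    return ET.tostring(root, encoding="unicode")

# XML -> relational: load the shredded rows into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", xml_to_rows(XML_DOC))

# Relational -> XML: query the table and serialize the result.
rows = conn.execute("SELECT id, customer, total FROM orders ORDER BY id").fetchall()
round_trip = rows_to_xml(rows)
print(round_trip)
```

Even in this toy case the mapping has to be written twice, once in each direction, and both halves must change whenever the document structure does. That maintenance burden is the work the vendors' mapping layers aim to take off the developer.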

The major vendors all fully support the more stable XML standards, such as the core XML syntax, though they vary in their support of emerging standards. So the different products can parse the XML and validate it, at least against a document type definition (DTD), and in some cases against an XML schema. The products can also use XPath to traverse the hierarchical structure of the XML, but many have stopped short of supporting newer, still-developing query standards such as XQuery. Ron Schmelzer, senior analyst at ZapThink, follows XML data storage closely, and sees such standard support as critical to differentiating the various product offerings. Whereas relational database systems use SQL for querying, Schmelzer points out that SQL simply doesn't work "as well with the hierarchical nature of XML documents." As a result, said Schmelzer, "a number of initiatives exist to deal with XML-centric data query, insertion, and update operations."
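For readers who have not used XPath, the following sketch shows what traversing a hierarchy with path expressions looks like. It assumes an invented `catalog` document and uses Python's `xml.etree.ElementTree`, which implements only a subset of XPath, so it illustrates the idea rather than any vendor's XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with sections nested inside sections.
CATALOG = """
<catalog>
  <section name="databases">
    <book year="2001"><title>Relational Basics</title></book>
    <section name="xml">
      <book year="2002"><title>Native XML Stores</title></book>
    </section>
  </section>
</catalog>
"""
root = ET.fromstring(CATALOG)

# ".//book" matches <book> elements at every level of the hierarchy.
all_titles = [b.findtext("title") for b in root.findall(".//book")]

# A predicate narrows the match to books with a given attribute value.
titles_2002 = [b.findtext("title") for b in root.findall(".//book[@year='2002']")]

# A stepwise path follows one specific parent-child chain.
nested = root.findall("./section/section/book/title")

print(all_titles)
print(titles_2002)
print([t.text for t in nested])
```

The path expressions describe positions in a tree directly, which is the kind of operation Schmelzer suggests SQL handles less gracefully.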

All of the vendors also support emerging programming languages and application programming interfaces (APIs). The big three support Java for database access and connectivity, and emerging APIs for processing XML, such as the Document Object Model (DOM) and the Simple API for XML (SAX). The emphasis, correctly, seems to be on giving software developers a ready toolkit for accessing, manipulating, retrieving, and updating XML data, and quickly transforming it to other forms: HTML, relational, other forms of XML, and so on.
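The difference between the two APIs is easiest to see side by side. The sketch below uses Python's standard `xml.dom.minidom` and `xml.sax` modules rather than the Java toolkits the vendors ship, and the `items` document and `ItemHandler` class are invented for illustration. DOM builds the whole document as a tree in memory; SAX delivers events as the parser reads through it.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document.
DOC = "<items><item>alpha</item><item>beta</item></items>"

# DOM: parse the whole document into an in-memory tree, then navigate it.
dom = minidom.parseString(DOC)
dom_items = [n.firstChild.data for n in dom.getElementsByTagName("item")]

# SAX: receive callbacks as the parser streams through the document.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # collects text only while inside an <item>

    def startElement(self, name, attrs):
        if name == "item":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._buffer = None

handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items

print(dom_items, sax_items)
```

Both approaches recover the same values. The usual trade-off is that the tree is more convenient to navigate and update, while the event stream keeps memory use low on large documents.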

Integration as a Crutch?
This last point, integration, is a focus for several of the vendors, notably IBM and Microsoft, both of which are heavily invested in marketing software development tools and methodologies. IBM also has a huge professional services business, a large chunk of which is dedicated to database and XML integration. The rollout of the Web has meant an explosion of database integration and access, and the continued growth of XML will only accelerate this trend.

Oracle's Shimp, among others, would caution that application integration is only part of the problem, and that there is underlying, more fundamental data analysis and modeling that needs to be done. In a situation where the data stores have multiplied (often for reasons of expediency), Shimp reasons that simply integrating the various databases may be a "crutch" to avoid the harder work that is being left undone. Indeed, the increased mix of data types (relational and nonrelational, structured and unstructured, and XML especially) has brought a new challenge to organizations: to truly analyze all the data and develop a more unified and comprehensive data model. XML isn't so much a new problem as a new and complex dimension of an existing problem.

The Data Model is the Key
Yet while all organizations would be wise to invest in this kind of comprehensive data modeling, the organization that has a lot of XML data indeed has some unique problems on its hands, and perhaps extra motivation to take a step back and analyze things. By its very nature, XML data is going to be different, and is going to require some different integration and handling. If you have various business databases, and a large store of XML, you likely are going to require at least separate instances of a relational database. For example, you could have all of your business or transactional data in one database, tuned to maximize the performance of that data. Your XML data could then reside in a second relational database that supports XML, such as the products from Oracle, IBM, and Microsoft.

These companies would likely argue that a single-vendor solution is preferable, and that, of course, their solution would be best. However, the reality is that you likely have many data sources already, from different vendors, and will likely live with some of these for some time to come. So while a comprehensive data model and more monolithic solution may be in your future, you will likely still have to knit some things together to create a comprehensive solution, at least for now.

SIDEBAR: Going Native with XML Data Stores
Count ZapThink's Ron Schmelzer among those who think that XML-specific storage solutions have a place in today's enterprise. "Native XML Data Stores—what we call NXDs—are relevant. Lots of big companies are using them," said Schmelzer. And while some technical people, as well as vendors, can be very passionate about one choice of repository over another, Schmelzer looks at the choice with the dispassionate view of an analyst.

"It all comes down to a few simple questions," said Schmelzer. "Where is your XML data coming from? What are you using it for? Do you have options for storing it?" As Schmelzer explained, if you already have a large store of XML, and need to continue to use it as XML, perhaps it is best to be maintained as XML, and not mapped back and forth to a relational database. Conversely, if it is already in a relational database and you can derive XML from that, why not keep it there?

There are some unique characteristics of XML that make it an odd fit for relational databases at times. It can be very hierarchical, full of complex and deep parent-child relationships. Individual fields and records can also be quite long, and thus likely very difficult for a relational database to navigate, query, and manipulate.
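A small experiment shows why deep parent-child nesting can be awkward in relational form. The sketch below is one common way to store a tree in a table (one row per element with a `parent_id` link), not a description of how any particular product does it. The `node` table and the sample document are assumptions for illustration, and the query relies on SQLite's recursive common table expressions through Python's `sqlite3` module.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document nested four levels deep.
DOC = "<a><b><c><d>leaf</d></c></b><e/></a>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT)")

def shred(element, parent_id=None):
    """Store one row per element, linked to its parent by parent_id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, tag) VALUES (?, ?)", (parent_id, element.tag)
    )
    node_id = cur.lastrowid
    for child in element:
        shred(child, node_id)

shred(ET.fromstring(DOC))

# Recovering the path to <d> means following one parent link per level,
# which in SQL takes a recursive query rather than a simple lookup.
PATH_SQL = """
WITH RECURSIVE ancestors(id, parent_id, tag, depth) AS (
    SELECT id, parent_id, tag, 0 FROM node WHERE tag = 'd'
    UNION ALL
    SELECT n.id, n.parent_id, n.tag, ancestors.depth + 1
    FROM node AS n JOIN ancestors ON n.id = ancestors.parent_id
)
SELECT tag FROM ancestors ORDER BY depth DESC
"""
path = "/".join(row[0] for row in conn.execute(PATH_SQL))
print(path)  # a/b/c/d
```

Every extra level of nesting adds another step to the recursion, whereas the same question asked of the XML itself is a single path expression.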

Readers who have been following these technologies know that there are already many companies in this market, and many approaches to storage. In its March 2002 report, XML Data Storage Technologies and Trends, ZapThink identified 14 companies as NXDs, and noted that their architectural approaches included hierarchical, object-oriented, proprietary, and even relational. What is common among them is their focus on storing XML natively, or, as the report states, "without any transformation."

While there are many vendors, two of them seem to have the advantage of both mindshare and market share. ZapThink says that Software AG, with its Tamino product, and eXcelon, with XIS, "account for a large number of total licenses to date." Our own research for this article seemed to bear that out, with these two companies showing the most market presence and referenceable customers. ZapThink also points to XML Global, IXIASoft, and NeoCore as gaining some traction, and I would add Ipedo to that list, as they have had some notable customer wins over the past few quarters.

TABLE: Percentage of Relational Database Management System Licenses Sold for Application
Relational Database Management System    49%
File Systems                             21%
Web Sites                                12%
Content/Document Management              10%
Directories                               5%
Semi-Structured Stores                    3%
Other                                     1%
Source: ZapThink

TABLE: Percentage of Relational Database Management System Licenses Sold for Application
                   2000   2001   2002   2003   2004   2005
Relational          85%    85%    84%    77%    74%    70%
Object-Oriented     15%    13%    11%     8%     6%     5%
XML                  0%     2%     5%    15%    20%    25%
Source: ZapThink

TABLE: Percentage of Relational Database Management System Licenses Sold for Application
                              1999   2000   2001   2002   2003   2004   2005
Database Management System       0     20     66    195    550   1325   2490
NXD                             23     69    170    390    720   1100   1580
Total                           23     89    236    585   1270   2425   4070
Source: ZapThink
