XML: Everybody's (Finally) Doing It

The long-awaited use of XML in office suites has arrived. OpenOffice was the first to migrate to XML; StarOffice 8 provides an extra layer of support on top of OpenOffice; Corel WordPerfect was an early XML adopter and will soon import from and export to other XML office suites; and Microsoft Office 2007 is built on XML. OK, the future has arrived. Now what?

It took nearly 10 years for this XML transformation to occur, and adoption of the new office suites is slow but inevitable. Some large organizations, including government agencies, are waiting before upgrading to Office 2007 because its interfaces and formats, though better, are completely new. Mindshare for OpenOffice and StarOffice is growing, with customers mainly in universities and government.

That said, two major XML definitions are sure to survive: the OpenDocument Format (ODF) used by OpenOffice and StarOffice, and Office Open XML (OOXML). The benefits of this move to XML are only slowly emerging, which may partly explain the slow adoption. Having recently reviewed both StarOffice and Office 2007, I can see some obvious advantages in both products and in their use of XML generally. To see what the industry thinks, I also asked two XML product vendors for their thoughts. Here are my views, followed by theirs.

There are both near-term and strategic benefits to using XML in office documents, whether ODF or OOXML. Because both formats package document components in zip files, graphics and text content are stored separately, which makes it possible to repair corrupted documents. In earlier versions of MS Office, if a document's graphic became corrupt, the whole document (Word, PowerPoint, or even Excel) often became useless. In the new scheme, you can simply replace the corrupt graphic. Zip packaging also tackles a second problem. XML is justly criticized for its verbosity: all those angle brackets and start and end tags cause file bloat. The zip compression used by both suites shrinks their XML documents to about half the size of the older proprietary binary formats. Even with the ever-decreasing cost of storage for files and backups, a 50% reduction is huge, and it also shortens the time it takes to back up these files.
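To make the repair scenario concrete, here is a minimal sketch using only Python's standard `zipfile` module. It builds a toy package in memory, with a hypothetical `content.xml` text part and a hypothetical `Pictures/chart.png` graphic, then swaps out the graphic without touching the text. Real ODF and OOXML packages carry additional required entries (ODF, for instance, expects a `mimetype` entry and `META-INF/manifest.xml`), which this toy omits, so treat it as an illustration of the idea rather than a repair tool.

```python
import io
import zipfile


def build_package(members: dict[str, bytes]) -> bytes:
    """Write each named member into an in-memory, DEFLATE-compressed zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members.items():
            zf.writestr(name, data)
    return buf.getvalue()


def replace_member(package: bytes, name: str, new_data: bytes) -> bytes:
    """Return a copy of the package with one member (e.g. a corrupt graphic) swapped out.

    zipfile cannot rewrite a member in place, so every other member is
    read and copied unchanged into a fresh archive, keeping its original order.
    """
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        members = {info.filename: zf.read(info.filename) for info in zf.infolist()}
    members[name] = new_data
    return build_package(members)


if __name__ == "__main__":
    # Hypothetical toy document: verbose XML text plus a damaged image.
    content = (
        "<document>"
        + "<p>Quarterly sales rose in every region.</p>" * 200
        + "</document>"
    ).encode("utf-8")
    damaged = build_package({"content.xml": content, "Pictures/chart.png": b"not a real PNG"})
    print("raw XML bytes:", len(content), "zipped package bytes:", len(damaged))

    # Swap in a replacement graphic (placeholder bytes); the text part is untouched.
    repaired = replace_member(damaged, "Pictures/chart.png", b"replacement image bytes")
    with zipfile.ZipFile(io.BytesIO(repaired)) as zf:
        print(zf.namelist())  # ['content.xml', 'Pictures/chart.png']
        assert zf.read("content.xml") == content
```

The repetitive toy text compresses far better than a real document would, so the printed sizes illustrate compression in general and say nothing about the roughly 50% figure above.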

In the longer term, openly published and standardized XML formats reduce, and may eventually eliminate, the risk of being unable to open older documents after format changes. ODF has the gold-standard approval of the ISO and IEC standards bodies. OOXML became an ECMA standard in late 2006, a step toward ISO approval. Another strategic consideration is the comparative richness of the two XML formats themselves.

Microsoft's OOXML is essentially presentational: it primarily expresses the look and feel of Word, PowerPoint, and Excel documents. ODF, by contrast, leverages richer XML standards for graphics and forms, an option unavailable to Microsoft because compatibility with earlier Office products was essential. Because ODF started with a clean slate and a strong bias toward XML standards, it incorporates SVG and XForms, which provide additional meaning unavailable in look-and-feel formats. An SVG object (a "sprinkler head" symbol, for example) can be searched for, identified, and counted in a drawing, and graphics can be analyzed and transformed with other XML standards such as XQuery or XSLT. Form fields can be validated against their types and even pre-populated with appropriate values. Because ODF is the richer standard, conversion between ODF and OOXML will work best from ODF to OOXML, just as you can downsample a print-quality graphic for web delivery but not the reverse. Lastly, when search engines become aware of office XML formats, they will be able to search for information within specific elements such as captions or titles, and even provide alternative views of search results. These powerful capabilities will eventually become common as vendors support XML standards, but that is a slow process, especially for XForms and SVG.
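As a rough illustration of the "sprinkler head" example, the sketch below counts labeled symbols in a small SVG drawing. It uses Python's `xml.etree.ElementTree` and its limited XPath support rather than XQuery or XSLT, and it assumes a hypothetical convention in which each symbol is an SVG `<g>` group with a `<title>` naming what it represents. Real ODF drawings use their own `draw:` shape elements with SVG-compatible attributes, so the markup here is a simplified stand-in.

```python
import xml.etree.ElementTree as ET

# Namespace map so ElementTree's XPath subset can match SVG elements by prefix.
SVG_NS = {"svg": "http://www.w3.org/2000/svg"}

# Hypothetical floor-plan drawing: each symbol is a <g> group whose
# <title> child names what the symbol represents.
DRAWING = """\
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="100">
  <g id="s1"><title>sprinkler head</title><circle cx="50" cy="50" r="5"/></g>
  <g id="s2"><title>sprinkler head</title><circle cx="150" cy="50" r="5"/></g>
  <g id="d1"><title>smoke detector</title><rect x="240" y="45" width="10" height="10"/></g>
  <g id="s3"><title>sprinkler head</title><circle cx="350" cy="50" r="5"/></g>
</svg>
"""


def find_symbols(svg_text: str, label: str) -> list[str]:
    """Return the ids of all <g> groups whose <title> text equals label."""
    root = ET.fromstring(svg_text)
    ids = []
    for group in root.iterfind(".//svg:g", SVG_NS):
        title = group.find("svg:title", SVG_NS)
        if title is not None and (title.text or "").strip() == label:
            ids.append(group.get("id"))
    return ids


if __name__ == "__main__":
    sprinklers = find_symbols(DRAWING, "sprinkler head")
    print(sprinklers)       # ['s1', 's2', 's3']
    print(len(sprinklers))  # 3
```

The point is that the label travels with the graphic as searchable markup; in a purely presentational format, the same symbol would just be a circle at some coordinates.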

What do the vendors themselves say? Altova CEO Alexander Falk says that the new features of XMLSpy will allow developers to "extract, edit, query and transform XML data. This provides huge advantages to business people and application developers." Falk believes that waiting for an ideal world built on richer XML vocabularies such as DITA or DocBook misses the larger point: the bulk of the world's information is in basic office documents. John Kreisa, director of product marketing at Mark Logic, asserts that central XML repositories of OpenOffice and Office 2007 documents will make sophisticated analyses possible, such as measuring term frequency and understanding "relationships within the content like citation analysis between articles." (For detailed responses from these vendors, visit my blog at http://contentcurmudgeon.blogspot.com.)
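Kreisa's term-frequency example can be sketched in a few lines. The Python below is not Mark Logic's API; it is a generic, standard-library illustration that pulls all text out of each XML document in a hypothetical in-memory repository and tallies lower-cased words with `collections.Counter`. A real repository would index stored ODF or OOXML parts (such as `content.xml` or `word/document.xml`) and would need proper tokenization, stop-word handling, and namespace awareness.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical repository: document name -> XML text of its main content part.
REPOSITORY = {
    "memo.xml": "<doc><title>Budget memo</title><p>The budget covers XML tools.</p></doc>",
    "report.xml": "<doc><title>XML migration report</title><p>XML adoption is slow.</p></doc>",
}

# Deliberately naive tokenizer: runs of ASCII letters and digits.
WORD = re.compile(r"[a-z0-9]+")


def term_frequencies(repository: dict[str, str]) -> Counter:
    """Count lower-cased word occurrences across the text of every document."""
    counts = Counter()
    for xml_text in repository.values():
        root = ET.fromstring(xml_text)
        # Join text nodes with spaces so words in adjacent elements don't fuse.
        text = " ".join(root.itertext())
        counts.update(WORD.findall(text.lower()))
    return counts


if __name__ == "__main__":
    counts = term_frequencies(REPOSITORY)
    print(counts.most_common(2))  # [('xml', 3), ('budget', 2)]
```

Because the documents are XML, the same loop could be narrowed to particular elements (only titles, say) by iterating over `root.iter("title")` instead of the whole tree.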

The world's document repositories are slowly but steadily migrating towards XML. In response, vendors across the content management spectrum are enhancing their products significantly to exploit the benefits of XML.