XML Across the Publishing Lifecycle: Tools & Strategies to Promote Success

Page 1 of 2


A funny thing happened on the way to the Web. XML, designed as a Web standard to replace HTML, was intended to give every Web page the ability to define its own tags, breaking free of the Procrustean bed of presentation-only markup. Since the release of the XML standard in February 1998, most XML applications have instead been for data interchange and ecommerce.

That may be changing. What barriers must you overcome to succeed in XML content publishing? Let's start with a metaphor. Imagine that you want to build a new XML house. Having heard about the benefits of XML construction, you call up an architect and are told: "Using XML construction has benefits you can't achieve with any other form of construction. You're wise to get an early start. However, there are a few issues you should know about. We can buy the raw materials, but you will of course have to modify them." You reply: "Modify the materials, you mean like make my own nails?"

"Why yes. One of the many advantages of XML is that you can customize it. In fact, you must do so. But don't worry, there are lots of tools available to help with this customization."

If this is how they built houses, we'd all be living in tents. Setting up a realistic XML publishing system isn't simple, but today's standards and vendor offerings are making things easier. Although it is still more complicated than a trip to Warehouse Building Supplies, you no longer have to make your own nails. XML tools, standards, and vocabularies are all maturing, and publishers are making broad commitments not only to XML, but to content management system infrastructures for streamlining the entire publishing lifecycle.

Think of this lifecycle as starting with document model analysis and design, moving on to authoring and editing, and then assembling content chunks to publish products. The cycle repeats as the designs evolve and the content is revised, with this caveat: Nearly everyone's content is "legacy," in a format other than XML. Converting that content to XML is not trivial. Moreover, the lifecycle is so complex that most organizations choose some automated system to manage it.

What's The Big Deal?
Let's begin by asking, "Why change your present system at all?" You could be noticing XML on competitive Web sites or in RFPs. Perhaps you notice it's getting harder to support your content's proprietary formats, or you can't derive products you need from them. Maybe the process of managing content through your whole lifecycle has become a hassle that only a few of your top people understand. These warning signs suggest it might be time to upgrade your content to XML. After you've succeeded, these benefits can accrue:

  • Reduce your dependence on proprietary formats
  • Create many products from single-source XML assemblies
  • Achieve early strategic competitive advantage by implementing upcoming standards, tools, and technologies
  • Start working more closely with your customers and suppliers who already use XML

XML may not be for you. Proprietary authoring tools like Word and FrameMaker can provide limited cross-media outputs. If those suffice, perhaps you can avoid XML or postpone the move. However, if you decide to implement XML, consider non-technical issues too, such as your corporate culture. Does it resist change, or is there a grass-roots desire to modernize? Do you prefer a gradualist approach, piloting one part of the lifecycle before moving on to the whole? Or do you want to implement comprehensively? In either case, you'll need to evaluate competing vendors and follow standard vendor selection procedures. You'll also need to decide just how to balance the use of in-house staff with vendor consultants. Consultants achieve quick but expensive results; in-house expertise is available over the long haul.

Analyzing, Designing & Applying Your XML Vocabulary
There's nothing magical about XML vocabularies, also called document models, DTDs, or schemas. These are simply sets of rules defining the structures found in your content. You can use existing models designed by industry groups (like DocBook for technical information), or you can create your own. Two excellent, inexpensive modeling tools are Altova's XML Spy and Tibco's Turbo XML. Spy is designed for Windows users; Turbo XML is written in multiplatform Java. Both can do much more than just model building; they can help convert content, produce graphical model views, and document the models. Just as we remodel houses, our XML models evolve too, especially after we apply them to all legacy content. Modeling and converting content is often an iterative process. Even if you hire a consultant to develop your initial model, investing in a modeling tool will help you evolve the model as necessary.
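To make "sets of rules defining the structures found in your content" concrete, here is a minimal Python sketch. It is not how any of the tools named above work. The element names and the MODEL table are hypothetical, and the check covers only which child elements may appear inside which parent. A real DTD or schema also governs order, repetition, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy model: each element name maps to the child
# elements it is allowed to contain. Invented for illustration.
MODEL = {
    "article": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}

def find_violations(xml_text):
    """Return messages for elements that break the toy model."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag not in MODEL:
        problems.append(f"unknown root element <{root.tag}>")
    stack = [root]
    while stack:
        parent = stack.pop()
        allowed = MODEL.get(parent.tag, set())
        for child in parent:
            if child.tag not in MODEL:
                problems.append(f"unknown element <{child.tag}> in <{parent.tag}>")
            elif child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed in <{parent.tag}>")
            stack.append(child)
    return problems

# A paragraph placed directly under the article breaks the model.
print(find_violations("<article><para>stray</para></article>"))
```

Running a check like this across a sample of legacy content is one way to see where the model and the content disagree, which is the iterative loop described above.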

Tradeoffs: If your environment includes a mix of Macintosh, UNIX, and Windows systems, Java-based tools may work on all three. This can reduce your total cost of ownership, including training. However, these tools are not tuned to any one platform and may run slower than tools designed for just one operating system.

Developing your model and converting your content to XML requires a deft combination of content analysis, strategic thinking, and business planning. Does the structure of your content fall into one or more categories, each covered by the model(s) you developed? Do those categories satisfy your anticipated product requirements? If your content already exists as HTML or can be easily converted to HTML, a freely available tool called HTML Tidy can help convert that to XHTML as a first step to your final conversion. XHTML itself can begin providing immediate benefits as input to ebooks and as a preparation for delivery to wireless devices. If you use Acrobat, Adobe and Texterity are developing tools to convert PDF to XML.

If the prospect of conversion with existing staff makes you uneasy, consider a vendor experienced in both model design and conversion to do the job. Costs start at around $10 per standard page, depending on your content and your XML model(s). Will you run legacy and new systems in parallel for a while? If so, be prepared to capture all legacy changes and apply them to your XML system.

Tradeoffs: Think of Tidy as a free but 80% solution. Someone will need to review every "tidied" file. The resulting valid XML will still require additional model information.
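Part of that review can be automated. The sketch below is a rough first pass, not a substitute for the human check: it uses Python's standard XML parser to report whether a converted file is at least well-formed. The function name is hypothetical, and the check says nothing about whether the file fits your own model. One caveat: this parser does not read the XHTML DTD, so named entities such as &nbsp; may be reported as errors.

```python
import xml.etree.ElementTree as ET

def well_formedness_error(text):
    """Return None if text parses as XML, else a description of the first error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return f"line {line}, column {column}: {err}"
    return None

# An unclosed <br> is legal in HTML but not in XML.
for sample in ("<p>ok<br/></p>", "<p>not ok<br></p>"):
    print(sample, "->", well_formedness_error(sample) or "well-formed")
```

Files that fail this check can go back through conversion before anyone spends time reading them.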

Authoring and Editing XML Documents
Selecting an XML authoring tool requires a fundamental choice: Do you want a native XML tool like SoftQuad XMetaL or Arbortext Epic that works with XML directly, or do you prefer hybrid tools that add XML awareness to a word processor or publishing tool your organization already uses?

Examples of hybrid tools include BroadVision's One-to-One Publishing system with its customized version of Microsoft Word, Adobe's FrameMaker+SGML, WorX SE, and S4/TEXT. Although using Word to create XML is appealing, it's interesting to note that Microsoft Corporation itself purchased a site license to XMetaL for its in-house XML authoring needs. The more a tool looks and feels like a word processor, the more it is likely to depend on word processor styles. Styles, however, are notoriously free-form, and ensuring their strict use to generate XML is challenging. Word, for example, will let you insert styles virtually anywhere in a document; XML is very strict about which elements are valid in each context. Thus, getting from a loosely styled word processing document to a validly structured XML document is not an easy achievement. If you insist on an MS Word solution, consider eXtyles, a tool designed to help editorial and production personnel clean up MS Word files so they stand a better chance of yielding valid XML.
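The gap between free-form styles and strict XML is easier to see in a small example. The following Python sketch is illustrative only and does not represent how any product named here works. The style names, element names, and the rule that body text must follow a heading are all assumptions. It maps styled paragraphs to elements and lists every paragraph that cannot be placed.

```python
import xml.etree.ElementTree as ET

# Hypothetical style names and element names, for illustration only.
STYLE_TO_ELEMENT = {"Heading 1": "title", "Body Text": "para"}

def styled_paragraphs_to_xml(paragraphs):
    """Convert (style, text) pairs to an <article> tree plus a list of problems."""
    article = ET.Element("article")
    section = None
    problems = []
    for number, (style, text) in enumerate(paragraphs, start=1):
        element_name = STYLE_TO_ELEMENT.get(style)
        if element_name is None:
            problems.append(f"paragraph {number}: no mapping for style '{style}'")
            continue
        if element_name == "title":
            # Each heading opens a new section.
            section = ET.SubElement(article, "section")
        elif section is None:
            # Assumed rule: body text is only valid inside a section.
            problems.append(f"paragraph {number}: '{style}' appears before any heading")
            continue
        ET.SubElement(section, element_name).text = text
    return article, problems

article, problems = styled_paragraphs_to_xml([
    ("Body Text", "Intro"),
    ("Heading 1", "Start"),
    ("Body Text", "Hello"),
    ("My Fancy Style", "??"),
])
print(ET.tostring(article, encoding="unicode"))
print(problems)
```

Even in this tiny case, two of the four paragraphs need a person to decide where they belong, which is why style discipline matters so much in hybrid tools.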

One choice for having your word processor and getting valid XML, too, might be Corel WordPerfect. Now available as WordPerfect Office 2002 Professional Edition, WordPerfect has nearly 10 years' experience supporting XML (and SGML) natively. As with hybrid tools, some setup will be required, but the result might combine the best of word processing and XML.

Tradeoffs: Hybrid XML authoring tools are initially more familiar than native tools, so selecting them may reduce (not eliminate) training. However, be very wary about support costs. Ask for customer references. Also ask for proof that your legacy word processing content can easily be imported or converted to XML based on their use of styles. Lastly, remember that every time your DTD changes you'll need programming assistance to upgrade these hybrid tools and possibly your content, too.
