The Siren Song of Structure: Heeding the Call of Reusability

Page 2 of 3

The Cultural Dimension
Regardless of how you type and chunk your content, recognize that you are almost certainly adding work for content owners. As Ryan of Stellent puts it, "It's a mixed bag: you want structure as the administrators, but end-users want things as simple as possible." Companies that underestimate this dilemma often face major dissent when they try to roll out new systems into a pre-existing organizational culture.

Internal culture clashes within digital publishing are an old problem, but some new twists are emerging. Companies of all stripes increasingly see their content as a key organizational asset in an era of high staff turnover and new channels for communicating with customers, and they therefore realize that this content must be managed in a way that preserves its meaning and value. That means moving staff away from the standalone page and file metaphors that have dominated electronic publishing to date.

Technologies out Front
The front line in the battle for structure is emerging technology that increasingly lets non-technical content owners work in familiar applications while still applying the tags or delimiters needed to identify structural elements. The three most common approaches to maintaining structure at the desktop integrate content management applications with different authoring environments: word processing software (Microsoft Word), XML editors, and browser-based input forms.

In a Word
The authoring environment of choice in the typical corporation today is Microsoft Word. Of course, Word is not, nor was it ever meant to be, a structured authoring environment. Left to their own devices, most authors distinguish elements in a word processing document through formatting (bold, fonts, alignment, and so on) rather than through any structural markup. CMS vendors have therefore expended substantial effort finding clever ways to integrate Word into their systems.

Like several of its competitors, CMS vendor divine provides a plug-in for Word that looks for "styles" in a Word file and translates the styled content into particular elements in the CMS server, following a mapping you define in advance for each content type. Word users pre-select the content type and are presented with a set of styles (in the form of a Word stylesheet) associated with that type to apply to the document. These styles still provide visual formatting, but more importantly, they serve as structural delimiters. Need to isolate the byline of an article? Label it with the relevant style from the pull-down menu.
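The style-mapping idea can be sketched in a few lines of Python. This is a minimal illustration, not divine's plug-in: it assumes the Word file has already been reduced to a list of (style name, text) pairs, and the content type, style names, and element names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: for each content type, which Word style name
# corresponds to which XML element in the CMS.
STYLE_MAPS = {
    "article": {
        "Article Title": "title",
        "Byline": "byline",
        "Body Text": "para",
    },
}

def map_styles(content_type, paragraphs):
    """Convert (style, text) pairs into an XML element tree.

    Paragraphs whose style has no mapping are returned separately so the
    caller can report them instead of silently dropping them.
    """
    mapping = STYLE_MAPS[content_type]
    root = ET.Element(content_type)
    unmapped = []
    for style, text in paragraphs:
        tag = mapping.get(style)
        if tag is None:
            unmapped.append((style, text))
            continue
        ET.SubElement(root, tag).text = text
    return root, unmapped

doc = [
    ("Article Title", "Structure Pays Off"),
    ("Byline", "By A. Writer"),
    ("Normal", "Stray paragraph"),
    ("Body Text", "First paragraph of the story."),
]
root, unmapped = map_styles("article", doc)
print(ET.tostring(root, encoding="unicode"))
# <article><title>Structure Pays Off</title><byline>By A. Writer</byline><para>First paragraph of the story.</para></article>
print(unmapped)
# [('Normal', 'Stray paragraph')]
```

Reporting unmapped paragraphs rather than discarding them is a deliberate choice in this sketch: it gives the author a chance to fix a stray "Normal" paragraph before check-in.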

Upon "check-in" to the CMS, the document is validated against the XML structure defined for its content type and is rejected if the user has missed or misapplied an essential style.
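A simplified stand-in for that check-in step might count the elements produced by the style mapping and compare them with the rules for the content type. A production CMS would more likely validate against a DTD or XML Schema; the rule table and function name below are hypothetical.

```python
# Hypothetical rules for the "article" type: (minimum, maximum) occurrences
# of each element; None means no upper limit.
RULES = {
    "article": {"title": (1, 1), "byline": (1, 1), "para": (1, None)},
}

def check_in(content_type, elements):
    """Return a list of errors; an empty list means the document is accepted.

    `elements` is the list of element names produced by the style mapping,
    in document order.
    """
    rules = RULES[content_type]
    errors = []
    for name, (low, high) in rules.items():
        count = elements.count(name)
        if count < low:
            errors.append(f"missing <{name}>: found {count}, need at least {low}")
        elif high is not None and count > high:
            errors.append(f"too many <{name}>: found {count}, allowed {high}")
    # Elements the content type does not define at all (e.g. a misapplied style).
    for name in sorted(set(elements) - set(rules)):
        errors.append(f"unexpected <{name}>")
    return errors

print(check_in("article", ["title", "para", "para"]))
# ['missing <byline>: found 0, need at least 1']
print(check_in("article", ["title", "byline", "para"]))
# []
```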

CMS vendor Stellent also enables you to use Word styles to delimit discrete elements in your documents. But Stellent takes this a step further with an inference engine that tries to interpret your Word formatting and automatically convert sections to suitable elements. For example, if the first line in a document is bold and centered, it's probably a title. According to Stellent's Ryan, "It's not 100% accurate, obviously, because people can do weird things with formatting. So it's best to get people to use styles, but we can do 80% accurate mapping."
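Stellent's inference engine is proprietary; the sketch below only illustrates the kind of formatting heuristic Ryan describes. The `Paragraph` fields and both rules (bold, centered first line means title; a short line starting "By " means byline) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    text: str
    bold: bool = False
    centered: bool = False

def infer_elements(paragraphs):
    """Guess an element name for each paragraph from its formatting alone."""
    guesses = []
    for i, p in enumerate(paragraphs):
        if i == 0 and p.bold and p.centered:
            guesses.append("title")    # bold, centered first line: probably a title
        elif p.text.startswith("By ") and len(p.text.split()) <= 6:
            guesses.append("byline")   # short "By ..." line: probably a byline
        else:
            guesses.append("para")     # fall back to ordinary body text
    return guesses

story = [
    Paragraph("Structure Pays Off", bold=True, centered=True),
    Paragraph("By A. Writer"),
    Paragraph("Content is an asset."),
]
print(infer_elements(story))   # ['title', 'byline', 'para']

# An author who puts a bold, centered notice first defeats the heuristic.
draft = [Paragraph("DRAFT: DO NOT CIRCULATE", bold=True, centered=True)] + story
print(infer_elements(draft))   # ['title', 'para', 'byline', 'para']
```

The second call shows why such inference stops short of 100%: the draft notice is taken for the title, and the real title is demoted to body text.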

Other CMS vendors, like Interwoven, Documentum, and Vignette, offer Word integration facilities as well.

This approach has its detractors. "It's a holy grail for the industry to make it easy to use Word without thinking about the structure," says XML guru Paul Prescod of consulting firm Constant Revolution. Prescod recounts helping frustrated users who were confronted with a flood of errors when they tried to check documents they had worked on for days into their CMS repository. Also, many document structures don't fit Word's stylesheet paradigm, especially those with many internal links or deeply nested elements. "It seems great at first," argues Prescod, "but in the long run, it drives up your costs and frustration."

And then there is the challenge of getting users to apply the proper styles to their text. Authors tend to choose a style based on how they want the content to look rather than on what the content is. Rob Page, CEO of CMS integrator Zope Corporation, throws down the gauntlet: "I challenge anyone to find an organization of greater than one person where more than 80% of their documents are reliably styled." Because word processing software has trained users to rely on presentation to signal structure, the argument goes, any attempt to retrofit structure into Word is bound to fail. "We know from OCR and speech-to-text software that accuracy less than 70-80% is less than usable," argues Page.

Nevertheless, some firms are transforming Word documents effectively, after expending enough effort to get it right. At Network World, Gaffin's team has had to work with end-users and with the CMS itself to revise the rules and error checking to improve usability. "We spend a lot of time with Percussion figuring out what will set off alarms," Gaffin explains. But Network World relies on human intervention as well, incorporating Web editors into its workflow to (among other tasks) check that Word documents are structured properly before they are added to the repository.

As Easy as XML
If structure is very important to your firm, you could invest in desktop XML editors for authoring in lieu of word processing programs. Sample products include XMetaL from Corel and Epic from Arbortext.

XML editors have been quickly adopted by developers and technical writers, groups typically composed of markup veterans who find the latest generation of tools a welcome relief from cumbersome SGML editors. But most corporations have resisted introducing XML editors for workaday business users, and indeed, you should not assume that casual authors are "lite" versions of technical writers. Adam Gaffin declares that "no one is going to sit there in my newsroom with an XML editor and add content."

Part of the problem lies in the complexity of XML editors. RSI's Bos explains, "XML editors are optimized for complicated manuals that typically don't exist in [a] corporate environment beyond a technical publications department," and as a result they are not very user-friendly out of the box. Even if your IT department lets you install one, Bos continues, "it's dangerous to drop in a desktop editing tool and assume you're done." Content management consultant Karl Fast adds, "If business users don't understand structure in Word, a tool they may have used every day for a decade, how can we expect them to understand it in native XML?"

So XML editors can be a tough sell, but perhaps not prohibitively so. "It depends on how much money you can save by moving to XML," says Paul Prescod, "and some authors may look forward to an environment more customized for them, but it's still really difficult to convince a marketing department to switch."

Browsing for an Editor
As an alternative to XML editors on the desktop, Prescod and others advocate the use of browser-based content authoring using form fields. This approach keeps XML tags or database structures hidden, but allows you to differentiate input elements through separate fields in an input form. You can also enable users to employ WYSIWYG formatting effects through Java and ActiveX widgets that recreate a word processor-like feel in the browser.
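Here is a minimal sketch of the form-based approach, assuming a hypothetical form whose field names map one-to-one onto XML elements. The author fills in plain fields; the server, not the author, decides which tags wrap each value.

```python
import xml.etree.ElementTree as ET

# Hypothetical form definition: HTML form field name -> XML element.
ARTICLE_FORM = [
    ("headline", "title"),
    ("author", "byline"),
    ("body", "para"),
]

def form_to_xml(form_data):
    """Build an <article> document from submitted form fields.

    The author never sees tags: each field's place in the form already
    says what kind of element its content is.
    """
    root = ET.Element("article")
    for field, tag in ARTICLE_FORM:
        value = form_data.get(field, "").strip()
        if value:  # skip fields the author left empty
            ET.SubElement(root, tag).text = value  # ElementTree escapes & and <
    return ET.tostring(root, encoding="unicode")

print(form_to_xml({"headline": "Q3 Results", "author": "J. Smith",
                   "body": "Sales & margins rose."}))
# <article><title>Q3 Results</title><byline>J. Smith</byline><para>Sales &amp; margins rose.</para></article>
```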

As a practical matter, if the author initially drafts the content in Word, she has to copy and paste it into the relevant fields of the browser form. According to Zope's Rob Page, "this is the price you pay for a great system." He advocates developing interfaces that replicate Word functionality, then trying to wean authors from desktop word processing software entirely.

Of course, weaning authors from existing desktop software may not be easy. And the more you make the browser look like Word, the more you risk enabling users to employ formatting to delimit structural elements, potentially inviting a messy mix of XML and HTML. But, Phil Suh of concludes, "much as I dislike the browser-based interface for content management, it's here to stay because, sadly, it's the best alternative."
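One way to contain that risk, offered here as an illustration rather than anything the sources describe, is to audit what a WYSIWYG field submits and flag presentational markup before it reaches the repository. The allowed-tag policy below is a hypothetical example; the parser is Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical policy: the only inline tags a rich-text field may contain.
ALLOWED_TAGS = {"em", "strong", "a"}

class MarkupAudit(HTMLParser):
    """Collect tags and inline styles that fall outside the policy."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lowercase.
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"disallowed tag <{tag}>")
        for name, _value in attrs:
            if name == "style":
                self.problems.append(f"inline style on <{tag}>")

def audit_field(html_fragment):
    parser = MarkupAudit()
    parser.feed(html_fragment)
    parser.close()
    return parser.problems

# Formatting used to fake a "byline" element gets flagged.
print(audit_field('<font size="5"><b>Byline:</b></font> J. Smith'))
# ['disallowed tag <font>', 'disallowed tag <b>']
print(audit_field("Sales <em>rose</em> sharply."))
# []
```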
