The Ins and Outs of Content

Mar 07, 2006

How do we create, manage, and publish content? Let me count the ways…

I must admit that I cannot begin to count the myriad ways that content is input--created, captured, converted, and automatically ingested into today's content management systems. With multi-channel, multi-format, and multi-lingual publishing, the full matrix of paths through the system to its outputs is daunting, to say the least.

However, there is one thing that all content inputs and outputs appear to have in common today: XML.

XML began life as a markup language (hence the ML). But the X (eXtensible) took it to places that its progenitor, SGML, was never likely to go. XML today has become the dominant tool for data/information exchange, dwarfing the traffic between systems still carried in earlier business-to-business, computer-to-computer protocols like CORBA and DCOM.

Marked-up Differences
XML used for markup shares one characteristic with XML used for data: custom tags (elements with attributes) that identify different content types and allow them to be processed by the middleman in the picture, the content management system (CMS). But XML for markup differs dramatically from many markup languages of the past, which were primarily concerned with formatting and styling the content. To this day, the most common markup remains HTML (HyperText Markup Language), which you may recognize as codes like the <p> and <br> tags that mark paragraphs and line breaks, the <b> and <i> tags that bold and italicize text, and the <table> and <li> tags that arrange content into tables and lists.
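To make the idea of custom tags concrete, here is a minimal sketch in Python of how a system might read content whose tags say what each piece is rather than how it looks. The element names (article, headline, byline, summary) and the function collect_by_type are invented for illustration; they are not taken from any particular CMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical article markup: the tag names describe what the content is,
# not how it should look.
SOURCE = """
<article lang="en">
  <headline>The Ins and Outs of Content</headline>
  <byline>Staff Writer</byline>
  <summary>How content gets into and out of a CMS.</summary>
</article>
"""

def collect_by_type(xml_text):
    """Return the root element's attributes and its children's text grouped by tag."""
    root = ET.fromstring(xml_text)
    grouped = {}
    for child in root:
        grouped.setdefault(child.tag, []).append((child.text or "").strip())
    return root.attrib, grouped

attributes, content = collect_by_type(SOURCE)
print(attributes)
print(content)
```

Because each piece of content is labeled by type, the system can route a headline, a byline, and a summary to different places without guessing from the formatting.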

XML and XHTML, the XML-compatible form of HTML, have banished all styles and formats to CSS (Cascading Style Sheets) and XSL (eXtensible Stylesheet Language). This implements the central CMS mantra of separating content from formatting and layout. And CSS and XSL really excel at layout. For even the simplest (X)HTML/CSS separations, style sheets altered in one place can relocate and restyle all the content elements of an entire Web site.

XHTML and CSS take a good deal of designer expertise for style sheets and a lot of content contributor control in the creation phase. Copy writers must follow a rigorous style guide to get consistent results. It is possible to use familiar HTML tags and have the style sheets reformat them in a uniform way. However, the results are even better when writers, or the formatting editors who clean up the copy for publication, assign unique identifiers and class attributes that control the style and position of each element in the layout.
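The role of identifiers and class attributes can be sketched in a few lines of Python. This is only an illustration under assumed names: the helpers tag_paragraph and build_page, the id "lead", the class "summary", and the style sheet file names are all hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def tag_paragraph(text, element_id, css_class):
    """Wrap copy in a <p> whose id and class are the hooks a style sheet uses."""
    return f"<p id={quoteattr(element_id)} class={quoteattr(css_class)}>{escape(text)}</p>"

def build_page(body, stylesheet_href):
    """Attach the same body markup to whichever style sheet is named."""
    link = f'<link rel="stylesheet" type="text/css" href={quoteattr(stylesheet_href)} />'
    return f"<html><head>{link}</head><body>{body}</body></html>"

body = tag_paragraph("Sales rose 4% & costs fell.", "lead", "summary")
screen_page = build_page(body, "screen.css")  # hypothetical style sheet names
print_page = build_page(body, "print.css")
print(body)
```

The body markup is identical in both pages. Only the linked style sheet changes, so a rule written against the "summary" class or the "lead" id can restyle or reposition the paragraph without anyone touching the copy.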

High Definition of XML
As a former digital video editor and columnist, I like to compare the separation of content and format in HTML and CSS to the separation of luminance (the black and white signal) and chrominance (the color signal) in S-Video (separated video). All the better camcorders and color monitors of the last ten or fifteen years have had greatly improved picture quality as a result of the S-Video separation into two signals.

Now the latest rage in color television is component video, which uses three cables (RGB--red, green, blue) to carry three signals from our DVD players to our new DLP, Plasma, and LCD television sets. Like component video, our best content markup now includes three distinct parts: an XML document; a schema (XSD--XML Schema Definition, or DTD--Document Type Definition) that constrains the content creation inputs; and an XSL stylesheet (both XSLT--XSL Transformations, and XSL-FO--XSL Formatting Objects), which produces the many possible publishing outputs.
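The output side of this three-part arrangement can be sketched with Python's standard library. Note the assumption: the standard library has no XSLT processor, so the two plain functions below merely stand in for what an XSL stylesheet would do, and the element names and the HTML_TAGS mapping are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; a real one would be authored against a schema.
DOC = (
    "<article>"
    "<headline>Content In, Content Out</headline>"
    "<para>XML sits in the middle.</para>"
    "<para>Outputs vary.</para>"
    "</article>"
)

# Assumed mapping from content elements to HTML elements.
HTML_TAGS = {"headline": "h1", "para": "p"}

def to_html(root):
    """Produce an HTML body from the content elements (stand-in for XSLT)."""
    body = ET.Element("body")
    for child in root:
        out = ET.SubElement(body, HTML_TAGS[child.tag])
        out.text = child.text
    return ET.tostring(body, encoding="unicode")

def to_plain_text(root):
    """Produce a plain-text rendering, with the headline in capitals."""
    lines = []
    for child in root:
        text = child.text or ""
        lines.append(text.upper() if child.tag == "headline" else text)
    return "\n".join(lines)

root = ET.fromstring(DOC)
html = to_html(root)
text = to_plain_text(root)
print(html)
print(text)
```

One source document yields two outputs, and adding a third channel would mean adding another transformation rather than rewriting the content.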

The practical result is that where HTML is thought by many to be much harder than creating content with a word processor, XML content creation is at least three times as hard as HTML. The qualitative result is that XML content can achieve a very high level of performance; to pursue the television analogy with HDTV, we could call it High Definition Content Management. The quantitative result is that XML can process complex workflows that move content through all those many inputs to the multi-channel outputs with great speed and efficiency.

For a sophisticated developer, any text editor can be used to create XML documents, XSD schemas, and XSL stylesheets. Among the most popular are HomeSite and UltraEdit (Windows), BBEdit (Mac), Emacs (Unix), and jEdit (cross-platform Java, the author's favorite tool). But you are on your own and you must know what you are doing technically. More powerful XML editing tools constantly validate your content against the allowed elements and attributes of your schema on input, and preview the results of XSL transformation in your multiple output formats (HTML, PDF, etc.). These too require some sophistication.
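What "validating against the allowed elements and attributes" amounts to can be sketched as follows. This is a simplified stand-in, not real XSD or DTD validation: the two dictionaries play the part of a schema, and their contents, like the function name validate, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a schema.
ALLOWED_CHILDREN = {"article": {"headline", "para"}, "headline": set(), "para": set()}
ALLOWED_ATTRS = {"article": {"lang"}, "headline": set(), "para": {"class"}}

def validate(xml_text):
    """Return a list of problems; an empty list means the content passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag not in ALLOWED_CHILDREN:
        return [f"unknown root element <{root.tag}>"]
    problems = []
    stack = [root]
    while stack:
        element = stack.pop()
        for name in element.attrib:
            if name not in ALLOWED_ATTRS[element.tag]:
                problems.append(f'attribute "{name}" is not allowed on <{element.tag}>')
        for child in element:
            if child.tag not in ALLOWED_CHILDREN[element.tag]:
                problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
            else:
                stack.append(child)  # only descend into elements the model knows
    return problems

print(validate('<article lang="en"><headline>Hi</headline><para>Text</para></article>'))
print(validate("<article><sidebar>Aside</sidebar></article>"))
```

An editing tool that runs a check like this on every keystroke can flag a stray element or attribute the moment it is typed, which is the behavior the validating editors provide with a real schema.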

What's to become of everyday copy writers and tech editors who are too busy wordsmithing to learn all these dense languages and complex processes? It is testimony to the genius of today's software developers that they are building content management systems, with XML-based guided writing and structured editing front ends, which can rightfully claim to be as easy to use as a word processor. Perhaps even easier, if you don't mind the presence of a strict robot editor looking over your shoulder as you type.