Get It Together: Integrating Data with XML

Page 2 of 3


Extracting Data
Companies face a variety of problems when they try to integrate data from different databases. InterSystems' Paul Grabscheid says that his company has been working with a customer that is trying to integrate a mix of databases. One of the problems is the inconsistent way each department collects and stores data. He says, "With so many ways of identifying the same type of data, it's hard to get systems to talk to each other." Grabscheid believes that companies have spent huge amounts of money buying or building applications. Now they're looking for ways to get those systems to communicate with one another.

One of the chief advantages of XML is the way it integrates data across a variety of databases. The trouble is that conventional relational databases may not be flexible or fast enough for retrieving XML data. CareGroup uses InterSystems' Caché product, and CareGroup's John Halamka says, "When you think of XML, it is like a hierarchal tree with branches and leaves. You can't jam XML into the columns and rows of a relational database. If you want to store XML, you need a tree-like database and Caché allows you to organize XML data the way XML is organized." Further, he says Caché is "blazingly fast" and inexpensive to operate: he runs it for 12,000 employees on two $10,000 UNIX boxes, and he says that using Oracle instead would require millions of dollars in hardware.

David Leeper, product manager for NeoCore XMS, a product that builds self-constructing native XML databases, says, "Before XML, you had the idea of building virtual systems where you would bring everything into a relational structure, but you needed to define everything up front." Having to define data before you access it, says Leeper, can limit your ability to retrieve data, especially when your needs change. By assigning a physical address to each piece of data, NeoCore says it can retrieve data dramatically faster than a standard relational database can, and it can change the database structure on the fly, something Leeper says would be impossible in a conventional relational database system.

Leeper says, "You just pour XML information in and you have something you can use immediately. This is an important differentiation from relational or other forms of technology. Because we index everything," he says, "you don't have a situation where you're compromising. You can look across all information as one repository and get a complete view of information."
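To make "pour XML in and index everything" concrete, here is a minimal Python sketch. It is not NeoCore's actual mechanism, which the article doesn't describe; it simply records every element's path and text value in an inverted index as each document is loaded, so a document that introduces a brand-new tag becomes searchable without any schema change. The function names `index_document` and `lookup` and the sample order documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_document(index, doc_id, xml_text):
    """Record every element's path and text for one document.

    No schema is declared up front: whatever tags appear in the XML
    become searchable paths as soon as the document is loaded.
    """
    root = ET.fromstring(xml_text)

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            index[(path, text)].add(doc_id)
        for child in elem:
            walk(child, path)

    walk(root, "")

def lookup(index, path, value):
    """Return the ids of documents where `path` holds `value`."""
    # .get() does not trigger defaultdict's auto-insert for missing keys.
    return sorted(index.get((path, value), set()))

index = defaultdict(set)
index_document(index, "a", "<order><customer>Acme</customer><sku>42</sku></order>")
index_document(index, "b", "<order><customer>Acme</customer><note>rush</note></order>")

print(lookup(index, "/order/customer", "Acme"))
print(lookup(index, "/order/note", "rush"))
```

Document `b` adds a `<note>` element that document `a` lacks, yet both load into the same index with no up-front definition. A real system would also need persistence, attribute handling, and range queries, which this sketch leaves out.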

NeoCore is working with one company that is responsible for collecting emissions data from 36 different states. Leeper says that they can't control data formats from each state and each one calls data elements by different names. "We helped create a process to bring data into a data store and aggregate heterogeneous information into one view where they are informed if a company within a state is having an emissions issue."
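The normalization step in a project like this can be sketched as mapping each source's element names onto one canonical vocabulary before aggregating. This is a minimal Python illustration under assumed inputs: the `FIELD_MAP` table, the sample state reports, and the numeric limit used to flag an "emissions issue" are all hypothetical, not details of NeoCore's process.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies: each state names the same facts differently.
FIELD_MAP = {
    "facility": "company",
    "plant_owner": "company",
    "co2_tons": "co2",
    "carbon_dioxide": "co2",
}

def normalize(xml_text, state):
    """Translate one state's report into canonical field names."""
    record = {"state": state}
    for child in ET.fromstring(xml_text):
        canonical = FIELD_MAP.get(child.tag)
        if canonical is not None:
            record[canonical] = (child.text or "").strip()
    return record

def flag_issues(records, limit):
    """Return records whose CO2 figure exceeds a hypothetical limit."""
    return [r for r in records if float(r.get("co2", 0)) > limit]

# Two states, two different root tags and field names, one combined view.
reports = [
    ("OH", "<report><facility>Mill A</facility><co2_tons>1200</co2_tons></report>"),
    ("PA", "<filing><plant_owner>Works B</plant_owner><carbon_dioxide>300</carbon_dioxide></filing>"),
]
combined = [normalize(xml, state) for state, xml in reports]
for r in flag_issues(combined, limit=1000):
    print(r["state"], r["company"], r["co2"])
```

Keeping the mapping in a table rather than in code means adding a new state's vocabulary is a data change, not a program change.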

Tag, You're It
This type of project shows how XML's flexibility and a database designed to process XML data can work in tandem to integrate data held in various databases. But what about companies working with unstructured data, that is, data not stored in a database, such as emails or documents? How can they take advantage of XML? They need to mark up this unstructured content with XML meta tags.

Jan Puzicha, chief technology officer at Recommind, a software company that helps corporations manage large amounts of unstructured data, explains that his company offers a set of tools that categorize information by crawling documents and tagging relevant items with meta tags. These tags are actually XML data identifying attributes such as version, topic, location, and name. Representing the information as XML lets Recommind manage it intelligently, and it lets companies feed the tagged data into any other system that can read XML.

Puzicha says, "We see our product as a component for a full solution [whenever] we deploy into other systems such as content management systems and digital rights systems. We put in XML and get returned XML enriched by additional meta data. It's a natural way to talk to other systems."
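A rough Python sketch of the "XML in, enriched XML out" pattern Puzicha describes follows. Simple keyword matching stands in here for Recommind's categorization, which the article doesn't detail; the `TOPIC_KEYWORDS` table, the `<metadata>` element, and the `enrich` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical keyword-to-topic rules; a real categorizer would be far richer.
TOPIC_KEYWORDS = {
    "contract": "legal",
    "invoice": "finance",
    "patient": "healthcare",
}

def enrich(xml_text):
    """Return the input XML with a <metadata> block appended to the root."""
    root = ET.fromstring(xml_text)
    body = (root.findtext("body") or "").lower()
    meta = ET.SubElement(root, "metadata")
    for keyword, topic in TOPIC_KEYWORDS.items():
        if keyword in body:
            ET.SubElement(meta, "topic").text = topic
    ET.SubElement(meta, "wordcount").text = str(len(body.split()))
    return ET.tostring(root, encoding="unicode")

doc = ("<document><title>Q3 memo</title>"
       "<body>Please review the attached invoice and contract.</body></document>")
print(enrich(doc))
```

Because the original elements pass through untouched and the additions live in their own element, a downstream system that ignores `<metadata>` still reads the document as before.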

XML at Work
In an industry where there's always talk about the next big thing, it is useful to find companies putting the technology to work in a real deployment. Halamka has instituted an XML solution at CareGroup Healthcare System and was able to reduce his budget dramatically. Halamka explains, "We've been involved with electronic medical records with a high degree of automation since the mid-80s, but we were a client-server with hundreds of different applications. With XML, we have wrapped all of our legacy apps with XML middleware." As a result, users can now reach all the client applications from a single tabbed interface. Clicking a tab takes the user to a different application, but because all of the applications are linked together in a single XML-enabled interface, the system feels more like one program.
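Wrapping a legacy application with XML middleware can be illustrated with a small adapter: the old program's native output is converted to XML, so a single front end (such as a tabbed interface) can consume every application the same way. In this Python sketch the pipe-delimited lab record, the field names, and the `TABS` registry are hypothetical stand-ins; the article does not describe how CareGroup's middleware actually works.

```python
import xml.etree.ElementTree as ET

def legacy_lab_lookup(patient_id):
    """Hypothetical legacy app: returns a pipe-delimited record, not XML."""
    return f"{patient_id}|GLUCOSE|98|mg/dL"

def legacy_to_xml(raw, fields, root_tag, sep="|"):
    """Wrap one delimited legacy record in an XML element tree."""
    root = ET.Element(root_tag)
    for name, value in zip(fields, raw.split(sep)):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

def lab_service(patient_id):
    """XML-wrapped view of the legacy lab application."""
    raw = legacy_lab_lookup(patient_id)
    return legacy_to_xml(raw, ["patient", "test", "value", "unit"], "labResult")

# Hypothetical tab registry: the front end calls every wrapped app the same way.
TABS = {"Labs": lab_service}

print(TABS["Labs"]("12345"))
```

The legacy program itself is unchanged; only the thin wrapper knows its native format, which is what lets many such applications sit behind one interface.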

Halamka found that by moving the software off the desktop, running all of the applications on the server side, and placing them in an XML wrapper, he was able to reduce the cost of IT services by 40 percent, because he no longer had to install hundreds of client software licenses.
