Take a look at the average person's computer or workstation: It provides a perfect illustration of how quickly information can get lost, misfiled, or simply neglected. At the end of the day, information is king, and the abiding concern for businesses and individuals alike is not just learning how to access internal information, but also tapping outside resources that can be incorporated into their own workflows, as well as retiring data once it is no longer relevant.
Stepping back from the small picture of an individual user's PC to survey the potential vastness of an enterprise's information landscape, one quickly sees that when multiple users try to create, share, analyze, and store data, the problems escalate. Hence the evolution of content analysis tools, designed to meet the challenges of handling information found not only on public Web sites, but also on company intranets, extranets, and portals. While more mainstream Web site analysis tools have focused on tracking visitors as they navigate through a Web site to help determine the all-important "stickiness factor," content analysis tools start by analyzing and then aggregating the information itself. The value proposition of properly managing and distributing content applies to a wide range of scenarios—whether it's helping a researcher in his quest for knowledge, boosting a company's revenues by disseminating marketing content with the goal of attracting and retaining customers, or helping a company manage its own workflow across locations around the world.
The Business of Content
Headquartered in Bethlehem, PA, Active Data Exchange specializes in digital content publishing and distribution across internal and external Web environments. For instance, its Active Syndicator software enables companies to aggregate content from multiple sources and then package it according to their exact specifications. Content can be aggregated through XML import or the company's Active Data Publisher product.
"We find that the obstacle that folks populating their sites face with content is the creation, development, and aggregation of that content," says CEO Susan Yee. "So, what our tools do is help enterprises who want to communicate and broadcast their content out across multiple channels or multiple destinations. We actually aggregate and then re-package their content at a granular level and put them through a content catalog so that intranet, extranet, and portal managers can review this content catalog to see what information they want to use."
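Active Data Exchange's actual implementation isn't described in detail here, but the general idea—importing XML content from multiple sources and repackaging it at a granular level into a browsable catalog—can be sketched in a few lines. The feed format, source names, and topic attributes below are purely hypothetical, invented for illustration:

```python
import xml.etree.ElementTree as ET

# Two hypothetical source feeds; a real syndicator would fetch these over HTTP.
FEED_A = """<feed source="newswire">
  <item topic="markets"><title>Stocks rally</title></item>
  <item topic="tech"><title>New portal tools ship</title></item>
</feed>"""

FEED_B = """<feed source="research">
  <item topic="tech"><title>Text mining survey</title></item>
</feed>"""

def aggregate(feeds):
    """Merge items from several XML feeds into one catalog keyed by topic."""
    catalog = {}
    for xml_text in feeds:
        root = ET.fromstring(xml_text)
        source = root.get("source")
        for item in root.findall("item"):
            entry = {"title": item.findtext("title"), "source": source}
            catalog.setdefault(item.get("topic"), []).append(entry)
    return catalog

catalog = aggregate([FEED_A, FEED_B])
for topic, entries in sorted(catalog.items()):
    print(topic, [e["title"] for e in entries])
```

An intranet or portal manager reviewing such a catalog would see content grouped by topic, with each item traced back to its originating source—the "review this content catalog to see what information they want to use" step Yee describes.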
She is quick to point out that technology has its limitations. "Content is really the key, because you can have all this great technology but if the content isn't what your visitors want, then who really cares?" Organization is vital to unlocking that content and making it accessible. "For instance, there's content that's trapped inside departments, workstations, or enterprises that people can't get to because the technology hasn't been implemented to make it available."
Yee believes there's a great opportunity for content licensing and online content syndication, something that entities like Dow Jones, Reuters, and AP have been doing for years. "They sell content, so this online syndication is nothing new," she says. "However, the real challenge is not just the content, but packaging the content. That's what we see as a great niche to be in. We're taking what some of the larger syndication companies have created on their own and are providing it as a tool for people who don't have that as their core business. I think the licensing market is a big one, but bigger than that is the intranet, extranet, and portal area; the licensing market is a sub-set of everyone who has content."
Content analysis tools can literally help save lives. SPSS, maker of the LexiQuest Text Mining Suite, provides software solutions for the discovery and management of information in unstructured text. These tools are based on linguistics and can understand users' queries and return accurate answers to them. SPSS also provides extensive data mining capabilities with its Clementine product. Dr. Michael Liebman, the director of Computational Biology and Biomedical Information at the University of Pennsylvania's Abramson Cancer Research Center, is using SPSS's LexiMine for both data mining and text mining for cancer research, particularly in stratifying the disease into sub-types in an effort to come up with better diagnoses and discover more effective treatments for individuals.
Traditional medical literature tends to focus on disease as a stage or state, as opposed to something that evolves in a patient over time. Liebman is now researching contributing factors such as environment, lifestyle, obesity, and smoking in a quest for new answers. "In our research, we are trying to use information about a patient's history to better understand what type of disease an individual has so it can be treated more effectively," Liebman says. "So, since this is a different way of looking at the disease, we don't have a lot of information that addresses it in that context." By using text mining, Liebman can go back through original literature and pull out information from clinical studies. "We don't automatically know what the concepts are that will be relevant, and so we're using LexiMine to help us identify concepts related to the things we're looking at and then we are able to go back and try to understand how these concepts are related so we can incorporate them into our computational models."
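LexiMine's linguistic analysis is far more sophisticated than this, but the core notion Liebman describes—finding which concepts tend to appear together across a body of literature—can be illustrated with a simple co-occurrence count. The abstracts and concept list below are invented toy data, not drawn from any real clinical corpus:

```python
from collections import Counter
from itertools import combinations

# Toy stand-ins for clinical-study abstracts (hypothetical text).
ABSTRACTS = [
    "obesity and smoking are risk factors in breast cancer progression",
    "smoking correlates with tumor subtype and lifestyle factors",
    "obesity influences treatment response in several tumor subtypes",
]

# A hand-picked concept vocabulary; a real text miner would extract this itself.
CONCEPTS = {"obesity", "smoking", "lifestyle", "tumor", "treatment"}

def cooccurrence(docs, concepts):
    """Count how often pairs of concept terms appear in the same document."""
    pairs = Counter()
    for doc in docs:
        found = sorted(concepts & set(doc.split()))
        pairs.update(combinations(found, 2))
    return pairs

for (a, b), n in cooccurrence(ABSTRACTS, CONCEPTS).most_common(3):
    print(a, b, n)
```

Frequently co-occurring pairs suggest relationships worth investigating—the kind of candidate connections that, in Liebman's workflow, would then be examined and folded into a computational model of the disease.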
Such content analysis tools are truly changing the face of research. Liebman uses LexiMine to generate the knowledge he needs, and the Clementine software to refine the information that he ferrets out. "I'm not a clinician, so I look at these problems from a modeling perspective and from the outside world, but obviously that's not how it's been approached in the past," Liebman says. "And part of the reason is, a patient shows up on your doorstep and you have to treat them. You don't have time to think about the theory of the disease."