Content: Managed But Not Found

The Online Information conference in London has taken place every year since 1976 and has become the definitive conference for anyone in the information retrieval business or in providing information research services. I should declare a bias: I have been chairman of the conference for the last three years, and I have missed only three conferences in 26 years. Over the years, most of the major advances in information retrieval have been announced at the event, which this year attracted over 260 exhibitors.

This year search was very much center stage, with the opening keynote address given by Jeff Dean, a distinguished engineer from Google. There was a very lively session on Internet search techniques, and a substantial number of papers on a wide range of information retrieval topics. Yet on the exhibition floor, most of the big names in the information retrieval business were conspicuous by their absence. I am referring to companies like Verity, Fast Search, and Autonomy. There were also virtually no companies from the taxonomy sector, such as Clear Forest and Inxight. About the only interesting search vendor present was Cogenta, a U.K. company that has just launched Research Director. This search utility can create personal Web agents to search internal resources, the Web, and proprietary databases.

At present, "content management" (CM) is the hot topic, with companies beginning to see the advantages of creating content repositories. (Deloitte Consulting recently published an excellent 25-page CM briefing.) However, CM currently focuses on creating the repository and on separating content from design to make documents easier to produce. Effective back-end administration of authors, reviewers, approvers, and documents has also been a strong attraction. All this is very valuable. What concerns me is the lack of attention to how all these documents will be retrieved.

Google is part of the problem. Too many senior managers are intoxicated by the ability to identify 325,000 hits in 0.09 seconds, without realizing how Google does the searching and what Google cannot do, at least at present. What managers don't seem to appreciate are the benefits of being able to sort results by date, author, subject relevance, and many other parameters. They see search only in terms of "more is better." This misconception is one that none of the search engine vendors seem willing or able to challenge. Pretend you are an IS manager and look at the Web sites of Verity and Convera. Like so many in the information industry, they seem to be preaching only to the converted.

In my experience, companies are not paying attention to how their employees search for information. On Web sites, the emphasis is on providing taxonomic (list) routes to information, supported by hyperlinks between documents and between sections of the site. Search is usually limited to keywords, and on most sites "Advanced Search" does nothing more than reduce the number of hits; it neither improves relevance nor sorts the list of hits. In the intranet environment, the balance is different. Access through lists remains important, but providing effective hyperlinks is difficult given the range of information, especially when content authoring is decentralized. Moreover, intranet users will make a much wider range of search requests than Web site visitors do, and because the other access routes are relatively weak, they will need to be able to trust the search results implicitly.

Although there are probably over 150 CM vendors, there are fewer than 15 search software vendors and about the same number of taxonomy development vendors. Finding vendors is not the problem. The issue is deciding how to implement the solution, and that has to start with a very good understanding of, for example, how staff frame questions, how much detail they want in the displayed results, and how they want the list ordered. Just as important is working through the requirements for metadata development and for document security.

Dick Senmark at Volvo puts it well in an article in the December 2002 issue of CIO Magazine: "The search engine industry and the research community alike often fail to acknowledge that intranets are not just downscaled versions of the Internet, but are instead a whole different environment in terms of both content and culture. Intranet owners must start to ask for search products specifically designed for intranet use and, more important, that fit the work environment at their company."

If an organization does not give search due attention, its investment in a content management system (CMS) is unlikely to be recovered through effective re-use of the information in its documents. Instead, we will have the digital equivalent of the chained libraries of medieval monasteries.