In early February, I had a most enjoyable lunch with four people whom I have known for most of my professional life. At a rough guess, our combined service in the information profession was the better part of 150 years. We met as the selection committee for the Tony Kent Strix Award.
Tony Kent, who died in 1997, made a major contribution to the development of information science and information services both in the U.K. and internationally, particularly in the field of chemistry. The award (www.ukeig.org.uk/awards/tonykentstrix.html) is given in recognition of an outstanding contribution to the field of information retrieval. Our group reflected on the extent to which the foundations of information retrieval were laid more than 40 years ago. For example, there is currently much interest in faceted navigation from companies such as Endeca and Siperian, the basis for which is the work on library classification schemes by the Indian mathematician S. R. Ranganathan in the 1930s.
At the heart of assessing search engine performance is the concept of relevance, a word that dates from 1733. Much of the early work on relevance in an information retrieval context was carried out in the 1950s, and the topic has been the subject of research ever since. However, if you listen to the assertions of certain search vendors, you wouldn’t think this was the case. One recently told me, when I queried the lack of any indication of relevance on the results from their search engine, that I have an old-fashioned view of search.
It is generally recognized that users are unwilling to go beyond 30 results (usually three pages) unless they see a good reason for doing so. The value of a relevance ranking, be it a percentage or a "star" graphic, is that it provides an indication of the point at which the long tail of largely irrelevant search results starts. If after 30 hits the percentage relevance is still around 90%, a user realizes that there are too many highly relevant results to work through and that it’s time to change the search strategy, either by using different keywords in the Basic search box or by using an Advanced Search option. On the other hand, if the percentage is already dropping to 70% by the end of the first page of results, the user can feel reassured that clicking on further pages is not going to be of value.
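As a rough illustration of how a displayed relevance score lets a user spot where the long tail begins, here is a minimal Python sketch. The function names `tail_start` and `advise`, the 70% threshold, and the sample scores are all my own assumptions for illustration; they are not taken from any particular search product.

```python
from typing import Optional, Sequence


def tail_start(scores: Sequence[float], threshold: float = 70.0) -> Optional[int]:
    """Return the 1-based rank of the first result scoring below `threshold`.

    Returns None if every score is at or above the threshold.
    """
    for rank, score in enumerate(scores, start=1):
        if score < threshold:
            return rank
    return None


def advise(scores: Sequence[float], limit: int = 30, threshold: float = 70.0) -> str:
    """Suggest a next step from the top `limit` relevance scores (hypothetical rule)."""
    rank = tail_start(scores[:limit], threshold)
    if rank is None:
        # Nothing in the top results has dropped off: the query is too broad.
        return "still highly relevant: refine the query"
    return f"long tail starts at result {rank}"


# Hypothetical percentage scores for the first page of ten results.
page_one = [98, 95, 91, 88, 85, 80, 76, 72, 69, 64]
print(advise(page_one))
```

With scores that stay near 90% for all 30 hits, `advise` suggests refining the query; with scores that fall away on the first page, it reports the rank at which a user can reasonably stop.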
The usual reaction to my concern about any lack of relevance indication is that Google doesn’t do it. One of the reasons for this (and there are others) is that with such large result sets, a relevance ranking would not be very useful. Not so with enterprise searches. Incidentally, I am also tired of people telling me how quickly a search with Google is completed. There is confusion about the difference between the speed with which the results are returned and the time it takes to work through them to find the best information. When you do a Google search on a broad topic, time how long you spend working through the result set to find useful results. You may be unpleasantly surprised at how slow your search for information really is.
Another important facet of relevance is recall. In a web search, recall (the percentage of all the relevant documents in the collection that the search actually returns) is not of great value. However, in an enterprise setting it can be very important to be as certain as possible that you have found all relevant documents. In a court of law, or even in a meeting with your manager, worrying that the search engine may have missed something is not a good feeling. Achieving high recall requires a lot of dedicated work by the search team. And I mean team. Too often search is relegated to someone in IT on a part-time basis. In most sizeable organizations, there needs to be a search manager, someone doing serious and sensible analysis on search logs, another person who understands the formal and informal taxonomies of the organization, and someone on the help desk. That’s a total of four people, and it can’t be done well with fewer. The challenge is especially high with search engines using semantic/statistical search, where tuning can become a nightmare.
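To make the definition of recall concrete, here is a small Python sketch that computes it alongside precision, its usual companion measure (the share of returned documents that are relevant). The document identifiers are invented for illustration, and in practice the hard part is knowing the full set of relevant documents in the first place.

```python
from typing import Iterable, Set


def recall(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of all relevant documents that the search returned."""
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved: Iterable[str], relevant: Set[str]) -> float:
    """Fraction of returned documents that are relevant."""
    retrieved_set = set(retrieved)
    if not retrieved_set:
        raise ValueError("precision is undefined when nothing was retrieved")
    return len(retrieved_set & relevant) / len(retrieved_set)


# Hypothetical example: four documents are relevant, the search returns three.
relevant_docs = {"contract-01", "contract-02", "contract-03", "contract-04"}
returned_docs = ["contract-01", "memo-17", "contract-03"]

print(f"recall:    {recall(returned_docs, relevant_docs):.2f}")
print(f"precision: {precision(returned_docs, relevant_docs):.2f}")
```

In this invented case the search finds only two of the four relevant documents, so recall is 50% even though most of what it returned was useful. That gap is the kind of thing that matters in a legal or compliance setting.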
It is easy to be dazzled by search technology and by vendors who create the impression they are inventing new approaches to search usability. Instead, go to www.dcs.gla.ac.uk/Keith/Preface.html and read the standard text on information retrieval by Keith van Rijsbergen (published in 1979!) and gain a real insight into the basic principles of effective search. You might also send the link to your search vendor so that it can get a taste of good old-fashioned relevance.