Enterprise Search in Europe - A Brief History




After many successful Enterprise Search Summit events in the U.S., Information Today, Inc. (EContent's parent company) is launching Enterprise Search Summit Europe in London, and I am delighted to have been asked to be the conference chair. This is a noteworthy event because in a world where "search" and "Google" are synonymous, it is important to realize the scale of information retrieval research and of the enterprise search business in Europe.

The probable epicenter of this is the University of Cambridge. It was there, in 1972, that Karen Spärck Jones published a paper on the statistical interpretation of term specificity and its applications in retrieval. The significance of this paper was that it proposed the concept of inverse document frequency, or IDF. This concept and subsequent work by Spärck Jones and Stephen Robertson on a relevance weighting model called BM25 lie at the heart of most enterprise search applications to this day.

Move forward a couple of decades and we find Martin Porter working on the development of the Muscat search technology. Porter started to write the Muscat software in the 1980s in BCPL (Basic Combined Programming Language), which was invented in Cambridge as a precursor to C, although it is obscure today. One of Porter's most important contributions to information retrieval was his stemming algorithm, initially developed in 1979 by Porter, Keith van Rijsbergen, and Robertson. As with IDF and BM25, this algorithm continues to be the basis for stemming words to enhance retrieval relevance. It was around this time that Michael Lynch began work on the Bayesian statistical approach to determining the meaning of documents, which, in 1996, was the basis for founding Autonomy. Before leaving Cambridge, however, it is important to mention the Xapian open source search application, which is based, to a substantial extent, on development work undertaken by Muscat before it was sold to Dialog Corp.

Meanwhile, at the Norwegian University of Science and Technology, a research team in the Department of Computer and Information Science led by professor Arne Halaas was developing the technology that in 1997 was launched as FAST Search & Transfer. The FAST Search technology was acquired by Microsoft in early 2008 and has now been incorporated in the FAST Search Server for SharePoint 2010, as well as the stand-alone enterprise search application.

A number of other major search vendors are European in origin. Smartlogic, the developer of Semaphore, is based in the U.K. In France, François Bourdoncle and Patrice Bertin founded Exalead in 2000; both had previously been involved in the development of the Alta Vista search engine. Sinequa was set up around the same time, and both companies then went through a long period of software development. Neither really started to make commercial headway until about 2005. Another European enterprise search vendor is the Austrian company Fabasoft, with its Mindbreeze application.

For any company operating in Europe, the ability to search for content in multiple languages is important. The idea that international companies can get away with making English the corporate language is just not workable, because in most European countries the definitive language for personnel information, contract documents, legal documents, and compliance filing is the national language of the country in which the company is operating. For more than 10 years, the Cross-Language Evaluation Forum (CLEF) of the EU has promoted R&D in multilingual information access. It has developed an infrastructure for the testing, tuning, and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts, and it has created test suites of reusable data that can be employed by system developers for benchmarking purposes.

Enterprise Search Europe will be taking the best ideas from the Enterprise Search Summits. If you are able to come to London in the fall, you will also find that the conference aims to be a place where the enterprise search and information retrieval communities can share ideas and requirements.

Multilingual search will no doubt feature strongly, and it is also likely that there will be more papers about the use of open source search. Within a few hours of the conference being announced in early April, paper submissions started to arrive in my inbox. I suspect that the greatest challenge the organizers will face will be finding a place for all the best submissions in a 2-day conference.