Searching for Structure



No wonder Google pulled off its public stock offering: More than 70 million Americans use search engines weekly, according to the Pew Internet & American Life Project, and Google is the most popular search engine by far. Search engines aren't just popular; they're indispensable. Almost one-third of respondents said they couldn't live without Internet search engines. But as the volume of information on the Web continues to expand, even the casual once-a-week searcher can become frustrated with the keyword search results from general search engines like Google. The pros, schooled in Boolean search skills, might fare a bit better, but what about the business person who's not a super searcher? Many are looking for a better way. This isn't to say Google, Ask Jeeves, and MSN will disappear; I expect they'll continue to grow. Rather, specialized business purposes require new tools, and business people are increasingly open to experimentation as new search tools emerge.

Into the mix comes a growing set of firms focused on specialized search. The VCs see a trend: Search engine companies are attracting millions of dollars in funding. Players putting their stamp on specialized searching for business include the Factiva/IBM WebFountain collaboration (relying on text mining to combine the precision of search with the serendipity of discovery), FAST (analyzing data using semantic orientations), and Intelliseek (analyzing blog discussions as they unfold over time).

I'd argue that aggregated Web directories for particular product niches are an often overlooked area. They double as terrific specialized search engines, and I think we'll see many more directories pop up in the coming years, particularly in various B2B niches. When companies like Amazon take a product category to the Web, they first create a useful taxonomy combined with structured data for each listing in the database. Book titles, ISBNs, authors, publication dates, and reviews all need to be amassed, collated, and added to structured databases for cross linking and searching. In the B2B world, ebuild is a popular directory where building professionals and architects can search a detailed taxonomy of more than 230,000 building products. If you want to find all the books by your favorite author or to match doorknobs for a home addition, Amazon and ebuild are much better places to look than general engines.
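
To make the idea of "a taxonomy combined with structured data" concrete, here is a minimal sketch in Python. It is not how Amazon or ebuild actually store their listings; the `Listing` record, the `search` function, and every product name are invented for illustration. The point is only that once each listing carries a taxonomy path and named fields, "all books by one author" or "all brass doorknobs" becomes an exact lookup rather than a keyword guess.

```python
from dataclasses import dataclass, field


@dataclass
class Listing:
    """One structured record in a hypothetical niche directory."""
    name: str
    category: tuple  # path through the taxonomy, broadest term first
    attributes: dict = field(default_factory=dict)


def search(listings, category_prefix=(), **required):
    """Return listings under a taxonomy branch whose attributes match exactly."""
    prefix = tuple(category_prefix)
    results = []
    for item in listings:
        in_branch = item.category[:len(prefix)] == prefix
        matches = all(item.attributes.get(k) == v for k, v in required.items())
        if in_branch and matches:
            results.append(item)
    return results


# Invented sample data, for illustration only.
CATALOG = [
    Listing("Sample Novel One", ("Books", "Fiction"), {"author": "A. Writer", "year": 2001}),
    Listing("Sample Novel Two", ("Books", "Fiction"), {"author": "A. Writer", "year": 2003}),
    Listing("Sample Guide", ("Books", "Reference"), {"author": "B. Editor", "year": 2002}),
    Listing("Brass Knob 100", ("Hardware", "Doors", "Knobs"), {"finish": "brass"}),
    Listing("Nickel Knob 200", ("Hardware", "Doors", "Knobs"), {"finish": "nickel"}),
]

by_author = search(CATALOG, ("Books",), author="A. Writer")
brass_knobs = search(CATALOG, ("Hardware", "Doors"), finish="brass")
print([item.name for item in by_author])
print([item.name for item in brass_knobs])
```

A real directory would sit on a database with indexes rather than a Python list, but the division of labor is the same: the taxonomy narrows the branch, and the structured fields do the matching.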

Although the technologies and approaches differ, next-generation search companies do several things well. They take advantage of free Internet content as raw material to keep content costs at zero. Because of their traffic volume, they also get suppliers (like Amazon's publisher partners and ebuild's suppliers) to contribute product specs. Each of the engines also adds detailed levels of structure to unstructured text, making it useful to a specialized field. For example, Eliyon surfaces information on business professionals from a database of over 20 million people in over a million companies, all mined from the public Internet. Interestingly, Eliyon now allows individuals to update their own biographical profiles in the database through a tool on its site. Another specialized B2B engine, DolphinSearch, focuses on litigation support for the legal community.

A helpful feature of specialized directories and engines is that they locate related information that can also be searched and browsed. Amazon's "Customers who shopped for this item also bought" feature, Eliyon's linking of people's names to the companies they've worked for in the past, and Shareholder.com's surfacing of industry analysts and major investors alongside company financial data are all examples of how specialized services aid in related searches.
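
One simple way to produce "also bought" style suggestions is to count how often two items turn up in the same order. The sketch below assumes exactly that and nothing more; it is a stand-in technique, not a description of Amazon's actual system, and the function names and basket data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_related(baskets):
    """Count, for every item, how often each other item shares a basket with it."""
    related = defaultdict(Counter)
    for basket in baskets:
        # set() ensures an item repeated within one basket is counted once.
        for a, b in permutations(set(basket), 2):
            related[a][b] += 1
    return related


def also_bought(related, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    counts = related.get(item, Counter())
    return [other for other, _ in counts.most_common(top_n)]


# Invented purchase baskets, for illustration only.
BASKETS = [
    ["book_a", "book_b"],
    ["book_a", "book_b", "book_c"],
    ["book_a", "book_c"],
    ["book_a", "book_b"],
]

related = build_related(BASKETS)
print(also_bought(related, "book_a", top_n=2))
```

The same counting works for any pair of linked records, such as people and the companies they have worked for, as long as the underlying data is structured enough to say which records belong together.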

Applying a mix of sophisticated text-mining techniques, natural language disambiguation algorithms, and artificial intelligence methodologies to the public Web and news databases creates remarkable sets of highly structured and useful documents in a unified format, ready to be searched. Increasingly, specialized engines also mine hidden parts of the Web. For example, Intelliseek adds a unique twist in its analysis of blogs. By combining the time stamps on individual posts from nearly two million blogs, it locates active discussions and analyzes them with time as a coordinate. Factiva's Insight for Reputation, powered by WebFountain, uses the time component as well, making emerging opportunities and threats visible.
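
Treating time as a coordinate can be as simple as bucketing posts by day and watching where mentions of a topic pile up. The following sketch assumes posts arrive as (timestamp, text) pairs and uses plain substring matching; Intelliseek's and Factiva's real pipelines are certainly more sophisticated, and the function names and sample posts here are made up.

```python
from collections import Counter
from datetime import datetime


def mentions_per_day(posts, term):
    """Count posts mentioning `term` (case-insensitive) for each calendar day."""
    counts = Counter()
    needle = term.lower()
    for timestamp, text in posts:
        if needle in text.lower():
            counts[timestamp.date()] += 1
    # Sort by date so the result reads as a timeline.
    return dict(sorted(counts.items()))


def busiest_day(daily_counts):
    """Return the day with the most mentions."""
    return max(daily_counts, key=daily_counts.get)


# Invented blog posts, for illustration only.
POSTS = [
    (datetime(2004, 9, 1, 9, 0), "New widget launch announced"),
    (datetime(2004, 9, 1, 15, 30), "Thoughts on lunch"),
    (datetime(2004, 9, 2, 8, 15), "The widget is overpriced"),
    (datetime(2004, 9, 2, 11, 45), "Widget review roundup"),
    (datetime(2004, 9, 2, 20, 5), "Why I returned my widget"),
    (datetime(2004, 9, 3, 10, 0), "Still talking about the widget"),
]

timeline = mentions_per_day(POSTS, "widget")
for day, count in timeline.items():
    print(day, count)
print("Most active:", busiest_day(timeline))
```

A spike in the daily count is the kind of signal a reputation-monitoring tool could flag as an emerging opportunity or threat.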

I suspect specialized search engines will provide structure to all kinds of new and interesting data in the future. B2B publishers and other domain experts, with the power of their detailed content taxonomies, have an opportunity to launch directories that monetize their unique content through Web search. General search engines won't go away, but the specialists now form an interesting market to watch.