NewspaperARCHIVE: A Case of Broken News

Page 2 of 3

THE SOLUTION
Fiscus and NewspaperARCHIVE looked at a number of solutions, both open source and proprietary, that could meet the company's search needs. To get started, Fiscus traveled to Information Today, Inc.'s Enterprise Search Summit West conference, where he narrowed the field to three contenders: Dieselpoint, Exalead, and the open source Apache Solr server.

According to Fiscus, the company couldn't afford to take a risk on a completely custom solution that might take a year of development time. It needed something that could be implemented quickly and efficiently. After considering the strengths and weaknesses of each, NewspaperARCHIVE ultimately went with Exalead.

Not surprisingly, Eric Rogge, senior director of marketing at Exalead, feels that NewspaperARCHIVE made the right choice. Given Exalead's origins in the field of web search, its products were engineered with web content in mind. "The technology [NewspaperARCHIVE is] using is usually designed for content management behind the firewall," says Rogge. "[Exalead] was originally designed to handle web content."

Exalead's CloudView grew out of the company's earlier attempts at creating a traditional web-based search engine. The current iteration of the product maintains a straightforward web interface that lets users collect and search a wide variety of both structured and unstructured data. Results are automatically sorted by categories, and the engine can also make suggestions and corrections to search terms based on alternate, incorrect, or phonetic spellings. The platform is also capable of sentiment and semantic analysis for use in discovery and analytics applications.

According to Rogge, CloudView is fully functional and easy to deploy in its default configuration while remaining extensible. "We designed the product to be useful out of the box. But we also designed the product to be highly customizable," he says. In addition to end-user search applications such as NewspaperARCHIVE's, CloudView has been used to organize phone directories, IT help desks, economic development data, automotive sales listings, and ecommerce sites.

Rogge notes that a large chunk of the company's business comes from companies such as NewspaperARCHIVE, which are transitioning from another product that no longer suits their needs. "For enterprise search, we do a lot of replacement business: FAST, Autonomy, Google Search Appliance," says Rogge. "The most common reason why we get replacements is because we're a more economical solution. If you compare us to some of the other competitors out there, those products are very cumbersome to implement."

According to Rogge, replacing an existing implementation is a two-step process: "It's pretty straightforward. Usually, what we do is we build our index alongside and then we migrate the user interface." He explains that most of the implementation time is spent mimicking the existing user interface and incorporating any improvements the client might want, although the indexing time can vary based on the database.

"I saw them index an IT service desk app in 15 minutes," says Rogge. "We've had other applications where it [took] a fair amount of time." According to Rogge, the duration and difficulty of a transition ultimately come down to the complexity of the existing user interface, the variability of the content being indexed, and the extent to which semantic extraction must be performed on the data.
