A Case of Clustered Clarity: UP Health Sciences Library System

Page 2 of 2

The Solution
The HSLS chose Vivisimo Velocity in large part because of its ability to delineate between the various meanings of search terms through the clusters, or headings, generated by this product. "It's really easy on a word that's pretty nebulous to know exactly where you want to go next while also getting a sense of the volume of material available on a particular subject," says Silverman. "The clustering resolves these linguistic conflicts very well."
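Vivisimo's clustering algorithm is proprietary and the article does not describe it. The sketch below is only a simplified, hypothetical illustration of the general idea Silverman describes: results for an ambiguous keyword get grouped under headings drawn from the words they share. The function names, stopword list, and sample snippets are all invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list; a real system would use far more.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "by"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cluster_results(query, snippets, max_clusters=5, min_size=2):
    """Group result snippets under shared descriptive terms (a label-first sketch).

    Every non-stopword term that is not part of the query is a candidate
    heading; the terms shared by the most snippets become cluster labels.
    """
    query_terms = set(tokenize(query))
    term_to_docs = defaultdict(set)
    for i, snippet in enumerate(snippets):
        for term in set(tokenize(snippet)) - STOPWORDS - query_terms:
            term_to_docs[term].add(i)

    # Rank labels by how many snippets they cover; break ties alphabetically.
    ranked = sorted(term_to_docs.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    clusters = {}
    for term, docs in ranked:
        if len(docs) < min_size:
            break  # ranked is descending, so every remaining term is smaller
        clusters[term] = sorted(docs)
        if len(clusters) == max_clusters:
            break
    return clusters

# Hypothetical results for the ambiguous query "cold".
results = [
    "Common cold virus symptoms",
    "Cold therapy for sports injury",
    "Cold virus transmission in children",
    "Cold therapy after surgery",
]
clusters = cluster_results("cold", results)
print(clusters)  # {'therapy': [1, 3], 'virus': [0, 2]}
```

Here the single keyword "cold" splits into a "virus" group and a "therapy" group, which is the kind of disambiguation the HSLS valued; production engines use considerably more sophisticated labeling and merging.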

Now, HSLS patrons are able to search the full text of all of the available titles with simple keywords and then drill down to the information that's most relevant to them, a vast improvement over the usage model of yesteryear. "If you look at the older model of print, you'd have to go to the catalogue, pick up a few books, and then read through each of them," says Phil Burgen, information architect for the HSLS. "Through Vivisimo, we've made a giant index that you can look everything up on all at once. There's no need to open each book."
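The "giant index" Burgen mentions is, conceptually, an inverted index: a map from each word to every place it occurs, so one lookup covers every book at once. The following is a generic, minimal sketch of that data structure, not Velocity's implementation; the book titles and page text are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(books):
    """Map each term to the set of (title, page) locations where it occurs."""
    index = defaultdict(set)
    for title, pages in books.items():
        for page_no, text in enumerate(pages, start=1):
            for term in tokenize(text):
                index[term].add((title, page_no))
    return index

def search(index, query):
    """Return the locations that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # .get avoids inserting empty entries into the defaultdict.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

# Hypothetical two-book collection.
books = {
    "Cardiology Basics": ["The heart pumps blood.", "Arrhythmia and heart rate."],
    "Pharmacology Handbook": ["Beta blockers lower heart rate.", "Dosage tables."],
}
index = build_index(books)
hits = search(index, "heart rate")
print(hits)  # [('Cardiology Basics', 2), ('Pharmacology Handbook', 1)]
```

Instead of opening each book, the query is answered by intersecting the posting sets for "heart" and "rate".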

The HSLS had a prototype up and running for internal testing in January of '05 and went live to the public a couple of months later. To implement Vivisimo Velocity, there were two main components that Burgen had to tackle. "One is a form and one is a parser," says Vivisimo's Taylor. "What the form does is remotely administer the remote search engines. It presents the query to the search engines, and then the parser will convert the output from that search." For sources with clearly defined XML, not much has to be done in terms of parsing the content. "But if the results are in HTML, for example, or an unfriendly format, that's where we have to build a parser," says Taylor. "Every single source might be very different in the way it renders output. We do have templates that help speed this development, but certainly one can't just put in the URL and magically the software knows where to find the correct information to pull from a document."
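The article does not show how Velocity's forms and parsers are actually configured, so the following is only a hypothetical Python sketch of the two roles Taylor describes. The "form" encodes a user's query for one remote engine, and the "parser" turns that engine's HTML output into uniform records. The endpoint URL, the `q` and `max` parameters, and the assumption that each hit is marked up as `<a class="result" href="...">` are all invented for illustration; every real source would need its own rules.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

def build_query_url(base_url, query, extra_params=None):
    """The 'form' role: encode the user's query for one remote search engine."""
    params = {"q": query}  # hypothetical parameter name
    params.update(extra_params or {})
    return f"{base_url}?{urlencode(params)}"

class ResultParser(HTMLParser):
    """The 'parser' role: extract title/URL records from one source's HTML.

    Assumes (hypothetically) each hit is <a class="result" href="...">Title</a>.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._current = None  # record being filled while inside a result link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "result" in (attrs.get("class") or "").split():
            self._current = {"url": attrs.get("href", ""), "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.results.append(self._current)
            self._current = None

def parse_results(html):
    """Feed one page of HTML results through the parser and return the records."""
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.results

url = build_query_url("https://search.example.org/find", "asthma", {"max": 10})
sample_html = (
    '<ul><li><a class="result" href="/doc/1">Asthma in Adults</a></li>'
    '<li><a href="/help">Help</a></li>'
    '<li><a class="result" href="/doc/2">Pediatric Asthma</a></li></ul>'
)
records = parse_results(sample_html)
print(url)      # https://search.example.org/find?q=asthma&max=10
print(records)  # [{'url': '/doc/1', 'title': 'Asthma in Adults'}, {'url': '/doc/2', 'title': 'Pediatric Asthma'}]
```

The non-result "Help" link is skipped, which hints at why Taylor says a URL alone is not enough: the parser has to encode knowledge of each source's markup. A source that already returns well-defined XML would need little more than a mapping of its element names.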

While Burgen had a high level of technical know-how and a strong desire to learn the inner workings of XML to be able to build his own parsers, Velocity users typically don't have to delve that deep into the software to make it work with a variety of sources. "We have over 1,000 connectors to both public and private sources that we can pre-ship within the Velocity technology," says Taylor. "If we don't have a connector built, then we'll build that on our end. And anything we've done we can repurpose to other customers."

The Outcome
The HSLS's first implementation of Velocity has so far been met with high praise from its users. "When we first came up with the idea and then offered this, people realized right away that this was something they need," says Silverman. Nor is the audience limited to the University of Pittsburgh: clinicians at other universities without Vivisimo's technology have already begun to use the HSLS's implementation.

Because the public implementation of Velocity didn't go online until the school year was winding down, Silverman and Burgen haven't yet had a chance to gauge the full impact this technology has had on the way their users work, but they're already getting quantifiable proof of its positive effect. "One of the trainees that's working with this has done a study looking at using the clustering versus not, and the early results show that their database with clustering is far and away the winner in getting the right information to people," says Burgen. "I have no doubt that this speeds up the process of getting to the right answer substantially."

Beyond that, by monitoring the search logs, they could see just how dramatically the clusters were changing the way their users found information. "What we found was that the number of searches steadily climbed, but the number of stupid searches steadily declined," says Silverman. "At first, people would try to search for complicated terms. Then you'd see people start going to simpler searches. They're letting the clusters do the work for them, which is exactly what we wanted people to do. Very early on people were getting it. They like the idea of putting in something simple to start with."
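The article does not say what the HSLS logs look like or how "stupid searches" were identified. As a purely hypothetical illustration of the trend Silverman describes, the sketch below assumes a log of `(week, query)` pairs and uses words per query as a rough proxy for complexity, tracking search volume alongside it.

```python
from collections import defaultdict

def weekly_query_stats(log_entries):
    """Summarize (week, query) entries as {week: (search_count, mean_words_per_query)}.

    Words per query is only a hypothetical proxy for query complexity.
    """
    counts = defaultdict(int)
    words = defaultdict(int)
    for week, query in log_entries:
        counts[week] += 1
        words[week] += len(query.split())
    return {week: (counts[week], words[week] / counts[week]) for week in sorted(counts)}

# Hypothetical log: volume rises while queries get shorter.
log = [
    (1, "adverse effects of beta blockers in elderly patients"),
    (1, "asthma"),
    (2, "asthma"),
    (2, "beta blockers"),
    (2, "diabetes"),
]
stats = weekly_query_stats(log)
print(stats)  # {1: (2, 4.5), 2: (3, 1.3333333333333333)}
```

A rising first number paired with a falling second one is the pattern the HSLS reported: more searches, but simpler ones that lean on the clusters.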

And this is only the beginning. "We're just now starting to break through the literal understanding of this," says Burgen. "We're currently looking at the possibility of using the clusters to discover new ways of doing terminology. What can Vivisimo's clusters come up with in terms of groupings of content that we may not have thought of on our own as we're doing human indexing internally? Because we've not really done too much with the search engine yet, I think we're still going to see some surprising things there."

One example of how that search engine might be used in the future is to "crawl, index, and cluster the results from each of the courses on the medical school's Web site," says Taylor. "So if students are looking to search for information about their courses, they'll be able to use the Vivisimo technology to sift through the huge volumes of information and supporting documents that are relevant to their coursework."