WebFountain: IBM Buzzes the Web for Intelligent Applications

The power of buzz lies mainly in the cachet it carries through its cryptic quality and clandestine communication. While there's a veritable industry in buzz generation, it isn't true buzz unless it carries with it a certain "I don't know what." Anyone in the content business who has lately heard a humming that sounds something like WebFountain, but couldn't quite make out a distinct message, should listen up. According to Robert Carlson, vice president of IBM's WebFountain project, they are "coming out."

With investment figures hovering near $100 million as of this year, there's likely some clock-watching at IBM as to when this project will begin to pay off. WebFountain grew out of a late-1990s IBM Almaden research project called Grand Central Station, which had the lofty goal of "reading the entire Web," according to Carlson. He says the vision included "harnessing the information on the Internet and extracting data that would have value to our customers." Last year, that vision expanded from all unstructured Web data to all unstructured data, wherever it might reside. And, while IBM has a history of pure research projects, Carlson points out that these days the company tries to build business processes around such projects.

Still hear buzzing?

It turns out that WebFountain has been in beta for about a year with companies in a variety of industries, including entertainment, financial services, packaged goods, and pharmaceuticals. And, according to Carlson, business applications abound. He says, "I like the database analogy: WebFountain has the potential of being what SQL [Structured Query Language] was to the business world. While it's not a database, the analogy helps illuminate the value of WebFountain. Before SQL, databases were seen as a technical asset rather than a business asset. With SQL, questions could be asked of information and now people see their databases as a strategic asset. That's how we see WebFountain. Now you can conceive of a specific question, and get a specific answer rather than a list of thousands of hits."

As it says SQL did, IBM hopes WebFountain will change business processes by reshaping the way business tools are conceived and created. WebFountain will not be an application, but rather a platform on which applications are built. Carlson says, "We think of ourselves as a platform that includes content, technology, and operations. We want to allow anyone that wants to develop technology on top of our platform to do it. Then, in their deployment model, there would be cost for their application to access our platform." He anticipates that the first applications will be market intelligence, competitive intelligence, and brand management tools.

To illustrate a possible application based on WebFountain, Carlson describes the launch of a drug by a pharmaceutical company. Once a drug has been approved by the FDA, the goal is to reach peak sales as soon as possible, especially given the likely crowd of competing drugs that will come to market within months of each other. IBM has been working with a customer whose product is good but whose sales have been languishing. Carlson says, "We're helping them get closer to their customer… It turns out that how customers talk about this ailment is very different than the way the pharmaceutical company was talking about it."

WebFountain brings together information from chat rooms, advertising sites, competitors' sites, and other sources to help this brand manager understand the chasm between how the company is positioning the drug and what concerns consumers about the ailment. This helps the company rapidly reposition the product in a competitive environment, or even position it differently in other countries, where there are likely to be cultural differences in the way an ailment is described.

But WebFountain wouldn't stop there. Carlson says that once a drug is successful, it "becomes a franchise…if you have a blockbuster drug you get an 'atta-boy' and then asked how you will grow the brand." One way to do this is to look at other applications for the drug and to compare its chemical structure to those of all other drugs to see if it can be reengineered for repurposing.

WebFountain works in three stages: base mining, in which indexing and search technologies are used to systematically mine the Internet using focused crawling; an industry component, which requires industry-specific expertise to know which types of algorithms have high value (IBM plans to work with customers and consulting organizations to build these); and the application, which will be delivered to customers as an on-demand service. This last stage will call for WebFountain to periodically extract the information required by a given application (from the Web, third-party data sources, and even businesses' information silos), then push it out to a data store hosted by IBM's e-business on-demand hosting centers so it can be used by an end-user application to generate, for example, a brand-management dashboard.
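
The three stages can be pictured as a small pipeline. The sketch below is only an illustration under loud assumptions: the function names, the keyword filter standing in for focused crawling, the hand-built vocabulary, and the sample pages are all invented here, and nothing in it reflects IBM's actual code or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Page:
    source: str  # e.g. "chat room" or "company site" (invented sample labels)
    text: str


def base_mining(pages, keyword):
    """Stage 1 stand-in: keep only pages that mention the keyword.

    A real system would crawl and index; this filter is a hypothetical placeholder.
    """
    return [p for p in pages if keyword.lower() in p.text.lower()]


def industry_component(pages, vocabulary):
    """Stage 2 stand-in: count terms from a hypothetical industry vocabulary."""
    counts = Counter()
    for page in pages:
        words = page.text.lower().split()
        for term in vocabulary:
            counts[term] += words.count(term)
    return counts


def application(counts, top_n=3):
    """Stage 3 stand-in: reduce the counts to rows a dashboard might display."""
    return [(term, n) for term, n in counts.most_common(top_n) if n > 0]


# Invented sample data, for illustration only.
pages = [
    Page("chat room", "My heartburn keeps me awake and the burning is awful"),
    Page("company site", "Relief for acid reflux disease"),
    Page("chat room", "Heartburn again after dinner, burning all night"),
]
vocabulary = ["burning", "reflux", "awake"]

mined = base_mining(pages, "heartburn")
dashboard = application(industry_component(mined, vocabulary))
print(dashboard)  # [('burning', 2), ('awake', 1)]
```

In this toy run the consumer pages talk about "burning" while the company page, which uses clinical wording, never even matches the consumer keyword; that is the kind of vocabulary gap Carlson describes, reduced to a few lines.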

"For every megabyte of data we read in," says Carlson, "we create about 10 megabytes of metadata. Our value proposition is the metadata. We extract all of this stuff—nouns, locations, entities—then it goes into an industry process." He says, "We construct higher-level value from this information."

A number of research breakthroughs have allowed IBM to create the WebFountain infrastructure; the technical challenge was reaching the necessary scale. According to Carlson, the operation runs on a Linux cluster of about 1,000 Intel-based nodes with half a petabyte of storage. While a "me-too solution" could be made by "cobbling together about 30 or so companies in the marketplace," he doubts such a solution could get past 10 million pages.
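
Part of the scale problem is that annotations can outweigh the text they describe; Carlson puts the figure at about 10 megabytes of metadata per megabyte read. The sketch below shows the general effect only. It assumes a crude capitalized-word pattern as a stand-in for entity extraction and a made-up record layout; neither is WebFountain's method, and the ratio it prints is not meant to match Carlson's.

```python
import json
import re


def annotate(text):
    """Emit one metadata record per capitalized token.

    The pattern and the record fields are hypothetical, chosen for illustration.
    """
    records = []
    for match in re.finditer(r"\b[A-Z][a-zA-Z]+\b", text):
        records.append({
            "token": match.group(),
            "start": match.start(),
            "end": match.end(),
            "type": "candidate-entity",  # invented label
        })
    return records


text = "IBM built WebFountain at Almaden"
metadata = annotate(text)

# Compare the size of the raw text with the size of its serialized annotations.
raw_bytes = len(text.encode("utf-8"))
meta_bytes = len(json.dumps(metadata).encode("utf-8"))
print(raw_bytes, meta_bytes, round(meta_bytes / raw_bytes, 1))
```

Even with three simple records, the serialized metadata is several times larger than the sentence it annotates, which gives a feel for why storage grows so quickly once every page is tagged.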

Carlson is eager to build partnerships not only with application developers, but also with "content folks to help define the value of their content." He says, "It's no longer enough to say my data is authoritative, it has to be put in context with all other information within an organization." WebFountain aims to let the value of content be applied to the business process. Carlson says, "The value is in the experts that create it. If you allow the application to define the value of your content, you won't stay in business."