The Discussion Miners
This new breed of reporting software and services comes from a handful of companies that crawl, spider, or index communities, gathering content from a variety of sources: Usenet newsgroups, message boards, chat archives, and customer-provided ratings and reviews. The postings are aggregated and analyzed using software that "reads" all of them and finds the themes or clusters. These general themes and clusters of information are packaged into reports and sold to companies as an adjunct to traditional market research such as focus groups. Depending on the company, the practice and resulting service can be called a Corporate Intelligence Service, a Brand Commentary Solution, or Consumer Insight and Forecasting.
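To make the idea of software that "reads" postings and finds themes concrete, here is a minimal sketch in Python. It is not any vendor's method: the article does not say how the analysis works, so plain word counting stands in for it. The stop-word list, the sample postings, and the function name find_themes are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "it", "my", "and", "of", "to",
              "i", "but", "after", "in", "on", "this", "was"}

# Hypothetical postings aggregated from several kinds of sources.
SAMPLE_POSTINGS = [
    ("usenet", "The battery died after a week"),
    ("message board", "Battery life is poor but the screen is great"),
    ("review", "Great screen, weak battery"),
]

def find_themes(postings, top_n=3):
    """Return the terms mentioned in the most postings, as (term, count) pairs."""
    counts = Counter()
    for _source, text in postings:
        # Use a set so each posting counts a term at most once.
        words = set(re.findall(r"[a-z']+", text.lower())) - STOP_WORDS
        counts.update(words)
    return counts.most_common(top_n)
```

Calling find_themes(SAMPLE_POSTINGS, top_n=1) on the sample data surfaces "battery" as the term shared by all three postings. A commercial service would presumably go well beyond counting words, but the shape is the same: many postings in, a short list of themes out.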
A range of products and services is available at varying costs. Five to ten thousand dollars will get you simple monitoring: links to comments posted by consumers about the brand, with no analysis or reporting of the content. One hundred thousand dollars per year buys regularly updated aggregate reports on what people are saying about a company's products or services across a variety of Internet sources.
How much are sites hosting the community being paid for use of their member-generated data? Zero.
Some of the companies that gather and utilize the member-generated content claim that only a small amount of the overall dataset comes from Web-based communities, and that the majority of the data is harvested from Usenet newsgroups. However, in looking at newsgroup activities for various brands, the quantity of relevant information (all messages minus spam and flames) is low in comparison to the aggregate of Web sites hosting communities. Web-based communities, unlike newsgroups, are often staffed at a significant expense that adds substantially to the quality of the discussions. Communities, whether they belong to general portals or clubs or are dedicated to specific topics such as auto, health, hobbies, or shopping, generally contain high-quality member-generated content, and lots of it.
Sites hosting communities foot the bill for editorial content, which is copyrighted and cannot be used without consent. They also bear the costs of implementing, managing, and maintaining their online community content. Many communities hold the copyrights on all member-generated content submitted to the site as well.
Here's where the rubber meets the road. Is the member-generated content free for others to profit from, or should these sites be compensated for the data that is being sourced in the reports to the corporations? Even the companies in the business of mining community content, whose techniques and services vary considerably, have different views on this question.
Cincinnati-based Intelliseek uses its Enterprise Search Server and Corporate Intelligence Solutions to collect and analyze consumer comments on the Web and in Usenet newsgroups about its clients' products and services. Karthik Iyer, vice president of products, describes the process in four steps: "discovery, collection, analysis, reporting."
Intelliseek works with the client through a discovery process, finding where people are talking about the brand to determine the best places to "mine" for content. It then collects the data from these places and slices and dices the data for analysis. The reports provide a snapshot of what is being said across the consumer landscape in the online world. Select quotes from posts in the communities are provided along with feedback, competitive comparisons, top issues with the brand, top problem areas, and overall activity levels.
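The four steps Iyer names (discovery, collection, analysis, reporting) can be sketched as a small pipeline. This is an illustrative outline only, not Intelliseek's software: the source names, sample postings, issue terms, and every function name below are hypothetical, and simple substring matching stands in for whatever analysis a real product performs.

```python
from collections import Counter

# Hypothetical sample data: source name -> list of postings.
SOURCES = {
    "alt.autos": ["Acme brakes squeal", "Love my Acme wagon", "Spam spam"],
    "carclub-board": ["Acme brakes failed twice", "Other brand is fine"],
    "cooking-chat": ["Best soup recipe"],
}

def discover(sources, brand):
    """Discovery: keep the sources where the brand is mentioned at all."""
    return [name for name, posts in sources.items()
            if any(brand.lower() in p.lower() for p in posts)]

def collect(sources, names, brand):
    """Collection: pull brand-related postings from the chosen sources."""
    return [(name, p) for name in names for p in sources[name]
            if brand.lower() in p.lower()]

def analyze(posts, issue_terms):
    """Analysis: count the postings that mention each issue term."""
    counts = Counter()
    for _name, text in posts:
        for term in issue_terms:
            if term in text.lower():
                counts[term] += 1
    return counts

def report(brand, posts, counts):
    """Reporting: summarize activity level, top issues, and one select quote."""
    return {
        "brand": brand,
        "activity": len(posts),
        "top_issues": counts.most_common(),
        "sample_quote": posts[0][1] if posts else None,
    }

def run_pipeline(sources, brand, issue_terms):
    """Run the four steps in order and return the report."""
    names = discover(sources, brand)
    posts = collect(sources, names, brand)
    return report(brand, posts, analyze(posts, issue_terms))
```

Running run_pipeline(SOURCES, "Acme", ["brakes", "wagon"]) on the sample data drops the cooking chat during discovery, collects three Acme postings from the other two sources, and reports "brakes" as the top issue.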
With regard to the use of vertical data, meaning community content from a single site, Iyer addresses the issue quite clearly. "Most of the content is collected from our own Usenet server. For community data from single sites, we would work with [the community site] in a relationship to use their data."