Unknown Unknowns

When I think of memorable quotes from military leaders, Winston Churchill comes to mind, with “nothing to offer but blood, toil, tears and sweat.” But the United States has been blessed with a poet in the upper echelons of government: former Secretary of Defense Donald Rumsfeld has developed a cult following on the web for his poetic discussion of unknowns.

On Feb. 12, 2002, he declaimed, “As we know, there are known knowns. There are things we know we know. We also know there are known unknowns. That is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don't know we don't know.”

Although he probably didn't realize it, this insight applies to the world of econtent as well as to weapons of mass destruction. One of the biggest challenges for info pros is knowing what information is out there and what we are missing: the unknown unknown.

When we are searching econtent, we often focus on constructing the perfect search strategy. We use the advanced search features of search engines. We search tags in the blogosphere. We dig into the content of podcasts. And those of us who grew up before the web build complex search statements with nested logic and controlled-vocabulary terms on the value-added online services.

If the information is out there, we'll find it, right? Well, no. While intelligent indexing, tagging, and enhanced metadata all help construct bigger needles in the information haystack, we still approach the haystack assuming we know what a needle looks like. Frank Knight, a 20th-century economist, discussed a similar issue in what is now called “Knightian uncertainty,” the distinction between risk (randomness with knowable probabilities) and uncertainty (randomness with unknowable probabilities). Factoring in risk is a manageable task; calculating Knightian uncertainty is impossible.
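For readers who like to see the distinction in concrete terms, here is a minimal sketch in Python. The function names and the sample figures are my own illustrations, not Knight's: with known probabilities we can reduce risk to a single expected value, while with unknown probabilities the most we can say is that the answer lies somewhere between the best and worst outcomes.

```python
def expected_loss(outcomes, probabilities):
    """Risk: the probabilities are known, so the expected loss is one number."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(o * p for o, p in zip(outcomes, probabilities))


def loss_bounds(outcomes):
    """Uncertainty: the probabilities are unknown, so we can only state the
    range that any expected loss would have to fall within."""
    return min(outcomes), max(outcomes)


# Hypothetical example: a loss of 0 or 100, first with known odds, then without.
known = expected_loss([0, 100], [0.75, 0.25])
unknown = loss_bounds([0, 100])
```

Note that even the second function assumes we can list every possible outcome. With a true unknown unknown, we cannot do even that much.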

As Rumsfeld pointed out, we searchers forget to factor in the unknown unknowns—the information that we do not even think to look for because we do not imagine that it exists. And this brings up a new challenge for econtent providers. Since you can't know what searchers will be looking for, and the searchers themselves may not know what the answer looks like, how do you create an information infrastructure that facilitates those magical “ah-ha” moments searchers live for?

One good example of a partial solution is Amazon.com's collection of innovative finding tools: its concordance of frequently used words, Statistically Improbable Phrases (SIPs), and cited references all help searchers mine the content of its online books in unexpected ways. Note that the tags that users assign to books are often close to useless, even though the purpose of tagging is to increase access points. Tagging is, in essence, descriptive cataloging, which only works if you can anticipate how someone else will be looking for an item. To get back to the analogy of that information haystack, tags are merely pointers to a needle; the SIPs, concordance, and cited references are the foot-long safety pins, spools of thread, and the odd sock in the haystack.
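Amazon has not published how it computes SIPs, so what follows is only a rough sketch of the general idea, not its actual method: count the phrases in one book, count the same phrases in a background collection, and surface the ones that turn up far more often in the book than the background would predict. The function names, the two-word phrase length, and the add-one smoothing are all my own assumptions.

```python
import re
from collections import Counter


def phrases(text, n=2):
    """Return the lowercase n-word phrases found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, background, n=2, top=3):
    """Rank phrases in `book` by how much more often they occur there than
    in `background`. Hypothetical scoring, not Amazon's algorithm."""
    book_counts = Counter(phrases(book, n))
    bg_counts = Counter(phrases(background, n))
    book_total = sum(book_counts.values()) or 1
    bg_total = sum(bg_counts.values()) or 1
    scores = {}
    for phrase, count in book_counts.items():
        book_rate = count / book_total
        # Add-one smoothing so phrases absent from the background
        # do not cause a division by zero.
        bg_rate = (bg_counts[phrase] + 1) / (bg_total + 1)
        scores[phrase] = book_rate / bg_rate
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top]


# Made-up sample texts, purely for illustration.
book = ("knightian uncertainty differs from risk because "
        "knightian uncertainty cannot be priced")
background = ("the risk cannot be priced because the market "
              "differs from the model")
print(improbable_phrases(book, background, top=1))
```

The point for searchers is that nobody had to decide ahead of time that a given phrase was worth indexing; it surfaces simply because it is unusual.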

While additional finding tools, such as Exalead's proximity search feature and social bookmarking sites, help find information that we know or suspect is out there, the next breakthrough will be in moving from search to discovery.

Data visualization tools and data mining resources both attempt to address the problem of unknown unknowns by helping us view relationships among data without having to know ahead of time what those relationships might look like. I am about as left-brained (analytic, linear thinking) as they come, and maybe that's why I'm so fond of GapMinder (www.gapminder.org), a delightful data viz tool that takes global data and presents it in easy-to-understand graphics. NationMaster (www.nationmaster.com) is another data viz resource that allows users to extract meaning from statistics, both globally and (at www.statemaster.com) within the United States.

While the commercial online services have developed fairly rudimentary clustering and data viz tools—see, for example, LexisNexis' grouping of results by source type, industry, and so on, and Factiva's Discovery Pane—what we really need are new tools for rich information discovery rather than mere retrieval.

And finally, we should meditate on Donald Rumsfeld's comment at the end of a European trip in June 2001: “A trained ape can know an awful lot of what is going on in this world, just by punching on his mouse for a relatively modest cost.” Perhaps all we really need are better mice—or more apes.