In the mid-1990s, I bought an Iomega Zip drive boasting 100MB of storage. It was the size of a small pancake. The other popular offline storage medium, a 3.5" diskette, held 1.44MB. Each Zip drive packed the equivalent of almost 70 diskettes. Who would ever need more than a few Zip drives?
You say you don't have any personal Big Data? Not worried about cloud storage? Think again. Private data, including passwords, are breached almost every month. WikiLeaks is the poster child for loss of massive amounts of classified information, all due to poor oversight of personal external drives. Most recently the case of Mat Honan, a Wired magazine journalist, comes to mind. Privacy breaches and lost data aren't always due to personal carelessness, although that is often a contributing factor.
You see references to "Big Data" everywhere. Even on the personal side, we all share a common sense of information overload. We don't (can't) back up all our information (personal, business, shared, and social) since it is stored in so many places. We often can't find critical pieces of information. And that's just our own. Expand the scope to massive corporate data sets, and the problem becomes mind-numbing. We also know that we aren't getting all the value that's available in this data storm. Not getting sufficient value from that information may be the biggest problem of all.
I get about a hundred press announcements and story pitches a day. I give each one a glance and make a quick save or delete decision. One caught my eye recently: "Drupal, Open Source, and Jobs." As I was reaching for the DEL key, I hesitated. Jobs and employment are in the news daily. The subject line was tantalizing and led to some interesting conversations with senior executives who provide Drupal services: Ron Huber, president of Achieve Internet, and Ben Finklea, CEO of the Drupal SEO company Volacci Corp. I wondered, could open source software (OSS) really help turn around the economy or enrich your career?
Paper books are, by nature, static things and so, for the most part, are digital versions of these books. The Oxford English Dictionary reinforces this notion. It defines an ebook as "an electronic version of a printed book," suggesting that these ebooks are little more than replicas of their print counterparts. When you hear the term "ebook," you probably think of the iPad, Kindle, or perhaps PDF documents. You may think of a static file that is just like a printed book except that you need some kind of digital reading device. That is about to change, as digital documents in general are being swept up in the collaborative behaviors of social media.
I assessed two competing digital paper products in the early '90s: Adobe Carousel (later renamed Acrobat) and Envoy. I settled on Acrobat, based on Adobe's deep expertise in print via PostScript. The idea of a timeless, reliable digital file format remains appealing today, especially when that format is ubiquitous, easy to create, search, and use (not to mention an ISO standard). Creating a PDF file from a properly styled word processing document, such as Microsoft Word, OpenOffice, or WordPerfect, gives you both print fidelity and automatic hyperlinking. You can be sure of a rendition faithful to the print version. Moreover, users on Windows, Mac, and UNIX platforms will see and use it the same way.
There is no standard IT definition of "cloud"; the National Institute of Standards and Technology is working on that as I write. Still, there is no question that we're entering a new age of cloud computing. The benefits of cloud storage and computing are many: e.g., lower cost than in-house storage, reduced stress on internal IT resources, efficiency, and guaranteed availability. Just like a sudden change in the weather, cloud storage and computing will be disruptive. It is one thing, however, to trust your MP3 files to a consumer cloud service. It is quite another to entrust your whole business information architecture.
I couldn't function without it, but email is becoming dysfunctional and showing its age. In 1998, I had two email accounts, one business (using Microsoft Outlook) and one personal, available only via dial-up service. As time passed, the number of my email accounts grew, as did the variety of email servers and email providers. Times have changed, and not for the better.
It is a worthwhile exercise to review what I thought were trends in the technologies and tools that I follow and to examine my predictions over the past year. It improves my assessment of important technologies and can refine my forecasting approach. This past year I emphasized three trends: the potential benefits of XML as expressed in the eXtensible Business Reporting Language (XBRL); a legal issue combining digital documents (Microsoft Word) and XML, namely i4i's suit against Microsoft for alleged patent infringement in Microsoft Office Word; and the importance of a new product category, e-readers.
As of this writing, it's impossible to escape daily news about the British Petroleum oil spill in the Gulf of Mexico. Amid all this news, there is one term we haven't yet heard that will have a large role in this saga as it plays out: e-discovery, which refers to the pretrial action in a lawsuit where parties can request or compel digital documents as evidence.
Like many of you, I read voraciously. My nightstand is always overflowing, and I'm running out of bookshelf space. Meanwhile, the cost of print publications continues to grow. So how do you cope? One approach is to use an e-reader, but which one?
Avi Rappoport: "Over the years I've been part of many enterprise content management initiatives. I've seen each repository grow, usually isolated from the others. The result is often an ecosystem of different, disconnected enterprise knowledge assets. An emerging cross-vendor standard called Content Management Interoperability Services (CMIS) offers to connect some of those repository dots."
May 2010 Issue
Posted May 07, 2010
Once in a while, I get asked what I call a "Flat Earth" question: "Prove to me the earth isn't flat." Then, I stumble to find good answers. So it was in a recent taxonomy modeling session, when a line-of-business participant asked me: "Why do we need folders anyway?"
In my annual review of XML, two events or trends stand out: First is the Aug. 23 injunction by U.S. District Judge Leonard Davis of East Texas against Microsoft (MS) selling Word products "that have the capability of opening a .XML, .DOCX, or .DOCM file (‘an XML file’) containing custom XML." Second is eXtensible Business Reporting Language (XBRL) and the tools emerging to leverage it.
We all recognize the twin trends of exponentially growing electronic content and more litigiousness. The landmark 2006 Federal Rules of Civil Procedure Rule 26 and its updates make all electronically stored information (ESI) subject to legal discovery, and ESI continues its unbridled growth. Yet cost controls are tighter these days than they were a decade ago, so we increasingly react to problems rather than nip them in the bud. If you already have an enterprise search system (or, more likely, several targeted search systems), do you really need anything else to respond to a civil suit requesting information stored anywhere in the enterprise?
I'm still searching for e-readers … but I'm getting close. Last year in my column "It Ain't Easy Being Green," I noted the problems newspapers face and the benefits of "going green." Since then, subscription costs have continued to rise and newspapers now need their own bailouts.
In April, when this column is printed, Americans will be engaged in the annual ritual of calculating and paying state and federal taxes. Although tax activity is most intense in April, we all pay taxes daily in many forms, such as sales tax. Just as taxes have always been with us, so too have taxonomies. Like dealing with taxes, taxonomy management is an evolving process that never ends. There are many definitions of "taxonomy," but I view taxonomies as merely the various ways we categorize and manage groups of things so we can find them, whether they're dishes in a cupboard or scrolls in the ancient library of Alexandria.
Mergers and acquisitions are all too common, as are company reorganizations. SharePoint is an increasingly popular repository option. The increasingly common end result: More and more enterprises have important content in at least two incompatible content management systems, and most users cannot access all the systems. Even if you know what's on the "other" system, getting there is usually a hassle.
As anyone with kids—or a good memory—knows, when you cross the "double digits" birthday threshold, it's a big deal. This year, XML crossed this threshold on Feb. 10, and this got me thinking about questions that I might ask this 10-year-old in order to gain perspective on its past and future. I know I'm late, but XML is nothing if not flexible. It assured me that even a belated party is better than none.
I knew that price wouldn't last, but I became hooked and since then I've renewed every 2 years, including the digital edition. However, this year's bill gave me a case of sticker shock at nearly 30 times the original teaser price. I've already switched several of my other print publications to digital, and I suspect pricing is going to strongly encourage more digital switching.
Are you sure that the search system you're using will satisfy the requirements of the Federal Rules of Civil Procedure (FRCP) regarding electronically stored information (ESI)? If your first reaction is "not more acronyms," I feel your pain. Vendors create acronyms faster than they upgrade their products. So let's start with the meaning of the FRCP, focusing on the amendments regarding ESI that went into effect on Dec. 1, 2006.
The clutter on local and network drives may contain records, but the vast majority of it consists of drafts or orphans that nobody recognizes and nobody dares to remove.
The Pulp and Paper Products Council reported recently that more than 900,000 tons of newsprint were produced in June. In the U.S., 55 million newspapers are sold each day. That’s a lot of trees to cut down, process, print, and deliver, with lots of fossil fuels consumed in the process. What is the alternative—move everything to the web? Broadband isn’t always available (certainly on the DC Metro), so this would limit content access. Web delivery isn’t completely eco-friendly either. Estimates of our total national energy bill devoted to information technology range up to 14%. Unlike your laptop or refrigerator, web servers must run continuously.
What a difference a year makes. Since my column in last year’s EC100 issue, content applications of all types have been showing their 2.0 stripes, increasingly blurring the boundaries between web and print, and where their content resides. As “Web 2.0” has become part of our vocabulary, Content 2.0 parallels are blurring web-based and non-web-based content.
The long-awaited use of XML in office suites has arrived. OpenOffice was the first to migrate to XML, StarOffice 8 provides an extra layer of support for OpenOffice, Corel WordPerfect was an early XML adopter and will soon import/export to other XML office suites, and Microsoft Office 2007 is built on XML. OK, the future has arrived. Now what?
Internationally, the talk about moving to ODF is widespread. Meanwhile, Google and others are offering web-based alternatives. Corel WordPerfect both offers an online office suite and promises to work interchangeably with Microsoft and ODF office suites. What’s going on here? Massive confusion and change, and this is just the beginning.
As a teenager in northern New Hampshire, I worked after school and on weekends in a small country store. I calculated retail prices, stamped them onto cans, then stocked the shelves. I also worked the checkout register, carefully entering each item’s price into the register. This was before the use of UPC bar codes—indeed, before the ubiquitous use of microprocessors.
Like its web counterpart, Content 2.0 is emerging in rapid fits and starts. There will be an evolution of electronic formats (and extinctions) via marketplace natural selection. Fundamental structural change is occurring in office documents, containers of 80% of all information. This happened first with ODF in OpenOffice and StarOffice 8, and now through OpenXML in Microsoft’s Office 2007.
As we started 2006, I saw the “Clash of the Titans” metaphor as a way to view the struggle to dominate our content tools: Google and Microsoft were the titans, locked in mortal combat.
Most of us—even we pack rats—must deal with the practical limits of magnetic and physical storage space. Like it or not, we have to be selective about what we keep and what we delete. While on the corporate side, the threat of litigation might provide incentive to toss stuff as soon as possible to avoid preserving content that could be the target of discovery in a lawsuit, there are also requirements that some things be maintained.
A subtle shift is occurring in the way we value and manage our office content—those files that constitute 80% of the investments we all make in our mainstream office work: text documents, spreadsheets, and presentations. Today there are tremendous legal pressures to ensure that we abide by various mandated schedules to keep documents as long as the law requires (but no longer). On the flip side, practices are emerging to selectively destroy many of our documents that we need not keep at all. Destruction provides a measure of protection from widely cast subpoena nets.
Up to the mid-’90s, managing content was easy. Records managers cataloged documents, locked them up, and when their retention period expired, destroyed them. The main threats were fire and water. Today, content comes in thousands of electronic formats, including email. Content’s central importance is attracting a new threat: patent litigators, the modern Willie Suttons, because—as Willie famously said about robbing banks—that’s where the money is.
Vendors always compete for your computing desktop. Some competition takes a whimsical form, like “Flying Toaster” screen savers. Some competition is strategic: operating systems, browsers, Internet services, and more recently, desktop search. Another big battle is brewing for your desktop. This time, it is about content—specifically, office documents: word processing, presentation, and spreadsheet files. The titans this time are Microsoft and Google, with assistance primarily from Sun and OpenOffice, and with lots of lesser players also getting into the act. Office products account for a large portion of Microsoft’s profits, so I believe this will be a Battle Royale.
It seems like an eternity since the initial promises of XML, and many have faded from memory. Remember any? Pay the considerable cost of using XML to structure your documents, and they would pay you back by providing ways to convert, reuse, reassemble, or analyze them. Invest in structure now; get dividends soon. It’s been a long wait, but interacting with documents may be just around the corner.
I’m not humming a tune about Google, but that company rocks, and its engine is very popular. On the content side, Adobe can claim universal acceptance of Acrobat and its built-in search. Most large firms have made long-term commitments to a single enterprise CMS from the likes of Documentum or FileNet, or to a single database vendor like Oracle. Each such commitment is also an indirect commitment to that vendor’s search system.
From the start, Adobe fine-tuned Acrobat with releases every 18 months or so. Most releases offered stunning new features, often with a modified interface, and an increasingly heavy client footprint that took correspondingly larger amounts of storage and time to load. Some versions seemed perfunctory; others offered significant new capabilities. Acrobat 7 falls into the latter camp. After letting the new 7.0 release settle down with the inevitable service upgrade, what is really new about Acrobat 7.01? More important, given Adobe’s acquisition of Macromedia, is it time to fundamentally reconsider your use of Adobe and Acrobat?
Maybe it’s because I have always had more stuff on my PCs than most of my peers, and so have trouble finding things, that I was a very early adopter of desktop search. I really loved a $99 product called QuickFind from a small company named Softscape. I found it so useful that I wrote a review of it in 1998. QuickFind indexed and found all major files you created on your PC. In those days, “networking” meant “dial-up,” and my home PC was essentially standalone. PC viruses were almost unheard of. You found worms only in your garden and Trojan horses were the stuff of Greek mythology.
I am one of many who rode the XML content roller coaster up: high hopes for the use of SMIL in multimedia; SVG for graphics; create-once and reuse many times for everything from office documents to highly disciplined technical documentation. And down: Microsoft ignored SMIL and SVG; few office workers ever mastered using MS Word styles that could provide additional document structure. What hope was there for the discipline and promise of XML?
While the word may cause eyes to roll, organizations may find that taxonomy also causes blood to boil.
Among the chores people hate most is filling out forms—paper or electronic—and vendors have struggled to make usable eforms for years. Three recent attempts show promise.
You may have invested a great deal in site design and maintenance yet still have a silent majority of frustrated Web visitors. Help might help.
Several times this year I’ve read proclamations from journalists and consultants that 2004 will be “The Year of Search.” Didn’t search already arrive?
Without trying to convince you that monitoring every XML-related occurrence is good for you, I will explain why I monitor the W3C and other sites. Perhaps you’ll see how stewards of econtent might also find it useful and even develop a taste for it.
2003 has been the “year of content,” and 2004 promises even more excitement. By content I mean a “book-like collection of related information objects”; “book-like” because nearly all content carries some of the attributes of books.
For nearly 10 years, Adobe Acrobat’s Portable Document Format (PDF) has remained the undisputed standard for visually faithful electronic renditions of print documents. With such momentum, what more could Adobe do? Not rest on its laurels.
If you work with STM publishing, sooner or later you’ll need to produce mathematical expressions, which seems simple until you try to bridge the gap between authors and production.
What happened in the past five years to divert XML from its original use, and how does this affect plans for your content today?
You know a concept has gone mainstream when you find that related products are frequently out of stock at your local discount warehouse. For me, that epiphany was prompted by—as unlikely as it may seem—paper shredders.
Let’s look at taxonomies, categorization, product creep, and XML as further differentiators in selecting a single search solution for knowledge management.