A U.S. soldier accidentally kills an Italian secret service agent in Iraq. The Pentagon posts a report on the tragic incident online in Portable Document Format (PDF). Clever Internet readers, however, are able to read blacked-out, confidential information in the PDF simply by copying sections of the censored text and pasting them into a Word file. Information-security problems caused by metadata and other hidden data, like the Pentagon fiasco last spring, are becoming a pressing issue for the government and for many corporations, which are concerned that such problems may transform the way information is shared online.
The National Security Agency (NSA)—charged with protecting U.S. government information systems and producing foreign signals intelligence information—recently released guidance to help other federal agencies contain digital content problems like these. In particular, the NSA warns of the dangers of documents made available online in Adobe PDF or Microsoft Word formats. But experts at document management companies like San Francisco-based Workshare Technology, Inc., are skeptical that the technical guidance provided by the NSA, instructing federal agencies on how to redact or edit PDF and Word documents, will be effective.
The NSA report is a good first step, but it is "trying to rely on user education alone," according to Ken Rutsky, executive vice president of worldwide products and marketing at Workshare. "Depending upon user education has proven to be unsuccessful time and again," he says.
The study, entitled "Redacting with Confidence: How to Safely Publish Sanitized Reports Converted from Word to PDF," summarizes the steps the NSA says agencies must take to ensure that confidential information in Word or PDF files is not accidentally disclosed. The 14-page report was originally published by the architectures and applications division of the NSA's Systems and Network Attack Center (SNAC).
According to the NSA, three common mistakes often lead to the unintentional exposure of data: covering text, charts, and tables with blackened rectangles to block the data from being read; reducing the size of images or covering them with black; and forgetting about metadata, the hidden information and revision histories embedded into digital content by many software programs today. "The way to avoid exposure is to ensure that sensitive information is not just visually hidden or made illegible, but is actually removed from the original document," according to the NSA report.
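The gap between hiding and removing can be seen in a toy model. The sketch below is not based on any real PDF library; `Rect`, `TextRun`, `Page`, `cover_with_black`, and `redact` are hypothetical names, and the model assumes that text extraction (roughly what copy-and-paste does in a PDF viewer) reads every text run on a page without regard to what is painted over it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, other: "Rect") -> bool:
        # Two axis-aligned boxes overlap unless one lies entirely to one side of the other.
        return not (self.x1 <= other.x0 or other.x1 <= self.x0 or
                    self.y1 <= other.y0 or other.y1 <= self.y0)


@dataclass(frozen=True)
class TextRun:
    text: str
    box: Rect


@dataclass
class Page:
    runs: list[TextRun] = field(default_factory=list)
    black_boxes: list[Rect] = field(default_factory=list)  # painted on top; purely visual


def extract_text(page: Page) -> str:
    # Like copy-and-paste in this model: reads every run, ignoring what is drawn over it.
    return " ".join(run.text for run in page.runs)


def cover_with_black(page: Page, area: Rect) -> Page:
    # The mistake: the sensitive run stays in the page; only a rectangle is added.
    return Page(list(page.runs), page.black_boxes + [area])


def redact(page: Page, area: Rect) -> Page:
    # The fix: delete every run overlapping the area, then draw the box.
    kept = [run for run in page.runs if not run.box.intersects(area)]
    return Page(kept, page.black_boxes + [area])


# Usage with made-up page content.
page = Page([TextRun("Checkpoint at", Rect(0, 0, 50, 10)),
             TextRun("[classified detail]", Rect(55, 0, 120, 10))])
secret = Rect(55, 0, 120, 10)
print(extract_text(cover_with_black(page, secret)))  # Checkpoint at [classified detail]
print(extract_text(redact(page, secret)))            # Checkpoint at
```

Both versions look identical on screen, but only the second no longer contains the text. Real PDF redaction is harder, since sensitive content can also survive in images, metadata, or earlier revisions saved in the file.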
The NSA is currently focusing on the issue because the Pentagon isn't the only government agency to get into trouble over exposure of supposedly secret information left in documents. Other high-profile incidents include the following:
Last December, metadata in a White House policy document called "Strategy for Victory in Iraq" revealed the report's author, causing some embarrassment to the Bush administration. Last fall, hidden edits in a U.N. document revealed details of the assassination of former Lebanese prime minister Rafik Hariri, complicating an already tense political situation in the Middle East. And last spring, a conservative group emailed Democrats its testimony supporting an overhaul of the Social Security Administration (SSA); the document still contained comments and edits made by an associate commissioner of the SSA and a White House aide.
Corporations have also been embarrassed by similar problems. The editors of the New England Journal of Medicine recently reported that drug developer Merck & Co. had allegedly deleted information linking the drug Vioxx to an increased risk of heart disease from a manuscript it submitted for publication; the deleted information was recoverable from metadata embedded in the electronic document. Overseas, Sydney-based Westpac Banking Corp. emailed Wall Street analysts full-year profit results that it had blacked out, but the recipients were able to retrieve the information anyway.
Some metadata cases have ended up in court. "We are also aware of litigation in which metadata has played an evidentiary role," says Dan Venglarik, an attorney for the Dallas-based law firm of Davis Munck Butrus, P.C. "In one case against an individual file-sharing defendant, metadata was used to show that the music files being made available by the defendant were ripped by different people using different software, negating any defense that the defendant was simply engaged in fair use copying of his own lawfully purchased CDs onto a computer."
Ron Hackett, program manager at SRS Technologies, points out that metadata reveals a great deal about a document: its author, the company that created it, and the dates it was created and saved, as well as tracked changes, comments, and hidden hyperlinks. Most agree that government agencies and companies need to set strict policies for delivering documents over the Internet. Before documents go out, organizations must "sanitize them," says Hackett. "A better method of reviewing and sanitizing electronic documents is needed to mitigate the inherent risk of these transfers."
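To make Hackett's list concrete for one format: a Word file in the newer .docx (Office Open XML) format is a ZIP archive, and properties such as the author, the last person to save it, and the creation and modification dates are stored in its `docProps/core.xml` part, with the company name in `docProps/app.xml`. The sketch below reads them with only the Python standard library. The function name `read_docx_metadata`, the choice of fields, and the sample values are illustrative assumptions; older binary .doc files store metadata differently and are not handled.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the OOXML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (part inside the ZIP archive, element to look for, label to report)
FIELDS = [
    ("docProps/core.xml", "dc:creator", "author"),
    ("docProps/core.xml", "cp:lastModifiedBy", "last_modified_by"),
    ("docProps/core.xml", "dcterms:created", "created"),
    ("docProps/core.xml", "dcterms:modified", "modified"),
    ("docProps/app.xml", "ep:Company", "company"),
]


def read_docx_metadata(data: bytes) -> dict[str, str]:
    """Return the identifying properties found in the bytes of a .docx file."""
    found: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        roots = {}  # parse each property part only once
        for part, path, label in FIELDS:
            if part not in names:
                continue
            if part not in roots:
                roots[part] = ET.fromstring(zf.read(part))
            elem = roots[part].find(path, NS)
            if elem is not None and elem.text:
                found[label] = elem.text
    return found


# Usage: build a tiny in-memory archive with hypothetical property values.
core_xml = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}" '
    f'xmlns:dcterms="{NS["dcterms"]}">'
    "<dc:creator>J. Analyst</dc:creator>"
    "<cp:lastModifiedBy>Agency Reviewer</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core_xml)
print(read_docx_metadata(buf.getvalue()))
# {'author': 'J. Analyst', 'last_modified_by': 'Agency Reviewer'}
```

Nothing in these properties is visible on the printed page, which is why a document can look clean while still naming the people who wrote and edited it.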
Software solutions, available from Workshare and other vendors, can both set document distribution policy and enforce it at the desktop and server levels. "The software looks at the content, scans it for violations, cures it of violations, warns the user of violations, and asks them if they are sure they still want to send the document," says Rutsky, describing Workshare Protect. Other metadata removal tools include Payne Group's Metadata Assistant and Microsoft's Remove Hidden Data add-in, which strips hidden data from Office documents before they are sent.
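A minimal sketch of the scan-and-warn cycle Rutsky describes might look like the following, assuming a .docx input. It is not how Workshare Protect or any other product works: the policy (flag tracked deletions and insertions, hidden text, comments, and a recorded author) and the names `scan_docx` and `ok_to_send` are hypothetical. The "cure" step is left out, because safely removing comments or tracked changes means editing several related parts of the file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Callable

# WordprocessingML namespace (Clark notation) and the Dublin Core "creator" element.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
DC_CREATOR = "{http://purl.org/dc/elements/1.1/}creator"


def scan_docx(data: bytes) -> list[str]:
    """List hidden-data findings in a .docx under a simplified, hypothetical policy."""
    findings = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        if "word/document.xml" in names:
            body = ET.fromstring(zf.read("word/document.xml"))
            # Text removed with change tracking on is kept in <w:delText> elements.
            deleted = [t.text or "" for t in body.iter(W + "delText")]
            if deleted:
                findings.append("tracked deletions still present: " + " | ".join(deleted))
            if any(True for _ in body.iter(W + "ins")):
                findings.append("tracked insertions still present")
            # <w:vanish/> marks runs formatted as hidden text.
            if any(True for _ in body.iter(W + "vanish")):
                findings.append("hidden text present")
        if "word/comments.xml" in names:
            comments = ET.fromstring(zf.read("word/comments.xml"))
            count = sum(1 for _ in comments.iter(W + "comment"))
            if count:
                findings.append(f"{count} comment(s) present")
        if "docProps/core.xml" in names:
            creator = ET.fromstring(zf.read("docProps/core.xml")).find(DC_CREATOR)
            if creator is not None and creator.text:
                findings.append(f"author recorded as {creator.text!r}")
    return findings


def ok_to_send(data: bytes, confirm: Callable[[list[str]], bool]) -> bool:
    """Warn-then-confirm gate: clean files pass; otherwise the user must confirm."""
    findings = scan_docx(data)
    return not findings or confirm(findings)


# Usage: a hypothetical document whose deleted sentence survives as a tracked change.
doc_xml = (
    f'<w:document xmlns:w="{W[1:-1]}"><w:body><w:p>'
    "<w:r><w:t>Approved text.</w:t></w:r>"
    '<w:del w:id="1" w:author="Reviewer"><w:r><w:delText>Original claim.</w:delText></w:r></w:del>'
    "</w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", doc_xml)
print(scan_docx(buf.getvalue()))  # ['tracked deletions still present: Original claim.']
print(ok_to_send(buf.getvalue(), confirm=lambda findings: False))  # False
```

In a desktop tool, `confirm` would be a dialog box; passing it in as a function keeps the policy check separate from the user interface.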
These kinds of tools may soon be seen as vital, because the disclosure of confidential facts can be calamitous and can expose organizations to incalculable risk. "If the metadata is not actually removed, but is only deleted from the document, the secret information could easily be exposed," says Paul Dalton, an attorney at the Dallas law firm of Cowles & Thompson, P.C.