The key, then, is to provide authorized clinicians with precisely the information they need, when they need it—but only the precise information they need, so that privacy is not compromised. In an environment such as CareGroup, which deals with 40 terabytes of patient information, such careful handling requires a team of 16 data analysts who provide the necessary views, reports, and query tools for the clinicians to use. Depending on the nature of the query, some reports would need to be stripped of identifying patient information, for example, and others might need to generalize the results so no specific patient information could be inferred. In addition, Halamka emphasized the tools "need to recognize roles and rights based on clinical needs." A query that is appropriate for one clinician to perform may not be appropriate for another. Halamka noted an emergency room doctor might need ready access to a broad set of patient information. The tools, Halamka continued, "should allow you to do your job while also protecting the patient."
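As a rough illustration of tools that "recognize roles and rights," the sketch below filters a patient record by role and generalizes an exact age into a band. The role names, field names, and the `view_for` and `age_band` helpers are assumptions made for this example; they do not describe CareGroup's actual systems.

```python
# Hypothetical role-to-field policy; role and field names are illustrative only.
ROLE_FIELDS = {
    "emergency_physician": {"name", "allergies", "medications", "diagnoses"},
    "researcher": {"diagnoses", "age_band"},  # no direct identifiers
}


def age_band(age, width=10):
    """Generalize an exact age into a range such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def view_for(role, record):
    """Return only the fields the given role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    # Add the generalized age so roles can be granted the band without the exact age.
    enriched = dict(record, age_band=age_band(record["age"]))
    return {key: value for key, value in enriched.items() if key in allowed}


record = {
    "name": "Jane Doe",
    "age": 47,
    "allergies": ["penicillin"],
    "medications": ["lisinopril"],
    "diagnoses": ["hypertension"],
}
print(view_for("researcher", record))
```

In this sketch the researcher sees a diagnosis and an age range but no name, while the emergency physician sees the broader clinical record.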
Is Today's Technology Up to the Task?
Given the complexity of maintaining patient privacy in an increasingly digital world, it's reasonable to ask if the technology can support the requirement for privacy while also giving clinicians access to the information they need. Practitioners like Halamka would answer in the affirmative—"We do our very best with the tools we have"—but HIPAA compliance comes at a cost. (The Department of Health and Human Services estimates it will cost the industry $17 billion over ten years to implement the HIPAA privacy regulations.)
Some of the cost of HIPAA compliance is the human cost for the data curation work done at places like CareGroup. Other costs come as organizations integrate privacy software with patient record systems. At least one interested party, though, thinks the eventual solution to the patient privacy issue may involve a new approach to database technology itself. Researchers at IBM's Almaden Research Center in San Jose have been developing the technology behind Hippocratic Databases—databases that, according to IBM Fellow Dr. Rakesh Agrawal, support the primary mission of patient care while taking "responsibility for the data that they manage to prevent disclosure of private information."
Agrawal is widely recognized as a leading thinker in the field of datamining—the discovery of useful knowledge previously hidden in massive amounts of raw data—and has been writing about privacy issues for several years. Agrawal's idea of Hippocratic Databases presumes a system where "contracts" are created between databases and users to ensure the privacy and integrity of data. "This contract system is based on 10 principles," notes Agrawal, "including stipulations that the information will be kept accurate and up-to-date, the data is used solely for what it was specifically collected for, and the data is only retained for as long as it is needed."
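Two of the stipulations Agrawal mentions, use only for the purpose of collection and retention only as long as needed, can be pictured as checks made on every read. The following is a minimal sketch under that assumption; the `StoredItem` class, the `read` function, and the purpose names are hypothetical and are not taken from the IBM design.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StoredItem:
    value: str
    purposes: frozenset      # purposes the data was collected for
    collected_on: date
    retention: timedelta     # how long the data may be kept


def read(item, purpose, today):
    """Release the value only for a consented purpose and within retention."""
    if purpose not in item.purposes:
        raise PermissionError(f"data was not collected for '{purpose}'")
    if today > item.collected_on + item.retention:
        raise LookupError("retention period has expired")
    return item.value


item = StoredItem(
    value="blood type O+",
    purposes=frozenset({"treatment"}),
    collected_on=date(2024, 1, 1),
    retention=timedelta(days=365),
)
print(read(item, "treatment", date(2024, 6, 1)))
```

A request for the same item under a purpose such as "marketing", or after the retention window has passed, raises an error instead of returning data.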
First, Do No Harm
"Whatever, in connection with my professional practice or not, in connection with it, I see or hear, in the life of men, which ought not to be spoken of abroad, I will not divulge, as reckoning that all such should be kept secret." —Hippocratic Oath
Agrawal's interest in privacy and databases stems from his long and serious work in datamining. At various times, datamining has been viewed as problematic because of potential privacy concerns, and the topic has been frequently discussed at conferences where Agrawal was a speaker. Attending a conference in 1995, Agrawal was struck by a question from the audience, "Can technologists change the attitude that we are not responsible for the consequences of technology?" Agrawal admits, "the question stuck with me," and it motivated him to keep thinking about this issue of privacy. In August 2002, Agrawal and several colleagues from IBM presented a paper, "Hippocratic Databases," at the 28th Annual VLDB Conference in Hong Kong.
"We saw it as a call to the industry," said Agrawal, and the paper's introduction said, "We suggest that the database community has an opportunity to play a central role in this crucial debate involving the most cherished of human freedoms by re-architecting our database systems to include responsibility for the privacy of data as a fundamental tenet." And while patient record information is the most obvious and important problem, Agrawal is well aware that privacy extends to many other areas—finance immediately comes to mind. "Five years from now," according to Agrawal, "information about animate things in databases will completely dwarf information about inanimate things." Moreover, Agrawal suggests the logic of managing this animate information is very different, and privacy is just one issue that presents technical challenges to today's databases.
The challenges begin with how privacy clashes with some of the fundamental benefits of a traditional database, such as concurrency and recall. Databases are very good at capturing and committing records, and then immediately making these records available in views, query results, and reports. But, as Agrawal suggests, Hippocratic databases likely require more emphasis on "consented sharing" than on concurrency.
There are database technologies in use today that support privacy, but Agrawal would argue that they either don't go far enough or they don't support the kind of use cases that Hippocratic databases require. Medical researchers, for example, rely on statistical databases to provide meaningful answers to statistical questions (average, maximum, minimum, etc.) without compromising sensitive information about individuals. Statistical databases use techniques such as restricting types of queries and "data perturbation"—where noise is added or selected values are swapped. While Hippocratic databases would benefit from some of these statistical techniques, Agrawal and his colleagues point out that Hippocratic databases will need to support a much broader set of queries and usage.
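The two statistical-database techniques mentioned above can be sketched in a few lines: a query restriction that refuses aggregates over very small groups, and data perturbation that adds bounded random noise to each value before averaging. The threshold, the noise range, and the `perturbed_mean` function are assumptions chosen for illustration, not parameters from any particular product.

```python
import random
from statistics import mean

MIN_QUERY_SET_SIZE = 5  # hypothetical threshold, chosen only for illustration


def perturbed_mean(values, noise=1.0, rng=None):
    """Answer an average query under two simple protections."""
    # Query restriction: refuse aggregates over groups small enough
    # to reveal an individual's value.
    if len(values) < MIN_QUERY_SET_SIZE:
        raise ValueError("query set too small to answer safely")
    rng = rng or random.Random()
    # Data perturbation: add bounded uniform noise to every value.
    noisy = [v + rng.uniform(-noise, noise) for v in values]
    return mean(noisy)


ages = [34, 41, 47, 52, 58, 63]
print(perturbed_mean(ages, noise=2.0))
```

Because each value moves by at most `noise`, the reported average stays within `noise` of the true average, which keeps the statistic useful while blurring individual values.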
Security and encryption technologies are also increasingly in use with databases. Agrawal notes that databases can apply multiple levels of security to database items—e.g., top secret, secret, confidential, and so forth. To date, though, these techniques have been implemented in ways that can make query results uneven or inaccurate—a "top secret" query could leave "confidential" records unreported, for example. "Many of our architectural ideas about Hippocratic databases have been inspired by this [security] work," wrote Agrawal and his colleagues.
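To make the idea of multilevel security concrete, the sketch below labels each record with a level and returns only the records at or below the reader's clearance. The "read at or below your clearance" rule, the `Level` enum, and the `visible` function are simplifying assumptions for this example; the article does not specify how any given product implements its levels.

```python
from enum import IntEnum


class Level(IntEnum):
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def visible(rows, clearance):
    """Return the payloads of rows labeled at or below the given clearance."""
    return [payload for label, payload in rows if label <= clearance]


rows = [
    (Level.CONFIDENTIAL, "appointment history"),
    (Level.SECRET, "lab results"),
    (Level.TOP_SECRET, "psychiatric notes"),
]
print(visible(rows, Level.SECRET))
```

The same query returns a different number of rows for each clearance, which hints at why counts and other aggregates can come out uneven across users.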