Format Follows Function
Digital information comes in many formats; even what is traditionally thought of as digital assets, rather than content, covers a lot of file-extension ground. Audio clips might be in WAV files, or in MP3 files at different sampling rates or using different algorithms. Video files may be in MPEG, or AVI, or any number of formats, assuming the content isn't still on analog tape. Digital asset formats include Windows Media and RealNetworks, and related content might be in Microsoft Word, or Adobe Acrobat, or QuarkXPress, to name but a few. Anyone building a digital asset archive needs to determine which formats have the longest shelf life and how many formats to store. After all, a user retrieving a document may desire or require it in a different format.
"The formats just keep changing," says thePlatform's Olson, "and it's a lot of work to get things from one format to another. You want to think about formats carefully because you don't want to do it twice." For video, he says, the standard is probably television resolution. "The best quality you're going to get is analog TV, at 640 x 480 pixels, at a certain color depth. You should store TV-quality video that you can later convert down into lesser-quality files without paying a big penalty for doing that later." Also, he argues, pick something that's close to lossless.
Previously, he says, the video standard was an AVI or MPEG-2 master file format. "That's no longer true. You can use some pretty high bit-rate compressed formats, like Real or Microsoft, and this looks as good as I ever need it to get on a TV." Eventually, he predicts, all streaming video will be at TV quality, "even if that's 10 or 15 years from now." Depending on the system architecture, it may be advantageous to store multiple versions in the archive, so that they can be retrieved and delivered without complex real-time translation processes.
"What is the cost of storing things in multiple formats versus the cost of building an infrastructure to convert it after the fact?" asks Olson. "Say you're trying to do it live, in real time. What makes sense is to pick a format at the high end of the quality range. It might be Windows Media at three megabits, or it might be Real—but you have to pick one."
Or, he says, it might make sense to pick a common denominator. "You might decide to store in high bit-rate MPEG-4, even though it's an emerging standard, because it might be easier to convert from MPEG-4 to Windows Media or Real than from Windows or Real to each other."
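Olson's common-denominator argument can be put in rough numbers. The sketch below is only an illustration, not anything thePlatform describes: it assumes one converter is needed per source-and-target pair, and the function names are hypothetical. It compares converting between every pair of formats with converting outward from a single stored master.

```python
def pairwise_converters(formats):
    """Count one-way converters needed if any format may be turned into any other."""
    n = len(formats)
    return n * (n - 1)


def hub_conversion_plan(hub, formats):
    """List the one-way conversions needed when only the hub format is stored."""
    if hub not in formats:
        raise ValueError(f"{hub!r} is not one of the formats")
    return [(hub, target) for target in formats if target != hub]


if __name__ == "__main__":
    formats = ["MPEG-4", "Windows Media", "Real"]
    print(pairwise_converters(formats))            # converters for every pair
    print(hub_conversion_plan("MPEG-4", formats))  # converters from one master
```

With three formats the gap is modest, six converters against two, but it widens quickly as delivery formats proliferate.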
Isilon's Goodwin is targeting even higher-resolution formats than TV, which need even bigger storage arrays. "An archive might comprise anything from uncompressed HDTV, which needs 1.5 gigabits per second, or compressed in MPEG-2, at up to 40 megabits per second. And then you have all the streaming formats…you have a proliferation of formats, and you have a growth in the number of distribution points." That's a lot of storage, potentially duplicated for disaster recovery and geographical dispersal to multiple sites.
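To get a feel for what those bit rates mean on disk, a back-of-the-envelope calculation helps. The sketch below is a rough estimate rather than a sizing tool: it assumes a constant bit rate, decimal gigabytes, and no container or file-system overhead, and the function name and the `copies` parameter (standing in for disaster-recovery duplicates) are illustrative.

```python
def archive_size_gb(bit_rate_bps, hours, copies=1):
    """Estimate storage in decimal gigabytes for constant-bit-rate video."""
    seconds = hours * 3600
    total_bytes = bit_rate_bps * seconds / 8  # 8 bits per byte
    return copies * total_bytes / 1e9


if __name__ == "__main__":
    # One hour of uncompressed HDTV at 1.5 gigabits per second.
    print(archive_size_gb(1_500_000_000, 1))
    # The same hour mirrored to a second site.
    print(archive_size_gb(1_500_000_000, 1, copies=2))
    # One hour of a three-megabit streaming file.
    print(archive_size_gb(3_000_000, 1))
```

Under those assumptions, an hour of uncompressed HDTV comes to 675 gigabytes per copy, against 1.35 gigabytes for an hour of three-megabit streaming video.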
Finding Your Assets
No company wants its digital asset archive to be a roach motel, where assets check in but don't check out. While it's essential to have an efficient way of identifying rich media assets and storing them in the appropriate archive, it's perhaps more important to have efficient tools for users to search and retrieve the information. That would imply a security mechanism and access controls, version controls, and logging.
"All too often, this information is sitting on network file shares. It's not properly archived, it's not properly tagged, there's no way for publishers to request the appropriate rendition. It's a huge inefficiency," says Interwoven's Cochran. "And how do you know who is using what, when, why, and how? How can you ensure that the proper image is being used, across hundreds of applications or thousands of people?"
In fact, Cochran argues, with a typical media company content management system, "Eighty percent of the pain is with their digital assets, and twenty percent is with their ordinary business documents and the structured information they're serving up through their Web applications."
Depending on the intended use of that information, the digital archive may also require real-time format converters, interfaces to different delivery systems (such as Web pages, email servers, or streaming media servers), intellectual property license management systems, and even billing or charge-back accounting mechanisms.
The use of such archived content, to create new material or as part of a production process, can impact workflow, says NXN's Schumacher. "The broader solutions partly involve the efficient management of content and the effective collaboration of teams during the production process," he says. "If programmers and artists can find files faster, access older versions quickly, communicate with each other simply and get management approvals rapidly, then production bottlenecks can be minimized and productivity can improve. This will not only help teams deliver their content on time, but it will save money in the process."
Cochran's position is that, "If you look at someone like a Disney or a CNN, they need to make sure they have an excellent, proper archive of all their digital assets, and an understanding of how to monetize those assets. Once you look at monetizing those assets, it's the same problem as faced by an enterprise: If you're using those assets, how do you know where the asset is being used? How can you track it, and ensure that access can be revoked if the asset is updated or if the subscription has expired?"
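Cochran's questions about tracking and revocation map onto a simple bookkeeping structure. The sketch below is a hypothetical illustration, not Interwoven's design: the `UsageLedger` and `Grant` names, their fields, and the rule that a grant lapses when a newer version is published are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Grant:
    user: str
    property_name: str   # where the asset is being used
    asset_version: int   # version that was current when access was granted
    expires: date


@dataclass
class UsageLedger:
    current_version: dict = field(default_factory=dict)  # asset id -> latest version
    grants: dict = field(default_factory=dict)           # asset id -> list of Grant

    def publish(self, asset_id, version):
        """Record a new current version; grants for older versions stop matching."""
        self.current_version[asset_id] = version

    def grant(self, asset_id, user, property_name, expires):
        """Give a user access to the current version until the expiry date."""
        g = Grant(user, property_name, self.current_version[asset_id], expires)
        self.grants.setdefault(asset_id, []).append(g)
        return g

    def where_used(self, asset_id):
        """Answer 'who is using this asset, and where?'"""
        return [(g.user, g.property_name) for g in self.grants.get(asset_id, [])]

    def may_use(self, asset_id, user, today):
        """Access holds only for an unexpired grant on the current version."""
        return any(
            g.user == user
            and g.expires >= today
            and g.asset_version == self.current_version[asset_id]
            for g in self.grants.get(asset_id, [])
        )
```

A delivery system would call `may_use` before serving an asset, so an expired subscription or an updated image cuts off access without anyone hunting down individual copies.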
The type of asset should also be taken into consideration, says Cochran, as no one size fits all. "There are different technologies that you need to apply. The services that you need for a Photoshop file, where a user might request a different rendition, are very different from the services you need to apply to a movie feed, where you might need to make specific clips available for use in another property."
Architecture and Implementation
Once the requirements for the digital asset archive have been determined (what to store, what to retrieve), architectural issues come to the fore. There are several potential designs of an asset system. One approach is centralizing all content into a single relational database; that would be appropriate when there is a large number of relatively small assets. However, this approach may prove inefficient when dealing with multi-megabyte or multi-gigabyte files. Another design would store the digital assets as separate files on a file server, using a content management system to provide access controls and searching mechanisms for those files.
When making those decisions, argues thePlatform's Olson, rich media managers shouldn't fall into short-term thinking. "Define your goals in terms of the 20-, 30-, 40-year objectives for your company—not for the goal of getting a good bonus this year."
Interwoven's Cochran says, "You need to start with a secure archive, a secure vault, which can store all the versions, and manage all the metadata. The archive needs proper indexing facilities, which vary according to the different physical asset types. You also need transformation utilities, which are also dependent on the physical asset types."
Once the architecture has been determined, it's time to spend money, not only on the software infrastructure, but also on physical storage systems. Many vendors, ranging from Dell to EMC to IBM to Network Appliance to Sun, offer terabyte-scale storage systems, with fault-tolerant hardware, fast disk access, and often, integrated backup systems.
"You need to be very concerned about the hardware," argues Interwoven's Cochran. "You're going to be reading and writing large blocks of data to disk, and you need to have sufficient disk storage, and fast enough disk, to accommodate the archive over time. The repository is going to be massive: You're talking terabytes upon terabytes of disk storage. For something like Discovery Online, you might start out with 8 terabytes." Smaller companies, he says, might require only a single terabyte of disk space during an initial deployment.
Cochran recommends, "You might think about a nice, big server machine that sits in your glass house, in your data center, with professional backup, that will accommodate growth over time. You also have to make sure that you have a solid relational database that can store and manage all of the business objects—all of the assets themselves, with their actual metadata. Maybe multiple databases, if you're geographically distributed, that are synchronized with each other."
The physical assets could also be stored natively on the storage device's file system, with the database and content management system handling links and providing access to those files—that's a better approach, he says, with big digital files that are larger than a megabyte in size. There may also be a metadata model that links together all the storage systems within the enterprise, and that is separate from the actual archive of the digital assets, he says.
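The split between database and file system can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not a description of any vendor's product: it uses Python's built-in SQLite module as a stand-in for a production relational database, the table layout and function names are hypothetical, and the one-megabyte cutoff simply echoes the rule of thumb above.

```python
import hashlib
import sqlite3
from pathlib import Path

INLINE_LIMIT = 1_000_000  # roughly one megabyte; larger assets go to the file system


def open_catalog(db_path=":memory:"):
    """Create (or open) the catalog database that holds metadata and links."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS assets (
               id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               format TEXT NOT NULL,
               sha256 TEXT NOT NULL,
               inline_data BLOB,   -- small assets live in the database
               file_path TEXT      -- large assets are linked by path
           )"""
    )
    return conn


def store_asset(conn, vault_dir, name, fmt, data):
    """Store small assets in the database, large ones as files in the vault."""
    digest = hashlib.sha256(data).hexdigest()
    if len(data) <= INLINE_LIMIT:
        row = (name, fmt, digest, data, None)
    else:
        target = Path(vault_dir) / f"{digest}.{fmt}"
        target.write_bytes(data)
        row = (name, fmt, digest, None, str(target))
    cur = conn.execute(
        "INSERT INTO assets (name, format, sha256, inline_data, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        row,
    )
    conn.commit()
    return cur.lastrowid


def load_asset(conn, asset_id):
    """Return an asset's bytes, following the file link when there is one."""
    row = conn.execute(
        "SELECT inline_data, file_path FROM assets WHERE id = ?", (asset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(asset_id)
    inline_data, file_path = row
    return inline_data if inline_data is not None else Path(file_path).read_bytes()
```

Callers never need to know where the bytes live; the catalog resolves that. The stored checksum is one simple way to detect a linked file that has been altered or gone missing.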
About the asset management software itself: businesses can choose to purchase an off-the-shelf solution, like those sold by Interwoven and thePlatform, or build one themselves. Isilon's Goodwin says that among his company's clients, about half go the homegrown route. "They just can't always find a commercial product that fits their needs," he says.
For the file server itself, Cochran says, "You can sit on a Network Appliance box, or sit on an EMC Symmetrix, and get really rapid file I/O (input/output)." Network Appliance Inc. sells freestanding network-attached storage appliances; EMC Corp.'s Symmetrix is a high-end storage platform that connects to servers or mainframes. Other major players in the storage arena include Dell Computer Corp., IBM Corp., StorageTek (Storage Technology Corp.), and Sun Microsystems Inc., as well as upstarts like Isilon.
Between storage arrays, content management systems, distribution systems, license and rights management systems, and workflow and versioning tools, the issues involved with digital asset archiving are many, but the benefits are many as well. Between the value of the assets for internal operations, and the potential for monetizing new applications of assets, a digital asset archive is worth designing and implementing properly. Think in terms of decades, not today's latest technological innovation, and your storage archive will be serving your business for years to come.