Pssst! Anybody Listening? Handheld Audio May Be the Next Big Thing

Page 2 of 2


Informio: VoiceXML
Informio, "a wireless Web infrastructure services company," builds custom applications to give mobile professionals access to critical business data over a cell phone or wireless device. MUPpies (that's Mobile Urban Professionals) can use voice to interact with their own password-protected data—to update databases, input new sales figures, delete old sales leads, etc. Based on VoiceXML, such applications are geared for enterprises with large workforces who use sales force automation, customer relationship management (CRM), and other internal databases—think pharmaceuticals, financial firms, insurance, healthcare.

Informio's proprietary Unified Media Browser, a voice version of an Internet browser, supports VoiceXML as the interface for all content and applications. (VoiceXML 1.0 was adopted as the basis for development of a W3C dialog markup language in May 2000.) Informio partners with Nuance for speech recognition technology, and recently switched to SpeechWorks' Speechify for text-to-speech (TTS).
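
For readers who have not seen VoiceXML, here is a minimal sketch of what a voice dialog of this kind might look like, generated from Python for convenience. It is not Informio's code: the form name, the field name "amount", the prompt wording, and the submit URL are all hypothetical, chosen only to illustrate the "input new sales figures" scenario described above. The elements used (vxml, form, field, prompt, filled, submit) come from the VoiceXML 1.0 specification.

```python
import xml.etree.ElementTree as ET


def build_sales_dialog(submit_url: str) -> str:
    """Return a small VoiceXML 1.0 document as a string (illustrative only)."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id="update_sales")  # hypothetical form id

    # A field collects one spoken (or touchtone) value from the caller.
    # type="number" asks the voice browser to use its built-in number grammar.
    field = ET.SubElement(form, "field", name="amount", type="number")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Say the new sales figure."

    # Once the field is filled, confirm aloud and send the value to the server.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "Thank you. Updating the database."
    ET.SubElement(filled, "submit", next=submit_url, namelist="amount")

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    # The URL is a placeholder, not a real endpoint.
    print(build_sales_dialog("http://example.invalid/update"))
```

A voice browser would read the first prompt to the caller, listen for a number, and then post it to the server, in the same way a visual browser renders a form and submits it.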

An innovative and soon-to-be-everywhere service—mobile audio email—allows users to dial in and use their voice or a touchtone command to listen to email messages as streaming audio, MP3, or .wav files. Coming soon: the ability to hear attachments (but for long documents, there'll be an option to forward them to a printer or fax machine, too).

Mark Lowenstein, Informio's chief industry analyst, postulates, "Voice is complementary to what's happening in the data world; in some ways, voice addresses the shortcomings of text-based devices. Screen scraping is not too successful… we're trying to avoid some of those interface issues with audio. The Nirvana we're focused on is the two working hand in hand. But certain things have to fall in place first—the devices and the networks have to be brought along."

Listen While You Work, Work Out, Go to Work
The more common solution for "making audio," particularly for the consumer market, is radio-quality recordings of a human voice in a sound studio. Audible Inc. is one of the leaders in this genre, offering pay-per-download audiobooks, lectures, public radio programs, newspapers, comedy skits, and much more. Audio titles are then played back for listening either from a PC or a variety of mobile, Audible-ready devices (hands-free accessories such as headphones or a car kit can be used while you're commuting, exercising, or traveling).

Audible's content partners include more than 160 audiobook publishers, broadcasters, magazine and newspaper publishers, business information providers, and educational and cultural institutions. Says David Simpson, Audible director of business development, "We receive both analog and digital content (from our content partners) and produce it as human voice—we have studios in our offices with voice talent for recordings. The process is editing, encoding, compression, and encryption."

Although most of Audible's usage falls into the mobile, dock-and-go category, that will change with wireless. Future Pocket PCs will be able to get automatic updates for industry-specific content. Simpson explains the technical details: "To deliver wireless, we create a quarter-size HTML Web page that links to a Web server to deliver the audio file on a Pocket PC with an Audible player. There would be an encrypter/decrypter in the Pocket PC, delivered in some mutually agreeable Codec (COder-DECoder), transcoded over a TCP/IP connection into the RAM of the receiver device, played back with a player device."

Is Audible considering text-to-voice software? "Today, the only way to go is professionally rendered human speech. We think that's what people want to listen to. We're not doing anything with text-to-speech, and nothing is on the drawing board until it gets better."

The Future: Personalized and Multimodal
What can we expect on the wireless horizon? Look for increased personalization, like customized audio portals and Internet radio (with targeted ads, of course), and mobile, personal productivity tools—voice-enabled email, calendar, to-do lists, etc.

And the ultimate wireless vision (at least until The Next Big Thing) is "multimodal" applications. This is where voice and keypad (text) input are combined with audio and visual output, so you could, say, read a WAP message and then tap to initiate a phone call. And it would be contextually sensitive, too, so at your desktop you might read email as text, but while driving, you'd choose to have a voice- and audio-centric experience.
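
The context-sensitive part of that vision can be sketched in a few lines. This is only an illustration of the idea, not any vendor's design: the Context fields and the choose_modes function are hypothetical names, and a real device would weigh many more signals than these two.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of the user's situation."""
    driving: bool
    has_screen: bool


def choose_modes(ctx: Context) -> dict:
    """Pick input and output modes for the current context."""
    # Eyes and hands are busy, or there is no display: go voice and audio.
    if ctx.driving or not ctx.has_screen:
        return {"input": "voice", "output": "audio"}
    # Otherwise fall back to the familiar keypad-and-text experience.
    return {"input": "keypad", "output": "text"}


if __name__ == "__main__":
    print(choose_modes(Context(driving=False, has_screen=True)))  # at the desktop
    print(choose_modes(Context(driving=True, has_screen=True)))   # in the car
```

The same email message would then be shown as text in the first case and read aloud in the second.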

So should we all rush out and get second mortgages to invest in wireless audio? Well, no, not just yet. There are still hurdles to overcome. Standards for hardware devices, user interfaces, and programming languages need to be worked out so developers can commit to applications without risk of writing appliance-specific interfaces for a device that might not be on the market tomorrow. Currently, a fragmentation of standards for wireless multimedia delivery makes that risky.

Also, upgrading to 3G (third generation) wireless networks to support high-speed, high-capacity audio and video is proving more arduous than imagined. Combined voice and data are what's needed for the wireless environment, but in the U.S. at least, the path is littered with standards, regulatory issues, and the question of profitability. It will certainly happen, but the question is how soon. Many operators, handset manufacturers, and equipment manufacturers will have trouble surviving until the boom times start—by which time, the industry will probably be dominated by fewer, consolidated players ("3G Mobile: a Booming Industry, One Day, Maybe." Commentwire by Datamonitor, http://, April 27, 2001).

The Wall Street Journal's Jessica Perry agrees. "The jury's out on the willingness to pay for any of this stuff. Some of these companies are going to have difficulty staying viable, but there's a place for all media in convergence…it still has to be figured out."
