Company: Sony Computer Entertainment America
A subsidiary of Sony Corporation of America, Sony Computer Entertainment America (SCEA) markets the PlayStation (PS) family of products and develops, publishes, markets, and distributes software for the PS 1 console and the PS 2 computer entertainment system. Additionally, SCEA has developed and currently manages a global online gaming network for its PlayStation 2 console. In the five years since the console's introduction, Sony has shipped more than 100 million PlayStation 2 units worldwide. www.us.playstation.com
Every month, PlayStation 2's online network draws millions of users from across a wide range of demographics, bringing together players of varying ages and cultural backgrounds. Once in the system, users create their own screen names, enter titles for their games, and communicate with fellow players via text messaging. All this text is potentially visible to the entire online community, so SCEA needs a way to monitor it and filter out anything vulgar or otherwise not family-friendly. "We want to work with teams that can help us protect our users from any of the normal Internet badness that you get on PCs," says Glen Van Datta, director of online technology, SCEA.
Vendor of Choice: Teragram Corporation
Founded in 1997, Teragram Corporation has grown to be a leader in multilingual natural language processing technologies. "The name of the company reflects our mission," says Yves Schabes, president and co-founder of Teragram. "Gram reflects something written down. Tera refers to a large scale. Ergo, we are a provider of linguistic technology that works at an extremely large scale."
Teragram's customers include such Web giants as Yahoo!, AOL, and Ask.com; major publishing companies and news organizations like the New York Times, Elsevier, and Forbes.com; and major corporations like HP, Toshiba, and SCEA. Teragram's business is split into two separate but interrelated halves. "The company has two assets. One is our dictionaries that we write and maintain in more than 30 languages. The other half is writing software that uses those dictionaries at a very large scale, using a lot of pattern-matching and trying to get at the meaning of the words," says Schabes. www.teragram.com
The Problem in Depth
PlayStation 2's online network gives its users the opportunity to play against others from across the country and around the globe, all from the comfort of their own homes. The reach of this network, combined with the console's broad and diverse customer base, brings together users of all ages and from many different cultures. To protect its younger users and provide a family-friendly experience, SCEA needed a way to filter out vulgarity in real time wherever users can submit text. "It's something we believe is important to ensure our community is safe and enjoyable," says Van Datta.
Implementing such a filter is easier said than done, though, especially with PlayStation 2's global footprint. "The big players are the U.S., which has essentially one language we deployed in; then Japan and Korea, which also only have one language each; and then there's Europe, where we had to encompass 21 different languages," says Van Datta.
There are also the pitfalls inherent in trying to identify what is and is not family-friendly, considering the slipperiness of any individual language. "The question of classifying if some interaction is family-friendly or not can be pretty tricky sometimes. A lot of words can be ambiguous. A lot of things that don't look to be family-friendly at first can turn out to be fine, and vice versa. Additionally, meanings can change all the time," says Van Datta. "XXX, for example, used to not be family-friendly, until a movie came out called XXX by Sony."
And if that weren't enough, there's also the linguistic creativity of users eager to route around anything they see as infringing on their ability to speak openly. "You'd be surprised how creative people can be in trying to circumvent certain words," says Van Datta. So it's not just a matter of filtering out certain words, but also of catching the many combinations of letters, numbers, and symbols that users string together to stand in for words that are decidedly not appropriate for the whole family.
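To make the circumvention problem concrete, here is a minimal Python sketch of one common approach: normalize each message before checking it against a word list. It is an illustration only, not a description of how Teragram or SCEA built their filter. The substitution table, the mild placeholder blocklist, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical table mapping look-alike digits and symbols to letters.
LOOKALIKES = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})


def normalize(text: str) -> str:
    """Reduce a message to a canonical lowercase letter string."""
    lowered = text.lower().translate(LOOKALIKES)
    # Drop spaces, punctuation, and anything else that is not a letter,
    # so "h.e.c.k" and "h e c k" look the same as "heck".
    letters_only = re.sub(r"[^a-z]", "", lowered)
    # Collapse runs of a repeated letter ("daaarn" becomes "darn").
    return re.sub(r"(.)\1+", r"\1", letters_only)


# Mild placeholder terms (an assumption for this sketch). They are
# normalized too, so entries with double letters still match.
BLOCKLIST = {normalize(word) for word in ("darn", "heck")}


def is_family_friendly(text: str) -> bool:
    """Return False if any blocked term appears in the normalized text."""
    normalized = normalize(text)
    return not any(term in normalized for term in BLOCKLIST)


if __name__ == "__main__":
    for message in ("good game", "you d4rn n00b", "h.e.c.k"):
        verdict = "ok" if is_family_friendly(message) else "blocked"
        print(f"{message!r}: {verdict}")
```

A filter this simple over-blocks as readily as it under-blocks: because it matches substrings after removing spaces, an innocent phrase that happens to contain a blocked letter sequence is rejected, and it has no notion of the shifting, context-dependent meanings Van Datta describes. It also handles only one alphabet, whereas the network spans more than 20 languages.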