To implement a family-friendly filter that was both robust and capable of working in near-real time, SCEA turned to Teragram. "We gave them a basic classification engine and also some initial content and rules for some languages. Sony then took our toolkit and built on top of that," says Schabes. "What we provide is the technology. We don't really suggest any possible use for it. It's completely up to our customer to decide what they're going to do."
For SCEA, that meant taking Teragram's software and installing it across its network of servers. "We have game servers that are used for matching players between each other. Even if the game is itself a peer-to-peer game, you have to come to our servers to do some matching," says Van Datta. "In those servers we've integrated in the Teragram software. For every message that comes to the server that's in text form, we then look and see if there's something in there that's some kind of text that could be considered vulgar."
When Teragram's software identifies a vulgar word or phrase, it can then remove the offensive term or the entire text thread. "We can either reject the entire message or we can reject just the word and replace it with something else," says Van Datta. "We replace that text with asterisks." But the level of filtering that Teragram's software is capable of goes beyond simply identifying a single offensive word; it can even recognize how words are being used in context. "It's very sophisticated. You can't use the words girl and sex in the same sentence, but you can say girl and sex. You can't use 69 unless there's a reference to a car in there. The Teragram software allows you to set rules like that up," says Van Datta.
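The kinds of rules Van Datta describes can be pictured with a short Python sketch. It is not Teragram's engine or rule syntax, which the article does not show; the rule tables, the placeholder word "darn," the car-related context words, and the function names are all assumptions made for illustration. It covers the three behaviors mentioned above: masking a term with asterisks, rejecting the whole message, and flagging words only in a given context.

```python
import re

# Hypothetical rule tables for illustration only; not Teragram's actual format.
BLOCKED_WORDS = {"darn"}                       # always masked (mild placeholder)
BLOCKED_TOGETHER = [{"girl", "sex"}]           # masked only when in the same sentence
ALLOWED_WITH_CONTEXT = {"69": {"car", "camaro", "mustang"}}  # masked unless context present

WORD_RE = re.compile(r"[A-Za-z0-9']+")
# A sentence is text up to and including its end punctuation, or a trailing fragment.
SENTENCE_RE = re.compile(r"[^.!?]*[.!?]+|[^.!?]+")


def words_to_mask(sentence):
    """Return the lowercase words in one sentence that the rules flag."""
    tokens = {t.lower() for t in WORD_RE.findall(sentence)}
    flagged = tokens & BLOCKED_WORDS
    for group in BLOCKED_TOGETHER:
        if group <= tokens:                    # every word of the group is present
            flagged |= group
    for term, context in ALLOWED_WITH_CONTEXT.items():
        if term in tokens and not (tokens & context):
            flagged.add(term)
    return flagged


def mask_sentence(sentence, flagged):
    """Replace each flagged word with asterisks of the same length."""
    def mask(match):
        word = match.group(0)
        return "*" * len(word) if word.lower() in flagged else word
    return WORD_RE.sub(mask, sentence)


def filter_message(text, reject_whole=False):
    """Mask flagged words, or return None to reject the message outright."""
    parts = []
    any_flagged = False
    for sentence in SENTENCE_RE.findall(text):
        flagged = words_to_mask(sentence)
        any_flagged = any_flagged or bool(flagged)
        parts.append(mask_sentence(sentence, flagged))
    if any_flagged and reject_whole:
        return None
    return "".join(parts)
```

Keeping the rules in plain data tables, separate from the matching code, reflects the workflow described in the article, where SCEA's staff add and adjust rules over time without rebuilding the engine.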
Teragram's robust set of dictionaries allows SCEA to accommodate the cultural differences that can exist between how one country perceives a word versus another. "One of the beauties of Teragram's solution is that we can put any kind of language in the same dictionary, so if one word is vulgar in the UK but not in the U.S. we have a vulgarity dictionary that can distinguish between the two and react based on what country a particular user is playing in," says Van Datta.
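A single dictionary that reacts differently by country could look something like the following Python sketch. The entries, the country codes, and the function name are hypothetical, chosen only to illustrate the idea of tagging each term with the countries where it is treated as vulgar; the article does not describe how Teragram actually stores this information.

```python
import re

# Hypothetical shared dictionary: each term maps to the set of country codes
# in which it is treated as vulgar. Entries are illustrative only.
VULGARITY_DICTIONARY = {
    "bloody": {"GB"},          # flagged for UK players only
    "darn": {"GB", "US"},      # flagged in both countries
}

WORD_RE = re.compile(r"[A-Za-z']+")


def mask_for_country(text, country):
    """Mask only the words that the dictionary flags for this player's country."""
    def mask(match):
        word = match.group(0)
        flagged_in = VULGARITY_DICTIONARY.get(word.lower(), set())
        return "*" * len(word) if country in flagged_in else word
    return WORD_RE.sub(mask, text)
```

With these sample entries, the same message passes through unchanged for a U.S. player but has one word masked for a UK player, which is the behavior Van Datta describes.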
Teragram's software can also be set up so that it is adaptive, helping SCEA to identify new uses of formerly safe words in vulgar ways. "We basically have set some procedures that key us in on if somebody's using a new vulgarity in a new way. So once we've identified that, we can go in and change our rules to address that," says Van Datta.
By all accounts, SCEA's implementation of Teragram's software has been an unqualified success. "We've done some real number crunching on the Teragram stuff to figure out if we should do it ourselves or buy third-party software. In this case, Teragram's software is literally taking nanoseconds to do this filtering in the servers," says Van Datta. "It's the fastest, most efficient thing we've found. They're definitely doing everything we want and more. They're actually the only third-party software in our network; we've designed everything else completely from the ground up."
As language is a constantly evolving thing, it's up to SCEA to keep on top of this evolution to ensure the protection of its users. "The ongoing part of this is, are we doing an effective job of matching what we're doing to keep up with how the communities and our users are trying to circumvent it?" says Van Datta. "We started off with about 20 rules. We're up to 45 now. I'm pretty sure we've probably tripled or quadrupled our actual words or combination of words in our dictionary. I'm sure we don't catch 100%, but it's a constant thing to try and keep up."
But while SCEA's filters are currently running 100% of the time, there is still a major loophole in the company's ability to maintain a family-friendly environment. "If you want to do something that circumvents the filter, it'd be doing something with voice," says Van Datta of the online network's voice chat capabilities. "If you want to speak vulgarities in voice, that's something we can't filter."
Teragram's plans for its software and dictionaries, while focused to date primarily on large companies with huge amounts of data and documents to sort through, do include expanding to the masses. "That's the next step, having this technology trickle down to the consumer level, and before that medium-sized businesses," says Schabes. "There is a trend towards an exponential increase in information at all levels everywhere, so the need for technology such as ours to sort through and categorize that information will become acute in the coming years."