Language Analysis Systems: Intelligence by Any Name

Page 2 of 2

Who Are You?
For more than 20 years, LAS has made a name for itself by eschewing the traditional string-matching method of name recognition used by most search engines in favor of a knowledge-based, object-oriented approach that deconstructs names to reveal ethnic origins and attributes such as the proper order of a given name, gender, marital status, literal meanings, and relationships to other names. "Names vary depending on the culture from which they originate," says Hermansen. "It doesn't take much to realize that if you only have one way of searching, some types of names are going to be overlooked. You can't cover all of the variations with a single search approach."
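To make the contrast concrete, here is a minimal Python sketch of how a plain string comparison overlooks names that a variant-aware comparison would catch. It is an illustration only and not LAS's method: the function names, the tiny VARIANTS table, and the decision to ignore word order are all assumptions made for the example, and a real system would rely on far richer, culture-specific knowledge (including the proper order of a name, which this sketch deliberately discards).

```python
import unicodedata

# Hypothetical variant table for illustration only; real coverage would
# require large, culture-specific reference data.
VARIANTS = {
    "mohammed": "muhammad",
    "mohamed": "muhammad",
    "muhammed": "muhammad",
}


def strip_diacritics(text: str) -> str:
    """Drop combining marks so that 'José' and 'Jose' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name: str) -> frozenset:
    """Build a case-, accent-, and order-insensitive key for a name."""
    tokens = strip_diacritics(name).lower().replace(",", " ").split()
    return frozenset(VARIANTS.get(token, token) for token in tokens)


def exact_match(a: str, b: str) -> bool:
    """The single-approach baseline: literal string equality."""
    return a == b


def loose_match(a: str, b: str) -> bool:
    """Compare normalized keys instead of raw strings."""
    return name_key(a) == name_key(b)


pairs = [
    ("José García", "Jose Garcia"),                # accents dropped in one record
    ("Zhang Wei", "Wei, Zhang"),                   # family name first vs. last
    ("Mohammed Al-Rashid", "Muhammad al-Rashid"),  # transliteration variant
]
for left, right in pairs:
    print(left, "|", right, "|", exact_match(left, right), loose_match(left, right))
```

Each pair fails the exact comparison and passes the normalized one. Treating names as unordered sets of tokens is a crude shortcut that can produce false matches, which is one reason a single generic rule cannot stand in for knowledge about each naming culture.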

LAS says its holistic approach to searches sets it apart from competing name-recognition tools: "Our name-recognition software addresses the quality of the database both to analyze it and to remediate it," Hermansen explains. "The user can type in a name and receive immediate results telling them what culture the name is from, whether it's in the right order, the gender of the name, probable variations of the name in frequency order, and the countries in which that name is most likely to be found."

"Conducting effective name-matching involves three elements," Hermansen continues. "First, you have to understand the quality and characteristics of the database from which a search originates. You have to know what's in there that needs to be accounted for. A search algorithm cannot be built unless you first understand what's in the database," he says. "The search engine itself is the second leg of the stool," Hermansen adds. "The search engine's sole purpose is to mediate between the database and the user. The third leg, of course, is the user."

LASt Company Standing
Although LAS has served a niche market for much of its existence, Hermansen says the company fits comfortably in the broader digital content space. "I have a feeling I've been living in the econtent world for 20 years now," he says. "This has been the federal government's problem from day one," namely, how to manage and store information.

"With the econtent space, it's the volume of information that creates the problem," Hermansen continues. "It's always been the same challenge to get the right data to the right client in the right amount of time. What's different now is the scale of it. The challenge that has come out of name-recognition, specifically, is that people have avoided the problem. They didn't know about it, didn't acknowledge it, or avoided it altogether. It's the volume and exponential growth [of information] that exacerbates the problem. With ASCII, you only have 256 things you can say. But that isn't enough to even begin to cover Chinese characters, where you need 12,000 just to have a fair vocabulary.

"We're trying to bring standards to these types of data formats for names that will allow for proper aggregation," he concludes. "That's where we see our value—in helping to pull together these elements for which there are no standards."
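Hermansen's remark about character sets can be checked with a short Python sketch. The 256 figure he cites matches the number of values a single 8-bit byte can hold; standard 7-bit ASCII defines 128 code points. The helper name fits_in_ascii and the sample surname are assumptions chosen for the illustration.

```python
ASCII_CODE_POINTS = 2 ** 7    # 128 code points in standard 7-bit ASCII
SINGLE_BYTE_VALUES = 2 ** 8   # 256 values in one 8-bit byte

han_surname = "王"  # the surname usually romanized as Wang


def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("Wang"))             # True
print(fits_in_ascii(han_surname))        # False
print(ord(han_surname))                  # 29579, far beyond one byte's range
print(len(han_surname.encode("utf-8")))  # 3 bytes in UTF-8
```

The romanized form fits in ASCII, but the original character does not, so a database limited to single-byte text has to store a transliteration, and different transliterations of the same name are one source of the variation described earlier.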
