Profiled: Applied Semantics
www.appliedsemantics.com
Co-Founder: Gilad Elbaz
CEO: Jordan Libit
Number of Employees: 34
Founded: 1998
We've all been there: You enter a word or phrase into the "search" field of any major search engine, and instead of netting a targeted list of hits worthy of further exploration, you're inundated with page after page of sites featuring one or more of the words you entered, regardless of their meaning, context, or relevance to the search you're conducting. A common problem, it's one professional searchers have grudgingly learned to live with and work around.
It's also a reality that many content-management solution providers are working to change. Among them is Los Angeles, California-based Applied Semantics (formerly Oingo), a software developer whose mission is to empower businesses to better organize, manage, and retrieve digital information in Web-enabled, enterprise, and ecommerce environments. The brainchild of two California Institute of Technology graduates on a quest to make computers more "human-literate," Applied Semantics today offers a suite of enterprise products that, in the words of Co-Founder Gil Elbaz, "help knowledge managers extract more value from their content and save money" in the process.
Circa 1998
At the heart of the Applied Semantics product line is Conceptual Information Retrieval and Communication Architecture (CIRCA), a scalable, language-sensitive, intelligent platform that is refreshingly accurate at making information locatable. The proprietary technology is based on an extensive ontology of words, their meanings, and the conceptual relationships between those meanings in human language. Thought to be the world's largest database of general knowledge, with more than 1.2 million words, half a million concepts, and tens of millions of relationships, CIRCA matches words and phrases to its ontology, performs linguistic analysis, disambiguates them into meanings, and weights those meanings by importance, thereby making computers more effective at managing and retrieving information. (For example, the word "java" would be recognized as potentially referring to coffee, an Indonesian island, or a computer language.)
"CIRCA is about figuring out what a document is really about," Elbaz explains. "Unlike typical search engines (like Verity, Google, or AltaVista), which retrieve information based on the exact string of text [the user enters], CIRCA maps words in the document with concepts in our ontology. Once we have a representation of what the document is about, we can then summarize and categorize it."
Indeed, the soul of CIRCA is its ontology, which Applied Semantics has built, and continuously updates, in three ways. In addition to employing a team of 15 lexicographers and computational linguists who add information to the database manually, the company gathers data through a process called mechanical ontology expansion. "Basically, we crawl significant chunks of the Web [using proprietary algorithms] looking for patterns of repetition," says Elbaz. "You can actually derive the relationships between objects and terms in this manner. Finally, we license data via free public databases and other specialized sources…and purchase data for customers who want specific vertical knowledge bases built into the ontology."
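CIRCA's own algorithms are proprietary, but the general idea of matching words to candidate meanings and picking among them by context can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions: the tiny `ONTOLOGY` dictionary, its sense labels, and the helper functions are hypothetical; disambiguation uses a simplified Lesk-style overlap count (a textbook technique, not necessarily CIRCA's); and simple frequency stands in for CIRCA's importance weighting.

```python
from collections import Counter

# Hypothetical toy ontology (not CIRCA's data): each word maps to candidate
# meanings, and each meaning lists a few related terms.
ONTOLOGY = {
    "java": {
        "java/coffee": {"coffee", "cup", "espresso", "bean", "brew"},
        "java/island": {"island", "indonesia", "jakarta", "volcano"},
        "java/language": {"programming", "code", "compiler", "class", "software"},
    },
    "bean": {
        "bean/legume": {"soup", "vegetable", "plant"},
        "bean/coffee-seed": {"coffee", "roast", "espresso", "java"},
    },
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def disambiguate(word, context):
    """Pick the meaning of `word` whose related terms overlap most with `context`.

    Simplified Lesk-style overlap count; ties go to the first meaning listed.
    Returns None if the word is not in the toy ontology.
    """
    senses = ONTOLOGY.get(word)
    if not senses:
        return None
    return max(senses, key=lambda s: len(senses[s] & context))


def concept_profile(text):
    """Resolve each known word to a meaning, then weight meanings by how often
    they occur (an assumed stand-in for 'importance')."""
    tokens = tokenize(text)
    context = set(tokens)
    meanings = Counter()
    for tok in tokens:
        sense = disambiguate(tok, context)
        if sense is not None:
            meanings[sense] += 1
    total = sum(meanings.values())
    return {sense: n / total for sense, n in meanings.items()} if total else {}


if __name__ == "__main__":
    print(concept_profile("The compiler rejected my Java class, so I rewrote the code."))
    print(concept_profile("A fresh cup of java: espresso from a dark-roast bean."))
```

For the first sentence, "java" resolves to its programming-language sense because "compiler," "class," and "code" overlap with that sense's related terms; in the second, the same word resolves to coffee, and "bean" is pulled toward the coffee-seed sense rather than the legume.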
The company itself originated with Elbaz and Co-Founder/CTO Adam Weissman, who launched Oingo in 1998 with the purpose of "focusing on unstructured information," Elbaz recalls. "We were trying to create a meaning-based search engine that would be based on a new way to store and represent knowledge. We did, in fact, successfully launch a search engine that continues to run today." (Oingo.com conducts meaning-based searches across 15 broad categories and hundreds of subcategories, including arts, business, computers, health, news, reference, shopping, and sports. It continues to be powered by CIRCA, and is operated by Applied Semantics' Naming Solutions division.)
"As the market shifted, we wanted to take the CIRCA technology and apply it to specific enterprise solutions," Elbaz continues. Since changing the company's name to Applied Semantics in May 2001 to better reflect Oingo's altered business model, Elbaz and Weissman's team has targeted the publishing, pharmaceutical/biotechnology, and financial services industries with three principal products:
- Auto-Categorizer, a plug-in to existing data-management technologies that automatically assigns documents to a predefined or customized directory to improve knowledge mining and retrieval;
- Page Summarizer, which deciphers the meanings of documents and provides customized, accurate summaries to improve the knowledge discovery process; and
- Metadata Creator, a plug-in to existing search technologies that adds automated metadata to improve knowledge discovery.
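The source does not describe how Auto-Categorizer works internally. As a rough, hypothetical sketch of concept-based categorization in general, the Python below maps words to concepts through a small hand-made lexicon (standing in for an ontology lookup) and assigns a document to every directory heading whose concepts cover enough of the document's concept mentions. `LEXICON`, `DIRECTORY`, and the 0.5 threshold are illustrative assumptions, not the product's configuration.

```python
from collections import Counter

# Hypothetical word -> concept lexicon, standing in for an ontology lookup.
LEXICON = {
    "espresso": "coffee", "latte": "coffee", "roaster": "coffee",
    "pizza": "food", "menu": "food", "bakery": "food",
    "plumber": "plumbing", "pipe": "plumbing", "drain": "plumbing",
}

# Hypothetical predefined directory: each heading lists the concepts it covers.
DIRECTORY = {
    "Coffee Shops": {"coffee"},
    "Restaurants": {"food", "coffee"},
    "Plumbers": {"plumbing"},
}


def concepts(text):
    """Count the concepts evoked by the words of `text`."""
    counts = Counter()
    for word in text.lower().split():
        concept = LEXICON.get(word.strip(".,;:!?"))
        if concept:
            counts[concept] += 1
    return counts


def categorize(text, threshold=0.5):
    """Return every heading whose concepts account for at least `threshold`
    of the document's concept mentions, best match first."""
    counts = concepts(text)
    total = sum(counts.values())
    if total == 0:
        return []
    scores = {
        heading: sum(counts[c] for c in wanted) / total
        for heading, wanted in DIRECTORY.items()
    }
    # sorted() is stable, so equally scored headings keep directory order.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [heading for heading, score in ranked if score >= threshold]


if __name__ == "__main__":
    print(categorize("Family bakery with espresso and a lunch menu."))
    print(categorize("Espresso bar and coffee roaster."))
```

A document like "Espresso bar and coffee roaster." scores equally for "Coffee Shops" and "Restaurants" and is assigned to both, which reflects the idea of placing one entry under one or more headings rather than forcing a single choice.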
Available separately or as a package, Applied Semantics' enterprise tools all communicate results back to the user in the same XML format. Enterprise customers include the Smithsonian Institution and QwestDex Direct, which recently acquired a database of more than 2.3 million businesses from a third party in an effort to increase the listing inventory available in its DotComDirectory.com business database. To integrate those entries into its existing inventory quickly, QwestDex needed an automated solution that would classify each of the 2.3 million Web sites into one or more of its 4,500 yellow-page headings. Enter Auto-Categorizer, which mapped the 2.3 million sites to the 4,500-heading taxonomy in four days.
Circa 2002 (and Beyond)
"One of the problems for content managers is getting the content in a format that will allow other people to find it," Elbaz laments. "One of the main ways that is done is by putting the right metadata on it, whether it be through categorization or appropriately summarizing the information. What we're doing, essentially, is providing that metadata for them."
"Early content-management solutions are mostly about the publishing process and the fundamental needs of managing documents," adds Steve Bernstein, general manager of the Enterprise Solutions division. (Bernstein, former vice president of product marketing for Inxight, joined Applied Semantics last October.) "Questions of concern in those days were, 'Where do we keep the documents?', 'Who determines when a document is complete?', and 'How do I manage version control?' The next step in that hierarchy is categorization, or 'How do I create information about the documents that will be relevant in the future?' Once you have a content-management system, the workflow, and the conversion aspects squared away, then it's about retrieval and understanding patterns.
"You can't be content-free," Bernstein continues. "You have to have an understanding of how topics relate to one another culturally. That's the only way, and it's the right approach to solving the vast number of problems content managers face."
Those problems include properly addressing the components of knowledge management that Elbaz says content managers often neglect: the "middle steps" that allow them to manage their content more effectively. "We are just in the infancy of trying to get computers to work with language in such a way that they're deriving actual meaning from a document and doing something intelligent with that document," he notes. "Once we get the core technology down, I'm looking forward to improving voice recognition [and its relationship to] datamining, monitoring chat groups, and automatic learning. There are so many interesting applications yet to be explored."