Teragram, a provider of multilingual natural language processing technologies, has announced that its proprietary categorization software is being used to enable auto-categorization of content for the Homeland Security Digital Library (HSDL).
The HSDL is a national security resource whose mission is to make a vast number of electronic documents in PDF, video, and audio formats available to a user base of local, state, and federal homeland security policy makers. The documents feature key strategy, policy, and organizational advanced research. Teragram's technology enables the HSDL to automatically categorize documents saved in multiple formats for search and retrieval. Sample topic areas include Law and Justice, Borders and Immigration, Infrastructure Protection, Terrorism and Society, Weapons and Weapons Systems, Emergency Management, and Public Health.
Each day, HSDL content developers add new documents in PDF, video, and audio formats to the system, and Teragram's software is designed to ensure that these documents are properly categorized and available to approved users. Teragram worked with the library to develop a semantic, rules-based model for auto-categorization that allows the HSDL content development teams to manage growing content while maintaining recall and precision in portal browse and search applications. Content managers can use reports generated by the auto-categorizer to determine whether documents are being categorized as expected.
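As a rough illustration of what a rules-based categorization model and its recall and precision check can look like, the following Python sketch assigns topic areas by matching include and exclude terms. It is not Teragram's software or API: the `Rule` class, the term lists, and the helper functions are all hypothetical and exist only for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: a category with include and exclude terms."""
    category: str
    include: list[str]
    exclude: list[str] = field(default_factory=list)

    def matches(self, text: str) -> bool:
        lowered = text.lower()

        def found(term: str) -> bool:
            # Whole-word match, so "border" does not match "borders".
            return re.search(rf"\b{re.escape(term)}\b", lowered) is not None

        has_term = any(found(t) for t in self.include)
        blocked = any(found(t) for t in self.exclude)
        return has_term and not blocked


# Illustrative term lists only; a real taxonomy would be far richer.
RULES = [
    Rule("Borders and Immigration", ["border", "immigration", "visa"]),
    Rule("Public Health", ["pandemic", "vaccine", "quarantine"]),
    Rule("Emergency Management", ["evacuation", "disaster response"],
         exclude=["film review"]),
]


def categorize(text: str, rules: list[Rule] = RULES) -> set[str]:
    """Return every category whose rule matches the text."""
    return {rule.category for rule in rules if rule.matches(text)}


def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Compare assigned categories with a reviewer's expected categories."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall
```

A content manager's report could then compare `categorize(document_text)` against the categories a reviewer expected, and flag documents where precision or recall falls short.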
Any needed changes, such as adding new terminology or making rules more inclusive or exclusive, can then be made by the library's taxonomy developers. Teragram's categorization technology adds flexibility to the HSDL's system. Together, the two enable full-text and metadata searchability and metadata-determined relevancy across the library collection. In addition, the library's Web interface uses Teragram's categorization engine to build filtered, context-specific search navigation menus in its search and browse functions.
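The context-specific navigation menus described above can be pictured as category counts computed over the current search results. The sketch below is an assumption about the general technique, not a description of the HSDL interface: the sample documents, the title-only `search` function, and `navigation_menu` are hypothetical.

```python
from collections import Counter

# Hypothetical sample records; titles and category assignments are invented.
DOCUMENTS = [
    {"title": "Port security assessment",
     "categories": ["Infrastructure Protection", "Borders and Immigration"]},
    {"title": "Border security staffing report",
     "categories": ["Borders and Immigration"]},
    {"title": "Hospital surge planning",
     "categories": ["Public Health", "Emergency Management"]},
]


def search(query: str, documents: list[dict]) -> list[dict]:
    """Naive title search, standing in for a real full-text search."""
    needle = query.lower()
    return [doc for doc in documents if needle in doc["title"].lower()]


def navigation_menu(results: list[dict]) -> list[tuple[str, int]]:
    """Count categories among the results, most frequent first."""
    counts = Counter(cat for doc in results for cat in doc["categories"])
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling `navigation_menu(search("security", DOCUMENTS))` yields only the categories present in the matching documents, which is what makes the menu specific to the current search context.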