Cross-lingual text mining

Discovering knowledge from large volumes of multilingual text data just got easier with new text mining technology from IBM Research. Using globally distributed databases, this cross-lingual text mining technology developed by the research team in Tokyo allows users to search through – and find value in – data written in a language they don’t understand.

Knowledge Discovery

For example, manufacturers selling products in the U.S., Europe and Asia could quickly identify defects or complaints from tens of thousands of customer contact reports that call center operators record in customers’ local languages. The cross-lingual text mining technology extracts context from the portions of text that the user wishes to analyze and translates it into the user’s preferred language. It then analyzes the data and returns results, highlighting irregularities such as defects or complaints that previously went unnoticed because of language barriers.

“Finding accurate translation pairs (to match one language to another) was a challenge in developing the technology. Often, notes taken by call center operators are grammatically incorrect or truncated,” said Tetsuya Nasukawa, a senior technical staff member at IBM Research – Tokyo.

“The terms being analyzed may not be defined in general translation dictionaries. So, this text mining compares how each concept is expressed in the textual database of the source’s native language and in the textual database of the requested foreign language to determine the translation pairs.”
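
As a rough illustration of the idea of comparing how a concept is expressed in two textual databases, the sketch below scores candidate translations by the similarity of the words that surround them. It is not IBM's implementation. It assumes a small seed dictionary of already-known word pairs, whitespace-tokenized toy reports, and cosine similarity over co-occurrence counts; all function names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vectors(docs, targets):
    """Count the words that co-occur with each target term in the same report."""
    vectors = {t: Counter() for t in targets}
    for doc in docs:
        tokens = doc.lower().split()
        present = set(tokens)
        for t in targets:
            if t in present:
                vectors[t].update(w for w in tokens if w != t)
    return vectors


def project(vec, seed):
    """Map source-language context words into the target language.

    Only words found in the (assumed) seed dictionary are kept.
    """
    out = Counter()
    for word, count in vec.items():
        if word in seed:
            out[seed[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_translation(term, src_docs, tgt_docs, candidates, seed):
    """Pick the candidate whose contexts look most like the source term's contexts."""
    src_vec = project(context_vectors(src_docs, [term])[term], seed)
    tgt_vecs = context_vectors(tgt_docs, candidates)
    scores = {c: cosine(src_vec, tgt_vecs[c]) for c in candidates}
    return max(scores, key=scores.get), scores


# Toy call center notes (hypothetical data): German source, English target.
src_docs = ["akku defekt nach laden", "akku heiss beim laden", "bildschirm defekt"]
tgt_docs = [
    "battery broken after charging",
    "battery hot when charging",
    "screen broken after drop",
]
seed = {"defekt": "broken", "heiss": "hot", "laden": "charging"}

best, scores = best_translation("akku", src_docs, tgt_docs, ["battery", "screen"], seed)
print(best, scores)
```

In this toy data, "akku" is absent from the seed dictionary, yet it can still be paired with a candidate because its surrounding words translate to words that also surround that candidate. A real system would need proper tokenization for each language and far larger corpora.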

To turn a search tool into a technique that extracts valuable information from any language domain, one that users can apply to trend analysis, claim processing and other fields, the team in Tokyo combined two technologies. TAKMI (Text Analysis and Knowledge Mining) finds noteworthy features, trends and important issues without requiring anyone to read all of the data, and an additional technology extracts translation pairs from any language domain.

Last year, IBM’s text mining research team received the Field Innovation Award from the Japanese Society for Artificial Intelligence in recognition of its pioneering text mining research and development effort.