Big Data is big. It’s 2.5 quintillion bytes of data every day big. Proper noun Big. Dr. Dimitri Kanevsky, a Master Inventor with IBM Research, is applying the techniques used to discover meaning within Big Data to speech transcription and translation. He recently earned a Tan Chin Tuan Exchange Fellowship in Engineering from Nanyang Technological University in Singapore for his work in this area, and lectured on everything from patents to methods for optimizing Big Data to how those methods are used in speech and translation technologies.
“The award was given to me for developing methods that allow machines to operate on large data that is mostly sparse, meaning the data contains a small amount of significant information hidden among volumes of data.
“I led a team that applied these methods to speech, creating a new field: sparse representation in speech,” Kanevsky said.
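To make “sparse” concrete, here is a minimal Python sketch (an illustration only, not Kanevsky’s actual method): when only a handful of entries in a long feature vector carry significant information, storing just those entries is far more compact than storing the whole vector.

```python
# Illustrative sketch of sparse representation (not IBM's actual method):
# a long feature vector with only a few significant entries can be stored
# as {index: value} pairs instead of a full dense array.

dense = [0.0] * 10_000
dense[42] = 3.1      # the handful of entries that carry
dense[7311] = -0.8   # the "significant information"

# Keep only the nonzero coordinates.
sparse = {i: v for i, v in enumerate(dense) if v != 0.0}

print(len(dense), "dense slots vs.", len(sparse), "sparse entries")
# -> 10000 dense slots vs. 2 sparse entries
```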
Dr. Dimitri Kanevsky delivering his lecture, Why I Care About Hessian-Free Optimization, at Nanyang Technological University, Singapore
Kanevsky, who has been deaf since early childhood, spoke to the NTU students through a combination of an iPad, Skype, and a human stenographer to put his words on screen (watch one of his NTU lectures here). Via a wireless Internet loop, Kanevsky spoke; a stenographer typed; text appeared on the classroom screen via a tool called Streamtext; students read and responded; the stenographer heard the students and typed again; all in a seamless, real-time flow.
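As a rough Python sketch of that relay (the stage names and the stand-ins for the stenographer and the Streamtext display are illustrative assumptions, not the real system), each stage simply hands text to the next:

```python
# Toy sketch of the live-captioning relay described above. The
# stenographer and the screen are stubbed out; Streamtext's real
# API is not shown here.
import queue

captions = queue.Queue()

def stenographer_types(heard_speech: str) -> None:
    """Stand-in for the human stenographer transcribing live audio."""
    captions.put(heard_speech)

def classroom_screen() -> None:
    """Stand-in for the Streamtext display in the lecture hall."""
    while not captions.empty():
        print("SCREEN:", captions.get())

stenographer_types("Why I care about Hessian-free optimization...")
stenographer_types("Questions from the audience are welcome.")
classroom_screen()
```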
Why all the moving parts? Because understanding speech, like
finding valuable information in any kind of data, requires more than just a
clever algorithm.
Current speech recognition technology is not accurate enough to understand and transcribe (much less translate) a lecture. Variations in English accents alone, a speaker’s distance from the microphone, and background noise all make it difficult to accurately decode everything being said. This is why smartphones’ voice recognition technology – with its small vocabularies – won’t work in these environments. Not to mention that the translation delay would be unacceptable in a live lecture or discussion.
Kanevsky’s system instead takes advantage of off-the-shelf components (versus the expensive proprietary technology used for television closed captioning); works over the web; and, most importantly, captures the important spoken information.
Applying Big Data computing to speech
For a machine to truly process speech data, it needs cognitive computing – a system with an architecture that imitates how the human brain understands information. IBM Watson’s ability to understand natural language is just one piece of a complex cognitive computing puzzle. But as cognitive computing is applied to Big Data, it will also revolutionize speech recognition and speech translation.
“One of the biggest challenges facing researchers who develop cognitive computing for Big Data is to develop faster methods to process large amounts of data through these systems. That is what my team is developing: efficient and fast algorithms for speech transcription and translation,” Kanevsky said.
The techniques to find useful information in Big Data and to understand, transcribe, or translate speech are intertwined. Kanevsky explains this with an example: finding the audio clips in an archive that contain a particular spoken phrase.
Kanevsky and a team of collaborators earned a patent in 1997 for developing a way to search
audio using speech recognition. It formed the basis for data mining that
involves pattern recognition technologies.
“You have to transcribe all the spoken phrases stored in those
archives. Then, when someone searches for a phrase like ‘I have a dream,’ the
system will find all strings for ‘I have a dream’ within the stored data, and produce
links to audio that contains this spoken phrase,” Kanevsky said.
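A minimal Python sketch of that transcribe-then-search idea (the archive contents, file names, and helper function here are hypothetical, not the patented IBM system) might look like this:

```python
# Hedged sketch of the idea described in the quote: transcribe every
# clip once, then answer phrase queries by substring search over the
# transcripts. File names and transcripts are made up for illustration.

# Assume some speech recognizer has already produced these transcripts.
archive = {
    "speeches/mlk_1963.wav": "i have a dream that one day ...",
    "speeches/other_talk.wav": "big data needs sparse methods ...",
}

def find_clips(phrase: str) -> list[str]:
    """Return links to audio clips whose transcript contains the phrase."""
    phrase = phrase.lower()
    return [clip for clip, text in archive.items() if phrase in text]

print(find_clips("I have a dream"))
# -> ['speeches/mlk_1963.wav']
```

Transcribing once up front turns every later phrase query into a cheap text search, which is what lets the approach scale to large audio archives.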
As data mining techniques become more efficient at identifying small, relevant chunks of information within Big Data, and are applied to speech and translation technologies, they create reusable processes for analyzing candidate phrases and producing a final, decoded phrase.
Applying transcription and translation to all parts of life
Kanevsky’s work to route speech transcriptions and translations over the Internet more than 15 years ago was the world’s first. He has since put similar technology into glasses that overlaid text describing or translating what the user looked at. He also patented the Artificial Passenger, which converses with drivers to keep them awake.
The next great speech challenge is machine translation across different languages. Kanevsky’s team is now working on an automatic speech-to-text tool – based on cognitive computing concepts – that simplifies spoken English for meetings between IBM researchers in the U.S. and China.
“Demonstrations of real-time transcriptions provided by human writers helped start the work on developing an automatic means of transcribing meetings between our teams in China and in the US,” Kanevsky said.
Labels: accessibility, big data, machine learning, speech transcription, speech translation