IBM develops analytics and classification technology to provide data for a new kind of multimedia-based search engine
Editor’s note: This posting was authored by Zvi Kons, researcher in the Speech Technologies group at IBM Research - Haifa.
When you walk down a busy street, do you ever notice the sounds that surround you? People, traffic, music; city sounds often blend into background chatter, like the foreign language the couple next to you at the café is speaking. That city buzz, though, together with related visual images, has the potential to generate a continuous stream of information that can indicate the real-time dynamics of the city.
To gather, process, analyze, and ultimately separate useful sound from white noise, my team at IBM Research - Haifa is working on new technology for searchable audio analysis as part of the EU-funded project SMART (Search Engine for Multimedia Environment Generated Content).
We’re developing algorithms and an engine to analyze those city sounds, extracting information that can be cross-referenced with video images to generate real-time content. Our research on audio classification is an integral part of a new kind of internet search engine that could provide locally oriented, readily available, and informative content with practical applications.
Capturing the sights and sounds of city streets to gain insight
Our team collected data from two locations in Santander, Spain. Because the municipality is a partner in the SMART project, it offered to support the technical aspects of the infrastructure and is helping test the technology. Cameras and microphones set up in the town square and market area recorded continuous audio and visual data of normal daily activity for one month, collecting more than 1,000 hours of data. We analyzed the sounds to identify various types of activity and to detect patterns and anomalies, such as peak hours for busy crowds in the market square, traffic, and special events.
Santander city square
Visual representation of weekly audio from the city square
The audio from the video above and others produced this diagram, a visual representation of the weekly crowd activity level: blue for low activity, red for high activity.
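To illustrate how raw audio can be reduced to a coarse activity level like the one shown in the diagram, here is a minimal sketch of my own (not the SMART project's actual pipeline): it splits a mono signal into fixed windows and uses RMS energy as a stand-in activity measure. The function name, window length, and the RMS measure itself are all illustrative assumptions.

```python
import numpy as np

def activity_levels(samples, sample_rate, window_s=60.0):
    """Return the RMS energy of each fixed-length window of a mono
    signal, as a coarse proxy for crowd-activity level.
    (Illustrative sketch; real systems would use richer features.)"""
    win = int(sample_rate * window_s)
    n_windows = len(samples) // win
    frames = samples[: n_windows * win].reshape(n_windows, win)
    return np.sqrt(np.mean(frames ** 2, axis=1))

# Synthetic check: a quiet minute followed by a noisy one.
rate = 8000
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(rate * 60)
busy = 0.2 * rng.standard_normal(rate * 60)
levels = activity_levels(np.concatenate([quiet, busy]), rate)
```

Mapping each level onto a color scale (blue for low values, red for high) would reproduce a heat-map diagram of the kind described above.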
Another sample detected a day with unusual crowd noise, music, and applause. By cross-referencing with video footage from nearby street cameras, we traced it to a protest rally on an adjacent street. That kind of detection could be important for assessing an immediate security risk, or for deciding whether to send a news team to report on a developing story.
Listen to the mid-day rally as it passes through the top-right corner of the frame:
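One simple way to flag a day with unusual crowd noise, sketched below under my own assumptions (this is not the project's actual algorithm), is to compare each hour's activity level against the same hour in previous weeks and flag hours that deviate by more than a few standard deviations. The function name and threshold are hypothetical.

```python
import numpy as np

def flag_unusual_hours(history, current, k=3.0):
    """history: (n_weeks, n_hours) activity levels from past weeks.
    current: (n_hours,) activity levels for the day being checked.
    Returns indices of hours more than k standard deviations
    above the per-hour historical mean."""
    mu = history.mean(axis=0)
    sigma = history.std(axis=0) + 1e-9  # avoid division by zero
    return np.flatnonzero((current - mu) / sigma > k)

# Synthetic check: a typical day, except hour 14 spikes (e.g. a rally).
rng = np.random.default_rng(1)
history = np.tile(np.linspace(0.1, 0.5, 24), (4, 1))
history += 0.01 * rng.standard_normal(history.shape)
current = history.mean(axis=0).copy()
current[14] += 1.0
unusual = flag_unusual_hours(history, current)
```

An hour flagged this way would then be cross-referenced with video footage, as described above, to find out what actually happened.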
Applying audio analytics
The sounds of privacy
To address potential privacy and legal issues, the SMART team used wide-angle lenses and low resolution for the video cameras. The microphones were placed at a distance to pick up crowd noise rather than intelligible speech or individual conversations.
The idea behind SMART’s new multimedia-based search engine is to incorporate information gleaned from the environment. We can use data from city sounds and video images, as well as social media such as tweets, to identify events and situations in real time and make that information available online. The sounds of the city can help identify a drunken brawl, a spontaneous demonstration, a musical event, or an accident during rush hour. This kind of readily available information could be valuable for security systems, for municipal and media use, and as helpful knowledge for city residents.
Our research highlights the enormous potential of easily accessible information in our physical surroundings. The technology to use that information has exciting and practical applications for smart cities, offering innovative ways to interpret sounds and images.
Labels: audio, EU, IBM Research - Haifa, multimedia, Santander, search, SMART, speech analytics, Zvi Kons