IBM 5 in 5 2012: Sight

A pixel will be worth a thousand words.

Editor’s note: This 2012 5-in-5 article is by IBM’s John Smith, senior manager, Intelligent Information Management.

They say a picture is worth a thousand words, but to a computer, an image is just thousands of pixels. Within the next five years, though, IBM Research thinks that computers will not only be able to look at images, but help us understand the 500 billion photos we’re taking every year (roughly 70 photos for each person on the planet).

Getting a computer to see

The human eye processes images by parsing colors and picking out edge information and texture characteristics. Beyond that, we can understand what an object is, the setting it’s in, and what it may be doing. While a human can learn all of this rather quickly, computers traditionally haven’t been able to make these judgments, relying instead on tags and text descriptions to work out what an image shows.

One of the challenges of getting computers to “see” is that traditional programming can’t replicate something as complex as sight. But by taking a cognitive approach – showing a computer thousands of examples of a particular scene – the computer can start to detect the patterns that matter, whether in a scanned photograph uploaded to the web or in video footage taken with a camera phone.

Let’s say we wanted to teach a computer what a beach looks like. We would start by showing it many examples of beach scenes. The computer would turn those pictures into distinct features, such as color distributions, texture patterns, edge information, or, in the case of video, motion information. Then the computer would begin to learn how to discriminate beach scenes from other scenes based on these features. For instance, it would learn that beach scenes typically show certain color distributions, while downtown cityscapes are set apart by their characteristic distributions of edges.
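To make that training step concrete, here is a minimal sketch in Python, assuming NumPy and scikit-learn are available. The synthetic 64×64 images are stand-ins for the thousands of real labeled photos a production system would be shown.

```python
# A minimal sketch of learning "beach" by example: each image becomes a
# color-distribution feature vector, and a classifier learns to separate
# beach scenes from everything else.
import numpy as np
from sklearn.linear_model import LogisticRegression

def color_histogram(image, bins=8):
    """Describe an H x W x 3 RGB image by its normalized color distribution."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3),
        bins=(bins, bins, bins),
        range=((0, 256), (0, 256), (0, 256)),
    )
    hist = hist.ravel()
    return hist / hist.sum()  # normalize so image size doesn't matter

rng = np.random.default_rng(0)
# Synthetic stand-ins for labeled photos: "beach" pixels skew sandy and blue,
# "city" pixels skew gray. A real system would be shown thousands of photos.
beach_images = [rng.normal((210, 190, 150), 30, (64, 64, 3)).clip(0, 255)
                for _ in range(50)]
other_images = [rng.normal((120, 120, 125), 30, (64, 64, 3)).clip(0, 255)
                for _ in range(50)]

X = np.array([color_histogram(img) for img in beach_images + other_images])
y = np.array([1] * len(beach_images) + [0] * len(other_images))
scene_clf = LogisticRegression(max_iter=1000).fit(X, y)

# For a new photo, the model reports how "beach-like" its features are.
new_photo = rng.normal((205, 185, 145), 30, (64, 64, 3)).clip(0, 255)
print(scene_clf.predict_proba(color_histogram(new_photo).reshape(1, -1))[0, 1])
```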

Once the computer learns this kind of basic discrimination, we can then go a step further and teach it about more detailed activities that could happen within the beach scene: we could introduce a volleyball game or surf competition at the beach. The system would continue to build on these simpler concepts of what a beach is to the point that it may be able to distinguish different beach scenes, or even discern a beach in France from one in California. In essence, the machine will learn the way we do.
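Continuing the sketch above, and still purely as an illustration, the same feature pipeline can be reused to layer a finer concept on top of the coarse one. The `volleyball_imgs` and `surfing_imgs` collections here are hypothetical stand-ins for labeled activity examples.

```python
# Building on the coarse "beach" concept from the sketch above: a second
# classifier, trained only on beach photos, learns the finer distinction
# between volleyball and surfing scenes. `volleyball_imgs` and `surfing_imgs`
# are hypothetical labeled collections.
act_X = np.array([color_histogram(img) for img in volleyball_imgs + surfing_imgs])
act_y = np.array([1] * len(volleyball_imgs) + [0] * len(surfing_imgs))
activity_clf = LogisticRegression(max_iter=1000).fit(act_X, act_y)

def describe(photo):
    feats = color_histogram(photo).reshape(1, -1)
    if scene_clf.predict(feats)[0] != 1:   # the coarse concept comes first
        return "not a beach"
    # the finer concept is layered on top of what was already learned
    return "beach: volleyball" if activity_clf.predict(feats)[0] == 1 else "beach: surfing"
```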

Helping doctors see diseases before they occur

In the medical field, where diagnoses rely on MRI, X-ray, and CT images, cognitive visual computing can play an important role in helping doctors recognize issues such as tumors, blood clots, and other problems sooner. Often what’s important in these images is subtle and microscopic, and requires careful measurement. Using the pattern-recognition techniques described above, a computer can be trained to recognize what matters most in these images.

Take dermatology. Patients often already have visible symptoms of skin cancer by the time they see a doctor. With many images of a patient collected from scans over time, a computer could look for patterns and flag situations that may be pre-cancerous, well before melanomas become visible.
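As a purely illustrative sketch of that longitudinal idea, one could describe each scan of a lesion with a feature vector and flag visits where the features drift sharply. The feature representation and the threshold below are assumptions, not clinically validated values.

```python
# Flag lesions whose image features change sharply between visits. The
# feature vectors (e.g., from a function like color_histogram above) and the
# 0.15 threshold are illustrative assumptions, not medical guidance.
import numpy as np

def drift_alerts(scans_over_time, threshold=0.15):
    """scans_over_time: chronological list of feature vectors for one lesion."""
    alerts = []
    pairs = zip(scans_over_time, scans_over_time[1:])
    for visit, (earlier, later) in enumerate(pairs, start=1):
        change = float(np.linalg.norm(np.asarray(later) - np.asarray(earlier)))
        if change > threshold:
            alerts.append((visit, change))  # (visit index, size of the change)
    return alerts
```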

Share a photo – get better discounts

It’s not only images from specialized devices that are useful. The photos we share and like on social networks, such as Facebook and Pinterest, can provide many insights. By looking at the images people share or like, retailers can learn about our preferences – whether we’re sports fans, where we like to travel, or what styles of clothing we like – and deliver more targeted promotions and individualized products and services.

Imagine getting promotions for kitchen gadgets or even certain kinds of food based on the images pinned to your “Dream Kitchen” Pinterest board.
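One hedged sketch of how such profiling might work: run a trained image classifier, such as the scene model above, over a user’s liked or pinned photos and tally what they depict. The `predict_label` function is a hypothetical stand-in for any such classifier.

```python
# Tally what a user's liked photos depict to build a preference profile.
# `predict_label` is a hypothetical classifier mapping an image to a label
# such as "kitchen", "hiking", or "soccer".
from collections import Counter

def preference_profile(liked_images, predict_label, top_k=3):
    """Return a user's strongest visual interests from photos they liked."""
    counts = Counter(predict_label(img) for img in liked_images)
    return counts.most_common(top_k)

# e.g. [("kitchen", 41), ("hiking", 17), ("soccer", 9)] -> target promotions
```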

Using Facebook photos to save lives

Sharing photos on social networks is not only beneficial for retailers and marketers; it could also help in emergency-management situations. Photos of severe storms – and the damage they cause, such as fires or electrical outages – uploaded to the web could help electrical utilities and local emergency services determine in real time what’s happening, what the safety conditions are, and where to send crews. The same type of analysis could be done with security cameras within a city: by aggregating all of the video data, police data centers could spot possible security and safety issues.
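To illustrate the aggregation step, here is a small sketch under assumed inputs: a `classify` function like the ones above and geotagged photo records. It buckets damage reports by rough location so the hardest-hit areas surface first.

```python
# Bucket damage reports by rough location so crews go where they are needed
# most. `classify` is an assumed image classifier; `photos` is an iterable of
# (latitude, longitude, image) records from uploads or city cameras.
from collections import defaultdict

DAMAGE_LABELS = {"fire", "downed_power_line", "flooding"}

def dispatch_priorities(photos, classify):
    """Return map grid cells ranked by the number of damage reports."""
    reports = defaultdict(int)
    for lat, lon, image in photos:
        if classify(image) in DAMAGE_LABELS:
            grid_cell = (round(lat, 2), round(lon, 2))  # roughly 1 km buckets
            reports[grid_cell] += 1
    return sorted(reports.items(), key=lambda kv: kv[1], reverse=True)
```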

In five years, computers will be able to sense, understand, and act upon these large volumes of visual information to help us make better decisions and gain insights into a world they couldn’t previously decipher.


IBM thinks these cognitive systems will connect to all of our other senses. You can read more about taste, smell, hearing, and touch technology in this year’s IBM 5 in 5.
