IBM Research: Getting Hot with Data Retrieval

IBM scientists and developers in Almaden, California; Tucson, Arizona; and Zurich, Switzerland have achieved a significant breakthrough in distributed Flash caching for enterprise transaction processing. The technology, unveiled at the IBM EDGE 2012 conference in Orlando, Florida, was shown to improve latency by more than 5x for certain workloads. The advance will make document retrieval faster and enable real-time analytics.

We recently caught up with Dr. Ioannis Koltsidas, one of IBM's storage technology scientists, in Switzerland to learn more about the achievement.


Ioannis Koltsidas: Sure. Simply put, we have created a novel caching framework called Triton that exploits synergies between storage area network (SAN) storage and servers. In complex global IT environments, it is not uncommon to have multiple servers connected to a SAN. Within these environments there is hot data, which is accessed often, and cold data, which isn't. We've developed several novel technologies that let users access the hot data at a fraction of the SAN latency by storing it in local caches based on Flash memory.
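The interview does not describe Triton's internals. As a rough illustration of the general idea of serving hot data from a fast local tier in front of slower shared storage, here is a minimal Python sketch of a read-through cache with least-recently-used (LRU) eviction. The class name `FlashCache`, the dictionary standing in for the SAN, and the LRU policy are all assumptions made for illustration, not IBM's design.

```python
from collections import OrderedDict


class FlashCache:
    """Hypothetical read-through cache: a small, fast local tier (standing in
    for server-side Flash) in front of a larger, slower SAN tier."""

    def __init__(self, san, capacity):
        self.san = san              # dict-like backing store (simulated SAN)
        self.capacity = capacity    # max number of blocks held locally
        self.local = OrderedDict()  # block_id -> data, kept in LRU order
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.local:
            # Hot block: served locally and marked most recently used.
            self.local.move_to_end(block_id)
            self.hits += 1
            return self.local[block_id]
        # Miss: pay the (simulated) SAN round trip, then keep a local copy.
        self.misses += 1
        data = self.san[block_id]
        self.local[block_id] = data
        if len(self.local) > self.capacity:
            # Evict the least recently used (coldest) block.
            self.local.popitem(last=False)
        return data


# Usage: block 1 is read repeatedly, so it stays in the local tier.
san = {i: f"block-{i}" for i in range(100)}
cache = FlashCache(san, capacity=3)
for b in [1, 2, 1, 1, 3, 50, 1]:
    cache.read(b)
print(cache.hits, cache.misses)  # prints "3 4"
```

LRU is only one simple way to keep frequently reused data local; a production system would also have to handle writes and keep caches on multiple servers consistent with the shared SAN, which this sketch ignores.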

IK: I'd say that most data-intensive applications will benefit from this technology. We are especially looking at applications such as transaction processing for brokerage workloads, document retrieval and content management, as well as virtual machine storage in scale-out environments. Also, in this era of Big Data, this technology can help with real-time analytics for banking transactions, medical data and billing systems, for instance.

Using large solid-state drive arrays such as the IBM EXP30 Ultra, we can store up to 10 terabytes in the cache. So even an organization with a lot of hot data can retrieve it quickly.

IK: I helped design a smart way to manage the cache so that it achieves both high performance and high scalability: more specifically, an algorithm that recognizes which data is hot and which is not. It almost knows what you want before you do, because it looks for patterns in which data is accessed and when.
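The actual Triton algorithm is not disclosed here. One common, much simpler heuristic for telling hot data from cold is to count how often each block was accessed within a recent time window. The Python sketch below assumes that heuristic; the class `HotDataDetector` and its `window` and `threshold` parameters are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


class HotDataDetector:
    """Hypothetical hot/cold classifier: a block counts as hot if it was
    accessed at least `threshold` times within the last `window` seconds."""

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.accesses = defaultdict(deque)  # block_id -> recent access times

    def record(self, block_id, now):
        times = self.accesses[block_id]
        times.append(now)
        self._expire(times, now)

    def is_hot(self, block_id, now):
        times = self.accesses.get(block_id)  # .get avoids creating entries
        if not times:
            return False
        self._expire(times, now)
        return len(times) >= self.threshold

    def _expire(self, times, now):
        # Drop accesses that have fallen out of the time window.
        while times and now - times[0] > self.window:
            times.popleft()


detector = HotDataDetector(window=10.0, threshold=3)
for t in (0.0, 1.0, 2.0):
    detector.record("A", t)
detector.record("B", 2.0)
print(detector.is_hot("A", 3.0))   # True: 3 accesses in the last 10 s
print(detector.is_hot("B", 3.0))   # False: only 1 access
print(detector.is_hot("A", 11.5))  # False: accesses at 0.0 and 1.0 expired
```

This only reacts to recent frequency; anticipating future accesses from patterns, as described in the interview, would require a richer model than a sliding-window count.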

IK: It will be generally available in 2013, but we will continue to refine the code, look to port it to other server and storage platforms, and make it available for both native and virtual environments. We also see a strong opportunity with IBM PureSystems and Netezza.

IK: My PhD thesis focused on databases for Flash storage, so this is a topic near and dear to me. As I mentioned, we are also now firmly in the era of Big Data, and as nearly any scientist will tell you, it's always good to have strong market demand for your research.