Hybrid storage for the hybrid cloud

By Arvind Krishna, Senior Vice President and Director of IBM Research

IBM Research has long played a pivotal role in the evolution of storage. In 1956, IBM researchers helped to create RAMAC, the first magnetic hard disk drive. They also developed the giant magneto-resistive head in the 1990s that still serves as the basis for all of today’s disk drives. In 1995 they helped IBM to win a National Medal of Technology for rewritable disks. Now it’s time for us to take another big step.

With data growing at 50 percent per year, IBM is investing $1 billion to manage this digital wellspring with storage software for the hybrid cloud. This five-year investment includes research and development of new cloud storage software, object storage and open standard technologies such as OpenStack.

While perhaps not quite as captivating as watching RAMAC’s massive spinning disks must have been, the archiving capability of this new technology, called IBM Spectrum Storage, won an Emmy back in 2011. If you like having digital media instantly available on everything from your phone to your smart TV, you have IBM’s “Spectrum Archive” and our researchers to thank.

Living up to our heritage of developing industry-leading technology for IBM products, IBM Research played a significant role in this announcement, inventing four of the six IBM Spectrum Storage offerings and contributing heavily to the other two.

Unboxing the full spectrum of data storage

We predict that storage software will overtake storage hardware by 2020, by which time it will have to manage 40 zettabytes (40 sextillion bytes) of data. We believe most of that data will be in hybrid cloud because of the flexibility it offers businesses. A company that has your data, or data you want, will be able to manage, analyze, add to, and transfer it all from a single dashboard, something impossible to do today on storage hardware that sits alone in a datacenter.

The other major benefit of storage software is that it can access and analyze any kind of data wherever it lives, no matter the hardware, platform, or format. So, from mobile devices linked to your bank, to servers full of unstructured social media information, data – via the cloud – can be understood.

This technology is already demonstrating its value. For example, Caris Life Sciences is using part of the Spectrum portfolio to speed up the company’s molecular profiling services for cancer patients. Scientists at DESY, a major research center in Germany, use Spectrum to crunch more than 20 GB of data per second to study atomic structures.

Beyond the next five years and all of those zettabytes, software-defined storage can help lead us to new technologies like phase-change memory (PCM), STT-RAM, and beyond. In fact, our scientists in Zurich made a breakthrough last year in the materials development of PCM, which promises to bridge the performance gap between main memory and storage in electronics from mobile phones to cloud data centers. And its unique physical properties make it ideal to serve as the memory for our work on brain-inspired chip architectures.

That’s what’s so exciting about the storage world – it’s always moving forward. As far as we’ve come in the storage evolution, the journey is just beginning. As it has done in the past, IBM Research will be there every step of the way.


From Discovering Rock Patterns to Detecting Cyber Threats

IBM scientist Marc Ph. Stoecklin has been fascinated by unusual patterns for as long as he can remember. Even as a child, on family hikes in the Swiss Alps, he looked for crystals and striking patterns in the rocks.

An IBMer since 2006, Marc joined the next wave of IBM millennial managers this past summer; today he leads a team of experts working on cyber security analytics, with a particular focus on advanced threat detection and cyber security data visualization. In simple terms, he helps security analysts detect, understand, and counter sophisticated cyber security attacks.

“I’ve always enjoyed analyzing data and mining them for unusual changes and deviations from expected behavior,” says Marc.

While traditional security solutions use signature-based detection (knowing what unwanted events to look for, i.e., a strict pattern), Marc’s team looks into how machine learning, data mining, and statistical modeling can be applied to learn the behavior of laptops and mobile devices on the network.

“We look for the unknown and unexpected, like an irregular heartbeat. Research in behavioral analytics has become crucial, particularly when it comes to detecting advanced threats and targeted attacks: attacks an organization cannot anticipate, and has no adequate protection mechanisms for, simply because it does not know what to expect in advance. Attackers exploit zero-day vulnerabilities, and these attacks are crafted just for a specific organization, so no signature or blacklist will catch them,” Marc said.

“The last resort is to be constantly on the watch for any unusual behavior patterns, which reveal the presence of an attacker.”

“On top of that, data visualization is very important as the human brain is brilliant at exposing abnormal patterns, but the data has to be pre-processed and condensed, especially when there is as much of it as in security. Scrolling through lines of a spreadsheet isn’t nearly as effective as looking at a heat map, which is why visual design is so important to our tools,” Marc said.
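As a concrete illustration of the kind of behavioral baseline Marc describes, here is a minimal sketch with entirely hypothetical device names and numbers, not IBM’s tooling: each device’s traffic history serves as its learned baseline, and a simple z-score test flags any device that deviates sharply from it.

```python
import statistics

def detect_anomalies(baseline, current, threshold=3.0):
    """Flag devices whose current traffic deviates from their learned
    baseline by more than `threshold` standard deviations (a z-score test)."""
    flagged = {}
    for device, history in baseline.items():
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            continue
        z = abs(current[device] - mean) / stdev
        if z > threshold:
            flagged[device] = round(z, 1)
    return flagged

# Hypothetical per-device daily traffic counts (the "learned behavior").
baseline = {
    "laptop-17": [120, 130, 125, 128, 122],
    "phone-04":  [40, 42, 38, 41, 39],
}
# Today's observations: phone-04 suddenly moves far more data than usual.
today = {"laptop-17": 127, "phone-04": 900}

print(detect_anomalies(baseline, today))  # only phone-04 is flagged
```

Real behavioral analytics replaces the z-score with far richer machine-learning models, but the principle is the same: learn normal behavior, then flag deviations from it.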

Goodbye Zurich. Hello New York!

“I had the opportunity to do my PhD thesis here at IBM Research-Zurich. The IBM lab is really one of the few places in the world where you can join an industrial PhD program. It was a truly great experience, both being in an academic program and conducting research that had direct industry impact,” Marc says.

As a member of IBM’s AURORA project, Marc and the team created a flow-based network traffic monitoring and visualization system; he was responsible for the design and development of its anomaly detection and user interface components, and the system has since been commercialized by other IBM businesses.

“It was really a rewarding experience as a young graduate, creating software that was sold to IBM clients worldwide,” Marc said with a proud smile.

In 2011 he left Zurich to join the IBM T.J. Watson Research Center in Yorktown Heights, NY. Here, he participated in the development of the IBM Cyber Security Analytics and Intelligence research platform.

“I didn’t have any second thoughts moving from Switzerland to New York. Sometimes you have to make decisions when you know this is the right thing to do,” Marc said.

He continues, “For me it was a chance to go into a new field and to start exploring something completely different. And I was fortunate that IBM offered me this opportunity.”

“A Needle in a Haystack” 

Still collaborating intensively with the scientists in Yorktown Heights, Marc returned to Zurich in 2012 and last summer was promoted to manager of a global team.

“It is an absolute pleasure to work with such a talented team. Most of the team members are millennials, all of whom have PhDs in data mining, machine learning or behavioral modeling in areas like big data security analytics and malware analysis. So, we speak the same jargon and relate to each other,” Marc said.

Coming from the millennial generation of “digital natives,” Marc believes his team has an almost innate ability to understand the minds and strategies of today’s hackers.

“You have to think in a multidisciplinary way about how things are connected and how the attacker may capitalize on different attack vectors. If you look at how enterprises and organizations have changed in recent years, as they embrace new technologies such as cloud services and mobile, and employees are allowed to use their own devices (BYOD) for business purposes, the traditional perimeter-based security defense mechanisms become drastically less effective,” Marc said.

“Building curtain walls around a castle will not protect crown jewels that get distributed outside the castle.”

He continues, “One of the biggest challenges we face is that many of the attacks today are not discovered, and only a small fraction are eventually disclosed to the public. So how can anyone understand the latest trends? This is like looking for the needle in a haystack without even knowing what the needle looks like. There are only a few known examples out there and we need to study them carefully and extrapolate. The methodologies we devise have to be able to catch variants thereof as well as new types of variations.”

Looking Forward

Marc sees a number of interesting technologies emerging over the next five years as the real breakthroughs in cyber security.

“Understanding where the crown jewels are and who accesses them, how, when, and why will be a major focus for the future. This involves contextual models, and adaptive data protection by building fine-grained perimeters around the data, as well as real-time, historical, and predictive behavior monitoring using advanced machine learning analytics, such as techniques used in IBM Watson,” Marc said.

Searching for crystals in the Swiss mountains as a young boy, Marc could never have dreamed of having such a successful career.

Marc had this advice for other young researchers.

"Whichever path you take, it is important to be innovative and to constantly question the state of the art. And most of all: start identifying the missing bits, which you should embrace to make a difference."


IBM’s Zurich Lab Builds Up Systems Biology Team

Maria Gabrani (left) and Maria Rodriguez Martinez tackle cancer with Big Data

Recently, SystemsX.ch, the Swiss initiative in systems biology, announced that IBM Research - Zurich was accepted as a new partner organization within its framework, a significant achievement that makes it easier for IBM scientists to collaborate with the greater Swiss biological research community.

We recently spoke with Maria Gabrani and Maria Rodriguez Martinez, two of the systems biology scientists in Zurich, whose group is led by IBM Fellow Alessandro Curioni. The two are already involved in a number of SystemsX.ch projects.

Why is it important for IBM Research to be part of SystemsX.ch?

Maria Rodriguez Martinez: SystemsX.ch gives us an ideal framework for collaborating with a number of extremely high-caliber institutes across Switzerland, such as ETH Zurich, the University of Zurich, the University of Bern and the University Hospital Zurich. Previously, to participate we had to partner with a member organization, which worked, but being able to participate directly in calls is much more productive.

In addition, SystemsX.ch and the projects it funds are multi-disciplinary, which is ideally suited to our lab in Zurich, since our science covers everything from atoms to analytics.

Maria Gabrani: Unlike similar organizations, SystemsX.ch really enforces collaboration across different disciplines, which makes us as scientists think more broadly about how to solve a particular challenge. Maria and I are a good example of this. While I focus on image processing and pattern recognition techniques, she focuses on quantitative models to study molecular patterns in biology, including cancerous cells.

SystemsX.ch also keeps a strong pulse on global trends, and it aligns work consistently across Switzerland, so most of the organizations focus in the same direction, which should translate into faster results.

Govind Kaigala and the microfluidic probe
What are some of the SystemsX.ch projects at IBM?

MRM: There are several. One on-going project is led by my colleague Govind Kaigala who is collaborating with pathologists at the University Hospital Zürich to test a new prototype tool to accurately diagnose different types of cancer. This work is based on a technology developed by IBM scientists called a microfluidic probe, which is a microfabricated device to precisely shape, present and handle nanoliter volumes of liquids on biological surfaces.

Within our group, we have a PhD student starting in a few weeks who will work with Maria and me on another collaboration with the University Hospital Zürich specifically focused on prostate cancer. 

More specifically, we will analyze the genomic data of 40 patients who have been extensively profiled at the molecular level. We have information about their DNA, genes, mRNAs and proteins. The challenge is the sheer amount of data, most of which is not informative; the less relevant data needs to be removed while still keeping the models accurate. It’s a lot harder than it sounds.
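One common first step for this kind of pruning is variance filtering: genes whose expression barely varies across patients are unlikely to help distinguish molecular profiles. The sketch below uses synthetic data and is only an illustration of the idea, not the project’s actual pipeline.

```python
import numpy as np

def filter_uninformative(X, keep_fraction=0.5):
    """Keep only the most variable features (columns): genes whose
    expression barely changes across patients carry little signal."""
    variances = X.var(axis=0)
    k = max(1, int(X.shape[1] * keep_fraction))
    keep = np.sort(np.argsort(variances)[-k:])   # indices of top-k variances
    return keep, X[:, keep]

rng = np.random.default_rng(0)
# Hypothetical expression matrix: 40 patients x 6 genes.
X = rng.normal(0.0, 0.01, size=(40, 6))          # mostly flat, uninformative genes
X[:, 1] += rng.normal(0.0, 2.0, size=40)         # two genuinely variable genes
X[:, 4] += rng.normal(0.0, 3.0, size=40)

kept, X_small = filter_uninformative(X, keep_fraction=1/3)
print(kept)   # only the two high-variance genes survive
```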

Another project, starting in April 2015, focuses on breast cancer. One of the challenges here is that cancer is a heterogeneous disease, meaning that many different types of cancer cells coexist in a tumor. When you sequence or measure single cells you get a random mixture of these cell populations, which makes it hard to understand the molecular alterations that characterize each population. Also, since there isn’t one particular type of cancer cell, it is difficult to treat with a single drug.

In this project, we will generate single-cell data, including the RNA and protein molecules, from breast cancer tissue. Then we will analyze both and attempt to reconstruct the structure of the cell populations. The final goal is to develop a statistical model that identifies which population of cells has the most relevant characteristics to target with treatment.
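The population-reconstruction step can be illustrated with ordinary k-means clustering on synthetic two-marker measurements. The actual project uses much richer statistical models, so treat this purely as a sketch of the underlying idea: cells from one tumor sample fall into distinct groups once clustered.

```python
import numpy as np

def kmeans(X, init, iters=20):
    """Minimal k-means (init = starting centers): a stand-in for the
    clustering step that separates mixed single-cell measurements
    into candidate cell populations."""
    centers = np.asarray(init, dtype=float)
    for _ in range(iters):
        # Squared distance of every cell to every center, then assign.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# Hypothetical 2-marker measurements from two cell populations in one tumor.
rng = np.random.default_rng(42)
pop_a = rng.normal([0.0, 0.0], 0.3, size=(100, 2))
pop_b = rng.normal([5.0, 5.0], 0.3, size=(100, 2))
X = np.vstack([pop_a, pop_b])

labels, centers = kmeans(X, init=[X[0], X[100]])
print(len(set(labels[:100].tolist())), len(set(labels[100:].tolist())))  # 1 1
```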

Another project, with IBM scientist Jasmina Bogojeska, is focused on the human immunodeficiency virus, commonly known as HIV. After cocktails of antiretroviral drugs turned a deadly infection into a chronic, life-long treatable disease, focus has shifted towards the ultimate goal in the fight against HIV-1: eliminating the virus from its host. The SystemsX.ch project has the ambitious goal of defining the major host and viral factors that play a role in the pathomechanisms of HIV-1 latency.

MG: Systems biology is a quintessential Big Data problem. The data is sparse and underdetermined. We have developed novel techniques that enable us to detect the most relevant patterns that describe the full set. These techniques haven’t yet been applied in this area; exploring them is part of the project, with the goal of finding the patterns most informative for the molecular phenotypes at hand, enabling us to decipher and observe the heterogeneity. The result is more agnostic analysis and, in that respect, better individualized treatment recommendations for patients.

How does your research fit in with other IBM projects like Medical Sieve?

MG: Yes. Medical Sieve, which is being developed with colleagues in our Almaden, Haifa and Australia labs, is an image-guided informatics system that acts as a filter to sort out the essential clinical information physicians need to know about a patient for diagnosis and treatment planning.

The primary focus of Medical Sieve is analyzing radiology images, such as CT and MRI scans. In the context of systems biology, we focus on digital pathology and gene-expression imaging, which is different. We want to model the changes cells undergo during pathogenesis for better disease stratification and, by integrating our models with Maria’s omics work, for predicting disease progression. In the discussions I have had with the Medical Sieve team, they see our work as an additional input to the imaging technology that makes the cognitive system even broader.

We can also potentially extend our method to the cytology-based prototype cancer diagnosis tool that Govind is currently developing for his SystemsX.ch project.


From Internship to Professor: Cryptographer Looks Back at the Birth of Identity Mixer

IBM researchers today announced Identity Mixer, a cloud-based technology that holds potential to help consumers better protect online personal data.

Dr. Anna Lysyanskaya
The cryptographic algorithm encrypts the certified identity attributes of a user, such as their age, nationality, address and credit card number, in a way that allows the user to reveal only selected pieces to third parties. The result: consumers don’t lose any data, and businesses don’t have to worry about securing it.
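Identity Mixer itself relies on zero-knowledge proofs over special signature schemes, but the selective-disclosure idea can be illustrated with a much simpler (and much weaker) salted-hash commitment scheme. Everything below is an illustrative sketch of the concept, not Identity Mixer’s actual construction; all names and values are hypothetical.

```python
import hashlib
import secrets

def commit(attributes):
    """Issuer side: commit to each attribute with a random salt.
    (A signature over the digests, omitted here, would bind them to the issuer.)"""
    salts = {k: secrets.token_hex(16) for k in attributes}
    digests = {k: hashlib.sha256((salts[k] + str(v)).encode()).hexdigest()
               for k, v in attributes.items()}
    return salts, digests

def reveal(attributes, salts, keys):
    """User side: disclose only the chosen attributes, plus their salts."""
    return {k: (attributes[k], salts[k]) for k in keys}

def verify(disclosed, digests):
    """Verifier side: recompute digests for the disclosed attributes only;
    the undisclosed ones stay hidden behind their hashes."""
    return all(
        hashlib.sha256((salt + str(v)).encode()).hexdigest() == digests[k]
        for k, (v, salt) in disclosed.items()
    )

creds = {"age_over_18": True, "nationality": "CH", "card": "4111-XXXX"}
salts, digests = commit(creds)
disclosed = reveal(creds, salts, ["age_over_18"])   # hide nationality and card
print(verify(disclosed, digests))                   # True
```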

Dr. Anna Lysyanskaya, a professor of computer science at Brown University, co-invented the technology with IBM cryptographer Dr. Jan Camenisch. The two worked together on Identity Mixer more than a decade ago when Anna was a summer intern at IBM's Zurich Lab, publishing a number of seminal papers on anonymous credential systems.

Today, on Data Privacy Day, we caught up with Anna to look back and to hear about her current research.

While I know it was some time ago, can you reflect on your internship at IBM Research in Zurich and share how it helped prepare you for your career?

Anna: I originally wanted to spend a summer in Zurich because I just wanted to mix it up, to take a summer break. Little did I know that it would lead to a collaboration with Jan and a research breakthrough that has been supremely important to my research career.  

Did you know back then how important privacy would be 10-15 years later?

Anna: Yes, it was pretty clear to me even back then that, unless we take serious steps to adopt privacy-protecting technologies, all our activities could easily be tracked. 

Do you have any anecdotes or stories about when you and Jan were developing the idea for Identity Mixer?

Anna: A pretty funny one is that we initially thought, towards the end of the summer in 1999, that anonymous credentials, which would eventually be called Identity Mixer, was a pretty straightforward idea given the prior work both of us had done. So when I came back in the summer of 2000, we figured we should work that one out quickly just to tie loose ends from the previous summer, and then move on to other, more challenging problems. I guess we are still tying those loose ends, because we are still working on anonymous credentials.  

Now that we have looked back, what are you currently working on?

Anna: Jan’s and my most recent collaboration, also with Anja Lehmann and Gregory Neven of IBM Zurich, is on password-authenticated secret sharing, which we nicknamed the Memento Protocol, after the Christopher Nolan film of the same name.

Here, we considered a scenario where users’ data is backed up by a collection of servers, chosen by each user so that the user is relatively certain they won’t all conspire against him or her. We showed that all a human user really needs to remember in this setting is a short password, the same one every time with no need to ever change it, in order to gain secure access to the data. This work appeared at the most recent CRYPTO conference.
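The Memento Protocol layers password authentication on top of threshold secret sharing; plain Shamir secret sharing, sketched below, already shows the underlying property that no single backup server can recover the user’s data alone. This is a textbook illustration, not the protocol from the paper.

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is modulo P

def split(secret, n, t):
    """Split `secret` into n shares; any t of them reconstruct it (Shamir).
    The shares are points on a random degree-(t-1) polynomial f with f(0)=secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789, n=5, t=3)        # five servers, any three suffice
print(reconstruct(shares[:3]) == 123456789)
print(reconstruct(shares[2:]) == 123456789)
```

With fewer than three shares, every candidate secret is equally likely, which is exactly why no lone server (or pair of colluding servers) learns anything.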

Other things I have been working on range from non-interactive zero-knowledge proofs, to physically unclonable functions, to, yes, more anonymous credentials.

What will online privacy look like five years from now?

Anna: Hopefully we have, with the recent stories of data breaches, reached a point where large corporations understand that they need to protect the privacy of their data and their users. So this may lead to better security; whether five years is soon enough is not clear to me at this point, but I hope so.   

In my opinion, a missing ingredient is leadership. I think IBM can show leadership in educating the industry on what can be done, how to do it, and how not to do it. It is already doing this to a large extent, and hopefully can do more.

Another missing ingredient is education, and not just for undergraduates, but also for practitioners. Here at Brown we are working on a Master's program that will consist of a mix of on-campus and remote learning, and will teach executives what they need to know about security, privacy, and related law and policy. We are very excited about this!

Join Anna, Jan and other experts today, 28 January at 10:00 AM New York (16:00 Paris) for a live Tweet Chat about Identity Mixer and privacy technologies. Use #identitymixer. For details visit http://ibm.biz/identitymixer


What are the sounds of the city saying?

IBM develops analytics and classification technology to provide data for a new kind of multimedia-based search engine

Editor’s note: This posting was authored by Zvi Kons, researcher in the Speech Technologies group at IBM Research - Haifa

When you walk down a busy street, do you ever notice the sounds that surround you? People, traffic, music; city sounds are often like the foreign language the couple next to you at the café is speaking—background chatter. That city buzz, though, together with related visual images, has the potential to generate a continuous stream of information that can indicate real-time dynamics of the city.

To gather, process, analyze and ultimately separate useful sound from white noise, my team at IBM Research-Haifa is working on new technology for searchable audio analysis as part of the EU-funded project called SMART (Search Engine for Multimedia Environment Generated Content).

We’re developing algorithms and an engine to analyze those city sounds, extracting information that can be cross referenced with video images to generate real-time content. Our research on audio classification is an integral aspect of a new kind of internet search engine that could provide locally oriented, readily available and informative content with practical applications. 

Capturing the sights and sounds of city streets to gain insight

Our team collected data from two locations in Santander, Spain. Because the municipality is a partner in the SMART project, they offered to support the technical aspects of the infrastructure needed and are helping test the technology. Cameras and microphones set up in the town square and market area provided continuous audio and visual data of normal daily activity for one month, collecting more than 1,000 hours of data. We analyzed the sounds to note various types of activities, and to identify patterns and anomalies, like peak hours for busy crowds in the market square, traffic, and special events.  

Santander city square

Visual representation of weekly audio from the city square

The audio from the video above, combined with other recordings, produced this diagram showing the weekly crowd activity level: blue for low activity, red for high activity.
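A diagram like this can be computed by reducing the audio to per-window energy and averaging it into an hours-by-days grid. The sketch below uses synthetic audio on a compressed timescale (the real system works on a month of continuous recordings), so all parameters here are illustrative.

```python
import numpy as np

def activity_grid(samples, sample_rate, secs_per_hour=3600):
    """Reduce mono audio to a (days x 24) grid of mean RMS energy per hour,
    the kind of summary the weekly heat map visualizes."""
    win = sample_rate                              # 1-second analysis windows
    n = len(samples) // win
    frames = samples[: n * win].reshape(n, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))      # one energy value per second
    usable = rms[: (len(rms) // secs_per_hour) * secs_per_hour]
    per_hour = usable.reshape(-1, secs_per_hour).mean(axis=1)
    days = len(per_hour) // 24
    return per_hour[: days * 24].reshape(days, 24)

# Hypothetical compressed week: 10 samples/s and 5 "seconds" per hour,
# with louder crowd noise around midday.
rng = np.random.default_rng(1)
sr, sph = 10, 5
amp = np.array([1.0 if 11 <= h % 24 <= 14 else 0.1 for h in range(7 * 24)])
samples = rng.normal(size=7 * 24 * sph * sr) * np.repeat(amp, sph * sr)

grid = activity_grid(samples, sr, secs_per_hour=sph)
print(grid.shape)                 # (7, 24)
print(grid[0, 12] > grid[0, 3])   # midday is louder than 3 a.m.
```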

Another sample detected a day with unusual crowd noise, music, and applause. By cross-referencing with video footage from nearby street cameras, it turned out to be from a protest rally on a nearby street, which could be important information for analyzing any immediate security risk, or the need to send a news team to report on a developing story.

Listen to the mid-day rally as it passed on the top right corner of the frame:

Applying audio analytics

The sounds of privacy

To address potential privacy and legal issues, the SMART team used wide angles and low resolution for the video cameras. The microphones were placed at a distance to pick up crowd noise rather than intelligible speech or individual conversations.

The idea behind SMART’s new multimedia-based search engine is the incorporation of information gleaned from the environment. We can use data from city sounds and video images, as well as social media like tweets, to identify events and situations in real time and make that information available online. The sounds of the city can help identify a drunken brawl, a spontaneous demonstration, a musical event, or an accident during rush hour. This kind of readily available information could be valuable for security systems, municipal and media use, and helpful knowledge for city residents.

Our research highlights the enormous potential of easily accessible information in our physical surroundings. The technology to use that information has exciting and practical applications for smart cities, with innovative ways to interpret sounds and images.


The IBM Mainframe: The Machine that Keeps Making History

by Dr. John E. Kelly III, Senior Vice President, Solutions Portfolio and Research

In December, I had the opportunity to take part in a tribute to one of the great architects of the IBM mainframe, Erich Bloch.

In 1985, President Ronald Reagan presented Bloch and two of his IBM colleagues, Fred Brooks and Bob Evans, with the first U.S. National Medal of Technology and Innovation. The award cited "their contributions to the IBM System/360, a computer system and technologies that revolutionized the data processing industry."

Jim Collins, author of Good to Great, ranks the S/360 as one of the all-time top three business accomplishments, along with Ford’s Model T and Boeing’s first jetliner, the 707. That’s pretty exclusive company.

The S/360 evolved over time into what we know today as “z Systems,” now 50 years young. Today, IBM launched the latest, most powerful iteration of the mainframe: the z13. This remarkable technology continues to power the platforms upon which entire industries and much of our global economy depend.

What sets the z13 apart? The same thing that differentiated its predecessors from the competition: the world-class IBM research and development inside every machine. There is no greater font of innovation in our industry.

That’s critical, because clients today require systems of insight. They must bring together massive amounts of data — customer data, enterprise data, mobile, social, streaming and genomic data — and analyze all of it to provide insights we’ve never had before. And they must do it securely.

IBM Research contributed significantly to achieving these goals in the z13, especially in the architecture and design of the processor and memory-cache systems, optimizing the technology for performance gains. Research results also factored into the high-security crypto devices inside, strengthening hardware protection.

We are now entering a new era of cognitive computing, in which computers will increasingly analyze problems in more human-like ways. And so it’s fair to ask: will the mainframe still be relevant in this context?

The answer is a resounding “yes.” In fact, it may be one of very few systems that can make this leap. The perfect combination to perform this task is mainframes and cloud, sometimes attached, sometimes one and the same where the mainframe is the cloud. 

As part of this transformation, we are bringing the computation to the core data. z13 is designed to speed the mobile transaction experience by allowing organizations to conduct analytics on the mainframe without the need to offload data to other systems. z13 "native analytics" can handle billions of transactions per day while analyzing data in real-time. It delivers faster, more personalized services with new levels of fraud protection. This is a competitive differentiator made possible by IBMers in Research.

We’re already starting to infuse cognitive sensing technologies inside the machines. We’ll continue to embed our most advanced learning machine technology into the mainframe, not only to gain greater efficiencies, but also to extract better and deeper insights for our clients from the tremendous amounts of data they possess. 

All of this innovation is accelerating on z13. In just the past year, IBM has received more than 500 patents on mainframe technology. There are companies that would be thrilled to have 500 patents in their entire portfolio.

This is innovation with a purpose. We listened to clients from retail to healthcare, from global finance to transportation, from energy to government. They told us what they needed to drive their industries forward and to serve their customers and constituents better. We took that feedback and brought together the brightest minds, thousands of IBMers who are passionate about solving the world’s greatest challenges, to design a system that is unmatched.

Every new generation of the IBM mainframe is state-of-the-art technology that has the power to transform industries and society. The z13 is the latest product of this remarkable legacy.

The revolution continues. 

Tune in to the Livestream event on Wednesday, January 14, from 2 – 4:30 p.m. ET (US) when IBM will share a whole new generation of IBM z Systems; and watch the new z Systems movie: New Possibilities (beginning 11:30 a.m. ET on January 14) to find out how today's IBM mainframe is built for the needs of today’s digital economy.


Validating a (hack-free) web experience

Patent verifies information flow downgraders  

Hacks, phishing, and other malicious cyberattacks all stem from bugs in a software program’s code. Because no one can manually check the millions of lines of code behind a mobile app or a website, developers use verification software to check their work. Even then, the analysis traditionally trusts that the defenses the developer installed were implemented correctly, and so it can miss vulnerabilities.

The patent process

What I see most often is someone with a great idea, but who doesn't think of it in terms of a patentable idea. As a Master Inventor, I want to help my colleagues move those ideas to patents. – Marco Pistoia

It’s also about point of view. Many file patents that describe what they have done. But what’s more valuable is the abstract, creative use of the idea. – Omer Tripp

This is where IBM Master Inventors Omer Tripp, PhD, and Marco Pistoia, PhD, focused patent #8,635,602: Verification of information-flow downgraders. Filed in 2010, the invention aims to close the loop on code that, because of poorly implemented defenses, compromises sensitive data such as bank account numbers and passwords.

“I would say the patent is more valuable today (it was issued in 2014) because of the explosion in cloud and mobile technologies, which necessitate defenses that are more challenging to implement correctly than ever before. We’re all accessing websites that require sensitive information about ourselves, and sometimes the software asking for that information is vulnerable,” said Omer, who has filed 174 patent applications and been issued 64. In 2012, Omer earned more than 1 percent of IBM’s entire patent total of 6,478.

Omer and Marco set out four years ago to verify what software applications claimed to be secure “downgrader” code (the part of a software program that sanitizes or validates untrusted input to a website, or obfuscates and declassifies confidential data before its release). By developing a way to check a website’s information flow downgrader, they found that live, implemented code still had security holes. “Our tool simulates what developers have a hard time testing for, such as ‘double encoded’ input, or other odd combinations of validation routines,” said Marco, who has 148 patent filings and 72 patent issuances to his name.

Their ISSTA 2011 paper, “Path- and index-sensitive string analysis based on monadic second-order logic,” demonstrated vulnerabilities on several open-source websites, earning ACM SIGSOFT’s Distinguished Paper Award.

Trust and verify

Downgraders take untrusted input, like details entered into a website login form, and help make it trusted. They sanitize the information by stripping out certain unintended, and potentially malicious, characters and substrings. But they can be fooled by recursive nesting of the payload and other clever tricks. The invention detects when a downgrader incorrectly allows (or rejects) input, and it can be integrated into standard analysis tools.
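A small sketch of the “double encoded” input problem mentioned above: a downgrader that URL-decodes only once can be bypassed, while decoding to a fixpoint before sanitizing closes the hole. The sanitizers here are illustrative toys, not the patented analysis.

```python
from urllib.parse import unquote

def naive_sanitize(value):
    """A broken downgrader: decodes once, then strips angle brackets."""
    decoded = unquote(value)
    return decoded.replace("<", "").replace(">", "")

def robust_sanitize(value):
    """Decode until a fixpoint so nested encodings can't smuggle a payload."""
    prev = None
    while value != prev:
        prev, value = value, unquote(value)
    return value.replace("<", "").replace(">", "")

payload = "%253Cscript%253E"          # '<script>' URL-encoded twice
once = naive_sanitize(payload)        # still '%3Cscript%3E': nothing stripped
print(unquote(once))                  # one more decode and the browser sees '<script>'
print(robust_sanitize(payload))       # 'script', with the brackets removed
```

This is exactly the class of defect that is hard to catch by testing with ordinary inputs, which is why automated verification of the downgrader itself matters.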

"Developers do validate that their software works as it should. But what’s often left unchecked are the inputs to the software – how the average person checks Twitter, or applies for a bank account,” Marco said. “We can now analyze those inputs to make sure the defenses a developer puts in place work, or alternatively, have errors.”  

“Now, we want to connect this invention with others we’re working on in this area, namely tools that automatically fix broken defenses. This would help developers check their code, and their downgraders,” Omer said.

More about IBM's 2014 patent leadership