BlueSNP scales statistical genetics studies

New open source tool could help researchers and hospitals find the chromosomal needle in the haystack using the processing power of computer clusters.

Editor’s note: This article is by Dr. Robert Prill, IBM Research - Almaden.

At the genome level, everyone is 99.5 percent identical. It is the half percent difference that may hold the key to understanding many diseases and give clues about potential new treatments for common and rare diseases. At millions of places in the human genome, the DNA spelling (the nucleotide arrangements of adenine, guanine, thymine and cytosine) varies from person to person. Single-letter spelling differences between individuals are called single nucleotide polymorphisms, or SNPs (pronounced "snips").

A new open source tool from IBM Research called BlueSNP lets genetics researchers harness computer clusters to rapidly analyze vast numbers of people's SNPs and diseases to discover the genetic factors influencing disease predisposition.

The standard statistical genetics method for this type of analysis is a genome-wide association study (GWAS), which involves sifting through more than one million SNPs in hundreds to thousands of people to discover the handful of SNPs that alter the risk of getting a particular disease. The method works by identifying the places in the genome where people who are afflicted by a disease tend to have a certain DNA spelling, while people who are healthy tend to have a different DNA spelling. When a disease-associated SNP is in or near a gene (the genetic instructions to make a protein), it can lead to a hypothesis about how certain nucleotide spelling combinations affect disease risk and could lead to new medicines.
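
The per-SNP comparison described above can be sketched in a few lines of Python. This is only an illustration, not BlueSNP's code: it assumes a simple allelic chi-square test on a 2x2 table of allele counts (cases versus controls), and the function name and the counts are made up for the example.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Hypothetical helper for illustration. Returns (chi2, p_value).
    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were unrelated to disease status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: the alternate spelling is more common among cases.
chi2, p = allelic_chi_square(case_alt=600, case_ref=1400,
                             ctrl_alt=400, ctrl_ref=1600)
print(f"chi2={chi2:.2f}, p={p:.3g}")
```

A GWAS repeats a test of this kind for every SNP, which is why the work grows with both the number of SNPs and the number of diseases studied.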

Translational GWAS -- from laboratory research to health system analytics

Two converging trends are transforming GWAS from a method intended for relatively small-scale research studies (one disease, thousands of people) to an analytic tool that scales to the patient population of an entire health system (thousands of diseases, hundreds of thousands of people). 

The first trend concerns identifying patient groups, which is the first step of GWAS. Analytics for electronic medical records now make it possible to efficiently identify groups of patients satisfying a myriad of inclusion and exclusion criteria. For example, a healthcare analyst can identify the thousands of people across a health system who have diabetes but not many additional health problems, and also identify thousands of people who don't have diabetes but are otherwise similar to the diabetic group (i.e., matched controls).
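
As a rough sketch of this kind of cohort selection, the Python below picks cases by inclusion and exclusion criteria and then pairs each case with a similar control. Everything here is hypothetical: the record fields, the limit on additional diagnoses, and the matching rule (same sex, age within five years) are assumptions chosen for illustration, not criteria taken from any real health system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Patient:
    # Hypothetical, heavily simplified medical record.
    patient_id: str
    age: int
    sex: str
    diagnoses: frozenset = field(default_factory=frozenset)


def select_cases(patients, disease, max_other=1):
    """Include patients with the disease; exclude those with many other problems."""
    return [
        p for p in patients
        if disease in p.diagnoses and len(p.diagnoses - {disease}) <= max_other
    ]


def match_controls(cases, patients, disease, max_age_gap=5):
    """Greedily pair each case with one unused, similar patient without the disease."""
    used = set()
    pairs = []
    for case in cases:
        for cand in patients:
            if (cand.patient_id not in used
                    and disease not in cand.diagnoses
                    and cand.sex == case.sex
                    and abs(cand.age - case.age) <= max_age_gap):
                used.add(cand.patient_id)
                pairs.append((case.patient_id, cand.patient_id))
                break
    return pairs


patients = [
    Patient("p1", 54, "F", frozenset({"diabetes"})),
    Patient("p2", 61, "M", frozenset({"diabetes", "asthma", "copd"})),
    Patient("p3", 52, "F"),
    Patient("p4", 70, "F", frozenset({"asthma"})),
    Patient("p5", 47, "M", frozenset({"diabetes", "asthma"})),
    Patient("p6", 45, "M"),
]

cases = select_cases(patients, "diabetes")
pairs = match_controls(cases, patients, "diabetes")
print([c.patient_id for c in cases], pairs)
```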

The second trend is that the cost of DNA sequencing and related genomic data acquisition technologies is rapidly dropping. The DNA sequencing industry is on track to achieve the 2013 goal of a $1,000 genome – about the cost of a dental crown. Measuring just the SNPs, a low resolution type of DNA sequencing, already costs far less. In genomics, the bottleneck is data analysis, not data generation.

Open Source BlueSNP: Clinical bioinformatics is big-data science

IBM Research’s BlueSNP open source software can help genetics researchers apply the GWAS method to analyze many more people and diseases than previously possible. Based on the R language and Hadoop, it runs standard GWAS calculations on Hadoop clusters. Because BlueSNP is open source, the genetics research community can adapt the code to answer new questions that have not yet been asked. BlueSNP makes analyzing thousands of diseases as easy as analyzing one disease, and by using thousands of compute cores, results come just as fast. When combined with electronic health records data, it can open a pathway to genomics-based personalized medicine.

Figure 1. SNPs located at the proximal end of chromosome 4 (arrow) exceed the threshold for genome-wide significance (line), indicating that the DNA in this region of the human genome is associated with the disease under investigation.
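
The article does not say how the significance line in the figure was set. A common convention, assumed here purely for illustration, is a Bonferroni correction: divide the usual 0.05 cutoff by the number of SNPs tested. With the one million SNPs mentioned earlier, that gives the following sketch (function names are hypothetical):

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """P-value cutoff after correcting for n_tests independent tests."""
    return alpha / n_tests


def is_genome_wide_significant(p_value, threshold):
    return p_value < threshold


n_snps = 1_000_000          # "more than one million SNPs" in a typical GWAS
threshold = bonferroni_threshold(0.05, n_snps)

# Association plots usually show -log10(p), so the cutoff becomes a horizontal line.
line_height = -math.log10(threshold)

print(threshold, round(line_height, 2))
```

Under this assumption, a SNP is flagged only when its p-value falls below roughly 5 in 100 million, which is why so few of the million tested SNPs rise above the line.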

BlueSNP's full source code is available for download, here.


A scientist for a smarter planet: Bruno Michel

IBM Research scientist Bruno Michel is a classic intellectual omnivore, with scientific interests ranging from semiconductors and computer system design to solar energy, biology and water cooling systems. Here’s a video profile in which Bruno describes his career path, his motivations and some of his ideas for saving the planet...

Read more about his work on the IBM Smarter Planet blog.


Seeing the potential in digitized Braille

IBM Fellow Chieko Asakawa recalls the painstaking process of translating her textbooks to Braille when she was in college. No Braille textbooks were available, so family members read while Chieko translated using a Braille typewriter. Translating an English textbook required her family members to spell out each word – which could take 20 to 30 minutes per page because Braille is “written” by punching holes into paper.

When she joined IBM Research – Tokyo in 1985, Chieko dedicated her work to the digitization of Braille. Twenty-seven years later, the Japan Braille Library has recognized Chieko for that dedication with the ninth Honma Kazuo Bunka Award.

Digitizing Braille

Chieko started by collaborating with Braille libraries and volunteer groups from across Japan to advance the digitization project. In 1988, the group launched an inter-library Braille network in Japan with the goal of putting Braille books online.

Honma Kazuo Bunka Award

The award was founded by the Japan Braille Library in 2004 in recognition of its blind founder, Kazuo Honma, who devoted his life to making books available to the blind, helping to improve their quality of life and realize an inclusive society.

The Braille translation network works like this: volunteers install Chieko’s digital Braille editor onto a PC and share their translation data online. Making the data available online allowed Braille libraries to easily share books, helped reduce duplication of work by volunteers, and allowed volunteers to split translation projects among a number of different teams in different locations.

The network was initially hosted by IBM Japan. Today, it is operated by Sapie Library, a nationwide online library managed by the Japan Braille Library and the National Association of Institutions of Information Service for Visually Impaired Persons of Japan – making Braille books available anytime, anywhere.

Chieko continued to improve Web accessibility for the blind by developing a talking web browser that converts text on Web pages to speech. Home Page Reader, developed in 1997, allowed a visually impaired person to surf the Internet by spoken word. It was capable of reading web pages in American or British English, French, German, Italian, Spanish, Japanese and other languages, and has since been rolled up into IBM’s Easy Web Browsing project.

Chieko continues to broaden her research scope beyond visual impairment by looking at the further integration of computer technology and human knowledge. She is now working on how crowdsourcing can help create technologies that everyone can use and benefit from. 


Coding the human heart

Dr. Jeremy Rice and Cardioid.
Editor’s note: This article is by Dr. Jeremy Rice, a computational physiologist at IBM’s Thomas J. Watson Research Center, as told to Chris Nay, IBM Research Communications.

The Journal of the American College of Cardiology reported in 2006 that about two of 1,000 people, worldwide, die of a ventricular arrhythmia every year – the most common cause of sudden cardiac death. Predicting who will die suddenly from a ventricular arrhythmia is a huge challenge, but current computer simulations that could help cardiologists find effective therapies take hours or even days to run a single heartbeat.

To address this need for faster simulations, my team at IBM Research is working with the Lawrence Livermore National Lab on Cardioid, a code that simulates the human heart on the 20-petaflop Blue Gene/Q, Sequoia. The 96-rack installation at LLNL can run Cardioid roughly 1,200 times faster than other published results to simulate in exquisite detail the electrophysiology of up to three billion heart cells (similar to the number in a real heart) and their cell-to-cell electric coupling.

Our hope is that cardiac specialists could eventually model how their patients will react to certain drugs and how genetic variations predispose some patients to arrhythmias.

What’s happening in the heart

Cardioid runs fast enough to allow scientific inquiries that were previously impractical with the existing modeling platforms. For example, we have the potential to model natural variations seen across a real population of patients.

Every heartbeat electrically excites every heart cell (unlike skeletal muscle, which recruits more or fewer muscle cells depending on the demands). Each heart cell is a small excitable system, and electrical activation spreads in a saltatory fashion to its neighbors.

But during a heartbeat, there isn’t a clear separation between the activation of a cell and the interaction between the cells. Hence, the problem of a heart simulation needs a two-fold solution that requires a computer with the ability to track the individual cells and their interactions, up to 10,000 times for each heartbeat (the number of interactions between cells, per second). This inter-processor communication is exactly what Sequoia does well.

So, what does Cardioid, running on Sequoia, reveal?

The heart beats at a constantly changing rate, and we found that arrhythmia occurs much more often at elevated drug concentrations and at slower heart rates (termed bradycardia), or with abnormally skipped heartbeats. Our mathematical models can complement clinical work to allow cardiac specialists and researchers to extract more data from experimental studies.

A heart outside the body

“In 1946, Dr. John Gibbon, who built the first heart-lung machine 9 years earlier and went on to perform the first human heart bypass operation in 1953, began working with then-president of IBM Thomas Watson Sr. and five IBM engineers to build an improved heart-lung machine.”

Read the rest of the story, here.

Experiments don’t let us look inside a beating heart to see what is happening at the level of the individual cells. But our simulations let us predict the individual cellular responses. In fact, our simulations suggest arrhythmias can arise from subtle and complex cellular interactions that cannot be resolved with experimental techniques, now or in the foreseeable future.

Building a billion heart cells

Models of single cardiac cells have been developed since the 1960s, and three-dimensional anatomic (or geometric) models of various cardiac structures have been developed since the 1990s. And our work, with the University of Rochester, last year showed that models can greatly complement experimental and clinical data to reveal more about how genetic variations affect a person’s susceptibility to arrhythmia. However, this work only modeled single cells or small networks of cells that lack the complexity of the whole heart.

IBM & LLNL Collaborate

IBM and LLNL announced the Deep Computing Solutions collaboration. It gives businesses access to HPC systems and to IBM and LLNL scientists and engineers for simulations like Cardioid’s.

Our work to model the entire heart actually started in 1998, harnessing Sequoia’s early predecessor, the Blue Gene/L. Matthias Reumann, a post-doctoral student at IBM’s Thomas J. Watson Research Center at the time (now at IBM Research – Australia), developed a code to decompose the heart into small pieces and distribute the work to up to 32,000 processors – an unprecedented number at the time.
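
The idea of decomposing the heart into pieces and handing each piece to a processor can be sketched as follows. This is a deliberately simplified, one-dimensional illustration and not the original code: the real decomposition works on three-dimensional anatomy, and the function names and the cell count below are made up.

```python
def block_ranges(n_cells, n_workers):
    """Split cell indices 0..n_cells-1 into contiguous, near-equal blocks.

    Returns a list of (start, stop) half-open ranges, one per worker.
    """
    base, extra = divmod(n_cells, n_workers)
    ranges = []
    start = 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional cell each.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def halo_neighbors(rank, n_workers):
    """Workers that own adjacent blocks and must exchange boundary values."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_workers]


# Made-up cell count, split across the 32,000 processors mentioned above.
ranges = block_ranges(1_000_000, 32_000)
print(ranges[0], ranges[-1], halo_neighbors(0, 32_000))
```

Each worker updates only its own cells, then trades boundary values with the neighbors returned by `halo_neighbors`. That exchange is the inter-processor communication step that has to happen many times per simulated heartbeat.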

Cardioid is today’s version of that code. It can scale to more than 100 times as many cores as the original – and runs about 13,000 times faster overall on Sequoia. This power means simulations of between 180 million and three billion heart cells, depending on the level of spatial detail we want to model.

Cardioid runs fast enough to allow scientific inquiries that were previously impractical with the existing modeling platforms. We have the potential to model natural variations seen across a real population of patients. For example, we could measure drug effects highly dependent on individual characteristics such as age, gender, and disease history.

Also, patients’ drug concentrations constantly change from the time the drug is administered until it eventually leaves their bodies (or another dose is administered). Simulating these time scales was simply impractical before Cardioid.

We hope to see Cardioid used by pharmaceutical and medical device companies to help with clinical decision support in treating deadly diseases, such as arrhythmias and congestive heart failure. Longer-term applications could include virtual drug trials over simulated patient populations.

This video shows how a medication can impact the heart to promote arrhythmias. Here we see a block of the heart wall in a common experimental preparation, with the simulated drug on the left and the control case on the right. The red color shows activated tissue, and the blue shows recovered tissue. The interface between red and blue shows a wavefront of activation that spreads from cell to cell. The two sequential stimulations (applied at the location of the boxes on the left surface) produce an arrhythmia known as a spiral wave, where the wavefront can continually spin. Without the drug, the pattern dies out quickly, whereas with the drug, the cycle is sustained and is potentially fatal.


Cryptographer, Change Agent, Anita Borg Award Winner

Maria Dubovitskaya is a PhD student of cryptography at the IBM Research – Zurich Lab and a member of the IBM Academy of Technology. She recently won the Anita Borg Change Agent Award in recognition of her technical leadership and efforts to encourage women to pursue scientific and technical fields. 

Tell us about the PhD in efficient cryptographic protocols for privacy protection that you are pursuing.

Maria Dubovitskaya: The majority of electronic transactions involve querying databases such as when buying goods online, buying media content, or retrieving medical records. Strict access controls are required in order to perform secure operations on sensitive data. At the same time, more and more users want to minimize the amount of information the service provider can glean from a transaction.

For example?

MD: Take the pharmaceutical industry. A pharma company's database search queries can divulge a lot of information about its research strategy and future product plans. Likewise, businesses have a strong interest in keeping their patent queries hidden because such records can easily reveal a company's sensitive business strategies.

You mentioned the healthcare sector. Tell us more about its need for data security.

MD: A hospital database contains patient medical records. Controls need to be in place that allow only relevant medical personnel access to a patient's record. Given the frequent changes in medical staff at hospitals, a role-based or attribute-based approach supporting revocation of users is an obvious solution.

What people may not realize is that the mere query pattern for a particular record may reveal considerable information about, say, the seriousness of a patient's condition or the phase of the treatment. Even the access control policy in itself can divulge sensitive information about a patient's illness, just by containing a list of his or her treating specialists.
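
As a minimal sketch of the role-based approach with revocation mentioned in this answer, the Python below grants access through roles and lets a departing staff member be revoked in one step. The class and method names are hypothetical. This plain check is not a privacy-preserving protocol: the database still sees who reads which record, and the policy itself lists the roles treating a patient, which are exactly the leaks described above.

```python
class RecordAccessPolicy:
    """Toy role-based access control with user revocation (illustrative only)."""

    def __init__(self):
        self._user_roles = {}    # user -> set of roles held
        self._record_roles = {}  # record id -> set of roles allowed to read it
        self._revoked = set()    # users whose access has been withdrawn

    def assign_role(self, user, role):
        self._user_roles.setdefault(user, set()).add(role)

    def allow(self, record_id, role):
        self._record_roles.setdefault(record_id, set()).add(role)

    def revoke(self, user):
        # One step removes all access, whatever roles the user held.
        self._revoked.add(user)

    def can_read(self, user, record_id):
        if user in self._revoked:
            return False
        held = self._user_roles.get(user, set())
        allowed = self._record_roles.get(record_id, set())
        return bool(held & allowed)


policy = RecordAccessPolicy()
policy.assign_role("dr_a", "cardiology")
policy.assign_role("nurse_b", "oncology")
policy.allow("record-17", "cardiology")

before = policy.can_read("dr_a", "record-17")
policy.revoke("dr_a")
after = policy.can_read("dr_a", "record-17")
print(before, after, policy.can_read("nurse_b", "record-17"))
```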

What about trendy new mobile apps that reveal the user’s location?

MD: These are nothing but vast databases with records indexed by their location. So, for example, people looking for a nearby restaurant currently have no choice but to reveal their location to the service provider. More and more people find this an objectionable invasion of privacy.

What solution do cryptographers envision?

MD: We are addressing the need for efficient protocols that provide oblivious and anonymous access to a database, while preserving expressive (and possibly hidden) access control, and supporting revocation of users and payments.

What motivated you personally to work in cryptography?  

MD: It's an exciting application of mathematics because it solves timely problems. There are so many uses — including secure e-banking or e-payments, in fact any secure transaction over the Internet — that wouldn’t be feasible without cryptography.

And it’s more than that; it's a very powerful instrument. For example, let’s say you need to prove that you have a driver’s license in order to perform a certain Internet transaction. Cryptography allows you to do that, but without revealing your exact date of birth or other irrelevant personal data — at IBM, we're working on a solution called Identity Mixer that does this.

Maria Dubovitskaya's Change Agent Award acceptance speech.

As a female scientist, have you encountered any obstacles along the way?

MD: Not at IBM I haven’t. IBM really lives its diversity policy and I feel that I’m challenged just as much as my male colleagues. I’m neither coddled nor discriminated against. Unfortunately this isn’t the case everywhere.

At my university in Russia, there were only some six or seven women in my class of about 90 students. That’s why I am so active in student outreach programs, especially for young women.

Maria's outreach activities include founding a Women in Technology group in Russia, and participating in IBM's Exploring Interest in Technology and Engineering (EXITE) for middle-school girls.

How can we build well-balanced teams if so many girls drop out of technical career tracks either right after high school, or university, or even after working for a few years?

MD: My outreach activities and especially winning the Anita Borg Change Agent Award have given me additional incentive to mentor girls and support them from school age through senior career stages.

These programs encourage curiosity to pursue a career in technical fields and give girls an opportunity to make a fully informed choice of career prospects. This includes building up their confidence and fostering their ability to face up to the challenges. It's very rewarding, but also a lot of responsibility.

Speaking of these types of programs, tell us about the Grace Hopper Celebration of Women in Computing.

MD: At first I was a bit skeptical about a women-only conference. I’ve learned that many technical women in the US and India welcome this format, but Europeans have been less than enthusiastic. But it turned out to be an excellent and very useful event. More than 3,600 women and girls attended!

The tracks and workshops included career development and various technical tracks. In fact, security and social networks were two of the main technical topics. Talks were given by the best and most senior women in the field. It was a great opportunity for networking, and included an extensive job fair. All the top IT companies — including IBM, of course — were there evaluating resumes and conducting interviews. It really made a believer out of me.

What advice would you give to young women interested in pursuing a career in research and technology?

MD: Let me emphasize that I would give the following advice not just to women, but to young people in general: Believe in yourself and pursue what you are curious and passionate about. Be professional and work hard. This will give you the confidence you need to succeed.

In addition, be persistent: Don't give up if something doesn't work out. 


60 Seconds with an IBM scientist

Who: Diego Alejandro Ortiz-Yepes
Location: IBM Research - Zurich
Nationality: Colombian

Focus: Computer Scientist focused on Mobile Security

"Every day new security vulnerabilities are reported for PCs, which will eventually impact mobile technologies. Our clients in banking and government need to proactively stay ahead of these attacks to keep their data safe." 

"So the team I am on develops new levels of security to protect mobile transactions of today and the future -- which range from transferring money to hopefully someday, e-voting."

"And about my hair: After I left Colombia, I started dyeing my hair during high school in New Zealand. Ever since then, every time I take a vacation I dye it a different color. So far it has been red, orange, blonde, green, pink and now purple. Next up: IBM blue."

Insider Tip:

"If you use your mobile device for security-sensitive operations, such as accessing, editing or storing confidential or private information, or using financial services, make sure that you only install and run apps that come from well-known trusted sources. When installing an app on your device, it never hurts to carefully review the permissions that it requires to make sure that it is not being granted more permissions than absolutely necessary."

To see Diego's real hair color and his publications and patents, click here.


30 Years in Japan

Editor's note: This article is by IBM Research - Tokyo Director Norishige (Noly) Morimoto.

Opening in 1982, IBM's research lab in Tokyo was its fourth – and first in Asia. IBM saw Japan as an important market in which to invest in research, at a time when information science and advanced computer technologies were emerging as hot research topics for major IT companies.

Past IBM Research - Tokyo Directors

1982-86 Hisashi Kobayashi
1986-95 Norihisa Suzuki
1995-2000 Kazuo Iwano
2000-04 Yoichi Takao
2004-06 Kazushi Kuse
2006-09 Hiroshi Maruyama

Hisashi Kobayashi, who was working at IBM Research's Thomas J. Watson Research Center in Yorktown Heights, NY, was appointed director of the new lab (initially called the IBM Japan Science Institute). Dr. Kobayashi placed a keen focus on natural language processing to develop a kana-kanji conversion program, and Japanese speech recognition and handwritten kanji character recognition technologies.

But the lab also expanded into computer science, engineering and manufacturing technologies – including image and graphics processing technology, kanji-input systems, communication networks, software engineering, VLSI (Very Large Scale Integration) design, parallel processing architecture, advanced workstations and artificial intelligence.

I am the lab's seventh director. And 30 years later, we're still making innovative moves -- sometimes literally. For example, in June of this year, we moved from Yamato City in Kanagawa Prefecture to a new office in Toyosu, Tokyo to be closer to our clients and partner institutions. Our science and technology team also just moved to Shin-Kawasaki (close to Tokyo's Haneda Airport) to advance research collaboration in nano-devices with The University of Tokyo.  

Today, our lab focuses on business analytics, industry solutions, and workload optimization system software. And we continue to develop breakthrough text analysis technology and accessibility technology.

Our text mining technology is used across industries such as manufacturing, finance, insurance, broadcast, telecommunication and retail. In the era of mobile communication, social networking and Big Data, we are broadening our accessibility research scope to study how analytics and collaboration technologies help advance the information access capabilities of elderly, illiterate, and disabled people to help them take an active role in our society. 

I look forward to what our lab will continue to accomplish for our clients and the world.