IBM Research: Ultimate Big Data: Processing everything in the entire universe since time began

Professor Michael Garrett of Leiden University is allowed to be a little nonchalant about the current interest in Big Data. As a leading radio astronomer, he scans outer space for radio waves that might indicate intelligent life on other planets. He gathers, processes and analyzes as much data as possible from the entire known universe, going back to the beginning of cosmic time.

Now that’s Big Data.

In August, Professor Garrett was among 13 winners of $10,000 academic grants at the IBM Big Data and Analytics 2013 Faculty Awards. He has earmarked his prize money for ASTRON’s efforts in the Search for Extraterrestrial Intelligence (SETI), where it will be used to develop algorithms and data-intensive analysis techniques that will also be applicable to fields outside astronomy. The goal is to study time-variable radio phenomena using a range of innovative SETI algorithms developed at the University of California, Berkeley.

Between searching for E.T. and accepting the award, we had a chance to ask him some questions:

Q: Tell me about yourself. 

Michael Garrett: I am both a Professor at Leiden University and the General Director and Scientific Director of the Netherlands Institute for Radio Astronomy (ASTRON). Our main mission is to make discoveries in astronomy, particularly radio astronomy. So it’s an institute which is building radio telescopes, operating them, and also doing astronomy – analyzing the data that comes in from those telescopes.

We’re usually trying to push technology to the bleeding edge – in terms of both hardware and software. That’s one of the reasons why we are an interesting partner for IBM. In a sense, the telescopes that we are building are really pushing the state of the art in the amounts of data they produce – huge amounts of complex data.

So IBM sees this as an application that they can cut their teeth with. Looking towards the future, we expect that there are going to be huge sets of data to be analyzed, and that fits very well with IBM’s focus on Big Data.

Q: How much data are you handling?

MG: At the moment, we’re looking at 1,000 TB (1 petabyte), but that’s kind of at the limit of what can be done in data processing (not in storage). In the future we can generate much larger volumes of data, but we can’t process it all, and we certainly can’t store it. So that’s a big data problem in general.

And one of the instruments we’re working towards is the Square Kilometre Array (SKA), which we’re hoping to build in Australia and South Africa – a big international EUR 1 billion project – that’s expected to generate 10 or 100 or even more times the current global internet traffic in data terms.

So it’s a problem for us because we want to make use of that data to understand the universe better. But it’s an opportunity for a company like IBM to work with an entity that’s producing huge amounts of data now, rather than 10 or 20 years from now.

The commercialization of Big Data faces the same problem – companies all have big sets of data coming from a huge variety of sources and (like us) are trying to make sense of it, trying to see if there are correlations between different measurements that you can squeeze out of all this data, to see if there’s intelligence to draw out of this mass.

So there’s a lot of overlap in terms of handling these huge data sets and extracting information from them.

Q: So you’re not about to give me a world-breaking exclusive that you’ve found alien life?

MG: No. I wish I was! The search hasn’t really started. Although people have been doing these experiments for 50 years, they can only cover a very tiny fraction of the sky. And I think the advantage that we have now is the new radio telescopes that we’ve built, like LOFAR, which is being used today, and SKA, which will be built in the future. For the first time they allow us to look at the sky with this fantastic frequency resolution and time resolution.

So with these new telescopes we’re looking for these natural signals from cosmic objects, but also, in principle, these artificial signals. And there’s never been a better time to go out and look for them, with this new generation of telescopes.

Q: How will the award money be used?

MG: It’s mainly supporting a collaboration we have with an astronomer in Berkeley, California. He’s probably one of the few young radio astronomers who’s focused on SETI, and he’s an outstanding scientist.

We’re going to spend it on getting him over here to collaborate with us and to integrate his algorithms and techniques with what we are doing.