“Simple” threshold algorithm earns Gödel Prize

Editor’s note: this article is by IBM Fellow Ronald Fagin.

Lauded for its “elegant mathematical properties,” the threshold algorithm that I developed with two colleagues led to our writing a paper (Optimal Aggregation Algorithms for Middleware) that won the 2014 Gödel Prize, which is awarded jointly by the Association for Computing Machinery (ACM) Special Interest group on Algorithms and Computation, and the European Association for Theoretical Computer Science. The award is given annually for a breakthrough paper in theoretical computer science. Today, the algorithm is used in applications and systems that demand optimal results for gathering multi-sourced information. But its origin, and the paper’s, goes back to 1996.

Praise for threshold algorithm

The paper provides a framework to design and analyze algorithms where aggregation of information from multiple data sources is needed, such as in information retrieval and machine learning. In these situations, the threshold algorithm offers a very efficient method for producing a single unified list of the ‘top k’ results from the combined data sources.

The threshold algorithm’s elegant mathematical properties and simplicity are particularly suitable for use in middleware, software that is often used to augment computer operating systems that support complex, distributed applications. The authors also introduced the notion of instance optimality, an extremely strong guarantee of performance, and showed that the threshold algorithm is instance optimal. The paper’s groundbreaking results have built a foundation for much follow-on research.
In 1996, I wrote a paper about an efficient data aggregation algorithm (incidentally, referred to as “Fagin's algorithm”) that gathers information from multiple sources into a single top-k list (like gathering a “top 10” list within a given set of data).

In 2001, my colleagues, Moni Naor and Amnon Lotem, and I developed a new, improved aggregation algorithm, which we called the threshold algorithm. It can be used when aggregation of information from multiple data sources is needed, to produce a top-k list. We showed that it is optimal in an extremely strong sense, in that no other algorithm can access less data than the threshold algorithm does and still obtain the correct answer.

Honestly, I was worried that the paper might be rejected because the algorithm was too simple. It took only 10 lines on a page to explain! But in our 2001 paper, we called it “remarkably simple” – sort of like turning a bug into a feature – and submitted the paper about it to the premier database theory conference, the ACM Symposium on Principles of Database Systems (PODS). The Occam’s razor approach worked. The paper was not only accepted, but won the Best Paper Award!  It went on to win the Test-of-Time Award at PODS a decade later, and is now the Gödel Prize’s first database paper, 13 years later (the Gödel Prize committee considers papers within a 14 year window).

The algorithmic setting where it is desired to find a top-k list for information that is aggregated, while minimizing the number of database accesses, has proven to be an extremely useful one. The threshold algorithm (or a variation of it) has now been applied in a large number of settings, including databases (relational, multimedia, semi-structured, probabilistic, and spatial), semantic web, information retrieval, distributed sensor networks, and product design. In fact, according to Google Scholar, the paper has now been cited almost 1,500 times.

Ron Fagin is an IBM Fellow, which is IBM's highest technical honor. There are currently 87 active IBM Fellows, and there have been only 257 IBM Fellows in the 51-year history of the program. Ron received his B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics from the University of California at Berkeley. He has won an IBM Corporate Award, eight IBM Outstanding Innovation Awards, an IBM Outstanding Technical Achievement Award, and two IBM key patent awards. He is a Fellow of IEEE, ACM, and AAAS (American Association for the Advancement of Science). He has co-authored three papers that won Best Paper Awards and three papers that won Test-of-time Awards, all in major conferences. He was named Docteur Honoris Causa by the University of Paris, and a "Highly Cited Researcher" by ISI (the Institute for Scientific Information). He won the IEEE Technical Achievement Award, IEEE W. Wallace McDowell Award, and ACM SIGMOD Edgar F. Codd Innovations Award (a lifetime achievement award in databases). He is a member of the US National Academy of Engineering and the American Academy of Arts and Sciences.

Labels: , , ,