Lauded for its “elegant mathematical properties,” the
threshold algorithm that I developed with two colleagues led to our writing a
paper (*Optimal
Aggregation Algorithms for Middleware*) that won the 2014 Gödel Prize,
which is awarded jointly by the Association for Computing Machinery (ACM)
Special Interest Group on Algorithms and Computation Theory and the European
Association for Theoretical Computer Science. The award is given annually for a
breakthrough paper in theoretical computer science. Today, the algorithm is
used in applications and systems that demand optimal results for gathering
multi-sourced information. But its origins, and the paper’s, go back to 1996.

**Praise for the threshold algorithm:** The paper provides a framework to design
and analyze algorithms where aggregation of information from multiple data
sources is needed, such as in information retrieval and machine learning. In
these situations, the threshold algorithm offers a very efficient method for
producing a single unified list of the top-*k* results from the combined data
sources. The threshold algorithm’s elegant mathematical properties and
simplicity make it particularly suitable for use in middleware, software that
is often used to augment computer operating systems that support complex,
distributed applications. The authors also introduced the notion of instance
optimality, an extremely strong guarantee of performance, and showed that the
threshold algorithm is instance optimal. The paper’s groundbreaking results
have built a foundation for much follow-on research.

In 1996, I wrote a paper about an efficient data aggregation
algorithm (which came to be known as “Fagin’s algorithm”) that gathers information
from multiple sources into a single top-*k*
list (like producing a “top 10” list from a given set of data).

In 2001, my colleagues Moni Naor and Amnon Lotem and
I developed a new, improved aggregation algorithm, which we called the
threshold algorithm. It can be used whenever information from multiple data
sources must be aggregated to produce a top-*k* list. We showed that it is
optimal in an extremely strong sense (“instance optimality”): no other
algorithm can access less data than the threshold algorithm does and still be
guaranteed to obtain the correct answer.
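The mechanics can be sketched roughly as follows: read each source in sorted order of grade, look up each newly seen object (by random access) in the other sources to compute its aggregate score, and stop once *k* objects score at least the “threshold” formed from the last grades seen in each list. Below is a minimal illustrative sketch, assuming each source is a list of (object, grade) pairs sorted by grade descending and a monotone aggregation function such as `sum`; the function name and data layout are my own, not from the paper.

```python
# Illustrative sketch of the threshold algorithm (TA). Assumes each source is
# a list of (object, grade) pairs sorted by grade in descending order, and
# that `aggregate` (e.g., sum or min) is monotone in each grade.
def threshold_algorithm(sorted_lists, k, aggregate=sum):
    # Random-access index per source: object -> grade.
    indexes = [dict(lst) for lst in sorted_lists]
    best = {}  # object -> aggregated score, for every object seen so far

    for depth in range(max(len(lst) for lst in sorted_lists)):
        # Sorted access: one object from each list at the current depth.
        for lst in sorted_lists:
            if depth >= len(lst):
                continue
            obj, _ = lst[depth]
            if obj not in best:
                # Random access: fetch this object's grade in every source.
                best[obj] = aggregate(idx.get(obj, 0.0) for idx in indexes)

        # Threshold: aggregate of the grades last seen under sorted access.
        tau = aggregate(lst[min(depth, len(lst) - 1)][1]
                        for lst in sorted_lists)
        top = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]
        # Stop once k objects score at least the threshold: by monotonicity,
        # no object not yet seen can beat them.
        if len(top) == k and all(score >= tau for _, score in top):
            return top

    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

With a handful of grade lists and `aggregate=sum`, the scan typically halts well before reading every entry; that early stopping, justified by the threshold, is where the algorithm’s efficiency comes from.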

Honestly, I was worried that the paper might be rejected
because the algorithm was too simple. It took only 10 lines on a page to
explain! But in our 2001 paper, we called it “remarkably simple” – sort of like
turning a bug into a feature – and submitted the paper about it to the premier
database theory conference, the ACM Symposium on Principles of Database Systems
(PODS). The Occam’s razor
approach worked. The paper was not only accepted, but won the Best Paper
Award! It went on to win the Test-of-Time
Award at PODS a decade later, and now, 13 years on, it is the first database
paper to win the Gödel Prize (the prize committee considers papers published
within a 14-year window).

Labels: Almaden, big data, godel prize, Ron Fagin