Of the nearly 10,000 entries submitted to the 15th Annual Webby Awards, fewer than 10% were named Official Honorees, a distinction that recognizes work of outstanding caliber.
Software researchers at the IBM Research Visual Communications Lab in Cambridge, Massachusetts, designed the website Many Bills “to make congressional legislation easier to digest.” For its work in simplifying congressional bills, the site was named an Official Webby Honoree.
The inspiration for Many Bills came, in part, during the team’s 2009 Transparent Text symposium. Attendees from the nonprofit MAPLight wanted to visualize the seemingly unrelated subjects that are often written into legislation, a practice used to help gain votes from colleagues, among many other reasons. MAPLight used the example of a credit card reform bill (HR 627*, sponsored by Rep. Carolyn Maloney, D-NY) that included a section about firearms in national parks.
“Guns in parks, in the same context as credit card reform, seemed a bit off topic,” said researcher Irene Ros. “So, we began thinking of ways to visualize these potential inconsistencies.”
Reading and learning from many, many bills
Legislation is public record. Anyone can find “firearm” and “national park” in HR 627 – after 13,761 words, on page 32, in section 512, paragraph (a), sub-paragraph (3). Digital copies are available at the Library of Congress’ website, Thomas, which the site GovTrack.us aggregates to make bills downloadable in bulk.
Bills are categorized by the Congressional Research Service, an arm of the Library of Congress. Many Bills goes a step further: it breaks each bill into its constituent sections and assigns a topic to each one. It assigned HR 627’s individual sections to the categories “economics,” “education,” and “natural resources.”
Many Bills “learned” how to decipher and categorize this legislative lingo by digesting 10 years’ worth of past bills using a machine learning toolkit called MALLET.
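To make the idea of learning topics from past bills concrete, here is a minimal sketch of a section classifier. It is not the team's code and does not use MALLET (a Java toolkit); it swaps in a small naive Bayes classifier written with Python's standard library, and the training snippets, function names, and topic assignments are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_sections):
    """Count words per topic from (text, topic) training pairs."""
    word_counts = defaultdict(Counter)  # topic -> word -> count
    topic_counts = Counter()            # topic -> number of training sections
    for text, topic in labeled_sections:
        topic_counts[topic] += 1
        word_counts[topic].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, topic_counts, vocabulary


def classify(text, model):
    """Return the topic with the highest smoothed log-probability."""
    word_counts, topic_counts, vocabulary = model
    total_sections = sum(topic_counts.values())
    best_topic, best_score = None, float("-inf")
    for topic in topic_counts:
        score = math.log(topic_counts[topic] / total_sections)
        total_words = sum(word_counts[topic].values())
        for word in tokenize(text):
            # Add-one smoothing so unseen words do not zero out a topic.
            count = word_counts[topic][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


# Invented training snippets, for illustration only (not real bill text).
TRAINING = [
    ("credit card issuers must disclose interest rate increases", "economics"),
    ("consumer lending fees and annual percentage rates", "economics"),
    ("firearms may be carried in national parks and wildlife refuges", "natural resources"),
    ("protection of park land and wildlife habitat", "natural resources"),
]

model = train(TRAINING)
print(classify("limits on credit card interest rate fees", model))
print(classify("possession of firearms in a national park", model))
```

With this toy training set, the first query is labeled "economics" and the second "natural resources". A real system would train on far more text, as the 10 years of past bills mentioned above suggest.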
Users improve Many Bills
Searching for a topic in Many Bills not only shows which bills contain that term or terms, but also highlights the terms within the bill – making it easy to find the term “firearm” in HR 627, for example.
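The search-and-highlight behavior described above can be sketched in a few lines. This is an assumption-laden illustration, not the site's implementation: the function names, the `**` marker, the choice to match any word that starts with a search term, and the sample section text are all hypothetical.

```python
import re


def build_pattern(terms):
    """Compile a case-insensitive pattern matching words that start with any term."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(?:" + alternatives + r")\w*", re.IGNORECASE)


def highlight(text, terms, marker="**"):
    """Wrap every match in the marker so it stands out in the section text."""
    pattern = build_pattern(terms)
    return pattern.sub(lambda m: marker + m.group(0) + marker, text)


def search(bills, terms):
    """Return {bill_id: [highlighted sections]} for sections containing at least one term."""
    pattern = build_pattern(terms)
    results = {}
    for bill_id, sections in bills.items():
        hits = [highlight(s, terms) for s in sections if pattern.search(s)]
        if hits:
            results[bill_id] = hits
    return results


# Placeholder section text, invented for this example (not quoted from any bill).
BILLS = {
    "HR 627": [
        "Card issuers must give notice of rate increases.",
        "A person may possess a firearm in a national park.",
    ],
    "Hypothetical bill": [
        "Funding for school lunch programs.",
    ],
}

print(search(BILLS, ["firearm"]))
```

Only the second section of the first bill is returned, with the word "firearm" wrapped in the marker; bills with no matching section are left out of the results.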
Many Bills also allows users to identify “misfits”: text within a bill that goes beyond minor category deviations and into entirely different topics (such as HR 627’s section 512). And because Many Bills may not always classify a bill correctly, the site offers a crowdsourcing feature that lets users flag incorrect topics and “misfit” labels.
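One simple way to think about “misfit” detection is to compare each section's topic with the topic that dominates the bill. The sketch below is a deliberately naive heuristic, not the algorithm Many Bills uses; the `RELATED` table, the function name, and the list of section topics are hypothetical.

```python
from collections import Counter

# Hypothetical table of topics treated as minor deviations from one another.
RELATED = {
    "economics": {"education"},
}


def find_misfits(section_topics, related=RELATED):
    """Return indices of sections whose topic is unrelated to the bill's most common topic."""
    dominant, _ = Counter(section_topics).most_common(1)[0]
    # The dominant topic and anything listed as related to it are not misfits.
    allowed = {dominant} | related.get(dominant, set())
    return [i for i, topic in enumerate(section_topics) if topic not in allowed]


# Invented per-section topics for a single bill.
topics = ["economics", "economics", "education", "economics", "natural resources"]
print(find_misfits(topics))  # [4]
```

Here the "education" section counts as a minor deviation, while the "natural resources" section is flagged. User feedback of the kind described above could be used to correct both the topic labels and the flags.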
We wanted to create a place for people to go and easily look at legislation that is otherwise arcane, baroque documentation.
Everyone from journalists and students to armchair policy wonks can search, study, comment on, and share legislation that affects subjects they care about. Investigating the legislative record of an incumbent representative or senator around election time has never been easier.
This visualization technology has broad applicability in analyzing any large set of documents, such as the tax code, lengthy business documents and processes, or state legal statutes: any place where the sheer volume of text is an impediment to its digestion.
For now, IBM hopes that Many Bills will help make a seemingly endless ocean of legislation more useful and accessible to everyone. Many Bills’ technology is another example of how analytics can help us understand and organize large data sets.
The team presented Many Bills at the ACM CHI Conference on Human Factors in Computing Systems in April. In July, graduate student intern Elif Aktolga of UMass-Amherst will present her improvements to Many Bills’ “misfit” detection algorithm at SIGIR 2011.
*Note: The example of HR 627, which passed into law in the 111th Congress, was used only to demonstrate Many Bills’ technology. No judgment or political statement is intended.
Labels: congress, legislation, machine learning