LegiNation: A Case of Classifying Bills

Feb 29, 2012

LegiNation:

A recent internet start-up that provides tools to help consumers find, track, and share information about bills, legislators, and related topics from all 50 states, LegiNation was founded to make state-level legislation more accessible to professionals and, ultimately, to the public at large. LegiNation will provide two major services: BillTrack50, which serves people with a financial or other interest in legislation, and Chatterslate, a social venue designed to encourage discussion, collaboration, and community.


Business Challenge:

At the state level, new bills and amendments are introduced every day. LegiNation founder and president Karen Suhaka knew that gathering these files (there were 140,000 in 2011 alone) from all 50 states into one database would take a lot of work, and that was only half the battle. Once the bills were collected, she would need to summarize, classify, and create keywords for each document, so she began looking for an outside vendor to help complete the process.

Vendor of Choice: Pingar

Founded in 2007 and headquartered in New Zealand, Pingar's mission is to transform unstructured data into business intelligence by developing technology that uses text mining and natural language processing. The company has strong research roots and ongoing relationships with several universities, and in March 2011 it launched its API, which works with any existing enterprise content management system.


The Problem in Depth:

One of the defining characteristics of the American government is its ability to change and evolve. New bills and amendments are introduced at both the state and national level every day, and while there are comprehensive resources for those wishing to track legislation at the national level, the same cannot be said for state-level legislation. "There aren't many very good resources available for the public to see what kind of bills and what legislators are doing at a state level," says Karen Suhaka, president and founder of LegiNation. To remedy this, LegiNation is compiling a searchable index of legislation from every state and making all of this data available to the public. "We are trying to gather it all together, get it into a consistent format, make it searchable, and let people look."

This process isn't easy, though. With 50 states all producing different bills and amendments, it is difficult to keep up. "The 50 states are all dramatically different. It is part of the beauty of the United States, but it is also a challenge when you are trying to gather national data," says Suhaka. For Suhaka, though, finding the information is only part of the challenge. "First we find what has been introduced the previous day, and if any amendments have come out or any actions have happened, or votes have been taken. That is the metadata. Then we get the actual file from the state. We gather all that up and we try to, this is where most of the work comes from our side, process all of that." LegiNation must convert all of this information into a consistent format, XML, and from there the data must be analyzed and classified to make it searchable, something that Suhaka acknowledges could not be done by one person.

In 2011, LegiNation collected over 140,000 bills. "In order to create tools that people could use for searching and categorizing, we needed to analyze the text. When you are talking big data, it's not really big data, but it is still quite a large number of documents. You could not reasonably read all the files and pick keywords out of them and summarize them by hand. That is not practical for an affordable service like ours," says Suhaka. Because of the number of documents involved, LegiNation began searching for an outside vendor to help.

The Solution:

According to Megan Tobin, Pingar's VP of marketing and Americas, Suhaka's use of the Pingar API was something the company had never seen before. "While we definitely believe that better classification of information, the ability to find and use that information is important, often purchasers of the tools are motivated by government compliance reporting and things like that. This is somebody who had a really interesting and creative idea who was using our software in a way that we hadn't ever seen, so it was pretty enticing."

"I looked at a number of services and Pingar was definitely the easiest one for me to implement," says Suhaka. Pingar also "had a free sandbox where you could go and make sure it works, so that was pretty interesting." Pingar's API, as Tobin explains, "has about 20 different components so you can use what you want or the entire platform." With it, Suhaka is able to "just pass them an entire bill text and get back a list of keywords. And then that is part of the data we provide to people and part of the data that we use internally."

Pingar's services are split into three groups: "There is the content analysis, rapid discovery, and entity analysis. Content analysis helps you summarize and complete research on projects, look at a file or file store and comes up with themes and then summarizes those themes. Entity extraction is the piece that helps run the contextual metadata within documents and also within file stores and helps you classify that information."

Once LegiNation sends a web request, Pingar "searches through a document and picks out metadata. It picks out names, addresses, organization names, specific topics. It can look through a single document or large files, or a drive or a company's SharePoint. It doesn't just look at if something is a person's name, but it also looks at how it is being used in that paragraph and document. A company name may also sound like a person's name, it will identify that and look at how it is being used, and then classify that by how it is being used," explains Tobin. "It is that combination of the analytic piece and machine learning and natural language processing that make it very powerful and very strong."

After receiving Pingar's list of keywords and scores, LegiNation stores the keywords in a separate table, runs a final pass to clean up the data and calculate its statistics, and then makes the bills available online for people to search and view. "When you search for a topic, you'll get a little grid that shows all the bills that meet your keywords. One of the columns is the keywords we got back from Pingar, and that is great because you can now filter by the keywords but also, it kind of suggests to you words you may not have thought of," says Suhaka.
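The keyword table and filter grid Suhaka describes can be illustrated with a minimal sketch. It assumes nothing about Pingar's actual API or LegiNation's real schema: the table names, the `(keyword, score)` result format, and the sample bill IDs are all hypothetical, and an in-memory SQLite database stands in for whatever LegiNation actually uses. The sketch stores keyword results in their own table, filters bills by keyword, and, as one possible way to "suggest words you may not have thought of," lists other keywords that appear on the matching bills.

```python
import sqlite3

# Hypothetical sample data: made-up bill IDs and (keyword, score) pairs,
# standing in for whatever a keyword-extraction service might return.
SAMPLE_BILLS = [
    ("CO-HB1001", "CO", "Water rights bill"),
    ("TX-SB22", "TX", "Groundwater management bill"),
    ("CA-AB5", "CA", "Agricultural irrigation bill"),
]
SAMPLE_KEYWORDS = {
    "CO-HB1001": [("water rights", 0.91), ("irrigation", 0.72)],
    "TX-SB22": [("water rights", 0.65), ("groundwater", 0.80)],
    "CA-AB5": [("groundwater", 0.55), ("irrigation", 0.40)],
}


def build_db(bills, keyword_results):
    """Load bills into one table and their keywords into a separate table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bills (id TEXT PRIMARY KEY, state TEXT, title TEXT)")
    conn.execute(
        "CREATE TABLE bill_keywords ("
        "bill_id TEXT REFERENCES bills(id), keyword TEXT, score REAL)"
    )
    conn.executemany("INSERT INTO bills VALUES (?, ?, ?)", bills)
    for bill_id, pairs in keyword_results.items():
        # Normalize keywords to lowercase so filtering is case-insensitive.
        conn.executemany(
            "INSERT INTO bill_keywords VALUES (?, ?, ?)",
            [(bill_id, kw.lower(), score) for kw, score in pairs],
        )
    return conn


def bills_with_keyword(conn, keyword):
    """Return IDs of bills tagged with `keyword`, highest score first."""
    rows = conn.execute(
        "SELECT bill_id FROM bill_keywords WHERE keyword = ? "
        "ORDER BY score DESC, bill_id",
        (keyword.lower(),),
    )
    return [row[0] for row in rows]


def suggest_keywords(conn, keyword, limit=5):
    """Suggest other keywords that co-occur on bills matching `keyword`."""
    rows = conn.execute(
        """
        SELECT other.keyword, COUNT(*) AS n
        FROM bill_keywords AS hit
        JOIN bill_keywords AS other
          ON other.bill_id = hit.bill_id AND other.keyword != hit.keyword
        WHERE hit.keyword = ?
        GROUP BY other.keyword
        ORDER BY n DESC, other.keyword
        LIMIT ?
        """,
        (keyword.lower(), limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db(SAMPLE_BILLS, SAMPLE_KEYWORDS)
    print(bills_with_keyword(conn, "Water Rights"))
    print(suggest_keywords(conn, "water rights"))
```

Keeping keywords in a separate table, rather than as a column on each bill, lets one bill carry any number of scored keywords and makes both the filter and the co-occurrence suggestion plain SQL queries.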

The Result:

According to Suhaka, Pingar's "API was really simple. It only took a couple hours for us to start submitting our first bills from our database and get back to our database." Suhaka didn't run into any technical issues along the way. "It is just really easy to build around. None of the other services that we played with did we actually get to work at the volumes we wanted to do. It's a solid technology. It is quick. I can legitimately process all the bills that have changed in the day overnight."

LegiNation is currently in a trial period, during which anyone can look up legislation in their state. Though the service is just getting started, with Pingar's help Suhaka has "lots of hopes and dreams." Once the base functionality is done, she says, "we'll be able to show trends, and charts, and pie charts, and all kinds of stuff based on the keyword stats. I think that will be a ton of fun."

LegiNation already has a fan in Pingar. "I think it is a really interesting use of our technology, it is definitely a more consumer-based use than we had ever envisioned," says Tobin. "We want people to download our API and build applications with it, so we are very excited about this because it certainly shows that it can be used for applications that we have envisioned and ones that we haven't."