project echelon: fighting government opacity with open source data mining
ECHELON is a project designed to make up for the U.S. government's shortcomings with regard to disclosures and open data. Because the data released is of poor quality, simple questions about the workings of our government are difficult to answer, and useful questions often can't be answered at all. ECHELON uses a variety of computational techniques and clever database design to overcome the hurdles of modeling the government inside a computer from the limited information publicly available.
Overarching generalities aside, ECHELON's main purpose is threefold. First, ECHELON takes data from the government and structures it in useful ways. This is accomplished by creating models derived from domain expertise in the given data and then inserting the data into Datomic, a curious database with a powerful query engine. Second, once the data has been loaded, ECHELON goes beyond just providing a better interface by enhancing the data through record linkage and information extraction techniques. In particular, this means that we can figure out that something named "Big Company Incorporated (formerly known as Small Company co.)" represents the same entity as "SMALL COMPANY INC.". Lastly, ECHELON aims to provide durable and reliable IDs for the various entities involved in the workings of the government.
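To make the record linkage idea concrete, here is a minimal sketch in Python of how the two names in the example above could be matched. It is not ECHELON's actual implementation. The suffix list, the alias pattern, and the function names (`normalize`, `match_keys`, `same_entity`) are all assumptions made for illustration. The sketch pulls any "formerly known as" alias out of the raw name, normalizes each name by lowercasing it and dropping punctuation and trailing legal suffixes, and treats two records as the same entity when they share a normalized name.

```python
import re

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "ltd", "limited", "llc",
}

# Hypothetical pattern for a parenthetical alias such as
# "(formerly known as X)" or "(f/k/a X)".
ALIAS_RE = re.compile(
    r"\((?:formerly known as|formerly|f/k/a|fka)\s+([^)]*)\)",
    re.IGNORECASE,
)


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_keys(raw):
    """Return every normalized name (primary plus aliases) found in a raw record."""
    aliases = ALIAS_RE.findall(raw)      # information extraction step
    primary = ALIAS_RE.sub(" ", raw)     # raw name with the alias removed
    return {normalize(n) for n in [primary, *aliases]} - {""}


def same_entity(a, b):
    """Link two raw records when they share at least one normalized name."""
    return bool(match_keys(a) & match_keys(b))


if __name__ == "__main__":
    a = "Big Company Incorporated (formerly known as Small Company co.)"
    b = "SMALL COMPANY INC."
    print(sorted(match_keys(a)))
    print(same_entity(a, b))
```

Exact matching on normalized names is the simplest possible linkage rule. A real system would likely need fuzzy matching and additional evidence, such as addresses or registration numbers, to avoid merging distinct organizations that happen to share a name.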
https://github.com/sunlightlabs/echelon/blob/master/README.md