Now that I am trying to get to a regular blogging schedule, I realized that I have not yet wished my readers a happy new year. Although I am traditionally late with these kinds of things, I suppose it is now too late to wish you all a very happy 2010. But perhaps it is not too late to wish you all a great new decade?
I think that 2010 could be the beginning of a beautiful decade. The Decade of Data, perhaps? There have been so many data-related developments over the last couple of years that I tend to believe a lovely stage has been set.
During the last decade, several books that are closely related to data availability have become immensely popular. Freakonomics may be the most prominent example of this new kind of book. It combines a popular way of writing about advanced statistical techniques with applications to interesting sets of data. Ian Ayres’ Super Crunchers perhaps takes this approach even further, both by describing more of the nature of the applied statistical techniques (experiments and regression analysis) and by making the most of the increasing availability of data.
The improvements in data analysis (including those mentioned above, but also more academic innovations) are perhaps outpaced only by the improvements in data availability. For instance, Hans Rosling promotes the public availability and use of large amounts of data, and does so by providing the public with the means of creating mesmerizing graphics. See more on the website Gapminder.org.
Data collection is one thing, but data maintenance is something completely different. Gary King recognized the speed at which data become inaccessible and how heterogeneous data formats are, and decided to initiate the Dataverse Network. The Dataverse Network is a server-based approach to storing, managing, and sharing the data resulting from the countless surveys and experiments performed in science. I think it is an impressive attempt to help (academic) researchers find and share their data.
Governments, too, are trying to open up the collections of data their decisions are based upon. Think about the possibilities of using these data, either for checking your government or for (other) academic purposes! For instance, in the USA, government databases are made public on data.gov. From their website:
As a priority Open Government Initiative for President Obama’s administration, Data.gov increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government. Data.gov provides descriptions of the Federal datasets (metadata), information about how to access the datasets, and tools that leverage government datasets. The data catalogs will continue to grow as datasets are added. Federal, Executive Branch data are included in the first version of Data.gov.
The above serves merely as a few examples of the exciting developments regarding the public availability of data. I will continue to write about this, both detailing the examples given above and adding more lovely examples. An overview of the data I find interesting is collected here.