Public Data
Enhancing Data Discovery and Exploration
Benjamin Yolken (yolken@google.com)
June 2011
Overview
Disseminating public statistics
● Objectives and challenges
● Google's approach: Public Data Explorer
Conclusion
Objective
Make public statistics accessible, useful, and well-organized.
Public statistics (2)
Public statistics (4)
Accessible...
(1) Access: Data need to be online and findable
● Provider web sites
● Third-party aggregators
● Search engines
(2) Understanding: Statisticians aren't the only users
● Lay users: Teachers, students, journalists, policy makers
● Computers: Search engines
● If not accessible to non-experts, data can become unused or, worse, misused
Strategies:
● Easy access and reuse
● Documentation in plain language
● Clear organization / presentation
● Partnerships
Useful...
There are a lot of distractions today: tables and simple plots are not enough
Need to engage not just with users' eyes, but also their brains
Strategies:
● Interactive visualizations
● Easy export
● APIs
● Social media
Well-organized...
Go beyond flat lists of data...
● Topics
● Time periods
● Geographic regions
● Formats
● Languages, etc.
Ultimately, depends on having good metadata
Strategies:
● Metadata
● Improved organization / categorization
PDE Intro Video
Public Data Explorer (PDE)
Google's approach to making public data accessible, useful, and organized
Outgrowth of Google's acquisition of Trendalyzer
Stand-alone product: http://www.google.com/publicdata
Contains 40+ datasets from official data providers around the world
PDE: Metadata
Dataset Publishing Language (DSPL)
● Designed for interactive exploration and visualization
● Released under the open-source BSD license
● Combines data tables (CSV) with metadata (XML)
PDE: Dataset Creation and Upload
PDE: Visualization
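To illustrate the DSPL idea of pairing CSV data tables with XML metadata, here is a minimal Python sketch. It is an assumption-laden toy, not the DSPL format itself: the element names (`dataset`, `table`, `column`, `data`), the file name `population.csv`, and the sample figures are all hypothetical and do not follow the real DSPL schema. It only shows the general pattern of keeping the numbers in a plain table and describing that table in a separate metadata file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample rows; the figures are made up for illustration only.
ROWS = [
    {"country": "AA", "year": "2009", "population": "1000"},
    {"country": "AA", "year": "2010", "population": "1010"},
    {"country": "BB", "year": "2010", "population": "2500"},
]
COLUMNS = ["country", "year", "population"]


def build_csv(rows, columns):
    """Serialize the data table as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def build_metadata(table_id, csv_name, columns):
    """Describe the table in XML.

    The element names are simplified placeholders and do NOT
    follow the real DSPL schema.
    """
    root = ET.Element("dataset")
    table = ET.SubElement(root, "table", id=table_id)
    for name in columns:
        ET.SubElement(table, "column", id=name)
    ET.SubElement(table, "data", file=csv_name)
    return ET.tostring(root, encoding="unicode")


def columns_match(metadata_xml, csv_text):
    """Check that the XML column ids equal the CSV header, in order."""
    declared = [c.get("id") for c in ET.fromstring(metadata_xml).iter("column")]
    header = next(csv.reader(io.StringIO(csv_text)))
    return declared == header


csv_text = build_csv(ROWS, COLUMNS)
metadata_xml = build_metadata("population_table", "population.csv", COLUMNS)
print(csv_text)
print(metadata_xml)
print(columns_match(metadata_xml, csv_text))
```

Keeping the metadata separate from the data lets a tool check the two against each other, as `columns_match` does here, before a dataset is uploaded. A real dataset would follow the published DSPL schema rather than these placeholder names.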
Demo link
PDE: Embed
Demo link
Conclusion
Need to make statistics accessible, useful, organized
Google's Public Data Explorer is one potential tool
● We welcome your data!
● Good solutions / tools from others as well: Microsoft, Socrata, Tableau, IBM, etc.
Key advice: Think about the users, their needs
Really exciting area; we have only scratched the surface of what's possible
Thank you!
Questions?