Public Data

Enhancing Data Discovery and Exploration

Benjamin Yolken (yolken@.com)

June 2011

Overview

Disseminating public statistics
● Objectives and challenges
● Google's approach: Public Data Explorer

Conclusion

Objective

Make public statistics accessible, useful, and well-organized.

Public statistics (2)

Public statistics (4)

Accessible...

(1) Access: Data need to be online and findable
● Provider web sites
● Third-party aggregators
● Search engines

(2) Understanding: Statisticians aren't the only users
● Lay users: Teachers, students, journalists, policy makers
● Computers: Search engines
● If not accessible to non-experts, data can become unused or, worse, misused

Strategies:
● Easy access and reuse
● Documentation in plain language
● Clear organization / presentation
● Partnerships

Useful...

There are a lot of distractions today, and simple plots are not enough

Need to engage not just with users' eyes, but also their brains

Strategies:
● Interactive visualizations
● Easy export
● APIs
● Social media

Well-organized...

Go beyond flat lists of data...
● Topics
● Time periods
● Geographic regions
● Formats
● Languages, etc.

Ultimately, this depends on having good metadata
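To illustrate why organization beyond a flat list depends on metadata, here is a minimal sketch of filtering a catalog by topic, region, time period, and language. The `DatasetRecord` fields, the `find_datasets` helper, and the catalog entries are all hypothetical and made up for illustration; they do not describe how Public Data Explorer stores or searches its datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record for one dataset in a catalog."""
    title: str
    topic: str
    region: str
    first_year: int
    last_year: int
    language: str


def find_datasets(catalog, topic=None, region=None, year=None, language=None):
    """Return the records that match every facet that was supplied."""
    matches = []
    for rec in catalog:
        if topic is not None and rec.topic != topic:
            continue
        if region is not None and rec.region != region:
            continue
        if language is not None and rec.language != language:
            continue
        # A year matches when it falls inside the period the dataset covers.
        if year is not None and not (rec.first_year <= year <= rec.last_year):
            continue
        matches.append(rec)
    return matches


# Made-up entries, not real datasets.
CATALOG = [
    DatasetRecord("Unemployment by region", "labor", "Europe", 1990, 2010, "en"),
    DatasetRecord("Population totals", "demographics", "World", 1960, 2010, "en"),
    DatasetRecord("Population par région", "demographics", "Europe", 2000, 2010, "fr"),
]

for rec in find_datasets(CATALOG, topic="demographics", year=1970):
    print(rec.title)
```

Without the topic, region, period, and language fields, the only option left would be scanning titles in a flat list, which is the situation the slide warns against.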

Strategies:
● Metadata
● Improved organization / categorization

PDE Intro Video

Public Data Explorer (PDE)

Google's approach to making public data accessible, useful, and organized

Outgrowth of Google's acquisition of Trendalyzer

Stand-alone product: http://www.google.com/publicdata

Contains 40+ datasets from official data providers around the world

PDE: Metadata

Dataset Publishing Language (DSPL)
● Designed for interactive exploration and visualization
● Released under the open-source BSD license
● Combines data tables (CSV) with metadata (XML)

PDE: Dataset Creation and Upload

PDE: Visualization
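To make the DSPL idea of pairing CSV data tables with XML metadata (see PDE: Metadata) more concrete, the sketch below reads a small CSV table and uses a separate XML description to type and label its columns. The XML layout, the `load_dataset` helper, and the sample rows are simplified assumptions for illustration only; this is not the actual DSPL schema, and the numbers are not real statistics.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata. This is NOT the real DSPL schema.
METADATA_XML = """
<dataset name="Example population data">
  <column id="country" type="string" label="Country"/>
  <column id="year" type="integer" label="Year"/>
  <column id="population" type="integer" label="Population"/>
</dataset>
"""

# Made-up rows for illustration, not real statistics.
DATA_CSV = """country,year,population
A,2009,100
A,2010,110
B,2010,50
"""

# Map the type names used in the metadata to Python converters.
CASTS = {"string": str, "integer": int, "float": float}


def load_dataset(metadata_xml, data_csv):
    """Combine an XML column description with CSV rows.

    Returns the dataset name, a column-id -> label mapping, and the rows
    with each value converted to the type declared in the metadata.
    """
    root = ET.fromstring(metadata_xml.strip())
    columns = {col.get("id"): col.attrib for col in root.findall("column")}

    rows = []
    for raw in csv.DictReader(io.StringIO(data_csv)):
        rows.append(
            {cid: CASTS[columns[cid]["type"]](value) for cid, value in raw.items()}
        )

    labels = {cid: meta["label"] for cid, meta in columns.items()}
    return root.get("name"), labels, rows


name, labels, rows = load_dataset(METADATA_XML, DATA_CSV)
print(name)
for row in rows:
    print({labels[cid]: value for cid, value in row.items()})
```

Keeping the metadata separate from the table means the CSV stays a plain, easily exported file, while the XML carries what a visualization tool needs to interpret each column.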

Demo link

PDE: Embed

Demo link

Conclusion

Need to make statistics accessible, useful, organized

Google's Public Data Explorer is one potential tool
● We welcome your data!
● Good solutions / tools from others as well: Microsoft, Socrata, Tableau, IBM, etc.

Key advice: Think about the users, their needs

Really exciting area; we have only scratched the surface of what's possible

Thank you!

Questions?