<<

Data for Libraries Principles, Excel, Data Studio, Tableau Agenda

● Library data - what data, why visualize, some examples ● Basic principles of , types of , good and bad ● Excel features ● Google Data Studio ● Tableau Public (Desktop) Trends, Patterns, Relationships

● Budget ● Circulation activity ● Collections ● Instruction ● E-Resource usage data (COUNTER, Proxy server, other) ● Interlibrary loan ● Services: virtual reference, staff-logged transactions, equipment checkouts (e.g. laptops), study rooms bookings ● Website activity (Google Analytics) ● Gate counts ● Other?

Metrics that Matter! Basic Principle of Data Visualization

Keep it Simple Which is easier to understand? Chartjunk

From Stephen Few Show Me the Numbers Kinds of Data ● Numeric data - "countable" = Metrics ○ e.g., FT uses of a journal ○ Can do arithmetic on - add, median/mean, min/max - meaningfully ○ may be absolute or percentage ○ don't be fooled by data that looks like numbers but isn't, see below ● Categorical data = Dimensions ○ labels like different collections of books: Stacks, Reference, Oversize ○ includes "nominal" data that looks like numbers but isn't really, like ISBNs, room numbers ○ may have a "natural" order like LC classes, rainbow colors ○ Often best to present in “most to least” ● Dates/times - can be metrics (e.g. median pub years) or dimensions (month of year); has natural order that should override “most to least” ● Ordinal data - rankings - the number does not represent a quantity of anything Basic Principles of Data Visualization Know what charts are best for what data

○ Bar charts - comparison of dimensional things (LC classes, types of patrons, databases) ■ horizontal bars for bigger labels ○ Line charts - trend over time (budget, circ or any service activity, e-resource use) ○ Scatter charts - correlation between two metrics variables, data point can be categorical ○ Histograms - bar charts that show frequency where the order of the bars is meaningful (book imprints grouped in decades) ○ Heat - visualize tiers of quantities across a matrix or literal ○ Pareto and Treemaps - proportions of 100% Basic Principles of Data Visualization Avoid any type of that displays 2-d area in data values

● Principle: people don't judge area as quantity very well - we're linear (length) ● pie charts/donut charts ● stacked bar and area charts ● cartoon figures as stylized "bars" ● radar charts (LibQual) Bar Charts ● Vertical or horizontal - horizontal best when labels don't show well in vertical ● Good for showing metric data in dimensions ● Always start Y scale at zero

Data from UPEI Library Website Google Analytics using Google Data Studio Grouped bar chart - time as dimension Bad "bar" charts

People see area, obscures linear values Stacked bar chart: How much work to see trends in each data set? Instead of stacked bar when lots of data Panel bar chart - keep the scales the same! Unit Charts - Bad! Which is easier to understand quickly? Line charts SJSU

Best for showing data with continuity, usually time, especially a few sets of related data

UPEI Library Service Desk Transactions Line Charts

● Usually start Y scale at zero or show it if go negative ● Intervals must be equal in size (e.g. years, months, quarters, etc.) ● Average slope of 45 degrees is often good

https://eagereyes.org/basics/banking-45-degrees Why stacked area is bad Look at the blue data - are Central sales going up? Radar Charts: Very bad!!

From UPEI's 2013 LibQual report. Showing contributions towards a whole

When your data should add up to 100%

● Pie charts - most well known, but not well used ● Treemaps ● Pareto charts Pie Chart Use rarely, with very few slices, data labelled

Video: Salvaging the Pie - http://www.darkhorseanalytics.com/blog/salvaging-the-pie Which helps you understand the data better?

Taken from Magnuson book Treemaps

Area (2d) properly represents proportion of 100% Pareto Charts

● Emphasizes most important contributions towards a total ● Data organized by frequency/largest to smallest ● Visualize cumulative percentage Histograms

Frequency summary of large raw data (graph version of pivot ) esp. for continuous data that needs to be grouped into ranges

Example: Ebook usage in EBSCO Ebook package by Publication Decade Scatter chart (or Scatter )

Often categorical data with two data variables to detect points are correlation or odd countries interactions between the two variables. Heat Map

To illustrate patterns in a set of data that can be arranged in 2D, often a matrix or map.

Reference Desk Transaction Counts Fall 2010-Spring 2012 When to just use a Table instead of a Graphic Chart

● Users want to see individual values, not just trends and patterns ● Users want to see precise values ● Values need to be shown in various groupings, e.g., subtotals ● Data includes different units, e.g. dollars, percentages, unit counts

Charts are best when it's the trend, pattern, or relationship among the data points, not their precise values, that matters most

Taken from Stephen Few, Show Me the Numbers Library dashboards

● What is a dashboard? ideally live data, not just static tables ● Why not? maintenance - keep fresh! ● Ball State U - basic stats - http://www.bsu.edu/libraries/dashboard/public.php ● Georgia Tech - http://www.library.gatech.edu/dashboard/ ● Portland State U - http://library.pdx.edu/about/library-data-dashboard/ ● Indiana University - Kokomo http://iuk.edu/library/about_library/dashboard/index.php ● SJSU - http://library.sjsu.edu/dashboard/dashboard ● NMSU - http://lib.nmsu.edu/dashboard/libuse.shtml (note submenus) Steps to Make Good Visualizations

1. what data you need and how to get it 2. Acquire raw data 3. Filter, refine, and process for use/ingest into charts/software 4. Ingest into display tools and create visualizations Plan what you need - what do you need to reveal?

Circulation data: what to include, exclude (reserves?), patron type info?

Collection data: volumes versus titles, locations, e-materials

E-Resource usage data: COUNTER, non-COUNTER, current, titles over time, Linking/OpenURL data

ILL data: by material type, by patron characteristics

Proxy server data: off-campus, on-campus, products without real usage reports

Website use data: Google Analytics? Feedback form use?

Other data: Staff-logged transaction data, virtual ref service? Tips specific to library data

LC Class:

● Too many for any useful graph - how to handle? Excel Charts

● Most likely to copy-and-paste to Word or elsewhere - link versus image ● Excel 2016 has many new options, not all are good practice ○ Pareto ○ Treemap ○ Histogram ○ Waterfall - cumulative effect of positive and negative values ○ Box & Whisker - shows median, median of top and bottom halves and min/max values ○ Sunburst - hierarchical data - rings of pie charts Demo Excel Charts Data Viz Products

"Business Intelligence"

● Google Data Studio ● Tableau ● Microsoft Power BI ● Qlik ● others

Concept:

Data in sources - spreadsheets, APIs, SQL databases, etc.

Connect sources to BI platform to make reports, dashboards, etc. Google Data Studio

Steps:

1. Make sure your data is cleaned up, with one header row 2. Create new report 3. Create new data source - select what to link to - must select one worksheet at a time from a spreadsheet with multiple worksheets/tabs 4. If needed create new "metrics" from text columns (e.g. day of week) 5. Connect the data source to the report (can add multiple sources to a single report) 6. Start to add charts, text labels, etc. Can have multiple pages. 7. Share report - if for website, "More" - "Public on the Web" Google Data Studio

https://datastudio.google.com/

Does not appear with usual list of Apps, not even "more"

Can connect to Google Sheets, Google Analytics, many other data sources

If data source is live/updated, Data studio report is too

Demo Google Data Studio Tableau

Commercial Product

Purchase options: Public, Desktop, Server, Online (hosted Server)

Public is free:

● 10GB space ● Some connectors not available in free version ● Reports are entirely public ● Limit to 1 million rows of data (per source?) ● Download app

Demo Tableau Public Tableau Web Data Connectors

Facebook: http://files.tableaujunkie.com/facebooksearch/pagefeedwebconnect.html

Instagram: https://illonage.github.io/

Many others: http://tableau.github.io/webdataconnector/community/ Recommended Reading and Q&A

● Edward, Tufte. The Visual Display of Quantitative . Graphics Press, Cheshire, USA [many reprints and revisions, original was 1983] ● Few, Stephen. Show Me the Numbers : Designing Tables and Graphs to Enlighten. Oakland, Calif. : Analytics Press, c2004. ● Few, Stephen. Information Dashboard Design. Sebastopol, Calif. : O'Reilly, 2006. ● Magnuson, Lauren. Data Visualization : A Guide to Visual Storytelling for Libraries. Rowman & Littlefield Publishers, 2016. A LITA Guide.

Questions? Discussion?