Open source data catalog
An overview of CKAN

Augusto Herrmann
Open Knowledge Brazil

Topics covered in this presentation

• Introduction
  ○ what is CKAN
  ○ who uses it
  ○ feature tour
• Features of CKAN
• Data publishing
• Under the hood
  ○ installation and maintenance
• Site administration
• Directions (where to find stuff)

IV Moscow Urban Forum CKAN Overview | Augusto Herrmann

Time constraints

• pick and choose topics accordingly
• I’ll be quick, but will address questions


First, a quick poll

• who is familiar with

○ the concepts of

○ browsing open data catalogs

○ including data in CKAN catalogs

○ installing CKAN

○ developing / theming CKAN


What is it?

Comprehensive

Knowledge

Archive

Network


What is it?

An open source software for open data catalogs

What is it?

An open source software for open data catalogs

● Affero GPL 3 Licence
  ○ if you offer it as software-as-a-service (SaaS), you also have to make the source code available
● https://github.com/ckan/ckan
  ○ more than 7 years old
  ○ more than 80 developers

What is it?

An open source software for open data catalogs

● stores metadata, not data itself (in principle)

● makes it easy to find data

● keeps documentation about the data handy

What is it?

An open source software for open data catalogs

● data must be available on the internet in a permanent URL

○ directly linkable


What is it?

An open source software for open data catalogs

● data must be available on the internet in a permanent URL

○ no captcha!


What is it?

An open source software for open data catalogs

● structured data

○ no tables inside pdf or doc

■ common offenders: statistical bulletins, official press

○ no tables as images

What is it?

An open source software for open data catalogs

● open formats

○ common formats: csv, json, xml, rdf

● open licences

○ “Open data and content can be freely used, modified, and shared by anyone for any purpose” - opendefinition.org

○ examples: CC 4.0, ODbL, OGL

Who makes it?

● Open Knowledge http://okfn.org http://br.okfn.org

● Community of developers http://github.com/ckan/ckan

● Governance: CKAN Association http://ckan.org/about/association


Who uses it?

● national governments

● local and regional governments

● parliaments

● civil society (e.g. community instances)

● research institutions (open research data)

more at: http://ckan.org/instances

Who uses it?

National governments

data.gov.uk

United Kingdom

Source code: https://github.com/datagovuk

data.gov

USA


dados.gov.br

Brazil

Source code: http://dev.dados.gov.br/codigo/dev/tema-ckan

and many other countries

● Argentina
● Australia
● Austria
● Canada
● Germany
● Iceland
● Ireland
● Italy
● Japan
● Mexico
● Netherlands
● Norway
● Romania
● Slovakia
● Sweden
● Switzerland
● Uruguay

Who uses it?

City governments

dados.recife.pe.gov.br

Recife, PE, Brazil

Source code: http://dados.recife.pe.gov.br/source/ckan_dados_recife_20140828.zip

data.rio.rj.gov.br

Rio de Janeiro, RJ, Brazil


datapoa.com.br

Porto Alegre, RS, Brazil


data.buenosaires.gob.ar

Buenos Aires, Argentina


opendata.caceres.es

Cáceres, Spain


data.kk.dk

Copenhagen, Denmark

Who uses it?

Community instances

datahub.io

Open Knowledge


hubofdata.ru

OpenGovData.ru


Internationalization (i18n)

● available in 53 languages

● languages 99% or more complete in version 2.2:
  ○ Bulgarian
  ○ Catalan
  ○ Czech
  ○ Dutch
  ○ French
  ○ Finnish
  ○ German
  ○ Italian
  ○ Japanese
  ○ Norwegian
  ○ Portuguese (br)
  ○ Spanish
  ○ Swedish

Russian localization

● 92% completed for version 2.2

● translation of version 2.3 will soon begin

● join the localization team:
  ○ collaborative translation platform - Transifex
  ○ https://www.transifex.com/projects/p/ckan/language/ru/

Features

Catalog and search data

● catalog through the web interface, using the API or harvesting tools

● search all metadata fields

● faceted search

○ organization, tag, format, license

● data is organized into “datasets” and “resources”
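A minimal sketch of a faceted search against the CKAN action API, assuming the `package_search` action and its `facet.field` parameter as described in the CKAN API guide. The site URL and the query are placeholders, and the facet field names (`organization`, `tags`) are assumptions about how the facets listed above are named in the index.

```python
import json
from urllib.parse import urlencode


def build_search_url(site_url, query, facet_fields, rows=10):
    """Return a package_search URL that also requests facet counts."""
    params = {
        "q": query,
        "rows": rows,
        # The facet fields are passed as a JSON-encoded list of names.
        "facet.field": json.dumps(list(facet_fields)),
    }
    base = site_url.rstrip("/") + "/api/3/action/package_search"
    return base + "?" + urlencode(params)


def facet_counts(response, field):
    """Extract {value: count} for one facet from a decoded API response."""
    return response["result"]["facets"].get(field, {})


if __name__ == "__main__":
    # Placeholder site and query; nothing is sent over the network here.
    print(build_search_url("http://demo.ckan.org", "transport",
                           ["organization", "tags"]))
```

The URL can be opened in a browser or fetched with any HTTP client; `facet_counts` then reads the per-value counts out of the decoded JSON reply.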


Find related data

● related or similar resources are registered in the same dataset (e.g. same data, but different format; same data, but for differing time periods, etc.)


Find relevant metadata

● title
● description
● unique identifier
● author and maintainer
● license
● website or source page for the data
● groups, tags, organizations
● format (for the resource)
● other (including custom ones)

Preview data

● preview a sample of the resource as a table, chart, map, etc.
● interactive - e.g. tables are sortable by column, axes in charts can be configured to any column, etc.
● uses the recline.js data visualization library

Handle geospatial data

● through the ckanext-spatial extension
● visualize geo data in a map (e.g. contours of plazas and parks)
● search for data inside a bounding box that the user selects in a search query
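A small sketch of how a bounding box for such a search could be prepared. It assumes the box is passed as a comma-separated string "min_lon,min_lat,max_lon,max_lat" (in ckanext-spatial this is, to my understanding, the `ext_bbox` search parameter; treat that name as an assumption). Boxes that cross the antimeridian are not handled, and the coordinates in the usage example are illustrative.

```python
def bbox_param(min_lon, min_lat, max_lon, max_lat):
    """Format a bounding box as 'min_lon,min_lat,max_lon,max_lat'."""
    if not (-180 <= min_lon <= max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    return ",".join(str(v) for v in (min_lon, min_lat, max_lon, max_lat))


def contains(bbox, lon, lat):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


if __name__ == "__main__":
    # Illustrative box roughly around a city centre.
    box = (37.3, 55.5, 37.9, 56.0)
    print(bbox_param(*box))
    print(contains(box, 37.62, 55.75))
```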


See a dataset’s change history

● track changes to a dataset
● see who did what and when

Sort out datasets by organization

● each organization manages its own data in the catalog and authorizes the users who can edit it
● each organization gets its own page in the catalog, giving visibility to the data it publishes
● organization is also a facet available for search

Sort out datasets into groups

● another way to link related datasets
● useful for thematic classification
● is also a facet available for search

Sort out datasets into tags

● free-form user (editor) defined tags
● also for searching

Custom themes

● simple customization (colors, layout of main page, portal title, etc.) can be made through the user interface by the site administrator
● for deeper customization, use the extension programming interface (Python) and develop custom templates (Jinja2)

Extensible

● programming interface for creating extensions
● extension repository: extensions.ckan.org
  ○ has many extensions with varying degrees of maturity

FileStore and DataStore

● built-in extensions
● FileStore: allows uploading files and storing them in CKAN, instead of just linking to a URL
● DataStore: allows querying data through an API, even “joining” data from different resources
  ○ also comes with the DataPusher service, which updates the DataStore on each file registered
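A hedged sketch of what DataStore queries can look like, assuming the `datastore_search` action (filtering one resource) and the `datastore_search_sql` action (where each resource is addressed as a table named after its resource id). The resource ids and the column name in the usage example are hypothetical.

```python
import json
import re

# Resource ids are assumed to be UUID-like strings (36 hex/dash characters).
_RESOURCE_ID = re.compile(r"^[0-9a-f-]{36}$")


def search_payload(resource_id, filters=None, limit=5):
    """Build the JSON body for a datastore_search call."""
    payload = {"resource_id": resource_id, "limit": limit}
    if filters:
        payload["filters"] = filters
    return json.dumps(payload, sort_keys=True)


def join_sql(left_id, right_id, key):
    """Build a SQL statement joining two resources on a shared column."""
    for rid in (left_id, right_id):
        if not _RESOURCE_ID.match(rid):
            raise ValueError("unexpected resource id: %r" % rid)
    if not key.isidentifier():
        raise ValueError("unexpected column name: %r" % key)
    return 'SELECT * FROM "{0}" AS a JOIN "{1}" AS b ON a."{2}" = b."{2}"'.format(
        left_id, right_id, key)


if __name__ == "__main__":
    # Hypothetical resource ids and column name, for illustration only.
    left = "11111111-1111-1111-1111-111111111111"
    right = "22222222-2222-2222-2222-222222222222"
    print(search_payload(left, filters={"year": 2014}))
    print(join_sql(left, right, "year"))
```

The checks on the ids and the column name are there because the values end up inside a SQL string; they are a precaution in this sketch, not something the slides prescribe.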


Harvesting

● metadata can be harvested from another portal by using the extension ckanext-harvest
● at a configurable interval, data newly catalogued or modified in the source will show up in the harvesting portal

Feedback

● there are extensions for users to comment on a specific dataset
● stimulates discussion about and improvement of data

Access by API

● uses HTTP requests (pseudo-RESTful)
● consumes and returns metadata in JSON format
● you can do programmatically any operation you can do using the UI (e.g., searching)
● by using an access key on the API you can overcome access throttling limitations and also do any of the same read and write operations your user is allowed to do via the UI
● useful for processing and cataloguing data in great volumes (e.g. apply a fix to many datasets in a batch, include many similar resources in a dataset, etc.)
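The points above can be sketched with the Python standard library alone. This assumes the version 3 action API (`/api/3/action/<action>`), a JSON request body, the access key sent in the `Authorization` header, and replies wrapped as `{"success": ..., "result": ...}`. The site URL and the key are placeholders.

```python
import json
import urllib.request


def action_request(site_url, action, payload, api_key=None):
    """Prepare (but do not send) a POST request for a CKAN API action."""
    url = "{}/api/3/action/{}".format(site_url.rstrip("/"), action)
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The access key identifies the user, so the call gets that
        # user's read and write permissions.
        headers["Authorization"] = api_key
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers)


def unwrap(body):
    """Return the 'result' part of a reply, or raise if the call failed."""
    reply = json.loads(body)
    if not reply.get("success"):
        raise RuntimeError(reply.get("error"))
    return reply["result"]


if __name__ == "__main__":
    # Placeholder site and key; nothing is sent over the network here.
    req = action_request("http://demo.ckan.org", "package_search",
                         {"q": "bus"}, api_key="my-api-key")
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` sends it; the decoded response body can then go through `unwrap`. A batch fix is a loop over such calls.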

Cataloguing data on CKAN

Datasets and resources

● resources can be data files, API entry points, query examples, extended data documentation, etc.
● a resource has exactly one format and URL
● datasets can have one or more resources
● as a general guideline, the following can be catalogued under the same dataset:
  ○ resources that are representations of the same data in various formats
  ○ resources that are about the same data but in different time periods
  ○ resources that are about the same data but in different regional spans

Datasets and resources

● a dataset has
  ○ a single source (URL for a source page of the data)
  ○ a single license
  ○ a single author
  ○ a single maintainer
  ○ a single organization (or none)
  ○ a set of groups that applies to the whole dataset
  ○ a set of tags that applies to the whole dataset
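As an illustration of the dataset/resource model described above, here is a sketch that assembles a dataset as a plain dictionary of the kind the API exchanges. The field names (`name`, `title`, `license_id`, `owner_org`, `tags`, `groups`, `resources`) follow my understanding of CKAN's dataset dictionaries; the example values and URLs are hypothetical.

```python
def make_dataset(name, title, license_id, owner_org=None, tags=(), resources=()):
    """Assemble a dataset dictionary with one license and many resources."""
    dataset = {
        "name": name,
        "title": title,
        "license_id": license_id,           # a single license per dataset
        "tags": [{"name": t} for t in tags],  # tags apply to the whole dataset
        "groups": [],
        "resources": [],
    }
    if owner_org:
        dataset["owner_org"] = owner_org    # a single organization, or none
    for res in resources:
        # A resource has exactly one format and one URL.
        if not res.get("url") or not res.get("format"):
            raise ValueError("each resource needs a URL and a format")
        dataset["resources"].append(dict(res))
    return dataset


if __name__ == "__main__":
    # Same data in two formats, catalogued under one dataset.
    bus_stops = make_dataset(
        "bus-stops", "Bus stops", "odc-odbl", owner_org="transport-dept",
        tags=["transport"],
        resources=[
            {"url": "http://example.org/bus-stops.csv", "format": "CSV"},
            {"url": "http://example.org/bus-stops.json", "format": "JSON"},
        ],
    )
    print(len(bus_stops["resources"]))
```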


Organizations

● only an organization’s editors (or admins) can create datasets in it
● users can create datasets in any organization for which they are editors
● organization admins can invite existing or new users to the organization and assign them a role (member, editor or administrator)
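The role rules above amount to a simple lookup. The sketch below is not CKAN's own authorization code; it only restates the rule, using hypothetical organization names and the role names `member`, `editor` and `admin`.

```python
# Roles that may create datasets inside an organization.
CAN_CREATE_DATASETS = {"editor", "admin"}


def can_create_dataset(memberships, organization):
    """memberships maps organization name -> role for one user."""
    return memberships.get(organization) in CAN_CREATE_DATASETS


if __name__ == "__main__":
    # Hypothetical user who edits for one organization and is a plain
    # member of another.
    roles = {"transport-dept": "editor", "health-dept": "member"}
    print(can_create_dataset(roles, "transport-dept"))
    print(can_create_dataset(roles, "health-dept"))
```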


Creating a new dataset

● Click “add a new dataset”
  ○ on the dataset search screen; or
  ○ on the organization screen for an organization for which you are an editor or admin

Creating a new dataset

● CKAN will ask for the following basic metadata:
  ○ title
  ○ description
  ○ tags
  ○ license
  ○ organization (if you’re an editor in more than one organization)
● when finished, click “Next: add data” to include resources

Including resources

● select “link to file”, “link to an API” or “upload a file” (in case FileStore is enabled)
● type in name, description and format
● if you have other resources to include, select “save & add another”
● after including all resources, click “next: additional info”

Additional dataset information

● “visibility”: “public” can be seen by any site visitor; “private” means visible to members of the organization only
● “author” / “author e-mail”: person or organization responsible for producing the data
● “maintainer” / “maintainer e-mail”: person or organization technically responsible for keeping data available
● optional custom fields
● press “finish” to create the dataset

Under the hood

System Architecture

• Usually sits alongside a CMS (e.g. WordPress)

• WSGI application, pluggable into Apache (mod_wsgi), nginx, etc.

• PostgreSQL database (metadata, access control, etc.)

• Solr (for indexing and searching)

• Other components (depending on the extensions installed and in use)

Installing CKAN

• Supported operating system: Ubuntu

• Other possible OS’s:

○ Debian

○ CentOS

○ Red Hat

○ Windows (version 1.8 of CKAN) http://www.hackneyworkshop.com/2012/03/30/ckan-on-windows/

○ OS X


Installing CKAN

• Types of installation

○ Ubuntu 12.04 64-bit server package

○ source code

○ using Docker


Package install

● Requirements: Ubuntu 12.04 64-bit server
● installs CKAN and DataPusher (for DataStore)
● Steps:
  1. Install the CKAN package and its dependencies
  2. Install PostgreSQL and Solr
  3. Restart Apache and Nginx

sudo apt-get update
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5
wget http://packaging.ckan.org/python-ckan_2.2_amd64.deb
sudo dpkg -i python-ckan_2.2_amd64.deb
sudo apt-get install -y solr-jetty
sudo service apache2 restart
sudo service nginx restart

Source code install

● the sequence of commands depends on the operating system

○ detailed instructions for each are available at: https://github.com/ckan/ckan/wiki/How-to-Install-CKAN

1. install dependency packages
2. install CKAN packages into a Python virtualenv
3. configure Postgres database
4. create a CKAN configuration file (production.ini)
5. configure Solr
6. create database tables
7. configure DataStore (optional)
8. link to who.ini (Repoze.who configuration file)

Docker install

● Requirement: have Docker installed and configured

● set of 3 commands

● Docker downloads images automatically (can take a long time)

$ docker run -d --name db ckan/postgresql
$ docker run -d --name solr ckan/solr
$ docker run -d -p 80:80 --link db:db --link solr:solr ckan/ckan

Initial configuration

• Create a site administrator user
paster sysadmin add seanh -c /etc/ckan/default/production.ini

• Create other users if necessary

• Edit production.ini (for instance to configure the site name)

ckan.site_title = Open data portal


Other maintenance commands

• Rebuild search index
paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini
• Create and remove users
paster --plugin=ckan user add exampleuser --config=/etc/ckan/std/std.ini
paster --plugin=ckan user remove exampleuser --config=/etc/ckan/std/std.ini

CKAN site administration

Simple customization

http:///ckan-admin/config/

● some simple customization changes can be made through the UI by the site administrator

○ site title and description

○ color scheme

○ intro text, about text and others

○ custom css


User registration

● by default, user self-registration is enabled

● to disable (e.g. to avoid spam), change a flag in the .ini file
ckan.auth.create_user_via_web = False

Registering new groups and organizations

● by default, creating new organizations is enabled for all editors

● to disable, change a flag in the .ini file
ckan.auth.user_create_organizations = False

● likewise, the same for groups
● note: site admin can always create groups and organizations regardless
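A quick way to check how these flags are set in a configuration file is sketched below, assuming the settings sit in the `[app:main]` section of the .ini file and that the flag for groups is named `ckan.auth.user_create_groups`. The sample file content is made up for the example.

```python
import configparser

# Made-up excerpt of a CKAN .ini file, for illustration only.
SAMPLE_INI = """
[app:main]
ckan.site_title = Open data portal
ckan.auth.create_user_via_web = False
ckan.auth.user_create_organizations = False
ckan.auth.user_create_groups = False
"""


def read_auth_flags(ini_text):
    """Return every ckan.auth.* option in [app:main] as a boolean."""
    # Interpolation is switched off so that '%' characters elsewhere in
    # a real file are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"]
    return {key: section.getboolean(key)
            for key in section if key.startswith("ckan.auth.")}


if __name__ == "__main__":
    for name, value in sorted(read_auth_flags(SAMPLE_INI).items()):
        print(name, "=", value)
```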


Manage users

● look for user in http:///user/

● when logged in as admin, you see a “manage” button under the user profile
● admin can edit profile, change passwords or delete the user

Directions

Documentation

http://docs.ckan.org

There are specific manuals for specific audiences:

● End user (editor)
● Site administrator
● Maintainer

Documentation

Also manuals for specific subjects:

● API guide
● Extending guide
● Theming guide
● Contributing guide

Where to get help

On mailing lists:

● CKAN Global User Group https://groups.google.com/forum/#!forum/ckan-global-user-group
● ckan-dev https://lists.okfn.org/mailman/listinfo/ckan-dev

Where to get help

On IRC chat:

server: irc.freenode.net
channel: #ckan

Where to get help

Paid support:

● hosting with an SLA
● deployment and maintenance
● support, consultancy, training

Where to try CKAN

demo.ckan.org

● free for experimentation, cataloguing data and getting to know CKAN
● content is periodically wiped out

Where to register datasets

datahub.io

● community instance
● as an individual, if you don’t have your own CKAN, this is an option
● e.g. data that has been cleaned up as a result of a hackathon

Questions?

thank you спасибо


IV Moscow Urban Forum