Open source data catalog
An overview of CKAN
Augusto Herrmann
Open Knowledge Brazil
Topics covered in this presentation
• Introduction
○ what is CKAN
○ who uses it
○ feature tour
• Features of CKAN
• Data publishing
• Under the hood
○ installation and maintenance
• Site administration
• Directions (where to find stuff)
2 IV Moscow Urban Forum CKAN Overview | Augusto Herrmann
Time constraints
• pick and choose topics accordingly
• I’ll be quick, but will address questions
First, a quick poll
• who is familiar with
○ the concepts of open data
○ browsing open data catalogs
○ including data in CKAN catalogs
○ installing CKAN
○ developing / theming CKAN
What is it?
Comprehensive
Knowledge
Archive
Network
What is it?
An open source software for open data catalogs
Affero GPL 3 Licence
● if you offer it as software-as-a-service (SaaS), you also have to make source code available
https://github.com/ckan/ckan
more than 7 years old
more than 80 developers
What is it?
An open source software for open data catalogs
● stores metadata, not data itself (in principle)
● makes it easy to find data
● keeps documentation about the data handy
What is it?
An open source software for open data catalogs
● data must be available on the internet in a permanent URL
○ directly linkable
What is it?
An open source software for open data catalogs
● data must be available on the internet in a permanent URL
○ no captcha!
What is it?
An open source software for open data catalogs
● structured data
○ no tables inside pdf or doc
■ common offenders: statistical bulletins, official press
○ no tables as images
What is it?
An open source software for open data catalogs
● open formats
○ common formats: csv, json, xml, rdf
● open licences
○ “Open data and content can be freely used, modified, and shared by anyone for any purpose” - opendefinition.org
○ examples: CC 4.0, ODbL, OGL
Who makes it?
● Open Knowledge
http://okfn.org
http://br.okfn.org
● Community of developers
http://github.com/ckan/ckan
● Governance: CKAN Association http://ckan.org/about/association
Who uses it?
● national governments
● local and regional governments
● parliaments
● civil society (e.g. community instances)
● research institutions (open research data)
more at: http://ckan.org/instances
Who uses it?
National governments
data.gov.uk
United Kingdom
Source code: https://github.com/datagovuk
data.gov
USA
dados.gov.br
Brazil
Source code: http://dev.dados.gov.br/codigo/dev/tema-ckan
and many other countries
● Argentina
● Australia
● Austria
● Canada
● Germany
● Iceland
● Ireland
● Italy
● Japan
● Mexico
● Netherlands
● Norway
● Romania
● Slovakia
● Sweden
● Switzerland
● Uruguay
Who uses it?
City governments
dados.recife.pe.gov.br
Recife, PE, Brazil
Source code: http://dados.recife.pe.gov.br/source/ckan_dados_recife_20140828.zip
data.rio.rj.gov.br
Rio de Janeiro, RJ, Brazil
datapoa.com.br
Porto Alegre, RS, Brazil
data.buenosaires.gob.ar
Buenos Aires, Argentina
opendata.caceres.es
Cáceres, Spain
data.kk.dk
Copenhagen, Denmark
Who uses it?
Community instances
datahub.io
Open Knowledge
hubofdata.ru
OpenGovData.ru
Internationalization (i18n)
● available in 53 languages
● languages 99% or more complete in version 2.2:
○ Bulgarian
○ Catalan
○ Czech
○ Dutch
○ French
○ Finnish
○ German
○ Italian
○ Japanese
○ Norwegian
○ Portuguese (BR)
○ Spanish
○ Swedish
Russian localization
● 92% completed for version 2.2
● translation of version 2.3 will soon begin
● join the localization team:
○ collaborative translation platform - Transifex
○ https://www.transifex.com/projects/p/ckan/language/ru/
Features
Catalog and search data
● catalog through the web interface, using the API or harvesting tools
● search all metadata fields
● faceted search
○ organization, tag, format, license
● data is organized into “datasets” and “resources”
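The search and facet bullets above can be sketched against CKAN's `package_search` API action. This is a minimal sketch, not a complete client: it only builds the request URL and reads facet counts from a response body. The sample response is made up for illustration, and the helper names (`build_search_url`, `facet_counts`) are hypothetical.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs


def build_search_url(base_url, query, facet_fields, filters=None, rows=10):
    """Build a package_search URL with a free-text query, facets and filters."""
    params = {
        "q": query,
        "rows": rows,
        # The facet field names are passed as a JSON-encoded list.
        "facet.field": json.dumps(facet_fields),
    }
    if filters:
        # fq narrows the results, e.g. {"res_format": "CSV"} -> res_format:"CSV"
        params["fq"] = " ".join(f'{key}:"{value}"' for key, value in filters.items())
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def facet_counts(response_body, field):
    """Return {value: count} for one facet of a package_search response."""
    result = json.loads(response_body)["result"]
    items = result["search_facets"][field]["items"]
    return {item["name"]: item["count"] for item in items}


# Made-up response body, shaped like a package_search result with facets.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {
        "count": 4,
        "results": [],
        "search_facets": {
            "tags": {
                "title": "tags",
                "items": [
                    {"name": "transport", "display_name": "transport", "count": 3},
                    {"name": "budget", "display_name": "budget", "count": 1},
                ],
            }
        },
    },
})

if __name__ == "__main__":
    print(build_search_url("https://demo.ckan.org", "transport",
                           ["organization", "tags"], {"res_format": "CSV"}))
    print(facet_counts(SAMPLE_RESPONSE, "tags"))
```

The facet names used here (`organization`, `tags`, `res_format`) correspond to the organization, tag and format facets listed above.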
Find related data
● related or similar resources are registered in the same dataset (e.g. same data, but different format; same data, but for differing time periods, etc.)
Find relevant metadata
● title
● description
● unique identifier
● author and maintainer
● license
● website or source page for the data
● groups, tags, organizations
● format (for the resource)
● other (including custom ones)
Preview data
● preview a sample of the resource as a table, chart, map, etc.
● interactive - e.g. tables are sortable by column, axes in charts can be configured to any column, etc.
● uses the recline.js data visualization library
Handle geospatial data
● through the ckanext-spatial extension
● visualize geo data in a map (e.g. contours of plazas and parks)
● search for data inside a user-defined bounding box in a search query
See a dataset’s change history
● track changes to a dataset
● see who did what and when
Sort out datasets by organization
● each organization can manage its own data in the catalog and authorize users who can edit
● gets its own page in the catalog with visibility for the data it publishes
● is also a facet available for search
Sort out datasets into groups
● another way to link related datasets
● useful for thematic classification
● is also a facet available for search
Sort out datasets into tags
● free-form user (editor) defined tags
● also for searching
Custom themes
● simple customization (colors, layout of main page, portal title, etc.) can be made through the user interface by the site administrator
● for deeper customization, use the extension programming interface (Python) and develop custom templates (Jinja2)
Extensible
● programming interface for creating extensions
● extension repository extensions.ckan.org
● has many extensions with varying degrees of maturity
FileStore and DataStore
● built-in extensions
● FileStore: allows uploading files and storing them in CKAN, instead of just linking to a URL
● DataStore: allows querying data through an API, even “joining” data from different resources
○ also comes with the DataPusher service, which updates the DataStore on each file registered
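As a rough illustration of querying the DataStore, the sketch below builds URLs for the `datastore_search` and `datastore_search_sql` API actions. It assumes the DataStore is enabled on the portal; the resource IDs and the `city` column are hypothetical placeholders, and the helper names are not part of CKAN.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def datastore_search_url(base_url, resource_id, text=None, limit=5):
    """URL that fetches up to `limit` rows of one resource, optionally filtered by text."""
    params = {"resource_id": resource_id, "limit": limit}
    if text:
        params["q"] = text  # full-text query over the rows
    return f"{base_url}/api/3/action/datastore_search?{urlencode(params)}"


def join_sql(left_id, right_id, key):
    """SQL that joins two DataStore resources on a shared column.

    Each resource is addressed as a table named after its resource ID.
    """
    return (
        f'SELECT * FROM "{left_id}" AS a '
        f'JOIN "{right_id}" AS b ON a."{key}" = b."{key}"'
    )


def datastore_sql_url(base_url, sql):
    """URL that runs a read-only SQL query through datastore_search_sql."""
    return f"{base_url}/api/3/action/datastore_search_sql?{urlencode({'sql': sql})}"


if __name__ == "__main__":
    # Hypothetical resource IDs and column name, for illustration only.
    print(datastore_search_url("https://demo.ckan.org", "resource-id-a", text="park"))
    print(datastore_sql_url("https://demo.ckan.org",
                            join_sql("resource-id-a", "resource-id-b", "city")))
```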
Harvesting
● metadata can be harvested from another portal by using the extension ckanext-harvest
● after a (configurable) interval, data newly catalogued or modified in the source will show up in the harvesting portal
Feedback
● there are extensions for users to comment on a specific dataset
● stimulates discussion about and improvement of data
Access by API
● uses HTTP requests (pseudo-RESTful)
● consumes and returns metadata in JSON format
● any operation you can do using the UI (e.g., searching) can also be done programmatically
● by using an access key on the API you can overcome access throttling limitations and also do any of the same read and write operations your user is allowed to do via UI
● useful for processing and cataloguing data in great volumes (e.g. apply a fix to many datasets in a batch, include many similar resources in a dataset, etc.)
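The bullets above can be sketched as follows: an authenticated API call is a POST with a JSON body and the access key in the `Authorization` header, and every response is a JSON envelope with `success` and `result` (or `error`) fields. This is a minimal sketch that only builds requests and does not send them; the helper names and the key value are hypothetical.

```python
import json
import urllib.request


def build_action_request(base_url, action, payload, api_key=None):
    """Build (but do not send) a POST request for one CKAN API action."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # The key identifies the user, so the call runs with that user's permissions.
        headers["Authorization"] = api_key
    return urllib.request.Request(
        f"{base_url}/api/3/action/{action}",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def unwrap(response_body):
    """Return the "result" part of an API response, or raise if the call failed."""
    envelope = json.loads(response_body)
    if not envelope.get("success"):
        raise RuntimeError(f"CKAN API error: {envelope.get('error')}")
    return envelope["result"]


def batch_license_fix(base_url, datasets, license_id, api_key):
    """Build one package_update request per dataset, with a corrected license.

    `datasets` should be full dataset dicts (as returned by package_show),
    because package_update replaces the whole dataset.
    """
    return [
        build_action_request(base_url, "package_update",
                             {**dataset, "license_id": license_id}, api_key)
        for dataset in datasets
    ]
```

To actually send a request, pass it to `urllib.request.urlopen` and feed the response body to `unwrap`.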
Cataloguing data on CKAN
Datasets and resources
● resources can be data files, API entry points, query examples, extended data documentation, etc.
● a resource has exactly one format and URL
● datasets can have one or more resources
● as a general guideline, the following can be catalogued under the same dataset:
○ resources that are representations of the same data in various formats
○ resources that are about the same data but in different time periods
○ resources that are about the same data but in different regional spans
Datasets and resources
● a dataset has
○ a single source (URL for a source page of the data)
○ a single license
○ a single author
○ a single maintainer
○ a single organization (or none)
○ a set of groups that applies to the whole dataset
○ a set of tags that applies to the whole dataset
Organizations
● only organization editors (or admins) can create datasets in it
● users can create datasets in any organizations for which they are editors
● organization admins can invite existing or new users to the organization and assign them a role (member, editor or administrator)
Creating a new dataset
● Click “add a new dataset” ○ on the dataset search screen; or
○ on the organization screen for an organization for which you are an editor or admin
Creating a new dataset
● CKAN will ask for the following basic metadata:
○ title
○ description
○ tags
○ license
○ organization (if you’re editor on more than one organization)
● when finished, click “Next: add data” to include resources
Including resources
● select “link to file”, “link to an API” or “upload a file” (in case FileStore is enabled)
● type in name, description and format
● if you have other resources to include, select “save & add another”
● after including all resources, click “next: additional info”
Additional dataset information
● “visibility”: “public” can be seen by any site visitor; “private” means visible to members of the organization only
● “author” / “author e-mail”: person or organization responsible for producing the data
● “maintainer” / “maintainer e-mail”: person or organization technically responsible for keeping data available
● optional custom fields
● press “finish” to create the dataset
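The same dataset can also be described as a JSON document, which is roughly what the `package_create` API action accepts. The sketch below maps the form fields from the previous slides onto CKAN's dataset field names (`notes` for the description, `owner_org` for the organization, `private` for visibility). The function name, the example values and the check that every resource has a URL and a format are illustrative assumptions. The check follows the guideline in this presentation and is not a rule enforced by CKAN.

```python
import json


def make_dataset(name, title, notes, license_id, owner_org, tags, resources,
                 private=False, author=None, author_email=None,
                 maintainer=None, maintainer_email=None):
    """Assemble a dataset dict using CKAN's dataset field names."""
    for resource in resources:
        # Guideline from the slides: a resource has exactly one format and one URL.
        if not resource.get("url") or not resource.get("format"):
            raise ValueError("each resource needs a URL and a format")
    return {
        "name": name,              # URL-friendly identifier
        "title": title,
        "notes": notes,            # shown as "description" in the UI
        "license_id": license_id,
        "owner_org": owner_org,    # organization name or ID
        "private": private,        # the "visibility" setting
        "author": author,
        "author_email": author_email,
        "maintainer": maintainer,
        "maintainer_email": maintainer_email,
        "tags": [{"name": tag} for tag in tags],
        "resources": resources,
    }


if __name__ == "__main__":
    # All values below are made up for illustration.
    dataset = make_dataset(
        name="city-parks",
        title="City parks",
        notes="Locations and areas of public parks.",
        license_id="odc-odbl",
        owner_org="example-city",
        tags=["parks", "environment"],
        resources=[
            {"name": "Parks (CSV)", "url": "http://example.org/parks.csv", "format": "CSV"},
            {"name": "Parks (JSON)", "url": "http://example.org/parks.json", "format": "JSON"},
        ],
    )
    print(json.dumps(dataset, indent=2))
```

The two resources are the same data in different formats, which is why they belong to one dataset.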
Under the hood
System Architecture
• Usually sits alongside a CMS (e.g. Drupal or WordPress)
• WSGI application pluggable to Apache (mod_wsgi), to nginx, etc.
• PostgreSQL database (metadata, access control, etc.)
• Apache Solr (for indexing and searching)
• Other components (depending on the installed and in-use extensions)
Installing CKAN
• Supported operating system: Ubuntu (12.04 64-bit for the package install)
• Other possible OS’s:
○ Debian
○ CentOS
○ Red Hat
○ Windows (version 1.8 of CKAN) http://www.hackneyworkshop.com/2012/03/30/ckan-on-windows/
○ OS X
Installing CKAN
• Types of installation
○ Ubuntu 12.04 64-bit server package
○ source code
○ using Docker
Package install
● Requirements: Ubuntu 12.04 64-bit server
● installs CKAN and DataPusher (for DataStore)
● Steps:
1. Install the CKAN package and its dependencies
2. Install PostgreSQL and Solr
3. Restart Apache and Nginx

sudo apt-get update
sudo apt-get install -y nginx apache2 libapache2-mod-wsgi libpq5
wget http://packaging.ckan.org/python-ckan_2.2_amd64.deb
sudo dpkg -i python-ckan_2.2_amd64.deb
sudo apt-get install -y postgresql solr-jetty
sudo service apache2 restart
sudo service nginx restart
Source code install
● the sequence of commands depends on the operating system
○ detailed instructions for each are available in: https://github.com/ckan/ckan/wiki/How-to-Install-CKAN
1. install dependency packages
2. install CKAN packages into a Python virtualenv
3. configure Postgres database
4. create a CKAN configuration file (production.ini)
5. configure Solr
6. create database tables
7. configure DataStore (optional)
8. link to who.ini (Repoze.who configuration file)
Docker install
● Requirement: have Docker installed and configured
● set of 3 commands
● Docker downloads images automatically (can take a long time)

$ docker run -d --name db ckan/postgresql
$ docker run -d --name solr ckan/solr
$ docker run -d -p 80:80 --link db:db --link solr:solr ckan/ckan
Initial configuration
• Create a site administrator user
paster sysadmin add seanh -c /etc/ckan/default/production.ini
• Create other users if necessary
• Edit production.ini (for instance to configure the site name)
ckan.site_title = Open data portal
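For scripted deployments, the same edit can be made programmatically. The sketch below uses Python's standard `configparser` on an in-memory sample and assumes the options live in the `[app:main]` section, as in a typical CKAN production.ini. The sample text and the function name are illustrative.

```python
import configparser

# Made-up excerpt of a configuration file, for illustration only.
SAMPLE_INI = """\
[app:main]
ckan.site_title = CKAN
ckan.auth.create_user_via_web = true
"""


def set_option(ini_text, key, value, section="app:main"):
    """Parse INI text, set one option and return the parser."""
    # interpolation=None leaves values such as %(here)s untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    parser.set(section, key, value)
    return parser


if __name__ == "__main__":
    config = set_option(SAMPLE_INI, "ckan.site_title", "Open data portal")
    print(config.get("app:main", "ckan.site_title"))
```

Note that writing a file back with `configparser` drops its comments, so for a hand-maintained production.ini a text editor may still be the safer choice.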
Other maintenance commands
• Rebuild search index
paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini
• Create and remove users
paster --plugin=ckan user add exampleuser --config=/etc/ckan/std/std.ini
paster --plugin=ckan user remove exampleuser --config=/etc/ckan/std/std.ini
CKAN site administration
Simple customization
http://
● some simple customization changes can be made through the UI by the site administrator
○ site title and description
○ color scheme
○ intro text, about text and others
○ custom css
User registration
● by default, user self-registration is enabled
● to disable (e.g. to avoid spam), change a flag in the .ini file
ckan.auth.create_user_via_web = False
Registering new groups and organizations
● by default, creating new organizations is enabled for all editors
● to disable, change a flag in the .ini file
ckan.auth.user_create_organizations = False
● the same applies to groups
● note: site admin can always create groups and organizations regardless
Manage users
● look for user in http://
● when logged in as admin, you see a “manage” button under the user profile
● admin can edit profile, change passwords or delete the user
Directions
Documentation
http://docs.ckan.org
There are specific manuals for specific audiences:
● End user (editor)
● Site administrator
● Maintainer
Documentation
Also manuals for specific subjects:
● API guide
● Extending guide
● Theming guide
● Contributing guide
Where to get help
On mailing lists:
● CKAN Global User Group
https://groups.google.com/forum/#!forum/ckan-global-user-group
● ckan-dev
https://lists.okfn.org/mailman/listinfo/ckan-dev
Where to get help
On IRC chat:
server: irc.freenode.net
channel: #ckan
Where to get help
Paid support:
● hosting with an SLA
● deployment and maintenance
● support, consultancy, training
Where to try CKAN
demo.ckan.org
● free for experimentation, cataloguing data and getting to know CKAN
● content is periodically wiped out
Where to register datasets
datahub.io
● community instance
● as an individual, if you don’t have your own CKAN, this is an option
● e.g. data that has been cleaned up as a result of a hackathon
Questions?
thank you спасибо
IV Moscow Urban Forum