Development of an Open-Source Research Platform

Mitchell Rysavy

Abstract
Problem Statement
Analysis of Existing Platforms
Application Goals
Features
    Easy Application Setup
    Docker Container Support
    User Account and Social Network Login
    Data Import
    Taggable and Commentable Notes and Media
Areas of Improvement
Security Precautions and Concerns
Conclusion

Abstract

Genealogy software is often hard to use, and this complexity may drive users away from it altogether. The purpose of this project is to address that difficulty and to reach people who might not otherwise use such software. I propose a collaborative genealogy research platform where family members can share information easily, in concert with technologies they are already familiar with.

Problem Statement

Genealogy is a complicated field to get involved in, and it is often imagined to be dominated by older family members with an interest in preserving family history. In reality, anybody can get involved in their own genealogical search, and nowadays finding information does not even have to involve leaving the comfort of the Internet.

Analysis of Existing Platforms

While it is possible to get involved in genealogy at any time, few modern resources integrate well with the social, collaborative structure of today's Internet. Software suites that attempt to make logging genealogical data easy include Ancestry.com, FamilySearch.org, and Gramps. The easiest of these to use is Ancestry.com, but it has drawbacks: it is closed-source, and most of the interesting information it contains is not free, even though most of the data it provides access to is in the public domain. The most feature-complete is arguably Gramps, a cross-platform desktop application that supports import and export in various formats and has extensive note-taking capabilities; however, sharing its files among people for collaboration is fairly difficult.

All of the above solutions are fairly old, and many rely on old technologies. FamilySearch, the organization behind FamilySearch.org, has proposed technologies that would bring ancestry research into the current decade, namely GEDCOMX. GEDCOMX is a proposed specification for publishing genealogy information over the Internet via a RESTful web service. Although the specification exists, no major clients currently use it apart from FamilySearch's own incomplete ones. Because genealogy software is a niche with little competition, GEDCOMX will likely become the specification used by new software going forward.

Application Goals

Inspection of the existing services outlined above makes it evident that a new solution should use newer, extensible, and portable technologies. Ideally, a new genealogy service would also meet these basic requirements:
● Offer a way to migrate data completely away from other genealogy systems, so people can easily share information they already have.
● Offer a way to export a full backup of data, so people can move it to another system or between systems.
● Be open source, so users can collaborate to extend the application.
● Implement (at least partially) the GEDCOMX specification, so data will be easily readable by future researchers.
● Provide an easy way to invite others to collaborate, optionally using social media.

Features

Familiar (named for its shared root with 'family') aims to be an easy-to-set-up, easy-to-use, and easy-to-share solution for genealogists who want to self-host their data. A number of key features contribute to these goals: Docker container support, Google and Facebook login support, and GEDCOM and GRAMPS import support.

Easy Application Setup

Setting up Familiar is designed to be as easy as possible. On a system with Ruby, the PostgreSQL database server, and a Procfile-based process manager such as foreman or forego installed, setup is fairly straightforward. Dependencies are installed with a standard Ruby `bundle install`, and the database is set up using `rake db:create db:migrate` (provided the executing user has the appropriate permissions in PostgreSQL). Application configuration closely follows the guidelines for a "12-factor app," as defined by 12factor.net. As such, most configuration is contained in environment variables. Currently, the environment settings in use are Facebook OAuth IDs and secrets, Google OAuth IDs and secrets, and Amazon AWS credentials for using S3 as an image storage backend. To further streamline configuration, a bash script (configure_environment.sh) was created to set up the environment variables correctly. This simplicity allows Familiar to be run easily on a server or on a cloud service such as Heroku or Amazon EC2.
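As an illustration of the environment-based configuration described above, the following is a minimal sketch of how an application might check and load such settings at startup. The variable names are hypothetical, since the actual names used by Familiar are not given here, and the helper functions are assumptions made for the example rather than Familiar's own code.

```ruby
# Hypothetical variable names; the names Familiar actually uses are not stated.
REQUIRED_SETTINGS = %w[
  FACEBOOK_APP_ID FACEBOOK_APP_SECRET
  GOOGLE_CLIENT_ID GOOGLE_CLIENT_SECRET
  AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
].freeze

# Return the names of required settings that are absent or blank in env.
# env may be ENV itself or any Hash-like object (useful for testing).
def missing_settings(env = ENV)
  REQUIRED_SETTINGS.select { |name| env[name].to_s.strip.empty? }
end

# Build a settings hash keyed by lowercase symbols, raising a descriptive
# error if any required variable is missing.
def load_settings(env = ENV)
  missing = missing_settings(env)
  unless missing.empty?
    raise KeyError, "missing environment variables: #{missing.join(', ')}"
  end

  REQUIRED_SETTINGS.map { |name| [name.downcase.to_sym, env[name]] }.to_h
end
```

Failing fast with a list of every missing variable, rather than stopping at the first one, keeps a misconfigured deployment easy to diagnose.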

Docker Container Support

Much of the process detailed above can be streamlined using Docker containers and the docker-compose tool for managing several linked containers. If the user has Docker installed, they do not even need to install PostgreSQL, Ruby, or any of the other necessary tools themselves. A docker-compose.yml file has been provided to install and configure the Familiar container and the necessary management tools. This brings setup for the whole stack down to four commands:

    docker-compose build
    docker-compose run web rake db:create db:migrate
    docker-compose run web ./configure_environment.sh
    docker-compose up

User Account and Social Network Login

User authentication is provided via the Ruby gem OmniAuth (GitHub: intridea/omniauth). OmniAuth has a number of authentication strategies, most of which are OAuth 2-based, meaning that for an application to authenticate a user against a third-party service, all that is needed is an application ID and an application secret (obtained by the developer from the third party). For users who do not want to use social network sign-in, or servers that do not have it configured, traditional username and password authentication is also provided via an OmniAuth strategy, OmniAuth-Identity. Passwords are stored in the database as salted hashes using bcrypt.

Data Import

The ability to import existing genealogical data is a key feature of Familiar. This functionality is provided via a web page available only to site administrators. Currently, data can be imported from two formats: GRAMPS XML and GEDCOM 5. GRAMPS is a Python-based desktop genealogy data manager, as described in the Analysis of Existing Platforms. It can export an XML file containing a complete archive of all of its data, including (optionally) Base64-encoded images. Importing families and relationships from GRAMPS XML is supported; importing source data and images is not. GEDCOM version 5 is the de facto standard for genealogical data storage, and via a Ruby GEDCOM library, person data and family relationship data can be imported into the Familiar database. Both import methods process birth and death dates, and dates are stored in the ISO 8601 date format.
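The date processing mentioned above can be sketched as follows. This is not Familiar's actual import code: it is a minimal illustration, assuming exact GEDCOM-style dates of the form "12 JAN 1900", and the function name is hypothetical. Approximate or partial dates (such as "ABT 1900") are deliberately left unconverted.

```ruby
require 'date'

# Three-letter month abbreviations as they appear in GEDCOM date values.
MONTHS = %w[JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC].freeze

# Convert an exact day-month-year date ("12 JAN 1900") to ISO 8601.
# Returns nil for anything that is not a complete, valid calendar date.
def gedcom_date_to_iso8601(text)
  match = /\A\s*(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s*\z/.match(text.to_s)
  return nil unless match

  month_index = MONTHS.index(match[2].upcase)
  return nil if month_index.nil?

  day = match[1].to_i
  month = month_index + 1
  year = match[3].to_i
  return nil unless Date.valid_date?(year, month, day)

  Date.new(year, month, day).iso8601
end
```

For example, `gedcom_date_to_iso8601("12 JAN 1900")` returns `"1900-01-12"`, while an impossible date such as "31 FEB 1900" returns `nil` instead of raising an error, so a single bad record need not abort an import.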

Taggable and Commentable Notes and Media

Familiar supports writing and saving notes and photos. Notes are edited in the browser with a rich text editor. Both notes and photos have tag lists, which can be used to organize photo and note collections. Additionally, any signed-in user can comment on any note or photo that they have access to.

Areas of Improvement

Familiar has reached 'alpha' status, and a production instance is currently running at https://Familiar.rysavys.me. However, there are a number of areas where improvement is necessary or suggested. First, Docker support, while functional, makes a number of assumptions for the user and does not support Amazon S3 as an image storage backend. If this process were further streamlined, a web service could be created to provision instances of Familiar for end users automatically, much as Slack does. Second, Familiar is not (yet) a real contender as a genealogy sitebuilder because it does not support source citations. Adding this support would be non-trivial if all the different types of citations supported by GEDCOM were included, but it would be necessary for complete data migration. Third, the data import functionality is very lossy. While basic data can be imported, most data is discarded, and there is no way to import existing photos or tag data from a service such as Ancestry.com. Fourth, there is currently no data export functionality. Since Familiar is based on stateless design, a web crawler could be written fairly easily using simple tools like wget or curl to download the HTML content of every page on the site, but this is less than ideal for migrating data to another service. A reasonable next step would be to use the gedcom library already used for import to generate GEDCOM data from the data in the database.

Security Precautions and Concerns

As a Ruby on Rails application, Familiar is a potential target for a number of known vulnerabilities, given the popularity of the toolset. However, precautions have been taken to make Familiar secure. First, user account passwords (for users who do not log in with a social network) are salted and hashed using bcrypt, a currently popular library for password hashing. Second, there is a simple user permissions system: users are either editors, signed-in users, or anonymous users. Below is a list of the permission tiers.

Anonymous users can:
● View data about deceased individuals
● View photos and notes

Signed-in users can do everything anonymous users can do, as well as:
● Comment on photos and notes

Editors can do everything signed-in users can do, as well as:
● Delete comments
● Create, edit and delete person data, photos and notes
● View data about living individuals
● Perform data import
● View a list of all users
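Because each tier includes all the permissions of the tier below it, the scheme can be modeled as an ordered list of roles plus a minimum role per action. The sketch below illustrates that idea only; the role symbols, action names, and `allowed?` helper are hypothetical and are not taken from Familiar's source.

```ruby
# Hypothetical role names, ordered from least to most privileged.
ROLES = %i[anonymous signed_in editor].freeze

# Minimum role required for each action, following the tiers listed above.
# The action names are illustrative assumptions.
REQUIRED_ROLE = {
  view_deceased_person: :anonymous,
  view_media: :anonymous,
  comment: :signed_in,
  delete_comment: :editor,
  edit_records: :editor,
  view_living_person: :editor,
  import_data: :editor,
  list_users: :editor
}.freeze

# True if the given role is at or above the minimum role for the action.
# Unknown roles and unknown actions are denied.
def allowed?(role, action)
  rank = ROLES.index(role)
  required = ROLES.index(REQUIRED_ROLE[action])
  return false if rank.nil? || required.nil?

  rank >= required
end
```

Denying unknown roles and actions by default means that a newly added action is inaccessible until it is explicitly assigned a tier.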

While these security precautions are in place, they are not perfect, and it is likely that there are undiscovered bugs. A number of areas exist where improvements could be made:
● No genealogy data is encrypted.
● HTTP is the default. This is not really an application feature, but to secure the connection using HTTPS, the user needs to obtain an SSL/TLS certificate (from a provider such as Let's Encrypt or StartSSL) and either run the application behind an HTTPS proxy or modify the app configuration.

Conclusion

As a total replacement for existing genealogy management systems, Familiar provides a small subset of the necessary features for a complete system. However, particular care has been given to ease of use and setup, and with more development, it could easily turn into a simple, capable, and complete sitebuilder.