Development of an Open-Source Genealogy Research Platform
Mitchell Rysavy

Contents
Abstract
Problem Statement
Analysis of Existing Platforms
Application Goals
Features
Easy Application Setup
Docker Container Support
User Account and Social Network Login
Data Import
Taggable and Commentable Notes and Media
Areas of Improvement
Security Precautions and Concerns
Conclusion

Abstract

Genealogy software is often hard to use, and this complexity may drive users away from it altogether. The purpose of this project is to address that difficulty and to reach people who might not otherwise use genealogy software. I propose to create a collaborative genealogy research platform where family members can share information easily and in concert with technologies they are already familiar with.

Problem Statement

Genealogy is a complicated field to get involved in, and it is often imagined to be dominated by older family members with an interest in preserving family history. In reality, anybody can get involved in their own genealogical search, and nowadays finding information does not even require leaving the comfort of the Internet.

Analysis of Existing Platforms

While it is possible to get involved in genealogy at any time, there are few modern resources that integrate well with the existing social, collaborative structure of the Internet. Software suites that attempt to make logging genealogical data easy include Ancestry.com, FamilySearch.org, webtrees, and Gramps. The easiest to use of these is Ancestry.com, but it has drawbacks: it is closed-source, and most of the interesting information it contains is not free, even though most of the data it provides access to is in the public domain. The most feature-complete is arguably Gramps, a cross-platform desktop application that supports import and export in various formats and has extensive note-taking capabilities, but it is fairly difficult to share its files among people for collaboration.
All of the above solutions are fairly old, and many rely on old technologies. FamilySearch, the organization behind FamilySearch.org, has created a few proposals for technologies that would bring ancestry research into the current decade, namely GEDCOM X. GEDCOM X is a proposed specification for publishing genealogy information over the Internet via a RESTful web service. Although the specification exists, no major clients currently use it apart from FamilySearch’s own incomplete ones. Because genealogy software is a niche with little competition, GEDCOM X will likely become the specification used by new software going forward.

Application Goals

By inspecting the existing services outlined above, it is evident that a new solution should use newer, extensible, and portable technologies. Ideally, a new genealogy service would also provide this basic functionality:

● A way to migrate data completely away from other genealogy systems, so people can easily share information they already have.
● A way to export a full backup of data, so people can move it to another system or between systems.
● Be open source, so users can collaborate to extend the application.
● Implement (at least partially) the GEDCOM X specification, so data will be easily readable by future researchers.
● Provide an easy way to invite others to collaborate, optionally using social media.

Features

Familiar (named for its shared root with ‘family’) aims to be an easy-to-set-up, easy-to-use, and easy-to-share solution for genealogists who want to self-host their data. A number of key features contribute to these goals: Docker container support, Google and Facebook login support, and GEDCOM and GRAMPS import support.

Easy Application Setup

Setting up Familiar is designed to be as easy as possible.
On a system with Ruby, the PostgreSQL database server, and a Procfile-based process manager such as foreman or forego installed, Familiar setup is fairly straightforward. Dependencies are installed with a standard Ruby `bundle install`, and the database is set up using `rake db:create db:migrate` (provided the executing user has the appropriate permissions in PostgreSQL). Application configuration closely follows the guidelines for a “12-factor app,” as defined by 12factor.net. As such, most configuration is contained in environment variables. The environment settings currently used are Facebook OAuth IDs and secrets, Google OAuth IDs and secrets, and Amazon AWS credentials for using S3 as an image storage backend. To further streamline configuration, a bash script (configure_environment.sh) was created to set up the environment variables correctly. This simplicity allows Familiar to be run easily on a server or a cloud service, such as Heroku or Amazon EC2.

Docker Container Support

Much of the process detailed above can be streamlined using Docker containers and the docker-compose tool for managing several linked containers. If the user has Docker installed, they do not even need to install PostgreSQL, Ruby, or any of the other necessary tools themselves. A docker-compose.yml file is provided to install and configure the Familiar container and the necessary management tools. This brings setup for the whole stack down to four commands:

    docker-compose build
    docker-compose run web rake db:create db:migrate
    docker-compose run web ./configure_environment.sh
    docker-compose up

User Account and Social Network Login

User authentication is provided via the Ruby gem OmniAuth (GitHub: intridea/omniauth).
OmniAuth has a number of authentication strategies, most of which are OAuth 2-based, meaning that in order for an application to authenticate a user against a third-party service, all that is needed is an application ID and an application secret (obtained by the developer from the third party). For users who do not want to use social network sign-in, or servers that do not have it configured, traditional username and password authentication is also provided via an OmniAuth strategy, OmniAuth-Identity. Passwords are stored in the database as salted hashes using bcrypt.

Data Import

The ability to import existing genealogical data is a key feature of Familiar. This functionality is provided via a web page available only to site administrators. Currently, data can be imported from two formats: GRAMPS XML and GEDCOM 5. GRAMPS is a Python-based desktop genealogy data manager, as described in the Analysis of Existing Platforms. It can export an XML file containing a complete archive of its data, including (optionally) Base64-encoded images. Importing families and relationships from GRAMPS XML is supported; importing source data and images is not. GEDCOM version 5 is the de facto standard for genealogical data storage, and via the ‘gedcom’ Ruby library, person data and family relationship data can be imported into the Familiar database. Both import methods process birth and death dates, and dates are stored in the ISO 8601 date format.

Taggable and Commentable Notes and Media

Familiar supports writing and saving notes and photos. Notes are edited with a rich text editor in the browser. Both notes and photos have tag lists, which can be used to organize photo and note collections. Additionally, any signed-in user can comment on any note or photo that they have access to.

Areas of Improvement

Familiar has reached ‘alpha’ status, and a production instance is currently running at https://Familiar.rysavys.me.
However, there are a number of areas where improvement is necessary or suggested. First, Docker support, while functional, makes a number of assumptions on the user's behalf and does not support Amazon S3 as an image storage backend. If this process were further streamlined, a web service could be created to automatically create instances of Familiar for end users, much like Slack. Second, Familiar is not (yet) a real contender as a genealogy site builder because it does not support source citations. Adding this support would be nontrivial if all the citation types supported by GEDCOM were included, but it is necessary for complete data migration. Third, the data import functionality is very lossy. While basic data can be imported, most data is discarded, and there is no way to import existing photos or tag data from a service such as Ancestry.com. Fourth, there is currently no data export functionality. Since Familiar is based on a stateless design, a web crawler could be written fairly easily using simple tools like wget or curl to download the HTML content of every page on the site, but this is less than ideal for migrating data to another service. A reasonable next step would be to use the gedcom library already used for import to generate GEDCOM data from the database.

Security Precautions and Concerns

As a Ruby on Rails application, Familiar is already at risk from a number of vulnerabilities, since the popularity of the toolset makes it a common target. However, precautions have been taken to make Familiar secure. First, user account passwords (for users who do not log in with a social network) are hashed with a salt using bcrypt, a currently popular password-hashing function. Second, there is a simple user permissions system: users are either editors, signed-in users, or anonymous users. Below is a list of the permission tiers.
Anonymous users can:

● View data about deceased individuals
● View photos and notes

Signed-in users can do everything anonymous users can do, as well as:

● Comment on photos and notes

Editors can do everything signed-in users can do, as well as:

● Delete comments
● Create, edit, and delete person data, photos, and notes
● View data about living individuals
● Perform data import
● View a list of all users

While these security precautions are in place, they are not perfect, and it is likely that there are undiscovered bugs.
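Because each tier listed above inherits everything from the tier below it, the permissions system can be pictured as an ordered list of roles with a minimum role per action. The following is a minimal sketch of that idea, not Familiar's actual implementation: the module name `PermissionSketch`, the role symbols, and the action names are all hypothetical and chosen only to mirror the list.

```ruby
# Hypothetical sketch of the three permission tiers described above.
# Module, role, and action names are illustrative assumptions and are
# not taken from Familiar's source code.
module PermissionSketch
  # Tiers ordered from least to most privileged.
  ROLES = %i[anonymous signed_in editor].freeze

  # Minimum tier required for each action, following the list above.
  REQUIRED_ROLE = {
    view_deceased_person: :anonymous,
    view_media:           :anonymous, # photos and notes
    comment:              :signed_in,
    delete_comment:       :editor,
    manage_records:       :editor,    # create, edit, delete people, photos, notes
    view_living_person:   :editor,
    import_data:          :editor,
    list_users:           :editor
  }.freeze

  # Returns true when `role` is at or above the tier the action requires.
  # Unknown roles and unknown actions are denied rather than raising.
  def self.allowed?(role, action)
    required = REQUIRED_ROLE[action]
    return false if required.nil? || !ROLES.include?(role)

    ROLES.index(role) >= ROLES.index(required)
  end
end
```

With this sketch, `PermissionSketch.allowed?(:signed_in, :comment)` is true, while `PermissionSketch.allowed?(:anonymous, :view_living_person)` is false. Denying unknown roles and actions by default is a deliberate choice here, so that a newly added action stays inaccessible until it is explicitly assigned a tier.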