ADTRAN MEMORANDUM CTB-2009-02

DATE: 06/05/09

SUBJECT: Overview of Sparta to Jira Migration [1]

FROM: C. Trevor Bowen

TO: Quality Directors and Managers, Dennis McMahan

COPY TO: Dan Joffe

ABSTRACT

Sparta is a long-standing issue tracking system, developed internally by Adtran and in use since 1998. With over 50,000 issues, Sparta has served Adtran well in managing defects. However, Adtran has recently begun migrating from Sparta to Jira. Sparta is written in Microsoft's ASP, backed by Microsoft SQL Server 2000, and hosted on a Microsoft Windows 2000 platform. Jira is written in Java, backed by MySQL 5.0, and hosted on a Linux platform (an RHEL 5.2 clone). Migration between the two systems requires some knowledge of both, especially of the target system, Jira and its MySQL database. This memo gives an overview of the essential concepts of both systems and details the fundamentals of the custom tools developed to perform the migration. This information should prove helpful as Adtran's other issue tracking systems are migrated to Jira.

[1] /home/tbowen/Desktop/ctb_2009-02_sparta_jira_migration.odt


INTRODUCTION

Sparta was developed internally by Adtran and has been used since 1998 to track issues and defects. However, Adtran has recently begun to migrate from Sparta to Jira, a popular commercial issue management system. The EN division migrated all open field issues to Jira as of May 2009, and new issues are now entered into Jira instead of Sparta. Adtran's CN division is in the process of defining data categorization, workflow, and view screens. Once these are finalized and some training is provided, the CN division will migrate to Jira too.

The Sparta tool uses Microsoft's SQL Server 2000 as its back-end database. Users interact with Sparta through a web browser, Microsoft's Internet Information Services (IIS), and a front-end developed in Microsoft's ASP language. The Sparta tool resides on the Adtran server, srv-sparta.

Jira is a front-end, written in Java, that runs in almost any Java application server, such as Tomcat. Jira can use almost any SQL relational database management system as its back-end, but Adtran uses MySQL. A traditional web server is not essential, because Tomcat can serve traditional content as well as Java Server Pages (JSP). However, Apache is used to proxy traffic, which eliminates the port 8080 suffix that Tomcat would otherwise require in the URL. Adtran's primary Jira instance resides on the Adtran server, jira.adtran.com.

Migration was complicated, because Adtran had a live Jira system with production data, and because the two systems have completely different architectures. Ultimately, a program was developed in Perl to read the data directly from Sparta's MS-SQL server, transform it appropriately, and write it directly to the MySQL database of an offline Jira development server. After the development database was verified, it was uploaded to the production server while that server was offline, and the production server was then restarted.

This memo explains the rationale, method, and complications of the path chosen to migrate the data. Adtran has many different issue tracking systems, which may one day all be migrated to Jira. If that occurs, then fundamentals of this process must be well understood and documented, as they will be used repeatedly.

MIGRATION METHODS

Adtran's migration to Jira is not a unique challenge. In fact, Jira's developer, Atlassian, details several different methods for migrating data to Jira: http://confluence.atlassian.com/display/JIRA/Migrating+from+Other+Issue+Trackers

1. Built-in importers – Developed specifically for importing from a fixed set of other trackers, such as Mantis and FogBugz, and no other systems.
2. CSV Importer – Jira offers a built-in wizard that imports issue data packed into a CSV format. However, this only migrates issue basics, not transition history, attachments, etc. Even comments require a special workaround.
3. Third-Party scripts for Trac – Sparta is not Trac, so this is not helpful.
4. Jelly Script – This was Atlassian's recommended method, since Adtran used a custom system, and since transition history was required.
5. RPC services – Offers SOAP and XML-RPC manipulation of Jira.
6. “Your own method” – Direct manipulation of the database is discouraged and not remotely supported.

Options #1 and #3 were immediately dismissed, since Sparta is not one of the supported source systems. Given Adtran's requirements for full data migration (no data loss for any migrated issue), option #2 was quickly ruled out too. Option #6 was the first thought; however, it was initially abandoned, because it was strongly discouraged by Atlassian and forum members. The next option, #4, Jelly scripts, was a “front door” approach. This method was used as the first real attempt to migrate Sparta data.

DEAD END: JELLY SCRIPT

Jelly is Apache's XML based scripting “language”, where each XML tag is bound to a Java object or method. Jelly's XML attributes relate to method arguments, and tags can be nested, simulating argument inheritance or dependency.

When Jira's Jelly runner is enabled, Jelly scripts can be fed into a running Jira service to automate data entry. Although well-documented:

http://www.atlassian.com/software/jira/docs/latest/jelly.html

Jira's Jelly implementation leaves much to be desired, which is not apparent until after much experience:

1. Parsing errors are reported merely as a parsing error. No line numbers or offender information is provided. Debugging reduces to a brute-force, binary-search, trial-and-error approach, iteratively testing and reducing the input until the error is isolated. This is trivial for a handful of lines, but several of the Jelly scripts used to load Sparta data were over a million lines long!
2. Various symbols must be escaped by transforming them into HTML symbols. For example, an apostrophe (') cannot be merely escaped by a backslash; rather, it must be transformed into a character entity such as “&apos;”. This is not difficult, but the necessary mappings are not documented and must be discovered through trial-and-error and a priori knowledge of HTML markup and XML.
3. Jelly scripts are executed as they are interpreted. Therefore, any interruption leaves Jira in an indeterminate state, forcing a complete reload of the original database and re-execution of the entire corrected Jelly script.
4. Jelly execution is slow, very slow. Final migration of the Sparta data using Jelly scripts required over 24 hours, even though all data was local. There were no network or VPN dependencies.
5. Jelly execution consumes a lot of memory. Over 6 GB of RAM was required and allocated to Tomcat's JVM to execute a single 260 MB Jelly script.
6. Jira's Jelly implementation is incomplete. Many functions required manipulation of the SQL database during Jelly execution to adjust timestamps, issue numbers, and user information, which the Jelly tag failed to request and store.
7. Jira's Jelly implementation is badly broken! The workflow transition tag not only failed to update the timestamps correctly, it also failed to leave any trace in the workflow history log. Rectifying this required excessive SQL post-processing, which rendered the entire process practically useless.
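The escaping requirement in item 2 can be illustrated with a short sketch. It is written in Python for brevity, not in the Perl used for the migration tools, and it relies only on the standard library. The function name is hypothetical, and the sketch assumes that the five predefined XML entities are sufficient; the exact set of characters that Jira's Jelly runner required is not documented in this memo.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; quotes need explicit mappings.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}

def escape_for_jelly(text):
    """Return text with XML special characters replaced by entities."""
    return escape(text, EXTRA_ENTITIES)

if __name__ == "__main__":
    # Example issue summary containing several characters that need escaping.
    print(escape_for_jelly("Unit won't boot if <5V & cold"))
```

Escaping the ampersand first, as the library routine does, avoids double-escaping the entities that the later replacements introduce.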

Although an interesting concept, the Jelly script approach proved to be a dead end, because of performance and capability issues. Several Perl programs were developed to read the Sparta database directly from the server, srv-sparta, and write Jelly scripts to local disk, which could be used to import the Sparta data into Jira. These scripts are available for reference here, executed in this order:

[email protected]:/home/tbowen/bin/sync_sparta_users.pl
[email protected]:/home/tbowen/bin/sync_sparta_issues.pl
[email protected]:/home/tbowen/bin/sync_sparta_history.pl
[email protected]:/home/tbowen/bin/sync_sparta_attachments.pl

Later, these scripts were combined into a single Perl program, although this effort was not finalized:

[email protected]:/home/tbowen/bin/sync_sparta_all.pl

Ultimately, these programs were abandoned, because direct SQL manipulation was increasingly added to the Jelly scripts to compensate for incomplete or broken Jelly tags. Eventually, it became more practical to transport the entire database via SQL statements, transforming the data as necessary using an intermediate Perl program.

Because of the time wasted and trust lost investing in Atlassian's Jelly implementation, which was recommended before the RPC approach, the RPC services approach was never seriously investigated.

DIRECT SQL MIGRATION VIA CUSTOM PERL PROGRAM

Although unsupported and discouraged by Atlassian, the Sparta database was migrated by directly querying the Sparta database on srv-sparta, locally transforming the data, and directly writing to a local development server via SQL statements.

The Perl programming language was initially designed for parsing and transforming large amounts of data, and Perl offers powerful modules for interfacing with various databases; therefore, Perl was chosen to develop the programmable step, #3, in the following procedure for migrating Sparta to Jira:

1. Prepare new Jira project via Jira's web GUI.
2. Convert Jira database to UTF-8 encoding.
3. Read, transform, and import Sparta data into development server.
4. Transport database from development to production server.

These fundamental steps are expanded below.


1) PREPARE JIRA PROJECT

A new Jira project was created to house all of the issues ported from Sparta into Jira. The project was designed to accommodate read-only access, so users could search, read, and link against the full Sparta database, after it was migrated to Jira. Although similar states, resolutions, fields and other issue data types existed in the Jira system, several new data types were created, so issue data would match as closely as possible between Sparta and Jira.

Creation and setup of a new Jira project through Jira's web interface is well documented in Atlassian's administrator's guide: http://www.atlassian.com/software/jira/docs/latest/administration.html

Therefore, these general instructions will not be duplicated here. However, steps specific to the Sparta migration included creation of the following Jira data types:

• States – Several issue states existed uniquely in the Sparta system, so these had to be created in Jira.
• Resolutions – Multiple issue resolutions were used in Sparta that were not available in Jira. These were added to the Jira system with a suffix, “(SPARTA ONLY – DO NOT USE)”, intended to prevent usage outside of the Sparta project in Jira.
• Custom Fields – Most of the fields used in Sparta did not exist in Jira, while some existed but were used differently. Consequently, several custom fields (as opposed to native, Jira-defined fields) were created to hold data parallel to the Sparta project. Because of the large number of custom fields required, a Jelly script was created to insert the custom fields.

After creating these fundamental data types, the following Sparta-specific schemes were created using many of the above data types:

• “Sparta” Issue Type Scheme – Bug, Feature Request, and Field Issue types.
• “Read Only” Permission Scheme – All Sparta users can read, but only project administrators can alter data.
• “Sparta” Field Configuration Scheme – All fields related to the Sparta project are available to all issue types.
• “Sparta” Issue Type Screen Scheme – All fields currently viewable in the Sparta application are viewable and editable on the primary tab for all issue types, but all defunct Sparta fields are viewable on a separate tab, labeled “Defunct Sparta Fields”.
• “Sparta” Workflow Scheme – Any issue type can transition from any Sparta state to any other Sparta state, including the current state.

Workflow creation is normally a tedious process, involving repeated “mouse-clicking” to designate all the possible states and the transition paths between each state. Definition of the issue states was not too painful for the Sparta workflow, since there are only 11 states available to a Sparta issue within Jira. However, allowing an issue to transition from any state to any other state by hand would be an exercise leading to insanity! Fortunately, workflows are stored within Jira in an XML document, which may be imported and exported. Therefore, the XML document for a functioning project was exported and reverse-engineered. From this, a Perl program was developed to generate an XML workflow document describing the highly flexible Sparta workflow. This program is located here:

[email protected]:/home/tbowen/bin/create_workflow.pl
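The idea of generating every transition programmatically can be sketched as follows. This is an illustration in Python and not the create_workflow.pl program itself; the element and attribute names ("workflow", "transition", "from", "to") and the state names are hypothetical placeholders and do not reflect Jira's actual workflow XML schema.

```python
import xml.etree.ElementTree as ET
from itertools import product

def build_workflow(states):
    """Build an XML tree with one transition for every ordered pair of states.

    Self-transitions are included, matching the "any state to any state,
    including the current state" requirement.
    """
    root = ET.Element("workflow")  # hypothetical element name
    for source, target in product(states, repeat=2):
        ET.SubElement(root, "transition", {"from": source, "to": target})
    return root

if __name__ == "__main__":
    demo_states = ["Open", "In Progress", "Closed"]  # placeholder state names
    tree = build_workflow(demo_states)
    print(len(tree), "transitions generated")
    print(ET.tostring(tree, encoding="unicode"))
```

With the 11 Sparta states, this approach yields 11 × 11 = 121 transitions, which shows why defining them by hand in the web interface was impractical.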

The functionality of this project was verified by creating, transitioning, and deleting a few test issues before migration.

2) CONVERT DATABASE CHARACTER ENCODING

The production Jira server already contained live issue data, which could not be lost. Unfortunately, the data was stored in the Latin1 character set. Latin1 cannot represent many of the symbols produced by other software or used in non-English languages, which was already causing some Sparta issues to display incorrectly. The UTF-8 character set is far more flexible, accommodating a wider assortment of symbols from multiple languages and sources. Therefore, conversion to UTF-8 was integrated into the process of retrieving the database from the production server:

Convert MySQL Database Encoding, Latin1 to UTF-8

On the Jira server, the Jira service first had to be shut down using a terminal on the remote production host:

sudo /etc/init.d/jira stop

Afterward, the database could be safely exported without any fear of ongoing database modification by active users:

mysqldump -h localhost -u root -p --default-character-set=latin1 -c \
    --insert-ignore --skip-set-charset jira_313 > dump.sql

This dumped the entire Jira database to a single, Latin1 encoded, plain-text file (dump.sql), consisting of SQL commands sufficient to recreate the entire database. This file was retrieved using SCP, a secure-shell copy command:

scp -p jira.adtran.com:/home/tbowen/dump.sql /home/tbowen/dump.sql

The Latin1 character encoding of the SQL dump was converted to UTF-8 using:

iconv -f ISO-8859-1 -t UTF-8 dump.sql > dump_utf8.sql

Although the file was now converted, it still contained many internal statements that declared the encoding to be Latin1. These were converted using a Perl one-liner to perform a “search and replace”:


perl -pi -w -e 's/CHARSET=latin1/CHARSET=utf8/g;' dump_utf8.sql
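For readers who prefer a single script, the two conversion steps above (re-encoding and rewriting the charset declarations) can be combined. The following Python sketch is an illustration only and was not part of the original procedure; the function name is hypothetical, and it mirrors the ISO-8859-1 assumption of the iconv command.

```python
import os
import tempfile

def convert_dump(src_path, dst_path):
    """Re-encode a Latin1 SQL dump as UTF-8 and update its charset declarations."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # line by line, so large dumps need little memory
            dst.write(line.replace("CHARSET=latin1", "CHARSET=utf8"))

if __name__ == "__main__":
    # Small demonstration on a throwaway file instead of a real dump.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "dump.sql")
    dst = os.path.join(workdir, "dump_utf8.sql")
    with open(src, "wb") as handle:
        handle.write("CREATE TABLE t (c TEXT) DEFAULT CHARSET=latin1; -- caf\xe9\n".encode("latin-1"))
    convert_dump(src, dst)
    with open(dst, encoding="utf-8") as handle:
        print(handle.read())
```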

On the local development server, the converted database was loaded using the mysql shell, started like so:

mysql -u root -p jira_313 --default-character-set=utf8

Inside the mysql shell, the following commands were entered to recreate and load the database with UTF-8 encoding:

DROP DATABASE jira_313;
CREATE DATABASE jira_313 CHARACTER SET utf8 COLLATE utf8_general_ci;
USE jira_313;
SOURCE dump_utf8.sql;
quit;

Now, the database was ready for new records based on the Sparta data.

3) MIGRATE DATA

A custom Perl program was developed to read data directly from the Sparta Microsoft SQL server. This data was then transformed inside the Perl program and written to the underlying MySQL server for the offline Jira service. This program is available here:

[email protected]:/home/tbowen/bin/migrate_sparta.pl

Please consult this file for the final implementation of all the program components described below.

Reading Sparta's Microsoft SQL Data

Unsurprisingly, Microsoft does not offer a Linux client to access their SQL server. However, the kind folks at FreeTDS offer an open-source library to connect a Linux computer to a Microsoft SQL server:

http://www.freetds.org

Several useful tutorials are available for downloading, installing, configuring, and utilizing the FreeTDS library with Perl's Sybase database access module:

http://www.perlmonks.org/?node_id=392385
http://www.easysoft.com/developer/languages/perl/tutorial_data_web.html
http://www.unixodbc.org/doc/FreeTDS.html
http://coding.derkeiler.com/Archive/Perl/perl.dbi.users/2006-09/msg00108.html

After installation and configuration of the FreeTDS library, the following files were created on the development server:


/etc/unixODBC/odbcinst.ini

[FreeTDS]
Description = v.062 with protocol v7.0
Driver = /usr/lib/libtdsodbc.so.0
UsageCount = 1

The above file specifies the location of the FreeTDS driver for the unixODBC manager. The next file specifies the host's default drivers:

/etc/unixODBC/odbc.ini

[srv-sparta]
Driver = FreeTDS
Description = Sparta Database
Trace = No
Server = srv-sparta
Port = 1433
Database = sparta

This last file designates the user's driver configuration, which overrides the host specification:

/home/tbowen/.odbc.ini

[srv-sparta]
Driver = FreeTDS
Description = Sparta Database
Trace = No
Server = srv-sparta
Port = 1433
Database = sparta

[common]
Driver = FreeTDS
Description = Common Sparta Database
Trace = No
Server = srv-sparta
Port = 1433
Database = common

Although a Perl module exists for communicating with Microsoft SQL servers through the unixODBC manager, it proved buggy. The Sybase Perl module proved much more effective and was ultimately used in all communication with the Sparta server. (The Sybase module works because Microsoft SQL Server originated from code licensed from Sybase, and both products speak the same TDS wire protocol.)

A Perl program was developed that would read all the available tables on Sparta, decipher the table structure and field types, and copy the data to a new database on a local MySQL server:

[email protected]:/home/tbowen/bin/shadow_sparta.pl

This program proved useful for duplicating the available Sparta data, but more importantly, it proved helpful in researching and understanding the procedures required to communicate with the Sparta SQL server. Many parts of this program reappeared in the final program, migrate_sparta.pl.


The FreeTDS tsql and isql programs proved helpful for interactive debugging of SQL queries sent to the Sparta MS-SQL server.

Writing SQL Data to MySQL

Interfacing a MySQL server through the DBI module is well documented. Tutorials abound on the web, so configuration will not be detailed here.

Reading LDAP User Data

Adtran's Jira system is configured to use the CORP LDAP service (running under Microsoft's Active Directory server) for all user authentication. This requires users to enter the same user name and password as those used to log into any computer controlled by the CORP domain. Unfortunately, Sparta used a separate user management database. Neither the user names nor passwords matched the CORP database. Consequently, the Sparta identification for over a thousand users needed to be migrated to the LDAP based system.

This was accomplished using the same credentials utilized by Jira's LDAP authentication system to retrieve a list of users, full names, and email addresses. Each Sparta user name was first compared to the list of existing Jira user names, searching for a match. If no match was discovered, the user name was truncated to 7 characters, which is the current Adtran policy, and compared again. If there was still no match, a new user would have to be created in Jira.

For such users, the Sparta user name was then checked against the list of LDAP user names. If no match was found, the user name was again truncated to 7 characters and checked against the LDAP list. If there was still no match, the user name was flagged, and the Sparta user name was used in Jira. Once a Jira user name was settled, a new user was created based on the LDAP information, if available; otherwise, the Sparta user information was used.

This algorithm was embedded as the first part of the migrate_sparta.pl program to synchronize users between the two systems.
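One reading of this matching algorithm is sketched below. The sketch is in Python, not the Perl of migrate_sparta.pl, and the function names, the returned action labels, and the case-sensitive comparison are all assumptions made for illustration.

```python
MAX_LEN = 7  # user name length limit described above

def find_match(name, known):
    """Return name, or its 7-character truncation, if either is in known."""
    for candidate in (name, name[:MAX_LEN]):
        if candidate in known:
            return candidate
    return None

def resolve_user(sparta_name, jira_users, ldap_users):
    """Return (jira_username, action) for one Sparta user.

    action is 'existing' (already in Jira), 'create_from_ldap', or
    'create_from_sparta' (no match anywhere; flagged for review).
    """
    match = find_match(sparta_name, jira_users)
    if match is not None:
        return match, "existing"
    match = find_match(sparta_name, ldap_users)
    if match is not None:
        return match, "create_from_ldap"
    return sparta_name, "create_from_sparta"

if __name__ == "__main__":
    jira = {"jdoe"}        # placeholder user names
    ldap = {"jdoe", "asmithe"}
    for name in ("jdoe", "asmithers", "olduser"):
        print(name, "->", resolve_user(name, jira, ldap))
```

Returning an action label alongside the chosen name keeps the lookup separate from the database writes, so the flagged cases can be reviewed before any user is created.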

An unavoidable consequence of this algorithm is that a person may have two unique, separate accounts: one for Jira and one for Sparta. This will require manual manipulation of the SQL database, assigning all history for the old Sparta user (“spartaId”) to the new Jira user (“corpId”), like so:

UPDATE userbase SET username='corpId' WHERE username='spartaId';
UPDATE membershipbase SET USER_NAME='corpId' WHERE USER_NAME='spartaId';
UPDATE jiraissue SET ASSIGNEE='corpId' WHERE ASSIGNEE='spartaId';
UPDATE jiraissue SET REPORTER='corpId' WHERE REPORTER='spartaId';
UPDATE jiraaction SET AUTHOR='corpId' WHERE AUTHOR='spartaId';
UPDATE jiraaction SET UPDATEAUTHOR='corpId' WHERE UPDATEAUTHOR='spartaId';
UPDATE projectroleactor SET ROLETYPEPARAMETER='corpId' WHERE \
    ROLETYPEPARAMETER='spartaId';


Jira caches all user data, so any changes made to the underlying MySQL database require a restart of Jira to take effect.
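When many accounts must be merged, the account-merge statements above can be driven from a list of table and column pairs. The following Python sketch illustrates the idea; it uses the standard sqlite3 module as a stand-in for MySQL so that it is self-contained, and the function name is hypothetical. A MySQL driver would typically use a different parameter placeholder (such as %s) in place of the question mark.

```python
import sqlite3

# (table, column) pairs taken from the UPDATE statements above.
USER_COLUMNS = [
    ("userbase", "username"),
    ("membershipbase", "USER_NAME"),
    ("jiraissue", "ASSIGNEE"),
    ("jiraissue", "REPORTER"),
    ("jiraaction", "AUTHOR"),
    ("jiraaction", "UPDATEAUTHOR"),
    ("projectroleactor", "ROLETYPEPARAMETER"),
]

def merge_user(conn, old_name, new_name, columns=USER_COLUMNS):
    """Rewrite every reference to old_name as new_name; return rows changed."""
    changed = 0
    cur = conn.cursor()
    for table, column in columns:
        # Identifiers come from the fixed list; the names are bound parameters.
        cur.execute(
            f"UPDATE {table} SET {column} = ? WHERE {column} = ?",
            (new_name, old_name),
        )
        changed += cur.rowcount
    conn.commit()
    return changed

if __name__ == "__main__":
    # Demonstration on an in-memory table holding only two of the columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jiraissue (ASSIGNEE TEXT, REPORTER TEXT)")
    conn.execute("INSERT INTO jiraissue VALUES ('spartaId', 'someoneElse')")
    demo_columns = [("jiraissue", "ASSIGNEE"), ("jiraissue", "REPORTER")]
    print(merge_user(conn, "spartaId", "corpId", demo_columns), "row(s) updated")
```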

System Database Architectures

The Sparta database architecture is straightforward, contained in 7 essential tables:

• common.person – Contains basic user information: user id, first name, last name, email, phone number, department, office location, etc.
• common.product – Product information used by Sparta, derived from BaaN. This is the second table used from the “common” database on srv-sparta for this project.
• sparta.localperson – Maps username to Sparta user id number.
• sparta.issue – Holds issue status, field values, summary, author, assignee, etc.
• sparta.history – A transaction table that logs any action taken by a user on a given issue, plus possible comments and some breadcrumbs associated with field value changes.
• sparta.files – Attachment data.
• sparta.nomenclature – A translation table that maps action and feature numbers to displayed text.

The Jira architecture is not so simple. The Jira database is spread across 96 tables, and even the simplest of queries requires “joins” of multiple tables. Some of the tables are documented here:

http://confluence.atlassian.com/display/JIRA/Database+Schema

And, some example queries are available here:

http://confluence.atlassian.com/display/JIRA/Example+SQL+queries+for+JIRA

For direct manipulation of the underlying MySQL database, one of the most important tables to master is SEQUENCE_VALUE_ITEM. This table contains the next number available for use in each of the other tables. Failure to update this table after inserting rows directly will inevitably corrupt data, as Jira overwrites data created during the migration process.
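A minimal sketch of keeping such a sequence table consistent after direct inserts is shown below. It is written in Python with the standard sqlite3 module standing in for MySQL. The column names SEQ_NAME and SEQ_ID, the mapping of the sequence name "Issue" to the jiraissue table, the safety margin, and the function name are all assumptions for illustration and should be checked against the actual Jira schema.

```python
import sqlite3

def bump_sequence(conn, seq_name, table, id_column="ID", margin=10):
    """Raise the stored next-ID for seq_name above the largest ID in table.

    The stored value is never lowered. Returns the value now stored.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COALESCE(MAX({id_column}), 0) FROM {table}")
    highest_used = cur.fetchone()[0]
    cur.execute(
        "SELECT SEQ_ID FROM SEQUENCE_VALUE_ITEM WHERE SEQ_NAME = ?", (seq_name,)
    )
    row = cur.fetchone()
    next_id = max(row[0] if row else 0, highest_used + margin)
    if row:
        cur.execute(
            "UPDATE SEQUENCE_VALUE_ITEM SET SEQ_ID = ? WHERE SEQ_NAME = ?",
            (next_id, seq_name),
        )
    else:
        cur.execute(
            "INSERT INTO SEQUENCE_VALUE_ITEM (SEQ_NAME, SEQ_ID) VALUES (?, ?)",
            (seq_name, next_id),
        )
    conn.commit()
    return next_id

if __name__ == "__main__":
    # Demonstration with throwaway tables that mimic the assumed layout.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SEQUENCE_VALUE_ITEM (SEQ_NAME TEXT PRIMARY KEY, SEQ_ID INTEGER)")
    conn.execute("CREATE TABLE jiraissue (ID INTEGER)")
    conn.execute("INSERT INTO SEQUENCE_VALUE_ITEM VALUES ('Issue', 100)")
    conn.execute("INSERT INTO jiraissue VALUES (10250)")
    print(bump_sequence(conn, "Issue", "jiraissue"))
```

Running such a step once per affected table, after all rows are inserted and while Jira is stopped, is one way to avoid the ID collisions described above.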

Unfortunately, given the size and breadth of the Jira architecture, a fuller discussion of the tables used in the migration process is well beyond the scope of this introductory memo. The reader is referred to the migration source code for detailed examples, since various tidbits are only documented there.

Several generic Perl functions were developed and used in migrate_sparta.pl, which create Jira users, update user attributes, create Jira components, create Jira versions, retrieve versions, set custom field values, add comments, and execute a transition in a workflow. These functions should prove useful in future migration efforts.


4) TRANSPORT DATA

All development was performed on a dedicated server to isolate unwanted effects from the production environment. After migrating all Sparta data to the development server, the MySQL database was again dumped, copied to the production server, and loaded over the old data – similar to the steps performed in the original conversion and retrieval process.

Afterward, the Jira service could be restarted:

sudo /etc/init.d/jira start

Modification of any issue data outside of Jira requires that Jira be re-indexed, which synchronizes the external Lucene index with the updated issue information. This index is used for all searching and sorting functions. After Jira was re-indexed, users were able to begin searching for information, reading issues, moving issues to new projects, creating new issues, and linking new issues against migrated Sparta issues.

SUMMARY

The Sparta database has been migrated from a Microsoft SQL Server to Jira's MySQL back-end server for Adtran's EN division. The CN division will migrate soon, pending definition of projects and screens. A monolithic Perl program was developed to read, transform, and write the SQL data. The migration process requires a few minutes to a few hours, depending on the network connection. Processes and code developed in this project should prove useful as other issue management systems migrate to Jira within Adtran.

ACKNOWLEDGMENTS

Many thanks to Marla Harvey, the Sparta administrator, and Ed Bryan, who helped in understanding Sparta, so it could be properly migrated.
