Socorro Documentation Release 2

Mozilla

June 18, 2014

Contents

1 Overview
   1.1 Socorro Server
   1.2 Socorro UI
   1.3 Data Flow

2 Installation
   2.1 Socorro VM (built with Vagrant + Puppet)
   2.2 Automated Install using Puppet
   2.3 Manual Install

3 Collector
   3.1 Collector Python Configuration
   3.2 Common Configuration
   3.3 Collector Configuration

4 Processor
   4.1 Introduction

5 Middleware API
   5.1 API map
   5.2 Bugs
   5.3 Crashes Comments
   5.4 Crashes Frequency
   5.5 Crashes Paireduuid
   5.6 Crashes Signatures
   5.7 Extensions
   5.8 Crash Trends
   5.9 Job
   5.10 Priorityjobs
   5.11 Products
   5.12 Products Builds
   5.13 Signature URLs
   5.14 Search
   5.15 List Report
   5.16 Versions Info
   5.17 Forcing an implementation

6 Socorro UI
   6.1 Coding Standards
   6.2 Adding new reports

7 UI Installation
   7.1 Installation
   7.2 Troubleshooting

8 Server
   8.1 The Applications

9 crontabber
   9.1 crontab runs crontabber
   9.2 Dependencies
   9.3 Own configurations
   9.4 App names versus/or class names
   9.5 Manual intervention
   9.6 Frequency and execution time
   9.7 Timezone and UTC
   9.8 Writing cron apps (aka. jobs)

10 Throttling
   10.1 throttleConditions

11 Deployment
   11.1 Introduction
   11.2 Outage Page

12 Development Discussions
   12.1 Coding Conventions
   12.2 New Developer Guide
   12.3 Glossary
   12.4 Standalone Development Environment
   12.5 Unit Testing
   12.6 Crash Repro Filtering Report
   12.7 Disk Performance Tests
   12.8 Dumping Dump Tables
   12.9 JSON Dump Storage
   12.10 Processed Dump Storage
   12.11 Report Database Design
   12.12 Code and Database Update
   12.13 Out-of-Date Data Warning
   12.14 Database Schema
   12.15 Package
   12.16 Schema
   12.17 Tables used primarily when processing Jobs
   12.18 Tables primarily used during data extraction
   12.19 Tables primarily used for materialized views
   12.20 Dimensions tables
   12.21 View tables
   12.22 Bug tracking
   12.23 Meta data
   12.24 Database Setup
   12.25 Common Config
   12.26 Populate ElasticSearch

13 PostgreSQL Database
   13.1 PostgreSQL Database Tables by Data Source
   13.2 Manually Populated Tables
   13.3 Tables Receiving External Data
   13.4 Automatically Populated Reference Tables
   13.5 Matviews
   13.6 Application Management Tables
   13.7 Deprecated Tables
   13.8 PostgreSQL Database Table Descriptions
   13.9 Raw Data Tables
   13.10 Normalized Fact Tables
   13.11 Dimensions
   13.12 Matviews
   13.13 Note On Release Channel Columns
   13.14 Application Support Tables
   13.15 Creating a New Matview
   13.16 Do I Want a Matview?
   13.17 Components of a Matview
   13.18 Creating the Matview Table
   13.19 Database Admin Function Reference
   13.20 MatView Functions
   13.21 Schema Management Functions
   13.22 Other Administrative Functions
   13.23 Custom Time-Date Functions
   13.24 Database Misc Function Reference
   13.25 Formatting Functions
   13.26 API Functions
   13.27 Populate PostgreSQL

14 How generic app and an example works using configman
   14.1 The minimum app
   14.2 Connecting and handling transactions
   14.3 What was the point of that?!

15 Writing documentation
   15.1 Installing Sphinx
   15.2 Making the HTML
   15.3 Making it appear on ReadTheDocs
   15.4 Or, just send the pull request
   15.5 Or, just edit the documentation online

16 Indices and tables


The current focus of Socorro development is to make a server that can accept crash reports from Breakpad clients. See http://wiki.mozilla.org/Breakpad for more information.

Socorro mailing list: https://lists.mozilla.org/listinfo/tools-socorro

This documentation is available on GitHub; if you want to, feel free to clone the repo, make some changes in a fork and send us a pull request.


CHAPTER 1

Overview

The Socorro Crash Reporting system consists of two pieces, the Socorro Server and the Socorro UI.

1.1 Socorro Server

The Socorro Server is a Python API and a collection of applications and web services that use the API. Together, the applications embody a set of servers that take crash dumps generated by remote clients, process them using the breakpad_stackdump application, and save the results in HBase. Additional processes aggregate and filter data for storage in a relational database. The server consists of these components:

• Collector
• Hadoop/HBase
• Processor
• Registrar
• Web Services

1.2 Socorro UI

Socorro UI is a Web application to access and analyze the database contents via search and generated reports.

1.3 Data Flow

Crash dumps are accepted by the Collector, a mod_wsgi application running under Apache. The Collector stores the crashes in HBase. Using Hadoop jobs, the crash dumps in HBase are converted into searchable JSON files by the Processor. The Processors are also long-running applications; they live on Hadoop processing nodes, accept tasks from map-reduce jobs, and employ stackwalk_server to convert crashes into JSON files stored back into HBase. Converted crashes are filtered into the relational database according to the throttling rules initially applied by the Collector.

The Socorro UI allows developers to browse the crash information from the relational database. In addition to being able to examine specific individual crash reports, there are trend reports that show which crashes are the most common, as well as the status of bugs filed about those crashes in Bugzilla.


Next Steps: Installation

CHAPTER 2

Installation

2.1 Socorro VM (built with Vagrant + Puppet)

You can build a standalone Socorro development VM - see Setup a development environment for more info. The config files and puppet manifests in ./puppet/ are a useful reference when setting up Socorro for the first time.

2.2 Automated Install using Puppet

It is possible to use puppet to script an install onto an existing environment. This has been tested in EC2 but should work on any regular Lucid install. See puppet/bootstrap.sh for an example.

2.3 Manual Install

2.3.1 Requirements

Breakpad client and symbols Socorro aggregates and reports on Breakpad crashes. Read more about getting started with Breakpad. You will need to produce symbols for your application and make these files available to Socorro.

• Linux (tested on Ubuntu Lucid and RHEL/CentOS 6)
• HBase (Cloudera CDH3)
• PostgreSQL 9.0
• Python 2.6

2.3.2 Ubuntu

1. Add the PostgreSQL 9.0 PPA from https://launchpad.net/~pitti/+archive/postgresql
2. Add the Cloudera apt source from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onUbuntuSystems


3. Install dependencies using apt-get. As root:

apt-get install supervisor rsyslog libcurl4-openssl-dev build-essential sun-java6-jdk ant python-software-properties subversion libpq-dev python-virtualenv python-dev libcrypt-ssleay-perl phpunit php5-tidy python-psycopg2 python-simplejson apache2 libapache2-mod-wsgi memcached php5-pgsql php5-curl php5-dev php-pear php5-common php5-cli php5-memcache php5 php5-gd php5-mysql php5-ldap hadoop-hbase hadoop-hbase-master hadoop-hbase-thrift curl liblzo2-dev postgresql-9.0 postgresql-plperl-9.0 postgresql-contrib

2.3.3 RHEL/CentOS

Use a "text install" and choose "minimal" as the install option.

1. Add the Cloudera yum repo from https://ccp.cloudera.com/display/CDHDOC/CDH3+Installation#CDH3Installation-InstallingCDH3onRedHatSystems
2. Add the PostgreSQL 9.0 yum repo from http://www.postgresql.org/download/linux#yum
3. Install the Sun Java JDK (JDK 6u16) - download the appropriate package from http://www.oracle.com/technetwork/java/javase/downloads/index.html
4. Install dependencies using yum. As root:

yum install python-psycopg2 simplejson httpd mod_ssl mod_wsgi postgresql-server postgresql-plperl perl-pgsql_perl5 postgresql-contrib subversion make rsync php-pecl-memcache memcached php-pgsql subversion gcc-c++ curl-devel ant python-virtualenv php-phpunit-PHPUnit hadoop-0.20 hadoop-hbase daemonize

5. Disable SELinux. As root, edit /etc/sysconfig/selinux and set "SELINUX=disabled".
6. Reboot. As root:

shutdown -r now

2.3.4 Download and install Socorro

Determine the latest release tag from https://wiki.mozilla.org/Socorro:Releases#Previous_Releases

Clone from GitHub, as the socorro user:

git clone https://github.com/mozilla/socorro
cd socorro
git checkout LATEST_RELEASE_TAG_GOES_HERE
cp scripts/config/commonconfig.py.dist scripts/config/commonconfig.py

Edit scripts/config/commonconfig.py. From inside the Socorro checkout, as the socorro user, change:

databaseName.default = 'breakpad'
databaseUserName.default = 'breakpad_rw'
databasePassword.default = 'aPassword'

If you change the password, make sure to change it in sql/roles.sql as well.

2.3.5 Run unit/functional tests, and generate report

From inside the Socorro checkout, as the socorro user:


make coverage

2.3.6 Set up directories and permissions

As root:

mkdir /etc/socorro
mkdir /var/log/socorro
mkdir -p /data/socorro
useradd socorro
chown socorro:socorro /var/log/socorro
mkdir /home/socorro/primaryCrashStore /home/socorro/fallback
chown apache /home/socorro/primaryCrashStore /home/socorro/fallback
chmod 2775 /home/socorro/primaryCrashStore /home/socorro/fallback

Note - use www-data instead of apache on Debian/Ubuntu.

Compile minidump_stackwalk. From inside the Socorro checkout, as the socorro user:

make minidump_stackwalk

2.3.7 Install socorro

From inside the Socorro checkout, as the socorro user:

make install

By default, this installs files to /data/socorro. You can change this by specifying PREFIX:

make install PREFIX=/usr/local/socorro

2.3.8 How Socorro Works

There are two main parts to Socorro:

1. A pipeline that collects, processes, and allows real-time searches and results for individual crash reports. This requires both HBase and PostgreSQL, as well as the Collector, Crashmover, Monitor, Processor, Middleware and UI. Individual crash reports are pulled from long-term storage (HBase) using the /report/index/ page, for example: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE. The search feature is at: http://crash-stats/query

2. A set of batch jobs that compiles aggregate reports and graphs, such as "Top Crashes by Signature". This requires PostgreSQL, the Middleware and the UI. It is triggered once per day by the "daily_matviews" cron job, covering data processed in the previous UTC day. Every other page on http://crash-stats is of this type.


2.3.9 Crash Flow

The basic flow of an incoming crash is:

(breakpad client) -> (collector) -> (local file system) -> (newCrashMover.py) -> (hbase)

A single machine will need to run the Monitor service, which watches HBase for incoming crashes and queues them up for the Processor service (which can run on one or more servers). Monitor and Processor use PostgreSQL to coordinate. Finally, processed jobs are inserted into both HBase and PostgreSQL.
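For orientation only, the hand-off in the middle of that flow can be pictured as a small polling loop. The sketch below is hypothetical: the directory names and the simple move-to-storage step are assumptions, not the real newCrashMover.py.

import os
import shutil
import time

INCOMING = '/home/socorro/primaryCrashStore'   # written by the collector
OUTGOING = '/tmp/handed-to-hbase'              # stand-in for the HBase write

def move_new_crashes():
    # One pass of a hypothetical crash mover: find complete JSON/dump
    # pairs written by the collector and hand them to long-term storage.
    for name in os.listdir(INCOMING):
        if not name.endswith('.json'):
            continue
        crash_id = name[:-len('.json')]
        json_path = os.path.join(INCOMING, name)
        dump_path = os.path.join(INCOMING, crash_id + '.dump')
        if not os.path.exists(dump_path):
            continue  # pair incomplete; the collector may still be writing
        # A real mover would insert the pair into HBase here; we relocate.
        shutil.move(json_path, OUTGOING)
        shutil.move(dump_path, OUTGOING)

if __name__ == '__main__':
    if not os.path.isdir(OUTGOING):
        os.makedirs(OUTGOING)
    while True:
        move_new_crashes()
        time.sleep(5)  # poll interval in seconds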

2.3.10 Configure Socorro

These pages show how to start the services manually; please also see the next section, "Install startup scripts":

• Start configuration with Common Config
• On the machine(s) that run the collector, set up Collector
• On the machine(s) that run the collector, set up Crash Mover
• On the machine that runs the monitor, set up Monitor
• On the same machine that runs the monitor, set up Deferred Cleanup
• On the machine(s) that run the processor, set up Processor

2.3.11 Install startup scripts

RHEL/CentOS only (Ubuntu TODO - see ./puppet/files/etc_supervisor for a supervisord example). As root:

ln -s /data/socorro/application/scripts/init.d/socorro-{monitor,processor,crashmover} /etc/init.d/
chkconfig socorro-monitor on
chkconfig socorro-processor on
chkconfig socorro-crashmover on
service httpd restart
chkconfig httpd on
service memcached restart
chkconfig memcached on

2.3.12 Install Socorro cron jobs

As root:

ln -s /data/socorro/application/scripts/crons/socorrorc /etc/socorro/
crontab /data/socorro/application/scripts/crons/example.crontab

2.3.13 PostgreSQL Config

RHEL/CentOS - Initialize and enable on startup (not needed for Ubuntu) As root:


service postgresql initdb
service postgresql start
chkconfig postgresql on

As root:

• edit /var/lib/pgsql/data/pg_hba.conf and change the IPv4/IPv6 connection method from "ident" to "md5"
• edit /var/lib/pgsql/data/postgresql.conf and:
  - uncomment # listen_addresses = 'localhost'
  - change TimeZone to 'UTC'
• edit other postgresql.conf parameters per the www.postgresql.org community guides

2.3.14 Populate PostgreSQL Database

Refer to Populate PostgreSQL for information about loading the schema and populating the database. This step is required to get basic information about existing product names and versions into the system.

2.3.15 Configure Apache

As root, copy the example config into place and edit it:

cp config/socorro.conf /etc/httpd/conf.d/socorro.conf
vim /etc/httpd/conf.d/socorro.conf
mkdir /var/log/httpd/{crash-stats,crash-reports,socorro-}.example.com
chown apache /data/socorro/htdocs/application/logs/

Note - use www-data instead of apache on Debian/Ubuntu.

2.3.16 Enable PHP short_open_tag

As root, edit /etc/php.ini and make the following changes:

short_open_tag = On
date.timezone = 'America/Los_Angeles'

2.3.17 Configure Kohana (PHP/web UI)

Refer to UI Installation (deprecated as of 2.2, new docs TODO)

2.3.18 Hadoop+HBase install

Configure Hadoop 0.20 + HBase 0.89. Refer to https://ccp.cloudera.com/display/CDHDOC/HBase+Installation

Note - you can start with a standalone setup, but read all of the above for info on a real, distributed setup!

Install startup scripts, RHEL/CentOS only (not needed for Ubuntu). As root:


service hadoop-hbase-master start
chkconfig hadoop-hbase-master on
service hadoop-hbase-thrift start
chkconfig hadoop-hbase-thrift on

2.3.19 Load Hbase schema

FIXME: this skips LZO support; remove the "sed" command if you have it installed. From inside the Socorro checkout, as the socorro user:

cat analysis/hbase_schema | sed 's/LZO/NONE/g' | hbase shell

2.3.20 System Test

Generate a test crash:

1. Install the http://code.google.com/p/crashme/ add-on for Firefox
2. Point your Firefox install at http://crash-reports/submit

See: https://developer.mozilla.org/en/Environment_variables_affecting_crash_reporting

If you already have a crash available and wish to submit it, you can use the standalone submitter tool. From inside the Socorro checkout, as the socorro user:

virtualenv socorro-virtualenv
. socorro-virtualenv/bin/activate
pip install poster
cp scripts/config/submitterconfig.py.dist scripts/config/submitterconfig.py
export PYTHONPATH=.:thirdparty
python scripts/submitter.py -u http://crash-reports/submit -j ~/Downloads/crash.json -d ~/Downloads/crash.dump

You should get a "CrashID" returned. Check the syslog logs for the user.* facility; you should see the returned CrashID being collected. Attempt to pull up the newly inserted crash: http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE. The (syslog "user" facility) logs should show this new crash being inserted for priority processing, and it should be available shortly thereafter.

CHAPTER 3

Collector

The Collector is an application that runs under Apache using mod_wsgi. Its task is to accept crash reports from remote clients and save them in a place and format usable by further applications.

Raw crashes are accepted via HTTP POST. The form data from the POST is arranged into JSON and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a throttle value, which determines whether the crash eventually goes into the relational database.

Should saving to the local file system fail, there is a fallback storage mechanism: a second file system can be configured to take the failed saves. This file system would likely be an NFS-mounted file system.

After a crash is saved, an app called Crash Mover transfers the crashes to HBase.
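To make that division of labor concrete, here is a drastically simplified sketch of a collector as a WSGI application. It is not the real Socorro collector: the storage path and the form field name for the minidump are assumptions, a random UUID stands in for the real ooid, and throttling is omitted entirely.

import json
import os
import time
import uuid
from cgi import FieldStorage

CRASH_DIR = '/home/socorro/primaryCrashStore'  # assumed storage root

def application(environ, start_response):
    # Accept a crash report POSTed as multipart form data and save a
    # JSON/dump pair, returning the new crash ID to the client.
    form = FieldStorage(fp=environ['wsgi.input'], environ=environ)
    crash_id = str(uuid.uuid4())  # stand-in for the real ooid
    meta = dict((key, form.getvalue(key)) for key in form.keys()
                if key != 'upload_file_minidump')
    meta['submitted_timestamp'] = time.time()
    with open(os.path.join(CRASH_DIR, crash_id + '.json'), 'w') as f:
        json.dump(meta, f)
    with open(os.path.join(CRASH_DIR, crash_id + '.dump'), 'wb') as f:
        f.write(form['upload_file_minidump'].file.read())
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['CrashID=bp-' + crash_id]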

3.1 Collector Python Configuration

Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for the collector:

• Copy .../scripts/config/commonconfig.py.dist to .../scripts/config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.
• Copy .../scripts/config/collectorconfig.py.dist to .../scripts/config/collectorconfig.py.

3.2 Common Configuration

There are two constants in .../scripts/config/commonconfig.py of interest to the collector: jsonFileSuffix and dumpFileSuffix. Other constants in this file are ignored. To set up the common configuration, see Common Config.
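As a self-contained illustration of the shape of these settings, here is a sketch. The Option class below is a stand-in, not Socorro's actual configuration machinery, and the suffix values are assumptions consistent with the JSON/dump pairs described elsewhere in this manual.

class Option(object):
    # Minimal stand-in for a Socorro configuration option, which carries
    # a default value and a doc string (compare the databaseName.default
    # assignments shown in the installation chapter).
    def __init__(self, default, doc):
        self.default = default
        self.doc = doc

# The two entries the collector reads from commonconfig.py:
jsonFileSuffix = Option(default='.json', doc='suffix for crash metadata files')
dumpFileSuffix = Option(default='.dump', doc='suffix for minidump files')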

3.3 Collector Configuration

collectorconfig.py has several options to adjust how files are stored. See the sample config code on Github.


CHAPTER 4

Processor

4.1 Introduction

The Socorro Processor is a multithreaded application that applies JSON/dump pairs to the stackwalk_server application, parses the output, and records the results in HBase. The processor, coupled with stackwalk_server, is computationally intensive. Multiple instances of the processor can be run simultaneously on different machines. See the sample config code on Github.
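Conceptually, each processor thread feeds one dump to a stack-walking binary and parses its output, along the lines of this sketch. The binary name, the -m (machine-readable) flag and the pipe-delimited parsing are assumptions about the Breakpad tooling, not Socorro's actual processor code.

import subprocess

def process_crash(dump_path, symbols_dir):
    # Run the Breakpad stack walker over one minidump and return its
    # output parsed into pipe-delimited fields, one list per line.
    proc = subprocess.Popen(
        ['minidump_stackwalk', '-m', dump_path, symbols_dir],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError('stackwalk failed: %s' % err)
    return [line.split('|') for line in out.splitlines() if line]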


CHAPTER 5

Middleware API

5.1 API map

5.1.1 New-style, documented services

• /bugs/
• /crashes/
  – /crashes/comments
  – /crashes/frequency
  – /crashes/paireduuid
  – /crashes/signatures
• /extensions/
• /crashtrends/
• /job/
• /priorityjobs/
• /products/
  – /products/builds/
  – /products/versions/
• /report/
  – /report/list/
• /signatureurls/
• /search/
  – /search/crashes/
  – /search/signatures/
• /util/
  – /util/versions_info/
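All of these services share one URL convention: parameter names and values alternate as path segments. A hedged helper for composing such calls from Python follows; the hostname and /bpapi mount point are taken from the examples later in this chapter and may differ in your deployment, and the sorted parameter order is an assumption.

import json
import urllib
import urllib2

BASE_URL = 'http://socorro-api/bpapi'  # assumed middleware mount point

def call_service(service, **params):
    # GET a middleware service, encoding parameters as alternating
    # /name/value/ path segments, and decode the JSON response.
    # Values containing '/' would need extra escaping.
    segments = []
    for name in sorted(params):
        segments.append(urllib.quote(name))
        segments.append(urllib.quote(str(params[name])))
    url = '%s/%s/%s/' % (BASE_URL, service, '/'.join(segments))
    return json.load(urllib2.urlopen(url))

# Example: fetch comments for one signature (see the Crashes Comments
# service documented later in this chapter).
comments = call_service('crashes/comments',
                        signature='SocketSend', products='Firefox')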


5.1.2 Old-style, undocumented services

See the source code in .../socorro/services/ for more details.

• /adu/byday
• /adu/byday/details
• /bugs/by/signatures
• /crash
• /current/versions
• /email
• /emailcampaigns/campaign
• /emailcampaigns/campaigns/page
• /emailcampaigns/create
• /emailcampaigns/subscription
• /emailcampaigns/volume
• /reports/hang
• /schedule/priority/job
• /topcrash/sig/trend/history
• /topcrash/sig/trend/rank

5.2 Bugs

Return a list of signature - bug id associations.

5.2.1 API specifications

HTTP method: POST
URL schema: /bugs/
Full URL: /bugs/
Example: http://socorro-api/bpapi/bugs/ with data: signatures=mysignature+anothersig+jsCrashSig

5.2.2 Mandatory parameters

Name | Type of value | Default value | Description
signatures | List of strings | None | Signatures of bugs to get.

5.2.3 Optional parameters

None.


5.2.4 Return value

In normal cases, return something like this:

{
    "hits": [
        {
            "id": "789012",
            "signature": "mysignature"
        },
        {
            "id": "405060",
            "signature": "anothersig"
        }
    ],
    "total": 2
}
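Because /bugs/ is a POST service, a call looks slightly different from the GET services. A minimal sketch (hostname assumed; note that url-encoding turns the space between signatures into the + separator shown in the example above):

import json
import urllib
import urllib2

data = urllib.urlencode({'signatures': 'mysignature anothersig'})
response = urllib2.urlopen('http://socorro-api/bpapi/bugs/', data)
result = json.load(response)
for hit in result['hits']:
    print '%s -> bug %s' % (hit['signature'], hit['id'])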

5.3 Crashes Comments

Return a list of comments on crash reports, filtered by signatures and other fields.

5.3.1 API specifications

HTTP method: GET
URL schema: /crashes/comments/(parameters)
Full URL: /crashes/comments/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/comments/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/

5.3.2 Mandatory parameters

Name | Type of value | Default value | Description
signature | String | None | Signature of crash reports to get.


5.3.3 Optional parameters

Name | Type of value | Default value | Description
products | String or list of strings | 'Firefox' | The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from | Date | Now - 7 days | Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to | Date | Now | Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
versions | String or list of strings | None | Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os | String or list of strings | None | Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several can be specified, separated by a + symbol.
branches | String or list of strings | None | Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
reasons | String or list of strings | None | Restricts search to crashes caused by this reason.
build_ids | Integer or list of integers | None | Restricts search to crashes that happened on a product with this build ID.
build_from | Integer or list of integers | None | Restricts search to crashes with a build id greater than this.
build_to | Integer or list of integers | None | Restricts search to crashes with a build id lower than this.
report_process | String | 'any' | Can be 'any', 'browser' or 'plugin'.
report_type | String | 'any' | Can be 'any', 'crash' or 'hang'.
plugin_in | String or list of strings | 'name' | Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode | String | 'default' | How to search for this plugin. 'report_process' has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms | String or list of strings | None | Terms to search for. Several terms can be specified, separated by a + symbol. 'report_process' has to be set to 'plugin'.

5.3.4 Return value

In normal cases, return something like this:


{ "hits":[ { "date_processed":"2011-03-16 06:54:56.385843", "uuid":"06a0c9b5-0381-42ce-855a-ccaaa2120116", "user_comments":"My firefox is crashing in an awesome way", "email":"[email protected]" }, { "date_processed":"2011-03-16 06:54:56.385843", "uuid":"06a0c9b5-0381-42ce-855a-ccaaa2120116", "user_comments":"I <3 Firefox crashes!", "email":"[email protected]" } ], "total":2 }

If no signature is passed as a parameter, return null.

5.4 Crashes Frequency

Return the number and frequency of crashes on each OS.

5.4.1 API specifications

HTTP method: GET
URL schema: /crashes/frequency/(parameters)
Full URL: /crashes/frequency/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/crashes/frequency/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/

5.4.2 Mandatory parameters

Name | Type of value | Default value | Description
signature | String | None | Signature of crash reports to get.


5.4.3 Optional parameters

Name | Type of value | Default value | Description
products | String or list of strings | 'Firefox' | The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from | Date | Now - 7 days | Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to | Date | Now | Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
versions | String or list of strings | None | Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os | String or list of strings | None | Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several can be specified, separated by a + symbol.
branches | String or list of strings | None | Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
reasons | String or list of strings | None | Restricts search to crashes caused by this reason.
build_ids | Integer or list of integers | None | Restricts search to crashes that happened on a product with this build ID.
build_from | Integer or list of integers | None | Restricts search to crashes with a build id greater than this.
build_to | Integer or list of integers | None | Restricts search to crashes with a build id lower than this.
report_process | String | 'any' | Can be 'any', 'browser' or 'plugin'.
report_type | String | 'any' | Can be 'any', 'crash' or 'hang'.
plugin_in | String or list of strings | 'name' | Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode | String | 'default' | How to search for this plugin. 'report_process' has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms | String or list of strings | None | Terms to search for. Several terms can be specified, separated by a + symbol. 'report_process' has to be set to 'plugin'.

5.4.4 Return value

In normal cases, return something like this:


{ "hits":[ { "count": 167, "build_date":"20120129064235", "count_mac":0, "frequency_windows":1, "count_windows": 167, "frequency":1, "count_linux":0, "total": 167, "frequency_linux":0, "frequency_mac":0 }, { "count":1, "build_date":"20120129063944", "count_mac":1, "frequency_windows":0, "count_windows":0, "frequency":1, "count_linux":0, "total":1, "frequency_linux":0, "frequency_mac":1 } ], "total":2 }

5.5 Crashes Paireduuid

Return paired uuid given a uuid and an optional hangid.

5.5.1 API specifications

HTTP method: GET
URL schema: /crashes/paireduuid/(optional_parameters)
Full URL: /crashes/paireduuid/uuid/(uuid)/hangid/(hangid)/
Example: http://socorro-api/bpapi/crashes/paireduuid/uuid/e8820616-1462-49b6-9784-e99a32120201/

5.5.2 Mandatory parameters

Name | Type of value | Description
uuid | String | Unique identifier of the crash report.

5.5.3 Optional parameters

Name | Type of value | Default value | Description
hangid | String | None | Hang ID of the crash report.


5.5.4 Return value

Return an object like the following:

{
    "hits": [
        {
            "uuid": "e8820616-1462-49b6-9784-e99a32120201"
        }
    ],
    "total": 1
}

Note that if a hangid is passed to the service, it will always return at most one result. Remove the hangid to get all paired uuids.

5.6 Crashes Signatures

Return top crashers by signatures.

5.6.1 API specifications

HTTP method: GET
URL schema: /crashes/signatures/(optional_parameters)
Full URL: /crashes/signatures/product/(product)/version/(version)/end_date/(end_date)/duration/(number_of_days)/crash_type/(crash_type)/limit/(number_of_results)/os/(operating_system)/
Example: http://socorro-api/bpapi/crashes/signatures/product/Firefox/version/9.0a1/

5.6.2 Mandatory parameters

Name | Type of value | Description
product | String | Product for which to get top crashes by signatures.
version | String | Version of the product for which to get top crashes.

5.6.3 Optional parameters

Name | Type of value | Default value | Description
crash_type | String | all | Type of crashes to get, can be "browser", "plugin", "content" or "all".
end_date | Date | Now | Date before which to get top crashes.
duration | Int | One week | Number of hours during which to get crashes.
os | String | None | Limit crashes to only one OS.
limit | Int | 100 | Number of results to retrieve.


5.6.4 Return value

Return an object like the following:

{
    "totalPercentage": 0.9999999999999994,
    "end_date": "2011-12-08 00:00:00",
    "start_date": "2011-12-07 17:00:00",
    "crashes": [
        {
            "count": 3,
            "mac_count": 3,
            "changeInRank": 11,
            "currentRank": 0,
            "previousRank": 11,
            "percentOfTotal": 0.142857142857143,
            "win_count": 0,
            "changeInPercentOfTotal": 0.117857142857143,
            "linux_count": 0,
            "hang_count": 0,
            "signature": "objc_msgSend | __CFXNotificationPost",
            "previousPercentOfTotal": 0.025,
            "plugin_count": 0
        }
    ],
    "totalNumberOfCrashes": 1
}

5.7 Extensions

Return a list of extensions associated with a crash’s UUID.

5.7.1 API specifications

HTTP method: GET
URL schema: /extensions/(optional_parameters)
Full URL: /extensions/uuid/(uuid)/date/(crash_date)/
Example: http://socorro-api/bpapi/extensions/uuid/xxxx-xxxx-xxxx/date/2012-02-29T01:23:45+00:00/

5.7.2 Mandatory parameters

Name | Type of value | Default value | Description
uuid | String | None | Unique identifier of the specific crash to get extensions from.
date | Datetime | None | Exact datetime of the crash.

5.7.3 Optional parameters

None


5.7.4 Return value

Return a list of extensions:

{
    "total": 1,
    "hits": [
        {
            "report_id": 1234,
            "date_processed": "2012-02-29T01:23:45+00:00",
            "extension_key": 5678,
            "extension_id": "[email protected]",
            "extension_version": "1.2"
        }
    ]
}

5.8 Crash Trends

Return a list of nightly or aurora crashes that took place between two dates.

5.8.1 API specifications

HTTP method: GET
URL schema: /crashtrends/(optional_parameters)
Full URL: /crashtrends/start_date/(start_date)/end_date/(end_date)/product/(product)/version/(version)
Example: http://socorro-api/bpapi/crashtrends/start_date/2012-03-01/end_date/2012-03-15/product/Firefox/version/13.0a1

5.8.2 Mandatory parameters

Name | Type of value | Default value | Description
start_date | Datetime | None | The earliest date of crashes we wish to evaluate.
end_date | Datetime | None | The latest date of crashes we wish to evaluate.
product | String | None | The product.
version | String | None | The version.

5.8.3 Optional parameters

None

5.8.4 Return value

Return a total of crashes, along with their build date, by build ID:


[ { "build_date":"2012-02-10", "version_string":"12.0a2", "product_version_id": 856, "days_out":6, "report_count": 515, "report_date":"2012-02-16", "product_name":"Firefox" } ]

5.9 Job

Handle the jobs queue for crash reports processing.

5.9.1 API specifications

HTTP method: GET
URL schema: /job/(parameters)
Full URL: /job/uuid/(uuid)/
Example: http://socorro-api/bpapi/job/uuid/e8820616-1462-49b6-9784-e99a32120201/

5.9.2 Mandatory parameters

Name | Type of value | Default value | Description
uuid | String | None | Unique identifier of the crash report to find.

5.9.3 Optional parameters

None

5.9.4 Return value

With a GET HTTP method, the service will return data in the following form:

{
    "hits": [
        {
            "id": 1,
            "pathname": "",
            "uuid": "e8820616-1462-49b6-9784-e99a32120201",
            "owner": 3,
            "priority": 0,
            "queueddatetime": "2012-02-29T01:23:45+00:00",
            "starteddatetime": "2012-02-29T01:23:45+00:00",
            "completeddatetime": "2012-02-29T01:23:45+00:00",
            "success": true,
            "message": "Hello"
        }
    ],
    "total": 1
}

5.10 Priorityjobs

Handle the priority jobs queue for crash reports processing.

5.10.1 API specifications

HTTP method: GET, POST
URL schema: /priorityjobs/(parameters)
Full GET URL: /priorityjobs/uuid/(uuid)/
GET Example: http://socorro-api/bpapi/priorityjobs/uuid/e8820616-1462-49b6-9784-e99a32120201/
POST Example: http://socorro-api/bpapi/priorityjobs/, data: uuid=e8820616-1462-49b6-9784-e99a32120201

5.10.2 Mandatory parameters

Name | Type of value | Default value | Description
uuid | String | None | Unique identifier of the crash report to mark.

5.10.3 Optional parameters

None

5.10.4 Return value

With a GET HTTP method, the service will return data in the following form:

{
    "hits": [
        {"uuid": "e8820616-1462-49b6-9784-e99a32120201"}
    ],
    "total": 1
}

With a POST HTTP method, it will return true if the uuid has been successfully added to the priorityjobs queue, and false if the uuid is already in the queue or if there has been a problem.
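A sketch of scheduling one crash for priority processing over POST (hostname assumed):

import urllib
import urllib2

crash_uuid = 'e8820616-1462-49b6-9784-e99a32120201'
data = urllib.urlencode({'uuid': crash_uuid})
response = urllib2.urlopen('http://socorro-api/bpapi/priorityjobs/', data)
# The body is true on success, false if the uuid was already queued
# or something went wrong.
print response.read()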

5.11 Products

Return information about product(s) and version(s) depending on the parameters the service is called with.


5.11.1 API specifications

HTTP method: GET
URL schema: /products/(optional_parameters)
Full URL: /products/versions/(versions)
Example: http://socorro-api/bpapi/products/versions/Firefox:9.0a1/

5.11.2 Optional parameters

Name | Type of value | Default value | Description
versions | String or list of strings | None | Several product:version strings can be specified, separated by a + symbol.

5.11.3 Return value

If the service is called with the optional versions parameter, the service returns an object with an array of results labeled as hits and a total:

{
    "hits": [
        {
            "is_featured": boolean,
            "throttle": float,
            "end_date": "string",
            "start_date": "integer",
            "build_type": "string",
            "product": "string",
            "version": "string"
        },
        ...
    ],
    "total": 1
}

If the service is called with no parameters, it returns an object containing a list of products as well as a total, indicating the number of products returned:

{
    "hits": [
        {
            "sort": 1,
            "release_name": "firefox",
            "rapid_release_version": "5.0",
            "product_name": "Firefox"
        },
        ...
    ],
    "total": 6
}
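For instance, listing all products is a parameter-less GET; a short sketch against the documented return shape (hostname assumed):

import json
import urllib2

result = json.load(urllib2.urlopen('http://socorro-api/bpapi/products/'))
for product in result['hits']:
    print product['product_name'], product['rapid_release_version']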

5.12 Products Builds

Query and update information about builds for products.


5.12.1 API specifications

HTTP method: GET, POST
URL schema: /products/builds/(optional_parameters)
Full URL: /products/builds/product/(product)/version/(version)/date_from/(date_from)/
GET Example: http://socorro-api/bpapi/products/builds/product/Firefox/version/9.0a1/
POST Example: http://socorro-api/bpapi/products/builds/product/Firefox/, data: version=10.0&platform=macosx&build_id=20120416012345&build_type=Beta&beta_number=2&repository=mozilla-central

5.12.2 Mandatory GET parameters

Name | Type of value | Default value | Description
product | String | None | Product for which to get nightly builds.

5.12.3 Optional GET parameters

Name | Type of value | Default value | Description
version | String | None | Version of the product for which to get nightly builds.
from_date | Date | Now - 7 days | Date from which to get nightly builds.

5.12.4 GET return value

Return an array of objects:

[
    {
        "product": "string",
        "version": "string",
        "platform": "string",
        "buildid": "integer",
        "build_type": "string",
        "beta_number": "string",
        "repository": "string",
        "date": "string"
    },
    ...
]

5.12.5 Mandatory POST parameters

Name | Type of value | Default value | Description
product | String | None | Product for which to add a build.
version | String | None | Version for new build, e.g. "10.0".
platform | String | None | Platform for new build, e.g. "macosx".
build_id | String | None | Build ID for new build (YYYYMMDD######).
build_type | String | None | Type of build, e.g. "Release", "Beta", "Aurora", etc.


5.12.6 Optional POST parameters

Name | Type of value | Default value | Description
beta_number | String | None | Beta number if build_type is "Beta". Mandatory if build_type is "Beta", ignored otherwise.
repository | String | "" | The repository from which this release came.

5.12.7 POST return value

On success, returns a 303 See Other redirect to the newly-added build’s API page at: /products/builds/product/(product)/version/(version)/
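Putting the POST parameters together, registering a build might look like this sketch (hostname assumed; urllib2 follows the 303 redirect automatically, re-requesting the new build's API page with GET):

import urllib
import urllib2

data = urllib.urlencode({
    'version': '10.0',
    'platform': 'macosx',
    'build_id': '20120416012345',
    'build_type': 'Beta',
    'beta_number': '2',
    'repository': 'mozilla-central',
})
url = 'http://socorro-api/bpapi/products/builds/product/Firefox/'
response = urllib2.urlopen(url, data)
print response.geturl()  # the redirected GET URL for the new build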

5.13 Signature URLs

Returns a list of URLs for a specific signature, product(s) and version(s), between a start and end date. Also includes the number of times each URL has been reported for the parameters specified above.

5.13.1 API specifications

HTTP method: GET
URL schema: /signatureurls/(parameters)
Full URL: /signatureurls/signature/(signature)/start_date/(start_date)/end_date/(end_date)/products/(products)/versions/(versions)
Example: http://socorro-api/bpapi/signatureurls/signature/samplesignature/start_date/2012-03-01T00:00:00+00:00/end_date/2012-03-31T00:00:00+00:00/products/Firefox+Fennec/versions/Firefox:4.0.1+Fennec:13.0/

5.13.2 Mandatory parameters

Name | Type of value | Default value | Description
signature | String | None | The signature for which URLs should be found.
start_date | Date | None | Date from which to collect URLs.
end_date | Date | None | Date up to, but not including, for which URLs should be collected.
products | String | None | Product(s) for which to find URLs.
versions | String | None | Version(s) of the above products to find URLs for.

5.13.3 Return value

Returns an object with a list of URLs and a crash count for each, as well as a counter, 'total', for the total number of results in the result set:

{
    "hits": [
        {"url": "about:blank", "crash_count": 1936},
        ...
    ],
    "total": 1
}

5.14 Search

Search for crashes according to a large number of parameters and return a list of crashes or a list of distinct signatures.

5.14.1 API specifications

HTTP method: GET
URL schema: /search/(data_type)/(optional_parameters)
Full URL: /search/(data_type)/for/(terms)/products/(products)/from/(from_date)/to/(to_date)/in/(fields)/versions/(versions)/os/(os_name)/branches/(branches)/search_mode/(search_mode)/reasons/(crash_reasons)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/result_number/(number)/result_offset/(offset)/
Example: http://socorro-api/bpapi/search/crashes/for/libflash.so/in/signature/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/

5.14.2 Mandatory parameters

Name | Type of value | Default value | Description
data_type | String | 'signatures' | Type of data we are looking for. Can be 'crashes' or 'signatures'.



5.14.3 Optional parameters

Name | Type of value | Default value | Description
for | String or list of strings | None | Terms we are searching for. Each term must be URL encoded. Several terms can be specified, separated by a + symbol.
products | String or list of strings | 'Firefox' | The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from | Date | Now - 7 days | Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to | Date | Now | Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
in | String or list of strings | All | Fields we are searching in. Several fields can be specified, separated by a + symbol. This is NOT implemented for PostgreSQL.
versions | String or list of strings | None | Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os | String or list of strings | None | Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several can be specified, separated by a + symbol.
branches | String or list of strings | None | Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
search_mode | String | 'default' | Set how to search. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
reasons | String or list of strings | None | Restricts search to crashes caused by this reason.
build_ids | Integer or list of integers | None | Restricts search to crashes that happened on a product with this build ID.
build_from | Integer or list of integers | None | Restricts search to crashes with a build id greater than this.
build_to | Integer or list of integers | None | Restricts search to crashes with a build id lower than this.
report_process | String | 'any' | Can be 'any', 'browser' or 'plugin'.
report_type | String | 'any' | Can be 'any', 'crash' or 'hang'.
plugin_in | String or list of strings | 'name' | Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode | String | 'default' | How to search for this plugin. 'report_process' has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms | String or list of strings | None | Terms to search for. Several terms can be specified, separated by a + symbol. 'report_process' has to be set to 'plugin'.
result_number | Integer | 100 | Number of results to return.
result_offset | Integer | 0 | Offset of the first result to return.

5.14.4 Return value

If data_type is crashes, the return value looks like:

{
    "hits": [
        {
            "count": 1,
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir"
        },
        {
            "count": 1,
            "signature": "XPCWrappedNativeScope::TraceJS(JSTracer *, XPCJSRuntime*)",
            "is_solaris": 0,
            "is_linux": 0,
            "numplugin": 0,
            "is_windows": 0,
            "is_mac": 0,
            "numhang": 0
        }
    ],
    "total": 2
}

If data_type is signatures, the return value looks like:

{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        }
    ],
    "total": 1
}
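Composed with the path-segment convention from the API map, a full search call looks like this sketch (hostname assumed):

import json
import urllib2

url = ('http://socorro-api/bpapi/search/signatures/'
       'for/libflash.so/in/signature/'
       'products/Firefox/versions/Firefox:4.0.1/'
       'from/2011-05-01/to/2011-05-05/os/Windows/')
result = json.load(urllib2.urlopen(url))
print result['total'], 'results'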

If an error occurred, the API will return something like this: well, for the moment it doesn't return anything but an Internal Error HTTP header... We will improve that soon! :)

5.15 List Report

Return a list of crash reports with a specified signature and filtered by a wide range of options.


5.15.1 API specifications

HTTP method: GET
URL schema: /report/list/(parameters)
Full URL: /report/list/signature/(signature)/products/(products)/from/(from_date)/to/(to_date)/versions/(versions)/os/(os_name)/branches/(branches)/reasons/(crash_reason)/build_ids/(build_ids)/build_from/(build_from)/build_to/(build_to)/report_process/(report_process)/report_type/(report_type)/plugin_in/(plugin_in)/plugin_search_mode/(plugin_search_mode)/plugin_terms/(plugin_terms)/
Example: http://socorro-api/bpapi/report/list/signature/SocketSend/products/Firefox/versions/Firefox:4.0.1/from/2011-05-01/to/2011-05-05/os/Windows/

5.15.2 Mandatory parameters

Name | Type of value | Default value | Description
signature | String | None | Signature of crash reports to get.


5.15.3 Optional parameters

Name | Type of value | Default value | Description
products | String or list of strings | 'Firefox' | The product we are interested in. (e.g. Firefox, Fennec, Thunderbird...)
from | Date | Now - 7 days | Search for crashes that happened after this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
to | Date | Now | Search for crashes that happened before this date. Can use the following formats: 'yyyy-MM-dd', 'yyyy-MM-dd HH:ii:ss' or 'yyyy-MM-dd HH:ii:ss.S'.
versions | String or list of strings | None | Restrict to a specific version of the product. Several versions can be specified, separated by a + symbol.
os | String or list of strings | None | Restrict to an Operating System. (e.g. Windows, Mac, Linux...) Several can be specified, separated by a + symbol.
branches | String or list of strings | None | Restrict to a branch of the product. Several branches can be specified, separated by a + symbol.
reasons | String or list of strings | None | Restricts search to crashes caused by this reason.
build_ids | Integer or list of integers | None | Restricts search to crashes that happened on a product with this build ID.
build_from | Integer or list of integers | None | Restricts search to crashes with a build id greater than this.
build_to | Integer or list of integers | None | Restricts search to crashes with a build id lower than this.
report_process | String | 'any' | Can be 'any', 'browser' or 'plugin'.
report_type | String | 'any' | Can be 'any', 'crash' or 'hang'.
plugin_in | String or list of strings | 'name' | Search for a plugin in this field. 'report_process' has to be set to 'plugin'.
plugin_search_mode | String | 'default' | How to search for this plugin. 'report_process' has to be set to 'plugin'. Can be either 'default', 'is_exactly', 'contains' or 'starts_with'.
plugin_terms | String or list of strings | None | Terms to search for. Several terms can be specified, separated by a + symbol. 'report_process' has to be set to 'plugin'.
result_number | Integer | 100 | Number of results to return.
result_offset | Integer | 0 | Offset of the first result to return.


5.15.4 Return value

In normal cases, return something like this:

{
    "hits": [
        {
            "client_crash_date": "2011-03-16 13:55:10.0",
            "dump": "...",
            "signature": "arena_dalloc_small | arena_dalloc | free | CloseDir",
            "process_type": null,
            "id": 231224257,
            "hangid": null,
            "version": "4.0b13pre",
            "build": "20110314162350",
            "product": "Firefox",
            "os_name": "Mac OS X",
            "date_processed": "2011-03-16 06:54:56.385843",
            "reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS",
            "address": "0x1d3aff03",
            "...": "..."
        },
        {
            "client_crash_date": "2011-03-16 11:35:37.0",
            "...": "..."
        }
    ],
    "total": 2
}

If signature is empty or nonexistent, a BadRequest error is raised. If another error occurred, the API will return a 500 Internal Error HTTP header.

5.16 Versions Info

Return information about one or several product:version pairs.

5.16.1 API specifications

HTTP method: GET
URL schema: /util/versions_info/(optional_parameters)
Full URL: /util/versions_info/versions/(versions)/
Example: http://socorro-api/bpapi/util/versions_info/versions/Firefox:9.0a1+Fennec:7.0/

5.16.2 Mandatory parameters

None.

5.16.3 Optional parameters

Name | Type of value | Default value | Description
versions | String or list of strings | None | Product:version pairs for which information is requested.


5.16.4 Return value

If the versions parameter is invalid, the return value is None. Otherwise it looks like this:

{
    "product_name:version_string": {
        "product_version_id": integer,
        "version_string": "string",
        "product_name": "string",
        "major_version": "string" or None,
        "release_channel": "string" or None,
        "build_id": [list, of, decimals] or None
    }
}

5.17 Forcing an implementation

For debugging purposes, you can add a parameter to force the API to use a specific implementation module. That module must be inside socorro.external and contain the needed service implementation.

Name | Type of value | Default value | Description
force_api_impl | String | None | Force the service to use a specific module.

For example, if you want to force a search to be executed with ElasticSearch, you can add force_api_impl/elasticsearch/ to the middleware call. If socorro.external.elasticsearch exists and contains a search module, it will get loaded and used.


CHAPTER 6

Socorro UI

The Socorro UI is a KohanaPHP implementation that operates the frontend website for the crash reporting system.

6.1 Coding Standards

Maintaining coding standards will encourage current and future developers to implement clean and consistent code throughout the codebase. The PEAR Coding Standards (http://pear.php.net/manual/en/standards.php) serve as the basis for the Socorro UI coding standards.

• Always include header documentation for each class and each method.
  – When updating a class or method that does not have header documentation, add header documentation before committing.
  – Header documentation should be added for all methods within each controller, model, library and helper class.
  – @param documentation is required for all parameters.
  – Header documentation should be less than 80 characters in width.
• Add inline documentation for complex logic within a method.
• Use 4 character tab indentations for both PHP and Javascript.
• Method names must inherently describe the functionality within that method.
  – Method names must be written in camel-case format, e.g. getThisThing.
  – Method names should follow the verb-noun format, such as getThing, editThing, etc.
• Use carriage returns in if statements containing more than 2 statements, and in arrays containing more than 3 array members, for readability.
• All important files, such as controllers, models and libraries, must have the Mozilla Public License at the top of the file.

6.2 Adding new reports

Here is an example of a new report which uses a web service to fetch data (JSON via HTTP) and displays the result as an HTML table.


Kohana uses the Model-View-Controller (MVC) pattern: http://en.wikipedia.org/wiki/Model-view-controller

Create a model, view(s) and controller for the new report (substituting "newreport" for something more appropriate):

6.2.1 Configuration (optional)

webapp-php/application/config/new_report.php:

<?php
// The number of rows to display.
$config['numberofrows'] = 20;

// The number of results to display on the by_version page.
$config['byversion_limit'] = 300;
?>

6.2.2 Model

webapp-php/application/models/newreport.php

See Add a service to the Middleware for details about writing a middleware service for this to use.

6.2.3 View

webapp-php/application/views/newreport/byversion.php:

New Report for <?php out::H($product) ?> <?php out::H($version) ?>

6.2.4 Controller

webapp-php/application/controllers/newreport.php:


<?php
// Class wrapper assumed; the original snippet showed only the methods.
class NewReport_Controller extends Controller {

    public function __construct()
    {
        parent::__construct();
        $this->newreport_model = new NewReport_Model();
    }

    // Public functions map to routes on the controller:
    // http://<host>/newreport/index/[product, version, ?'foo'='bar', etc]
    public function index()
    {
        $resp = $this->newreport_model->getNewReportViaWebService();
        if ($resp) {
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
                'foo' => $resp->foo,
            ));
        } else {
            header("Data access error", TRUE, 500);
            $this->setViewData(array(
                'resp' => $resp,
                'nav_selection' => 'new_report',
            ));
        }
    }
}
?>


CHAPTER 7

UI Installation

7.1 Installation

Follow these steps to get the Socorro UI up and running.

7.1.1 Apache

Set up Apache with a vhost as you see fit. You will either need AllowOverride to enable .htaccess files or you may paste the .htaccess rules into your vhost.

7.1.2 KohanaPHP Installation

1. Copy the .htaccess file and edit the host path if your webapp is not at the domain root:

cp htaccess-dist .htaccess
vim .htaccess

2. Copy application/config/config.php-dist and change the hosting path and domain:

cp application/config/config.php-dist application/config/config.php
vim application/config/config.php

For a production install, you may want to set $config['display_errors'] to FALSE.

3. Copy application/config/database.php and edit its database settings:

cp application/config/database.php-dist application/config/database.php
vim application/config/database.php

4. Copy application/config/cache.php and update the cache setting to be file-based or memcache-based:

cp application/config/cache.php-dist application/config/cache.php
vim application/config/cache.php

5. If you selected memcache-based caching, copy application/config/cache_memcache.php and update the settings accordingly:

cp application/config/cache_memcache.php-dist application/config/cache_memcache.php
vim application/config/cache_memcache.php

6. Copy all other config -dist files to their config location:


cp application/config/application.php-dist application/config/application.php
cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
cp application/config/daily.php-dist application/config/daily.php
cp application/config/products.php-dist application/config/products.php

7. Copy application/config/auth.php and edit it to set up your preferred authentication method, or to disable authentication. Edit $config['driver'] to change your authentication method. Edit $config['proto'] to remove the https requirement if necessary:

cp application/config/auth.php-dist application/config/auth.php
vim application/config/auth.php

8. If you are using LDAP, copy application/config/ldap.php and edit its settings:

cp application/config/ldap.php-dist application/config/ldap.php
vim application/config/ldap.php

9. Ensure that the application logs and cache directories are writable:

chmod a+rw application/logs application/cache

7.1.3 Dump Files

The Socorro UI needs to access the processed dump files via HTTP. You will need to set up Apache or some other system to ensure that dump files may be accessed at http://example.com/dumps/<uuid>.jsonz. This can be accomplished via mod_rewrite rules, just like in the next section, "Raw Dump Files". Example config: processeddumps.mod_rewrite.txt

Next, update the $config['crash_dump_local_url'] value in application/config/application.php to point to the proper directory.

7.1.4 Raw Dump Files

When a user is logged in to the Socorro UI as an admin, they may view raw crash dump files. These raw crashes can be served by Apache by adding the following rewrite rules. The values should match the values in the middleware code at scripts/config/commonconfig.py. Links to raw dumps are available in the http://example.com/report/index/{uuid} crash report pages. Example config: webapp-php/docs/rawdumps.mod_rewrite.txt

Next, update the $config['raw_dump_url'] value in application/config/application.php to point to the proper directory.

7.1.5 Web Services

Many parts of Socorro UI rely on web services provided by the Python-based middleware layer.

7.1.6 Middleware

Copy the scripts/config/webapiconfig.py file, edit it accordingly, and execute the script to listen on the indicated port:

cp scripts/config/webapiconfig.py-dist scripts/config/webapiconfig.py
vim scripts/config/webapiconfig.py
python scripts/webservices.py 8083


7.1.7 Socorro UI

Copy application/config/webserviceclient.php, edit the file and change $config['socorro_hostname'] to contain the proper hostname and port number. If necessary, update $config['basic_auth']:

cp application/config/webserviceclient.php-dist application/config/webserviceclient.php
vim application/config/webserviceclient.php

7.1.8 Testing Your Setup

There are two ways in which you can test your Socorro UI setup.

7.1.9 Search

Visit the website containing the Socorro UI and click Advanced Search. Perform a search for the product you've added to the site, which you know has crash reports associated with it in the reports table in your database.

7.1.10 Report

Within the search results you received, click a signature. Next click the timestamp for a particular report, which will take you to a page that displays an individual crash report.

7.2 Troubleshooting

7.2.1 println the sql

To see what SQL queries are being executed, edit webapp-php/system/libraries/Database.php and, around line 443, add: Kohana::log('debug', $sql); Do an svn ignore on this file if you plan on checking in code. The queries will show up in the debug log, application/logs/<date>.log.php. Examine your database and see why you don't get the expected results.

7.2.2 404?

Is your ‘.htaccess’ properly setup?

7.2.3 /report/pending never goes to /report/index?

If you see a pending screen and didn't expect one, it means that the records in the reports and dumps tables couldn't be joined, so the page is waiting for the processor on the backend to populate one or both tables. Investigate with the uuid, and look at the reports and dumps tables.

7.2.4 Config Files

Ensure that the appropriate config files in webapp-php/application/config have been copied from .php-dist to .php.


CHAPTER 8

Server

The Socorro Server is a collection of Python applications, plus a Python package (the Socorro package), that runs the backend of the Socorro system.

8.1 The Applications

Executables for the applications are generally found in the .../scripts directory.

• .../scripts/startCollector.py - Collector
• .../scripts/startDeferredCleanup.py - Deferred Cleanup
• .../scripts/startMonitor.py - Monitor
• .../scripts/startProcessor.py - Processor
• .../scripts/startTopCrashes.py - Top Crashers By Signature
• .../scripts/startBugzilla.py - BugzillaAssociations
• .../scripts/startMtfb.py - MeanTimeBeforeFailure
• .../scripts/startServerStatus.py - Server Status
• .../scripts/startTopCrashByUrl.py - Top Crashers By URL


CHAPTER 9

crontabber

crontabber is a script that handles all cron job scripting. Unlike traditional crontab, all execution is done via the ./crontabber.py script, and the frequency and exact time to run are part of the configuration files. The configuration is done using configman and looks something like this:

    # name: jobs
    # doc: List of jobs and their frequency separated by `|`
    # converter: configman.converters.class_list_converter
    jobs=socorro.cron.jobs.foo.FooCronApp|12h
         socorro.cron.jobs.bar.BarCronApp|1d
         socorro.cron.jobs.pgjob.PGCronApp|1d|03:00

9.1 crontab runs crontabber

crontabber can be run at any time. Because the exact execution time is in configuration, you can't accidentally execute jobs that aren't supposed to execute simply by running crontabber. However, it can't be run as a daemon: it needs to be run by UNIX crontab every, say, 5 minutes. So instead of your crontab being a huge list of jobs at different times, all you need is this:

*/5 * * * * PYTHONPATH="..." socorro/cron/crontabber.py

That's all you need! Obviously the granularity of crontabber is limited by the granularity with which you execute it. By moving away from UNIX crontab we have better control of the cron apps and their inter-relationships. We can also remove unnecessary boilerplate cruft.

9.2 Dependencies

crontabber remembers the state of previous runs of each cron app (stored internally in a JSON file), which makes it possible to assign dependencies between the cron apps. Dependencies are used only to prevent jobs from running, not to automatically run the jobs that depend on them. For example, if FooCronApp depends on BarCronApp, it just won't run if BarCronApp last resulted in an error or simply hasn't run when it should have. Overriding dependencies is possible with the --force parameter. For example, suppose you know BarCronApp can now be run; you do that like this:


./crontabber.py --job=BarCronApp --force

Dependencies inside the cron apps are defined by setting a class attribute on the cron app. The attribute is called depends_on and its value can be a string, a tuple or a list. In this example, since BarCronApp depends on FooCronApp, its class would look something like this:

    from socorro.cron.crontabber import BaseCronApp

    class BarCronApp(BaseCronApp):
        app_name = 'BarCronApp'
        app_description = 'Does some bar things'
        depends_on = ('FooCronApp',)

        def run(self):
            ...

9.3 Own configurations

Each cron app can have its own configuration(s). Every option must have a default good enough that crontabber can still run all due jobs without extra configuration. To make overridable configuration options, add the required_config class attribute. Here's an example:

    from configman import Namespace
    from socorro.cron.crontabber import BaseCronApp

    class FooCronApp(BaseCronApp):
        app_name = 'foo'

        required_config = Namespace()
        required_config.add_option(
            'bugzilla_url',
            default='https://bugs.mozilla.org',
            doc='Base URL for bugzilla'
        )

        def run(self):
            ...
            print self.config.bugzilla_url
            ...

Note: Inside the run() method in that example, the self.config object is a special one. It's basically a reference to the configuration specifically for this class, but it has access to all configuration objects defined in the "root". I.e. you can access things like self.config.logger here too, but other cron apps won't have access to self.config.bugzilla_url since that's unique to this app.

To override cron app specific options on the command line you need to use a special syntax to associate them with this cron app class. Usually, the best hint of how to do this is to use python crontabber.py --help. In this example it would be:

    python crontabber.py --job=foo --class-FooCronApp.bugzilla_url=...


9.4 App names versus/or class names

Every cron app in crontabber must have a class attribute called app_name. This value must be unique. If you like, it can be the same as the name of the class it's in. When you list jobs you list the full path to the class, but it's the app_name within the found class that gets remembered. If you change the app_name, all previously known information about the app's runs is lost. If you change the name and path of the class, the only other thing you need to change is the configuration that refers to it. Best practice recommendations are these (illustrated in the sketch below):

• Name the class like a typical Python class, i.e. capitalize and optionally camel case the rest, for example: UpdateADUCronApp
• Optional, but good practice, is to keep the suffix CronApp on the class name.
• Make the app_name value lower case and replace spaces with -.
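As a hypothetical illustration of these recommendations (the class is a placeholder, not a real Socorro job):

    from socorro.cron.crontabber import BaseCronApp

    # CamelCase class name keeping the CronApp suffix;
    # lower-case, hyphenated app_name.
    class UpdateADUCronApp(BaseCronApp):
        app_name = 'update-adu'
        app_description = 'Recalculates the ADU figures'

        def run(self):
            pass  # the actual work would go here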

9.5 Manual intervention

First of all, to add a new job all you need to do is add it to the config file that crontabber reads. Thanks to being a configman application, it automatically picks up configuration from files called crontabber.ini, crontabber.conf or crontabber.json. To create a new config file, use admin.dump_conf like this:

    python socorro/cron/crontabber.py --admin.dump_conf ini

All errors that happen are reported to the standard Python logging module. Also, the latest error (type, value and traceback) is stored in the JSON database too. If any of your cron apps have an error you can see it with:

    python socorro/cron/crontabber.py --list-jobs

Here's a sample output:

    === JOB ====================================================================
    Class: socorro.cron.jobs.foo.FooCronApp
    App name: foo
    Frequency: 12h
    Last run: 2012-04-05 14:49:56 (1 minute ago)
    Next run: 2012-04-06 02:49:56 (in 11 hours, 58 minutes)

    === JOB ====================================================================
    Class: socorro.cron.jobs.bar.BarCronApp
    App name: bar
    Frequency: 1d
    Last run: 2012-04-05 14:49:56 (1 minute ago)
    Next run: 2012-04-06 14:49:56 (in 23 hours, 58 minutes)
    Error!! (1 times)
      File "socorro/cron/crontabber.py", line 316, in run_one
        self._run_job(job_class)
      File "socorro/cron/crontabber.py", line 369, in _run_job
        instance.main()
      File "/Use[snip]orro/socorro/cron/crontabber.py", line 47, in main
        self.run()
      File "/Use[snip]orro/socorro/cron/jobs/bar.py", line 10, in run
        raise NameError('doesnotexist')


It will only keep the latest error, but it will include an error count that tells you how many times it has tried and failed. The error count increments every time any error happens and is reset once no error happens. So, only the latest error is kept, and to find out about past errors you have to inspect the log files.

NOTE: If a cron app that is configured to run every 2 days runs into an error, it will try to run again in 2 days. So, suppose you inspect the error and write a fix. If you're impatient and don't want to wait until it's time to run again, you can start it again like this:

    python socorro/cron/crontabber.py --job=my-app-name
    # or if you prefer
    python socorro/cron/crontabber.py --job=path.to.MyCronAppClass

This will attempt the job again, and whether it works or errors it will pick up the frequency from the configuration and update the time it will run next.

9.6 Frequency and execution time

The format for configuring jobs looks like this:

socorro.cron.jobs.bar.BarCronApp|30m

or like this:

socorro.cron.jobs.pgjob.PGCronApp|2d|03:00

Hopefully the format is self-explanatory. The first number is required and must be a number followed by "y", "d", "h" or "m" (years, days, hours, minutes). For jobs that have a frequency longer than 24 hours, you can specify exactly when they should run. This has to be in the 24-hour format of HH:MM. If you're ever uncertain whether your recent changes to the configuration file are correct, instead of waiting around you can check with:

    python socorro/cron/crontabber.py --configtest

which will do nothing if all is OK.
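As an illustration of the frequency format, here is a minimal sketch (the helper is made up for illustration, not crontabber's actual parser; the year unit assumes 365 days):

    import re

    UNITS = {'m': 60, 'h': 60 * 60, 'd': 24 * 60 * 60, 'y': 365 * 24 * 60 * 60}

    def frequency_to_seconds(freq):
        # e.g. '30m' -> 1800, '12h' -> 43200, '2d' -> 172800
        match = re.match(r'^(\d+)([ymdh])$', freq)
        if not match:
            raise ValueError('invalid frequency: %r' % freq)
        number, unit = match.groups()
        return int(number) * UNITS[unit]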

9.7 Timezone and UTC

There is no timezone handling in any of the dates and times in crontabber. Everything is assumed to be local time, i.e. whatever the server it's running on is using. The reason for this is the ability to specify exactly when something should be run: if you want something to run at exactly 3AM every day, that's 3AM relative to where the server is located.

9.8 Writing cron apps (aka. jobs)

Because of the configurable nature of crontabber, the actual cron apps can be located anywhere. For example, if an app is related to HBase it could live in socorro/external/hbase/mycronapp.py. However, for the most part it's probably a good idea to write them in socorro/cron/jobs/, one class per file, to keep things clear. There are already some "sample apps" in there that do nothing except serve as good examples. With time, we can hopefully delete these as other, real apps can work as examples and inspiration.


The most common apps will execute certain specific pieces of SQL against the PostgreSQL database. For those, the socorro/cron/jobs/pgjob.py example is good to look at. At the time of writing it looks like this:

    from socorro.cron.crontabber import PostgreSQLCronApp

    class PGCronApp(PostgreSQLCronApp):
        app_name = 'pg-job'
        app_description = 'Does some foo things'

        def run(self, connection):
            cursor = connection.cursor()
            cursor.execute('select relname from pg_class')

Let's pick that apart a bit. The most important difference is the different base class. Unlike the BaseCronApp class, this one executes the run() method with a connection instance as the one and only parameter. That connection will automatically take care of transactions! That means that you don't have to call connection.commit(), and if you want the transaction to roll back, all you have to do is raise an error. For example:

    def run(self, connection):
        cursor = connection.cursor()
        today = datetime.datetime.today()
        cursor.execute('INSERT INTO jobs (room) VALUES (bathroom)')
        if today.strftime('%A') in ('Saturday', 'Sunday'):
            raise ValueError("Today is not a good day!")
        else:
            cursor.execute('INSERT INTO jobs (tool) VALUES (brush)')

Silly, but hopefully it's clear enough. Raising an error inside a cron app will not stop the other jobs from running, other than those that depend on it.


CHAPTER 10

Throttling

The Collector has the ability to vet crashes as they come into the system. Originally, this system was used to provide a statistical sampling from the incoming stream of crashes. In 1.8, throttling is a way to allow a sampling of crashes to be put into the database. Throttling, the disposition of a JSON/dump pair, is controlled by the contents of the JSON file. The JSON files are collections of keys and values. Collector can examine these key/value pairs and assign a pass-through probability. For example, we may want to pass 100% of all alpha or beta releases to the database, while in production we may want to save only 10%. For details on how to configure throttling, see the configuration section of Collector. Below is a section about the collector throttling rules.

10.1 throttleConditions

This option tells the collector how to route a given JSON/dump pair to storage for further processing or deferred storage. It consists of a list of conditions of this form: (JsonFileKey, ConditionFunction, Probability)

• JsonFileKey: the name of a field from the HTTP POST form. The possibilities are: "StartupTime", "Vendor", "InstallTime", "timestamp", "Add-ons", "BuildID", "SecondsSinceLastCrash", "UserID", "ProductName", "URL", "Theme", "Version", "CrashTime"
• ConditionFunction: a function returning a boolean, a regular expression, or a constant; it is used to test the value for the JsonFileKey.
• Probability: an integer between 0 and 100 inclusive. At 100, all JSON files for which the ConditionFunction returns true will be saved in the database. At 0, no JSON files for which the ConditionFunction returns true will be saved to the database. At 25, there is a twenty-five percent probability that a matching JSON file will be written to the database.

There must be at least one entry in the throttleConditions list. The example below shows the default case. These conditions are applied one at a time to each submitted crash. The first match of a condition function to a value stops the iteration through the list, and the probability of that first matched condition will be applied to that crash. Keep the list short to avoid bogging down the collector:

    throttleConditions = cm.Option()
    throttleConditions.default = [
        #("Version", lambda x: x[-3:] == "pre", 25),  # queue 25% of crashes with version ending in "pre"
        #("Add-ons", re.compile('inspector\@mozilla\.org\:1\..*'), 75),  # queue 75% of crashes where the inspector addon is at 1.x
        #("UserID", "d6d2b6b0-c9e0-4646-8627-0b1bdd4a92bb", 100),  # queue all of this user's crashes
        #("SecondsSinceLastCrash", lambda x: 300 >= int(x) >= 0, 100),  # queue all crashes that happened within 5 minutes of another crash


        (None, True, 10)  # queue 10% of what's left
    ]
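To make the iteration semantics concrete, here is a minimal sketch of how such a condition list could be applied to a crash's JSON data; this is illustrative only, not Collector's actual implementation:

    import random

    def throttle(json_data, conditions):
        # returns True if the crash should be queued for the database
        for key, condition, probability in conditions:
            if key is None:
                matched = bool(condition)
            else:
                value = json_data.get(key)
                if value is None:
                    continue                        # field absent: cannot match
                if callable(condition):
                    matched = condition(value)
                elif hasattr(condition, 'match'):   # compiled regular expression
                    matched = condition.match(value) is not None
                else:                               # constant comparison
                    matched = (value == condition)
            if matched:
                # the first match decides: accept with the given probability
                return random.randint(1, 100) <= probability
        return False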

CHAPTER 11

Deployment

11.1 Introduction

Below are general deployment instructions for installations of Socorro.

11.2 Outage Page

If the system is to be taken down for maintenance, these steps will show users an outage page during the maintenance period:

• Back up webapp-php/index.php
• Copy webapp-php/docs/outage.php over webapp-php/index.php; all traffic will be served this outage message
• Do the maintenance work
• Copy the backup over webapp-php/index.php

add other task instructions here


CHAPTER 12

Development Discussions

12.1 Coding Conventions

12.1.1 Introduction

The following coding conventions are designed to ensure that the Socorro code is easy to read, hack, test, and deploy.

12.1.2 Style Guide

• Python should follow PEP 8 with 4 space indents
• PHP code follows the PEAR coding standard
• JavaScript is indented by four spaces
• Unit Testing is strongly encouraged

12.1.3 Review

New checkins that are non-trivial should be reviewed by one of the core hackers. The commit message should indicate the reviewer and the issue number if applicable.

12.1.4 Testing

Any features that are only available to admins should be tested to ensure that non-admin users do not have access. Before checking in changes to the Socorro Python code, be sure to run the unit tests.

12.2 New Developer Guide

If you are new to Socorro, you will find here good resources to start hacking:


12.2.1 General architecture of Socorro

If you clone our git repository, you will find the following folders. Here is what each of them contains:

Folder        Description
analysis/     Contains metrics jobs such as mapreduce. Will be moved.
config/       Contains the Apache configuration for the different parts of the Socorro application.
docs/         Documentation of the Socorro project (the one you are reading right now).
scripts/      Scripts for launching the different parts of the Socorro application.
socorro/      Core code of the Socorro project.
sql/          SQL scripts related to our PostgreSQL database. Contains schemas and update queries.
thirdparty/   External libraries used by Socorro.
tools/        External tools used by Socorro.
webapp-php/   Front-end PHP application (also called UI). See Socorro UI.

Socorro submodules

The core code module of Socorro, called socorro, contains a lot of code. Here are descriptions of every submodule in there:

Module            Description
collector         All code related to collectors.
cron              All cron jobs running around Socorro.
database          PostgreSQL related code.
deferredcleanup   Obsolete.
external          Code related to external resources like databases.
integrationtest   Obsolete.
lib               Different libraries used all over Socorro's code.
middleware        New-style middleware services place.
monitor           All code related to monitors.
othertests        Some other tests.
services          Old-style middleware services place.
storage           HBase related code.
unittest          All our unit tests are here.
webapi            Contains a few tools used by web-based services.

12.2.2 Setup a development environment

The best and easiest way to get started with a complete dev environment is to use Vagrant and our installation script.

Standalone dev environment in your existing environment

If you don't want to do things the easy way, or can't use a virtual machine, you can install everything in your own development environment. All steps are described in Standalone Development Environment.

1. Install VirtualBox from: http://www.virtualbox.org/

2. Install Vagrant from: http://vagrantup.com/

3. Download the base box:

    # NOTE: if you have a 32-bit host, change "lucid64" to "lucid32"
    vagrant box add socorro-all http://files.vagrantup.com/lucid64.box


4. Copy base box, boot VM and provision it with puppet: vagrant up

5. Add to /etc/hosts (on the HOST machine!): 33.33.33.10 crash-stats crash-reports socorro-api

Enjoy your Socorro environment!

• browse the UI: http://crash-stats
• submit crashes: http://crash-reports/submit (accepts HTTP POST only; see System Test for information on submitting test crashes)
• query data via the middleware API: http://socorro-api/bpapi/adu/byday/p/WaterWolf/v/1.0/rt/any/osx/start/YYYY-MM-DD/end/YYYY-MM-DD (where WaterWolf is a valid product name and YYYY-MM-DD are valid start/end dates); see the sketch below
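As a minimal sketch of consuming that middleware endpoint from Python (the hostname assumes the /etc/hosts entries above; the product name and dates are placeholders):

    import json
    import urllib2

    url = ('http://socorro-api/bpapi/adu/byday/p/WaterWolf/v/1.0'
           '/rt/any/osx/start/2012-01-01/end/2012-01-07')
    response = urllib2.urlopen(url)
    # the middleware replies with JSON, so decode it into a Python dict
    data = json.loads(response.read())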

Apply your changes

Edit files in your git checkout on the host as usual. To actually make changes take effect, you can run: vagrant provision

This reruns puppet inside the VM to deploy the source to /data/socorro and restarts any necessary services.

How Socorro works

See How Socorro Works and Crash Flow.

Setting up a new database

Note that the existing puppet manifests populate PostgreSQL if the “breakpad” database does not exist. See Populate PostgreSQL for more information on how this process works, and how to customize it.

Enabling HBase

Socorro supports HBase as a long-term storage archive for both raw and processed crashes. Since it requires Sun (now Oracle) Java and does not work with OpenJDK, and generally has much higher memory requirements than all the other dependencies, it is not enabled by default. If you wish to enable it, edit the nodes.pp file: vi puppet/manifests/nodes/nodes.pp

And remove the comment (‘#’) marker from the socorro-hbase include: # include socorro-hbase

Re-provision vagrant, and HBase will be installed, started and the default Socorro schema will be loaded: vagrant provision

NOTE - this will download and install Java from Oracle, which means that you will be bound by the terms of their license agreement - http://www.oracle.com/technetwork/java/javase/terms/license/


Debugging

You can SSH into your VM by running: vagrant ssh

By default, your socorro git checkout will be shared into the VM via NFS at /home/socorro/dev/socorro. Running "make install" as the socorro user in /home/socorro/dev/socorro will cause Socorro to be installed to /data/socorro/. You will need to restart the apache2 or supervisord services if you modify middleware or backend code, respectively (note that "vagrant provision" as described above does all of this for you). Logs for the (PHP Kohana) webapp are at: /data/socorro/htdocs/application/logs/

All other Socorro apps log to syslog, using the user.* facility: /var/log/user.log

Apache may log important errors too, such as WSGI apps not starting up or problems with the Apache or PHP configs: /var/log/apache/error.log

Supervisord captures the stderr/stdout of the backend jobs, these are normally the same as syslog but may log important errors if the daemons cannot be started. You can also find stdout/stderr from cron jobs in this location: /var/log/socorro/

Loading data from an existing Socorro install

Given a PostgreSQL dump named "minidb.dump", run the following:

    vagrant ssh

    # shut down database users
    sudo /etc/init.d/supervisor force-stop
    sudo /etc/init.d/apache2 stop

    # drop old db and load snapshot
    sudo su - postgres
    dropdb breakpad
    createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
    pg_restore -Fc -d breakpad minidb.dump

This may take several hours, depending on your hardware. One way to speed this up is to add more CPU cores to the VM (via the VirtualBox GUI); the default is 1. Then add "-j n" to the pg_restore command above, where n is the number of CPU cores minus 1.

Pulling crash reports from an existing production install

The Socorro PostgreSQL database only contains a small subset of the information about individual crashes (enough to run aggregate reports). For instance, the full stack is only available in long-term storage (such as HBase). If you have imported a database from a production instance, you may want to configure the web UI to pull individual crash reports from production via the web service (so URLs such as http://crash-stats/report/index/YOUR_CRASH_ID_GOES_HERE will work).


The /report/index page actually pulls its data from a URL such as http://crash-stats/dumps/YOUR_CRASH_ID_GOES_HERE.jsonz. You can cause your dev instance to fall back to your production instance by modifying:

    webapp-php/application/config/application.php

Change the URL in this config value to point to your desired production instance.

Note that the crash ID must be in both your local database and the remote (production) HBase instance for this to work. See https://github.com/mozilla/socorro/blob/master/webapp-php/application/config/application.php-dist

(OPTIONAL) Populating Elastic Search

See Populate ElasticSearch.

12.2.3 Add a service to the Middleware

Architecture overview

The middleware is a simple REST API providing JSON data depending on the URL that is called. It is made of a list of services, each one binding a certain URL with parameters. Documentation for each service is available in the Middleware API page. Those services do not contain any code; they are only interfaces. They use other resources from the external module. That external module is composed of one submodule for each external resource we are using. For example, there is a PostgreSQL submodule, an ElasticSearch submodule and an HBase submodule. You will also find some common code among external resources in socorro.lib.

Class hierarchy


REST services in Socorro are divided into two separate modules. socorro.middleware is the module that contains the actual services, the classes that receive HTTP requests and return the right data. However, services do not do any kind of computation; they only find the right implementation class and call it. Implementations of services are found in socorro.external. They are separated into submodules, one for each external resource that we use. For example, in socorro.external.postgresql you will find everything that is related to data stored in PostgreSQL: SQL queries mainly, but also argument sanitizing and data formatting. The way it works overall is simple: the service in socorro.middleware defines a URL and parses the arguments when the service is called. That service then finds the right implementation class in socorro.external and calls it with the parameters. The implementation class does what it has to do (SQL query, computation...) and returns a Python dictionary. The service then automatically transforms that dictionary into a JSON string and sends it back via HTTP.

Create the service

First create a new file for your service in socorro/middleware/ and call it nameofservice_service.py. This is a convention for the next version of our config manager. Then create a class inside as follows:

    import logging

    from socorro.middleware.service import DataAPIService

    logger = logging.getLogger("webapi")

    class MyService(DataAPIService):

        service_name = "my_service"  # name of the submodule to look for in external
        uri = "/my/service/(.*)"     # URL of the service

        def __init__(self, config):
            super(MyService, self).__init__(config)
            logger.debug('MyService service __init__')

        def get(self, *args):
            # Parse parameters of the URL
            params = self.parse_query_string(args[0])

            # Find the implementation module in external depending on the configuration
            module = self.get_module(params)

            # Instantiate the implementation class
            impl = module.MyService(config=self.context)

            # Call and return the result of the implementation method
            return impl.mymethod(**params)

uri is the URL pattern you want to match. It is a regular expression, and the content of each group ((.*)) will be in args. service_name will be used to find the corresponding implementation resource; it has to match the filename of the module you need. If you want to add mandatory parameters, modify the URI, and the values will be passed in args.


Use external resources

The socorro.external module contains everything related to outer resources like databases. Each submodule has a base class and classes for specific functionalities. If the function you need for your service is not already in there, create a new file and a new class to implement it. To do so, follow this pattern:

    from socorro.external.myresource.base import MyResourceBase

    class MyModule(MyResourceBase):

        def __init__(self, *args, **kwargs):
            super(MyModule, self).__init__(*args, **kwargs)

        def my_method(self, **kwargs):
            do_stuff()
            return my_json_result

One of the things that you will want to do is filter arguments and give them default values. There is a function to do that in socorro.lib.external_common called parse_arguments. The documentation of that function says:

    Return a dict of parameters.

    Take a list of filters and for each try to get the corresponding value
    in arguments or a default value. Then check that value's type.

Example:

    filters = [
        ("param1", "default", ["list", "str"]),
        ("param2", None, "int"),
        ("param3", ["list", "of", 4, "values"], ["list", "str"])
    ]
    arguments = {
        "param1": "value1",
        "unknown": 12345
    }
    =>
    {
        "param1": ["value1"],
        "param2": 0,
        "param3": ["list", "of", "4", "values"]
    }

Here is an example of how to use this:

    class Products(PostgreSQLBase):

        def versions_info(self, **kwargs):
            # Parse arguments
            filters = [
                ("product", "Firefox", "str"),
                ("versions", None, ["list", "str"])
            ]
            params = external_common.parse_arguments(filters, kwargs)

            params.product   # "Firefox" by default or a string
            params.versions  # [] by default or a list of strings


Configuration

Finally, add your service to the list of running services in scripts/config/webapiconfig.py.dist as follows:

    import socorro.middleware.search_service as search
    import socorro.middleware.myservice_service as myservice  # add

    servicesList = cm.Option()
    servicesList.doc = 'a python list of classes to offer as services'
    servicesList.default = [myservice.MyService, search.Search, (...)]  # add

You can also add a config key for the implementation of your service. If you don't, your service will use the default config key (serviceImplementationModule). To add a specific configuration key:

    # MyService service config
    myserviceImplementationModule = cm.Option()
    myserviceImplementationModule.doc = "String, name of the module myservice uses."
    myserviceImplementationModule.default = 'socorro.external.elasticsearch'  # for example

Then restart Apache and you should be good to go! If you’re using a Vagrant VM, you can hit the middleware directly by calling http://socorro-api/bpapi/myservice/params/.

And then?

Once you are done creating your service in the middleware, you might want to use it in the WebApp. If so, have a look at Socorro UI. You might also want to document it. We are keeping track of all existing services’ documentation in our Middleware API page. Please add yours!

Writing a PostgreSQL middleware unit test

First create your new test file in the appropriate location as specified above, for example socorro/unittest/external/postgresql/test_myservice.py. Next you want to import the following:

    from socorro.external.postgresql.myservice import MyService
    import socorro.unittest.testlib.util as testutil

As this is a PostgreSQL service unit test we also add: from .unittestbase import PostgreSQLTestCase

The next item to add is your setup_module function; below is a barebones version that would be sufficient for most tests:

    #--------------------------------------------------------------------------
    def setup_module():
        testutil.nosePrintModule(__file__)

Next is the setUp function, in which you create and populate your dummy table(s):

    #==========================================================================
    class TestMyService(PostgreSQLTestCase):

        #----------------------------------------------------------------------
        def setUp(self):


            super(TestMyService, self).setUp()

            cursor = self.connection.cursor()

            # Create table
            cursor.execute("""
                CREATE TABLE product_info (
                    product_version_id integer not null,
                    product_name citext,
                    version_string citext
                );
            """)

            # Insert data
            cursor.execute("""
                INSERT INTO product_info VALUES (
                    1, '%s', '%s'
                );
            """ % ("Firefox", "8.0"))

            self.connection.commit()

For your test table(s) you can include as many, or as few, columns and rows of data as your tests will require. Next we add the tearDown function, which will clean up after our tests have run by dropping the tables we created in setUp:

    #----------------------------------------------------------------------
    def tearDown(self):
        """Clean up the database: delete tables and functions."""
        cursor = self.connection.cursor()
        cursor.execute("""
            DROP TABLE product_info;
        """)
        self.connection.commit()
        super(TestMyService, self).tearDown()

Next, we write our actual tests against the dummy data we created in setUp. The first step is to create an instance of the class we are going to test:

    #----------------------------------------------------------------------
    def test_get(self):
        products = Products(config=self.config)

Next we write our first test, passing the parameters our function expects:

    #......................................................................
    # Test 1: find one exact match for one product and one version
    params = {
        "versions": "Firefox:8.0"
    }

Next we call our function, passing the above parameters:

    res = products.get_versions(**params)


The above will return a response that we need to check to determine whether it contains what we expect. In order to do this we create our expected response:

    res_expected = {
        "hits": [
            {
                "product_version_id": 1,
                "product_name": "Firefox",
                "version_string": "8.0"
            }
        ],
        "total": 1
    }

And finally we call assertEqual to test whether our response matches our expected response:

    self.assertEqual(res, res_expected)

Running a PostgreSQL middleware unit test

If you have not already done so, install nose. From the command line run:

    sudo apt-get install python-nose

Once the installation completes, change directory to socorro/unittest/config/ and run the following:

    cp commonconfig.py.dist commonconfig.py

Now you can open up the file and edit its contents to match your testing environment. If you are running this in a VM via Socorro Vagrant, you can leave the contents of the file as is. Next, cd into socorro/unittest. To run all of the unit tests, run the following:

    nosetests

When writing a new test you are most likely more interested in running just your own test, instead of all of the unit tests that form part of Socorro. If your test is located in, for example, unittest/external/postgresql/test_myservice.py, then you can run it as follows:

    nosetests socorro.external.postgresql.test_myservice

Ensuring good style

To ensure that the Python code you wrote passes PEP8 you need to run check.py. To do this your first step is to install it. From the terminal run: pip install -e git://github.com/jbalogh/check.git#egg=check

(You may need to sudo the command above.) Once installed, run the following:

    check.py /path/to/your/file

12.2.4 How to Review a Pull Request

Part of our job as developers is to review and provide feedback on what our colleagues do. The goal of this process is to:


• test that a new feature works as expected
• make sure the code is clean
• make sure the code doesn't break anything

Here are several steps you can follow when reviewing a pull request. Depending on the size of that pull request, you might want to skip some phases.

Read the code

The first task when reviewing is to read the code and verify that it is coherent and clean. Try to understand the algorithm and its goal, make sure that it is what was asked in the related bug. When there is something that you find non-trivial and that is not documented, ask for a doc-string or an inline comment so it becomes easier for others to understand the code.

Pull the code into your local environment

To go on testing, you will need to have the code in your local environment. Let's say you want to test the branch my-dev-branch of rhelmer's git repository. Here is one method to get the content of that remote branch into your repo:

    git remote add rhelmer https://github.com/rhelmer/socorro.git  # the first time only
    git fetch rhelmer my-dev-branch:my-dev-branch
    git checkout my-dev-branch

Once you are in that branch, you can actually test the code or run tools on it.

Use a code quality tool

Running a code quality tool is a good and easy way to find coding and styling problems. For Python, we use check.py (check by jbalogh on github). This tool will run pyflakes on a file or a folder, and will then check that PEP 8 is respected. To install check.py, run the following command: pip install -e git://github.com/jbalogh/check.git#egg=check

For JavaScript, we suggest that you use JSHint. There are also a lot of tools for PHP; you can choose one you like. For HTML and CSS files, please use the tools from the W3C: the CSS Validator and HTML Validator.

Run the unit tests

Socorro has a growing number of unit tests that are very helpful at verifying nothing breaks. Before approving and merging a pull request, you should run all unit tests to make sure they still pass. Note that those unit tests will be run when the pull request is merged, but it is easier to fix something before it lands on master than after. To run the unit tests in a Vagrant VM, do the following: make test

This installs all the dependencies needed and runs all the tests. You need to have a running PostgreSQL instance for this to work, with a specific config file for the tests in socorro/unittest/config/commonconfig.py. For further documentation on unit tests, please read Unit Testing.


Test manually

This is not always possible in a local environment, but when it is, you should make sure the new code behaves as expected. See "Apply your changes" earlier in this guide.

Test before

This is a process to verify that one's work is good and can go into master with little risk of breaking something. However, the developer is responsible for his or her bug, and the review process doesn't mean he or she shouldn't go through all these steps. The reviewer is there to make sure the developer didn't miss something, but it's easier to fix something before the review process than after. Please test your code before opening a pull request!

12.3 Glossary

Build: a date encoding used to identify when a client was compiled. (submission metadata)

Crash Report Details Page: a crash stats page displaying all known details of a crash.

Crash Dump/Metadata pair: shorthand for the pair of a Raw Crash Dump and its corresponding Raw Crash Metadata.

Deferred Job Storage: a file system location where Crash Dump/Metadata pairs are kept without being processed.

Dump File: see Raw Crash Dump; don't use this term, it makes me giggle.

Job: a job queue item for a Raw Crash Dump that needs to be processed.

JSON Dump Storage: the Python module that implements File System.

Materialized view: the tables in the database containing the data used in statistical analysis, including: MeanTimeBeforeFailure, Top Crashers By Signature, Top Crashers By URL. The "Trend Reports" in the Socorro UI display information from these tables.

Minidump: see Raw Crash Dump.

Minidump_stackwalk: an application from the Breakpad project that takes a raw dump file, marries it with symbols and produces output usable by developers. This application is invoked by Processor.

Monitor: the Socorro application in charge of queuing jobs. See Monitor.

OOID: a crash report ID. Originally a 32-bit value, the original legacy system stored it in the database in a hexadecimal text form. Each crash is assigned an OOID by the Collector when the crash is received.

Platform: the OS that a client runs on. This term has historically been a point of confusion; it is preferred that the term OS or Client OS be used instead.

Processed Dump Storage: the disk location where the output files of the minidump_stackwalk program are stored. The actual files are stored with a .jsonz extension.

Processor: the Socorro application in charge of applying minidump_stackwalk to queued jobs. See Processor.

Raw Crash Dump, Raw Dump: the data sent from a client to Socorro containing the state of the application at the time of failure. It is paired with a Raw Crash Metadata file.

Raw Crash Metadata: the metadata sent from a client to Socorro to describe the Raw Crash. It is saved in JSON format, not to be confused with a Cooked Crash Dump.

Raw JSON file: see Raw Crash Metadata... a file in the JSON format containing metadata about a 'dump file'. Saved with a '.json' suffix.

Release: a categorization of an application's product name and version. The categories are: "major", "milestone", or "development". Within the database, an enum called ReleaseEnum represents these categories.


Reporter: another name for the Socorro UI.

Skip List: lists of signature regular expressions used in generating a crash's overall signature in the Processor. See Signature Generation.

Standard Job Storage: a file system location where JSON/dump pairs are kept for processing.

Throttling: statistically, we don't have to save every single crash. This option of the Collector configuration allows us to selectively throw away dumps.

Trend Reports: the pages in the Socorro UI that display the data from the materialized views.

UUID: a universally unique identifier. This term is being deprecated in favor of OOID.

Web head: a machine that runs Collector.

12.3.1 Deferred Job Storage

Deferred storage is where the JSON/dump pairs are saved if they’ve been filtered out by Collector throttling. The location of the deferred job storage is determined by the configuration parameter deferredStorageRoot found in the Common Config. JSON/dump pairs that are saved in deferred storage are not likely to ever be processed further. They are held for a configurable number of days until deleted by Deferred Cleanup. Occasionally, a developer will request a report via Reporter on a job that was saved in deferred storage. Monitor will look for the job in deferred storage if it cannot find it in standard storage. For more information on the storage technique, see File System

12.3.2 JSON Dump Storage

What this system offers

Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid), or visited by the date and time when it was reported.

Directory Structure

The crash files are located in a tree with two branches: the name or "index" branch and the date branch.

• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. For the uuid 22adfb61-f75b-11dc-b6be-001321b0783d, the "depth" is the number of sub-directories between the name directory and the actual file. By default, to conserve inodes, depth is two.

  – By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
  – The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
  – The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
  – The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/

• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01:


  – The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/

• Note (name layout): In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.

• If the uuids are such that their initial few characters are well spread among all possibles, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.

• Note (symbolic links): The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted nfs volumes.

• Note (maxDirectoryEntries): If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1 etc. For the moment, maxDirectoryEntries is ignored for the name branch.

How it’s used

We use the file system storage for incoming dumps caught by Collector. There are two instances of the file system used for different purposes: standard storage and deferred storage.

Standard Job Storage

This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk. As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch. In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.

Deferred Job Storage

This is where jobs go that are deferred by Monitor's throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process. When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold, using the links to the name branch to blow away the elderly json/dump pairs.

class JsonDumpStorage

socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.

public methods

• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)

  Take note of our root directory, maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file. Recognized keyword args:

  – dateName. Default = 'date'
  – indexName. Default = 'name'
  – jsonSuffix. Default = '.json'. If not startswith('.') then '.' is prepended
  – dumpSuffix. Default = '.dump'. If not startswith('.') then '.' is prepended
  – dumpPermissions. Default 660
  – dirPermissions. Default 770
  – dumpGID. Default None. If None, then owned by the owner of the running script.

• newEntry(self, uuid, webheadHostName='webhead01', timestamp=DT.datetime.now())

  Sets up the name and date storage for the given uuid.

  – Creates any directories that it needs along the path to the appropriate storage location (possibly adjusting ownership and mode)
  – Creates two relative symbolic links:
    * the date branch link pointing to the name directory holding the files;
    * the name branch link pointing to the date branch directory holding that link.
  – Returns a 2-tuple containing files open for writing: (jsonfile, dumpfile)

• getJson(self, uuid)

  Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing.

• getDump(self, uuid)

  Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing.

• markAsSeen(self, uuid)

  Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietly returns if the uuid has no associated links.

• destructiveDateWalk(self)

  This function is a generator that yields all (see note) uuids found by walking the date branch of the file system. Just before yielding a value, it deletes both the links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left, and we do not yield that uuid.

  note: To avoid race conditions, does not visit the date subdirectory corresponding to the current time.

• remove(self, uuid)

  Removes all instances of the uuid from the file system including the json file, the dump file, and the two links if they still exist.


  – Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not.

• move(self, uuid, newAbsolutePath)

  Moves the json file then the dump file to newAbsolutePath.

  – Removes associated symbolic links if they still exist.
  – Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.

• removeOlderThan(self, timestamp)

  – Walks the date branch removing all entries strictly older than the timestamp.
  – Removes the corresponding entries in the name branch.

member data

Most of the member data are set in the constructor, a few are constants, and the rest are simple calculations based on the others.

• root: The directory that holds both the date and index (name) subdirectories
• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024
• dateName: The name of the date branch subdirectory. Default = 'date'
• indexName: The name of the index branch subdirectory. Default = 'name'
• jsonSuffix: the suffix of the json crash file. Default = '.json'
• dumpSuffix: the suffix of the dump crash file. Default = '.dump'
• dateBranch: The full path to the date branch
• nameBranch: The full path to the index branch
• dumpPermissions: The permissions for the crash files. Default = 660
• dirPermissions: The permissions for the directories holding crash files. Default = 770
• dumpGID: The group ID for the directories and crash files. Default: owned by the owner of the running script.
• toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch
• toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch
• minutesPerSlot: How many minutes in each sub-hour slot. Default = 5
• slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
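A brief usage sketch of the API described above (the import path is assumed from the module name; the root path and uuid are placeholders):

    import datetime

    from socorro.lib.JsonDumpStorage import JsonDumpStorage  # assumed import path

    storage = JsonDumpStorage(root='/tmp/crashes')

    # create the name and date entries, then write the crash data
    json_file, dump_file = storage.newEntry(
        '22adfb61-f75b-11dc-b6be-001321b0783d',
        webheadHostName='webhead01',
        timestamp=datetime.datetime.now())
    json_file.write('{"ProductName": "Firefox"}')
    json_file.close()
    dump_file.close()

    # a monitor-style sweep: yields each new uuid and removes its date links
    for uuid in storage.destructiveDateWalk():
        print uuid, storage.getJson(uuid)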

12.3.3 Processed Dump Storage

Processed dumps are stored in two places: the relational database and flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database 'dumps' tables was causing performance problems within PostgreSQL. The 'dumps' tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the 'dumps' table's data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed. We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables


In the file system, after processing, dumps are stored in a gzip-compressed JSON file format. This format echoes a flattening of the 'reports', 'extensions' and the now deprecated 'dumps' tables within the database.

Directory Structure

Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’

Access by Name

Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz
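A minimal sketch of that lookup (the helper is hypothetical, not Socorro's actual API):

    import os

    def jsonz_path(root, crash_id):
        # the first two character pairs become two directory levels
        return os.path.join(root, crash_id[0:2], crash_id[2:4],
                            crash_id + '.jsonz')

    # jsonz_path('/data/dumps', 'aabbf9cb-395b-47e8-9600-4f20e2090331')
    #   => '/data/dumps/aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz'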

Access by Date

For the purposes of finding crashes that happened at specific date and time, a hierarchy of date directories offer quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.

JSON File Format example:

{"signature": "nsThread::ProcessNextEvent(int, int*)", "uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331", "date_processed": "2009-03-31 14:45:09.215601", "install_age": 100113, "uptime": 7, "last_crash": 95113, "product": "SomeProduct", "version": "3.5.2", "build_id": "20090223121634", "branch": "1.9.1", "os_name": "Mac OS X", "os_version": "10.5.6 9G55", "cpu_name": "x86", "cpu_info": "GenuineIntel family 6 model 15 stepping 6", "crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS", "crash_address": "0xe9b246", "User Comments": "This thing crashed.\nHelp me Kirk.", "app_notes": "", "success": true, "truncated": false, "processor_notes": "", "distributor":"", "distributor_version": "", "add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"], ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"], ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]], "dump":"OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\n Module|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n..... "}

The "dump" component is the direct streamed output from the Breakpad "minidump_stackwalk" program. Unfortunately, that project does not give detailed documentation of the format.
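Since the format is just gzip-compressed JSON, a processed crash can be read back with the Python standard library; a minimal sketch, assuming a local .jsonz file:

    import gzip
    import json

    # open the compressed file and decode the JSON document inside it
    f = gzip.open('aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz')
    processed_crash = json.load(f)
    f.close()

    # processed_crash['signature'] == 'nsThread::ProcessNextEvent(int, int*)'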


12.3.4 Standard Job Storage

Standard storage is where the JSON/dump pairs are saved while they wait for processing. The location of the standard storage is determined by the configuration parameter storageRoot found in the Common Config. The file system is divided into two parts: date-based storage and name-based storage. Both branches use a radix sort breakdown to locate files. The original version of Socorro used only the date-based storage, but it was found to be too slow to search when under a heavy load. For a deeper discussion of the storage technique, see File System

12.3.5 Top Crashers By URL

Introduction

The Top Crashers By URL report displays aggregate crash counts by unique urls or by unique domains. From here one can drill down to crash signatures. For crashes with comments, we display the comment in a link to the individual crash. In the future, signatures will be linked to search results, once we support url/domain as a search parameter.

Details

Data Definitions

Urls - This is everything before the query string.
Domains - This is the entire hostname.

Examples:

http://www.example.com/page.html?foo=bar
  • url - http://www.example.com/page.html
  • domain - www.example.com

chrome://example/content/extension.xul
  • url - chrome://example/content/extension.xul
  • domain - example

about:config
  • invalid, no protocol
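A minimal sketch of these definitions in Python (a hypothetical helper, not the code Socorro actually uses):

    def split_crash_url(raw_url):
        # a url without a protocol separator is invalid (e.g. about:config)
        if '://' not in raw_url:
            return None, None
        url = raw_url.split('?')[0]                     # everything before the query string
        domain = raw_url.split('://')[1].split('/')[0]  # the entire hostname
        return url, domain

    # split_crash_url('http://www.example.com/page.html?foo=bar')
    #   => ('http://www.example.com/page.html', 'www.example.com')
    # split_crash_url('chrome://example/content/extension.xul')
    #   => ('chrome://example/content/extension.xul', 'example')
    # split_crash_url('about:config') => (None, None)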

Filtering

For a crash report to be counted it must have the following:

• a url which is not null or empty and which has a protocol
• aggregates are calculated 1 day at a time, for the previous day
• at the level of aggregation, it must have more than 1 record

Crash data viewed from the url perspective is a very long tail of crashes for a single unique url. We cut off this tail, which reduces data storage and processing time by an order of magnitude. A consequence of this filtering (only good urls + multiple crashes) is that the total crash aggregates are much lower than top crashers or raw queries. Keep this in mind when using aggregates: Top Crashers (by OS) is a much better gauge.


Administration

Configuring new products

The Top Crashers By URL report is powered by the tcbyurlconfig and productdims tables.

1. Make sure your product is in the productdims table.

   (a) If not, insert it. The following sets up a specific version of a specific product for all, win, and mac platforms:

    INSERT INTO productdims (product, version, os_name, release)
        VALUES ('Firefox', '3.0.4', 'ALL', 'major');
    INSERT INTO productdims (product, version, os_name, release)
        VALUES ('Firefox', '3.0.4', 'Win', 'major');
    INSERT INTO productdims (product, version, os_name, release)
        VALUES ('Firefox', '3.0.4', 'Mac', 'major');

2. Insert a config entry for the exact product you want to report on; usually this is os_name = 'ALL':

    INSERT INTO tcbyurlconfig (productdims_id, enabled)
        SELECT id, 'Y' FROM productdims
        WHERE product = 'Firefox' AND version = '3.0.4' AND os_name = 'ALL';

3. Wait for results.

4. Reap the profit.

Suspending Reports

The tcbyurlconfig table has an 'enabled' column. Set it to false to stop the cron from updating the reports for a particular product.

Mozilla Specific

Make sure to match up the release type: versions with "pre" are milestone; versions with "a" or "b" in them are development.

Operations

This report is populated by a Python cron script which runs at 10:00 PM PST. The run is controlled by configuration data from a table in the database. All products which are enabled in this config table will have their daily report generated. In the future this will be managed via an admin page, but currently it is managed via SQL.

Development

Details about the database design are in Report Database Design

12.3.6 Top Crashers By Signature

Introduction

Topcrashers By Signature compiles 14 days' worth of crash reports (organized by signature) for a given version. This report is useful for finding new topcrashes, determining if topcrashes have been filed, and seeing trending of topcrashes over time (for a specific version).


Details

For the ideal topcrashers by signature report, we want to gather the following data:

• crashes by version (e.g., Firefox 3.0.9)
• date a crash occurred (to know if it's within our window)
• stack signature
• average uptime (since last browser start), averaged over the window
• bug numbers related to the crash signature

Additionally, we need the ability to either a) go back in time or b) "freeze" the topcrashers by signature report on a specific day. This allows us to compare, say, the last day of a release to the newest release (e.g., Firefox 3.0.8 to Firefox 3.0.9). Without the ability to go back to a specific day of topcrash reports, or to freeze topcrash reports, we have no easy way to compare releases (as new crashes come in for old releases, the topcrash list changes substantially).

Ideal Outputs

(to be filled)

See SocorroUIInstallation for additional details.

Operations

• Need a recalculation every 4 to 6 hours • Need top 500 signatures, ranked over last 14 days • Note that this implies for the database that each slice is aggregated from the full window (which slides forward each time)

12.3.7 Signature Generation

Introduction

The Processor creates an overall signature for a crash based on the signatures of the stack frames of the crashing thread. It walks the stack from the frame with the lowest number (the top of the stack), applying rules and accumulating a list of signatures found to be relevant. Once the rules are done, the list of signatures is concatenated into a single string. That single string becomes the crash's overall signature.

Normalization

Before any frame signatures are considered, they are normalized. This is just a string formatting change: runs of spaces are compressed to just one space, commas are ensured to always be followed by a space, and integer values are replaced by 'int'. Signatures that match the signaturesWithLineNumbersRegEx regular expression are combined with their source code line. Frames that have no function information are written as source code/line number pairs. If no source code is available, it tries to find a module/address pair. Failing that, it falls back to just an address.
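As a rough illustration (not the actual processor code), the normalization step might look like the following sketch. The regular expressions and the helper name are illustrative only, and the signaturesWithLineNumbersRegEx handling is omitted:

import re

# Illustrative regular expressions -- the real ones are processor configuration.
INTEGER_RE = re.compile(r'\b\d+\b')   # integer literals become 'int'
COMMA_RE = re.compile(r',(?=\S)')     # ensure commas are followed by a space
SPACE_RUN_RE = re.compile(r' {2,}')   # runs of spaces compress to one

def normalize_frame(function, source, line, module, address):
    """Return the normalized signature for one stack frame (sketch)."""
    if function:
        name = SPACE_RUN_RE.sub(' ', function)
        name = COMMA_RE.sub(', ', name)
        name = INTEGER_RE.sub('int', name)
        return name
    if source:                  # no function info: source code/line pair
        return '%s#%s' % (source, line)
    if module:                  # no source: module/address pair
        return '%s@%s' % (module, address)
    return '@%s' % address      # last resort: just the address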

The SkipList Rules

The signature is generated by walking through each stack frame, considering its 'name' (as normalized above). Frames/names are skipped or added to the signature list according to the rules. When a signature list is complete, it is converted to a string by concatenating the frame names with spaces and a vertical bar between each name. For example, objc_msgSend | IdleTimerVector is the signature for a stack that contained (irrelevant frames), "objc_msgSend", and "IdleTimerVector", which matched neither the prefix nor the irrelevant regular expressions, and possibly other frames which did not become part of the signature.

regular expressions

Each SkipList rule is a regular expression. Typically, it takes the form of an alternation of frame names, but any legal regular expression can be used. Regular expression alternation syntax is a|b|c: match 'a' or 'b' or 'c'. This work is done in Python, so use Python regular expression syntax.

signatureSentinels

A typical rule might be: "_purecall". This is the first rule to be applied. The code iterates through the stack frames, throwing away everything it finds until it encounters a match to this regular expression or the end of the stack. If it finds a match, it passes all the frames after the match to the next step. If it finds no match, it passes the whole list of frames to the next step.

irrelevantSignatureRegEx

A typical rule might be: "@0x[0-9a-fA-F]{2,}|@0x[1-9a-fA-F]|RaiseException|CxxThrowException". A frame which matches this regular expression will be appended to the signature only if a prefix frame has already been seen (see the next rule).

prefixSignatureRegEx

A typical rule might be "@0x0|strchr|strstr|strlen|PL_strlen|strcmp|wcslen|memcpy|memmove|memcmp|malloc|realloc|objc_msgSend", though at Mozilla it has grown much longer. This is the rule that generates compound signatures. A frame that matches this regular expression changes the state of the machine to 'seen prefix'. In the 'seen prefix' state, irrelevant or prefix frames are appended. As soon as a frame is neither, it is appended and the signature list is complete. Once the signature list is complete, the signature is generated as mentioned above.
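Putting the three rules together, the walk over normalized frame names behaves roughly like the following sketch. The rule regular expressions shown are illustrative fragments, not Mozilla's production configuration:

import re

# Illustrative fragments of the three rule sets described above.
sentinel_re = re.compile(r'_purecall')
irrelevant_re = re.compile(r'@0x[0-9a-fA-F]{2,}|RaiseException|CxxThrowException')
prefix_re = re.compile(r'@0x0|strchr|strlen|memcpy|malloc|objc_msgSend')

def generate_signature(frame_names):
    """Walk normalized frame names and build the overall signature (sketch)."""
    # signatureSentinels: discard everything up to and including the first match
    for i, name in enumerate(frame_names):
        if sentinel_re.match(name):
            frame_names = frame_names[i + 1:]
            break
    signature_list = []
    seen_prefix = False
    for name in frame_names:
        if irrelevant_re.match(name):
            if seen_prefix:            # irrelevant frames only count after a prefix
                signature_list.append(name)
            continue
        signature_list.append(name)
        if prefix_re.match(name):
            seen_prefix = True         # compound signature: keep accumulating
        else:
            break                      # a plain frame completes the list
    return ' | '.join(signature_list)

# generate_signature(['@0xdeadbeef', 'objc_msgSend', 'IdleTimerVector', 'main'])
#   -> 'objc_msgSend | IdleTimerVector'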

12.3.8 Crash Mover

The Collector dumps all the crashes that it receives into the local file system. This application is responsible for transferring those crashes into HBase.

Configuration:

import stat
import socorro.lib.ConfigurationManager as cm

#---------------------------------------------------------------------------
# general

numberOfThreads = cm.Option()
numberOfThreads.doc = 'the number of threads to use'
numberOfThreads.default = 4


#---------------------------------------------------------------------------
# source storage

sourceStorageClass = cm.Option()
sourceStorageClass.doc = 'the fully qualified name of the source storage class'
sourceStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForLocalFS'
sourceStorageClass.fromStringConverter = cm.classConverter

from config.collectorconfig import localFS
from config.collectorconfig import localFSDumpDirCount
from config.collectorconfig import localFSDumpGID
from config.collectorconfig import localFSDumpPermissions
from config.collectorconfig import localFSDirPermissions
from config.collectorconfig import fallbackFS
from config.collectorconfig import fallbackDumpDirCount
from config.collectorconfig import fallbackDumpGID
from config.collectorconfig import fallbackDumpPermissions
from config.collectorconfig import fallbackDirPermissions
from config.commonconfig import jsonFileSuffix
from config.commonconfig import dumpFileSuffix

#---------------------------------------------------------------------------
# destination storage

destinationStorageClass = cm.Option()
destinationStorageClass.doc = 'the fully qualified name of the destination storage class'
destinationStorageClass.default = 'socorro.storage.crashstorage.CrashStorageSystemForHBase'
destinationStorageClass.fromStringConverter = cm.classConverter

from config.commonconfig import hbaseHost
from config.commonconfig import hbasePort
from config.commonconfig import hbaseTimeout

#---------------------------------------------------------------------------
# logging

syslogHost = cm.Option()
syslogHost.doc = 'syslog hostname'
syslogHost.default = 'localhost'

syslogPort = cm.Option()
syslogPort.doc = 'syslog port'
syslogPort.default = 514

syslogFacilityString = cm.Option()
syslogFacilityString.doc = 'syslog facility string ("user", "local0", etc)'
syslogFacilityString.default = 'user'

syslogLineFormatString = cm.Option()
syslogLineFormatString.doc = 'python logging system format for syslog entries'
syslogLineFormatString.default = 'Socorro Storage Mover (pid %(process)d): %(asctime)s %(levelname)s - %(threadName)s - %(message)s'

syslogErrorLoggingLevel = cm.Option()
syslogErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
syslogErrorLoggingLevel.default = 10

stderrLineFormatString = cm.Option()


stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(threadName)s - %(message)s'

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 10

12.3.9 Collector

Collector is an application that runs under Apache using mod_python. Its task is accepting crash reports from remote clients and saving them in a place and format usable by further applications. Raw crashes are accepted via HTTP POST. The form data from the POST is then arranged into a JSON document and saved into the local file system. The collector is responsible for assigning an ooid (Our Own ID) to the crash. It also assigns a throttle value which determines whether the crash eventually goes into the relational database. Should the saving to the local file system fail, there is a fallback storage mechanism: a second file system can be configured to take the failed saves. This file system would likely be an NFS-mounted file system. After a crash is saved, an app called Crash Mover transfers the crashes to HBase.

Collector Python Configuration

Like all the Socorro applications, the configuration is actually executable Python code. Two configuration files are relevant for collector:

• Copy .../scripts/config/commonconfig.py.dist to .../config/commonconfig.py. This configuration file contains constants used by many of the Socorro applications.
• Copy .../scripts/config/collectorconfig.py.dist to .../config/collectorconfig.py

Common Configuration

There are two constants in '.../scripts/config/commonconfig.py' of interest to collector: jsonFileSuffix and dumpFileSuffix. Other constants in this file are ignored. To set up the common configuration, see Common Config.

Collector Configuration

collectorconfig.py has several options to adjust how files are stored: see the sample config code on Github.

12.3.10 Reporter

Deprecated. See UI Installation.


12.3.11 Monitor

Monitor is a multithreaded application with several mandates. Its main job is to find new JSON/dump pairs and queue them for further processing. It looks for new JSON/dump pairs in the file system location designated by the constant storageRoot from the Common Config file. Once it finds a pair, it queues them as a "job" in the database 'jobs' table and assigns it to a specific processor. Once queued, the monitor goes on to find other new jobs to queue.

Monitor also locates and queues priority jobs. If a user requests a report via the Reporter and that crash report has not yet been processed, the Reporter puts the requested crash's UUID into the database's 'priorityjobs' table. Monitor looks in three places for the requested job:

• the processors - if monitor finds the job already assigned to a processor, it raises the priority of that job so the processor will do it quickly
• the storageRoot file system - if the job is found here, it queues it for priority processing immediately rather than waiting for the standard mechanism to eventually find it
• the deferredStorageRoot file system - if the requested crash was filtered out by server-side throttling, monitor will find it and queue it immediately from that location

Monitor is also responsible for keeping the StandardJobStorage file system neat and tidy. It monitors the 'jobs' queue in the database. Once it sees that a previously queued job has been completed, it moves the JSON/dump pairs to long term storage or it deletes them (based on a configuration setting). Jobs that fail their further processing stage are also either saved in a "failed" storage area or deleted.

Monitor is a command line application meant to be run continuously as a daemon. It can log its actions to stderr and/or to automatically rotating log files. See the configuration options below beginning with stderr* and logFile* for more information. The monitor app is found at .../scripts/monitor.py. In order to run monitor, the socorro package must be visible somewhere on the Python path.

Configuration

Monitor, like all the Socorro applications, uses the common configuration for several of its constants. For setup of common configuration, see Common Config.

Monitor also has an executable configuration file of its own. A sample file is found at .../scripts/config/monitorconfig.py.dist. Copy this file to .../scripts/config/monitorconfig.py and edit it for site specific settings. In each case where a site specific value is desired, replace the value for the .default member.

standardLoopDelay Monitor has to scan the StandardJobStorage looking for jobs. This value represents the delay between scans.:

standardLoopDelay = cm.Option()
standardLoopDelay.doc = 'the time between scans for jobs (HHH:MM:SS)'
standardLoopDelay.default = '00:05:00'
standardLoopDelay.fromStringConverter = cm.timeDeltaConverter

cleanupJobsLoopDelay Monitor archives or deletes JSON/dump pairs from the StandardJobStorage. This value represents the delay between runs of the archive/delete routines.:

cleanupJobsLoopDelay = cm.Option()
cleanupJobsLoopDelay.doc = 'the time between runs of the job clean up routines (HHH:MM:SS)'
cleanupJobsLoopDelay.default = '00:05:00'
cleanupJobsLoopDelay.fromStringConverter = cm.timeDeltaConverter


priorityLoopDelay The frequency with which to look for priority jobs.:

priorityLoopDelay = cm.Option()
priorityLoopDelay.doc = 'the time between checks for priority jobs (HHH:MM:SS)'
priorityLoopDelay.default = '00:01:00'
priorityLoopDelay.fromStringConverter = cm.timeDeltaConverter

saveSuccessfulMinidumpsTo:

saveSuccessfulMinidumpsTo = cm.Option()
saveSuccessfulMinidumpsTo.doc = 'the location for saving successfully processed dumps (leave blank to delete them instead)'
saveSuccessfulMinidumpsTo.default = '/tmp/socorro-sucessful'

saveFailedMinidumpsTo:

saveFailedMinidumpsTo = cm.Option()
saveFailedMinidumpsTo.doc = 'the location for saving dumps that failed processing (leave blank to delete them instead)'
saveFailedMinidumpsTo.default = '/tmp/socorro-failed'

logFilePathname Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs.:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'

logFileMaximumSize This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started.:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000

logFileMaximumBackupHistory The maximum number of log files to keep.:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50

logFileLineFormatString A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

logFileErrorLoggingLevel Logging is done in severity levels - the lower the number, the more verbose the logs.:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10


stderrLineFormatString In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr.:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

stderrErrorLoggingLevel Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr.:

stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40

12.3.12 File System

Socorro uses two similar file system storage schemes in two distinct places within the system. Raw crash dumps from the field use a system called JSON Dump Storage, while at the other end, processed dumps use the Processed Dump Storage scheme.

12.3.13 Deferred Cleanup

When the Collector throttles the flow of crash dumps, it saves deferred crashes into Deferred Job Storage. These JSON/dump pairs live in deferred storage for a configurable number of days. It is the task of the deferred cleanup application to implement the policy of deleting old crash dumps. The deferred cleanup application is a command line app meant to be run as a cron job. It should be set to run once every twenty-four hours.
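A minimal sketch of that policy, assuming the JsonDumpStorage class described later in this chapter (the function and parameter names here are illustrative, not the actual application code):

import datetime
import socorro.lib.JsonDumpStorage as jds

def clean_deferred_storage(deferred_storage_root, max_age_in_days, dry_run=False):
    """Delete deferred JSON/dump pairs older than the configured age (sketch)."""
    threshold = datetime.datetime.now() - datetime.timedelta(days=max_age_in_days)
    storage = jds.JsonDumpStorage(root=deferred_storage_root)
    if dry_run:
        print 'would remove everything older than %s' % threshold
        return
    # removeOlderThan walks the date branch, deleting JSON/dump pairs and
    # their links strictly older than the threshold (see JSON Dump Storage)
    storage.removeOlderThan(threshold)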

Configuration

deferredcleanup uses the common configuration to get the constant deferredStorageRoot. For setup of common configuration, see Common Config.

deferredcleanup also has an executable configuration file of its own. A sample file is found at .../scripts/config/deferredcleanupconfig.py.dist. Copy this file to .../scripts/config/deferredcleanupconfig.py and edit it for site specific settings. In each case where a site specific value is desired, replace the value for the .default member.

maximumDeferredJobAge This constant specifies how many days deferred jobs are allowed to stay in deferred storage. Job deletion is permanent.:

maximumDeferredJobAge = cm.Option()
maximumDeferredJobAge.doc = 'the maximum number of days that deferred jobs stick around'
maximumDeferredJobAge.default = 2

dryRun Used during testing and development, this prevents deferredcleanup from actually deleting things.:


dryRun = cm.Option()
dryRun.doc = "don't really delete anything"
dryRun.default = False
dryRun.fromStringConverter = cm.booleanConverter

logFilePathname Deferredcleanup can log its actions to a set of automatically rotating log files. This is the name and location of the logs.:

logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './processor.log'

logFileMaximumSize This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started.:

logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000

logFileMaximumBackupHistory The maximum number of log files to keep.:

logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50

logFileLineFormatString A Python format string that controls the format of individual lines in the logs:

logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

logFileErrorLoggingLevel Logging is done in severity levels - the lower the number, the more verbose the logs.:

logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 20

stderrLineFormatString In parallel with creating log files, deferredcleanup can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr.:

stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

stderrErrorLoggingLevel Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr.:


stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for the logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40

12.4 Standalone Development Environment

You can easily bring up a full Socorro VM; see Setup a development environment for more info. However, in some cases it can make sense to run components standalone in a development environment, for example if you want to run just one or two components and connect them to an existing Socorro install for debugging.

12.4.1 Setting up

1) Clone the repo (http://github.com/mozilla/socorro):

git clone git://github.com/mozilla/socorro.git
cd socorro/

2) Set up the Python path:

export PYTHONPATH=.:thirdparty/

3) Create a virtualenv and use it (this installs all needed Socorro dependencies):

make virtualenv
. socorro-virtualenv/bin/activate

4) Copy the default Socorro config (also see Common Config):

pushd scripts/config
for file in *.py.dist; do cp $file `basename $file .dist`; done
edit commonconfig.py (...)
popd

12.4.2 Install and configure UI

1) Symlink webapp-php/ to the HTDOCS area:

mv ~/public_html ~/public_html.old
ln -s ./webapp-php ~/public_html

2) Copy the default webapp config (also see UI Installation):

cp htaccess-dist .htaccess
pushd webapp-php/application/config/
for file in *.php-dist; do cp $file `basename $file -dist`; done
edit database.php config.php (...)
popd

3) Make sure the log area is writable to the webserver user:

chmod o+rwx webapp-php/application/logs


12.4.3 Launch standalone Middleware instance

Edit scripts/config/webapiconfig.py and change wsgiInstallation to False (this allows the middleware to run in standalone mode):

wsgiInstallation.default = False

NOTE - make sure to use an unused port; it should be the same as whatever you configure in webapp-php/application/config/webserviceclient.php:

python scripts/webservices.py 9191

This will use whichever database you configured in commonconfig.py

12.5 Unit Testing

There are (some, and a growing number of) unit tests for the Socorro code.

12.5.1 How to Unit Test

• configure your test environment (see below)
• install nosetests
• cd to socorro/unittests
• chant nosetests and observe the result
– You should expect more than 185 tests (186 as of 2009-03-25)
– You should see exactly two failures (unless you are running as root), with this assertion: AssertionError: You must run this test as root (don't forget root's PYTHONPATH):

ERROR: testCopyFromGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)
ERROR: testNewEntryGid (socorro.unittest.lib.testJsonDumpStorageGid.TestJsonDumpStorageGid)

• You may 'observe' the result by chanting nosetests > test.out 2>&1 and then examining test.out (or any name you prefer)
• There is a bash shell file, socorro/unittest/red, which may be sourced to provide a bash function red that simplifies watching test logfiles in a separate terminal window. In that window, cd to the unittest sub-directory of interest, then source the file: . ../red, then chant red. The effect is to clear the screen, then tail -F the logfile associated with tests in that directory. You may chant red --help to be reminded.
• The red file also provides a function noseErrors which simplifies the examination of nosetests output. Chant noseErrors --help for a brief summary.

12.5.2 How to write Unit Tests

Nose provides some nice tools. Some of the tests require nose and nosetests (or a tool that mimics their behavior). However, it is also quite possible to use Python's unittest. No tutorial here. Instead, take a look at an existing test file and do something usefully similar; a minimal skeleton is sketched below.
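For orientation only, a nose-discoverable test module can be as small as the following sketch. The file name, fixture body, and assertion are illustrative, not taken from the Socorro test suite:

# testSomethingUseful.py -- illustrative only
def setup_module():
    # per-module fixture: create temp directories, test config, etc.
    pass

def testSomethingUseful():
    # nose collects any function whose name matches its test pattern
    expected = 'expected value'
    actual = 'expected value'        # call the code under test here
    assert expected == actual, 'got %r instead of %r' % (actual, expected)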


12.5.3 Where to write Unit Tests

To maintain the current test layout, note that for every directory under socorro, there is a same-name directory under socorro/unittest where the test code for the working directory should be placed. In addition, there is unittest/testlib, which holds a library of useful testing code as well as some tests for that library. If you add a unittest subdirectory holding new tests, you must also provide an __init__.py (which may be empty), or nosetests will not enter the directory looking for tests.

12.5.4 How to configure your test environment

• You must have a working PostgreSQL installation; see Installation for the version. It need not be locally hosted, though if not, please be careful about the username and password for the test user. Also be careful not to step on a working database: the test cleanup code drops tables.
• You must either provide a PostgreSQL account with a name and password that matches the config file, or edit the test config file to provide an appropriate test account and password. That file is socorro/unittest/config/commonconfig.py. If you add a new test config file that needs database access, you should import the details from commonconfig, as exemplified in the existing config files.
• You must provide a database appropriate for the test user (default: test). That database must support PLPGSQL. As the owner of the test database, while connected to that database, invoke CREATE LANGUAGE PLPGSQL;
• You must have installed nose and nosetests; nosetests should be on your PATH and the nose code/egg should be on your PYTHONPATH
• You must have installed the psycopg2 python module
• You must adjust your PYTHONPATH to include the directory holding socorro. E.g., if you have installed socorro at /home/tester/Mozilla/socorro then your PYTHONPATH should look like ...:/home/tester/Mozilla:/home/tester/Mozilla/thirdparty:...

12.6 Crash Repro Filtering Report

12.6.1 Introduction

This page describes a report that assists in analyzing crash data for a stack signature, in order to try to reproduce a crash and develop a reproducible test case.

12.6.2 Details

For each release, pull a data set of one week's worth of data ranked by signature, like: http://crash-stats.mozilla.com/query/query?do_query=1&product=Firefox&version=Firefox%3A3.0.10&date=&range_value=7&range_unit=days&query_search=signature&query_type=contains&query=

Then provide a list like this, with several fields of interest for examining the data: Date, Product, Version, Build, OS, CPU, Reason, Address, Uptime, Comments. But we also need to add urls into the version of this report that is behind auth. "Reason" is not so helpful to me at this stage, but others can weigh in on the idea of removing it. Maybe just make it include all of these, or allow users to pick the fields it shows like Bugzilla does: Signature, Crash Address, UUID, Product, Version, Build, OS, Time, Uptime, Last Crash, URL, User Comments. Anyway, get something close to what we have now in "Crash Reports in PR_MD_SEND":

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.0.10&query_search=signature&query_type=contains&query=&date=&range_value=7&range_unit=days&do_query=1&signature=_PR_MD_SEND

Next, allow the report user to apply filters to build more precise queries from the set of reports. Filters might be from any of the fields, or it would be really cool if we could also filter from other items in the crash report, like the full stack trace and/or module list:

filter uptime < 60 seconds
and filter address exactly_matches 0x187d000
and filter url contains mail..com or filter url contains mail.yahoo.com
and filter modulelist does_not_contain "mswsock.dll 5.1.2600.3394"

That last example of module list might be a stretch, but it would be very valuable to check the module list for the existence or non-existence of binary components and their version numbers. From there we would want to see the results and export to CSV, to import things like URL lists into page load testing systems to look for reproducible crashers. A rough sketch of how such filters might compose follows.
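This sketch uses hypothetical dictionary field names based on the columns listed above; it is not part of the Socorro codebase, and the filter values (including the example domain) are placeholders:

# Hypothetical field names; a crash here is a dict-like row from the report.
def make_filters():
    return [
        lambda crash: crash['uptime'] < 60,
        lambda crash: crash['address'] == '0x187d000',
        lambda crash: ('mail.example.com' in crash['url']
                       or 'mail.yahoo.com' in crash['url']),
        lambda crash: 'mswsock.dll 5.1.2600.3394' not in crash['modulelist'],
    ]

def apply_filters(crashes, filters):
    """Yield only the crashes that pass every filter."""
    for crash in crashes:
        if all(f(crash) for f in filters):
            yield crash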

12.7 Disk Performance Tests

12.7.1 Introduction

Any DBMS for a database which is larger than memory can be no faster than disk speed. This document outlines a series of tests for testing disk speed to determine if you have an issue. Written originally by PostgreSQL Experts Inc. for Mozilla.

12.7.2 Running Tests

Note: all of the below require you to have plenty of disk space available. And their figures are only reliable if nothing else is running on the system.

Simplest Test: The DD Test

This test measures the most basic single-threaded disk access: a large sequential write, followed by a large sequential read. It is relevant to database performance because it gives you a maximum speed for sequential scans of large tables. Real table scans are generally about 30% of this maximum.

dd is a Unix command line utility which simply writes to a block device. We use it for this 3-step test. The other thing you need to know for this test is your RAM size.

1. We create a large file which is 2x the size of RAM, and sync it to disk. This makes sure that we get the real sustained write rate, because caching can have little effect. There are 125000 blocks per GB (an 8k block size is used because it's what Postgres uses); so if we had 8GB of RAM, we would run the following:

time sh -c "dd if=/dev/zero of=ddfile bs=8k count=1000000 && sync"

dd will report a time and write rate to us, and "time" will report a larger time. The time and rate reported by dd represent the rate without any lag or sync time; divide the data size by the time reported by "time" for the synchronous file writing rate.

2. Next we want to write another large file, this one the size of RAM, in order to flush out the FS cache so that we can read directly from disk later.:

dd if=/dev/zero of=ddfile2 bs=8k count=500000

3. Now, we want to read the first file back. Since the FS cache is full from the second file, this should be 100% disk access:


time dd if=ddfile of=/dev/null bs=8k

This time, “time” and dd will be very close together; any difference will be strictly storage lag time.

12.7.3 Bonnie++

Bonnie++ is a more sophisticated set of tests which tests random reads and writes, as well as seeks, and file creation and deletion operations. For a modern system, you want to use the latest version, 1.95, downloaded from http://www.coker.com.au/bonnie++/experimental/. This final version of bonnie++ supports concurrency and measures lag time. However, it is not available in package form in most OSes, so you'll have to compile it using g++. Again, for Mozilla we want to test performance for a database which is larger than RAM, since that's what we have. Therefore, we're going to run a concurrent Bonnie++ test where the total size of the files is about 150% of RAM, forcing the use of disk. We're also going to run 8 threads to simulate concurrent file access. Our command line for a machine with 16GB RAM is:

bonnie++ -d /path/to/storage -c 8 -r 16000 -n 100

The results we get back look something like this:

Version 1.95        ------Sequential Output------ --Sequential Input- --Random-
Concurrency   8     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
tm-breakpad0 32000M   757  99 71323  16 30594   5  2192  99 57555   4 262.5  13
Latency             15462us    6918ms    4933ms   11096us     706ms     241ms
Version 1.95        ------Sequential Create------ --------Random Create--------
tm-breakpad01-maste -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                100 44410  75 +++++ +++ 72407  81 45787  77 +++++ +++ 63167  72
Latency              9957us     477us     533us     649us      93us     552us

So, the interesting parts of this are:

• Sequential Output: Block: this is sequential writes like dd does. It's 70MB/s.
• Sequential Input: Block: this is sequential reads from disk. It's 57MB/s.
• Sequential Output: Rewrite: this is reading, then writing, a file which has been flushed to disk. This rate will be lower than either of the above, and is at 30MB/s.
• Random: Seeks: this is how many individual blocks Bonnie can seek to per second; it's a fast 262.
• Latency: this is the full round-trip lag time for the mentioned operation. On this platform, these times are catastrophically bad: 1/4 second round-trip to return a single random block, and 3/4 seconds to return the start of a large file.

The figures on file creation and deletion are generally less interesting to databases. The +++++ are for runs that were so fast the error margin makes the figures meaningless; for better figures, increase -n.

12.7.4 IOZone

Now, if you don't think Bonnie++ told you enough, you'll want to run IOZone. IOZone is a benchmark mostly known for creating pretty graphs (http://www.iozone.org/) of filesystem performance with different file, batch, and block sizes. However, this kind of comprehensive profiling is completely unnecessary for a DBMS, where we already know the file access pattern, and it can take up to 4 days to run. So do not run IOZone in automated (-a) mode! Instead, run a limited test. This test will still take several hours to run, but will return a more limited set of relevant results. Run this on a 16GB system with 8 cores, from a directory on the storage you want to measure:


iozone -R -i 0 -i 1 -i 2 -i 3 -i 4 -i 5 -i 8 -l 6 -u 6 -r 8k -s 4G -F f1 f2 f3 f4 f5 f6

This runs the following tests: write/read, rewrite/reread, random-read/write, read-backwards, re-write-record, stride-read, and random mix. It does these tests using 6 concurrent processes and a block size of 8k (Postgres' block size) for 4G files named f1 to f6. The aggregate size of the files is 24G, so that they won't all fit in memory at once. In theory, the relevance of these tests to database activity is the following:

• write/read: basic sequential writes and reads
• rewrite/reread: writes and reads of frequently accessed tables (in memory)
• random-read/write: index access, and writes of individual rows
• read-backwards: might be relevant to reverse index scans
• re-write-record: frequently updated row behavior
• stride-read: might be relevant to bitmapscan
• random mix: general database access average behavior

The results you get will look like this:

Children see throughput for 6 initial writers = 108042.81 KB/sec
Parent sees throughput for 6 initial writers  =  31770.90 KB/sec
Min throughput per process                    =  13815.83 KB/sec
Max throughput per process                    =  35004.07 KB/sec
Avg throughput per process                    =  18007.13 KB/sec
Min xfer                                      = 1655408.00 KB

And so on through all the tests. These results are pretty self-explanatory, except that I have no idea what the difference between "Children see" and "Parent sees" means. IOZone documentation is next-to-nonexistent.

Note: IOZone appears to have several bugs, and places where its documentation and actual features don't match. In particular, it appears to have locking issues in concurrent access mode for some writing activity, so that concurrency throughput may be lower than actual.

12.8 Dumping Dump Tables

A work item that came out of the Socorro Postgres work week is to dump the dump tables and store cooked dumps as gzipped files:

• Drop the dumps table
• Convert each dumps table row to a compressed file on disk

12.8.1 Bugzilla

https://bugzilla.mozilla.org/show_bug.cgi?id=484032

12.8.2 Library support

'Done' as of 2009-05-07 in socorro.lib.dmpStorage (coding and testing are done; integration testing is done; 'go live' is today).

Socorro UI /report/index/{uuid}

• Will stop using the dumps table.
• Will start using gzipped files


– Will use the report uuid to locate the dump on a file system
– Will use apache mod-rewrite to serve the actual file. The rewrite rule is based on the uuid, and is 'simple':

AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz => AA/BB/AABBCCDDEEFFGGHHIIJJKKLLM2090308.jsonz

– report/index will include a link to the JSON dump:

link rel='alternate' type='application/json' href='/reporter/dumps/cdaa07ae-475b-11dd-8dfa-001cc45a2ce4.jsonz'

A sketch of the path computation behind the rewrite rule is below.
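The path computation the rewrite rule implements can be expressed as a short sketch (the function name is illustrative, not part of the codebase):

def jsonz_path(uuid):
    """Map a report uuid to its two-level radix path, as the rewrite rule does."""
    return '%s/%s/%s.jsonz' % (uuid[0:2], uuid[2:4], uuid)

# jsonz_path('cdaa07ae-475b-11dd-8dfa-001cc45a2ce4')
#   -> 'cd/aa/cdaa07ae-475b-11dd-8dfa-001cc45a2ce4.jsonz'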

12.8.3 Dump file format

• Will be gzip-compressed, JSON-encoded cooked dump files
• Partial JSON file
• Full JSONZ file

12.8.4 On Disk Location

application.conf dumpPath example for kahn:

$config['dumpPath'] = '/mnt/socorro_dumps/named';

In the dumps directory we will have an .htaccess file:

AddType "application/json; charset=UTF-8" jsonz
AddEncoding gzip jsonz

Webhead will serve these files as:

Content-Type: application/json; charset=utf-8
Content-Encoding: gzip

Note: You'd expect the dump files to be named json.gz, but this is broken in Safari. By setting HTTP headers and naming the file jsonz, an unknown file extension, this works across browsers.

12.8.5 Socorro UI

• The existing URL won't change.
• A second JSON request back to the server will load the jsonz file.

Example:

• http://crash-stats.mozilla.com/report/index/d92ebf79-9858-450d-9868-0fe042090211
• http://crash-stats.mozilla.com/dump/d92ebf79-9858-450d-9868-0fe042090211.jsonz

mod-rewrite rules will match /dump/*.jsonz requests and change them to access a file share.

12.8.6 Future Enhancement

A future enhancement, if we find the webheads are CPU-bound, would be to move populating the report/index page to the client side.


12.8.7 Test Page

http://people.mozilla.org/~aking/Socorro/dumpingDump/json-test.html - uses the browser to decompress a gzip-compressed JSON file during an AJAX request, pulls it apart, and appends it to the page. The test file was made with gzip dump.json.

12.9 JSON Dump Storage

12.9.1 What this system offers

Crash data is stored so that it can be quickly located based on a Universally Unique Identifier (uuid), or visited by the date and time when it was reported.

12.9.2 Directory Structure

The crash files are located in a tree with two branches: the name or "index" branch and the date branch.

• The name branch consists of paths based on the first few pairs of characters of the uuid. The name branch holds the two data files and a relative symbolic link to the date branch directory associated with the particular uuid. The "depth" is the number of sub-directories between the name directory and the actual file; by default, to conserve inodes, depth is two. For the uuid 22adfb61-f75b-11dc-b6be-001321b0783d:

– By default, the json file is stored (depth 2) as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The json file could be stored (depth 4) as %(root)s/name/22/ad/fb/61/22adfb61-f75b-11dc-b6be-001321b0783d.json
– The dump file is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.dump
– The symbolic link is stored as %(root)s/name/22/ad/22adfb61-f75b-11dc-b6be-001321b0783d and (see below) references (own location)/%(toDateFromName)s/2008/09/30/12/05/webhead01_0/

• The date branch consists of paths based on the year, month, day, hour, minute-segment, webhead host name and a small sequence number. For each uuid, it holds a relative symbolic link referring to the actual name directory holding the data for that uuid. For the uuid above, submitted at 2008-09-30T12:05 from webhead01:

– The symbolic link is stored as %(root)s/date/2008/09/30/12/05/webhead01_0/22adfb61-f75b-11dc-b6be-001321b0783d and references (own location)/%(toNameFromDate)s/22/ad/fb/61/

• Note (name layout): In the examples on this page, the name/index branch uses the first 4 characters of the uuid as two character-pairs naming subdirectories. This is a configurable setting called storageDepth in the Collector configuration; a sketch of the path computation follows this list. To use 8 characters, set storageDepth to 4; to use 6 characters, set it to 3. The default storageDepth is 2 because on our system, with (approximately) 64K leaf directories, the number of files per leaf is reasonable, and the number of inodes required by directory entries is not so large as to cause undue difficulty. A storageDepth of 4 was examined, and was found to crash the file system by requiring too many inodes.
• If the uuids are such that their initial few characters are well spread among all possibles, then the lookup can be very quick. If the first few characters of the uuids are not well distributed, the resulting directories may be very large. If, despite well chosen uuids, the leaf name directories become too large, it would be simple to add another level, reducing the number of files by approximately a factor of 256; however, bear in mind the issue of inodes.
• Note (symbolic links): The symbolic links are relative rather than absolute, to avoid issues that might arise from variously mounted NFS volumes.
• Note (maxDirectoryEntries): If the number of links in a particular webhead subdirectory would exceed maxDirectoryEntries, then a new webhead directory is created by appending a larger _N: .../webhead01_0 first, then .../webhead01_1, etc. For the moment, maxDirectoryEntries is ignored for the name branch.
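A sketch of the name-branch path computation, parameterized by storageDepth (the function and parameter names are illustrative, not the library's API):

import os

def name_branch_path(root, uuid, storage_depth=2):
    """Build the name-branch directory for a uuid (sketch)."""
    pairs = [uuid[i * 2:i * 2 + 2] for i in range(storage_depth)]
    return os.path.join(root, 'name', *pairs)

# name_branch_path('/crashes', '22adfb61-f75b-11dc-b6be-001321b0783d')
#   -> '/crashes/name/22/ad'
# the json file then lives at .../22/ad/22adfb61-f75b-11dc-b6be-001321b0783d.json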


12.9.3 How it’s used

We use the file system storage for incoming dumps caught by Collector. There are two instances of the file system used for different purposes: standard storage and deferred storage.

12.9.4 Standard Job Storage

This is where json/dump pairs are stored for further processing. The Monitor finds new dumps and queues them for processing. It does this by walking the date branch of the file system using the API function destructiveDateWalk (sketched below). As it moves through the date branch, it notes every uuid (in the form of a symbolic link) that it encounters. It queues the information from the symbolic link and then deletes the symbolic link. This ensures that it only ever finds new entries. Later, the Processor will read the json/dump pair by doing a direct lookup of the uuid on the name branch. In the case of priority processing, the target uuid is looked up directly on the name branch. Then the link to the date branch is used to locate and delete the link on the date branch. This ensures that a priority job is not found a second time as a new job by the Monitor.
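A rough sketch of the Monitor side of this handshake, assuming the JsonDumpStorage class documented below; queue_job is a hypothetical stand-in for inserting a row into the 'jobs' table:

import socorro.lib.JsonDumpStorage as jds

def queue_new_crashes(storage_root, queue_job):
    """Find new crashes and hand their uuids to queue_job (sketch)."""
    storage = jds.JsonDumpStorage(root=storage_root)
    # each yielded uuid has had its date/name links deleted already,
    # so the next walk will not see it again
    for uuid in storage.destructiveDateWalk():
        queue_job(uuid)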

12.9.5 Deferred Job Storage

This is where jobs go that are deferred by Monitor's throttling mechanism. If a json/dump pair is needed for priority processing, it can be looked up directly on the name branch. In such a case, just as with priority jobs in standard storage, we destroy the links between the two branches. However, in this case, destroying the links prevents the json/dump pair from being deleted by the deferred cleanup process. When it comes time to drop old json/dump pairs that are no longer needed within the deferred storage, the system is given a date threshold. It walks the appropriate parts of the date branch older than the threshold. It uses the links to the name branch to blow away the elderly json/dump pairs.

12.9.6 class JsonDumpStorage

socorro.lib.JsonDumpStorage holds data and implements methods for creating and accessing crash files.

public methods

• __init__(self, root=".", maxDirectoryEntries=1024, **kwargs)

Take note of our root directory, the maximum allowed date->name links per directory, some relative relations, and whatever else we may need. Much of this (c|sh)ould be read from a config file. Recognized keyword args:

– dateName. Default = 'date'
– indexName. Default = 'name'
– jsonSuffix. Default = '.json'. If not startswith('.') then '.' is prepended
– dumpSuffix. Default = '.dump'. If not startswith('.') then '.' is prepended
– dumpPermissions. Default 660
– dirPermissions. Default 770
– dumpGID. Default None. If None, then owned by the owner of the running script.

• newEntry(self, uuid, webheadHostName='webhead01', timestamp=DT.datetime.now())

Sets up the name and date storage for the given uuid.


– Creates any directories that it needs along the path to the appropriate storage location (possibly adjusting ownership and mode)
– Creates two relative symbolic links:

* the date branch link pointing to the name directory holding the files;
* the name branch link pointing to the date branch directory holding that link.

– Returns a 2-tuple containing files open for writing: (jsonfile, dumpfile)

• getJson(self, uuid)

Returns an absolute pathname for the json file for a given uuid. Raises OSError if the file is missing.

• getDump(self, uuid)

Returns an absolute pathname for the dump file for a given uuid. Raises OSError if the file is missing.

• markAsSeen(self, uuid)

Removes the links associated with the two data files for this uuid, thus marking them as seen. Quietly returns if the uuid has no associated links.

• destructiveDateWalk(self)

This function is a generator that yields all (see note) uuids found by walking the date branch of the file system. Just before yielding a value, it deletes both the links (from date to name and from name to date). After visiting all the uuids in a given date branch, it recursively deletes any empty subdirectories in the date branch. Since the file system may be manipulated in a different thread, if no .json or .dump file is found, the links are left, and we do not yield that uuid. Note: to avoid race conditions, it does not visit the date subdirectory corresponding to the current time.

• remove(self, uuid)

Removes all instances of the uuid from the file system, including the json file, the dump file, and the two links if they still exist.

– Ignores missing link, json and dump files: you may call it with bogus data, though of course you should not.

• move(self, uuid, newAbsolutePath)

Moves the json file then the dump file to newAbsolutePath.

– Removes associated symbolic links if they still exist.
– Raises IOError if either the json or dump file for the uuid is not found, and retains any links, but does not roll back the json file if the dump file is not found.

• removeOlderThan(self, timestamp)

– Walks the date branch removing all entries strictly older than the timestamp.
– Removes the corresponding entries in the name branch.

member data

Most of the member data are set in the constructor, a few are constants, and the rest are simple calculations based on the others.

• root: The directory that holds both the date and index (name) subdirectories


• maxDirectoryEntries: The maximum number of links in each webhead directory on the date branch. Default = 1024 • dateName: The name of the date branch subdirectory. Default = ‘date’ • indexName: The name of the index branch subdirectory. Default = ‘name’ • jsonSuffix: the suffix of the json crash file. Default = ‘.json’ • dumpSuffix: the suffix of the dump crash file. Default = ‘.dump’ • dateBranch: The full path to the date branch • nameBranch: The full path to the index branch • dumpPermissions: The permissions for the crash files. Default = 660 • dirPermissions: The permissions for the directories holding crash files. Default = 770 • dumpGID: The group ID for the directories and crash files. Default: Owned by the owner of the running script. • toNameFromDate: The relative path from a leaf of the dateBranch to the nameBranch • toDateFromName: The relative path from a leaf of the nameBranch to the dateBranch • minutesPerSlot: How many minutes in each sub-hour slot. Default = 5 • slotRange: A precalculated range of slot edges = range(self.minutesPerSlot, 60, self.minutesPerSlot)
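A short usage sketch of the class, based on the method signatures above; the root path, uuid, and file contents are illustrative:

import socorro.lib.JsonDumpStorage as jds

storage = jds.JsonDumpStorage(root='/var/socorro/crashes')

uuid = '22adfb61-f75b-11dc-b6be-001321b0783d'
# newEntry creates the name/date directories, the two links, and
# returns the json and dump files open for writing
json_file, dump_file = storage.newEntry(uuid, webheadHostName='webhead01')
try:
    json_file.write('{"ProductName": "SomeProduct"}')
    dump_file.write('...minidump bytes...')
finally:
    json_file.close()
    dump_file.close()

# later: direct lookup by uuid on the name branch
json_path = storage.getJson(uuid)   # raises OSError if the file is missing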

12.10 Processed Dump Storage

Processed dumps are stored in two places: the relational database and flat files within a file system. This forking of the storage scheme came from the realization that the infrequently used data within the database 'dumps' tables was causing performance problems within PostgreSQL. The 'dumps' tables took nearly eighty percent of the total storage, making replication and backup problematic. Since the 'dumps' table's data is used only when a user requests a specific crash dump by uuid, most of the data is rarely, if ever, accessed. We decided to migrate these dumps into file system storage outside the database. Details can be seen at: Dumping Dump Tables. In the file system, after processing, dumps are stored in a gzip-compressed JSON file format. This format echoes a flattening of the 'reports', 'extensions' and the now deprecated 'dumps' tables within the database.

12.10.1 Directory Structure

Just as in the JsonDumpStorage scheme, there are two branches: ‘name’ and ‘date’

12.10.2 Access by Name

Most lookups of processed crash data happen by name. We use a radix storage technique where the first 4 characters of the file name are used for two levels of directory names. A file called aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz would be found in the file system as .../aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz

12.10.3 Access by Date

For the purposes of finding crashes that happened at a specific date and time, a hierarchy of date directories offers quick lookup. The leaves of the date directories contain symbolic links to the locations of crash data.


12.10.4 JSON File Format

Example:

{"signature": "nsThread::ProcessNextEvent(int, int*)", "uuid": "aabbf9cb-395b-47e8-9600-4f20e2090331", "date_processed": "2009-03-31 14:45:09.215601", "install_age": 100113, "uptime": 7, "last_crash": 95113, "product": "SomeProduct", "version": "3.5.2", "build_id": "20090223121634", "branch": "1.9.1", "os_name": "Mac OS X", "os_version": "10.5.6 9G55", "cpu_name": "x86", "cpu_info": "GenuineIntel family 6 model 15 stepping 6", "crash_reason": "EXC_BAD_ACCESS / KERN_INVALID_ADDRESS", "crash_address": "0xe9b246", "User Comments": "This thing crashed.\nHelp me Kirk.", "app_notes": "", "success": true, "truncated": false, "processor_notes": "", "distributor":"", "distributor_version": "", "add-ons": [["{ABDE892B-13A8-4d1b-88E6-365A6E755758}", "1.0"], ["{b2e293ee-fd7e-4c71-a714-5f4750d8d7b7}", "2.2.0.9"], ["{972ce4c6-7e08-4474-a285-3208198ce6fd}", "3.5.2"]], "dump":"OS|Mac OS X|10.5.6 9G55\\nCPU|x86|GenuineIntel family 6 model 15 stepping 6|2\\nCrash|EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE|0x1558c095|0\\n Module|firefox-bin||firefox-bin|988FA8BFC789C4C07C32D61867BB42B60|0x00001000|0x00001fff|\\n..... "}

The "dump" component is the direct streamed output from the Breakpad "minidump_stackwalk" program. Unfortunately, that project does not give detailed documentation of the format.
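Reading one of these files back is straightforward; a minimal sketch, with the path and key taken from the example above:

import gzip
try:
    import json                   # Python 2.6+
except ImportError:
    import simplejson as json     # earlier Pythons

def load_processed_crash(jsonz_path):
    """Decompress and parse a processed crash file."""
    f = gzip.open(jsonz_path, 'rb')
    try:
        return json.loads(f.read())
    finally:
        f.close()

crash = load_processed_crash('aa/bb/aabbf9cb-395b-47e8-9600-4f20e2090331.jsonz')
print crash['signature']          # nsThread::ProcessNextEvent(int, int*)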

12.11 Report Database Design

12.11.1 Introduction

With the launch of the [[MeanTimeBeforeFailure]] and Top Crashers By URL reports, we have added 8 new database tables. They fall into the following categories:

• configuration
– mtbfconfig
– tcbyurlconfig
• facts
– mtbffacts
– topcrashurlfacts
• dimensions
– productdims
– urldims


– signaturedims
• relational
– topcrashurlfactsreports

What, relational? Aren't they all?

12.11.2 Star Schema

Taking inspiration from data warehousing, we implement the datastore with dimensional modeling instead of relational modeling. The pattern used is the star schema. Our implementation is a very lightweight approach, as we don't automatically generate facts for every combination of dimensions. This is not a Pentaho competitor :) Star schemas are optimized for:

• read-only systems
• large amounts of data
• viewing from different levels of granularity

12.11.3 Pattern

The dimensions and facts are the heart of the pattern.

dimensions

Each dimension is a property with various attributes and values at different levels of granularity. Example: the urldims table would have the columns:

id
domain
url

Sample values:

1. en-us.www.mozilla.com, ALL
2. http://en-us.www.mozilla.com/en-US/firefox/3.0.5/whatsnew/
3. en-us.www.mozilla.com, http://en-us.www.mozilla.com/en-US/firefox/features/

We see a dimension that describes the property "url". This is useful for talking about crashes that happen on a specific url. We also see two levels of granularity: a specific URL, as well as all urls under a domain. Dimensions give us ways to slice and dice aggregate crash data, then drill down or roll up this information.

Note: time could be a dimension (and usually is in data warehouses). For MTBF and Top Crash By URL we don't treat it as a 1st class dimension, as there are no requirements to roll it up (say to Q1 crashes, etc.), and having it be a column in the facts table provides better performance.

facts

A given report will be powered by a main facts table. Example: the topcrashurlfacts table would have the columns:

id
count
rank
day
productdims_id
urldims_id
signaturedims_id

A top crashers by url fact has two key elements: an aggregate crash count and the rank relative to other facts. So if we have static values for all dimensions and the day, then we can see who has the most crashes.

Reporting

The general pattern of creating a report is: for a series of static dimensions and one or two variable dimensions, display the facts that meet these criteria. A sketch of this query pattern is below.
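As an illustration of the pattern against the tables described above, a sketch using psycopg2; the table and column names follow this section's examples, but the query itself is not from the Socorro codebase:

import psycopg2

# Hold the product dimension and day static, vary the url dimension,
# and read the ranked facts.
TOP_CRASH_URLS_SQL = """
    SELECT u.url, f.count, f.rank
      FROM topcrashurlfacts f
      JOIN urldims u ON u.id = f.urldims_id
      JOIN productdims p ON p.id = f.productdims_id
     WHERE p.product = %s AND p.version = %s AND f.day = %s
     ORDER BY f.rank
     LIMIT 50
"""

def top_crash_urls(connection, product, version, day):
    cursor = connection.cursor()
    cursor.execute(TOP_CRASH_URLS_SQL, (product, version, day))
    return cursor.fetchall()

Note that a real report would also need to handle the signature dimension (each fact row carries a signaturedims_id) and the per-OS rows in productdims.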

12.12 Code and Database Update

12.12.1 Socorro Wish List

One of my (griswolf) directives is approximately "make everything work efficiently and the same." Toward this end, there are several tasks. Probably most important, we have an inefficient database design, and some inefficient code working with it. Next, we have a collection of 'one-off' code (and database schemas) that could be more easily maintained using a common infrastructure, common coding conventions, a common schema layout, and common patterns. Finally, we have enhancement requests that would become more feasible after such changes: such requests would be more easily handled in a cleaner programming environment, and in a cleaner environment there might be fewer significant bugs, leaving more time to work on enhancements. Current state: see Database Schema.

12.12.2 Another Way to do Materialized Views?

The current system is somewhere between ad hoc reporting and a star architecture. The main part of this proposal focuses on converting further toward a star architecture. However there may be another way: MapReduce techniques, which could possibly be run external to Mozilla (for instance: Amazon Web Services) could be used to mine dump files and create statistical data stored in files or database. Lars mentioned to me that we now have some statistics folk on board who are interested in this.

12.12.3 Database Design

• There are some legacy tables (reports, topcrasher) that are not normalized. Other tables are partly normalized. Non-normal has consequences:

– Data is duplicated, causing possible synchronization issues.

* JOSH: duplicated data is normal for materialized views and is not a problem a priori.
– Data is duplicated, increasing size.

* JOSH: I don’t believe that the matview tables are that large, although we will want to look at partitioning them in the future because they will continue to grow.

* FRANK: Lars points out that size-limiting partitions which reference each other must all be partitioned on the same key. This makes partitions a little more interesting

12.12. Code and Database Update 99 Socorro Documentation, Release 2

– SELECT statements on multiple varchar fields, even when indexed, are probably slower than SELECT statements on a single foreign key. (And even if not, maintaining larger index tables has a time and space cost.)
• There are legacy tables that contain deprecated columns, a slight inefficiency.
• In some cases, separable details are conflated, making it difficult to access by a single area of concern. For instance, the table that describes our products has an os_name column, requiring us to pretend we deal with an os named 'ALL' in order to examine product data without regard to os.
• According to PostgreSQL consultants, some types are not as efficient as others. For example, TEXT (which we use only a little) is slightly more time-efficient than VARCHAR(n) (which we mostly use).

– JOSH: this is a minor issue, and should only be changed if we're modifying the fields/tables anyway.
– FRANK: We have already run into a size limitation for signatures, which are now VARCHAR(255). Experiment shows that conversion to TEXT is slow because of index rebuilding, but conversion to VARCHAR(BIGGER_NUMBER) can be done by manipulating typemod (the number of chars in VARCHAR) in the system tables. So a change from VARCHAR to TEXT needs to be scheduled in advance, with an expected 'long' turnaround.
• Current indexes were carefully audited during PGExperts week. Schema changes will require careful reevaluation.

12.12.4 Commonality

• Some of the tables that provide statistics (Mean Time Before Failure, for example) use a variant of the “Star” data warehousing pattern, which is well known and understood. Some do not. After discussion we have reached agreement that all should be partly ‘starred’

– osdims and productdims are appropriate dimension tables for each view that cares about operating system or product
– url and signature 'dimension' tables are used to filter materialized views:

* the 'fact' tables for views will use ids from these filter/dimension tables
* the filter/dimension tables will hold only data that has passed a particular frequency threshold; initial guess at the threshold: 3 per week

• Python code has been written by a variety of people with various skill levels, doing things in a variety of ways. Mostly, this is acceptable, but required changes give us an opportunity.
• We now specify Python version 2.4, which is adequate. It is possible to upgrade to 2.5.x or 2.6.x with both ease and safety. This is an opportunity to do so. No code needs to change for this.
• New features (safely) available in Python 2.5:
– unified try/except/finally: instead of a try/finally block holding a try/except block
– there is a very nice with: syntax useful for block-scoped non-GC'd resources such as open files (like try: with an automatic finally: at block end)
– generators are significantly more powerful, which might have some uses in our code
– and lots more that seems less obviously useful to Socorro
– better exception hierarchy
• New features (safely) available in Python 2.6:
– the json library ships with Python 2.6
– the multiprocessing library, parallel to the threading library, ships with Python 2.6


– The command line option '-3' flags things that will work differently or fail in Python 3 (looking ahead is good)
• We use nosetests, which is not correctly and fully functional in a Python 2.4 environment.

12.12.5 Viewable Interface

• We have been gradually providing a more useful view of the crash data. Sometimes this is intrinsically hard; sometimes it is made more difficult by our schema.
• We have requests for:
– Better linkage between crash reports and bugs
– Ability to view by OS and OS version, by signature, by product, by product version (some of this will be easier with a new schema)
– Ability to view historical data, current data, (sliding) windows of data, and trends
• Some of the requests seem likely to be too time or space costly. In some cases these might be feasible with a more efficient system.

12.12.6 Consequences of Possible Changes

• (Only) Add new tables (two kinds of changes):
– "replace in place", for instance add table reports_normal while leaving table reports in place
– "brand new", for instance add new productdims and osdims tables to serve a new topcrashbysignature table
– Existing views are not impacted (for good or ill)
– Duplication of data (some tables near normal form, some not, etc.) becomes worse than it now is
– No immediate need to migrate data. Options:

* Maybe provide two views: "Historic" and "Current"
* Maybe write 'orrible look-both-ways code to access both tables from a single view
* Maybe migrate data
– Code that looks at the old schema is (mostly?) unchanged
– Code that looks at the new schema is an opportunity for improved design, etc.
– Can do one thing at a time, with multiple 'easy' rollouts (each one is still a rollout, though)
– Long term goal: stop using old tables and code
• (Only) Drop redundant or deprecated columns in existing tables:
– Existing views are no less useful; Viewer and Controller code will need some maintenance
– Data migration is 'simple'

* beware that dropped columns may be part of a (foreign) key or index
– Data migration is needed at rollout
– Minimally useful
• Optimize database types, indexes, keys:


– Existing views are not much impacted
* May want to optimize queries in Viewer and Controller code
* May need to guard for field size or type in Controller code
– Details of changes are 'picky' and may need some hand-holding by consultants, and maybe testing.
• Normalize existing tables (while adding new tables as needed):
– Much existing code needs a rewrite
* With a different Model comes a need for different Viewers and Controllers
* Opportunity to clarify old code
* Opportunity to optimize queries
– Data migration is needed at rollout
– Rollout is complex (but need only one for complete conversion)
– JOSH: in general, Matview generation should be optimized to be insert-only. In some cases, this will involve having a "current week" partition which gets dropped and recreated until the current week is completed. Updates are generally at least 4x as expensive as inserts. (See the sketch below.)
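A minimal sketch of that drop-and-recreate pattern, assuming a hypothetical aggregate table name (illustrative only, not the production job):

-- Rebuild the still-open week wholesale, so the aggregation job only
-- ever INSERTs; completed weeks are never touched again.
BEGIN;
DROP TABLE IF EXISTS top_crashes_current_week;  -- hypothetical name
CREATE TABLE top_crashes_current_week AS
    SELECT signature, count(*) AS count
    FROM reports
    WHERE date_processed >= date_trunc('week', now()::timestamp)
    GROUP BY signature;
COMMIT;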

12.12.7 Rough plan as of 2009 June

• Soon: Materialized views will make use of dimensions and 'filtered dimensions' tables.
• Later: Normalize the 'raw' data to make use of tables describing operating system and product details. Leave signatures and urls raw.

12.12.8 Specific Database Changes

Star Data Warehousing
Existing tables:
• –dimension: signaturedims: associate the base crash signature string with an id– Use signature TEXT directly
• dimension: productdims: associate a product, version, release and os_name with an id
– os_name is neither sufficient for os drill-down (which wants os_version) nor properly part of a product dimension
• dimension: urldims: associate (a large number of) domains and urls, each pair with an id
• config: mtbfconfig: specifies the date interval during which a given product (productdims) is of interest for MTBF analysis
• config: tcbyurlconfig: specifies whether a particular product (productdims) is now of interest for Top Crash by URL analysis
• fact: mtbffacts: collects daily summary of average time before failure for each product
• –report: topcrashurlfactsreports: associates a crash uuid and a comment with a row of topcrashurlfacts. ?Apparently never used?–
Needed/Changed tables
Matview changes "Soon"


• config (new): product_visibility: specifies the date interval during which a product (productdims id) is of interest for any view. ?Replaces mtbfconfig?
• dimension (new): osdims: associate an os name and os version with an id
• dimension (edit): productdims: remove the os_name column (replaced by the osdims dimension above)
• fact (replace): topcrashers: the table now in use to provide the Top Crash by Signature view. Will be replaced by topcrashfacts
• fact (new): topcrashfacts: collect periodic count of crashes, average uptime before crash, and rank of each signature by signature, os, product

– replaces existing topcrashers table, which is poorly organized for current needs
• config (new): tcbysignatureconfig: specify which products and operating systems are currently of interest for tcbysigfacts
• fact (renamed, edit): top_crashes_by_url: collects daily summary of crashes by product, url (productdims, urldims)
• fact (new): top_crashes_by_url_signature: associates a given row from top_crashes_by_url with one or more signatures
Incoming (raw) changes "Later"
• details (new): osdetails, parallel to osdims, but on the incoming side; will be implemented later
• details (new): productdetails, parallel to productdims, but on the incoming side; will be implemented later
• reports: holds details of each analyzed crash report. It is not in normal form, which causes some ongoing difficulty

– columns product, version, build should be replaced by a productdetails foreign key later
– column signature: LARS: NULL is a legal value here. We'll have to make sure that we use left outer joins to retrieve the report records.
– columns cpu_name, cpu_info are not currently in use in any other table, but could be a foreign key into cpudims
– columns os_name, os_version should be replaced by an osdims foreign key
– columns email, user_id are deprecated and should be dropped
Details
New or significantly changed tables
New product_visibility table (soon, matview):
table product_visibility (
  id serial NOT NULL PRIMARY KEY,
  productdims_id integer not null,
  start_date timestamp, -- used by MTBF
  end_date timestamp,
  ignore boolean default False -- force aggregation off for this product id
);

New osdims table (soon, matview) NOTE: data available only if 'recently frequent':
table osdims (
  id serial NOT NULL PRIMARY KEY,
  os_name TEXT NOT NULL,
  os_version TEXT
);
constraint osdims_key (os_name, os_version) unique (os_name, os_version);


Edited productdims table (soon, matview) NOTE: the use case for adding products is under discussion:
CREATE TYPE release_enum AS ENUM ('major', 'milestone', 'development');
table productdims (
  id serial NOT NULL PRIMARY KEY,
  product TEXT NOT NULL,
  version TEXT NOT NULL,
  release release_enum NOT NULL,
  constraint productdims_key (product, version) unique (product, version)
);

New product_details table (later, raw data) NOTE: all data will be stored (raw data should not lose details):
table product_details (
  id serial NOT NULL PRIMARY KEY,
  product TEXT NOT NULL,        -- /was/ character varying(30)
  version TEXT NOT NULL,        -- /was/ character varying(16)
  release release_enum NOT NULL -- /was/ character varying(50) NOT NULL
);

Edit mtbffacts to use the edited productdims and new osdims (soon, matview):
table mtbffacts (
  id serial NOT NULL PRIMARY KEY,
  avg_seconds integer NOT NULL,
  report_count integer NOT NULL,
  window_end timestamp, -- was DATE
  productdims_id integer,
  osdims_id integer,
  constraint mtbffacts_key unique (productdims_id, osdims_id, day)
);

New top_crashes_by_signature table (soon, matview):
table top_crashes_by_signature (
  id serial NOT NULL PRIMARY KEY,
  count integer NOT NULL DEFAULT 0,
  average_uptime real DEFAULT 0.0,
  window_end timestamp without time zone,
  window_size interval,
  productdims_id integer NOT NULL, -- foreign key. NOTE: filtered by recent frequency
  osdims_id integer NOT NULL,      -- foreign key. NOTE: filtered by recent frequency
  signature TEXT,
  constraint top_crash_by_signature_key (window_end, signature, productdims_id, osdims_id) unique (window_end, signature, productdims_id, osdims_id)
); -- some INDEXes are surely needed

New/Renamed top_crashes_by_url table (soon, matview):
table top_crashes_by_url (
  id serial NOT NULL,
  count integer NOT NULL,
  window_end timestamp without time zone NOT NULL,
  window_size interval not null,
  productdims_id integer,
  osdims_id integer NOT NULL,
  urldims_id integer,
  constraint top_crashes_by_url_key (urldims_id, osdims_id, productdims_id, window_end) unique (urldims_id, osdims_id, productdims_id, window_end)
);

New top_crashes_by_url_signature (soon, matview):


table top_crashes_by_url_signature (
  top_crashes_by_url_id integer, -- foreign key
  count integer NOT NULL,
  signature TEXT NOT NULL,
  constraint top_crashes_by_url_signature_key (top_crashes_by_url_id, signature) unique (top_crashes_by_url_id, signature)
);

New crash_reports table (later, raw view). Replaces the reports table:
table crash_reports (
  id serial NOT NULL PRIMARY KEY,
  uuid TEXT NOT NULL,       -- /was/ character varying(50)
  client_crash_date timestamp with time zone,
  install_age integer,
  last_crash integer,
  uptime integer,
  cpu_name TEXT,            -- /was/ character varying(100)
  cpu_info TEXT,            -- /was/ character varying(100)
  reason TEXT,              -- /was/ character varying(255)
  address TEXT,             -- /was/ character varying(20)
  build_date timestamp without time zone,
  started_datetime timestamp without time zone,
  completed_datetime timestamp without time zone,
  date_processed timestamp without time zone,
  success boolean,
  truncated boolean,
  processor_notes TEXT,
  user_comments TEXT,       -- /was/ character varying(1024)
  app_notes TEXT,           -- /was/ character varying(1024)
  distributor TEXT,         -- /was/ character varying(20)
  distributor_version TEXT, -- /was/ character varying(20)
  signature TEXT,
  productdims_id INTEGER,   -- /new/ foreign key NOTE filtered by recent frequency
  osdims_id INTEGER,        -- /new/ foreign key NOTE filtered by recent frequency
  urldims_id INTEGER        -- /new/ foreign key NOTE filtered by recent frequency
  -- /remove - see productdims_id/ product character varying(30)
  -- /remove - see productdims_id/ version character varying(16)
  -- /remove - redundant with build_date/ build character varying(30)
  -- /remove - see urldims_id/ url character varying(255)
  -- /remove - see osdims_id/ os_name character varying(100)
  -- /remove - see osdims_id/ os_version character varying(100)
  -- /remove - deprecated/ email character varying(100)
  -- /remove - deprecated/ user_id character varying(50)
); -- This is a partitioned table: INDEXes are provided on date-based partitions

Tables with minor changes (varchar -> text):
table branches (
  product TEXT NOT NULL, -- /was/ character varying(30)
  version TEXT NOT NULL, -- /was/ character varying(16)
  branch TEXT NOT NULL,  -- /was/ character varying(24)
  PRIMARY KEY (product, version)
);
table extensions (
  report_id integer NOT NULL, -- foreign key
  date_processed timestamp without time zone,
  extension_key integer NOT NULL,
  extension_id TEXT NOT NULL, -- /was/ character varying(100)
  extension_version TEXT      -- /was/ character varying(16)
);
table frames (
  report_id integer NOT NULL,
  date_processed timestamp without time zone,
  frame_num INTEGER NOT NULL,
  signature TEXT -- /was/ varchar(255)
);
table priority_jobs (
  uuid TEXT NOT NULL PRIMARY KEY -- /was/ varchar(255)
);
table processors (
  id serial NOT NULL PRIMARY KEY,
  name TEXT NOT NULL UNIQUE, -- /was/ varchar(255)
  startdatetime timestamp without time zone NOT NULL,
  lastseendatetime timestamp without time zone
);
table jobs (
  id serial NOT NULL PRIMARY KEY,
  pathname TEXT NOT NULL,    -- /was/ character varying(1024)
  uuid TEXT NOT NULL UNIQUE, -- /was/ varchar(50)
  owner integer,
  priority integer DEFAULT 0,
  queueddatetime timestamp without time zone,
  starteddatetime timestamp without time zone,
  completeddatetime timestamp without time zone,
  success boolean,
  message TEXT,
  FOREIGN KEY (owner) REFERENCES processors (id)
);
table urldims (
  id serial NOT NULL PRIMARY KEY,
  domain TEXT NOT NULL, -- /was/ character varying(255)
  url TEXT NOT NULL     -- /was/ character varying(255)
  key url    -- for drilling by url
  key domain -- for drilling by domain
);
table topcrashurlfactsreports (
  id serial NOT NULL PRIMARY KEY,
  uuid TEXT NOT NULL, -- /was/ character varying(50)
  comments TEXT,      -- /was/ character varying(500)
  topcrashurlfacts_id integer
);

12.13 Out-of-Date Data Warning

While portions of this document are still relevant and interesting for current Socorro usage, be aware that it is extremely out of date compared to the current schema.


12.14 Database Schema

12.14.1 Introduction

Socorro is married to the PostgreSQL database: it makes use of a significant number of PostgreSQL and psycopg2 (Python) features and extensions. Making a database-neutral API has been explored and, for now, is not being pursued. The tables can be divided into three major categories: crash data, aggregate reporting, and process control.

12.14.2 crash data

12.14.3 reports

This table participates in DatabasePartitioning. Holds a lot of data about each crash report:
Table "reports"
Column              | Type                        | Modifiers       | Description
--------------------+-----------------------------+-----------------+------------
id                  | integer                     | not null serial | unique id
client_crash_date   | timestamp with time zone    |                 | as reported by client
date_processed      | timestamp without time zone |                 | when entered into jobs table
uuid                | character varying(50)       | not null        | unique tag for job
product             | character varying(30)       |                 | name of product ("Firefox")
version             | character varying(16)       |                 | version of product ("3.0.6")
build               | character varying(30)       |                 | build of product ("2009041522")
signature           | character varying(255)      |                 | signature of 'top' frame of crash
url                 | character varying(255)      |                 | associated with crash
install_age         | integer                     |                 | in seconds since installed
last_crash          | integer                     |                 | in seconds since last crash
uptime              | integer                     |                 | in seconds since recent start
cpu_name            | character varying(100)      |                 | as reported by client ("x86")
cpu_info            | character varying(100)      |                 | as reported by client ("GenuineIntel family 15 model 4 stepping 1")
reason              | character varying(255)      |                 | as reported by client
address             | character varying(20)       |                 | memory address
os_name             | character varying(100)      |                 | name of os ("Windows NT")
os_version          | character varying(100)      |                 | version of os ("5.1.2600 Service Pack 3")
email               | character varying(100)      |                 | deprecated
build_date          | timestamp without time zone |                 | product build date (column build has same info, different format)
user_id             | character varying(50)       |                 | deprecated
started_datetime    | timestamp without time zone |                 | when processor starts processing report
completed_datetime  | timestamp without time zone |                 | when processor finishes processing report
success             | boolean                     |                 | whether finish was good
truncated           | boolean                     |                 | whether some dump data was removed
processor_notes     | text                        |                 | error messages during monitor processing of report
user_comments       | character varying(1024)     |                 | if any, by user
app_notes           | character varying(1024)     |                 | arbitrary, sent by client (exception detail, etc.)
distributor         | character varying(20)       |                 | future use: "Linux distro"
distributor_version | character varying(20)       |                 | future use: "Linux distro version"

Partitioned Child Table Indexes:
"reports_aDate_pkey" PRIMARY KEY, btree (id)
"reports_aDate_unique_uuid" UNIQUE, btree (uuid)
"reports_aDate_date_processed_key" btree (date_processed)
"reports_aDate_product_version_key" btree (product, version)
"reports_aDate_signature_date_processed_key" btree (signature, date_processed)
"reports_aDate_signature_key" btree (signature)
"reports_aDate_url_key" btree (url)
"reports_aDate_uuid_key" btree (uuid)
Check constraints:
"reports_aDate_date_check" CHECK (aDate::timestamp without time zone <= date_processed AND date_processed < aDate+WEEK::timestamp without time zone)
Inherits: reports

12.14.4 dumps

This table is deprecated (dump data is stored in the file system); see [[DumpingDumpTables]] for more information.

12.14.5 branches

This table has been replaced by a view of productdims: CREATE VIEW branches AS SELECT product,version,branch FROM productdims;

12.14.6 extensions

This table participates in [[DatabasePartitioning]]. Holds data about which extensions are associated with a given report:
Table "extensions"
Column            | Type                        | Modifiers | Description
------------------+-----------------------------+-----------+------------
report_id         | integer                     | not null  | in child: foreign key reference to child of table 'reports'
date_processed    | timestamp without time zone |           | set to time when the row is inserted
extension_key     | integer                     | not null  | the name of this extension
extension_id      | character varying(100)      | not null  | the id of this extension
extension_version | character varying(30)       |           | the version of this extension

Partitioned Child Table Indexes:
"extensions_aDate_pkey" PRIMARY KEY, btree (report_id)
"extensions_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
"extensions_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
"extensions_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: extensions

12.14.7 frames

This table participates in [[DatabasePartitioning]]. Holds data about the frames in the dump associated with a particular report:
Table "frames"
Column         | Type                        | Modifiers | Description
---------------+-----------------------------+-----------+------------
report_id      | integer                     | not null  | in child: foreign key reference to child of table reports
date_processed | timestamp without time zone |           | set to time when the row is inserted (?)
frame_num      | integer                     | not null  | ordinal: one row per stack-frame per report, from 0=top
signature      | character varying(255)      |           | signature as returned by minidump_stackwalk

Partitioned Child Table Indexes:
"frames_aDate_pkey" PRIMARY KEY, btree (report_id, frame_num)
"frames_aDate_report_id_date_key" btree (report_id, date_processed)
Check constraints:
"frames_aDate_date_check" CHECK ('aDate'::timestamp without time zone <= date_processed AND date_processed < 'aDate+WEEK'::timestamp without time zone)
Foreign-key constraints:
"frames_aDate_report_id_fkey" FOREIGN KEY (report_id) REFERENCES reports_aDate(id) ON DELETE CASCADE
Inherits: frames

Aggregate Reporting

.. image:: SocorroSchema.Aggregate.20090722.png

12.14.8 productdims

Dimension table that describes the product, version, gecko version ('branch') and type of release. Note that the release string is completely determined by the version string: a version like 'X.Y.Z' is 'major', a version with suffix 'pre' is 'development', and a version with 'a' or 'b' (alpha or beta) is 'milestone'. Note: the current version does not conflate os details (see osdims):
Table productdims
Column  | Type         | Modifiers | Description
--------+--------------+-----------+------------
id      | integer      | (serial)  |
product | text         | not null  |
version | text         | not null  |
branch  | text         | not null  | gecko version
release | release_enum |           | 'major', 'milestone', 'development'
Indexes:
"productdims_pkey1" PRIMARY KEY, btree (id)
"productdims_product_version_key" UNIQUE, btree (product, version)
"productdims_release_key" btree (release)
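The version-to-release rule described above can be sketched as a SQL CASE expression; this is illustrative only, not the production logic (a version carrying both markers, e.g. '3.1b1pre', classifies as 'development' because 'pre' is checked first):

-- Derive the release label from the version string per the rule above.
SELECT version,
       CASE
           WHEN version LIKE '%pre' THEN 'development'
           WHEN version ~ '[ab]'    THEN 'milestone'
           ELSE 'major'
       END AS release
FROM productdims;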

12.14.9 osdims

Dimension table that describes an operating system name and version. Because there are so many very similar Linux versions, the data saved here is simplified, which allows many different 'detailed version' Linuxen to share the same row in this table:
Table osdims
Column     | Type                   | Modifiers | Description
-----------+------------------------+-----------+------------
id         | integer                | (serial)  |
os_name    | character varying(100) |           |
os_version | character varying(100) |           |
Indexes:
"osdims_pkey" PRIMARY KEY, btree (id)
"osdims_name_version_key" btree (os_name, os_version)


12.14.10 product_visibility

Specifies the date interval during which a given product (productdims_id is the foreign key) is of interest for aggregate analysis. MTBF obeys start_date, but calculates its own end date as 60 days later. The top crash by (url|signature) tables obey both start_date and end_date. Column ignore is a boolean, default false, which allows a product version to be quickly turned off. Note: supersedes mtbfconfig and tcbyurlconfig. (MTBF is not now in use.)
Table product_visibility
Column         | Type                        | Modifiers     | Description
---------------+-----------------------------+---------------+------------
productdims_id | integer                     | not null      |
start_date     | timestamp without time zone |               |
end_date       | timestamp without time zone |               |
ignore         | boolean                     | default false |
Indexes:
"product_visibility_pkey" PRIMARY KEY, btree (productdims_id)
"product_visibility_end_date" btree (end_date)
"product_visibility_start_date" btree (start_date)
Foreign-key constraints:
"product_visibility_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
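A hedged sketch of the visibility rules above, combining the 60-day MTBF rule with the ignore flag (an illustrative query, not production code):

-- Effective MTBF window per visible product: start_date plus 60 days,
-- skipping products that have been switched off via ignore.
SELECT productdims_id,
       start_date,
       start_date + INTERVAL '60 days' AS mtbf_end_date
FROM product_visibility
WHERE NOT ignore;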

12.14.11 time_before_failure

Collects a daily summary of the average (mean) time before failure for each product of interest, without regard to specific signature:
Table time_before_failure
Column             | Type                        | Modifiers | Description
-------------------+-----------------------------+-----------+------------
id                 | integer                     | (serial)  |
sum_uptime_seconds | double precision            | not null  |
report_count       | integer                     | not null  |
productdims_id     | integer                     |           |
osdims_id          | integer                     |           |
window_end         | timestamp without time zone | not null  |
window_size        | interval                    | not null  |
Indexes:
"time_before_failure_pkey" PRIMARY KEY, btree (id)
"time_before_failure_os_id_key" btree (osdims_id)
"time_before_failure_product_id_key" btree (productdims_id)
"time_before_failure_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
"time_before_failure_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
"time_before_failure_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
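Since the table stores sums rather than means, the mean must be recovered at query time; a minimal sketch:

-- Mean time before failure in seconds, one row per product/OS/window.
SELECT productdims_id, osdims_id, window_end,
       sum_uptime_seconds / report_count AS mean_seconds_before_failure
FROM time_before_failure
WHERE report_count > 0;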

12.14.12 top_crashes_by_signature

The "fact" table that associates signatures with crash statistics:
Table top_crashes_by_signature
Column         | Type                        | Modifiers          | Description
---------------+-----------------------------+--------------------+------------
id             | integer                     | (serial)           |
count          | integer                     | not null default 0 |
uptime         | real                        | default 0.0        |
signature      | text                        |                    |
productdims_id | integer                     |                    |
osdims_id      | integer                     |                    |
window_end     | timestamp without time zone | not null           |
window_size    | interval                    | not null           |
Indexes:
"top_crashes_by_signature_pkey" PRIMARY KEY, btree (id)
"top_crashes_by_signature_osdims_key" btree (osdims_id)
"top_crashes_by_signature_productdims_key" btree (productdims_id)
"top_crashes_by_signature_signature_key" btree (signature)
"top_crashes_by_signature_window_end_idx" btree (window_end DESC)
Foreign-key constraints:
"osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
"productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE

12.14.13 urldims

A dimensions table that associates a url and its domain with a particular id. For example, given the full url http://www.whatever.com/some/path?foo=bar&goo=car, the domain is the host name (www.whatever.com) and the url is everything before the query part (http://www.whatever.com/some/path):
Table "urldims"
Column | Type                   | Modifiers       | Description
-------+------------------------+-----------------+------------
id     | integer                | not null serial | unique id
domain | character varying(255) | not null        | the hostname
url    | character varying(255) | not null        | the url up to query
Indexes:
"urldims_pkey" PRIMARY KEY, btree (id)
"urldims_url_domain_key" UNIQUE, btree (url, domain)
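A hedged sketch of that decomposition in SQL (illustrative only; not the code the processors actually use):

-- Split a full URL into the (domain, url) pair described above.
SELECT substring(full_url FROM '^[a-z]+://([^/?#]+)') AS domain,
       split_part(full_url, '?', 1)                   AS url
FROM (VALUES ('http://www.whatever.com/some/path?foo=bar&goo=car'))
     AS t(full_url);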

12.14.14 top_crashes_by_url

The "fact" table that associates urls with crash statistics:
Table top_crashes_by_url
Column         | Type                        | Modifiers | Description
---------------+-----------------------------+-----------+------------
id             | integer                     | (serial)  |
count          | integer                     | not null  |
urldims_id     | integer                     |           |
productdims_id | integer                     |           |
osdims_id      | integer                     |           |
window_end     | timestamp without time zone | not null  |
window_size    | interval                    | not null  |
Indexes:
"top_crashes_by_url_pkey" PRIMARY KEY, btree (id)
"top_crashes_by_url_count_key" btree (count)
"top_crashes_by_url_osdims_key" btree (osdims_id)
"top_crashes_by_url_productdims_key" btree (productdims_id)
"top_crashes_by_url_urldims_key" btree (urldims_id)
"top_crashes_by_url_window_end_window_size_key" btree (window_end, window_size)
Foreign-key constraints:
"top_crashes_by_url_osdims_id_fkey" FOREIGN KEY (osdims_id) REFERENCES osdims(id) ON DELETE CASCADE
"top_crashes_by_url_productdims_id_fkey" FOREIGN KEY (productdims_id) REFERENCES productdims(id) ON DELETE CASCADE
"top_crashes_by_url_urldims_id_fkey" FOREIGN KEY (urldims_id) REFERENCES urldims(id) ON DELETE CASCADE

12.14.15 top_crashes_by_url_signature

Associates a count of each signature with a row in the top_crashes_by_url table:
Table top_crashes_by_url_signature
Column                | Type    | Modifiers | Description
----------------------+---------+-----------+------------
top_crashes_by_url_id | integer | not null  |
signature             | text    | not null  |
count                 | integer | not null  |
Indexes:
"top_crashes_by_url_signature_pkey" PRIMARY KEY, btree (top_crashes_by_url_id, signature)
Foreign-key constraints:
"top_crashes_by_url_signature_fkey" FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id) ON DELETE CASCADE

12.14.16 topcrashurlfactsreports

Associates a job uuid with comments and a row in the topcrashurlfacts table:
Table "topcrashurlfactsreports"
Column              | Type                   | Modifiers       | Description
--------------------+------------------------+-----------------+------------
id                  | integer                | not null serial | unique id
uuid                | character varying(50)  | not null        | job uuid string
comments            | character varying(500) |                 | ?programmer provided?
topcrashurlfacts_id | integer                |                 | crash statistics for a product, os, url, signature and day
Indexes:
"topcrashurlfactsreports_pkey" PRIMARY KEY, btree (id)
"topcrashurlfactsreports_topcrashurlfacts_id_key" btree (topcrashurlfacts_id)
Foreign-key constraints:
"topcrashurlfactsreports_topcrashurlfacts_id_fkey" FOREIGN KEY (topcrashurlfacts_id) REFERENCES topcrashurlfacts(id) ON DELETE CASCADE

12.14.17 alexa_topsites

Stores a weekly dump of the top 1,000 sites as measured by Alexa (csv):
Table "public.alexa_topsites"
Column       | Type                        | Modifiers
-------------+-----------------------------+------------------------
domain       | text                        | not null
rank         | integer                     | default 10000
last_updated | timestamp without time zone | not null default now()
Indexes:
"alexa_topsites_pkey" PRIMARY KEY, btree (domain)


12.15 Package

The applications that run the server are written in Python, and their source code is collected into a single package. There is currently no installation script for this package; it simply must be available somewhere on the PYTHONPATH.

12.15.1 Package Layout

• .../scripts : for socorro applications
• .../scripts/config : configuration for socorro applications
• .../socorro : python package root
• .../socorro/collector : modules used by the collector application
• .../socorro/cron : modules used by various applications intended to be run by cron
• .../socorro/database : modules associated with the relational database
• .../socorro/deferredcleanup : modules used by the deferred file system cleanup script
• .../socorro/integrationtest : for future use
• .../socorro/lib : common modules used throughout the system
• .../socorro/monitor : modules used by the monitor application
• .../socorro/processor : modules used by the processor application
• .../socorro/unittest : testing framework modules

12.16 Schema

(See bottom of page for inline graphic)

12.17 Tables used primarily when processing Jobs

Reports (Partitioned)
The reports table contains the 'cooked' data received from breakpad and abstracted. Data from this table is further transformed into 'materialized views' (see below). Reports is unchanged from the prior version:
CREATE TABLE reports (
    id serial NOT NULL PRIMARY KEY,
    client_crash_date timestamp with time zone,
    date_processed timestamp without time zone,
    uuid character varying(50) NOT NULL UNIQUE,
    product character varying(30),
    version character varying(16),
    build character varying(30),
    signature character varying(255),
    url character varying(255),
    install_age integer,
    last_crash integer,
    uptime integer,
    cpu_name character varying(100),
    cpu_info character varying(100),
    reason character varying(255),
    address character varying(20),
    os_name character varying(100),
    os_version character varying(100),
    email character varying(100),  -- Now always NULL or empty
    build_date timestamp without time zone,
    user_id character varying(50), -- Now always NULL or empty
    started_datetime timestamp without time zone,
    completed_datetime timestamp without time zone,
    success boolean,
    truncated boolean,
    processor_notes text,
    user_comments character varying(1024),
    app_notes character varying(1024),
    distributor character varying(20),
    distributor_version character varying(20)
);
Indices are on child/partition tables, not the base table:
index: date_processed
index: uuid
index: signature
index: url
index: (product, version)
index: (uuid, date_processed)
index: (signature, date_processed)

Processors
The processors table keeps track of the current state of the processors that pull things out of the file system and into the reports database. Processors is unchanged from the prior version:
CREATE TABLE processors (
    id serial NOT NULL PRIMARY KEY,
    name varchar(255) NOT NULL UNIQUE,
    startdatetime timestamp without time zone NOT NULL,
    lastseendatetime timestamp without time zone
);

Jobs
The jobs table holds data about jobs that are queued for the processors to handle. Jobs is unchanged from the prior version:
CREATE TABLE jobs (
    id serial NOT NULL PRIMARY KEY,
    pathname character varying(1024) NOT NULL,
    uuid varchar(50) NOT NULL UNIQUE,
    owner integer,
    priority integer DEFAULT 0,
    queueddatetime timestamp without time zone,
    starteddatetime timestamp without time zone,
    completeddatetime timestamp without time zone,
    success boolean,
    message text,
    FOREIGN KEY (owner) REFERENCES processors (id) ON DELETE CASCADE
);
index: owner
index: (owner, starteddatetime)
index: (completeddatetime, priority DESC)
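A hedged sketch of how a processor might pick its next queued job, leaning on the (owner, starteddatetime) index above; the processor id is a placeholder and this is not the actual monitor/processor code:

-- Next unstarted job assigned to processor 42 (hypothetical id),
-- highest priority first, oldest first within a priority.
SELECT id, pathname, uuid
FROM jobs
WHERE owner = 42
  AND starteddatetime IS NULL
ORDER BY priority DESC, queueddatetime
LIMIT 1;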

Priority Jobs
The priority jobs table is used to mark rows in the jobs table that need to be processed soon. Priority Jobs is unchanged from prior versions:
CREATE TABLE priorityjobs (
    uuid varchar(255) NOT NULL PRIMARY KEY
);

12.18 Tables primarily used during data extraction

Branches
The branches table associates a product and version with the gecko version (called 'branch'):
CREATE TABLE branches (
    product character varying(30) NOT NULL,
    version character varying(16) NOT NULL,
    branch character varying(24) NOT NULL
);

Extensions (Partitioned)
The extensions table associates a report with the extensions on the crashing application. Extensions is unchanged from the prior version. (Not now in use):
CREATE TABLE extensions (
    report_id integer NOT NULL, -- Foreign key references parallel reports partition(id)
    date_processed timestamp without time zone,
    extension_key integer NOT NULL,
    extension_id character varying(100) NOT NULL,
    extension_version character varying(16),
    FOREIGN KEY (report_id) REFERENCES reports_(id) ON DELETE CASCADE
);
The index is on child/partition tables, not the base table:
index: (report_id, date_processed)

Frames (Partitioned)
The frames table associates a report with the stack frames and their signatures that were seen in the crashing application. Frames is unchanged from the prior version:
CREATE TABLE frames (
    report_id integer NOT NULL,
    date_processed timestamp without time zone,
    frame_num integer NOT NULL,
    signature varchar(255),
    FOREIGN KEY (report_id) REFERENCES reports_(id) ON DELETE CASCADE
);
The index is on child/partition tables, not the base table:
index: (report_id, date_processed)

Plugins
Electrolysis support for out-of-process plugin crashes:


CREATE TABLE plugins (
    id serial NOT NULL PRIMARY KEY,
    filename TEXT NOT NULL,
    name TEXT NOT NULL,
    CONSTRAINT filename_name_key UNIQUE (filename, name)
);

Plugins_Reports (Partitioned)
Records OOPP (out-of-process plugin) details. A report has 0 or 1 entry in this table:
CREATE TABLE plugins_reports (
    report_id INTEGER NOT NULL,
    plugin_id INTEGER NOT NULL,
    date_processed TIMESTAMP WITHOUT TIME ZONE,
    version TEXT NOT NULL
);

Indices are on child/partition tables, not the base table; they are set up via schema.py. Example for plugins_reports_20100125:
PRIMARY KEY (report_id, plugin_id),
CONSTRAINT plugins_reports_20100125_report_id_fkey FOREIGN KEY (report_id)
    REFERENCES reports_20100125 (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_plugin_id_fkey FOREIGN KEY (plugin_id)
    REFERENCES plugins (id) ON DELETE CASCADE,
CONSTRAINT plugins_reports_20100125_date_check
    CHECK (('2010-01-25 00:00:00'::TIMESTAMP WITHOUT TIME ZONE <= date_processed)
           AND (date_processed < '2010-02-01 00:00:00'::TIMESTAMP WITHOUT TIME ZONE))

12.19 Tables primarily used for materialized views

product_visibility
Controls which products are subject to having data aggregated into the various materialized views. Replaces mtbfconfig and tcbyurlconfig:
CREATE TABLE product_visibility (
    productdims_id integer NOT NULL PRIMARY KEY,
    start_date timestamp, -- set this manually for all mat views
    end_date timestamp,   -- set this manually: used by mat views that care
    ignore boolean default False, -- force aggregation off for this product id
    FOREIGN KEY (productdims_id) REFERENCES productdims(id)
);
index: end_date
index: start_date

12.20 Dimensions tables

signaturedims
Signature dims was a table associating a signature with an id; it is no longer used. Instead, signatures are stored directly in the places that need them.

productdims
Product dims associates a product, version and release key. An enum is used for the release key. Product dims has changed from the prior version by dropping the os_name column, which has been promoted into its own osdims table:


CREATE TYPE release_enum AS ENUM ('major', 'milestone', 'development');

CREATE TABLE productdims (
    id serial NOT NULL PRIMARY KEY,
    product TEXT NOT NULL, -- varchar(30)
    version TEXT NOT NULL, -- varchar(16)
    release release_enum   -- 'major':x.y.z..., 'milestone':x.ypre, 'development':x.y[ab]z
);
unique index: (product, version)
index: release

osdims
OS dims associates an os name and version. Promoted from earlier versions where os_name was stored directly in 'facts' tables:
CREATE TABLE osdims (
    id serial NOT NULL PRIMARY KEY,
    os_name CHARACTER VARYING(100) NOT NULL,
    os_version CHARACTER VARYING(100)
);
index: (os_name, os_version)

urldims
URL dims associates a domain and a simplified url. URL dims is unchanged from the prior version:
CREATE TABLE urldims (
    id serial NOT NULL,
    domain character varying(255) NOT NULL,
    url character varying(255) NOT NULL
);
unique index: (url, domain)

12.21 View tables

View tables now have a uniform layout:
• id: the unique id for this row
• aggregated data: as appropriate for the view
• keys: one or more of signature, urldims id, productdims id, osdims id
• window_end: used to keep track of the most recently aggregated row
• window_size: used redundantly in case the aggregation window changes

time before failure
Aggregates the amount of time the app ran from startup to failure, and from the prior failure to the current failure. Replaces the mtbffacts table:
CREATE TABLE time_before_failure (
    id serial NOT NULL PRIMARY KEY,
    sum_uptime_seconds integer NOT NULL,
    report_count integer NOT NULL,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: (window_end, window_size)
index: productdims_id
index: osdims_id

top crashes by signature
Aggregates the number of crashes per unit of time associated with a particular stack signature. Replaces the topcrashers table:
CREATE TABLE top_crashes_by_signature (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL DEFAULT 0,
    uptime real DEFAULT 0.0,
    signature TEXT,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: productdims_id
index: osdims_id
index: signature
index: (window_end, window_size)

top crashes by url
Aggregates the number of crashes associated with a particular URL. Replaces the topcrashurlfacts table:
CREATE TABLE top_crashes_by_url (
    id serial NOT NULL PRIMARY KEY,
    count integer NOT NULL,
    urldims_id integer,
    productdims_id integer,
    osdims_id integer,
    window_end TIMESTAMP WITHOUT TIME ZONE NOT NULL,
    window_size INTERVAL NOT NULL,
    FOREIGN KEY (urldims_id) REFERENCES urldims(id),
    FOREIGN KEY (productdims_id) REFERENCES productdims(id),
    FOREIGN KEY (osdims_id) REFERENCES osdims(id)
);
index: count
index: urldims_id
index: productdims_id
index: osdims_id
index: (window_end, window_size)

top crashes by url signature
Associates top crashes by url with their signature(s). Promoted from the prior topcrashurlfacts, where a signaturedims id was stored directly. Use of this table allows multiple signatures to be associated with the same crashing url:
CREATE TABLE top_crashes_by_url_signature (
    top_crashes_by_url_id integer NOT NULL, -- foreign key
    signature TEXT NOT NULL,
    count integer NOT NULL,
    FOREIGN KEY (top_crashes_by_url_id) REFERENCES top_crashes_by_url(id)
);
primary key: (top_crashes_by_url_id, signature)

top crash url facts reports
Associates a crash uuid and comment with a particular top crash by url row. This table's schema is unchanged from the prior version, but the topcrashurlfacts_id column is re-purposed to map to the new top_crashes_by_url table:
CREATE TABLE topcrashurlfactsreports (
    id serial NOT NULL PRIMARY KEY,
    uuid character varying(50) NOT NULL,
    comments character varying(500),
    topcrashurlfacts_id integer,
    FOREIGN KEY (topcrashurlfacts_id) REFERENCES top_crashes_by_url(id)
);
index: topcrashurlfacts_id
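A hedged sketch of a "star" query against this uniform layout: top crashers for each product/OS over a one-day window, joining the fact table to its dimension tables (the dates are placeholders):

-- Top 20 signatures by crash count for the window ending 2009-06-02.
SELECT p.product, p.version, o.os_name, t.signature,
       sum(t.count) AS crashes
FROM top_crashes_by_signature t
JOIN productdims p ON p.id = t.productdims_id
JOIN osdims o      ON o.id = t.osdims_id
WHERE t.window_end > TIMESTAMP '2009-06-01'
  AND t.window_end <= TIMESTAMP '2009-06-02'
GROUP BY p.product, p.version, o.os_name, t.signature
ORDER BY crashes DESC
LIMIT 20;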

12.22 Bug tracking

bugs
Periodically extracts new and changed items from the bug tracking database. Bugs is recently added:
CREATE TABLE bugs (
    id int NOT NULL PRIMARY KEY,
    status text,
    resolution text,
    short_desc text
);

bug associations
Associates signatures with bug ids. Bug Associations is recently added:
CREATE TABLE bug_associations (
    signature text NOT NULL,
    bug_id int NOT NULL,
    FOREIGN KEY (bug_id) REFERENCES bugs(id)
);
primary key: (signature, bug_id)
index: bug_id

Nightly Builds
Stores nightly builds in Postgres:
CREATE TABLE builds (
    product text,
    version text,
    platform text,
    buildid BIGINT,
    changeset text,
    filename text,
    date timestamp without time zone default now(),
    CONSTRAINT builds_key UNIQUE (product, version, platform, buildid)
);


12.23 Meta data

Server status
The server status table keeps track of the current status of the job processors. Server status is unchanged from the prior version:
CREATE TABLE server_status (
    id serial NOT NULL PRIMARY KEY,
    date_recently_completed timestamp without time zone,
    date_oldest_job_queued timestamp without time zone,
    avg_process_sec real,
    avg_wait_sec real,
    waiting_job_count integer NOT NULL,
    processors_count integer NOT NULL,
    date_created timestamp without time zone NOT NULL
);
index: (date_created, id)
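A minimal sketch of reading this table: the most recent status snapshot, matching the (date_created, id) index above (illustrative only):

-- Latest processing-queue snapshot.
SELECT waiting_job_count, processors_count, avg_process_sec, avg_wait_sec
FROM server_status
ORDER BY date_created DESC, id DESC
LIMIT 1;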

12.24 Database Setup

This app is under development. For progress information see Bugzilla 454438. This is an application that will set up the PostgreSQL database schema for Socorro. It starts with an empty database and creates all the tables, indexes, constraints, stored procedures and triggers needed to run a Socorro instance. Before this application can be run, however, a regular user must have been set up for day-to-day operations. While it is not recommended that the regular user have the full set of superuser privileges, the regular user must be privileged enough to create tables within the database. Before the application that sets up the database can be run, the Common Config must be set up. The configuration file for this app itself is outlined at the end of this page.

12.24.1 Running the setupDatabase app

.../scripts/setupDatabase.py

12.24.2 Configuring setupDatabase app

This application relies on its own configuration file as well as the common configuration file (see Common Config). Copy the .../scripts/config/setupdatabaseconfig.py.dist file to .../scripts/config/setupdatabaseconfig.py and edit the file to make site-specific changes.

logFilePathname


Monitor can log its actions to a set of automatically rotating log files. This is the name and location of the logs:
logFilePathname = cm.Option()
logFilePathname.doc = 'full pathname for the log file'
logFilePathname.default = './monitor.log'

logFileMaximumSize
This is the maximum size in bytes allowed for a log file. Once this number is reached, the logs rotate and a new log is started:
logFileMaximumSize = cm.Option()
logFileMaximumSize.doc = 'maximum size in bytes of the log file'
logFileMaximumSize.default = 1000000

logFileMaximumBackupHistory
The maximum number of log files to keep:
logFileMaximumBackupHistory = cm.Option()
logFileMaximumBackupHistory.doc = 'maximum number of log files to keep'
logFileMaximumBackupHistory.default = 50

logFileLineFormatString
A Python format string that controls the format of individual lines in the logs:
logFileLineFormatString = cm.Option()
logFileLineFormatString.doc = 'python logging system format for log file entries'
logFileLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

logFileErrorLoggingLevel
Logging is done in severity levels - the lower the number, the more verbose the logs:
logFileErrorLoggingLevel = cm.Option()
logFileErrorLoggingLevel.doc = 'logging level for the log file (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
logFileErrorLoggingLevel.default = 10

stderrLineFormatString
In parallel with creating log files, Monitor can log to stderr. This is a Python format string that controls the format of individual lines sent to stderr:
stderrLineFormatString = cm.Option()
stderrLineFormatString.doc = 'python logging system format for logging to stderr'
stderrLineFormatString.default = '%(asctime)s %(levelname)s - %(message)s'

stderrErrorLoggingLevel
Logging to stderr is done in severity levels independently from the log file severity levels - the lower the number, the more verbose the output to stderr:
stderrErrorLoggingLevel = cm.Option()
stderrErrorLoggingLevel.doc = 'logging level for logging to stderr (10 - DEBUG, 20 - INFO, 30 - WARNING, 40 - ERROR, 50 - CRITICAL)'
stderrErrorLoggingLevel.default = 40

12.25 Common Config

To avoid repetition between the configurations of a half dozen independently running applications, common settings are consolidated in a common configuration file: .../scripts/config/commonconfig.py.dist.


All Socorro applications have these constants available to them. For Socorro applications that are command line driven, each of these default values can be overridden by a command line switch of the same name. To set up this configuration file, just copy the example, .../scripts/config/commonconfig.py.dist, to .../scripts/config/commonconfig.py. Edit the file for your local situation:
import socorro.lib.ConfigurationManager as cm
import datetime
import stat

#---------------------------------------------------------------------------
# Relational Database Section
databaseHost = cm.Option()
databaseHost.doc = 'the hostname of the database servers'
databaseHost.default = 'localhost'
databasePort = cm.Option()
databasePort.doc = 'the port of the database on the host'
databasePort.default = 5432
databaseName = cm.Option()
databaseName.doc = 'the name of the database within the server'
databaseName.default = ''
databaseUserName = cm.Option()
databaseUserName.doc = 'the user name for the database servers'
databaseUserName.default = ''
databasePassword = cm.Option()
databasePassword.doc = 'the password for the database user'
databasePassword.default = ''

#---------------------------------------------------------------------------
# Crash storage system
jsonFileSuffix = cm.Option()
jsonFileSuffix.doc = 'the suffix used to identify a json file'
jsonFileSuffix.default = '.json'
dumpFileSuffix = cm.Option()
dumpFileSuffix.doc = 'the suffix used to identify a dump file'
dumpFileSuffix.default = '.dump'

#---------------------------------------------------------------------------
# HBase storage system
hbaseHost = cm.Option()
hbaseHost.doc = 'Hostname for hbase hadoop cluster. May be a VIP or load balancer'
hbaseHost.default = 'localhost'
hbasePort = cm.Option()
hbasePort.doc = 'hbase port number'
hbasePort.default = 9090
hbaseTimeout = cm.Option()
hbaseTimeout.doc = 'timeout in milliseconds for an HBase connection'
hbaseTimeout.default = 5000

#---------------------------------------------------------------------------
# misc
processorCheckInTime = cm.Option()
processorCheckInTime.doc = 'the time after which a processor is considered dead (hh:mm:ss)'
processorCheckInTime.default = "00:05:00"
processorCheckInTime.fromStringConverter = lambda x: str(cm.timeDeltaConverter(x))
startWindow = cm.Option()
startWindow.doc = 'The start of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
startWindow.fromStringConverter = cm.dateTimeConverter
deltaWindow = cm.Option()
deltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
deltaWindow.fromStringConverter = cm.timeDeltaConverter
defaultDeltaWindow = cm.Option()
defaultDeltaWindow.doc = 'The length of the single aggregation window ([dd:]hh:mm:ss)'
defaultDeltaWindow.fromStringConverter = cm.timeDeltaConverter
# override this default for your particular cron task
defaultDeltaWindow.default = '00:12:00'
endWindow = cm.Option()
endWindow.doc = 'The end of the single aggregation window (YYYY-MM-DD [hh:mm:ss])'
endWindow.fromStringConverter = cm.dateTimeConverter
startDate = cm.Option()
startDate.doc = 'The start of the overall/outer aggregation window (YYYY-MM-DD [hh:mm])'
startDate.fromStringConverter = cm.dateTimeConverter
deltaDate = cm.Option()
deltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
deltaDate.fromStringConverter = cm.timeDeltaConverter
initialDeltaDate = cm.Option()
initialDeltaDate.doc = 'The length of the overall/outer aggregation window ([dd:]hh:mm:ss)'
initialDeltaDate.fromStringConverter = cm.timeDeltaConverter
# override this default for your particular cron task
initialDeltaDate.default = '4:00:00:00'
minutesPerSlot = cm.Option()
minutesPerSlot.doc = 'how many minutes per leaf directory in the date storage branch'
minutesPerSlot.default = 1
endDate = cm.Option()
endDate.doc = 'The end of the overall/outer aggregation window (YYYY-MM-DD [hh:mm:ss])'
endDate.fromStringConverter = cm.dateTimeConverter
debug = cm.Option()
debug.doc = 'do debug output and routines'
debug.default = False
debug.singleCharacter = 'D'
debug.fromStringConverter = cm.booleanConverter


12.26 Populate ElasticSearch

12.26.1 Install ElasticSearch

First you need to install ElasticSearch. The procedure is well described in this tutorial: Setting up elasticsearch. Don't bother configuring ES unless you know you will need it; it generally works just fine out of the box. Note: ElasticSearch is not yet included in our Vagrant dev VMs but should be sometime soon.

12.26.2 Increase open files limit

ElasticSearch needs to open a lot of files when indexing, often reaching the limits imposed by UNIX systems. To avoid errors when indexing, you will have to increase the limits imposed by your OS. First see which user is running ElasticSearch. It may be root or vagrant. Use top, for example, and look for an elasticsearch-l process. Then edit /etc/security/limits.conf and add the following at the end:
root soft nofile 4096
root hard nofile 10240

Replace root with vagrant (or whatever user is running ES) if needed, save, and restart your VM. You will also need to increase the system-wide file descriptor limit by editing /etc/sysctl.conf and adding at the end:
fs.file-max = 100000

After you save and close the file, run sysctl -p, then cat /proc/sys/fs/file-max to verify it worked. No restart is required here. Note: I am not sure whether restarting the VM is necessary, or whether restarting ElasticSearch alone is enough. Don't hesitate to make this more precise with the results of your experiments. Source: http://www.cyberciti.biz/faq/linux-increase-the-maximum-number-of-open-files/

12.26.3 Download the dump

You can get a recent dump for ElasticSearch at http://people.mozilla.org/~agaudebert/socorro/es-dumps/. You will also need to get the mapping of our Socorro indexes: http://people.mozilla.org/~agaudebert/socorro/es-dumps/mapping.json

12.26.4 Run the script

The script to import crashes into ElasticSearch is not yet merged into our official repository. To get it, you will need to fetch github.com/AdrianGaudebert/socorro and check out branch 696722-script-import-es:
git remote add AdrianGaudebert https://github.com/AdrianGaudebert/socorro.git
git fetch AdrianGaudebert
git branch --track 696722-script-import-es AdrianGaudebert/696722-script-import-es
git checkout 696722-script-import-es

Before you can run the script, you will have to stop supervisord: sudo /etc/init.d/supervisor force-stop


The script is called movecrashes.py and lives in .../scripts/. It has a few dependencies on Socorro and thus needs to be run from the root of a Socorro directory with $PYTHONPATH = .:thirdparty. Use it as follows:
python scripts/movecrashes.py import /path/to/dump.tar /path/to/mapping.json

This will simply import all crash reports contained in the dump into ElasticSearch, without cleaning anything beforehand. If you want more data than is available in the dump, you can just run the import again and create duplicates. If you want to clean out the old Socorro data first, run rebuild instead of import:
python scripts/movecrashes.py rebuild /path/to/dump.tar /path/to/mapping.json

Note that this will only delete indexes named socorro_xxxxxx. If you're using a shared ES instance, or have other indexes you want to keep, there is no risk of them being deleted in this process.



PostgreSQL Database

13.1 PostgreSQL Database Tables by Data Source

Last updated: 2011-01-15

This document breaks down the tables in the Socorro PostgreSQL database by where their data comes from, rather than by what the table contains. This is a prerequisite to populating a brand-new Socorro database or creating synthetic testing workloads.

13.2 Manually Populated Tables

The following tables have no code to populate them automatically. Initial population and any updating need to be done by hand. Generally there's no UI, either; use queries.
• daily_crash_codes
• os_name_matches
• os_names
• process_types
• product_release_channels
• products
• release_channel_matches
• release_channels
• uptime_levels
• windows_versions
• product_productid_map
• report_partition_info

13.3 Tables Receiving External Data

These tables actually get inserted into by various external utilities. This is most of our "incoming" data.
• bugs: list of bugs, populated by bugzilla-scraper
• extensions: populated by processors
• plugins_reports: populated by processors
• raw_adu: populated by daily batch job from metrics
• releases_raw: populated by daily FTP-scraper
• reports: populated by processors

13.4 Automatically Populated Reference Tables

Lookup lists and dimension tables, populated by cron jobs and/or processors based on the above tables. Most are annotated with the job or process which populates them. Where the populating process is marked with an @, that indicates a job which is due to be phased out.
• addresses: cron job, part of update_reports_clean, based on reports
• domains: cron job, part of update_reports_clean, based on reports
• flash_versions: cron job, part of update_reports_clean, based on reports
• os_versions: cron job, update_os_versions, based on reports@; cron job, update_reports_clean, based on reports
• plugins: populated by processors based on crash data
• product_version_builds: cron job, update_product_versions, based on releases_raw
• product_versions: cron job, update_product_versions, based on releases_raw
• reasons: cron job, update_reports_clean, based on reports
• reports_bad: cron job, update_reports_clean, based on reports; future cron job to delete data from this table
• signatures: cron job, update_signatures, based on reports@; cron job, update_reports_clean, based on reports

13.5 Matviews

Reporting tables, designed to be called directly by the mware/UI/reports. Populated by cron job batch. Where populating functions are marked with an @, they are due to be replaced with new jobs.
• bug_associations: not sure
• daily_crashes: daily_crashes, based on reports
• daily_hangs: update_hang_report, based on reports
• os_signature_counts: update_os_signature_counts, based on reports
• product_adu: daily_adu, based on raw_adu
• product_signature_counts: update_product_signature_counts, based on reports
• reports_clean: update_reports_clean, based on reports
• reports_user_info: update_reports_clean, based on reports
• reports_duplicates: find_reports_duplicates, based on reports
• signature_bugs_rollup: not sure
• signature_first@: update_signatures, based on reports@
• signature_products: update_signatures, based on reports@
• signature_products_rollup: update_signatures, based on reports@
• tcbs: update_tcbs, based on reports
• uptime_signature_counts: update_uptime_signature_counts, based on reports

13.6 Application Management Tables

These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications.
• email campaign tables
– email_campaigns
– email_campaigns_contacts
– email_contacts
• processor management tables
– jobs
– priorityjobs
– priority_jobs_*
– processors
– server_status
• UI management tables
– sessions
• monitoring tables
– replication_test
• cronjob and database management
– cronjobs
– report_partition_info

13.7 Deprecated Tables

These tables support functionality which is scheduled to be removed over the next few versions of Socorro. As such, we are ignoring them.
• alexa_topsites
• builds
• frames
• osdims
• priorityjobs_log
• priorityjobs_logging_switch
• product_visibility


• productdims
• productdims_version_sort
• release_build_type_map
• signature_build
• signature_productdims
• top_crashes_by_signature
• top_crashes_by_url
• top_crashes_by_url_signature
• urldims

13.8 PostgreSQL Database Table Descriptions

This document describes the various tables in PostgreSQL by their purpose and, essentially, what data each contains. This is intended as a reference for Socorro developers and analytics users. Tables which are in the database but not listed below are probably legacy tables which are slated for removal in future Socorro releases. Certainly, if a table is not described here, it should not be used for new features or reports.

13.9 Raw Data Tables

These tables hold “raw” data as it comes in from external sources. As such, these tables are quite large and contain a lot of garbage and data which needs to be conditionally evaluated. This means that you should avoid using these tables for reports and interfaces unless the data you need isn’t available anywhere else – and even then, you should see about getting the data added to a matview or normalized fact table.

13.9.1 reports

The primary "raw data" table, reports contains the most-used information about crashes, one row per crash report. The primary key is the UUID field. The reports table is partitioned by date_processed into weekly partitions, so any query you run against it should include filter criteria (WHERE) on the date_processed column. Examples:
WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
WHERE utc_day_is(date_processed, '2012-02-15')

Data in this table comes from the processors.

13.9.2 extensions

Contains information on add-ons installed in the user’s application. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash. Data in this table comes from the processors.


13.9.3 plugins_reports

Contains information on some, but not all, installed modules implicated in the crash: the “most interesting” modules. Relates to dimension table plugins. Currently linked to reports via a synthetic report_id (this will be fixed to be UUID in some future release). Data is partitioned by date_processed into weekly partitions, so include a filter on date_processed in every query hitting this table. Has zero to several rows for each crash. Data in this table comes from the processors.

13.9.4 bugs

Contains lists of bugs thought to be related to crash reports, for linking to crashes. Populated by a daily cronjob.

13.9.5 bug_associations

Links bugs from the bugs table to crash signatures. Populated by daily cronjob.

13.9.6 raw_adu

Contains counts of estimated Average Daily Users as calculated by the Metrics department, grouped by product, version, build, os, and UTC date. Populated by a daily cronjob.

13.9.7 releases_raw

Contains raw data about Mozilla releases, including product, version, platform and build information. Populated hourly via FTP-scraping.

13.9.8 reports_duplicates

Contains UUIDs of groups of crash reports thought to be duplicates according to the current automated duplicate-finding algorithm. Populated by hourly cronjob.

13.10 Normalized Fact Tables

13.10.1 reports_clean

Contains cleaned and normalized data from the reports table, including product-version, os, os version, signature, reason, and more. Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed:

WHERE date_processed BETWEEN '2012-02-12 11:05:09+07' AND '2012-02-17 11:05:09+07'
WHERE date_processed >= DATE '2012-02-12' AND date_processed < DATE '2012-02-17'
WHERE utc_day_is(date_processed, '2012-02-15')

Because reports_clean is much smaller than reports and is normalized into unequivocal relationships with dimension tables, it is much easier to use and faster to execute queries against. However, it excludes data in the reports table which doesn't conform to normalized data, including:

• product versions before the first Rapid Release versions (e.g. Firefox 3.6)


• Camino
• corrupt reports, including ones which indicate a breakpad bug

Populated hourly, 3 hours behind the current time, from data in reports via cronjob. The UUID column is the primary key. There is one row per crash report, although some crash reports are suspected to be duplicates.

Columns:

uuid artificial unique identifier assigned by the collectors to the crash at collection time. Contains the date collected plus a random string.
date_processed timestamp (with time zone) at which the crash was received by the collectors. Also the partition key for partitioning reports_clean. Note that the time will be 7-8 hours off for crashes before February 2012 due to a shift from PST to UTC.
client_crash_date timestamp with time zone at which the user's crashing machine thought the crash was happening. Often inaccurate due to clock issues; primarily supplied as an anchor timestamp for uptime and install_age.
product_version_id foreign key to the product_versions table.
build numeric build identifier as supplied by the client. Might not match any real build in product_version_builds for a variety of reasons.
signature_id foreign key to the signatures dimension table.
install_age time interval between installation and crash, as reported by the client. To get the reported install date, do ( SELECT client_crash_date - install_age ).
uptime time interval between program start and crash, as reported by the client.
reason_id foreign key to the reasons table.
address_id foreign key to the addresses table.
os_name name of the OS of the crashing host, for OSes which match known OSes.
os_version_id foreign key to the os_versions table.
hang_id UUID assigned to the hang pair grouping for hang pairs. May not match anything if the hang pair was broken by sampling or lost crash reports.
flash_version_id foreign key to the flash_versions table.
process_type crashing process type, linked to the process_types dimension.
release_channel release channel from which the crashing product was obtained, unless altered by the user (this happens more than you'd think). Note that non-Mozilla builds are usually lumped into the "release" channel.
duplicate_of UUID of the "leader" of the duplicate group if this crash is marked as a possible duplicate. If UUID and duplicate_of are the same, this crash is the "leader". Selection of leader is arbitrary.
domain_id foreign key to the domains dimension.
architecture CPU architecture of the client as reported (e.g. 'x86', 'arm').
cores number of CPU cores on the client, as reported.
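To make the surrogate-key relationships concrete, here is a hedged sketch of a typical query against reports_clean; the joins follow the column descriptions above, but verify the dimension table columns (signature, product_name, version_string) against your actual schema:

SELECT s.signature, count(*) AS crashes
FROM reports_clean rc
    JOIN signatures s ON rc.signature_id = s.signature_id
    JOIN product_versions pv ON rc.product_version_id = pv.product_version_id
WHERE pv.product_name = 'Firefox'
    AND pv.version_string = '11.0'
    AND utc_day_is(rc.date_processed, '2012-02-15')   -- limits the scan to one partition
GROUP BY s.signature
ORDER BY crashes DESC
LIMIT 10;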

13.10.2 reports_user_info

Contains a handful of "optional" columns from the reports table which are either security-sensitive or large and not included in all reports. This includes the full URL, user email address, comments, and app_notes. As such, access to this table in production may be restricted.


Partitioned by date into weekly partitions, so each query against this table should contain a predicate on date_processed. Relates to reports_clean via UUID, which is also its primary key.

13.10.3 product_adu

The normalized version of raw_adu, contains summarized estimated counts of users for each product-version since Rapid Release began. Populated by daily cronjob.

13.11 Dimensions

These tables contain lookup lists and taxonomy for the fact tables in Socorro. Generally they are auto-populated based on encountering new values in the raw data, on an hourly basis. A few tables below are manually populated and change extremely seldom, if at all.

Dimensions which are lookup lists of short values join to the fact tables by natural key, although it is not actually necessary to reference them (e.g. os_name, release_channel). Dimension lists which have long values or are taxonomies or hierarchies join to the fact tables using a surrogate key (e.g. product_version_id, reason_id).

Some dimensions which come from raw crash data have a "first_seen" column which displays when that value was first encountered in a crash and added to the dimension table. Since the first_seen columns were added in September 2011, most of these will have the value '2011-01-01', which is not meaningful. Only dates after 2011-09-15 actually indicate a first appearance.
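As a sketch of how first_seen can be used (assuming the signatures dimension described below), this finds signatures whose recorded first appearance is actually meaningful:

SELECT signature, first_seen
FROM signatures
WHERE first_seen > DATE '2011-09-15'   -- earlier first_seen values are not meaningful
ORDER BY first_seen DESC;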

13.11.1 addresses

Contains a list of crash location “addresses”, extracted hourly from the raw data. Surrogate key: address_id.

13.11.2 daily_crash_codes

Reference list for the cryptic single-character codes in the daily_crashes table. Legacy, to be eventually restructured. Natural key: crash_code. Manually populated.

13.11.3 domains

List of HTTP domains extracted from raw reports by applying a truncation regex to the crashing URL. These should contain no personal information. Contains a “first seen” column. Surrogate key: domain_id

13.11.4 flash_versions

List of Adobe Flash version numbers harvested from crashes. Has a "first_seen" column. Surrogate key: flash_version_id.

13.11.5 os_names

Canonical list of OS names used in Socorro. Natural key. Fixed list, manually populated.


13.11.6 os_versions

List of versions for each OS based on data harvested from crashes. Contains some garbage versions because we cannot validate. Surrogate key: os_version_id.

13.11.7 plugins

List of “interesting modules” harvested from raw crashes, populated by the processors. Surrogate key: ID. Links to plugins_reports.

13.11.8 process_types

Standing list of crashing process types (browser, plugin and hang). Manually input. Natural key.

13.11.9 products

List of supported products, along with the first version on rapid release. Manually maintained. Natural key: product_name.

13.11.10 product_versions

Contains a list of versions for each product, since the beginning of rapid release (i.e. since Firefox 5.0). Version numbers are available expressed several different ways, and there is a sort column for sorting versions. Also contains build_date/sunset_date visibility information and the featured_version flag. "build_type" means the same thing as "release_channel". Surrogate key: product_version_id.

Version columns include:

version_string The canonical, complete version number for display to users.
release_version The version number as provided in crash reports (and usually the same as the one on the FTP server). Can be missing suffixes like "b2" or "esr".
major_version Just the first two numbers of the version number, e.g. "11.0".
version_sort An alphanumeric string which allows you to sort version numbers in the correct order.
beta_number The sequential beta release number if the product-version is a beta. For "final betas", this number will be 99.
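A quick sketch of the sort column in use; the product_name column is an assumption based on the product descriptions elsewhere in this chapter:

SELECT version_string, release_version, beta_number
FROM product_versions
WHERE product_name = 'Firefox'
ORDER BY version_sort;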

13.11.11 product_version_builds

Contains a list of builds for each product-version. Note that platform information is not at all normalized. Natural key: product_version_id, build_id.

13.11.12 product_release_channels

Contains an intersection of products and release channels, mainly in order to store throttle values. Manually populated. Natural key: product_name, release_channel.


13.11.13 reasons

Contains a list of “crash reason” values harvested from raw crashes. Has a “first seen” column. Surrogate key: reason_id.

13.11.14 release_channels

Contains a list of available Release Channels. Manually populated. Natural key. See “note on release channel columns” below.

13.11.15 signatures

List of crash signatures harvested from incoming raw data. Populated by hourly cronjob. Has a first_seen column. Surrogate key: signature_id.

13.11.16 uptime_levels

Reference list of uptime “levels” for use in reports, primarily the Signature Summary. Manually populated.

13.11.17 windows_versions

Reference list of Windows major/minor versions with their accompanying common names for reports. Manually populated.

13.12 Matviews

These data summaries are derived data from the fact tables and/or the raw data tables. They are populated by hourly or daily cronjobs, and are frequently regenerated if historical data needs to be corrected. If these matviews contain the data you need, you should use them first because they are smaller and more efficient than the fact tables or the raw tables.

13.12.1 correlations

Summarizes crashes by product-version, os, reason and signature. Populated by daily cron job. Is the root for the other correlations reports. Correlation reports in the database will not be active/populated until 2.5.2 or later.

13.12.2 correlation_addons

Contains crash-count summaries of addons per correlation. Populated by daily cronjob.

13.12.3 correlation_cores

Contains crash-count summaries of crashes per architecture and number of cores. Populated by daily cronjob.


13.12.4 correlation_modules

Will contain crash-counts for modules per correlation. Will be populated daily by a pull from HBase.

13.12.5 daily_crashes

Stores crash counts per product-version, OS, and day. This is probably the oldest matview, and has unintuitive and historical column names; it will probably be overhauled or replaced. The report_type column defines 5 different sets of counts; see daily_crash_codes above. We recommend that you use the VIEW daily_crash_ratio instead of daily_crashes, as the structure of daily_crashes is hard to understand and is likely to change in the future.

13.12.6 daily_hangs and hang_report

daily_hangs contains a correlation of hang crash reports with their related hang pair crashes, plus additional summary data. Duplicates contains an array of UUIDs of possible duplicates. hang_report is a dynamic view which flattens daily_hangs and its related dimension tables.

13.12.7 nightly_builds

Contains summaries of crashes-by-age for Nightly and Aurora releases. Will be populated in Socorro 2.5.1.

13.12.8 product_crash_ratio

Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio for each product and version. Recommended for backing graphs and similar.

13.12.9 product_os_crash_ratio

Dynamic VIEW which shows crashes, ADU, adjusted crashes, and the crash/100ADU ratio for each product, OS and version. Recommended for backing graphs and similar.

13.12.10 product_info

Dynamic VIEW which supplies the most essential information about each product version for both old and new products.

13.12.11 signature_products and signature_products_rollup

Summary of which signatures appear in which product_version_ids, with first appearance dates. The rollup contains an array-style summary of the signatures with lists of product-versions.

13.12.12 tcbs

Short for “Top Crashes By Signature”, tcbs contains counts of crashes per day, signature, product-version, and columns counting each OS.


13.13 Note On Release Channel Columns

Due to a historical error, the column name for the Release Channel in various tables may be named “release_channel”, “build_type”, or “build_channel”. All three of these column names refer to exactly the same thing. While we regret the confusion, it has not been thought to be worth the refactoring effort to clean it up.

13.14 Application Support Tables

These tables are used by various parts of the application to do things other than reporting. They are populated/managed by those applications. Most are not accessible to the various reporting users, as they do not contain reportable data.

13.14.1 data processing control tables

These tables contain data which supports data processing by the processors and cronjobs.

product_productid_map maps product names based on productIDs, in cases where the product name supplied by Breakpad is not correct (i.e. FennecAndroid).
reports_bad contains the last day of rejected UUIDs for copying from reports to reports_clean. Intended for auditing of the reports_clean code.
os_name_matches contains regexes for matching commonly found OS names in crashes with canonical OS names.
release_channel_matches contains LIKE match strings for release channels, matching channel names commonly found in crashes with canonical names.
special_product_platforms contains mapping information for rewriting data from FTP-scraping to have the correct product and platform. Currently used only for Fennec.
transform_rules contains rule data for rewriting crashes by the processors. May be used in the future for other rule-based rewriting by other components.

13.14.2 email campaign tables

These tables support the application which emails crash reporters with follow-ups. As such, access to these tables will be restricted.

• email_campaigns
• email_campaigns_contacts
• email_contacts

13.14.3 processor management tables

These tables are used to coordinate activities of the up-to-120 processors and the monitor.

jobs The current main queue for crashes waiting to be processed.
priorityjobs The queue for user-requested "priority" crash processing.
processors The registration list for currently active processors.
server_status Contains summary statistics on the various processor servers.


13.14.4 UI management tables

sessions contains session information for people logged into the administration interface for Socorro.

13.14.5 monitoring tables

replication_test Contains a timestamp for ganglia to measure the speed of replication.

13.14.6 cronjob and database management

These tables support scheduled tasks which are run in Socorro.

cronjobs contains last-completed and success/failure status for each cronjob which affects the database. Currently does not include all cronjobs.
report_partition_info contains configuration information on how the partitioning cronjob needs to partition the various partitioned database tables.
socorro_db_version contains the Socorro version of the current database. Updated by the upgrade scripts.
socorro_db_version_history contains the history of version upgrades of the current database.

13.15 Creating a New Matview

A materialized view, or "matview", is the result of a query stored as a table in the PostgreSQL database. Matviews make user interfaces much more responsive by eliminating searches over many GB of sparse data at request time. The majority of the time, new matviews will have the following characteristics:

• they will pull data from reports_clean and/or reports_user_info
• they will be updated once per day and store daily summary data
• they will be updated by a cron job calling a stored procedure

The rest of this guide assumes that all three conditions above are true. For matviews for which one or more conditions are not true, consult the PostgreSQL DBAs for your matview.

13.16 Do I Want a Matview?

Before proceeding to construct a new matview, test the responsiveness of simply running a query over reports_clean and/or reports_user_info. You may find that the query returns fast enough ( < 100ms ) without its own matview. Remember to test the extreme cases: Firefox release version on Windows, or Fennec aurora version.

Also, matviews are really only effective if they are smaller than 1/4 the size of the base data from which they are constructed. Otherwise, it's generally better to simply look at adding new indexes to the base data. Try populating a couple days of the matview, ad-hoc, and checking its size (pg_total_relation_size()) compared to the base table from which it's drawn. The new signature summaries were a good example of this; the matviews to meet the spec would have been 1/3 the size of reports_clean, so we added a couple of new indexes to reports_clean instead.
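A minimal sketch of that size comparison, using the hypothetical product_domain_counts matview built later in this chapter:

SELECT pg_size_pretty(pg_total_relation_size('product_domain_counts')) AS matview_size,
    pg_size_pretty(pg_total_relation_size('reports_clean')) AS base_table_size;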


13.17 Components of a Matview

In order to create a new matview, you will create or modify five or six things:

1. a table to hold the matview data
2. an update function to insert new matview data once per day
3. a backfill function to backfill one day of the matview
4. a line in the general backfill_matviews function
5. if the matview is to be backfilled from deployment, a script to do this
6. a test that the matview is being populated correctly

Point (6) is not yet addressed by a test framework for Socorro, so we're skipping it currently. For the rest of this doc, please refer to the template matview code sql/templates/general_matview_template.sql in the Socorro source code.

13.18 Creating the Matview Table

The matview table should be the basis for the report or screen you want. It's important that it be able to cope with all of the different filter and grouping criteria which users are allowed to supply. On the other hand, most of the time it's not helpful to try to have one matview support several different reports; the matview gets bloated and slow.

In general, each matview will have the following things:

• one or more grouping columns
• a report_date column
• one or more summary data columns

If they are available, all columns should use surrogate keys to lookup lists (i.e. use signature_id, not the full text of the signature). Generally the primary key of the matview will be the combination of all grouping columns plus the report date.

So, as an example, we're going to create a simple matview for summarizing crashes per product and web domain. While it's unlikely that such a matview would be useful in practice (we could just query reports_clean directly) it makes a good example. Here's the model for the table:

table product_domain_counts
    product_version
    domain
    report_date
    report_count
key product_version, domain, report_date

We actually use the custom procedure create_table_if_not_exists() to create this. This function handles idempotence, permissions, and secondary indexes for us, like so:

SELECT create_table_if_not_exists('product_domain_counts', $x$
CREATE TABLE product_domain_counts (
    product_version_id INT NOT NULL,
    domain_id INT NOT NULL,
    report_date DATE NOT NULL,
    report_count INT NOT NULL DEFAULT 0,
    CONSTRAINT product_domain_counts_key PRIMARY KEY ( product_version_id, domain_id, report_date )
);
$x$, 'breakpad_rw', ARRAY['domain_id']);

See DatabaseAdminFunctions in the docs for more information about the function. You'll notice that the resulting matview uses the surrogate keys of the corresponding lookup lists rather than the actual values. This is to keep matview sizes down and improve performance. You'll also notice that there are no foreign keys to the various lookup list tables; this is partly a performance optimization, but mostly because, since matviews are populated by stored procedure, validating input is not critical. We also don't expect to need cascading updates or deletes on the lookup lists.

13.18.1 Creating The Update Function

Once you have the table, you'll need to write a function to be called by cron once per day in order to populate the matview with new data. This function will:

• be named update_{name_of_matview}
• take two parameters, a date and a boolean
• return a boolean, with true = success and ERROR = failure
• check if the data it depends on is available
• check if it's already been run for the day
• pull its data from reports_clean, reports_user_info, and/or other matviews (_not_ reports or other raw data tables)

So, here's our update function for the product_domain_counts table:

CREATE OR REPLACE FUNCTION update_product_domain_counts (
    updateday DATE, checkdata BOOLEAN default TRUE )
RETURNS BOOLEAN
LANGUAGE plpgsql
SET work_mem = '512MB'
SET temp_buffers = '512MB'
SET client_min_messages = 'ERROR'
AS $f$
BEGIN
-- this function populates a daily matview
-- for crash counts by product and domain
-- depends on reports_clean

-- check if we've been run
IF checkdata THEN
    PERFORM 1 FROM product_domain_counts
    WHERE report_date = updateday
    LIMIT 1;
    IF FOUND THEN
        RAISE EXCEPTION 'product_domain_counts has already been run for %.',updateday;
    END IF;
END IF;

-- check if reports_clean is complete
IF NOT reports_clean_done(updateday) THEN
    IF checkdata THEN
        RAISE EXCEPTION 'Reports_clean has not been updated to the end of %',updateday;
    ELSE
        RETURN TRUE;
    END IF;
END IF;

-- now insert the new records
-- this should be some appropriate query; this simple group by
-- is just provided as an example
INSERT INTO product_domain_counts
    ( product_version_id, domain_id, report_date, report_count )
SELECT product_version_id, domain_id,
    updateday, count(*)
FROM reports_clean
WHERE domain_id IS NOT NULL
    AND date_processed >= updateday::timestamptz
    AND date_processed < ( updateday + 1 )::timestamptz
GROUP BY product_version_id, domain_id;

RETURN TRUE;
END; $f$;

Note that the update functions could be written in PL/Python if you wish; however, there isn't yet a template for that.

13.18.2 Creating The Backfill Function

The second function which needs to be created is one for backfilling data for specific dates, for when we need to backfill missing or corrected data. This function will also be used to fill in data when we first deploy the matview. The backfill function will generally be very simple; it just calls a delete for the day's data and then the update function, with the "checkdata" flag disabled:

CREATE OR REPLACE FUNCTION backfill_product_domain_counts(
    updateday DATE )
RETURNS BOOLEAN
LANGUAGE plpgsql AS
$f$
BEGIN

DELETE FROM product_domain_counts WHERE report_date = updateday;
PERFORM update_product_domain_counts(updateday, false);

RETURN TRUE;
END; $f$;
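Once both functions are deployed, re-running a single day of the example matview is just a function call:

SELECT backfill_product_domain_counts('2012-02-15');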

13.18.3 Adding The Function To The Omnibus Backfill

Usually when we backfill data we recreate all matview data for the period affected. This is accomplished by inserting it into the backfill_matviews table:

INSERT INTO backfill_matviews ( matview, function_name, frequency )
VALUES ( 'product_domain_counts', 'backfill_product_domain_counts', 'daily' );


NOTE: the above is not yet active. Until it is, send a request to Josh Berkus to add your new backfill to the omnibus backfill function.

13.18.4 Filling in Initial Data

Generally when creating a new matview, we want to fill in two weeks or so of data. This can be done with either a Python or a PL/pgSQL script. A PL/pgSQL script would be created as a SQL file and look like this:

DO $f$
DECLARE
    thisday DATE := '2012-01-14';
    lastday DATE;
BEGIN

    -- set backfill to the last day we have ADU for
    SELECT max("date") INTO lastday
    FROM raw_adu;

    WHILE thisday <= lastday LOOP

        RAISE INFO 'backfilling %', thisday;

        PERFORM backfill_product_domain_counts(thisday);

        thisday := thisday + 1;

    END LOOP;

END;$f$;

This script would then be checked into the set of upgrade scripts for that version of the database.

13.19 Database Admin Function Reference

What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are intended for database administration, particularly scheduled tasks. Many of these functions depend on other, internal functions which are not documented. All functions below return BOOLEAN, with TRUE meaning completion, and throw an ERROR if they fail, unless otherwise noted.

13.20 MatView Functions

These functions manage the population of the many Materialized Views in Socorro. In general, for each matview there are two functions which maintain it:

update_{matview_name} ( DATE )
    fills in one day of the matview for the first time
    will error if data is already present, or source data is missing

backfill_{matview_name} ( DATE )
    deletes one day of data for the matview and recreates it
    will warn, but not error, if source data is missing
    safe for use without downtime

Exceptions to the above are generally for procedures which need to run hourly or more frequently (e.g. update_reports_clean, reports_duplicates). Also, some functions have shortcut names where they don't use the full name of the matview (e.g. update_adu). Note that the various matviews can take radically different amounts of time to update or backfill ... from a couple of seconds to 10 minutes for one day.

In addition, there are several procedures which are designed to update or backfill multiple matviews for a range of days. These are designed for when there has been some kind of widespread issue in crash processing and a bunch of crashes have been reprocessed and need to be re-aggregated. These mass-backfill functions generally give a lot of command-line feedback on their progress, and should be run in a screen session, as they may take hours to complete. These functions, as the most generally used, are listed first. If you are doing a mass-backfill, you probably want to limit the backfill to a week at a time in order to prevent it from running too long before committing.
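For instance, a sketch of a week-at-a-time mass backfill using backfill_matviews (documented next); each statement commits before the next begins, keeping individual transactions short:

SELECT backfill_matviews( '2011-11-01', '2011-11-07' );
SELECT backfill_matviews( '2011-11-08', '2011-11-14' );
SELECT backfill_matviews( '2011-11-15', '2011-11-21' );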

13.20.1 backfill_matviews

Purpose: backfills data for all matviews for a specific range of dates. For use when data is either missing or needs to be retroactively corrected.

Called By: manually by admin as needed

backfill_matviews (
    startdate DATE,
    optional enddate DATE default current_date,
    optional reportsclean BOOLEAN default true
)

SELECT backfill_matviews( '2011-11-01', '2011-11-27', false );
SELECT backfill_matviews( '2011-11-01' );

startdate the first date to backfill
enddate the last date to backfill. Defaults to the current UTC date.
reportsclean whether or not to backfill reports_clean as well. Defaults to true. Supplied because the backfill of reports_clean takes a lot of time.

13.20.2 backfill_reports_clean

Purpose: backfill only the reports_clean normalized fact table.

Called By: admin as needed

backfill_reports_clean (
    starttime TIMESTAMPTZ,
    endtime TIMESTAMPTZ
)

SELECT backfill_reports_clean ( '2011-11-17', '2011-11-29 14:00:00' );

starttime timestamp to start backfill
endtime timestamp to halt backfill at

Note: if backfilling less than 1 day, will backfill in 1-hour increments. If backfilling more than one day, will backfill in 6-hour increments. Can take a long time to backfill more than a couple of days.

13.20.3 update_adu, backfill_adu

Purpose: updates or backfills one day of the product_adu table, which is one of the two matviews powering the graphs in Socorro. Note that if ADU is out of date, it has no dependencies, so you only need to run this function.

Called By: update function called by the update_matviews cron job.

update_adu ( updateday DATE );
backfill_adu ( updateday DATE );

SELECT update_adu('2011-11-26');
SELECT backfill_adu('2011-11-26');

updateday DATE of the UTC crash report day to update or backfill

13.20.4 update_products

Purpose: updates the list of product_versions and product_version_builds based on the contents of releases_raw.

Called By: daily cron job

update_products ( )

SELECT update_products ( );

Notes: takes no parameters as the product update is always cumulative. As of 2.3.5, only looks at product_versions with build dates in the last 30 days. There is no backfill function because it is always a cumulative update.

13.20.5 update_tcbs, backfill_tcbs

Purpose: updates "tcbs" based on the contents of the reports_clean table.

Called By: daily cron job

update_tcbs (
    updateday DATE,
    checkdata BOOLEAN optional default true
)

SELECT update_tcbs ( '2011-11-26' );

backfill_tcbs (
    updateday DATE
)

SELECT backfill_tcbs ( '2011-11-26' );

updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.

Notes: updates only "new"-style versions. Until 2.4, update_tcbs pulled data directly from reports and not reports_clean.

13.20.6 update_daily_crashes, backfill_daily_crashes

Purpose: updates "daily_crashes" based on the contents of the reports_clean table.

Called By: daily cron job

update_daily_crashes (
    updateday DATE,
    checkdata BOOLEAN optional default true
)

SELECT update_daily_crashes ( '2011-11-26' );

backfill_daily_crashes (
    updateday DATE
)

SELECT backfill_daily_crashes ( '2011-11-26' );

updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found.

Notes: updates only "new"-style versions. Until 2.4, update_daily_crashes pulled data directly from reports and not reports_clean. Probably the slowest of the regular update functions; can take up to 4 minutes to do one day.

13.20.7 update_rank_compare, backfill_rank_compare

Purpose: updates "rank_compare" based on the contents of the reports_clean table.

Called By: daily cron job

update_rank_compare (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_rank_compare ( '2011-11-26' );

backfill_rank_compare (
    updateday DATE optional default yesterday
)

SELECT backfill_rank_compare ( '2011-11-26' );

updateday UTC day to pull data for. Optional; defaults to ( CURRENT_DATE - 1 ).
checkdata whether or not to check dependent data and throw an error if it's not found.


Note: this matview is not historical, but contains only one day of data. As such, running either the update or backfill function replaces all existing data. Since it needs an exclusive lock on the matview, it is possible (though unlikely) for it to fail to obtain the lock and error out.

13.20.8 update_nightly_builds, backfill_nightly_builds

Purpose: updates "nightly_builds" based on the contents of the reports_clean table.

Called By: daily cron job

update_nightly_builds (
    updateday DATE optional default yesterday,
    checkdata BOOLEAN optional default true
)

SELECT update_nightly_builds ( '2011-11-26' );

backfill_nightly_builds (
    updateday DATE optional default yesterday
)

SELECT backfill_nightly_builds ( '2011-11-26' );

updateday UTC day to pull data for.
checkdata whether or not to check dependent data and throw an error if it's not found. Optional.

13.21 Schema Management Functions

These functions support partitioning, upgrades, and other management of tables and views.

13.21.1 weekly_report_partitions

Purpose: to create new partitions for the reports table and its child tables every week.

Called By: weekly cron job

weekly_report_partitions (
    optional numweeks integer default 2,
    optional targetdate date default current_date
)

SELECT weekly_report_partitions();
SELECT weekly_report_partitions(3, '2011-11-09');

numweeks number of weeks ahead to create partitions
targetdate date for the starting week, if not today

13.21.2 try_lock_table

Purpose: attempt to get a lock on a table, looping with sleeps until the lock is obtained.

Called by: various functions internally

try_lock_table (
    tabname TEXT,
    mode TEXT optional default 'EXCLUSIVE',
    attempts INT optional default 20
) RETURNS BOOLEAN

IF NOT try_lock_table('rank_compare', 'ACCESS EXCLUSIVE') THEN
    RAISE EXCEPTION 'unable to lock the rank_compare table for update.';
END IF;

tabname the table name to lock
mode the lock mode per PostgreSQL docs. Defaults to 'EXCLUSIVE'.
attempts the number of attempts to make, with 3 second sleeps between each. Optional, defaults to 20.

Returns TRUE for table locked, FALSE for unable to lock.

13.21.3 create_table_if_not_exists

Purpose: creates a new table, skipping if the table is found to already exist.

Called By: upgrade scripts

create_table_if_not_exists (
    tablename TEXT,
    declaration TEXT,
    tableowner TEXT optional default 'breakpad_rw',
    indexes TEXT ARRAY default empty list
)

SELECT create_table_if_not_exists ( 'rank_compare', $q$
create table rank_compare (
    product_version_id int not null,
    signature_id int not null,
    rank_days int not null,
    report_count int,
    total_reports bigint,
    rank_report_count int,
    percent_of_total numeric,
    constraint rank_compare_key primary key ( product_version_id, signature_id, rank_days )
);$q$, 'breakpad_rw',
ARRAY [ 'product_version_id,rank_report_count', 'signature_id' ]);

tablename name of the new table to create
declaration full CREATE TABLE sql statement, plus whatever other SQL statements you only want to run on table creation such as priming it with a few records and creating the primary key. If running more than one SQL statement, separate them with semicolons.
tableowner the ROLE which owns the table. Usually 'breakpad_rw'. Optional.
indexes an array of sets of columns to create regular btree indexes on. Use the array declaration as demonstrated above. Default is to create no indexes.

Note: this is the best way to create new tables in migration scripts, since it allows you to rerun the script multiple times without erroring out. However, be aware that it only checks for the existence of the table, not its definition, so if you modify the table definition you'll need to manually drop and recreate it.


13.22 Other Administrative Functions

13.22.1 add_old_release

Purpose: allows you to add an old release to productdims/product_visibility.

Called By: on demand by Firefox or Camino teams.

add_old_release (
    product_name text,
    new_version text,
    release_type release_enum default 'major',
    release_date DATE default current_date,
    is_featured BOOLEAN default FALSE
) returns BOOLEAN

SELECT add_old_release ('Camino', '2.1.1');
SELECT add_old_release ('Camino', '2.1.2pre', 'development', '2012-03-09', true);

Notes: if this leads to more than 4 currently featured versions, the oldest featured version will be "bumped".

13.23 Custom Time-Date Functions

The present Socorro database needs to do a lot of time, date and timezone manipulation. This is partly a natural consequence of the application, and the need to use both DATE and TIMESTAMPTZ values. The greater need, however, is legacy timestamp conversion: currently the processors save crash reporting timestamps as TIMESTAMP WITHOUT TIME ZONE in Pacific time, whereas the rest of the database uses TIMESTAMP WITH TIME ZONE in UTC. This necessitates a lot of tricky time zone conversions. The functions below are meant to make it easier to write queries which return correct results based on dates and timestamps.

13.23.1 tstz_between

tstz_between (
    tstz TIMESTAMPTZ,
    bdate DATE,
    fdate DATE
) RETURNS BOOLEAN

SELECT tstz_between ( '2011-11-25 15:23:11-08', '2011-11-25', '2011-11-26' );

Checks whether a timestamp with time zone is between two UTC dates, inclusive of the entire ending day.
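As a usage sketch, the same function as a report predicate over reports_clean:

SELECT count(*)
FROM reports_clean
WHERE tstz_between( date_processed, '2011-11-25', '2011-11-26' );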

13.23.2 utc_day_is

utc_day_is ( TIMESTAMPTZ, TIMESTAMP or DATE ) RETURNS BOOLEAN

SELECT utc_day_is ( '2011-11-26 15:23:11-08', '2011-11-28' );

Checks whether the provided timestamp with time zone is within the provided UTC day, expressed as either a timestamp without time zone or a date.

13.23.3 utc_day_near

utc_day_near ( TIMESTAMPTZ, TIMESTAMP or DATE ) RETURNS BOOLEAN

SELECT utc_day_near ( '2011-11-26 15:23:11-08', '2011-11-28' );

Checks whether the provided timestamp with time zone is within an hour of the provided UTC day, expressed as either a timestamp without time zone or a date. Used for matching when related records may cross over midnight.

13.23.4 week_begins_utc

week_begins_utc ( TIMESTAMP or DATE ) RETURNS timestamptz

SELECT week_begins_utc ( '2011-11-25' );

Given a timestamp or date, returns the timestamp with time zone corresponding to the beginning of the week in UTC time. Used for partitioning data by week.

13.23.5 week_ends_utc

week_ends_utc ( TIMESTAMP or DATE ) RETURNS timestamptz

SELECT week_ends_utc ( '2011-11-25' );

Given a timestamp or date, returns the timestamp with time zone corresponding to the end of the week in UTC time. Used for partitioning data by week.

13.23.6 week_begins_partition

week_begins_partition ( partname TEXT ) RETURNS timestamptz

SELECT week_begins_partition ( 'reports_20111219' );

Given a partition table name, returns a timestamptz of the date and time that weekly partition starts.


13.23.7 week_ends_partition

week_ends_partition ( partname TEXT ) RETURNS timestamptz

SELECT week_ends_partition ( 'reports_20111219' );

Given a partition table name, returns a timestamptz of the date and time that weekly partition ends.

13.23.8 week_begins_partition_string

week_begins_partition_string ( partname TEXT ) RETURNS text

SELECT week_begins_partition_string ( 'reports_20111219' );

Given a partition table name, returns a string of the date and time that weekly partition starts, in the format 'YYYY-MM-DD HR:MI:SS UTC'.

13.23.9 week_ends_partition_string

week_ends_partition_string ( partname TEXT ) RETURNS text

SELECT week_ends_partition_string ( 'reports_20111219' );

Given a partition table name, returns a string of the date and time that weekly partition ends, in the format 'YYYY-MM-DD HR:MI:SS UTC'.

13.24 Database Misc Function Reference

What follows is a listing of custom functions written for Socorro in the PostgreSQL database which are useful for application development, but do not fit in the “Admin” or “Datetime” categories.

13.25 Formatting Functions

13.25.1 build_numeric

build_numeric ( build TEXT ) RETURNS NUMERIC

SELECT build_numeric ( '20110811165603' );

Converts a build ID string, as supplied by the processors/breakpad, into a numeric value on which we can do computations and derive a date. Returns NULL if the build string is a non-numeric value and thus corrupted.
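The two formatting functions compose naturally; for example (build_date is described next):

SELECT build_date( build_numeric( '20110811165603' ) );   -- yields the build date, 2011-08-11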

13.25.2 build_date

build_date ( buildid NUMERIC ) RETURNS DATE

SELECT build_date ( 20110811165603 );

Takes a numeric build_id and returns the date of the build.

13.26 API Functions

These functions support the middleware, making it easier to look up certain things in the database.

13.26.1 get_product_version_ids

get_product_version_ids (
    product CITEXT,
    versions VARIADIC CITEXT
)

SELECT get_product_version_ids ( 'Firefox', '11.0a1' );
SELECT get_product_version_ids ( 'Firefox', '11.0a1', '11.0a2', '11.0b1' );

Takes a product name and a list of version_strings, and returns an array (list) of surrogate keys (product_version_ids) which can then be used in queries like:

SELECT *
FROM reports_clean
WHERE date_processed BETWEEN '2012-03-21' AND '2012-03-28'
    AND product_version_id = ANY ( $list );
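Or, as a sketch, inlining the function call directly rather than interpolating a precomputed list:

SELECT count(*)
FROM reports_clean
WHERE date_processed BETWEEN '2012-03-21' AND '2012-03-28'
    AND product_version_id = ANY ( get_product_version_ids( 'Firefox', '11.0a1', '11.0a2' ) );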

13.27 Populate PostgreSQL

Socorro supports multiple products, each of which may contain multiple versions.

• A product is a global product name, such as Firefox, Thunderbird, Fennec, etc.
• A version is a revision of a particular product, such as Firefox 3.6.6 or Firefox 3.6.5.
• A branch is the indicator for the Gecko platform used in a Mozilla product / version. If your crash reporting project does not have a need for branch support, just enter "1.0" as the branch number for your product / version.

13.27.1 Customize CSV files

Socorro comes with a set of CSV files you can customize and use to bootstrap your database. Shut down all Socorro services, drop your database (if needed) and load the schema. From inside the Socorro checkout, as the postgres user:


./socorro/external/postgresql/setupdb_app.py --database_name=breakpad_rw

Customize the CSVs; at minimum you probably need to bump the dates and build IDs in:

• raw_adu.csv
• reports.csv
• releases_raw.csv

You will probably want to change "WaterWolf" to your own product name and version history, if you are setting this up for production. Also, note that the backfill procedure will ignore build IDs over 30 days old. From inside the Socorro checkout, as the postgres user:

cd tools/dataload
edit *.csv
./import.sh

See PostgreSQL Database Tables by Data Source for a complete explanation of each table.

13.27.2 Run backfill function to populate matviews

Socorro depends upon materialized views which run nightly, to display graphs and show reports such as "Top Crashes By Signature".

IMPORTANT NOTE - many reports use the reports_clean_done() stored procedure to check that reports exist for the last UTC hour of the day being processed, as a way to catch problems. If your crash volume is low enough, you may want to modify this function (it is in breakpad_schema.sql, referenced above).

Normally the backfill is run for the previous day by cron_daily_matviews.sh, but you can simply run the backfill function to bootstrap the system. This is normally done by import.sh, so take a look in there if you need to make adjustments.

There also needs to be at least one featured version, which is controlled by setting the "featured_version" column to "true" for one or more rows in the product_versions table (see the sketch below).

Restart memcached as the root user:

/etc/init.d/memcached restart
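A hedged sketch of the backfill and featured-version steps mentioned above, assuming roughly two weeks of CSV data were loaded; the WaterWolf product/version values are placeholders from the sample data:

-- backfill roughly two weeks of matview data
SELECT backfill_matviews( current_date - 14 );

-- mark at least one version as featured
UPDATE product_versions
SET featured_version = true
WHERE product_name = 'WaterWolf'
    AND version_string = '1.0';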

The Socorro UI should now work. You can change settings using the admin UI, which will be at http://crash-stats/admin (or the equivalent hostname for your install).

13.27.3 Load data via snapshot

If you have access to an existing Socorro database snapshot, you can load it like so:

# shut down database users
sudo /etc/init.d/supervisor force-stop
sudo /etc/init.d/apache2 stop

# drop old db and load snapshot
sudo su - postgres
dropdb breakpad
createdb -E 'utf8' -l 'en_US.utf8' -T template0 breakpad
pg_restore -Fc -d breakpad minidb.dump

This may take several hours, depending on your hardware. One way to speed this up would be to:


• If in a VirtualBox environment, add more CPU cores to the VM (via the VirtualBox GUI); the default is 1
• Add "-j n" to the pg_restore command above, where n is the number of CPU cores - 1


CHAPTER 14

How generic_app works: an example using configman

14.1 The minimum app

To illustrate, let's look at an app that uses generic_app to leverage configman to run: weeklyReportsPartitions.py. As you can see, it's a subclass of the socorro.app.generic_app.App class, which is a the-least-you-need wrapper for a minimal app; it takes care of logging and executing your main function.

14.2 Connecting and handling transactions

Let's go back to the weeklyReportsPartitions.py cron script and take a look at what it does. It only really has one configman option, and that's the transaction_executor_class. The default value is TransactionExecutorWithBackoff, which is the class that's going to take care of two things:

1. execute a callable that accepts an opened database connection as its first and only parameter
2. commit the transaction if there are no errors, and roll back the transaction if an exception is raised
3. NB: if an OperationalError or InterfaceError exception is raised, TransactionExecutorWithBackoff will log that and retry after a configurable delay

Note that TransactionExecutorWithBackoff is the default transaction_executor_class, but if you override it, for example on the command line, with TransactionExecutor, no exceptions are swallowed and it doesn't retry.

Now, connections are created and closed by the ConnectionContext class. As you might have noticed, the default database_class defined in the TransactionExecutor is socorro.external.postgresql.connection_context.ConnectionContext. The idea is that any external module (e.g. HBase, PostgreSQL, etc.) can define a ConnectionContext class as per this model. Its job is to create and close connections, and it has to do so as a context manager. What that means is that you can do this:

connector = ConnectionContext()
with connector() as connection:  # opens a connection
    do_something(connection)
# the connection is closed when the with block exits

And if errors are raised within the do_something function it doesn’t matter. The connection will be closed.


14.3 What was the point of that?!

For one thing, this app being a configman-derived app means that all configuration settings are as flexible as configman is. You can supply different values for any of the options either on the command line (try running --help on the ./weeklyReportsPartitions.py script) or via various configuration files, as per your liking.

The other thing to notice is that when writing another similar cron script, all you need to do is worry about exactly what to execute, and let the framework take care of transactions and opening and closing connections. Each class is supposed to do one job and one job only.

configman uses not only basic options such as database_password but also more complex options such as aggregators. These are basically invariant options that depend on each other and use functions in there to get their stuff together.

CHAPTER 15

Writing documentation

To contribute documentation, follow these steps to modify the git repo, build a local copy, and deploy it on ReadTheDocs.org.

15.1 Installing Sphinx

Sphinx is an external tool that compiles these reStructuredText files into HTML. Since it's a Python tool you can install it with easy_install or pip, like this:

pip install sphinx

15.2 Making the HTML

Now you can build the docs with this simple command:

cd docs
make html

This should update the relevant HTML files in socorro/docs/_build, and you can preview them locally like this (on OS X for example):

open _build/html/index.html

To modify the index itself, edit index.rst; for instance, you may want to add or remove a document filename (without the .rst extension) from the ".. toctree::" section.

15.3 Making it appear on ReadTheDocs

ReadTheDocs.org is wired to build the documentation nightly from this git repository, but if you want to make documentation changes appear immediately you can use their webhooks to re-create the build and update the documentation right away.

15.4 Or, just send the pull request

If you have a relevant update to the documentation but don’t have time to set up your Sphinx and git environment you can just edit these files in raw mode and send in a pull request.

157 Socorro Documentation, Release 2

15.5 Or, just edit the documentation online

The simplest way to edit the documentation is to just edit it inside the GitHub editor. To get started, go to https://github.com/mozilla/socorro and browse in the docs directory to find the file you want to edit. Then click the "Edit this file" button in the upper right-hand corner and type away. When you're done, write a comment underneath and click "Commit Changes". If you are unsure about how to edit reStructuredText and don't want to trial-and-error your way through the editing, one thing you can do is to copy the text into an online reStructuredText editor and see if you got the syntax right. Obviously you'll receive warnings and errors about broken internal references, but at least you'll know whether the syntax is correct.

CHAPTER 16

Indices and tables

• genindex
• modindex
• search
