Report of the 2003 Meeting of the ICCAT Working Group On

SCRS/2008/024    Collect. Vol. Sci. Pap. ICCAT, 64(7): 2617-2640 (2009)

DESCRIPTION OF THE ICCAT TAGGING INFORMATION SYSTEM

C. Palma1 and P. Kebe1

SUMMARY

This document describes the major elements of the ICCAT tagging information system, namely the data collection process, the relational database management system, the coding system and standard rules adopted, and the revision policy behind the tagging information received from the various ICCAT parties.

RÉSUMÉ

Le présent document décrit les principaux éléments du système d’information de marquage de l’ICCAT, à savoir le processus de collecte de données, le système de gestion des bases de données relationnelles, le système de codage et les règles standard adoptées ainsi que la politique de révision faisant suite à la réception des informations de marquage émanant des diverses Parties de l’ICCAT.

RESUMEN

En este documento se describen los principales elementos del sistema ICCAT de información sobre marcado, es decir, los procesos de recopilación de datos, el sistema de gestión de la base de datos relacional, el sistema de codificación y las normas estándar adoptadas, así como la política de revisión de la información sobre marcado recibida de varias Partes de ICCAT.

KEYWORDS

Tagging, tag codes, tag recoveries

1. Introduction

Over the years, ICCAT has collected tagging information on tunas, tuna-like species and sharks, made available by various Contracting Parties (or related scientific institutions), with the United States being one of the most important providers of tagging data. Up to 2001, the information was processed, validated and stored in structured ASCII files. After that, the Secretariat started the migration process to a relational database system (MS-SQL Server 2000).
During that period, various improvements were made to the central tagging database system, in particular the inclusion of components to manage the ICCAT conventional tag inventory, code normalization, automatic data assimilation, validation procedures, multi-tier data management programs, and the migration to the MS-SQL Server 2005 platform.

During the last year, the Secretariat started a complex revision/adjustment of the ICCAT tagging database system in order to incorporate the recommendations proposed by the Ad Hoc Working Group on Tagging Coordination (Madrid, March 15-16, 2007) and adopted by the Sub-Committee on Statistics (Anon. 2007). The ICCAT tagging database system has now integrated many of the recommended proposals, but this work is still under way: around five additional full weeks will be needed to finalize it. Two new modules are in the prototyping phase (Survey manager and Electronic Tagging manager). Moreover, several auxiliary tables, newly created or under normalization, have a preliminary coding system. Guidance was requested from experts (as stated in Appendix 6 of Anon. 2007) to implement those coding systems.

Although the description of the ICCAT tagging database system presented here focuses on its current state, several explanations anticipate its definitive development status. In addition, its slightly technical content aims to contribute to the development of the ICCAT-USA tagging data exchange protocol, and it must not be viewed as a complete reference manual.

1 ICCAT Secretariat, c/ Corazón de María, 8 (6th fl.), 28002 Madrid, Spain.

2. Data collection and processing

Despite the existence of official reporting forms (www.iccat.int/Documents/Stats/tag-lottery-ENG.pdf), tagging information is reported to ICCAT with heterogeneous structures and in a multitude of formats.
Since 2002, each dataset received by the ICCAT Secretariat has been inventoried, and the source information (electronic files, emails, and “pdf” copies of data received on paper) is stored in a dedicated file server location. The inventory process consists of registering the dataset under a unique ICCAT internal registry number, together with various attributes of the source files received (reporting person/institution, date reported, date registered, storage location links, etc.).

In a first phase (after inventory), the information is verified, recoded and transformed into a uniform structure that is easily assimilated by the tagging database system. The time spent on this process can vary from around one hour (for the simplest, already normalized cases) to more than one month (entire database revisions).

In a second phase, the normalized data (already transformed into ICCAT standard codes) are loaded into a temporary table set (inside the main database) for subsequent validation. A unique data input process number (“InProcID”), a database-dependent identifier, is assigned to the dataset. During the validation process, the corresponding release and recovery data are cross-checked against the data available in the main table set and classified into two subsets: “new info” (if none of the key fields matches) and “updated info” (if some of the key fields match). The whole “new info” subset is added to the main database. Each tuple (record) of the “updated info” subset passes through a second, fine-grained (field-based) analysis. At the end, a new record is generated by combining the unchanged fields (from the current record in the main database tables) with the updated fields (from the temporary record in the temporary table set), and this new record replaces the current one. When needed, the three records (current, temporary, new) are displayed side by side before the information is replaced.
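The classification and field-based merge described above can be sketched as follows. This is only an illustrative sketch in Python, not the Secretariat's implementation (which runs inside MS-SQL Server): the key fields (`ReleaseID`, `TagNumber`), the other field names, the sample values, and the rule that an empty (`None`) incoming field never overwrites a stored value are all assumptions made for the example.

```python
# Hypothetical key fields; the real key fields are not listed in this document.
KEY_FIELDS = ("ReleaseID", "TagNumber")


def find_match(record, main_records, key_fields=KEY_FIELDS):
    """Return the first main-table record sharing at least one key field value."""
    for current in main_records:
        if any(
            record.get(key) is not None and record.get(key) == current.get(key)
            for key in key_fields
        ):
            return current
    return None


def classify(incoming, main_records, key_fields=KEY_FIELDS):
    """Split incoming records into "new info" and "updated info".

    "new info": no key field matches any main-table record.
    "updated info": some key field matches; returned as (current, temporary) pairs.
    """
    new_info, updated_info = [], []
    for record in incoming:
        current = find_match(record, main_records, key_fields)
        if current is None:
            new_info.append(record)
        else:
            updated_info.append((current, record))
    return new_info, updated_info


def merge_record(current, temporary):
    """Build the replacement record field by field.

    Assumption: a field counts as "updated" when the temporary record carries
    a non-empty value that differs from the current one.
    """
    merged = dict(current)
    for field, value in temporary.items():
        if value is not None and value != current.get(field):
            merged[field] = value
    return merged


# Illustrative data only.
main_table = [
    {"ReleaseID": 1, "TagNumber": "A001", "ReleaseDate": "2005-06-01", "RecoveryDate": None},
]
temporary_table = [
    {"ReleaseID": None, "TagNumber": "A001", "ReleaseDate": None, "RecoveryDate": "2006-08-15"},
    {"ReleaseID": 2, "TagNumber": "B777", "ReleaseDate": "2007-01-10", "RecoveryDate": None},
]

new_info, updated_info = classify(temporary_table, main_table)
for current, temporary in updated_info:
    # The three records (current, temporary, new) can be inspected side by side.
    print(current, temporary, merge_record(current, temporary), sep="\n")
```

In this sketch the record with tag `B777` falls into “new info”, while the record with tag `A001` is treated as “updated info” and only its recovery date is carried over into the replacement record.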
Currently, the record that has been replaced is dropped from the database (there is not yet an effective system for tracking historical changes), but the update event is recorded in special fields. The Secretariat has a module under development to fulfil this requirement.

3. Database management system

The tagging relational database management system (RDBMS) is composed of various parts: an MS-SQL Server 2005 database (about 50 tables in total), one main application (VBA projects in MS Access 2007) used as a direct interface to the tagging information, and an MS-SQL Server Management Studio solution (containing various projects) that manages all the batch scripting development in a centralized way. The administrative system (security tasks, server roles, credentials, backup system, maintenance plans, etc.) is embedded in the server infrastructure.

3.1 Design principles

Like all the databases developed by the Secretariat within the ICCAT database system (ICCAT.DB), the tagging database (DB.TAG) adopted a number of standard design concepts. For the purposes of this document, the most important ones are:

‐ “CamelCase” naming convention (en.wikipedia.org/wiki/CamelCase): used for persistent objects such as tables, fields, indexes, functions, stored procedures, etc. (uppercase is for reserved words only).
‐ Normalization principle: a balanced database normalization form (a compromise between flexibility and strictness) was adopted in the design. Tables were normalized to at least the third normal form (3NF). Note that higher degrees of normalization typically involve more tables and a larger number of joins, which can reduce performance, while lower degrees of normalization are more flexible but more prone to errors (en.wikipedia.org/wiki/Database_normalization#Normal_forms).
‐ Unicode support: the entire system was designed to fully support the Unicode character set.
‐ Uniqueness of names: every name identifying a distinct object is unique across the entire ICCAT-DB (although the same object can exist in two distinct databases, e.g. the Species table is shared across various databases).
‐ Object naming rules: only the following characters are allowed in names: [A…Z; a…z; 0…9; “_”]. Spaces and other special characters are not allowed.
‐ Table/field naming rules: plural for table objects, singular for field objects (e.g. “Species” stores species data, with fields such as SpeciesID, SpeciesCode, SpecScientificName, etc.).
‐ Reserved suffixes: “ID” for the primary key (mostly integer type) unique identifier field; “Code” for a secondary (alphanumeric type) unique identifier field.

3.2 Field types standard formats

A set of standard generic formats was also adopted for certain field types. Table 1 describes the most important ones.

3.3 Database structures

Figure 1 gives a simplified view (table names and relationships) of the entire tagging database structure (detailed table definitions are presented in Appendix 1). The internal database structure has various functional modules (groups of tables and other objects serving a specific purpose), each one handling a specific job. The modules are:

‐ Release/recovery data: manages all the tagging information (releases/recoveries) received by the Secretariat from the ICCAT parties. Figure 2 shows its structure.
‐ ICCAT tags inventory: manages all the conventional tag series assembled for ICCAT over time (production, distribution of sets, etc.). The corresponding table structures and relationships are detailed in Figure 3.
‐ Tagging surveys (ongoing task): manages the tag survey information reported to the Secretariat. A preliminary structure of this module is shown in Figure 4.
‐ Historical updates (ongoing task): stores all the changes (records updated/replaced) made to the main tables.
This module handles only the changes made to the release/recovery main tables.
‐ Lottery (ongoing task): manages the ICCAT tag lottery system.

The modules interact with each other (sharing data, virtual linkage) in a way that is transparent to the user (trigger automation). For example, when a
