NCI Genomic Data Commons (GDC)

GDC DATA PROCESSING

GDC Data Processing Page 1 NCI Genomic Data Commons (GDC)

Genomic Data Processing

Genomic Data Processing Overview

The GDC uses submitted sequence data, in FASTQ or BAM format, and microarray data to generate derived analysis data. These derived data include tumor sequence variant calls, RNA-Seq gene expression quantification values, and copy-number segmentation values.

Sequence data is aligned (or realigned) to the latest human genome reference. The resulting alignments are then processed to produce derived data. The alignment and derived data are available to users via the GDC Data Portal [8]. Array data is processed using methods specific to each data type.

Each phase of processing is standardized into common pipelines that use open source sequence analysis tools. All sequence data submitted to the GDC is subjected to analysis through these standard pipelines. When data is successfully processed, it is made available through the GDC Data Portal and other access tools. If the data processing reveals underlying issues in the data, the associated files will be recalled and will not be available through the GDC.

The genomic data processing pipelines were developed in consultation with senior experts in the field of genomics and are regularly evaluated and updated as current tools and parameter sets are improved and developed.

Reference Genome and Alignment Workflow

Reference genome alignment is the first step of data processing for all sequencing-based workflows. Although the alignment algorithm differs by pipeline depending on read length and type, all alignments are performed against the same version of the GRCh38 reference genome. See the GDC Documentation site for details on the algorithm used for each pipeline. Viral and decoy sequences are included in the reference; they draw reads that would not normally map to the human genome, provide information on the presence of viral sequences, and allow for a more accurate alignment. The current virus decoy set contains 10 types of human viruses: human cytomegalovirus (CMV), Epstein-Barr virus (EBV), hepatitis B (HBV), hepatitis C (HCV), human immunodeficiency virus (HIV), human herpes virus 8 (HHV-8), human T-lymphotropic virus 1 (HTLV-1), Merkel cell polyomavirus (MCV), simian vacuolating virus 40 (SV40), and human papillomavirus (HPV).

An initial alignment is performed separately on each read group, which is defined as a set of reads that originates from one Illumina sequencing lane. The alignments that originate from a single aliquot are then merged. Pipeline-specific details about the alignment and downstream analyses can be found in the respective sections below or on the documentation site.
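The per-read-group alignment followed by a per-aliquot merge can be pictured as a simple grouping step. The sketch below is illustrative only: the record layout and the names `aliquot_id`, `read_group`, and `bam` are assumptions made for this example, not GDC identifiers, and no actual alignment or BAM merging is performed.

```python
from collections import defaultdict


def merge_by_aliquot(read_group_alignments):
    """Group per-read-group alignment files by the aliquot they came from.

    Each input record is a dict with hypothetical keys 'aliquot_id' and 'bam'.
    Returns a dict mapping each aliquot to the sorted list of files to merge.
    """
    merged = defaultdict(list)
    for record in read_group_alignments:
        merged[record["aliquot_id"]].append(record["bam"])
    # Sort file names so the merge plan is deterministic.
    return {aliquot: sorted(bams) for aliquot, bams in merged.items()}


# Hypothetical example: three read groups (lanes) from two aliquots.
alignments = [
    {"read_group": "RG1", "aliquot_id": "aliquot-A", "bam": "RG1.bam"},
    {"read_group": "RG2", "aliquot_id": "aliquot-A", "bam": "RG2.bam"},
    {"read_group": "RG3", "aliquot_id": "aliquot-B", "bam": "RG3.bam"},
]
merge_plan = merge_by_aliquot(alignments)
```

In a real pipeline, each list in the merge plan would be handed to a BAM merging tool; that step is omitted here.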

Figure: GDC Data Harmonization Pipeline [9]

GDC Pipeline Overviews

Brief summaries of the workflows used by the GDC are listed below. Each summary links to its corresponding section of the GDC Documentation Website [10]. The GDC Documentation website contains details about each step of the pipeline, the command-line parameters used to run each step, and information about the corresponding files available at the GDC Data Portal.

DNA-Seq WXS Somatic Variant Analysis

The DNA-Seq Somatic Variant Analysis pipeline identifies and characterizes somatic mutations by comparing reference alignments from tumor and normal samples from the same case. The validity of these mutations is assessed using internal algorithms and external variant databases. A co-cleaning step recalibrates base quality scores and realigns indels for a more accurate alignment. Four separate algorithms (MuSE [11], Mutect2 [12], SomaticSniper [13], VarScan2 [14]) are then used to perform variant calling on paired tumor/normal samples to identify somatic mutations. Variants are annotated independently and with information from external databases such as dbSNP [15] and OMIM [16]. All annotated variant calls from one project are then aggregated into one MAF file [17] per variant calling pipeline. MAF files are filtered to remove any potentially erroneous or germline variant calls. After filtering, open-access Somatic MAFs are available to the general public, whereas the unfiltered MAFs are available only to dbGaP-authorized investigators.
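The aggregation step described above (one MAF per project per variant caller) amounts to grouping annotated calls by project and caller. The following is a minimal sketch under stated assumptions: the record keys `project`, `caller`, and `variant` are hypothetical, the project name is a placeholder, and real MAF files carry many more columns.

```python
from collections import defaultdict

# The four callers named in the text above.
CALLERS = ("MuSE", "Mutect2", "SomaticSniper", "VarScan2")


def aggregate_by_caller(annotated_calls):
    """Collect annotated calls into one list per (project, caller) pair."""
    mafs = defaultdict(list)
    for call in annotated_calls:
        if call["caller"] not in CALLERS:
            raise ValueError(f"unknown caller: {call['caller']}")
        mafs[(call["project"], call["caller"])].append(call)
    return dict(mafs)


# Hypothetical annotated calls from two cases in one project.
calls = [
    {"project": "PROJ-1", "caller": "MuSE", "variant": "chr1:100:A>T"},
    {"project": "PROJ-1", "caller": "Mutect2", "variant": "chr1:100:A>T"},
    {"project": "PROJ-1", "caller": "MuSE", "variant": "chr2:200:G>C"},
]
project_mafs = aggregate_by_caller(calls)
```

Each key of `project_mafs` corresponds to one aggregated file; filtering of erroneous or germline calls would be applied afterward and is not modeled here.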

Figure: DNA-Seq-Flow [18]

See the GDC Documentation website for an overview of the:

DNA-Seq Analysis Pipeline [19]

VCF File Format [20]

MAF File Format [17]

DNA-Seq WGS Somatic Variant Analysis

The DNA-Seq WGS data is aligned using the same method as the WXS pipeline documented above. WGS variant calling uses a pipeline developed by the Sanger Institute. This pipeline calls somatic variants using CaVEMan and Pindel, copy number variants using ASCAT-NGS, and structural variants using BRASS.

Figure: WGS-Sanger-Pipeline [21]

See the GDC Documentation website for an overview of the:

DNA-Seq Analysis Pipeline [19]

RNA-Seq Gene Expression Analysis

The RNA-Seq Analysis pipeline quantifies protein-coding gene expression based on the number of reads aligned to each gene. A "two-pass" method is used in which RNA-Seq reads are first aligned to the reference genome to detect splice junctions. A second alignment is then performed using the splice junction information to increase the quality of the alignment. Read counts are measured at the gene level using HTSeq and normalized with custom scripts using the Fragments Per Kilobase of transcript per Million mapped reads (FPKM) and FPKM Upper Quartile (FPKM-UQ) methods. Transcript fusion files are also generated using STAR-Fusion and Arriba.
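The two normalizations named above can be sketched in a few lines. This is a minimal illustration, not the GDC's custom scripts: it assumes the common definition in which a gene's count is scaled by 10^9 and divided by the gene length in base pairs times either the total counted reads (FPKM) or the 75th-percentile gene count (FPKM-UQ). Which genes contribute to the total and to the percentile is an assumption here; the GDC documentation gives the exact definitions.

```python
from statistics import quantiles


def fpkm(counts, lengths):
    """FPKM per gene: count * 1e9 / (total counted reads * gene length in bp)."""
    total = sum(counts.values())
    return {g: c * 1e9 / (total * lengths[g]) for g, c in counts.items()}


def fpkm_uq(counts, lengths):
    """FPKM-UQ per gene: the total is replaced by the 75th-percentile count."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    upper_quartile = quantiles(counts.values(), n=4, method="inclusive")[2]
    return {g: c * 1e9 / (upper_quartile * lengths[g]) for g, c in counts.items()}


# Hypothetical read counts and gene lengths (bp) for two genes.
counts = {"GENE_A": 100, "GENE_B": 300}
lengths = {"GENE_A": 1000, "GENE_B": 2000}
fpkm_values = fpkm(counts, lengths)
fpkm_uq_values = fpkm_uq(counts, lengths)
```

Using the upper quartile instead of the total makes the scaling factor less sensitive to a handful of very highly expressed genes.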

Figure: RNA-Seq Gene Expression Analysis Pipeline [22]

See the GDC Documentation website for a detailed overview of the:

mRNA-Seq Analysis Pipeline [23]

scRNA-Seq Gene Expression Analysis

The scRNA-Seq Analysis pipeline generates counts using CellRanger, which are available in both filtered and raw format. Seurat is then used to perform a secondary expression analysis on the counts, which produces coordinates for several methods of graphical representation, differentially expressed genes, and the full analysis in loom format.

Figure: scRNA-Workflow [24]

See the GDC Documentation website for a detailed overview of the:

scRNA-Seq Analysis Pipeline [25]

miRNA-Seq Analysis

The miRNA-Seq pipeline quantifies microRNA gene expression. The name and genomic location of each miRNA are retrieved from miRBase [26], and expression levels are measured and normalized after alignment. Normalization is performed using the Reads Per Million (RPM) method. Expression levels for known miRNAs and observed miRNA isoforms are generated for each sample.

Figure: miRNA-Seq-Flow [27]

See the GDC Documentation website for a detailed overview of the:

miRNA Expression Pipeline [28]
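The RPM normalization used by the miRNA-Seq pipeline can be sketched as follows. This is a minimal example that assumes the denominator is the total number of reads counted across all miRNAs in the sample; the counts shown are hypothetical values for illustration only.

```python
def reads_per_million(counts):
    """Scale each miRNA read count to reads per million counted reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted; RPM is undefined")
    return {mirna: count * 1_000_000 / total for mirna, count in counts.items()}


# Hypothetical counts for two miRNAs in one sample.
mirna_counts = {"hsa-mir-21": 500, "hsa-let-7a-1": 1500}
rpm = reads_per_million(mirna_counts)
```

Because every count is divided by the same sample total, RPM values from samples sequenced to different depths become directly comparable.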

Copy Number Variation Analysis

The Copy Number Variation Analysis pipeline detects duplications or deletions of contiguous chromosomal regions by measuring array intensity levels. Level 2 [29] normalized array data is used to perform circular binary segmentation (CBS) analysis with the R package DNAcopy [30]. CBS analysis divides each chromosome into contiguous segments of equal copy number and quantifies each segment.
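To make the "quantifies each segment" step concrete, the sketch below summarizes segments once the breakpoints are already known. It does not implement circular binary segmentation itself (that is the job of DNAcopy); the function name, the input layout, and the output keys are assumptions chosen for illustration.

```python
from statistics import mean


def summarize_segments(positions, values, breakpoints):
    """Summarize contiguous segments of probes on one chromosome.

    positions:   probe coordinates, sorted ascending
    values:      per-probe normalized intensity values (same order)
    breakpoints: sorted probe indices at which a new segment starts
    """
    bounds = [0, *breakpoints, len(values)]
    segments = []
    for lo, hi in zip(bounds, bounds[1:]):
        segments.append({
            "start": positions[lo],
            "end": positions[hi - 1],
            "num_probes": hi - lo,
            "segment_mean": mean(values[lo:hi]),
        })
    return segments


# Hypothetical probes: three near-neutral values, then two elevated values.
positions = [100, 200, 300, 400, 500]
values = [0.0, 0.1, -0.1, 1.0, 1.2]
segments = summarize_segments(positions, values, breakpoints=[3])
```

Each resulting record describes one contiguous region with a single copy-number estimate, which is the general shape of a segmentation result.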

Figure: CNV-Flow [31]

See the GDC Documentation website for a detailed overview of the:

Copy Number Variation Analysis Pipeline [32]

Methylation Array Analysis

At the GDC, beta values calculated from methylation array analyses are processed by converting probe coordinates from an older reference genome to the newer GRCh38 reference genome. Processed methylation array probe sets are analyzed to determine the proximity of transcripts and CpG islands (CGIs) to each associated CpG site. This metadata is associated with each methylation beta value and its probe, which is matched to a specific CpG site.

Figure: Methylation-Flow [33]

See the GDC Documentation website for a detailed overview of the:

Methylation Liftover Pipeline [34]

Pipeline Implementation

GDC pipelines are packaged as a series of Docker containers [35]. Docker can wrap up a complete environment that contains everything a bioinformatics pipeline needs to run, including code, runtime, tools, and libraries. This approach significantly improves the reproducibility and portability of bioinformatics software on Linux systems. Realignment annotations, such as the Docker ID, run time, and exact command used in the Docker container, are stored as properties of the workflow for each file created. All other QC metrics and realignment tool logs are saved as individual files in the object store. For data remediation, the GDC examines all QC results manually to detect problems. The GDC will establish criteria and implement automatic remediation steps in the workflow.
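The idea of recording the Docker ID, run time, and exact command alongside each workflow can be sketched as below. This is an illustration under assumptions, not the GDC implementation: the `WorkflowAnnotation` class and its field names are hypothetical, the image ID is a placeholder, and the Python interpreter stands in for a containerized tool so that the example runs without Docker installed.

```python
import json
import shlex
import subprocess
import sys
import time
from dataclasses import asdict, dataclass


@dataclass
class WorkflowAnnotation:
    docker_id: str          # identifier of the container image (placeholder here)
    command: str            # exact command line that was executed
    elapsed_seconds: float  # wall-clock run time of the command


def run_and_annotate(docker_id, argv):
    """Run a command, timing it, and return the annotation record."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    elapsed = time.perf_counter() - start
    return WorkflowAnnotation(docker_id, shlex.join(argv), elapsed)


# Stand-in command: a no-op Python invocation instead of a real pipeline step.
annotation = run_and_annotate("sha256:<image-id>", [sys.executable, "-c", "pass"])
print(json.dumps(asdict(annotation), indent=2))
```

Storing the exact command string together with the image identifier is what allows a given output file to be regenerated later in the same environment.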

Biospecimen Data Standardization

Biospecimen data refers to information associated with the physical sample taken from a participant and its processing down to the aliquot level for sequencing experiments. This data falls into several key categories:

Standard Identifiers: project-unique identifiers and universally unique identifiers (UUIDs) that enable cases and samples to be referenced and linked to associated clinical and analytical data

Provenance: metadata that indicates the upstream sources of the sample (research program, research project, and donor individual) as well as the downstream products of sample processing (e.g., extracted DNA or RNA analyte)

Quality Control: metadata that expresses the values of quality control tests performed on biospecimens and analyzed products (e.g., percent tumor nuclei, RIN values, A260/A280 values)

For major NCI CCG programs, biospecimen data is provided by a Biospecimen Core Resource (BCR) under contract to the NCI. Data is submitted in an established, schema-valid XML format. This data includes program and project identifiers, UUIDs, and the relationships between case, sample, and aliquot. UUIDs submitted by BCRs are typically adopted by the GDC.

For other submitters, data in the BCR XML format is accepted. However, the GDC also provides a simpler means of submitting a minimal set of biospecimen data, in which the data may be formatted as a JSON or tab-delimited (TSV) text file and submitted to the GDC Submission Portal.
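As an illustration of the two simpler formats, the sketch below serializes one minimal sample record as JSON and as TSV. The property names (`type`, `submitter_id`, `cases`, `sample_type`, `tissue_type`), the dotted column name used for the nested link in TSV, and all values are assumptions made for this example; the downloadable templates in the GDC Data Dictionary define the actual layout.

```python
import csv
import io
import json

# Hypothetical minimal sample record; identifiers and values are placeholders.
sample = {
    "type": "sample",
    "submitter_id": "EXAMPLE-SAMPLE-01",
    "cases": {"submitter_id": "EXAMPLE-CASE-01"},
    "sample_type": "Primary Tumor",
    "tissue_type": "Tumor",
}


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names for tabular output."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_json(records):
    """Serialize records as a JSON array."""
    return json.dumps(records, indent=2)


def to_tsv(records):
    """Serialize records as tab-delimited text with a header row."""
    rows = [flatten(r) for r in records]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json([sample])
tsv_text = to_tsv([sample])
```

Both outputs carry the same information; the JSON form keeps the link to the parent case nested, while the TSV form flattens it into a single column.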

The GDC Data Model [36] uses a graph representation that places no technical limits on adjusting the entities and relationships. However, such changes may affect quality control, reporting, accounting, and the user interface and experience. Therefore, major changes to the model needed to support new biospecimen information will undergo review by the GDC Data Model Change Control Board.

Submitting Biospecimen Entities

Links to the dictionary entry for each biospecimen entity are listed below. Each entry contains information about each field and a downloadable template for submission.

Case [37]

Sample [38]

Portion [39]

Analyte [40]

Aliquot [41]

Read Group [42]

Slide [43]

Biospecimen Entity Field Information [44]

Term Category CDE Required?

a260 a280 ratio [45] Analyte 5432595 No

adapter name [46] Read Group --- No

adapter sequence [47] Read Group --- No

aliquot quantity [48] Aliquot --- No

aliquot volume [49] Aliquot --- No

amount [50] Aliquot --- No

amount [51] Analyte --- No

analyte quantity [52] Analyte --- No

analyte type id [53] Aliquot 5432508 No

analyte type id [54] Analyte 5432508 No

analyte type [55] Aliquot 2513915 No

analyte type [56] Analyte 2513915 Yes

analyte volume [57] Analyte --- No

base caller name [58] Read Group --- No

base caller version [59] Read Group --- No

biospecimen anatomic site [60] Sample 4742851 No

biospecimen laterality [61] Sample 2007875 No

bone marrow malignant cells [62] Slide --- No

catalog reference [63] Sample --- No

chipseq antibody [64] Read Group --- No

chipseq target [65] Read Group --- No

composition [66] Sample 5432591 No

concentration [67] Aliquot 5432594 No

concentration [68] Analyte 5432594 No

consent type [69] Case --- No

creation datetime [70] Portion 5432592 No

current weight [71] Sample 5432606 No

days to collection [72] Sample 3008340 No

days to consent [73] Case --- No

days to lost to followup [74] Case 6154721 No

days to sample procurement [75] Sample --- No

days to sequencing [76] Read Group --- No

diagnosis pathologically confirmed [77] Sample --- No

disease type [78] Case 6161017 No

distance normal to tumor [79] Sample 3088708 No

distributor reference [80] Sample --- No

experiment name [81] Read Group --- Yes

experimental protocol type [82] Analyte --- No

flow cell barcode [83] Read Group --- No

fragment maximum length [84] Read Group --- No

fragment mean length [85] Read Group --- No

fragment minimum length [86] Read Group --- No

fragment standard deviation length [87] Read Group --- No

fragmentation enzyme [88] Read Group --- No

freezing method [89] Sample 5432607 No

growth rate [90] Sample --- No

includes spike ins [91] Read Group --- No

index date [92] Case 6154722 No

initial weight [93] Sample 5432605 No

instrument model [94] Read Group 5432604 No

intermediate dimension [95] Sample --- No

is ffpe [96] Portion 4170557 No

is ffpe [97] Sample 4170557 No

is paired end [98] Read Group --- Yes

lane number [99] Read Group --- No

library name [100] Read Group --- Yes

library preparation kit catalog number [101] Read Group --- No

library preparation kit name [102] Read Group --- No

library preparation kit vendor [103] Read Group --- No

library preparation kit version [104] Read Group --- No

library selection [105] Read Group --- Yes

library strand [106] Read Group --- No

library strategy [107] Read Group --- Yes

longest dimension [108] Sample 5432602 No

lost to followup [109] Case 6161018 No

method of sample procurement [110] Sample --- No

multiplex barcode [111] Read Group --- No

no matched normal low pass wgs [112] Aliquot --- No

no matched normal targeted sequencing [113] Aliquot --- No

no matched normal wgs [114] Aliquot --- No

no matched normal wxs [115] Aliquot --- No

normal tumor genotype snp match [116] Analyte 4588156 No

number expect cells [117] Read Group --- No

number proliferating cells [118] Slide 5432636 No

oct embedded [119] Sample 5432538 No

passage count [120] Sample --- No

report uuid [121] Sample --- No

percent eosinophil infiltration [122] Slide 2897700 No

percent follicular component [123] Slide --- No

percent granulocyte infiltration [124] Slide 2897705 No

percent inflam infiltration [125] Slide 2897695 No

percent lymphocyte infiltration [126] Slide 2897710 No

percent monocyte infiltration [127] Slide 5455535 No

percent necrosis [128] Slide 2841237 No

percent neutrophil infiltration [129] Slide 2841267 No

percent normal cells [130] Slide 2841233 No

percent rhabdoid features [131] Slide 6790120 No

percent sarcomatoid features [132] Slide 2429786 No

percent stromal cells [133] Slide 2841241 No

percent tumor cells [134] Slide 5432686 No

percent tumor nuclei [135] Slide 2841225 No

platform [136] Read Group --- Yes

portion number [137] Portion 5432711 No

preservation method [138] Sample 5432521 No

primary site [139] Case 6161019 No

prostatic chips positive count [140] Slide --- No

prostatic chips total count [141] Slide --- No

prostatic involvement percent [142] Slide --- No

read group name [143] Read Group --- Yes

read length [144] Read Group --- Yes

ribosomal rna 28s 16s ratio [145] Analyte --- No

rin [146] Read Group 5278775 No

rna integrity number [147] Analyte --- No

sample ordinal [148] Sample --- No

sample type id [149] Sample --- No

sample type [150] Sample 3111302 Yes

section location [151] Slide --- Yes

selected normal low pass wgs [152] Aliquot --- No

selected normal targeted sequencing [153] Aliquot --- No

selected normal wgs [154] Aliquot --- No

selected normal wxs [155] Aliquot --- No

sequencing center [156] Read Group --- Yes

sequencing date [157] Read Group --- No

shortest dimension [158] Sample 5432603 No

single cell library [159] Read Group --- No

size selection range [160] Read Group --- No

source center [161] Aliquot --- No

spectrophotometer method [162] Analyte 3008378 No

spike ins concentration [163] Read Group --- No

spike ins fasta [164] Read Group --- No

target capture kit catalog number [165] Read Group --- No

target capture kit name [166] Read Group --- No

target capture kit target region [167] Read Group --- No

target capture kit vendor [168] Read Group --- No

target capture kit version [169] Read Group --- No

target capture kit [170] Read Group --- Yes

time between clamping and freezing [171] Sample 5432611 No

time between excision and freezing [172] Sample 5432612 No

tissue collection type [173] Sample --- No

tissue microarray coordinates [174] Slide --- No

tissue type [175] Sample 5432687 Yes

to trim adapter sequence [176] Read Group --- No

tumor code id [177] Sample --- No

tumor code [178] Sample --- No

tumor descriptor [179] Sample 3288124 No

weight [180] Portion 5432593 No

well number [181] Analyte 5432613 No
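The "Required?" column in the biospecimen table above lends itself to a simple pre-submission check. The sketch below lists the terms marked "Yes" for each category and reports which ones are missing from a record. The underscore-separated property names and the `missing_required` helper are assumptions for illustration; the GDC Data Dictionary remains the authoritative source for what is required.

```python
# Terms marked "Yes" in the table above, keyed by entity category.
# Underscore-separated names are an assumed spelling of the table terms.
REQUIRED_FIELDS = {
    "analyte": {"analyte_type"},
    "read_group": {
        "experiment_name", "is_paired_end", "library_name",
        "library_selection", "library_strategy", "platform",
        "read_group_name", "read_length", "sequencing_center",
        "target_capture_kit",
    },
    "sample": {"sample_type", "tissue_type"},
    "slide": {"section_location"},
}


def missing_required(entity_type, record):
    """Return the sorted list of required fields absent or empty in a record."""
    required = REQUIRED_FIELDS.get(entity_type, set())
    return sorted(f for f in required if record.get(f) in (None, ""))


# Hypothetical sample record that lacks a tissue type.
problems = missing_required("sample", {"sample_type": "Primary Tumor"})
```

A boolean value such as a false paired-end flag counts as present, because only `None` and the empty string are treated as missing.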

Clinical Data Standardization

Clinical data on cases enrolled in major NCI CCG programs are provided by Biospecimen Core Resources (BCRs) in schema-valid XML format. Other submitters may provide clinical data in tab-delimited text or JSON format via a GDC web page. Clinical data elements indexed by the GDC are described in the following table.

More information about each of the clinical elements, including their descriptions and value domains, can be found in the GDC Data Dictionary Viewer. The links for each clinical entity dictionary entry are listed below:

Demographic [182]

Diagnosis [183]

Exposure [184]

Family History [185]

Follow Up [186]

Molecular Test [187]

Pathology Detail [188]

Treatment [189]

Clinical Data Table [190]

Term Category CDE Required?

aa change [191] Molecular Test 6142508 No

additional pathology findings [192] Pathology Detail --- No

adrenal hormone [193] Diagnosis C2264 No

adverse event grade [194] Follow Up 2944515 No

adverse event [195] Follow Up 3125302 No

age at diagnosis [196] Diagnosis 3225640 Yes

age at index [197] Demographic 6028530 No

age at onset [198] Exposure --- No

age is obfuscated [199] Demographic --- No

aids risk factors [200] Follow Up --- No

ajcc clinical m [201] Diagnosis 3440331 No

ajcc clinical n [202] Diagnosis 3440330 No

ajcc clinical stage [203] Diagnosis 3440332 No

ajcc clinical t [204] Diagnosis 3440328 No

ajcc pathologic m [205] Diagnosis 3045439 No

ajcc pathologic n [206] Diagnosis 3203106 No

ajcc pathologic stage [207] Diagnosis 3203222 No

ajcc pathologic t [208] Diagnosis 3045435 No

ajcc staging system edition [209] Diagnosis 2722309 No

alcohol days per week [210] Exposure 3114013 No

alcohol drinks per day [211] Exposure 3124961 No

alcohol history [212] Exposure 2201918 No

alcohol intensity [213] Exposure 3457767 No

alcohol type [214] Exposure --- No

anaplasia present type [215] Pathology Detail 4925534 No

anaplasia present [216] Pathology Detail 6059599 No

ann arbor b symptoms described [217] Diagnosis --- No

ann arbor b symptoms [218] Diagnosis 2902402 No

ann arbor clinical stage [219] Diagnosis 5615604 No

ann arbor extranodal involvement [220] Diagnosis 3364582 No

ann arbor pathologic stage [221] Diagnosis 5615605 No

antigen [222] Molecular Test 6142523 No

asbestos exposure [223] Exposure 1253 No

barretts esophagus goblet cells present [224] Follow Up 3440216 No

best overall response [225] Diagnosis 2003324 No

biospecimen type [226] Molecular Test --- No

biospecimen volume [227] Molecular Test --- No

blood test normal range lower [228] Molecular Test 6142571 No

blood test normal range upper [229] Molecular Test 6142535 No

bmi [230] Follow Up 2006410 No

body surface area [231] Follow Up 653 No

bone marrow malignant cells [232] Pathology Detail --- No

breslow thickness [233] Pathology Detail 64809 No

burkitt lymphoma clinical variant [234] Diagnosis 3770421 No

cause of death source [235] Demographic --- No

cause of death [236] Demographic 2554674 No

cause of response [237] Follow Up 6161025 No

cd4 count [238] Follow Up 4182751 No

cdc hiv risk factors [239] Follow Up --- No

cell count [240] Molecular Test 6142528 No

chemo concurrent to radiation [241] Treatment --- No

child pugh classification [242] Diagnosis 2931791 No

chromosome [243] Molecular Test 6142404 No

cigarettes per day [244] Exposure 2001716 No

circumferential resection margin [245] Pathology Detail 6161030 No

classification of tumor [246] Diagnosis 3288124 No

clonality [247] Molecular Test --- No

coal dust exposure [248] Exposure --- No

cog liver stage [249] Diagnosis 6013618 No

cog neuroblastoma risk group [250] Diagnosis 4616452 No

cog renal stage [251] Diagnosis 6013641 No

cog rhabdomyosarcoma risk group [252] Diagnosis 6133604 No

columnar mucosa present [253] Pathology Detail --- No

comorbidity method of diagnosis [254] Follow Up 6142386 No

comorbidity [255] Follow Up 2970715 No

consistent pathology review [256] Pathology Detail --- No

copy number [257] Molecular Test 6142519 No

country of residence at enrollment [258] Demographic 7050286 No

cytoband [259] Molecular Test 6142405 No

days to adverse event [260] Follow Up 6154728 No

days to best overall response [261] Diagnosis 6154732 No

days to birth [262] Demographic 6154723 No

days to comorbidity [263] Follow Up --- No

days to death [264] Demographic 6154724 No

days to diagnosis [265] Diagnosis 6154733 No

days to follow up [266] Follow Up 6154727 Yes

days to imaging [267] Follow Up --- No

days to last follow up [268] Diagnosis 3008273 No

days to last known disease status [269] Diagnosis 3008273 No

days to progression free [270] Follow Up --- No

days to progression [271] Follow Up 6154730 No

days to recurrence [272] Diagnosis 6154731 No

days to recurrence [273] Follow Up 6154731 No

days to test [274] Molecular Test --- No

days to treatment end [275] Treatment 6154725 No

days to treatment start [276] Treatment 6154726 No

diabetes treatment type [277] Follow Up 3587247 No

disease response [278] Follow Up 5750671 No

dlco ref predictive percent [279] Follow Up 2180255 No

degree [280] Pathology Detail --- No

dysplasia type [281] Pathology Detail --- No

ecog performance status [282] Follow Up 88 No

eln risk classification [283] Diagnosis --- No

enneking msts grade [284] Diagnosis 6003955 No

enneking msts [285] Diagnosis 6003958 No

enneking msts stage [286] Diagnosis 6060045 No

enneking msts tumor site [287] Diagnosis 6003957 No

environmental tobacco smoke exposure [288] Exposure --- No

esophageal columnar dysplasia degree [289] Diagnosis 3440917 No

esophageal columnar metaplasia present [290] Diagnosis 3440218 No

ethnicity [291] Demographic 2192217 Yes

evidence of recurrence type [292] Follow Up --- No

exon [293] Molecular Test 6142411 No

exposure duration years [294] Exposure C83280 No

exposure duration [295] Exposure --- No

exposure type [296] Exposure --- No

eye color [297] Follow Up C157437 No

fev1 fvc post bronch percent [298] Follow Up 3302956 No

fev1 fvc pre bronch percent [299] Follow Up 3302955 No

fev1 ref post bronch percent [300] Follow Up 3302948 No

fev1 ref pre bronch percent [301] Follow Up 3302947 No

figo stage [302] Diagnosis 3225684 No

figo staging edition year [303] Diagnosis --- No

first symptom prior to diagnosis [304] Diagnosis 6133605 No

gastric esophageal junction involvement [305] Diagnosis 6059632 No

gender [306] Demographic 2200604 Yes

gene symbol [307] Molecular Test 6142392 Yes

gleason grade group [308] Diagnosis 5918370 No

gleason grade tertiary [309] Diagnosis --- No

gleason patterns percent [310] Diagnosis --- No

goblet cells columnar mucosa present [311] Diagnosis 3440219 No

greatest tumor dimension [312] Pathology Detail --- No

gross tumor weight [313] Pathology Detail 6133606 No

haart treatment indicator [314] Follow Up --- No

height [315] Follow Up 649 No

hepatitis sustained virological response [316] Follow Up 6423783 No

histone family [317] Molecular Test 6142511 No

histone variant [318] Molecular Test 6142515 No

history of tumor type [319] Follow Up --- No

history of tumor [320] Follow Up --- No

hiv viral load [321] Follow Up 2649682 No

hormonal contraceptive type [322] Follow Up --- No

hormonal contraceptive use [323] Follow Up --- No

hormone replacement therapy type [324] Follow Up --- No

hpv positive type [325] Follow Up 2922649 No

hysterectomy margins involved [326] Follow Up --- No

hysterectomy type [327] Follow Up --- No

icd 10 code [328] Diagnosis 3226287 No

igcccg stage [329] Diagnosis --- No

imaging result [330] Follow Up --- No

imaging type [331] Follow Up --- No

immunosuppressive treatment type [332] Follow Up --- No

initial disease status [333] Treatment --- No

inpc grade [334] Diagnosis 6133602 No

inpc histologic group [335] Diagnosis 4616372 No

inrg stage [336] Diagnosis 5777238 No

inss stage [337] Diagnosis 6133603 No

international prognostic index [338] Diagnosis 2500234 No

intron [339] Molecular Test 6514355 No

irs group [340] Diagnosis 6141658 No

irs stage [341] Diagnosis 5162089 No

ishak fibrosis score [342] Diagnosis 3182621 No

iss stage [343] Diagnosis 2465385 No

karnofsky performance status [344] Follow Up 2003853 No

laboratory test [345] Molecular Test --- No

largest extrapelvic peritoneal focus [346] Pathology Detail 6690680 No

last known disease status [347] Diagnosis 5424231 No

laterality [348] Diagnosis 827 No

loci abnormal count [349] Molecular Test 6074182 No

loci count [350] Molecular Test 6074183 No

locus [351] Molecular Test 6142506 No

lymph node involved site [352] Pathology Detail --- No

lymph node involvement [353] Pathology Detail --- No

lymph nodes positive [354] Pathology Detail 89 No

lymph nodes tested [355] Pathology Detail 3 No

lymphatic invasion present [356] Pathology Detail 64171 No

margin distance [357] Diagnosis --- No

margin status [358] Pathology Detail --- No

margins involved site [359] Diagnosis --- No

marijuana use per week [360] Exposure --- No

masaoka stage [361] Diagnosis 3952848 No

medulloblastoma molecular classification [362] Diagnosis 6002209 No

menopause status [363] Follow Up 2434914 No

metaplasia present [364] Pathology Detail --- No

metastasis at diagnosis site [365] Diagnosis 3029815 No

metastasis at diagnosis [366] Diagnosis 6133614 No

method of diagnosis [367] Diagnosis 6161031 No

micropapillary features [368] Diagnosis 6068784 No

mismatch repair mutation [369] Molecular Test 6142534 No

mitosis karyorrhexis index [370] Diagnosis 4616412 No

mitotic count [371] Diagnosis C47864 No

mitotic count [372] Molecular Test C47864 No

mitotic total area [373] Molecular Test --- No

molecular analysis method [374] Molecular Test 6142401 Yes

molecular consequence [375] Molecular Test 6142403 No

morphologic architectural pattern [376] Pathology Detail --- No

morphology [377] Diagnosis 3226275 Yes

nadir cd4 count [378] Follow Up --- No

necrosis percent [379] Pathology Detail --- No

necrosis present [380] Pathology Detail --- No

non nodal regional disease [381] Pathology Detail --- No

non nodal tumor deposits [382] Pathology Detail 3107051 No

number of cycles [383] Treatment --- No

number proliferating cells [384] Pathology Detail 5432636 No

occupation duration years [385] Demographic 2435424 No

ovarian specimen status [386] Diagnosis 6690671 No

ovarian surface involvement [387] Diagnosis 6690674 No

pack years smoked [388] Exposure 2955385 No

pancreatitis onset year [389] Follow Up 3457763 No

papillary renal cell type [390] Diagnosis --- No

parent with radiation exposure [391] Exposure --- No

pathogenicity [392] Molecular Test --- No

percent tumor invasion [393] Pathology Detail --- No

perineural invasion present [394] Pathology Detail 64181 No

peripancreatic lymph nodes positive [395] Pathology Detail 5983082 No

peripancreatic lymph nodes tested [396] Pathology Detail 6050944 No

peritoneal fluid cytological status [397] Diagnosis 6690681 No

ploidy [398] Molecular Test 6142527 No

pregnancy outcome [399] Follow Up --- No

pregnant at diagnosis [400] Diagnosis --- No

premature at birth [401] Demographic 6010765 No

primary diagnosis [402] Diagnosis 6161032 Yes

diagnosis is primary disease [403] Diagnosis --- Yes

primary gleason grade [404] Diagnosis 5936800 No

prior malignancy [405] Diagnosis 3382736 No

prior treatment [406] Diagnosis 4231463 No

procedures performed [407] Follow Up --- No

progression or recurrence anatomic site [408] Follow Up 6161026 No

progression or recurrence type [409] Follow Up 6142385 No

progression or recurrence [410] Diagnosis 3121376 No

progression or recurrence [411] Follow Up 3121376 No

prostatic chips positive count [412] Pathology Detail --- No

prostatic chips total count [413] Pathology Detail --- No

prostatic involvement percent [414] Pathology Detail --- No

race [415] Demographic 2192199 Yes

radon exposure [416] Exposure 2816352 No

reason treatment ended [417] Treatment --- No

recist targeted regions number [418] Follow Up --- No

recist targeted regions sum [419] Follow Up --- No

reflux treatment type [420] Follow Up 3440206 No

regimen or line of therapy [421] Treatment 6161024 No

relationship age at diagnosis [422] Family History 5300571 No

relationship gender [423] Family History 6161021 No

relationship primary diagnosis [424] Family History 6161022 No

relationship type [425] Family History 2690165 No

relative with cancer history [426] Family History 6161023 No

relatives with cancer history count [427] Family History --- No

residual disease [428] Diagnosis 2608702 No

residual tumor [429] Pathology Detail C4809 No

respirable crystalline silica exposure [430] Exposure --- No

rhabdoid percent [431] Pathology Detail 6790120 No

rhabdoid present [432] Pathology Detail --- No

risk factor treatment [433] Follow Up 6514356 No

risk factor [434] Follow Up 6142389 No

route of administration [435] Treatment C38114 No

sarcomatoid percent [436] Pathology Detail 2429786 No

sarcomatoid present [437] Pathology Detail --- No

satellite nodule present [438] Diagnosis --- No

scan tracer used [439] Follow Up --- No

second exon [440] Molecular Test --- No

second gene symbol [441] Molecular Test 6142393 No

secondary gleason grade [442] Diagnosis 5936802 No

secondhand smoke as child [443] Exposure 6841888 No

site of resection or biopsy [444] Diagnosis 6161034 Yes

sites of involvement [445] Diagnosis --- No

size extraocular nodule [446] Pathology Detail --- No

smokeless tobacco quit age [447] Exposure --- No

smoking frequency [448] Exposure --- No

specialized molecular test [449] Molecular Test 6142526 No

supratentorial localization [450] Diagnosis 3133891 No

synchronous malignancy [451] Diagnosis 6142390 No

test analyte type [452] Molecular Test 6142394 No

test result [453] Molecular Test 6142397 Yes

test units [454] Molecular Test 6142525 No

test value [455] Molecular Test 6142524 No

therapeutic agents [456] Treatment 2975232 No

time between waking and first smoke [457] Exposure 3279220 No

tissue or organ of origin [458] Diagnosis 6161035 Yes

tobacco smoking onset year [459] Exposure 2228604 No

tobacco smoking quit year [460] Exposure 2228610 No

tobacco smoking status [461] Exposure 2181650 No

tobacco use per day [462] Exposure --- No

transcript [463] Molecular Test 6142465 No

transglottic extension [464] Pathology Detail --- No

treatment anatomic site [465] Treatment 5615611 No

treatment arm [466] Treatment 7068995 No

treatment dose units [467] Treatment --- No

treatment dose [468] Treatment 2182728 No

treatment effect indicator [469] Treatment --- No

treatment effect [470] Treatment 6514354 No

treatment frequency [471] Treatment --- No

treatment intent type [472] Treatment 2793511 No

treatment or therapy [473] Treatment 4231463 No

treatment outcome [474] Treatment 5102383 No

treatment type [475] Treatment 5102381 No

tumor confined to organ of origin [476] Diagnosis 4925494 No

tumor depth [477] Diagnosis --- No

tumor focality [478] Diagnosis 3174022 No

tumor grade [479] Diagnosis 2785839 No

tumor largest dimension diameter [480] Pathology Detail 64215 No

tumor regression grade [481] Diagnosis 6471217 No

tumor thickness [482] Pathology Detail C176286 No

type of smoke exposure [483] Exposure --- No

type of tobacco used [484] Exposure --- No

undescended testis corrected age [485] Follow Up --- No

undescended testis corrected laterality [486] Follow Up --- No

undescended testis corrected method [487] Follow Up --- No

undescended testis corrected [488] Follow Up --- No

undescended testis history laterality [489] Follow Up --- No

undescended testis history [490] Follow Up C12326 No

variant origin [491] Molecular Test --- No

variant type [492] Molecular Test 6142402 No

vascular invasion present [493] Pathology Detail 64358 No

vascular invasion type [494] Pathology Detail 3168001 No

viral hepatitis serologies [495] Follow Up 4395982 No

vital status [496] Demographic 5 Yes

weeks gestation at birth [497] Demographic 2737369 No

weight [498] Follow Up 651 No

weiss assessment score [499] Diagnosis 3648744 No

who cns grade [500] Diagnosis --- No

who nte grade [501] Diagnosis --- No

wilms tumor histologic subtype [502] Diagnosis 4358735 No

year of birth [503] Demographic 2896954 No

year of death [504] Demographic 2897030 No

year of diagnosis [505] Diagnosis 2896960 No

years smoked [506] Exposure 3137957 No

zygosity [507] Molecular Test 6142510 No

GDC Reference Files

Reference files used by the GDC data harmonization and generation pipelines are provided below. MD5 checksums are provided for verifying file integrity after download. Additional files are also included to allow for reproduction of GDC pipeline analyses.
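As a minimal sketch of the integrity check described above, the following Python computes a file's MD5 digest in chunks and compares it with a published value. The function names and the local file path are illustrative assumptions, not part of any GDC tool.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_md5):
    """Compare a file's MD5 with a published value, ignoring case and padding."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, assuming the reference archive was saved locally under its original name, `matches_published_md5("GRCh38.d1.vd1.fa.tar.gz", "3ffbcfe2d05d43206f57f81ebb251dc9")` should return `True` for an intact download.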

GRCh38.d1.vd1 Reference Sequence

GRCh38.d1.vd1.fa.tar.gz [508]

md5: 3ffbcfe2d05d43206f57f81ebb251dc9 file size: 875.3 MB

This reference genome is used by the GDC for all sequencing and array-based analyses. This file is composed of the following sequences:

GCA_000001405.15_GRCh38_no_alt_analysis_set [509]
Sequence Decoys (GenBank Accession GCA_000786075) [510]
Virus Sequences [511]

Index Files

Index files are built from the GDC reference genome and are used with the software listed below.

GDC.h38.d1.vd1 BWA Index Files

GRCh38.d1.vd1_BWA.tar.gz [512]

md5: 015f5223bddd93b6e8f7a038c171f7be file size: 3.2 GB

GDC.h38.d1.vd1 GATK Index Files

GRCh38.d1.vd1_GATK_indices.tar.gz [513]

md5: f64be73587a7f376c0d8353f1636dca7 file size: 104 KB

GDC.h38.d1.vd1 STAR2 Index Files

star.index.genome.d1.vd1.gtfv22.tar.gz [514]

md5: 7c2e6bd5767239c7c9eb618cd03bcadb file size: 24.9 GB

Annotation Files

Annotation files contain information about the position and identity of regions in the reference genome. They allow software to calculate expression values.

GDC.h38 miRNA database files

mirna_database.tar.gz [515]

md5: d078aec8561d72b52e475e3f932865e4 file size: 185 MB

GDC.h38 GENCODE v22 GTF (used in RNA-Seq alignment and by HTSeq)

gencode.v22.annotation.gtf.gz [516]

md5: 291330bdcff1094bc4d5645de35e0871 file size: 39.0 MB
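To illustrate how an annotation file ties positions to identities, here is a minimal sketch that parses one line of the standard nine-column, tab-separated GTF layout. The function names are hypothetical and the sample line is illustrative only. The GDC pipelines themselves rely on established tools such as HTSeq rather than hand-written parsers.

```python
def parse_gtf_attributes(field):
    """Turn the GTF attribute column ('key "value"; ...') into a dict."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        # Some keys (for example "tag") can repeat; this sketch keeps the first.
        attributes.setdefault(key, value.strip().strip('"'))
    return attributes


def parse_gtf_line(line):
    """Parse one GTF record; return None for comments and malformed lines."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return None
    return {
        "seqname": cols[0],
        "feature": cols[2],
        "start": int(cols[3]),  # GTF coordinates are 1-based and inclusive
        "end": int(cols[4]),
        "strand": cols[6],
        "attributes": parse_gtf_attributes(cols[8]),
    }


# Illustrative record in GTF layout (not guaranteed to match the file verbatim).
sample_line = (
    "chr1\tHAVANA\tgene\t11869\t14409\t.\t+\t.\t"
    'gene_id "ENSG00000223972.5"; gene_name "DDX11L1"; level 2;'
)
record = parse_gtf_line(sample_line)
print(record["attributes"]["gene_name"], record["start"], record["end"])
```

A quantification tool would use records like this to decide which gene each aligned read overlaps.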

GDC.h38 Flattened GENCODE v22 GFF (used by DEXSeq for exon quantification)

gencode.v22.annotation.flattened.gff [517]

md5: 5a843572abc2321e42eaafbe99c47363 file size: 125 MB

GDC.h38 GENCODE TSV

gencode.gene.info.v22.tsv [518]

md5: 0a3f1d9b0a679e2a426de36d8d74fbf9 file size: 6 MB

Miscellaneous Files

Antibody Description Files for TCGA RPPA Data

TCGA_antibodies_descriptions.gencode.v22.tsv [519]

md5: b5d84afabed98a034121372df01d726f file size: 35 KB

Genome Annotation Files for Legacy TCGA Data

TCGA.hg19.June2011.gaf [520]

md5: b9e0c2b81736d82d62bb6ab8cc517644 file size: 629 MB

TCGA.hg18.Feb2011.gaf [521]

md5: 9a5c05c5b836ec19517871f30f2bccba file size: 558 MB

SNP6 GRCh38 Remapped Probeset File for Copy Number Variation Analysis

snp6.na35.remap.hg38.subset.txt.gz [522]

md5: 051457f33d264d74825a41d6b0378ac4 file size: 14.1 MB

Data Release 12 and after

If you are using Masked Copy Number Segment files for GISTIC analysis, keep only probesets with freqcnv = FALSE.
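The probeset filter can be sketched as follows. This assumes the probeset file is tab-delimited with a header row containing a `freqcnv` column whose values are `TRUE` or `FALSE`. The other column names and the sample rows are made up for illustration, so check them against the downloaded file.

```python
import csv
import io


def read_probesets(handle):
    """Read a tab-delimited probeset table with a header row."""
    return csv.DictReader(handle, delimiter="\t")


def keep_non_frequent_cnv(rows, flag_column="freqcnv"):
    """Yield only the rows whose freqcnv flag is FALSE."""
    for row in rows:
        if row[flag_column].strip().upper() == "FALSE":
            yield row


# Hypothetical rows standing in for the real probeset file.
sample = io.StringIO(
    "probeid\tchr\tpos\tfreqcnv\n"
    "probe_a\t1\t1000\tFALSE\n"
    "probe_b\t1\t2000\tTRUE\n"
    "probe_c\t2\t3000\tFALSE\n"
)
kept = [row["probeid"] for row in keep_non_frequent_cnv(read_probesets(sample))]
print(kept)  # ['probe_a', 'probe_c']
```

For the real file, the same functions can be applied to a handle opened with `gzip.open(path, "rt")`, since the download is gzip-compressed.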

SNP6 GRCh38 Liftover Probeset File for Copy Number Variation Analysis

snp6.na35.liftoverhg38.txt.zip [523]

md5: 0f982112bc81f31f1ad49a785a10305f file size: 14.4 MB

Before Data Release 12

GDC VEP Cache File

homo_sapiens.tar.gz [524]

md5: 57064d0b081f0b99b2663977121f23c5 file size: 4.6 GB

GDC Panel of Normal (PON) Files used for Variant Calling

These files are controlled and require dbGaP access to download. You will need to use the gdc-client [525] to download these.

For Tumor-Only Variant Calling Pipeline

gatk4_mutect2_4136_pon.vcf.tar

uuid: 6c4c4a48-3589-4fc0-b1fd-ce56e88c06e4 md5: 725d891e02ca93edaabac8b09322439e file size: 92 MB

For Tumor / Normal Variant Calling Pipeline

MuTect2.PON.4136.vcf.tar

uuid: 6b45b9f7-893e-4947-83b6-db0402471e23 md5: d13a138dcf4e9f1ec8a69ac3a4f64ca9 file size: 121 MB

MuTect2.PON.5210.vcf.tar

uuid: 726e24c0-d2f2-41a8-9435-f85f22e1c832 md5: 5b5c1c3e208aa9a403cc4a8ff39e7f1f file size: 146 MB
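As a rough sketch of retrieving the controlled PON files listed above, the following Python builds one gdc-client command line per UUID. It assumes that `gdc-client download` accepts `-t` for an authentication token file and `-d` for an output directory; confirm this with `gdc-client download --help` for your installed version. The token file name and output directory are placeholders.

```python
import shlex

# UUIDs copied from the listing above.
PON_UUIDS = {
    "gatk4_mutect2_4136_pon.vcf.tar": "6c4c4a48-3589-4fc0-b1fd-ce56e88c06e4",
    "MuTect2.PON.4136.vcf.tar": "6b45b9f7-893e-4947-83b6-db0402471e23",
    "MuTect2.PON.5210.vcf.tar": "726e24c0-d2f2-41a8-9435-f85f22e1c832",
}


def build_download_command(uuid, token_file, out_dir="."):
    """Return the argument list for one gdc-client download (flags assumed)."""
    return ["gdc-client", "download", "-t", token_file, "-d", out_dir, uuid]


for name, uuid in PON_UUIDS.items():
    # Placeholder token file and directory; replace with your own paths.
    command = build_download_command(uuid, "gdc-user-token.txt", "pon_files")
    print(name, "->", shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` once a valid token with dbGaP authorization is in place. After download, compare the md5 values listed above against the files received.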

Source URL: https://gdc.cancer.gov/about-data/gdc-data-processing

Links:
[1] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization-0
[2] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization-0#Overview
[3] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization-0#ReferenceAlignment
[4] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/genomic-data-harmonization-0#Pipelines
[5] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/biospecimen-data-harmonization
[6] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/clinical-data-harmonization
[7] https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
[8] https://portal.gdc.cancer.gov/
[9] https://gdc.cancer.gov/files/public/image/gdc-data-harmonization-pipeline.png
[10] https://docs.gdc.cancer.gov
[11] http://www.biorxiv.org/content/early/2016/05/25/055467.abstract
[12] http://www.nature.com/nbt/journal/v31/n3/abs/nbt.2514.html
[13] http://bioinformatics.oxfordjournals.org/content/28/3/311.short
[14] http://genome.cshlp.org/content/22/3/568.short
[15] https://www.ncbi.nlm.nih.gov/SNP/
[16] https://www.ncbi.nlm.nih.gov/omim
[17] https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/
[18] https://gdc.cancer.gov/files/public/image/Flow_DNA_Seq.png
[19] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/
[20] https://docs.gdc.cancer.gov/Data/File_Formats/VCF_Format/
[21] https://gdc.cancer.gov/files/public/image/WGS-Sanger-DR28.png
[22] https://gdc.cancer.gov/files/public/image/RNA-Seq-DR29.png
[23] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/
[24] https://gdc.cancer.gov/files/public/image/scRNA-Pipeline-DR28.png
[25] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline/#scrna-seq-pipeline
[26] http://www.mirbase.org/
[27] https://gdc.cancer.gov/files/public/image/Flow_miRNA_Seq.png
[28] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/miRNA_Pipeline/
[29] https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels
[30] https://bioconductor.org/packages/release/bioc/html/DNAcopy.html
[31] https://gdc.cancer.gov/files/public/image/Flow_CNV.png
[32] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/CNV_Pipeline/
[33] https://gdc.cancer.gov/files/public/image/Flow_Methylation.png
[34] https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Methylation_LO_Pipeline/
[35] https://www.docker.com/what-docker
[36] https://gdc.cancer.gov/developers/gdc-data-model
[37] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case
[38] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample
[39] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=portion
[40] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte
[41] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot
[42] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group
[43] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide

[44] https://gdc.cancer.gov/biospecimen-entity-field-information
[45] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=a260_a280_ratio
[46] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=adapter_name
[47] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=adapter_sequence
[48] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=aliquot_quantity
[49] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=aliquot_volume
[50] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=amount
[51] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=amount
[52] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=analyte_quantity
[53] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=analyte_type_id
[54] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=analyte_type_id
[55] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=analyte_type
[56] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=analyte_type
[57] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=analyte_volume
[58] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=base_caller_name
[59] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=base_caller_version
[60] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=biospecimen_anatomic_site
[61] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=biospecimen_laterality
[62] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=bone_marrow_malignant_cells
[63] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=catalog_reference
[64] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=chipseq_antibody
[65] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=chipseq_target
[66] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=composition
[67] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=concentration
[68] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=concentration
[69] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=consent_type
[70] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=portion&anchor=creation_datetime
[71] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=current_weight
[72] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=days_to_collection
[73] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=days_to_consent
[74] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=days_to_lost_to_followup
[75] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=days_to_sample_procurement
[76] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=days_to_sequencing
[77] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=diagnosis_pathologically_confirmed
[78] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=disease_type
[79] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=distance_normal_to_tumor
[80] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=distributor_reference
[81] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=experiment_name
[82] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=experimental_protocol_type
[83] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=flow_cell_barcode
[84] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=fragment_maximum_length
[85] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=fragment_mean_length
[86] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=fragment_minimum_length
[87] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=fragment_standard_deviation_length
[88] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=fragmentation_enzyme
[89] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=freezing_method
[90] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=growth_rate
[91] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=includes_spike_ins
[92] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=index_date
[93] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=initial_weight
[94] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=instrument_model
[95] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=intermediate_dimension
[96] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=portion&anchor=is_ffpe
[97] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=is_ffpe
[98] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=is_paired_end
[99] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=lane_number
[100] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_name
[101] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_preparation_kit_catalog_number
[102] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_preparation_kit_name
[103] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_preparation_kit_vendor
[104] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_preparation_kit_version
[105] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_selection
[106] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_strand
[107] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=library_strategy
[108] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=longest_dimension
[109] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=lost_to_followup
[110] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=method_of_sample_procurement
[111] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=multiplex_barcode
[112] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=no_matched_normal_low_pass_wgs
[113] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=no_matched_normal_targeted_sequencing
[114] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=no_matched_normal_wgs
[115] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=no_matched_normal_wxs
[116] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=normal_tumor_genotype_snp_match
[117] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=number_expect_cells
[118] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=number_proliferating_cells
[119] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=oct_embedded
[120] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=passage_count
[121] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=pathology_report_uuid
[122] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_eosinophil_infiltration
[123] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_follicular_component
[124] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_granulocyte_infiltration
[125] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_inflam_infiltration
[126] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_lymphocyte_infiltration
[127] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_monocyte_infiltration
[128] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_necrosis
[129] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_neutrophil_infiltration
[130] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_normal_cells
[131] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_rhabdoid_features
[132] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_sarcomatoid_features
[133] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_stromal_cells
[134] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_tumor_cells
[135] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=percent_tumor_nuclei
[136] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=platform
[137] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=portion&anchor=portion_number
[138] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=preservation_method
[139] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=case&anchor=primary_site
[140] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=prostatic_chips_positive_count
[141] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=prostatic_chips_total_count
[142] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=prostatic_involvement_percent
[143] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=read_group_name
[144] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=read_length
[145] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=ribosomal_rna_28s_16s_ratio
[146] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=rin
[147] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=rna_integrity_number
[148] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=sample_ordinal
[149] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=sample_type_id
[150] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=sample_type
[151] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=section_location
[152] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=selected_normal_low_pass_wgs
[153] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=selected_normal_targeted_sequencing
[154] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=selected_normal_wgs
[155] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=selected_normal_wxs
[156] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=sequencing_center
[157] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=sequencing_date
[158] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=shortest_dimension
[159] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=single_cell_library
[160] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=size_selection_range
[161] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=aliquot&anchor=source_center
[162] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=spectrophotometer_method
[163] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=spike_ins_concentration
[164] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=spike_ins_fasta
[165] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit_catalog_number
[166] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit_name
[167] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit_target_region
[168] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit_vendor
[169] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit_version
[170] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=target_capture_kit
[171] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=time_between_clamping_and_freezing
[172] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=time_between_excision_and_freezing
[173] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=tissue_collection_type
[174] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=slide&anchor=tissue_microarray_coordinates
[175] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=tissue_type
[176] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=read_group&anchor=to_trim_adapter_sequence
[177] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=tumor_code_id
[178] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=tumor_code
[179] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=sample&anchor=tumor_descriptor
[180] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=portion&anchor=weight
[181] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=analyte&anchor=well_number
[182] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic
[183] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis
[184] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure
[185] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history
[186] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up
[187] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test
[188] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail
[189] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment
[190] https://gdc.cancer.gov/clinical-data-table
[191] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=aa_change
[192] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=additional_pathology_findings
[193] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=adrenal_hormone
[194] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=adverse_event_grade
[195] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=adverse_event
[196] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=age_at_diagnosis
[197] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=age_at_index

[198] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=age_at_onset
[199] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=age_is_obfuscated
[200] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=aids_risk_factors
[201] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_clinical_m
[202] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_clinical_n
[203] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_clinical_stage
[204] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_clinical_t
[205] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_pathologic_m
[206] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_pathologic_n
[207] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_pathologic_stage
[208] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_pathologic_t
[209] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ajcc_staging_system_edition
[210] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=alcohol_days_per_week
[211] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=alcohol_drinks_per_day
[212] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=alcohol_history
[213] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=alcohol_intensity
[214] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=alcohol_type
[215] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=anaplasia_present_type
[216] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=anaplasia_present
[217] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ann_arbor_b_symptoms_described
[218] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ann_arbor_b_symptoms
[219] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ann_arbor_clinical_stage
[220] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ann_arbor_extranodal_involvement
[221] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ann_arbor_pathologic_stage
[222] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=antigen

[223] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=asbestos_exposure
[224] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=barretts_esophagus_goblet_cells_present
[225] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=best_overall_response
[226] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=biospecimen_type
[227] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=biospecimen_volume
[228] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=blood_test_normal_range_lower
[229] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=blood_test_normal_range_upper
[230] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=bmi
[231] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=body_surface_area
[232] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=bone_marrow_malignant_cells
[233] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=breslow_thickness
[234] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=burkitt_lymphoma_clinical_variant
[235] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=cause_of_death_source
[236] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=cause_of_death
[237] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=cause_of_response
[238] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=cd4_count
[239] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=cdc_hiv_risk_factors
[240] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=cell_count
[241] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=chemo_concurrent_to_radiation
[242] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=child_pugh_classification
[243] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=chromosome
[244] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=cigarettes_per_day
[245] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=circumferential_resection_margin
[246] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=classification_of_tumor
[247] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=clonality

[248] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=coal_dust_exposure
[249] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=cog_liver_stage
[250] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=cog_neuroblastoma_risk_group
[251] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=cog_renal_stage
[252] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=cog_rhabdomyosarcoma_risk_group
[253] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=columnar_mucosa_present
[254] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=comorbidity_method_of_diagnosis
[255] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=comorbidity
[256] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=consistent_pathology_review
[257] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=copy_number
[258] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=country_of_residence_at_enrollment
[259] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=cytoband
[260] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_adverse_event
[261] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=days_to_best_overall_response
[262] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=days_to_birth
[263] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_comorbidity
[264] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=days_to_death
[265] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=days_to_diagnosis
[266] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_follow_up
[267] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_imaging
[268] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=days_to_last_follow_up
[269] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=days_to_last_known_disease_status
[270] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_progression_free
[271] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_progression
[272] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=days_to_recurrence

[273] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=days_to_recurrence
[274] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=days_to_test
[275] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=days_to_treatment_end
[276] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=days_to_treatment_start
[277] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=diabetes_treatment_type
[278] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=disease_response
[279] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=dlco_ref_predictive_percent
[280] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=dysplasia_degree
[281] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=dysplasia_type
[282] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=ecog_performance_status
[283] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=eln_risk_classification
[284] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=enneking_msts_grade
[285] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=enneking_msts_metastasis
[286] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=enneking_msts_stage
[287] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=enneking_msts_tumor_site
[288] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=environmental_tobacco_smoke_exposure
[289] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=esophageal_columnar_dysplasia_degree
[290] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=esophageal_columnar_metaplasia_present
[291] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=ethnicity
[292] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=evidence_of_recurrence_type
[293] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=exon
[294] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=exposure_duration_years
[295] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=exposure_duration
[296] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=exposure_type
[297] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=eye_color

[298] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=fev1_fvc_post_bronch_percent
[299] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=fev1_fvc_pre_bronch_percent
[300] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=fev1_ref_post_bronch_percent
[301] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=fev1_ref_pre_bronch_percent
[302] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=figo_stage
[303] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=figo_staging_edition_year
[304] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=first_symptom_prior_to_diagnosis
[305] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=gastric_esophageal_junction_involvement
[306] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=gender
[307] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=gene_symbol
[308] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=gleason_grade_group
[309] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=gleason_grade_tertiary
[310] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=gleason_patterns_percent
[311] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=goblet_cells_columnar_mucosa_present
[312] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=greatest_tumor_dimension
[313] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=gross_tumor_weight
[314] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=haart_treatment_indicator
[315] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=height
[316] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hepatitis_sustained_virological_response
[317] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=histone_family
[318] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=histone_variant
[319] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=history_of_tumor_type
[320] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=history_of_tumor
[321] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hiv_viral_load
[322] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hormonal_contraceptive_type

[323] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hormonal_contraceptive_use
[324] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hormone_replacement_therapy_type
[325] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hpv_positive_type
[326] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hysterectomy_margins_involved
[327] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=hysterectomy_type
[328] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=icd_10_code
[329] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=igcccg_stage
[330] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=imaging_result
[331] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=imaging_type
[332] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=immunosuppressive_treatment_type
[333] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=initial_disease_status
[334] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=inpc_grade
[335] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=inpc_histologic_group
[336] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=inrg_stage
[337] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=inss_stage
[338] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=international_prognostic_index
[339] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=intron
[340] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=irs_group
[341] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=irs_stage
[342] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ishak_fibrosis_score
[343] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=iss_stage
[344] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=karnofsky_performance_status
[345] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=laboratory_test
[346] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=largest_extrapelvic_peritoneal_focus
[347] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=last_known_disease_status

[348] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=laterality
[349] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=loci_abnormal_count
[350] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=loci_count
[351] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=locus
[352] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=lymph_node_involved_site
[353] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=lymph_node_involvement
[354] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=lymph_nodes_positive
[355] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=lymph_nodes_tested
[356] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=lymphatic_invasion_present
[357] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=margin_distance
[358] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=margin_status
[359] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=margins_involved_site
[360] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=marijuana_use_per_week
[361] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=masaoka_stage
[362] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=medulloblastoma_molecular_classification
[363] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=menopause_status
[364] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=metaplasia_present
[365] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=metastasis_at_diagnosis_site
[366] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=metastasis_at_diagnosis
[367] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=method_of_diagnosis
[368] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=micropapillary_features
[369] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=mismatch_repair_mutation
[370] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=mitosis_karyorrhexis_index
[371] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=mitotic_count
[372] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=mitotic_count

[373] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=mitotic_total_area
[374] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=molecular_analysis_method
[375] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=molecular_consequence
[376] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=morphologic_architectural_pattern
[377] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=morphology
[378] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=nadir_cd4_count
[379] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=necrosis_percent
[380] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=necrosis_present
[381] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=non_nodal_regional_disease
[382] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=non_nodal_tumor_deposits
[383] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=number_of_cycles
[384] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=number_proliferating_cells
[385] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=occupation_duration_years
[386] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ovarian_specimen_status
[387] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=ovarian_surface_involvement
[388] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=pack_years_smoked
[389] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=pancreatitis_onset_year
[390] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=papillary_renal_cell_type
[391] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=parent_with_radiation_exposure
[392] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=pathogenicity
[393] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=percent_tumor_invasion
[394] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=perineural_invasion_present
[395] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=peripancreatic_lymph_nodes_positive
[396] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=peripancreatic_lymph_nodes_tested
[397] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=peritoneal_fluid_cytological_status

[398] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=ploidy
[399] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=pregnancy_outcome
[400] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=pregnant_at_diagnosis
[401] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=premature_at_birth
[402] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=primary_diagnosis
[403] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=diagnosis_is_primary_disease
[404] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=primary_gleason_grade
[405] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=prior_malignancy
[406] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=prior_treatment
[407] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=procedures_performed
[408] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=progression_or_recurrence_anatomic_site
[409] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=progression_or_recurrence_type
[410] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=progression_or_recurrence
[411] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=progression_or_recurrence
[412] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=prostatic_chips_positive_count
[413] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=prostatic_chips_total_count
[414] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=prostatic_involvement_percent
[415] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=race
[416] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=radon_exposure
[417] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=reason_treatment_ended
[418] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=recist_targeted_regions_number
[419] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=recist_targeted_regions_sum
[420] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=reflux_treatment_type
[421] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=regimen_or_line_of_therapy
[422] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relationship_age_at_diagnosis

[423] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relationship_gender
[424] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relationship_primary_diagnosis
[425] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relationship_type
[426] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relative_with_cancer_history
[427] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=family_history&anchor=relatives_with_cancer_history_count
[428] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=residual_disease
[429] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=residual_tumor
[430] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=respirable_crystalline_silica_exposure
[431] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=rhabdoid_percent
[432] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=rhabdoid_present
[433] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=risk_factor_treatment
[434] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=risk_factor
[435] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=route_of_administration
[436] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=sarcomatoid_percent
[437] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=sarcomatoid_present
[438] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=satellite_nodule_present
[439] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=scan_tracer_used
[440] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=second_exon
[441] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=second_gene_symbol
[442] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=secondary_gleason_grade
[443] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=secondhand_smoke_as_child
[444] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=site_of_resection_or_biopsy
[445] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=sites_of_involvement
[446] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=size_extraocular_nodule
[447] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=smokeless_tobacco_quit_age

[448] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=smoking_frequency
[449] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=specialized_molecular_test
[450] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=supratentorial_localization
[451] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=synchronous_malignancy
[452] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=test_analyte_type
[453] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=test_result
[454] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=test_units
[455] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=test_value
[456] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=therapeutic_agents
[457] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=time_between_waking_and_first_smoke
[458] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tissue_or_organ_of_origin
[459] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=tobacco_smoking_onset_year
[460] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=tobacco_smoking_quit_year
[461] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=tobacco_smoking_status
[462] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=tobacco_use_per_day
[463] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=transcript
[464] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=transglottic_extension
[465] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_anatomic_site
[466] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_arm
[467] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_dose_units
[468] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_dose
[469] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_effect_indicator
[470] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_effect
[471] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_frequency
[472] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_intent_type

[473] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_or_therapy
[474] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_outcome
[475] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=treatment&anchor=treatment_type
[476] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tumor_confined_to_organ_of_origin
[477] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tumor_depth
[478] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tumor_focality
[479] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tumor_grade
[480] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=tumor_largest_dimension_diameter
[481] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=tumor_regression_grade
[482] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=tumor_thickness
[483] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=type_of_smoke_exposure
[484] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=type_of_tobacco_used
[485] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_corrected_age
[486] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_corrected_laterality
[487] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_corrected_method
[488] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_corrected
[489] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_history_laterality
[490] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=undescended_testis_history
[491] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=variant_origin
[492] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=variant_type
[493] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=vascular_invasion_present
[494] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=pathology_detail&anchor=vascular_invasion_type
[495] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=viral_hepatitis_serologies
[496] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=vital_status
[497] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=weeks_gestation_at_birth

[498] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=follow_up&anchor=weight
[499] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=weiss_assessment_score
[500] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=who_cns_grade
[501] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=who_nte_grade
[502] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=wilms_tumor_histologic_subtype
[503] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=year_of_birth
[504] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=demographic&anchor=year_of_death
[505] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=diagnosis&anchor=year_of_diagnosis
[506] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=exposure&anchor=years_smoked
[507] https://docs.gdc.cancer.gov/Data_Dictionary/viewer/#?view=table-definition-view&id=molecular_test&anchor=zygosity
[508] https://api.gdc.cancer.gov/data/254f697d-310d-4d7d-a27b-27fbf767a834
[509] ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh38/seqs_for_alignment_pipelines/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
[510] ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000786075.2_hs38d1/GCA_000786075.2_hs38d1_genomic.fna.gz
[511] https://gdc.cancer.gov/files/public/file/GRCh83.d1.vd1_virus_decoy.txt
[512] https://api.gdc.cancer.gov/data/25217ec9-af07-4a17-8db9-101271ee7225
[513] https://api.gdc.cancer.gov/data/2c5730fb-0909-4e2a-8a7a-c9a7f8b2dad5
[514] https://api.gdc.cancer.gov/data/50e11f67-ceb4-41f9-a73c-2a5aa9ed0af0
[515] https://api.gdc.cancer.gov/data/683367de-81c9-408c-85fd-2391f3e537ee
[516] https://api.gdc.cancer.gov/data/25aa497c-e615-4cb7-8751-71f744f9691f
[517] https://api.gdc.cancer.gov/data/45fd5d98-b5cd-45ff-9d02-fc3f260532cc
[518] https://api.gdc.cancer.gov/data/b011ee3e-14d8-4a97-aed4-e0b10f6bbe82
[519] https://api.gdc.cancer.gov/v0/data/c3802b58-a2bf-41dc-8a67-99e8610a1e82
[520] https://api.gdc.cancer.gov/v0/data/93ec34fc-bbc6-426e-8d4b-cde53aba66bb
[521] https://api.gdc.cancer.gov/data/70c40bbe-c332-4613-a0b5-e5b25b20d016
[522] https://api.gdc.cancer.gov/data/9bd7cbce-80f9-449e-8007-ddc9b1e89dfb
[523] https://api.gdc.cancer.gov/data/77fbfff6-2acc-47ca-a5f6-c488beb46879
[524] https://api.gdc.cancer.gov/data/8b9278b3-1e0c-430a-aae5-a944428401c0
[525] https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
