Mutation Profiling of Mismatch Repair-Deficient Colorectal Cancers Using an in Silico Genome Scan to Identify Coding Microsatellites¹

[CANCER RESEARCH 62, 1284–1288, March 1, 2002]

Advances in Brief

Jane Park,² Doron Betel,² Robert Gryfe, Katerina Michalickova, Nando Di Nicola, Steven Gallinger, Christopher W. V. Hogue, and Mark Redston³

Centre for Cancer Genetics, Samuel Lunenfeld Research Institute, Mount Sinai Hospital, and Departments of Laboratory Medicine and Pathobiology, Biochemistry, and Surgery, University of Toronto, Toronto, Ontario, Canada M5G 1X5 [J. P., D. B., R. G., K. M., N. D. N., S. G., C. W. V. H., M. R.], and Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115 [M. R.]

Abstract

Human colorectal, endometrial, and gastric cancers with defective DNA mismatch repair (MMR) have microsatellite instability, a unique molecular alteration characterized by widespread frameshift mutations of repetitive DNA sequences. We developed “Kangaroo,” a bioinformatics program for searches in nucleotide and protein sequence databases, and performed an in silico genome scan for DNA coding microsatellites that may have novel mutations in MMR-deficient cancers. Examination of 29 previously untested coding polyadenines revealed widespread mutations in MMR-deficient colorectal cancers, with the highest frequencies in ERCC5, CASP8AP2, p72, RAD50, CDC25, RECQL1, CBF2, RACK7, GRK4, and DNAPK (range, 10–33%). This algorithm allows comprehensive mutation profiling of MMR-deficient cancers, an important step in understanding the pathogenesis of these neoplasms.

Introduction

CRC⁴ is the second leading cause of cancer death in North America, providing the impetus for research aimed at understanding the biology of this disease. Among the important discoveries, in recent years it has become clear that there are at least two major molecular pathogenetic pathways to CRC: (a) MSI, because of defects in DNA MMR; and (b) chromosomal instability, because of defects in mitotic spindle apparatus and other genes (1). Importantly, the pathological and clinical attributes of the cancers arising out of each of these two pathways are different. MSI-H CRCs are more often located in the right colon, are typically polypoid, and have high grade histology with a prominent lymphoid reaction (2). This pathway also underlies most cases of hereditary nonpolyposis colon cancer (3) and leads to cancers that display less aggressive growth characteristics with fewer metastases and better overall survival (4). The fundamental difference between these two cancer pathways lies in the underlying mechanism of genomic instability (1). CRCs with chromosomal instability are characterized by widespread chromosomal deletions and translocations, whereas those with MSI have ubiquitous DNA mutations (3, 5, 6). As predicted by bacterial and yeast models, MMR deficiency leads to instability of short repeated sequences, particularly mononucleotides and dinucleotides (7). This is exemplified by gene inactivating frameshift mutations in coding microsatellites in MSI-H CRCs, most notably transforming growth factor, β receptor II (8). Therefore, an in silico search for genes with coding microsatellites should uncover the novel genetic targets involved in the molecular progression of these neoplasms. Unfortunately, the current query programs of the public sequence databases have two limitations that prohibit such a search. First, they do not support searches for low complexity regions, because these regions are filtered out as “background noise,” and second, they do not allow searching solely within human open reading frames. We devised a computer program, “Kangaroo,” that searches for DNA sequences in annotated human GenBank records. Although GenBank is a highly redundant database, we identified many records containing coding microsatellite sequences and demonstrated mutations in a number of novel target genes that may be involved in the pathogenesis of MMR-deficient cancers. This approach unveils the possibility of comprehensive mutation profiling of MMR-deficient cancers and will be integral to uncovering the biologically important molecular alterations of these neoplasms.

Materials and Methods

Bioinformatics Search Algorithm. We developed a two-step search algorithm, Kangaroo, written in the C programming language using the NCBI toolkit (J. Ostell, NCBI Software Development Toolkit, 1997)⁵ and developed on a dual Pentium II processor Linux machine. In the first step, NCBI GenBank records are retrieved, and coding region sequences are parsed out from all of the records. NCBI GenBank records are accessed from our in-house database (SeqHound),⁶,⁷ which mirrors the latest NCBI GenBank release (v.123.0 Apr.2001), the NCBI taxonomy database, and the Brookhaven protein databank (9). Coding region information was derived from the sequence annotations as entered in the GenBank flatfile by the individual record submitters. Although GenBank provides a reliable source of regularly updated records, it is a highly redundant database, and, thus, a single gene may be represented as many as 20 times in our searches. In the second step, Kangaroo searches through coding regions for the DNA pattern submitted by the user. We designed Kangaroo to permit searches of short and/or low complexity DNA sequences and query sequences that contain IUPAC DNA ambiguity codes. The search algorithm is based on Regular Expression functions and is part of the NCBI C toolkit. The strategy described here was extended to search different organism databases and to search general DNA and protein records. To develop this search algorithm into a user-friendly, public, bioinformatics tool, we amalgamated the features into a web-based application. Kangaroo,⁸ which runs on a four-processor Sun Solaris server, can perform searches through amino acids, DNA, and annotated coding regions in 10 different organisms with custom flexibility that is not available in other recent database search tools (10–12).

Tissue Samples and MSI Testing. Patients (<50 years of age) with resected CRCs were identified through the Ontario Cancer Registry in a population-based study (4). Paraffin-embedded tissues were obtained and a histopathological review performed to locate regions of high neoplastic cellularity (>50%). Tissue was microdissected and DNA extracted as described (4). Briefly, tissue was scraped from two to three unstained 10-μm slides into 50–100 μl of lysis buffer [10 mM Tris-Cl (pH 7.0), 100 mM KCl, 2.5 mM MgCl2, and 0.45% Tween 20]. After a 10-min incubation at 95°C, tissue samples were subjected to proteinase K (20 mg/ml, 15–35 μl) digestion overnight at 65°C. A total of 16 human cancer cell lines were obtained from the American Type Culture Collection (Manassas, VA), including 7 MSI-H CRC cell lines (SW48, LS174T, LS411, LoVo, HCT-8, HCT-116, and DLD-1), 1 MSI-H endometrial carcinoma cell line (HEC1A), and 8 microsatellite stable CRC cell lines (HT-29, SW480, SW620, SW837, SW1116, Colo320HSR, LS513, and LS1034; Refs. 13–16). DNA was extracted from the cell lines using DNeasy Tissue kit (Qiagen, Mississauga, ON), according to the manufacturer’s instructions. MSI was tested in the primary CRCs by PCR of five reference panel loci outlined in the National Cancer Institute Workshop on Microsatellite Instability, and CRCs were classified as microsatellite stable, low frequency microsatellite instability, or MSI-H as defined (17). The loci

Fig. 1. Semilog distribution of mononucleotide repetitive sequences identified in GenBank records of annotated human coding region. Mononucleotides less than six bases in length were too abundant to enumerate using Kangaroo. For the purposes of this figure, searches for coding microsatellites >13 nucleotides in length were truncated because of apparent ambiguities in many of these sequence

Received 9/26/01; accepted 1/14/02.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

¹ Supported in part by the National Cancer Institute of Canada with funds from the Terry Fox Run. M. R. was the recipient of a Research Scientist Award from the National Cancer Institute of Canada supported with funds provided by the Canadian Cancer Society.
² These authors contributed equally to this work.
³ To whom requests for reprints should be addressed, at Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115. Phone: (617) 732-7592; Fax: (617) 264-6301; E-mail: [email protected]. harvard.edu.
⁴ The abbreviations used are: CRC, colorectal cancer; NCBI, National Center for Biotechnology Information; MMR, mismatch repair; MSH3, mutS (Escherichia coli) homologue 3; MSI, microsatellite instability; MSI-H, high frequency microsatellite instability; PMS2, postmeiotic segregation increased (S. cerevisiae) 2.
⁵ Internet address: ftp://ncbi.nlm.nih.gov/toolbox/ncbi_tools.
⁶ K. Michalickova, G. D. Bader, R. Isserlin, and C. W. V. Hogue. SeqHound biological sequence database system as a platform for bioinformatics research, manuscript in preparation.
⁷ Internet address: http://bioinfo.mshri.on.ca.
⁸ Internet address: http://bioinfo.mshri.on.ca/kangaroo.

© 2002 American Association for Cancer Research.
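
The second step of the search described under Bioinformatics Search Algorithm (matching a user-supplied DNA pattern, possibly containing IUPAC ambiguity codes, against coding regions) can be illustrated with a minimal Python sketch. This is not the Kangaroo source, which was written in C against the NCBI toolkit; the function names, the (record_id, sequence) input format, and the run-length helper are all assumptions made for illustration, and the first step (retrieving GenBank records and parsing out coding regions) is taken as already done.

```python
import re
from collections import Counter

# Standard IUPAC nucleotide ambiguity codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(query):
    """Translate an IUPAC DNA query into a compiled regular expression.

    Raises KeyError if the query contains a character that is not an
    IUPAC nucleotide code.
    """
    return re.compile("".join(IUPAC[base] for base in query.upper()))


def search_coding_regions(records, query):
    """Return (record_id, offset) for each non-overlapping match.

    `records` is a hypothetical stand-in for the output of step 1:
    an iterable of (record_id, coding_sequence) pairs.
    """
    pattern = iupac_to_regex(query)
    hits = []
    for record_id, cds in records:
        for match in pattern.finditer(cds.upper()):
            hits.append((record_id, match.start()))
    return hits


def run_length_histogram(cds, min_len=6):
    """Count mononucleotide runs of at least `min_len` bases by length."""
    # A base followed by at least (min_len - 1) copies of itself.
    run = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return Counter(len(m.group(0)) for m in run.finditer(cds.upper()))


if __name__ == "__main__":
    # Toy sequences, not real GenBank records.
    toy = [("rec1", "ATG" + "A" * 8 + "GGC"), ("rec2", "ATGCCGTAA")]
    print(search_coding_regions(toy, "AAAAAAAA"))
    print(search_coding_regions(toy, "GGY"))
    print(run_length_histogram(toy[0][1]))
```

Because GenBank is redundant, a real search of this kind would report the same gene under several record identifiers; the sketch makes no attempt to merge such duplicates.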
