US008407013B2

(12) United States Patent
Rogan

(10) Patent No.: US 8,407,013 B2
(45) Date of Patent: Mar. 26, 2013

(54) AB INITIO GENERATION OF SINGLE COPY GENOMIC PROBES

(76) Inventor: Peter K. Rogan, London (CA)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 13/469,531

(22) Filed: May 11, 2012

(65) Prior Publication Data
US 2012/0253689 A1 Oct. 4, 2012

Related U.S. Application Data

(63) Continuation of application No. 12/794,933, filed on Jun. 7, 2010, now Pat. No. 8,209,129, which is a continuation of application No. 11/324,102, filed on Dec. 30, 2005, now Pat. No. 7,734,424.

(60) Provisional application No. 60/687,945, filed on Jun. 7, 2005.

(51) Int. Cl.
G06F 9/00 (2011.01)
C12N 15/11 (2006.01)
C12Q 1/68 (2006.01)

(52) U.S. Cl.: 702/20; 536/24.3; 435/6.11

(58) Field of Classification Search: None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

6,150,160 A 11/2000 Kazazian, Jr.
6,828,097 B1 12/2004 Knoll et al.
7,014,997 B2 3/2006 Knoll et al.
2003/0022204 A1 1/2003 Lansdorp
2003/0044822 A1 3/2003 Fletcher et al.
2003/0108943 A1 6/2003 Gray et al.
2003/0194718 A1 10/2003 Tomita et al.
2004/0161773 A1 8/2004 Rogan et al.
2004/0241734 A1 12/2004 Davis
2005/0064450 A1 3/2005 Lucas et al.

FOREIGN PATENT DOCUMENTS

WO 0188089 A2 11/2001

OTHER PUBLICATIONS

Altschul, S.F., et al., "Basic Local Alignment Search Tool." J Mol Biol, 1990, 215/3:403-410.
Bardoni, et al., "Isolation and Characterization of a Family of Sequences Dispersed on the Human X Chromosome." Cytogenet and Cell Genet, Human Gene Mapping 9, Abstracts of Workshop Presentations, Paris Conference, 1987, p. 575.
Batzoglou, S., et al., "Human and Mouse Gene Structure: Comparative Analysis and Application to Exon Prediction." Genome Research, 2000, 10:950-958.
Buhler, J., "Efficient Large-Scale Sequence Comparison by Locality-Sensitive Hashing." Bioinformatics, 2001, 17/5:419-428.
Carrillo, H., et al., "The Multiple Sequence Alignment Problem in Biology." SIAM J Applied Math, 1988, 48/5:1073-1082.
Chang, P-C., et al., "Design and Assessment of Fast Algorithm for Identifying Specific Probes for Human and Mouse Genes." Bioinformatics, 2003, 19/11:1311-1317.
Claverie, J-M., "Computational Methods of the Identification of Genes in Vertebrate Genomic Sequences." Hum Molec Genet, 1997, 6/10:1735-1744.
Craig, J.M., et al., "Removal of Repetitive Sequences from FISH Probes Using PCR-Assisted Affinity Chromatography." Hum Genet, 1997, 100/3-4:472-476.
Delcher, A.L., et al., "Alignment of Whole Genomes." Nucl Acids Res, 1999, 27/11:2369-2376.
Devereux, J., et al., "A Comprehensive Set of Sequence Analysis Programs for the VAX." Nucl Acids Res, 1984, 12/1:387-395.
Dover, G., et al., "Molecular Drive." Trends in Genetics, 2002, 18/11:587-589.
Edgar, R.C., et al., "PILER: Identification and Classification of Genomic Repeats." Bioinformatics, 2005, 21(S1):i152-i158.
Eisenbarth, I., et al., "Long-Range Sequence Composition Mirrors Linkage Disequilibrium Pattern in a 1.13 Mb Region of Human Chromosome 22." Human Molec Genet, 2001, 10/24:2833-2839.
Faranda, S., et al., "The Human Genes Encoding Renin-Binding Protein and Host Cell Factor are Closely Linked in Xq28 and Transcribed in the Same Direction." Gene, 1995, 155:237-239.
Healy, J., et al., "Annotating Large Genomes with Exact Word Matches." Genome Res, 2003, 13:2306-2315.
Howell, M.D., et al., "Rapid Identification of Hybridization Probes for Chromosomal Walking." Gene, 1987, 55:41-45.
Jareborg, N., et al., "Comparative Analysis of Noncoding Regions of 77 Orthologous Mouse and Human Gene Pairs." Genome Res, 1999, 9:815-824.
Jurka, J., "Repeats in Genomic DNA: Mining and Meaning." Curr Opin in Struct Biol, 1998, 8/3:333-337.
Jurka, J., et al., "Censor-A Program for Identification and Elimination of Repetitive Elements from DNA Sequences." Computers Chem, 1996, 20/1:119-121.
Kent, W.J., et al., "Conservation, Regulation, Synteny, and Introns in a Large-Scale C. briggsae-C. elegans Genomic Alignment." Genome Res, 2000, 10:1115-1125.
Kent, W.J., "BLAT-The BLAST-Like Alignment Tool." Genome Res, 2002, 12:656-664.
Li, Y-C., et al., "Microsatellites: Genomic Distribution, Putative Functions and Mutational Mechanisms: A Review." Molec Ecol, 2002, 11:2453-2465.
Lichter, P., et al., "Delineation of Individual Human Chromosomes in Metaphase and Interphase Cells by In Situ Suppression Hybridization Using Recombinant DNA Libraries." Hum Genet, 1988, 80/3:224-234.
Morgenstern, B., et al., "DIALIGN: Finding Local Similarities by Multiple Sequence Alignment." Bioinformatics, 1998, 14/3:290-294.
Mottez, E., et al., "Conservation in the 5' Region of the Long Interspersed Mouse L1 Repeat: Implication of Comparative Sequence Analysis." Nucl Acids Res, 1986, 14/7:3119-3136.
Nakamura, Y., et al., "Variable Number of Tandem Repeat (VNTR) Markers for Human Gene Mapping." Science, 1987, 235:1616-1622.
Newkirk, H.L., et al., "Distortion of Quantitative Genomic and Expression Hybridization by Cot-1 DNA: Mitigation of this Effect." Nucl Acids Res, 2005, 33/22:e191, 8 pages.
Newkirk, H.L., et al., "Determination of Genomic Copy Number with Quantitative Microsphere Hybridization." Human Mutation, 2006, 27/4:376-386.
Price, A.L., et al., "De Novo Identification of Repeat Families in Large Genomes." Bioinformatics, 2005, 21(S1):i351-i358.
Rogan, P.K., et al., "L1 Repeat Elements in the Human ε-Gγ-Globin Gene Intergenic Region: Sequence Analysis and Concerted Evolution with this Family." Mol Biol, 1987, 4/4:327-342.
Schwartz, S., et al., "PipMaker-A Web Server for Aligning Two Genomic DNA Sequences." Genome Res, 2000, 10:577-586.
Smit, A.F.A., "The Origin of Interspersed Repeats in the Human Genome." Current Opin in Gen & Dev, 1996, 6/6:743-748.
Vermeesch, J.R., et al., "Interstitial Telomeric Sequences at the Junction Site of a Jumping Translocation." Human Genet, 1997, 99:735-737.
Vincens, P., et al., "A Strategy for Finding Regions of Similarity in Complete Genome Sequences." Bioinformatics, 1998, 14/8:715-725.
Zhang, Z., et al., "A Greedy Algorithm for Aligning DNA Sequences." J of Comp Biol, 2000, 7/1-2:203-214.
Gene Expression: vol. 2, Eukaryotic Chromosomes, 1983, Lewin, B., Ed., p. 503, Wiley & Sons, Inc., New York City, New York.

Primary Examiner: John S Brusca

(74) Attorney, Agent, or Firm: Tracy Jong Law Firm; Tracy P. Jong

(57) ABSTRACT

Single copy sequences suitable for use as DNA probes can be defined by computational analysis of genomic sequences. The present invention provides an ab initio method for identification of single copy sequences for use as probes which obviates the need to compare genomic sequences with existing catalogs of repetitive sequences. By dividing a target reference sequence into a series of shorter contiguous sequence windows and comparing these sequences with the reference genome sequence, one can identify single copy sequences in a genome. Probes can then be designed and produced from these single copy intervals.

24 Claims, 2 Drawing Sheets

FIG. 2 (Sheet 2 of 2)

202 INPUT
  1. SEQUENCE OF REGION
  2. LENGTH OF SUBSEQUENCE (L)
  3. LENGTH OFFSET BETWEEN SUBSEQUENCES

204 PROGRAM ABINITIO.PL CREATES A SET OF INDIVIDUAL SUBSEQUENCES COVERING REGION FOR GENOME COMPARISONS

206 SCRIPT WUBL (INPUT FROM ABINITIO.PL)
  1. GENOME COMPARISON WITH WU-BLASTN
  2. PROGRAM BLASTPARSE: FILTER AND CONDENSE OUTPUT TO HIT LIST BASED ON EMPIRICALLY DERIVED CRITERIA
  (loop) ... SUBSEQUENCES HAVE BEEN ANALYZED

208 PROGRAM COUNTHITS.PL TAKES THE OUTPUT FROM BLASTPARSE.PL
  1. DISTILL HIT LIST FOR EACH INTERVAL TO A COPY NUMBER
  2. SORT BY SEQUENCE COORDINATE
  3. IDENTIFY INTERVALS WITH MULTIPLE HITS (THESE CONTAIN REPEAT ELEMENTS)
  4. RECORD SINGLE COPY INTERVALS AS SET A

210
  1. GROUP ADJACENT SINGLE COPY INTERVALS INTO CONTIGS {L1...}, WHICH ARE MEMBERS OF THE SET A
  2. FOR EACH CONTIG, CREATE A SERIES OF SUBSEQUENCES WITH SMALL OFFSET UP TO L FROM BEGINNING AND END OF CONTIG WITH PROGRAM SUBSEQ

SPAWN INDEPENDENT THREADS: UPSTREAM BOUNDARY (U), DOWNSTREAM BOUNDARY (D)

CALL PROGRAMS UNTIL COUNTHITS PRODUCES HIT COUNT >1 (DEFINES SINGLE COPY BOUNDARY)
  1. SCRIPT WUBL
  2. PROGRAM BLASTPARSE
  3. PROGRAM COUNTHITS

  1. FOR EACH CONTIG, RECORD COORDINATES OF SINGLE COPY INTERVAL BOUNDARIES (U, D)
  2. COMBINE WITH ADJACENT SINGLE COPY CONTIG TO DEFINE COMPLETE INTERVAL (A-U, A+D)

AB INITIO GENERATION OF SINGLE COPY GENOMIC PROBES

CROSS REFERENCE TO RELATED APPLICATIONS

This continuation-in-part application claims the benefit of U.S. Ser. No. 60/687,945, filed Jun. 7, 2005, non-provisional application U.S. Ser. No. 11/324,102 filed on Dec. 30, 2005 and now U.S. Pat. No. 7,734,424 issued Jun. 8, 2010, and continuation application U.S. Ser. No. 12/794,933 filed on Jun. 7, 2010, also publication number US 2010-0240880 A1.

... blocking their hybridization, or by deducing the single copy sequences by comparisons of known genomic reference sequences with comprehensive databases of consensus sequences that are representative of established repetitive sequence families and subfamilies (Jurka, Curr Opin Struct Biol. 1998, 8(3):333-7).

Cot-1 DNA is often used to attempt to suppress cross hybridization of repetitive sequences to probes. The problem with attempting to suppress repeat hybridization with Cot-1 DNA is that it can result in enhanced non-specific hybridization between probes and genomic targets. Specifically, it has ...
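
The windowing and hit-counting steps named in the abstract and in FIG. 2 (boxes 202 to 210) can be illustrated with a small Python sketch. This is not the patent's Perl code: the function names below are hypothetical, and exact forward-strand substring matching is an assumed stand-in for the WU-BLASTN comparison and the empirically derived filtering criteria, which this excerpt does not specify. The boundary-refinement threads (U, D) are not covered.

```python
from typing import List, Tuple


def make_windows(region: str, length: int, offset: int) -> List[Tuple[int, str]]:
    """Cut `region` into subsequences of `length` bases, one every `offset` bases.

    Takes the three inputs listed in box 202 and returns (start, subsequence)
    pairs. Trailing bases that do not fill a whole step are left uncovered.
    """
    if length <= 0 or offset <= 0:
        raise ValueError("length and offset must be positive")
    last_start = max(len(region) - length, 0)
    return [(s, region[s:s + length]) for s in range(0, last_start + 1, offset)]


def count_hits(window: str, genome: str) -> int:
    """Count exact, possibly overlapping, occurrences of `window` in `genome`.

    Assumption: a simplified substitute for an alignment search plus hit
    filtering; a real comparison would also tolerate mismatches and search
    the reverse strand.
    """
    count = 0
    pos = genome.find(window)
    while pos != -1:
        count += 1
        pos = genome.find(window, pos + 1)
    return count


def single_copy_contigs(region: str, genome: str, length: int,
                        offset: int) -> List[Tuple[int, int]]:
    """Merge consecutive single-copy windows into (start, end) contigs.

    Coordinates are 0-based and end-exclusive, relative to `region`.
    Assumes offset <= length so that consecutive windows touch or overlap.
    """
    contigs: List[Tuple[int, int]] = []
    current = None
    for start, seq in make_windows(region, length, offset):
        if count_hits(seq, genome) == 1:
            end = start + len(seq)
            current = (start, end) if current is None else (current[0], end)
        elif current is not None:
            # A multi-hit window (a repeat) closes the running contig.
            contigs.append(current)
            current = None
    if current is not None:
        contigs.append(current)
    return contigs


if __name__ == "__main__":
    # Toy genome: unique flanks around two copies of a C-rich repeat.
    genome = "GATTACAG" + "CCCCCCCC" + "TGCATGAT" + "CCCCCCCC" + "AGTCTAGA"
    region = genome[:24]
    print(single_copy_contigs(region, genome, length=4, offset=4))
    # prints [(0, 8), (16, 24)]
```

In the toy run, the two windows that fall inside the repeated C-rich block match the genome more than once and are excluded, leaving the two flanking single copy intervals.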