Interproscan-Docs Documentation
EMBL-EBI
Aug 18, 2021

Contents:

1 Introduction
  1.1 What is InterProScan?
  1.2 Supported platforms
  1.3 To install and run InterProScan
    1.3.1 LSF cluster users
2 Release notes: InterProScan 5.52-86.0
  2.1 What’s new
    2.1.1 Data update
    2.1.2 Software updates
    2.1.3 Other updates
    2.1.4 Known issues
    2.1.5 Reporting issues
3 Installation requirements
  3.1 How to check these on a system?
    3.1.1 Which version of Linux am I running?
    3.1.2 Testing your Perl installation
    3.1.3 Testing your Python installation
    3.1.4 Testing the Java environment
4 Obtaining a copy of InterProScan
  4.1 Obtaining the core InterProScan software
  4.2 Index hmm models
  4.3 Panther models
  4.4 Using the Local Pre-calculated Match Lookup Service (optional)
5 Running InterProScan
  5.1 InterProScan test run
  5.2 Command-line options
    5.2.1 -dp / --disable-precalc (optional)
    5.2.2 -appl / --applications application_name (optional)
    5.2.3 -i / --fasta sequence_file
    5.2.4 -iprlookup / --iprlookup
    5.2.5 -goterms / --goterms (optional)
    5.2.6 -b / --output-file-base file_name (optional)
    5.2.7 -o / --outfile (optional)
    5.2.8 -pa / --pathways (optional)
    5.2.9 -t / --seqtype (optional)
    5.2.10 -T / --tempdir (optional)
    5.2.11 -dra / --disable-residue-annot (optional)
    5.2.12 -version / --version (optional)
  5.3 Included analyses
  5.4 Output format
  5.5 Optional configuration
    5.5.1 Working directory for temporary files
    5.5.2 Configuring the Pre-calculated Match Lookup Service
  5.6 Running InterProScan on an LSF/SGE Cluster
6 Input formats
  6.1 Supported input file format
  6.2 Supported sequence format
7 Output formats
  7.1 Tab-separated values format (TSV)
    7.1.1 Example output
  7.2 Extensible Markup Language (XML)
    7.2.1 Example output
  7.3 The XML Schema Definition
  7.4 JavaScript Object Notation (JSON)
    7.4.1 Example output
  7.5 Generic Feature Format Version 3 (GFF3)
    7.5.1 Example output
  7.6 SVG and HTML
    7.6.1 Example output
8 Nucleic acid sequences scan
  8.1 The Open Reading Frame prediction tool
  8.2 How can I scan nucleic acid sequences in InterProScan 5?
  8.3 Which output formats are supported?
  8.4 Redundant sequences and identifiers in your FASTA file
  8.5 Improving performance
    8.5.1 Selecting the ORFs to analyse
9 The InterProScan Lookup Match Service
  9.1 Installing the lookup service locally
  9.2 System requirements
  9.3 Obtaining the lookup service
    9.3.1 Run with graphical user interface (to set port number)
    9.3.2 Run “Headless” (no graphical user interface)
  9.4 Waiting for the lookup service to start
  9.5 Testing the service
  9.6 Configure InterProScan 5 to use your local lookup service
10 Running InterProScan 5 in Cluster Mode
  10.1 Initial Setup
    10.1.1 Cluster submission commands
    10.1.2 Master configuration options
  10.2 Example usage on an LSF, SGE and other clusters
  10.3 clusterrunid
  10.4 In-house tested cluster versions
  10.5 Related issues
11 Running InterProScan 5 in CONVERT mode
  11.1 Usage instructions
  11.2 Example Usage
12 Improving performance
  12.1 Review your CPU (and memory) command options
  12.2 Consider chunking large input files
  12.3 Review your command line input options
    12.3.1 Running InterProScan in CLUSTER mode
  12.4 Configure to analyse fewer ORFs (applies to nucleic acid sequences only)
13 Activating Phobius/SignalP/TMHMM analyses
  13.1 Phobius
  13.2 SignalP
  13.3 TMHMM
14 Providing your feedback
  14.1 Support requests
  14.2 General discussion and suggestions
15 Known issues
  15.1 Open issues in InterProScan
    15.1.1 1. CDD/RPSBlast errors
    15.1.2 2. Coils errors
    15.1.3 Contacting us
16 FAQ
  16.1 What should I do if one of the binaries included with InterProScan doesn’t work on my system?
  16.2 Where can I find the XSD of the XML output?
  16.3 Can I use different binary versions than listed?
  16.4 Which clusters does InterProScan support?
  16.5 Is there a Galaxy wrapper for InterProScan?
    16.5.1 Documentation and contact details
    16.5.2 Publication
  16.6 I get Java errors when running InterProScan
  16.7 How do I analyse a huge number of protein sequences (>30000)?
  16.8 Should I filter by e-value?
  16.9 Why do I see “Pre-calculated match lookup service failed - analysis proceeding to run locally”?
  16.10 How is InterProScan 5 different from InterProScan 4? How do I migrate?
17 Installing and compiling binaries used in InterProScan
  17.1 cath-resolve-hits (used by CATH-Gene3D)
  17.2 Pfscan/Pfsearch (used by ProSite Profiles, ProSite Patterns and HAMAP)
  17.3 Hmmer 2 (used by SMART)
  17.4 Hmmer 3 (used by CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, SFLD, SUPERFAMILY and TIGRFAMs)
  17.5 ncoils (used by Coils)
  17.6 fingerPRINTScan (used by PRINTS)
  17.7 rpsblast/rpsbproc (used by CDD)
  17.8 sfld_preprocess/sfld_postprocess (used by SFLD)
  17.9 Phobius, TMHMM or SignalP
18 Configuration Options
19 Cluster mode benchmark run
  19.1 Benchmark run setup
    19.1.1 Which version of InterProScan 5 (I5) was used for this run?
    19.1.2 How was the set of input sequences assembled for this run?
    19.1.3 Which I5 command was used for this run?
    19.1.4 What does the interproscan.properties file look like?
    19.1.5 On which cluster/farm did we run I5?
  19.2 Benchmark run outcome
20 Change log for InterProScan JSON output format
  20.1 InterProScan 5.31-70.0