CLC Science Server Administrator

CLC Science Server Administrator USER MANUAL Administrator Manual for CLC Science Server 3.0 including CLC Bioinformatics Database Windows, Mac OS X and Linux February 7, 2014 This software is for research purposes only. CLC bio, a QIAGEN Company Silkeborgvej 2 Prismet DK-8000 Aarhus C Denmark Contents 1 Introduction 8 1.1 System requirements................................. 10 1.2 Licensing........................................ 10 1.3 Latest improvements................................. 10 2 Installation 11 2.1 Quick installation guide................................ 11 2.2 Installing the database................................ 11 2.2.1 Download and install a Database Management System........... 11 2.2.2 Create a new database and user/role.................... 12 2.2.3 Initialize the database............................. 12 2.3 Installing and running the Server........................... 13 2.3.1 Installing the Server software......................... 14 2.4 Silent installation................................... 15 2.5 Upgrading an existing installation........................... 16 2.5.1 Upgrading major versions........................... 16 2.6 Allowing access through your firewall......................... 17 2.7 Downloading a license................................. 17 2.7.1 Windows license download.......................... 17 2.7.2 Mac OS license download........................... 18 2.7.3 Linux license download............................ 18 2.8 Starting and stopping the server........................... 19 2.8.1 Microsoft Windows.............................. 19 2.8.2 Mac OS X................................... 19 2.8.3 Linux...................................... 19 3 CONTENTS 4 2.9 Installing relevant plugins in the Workbench..................... 20 3 Configuring and administering the server 22 3.1 Logging into the administrative interface....................... 22 3.2 Adding locations for saving data........................... 22 3.2.1 Adding a database location.......................... 22 3.2.2 Adding a file system location......................... 23 Important points about the CLC Server data in the file system locations.. 24 File locations for job node set-ups...................... 24 3.2.3 Rebuilding the index.............................. 24 3.3 Accessing files on, and writing to, areas of the server filesystem.......... 25 3.4 Changing the listening port.............................. 27 3.5 Changing the tmp directory.............................. 27 3.5.1 Job node setups................................ 27 3.6 Setting the amount of memory available for the JVM................ 28 3.7 Limiting the number of cpus available for use.................... 28 3.8 Other configurations.................................. 28 3.8.1 HTTP settings................................. 28 3.8.2 Audit log settings............................... 29 3.8.3 Deployment of server information to CLC Workbenches........... 29 3.9 Server plugins..................................... 29 3.10 Queue......................................... 29 4 Managing users and groups 31 4.1 Logging in the first time and the root password................... 31 4.2 User authentication using the web interface..................... 31 4.2.1 Managing users using the web interface................... 32 4.2.2 Managing groups using the web interface.................. 33 4.3 User authentication using the Workbench...................... 34 4.3.1 Managing users through the Workbench................... 34 4.3.2 Managing groups through the Workbench.................. 34 4.3.3 Adding users to a group............................ 35 4.4 User statistics..................................... 35 CONTENTS 5 5 Access privileges and permissions 37 5.1 Controlling access to data.............................. 37 5.1.1 Setting permissions on a folder........................ 38 5.1.2 Recycle bin................................... 39 5.1.3 Technical notes about permissions and security............... 40 5.2 Global permissions.................................. 40 5.3 Customized attributes on data locations....................... 41 5.3.1 Configuring which fields should be available................. 42 5.3.2 Editing lists.................................. 43 5.3.3 Removing attributes.............................. 43 5.3.4 Changing the order of the attributes..................... 44 5.4 Filling in values.................................... 44 5.4.1 What happens when the sequence is copied to another data location?.. 45 5.4.2 Searching................................... 46 6 Job Distribution 47 6.1 Model I: Master server with execution nodes.................... 48 6.1.1 Master and job nodes explained....................... 48 6.1.2 User credentials on a master-job node setup................ 48 6.1.3 Configuring your setup............................. 49 6.1.4 Installing Server plug-ins........................... 51 6.2 Model II: Master server submitting to grid nodes.................. 51 6.2.1 Requirements for CLC Grid Integration.................... 51 Supported grid scheduling systems...................... 52 6.2.2 Technical overview............................... 52 6.2.3 Setting up the grid integration........................ 53 6.2.4 Licensing of grid workers........................... 54 6.2.5 Configuring licenses as a consumable resource............... 55 6.2.6 Configure grid presets............................. 55 6.2.7 Controlling the number of cores utilized................... 57 6.2.8 Other grid worker options........................... 59 6.2.9 Testing a Grid Preset............................. 59 CONTENTS 6 6.2.10 Client-side: starting CLC jobs on the grid................... 59 Installing the CLC Workbench Client Plugin.................. 59 Starting grid jobs............................... 60 6.2.11 Grid Integration Tips.............................. 60 6.2.12 Understanding memory settings....................... 61 7 External applications 62 7.1 External applications integration: Basic configuration................ 63 7.2 External applications integration: Post-processing.................. 66 7.3 External applications integration: Stream handling.................. 66 7.4 External applications integration: Environment.................... 66 7.4.1 Environmental Variables............................ 66 7.4.2 Working directory............................... 66 7.4.3 Execute as master process.......................... 67 7.5 External applications integration: Parameter flow.................. 67 7.6 External applications integration: Velvet....................... 67 7.6.1 Installing Velvet................................ 67 7.6.2 Running Velvet from the Workbench..................... 68 7.6.3 Understanding the Velvet configuration.................... 69 7.7 Troubleshooting.................................... 70 7.7.1 Checking the configuration.......................... 70 7.7.2 Check your third party application....................... 70 7.7.3 Is your Import/Export directory configured?................. 71 7.7.4 Check your naming.............................. 71 8 Workflows 72 8.1 Installing and configuring workflows......................... 72 8.2 Executing workflows.................................. 72 8.3 Automatic update of workflow elements....................... 74 9 Appendix 77 9.1 Troubleshooting.................................... 77 9.1.1 Check set-up.................................. 77 CONTENTS 7 9.1.2 Bug reporting................................. 77 9.2 Database configurations............................... 79 9.2.1 Getting and installing JDBC drivers...................... 79 9.2.2 Configurations for MySQL........................... 80 9.3 SSL and encryption.................................. 80 9.3.1 Enabling SSL on the server.......................... 80 Creating a PKCS12 keystore file....................... 81 9.3.2 Logging in using SSL from the Workbench.................. 81 9.3.3 Logging in using SSL from the CLC Server Command Line Tools ....... 81 9.4 DRMAA libraries.................................... 83 9.4.1 DRMAA for LSF................................. 83 9.4.2 DRMAA for PBS Pro.............................. 83 9.4.3 DRMAA for OGE or SGE............................ 84 9.5 Third party libraries.................................. 84 Bibliography 85 Index 85 Chapter 1 Introduction The CLC Server is the central part of CLC bio's enterprise solutions. You can see an overview of the server solution in figure 1.1. For documentation on the customization and integration, please see the developer kit for the server at http://www.clcdeveloper.com. The CLC Science Server is shipped with the following tool that can be started from the Workbench: • Alignments and Trees – Create Alignment – Create Tree – Maximum Likelihood Phylogeny • General Sequence Analysis – Extract Annotations – Extract Sequences • Protein Analysis – Signal Peptide Prediction • Nucleotide Analysis – Translate to Protein – Convert DNA to RNA – Convert RNA to DNA – Reverse Complement Sequence – Reverse Sequence – Find Open Reading Frames • Sequencing Data Analysis – Trim Sequences – Assemble Sequences 8 CHAPTER 1. INTRODUCTION 9 Figure 1.1: An overview of the server solution from CLC bio. Note that not all features are included with all license models. – Assemble Sequences to Reference – Secondary Peak Calling • Primers and Probes – Find Binding Sites and Create Fragments • Cloning – All three Gateway cloning tools • BLAST – BLAST – BLAST at NCBI – Download BLAST Databases CHAPTER 1. INTRODUCTION

CLC Science Server Administrator

Current Status and Future Perspectives of Bioinformatics in Tanzania

White Paper on CLC Read Mapper

Fpgas in Bioinformatics

Buying in to Bioinformatics: an Introduction to Commercial Sequence Analysis Software David Roy Smith

CLC Sequence Viewer

HMMER User's Guide

A Novice's Guide to Analyzing NGS-Derived Organelle And

CLC Genomics Workbench Features & Benefits

CLC Sequence Viewer Manual for CLC Sequence Viewer 6.5 Windows, Mac OS X and Linux

Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex- Specific Nuclease and Tetramethylammonium Chloride

Supplement on Visualizing Biological Data

Diploma Thesis a Module for Semi-Automated Annotation of Megabase-Sized DNA Sequences by Homolgy Search Stefan Michael Schuster