Your genotype submission pack

The purpose of this document is to provide detailed information on the key EGA submission stages for your array-based data.

If your submission also includes sequence-based data covered under the same study (publication), we request that you generate your study accession first, using the instructions provided in your sequence submission pack. The study accession obtained should then be used for your array-based submission.

Additional submission help and support can be obtained by emailing EGA Helpdesk.

Key stages of Genotype & SNP submission

Receive - Encrypt - Calculate - Upload - Send Key - Document

Receive

Unless your submission also consists of sequence data, you should now have a unique accession number (EGAS000000000NN).

An example of how to cite this accession number when genomic data has been submitted to the EGA:

"Genome data has been deposited at the European Genome-Phenome Archive (EGA, http://www.ebi.ac.uk/ega/), which is hosted at the EBI, under accession number EGAS#."

Encrypt

Encrypt all your documents and files using GnuPG. Contact EGA Helpdesk to obtain the GnuPG public key over email.

Before uploading your data files to your upload account, all data files must be encrypted using GnuPG.

Quick guide to using GnuPG for encryption

i) Follow the installation instructions found here.

ii) If creating your own key, use the command: gpg --output -c <filename>

Follow the onscreen prompts and choose the default options, which will create an encrypted copy for each file.

If using the EGA public key, import the key by using the command: gpg --import EGA_Public_Key

iii) Now encrypt your files using the command: gpg -e [filename1] [filename2] [etc]

If using your own key, enter the UID generated when you created the key in step ii. For the EGA public key, enter your UID as 'EGA_Public_Key'.

You should now have an encrypted copy of each file, with the suffix '.gpg'.
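The import and encryption steps above can also be scripted. The following is a minimal sketch, assuming gpg is on the PATH and the key file is named EGA_Public_Key as in the text. The function names are hypothetical, and the '-r' option (a standard gpg flag that names the recipient up front) is used here in place of typing the UID at the prompt. gpg may still ask you to confirm use of a key you have not marked as trusted.

```python
import subprocess
from pathlib import Path


def import_key_command(key_file):
    """Command that imports a public key file into the local keyring."""
    return ["gpg", "--import", str(key_file)]


def encrypt_command(path, recipient="EGA_Public_Key"):
    """Command that encrypts one file for `recipient`; gpg writes <path>.gpg."""
    # -r names the recipient, so gpg does not prompt for the UID.
    return ["gpg", "-e", "-r", recipient, str(path)]


def encrypt_all(paths, recipient="EGA_Public_Key", run=subprocess.run):
    """Encrypt each file and return the expected names of the encrypted copies."""
    encrypted = []
    for path in map(Path, paths):
        # check=True raises CalledProcessError if gpg exits with an error.
        run(encrypt_command(path, recipient), check=True)
        encrypted.append(path.with_name(path.name + ".gpg"))
    return encrypted
```

For example, running subprocess.run(import_key_command("EGA_Public_Key"), check=True) once and then calling encrypt_all with your list of data files should leave a '.gpg' copy beside each original.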

Further information on using GnuPG can be found on the GnuPG documentation pages.

Calculate

Calculate md5 checksums for each file both before and after encryption (i.e. each file should have two md5 values).

The md5sum program is installed by default on most Linux and Unix-like systems. The Windows md5sum program is available here.

To generate md5sum values for any number of files use the command: md5sum [filename1] [filename2] [etc] > myvalues.md5

This will create md5sum values for the files listed and save these values into a file called 'myvalues.md5'
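Where md5sum is not available, the same values can be computed with Python's standard hashlib module. This is a minimal sketch rather than an EGA-provided tool. The function names are hypothetical, and the output imitates the two-column layout that md5sum writes (digest, two spaces, file name).

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large genotype files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(paths, out_file="myvalues.md5"):
    """Write one '<digest>  <name>' line per file and return the lines."""
    lines = [f"{md5_of_file(p)}  {Path(p).name}" for p in paths]
    Path(out_file).write_text("\n".join(lines) + "\n")
    return lines
```

Passing both the original file and its '.gpg' copy to write_checksums gives the two md5 values required for each file.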

Please upload your md5sum values to your data upload account.

Further information on md5sum can be found here.

Upload

Upload all your data files into your data upload account.

Methods available for uploading data are detailed below.

Using Aspera: Downloading the Aspera ascp command line program

Aspera is a commercial file transfer protocol that provides faster transfer speeds than ftp over long distances.

For short distance file transfers we continue to recommend the use of ftp.

The Aspera ascp command line client can be downloaded here. Please select the correct version for your operating system.

The ascp command line client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.

Using Aspera: Using the Aspera ascp command line program

Please note: The ascp command line should be run from within the Aspera directory containing ascp.exe.

Your command should look similar to this: ascp -QT -l300M -L- <files> <login>@fasp.ega.ebi.ac.uk:/.

The '-l300M' option sets the upload speed limit to 300 Mbit/s (about 37.5 MB/s). You may wish to lower this value to increase the reliability of the transfer.

The '-L-' option prints the transfer log to the screen while transferring.

<files> can be a file mask (e.g. '/homes/submitter/*.srf') or a list of files.

<login> is your password protected Aspera login.

Add the '-k2' switch to allow interrupted transfers to be restarted.
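To show how these options fit together, here is a minimal sketch that assembles the ascp argument list without running anything. The function name is hypothetical, and placing the login directly before '@fasp.ega.ebi.ac.uk:/.' is an assumption based on the description of the Aspera login above.

```python
def build_ascp_command(files, login, rate="300M", resume=True):
    """Assemble the ascp argument list described above; nothing is executed."""
    # -QT and -L- are passed through as given; -l sets the speed limit.
    command = ["ascp", "-QT", f"-l{rate}", "-L-"]
    if resume:
        command.append("-k2")  # allow an interrupted transfer to restart
    command.extend(files)
    # Assumed destination form: <login>@fasp.ega.ebi.ac.uk:/.
    command.append(f"{login}@fasp.ega.ebi.ac.uk:/.")
    return command
```

The resulting list can be passed to subprocess.run from within the Aspera directory. Because no shell is involved, a file mask would need to be expanded first, for instance with glob.glob.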

Using default ftp command line client in Windows

1- Start the command line interpreter: press Win-R, type cmd, hit enter
2- Enter 'ftp ftp-private.ebi.ac.uk'
3- Enter your login
4- Enter your password
5- To see a list of available ftp commands type 'help'.
6- Type 'ls' command to check the content of your submission account.
7- Type 'prompt' to switch off confirmation for each file uploaded.
8- Use 'mput' command to upload files: 'mput *.srf'
9- Use 'bye' command to exit the ftp client.
10- Use 'exit' command to exit the command line interpreter.

Using default ftp command line client in Linux/Unix

1- Open a terminal and type 'ftp ftp-private.ebi.ac.uk'
2- Enter your login
3- Enter your password
4- To see a list of available ftp commands type 'help'.
5- Type 'ls' command to check the content of your drop box.
6- Type 'prompt' to switch off confirmation for each file uploaded.
7- Use 'mput' command to upload files: 'mput *.srf'
8- Use 'bye' command to exit the ftp client.
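The same ftp session can be scripted with Python's standard ftplib module. This is a minimal sketch, assuming the host named in the steps above and a plain login with your upload account credentials. The function names are hypothetical.

```python
from ftplib import FTP
from pathlib import Path


def upload_files(ftp, paths):
    """Upload each file in binary mode and return the remote file listing."""
    for path in map(Path, paths):
        with open(path, "rb") as handle:
            # STOR is the FTP command behind the client's 'put' and 'mput'.
            ftp.storbinary(f"STOR {path.name}", handle)
    # Equivalent to the 'ls' step: list what is now in the account.
    return ftp.nlst()


def upload_to_ega(paths, login, password, host="ftp-private.ebi.ac.uk"):
    """Log in, upload the files, and close the session (the 'bye' step)."""
    with FTP(host) as ftp:
        ftp.login(user=login, passwd=password)
        return upload_files(ftp, paths)
```

Keeping upload_files separate from the login step makes it possible to try the upload logic against a stand-in object before connecting to the real server.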

Send Key

Pass your encryption key to the EGA by post or phone (not required if the GnuPG public key was used).

Please do not send your encryption key by email. You may use postal or courier services, deliver it in person, or pass the key over the phone.

Our contact details:

Mr Jeff Almeida-King
EGA User Support Officer
EMBL-EBI
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44 (0) 1223 494559

Document

Provide details of your study, samples, data files, policy documentation & dataset/s

The EGA-Array-based-Format document (EGA-AF) is a spreadsheet template for submitters to add metadata and policy documentation associated with each genotype submission. Once completed and validated, the EGA-AF is used to produce a website that will describe and point to the submitted data.

An EGA-AF template should be attached to your email.

The EGA can only process a submission once a completed EGA-Array-based-Format document is received from the submitter.

The EGA-Array-based-Format document

The EGA-AF spreadsheet consists of four components:

1) Investigator and policy documents: information about your study and policy documentation.

2) Sample and phenotypes: sample and phenotype information.

3) Datasets: defines how your data is going to be organised into datasets for distribution.

4) Data files: maps your submitted data files to your samples.

Should further assistance be required after going through the guide below, please do not hesitate to contact the EGA helpdesk.

EGA-AF: Investigator and policy documents

What follows is an EGA-AF walk-through based on a hypothetical case-control genotype submission consisting of 2 human lung samples genotyped with 2 different platforms: Affymetrix_500K and Illumina_550K.

i) Individual contact details

ii) Details of data providers and data abstract

iii) Attaching policy documentation

Path/name of policy doc

Notes on policy documentation:

* The document MUST be signed by an individual capable of confirming the statements made therein (e.g. Principal Investigator).

* Please add your policy document template to your data file upload account or email directly to EGA Helpdesk.

* View an example/template of the required Policy statements.

iv) Details of your Data Access Committee (DAC)

DAC/individual name

DAC contact details

Document name/path

Notes on Data Access Committees:

* Please add your Data Access Application form and Data Access Agreement form to your data file upload account.

* View examples of a 'Data Access Application form' and a 'Data Access Agreement'.

v) Further details of study and release policy

EGA-AF: Samples and phenotypes

What follows is a small sample of the Samples and phenotypes component, which consists of 2 samples from two individuals. Both samples have been genotyped using Affymetrix_500K and Illumina_550K platforms and three types of genotype calling software have been used (chiamo, brlmm and Illuminus).

You will find the Samples and phenotypes component located in the tab at the bottom of the sheet shown here:

EGA-AF: Datasets

What follows is a small sample of the dataset component.

We suggest that each dataset should consist of a common set of data. The example below consists of three datasets, grouped according to shared data type, technology and by case/control.

We would also like to capture the number of samples that make up each dataset and the Data Access Committee responsible for approving access to it.
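The grouping suggested above can be sketched in a few lines. This is an illustration only: the record fields and sample names are hypothetical and are not the column names of the EGA-AF template.

```python
from collections import defaultdict

# Hypothetical records: one entry per sample per genotyping platform.
EXAMPLE_RECORDS = [
    {"sample": "S1", "data_type": "genotypes", "technology": "Affymetrix_500K", "status": "case"},
    {"sample": "S2", "data_type": "genotypes", "technology": "Affymetrix_500K", "status": "control"},
    {"sample": "S1", "data_type": "genotypes", "technology": "Illumina_550K", "status": "case"},
]


def group_into_datasets(records):
    """Group records by (data type, technology, case/control) and count samples."""
    datasets = defaultdict(set)
    for rec in records:
        key = (rec["data_type"], rec["technology"], rec["status"])
        datasets[key].add(rec["sample"])
    # A set per key means each sample is counted once per dataset.
    return {key: len(samples) for key, samples in datasets.items()}
```

Calling group_into_datasets(EXAMPLE_RECORDS) yields one entry per candidate dataset together with its sample count, which is the figure requested in the Datasets tab.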

You will find the Dataset component located in the tab at the bottom of the sheet shown here:

[Figure: the Datasets tab, with its columns annotated as: unique name; case or control data; what data makes up your dataset; number of samples in dataset; DAC responsible for data access; describe your dataset; technology platform used.]

EGA-AF: Data files

What follows is an example of how to map your samples (detailed in the Samples and phenotype tab) to the genotype files added to your upload account.

You will find the Genotype and SNP component located in the tab at the bottom of the sheet shown here:

What happens after the key submission stages have been completed?

A website is prepared from the metadata you submitted, which will point to your submission.

Once completed, a member of the EGA will be in touch before your website goes live to ensure:

• Your study is represented accurately
• Access to EGA user management tools is provided to the Data Access Committee named contacts

Finally, your data is archived and backed up within our databases to ensure that files are safely stored.

We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.