Your genotype submission pack
The purpose of this document is to provide detailed information on the key EGA submission stages for your array-based data.
If your submission also includes sequence-based data covered under the same study (publication), we request that you generate your study accession first, using the instructions provided in your sequence submission pack. The study accession obtained should then be used for your array-based submission.
Additional submission help and support can be obtained by emailing the EGA Helpdesk.
Key stages of Genotype & SNP submission
Receive - Encrypt - Calculate - Upload - Send Key - Document
Receive
Unless your submission also consists of sequence data, you should now have a unique accession number (EGAS000000000NN).
An example of how to cite this accession number once genomic data has been submitted to the EGA:
"Genome data has been deposited at the European Genome-Phenome Archive (EGA, http://www.ebi.ac.uk/ega/), which is hosted at the EBI, under accession number EGAS#."
Encrypt
Encrypt all your documents and files using GnuPG. Contact the EGA Helpdesk to obtain the GnuPG public key over email.
Before uploading your data files to your upload account, all data files must be encrypted using GnuPG.
Quick guide to using GnuPG for encryption
i) Follow the installation instructions found here.
ii) If creating your own key, use the command: gpg --output
If using the EGA public key, import the key by using the command: gpg --import EGA_Public_Key
iii) Now encrypt your files using the command: gpg -e [filename1] [filename2] [etc]
If using your own key, enter the UID generated when you created the key in step ii. For the EGA public key, enter your UID as 'EGA_Public_Key'.
You should now have an encrypted copy of each file, with the suffix .gpg.
Further information on using GnuPG can be found on their documentation pages here.
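The encryption step can also be scripted when there are many files. The following is a minimal Python sketch, not an official EGA tool: it assumes gpg is installed and on your PATH, and that the recipient key (for example 'EGA_Public_Key') has already been imported. The function names build_encrypt_command and encrypt_files are hypothetical and chosen only for illustration. Passing the recipient with --recipient avoids the interactive UID prompt.

```python
import subprocess
from pathlib import Path


def build_encrypt_command(recipient, path):
    """Return the gpg argument list that encrypts one file for `recipient`.

    By default gpg writes the encrypted copy next to the input as <name>.gpg.
    """
    return ["gpg", "--encrypt", "--recipient", recipient, str(path)]


def encrypt_files(recipient, paths, run=subprocess.run):
    """Encrypt each file in turn and return the expected .gpg output paths.

    `run` is injectable so the command construction can be checked
    without actually invoking gpg.
    """
    outputs = []
    for path in map(Path, paths):
        # check=True raises CalledProcessError if gpg reports a failure.
        run(build_encrypt_command(recipient, path), check=True)
        outputs.append(path.with_name(path.name + ".gpg"))
    return outputs
```

For example, encrypt_files("EGA_Public_Key", ["genotypes.txt"]) would run gpg once and report genotypes.txt.gpg as the expected output (the file name here is a placeholder).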
Calculate
Calculate md5 checksums for each file before and after encryption (i.e. each file should have two md5 values).
The md5sum program is installed by default on most Unix, Linux and Unix-like systems. The Windows md5sum program is available here.
To generate md5sum values for any number of files use the command: md5sum
This will create md5sum values for the files listed and save these values into a file called 'myvalues.md5'.
Please upload your md5sum values to your data upload account.
Further information on md5sum can be found here.
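Where the md5sum program is not available, the same values can be produced with a short script. The sketch below is an illustration only, using Python's standard hashlib module; the function names md5_of_file and write_md5_listing are hypothetical. It writes one line per file in the "checksum, two spaces, file name" layout that md5sum itself produces, so you would pass both the unencrypted and the encrypted (.gpg) copy of each file to obtain the two values required.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal md5 checksum of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use flat for large genotype files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_listing(paths, listing="myvalues.md5"):
    """Write '<md5>  <file name>' lines for `paths` and return those lines."""
    lines = [f"{md5_of_file(p)}  {Path(p).name}" for p in paths]
    Path(listing).write_text("\n".join(lines) + "\n")
    return lines
```

For example, write_md5_listing(["genotypes.txt", "genotypes.txt.gpg"]) records both the pre- and post-encryption checksum of one file (the file names are placeholders).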
Upload
Upload all your data files into your data upload account.
Methods available for uploading data are detailed below.
Using Aspera: Downloading the Aspera ascp command line program
Aspera is a commercial file transfer protocol that provides faster transfer speeds than ftp over long distances.
For short distance file transfers we continue to recommend the use of ftp.
The Aspera ascp command line client can be downloaded here. Please select the correct operating system.
The ascp command line client is distributed as part of the Aspera Connect high-performance transfer browser plug-in.
Using Aspera: Using the Aspera ascp command line program
Please note: The ascp command line should be run from within the Aspera directory containing ascp.exe.
Your command should look similar to this: ascp -QT -l300M -L-
The '-l300M' option sets the upload speed limit to 300 Mbit/s (roughly 37 MB/s). You may wish to lower this value to increase the reliability of the transfer.
The '-L-' option prints logs while transferring.
Add the '-k2' switch to allow interrupted transfers to be restarted.
Using default ftp command line client in Windows
1- Start the command line interpreter: press Win-R, type cmd, hit enter
2- Enter 'ftp ftp-private.ebi.ac.uk'
3- Enter your login
4- Enter your password
5- To see a list of available ftp commands type 'help'.
6- Type 'ls' command to check the content of your submission account.
7- Type 'prompt' to switch off confirmation for each file uploaded.
8- Use 'mput' command to upload files: 'mput *.srf'
9- Use 'bye' command to exit the ftp client.
10- Use 'exit' command to exit the command line interpreter.
Using default ftp command line client in Linux/Unix
1- Open a terminal and type 'ftp ftp-private.ebi.ac.uk'
2- Enter your login
3- Enter your password
4- To see a list of available ftp commands type 'help'.
5- Type 'ls' command to check the content of your drop box.
6- Type 'prompt' to switch off confirmation for each file uploaded.
7- Use 'mput' command to upload files: 'mput *.srf'
8- Use 'bye' command to exit the ftp client.
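The ftp steps above can also be automated. The following is a rough sketch using Python's standard ftplib module rather than anything supplied by the EGA; the host name is the one given in the steps above, while the function names upload_files and upload_to_ega and the login arguments are hypothetical placeholders for your own upload account details.

```python
from ftplib import FTP
from pathlib import Path

EGA_FTP_HOST = "ftp-private.ebi.ac.uk"  # host named in the ftp steps above


def upload_files(ftp, paths):
    """Upload each file in binary mode over an already logged-in session."""
    uploaded = []
    for path in map(Path, paths):
        with open(path, "rb") as handle:
            # STOR plays the role of 'mput' for a single file.
            ftp.storbinary(f"STOR {path.name}", handle)
        uploaded.append(path.name)
    return uploaded


def upload_to_ega(login, password, paths):
    """Log in, upload the files, print the account listing, then disconnect."""
    with FTP(EGA_FTP_HOST) as ftp:
        ftp.login(user=login, passwd=password)
        uploaded = upload_files(ftp, paths)
        ftp.retrlines("LIST")  # equivalent of 'ls': prints the account contents
    return uploaded
```

Binary mode is used because encrypted .gpg files must be transferred byte for byte; otherwise the post-encryption md5 values would no longer match.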
Send Key
Pass your encryption key to the EGA by post or phone (not required if the GnuPG public key was used).
Please do not pass your encryption key over email. You may use postal/courier services, deliver it in person or pass the key over the phone.
Our contact details:
Mr Jeff Almeida-King
EGA User Support Officer
EMBL-EBI
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
Tel: +44 (0) 1223 494559
Document
Provide details of your study, samples, data files, policy documentation & dataset/s
The EGA-Array-based-Format document (EGA-AF) is a spreadsheet template for submitters to add metadata and policy documentation associated with each genotype submission. Once completed and validated, the EGA-AF is used to produce a website that will describe and link to the submitted data.
An EGA-AF template should be attached to your email.
The EGA can only process a submission once a completed EGA-Array-based-Format document is received from the submitter.
The EGA-Array-based-Format document
The EGA-AF spreadsheet consists of four components:
1) Investigator and policy documents: information about your study and policy documentation.
2) Samples and phenotypes: sample and phenotype information.
3) Datasets: define how your data is going to be organised into datasets for distribution.
4) Data files: maps your submitted data files to your samples.
Should further assistance be required after going through the guide below, please do not hesitate to contact the EGA Helpdesk.
EGA-AF: Investigator and policy documents
What follows is an EGA-AF walk-through based on a hypothetical case-control genotype submission consisting of 2 human lung samples genotyped with 2 different platforms: Affymetrix_500K and Illumina_550K.
i) Individual contact details
ii) Details of data providers and data abstract
iii) Attaching policy documentation
Path/name of policy doc
Notes on policy documentation:
* The document MUST be signed by an individual capable of confirming the statements made therein (e.g. Principal Investigator).
* Please add your policy document template to your data file upload account or email it directly to the EGA Helpdesk.
* View an example/template of the required Policy statements.
iv) Details of your Data Access Committee (DAC)
DAC/individual name
DAC contact details
Document name/path
Notes on Data Access Committees:
* Please add your Data Access Application form and Data Access Agreement form to your data file upload account.
* View examples of a 'Data Access Application form' and a 'Data Access Agreement'.
v) Further details of study and release policy
EGA-AF: Samples and phenotypes
What follows is a small sample of the Samples and phenotypes component, which consists of 2 samples from two individuals. Both samples have been genotyped using Affymetrix_500K and Illumina_550K platforms and three types of genotype calling software have been used (chiamo, brlmm and Illuminus).
You will find the Samples and phenotypes component located in the tab at the bottom of the sheet shown here:
EGA-AF: Datasets
What follows is a small sample of the dataset component.
We suggest that each dataset should consist of a common set of data. The example below consists of three datasets, grouped according to shared data type, technology and by case/control.
We also capture the number of samples that make up each dataset and the Data Access Committee responsible for approving access to it.
You will find the Dataset component located in the tab at the bottom of the sheet shown here:
[Dataset sheet columns: What data makes up your dataset | Number of samples in dataset | Unique name | Case or control data | DAC responsible for data access | Describe your dataset | Technology platform used]
EGA-AF: Data files
What follows is an example of how to map your samples (detailed in the Samples and phenotypes tab) to the genotype files added to your upload account.
You will find the Genotype and SNP component located in the tab at the bottom of the sheet shown here:
What happens after the key submission stages have been completed?
A website is prepared from the metadata you submitted, which will point to your submission.
Once completed, a member of the EGA will be in touch before your website goes live to ensure:
• Your study is represented accurately
• Access to EGA user management tools is provided to the Data Access Committee named contacts
Finally, your data is archived and backed up within our databases to ensure that files are safely stored.
We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.