BioServices
Release 1.7.12

Thomas Cokelaer, Lea M. Harder, Jordi Serra-Musach, Dennis Pultz

Jul 26, 2021

Contents

1 Installation
  1.1 User guide
Python Module Index
Index

Python version available: BioServices is tested for Python 3.6, 3.7 and 3.8.

Contributions: Please join https://github.com/cokelaer/bioservices and share your notebooks at https://github.com/bioservices/notebooks/

Issues: Please use https://github.com/cokelaer/bioservices/issues

How to cite: Cokelaer et al. BioServices: a common Python package to access biological Web Services programmatically. Bioinformatics (2013) 29 (24): 3241-3242

Documentation: RTD documentation.

BioServices is a Python package that provides access to many Bioinformatics Web Services (e.g., UniProt) and a framework to easily implement Web Services wrappers (based on WSDL/SOAP or REST protocols).

The primary goal of BioServices is to use Python as a glue language to provide programmatic access to several Bioinformatics Web Services. Doing so fosters the development of new applications that combine several of the wrapped Web Services. One of the main philosophies of BioServices is to make use of the existing biological databases (not to re-invent new databases) and to alleviate the need for Web Services expertise among developers and users.

BioServices provides access to 43 Web Services. For a quick start, look at some notebooks here (github cokelaer/bioservices) and here (github bioservices).

Here is a small example using the UniProt Web Service to search for zap70 in the human organism:

    >>> from bioservices import UniProt
    >>> u = UniProt(verbose=False)
    >>> data = u.search("zap70+and+taxonomy:9606", frmt="tab", limit=3,
    ...                 columns="entry name,length,id, genes")
    >>> print(data)
    Entry name      Length  Entry   Gene names
    ZAP70_HUMAN     619     P43403  ZAP70 SRK
    B4E0E2_HUMAN    185     B4E0E2
    RHOH_HUMAN      191     Q15669  RHOH ARHH TTF

More examples and tutorials are available in the On-line documentation.

Here is the (non-exhaustive) list of available services. Not all services are currently tested (work in progress, July 2021).

    arrayexpress, bigg, biocarta, biodbnet, biogrid, biomart, biomodels,
    chebi, chembl, chemspider, clinvitae, cog, dbfetch, ena, ensembl,
    eutils, eva, hgnc, intact_complex, kegg, muscle, mygeneinfo, ncbiblast,
    omicsdi, omnipath, panther, pathwaycommons, pdb, pdbe, pfam, picr,
    pride, psicquic, pubchem, quickgo, reactome, rhea, rnaseq_ebi, seqret,
    unichem, uniprot, wikipathway

Note: Contributions to implement new wrappers are more than welcome. See the BioServices github page to join the development, and the Developer guide on how to implement new wrappers.

Warning: Some of the services may be down. BioServices developers are not responsible for the maintenance or failure of underlying services. Generally speaking (and from experience), the services are up most of the time, but failures may occur because a site is under maintenance or too many requests have been sent. Another common reason is that the API of a web service has changed: if so, BioServices needs to be updated. You may contribute or report such API changes on our Issue page.

CHAPTER 1 Installation

BioServices is available on PyPI, the Python package repository. The following command should install BioServices and its dependencies automatically, provided you have pip on your system:

    pip install bioservices

If not, please see the external pip installation page or pip installation entry. You may also find information about known issues in the troubleshooting section.
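One way to confirm that the installation succeeded is to ask the Python standard library which version of the distribution is installed. The following is a minimal sketch, not part of BioServices itself: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is available.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version("bioservices")
    if version is None:
        print("bioservices is not installed; try: pip install bioservices")
    else:
        print("bioservices", version, "is installed")
```

Returning `None` rather than raising keeps the check usable in scripts that only need to branch on whether the package is present.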
Regarding the dependencies, BioServices depends on the following packages: BeautifulSoup4 (for parsing XML), SOAPpy and suds (to access SOAP/WSDL services; suds is used only by ChEBI, for which SOAPpy fails to correctly fetch the service) and easydev. All those packages should be installed automatically when using the pip installer. Since version 1.6.0, we also make use of pandas and matplotlib to offer some extra functionality.

1.1 User guide

1.1.1 Quick Start

Introduction

BioServices provides access to several Web Services. Each service requires some expertise of its own. In this Quick Start section, we will cover neither all the services nor all their functionality. However, it should give you a good overview of what you can do with BioServices (from both the user and the developer points of view).

Before starting, let us recall what Web Services are. They provide access to databases or applications via a web interface based on the SOAP/WSDL or the REST technologies. These technologies allow programmatic access, which we take advantage of in BioServices.

The REST technology uses URLs, so there is no external dependency. You simply need to build a well-formatted URL and you will retrieve an XML document that you can consume with your preferred technology platform.

The SOAP/WSDL technology combines SOAP (Simple Object Access Protocol), which is a messaging protocol for transporting information, and WSDL (Web Services Description Language), which is a method for describing Web Services and their capabilities.

What methods are available for a given service

Usually most of the service functionality has been wrapped, and we try to keep the names as close as possible to the API. On top of the service methods, each class inherits from the BioService class (REST or WSDL). For instance, REST services have the useful request method. Another nice function is onWeb.

See also: REST, WSDLService

What about the output?
Outputs depend on the service and on the functionality used, so they can be heterogeneous. However, outputs are mostly XML formatted or in tab-separated values (TSV) format. When XML is returned, it is usually parsed via the BeautifulSoup package (for instance, you can get all children using the getchildren() function). Sometimes, we also convert the output into dictionaries. So, it really depends on the service and functionality you are using.

Let us look at some of the Web Services wrapped in BioServices.

UniProt service

Let us start with the UniProt class. With this class, you can access UniProt services. In particular, you can map an ID from one database to another. For instance, to convert a UniProtKB ID into a KEGG ID, use:

    >>> from bioservices.uniprot import UniProt
    >>> u = UniProt(verbose=False)
    >>> u.mapping(fr="ACC+ID", to="KEGG_ID", query='P43403')
    {'P43403': ['hsa:7535']}

Note that, as the output above shows, the response from the UniProt web service is converted into a dictionary: each key is a queried element and each value is the list of matching identifiers in the target database.

You can also search for a specific UniProtKB ID to get exhaustive information:

    >>> print(u.search("P43403", frmt="txt"))
    ID   ZAP70_HUMAN   Reviewed;   619 AA.
    AC   P43403; A6NFP4; Q6PIA4; Q8IXD6; Q9UBS6;
    DT   01-NOV-1995, integrated into UniProtKB/Swiss-Prot.
    DT   01-NOV-1995, sequence version 1.
    ...

To obtain the FASTA sequence, you can use searchUniProtId():

    >>> res = u.searchUniProtId("P09958", frmt="xml")
    >>> print(u.searchUniProtId("P09958", frmt="fasta"))
    sp|P09958|FURIN_HUMAN Furin OS=Homo sapiens GN=FURIN PE=1 SV=2
    MELRPWLLWVVAATGTLVLLAADAQGQKVFTNTWAVRIPGGPAVANSVARKHGFLNLGQI
    FGDYYHFWHRGVTKRSLSPHRPRHSRLQREPQVQWLEQQVAKRRTKRDVYQEPTDPKFPQ
    QWYLSGVTQRDLNVKAAWAQGYTGHGIVVSILDDGIEKNHPDLAGNYDPGASFDVNDQDP
    DPQPRYTQMNDNRHGTRCAGEVAAVANNGVCGVGVAYNARIGGVRMLDGEVTDAVEARSL
    GLNPNHIHIYSASWGPEDDGKTVDGPARLAEEAFFRGVSQGRGGLGSIFVWASGNGGREH
    DSCNCDGYTNSIYTLSISSATQFGNVPWYSEACSSTLATTYSSGNQNEKQIVTTDLRQKC
    TESHTGTSASAPLAAGIIALTLEANKNLTWRDMQHLVVQTSKPAHLNANDWATNGVGRKV
    SHSYGYGLLDAGAMVALAQNWTTVAPQRKCIIDILTEPKDIGKRLEVRKTVTACLGEPNH
    ITRLEHAQARLTLSYNRRGDLAIHLVSPMGTRSTLLAARPHDYSADGFNDWAFMTTHSWD
    EDPSGEWVLEIENTSEANNYGTLTKFTLVLYGTAPEGLPVPPESSGCKTLTSSQACVVCE
    EGFSLHQKSCVQHCPPGFAPQVLDTHYSTENDVETIRASVCAPCHASCATCQGPALTDCL
    SCPSHASLDPVEQTCSRQSQSSRESPPQQQPPRLPPEVEAGQRLRAGLLPSHLPEVVAGL
    SCAFIVLVFVTVFLVLQLRSGFSFRGVKVYTMDRGLISYKGLPPEAWQEECPSDSEEDEG
    RGERTAFIKDQSAL

See also: Reference guide of bioservices.uniprot.UniProt for more details

KEGG service

The KEGG interface is similar but contains more methods. The tutorial presents the KEGG interface in detail, but let us have a quick overview. First, let us start a KEGG instance:

    from bioservices import KEGG
    k = KEGG(verbose=False)

KEGG contains biological data for many organisms. By default, no organism is set, which can be checked with the following attribute:

    k.organism

We can set it to human using the KEGG terminology for Homo sapiens:

    k.organism = 'hsa'

You can use dbinfo() to obtain statistics on the pathway database:

    >>> print(k.dbinfo("pathway"))
    pathway          KEGG Pathway Database
    path             Release 65.0+/01-15, Jan 13
                     Kanehisa Laboratories
                     218,277 entries

You can see the list of valid databases using the databases attribute. The entries of each database can also be listed using the list() method. For instance, the organisms can be retrieved with:

    k.list("organism")

However, extracting the Ids requires extra processing. So, we provide aliases to retrieve the organism Ids easily:

    k.organismIds

The human organism is coded as "hsa". You can also get its T number instead:

    >>> k.code2Tnumber("hsa")
    'T01001'

Every element is referred to by a KEGG ID, which may be difficult to handle at first. There are methods to retrieve the IDs though. For instance, get