<<

Expert Review of

ISSN: 1478-9450 (Print) 1744-8387 (Online) Journal homepage: http://www.tandfonline.com/loi/ieru20

Advances in the development of human microarrays

Jessica G. Duarte & Jonathan M. Blackburn

To cite this article: Jessica G. Duarte & Jonathan M. Blackburn (2017) Advances in the development of human protein microarrays, Expert Review of Proteomics, 14:7, 627-641, DOI: 10.1080/14789450.2017.1347042 To link to this article: http://dx.doi.org/10.1080/14789450.2017.1347042

Accepted author version posted online: 23 Jun 2017. Published online: 04 Jul 2017.

Submit your article to this journal

Article views: 22

View related articles

View Crossmark data

Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=ieru20

Download by: [202.152.71.55] Date: 09 July 2017, At: 07:36 EXPERT REVIEW OF PROTEOMICS, 2017 VOL. 14, NO. 7, 627–641 https://doi.org/10.1080/14789450.2017.1347042

REVIEW Advances in the development of human protein microarrays

Jessica G. Duartea and Jonathan M. Blackburn b aCancer Immunobiology Laboratory, Olivia Newton-John Cancer Research Institute/School of Cancer Medicine, La Trobe University, Heidelberg, Australia; bInstitute of Infectious Disease and Molecular Medicine & Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Observatory, South Africa

ABSTRACT ARTICLE HISTORY Introduction: High-content protein microarrays in principle enable the functional interrogation of the Received 20 February 2017 human proteome in a broad range of applications, including biomarker discovery, profiling of immune Accepted 22 June 2017 responses, identification of enzyme substrates, and quantifying protein-small molecule, protein-protein KEYWORDS and protein-DNA/RNA interactions. As with other microarrays, the underlying proteomic platforms are Protein microarrays; under active technological development and a range of different protein microarrays are now commer- proteomics; biomarker cially available. However, deciphering the differences between these platforms to identify the most discovery; profiling suitable protein for the specific research question is not always straightforward. Areas covered: This review provides an overview of the technological basis, applications and limita- tions of some of the most commonly used full-length, recombinant protein and protein fragment microarray platforms, including ProtoArray Human Protein Microarrays, HuProt Human Proteome Microarrays, Human Protein Atlas Protein Fragment Arrays, Nucleic Acid Programmable Arrays and Immunome Protein Arrays. Expert commentary: The choice of appropriate platform depends on the specific biological application in hand, with both more focused, lower density and higher density arrays having distinct advantages. Full-length protein arrays offer advantages in biomarker discovery profiling appli- cations, although care is required in ensuring that the protein production and array fabrication methodology is compatible with the required downstream functionality.

1. Introduction , copy number variation, splice variation, poly- morphic variation, DNA methylation, and transcription factor Since the introduction of DNA microarrays in the mid-1990s, a binding sites. wide range of analogous microarray technologies have been In the proteomics field, microarray technologies today spawned that enable a multitude of applications across the include full-length protein arrays, protein fragment arrays, biological and biomedical sciences and which encompass arrays, antibody arrays, reverse-phase arrays and tis- genomic, transcriptomic, proteomic, glycomic, epigenomic, sue arrays. In this Review, we provide an overview of recent and drug discovery research areas, among others. At their technological advances in the development of protein micro- simplest, microarrays can be considered as lab-on-chip tools arrays, with a specific focus on full-length protein and protein consisting of multiple different biomolecules that are immo- fragment microarrays, discussing the technological basis, bilized at spatially addressable locations on a surface and applications and limitations of some of the most commonly which can function as discrete probes in downstream, highly used protein microarray platforms. Technological advances in miniaturized, multiplexed assays. The parallel interaction of the field have been reviewed recently such immobilized probes with analytes contained in complex elsewhere [3–7] so are not dealt with again here. sample solutions results in the generation of vast amounts of Early research in the protein microarray field focused initi- quantitative, functional and interaction-based information for ally on simple technical demonstrations that instrumentation analysis [1,2]. and surfaces adapted directly from the DNA microarray field The underlying microarray technologies themselves remain could also be used to fabricate protein microarrays: commer- under active development in several areas, including diversifi- cially-available arrayers were used to print nanolitre volumes cation of content, improvement in surfaces and broadening of of different purified onto chemically derivatized glass application areas. For example, in the genomic field, DNA slides, resulting in high density, spatially defined patterns; microarray technology has been revolutionized over the discrete locations on those prototype microarrays were iden- years by dramatically expanding the number of probes on tified through on-chip protein interaction assays, using fluor- the arrays while concomitantly reducing the feature sizes, escently labeled, known binding partners; and assays were improving the underlying surfaces, and developing many read out using DNA microarray scanners [8]. From those new application areas, including inter alia global analysis of

CONTACT Jonathan M. Blackburn [email protected] Institute of Infectious Disease and Molecular Medicine & Department of Integrative Biomedical Sciences, Faculty of Health Sciences, University of Cape Town, Observatory 7925, South Africa © 2017 Informa UK Limited, trading as Taylor & Francis Group 628 J. G. DUARTE AND J. M. BLACKBURN relatively modest beginnings, protein microarrays have devel- throughput purification strategies. Insect cell expression sys- oped progressively into valuable proteomic tools that today tems have therefore found favor with several groups since, can contain tens of thousands of different proteins, including once transfected with a suitable baculoviral vector, expres- entire proteomes in some cases, and which are capable of sion of many different eukaryotic proteins with more com- addressing numerous different biological questions, with plex PTMs can be achieved in parallel in a few days, albeit potential applications in biomarker discovery as well as in with typically lower yields, and downstream lysis methods the quantitative analyses of protein function and interactions. are much milder than for yeast [10]. By contrast, mammalian However, realization of the true potential of modern protein expression systems are less simple to establish and maintain microarray technologies relies on the effective integration and in high throughput, while cell-free transcription-translation application of various components of the overall system, systems can be good sources of polypeptide, but often including high throughput protein expression and purification, require optimization for expression of individual folded, protein immobilization, assay development, signal detection, functional proteins. data processing and data analysis [9]. Classic limiting factors to Following expression, high throughput, parallel purification the wider uptake of protein microarrays by the community of recombinant proteins to near-homogeneity is also not have included the availability of protein content in a form always straightforward, since in many cases, short peptide suitable for arraying, the development of surface chemistries tags (e.g. the hexa-His tag) can be occluded, necessitating to preserve the folded structure and function of proteins on purification under denaturing conditions [11]. The protein immobilization, and the availability of assays for all the arrayed microarray field has thus in general gravitated now toward proteins; these factors are discussed briefly in turn below. use of larger protein domains as affinity tags (e.g. the glu- tathione S-transferase (GST) tag), as well as to the use of automated purification systems. Some groups, though, have adopted alternative purification strategies, combining specific 1.1. Content generation protein affinity tags with specialized array surfaces to enable One of the early decisions to be made in the protein micro- single step, in situ and surface immobiliza- array field is how the content will be generated. One obvious tion [12–17], thus circumventing the need for laborious high approach is to separate many different proteins directly from throughput pre-purification of proteins prior to array native sources, which at first sight has the advantage that the fabrication. native proteins come complete with relevant post-transla-

tional modifications and potentially also with interacting part- ners. However, purification of a single native protein or 1.2. Array fabrication complex from a biological sample can be a lengthy, highly optimized process. By comparison, multi-dimensional fractio- The construction of a high-density protein microarray, invol- nation strategies to produce the content for native protein ving the immobilization of diverse protein content, requires microarrays are not optimized for each individual protein in numerous factors to be considered in order to assure an the sample, so instead yield simplified mixtures of unknown optimal microarray design [2]. composition and unknown concentration that require further High content protein microarray fabrication methods typi- downstream deconvolution. Furthermore, reproducibility of cally use either contact printing or piezoelectric deposition protein purification and activity from native sources remains methods, with instruments adapted directly from the DNA an issue, even for abundant native proteins such as plant microarray field. Typically, piezoelectric printing methods lectins. Thus, despite the cost and labor-intensive nature of give excellent, reproducible control over spot morphologies high throughput cloning and recombinant expression, the and spot sizes, but are prone to blockage of nozzles if the major protein microarray platforms available today focus lar- samples are too viscous or contain particulate matter. Contact gely on recombinant sources, with proteins expressed hetero- printing with either solid or split pins is more robust, but can logously in either simpler prokaryotic hosts (usually Escherichia result in greater spot-to-spot variability due to the subtle coli), in yeast, insect or mammalian cells, or in cell-free tran- variation in the machining of individual pins. Irrespective of scription-translation systems. the exact printing method used, care needs to be taken in all Each heterologous expression system has its own advan- fabrication processes to avoid carry-over between different tages: E. coli is quick to culture and easy to lyse in high sample analytes deposited with the same pin/nozzle, so inclu- throughput and can give good yields of recombinant pro- sion of suitable wash steps and quality control checks is there- tein, but suffers from the disadvantages that prokaryotes fore essential [18]. and eukaryotes use different co- or post-translational protein Consideration also needs to be applied to the choice of folding mechanisms and that it lacks eukaryotic-like post- buffer from which the proteins are printed, since the surface translational modification (PTM) machinery, resulting in tension of the printed droplets affects evaporation rates as lower success rates for high throughput expression of func- well as spot spreading, so influences spot morphology, spot tional eukaryotic proteins. By comparison, yeast expression size and homogeneity of the immobilized protein within each systems are better suited to the heterologous expression of spot. Spot sizes can be altered based on the specific protein eukaryotic proteins with simple eukaryotic PTMs, but suffer microarray application and content density required and are from difficulties of cell lysis without induction of endogen- typically in the range of 50–450 –m – considerably larger than ous proteases, which complicates downstream high the feature sizes on current commercial DNA microarrays. EXPERT REVIEW OF PROTEOMICS 629

Spot-to-spot spacing also needs to be large enough to mini- protein-DNA binding assays across sets of transcription factors mize spot running and cross-contamination during fabrication. [20], kinase substrate identification assays [20,24] and protein– The use of duplicate, triplicate or quadruplicate replicates protein interaction mapping assays [25]. However, as the pro- per probe is commonplace on protein microarrays, since this tein content on the microarrays becomes more diverse, the assists in assay validation and in determination of coefficients range of array-based assays that can be configured for every of variation within and across replica arrays. Appropriate posi- microarrayed protein diminishes due to the lack of common- tive and negative controls should also be included in array ality in structure or function of the arrayed proteins and to the designs where possible. still-limited methods for detecting the results of microarray- Important choices also need to be made regarding the based assays. The first protein microarray application to have surface chemistries on to which the arrays are fabricated, in taken hold widely therefore is serological screening. Here, the order to ensure that sufficient protein densities per spot are interaction between target analytes (e.g. -specific anti- achieved, as well as to control orientation and folding of bodies) contained in a sample solution (e.g. human serum or arrayed target proteins when required. The simplest fabrica- plasma) with arrayed proteins () can be readily tion approaches make use of either chemically-reactive glass detected since, while the target antigens themselves are surfaces (typically either aldehyde-, N-hydroxy succinamide- or diverse, the constant regions of the antibody analytes enables epoxide-activated) or of nitrocellulose-coated glass [8,19]; in readout of binding interactions via, for example, a fluores- both cases, immobilization of essentially all proteins can be cently-labeled secondary antibody [2,26–28]. readily achieved in parallel, but the binding to such surfaces is As with ELISAs, in such serological assays, it is critical that uncontrolled, through random covalent coupling or non-spe- cross-reactivity of the secondary antibody is eliminated, to cific physisorption, respectively and can lead to unfolding and minimize the possibility of false positives [18]. However, for loss of activity. Other approaches therefore aim to achieve quantitative serological analysis, it is equally important that controlled, oriented immobilization of different recombinant the measured signal in the assay does not inadvertently proteins onto the surface via a common affinity tag depend on the affinity of the labeled secondary antibody for [2,12,14,17,20,21], which has theoretical advantages, particu- the captured primary antibody, which argues that the second- larly if downstream, quantitative, on-array functional assays ary antibody needs to be used at a concentration that ensures are required [20]. saturation binding [21]. Thus, considerable optimization of Post-microarraying, typically some form of blocking agent such serological assays is still required, despite their apparent is applied to the surrounding surface in order to abrogate simplicity.

non-specific binding of analyte molecules to the underlying Detection limits vary between different microarray plat- surface during downstream assays. Here, typical methods sim- forms, depending in part on the choice of detector, but the ply adapt blocking agents commonly used in enzyme-linked most widely used microarrays typically offer high sensitivity in immunosorbent assays (ELISA), such as bovine serum albumin fluorescence-based assays that are read-out on a confocal or milk proteins. In the absence of specific control over on- DNA microarray scanner, thereby in principle enabling both array protein stability, many groups simply print protein qualitative and quantitative data to be produced. As with microarrays at point of use, potentially creating unwanted classic ligand binding assays, the limit of detection in many batch effects in downstream assays. However, more sophisti- such assays is thus fundamentally dominated by physical cated protein microarray architectures make use of the well- chemistry considerations, based on the combination of the known and superior properties of polyethylene glycol poly- intrinsic binding affinity of the capture probe for the target mers, which block non-specific macromolecule binding to the analyte and the concentration of that analyte in solution [29]. surface [21–23] while at the same time creating a hydrated, The limit of detection in microarray assays is also controlled by hydrogel-like environment that helps stabilize the folded state the signal to noise ratio in the assay, with the lower back- and functionality of the immobilized proteins for downstream ground binding afforded by, for example, polyethylene glycol- assay over periods of many months. Such stability of arrayed coated surfaces enabling a lower limit of detection of true proteins post-immobilization is a key factor in enabling the foreground signal [21] compared to more traditional blocking use of protein microarrays as functional and quantitative ana- agents that show higher non-specific background binding. lytical tools [18]. 1.4. Data processing and analysis 1.3. Assay development Protein microarrays are usually high-content, allowing for the Once high-density protein microarrays with diverse protein high throughput analysis of numerous target antigens in par- content have been constructed, the next hurdle to overcome allel. In addition, protein microarrays can be printed with is the development of assays that work for all the arrayed multiple replica sub-arrays on each slide (e.g. in 4-plex or 16- proteins. In contrast to DNA microarrays, this has remained a plex formats), with each replica array on a given slide being major bottleneck in the protein microarray field since the assayable independently with either different samples or dif- different arrayed proteins typically have very different physical ferent concentrations of a given analyte, but under otherwise and biochemical properties, as well as widely differing affi- uniform assay conditions [2,20]. Protein microarrays can thus nities for their interacting analyte molecules. Numerous yield large amounts of raw data, which require dedicated papers have reported functional assays carried out on-array processing and analysis tools to enable the correct biological on smaller, focused sets of proteins, for example quantitative interpretation of the results. It is important to note here that 630 J. G. DUARTE AND J. M. BLACKBURN while many processing and analysis tools exist from the DNA change based statistical methods, among others, depending microarray field, typically these cannot be used readily for the on the research question in place. Bioinformatic methods for analysis of protein microarray datasets because the underlying such downstream analyses are typically adapted from the statistical assumptions differ between the microarray experi- genomic field. Since there is no consistency between ments. For example, in a gene expression study on a DNA research groups in the choice of statistical approach to microarray, it is typically assumed that a roughly equal num- data analysis in either the protein or DNA microarray fields, ber of genes will be up- and down-regulated as a result of the it remains of critical importance that the validity of the test condition, so the mean expression across all genes, as well statistical analysis is considered up front. Importantly, com- as the distribution of expression values, should remain the plications can arise if genomic analysis methods are applied same and quantile normalization can therefore be applied to to protein microarray datasets without prior consideration of the raw datasets. However, in a protein microarray experi- the underlying data structure. For example, hierarchical clus- ment, typically fewer elements on the array will produce a ter analysis of an unfiltered protein microarray dataset that true positive signal in the assay and these might differ sub- genuinely contains large numbers of zero values and small stantially in identity and magnitude between test and control numbers of positive values will be considerably less infor- assays for genuine biological reasons, in which case both the mative than hierarchical cluster analysis of the same dataset mean and the distributions will be altered and quantile nor- that was pre-filtered to focus only on the most commonly malization therefore should not be applied [2,30]. identified proteins [32]; this can be likened to a penetrance- Several tools are currently available that have been speci- based analysis in the genomics field. Furthermore, as with all fically developed for protein microarray datasets, offering ‘omics experiments, it is important to consider the sample bioinformatic and statistical analysis methods that are suitable size and number of technical replicates per sample that are for this type of data [31]; some of these are described in more required to achieve the statistical power required for the detail in subsequent sections of this review. As in the DNA study. microarray and proteomics fields, there is no single standard Protein microarrays currently enable the interrogation of an data processing or analysis pipeline used by all protein micro- ever larger subset of the human proteome and of the entire array users, but rather each research group typically adopts proteome of specific microorganisms (e.g. E. coli,yeast, their own specialized methods, which creates problems when Mycobacterium tuberculosis, and human papilloma virus). Such trying to compare reported protein microarray datasets protein microarrays are now finding utility across a broad num- between laboratories. Given that array content and surface ber of different biological applications, including biomarker

chemistries differ between protein microarray platforms, if a discovery in cancers, autoimmune diseases, neurological dis- standardized set of reporting criteria were in place (similar, for eases and inflammatory diseases [33]. In the following sections example to the MIAME standards in the DNA microarray field), of this Review, we will discuss some of the most commonly datasets produced by different labs could become compli- used planar, recombinant protein microarray platforms, high- mentary to one another within a common disease setting. lighting the key differences and uses of each of these. With the ambition of enabling a more generically useful data processing pipeline for protein microarray datasets, a compo- site and comprehensive bioinformatic tool has recently been 2. ProtoArray human protein microarray developed for raw data processing which includes methods such as neighborhood background correction and subtraction, The progenitor of the ProtoArray human protein microarray noise and CV thresholds, data point and array filtering, pin-to- (Thermo Fisher Scientific) was a yeast proteome microarray, pin and array-to-array normalization, and data consolida- originally developed by the Snyder laboratory at Yale and tion [2]. containing 5800 full-length, purified target yeast proteins A further consideration in the field is how to objectively that were used to screen for diverse biochemical activities determine the threshold for true signal. The simplest approach [34]. These N-terminal GST-His-tagged proteins were produced here relies solely on subtraction of the signal from the sur- recombinantly from high-quality yeast ORF clones and rounding background, but with the availability of pixelated expressed in Saccharomyces cerevisiae to ensure correct pro- data, it is straightforward to calculate the standard deviation tein folding. After high throughput purification, proteins were (SD) of the surrounding background signal and to then set a printed onto aldehyde-treated or nickel-coated slides, with classic noise threshold for true signal of 2SD from the mean immobilization being via amine or His-tag attachment respec- background pixel intensity [2]. However, the weakness in this tively, and detected using fluorescently-labeled anti-GST anti- approach is that the surrounding area has a different bio- bodies; immobilization via the His-tag was found to yield chemical composition to the foreground probe area, so the superior signals. Yeast is often referred to as the model host non-specific binding of analytes to the foreground and back- and studying the yeast proteome has enabled the understand- ground areas is not necessarily the same. A more nuanced ing of many essential cellular processes of relevance to higher approach therefore relies on negative control data for each eukaryotes [35]. The yeast proteome microarray has been used arrayed protein to define a probe- and assay-specific threshold across numerous functional applications, including protein– for true signal. protein interactions, phospholipid interactions and small Following the use of appropriate data processing tools, molecule interactions, as well as verification of antibody tar- the biological interpretation of the resulting extracted data- gets in antibody target assays, and the identification of sub- sets can be achieved by way of data clustering and fold- strates for solution phase enzymes [36]. EXPERT REVIEW OF PROTEOMICS 631

The ProtoArray human protein microarray today consists of target discovery and validation using enzyme substrate profil- 9483 unique full-length and functional human proteins and appro- ing, target identification or selectivity using small molecule priate controls. This content includes disease-relevant proteins profiling, mapping biochemical pathways using protein–pro- across several distinct protein classes, including protein kinases, tein interaction profiling, and therapeutic antibody develop- transcription factors, membrane proteins, nuclear proteins and ment using antibody specificity profiling. To date, the majority secreted proteins, as well as signal transduction, cell communica- of publications based on use of the ProtoArray platform have tion, metabolism, cell death, and protease activities. These human focused on the first of these areas: qualitative profiling of proteins are produced from sequenced-verified clones (Invitrogen antibody responses in disease. Ultimate ORF collection) and are expressed as N-terminal GST Within the cancer biomarker discovery field, serum anti- fusion proteins using a baculovirus expression system (Gateway body response profiling has been carried out by numerous entry vector pENTR221), with high throughput purification carried groups using the ProtoArray platform on a broad spectrum of out under non-denaturing conditions. Purified proteins and con- cancers, including bladder cancer [37], breast cancer [38], trols are subsequently printed in duplicate on thin-film nitrocellu- chronic lymphocytic leukemia [39], colorectal cancer [40], lose-coated glass slides at 4°C. The ProtoArray human protein lung cancer [41], myelodysplastic syndromes [42], ovarian microarray can then be incubated with assay sample, washed cancer [43,44], and pancreatic cancer [45]. and probed with fluorescently labeled detection reagents, prior Additional disease-specific biomarker discovery studies to visualization on any open architecture fluorescence microarray have been performed for autoimmune diseases (rheumatoid scanner. arthritis [46], Sjögren’ssyndrome[47],type1diabetes[48], The resulting data can be analyzed using the free Meniere’sdisease[49], autoimmune polyendocrine syn- ProtoArray Prospector software which features algorithms for drometype1[50]), inflammatory diseases (asthma [51], analysis of the various supported applications, as well as for inflammatory bowel disease [52]), neurological diseases data normalization, and provides a report output. This soft- (Alzheimer’sdisease[53], Parkinson’sdisease[54], multiple ware includes signal normalization using a z-score method, sclerosis [55], amyotrophic lateral sclerosis [56]), organ which indicates how an individual data point differs from the transplants (kidney allografts [57], kidney transplant [58], population mean in units of standard deviation: chronic transplant injury [59]), and chronic diseases (chronic renal disease [60]). ðÞX μ ¼ ; However, despite the seemingly broad applicability of z score σ these microarrays, a number of issues arise that potentially μ where X is the signal value from a protein feature, is the complicate deduction of biological significance from the mean signal for all protein features, and σ is the signal sample ProtoArray data. Porous nitrocellulose surfaces, including standard deviation for all protein features. Noise thresholds the thin-film nitrocellulose slides used in fabrication of the are then set using the signal-to-noise ratio z-factor method, ProtoArrays, give higher protein binding capacities than which measures the dynamic range and the variation between planar surfaces, but typically suffer from higher intrinsic noise and signals: background fluorescence, as well as from higher non-speci- fic binding to the surface [61]andreducedstabilityand 3ðÞσ þ σ ¼ s n ; reproducibility [62]. A number of different approaches z factor jjμ μ s n have been explored to minimize background fluorescence issues with nitrocellulose surfaces, the aim being to increase where σs is the signal sample standard deviation for the signal-to-noise ratios in assays, including reducing the thick- protein features, σn is the signal sample standard deviation ness of the nitrocellulose coating [63]anduseofblack for the negative control features, μs is the mean signal for the nitrocellulose membranes [64]. These modifications may protein features and μn is the mean signal for the negative control features. A z-factor > 0.5 indicates a signal greater than though negatively affect protein binding capacities and twofold above the array background. Outlier identification is signal homogeneity [65]. Some newer microarray scanners carried out using the Chebyshev’s inequality (CI) p-value canalsoreadreporterprobeswithemissioninthenear- method, which compares the signal from each protein feature infrared region (700–800 nm), away from the commonly to the distribution of negative controls: used shorter wavelength regions of the spectrum (e.g. 532 ()and 635 nm) where autofluorescence is particularly high μ þ σ 1X n n due to light scattering within the 3D nitrocellulose matrix, CIp value ¼ σ 2 ; n > μ þ σ but even this does not directly resolve issues with higher Xμ X n n n non-specific binding. where X is the signal value from a protein feature, µn is the More importantly though, while the individual recombinant mean signal for the negative control features and σn is the proteins on the ProtoArray platform are expressed in insect signal sample standard deviation for the negative control cells and purified under native conditions, the nitrocellulose features. This algorithm calculates the p-value for the null microarray surfaces provide a relatively denaturing environ- hypothesis that a given signal belongs to the negative control ment to non-immunoglobulin proteins. Moreover, proteins features distribution. can also be at least partially stripped away from the nitrocel- Profiling applications of the ProtoArray human protein lulose surface during assay [66], potentially leading to some microarray promoted by the manufacturer include disease- lateral protein diffusion on the array surfaces. Any unfolding of specific biomarker discovery using immune response profiling, individual proteins on the nitrocellulose surface will disrupt 632 J. G. DUARTE AND J. M. BLACKBURN discontinuous , potentially leading to false negative which represents a confounding factor for researchers using results in serological profiling assays, and may also result in this technology without specialized knowledge in the field. false positives, through non-specific antibody binding to Antibody profiling with the HuProt array – also referred to exposed hydrophobic epitopes that are ordinarily buried in as biomarker profiling – has been carried out across numerous the native structures of the arrayed proteins (N.B.: this phe- disease settings, including primary biliary cirrhosis [70], neu- nomenon is frequently seen in western blots of denaturing ropsychiatric lupus [71], glioma [72], Behcet Disease [73], sec- polyacrylamide gels, with both monoclonal and polyclonal ondary progressive multiple sclerosis [74], and gastric cancer antibody reagents). Furthermore, the ProtoArray Prospector [75]. Additional uses of this array include investigating anti- software has some scope for improvement, particularly when body specificity, small molecule profiling, protein–protein dealing with large data sets, through the implementation of interactions, DNA/RNA binding and enzyme substrate more appropriate normalization methods and other desirable identification. features that have been discussed elsewhere in detail Using an essentially identical approach to protein produc- [2,31,67]. tion, purification, and immobilization to that described above, other related arrays have also recently been described on the same basic platform, for example an E. coli proteome micro- 3. HuProt human proteome microarray array [76–78] and a M. tuberculosis proteome microarray [79], The HuProt Human Proteome Microarray (CDI Laboratories) but have not yet gained widespread utilization beyond the was originally developed by the Zhu laboratory at Johns academic groups who developed these arrays. Hopkins University and contained 16,368 unique full-length A potential weakness of the HuProt system however lies in human proteins, representing 12,586 protein-coding genes the reliance on expression of human proteins in a lower [68]. If one adopts the simplistic ‘onegene=oneprotein’ eukaryote, in which post-lysis is a known problem paradigm (i.e. ignoring splice variants and PTMs), this repre- and in which only simpler post-translational modifications will sents ~60% of the human proteome. Similar to the yeast occur. Furthermore, as with the ProtoArrays, another potential proteome microarray described above, in the HuProt plat- limitation of the HuProt system lies in the surface chemistry form, human proteins are expressed recombinantly in S. used for array fabrication: non-specific immobilization of the cerevisiae as N-terminal GST-His-tagged fusion proteins, arrayed proteins onto the glass surface, whether by physisorp- using a pEGH-A yeast [69]. After high tion or covalent binding, can result in unfolding that con- throughput affinity purification using the GST tag, proteins founds downstream data interpretation, as discussed earlier.

and controls are printed in duplicate onto glass slides, and detected for quality control purposes using an anti-GST anti- body. Among a wide range of potential applications, this 4. Human Protein Atlas protein fragment array high density human protein microarray has found utility, for example, in the low-cost identification and validation of The Human Protein Atlas is an open-access interactive data- novel . base containing extensive images and data derived from anti- The current high-content v3.0 HuProt array contains >19,000 body-based proteomics and transcriptomics experimentation full-length human proteins, corresponding to >15,000 unique [80–83]. The Human Protein Atlas uses both in-house and protein-coding genes. These proteins belong to multiple dis- commercially-generated antibodies for immunohistochemistry tinct categories, including kinases, cell death, membrane pro- on tissue microarrays (to document the distribution of protein teins, nuclear proteins, metabolism, signal transduction, expression in normal and cancer tissues) and for immunofluor- secreted proteins, and transcription factors. The HuProt arrays escence staining on cell lines (to document the spatial distri- can be printed on a variety of different glass slides, including bution at a subcellular level). The resultant database includes FAST, FullMoon, SuperEpoxy, SuperAldehyde, SuperNHS, Ni- the majority of human protein-coding genes, with information NTA, and PATH. For antibody profiling applications, after block- about the expression and localization of proteins, and is sub- ing, arrayed slides are incubated with the biological sample and divided into three different atlases: the Tissue Atlas provides bound IgG antibodies are detected using a fluorescently- pathology-based annotated protein expression levels and labeled human IgG secondary antibody. Assayed slides can immunohistochemistry stained images obtained across 76 dif- then be scanned with any open architecture microarray scan- ferent cell types, corresponding to 44 different normal tissue ner, using the GenePix array list (.gal file) from the arrayers that types, as well as RNA deep sequencing-based RNA expression describes the location of each protein on the array. The result- data obtained across 37 different normal tissue types; the ing data can then be analyzed using various image acquisition Cancer Atlas provides pathology-based annotated protein and analysis software such as GenePix, ArrayPro, or others. In expression levels and immunohistochemistry stained images the absence of a standardized downstream data analysis pack- obtained across 216 different cancer samples, corresponding age or approach, data processing, normalization and analysis to 20 different cancer types with basic demographic and methods have varied considerably between studies based on tumor phenotype information; and the Cell Atlas provides the HuProt arrays, including use of methods such as z-scores, antibody profiling-based annotated protein expression levels Fisher’s exact test on positive incidence and rate, signal value and confocal microscopy stained images cut-offs, t-tests, quantile normalization, outlier detection, false obtained across 56 different cell lines, corresponding to 32 discovery rate detection and ROC curves; these methods are different organelles and fine cellular structures, as well as RNA not all equally statistically valid for protein microarrays though, deep sequencing-based RNA expression data. EXPERT REVIEW OF PROTEOMICS 633

In order to enable the large-scale generation of high spe- platforms, as a result of the high proportion of the human cificity antibodies to human proteins, the Human Protein Atlas proteome that is represented on these arrays. However, it is consortium employs a bioinformatic strategy to identify ‘pro- important to recognize that there are typically only two PrEST tein signature tags’ (PrESTs) for each human protein tags per human protein on the arrays and that any one tag [84,85], representing the peptide regions of each protein represents only a fraction of all possible epitopes on any given which show the lowest similarity to any other human protein. protein. Furthermore, discontinuous epitopes and post-trans- These PrESTs are then produced and used in high throughput lationally modified epitopes will also be largely absent on immunization of animals to elicit the cognate antibodies. This these protein fragment arrays, so if the coverage was repre- bioinformatic approach combines the PRESTIGE web-based sented as a fraction of all possible human epitopes, it would tool and a whole-genome analysis to avoid 50-amino acid be significantly smaller than the claimed 94%. Moreover, the regions with more than 60% sequence identity to other pro- use of protein fragments (which presumably in general have tein domains, 10-amino acid regions with more than 8 iden- little tertiary structure), rather than full-length, folded proteins, tical amino-acids, transmembrane regions, signaling restricts the range of potential assays on this platform largely and restriction enzyme recognition sites commonly used in to profiling of antibody binding to a repertoire of linear epi- cloning methods. topes [74]; in this regard, the Human Protein Atlas Protein The resulting PrEST fragments are typically 80-amino acids Fragment Arrays can therefore be considered closer concep- long, ranging between 16 and 202-amino acids, and are con- tually to high density peptide arrays than to full length protein sidered unique representations of the corresponding proteins; arrays. these fragments are expressed as N-terminal albumin binding protein (ABP)-His-tagged fusion using the pAff8c expression 5. Nucleic Acid Programmable Protein Array (NAPPA) vector in E. coli Rosetta DE3 to increase protein yields for immunization [86,87]. An innovative, alternative approach to the production of pro- These same PrEST fragments have subsequently found tein microarrays was first reported by the Taussig group additional utility as the content for high-density microarrays, (Babraham Institute, UK) and developed more fully by the in both planar and bead-based suspension microarray formats LaBaer group (Harvard), dispensing with off-line, high [81]. The resultant Human Protein Atlas Protein Fragment throughput protein expression and purification altogether. Arrays developed by the Nilsson laboratory at KTH (accessible Although the precise components differ, essentially in the via the Profiling Facility, SciLifeLab, Stockholm) Protein In Situ Array (PISA [12]) and Nucleic Acid

and currently include 42,698 unique purified protein frag- Programmable Protein Array (NAPPA [14]) methods, cDNAs ments, derived from 19,497 human protein coding genes encoding an in-frame fusion to a tag are printed (but not [74]; using the SwissProt ‘one-gene = one-protein’ paradigm, immobilized) into a spatially defined array, together with a 94% of the human proteome is thus represented by at least capture reagent that is specific for the tag (the capture agent one fragment on these arrays. is itself immobilized on the array surface). The encoded, For the production of Human Protein Atlas Protein tagged proteins are then expressed in situ by overlaying Fragment Arrays, an optimized high-throughput protein pro- coupled cell-free transcription-translation mixes and each duction workflow is used for the large scale expression, affinity tagged protein then binds rapidly to its cognate, immobilized purification and verification (by mass spectrometry) of the capture reagent to form the protein microarray. More recent recombinant proteins. Purified protein fragments are spotted technological developments of this in situ array fabrication onto epoxide-coated glass slides using a non-contact micro- approach have focused on the choice of tag: array printer and the residual surfaces then blocked [74]ina In the PISA method, a C-terminal double His6-tag was used, similar manner to HuProt arrays. These high density microar- but the relatively low binding affinity of a tag-surface combi- rays have found utility in, for example, the validation of newly nation really designed for protein purification results in rela- generated Human Protein Atlas antibodies and in autoanti- tively high spontaneous off-rates from Ni-NTA surfaces and body profiling of serum and plasma. any subsequent re-binding events would begin to scramble Autoantibody profiling with such protein fragment arrays the spatial locations on the array; has been carried out across different diseases, including multi- In the original NAPPA method, a C-terminal GST tag was ple sclerosis, sarcoidosis, and osteoarthritis, often as a screen- used, combined with an immobilized polyclonal anti-GST anti- ing or discovery phase, followed by validation on bead-based body to provide a more stable linkage. However, the antibody suspension arrays [74,88–91]. As with other protein microarray itself was adsorbed non-specifically through ionic interactions platforms, the data analysis pipelines and statistical to an amino-silanized glass surface, so spontaneous deso- approaches used varies between studies, although the major- rption and re-binding (equating to time-dependent, lateral ity convert the resulting fluorescent intensities to a binary diffusion of the arrayed proteins) was again likely an issue; variable by defining a sample-specific threshold (signal inten- In the most recent manifestation of the NAPPA method, a sities > median + n × median absolute deviation or standard C-terminal HaloTag [17,92] is used to provide covalent binding deviation), and apply a comparative false discovery rate-cor- of the tagged protein to the surface via a chloroalkane ligand rected statistical test to compare sample groups and identify that is co-arrayed and is itself cross-linked on to an aminosi- significant differences. lane glass slide. The authors note that this approach results in At first sight, the Human Protein Atlas Protein Fragment rapid capture and up to four times higher densities of Arrays offer an apparent advantage over other protein array expressed fusion proteins per spot than the original NAPPA 634 J. G. DUARTE AND J. M. BLACKBURN

method, resulting in higher signal-to-noise ratios; importantly, affinities and selectivity across families of structurally-related they also note that there is minimal lateral protein diffusion as proteins [15,20]; immune response profiling [21,103–105]; on- a result of the covalent immobilization method. array post-translational modifications [20]; and measurement Human protein NAPPA arrays are today available from the of protein–protein interactions [20]. Protein Array Core of the BioDesign Institute (Arizona) and Currently, the Immunome protein array contains 1636 full- contain up to 11,500 human ORF expression products printed length, correctly folded and functional human proteins plus in a variety of formats. NAPPA arrays have also been produced controls, representing major protein classes involved in the for other organisms, including Vibro Cholera, M. tuberculosis, immune response, such as cancer antigens, interleukins, cyto- and Ornithodoros moubata, as well as Arabidopsis protein kines, protein kinases, transcription factors and signaling pro- arrays comprising a total of 12,000 ORFs (each array contain- teins. Each protein is printed in quadruplicate across a single ing 4600 ORFs printed in duplicate [17,93]). NAPPA arrays have array to enable more rigorous statistical analysis of quantita- been used in a variety of serological studies, including in tive data downstream. diabetes [94,95], juvenile arthritis [96], ankylosing spondylitis The proteins on the Immunome protein array are expressed [97], breast cancer [98], ovarian cancer [99] and varicella zoster from baculoviral vectors in insect cells as fusions to a viral infections [100], as well as in mapping transcription factor C-terminal biotin carboxyl carrier protein (BCCP) tag, which is interactome networks in Arabidopsis [17] and detecting post- itself a compact, 80 amino acid, all-beta domain of the E. coli translational modifications such as AMPylation [101] and che- acetyl CoA carboxylase [106]. Expression of human proteins in mical acetylation [102]. insect cells is well known to have no particular size limitation The simplicity of the NAPPA array fabrication is attractive, and, since intrinsic folding mechanisms are shared with higher not least since assays can be carried out on freshly expressed eukaryotes, both folding and post-translational modifications proteins. However, it is less clear whether the NAPPA arrays will are considered to be mammalian like [107,108]. prove to be compatible with a wider variety of assays for a The BCCP tag is biotinylated in vivo on a single lysine number of reasons. In particular, cell-free transcription-transla- residue (K122, based on AccB numbering) by host cell biotin tion systems have classically proved to be a good way to ligases in E. coli, yeast, insect cells and mammalian cells, with- produce polypeptide in increasingly large scale, but can require out the need to co-express the E. coli biotin ligase. The BCCP considerable optimization of conditions to produce any given tag is however only biotinylated in vivo if the tag itself folds protein in a folded, functional form, perhaps because there is into its correct tertiary structure and, unlike the in vitro limited control over redox conditions during expression and evolved AviTag, the linear motif in BCCP is not

folding due to the loss of compartmentalization [16]; this is a substrate for the biotin ligases [109]. In common with other likely to be particularly problematic for more complex proteins small, autonomously folding bacterial proteins (e.g. glu- with multiple disulphide bonds and will be exacerbated by the tathione S transferase, thioredoxin), an N-terminal BBCP tag fact that all proteins are by necessity expressed under a single can increase expression and solubility of a C-terminal fusion set of in vitro conditions for the NAPPA arrays. Furthermore, it partner [20]. Moreover, if used as a C-terminal tag, post-trans- seems likely that post-translational modifications on eukaryotic lational modification (i.e. biotinylation) of BCCP has been proteins produced by cell-free transcription-translation will be found to be a useful reporter for the folded state of fusion simpler, even if eukaryotic cell-free extracts are used, again due partners [15,21,110] in a manner similar to other known fold- to compartmentalization issues. As with the ProtoArray and ing markers, such as the green fluorescent protein (GFP) HuProt systems, only simple surface chemistry has been used [111,112]. In the case of GFP, post-translational formation of to date for NAPPA arrays (aminosilane glass slides) but concerns the fluorophore does not occur if the upstream fusion partner related to slow on-array unfolding are likely to be less of an itself mis-folds, due either to concomitant mis-folding of GFP, issue here because the NAPPA arrays are typically produced to or to aggregation of the mis-folded proteins, or to rapid order at point of use. degradation of the mis-folded proteins by the proteasome; similar mechanisms are likely to be at play with biotinylation of C-terminal BCCP tags. 6. Immunome protein array Taken together, the advantageous features of the BCCP tag The Immunome protein array platform (Sengenics) was origin- enable a facile route to microarray fabrication that circum- ally developed by the Blackburn laboratory at the University of vents the need for laborious pre-purification steps by combin- Cambridge, based on the concept of lower density microarrays ing in situ purification and immobilization into a single step: comprising focused families of full-length, folded, functional nanolitre volumes of crude insect cell lysates containing the proteins and designed from the bottom up to enable the expressed, BCCP-tagged proteins are printed directly onto quantitative, high throughput, systems-oriented analysis of a -coated hydrogel surfaces; the biotinylated, BCCP- wide range of different protein functions. For example, differ- tagged recombinant proteins bind with femtomolar affinity, ent protein function microarrays from the Blackburn labora- essentially saturating the available streptavidin binding sites, tory included: ~330 human kinases, representing roughly half irrespective of expression level; and all non-biotinylated pro- the human kineome [15]; ~100 cancer-testis antigens, repre- teins wash away, leaving a microarray of individually purified, senting ~80% of that protein family [2,21]; ~300 human tran- BCCP-tagged human proteins tightly bound to the array sur- scription factors; and ~50 clinically-relevant polymorphic face [15]. Usefully, it has been observed by mass spectrometry variants of p53 [20]. Published assays on this platform include: that endogenously biotinylated host cell proteins (which can quantitation of protein-DNA and protein-drug binding be readily observed on a ) do not pull down as EXPERT REVIEW OF PROTEOMICS 635 native proteins because their biotin moieties are not physically The Immunome protein array has been used for immune accessible to bind to streptavidin [113]. Furthermore, the profiling across numerous disease settings, including systemic streptavidin-biotin interaction is essentially irreversible (Kd lupus erythematosus [103], malaria [104], Parkinson’s disease – ~10 15 M) and the streptavidin tetramers are themselves cova- [105], melanoma [21,114], colorectal cancer [114], prostate lently bound into the hydrogel surface, so no lateral diffusion cancer [32], non-small cell lung cancer [41] and ageing (sub- of the arrayed proteins occurs, even after months of storage. mitted). Further applications of the Immunome protein array A further notable feature of the Immunome protein array include cancer and autoimmune disease biomarker discovery, platform is the use of more sophisticated surface chemistry, monitoring global immune responses to novel therapeutics with polyethylene glycol (PEG) polymers serving two principle (e.g. NY-ESO-1 vaccine [21], anti-CTLA-4, and/or anti-PD-1 functions on the surface: highly effective blockade of non- immunotherapy (Duarte et al. submitted); BRAK/MEK inhibitors specific macromolecule absorption to the underlying glass [114]) in a clinical trial setting, and monitoring global immune [22]; and provision of an environment surrounding the arrayed responses to viral, microbial and fungal infections. In addition, proteins that is similar to unstructured water and which helps kinase inhibitor selectivity assays have been reported across to preserve the folded structure of the immobilized proteins 336 human kinases on the Immunome protein array platform for a long period (Sengenics claim a shelf life of 6 months for [15], with the microarray data matching previously reported the immunome protein arrays). The underlying conformational solution-phase Ki values well (Peton et al. submitted), support- flexibility of the PEG polymers also allows individual protein ing the contention that the proteins on this array platform are molecules in the same spot to orient themselves in 3D such folded and functional. that they can interact, enabling homo- and hetero-dimeriza- Extensive linearity and dynamic range assays have been tion to occur (the latter, where different proteins have been performed for this array, reporting a detection limit in the co-arrayed). For example, the Blackburn group have shown pg/mL range for autoantibody binding and a linearity of that arrayed human protein kinases can autophosphorylate on detection over ~4 orders [21]. Coupled with very low inter- the array surface in the presence of ATP, implying that cataly- array variability (inter- and intra-array CVs <4%), this means tically-active homodimers have formed [15]. that the Immunome protein array platform is genuinely quan- The Blackburn group has also developed a new bioinfor- titative – making it the first full length protein microarray matic pipeline for raw data processing and normalization of platform for which this is true. Equally importantly, compared protein microarray data [2]. This pipeline includes flagging of to other platforms, the Immunome protein array gives sparse saturated signals (>10% saturated pixels, which represent sig- signals in serological assays on young, presumed healthy indi-

nals above the reading capacity of the microarray scanner); viduals [32,114], yet gives positive, specific binding data with background subtraction, which calculates net intensities by conformation-specific antibodies. Together, this suggests that subtracting the neighborhood background intensities from preservation of folded structure on the Immunome protein matching foreground intensities; noise threshold, which dis- arrays – which is essentially a function of the underlying sur- cards net intensities that are less than two standard deviations face chemistry – should result in fewer false positives (since from the mean neighborhood background intensity, that is, hydrophobic epitopes remain buried) and fewer false nega- which are not significantly different from background there- tives (since discontinuous epitopes are preserved) than typi- fore considered to be in the noise; percentage of coefficient of cally observed on other protein microarray platforms. variation (CV) calculations, which determines the variation between replica features: σ % ¼ ; 7. Conclusion CV μ 100 In this review, we have provided an overview of the techno- where σ is the standard deviation, and μ is the mean signal of logical challenges encountered by researchers in the protein all feature replicates; spot filtering, where spots with CV above microarray field regarding content generation, immobiliza- 20% are flagged; mean of net intensity, which calculates the tion strategies, assay development, as well as data processing mean of all valid replicas per feature; verification of positive and analysis; we have then discussed technological advances controls by CV evaluation, which confirms that array printing to overcome some or all of these challenges in the context of was successful and of good quality; and data normalization, the most commonly used protein microarray platforms which uses a composite normalization method that is based (Table 1). To date, the greatest efforts have understandably only on the positive control features (e.g. fluorescently-labeled been made in expanding the content on the various protein and biotinylated BSA) [2]. microarray platforms, with less attention being paid yet to Downstream of the raw data extraction and processing the development of improved surface chemistries that will steps, for biomarker discovery assays, a penetrance fold aid preservation of folded structure function of the arrayed change-based method is then used to identify up-regulated proteins, or to the development of a diverse repertoire of and down-regulated biomarkers when conducting a group quantitative, functional assays that are compatible with the analyses (e.g. a diseased cohort in comparison to an age- various microarray platforms. Data handling is also evolving, and gender-matched healthy cohort) and data is visualized with increasingly sophisticated data processing and statisti- using volcano plots, while receiver–operator characteristic cal methods being used now by many groups, but standar- (ROC) curves are also used to identify candidate diagnostic dization of appropriate statistics and of reporting remains an panels. on-going issue. 636 J. G. DUARTE AND J. M. BLACKBURN

Table 1. Comparison between commonly used commercially available protein microarray platforms. Protein microarray Cloning and expression platform Protein content method Data analysis approach Profiling applications ProtoArray Human 9483 unique full-length N-terminal GST fusion Prospector software (1) Immune response Protein Microarray human proteins proteins produced using (2) Enzyme substrate the pENTR221 expression (3) Protein–protein interaction vector in insect cells (4) Antibody specificity HuProt Human >19,000 full-length human N-terminal GST-His-tagged GenePix Pro Image Acquisition and (1) Biomarker discovery Proteome proteins, representing proteins produced using Analysis Software/varied (2) Enzyme substrate Microarray >15,000 unique protein- the pEGH-A expression (3) Small molecule coding genes and ~75% of vector in a Saccharomyces (4) Protein–protein interaction the human proteome cerevisiae strain (5) Antibody specificity (6) DNA/RNA binding Human Protein Atlas Up to 42,698 unique protein N-terminal ABP-His-tagged Sample-specific thresholds for binary (1) Biomarker discovery Protein Fragment fragments, representing proteins produced using variable assignment and (2) Antibody specificity Array 19,497 human protein- the pAff8c expression comparative false discover rate- coding genes and ~94% of vector in the Escherichia corrected statistical test/varied the human proteome coli Rosetta DE3 strain Nucleic Acid Up to 12,000 Arabidopsis or In situ coupled cell-free User defined (1) Immune response Programmable human open reading transcription-translation of (2) On-array post-translational Protein Arrays frames tagged cDNAs modification (3) Protein–protein interaction (4) Polymorphic variants Immunome Protein 1,636 full-length human BCCP-tagged fusion proteins Generic, open source Java scripts for (1) Immune response Array proteins involved in the produced in SF9 insect saturated signal flagging, (2) Drug selectivity immune response cells and biotinylated in neighborhood background (3) On-array post-translational vivo subtraction, noise filtration, replica modification CV calculation, outlier removal, (4) Protein–protein interaction median of net intensity calculation, (5) Protein-nucleic acid binding normalization, penetrance fold (6) Polymorphic variants change-based biomarker identification

At present, the dominant biological applications for protein broader uptake of this technology; this is evidenced by the microarrays are based on qualitative profiling of antibodies across relatively poor uptake of the yeast proteome microarray repertoires of potential antigens. It seems intuitive that, when described herein, despite it containing ~80% of the theoretical conducting antibody profiling, biomarker discovery or protein proteome of the work-horse, model organism, S. cerevisiae.We functionality studies, having full-length, correctly folded proteins suggest that there are several distinct ways to expand the onthearraywouldbemoredesirable than shorter protein frag- menu of compatible assays, including inter alia: ments or unfolded proteins. This goal presents a different magni- tude of technological challenges, yet there are encouraging signs (i) Broaden out the effective proteome coverage of exist- that this challenge is being addressed now, at least for some of the ing on-array assays by introducing on-chip specific platforms describe earlier, suggesting that functional assays on post-translational modifications of the existing content protein microarrays may become more mainstream in biology in (e.g. phosphorylation; citrullination; methylation; acet- the near future, if costs of access can be kept reasonable. ylation; glycosylation; etc.) prior to functional assay, thereby enabling the regulation of protein activities to be studied in higher throughput; 8. Expert commentary (ii) Build protein complexes on the array surfaces prior to Protein microarray technologies continue to develop, with the functional assay, either by expression of the content in ultimate aim of full proteome coverage. To date the highest den- homologous hosts (essentially equivalent to mass co- sity protein microarrays have reached the point where at least one experimentation to form the epitope per protein is present for 19,497 of the estimated ~20,500 arrays) or through co-expression of known interacting human genes and it seems likely that every human protein will partners in the context of the most commonly used soon be represented by one or more epitopes on such arrays. Full protein microarray platforms (Table 1); length human protein arrays are not far behind, with current (iii) Broaden out the repertoire of on-array assays to versions representing the expressedproductsofupto15,000 include high content, cell-based readouts; unique protein coding genes. However, more generally the (iv) Incorporate neoantigens onto the arrays, not just wild human protein microarray field will take longer to tackle questions type proteins, including aberrant splice variants and of polymorphic variation, splice variation and dynamic post-trans- mutations; and lational modifications – issues that the mass spectrometry-based (v) Move away from reliance on fluorescence as the readout proteomics community are now turning their attention to. of functional assays on the arrays and into the realm of Fundamentally though, irrespective of the available content, label-free detection of biomolecular interactions. any protein microarray is only as good as the range of assays that can be performed on-array and it is this aspect that we The first three of these possibilities will require a concomitant believe still today provides one of the greatest barriers to greater focus on underlying surface chemistries and on EXPERT REVIEW OF PROTEOMICS 637 preservation of the folded structure of proteins post-immobili- cost of content production and comparison to what research- zation, since non-specific modification of arrayed proteins, non- ers have paid for DNA microarrays in the past, yet the cost- specific complexes or non-specific cell binding to the surface per-resultant true datapoint is vastly different between DNA would all obfuscate biologically meaningful, true positives in and protein microarrays. We therefore suggest that, despite subsequent assays. Here, it is clear that simple microarray sur- the march toward complete proteomes on a chip, there will faces are not best and we suggest that the protein microarray remain a demand for smaller, more focused protein microar- field will do well to learn from other biosensor platforms, where rays that are customized to the specific research question and site-specific attachment of capture probes to more complex, which can deliver relevant quantitative data for all the probes hydrogel-type surfaces is commonly used, albeit with much on the array. lower densities of content per chip, and where a broader Protein microarray data processing and analysis tools have range of biomolecular interactions are routinely probed on come a long way, with current approaches starting to use the surfaces. more appropriate statistical methods. However, we believe The importance of neoantigens in immunology is now that the restricted ability to compare datasets across experi- widely recognized in the cancer and vaccine fields, where ments done in different laboratories and on different plat- accumulation of neoantigens appears to correlate with, for forms represents another significant bottleneck in the field example, escape from immunotherapy or prophylaxis. It today. We therefore suggest that some degree of standardiza- would therefore be fascinating to use protein microarray tech- tion of the statistical principles that are appropriate for protein nology to explore the connection between tumor specific microarray datasets, as well as of the reporting and annotation mutations and the development of corresponding mutation- of data – as has happened in the DNA microarray, proteomics specific autoantibody responses. This concept is not without and glycan microarray communities through the MIAME its difficulties, since the recent literature on intra-tumor het- [118,119], MIAPE [120], and MIRAGE [121] initiatives – would erogeneity shows that the precise location of tumor biopsy be beneficial to the protein microarray community now. sampling can have a significant effect on the spectrum of It is worth pausing briefly to consider the elephant in the mutations identified by sequencing, which may in turn con- room in any discussion on protein microarrays: the question of found efforts to correlate protein array and sequencing data, how the information that can be gained from protein micro- but early papers in this area [115] suggest that there is defi- arrays compares to mass spectrometry-based proteomics. nitely scope for progress. If such correlations can be estab- Mass spectrometry provides the ability to study given pro- lished, then the prospects for use of focused protein teomes in both unbiased data dependent and data indepen-

microarrays to monitor treatment response and the emer- dent modes, as well as in targeted modes, and can detect gence of resistance through serology seem bright. However, PTMs, splice variants, and neoantigens. However, due to high given that tumor-associated mutations can affect folding of sequence homologies, one class of proteins that mass spectro- the antigen and that autoimmune responses can occur to metry-based proteomics continues to struggle with is the discontinuous as well as linear epitopes, we suggest that immunoglobulins, so for serological studies that aim to quan- arrays of folded proteins are likely to be most useful here. tify changes in the titres of individual (auto)antibodies in In terms of the development of new array-based assays, in disease, the component-resolved protein microarray architec- principle any assay that can be conducted on a purified protein ture has considerable advantages, both for discovery as well in solution ought to be feasible on a protein microarray, with as for quantitative profiling applications. some ingenuity. However, a major challenge is how to simulta- The advent of chemical proteomic strategies that, for exam- neously visualize, for example, the binding of unlabeled small ple, utilize crosslinking approaches to identify components of molecules to many different arrayed proteins, either from the complexes [122] or immobilized chemical probes to detect same or unrelated families. Ideas from the activity-based pro- potential drug binding proteins [123], have opened up new tein profiling field [116,117] can be adapted to a degree, but horizons for mass spectrometry-based proteomics. However, such probes are typically limited to subsets of a given protein these mass spectrometry-based methods are not readily amen- family and do not routinely give quantitative readouts. We able to direct, high throughput quantitation of biomolecular therefore suggest that label-free detection methodologies, interaction (e.g. protein–protein, protein–ligand, protein–DNA) such as surface plasmon resonance, biolayer interferometry affinities, which is crucial to any understanding of biomolecular and mass spectrometry, hold greater promise for the future of selectivity, nor, for example, to the high throughput study of protein microarray technology: by removing the requirement protein-cell interactions or the activities of protein complexes. for a fluorophore to be present somewhere in the assay system Thus, we suggest that both protein microarrays and mass for signal detection, quantification of biomolecular interactions spectrometry-based proteomics have their own niches, with across a broader range on structurally unrelated proteins on a protein microarrays having distinct advantages for systems- protein microarray should then be enabled. oriented, quantitative analysis of the functions of full length, As an aside, it is interesting to reflect on the fact that, correctly folded proteins and of protein complexes. We also whereas a DNA microarray-based transcriptomic experiment suggest though that the convergence between these two typically returns a data point for every probe on the chip, a proteomic technologies is likely to create a rich seam in the protein microarray experiment ought to only return data on a future, with mass spectrometry identifying new targets to put small subset (unless the assay is dominated by non-specific on protein microarrays for quantitative analysis and with pro- interactions). The current prices of commercial protein micro- tein microarrays becoming mass spectrometry-readable for arrays appear to be set based on some composite factor of the analysis of mixtures. 638 J. G. DUARTE AND J. M. BLACKBURN

This review has highlighted the fact that protein microarrays Key issues can be used across numerous applications. Of particular current ● Protein microarrays are approaching full coverage of the interest globally is the use of protein microarray technology in basic ‘one-gene-one-protein’ human proteome now and biomarker discovery or immune response profiling in disease the complete proteome microarrays have also been pro- settings. The various protein microarray platforms described duced for a number of smaller, model organisms, including here have been used to identify sets of candidate diagnostic yeast and E. coli. biomarkers in cancers, autoimmune diseases, inflammatory dis- ● The major applications of protein microarrays currently lie ease, neurological diseases, and chronic diseases. In the cancer in antibody screening and in serological profiling, where biomarker discovery field, we believe that cancer-specific micro- the component-resolved protein array-based methods have arrays hold great promise in identifying novel diagnostic, prog- obvious advantages over mass spectrometry-based nostic, and disease recurrence biomarkers, as well as for proteomics. monitoring immune responses to novel immuno- and che- ● Demonstrated protein microarray-compatible assays motherapeutics in a clinical trial setting. However, as will the include on-array quantitation of protein-drug, protein-DNA vast majority of proteomics data, the clinical validation and and protein-protein interactions, as well as enzymatic activ- application of such candidate biomarkers is still a work in ity assays, suggesting that protein microarrays are capable progress and, in our experience, a closer link between clinicians of generating high throughput, quantitative data on the and scientists will be critical in order to begin translating inter- selectivity and affinity of biomolecular interactions; this esting research observations made using protein microarray represents another area where array-based methods have technology into clinical practice. In particular, once the discov- obvious advantages over mass spectrometry-based ery phase of research has been completed, identified candidate proteomics. diagnostic biomarkers then need be correlated with detailed ● Formation of protein complexes and post-translational clinical knowledge about the individual patients and samples modifications on-array has also been demonstrated on (including precise disease classifications at the time point at some platforms, suggesting that protein microarrays have which samples were drawn) and then validated across large potential to aid the study of regulation in biological patient cohorts and available databases before being devel- systems. oped onto a suitable diagnostic platform for patient stratifica- ● In general, greater emphasis is needed on surface chemis- tion, diagnosis and treatment monitoring. tries that are able to preserve folded structure and function of arrayed proteins, in order to underpin the development 9. Five-year view of a wider repertoire of high throughput, on-array func- tional assays. In the next five years, we anticipate that several protein ● Integration of protein microarrays with cell-based readouts microarray platforms will achieve full coverage of the basic and label-free detectors is needed now to take the field ‘one-gene-one-protein’ human proteome as well as com- beyond fluorescence-based assays. plete proteomes for a number of other model organisms. ● Smaller, focused custom protein microarrays continue to We further anticipate that greater emphasis will be placed have a niche for specific biological questions, for example onbroadeningtheproteomecoverageoftheprotein in drug selectivity screening or in vaccine design. microarrays to include post-translational modifications, ● Standardisation of bioinformatic methods, as well as for splice variants, neoantigens and protein complexes. reporting and annotation of data for protein microarrays Greater focus will also be placed on surface chemistry to is needed now to enable datasets from different studies to retain the folded structure of the arrayed proteins, in order be mined more effectively. to reduce false positives and false negatives in assays; these advances will collectively impact significantly on the utility of protein microarrays in downstream biomarker discovery Funding and functional analysis applications. Smaller, more focused This paper was funded by a grant from the National Research Foundation protein microarrays will be customized for specific studies of South Africa [64760]. and the cost of protein microarrays will be lower than today. A key technological advance will be integration of protein Declaration of interest microarrays with on-array cell-based readouts and with label- JM Blackburn is a consultant for Sengenics and developed their free detection methodologies, which will open out a much Immunome protein array product while an academic researcher at the wider range of functional assays that can be performed and University of Cambridge. The Immunome protein array is one of the five which will therefore significantly increase the relevance of commercially-available protein array platforms described in this review. protein microarray technologies to the study of complex bio- The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial logical phenomena in both humans and in model organisms conflict with the subject matter or materials discussed in the manuscript such as yeast. apart from those disclosed. Translation into the clinic of (auto)antibody-based biomar- kers discovered using protein microarrays will offer the pro- spect of pre-symptomatic diagnosis in cancers and of patient ORCID stratification in precision medicine. Jonathan M. Blackburn http://orcid.org/0000-0001-8988-9595 EXPERT REVIEW OF PROTEOMICS 639

References 21. Beeton-Kempen N, Duarte J, Shoko A, et al. Development of a novel, quantitative protein microarray platform for the multiplexed Papers of special note have been highlighted as either of interest (•)orof serological analysis of autoantibodies to cancer-testis antigens. Int considerable interest (••) to readers. J Cancer. 2014;135(8):1842–1851. 1. Hardiman G. Microarray technologies - an overview. •• Focused protein arrays used for quantitative profiling of Pharmacogenomics. 2003;4(3):251–256. changes in autoimmune titres in response to immunotherapy. 2. Duarte J, Serufuri J-M, Mulder N, et al. Protein function microarrays: 22. Mrksich M. Using self-assembled monolayers to understand the design, use and bioinformatic analysis in cancer biomarker discov- biomaterials interface. Curr Opin Colloid Interface Sci. 1997;2 ery and quantitation. Bioinforma Hum Proteomics Transl (1):83–88. Bioinforma. 2013;3:39–74. 23. Zheng J, Li L, Tsao H-K, et al. Strong repulsive forces between • Describes a generic pipeline for processing raw protein micro- protein and oligo (ethylene glycol) self-assembled monolayers: a array datasets. molecular simulation study. Biophys J. 2005;89(1):158–166. 3. Sanchez-Carbayo M. Antibody microarrays as tools for biomarker 24. Mok J, Im H, Snyder M. Global identification of protein kinase discovery. In: Korf U, editor. Methods Mol. Biol. Vol. 785. Clifton, NJ: substrates by protein microarray analysis. Nat Protoc. 2009;4 Humana Press; 2011. p. 159–182. (12):1820–1827. 4. Hoheisel JD, Alhamdani MSS, Schröder C. Affinity-based microar- 25. Jones R, Gordus A, Krall J, et al. A quantitative protein interaction rays for proteomic analysis of cancer tissues. Proteomics Clin Appl. network for the ErbB receptors using protein microarrays. Nature. 2013;7(1–2):8–15. 2006;439(7073):168–174. 5. Borrebaeck CAK, Wingren C. Antibody array generation and use. •• Focused protein arrays of human Src homology2 and phos- Methods Mol Biol. 2014;1131(2011):563–571. photyrosine binding domains used for quantitative measure- 6. Gerdtsson A, Dexlin-Mellby L, Delfani P, et al. Evaluation of solid ment of binding to peptide substrates. supports for slide- and well-based recombinant antibody microar- 26. Schweitzer B, Predki P, Snyder M. Microarrays to characterize pro- rays. Microarrays. 2016;5(2):16. tein interactions on a whole-proteome scale. Proteomics. 2003;3 7. Delfani P, Mellby LD, Nordstrom M, et al. Technical advances of the (11):2190–2199. recombinant antibody microarray technology platform for clinical 27. Klein JB, Thongboonkerd V. Overview of proteomics. Proteomics immunoproteomics. PLoS One. 2016;11(7):1–26. Nephrol. 2004;141:1–10. 8. MacBeath G, Schreiber SL. Printing proteins as microarrays for high- 28. Hall DA, Ptacek J, Snyder M. Protein microarray technology. Mech throughput function determination. Science (80-.). 2000;289 Ageing Dev. 2007;128(1):161–167. (5485):1760–1763. 29. Blackburn JM, Hart DJ. Fabrication of protein function microarrays 9. Gupta S, Manubhai KP, Kulkarni V, et al. An overview of innovations for systems-oriented proteomic analysis. Methods Mol Biol. and industrial solutions in protein microarray technology. 2005;310(5):197–216. Proteomics. 2016;16(8):1297–1308. 30. Safari Serufuri J-M. Development of computational methods for 10. Blackburn JM, Shoko A. Protein function microarrays for custo- custom protein arrays analysis. A case study on a 100 protein mised systems-oriented proteome analysis. In: Korf U, editor. (“CT100”) cancer/testis antigen array. Masters thesis. University of

Methods Mol. Biol. Vol. 785. Clifton, NJ: Humana Press; 2011.p. Cape Town (Cape Town, RSA). 2010. 305–330. 31. Deluca DS, Marina O, Ray S, et al. Data processing and analysis for 11. Braun P, Hu Y, Shen B, et al. Proteome-scale purification of human protein microarrays. Methods Mol. Biol. 2011;723:337-347. proteins from bacteria. Proc Natl Acad Sci U S A. 2002;99(5):2654–2659. 32. Adeola HA, Smith M, Kaestner L, et al. Novel potential serological 12. He M, Taussig MJ. Single step generation of protein arrays from prostate cancer biomarkers using CT100+ cancer antigen microar- DNA by cell-free expression and in situ immobilisation (PISA ray platform in a multi-cultural south african cohort. Oncotarget. method). Nucleic Acids Res. 2001;29(15):e73. 2016;7(12):13945–13964. • Describes the first in situ generated protein array approach. 33. Moore CD, Ajala OZ, Zhu H. Applications in high-content functional 13. Jackson AM, Boutell J, Cooley N, et al. Cell-free protein synthesis for protein microarrays. Curr Opin Chem Biol. 2016;30:21–27. proteomics. Brief Funct Genomic Proteomic. 2004;2(4):308–319. 34. Zhu H. Global analysis of protein activities using proteome chips. 14. Ramachandran N, Hainsworth E, Bhullar B, et al. Self-assembling Science (80-.). 2001;293(5537):2101–2105. protein microarrays. Science. 2004;305(5680):86–90. •• Screening of a yeast proteome microarray for protein and •• Generation of high density, self-assembling protein microar- phospholipid binding functions. rays from arrayed cDNA by coupled in vitro transcription- 35. Nagy PD. Yeast as a model host to explore plant virus-host inter- translation. actions. Annu Rev Phytopathol. 2008;46:217–242. 15. Blackburn JM, Shoko A, Beeton-Kempen N. Miniaturized, microar- 36. Smith MG, Jona G, Ptacek J, et al. Global analysis of protein function ray-based assays for chemical proteomic studies of protein func- using protein microarrays. Mech Ageing Dev. 2005;126(1):171–175. tion. Zanders ED, editor. Methods Mol Biol. 2012;800:133–162. 37. Orenes-Piñero E, Barderas R, Rico D, et al. Serum and tissue profil- 16. Carlson ED, Gan R, Hodgman CE, et al. Cell-free protein synthesis: ing in bladder cancer combining protein and tissue arrays. J applications come of age. Biotechnol Adv. 2012;30(5):1185–1194. Proteome Res. 2010;9(1):164–173. 17. Yazaki J, Galli M, Kim AY, et al. Mapping transcription factor inter- 38. Mangé A, Lacombe J, Bascoul-Mollevi C, et al. Serum autoantibody actome networks using HaloTag protein arrays. Proc Natl Acad Sci. signature of ductal carcinoma in situ progression to invasive breast 2016;113(29):E4238–E4247. cancer. Clin Cancer Res. 2012;18(7):1992–2000. •• Transcription factor binding networks identified using func- 39. Marina O, Hainz U, Biernacki MA, et al. Serologic markers of effec- tional analyses on arrays of 4600 Arabidopsis proteins. tive tumor immunity against chronic lymphocytic leukemia include 18. Romanov V, Davidoff SN, Miles AR, et al. A critical comparison of nonmutated B-cell antigens. Cancer Res. 2010;70(4):1344–1355. protein microarray fabrication technologies. Analyst. 2014;139 40. Babel I, Barderas R, Díaz-Uriarte R, et al. Identification of tumor- (6):1303–1326. associated autoantigens for the diagnosis of colorectal cancer in 19. Büssow K, Konthur Z, Lueking A, et al. Protein array technology: serum using high density protein microarrays. Mol Cell Proteomics. potential use in medical diagnostics. Am J Pharmacogenomics. 2009;8(10):2382–2395. 2001;1(1):1–7. 41. Gnjatic S, Wheeler C, Ebner M, et al. Seromic analysis of antibody 20. Boutell JM, Hart DJ, Godber BLJ, et al. Functional protein micro- responses in non-small cell lung cancer patients and healthy arrays for parallel characterisation of p53 mutants. Proteomics. donors using conformational protein arrays. J Immunol Methods. 2004;4(7):1950–1958. 2009;341(1–2):50–58. •• Quantitative analysis of the effect of polymorphic variation on 42. Mias GI, Chen R, Zhang Y, et al. Specific plasma autoantibody p53 function. reactivity in myelodysplastic syndromes. Sci Rep. 2013;3:3311. 640 J. G. DUARTE AND J. M. BLACKBURN

43. Gunawardana CG, Memari N, Diamandis EP. Identifying novel auto- 63. Yin LT, Hu CY, Chang CH. A single layer nitrocellulose substrate for antibody signatures in ovarian cancer using high-density protein fabricating protein chips. Sensors Actuators B Chem. 2008;130 microarrays. Clin Biochem. 2009;42(4–5):426–429. (1):374–378. 44. Hudson ME, Pozdnyakova I, Haines K, et al. Identification of differ- 64. Walter JG, Stahl F, Reck M, et al. Protein microarrays: reduced auto- entially expressed proteins in ovarian cancer using high-density fluorescence and improved LOD. Eng Life Sci. 2010;10(2):103–108. protein microarrays. Proc Natl Acad Sci U S A. 2007;104 65. Balboni I, Limb C, Tenenbaum JD, et al. Evaluation of microarray (44):17494–17499. surfaces and arraying parameters for autoantibody profiling. 45. Gnjatic S, Ritter E, Büchler MW, et al. Seromic profiling of ovarian Proteomics. 2008;8(17):3443–3449. and pancreatic cancer. Proc Natl Acad Sci U S A. 2010;107 66. Holstein CA, Chevalier A, Bennett S, et al. Immobilizing affinity (11):5088–5093. proteins to nitrocellulose: A toolbox for paper-based assay devel- • Serological analysis of 60 pancreatic cancer, 51 ovarian cancer opers. Anal Bioanal Chem. 2016;408(5):1335–1346. and age-matched controls on a ProtoArray identified 29 and 67. Turewicz M, May C, Ahrens M, et al. Improving the default data 202 autoantibodies in the two cancers, respectively. analysis workflow for large autoimmune biomarker discovery stu- 46. Auger I, Balandraud N, Rak J, et al. New autoantigens in rheu- dies with protoarrays. Proteomics. 2013;13(14):2083–2087. matoid arthritis (RA): screening 8268 protein arrays with 68. Jeong JS, Jiang L, Albino E, et al. Rapid identification of monospe- sera from patients with RA. Ann Rheum Dis. 2009;68(4):591– cific monoclonal antibodies using a human proteome microarray. 594. Mol Cell Proteomics. 2012;11(6):O111.016253-O111.016253. 47. Hu S, Vissink A, Arellano M, et al. Identification of autoantibody • Construction of an array of 16,368 unique full-length human biomarkers for primary Sjogren’s syndrome using protein microar- ORFs. rays. Proteomics. 2011;11(8):1499–1507. 69. Hu S, Xie Z, Onishi A, et al. Profiling the human protein-DNA 48. Koo BK, Chae S, Kim KM, et al. Identification of novel autoantibo- interactome reveals ERK2 as a transcriptional repressor of inter- dies in type 1 diabetic patients using a high-density protein micro- feron signaling. Cell. 2009;139(3):610–622. array. Diabetes. 2014;63(9):3022–3032. •• Identification of >ca. 18,000 protein–DNA interactions between 49. Kim SH, Kim JY, Lee HJ, et al. Autoimmunity as a candidate for the 460 DNA motifs and 4191 human proteins of various func- etiopathogenesis of Meniere’s disease: detection of autoimmune tional classes, including a large number of unanticipated pro- reactions and diagnostic biomarker candidate. PLoS One. 2014;9 tein –DNA interactions. (10):e111039. 70. Hu CJ, Song G, Huang W, et al. Identification of new autoantigens 50. Landegren N, Sharon D, Shum AK, et al. Transglutaminase 4 as a for primary biliary cirrhosis using human proteome microarrays. prostate autoantigen in male subfertility. Sci Transl Med. 2015;7 Mol Cell Proteomics. 2012;11(9):669–680. (292):292ra101. 71. Hu C, Huang W, Chen H, et al. Autoantibody profiling on human 51. Liu M, Subramanian V, Christie C, et al. Immune responses to self- proteome microarray for biomarker discovery in cerebrospinal fluid antigens in asthma patients: clinical and immunopathological and sera of neuropsychiatric lupus. PLoS One. 2015;10(5):1–16. implications. Hum Immunol. 2012;73(5):511–516. 72. Syed P, Gupta S, Choudhary S, et al. Autoantibody profiling of 52. Vermeulen N, De Béeck KO, Vermeire S, et al. Identification of a glioma serum samples to identify biomarkers using human pro-

novel autoantigen in inflammatory bowel disease by protein micro- teome arrays. Sci Rep. 2015;5:13895. array. Inflamm Bowel Dis. 2011;17(6):1291–1300. 73. Hu C, Pan J, Song G, et al. Identification of novel biomarkers for 53. Nagele E, Han M, DeMarshall C, et al. Diagnosis of Alzheimer’s behcet disease diagnosis using human proteome microarray disease based on disease-specific autoantibody profiles in human approach. Mol Cell Proteomics. 2017;16(2):147–156. sera. PLoS One. 2011;6(8):1–8. 74. Sjöberg R, Mattsson C, Andersson E, et al. Exploration of high- 54. Han M, Nagele E, DeMarshall C, et al. Diagnosis of parkinson’s density protein microarrays for antibody validation and autoimmu- disease based on disease-specific autoantibody profiles in human nity profiling. N Biotechnol. 2016;33(5):582–592. sera. PLoS One. 2012;7(2):e32383. 75. Yang L, Wang J, Li J, et al. Identification of serum biomarkers for 55. Querol L, Clark PL, Bailey MA, et al. Protein array-based profiling of gastric cancer diagnosis using a human proteome microarray. Mol CSF identifies RBPJ as an autoantigen in multiple sclerosis. Cell Proteomics. 2016;15(2):614–623. Neurology. 2013;81(11):956–963. 76. Chen C-S, Korobkova E, Chen H, et al. A proteome chip approach 56. May C, Nordhoff E, Casjens S, et al. Highly immunoreactive IgG reveals new DNA damage recognition activities in . antibodies directed against a set of twenty human proteins in the Nat Methods. 2008;5(1):69–74. sera of patients with amyotrophic lateral sclerosis identified by 77. Tu YH, Ho YH, Chuang YC, et al. Identification of lactoferricin B protein array. PLoS One. 2014;9(2):e89596. intracellular targets using an escherichia coli proteome chip. PLoS 57. Porcheray F, DeVito J, Yeap BY, et al. Chronic humoral rejection of One. 2011;6(12):e28197. human kidney allografts associates with broad autoantibody 78. Chen C-S, Sullivan S, Anderson T, et al. Identification of novel responses. Transplantation. 2010;89(10):1239–1246. serological biomarkers for inflammatory bowel disease using 58. Li L, Wadia P, Chen R, et al. Identifying compartment-specific non- Escherichia coli proteome chip. Mol Cell Proteomics. 2009;8 HLA targets after renal transplantation by integrating transcrip- (8):1765–1776. tome and “antibodyome” measures. Proc Natl Acad Sci U S A. 79. Deng J, Bi L, Zhou L, et al. Mycobacterium tuberculosis proteome 2009;106(11):4148–4153. microarray for global studies of protein function and immunogeni- 59. Le Roux S, Devys A, Girard C, et al. Biomarkers for the diagnosis city. Cell Rep. 2014;9(6):2317–2329. of the stable kidney transplant and chronic transplant injury • Protein–protein interactions, small-molecule–protein binding, using the ProtoArray technology. Transplant Proc. 2010;42 protein kinase substrate identification and c-di-GMP binding (9):3475–3481. proteins identified on a bacterial proteome microarray. 60. Butte AJ, Sigdel TK, Wadia PP, et al. Protein microarrays discover 80. Uhlén M, Björling E, Agaton C, et al. A human protein atlas for angiotensinogen and PRKRIP1 as novel targets for autoantibodies normal and cancer tissues based on antibody proteomics. Mol Cell in chronic renal disease. Mol Cell Proteomics. 2011;10(3): Proteomics. 2005;4(12):1920–1932. M110.000497. 81. Pontén FK, Schwenk JM, Asplund A, et al. The human protein atlas 61. Zhu H, Snyder M. Protein chip technology. Curr Opin Chem Biol. as a proteomic resource for biomarker discovery. J Intern Med. 2003;7(1):55–63. 2011;270(5):428–446. 62. Derwinska K, Sauer U, Preininger C. Reproducibility of hydrogel 82. Uhlen M, Oksvold P, Fagerberg L, et al. Towards a knowledge- slides in on-chip immunoassays with respect to scanning mode, based human protein atlas. Nat Biotechnol. 2010;28(12):1248–1250. spot circularity, and data filtering. Anal Biochem. 2007;370 83. Uhlén M, Fagerberg L, Hallström BM, et al. Tissue-based map of the (1):38–46. human proteome. Science (80-.). 2015;347(6220):1260419–1260419. EXPERT REVIEW OF PROTEOMICS 641

84. Lindskog M, Rockberg J, Uhlén M, et al. Selection of protein epi- • Induction of elevated autoantibody levels against 24 human topes for antibody production. Biotechniques. 2005;38(5):723–727. proteins as a result of a Plasmodium infection. 85. Berglund L, Björling E, Jonasson K, et al. A whole-genome bioinfor- 105. Suwarnalata G, Tan AH, Isa H, et al. Augmentation of autoantibo- matics approach to selection of antigens for systematic antibody dies by Helicobacter pylori in Parkinson’s disease patients may be generation. Proteomics. 2008;8(14):2832–2839. linked to greater severity. PLoS One. 2016;11(4):1–17. •• Strategy for selecting human protein epitope signature tags • Serological screening of 30 Helicabacter pylori-seropositive for immunization and protein arrays. Parkinson’s disease patients and matched controls identified 86. Larsson M, Gräslund S, Yuan L, et al. High-throughput protein 13 significant autoantibodies, including against proteins expression of cDNA products as a tool in functional genomics. J essential for normal neurological functions. Biotechnol. 2000;80(2):143–157. 106. Cronan JE. The biotinyl domain of Escherichia coli acetyl-CoA car- 87. Tegel H, Tourle S, Ottosson J, et al. Increased levels of recombinant boxylase: evidence that the “thumb” structure is essential and that human proteins with the Escherichia coli strain Rosetta(DE3). the domain functions as a dimer. J Biol Chem. 2001;276(40):37355– Protein Expr Purif. 2010;69(2):159–167. 37364. 88. Henjes F, Lourido L, Ruiz-Romero C, et al. Analysis of autoantibody 107. Kost TA, Condreay JP, Jarvis DL. Baculovirus as versatile vectors for profiles in osteoarthritis using comprehensive protein array con- protein expression in insect and mammalian cells. Nat Biotechnol. cepts. J Proteome Res. 2014;13(11):5218–5229. 2005;23(5):567–575. 89. Häggmark A, Hamsten C, Wiklundh E, et al. Proteomic profiling 108. Contreras-Gómez A, Sánchez-Mirón A, García-Camacho F, et al. reveals autoimmune targets in sarcoidosis. Am J Respir Crit Care Protein production using the baculovirus-insect cell expression Med. 2015;191(5):574–583. system. Biotechnol Prog. 2014;30(1):1–18. 90. Ayoglu B, Häggmark A, Khademi M, et al. Autoantibody profiling in 109. Chapman-Smith A, Cronan JE. The enzymatic biotinylation of pro- multiple sclerosis using arrays of human protein fragments. Mol teins: a post-translational modification of exceptional specificity. Cell Proteomics. 2013;12(9):2657–2672. Trends Biochem Sci. 1999;24(9):359–363. 91. Ayoglu B, Mitsios N, Kockum I, et al. Anoctamin 2 identified as an 110. Dyson M, Hart D, Samaddar M, et al. comprising a autoimmune target in multiple sclerosis. Proc Natl Acad Sci. biotinylation domain and method for increasing solubility and 2016;113(8):2188–2193. determining folding state. Patent numbers EP1,470,229; 92. Wang J, Barker K, Steel J, et al. A versatile protein microarray US8,999,897; JP4,377,242; CA2,474,457. 2015. platform enabling antibody profiling against denatured proteins. 111. Waldo GS, Standish BM, Berendzen J, et al. Rapid protein-folding assay Proteomics Clin Appl. 2013;7(0):378–383. using green fluorescent protein. Nat Biotechnol. 1999;17(7):691–695. 93. Manzano-Román R, Díaz-Martín V, González-González M, et al. Self- 112. Pédelacq J-D, Piltch E, Liong EC, et al. Engineering soluble proteins assembled protein arrays from an ornithodoros moubata salivary for structural genomics. Nat Biotechnol. 2002;20(9):927–932. gland expression library. J Proteome Res. 2012;11(12):5972–5982. 113. Koopmann J-O, Blackburn J. High affinity capture surface for 94. Miersch S, Bian X, Wallstrom G, et al. Serological autoantibody matrix-assisted laser desorption/ionisation compatible protein profiling of type 1 diabetes by protein arrays. J Proteomics. microarrays. Rapid Commun Mass Spectrom. 2003;17(5):455–462. 2013;94:486–496. 114. Duarte JG. Proteomic studies on patient responses to chemother-

95. Bian X, Wasserfall C, Wallstrom G, et al. Tracking the antibody apy, radiotherapy and immunotherapy in cancers. PhD immunome in type 1 diabetes using protein arrays. J Proteome thesis. University of Cape Town (Cape Town, RSA). 2015. Res. 2017;16(1):195–203. 115. Katchman BA, Barderas R, Alam R, et al. Proteomic mapping of p53 96. Gibson DS, Qiu J, Mendoza EA, et al. Circulating and synovial anti- immunogenicity in pancreatic, ovarian, and breast cancers. body profiling of juvenile arthritis patients by nucleic acid pro- PROTEOMICS Clin Appl. 2016;10(7):720–731. grammable protein arrays. Arthritis Res Ther. 2012;14(2):R77. • Anti-p53 autoantibodies identified in pancreatic, ovarian, and 97. Wright C, Sibani S, Trudgian D, et al. Detection of multiple auto- breast cancer patient sera showed a polyclonal response to antibodies in patients with ankylosing spondylitis using nucleic common epitopes shared between p53 point mutant proteins. acid programmable protein arrays. Mol Cell Proteomics. 2012;11 116. Speers AE, Cravatt BF. Profiling enzyme activities in vivo using click (2):M9.00384. chemistry methods. Chem Biol. 2004;11(4):535–546. 98. Anderson KS, Sibani S, Wallstrom G, et al. Protein microarray sig- 117. Nomura DK, Dix MM, Cravatt BF. Activity-based protein profiling for nature of autoantibody biomarkers for the early detection of breast biochemical pathway discovery in cancer. Nat Rev Cancer. 2010;10 cancer. J Proteome Res. 2011;10(1):85–96. (9):630–638. 99. Anderson KS, Cramer DW, Sibani S, et al. Autoantibody signature 118. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information for the serologic detection of ovarian cancer. J Proteome Res. about a microarray experiment (MIAME)-toward standards for 2015;14(1):578–586. microarray data. Nat Genet. 2001;29(4):365–371. 100. Ceroni A, Sibani S, Baiker A, et al. Systematic analysis of the IgG anti- 119. Quackenbush J. Data reporting standards: making the things we body immune response against varicella zoster virus (VZV) using a self- use better. Genome Med. 2009;1(11):111.1-111.3. assembled protein microarray. Mol Biosyst. 2010;6(9):1604–1610. 120. Taylor CF, Paton NW, Lilley KS, et al. The minimum information 101. Yu X, LaBaer J. High-throughput identification of proteins with about a proteomics experiment (MIAPE). Nat Biotechnol. 2007;25 AMPylation using self-assembled human protein (NAPPA) microar- (8):887–893. rays. Nat Protoc. 2015;10(5):756–767. 121. Liu Y, McBride R, Stoll M, et al. The minimum information required • Identification of novel human adenylation substrates using for a glycomics experiment (MIRAGE) project: improving the stan- NAPA arrays, read out using click chemistry. dards for reporting glycan microarray-based data. Glycobiology. 102. Olia AS, Barker K, McCullough CE, et al. Nonenzymatic protein 2016;26(9):907–910. acetylation detected by NAPPA protein arrays. ACS Chem Biol. 122. Leitner A, Reischl R, Walzthoeni T, et al. Expanding the chemical 2015;10(9):2034–2047. cross-linking toolbox by the use of multiple proteases and enrich- 103. McAndrew M, Wheeler C, Koopmann J, et al. Novel autoantibody ment by size exclusion . Mol Cell Proteomics. biomarkers for the improved diagnosis of systemic lupus erythe- 2012;11(3):M111.014126-1-M111.014126-12. matosus. Ann Rheum Dis. 2013;72:A760. 123. Médard G, Pachl F, Ruprecht B, et al. Optimized chemical proteo- 104. Liew J, Amir A, Chen Y, et al. Autoantibody profile of patients mics assay for kinase inhibitor profiling. J Proteome Res. 2015;14 infected with knowlesi malaria. Clin Chim Acta. 2015;448:33–38. (3):1574–1586.