bioRxiv preprint doi: https://doi.org/10.1101/032110; this version posted January 9, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.


Centromere Detection of Human Metaphase Chromosome Images using a Candidate Based Method

Akila Subasinghe · Jagath Samarabandu · Yanxin Li · Ruth Wilkins · Farrah Flegal · Joan H. Knoll · Peter K. Rogan


Abstract Accurate detection of the human metaphase chromosome centromere is a critical element of cytogenetic diagnostic techniques, including chromosome enumeration, karyotyping and radiation biodosimetry. Existing image processing methods can perform poorly in the presence of irregular boundaries, shape variations and premature sister chromatid separation, which can adversely affect centromere localization. We present a centromere detection algorithm that uses a novel profile thickness measurement technique on irregular chromosome structures defined by contour partitioning. Our algorithm generates a set of centromere candidates which are then evaluated based on a set of features derived from the chromosome images. Our method also partitions the chromosome contour to isolate its telomere regions and then detects and corrects for sister chromatid separation. When tested with a chromosome database consisting of 1400 chromosomes collected from 40 metaphase cell images, the candidate based centromere detection algorithm was able to accurately localize 1220 centromere locations, yielding a detection accuracy of 87%. We also introduce a Candidate Based Centromere Confidence (CBCC) metric which indicates an approximate confidence value of a given centromere detection and can be readily extended to other candidate related detection problems.

Keywords Centromere detection · Chromosome analysis · Laplacian based thickness measurement · Support vector machines

A. Subasinghe
Department of Electrical and Electronics Engineering, University of Sri Jayewardenepura, Nugegoda, Sri Lanka.
J. Samarabandu
Department of Electrical and Computer Engineering, Western University, ON, Canada.
R. Wilkins
Health Canada, 775 Brookfield Road, PL 6303B, Ottawa, ON and Canadian Nuclear Laboratories, 20 Forest Avenue, Deep River, ON K0J 1P0.
F. Flegal
Canadian Nuclear Laboratories, Plant Rd, STN 51, Chalk River, ON, K0J 1J0.
Y. Li, J. Knoll and P. Rogan
Departments of Pathology and Biochemistry, Schulich School of Medicine & Dentistry, Western University, ON, Canada.

1 Introduction

The centromere of a human chromosome (figure 1) is the primary constriction to which the spindle fiber is attached during the division cycle. The detection of this salient point is the key to calculating the centromere index, from which the type and the number of a given chromosome can be determined. The reliable detection of the centromere by image analysis techniques is challenging due to the high morphological variations of chromosomes on microscope slides. This variation is caused by various cell preparatory and staining methods along with many other factors during mitosis. Irregular boundaries and large variations in morphology of the chromosome can cause a detection algorithm to miss the constriction, especially in higher resolution chromosomes. Premature sister chromatid separation can also pose a significant challenge, since the degree of separation varies from cell to cell, and even among chromosomes in the same cell. In such cases, the width constriction can be missed by image processing algorithms, which can result in incorrect localization of a centromere on one of the sister chromatids.

From an image analysis perspective, the high morphological variations in human chromosomes due to their non-rigid nature pose a significant challenge.


Fig. 1 Anatomy of a human metaphase chromosome, illustrated with a simple graphical design with key components labeled.

Cell preparation and staining techniques also vary among laboratories. Clinical cytogenetic and reference biodosimetry laboratories can produce chromosome images that differ significantly in their appearance. As an example, chromosomes that were DAPI (4',6-Diamidino-2-Phenylindole) stained would demonstrate different intensity features and boundary characteristics from chromosomes subjected only to Giemsa staining. Additionally, the stage of metaphase in which the cells were arrested, along with environmental factors such as humidity, can dictate the shape characteristics of each individual cell and introduce a large variance to the data set. Furthermore, in some preparatory methods the cells are denatured, which introduces significant noise at the chromosome boundary. These same factors can also dictate the amount of premature sister chromatid separation in some of the cells. Effective algorithms for centromere detection need to be able to handle the high degree of shape variability present in different chromosomes, while correcting for artifacts such as premature sister chromatid separation. Figure 2 illustrates a sample set of shapes of chromosomes in the data set and their high morphological variations.

Fig. 2 Various degrees of sister chromatid separation present in some Giemsa stained chromosome cell images (fig 2(a), (b) & (c)), as well as some lengthy chromosomes characteristic of those prepared at a cytogenetic laboratory (fig 2(d), (e) & (f)).

This research is a prerequisite for the development of a set of algorithms for detecting dicentric chromosomes (possessing two centromeres), which are diagnostic of radiation exposures in cytogenetic biodosimetry. The ability of the proposed algorithm to handle high degrees of morphological variation, and to detect and correct for the artifact created by premature sister chromatid separation in cell images, is also critical to detecting dicentric chromosomal abnormalities.

Numerous computer algorithms have been proposed over time for chromosome analysis, ranging from metaphase finding [1] and analysis [2] to centromere and dicentric detection [3], [4]. These methods are constrained either by the protocol used for staining the cell image or by the morphology of the chromosome. We have previously proposed an algorithm to locate the centromere by calculating a centerline with no spurious branches, irrespective of boundary irregularities or the morphology of the chromosome [5]. Mohammad proposed a method that used our previous approach to derive the centerline and then used a curvature measure, instead of the width measurements, to localize the centromere [6]. Another interesting approach by Jahani and Setarehdan involves artificially straightening chromosomes prior to creating the trellis structure using the centerline derived through morphological thinning [7]. Yet all these methods, including our previous approach, work well only with smooth object boundaries. The absence of a smooth boundary will directly affect the centerline and thus make the feature calculations noisy. Furthermore, the accuracy of all these methods is adversely impacted by sister chromatid separation.

We propose a candidate based centromere localization algorithm capable of processing highly bent chromosome images prepared with a variety of staining techniques. This method can also detect and correct for the artifacts introduced by premature sister chromatid separation. Since centerline-based methods tend to perform better than other methods, we have proposed an algorithm which utilizes the centerline simply to divide the chromosome contour into two nearly symmetric partitions, rather than using this feature as a basis for width measurements. In centerline-based methods, boundary irregularities often get embedded in the centerline, which therefore introduces noise into the width profiles. By avoiding the centerline as the basis for measurements, the condition of the chromosome boundary does not impact the smoothness of width profile measurements. Then, a Laplacian-based algorithm that integrates intensity measurements in a weighting scheme biases the thickness measurement by tracing vectors across regions of homogeneous intensity. To address image processing artifacts arising from sister chromatid separation, an improved contour partitioning algorithm is presented. This paper also introduces the Candidate-Based Centromere Confidence (CBCC) metric, which measures the confidence in an accurate centromere assignment [8]. This metric is used in tests of the algorithm on a large data set of chromosomes, with the aim of validating the performance of the algorithm.

2 Methods

The following section describes the proposed candidate based centromere detection algorithm in detail. For clarity, the method is functionally divided into the following sections:
– Segmentation & centerline extraction
– Contour partitioning & correcting for sister chromatid separation
– Chromosome thickness measurement using the intensity integrated Laplacian method
– Candidate point generation & metaphase centromere detection

The proposed method operates on well-separated chromosomes that do not overlap or touch others. To ensure that the metaphase cells with the maximum number of segmentable chromosomes are analyzed, cells with incomplete chromosome complements and those with higher densities of overlapping and touching chromosomes are initially deprecated using a content-based classification procedure [9].

To develop the method, individual chromosomes were first selected manually, while the remainder of the process has been automated. Our Automated Dicentric Chromosome Identifier (ADCI) software also automatically selects individual chromosomes [10]. Once a chromosome is selected, the proposed method segments this object by local thresholding [11] and Gradient Vector Flow (GVF) active contours [12]. Upon extraction based on the GVF boundaries, the contour of the chromosome is partitioned using a polygonal shape simplification algorithm known as Discrete Curve Evolution (DCE) [13], which simplifies the shape of the object by iteratively deleting vertices based on their importance to the overall shape of the object. Then a Support Vector Machine (SVM) classifier is used to pick the best set of points to isolate the telomeric regions, i.e. the regions at the ends of the chromosome. The segmented telomere regions are then tested for evidence of sister chromatid separation using a second trained SVM classifier designed to capture shape characteristics of the telomere regions, and then corrected for that artifact. Afterwards, the chromosome is split into two partitions along the axis of symmetry, and a modified Laplacian based thickness measurement algorithm (called Intensity Integrated Laplacian or IIL) is used to calculate the width profile of the chromosome [8]. This profile is then used to identify a possible set of candidates for centromere location(s), and features are calculated for each of those locations. Next, a third classifier is trained on expert-classified chromosomes to detect centromere locations in test chromosomes. In most instances, each chromosome will contain at least one centromere. Therefore, the correct centromere should be present among the candidates. The distance from the separating hyperplane is used as an indicator for the goodness of fit of a given candidate and thereafter used to select the best candidate from the pool of candidates. We test the hypothesis that the best candidate for a given chromosome is a centromere location. In the following sections, we present further details of the algorithm. Sections 2.1 and 2.3 have been previously published [8] and have been included in this paper to improve readability.

2.1 Segmentation & Centerline extraction

Once the user selects a point contained within a chromosome, a region of interest (ROI) containing the chromosome is selected and extracted. These regions will be processed separately in subsequent steps. Pre-processing consists of the application of a median filter followed by intensity normalization for the extracted window around the chromosome. The chromosome is first thresholded using Otsu's method, and then the contour of that binary object is used as the initial contour for GVF active contours. The use of GVF active contours provides a contour that is smooth and that converges to boundary concavities [14].

In order to calculate the width profile of the chromosome using the thickness measuring algorithm, the chromosome contour is divided longitudinally into two approximately symmetric segments. We used Discrete Curve Evolution (DCE)-based skeletal pruning [15], [5] to obtain an accurate centerline. DCE is a polygon evolution algorithm which evolves through vertex deletion based on a relevance measurement [15]. The objective of this stage of the algorithm is to obtain the set of anchor points (end points of the centerline). This set of anchor points is denoted by $E^P$ ($|E^P| = 2$, where $|\cdot|$ is the cardinality operator).

Since the minimum polygon is a triangle, the end result of this process is a skeleton traversing the length of the chromosome that is connected to a single spurious branch (3 branches). Throughout this paper, we use the superscript $P$ to refer to various point sets on the chromosome object contour.

If $C \in \mathbb{R}^2$ is the contour of the chromosome, the DCE initial anchor points (skeletal end points) for the centerline are denoted by $\hat{E}^P$ ($|\hat{E}^P| = 3$). In order to obtain the centerline, the shortest branch of the resulting skeleton is pruned out to reveal the centerline of the chromosome. This yields the set of anchor points $E^P$. This set of points is used for contour partitioning in the next section. The centerline was then shortened by 10% to discard any skeletal bifurcations that occur close to the end of the chromosome.

2.2 Contour partitioning & correcting for sister chromatid separation

Sister chromatid separation in chromosomes is an integral process that occurs during the metaphase stage of mitosis. Depending on the stage of mitosis at which the cells were arrested, varying degrees of sister chromatid separation may be evident. Furthermore, colcemid, a chemical agent used mainly as a preparatory chemical in biodosimetry studies, can cause or exacerbate this condition and prematurely produce sister chromatid separation in metaphase cells. It is important that the algorithm and associated software be able to analyze chromosomes with sister chromatid separation.

In order to identify and correct for sister chromatid separation, we proposed an automated contour partitioning and shape matching algorithm. Our chromosome thickness measurement algorithm requires an approximately symmetric division of the contour of the chromosome. Accurate partitioning of the telomere region is necessary to identify evidence of sister chromatid separation, to correct for any such artifact, and to split the contour into two segments accurately. Curvature of the contour is one of the most commonly used features for detecting salient points that can be used for partitioning [16]. An important requirement is that the location of these salient points be highly repeatable under varying levels of object boundary noise. The DCE method described in the previous section was adapted to provide a set of initial salient points on the contour of the chromosome outline. DCE was chosen because it performs well regardless of whether boundaries are smooth, yielding repeatable results [17]. The ability to terminate the process of DCE shape evolution at a given number of vertices further lends to its applicability. It was empirically established that termination at 6 DCE points ensures that the required telomere end points are retained within the set of candidate salient points. Two of those 6 points will be selected as the end points of the centerline calculated in section 2.1. Contour partitioning is performed by selecting the best 4 point combination (including the two anchor points) that represents all the telomere end points.

The approach for selecting the optimal contour partitioning point combination occurs in two stages. Initially, an SVM classifier is trained to detect and label preferred combinations from the given 12 possible combinations for each chromosome. At this stage, all the combinations across the data set are used as a pool of candidates to train the classifier. Then, the signed Euclidean distance from the separating hyperplane (say $\rho$) is computed for each of the candidates for a given chromosome, considering only the combinations of that chromosome. This process ranks all the candidates according to the likelihood that they are a preferred candidate. Unlike traditional rule-based ranking algorithms, this approach requires very little high level knowledge of the desirable characteristics. The positioning of the separating hyperplane encapsulates this high level information through user-specified ground truth. The highest-ranked candidate is the best combination of contour partitions for the given chromosome. The formal description of this procedure follows.

Features used for contour segmentation will be represented by $F^s$, while a second set of features used for centromere localization will be denoted by $F^c$ (discussed in section 2.4).

Let $\Phi_h$ be the curvature value at candidate point $h$ and $S \in \mathbb{R}^2$ be the skeleton of the chromosome with the 6 DCE point stop criterion. We now define the following sets of points,

– $D^P$ ($\subset C$) is the set of six DCE vertices.
– $S^P$ consists of all the points in $D^P$ except the anchor points ($E^P$). These are the four telomere end-point candidates.

Then the family of sets $T^P$ for all possible combinations with the sets $E^P$ and $S^P$ would contain,

$\{E_1^P, S_1^P, E_2^P, S_2^P\}$, $\{E_1^P, S_1^P, E_2^P, S_3^P\}$, $\{E_1^P, S_1^P, E_2^P, S_4^P\}$,
$\{E_1^P, S_2^P, E_2^P, S_1^P\}$, $\{E_1^P, S_2^P, E_2^P, S_3^P\}$, $\{E_1^P, S_2^P, E_2^P, S_4^P\}$,
$\{E_1^P, S_3^P, E_2^P, S_1^P\}$, $\{E_1^P, S_3^P, E_2^P, S_2^P\}$, $\{E_1^P, S_3^P, E_2^P, S_4^P\}$,
$\{E_1^P, S_4^P, E_2^P, S_1^P\}$, $\{E_1^P, S_4^P, E_2^P, S_2^P\}$, $\{E_1^P, S_4^P, E_2^P, S_3^P\}$.

Figure 3 illustrates one such combination, where the selected contour partitioning points (connected by the blue line segments) are given by $\{E_1^P, S_4^P, E_2^P, S_1^P\}$.

In order to identify the best possible combination for contour partitioning, we have used an SVM classifier trained with the 11 different features ($F^s$) indicated below.
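The family $T^P$ listed above contains one member per ordered pair of distinct telomere end-point candidates, which is why each chromosome has $4 \times 3 = 12$ combinations. A minimal sketch of that enumeration follows; the string labels and the function name `partition_combinations` are illustrative choices, not part of the method.

```python
from itertools import permutations


def partition_combinations(n_candidates=4):
    """Enumerate (E1, Si, E2, Sj) for every ordered pair i != j.

    E1 and E2 are the two anchor points; S1..Sn are the remaining DCE
    vertices. The first candidate is paired with E1, the second with E2.
    """
    return [("E1", f"S{i}", "E2", f"S{j}")
            for i, j in permutations(range(1, n_candidates + 1), 2)]


combos = partition_combinations()
```

With four candidates the list has 12 entries, and the combination shown in Figure 3 appears as `("E1", "S4", "E2", "S1")`.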


Fig. 3 One possible combination for contour partitioning, where the anchor point $E_1^P$ (red '+' sign) is connected with the candidate point $S_4^P$ while the other anchor point $E_2^P$ is connected with candidate point $S_1^P$, which captures the telomere regions. The (blue) line connects the set of points constituting the considered combination in this instance.

Features $F_1^s$ and $F_2^s$ provide an indication of the saliency of the candidate point with respect to the skeletonization process. Features $F_3^s$ to $F_5^s$ are three normalized features which capture the positioning of each candidate in the given combination. $F_6^s$ and $F_7^s$ represent the shape or the morphology of the chromosome of interest (same values for all 12 combinations). The rationale behind the inclusion of these features is that they account for morphological variations across the cell images in the data set. $F_8^s$ and $F_9^s$ represent the curvature of the candidate points as well as the concavity/convexity of those locations. The features $F_{10}^s$ and $F_{11}^s$ are two Euclidean distance-based features which capture the proportion of each telomere region in the combination to the perimeter of the rectangle made by connecting the 4 candidate points. During our investigation, we observed a significant improvement in classification accuracy from the inclusion of these two features.

Let $d(p, q)$ denote the Euclidean distance between the points $p$ and $q$. Similarly, let $l(p, q)$ represent the length of the curve between $p$ and $q$, which are points from the set $D^P$. Then, for each contour partitioning combination in $T^P$ given by $\{E_1^P, S_i^P, E_2^P, S_j^P\}$ (where $i$ and $j$ are integer values such that $1 \le i, j \le 4$ and $i \ne j$), two main length measurement ratios ($r_1$ and $r_2$) are used both for calculating length based features and for normalizing these features. $r_1 = \frac{l(E_1^P, S_i^P)}{l(E_1^P, S_j^P)}$ yields the chromosome width/length with respect to the anchor point $E_1^P$ for the given contour partitioning combination (refer to figure 3). Similarly, $r_2 = \frac{l(E_2^P, S_i^P)}{l(E_2^P, S_j^P)}$ is calculated with respect to the anchor point $E_2^P$. Then, the set of features $F^s$ for each contour partitioning combination is defined as follows,

1. $F_1^s = 1$ if the point $S_i^P$ belongs to a skeletal end point ($S_i^P \in (S \cap C)$). Otherwise, $F_1^s = 0$.
2. $F_2^s = 1$ if the point $S_j^P$ belongs to a skeletal end point ($S_j^P \in (S \cap C)$). Otherwise, $F_2^s = 0$.
3. $F_3^s = 1 - \left[\frac{|r_1 - r_2|}{\max(r_1, r_2)}\right]$ where $0 < F_3^s < 1$. This calculates the chromosome width/length ratio for each anchor point and the difference between the two measures. Two similar fractions would result in a high value for the feature $F_3^s$.
4. $F_4^s = 1 - \left[\frac{r_1}{\max(r_1, r_2)}\right]$ where $0 < F_4^s < 1$. This calculates the chromosome width/length ratio with respect to the first anchor point ($E_1^P$). Except for the smallest chromosomes at the highest degree of metaphase condensation, the telomere axis is shorter than the longitudinal dimension of the chromosome. Therefore, a lower length ratio measurement gives a higher value for the feature $F_4^s$, which is a desirable property.
5. $F_5^s = 1 - \left[\frac{r_2}{\max(r_1, r_2)}\right]$ where $0 < F_5^s < 1$. This is the same as $F_4^s$, but from the other anchor point, $E_2^P$.
6. $F_6^s$: ratio of the length of the chromosome to the area of the chromosome. This provides a measure of elongation of a chromosome.
7. $F_7^s$: ratio of the perimeter of the chromosome to the area of the chromosome. This provides a measure of how noisy the object boundaries are.
8. $F_8^s$: average of the curvature values $\Phi_h$ of the candidates. The curvature is an important measurement of the saliency of the candidate points.
9. $F_9^s$: number of negative curvature values ($\Phi_h < 0$) of the candidate points $S_i^P$ and $S_j^P$. The telomere region end points are generally characterized by points with high convexity. The number of negative angles indicates how concave the points of interest are.
10. $F_{10}^s = \frac{d(E_1^P, S_i^P)}{D}$ where $D = \sum_{x \in \{1,2\}} \sum_{y \in \{i,j\}} d(E_x^P, S_y^P)$. This feature calculates the normalized Euclidean distance between the anchor point $E_1^P$ and the candidate $S_i^P$ which makes up one telomere region.
11. $F_{11}^s = \frac{d(E_2^P, S_j^P)}{D}$ with $D$ as defined for $F_{10}^s$. This is the same as feature $F_{10}^s$, but calculated for the other anchor point.

A data set of 1400 chromosomes was collected from 40 metaphase cell images, which together yield 16,800 possible combinations of feature sets for contour partitioning. Three expert cytogeneticists marked the viable combinations of the salient points that capture the telomere regions for training the SVM classifier. The procedure involved training and testing with 2-fold cross validation (50% training data, 50% test data).
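The ratio and distance features, together with the ranking by signed distance from the separating hyperplane described in this section, can be sketched as follows. This is an illustration under stated assumptions only: a linear decision function $w \cdot x + b$ is assumed (the kernel is not specified here), the weights `w` and offset `b` in the usage lines are made-up numbers rather than trained values, and all function names are hypothetical.

```python
import math


def ratio_features(r1, r2):
    """F3, F4 and F5 from the two length ratios r1 and r2 (both assumed > 0)."""
    m = max(r1, r2)
    f3 = 1.0 - abs(r1 - r2) / m   # high when the two ratios are similar
    f4 = 1.0 - r1 / m
    f5 = 1.0 - r2 / m
    return f3, f4, f5


def distance_features(e1, si, e2, sj):
    """F10 and F11: telomere-side lengths normalised by D.

    D is the sum of the four anchor-to-candidate Euclidean distances.
    """
    d = math.dist
    big_d = d(e1, si) + d(e1, sj) + d(e2, si) + d(e2, sj)
    return d(e1, si) / big_d, d(e2, sj) / big_d


def signed_distance(x, w, b):
    """Signed Euclidean distance (rho) of x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def best_candidate(feature_vectors, w, b):
    """Rank candidates by rho; return the index of the largest and all rhos."""
    rhos = [signed_distance(x, w, b) for x in feature_vectors]
    best = max(range(len(rhos)), key=rhos.__getitem__)
    return best, rhos


# Usage with illustrative numbers (not taken from the data set).
f3, f4, f5 = ratio_features(0.2, 0.25)
f10, f11 = distance_features((0, 0), (1, 0), (1, 10), (0, 10))
best, rhos = best_candidate([[0, 0], [1, 1], [2, 0]], w=[3, 4], b=-2)
```

Because the selection uses the rank of $\rho$ within one chromosome rather than the sign of the decision function, a chromosome always receives a best combination even when every candidate falls on the negative side of the hyperplane.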

We obtained accuracy, sensitivity and specificity values of 94%, 97% and 68%, respectively. The results demonstrate the ability of the feature set to effectively detect good combinations of candidate points for partitioning telomere regions. The slightly lower specificity suggests that some false positives were detected. However, this does not affect the accuracy of the contour partitioning, since the algorithm picks the optimal combination based on its rank rather than the classification label.

Correcting the deviation of the centerline for the effects of premature sister chromatid separation can be a difficult problem to solve. Once the best combination for the end points of the telomere region is selected, the telomere portions are segmented. Premature sister chromatid separation is detected from differences in the chromosome shape in the telomere region. This problem is solved with an algorithm that creates a set of features using functional approximation of the shape characteristics unique to premature sister chromatid separation, derived from the coefficients calculated for each telomere [8]. A second SVM classifier is trained on these features to effectively detect these inherent shape variations of the sister chromatids. Once identified, correction is performed by extending the sample point (on the pruned centerline) to pass through the mid point of the partitioned telomere region. By getting the contour partitioned accurately, the correction process is significantly simplified.

2.3 Intensity integrated Laplacian method

The width profile, which is defined as the sequential width measurements along the centerline or the axis of symmetry of the chromosome, is an important measurement used for deriving the centromere location.

In previous studies, Laplacian-based thickness measurement algorithms have been utilized for cortical thickness measurement applications [18], [19]. This method takes the second order derivative of the object contour and solves the Laplacian heat equation to obtain the steady state condition. This is performed by splitting the object into two similar segments, maintaining them at different temperature levels and then allowing heat flow between the segments. The width profile of the chromosome is obtained by creating trace lines extending from one contour segment to the other, following the static vector field created at steady state conditions on the potential field. The width profile calculated using this approach was observed to be more uniform and less noisy relative to analogous approaches based on the centerline of the chromosome [20]. However, as a consequence of the sole dependency on the object contour, the width profile could still be adversely affected by irregularities in the object boundary.

We have previously presented an algorithm in which intensity was introduced as an additional feature into the standard Laplacian-based approach to improve its accuracy by making it less dependent on the object contour [8]. The intensity information is included in the calculation in the form of a weighting scheme for the Laplacian kernel. This biases the flow of heat towards similar intensities. The intensity feature aids in minimizing the impact of an irregular chromosome segmentation boundary by guiding the width profile trace lines to be contained within chromosome bands, which are regions with similar intensities.

2.4 Candidate point generation & metaphase centromere detection

In a previously described candidate-based approach, four candidate points were selected based on the minima values from the width profile [21]. However, this limits the number of possible locations that could be detected as the centromere location. Especially in cases where a high degree of sister chromatid separation is evident, limiting the search to just a few candidates can have adverse effects. Therefore, we consider all possible local minima locations as candidates for the centromere location in a given chromosome, which are selected using the simple criteria given below.

Fig. 4 An example where the contour $C$ is split into two approximately symmetric segments $C^1$ and $C^2$. The width trace line, in red, connects the points $C^1_\lambda$ and $C^2_\lambda$ of the two contour segments.

Our notation $p$ is used to refer to any other point(s), in general. Let the contour $C$ be partitioned into two contour segments $C^1$ (starting segment for tracing lines) and $C^2$ (see figure 4). The width measurement of the normalized width profile at the discrete index $\lambda$, $W(\lambda)$, is obtained using the trace line which connects the contour points $C^1_\lambda$ and $C^2_\lambda$ from the two contours $C^1$ and $C^2$. Then the set of candidate points for the centromere location, $p^C$ (which stores the indices $\lambda$), consists of all valid locations $\lambda$ of the width profile at which the local minimum conditions $W(\lambda - 1) > W(\lambda) < W(\lambda + 1)$ and $W(\lambda - 2) > W(\lambda) < W(\lambda + 2)$ are fulfilled. In cases where the above condition failed to secure any candidates (mainly on extremely short chromosomes), the global minimum was selected as the only candidate.
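The candidate selection rule above can be sketched in a few lines. This is a minimal illustration, assuming that a "valid location" is an index with two neighbours on each side and that the width profile is a non-empty list of numbers; the function name `centromere_candidates` is hypothetical.

```python
def centromere_candidates(width):
    """Indices where W is smaller than both neighbours at offsets 1 and 2.

    Falls back to the index of the global minimum when no index passes
    the test (for example on very short profiles).
    """
    n = len(width)
    candidates = [
        k for k in range(2, n - 2)
        if width[k] < width[k - 1] and width[k] < width[k + 1]
        and width[k] < width[k - 2] and width[k] < width[k + 2]
    ]
    if not candidates:
        candidates = [min(range(n), key=width.__getitem__)]
    return candidates


# Illustrative normalized width profile with two constrictions.
profile = [0.5, 0.8, 1.0, 0.7, 0.4, 0.6, 0.9, 0.95, 0.6, 0.85, 0.9, 0.5]
found = centromere_candidates(profile)

# A monotonic profile has no interior local minimum, so the global
# minimum is returned as the only candidate.
fallback = centromere_candidates([0.2, 0.4, 0.6, 0.8, 1.0])
```

Keeping every local minimum, rather than a fixed number of the deepest ones, leaves the decision about which constriction is the centromere to the classifier-based ranking.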


minima conditions of $W(\lambda-1) > W(\lambda) < W(\lambda+1)$ and $W(\lambda-2) > W(\lambda) < W(\lambda+2)$ are fulfilled for all valid locations $\lambda$ of the width profile. In cases where the above condition failed to secure any candidates (mainly on extremely short chromosomes), the global minimum was selected as the only candidate. Next, the following two sets of indices are created to correspond with each given element $p^C(\alpha)$ of $p^C$,

– $p^{mL}(\alpha) = \beta$, where $\beta < p^C(\alpha)$ and $W(\beta) \ge W(\gamma)\ \forall\ \gamma < p^C(\alpha)$. Here $p^{mL}(\alpha)$ stores the index of the global maximum for the portion of the width profile prior to the candidate minimum index $p^C(\alpha)$ (referred to as a regional maximum, henceforth).
– $p^{mR}(\alpha) = \beta$, where $\beta > p^C(\alpha)$ and $W(\beta) \ge W(\gamma)\ \forall\ \gamma > p^C(\alpha)$. Similarly, $p^{mR}(\alpha)$ stores the index of the global maximum for the portion of the width profile after the candidate minimum index $p^C(\alpha)$.

Once the centromere candidate points $p^C$ and their corresponding maxima points $p^{mL}$ and $p^{mR}$ are calculated, the set of features $F^c$ is calculated as given below. A set of 11 features $F_k^c$ is proposed to train the third SVM classifier, which is then used to select the best candidate for a centromere location in a given chromosome. Features $F_1^c$ to $F_3^c$ provide an insight into the significance of the candidate point with respect to the general width profile distribution. The normalized width profile value itself is embedded in features $F_4^c$ and $F_8^c$, where the latter scales the minimum by the average value of the width profile. Features $F_5^c$ and $F_6^c$ capture the contour curvature values that are intrinsic to the constriction at the centromere location. Features $F_7^c$, $F_9^c$ and $F_{10}^c$ include distance measures which indicate the positioning of the candidate point with respect to the chromosome as well as to the width profile shape. Finally, the feature $F_{11}^c$ records the staining method used in the cell preparation. This gives the classifier a crucial piece of information that is then used to accommodate specific shape features that may be the result of the particular laboratory procedure used to prepare and stain the sample.

Let $i$ be a candidate member number assigned among the pool of centromere candidates. Also, let $d(1, i)$ be the Euclidean distance along the midpoints of the width profile trace lines (centerline) from a telomere to the candidate point, and $L$ be the total length of the chromosome. Then, the set of features $F^c$ is stated as below, where $|\cdot|$ denotes the absolute value,

1. $F_1^c = \left| W(p^C(i)) - W(p^{mL}(i)) \right|$. This feature calculates the absolute width profile difference between the candidate and the regional maximum prior to the candidate point on the width profile.
2. $F_2^c = \left| W(p^C(i)) - W(p^{mR}(i)) \right|$. This feature calculates the absolute width profile difference between the candidate and the regional maximum beyond the candidate point on the width profile.
3. $F_3^c = F_1^c + F_2^c$, which calculates the combined width profile difference created by the candidate point.
4. $F_4^c = W(p^C(i))$. This captures the value of the width profile ($0 \le F_4^c \le 1$) at the candidate point location.
5. $F_5^c$ is the local curvature value at the contour point $C^1_\lambda$ which corresponds to the current centromere candidate location (where $\lambda = p^C(i)$).
6. $F_6^c$ is the local curvature value at the contour point $C^2_\lambda$ which corresponds to the current centromere candidate location (where $\lambda = p^C(i)$).
7. $F_7^c = \min\left(d(1, i),\ L - d(1, i)\right)/L$. This gives a measure of where the candidate is located with respect to the chromosome as a fractional measure ($0 \le F_7^c \le 0.5$).
8. $F_8^c = W(p^C(i))/\bar{W}$, where $\bar{W}$ is the average of the width profile of the chromosome. This captures the significance of the candidate point minimum with respect to the average width of the given chromosome.
9. $F_9^c = d(p^{mL}(i), p^C(i))/L$. This gives the distance between the candidate point location and the regional maximum prior to the candidate point, normalized by the total length of the chromosome.
10. $F_{10}^c = d(p^C(i), p^{mR}(i))/L$. This gives the distance between the candidate point location and the regional maximum beyond the candidate point, normalized by the total length of the chromosome.
11. $F_{11}^c$ is a boolean feature used to indicate the staining process used during cell preparation. A logical '0' indicates the use of DAPI chromosome staining, while '1' indicates a Giemsa-stained cell.

The detection of the centromere location assumes that each chromosome contains at least one centromere location. This is a reasonable assumption, since the centromere region is an integral part of chromosome anatomy which is normally retained in cell division, with the exception of acentric fragments produced by excessive radiation exposure, or rarely in congenital and neoplastic conditions. This assumption transforms the detection problem into a ranking problem in which we pick the best out of a pool of candidates. It therefore enables the same approach to be adopted that was utilized for the contour partitioning algorithm (section 2.2), i.e. one in which the distance from the separating hyperplane ($\rho$) represents a measure of goodness-of-fit for a given candidate. This metric reduces the multidimensional feature space to a single dimension, which inherently reduces the complexity of the ranking procedure for the candidate locations. Since the large margin binary classifier (SVM) orients the separating hyperplane in the feature space, the 1D distance metric directly relates to how well a given candidate fits the general characteristics of a given class label. A detailed introduction to the candidate-based centromere confidence metric is provided in the following section.

2.4.1 Candidate-based centromere confidence (CBCC)

Although existing measures of accuracy can establish the performance of machine learning applications, these measures do not provide information on the reliability of the method. We therefore developed a Candidate Based Centromere Confidence (CBCC) metric, which assesses the detection of a centromere location relative to the alternatives; such a confidence measure will be essential for assessment and ultimately adoption of this approach for diagnosis. This value is obtained using the feature space derived via the classifier and the distance metric $\rho$. For a given set of candidate points (i.e. centromere candidates) $p^C$ of a chromosome, the goodness of fit (GF) of the optimal candidate point, at distance $\grave{\rho}$, is obtained by calculating $(\grave{\rho} - \bar{\rho})/2$, where $\bar{\rho}$ is the average distance of all the remaining candidate points. In the ideal situation, the optimal candidate and the other candidates reside, like support vectors of the classifier, on opposite sides of the separating hyperplane (see figure 5). Therefore the optimal candidate distance ($\grave{\rho}$) is $\approx 1$, while the average of the remaining candidate distances ($\bar{\rho}$) is $\approx -1$, which gives GF $\approx 1$. The GF value is truncated at unity, since exceeding this value does not add information to the metric.

3 Results

The complete data set used for developing and testing the algorithm discussed in this paper consists of 40 metaphase cell images, of which 38 were irradiated samples obtained from cytogenetic biodosimetry laboratories and 2 were non-irradiated samples from clinical cytogenetic laboratories. The chromosome data set comprised images of 18 Giemsa-stained cells and 22 DAPI-stained cells. Cells with minimal touching and overlapping chromosomes (a good metaphase spread) were manually selected from a pool of 1068 cell images for this experiment. Then 40 cell images were selected to represent both DAPI (55%) and Giemsa (45%) staining methods. During ground truth evaluation, the expert was presented with the set of centromere candidates generated by the algorithm and was asked to select the candidate that most closely represents the correct chromosomal location, while explicitly marking the other candidates as non-centromeres. In cases where all the candidates suggested by the algorithm were incorrect, all the positions were designated as negative candidates. Inter-observer variability between experts (ground truth) was minimal, as the two laboratory directors differed in their assessment of a single centromere out of > 500 chromosomes analyzed by both. The 1400 chromosome data set yielded 7058 centromere candidates. A randomly selected portion comprising 50% of this data set, along with the corresponding ground truth centromere assignments, was used for training a support vector machine for centromere localization. Next, the accuracy of centromere localization was calculated and is provided in Table 1. This provides a breakdown of the detection accuracy of the algorithm based on the presence or absence of sister chromatid separation in the cell image for each staining method.

Fig. 5 Shows the expected scenario for candidate-based centromere detection, in which 6 candidates are assessed by the SVM. The blue square represents the optimal candidate while the other five candidates are given by the red squares in the feature space.

Table 1 The detection accuracy values for chromosomes used for the larger data set based on the staining method and the sister chromatid separation (SC Sep.)

Chromosome morphology     Number of chromosomes   Number of accurate detections   Detection accuracy
DAPI without SC Sep.      114                     104                             91.2%
DAPI with SC Sep.         587                     517                             88.1%
Giemsa with SC Sep.       699                     599                             85.6%

Table 2 depicts CBCC values for accurately detected chromosomes as opposed to inaccurately detected chromosomes. It also includes a third category termed "All nonviable candidate chromosomes" (a subset of the inaccurate centromere detection category), where none of the candidates for a given chromosome were marked as capturing the true centromere of the chromosome.

Table 2 Shows that the CBCC metric demonstrates higher values in cases with accurate centromere detection.

Category                 Chromosomes   Mean (µ)   Std. Dev (σ)
Accurate detection       1220          0.7861     0.3000
Inaccurate detection     180           0.3799     0.3293
Nonviable candidates     124           0.2696     0.2457

Fig. 7 Demonstrates results where the algorithm failed to yield an accurate centromere location. The detected centromere location (selected candidate) is depicted by a yellow dot while the segmented outline is drawn in blue. These results reported CBCC measures of (a) 0.368, (b) 0.066, (c) 0.655 respectively.
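The CBCC values reported in Table 2 and in the figure captions follow the goodness-of-fit definition of section 2.4.1, $(\grave{\rho} - \bar{\rho})/2$ truncated at unity. The sketch below illustrates that formula under stated assumptions and is not the authors' code. The function name `cbcc` and the example distances are hypothetical. The source does not say how a chromosome with a single candidate is handled, so raising an error in that case is purely an assumption of this sketch.

```python
def cbcc(rhos):
    """Candidate Based Centromere Confidence for one chromosome.

    `rhos` holds the signed hyperplane distances of all candidates of the
    chromosome. The best candidate is the one with the largest distance;
    the value is (best - mean of the rest) / 2, truncated at 1.
    """
    if len(rhos) < 2:
        # Assumption: the metric is left undefined without competitors.
        raise ValueError("CBCC needs at least two candidates")
    ordered = sorted(rhos, reverse=True)
    best, rest = ordered[0], ordered[1:]
    mean_rest = sum(rest) / len(rest)
    return min((best - mean_rest) / 2.0, 1.0)


# Ideal case from section 2.4.1: best at about +1, the rest at about -1.
ideal = cbcc([1.0, -1.0, -1.0])          # 1.0
# A better-than-ideal separation is truncated at unity.
truncated = cbcc([1.5, -1.0, -1.0])      # 1.0
# Poorly separated candidates give a low confidence.
weak = cbcc([0.5, -0.25, -0.75])         # 0.5
```

Because the best distance is never smaller than the mean of the others, the value stays between 0 and 1, which matches the range of the CBCC values quoted for figures 6 and 7.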

Figure 6 shows a representative sample of cases where the centromere was accurately localized. These cases include chromosomes with and without sister chromatid separation. The method does not detect centromere locations in all cases, some of which are impacted by the algorithm's inability to fully correct for the adverse effects of sister chromatid separation (depicted in Figure 7).

Fig. 6 Demonstrates some sample results of the algorithm where the accurately detected centromere location (selected candidate) is depicted by a yellow dot while the segmented outline is drawn in blue. Figure 6(a) is a result of DAPI stained chromosomes while Figure 6(b)-(f) are results of Giemsa stained chromosomes. These results reported CBCC measures of (a) 1.000, (b) 1.000, (c) 1.000, (d) 0.995, (e) 1.000, (f) 0.661 respectively.

4 Discussion

The candidate based approach for centromere detection used an SVM classifier trained on half of the input chromosomes. The accuracy of the method was then tested using the remaining 50% of the data set (2 fold cross validation); accuracy, sensitivity and specificity were 92%, 96% and 72% respectively. Two fold cross validation was used instead of other methods such as the leave-one-out method, since it yields a reasonable estimate of the accuracy at a low computational cost. The higher sensitivity of this algorithm relative to our previous efforts [5] can be attributed to improvements in the performance of the classifier on both typical and sister chromatid separated chromosomes. The lower specificity is predominantly related to lower confidence detection by the integrated intensity Laplacian algorithm of centromeres in acrocentric chromosomes, in which the centromeric constriction is not readily apparent because of its close proximity to one of the telomeres.

The objective of this study was to accurately detect the preferred centromere location (point) for each chromosome, even though the SVM produces a set of candidate points that can each be classified separately. All candidates in each chromosome were analyzed separately and the best candidate from this set was selected based on the distance metric value ($\rho$); the results are given in table 1. Upon testing, the algorithm accurately located a correct centromere location in 1220 of 1400 chromosomes (87%). It is notable that 124 of the 180 chromosomes that were missed were instances of nonviable candidate chromosomes. Some of these were caused by segmentation of acrocentric chromosomes, where the lighter intensity short-arm satellite regions were segmented out, while others were primarily the result of an extreme degree of sister chromatid separation, such that the pairs of telomeres from sister chromatids could not be unequivocally paired. The values in table 1 further suggest a slight reduction in accuracy for Giemsa-stained images, which contained significantly higher levels of sister chromatid separation and noisy chromosome boundaries.

The proposed method performed centromere localization accurately for chromosomes with high morphological variations (see figure 6). From a machine learning point of view, figure 6(a), (b) and (c) are fairly straightforward centromere localizations. The CBCC values for all three cases were 1.000, truncated from an even higher value. This further validates the CBCC metric, indicating that the selected candidate is preferable to the other candidates in the same chromosome. It is important to notice that the boundary conditions at the telomeric region of figure 6(c) are similar in appearance to those in figure 3 and figure 4. However, with further separation and intensity fading between the two sister chromatid arms, the segmentation algorithm could converge to a concave morphology in the telomere region that links the sister chromatids. Figure 6(e) represents such an instance, where sister chromatid separation has had a significant effect on the chromosome segmentation. However, as a result of correcting for this effect, the algorithm has localized the centromere accurately with a CBCC value of 1.000. The chromosome segmentation in figure 6(d) demonstrates evidence of extensive sister chromatid separation and therefore the CBCC value is 0.995, which is still a high value for the data set. Figure 6(f) represents a chromosome which is highly bent and also presents with very significant sister chromatid separation. Nevertheless, the algorithm was capable of localizing an accurate centromere location, though the CBCC value was low (0.661), which indicates a less than ideal separation among the centromere candidates.

Some of the shortcomings of the proposed method are represented in figure 7. Most of these (86%) were observed to be cases where none of the candidates were deemed to contain the actual centromere. This was mainly due to segmentation problems and very high levels of sister chromatid separation. Figure 7(b) depicts an example where the segmentation algorithm failed to capture the constriction in an acrocentric chromosome. The CBCC value in this example was as low as 0.066, which indicates that the algorithm selected a weak candidate for the centromere. Figure 7(a) demonstrates a case where extreme sister chromatid separation has caused the segmentation algorithm to treat each individual chromatid arm separately. This chromosome had a low CBCC value of 0.368, which is consistent with the (morphologically) acentric nature of the separated arm. Figure 7(c) shows another impact of extreme sister chromatid separation, namely the incorrect connection of the long arms of a pair of sister chromatids, leading to an apparent bent chromosome instead of a detected sister chromatid separation. The CBCC measure fails to distinguish this chromosome from a normal bent chromosome, and yielded a relatively high value of 0.655.

Although it was not the focus of this study, we carried out a preliminary analysis of the capability of this algorithm to detect both centromeres in a set of dicentric chromosomes, which were present among an excess of normal single centromere chromosomes due to irradiation of some of the cytogenetic samples analyzed. The constriction at the second centromere is morphologically similar to the first centromere in these images, and therefore it should be feasible for it to be among the candidates found by the algorithm. We hypothesized that, along with the optimal candidate, the second centromere would also exhibit a short distance to the hyperplane and be well separated from the other candidates. These distances were compared for all centromere candidates, and probable dicentric chromosomes were identified by determining if the correct, ground truth centromeres were among the top four ranked candidates. The breakdown of the candidates which captured the second centromere location is given in table 3, where 20 cases (out of 31) reported the second centromere location as the second highest ranked candidate location. Among the 31 dicentric chromosomes present in the data set, the first candidate (the selected centromere) was accurate in all instances. There were only two instances where the second centromere was not among the top four candidates. In both of these cases, the chromosomes exhibited a high degree of sister chromatid separation. Nevertheless, the proposed method provides a good framework for detecting dicentric chromosomes in radiation biodosimetry applications.

Table 3 Shows that the proposed method ranked the second centromere in dicentric chromosomes higher in most of the cases.

Rank of the second centromere   Number of cases
02                              20
03                              6
04                              3
05                              1
06                              1

5 Conclusions

We have described a novel candidate-based centromere detection algorithm for analysis of metaphase cells prepared by different culturing and staining methods. The method performed with an 87% accuracy level when tested with a data set of 1400 chromosomes from a composite set of metaphase images. The algorithm was capable of correcting for the artifact created by premature sister chromatid separation. The majority of chromosomes with centromere constrictions were detected with very high sensitivity. We have also tested a promising extension of the centromere detection algorithm to accurately identify dicentric chromosomes for cytogenetic biodosimetry. Loss of specificity in both mono and dicentric chromosomes was primarily the result of segmentation errors in acrocentric chromosomes, as well as in chromosomes with extreme degrees of sister chromatid separation. A better segmentation algorithm that addresses these challenging morphologies would further improve the detection accuracy of the proposed method. Furthermore, the telomere partitioning algorithm needs to be improved in order to handle chromosomes with extreme sister chromatid separation, which are commonly encountered in radiation biodosimetry applications. In addition, an algorithm to accurately separate touching and overlapping chromosomes will be required to fully automate this process. It is also necessary to analyze a larger data set to gauge the performance of the proposed method.

The framework used for adding intensity into the Laplacian thickness measurement algorithm can be easily extended to include other features besides the calculation of chromosome width. Further investigation aimed at both improving centromere detection accuracy and applying this algorithm to other detection problems is warranted. The Candidate Based Centromere Confidence (CBCC) was introduced as a measure of confidence in each centromere detection. However, this metric can be applied to any problem which requires the selection of a candidate from a pool of candidates. We suggest that the CBCC metric may be extensible to indicate the relative quality of a given cell image or of a set of slides containing metaphase cells from the same patient. If successful, the CBCC metric may eventually limit the amount of time required to evaluate samples both prior to and during centromere detection.

Acknowledgements Supported by the Western Innovation Fund (University of Western Ontario), Natural Sciences and Engineering Research Council of Canada and the DART-DOSE CMCR (5U01AI091173-02 from the US Public Health Service) and the Canada Research Chairs Secretariat.

References

1. F. Coso, L. Vega, A. Becerra, R. Meléndez, G. Corkidi, Medical & Biological Engineering & Computing 39(3), 391 (2001)
2. M. Munot, J. Mukherjee, M. Joshi, Medical & Biological Engineering & Computing 51(12), 1325 (2013)
3. M. Moradi et al., in 16th IEEE Symposium on Computer-Based Medical Systems (2003)
4. A. Vaurijoux et al., Radiation Research 178, 357–364 (2012)
5. A. Subasinghe A. et al., in International Conference on Image Processing (ICIP) (2010)
6. R.M. Mohammad, Journal of Medical Signals and Sensors 02(02) (2012)
7. S. Jahani, S.K. Satarehdan, International Journal of Biological Engineering 02, 56–61 (1996)
8. A. Subasinghe A. et al., IEEE Transactions on Biomedical Engineering 60, 2005–2013 (2013)
9. T. Kobayashi et al., in IEEE Conference on Multimedia Imaging (2004)
10. P. Rogan et al., Radiation Protection Dosimetry (2014)
11. M. Kass et al., International Journal of Computer Vision 1(4), 321 (1988)
12. C. Xu, J.L. Prince, IEEE Transactions on Image Processing 7(3) (1998)
13. L.J. Latecki, R. Lakämper, in DAGM-Symposium (1998), pp. 85–92
14. A. Subasinghe A. et al., in Seventh Canadian Conference on Computer and Robot Vision (CRV) (2010)
15. X. Bai et al., IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 29(03) (2007)
16. C. Xu, B. Kuipers, in Canadian Conference on Computer and Robot Vision (CRV) (2011)
17. L.J. Latecki, R. Lakämper, in Proceedings of the Second International Conference on Scale-Space Theories in Computer Vision (Springer-Verlag London, UK, 1999), pp. 398–409
18. M. Michael I. et al., NeuroImage 12(6), 676 (2000). DOI 10.1006/nimg.2000.0666
19. H. Haidar, J. Soul, NeuroImage 16, 146 (2006). DOI 10.1111/j.1552-6569.2006.00036.x
20. A. Subasinghe A. et al., in 25th IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) (2012)
21. R. Stanley et al., in Proceedings: a conference of the American Medical Informatics Association (1996), pp. 284–288

Dr. Akila Subasinghe (am-[email protected]) is a senior lecturer of Electrical and Electronics Engineering at the University of Sri Jayewardenepura, Nugegoda, Sri Lanka. He obtained a PhD in Electrical and Computer Engineering from Western University, London, Ontario. He secured a BSc. Honors in Electrical Engineering from the University of Moratuwa, Sri Lanka, and then an MESc. in Electrical and Computer Engineering from Western University. Dr. Akila's research areas include biomedical image analysis, biomedical instrumentation and machine learning.


Dr. Jagath Samarabandu ([email protected]) is a professor of Electrical and Computer Engineering at Western University, London, Ontario. He is a Fulbright Scholar and obtained his MS and Ph.D. at SUNY Buffalo. His research areas include computer vision, pattern recognition, image understanding, machine learning and cyber-security. He has been with the university since 2000. Dr. Samarabandu's research activities include image analysis, computer vision and pattern recognition. He has more than 20 years of academic and industrial experience in this domain.

Mr. Yanxin Li is a software developer in the Biochemistry Department at Western University, London, Ontario. He obtained his B.Eng in computer science and engineering at Zhejiang University and his MSc in computer science at Western University. His research interests include parallel computing, machine learning and image processing. He is currently working on a research project to automate radiation dosimetry employing high performance computing.

Dr. Ruth Wilkins earned a Doctor of Philosophy in Medical Physics from Carleton University, Ottawa, Canada, after which she joined the Consumer and Clinical Radiation Protection Bureau, Health Canada, as a Research Scientist. Since 2006, she has been the Division Chief of the Radiobiology Division at Health Canada. Her research involves the biological effects of ionizing radiation in mammalian systems both in vitro and in vivo, with a strong focus on cytogenetic biological dosimetry. She is currently the lead of the development of the Canadian National Biological Dosimetry Response Plan for large scale exposures to ionizing radiation.

Ms. Farrah Flegal is a Research Scientist and lead of the Biodosimetry Program at Canadian Nuclear Laboratories (CNL). She is certified in clinical and molecular genetics by the Canadian Society for Medical Laboratory Science and the College of Medical Laboratory Technologists of Ontario, and is a graduate of the University of Guelph with a BSc in Biological Sciences and of The Michener Institute of Applied Health Sciences with a DipHSc (Genetics). Her work at CNL involves being a partner in the Canadian Cytogenetic Emergency Network for emergency biological dosimetry as well as research into the development of high throughput dose estimation methods for large scale exposures to ionizing radiation.

Dr. Joan Knoll is a professor at WU and Leader of the Molecular Diagnostics Division, London Health Sciences Centre, Ontario (Ph.D., University of Manitoba; postdoc: Harvard University) and Chief Scientific Officer of Cytognomix. Dr. Knoll is board certified in cytogenetics and molecular genetics in the US and Canada. Dr. Knoll has more than 30 years of experience in cytogenetics, including more than 20 years directing CLIA-certified and CAP or OLA accredited clinical cytogenetics laboratories. Knoll and Rogan founded Cytognomix based on their patents of cytogenetic/genomic technologies.

Dr. Peter Rogan is Professor of Biochemistry, Computer Science, Oncology, and Biomedical Engineering at WU. He holds the Canada Research Chair in Bioinformatics (B.S., Johns Hopkins; Ph.D., Yale University; postdoc: US National Cancer Institute), and is a prolific inventor and serial entrepreneur. He also cofounded Cytognomix Inc., which is developing software for automated detection of dicentric chromosomes in cytogenetic data. The company holds US Pat. No. 8,605,981 and other pending patents, which cover some methods described in this study. Rogan has more than 20 years of experience developing software applications that are widely used by molecular diagnostics laboratories.