Building more accurate decision trees with the additive tree

José Marcio Luna (a,1,2), Efstathios D. Gennatas (b,1), Lyle H. Ungar (c), Eric Eaton (c), Eric S. Diffenderfer (a), Shane T. Jensen (d), Charles B. Simone II (e), Jerome H. Friedman (f), Timothy D. Solberg (b), and Gilmer Valdes (b,2)

(a) Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA 19104; (b) Department of Radiation Oncology, University of California, San Francisco, CA 94115; (c) Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104; (d) Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104; (e) Department of Radiation Oncology, New York Proton Center, New York, NY 10035; (f) Department of Statistics, Stanford University, Stanford, CA 94305

Contributed by Jerome H. Friedman, August 8, 2019 (sent for review October 10, 2018; reviewed by Adele Cutler and Giles Hooker)

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

additive tree | decision tree | interpretable machine learning | CART | gradient boosting

Significance

As machine learning applications expand to high-stakes areas such as criminal justice, finance, and medicine, concerns emerge about the high-impact effects of individual mispredictions on people's lives. As a result, there has been increasing legitimate interest in understanding general machine learning models to overcome possible serious risks. Current decision trees, such as Classification and Regression Trees (CART), have played a predominant role in fields such as medicine, due to their simplicity and intuitive interpretation. However, such trees suffer from intrinsic limitations in predictive power. We developed the additive tree, a theoretical approach to generate a more accurate and interpretable decision tree, which reveals connections between CART and gradient boosting. The additive tree exhibits superior predictive performance to CART, as validated on 83 classification tasks.

Reviewers: A.C., Utah State University; and G.H., Cornell University.
Conflict of interest statement: J.M.L., E.E., L.H.U., C.B.S., T.D.S., and G.V. have a patent pending titled "Systems and methods for generating improved decision trees."
This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).
1 J.M.L. and E.D.G. contributed equally to this work.
2 To whom correspondence may be addressed. Email: jose.luna@pennmedicine.upenn.edu.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816748116/-/DCSupplemental.

The increasing application of machine learning to high-stakes domains such as criminal justice (1, 2) and medicine (3–5) has led to a surge of interest in understanding the models generated in these domains. Mispredictions in these scenarios can incur serious risks (8), such as in medical outcomes (9, 10), the generated technical debt (11), and, recently, the right to explanation and nondiscrimination (6, 7), thus motivating the need for users to be able to examine why the model made a particular prediction. Despite the recent efforts made to formalize such a concept in machine learning, there is considerable disagreement on what interpretability means and how to measure it (12, 13). Lately, 2 broad categories of approaches have been proposed for interpretability (14), namely, post hoc explanations for potentially complex models (e.g., visual and textual explanations) (15, 16) and intuitively simple models (e.g., additive models, decision trees). This paper focuses specifically on decision trees, which are intuitively simple models used widely in fields such as healthcare, yet intuitively simple models in such fields have lower predictive power than more sophisticated models.

Classification and Regression Tree (CART) analysis (17) is a well-established statistical learning technique that has been adopted by numerous fields due to its model interpretability, scalability for large datasets, and connection to rule-based decision-making (18). Specifically, the aforementioned traits are considered a requirement in fields like medicine (19–21) for clinical decision support systems. CART builds a model by recursively partitioning the instance space, and either labeling each partition with a predicted category (in the case of classification) or with a predicted real value (in the case of regression). Despite widespread use, CART models are often less accurate than other models, such as kernel methods and ensemble methods (22). Among the latter, boosting techniques were developed as a means to train an ensemble learning model that iteratively combines multiple weak learners (often CART models) into a high-performance predictive model, albeit with an increase in the number of prediction nodes, therefore harming model interpretability. In particular, gradient boosting methods (23) iteratively optimize the ensemble's predictions to increasingly match the labeled training data. In addition, some ad hoc ensemble approaches (24, 25) have been successful at improving the accuracy of decision trees, but at the expense of altering their topology, therefore affecting their capacity of explanation.

Decision tree learning and gradient boosting have been connected primarily through CART models used as the weak learners in boosting. However, a rigorous analysis in ref. 26 proves that decision tree algorithms, specifically CART and C4.5 (27), are, in fact, boosting algorithms.

Based on this approach, AdaTree, a tree-growing method based on AdaBoost (28), was proposed in ref. 29. A sequence of weak classifiers on each branch of the tree was trained recursively using AdaBoost; therefore, rendering a decision tree where each branch conforms to a strong classifier. The weak classifiers at each internal node were implemented either as linear classifiers composed with a sigmoid or as Haar-type filters with threshold functions at their output. Another approach is the Probabilistic Boosting-Tree (PBT) algorithm introduced in ref. 30, which also uses AdaBoost to build decision trees. PBT trains ensembles of strong classifiers on each branch for image classification. The strong classifiers are based on up to third-order moment histogram features extracted from Gabor and intensity filter responses. PBT also carries out a divide-and-conquer strategy in order to perform data augmentation to estimate the posterior probability of each class. Recently, MediBoost, a tree-growing method based on the more general gradient boosting approach, was proposed in ref. 31. In contrast to AdaTree and PBT, MediBoost emphasizes interpretability, since it was conceived to support medical applications while exploiting the accuracy of boosting. It takes advantage of the shrinkage factor inherent to gradient boosting, and introduces an acceleration parameter that controls the observation weights during training to leverage predictive performance. Although the presented experimental results showed that MediBoost exhibits better predictive performance than CART, no theoretical analysis was provided.

In this paper, we propose a rigorous mathematical approach for building single decision trees that are more accurate than traditional CART trees. In this context, we use the total number of decision nodes (NDN) as a quantification of interpretability, due to their direct association with the number of decision rules of the model and the depth of the tree. This paper addresses a number of fundamental shortcomings: 1) We introduce the additive tree (AddTree) as a mechanism for creating decision trees. Each path from the root node to a leaf represents the outcome of a gradient boosted ensemble for a partition of the instance space. 2) We theoretically prove that AddTree generates a continuum of single-tree models between CART and gradient boosted stumps (GBS), controlled via a single tunable parameter of the algorithm. In effect, AddTree bridges between CART and gradient boosting, identifying previously unknown connections between additive and full interaction models. 3) We provide empirical results showing that this hybrid combination of CART and GBS outperforms CART in terms of accuracy and interpretability. Our experiments also provide further insight into the continuum of models revealed by AddTree.

Background on CART and Boosting

Assume we are given a training dataset (X, y) = {(x_i, y_i)}_{i=1}^N, where each d-dimensional data instance x_i ∈ X ⊆ 𝒳 has a corresponding label y_i ∈ 𝒴, drawn independently from an identical and unknown distribution D. In a binary classification setting, 𝒴 = {±1}; in regression, 𝒴 = ℝ. The goal is to learn a function F : 𝒳 ↦ 𝒴 that will perform well in predicting the label on new examples drawn from D. CART analysis recursively partitions 𝒳, with F assigning a single label in 𝒴 to each terminal node; in this manner, there is full interaction. Different branches of the tree are trained with disjoint subsets of the data, as shown in Fig. 1.

In contrast, boosting iteratively trains an ensemble of T weak learners {h_t : 𝒳 ↦ 𝒴}_{t=1}^T, such that the resulting model F(x) is a weighted sum of the weak learners' predictions: F(x) = \sum_{t=1}^{T} \rho_t h_t(x) with ρ ∈ ℝ^T.* Each boosted weak learner is trained with a different weighting of the entire dataset, unlike CART, repeatedly emphasizing mispredicted instances at every round (Fig. 1). GBS or simple regression creates a pure additive model in which each new ensemble member reduces the residual of previous members (32). Interaction terms can be included in the ensemble by using more complex learners, such as deeper trees.

*In classification, F gives the sign of the prediction. CART models are often used as the h_t's.

Classifier ensembles with decision stumps as the weak learners, h_t(x), can be trivially rewritten (31) as a complete binary tree of depth T, where the decision made at each internal node at depth t − 1 is given by h_t(x), and the prediction at each leaf is given by F(x). Intuitively, each path through the tree represents the same ensemble, but one that tracks the unique combination of predictions made by each member.

Fig. 1. A depiction of the continuum relating CART, GBS, and our AddTree. Each algorithm has been given the same 4 training instances (blue and red symbols); the symbol's size depicts its weight when used to train the adjacent node.

The Additive Tree

This interpretation of boosting lends itself to the creation of a tree-structured ensemble learner that bridges CART and GBS: the AddTree. At each node, a weak learner is found using gradient boosting on the entire dataset. Rather than using only the data in the current node to decide the next split, the algorithm allows the remaining data to also influence this split, albeit with a potentially differing weight. Critically, this process results in an interpretable model that is simply a decision tree of the weak learner's predictions, but with higher predictive power due to its growth via boosting and the resulting effect of regularizing the tree splits. This process is carried out recursively until the depth limit is reached. By varying the weighting scheme, this approach interpolates between CART trees at one extreme and boosted regression stumps at the other. Notably, the resulting tuning of the weighting scheme allows AddTree to improve in predictive performance over CART and GBS, while allowing direct interpretation.

AddTree maintains a perfect binary tree of depth n, with 2^n − 1 internal nodes, each of which corresponds to a weak learner. The kth weak learner, denoted h_{k,l}, along the path from the root node to a leaf prediction node l induces 2 disjoint partitions of the feature space 𝒳, namely P_{k,l} and its complement P_{k,l}^c. Let {R_{1,l}, ..., R_{n,l}} be the corresponding set of partitions along that path to l, where the kth term in the tuple, namely R_{k,l}, can be either P_{k,l} or P_{k,l}^c. We can then define the partition of 𝒳 associated with the leaf node l as the intersection of all of the set of partitions along the path from the root node to the leaf node l; that is, R_{n,l} = \bigcap_{k=1}^{n} R_{k,l}. AddTree predicts a label for each x ∈ R_{n,l} via the ensemble consisting of all weak learners along the path to l, so that the resulting model is given by F(x ∈ R_{n,l}) = \sum_{k=1}^{n} \rho_{k,l} h_{k,l}(x).

To focus each branch of the tree on corresponding instances, thereby constructing diverse ensembles, AddTree maintains a set of weights denoted w_{n,l} ∈ ℝ^N over all training data. Such weights are associated with training a weak learner h_{n,l} at the leaf node l at depth n.
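The path-wise reading of a boosted ensemble described above can be made concrete with a small sketch. The following Python fragment is illustrative only, with hypothetical names, and is not the authors' implementation: it stores a weak learner (a decision stump) and its scaling ρ at every internal node and evaluates F(x ∈ R_{n,l}) = Σ_k ρ_{k,l} h_{k,l}(x) by accumulating contributions along the root-to-leaf path that the stumps themselves select.

```python
# Illustrative sketch (not the authors' code): predicting with a tree whose
# internal nodes hold weak learners, by accumulating rho * h(x) along the
# root-to-leaf path, i.e., F(x in R_{n,l}) = sum_k rho_{k,l} * h_{k,l}(x).

class Node:
    def __init__(self, stump=None, rho=0.0, left=None, right=None, value=None):
        self.stump = stump      # weak learner: (feature_index, threshold), None at a leaf
        self.rho = rho          # scaling of this node's weak-learner output
        self.left = left        # subtree for h(x) = -1
        self.right = right      # subtree for h(x) = +1
        self.value = value      # optional constant stored at a leaf

def stump_output(stump, x):
    """Decision stump h(x) in {-1, +1}: sign of (x[feature] - threshold)."""
    feature, threshold = stump
    return 1.0 if x[feature] > threshold else -1.0

def predict(node, x, running_sum=0.0):
    """Follow the path selected by each stump, summing rho * h(x) on the way."""
    if node.stump is None:                      # leaf: return the accumulated ensemble
        return running_sum if node.value is None else node.value
    h = stump_output(node.stump, x)
    running_sum += node.rho * h
    child = node.right if h > 0 else node.left  # the stump also routes the instance
    return predict(child, x, running_sum)

# Example: depth-2 tree; every root-to-leaf path is a 2-member boosted ensemble.
tree = Node(stump=(0, 0.5), rho=0.8,
            left=Node(stump=(1, 0.0), rho=0.3, left=Node(), right=Node()),
            right=Node(stump=(1, 1.0), rho=0.5, left=Node(), right=Node()))
print(predict(tree, [0.9, 1.4]))   # 0.8*1 + 0.5*1 = 1.3
```

When every node reuses the same dataset weighting, all paths carry the same ensemble, which is exactly the GBS-as-tree picture; AddTree departs from this by reweighting the data differently on each branch, as formalized next.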

We train the tree as follows. At each boosting step, we have a current function estimate F_{n−1,l}(x) of a perfect binary tree of depth n − 1. We seek to improve this estimate by replacing each of the 2^{n−1} leaf prediction nodes of the tree with separate functions, one for each terminal node l ∈ {1, ..., 2^{n−1}}, growing the tree by one level. This yields a revised estimate of the function F_{n,l}(X) = F_{n−1,l}(X) + ρ_{n,l} h_{n,l}(X), with additional weak learners h_{n,l}(·) and corresponding scaling factors ρ_{n,l}. The goal is to minimize the loss ℓ(·) over the corresponding data at each terminal node; that is,

\{\rho_{n,l}, h_{n,l}\}_{l=1}^{2^{n-1}} = \arg\min_{\{\rho_{n,l}, h_{n,l}\}} \sum_{l=1}^{2^{n-1}} \sum_{i=1}^{N} w_{n,l,i}\, \ell\big(y_i, F_{n-1,l}(x_i) + \rho_{n,l} h_{n,l}(x_i)\big).   [1]

Our proposed approach aims at choosing appropriate values of the global loss in Eq. 1 by breaking the connection between learners at each leaf, independently minimizing the inner summation for each l ∈ {1, ..., 2^{n−1}}.

Eq. 1 can be solved efficiently via gradient boosting of each h_{n,l}(·) in a level-wise manner through the tree. Following ref. 23, we first estimate the negative unconstrained gradients at each data instance, which are equivalent to the residuals [i.e., \tilde{y}_i = -\partial \ell(y_i, F(x_i)) / \partial F(x_i)]. Then, we can determine the optimal parameters for h_{n,l} by first fitting decision stumps with least squares to the pseudoresiduals obtained from the original data,

h_{n,l} = \arg\min_{h, \gamma_{n,l}} \sum_{i=1}^{N} w_{n,l,i}\, \big(\tilde{y}_i - \gamma_{n,l}\, h(x_i)\big)^2,   [2]

or, equivalently, solving for the simple regressor γ_{n,l}(x), whose main function is to scale the weak learners to fit the pseudoresiduals, and then solving for the optimal scaling factor ρ_{n,l},

\rho_{n,l} = \arg\min_{\rho} \sum_{i=1}^{N} w_{n,l,i}\, \ell\big(y_i, F_{n-1,l}(x_i) + \rho\, h_{n,l}(x_i)\big).   [3]

Next, we focus on deriving AddTree where the weak learners are binary regression trees and the loss function used for each weak learner is least squares. If all instance weights remain constant, this approach would build a perfect binary tree of depth T where each path from the root to a leaf represents the same ensemble, and so would be exactly equivalent to GBS. Instead, AddTree constructs diverse ensembles by gradient boosting on each branch of the tree, focusing each branch on the corresponding instances that belong to each of the partitions. To accomplish this, the weights are updated separately for each branch,

w_{n,l}(x_i) = w_{n-1,l}(x_i)\,\big(\lambda + 1[x_i \in R_{n,l}]\big),   [4]

where λ ∈ [0, ∞) is denoted the "interpretability factor," 1[p] is a binary indicator function that is 1 if predicate p is true, and 0 otherwise, and the initial weights w_0 are typically uniformly distributed. The update rule applies to each of a node's 2 children: instances in the corresponding partition have their weights multiplied by a factor of λ + 1, while instances outside the partition have their weights multiplied by a factor of λ. Therefore, we can also think of AddTree as growing a collection of interrelated ensembles, each tuned to yield optimal predictions for one complete partition of the instance space. The AddTree approach is detailed as Algorithm 1.

Algorithm 1: AddTree
Inputs: training data (X, y) = {(x_i, y_i)}_{i=1}^N; node domain R_{n,l} ⊆ 𝒳; instance weights w_{n,l}; prediction function F_{n−1,l} : 𝒳 → 𝒴; depth n; maximum depth T; interpretability factor λ; learning rate ν
Output: the root node of a hierarchical ensemble F_{n,l} that predicts the weighted average of y
1: If n > T, return a prediction node F_{n,l}(x) = ȳ_{n,l}
2: Create a new subtree root root_{n,l} to hold a weak learner h_{n,l} : 𝒳 → 𝒴
3: Compute negative gradients: \tilde{y}_i ← −∂ℓ(y_i, F_{n−1,l}(x_i)) / ∂F_{n−1,l}(x_i)
4: Fit the weak classifier h_{n,l}(x) with weights w_{n,l}(x) by solving h_{n,l} = \arg\min_h \sum_{i=1}^{N} w_{n,l}(x_i)\,(\tilde{y}_i - h(x_i))^2
5: Let {P_{n,l}, P_{n,l}^c} be the partitions induced by h_{n,l}(x)
6: Solve for the regressor constants γ_{n1,l} and γ_{n2,l} over P_{n,l} and P_{n,l}^c by weighted least squares (Eq. 6)
7: Define the simple regressor: γ_{n,l}(x) = γ_{n1,l} 1[x ∈ P_{n,l}] + γ_{n2,l} 1[x ∈ P_{n,l}^c]
8: Update the current function estimation: F_{n,l}(x_i) ← F_{n−1,l}(x_i) + ν γ_{n,l}(x_i)
9: Update the left and right subtree instance weights: w_{n,l}^{(left)}(x_i) ← w_{n−1,l}(x_i)(λ + 1[x_i ∈ P_{n,l}]); w_{n,l}^{(right)}(x_i) ← w_{n−1,l}(x_i)(λ + 1[x_i ∈ P_{n,l}^c])
10: If P_{n,l} ≠ ∅, compute the left subtree recursively: root_{n,l}.left ← AddTree(X, y, R_{n,l} ∩ P_{n,l}, w_{n,l}^{(left)}, F_{n,l}, n + 1, T, λ, ν)
11: If P_{n,l}^c ≠ ∅, compute the right subtree recursively: root_{n,l}.right ← AddTree(X, y, R_{n,l} ∩ P_{n,l}^c, w_{n,l}^{(right)}, F_{n,l}, n + 1, T, λ, ν)

Notice that the parameter ν in step 8 of the algorithm is a shrinkage or learning rate that balances the contributions of each node's weak learner to the ensemble. Larger learning rates allow each weak function to contribute more to the running function estimate (and consequently yield shorter, more interpretable trees); smaller learning rates slow the running function estimate, but potentially yield more accurate and larger trees, as shown in SI Appendix, Figs. S5 and S6.
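To make the recursion in Algorithm 1 concrete, the following Python sketch implements the squared-error case, where the negative gradient is simply the residual and the node regressor is a weighted mean of residuals on each side of the fitted stump. All names here are illustrative assumptions; the published experiments used an R implementation (see Materials and Methods), and this fragment omits the pruning and stopping rules described there.

```python
# Minimal sketch of the AddTree recursion for squared-error loss (illustrative,
# not the published implementation): every node fits a weighted decision stump
# to the residuals of the whole dataset, and the two child calls reweight the
# data by (lambda_ + 1) inside their partition and by lambda_ outside it.
import numpy as np

def wavg(v, w):
    """Weighted average with a zero-weight guard."""
    return float(np.average(v, weights=w)) if w.sum() > 0 else 0.0

def fit_stump(X, residuals, w):
    """Weighted least-squares decision stump: returns (feature, threshold) or None."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            mask = X[:, j] <= t
            if w[mask].sum() == 0 or w[~mask].sum() == 0:
                continue
            pred = np.where(mask, wavg(residuals[mask], w[mask]),
                                  wavg(residuals[~mask], w[~mask]))
            err = np.sum(w * (residuals - pred) ** 2)
            if err < best_err:
                best, best_err = (j, t), err
    return best

def addtree(X, y, w, F, depth, max_depth, lambda_=1.0, nu=0.1):
    """Grow one node; returns a nested-dict tree. F is the running estimate per sample."""
    if depth >= max_depth or len(np.unique(y)) == 1:
        return {"leaf": wavg(y, w)}
    residuals = y - F                                   # negative gradient of squared loss
    stump = fit_stump(X, residuals, w)
    if stump is None:
        return {"leaf": wavg(y, w)}
    j, t = stump
    in_part = X[:, j] <= t                              # P_{n,l}; its complement is P^c_{n,l}
    gamma = np.where(in_part, wavg(residuals[in_part], w[in_part]),
                              wavg(residuals[~in_part], w[~in_part]))
    F_new = F + nu * gamma                              # step 8: shrunken update
    w_left = w * (lambda_ + in_part)                    # step 9: reweight for each child
    w_right = w * (lambda_ + ~in_part)
    return {"split": (j, t),
            "left": addtree(X, y, w_left, F_new, depth + 1, max_depth, lambda_, nu),
            "right": addtree(X, y, w_right, F_new, depth + 1, max_depth, lambda_, nu)}

# Usage on toy data; lambda_ = 0 weights only node-local data (CART-like),
# while a very large lambda_ keeps the weighting nearly uniform (GBS-like).
X = np.array([[0.1], [0.4], [0.6], [0.9]]); y = np.array([0.0, 0.0, 1.0, 1.0])
tree = addtree(X, y, np.ones(4) / 4, np.full(4, y.mean()), 0, 2, lambda_=1.0)
```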

Results

We provide the following results: 1) a theoretical analysis establishing the connections between CART, GBS, and AddTree; 2) empirical results demonstrating that AddTree outperforms CART and performs equivalently to GBS; and 3) an empirical analysis of the behavior of AddTree under various parameter settings.

Theoretical Analysis. This section proves that AddTree is equivalent to CART when λ = 0 and converges to GBS as λ → ∞, thus establishing a continuum between CART and GBS. We include proof sketches for the 3 lemmas used to prove our main result in Theorem 1; full proofs are included in SI Appendix.

Lemma 1. The weight of x_i at leaf l ∈ {1, ..., 2^n} at depth n of the tree (∀n = 1, 2, ...) is given by

w_{n,l}(x_i) = w_0(x_i)\,(\lambda + 1)^{\sum_{k=1}^{n} 1[x_i \in R_{k,l}]}\,\lambda^{\sum_{k=1}^{n} 1[x_i \in R_{k,l}^c]},   [5]

where {R_{1,l}, ..., R_{n,l}} is the sequence of partitions along the path from the root to l.
Proof Sketch: Shown by induction based on Eq. 4.

Lemma 2. The optimal simple regressor γ*_{n,l}(x) that minimizes the squared error loss function Eq. 2 at leaf l ∈ {1, ..., 2^n} at depth n of the tree is given by

\gamma^*_{n,l}(x) = \begin{cases} \dfrac{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)\,(y_i - F_{n-1,l}(x_i))}{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)} & \text{if } x_i \in R_{n,l} \\[1ex] \dfrac{\sum_{i: x_i \in R_{n,l}^c} w_{n,l}(x_i)\,(y_i - F_{n-1,l}(x_i))}{\sum_{i: x_i \in R_{n,l}^c} w_{n,l}(x_i)} & \text{otherwise.} \end{cases}   [6]

Proof Sketch: For a given region R_{n,l} ⊂ 𝒳 at the depth n of the tree, the simple regressor has the form

\gamma_{n,l}(x) = \begin{cases} \gamma_{n1,l} & \text{if } x \in R_{n,l} \\ \gamma_{n2,l} & \text{otherwise,} \end{cases}   [7]

with constants γ_{n1,l}, γ_{n2,l} ∈ ℝ. We take the derivative of the loss function (Eq. 2) in each of the 2 regions R_{n,l} and R_{n,l}^c, and solve for where the derivative is equal to zero, obtaining Eq. 6.

Lemma 3. The AddTree update rule is given by F_{n,l}(x) = F_{n−1,l}(x) + γ_{n,l}(x). If γ_{n,l}(x) is defined as

\gamma_{n,l}(x) = \frac{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)\,(y_i - F_{n-1,l}(x_i))}{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)},

with constant F_0(x) = ȳ_0, then F_{n,l}(x) = ȳ_{n,l} is constant, with

\bar{y}_{n,l} = \frac{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)\, y_i}{\sum_{i: x_i \in R_{n,l}} w_{n,l}(x_i)}, \quad \forall n = 1, 2, \ldots.   [8]

Proof Sketch: The proof is by induction on n, building upon Eq. 7. Each γ_{n,l}(x_i) is constant and so ȳ_{n,l} is constant; therefore, the lemma holds under the given update rule.

Building upon these 3 lemmas, our main theoretical result is presented in the following theorem, and explained in the subsequent 2 remarks:

Theorem 1. Given the AddTree optimal simple regressor (Eq. 6) that minimizes the loss (Eq. 2), the following limits hold for the parameter λ of the weight update rule (Eq. 4):

\lim_{\lambda \to 0} \gamma^*_{n,l}(x) = \frac{\sum_{i: x_i \in R_{n,l}} w_0(x_i)\, y_i}{\sum_{i: x_i \in R_{n,l}} w_0(x_i)} - \bar{y}_{n-1,l},   [9]

\lim_{\lambda \to \infty} \gamma^*_{n,l}(x) = \begin{cases} \dfrac{\sum_{x_i \in R_{n,l}} w_0(x_i)\,(y_i - F_{n-1,l}(x_i))}{\sum_{x_i \in R_{n,l}} w_0(x_i)} & \text{if } x_i \in R_{n,l} \\[1ex] \dfrac{\sum_{x_i \in R_{n,l}^c} w_0(x_i)\,(y_i - F_{n-1,l}(x_i))}{\sum_{x_i \in R_{n,l}^c} w_0(x_i)} & \text{otherwise,} \end{cases}   [10]

where w_0(x_i) is the initial weight for the ith training sample.
Proof: Eq. 9 follows from applying Eq. 5 in Lemma 1 to Eq. 6 in Lemma 2 and taking the limit when λ → 0, regarding the result F_{n,l}(x) = ȳ_{n,l}, with the constant ȳ_{n,l} defined by Eq. 8. Similarly, Eq. 10 follows from applying Eq. 5 in Lemma 1 to Eq. 6 in Lemma 2 and taking the limit when λ → ∞.

Remark 1. The simple regressor given by Eq. 9 calculates a weighted average of the difference between the random output variables y_i and the previous estimate ȳ_{n−1,l} of the function F(x) in the disjoint regions defined by R_{n,l}. This formally defines the behavior of the CART algorithm.

Remark 2. The simple regressor given by Eq. 10 calculates a weighted average of the difference between the random output variables y_i and the previous estimate of F(x) given by the piece-wise constant function F_{n−1,l}(x_i). F_{n−1,l}(x_i) is defined in the overlapping region determined by the latest stump, namely R_{n,l}. This defines the behavior of the GBS algorithm.

Based on Remarks 1 and 2, we can conclude that AddTree is equivalent to CART when λ → 0 and GBS as λ → ∞. In addition to identifying connections between these 2 algorithms, AddTree provides the flexibility to train a model that lies between CART and GBS, with potentially improved performance over either, as we show empirically in the next section. Finally, notice that, although the introduction of the λ parameter might be reminiscent of decision trees trained with fuzzy logic (33), in the fuzzy logic approach, the continuity is between a simple constant model and CART, while AddTree produces a continuous mapping between CART and GBS.

Fig. 2. Frequency that an algorithm (rows) has higher average BACC compared to each one of the remaining algorithms (columns) under study across 83 binary classification tasks. Each of the learning algorithms has been tuned to maximize BACC. Ties have been ruled out for the sake of clarity. The counts shown in the figure are:

Balanced accuracy (row > column)   CART   AddTree   GBS   RF   GBM
CART                                  0        22    26   10    9
AddTree                              55         0    34   15   13
GBS                                  53        46     0   33   28
RF                                   68        64    46    0   30
GBM                                  67        64    51   46    0
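The two limits in Theorem 1 can also be checked numerically from the weight formula of Lemma 1: at λ = 0, every instance that has fallen outside any partition on the path receives zero weight (only node-local data matter, as in CART), while for large λ the weights become essentially uniform over the whole dataset (as in GBS). The short Python fragment below is an illustrative check on assumed toy inputs, not part of the paper's code.

```python
# Numeric illustration (not from the paper's code) of the Lemma 1 weights
#   w_{n,l}(x_i) = w_0(x_i) * (lambda + 1)^a_i * lambda^b_i,
# where a_i / b_i count how often x_i fell inside / outside the partitions
# R_{1,l}, ..., R_{n,l} on the path to leaf l.
import numpy as np

def path_weights(w0, inside_counts, n, lam):
    a = np.asarray(inside_counts)          # times each instance was inside R_{k,l}
    b = n - a                              # times it was outside
    return np.asarray(w0) * (lam + 1.0) ** a * lam ** b

w0 = np.ones(4)                            # uniform initial weights
inside = [3, 3, 1, 0]                      # instances 0,1 always inside; instance 3 never
for lam in (0.0, 1.0, 1e6):
    w = path_weights(w0, inside, n=3, lam=lam)
    print(lam, w / w.sum())                # normalized weights at the leaf

# lambda = 0   -> only instances that stayed inside every partition keep weight (CART-like)
# lambda = 1e6 -> weights become numerically uniform over all instances (GBS-like)
```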

Empirical Evaluation of Predictive Performance. In this section, the predictive performance of a set of optimized machine learning models, namely, AddTree, CART, GBS, Gradient Boosting Machines (GBM), and Random Forest (RF), was calculated based on balanced accuracy (BACC) (34) in 83 classification tasks selected from the Penn Machine Learning Benchmark (PMLB) (35). The frequency of the classification tasks for which one of the aforementioned models outperformed all of the others in BACC is illustrated in Fig. 2, where ties are not included for the sake of clarity. AddTree outperformed CART in 55 (66.3%) classification tasks with high statistical significance (P = 3.08 × 10⁻⁷), while AddTree was outperformed by GBS in 46 (55.4%) tasks, but with no statistical significance (P = 0.17). In fact, AddTree exhibits the highest mean BACC (improvement rate 1.4%) and F1 score (improvement rate 2.9%) across the PMLB tasks, as shown in Table 1.

In Table 1, the average BACC and F1 score of AddTree overcomes even that of GBS, while exhibiting less SD, despite the fact that GBS comes with better counts of classification tasks with better BACC, as illustrated in Fig. 2. A similar result is displayed in the F1-score comparison shown in SI Appendix, Fig. S1. Moreover, GBS, RF, and GBM displayed superior performance to the single trees, that is, CART and AddTree. These results are also consistent with the fact that, in contrast to GBS, AddTree exploits model interactions, which may explain its equivalence in performance with a smaller number of nodes. Furthermore, the comparison of the BACCs of AddTree and CART is illustrated on the bar chart in Fig. 3, where the 55 tasks where AddTree outperforms CART are indicated by the bars. Also, notice the meaningful reduction in NDNs of AddTree with respect to GBS shown in the scatter plot in SI Appendix, Fig. S2. As expected, the ensemble methods RF and GBM are more accurate than the single-tree approaches, but at the expense of having larger NDNs.

Number of Decision Nodes. As shown in Table 1, the average NDN used to carry out the classification tasks for the CART trees was 55.4 (SD = 86.2) and 50.7 (SD = 84.1) for the AddTree trees, and this reduction was statistically significant (P = 0.001). Contrasting with the previous NDN values of AddTree and CART, the average number of stumps assembled by GBS to carry out the classification tasks was 4,692.0 (SD = 1,614.1), while the large tree ensembles RF and GBM carried out the tasks with 412,458.4 (SD = 752,654.8) and 316,728.8 (SD = 700,981.6) nodes, respectively.

Empirical Variance of the AddTree. By using training data from PMLB, the variance of AddTree as a function of the interpretability parameter (λ) is estimated and shown in Fig. 4. As observed in the plot, as λ increases, AddTree reduces variance with respect to CART without compromising bias since, as λ → ∞, it converges to GBS, which is a consistent estimator. Therefore, the reduction of the variance without compromising bias explains the superior performance of AddTree over CART.

Table 1. Performance indexes for all of the learning algorithms under comparison across 83 PMLB classification tasks

Performance measure    CART          AddTree       GBS           RF            GBM
Mean BACC              811.5×10⁻³    815.5×10⁻³    804.1×10⁻³    837.5×10⁻³    843.4×10⁻³
SD BACC                146.1×10⁻³    144.8×10⁻³    156.4×10⁻³    140.9×10⁻³    138.2×10⁻³
Mean F1 score          796.7×10⁻³    799.7×10⁻³    776.9×10⁻³    831.9×10⁻³    833.3×10⁻³
SD F1 score            177.0×10⁻³    174.8×10⁻³    193.4×10⁻³    173.4×10⁻³    170.0×10⁻³
Mean no. of nodes      55.4          50.7          4,692.0       412,458.4     316,728.8
SD no. of nodes        86.2          84.1          1,614.1       752,654.8     700,981.6
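As a reference for the comparison above, the two basic ingredients of the evaluation, balanced accuracy for a 2-class task and a Wilcoxon rank-sum test on per-task scores, can be sketched as follows. This is a minimal Python illustration with made-up scores and hypothetical variable names; the paper's experiments were run in R (see Materials and Methods).

```python
# Minimal sketch (Python, for illustration only; the published experiments were in R)
# of the two quantities used in the comparison above: balanced accuracy for a
# 2-class task, and a Wilcoxon rank-sum test on per-task scores of two algorithms.
import numpy as np
from scipy import stats

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sens = np.mean(y_pred[y_true == 1] == 1)
    spec = np.mean(y_pred[y_true == 0] == 0)
    return 0.5 * (sens + spec)

# Hypothetical per-task BACC scores for two algorithms across benchmark tasks.
bacc_addtree = np.array([0.83, 0.79, 0.91, 0.70, 0.88])
bacc_cart    = np.array([0.80, 0.77, 0.90, 0.71, 0.85])
stat, p = stats.ranksums(bacc_addtree, bacc_cart)   # Wilcoxon rank-sum test
print(balanced_accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]), p)
```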

Fig. 3. Bar chart showing the difference ∆BACC = BACC_AddTree − BACC_CART across the 83 PMLB classification tasks. AddTree exhibits significantly better BACC than CART (∆BACC > 0) in 55 tasks (P = 3.08 × 10⁻⁷). The one outlier task in favor of CART consists of a synthetic parity problem, which is better captured by the hard binary logic structure of CART and is ill-suited to be solved by any soft regression-like method such as AddTree (more details in SI Appendix).

Discussion

The previous section provided a formal proof that AddTree establishes a previously unknown connection between CART and GBS. This connection motivates the hypothesis that the predictive performance of AddTree may surpass the performances of CART and GBS, while improving the limited interpretability that GBS provides. Preliminary results regarding AddTree performance were previously presented in ref. 36, using a small set of PMLB datasets. The superior statistical power of the present evaluation allows a more rigorous assessment of the performance and interpretability of AddTree. Therefore, a set of experiments was carried out to validate this hypothesis.

The results presented in Fig. 2 validate the ability of AddTree to surpass the predictive performance of CART over a wide variety of classification tasks. This is further corroborated by SI Appendix, Fig. S3, which shows the distribution of the difference ∆BACC = BACC_AddTree − BACC_CART, whose mean value E{∆BACC} > 0 reflects a bias toward positive values of ∆BACC; therefore, the superior performance of AddTree over CART has been shown to be statistically significant as well.† Moreover, AddTree showed a significant reduction on the average NDN with respect to CART. The difference between the NDNs of AddTree and CART, that is, ∆NDN = NDN_AddTree − NDN_CART, has the distribution shown in SI Appendix, Fig. S4, in which the mean is E{∆NDN} = −4.66 (SD = 48.50). Therefore, we conclude that AddTree can generate better predictions than CART, while building smaller trees.

Despite the apparent performance advantage of GBS over AddTree in Fig. 2, the results are not statistically significant. Furthermore, GBS had an average NDN of 4,692.0 (SD = 1,614.1) over the classification tasks, whereas AddTree had an average NDN of 50.7 (SD = 84.1); however, GBS can be indirectly interpreted through partial dependence plots (23). Notice that limiting the ensemble size of GBS to match the maximum depth of CART and AddTree would have provided a better comparison between GBS and AddTree in terms of interpretability based on NDNs. However, the number of stumps of GBS was adaptively added, and the hyperparameters of GBS were optimally tuned, yet the best possible predictive performance of GBS does not improve over the AddTree predictive performance significantly. Through these results, we conclude that AddTree provides an alternative paradigm for building decision trees that surpasses the predictive performance of CART, with performance as accurate as optimized GBS.

The improvement in predictive performance of AddTree over CART is not only a result of its connection to boosting, but of the passing of all of the training data through the descendant nodes and the regularizing effect of the variation of the weighting scheme on the tree splits. In fact, as illustrated in Fig. 4, as AddTree converges to GBS, a boosting-based algorithm that primarily reduces bias, it reduces variance.

The tree ensembles RF and GBM outperformed CART, AddTree, and GBS with statistical significance, at the expense of obtaining less-interpretable models, by using the total NDN and the depth of the trees as proxies to measure interpretability. However, even in scenarios where understanding the model is not the primary concern, AddTree provides possibilities for improving over the predictive performance of GBM, namely, 1) the replacement of the decision stumps by trees and bagged trees such as those in RF, 2) the application of gradient boosting with other losses, 3) alternative forms of the simple regressor (γ_{n,l}(x), Eq. 6), 4) the formal implementation of extensions of AddTree, and 5) the analysis of connections of AddTree to the recently introduced Generalized Random Forest (37) are in our future research agenda.

†The symbol E{·} denotes the expectation operator.

Materials and Methods

All experiments were carried out using the machine learning library of ref. 38, written in the R language (39). The statistical significance of all of the predictive performance measurements was evaluated using the Wilcoxon rank sum test (40).

Predictive Performance Evaluation and NDN. The PMLB repository (35) contains a selection of curated datasets, accessible through GitHub, for evaluating supervised classification. PMLB includes datasets from machine learning benchmarks such as the University of California, Irvine machine learning repository (41), the metalearning benchmark (42), and the Knowledge Extraction based on Evolutionary Learning tool (43). It includes real and synthetic datasets in order to assess and compare the performance of different learning algorithms. Out of the 95 datasets available in the PMLB, 11 datasets with less than 100 instances were ruled out of the analysis to avoid overfitting, and 1 dataset with 48,842 instances was discarded because of the extended execution times that this particular dataset demanded. The remaining 83 datasets have a mean of 1,713.0 instances (SD = 2,900.3) with 36.3 features (SD = 111.4). The performance comparisons, including the tests for statistically significant differential performances, were based on viewing the 83 classification tasks as a random sample of potential datasets.

In this study, we evaluated the performance of AddTree against CART, GBS, RF, and GBM using BACC, a measurement commonly used to calculate performance in 2-class imbalanced domains (34). All experiments were performed using nested resampling. For each dataset, the hyperparameters of each algorithm were tuned to maximize BACC across 5 stratified subsamples of the dataset using grid search (internal resampling). External resampling was performed using 25 stratified subsamples, and, to ensure direct comparison across algorithms, the training set size was fixed to 75% of available cases in both internal and external resampling. The following hyperparameters were tuned: 1) the interpretability factor of AddTree, 2) the complexity parameter (44) used in the cost complexity pruning for CART, 3) the learning rate and the maximum number of stumps in the ensemble for GBS, 4) the number of randomly sampled predictors as candidates at each split for RF, and 5) the maximum tree depth as well as the learning rate for GBM. Detailed definitions of the tuned hyperparameters are provided in SI Appendix, Table S1, and the hyperparameter space is shown in SI Appendix, Table S2.

The AddTree approach grows trees where a minimum of one observation per child node is required to make a node internal; otherwise, the node becomes a leaf (min.membership parameter). A maximum tree depth of 30 was also in place (maxdepth parameter); however, such tree depth was not reached in any of the experiments, mostly due to the effect of min.membership. Furthermore, the number of paths of the tree is reduced based on the elimination of redundant paths where subsequent child branches give the same class outputs (i.e., small-tree pruning); however, no other statistical or pruning-based method (e.g., cost complexity pruning) was carried out. In contrast, the CART trees were grown and pruned using the rpart package (44). The CART tree growth is limited by a minimum number of 2 observations for a split to be attempted (minsplit parameter) and, for a fair comparison, the same maxdepth of 30 as in AddTree. In addition, any branch whose complexity measure, denoted prune.cp, is less than a threshold is trimmed; a prune.cp value of 0.0001 indicates that a split that does not decrease the overall lack of fit by a factor of 0.0001 is not attempted. The complexity parameter was also optimally tuned using grid search, which relies on the cross-validated classification error. The GBS and GBM ensembles are grown and not pruned using the gbm package (45). The number of stumps of GBS was tuned by early stopping based on the cross-validated classification error, and GBM trees were tuned using grid search, including the maxdepth parameter, with a minimum of 1 observation in the terminal nodes (n.minobsinnode parameter). The RF tree ensembles were grown using the ranger package (46). The trees were grown until the minimum number of 1 observation at the terminal nodes was achieved (nodesize parameter). Moreover, a default number of 1,000 ensembled trees is specified (ntree parameter) to try to ensure that every input row gets predicted at least a few times.

Empirical Variance of AddTree. The variance of AddTree was calculated as indicated in ref. 47 using the twonorm dataset of PMLB. The prediction variance was computed as the percent of times a test prediction was different from the mode prediction across the bootstraps where the case was left out (not part of the training set); all 500 bootstraps were drawn from 1,000 randomly subsampled cases. AddTree was trained on each bootstrap for a set of λ values, namely {0.00, 0.11, 0.25, 0.43, 0.67, 1.00, 1.50, 2.33, 4.00, 9.00}.

Fig. 4. Estimate of the variance of AddTree (prediction variance, y axis) as a function of the interpretability parameter λ (x axis). Notice that AddTree is able to improve the variance with respect to CART. The error bars represent 1 SE.

ACKNOWLEDGMENTS. This work was partially supported by an award granted by the Emerson Collective. Additionally, the research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award K08EB026500. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

1. S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, "Algorithmic decision making and the cost of fairness" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD, 2017), pp. 797–806.
2. A. Caliskan, J. J. Bryson, A. Narayanan, Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
3. F. Cabitza, G. Banfi, Machine learning in laboratory medicine: Waiting for the flood? Clin. Chem. Lab. Med. 56, 516–524 (2018).
4. A.-M. M. Salem, A. Atef, A. Salah, M. Shams, "Recent survey on medical image segmentation" in Computer Vision: Concepts, Methodologies, Tools, and Applications, M. Khosrow-Pour, S. Clarke, M. E. Jennex, A. Becker, A.-V. Anttiroiko, Eds. (IGI Global, 2018), pp. 129–169.
5. P. Yadav, M. Steinbach, V. Kumar, G. Simon, Mining electronic health records (EHRs): A survey. ACM Comput. Surv. 50, 1–40 (2018).
6. D. Sculley et al., "Hidden technical debt in machine learning systems" in Proceedings of the International Conference on Neural Information Processing Systems, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, Eds. (NeurIPS, 2015), pp. 2503–2511.
7. F. Louzada, A. Ara, G. B. Fernandes, Classification methods applied to credit scoring: Systematic review and overall comparison. Surv. Oper. Res. Manage. Sci. 21, 117–134 (2016).
8. J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica (2016). https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed 1 October 2018.
9. Z. C. Lipton, "The doctor just won't accept that!" in Proceedings of Symposium on Interpretable Machine Learning at the International Conference on Neural Information Processing Systems, A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–3.
10. R. Caruana et al., "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, L. Cao, C. Zhang, T. Joachims, G. Webb, D. D. Margineantu, G. Williams, Eds. (SIGKDD, 2015), pp. 1721–1730.
11. B. Goodman, S. Flaxman, "European Union regulations on algorithmic decision-making and a 'right to explanation'" in Proceedings of the Workshop on Human Interpretability in Machine Learning at the International Conference on Machine Learning, B. Kim, D. M. Malioutov, K. R. Varshney, Eds. (International Conference on Machine Learning, 2016), pp. 1–9.
12. Z. C. Lipton, The mythos of model interpretability. arXiv:1606.03490. Deposited 6 March 2017.
13. F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning. arXiv:1702.08608. Deposited 2 March 2017.
14. F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. Wortman Vaughan, H. Wallach, "Manipulating and measuring model interpretability" in Proceedings of Symposium on Interpretable Machine Learning at the International Conference on Neural Information Processing Systems, A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–14.
15. M. T. Ribeiro, S. Singh, C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, B. Krishnapuram et al., Eds. (SIGKDD, 2016), pp. 1135–1144.
16. L. A. Hendricks et al., "Generating visual explanations" in European Conference on Computer Vision, B. Leibe, J. Matas, N. Sebe, M. Welling, Eds. (Springer, 2016), pp. 3–19.
17. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees (Wadsworth & Brooks, Monterey, CA, 1984).
18. W.-Y. Loh, Fifty years of classification and regression trees. Int. Stat. Rev. 82, 329–348 (2014).
19. S. R. Safavian, D. Landgrebe, A survey of decision tree classifier methodology. IEEE Trans. Sys. Man Cybernetics 21, 660–674 (1991).
20. B. Kim, J. A. Shah, F. Doshi-Velez, "Mind the gap: A generative approach to interpretable feature selection and extraction" in Proceedings of the International Conference on Neural Information Processing Systems, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, Eds. (NeurIPS, 2015), pp. 2260–2268.
21. J. M. Luna et al., Predicting radiation pneumonitis in locally advanced stage II–III non-small cell lung cancer using machine learning. Radiother. Oncol. 133, 106–112 (2019).
22. R. Caruana, A. Niculescu-Mizil, "An empirical comparison of supervised learning algorithms" in Proceedings of the International Conference on Machine Learning, W. Cohen, A. Moore, Eds. (Association for Computing Machinery, 2006), pp. 161–168.
23. J. H. Friedman, Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
24. W.-Y. Loh, Regression trees with unbiased variable selection and interaction detection. Stat. Sin. 12, 361–386 (2002).
25. W.-Y. Loh, Improving the precision of classification trees. Ann. Appl. Stat. 3, 1710–1737 (2009).
26. M. Kearns, Y. Mansour, On the boosting ability of top-down decision tree learning algorithms. J. Comput. Syst. Sci. 58, 109–128 (1999).
27. J. R. Quinlan, C4.5: Programs for Machine Learning (Elsevier, 2014).
28. Y. Freund, R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
29. E. Grossmann, "AdaTree: Boosting a weak classifier into a decision tree" in Proceedings of the Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) (IEEE, 2004), pp. 105–105.
30. Z. Tu, "Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering" in Proceedings of the IEEE International Conference on Computer Vision (ICCV) (IEEE, 2005), vol. 2, pp. 1589–1596.
31. G. Valdes et al., MediBoost: A patient stratification tool for interpretable decision making in the era of precision medicine. Sci. Rep. 6, 37854 (2016).
32. J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: A statistical view of boosting. Ann. Stat. 28, 337–407 (2000).
33. Y. Yuan, M. J. Shaw, Induction of fuzzy decision trees. Fuzzy Sets Syst. 69, 125–139 (1995).
34. K. H. Brodersen, C. S. Ong, K. E. Stephan, J. M. Buhmann, "The balanced accuracy and its posterior distribution" in Proceedings of the International Conference on Pattern Recognition (ICPR), A. Ercil, K. Boyer, M. Cetin, S.-W. Lee, Eds. (IEEE, 2010), pp. 3121–3124.
35. R. S. Olson, W. La Cava, P. Orzechowski, R. J. Urbanowicz, J. H. Moore, PMLB: A large benchmark suite for machine learning evaluation and comparison. BioData Min. 10, 36 (2017).
36. J. M. Luna et al., "Tree-structured boosting: Connections between gradient boosted stumps and full decision trees" in Symposium on Interpretable Machine Learning at the Conference on Neural Information Processing Systems (NIPS 2017), A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–8.
37. S. Athey, J. Tibshirani, S. Wager, Generalized random forests. Ann. Stat. 47, 1148–1178 (2019).
38. E. D. Gennatas, "Towards precision psychiatry: Gray matter development and cognition in adolescence," PhD thesis, University of Pennsylvania, Philadelphia, PA (2017).
39. R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2018).
40. M. Pagano, K. Gauvreau, Principles of Biostatistics (Chapman and Hall/CRC, 2018).
41. D. Dheeru, E. K. Taniskidou, UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 15 June 2017.
42. M. Reif, "A comprehensive dataset for evaluating approaches of various meta-learning tasks" in Proceedings of the International Conference on Pattern Recognition Applications and Methods (ICPRAM), J. S. Sánchez, P. L. Carmona, Eds. (IEEE, 2012), pp. 273–276.
43. J. Alcalá-Fdez et al., Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Logic Soft Comput. 17, 255–287 (2011).
44. T. Therneau, B. Atkinson, B. Ripley, Package 'rpart.' R package version 4.1-13. https://cran.r-project.org/web/packages/rpart/rpart.pdf. Accessed 1 July 2018.
45. G. Ridgeway et al., Package 'gbm.' R package version 2.1.3. https://cran.r-project.org/web/packages/gbm/gbm.pdf. Accessed 7 July 2018.
46. M. N. Wright, A. Ziegler, ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17 (2017).
47. G. Valentini, T. G. Dietterich, "Low bias bagged support vector machines" in Proceedings of the 20th International Conference on Machine Learning (ICML-03), T. Fawcett, N. Mishra, Eds. (AAAI Press, 2003), pp. 752–759.
