Index

A
AdaBoost, 176
ADALINE. See Adaptive Linear Element/Adaptive Linear Neuron (ADALINE)
Adaptive Linear Element/Adaptive Linear Neuron (ADALINE), 112
Agglomerative hierarchical clustering. See Hierarchical clustering
Angular anisotropy, 183
ANN. See Artificial neural network (ANN)
Artificial neural network (ANN)
  activation function, 112, 113
  ADALINE, 112
  advantages and disadvantages, 116, 117
  backpropagation, 109, 115
  bias, 99–100, 105–106, 109
  feed-forward, 112, 113
  global minimum, 110
  learning rate, 109–111
  linear discriminant function, 107
  logical AND function, 106, 107
  logical OR function, 107
  logical XOR function, 107–109, 114
  multi-layer network, 107, 112, 116
  neurons, 104–105
  overfitting, 112–113
  perceptron (see Perceptron, ANN)
  recurrent network, 112
  signals, 104
  simulated annealing, 115
  steepest/gradient descent, 110, 115
  structure, neurons, 105
  training, 109, 115–116
  universal approximation theorem, 112
  weight vector, 109, 112
Astronomy, 2

B
Bayes' classifier, 171–173
Bayesian decision theory
  multiple features
    complex decision boundary, 85, 86
    trade-off performance, 86
    two-dimensional feature space, 85
  single dimensional (1D) feature
    class-conditional probabilities (see Class-conditional probabilities)
    classification error, 81–83
    CondProb.xls, 85
    likelihood ratio, 77, 83, 84
    optimal decision rule, 81
    posterior probability, 76–77
    recognition approaches, 75
    symmetrical/zero-one loss function, 83
Bayes' rule
  conditional probability and, 46–53
  Let's Make a Deal, 48
  naïve Bayes classifier, 53–54
  posterior probability, 47
Bias-variance tradeoff, 99–101
Bioinformatics, 2
Biometrics, 2
Biometric systems
  cost, accuracy, and security, 184
  face recognition, 187
  fingerprint recognition, 184–186
  infrared scan, hand, 184
  physiological/behavioral characteristics, 183
Bootstrap, 163–164
Breast screening, 51–52
Brodatz textures, 181

C
Canonical plot
  confusion matrix, 140
  contours, classes, 139
  description, 135
  eigenvalues, 136–137
  iris data, 136
CDF. See Cumulative distribution function (CDF)
Central Limit Theorem, 57, 76, 87
Chaining, 150
Character recognition, 2
Chebyshev distance, 143
Class-conditional probabilities
  classification error, 81–83
  decision threshold, 81
  densities, 79
  description, 76
  different variances, 78–80
  equal variance, 77–78
  likelihood test, 80
Classification
  acquired image, 3
  algorithm selection and supervised learning, 17–18
  decision boundaries, 5
  electronic components, shapes and sizes, 23, 24
  face recognition, 5
  features (see Features)
  labeling, 10, 11
  letters, 25
  nonmetric approaches, 19–20
  post-processing, 10
  pre-processing, 3, 9
  process, 3, 4
  rule-based classifier, 39
  segmentation, 9–10
  sensing/acquisition stage, 9
  shape, 21–22
  single-pixel outlines, fruit, 24
  size, 22–23
  stages, 9, 10
  statistical approaches, 18–19
  statistical/structural techniques, 5–6
  string matching, 40
  SVM, 20–21
  training and learning, 16–17
Classification error, 30–31
Classifiers
  AdaBoost, 176
  AUC approaches, 165, 166
  bagging/bootstrap aggregating, 175
  Bayes rule, 168
  bias and variance, 159–160
  classification toolbox, 171–174
  confusion matrix, 164, 165
  cross-validation and resampling methods, 160–164
  decision stumps, 176
  diverse, 175
  ensemble output, 175
  error rate, 157, 174
  learning, 174
  McNemar's test, 169, 170
  medical diagnosis, 167
  null hypothesis and outcomes, test, 167
  overlapping probability density functions, 164
  performance measures, 165
  ROC, 165, 166, 169, 170
  scientific method, 159
  SE, 167–168
  significance level, 166
  statistical tests, 169, 171
  three-way data split, 158
  training set, 157
  validation set, 158
Cluster analysis, 143
Clusters
  analysis, 143
  hierarchical, 150–154
  k-means clustering (see k-Means clustering)
  non-exclusive, 144
  partitional, 144, 145
Coefficient of determination, 62
Computer-aided diagnosis, 2
Conditional dilation, 21, 22
Conditional probability
  description, 46
  multiplicative rule, 47
  sensitivity and specificity, 50
  Venn diagram, 45, 46
CondProb.xls, 52–53, 85
Confusion matrix, 137, 140, 153
Contingency table
  description, 46
  diagnostic test, 49
  marginal probabilities, 50, 52
Correlation coefficients, 130, 131
Correlation matrix, 61
Covariance matrices
  bivariate Gaussian distribution, 66
  coefficient of determination, 62
  correlation matrix, 61
  decision boundary
    hyperbolic, 92
    linear, 92, 93
    parabolic, 90, 91
  definition, 59–60
  2D Gaussian and isocontours, 63–65
  diagonalization, 66–67
  eigenvalues and eigenvectors, 135
  equal, 90, 91
  equiprobable normally distributed classes, 92–94
  factorization, 61
  feature vector, 60
  LDA, 136
  nonzero off-diagonal elements, 65–66
  PCA, 66, 128, 129
  QDA, 90, 137–138
  statistical significance, 62–63
  symmetric matrix, 68
  whitening transform, 68
Cross-validation
  bootstrap, 163–164
  holdout method, 161
  k-fold cross-validation, 162–163
  leave-one-out validation, 18
  overfitting, 160
Cumulative distribution function (CDF)
  definition, 55
  and PDF, 55–56
Curse of dimensionality, 102, 123

D
Data mining, 1
Decision boundary
  bivariate normal distributions, 89
  complex, 85, 86
  decision thresholds, 85
  and discriminant functions, 87
  hyperbolic, 92, 94
  linear, 90, 91
  parabolic, 90, 91
  quadratic, 86
Decision nodes, 27, 28, 35
Decision regions, 87, 94, 95
Decision rule
  class-conditional probabilities, 78, 79
  decision regions, 87
  likelihood ratio, 77, 83
  minimum-distance classifier, 89
  optimal decision rule, 81
  probability, classification error, 76
Decision threshold
  decision boundaries, 85
  intersecting distributions, 81
  intersection, probability distribution, 80–81
  posterior probabilities, 78
Decision tree
  advantages and disadvantages, 38–39
  benefit, 28
  binary splits, 38
  branching structure, 27, 28
  computes, logical AND function, 40
  cross-validation, 37
  entropy, information and impurity, 29–31
  feature axes, 35, 36
  gain (see Gain)
  greedy strategy, 29
  ID3 algorithm, 37
  oblique, 36
  overfit, 36, 37
  structure, 19, 20
  three-level, 27, 28
  training and test error rates, 36–37
  underfitting, 36–37
Dendrogram
  description, 150
  hierarchical clustering, Fisher iris data, 151, 152
  nested clusters, 144
  scatter plots, Fisher's canonicals, 151, 153
Diagonalization, 66–67
Dimensionality
  curse of dimensionality, 123
  methods
    feature extraction (see Feature extraction)
    feature selection (see Feature selection)
  peaking phenomenon, 123, 124
  preprocessing, 124
Discriminant functions
  classes, decision curves, 94, 95
  covariance matrices (see Covariance matrices)
  decision boundary, 89
  decision regions, 87
  description, 87
  equal variance, decision lines, 89, 90
  LDA, 87–88
  linear machine, 87–88
  minimum-distance classifier, 89
Dissimilarity, 72
Diverse classifiers, 175
Document recognition, 2
Drucker, H., 176

E
EM algorithm. See Expectation-maximization (EM) algorithm
Entropy, 29–30
Errors
  accuracy, 43, 44
  precision, 43, 44
Expectation-maximization (EM) algorithm, 145, 148

F
Face recognition, 5, 187
False Positive Paradox, 52
FDR. See Fisher's discriminant ratio (FDR)
Feature extraction
  LDA (see Linear discriminant analysis (LDA))
  optimal mapping, 127
  PCA (see Principal Component Analysis (PCA))
Features
  categorical/non-metric, 4
  classification, 15
  continuous, 4
  discriminating, 12
  extraction, 11
  independent, 12
  linear decision boundary, 15
  prudent approach, 15
  reliable, 12
  robust, 12
  scaling, 14
  shape, 12–13
  structural, 12
  vector, 13–14
Feature selection
  inter/intraclass distance (see Inter/interclass distance)
  subset selection, 126
Fingerprint recognition
  biometric identification, 184
  FVC-onGoing, 185
  minutiae points, 185
  point pattern matching, 186
  triangles, 186
Fisher, R.A., 70, 126, 129–131, 135, 138
Fisher's discriminant ratio (FDR), 126, 135
Fisher's iris data
  correlation matrix, 130, 131
  3D scatterplot, 131
  eigenvalues, 132, 133
  eigenvectors, 134, 135
  iris flowers, 129, 130
  MDS, 133
  overlaid biplot, 132
  principal components, 133, 134
  scatter plot matrix, 129, 130
Fuzzy c-means clustering, 148–149

G
Gain
  attributes, 31–32
  decisions made, 33, 34
  Gini index, 32
  ID3 algorithm, 34–36
  splitting binary attributes, 33
  stages, weather attribute, 34, 35
Gaussian distribution
  bivariate, 66, 67
  diagnostic testing, 48–49
  Mahalanobis distance, 69
  multivariate Gaussian (see Multivariate Gaussian)
  PDF, 56
  standard deviation, 57
Gini impurity, 30–31
Goodness-of-fit, 182–183

H
Hessian, 179
Hierarchical clustering
  algorithm, 150
  complete-link clustering, 150
  dendrogram and nested clusters, 150
  operation, 150
  single-link clustering, 150, 151
  Ward's method (see Ward's method)
Holdout method, 161

I
ID3 algorithm, 33, 34, 37, 41
Impurity
  classification error, 31
  Gini, 30–31
Information, 29
Inter/interclass distance
  between-scatter matrix, 126
  FDR, 126
  single feature, two equiprobable classes, 124, 125
  within-scatter matrix, 125
Isocontours
  diagonalization, 66–67
  ellipsoidal, 68
  Gaussian, 63–66
  Mahalanobis distance, 69

J
Jack-knifing, 137

K
Karhunen-Loève (KL) transform. See Principal Component Analysis (PCA)
Kernel
  Gaussian kernel, 99, 100
  machine, SVM, 117–119
  Parzen window, 99
  polynomial kernel, 119, 120

L
Learning
  classification error, 16
  eager learning, 102
  lazy learning, 19
  reinforcement learning, 16–17
  supervised learning, 16–18
  training set, 16
  unsupervised learning, 16
Least-squares classifier, 171, 172
Leave-one-out approach, 162, 163
Let's Make a Deal, 48
Likelihood ratio, 77, 83, 84
Linear discriminant analysis (LDA)
  canonicals (see Canonical plot)
  confusion matrix, 137
  description, 135
  jack-knifing, 137
  loss function terms, 140
  Mahalanobis distances, 137
  QDA, 137–138