Enabling Freehand Sketching Through Improved Primitive Recognition
Total Page:16
File Type:pdf, Size:1020Kb
RETHINKING PEN INPUT INTERACTION: ENABLING FREEHAND SKETCHING THROUGH IMPROVED PRIMITIVE RECOGNITION A Dissertation by BRANDON CHASE PAULSON Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY May 2010 Major Subject: Computer Science RETHINKING PEN INPUT INTERACTION: ENABLING FREEHAND SKETCHING THROUGH IMPROVED PRIMITIVE RECOGNITION A Dissertation by BRANDON CHASE PAULSON Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Approved by: Chair of Committee, Tracy Hammond Committee Members, Yoonsuck Choe Ricardo Gutierrez-Osuna Vinod Srinivasan Head of Department, Valerie E. Taylor May 2010 Major Subject: Computer Science iii ABSTRACT Rethinking Pen Input Interaction: Enabling Freehand Sketching Through Improved Primitive Recognition. (May 2010) Brandon Chase Paulson, B.S., Baylor University Chair of Advisory Committee: Dr. Tracy Hammond Online sketch recognition uses machine learning and artificial intelligence tech- niques to interpret markings made by users via an electronic stylus or pen. The goal of sketch recognition is to understand the intention and meaning of a partic- ular user's drawing. Diagramming applications have been the primary beneficiaries of sketch recognition technology, as it is commonplace for the users of these tools to first create a rough sketch of a diagram on paper before translating it into a machine understandable model, using computer-aided design tools, which can then be used to perform simulations or other meaningful tasks. Traditional methods for performing sketch recognition can be broken down into three distinct categories: appearance-based, gesture-based, and geometric-based. Al- though each approach has its advantages and disadvantages, geometric-based methods have proven to be the most generalizable for multi-domain recognition. Tools, such as the LADDER symbol description language, have shown to be capable of recognizing sketches from over 30 different domains using generalizable, geometric techniques. The LADDER system is limited, however, in the fact that it uses a low-level rec- ognizer that supports only a few primitive shapes, the building blocks for describing higher-level symbols. Systems which support a larger number of primitive shapes have been shown to have questionable accuracies as the number of primitives increase, or they place constraints on how users must input shapes (e.g. circles can only be drawn in a clockwise motion; rectangles must be drawn starting at the top-left corner). iv This dissertation allows for a significant growth in the possibility of free-sketch recognition systems, those which place little to no drawing constraints on users. In this dissertation, we describe multiple techniques to recognize upwards of 18 primi- tive shapes while maintaining high accuracy. We also provide methods for producing confidence values and generating multiple interpretations, and explore the difficulties of recognizing multi-stroke primitives. In addition, we show the need for a standard- ized data repository for sketch recognition algorithm testing and propose SOUSA (sketch-based online user study application), our online system for performing and sharing user study sketch data. Finally, we will show how the principles we have learned through our work extend to other domains, including activity recognition using trained hand posture cues. v ACKNOWLEDGMENTS The writing of this dissertation would not be possible without the strength and guidance given to me by my Lord and Savior, Jesus Christ. In addition, this achieve- ment would not have been completed without the love and support I have received from my wife, Stacy, as well as the rest of my entire family. I would also like to thank my family at Beacon Baptist Church for their encouragement during this pro- cess. Thank you to my committee members, Dr. Yoonsuck Choe, Dr. Ricardo Gutierrez-Osuna, and Dr. Vinod Srinivasan for all that you have taught me, both in the classroom, and through your helpful critiques of this work. Thanks to the many members of the Sketch Recognition Lab (past and present) for your help in proofreading, critiquing, and editing this work. Thank you also to Greg Sparks for taking time to proofread this document as well. Last, but not least, I would like to acknowledge my advisor, Dr. Tracy Hammond, for all of her hard work and the investment that she has made in me to further my academic career. vi TABLE OF CONTENTS CHAPTER Page I INTRODUCTION :::::::::::::::::::::::::: 1 A. Sketch Recognition in Design . 1 B. Sketch Recognition in Engineering & Education . 2 C. Proposal . 3 II PREVIOUS WORK IN PEN-BASED INTERFACES :::::: 7 A. Hardware Research . 7 B. Human-computer Interaction . 10 1. Sketch Editing . 11 2. Incorporating Gestures . 11 3. Beautification . 13 4. Toolkits . 15 5. Multimodal Systems . 15 C. Recognition . 16 D. High-level Recognition . 17 E. Low-level Recognition . 19 1. Motion-based Recognition . 20 2. Appearance-based Recognition . 23 3. Geometric-based Recognition . 24 a. Corner Finding . 25 b. Primitive Recognition . 30 c. Limitations of Geometric-based Approach . 31 4. Hybrid/Combination Techniques . 32 F. Applications for Sketch-based Interfaces . 35 1. Sketching in Design . 35 a. Storyboards . 37 b. 3D Modeling . 37 2. Sketching in Engineering and Education . 38 a. Sketching in Engineering . 39 b. Sketching in Math & Science . 41 c. Sketching in Language & Fine Arts . 43 3. Search by Sketch . 44 vii CHAPTER Page III PALEOSKETCH :::::::::::::::::::::::::: 46 A. Introduction . 46 1. First Version of PaleoSketch . 49 2. Determining What Primitives to Support . 49 3. Symbol Description Experiment . 51 B. Data Set . 53 C. The First Part of PaleoSketch: Pre-recognition . 55 D. Features . 58 1. New Geometric Features . 59 a. General Features . 59 b. Line Features . 62 c. Ellipse Features . 64 d. Circle Features . 64 e. Arc Features . 65 f. Curve Features . 66 g. Polyline Features . 67 h. Spiral & Helix Features . 68 E. Classifier . 70 F. Handling Complex Shapes . 71 1. Confidence Modification . 72 G. Experiment & Results . 74 1. Accuracy of Complex Fits . 77 H. Discussion . 78 1. Complex Fits . 78 2. Using PaleoSketch in a Real-world Setting . 82 3. Limitations of PaleoSketch . 82 I. Chapter Summary . 85 IV MILITARY COURSE OF ACTION :::::::::::::::: 86 A. Introduction . 86 B. Previous Work in COA Systems . 87 C. Methodology . 89 1. Features . 90 a. Rectangle & Diamond Features . 91 b. Arrow Features . 91 c. Dot Features . 92 d. Wave Features . 93 e. Gull Features . 94 viii CHAPTER Page f. NBC Features . 95 2. Classifier . 96 3. Data . 96 D. Results . 98 E. Discussion . 98 F. Chapter Summary . 100 V MULTI-STROKE PRIMITIVES :::::::::::::::::: 101 A. Introduction . 101 B. Implementation . 103 1. Graph Building . 104 2. Graph Searching . 106 3. Stroke Combination . 108 4. False Positive Removal . 109 5. Arrow Detection . 111 C. Experiment . 113 D. Results . 115 1. Single-stroke . 115 2. Multi-stroke . 118 3. Complex Shapes . 123 E. Discussion . 129 1. Comparison to Other Methods . 129 2. Multiple Interpretations . 131 3. Improving Complex Interpretations . 132 4. Improving Grouping . 132 5. Additional Improvements . 133 F. Chapter Summary . 133 VI SOUSA :::::::::::::::::::::::::::::::: 134 A. Introduction . 134 B. Previous Efforts . 134 C. The SOUSA System . 135 1. Collection Studies . 138 2. Verification Studies . 138 D. Evaluation . 141 E. Future Work . 142 F. Chapter Summary . 143 VII OFFICE ACTIVITY RECOGNITION :::::::::::::: 145 ix CHAPTER Page A. Preface . 145 B. Introduction . 145 C. Related Work in Activity Recognition . 149 D. Experiment . 150 1. Classifiers . 151 2. Feature Spaces . 153 E. Results . 153 1. User-Independent System . 154 2. User-Dependent System . 155 F. Discussion . 157 G. Future Work . 162 H. Chapter Summary . 163 VIII CONCLUSION ::::::::::::::::::::::::::: 164 REFERENCES ::::::::::::::::::::::::::::::::::: 167 VITA :::::::::::::::::::::::::::::::::::::::: 198 x LIST OF TABLES TABLE Page I Accuracy results of existing recognizers on our collected set of primitive data. :::::::::::::::::::::::::::::: 54 II Accuracy results of the different feature sets using a multi-layer perceptron classifier. The first five columns represent the accu- racies of the five individual feature sets. Combined refers to the combined feature set of CALI, HHReco, and Long. All refers to the combination of the Combined feature set, plus the Paleo fea- tures. Modified uses the same feature set as All, but also utilizes the complex confidence modification algorithm. :::::::::::: 75 III Number of occurrences of each primitive shape in our data sets as well as the accuracy of 10-fold cross-validation on each set. The final column shows the accuracy results when we train with dataset A and test with dataset B. \Total" accuracy represents the flat average of all shape accuracies, while \weighted" average shows the average accuracy of each shape weighted by its number of occurrences. *Dataset B consisted of more lines because it included \anticipated" unit symbols, which are drawn as dashed rectangles or diamonds. ::::::::::::::::::::::::: 99 IV Number of each type of sketched primitive present in the collected data. Complex and polyline shapes cannot be drawn with multiple strokes. :::::::::::::::::::::::::::::::::: 116 V Accuracy results after 10-fold cross-validation. \Average" repre- sents flat averages, while \weighted averages".