A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval
Total Page:16
File Type:pdf, Size:1020Kb
A Data-Driven Text Mining and Semantic Network Analysis for Design Information Retrieval By Feng Shi Imperial College London, Dyson School of Design Engineering A thesis submitted for the degree of Doctor of Philosophy July 2018 1 Abstract Data-Driven Design is an emerging area with the advent of big-data tools. Massive information stored in electronic and digital forms on the internet provides potential opportunities for knowledge discovery in the fields of design and engineering. The aim of the research reported in this thesis is to facilitate the design information retrieval process based on large-scale electronic data through the use of text mining and semantic network techniques. We have proposed a data-driven pipeline for design information retrieval including four elements, from data acquisition, text mining, semantic network analysis, to data visualisation and user interaction. Web crawling techniques are applied to fetch massive online textual data in data acquisition process. The use of text mining enables the transformation of data from unstructured raw texts into a structured semantic network. A retrieval analysis framework is proposed based on the constructed semantic network to retrieve relevant design information and provoke design innovation. Finally, a web-based platform B-Link has been developed to enable user to visualise the semantic network and interact with it through the proposed retrieval analysis framework. Seven case studies were conducted throughout the thesis to investigate the effectiveness and gain insights for each element of the pipeline. Thousands of design post news items and millions of engineering and design peer reviewed papers can be efficiently captured by web crawling techniques. Through the use of itemset mining and noun phrase chunking, a semantic network constructed based on these textual data is shown to capture more inherent design- and engineering-oriented concepts and relations, compared to the benchmarking approaches: WordNet, ConceptNet, NeLL and Wikipedia. A retrieval analysis framework has been developed with different retrieval behaviours to retrieve either common general or domain-specific concepts, explicit or implicit knowledge relations, which are found to satisfy various knowledge demands in our real design projects at the conceptual stage. Finally, the result of a user test is shown to be consistent with these findings. 2 Declaration The coding for web-software B-Link was accomplished in collaboration with Liuqing Chen. Except where otherwise stated, the work in this thesis is my own. Parts of this thesis have been disseminated through conference or journal publications and are reused according to the respective copyright agreements. This is noted at the beginning of a chapter where it occurs. Feng Shi July 2017 The copyright of this thesis rests with the author and is made available under a Creative Commons Attribution Non-Commercial No Derivatives licence. Researchers are free to copy, distribute or transmit the thesis on the condition that they attribute it, that they do not use it for commercial purposes and that they do not alter, transform or build upon it. For any reuse or redistribution, researchers must make clear to others the licence terms of this work. 3 List of Publications This work has resulted in or contributed to the following publications at the time of submitting this thesis. Full Journal Papers: Shi, F., Chen, L., Han, J. and Childs, P., 2017. A data-driven text mining and semantic network analysis for design information retrieval. Journal of Mechanical Design, 139(11), p.111402. doi:10.1115/1.4037649. Han J., Shi F., Chen, L., Childs P. R. N., 2018. The Combinator – A computer-based tool for creative idea generation based on a simulation approach. Design Science. 4, p. e11. doi: 10.1017/dsj.2018.7. Han J., Shi F., Chen, L., Childs P. R. N., 2018. A computational tool for creative idea generation based on analogical reasoning and ontology. Artificial Intelligence for Engineering Design, Analysis and Manufacturing. doi:10.1017/ S0890060418000082. Full Conference Papers: Chen, L., Wang, P., Shi, F., Han, J. and Childs, P., 2018. A computational approach for combinational creativity in design. In DS92: Proceedings of the DESIGN 2018 15th International Design Conference (pp. 1815-1824). Shi, F., Chen, L., Han, J. and Childs, P., 2017, August. Implicit Knowledge Discovery in Design Semantic Network by Applying Pythagorean Means on Shortest Path Searching. In ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (pp. V001T02A053-V001T02A053). American Society of Mechanical Engineers. Chen, L., Shi, F., Han, J. and Childs, P.R., 2017, August. A network-based computational model for creative knowledge discovery bridging human-computer interaction and data 4 mining. In ASME 2017 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (pp. V007T06A001- V007T06A001). American Society of Mechanical Engineers. Han, J., Shi, F., Chen, L. and Childs, P., 2017. The Analogy Retriever–an idea generation tool. In DS 87-4 Proceedings of the 21st International Conference on Engineering Design (ICED 17) Vol 4: Design Methods and Tools, Vancouver, Canada, 21-25.08. 2017. Shi, F., Han, J. and Childs, P.R.N., 2016. A Data Mining Approach to assist design knowledge retrieval based on keyword associations. In DS 84: Proceedings of the DESIGN 2016 14th International Design Conference (pp. 1125-1134). Han, J., Shi, F. and Childs, P.R.N., 2016. The Combinator: a computer-based tool for idea generation. In DS 84: Proceedings of the DESIGN 2016 14th International Design Conference (pp. 639-648). 5 Table of Contents Abstract ...................................................................................................................... 2 Declaration ................................................................................................................. 3 List of Publications ..................................................................................................... 4 Table of Contents ........................................................................................................ 6 List of Figures ............................................................................................................. 9 List of Tables............................................................................................................. 12 Acknowledgements ................................................................................................... 13 Chapter 1 Introduction.............................................................................................. 14 1.1 Data-Driven Design ..................................................................................... 15 1.2 The Aim of this Thesis ................................................................................ 20 1.3 Overview of the thesis ................................................................................. 22 Chapter 2 Text, Ontology, and Design Information .................................................. 27 2.1 Text mining and the use in design and engineering .................................... 29 2.2 Automatic Ontology Construction ............................................................. 40 2.3 The Use of Ontology in Engineering and Design.........................................49 2.4 Discussion and Motivation ......................................................................... 52 Chapter 3 Data acquisition: Web Crawler ................................................................. 55 3.1 Data Forms and Data Resources for design knowledge ............................... 56 3.2 Data acquisition by Web Crawler ................................................................ 61 6 3.3 Study 1: Capture Design Posts from Yanko Design ..................................... 65 3.4 Study 2: Capture Engineering and Design Literatures from Elsevier ......... 70 3.5 Conclusion .................................................................................................. 74 Chapter 4 Mining and Mining: Network Construction .............................................. 76 4.1 From Texts to Concepts, Relations, and Ontology....................................... 77 4.2 Keyword Associations Mining ................................................................... 80 4.3 Full Text Mining ......................................................................................... 87 4.4 Unifying and Disparity Filtering .................................................................94 4.5 Study 3: Concepts and Relations, Precision and Recall .............................. 98 4.6 Discussion and Conclusion ....................................................................... 108 Chapter 5 The use of Explicit vs Implicit Networks for Data Insights ...................... 111 5.1 Explicit and Implicit Knowledge Associations ............................................112 5.2 Retrieval framework ..................................................................................116 5.3 Probability and Velocity Layer Modelling ................................................. 118 5.4 Study 4: Probability and Velocity analysis on the golden relations ........... 123 5.5 More Criteria by Pythagorean Means........................................................ 126 5.6 Study 5: Exploring from a Design Query of a Single Concept...................