Semantic Analysis of Ladder Logic

A Thesis

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Soumyashree Gad, M.S.

Graduate Program in Computer Science and Engineering

The Ohio State University

2017

Master’s Examination Committee:

Dr. Srinivasan Parthasarathy, Advisor

Dr. P. Sadayappan

Copyright by

Soumyashree Gad

2017

Abstract

Purpose: A careful examination of a Ladder Logic program reveals its hierarchical nature and its components, which make it interpretable. Using data mining techniques to interpret the top level and sub-level components can therefore be very useful.

This is naturally a classification problem. Applying machine learning algorithms to features extracted from Ladder Logic can give insight into the whole program. The components and their interactions are intuitive, which improves the likelihood of good results. Learning PLC programming can be eased by converting existing PLC code to commonly used formats such as JSON or XML. The goal is to consume a Ladder Logic sample, break it down into minor components, identify the components and the interactions between them, and finally write them to JSON. As Ladder Logic is the language most commonly used for PLCs, we decided to start the experiment with Ladder Logic program samples. Feature engineering combined with machine learning techniques should provide accurate results on the Ladder Logic data.

Methods: The data set contains 6623 records for top level classification, with 1421 ALARM, 150 STEP SEQUENCE, 96 SOLENOID and 5304 UNCLASSIFIED samples, and 84472 records for sub-level classification, covering the sub-level components of all the ALARM and STEP SEQUENCE samples from the top level data set. We extract the initial top level and sub-level features from GX Works. Advanced features such as Sequence, LATCH and comments are extracted by parsing the output of GX Works. The final set of features for top level classification consists of basic features, advanced features, and comments. The data set for sub-level classification has a few more features on top of those from the top level: previous-instruction/next-instruction features (3-window), bi-gram features of the instructions, and the top level class (the result of top level classification). The results of top level and sub-level classification are filled into a JSON object which is later written to a JSON file.

Results: We have classification results for the top level and the sub-level. Since the features are discrete, we tried Decision Trees, Naive Bayes and Support Vector Machines; Decision Trees work best for both levels. Top level performance:

Decision Tree: Accuracy 0.91, F1-macro 0.90, F1-micro 0.90
Naive Bayes: Accuracy 0.85, F1-macro 0.80, F1-micro 0.85
LinearSVC: Accuracy 0.88, F1-macro 0.88, F1-micro 0.88

For sub-level classification, Decision Trees outperform the other classifiers:

Decision Tree: Accuracy 0.90, F1-macro 0.91, F1-micro 0.90
Naive Bayes: Accuracy 0.80, F1-macro 0.80, F1-micro 0.81
LinearSVC: Accuracy 0.75, F1-macro 0.78, F1-micro 0.79

Conclusions: A decision tree is the classifier best suited to this task. Tuning the Decision Tree classifier played an important role in improving performance: using entropy as the splitting criterion and restricting the depth of the tree to 6 improved the performance of the classifier by 6%.

I dedicate this to my parents, who have always had my back.

Acknowledgments

I would like to thank Prof. Srinivasan Parthasarathy for accepting me as his student and giving me the opportunity to be part of this project. I would like to thank Derrick Cobb from Honda, who came up with the unique concept of using machine learning algorithms with PLCs and was always excited to work with us. I would like to thank Albert (Jiongqian Liang), who mentored me throughout the project and gave valuable inputs.

I would like to thank my friend Mr. Siddharth Saurav, who kept me motivated throughout.

Finally and most importantly I would like to thank God for blessing me with a family which has always believed in me and supported my dream.

Vita

April 3, 1991 ...... Born - Sangli, India

2013 ...... B.E. Computer Science

2013 - 2015 ...... Software Engineer, Bosch India

2015 - present ...... Graduate Student, The Ohio State University.

Fields of Study

Major Field: Computer Science and Engineering

Contents

Page

Abstract ...... ii

Dedication ...... iv

Acknowledgments ...... v

Vita...... vi

List of Figures ...... ix

1. Introduction ...... 1

1.1 Challenges ...... 2
1.2 Thesis Statement ...... 3
1.3 Contributions ...... 3
1.4 Organization ...... 5

2. Background ...... 7

2.1 Ladder Logic ...... 7
2.2 Classification Algorithms ...... 9
2.2.1 Decision Tree ...... 9
2.2.2 Bi-gram ordering ...... 10
2.3 Related work ...... 10
2.3.1 Mining Software Repositories ...... 11
2.3.2 Machine Learning Automotive Industry ...... 13
2.3.3 Aid in learning PLC programming ...... 15

3. Implementation ...... 16

3.1 Pre-Processing Steps ...... 16
3.2 Top Level Classification ...... 19
3.3 Sub-level classification ...... 21
3.4 Performance Improvement ...... 25
3.4.1 Parameter tuning ...... 25
3.4.2 Majority Voting ...... 26

4. Data Set and Results ...... 27

4.1 Data set ...... 27
4.2 Results ...... 31

5. Conclusion and Future Work ...... 36

5.1 Future Work ...... 39

Bibliography ...... 42

List of Figures

Figure Page

1.1 ALARM sub-level example ...... 4

2.1 ladder logic example ...... 8

3.1 Window of size 3 ...... 17

3.2 Sub-Level Data Null padded ...... 18

3.3 Flow Chart for Top level Classification...... 20

3.4 Step Sequence sub-level example ...... 22

3.5 Flow Chart of Sub-level Classification...... 24

4.1 Step Sequence examples with Count ...... 29

4.2 Alarm examples with Count ...... 30

4.3 Decision tree Visualization ...... 31

4.4 Confusion Matrix for TS3 ...... 33

4.5 Heat Map for the important features for the classes ...... 34

4.6 Input and Output GUI ...... 35

5.1 Accuracy vs Max depth of the Decision tree classifier ...... 37

Chapter 1: Introduction

The relationship between hardware and software is long-standing. Software that operates very close to the hardware is usually low level and, in most cases, far from user-friendly. Ladder Logic, one of the PLC programming languages, is a quintessential example of such a low-level language. Ladder Logic is used to develop software for programmable logic controllers (PLCs) used in industrial control applications. The name is based on the observation that programs in this language resemble ladders, with two vertical rails and a series of horizontal rungs between them. The unique features of every class make the identification possible.

Every component in the Ladder Logic can be considered as a separate class and different sub components of the particular component make the labels of the sub- level classification. The sub-level components are unique to their respective top level classes.

There are several data mining techniques which could be used to solve this problem; however, classification seems the most appropriate. Classification is a method of assigning labels to unknown records by learning from previously labeled records (the training set). Labels are assigned based on the maximum similarity between a particular unknown record and the known records. A prediction probability can also be output to show how likely the unknown record is to belong to a certain class. The classifier is said to have learned well when it classifies the unlabeled data with good accuracy. For the classifier to work well, the records in the training set have to be similar to the test data. For example, you cannot have oranges and grapes in the training set and give apples as the test data: all the apples will be predicted as oranges or grapes.

1.1 Challenges

Getting good training data whose records are similar to the test data is very difficult, as there can be numerous possibilities. It is a Herculean task to find every existing pattern to include in the training data, and even after including all the existing patterns it is not certain that we would encounter only similar records. The Ladder Logic samples differ across automotive platforms, so we need a standard if the same tool is to be applied across all the platforms.

The classifier works very intuitively: it classifies records just as a human would, because the extracted features are the very ones humans would look at when identifying the component. However, there are cases where the classifier outputs the wrong result, when the features resemble another class more than the actual class.

This project is in its initial phase, so getting the training set 100 percent error-free is difficult; to err is human. There have been times when the classifier identified wrongly labeled data in the training set. Still, we have to check each record multiple times before inserting it into the training set.

Thus, the challenges involved can be divided into two types:

1. Choosing right training set for the classifier

2. Tuning the parameters of the classifier to improve performance.

We address both challenges, and how to deal with them, in detail in the chapters ahead.

1.2 Thesis Statement

The statement of this thesis is that it is possible to leverage custom feature engineering and a novel hierarchical classification methodology to translate Ladder Logic programs into an understandable format (JSON) that can help both train engineers working on such systems and explain the underlying logic of those systems to them. In terms of impact, the proposed work has the potential to transform the automotive industry.

Given a set of features extracted from GX Works for top level records and their corresponding sub-level records, we describe a method of classifying the unlabeled top level and sub-level data with acceptable accuracy, validating the sequences, and writing the valid sequences to JSON, which can be fed to an HMI for visualization.
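As a rough illustration of the final output step, the sketch below writes one classified chunk to a JSON file. The schema, field names, and values here are hypothetical placeholders, not the exact format produced in this work.

```python
import json

# Hypothetical structure for one classified ladder chunk; every field
# name and value below is illustrative, not the thesis's actual schema.
result = {
    "chunk_id": 42,
    "top_level_class": "ALARM",
    "sub_levels": [
        {"instruction": "LD X0", "label": "T"},   # triggering condition
        {"instruction": "OUT M10", "label": "A"}, # alarm latch contact
    ],
}

# Serialize the chunk so a downstream HMI tool could consume it.
with open("ladder_chunk.json", "w") as f:
    json.dump(result, f, indent=2)
```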

1.3 Contributions

Firstly, we present a novel method of visualizing Ladder Logic code as classes and features. Ladder Logic is like any other language in that it has different functions to carry out different tasks; however, because it works closely with hardware, it involves circuits, I/O, and device arrangements, unlike programming languages such as Java or Python.

Figure 1.1: ALARM sub-level example

Secondly, we present a novel way, never tried before, of extracting features from Ladder Logic code. The features are extracted from the instruction set, the devices, and the identified patterns. For top level classification, the features are divided into:

1. Basic features: features from the instruction set and device names.

Examples:

Instruction features: LD, LDI, MOV, AND, OR, etc.

Device features: X, Y, M, etc.

2. Advanced features: the Sequence and Latch features.

The Sequence feature identifies the presence of a sequence pattern in a particular component (the record to be classified into one of the classes). It generally occurs in the STEP SEQUENCE class.

The LATCH feature identifies the presence of a latch, i.e., input and output from the same device. LATCH occurs prominently in ALARM and SOLENOID.

3. Comments: We extract the comments written for every device and convert them to binary features, making them suitable for the decision tree. The dictionary holds the 13 comment terms with the highest TF-IDF.
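To make the comment-feature idea concrete, here is a minimal sketch of scoring comment terms by TF-IDF and keeping a small binary vocabulary. The toy comment strings and the vocabulary size (3 instead of 13) are illustrative only, and the exact TF-IDF weighting used in this work may differ.

```python
from collections import Counter
import math

# Toy corpus: one string of device comments per ladder chunk (illustrative).
comments = [
    "alarm latch fault reset",
    "step sequence start condition",
    "alarm fault output",
]

# Aggregate term frequency and document frequency over the corpus.
tf = Counter(t for doc in comments for t in doc.split())
df = Counter(t for doc in comments for t in set(doc.split()))
n = len(comments)

# One common TF-IDF variant: tf * log(n / df); ubiquitous terms score 0.
tfidf = {t: tf[t] * math.log(n / df[t]) for t in tf}

# Keep the k highest-scoring terms as the feature dictionary.
k = 3
vocab = sorted(tfidf, key=tfidf.get, reverse=True)[:k]

# Binary feature vector per chunk: 1 if the term appears in its comments.
features = [[1 if t in doc.split() else 0 for t in vocab] for doc in comments]
```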

Thirdly, we present a novel way of applying a hierarchical multi-class classification algorithm. We start by classifying the top level data set, and the classification result is written to the sub-level data: Top Class is a new feature included in the sub-level data. Based on the top level class, sub-level classification is carried out, making sure the predicted sub-level label belongs to the possible labels of the respective top level class.
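This label constraint can be sketched as follows, assuming the classifier exposes its label order and per-class probabilities (as scikit-learn's `classes_` attribute and `predict_proba` method do). The helper name is illustrative; the label sets come from the class descriptions in this thesis.

```python
# Valid sub-level labels per top-level class, per the thesis
# (SOLENOID sub-levels are out of scope for this project).
VALID = {
    "ALARM": {"T", "A", "R", "O", "F", "AT"},
    "STEP SEQUENCE": {"P", "S", "SE", "C", "O", "L"},
}

def constrained_label(classes, proba, top_class):
    """Pick the highest-probability sub-level label that is legal for
    the given top-level class. `classes` and `proba` would come from a
    fitted classifier's classes_ and one row of predict_proba()."""
    allowed = VALID[top_class]
    best, best_p = None, -1.0
    for c, p in zip(classes, proba):
        if c in allowed and p > best_p:
            best, best_p = c, p
    return best

# For an ALARM chunk, 'P' is illegal, so 'A' wins despite a lower raw score.
classes = ["T", "A", "P", "S"]
proba = [0.1, 0.2, 0.6, 0.1]
pick = constrained_label(classes, proba, "ALARM")
```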

Lastly, we provide a way to improve performance by tuning the parameters of the classifier and including other algorithms to minimize wrong classifications.

1.4 Organization

The rest of the thesis is organized into four parts. Chapter 2 provides detailed background on Ladder Logic and decision trees. Chapter 3 describes the implementation: how exactly we solved the problem by applying different machine learning algorithms. Chapter 4 presents our experimental results and their explanation. Finally, Chapter 5 discusses conclusions and future work in this field.

Chapter 2: Background

2.1 Ladder Logic

A programmable logic controller (PLC) is an industrial computer control system that continuously monitors the state of input devices and makes decisions based upon a custom program to control the state of output devices. Ladder Logic is the most commonly used PLC programming language. It has evolved into a programming language that represents a program by a graphical diagram based on the circuit diagrams of logic hardware. Ladder Logic is widely used to program PLCs where sequential control of a process or manufacturing operation is required. It is useful for simple but critical control systems or for reworking old hard-wired relay circuits; as programmable logic controllers became more sophisticated, it has also been used in very complex automation systems. Often the Ladder Logic program is used in conjunction with an HMI program operating on a computer workstation. The motivation for representing sequential control logic in a ladder diagram was to allow factory engineers and technicians to develop software without additional training in a language such as FORTRAN or another general-purpose computer language. Development and maintenance were simplified because of the resemblance to familiar relay hardware systems [2].

Implementations of ladder logic have characteristics, such as sequential execution and support for control flow features, that make the analogy to hardware somewhat inaccurate. This argument has become less relevant given that most ladder logic programmers have a software background in more conventional programming languages. Ladder logic can be thought of as a rule-based language rather than a procedural language; a "rung" of the ladder represents a rule. When implemented with relays and other electro-mechanical devices, the various rules "execute" simultaneously and immediately. When implemented in a programmable logic controller, the rules are typically executed sequentially by software, in a continuous loop (scan). By executing the loop fast enough, typically many times per second, the effect of simultaneous and immediate execution is achieved, considering intervals greater than the "scan time" required to execute all the rungs of the program. Proper use of programmable controllers requires understanding the limitations of the execution order of rungs.

Figure 2.1: ladder logic example

In this research, we analyze PLC code by dividing the problem into two parts: the top level, where the top class (ALARM, STEP SEQUENCE or SOLENOID) is identified, and the sub-level, where the sub-level classes (T, A, R, F, etc.) for the respective top level classes are identified. This is done by extracting features from the PLC code (Ladder Logic), then performing preprocessing, feature engineering, and classification.

2.2 Classification Algorithms

2.2.1 Decision Tree

Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables on the path from the root to the leaf. A decision tree is a simple representation for classifying examples. For this section, assume that all of the input features have finite discrete domains, and that there is a single target feature called the classification; each element of the domain of the classification is called a class. A decision tree, or classification tree, is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of that feature, or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes.

A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning, or recursive binary splitting. The recursion is complete when the subset at a node has all the same value of the target variable, or when splitting no longer adds value to the predictions. This process of top-down induction of decision trees is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data. In data mining, decision trees can also be described as the combination of mathematical and computational techniques to aid the description, categorization, and generalization of a given set of data.

Data comes in records of the form (X, Y) = (x_1, x_2, x_3, ..., x_k, Y). The dependent variable, Y, is the target variable that we are trying to understand, classify or generalize. The vector X is composed of the input variables x_1, x_2, x_3, etc., that are used for that task.

In sub-level classification, the ordering of the mnemonics plays a very important role, so we implemented a bi-gram ordering algorithm during the pre-processing stage.

2.2.2 Bi-gram ordering

Bi-gram ordering [7] of the mnemonic features is done so that the sequences LDI LD ANI and LD LDI ANI are treated differently. The instructions LDI, OR, LD, ANI, ORB, AND, OUT, ANB and ORI are paired in every valid combination (there is no pair starting with OUT). Features are named with the two mnemonics separated by an underscore, e.g. LDI_LD, LD_OR.
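A minimal sketch of this bi-gram feature construction follows. The helper name and sample sequences are illustrative, and whether self-pairs such as LD_LD count as "valid" is an assumption here.

```python
from itertools import product

# Mnemonics considered for pairing; per the text, no pair starts with OUT.
MNEMONICS = ["LDI", "OR", "LD", "ANI", "ORB", "AND", "OUT", "ANB", "ORI"]
PAIRS = [f"{a}_{b}" for a, b in product(MNEMONICS, repeat=2) if a != "OUT"]

def bigram_features(seq):
    """Count each adjacent mnemonic pair in an instruction sequence,
    so that 'LDI LD ANI' and 'LD LDI ANI' yield different vectors."""
    counts = {p: 0 for p in PAIRS}
    for a, b in zip(seq, seq[1:]):
        key = f"{a}_{b}"
        if key in counts:
            counts[key] += 1
    return counts

f1 = bigram_features(["LDI", "LD", "ANI"])
f2 = bigram_features(["LD", "LDI", "ANI"])
```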

2.3 Related work

Analyzing language semantics has been researched for a long time, either to improve a language or to create tools that help with it.

2.3.1 Mining Software Repositories

There has been a lot of research in the field of language analysis, which is related to our goal of analyzing the semantics of Ladder Logic. Mining Software Repositories (MSR) is a field closely related to language analysis and is also our field of interest. It is a technique used in software engineering focused on analyzing and understanding the data repositories related to a software development project; the main goal of MSR is to make intelligent use of these repositories to help in the decision process of the software project. Software development projects produce several types of repositories during their lifetime, described in the following paragraphs. Such repositories are a result of the daily interactions between the stakeholders, as well as the evolutionary changes to the source code, test cases, bug reports, requirements documents, and other documentation. These repositories offer a rich, detailed view of the path taken to realize a software system, and analyzing them provides a wealth of historical information that can directly benefit the future of a project.

CP-Miner [1] shows how to efficiently identify copy-pasted code in large software, including operating systems, and detect copy-paste-related bugs. It is based on frequent subsequence mining, which counts the occurrences of a particular subsequence before declaring it copy-pasted. It is efficient because it is neither plain text matching nor parsing through a traditional parse tree.

HotComments [2] is a work on the comments in a program, which appear first in keyword searches. Program comments have long been used as a common practice for improving inter-programmer communication and code readability, by explicitly specifying programmers' intentions and assumptions. This work stressed how important mining comments is, how well-written comments have helped the development process, and how bad or inconsistent comments have negatively impacted it. Comments are considered an important part of our project as well.

iComment [5] analyzes comments written in natural language to extract implicit program rules and uses these rules to automatically detect inconsistencies between comments and source code, indicating either bugs or bad comments. It is an advanced classification algorithm that classifies the given comments as good or bad.

Goal-Directed Search [3] attempts to reveal only the relevant information needed to establish reachability (or unreachability) of the goal from the initial state of the program. The paper presented a source-to-source transformation on programs that lifts all assertions in the input program to the entry procedure of the output program, thus revealing more information about the assertions close to the entry of the program.

Learn Programs from Examples [8] is work on learning programs from examples: a machine learning approach that departs from previous work by relying upon features that are independent of the program structure, instead relying upon a learned bias over program behaviors, and more generally over program execution traces.

REFAZER [9] is a tool to detect transformations in a program or predict potential transformations. It builds on the observation that code edits performed by developers can be used as input-output examples for learning program transformations.

2.3.2 Machine Learning Automotive Industry


2.3.3 Aid in learning PLC programming

Animations and Intelligent Tutoring Systems [11] were developed to help students learn PLCs better. It is a tool in which animations were designed to let users visualize PLC concepts; that is, they were intended to represent not the physical appearance of PLCs but their theory of operation. The animations were attractive to users and allowed them to manipulate components of the animation to see what would happen.

Chapter 3: Implementation

Since Ladder Logic represents a hierarchical structure, we decided to solve the problem in two stages. Top level classification identifies the top class of a chunk, which can be ALARM, STEP SEQUENCE, SOLENOID or UNCLASSIFIED (everything other than ALARM, STEP SEQUENCE, and SOLENOID), whereas sub-level classification identifies every small bit of a component whose top class is known. Top level classification forms the basis of sub-level classification.

We describe the custom classification algorithm we use in the following sections.

3.1 Pre-Processing Steps

Ladder Logic can be visualized with a tool called GX Works, which also has the option to export the ladder in the form of instructions. The pre-processing step starts with consuming a ladder (a part of the Ladder Logic program). Each ladder segment can be exported to form an instruction list with mnemonics and comments; the mnemonics and the comments are exported to separate files. Comments are linked to particular devices in the ladder.

Once we have the instruction set, the features can be formed by counting the number of instructions present in the ladder segment, identifying the kinds of devices present in the ladder, and reading the comments present in the comment file for those particular devices.
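A minimal sketch of building the basic feature vector by counting mnemonics and device types follows. The sample segment, the instruction list, and the device-type list are toy examples, not the full sets used in this work.

```python
from collections import Counter

# One exported ladder segment as (mnemonic, device) pairs (illustrative).
segment = [("LD", "X0"), ("ANI", "M10"), ("OUT", "Y1"), ("LD", "X1")]

# Subsets of the real instruction and device vocabularies, for illustration.
INSTRUCTIONS = ["LD", "LDI", "ANI", "AND", "OR", "OUT", "MOV"]
DEVICE_TYPES = ["X", "Y", "M"]

# Count each mnemonic, and each device type by its leading letter.
instr_counts = Counter(op for op, _ in segment)
device_counts = Counter(dev[0] for _, dev in segment)

# Fixed-order feature vector: instruction counts then device-type counts.
features = ([instr_counts.get(i, 0) for i in INSTRUCTIONS]
            + [device_counts.get(d, 0) for d in DEVICE_TYPES])
```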

Advanced features like Sequence and Latch are found by detecting, respectively, the presence of a sequence in the ladder, i.e., when the offset between two or more devices is continuous, and of a latch, i.e., when the input device and the output device of a component are the same.
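These two checks can be sketched as follows, assuming device names end in a numeric offset (e.g. M100). The helper names and sample devices are illustrative.

```python
import re

def has_sequence(devices):
    """Sequence: the numeric offsets of two or more devices are consecutive."""
    nums = sorted(int(re.sub(r"\D", "", d)) for d in devices)
    return any(b - a == 1 for a, b in zip(nums, nums[1:]))

def has_latch(inputs, outputs):
    """Latch: some device appears both as an input and as an output."""
    return bool(set(inputs) & set(outputs))
```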

Comments are converted to features by assigning the term frequency value as the feature value for that component. Right now we use the 13 words with the highest term frequency; the full dictionary of Ladder Logic contains a huge number of words, and including all of them would make the feature set highly inefficient.

The pre-processing for sub-level classification includes a few more steps beyond those used to produce the top level classification data set.

Figure 3.1: Window of size 3

The top level data has a set of instructions which together constitute the top level class; however, every instruction in the set carries a sub-level label. We apply a sliding window technique to get the features for the sub-level data set: each record in the sub-level data is made from 3 consecutive instructions, so that the window covers every instruction thrice, with the first two and last two instructions being the exceptions. To avoid this exception case we use padding: two '#'s are added before and after the chunk to make sure every instruction is scanned thrice. This also helps us in post-classification. The sub-level label for '#' is 'NULL'.
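The padding and windowing step above can be sketched as follows (the mnemonic sequence is illustrative):

```python
def windows(instructions, pad="#", size=3):
    """Pad with size-1 pad symbols on each side so every real instruction
    appears in exactly `size` windows, i.e. gets 3 votes for size=3."""
    padded = [pad] * (size - 1) + instructions + [pad] * (size - 1)
    return [padded[i:i + size] for i in range(len(padded) - size + 1)]

w = windows(["LD", "ANI", "OUT"])
# 'LD' appears in the first three windows: three votes.
```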

Figure 3.2: Sub-Level Data Null padded

The pre-processing of sub-level data is done only after we get the top level classification result. The sub-level data has the top level class as one of its features, which is the prediction from top level classification. Apart from this feature, we add the bi-gram features to the feature set. We made a list of possible bi-grams, and only those are added as features, to make sure there is no inconsistency between the training and test data.

Once the feature extraction process is done, we move on to classifying the data set. We discuss the classification algorithm in detail for the top level and the sub-level.

3.2 Top Level Classification

A top level class represents a complete component which performs a particular function. Our goal is to accurately classify the data into the 4 classes ALARM, STEP SEQUENCE, SOLENOID and UNCLASSIFIED. It is very important to get excellent results in top level classification, as this forms the ground truth for sub-level classification.

Figure 3.3: Flow Chart for Top level Classification.

The input data table contains the instruction features, device features, comment features and Sequence-Latch information. The Sequence and Latch features have to be converted to numeric values. The classifier is trained on these features, and each record is classified into the ALARM, STEP SEQUENCE, SOLENOID or UNCLASSIFIED class. The result is stored in the resultant file for sub-level classification and is exported to JSON.

We use the Decision Tree classifier from the open source library sklearn [4] to classify our data. Code:

from sklearn import tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)
clf.predict_proba([[2., 2.]])

3.3 Sub-level classification

Top level classification is followed by sub-level classification, which is done only for the ALARM and STEP SEQUENCE records of the top level data set. Sub-level classification of the SOLENOID class is not considered part of this project.

Sub labels for ALARM are ‘T’, ‘A’, ‘R’, ‘O’, ‘F’,’AT’. Look at Figure 1.1 for ref- erence. ’T’ is Alarm triggering condition, ’A’ is Alarm latch contact, ’R’ is Abnormal reset condition, ’F’ is Fault output of alarm, ’O’ is the post condition and ’AT’ is the timer output for the Ladder. There can be multiple ’T’s which triggering conditions in the Ladder chunk which could cause the fault. However, there can be only one ’A’ in any Ladder because there will be only one Alarm Latch contact for every ladder.

There can be multiple 'R's as well, since multiple reset conditions are allowed.

We can have multiple 'F's, that is, Fault outputs, but we do not concentrate on examples with multiple 'F' labels in a Ladder segment (chunk), to avoid complicating the labeling process in this initial phase.

Figure 3.4: Step Sequence sub-level example

Sub labels for Step Sequence are 'P', 'S', 'SE', 'C', 'O', and 'L'. 'P' is the Step Sequence pre-condition, 'S' is the Step start, 'SE' is the Step empty condition, 'C' is the Step condition, 'O' is the Step Sequence post-condition, and 'L' is the Step Sequence count-up coil. There is generally just one pre-condition, so 'P' occurs only once per chunk (Ladder segment). However, there can be multiple 'S', 'C', and 'O' labels. 'SE' occurs only once per chunk, as its only task is to check whether the step is empty. 'L' is the output of the STEP ladder chunk. As 'S' is the Step start, it always appears before 'C', the Step condition.

Once the preprocessing of the sub-level data is done, we train the classifier on it. Sub-level classification of the SOLENOID class is not included in the scope of the project.

Figure 3.2 shows a typical ALARM chunk divided into multiple mini chunks (windows). The windows are padded with hash symbols so that every sub label gets 3 votes. The classifier is run to predict labels for each of the columns SUBL1, SUBL2, and SUBL3.
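The 3-wide windowing with '#' padding can be sketched as follows. The helper and the mnemonic sequence are illustrative, not the project's actual code; the key property is that every real mnemonic appears once in each of the three window positions (SUBL1, SUBL2, SUBL3), which is what later enables three votes per sub label:

```python
# Sketch: slide a width-3 window over a chunk's mnemonics, padding both
# ends with '#'. Each real mnemonic then occupies positions 2, 1 and 0 of
# three consecutive windows, giving it three independent predictions.

def windows_of_three(mnemonics, pad="#"):
    padded = [pad, pad] + list(mnemonics) + [pad, pad]
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(windows_of_three(["LD", "AND", "OUT"]))
# [['#', '#', 'LD'], ['#', 'LD', 'AND'], ['LD', 'AND', 'OUT'],
#  ['AND', 'OUT', '#'], ['OUT', '#', '#']]
```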

Figure 3.5: Flow Chart of Sub-level Classification.

We start with an input data set which consists of instructional features, device features, Comments, and Sequence-Latch features. Each sample has 3 mnemonics and 3 sub labels associated with them. We have sub-level data only for the ALARM and STEP SEQUENCE samples from the top level, so we check whether the sub-level chunk ID equals the index of a top level sample, in which case we fill in the top level class for the record. We continue the preprocessing by appending bi-gram features. We then train the classifier for all 3 sub label columns, collect the results from all three, and take the majority vote to assign the final label. If there is no majority, the UNCLASSIFIED 'G' label is assigned.
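The previous-instruction/next-instruction bi-gram features can be sketched as follows; the pairing scheme and the '#' boundary marker are an illustrative assumption about how the ordering information is encoded:

```python
# Sketch: build bi-gram (previous-next instruction) features from a
# chunk's mnemonic sequence, with '#' marking the chunk boundaries.

def bigram_features(mnemonics, pad="#"):
    padded = [pad] + list(mnemonics) + [pad]
    return [f"{padded[i]}_{padded[i + 1]}" for i in range(len(padded) - 1)]

print(bigram_features(["LD", "AND", "OUT"]))
# ['#_LD', 'LD_AND', 'AND_OUT', 'OUT_#']
```

Each bi-gram string can then be treated as a categorical feature alongside the instruction and device features.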

3.4 Performance Improvement

3.4.1 Parameter tuning

The decision tree classifier that we have used is not a naïve classifier; we have tuned its parameters to get better performance.

Criterion: the function to measure the quality of a split. Supported criteria are "gini" for the Gini impurity and "entropy" for the information gain. We have used "entropy" as the criterion, as it works best when exploring different patterns.

Max depth: it is easy to overfit the data when the number of features is large, or to underfit otherwise. By restricting the depth we can make sure the classifier is trained properly. Max depth is set to 5, a value obtained by trial and error.

Presort: presort the data to speed up the finding of the best splits during fitting. With either a smaller data set or a restricted depth, this may speed up training. As we are using a restricted depth, we can use this option to speed up training.
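Putting the tuned parameters together gives a classifier along these lines. Note that the `presort` option existed only in older scikit-learn releases (it was deprecated and later removed in 0.24), so it is shown commented out:

```python
# Sketch of the tuned classifier described above.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="entropy",  # information gain, chosen over "gini"
    max_depth=5,          # restricted depth, found by trial and error
    # presort=True,       # only valid on scikit-learn < 0.24
)
```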

3.4.2 Majority Voting

The 3-window approach gives us an opportunity to verify the prediction of the classifier by also getting the predicted label from the next sample and the previous sample. We train and classify the sub-level examples three times, predicting one column each time. Every sub label of interest (i.e. every label except '#') occurs once in every column, so we certainly have a prediction for every label in every column. Training the classifier to predict each of the 3 columns therefore gives us 3 candidate labels for every sub label, and a majority vote is taken before finalizing any of them. To finalize the sub label 'T' of window 1, the classifier's predictions from window 1, window 2, and window 3 are taken. Only if 2 out of 3 predict the same label is it assigned as the final label; otherwise, the label 'G' is assigned, which signifies that the record is unclassified and human supervision is needed.
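The 2-out-of-3 vote can be sketched as a small helper; the function name is illustrative:

```python
# Sketch: majority vote over the three window predictions. 'G' marks a
# record with no 2-out-of-3 agreement, to be sent for human review.

def majority_vote(p1, p2, p3, fallback="G"):
    votes = [p1, p2, p3]
    for label in votes:
        if votes.count(label) >= 2:
            return label
    return fallback  # no two predictions agree: exception log

print(majority_vote("T", "T", "A"))  # T
print(majority_vote("T", "A", "R"))  # G
```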

Chapter 4: Data Set and Results

4.1 Data Set

We carried out various experiments during the implementation of the project.

There were 3 revisions of the data set, TS2, TS3, and TS4, with each revision adding new samples, features, and classes; the preprocessing and classification steps were adapted accordingly. Initially, we started with TS2, which had ALARM, STEP SEQUENCE, and SERVO as the top level classes. There was a total of 168 labeled samples used for training the classifier. This revision of the data set did not have sub-level data. The top level data was classified with a simple decision tree, which gave us an accuracy of 0.93. The next revision, TS3, added a new class, SOLENOID, and removed the SERVO class from the data set; the reason for removing the SERVO class was the unavailability of examples for the training set. With the addition of the new class, new examples were added to the data set: TS3 had 506 labeled samples available for training. TS3 had a sub-level data set as well. Once TS3 was out, we added Comments to it, which increased the performance by 0.2. We started the sub-level classification with a decision tree as well. It gave us a poor performance of 80.2% (poor compared to the top level). We then started refining the features for the sub-level: we added Comments, implemented bi-gram ordering, and added the Sequence and Latch features, which increased the performance of sub-level classification to 0.97.

We also tried a 2-window approach for the sub-level classification, in which every record had features of two labels and the sliding window worked as before. We did not get a performance improvement; instead, the accuracy decreased to 0.83, whereas the accuracy for the 3-window approach was 0.90.

We tried to incorporate the sub-level features into the top level features, as the sub-level classification result was better than the top level one. We included the bi-gram ordering and the Sequence and Latch features, but it did not have any effect on the top level classification. Classification of TS4 was really difficult, as most of the records were UNCLASSIFIED; we initially started with the naïve decision tree, which gave us an accuracy of 0.80.

Figure 4.1: Step Sequence examples with Count

As we can see from the above figure, the number of possible combinations of a step sequence chunk is huge, and the count per combination is fairly constant, so we cannot concentrate on getting only certain kinds of sequences right.

Figure 4.2: Alarm examples with Count

As we can see from the above figure, unlike the count curve for Step Sequence, the Alarm count curve is descending. Here we can concentrate on getting the most popular alarm sequences right and consider the exceptional cases at a later point in time.

4.2 Results

Figure 4.3: Decision tree Visualization

The above figure is a visualization of the decision tree classifier after all the performance improvements. We tuned the parameters of the decision tree, such as max depth = 5 and criterion = "entropy", which increased the accuracy to 0.90 for both top level and sub-level classification, which was our target. We observed that the accuracy for sub-level classification increased to 0.95 when the depth of the tree was not restricted.

From the decision tree, we can see that the Comments and the bi-grams play an important role in determining the class of the records.

Top Level Classification

Classifier      Accuracy  F1-macro  F1-macro
Decision Tree   0.91      0.90      0.90
Naive Bayes     0.82      0.81      0.82
LinearSVC       0.88      0.87      0.88

For sub-level classification, decision trees outperform the other classifiers:

Classifier      Accuracy  F1-macro  F1-macro
Decision Tree   0.91      0.90      0.90
Naive Bayes     0.75      0.76      0.75
LinearSVC       0.80      0.82      0.82

Figure 4.4: Confusion Matrix for TS3

Most of the misclassified records are due to errors in the training set, or because the training set is not similar to the test data. For example, a Sequence generally exists only in STEP SEQUENCE, but there are ALARM examples which have a Sequence in them, which confuses the classifier. To address this issue, we have to produce examples of our own and train the classifier on them.

There have also been misclassifications when the device name used is different, as in the SW and ZR alarms.

Figure 4.5: Heat Map for the important features for the classes

As we can infer from the heat map, not all features are equally important for all classes; different classes have different features which decide their classification result. Comments seem to be important for ALARM, while bi-grams seem to be important for the SOLENOID class.

Figure 4.6: Input and Output GUI

The left window shows the ladder logic files which are input to our pre-processor. The pre-processor extracts the ladder in the form of instructions with device names, offsets, and Comments. The pre-processed data tables go through the feature engineering process to form the top level and sub-level data sets for classification. The right window shows the final output of the classification and conversion to JSON. The output right now covers only the ALARM data.
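The JSON export step can be sketched as follows. The field names here are illustrative assumptions, not the project's actual output schema:

```python
# Sketch: serialize classified ALARM records to JSON. Field names
# ("chunk_id", "top_level", "sub_labels") are hypothetical.
import json

records = [
    {"chunk_id": 12, "top_level": "ALARM",
     "sub_labels": ["T", "T", "A", "R", "F", "O"]},
]
out = json.dumps(records, indent=2)
print(out)
```

Writing `out` to a file with `json.dump` would produce the artifact shown in the right window.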

Chapter 5: Conclusion and Future Work

It is difficult to assess the quality of the results we have obtained so far, as there is no other work in this field to compare against. But we have met our goal of 90% accuracy. Our first objective was to get the features from the PLC code, most of which was achieved with the GX Works tool; the rest was generated through pattern recognition and ordering techniques.

Our second objective was to classify the top level data. We carried out experiments with different classifiers, and with different parameters for the Decision Tree classifier, on both the top level and sub-level data sets. A decision tree with entropy as the splitting criterion and a restricted depth (max depth = 5 to max depth = 7) worked best for the given data set. As the number of features is huge, we have to limit the depth of the tree to make sure the classifier does not overfit; however, limiting the number of leaf nodes did not help. None of the other classifiers worked well, as the data is discrete. We have to work towards making the training set stronger by collecting as many examples as possible, and we have to make sure the training set is correct, as it is the ground truth for classification.

Figure 5.1: Accuracy vs Max depth of the Decision tree classifier

From the above figure we can confirm that the best max depth depends on the number of classes in the data set. In the top level classification, accuracy increases initially until max depth = 5, after which it stays constant for a while, until max depth = 7; past that it decreases to 0.80 and stays constant thereafter. However, it worked differently for sub-level classification: the accuracy kept increasing until max depth = 12 and remained constant at 0.90 thereafter.
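The depth sweep behind a figure like the one above can be sketched as follows. This is a minimal sketch assuming scikit-learn; the iris data set is used only as a stand-in, since the thesis data is not available here:

```python
# Sketch: train a tree at each max_depth and record held-out accuracy,
# reproducing the shape of an accuracy-vs-depth curve.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def accuracy_by_depth(X, y, depths=range(1, 15)):
    """Return {depth: held-out accuracy} for a tuned entropy tree."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    scores = {}
    for d in depths:
        clf = DecisionTreeClassifier(criterion="entropy", max_depth=d)
        scores[d] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    return scores

# Illustrative run on a stand-in data set, not the thesis data
X, y = load_iris(return_X_y=True)
scores = accuracy_by_depth(X, y, depths=range(1, 8))
```

Plotting `scores` against depth gives the curve from which the plateau (and any subsequent drop) can be read off.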

Tuning the parameters of the decision tree and assigning weights to the features can help to improve the performance. There is a fine line between getting good performance and overfitting the data; an accuracy of 0.90 seems to work best with the unlabelled data. We observed that applying the bi-gram features to top level classification did not help performance the way it did in sub-level classification; nevertheless, the top features in the classification are the bi-gram instructions. From the decision tree, we can see that Comments play an important role in the classification. Thus, a few more Comments can be added to enhance the classification; the relevant Comments should guide towards the right class.

Our third objective was sub-level classification, which was achieved with a decent result of 90% accuracy. Sub-level classification greatly depends on the top level classification; we made sure the predicted labels belong to their respective top level class labels. Bi-gram ordering dramatically increased the performance of sub-level classification. Majority voting makes the classifier less prone to classifying sub labels incorrectly. There are pros and cons to majority voting, as we might end up classifying quite a few records as 'G' even though one of the classifiers had predicted them right. Classifying a record as 'G' sends it to the exception log, where it is given human attention later. This is a trade-off, but we choose to classify labels as 'G' rather than assign a wrong label.

One limitation of this work is the handling of logical instructions: the end goal of the project is to translate Ladder Logic to JSON, and we have not yet found a way to include the logical expressions in the JSON.

5.1 Future Work

We have developed a classification method for ladder logic that isolates and extracts the hierarchy of classes existing in the logic itself. This achieves the goal and the recognized need from the original background problem statement. We can now see part of what content exists in our ladder logic, and we can press further.

While we are able to compute on the 'what' of the logic (the top level and sub-level classifications), we are not able to compute on the 'how' of the logic. The 'how' of the logic involves the Boolean logic relationships between the provided and derived labels. This seems like extremely useful information to generate.

Expression Number  Instruction  Sub Label

N1   "LD"   P
N2   "LD"   S
N3   "AND"  C
N4   "LD"   S
N5   "ANI"  C
N6   "ANI"  C
N7   "ORB"
N8   "OR"   S
N9   "LD"   S
N10  "AND"  C
N11  "ORB"
N12  "LD"   S
N13  "AND"  C
N14  "ORB"
N15  "LD"   S
N16  "AND"  C
N17  "ORB"
N18  "OR"   SE
N19  "OR"   SE
N20  "ANB"
N21  "ANI"  O
N22  "OUT"  L

N22 = N1*((N2*N3)+(N4*N5*N6)+(N8)+(N9*N10)+(N12*N13)+(N15*N16))*N21

The above expression gives the logical relationship between the sub labels.

We can see that the expression can be derived using a stack. Every encounter of an 'LD' instruction pushes a new block; 'ORB' pops the top two blocks and combines them with an 'OR' operation, while 'ANB' combines them with an 'AND' operation. The first element that starts the expression is special: it is combined by 'AND' with the whole of the next part.
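Under the conventional instruction-list stack semantics (LD pushes a new block; AND/ANI extend the top block; OR adds a parallel contact to it; ORB/ANB combine the top two blocks), the derivation can be sketched as follows. The helper is illustrative, and the negation implied by ANI is omitted here, matching the simplified expression above:

```python
# Sketch: derive the Boolean expression from mnemonics with a stack,
# using the conventional instruction-list semantics. ANI/ORI negation
# is ignored to match the simplified expression in the text.

def to_expression(program):
    stack = []
    for op, arg in program:
        if op == "LD":
            stack.append(arg)                               # new block
        elif op in ("AND", "ANI"):
            stack.append(f"({stack.pop()}*{arg})")          # extend block
        elif op in ("OR", "ORI"):
            stack.append(f"({stack.pop()}+{arg})")          # parallel contact
        elif op == "ORB":
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a}+{b})")                      # OR two blocks
        elif op == "ANB":
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a}*{b})")                      # AND two blocks
        elif op == "OUT":
            return f"{arg} = {stack.pop()}"                 # assign coil

# Shortened fragment of the listing above, for illustration
prog = [("LD", "N1"), ("LD", "N2"), ("AND", "N3"), ("LD", "N4"),
        ("ANI", "N5"), ("ANI", "N6"), ("ORB", None), ("OR", "N8"),
        ("ANB", None), ("ANI", "N21"), ("OUT", "N22")]
print(to_expression(prog))
# N22 = ((N1*(((N2*N3)+((N4*N5)*N6))+N8))*N21)
```

Note that under this reading the 'SE'-labeled OR terms (N18, N19) would also appear in the expression; the text's simplified form omits them.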

This project focused only on Ladder Logic on Mitsubishi platforms, but the process can be extended to other platforms as well. If we want to revolutionize the software in the automotive industry, we have to target most of the automotive companies and get our product tested. Work should be done towards establishing a standard throughout the automotive industry, so that one classifier could deal with all kinds of Ladders.

We have classified only the ALARM, STEP SEQUENCE, and SOLENOID examples from the Ladder Logic. There are various other classes which come under the UNCLASSIFIED umbrella; a next step can include those classes in the classification data set. With the inclusion of platforms other than Mitsubishi, the introduction of new classes is in any case inevitable.

Performance improvement methods like boosting, bagging, or ensembles of classifiers can be used at a later point to reach the goal. As the data and its features grow, it will be difficult to classify all of the data with the rules created by a single decision tree. We can use the probability output of the decision tree to gauge confidence, and other classifiers can contribute to the final result.

Bibliography

[1] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code.

[2] Lin Tan, Ding Yuan, and Yuanyuan Zhou. HotComments: How to Make Program Comments More Useful?

[3] Akash Lal and Shaz Qadeer. A Program Transformation for Faster Goal-Directed Search.

[4] Lin Tan, Ding Yuan, Gopal Krishna, and Yuanyuan Zhou. /* iComment: Bugs or Bad Comments? */. University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.

[5] https://en.wikipedia.org/wiki/Decision_tree_learning

[6] https://en.wikipedia.org/wiki/N-gram

[7] Kevin Ellis and Sumit Gulwani. Learning to Learn Programs from Examples: Going Beyond Program Structure. In IJCAI 2017, May 1, 2017.

[8] Reudismam Rolim, Gustavo Soares, Loris D'Antoni, Oleksandr Polozov, Sumit Gulwani, Rohit Gheyi, Ryo Suzuki, and Björn Hartmann. Learning Syntactic Program Transformations from Examples. In ICSE 2017, April 28, 2017.

[9] Andre Luckow, Matthew Cook, Nathan Ashcraft, Edwin Weill, Emil Djerekarov, and Bennie Vorster. Deep Learning in the Automotive Industry: Applications and Tools.

[10] Sheng-Jen ('Tony') Hsieh and Patricia Yee Hsieh. Animations and Intelligent Tutoring Systems for Programmable Logic Controller Education.
