A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities Rishi Rabheru Hazim Hanif Sergio Maffeis Department of Computing Department of Computing Department of Computing Imperial College London Imperial College London; Imperial College London
[email protected] Faculty of Computer Science
[email protected] and Information Technology University of Malaya, Malaysia
[email protected] Abstract—This paper presents DeepTective, a deep learning of the source code. LSTM is good at finding patterns in approach to detect vulnerabilities in PHP source code. Our ap- sequential data but is not equally well suited to learn from proach implements a novel hybrid technique that combines Gated tree- or graph-structured data, which is a more natural repre- Recurrent Units and Graph Convolutional Networks to detect SQLi, XSS and OSCI vulnerabilities leveraging both syntactic sentation of program semantics. and semantic information. We evaluate DeepTective and compare In this paper we present DeepTective, a deep-learning based it to the state of the art on an established synthetic dataset and on vulnerability detection approach, which aims to combine both a novel real-world dataset collected from GitHub. Experimental syntactic and semantic properties of source code. results show that DeepTective achieves near perfect classification The first key decision in a machine learning based vulnera- on the synthetic dataset, and an F1 score of 88.12% on the realistic dataset, outperforming related approaches. We validate bility detection pipeline for source code is what code units DeepTective in the wild by discovering 4 novel vulnerabilities in constitute the samples. Many options are possible, ranging established WordPress plugins.