ISSN 1816-353X (Print) Vol. 22, No. 5 (September 2020) ISSN 1816-3548 (Online)

INTERNATIONAL JOURNAL OF NETWORK SECURITY

Editor-in-Chief Prof. Min-Shiang Hwang John C.S. Lui Department of Computer Science & Information Engineering, Asia Department of Computer Science & Engineering, Chinese University of University, Taiwan Hong Kong (Hong Kong) Kia Makki Co-Editor-in-Chief: Telecommunications and Information Technology Institute, College of Prof. Chin-Chen Chang (IEEE Fellow) Engineering, Florida International University (USA) Department of Information Engineering and Computer Science, Feng Chia University, Taiwan Gregorio Martinez University of Murcia (UMU) (Spain) Publishing Editors Sabah M.A. Mohammed Shu-Fen Chiou, Chia-Chun Wu, Cheng-Yi Yang Department of Computer Science, Lakehead University (Canada) Lakshmi Narasimhan Board of Editors School of Electrical Engineering and Computer Science, University of Newcastle (Australia) Ajith Abraham Khaled E. A. Negm School of Computer Science and Engineering, Chung-Ang University Etisalat University College (United Arab Emirates) (Korea) Wael Adi Joon S. Park Institute for Computer and Communication Network Engineering, Technical School of Information Studies, Syracuse University (USA) University of Braunschweig (Germany) Antonio Pescapè Sheikh Iqbal Ahamed University of Napoli "Federico II" (Italy) Department of Math., Stat. and Computer Sc. Marquette University, Chuan Qin Milwaukee (USA) University of Shanghai for Science and Technology (China) Vijay Atluri Yanli Ren MSIS Department Research Director, CIMIC Rutgers University (USA) School of Commun. & Infor. 
Engineering, Shanghai University (China) Mauro Barni Mukesh Singhal Dipartimento di Ingegneria dell’Informazione, Università di Siena (Italy) Department of Computer Science, University of Kentucky (USA) Andrew Blyth Tony Thomas Information Security Research Group, School of Computing, University of School of Computer Engineering, Nanyang Technological University Glamorgan (UK) (Singapore) Chi-Shiang Chan Mohsen Toorani Department of Applied Informatics & Multimedia, Asia University (Taiwan) Department of Informatics, University of Bergen (Norway) Chen-Yang Cheng Sherali Zeadally National Taipei University of Technology (Taiwan) Department of Computer Science and Information Technology, University Soon Ae Chun of the District of Columbia, USA College of Staten Island, City University of New York, Staten Island, NY Jianping Zeng (USA) School of Computer Science, Fudan University (China) Stefanos Gritzalis Justin Zhan University of the Aegean (Greece) School of Information Technology & Engineering, University of Ottawa (Canada) Lakhmi Jain School of Electrical and Information Engineering, Ming Zhao University of South Australia (Australia) School of Computer Science, Yangtze University (China) Chin-Tser Huang Mingwu Zhang Dept. of Computer Science & Engr, Univ of South Carolina (USA) College of Information, South China Agric University (China) James B D Joshi Yan Zhang Dept. of Information Science and Telecommunications, University of Wireless Communications Laboratory, NICT (Singapore) Pittsburgh (USA) Ç etin Kaya Koç PUBLISHING OFFICE School of EECS, Oregon State University (USA) Min-Shiang Hwang Department of Computer Science & Information Engineering, Asia Shahram Latifi Department of Electrical and Computer Engineering, University of Nevada, University, Taichung 41354, Taiwan, R.O.C. 
Las Vegas (USA) Email: [email protected] Cheng-Chi Lee International Journal of Network Security is published both in traditional Department of Library and Information Science, Fu Jen Catholic University paper form (ISSN 1816-353X) and in Internet (ISSN 1816-3548) at (Taiwan) http://ijns.jalaxy.com.tw Chun-Ta Li PUBLISHER: Candy C. H. Lin Department of Information Management, Tainan University of Technology (Taiwan) © Jalaxy Technology Co., Ltd., Taiwan 2005 23-75, P.O. Box, Taichung, Taiwan 40199, R.O.C. Iuon-Chang Lin Department of Management of Information Systems, National Chung Hsing University (Taiwan) Volume: 22, No: 5 (September 1, 2020)

International Journal of Network Security

1. Research on Malware Detection and Classification Based on Artificial Intelligence Li-Chin Huang, Chun-Hsien Chang, and Min-Shiang Hwang, pp. 717-727

2. A BLS Signature Scheme from Multilinear Maps Fei Tang and Dong Huang, pp. 728-735

3. Eighth Power Residue Double Circulant Self-Dual Codes Changsong Jiang, Yuhua Sun, and Xueting Liang, pp. 736-742

4. Identity-based Public Cryptographic Primitive with Delegated Equality Test Against Insider Attack in Cloud Computing Seth Alornyo, Acheampong Edward Mensah, and Abraham Opanfo Abbam, pp. 743-751

5. One-Code-Pass User Authentication Based on QR Code and Secret Sharing Yanjun Liu, Chin-Chen Chang, and Peng-Cheng Huang, pp. 752-762

6. Efficient Anonymous Ciphertext-Policy Attribute-based Encryption for General Structures Supporting Leakage-Resilience Xiaoxu Gao and Leyou Zhang, pp. 763-774

7. A Distributed Density-based Outlier Detection Algorithm on Big Data Lin Mei and Fengli Zhang, pp. 775-781

8. Binary Executable Files Homology Detection with Genetic Algorithm Jinyue Bian and Quan Qian, pp. 782-792

9. A New Diffusion and Substitution-based Cryptosystem for Securing Medical Image Applications L. Mancy and S. Maria Celestin Vigila, pp. 793-800

10. A LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation Shanshan Zhang, pp. 801-808

11. Verifiable Secret Sharing Based On Micali-Rabin's Random Vector Representations Technique Haiou Yang and Youliang Tian, pp. 809-814

12. A Privacy-Preserving Data Sharing System with Decentralized Attribute-based Encryption Scheme Li Kang and Leyou Zhang, pp. 815-827

13. Evidence Gathering of Facebook Messenger on Android Ming-Sang Chang and Chih-Ping Yen, pp. 828-837

14. Protection of User Data by Differential Privacy Algorithms Jian Liu and Feilong Qin, pp. 838-844

15. Verifiable Attribute-based Keyword Search Encryption with Attribute Revocation for Electronic Health Record System Zhenhua Liu, Yan Liu, Jing Xu, and Baocang Wang, pp. 845-856

16. An Unlinkable Key Update Scheme Based on Bloom Filters for Random Key Pre-distribution Bin Wang, pp. 857-862

17. Survey on Attribute-based Encryption in Cloud Computing P. R. Ancy, Addapalli V. N. Krishna, K. Balachandran, M. Balamurugan, and O. S. Gnana Prakasi, pp. 863-868

18. A Sequential Cipher Algorithm Based on Feedback Discrete Hopfield Neural Network and Logistic Chaotic Sequence Shoulin Yin, Jie Liu, and Lin Teng, pp. 869-873

19. A Novel Blockchain-based Anonymous Handover Authentication Scheme in Mobile Networks ChenCheng Hu, Dong Zheng, Rui Guo, AXin Wu, Liang Wang, and ShiYao Gao, pp. 874-884

20. New Publicly Verifiable Data Deletion Supporting Efficient Tracking for Cloud Storage Changsong Yang, Xiaoling Tao, and Qiyu Chen, pp. 885-896

International Journal of Network Security, Vol.22, No.5, PP.717-727, Sept. 2020 (DOI: 10.6633/IJNS.202009_22(5).01) 717

Research on Malware Detection and Classification Based on Artificial Intelligence

Li-Chin Huang1, Chun-Hsien Chang2, and Min-Shiang Hwang3,4 (Corresponding author: Min-Shiang Hwang)

Department of Information Management, Executive Yuan, Taipei 10058, Taiwan1
Department of Management Information Systems, National Chung Hsing University, Taiwan2
Department of Computer Science & Information Engineering, Asia University, Taichung, Taiwan3
500, Lioufeng Rd., Wufeng, Taichung 41354, Taichung, Taiwan, R.O.C.
Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan4
(Email: [email protected])
(Received Apr. 13, 2020; Revised and Accepted July 8, 2020; First Online July 9, 2020)

Abstract

Malware remains one of the major threats to network security. As the variety of network devices grows, malware increasingly targets not only computers but also mobile phones and Internet of Things (IoT) devices. Malicious software can alter the regular operation of the victim's machine, damage user files, steal private information, steal user permissions, and perform unauthorized activities on the device. For users, this causes inconvenience and also threatens property and information. If malware can be detected accurately and quickly and then dealt with, its impact can be reduced. To improve the accuracy and efficiency of malware detection, this article uses deep learning to study and implement high-precision classification models. We use convolutional neural networks (CNN) and long short-term memory (LSTM) as the primary training models. For CNN training, we use malware visualization: malware features are converted into images, and the input features and input methods are adjusted to find models with higher classification accuracy. For LSTM models, appropriate features and preprocessing methods are used to find a model with high classification accuracy. Finally, the accuracy of small-sample training is optimized by using samples produced by a generative network. In all of this training, we want to use samples of malware that affects different devices. In this article, we propose three research topics: 1) a high-precision model for malware with image input; 2) a high-precision model for malware with non-image input; 3) using these models, a generative adversarial network optimized for small-sample malware detection.

Keywords: Android Malware; Deep Learning; Machine Learning; Malware; Ransomware Detection

1 Introduction

Today, malware is still one of the significant cybersecurity threats [2,10]. According to Symantec's 2018 Cybersecurity Threat Report [23], the total number of variants of traditional malware affecting computers discovered in 2017 was as high as 669 million, an increase of 87.7% over 2016. For malware that affects mobile devices such as cell phones, the number of new variants rose from 17,000 to 27,000, an increase of about 55%. With the advancement and popularization of IoT technology, the amount of malware that affects IoT devices has also increased significantly in recent years. The report mentions that malware variants affecting IoT devices have grown by about 600%, which makes them a main threat to the current IoT device environment.

Common malware currently includes Kovter Trojans, worms, ransomware, spyware, and Coinminer. In most infections, attackers either exploit system and software security vulnerabilities to implant malware into the victim's device, or trick the victim into downloading a file that contains malware, which is then installed on the victim's computer. Different malware has different effects on infected devices. It may affect the regular operation of the device, damage user files, steal the user's private information, steal user rights, and perform unauthorized activities on the device. Most ransomware encrypts users' files and demands a ransom to unlock them, which poses a severe threat to users' data and property. In addition, with the prevalence of virtual currencies such as Bitcoin, malware that secretly uses the computing power of the victim's computer for mining has also become widespread in recent years.

Traditionally, malware detection methods are roughly divided into static analysis and dynamic analysis. Static analysis identifies malware through byte sequence analysis, control flow graphs, and opcode frequency distributions. Dynamic analysis uses monitoring tools to observe API calls and program behavior in a controlled environment (such as a sandbox or virtual machine). With the development of artificial intelligence technology, malware identification has excellent potential, and many current studies aim to improve its accuracy and efficiency. At present, most artificial intelligence techniques are used to improve detection based on static analysis.

Many methods for detecting and distinguishing malware have been proposed. In 2011, Nataraj et al. proposed a method for classifying malware after visualization [17]. Also in 2011, Santos et al. proposed a method for detecting malware using its opcode sequences; in 2013, they combined that method with tracking the behavior of malware after execution, joining static and dynamic analysis to detect malware [20, 21]. In 2014, Zolotukhin et al. proposed to use the n-gram method to find the essential features in the opcode sequence and then train support vector machines to detect malware [26]. In these studies, extracting only a small number of opcodes reduces the performance overhead, but with small training sets the accuracy may be severely affected. To address this problem, Zhang et al. proposed in 2017 a method that converts opcode sequences to images [25]. In the same year, Kwon et al. proposed to use API call patterns to identify the type of malware [15]. In 2018, Ni et al. proposed to hash the malware with the SimHash method and convert the result into an image used as a training sample [18]; Kim et al. proposed a method for detecting Android malware that uses multi-feature input for training [14]; Hamed et al. used LSTM to detect malware that affects IoT devices [8]; and Sajad et al. digitized ransomware execution records and then used CNN and LSTM to train classifiers that recognize ransomware [9].

The following evaluation criteria are commonly used to evaluate the effectiveness of this research topic:

1) Accuracy, recall rate, precision, and F-measure: These indicators are used to analyze the results of machine learning. Table 1 defines the quantities used in each calculation.

   Accuracy: The proportion of samples correctly classified by the classifier to the total number of samples:

   $$\text{Accuracy} = \frac{TP + TN}{FN + TP + FP + TN}$$

   This result indicates the classifier's ability to distinguish the entire sample set. However, in some cases this indicator fails. For example, if the data set contains 10,000 A samples and 100 B samples and the classifier judges all samples as A, the accuracy is still about 99%.

   Recall: The proportion of positive samples correctly classified by the classifier among all positive samples:

   $$\text{Recall} = \frac{TP}{FN + TP}$$

   Precision: The proportion of samples classified as positive by the classifier that are actually positive:

   $$\text{Precision} = \frac{TP}{TP + FP}$$

   The result indicates the actual accuracy of predicting positive samples.

   F-measure: If recall is as important as precision in the model, the F-measure can be used as an indicator. The calculation considers both precision and recall:

   $$\text{F-measure} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

2) Operating cost: Compare the time spent training and classifying samples, together with the required computing resources.

Table 1: The evaluation criteria

| | Actual True | Actual False |
|---|---|---|
| Prediction True | True Positive (TP) | False Positive (FP) |
| Prediction False | False Negative (FN) | True Negative (TN) |

2 Research on Malware Input by Image

2.1 Motivations

Many types of machine learning models can identify malware. At present, the best-performing and most popular are neural network models, of which the convolutional neural network (CNN) is one of the representatives. CNN is well suited to image recognition. When identifying malware, CNN can achieve a good classification effect if the training samples are input as images.

Nataraj et al. proposed the use of image processing technology to visualize and classify malware in 2011 [17]. This method converts binary malware files into grayscale images for classification. In 2013, Kancherla et al. proposed to convert malware executable files into grayscale images called byteplots and to use a Support Vector Machine (SVM) as the training tool [11]. The method proposed by Zhang et al. in 2017 decompiles the binary executable file, converts the opcode sequence into an image, and then identifies whether the file is malware [25]. In 2018, Ni et al. proposed to hash decompiled malware with the SimHash method for sequence similarity comparison and then train a CNN to distinguish different types of malware [18].

The studies above show that a lot of research has been done on visualizing malware and then analyzing it. Because the methods differ, we think it is still possible to find higher accuracy and efficiency in visualizing and analyzing malware. In addition, we need to examine how converting different data from malware, and applying different preprocessing to the image input, affect training. In the first research topic, we propose to use several different feature input methods to convert malware to images and then try different preprocessing methods on the images. Finally, the best feature selection and preprocessing methods in the experiment are obtained.

2.2 Related Works

This section reviews studies that use images of malware as training samples. Once malware is visualized, the resulting image can be processed to identify the malware. Nataraj et al. proposed a method for visualizing and automatically classifying malware in 2011 [17]. Figure 1 is a schematic of the study. Having observed that malware of the same family shows similarities in the texture and layout of the image, the study first converts binary malware files into 8-bit vectors and then converts the 8-bit vectors into grayscale images. Finally, k-nearest neighbor classification with Euclidean distance is used. On 9,458 samples in 25 categories, the classification accuracy was 98%.

The method proposed by Zhang et al. in 2017 decompiles the binary executable file into an opcode sequence. After the sequence is converted into an image, a convolutional neural network compares the target image with images generated from known malware and determines whether the file is malware [25]. The flow chart of the method is shown in Figure 2. First, after decompiling the unknown file, its opcode sequences and their frequencies of occurrence are collected. Next, information gain is used to select which features to use in training. After the selected features are converted to an image, training is conducted to identify malware. The data set used in this study included ten types of malware, with a total of 9,168 samples. The classification accuracy is about 93.7% to 96.7%.

In 2018, Ni et al. proposed the MCSC (Malware Classification using SimHash and CNN) method [18]. It hashes the decompiled malware with SimHash so that similar functions hash to similar sequences. After hashing, the results are converted into grayscale images, which are then used to train a CNN that classifies the malware. Figure 3 shows the research structure. The method first decompiles the malware file into opcodes and then executes SimHash. The calculation method of SimHash is shown in Figure 4. The results obtained by SimHash are converted into grayscale images and used as samples for CNN training. The method maintains a stable classification accuracy even for malware categories with relatively few samples, with an average accuracy of 98.86%.

2.3 Malware Detection Based on the High-precision Model for Image Input

This research topic focuses on finding the model with the highest accuracy and performance when malware is supplied as image input. The research architecture is shown in Figure 5.

This research topic will use different methods in the steps that convert samples into images. We evaluate how selecting appropriate features, as opposed to converting the file directly to an image, affects accuracy. Then, after conversion to an image, we explore whether different image processing methods can improve accuracy. Finally, we adjust the training model to obtain a classifier that classifies accurately.

This research topic will collect, for use in the research, malware samples that affect different devices, for example traditional computers, mobile phones, and IoT devices. When converting the malware into images, we will use two different methods. One is to extract features from the malware, mainly based on opcodes, and then convert the features into images; the other is to convert the malware into an image directly. After the image is processed, it is used as a training sample.

The convolutional neural network (CNN) is the most widely used deep learning technique in image processing [1]. We will input the samples obtained in the previous step to a CNN for training. A CNN is roughly composed of convolutional layers, pooling layers, and fully connected layers, as shown in Figure 6.

[Figure 1 diagram: malware binary (01000010101000101001010100001...) → convert binary to 8-bit vector → convert 8-bit vector to gray-level image → KNN classification]

Figure 1: Method for visualizing and automatically classifying malware [17]

Figure 2: Flow chart of Zhang et al.’s method [25]

Figure 3: MCSC architecture

C1 and C2 in Figure 6 are convolutional layers. A convolutional layer consists of convolution units. The input malware image and the feature detector (filter) are convolved. After applying an appropriate activation function (such as ReLU), we get the resulting output. In a multi-layer network, the convolution operation extracts features from the input and iteratively builds higher-level, more complex features from low-level features such as lines.

The pooling layer processes the output of the convolutional layer and can be seen as a sub-sampling process. Taking max pooling as an example, if the pooling window is 2 × 2, the output is the maximum value in each 2 × 2 block. After processing by the pooling layer, the data size becomes smaller. Therefore, the number of parameters is also reduced, which helps reduce overfitting.

Figure 7 is an example of the convolution operation and max pooling. The yellow box in the figure is the convolution of a 3 × 3 area with the feature detector; the red block is the max pooling of a 2 × 2 area.

Finally, the fully connected layer flattens the previous output, feeds it into a neural network, and obtains the final output from that network. In this research topic, we first use the traditional CNN model, trained on malware samples that affect different devices. After finding a better image input and processing method, we adjust the CNN model to obtain better accuracy and performance.
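As an illustration of the convolution, activation, and pooling steps described above, the following is a minimal pure-Python sketch. It is not the model used in this study: the 6 × 6 input, the 3 × 3 vertical-edge filter, and all function names are hypothetical, and, as in most deep learning frameworks, the "convolution" is computed without flipping the filter.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative response with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling; incomplete edge blocks are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 6x6 grayscale patch with a bright vertical stripe.
image = [[0, 0, 9, 9, 0, 0] for _ in range(6)]
# Hypothetical 3x3 filter that responds to a bright-to-dark edge (left to right).
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

feature_map = relu(conv2d_valid(image, kernel))  # 4x4 after a 3x3 filter
pooled = max_pool(feature_map, size=2)           # 2x2 after 2x2 pooling
print(pooled)
```

The 6 × 6 input shrinks to 4 × 4 after the 3 × 3 filter and to 2 × 2 after pooling, which is the reduction in data size that the text credits with lowering the parameter count.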

Figure 4: Example of SimHash method
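The general SimHash idea referred to in Figure 4 can be sketched as follows. This is a generic textbook version, not the exact procedure of Ni et al. [18]: the use of MD5 as the per-token hash, the 64-bit fingerprint length, the frequency weighting, and the function names are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def simhash(tokens, bits=64):
    """Return a `bits`-bit SimHash fingerprint of a token sequence (bits <= 128)."""
    totals = [0] * bits
    for token, weight in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")  # 128-bit integer hash of the token
        for i in range(bits):
            # Each token votes +weight or -weight on every bit position.
            totals[i] += weight if (value >> i) & 1 else -weight
    # A fingerprint bit is 1 where the positive votes win.
    return sum(1 << i for i, total in enumerate(totals) if total > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical opcode sequences of two functions.
fp_a = simhash(["push", "mov", "mov", "call", "ret"])
fp_b = simhash(["push", "mov", "mov", "call", "pop", "ret"])
print(hamming(fp_a, fp_b))
```

Because every token only votes on bit positions, sequences that share most opcodes tend to produce fingerprints with a small Hamming distance, which is the property that lets similar code map to similar images after conversion.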

[Figure 5 diagram: Malware File → Feature Selection → Malware Image Generation → Image Processing → CNN Model → Malware Classification; a Binary Malware File can also enter Malware Image Generation directly]

Figure 5: A research framework for the malware detection based on the high-precision model for image input
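For the direct route in this framework, where the binary file is turned into an image without feature extraction, a minimal sketch in the spirit of the 8-bit grayscale conversion of Nataraj et al. [17] could look as follows. The fixed row width of 256 pixels, the choice to drop an incomplete last row, and the function name are assumptions for illustration only.

```python
def bytes_to_gray_rows(data: bytes, width: int = 256):
    """Interpret each byte as one 8-bit pixel (0 = black, 255 = white).

    The byte stream is cut into rows of `width` pixels; an incomplete
    final row is dropped (padding it would be an equally valid choice).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    usable = len(data) - len(data) % width
    return [list(data[i:i + width]) for i in range(0, usable, width)]


# Hypothetical 10-byte "file" laid out four pixels per row.
rows = bytes_to_gray_rows(bytes(range(10)), width=4)
print(rows)
```

In practice the bytes would be read from a sample with `open(path, "rb").read()`, and the resulting rows would be resized or otherwise preprocessed before being given to the CNN.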

Figure 6: Example of CNN architecture

Figure 7: Convolution operation and maximum pooling

Many sample sets are currently publicly available. However, malware samples that have appeared only recently may be missing from them. There are two main solutions: one is to cooperate with a malware collection platform to obtain fresh samples; the other is to establish a platform and encourage users to upload malware samples. Several malware data sets are currently open to the public and are expected to be used in this research:

1) Microsoft Malware Classification Challenge [16];
2) Endgame Malware BEnchmark for Research (EMBER) [4];
3) Android malware data set [24];
4) AndroZoo [3];
5) CTU-13 [22].

3 Research on Malware Input by Non-image

3.1 Motivations

In addition to using image input for machine learning on malware samples, directly using opcodes or character string sequences as training input is also one of the main methods. As with image input, the selection of input features, the preprocessing of the data, and the adjustment of the model all affect the accuracy and efficiency of classification.

Kim et al. proposed a multimodal deep learning method that uses multiple types of features to detect Android malware. The method extracts various kinds of features from decompiled apk files, for example strings, permissions, and API calls. A Multimodal Neural Network (MNN) is then trained on them and used to determine whether the input file is benign software or malware [14]. In 2018, Hamed et al. used an RNN (Recurrent Neural Network) with LSTM (Long Short-Term Memory) and took the opcodes of IoT malware as input to train a model that judges whether a sample is benign or malware [8]. Sajad et al. recorded and digitized the actions of ransomware while it was running and then trained LSTM and CNN models separately to distinguish benign software from ransomware and to identify the ransomware type [9].

The subject of this research is to test different types of samples (traditional computer malware, mobile malware, etc.) based on the features that can be selected from them, find the combination with the best accuracy and efficiency, and compare it with the best-performing image-based results obtained in the previous stage.

3.2 Related Works

The subject of this study uses non-image data as sample input. This section introduces research on different types of malware (such as Android, IoT, and ransomware).

In 2018, Kim et al. proposed an Android malware detection framework that uses multimodal deep learning with multiple input features [14]. Its architecture is shown in Figure 8. First, the apk file is decompiled to extract various types of features. The study divides them into seven categories: string features, method opcode features, method API features, shared library function opcode features, permission features, component features, and environment features. After the feature vectors are generated from the features, a multimodal neural network is trained, and input files can then be classified as benign software or malware. The study used two sets of samples: 1,075 malware and 19,417 benign samples, and 1,209 malware and 1,300 benign samples. The model achieved 98% accuracy and an F-measure of 0.99 on the first set, and 99% accuracy and an F-measure of 0.99 on the second set.

Due to the rapid increase in malware targeting IoT devices, Hamed et al. used LSTM to detect IoT malware in 2018 [8]. Figure 9 shows the research architecture. After the sample is decompressed and decompiled, the opcodes are extracted and selected as features to generate the feature vector, which is then used to train an RNN with LSTM. The study used 281 malware samples and 270 benign samples, and trained three different LSTM configurations (one to three layers). 100 samples not used in training were then used to evaluate the performance of the model, and the results were compared with random forest, support vector machine, KNN, and other classification models. The two-layer LSTM configuration had the highest accuracy, 98.18%.

Sajad et al. proposed the DRTHIS (Deep Ransomware Threat Hunting and Intelligence System) method in 2018 [9]. The research architecture is shown in Figure 10. It is roughly divided into three parts: data conversion, threat detection (deciding whether a sample is ransomware), and identification of the kind of ransomware. The method records and digitizes the actions performed within the first 10 seconds after the file is executed. The digitized data is merged with the labels into a training data set. Then a binary classifier and a ransomware-type classifier are trained with CNN and LSTM models. The result is a system that can distinguish benign from malicious software and identify the type of ransomware. In this study, the LSTM model obtained relatively good results. Three different types of ransomware samples (Locky, Cerber, TeslaCrypt) were used in the experiment, with 220 samples of each ransomware type and 219 benign samples used for training. The F-measure is 0.96, and the true positive rate is 97.2%. The study also tested other types of ransomware not used for training: 99% of CryptWall samples, 75% of TorrentLocker samples, and 92% of Sage samples were correctly classified.

3.3 Malware Detection Based on the High-precision Model for Non-image Input

For this research topic, we focus on non-image data as the sample input. The research architecture is shown in Figure 11.

The input samples are adjusted according to different feature selection methods and combinations. At the same time, there are many variants of LSTM, for example GRU (Gated Recurrent Unit). Their accuracy and performance are also worth investigating.
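The accuracy, F-measure, and true positive rate (recall) values quoted for the related work follow the definitions given with Table 1. As a minimal sketch, they can be computed from the four confusion-matrix counts as shown below; the function name and the convention of returning 0.0 when a denominator is zero are assumptions for illustration. The first example reproduces the imbalanced case from the Introduction (10,000 A samples, 100 B samples, everything judged as A), taking B as the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# Imbalanced example: all 10,100 samples are predicted as class A (negative).
imbalanced = classification_metrics(tp=0, fp=0, fn=100, tn=10_000)
print(round(imbalanced["accuracy"], 4), imbalanced["recall"])

# Hypothetical balanced-error example.
balanced = classification_metrics(tp=8, fp=2, fn=2, tn=88)
print(balanced)
```

In the imbalanced case the accuracy stays near 99% while recall and F-measure are zero, which is why accuracy alone is not a sufficient criterion when malware and benign samples are unevenly represented.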

[Figure 8 diagram: Raw Data Extraction → Feature Extraction → Feature Vector Generation → Multimodal Neural Net. Manifest files (decoding): permission / components / environment information. Dex files (decoding): string / Dalvik opcode frequency / API invocation frequency. Shared library (disassembling): opcode frequency. A feature vector is generated for each kind of feature; each vector is input separately to its own DNN, and a final net outputs malware or benign.]

Figure 8: The framework of Kim et al. [14]

[Figure 9 diagram: ARM ELF Files → ELF Parser (updating ELF file, selecting and decompiling ELF file) → Feature Extraction (extracting opcode, creating feature vector) → Deep Threat Classifier (deep net tuning, deep training)]

Figure 9: Using LSTM to detect IoT malware [8]
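The feature-vector step in this pipeline has to turn a variable-length opcode sequence into fixed-size numeric input for a recurrent model. The sketch below shows one common way to do that (integer encoding with padding); it does not reproduce the feature selection of Hamed et al. [8], and the vocabulary scheme, the padding id 0, the unknown-opcode id, and the function names are assumptions.

```python
def build_vocab(sequences):
    """Map each distinct opcode to an integer id; 0 is reserved for padding."""
    vocab = {}
    for seq in sequences:
        for op in seq:
            vocab.setdefault(op, len(vocab) + 1)
    return vocab


def encode(seq, vocab, length):
    """Encode one opcode sequence as a fixed-length list of ids.

    Opcodes not seen when the vocabulary was built share one 'unknown' id;
    sequences are truncated or right-padded with 0 to `length`.
    """
    unknown = len(vocab) + 1
    ids = [vocab.get(op, unknown) for op in seq[:length]]
    return ids + [0] * (length - len(ids))


# Hypothetical opcode sequences extracted from two training samples.
training = [["push", "mov", "call"], ["mov", "mov", "ret"]]
vocab = build_vocab(training)
encoded = encode(["mov", "ret", "xor"], vocab, length=5)
print(vocab, encoded)
```

A sequence encoded this way can be fed to an embedding layer followed by an LSTM or GRU; the fixed length keeps batches rectangular.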

[Figure 10 diagram: Data Transformation (data transformation, combining and labeling) → Threat Hunting (deep learning: training binary classifier → binary classifier, one-class classifier) → Threat Intelligence (deep learning: training multi-class classifier → multi-class classifier)]

Figure 10: DRTHIS architecture [9]

[Figure 11 diagram: Malware File → Feature Selection → LSTM Model → Malware Classification]

Figure 11: Research structure for the malware implemented with the high-precision model for non-image input
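The LSTM model in this framework updates its memory through an input gate, a forget gate, and an output gate. As a hedged illustration of how the three gates interact, the sketch below implements one time step of a standard LSTM cell with a single scalar input and a single hidden unit. The parameter layout and names are hypothetical, and a real model would use vectors, matrices, and learned weights from a deep learning library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each of "i", "f", "o", "g" to a tuple (w, u, b):
    input weight, recurrent weight, and bias for that gate.
    """
    def pre(name):
        w, u, b = params[name]
        return w * x + u * h_prev + b

    i = sigmoid(pre("i"))      # input gate: how much new content enters memory
    f = sigmoid(pre("f"))      # forget gate: how much old memory is kept
    o = sigmoid(pre("o"))      # output gate: how much memory is exposed
    g = math.tanh(pre("g"))    # candidate content computed from the input
    c = f * c_prev + i * g     # new long-term memory (cell state)
    h = o * math.tanh(c)       # new output (hidden state)
    return h, c


# With all-zero parameters every gate is 0.5 and the candidate is 0.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("i", "f", "o", "g")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h, c)
```

The additive update `c = f * c_prev + i * g` is the self-loop whose weight the gates control; a GRU replaces the separate input and forget gates with a single update gate.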

First, we select the features of malware samples based on their category characteristics. For example, malware that affects mobile phones has functions such as permission requirements, which can also be used as features. At this stage, we use LSTM and its variants as the training model.

When dealing with non-image input, especially sequence data, LSTM is a very suitable model. Because a traditional RNN propagates information through the recurrent connection between each node and the previous node, it suffers from the vanishing gradient problem. In LSTM, the weight of the self-loop can be changed through the input gate, output gate, and forget gate:

• The function of the input gate is to determine whether to add the current input to the long-term memory.

• The function of the output gate is to determine whether the current memory content is passed to the output.

• The function of the forget gate is to determine whether the information carried over from the previous step should be kept in memory or forgotten.

Therefore, even when the model parameters are fixed, the problem of vanishing or exploding gradients can be alleviated. Figure 12 shows the architecture of the LSTM model.

There are many variations of LSTM. One of them is GRU (Gated Recurrent Unit). GRU merges the input gate and the forget gate of LSTM into one update gate. Compared with LSTM, GRU has the advantage of simplifying the calculation. If its prediction performance is similar to that of LSTM, it may improve efficiency.

4 Research on Malware and Generative Adversarial Networks

4.1 Motivations

In current research, the results of machine learning depend on the number of samples. However, in the face of new malware threats, only a limited number of samples can be collected in a short period. The Generative Adversarial Network (GAN), which has emerged in recent years, has considerable potential in this regard [6, 7, 12]. A GAN is composed of a generative model and a discriminative model. The generative model takes random samples as input, and its output should be as close as possible to the real samples in the training set so as to deceive the discriminative network. The discriminative model takes real samples or the output of the generative model as input and distinguishes the non-real samples as well as possible. These two networks compete with each other and adjust their parameters, which ultimately enables the generating network to generate samples whose authenticity the discriminating network cannot identify.

Most GANs are used to process image samples. When this method is applied to the classification of malware, we convert the samples into image input and study the accuracy of classifying a small number of samples using GANs. In addition, there are currently several variants of GANs. Improving the accuracy or efficiency of malware classification with them is also one of the research directions.

4.2 Related Works

In 2017, Kim et al. proposed the use of generative adversarial networks for malware classification [13]. In 2018, they continued this research and discussed protection measures against zero-day attacks [12]. Figure 13 shows its research architecture.

Traditional malware detection mechanisms usually rely on existing features, so their defense effect against zero-day attacks is limited. Therefore, the method proposed in that study first visualizes the malware and then uses a generative adversarial network (GAN) for training. The GAN generates pseudo samples and adjusts its parameters through continuous confrontation with the discriminative network. Finally, the generated fake samples are similar to the actual samples, but not the same, so they can mimic variants of a virus.

The detector part uses an autoencoder to learn the features of the malware and feeds them back to the generative network so that the generator trains stably. The classification accuracy rate is 95.74%. In the experiments of that study, noise-added sample images were also used to simulate virus variants. The SSIM (Structural Similarity Index) between a sample after adding noise and the original sample is 0.6 to 0.69. The accuracy rate for these samples is between 98.16% and 98.99%, which indicates that the method also has good detection ability for new virus variants.

4.3 Malware Detection Based on GAN for Generating Small-sample Malware

For this research topic, we use the best performing image input and processing methods from the first research topic to process the samples used at this stage. We then use a GAN for training to improve the training effect with a small number of samples. It can be expected that when new types of malware appear, they can be effectively detected even with a small number of samples.

The research architecture at this stage is shown in Figure 14. In this research stage, we use a malware data set with a small number of samples, obtained by extracting a certain number of samples from the previous data sets. We then use a GAN as the model for this training phase.

The concept of GAN was proposed by Goodfellow et al. in 2014 [7]. Figure 15 shows the underlying architecture of the GAN.
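For reference, the adversarial training described above is usually written as the minimax objective from Goodfellow et al. [7]. The symbols below (generator $G$, discriminator $D$, data distribution $p_{\text{data}}$, noise prior $p_z$) follow that paper's notation and are not otherwise used in this article:

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes $V$ by scoring real samples high and generated samples low, while the generator minimizes it by producing samples that the discriminator scores as real.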

Figure 12: LSTM model
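The gating behavior described for the LSTM can be sketched for a single scalar cell as follows. This is a minimal illustration using the standard LSTM update equations, not the model trained in this study; the function names, the weight dictionary `w`, and its keys are assumptions made for the example.

```python
import math


def sigmoid(v):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    w maps a gate name to a (weight_on_x, weight_on_h, bias) triple
    (hypothetical layout chosen for this sketch).
    Returns the new output h and the new memory cell c.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    i = sigmoid(pre("input"))    # how much of the new candidate enters memory
    f = sigmoid(pre("forget"))   # how much of the old memory is kept
    o = sigmoid(pre("output"))   # how much of the memory is exposed as output
    g = math.tanh(pre("cand"))   # candidate memory content from the current input
    c = f * c_prev + i * g       # self-loop on the memory cell, scaled by the forget gate
    h = o * math.tanh(c)
    return h, c


# Forget gate open, input gate closed: the old memory is carried over unchanged.
keep = {"input": (0, 0, -20), "forget": (0, 0, 20),
        "output": (0, 0, 20), "cand": (1, 0, 0)}
h, c = lstm_step(0.5, 0.0, 0.8, keep)
```

With the forget gate saturated near 1 and the input gate near 0, the memory cell passes through the step essentially unchanged, which is the mechanism that lets gradients survive over long sequences.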

[Figure 13 diagram: Generating Malware Image → Training by Using tDCGAN → Malware Type Classification From Its Pattern]

Figure 13: The research framework of Kim et al. [12]

[Figure 14 diagram: Malware Dataset → Malware Image Generation → GAN Model / DCGAN Model → Malware Classification]

Figure 14: Research structure of malware detection based on GAN for generating small-sample malware

[Figure 15 diagram labels: Random Index, Noise, Generator, Fake Samples, Real Samples, Discriminator]

Figure 15: Generative adversarial network model

A GAN consists of a generating network and a discriminative network. The generating network takes random samples as input and produces output intended to deceive the discriminative network. The discriminative network takes real samples or the generating network's output as input and distinguishes non-real samples as well as possible. The two networks compete with each other and keep adjusting their parameters until the samples produced by the generating network are almost real. Using this technology, we can enlarge the sample set for malware with only a small number of samples, and combine the results of the previous two research topics to optimize the accuracy of malware analysis for a small number of samples. GAN is not well suited to learning discrete data, such as text. Therefore, in this part, images are still used as the sample input form.

In addition, many studies have proposed variants of GAN, for example DCGAN, proposed in [19], and WGAN, proposed in [5].

5 Conclusions

In this article, we have proposed three research topics: 1) Research on malware input by image: malware detection based on the high-precision model for image input; 2) Research on malware input by non-image: malware detection based on the high-precision model for non-image input; 3) Research on malware and generative adversarial networks: malware detection based on GAN for generating small-sample malware.

This article collected and used malware samples that affect different devices for training. They include malware that affects computers, as well as malware and ransomware that affect mobile phones. Malware that affects different devices has different features. For example, due to the apk file structure of mobile phones, malware that affects mobile phones has many characteristics that differ from those of malware that affects computers. Another example is the permission functions and component functions mentioned in the related research. The goal of this article is to propose a high-precision model for different types of samples.

Acknowledgments

This research was partially supported by the Ministry of Science and Technology, Taiwan (ROC), under contract no.: MOST 108-2410-H-468-023 and MOST 108-2622-8-468-001-TM1.

References

[1] D. S. Abdul Minaam, E. Amer, “Survey on machine learning techniques: Concepts and algorithms,” International Journal of Electronics and Information Engineering, vol. 10, no. 1, pp. 34–44, 2019.
[2] A. A. Al-khatib, W. A. Hammood, “Mobile malware and defending systems: Comparison study,” International Journal of Electronics and Information Engineering, vol. 6, no. 2, pp. 116–123, 2017.
[3] K. Allix, T. F. Bissyandé, J. Klein, Y. Le Traon, “AndroZoo: Collecting millions of android apps for the research community,” in IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR’16), 2016.
[4] H. S. Anderson, P. Roth, “EMBER: An open dataset for training static PE malware machine learning models,” arXiv preprint, arXiv: 1804.04637, 2018.
[5] M. Arjovsky, S. Chintala, L. Bottou, “Wasserstein GAN,” arXiv preprint, arXiv: 1701.07875, 2017.
[6] J. Bruner, A. Deshpande, Generative Adversarial Networks for Beginners, July 8, 2020. (https://github.com/jonbruner/generative-adversarial-networks/blob/master/gan-notebook.ipynb)
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), vol. 2, pp. 2672–2680, 2014.
[8] H. HaddadPajouh, A. Dehghantanha, R. Khayami, K. K. R. Choo, “A deep recurrent neural network based approach for internet of things malware threat hunting,” Future Generation Computer Systems, vol. 85, pp. 88–96, 2018.
[9] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, R. Khayami, K. K. R. Choo, D. E. Newton, “DRTHIS: Deep ransomware threat hunting and intelligence system at the fog layer,” Future Generation Computer Systems, vol. 90, pp. 94–104, 2019.
[10] S. Islam, H. Ali, A. Habib, N. Nobi, M. Alam, D. Hossain, “Threat minimization by design and deployment of secured networking model,” International Journal of Electronics and Information Engineering, vol. 8, no. 2, pp. 135–144, 2018.
[11] K. Kancherla, S. Mukkamala, “Image visualization based malware detection,” in IEEE Symposium on Computational Intelligence in Cyber Security (CICS’13), pp. 40–44, 2013.
[12] J. Y. Kim, S. J. Bu, S. B. Cho, “Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders,” Information Sciences, vol. 460–461, pp. 83–102, 2018.
[13] J. Y. Kim, S. J. Bu, S. B. Cho, “Malware detection using deep transferred generative adversarial networks,” in International Conference on Neural Information Processing, pp. 556–564, 2017.
[14] T. Kim, B. Kang, M. Rho, S. Sezer, E. G. Im, “A multimodal deep learning method for Android malware detection using various features,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 3, pp. 773–788, 2019.
[15] I. Kwon, E. G. Im, “Extracting the representative API call patterns of malware families using recurrent neural network,” in Proceedings of the International Conference on Research in Adaptive and Convergent Systems, ACM, pp. 202–207, 2017.
[16] Microsoft, Microsoft Malware Classification Challenge (BIG 2015), July 8, 2020. (https://www.kaggle.com/c/malware-classification)
[17] L. Nataraj, S. Karthikeyan, G. Jacob, B. S. Manjunath, “Malware images: Visualization and automatic classification,” in Proceedings of the Eighth International Symposium on Visualization for Cyber Security, pp. 311–320, 2011.
[18] S. Ni, Q. Qian, R. Zhang, “Malware identification using visualization images and deep learning,” Computers & Security, vol. 77, pp. 871–885, 2018.
[19] A. Radford, L. Metz, S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” Under review as a conference paper at ICLR 2016, 2016. (https://arxiv.org/pdf/1511.06434.pdf)
[20] I. Santos, F. Brezo, X. Ugarte-Pedrero, P. G. Bringas, “Opcode sequences as representation of executables for data mining based malware variant detection,” Information Sciences, vol. 231, no. 9, pp. 64–82, 2011.
[21] I. Santos, J. Devesa, F. Brezo, J. Nieves, “OPEM: A static-dynamic approach for machine learning based malware detection,” in Proceedings of International Conference (CISIS’12), pp. 271–280, 2013.
[22] Stratosphereips Lab., The CTU-13 Dataset. A Labeled Dataset with Botnet, Normal and Background Traffic, July 8, 2020. (https://www.stratosphereips.org/datasets-ctu13/)
[23] Symantec, Internet Security Threat Report, vol. 23, 2018. (https://www.symantec.com/content/dam/symantec/docs/reports/istr-23-2018-en.pdf)
[24] F. Wei, Y. Li, S. Roy, X. Ou, W. Zhou, “Deep ground truth analysis of current android malware,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp. 252–276, 2017.
[25] J. Zhang, Z. Qin, H. Yin, L. Ou, Y. Hu, “IRMD: Malware variant detection using opcode image recognition,” in IEEE International Conference on Parallel and Distributed Systems (ICPADS’17), pp. 1175–1180, 2017.
[26] M. Zolotukhin, T. Hamalainen, “Detection of zero-day malware based on the analysis of opcode sequences,” in Proceedings of The 11th Annual IEEE CCNC Security, Privacy and Content Protection, pp. 386–391, 2014.

Biography

Li-Chin Huang received the B.S. in computer science from Providence University, Taiwan, in 1993; the M.S. in information management from Chaoyang University of Technology (CYUT), Taichung, Taiwan, in 2001; and the Ph.D. degree in computer and information science from National Chung Hsing University (NCHU), Taiwan in 2001. Her current research interests include information security, cryptography, medical image, data hiding, network, security, big data, and mobile communications.

Chun-Hsien Chang received the B.S. from the Department of Mechanical Engineering, National Cheng Kung University, Taiwan in 2017, and the M.S. from the Department of Management Information Systems, National Chung Hsing University, Taiwan, in 2019. His current research interests include artificial intelligence and information security.

Min-Shiang Hwang received the M.S. in industrial engineering from National Tsing Hua University, Taiwan in 1988, and a Ph.D. degree in computer and information science from National Chiao Tung University, Taiwan, in 1995. He was a professor and Chairman of the Department of Management Information Systems, NCHU, during 2003-2009. He was also a visiting professor at the University of California (UC), Riverside, and UC Davis (USA) during 2009-2010. He was a distinguished professor of the Department of Management Information Systems, NCHU, during 2007-2011. He obtained the 1997, 1998, 1999, 2000, and 2001 Excellent Research Award of National Science Council (Taiwan). Dr. Hwang was a dean of the College of Computer Science, Asia University (AU), Taichung, Taiwan. He is currently a chair professor with the Department of Computer Science and Information Engineering, AU. His current research interests include information security, electronic commerce, database and data security, cryptography, image compression, and mobile computing. Dr. Hwang has published over 300 articles on the above research fields in international journals.

International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 728

A BLS Signature Scheme from Multilinear Maps

Fei Tang1 and Dong Huang2,3 (Corresponding author: Fei Tang)

School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications1
Chongqing, China
Chongqing University of Science and Technology 2
Chongqing Vocational and Technical University of Mechatronics3
(Email: [email protected])
(Received Mar. 12, 2019; Revised and Accepted Dec. 10, 2019; First Online Apr. 6, 2020)

Abstract

The security of the BLS signature scheme is based on the random oracle model. This raises the question of how to instantiate the BLS scheme without random oracles. In this work, we answer this question by using a powerful tool, multilinear maps. The main contributions of this work are as follows. First, we describe the BLS scheme in the setting of multilinear groups and prove its security in the standard model. Then, we design a ring signature scheme based on the multilinear BLS scheme. In the proposed scheme, ring signatures consist of a single multilinear group element.

Keywords: BLS Signatures; Multilinear Map; Ring Signatures; Standard Model

1 Introduction

Digital signatures are one of the most fundamental and well studied cryptographic primitives. Research on digital signatures has two aspects: practicability and security. For practicability, we mainly consider the efficiency of the signing and verification algorithms, the storage space of the state information, and the properties that the scheme can provide, such as aggregate signatures [17], ring signatures [23], blind signatures [16,19,20] and so on. For security, we mainly focus on the assumptions that the scheme is based on, such as one-way functions, and on whether the proof is in the random oracle model or the standard model.

At ASIACRYPT 2001, Boneh, Lynn, and Shacham [6] designed a short signature scheme based on bilinear groups, the so-called BLS scheme. The BLS scheme has been shown to be very useful for constructing other cryptographic primitives, such as threshold signature schemes [7], blind signature schemes [7], signcryption schemes [10], and the key-generation algorithm of identity-based encryption (IBE) schemes [4]. The main reason the BLS scheme is so useful is that it has a relatively simple structure. The security of the BLS scheme relies on the random oracle model. However, it is well known that the random oracle is an idealized model. After the BLS scheme, several works presented signature schemes that are secure in the standard model, such as [3]. However, none of these schemes preserves the BLS scheme's simple structure. This leads us to a question: Can we instantiate the BLS signature scheme without random oracles? In this work, we answer this question by using a powerful tool, multilinear maps.

The notion of multilinear maps was introduced (but without a concrete instantiation) by Boneh and Silverberg [8]. It was not until 2013 that Garg, Gentry, and Halevi [12] gave the first approximate candidate. Since then, many multilinear map schemes have been proposed or analyzed, e.g., [9, 15, 21].

Several relevant works have addressed the above question. Hohenberger et al. [18] instantiated the random oracle with an actual family of hash functions for the BLS scheme by using a more powerful tool, indistinguishability obfuscation [13]. Freire et al. [11] took advantage of multilinear maps to realize programmable hash functions and construct IBE, BLS signature, and SOK non-interactive key exchange schemes. In addition, Hohenberger et al. [17] made use of the multilinear BLS scheme to construct an (identity-based) aggregate signature scheme which admits unrestricted aggregation.

In [17], Hohenberger et al. followed the Waters [25] framework and proved the adaptive security of the multilinear BLS signature scheme.¹ However, their proof is based on a strong new assumption, the $(n, k)$-modified multilinear computational Diffie-Hellman exponent assumption, where $n$ is a polynomial in the number of queries made by the adversary.

¹ Their adaptive proof of the aggregate signature scheme implies this result. (Please refer to Appendix D.2 of [17] for details.)

In this work, we also give an adaptive proof for the multilinear BLS scheme. However, our proof, which takes advantage of the technique of admissible hash functions [2], is based on a weaker assumption, the multilinear computational Diffie-Hellman assumption [12]. In addition, we consider the applications of the multilinear BLS signature scheme. It can serve as the key-generation algorithm of the multilinear IBE scheme [11]. It can also be used to construct aggregate signatures [17], threshold signature schemes [7] and so on. In this work, we take advantage of the structure of the multilinear BLS scheme to construct a ring signature scheme in the standard model. In a ring signature scheme [23], a signer can generate signatures on behalf of a group of users (i.e., a ring) if and only if he is a member of the ring. Any verifier can then confirm that the message has been signed by one of the members of the ring, but cannot learn who the real signer is. Our ring signature scheme has the attractive feature that, for $n$ members of a ring, the signatures consist of just a single group element.

2 Preliminaries

2.1 Notations

The following notations will be used in this paper. Let $\mathbb{Z}$ be the set of integers and $\mathbb{Z}_p$ be the ring of integers modulo $p$. $1^\lambda$ denotes the string of $\lambda$ ones for $\lambda \in \mathbb{N}$. $|x|$ denotes the length of the bit string $x$. $[k]$ is a shorthand for the set $\{1, 2, \ldots, k\}$. Finally, we write PPT for probabilistic polynomial time.

2.2 Multilinear Maps

Let $(G_1, \ldots, G_k)$ be a sequence of groups, each of large prime order $p$, and let $g_i$ be a generator of group $G_i$, where we let $g = g_1$. There exists a set of bilinear maps $\{e_{i,j} : G_i \times G_j \to G_{i+j} \mid i, j \ge 1 \wedge i + j \le k\}$, which satisfy:

$$e_{i,j}(g_i^a, g_j^b) = g_{i+j}^{ab} : \forall a, b \in \mathbb{Z}_p.$$

When the context is obvious, we omit the indexes $i$ and $j$, i.e., $e(g_i^a, g_j^b) = g_{i+j}^{ab}$. It will also be convenient to abbreviate $e(h_1, h_2, \ldots, h_j) = e(h_1, e(h_2, \ldots, e(h_{j-1}, h_j) \ldots)) \in G_i$ for $h_t \in G_{i_t}$ ($t \in [j]$), where $i = i_1 + i_2 + \ldots + i_j \le k$.

Let $\mathsf{MulGen}(1^\lambda, k)$ be a PPT multilinear group generator algorithm which takes as input a security parameter $\lambda$ and an integer $k$, where $k$ is the number of allowed pairing operations, and outputs the multilinear parameters $MP = (G_1, \ldots, G_k, p, g = g_1, g_2, \ldots, g_k, e_{i,j})$ satisfying the above properties.

In recent years, many multilinear maps have been proposed, e.g., [9,12,14,22]. However, some of them have been shown to be insecure, e.g., [15, 21]. Fortunately, there are still several multilinear maps that are beyond the existing cryptanalysis, e.g., [1,14]. Therefore, we can still use this tool to design cryptographic schemes. For example, [26,28,29] take advantage of multilinear maps to design different cryptographic schemes.

2.3 Complexity Assumption

We assume that the following assumption holds in the setting described above: the Multilinear Computational Diffie-Hellman (MCDH) assumption.

Definition 1. For any PPT algorithm $\mathcal{B}$, any polynomial $p(\cdot)$, any integer $k$, and all sufficiently large $\lambda \in \mathbb{N}$,

$$\Pr\left[\begin{array}{l} MP \leftarrow \mathsf{MulGen}(1^\lambda, k); \\ c_1, \ldots, c_k \xleftarrow{R} \mathbb{Z}_p; \\ v \leftarrow \mathcal{B}(MP, g^{c_1}, \ldots, g^{c_k}) \end{array} : v = g_{k-1}^{\prod_{i \in [k]} c_i}\right] < \frac{1}{p(\lambda)}.$$

This assumption can be viewed as an adaptation of the Bilinear Computational Diffie-Hellman (BCDH) assumption [4] to the setting of multilinear groups.

3 Digital Signatures

3.1 Definitions

For ease of description, we define digital signature schemes with four algorithms: Setup, KeyGen, Sign, and Vrfy. Formally, given a security parameter $\lambda$, the PPT algorithm Setup, run by a trusted authority, generates public parameters $PP$. The public parameters are used in all of the following three algorithms; for simplicity, we omit this fact. The PPT algorithm KeyGen outputs a signing/verification key pair $(SK, VK)$ for the signer. The PPT algorithm Sign takes as input a signing key $SK$ and a message $M$, then outputs a signature $\sigma$. Finally, the deterministic algorithm Vrfy processes a purported signature $\sigma$ with respect to a message $M$ and verification key $VK$, and outputs 1 to indicate a successful verification and 0 otherwise.

3.2 Existential Unforgeability

The security model for signature schemes is Existential Unforgeability against adaptive Chosen-Message Attacks (EU-CMA), which is defined by the following game.

1) Setup: The challenger runs the Setup and KeyGen algorithms to generate the public parameters and challenge keys $(SK^*, VK^*)$. Adversary $\mathcal{A}$ is given the public parameters and $VK^*$.

2) Signing queries: Adversary $\mathcal{A}$ is allowed to adaptively query the signing oracle at most $q$ times on messages $M_1, \ldots, M_q$. In its $i$-th query, it receives back a signature $\sigma_i \leftarrow \mathsf{Sign}(SK^*, M_i)$.

3) Output: Finally, adversary $\mathcal{A}$ outputs a tuple $(M^*, \sigma^*)$, where $M^* \neq M_i$ for all $i \in [q]$. $\mathcal{A}$ wins the game if $\mathsf{Vrfy}(VK^*, M^*, \sigma^*) = 1$.

We denote the success probability of a PPT adversary $\mathcal{A}$ (taken over the random choices of the challenger and adversary) to win the game as $\mathsf{Adv}_{\mathcal{A}}^{\text{eu-cma}}$.
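The EU-CMA game above can be sketched as a small test harness. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, Setup is folded into `keygen`, and the toy "scheme" is an HMAC with `VK = SK`, which is a MAC rather than a signature scheme and is deliberately insecure. It is used only to exercise the harness with one adversary that loses (replaying a queried message) and one that wins (exploiting the leaked key).

```python
import hashlib
import hmac
import secrets


def eu_cma_game(keygen, sign, vrfy, adversary, q):
    """Run the EU-CMA game once and return True iff the adversary wins."""
    sk, vk = keygen()                      # challenge keys (SK*, VK*)
    queried = []                           # messages M_1, ..., M_q sent to the oracle

    def sign_oracle(message):
        if len(queried) >= q:
            raise RuntimeError("signing query budget exhausted")
        queried.append(message)
        return sign(sk, message)

    m_star, sigma_star = adversary(vk, sign_oracle)
    fresh = m_star not in queried          # M* must differ from every queried M_i
    return fresh and vrfy(vk, m_star, sigma_star) == 1


# Toy stand-in scheme (assumption for illustration only): VK equals SK.
def toy_keygen():
    key = secrets.token_bytes(16)
    return key, key


def toy_sign(sk, message):
    return hmac.new(sk, message, hashlib.sha256).digest()


def toy_vrfy(vk, message, sigma):
    return 1 if hmac.compare_digest(toy_sign(vk, message), sigma) else 0


def replay_adversary(vk, oracle):
    m = b"hello"
    return m, oracle(m)                    # valid signature, but M* was queried


def key_leak_adversary(vk, oracle):
    m = b"forged"
    return m, toy_sign(vk, m)              # forges without any query since VK = SK


replay_wins = eu_cma_game(toy_keygen, toy_sign, toy_vrfy, replay_adversary, q=1)
leak_wins = eu_cma_game(toy_keygen, toy_sign, toy_vrfy, key_leak_adversary, q=0)
```

The replay adversary fails the freshness condition, while the key-leak adversary wins, which shows that the harness rejects trivial forgeries but does not by itself make a scheme secure.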

Definition 2. We say that a signature scheme is EU-CMA secure if no PPT adversary $\mathcal{A}$ can win the above game with non-negligible advantage.

Selective security. Selective security is a weaker notion than EU-CMA. In this model, we require that the adversary gives its challenge message $M^*$ before the setup phase, after which it cannot make a signing query for $M^*$.

Definition 3. We say that a signature scheme is selectively secure if no PPT adversary can win the selective game with non-negligible advantage.

4 Multilinear BLS Scheme

In this section, we describe the multilinear BLS signature scheme and its security.

4.1 Construction

We specify the message space $\mathcal{M} := \{0, 1\}^\ell$; more generally, a collision-resistant hash function can be used to hash messages to this size. The construction of the multilinear BLS signature scheme is as follows:

• $\mathsf{Setup}(1^\lambda, \ell)$: The trusted authority takes as input a security parameter $\lambda$ and the message length $\ell$, and runs this algorithm to generate the public parameters. It first runs $MP = (G_1, \ldots, G_k, p, g, \ldots, g_k, e) \leftarrow \mathsf{MulGen}(1^\lambda, k = \ell + 1)$. Next, it chooses $2\ell$ random integers $(a_{1,0}, a_{1,1}), \ldots, (a_{\ell,0}, a_{\ell,1}) \in \mathbb{Z}_p^2$ and computes $A_{i,\beta} = g^{a_{i,\beta}} \in G_1$, for $i \in [\ell]$ and $\beta \in \{0, 1\}$. The public parameters $PP$ contain the group descriptions $MP$ and $(A_{1,0}, A_{1,1}), \ldots, (A_{\ell,0}, A_{\ell,1})$.

• $\mathsf{KeyGen}(PP)$: Each user chooses a random element $x \in \mathbb{Z}_p$ as his signing key $SK$. The corresponding verification key is $VK = g^x \in G_1$.

• $\mathsf{Sign}(SK, M)$: Given a message $M$ of length $\ell$, let $m_1, \ldots, m_\ell$ be the bits of this message. The signer computes the signature as:²

$$\sigma = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell})^x = \left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x \in G_{k-1}.$$

• $\mathsf{Vrfy}(VK, M, \sigma)$: Given a verification key $VK$ and a purported signature $\sigma$ on message $M$, verify the following equation:

$$e(\sigma, g) \stackrel{?}{=} e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK).$$

² In the bilinear BLS signature scheme [6], the signer computes a signature as $\sigma = H(M)^x$, where $H(\cdot)$ is a collision-resistant hash function that will be treated as a random oracle in the proof. In the multilinear BLS signature scheme, $H(M)$ is defined as $e(A_{1,m_1}, \ldots, A_{\ell,m_\ell})$, which can be computed from the public parameters of the (leveled) multilinear maps.

Correctness. To see the correctness, a signature $\sigma$ on message $M$ is $\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x$, and thus we have

$$e(\sigma, g) = e\left(\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x, g\right) = e\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}, g^x\right) = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK).$$

4.2 Security of Multilinear BLS Scheme in the Standard Model

We now prove the security of the multilinear BLS scheme in the standard model based on the MCDH assumption. Our proof needs an admissible hash function $h$, which can be used to partition the message space into two subsets with probability $1/\theta(q)$ (where $q$ is the upper bound on the adversary's queries) so that the adversary's query messages $M_i$ fall in one subset, where we know a trapdoor that allows us to answer its queries, and the adversary's challenge message $M^*$ falls in the other subset, where we do not know any trapdoor but hope to embed a challenge element. We show that we can leverage the structure of the multilinear BLS signature scheme to prove adaptive security. For simplicity of exposition, we assume that there is a polynomial $s(\lambda)$ which denotes the length of the messages to be signed. We use a function $h : \{0, 1\}^{s(\lambda)} \to \{0, 1\}^{\ell(\lambda)}$ that maps the messages to $\ell$ bits, and an efficient randomized algorithm Sample, such that the pair is $\theta$-admissible. The following definition of admissible hash functions is from [17], which is a slight variant of the simplified definition in [11].

Definition 4. Let $s$, $\ell$ and $\theta$ be efficiently computable univariate polynomials. We say that a function $h : \{0, 1\}^{s(\lambda)} \to \{0, 1\}^{\ell(\lambda)}$, and an efficient randomized algorithm Sample, are $\theta$-admissible if the following properties hold:

• For any $u \in \{0, 1, \perp\}^\ell$, define $P_u : \{0, 1\}^s \to \{0, 1\}$ as follows: $P_u(X) = 0$ iff $\forall i : h(X)_i \neq u_i$, and otherwise (if $\exists i : h(X)_i = u_i$) we have $P_u(X) = 1$.

• We require that for any efficiently computable polynomial $q(\lambda)$, for all $X_1, \ldots, X_q, Z \in \{0, 1\}^s$, where $Z \notin \{X_i\}$, we have $\Pr[P_u(X_1) = \ldots = P_u(X_q) = 1 \wedge P_u(Z) = 0] \ge 1/\theta(q)$, where the probability is taken only over $u \leftarrow \mathsf{Sample}(1^\lambda, q)$.

Theorem 1. For any efficiently computable polynomials $s$, $\ell$, there exists an efficiently computable polynomial $\theta$ such that there exist $\theta$-admissible function families mapping $s$ bits to $\ell$ bits.

The construction is identical to the multilinear BLS scheme, except that the Setup algorithm also creates the admissible hash function $h$. The signing and verification algorithms then take as input $h(M)$ instead of $M$.

Theorem 2. If $h$ is a $\theta$-admissible function and the $k$-MCDH problem is hard in the multilinear groups, then the multilinear BLS signature scheme with an admissible hash is adaptively secure.

Proof. If there exists a PPT adversary $\mathcal{A}$ who can break the security of the multilinear BLS signature scheme with an admissible hash in the EU-CMA game with advantage $\epsilon$ for message length $s$, level $k$ of the multilinear maps, and security parameter $\lambda$, then we can construct a PPT challenger $\mathcal{B}$ to break the $k$-MCDH assumption with probability $\epsilon' \ge \epsilon/\theta(q)$. The challenger $\mathcal{B}$ takes as input a $k$-MCDH instance $(g^{c_1}, \ldots, g^{c_k})$ together with the group descriptions $MP$ to interact with the adversary. The challenger's goal is to compute $g_{k-1}^{\prod_{i \in [k]} c_i}$.

We describe the proof as a sequence of hybrid games, where the first hybrid corresponds to the original EU-CMA game. In the first hybrid step we do a “partitioning” of the message space. After this first step, we prove that any PPT adversary's advantage changes by at most a negligible gap between successive hybrid games. We finally show that any PPT adversary that succeeds in the final game with non-negligible advantage can be used to break the $k$-MCDH assumption.

• Game0 is the original EU-CMA game.

1) Setup: Challenger $\mathcal{B}$ runs $\mathsf{MulGen}(1^\lambda, k)$ to produce the group parameters $MP$. It then chooses a random exponent $x \in \mathbb{Z}_p$ for the secret key and sets the challenge verification key as $VK^* = g^x$. It also randomly chooses $(a_{1,0}, a_{1,1}), \ldots, (a_{\ell,0}, a_{\ell,1})$ from $\mathbb{Z}_p$ and computes $A_{i,\beta} = g^{a_{i,\beta}}$ for $i \in [\ell]$, $\beta \in \{0, 1\}$. Finally, it sets $PP = (MP, \{A_{i,\beta} \mid i \in [\ell], \beta \in \{0, 1\}\})$ and gives $PP$ and $VK^*$ to the adversary $\mathcal{A}$.

2) Signing queries: Adversary $\mathcal{A}$ adaptively queries the signing oracle at most $q$ times on messages $M_1, \ldots, M_q$. In its $i$-th query, it receives back $e(A_{1,m_{i,1}}, \ldots, A_{\ell,m_{i,\ell}})^x$ from challenger $\mathcal{B}$.

3) Output: At some point $\mathcal{A}$ outputs a forgery $\sigma^*$ with respect to the challenge key $VK^*$ and message $M^*$. It wins the game if $\mathsf{Vrfy}(VK^*, M^*, \sigma^*) = 1$ and $M^* \neq M_i$ for $i \in [q]$.

• Game1 is the same as Game0 except that the challenger begins by sampling a string $u \in \{0, 1, \perp\}^\ell$ by invoking $u \leftarrow \mathsf{Sample}(1^\lambda, q)$. At the end of the game, the adversary is only considered to be successful if its output satisfies the winning conditions and, in addition, $P_u(M^*) = 0$ for the challenge message $M^*$ and $P_u(M_i) = 1$ for all queried messages $M_i$.

• Game2 is the same as Game1 except for the following modification. The challenger sets the parameters $(A_{1,0}, A_{1,1}), \ldots, (A_{\ell,0}, A_{\ell,1})$ in the following way: for $i \in [\ell]$ and $\beta \in \{0, 1\}$ it chooses random $b_{i,\beta} \in \mathbb{Z}_p$ and sets

$$A_{i,\beta} = \begin{cases} g^{b_{i,\beta}}, & \text{if } \beta = u_i \\ (g^{c_i})^{b_{i,\beta}}, & \text{if } \beta \neq u_i. \end{cases}$$

Lemma 1. Assume an adversary that makes at most a polynomial number of signing queries $q = q(\lambda)$ in Game0. If the advantage of the adversary in Game0 is $\epsilon$, then the advantage of the adversary in Game1 will be at least $\epsilon/\theta(q)$. In particular, any PPT adversary with non-negligible advantage in Game0 will also have non-negligible advantage in Game1.

Proof. The lemma follows immediately from the fact that $h$ satisfies the definition of $\theta$-admissibility, since only the independent choice of $u \leftarrow \mathsf{Sample}(1^\lambda, q)$ determines whether or not the game aborts.

Lemma 2. The advantage of any PPT adversary in Game2 is the same as its advantage in Game1.

Proof. The two games are equivalent, as all $A_{i,\beta} \in G_1$ are still chosen uniformly at random in both games.

Lemma 3. If the $k$-MCDH assumption holds, then the advantage of any PPT adversary in Game2 is negligible.

Proof. We prove this lemma by giving a reduction to the $k$-MCDH assumption. To do so, we construct an algorithm $\mathcal{B}$.

$\mathcal{B}$ takes as input a $k$-MCDH problem instance $(MP, g^{c_1}, \ldots, g^{c_k})$. Next, $\mathcal{B}$ runs $u \leftarrow \mathsf{Sample}(1^\lambda, q)$. It sets the challenge key as $VK^* = g^{c_k}$. All these steps together simulate the Setup phase of Game2. Now, it plays the game with the adversary $\mathcal{A}$ using public parameters $PP = (MP, \{A_{i,\beta} \mid i \in [\ell], \beta \in \{0, 1\}\})$ and challenge key $VK^*$.

The adversary $\mathcal{A}$ will then adaptively make at most $q$ signing queries, each for a message $M_i$. If $P_u(M_i) = 0$, $\mathcal{B}$ aborts and quits. Otherwise, $P_u(M_i) = 1$ and there exists a $\gamma$ such that $h(M_i)_\gamma = u_\gamma$. Thus, $\mathcal{B}$ can compute the signature as

$$\sigma = e(A_{1,h(M_i)_1}, \ldots, A_{\gamma-1,h(M_i)_{\gamma-1}}, A_{\gamma+1,h(M_i)_{\gamma+1}}, \ldots, A_{\ell,h(M_i)_\ell}, VK^*)^{b_{\gamma,h(M_i)_\gamma}}$$

by knowing the exponent $b_{\gamma,h(M_i)_\gamma}$ of the parameter $A_{\gamma,h(M_i)_\gamma}$.

Finally, the adversary $\mathcal{A}$ outputs an attempted forgery $\sigma^*$ with respect to the challenge verification key $VK^*$ on some message $M^*$. $\mathcal{B}$ first checks the signature verification $\mathsf{Vrfy}(VK^*, M^*, \sigma^*)$ and aborts if it returns 0. Next, it checks if $P_u(M^*) = 1$ and aborts if that is the case. Otherwise, $P_u(M^*) = 0$ and for all $i$ we have $h(M^*)_i \neq u_i$. This means that the hash of $M^*$ will be $g_{k-1}^{\prod_{i \in [k-1]} c_i}$ raised to some known product of $b_{i,\beta}$ values. The signature therefore contains $g_{k-1}^{\prod_{i \in [k]} c_i}$ raised to some known product of $b_{i,\beta}$ values. This value can be recovered by taking the proper root of the signature, i.e.,

$$(\sigma^*)^{1/\prod_{i \in [\ell]} b_{i,h(M^*)_i}} = g_{k-1}^{\prod_{i \in [k]} c_i},$$

and thus if $\sigma^*$ is a successful forgery, then this root of the signature is a solution to the challenge instance of the $k$-MCDH problem.

By construction of the algorithm $\mathcal{B}$, the probability that $\mathcal{B}$ succeeds is exactly the advantage with which the adversary $\mathcal{A}$ succeeds in Game2. Whenever $\mathcal{B}$ aborted, the adversary, by the rules of Game2, was not considered to be successful since its queries or forgery violated the partition. The lemma follows.

These three lemmas together yield the main theorem that the multilinear BLS signature scheme with an admissible hash is adaptively secure.

5 Ring Signatures from Multilinear BLS Scheme

In this section, we show the applications of the multilinear BLS signatures. The scheme can serve as the key-generation algorithm of the multilinear IBE scheme [11], and it can also be used to construct aggregate signatures [17]. In addition, based on Boldyreva's [7] work, we can easily obtain a threshold signature, a multi-signature, and a blind signature scheme, respectively, based on the multilinear BLS scheme in the standard model. Here, we take advantage of the multilinear BLS scheme to construct a ring signature scheme in the standard model. The resultant scheme has the attractive feature that, for $n$ members of a ring, our signatures consist of just a single group element.

5.1 Definition of Ring Signatures

For convenience, we define an algorithm Setup, run by a trusted authority, to generate public parameters and build the system of the ring signature scheme. The public parameters will be used in all of the following three algorithms. In addition, we refer to an ordered set $R = \{VK_1, \ldots, VK_n\}$ of verification keys as a ring, and let $R[i] = VK_i$. We will also freely use set notation, e.g., $VK \in R$ if there exists an index $i$ such that $R[i] = VK$.

Definition 5. A ring signature scheme contains the following four algorithms:

• $\mathsf{Setup}(1^\lambda) \to PP$: The system setup algorithm takes as input a security parameter $\lambda$ to produce the system public parameters $PP$.

• $\mathsf{KeyGen}() \to (SK, VK)$: The key generation algorithm generates a user's signing and verification keys $(SK, VK)$.

• $\mathsf{Sign}(SK_s, R, M) \to \sigma$: The signing algorithm takes as input a message $M$ to be signed, a set of verification keys $R$ (i.e., the ring), and a user's signing key $SK_s$. It is required that $VK_s \in R$, meaning that the signer is a member of the signing ring. The algorithm outputs a signature $\sigma$.

5.2 Security Models of Ring Signatures

The security model of a ring signature scheme contains two parts: unforgeability and anonymity.

5.2.1 Ring Unforgeability

This security notion guarantees that an adversary can compute a valid signature on behalf of a ring only if he knows a secret key corresponding to one of its members. In this work, we use Bender et al.'s [5] model, unforgeability with respect to insider corruption,³ which is defined by the following game:

³ We make use of a weaker notion of this security model in which corruptions of honest users are allowed but adversary-chosen public keys are not allowed.

1) Setup: The challenger runs the Setup and KeyGen algorithms to generate public parameters and users' keys $\{(SK_i, VK_i)\}_{i=1}^{n(\lambda)}$. Then it gives the adversary $\mathcal{A}$ the system parameters and verification keys $S = \{VK_i\}_{i=1}^{n(\lambda)}$. In addition, the challenger maintains a set $C$ to record the corrupted users; initially, $C \leftarrow \emptyset$.

2) Signing queries: The adversary $\mathcal{A}$ can adaptively make signing queries on inputs $(M, R, s)$, where $M$ is the message to be signed, $R \subseteq S$ is a ring of verification keys and $s$ is an index such that $VK_s \in R$. The challenger returns a ring signature $\sigma \leftarrow \mathsf{Sign}(SK_s, R, M)$ to $\mathcal{A}$.

3) Corruption queries: The adversary $\mathcal{A}$ can also adaptively make corruption queries on input $s \in [n(\lambda)]$. The challenger returns $SK_s$ to $\mathcal{A}$ and adds $VK_s$ into the set $C$.

4) Output: Finally, the adversary $\mathcal{A}$ outputs a tuple $(M^*, \sigma^*, R^*)$. We say that $\mathcal{A}$ wins the game if the following conditions hold: (1) $\mathsf{Vrfy}(R^*, M^*, \sigma^*) = 1$; (2) $R^* \subseteq S \setminus C$; (3) it never made a signing query $(M^*, R^*, s)$ for any $s$.

We denote the success probability of a PPT adversary $\mathcal{A}$ (taken over the random choices of the challenger and adversary) to win the above game as $\mathsf{Adv}_{\mathcal{A}}^{\mathrm{Unf}}$.

Definition 6. We say that a ring signature scheme has the property of unforgeability with respect to insider corruption if no PPT adversary $\mathcal{A}$ can win the above game with non-negligible advantage.

Selective security. We define a weaker notion, selective security, for the above model. In the game of selective security, the adversary $\mathcal{A}$ is required to give a forgery ring/message pair $(R^*, M^*)$⁴ to the challenger before the setup phase, then it cannot make
This weaker notion has been used in [24, 27]. 4 n(λ) In the beginning, A does not given the keys S = {VKi}i=1 . • Vrfy(R, M, σ) → 0/1: The verification algorithm In order to obtain the forgery ring R∗, we require that A out- takes as input a purported signature σ on a ring R puts a set of index IR∗ = {i1, . . . , i|R∗|} ⊆ [n(λ)]. Then, after and a message M. It outputs 1 if σ is valid. Other- n(λ) ∗ the keys S = {VKi}i=1 be generated, the forgery ring R = wise, it outputs 0. {VKi1 ,...,VKi|R∗| } ⊆ S also be defined. International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 733

signing query on inputs $(M^*, R^*, s)$ for any $s$, and it also cannot make a corruption query on input $s$ for which $VK_s \in R^*$.

Definition 7. We say that a ring signature scheme is selectively unforgeable with respect to insider corruption if no PPT adversary $\mathcal{A}$ can win the selective game with non-negligible advantage.

5.2.2 Ring Anonymity

This notion guarantees that any verifier can be convinced that someone in the ring has generated a valid ring signature, but the real signer remains unknown. In this paper, we make use of the notion of perfect anonymity. We say that a ring signature scheme is perfectly anonymous if a signature on a message $M^*$ under a ring $R^*$ and key $VK_{i_0}$ looks exactly the same as a signature on the same message $M^*$ under the same ring $R^*$ and a different key $VK_{i_1}$. This means that the signer's key is hidden among all the honestly generated keys in the ring. Formally, it is defined by the following game:

1) Setup: The challenger runs the Setup and KeyGen algorithms to generate public parameters and users' keys $\{(SK_i, VK_i)\}_{i=1}^{n(\lambda)}$. Then it returns the public parameters and all keys $\{(SK_i, VK_i)\}_{i=1}^{n(\lambda)}$ to the adversary $\mathcal{A}$.

2) Challenge: The adversary $\mathcal{A}$ gives a tuple $(M^*, R^*, i_0, i_1)$ to the challenger, where $M^*$ is the challenge message, $R^*$ is the challenge ring, and $i_0$ and $i_1$ are two indices such that $\{VK_{i_0}, VK_{i_1}\} \subseteq R^*$. The challenger chooses a random $b \in \{0, 1\}$, computes $\sigma^* \leftarrow \mathsf{Sign}(M^*, SK_{i_b}, R^*)$, and sends $\sigma^*$ to the adversary.

3) Guess: Finally, the adversary outputs $b'$, indicating his guess for $b$.

We denote the advantage of an unbounded adversary $\mathcal{A}$ (taken over the random choices of the challenger and the adversary) to win the above game as $\mathsf{Adv}_{\mathcal{A}}^{\mathrm{Ano}} = |\Pr[b' = b] - \Pr[b' \neq b]|$.

Definition 8. A ring signature scheme has the property of perfect anonymity if even an unbounded adversary cannot win the above game with non-negligible advantage.

5.3 Construction

We now construct a ring signature scheme based on the multilinear BLS scheme. We specify the message space $\mathcal{M} := \{0, 1\}^\ell$; more generally, a collision-resistant hash function can be used to hash messages to this size. Let $m_1, \ldots, m_\ell$ be the bits of the message $M \in \mathcal{M}$. The following construction is an $n$-user ring signature scheme, meaning that $|R| = n$.

• $\mathsf{Setup}(1^\lambda, n, \ell)$: The trusted authority takes as input a security parameter $\lambda$, the length $\ell$ of messages and the ring size $n$, and runs this algorithm to generate public parameters. It first runs $\mathsf{MP} = (\mathbb{G}_1, \ldots, \mathbb{G}_k, p, g, \ldots, g_k, e_{i,j}) \leftarrow \mathsf{MulGen}(1^\lambda, k = n + \ell)$. Next, it chooses $2\ell$ random values $(a_{1,0}, a_{1,1}), \ldots, (a_{\ell,0}, a_{\ell,1}) \in \mathbb{Z}_p^2$ and computes $A_{i,\beta} = g^{a_{i,\beta}} \in \mathbb{G}_1$ for $i \in [\ell]$ and $\beta \in \{0, 1\}$. The public parameters $\mathsf{PP}$ contain the group descriptions $\mathsf{MP}$ and the group elements $(A_{1,0}, A_{1,1}), \ldots, (A_{\ell,0}, A_{\ell,1})$.

• $\mathsf{KeyGen}(\mathsf{PP})$: Each user $i$ chooses a random value $x_i \in \mathbb{Z}_p$ as his signing key $SK_i$. The corresponding verification key is $VK_i = g^{x_i} \in \mathbb{G}_1$.

• $\mathsf{Sign}(M, SK_s, R = \{VK_1, \ldots, VK_n\})$: Given a ring of $n$ verification keys, the holder of signing key $SK_s$ with $s \in [n]$ can sign a message $M \in \mathcal{M}$ as

$$\sigma = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK_1, \ldots, VK_{s-1}, VK_{s+1}, \ldots, VK_n)^{x_s}.$$

The signature consists of just a single group element. In fact, $\sigma = g_{k-1}^{(\prod_{i=1}^{\ell} a_{i,m_i}) \cdot (\prod_{j=1}^{n} x_j)} \in \mathbb{G}_{k-1}$.

• $\mathsf{Vrfy}(M, \sigma, R = \{VK_1, \ldots, VK_n\})$: Given a ring of $n$ verification keys and a purported signature $\sigma$ on a message $M$, check the following equation:

$$e(\sigma, g) \stackrel{?}{=} e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK_1, \ldots, VK_n).$$

Correctness. To see the correctness, note that a signature $\sigma$ on message $M$ and ring $R$ is $g_{k-1}^{(\prod_{i=1}^{\ell} a_{i,m_i}) \cdot (\prod_{j=1}^{n} x_j)}$, and thus we have

$$e(\sigma, g) = e\big(g_{k-1}^{(\prod_{i=1}^{\ell} a_{i,m_i}) \cdot (\prod_{j=1}^{n} x_j)}, g\big) = e\big(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}, g^{\prod_{j=1}^{n} x_j}\big) = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK_1, \ldots, VK_n).$$

In the setting of multilinear maps, the space needed to represent a group element might grow with $k$ (which is $n + \ell$), as happens in the GGH [12] framework. To mitigate this problem, we can use the method in [17], which varies the message alphabet size in a tradeoff between computation and storage. The above construction uses a binary message alphabet. If it uses an alphabet of $2^d$ symbols, then the ring signature could reside in the group $\mathbb{G}_{\ell/d+n}$ with $\ell/d + n - 1$ pairings required to compute it, at the cost of the public parameters requiring $2^d \cdot \ell$ group elements in $\mathbb{G}_1$.

5.4 Security

Theorem 3. The ring signature scheme above, with message length $\ell$ and ring size $n$, is selectively unforgeable with respect to insider corruption under the $(n + \ell)$-MCDH assumption.

Proof. If there exists a PPT adversary $\mathcal{A}$ who can break the selective security of the ring signature scheme with advantage $\epsilon$ for message length $\ell$, ring size $n$, level $k = n + \ell$ of multilinear maps, and security parameter $\lambda$, then we show that we can construct a PPT challenger $\mathcal{B}$ to break the $k$-MCDH assumption for security parameter $\lambda$ with

probability $\epsilon$. Initially, $\mathcal{A}$ gives $(M^* \in \{0, 1\}^\ell, I_{R^*} = \{i_1, \ldots, i_n\} \subseteq [n(\lambda)])$ to $\mathcal{B}$, who is given an instance $(\mathsf{MP}, g^{c_1}, \ldots, g^{c_k})$ of the $k$-MCDH assumption.

1) Setup: The challenger $\mathcal{B}$ first sets the signing and verification keys for the challenge ring as $(SK_{i_1} = c_{\ell+1}, VK_{i_1} = g^{c_{\ell+1}}), \ldots, (SK_{i_n} = c_k, VK_{i_n} = g^{c_k})$ (it does not know these $c_i$). For indices $i \notin I_{R^*}$, it chooses random $x_i \in \mathbb{Z}_p$ and sets $SK_i = x_i$, $VK_i = g^{x_i}$. It then generates the parameters as follows:

   • Choose random integers $a_1, \ldots, a_\ell \in \mathbb{Z}_p$.

   • For $i \in [\ell]$, set $A_{i,m_i^*} = g^{c_i}$ and compute $A_{i,\bar{m}_i^*} = g^{a_i}$.

   Note that these parameters are distributed uniformly at random as in the real ring signature scheme. Then $\mathcal{B}$ sets the public parameters $\mathsf{PP} = (\mathsf{MP}, \{A_{i,\beta} \mid i \in [\ell], \beta \in \{0, 1\}\})$. Finally, it gives $\mathsf{PP}$ and $\{VK_i\}_{i=1}^{n(\lambda)}$ to $\mathcal{A}$.

2) Signing queries: Conceptually, the challenger $\mathcal{B}$ can generate signatures for the adversary because the adversary's request differs from the challenge ring or the challenge message in at least one position. Specifically, suppose $\mathcal{A}$ makes a query to the signing oracle on input $(M, R = \{VK_1, \ldots, VK_n\}, s)$. If $R \neq R^*$, we assume that $VK_j \in R$ but $VK_j \notin R^*$; then $\mathcal{B}$ ignores the index $s$ and signs $M$ with $SK_j$ in the usual way, since $\mathcal{B}$ knows $VK_j$'s signing key $x_j$. If $R = R^*$, then we know $M \neq M^*$ and assume that $m_\gamma \neq m_\gamma^*$, where $m_\gamma$ and $m_\gamma^*$ are the $\gamma$-th bits of the messages $M$ and $M^*$, respectively. Hence $\mathcal{B}$ can compute

$$\sigma = e(A_{1,m_1}, \ldots, A_{\gamma-1,m_{\gamma-1}}, A_{\gamma+1,m_{\gamma+1}}, \ldots, A_{\ell,m_\ell}, VK_1, \ldots, VK_n)^{a_\gamma}$$

   by knowing the exponent $a_\gamma$ of the parameter $A_{\gamma,\bar{m}_\gamma^*}$. Finally, it returns $\sigma$ to the adversary $\mathcal{A}$.

3) Corruption queries: When $\mathcal{A}$ makes a query to the corruption oracle on an index $i \notin I_{R^*}$, $\mathcal{B}$ gives $SK_i$ to $\mathcal{A}$ and adds $VK_i$ to the set $C$ of corrupted users.

4) Output: Finally, $\mathcal{A}$ outputs a forgery $\sigma^*$ with respect to the challenge ring $R^* = \{VK_{i_1}, \ldots, VK_{i_n}\}$ and message $M^*$. Then $\mathcal{B}$ outputs $\sigma^*$ as the solution to the given instance of the $k$-MCDH assumption. According to the setting of the public parameters and the verification keys of the challenge ring in the setup phase, and the assumption that $\sigma^*$ is valid, we know that $\sigma^*$ should be equal to

$$e(A_{1,m_1^*}, \ldots, A_{\ell,m_\ell^*}, VK_1, \ldots, VK_{j-1}, VK_{j+1}, \ldots, VK_n)^{c_{\ell+j}} = g_{k-1}^{\prod_{i \in [1,k]} c_i},$$

   where $c_{\ell+j}$ is a certain signing key that $\mathcal{A}$ uses. It implies that $\sigma^*$ is a solution to the given instance of the $k$-MCDH problem, and thus $\mathcal{B}$ breaks the $k$-MCDH assumption.

It is clear that $\mathcal{B}$ succeeds whenever $\mathcal{A}$ does.

Theorem 4. The ring signature scheme above, with message length $\ell$ and ring size $n$, is anonymous against any unbounded adversary.

Given a ring signature, we show that any ring member could possibly have created it. Consider a signature $\sigma$ on ring $R^* = \{VK_1, \ldots, VK_n\}$ and message $M^*$ that has been created using key $SK_{i_0}$. We will show that with the same probability it could have been created using $SK_{i_1}$ with $i_1 \neq i_0$. The proof is straightforward.

Proof. For any tuple $(M^*, R^*, i_0, i_1)$ chosen by an unbounded adversary $\mathcal{A}$, the signatures created by the members $i_0$ and $i_1$ are

$$\sigma_{i_0}^* = e(A_{1,m_1^*}, \ldots, A_{\ell,m_\ell^*}, VK_1, \ldots, VK_{i_0-1}, VK_{i_0+1}, \ldots, VK_n)^{x_{i_0}}$$

and

$$\sigma_{i_1}^* = e(A_{1,m_1^*}, \ldots, A_{\ell,m_\ell^*}, VK_1, \ldots, VK_{i_1-1}, VK_{i_1+1}, \ldots, VK_n)^{x_{i_1}},$$

respectively. However, $\sigma_{i_0}^* = \sigma_{i_1}^* = g_{k-1}^{(\prod_{i=1}^{\ell} a_{i,m_i^*}) \cdot (\prod_{i=1}^{n} x_i)}$ since the signing algorithm is deterministic. Therefore, every member of a ring computes the same signature on a given message and ring. Perfect anonymity follows easily from this observation.

6 Conclusion

In this work, we consider the BLS signature scheme in the setting of multilinear groups. First, we present a proof of adaptive security for the multilinear BLS scheme based on the MCDH assumption. Then, based on the multilinear BLS scheme, we construct a ring signature scheme with the attractive feature that for $n$ members of a ring the signatures consist of just a single group element.

Acknowledgments

This study was supported by the National Natural Science Foundation of China under Grant 61702067, the Natural Science Foundation of Chongqing under Grants cstc2017jcyjAX0201 and cstc2019jcyj-msxmX0551, and the key project of science and technology research program of Chongqing Education Commission of China under Grants KJZD-K201803701 and KJQN201903701. The authors gratefully acknowledge the anonymous reviewers for their valuable comments.

References

[1] M. R. Albrecht, P. Farshim, D. Hofheinz, E. Larraia, K. G. Paterson, "Multilinear maps from obfuscation," in Proceedings of Part I of the 13th International Conference on Theory of Cryptography (TCC'16), vol. 9562, pp. 446-473, 2016.
[2] D. Boneh, X. Boyen, "Secure identity based encryption without random oracles," in Annual International Cryptology Conference, pp. 443-459, 2004.
[3] D. Boneh, X. Boyen, "Short signatures without random oracles," in International Conference on the Theory and Applications of Cryptographic Techniques, vol. 3027, pp. 56-73, 2004.

[4] D. Boneh, M. Franklin, "Identity-based encryption from the Weil pairing," in Annual International Cryptology Conference, vol. 2139, pp. 213-229, 2001.
[5] A. Bender, J. Katz, R. Morselli, "Ring signatures: Stronger definitions, and constructions without random oracles," Journal of Cryptology, vol. 22, no. 1, pp. 114-138, 2009.
[6] D. Boneh, B. Lynn, H. Shacham, "Short signatures from the Weil pairing," Journal of Cryptology, vol. 17, no. 4, pp. 297-319, 2004.
[7] A. Boldyreva, "Threshold signatures, multisignatures and blind signatures based on the gap-Diffie-Hellman-group signature scheme," in International Workshop on Public Key Cryptography, pp. 31-46, 2002.
[8] D. Boneh, A. Silverberg, "Applications of multilinear forms to cryptography," Contemporary Mathematics, vol. 324, pp. 71-90, 2002.
[9] J. S. Coron, T. Lepoint, M. Tibouchi, "New multilinear maps over the integers," in Annual Cryptology Conference, vol. 9215, pp. 267-286, 2015.
[10] L. Chen, J. Malone-Lee, "Improved identity-based signcryption," in International Workshop on Public Key Cryptography, vol. 3386, pp. 362-379, 2005.
[11] E. S. V. Freire, D. Hofheinz, K. G. Paterson, C. Striecks, "Programmable hash functions in the multilinear setting," in Annual Cryptology Conference, vol. 8042, pp. 513-530, 2013.
[12] S. Garg, C. Gentry, S. Halevi, "Candidate multilinear maps from ideal lattices," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 7881, pp. 1-17, 2013.
[13] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, B. Waters, "Candidate indistinguishability obfuscation and functional encryption for all circuits," in IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 40-49, 2013. ISBN: 978-0-7695-5135-7.
[14] C. Gentry, S. Gorbunov, S. Halevi, "Graph-induced multilinear maps from lattices," in Theory of Cryptography Conference, vol. 9015, pp. 498-527, 2015.
[15] Y. Hu, H. Jia, "Cryptanalysis of GGH map," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 9665, pp. 537-565, 2016.
[16] M. S. Hwang, C. C. Lee, Y. C. Lai, "An untraceable blind signature scheme," IEICE Transactions on Foundations, vol. E86-A, no. 7, pp. 1902-1906, 2003.
[17] S. Hohenberger, A. Sahai, B. Waters, "Full domain hash from (leveled) multilinear maps and identity-based aggregate signatures," in Annual Cryptology Conference, vol. 8024, pp. 494-512, 2013.
[18] S. Hohenberger, A. Sahai, B. Waters, "Replacing a random oracle: full domain hash from indistinguishability obfuscation," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 8441, pp. 201-220, 2014.
[19] M. S. Hwang, C. C. Lee, Y. C. Lai, "An untraceable blind signature scheme," IEICE Transactions on Foundations, vol. E86-A, no. 7, pp. 1902-1906, July 2003.
[20] C. C. Lee, M. S. Hwang, W. P. Yang, "A new blind signature based on the discrete logarithm problem for untraceability," Applied Mathematics and Computation, vol. 164, no. 3, pp. 837-841, May 2005.
[21] H. T. Lee, H. J. Seo, "Security analysis of multilinear maps over the integers," in Annual Cryptology Conference, vol. 8616, pp. 224-240, 2014.
[22] A. Langlois, D. Stehlé, R. Steinfeld, "GGHLite: More efficient multilinear maps from ideal lattices," Advances in Cryptology, vol. 8441, pp. 239-256, 2014.
[23] R. Rivest, A. Shamir, Y. Tauman, "How to leak a secret," in International Conference on the Theory and Application of Cryptology and Information Security, vol. 2248, pp. 552-565, 2001.
[24] S. Schäge, J. Schwenk, "A CDH-based ring signature scheme with short signatures and public keys," in International Conference on Financial Cryptography and Data Security, vol. 6052, pp. 129-142, 2010.
[25] B. Waters, "Efficient identity-based encryption without random oracles," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 3494, pp. 114-127, 2005.
[26] H. Wang, D. He, J. Shen, Z. Zheng, X. Yang, M. H. Au, "Fuzzy matching and direct revocation: A new CP-ABE scheme from multilinear maps," Soft Computing, vol. 22, no. 7, pp. 2267-2274, 2017.
[27] F. Tang, H. Li, "Ring signatures of constant size without random oracles," in International Conference on Information Security and Cryptology, vol. 8957, pp. 93-108, 2015.
[28] F. Tang, H. Li, B. Liang, "Attribute-based signatures for circuits from multilinear maps," in International Conference on Information Security, vol. 8783, pp. 54-71, 2014.
[29] F. Tang, Y. Zhou, "Policy-based signatures for predicates," International Journal of Network Security, vol. 19, no. 5, pp. 811-822, 2017.

Biography

Fei Tang received his Ph.D. from the Institute of Information Engineering of Chinese Academy of Sciences in 2015. He is currently an associate professor of the College of Cyberspace Security and Law, Chongqing University of Posts and Telecommunications. His research interests are public key cryptography and blockchain.

Dong Huang received his Ph.D. from Chongqing University in 2012. He is currently a professor of the Chongqing University of Science and Technology and Chongqing Vocational and Technical University of Mechatronics. His research interest is public key cryptography.

International Journal of Network Security, Vol.22, No.5, PP.736-742, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).03)

Eighth Power Residue Double Circulant Self-Dual Codes

Changsong Jiang1,3, Yuhua Sun1,2, and Xueting Liang1 (Corresponding author: Yuhua Sun)

College of Science, China University of Petroleum¹, Qingdao, Shandong 266580, China
Provincial Key Laboratory of Applied Mathematics, Putian University², Putian, Fujian 351100, China
School of Computer Science and Engineering, University of Electronic Science and Technology of China³
(Email: sunyuhua [email protected])
(Received Mar. 1, 2019; Revised and Accepted Sept. 16, 2019; First Online Jan. 23, 2020)

Abstract

Self-dual codes are one of the most important classes of linear codes. Power residue classes are widely used in the constructions of linear codes and pseudo-random sequences. In this paper, we give new constructions of self-dual codes over GF(2) and GF(4) by eighth power residues. We get multiple pure double circulant codes and bordered double circulant codes. Some of these new self-dual codes have large minimum distances.

Keywords: Cyclotomic Number; Double Circulant Code; Eighth Power Residues; Self-Dual Code

1 Introduction

The famous paper "A mathematical theory of communication" [26] by Shannon marked the beginning of coding theory. Codes with good properties have many applications in cryptography and communication systems. Most of the codes constructed in the initial stage were binary codes. Now, codes over finite fields and over finite rings are very common in both the mathematical and the engineering literature. Thanks to their neat mathematical structure and their ease of encoding and decoding, linear codes play a decisive role in coding theory. It is worth noting that, among linear codes, there is one special class, the self-dual codes, which are widely used in data transmission and have become important tools to construct quantum error-correcting codes. Therefore various methods of construction and analysis of self-dual codes have been presented by coding researchers, and various classes of linear codes with the self-dual property have appeared in the literature. For example, readers can refer to [1–6,9,11,17–19,22,24,25,28,29] or to the survey paper [13] for the advances of early research in this field. It is well known that power residue classes have become an important tool to construct stream cipher sequences with good pseudo-random properties (for example, see [7, 23, 27]). In fact, they have also been used to construct error-correcting codes, and a very interesting method of constructing linear codes or self-dual codes is to combine double circulant matrices and residue classes to give the generator matrix of the codes (for example, see [8, 10, 12, 16, 21]). However, in most of the relevant literature at present, the residues used to construct codes are mainly quadratic residues.

Recently, Zhang and Ge introduced fourth power residue double circulant codes and obtained several new infinite families of self-dual codes over GF(2), GF(3), GF(4), GF(8), GF(9) [30]. Some of these codes have better minimum weight than previously known codes. In this paper, inspired by their methods, we construct double circulant self-dual codes by higher power residues, especially eighth power residues. We give new constructions of self-dual codes over GF(2) and GF(4) by primes p of the form 16f + 9, and some of these codes have good parameters. Examples of such codes are a binary self-dual [82, 41, 14] code, a quaternary self-dual [82, 41, 14] code and a quaternary self-dual [84, 42, 12] code. All computations have been done by MATLAB R2017b and MAGMA V2.12 on a 2.50 GHz CPU.

The paper is organized as follows. In Section 2, we give the relevant knowledge of double circulant codes and self-dual codes. In Section 3, we describe the detailed process of constructing linear codes by eighth power residues and discuss the parameter conditions satisfying the self-dual property. Section 4 considers the constructions over GF(2) and GF(4) respectively. A conclusion is given in Section 5.

2 Preliminaries

Self-Dual Codes. A linear $[n, k]$ code $C$ of length $n$ and dimension $k$ over the Galois field with $q$ elements $\mathrm{GF}(q)$ is a linear subspace of dimension $k$ of

$\mathrm{GF}(q)^n$, where $q$ is a prime power. An element of the code $C$ is called a codeword of $C$. A generator matrix of $C$ is a matrix whose rows generate $C$. Let $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ be two vectors of $\mathrm{GF}(q)^n$. The Euclidean inner product is defined by $(x, y) = \sum_{i=1}^{n} x_i y_i$. For a linear code $C$, the code $C^\perp = \{x \in \mathrm{GF}(q)^n \mid (x, c) = 0 \text{ for all } c \in C\}$ is called its Euclidean dual code. We say $C$ is self-orthogonal if $C \subseteq C^\perp$, and $C$ is self-dual if $C = C^\perp$.

Definition 1. [30]: Let $P_n(R)$ and $B_n(R)$ be codes with generator matrices of the form

$$\begin{pmatrix} I_n & R \end{pmatrix} \tag{1}$$

and

$$\left( I_{n+1} \;\middle|\; \begin{array}{cccc} \alpha & 1 & \cdots & 1 \\ -1 & & & \\ \vdots & & R & \\ -1 & & & \end{array} \right) \tag{2}$$

respectively, where $\alpha \in \mathrm{GF}(q)$, $I$ is the identity matrix and $R$ is an $n \times n$ circulant matrix. An $n$ by $n$ circulant matrix has the form

$$\begin{pmatrix} r_0 & r_1 & r_2 & \cdots & r_{n-1} \\ r_{n-1} & r_0 & r_1 & \cdots & r_{n-2} \\ \vdots & \vdots & \vdots & & \vdots \\ r_1 & r_2 & r_3 & \cdots & r_0 \end{pmatrix} \tag{3}$$

so that each successive row is a cyclic shift of the previous one. The codes $P_n(R)$ and $B_n(R)$ are called pure double circulant and bordered double circulant, respectively.

The (Hamming) distance between two codewords $x$ and $y$, denoted by $d(x, y)$, is defined to be the number of places at which $x$ and $y$ differ. The Hamming weight of a codeword is the number of its non-zero components. The minimum distance $d(C)$ of $C$ is defined by $d(C) = \min\{d(x, y) \mid x \neq y \in C\}$, and it also equals the minimum weight of the nonzero codewords of $C$.

Let $C$ be a self-dual code over $\mathrm{GF}(q)$ of length $n$ and minimum distance $d(C)$. Then the following bounds are known in [14, 22, 24, 25]. For binary self-dual codes:

$$d(C) \leq \begin{cases} 4\lfloor n/24 \rfloor + 4, & \text{if } n \not\equiv 22 \pmod{24}, \\ 4\lfloor n/24 \rfloor + 6, & \text{if } n \equiv 22 \pmod{24}. \end{cases}$$

The minimum distance of a self-dual ternary code $C$ satisfies $d(C) \leq 3\lfloor n/12 \rfloor + 3$, and for quaternary Euclidean self-dual codes $d(C) \leq 4\lfloor n/12 \rfloor + 4$. The code $C$ is called extremal if the equality holds. If a code has the highest possible minimum weight for its length and dimension, we call it optimal.

In this paper, we construct a circulant matrix $R$ by eighth power residues and get a necessary condition such that the corresponding codes are self-dual. Further, under this condition, we get two kinds of codes, called the pure eighth power residue double circulant code and the bordered eighth power residue double circulant code, respectively. Some codes have large minimum distances and almost reach the bounds on the minimum distance.

3 Generator Matrices of Eighth Power Residue Double Circulant Self-Dual Codes

Let $p = Nf + 1$ be a prime with a fixed primitive root $g$ of $\mathrm{GF}(p)$. We define the $N$th cyclotomic classes $C_0, C_1, \ldots, C_{N-1}$ of $\mathrm{GF}(p)$ by

$$C_i = \{ g^{jN+i} \mid 0 \leq j \leq f - 1 \},$$

where $0 \leq i \leq N - 1$. Then we call $C_0$ the set of $N$th power residues modulo $p$, and $C_i = g^i C_0$ for $0 \leq i \leq N - 1$.

Define the cyclotomic number $(i, j)$ of order $N$ to be the number of integers $n \pmod p$ which satisfy

$$n \equiv g^{Ns+i}, \quad 1 + n \equiv g^{Nt+j} \pmod p,$$

where $s, t \in \{0, 1, 2, \ldots, f - 1\}$.

In order to give the necessary conditions, we give the eighth power residue cyclotomic numbers and derive the relationships between them when $p$ is an odd prime of the form $16l + 9$.

Lemma 1. [15]: Let $p = ef + 1$ be an odd prime. Then

1) $(i, j)_e = (i', j')_e$ when $i \equiv i' \pmod e$ and $j \equiv j' \pmod e$.

2) $(i, j)_e = (e - i, j - i)_e = \begin{cases} (j, i)_e, & \text{if } f \text{ is even}, \\ (j + \frac{e}{2}, i + \frac{e}{2})_e, & \text{if } f \text{ is odd}. \end{cases}$

3) $\sum_{i=0}^{e-1} (i, j)_e = f - \delta_j$, where $\delta_j = 1$ if $j \equiv 0 \pmod e$; otherwise $\delta_j = 0$.

Let $p$ be a prime of the form $p = 16l + 9$, where $l$ is a positive integer. From Lemma 1, the relationships of cyclotomic numbers of order 8 are

$$\begin{cases}
(0,0)_8 = (4,0)_8 = (4,4)_8, \quad (0,1)_8 = (3,7)_8 = (5,4)_8, \\
(0,2)_8 = (2,6)_8 = (6,4)_8, \quad (0,3)_8 = (1,5)_8 = (7,4)_8, \\
(0,4)_8, \quad (0,5)_8 = (1,4)_8 = (7,3)_8, \\
(0,6)_8 = (2,4)_8 = (6,2)_8, \quad (0,7)_8 = (3,4)_8 = (5,1)_8, \\
(1,0)_8 = (3,3)_8 = (4,1)_8 = (4,5)_8 = (5,0)_8 = (7,7)_8, \\
(1,1)_8 = (3,0)_8 = (4,3)_8 = (4,7)_8 = (5,5)_8 = (7,0)_8, \\
(1,2)_8 = (2,7)_8 = (3,6)_8 = (5,3)_8 = (6,5)_8 = (7,1)_8, \\
(1,3)_8 = (1,6)_8 = (2,5)_8 = (6,3)_8 = (7,2)_8 = (7,5)_8, \\
(1,7)_8 = (2,3)_8 = (3,5)_8 = (5,2)_8 = (6,1)_8 = (7,6)_8, \\
(2,0)_8 = (2,2)_8 = (4,2)_8 = (4,6)_8 = (6,0)_8 = (6,6)_8, \\
(2,1)_8 = (3,1)_8 = (3,2)_8 = (5,6)_8 = (5,7)_8 = (6,7)_8.
\end{cases} \tag{4}$$
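The cyclotomic classes and cyclotomic numbers defined in this section can be checked numerically on a small case. The following is a minimal Python sketch under stated assumptions, not the MATLAB/MAGMA code used for the paper: the helper names `primitive_root`, `cyclotomic_classes` and `cyclotomic_numbers` are hypothetical, and $p = 41 = 16 \cdot 2 + 9$ is chosen only because it is a small prime of the required form. It counts $(i, j)_8$ by brute force and checks a few of the relations in (4) together with item 3) of Lemma 1.

```python
def primitive_root(p):
    """Return the smallest primitive root modulo the prime p (brute force)."""
    for g in range(2, p):
        x, order = g, 1
        while x != 1:            # compute the multiplicative order of g
            x = x * g % p
            order += 1
        if order == p - 1:
            return g
    raise ValueError("no primitive root found")


def cyclotomic_classes(p, N):
    """Classes C_i = { g^(jN+i) : 0 <= j <= f-1 } for p = N*f + 1."""
    assert (p - 1) % N == 0
    g = primitive_root(p)
    f = (p - 1) // N
    return [{pow(g, j * N + i, p) for j in range(f)} for i in range(N)]


def cyclotomic_numbers(p, N):
    """num[i][j] = number of n in C_i with n + 1 in C_j."""
    C = cyclotomic_classes(p, N)
    return [[sum(1 for n in C[i] if (n + 1) % p in C[j]) for j in range(N)]
            for i in range(N)]


p, N = 41, 8                     # illustrative choice: 41 = 16*2 + 9
f = (p - 1) // N                 # f = 5, which is odd
num = cyclotomic_numbers(p, N)

# A few of the relations listed in equation (4).
assert num[0][0] == num[4][0] == num[4][4]
assert num[0][1] == num[3][7] == num[5][4]
assert num[1][0] == num[3][3] == num[4][1] == num[4][5] == num[5][0] == num[7][7]

# Lemma 1, item 3): column sums equal f - delta_j.
for j in range(N):
    assert sum(num[i][j] for i in range(N)) == f - (1 if j == 0 else 0)
```

The relations checked here follow from Lemma 1 alone, so they do not depend on which primitive root the search happens to return.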

Remark 1. For simplicity, in the following we denote

$$\begin{cases}
A := (0,0)_8, \quad B := (0,1)_8, \quad C := (0,2)_8, \\
D := (0,3)_8, \quad E := (0,4)_8, \quad F := (0,5)_8, \\
G := (0,6)_8, \quad H := (0,7)_8, \quad I := (1,0)_8, \\
J := (1,1)_8, \quad K := (1,2)_8, \quad L := (1,3)_8, \\
M := (1,7)_8, \quad N := (2,0)_8, \quad O := (2,1)_8.
\end{cases} \tag{5}$$

Let $p \equiv 1 \pmod 8$ be a prime. Its 8th cyclotomic classes are $C_0, C_1, C_2, C_3, C_4, C_5, C_6$ and $C_7$. Suppose $m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8$ are elements of $\mathrm{GF}(q)$. Then we construct the matrix $C_p(m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8)$, which is a $p \times p$ matrix over $\mathrm{GF}(q)$. Its component $c_{ij}$, $1 \leq i, j \leq p$, is defined by

$$c_{ij} = \begin{cases}
m_0, & \text{if } j = i, \\
m_1, & \text{if } j - i \in C_0, \\
m_2, & \text{if } j - i \in C_1, \\
m_3, & \text{if } j - i \in C_2, \\
m_4, & \text{if } j - i \in C_3, \\
m_5, & \text{if } j - i \in C_4, \\
m_6, & \text{if } j - i \in C_5, \\
m_7, & \text{if } j - i \in C_6, \\
m_8, & \text{if } j - i \in C_7.
\end{cases} \tag{6}$$

Let $I_n$ be the identity matrix and $J_n$ be the all-one square matrix, so that $C_p(1, 0, 0, 0, 0, 0, 0, 0, 0) = I_p$ and $C_p(1, 1, 1, 1, 1, 1, 1, 1, 1) = J_p$. Denote

$$\begin{aligned}
A_1 &:= C_p(0,1,0,0,0,0,0,0,0), & A_2 &:= C_p(0,0,1,0,0,0,0,0,0), \\
A_3 &:= C_p(0,0,0,1,0,0,0,0,0), & A_4 &:= C_p(0,0,0,0,1,0,0,0,0), \\
A_5 &:= C_p(0,0,0,0,0,1,0,0,0), & A_6 &:= C_p(0,0,0,0,0,0,1,0,0), \\
A_7 &:= C_p(0,0,0,0,0,0,0,1,0), & A_8 &:= C_p(0,0,0,0,0,0,0,0,1).
\end{aligned} \tag{7}$$

The $p \times p$ circulant matrix $R$ is constructed as follows:

$$R = m_0 I_p + m_1 A_1 + m_2 A_2 + m_3 A_3 + m_4 A_4 + m_5 A_5 + m_6 A_6 + m_7 A_7 + m_8 A_8. \tag{8}$$

Lemma 2. Let $p = 16l + 9$ be a prime. Then the matrices in equation (7) have the following relationships.

$$\begin{aligned}
& A_1^t = A_5, \quad A_2^t = A_6, \quad A_3^t = A_7, \quad A_4^t = A_8, \\
& A_1^2 = AA_1 + BA_2 + CA_3 + DA_4 + EA_5 + FA_6 + GA_7 + HA_8, \\
& A_2^2 = HA_1 + AA_2 + BA_3 + CA_4 + DA_5 + EA_6 + FA_7 + GA_8, \\
& A_3^2 = GA_1 + HA_2 + AA_3 + BA_4 + CA_5 + DA_6 + EA_7 + FA_8, \\
& A_4^2 = FA_1 + GA_2 + HA_3 + AA_4 + BA_5 + CA_6 + DA_7 + EA_8, \\
& A_5^2 = EA_1 + FA_2 + GA_3 + HA_4 + AA_5 + BA_6 + CA_7 + DA_8, \\
& A_6^2 = DA_1 + EA_2 + FA_3 + GA_4 + HA_5 + AA_6 + BA_7 + CA_8, \\
& A_7^2 = CA_1 + DA_2 + EA_3 + FA_4 + GA_5 + HA_6 + AA_7 + BA_8, \\
& A_8^2 = BA_1 + CA_2 + DA_3 + EA_4 + FA_5 + GA_6 + HA_7 + AA_8, \\
& A_1A_2 = A_2A_1 = IA_1 + JA_2 + KA_3 + LA_4 + FA_5 + DA_6 + LA_7 + MA_8, \\
& A_1A_3 = A_3A_1 = NA_1 + OA_2 + NA_3 + MA_4 + GA_5 + LA_6 + CA_7 + KA_8, \\
& A_1A_4 = A_4A_1 = JA_1 + OA_2 + OA_3 + IA_4 + HA_5 + MA_6 + KA_7 + BA_8, \\
& A_1A_5 = A_5A_1 = (2l+1)I_p + AA_1 + IA_2 + NA_3 + JA_4 + AA_5 + IA_6 + NA_7 + JA_8, \\
& A_1A_6 = A_6A_1 = IA_1 + HA_2 + MA_3 + KA_4 + BA_5 + JA_6 + OA_7 + OA_8, \\
& A_1A_7 = A_7A_1 = NA_1 + MA_2 + GA_3 + LA_4 + CA_5 + KA_6 + NA_7 + OA_8, \\
& A_1A_8 = A_8A_1 = JA_1 + KA_2 + LA_3 + FA_4 + DA_5 + LA_6 + MA_7 + IA_8, \\
& A_2A_3 = A_3A_2 = MA_1 + IA_2 + JA_3 + KA_4 + LA_5 + FA_6 + DA_7 + LA_8, \\
& A_2A_4 = A_4A_2 = KA_1 + NA_2 + OA_3 + NA_4 + MA_5 + GA_6 + LA_7 + CA_8, \\
& A_2A_5 = A_5A_2 = BA_1 + JA_2 + OA_3 + OA_4 + IA_5 + HA_6 + MA_7 + KA_8, \\
& A_2A_6 = A_6A_2 = (2l+1)I_p + JA_1 + AA_2 + IA_3 + NA_4 + JA_5 + AA_6 + IA_7 + NA_8, \\
& A_2A_7 = A_7A_2 = OA_1 + IA_2 + HA_3 + MA_4 + KA_5 + BA_6 + JA_7 + OA_8, \\
& A_2A_8 = A_8A_2 = OA_1 + NA_2 + MA_3 + GA_4 + LA_5 + CA_6 + KA_7 + NA_8, \\
& A_3A_4 = A_4A_3 = LA_1 + MA_2 + IA_3 + JA_4 + KA_5 + LA_6 + FA_7 + DA_8, \\
& A_3A_5 = A_5A_3 = CA_1 + KA_2 + NA_3 + OA_4 + NA_5 + MA_6 + GA_7 + LA_8, \\
& A_3A_6 = A_6A_3 = KA_1 + BA_2 + JA_3 + OA_4 + OA_5 + IA_6 + HA_7 + MA_8, \\
& A_3A_7 = A_7A_3 = (2l+1)I_p + NA_1 + JA_2 + AA_3 + IA_4 + NA_5 + JA_6 + AA_7 + IA_8, \\
& A_3A_8 = A_8A_3 = OA_1 + OA_2 + IA_3 + HA_4 + MA_5 + KA_6 + BA_7 + JA_8, \\
& A_4A_5 = A_5A_4 = DA_1 + LA_2 + MA_3 + IA_4 + JA_5 + KA_6 + LA_7 + FA_8, \\
& A_4A_6 = A_6A_4 = LA_1 + CA_2 + KA_3 + NA_4 + OA_5 + NA_6 + MA_7 + GA_8, \\
& A_4A_7 = A_7A_4 = MA_1 + KA_2 + BA_3 + JA_4 + OA_5 + OA_6 + IA_7 + HA_8, \\
& A_4A_8 = A_8A_4 = (2l+1)I_p + IA_1 + NA_2 + JA_3 + AA_4 + IA_5 + NA_6 + JA_7 + AA_8, \\
& A_5A_6 = A_6A_5 = FA_1 + DA_2 + LA_3 + MA_4 + IA_5 + JA_6 + KA_7 + LA_8, \\
& A_5A_7 = A_7A_5 = GA_1 + LA_2 + CA_3 + KA_4 + NA_5 + OA_6 + NA_7 + MA_8, \\
& A_5A_8 = A_8A_5 = HA_1 + MA_2 + KA_3 + BA_4 + JA_5 + OA_6 + OA_7 + IA_8, \\
& A_6A_7 = A_7A_6 = LA_1 + FA_2 + DA_3 + LA_4 + MA_5 + IA_6 + JA_7 + KA_8, \\
& A_6A_8 = A_8A_6 = MA_1 + GA_2 + LA_3 + CA_4 + KA_5 + NA_6 + OA_7 + NA_8, \\
& A_7A_8 = A_8A_7 = KA_1 + LA_2 + FA_3 + DA_4 + LA_5 + MA_6 + IA_7 + JA_8.
\end{aligned} \tag{9}$$

Proof. The proof is straightforward from the definition of $A_i$ and Lemma 1.

Lemma 3. If $p = 16l + 9$ is a prime, then

$$RR^t = \alpha_0 I_p + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3 + \alpha_4 A_4 + \alpha_5 A_5 + \alpha_6 A_6 + \alpha_7 A_7 + \alpha_8 A_8 \tag{10}$$

where

$$\begin{aligned}
\alpha_0 ={}& m_0^2 + \tfrac{p-1}{8}(m_1^2 + m_2^2 + m_3^2 + m_4^2 + m_5^2 + m_6^2 + m_7^2 + m_8^2), \\
\alpha_1 = \alpha_5 ={}& (m_0m_1 + m_0m_5) + (m_1^2 + m_1m_5 + m_5^2)A \\
&+ (m_1m_2 + m_4m_8 + m_5m_6)B + (m_1m_3 + m_3m_7 + m_5m_7)C \\
&+ (m_1m_4 + m_2m_6 + m_5m_8)D + m_1m_5E \\
&+ (m_1m_6 + m_2m_5 + m_4m_8)F + (m_1m_7 + m_3m_5 + m_3m_7)G \\
&+ (m_1m_8 + m_2m_6 + m_4m_5)H \\
&+ (m_1m_2 + m_1m_6 + m_2m_5 + m_4^2 + m_5m_6 + m_8^2)I \\
&+ (m_1m_4 + m_1m_8 + m_2^2 + m_4m_5 + m_5m_8 + m_6^2)J \\
&+ (m_2m_3 + m_2m_8 + m_3m_8 + m_4m_6 + m_4m_7 + m_6m_7)K \\
&+ (m_2m_4 + m_2m_7 + m_3m_6 + m_3m_8 + m_4m_7 + m_6m_8)L \\
&+ (m_2m_7 + m_2m_8 + m_3m_4 + m_3m_6 + m_4m_6 + m_7m_8)M \\
&+ (m_1m_3 + m_1m_7 + m_3^2 + m_3m_5 + m_5m_7 + m_7^2)N \\
&+ (m_2m_3 + m_2m_4 + m_3m_4 + m_6m_7 + m_6m_8 + m_7m_8)O, \\
\alpha_2 = \alpha_6 ={}& (m_0m_2 + m_0m_6) + (m_2^2 + m_2m_6 + m_6^2)A \\
&+ (m_1m_5 + m_2m_3 + m_6m_7)B + (m_2m_4 + m_4m_8 + m_6m_8)C \\
&+ (m_1m_6 + m_2m_5 + m_3m_7)D + m_2m_6E \\
&+ (m_1m_5 + m_2m_7 + m_3m_6)F + (m_2m_8 + m_4m_6 + m_4m_8)G \\
&+ (m_1m_2 + m_3m_7 + m_5m_6)H \\
&+ (m_1^2 + m_2m_3 + m_2m_7 + m_3m_6 + m_5^2 + m_6m_7)I \\
&+ (m_1m_2 + m_1m_6 + m_2m_5 + m_3^2 + m_5m_6 + m_7^2)J \\
&+ (m_1m_3 + m_1m_4 + m_3m_4 + m_5m_7 + m_5m_8 + m_7m_8)K \\
&+ (m_1m_4 + m_1m_7 + m_3m_5 + m_3m_8 + m_4m_7 + m_5m_8)L \\
&+ (m_1m_3 + m_1m_8 + m_3m_8 + m_4m_5 + m_4m_7 + m_5m_7)M \\
&+ (m_2m_4 + m_2m_8 + m_4^2 + m_4m_6 + m_6m_8 + m_8^2)N \\
&+ (m_1m_7 + m_1m_8 + m_3m_4 + m_3m_5 + m_4m_5 + m_7m_8)O, \\
\alpha_3 = \alpha_7 ={}& (m_0m_3 + m_0m_7) + (m_3^2 + m_3m_7 + m_7^2)A \\
&+ (m_2m_6 + m_3m_4 + m_7m_8)B + (m_1m_5 + m_1m_7 + m_3m_5)C \\
&+ (m_2m_7 + m_3m_6 + m_4m_8)D + m_3m_7E \\
&+ (m_2m_6 + m_3m_8 + m_4m_7)F + (m_1m_3 + m_1m_5 + m_5m_7)G \\
&+ (m_2m_3 + m_4m_8 + m_6m_7)H \\
&+ (m_2^2 + m_3m_4 + m_3m_8 + m_4m_7 + m_6^2 + m_7m_8)I \\
&+ (m_2m_3 + m_2m_7 + m_3m_6 + m_4^2 + m_6m_7 + m_8^2)J \\
&+ (m_1m_6 + m_1m_8 + m_2m_4 + m_2m_5 + m_4m_5 + m_6m_8)K \\
&+ (m_1m_4 + m_1m_6 + m_2m_5 + m_2m_8 + m_4m_6 + m_5m_8)L \\
&+ (m_1m_2 + m_1m_4 + m_2m_4 + m_5m_6 + m_5m_8 + m_6m_8)M \\
&+ (m_1^2 + m_1m_3 + m_1m_7 + m_3m_5 + m_5^2 + m_5m_7)N \\
&+ (m_1m_2 + m_1m_8 + m_2m_8 + m_4m_5 + m_4m_6 + m_5m_6)O,
\end{aligned}$$

\[
\begin{aligned}
\alpha_4 = \alpha_8 ={}& (m_0m_4 + m_0m_8) + (m_4^2 + m_4m_8 + m_8^2)A\\
&+ (m_1m_8 + m_3m_7 + m_4m_5)B + (m_2m_6 + m_2m_8 + m_4m_6)C\\
&+ (m_1m_5 + m_3m_8 + m_4m_7)D + m_4m_8E\\
&+ (m_1m_4 + m_3m_7 + m_5m_8)F + (m_2m_4 + m_2m_6 + m_6m_8)G\\
&+ (m_1m_5 + m_3m_4 + m_7m_8)H\\
&+ (m_1m_4 + m_1m_8 + m_3^2 + m_4m_5 + m_5m_8 + m_7^2)I\\
&+ (m_1^2 + m_3m_4 + m_3m_8 + m_4m_7 + m_5^2 + m_7m_8)J\\
&+ (m_1m_2 + m_1m_7 + m_2m_7 + m_3m_5 + m_3m_6 + m_5m_6)K\\
&+ (m_1m_3 + m_1m_6 + m_2m_5 + m_2m_7 + m_3m_6 + m_5m_7)L\\
&+ (m_1m_6 + m_1m_7 + m_2m_3 + m_2m_5 + m_3m_5 + m_6m_7)M\\
&+ (m_2^2 + m_2m_4 + m_2m_8 + m_4m_6 + m_6^2 + m_6m_8)N\\
&+ (m_1m_2 + m_1m_3 + m_2m_3 + m_5m_6 + m_5m_7 + m_6m_7)O.
\end{aligned}
\]

Proof. The result comes from Lemma 2, Lemma 3 and a complex computation.

For convenience, we denote

\[
\begin{aligned}
\vec{m} &:= (m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8) \in GF(q)^9,\\
D_0(\vec{m}) &:= \alpha_0,\\
D_1(\vec{m}) &:= \alpha_1 = \alpha_5,\\
D_2(\vec{m}) &:= \alpha_2 = \alpha_6,\\
D_3(\vec{m}) &:= \alpha_3 = \alpha_7,\\
D_4(\vec{m}) &:= \alpha_4 = \alpha_8.
\end{aligned}
\tag{11}
\]

Theorem 1. Let p be an odd prime of the form 16l + 9 and q be a prime power. Suppose $\alpha \in GF(q)$, $\vec{m} \in GF(q)^9$. Then

(1) the pure eighth power residue double circulant code $P_p(\vec{m})$ is self-dual over GF(q) when the following conditions hold:

\[
\begin{cases}
D_0(\vec{m}) = -1,\\
D_1(\vec{m}) = 0,\\
D_2(\vec{m}) = 0,\\
D_3(\vec{m}) = 0,\\
D_4(\vec{m}) = 0.
\end{cases}
\tag{12}
\]

(2) the bordered eighth power residue double circulant code $B_p(\alpha, \vec{m})$ is self-dual over GF(q) when the following conditions hold:

\[
\begin{cases}
\alpha^2 + p = -1,\\
-\alpha + m_0 + \frac{p-1}{8}(m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8) = 0,\\
D_0(\vec{m}) = -2,\\
D_1(\vec{m}) = -1,\\
D_2(\vec{m}) = -1,\\
D_3(\vec{m}) = -1,\\
D_4(\vec{m}) = -1.
\end{cases}
\tag{13}
\]

Proof. According to Lemma 3,

\[
\begin{aligned}
P_p(\vec{m})P_p(\vec{m})^t ={}& I_p + D_0(\vec{m})I_p + D_1(\vec{m})A_1 + D_2(\vec{m})A_2 + D_3(\vec{m})A_3 + D_4(\vec{m})A_4\\
&+ D_1(\vec{m})A_5 + D_2(\vec{m})A_6 + D_3(\vec{m})A_7 + D_4(\vec{m})A_8,
\end{aligned}
\tag{14}
\]

and

\[
B_p(\alpha, \vec{m})B_p(\alpha, \vec{m})^t = (I_{p+1} \;\; K)\begin{pmatrix} I_{p+1} \\ K^t \end{pmatrix} = I_{p+1} + KK^t,
\tag{15}
\]

where

\[
KK^t =
\begin{pmatrix}
\alpha & 1 & \cdots & 1\\
-1 & & & \\
\vdots & & R & \\
-1 & & &
\end{pmatrix}
\cdot
\begin{pmatrix}
\alpha & -1 & \cdots & -1\\
1 & & & \\
\vdots & & R^t & \\
1 & & &
\end{pmatrix}
=
\begin{pmatrix}
\alpha^2 + p & S & \cdots & S\\
S & & & \\
\vdots & & X & \\
S & & &
\end{pmatrix}_{(p+1)\times(p+1)}
\tag{16}
\]

and

\[
\begin{aligned}
X ={}& J_p + D_0(\vec{m})I_p + D_1(\vec{m})A_1 + D_2(\vec{m})A_2 + D_3(\vec{m})A_3 + D_4(\vec{m})A_4\\
&+ D_1(\vec{m})A_5 + D_2(\vec{m})A_6 + D_3(\vec{m})A_7 + D_4(\vec{m})A_8,\\
S ={}& -\alpha + m_0 + \frac{p-1}{8}(m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8).
\end{aligned}
\tag{17}
\]

The result can be obtained by the definition of self-dual codes.

4 Eighth Power Residue Double Circulant Self-Dual Codes Over GF(2) and GF(4)

In this section, we give some constructions of self-dual codes over GF(2) and GF(4) obtained with MATLAB and MAGMA, and the corresponding minimum Hamming distances are computed. Some codes have good minimum distances, and some almost meet the bounds.

Theorem 2. Let p be an odd prime of the form 16l + 9. Several pure eighth power residue double circulant self-dual codes of length 2p over GF(2), whose generator matrix has the form $P_p(m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8)$, are obtained. The parameters that satisfy the conditions are listed in the following table when p = 41 = 16 × 2 + 9.

When p = 41, the length n of the code is 82, so that the bound of the minimum Hamming distance is 16. By our method, the minimum Hamming distance of the codes has a maximum of 14, which almost satisfies the bound. The self-dual [82, 41, 14] codes over GF(2) with a good property are obtained.

Theorem 3. Let ξ be the fixed primitive element of GF(4) satisfying $\xi^2 + \xi + 1 = 0$ and p be an odd prime of the form 16l + 9. Pure eighth power residue double circulant self-dual codes $P_p(\vec{m})$ of length 2p over GF(4) and bordered eighth power residue double circulant codes $B_p(\alpha, \vec{m})$ of length 2(p + 1) over GF(4) can be obtained. Furthermore, it is obvious that the equation $\alpha^2 + p = -1$ holds if and only if α = 0, because p is an odd prime. The parameter values except α that meet the conditions are listed in the following tables when p = 41 = 16 × 2 + 9 and p = 73 = 16 × 4 + 9.
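The block structure of $KK^t$ in (16) and the value of S in (17) can be sanity-checked numerically. The following is a minimal sketch over the integers (so it holds after reduction in any GF(q)), assuming the classes $C_t$ are generated by the smallest primitive root modulo p; that choice, and all function names, are assumptions of this sketch and not taken from the paper. The quantities checked here do not depend on which primitive root is used.

```python
def primitive_root(p):
    """Smallest primitive root modulo a prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")


def circulant(p, m, order=8):
    """R = C_p(m0, ..., m8): entry (i, j) is m0 if j == i, else m_{t+1} with j - i in C_t."""
    g = primitive_root(p)
    label = {0: 0}
    for k in range(p - 1):
        label[pow(g, k, p)] = k % order + 1   # g^k lies in class C_{k mod 8}
    return [[m[label[(j - i) % p]] for j in range(p)] for i in range(p)]


def border_block(p, alpha, m):
    """K from (16): first row (alpha, 1, ..., 1), first column -1 below alpha, then R."""
    R = circulant(p, m)
    return [[alpha] + [1] * p] + [[-1] + row for row in R]


def gram(K):
    """K K^t computed with plain integer arithmetic."""
    return [[sum(a * b for a, b in zip(r, s)) for s in K] for r in K]


def check_identities(p, alpha, m):
    """Compare K K^t with the corner, border and diagonal predicted by (16)-(17)."""
    G = gram(border_block(p, alpha, m))
    S = -alpha + m[0] + (p - 1) // 8 * sum(m[1:])
    alpha0 = m[0] ** 2 + (p - 1) // 8 * sum(x * x for x in m[1:])
    corner_ok = G[0][0] == alpha ** 2 + p
    border_ok = all(G[0][j] == S and G[j][0] == S for j in range(1, p + 1))
    diag_ok = all(G[j][j] == 1 + alpha0 for j in range(1, p + 1))  # diagonal of X
    return corner_ok and border_ok and diag_ok
```

For example, `check_identities(41, 3, (2, 1, 0, 5, -1, 4, 7, 3, 6))` exercises the identities with arbitrary integer parameters; the off-diagonal entries of X are not checked because they involve the cyclotomic numbers.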

Table 1: The parameters of Pp over GF(2) with p = 41

Serial number   m0   m1   m2   m3   m4   m5   m6   m7   m8   Min-distance
1               0    0    0    1    1    1    0    1    1    10
2               0    0    1    1    0    0    1    1    1    10
3               0    1    0    0    1    1    1    0    1    10
4               0    1    1    0    1    1    0    0    1    10
5               1    0    0    0    1    0    1    1    1    14
6               1    0    0    1    0    1    1    1    0    14
7               1    0    0    1    1    0    1    0    1    12
8               1    0    1    0    1    0    0    1    1    12
9               1    0    1    1    1    0    0    0    1    14
10              1    1    1    0    1    0    1    0    0    12
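A parameter vector such as those in Table 1 can be tested for self-duality directly from the definition, by checking that $I_p + RR^t$ vanishes over GF(2). The sketch below is a hedged illustration, not the authors' MATLAB/MAGMA code. It assumes the classes $C_t$ are indexed via the smallest primitive root modulo p, which this excerpt does not specify; a different primitive root permutes the class labels, so `relabel` (a hypothetical helper) tries the four possible labellings. `binary_self_dual_bound` is the well-known bound 4⌊n/24⌋ + 4 (or + 6 when n ≡ 22 mod 24) for binary self-dual codes, which gives the value 16 quoted for n = 82.

```python
def primitive_root(p):
    """Smallest primitive root modulo a prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found")


def circulant(p, m, order=8):
    """R = C_p(m0, ..., m8): entry (i, j) is m0 if j == i, else m_{t+1} with j - i in C_t."""
    g = primitive_root(p)
    label = {0: 0}
    for k in range(p - 1):
        label[pow(g, k, p)] = k % order + 1   # g^k lies in class C_{k mod 8}
    return [[m[label[(j - i) % p]] for j in range(p)] for i in range(p)]


def is_pure_self_dual_gf2(p, m):
    """True when I_p + R R^t is zero mod 2, i.e. the code generated by (I_p | R) is self-dual."""
    R = circulant(p, m)
    for i in range(p):
        for j in range(p):
            dot = sum(R[i][k] * R[j][k] for k in range(p)) + (i == j)
            if dot % 2:
                return False
    return True


def binary_self_dual_bound(n):
    """Upper bound on the minimum distance of a binary self-dual code of length n."""
    return 4 * (n // 24) + (6 if n % 24 == 22 else 4)


def relabel(m, c):
    """Permute m1..m8 as if the classes were indexed by another primitive root (c odd)."""
    out = [m[0]] + [0] * 8
    for t in range(8):
        out[(c * t) % 8 + 1] = m[t + 1]
    return tuple(out)


if __name__ == "__main__":
    row5 = (1, 0, 0, 0, 1, 0, 1, 1, 1)      # Table 1, serial number 5
    print(binary_self_dual_bound(82))        # 16
    # One result per candidate labelling of the classes.
    print([is_pure_self_dual_gf2(41, relabel(row5, c)) for c in (1, 3, 5, 7)])
```

The check is cubic in p, which is acceptable for p = 41; since $RR^t$ is circulant, comparing only the first row against all rows would be enough for larger primes.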

Table 2: The parameters of Pp over GF(4) with p = 41

Serial number   m0    m1    m2    m3    m4    m5    m6    m7    m8    Min-distance
1               1     1     1     1     ξ     1     1     1     ξ^2   14
2               1     1     ξ     1     ξ^2   0     0     ξ^2   ξ     12
3               1     1     0     0     0     1     0     1     1     14
4               ξ     1     ξ     1     0     0     1     ξ^2   ξ^2   14
5               0     ξ     0     ξ     ξ^2   ξ     ξ^2   ξ^2   0     14

Table 3: The parameters of Bp over GF(4) with p = 41

Serial number   m0    m1    m2    m3    m4    m5    m6    m7    m8    Min-distance
1               1     1     1     ξ     1     0     ξ     ξ     ξ     12
2               1     1     ξ     1     ξ^2   ξ^2   ξ     ξ^2   ξ     8
3               1     1     ξ^2   1     0     ξ^2   ξ^2   ξ^2   1     8
4               ξ^2   1     0     0     ξ     ξ^2   ξ     1     0     12
5               0     1     ξ^2   ξ     0     ξ     0     ξ^2   1     12
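The bordered codes above do not list α because Theorem 3 forces α = 0 over GF(4). A small sketch of that check follows; the integer encoding of GF(4) (0, 1, 2 = ξ, 3 = ξ^2) and the function names are assumptions of this example. In characteristic 2, addition is XOR, −1 equals 1, and an odd prime p reduces to 1.

```python
def gf4_add(a, b):
    """Addition in GF(4) = GF(2)[x] / (x^2 + x + 1): coefficient-wise XOR."""
    return a ^ b


def gf4_mul(a, b):
    """Multiplication in GF(4): carry-less product reduced modulo x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:          # an x^2 term is present, replace it with x + 1
        r ^= 0b111
    return r


def border_alpha_solutions(p):
    """All alpha in GF(4) with alpha^2 + p = -1, for an odd prime p."""
    p_in_field = p % 2     # p copies of 1 added together
    minus_one = 1          # -1 = 1 in characteristic 2
    return [a for a in range(4)
            if gf4_add(gf4_mul(a, a), p_in_field) == minus_one]


if __name__ == "__main__":
    print(border_alpha_solutions(41))   # [0]
    print(border_alpha_solutions(73))   # [0]
```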

Table 4: The parameters of Pp over GF(4) with p = 73

Serial number   m0    m1    m2    m3    m4    m5    m6    m7    m8    Min-distance
1               1     1     ξ     ξ     0     1     ξ^2   ξ^2   0     12
2               1     1     ξ     0     ξ^2   1     ξ^2   0     ξ     12
3               0     ξ     ξ     ξ^2   0     ξ^2   ξ^2   ξ     0     6
4               1     1     0     ξ^2   ξ     1     0     ξ     ξ^2   12
5               1     ξ^2   ξ^2   ξ^2   ξ^2   ξ     ξ     ξ     ξ     12

Table 5: The parameters of Bp over GF(4) with p = 73

Serial number   m0    m1    m2    m3    m4    m5    m6    m7    m8    Min-distance
1               1     1     ξ^2   ξ^2   ξ     1     ξ     ξ     ξ^2   8
2               0     1     ξ     0     ξ     1     ξ^2   0     ξ^2   12
3               0     1     0     ξ     ξ     1     0     ξ^2   ξ^2   12
4               0     ξ^2   1     0     ξ^2   ξ^2   1     0     ξ     12
5               0     ξ     ξ     1     0     ξ^2   ξ^2   1     0     12

The pure double circulant self-dual [82, 41, 14] codes and bordered double circulant self-dual [84, 42, 12] codes over GF(4), which have good properties, are listed, together with the values of the parameters m0, m1, m2, m3, m4, m5, m6, m7, m8. Besides, we get some other codes when p = 73.

5 Conclusion

In this paper, we construct double circulant self-dual codes by higher power residues, especially eighth power residues.

First of all, the relationship of the eighth power residue cyclotomic numbers is given. Suppose that there are eight matrices with nine parameters over GF(q), and the expression for multiplying any two matrices is represented by the cyclotomic numbers. From the linear combination of eight circulant matrices, we can construct the circulant matrix R. Two kinds of codes are represented by R. One is pure circulant codes, and the other is bordered circulant codes. Combined with the necessary condition of self-dual codes (GG^T = 0), parameters can be determined to satisfy the condition of self-dual codes, so that the pure double circulant self-dual codes and bordered circulant self-dual codes can be obtained. By programming, the parameters that satisfy the conditions and the minimum Hamming distances are given.

We exploit a new way to construct self-dual codes over GF(2) and GF(4) by primes p of the form 16l + 9, and some codes have good properties. Examples of such codes are the binary self-dual [82, 41, 14] code, the quaternary self-dual [82, 41, 14] code and the quaternary self-dual [84, 42, 12] code.

Acknowledgments

The work is financially supported by National Natural Science Foundation of China (No. 61902429, No. 11775306), Shandong Provincial Natural Science Foundation of China (No. ZR2017MA001, ZR2019MF070), Fundamental Research Funds for the Central Universities (No. 19CX02058A, No. 17CX02030A), the Open Research Fund from Shandong provincial Key Laboratory of Computer Networks, Grant No. SDKLCN-2017-03, Key Laboratory of Applied Mathematics of Fujian Province University (Putian University) (No. SX201702, No. SX201806), and International Cooperation Exchange Fund of China University of Petroleum (UPCIEF2019020).

References

[1] K. T. Arasu and T. A. Gulliver, "Self-dual codes over Fp and weighing matrices," IEEE Transactions on Information Theory, vol. 47, no. 5, pp. 2051-2055, 2001.
[2] E. R. Berlekamp, F. J. MacWilliams and N. J. A. Sloane, "Gleason's theorem on self-dual codes," IEEE Transactions on Information Theory, vol. 18, pp. 409-414, 1972.
[3] S. Buyuklieva, "On the binary self-dual codes with an automorphism of order 2," Designs Codes & Cryptography, vol. 12, no. 1, pp. 39-48, 1997.
[4] S. Bouyuklieva and I. Bouyukliev, "An algorithm for classification of binary self-dual codes," IEEE Transactions on Information Theory, vol. 58, no. 6, pp. 3933-3940, 2012.
[5] J. H. Conway and V. Pless, "On the enumeration of self-dual codes," Journal of Combinatorial Theory, vol. 28, no. 1, pp. 26-53, 1980.
[6] J. H. Conway and N. J. A. Sloane, "A new upper bound on the minimal distance of self-dual codes," IEEE Transactions on Information Theory, vol. 36, no. 6, pp. 1319-1333, 1990.
[7] C. Ding, T. Helleseth, and W. Shan, "On the linear complexity of Legendre sequences," IEEE Transactions on Information Theory, vol. 44, no. 3, pp. 1276-1278, 1998.
[8] S. T. Dougherty, J. L. Kim and P. Sole, "Double circulant codes from two class association schemes," Advances in Mathematics of Communications, vol. 1, no. 1, pp. 45-64, 2007.
[9] S. T. Dougherty, J. Gildea, A. Korban, A. Kaya, A. Tylyshchak and B. Yildiz, "Bordered constructions of self-dual codes from group rings and new extremal binary self-dual codes," Finite Fields and Their Applications, vol. 57, pp. 108-127, 2019.
[10] P. Gaborit, "Quadratic double circulant codes over fields," Journal of Combinatorial Theory, Series A, vol. 97, no. 1, pp. 85-107, 2002.
[11] M. Harada, M. Kiermaier, A. Wassermann, et al., "New binary singly even self-dual codes," IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1612-1617, 2010.
[12] T. Helleseth, "Double circulant quadratic residue codes," IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 2154-2155, 2004.
[13] W. Huffman, "On the classification and enumeration of self-dual codes," Finite Fields and Their Applications, vol. 11, pp. 451-490, 2005.
[14] W. C. Huffman, R. A. Brualdi and V. S. Pless, Handbook of Coding Theory, 1998.
[15] K. Ireland and M. Rosen, "Gauss and Jacobi sums," Mathematical Gazette, vol. 84, pp. 75-92, 1998.
[16] M. Karlin, "New binary coding results by circulants," IEEE Transactions on Information Theory, vol. 15, no. 1, pp. 81-92, 1969.
[17] A. Kaya, B. Yildiz and I. Siap, "New extremal binary self-dual codes from F4 + uF4-lifts of quadratic circulant codes over F4," Finite Fields and Their Applications, vol. 35, pp. 318-329, 2015.
[18] A. Kaya, B. Yildiz, "Various constructions for self-dual codes over rings and new binary self-dual codes," Discrete Mathematics, vol. 339, pp. 460-469, 2016.
[19] A. Kaya, "New extremal binary self-dual codes of lengths 64 and 66 from R2-lifts," Finite Fields and Their Applications, vol. 46, pp. 271-279, 2017.
[20] A. Kaya, B. Yildiz and I. Siap, "New extremal binary self-dual codes of length 68 from quadratic residue codes over F2 + uF2 + u^2F2," Finite Fields and Their Applications, vol. 29, pp. 160-177, 2014.
[21] A. Kaya, B. Yildiz and I. Siap, "New extremal binary self-dual codes of length 68 from quadratic residue codes over F2 + uF2 + u^2F2," Finite Fields and Their Applications, vol. 29, pp. 160-177, 2014.
[22] C. L. Mallows and N. J. A. Sloane, "An upper bound for self-dual codes," Information & Control, vol. 22, no. 2, pp. 188-200, 1973.
[23] R. Meng, T. Yan, "New constructions of binary interleaved sequences with low autocorrelation," International Journal of Network Security, vol. 19, no. 4, pp. 546-550, 2017.
[24] V. Pless and N. J. A. Sloane, "On the classification and enumeration of self-dual codes," Journal of Combinatorial Theory, vol. 18, no. 3, pp. 313-335, 1975.
[25] E. M. Rains, "Shadow bounds for self-dual codes," IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 134-139, 1998.
[26] C. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 1948.
[27] S. Zhang, T. Yan, Y. Sun, L. Wang, "Linear complexity of two classes of binary interleaved sequences with low autocorrelation," International Journal of Network Security, 2019. (http://ijns.jalaxy.com.tw/contents/ijns-v22-n6/ijns-2020-v22-n6-p834-0.pdf)
[28] N. J. A. Sloane and J. G. Thompson, "Cyclic self-dual codes," IEEE Transactions on Information Theory, vol. 29, pp. 364-366, 1983.
[29] N. Yankov, M. Ivanova and M. H. Lee, "Self-dual codes with an automorphism of order 7 and s-extremal codes of length 68," Finite Fields and Their Applications, vol. 51, pp. 17-30, 2018.
[30] T. Zhang and G. Ge, "Fourth power residue double circulant self-dual codes," IEEE Transactions on Information Theory, vol. 61, no. 8, pp. 4243-4252, 2015.

Biography

Changsong Jiang was born in 1997 in Sichuan Province of China. He graduated from China University of Petroleum in 2019. He is currently studying for a postgraduate degree at University of Electronic Science and Technology of China. Email: [email protected]

Yuhua Sun was born in 1979. She graduated from Shandong Normal University, China, in 2001. In 2004, she received the M.S. degree in mathematics from Tongji University, Shanghai, and she later received a Ph.D. in Cryptography from Xidian University. She is currently a lecturer at China University of Petroleum. Her research interests include cryptography, coding and information theory. Email: sunyuha [email protected]

Xueting Liang was born in 1997 in Anhui Province of China. She is studying at China University of Petroleum. Email: [email protected]

International Journal of Network Security, Vol.22, No.5, PP.743-751, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).04) 743

Identity-based Public Key Cryptographic Primitive with Delegated Equality Test Against Insider Attack in Cloud Computing

Seth Alornyo1,2, Acheampong Edward Mensah1, and Abraham Opanfo Abbam1 (Corresponding author: Seth Alornyo)

School of Information and Software Engineering, University of Electronic Science and Technology of China1 4 E 2nd Section, 1st Ring Rd, Jianshe Road, Chenghua, Chengdu, Sichuan, China Computer Science Department, Koforidua Technical University, Koforidua-Ghana2 (Email: [email protected]) (Received Mar. 3, 2019; Revised and Accepted Oct. 3, 2019; First Online Jan. 23, 2020)

Abstract

Attacks perpetrated by an insider (the cloud server) are a major concern in this era of cloud computing and data analytics. When a cloud server is delegated with certain responsibilities, it can exploit users' encrypted data for profit. The cloud server takes advantage of its authorization to launch what we refer to as the insider attack. We put forward a new improved identity-based public key cryptographic primitive which integrates a delegated equality test to resist the insider attack in cloud computing. Our scheme resists the insider attack perpetrated by the cloud server (insider). We refer to our new scheme as identity-based public key cryptographic primitive with delegated equality test against insider attack in cloud computing (IB-PKC-DETIA). We construct our scheme using a witness-based cryptographic primitive with an added pairing operation. Our scheme achieves weak indistinguishable identity chosen ciphertext (W-IND-ID-CCA) security in the random oracle model.

Keywords: Identity Based Encryption; Insider Attack; Witness Based Encryption

1 Introduction

The concept of searchable public key encryption which integrates keyword search (PEKS) was unveiled by [2]. Since then, there have been several works on this cryptographic primitive, where a search on ciphertexts allows a third party to search over encrypted data without revealing any information about the ciphertext. A public key cryptographic primitive with equality test was unveiled by [18] and was used to manage encrypted data for clients. Recently, [11] proposed an identity-based cryptosystem with integrated equality test (IBE-ET) in cloud computing, which enables the cloud server to verify whether two ciphertexts from user A and user B are encryptions of the same message.

However, there has been a recent attack perpetrated by an adversary who is able to launch what is referred to as the insider attack [15]. In this era of cloud computing, equality test functions are outsourced to a cloud server to examine whether two ciphertexts are encryptions of the same message [4]. Such a delegated responsibility gives the cloud server the leverage to launch the insider attack on users' ciphertexts. When successful, this attack enables the cloud server to exploit encrypted data for economic gain. If the cloud server has legitimate access to all users' ciphertexts and can test their equality, then the cloud server (insider) should be prevented from exploiting users' ciphertexts. Recent schemes on the insider attack have not been able to fully solve this problem.

Recent work on the insider attack by [15] and a security analysis and modification by [19] enable the user to generate a token $tok_{ID}$ to prevent the tester from launching the insider attack. Their scheme nevertheless remains susceptible to the insider attack, because it does not consider a cloud server that is delegated to perform the equality test. When the cloud server is delegated to perform the equality test on users' ciphertexts, it could guess the token $tok_{ID}$ to launch the insider attack. In the insider attack resistance scheme proposed by [15] and the security analysis and modification by [19], the tester, after receiving the secret trapdoor, can successfully guess the token $tok_{ID}$ and launch the attack as follows:

1) The cloud server (insider) receives a valid trapdoor Td and tries to find out the message m and the token from Td.

2) The cloud server (insider) computes an IB-PKC-DETIA ciphertext C of a guessed message $m'$ and a guessed token $tok'_{ID}$.

3) The cloud server (insider) checks whether $Test(C, Td, tok_{ID}) = 1$. The equation holds if the adversary's (cloud server's) guess of the message $m'$ and the token $tok'_{ID}$ is successful. Otherwise, go back to Step 2.

Therefore, if the guess of a message and a token is successful as indicated above, then the cloud server (insider) could launch the insider attack because of the delegated responsibility. Other works on IBE-ET [11] and a security modification by [17] assumed that it is possible to resist the insider attack. On the contrary, when the cloud server is delegated to conduct the equality test, it is possible for the adversary to launch the insider attack. Therefore, we propose a new improved scheme to resist the insider attack perpetrated by an adversary (cloud server) delegated to undertake the equality or equivalence test on ciphertexts.

1.1 Related Work

Boneh et al. [2] first unveiled the PEKS primitive, which was later examined in [11] in a study of off-line keyword guessing attacks on recent keyword search schemes. Their work showed that the PEKS scheme was vulnerable to the insider attack. A related work on a delegated tester was later unveiled by [14], whereby only the designated tester (server) could perform the equality test on the ciphertext. Their scheme concentrated on the security of the trapdoor in PEKS and on resistance to the insider attack. Chen et al. [9,17] also proposed a new general framework for secure public key cryptosystems with keyword search and a dual-server public key primitive with keyword search for secure cloud storage, which resist the insider attack as long as there is no collusion between the two servers [9]. However, keyword guessing attacks by the insider remained a challenging problem in PEKS until recently, when [12] proposed the notion of a witness-based searchable cryptographic primitive to resist the insider attack in PEKS.

A special type of searchable encryption was unveiled in [18] for a general equality test. Later, [11] introduced the identity-based cryptosystem with equality test (IBE-ET) in cloud computing, which integrated the identity-based primitive into the public key cryptosystem with equality test [2] and gains the advantages of the equality test in [18]. In their construction, search functions were delegated to the service provider.

Existing works on insider attacks mainly focused on PEKS schemes in [8,19], while the few works on PKEET extensions [7,11,13,15,18] and IBE-ET applications in [19] were not resistant to the insider attack. To solve the problem of the insider attack in IBE-ET, an ID-based primitive with equality test against the insider attack was recently put forward by Wu et al. [17]. They claimed that their scheme achieves confidentiality in IBE-ET, but the work of Lee et al. [9] refuted their claim of weak indistinguishability of IBE-ET. Lee et al. [9] modified the security analysis to achieve the weak indistinguishability unveiled in [17]. In [17], the scheme ensured that the designated users' token $tok_{ID}$ changed with the corresponding identity, whereas in the scheme of [9] the token was fixed for all group users. Therefore, a fixed token $tok_{ID}$ could enable the cloud server to guess a new token $tok'_{ID}$ to launch the insider attack. When a cloud server (insider) is delegated to perform the equality test, it is possible for the cloud server to launch the insider attack because a guess of a token is possible. To the best of our knowledge, their scheme cannot resist the insider attack as explained above. A scheme that resists the insider attack with delegated equality test in IBE-ET, where the cloud server (insider) is authorized to perform the equality test, is still an open problem for the research community.

1.2 Our Contribution

Wu et al. [17] unveiled a variant of the IBE-ET scheme. However, their scheme allowed anyone to perform the equality test between two ciphertexts and hence lacks authorization for the equality test. The security analysis and modification in [9] did not authorize a third party (cloud server) by generating a trapdoor function for the cloud server to perform the equality test. It is not clear whether a trapdoor computed for the cloud server could resist the insider attack as claimed in their security analysis and modification scheme.

To address this problem, we added a pairing operation to the witness cryptographic primitive in [6] to resist the insider attack in IBE-ET. Witness-based encryption ensures that, given a witness relation $R(W, X)$ of an NP language L, an encryption of $(m, w)$ can be tested by a generated trapdoor $(m', x)$. The tester checks if $m' = m$. However, it is difficult to compute w from x under a defined witness relation.

Our scheme achieves Weak-IND-ID-CCA (W-IND-ID-CCA) security and resistance to the insider attack. Our scheme achieves a stronger notion of IND-ID-CCA security for IBE-ET using the random oracle model.

1.3 Organization

The rest of the paper is organized as follows. In Section 2, we provide some preliminaries for our construction. In Section 3, we formulate the notion of IB-PKC-DETIA. In Section 4, we give the construction of IB-PKC-DETIA, and we prove its security in Section 5. In Section 6, we compare our work with other related works. In Section 7, we conclude our paper.

2 Preliminaries

Definition 1. Bilinear map: Let $G$ and $G_T$ be two multiplicative cyclic groups of prime order p. Suppose that g is a generator of $G$. A bilinear map $e : G \times G \to G_T$ satisfies the following properties:

1) Bilinearity: For any $g \in G$, a and $b \in Z_p$, $e(g^a, g^b) = e(g, g)^{ab}$.

2) Non-degenerate: $e(g, g) \neq 1$.

3) Computable: There is an efficient algorithm to compute $e(g, g)$ for any $g \in G$.

Definition 2. Bilinear Diffie-Hellman (BDH) problem: Let $G$, $G_T$ be two groups of prime order p. Let $e : G \times G \to G_T$ be an admissible bilinear map and let g be a generator of $G$. The BDH problem in $\langle p, G, G_T, e \rangle$ is as follows: Given $\langle g, g^a, g^b, g^c \rangle$ for random $a, b, c \in Z_p^*$, compute the value $e(g, g)^{abc} \in G_T$. A randomized algorithm A solves the problem with advantage $Adv_A^{BDH} = Pr[A(g, g^a, g^b, g^c) = e(g, g)^{abc}]$. The BDH assumption holds if for any polynomial-time algorithm A, its advantage $Adv_A^{BDH}$ is negligible.

Definition 3. Witness Relation: Given a witness relation $R(W, X)$ on an NP language L [3], a randomly chosen $w \in W$ generates an instance $x \in X$ defined over the relation R. For any polynomial algorithm $A_{IB-PKC-DETIA}$, $Pr[A_{IB-PKC-DETIA}(k, w) = x] = 1$ and $Pr[A_{IB-PKC-DETIA}(k, x) = w] < \varepsilon(k)$, where k is a security parameter and ε is a negligible function of k.

Our model has four roles: users, the PKG, the cloud server and the adversary (see Figure 1). Users store their encrypted sensitive data in the cloud. The cloud server, acting as the adversary, is prevented from exploiting users' encrypted sensitive data for economic gain.

Figure 1: System model for IB-PKC-DETIA

In this section, we give formal definitions of our scheme. We employ a witness-based cryptographic primitive to resist the insider attack in IBE-ET. Our scheme achieves weak chosen ciphertext security (i.e. W-IND-ID-CCA) under the defined security model.

In the identity-based public key cryptographic primitive with delegated equality test against insider attack scheme, we specify seven algorithms: Setup, Extract, WBInstGen, Trapdoor, WBEncrypt, WBDecrypt, Test, where M and C are its plaintext space and ciphertext space, respectively:

1) Setup: It takes as input the security parameter k and returns the public key K and msk.

2) Extract: It takes as input msk and an arbitrary $ID \in \{0, 1\}^*$ and returns a decryption key dk for that identity.

3) WBInstGen: It takes as input the security parameter k and an arbitrary $ID \in \{0, 1\}^*$ and returns a private witness key $w \in W$ for that identity $w_{ID}$, where $WInsGen(w) = x$, $x \in X$, and $(w, x)$ satisfies the witness relation R.

4) Trapdoor: It takes as input the decryption key dk, an arbitrary $ID \in \{0, 1\}^*$ and an instance $x \in X$, and returns a trapdoor td for that identity.

5) WBEncrypt: It takes as input an identity $ID \in \{0, 1\}^*$ and a plaintext $m \in M$ with a randomly chosen witness $w \in W$, and outputs a ciphertext $C = (x, c)$, where $x \in X$ comes from the generated witness, $WInsGen(w) = x$, and $(w, x)$ satisfies the witness relation R.

6) WBDecrypt: The algorithm takes as input the ciphertext $c \in C$, a private decryption key dk and a witness $w \in W$, and returns a plaintext $m \in M$ if and only if C is a valid ciphertext with the ID and a witness $w \in W$.

7) Test: It takes as input a ciphertext $C_A \in C$ of a receiver with $ID_A$, a trapdoor $td_A$ for the receiver with $ID_A$, a ciphertext $C_B \in C$ of a receiver with $ID_B$ and a trapdoor $td_B$ for the receiver with $ID_B$, and returns 1 if $C_A$ and $C_B$ contain the same message. Otherwise it returns ⊥.

3 Security Model

Definition 4. (Weak-IND-ID-CCA). Let t = (Setup, Extract, WBInstGen, Trapdoor, WBEncrypt, WBDecrypt, Test) be the scheme and let A be a polynomial-time algorithm.

1) Setup: The challenger runs the setup on input the security parameter k and derives K. It randomly takes a witness $w \in W$ and generates an instance $x \in X$ of a witness relation $R(W, X)$ defined on an NP language L. It gives the relation R to the adversary.

2) Phase 1: The adversary issues queries $N_1, N_2, \cdots, N_m$. Each query is of the form:

• Query ($ID_i$): The challenger runs $H(\cdot)$ to generate $dk_i$ corresponding to the public identity $ID_i$. It sends $dk_i$ to A.

• Trapdoor ($ID_i$): The challenger runs the private decryption on WInsGen using a randomly chosen witness $w \in W$ of the relation $R(W, X)$

on an NP language L. The algorithm generates an instance x and computes a trapdoor $td_i$ using $dk_i$ via the trapdoor algorithm. Finally, it sends $td_i$ to A.

• Decryption queries ($ID_i, C_i, w$): The challenger runs the decryption algorithm to decrypt the ciphertext $C_i$ by running the extract algorithm to obtain $dk_i$ corresponding to the public key $ID_i$. Finally, it sends the plaintext $M_i$ to A.

3) Challenge: After Phase 1 is over, A submits two equal-length messages $(m_0, m_1)$ and $ID^*$ to be challenged by the challenger. However, neither of $(m_0, m_1)$ has been issued in an encryption query and $ID^*$ has not been issued in an extract query in Phase 1. The challenger randomly picks $b \in \{0, 1\}$ and responds with $C^* \leftarrow Enc(M_b, ID^*, w^*)$. The algorithm generates a challenge trapdoor $td^* = (ID^*, x^*)$ by running the trapdoor algorithm $td^* \leftarrow td(dk, M_b, x^*)$ and returns $td^*$ to A.

4) Phase 2: The adversary issues queries $N_1, N_2, \cdots, N_m$. Each query is of the form:

• Query ($ID_i$): The challenger responds as in Phase 1, since $ID_i \neq ID^*$.

• Trapdoor query ($ID_i$): Where $x \neq x^*$, the challenger responds in the same way as in Phase 1.

• Decryption query ($ID_i, C_i$): Where $(ID_i, C_i) \neq (ID^*, C^*)$, the challenger responds in the same way as in Phase 1.

5) Output: A submits a guess $b'$ on b. If $b' = b$, we say A wins the game.

We define A's advantage in breaking the scheme as $Adv_{IB-PKC-DETIA}(k) = Pr[b' = b] - \frac{1}{2}$, which is required to be negligible.

In the Weak-IND-ID-CCA (W-IND-ID-CCA) model, the adversary has access to ciphertexts but cannot compute the witness $w \in W$ from $x \in X$ of a relation $R(W, X)$ over an NP language L.

4 Construction

Our scheme aims to resist the attack perpetrated by a cloud server delegated to perform the equality test. The cloud server (insider) is considered as an adversary A who is authorized only to perform the equality test but should not be able to exploit users' ciphertexts. Moreover, only the authorized cloud server can perform the equality test.

Definition 5. A witness relation $R(X, W)$ on an NP language L consists of the following polynomial-time algorithm. Given a randomly chosen $w \in W$ over a witness relation $R(w, x) = 1$ defined on an NP language L, for any polynomial algorithm $A_{IB-PKC-DETIA}$, $Pr[A_{IB-PKC-DETIA}(k, w) = x] = 1$ and $Pr[A_{IB-PKC-DETIA}(k, x) = w] < \varepsilon(k)$, where k is a security parameter and ε is a negligible function of k.

1) Setup: The system takes a security parameter k and returns the public parameter K and master secret key msk.

• The algorithm generates the pairing parameters $G$ and $G_T$ of prime order p and an admissible bilinear map $e : G \times G \to G_T$. It chooses a random generator $g \in G$.

• The system chooses cryptographic hash functions $H : \{0, 1\}^* \to G$, $H_1 : G_T \to G$, $H_2 : R \times G_T \to \{0, 1\}^{\tau_1 + \tau_2}$, where $\tau_1$ and $\tau_2$ are security parameters. The elements of $G$ are represented in $\tau_1$ bits and the elements $x \in X$ of a witness relation R are represented in $\tau_2$ bits.

• The algorithm randomly picks a witness $w \in W$ and generates an instance $x \in X$ on the NP language L of a relation $R(W, X)$, and sets $g_1 = g^w$ and $g_2 = g^x$. The ciphertext space is $C \in (G^* \times \{0, 1\})^{\tau_1 + \tau_2}$. The message space is $M \in G^*$. It publishes the public parameter $K = \{p, G, G_T, R, e, g, g_1, g_2, H, H_1, H_2\}$.

2) WBInstGen: The algorithm takes as input the security parameter k and an arbitrary $ID \in \{0, 1\}^*$, computes $h_{ID} = H(ID) \in G$, randomly chooses a witness $w \in W$ and generates a corresponding instance $x \in X$ on a witness relation $R(W, X)$.

3) Extract: Given a string $ID \in \{0, 1\}^*$, the algorithm computes $h_{ID} = H(ID) \in G$ and sets the private decryption key $dk_{ID} = (h_{ID}^w, h_{ID}^x)$, where $(w, x)$ is the master key corresponding to the relation R.

4) Trapdoor: On input a string $ID \in \{0, 1\}^*$, the algorithm computes $h_{ID} = H(ID) \in G$ and sets $td_{ID} = (h_{ID})^x$, where x is an instance of the relation R.

5) WBEncrypt: On input the public parameter K and ID, the algorithm computes $h_{ID} = H(ID) \in G$ and encrypts $M \in G$ by choosing two random numbers $(r_1, r_2) \in Z_q^*$ with a randomly chosen witness and instance $(w, x)$. The algorithm sets the ciphertext $C = (C_1, C_2, C_3, C_4)$ as:

\[
\begin{aligned}
C_1 &= M^x \cdot H_2(e(h_{ID}, g_2)^{r_1}),\\
C_2 &= g^{r_1},\\
C_3 &= g^{r_2},\\
C_4 &= (M \,\|\, w) \oplus H_2(C_1 \,\|\, C_2 \,\|\, C_3 \,\|\, e(h_{ID}, g_1)^{r_2}).
\end{aligned}
\]

6) Decrypt: To decrypt, the algorithm requires as input the ciphertext $C = (C_1, C_2, C_3, C_4)$ encrypted under $ID$ and the private decryption key $dk_{ID} = (h_{ID}^w, h_{ID}^x)$. It computes:

$$(M' \| w') = C_4 \oplus H_2(C_1 \| C_2 \| C_3 \| e(C_3, h_{ID}^w)),$$

where $w \in W$ of the witness relation $R(W, X)$ of the NP language $L$. The algorithm checks whether $C_1 = (M' \mid x')$ and $C_2 = g^{r_1'}$. If both hold, it returns $M'$. Otherwise it returns $\perp$.

7) Test: The algorithm takes as input a ciphertext $C_A$, a trapdoor $td_A$ and a given sender's ciphertext $C_B$. The algorithm tests whether $M_A^x = M_B^x$ by computing:

$$T_A = \frac{C_{1A}}{H_2(e(C_{2A}, td_{ID_A}))}, \qquad T_B = \frac{C_{1B}}{H_2(e(C_{2B}, td_{ID_B}))}.$$

The algorithm outputs 1 if the following equation holds, and 0 otherwise:

$$e(T_B, C_{2A}) = e(T_A, C_{2B}).$$

Remark 1. In the work in [17], the generated token changes per identity, while the security analysis and modification in [9] uses a fixed token for all group users. We note that since the token is fixed in the latter construction, the insider attack is a central concern in that scheme. A randomly chosen witness $w$ under the relation $R(W, X)$ is considered to be secure and must not be reused. Reusing a randomly chosen witness would compromise the security of our scheme, so the witness should be discarded immediately after the decryption process is completed.

Correctness:

Let $C_A \leftarrow WBE(M_A, ID_A, g_{2A}, x_A)$ and $C_B \leftarrow WBE(M_B, ID_B, g_{2B}, x_B)$ be generated by user A and user B respectively. Then $C_A = (C_{1A}, C_{2A}, C_{3A}, C_{4A})$, and similarly for $C_B$. The Test algorithm computes:

$$T_A = \frac{C_{1A}}{H_2(e(C_{2A}, td_{ID_A}))}, \qquad T_B = \frac{C_{1B}}{H_2(e(C_{2B}, td_{ID_B}))},$$

$$T_A = \frac{M_A^{x_A} \cdot H_2(e(C_{2A}, td_{ID_A}))}{H_2(e(C_{2A}, td_{ID_A}))}, \qquad T_B = \frac{M_B^{x_B} \cdot H_2(e(C_{2B}, td_{ID_B}))}{H_2(e(C_{2B}, td_{ID_B}))},$$

$$T_A = \frac{M_A^{x_A} \cdot H_2(e(g^{r_{1A}}, h_{ID_A}^{x_A}))}{H_2(e(g^{r_{1A}}, h_{ID_A}^{x_A}))}, \qquad T_B = \frac{M_B^{x_B} \cdot H_2(e(g^{r_{1B}}, h_{ID_B}^{x_B}))}{H_2(e(g^{r_{1B}}, h_{ID_B}^{x_B}))},$$

so that $T_A = M_A^{x_A}$ and $T_B = M_B^{x_B}$. Therefore, the test compares $M_A^{x_A}$ with $M_B^{x_B}$.

The algorithm outputs 1 if the following equation holds, and 0 otherwise:

$$e(C_{2A}, T_B) = e(C_{2B}, T_A),$$

where

$$e(C_{2A}, T_B) = e(g^{r_{1A}}, M_B^{x_B}) = e(g, M_B)^{r_{1A} x_B},$$
$$e(C_{2B}, T_A) = e(g^{r_{1B}}, M_A^{x_A}) = e(g, M_A)^{r_{1B} x_A}.$$

Remark 1. Given a witness $w \in W$, the user should be able to compute $x$ to recover $M$ on a witness relation $R$. However, given the instance $x$ of a witness relation $R$, it is difficult to recover its corresponding witness $w$. Therefore the cloud server can only perform the equality test but cannot generate a new ciphertext. If $M_A = M_B$, it implies that $e(C_{2A}, T_B) = e(C_{2B}, T_A)$. For the witness relation $R(W, X)$ defined over an NP language $L$, this means that $\Pr[A_{IB-PKC-DETIA}(k, w) = x] = 1$ and $\Pr[A_{IB-PKC-DETIA}(k, x) = w] < \varepsilon(k)$, where $k$ is a security parameter and $\varepsilon$ is a negligible function of $k$.

5 Security Analysis

We define W-IND-CCA security for IBE-ET via the following game, similar to that in [18].

A probabilistic polynomial time (PPT) adversary A achieves the advantage $\varepsilon$ in breaking the scheme (Setup, Extract, WBInstGen, Trapdoor, WBEncrypt, WBDecrypt, Test). Given a BDH instance, a PPT adversary B takes advantage of A to solve the BDH problem with probability $\varepsilon'$. Suppose B holds a tuple $(g, U, V, R)$ where $a = \log_g U$, $b = \log_g V$ and $c = \log_g R$ are unknown. Given the generator $g$ of $G$, B is supposed to output $e(g, g)^{abc} \in G_T$. The game between B and A runs as follows:

Setup. B sets $g_1 = g^{a \cdot r_1} = U^{r_1}$, where $r_1 \leftarrow Z_q^*$, and sets the trapdoor $td = x \leftarrow X$ from a witness relation $R(W, X)$. B gives $g_1$ to A.

Phase 1.

1) H Query: A queries the random oracle $H$ on $ID_i$ to obtain $h_{ID}$. If $ID_i$ already appears in the H table as an entry $(ID_i, h_{ID_i}^w, coin)$, B responds with the stored $h_{ID}$. Otherwise, for each $ID_i$, B responds as follows:

• B tosses a coin with $\Pr[coin_i = 0] = \delta$. If $coin_i = 1$, B responds to A with $h_{ID} = g^{w_i}$, $w_i \leftarrow W$. Otherwise, B sets $h_{ID} = g^{w_i y} = V^{w_i}$.

• B responds with $h_{ID_i}$, then adds $(ID_i, h_{ID}, w_i, coin)$ to the H table, which is initially empty.

2) Extract Query: A queries the private key of $ID_i$. B responds as follows:

• B obtains $H(ID_i) = h_{ID}$ from the H table. If $coin_i = 0$, B responds with $\perp$ and terminates the game.

• Otherwise, B responds with $dk_{ID_i} = g_1^{w_i} = U^{r_1 \cdot w_i}$, where $(w_i, h_{ID_i})$ is in the H table.

• B sends $dk_{ID_i}$ to A, then stores $(dk_{ID}, ID_i)$ in the private key list, which is initially empty.

3) $H_2$ Query: A queries $D_i \in R \times G_T$, the domain of $H_2 : R \times G_T \to \{0, 1\}^{\tau_1+\tau_2}$. If $D_i$ is already in the $H_2$ table, B responds with the stored $S_i = H_2(D_i)$. Otherwise, for every $D_i$, B selects a random string $S_i \in \{0, 1\}^{\tau_1+\tau_2}$ as $H_2(D_i)$. B responds to A with $H_2(D_i)$ and adds $(D_i, S_i)$ to the $H_2$ table, which is initially empty.

4) Trapdoor Query: B runs the private decryption key query on $(ID_i)$ to obtain $dk_{ID} = (g_1^{w_i}, g_1^{x_i})$ and responds to A with $td_{ID_i} = g_2^{x_i}$. $td_{ID_i}$ is the first element of the decryption key.

5) Encryption Query: A queries $M_i$ encrypted with $ID_i$. B responds as follows:

• B searches the H table to obtain $h_{ID}$ and computes $h_{ID_i} = (h_{ID_i}^{w_i}, h_{ID_i}^{x_i})$, where $(w, x)$ are randomly chosen from the witness relation $R(W, X)$.

• B selects $r_{1i}, r_{2i} \leftarrow Z_q^*$ and computes:

$$C_{1i} = M^{x_i} \cdot H_2(e(h_{ID_i}, g_2)^{r_{1i}}),$$
$$C_{2i} = g^{r_{1i}},$$
$$C_{3i} = g^{r_{2i}},$$
$$D_i = (C_{1i} \| C_{2i} \| C_{3i} \| e(h_{ID_i}, g_1)^{r_{2i}}).$$

• B queries $O_{H_2}$ to obtain $S_i = H_2(D_i)$.

• B computes $C_{4i} = (M_i \| w_i) \oplus S_i$.

Then B responds with $C_i = (C_{1i}, C_{2i}, C_{3i}, C_{4i})$.

6) Decryption Query: A queries $C_i$ to be decrypted under $ID_i$. B responds as follows:

• B searches the H table to obtain $h_{ID_i}$. If $coin_i = 1$, B obtains $dk_{ID_i}$ of $ID_i$ from the private key list to decrypt $C_i$. Then B computes the bilinear map with $dk_{ID_i}$ as:

$$e(C_{3i}, h_{ID_i}) = e(g^{r_{2i}}, g^{w_i}) = e(g, g)^{r_{2i} w_i}.$$

• B computes $D_i = (C_{1i} \| C_{2i} \| C_{3i} \| e(h_{ID_i}, g_1)^{r_{2i}})$ and obtains $S_i$ from the $H_2$ table. B obtains $M_i$ and $r_{2i}$ from $C_{4i} \oplus S_i$.

Finally, B computes $C_{1i}^*, C_{2i}^*$ with $M_i$ and $r_{2i}$ decrypted from $C_i$. If the ciphertext is valid, that is, $C_{1i}^* = C_{1i}$ and $C_{2i}^* = C_{2i}$, B responds with $M_i$. Otherwise it responds with $\perp$.

Challenge: Once Phase 1 is over, A outputs two messages $m_0, m_1$ of equal length and an identity $ID^*$ to be challenged, where neither $m_0$ nor $m_1$ has been issued in an encryption query and $ID^*$ has not been queried in an extract query in Phase 1. B responds as follows:

• B searches the H table. If $coin = 1$, B responds with $\perp$ and terminates the game, since $h_{ID}^* = g^{w^*}$. It is observed that for a given witness relation $R(W, X)$, it is difficult to compute the corresponding witness $w$ for a given instance $x$.

• Otherwise, B randomly selects $b \in \{0, 1\}$, takes $dk_{ID} = (h_{ID}^{w^*}, h_{ID}^{x^*})$ and calculates:

$$C_1^* = M_b^{x^*} \cdot H_2(e(h_{ID}, g_2)^{r_1^*}),$$
$$C_2^* = g^{r_1^*},$$
$$C_3^* = g^{r_2^*},$$
$$C_4^* = (M_b \| w^*) \oplus S^*,$$

where $w \in W$ of a witness relation $R(W, X)$, $S^* = H_2(D^*)$ and $D^* = (C_1^* \| C_2^* \| C_3^* \| e(C_3^*, h_{ID}^w))$, where $h_{ID}^w$ is unknown and is the value B wants A to compute. $C^* = (C_1^* \| C_2^* \| C_3^* \| C_4^*)$ is a valid ciphertext for $M_b$.

• B responds to A with $C^*$.

Phase 2:

1) H Query: A queries as in Phase 1.

2) Extract Query: A queries as in Phase 1, but with $ID_i \neq ID^*$.

3) $H_2$ Query: A issues the query as in Phase 1.

4) Trapdoor Query: A queries as in Phase 1, and B responds to trapdoor queries in the same way as in Phase 1. However, given the witness relation instance $x \in X$, the adversary cannot compute the corresponding witness $w \in W$ of the relation $R(W, X)$ to generate a new ciphertext $C^*$.

5) Encryption Query: For a message $M_i \in \{m_0, m_1\}$, A queries as in Phase 1.

6) Decryption Query: A queries as in Phase 1, except that the ciphertext must satisfy $(C_i, ID_i) \neq (C^*, ID^*)$.

Result: Given a witness relation $R$, a randomly chosen witness $w \in W$ generates an instance $x \in X$. Given the instance $x$ used to compute $td_{ID} = h_{ID}^x$, it is difficult to compute the corresponding witness $w \in W$. A outputs a guess $b'$ on $b$. If $b' \neq b$ and $w' \neq w$, B responds with failure and terminates the game. If $b' = b$ and $w' = w$, then B obtains the result of the BDH tuple by guessing the inputs of the $H_2$ query. However, this is not possible under the defined witness relation $R$ on the NP language $L$. B aborts the game because

$$\left|\Pr[b' = b] - \frac{1}{2}\right| \geq \frac{\varepsilon}{e(q_{td} + q_e + q_d + 1)}.$$

Trapdoor Security: We further provide a trapdoor security (TD) experiment for our scheme:

$$Exp^{W-IND-ID-CCA}_{IB-PKC-DETIA, A}(k).$$

Table 1: Efficiency comparisons of PKEET variants

| PKEETs | IA | Enc | Dec | Test | Del | Security |
|--------|----|---------|---------|---------|------|--------------|
| [18] | N | 3Exp | 3Exp | 2P | N/A | OW-CCA |
| [16] | N | 4Exp | 2Exp | 4P | 3Exp | OW/IND-CCA |
| [15] | N | 5Exp | 2Exp | 4Exp | N/A | OW/IND-CCA |
| [13] | N | 1P+5Exp | 1P+4Exp | 4P+2Exp | 3Exp | OW/IND-CCA |
| [11] | N | 6Exp | 2P+2Exp | 4P | 2Exp | OW/IND-CCA |
| [17] | Y | 1P+3Exp | 1P+2Exp | 2P | N/A | W-IND-ID-CCA |
| Ours | Y | 2P+3Exp | 1P+1Exp | 4P | 2Exp | W-IND-ID-CCA |

Remark: In this table, "Exp" refers to the exponent computation, "P" refers to the pairing computation, "IA" refers to insider attack, "Y" refers to 'Yes' as a supportive remark, "N" refers to 'No' as not supportive and "Del" refers to the delegation. W-IND-ID-CCA refers to weak indistinguishable chosen ciphertext attack against identity, OW-ID-CCA refers to one-way chosen ciphertext attack against identity and IND-ID-CPA refers to indistinguishable chosen plaintext attack against identity.

Given a security parameter $k$, a witness $w \in W$ and an adversary A against trapdoor security (TD), the experiment between the adversary A and the challenger is as follows:

WInsGen Generation Phase: The challenger runs the WInsGen algorithm with a random witness parameter $w \in W$. It generates a corresponding instance $x \in X$. The adversary A computes an instance $x^*$ ($x^* \notin X$) from a randomly chosen witness $w^*$. Finally, it gives $td_{ID}^* = h_{ID}^{x^*}$ to A.

Phase 1: A adaptively asks the challenger for the following trapdoor oracle:

1) Trapdoor oracle: On input a message $M$ and an instance $x \in X$, where $x \neq x^*$, submitted by A, it outputs the trapdoor $td_{ID} = h_{ID}^x$ by running the trapdoor algorithm.

2) Challenge Phase: A submits two messages $(m_0, m_1)$ of equal length. The challenger picks $b \leftarrow \{0, 1\}$, generates the challenge trapdoor $td_{ID}^* = h_{ID}^{x^*}$ corresponding to the challenge ciphertext $M_b \| x \oplus H_2(e(h_{ID}^x, C_2))$ by running the WInsGen algorithm, and returns $td_{ID}^*$ to A.

Phase 2:

1) $TD_{ID}$ Query: A continues to ask the oracle for trapdoor queries. The oracle responds as in Phase 1.

2) Output: A outputs its guess $b'$. The adversary wins the game if $b' = b$, in which case the output of the experiment is 1, and 0 otherwise. The advantage of adversary A in the above experiment is defined as:

$$Adv^{W-IND-TD}_{IB-PKC-DETIA}(w) = \left|\Pr[Exp^{W-IND-TD}_{IB-PKC-DETIA}] - \frac{1}{2}\right|.$$

6 Comparison

In this section, we compare (Table 1) the efficiency of the algorithms adopted in our scheme with other PKEET variants. The other PKEET variants (Table 1) achieve one-way chosen ciphertext attack (OW-CCA) security, one-way/indistinguishable chosen ciphertext attack (OW/IND-CCA) security and weak indistinguishable identity chosen ciphertext attack (W-IND-ID-CCA) security. The extended PKEET schemes take three to four steps to conduct the equality test, including analyzing the trapdoor and inverse-computing the trapdoor.

The above comparison shows that our scheme can resist the insider attack, whereas the others do not have this ability, except for [17]. Even though Wu et al.'s scheme resists the insider attack, it does not provide delegation to the cloud server (insider) to perform the equality test. Wu et al.'s scheme may therefore lose its insider attack resistance when the cloud server is delegated to perform the equality test. In contrast, our scheme ensures that the equality test is delegated to the cloud server and that the cloud server is prevented from launching the insider attack.

The experimental results are shown in Figure 2. The experiment was executed on a desktop computer with an i5-4460 CPU @3.2 GHz and 4 GB of RAM, running 64-bit Windows 7 and VC++ 6.0, using the PBC Library [10]. The time consumptions were obtained from repeated simulations to give an objective computational cost comparison (see Figure 2) of our scheme with Yang et al. [18], Tang et al. [15, 16], Ma et al. [11, 13], and Wu et al. [17]. We assume all schemes were run on the same desktop computer.

The computational cost of our encryption (Enc) is higher than that of the other related schemes. This is due to the extra computational overhead of generating an instance from a witness relation to resist the insider adversary. Decryption and test computations are comparable to the other schemes (see Figure 2). Although the time consumption of encryption (Enc) is slightly higher than in the scheme of [17], our scheme provides enhanced security against the insider attack.
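As a rough aid to reading Table 1, the operation counts can be turned into a single estimated cost once per-operation timings are known. The sketch below is only an illustration: the function names `parse_cost` and `estimate` are hypothetical, the Enc/Dec/Test strings are copied from Table 1, and the timing arguments are placeholders to be replaced with values measured on one's own platform (the paper reports its timings only graphically, in Figure 2).

```python
import re

# Enc, Dec and Test operation counts copied from Table 1.
TABLE1 = {
    "[18]": ("3Exp", "3Exp", "2P"),
    "[16]": ("4Exp", "2Exp", "4P"),
    "[15]": ("5Exp", "2Exp", "4Exp"),
    "[13]": ("1P+5Exp", "1P+4Exp", "4P+2Exp"),
    "[11]": ("6Exp", "2P+2Exp", "4P"),
    "[17]": ("1P+3Exp", "1P+2Exp", "2P"),
    "Ours": ("2P+3Exp", "1P+1Exp", "4P"),
}


def parse_cost(expr):
    """Turn a Table 1 entry such as '1P+5Exp' into {'P': 1, 'Exp': 5}."""
    counts = {"P": 0, "Exp": 0}
    for number, op in re.findall(r"(\d+)(P|Exp)", expr):
        counts[op] += int(number)
    return counts


def estimate(expr, t_pairing, t_exp):
    """Weighted cost of one entry; t_pairing and t_exp are caller-supplied
    timings (placeholders here, not measurements from the paper)."""
    counts = parse_cost(expr)
    return counts["P"] * t_pairing + counts["Exp"] * t_exp


if __name__ == "__main__":
    # Placeholder weights: assume a pairing costs 10 units, an exponentiation 1.
    for scheme, (enc, dec, test) in TABLE1.items():
        print(scheme, [estimate(e, 10, 1) for e in (enc, dec, test)])
```

With any weighting in which a pairing is more expensive than an exponentiation, the two pairings in our Enc column dominate its cost, which is consistent with the observation above that encryption is the most expensive step of our scheme.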

Figure 2: Computational time consumption comparison

7 Conclusion

Our scheme provides a security improvement over [9, 17]. We delegate a cloud server to perform the equality test on users' ciphertexts. Such authorization could allow the cloud server to launch the insider attack, which our scheme resists: even though the cloud server is authorized to perform the equality test, it cannot launch the insider attack on users' ciphertexts. Our scheme achieves this resistance to the insider attack by adopting a witness-based cryptographic primitive. Our scheme supports weak indistinguishable identity chosen ciphertext attack security (W-IND-ID-CCA) with extended trapdoor security (TD).

References

[1] S. Alornyo, M. Asante, X. Hu, and K. K. Mireku, "Encrypted traffic analytic using identity based encryption with equality test for cloud computing," in IEEE the 7th International Conference on Adaptive Science and Technology (ICAST'18), pp. 1-4, 2018.
[2] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506-522, 2004.
[3] J. W. Byun, H. S. Rhee, H. A. Park, and D. H. Lee, "Off-line keyword guessing attacks on recent keyword search schemes over encrypted data," in Workshop on Secure Data Management, pp. 75-83, 2006.
[4] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "A new general framework for secure public key encryption with keyword search," in Australasian Conference on Information Security and Privacy, pp. 59-76, 2015.
[5] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "Dual-server public-key encryption with keyword search for secure cloud storage," IEEE Transactions on Information Forensics and Security, vol. 11, no. 4, pp. 789-798, 2016.
[6] S. Garg, C. Gentry, A. Sahai, and B. Waters, "Witness encryption and its applications," in Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pp. 467-476, 2013.
[7] K. Huang, R. Tso, Y. C. Chen, S. M. M. Rahman, A. Almogren, and A. Alamri, "PKE-AET: Public key encryption with authorized equality test," The Computer Journal, vol. 58, no. 10, pp. 2686-2697, 2015.
[8] I. R. Jeong, J. O. Kwon, D. Hong, and D. H. Lee, "Constructing PEKS schemes secure against keyword guessing attacks is possible?," Computer Communications, vol. 32, no. 2, pp. 394-396, 2019.
[9] H. T. Lee, H. Wang, and K. Zhang, "Security analysis and modification of ID-based encryption with equality test from ACISP 2017," Information Security and Privacy, pp. 780-786, 2018.
[10] B. Lynn, The Stanford Pairing based Crypto Library. (http://crypto.stanford.edu/pbc/)
[11] S. Ma, "Identity-based encryption with outsourced equality test in cloud computing," Information Sciences, vol. 328, pp. 389-402, 2016.
[12] S. Ma, Y. Mu, W. Susilo, and B. Yang, "Witness-based searchable encryption," Information Sciences, vol. 453, pp. 364-378, 2018.
[13] S. Ma, M. Zhang, Q. Huang, and B. Yang, "Public key encryption with delegated equality test in a multi-user setting," The Computer Journal, vol. 58, no. 4, pp. 986-1002, 2015.
[14] H. S. Rhee, J. H. Park, W. Susilo, and D. H. Lee, "Trapdoor security in a searchable public-key encryption scheme with a designated tester," Journal of Systems and Software, vol. 83, no. 5, pp. 763-771, 2010.
[15] Q. Tang, "Public key encryption schemes supporting equality test with authorisation of different granularity," International Journal of Applied Cryptography, vol. 2, no. 4, pp. 304-321, 2012.
[16] Q. Tang, "Public key encryption supporting plaintext equality test and user-specified authorization," Security and Communication Networks, vol. 5, no. 12, pp. 1351-1362, 2012.
[17] T. Wu, S. Ma, Y. Mu, and S. Zeng, "ID-based encryption with equality test against insider attack," in Australasian Conference on Information Security and Privacy, pp. 168-183, 2017.
[18] G. Yang, C. H. Tan, Q. Huang, and D. S. Wong, "Probabilistic public key encryption with equality test," in Cryptographers' Track at the RSA Conference, pp. 119-131, 2010.
[19] W. C. Yau, S. H. Heng, and B. M. Goi, "Off-line keyword guessing attacks on recent public key encryption with keyword search schemes," in International Conference on Autonomic and Trusted Computing, pp. 100-105, 2008.

Acknowledgments

We wish to thank the anonymous reviewers for their comments and contributions.

Biography

Seth Alornyo is a lecturer at Koforidua Technical University. He received his Master of Philosophy (M.Phil) degree from Kwame Nkrumah University of Science and Technology in 2014, a bachelor of science degree in computer science in 2012 and a higher national diploma in 2008. Currently, he is pursuing his Ph.D. in Software Engineering at University of Electronic Science and Technology of China. His research interests are Cryptography and Network Security. A Member of IEEE.

Acheampong Edward Mensah received his master's degree in Software Engineering from the University of Electronic Science and Technology of China in 2019. His research interests are in the areas of cryptography and information security.

Abraham Opanfo Abbam received his bachelor's degree in Information and Communication Technology from University of Education, Winneba. He is currently pursuing a master's degree in software engineering at University of Electronic Science and Technology of China. His research interest lies in the area of Deep Learning, specifically Natural Language Processing.

International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 752

One-Code-Pass User Authentication Based on QR Code and Secret Sharing

Yanjun Liu1, Chin-Chen Chang1, and Peng-Cheng Huang1,2 (Corresponding author: Peng-Cheng Huang)

Department of Information Engineering and Computer Science, Feng Chia University1 Taichung 40724, Taiwan College of Computer and Information Engineering, Xiamen University of Technology2 Xiamen, Fujian, China (Email: [email protected]) (Received July 10, 2018; Revised and Accepted Feb. 7, 2019; First Online Oct. 6, 2019)

Abstract based, biometrics-based and image morphing-based pro- tocols. In a password-based user authentication proto- The quick response (QR) code has gained extensive pop- col [8, 18], the user first registers at the remote server by ularity in information storage and identification in our transmitting a selected password to the server privately. daily lives due to its small printout size, high storage ca- Then, the server can provide the user with required ser- pacity and error correction ability. Taking into account vices after the authentication process, in which the pass- these fascinating features of the QR code, we propose a word presented by the user is checked to confirm that the user authentication protocol based on QR code and secret user is legal. However, a cost-inefficient problem arises sharing. It is the first user authentication protocol that due to the fact that a verifier table maintaining users’ implements the “one-code-pass” functionality in which an private information for authentication must be stored on authorized user can use only one QR code to access vari- the server and the volume of the verifier table is in pro- ous services from all departments within an organization. portion to the increased number of users. Therefore, the In the proposed protocol, a secret is divided into many verifier table needs to occupy much storage space when a shares in the form of QR codes held among different de- lot of users need to be authenticated. Besides, private in- partments and users. The original secret must be restored formation of the users may be leaked out since the verifier by the cooperation of both the department’s share and the table is prone to be stolen with malicious intention. user’s share for the authentication. 
Experimental results To overcome the aforementioned shortcomings, the demonstrate that the proposed protocol has high robust- smart card is employed extensively in the design of a user ness and is secure against various typical attacks. authentication protocol. There are two main advantages Keywords: One-Code-Pass; QR Code; Security; Secret for the use of smart card [6]. Firstly, the level of secu- Sharing; User Authentication rity can be enhanced significantly. Each user individually owns a smart card which stores important and confiden- tial information for mutual authentication that can au- 1 Introduction thenticate not only the user but also the remote server. Furthermore, the private information of all users is fully The user authentication mechanism has been regarded as dispersed by smart cards rather than being centralized in a very important technique to efficiently verify the iden- a verifier table, substantially attenuating the risk of infor- tity of the user through various kinds of secure commu- mation disclosure. Secondly, the authentication protocol nications. In 1981, for the first time Lamport [8] intro- becomes more cost-efficient because there is no need to duced a password-based remote user authentication pro- maintain a great deal of users’ information on the server. tocol. Since then, numerous variants [6, 7, 9, 18] of Lam- A typical smart card based authentication protocol is con- port’s protocol have been developed in the literature, no- ducted as follows [9]. First of all, the user submits a regis- ticeably extending the scope of applications for user au- tration request to the server, and then, the server delivers thentication in the field of secure communications, such as a smart card storing private information of both the user the agreement and distribution of session keys, e-business and the server to the user over a secure channel. When systems and wireless network communications, etc. 
the user inserts the smart card into a card reader for spe- At present, user authentication protocols generally fall cific services provided by the server, the identity of the into four categories, i.e., password-based, smart card- user should be verified to confirm that the user is the International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 753 true card owner. The private information maintained in that the user is legal; otherwise, an unauthorized user is the smart card is extracted by the card reader and then successfully detected. Since the original face image of the transmitted to the server for mutual authentication. If card owner is no longer stored anywhere, this protocol the authentication is passed, a shared session key is usu- can furnish efficient privacy protection and higher secu- ally established by the authorized user and the server to rity level. In 2015, Mao et al. [15] introduced a proxy user ensure subsequent secure communication. authentication protocol. In their protocol, the proxy user Unfortunately, the smart card is liable to be forged has authority to act on behalf of the primary user and since an illegal user may easily invade the secret informa- all users are able to be authenticated by image exchange tion preserved in the smart card due to weak computing based on morphing technology. power [7]. Consequently, biometrics information [16] re- In 2017, Liu and Chang [13] innovatively extended the ferring to unique biometric feature of a person that is image morphing-based authentication protocol to a “one- unable to be stolen or forged, has been employed as a card-pass” scenario with high practical use. Under this popular tool to further increase the authentication secu- scenario, an organization contains various kinds of depart- rity. Fingerprint, face, iris, voice and gait are the most ments providing diverse services. 
An authorized user can common biometric information used to verify the identity register at any department to acquire a smart card stor- of the user since it is impossible for two people to present ing a generated morphed image. After that, the user can identical data of these biometric features [20, 21]. The utilize this smart card to obtain services from different biometric information of the true user is usually stored in departments without registering again if the user passes advance. For authentication, the current biometrics fea- the authentication such that the face image of the user ture is retrieved by the biometric reader and then com- is identical to that restored by de-morphing the morphed pared with the stored information. If they can match, image and another face image maintained on the cloud the identity of the user is authenticated; otherwise, the storage servers. Nevertheless, most of the morphed im- authentication process fails. ages generated in authentication protocols look unnatu- Recognition rate that indicates the identification ac- ral, which is prone to arouse suspicion among malicious curacy is one of the most important measurements for attackers. Therefore, the research of optimization algo- performance evaluation of a biometrics-based authenti- rithms [14] on the selection of control points to achieve cation protocol [21]. Therefore, the research subject on better visual effect of morphed images becomes a very recognition rate has attracted scholars’ attention in re- challenging task. cent years and great recognition rate has been obtained by Nowadays, the quick response (QR) code [11,12] plays unceasing improvements on feature matching algorithms. a very important role in information storage and identi- However, this type of authentication protocol has the fol- fication in our daily lives. The QR code has many fas- lowing weaknesses [16, 20, 21]. 
Firstly, it just provides cinating features [11, 12, 19], such as small printout size, weak privacy protection since personal biometric infor- high storage capacity and error correction ability. Since mation may be disclosed and then abused by malicious the QR code is very easy to be decoded by any stan- adversaries. Secondly, if the true user’s biometric infor- dard QR code reader, it also has gained extensive popu- mation is destroyed because of diseases or accidents, it larity in real-time applications. Due to these advantages, will induce incorrect authentication. At last, development the QR code is very suitable for storing the user’s per- of a robust biometric reader is both time-consuming and sonal information to verify the identity of the user. In money-consuming. 2018, Huang et al. [5] applied the aesthetic QR code and To make further efforts on the uplift in security and the single-key-lock mechanism to a smart-building access performance, a new type of authentication protocols control system. The single-key-lock mechanism produces based on image morphing has flourished. The image mor- keys for users and locks for doors. After encryption and phing technology suspends from the production of special fusion procedures, the keys are used to generate aesthetic image effects in the film industry in such a way that given QR codes as the user’s credential. This access control a source image and a target image, a morphed image is system has the attributes of high security and robust- created. The morphed image looks like both the source ness. Different from Huang et al.’s access control system, image and the target image. This attractive characteris- we consider the application of QR code from another as- tic can be well applied to identity authentication. Up to pect. 
The application scenario of our user authentication date lots of user authentication protocols combining im- protocol is similar to that of Liu and Chang’s morphing- age morphing with smart card are introduced [13,15,22]. based “one-card-pass” method, but ours offers “one-code- Zhao and Hsieh [22] presented a card user authentication pass” functionality via the combination of QR code and protocol in which a morphed image MI is first created secret sharing. That is to say, an authorized user can suc- via the card owner’s face image OI and a pre-selected cessfully use only one QR code to access various services face image SI and then printed on the smart card. When from all departments within an organization. Suppose a card user needs to be authenticated, the image OI is that there is a secret provided by the server of the organi- recovered by a de-morphing process operating on the im- zation in the proposed protocol. The secret is divided into ages MI and SI. Then, the image OI is compared with many shares in the form of beautified QR codes and each the face image of the user. If they are the same, it implies department holds its own share beforehand. During the International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 754 registration, a user can register at any department and padding bits. Then, n parity bits are created for detect- acquire his/her share from the department. When the ing and correcting errors when scanning the QR code. user wants to access services of any of the departments, Figure 1 demonstrates how to place an RS code into a 2D the original secret must be restored by the cooperation of QR code. Especially, each message/padding/parity bit is both the department’s share and the user’s share; knowing located onto one module of the message/padding/error only one share is unable to derive any valuable informa- correction region. tion about the secret. 
If the restored secret is not correct, the user is illegal. Experimental results demonstrate that the proposed protocol is secure against various attacks and the generated beautified QR code has high robust- ness. The rest of the paper is organized as follows. In Section 2, we first introduce elementary knowledge of QR code and secret sharing, and then, review related works [13] and [5]. In Section 3, a novel “one-code-pass” user authentication protocol based on QR code and se- cret sharing is proposed. Section 4 demonstrates the ex- Figure 1: The structure of a QR code perimental results of the proposed protocol and Section 5 gives the security and performance analyses. Finally, our conclusions are presented in Section 6.

2 Preliminaries

In this section, we first introduce main building blocks of a new user authentication protocol that will be pro- posed in the next section. After that, we provide a brief review of the morphing-based “one-card-pass” authenti- cation protocol [13] and the QR code-based access control Figure 2: RS code system [5].

2.1 QR Code

The QR code, invented by the Japanese Denso-Wave Company in 1994 [4], is a two-dimensional graphical code that consists of black and white square modules representing bit information. As illustrated in Figure 1, the structure of a standard QR code contains a message region, a padding region and an error correction region, together with version information, format information, position detecting patterns, alignment patterns and timing patterns.

The QR code is extensively used in information storage and identification applications due to its advantages of small printout size, high storage capacity and error correction ability [11, 12, 19]. As many as 40 QR code versions are defined with different storage capacities, such that a QR code with a higher version number can provide a larger data payload. As another notable feature of the QR code, the error correction capability with four error correction levels (i.e., L, M, Q and H) ensures successful decodability when portions of the QR code are dirty or damaged.

One of the most important processes for generating a QR code is to encode information using Reed-Solomon (RS) code. A t-bit RS code, shown in Figure 2, is composed of three segments, i.e., l message bits, m padding bits and n parity bits. The to-be-embedded message is first encoded into an l-bit binary stream, followed by m

2.2 Shamir’s Threshold Secret Sharing

The concept of secret sharing, independently introduced by Shamir [17] and Blakley [3] in 1979, is an efficient mechanism for data protection. The popularity of Shamir’s threshold secret sharing (TSS) has noticeably increased due to its high efficiency and security. In a (t, r) TSS scheme (t ≤ r), a secret is divided into r shares to be held by r participants. The secret can be recovered by the collaboration of at least t shares; if fewer than t shares are collected, the original secret cannot be retrieved correctly.

Assume that there is a dealer and r participants, and the secret is denoted as s. The (t, r) TSS scheme contains the share establishment phase and the secret recovery phase, which are described as follows.

2.2.1 Share Establishment Phase

The dealer randomly selects a Lagrange interpolating polynomial g with degree t − 1 such that

$g(x) = s + a_1 x + a_2 x^2 + \cdots + a_{t-1} x^{t-1} \bmod p$,

where p is a prime number, and $a_1, a_2, \ldots, a_{t-1}$ and s are in the finite field GF(p). Then, the dealer chooses r random numbers $x_d$ for d = 1, 2, . . . , r to establish r shares $s_d = g(x_d)$ for d = 1, 2, . . . , r, and provides each participant with a share.

International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 755
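As a minimal sketch of the (t, r) scheme, the following Python fragment covers the share establishment phase and the recovery of s by Lagrange interpolation evaluated at x = 0 (Section 2.2.2). The function names `make_shares` and `recover_secret` are hypothetical, the evaluation points are assumed to be distinct and non-zero, and the values in the usage note are toy parameters rather than a recommended configuration.

```python
import random


def make_shares(secret, t, r, p, rng=None):
    """Share establishment: pick g of degree t-1 with g(0) = secret, return r points."""
    if not (1 <= t <= r < p and 0 <= secret < p):
        raise ValueError("need 1 <= t <= r < p and 0 <= secret < p")
    # Assumption: the OS random source is used unless a generator is supplied.
    rng = rng or random.SystemRandom()
    # Coefficients a_1 .. a_{t-1} are drawn from GF(p); the constant term is s.
    coeffs = [secret] + [rng.randrange(p) for _ in range(t - 1)]
    # Distinct non-zero evaluation points x_d (x = 0 would expose s directly).
    xs = rng.sample(range(1, p), r)

    def g(x):
        return sum(a * pow(x, e, p) for e, a in enumerate(coeffs)) % p

    return [(x, g(x)) for x in xs]


def recover_secret(shares, p):
    """Secret recovery: Lagrange interpolation of the shares, evaluated at x = 0."""
    secret = 0
    for h, (xh, yh) in enumerate(shares):
        num, den = 1, 1
        for k, (xk, _) in enumerate(shares):
            if k != h:
                num = num * (0 - xk) % p
                den = den * (xh - xk) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8 or later).
        secret = (secret + yh * num * pow(den, -1, p)) % p
    return secret
```

For instance, `make_shares(1024, 3, 5, 1237)` returns five points, and passing any three of them to `recover_secret(..., 1237)` yields 1024 again. Fewer than t points would still produce a number, but one that is unrelated to s.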

2.2.2 Secret Recovery Phase

Any t participants can cooperate to recover the secret s by releasing their shares. Suppose the t out of r shares are denoted as $s_{b_h} \in \{s_1, s_2, \ldots, s_r\}$ for h = 1, 2, . . . , t. According to the principle of the Lagrange interpolating polynomial, g(x) can be recovered by

$g(x) = \sum_{h=1}^{t} s_{b_h} \prod_{k=1, k \neq h}^{t} \frac{x - x_{b_k}}{x_{b_h} - x_{b_k}} \bmod p$.

Thus, the secret s can be immediately obtained by s = g(0).

2.3 Related Works

2.3.1 Morphing-Based “One-Card-Pass” Authentication Protocol

Assume that an organization includes many independent departments. For instance, a university may contain a library, a digital information center, a fitness center, a student association, etc. We hope that the “one-card-pass” functionality could be provided in such a way that a faculty member or a student could use only one smart card to gain various services from all of the departments. Accordingly, the objective of the “one-card-pass” user authentication protocol is to verify the identity of the card user in each department after the user registers at any of the departments.

Recently, Liu and Chang [13] proposed the first “one-card-pass” user authentication protocol using image morphing technology. This protocol contains three entities, i.e., a card user, a terminal within a department, and cloud storage servers, and it consists of the registration phase and the authentication phase.

In the registration phase, a legal user U can register at any terminal T within any department. The following steps are conducted to fulfill the registration.

Step 1. The user U sends a registration request including his/her identity number to a terminal T.

Step 2. T takes a digital photo of U. This photo, denoted as OI, is used as the original face image of U.

Step 3. T selects a face image SI stored in the cloud storage servers C according to U’s identity number and then C sends SI to T.

Step 4. T generates a morphed image MI using OI as the source image and SI as the target image by a specific morphing algorithm.

Step 5. T stores the information that will be used in the authentication phase in a smart card and prints the morphed image MI on the smart card. Then, T delivers the smart card to U.

Later, in the authentication phase, a terminal T of a department recovers the face image of the true card owner via image de-morphing to verify the validity of the card user U. If the face image of U matches the recovered face image, U passes the authentication and can gain the service provided by this department. The detailed steps in this phase are listed below.

Step 1. User U presents the smart card to a terminal T.

Step 2. T scans the morphed image MI printed on the smart card and extracts the information for authentication from the smart card.

Step 3. T finds the face image SI stored in the cloud storage servers C according to the extracted information.

Step 4. C sends SI to T.

Step 5. T recovers the face image of the true card owner, OI, by de-morphing images MI and SI.

Step 6. T compares the face image of the user U with OI to see if they are identical. If so, U is legal.

This protocol is very practical since it implements the “one-card-pass” functionality. The strategy that the real image of the true card owner is not stored anywhere but hidden in a morphed image increases the security to a certain extent. However, the protocol requires an additional set of cloud storage servers that preserve large numbers of face images for the morphing operation, which may lead to potential security problems. Furthermore, since the procedures of printing and scanning may introduce noise, the authentication result may not be correct due to the distortions of the recovered face image. To solve these problems, in this paper, we will combine QR code and secret sharing to design a new “one-code-pass” user authentication protocol in Section 3.

2.3.2 QR Code-Based Access Control System

Huang et al. [5] presented a novel access control system for smart buildings based on QR code. The enrollment procedure of this system consists of three steps, i.e., the keys and locks generation, the key encryption, and the QR code beautification, as listed below:

Step 1. Keys and locks generation. The single-key-lock mechanism is used to generate keys for users and locks for doors on a random non-singular matrix.

Step 2. Key encryption. Each key is encrypted by a symmetric-key cryptography algorithm to prevent the leak of the key. Then, a QR code generated from the encrypted keys is sent to the user.

Step 3. QR code beautification. The owner of the original QR code produced by Step 2 cannot be distinguished by human eyes because of its many confusing black and white modules. This would induce difficulties in the management of QR codes when the number of users (keys) grows sharply. Therefore, these QR codes need to be beautified to show the user’s photo. The beautification process includes:

a) Construct the basis vector matrix.

b) Generate an XORed QR code by performing the XOR operation between the Reed-Solomon message of the QR code and the basis vector matrix. In the XORed QR code, most modules in the padding region are modified and the middle modules are kept clear to prepare for embedding the user’s photo.

c) Synthesize an aesthetic QR code from the user’s photo and the XORed QR code as the user’s credential.

When a user presents the aesthetic QR code to request access to a door, the reader on the door decrypts and retrieves the user’s key from the aesthetic QR code, and then verifies the access right by conducting an operation on the key and the lock.

3 The Proposed Protocol

In this section, we propose a novel “one-code-pass” user authentication protocol. The proposed protocol contains the initialization phase, the registration phase and the authentication phase. In the following, we first point out the contribution of the proposed protocol, then address its main idea, and finally elaborate on each phase.

3.1 Contribution

The contribution of the proposed protocol is described as follows:

1) It is the first user authentication protocol that can fully accomplish the goal of “one-code-pass” in a specified organization. More specifically, a user is able to register at any department to gain his/her own beautified QR code. By using the QR code, the user is immediately authenticated and directly accesses diverse services from all departments in this organization.

2) The QR code technique is combined with the secret sharing mechanism to implement the proposed protocol. A secret is divided into a number of shares that are held among different departments and authorized users. Each share is actually a beautified QR code containing some authentication information. Each department gains its share in advance and each user obtains his/her share from the department at which the registration operation for the user is conducted. To verify the identity of the user, the original secret should be recovered by the cooperation of both the department’s share and the user’s share. If the restored secret is correct, the user is eligible to get services from all departments.

3) The proposed protocol is significantly secure. Based on the secret sharing mechanism, both shares on the user and department sides must be collected to recover the original secret; a single share cannot leak the secret. In addition, the proposed protocol is secure against various attacks, such as the impersonation attack, the collusion attack, the replay attack, etc. It can also achieve high robustness as it is tolerant of various quality degradation situations.

3.2 Main Idea

To authenticate the identity of a user, a (2, M + N) TSS is exploited in the proposed protocol for an organization containing M users and N departments. At first, some initialization work should be done. The server of the organization selects a secret s and generates N shares such that each share is held by a corresponding department. Then, each user can register at any department and acquire his/her own share from the department with the help of the server. It is noted that each share of a department/user is a beautified QR code that not only encodes authentication information, but also embeds a logo/photo of the department/user in its padding region for efficient management of QR codes. When a user wants to access services of any of the departments, the user is authenticated by recovering the original secret s through the cooperation of both the department’s share and the user’s share. The photo of the authorized user on the beautified QR code also provides an auxiliary way for user authentication. Figure 3 shows the architecture of the proposed protocol, in which a user first registers at the department 1, and then is authenticated by the departments 1 and 2, respectively.

Figure 3: Architecture of the proposed protocol

Before addressing the detailed protocol, some notations used in the paper are described in Table 1.

Table 1: Notation description

Notation                 Description
S                        The server
$U_i$ (1 ≤ i ≤ M)        The user i
$ID_{U_i}$               The identity number of $U_i$
$Q_{U_i}$                The beautified QR code for $U_i$
$D_j$ (1 ≤ j ≤ N)        The department j
$ID_{D_j}$               The identity number of $D_j$
$Q_{D_j}$                The beautified QR code for $D_j$
s                        The secret provided by S
K                        The secret key shared among all departments
g(·)                     The Lagrange interpolating polynomial
h(·)                     A collision-free one-way hash function
$\|$                     The string concatenation operation

3.3 The Initialization Phase

This phase allows the server to generate a share in the form of a beautified QR code for each department. Assume that all of the N departments within the organization share a secret key K. Firstly, the department $D_j$ computes $x_{D_j} = h(ID_{D_j} \| K)$ and sends it to the server S. Then, S selects a secret s, a number $a_1$ and a prime number p to generate a one-degree polynomial function g as

$g(x) = s + a_1 x \pmod{p}$. (1)

For each department, S computes $y_{D_j} = g(x_{D_j}) = s + a_1 x_{D_j} \pmod{p}$. Finally, S creates a beautified QR code $Q_{D_j}$ for the department $D_j$ as the share of $D_j$ by using the information $(x_{D_j}, y_{D_j})$ and $D_j$’s logo.

Here, we briefly explain how the beautified QR code for a department is generated. The information $(x_{D_j}, y_{D_j})$ is first placed into modules of the message region to produce a QR code. Then, a beautification algorithm is applied to the QR code for efficient management. There are many available beautification algorithms and we use the one presented in [10]. In this algorithm, the modules in the padding region of the QR code are first modified to keep them “clean” with the help of a basis vector matrix, and then the department’s logo is embedded into the padding region by a synthetic strategy. Interested readers can refer to [10] for a better understanding.

3.4 The Registration Phase

In this phase, a user can register at any department and acquire his/her own share, also in the form of a beautified QR code. This phase is described in detail as follows and demonstrated in Figure 4.

Step 1. The user $U_i$ sends his/her identity number $ID_{U_i}$ to the department $D_j$.

Step 2. $D_j$ computes $x_{U_i} = h(ID_{U_i} \| K)$ and sends $x_{U_i}$ to the server S.

Step 3. S computes $y_{U_i} = g(x_{U_i}) = s + a_1 x_{U_i} \pmod{p}$ by Equation (1).

Step 4. S generates a beautified QR code $Q_{U_i}$ for the user $U_i$ as the share of $U_i$ by using the information $(x_{U_i}, y_{U_i})$ and $U_i$’s photo. The method of generating the beautified QR code for the user is the same as that for the department as mentioned in the initialization phase.

Step 5. S sends $Q_{U_i}$ to $D_j$.

Step 6. $D_j$ sends $Q_{U_i}$ to $U_i$.

3.5 The Authentication Phase

The authentication is conducted by an authentication device within a department which is equipped with a QR code reader and has a copy of the secret s. The authentication device verifies the identity of the user by recovering the original secret s through the collaboration of the user’s QR code and the department’s QR code. If the user passes the authentication, he/she can gain services provided by this department. The steps of this phase are addressed as follows and illustrated in Figure 5.

Step 1. The user $U_i$ shows the beautified QR code $Q_{U_i}$ to the authentication device.

Step 2. The authentication device scans the QR code $Q_{U_i}$ of $U_i$ and extracts $(x_{U_i}, y_{U_i})$.

Step 3. The authentication device scans the QR code $Q_{D_j}$ of the department $D_j$ where it is located and retrieves $(x_{D_j}, y_{D_j})$.

Step 4. The authentication device uses $(x_{U_i}, y_{U_i})$ and $(x_{D_j}, y_{D_j})$ as two shares to restore a secret $s'$. If $s' \neq s$, the authentication device terminates the phase and the authentication fails; otherwise, it executes Step 5.

Step 5. The authentication device computes $x'_{U_i} = h(ID_{U_i} \| K)$ and then checks whether $x'_{U_i}$ equals $x_{U_i}$. If it holds, the authentication device confirms

that the user is legal; otherwise, the phase is terminated and the authentication fails.

Figure 4: The registration phase of the proposed protocol

Figure 5: The authentication phase of the proposed protocol

In the authentication phase, it is worth noticing that if the restored secret $s'$ is different from the original secret s in Step 4, the user is definitely illegal. However, if $s'$ is equal to s, it cannot be confirmed that the user is legal. This is because if an attacker impersonates a user and embeds fake values of $x''_{U_i}$ and $y''_{U_i}$ that satisfy $y''_{U_i} = g(x''_{U_i}) = s + a_1 x''_{U_i} \pmod{p}$ into the QR code, the correct secret s can still be obtained from the two points $(x_{D_j}, y_{D_j})$ and $(x''_{U_i}, y''_{U_i})$ on the function g. Therefore, Step 5 is added to check whether $x''_{U_i}$ equals $h(ID_{U_i} \| K)$ when $s' = s$ to ensure that $x''_{U_i}$ is real.

4 Experimental Results

To evaluate the performance of our proposed protocol, our experiments were implemented with the open source computer vision library OpenCV and the C++ programming language. In this section, three users and two departments are involved in a concrete example to illustrate the experimental results. In the following example, we select s = 1024, $a_1$ = 21, p = 1237 and K = “@MSNLAB”. Thus, the function g generated by the server S for the initialization becomes $g(x) = 1024 + 21x \pmod{1237}$. The beautified QR code $Q_{D_j}$ for each of the two departments is generated in the initialization phase and shown in Table 2, where the department’s logo, the identity number $ID_{D_j}$ and the information $(x_{D_j}, y_{D_j})$ used for the generation of $Q_{D_j}$ are also provided. After the registration at the department $D_1$ or $D_2$, each of the three users obtains a beautified QR code $Q_{U_i}$ by using the user’s photo, the identity number $ID_{U_i}$ and the information $(x_{U_i}, y_{U_i})$, as shown in Table 3. In addition, the departments’ logos in Table 2 are provided by Feng Chia University and the users’ photos in Table 3 come from the Yale Face Database [2].

Table 2: The beautified QR codes for the departments

                               Department D1                               Department D2
Logo                           [image]                                     [image]
$ID_{D_j}$                     http://lib.fcu.edu.tw                       http://www.sa.fcu.edu.tw
$x_{D_j} = h(ID_{D_j} \| K)$   dc1cf84361a4986424325645bef5be802344dc55    742b34c092beb4eae998a104509635fe32c394f3
$y_{D_j} = g(x_{D_j})$         607                                         107
$Q_{D_j}$                      [image]                                     [image]

Here, we only take the user authentication process performed by the department $D_1$ as an example. Now let us demonstrate how the department $D_1$ verifies the identity of the user $U_1$ in the proposed protocol. Firstly, the authentication device in $D_1$ scans $D_1$’s beautified QR code $Q_{D_1}$ (see Table 2) and $U_1$’s beautified QR code $Q_{U_1}$ (see Table 3), respectively. Secondly, the information $(x_{D_1}, y_{D_1})$ and $(x_{U_1}, y_{U_1})$ are extracted from $Q_{D_1}$ and $Q_{U_1}$, respectively, and then used as two shares to reconstruct a function $g'$ as

$g'(x) = y_{U_1} \frac{x - x_{D_1}}{x_{U_1} - x_{D_1}} + y_{D_1} \frac{x - x_{U_1}}{x_{D_1} - x_{U_1}} \pmod{1237} = 1024 + 21x \pmod{1237}$.

Then, a secret $s'$ is derived by $s' = g'(0) = 1024$, which is equal to the original secret s. Finally, we compute $x'_{U_1} = h(ID_{U_1} \| K)$ and find that $x'_{U_1}$ equals $x_{U_1}$. This indicates that the user $U_1$ passes the authentication.

In the following, we also show how the department $D_1$ detects an illegal user Bob by the proposed protocol. Assume that Bob uses $x_B$ = 6ed3693a3ef2fa0421cd0d1bd36750691522ce29, $y_B$ = 671 and his photo to forge his QR code $Q_B$ and then shows $Q_B$ to the authentication device. The authentication device uses $(x_B, y_B)$ extracted from $Q_B$ and $(x_{D_1}, y_{D_1})$ derived from $Q_{D_1}$ to reconstruct a function $g''$ as

$g''(x) = y_B \frac{x - x_{D_1}}{x_B - x_{D_1}} + y_{D_1} \frac{x - x_B}{x_{D_1} - x_B} \pmod{1237} = 287 + 16x \pmod{1237}$.

Obviously, a secret $s'$ is derived by $s' = g''(0) = 287 \neq 1024$, which implies that Bob is an illegal user.

5 Analyses

In this section, we give the security and performance analyses of the proposed protocol, respectively.

5.1 Security Analysis

The proposed protocol can efficiently protect the user’s privacy and resist various typical attacks, such as the impersonation attack, the collusion attack and the replay attack. The detailed security analysis from the theoretical aspect is given as follows.

5.1.1 Privacy Protection

One property of the QR code is that the information embedded in the message region of the QR code can be decoded and extracted by any QR code reader. As a result, a malicious attacker must be prevented from retrieving any personal knowledge about the user from the extracted information. To achieve this goal, the user $U_i$’s identity number $ID_{U_i}$ is not embedded directly into the QR code, but first encrypted by a one-way hash function as $x_{U_i} = h(ID_{U_i} \| K)$, and then the encrypted message $x_{U_i}$ and the corresponding value $y_{U_i}$ computed from $x_{U_i}$ by Equation (1) are used to generate the user $U_i$’s QR code $Q_{U_i}$. When a malicious attacker scans the QR code $Q_{U_i}$ through a QR code reader, he/she only gets the messages $x_{U_i}$ and $y_{U_i}$ but has no idea of the user’s identity number $ID_{U_i}$. In this way, the user’s personal information will not be disclosed, so privacy protection is achieved. Besides, it does not matter that the photo embedded on the QR code reveals the appearance of the user, since it is useless for the user authentication.

5.1.2 Withstanding Impersonation Attack

In the proposed protocol, the impersonation attack refers to an attack in which a malicious intruder forges a QR code and then impersonates a user with the intention of passing the authentication. To launch this kind of attack, the intruder must produce fake values of $x_{U_i}$ and $y_{U_i}$ and use them to forge the QR code. There is a strong possibility that the fake $x_{U_i}$ and $y_{U_i}$ do not satisfy the equation $y_{U_i} = g(x_{U_i}) = s + a_1 x_{U_i} \pmod{p}$; thus, the secret restored by the cooperation of the two shares $(x_{U_i}, y_{U_i})$ and $(x_{D_j}, y_{D_j})$ is different from the original secret s. This implies that the impersonation attack is successfully detected in this situation. However, there also exists a situation in which the intruder knows the function $g(x) = s + a_1 x \pmod{p}$ and produces fake $x_{U_i}$ and $y_{U_i}$ satisfying the function g, such that the correct secret s can still be obtained. Since in this case the value of s cannot determine whether the user is legal, the proposed protocol will further compute $x'_{U_i} = h(ID_{U_i} \| K)$. The value $x_{U_i}$ created by the intruder will not equal $x'_{U_i}$ because the intruder does not know the value of K, which indicates that $x_{U_i}$ is fake and the impersonation attack fails.

5.1.3 Withstanding Collusion Attack

The proposed protocol must be collusion-resistant. The collusion attack in the proposed protocol means that, if multiple authorized users collude, they can obtain important information and use it to help an attacker pass the authentication. Since any two shares working together are sufficient to restore the secret s, two authorized users can release their shares and collude to recover the function $g(x) = s + a_1 x \pmod{p}$. The two users may share the function g with an attacker who attempts to impersonate a legal user. The attacker can create values of $x_{U_i}$ and $y_{U_i}$ satisfying the function g, and the correct secret s can be obtained in the authentication phase. Nevertheless, the attacker cannot pass the authentication since the created value $x_{U_i}$ will not equal $x'_{U_i}$ ($x'_{U_i} = h(ID_{U_i} \| K)$) as addressed in Subsection 5.1.2. Therefore, the collusion attack cannot be launched successfully.
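The two checks of the authentication phase, and the way a colluding attacker passes the first but fails the second, can be traced with a short Python sketch. It uses the parameters of Section 4 and the shares of $D_1$ (Table 2) and $U_1$ (Table 3). Two points are assumptions rather than statements from the paper: h is taken to be SHA-1 over the concatenation ID ‖ K (the paper does not name the hash function), and the forged point with x = 1 is an arbitrary hypothetical choice. All function names are illustrative only.

```python
import hashlib

# Parameters of the Section 4 example.
P, S, A1, K = 1237, 1024, 21, "@MSNLAB"

# Shares copied from Table 2 (department D1) and Table 3 (user U1).
DEPT_SHARE = (0xdc1cf84361a4986424325645bef5be802344dc55, 607)
U1_SHARE = (0xa2e5bc07294a34caaffcc79264e0f2aec912b37a, 788)


def h(identity, key=K):
    # Assumption: SHA-1 over ID || K; the hash is not specified in the paper.
    digest = hashlib.sha1((identity + key).encode("utf-8")).hexdigest()
    return int(digest, 16)


def g(x):
    # Equation (1): g(x) = s + a1 * x (mod p).
    return (S + A1 * x) % P


def restore_secret(share_a, share_b):
    # Lagrange interpolation of the line through both shares, evaluated at 0.
    (xa, ya), (xb, yb) = share_a, share_b
    diff = (xa - xb) % P
    if diff == 0:
        return None  # both points coincide mod p, so no line is determined
    return (yb * xa - ya * xb) * pow(diff, -1, P) % P


def authenticate(user_share, claimed_id, dept_share=DEPT_SHARE):
    # Step 4: the restored secret must equal s.
    if restore_secret(user_share, dept_share) != S:
        return False
    # Step 5: the x value read from the QR code must equal h(ID || K).
    return user_share[0] == h(claimed_id)


# Colluding users recover g and hand the attacker a point that lies on it.
forged = (1, g(1))
```

With these definitions, `restore_secret(U1_SHARE, DEPT_SHARE)` and `restore_secret(forged, DEPT_SHARE)` both give 1024, so Step 4 alone cannot separate the forged share from a genuine one. `authenticate(forged, "WangJiaQing#M0557591")` is nevertheless rejected in Step 5, because the attacker cannot produce an x that matches h(ID ‖ K) without knowing K.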

Table 3: The beautified QR codes for the users

                               User U1                                     User U2                                     User U3
Photo                          [image]                                     [image]                                     [image]
$ID_{U_i}$                     WangJiaQing#M0557591                        ZhuZhaoHua#M0419274                         LiJing#P0361138
$x_{U_i} = h(ID_{U_i} \| K)$   a2e5bc07294a34caaffcc79264e0f2aec912b37a    3cf5b5ffb9153545bd8bbf1f0edb154ec2691fc1    53132e6ffe935538492e213f9c232d308c9c0488
$y_{U_i} = g(x_{U_i})$         788                                         709                                         565
$Q_{U_i}$                      [image]                                     [image]                                     [image]

5.1.4 Withstanding Replay Attack

In a replay attack, a valid data transmission is repeated or delayed by a malicious attacker without detection. The proposed scheme can withstand such an attack in the following way. An attacker can steal or duplicate a valid QR code and try to use it to pass the authentication. One method to prevent this is to periodically change the value of the secret key K and accordingly update the QR codes for the users and departments based on the function g. Therefore, the QR code held by the attacker always expires since the attacker can never get the latest version of the user’s QR code.

5.2 Performance Analysis

To evaluate the performance of the proposed protocol, we analyze the decoding rate and robustness of the QR codes for users and departments in this subsection.

5.2.1 Decoding Rate of Beautified QR Codes

To evaluate the decoding rate of the beautified QR code, we conduct experiments on both Apple’s iOS and Google’s Android mobile operating systems. As shown in Table 4, several applications from the Apple App Store and the Google Play store were installed on two different mobile phones (iPhone 7, XiaoMi 4), and twenty beautified QR codes synthesized from users’ authentication information with their photos were used to test the decoding rate of beautified QR codes. All these 20 users’ photos are from the Yale face database [2] and the AT&T Cambridge face database [1]. From Table 4, we can see that all the applications on these two selected phones can successfully decode the beautified QR code messages at the decoding rate of 100%.

5.2.2 Robustness of the Proposed Protocol

In the application scenario, when the beautified QR code is printed on a plastic card to serve as an ID card, or scanned by a camera under inadequate lighting conditions, it usually suffers from several image degradation factors, such as pixel distortion, geometric distortion, noise, blur, and so on. These factors can be considered as a kind of image attack. Sometimes the quality of the attacked QR code image degrades significantly. Figure 6(a) shows the result of the beautified QR code $Q_{U_1}$ of user $U_1$ in Table 3 suffering from the print-and-scan attack. The beautified QR code was printed at 600 dpi with the HP LaserJet 500 color M551 printer, and scanned at 200 dpi with the HP LaserJet M2727nf scanner. Figure 6(b) shows the result of $Q_{U_1}$ suffering from Gaussian blur with the parameter σ = 5. Figure 6(c) shows the result of $Q_{U_1}$ suffering from Gaussian noise with parameters M = 0, V = 0.05.

The authentication information embedded in these attacked beautified QR codes could still be decoded by any standard QR code reader. This demonstrates that the beautified QR code is tolerant to common attacks and that the one-code-pass user authentication protocol is practically usable in real-world applications.

6 Conclusions

In this paper, we proposed a novel “one-code-pass” user authentication protocol based on QR code and secret sharing such that an authorized user can successfully use only one QR code to access various services from all departments within an organization. In the proposed protocol, a secret is divided into many shares in the form of QR codes held among different departments and users. Each user can register at any department and acquire his/her

share including authentication information from the department. When a user wants to access services of any of the departments, the user is authenticated by recovering the original secret through the cooperation of both the department’s share and the user’s share. Theoretical analyses and experimental simulations show that the proposed protocol can efficiently protect the user’s privacy and resist various typical attacks, such as the impersonation attack, the collusion attack and the replay attack.

Table 4: Decoding rate in different applications

Mobile Phone               Applications (Developer)                                   Decoding Rate
iPhone 7 (iOS 10.3.2)      WoChaCha QR code (WoChaCha Information Technology)         100%
                           Quick scan (iHandy Inc.)                                   100%
                           Quick QR code reader & creator (Fellow Software)           100%
                           QR code kit (Sima Biswas)                                  100%
                           QuickMark (SimpleAct Inc.)                                 100%
XiaoMi 4 (Android 7.0)     WoChaCha QR code (WoChaCha Information Technology)         100%
                           QR code extreme (FancyApp)                                 100%
                           QR code reader (Scan.me)                                   100%
                           Free QR code scanner (TWMobile)                            100%
                           QuickMark (SimpleAct Inc.)                                 100%

Figure 6: Results of beautified QR code of user U1 in Table 3 after image degradation processes. (a) After Print-and-Scan attack; (b) After adding the Gaussian blurring with σ = 5; (c) After adding the Gaussian noise with parameters M = 0, V = 0.05

References

[1] AT&T Face Database, accessed 31 Oct. 2017. (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html)
[2] Yale Face Database, accessed 31 Oct. 2017. (http://cvc.yale.edu/projects/yalefaces/yalefaces.html)
[3] G. R. Blakley, “Safeguarding cryptographic keys,” in Proceedings of the National Computer Conference, vol. 48, pp. 313–317, 1979.
[4] Denso-Wave Inc. QR code standardization, accessed 31 Oct. 2017. (http://www.qrcode.com/en/index.html)
[5] P. C. Huang, C. C. Chang, Y. H. Li, and Y. Liu, “Efficient access control system based on aesthetic QR code,” Personal and Ubiquitous Computing, vol. 22, no. 1, pp. 81–91, 2018.
[6] Q. Jiang, J. Ma, G. Li, and X. Li, “Improvement of robust smart-card-based password authentication scheme,” International Journal of Communication Systems, vol. 28, no. 2, pp. 383–393, 2015.
[7] S. Kumari, S. A. Chaudhry, F. Wu, X. Li, M. S. Farash, and M. K. Khan, “An improved smart card based authentication scheme for session initiation protocol,” Peer-to-Peer Networking and Applications, vol. 10, no. 1, pp. 92–105, 2017.
[8] L. Lamport, “Password authentication with insecure communication,” Communications of the ACM, vol. 24, no. 11, pp. 770–772, 1981.
[9] C. T. Li, “A new password authentication and user anonymity scheme based on elliptic curve cryptography and smart card,” IET Information Security, vol. 7, no. 1, pp. 3–10, 2013.
[10] L. Li, J. Qiu, J. Lu, and C. C. Chang, “An aesthetic QR code solution based on error correction mechanism,” Journal of Systems and Software, vol. 116, pp. 85–94, 2016.
[11] P. Y. Lin, “Distributed secret sharing approach with cheater prevention based on QR code,” IEEE Transactions on Industrial Informatics, vol. 12, no. 1, pp. 384–392, 2016.
[12] S. S. Lin, M. C. Hu, C. H. Lee, and T. Y. Lee, “Efficient QR code beautification with high quality visual content,” IEEE Transactions on Multimedia, vol. 17, no. 9, pp. 1515–1524, 2015.
[13] Y. Liu and C. C. Chang, “A one-card-pass user authentication scheme using image morphing,” Multimedia Tools and Applications, vol. 76, no. 20, pp. 21247–21264, 2017.
[14] Q. Mao, K. Bharanitharan, and C. C. Chang, “Edge directed automatic control point selection algorithm for image morphing,” IETE Technical Review, vol. 30, no. 4, pp. 343–243, 2013.

[15] Q. Mao, K. Bharanitharan, and C. C. Chang, “A proxy user authentication protocol using source-based image morphing,” The Computer Journal, vol. 58, no. 7, pp. 1573–1584, 2014.
[16] V. Odelu, A. K. Das, and A. Goswami, “A secure biometrics-based multi-server authentication protocol using smart cards,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1953–1966, 2015.
[17] A. Shamir, “How to share a secret,” Communications of the ACM, vol. 22, no. 11, pp. 612–613, 1979.
[18] H. M. Sun, Y. H. Chen, and Y. H. Lin, “oPass: A user authentication protocol resistant to password stealing and password reuse attacks,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 651–663, 2012.
[19] I. Tkachenko, W. Puech, C. Destruel, O. Strauss, J. M. Gaudin, and C. Guichard, “Two-level QR code for private message sharing and document authentication,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 571–583, 2016.
[20] J. Yu, G. Wang, Y. Mu, and W. Gao, “An efficient generic framework for three-factor authentication with provably secure instantiation,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 12, pp. 2302–2313, 2014.
[21] L. Zhang, Y. Zhang, S. Tang, and H. Luo, “Privacy protection for e-health systems by means of dynamic authentication and three-factor key agreement,” IEEE Transactions on Industrial Electronics, vol. 65, no. 3, pp. 2795–2805, 2018.
[22] Q. Zhao and C. H. Hsieh, “Card user authentication based on generalized image morphing,” in 2011 3rd International Conference on Awareness Science and Technology (iCAST’11), pp. 117–122, Sep. 2011.

Biography

Yanjun Liu received her Ph.D. degree in 2010 from the School of Computer Science and Technology, University of Science and Technology of China (USTC), Hefei, China. She has been an assistant professor at Anhui University in China since 2010. She currently serves as a senior research fellow at Feng Chia University in Taiwan. Her specialties include E-Business security and electronic imaging techniques.

Chin-Chen Chang is a professor at Feng Chia University. He received the B.S. degree in Applied Mathematics in 1977 and the M.S. degree in Computer and Decision Sciences in 1979, both from the National Tsing Hua University, Taiwan. He received the Ph.D. degree in Computer Engineering in 1982 from the National Chiao Tung University, Taiwan. He is the author of more than 900 journal papers and has written 36 book chapters. His research interests include computer cryptography, data engineering, and image compression.

Peng-Cheng Huang is a lecturer at the Xiamen University of Technology. He received his B.S. degree from Xiamen University of Technology in 2007 and the M.S. degree in Computer Architecture from Fuzhou University in 2010. He is currently pursuing the Ph.D. degree at Feng Chia University. His current research interests include multimedia security, image processing, and the Internet of Things.

International Journal of Network Security, Vol.22, No.5, PP.763-774, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).06) 763

Efficient Anonymous Ciphertext-Policy Attribute-based Encryption for General Structures Supporting Leakage-Resilience

Xiaoxu Gao and Leyou Zhang (Corresponding author: Xiaoxu Gao)

School of Mathematics and Statistics, Xidian University Xi’an, Shaanxi 710071, China (Email: gxx [email protected]) (Received Mar. 5, 2019; Revised and Accepted Nov. 6, 2019; First Online Jan. 23, 2020)

Abstract

Traditional cryptographic schemes cannot guarantee data security and user privacy under side-channel attacks. Additionally, most of the existing leakage-resilient schemes cannot protect the privacy of the receivers. To achieve both leakage resilience and privacy preservation, two anonymous attribute-based encryption (ABE) schemes for general access structures are proposed. In the first scheme, the access structure is encoded as minimal sets, which makes the decryption algorithm more efficient. Then we show how to obtain an anonymous leakage-resilient ABE for non-monotone access structures. Both schemes can tolerate continual leakage: an update algorithm is employed when the leaked information would exceed the allowable leakage bound. They are proven to be adaptively secure in the standard model under four static assumptions over composite order bilinear groups. The performance analyses confirm the efficiency of our schemes.

Keywords: Anonymous; Attribute-based Encryption; Ciphertext-Policy; Leakage-Resilience

1 Introduction

In traditional encryption schemes, security is based on the idealized assumption that the adversary cannot get any information about the private keys and internal state. However, practice shows that this assumption often does not hold. Many cryptographic schemes are vulnerable to side-channel attacks, where an adversary can learn meaningful information about a system by using some of the physical information that the algorithm outputs, such as running time, power consumption, and fault detection. In order to better characterize the leaked information that the adversary knows in the system, various leakage models have been presented. Some of them are motivated by practical issues, while others are for theoretical needs.

• The only computation leaks information model was proposed by Micali [14]. It requires that the leak only occurs in the memory part of the system executing the calculation, and the memory part that does not participate in the calculation is not leaked. The reason given in [14] is that “data can be placed in some form of storage where, when not being accessed and computed upon, it is totally secure.”

• The relative leakage model (also named the memory-attacks model) was proposed by Alwen et al. [1] to deal with cold boot attacks, where the part not involved in the operation also leaks information. In this model, the leakage amount is bounded by a predetermined value.

• The bounded retrieval model is stronger than the relative leakage model [2, 21, 25, 27, 29]. In this model, the leakage parameter $l$ is an arbitrary and independent parameter of the system, and secret keys can be increased to allow $l$ bits of leakage without affecting the size of public keys.

• The continuous leakage model [30] was put forward to handle the situation where the leaked bits exceed the predetermined number: the leakage is unbounded over the lifecycle of the system, but it is bounded between consecutive updates.

• The auxiliary input model was presented by Dodis [8]. It requires that a polynomial time adversary can recover $sk$ from $f(sk)$ only with negligible probability. Meanwhile, Yuen et al. [22] proposed a model which combined the concepts of auxiliary inputs and continual memory leakage. The scheme [28] also comes from this model.

However, the ABE schemes constructed in the above leakage models cannot achieve anonymity, except for [25, 27]. Additionally, the number of leakage bits in [25] is bounded, and [27] is inefficient because its decryption time depends not only on the leakage parameter but also on the number of attributes. Hence, it is natural to ask whether there is an efficient anonymous leakage-resilient ciphertext-policy attribute-based encryption scheme resilient to continual leakage.

1.1 Related Work

ABE [18] is regarded as a highly promising public key primitive for realizing scalable and fine-grained access control systems. Goyal et al. [9] formulated the idea of ABE and presented key-policy ABE (KP-ABE). Then Bethencourt et al. [4] put forward ciphertext-policy ABE (CP-ABE). In KP-ABE, the private keys are associated with access policies and ciphertexts are labeled with sets of attributes, while in CP-ABE, ciphertexts are associated with access policies and private keys are labeled with sets of attributes. ABE has become a research hotspot because it implements one-to-many encryption. Following this trend, many schemes have been proposed [5, 7, 13, 26].

Schemes [4, 9] adopted the access tree, which is a monotone access structure. Ostrovsky et al. [16] proposed a KP-ABE scheme that can handle non-monotone access structures over attributes, where the access structure can be a boolean formula involving AND, OR, NOT, and threshold operations. Lewko et al. [12] first put forward a CP-ABE scheme and a KP-ABE scheme where the access structures were monotone span programs (MSP). Both schemes are proven to be fully secure under Decisional Subgroup assumptions in the standard model over composite order bilinear groups. Subsequently, Okamoto and Takashima [15] brought forward a KP-ABE and a CP-ABE for non-monotone access structures, which were shown to be fully secure under a standard assumption, the Decisional Linear (DLIN) assumption, in the standard model over prime order bilinear groups. Then Waters [19] proposed three efficient selectively secure CP-ABE constructions by employing MSP to express the access structure in the standard model under the Decisional Parallel Bilinear Diffie-Hellman Exponent (PBDHE) assumption. Lately, Attrapadung et al. [3] introduced a KP-ABE scheme for non-monotone access structures with constant size ciphertexts, which was selectively secure under the Decisional q-parallel Bilinear Diffie-Hellman Exponent (q-DBDHE) assumption in the standard model over prime order bilinear groups. This scheme also adopted the method of Ostrovsky et al. [16] to convert non-monotone access structures to monotone access structures with negative attributes. Based on the fact that there are some monotone access structures for which the size of the MSP is at least the number of minimal sets, while the number of minimal sets is constant, Pandit and Barua [17] put forward an ABE scheme which used minimal sets to describe general access structures. They also constructed a corresponding hierarchical (H)KP-ABE scheme. All of the schemes achieve full security in the standard model over composite order bilinear groups.

Though ABE can be directly applied to design secure access control, there is an increasing need to protect users' privacy in access control systems. In order to address the problem, the concept of anonymous ABE was introduced in schemes [10, 11]. For more related works, see [23, 24, 27]. In anonymous CP-ABE, ciphertexts do not reveal the attributes used in the access policies. A user can recover the message only if his/her attribute set satisfies the access structure embedded in the ciphertexts; otherwise, the user can neither decrypt nor guess what access policy was specified by the data owner. As far as we know, there is no efficient anonymous CP-ABE that achieves constant size ciphertexts and adaptive leakage-resilient security in the standard model.

1.2 Our Contributions

Based on the works of [30] and [6], we put forward efficient anonymous leakage-resilient CP-ABE schemes for general access structures, which have a better leakage rate and achieve adaptive security under four static assumptions in the standard model over composite order bilinear groups. In addition, the proposed schemes are built on the relative leakage model, and implicitly use an update algorithm to tolerate continual leakage on the private keys. In the security proof, we use dual system encryption [20], where we extend the semi-functional keys into two types: truly semi-functional and nominally semi-functional. Normal keys and nominally semi-functional keys can decrypt normal ciphertexts and semi-functional ciphertexts, but truly semi-functional keys cannot decrypt the challenge semi-functional ciphertexts. The method to prove the indistinguishability of nominally semi-functional and truly semi-functional keys is similar to [30], so we omit it in the paper. The access structure used in the first construction is built from minimal sets with multi-valued attributes, which enables fast decryption.

1.3 Organizations

The rest of the paper is organized as follows. In Section 2, some preliminaries are given. Section 3 gives the definition of leakage resilience of ABE. The security definition is presented in Section 4. The construction of the scheme and its anonymity, performance, and efficiency analyses are given in Section 5. The security proof is introduced in Section 6. In Section 7, we give an anonymous leakage-resilient CP-ABE scheme for non-monotone access structures. Finally, we conclude this paper in Section 8.

2 Preliminaries

2.1 Notations

1) Angle brackets $\langle \cdot, \cdot \rangle$ denote the inner product of two vectors, and parentheses $(\cdot, \cdot, \cdot)$ denote vectors. The dot product of vectors is denoted by ‘$\cdot$’ and component-wise multiplication is denoted by ‘$*$’.

2) The fact that $\chi$ is picked uniformly at random from a finite set $\Omega$ is denoted by $\chi \xleftarrow{\$} \Omega$, and that all $\psi, \omega, \zeta$ are picked independently and uniformly at random from $\Omega$ is denoted by $\psi, \omega, \zeta \xleftarrow{\$} \Omega$.

3) Let $\vec{\rho} = (\rho_1, \rho_2, ..., \rho_n)$ and $\vec{\sigma} = (\sigma_1, \sigma_2, ..., \sigma_n)$, and let $g^{\vec{\rho}}$ denote the vector of group elements $g^{\vec{\rho}} = (g^{\rho_1}, g^{\rho_2}, ..., g^{\rho_n})$. The inner product of the vectors $\vec{\rho}$ and $\vec{\sigma}$ is denoted by $\langle \vec{\rho}, \vec{\sigma} \rangle$ and the bilinear group inner product is denoted by $\hat{e}_n(g^{\vec{\rho}}, g^{\vec{\sigma}})$, i.e., $\langle \vec{\rho}, \vec{\sigma} \rangle = \sum_{i \in [n]} \rho_i \sigma_i$ and $\hat{e}_n(g^{\vec{\rho}}, g^{\vec{\sigma}}) = \prod_{i \in [n]} \hat{e}(g^{\rho_i}, g^{\sigma_i}) = \hat{e}(g, g)^{\langle \vec{\rho}, \vec{\sigma} \rangle}$.

4) A negligible function of $\lambda$ is denoted by $negl(\lambda)$.

5) $\{1, 2, ..., n\}$ is denoted by $[n]$.

2.2 Minimal Set and Its Critical Set

Definition 1. Let $\Gamma$ be a monotonic access structure over the set of attributes $V = \{a_1, a_2, ..., a_n\}$. If, for $B \in \Gamma$, no $A \in \Gamma \setminus \{B\}$ satisfies $A \subset B$, then $B$ is called a minimal authorized set. The collection of all minimal sets in $\Gamma$ is called the basis of $\Gamma$.

Definition 2. (Dual of access structure) The dual $\Gamma^{\perp}$ of an access structure $\Gamma$ over $V$ is the collection of all sets $A \subset V$ such that $V \setminus A = A^{c} \notin \Gamma$.

Definition 3. (Critical set of minimal sets) Let $\mathbb{B} = \{Y_1, Y_2, ..., Y_r\}$ be the set of minimal sets of an access structure $\Gamma$, and let $H \subseteq \mathbb{B}$ be a subset of minimal sets. Suppose every $Y_i \in H$ contains a set $B_i \subset Y_i$ with $|B_i| \geq 2$ such that:

(I) The set $B_i$ uniquely determines $Y_i$ in the set $H$, i.e., no other set in $H$ contains $B_i$.

(II) For every $Z \subset B_i$, the set $S_Z = \bigcup_{Y_j \in H,\, Y_j \cap Z \neq \emptyset} (Y_j \setminus Z)$ does not contain any element of $\mathbb{B}$.

If (I) and (II) hold, then $H$ is called a critical set of minimal sets for $\mathbb{B}$.

2.3 Complexity Assumptions

Assumption 1. Pick $g_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $T_1 \xleftarrow{\$} \mathbb{G}_{p_1 p_4}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_4}$ and set $E = (\mathbb{G}, g_1, g_3, g_4)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 1 to be

$$Adv^{1}_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E, T_1) = 1] - \Pr[\mathcal{A}(E, T_2) = 1]| \tag{1}$$

We say that Assumption 1 holds if for all PPT algorithms $\mathcal{A}$, $Adv^{1}_{\mathcal{A}}(\lambda) \leq negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 2. Pick $g_1, U_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $U_2, W_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3, W_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $T_1 \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_3}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1 p_3}$ and set $E = (\mathbb{G}, g_1, g_3, g_4, U_1 U_2, W_2 W_3)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 2 to be

$$Adv^{2}_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E, T_1) = 1] - \Pr[\mathcal{A}(E, T_2) = 1]| \tag{2}$$

We say that Assumption 2 holds if for all PPT algorithms $\mathcal{A}$, $Adv^{2}_{\mathcal{A}}(\lambda) \leq negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 3. Pick $\alpha, s, r \xleftarrow{\$} \mathbb{Z}_N$, $g_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $U_2, W_2, g_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $T_1 = \hat{e}(g_1, g_1)^{\alpha s}$, $T_2 \xleftarrow{\$} \mathbb{G}_T$, and set $E = (\mathbb{G}, g_1, g_2, g_3, g_4, g_2^{r}, U_2^{r}, g_1^{\alpha} U_2, g_1^{s} W_2)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 3 to be

$$Adv^{3}_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E, T_1) = 1] - \Pr[\mathcal{A}(E, T_2) = 1]| \tag{3}$$

We say that Assumption 3 holds if for all PPT algorithms $\mathcal{A}$, $Adv^{3}_{\mathcal{A}}(\lambda) \leq negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 4. Pick $s, \hat{r}, \hat{s} \xleftarrow{\$} \mathbb{Z}_N$, $g_1, U_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_4, U_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $U_2, W_2, g_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $W_{24}, D_{24} \xleftarrow{\$} \mathbb{G}_{p_2 p_4}$, $T_1 = U_1^{s} D_{24}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_4}$, and set $E = (\mathbb{G}, g_1, g_2, g_3, g_4, U_1 U_4, U_1^{\hat{r}} U_2, g_1^{\hat{r}} W_2, g_1^{s} W_{24}, U_1 g_3^{\hat{s}})$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 4 to be

$$Adv^{4}_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E, T_1) = 1] - \Pr[\mathcal{A}(E, T_2) = 1]| \tag{4}$$

We say that Assumption 4 holds if for all PPT algorithms $\mathcal{A}$, $Adv^{4}_{\mathcal{A}}(\lambda) \leq negl(\lambda)$ holds for the security parameter $\lambda$.

3 Leakage Resilience of CP-ABE

A CP-ABE scheme in the continual leakage model is composed of the following five algorithms:

Setup$((\lambda, V, l) \to (PK, MSK))$: The setup algorithm takes a security parameter $\lambda$, a description of the attribute universe set $V$ and a leakage bound $l$ as input. It outputs the system public keys $PK$ and master secret keys $MSK$.

KeyGen$((PK, MSK, S) \to SK_S)$: The key generation algorithm takes as input the public keys $PK$, the master secret keys $MSK$ and an attribute set $S$, and returns secret keys $SK_S$.

UpdateUSK$((PK, S, SK_S) \to SK'_S)$: On input the public keys $PK$, a set of attributes $S$ and the secret keys $SK_S$, it outputs re-randomized secret keys $SK'_S$.

Encrypt$((PK, M, \Gamma) \to CT_\Gamma)$: The encryption algorithm takes the public keys $PK$, a message $M$, and an access structure $\Gamma$ over the universe of attributes as input, and outputs ciphertexts $CT_\Gamma$ such that only users whose attribute sets satisfy the access structure $\Gamma$ should be able to extract $M$.

Decrypt$((PK, CT_\Gamma, SK_S) \to M)$: The algorithm takes the public keys $PK$, ciphertexts $CT_\Gamma$ and secret keys $SK_S$ as input, and outputs the message $M$ if and only if the attribute set $S$ of the key $SK_S$ satisfies the access structure $\Gamma$.

4 Security Definition

For key leakage attacks, we define the security of an anonymous leakage-resilient ciphertext-policy attribute-based encryption scheme through a game between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$. The security parameter and the upper bound of leakage are denoted by $\lambda$ and $l$ respectively.

Setup: The challenger $\mathcal{C}$ runs the setup algorithm to generate the public keys and master keys $(PK, MSK)$, sends the public keys to the adversary $\mathcal{A}$ and keeps the master secret keys. At the same time, $\mathcal{C}$ creates two initially empty lists, $Q = (hd, S, SK_S, L_{SK})$ and $R = (hd, S)$, to store records, where all records are associated with a handle $hd$ and $L_{SK}$ is the total number of leaked bits.

Phase 1: In this stage, $\mathcal{A}$ can adaptively perform the following queries:

• Key Generation queries: $\mathcal{A}$ submits the attribute set $S$ to $\mathcal{C}$, and $\mathcal{C}$ runs the key generation algorithm to generate the private keys $SK_S$. The challenger sets $hd = hd + 1$, then adds $(hd, S, SK_S, 0)$ to the set $Q$. In this query, $\mathcal{C}$ only gives $\mathcal{A}$ the handle $hd$ of the generated keys rather than the keys themselves.

• Leakage queries: $\mathcal{A}$ gives an arbitrary polynomial-time computable function $f: \{0,1\}^{*} \to \{0,1\}^{*}$ together with a queried handle $hd$ of the keys to $\mathcal{C}$. $\mathcal{C}$ finds the tuple $(hd, S, SK_S, L_{SK})$ and checks whether $L_{SK} + |f(SK)| \leq l$ holds. If this is true, it returns $f(SK)$ to $\mathcal{A}$ and updates $L_{SK}$ to $L_{SK} + |f(SK)|$. If the check fails, it returns $\perp$ to $\mathcal{A}$.

• Reveal queries: $\mathcal{A}$ gives the handle $hd$ for a specified key $SK_S$ to $\mathcal{C}$. $\mathcal{C}$ scans $Q$ to find the requested entry and returns the private keys $SK_S$ to $\mathcal{A}$. Then $\mathcal{C}$ removes the item from the set $Q$ and adds it to the set $R$.

• Update queries: $\mathcal{A}$ issues a key update query for $SK_S$. $\mathcal{C}$ searches for the record in $Q$. If it is not found, $\mathcal{C}$ generates the keys with the key generation algorithm and sets $L_{SK} = 0$. Otherwise, $\mathcal{C}$ applies the UpdateUSK algorithm and resets the corresponding $L_{SK} = 0$.

Challenge: $\mathcal{A}$ outputs two pairs of message and access structure $(M_0, \Gamma_0)$, $(M_1, \Gamma_1)$ to $\mathcal{C}$, where every $S \in R$ satisfies neither $\Gamma_0$ nor $\Gamma_1$. With the restriction that the length of the message $M_0$ equals the length of the message $M_1$, $\mathcal{C}$ selects $b \in \{0, 1\}$ randomly, encrypts the message $M_b$ under the access structure $\Gamma_b$, and sends the resulting ciphertexts to $\mathcal{A}$.

Phase 2: This phase is the same as Phase 1, with the additional restrictions that only reveal queries and update queries can be performed; leakage queries cannot be executed. The attributes of each query must not satisfy the challenge access structure.

Guess: $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$ and wins the game if $b' = b$.

We say that an anonymous attribute-based encryption scheme is $l$ leakage-resilient and adaptively secure against chosen plaintext attacks (ANON-IND-CPA) if for all polynomial time adaptive adversaries $\mathcal{A}$, the advantage of $\mathcal{A}$ in the above game is negligible, where the advantage of $\mathcal{A}$ is defined as $Adv^{ANON\text{-}IND\text{-}CPA}_{\mathcal{A}}(\lambda, l) = |\Pr[b' = b] - \frac{1}{2}|$.

5 Anonymous Leakage-Resilient CP-ABE for MAS

5.1 Concrete Construction

Our scheme relies on a composite order bilinear group of order $N = p_1 p_2 p_3 p_4$, where $p_1, p_2, p_3, p_4$ are distinct primes. The main system is built in the subgroup $\mathbb{G}_{p_1}$, while the subgroup $\mathbb{G}_{p_2}$ acts as the semi-functional space. The subgroup $\mathbb{G}_{p_3}$ provides additional randomness on keys to isolate keys in our hybrid games. $\mathbb{G}_{p_4}$ makes the scheme achieve anonymity. We then extend the composite order group to multiple dimensions to tolerate the possible leakage.

Let $V = \{attr_1, attr_2, ..., attr_n\}$ be a set of attributes. Each attribute $attr_i$ has $n_i$ possible values, $v_{i,j}$ represents the $j$th value of $attr_i$, and $I \subset \{1, 2, ..., n\}$ is the attribute name index.

Setup$((\lambda, V, l) \to (PK, MSK))$: The setup algorithm takes as input a security parameter $\lambda$, the attribute universe description $V$ and a leakage upper bound $l$. Then the algorithm generates the public keys and master secret keys as follows. Run the bilinear group generator to produce $\Phi = (N = p_1 p_2 p_3 p_4, \mathbb{G}, \mathbb{G}_T, \hat{e})$. Define $negl = p_2^{-\tau}$ as the maximum allowable probability of succeeding in a leakage guess and compute $\omega = \lceil 1 + 2\tau + \frac{l}{\log p_2} \rceil$, where $\tau$ is a positive constant. In practice, $\omega \approx \lceil 1 + \frac{l}{\log p_2} \rceil$. Select $g_1, X_1 \in \mathbb{G}_{p_1}$, $g_3 \in \mathbb{G}_{p_3}$, $g_4, X_4 \in \mathbb{G}_{p_4}$, $\alpha \in \mathbb{Z}_N$, $\vec{\rho} \in \mathbb{Z}_N^{\omega}$ randomly. For each $i \in [n]$, $j \in [n_i]$, choose random values $t_{i,j} \in \mathbb{Z}_N$ and set the public keys as $PK = (N, g_1, g_4, g_1^{\vec{\rho}}, y, Y, T_{i,j}; \forall i \in [n], j \in [n_i])$.

In the public keys, $y = \hat{e}(g_1, g_1)^{\alpha}$, $Y = X_1 X_4$, $T_{i,j} = g_1^{t_{i,j}}$. The master secret keys are $MSK = (X_1, g_3, \alpha)$.

KeyGen$((PK, MSK, S) \to SK_S)$: This algorithm takes $PK$, $MSK$ and an attribute set $S = \{v_{1,x_1}, v_{2,x_2}, ..., v_{n',x_{n'}}\}$ as input, where $n' \leq n$ and $1 \leq x_i \leq n_i$ for each $1 \leq i \leq n'$. Then the algorithm chooses random values $t, y_2, y_3 \in \mathbb{Z}_N$, $\vec{y}_1, \vec{\sigma} \in \mathbb{Z}_N^{\omega}$, and $y_{i,j} \in \mathbb{Z}_N$ for $v_{i,j} \in S$, and outputs the secret keys

$$SK_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S),$$

in which

$$\vec{K}_1 = g_1^{\vec{\sigma}} * g_3^{\vec{y}_1}, \quad K_2 = g_1^{\alpha + \langle \vec{\rho}, \vec{\sigma} \rangle} X_1^{t} g_3^{y_2}, \quad K_3 = g_1^{t} g_3^{y_3}, \quad K_{i,j} = T_{i,j}^{t} g_3^{y_{i,j}}.$$

UpdateUSK$((PK, S, SK_S) \to SK'_S)$: The key update algorithm selects $\Delta t, \Delta y_2, \Delta y_3 \in \mathbb{Z}_N$ and $\Delta \vec{y}_1, \Delta \vec{\sigma} \in \mathbb{Z}_N^{\omega}$ randomly, picks $\Delta y_{i,j} \in \mathbb{Z}_N$ for $v_{i,j} \in S$ at random, and outputs a new key

$$SK'_S = (S, \vec{K}'_1, K'_2, K'_3, K'_{i,j}; \forall v_{i,j} \in S),$$

where

$$\vec{K}'_1 = \vec{K}_1 * g_1^{\Delta \vec{\sigma}} * g_3^{\Delta \vec{y}_1}, \quad K'_2 = K_2\, g_1^{\langle \vec{\rho}, \Delta \vec{\sigma} \rangle} X_1^{\Delta t} g_3^{\Delta y_2}, \quad K'_3 = K_3\, g_1^{\Delta t} g_3^{\Delta y_3}, \quad K'_{i,j} = K_{i,j}\, T_{i,j}^{\Delta t} g_3^{\Delta y_{i,j}}.$$

Encrypt$((PK, M, \Gamma) \to CT_\Gamma)$: At first, this algorithm converts the monotonic access structure $\Gamma$ to the set of minimal sets $\mathbb{B} = \{B_1, B_2, ..., B_{\tilde{m}}\}$, where $B_k$ ($k \in [\tilde{m}]$) is a set of attribute values. It selects $s, s_1, s_2, ..., s_{\tilde{m}} \in \mathbb{Z}_N$, $\vec{d} \in \mathbb{Z}_N^{\omega}$, $W_k, V_k \in \mathbb{G}_{p_4}$ ($k \in [\tilde{m}]$) randomly, then outputs the resulting ciphertexts $CT_\Gamma$ and the index $I_{B_k} \subset \{1, 2, ..., n\}$ ($k \in [\tilde{m}]$) corresponding to the attribute set $B_k$ ($k \in [\tilde{m}]$):

$$CT_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]}, C_0, \vec{C}_1, C_2, \vec{C}_3, \vec{C}_4),$$

where

$$C_0 = M y^{s}, \quad \vec{C}_1 = g_1^{s\vec{\rho}} * g_4^{\vec{d}}, \quad C_2 = g_1^{s} g_4,$$

$$\vec{C}_3 = \{C_{3,k}\}_{k \in [\tilde{m}]} = \Big\{Y^{s} \Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k \in [\tilde{m}]}, \quad \vec{C}_4 = (C_{4,k})_{k \in [\tilde{m}]} = (g_1^{s_k} V_k)_{k \in [\tilde{m}]}.$$

Decrypt$((PK, CT_\Gamma, SK_S) \to M)$: If the attribute set $S$ satisfies the access structure specified by $\mathbb{B}$, then $S$ must be a superset of a minimal set in $\mathbb{B}$. Let $B_k \subset S$ for some $k \in [\tilde{m}]$; this algorithm calculates

$$M = \frac{C_0 \cdot \hat{e}_\omega(\vec{C}_1, \vec{K}_1)\, \hat{e}(C_{3,k}, K_3)}{\hat{e}(C_2, K_2)\, \hat{e}(C_{4,k}, \prod_{v_{i,j} \in B_k} K_{i,j})}.$$

5.2 Correctness

The correctness can be checked by applying the orthogonality of the subgroups $\mathbb{G}_{p_i}$, where $i = 1, 2, 3, 4$. If the attribute set $S$ satisfies the access structure specified by $\mathbb{B}$, then the following equations hold:

$$\hat{e}_\omega(\vec{C}_1, \vec{K}_1) = \hat{e}_\omega(g_1^{s\vec{\rho}} * g_4^{\vec{d}}, g_1^{\vec{\sigma}} * g_3^{\vec{y}_1}) = \hat{e}_\omega(g_1^{s\vec{\rho}}, g_1^{\vec{\sigma}}) = \hat{e}(g_1, g_1)^{s \langle \vec{\rho}, \vec{\sigma} \rangle}$$

$$\hat{e}(C_2, K_2) = \hat{e}(g_1^{s} g_4, g_1^{\alpha + \langle \vec{\rho}, \vec{\sigma} \rangle} X_1^{t} g_3^{y_2}) = \hat{e}(g_1, g_1)^{\alpha s + s \langle \vec{\rho}, \vec{\sigma} \rangle}\, \hat{e}(g_1, X_1)^{st}$$

$$\hat{e}(C_{3,k}, K_3) = \hat{e}\Big(Y^{s} \Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k} W_k, g_1^{t} g_3^{y_3}\Big) = \hat{e}(g_1, Y)^{st}\, \hat{e}\Big(\prod_{v_{i,j} \in B_k} T_{i,j}, g_1\Big)^{s_k t}$$

$$\hat{e}\Big(C_{4,k}, \prod_{v_{i,j} \in B_k} K_{i,j}\Big) = \hat{e}\Big(g_1^{s_k} V_k, \prod_{v_{i,j} \in B_k} T_{i,j}^{t} g_3^{y_{i,j}}\Big) = \hat{e}\Big(g_1, \prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k t}$$

5.3 Anonymity Analysis

This section shows that the proposed scheme achieves anonymity over the composite order bilinear group. Compared with scheme [30], our scheme adds some random elements in $\mathbb{G}_{p_4}$ to each part of the ciphertexts. These random elements have no effect on the decryption process. However, they are necessary for the anonymity of the scheme: without such elements, for some minimal set $B_k^{*}$, the adversary may determine whether the ciphertext component $C_{3,k}$ of the ciphertext $\vec{C}_3$ is encrypted under $B_k^{*}$ or not. In our scheme, the adversary may try to use the DDH-test $\hat{e}(C_{3,k}, g_1) \stackrel{?}{=} \hat{e}(Y, C_2)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k})$ to determine whether the ciphertext component $C_{3,k}$ is encrypted under $B_k^{*}$ or not. This DDH-test is the same as checking whether

$$\frac{\hat{e}(C_{3,k}, g_1)}{\hat{e}(Y, C_2)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k})} \stackrel{?}{=} 1.$$

The detailed analysis is as follows.

$$\hat{e}(C_{3,k}, g_1) = \hat{e}\Big(Y^{s} \Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k} W_k, g_1\Big) = \hat{e}(Y^{s}, g_1)\, \hat{e}\Big(\Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k}, g_1\Big) = \hat{e}(Y, g_1)^{s}\, \hat{e}\Big(\prod_{v_{i,j} \in B_k} T_{i,j}, g_1\Big)^{s_k}$$

$$\hat{e}(Y, C_2) = \hat{e}(Y, g_1)^{s}\, \hat{e}(Y, g_4)$$

$$\hat{e}\Big(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k}\Big) = \hat{e}\Big(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, g_1^{s_k} V_k\Big) = \hat{e}\Big(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, g_1^{s_k}\Big) = \hat{e}\Big(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, g_1\Big)^{s_k}$$

$$\frac{\hat{e}(C_{3,k}, g_1)}{\hat{e}(Y, C_2)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k})} = \frac{\hat{e}(\prod_{v_{i,j} \in B_k} T_{i,j}, g_1)^{s_k}}{\hat{e}(Y, g_4)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, g_1)^{s_k}}$$

If $B_k^{*} = B_k$, then $v_{i,j} = v_{i,j^*}$ for all $i$, $1 \leq i \leq n' \leq n$. Therefore

$$\frac{\hat{e}(C_{3,k}, g_1)}{\hat{e}(Y, C_2)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k})} = \frac{1}{\hat{e}(Y, g_4)}.$$

If $B_k^{*} \neq B_k$, then there exists at least one $k'$, where $1 \leq k' \leq n' \leq n$, such that $v_{k',j} \neq v_{k',j^*}$. Without loss of generality, let $v_{i,j} = v_{i,j^*}$ for all $1 \leq i \leq n' \leq n$ except $i = k'$. Then $t_{i,j} = t_{i,j^*}$ for these $i$. Therefore

$$\frac{\hat{e}(C_{3,k}, g_1)}{\hat{e}(Y, C_2)\, \hat{e}(\prod_{v_{i,j^*} \in B_k^{*}} T_{i,j^*}, C_{4,k})} = \frac{\hat{e}(T_{k',j}, g_1)^{s_k}}{\hat{e}(Y, g_4)\, \hat{e}(T_{k',j^*}, g_1)^{s_k}}.$$

In both cases, $B_k^{*} = B_k$ and $B_k^{*} \neq B_k$, the DDH-test gives a random element of $\mathbb{G}_T$, so the adversary is not able to determine whether the component $C_{3,k}$ of the ciphertext $\vec{C}_3$ is encrypted under $B_k^{*}$ or not. So the access structure is hidden, and our scheme is anonymous.

5.4 Performance Analysis

As shown in Table 1, we compare schemes [21, 27, 30] and the proposed scheme in terms of access policy, leakage model, multi-show attribute support and anonymity. All these schemes are constructed under the key leakage model. The access structure of [21, 27] is expressed by linear secret sharing (LSSS), while that of [30] and the proposed scheme is represented by minimal sets. In addition, [21, 30] do not support anonymity and [21, 27] do not support the multi-show functionality. In contrast, our scheme achieves anonymity and the attribute multi-show ability simultaneously.

5.5 Efficiency Analysis

We present the performance evaluation based on our DMA implementation prototype. Our experiment is implemented on the Pairing-Based Cryptography (PBC) library. We compare the computational efficiency of our scheme with schemes [21, 27, 30]. In Figures 1, 2 and 3, the leakage parameter is set to $\omega = 5$.

Figure 1 shows the comparison of key generation time with different numbers of attributes, where the number of attributes changes from 10 to 50.

Figure 2 presents the comparison of update time with different numbers of attributes.

Figure 3 provides the comparison of encryption time with different numbers of minimal sets, where the number of minimal sets varies from 5 to 25.

Figure 4 gives the comparison of decryption time with different leakage parameters, where the leakage parameter changes from 5 to 25.

[Figure 1: KeyGen time with different number of attributes. x-axis: size of attribute set; y-axis: KeyGen time (ms); curves: [30], [21], [27], Ours.]

[Figure 2: UpdateUSK time with different number of attributes. x-axis: size of attribute set; y-axis: UpdateUSK time (ms); curves: [30], [21], Ours.]

From the analysis of the experimental results, we can see that our scheme has advantages over [27] when the number of attributes is between 10 and 20, and that the proposed scheme spends less time than [21, 30]. Moreover, the proposed scheme has an obvious advantage over [27] in decryption. In summary, our scheme is quite practical and efficient.

6 Security Proof

In the proof, we generate normal private keys and ciphertexts, which are used in the real scheme. Then we generate semi-functional keys and ciphertexts, which are used only in the proofs. They are shown as follows.

• KeyGenSF. Let $SK_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S)$ be the normal keys. The semi-functional keys are as follows.

– Type 1: $\overline{SK}_S = (S, \vec{K}_1 * g_2^{\vec{d}_1}, K_2 g_2^{d_2}, K_3 g_2^{d_3}, K_{i,j} g_2^{d_{i,j}}; \forall v_{i,j} \in S)$, where $g_2$ is a generator of the group $\mathbb{G}_{p_2}$, and $\vec{d}_1 \in \mathbb{Z}_N^{\omega}$ and $d_2, d_3, d_{i,j} \in \mathbb{Z}_N$ are random.

– Type 2: $\overline{SK}_S = (S, \vec{K}_1, K_2 g_2^{d_2}, K_3, K_{i,j}; \forall v_{i,j} \in S)$.

• EncSF. Let $CT_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]}, C_0, \vec{C}_1, C_2, \vec{C}_3, \vec{C}_4)$ be a normal ciphertext. The semi-functional ciphertexts are obtained as $\overline{CT}_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]}, C_0, \vec{C}_1 * g_2^{\vec{e}_1}, C_2 g_2^{e_2}, \vec{C}_3 * g_2^{\vec{e}_3}, \vec{C}_4)$, where $\vec{e}_1$, $e_2$, $\vec{e}_3 = (e_{3,k})_{k \in [\tilde{m}]}$ are random with entries in $\mathbb{Z}_N$.

If we use a Type 1 semi-functional key to decrypt a semi-functional ciphertext, we obtain the extra term $\hat{e}(g_2, g_2)^{\langle \vec{d}_1, \vec{e}_1 \rangle - d_2 e_2 + d_3 e_{3,k}}$. If $\langle \vec{d}_1, \vec{e}_1 \rangle - d_2 e_2 + d_3 e_{3,k} = 0$, we call the key a nominally semi-functional key; otherwise it is truly semi-functional.

The proof uses a series of games to show that $Game_{real}$ and $Game_{final_1}$ are indistinguishable. There are $2Q + 4$ games between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$, where $Q$ is the number of key queries and $k'$ ranges from 0 to $Q$. The games are defined as follows.

$Game_{real}$: This is the real anonymous ABE security game, where all private keys and the ciphertexts are in normal form.

$Game_0$: All private keys are in normal form, and the challenge ciphertexts are in semi-functional form.

$Game_{k',1}$: The challenge ciphertexts are semi-functional. The first $k' - 1$ keys are semi-functional of Type 2, and the $k'$th key is semi-functional of Type 1. The remaining keys are normal.

$Game_{k',2}$: The challenge ciphertexts are semi-functional. The first $k'$ keys are semi-functional of Type 2, and the remaining keys are normal.

$Game_{final_0}$: All private keys are Type 2 semi-functional keys, and the challenge ciphertexts are semi-functional, where $C_0$ is random in the group $\mathbb{G}_T$.

$Game_{final_1}$: It is the same as $Game_{final_0}$ except that $\vec{C}_3$ is random in the group $\mathbb{G}_{p_1 p_2 p_4}$.

We can see that in $Game_{Q,2}$, all of the keys are semi-functional. In the last game, the adversary has no advantage.

Table 1: Performance analysis

| Scheme | Access policy | Leakage Model | Multi-show attr | Anonymity |
|--------|---------------|-------------------|-----------------|-----------|
| [21] | LSSS | Continual leakage | No | No |
| [30] | Minimal Sets | Continual leakage | Yes | No |
| [27] | LSSS | Bounded leakage | No | Yes |
| Ours | Minimal Sets | Continual leakage | Yes | Yes |

[Figure 3: Encryption time with different number of minimal sets. x-axis: the number of minimal sets; y-axis: Encryption time (ms); curves: [30], Ours.]

[Figure 4: Decryption time with different number of attributes. x-axis: size of attribute set; y-axis: Decryption time (ms); curves: [30], [27], Ours.]

Lemma 1. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{real}$ and $Game_0$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 1.

Proof. $\mathcal{B}$ receives $E = (\mathbb{G}, g_1, g_3, g_4)$ from the challenger $\mathcal{C}$ and simulates $Game_{real}$ or $Game_0$ with $\mathcal{A}$ depending on whether $T \xleftarrow{\$} \mathbb{G}_{p_1 p_4}$ or $T \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_4}$.

Setup: $\mathcal{B}$ takes the security parameter $\lambda$ and the leakage upper bound $l$ as input and outputs the description of the group: $\Phi = (N = p_1 p_2 p_3 p_4, \mathbb{G}, \mathbb{G}_T, \hat{e})$. Then $\mathcal{B}$ generates the public keys as follows. Set $\omega = \lceil 1 + \frac{l}{\log p_2} \rceil$. Select $\alpha, a, b, t_{i,j} \in \mathbb{Z}_N$ randomly, and set $Y = X_1 X_4 = g_1^{a} g_4^{b}$. Choose $\vec{\rho} \in \mathbb{Z}_N^{\omega}$ randomly and generate $PK = (N, g_1, g_4, \hat{e}(g_1, g_1)^{\alpha}, Y, g_1^{\vec{\rho}}, T_{i,j} = g_1^{t_{i,j}}; \forall i \in [n], j \in [n_i])$.

Phase 1: $\mathcal{B}$ generates normal keys for the attribute sets in the key generation queries (keep in mind that the private keys queried by $\mathcal{A}$ are either in normal form or in semi-functional form). In addition, $\mathcal{B}$ can answer the leakage, reveal and update queries.

Challenge: $\mathcal{A}$ sends two equal length messages $M_0, M_1$ and access structures $\Gamma_0, \Gamma_1$. $\mathcal{B}$ selects a random $b \in \{0, 1\}$ and encrypts $M_b$ under the access structure $\Gamma_b$. $\mathcal{B}$ encodes the access structure as the set of minimal sets $\mathbb{B}^{*} = \{B_1, B_2, ..., B_{\tilde{m}}\}$, where $B_k$ ($k \in [\tilde{m}]$) is a set of attribute values. It selects random elements $s_1, s_2, ..., s_{\tilde{m}} \in \mathbb{Z}_N$ and generates the challenge ciphertexts

$$\overline{CT}_\Gamma = \Big(\{I_{B_k}\}_{k \in [\tilde{m}]},\ C_0 = M_b\, \hat{e}(g_1^{\alpha}, T),\ \vec{C}_1 = T^{\vec{\rho}} * g_4^{\vec{d}},\ C_2 = T g_4,\ \vec{C}_3 = \Big\{T^{a} \Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k \in [\tilde{m}]},\ \vec{C}_4 = (g_1^{s_k} V_k)_{k \in [\tilde{m}]}\Big).$$

Phase 2: It is similar to Phase 1. $\mathcal{B}$ can answer the reveal and update queries with the restriction that the attribute sets queried by the adversary cannot satisfy the challenge access structure.

Guess: The adversary $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$. If $b' = b$, $\mathcal{A}$ wins the game.

If $T \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_4}$, write the $\mathbb{G}_{p_2}$ part of $T$ as $g_2^{c}$. This implicitly sets the semi-functional factor of the challenge ciphertexts to $(c\vec{\rho}, c, ac\vec{1}, 0)$. In this situation, $\mathcal{B}$ simulates $Game_0$. Otherwise, $\mathcal{B}$ simulates $Game_{real}$.

So if $\mathcal{A}$ can distinguish the two games with non-negligible advantage $\epsilon$, $\mathcal{B}$ can use $\mathcal{A}$ to break Assumption 1 with the same advantage.

Lemma 2. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{k'-1,2}$ and $Game_{k',1}$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 2.

Proof. $\mathcal{B}$ receives $E = (\mathbb{G}, g_1, g_3, g_4, U_1 U_2, W_2 W_3)$ and simulates $Game_{k'-1,2}$ or $Game_{k',1}$ with $\mathcal{A}$ depending on whether $T \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_3}$ or $T \xleftarrow{\$} \mathbb{G}_{p_1 p_3}$.

Setup: It is the same as in Lemma 1.

Phase 1: Because $\mathcal{B}$ knows the master keys, it can answer all private key queries. Let $i'$ denote the index of the current key query.

If $i' > k'$, $\mathcal{B}$ generates normal keys.

If $i' < k'$, $\mathcal{B}$ selects $t, h, y_2, y_3 \in \mathbb{Z}_N$ and $\vec{\sigma}, \vec{y}_1 \in \mathbb{Z}_N^{\omega}$ randomly, and picks $y_{i,j} \in \mathbb{Z}_N$ at random, to generate Type 2 semi-functional keys:

$$\overline{SK}_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S) = (S,\ g_1^{\vec{\sigma}} * g_3^{\vec{y}_1},\ g_1^{\alpha + \langle \vec{\rho}, \vec{\sigma} \rangle} X_1^{t} (W_2 W_3)^{h} g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}}; \forall v_{i,j} \in S)$$

If $i' = k'$, $\mathcal{B}$ selects $\vec{\sigma} \in \mathbb{Z}_N^{\omega}$ that satisfies $\langle \vec{\sigma}, \vec{\rho} \rangle = 0$ and outputs the private key as follows:

$$\overline{SK}_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S) = (S,\ T^{\vec{\sigma}} * g_3^{\vec{y}_1},\ g_1^{\alpha} T^{a} g_3^{y_2},\ T g_3^{y_3},\ T^{t_{i,j}} g_3^{y_{i,j}}; \forall v_{i,j} \in S)$$

If $T \xleftarrow{\$} \mathbb{G}_{p_1 p_3}$, the private key is a normal key and $\mathcal{B}$ simulates $Game_{k'-1,2}$. If $T \xleftarrow{\$} \mathbb{G}_{p_1 p_2 p_3}$, the private key is a semi-functional key of Type 1. In this case, $\mathcal{B}$ simulates $Game_{k',1}$. Writing the part of $T$ in $\mathbb{G}_{p_2}$ as $g_2^{\theta}$, we have $\vec{d}_1 = \theta\vec{\sigma}$, $d_2 = a\theta$, $d_3 = \theta$. In addition, $\mathcal{A}$ can query the leakage and update oracles.

Challenge: It is similar to Lemma 1. Set $U_1 = g_1^{s}$, $U_2 = g_2^{\xi}$ and calculate the ciphertexts

$$CT_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]}, C_0, \vec{C}_1, C_2, \vec{C}_3, \vec{C}_4),$$

in which

$$C_0 = M_b\, \hat{e}(g_1^{\alpha}, U_1 U_2), \quad \vec{C}_1 = (U_1 U_2)^{\vec{\rho}} * g_4^{\vec{d}}, \quad C_2 = (U_1 U_2) g_4,$$

$$\vec{C}_3 = \Big\{(U_1 U_2)^{a} \Big(\prod_{v_{i,j} \in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k \in [\tilde{m}]}, \quad \vec{C}_4 = (g_1^{s_k} V_k)_{k \in [\tilde{m}]}.$$

Phase 2: It is similar to Phase 1. $\mathcal{B}$ can answer the reveal and update queries with the restriction that the attribute set corresponding to any queried private key cannot satisfy the challenge access structure.

Guess: The adversary $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$. If $b' = b$, $\mathcal{A}$ wins the game.

Based on the above descriptions, the semi-functional factor of the challenge ciphertexts is $(\xi\vec{\rho}, \xi, a\xi\vec{1}, 0)$, and the equation $\langle \vec{d}_1, \vec{e}_1 \rangle - d_2 e_2 + d_3 e_{3,k} = 0$ holds. If the attributes of the $k'$th key satisfy the challenge access structure, it is a nominally semi-functional key. Following Lemma 5 in [30], the leakage of the key cannot help the adversary detect whether the $k'$th private key is normal or semi-functional.

Lemma 3. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{k',1}$ and $Game_{k',2}$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 2.

Proof. Unlike the construction in Lemma 2, the $k'$-th key is constructed as follows.

$\overline{SK}_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S) = (S,\ T^{\vec{\sigma}} * g_3^{\vec{y}_1},\ g_1^{\alpha} T^{a} g_3^{y_2} (W_2W_3)^{d},\ T g_3^{y_3},\ T^{t_{i,j}} g_3^{y_{i,j}}; \forall v_{i,j} \in S)$.

If $T \xleftarrow{\$} G_{p_1p_3}$, this private key is a semi-functional key of Type 2, and B simulates $Game_{k',2}$. If $T \xleftarrow{\$} G_{p_1p_2p_3}$, this private key is a semi-functional key of Type 1. In this case, B simulates $Game_{k',1}$. So if A can distinguish these two games with advantage $\epsilon$, B can break Assumption 2 with the same advantage.

Lemma 4. Suppose there is a PPT adversary A that can distinguish $Game_{Q,2}$ and $Game_{final0}$ with a non-negligible advantage $\epsilon$; then there exists a PPT algorithm B with advantage $\epsilon$ in breaking Assumption 3.

Proof. B receives the instance $E = (G, g_1, g_2, g_3, g_4, g_2^{r}, U_2^{r}, g_1^{\alpha}U_2, g_1^{s}W_2)$ and simulates $Game_{Q,2}$ or $Game_{final0}$.

Setup: B selects $a, b, t_{i,j} \in \mathbb{Z}_N$ and sets $Y = X_1X_4 = g_1^{a} g_4^{b}$. It selects $\vec{\rho} \in \mathbb{Z}_N^{\omega}$ randomly and generates $PK = (N, g_1, g_4, \hat{e}(g_1^{\alpha}U_2, g_1), Y, g_1^{\vec{\rho}}, T_{i,j} = g_1^{t_{i,j}}; \forall i \in [n], j \in [n_i])$.

Phase 1: All of the generated keys are semi-functional keys of Type 2. B selects $\vec{y}_1, \vec{\sigma} \in \mathbb{Z}_N^{\omega}$, $t, y_2, y_3 \in \mathbb{Z}_N$, $y_{i,j} \in \mathbb{Z}_N$ randomly, and outputs the secret keys

$\overline{SK}_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S) = (S,\ g_1^{\vec{\sigma}} * g_3^{\vec{y}_1},\ (g_1^{\alpha}U_2) X_1^{t} g_1^{\langle \vec{\rho}, \vec{\sigma} \rangle} g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}}; \forall v_{i,j} \in S)$.

In addition, A can query the leak and update oracles.

Challenge: Similar to Lemma 1, B calculates the challenge ciphertexts

$CT_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]},\ C_0 = M_b T,\ \vec{C}_1 = (g_1^{s}U_2)^{\vec{\rho}} * g_4^{\vec{d}},\ C_2 = (g_1^{s}U_2) g_4,\ \vec{C}_3 = \{(g_1^{s}U_2)^{a} \cdot (\prod_{v_{i,j} \in B_k} T_{i,j})^{s_k} W_k\}_{k \in [\tilde{m}]},\ \vec{C}_4 = (g_1^{s_k} V_k)_{k \in [\tilde{m}]})$.

Phase 2: It is similar to Phase 1. B can answer the reveal and update queries with the restriction that the attribute sets corresponding to any queried private key cannot satisfy the challenge access structure.

Guess: The adversary A outputs a guess $b' \in \{0, 1\}$. If $b' = b$, A wins the game.

Obviously, if $T = \hat{e}(g_1, g_1)^{\alpha s}$, it is a semi-functional ciphertext of message $M_b$. Otherwise, $T \xleftarrow{\$} G_T$ is a random element of $G_T$. So if the adversary A can distinguish these two games, B can break Assumption 3 with the same advantage.

Lemma 5. Suppose there exists a PPT adversary A that can distinguish $Game_{final0}$ and $Game_{final1}$ with a non-negligible advantage $\epsilon$; then there is a PPT algorithm B with advantage $\epsilon$ in breaking Assumption 4.

Proof. B receives the instance $E = (G, g_1, g_2, g_3, g_4, U_1U_4, U_1^{\hat{r}}U_2, g_1^{\hat{r}}W_2, g_1^{s}W_{24}, U_1g_3^{\hat{s}})$ and simulates $Game_{final0}$ or $Game_{final1}$.

Setup: B selects $t_{i,j}, \alpha \in \mathbb{Z}_N$, $\vec{\rho} \in \mathbb{Z}_N^{\omega}$ randomly, and sets $PK = (N, g_1, g_4, \hat{e}(g_1, g_1)^{\alpha}, Y = U_1U_4, g_1^{\vec{\rho}}, T_{i,j} = g_1^{t_{i,j}}; \forall i \in [n], j \in [n_i])$.

Phase 1: B selects $\vec{y}_1, \vec{\sigma} \in \mathbb{Z}_N^{\omega}$, $t, y_2, y_3 \in \mathbb{Z}_N$, $y_{i,j} \in \mathbb{Z}_N$ at random, and calculates and outputs the secret keys as follows.

$\overline{SK}_S = (S, \vec{K}_1, K_2, K_3, K_{i,j}; \forall v_{i,j} \in S) = (S,\ g_1^{\vec{\sigma}} * g_3^{\vec{y}_1},\ g_1^{\alpha + \langle \vec{\rho}, \vec{\sigma} \rangle} (U_1g_3^{\hat{s}})^{t} g_2 g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}}; \forall v_{i,j} \in S)$.

In addition, A can query the leak and update oracles.

Challenge: B generates the challenge ciphertexts as follows: $C_0 \xleftarrow{\$} G_T$, $\vec{C}_1 = (g_1^{s}W_{24})^{\vec{\rho}} * g_4^{\vec{d}}$, $C_2 = (g_1^{s}W_{24}) \cdot g_4$, $\vec{C}_3 = \{T (\prod_{v_{i,j} \in B_k} T_{i,j})^{s_k} W_k\}_{k \in [\tilde{m}]}$, $\vec{C}_4 = (g_1^{s_k} V_k)_{k \in [\tilde{m}]}$.

Phase 2: It is similar to Phase 1. B can answer the reveal and update queries with the restriction that the attribute sets of the adversary's queries cannot satisfy the challenge access structure.

Guess: The adversary A outputs a guess $b' \in \{0, 1\}$. If $b' = b$, A wins this game.

If $T \xleftarrow{\$} U_1^{s} D_{24}$, it is a semi-functional ciphertext, and B simulates $Game_{final0}$. If $T \xleftarrow{\$} G_{p_1p_2p_4}$, it is a random element, and B simulates $Game_{final1}$. These two games are indistinguishable.

Theorem 1. If Assumptions 1, 2, 3 and 4 hold and $l = (\omega - 1 - 2\tau)$, where $\tau$ is a positive constant, then the proposed scheme is anonymous and $l$ leakage-resilient.

Proof. Suppose Assumptions 1, 2, 3 and 4 hold. From Lemma 1 to Lemma 5, the adversary A cannot distinguish between $Game_{real}$ and $Game_{final1}$, so the value of $b$ is hidden from A. The upper bound of the leakage between consecutive updates is $l$, so the scheme is anonymous and $l$ leakage-resilient.

7 Anonymous Leakage-Resilient CP-ABE for Non-MAS

In this section, we give the construction of the anonymous leakage-resilient CP-ABE for non-monotone access structures. In this scheme, a non-monotone access structure is represented by the set of authorized sets in the non-monotone access structure.

Setup$((\lambda, V, l) \rightarrow (PK, MSK))$: Similar to Section 5, the setup algorithm sets the public keys as

$PK = (N, g_1, g_4, g_1^{\vec{\rho}}, y, Y, T_{i,j}; \forall i \in [n], j \in [n_i])$,

Table 2: Adversaries gain advantages over two consecutive games

Adjacent games                          Adversary gain advantage difference                                    Related lemma
$Game_{real}$ and $Game_{0}$            $|Adv_A^{Game_{real}} - Adv_A^{Game_{0}}| \le \epsilon$                Lemma 1
$Game_{k'-1,2}$ and $Game_{k',1}$       $|Adv_A^{Game_{k'-1,2}} - Adv_A^{Game_{k',1}}| \le \epsilon$           Lemma 2
$Game_{k',1}$ and $Game_{k',2}$         $|Adv_A^{Game_{k',1}} - Adv_A^{Game_{k',2}}| \le \epsilon$             Lemma 3
$Game_{Q,2}$ and $Game_{final0}$        $|Adv_A^{Game_{Q,2}} - Adv_A^{Game_{final0}}| \le \epsilon$            Lemma 4
$Game_{final0}$ and $Game_{final1}$     $|Adv_A^{Game_{final0}} - Adv_A^{Game_{final1}}| \le \epsilon$         Lemma 5

where

$y = \hat{e}(g_1, g_1)^{\alpha}$, $Y = X_1X_4$, $T_{i,j} = g_1^{t_{i,j}}$.

The master secret keys are

$MSK = (X_1, g_3, \alpha)$.

Finally, the algorithm publishes PK and keeps MSK.

KeyGen$((PK, MSK, S) \rightarrow SK_S)$: On input the public keys PK, the master keys MSK, and an attribute set $S = \{v_{1,x_1}, v_{2,x_2}, \cdots, v_{n',x_{n'}}\}$, where $n' \le n$ and $1 \le x_i \le n_i$ for each $1 \le i \le n'$, this algorithm selects $t, y_2, y_3, y_4 \in \mathbb{Z}_N$, $\vec{y}_1, \vec{\sigma} \in \mathbb{Z}_N^{\omega}$, and calculates and outputs the secret keys as follows.

$SK_S = (S, \vec{K}_1, K_2, K_3, K_4)$,

in which

$\vec{K}_1 = g_1^{\vec{\sigma}} * g_3^{\vec{y}_1}$, $K_3 = g_1^{t} g_3^{y_3}$,
$K_2 = g_1^{\alpha + \langle \vec{\rho}, \vec{\sigma} \rangle} X_1^{t} g_3^{y_2}$, $K_4 = (\prod_{v_{i,j} \in S} T_{i,j})^{t} g_3^{y_4}$.

UpdateUSK$((PK, S, SK_S) \rightarrow SK'_S)$: The update algorithm selects $\Delta t, \Delta y_2, \Delta y_3, \Delta y_4 \in \mathbb{Z}_N$, $\Delta\vec{y}_1, \Delta\vec{\sigma} \in \mathbb{Z}_N^{\omega}$ at random, and outputs a new key $SK'_S$:

$SK'_S = (S, \vec{K}'_1, K'_2, K'_3, K'_4)$,

where

$\vec{K}'_1 = \vec{K}_1 * g_1^{\Delta\vec{\sigma}} * g_3^{\Delta\vec{y}_1}$, $K'_3 = K_3 g_1^{\Delta t} g_3^{\Delta y_3}$,
$K'_2 = K_2 g_1^{\langle \vec{\rho}, \Delta\vec{\sigma} \rangle} X_1^{\Delta t} g_3^{\Delta y_2}$, $K'_4 = K_4 (\prod_{v_{i,j} \in S} T_{i,j})^{\Delta t} g_3^{\Delta y_4}$.

Encrypt$((PK, M, \Gamma) \rightarrow CT_\Gamma)$: Let $\Gamma = \{B_1, B_2, \ldots, B_{\tilde{m}}\}$ be a non-monotone access structure, where $B_k$ ($k \in [\tilde{m}]$) is a set of attribute values and $\tilde{m}$ is the size of $\Gamma$. This algorithm selects $s, s_1, s_2, \ldots, s_{\tilde{m}} \in \mathbb{Z}_N$, $\vec{d} \in \mathbb{Z}_N^{\omega}$, $W_k, V_k \in G_{p_4}$ ($k \in [\tilde{m}]$) at random and outputs the ciphertexts $CT_\Gamma$ and the index $I_{B_k} \subset \{1, 2, \ldots, n\}$ ($k \in [\tilde{m}]$) corresponding to attribute $B_k$ ($k \in [\tilde{m}]$).

$CT_\Gamma = (\{I_{B_k}\}_{k \in [\tilde{m}]}, C_0, \vec{C}_1, C_2, \vec{C}_3, \vec{C}_4)$,

where

$C_0 = M y^{s}$, $\vec{C}_1 = g_1^{s\vec{\rho}} * g_4^{\vec{d}}$,
$C_2 = g_1^{s} g_4$, $\vec{C}_4 = (C_{4,k})_{k \in [\tilde{m}]} = (g_1^{s_k} V_k)_{k \in [\tilde{m}]}$,
$\vec{C}_3 = \{C_{3,k}\}_{k \in [\tilde{m}]} = \{Y^{s} (\prod_{v_{i,j} \in B_k} T_{i,j})^{s_k} W_k\}_{k \in [\tilde{m}]}$.

Decrypt$((PK, CT_\Gamma, SK_S) \rightarrow M)$: If the attribute set $S$ satisfies the non-monotone access structure $\Gamma$, then $S \in \Gamma$, i.e., $S = B_k$ ($k \in [\tilde{m}]$) for $B_k \in \Gamma$. This algorithm calculates

$M = \dfrac{C_0 \cdot \hat{e}_{\omega}(\vec{C}_1, \vec{K}_1)\, \hat{e}(C_{3,k}, K_3)}{\hat{e}(C_2, K_2)\, \hat{e}(C_{4,k}, K_4)}$.

Theorem 2. If Assumptions 1, 2, 3 and 4 hold, then the proposed scheme is anonymous and $l$ leakage-resilient.

Proof. The security proof of the anonymous leakage-resilient CP-ABE scheme for non-MAS can be derived from the proof of the scheme for MAS with a minor modification: in the simulation, the key components for each $v_{i,j} \in S$ are simply multiplied together to obtain a single key component $K_4 = \prod_{v_{i,j} \in S} K_{i,j}$.

8 Conclusion

In this paper, an anonymous leakage-resilient CP-ABE scheme for monotone access structures is first proposed, in which the access structure is converted into minimal sets that provide fast decryption. Using similar ideas, we present an anonymous leakage-resilient CP-ABE scheme with constant-size ciphertexts for non-monotone access structures. Both schemes are proven to be adaptively secure in the standard model under four static assumptions over composite-order bilinear groups, and they can tolerate continual leakage on the private keys when an update algorithm is implicitly employed to periodically update the private keys. However, our schemes cannot achieve the optimal leakage rate to ensure the efficiency, so designing an efficient ABE scheme with optimal leakage rate will be our future work.

Acknowledgments

This work was supported in part by the International S&T Cooperation Program of Shaanxi Province No. 2019KW-056, the National Cryptography Development Fund under grant (MMJJ20180209).

References

[1] J. Alwen, Y. Dodis, M. Naor, G. Segev, S. Walfish, and D. Wichs, “Public-key encryption in the bounded-retrieval model,” Lecture Notes in Computer Science, vol. 2009, no. 5, pp. 113–134, 2010.
[2] J. Alwen, Y. Dodis, and D. Wichs, “Leakage-resilient public-key cryptography in the bounded-retrieval model,” in Advances in Cryptology, pp. 36–54, 2009.
[3] N. Attrapadung, B. Libert, and E. D. Panafieu, “Expressive key-policy attribute-based encryption with constant-size ciphertexts,” in Public Key Cryptography (PKC’11), pp. 90–108, 2011.
[4] J. Bethencourt, A. Sahai, and B. Waters, “Ciphertext-policy attribute-based encryption,” in IEEE Symposium on Security and Privacy, pp. 321–334, 2007.
[5] Z. Cao, L. Liu, and Z. Guo, “Ruminations on attribute-based encryption,” International Journal of Electronics and Information Engineering, vol. 8, no. 1, pp. 9–19, 2018.
[6] A. D. Caro, V. Iovino, and G. Persiano, “Fully secure anonymous hibe and secret-key anonymous ibe with short ciphertexts,” in International Conference on Pairing-Based Cryptography, pp. 347–366, 2010.
[7] P. S. Chung, C. W. Liu, and M. S. Hwang, “A study of attribute-based proxy re-encryption scheme in cloud environments,” International Journal of Network Security, vol. 16, no. 1, pp. 1–13, 2014.
[8] Y. Dodis, Y. T. Kalai, and S. Lovett, “On cryptography with auxiliary input,” in Proceedings of the 41st Annual ACM symposium on Theory of Computing, pp. 621–630, 2009.
[9] V. Goyal, O. Pandey, A. Sahai, and B. Waters, “Attribute-based encryption for fine-grained access control of encrypted data,” in ACM Conference on Computer and Communications Security, pp. 89–98, 2006.
[10] A. Kapadia, P. Tsang, and S. W. Smith, “Attribute-based publishing with hidden credentials and hidden policies,” NDSS, vol. 7, pp. 179–192, 2007.
[11] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” in Advances in Cryptology, pp. 146–162, 2008.
[12] A. Lewko, T. Okamoto, A. Sahai, K. Takashima, and B. Waters, “Fully secure functional encryption: Attribute-based encryption and (hierarchical) inner product encryption,” in International Conference on Theory and Application of Cryptographic Techniques, pp. 62–91, 2010.
[13] L. Liu, Z. Cao, and C. Mao, “A note on one outsourcing scheme for big data access control in cloud,” International Journal of Electronics and Information Engineering, vol. 9, no. 1, pp. 29–35, 2018.
[14] S. Micali and L. Reyzin, “Physically observable cryptography,” in Theory of Cryptography, pp. 278–296, 2004.
[15] T. Okamoto and K. Takashima, “Fully secure functional encryption with general relations from the decisional linear assumption,” Crypto, vol. 6223, pp. 191–208, 2010.
[16] R. Ostrovsky, A. Sahai, and B. Waters, “Attribute-based encryption with non-monotonic access structures,” in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS’07), pp. 195–203, 2007.
[17] T. Pandit and R. Barua, “Efficient fully secure attribute-based encryption schemes for general access structures,” in Provable Security, pp. 193–214, 2012.
[18] A. Sahai and B. Waters, “Fuzzy identity-based encryption,” in Advances in Cryptology, pp. 457–473, 2005.
[19] B. Waters, “Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization,” Lecture Notes in Computer Science, vol. 2008, pp. 321–334, 2008.
[20] B. Waters, “Dual system encryption: Realizing fully secure ibe and hibe under simple assumptions,” in International Cryptology Conference on Advances in Cryptology, pp. 619–636, 2009.
[21] Q. Yu and J. Li, “Continuous leakage resilient ciphertext-policy attribute-based encryption supporting attribute revocation,” Computer Engineering and Applications, vol. 52, no. 20, pp. 29–38, 2016.
[22] T. H. Yuen, S. S. M. Chow, Y. Zhang, and S. M. Yiu, “Identity-based encryption resilient to continual auxiliary leakage,” in International Conference on Theory and Applications of Cryptographic Techniques, pp. 117–134, 2012.
[23] L. Zhang, Y. Cui, and Y. Mu, “Improving privacy-preserving CP-ABE with hidden access policy,” in The 4th International Conference on Cloud Computing and Security, pp. 596–605, 2018.
[24] L. Zhang, G. Hu, Y. Mu, and F. Rezaeibagha, “Hidden ciphertext policy attribute-based encryption with fast decryption for personal health record system,” IEEE Access, no. 7, pp. 33202–33213, 2019.
[25] L. Zhang and Y. Shang, “Leakage-resilient attribute-based encryption with CCA2 security,” International Journal of Network Security, vol. 22, no. 6, pp. 1–9, 2019.
[26] L. Zhang and H. Yin, “Recipient anonymous ciphertext-policy attribute-based broadcast encryption,” International Journal of Network Security, vol. 20, no. 1, pp. 168–176, 2018.
[27] L. Zhang and J. Zhang, “Anonymous CP-ABE against side-channel attacks in cloud computing,” Journal of Information Science and Engineering, vol. 33, no. 3, pp. 789–805, 2017.
[28] L. Zhang, J. Zhang, and Y. Hu, “Attribute-based encryption resilient to continual auxiliary leakage with constant size ciphertexts,” Journal of China Universities of Posts and Telecommunications, vol. 23, no. 3, pp. 18–28, 2016.
[29] L. Zhang, J. Zhang, and Y. Mu, “Novel leakage-resilient attribute-based encryption from hash proof system,” Computer Journal, vol. 60, no. 4, pp. 541–554, 2017.
[30] M. Zhang, W. Shi, C. Wang, Z. Chen, and Y. Mu, “Leakage-resilient attribute-based encryption with fast decryption: Models, analysis and constructions,” in International Conference on Information Security Practice and Experience, pp. 75–90, 2013.

Biography

Xiaoxu Gao is a master's degree student in the School of Mathematics and Statistics, Xidian University. Her research interests focus on computer and network security.

Leyou Zhang is a professor in the School of Mathematics and Statistics at Xidian University, Xi’an, China. He received his PhD from Xidian University in 2009. From Dec. 2013 to Dec. 2014, he was a research fellow in the School of Computer Science and Software Engineering at the University of Wollongong. His current research interests include network security, computer security, and cryptography.

International Journal of Network Security, Vol.22, No.5, PP.775-781, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).07) 775

A Distributed Density-based Outlier Detection Algorithm on Big Data

Lin Mei1,2 and Fengli Zhang1 (Corresponding author: Fengli Zhang)

School of Information and Software Engineering, University of Electronic Science and Technology of China^1, Chengdu, China
School of Computer Science and Technology, Southwest Minzu University^2, Chengdu, China
(Email: [email protected])
(Received Mar. 11, 2019; Revised and Accepted Nov. 6, 2019; First Online Jan. 29, 2020)

Abstract

As a popular topic in data mining, outlier detection aims to find objects whose behavior deviates from the distribution of the original dataset. It can be applied in areas such as bank fraud detection, network intrusion detection, system health monitoring, medical care, and public safety and security. Density-based outlier detection, an efficient and important approach, uses the relative density of an object with respect to its neighbors to indicate the degree to which the object is an outlier; specifically, it computes the Local Outlier Factor (LOF). In this paper, we propose a novel distributed density-based outlier detection method for large-scale data processing, namely IGBP. First, we split the data space into several grids and allocate these grids to the datanodes of a distributed environment with a greedy algorithm. Second, we propose a distributed LOF computing method based on the KD-tree to detect density-based outliers in parallel. Finally, experimental results verify the validity of the proposed approach and demonstrate that it outperforms the baselines.

Keywords: Density-based Outlier; Distributed Algorithm; Greedy Algorithm; Kd-Tree; LOF

1 Introduction

With the development of big data techniques for large-scale data processing, outlier detection has become an important but complex task in data mining. It can be widely used in applications such as bank fraud detection, network intrusion detection, system health monitoring, medical care, and public security protection. Outlier detection helps us discover valuable knowledge and abnormal patterns, and it has therefore recently become a hot research direction in data mining.

Hawkins defined an outlier as an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism [9]. In recent years, there have been many studies on outlier detection. For example, distance-based outlier detection [13] and density-based outlier detection [5] are representative traditional methods. However, most of them only consider centralized environments or single-node processing. With the increasing scale of big data, the performance of these methods cannot satisfy the computing requirements of users. For instance, in credit card fraud detection, we obtain the users' trade information as a dataset. If a credit card is stolen, its transaction pattern usually changes dramatically; in particular, the locations of transactions and the purchased items are often unusual for the authentic card owner. Therefore, we define these abnormal transaction records as outliers. Outlier detection techniques can then identify such outliers to find out whether user accounts have been stolen, which helps avoid property damage. Besides, outlier detection can also be used to detect cheating by game bots in Massively Multiplayer Online Role-playing Games [17].

Fortunately, some recent studies attempt to utilize distributed computing environments to speed up the computation, and several related methods [10, 11, 14] for distributed outlier detection have been proposed. For example, E. Lozano and E. Acuña [16] propose a master-slave architecture for distributed computing. More specifically, each slave node computes its neighborhood set and sends it to the master node, and the master node collects all the partial neighborhood sets and computes the LOFs of all the tuples. However, a large number of tuples are aggregated at the master node in this method, and many calculations are then needed to produce the result. Thus, the master node becomes the bottleneck as the data scale increases, and a high-performance master node is necessary. Recently, Bai et al. adopted a coordinator and a number of datanodes for the outlier detection problem [3]. The coordinator is responsible for the overall scheduling. Each datanode stores several data subsets in grids and calculates LOFs in the data subsets. The coordinator in this framework only takes charge of the scheduling, and all the actual computations are allocated to the datanodes. Thus, the coordinator does not become a bottleneck when the data scale is large. In order to reduce the network overhead, their grid allocation algorithm allocates adjacent grids to the same datanodes. However, their algorithm needs only a small amount of network communication between pairs of datanodes, while the numbers of tuples in the grids are likely to differ greatly. Hence, to address these limitations, we focus on improving the computational complexity, which is the greatest bottleneck of this problem.

In this paper, we aim to model density-based outlier detection in a distributed manner. The general idea is to compare the density around an object with the density around its local neighbors. The basic assumption of density-based outlier detection is that the density around a non-outlier object is similar to the density around its neighbors, whereas the density around an outlier object is significantly different from the density around its neighbors [8]. Through our analysis, we find that both the workload and the network communication of the computing architecture increase as the scale of data increases, but the workload grows faster than the network communication. Besides, if the dimensionality of data increases, the performance will be much better. Therefore, we propose an improved algorithm that detects density-based outliers in distributed environments more efficiently than [3]. Moreover, experiments are carried out to demonstrate its effectiveness and efficiency. The detailed description of the algorithm is given in Section 4.

The rest of this paper is organized as follows. Section 2 reviews the related work. Section 3 states the problem of density-based outlier detection in a distributed environment. Section 4 presents our improved algorithm in detail. Section 5 gives the experimental results. Finally, we conclude this paper in Section 6.

2 Related Work

We first give a brief overview of outlier detection in Section 2.1. Then the previous methods of distributed outlier computing are described in Section 2.2.

2.1 Outlier Detection

Hawkins first gave a definition of outliers in 1980 [9]. Afterwards, Beckman and Cook [4] presented an improved definition and a survey. Markou and Singh [18, 19] present a review of statistical approaches for outlier detection; in [19] in particular, they review neural-network-based approaches. Fujimaki et al. present a semi-supervised outlier detection method that uses a set of labeled “normal” objects [7]. Dasgupta and Majumdar also propose a semi-supervised method [6] for the outlier detection task. Subsequently, distance-based outliers were developed by Knorr and Ng [13], and index-based, nested-loop-based and grid-based approaches have been well explored to speed up distance-based outlier detection [12, 13].

Another proximity-based approach is density-based outlier detection [5]. To decide whether a tuple is an outlier, each tuple is assigned a local outlier factor (LOF), defined in [5], which represents the degree to which the tuple is an outlier [3]. LOF is based on the concept of local density, where locality is given by the k nearest neighbors, whose distance is used to estimate the density. By comparing the local density of a tuple to the local densities of its neighbors, one can identify regions of similar density, as well as tuples that have a substantially lower density than their neighbors. These are considered to be outliers [8]. Besides, the HilOut algorithm was proposed by Angiulli and Pizzuti [2]. Aggarwal and Yu [1] develop the sparsity coefficient-based subspace outlier detection method. Kriegel et al. proposed angle-based outlier detection [15].

2.2 Outlier Detection in Distributed Environments

Lozano and Acuña propose a distributed algorithm to compute density-based outliers [16]. However, because all the tuples are transferred to the master node, the workload on the master node is quite heavy. Thus, this method is unable to achieve good performance when the data scale is huge.

Bai et al. adopt a coordinator and a number of datanodes [3]. The coordinator is responsible for the overall scheduling. Each datanode stores several data subsets in grids and calculates LOFs in the data subsets. By comparison, the coordinator in this framework is only in charge of the scheduling, and all the actual computations are allocated to the datanodes. Hence, the coordinator does not become a bottleneck when the data scale is huge. In order to reduce the network overhead, their grid allocation algorithm allocates adjacent grids to the same datanodes. However, their algorithm in fact needs only a small amount of network communication between pairs of datanodes. In addition, the time and space complexity of LOF are very high. In this paper, we present an improved algorithm based on theirs to efficiently detect density-based outliers in distributed environments. Moreover, experiments are carried out to demonstrate its validity.

3 Preliminaries

3.1 Problem Formalism

We first give some definitions that help to state our problem and that will be applied in our proposed method.

Local density: It is estimated by the typical distance at which a tuple can be reached from its neighbors. The reachability distance adopted in LOF is an additional measure to produce more stable results within clusters.

Reachability distance: Let $d(A, B)$ be the distance between $A$ and $B$, and let $k\text{-distance}(A)$ be the distance of the object $A$ to its $k$-th nearest neighbor. To simplify the description, we define the distance as the Euclidean distance in the rest of this paper, similar to [21]. Hence, $k\text{-distance}(A) = d(A, B)$ for a tuple $B$ such that:

1) There are at least $k$ tuples $A'$ such that $d(A, A') \le d(A, B)$.

2) There are at most $k - 1$ tuples $A''$ such that $d(A, A'') < d(A, B)$.

Note that the set of the $k$ nearest neighbors includes all tuples at this distance, which in the case of a "tie" can be more than $k$ tuples. We denote the set of $k$ nearest neighbors as $N_k(A) = \{A' \mid d(A, A') \le k\text{-distance}(A)\}$. This distance is used to define what is called the reachability distance:

$Rd_k(A, B) = \max\{k\text{-distance}(A),\ d(A, B)\}$.

Figure 1 illustrates the intention of the reachability distance. Tuples $B$ and $D$ have the same reachability distance ($k = 3$), while $G$ is not a $k$ nearest neighbor.

Figure 1: k-distance neighborhood and reachability distance when k=3 (in the figure, $Rd_k(A, B) = k\text{-distance}(A)$ and $Rd_k(A, G) = d(A, G)$)

Thus, the reachability distance of an object $A$ from $B$ is the true distance between the two objects, but at least $k\text{-distance}(A)$. Objects that belong to the $k$ nearest neighbors of $A$ are considered to be equally distant. The reason for this distance is to get more stable results. Note that this is not a distance in the mathematical sense, since it is not symmetric.

The local reachability density of an object $A$ is defined by

$LRD(A) = 1 \Big/ \left( \dfrac{\sum_{B \in N_k(A)} Rd_k(B, A)}{|N_k(A)|} \right)$,

which is the inverse of the average reachability distance of the object $A$ from its neighbors. Note that it is not the average reachability of the neighbors from $A$ (which by definition would be $k\text{-distance}(A)$), but the distance at which $A$ can be "reached" from its neighbors. With duplicate points, this value can become infinite.

The local reachability densities are then compared with those of the neighbors using

$LOF_k(A) = \dfrac{\sum_{B \in N_k(A)} \frac{LRD(B)}{LRD(A)}}{|N_k(A)|}$,

which is the average local reachability density of the neighbors divided by the object's own local reachability density. A value close to a given threshold indicates that the object is comparable to its neighbors (and thus it is not an outlier). A value less than the threshold indicates a denser region (which would be an inlier), while values significantly larger than the threshold indicate outliers.

Traditional methods usually form the all-pairs Euclidean distance matrix and then run KNN queries to proceed further. This is $\Theta(n^2)$ in terms of both space and time complexity. However, it can be improved with a KD-tree [20].

3.2 Distributed Environment

As Figure 2 shows, we utilize a distributed framework that consists of a coordinator and a number of datanodes. The coordinator is responsible for the overall scheduling, similar to [3]. Each datanode stores a portion of the complete data set. Most previous algorithms utilize a master-slave architecture for outlier detection, in which many computations are performed on the master node. Following [3], the coordinator in our framework only takes charge of the scheduling, and all the actual computations are allocated to the datanodes. Density-based outlier detection computes the LOF of each tuple for a given integer $k$. First, we use the improved GBP (IGBP) algorithm to split the data set into several subsets and assign them to the datanodes. After that, our algorithm works in two main steps. First, each datanode processes its local tuples, and the LOFs of some tuples can be computed directly on the local nodes. Second, we output the LOFs of the remaining tuples using a few necessary network communications compared with [3].
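To make the definitions of Section 3.1 concrete, the following is a minimal single-machine sketch of the LOF computation. It is an illustration under stated assumptions, not the distributed method of this paper: it uses the brute-force all-pairs approach (not a KD-tree), and the function names `euclidean`, `k_distance_and_neighbors` and `lof_scores` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance d(p, q) between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_distance_and_neighbors(points, i, k):
    """Return (k-distance of point i, indices in N_k(i)), ties included."""
    dists = sorted(
        (euclidean(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    k_dist = dists[k - 1][0]
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors


def lof_scores(points, k):
    """Brute-force LOF_k for every point (assumes len(points) > k)."""
    n = len(points)
    info = [k_distance_and_neighbors(points, i, k) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    neigh = [nb for _, nb in info]

    def rd(b, a):
        # Rd_k(B, A) = max{k-distance(B), d(B, A)}
        return max(k_dist[b], euclidean(points[b], points[a]))

    # LRD(A): inverse of the average reachability distance of A from N_k(A).
    lrd = []
    for a in range(n):
        total = sum(rd(b, a) for b in neigh[a])
        # With duplicate points the sum can be 0, so LRD becomes infinite.
        lrd.append(float("inf") if total == 0 else len(neigh[a]) / total)

    # LOF_k(A): average of LRD(B) / LRD(A) over the neighbors B of A.
    # Note: if both densities are infinite, the ratio is NaN in this sketch.
    return [
        sum(lrd[b] / lrd[a] for b in neigh[a]) / len(neigh[a])
        for a in range(n)
    ]
```

For example, calling `lof_scores` with `k = 2` on the four corners of a unit square plus one distant point gives a score of about 1 for the corners and a clearly larger score for the distant point. Replacing the sorted distance list with a KD-tree query would change only `k_distance_and_neighbors`.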

Algorithm 1 Grid allocation
Input: The grid set G; the datanode set N
Output: Allocation plan
 1: sort the grids in G according to the number of tuples in a grid in descending order;
 2: for each grid g in G do
 3:   if there exist datanodes with no grid then
 4:     n ← randomly choose a datanode with no grid;
 5:   else
 6:     initialize a datanode set N';
 7:     for each datanode n_i in N do
 8:       if n_i has the least tuples then
 9:         insert n_i into N';
10:       end if
11:       if there exists at least one datanode in N' with grids which belong to adjacent grids N_g then
12:         n ← choose the datanode in N' with the largest number of grids which belong to adjacent grids N_g;
13:       else
14:         n ← randomly choose a datanode in N';
15:       end if
16:     end for
17:   end if
18:   allocate g to n;
19: end for

Figure 2: Computation frame

4 The Improved GBP Algorithm (IGBP)

4.1 Grid-based Partition

In our IGBP algorithm, we first split the whole d-dimensional space into several isometric grids. Because the grid-based partition method has limitations on high-dimensional data, we cut each dimension into several equal segments (the number of segments is denoted by s). After that, the space is partitioned into $s^d$ grids. Let $g_{x_1,x_2,\cdots,x_d}$ be the grid that is at the $x_i$-th position for dimension i. Next, we give the definition of adjacent grids:

$$N(g_{x_1,x_2,\cdots,x_d}) = \{\, g_{y_1,y_2,\cdots,y_d} \mid \max(1, x_i - 1) \le y_i \le \min(s, x_i + 1),\ g_{y_1,y_2,\cdots,y_d} \ne g_{x_1,x_2,\cdots,x_d} \,\}.$$
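The adjacency definition above, and the greedy allocation of Algorithm 1 that relies on it, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the random choices of Algorithm 1 are replaced by "take the first candidate" so that the run is reproducible, the candidate set N' is built completely before a datanode is chosen, and the names `adjacent_grids`, `allocate` and `GRID_SIZES` are hypothetical. The grid sizes are those of the worked example in Section 4.2 (Table 1).

```python
from itertools import product
from statistics import pstdev


def adjacent_grids(cell, s):
    """N(g): cells whose index differs by at most 1 in every dimension, excluding g itself."""
    ranges = [range(max(1, x - 1), min(s, x + 1) + 1) for x in cell]
    return {y for y in product(*ranges) if y != cell}


def allocate(grid_sizes, num_nodes, s):
    """Greedy grid allocation in the spirit of Algorithm 1 (deterministic tie-breaking)."""
    nodes = [[] for _ in range(num_nodes)]   # grids held by each datanode
    load = [0] * num_nodes                   # tuples held by each datanode
    # Line 1: largest grids first.
    for cell in sorted(grid_sizes, key=grid_sizes.get, reverse=True):
        empty = [i for i in range(num_nodes) if not nodes[i]]
        if empty:
            # Lines 3-4: the paper picks randomly; the first empty node is used here.
            target = empty[0]
        else:
            # Lines 6-10: N' holds the datanodes with the fewest tuples.
            least = min(load)
            candidates = [i for i in range(num_nodes) if load[i] == least]
            near = adjacent_grids(cell, s)
            # Lines 11-15: prefer the candidate holding the most adjacent grids;
            # max() returns the first candidate when none holds an adjacent grid.
            target = max(candidates, key=lambda i: sum(g in near for g in nodes[i]))
        nodes[target].append(cell)
        load[target] += grid_sizes[cell]
    return nodes, load


# Tuples per grid, keyed by (position in dim 1, position in dim 2).
GRID_SIZES = {
    (1, 1): 10, (1, 2): 11, (1, 3): 7, (1, 4): 10,
    (2, 1): 6, (2, 2): 6, (2, 3): 4, (2, 4): 6,
    (3, 1): 8, (3, 2): 10, (3, 3): 9, (3, 4): 9,
    (4, 1): 8, (4, 2): 5, (4, 3): 4, (4, 4): 7,
}
NODES, LOAD = allocate(GRID_SIZES, num_nodes=10, s=4)
SIGMA = pstdev(LOAD)   # population standard deviation of the per-node load
```

With these inputs the per-node loads form the same multiset as in the worked example, and `SIGMA` rounds to 1.483. Which individual grid lands on which datanode can differ from Table 1 because ties are broken differently.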

Next, we allocate these grids to the datanodes. To speed up the computation of density-based outliers, we propose an allocation method that considers the following two factors.

1) To obtain high parallelism, we keep the number of tuples on each datanode almost the same (balancing the workload).

2) According to Section 3.1, we compute the k-distance neighborhood for each tuple, so the network overhead is reduced if we allocate adjacent grids to the same datanode as far as possible.

The details of the proposed method are shown in Algorithm 1.

4.2 Distributed LOF Computing

In the above part, we have split the data set into several grids and allocated them to the corresponding datanodes. Next, we turn to computing the LOF for each tuple in parallel. By analyzing the definitions in Section 3, it is clear that LRD is the premise of LOF. In order to calculate LRDs effectively, we have to compute the k-distances and k-distance neighborhoods for all the tuples first, which is the core part of this section.

However, in distributed environments, the situation is more complex. It is difficult to compute the actual k-distances of all the tuples. For example, in Figure 3, considering the tuples in grid g1, the local k-distance neighborhood of tuple A is identical to its actual k-distance neighborhood. However, tuple B shows a more complex situation. Its actual k-distance neighborhood is {C, D, E}, which cannot be computed unless D and E are transmitted from g2 to grid g1.

Figure 3: Example of DLC (k=3)

The previous work in [3] classifies the tuples in a grid into two categories. A tuple whose neighborhood can be computed in the local grid is a grid-local tuple. Otherwise, it is a cross-grid tuple. After that, the authors propose an algorithm to solve this problem. This part is not the most significant point of our paper. For distributed LOF computing, we give an example to explain how our proposed method differs from previous work.

First, as shown in Figure 4, there is a set P with 120 tuples in 2-dimensional space, and the number of datanodes is 10. Thus, we set s = 4 and split the space into 16 grids. The number of tuples is shown at the bottom of each grid. According to Algorithm 1, we sort the grids by the number of tuples in a grid, and the result is shown in Table 1. Then, we allocate each of the first 10 grids randomly to a datanode with no grid. After that, 89 tuples in total have been allocated, and the average number of tuples per datanode is 8.9. When allocating the 11th grid g2,2, there are 4 datanodes whose numbers of tuples are not larger than 8.9, namely n7, n8, n9 and n10. Among them, n9 and n10 hold the fewest tuples (7 each); we choose n10, which already holds g1,3, a grid adjacent to g2,2 (lines 11-12 of Algorithm 1). Using the same method, we allocate all the grids to the corresponding datanodes.

Figure 4: Example of grids (number of tuples per grid; Dim 1 is the horizontal axis and Dim 2 the vertical axis; the original figure marks the neighbors of g2,2)

g1,4: 10    g2,4: 6    g3,4: 9     g4,4: 7
g1,3: 7     g2,3: 4    g3,3: 9     g4,3: 4
g1,2: 11    g2,2: 6    g3,2: 10    g4,2: 5
g1,1: 10    g2,1: 6    g3,1: 8     g4,1: 8

Table 1: Improved algorithm process

Sequence number   Grid ID   Number of tuples   Allocated datanode   Average number of tuples
1                 g1,2      11                 n1                   /
2                 g1,1      10                 n2                   /
3                 g3,2      10                 n3                   /
4                 g1,4      10                 n4                   /
5                 g3,4      9                  n5                   /
6                 g3,3      9                  n6                   /
7                 g3,1      8                  n7                   /
8                 g4,1      8                  n8                   /
9                 g4,4      7                  n9                   /
10                g1,3      7                  n10                  8.9
11                g2,2      6                  n10                  9.5
12                g2,4      6                  n9                   10.1
13                g2,1      6                  n7                   10.7
14                g4,2      5                  n8                   11.2
15                g4,3      4                  n6                   11.6
16                g2,3      4                  n5                   12

After allocation, the number of tuples in datanodes {n1, n2, ···, n10} is {11, 10, 10, 10, 13, 13, 14, 13, 13, 13}. We use σ (standard deviation) to indicate how spread out a data distribution is. A low standard deviation means that the algorithm balances the workload well. With a mean of 12 tuples per datanode, the computational formula of the standard deviation gives

$$\sigma = \sqrt{\frac{11^2 + 10^2 + 10^2 + 10^2 + 13^2 + 13^2 + 14^2 + 13^2 + 13^2 + 13^2}{10} - 12^2} \approx 1.483.$$

Second, according to the previous algorithm in [3], the number of tuples in datanodes {n1, n2, ···, n10} is {11, 10, 10, 10, 15, 13, 14, 13, 11, 13}, for which we obtain σ ≈ 1.732.

5 Experimental Evaluation

We implement our proposed approaches in the Python programming language and evaluate the performance on a cluster (with 4 datanodes and 1 coordinator) where each node (coordinator or datanode) has an Intel Core i5 @ 2.53 GHz CPU and 32G of main memory. We first use a synthetic

dataset to verify the efficiency of our proposed method IGBP. Then we choose three open datasets, Shuttle, Census and Drug, which have been widely used in outlier detection, to further show the high efficiency and effectiveness of IGBP.

5.1 IGBP with Synthetic Data

We generate various synthetic data sets to analyze the performance of our method. Specifically, we compare our proposed method with [3, 16]. We generate several cluster center points, and the tuples in each cluster follow a Gaussian distribution. Finally, we add some generated noise to the dataset. The dimension of the data set is 3. The detailed parameter settings and the runtime are summarized in Table 2.

Table 2: The influence of data scale (runtime in ms)

Method     100000 tuples        500000 tuples        1000000 tuples
           k=10      k=20       k=10      k=20       k=10      k=20
In [16]    782       1243       5341      7988       13568     27755
In [3]     512       910        4122      6278       9742      22495
In IGBP    547       932        3907      5611       8472      16625

As shown in the table, as the data scale increases, our algorithm runs faster than previous work [3, 16]. At 100000 tuples IGBP is slightly slower than [3] (547 ms versus 512 ms for k=10), but it is the fastest method at 500000 and 1000000 tuples. Besides, the experimental results demonstrate that workload balance is more significant than network overhead for this problem. Figure 5 illustrates the impact of different data scales: as the data scale increases, our proposed method outperforms the baselines.

Figure 5: Runtime comparison (k=10; horizontal axis: data scale, 100000 to 1000000; vertical axis: runtime in ms; series: approach in [8], approach in [9], IGBP)

5.2 IGBP with Open Datasets

In this part, we use three popular real-world datasets to evaluate our proposed method: Shuttle, Census and Drug, which have been widely used in outlier detection. The details are shown in Table 3.

Table 3: Open datasets

Dataset    Number of Instances   Number of Attributes
Shuttle    58000                 9
Census     48842                 14
Drug       215063                6

Table 4: Results in three datasets, runtime (ms)

Methods    Shuttle   Census   Drug
In [16]    12339     34987    47788
In [3]     3374      7120     9759
IGBP       3115      5780     6880

As Table 4 shows, our proposed method has lower time consumption than [3, 16] on the three real-world datasets. More specifically, as the data scale increases, our proposed method has higher efficiency compared with [3, 16].

6 Conclusions

In this paper, we focus on the problem of density-based outlier detection in distributed environments for high-dimensional and large-scale data sets. We first introduce the basic concept of LOF. We then summarize in detail the approach in [3] to the problem of distributed LOF computing and discuss its advantages and disadvantages. Then, we propose an improved algorithm based on a greedy algorithm, namely IGBP. With the experimental results, we show the efficiency and effectiveness of the proposed approach compared with previous work. The results demonstrate that our algorithm outperforms the baselines. In the future, we will consider more efficient policies to balance time consumption and space consumption.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61472064, 61602096) and the Sichuan Provincial Science and Technology Department Project (2018GZ0087, 2016FZ0002).

References

[1] C. C. Aggarwal and P. S. Yu, "Outlier detection for high dimensional data," in ACM Sigmod Record, vol. 30, pp. 37–46, 2001.
[2] F. Angiulli and C. Pizzuti, "Outlier mining in large high-dimensional data sets," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 203–215, 2005.
[3] M. Bai, X. Wang, J. Xin, and G. Wang, "An efficient algorithm for distributed density-based outlier detection on big data," Neurocomputing, vol. 181, pp. 19–28, 2016.
[4] R. J. Beckman and R. D. Cook, "Outlier ...... s," Technometrics, vol. 25, no. 2, pp. 119–149, 1983.
[5] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "Lof: Identifying density-based local outliers," in ACM Sigmod Record, vol. 29, pp. 93–104, 2000.
[6] D. Dasgupta and N. S. Majumdar, "Anomaly detection in multidimensional data using negative selection algorithm," in Proceedings of the Congress on Evolutionary Computation (CEC'02), vol. 2, pp. 1039–1044, 2002.
[7] R. Fujimaki, T. Yairi, and K. Machida, "An approach to spacecraft anomaly detection problem using kernel feature space," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 401–410, 2005.
[8] J. Han, J. Pei, and M. Kamber, "Data mining: Concepts and techniques," A volume in the Morgan Kaufmann Series in Data Management Systems, 2012. (https://www.sciencedirect.com/book/9780123814791/data-mining-concepts-and-techniques)
[9] D. Hawkins, "Identification of outliers," Monographs on Statistics and Applied Probability, vol. 11, 1980.
[10] Q. He, Y. Ma, Q. Wang, F. Zhuang, and Z. Shi, "Parallel outlier detection using kd-tree based on mapreduce," in IEEE Third International Conference on Cloud Computing Technology and Science, pp. 75–80, 2011.
[11] E. Hung and D. W. Cheung, "Parallel mining of outliers in large database," Distributed and Parallel Databases, vol. 12, no. 1, pp. 5–26, 2002.
[12] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The VLDB Journal - The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.
[13] E. M. Knox and R. T. Ng, "Algorithms for mining distancebased outliers in large datasets," in Proceedings of the International Conference on Very Large Data Bases, pp. 392–403, 1998.
[14] A. Koufakou, J. Secretan, J. Reeder, K. Cardona, and M. Georgiopoulos, "Fast parallel outlier detection for categorical datasets using mapreduce," in IEEE International Joint Conference on Neural Networks, pp. 3298–3304, 2008.
[15] H. P. Kriegel, A. Zimek, et al., "Angle-based outlier detection in high-dimensional data," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 444–452, 2008.
[16] E. Lozano and E. Acuña, "Parallel algorithms for distance-based and density-based outliers," in Fifth IEEE International Conference on Data Mining, pp. 4, 2005.
[17] Y. Lu, Y. Zhu, M. Itomlenskis, S. Vyaghri, and H. Fu, "Mmoprg bot detection based on traffic analysis," International Journal of Electronics and Information Engineering, vol. 2, no. 1, pp. 18–26, 2015.
[18] M. Markou and S. Singh, "Novelty detection: A review - part 1: statistical approaches," Signal Processing, vol. 83, no. 12, pp. 2481–2497, 2003.
[19] M. Markou and S. Singh, "Novelty detection: A review - part 2: Neural network based approaches," Signal Processing, vol. 83, no. 12, pp. 2499–2521, 2003.
[20] F. P. Miller, A. F. Vandome, and J. McBrewster, Kd-Tree, 2009. (https://www.amazon.fr/kd-tree-Frederic-P-Miller/dp/6130094590)
[21] E. Schubert, A. Zimek, and H. P. Kriegel, "Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 190–237, 2014.

Biography

LinMei received B.Sc and M.Sc degrees in computer science and technology from University of Electronic Science and Technology of China in 2003 and 2006. He is currently a lecturer at Southwest Minzu University and is pursuing a Ph.D degree at University of Electronic Science and Technology of China. His research interests include network security and cloud computing.

Fengli Zhang received B.Sc, M.Sc and Ph.D degrees in computer science and technology from University of Electronic Science and Technology of China in 1983, 1986 and 2007. She is currently a professor at University of Electronic Science and Technology of China. Her research interests include database management, network security and big data processing.

International Journal of Network Security, Vol.22, No.5, PP.782-792, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).08) 782

Binary Executable Files Homology Detection with Genetic Algorithm

Jinyue Bian1 and Quan Qian1,2 (Corresponding author: Quan Qian)

School of Computer Engineering and Science, Shanghai University1
Shanghai 200444, China
Materials Genome Institute, Shanghai University, Shanghai, China2
(Email: [email protected])
(Received Mar. 16, 2019; Revised and Accepted Dec. 29, 2019; First Online May 9, 2020)

Abstract

Software homology detection is very meaningful for software copyright protection and the detection of malicious code variants. In this paper, we propose a genetic algorithm to judge binary code similarity. First, the binary executable files are converted into control flow graphs, and then a genetic algorithm is used to compute the similarity among the control flow graphs, which is regarded as the evaluation metric for software homology detection. The experimental results show that the method is not only effective, but also that its average time cost is 0.3 times that of the classical graph edit distance algorithm.

Keywords: Binary Executable Files; Control Flow Graph; Genetic Algorithm; Homology Detection

1 Introduction

With the rapid development of software, code plagiarism keeps emerging, which seriously threatens software intellectual property rights. In addition, in March 2018, a total of 7,235,983 viruses were found by the National Computer Virus Emergency Center, 36,931 new viruses were added, and 76,606,087 computers were infected [13]. Viruses and malware escape detection through code variants, but the core code does not change much. Therefore, detection of code similarity is very necessary. The existing methods can be divided into two categories: source code based and binary code based. Since the source code is sometimes unavailable, binary code based homology detection is more promising. That is, similarity analysis based on binary code is very important.

For binary code similarity detection, graph based methods play an important role, among which the Control Flow Graph (CFG) is one of the most commonly used. Therefore, binary code homology detection can be transformed into a graph similarity matching problem. That is, given two graphs, graph matching involves establishing the corresponding relations between their vertexes while considering the consistency of the edge sets at the same time. CFG similarity comparison methods include graph edit distance (GED), string matching, execution sequence comparison, matching program basic blocks, and so on. However, in general, the computational complexity of these methods is quite high. Therefore, in this paper, we propose a new graph matching algorithm for binary code similarity analysis. First, the binary file is converted into a CFG with relatively complete control flow information using a technique that combines dynamic and static analysis. Then a genetic algorithm (GA) is used to calculate the similarity between CFGs. The algorithm can accurately identify the subgraph isomorphism relationship and exactly identical CFGs, and it can shorten the running time greatly, so that software homology can be judged effectively.

The organization of the paper is as follows: Section 2 describes some related work. Section 3 presents the framework of the proposed method. The experimental results and detailed analysis are discussed in Section 4. Section 5 concludes the whole paper and outlines some directions for future work.

2 Related Work

So far, there has been a certain amount of research in areas related to homology identification. Here, we give some background work in two aspects, sequence-based analysis methods and graph-based methods, which are closely related to this paper.

2.1 Sequence-based Analysis Method

Aiming at the problem of source code plagiarism, Guo proposed an improved code plagiarism detection algorithm based on the abstract syntax tree, which can detect plagiarism effectively [22]. Koschke demonstrated how suffix trees can be used to obtain a scalable comparison, and presented a method to improve the accuracy through user feedback and automatic data mining [10]. Liu presented an improved abstract syntax tree, which can effectively detect code plagiarism carried out by modifying variable types and adding meaningless variables [14].

However, the above mentioned methods are all sequence-based ones, which can be bypassed by interference techniques such as instruction rearrangement, equivalent instruction sequence replacement, branch inversion, etc. The essence of such interference is that malicious code can produce homogeneous code with different syntax but the same semantics. Therefore, the other direction is dynamic analysis, which generally relies on dynamic execution logs to analyze the program behaviour. For instance, through analyzing the anomaly and similarity of process access behavior in data flow dependent networks, Mao et al. introduced an active learning method that minimizes risk estimation, which can clearly improve the detection of malicious code [23]. Although dynamic methods can extract the running features of the code, they rely on a virtual running environment and can also be challenged by anti-virtual machine attacks [21]. Yang et al. introduced a method of defect detection based on homology detection technology for open source software [28].

So, we can use more information, for instance the function call diagram, to further detect malicious code. In this respect, Chae proposed a software plagiarism detection system using an API labeled CFG (A-CFG) that abstracts the functionalities of a program [2]. Lim presented a method to compare CFGs by matching the basic blocks of a binary program, which can effectively identify similar CFGs [12]. Wu gave a parallel method to extract the function call graph from the source code, and introduced a new software structure information comparison algorithm to effectively check the homology of the software [24].

2.2 Graph Matching Based Method

Graph matching is a classical problem in computer science. At the same time, GA also has some applications in graph matching, code similarity detection and other security areas [20]. Moon proposed a malware detection system using a hybrid GA, in which a piece of malware is represented as a directed dependency graph and the malware detection problem is transformed into the subgraph isomorphism problem [8]. Jaeun gave a multi-objective GA with a local search heuristic, which compares the degrees of each vertex of two graphs that are mapped and counts the number of mismatched vertices [5]. Kim proposed a new cost function based on the program dependency graph and used the GA to measure the similarity of programs; the method was proved to be feasible [7]. Xiang presented an improved GA, which mainly studied the isomorphism of subgraphs, and designed a special crossover function and a new fitness function to measure the evolution process [25].

In this paper, we combine static and dynamic methods to generate the CFG, and then use graph matching to evaluate binary software homology. The main contributions of the paper are as follows:

• We design a special GA to evaluate the similarity of binary files, which can be used to accurately identify the isomorphic subgraph and exactly identical CFGs, a problem that is normally regarded as NP-complete. The experimental results show that when the number of CFG nodes is large, our method can still obtain the CFG similarity, and compared with other classical methods, the time overhead of our algorithm is clearly reduced.

• For GA based CFG matching, we give a complete framework including the mapping of a CFG to a chromosome, the operation design for crossover, mutation and selection, and the fitness evaluation.

3 The Proposed Method

The proposed method can be divided into two steps. The first step is CFG extraction from the binary executable files, combining static and dynamic recovery methods. In the second step, the CFG is encoded and read into the GA to compute the similarities among different graphs. The flowchart of the method is shown in Figure 1, and the implementation of each step is described in detail below.

3.1 Binary Files' CFG Extraction

Although the static method has high code coverage, it cannot obtain the complete control flow. Although the dynamic method can obtain the exact program execution information, its code coverage is low. So, the hybrid recovery method combines the dynamic and static methods, which not only ensures high code coverage but can also solve the indirect jump issue. Symbolic execution and reverse slicing techniques are then used to further traverse the binary file to obtain complete control flow information.

3.1.1 Symbolic Execution

Symbolic execution substitutes symbols for real values when running a program. The advantage is that we can traverse as many paths of a program as possible by using variable symbols. KLEE [9] and S2E [3] are two typical symbolic execution tools, based on source code and binary code respectively. In symbolic execution, we can convert the program operation process into a mathematical expression. Whenever a judgment or a jump statement is encountered, symbolic execution gathers the path constraints of the current execution path into the constraint set of that path. With a constraint solver, the path reachability can be obtained by solving the constraint set. By combining actual execution with symbolic execution, an approach that first emerged in 2005 in DART [18] and CUTE [19], many problems encountered in traditional static symbolic execution have been solved.
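The idea of gathering path constraints at each branch and asking a solver whether a path is reachable can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the tiny branch tree `PROGRAM` is hypothetical, and a brute-force search over a small integer domain stands in for a real constraint solver such as those used by KLEE or S2E.

```python
from itertools import product

# Hypothetical program, written as a branch tree:
#   if x > 5:
#       if x < 3:  reach "A"      (infeasible: x > 5 and x < 3)
#       elif y == x: reach "B"
#       else:        reach "D"
#   else:            reach "C"
# A node is either a leaf label or (description, predicate, then_branch, else_branch).
PROGRAM = (
    "x > 5", lambda env: env["x"] > 5,
    ("x < 3", lambda env: env["x"] < 3,
     "A",
     ("y == x", lambda env: env["y"] == env["x"], "B", "D")),
    "C",
)


def collect_paths(node, constraints=()):
    """Yield (leaf label, path constraints) for every syntactic path."""
    if isinstance(node, str):
        yield node, constraints
        return
    description, predicate, then_branch, else_branch = node
    yield from collect_paths(then_branch, constraints + ((description, predicate, True),))
    yield from collect_paths(else_branch, constraints + ((description, predicate, False),))


def solve(constraints, variables, domain):
    """Brute-force stand-in for a constraint solver: one satisfying assignment or None."""
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(predicate(env) == expected for _, predicate, expected in constraints):
            return env
    return None


def reachable_leaves(program, variables, domain):
    """Map every reachable leaf to one concrete input that reaches it."""
    result = {}
    for leaf, constraints in collect_paths(program):
        model = solve(constraints, variables, domain)
        if model is not None:
            result[leaf] = model
    return result


REACHABLE = reachable_leaves(PROGRAM, ["x", "y"], range(10))
```

Leaf "A" never appears in `REACHABLE` because its constraint set is unsatisfiable, which mirrors how a symbolic executor prunes infeasible paths.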

Figure 1: The flowchart of binary code homology detection

However, there are some challenges to symbolic execution, for instance path explosion and constraints that the solver is unable to solve. For path explosion, heuristic functions, reliable program analysis and software verification techniques are common strategies for reducing the complexity of path exploration. For constraint solving problems, there are two kinds of optimization methods. One is the elimination of uncorrelated constraints and the other is incremental solving.

3.1.2 Reverse Slicing

Program slicing is an important technique for program analysis. Negi et al. highlight the different test cases and give a comparative analysis of program slicing methods for the applications that are usually involved in software modification activities [15]. Given the slicing criterion <p, V>, the forward slice of program p contains all statements and control conditions affected by variables in V, while the backward slice of program p contains all statements and control conditions that have direct or indirect effects on variables in V.

3.1.3 CFG Generation and Optimization

Currently, the common tools for generating CFGs include FXE (forced execution engine) [26] and IDA Pro [17]. FXE can solve the problem of indirect branch jumps by dynamically executing code, but it is only suitable for binary executables in Windows PE format under the x86 architecture. IDA Pro can generate the CFG and the function call graph, but the repeated nodes in the graph are not merged or deleted; also, binary code lacks structural information, so IDA Pro's disassembly results are far from ideal [29].

In this paper, we use a new CFG tool, Angr [27], which is a binary analysis platform based on Python. It implements different symbolic execution strategies, such as veritesting [1]. The Angr system was originally designed for the DARPA Challenge. It can load files in different binary formats together with their dependent libraries, and it integrates many advanced binary analysis techniques to generate CFGs by combining dynamic and static methods. Moreover, Angr can optimize basic block overlap in the CFG: the overlapping parts are merged and subdivided to obtain more accurate control flow information. Meanwhile, since Angr analyses and removes the edges generated by loop structures and pseudo-returns, it simplifies the CFG and alleviates the explosion of the program state space, which improves the ability to reconstruct the CFG.

3.2 GA for CFGs Similarity Calculation

Given two directed graphs, we first number the nodes in each graph [4], for example as shown in Figure 2. We use the GA to select the best individual after several generations of evolution, involving operations such as crossover and mutation, until the evolution stop criteria are satisfied.

3.2.1 Chromosome Initialization

In each iteration, the GA adds one node of the matching graph to each chromosome, and a corresponding node of the matched graph is generated by a random function until all

Figure 2: Two examples of numbering nodes for directed graphs

the nodes of the matching graph are added to the chromosome. According to Figure 2, the initialization process is shown in Table 1. All the left nodes in a chromosome come from the relatively small matching graph (the left one of Figure 2) and the right nodes come from the matched graph (the right one of Figure 2). For the matching, we add some rules for initialization:

• When the in-degree of the left node is zero and the out-degree of the right node is zero, they do not match. Likewise, if the out-degree of the left node is zero and the in-degree of the right node is zero, they do not match. For example, in Figure 2, node 5 of the left graph does not match node 7 of the right graph, because the in-degree of node 5 is zero and the out-degree of node 7 is zero.

• The current right node number is different from the previous right node numbers in the chromosome, and the number does not exceed the total number of nodes in the matched graph. For example, in the first chromosome of Table 1, the node corresponding to the left node 2 must not be node 5, and its number does not exceed 7.

3.2.2 Chromosomes Selection

Both the crossover and mutation operations in the GA rely on the selection function. The selection operation in this paper is similar to roulette-wheel selection. First of all, a random function produces a number between 0 and 1, denoted as a. The fitness values of all chromosomes in the population are summed as S. Next, we calculate the cumulative fitness value for each chromosome, that is, the sum of the fitness values from the first chromosome to the current one k, denoted as $S_k$. Dividing $S_k$ by S gives the quotient $b_k$, which is compared with a, as shown in Equation (1). If $a \le b_k$, the current chromosome number k is returned.

$$b_k = \frac{S_k}{S} = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{L} f_i}, \tag{1}$$

where f denotes the fitness value, L = sizepop, and $k \le$ sizepop.

For example, in Table 1, according to Equation (1), the cumulative fitness value of the first chromosome is $b_1 = \frac{f_1}{f_1+f_2+f_3+f_4}$, and the cumulative fitness value of the second chromosome is $b_2 = \frac{f_1+f_2}{f_1+f_2+f_3+f_4}$. If $a \le b_2$, the selection returns the current chromosome number 2, and so on.

3.2.3 Chromosomes Crossover

Before a crossover, we generate a number between 0 and 1 through a random function, denoted as rand. If rand $\le$ cross, the crossover operation is performed; otherwise, nothing is done.

Two chromosomes are extracted from the population by the selection function, and the right nodes of the two chromosomes (denoted as Parent_one and Parent_two) are crossed. The beginning position of the middle part is marked as "begin" and the ending position of the middle part is marked as "end". begin and end are generated by random functions and must satisfy begin < end, and their difference must not be equal to the length of the whole chromosome. The middle part of Parent_one becomes the middle part of child_one after crossing. Then the nodes of Parent_two are filled into child_one in turn, with conflict detection. If there is a conflict, we move on to the next node of Parent_two. After the traversal, if child_one is not filled up, a random function is used to generate the missing node, followed again by conflict detection. According to the above crossover rules, we obtain two child chromosomes after the crossover.

For example, take the right nodes of the first and second chromosomes in Table 1 to do a crossover and assume that begin = 3 and end = 5. The specific crossover steps with Parent_one as the basis are shown in Figure 3. Thus, the chromosome produced by the crossover is 1 → 6, 2 → 5, 3 → 4, 4 → 2, 5 → 1, 6 → 3.

The two chromosome populations before and after the crossover are combined and then sorted according to the fitness value in ascending order. If the population size is set to sizepop, the top sizepop chromosomes are selected to form the population after crossover.
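The selection rule of Equation (1) can be sketched in a few lines. The function name `roulette_select` is illustrative, and the random number a is passed in as an argument (rather than drawn inside the function) purely so that the behaviour can be checked; this is an assumption of the sketch, not something stated in the text.

```python
def roulette_select(fitness, a):
    """Return the 1-based number k of the first chromosome with a <= b_k = S_k / S.

    `fitness` lists f_1..f_L; `a` is a number in [0, 1) from a random function.
    Assumes the total fitness S is positive.
    """
    total = sum(fitness)          # S
    cumulative = 0.0              # S_k
    for k, f in enumerate(fitness, start=1):
        cumulative += f
        if a <= cumulative / total:
            return k
    return len(fitness)           # guard against floating-point rounding


# Hypothetical fitness values f1..f4 for four chromosomes.
FITNESS = [1, 2, 3, 4]
PICKS = [roulette_select(FITNESS, a) for a in (0.05, 0.25, 0.95)]
```

With these hypothetical values, b_1 = 0.1, b_2 = 0.3, b_3 = 0.6 and b_4 = 1.0, so the three draws select chromosomes 1, 2 and 4. In practice a would come from something like `random.random()`.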

Table 1: An example of chromosome initialization

Iteration   First chromosome                Second chromosome               Third chromosome                Fourth chromosome
1           1→5                             1→6                             1→1                             1→2
2           1→5, 2→6                        1→6, 2→1                        1→1, 2→4                        1→2, 2→3
3           1→5, 2→6, 3→4                   1→6, 2→1, 3→4                   1→1, 2→4, 3→2                   1→2, 2→3, 3→6
4           1→5, 2→6, 3→4, 4→2              1→6, 2→1, 3→4, 4→2              1→1, 2→4, 3→2, 4→3              1→2, 2→3, 3→6, 4→4
5           1→5, 2→6, 3→4, 4→2, 5→1         1→6, 2→1, 3→4, 4→2, 5→5         1→1, 2→4, 3→2, 4→3, 5→6         1→2, 2→3, 3→6, 4→4, 5→1
6           1→5, 2→6, 3→4, 4→2, 5→1, 6→3    1→6, 2→1, 3→4, 4→2, 5→5, 6→3    1→1, 2→4, 3→2, 4→3, 5→6, 6→5    1→2, 2→3, 3→6, 4→4, 5→1, 6→5
Fitness     f1                              f2                              f3                              f4
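The initialization rules of Section 3.2.1 can be sketched as follows. The two small graphs are hypothetical (they are not the graphs of Figure 2), the function names are illustrative, and the restart-on-dead-end loop is an assumption of this sketch, since the text does not say what happens when no admissible right node is left.

```python
import random


def degrees(nodes, edges):
    """Return (in-degree, out-degree) dictionaries of a directed graph."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def compatible(left, right, left_deg, right_deg):
    """Degree rule: a source may not map to a sink, nor a sink to a source."""
    (lin, lout), (rin, rout) = left_deg, right_deg
    if lin[left] == 0 and rout[right] == 0:
        return False
    if lout[left] == 0 and rin[right] == 0:
        return False
    return True


def init_chromosome(left_nodes, left_edges, right_nodes, right_edges, rng, max_tries=1000):
    """Build one chromosome: the i-th entry is the right node matched to the i-th left node."""
    left_deg = degrees(left_nodes, left_edges)
    right_deg = degrees(right_nodes, right_edges)
    for _ in range(max_tries):
        used, chromosome = set(), []
        for left in left_nodes:                      # one left node per iteration
            candidates = [r for r in right_nodes
                          if r not in used and compatible(left, r, left_deg, right_deg)]
            if not candidates:
                break                                # dead end: restart (assumption)
            right = rng.choice(candidates)
            used.add(right)
            chromosome.append(right)
        else:
            return chromosome
    raise RuntimeError("no admissible chromosome found")


# Hypothetical graphs: a 3-node chain matched against a 4-node chain.
LEFT_NODES, LEFT_EDGES = [1, 2, 3], [(1, 2), (2, 3)]
RIGHT_NODES, RIGHT_EDGES = [1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]
CHROMOSOMES = [
    init_chromosome(LEFT_NODES, LEFT_EDGES, RIGHT_NODES, RIGHT_EDGES, random.Random(seed))
    for seed in range(20)
]
```

In this example, left node 1 (in-degree 0) is never matched to right node 4 (out-degree 0), and left node 3 (out-degree 0) is never matched to right node 1 (in-degree 0).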

Figure 3: An example of crossover step
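The crossover step of Section 3.2.3 can be sketched as below, using the first and second chromosomes of Table 1 with begin = 3 and end = 5. This is one reading of the textual description: the middle part of the first parent is kept, and the remaining positions are filled from left to right with the second parent's nodes, skipping nodes that would conflict. The random generation of missing nodes is omitted, because it is never needed when both parents have the same length and contain no repeated nodes. The function name `crossover` is illustrative.

```python
def crossover(parent_one, parent_two, begin, end):
    """Build one child from the right-node lists of two parents.

    `begin` and `end` are 1-based, inclusive positions of the middle part.
    Assumes both parents have the same length and no repeated nodes.
    """
    n = len(parent_one)
    child = [None] * n
    middle = parent_one[begin - 1:end]
    child[begin - 1:end] = middle               # keep the middle part of parent one
    used = set(middle)
    # Nodes of parent two, in order, that do not conflict with the middle part.
    donors = iter(g for g in parent_two if g not in used)
    for position in range(n):
        if child[position] is None:
            child[position] = next(donors)
    return child


# Right nodes of the first and second chromosomes in Table 1.
PARENT_ONE = [5, 6, 4, 2, 1, 3]
PARENT_TWO = [6, 1, 4, 2, 5, 3]
CHILD_ONE = crossover(PARENT_ONE, PARENT_TWO, begin=3, end=5)
CHILD_TWO = crossover(PARENT_TWO, PARENT_ONE, begin=3, end=5)
```

`CHILD_ONE` is `[6, 5, 4, 2, 1, 3]`, that is 1 → 6, 2 → 5, 3 → 4, 4 → 2, 5 → 1, 6 → 3, which agrees with the example in the text. Swapping the roles of the parents gives the second child.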

3.2.4 Chromosome Mutation

Each chromosome in the population is selected in turn on the basis of the crossover operation. We generate a number between 0 and 1 through a random function, denoted rand. If rand ≤ mutation, the mutation operation is performed; otherwise, no mutation takes place. When not all nodes of the second graph have been added to the chromosome, we use different random functions to generate a mutation position (position) and a mutation value (num) in the chromosome, and then perform conflict detection. If there is no conflict, the right node at position is replaced by num. Otherwise, a new mutation value is generated. If all the nodes of the second graph have already been added to the chromosome, we use random functions to generate two different positions in the chromosome (i and j) and exchange the right nodes of the two positions to obtain the mutated chromosome.

For example, consider the third chromosome in Table 1, where not all nodes of the second graph have been added to the chromosome at the sixth iteration. Suppose position = 3 and num = 4. This mutation value conflicts with another right node in the chromosome, so suppose we regenerate the mutation value as num = 7, for which there is no conflict. The chromosome mutation is then as shown in the left part of Figure 4, and after mutation the third chromosome is 1 → 1, 2 → 4, 3 → 7, 4 → 3, 5 → 6, 6 → 5. If instead the second graph has 6 nodes and all of them have been added to the chromosome, then, assuming i = 2 and j = 5, the chromosome mutation is as shown in the right part of Figure 4, and after mutation the third chromosome is 1 → 1, 2 → 6, 3 → 2, 4 → 3, 5 → 4, 6 → 5.

3.2.5 Fitness Calculation for Each Chromosome

In this paper, the fitness function $F$ is divided into two parts, $F_1$ and $F_2$, with $F = F_1 + F_2$. $F_1$ is composed of the number of mismatched nodes $f_1$ and the number of mismatched edges $f_2$, that is, $F_1 = f_1 + f_2$, where $f_1$ and $f_2$ are both computed for the smaller graph. For example, in Table 1, for the 4th chromosome, after 3 iterations the results are 1 → 2, 2 → 3, 3 → 6, and at this point $f_1 = 3$ and $f_2 = 2$.

However, relying on $F_1$ alone is not sufficient, because $F_1$ is zero both when the two graphs are exactly the same and when the two graphs are isomorphic subgraphs. Therefore, we add the $F_2$ part, as shown in Equation (2): if the number of nodes in the two CFGs is equal, then $F_2$ is zero; otherwise it is not zero.

$$F_2 = 1 - \frac{node\_one}{node\_two} \qquad (2)$$

Here $node\_one$ is the number of nodes of the smaller graph and $node\_two$ that of the larger graph.

Therefore, the fitness function of the proposed method is given by Equation (3).

$$F = F_1 + F_2 = f_1 + f_2 + 1 - \frac{node\_one}{node\_two} \qquad (3)$$

With this fitness function, if $F_1 = 0$ and $F_2 = 0$, the nodes and edges of the two CFGs are exactly the same. If $F_1 = 0$ and $F_2 \neq 0$, the two CFGs are isomorphic subgraphs.

3.2.6 Stop Criteria of Population Evolution

When any one of the following three rules is satisfied, the GA stops iterating and returns the chromosome with the smallest fitness value.

• The number of evolutions reaches the predefined number;
• During evolution, the $F_1$ value of a chromosome equals 0;
• After several iterations, comparing the population before and after each evolution, there is almost no change in the fitness value.
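As a rough illustration of Equations (2) and (3) and of the stop rules in Section 3.2.6, the following Python sketch computes the two fitness parts for a partial node mapping and checks the three stop conditions. It is only a sketch under stated assumptions: the text does not spell out how mismatched nodes and edges are counted, so here f1 is taken to be the number of smaller-graph nodes that have no partner yet, and f2 the number of smaller-graph edges between mapped nodes whose image is not an edge of the larger graph. The function names and the `patience` and `tol` parameters are hypothetical.

```python
def fitness(mapping, small_nodes, small_edges, large_nodes, large_edges):
    """Return (F1, F2) for a partial mapping: small-graph node -> large-graph node.

    Assumed reading of the text (not confirmed by it):
      f1 = smaller-graph nodes that are not mapped yet
      f2 = smaller-graph edges whose endpoints are both mapped but whose
           image is not an edge of the larger graph
    """
    large = set(large_edges)
    f1 = sum(1 for v in small_nodes if v not in mapping)
    f2 = sum(
        1
        for (u, v) in small_edges
        if u in mapping and v in mapping and (mapping[u], mapping[v]) not in large
    )
    part_one = f1 + f2                                  # F1 = f1 + f2
    part_two = 1 - len(small_nodes) / len(large_nodes)  # Equation (2)
    return part_one, part_two


def should_stop(generation, maxgen, best_f1, best_history, patience=50, tol=1e-9):
    """Check the three stop rules; patience and tol are hypothetical knobs."""
    if generation >= maxgen:          # rule 1: predefined number of evolutions
        return True
    if best_f1 == 0:                  # rule 2: some chromosome has F1 == 0
        return True
    recent = best_history[-patience:] # rule 3: fitness has stopped changing
    return len(recent) == patience and max(recent) - min(recent) <= tol


# Usage: a 2-node graph embedded in a 3-node graph.
small_nodes, small_edges = [1, 2], [(1, 2)]
large_nodes, large_edges = [1, 2, 3], [(1, 2), (2, 3)]
f_one, f_two = fitness({1: 1, 2: 2}, small_nodes, small_edges, large_nodes, large_edges)
# f_one == 0 and f_two > 0: the isomorphic-subgraph case of Equation (3)
```

Returning the two parts separately, rather than their sum, keeps the distinction between "identical" (both zero) and "isomorphic subgraph" (only the first part zero) visible to the caller.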

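The mutation rule of Section 3.2.4 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it assumes a chromosome is stored as a list in which index p holds the right node matched to left node p + 1, and that the nodes of the second graph are labelled 1 to n2. The names `mutate`, `n2`, `rate` and `fitness` are hypothetical. The optional `fitness` callback models the rule that a mutation is cancelled when it does not reduce the fitness value.

```python
import random


def mutate(chrom, n2, rate, rng, fitness=None):
    """Return a mutated copy of chrom (list of right nodes), or an unchanged copy."""
    if rng.random() > rate:              # mutate only when rand <= mutation rate
        return list(chrom)
    new = list(chrom)
    if len(set(new)) < n2:
        # Not all nodes of the second graph are used yet:
        # pick a position and a value, regenerate the value on conflict.
        position = rng.randrange(len(new))
        num = rng.randint(1, n2)
        while num in new:                # conflict detection
            num = rng.randint(1, n2)
        new[position] = num
    else:
        # All nodes are already used: swap the right nodes of two positions.
        i, j = rng.sample(range(len(new)), 2)
        new[i], new[j] = new[j], new[i]
    # Cancel the mutation if the fitness value is not reduced.
    if fitness is not None and fitness(new) >= fitness(chrom):
        return list(chrom)
    return new


# Usage: the chromosome 1->1, 2->4, 3->2, 4->3, 5->6, 6->5 with a 6-node second graph.
rng = random.Random(0)
swapped = mutate([1, 4, 2, 3, 6, 5], n2=6, rate=1.0, rng=rng)
```

Passing the random generator in explicitly keeps the sketch reproducible; with a rate of 0.8, as in the experiments, most chromosomes would be mutated in each generation.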
Figure 4: An example of mutation process

According to the rule of mutation (Section 3.2.4), we perform the operation of chromosome variation and compare the fitness value of the chromosome before and after mutation. If the fitness value of the chromosome is not reduced after mutation, the mutation operation is cancelled.

4 Experiments

4.1 Experimental Data and Environment

Kernel32.dll is a very important dynamic link library file in Windows and is a kernel-level file. It controls system memory management, data input and output operations, and interrupt processing. When Windows starts, kernel32.dll resides in a specific write-protected area of memory, preventing other programs from occupying that memory area [16]. We select loadimagefile, loadappinitdlls and basepappinitdlls in kernel32.dll, together with write.exe, as experimental data. In detail, we extract the experimental data files from Windows 7 and Windows 10, 8 files altogether. The files from the Windows 10 system are write.exe, loadimagefile, loadappinitdlls and basepappinitdlls. The files from the Windows 7 system are named write 7.exe, loadimagefile 7, loadappinitdlls 7 and basepappinitdlls 7. The files are numbered 1, ..., 8 for subsequent description. All the experiments are run under the Windows 10 system, and the related software includes IDA, Angr, and VS2017. In addition, for the GA parameters, we set the population size (sizepop) to 20, the number of evolutions (maxgen) to 2000, the crossover rate (cross) to 0.3, and the mutation rate (mutation) to 0.8.

4.2 Comparative Experimental Methods

Among all graph matching algorithms, GED is the classical one, with good fault tolerance and suitability for various types of graphs. Although there are other graph similarity computation methods, notably the recently popular graph neural network (GNN), these need not only massive data to train the network but also rich computational resources. Therefore, we select a series of GED algorithms as the comparative methods, which allows a more targeted comparison with the experimental results of the proposed method.

4.2.1 GED

The GA proposed in this paper is mainly used to calculate the similarity between two directed graphs, which is consistent with the classical GED method. Therefore, our comparative experiment selects the edit distance method and two of its variants. GED is the sum of the minimum editing operation costs required to edit a source graph into a target graph. The cost of each step is obtained by defining the corresponding cost function [30]. In this paper, the cost of replacing, deleting and inserting the nodes and edges of the directed graph in the edit distance method is set to 1 in every case.

4.2.2 Edit Distance Based on Binary Linear Programming Formula

In 2015, Julien Lerouge proposed a new binary linear programming formulation for computing the exact GED between two graphs (GED linear) [11]. Its advantage is the universality of the formula: the similarity can be calculated for both directed and undirected graphs. However, when the number of nodes is too large, the method does not work, and its time cost is larger than that of the other methods.

4.2.3 Edit Distance Based on Hausdorff Matching

In 2015, Andreas Fischer proposed a quadratic time approximation of GED based on Hausdorff matching (GED distance) [6]. The advantage of the method is that the time performance is greatly improved compared with other methods. The disadvantage is a small loss of precision.

4.3 Experimental Results and Analysis

4.3.1 Comparison of CFG Generated by Angr and IDA

In order to verify the integrity and simplification of the CFG generated by Angr, we compare and analyze the CFGs generated by Angr and IDA. As an example, we take the CFG of the CADET 0001 file given by the DARPA Challenge: Figure 5 is the CFG generated using Angr, and Figure 6 the CFG generated by IDA. The number of basic blocks in the CFG generated by Angr is larger than that of IDA, and the basic blocks of the CFG generated by Angr also contain the basic blocks of the CFG generated by IDA. After merging overlapped nodes and eliminating cyclic and unreachable edges and nodes, Angr can obtain a more accurate CFG, which provides a more concise CFG for subsequent analysis.

Figure 5: CFG based blocks generated by Angr

Figure 6: CFG based block generated by IDA

4.3.2 Comparison of GA with Other Experimental Methods

First, we obtain the CFGs of the control group files; the numbers of nodes and edges of the CFGs are shown in Table 2. Then we take one of them and compare it with the CFGs of the control group. The experimental results are presented as similarity trend diagrams, in which the solid line refers to the left Y axis and the dashed line refers to the right Y axis. In this paper, the smaller the result, the greater the similarity between the two graphs.
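To make the unit-cost setting of Section 4.2.1 concrete, the following Python sketch computes an exact edit distance between two very small directed graphs by brute force. It is not the GED implementation used in the experiments. It assumes unlabelled nodes, so that matching one node to another is free and only node insertions or deletions and edge insertions or deletions cost 1 each; this is an assumption, since the text does not say what counts as a node replacement. The function name `unit_cost_ged` is hypothetical, and the search is factorial in the number of nodes, so it is only usable for a handful of nodes.

```python
from itertools import permutations


def unit_cost_ged(nodes1, edges1, nodes2, edges2):
    """Exact unit-cost edit distance between two small directed graphs.

    Assumes unlabelled nodes: mapping a node onto another costs 0, while each
    inserted or deleted node or edge costs 1.
    """
    # Make graph 1 the smaller one; the unit-cost distance is symmetric.
    if len(nodes1) > len(nodes2):
        nodes1, edges1, nodes2, edges2 = nodes2, edges2, nodes1, edges1
    e1, e2 = set(edges1), set(edges2)
    node_cost = len(nodes2) - len(nodes1)        # nodes that must be inserted
    best = None
    # Try every injective mapping of the smaller graph into the larger one.
    for image in permutations(nodes2, len(nodes1)):
        m = dict(zip(nodes1, image))
        mapped = {(m[u], m[v]) for (u, v) in e1}
        # Edges to delete (no counterpart) plus edges to insert (not covered).
        cost = node_cost + len(mapped - e2) + len(e2 - mapped)
        if best is None or cost < best:
            best = cost
    return best


# Usage: one extra node and one extra edge separate the two graphs.
distance = unit_cost_ged([0, 1], [(0, 1)], [0, 1, 2], [(0, 1), (1, 2)])
# distance == 2
```

The factorial search space is the reason approximate variants such as the Hausdorff-based method, or heuristic searches such as the GA, are of interest for CFGs with tens of nodes.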

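The results in this section are reported as similarity orders such as 1 = 2 > 4 > 3, where a smaller distance means a greater similarity. As a small illustration, the Python sketch below turns a table of distance values into such an order string. The function name `similarity_order` and the tolerance `tol` are hypothetical, and the distances in the usage example are made-up numbers, not measurements from the experiments.

```python
def similarity_order(distances, tol=1e-9):
    """Format file numbers in descending order of similarity.

    distances maps a file number to its distance from the reference file;
    smaller distance means greater similarity, ties are joined with '='.
    """
    items = sorted(distances.items(), key=lambda kv: kv[1])
    if not items:
        return ""
    parts = [str(items[0][0])]
    for (_, prev_d), (cur_id, cur_d) in zip(items, items[1:]):
        parts.append(" = " if abs(cur_d - prev_d) <= tol else " > ")
        parts.append(str(cur_id))
    return "".join(parts)


# Usage with made-up distances for four files.
order = similarity_order({1: 0.0, 2: 0.0, 3: 9.0, 4: 4.0})
# order == "1 = 2 > 4 > 3"
```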
Table 2: Nodes and edges of experimental files CFG

filenum(#)    1   2   3   4   5   6   7   8
Node number   8   8   17  10  24  25  48  45
Edge number   12  12  29  16  36  39  68  64

(2.1) Similarity between loadimagefile and contrast files.

The CFG of loadimagefile consists of 8 nodes and 12 edges, and the similarity trends are shown in Figure 7. The results of GA and the three comparative experimental methods show that the descending order of similarity with loadimagefile is 1 = 2 > 4 > 3 > 5 > 6 > 8 > 7. However, for GED linear and GED distance, the edit distance between loadimagefile and itself is not zero, while GED and GA find that there is no difference between loadimagefile and itself, and that the two CFGs are exactly the same. It further shows that loadimagefile of kernel32.dll in Windows 10 is not modified and is the same as in Windows 7.

Figure 7: Similarity between loadimagefile and contrast files

The time performance of the four methods is shown in Figure 8. Figure 8 shows that the runtime of GA is relatively stable, whereas the other three methods need more time as the number of nodes increases. When calculating the similarity between loadimagefile and file #8, the GA method needs the least running time, and both the GED and GED linear methods take a long time. On average, the running time of GED linear is 9 times that of GA, that of GED is 3 times, and that of GED distance is 0.6 times. Compared with GED distance, however, GA is more accurate and reasonable for identifying identical CFGs.

Figure 8: Time performances of four different methods for computing loadimagefile

(2.2) Analysis of similarity between loadappinitdlls and contrast files.

The CFG of loadappinitdlls contains 17 nodes and 29 edges. As mentioned in Section 4.2, when the number of nodes is large, GED linear cannot calculate the similarity (distance value) of two graphs. Such cases are marked NA in Table 3. Therefore, we mainly consider the other 3 methods, and the similarity results are shown in Figure 9.

Table 3: Distance calculation by the GED linear method

filenum(#)   filename             distance value
1            loadimagefile        46
2            Loadimagefile 7      46
3            loadappinitdlls      NA
4            Loadappinitdlls 7    50
5            basepappinitdlls     NA
6            Basepappinitdlls 7   NA
7            Write.exe            NA
8            Write 7.exe          NA

Figure 9: Similarity comparison between loadappinitdlls and contrast files

Figure 9 shows that, for loadappinitdlls, the similarity order calculated by GED distance is 4 > 1 = 2 > 3 > 5 > 6 > 8 > 7, and the distance between loadappinitdlls and itself is not 0. When using GED, the similarity order is 3 > 4 > 5 > 6 > 1 = 2 > 8 > 7, and loadappinitdlls is exactly the same as itself. For the GA method, the result is consistent with the GED method. Thus the results obtained by GED distance and GED are inconsistent, while the result of GA is consistent with that of GED. When two programs are exactly the same, as for loadappinitdlls and file #3, the distance value should be 0. Therefore, the results obtained by GA and GED are more reasonable. Further, the experiment shows that loadappinitdlls of kernel32.dll in Windows 10 is slightly modified from Windows 7.

When calculating the similarity between loadappinitdlls and the contrast files, the time performance of the three methods is shown in Figure 10. It can be seen that the running time of GA is less than that of both the GED and GED linear methods, and when the number of nodes is too large, GED linear cannot produce a result in a reasonable time. When calculating the similarity between loadappinitdlls and file #3, the running time of GA is significantly longer than that of GED distance. The main reason is that file #3 is the loadappinitdlls file itself, so GA evolves until it reaches a perfect match; the evolution stops when a chromosome's fitness value is 0. For GED distance, by contrast, the value is not 0. On average, the running time of GED is 3 times that of GA, and that of GED distance is 0.6 times.

Figure 10: Time performance of 4 different methods for computing loadappinitdlls

(2.3) Similarity comparison between loadappinitdlls 7 and contrast files.

The CFG of loadappinitdlls 7 contains 10 nodes and 16 edges. The similarity diagram is shown in Figure 11. According to Figure 11, the results calculated by the two variations of edit distance show that the order of similarity with loadappinitdlls 7 is 1 = 2 > 4 > 3 > 5 > 6 > 8 > 7, and that loadappinitdlls 7 is not exactly the same as itself. For the GED and GA methods, the order of similarity with loadappinitdlls 7 is 4 > 1 = 2 > 3 > 6 > 5 > 8 > 7. So, for GA and GED, there is no difference between loadappinitdlls 7 and itself, and the two CFGs are identical. loadappinitdlls 7 should be most similar to itself, so the most similar file number should be 4. The results of the four methods show that the file with the least similarity to loadappinitdlls 7 is file #7. Therefore, the method proposed in this paper can be considered reasonable to some extent.

Figure 11: Similarity comparison between loadappinitdlls 7 and contrast files

For loadappinitdlls 7, the time performance of the four methods is shown in Figure 12. It can be seen that the time cost of the GED and GED linear methods increases gradually with the number of nodes, while the trend of GA is almost the same as that of GED distance. On average, the running time of GED linear is 7 times that of GA, that of GED is 3 times, and that of GED distance is 0.5 times.

Figure 12: Time performance of 4 different methods for computing loadappinitdlls 7

5 Conclusions and Future work

In this paper, we proposed a genetic algorithm method to calculate the similarity between two binary files. First, the binary file is converted into a CFG, and then the similarity between CFGs is calculated by GA. For similarity calculation, the proposed method is consistent with the GED method and can be used to accurately identify the isomorphic subgraph relationship and exactly identical CFGs, which is the basis of software homology detection. GED is a very classical graph similarity calculation algorithm. Compared with other methods, GED has the advantages of high accuracy and simple operation: the distance can be calculated simply by inputting two graphs. The similarity trend of GA in experiment (2.1) is almost the same as that of the three comparison algorithms, and the results of GA in experiments (2.2) and (2.3) agree only with those of the GED algorithm. Since the difference between the target file and itself should be equal to 0, we can infer that the results of GA and GED are more reasonable. Moreover, the proposed method is highly efficient, especially for graphs with a large number of nodes. Also, with the fitness function designed in this paper, GA can effectively identify whether two graphs are isomorphic subgraphs or exactly the same. Regarding time performance, on average the running time of GA is 0.3 times that of GED, 0.1 times that of GED linear and 1.7 times that of GED distance.

Regarding future work, several directions deserve attention. First of all, the CFG is the basis of the analysis, and how to enrich the CFG is a direction worth further study. Next, GA-based software homology detection should be validated in more real-world and wider application areas. Meanwhile, some hyper-parameters of GA should be optimized automatically and adaptively according to different applications.

Acknowledgment

This work is partially sponsored by National Key Research and Development Program of China (2018YFB0704400, 2016YFB0700504, 2017YFB0701601), Research and Development Program in Key Areas of Guangdong Province (2018B010113001). The authors gratefully appreciate the anonymous reviewers for their valuable comments.

References

[1] T. Avgerinos, A. Rebert, K. C. Sang, and D. Brumley, “Enhancing symbolic execution with veritesting,” in International Conference on Software Engineering, pp. 1083–1094, 2014.
[2] D. K. Chae, J. Ha, S. W. Kim, B. J. Kang, and E. G. Im, “Software plagiarism detection: A graph-based approach,” in ACM International Conference on Conference on Information Knowledge Management, pp. 1577–1580, 2013.
[3] V. Chipounov, V. Georgescu, C. Zamfir, and G. Candea, “Selective symbolic execution,” in The Workshop on Hot Topics in System Dependability, pp. 1286–1299, 2009.
[4] H. G. Choi, J. H. Kim, and B. R. Moon, “A hybrid incremental genetic algorithm for subgraph isomorphism problem,” in Conference on Genetic and Evolutionary Computation, pp. 445–452, 2014.
[5] J. Choi, Y. Yoon, and B. R. Moon, “An efficient genetic algorithm for subgraph isomorphism,” in Conference on Genetic and Evolutionary Computation, pp. 361–368, 2012.
[6] A. Fischer, C. Y. Suen, V. Frinken, K. Riesen, and H. Bunke, “Approximation of graph edit distance based on hausdorff matching,” Pattern Recognition, vol. 48, no. 2, pp. 331–343, 2015.
[7] J. Kim, H. G. Choi, H. Yun, and B. R. Moon, “Measuring source code similarity by finding similar subgraph with an incremental genetic algorithm,” in Genetic and Evolutionary Computation Conference, pp. 925–932, 2016.
[8] K. Kim and B. R. Moon, “Malware detection based on dependency graph using hybrid genetic algorithm,” in Conference on Genetic and Evolutionary Computation, pp. 211–1218, 2010.
[9] V. Kononov and V. A. Rusakov, “On the problems of developing klee based symbolic interpreter of binary files,” Procedia Computer Science, vol. 145, pp. 275–281, 2018.
[10] R. Koschke, “Large-scale inter-system clone detection using suffix trees,” in European Conference on Software Maintenance and Reengineering, pp. 309–318, 2012.
[11] J. Lerouge, Z. Abu-Aisheh, R. Raveaux, P. Héroux, and S. Adam, “Graph edit distance: A new binary linear programming formulation,” Computer Science, vol. 72, pp. 254–265, 2015.
[12] H. I. Lim, “Comparing control flow graphs of binary programs through match propagation,” in IEEE Computer Software and Applications Conference, pp. 598–599, 2014.
[13] Mudpo, “Analysis of computer virus epidemic situation in March 2018,” Information Network Security, no. 5, 2018.
[14] L. Nan, H. Lifang, and X. Kunfeng, “An improved software source code matching algorithm based on abstract syntax tree,” Information Network Security, no. 1, pp. 38–42, 2014.
[15] G. Negi, E. Elias, R. Kohli, and V. Bibhu, “Reliability analysis of test cases for program slicing,” in The 1st International Conference on Innovation and Challenges in Cyber Security (ICICCS’16), pp. 36–40, 2016.
[16] H. Qin, Study on Software Homology Detection Technology Based on AST Structure Optimization and CFG Comparison (in Chinese), Master Thesis, Beijing University of Posts and Telecommunications, 2011.
[17] Q. W. Qin, “Reverse analysis of software based on ida-Pro,” Computer Engineering, vol. 34, no. 22, pp. 86–88, 2008.
[18] K. Sen, P. Godefroid, N. Klarlund, “DART: Directed automated random testing,” ACM Sigplan Notices, vol. 40, no. 6, pp. 213–223, 2005.
[19] K. Sen, D. Marinov, and G. Agha, “Cute: A concolic unit testing engine for C,” ACM Sigsoft Software Engineering Notes, vol. 30, no. 5, pp. 263–272, 2005.
[20] M. H. R. A. Shaikhly, H. M. El-Bakry, and A. A. Saleh, “Cloud security using markov chain and genetic algorithm,” International Journal of Electronics and Information Engineering, vol. 8, no. 2, pp. 96–106, 2018.
[21] H. Shi, J. Mirkovic, and A. Alwabel, “Handling anti-virtual machine techniques in malicious software,” ACM Transaction on Privacy and Security, vol. 21, no. 1, 2018.
[22] G. Tao, G. Dong, Q. Hu, and B. Cui, “Improved plagiarism detection algorithm based on abstract syntax tree,” in International Conference on Emerging Intelligent Data Web Technologies, pp. 714–719, 2013.
[23] M. Weixuan, C. Zhongmin, and T. Li, “A malicious code detection method based on active learning,” Software Journal, vol. 28, no. 2, pp. 384–397, 2017.
[24] P. Wu, J. Wang, and B. Tian, “Software homology detection with software motifs based on function-call graph,” IEEE Access, no. 99, pp. 1–1, 2018.
[25] Y. Xiang, J. Han, H. Xu, and X. Guo, “An improved heuristic method for subgraph isomorphism problem,” IOP Conference Series: Materials Science and Engineering, vol. 231, no. 1, pp. 012050, 2017.
[26] L. Xu, F. Sun, and Z. Su, “Constructing precise control flow graphs from binaries,” University of California, 2012. (https://pdfs.semanticscholar.org/8a80/f0d173ec7420478e4b96a8264e21e0dafac0.pdf)
[27] S. Yan, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, and C. Kruegel, “Sok: (State of) the art of war: Offensive techniques in binary analysis,” in Security and Privacy, pp. 138–157, 2016.
[28] J. Yang, X. Song, Y. Xiong, and Y. Meng, “An open source software defect detection technique based on homology detection and pre-identification vulnerabilitys,” Advances in Intelligent Systems and Computing, vol. 773, pp. 932–940, 2019.
[29] Y. Zhibin, J. Xin, and S. Dawei, “A binary-oriented mixed recovery method for control flow graphs,” Computer Application Research, no. 7, 2018.
[30] X. Zhoubo, Z. Li, Ninglihua, and A. Tianlong, “Graphic editing distance summary,” Computer Science, vol. 45, no. 4, pp. 11–18, 2018.

Jinyue Bian is a master degree student in the school of computer science, Shanghai University. His research interests include big data analysis, machine learning, and computer and network security, especially host behaviour analysis.

Quan Qian is a full Professor in Shanghai University, China. His main research interests concern computer networks and network security, especially cloud computing, big data analysis and wide scale distributed network environments. He received his computer science Ph.D. degree from University of Science and Technology of China (USTC) in 2003 and conducted postdoc research in USTC from 2003 to 2005. After that, he joined Shanghai University, where he is now the dean of the department of machine intelligence and technology and the lab director of materials informatics and data science.

International Journal of Network Security, Vol.22, No.5, PP.793-800, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).09)

A New Diffusion and Substitution-based Cryptosystem for Securing Medical Image Applications

L. Mancy and S. Maria Celestin Vigila (Corresponding author: S. Maria Celestin Vigila)

Department of Information Technology, Noorul Islam University Kumaracoil, Tamilnadu, India (Email: [email protected]) (Received Mar. 21, 2017; Revised and Accepted June 26, 2018; First Online July 11, 2019)

Abstract random advent. The second is the diffusion assets, which requires that alike keys should yield entirely contradictory In recent periods, individual security is becoming increas- cipher texts for the similar plaintext. ingly endangered. Several methods of safeguarding person Digital imageries like hypnotic quality images, X-rays are private, economic or health data are developed by en- images, etc. are generally used in remedial solicitations. tities, fabrications, and managements. Thus the patient It compact with patient proceedings that are remote and data are generally produced in the place of the health should only obtainable to official folks. Though certain images for observing. Thus, it is effortlessly available to patients are unworried about rupture of privacy could each one. The patient data obviously shown may be in- root risky awkwardness and disgrace. Therefore, there terrupted by other parties during automated broadcast. is a necessity to defend and sustain privacy of patient For analysis purposes, these health informatics desires to information. In this dispatch, an endangered image en- be voluntarily nearby to the physicians. Unique feature cryption for DICOM images, based on the substitution of this scheme is to make a revision on the Digital Imag- and diffusion transformations is suggested. A secret key ing and Communications in medicine (DICOM) standard, of 128-bits size is generated by an image a histogram. Ini- which specifies the form that all numerical health images tially, the visual feature of DICOM image is decomposed will be well-suited. Several private data, such as patient’s by the mixing process. The subsequent image is divided name, date of birth, gender, and patient identity with de- into key reliant blocks and further, these blocks are passed tails about where the image was taken. So it is essential through diffusion and substitution processes. 
Total five from the patient’s idea of sight is to keep the above evi- rounds are used in the encryption method. Finally the dence in private. Therefore the security of medical images generated secret key is embedded within the encrypted can be achieved through confidentiality, availability, reli- image in the process of steganography. At the receiver ability and authentication. side the secret key was recovered from the embedded im- Keywords: Diffusion; Encryption; Histogram; Steganog- age and decryption operation was performed in inverse raphy; Substitution format. Here the Steganography is the art of secreting data by means that avoid the recognition of secret com- munications. It contains a huge amount of approaches to 1 Introduction coat a communication from being grasped. The main aim Cryptography has developed a communal, to afford an ex- of steganography is to distort the presence of any secreted traordinary defense against various attacks. It diminishes statement. with the progression of systems for altering facts among The above introductory section explains the medical reasonable and worthless practices. In endangered dis- image security and provides some security measures to patches the cryptographic methods are focused by one or overcome the problem. The second section presents a re- more keys. When the keys are same, are convoked as view of literature and relevant research associated with private key cryptography. Consecutively, when keys are the problem addressed in this study. Then the third sec- different, then the cryptographic methods are known as tion explains the methodology and the steps of the pro- public key cryptography. There are two vital assets which posed cryptographic algorithm. Finally the fourth section all encryption schemes must fulfill. The first is the confu- explains the analysis of the data and the presentation of sion assets which involves, that cipher texts should have the results. 
Then the final section explains the conclusion International Journal of Network Security, Vol.22, No.5, PP.793-800, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).09) 794 of medical image security applications. the Elliptic curve cryptosystem for a text message was implemented. Chong Fu et al. (2013) offer a chaos medi- cal image encryption pattern, through a bit-level shuffling 2 Related Works process. Viswanathan et al. (2014) advocated by giving admission to the results of medical images with secrecy, In the appraisal of works several researches have raised convenience and veracity. Abhilasha Sharma et al. (2015) the safety of medicinal images in their own idea. Certain direct the watermarking method based on DWT to embed appropriate mechanism is highlighted in this section. multiple watermark into the cover image. Jani Anbarasi Zhou et al. introduced a technique certainty and reli- et al. (2015) anticipated a multi top-secret image sharing ability of digital mammography to encounter the neces- scheme that shares the multiple disruption polynomial. sities of authenticity and integrity of images [36]. Zhang Mancy and Maria Celestin Vigila (2015) reviewed on et al. (2007) an image scrambling technique was estab- image encryption technique and their functionalities are lished on queue transformation. Zhang et al. (2007) im- analyzed. Maria Celestin Vigila and Muneeswaran (2015) plemented a secured digital communication protocols un- proposed the implementation of reversible information der various conditions. Zhicheng et al. (2008) suggested hiding in spatial domain images rooted in neighbor mean a novel lossless data embedding technique. Michal Voss image interruption without impairing the image emi- berg et al. (2008) acclimate the procedure to the globes nence. ZeinabFawaz et al. (2016) a novel image encryp- grid safekeeping administration. Gouenou Coatrieux et tion scheme based on two rounds of substitution–diffusion al. 
(2009) familiarized to make the image more service- is proposed. Ritu Agrawal et al. (2016) uses a loss- able, by watermarking it with a precis data. Yong Feng et less medical image watermarking method using modula- al. (2009) presented an invertible map, called Line map, tion variation was implemented, which offers high health- for image crypto system. iness and low entropy distance for safeguard. Akram Luiz Octavio Massato Kobayashi et al. (2009) a Belazi et al. (2016) a novel image encryption approach method using cryptography to improve conviction of med- based on permutation substitution network and chaotic ical images. Weihai Li et al. (2009) Peizhenwang et systems are proposed. BalaKrishnan Ramalingam et al. al. (2010) a better image encryption technique, which (2017) present permutation and diffusion based hybrid is based on hyper chaotic categorization is anticipated. image crypto system in transform domain using combined Vinod Patidar et al. (2010) a diffusion-substitution sys- chaotic maps and Haar Integer Wavelet Transform. Wei- tem, based on chaotic map, for the image encryption jia Cao et al. (2017) presents a medical image encryption is recommended. Sanfu Wangl et al. (2010) offered a algorithm using edge maps derived from a source image. image scrambling technique. Yi Wan et al. (2010) In Shahryar Toughi et al. (2017) utilize the Elliptic curve this paper, a histogram based vigorous estimator for the generator to generate a sequence of arbitrary numbers noise mixing prospect was projected. Guiliang Zhul et al. based on curves. (2010) recommended the three levels of multilayer scram- ble. Stallings (2010) has documented about security is- sues in his book. Sathish kumar et al.(2011) suggested 3 The Proposed Methodology anew algorithm for the image cryptographic system. Thomas Neuberger et al. 
(2011) safeguards the medi- The medical image safety has become a significant prob- cal accounts from prohibited access in which the patient lem during the storage and broadcast of data. So to pre- as material container to resolve, who are the certified serve the conveyed proof beside undesirable description, persons. Mustafa Ultras et al. (2011) authorize a se- the secret key used in encryption process is generated cret scattering system, which segments the health images from the image itself. By mixing process, the distinctive among a health team of ‘n’ doctors. Maria Celestin Vigila pixel is extended by associating the present pixel with its and Muneeswaran (2012) projected an Elliptic curvature former pixel and its session key. Here the size of the block based key generation for stream cipher. Dalel Bouslimi is decided by session key, in which the block may be of any et al. (2012) advocate a mutual encryption watermarking one of the ten different sizes. Then in diffusion process, method to combine the Quantization index modulation the pixel of each block is reorganized within the block by and an encryption algorithm. Abir Awad et al. (2012) a a spiral path pattern. In the substitution process, the novel chaotic replacement method based on the comple- pixel of each block is changed with one of their nearby mentary rule. pixels. Finally the generated secret key is embedded into Musheer Ahmad and Tanvir Ahmed (2012) recom- the cipher image using the process of steganography. mended a framework to provide visual protection to with- To secure the medical DICOM image, the secret key stand statistical attacks to medical images. Li Chin of 128 bit size is generated by an image histogram. Then Huangc et al. (2013) put forward a histogram fluctuating the medicinal image is encoded by using diffusion and method to reach high bit depth health images. Tsang substitution procedure. Total five rounds are used in the IngRen et al. 
(2013) estimated a motion recompense encryption method. At last the secret key is fixed within method to be applied in both encryption and decryption the encrypted image by the method of steganography. At process. Maria Celestin Vigila and Muneeswaran (2013) the receiver side the key was recovered from the embedded International Journal of Network Security, Vol.22, No.5, PP.793-800, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).09) 795

Figure 1: Block diagram of the proposed methodology

image and then the decryption operation is performed in the inverse setup. Fig. 1 shows the block diagram of the proposed approach. The steps of the proposed cryptographic algorithm are explained as follows.

1) Key generation:
   The private key material used in the encryption process is generated from the image itself.
   Step 1. First, the histogram of the image is calculated by means of histogram counts.
   Step 2. Then 255 values are obtained by dividing the i-th histogram count by the (i+1)-th count.
   Step 3. Each result is rounded to the nearest integer and reduced modulo 10.
   Step 4. The resulting values are in the range [0-9].
   Step 5. From the 255 values, the first 128 values are extracted and formed into a secret key with a size of 128 bits.
   Step 6. This key is then divided into blocks of 8 bits to form the session keys: k = k1 k2 ··· k32 (in hexadecimal), given as K = K1 K2 ··· K16 (in ASCII).
   Here, the ki, referred to as sub-keys, are hexadecimal digits (0-9 and A-F), and the Ki are the session keys.

2) Mixing Process:
   In the mixing process every pixel of the image is XOR-ed with its preceding pixel and a session key. The procedure is given in Algorithm 1. In this algorithm, H and W represent the height and width of the image respectively, and Px,0 = 0 when x = 1 and Px,0 = Px−1,W when x > 1.

   Algorithm 1 The mixing process
   i ← position of first session key, i.e. 1
   for x = 1 : H do
       for y = 1 : W do
           Px,y ← (Px,y ⊕ Px,y−1 ⊕ Ki)
           i ← position of next session key, i.e. ((i mod 16) + 1)
       end for
   end for

3) Block Size Decision:
   The image resulting from the mixing process is divided into non-overlapping blocks B1, B2, ..., BN. The size of the block is decided by the session key Ki; a block may have any one of ten different sizes. In total, five rounds are used to complete the encryption process, and in each round a unique session key Ki is used. Table 1 shows the block size decision criteria.

Table 1: Block size decision table

Ki mod 10         0   1   2   3   4   5   6   7   8   9
Block Size (Bi)   16  24  32  40  48  56  64  72  80  96

4) Diffusion Process:
   Here the image is scrambled in a key-dependent manner, based on a spiral scanning pattern. The pixels of each block are rearranged within the same block along a spiral path over an 8 × 8 matrix. The location of the starting pixel for traversing a block depends exclusively on the key. For this purpose, pairs of neighboring sub-keys are formed. The key pairs are given as (k1, k2), (k3, k4), (k5, k6), (k7, k8) and (k1, k2). When all sub-key pairs have been used, the process starts again from the first sub-key pair (k1, k2). At the end of the diffusion process, not only are all pixels altered, but their neighboring pixels are also rearranged broadly within the image block. The 8 × 8 matrix diagram is given in Figure 2.

Table 2: Pixel location table

Sub-key value ki   Direction of adjacent pixel   Location of surrounding pixel Px,y w.r.t. current pixel P(i, j)
0/F                E                             P(i, j + 1)
1/E                NE                            P(i − 1, j + 1)
2/D                N                             P(i − 1, j)
3/C                NW                            P(i − 1, j − 1)
4/B                W                             P(i, j − 1)
5/A                SW                            P(i + 1, j − 1)
6/9                S                             P(i + 1, j)
7/8                SE                            P(i + 1, j + 1)
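The key generation steps, the mixing process of Algorithm 1 and the block size rule of Table 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a zero histogram count in the denominator is replaced by 1 (the text does not say how division by zero is handled), and the packing of the 128 digits into session keys is left out because the text does not specify it. The `unmix` function is an added inverse used only to check that the mixing is reversible.

```python
from collections import Counter

# Block sizes indexed by (Ki mod 10), as listed in Table 1.
BLOCK_SIZES = [16, 24, 32, 40, 48, 56, 64, 72, 80, 96]


def key_digits(pixels, n_digits=128):
    """Steps 1-5: histogram ratios, rounded, reduced mod 10, first 128 kept."""
    counts = Counter(pixels)
    hist = [counts.get(v, 0) for v in range(256)]
    digits = []
    for i in range(255):
        # Assumption: a zero count in the denominator is treated as 1.
        denom = hist[i + 1] if hist[i + 1] else 1
        digits.append(int(hist[i] / denom + 0.5) % 10)
    return digits[:n_digits]


def mix(rows, session_keys):
    """Algorithm 1: each pixel is XOR-ed with the previous (already mixed)
    pixel in row-major order and with the current session key."""
    out = [row[:] for row in rows]
    prev, i = 0, 0  # P(1,0) = 0; session keys are used cyclically
    for x in range(len(out)):
        for y in range(len(out[x])):
            out[x][y] = out[x][y] ^ prev ^ session_keys[i]
            prev = out[x][y]
            i = (i + 1) % len(session_keys)
    return out


def unmix(rows, session_keys):
    """Inverse of mix(): recover each pixel from the mixed image."""
    out = [row[:] for row in rows]
    prev, i = 0, 0
    for x in range(len(rows)):
        for y in range(len(rows[x])):
            out[x][y] = rows[x][y] ^ prev ^ session_keys[i]
            prev = rows[x][y]
            i = (i + 1) % len(session_keys)
    return out


def block_size(session_key):
    """Table 1: block size selected by the session key value modulo 10."""
    return BLOCK_SIZES[session_key % 10]
```

Because each mixed pixel depends on the previous mixed pixel, a change in one plain pixel propagates to all later pixels in row-major order, which is the chaining effect the mixing step relies on.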

Figure 2: Spiral path scrambling

5) Substitution Process:
   In the substitution process the value of the selected pixel is altered using one of its neighboring pixels in the eight directions, i.e. the current pixel is XOR-ed with the neighboring pixel. The eight neighboring directions are North (N), North West (NW), North East (NE), South (S), South West (SW), South East (SE), East (E) and West (W). Here, the neighboring pixel Px,y is XOR-ed with the current pixel Pi,j. When all the sub-keys have been used, the procedure begins again from the first sub-key k1. In this step, some of the pixels lying on the boundary of a block may remain unaffected. The pixel location table is given in Table 2.

6) Key Embedding:
   The generated secret key is embedded into the cipher image using the DWT, which is applied to the image to form four sub-bands: LL, LH, HL and HH. The LL band has a size of 128 × 128. The transformed coefficients in all the bands are then converted into binary form and, along the diagonal elements, the 3 LSBs are replaced with the bits of the secret key, after each key digit has been converted into a three-digit binary form. After the secret key has been embedded, the coefficients are converted back into decimal format and the image is returned to its original domain by the IDWT. The final output is the embedded image; the secret key is recovered at the receiver side through an extraction process, and the image is decrypted in the reverse order of the encryption stage.

4 Performance Analysis

Current computing power is capable of breaking encryption schemes in real time if the scheme is not designed with such attacks in mind. A good encryption system should therefore resist the possible attacks. Analyses of the encryption system, such as histogram analysis, entropy and correlation analysis, are used to assess the security of the scheme.

1) Statistical Analysis:
   Statistical analysis involves collecting every data sample in a set of items from which samples can be drawn. The different types of statistical analysis are given as follows.

   a. Histogram Analysis:
      A histogram is a graphical representation of numerical data. An image histogram is a chart that shows the distribution of intensities in a grayscale image. To prevent leakage of data to an adversary, it is important to guarantee that the cipher image has no statistical resemblance to the input image. The histogram of the input image has large spikes, whereas the histogram of the cipher image is nearly flat and uniform, signifying nearly the same probability of occurrence for each intensity level. Figure 3 shows the histograms of the original, cipher and stegno images.

   b. Information Entropy:


Figure 3: Histogram of original, encrypted, and stegno-cipher images. (a) Original Image, (b) Histogram of Original Image, (c) Encrypted Image, (d) Histogram of Encrypted Image, (e) Stegno Image, (f) Histogram of Stegno Image.

Information entropy is a concept from information theory. It tells how much information there is in an event. In general, the more uncertain or random the event is, the more information it will contain. It has applications in many areas, including lossless data compression, statistical inference, cryptography, etc. The entropy is calculated as follows:

$$H(s) = \sum_{i} P(S_i) \log_2 \frac{1}{P(S_i)}, \qquad (1)$$

where $P(S_i)$ is the probability of the i-th symbol (intensity level) $S_i$. Table 3 shows the entropy of the original, cipher and stegno images. For a good image encryption scheme, the entropy of the encrypted image should be close to the ideal value of 8. Therefore the information leakage in the proposed cipher is insignificant, and it is secure against entropy attacks.

Table 3: Entropy of original, cipher and stegno image

Image            Entropy
Original Image   7.2551
Cipher Image     7.5564
Stegno Image     7.6463

2) Correlation Analysis:
   Correlation is a statistical measure that shows whether and how strongly pairs of variables are related. The correlation of the original image and of its corresponding encrypted image, produced using the proposed image encryption algorithm, is evaluated by computing correlation coefficients. The formulas for the correlation coefficient are given as follows.

$$C_{x,y} = \frac{\mathrm{cov}(x, y)}{\sqrt{D(x)}\,\sqrt{D(y)}}$$
$$E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$$
$$D(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))^2$$
$$\mathrm{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))(y_i - E(y))$$

   In these formulas, there are N pairs of neighboring pixels, and x and y represent the intensity values of two neighboring pixels of an image. Table 4 shows the correlation of the original, cipher and stegno images.

3) Key Sensitivity Analysis:
   Even a variation in a single bit of the key should produce an entirely different cipher image, so that attackers cannot identify the key. This makes the encryption procedure sufficiently sensitive to the secret key. To test the sensitivity of the proposed image cipher with respect to the key, the plain image is encrypted a second time with a key that differs slightly from the original one. The two cipher images are then compared; since it is not easy to compare the cipher images by simply observing them, the correlation between corresponding pixels of the two cipher images is computed. Table 5 shows the entropy and correlation of the two cipher images.

   Figure 4 shows the key sensitivity analysis of the original and the two cipher images. A good diffusion and substitution method should resist all kinds of attacks. Hence analysis techniques such as statistical, correlation and key sensitivity analysis are discussed in the above section to show that the proposed algorithm is secure against the most common attacks. A new diffusion and substitution based cryptosystem for securing medical image applications is thus implemented, and the performance analysis indicates that the proposed cipher is more secure.

5 Conclusion

To secure the medical image, a secret key of 128-bit size is generated by means of the image histogram, and the medical image is encrypted using diffusion and substitution processes. Finally, the secret key is embedded within the encrypted image by steganography, which further enhances the security of the medical image. At the receiver side the key is recovered from the embedded image and then decryption is performed in the reverse order. Here we discussed the security analysis of the medical image encryption scheme, namely statistical analysis, correlation analysis and key sensitivity analysis, which indicates that the proposed cipher is safe against the most common attacks.

Table 4: Correlation of original, cipher and stegno images

Image            Vertical Correlation   Horizontal Correlation   Diagonal Correlation
Original Image   -0.0053                -0.0075                  -0.0068
Cipher Image     -0.0063                -0.0042                  -0.0130
Stegno Image     -0.0089                -0.0032                  -0.0073
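The entropy of Equation (1) and the correlation coefficient used for Table 4 can be computed as in the following Python sketch. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the image is a list of rows of 8-bit values, and all horizontally adjacent pixel pairs are used (the text does not say how the N pairs are selected).

```python
import math
from collections import Counter


def entropy(pixels):
    """Shannon entropy in bits: sum of P(S_i) * log2(1 / P(S_i))."""
    n = len(pixels)
    return sum((c / n) * math.log2(n / c) for c in Counter(pixels).values())


def correlation(xs, ys):
    """Correlation coefficient cov(x, y) / (sqrt(D(x)) * sqrt(D(y)))."""
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    dx = sum((x - ex) ** 2 for x in xs) / n
    dy = sum((y - ey) ** 2 for y in ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    # Note: undefined (division by zero) if either sequence is constant.
    return cov / (math.sqrt(dx) * math.sqrt(dy))


def horizontal_pairs(rows):
    """Every pixel paired with its right-hand neighbour (assumed pairing)."""
    xs = [row[j] for row in rows for j in range(len(row) - 1)]
    ys = [row[j + 1] for row in rows for j in range(len(row) - 1)]
    return xs, ys
```

Vertical and diagonal correlations follow the same pattern with the neighbour taken one row down, or one row down and one column across. An image in which all 256 grey levels are equally likely reaches the ideal entropy of 8 bits.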


Figure 4: Key sensitivity analysis of original, cipher image 1 and cipher image 2. (a) Original Image, (b) Cipher Image 1, (c) Cipher Image 2.

Table 5: Entropy and Correlation between two cipher images

Image            Entropy   Vertical Correlation   Horizontal Correlation   Diagonal Correlation
Cipher Image 1   7.5564    -0.0063                -0.0042                  -0.0130
Cipher Image 2   7.6463    -0.0089                -0.0032                  -0.0073


Biography

S. Maria Celestin Vigila completed her B.E. in Computer Science and Engineering in 1996 and her M.E. in Computer Science and Engineering in 1999. She completed her Ph.D. in the area of data security at Anna University, Chennai. She is currently an Associate Professor in the Department of Information Technology, Noorul Islam University, Kumaracoil, and a member of ISTE and IET. She is a reviewer for several peer-reviewed international journals. Her research interests include cryptography and network security, wireless networks and information hiding.

International Journal of Network Security, Vol.22, No.5, PP.801-808, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).10)

A LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation

Shanshan Zhang1,2 (Corresponding author: Shanshan Zhang)

State Key Laboratory of Integrated Services Networks, Xidian University^1
No. 2, Taibai South Road, Xi’an 710071, Shaanxi Province, China
School of Mathematics and Information Science, Baoji University of Arts and Sciences^2
No. 44, Baoguang Road, Baoji 721013, Shaanxi Province, China
(Email: [email protected])
(Received Mar. 18, 2019; Revised and Accepted Aug. 4, 2019; First Online Jan. 29, 2020)

Abstract

Oblivious transfer is an important cryptographic primitive and serves as a powerful tool in secure computation. Most existing oblivious transfer protocols are built upon the hardness of factoring or of computing discrete logarithms. However, these protocols are threatened by quantum computing and would be broken directly in the presence of a quantum computer. Therefore, it is essential to construct OT protocols based on post-quantum cryptography. As a subarea of post-quantum cryptography, lattice-based cryptography has some attractive features. Specifically, the learning with errors (LWE) problem has been used as an amazingly versatile basic tool to design cryptographic schemes. We are inspired by a result which proposed an oblivious transfer protocol using the decisional Diffie-Hellman assumption and indistinguishability obfuscation. Therefore, we propose a new secure LWE-based oblivious transfer protocol from indistinguishability obfuscation. The main tools are an LWE-based dual-mode cryptosystem and a secure indistinguishability obfuscation, which together guarantee the security of our oblivious transfer protocol.

Keywords: Dual-mode Cryptosystem; Indistinguishability Obfuscation; Lattice-based; Oblivious Transfer

1 Introduction

Oblivious transfer (OT) is a fundamental cryptographic primitive, first proposed by Rabin [13] in 1981. It involves two participants, a sender (denoted by S) and a receiver (denoted by R), and requires that S sends a message to R with probability 1/2, while S is oblivious to whether or not the message was received by R. A well-known flavor of OT is called 1-out-of-2 OT (denoted by $OT^2_1$), where S has two inputs $m_0$ and $m_1$, and R has a chosen bit $b \in \{0, 1\}$; R wishes to obtain $m_b$ without S learning b, while S wants to ensure that R receives only one of the two messages. Due to the simple functionality of OT, it has been widely exploited to construct cryptographic schemes, such as contract signing, secure multi-party computation, the exchange of secrets, and key agreement. Therefore, it is of great significance to design efficient OT protocols.

1.1 Our Motivation

As far as we know, most existing OT protocols are built upon number-theoretic problems, mainly the hardness of factoring and of computing discrete logarithms. However, these protocols are threatened by quantum computing [16] and would be broken directly in the presence of a quantum computer. Therefore, it is essential to construct OT protocols based on post-quantum cryptography, such as lattice-based and code-based protocols. Lattice-based cryptography has some attractive properties when compared with other post-quantum research fields, for instance, strong security guarantees from worst-case hardness and algorithmic simplicity. Among lattice-based hard problems, the learning with errors (LWE) problem [14] has been used as an amazingly versatile basic tool to design cryptographic schemes. Specifically, Peikert et al. [11] proposed an efficient and universally composable OT protocol which is derived from a dual-mode cryptosystem and can be instantiated with the decisional Diffie-Hellman assumption, the quadratic residuosity assumption and the worst-case lattice assumption, respectively.

Zheng et al. [17] studied the framework for composable oblivious transfer and proposed a secure OT protocol from indistinguishability obfuscation (iO), with a dual-mode cryptosystem and an iO as the main technical tools. Their work has two main contributions. First, a k-out-of-n OT protocol was presented. Second, it explored the applications of iO. iO is a weaker notion of obfuscation that was first formally defined by [2]. They suggested a definition of virtual black box obfuscation, and proved that this notion is impossible to realize. In order to avoid the impossibility result, they presented the notion of iO, which only requires that if two circuits compute the same functionality, then their obfuscations should be computationally indistinguishable from each other. iO is both very useful and potentially achievable. Garg et al. [7] proposed the first candidate GGH13-based [6] iO for general circuits. Subsequently, many applications of iO were described in [15], such as public-key encryption, injective trapdoor functions, deniable encryption, and so on.

The OT protocol of Zheng et al. has a main tool that is based on the hardness of the decisional Diffie-Hellman (DDH) problem. However, the DDH assumption gives no guarantee against quantum attacks, and the iO they selected is the first candidate, which has been attacked by [4]. For these reasons, we aim to remedy these shortcomings and design a new OT protocol that is secure in the quantum setting. Therefore, we take advantage of the LWE problem and a secure iO. Furthermore, if the security of an OT protocol is proved only with respect to an ideal-world simulator that is shown only for a cheating receiver, then the protocol is not necessarily secure when integrated into a larger protocol. Thus, our protocol also needs to be universally composable.

1.2 Our Contribution

In this work, we combine the LWE-based dual-mode cryptosystem [11] and a secure iO to design a new OT protocol. The key technique is an obfuscator of the dual-mode cryptosystem based on the hardness of LWE, in contrast to the DDH assumption used in [17]. It is important that we choose a GGH13-based obfuscator which resists quantum attacks when combined with the technique of [5] to prevent input partitioning. By utilizing these tools, we realize the oblivious transfer functionality, and guarantee the security of our OT protocol.

1.3 Organization

The rest of this paper is organized as follows. In Section 2, we introduce two useful definitions, of LWE and iO. Then two corresponding building blocks are given in Section 3. In Section 4, we construct an LWE-based oblivious transfer protocol from iO, and the security proof of our OT protocol is presented in Section 5. Finally, conclusions are drawn in Section 6.

2 Preliminaries

In this section, we introduce some notations and fundamental definitions.

2.1 Notation

We let $\mathbb{N}$ denote the set of natural numbers; for $n \in \mathbb{N}$, $[n]$ denotes the set $\{1, \ldots, n\}$. For an integer $q \geq 1$, $\mathbb{Z}_q$ denotes the quotient ring $\mathbb{Z}/q\mathbb{Z}$. Let "←" denote sampling an element from some distribution, or uniformly at random from a set. We use bold lower-case letters to denote vectors in column form, and bold upper-case letters to denote matrices. Let $n \in \mathbb{N}$ denote the security parameter throughout this paper; all other quantities are functions of n. We use the standard notation o to classify the growth of functions. The function negl(n) denotes an unspecified function $f(n) = o(n^{-c})$ for every constant $c > 0$; such a function is called negligible, and we say a probability is overwhelming if it is $1 - \mathrm{negl}(n)$. We use the standard definition of computational indistinguishability, denoted by $\stackrel{c}{\approx}$.

2.2 Learning with Errors

The LWE problem was proposed by Regev [14]; its hardness can be reduced by a quantum algorithm to some standard problems on lattices in the worst case.

For an integer $q = q(n) \geq 2$ and some probability distribution $\chi$ over $\mathbb{Z}_q$, we define $A_{s,\chi}$ as the distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$ of the tuples $(a, c) = (a, a^T s + e)$ where $s, a \leftarrow \mathbb{Z}_q^n$ are uniform and $e \leftarrow \chi$, and all operations are performed in $\mathbb{Z}_q$. There are two versions of the LWE problem, search-LWE and decision-LWE.

Definition 1 (Search-LWE and decision-LWE). For an integer $q = q(n)$ and a distribution $\chi$ on $\mathbb{Z}_q$, and for any $s \in \mathbb{Z}_q^n$, the goal of search-LWE is to find s given independent samples (a, c) from $A_{s,\chi}$. The goal of decision-LWE is to distinguish between an oracle that returns independent samples from $A_{s,\chi}$ for some uniform $s \leftarrow \mathbb{Z}_q^n$, and an oracle that returns independent samples from the uniform distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$.

Regev showed that these two versions are polynomially equivalent for q = poly(n). He proved that for certain choices of q and $\chi$, the decision-LWE problem is as hard as solving the shortest independent vectors problem (SIVP) using a quantum algorithm.

Theorem 1. Let $q = q(n)$ be a prime and let $\alpha = \alpha(n) \in (0, 1)$ be such that $\alpha q > 2\sqrt{n}$. If there exists an efficient algorithm that solves the decision-LWE problem, then there exists an efficient quantum algorithm for SIVP within $\tilde{O}(n/\alpha)$ in the worst case.

Due to the hardness of SIVP, we choose the decision-LWE problem as the underlying hardness assumption in this paper.

2.3 Indistinguishability Obfuscation

Program obfuscation aims to make computer programs "unintelligible" while preserving their functionality. The systematic study of program obfuscation was initiated by Barak et al. in 2001. In their work, they gave a potentially realizable notion of iO. iO requires that, given any two equivalent circuits of the same size, the obfuscations of these two circuits should be computationally indistinguishable. The specific definition is as follows.

Definition 2 (Indistinguishiability obfuscation). A PPT These security properties make the dual-mode cryptosys- algorithm iO is said to be an indistinguishability obfusca- tem able to derive a UC-secure OT protocol. tor for a class of circuits C, if it satisfies: The instantiation of the dual-mode cryptosytem based Functionality: For any C ∈ C and security parameter on the hardness of LWE relies on existing techniques, n, including an LWE-based encryption and an efficient se- P r[∀x : iO(C, 1n)(x) = C(x)] = 1. curely embedded a trapdoor algorithm. So, we first in- troduce an optimized version of the LWE-based encryp- Indistinguishability: For any PPT distinguisher D, tion, then instancing the dual-mode cryptosystem, where there exists a negligible function negl(·), such that for any the message space is Z2 = {0, 1}. Let the modulus two circuits C0,C1 ∈ C that compute the same function q = poly(n) be a prime, all operations are performed over are of the same size: Zq. For every message M ∈ Z2, the “center” of M is defined as t(M) = M · bq/2c ∈ Zq. Let χ denote an error |P r[D(iO(C , 1n)) = 1] − P r[D(iO(C , 1n)) = 1]| 0 1 distribution over Zq. n n×m ≤ negl(n). LWEKeyGen (1 ): Choose a matrix A ∈ Zq and n×1 a secret key s ← Zq are both uniformly at ran- Starting with the work of [7] constructing the first iO dom. To generate the public key, choose an error 1×m candidate for the polynominal-size circuit. Several iO vector x ← Zq where each entry xi ∈ χ is cho- candidates have appeared in literatures [1,3,9,10]. Unfor- sen independently for all i ∈ [m]. Then compute tunately, many constructions are attacked by [10]. So we p = sT A + x, the public key is (A, p). choose a secure iO [12] based on GGH13 multilinear map that haven’t found classical attack and quantum attack. LWEEnc ((pk = (A, p),M): To encrypt a message M ∈ m Z2, choose a vector e ∈ Z2 uniformly at random. The ciphertext is the pair (u, c) = (Ae, pe + M · n 3 Building Blocks bq/2c) ∈ Zq × Zq. 
LEWDec ((sk = s, (u, c))): Compute d = c − sT u ∈ , In order to construct an LWE-based oblivious transfer Zq output 0 if d is closer to 0 than bq/2c modulo q, protocol from iO, we need two building blocks, including otherwise output 1. LWE-based dual-mode cryptosystem and a secure iO. We verify the completeness of the encryption scheme 3.1 LWE-based Dual-mode Cryptosys- based on the LWE. The decryption algorithm needs to tem compute T T T The dual-mode cryptosystem is a simple and general d = c − s u = (s A + x)e + M · bq/2c − s Ae framework, proposed by Peikert et al. [11]. Actually it = xe + M · bq/2c ∈ . is an encryption scheme that can operate in two modes, Zq which are called messy mode and decryption mode. The If xe + M · bq/2c is closer to 0 than bq/2c modulo q, then trusted setup phase produces a common reference string output 0, otherwise output 1. (denoted by crs) and the corresponding trapdoor infor- The encryption scheme based on the LWE is secure mation according to one of two chosen modes. The crs under chosen plaintext attack, unless SIVP and GapSVP may be uniformly random or be some specified distribu- are easy for quantum algorithms. tion. We now give the construction of the LWE-based dual- The dual-mode cryptosystem has four security proper- mode cryptosystem using LWE-based encryption. It con- ties: sists six probabilistic algorithms, and the last two algo- rithms are only used in the security proof. 1) It suffices decryption completeness with overwhelm- n n×m ing probability over the randomness of the entire ex- SetupMessy (1 ): Choose a matrix A ∈ Zq uni- periment; formly at random, together with a trapdoor t = 1×m {S, A} as in [8]. Choose a row vector w ← Zq . 2) Given crs, the first outputs of SetupMessy and Se- For each b ∈ {0, 1}, choose an independent row vec- 1×m tupDec are computationally indistinguishable; tor τb ∈ Zq uniformly at random. Let crs = (A, τ1, τ2) and output (crs, t). 
3) In Messy mode, for every pk, at least one of the de- n n×m rived public keys can statistically hide its encrypted SetupDec (1 ): Choose a matrix A ∈ Zq and a row 1×n message; vector w ← Zq are both at random. For each n b ∈ {0, 1}, choose a secret sb ← Zq and an error row 4) In decryption mode, the honest receiver’s chosen bit vector xb ← χ are both uniformly at random. Let T σ is statistically hidden by its choice or the base key τb = sb A + xb − w, crs = (A, τ1, τ2), t = (w, s0, s1) pk. and output (crs, t). International Journal of Network Security, Vol.22, No.5, PP.801-808, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).10) 804

Program P1

Given input $(M, e_1, e_2, \phi)$, $P1^{(SK^1_{\mathrm{FHE}}, g_1, g_2)}$ proceeds as follows:

1. Check if $\phi$ is a valid low-depth proof for the NP-statement:
   $e_1 = \mathrm{Eval}_{\mathrm{FHE}}(PK^1_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{\mathrm{FHE}}(PK^2_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_2)$.
2. If the check fails output 0; otherwise, output $\mathrm{Decrypt}_{\mathrm{FHE}}(e_1, SK^1_{\mathrm{FHE}})$.

Figure 1: Program P1

KeyGen($\sigma$): Choose a secret $s \leftarrow \mathbb{Z}_q^n$ uniformly at random and a row vector $x \leftarrow \chi^{1 \times m}$. Let $pk = s^T A + x - \tau_\sigma$, $sk = s$, and output (pk, sk).

Enc(pk, b, M): Output $y \leftarrow \mathrm{LWEEnc}(A, pk + \tau_b, M)$, where $y$ is the pair (u, c).

Dec(sk, y): Output $M \leftarrow \mathrm{LWEDec}(sk, (u, c))$.

FindMessy(t, pk): Parse t as (S, A), run IsMessy(S, A, $pk + \tau_b$) for each $b \in \{0, 1\}$, and output a $b$ for which IsMessy outputs messy. With overwhelming probability, IsMessy correctly outputs messy on at least one branch.

TrapKeyGen(t): Parse t as $(w, s_0, s_1)$, and output $(pk, sk_0, sk_1) = (w, s_0, s_1)$.
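The decryption-mode algorithms above can be exercised with a small sketch. The following is only an illustration under stated assumptions, not the authors' implementation: the parameters `Q`, `N`, `M_COLS` and the noise bound `B` are toy, insecure values chosen here; the error distribution $\chi$ is replaced by a bounded uniform distribution; and LWEEnc is assumed to use a random binary vector $e$ with $u = Ae$ and $c = p \cdot e + M \lfloor q/2 \rfloor$, which matches the completeness equation in this section. Unlike Enc(pk, b, M) in the text, `enc` takes the crs as an explicit argument.

```python
import random

# Toy, insecure parameters (assumptions for illustration only).
Q, N, M_COLS, B = 4093, 8, 32, 2

def rand_vec(k):
    """Uniform vector in Z_q^k."""
    return [random.randrange(Q) for _ in range(k)]

def noise_vec(k):
    """Stand-in for chi^{1 x k}: small bounded noise in [-B, B]."""
    return [random.randint(-B, B) for _ in range(k)]

def row_times_matrix(s, A):
    """Compute s^T A for s in Z_q^n and A in Z_q^{n x m}."""
    return [sum(s[i] * A[i][j] for i in range(N)) % Q for j in range(M_COLS)]

def setup_dec():
    """SetupDec: tau_b = s_b^T A + x_b - w, trapdoor t = (w, s_0, s_1)."""
    A = [rand_vec(M_COLS) for _ in range(N)]
    w = rand_vec(M_COLS)
    secrets_b = [rand_vec(N) for _ in range(2)]
    taus = []
    for b in range(2):
        sA, x = row_times_matrix(secrets_b[b], A), noise_vec(M_COLS)
        taus.append([(sA[j] + x[j] - w[j]) % Q for j in range(M_COLS)])
    return (A, taus[0], taus[1]), (w, secrets_b[0], secrets_b[1])

def keygen(crs, sigma):
    """KeyGen: pk = s^T A + x - tau_sigma, sk = s."""
    A, taus = crs[0], crs[1:]
    s = rand_vec(N)
    sA, x = row_times_matrix(s, A), noise_vec(M_COLS)
    pk = [(sA[j] + x[j] - taus[sigma][j]) % Q for j in range(M_COLS)]
    return pk, s

def lwe_enc(A, p, bit):
    """Assumed LWEEnc: binary e, u = A e, c = p . e + bit * floor(q/2)."""
    e = [random.randint(0, 1) for _ in range(M_COLS)]
    u = [sum(A[i][j] * e[j] for j in range(M_COLS)) % Q for i in range(N)]
    c = (sum(p[j] * e[j] for j in range(M_COLS)) + bit * (Q // 2)) % Q
    return u, c

def enc(crs, pk, b, bit):
    """Enc: encrypt under the branch key pk + tau_b."""
    A, taus = crs[0], crs[1:]
    branch_key = [(pk[j] + taus[b][j]) % Q for j in range(M_COLS)]
    return lwe_enc(A, branch_key, bit)

def dec(sk, y):
    """LWEDec: d = c - s^T u; output 0 if d is closer to 0 than to floor(q/2)."""
    u, c = y
    d = (c - sum(sk[i] * u[i] for i in range(N))) % Q
    return 0 if min(d, Q - d) < abs(d - Q // 2) else 1

def trap_keygen(t):
    """TrapKeyGen: (pk, sk_0, sk_1) = (w, s_0, s_1)."""
    w, s0, s1 = t
    return w, s0, s1
```

With these toy values the accumulated error satisfies $|xe| \le 64$, far below $q/4$, so a receiver holding `sk` from `keygen(crs, sigma)` recovers the bit encrypted on branch `sigma`, and the trapdoor key `pk = w` decrypts on both branches because $w + \tau_b = s_b^T A + x_b$.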

According to [8], an efficient and UC-secure OT protocol based on LWE hardness can be derived directly once the LWE-based dual-mode cryptosystem is built. Although the LWE-based dual-mode cryptosystem is a relaxed version, it still yields a UC-secure OT protocol based on LWE hardness.

Program P2

Given input $(M, e_1, e_2, \phi)$, $P2^{(SK^2_{\mathrm{FHE}}, g_1, g_2)}$ proceeds as follows:

1. Check if $\phi$ is a valid low-depth proof for the NP-statement:
   $e_1 = \mathrm{Eval}_{\mathrm{FHE}}(PK^1_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{\mathrm{FHE}}(PK^2_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_2)$.
2. If the check fails output 0; otherwise, output $\mathrm{Decrypt}_{\mathrm{FHE}}(e_2, SK^2_{\mathrm{FHE}})$.

Figure 2: Program P2

3.2 Secure Indistinguishability Obfuscation

In this section, we introduce a secure iO as another tool in our scheme. Indistinguishability obfuscation is a powerful notion: it requires that for every pair of functionally equivalent circuits $C_0, C_1$, the obfuscations $iO(C_0)$ and $iO(C_1)$ are computationally indistinguishable. Almost all known candidate constructions of iO are based on multilinear maps, which have been the subject of various attacks. The first candidate branching program obfuscator can be attacked when the branching program has input partitioning. We therefore combine it with a technique that prevents input partitioning in order to resist cryptanalytic attacks.

We now describe the secure iO for all circuits. First, we need to construct iO for $NC^1$ circuits. More specifically, an $NC^1$ circuit can be computed by branching programs. Let $f : \{0, 1\}^n \to \{0, 1\}$ be a function to be obfuscated. Fernando et al. [5] give a model which takes a partitionable $f$ as input and produces a function $g$ with the same functionality, where $g$ has no input partitions. In this way, GGH13-based iO can defend against the extension of annihilation attacks by [4]. Second, iO for $NC^1$ circuits is combined with Fully Homomorphic Encryption (FHE) to achieve iO for all circuits. The process contains an obfuscation algorithm and an evaluation algorithm. To obfuscate a circuit $C$, we choose and publish two FHE public keys $PK^1_{\mathrm{FHE}}$ and $PK^2_{\mathrm{FHE}}$. Run Obfuscate($1^\lambda$, $C \in \mathcal{C}_\lambda$), then output $\tau = (P, PK^1_{\mathrm{FHE}}, PK^2_{\mathrm{FHE}}, g_1, g_2)$, where $P = iO_{NC^1}(P1^{(SK^1_{\mathrm{FHE}}, g_1, g_2)})$, $g_1 = \mathrm{Encrypt}_{\mathrm{FHE}}(PK^1_{\mathrm{FHE}}, C)$ and $g_2 = \mathrm{Encrypt}_{\mathrm{FHE}}(PK^2_{\mathrm{FHE}}, C)$. We describe the two program classes in Figure 1 and Figure 2. The evaluation algorithm takes the obfuscation output $\tau$ and a program input $M$, and is denoted by Evaluate($\tau$, M). It computes the following procedure, where $U_\lambda$ is a poly-sized universal circuit.

1) Compute
   $e_1 = \mathrm{Eval}_{\mathrm{FHE}}(PK^1_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{\mathrm{FHE}}(PK^2_{\mathrm{FHE}}, U_\lambda(\cdot, M), g_2)$.

2) Compute a low-depth proof $\phi$ that $e_1$ and $e_2$ were computed correctly.

3) Run $P(M, e_1, e_2, \phi)$ and output the result.

4 LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation

4.1 Obfuscator for LWE-based Dual-mode Cryptosystem

The setup of a dual-mode cryptosystem has a messy mode and a decryption mode, and they are computationally indistinguishable. Therefore, their obfuscated results are still indistinguishable. SetupMessy and SetupDec each have two choices, which we denote as four circuits $C_n$, $n = 1, 2, 3, 4$, described in Figures 3, 4, 5 and 6. We obfuscate these circuits to substitute the two modes in the LWE-based dual-mode cryptosystem. The key is to describe the four circuits $C_n$, which are based on the LWE encryption scheme.

Circuits $C_1$ and $C_2$ output truly random vectors, and circuits $C_3$ and $C_4$ output LWE instances. By obfuscating the circuits $C_1$, $C_2$, $C_3$ and $C_4$, we obtain the result that their outputs are computationally indistinguishable. The specific setups of the LWE-based dual-mode cryptosystem are called the two obfuscation branches, as follows.

Circuit C1

Input: choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. Choose a secret $s_0 \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_0 \leftarrow \chi^{1 \times m}$.
Output: a row vector $v_0 \leftarrow \mathbb{Z}_q^{1 \times m}$ uniformly at random.

Figure 3: Circuit C1

Messy-obf-branch($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ uniformly at random, together with a trapdoor $t = (S, A)$ as in [8]. Choose a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$. For each $b \in \{0, 1\}$, choose a secret $s_b \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_b \leftarrow \chi^{1 \times m}$. Let $\tau_1 = \mathrm{Obfuscate}(1^\lambda, C_1)$, $\tau_2 = \mathrm{Obfuscate}(1^\lambda, C_2)$, $crs = (A, \tau_1, \tau_2)$ and output (crs, t).

Dec-obf-branch($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. For each $b \in \{0, 1\}$, choose a secret $s_b \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_b \leftarrow \chi^{1 \times m}$. Let $\tau_1 = \mathrm{Obfuscate}(1^\lambda, C_3)$, $\tau_2 = \mathrm{Obfuscate}(1^\lambda, C_4)$, $crs = (A, \tau_1, \tau_2)$, $t = (w, s_0, s_1)$ and output (crs, t).

Next, we invoke the evaluation algorithm in the KeyGen process, where we let $V_0 = \mathrm{Evaluate}(\tau_1, A)$ and $V_1 = \mathrm{Evaluate}(\tau_2, A)$. Compared with the LWE-based dual-mode cryptosystem above, the rest of the steps are identical except that $V_\sigma$ is substituted for $v_\sigma$.

Circuit C2

Input: choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. Choose a secret $s_1 \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_1 \leftarrow \chi^{1 \times m}$.
Output: a row vector $v_1 \leftarrow \mathbb{Z}_q^{1 \times m}$ uniformly at random.

Figure 4: Circuit C2

Circuit C3

Input: choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. Choose a secret $s_0 \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_0 \leftarrow \chi^{1 \times m}$.
Output: $v_0 = s_0^T A + x_0 - w$.

Figure 5: Circuit C3

4.2 Oblivious Transfer Protocol from Indistinguishability Obfuscation

$OT^2_1$ is a two-party protocol involving a sender S with inputs $M_0, M_1$ and a receiver R with a choice bit $\sigma \in \{0, 1\}$. The result is that R learns $M_\sigma$ and nothing about the other message, while S learns nothing at all. Our OT protocol operates in the common reference string model, denoted by $F^D_{crs}$, where $D$ denotes a PPT algorithm. $F^D_{crs}$ runs with two parties, and a trusted party produces the crs for the two parties before they interact. Once the obfuscator for the LWE-based dual-mode cryptosystem is constructed, our LWE-based OT protocol from iO, denoted by $iOdm^{branch}$, can be derived directly; we describe the protocol in Table 1. It realizes the exact definition of the ideal OT functionality in the $F^D_{crs}$ model. $iOdm^{branch}$ operates in two branches: when D = Messy-obf-branch, D runs in the Messy-obf-branch; when D = Dec-obf-branch, D runs in the Dec-obf-branch.

Circuit C4

Input: choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. Choose a secret $s_1 \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_1 \leftarrow \chi^{1 \times m}$.
Output: $v_1 = s_1^T A + x_1 - w$.

Figure 6: Circuit C4

Table 1: Protocol $iOdm^{branch}$ for oblivious transfer

Sender input: (sid, ssid, $M_0$, $M_1$)
Receiver input: (sid, ssid, $\sigma$)

Setup:
  Sender and Receiver each send (sid, S, R) to $F^D_{crs}$.
  $F^D_{crs}$ returns (sid, crs) to both Sender and Receiver.

Multi-session OT:
  Receiver: (pk, sk) ← KeyGen-obf-branch(crs, $\sigma$)
  Receiver → Sender: (sid, ssid, pk)
  Sender: $y_b$ ← Enc(pk, b, $M_b$) for each $b \in \{0, 1\}$
  Sender → Receiver: (sid, ssid, $y_0$, $y_1$)
  Receiver: outputs (sid, ssid, Dec(sk, $y_\sigma$))

5 Security Proof

The obfuscator for the LWE-based dual-mode cryptosystem includes two obf-branches, the trapdoor generation of keys in Dec-obf-branch, and the guaranteed existence and identification of messy branches in Messy-obf-branch. Its properties are stated in the following theorems.

Theorem 2. In the obfuscator for the LWE-based encryption scheme construction, the Messy-obf-branch and Dec-obf-branch are indistinguishable, assuming LWE is hard.

Proof. The output of Dec-obf-branch is of the form $(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w)$. Because of the hardness of LWE, we have $(A, s_1^T A + x_1) \approx_c (A, w_1)$, where $w_1 \leftarrow \mathbb{Z}_q^{1 \times m}$ is uniformly random and independent. So we have $(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w) \approx_c (A, (s_0^T A + x_0) - w, w_1 - w)$. The right-hand side is entirely uniform, because $w$ and $w_1$ are uniform and independent. Since the output of Messy-obf-branch is also entirely uniform, $(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w) \approx_c (A, v_0, v_1)$.

Theorem 3. In the obfuscator for the LWE-based encryption scheme construction, for every (crs, t) ← Dec-obf-branch($1^n$), TrapKeyGen(t) outputs $(pk, sk_0, sk_1)$ such that for every $\sigma \in \{0, 1\}$, $(pk, sk_\sigma) \approx$ KeyGen($\sigma$), assuming LWE is hard.

Proof. The cases $\sigma = 0$ and $\sigma = 1$ are symmetric, so we consider only one of them. For the case $\sigma = 0$, we will prove that

$(\text{Dec-obf-branch}(1^n), \mathrm{KeyGen}(0)) \approx_c (crs, (pk, sk_0)),$

where (crs, t) ← Dec-obf-branch($1^n$) and $(pk, sk_0, sk_1)$ ← TrapKeyGen(t). We get the result using a sequence of hybrid games.

Since the outputs of the two branches are indistinguishable, the first hybrid game expands as

$(A, v_0, v_1, s^T A + x - v_0, s),$

where $A$, $v_0$, $v_1$ and $s$ are uniform and $x \leftarrow \chi^{1 \times m}$.

By defining $w = s^T A + x - v_0$ and using $s_0$ and $x_0$ in place of $s$ and $x$, the second game outputs

$(A, s_0^T A + x_0 - w, v_1, w, s_0),$

where $w$ is uniform. Because $v_1$ is uniform and independent of the other variables, the third game outputs

$(A, s_0^T A + x_0 - w, v_1 - w, w, s_0).$

The above three games are equivalent to each other. The hardness of LWE implies that $(A, v_1)$ is indistinguishable from $(A, s_1^T A + x_1)$, where $s_1 \leftarrow \mathbb{Z}_q^n$ and $x_1 \leftarrow \chi^{1 \times m}$. So the prior games are indistinguishable from the one that outputs

$(A, s_0^T A + x_0 - w, s_1^T A + x_1 - w, w, s_0).$

This completes the sequence; the final output is equivalent to $(crs, (pk, sk_0))$ by definition.

Theorem 4. In the obfuscator for the LWE-based encryption scheme construction using the parameters $m \ge 2(n + 1)\log q$ and $t \ge \sqrt{qm} \cdot \log^2 m$, for (crs, t) ← Messy-obf-branch($1^n$) and every key pk, FindMessy(t, pk) outputs a messy branch with overwhelming probability.

Proof. The following facts are from [8]. Let $m \ge 2(n + 1)\log q$ and $t \ge \sqrt{qm} \cdot \log^2 m$. There is a negligible function negl(m) such that, with overwhelming probability over the choice of $A, S$, for all but an at most $(1/(2\sqrt{q}))^m$ fraction of vectors $p \in \mathbb{Z}_q^{1 \times m}$, IsMessy(S, A, p) outputs messy with overwhelming probability. Define $D \subseteq \mathbb{Z}_q^{1 \times m}$ to be the set of such vectors $p$; then we have

$\Pr[v \notin D] \le (1/(2\sqrt{q}))^m$, where $v \in \mathbb{Z}_q^{1 \times m}$.

For every $pk \in \mathbb{Z}_q^{1 \times m}$, we want a branch $pk + v_b \in D$, where $b \in \{0, 1\}$ and $v_0, v_1 \in \mathbb{Z}_q^{1 \times m}$ are the vectors in the crs. For any fixed pk, we have

$\Pr[pk + v_0 \notin D \text{ and } pk + v_1 \notin D] = (\Pr[v \notin D])^2 \le (1/(4q))^m.$

For (crs, t) ← Messy-obf-branch($1^n$) and every key pk, FindMessy(t, pk) outputs a messy branch with overwhelming probability, because the probability that both branches lie outside $D$ is at most $(1/(4q))^m \le (1/4)^m = \mathrm{negl}(n)$.

From the above, we conclude that the obfuscator for the LWE-based encryption scheme is a slightly relaxed dual-mode cryptosystem. On this basis, we obtain an OT protocol. The protocol operates in either of the obf-branches, which are obfuscations of the dual-mode encryption branches. The OT protocol based on the dual-mode cryptosystem securely realizes the functionality $F_{OT}$. Therefore, our LWE-based oblivious transfer protocol from indistinguishability obfuscation is secure.

6 Conclusions

Most existing OT protocols are based on the hardness of number-theoretic problems. In this paper, we propose a secure LWE-based oblivious transfer protocol from iO, and give the proof of security. In addition to iO, our protocol is based on LWE-based dual-mode encryption, which is a framework for efficient and composable oblivious transfer. Thus, our protocol can realize the oblivious transfer functionality. Compared with the protocol of Zheng et al. [17], our protocol is secure in the quantum environment. At present, the punctured programs technique for applying iO is gradually becoming a central tool in cryptography, so we would like to use this technique to build another secure, efficient, and succinct oblivious transfer protocol.

Acknowledgments

This study was supported by the National Key R&D Program of China under Grant No.2017YFB0802000, the National Natural Science Foundations of China under Grant Nos.61972457, 61672412, 61402015, U1736111, the National Cryptography Development Fund under grant No.MMJJ20170104, the MOE Layout Foundation of Humanities and Social Sciences under Grant 19YJA790007, and the Research Program of Baoji University of Arts and Sciences under Grant No.ZK2018093. I acknowledge the anonymous reviewers for their valuable comments.

References

[1] S. Badrinarayanan, E. Miles, A. Sahai, and M. Zhandry, “Post-zeroizing obfuscation: New mathematical tools, and the case of evasive circuits,” in Advances in Cryptology, pp. 764–791, 2016.
[2] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im)possibility of obfuscating programs,” in Advances in Cryptology, pp. 1–18, 2001.
[3] N. Bitansky and V. Vaikuntanathan, “Indistinguishability obfuscation from functional encryption,” in Proceedings of the IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS’15), pp. 171–190, 2015.
[4] Y. L. Chen, C. Gentry, and S. Halevi, “Cryptanalyses of candidate branching program obfuscators,” in Advances in Cryptology, pp. 278–307, 2017.
[5] R. Fernando, P. M. R. Rasmussen, and A. Sahai, “Preventing CLT attacks on obfuscation with linear overhead,” in Advances in Cryptology, pp. 242–271, 2017.
[6] S. Garg, C. Gentry, and S. Halevi, “Candidate multilinear maps from ideal lattices,” in Advances in Cryptology, pp. 1–17, 2013.
[7] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, “Candidate indistinguishability obfuscation and functional encryption for all circuits,” in IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 40–49, 2013.
[8] C. Gentry, C. Peikert, and V. Vaikuntanathan, “Trapdoors for hard lattices and new cryptographic constructions,” in Proceedings of 40th Annual ACM Symposium on Theory of Computing, pp. 197–206, 2008.
[9] H. Lin, “Indistinguishability obfuscation from constant-degree graded encoding schemes,” in Advances in Cryptology, pp. 28–57, 2016.
[10] E. Miles, A. Sahai, and M. Zhandry, “Annihilation attacks for multilinear maps: Cryptanalysis of indistinguishability obfuscation over GGH13,” in Advances in Cryptology, pp. 629–658, 2016.
[11] C. Peikert, V. Vaikuntanathan, and B. Waters, “A framework for efficient and composable oblivious transfer,” LNCS, vol. 5444, no. 72, pp. 554–571, 2009.
[12] A. Pellet-Mary, “Quantum attacks against indistinguishability obfuscators proved secure in the weak multilinear map model,” in Advances in Cryptology – CRYPTO 2018, pp. 153–183, 2018.
[13] M. O. Rabin, “How to exchange secrets by oblivious transfer,” Technical Report, 1981. (https://eprint.iacr.org/2005/187.pdf)
[14] O. Regev, “On lattices, learning with errors, random linear codes, and cryptography,” in Proceedings of 37th Annual ACM Symposium on Theory of Computing, pp. 84–93, 2005.
[15] A. Sahai and B. Waters, “How to use indistinguishability obfuscation: Deniable encryption, and more,” Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 475–484, 2014.
[16] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Review, vol. 41, no. 2, pp. 303–332, 1999.
[17] Y. Zheng, W. Mei, and F. Xiao, “Secure oblivious transfer protocol from indistinguishability obfuscation,” The Journal of China Universities of Posts and Telecommunications, vol. 23, no. 3, pp. 1–10, 2016.

Biography

Shanshan Zhang received her B.S. degree in 2004 from Baoji University of Arts and Sciences, and received her M.S. degree in 2007 from Huaibei Normal University. Now she is a PhD student in Xidian University. Her main research interests include public key cryptography and indistinguishable obfuscation.

International Journal of Network Security, Vol.22, No.5, PP.809-814, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).11) 809

Verifiable Secret Sharing Based On Micali-Rabin’s Random Vector Representations Technique

Haiou Yang and Youliang Tian (Corresponding author: Youliang Tian)

School of Computer Science and Technology, Guizhou University Guizhou Province, Guiyang, China (Email: [email protected]) (Received Mar. 24, 2019; Revised and Accepted Nov. 16, 2019; First Online Jan. 29, 2020)

Abstract

Verifiable secret sharing is a core protocol of many cryptographic systems and is widely used for secure communication in network environments. Much research exists on verifiable secret sharing for threshold structures, which lack generality and flexibility compared with general access structures. However, it is difficult to realize a verifiable secret sharing scheme for general access structures. Existing generalized verifiable secret sharing schemes are few and have low efficiency. In this paper, we propose a new verifiable secret sharing scheme for general access structures. We use a knowledge commitment scheme based on bilinear pairing to ensure the security and concealment of public information, and adopt Micali-Rabin's random vector representations technique to improve the efficiency of the verification process. Our security and performance analysis shows that the new scheme is more efficient and practical than existing similar schemes.

Keywords: Bilinear Pairing; General Access Structure; Micali-Rabin’s Random Vector Representations Technique; Verifiable Secret Sharing

1 Introduction

Secret sharing is a basic protocol for constructing cryptographic schemes such as secure multiparty computation and digital signatures. It is mainly used for the distribution, preservation and reconstruction of secrets and keys (or other secret information), to prevent information from being lost, damaged or tampered with. The fundamental idea of secret sharing is that a secret is divided into multiple parts by one dealer and shared among different participants; some subsets of participants can reconstruct the secret, while others cannot reconstruct it and get no information about it. Shamir [10] and Blakley [2] first proposed the (t,n) threshold secret sharing scheme, based on Lagrange interpolation polynomials and mapping geometry theory respectively. Asmuth [1] proposed a threshold secret sharing scheme based on the Chinese Remainder Theorem (CRT). Halpern and Teague [6] combined game theory with secret sharing and proposed the concept of rational secret sharing for the first time, in which all participants are rational rather than honest or malicious. TIAN [12] analyzed the distribution mechanism and reconstruction mechanism of secret sharing under the framework of game theory, and studied the problem of one secret sharing based on a Bayesian game, which solved the cooperation problem of this kind of rational secret sharing system. But none of these schemes can properly detect and prevent the malicious behavior of dealers and participants. To address possible dishonesty among participants, Chor et al. [3] first proposed the concept of verifiable secret sharing (VSS), based on the large integer factorization problem. Stadler [11] improved Chor's scheme and proposed Publicly Verifiable Secret Sharing (PVSS) schemes based on the discrete logarithm. TIAN [13] constructed a non-interactive publicly verifiable secret sharing scheme using bilinear pairings on elliptic curves, and its information rate reached 2/3. Jhanwar [4] proposed a PVSS scheme and provided a formal proof for the IND-secrecy of his scheme, based on the (t, n)-multi-sequence of exponents Diffie-Hellman assumption. After that, many results have been obtained in the research on secret sharing schemes. However, almost all of the above schemes are designed for threshold structures, and threshold secret sharing is only a special case of generalized secret sharing.

Threshold structures lack flexibility and are not applicable in some specific scenarios compared with general access structures. For instance, suppose the dealer shares a secret among participants $U_1, U_2, U_3, U_4$ and specifies that the two subsets of participants $\{U_1, U_2\}$ and $\{U_2, U_3, U_4\}$ can reconstruct the secret. A (t,n) threshold secret sharing scheme is no longer applicable under this circumstance. In order to study the secret sharing of general access structures with wider applicability, Ito et al. [8] first proposed a secret sharing scheme based on a general access structure, in which the cooperation of the participants of any authorized subset can reconstruct the secret. Harn et al. [7] applied integer programming to the generalized secret sharing scheme. They stipulate authorized subsets and non-authorized subsets and try to find a reasonable share allocation so as to achieve the general access structure with the traditional (t,n) threshold scheme, but the scheme is expensive and has low computational efficiency. In the existing secret sharing schemes for general access structures, either the correctness of secret shares cannot be verified, or the computational overhead is increased to achieve the verifiability of secret shares.

Therefore, we propose an efficient generalized verifiable secret sharing scheme based on Micali-Rabin's random vector representations technique. In view of the complexity of the verification process and the low probability of verifying the correctness of the computing result, we adopt Micali-Rabin's random vector representation technique, based on Zero-Knowledge Proofs (ZKPs), proposed by Micali and Rabin [9]. Zero-knowledge proofs, proposed by Goldwasser [5], are one of the most remarkable innovations in information security. They refer to the ability of a prover to convince a verifier that a statement is true without providing any useful information to the verifier, and have been widely used in the field of information security. Rabin [9] developed a novel, secure and highly efficient way of verifying the correctness of the output of a transaction while keeping input values secret, based on ZKPs. Xin [15] proposed a new fair and rational delegation computation. Aiming at the complexity of the verification problem, they adopted Micali-Rabin's random vector representation technique. Consequently, ZKPs are used to prove the correctness of the computing results, which provides a new direction for our research.

In this paper, a new generalized verifiable secret sharing (GVSS) scheme is proposed. The contributions of our proposed GVSS are as follows. First, we propose a GVSS scheme based on the difficulty of the Diffie-Hellman problem of bilinear pairings. In this scheme, secret shares are chosen by the participants themselves, effectively avoiding deception by the dealer. Also, compared with threshold schemes, this scheme can specify any authorized subsets to reconstruct the secret, which greatly increases the flexibility of secret sharing and expands the application scenarios of the scheme. Second, on account of the complexity of the verification phase, we adopt Micali-Rabin's random vector representation technique; that is, the secret shares are represented using the knowledge commitment scheme for bilinear pairing. To verify the correctness of secret shares, it is only necessary to execute an efficient process according to the public information on the bulletin board.

The rest of this paper is organized as follows. In Section 2 we give the relevant background. Our proposed GVSS is described in Section 3. In Section 4, we give the security analysis of our proposed GVSS and the comparison of performance between our proposed GVSS and the VSS schemes proposed by ZHANG [16] and Tsu-Yang [14]. The conclusion is given in Section 5.

2 Preliminaries

2.1 Bilinear Pairing

Let $G_1$ be an additive cyclic group and $G_2$ a multiplicative cyclic group, both of order $q$, where $q$ is a large prime. Assume that the discrete logarithm problems on $G_1$ and $G_2$ are difficult. A map $e : G_1 \times G_1 \to G_2$ with the following properties is called a bilinear pairing:

1) Bilinear: $e(aP, bQ) = e(P, Q)^{ab}$ for all $P, Q \in G_1$ and $a, b \in \mathbb{Z}_q^*$.

2) Non-degenerate: There exist $P, Q \in G_1$ such that $e(P, Q) \neq 1$.

3) Computable: For all $P, Q \in G_1$, there exists an efficient algorithm to compute $e(P, Q)$.

2.2 Knowledge Commitment Scheme Based on Bilinear Pairing

Let $P, Q \in G_1$ be two generators of group $G_1$. Nobody knows the discrete logarithm between $P$ and $Q$ (that is, nobody knows the $n \in \mathbb{Z}_q^*$ such that $Q = nP$). To commit to $s \in \mathbb{Z}_q^*$, we just have to compute the commitment $COM(s) = e(P, Q)^s$. When revealing the commitment, only $s$ needs to be disclosed, and the verifier can check whether the commitment revealed by the dealer is correct according to $COM(s) = e(P, Q)^s$.

2.3 Micali-Rabin’s Random Vector Representations

We adopt the knowledge commitment scheme based on bilinear pairing by Tian [?], and equality is proved by zero-knowledge proofs. The properties of the bilinear pairing satisfy the above assumption, $F_q$ is a finite field and $q$ is a large prime number of 128 bits.

Definition 1. A random vector representation of $x$ is a vector $X = (u, v)$, where $u, v \in \mathbb{Z}_q^*$, $u$ is randomly chosen, and $v = (x - u) \bmod q$. The value of the vector $X$ is $val(X) = (u + v) \bmod q$.

Definition 2. The commitment to a vector $X = (u, v)$ is $COM(X) = (COM(u), COM(v))$, where $COM(u) = e(P, Q)^u$, $COM(v) = e(P, Q)^v$.

Definition 3. A list of commitments $COM(X^{(j)})$, $1 \le j \le m$, is called value consistent if $val(X^{(j)}) = val(X^{(j+1)})$ for any $1 \le j < m$.

To prove that $COM(X)$, $COM(Y)$ are value consistent, where $X = (u_1, v_1)$, $Y = (u_2, v_2)$, we prove $val(X) = val(Y)$. Note that $val(X) = val(Y)$ if and only

if there exists $w \in \mathbb{Z}_q$ such that $X = Y + (-w, w)$ (if such a $w$ does not exist, the values are inconsistent). A challenge $c \leftarrow \{1, 2\}$ is chosen at random. If $c = 1$, the prover reveals $u_1, u_2, -w$ to the verifier. The verifier computes $COM(u_1)$, $COM(u_2)$ and compares them with the posted first coordinates of $COM(X)$, $COM(Y)$. The verifier next checks that $u_1 = u_2 - w$ holds. Likewise, if $c = 2$, the prover reveals $v_1, v_2, w$ to the verifier. The verifier computes $COM(v_1)$, $COM(v_2)$ and compares them with the posted second coordinates of $COM(X)$, $COM(Y)$. The verifier next checks that $v_1 = v_2 + w$ holds. Clearly, the verifier accepts a false claim with probability at most $\frac{1}{2}$.

Lemma 1. If more than $k$ commitments are false, then the probability that the verifier accepts is at most $(\frac{1}{2})^k$.

Proof. The probability that any single inconsistency $val(X^{(j)}) \neq val(X^{(j+1)})$ is not detected is at most $\frac{1}{2}$. So, with each challenge $c \leftarrow \{1, 2\}$ chosen at random, the probability that at least $k$ false claims all go undetected is at most $(\frac{1}{2})^k$.

3 Scheme

This scheme assumes that the dealer D needs to share the secret $s$ among $n$ participants. The scheme also assumes the existence of a secure bulletin board (SBB), which is used to publish data; data cannot be deleted or modified once it is published. The published data is visible to the dealer and all participants. The scheme includes a Distribution Phase, a Verification Phase and Secret Reconstruction.

Initialization: Assume that $F_q$ is a finite field and $q$ is a large prime number, $G_1$ is an additive cyclic group and $G_2$ a multiplicative cyclic group, both of order $q$, and $P, Q \in G_1$ are two generators of group $G_1$. The properties of the bilinear pairing satisfy the above assumption, and there is an efficient algorithm for the map $e : G_1 \times G_1 \to G_2$. $H : G_2 \to \mathbb{Z}_q^*$ is a collision-resistant hash function.

The secret distributor specifies the authorized subsets. Assume that the participant set is $P = \{P_1, P_2, \cdots, P_n\}$, $\Gamma_0 = \{\delta_1, \delta_2, \cdots, \delta_t\}$ is the minimum access structure, and $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ is an authorized subset, where $|\delta_j|$ is the number of members in $\delta_j$. The secret distributor is SD, the secret reconstructor is SR, and the shared secret is $s$.

3.1 Distribution Phase

Step 1. Each participant $P_{ij}$ randomly chooses $s_{ij} \in \mathbb{Z}_q^*$ and computes $R_{ij} = e(P, Q)^{s_{ij}}$. It keeps $s_{ij}$ secret and delivers $R_{ij}$ to SD.

Step 2. SD randomly selects $s_0$ and computes $R_0 = e(P, Q)^{s_0}$. Then it chooses $a \in \mathbb{Z}_q^*$ randomly and constructs a first-degree polynomial $f(x) = (s + ax) \bmod q$. It also chooses $t$ different random numbers $d_1, d_2, \cdots, d_t$ to represent the $t$ authorized subsets in $\Gamma_0$ respectively. Next, SD computes $f(1)$, and for each authorized subset $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ in $\Gamma_0$ computes $H_j = f(d_j) \oplus H(R_{1j}^{s_0}) \oplus H(R_{2j}^{s_0}) \oplus \cdots \oplus H(R_{|\delta_j|j}^{s_0})$. Finally, it publishes $R_0, f(1), H_1, H_2, \cdots, H_t, d_1, d_2, \cdots, d_t$ on the SBB.

3.2 Verification Phase

All participants of any authorized subset $\delta_j$ can cooperate to reconstruct the secret $s$. Assume that the participants of $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ reconstruct the secret $s$.

Step 3. Each participant $P_{ij}$ computes $R'_{ij} = R_0^{s_{ij}}$ based on the public information $R_0$ on the SBB. Each participant $P_{ij}$ then posts on the SBB $3k$ rows for $R'_{ij}$: $COM(R'^{(h)}_{1j}), \cdots, COM(R'^{(h)}_{|\delta_j|j})$, $1 \le h \le 3k$. The $3k$ rows on the SBB use Micali-Rabin's random vector representations technique: $COM(R'^{(h)}_{ij}) = (COM(u'^{(h)}_{ij}), COM(v'^{(h)}_{ij}))$, where $R'^{(h)}_{ij} = (u'^{(h)}_{ij}, v'^{(h)}_{ij})$ and $val(R'^{(h)}_{ij}) = (u'^{(h)}_{ij} + v'^{(h)}_{ij}) \bmod q$, $1 \le h \le 3k$.

Step 4. To begin with, half of the commitments are randomly chosen from the $3k$ rows of $R'_{ij}$, and $P_{ij}$ secretly reveals $R'_{ij}$ and the corresponding representations $R'^{(h)}_{ij} = (u'^{(h)}_{ij}, v'^{(h)}_{ij})$ to SR.
Next, the value of $c \leftarrow \{1, 2\}$ is determined by flipping a coin, and one coordinate of each remaining commitment is opened. If $c = 1$, $P_{ij}$ secretly reveals the openings of $COM(u'^{(h)}_{ij})$, $COM(u'^{(h+1)}_{ij})$ and $-w$ to SR, where $w = (u'^{(h+1)}_{ij} - u'^{(h)}_{ij}) \bmod q$. If $c = 2$, $P_{ij}$ secretly reveals the openings of $COM(v'^{(h)}_{ij})$, $COM(v'^{(h+1)}_{ij})$ and $w$ to SR, where $w = (v'^{(h)}_{ij} - v'^{(h+1)}_{ij}) \bmod q$.

Step 5. First, SR privately receives the representations $R'^{(h)}_{ij} = (u'^{(h)}_{ij}, v'^{(h)}_{ij})$ and $R'_{ij}$ sent by $P_{ij}$. SR verifies that the equation $val(R'_{ij}) = (u'^{(h)}_{ij} + v'^{(h)}_{ij}) \bmod q$ holds.
Next, SR performs a value consistency check on the remaining commitments. If $c = 1$, SR checks the opened values against the posted commitments $COM(u'^{(h)}_{ij})$, $COM(u'^{(h+1)}_{ij})$, and then verifies that the equation $u'^{(h)}_{ij} = (u'^{(h+1)}_{ij} + (-w)) \bmod q$ holds. If $c = 2$, SR checks the opened values against the posted commitments $COM(v'^{(h)}_{ij})$, $COM(v'^{(h+1)}_{ij})$, and then verifies that the equation $v'^{(h)}_{ij} = (v'^{(h+1)}_{ij} + w) \bmod q$ holds. Since only one coordinate of each commitment is opened at a time, by Lemma 1 the probability that SR accepts a false equation is at most $\frac{1}{2}$; if more than $k$ commitments are false, the probability that SR accepts is at most $(\frac{1}{2})^k$.
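The value consistency check of Section 2.3, which Steps 4 and 5 rely on, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the scheme's actual instantiation: no pairing library is used, and the pairing value $e(P, Q)$ is replaced by a hypothetical stand-in `G`, an element of prime order `Q` in $\mathbb{Z}_p^*$ with toy parameters. The names `represent`, `prove` and `verify` are introduced here for illustration, and the convention $X = Y + (-w, w)$ is assumed so that the checks read $u_1 = u_2 - w$ and $v_1 = v_2 + w$.

```python
import secrets

# Toy stand-in for e(P, Q): G has prime order Q in Z_P^*, with P_MOD = 2Q + 1.
# These small values are illustrative assumptions and offer no security.
P_MOD, Q, G = 2039, 1019, 4

def commit(x):
    """COM(x) = e(P, Q)^x, here modelled as G^x mod P_MOD."""
    return pow(G, x % Q, P_MOD)

def represent(x):
    """Random vector representation X = (u, v) with u + v = x (mod q)."""
    u = secrets.randbelow(Q)
    return (u, (x - u) % Q)

def val(X):
    """val(X) = (u + v) mod q."""
    return (X[0] + X[1]) % Q

def commit_vector(X):
    """COM(X) = (COM(u), COM(v))."""
    return (commit(X[0]), commit(X[1]))

def prove(X, Y, c):
    """Prover's answer to challenge c in {1, 2} for the claim val(X) = val(Y)."""
    w = (Y[0] - X[0]) % Q                  # X = Y + (-w, w) if the values agree
    if c == 1:
        return (X[0], Y[0], (-w) % Q)      # reveal u1, u2 and -w
    return (X[1], Y[1], w)                 # reveal v1, v2 and w

def verify(com_X, com_Y, c, response):
    """Check the opened coordinate against the posted commitments,
    then check u1 = u2 - w (c = 1) or v1 = v2 + w (c = 2)."""
    a, b, delta = response
    i = c - 1
    if commit(a) != com_X[i] or commit(b) != com_Y[i]:
        return False
    return a == (b + delta) % Q
```

An honest prover passes both challenges. A prover whose two representations have different values can prepare an answer for only one of the two challenges, which is the source of the $\frac{1}{2}$ bound per check in Lemma 1.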

3.3 Secret Reconstruction

Step 6. SR then receives the verified $R_{ij'}$. With these values, SR can compute $H_j' = H_j \oplus H(R_{1j'}) \oplus H(R_{2j'}) \oplus \cdots \oplus H(R_{|\delta_j|j'})$. With the two coordinate points $(1, f(1))$ and $(d_j, H_j')$, SR can reconstruct $f(x) = (x f(1) - x H_j' - d_j f(1) + H_j')(1 - d_j)^{-1}$. Finally, the shared secret is recovered by computing $s = f(0) \bmod q$.

4 Scheme Analysis

4.1 Security Analysis

Theorem 1. If the $R_{ij'}$ received by the secret reconstructor SR are verified by Micali-Rabin's random vector representation technique, then $R_{ij'}$ is correct and has not been modified. Hence the new scheme is verifiable.

Proof. In the verification phase, by adopting the Micali-Rabin random vector representation technique, each participant $P_{ij}$ commits $3k$ rows $COM(R_{1j'}^{(h)}), \cdots, COM(R_{|\delta_j|j'}^{(h)})$, $1 \le h \le 3k$, to the $R_{ij'}$ on the SBB. SR then verifies half of the commitments $u_{ij'}^{(h)}$, $v_{ij'}^{(h)}$ and the $R_{ij'}$ sent by $P_{ij}$ (that is, it verifies that $val(R_{ij'}^{(h)}) = (u_{ij'}^{(h)} + v_{ij'}^{(h)}) \bmod q$ holds). If verification fails, SR rejects the $R_{ij'}$ sent by the dealer. If it succeeds, SR performs a value consistency check on the remaining commitment values, that is, it verifies $COM(u_{ij'}^{(h)}) = (COM(u_{ij'}^{(h+1)}) + (-w)) \bmod q$ or $v_{ij'}^{(h)} = (v_{ij'}^{(h+1)} + w) \bmod q$. According to Lemma 1, if more than $k$ commitment values are wrong, the probability of SR accepting the wrong results is $(\frac{1}{2})^k$. To sum up, the $R_{ij'}$ verified by Micali-Rabin's random vector representation technique is correct and has not been modified, so the scheme is verifiable.

Theorem 2. The knowledge commitment scheme based on bilinear pairing meets the requirements of complete hiding and computational binding.

Proof. Assume that there exists $s' \in Z_q^*$ with $s' \ne s$ such that $COM(s') = COM(s)$ (that is, the dealer can open the commitment in two ways). Write $s = s' + t$ with $0 < t < q$, so that $e(P,Q)^{s'} = e(P,Q)^{s}$. Because $P, Q \in G_1$ are two generators of the group $G_1$ and $q$ is the large prime order of $G_1$, we have $qP = 0$ and $qQ = 0$ (0 is the point at infinity of $G_1$). From $e(P,Q)^{s'} = e(P,Q)^{s}$ we get $e(s'P, Q) = e(sP, Q)$, so $sP = s'P$. Hence $sP - s'P = tP = 0$ with $0 < t < q$. This contradicts the fact that $P$ has order $q$, so it must be that $s = s'$; that is, the dealer can only open the commitment in one way. So the scheme meets the requirement of computational binding.

Also, since $COM(s) = e(sP, Q) = e(P, sQ)$, it is not computationally feasible for an attacker to obtain the content of the commitment, because the computational Diffie-Hellman problem (CDHP) on bilinear pairings is hard. In conclusion, the knowledge commitment scheme based on bilinear pairing meets the requirements of complete hiding and computational binding.

Theorem 3. If the computational Diffie-Hellman problem (CDHP) on bilinear pairings is hard, then the proposed scheme is secure.

Proof. In the verification phase, attackers try to obtain $s_0$ and $s_{ij}$ from the commitment $R_0$ and the $3k$ rows of commitments $COM(R_{1j'}^{(h)}), \cdots, COM(R_{|\delta_j|j'}^{(h)})$, $1 \le h \le 3k$, posted on the SBB. To do so, they have to solve $R_0 = e(P,Q)^{s_0}$, $COM(u_{ij'}^{(h)}) = e(P,Q)^{u_{ij'}^{(h)}}$ and $COM(v_{ij'}^{(h)}) = e(P,Q)^{v_{ij'}^{(h)}}$. However, the discrete logarithm problem (DLP) on the elliptic curve and the CDHP on bilinear pairings are hard, so it is not computationally feasible to obtain the content of the commitments.

At the same time, a participant $P_{ij}$ may try to distribute a false $R_{ij'}$ to SR in the verification phase, that is, $COM(R_{ij'}') = COM(R_{ij'})$ where $R_{ij'}' \ne R_{ij'}$ and $R_{ij'}, R_{ij'}' \in Z_q^*$. However, according to Theorem 2, the knowledge commitment scheme based on bilinear pairing meets the requirement of computational binding, so the dealer can only open the commitment in one way. Therefore $P_{ij}$ cannot send a false $R_{ij'}$ to SR. Also, the secret shares $s_{ij}$ of each participant $P_{ij}$ in this scheme are chosen by the participants themselves, which avoids deception by the distributor.

4.2 Performance Analysis

This section briefly analyzes the performance of the proposed scheme by comparing it with existing schemes. $T_e$ denotes the time of executing a bilinear pairing, $T_m$ the time of executing a scalar multiplication in $G_1$, $T_{exp}$ the time of executing an exponentiation in $G_2$, and $T_p$ the time of computing a polynomial value. The time of a modular addition in $Z_q^*$ and of a one-way hash function is negligible compared with $T_e$ and $T_m$. Therefore, we consider only the time-consuming operations $T_e$, $T_m$, $T_{exp}$ and $T_p$; other computational overhead is ignored. The computational efficiency is shown in Table 1.

Here $|\delta_j|$ denotes the number of members in an authorized subset, hence $|\delta_j| \le n$. The performance analysis therefore shows that in our scheme, with the adoption of Micali-Rabin's random vector representation technique, the verification process is much less computationally intensive. Compared with the above two schemes, the computational cost in the distribution phase is also significantly improved.

Table 1: Comparison of computational costs

Schemes | Distribution Phase | Verification Phase | Reconstruction Phase
TIAN's scheme[8] | (3n + t)G1 | tTexp + G1 + (t + 1)Te | 3tTm + 2tTe
ZHANG's scheme[14] | (n + 1)Tm + 2tTexp | tTe + n(t + 1)Texp | tTm
Tsu-Yang's scheme[15] | nTe + (4n + t)Tm + nTexp + nTp | (n + 3)Te + n(t + 1)Tm + nTexp + ntTp | tTm
TIAN's scheme[8] | (n + 1)Te + nTexp | 6k|δj|Te + |δj|Texp | Tm
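As a small numerical companion to the reconstruction step of Section 3.3, the following Python sketch interpolates the line through $(1, f(1))$ and $(d_j, H_j')$ over $Z_q$ and returns $f(0)$. It assumes that $H_j'$ has already been recovered and that $d_j \ne 1 \bmod q$; the function name `reconstruct_secret` and the toy numbers are illustrative assumptions, not values from the paper.

```python
def reconstruct_secret(f1, d_j, H_j, q):
    """Return f(0) for the degree-1 polynomial through (1, f1) and (d_j, H_j) mod q.

    Uses f(x) = (x*f(1) - x*H_j - d_j*f(1) + H_j) * (1 - d_j)^(-1) mod q.
    """
    denom = (1 - d_j) % q
    if denom == 0:
        raise ValueError("d_j must differ from 1 modulo q")
    inv = pow(denom, -1, q)  # modular inverse, Python 3.8+

    def f(x):
        return ((x * f1 - x * H_j - d_j * f1 + H_j) * inv) % q

    return f(0)

# Toy example: hide s = 42 in f(x) = 42 + 17x over Z_101 (assumed parameters).
q, secret, slope = 101, 42, 17
f1 = (secret + slope * 1) % q
d_j = 5
H_j = (secret + slope * d_j) % q
recovered = reconstruct_secret(f1, d_j, H_j, q)
```

The sketch only checks the interpolation algebra; in the scheme itself the second point is obtained from the verified $R_{ij'}$ values through the hash function $H$.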

4.3 Simulation Analysis

For the generalized verifiable secret sharing scheme based on Micali-Rabin's random vector representation technique proposed in this paper, a simulation analysis was carried out in combination with the actual scenario. All data are averages over 10 experimental runs. The execution performance of the secret distribution process is shown in Figure 1. The execution time grows linearly with the number of people, because the number of bilinear pairing and exponentiation operations increases as the number of people increases. As can be seen from Figure 2, since the sub-secret is verified by Micali-Rabin's random vector representation technique in the verification phase, the calculation time of the secret verification process is less affected by the number of participants, and the calculation time of the secret reconstruction process does not change much with the number of people. In general, the scheme has good application value in practical application scenarios.

Figure 1: The curve of secret distribution calculations as the number of people changes

Figure 2: The curve of calculation costs as the number of people changes in the verification phase, and the curve of secret reconstruction calculations as the number of people changes

5 Conclusions

The (t,n) threshold secret sharing scheme has certain limitations in practical application, so it is of great application value to study secret sharing with a general access structure. This paper proposes a verifiable secret sharing scheme based on a general access structure. Firstly, our scheme adopts the knowledge commitment scheme based on the bilinear pairing to guarantee the concealment and security of public information. Secondly, our scheme adopts Micali-Rabin's random vector representation technique, which greatly improves the efficiency of the verification phase. Finally, the analysis of the security and performance of the scheme shows that it satisfies the features of verifiable secret sharing and is more efficient than the existing schemes. Future work is to design an efficient secret sharing scheme with a general access structure that can be publicly verified and applied to suitable application scenarios.

Acknowledgments

This work is supported by Key Projects of the Union Fund of the National Natural Science Foundation of China under Grant No. U1836205; The National Natural Science Foundation of China under Grant No. 61772008; The Science and Technology Top-notch Talent Support Project in Guizhou Province Department of Education under Grant No. Qian Education Combined KY word [2016]060; The Guizhou Province Science and Technology Major Special Plan No. 30183001; The Guizhou Provincial Science and Technology Plan Project under Grant No. [2017]5788; The Ministry of Education-China Mobile Research Fund Project under Grant No. MCM20170401; The Guizhou University Fostering Project No. [2017]5788; Research on Block Data Fusion Analysis Theory and Security Management Model of Data Sharing Application (No. U1836205); Research on Key Technologies of Blockchain for Big Data Applications (Grant No. [2019]1098).

References

[1] C. Asmuth and J. Bloom, "A modular approach to key safeguarding," IEEE Transactions on Information Theory, vol. 29, no. 2, pp. 208-210, 1983.
[2] G. R. Blakley, "Safeguarding cryptographic keys," IEEE Computer Society Digital Library, vol. 22, no. 11, pp. 612-613, 1979.
[3] B. Chor and S. Goldwasser and S. Micali and B. Awerbuch, "Verifiable secret sharing and achieving simultaneity in the presence of faults," in Foundations of Computer Science, pp. 383-395, 1985.
[4] Y. Gan and L. Wang and P. Pan and Y. Yang, "Publicly verifiable secret sharing scheme with provable security against chosen secret attacks," International Journal of Distributed Sensor Networks, pp. 1-9, 2013.
[5] S. Goldwasser and S. Micali and C. Rackoff, "The knowledge complexity of interactive proof systems," SIAM Journal on Computing, vol. 18, no. 1, pp. 186-208, 1989.
[6] J. Halpern and V. Teague, "Rational secret sharing and multiparty computation: Extended abstract," The 36th ACM Symposium on Theory of Computing, pp. 623-632, 2004.
[7] L. Harn and C. Hsu and M. Zhang and T. He and M. Zhang, "Realizing secret sharing with general access structure," Information Sciences, vol. 367-368, pp. 209-220, 2016.
[8] M. Ito and A. Saito and T. Nishizeki, "Secret sharing scheme realizing general access structure," Electronics and Communications in Japan Part III-Fundamental Electronic Science, vol. 72, no. 9, pp. 56-64, 1989.
[9] M. O. Rabin and Y. Mansour and S. Muthukrishnan and M. Yung, "Strictly-black-box zero-knowledge and efficient validation of financial transactions," in International Colloquium Conference on Automata, pp. 738-749, 2012.
[10] A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612-613, 1979.
[11] M. Stadler, "Publicly verifiable secret sharing," in Theory and Application of Cryptographic Techniques, pp. 190-199, 1996.
[12] Y. L. Tian and J. Katz J. F. Ma and C. G. Peng, "One-time rational secret sharing scheme based on Bayesian game," Wuhan University Journal of Nature Science, vol. 16, pp. 430-434, 2011.
[13] Y. Tian and C. Peng, "Publicly verifiable secret sharing schemes using bilinear pairings," International Journal of Network Security, vol. 14, no. 3, pp. 142-148, 2012.
[14] T. Wu and Y. Tseng, "A pairing-based publicly verifiable secret sharing scheme," Journal of Systems Science and Complexity, vol. 24, no. 1, pp. 186-194, 2011.
[15] Y. Xin and M. O. Rabin, "Fair and rational delegation computation protocol," Journal of Software, vol. 29, no. 7, pp. 1953-1962, 2018.
[16] F. Zhang, "Efficient and information-theoretical secure verifiable secret sharing over bilinear groups," Chinese Journal of Electronics, vol. 23, no. 1, pp. 13-17, 2014.

Biography

Haiou Yang biography. He received the B.Sc. degree in Information Management and System from Dalian Communication University in 2017. He is now a postgraduate student at Guizhou University. His research interests include security, cloud computing and cryptographic protocols.

Youliang Tian biography. He received the B.Sc. degree in Mathematics and Applied Mathematics in 2004 and the M.Sc. degree in Applied Mathematics from Guizhou University in 2009. He received the Ph.D. degree in cryptography from Xidian University in 2012. From 2012 to 2015, he was a Postdoctoral Associate at the State Key Laboratory for Chinese Academy of Sciences. He is currently a professor and Ph.D. supervisor at the College of Computer Science and Technology, Guizhou University. His research interests include algorithmic game theory, cryptography and security protocols.

International Journal of Network Security, Vol.22, No.5, PP.815-827, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).12) 815

A Privacy-Preserving Data Sharing System with Decentralized Attribute-based Encryption Scheme

Li Kang and Leyou Zhang (Corresponding author: Li Kang)

School of Mathematics and Statistics, Xidian University Xi’an, Shaanxi 710071, China (Email: li [email protected]) (Received Mar. 26, 2019; Revised and Accepted Nov. 16, 2019; First Online Jan. 29, 2020)

Abstract

The huge storage space and powerful computing power make cloud servers the best place for people to store data. However, convenience comes with the risk of data leakage, as cloud servers are not completely reliable. Decentralized attribute-based encryption (ABE) can provide a solution that ensures data confidentiality and user privacy. Nevertheless, most existing works cannot provide a complete method, since they have vulnerabilities in user collusion resilience and privacy protection. In this paper, a decentralized ciphertext-policy (CP) ABE scheme is proposed for the secure sharing of data. In the proposed scheme, authorities can issue private keys for users independently through a privacy-preserving key generation protocol, without any cooperation and without knowing users' global identifiers (GID). Additionally, the scheme supports policy anonymity. The security of the proposed scheme is reduced to the decisional bilinear Diffie-Hellman (DBDH) assumption. Theoretical analysis and performance evaluations illustrate the efficiency of the scheme.

Keywords: Cloud Storage; Data Sharing; Decentralized CP-ABE; Privacy-preserving

1 Introduction

1.1 Background

The emergence of the big data era has made cloud storage one of the hottest technologies. The cloud becomes the best choice for data users to store data, as its storage space is much larger than the local one. Cloud storage is a technology that puts data on the cloud for human access, by which users can easily access data at any time and anywhere through any connected device. However, it also brings some challenges, such as data confidentiality.

In many cases, data owners are reluctant to expose their data stored in the cloud server to all users, as the data may be sensitive, such as the personal health record (PHR). A PHR is a summarized electronic record of an individual's medical data and information, which is often exchanged through cloud servers. However, the record may contain sensitive information about consumers, such as allergies, diseases, etc. Placing the raw PHR data directly on an unreliable third-party server will threaten the owner's privacy. Thus, to protect data confidentiality, an effective approach is for the owner to encrypt the data before uploading it to the cloud server.

In 2005, Sahai and Waters [30] proposed the concept of attribute-based encryption (ABE) to support fine-grained access control, in which the user can successfully recover the received ciphertext only when the attributes of the ciphertext match the decryption keys. Since then, many studies [4,22,25] have been proposed. The multi-authority ABE (MA-ABE) [5] was introduced to reduce the burden on the central authority (CA) and to divide its powers. In 2011, the decentralized ABE scheme [18] was proposed by Lewko and Waters as a new MA-ABE scheme. Neither a CA nor authority cooperation exists in this construction, so it is more in accord with the actual requirements for the independence of authorities. The basic properties [17,21] that all ABE schemes should satisfy are: fine-grained access control, scalability, data confidentiality, and collusion resistance.

In addition, the privacy protection of users cannot be ignored. Although the rapid development of cloud computing technology enables users to deal with a large amount of digital information, the privacy of personal data is also facing unprecedented challenges. After witnessing some information leakage incidents, people gradually realize the importance of privacy protection. They want to keep their data private while gaining legal access. Therefore, there is an urgent need for an encryption

scheme that can protect users' privacy.

In MA-ABE schemes, in order to protect data confidentiality against collusion attacks, the global identifier (GID) is usually tied to the decryption keys. Although the introduction of the global identifier in secret keys enhances their uniqueness and can resist the collusion attack of unauthorized users to a certain extent, the user's privacy will inevitably be compromised if he/she directly sends the undisguised identifier to each authority for a private key request. As shown in Figure 1, suppose that the authority $A_i$ ($i = 1, 2, 3$) manages the attribute set $\tilde{A}_i = \{a_{i,j}\}_{j=\{1,2\}}$, and the user $U$ who owns attributes $\{a_{1,1}, a_{2,1}, a_{3,1}\}$ and identifier $u$ directly submits $(u, a_{i,1})$ to $A_i$ for a secret key request. By jointly tracking the same $u$ [6], the colluding authorities can easily gather all the attributes attached to it. As a result, the user's personal information $U(u, \{a_{1,1}, a_{2,1}, a_{3,1}\})$ is thoroughly exposed to them. Therefore, hiding the GID is the primary task of constructing a privacy-preserving encryption scheme. Besides, the access policy also needs to be hidden because it indicates the legal recipient. Malicious users can infer the attributes of the recipient based on the access policy, which also compromises user privacy. Based on this, a secure decentralized scheme dedicated to privacy protection is proposed.

Figure 1: Colluding authorities gather information about the user

1.2 Related Work

Many extended schemes [1, 29, 35] were derived from the original ABE scheme [30]. In general, ABE comes in two categories: key-policy ABE (KP-ABE) [11] and ciphertext-policy ABE (CP-ABE) [2]. The CP-ABE scheme enables data owners to achieve absolute control over their data and decide who can and cannot access the data, since the access policy is determined by them. In a data encryption system, considering the computational and storage costs of the authorized organization, a single-authority ABE scheme like [36] is no longer applicable. Since there is only one trusted central authority (CA) to manage all users, the computational and communication costs of the CA undoubtedly increase. More notably, the excessive power of the CA is a potential risk because it has all users' information and decryption keys, so that it can access all encrypted files. Once the CA is corrupted, the confidentiality of the data and the user privacy will be compromised. In addition, there are many categories of attributes in the real world, so it is impractical to have only one authority to manage them.

In 2007, Chase [5] put forward a multi-authority ABE (MA-ABE) scheme to weaken the power of the central authority (CA). Instead of a single authority, there are many authorities in the scheme to issue private keys for users. While sharing the pressure of a single authority, multiple authorities differentiate its rights and provide a more secure and effective management mode. However, there is still a CA to manage the other attribute authorities, and the multiple authorities need to cooperate to initialize the system. Chase and Chow [6] first introduced the concept of privacy protection and proposed a privacy-preserving MA-ABE scheme, which employed an anonymous key issuing protocol to hide the user's GID and a distributed pseudorandom function (PRF) to get rid of the CA. Later, some privacy-preserving MA-ABE schemes for PHR systems were proposed [19,27,32]. These schemes employed the distributed PRF technique in [6] to protect the GID information. In these schemes, multiple authorities manage disjoint sets of attributes. Though a CA is not required, there is still a partnership among the authorities: every two authorities $(A_k, A_j)$ must perform a 2-party key exchange to share the PRF seed $s_{k,j} = s_{j,k}$, which increases communication and computing costs. In 2011, Lewko and Waters [18] introduced a novel MA-ABE, namely the decentralized attribute-based encryption scheme. Unlike the previous schemes, there is no CA in this scheme, and the authorities are independent of each other without any cooperation.

The first decentralized KP-ABE scheme considering user GID privacy was proposed by Han et al. [12] in 2012. Unfortunately, it turned out to be unsafe [10]. Then a series of improved schemes [28, 34] were proposed. In scheme [34], Zhang et al. modified the original scheme and came up with a scheme that is more secure against user collusion. All the schemes mentioned above exploited the technique of hiding the GID in [6] to produce the secret keys for users.

In 2015, Huang et al. [15] put forward a revocable MA-ABE scheme. In their scheme, a CA is needed to send the seed $s_k$ to each authority $A_k$ and to generate the secret key component for users. To protect user GID and attributes from being leaked, Han et al. [13] proposed a PPDCP-ABE scheme. However, Wang et al. [31] found that it could neither resist the collusion attack nor achieve attribute hiding. In 2018, the scheme [14] gave a new idea. In their proposal, there exists an identity management (IDM) entity that actually acts as a fully trusted CA, together with multiple attribute authorities. The IDM is responsible for generating a pseudonym $Pid_{U,j}$ for the user $U$ corresponding to the authority $A_j$, and generating an anonymous identity certificate ($AID_{Cred,j}$) for the user by signing the pseudonym and attribute information. The $AID_{Cred,j}$ is

Table 1: Comparisons of the proposed scheme with other MA-ABE schemes

Schemes | Central Authority | AA Cooperation | GID Anonymous | Policy Anonymous | Decryption Outsourced | Collusion Resistance (User) | Collusion Resistance (Authority)
Huang [15] | Yes | No | No | No | No | Yes | N-1
Feng [9] | Yes | Yes | Yes | Yes | No | Yes | N-2
Fan [7] | Yes | No | No | Yes | No | Yes | N-1
Hu [14] | Yes | No | Yes | No | No | Yes | N-1
Chase [6] | No | Yes | Yes | No | No | – | N-2
Qian [27] | No | Yes | Yes | No | No | – | N-1
Zhang [34] | No | No | Yes | No | No | Yes | N-1
Qian [26] | No | No | Yes | No | No | No | N-1
Feng [8] | No | No | No | Yes | No | Yes | –
Han [13] | No | No | Yes | No | No | No | N-1
Lyu [23] | No | No | Yes | No | Yes | No | N-1
Ours | No | No | Yes | Yes | Yes | Yes | N-1

sent to $A_j$ for verification, and if the verification passes, the authority generates partial secret keys for the user. During the whole process, the authority cannot learn the user's GID and attribute information, so the user's privacy is protected. This scheme is resistant to user collusion and authority collusion; however, the existence of the IDM and the collaboration between the IDM and multiple authorities suggest that further improvement is needed.

In 2017, Lyu et al. [23] proposed a decentralized scheme that achieves GID anonymity and improves efficiency by using online/offline encryption and verifiable outsourced decryption. This scheme can tolerate $N - 1$ compromised authorities. However, the authors did not address the privacy of the access policy.

In consideration of GID and policy privacy, Qian et al. [26] employed AND and OR gates on multi-valued attributes and put forward a PPDCP-ABE scheme with fully hidden policy in 2013. But a malicious user can guess the access structure with a simple test: $e(C_{i,j,1}, g) \stackrel{?}{=} e(\prod_{k \in I_c} T_{i,j}^{k}, C_{i,j,2})$. Fan et al. [7] came up with a MA-ABE scheme with a hidden policy under the q-BDHE assumption. In 2018, a new access control system [8] was proposed by Feng et al., who divided attributes into attribute names and attribute values and realized policy hiding by hiding the attribute values. In this scheme, the GID is bound to the time period, namely $H_1(GID \| Time)$, to resist the collusion attack of illegal users. In 2019, Feng et al. [9] improved the scheme [16] and proposed a privacy-preserving searchable CP-ABE scheme, which achieved attribute anonymity and collusion resistance. However, a CA is needed in this scheme to issue the random identity (RID) and user key (UK) for each user, where the RID replaces the user's real identity to achieve identity anonymity.

1.3 Contributions

As shown in Table 1, existing solutions either have weaknesses in privacy protection and collusion resistance, or require a CA or cooperation among multiple authorities to generate private keys for users. To improve user privacy and data confidentiality, a privacy-preserving decentralized CP-ABE scheme is proposed, in which the CA is no longer needed and each authority can independently issue a partial secret key for the user. The proposed scheme supports both user anonymity and collusion resistance, and a proxy server is introduced to undertake partial decryption computation. The main contributions of this paper are listed below.

• A decentralized CP-ABE scheme is presented to cater to the requirements of a distributed system where there is no central authority and multiple attribute authorities can independently manage users' attributes and generate secret keys for them.

• By using the privacy-preserving key generation protocol (PPKeyGen) and composite order bilinear groups, we hide both the GID and the access policy, thus providing stronger privacy protection.

• The proposed construction can resist collusion attacks of unauthorized users and tolerate $N - 1$ corrupted authorities ($N$ is the number of attribute authorities involved in encryption), which provides a guarantee for data confidentiality.

• Theoretical analysis and experimental simulations are given to support the security and efficiency of the proposed scheme.

1.4 Organization

The rest of the paper is organized as follows. In Section 2, some preliminaries involved in this paper are listed. The system model and security model are described in Section 3. In Section 4, the proposed scheme is presented in detail. Sections 5 and 6 show the security analysis and performance analysis, respectively. A brief conclusion is given in Section 7.

2 Preliminaries

2.1 Composite Order Bilinear Groups

Assume that $G$ and $G_T$ are two cyclic groups with the same order $N = p p_1$, where $p$ and $p_1$ are two different primes, and $e : G \times G \to G_T$ is a bilinear map. $G_p$ and $G_{p_1}$ are subgroups of the group $G$ having order $p$ and $p_1$, respectively. The map $e$ satisfies the following features:

1) Bilinearity: $\forall f, h \in G$, $\forall u, v \in Z_N$, the equation $e(f^u, h^v) = e(f, h)^{uv}$ holds.

2) Non-degenerate: $\exists f, h \in G$ such that $e(f, h)$ has order $N$ in the group $G_T$.

3) Orthogonality: Let $g_p$ and $g_{p_1}$ be generators of the groups $G_p$ and $G_{p_1}$, respectively. Notice that $e$ is efficiently computable, and it satisfies another property: $e(g_p, g_{p_1}) = 1$.

2.2 Decisional Bilinear Diffie-Hellman (DBDH) Assumption

Suppose $g$ is a generator of the group $G$, and $a, b, c, z$ are random numbers selected from $Z_p$. The DBDH assumption holds if, given a tuple $(A, B, C, Z) = (g^a, g^b, g^c, Z)$, no algorithm $\mathcal{B}$ can distinguish $Z = e(g, g)^{abc}$ from $Z = e(g, g)^z$ in polynomial time with non-negligible advantage. The advantage of $\mathcal{B}$ is defined as:

$$Adv_{\mathcal{B}}^{DBDH} = |\Pr[\mathcal{B}(A, B, C, e(g, g)^{abc}) = 1] - \Pr[\mathcal{B}(A, B, C, e(g, g)^{z}) = 1]|.$$

2.3 Access Structure

The AND-gate on multi-valued attributes is applied in this scheme as the access policy. Suppose $U = [a_1, a_2, \cdots, a_n]$ is an attribute set, and each attribute $a_i \in U$ has $n_i$ possible values $V_i = \{v_{i,1}, v_{i,2}, \cdots, v_{i,n_i}\}$. Let $S = [S_1, S_2, \cdots, S_n]$ be a user's attribute set where $S_i \in V_i$, and $W = [W_1, W_2, \cdots, W_n]$ be an access structure where $W_i \in V_i$. The symbol $S \models W$ indicates that the attribute set $S$ satisfies the access structure $W$ (namely, $S_i = W_i$ for every $i$), and $S \not\models W$ indicates the opposite.

2.4 Commitment

The commitment scheme adopted in this construction is the Pedersen commitment scheme [24], which is a perfectly hiding scheme. It is stated as follows. Assume $G$ is a prime-order group with generators $g_0, g_1, \cdots, g_k$. To commit to messages $(m_1, m_2, \cdots, m_k)$, the user picks $r \in_R Z_p$ and calculates $T = g_0^r \prod_{j=1}^{k} g_j^{m_j}$. The number $r$ is used by the user to decommit the commitment $T$ when needed.

2.5 Zero-Knowledge Proof

A zero-knowledge proof (ZKP) is a kind of interactive proof in which the prover proves some knowledge to the verifier without leaking it. The ZKP scheme proposed by Camenisch and Stadler [3] is briefly described as follows. Take the following formula as an example,

$$PoK\{(\alpha, \beta, \gamma) : y = g^{\alpha} h^{\beta} \wedge y_0 = g_0^{\alpha} h_0^{\gamma}\},$$

where $\{g, h\}$ and $\{g_0, h_0\}$ are generators of the groups $G$ and $G_0$, respectively. The values $\alpha$, $\beta$ and $\gamma$ are the knowledge that needs to be proven, and the remaining values are used by the verifier to check the equations.

3 Definition and Security Model

3.1 System Model

The system architecture is shown in Figure 2, where there are 5 entities: Data Owner, N Attribute Authorities, Cloud Storage Server, Proxy Server and Data User.

Data Owner: The owner encrypts the data file under his/her own specified access policy, then uploads the ciphertext to the cloud server.

Attribute Authorities: Authorities are responsible for generating secret keys for users. They are independent of each other and manage disjoint attribute sets. The authorities are not entirely credible, as they will try to collude with other entities to gather useful information in order to obtain illegal profits.

Cloud Storage Server: The server is assumed to be an honest-but-curious entity that stores the encrypted data files uploaded by data owners and provides an access channel for the users. It acts normally most of the time but may attempt to collect as much information as possible.

Proxy Server: The proxy server owns strong computing power, and it only provides a partial decryption computing service for users.

Data User: The user can issue secret key queries to the authorities and download any ciphertext on the cloud server. However, the data can be recovered only when the user's attributes satisfy the access structure. Users are assumed to be distrusted, and they have a tendency to conspire with other entities to illegally access the data file.

3.2 Outline of Decentralized ABE Scheme

Let $A_1, A_2, \cdots, A_N$ represent $N$ attribute authorities. Each authority $A_k$ monitors a disjoint attribute set $\tilde{A}_k$, namely, $\tilde{A}_k \cap \tilde{A}_j = \emptyset$ ($k, j \in [1, N] \wedge k \ne j$). $\tilde{L}$ represents the list of attributes of the user $U$.

A decentralized ABE scheme has five algorithms as follows:

Global Setup($1^{\lambda}$): Taking $\lambda$ as a security parameter, the algorithm outputs the system parameters $PP$.

Authority Setup($PP$): Taking the system parameters $PP$ as input, each authority $A_k$ executes this algorithm to generate its public-secret key pair $(PK_k, SK_k)$.

KeyGen($PP, SK_k, GID, \tilde{A}_u^k$): Taking as input the system parameters $PP$, the secret keys $SK_k$, the user's GID and the attribute set $\tilde{A}_u^k$, where $\tilde{A}_u^k = \tilde{A}_k \cap \tilde{L}$, authority $A_k$ performs this algorithm and returns the secret keys $SK_U^k$ for the user.

Figure 2: System model of the proposed scheme

Encrypt($PP, PK_k, M, W$): With the system parameters $PP$, public keys $PK_k$, message $M$ and access policy $W$ as input, the encryption algorithm outputs $CT$ as the corresponding ciphertext.

Decrypt($PP, GID, SK_U^k, CT$): This algorithm consists of two phases: Pre-Decrypt and User-Decrypt. The first phase is performed by the proxy server and the second is executed by the data user. The user first modifies the obtained secret keys $SK_U^k$ to construct new keys $SK_U'^{k}$, and then sends $(SK_U'^{k}, CT)$ to the proxy server for partial decryption calculations. Finally, the user performs the remaining decryption calculations with the result $T^*$ returned by the proxy server.

3.3 Security Model

The security model is defined by a security game between an adversary $\mathcal{A}$ and a challenger $\mathcal{B}$.

Initialization: $\mathcal{A}$ provides an access policy $W^*$ that he/she wants to challenge and a list of corrupted authorities $C_A$ ($|C_A| < N$) to $\mathcal{B}$.

Global Setup: $\mathcal{B}$ executes this algorithm and returns the system parameters $PP$ to $\mathcal{A}$.

Authority Setup: Two cases are discussed here:

1) For the corrupted authorities, namely $A_k \in C_A$, $\mathcal{B}$ performs the authority setup algorithm and returns the public-secret key pair $(PK_k, SK_k)$ to $\mathcal{A}$.

2) For the uncorrupted authorities, namely $A_k \notin C_A$, $\mathcal{B}$ performs the authority setup algorithm and sends the public keys $PK_k$ to $\mathcal{A}$.

Phase 1: $\mathcal{A}$ initiates a private key query to $\mathcal{B}$ by submitting a list of attributes $\tilde{L}$. The only limitation is that the submitted list of attributes does not satisfy the access policy to be challenged, namely, $\tilde{L} \not\models W^*$. Then $\mathcal{B}$ executes the algorithm KeyGen and outputs the corresponding secret keys to $\mathcal{A}$.

Challenge: The adversary $\mathcal{A}$ provides $M_0$ and $M_1$ to the challenger $\mathcal{B}$, where $|M_0| = |M_1|$. Then $\mathcal{B}$ picks a random number $\xi \in \{0, 1\}$. The encryption algorithm is executed to encrypt the message $M_{\xi}$ under the access policy $W^*$ and outputs the ciphertext $CT^*$ accordingly. Then $\mathcal{B}$ returns the ciphertext $CT^*$ to the adversary $\mathcal{A}$.

Phase 2: Same as Phase 1.

Guess: $\mathcal{A}$ returns $\xi'$ as a guess on $\xi$. If $\xi' = \xi$, $\mathcal{A}$ wins the game.

Definition 1. If there is no $t$-time adversary who can make $q$ secret key queries and win the above game with a non-negligible advantage, then the scheme is $(t, q, \epsilon)$ secure in the selective model.

4 Scheme Construction

4.1 Decentralized ABE Scheme

Global Setup($1^{\lambda}$): Taking the security parameter $\lambda$ as input, this algorithm produces two cyclic groups $G, G_T$ with the same order $N = p p_1$, where $p$ and $p_1$ are two different primes, and a bilinear map $e : G \times G \to G_T$. Suppose that $G_p$ and $G_{p_1}$ are subgroups of $G$ with order $p$ and $p_1$, respectively. Let $g_{p_1}$ be a generator of $G_{p_1}$ and $g_p, g_0, g_1$ be generators of $G_p$. $H : \{0, 1\}^* \to Z_N$ is a hash function. The public parameters are $PP = (e, p, p_1, g_p, g_0, g_1, g_{p_1}, H, G, G_T)$.

Authority Setup($PP$): Let $\tilde{A}_k$ be the attribute set monitored by authority $A_k$. Each $A_k$ randomly selects $\alpha_k, \beta_k, a_{k,i,j} \in_R Z_N$ and $R_k, R_{k,i,j} \in_R G_{p_1}$ ($j = [1, n_i]$). Then $A_k$ computes $Y_k = e(g_p, g_p)^{\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$ and $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$ as its public keys, namely $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$. The secret keys are $SK_k = (\alpha_k, \beta_k, \{a_{k,i,j}\}_{j=[1,n_i]})$.

KeyGen($PP, SK_k, GID, \tilde{A}_u^k$): Let $u = H(GID)$. Taking $PP$, $SK_k$ and the user information $(u, \tilde{L})$ as input, $A_k$ performs the PPKeyGen algorithm to produce the decryption keys as follows. For $v_{k,i,j} \in \tilde{A}_u^k$, $A_k$ picks $t_{k,i,j}^{u} \in_R Z_N^*$ and sets $t_{k,u} = \sum t_{k,i,j}^{u}$. Then, $A_k$ calculates:

$$D_{k,u} = g_p^{\alpha_k} g_0^{\frac{t_{k,u}}{\beta_k + u}} g_1^{u \beta_k}, \quad D_{k,i,j} = g_0^{\frac{t_{k,i,j}^{u}}{(\beta_k + u) a_{k,i,j}}},$$

where $\tilde{A}_u^k = \tilde{A}_k \cap \tilde{L}$, and $\tilde{L}$ represents an attribute list of the user.

Then, $A_k$ outputs the secret keys

$$SK_U^k = (D_{k,u}, \{D_{k,i,j}\}_{v_{k,i,j} \in \tilde{A}_u^k}).$$

Encrypt($PP, PK_k, M, W$): The owner employs the existing symmetric encryption algorithm to encrypt the data file $M$, and gets $CT' = Enc_K(M)$, where $K$ is the symmetric key. The decentralized ABE scheme is then applied to encrypt $K$ under the access structure $W$ defined by the encryptor. Finally, the owner uploads the ciphertext $\{CT, CT', CT''\}$ to the cloud server. Algorithm 1 gives the specific encryption steps. $I_c$ represents an index set of relevant authorities $A_k$.

Algorithm 1 Encrypt
1: Begin
2: Input $\{PP, PK_k, M, K, W\}$.
3: Select symmetric encryption algorithm $Enc$ and key $K$.
4: for data file $M$ do
5:   $CT' = Enc_K(M)$.
6: end for
7: for symmetric key $K$ do
8:   $CT'' = g^{H(K)}$.
9:   select $s \in_R \mathbb{Z}_N^*$, $R_1, R_2 \in_R G_{p_1}$.
10:   $C = K \prod_{k \in I_c} Y_k^s$, $C_1 = g_p^s \cdot R_1$, $C_2 = (\prod_{k \in I_c} Z_k)^s \cdot R_2$.
11:   for each attribute $v_{k,x,y} \in W$ do
12:     select $R'_{k,x,y} \in_R G_{p_1}$.
13:     $C_{k,x,y} = A_{k,x,y}^s \cdot R'_{k,x,y}$.
14:   end for
15:   $CT = (C, C_1, C_2, \{C_{k,x,y}\}_{v_{k,x,y} \in W})$.
16: end for
17: Output $\{CT, CT', CT''\}$.
18: End

Decrypt($PP, GID, SK_U^k, CT$): Any user can download the ciphertext $\{CT, CT', CT''\}$ from the cloud server for decryption. To reduce the computational burden, the user can outsource some decryption operations to the proxy server. The decryption algorithm has two phases: Pre-Decrypt and User-Decrypt. The user first chooses $z \in_R \mathbb{Z}_N^*$, and then calculates the new keys $NK$. Then, the user sends $(CT, NK)$ to the proxy server and keeps $z$ secret.

Pre-Decrypt: This phase is executed by the proxy server. Input $(CT, NK)$, the server performs partial decryption calculation and sends the result $T^*$ to the user.

User-Decrypt: The user decrypts $K$ through the secret value $z$ and the result $T^*$ returned by the server. Then, user uses the symmetric key $K$ to get $M$.

The detail steps are shown in Algorithm 2. In the check phase, the equation $g^{H(K)} \stackrel{?}{=} CT''$ is used to verify the correctness of the symmetric key $K$. Only the user whose attributes satisfy the access policy can get the correct $K$ and further run the symmetric decryption algorithm $Dec$ to obtain the correct data file $M$. It can be known from the algorithm that user only needs to perform one exponential operation and one pairing operation to obtain the symmetric secret key $K$, thus greatly reducing the computing cost of user.

Algorithm 2 Decrypt
1: Begin
2: Input $\{PP, SK_U^k, \{CT, CT', CT''\}\}$.
3: User selects a random number $z \in \mathbb{Z}_N^*$.
4: Calculate $SK_U^{\prime k} = (D'_{k,u}, D'_{k,i,j}) = (D_{k,u}^{1/z}, D_{k,i,j}^{1/z})$.
5: Set $NK = (SK_U^{\prime k})$.
6: User sends $(CT, NK)$ to the proxy server.
7: Pre-Decrypt:
8:   Calculate $T^* = \dfrac{\prod_{k \in I_c} e(D'_{k,u}, C_1)}{\prod_{v_{k,x,y} \in W} e(D'_{k,i,j}, C_{k,x,y})}$.
9:   Send $T^*$ to the user.
10: User-Decrypt:
11:   Calculate $B = e(C_2, g_1^u)$, $K = CB/(T^*)^z$.
12:   Check $g^{H(K)} \stackrel{?}{=} CT''$
13:   if $g^{H(K)} = CT''$ then
14:     $M = Dec_K(CT')$.
15:   else
16:     return $\perp$.
17:   end if
18: Output $M$ or $\perp$.
19: End

Correctness.

$$
\begin{aligned}
T^* &= \frac{\prod_{k \in I_c} e(D'_{k,u}, C_1)}{\prod_{v_{k,x,y} \in W} e(D'_{k,i,j}, C_{k,x,y})} \\
&= \frac{\prod_{k \in I_c} e\big((g_p^{\alpha_k}\, g_0^{\frac{t_{k,u}}{\beta_k+u}}\, g_1^{u\beta_k})^{1/z},\ g_p^s \cdot R_1\big)}{\prod_{v_{k,x,y} \in W} e\big((g_0^{\frac{t_{k,i,j}^u}{(\beta_k+u)a_{k,i,j}}})^{1/z},\ g_p^{s a_{k,x,y}} R_{k,x,y}^s R'_{k,x,y}\big)} \\
&= \frac{\prod_{k \in I_c} \big(e(g_p, g_p)^{\alpha_k s}\, e(g_p, g_0)^{\frac{s\, t_{k,u}}{\beta_k+u}}\, e(g_p, g_1)^{s u \beta_k}\big)^{1/z}}{\prod_{k \in I_c} \big(e(g_p, g_0)^{\frac{s\, t_{k,u}}{\beta_k+u}}\big)^{1/z}} \\
&= \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s / z}\, e(g_p, g_1)^{s u \beta_k / z}
\end{aligned}
$$

$$B = e(C_2, g_1^u) = e\Big(\prod_{k \in I_c} g_p^{\beta_k s} R_k^s \cdot R_2,\ g_1^u\Big) = \prod_{k \in I_c} e(g_p, g_1)^{s u \beta_k}$$

$$K = \frac{BC}{(T^*)^z} = \frac{K \cdot \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s} \cdot e(g_p, g_1)^{s u \beta_k}}{\big(\prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s / z}\, e(g_p, g_1)^{s u \beta_k / z}\big)^z}$$
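The correctness derivation can be sanity-checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a toy "exponent-space" model in which each $G_p$ element $g_p^x$ is stored as its exponent $x$ modulo a prime $q$, and a pairing $e(g_p^x, g_p^y)$ is stored as the exponent $xy$ of $e(g_p, g_p)$. The $G_{p_1}$ blinding factors are left out because they vanish when paired with $G_p$ elements. The names `run`, `inv`, `x0`, `x1` and the choice of `q` are hypothetical and appear nowhere in the paper; the model checks the algebra only and offers no security.

```python
# Toy exponent-space check of the Decrypt correctness derivation.
# ASSUMPTION: g_p^x is represented by x mod q, and e(g_p^x, g_p^y) by x*y mod q.
# Multiplying group elements adds logs; exponentiation multiplies logs.
# G_{p1} factors are omitted (they vanish under pairing). NOT a secure scheme.
import random

q = 2**61 - 1  # a Mersenne prime, used here as a stand-in group order


def inv(x):
    """Modular inverse in Z_q (needs Python 3.8+)."""
    return pow(x, -1, q)


def run(seed=1, n_auth=3, n_attr=2):
    rng = random.Random(seed)
    rnd = lambda: rng.randrange(1, q)       # nonzero element of Z_q
    x0, x1, u = rnd(), rnd(), rnd()         # g_0 = g_p^x0, g_1 = g_p^x1, u = H(GID)
    alpha = [rnd() for _ in range(n_auth)]
    beta = []
    while len(beta) < n_auth:               # beta_k + u must be invertible
        b = rnd()
        if (b + u) % q:
            beta.append(b)
    a = [[rnd() for _ in range(n_attr)] for _ in range(n_auth)]

    # KeyGen: logs of D_{k,u} and D_{k,i,j}
    t = [[rnd() for _ in range(n_attr)] for _ in range(n_auth)]
    D_u = [(alpha[k] + x0 * sum(t[k]) * inv(beta[k] + u) + x1 * u * beta[k]) % q
           for k in range(n_auth)]
    D_a = [[x0 * t[k][j] * inv((beta[k] + u) * a[k][j]) % q
            for j in range(n_attr)] for k in range(n_auth)]

    # Encrypt: K is a G_T element, stored as its log
    K, s = rnd(), rnd()
    C = (K + s * sum(alpha)) % q            # C = K * prod Y_k^s
    C1 = s                                  # C_1 = g_p^s
    C2 = s * sum(beta) % q                  # C_2 = (prod Z_k)^s
    C_a = [[s * a[k][j] % q for j in range(n_attr)] for k in range(n_auth)]

    # User blinds the keys with 1/z; the proxy computes T*
    z = rnd()
    zi = inv(z)
    T = (sum(D_u[k] * zi * C1 for k in range(n_auth))
         - sum(D_a[k][j] * zi * C_a[k][j]
               for k in range(n_auth) for j in range(n_attr))) % q

    # User-Decrypt: B = e(C_2, g_1^u), K = C * B / (T*)^z
    B = C2 * x1 * u % q
    recovered = (C + B - z * T) % q
    return K, recovered


if __name__ == "__main__":
    K, recovered = run()
    print("recovered key matches:", K == recovered)
```

In this model the proxy only ever handles the blinded logs `D_u[k] * zi` and `D_a[k][j] * zi`, which mirrors why the user can keep $z$ secret and still finish decryption with a single exponentiation of $T^*$.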

4.2 Privacy-Preserving Key Generation Protocol

Each user has a unique global identifier (GID) that distinguishes him/her from other users. If the GID is exposed to a malicious party, the user's privacy is directly endangered. To hide the GID, a zero-knowledge proof (ZKP) and a commitment scheme can be utilized to conduct a privacy-preserving key generation protocol between the relevant authority and the user. Without revealing the GID, the user can prove his/her legitimacy to the authority and get the secret keys generated by the authority. The PPKeyGen algorithm is presented as follows.

1) The user $U$ selects $u, \rho_0 \in \mathbb{Z}_N$, and the authority $A_k$ selects $\rho_1, \beta_k \in \mathbb{Z}_N$. Then $A_k$ executes the 2-party secure computing (2PC) protocol with $U$ and gets $\eta = (u + \beta_k)\rho_0\rho_1 \bmod N$.

2) $U$ picks $z^* \in_R \mathbb{Z}_N$ and runs the Commit algorithm on the GID $u$. Let $T$ be the commitment value. $U$ computes $(T, P_0, T', P_0')$ and sends them to $A_k$. Moreover, $U$ sends $PoK\{(u, \rho_0, z^*) : T = g_p^{z^*} g_1^u \wedge P_0 = g_0^{\rho_0}\}$ to $A_k$ to anonymously prove that he/she has knowledge of $(u, \rho_0, z^*)$.

3) $A_k$ checks the proof first. If the proof is correct, then $A_k$ computes $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ and sends them to $U$. Otherwise, $A_k$ will terminate the algorithm. Then, $A_k$ sends $PoK\{(\alpha_k, \beta_k, \rho_1, t_{k,u}, a_{k,i,j}) : \tilde{D}_{k,u} \wedge \tilde{D}_{k,i,j} \wedge P_1\}$ to $U$ to anonymously prove that it has knowledge of $(\alpha_k, \beta_k, \rho_1, t_{k,u}, a_{k,i,j})$.

4) Similarly, $U$ checks the zero-knowledge proof. If this proof works correctly, $U$ can compute $D_{k,u}$ and $D_{k,i,j}$. Otherwise, $U$ stops this algorithm.

Algorithm 3 illustrates the specific steps.

Algorithm 3 PPKeyGen Protocol
1: Begin
2: $U$ selects $u, \rho_0 \in_R \mathbb{Z}_N$, $A_k$ selects $\rho_1, \beta_k \in_R \mathbb{Z}_N$
3: $U(u, \rho_0) \xleftrightarrow{\ 2PC\ } A_k(\rho_1, \beta_k)$: $\eta = (u + \beta_k)\rho_0\rho_1 \bmod N$
4: $U$ selects $z^*, z_1, z_2, z_3 \in_R \mathbb{Z}_N$, and computes $T = g_p^{z^*} g_1^u$, $P_0 = g_0^{\rho_0}$, $T' = g_p^{z_1} g_1^{z_2}$, $P_0' = g_0^{z_3}$
5: $U$ returns $(T, P_0, T', P_0')$ to $A_k$
6: $A_k$ picks $c \in_R \mathbb{Z}_N$ and sends $c$ to $U$
7: $U$ sets $a_1 = z_1 - cz^*$, $a_2 = z_2 - cu$, $a_3 = z_3 - c\rho_0$ and sends $(a_1, a_2, a_3)$ to $A_k$
8: $A_k$ checks the zero-knowledge proof
9: if $T' = g_p^{a_1} g_1^{a_2} T^c$ and $P_0' = g_0^{a_3} P_0^c$ then
10:   $\forall v_{k,i,j} \in \tilde{A}_u^k$, $A_k$ selects $t_{k,i,j}^u \in_R \mathbb{Z}_N$ and sets $t_{k,u} = \sum t_{k,i,j}^u$
11:   $A_k$ selects $y_1, y_2, y_3, y_4, y_5 \in_R \mathbb{Z}_N$ and computes $P_1 = g_0^{\rho_1}$, $P_1' = g_0^{y_4}$, $Z_k' = g_p^{y_2}$, $\tilde{D}_{k,i,j} = P_1^{\frac{t_{k,i,j}^u}{\eta a_{k,i,j}}}$, $\tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}}$, $\tilde{D}'_{k,i,j} = P_1^{y_5}$, $\tilde{D}'_{k,u} = g_p^{y_1} T^{y_2} P_0^{y_3}$
12:   $A_k$ sends $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j}, P_1', \tilde{D}'_{k,u}, \tilde{D}'_{k,i,j}, Z_k')$ to $U$
13: else
14:   Abort
15: end if
16: $U$ chooses $c' \in_R \mathbb{Z}_N$, and sends $c'$ to $A_k$
17: $A_k$ sets $y_1' = y_1 - c'\alpha_k$, $y_2' = y_2 - c'\beta_k$, $y_3' = y_3 - c'\frac{t_{k,u}\rho_1}{\eta}$, $y_4' = y_4 - c'\rho_1$, $y_5' = y_5 - c'\frac{t_{k,i,j}^u}{\eta a_{k,i,j}}$
18: $A_k$ sends $(y_1', y_2', y_3', y_4', y_5')$ to $U$.
19: $U$ checks the proof
20: if $P_1' = g_0^{y_4'} P_1^{c'}$, $\tilde{D}'_{k,u} = g_p^{y_1'} T^{y_2'} P_0^{y_3'} (\tilde{D}_{k,u})^{c'}$, $\tilde{D}'_{k,i,j} = \tilde{D}_{k,i,j}^{c'} P_1^{y_5'}$ and $Z_k' = g_p^{y_2'} (Z_k)^{c'}$ then
21:   $U$ computes $D_{k,u} = \dfrac{\tilde{D}_{k,u}}{Z_k^{z^*}}$, $D_{k,i,j} = \tilde{D}_{k,i,j}^{\rho_0}$, where $Z_k = g_p^{\beta_k}$
22: else
23:   Abort
24: end if
25: End

The PPKeyGen algorithm needs to meet two features: leak-freeness and selective-failure blindness. The first means that a malicious user running the PPKeyGen algorithm with trusted authorities would not gain more information than by executing the KeyGen algorithm. The second means that a malicious authority cannot get any information about the user's GID, nor can it fail the PPKeyGen algorithm based on the user's selection of GID. The definitions of the two features are shown below.

Definition 2. (leak-freeness). A PPKeyGen algorithm is leak-free if for all efficient adversaries $U$, there exists a simulator $U'$ such that no efficient distinguisher $D$ can distinguish whether $U$ is executing in the real game or in the ideal game with non-negligible advantage. The two games are defined as follows:

Real Game: The distinguisher $D$ runs the setup algorithm as many times as it wants, the malicious user $U$ selects a GID $u$ and executes the PPKeyGen algorithm with the honest authority.

Ideal Game: The distinguisher $D$ runs the setup algorithm as many times as it wants, and the simulator $U'$ chooses a GID $u'$ and executes the KeyGen algorithm with a trusted authority.

Definition 3. (selective-failure blindness). A PPKeyGen algorithm is selective-failure blind if no probabilistic polynomial time adversary $A_k$ can win the following game with non-negligible advantage.

1) The malicious authority $A_k$ submits public key $PK_k$ and two global identifiers $u_0, u_1$.

2) A random bit $b \in \{0, 1\}$ is selected.

3) $A_k$ is given two commitments $com_b$ and $com_{1-b}$, then it can black-box access oracles $U(PP, u_b, PK_k, com_b)$ and $U(PP, u_{1-b}, PK_k, com_{1-b})$.

4) The algorithm $U$ returns the secret keys $SK_{U_b}^k$ and $SK_{U_{1-b}}^k$, separately.

5) If $SK_{U_b}^k \neq \perp$ and $SK_{U_{1-b}}^k \neq \perp$, $A_k$ is given $(SK_{U_b}^k, SK_{U_{1-b}}^k)$; If $SK_{U_b}^k \neq \perp$ and $SK_{U_{1-b}}^k = \perp$, $A_k$ is given $(\epsilon, \perp)$; If $SK_{U_b}^k = \perp$ and $SK_{U_{1-b}}^k \neq \perp$, $A_k$ is given $(\perp, \epsilon)$; If $SK_{U_b}^k = \perp$ and $SK_{U_{1-b}}^k = \perp$, $A_k$ is given $(\perp, \perp)$.

6) Finally, $A_k$ returns its guess $b'$ on $b$. $A_k$ wins the game if $b' = b$.

5 Security Analysis

5.1 Scheme Analysis

Identity Privacy: The PPKeyGen algorithm introduced in this paper is an interactive process between the user and each authority, which ensures that the user can obtain the correct secret key from the authority without directly providing the unencapsulated $u$. In this case, the identifier $u$ is confidential to all authorities, so they can no longer track it to aggregate user's information. Theorem 2 in Section 5.2 shows that this protocol satisfies leak-freeness and selective-failure blindness.

Collusion Resistance: In this construction, the authors bind all the user's key components to $u$, making the user's secret key unique. The power settings of $g_0$ and $g_1$ (namely, $\frac{t_{k,u}}{\beta_k+u}$ and $u\beta_k$, respectively) make it impossible for different users to achieve effective attacks by combining their secret keys. What's more, the scheme can prevent the collusion attack of multiple authorities. By observing the ciphertext component $C$ in $CT$ (namely, $K \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s}$), it can be known that only when all relevant $\alpha_k$ is obtained can the message $K$ be restored. Therefore, as long as one authority is honest in the set of authority involved, the attack will not succeed.

Hidden Policy: The authors implement attributes anonymity in the access policy by applying some random elements $(R_1, R_2, R'_{k,x,y})$ in group $G_{p_1}$ on some ciphertext components $(C_1, C_2, C_{k,x,y})$ [20,33]. Such a setting is very necessary. The existence of these random elements makes it difficult for an attacker to easily guess the value of the attribute embedded by the encryptor in the access policy. Without the introduction of these random elements, the original ciphertext would be $C_1' = g_p^s$, $C_2' = (\prod_{k \in I_c} Z_k'^{\,s})$, $C'_{k,x,y} = A'^{\,s}_{k,x,y}$, where $Z_k' = g_p^{\beta_k}$, $A'_{k,i,j} = g_p^{a_{k,i,j}}$. An attacker needs only a simple test $e(A'_{k,i,j}, C_1') \stackrel{?}{=} e(g_p, C'_{k,x,y})$ to be able to guess whether the encryptor has embedded the attribute value $v_{k,i,j}$ in the access policy or not. In fact,

$$X' = e(A'_{k,i,j}, C_1') = e(g_p^{a_{k,i,j}}, g_p^s) = e(g_p, g_p)^{s a_{k,i,j}}$$

$$Y' = e(g_p, C'_{k,x,y}) = e(g_p, g_p^{s a_{k,x,y}}) = e(g_p, g_p)^{s a_{k,x,y}}$$

If $X' = Y'$, namely, $a_{k,i,j} = a_{k,x,y}$, means that $v_{k,i,j} \in W$; If $X' \neq Y'$, namely, $a_{k,i,j} \neq a_{k,x,y}$, means that $v_{k,i,j} \notin W$.

Do the same operations on the proposed scheme, then

$$X = e(A_{k,i,j}, C_1) = e(g_p^{a_{k,i,j}} \cdot R_{k,i,j},\ g_p^s \cdot R_1) = e(g_p, g_p)^{s a_{k,i,j}}\, e(R_{k,i,j}, R_1)$$

$$Y = e(g_p, C_{k,x,y}) = e(g_p,\ g_p^{s a_{k,x,y}} \cdot R_{k,x,y}^s \cdot R'_{k,x,y}) = e(g_p, g_p)^{s a_{k,x,y}}$$

An attacker cannot determine whether the attribute value $v_{k,i,j}$ belongs to $W$ by comparing the values of $X$ and $Y$. Therefore, the privacy of the recipient is protected to a certain extent.

5.2 Security Proof

Theorem 1. The proposed scheme is secure in the selective access policy model, assuming that the DBDH assumption holds.

Proof. Assuming that the proposed scheme can be broken by an adversary $\mathcal{A}$ with a non-negligible advantage $\epsilon$, then, using the ability of $\mathcal{A}$, a simulator $\mathcal{B}$ can be constructed to solve the DBDH problem with the advantage $\epsilon/2$.

Firstly, the challenger generates a bilinear group $(e, p, p_1, G, G_T)$, where $G = G_p \times G_{p_1}$. Then he/she picks a random number $\phi \in \{0, 1\}$, and sets:

$$
\begin{cases}
Z = e(g_p, g_p)^{abc}, & \phi = 0 \\
Z = e(g_p, g_p)^{z}, & \phi = 1
\end{cases}
\qquad (1)
$$

where $z$ is a random number in group $G$. The simulator gets the challenge tuple $(g_p, g_{p_1}, A, B, C, Z) = (g_p, g_{p_1}, g_p^a, g_p^b, g_p^c, Z)$ from the challenger and finally returns a guess $\phi'$ on $\phi$.

Initialization: The adversary $\mathcal{A}$ provides a set of corrupted authorities $C_A$ and an access structure $W^*$ which he/she wants to challenge. Suppose there is at least one completely honest authority $A^*$ in the security game.

Global Setup: $\mathcal{B}$ randomly chooses $\gamma, \theta \in_R \mathbb{Z}_N$ and sets $g_0 = A \cdot g_p^{\gamma} = g_p^{a+\gamma}$, $g_1 = g_p^{\theta}$. Then, $\mathcal{B}$ sends $PP = (e, p, p_1, g_p, g_0, g_1, g_{p_1}, H, G, G_T)$ to adversary $\mathcal{A}$.

Authority Setup: Three cases are discussed here:

1) For the corrupted authorities $A_k \in C_A$, $\mathcal{B}$ randomly selects $\alpha_k, \beta_k, a_{k,i,j} \in_R \mathbb{Z}_N$ and $R_k, R_{k,i,j} \in_R G_{p_1}$, then calculates $Y_k = e(g_p, g_p)^{\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$ and $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$. Then, $\mathcal{B}$ sends the secret keys $SK_k = (\alpha_k, \beta_k, \{a_{k,i,j}\}_{j=[1,n_i]})$ and the public keys $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.

2) For the authority $A_k$ not corrupted and not $A^*$, $\mathcal{B}$ randomly chooses $\alpha_k, \beta_k, a_{k,i,j} \in_R \mathbb{Z}_N$ and $R_k, R_{k,i,j} \in_R G_{p_1}$, computes $Y_k = e(g_p, g_p)^{b\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$, $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$ when $v_{k,i,j} \in W^*$, or $A_{k,i,j} = g_p^{b a_{k,i,j}} \cdot R_{k,i,j}$ when $v_{k,i,j} \notin W^*$. $\mathcal{B}$ sends the public keys $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.

3) For the authority $A^*$, $\mathcal{B}$ randomly selects $\beta^*, a_{i,j}^* \in_R \mathbb{Z}_N$ and $R^*, R_{i,j}^* \in_R G_{p_1}$, then computes $Y^* = e(g_p^a, g_p^b) \cdot \prod_{A_k \in C_A} e(g_p, g_p)^{-\alpha_k} \cdot \prod_{A_k \notin C_A \cup A^*} e(g_p, g_p)^{-b\alpha_k}$ and $Z^* = g_p^{\beta^*} \cdot R^*$. $A_{i,j}^* = g_p^{a_{i,j}^*} \cdot R_{i,j}^*$, if $v_{i,j} \in W^*$; $A_{i,j}^* = g_p^{b a_{i,j}^*} \cdot R_{i,j}^*$, if $v_{i,j} \notin W^*$. Then, $\mathcal{B}$ sends the public keys $PK^* = (Y^*, Z^*, \{A_{i,j}^*\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.

Phase 1: The adversary $\mathcal{A}$ issues a secret key query for a global identifier $u^*$ with a list of attributes $L^*$, where $L^* \not\models W^*$.

1) For $A_k \in C_A$, $\mathcal{A}$ can generate the user secret keys directly by himself.

2) For $A_k \notin C_A \cup A^*$, $\forall v_{k,i,j} \in \tilde{A}_{u^*}^k$, $\mathcal{B}$ picks $t_{k,i,j}^{u^*} \in_R \mathbb{Z}_N$ and sets $t_{k,u^*} = \sum t_{k,i,j}^{u^*}$. Then, $\mathcal{B}$ computes $D_{k,u^*} = B^{\alpha_k}\, g_0^{\frac{t_{k,u^*}}{\beta_k+u^*}}\, g_1^{u^*\beta_k}$, $D_{k,i,j} = g_0^{\frac{t_{k,i,j}^{u^*}}{(\beta_k+u^*)a_{k,i,j}}}$.

3) For $A_k = A^*$, $\forall v_{i,j} \in \tilde{A}_{u^*}^*$, $\mathcal{B}$ chooses $t_{i,j}^{u^*} \in_R \mathbb{Z}_N$, sets $t_{u^*} = \sum t_{i,j}^{u^*}$, and computes $D_{u^*}^* = B^{-\gamma}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*} \prod_{A_k \in C_A} g_p^{-\alpha_k} \prod_{A_k \notin C_A \cup A^*} B^{-\alpha_k}$ and $D_{i,j}^* = g_0^{\frac{t_{i,j}^{u^*}}{(\beta^*+u^*)a_{i,j}^*}}$, where

$$
\begin{aligned}
D_{u^*}^* &= B^{-\gamma}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*} \prod_{A_k \in C_A} g_p^{-\alpha_k} \prod_{A_k \notin C_A \cup A^*} B^{-\alpha_k} \\
&= g_p^{-b\gamma}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*}\, g_p^{-(\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)} \\
&= g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k) - b(a+\gamma)}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*} \\
&= g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*} - b}\, g_1^{u^*\beta^*}
\end{aligned}
$$

Let $t'_{u^*} = t_{u^*} - b(\beta^* + u^*)$, then

$$D_{u^*}^* = g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)}\, g_0^{\frac{t'_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*}$$

Therefore, $D_{u^*}^*$ is a valid secret key.

Challenge: The adversary $\mathcal{A}$ provides two messages of the same length $K_0$ and $K_1$. $\mathcal{B}$ chooses a random bit $\xi \in \{0, 1\}$ and then encrypts the message $K_\xi$: $CT^* = \{C^* = K_\xi \cdot Z,\ C_1^* = g_p^c \cdot R_1,\ C_2^* = (\prod_{k \in I_c} g_p^{\beta_k c} \cdot R_k^c) \cdot R_2,\ \forall v_{k,x,y} \in W^* : C_{k,x,y}^* = g_p^{c a_{k,x,y}} \cdot R_{k,x,y}^c \cdot R'_{k,x,y}\}$.

Phase 2: Same as Phase 1.

Guess: Adversary $\mathcal{A}$ returns the guess $\xi'$ on $\xi$. If $\xi' = \xi$, simulator $\mathcal{B}$ returns $\phi' = 0$ to the challenger; otherwise, simulator $\mathcal{B}$ returns $\phi' = 1$.

If $\phi = 1$, $Z = e(g_p, g_p)^z$ is a random value, so adversary $\mathcal{A}$ cannot get any information about $\xi$ and $\Pr[\xi' \neq \xi \mid \phi = 1] = \frac{1}{2}$. Since $\mathcal{B}$ returns $\phi' = 1$ when $\xi' \neq \xi$, $\Pr[\phi' = \phi \mid \phi = 1] = \frac{1}{2}$.

If $\phi = 0$, then $Z = e(g_p, g_p)^{abc}$, and the adversary $\mathcal{A}$ will get a valid ciphertext of message $K_\xi$. The advantage of $\mathcal{A}$ in breaking the proposed scheme is $\epsilon$ (non-negligible) by definition, so $\Pr[\xi' = \xi \mid \phi = 0] = \frac{1}{2} + \epsilon$. Since $\mathcal{B}$ returns $\phi' = 0$ when $\xi' = \xi$, $\Pr[\phi' = \phi \mid \phi = 0] = \frac{1}{2} + \epsilon$.

Therefore, the advantage of $\mathcal{B}$ to break the DBDH assumption is $\left|\frac{1}{2}\Pr[\phi' = \phi \mid \phi = 0] + \frac{1}{2}\Pr[\phi' = \phi \mid \phi = 1] - \frac{1}{2}\right| = \epsilon/2$ (non-negligible).

Theorem 2. The proposed PPKeyGen algorithm is leak-free and selective-failure blind.

Proof. (Leak-freeness) Suppose that a malicious user $U$ runs the PPKeyGen algorithm with an honest $A_k$ in the real game. There should also exist a simulator $\tilde{U}$ that runs the KeyGen algorithm with a trusted authority in the ideal game such that no distinguisher $D$ can effectively distinguish the two games. The simulator $\tilde{U}$ can simulate the communication between $D$ and $U$. $\tilde{U}$ works as follows:

1) $\tilde{U}$ sends $PP$ and the public key $PK_k$ of authority $A_k$ to the malicious user $U$.

2) $U$ needs to prove to $\tilde{U}$ that he/she owns $u$ in zero-knowledge by submitting two values $(T, P_0)$. If the proof succeeds, then $\tilde{U}$ gets $(u, \rho_0, z^*)$ using the rewind technique.

3) $\tilde{U}$ sends $u$ to the trusted party and gets secret keys $(D_{k,u}, D_{k,i,j})$.

4) $\tilde{U}$ chooses $\rho \in_R \mathbb{Z}_p$, and calculates $\rho_1 = \rho/\rho_0$, $P_1 = g_0^{\rho_1}$, $\tilde{D}_{k,u} = D_{k,u} \cdot Z_k^{z^*}$, $\tilde{D}_{k,i,j} = D_{k,i,j}^{1/\rho_0}$. Then $\tilde{U}$ returns $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ to $U$.

If $(D_{k,u}, D_{k,i,j})$ are correct keys from the trusted AA in the ideal game, then $(\tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ are correct keys from $A_k$ in the real game. The distinguisher $D$ cannot distinguish the real game from the ideal game.

Proof. (Selective-failure blindness) The malicious authority $A_k$ provides $PK_k$ and two global identifiers $(u_0, u_1)$. A random bit $b \in \{0, 1\}$ is picked. $A_k$ can

Table 2: The comparison of computing cost

Scheme      Setup                 KeyGen                  Encryption          Decryption
Huang [15]  P + (N + 2)E          (2|A_U| + N + 1)E       (2|A_c| + 2)E       (2|A_c| + N + 1)P + |A_c|E
Qian [26]   P + (2N + m·|U|)E     (2|A_U| + 4N)E          (2|A_c| + 2)E       (|A_c| + 3N + 1)P
Qian [27]   P + (3N + |U|)E       (|A_U| + 3N(N − 1))E    (|A_c| + 2)E        (N|A_c| + 1)P + (|A_c| + 1)E
Ours        P + (2N + m·|U|)E     (|A_U| + 3N)E           (|A_c| + N + 2)E    (|A_c| + N + 1)P + E

1 P: the bilinear pairing operation. E: the exponential operation. |∗|: the number of elements in ∗.
2 A_U: the attribute set of U. A_c: the attribute set in ciphertext. m: the number of values for each attribute.

have black-box access to $U(u_b, com_b, PK_k, PP)$ and $U(u_{1-b}, com_{1-b}, PK_k, PP)$. Then, $U$ runs the PPKeyGen algorithm with $A_k$ and returns secret keys $SK_{U_b}^k$ and $SK_{U_{1-b}}^k$: If $SK_{U_b}^k \neq \perp$ and $SK_{U_{1-b}}^k \neq \perp$, $A_k$ is given $(SK_{U_b}^k, SK_{U_{1-b}}^k)$; If $SK_{U_b}^k \neq \perp$ and $SK_{U_{1-b}}^k = \perp$, $A_k$ is given $(\epsilon, \perp)$; If $SK_{U_b}^k = \perp$ and $SK_{U_{1-b}}^k \neq \perp$, $A_k$ is given $(\perp, \epsilon)$; If $SK_{U_b}^k = \perp$ and $SK_{U_{1-b}}^k = \perp$, $A_k$ is given $(\perp, \perp)$. Finally, $A_k$ outputs a guess $b'$ on $b$.

In the PPKeyGen algorithm, $U$ first computes $T, P_0$, and proves $PoK\{(u, \rho_0, z^*) : T = g_p^{z^*} g_1^u \wedge P_0 = g_0^{\rho_0}\}$. So far, the two oracles should be computationally indistinguishable to $A_k$. Otherwise, it would violate the hiding property of the commitment scheme and the witness indistinguishability of the zero-knowledge proof. Assume that $A_k$ outputs secret keys for the first oracle with some computing strategies. The next thing to prove is that $A_k$ can predict secret keys for the user without interaction with the two oracles:

1) $A_k$ checks $PoK\{(\alpha_k, \beta_k, r_{k,u}, \rho_1, \eta) : \tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}} \wedge P_1 = g_0^{\rho_1} \wedge \tilde{D}_{k,i,j} = P_1^{\frac{t_{k,i,j}^u}{\eta a_{k,i,j}}}\}$. If the proof fails, $A_k$ outputs $SK_{U_0}^k = \perp$.

2) $A_k$ generates different $(\tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ for the second oracle and proves $PoK\{(\alpha_k, \beta_k, r_{k,u}, \rho_1, \eta) : \tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}} \wedge P_1 = g_0^{\rho_1} \wedge \tilde{D}_{k,i,j} = P_1^{\frac{t_{k,i,j}^u}{\eta a_{k,i,j}}}\}$. If the proof fails, $A_k$ outputs $SK_{U_1}^k = \perp$.

3) Finally, $A_k$ returns its prediction on $(u_0, u_1)$. If $SK_{U_0}^k \neq \perp$ and $SK_{U_1}^k \neq \perp$, the prediction is $(SK_{U_0}^k, SK_{U_1}^k)$; If $SK_{U_0}^k \neq \perp$ and $SK_{U_1}^k = \perp$, the prediction is $(\epsilon, \perp)$; If $SK_{U_0}^k = \perp$ and $SK_{U_1}^k \neq \perp$, the prediction is $(\perp, \epsilon)$; If $SK_{U_0}^k = \perp$ and $SK_{U_1}^k = \perp$, the prediction is $(\perp, \perp)$.

The prediction has the same distribution as the outputs of the oracles, because $A_k$ performs the same check as the honest user. Therefore, if $A_k$ can predict the user secret keys, then $A_k$, with or without the final outputs, has the same advantage. It means that $A_k$ can distinguish the two oracles before the prediction. However, based on the security of the commitment scheme and the zero-knowledge proof, the advantage of $A_k$ in distinguishing the two oracles is negligible.

6 Performance Analysis

In this section, Table 2 and Figure 3 show the comparisons of computation costs and efficiency, respectively. The simulation is executed on a Windows machine with 2.70 GHz Intel(R) Core (TM) i5-4210U CPU and 4 GB RAM. The implementation is based on Java Pairing-Based Cryptography (PBC) Library (version 0.5.14). Random elements in group $G_{p_1}$ are introduced into our scheme to implement policy hiding. Apart from this function, the authors compare the scheme with schemes [15,26,27].

Figure 3(a) and Figure 3(b) show the comparisons of Setup time and KeyGen time with different numbers of attribute authorities. In this simulation, the number of attribute authorities is increased from 2 to 14, each authority monitors 5 attributes, and each attribute has 3 values. Scheme [15] uses a hash function to map the user's attributes to a random group element. Attribute authorities do not need to calculate the public keys related to the attributes, so the setup stage of that scheme takes less time. Figure 3(b) shows that the proposed scheme takes less time to generate the secret keys than the other three schemes. The secret key generation time of scheme [27] increases rapidly with the number of authorities because every two authorities have to interact with each other, so its KeyGen time is a quadratic function of the independent variable $N$.

Figure 3(c) and Figure 3(d) show the comparisons of encryption time and decryption time with different numbers of attributes in the access policy, with the number of attribute authorities in the system fixed at 5. The encryption time of the proposed scheme is longer than that of scheme [27], but the difference is very small. Figure 3(d) shows that the proposed scheme is superior to the other three schemes in the decryption phase.

7 Conclusion

To implement data sharing, a privacy-preserving decentralized ABE scheme is proposed in this paper. All attribute authorities are independent of each other and do not require any collaboration. All users can get the secret keys from authorities by using the PPKeyGen protocol without revealing their GIDs, which meets users' requirement to protect their private information. Besides, this scheme supports policy anonymity, so one cannot get any information about the user attributes. The security of the scheme is proved under a standard model and the simulation results verify the efficiency of the scheme. The disadvantage of the proposed scheme is that it only supports

[Figure 3, panels (a) to (d); the plotted data is not recoverable from the extracted text. Curves in every panel: Huang [15], Qian [26], Qian [27], Ours. (a) The time of setup (ms) vs. the number of attribute authorities (2 to 14). (b) The time of the KeyGen (ms) vs. the number of attribute authorities (2 to 14). (c) The time of the encryption (ms) vs. the number of attributes in the access structure (5 to 35). (d) The time of the decryption (ms) vs. the number of attributes in the access structure (5 to 30).]

Figure 3: Efficiency comparisons of the proposed scheme with others

the AND-gate access structure. Constructing encryption schemes that support more flexible access structures is left as future work.

Acknowledgments

This work was supported in part by the National Cryptography Development Fund under grant (MMJJ20180209).

References

[1] Y. Baseri, A. Hafid, and S. Cherkaoui, "Privacy preserving fine-grained location-based access control for mobile cloud," Computers and Security, vol. 73, pp. 249–265, 2018.
[2] J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," in IEEE Symposium on Security and Privacy, pp. 321–334, 2007.
[3] J. Camenisch and M. Stadler, "Efficient group signature schemes for large groups," in Advances in Cryptology (CRYPTO'97), pp. 410–424, 1997.
[4] Z. Cao, L. Liu, and Z. Guo, "Ruminations on attribute-based encryption," International Journal of Electronics and Information Engineering, vol. 8, no. 1, 2018.
[5] M. Chase, "Multi-authority attribute based encryption," in Conference on Theory of Cryptography, pp. 515–534, 2007.
[6] M. Chase and S. S. M. Chow, "Improving privacy and security in multi-authority attribute-based encryption," in ACM Conference on Computer and Communications Security, pp. 121–130, 2009.
[7] Y. Fan, X. Wu, and J. Wang, "Multi-authority attribute-based encryption access control scheme with hidden policy and constant length ciphertext for cloud storage," in IEEE Second International Conference on Data Science in Cyberspace, pp. 205–212, 2017.
[8] T. Feng and J. Guo, "A new access control system based on CP-ABE in named data networking," International Journal of Network Security, vol. 20, no. 4, 2018.
[9] T. Feng, X. Yin, Y. Lu, J. Fang, and F. Li, "A searchable CP-ABE privacy preserving scheme," International Journal of Network Security, vol. 21, no. 4, 2019.
[10] A. Ge, J. Zhang, R. Zhang, C. Ma, and Z. Zhang, "Security analysis of a privacy-preserving decentralized key-policy attribute-based encryption scheme," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 11, 2013.
[11] V. Goyal, O. Pandey, A. Sahai, and B. Waters, "Attribute-based encryption for fine-grained access control of encrypted data," in ACM Conference on Computer and Communications Security, pp. 89–98, 2006.
[12] J. Han, W. Susilo, Y. Mu, and J. Yan, "Privacy-preserving decentralized key-policy attribute-based encryption," IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, pp. 2150–2162, 2012.
[13] J. Han, W. Susilo, and Y. Mu, et al., "PPDCP-ABE: Privacy-preserving decentralized ciphertext-policy attribute-based encryption," in Computer Security, pp. 73–90, 2014.
[14] S. Hu, J. Li, and Y. Zhang, "Improving security and privacy-preserving in multi-authorities ciphertext-policy attribute-based encryption," KSII Transactions on Internet and Information Systems, vol. 12, no. 10, 2018.
[15] X. Huang, Q. Tao, B. Qin, and Z. Liu, "Multi-authority attribute based encryption scheme with revocation," in International Conference on Computer Communication and Networks, pp. 1–5, 2015.
[16] T. Jung, X. Li, Z. Wan, and M. Wan, "Control cloud data access privilege and anonymity with fully anonymous attribute-based encryption," IEEE Transactions on Information Forensics and Security, vol. 10, no. 1, 2015.
[17] C. C. Lee, P. S. Chung, and M. S. Hwang, "A survey on attribute-based encryption schemes of access control in cloud environments," International Journal of Network Security, vol. 15, no. 4, 2013.
[18] A. Lewko and B. Waters, "Decentralizing attribute-based encryption," in Advances in Cryptology-eurocrypt -international Conference on the Theory and Applications of Cryptographic Techniques, pp. 568–588, 2011.
[19] M. Li, S. Yu, Y. Zheng, K. Ren, and W. Lou, "Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption," IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, 2013.
[20] X. Li, D. Gu, Y. Ren, N. Ding, and K. Yuan, "Efficient ciphertext-policy attribute based encryption with hidden policy," in Internet and Distributed Computing Systems, pp. 146–159, 2012.
[21] C. W. Liu, W. F. Hsien, C. C. Yang, and M. S. H. Wang, "A survey of attribute-based access control with user revocation in cloud data storage," International Journal of Network Security, vol. 18, no. 5, 2016.
[22] L. Liu, Z. Cao, and C. Mao, "A note on one outsourcing scheme for big data access control in cloud," International Journal of Electronics and Information Engineering, vol. 9, no. 1, 2018.
[23] M. Lyu, X. Li, and H. Li, "Efficient, verifiable and privacy preserving decentralized attribute-based encryption for mobile cloud computing," in IEEE Second International Conference on Data Science in Cyberspace, 2017. DOI: 10.1109/DSC.2017.8.
[24] T. P. Pedersen, "Non-interactive and information-theoretic secure verifiable secret sharing," in International Cryptology Conference on Advances in Cryptology, pp. 129–140, 1991.
[25] H. S. G. Pussewalage and V. A. Oleshchuk, "A distributed multi-authority attribute based encryption scheme for secure sharing of personal health records," in ACM on Symposium on Access Control Models and Technologies, pp. 255–262, 2017.
[26] H. Qian, J. Li, and Y. Zhang, "Privacy-preserving decentralized ciphertext-policy attribute-based encryption with fully hidden access structure," in International Conference on Information and Communications Security, pp. 363–372, 2013.
[27] H. Qian, J. Li, Y. Zhang, and J. Han, "Privacy-preserving personal health record using multi-authority attribute-based encryption with revocation," International Journal of Information Security, vol. 14, no. 6, 2015.
[28] Y. Rahulamathavn, S. Veluru, J. Han, F. Li, M. Rajarajan, and R. Lu, "User collusion avoidance scheme for privacy-preserving decentralized key-policy attribute-based encryption," IEEE Transactions on Computers, vol. 65, no. 9, 2016.
[29] S. Rezaei, M. A. Doostari, and M. Bayat, "A lightweight and efficient data sharing scheme for cloud computing," International Journal of Electronics and Information Engineering, vol. 9, no. 2, 2018.
[30] A. Sahai and B. Waters, "Fuzzy identity-based encryption," in Advances in Cryptology, pp. 457–473, 2005.
[31] M. Wang, Z. Zhang, and C. Chen, "Security analysis of a privacy-preserving decentralized ciphertext-policy attribute-based encryption scheme," Concurrency and Computation: Practice and Experience, vol. 28, no. 4, 2015.
[32] F. Xhafa, J. Feng, Y. Zhang, X. Chen, and J. Li, "Privacy-aware attribute-based phr sharing with user accountability in cloud computing," Journal of Supercomputing, vol. 71, no. 5, 2015.
[33] R. Xu, Y. Wang, and B. Lang, "A tree-based CP-ABE scheme with hidden policy supporting secure data sharing in cloud computing," in International Conference on Advanced Cloud and Big Data, pp. 51–57, 2013.
[34] L. Zhang, P. Liang, and Y. Mu, "Improving privacy-preserving and security for decentralized key-policy attributed-based encryption," IEEE Access, vol. 6, pp. 12736–12745, 2018.
[35] L. Zhang and H. Yin, "Recipient anonymous ciphertext-policy attribute-based broadcast encryption," International Journal of Network Security, vol. 20, no. 1, 2018.
[36] Y. Zhang, D. Zheng, and R.H. Deng, "Security and privacy in smart health: Efficient policy-hiding attribute-based access control," IEEE Internet of Things Journal, vol. 5, pp. 2130–2145, 2018.

Biography

Li Kang is a master's degree student in the school of mathematics and statistics, Xidian University. Her research interests focus on computer and network security.

Leyou Zhang is a professor in the school of mathematics and statistics at Xidian University, Xi'an China. He received his PhD from Xidian University in 2009. From Dec. 2013 to Dec. 2014, he was a research fellow in the school of computer science and software engineering at the University of Wollongong. His current research interests include network security, computer security, and cryptography.

International Journal of Network Security, Vol.22, No.5, PP.828-837, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).13) 828

Evidence Gathering of Facebook Messenger on Android

Ming Sang Chang and Chih Ping Yen (Corresponding author: Chih Ping Yen)

Department of Information Management, Central Police University Taoyuan 33304, Taiwan (Email: [email protected]) (Received Aug. 28, 2019; Revised and Accepted Feb. 5, 2020; First Online Feb. 26, 2020)

Abstract

The trend in social networking is changing people’s lifestyles. Since both smart phones and computers are connected to the same tools, newly developed applications must serve both ends to please users. The modes of cybercrime have therefore also changed in accordance with users’ activities. In order to identify crimes, it is necessary to use appropriate forensic techniques to retrieve these traces and evidence. This study considers the social network Facebook Messenger as the research subject. We analyze the artifacts left by the Facebook Messenger application and show the evidence that can be gathered for activities such as sending texts, pictures, and videos and making calls on the Android platform. This study explores the differences between the traces that are left on Non-Rooted and Rooted Android platforms. Finally, the forensic analysis found that differences in privacy control can lead to discrepancies in how user behaviors are recorded on the same social network. The results are helpful to forensic analysts and practitioners because they assist them in mapping and finding digital evidence of Facebook Messenger on Android smart phones.

Keywords: Cybercrime; Digital Forensics; Facebook Messenger; Social Network

1 Introduction

Nowadays, social networking sites are increasingly popular. Their popularity has increased the number of social networking users for business, recreation, and other purposes. Over the past few years, social networking sites have been significant mediums for people to enhance their interpersonal relationships. The prevalence of social networking websites has changed the living habits of many people. They share their emotions or daily life with their friends via texting, photographing, or videoing. There is no doubt that people have integrated social networking sites into their lives and turned the use of social networking sites into daily activities.

Facebook Messenger is a messaging application. Originally developed as Facebook Chat in 2008, the messaging service was revamped by the company in 2010. It released standalone iOS and Android apps in August 2011. Over the years, Facebook has released new apps on a variety of different operating systems, launched a dedicated website interface, and separated the messaging functionality from the main Facebook app. Users can send messages and exchange photos, videos, stickers, audio, and files, as well as react to other users’ messages and interact with bots. The service also supports voice and video calling. After being separated from the main Facebook app, Facebook Messenger had 600 million users in April 2015. This grew to 900 million in June 2016, 1 billion in July 2016, and more than 1.3 billion monthly active users in October 2019 [22].

Social networking websites provide a virtual exchange space on the Internet for people with common interests, hobbies, and activities to easily share, discuss, and exchange their views without any limitation of space and time. Therefore, social networking websites continue to accumulate a large number of users. According to Metcalfe’s law, the value of a telecommunications network is proportional to the square of the number of connected users of the system. As a result, social networking has become a great force in today’s society. However, this has also brought about endless criminal activities on social networks, such as cyberbullying, social engineering, and identity theft, among other issues. Due to the following characteristics, detecting cybercrime on social networks differs from detecting other cybercrime [10]. Therefore, to assist investigators in improving their efficiency in solving crimes, research focusing on these upcoming technologies is needed [16].

1) Anonymity: Users are often unaware of the true identity of their counterpart in a social network because they are dealing with a fake account. Therefore, in the case of a social network cybercrime, it is difficult to extract the suspect’s information and make arrests immediately [1].

2) Diffuseness: Any news published on the social network will be forwarded or shared immediately, which generates the diffusion effect [21]. Therefore, if a social network crime is not responded to immediately, it may cause the victim to suffer serious damage.

3) Cross-Regional feature: Due to the nature of the Internet, the location of the cybercrime is not necessarily the place where the criminal suspects are located. A bottleneck is formed during the crime investigation due to the difficulty in locating the suspects [13].

4) Vulnerability of evidence: The evidence obtained on social networks is in the form of digital data. Digital evidence is highly volatile throughout processing, from collection to storage, and it is easy to change, delete, lose, or contaminate it through the anti-forensic operations of suspects or the negligence of investigators [14].

This study considers the social network Facebook Messenger as the study subject. User activities are performed through Android smart phones. Forensic analysis is conducted to understand what types of user behavior leave digital evidence on Android. We explore the differences between the traces that are left on Non-Rooted and Rooted Android platforms. The results will serve as a reference for future researchers in social network cybercrime investigation or digital forensics.

The rest of this paper is organized as follows. In the next section, we present related works. In Section 3, we present our methodology. In Section 4, we present the results and findings of digital forensics on Facebook Messenger. In Section 5, we discuss the findings. Finally, we summarize our conclusions.

2 Related Works

Evidence left by an instant messenger (IM) program is stored in three principal areas: the hard drive, memory, and the network. Some IM services have the ability to log information on the user’s hard drive. To use an IM, an account must be established to create a screen name provided with user information. Some instant messenger providers might assist the investigation with information about the account owner.

Evidence can be found in various internet file caches used by Internet Explorer for volatile IM, and each cache holds different pieces of data. Apart from normal files, files left by an instant messenger on a hard drive can be in temporary file format; these are generally deleted and can be very difficult to retrieve once the machine is powered down. An operating system generally stores information about all the installed and uninstalled applications in the system. An uninstalled application also leaves evidence. If a user has deleted an instant messenger application, there is a chance that a record can be found in the registry to prove that the instant messenger was once installed on the system. Information is also stored within memory. Since every application requires memory to execute, it is logical to think that evidence could be left behind in the system’s memory. The analysis of live memory allows us to provide additional contextual information for a case.

Presently, various studies focusing on the forensic analysis of social networking are being conducted. Artifacts of instant messaging have been of interest in many different digital forensic studies. Early work focused on artifacts left behind by many instant messaging applications, such as MSN Messenger [7], Yahoo Messenger [8], and AOL Instant Messenger [17]. In 2013, Mahajan performed forensic analysis of WhatsApp and Viber on five Android phones using UFED and manual analysis [12]. Katie Corcoran conducted forensics on seven messaging applications and first analyzed the evidence of Facebook Messenger [5]. Levendoski concluded that artifacts of the Yahoo Messenger client produced a different directory structure on Windows Vista and 7 [11]. Wong and Al Mutawa demonstrated that artifacts of the Facebook web application could be recovered from memory dumps and web browsing cache [15, 25].

Said investigated Facebook and other IM applications and determined that only the BlackBerry Bold 9700 and iPhone 3G/3GS provided unencrypted evidence of Facebook [19]. Sgaras analyzed Skype and several other VoIP applications for iOS and Android platforms [20]. It was concluded that the Android apps store far fewer artifacts than the iOS apps. Chu focused on live data acquisition from a personal computer and was able to identify distinct strings that will assist forensic practitioners with reconstruction of previous Facebook sessions [4]. The analysis was conducted on an iPhone running iOS6 and a Samsung Galaxy Note running Android 4.1. Walnycky analyzed 20 popular instant messaging applications for Android; for Facebook Messenger, evidence such as text chat, voice call, audio, video, image, location, and stickers could be obtained [24]. Azfar adapted a widely used adversary model from the cryptographic literature to formally capture a forensic investigator’s capabilities during the collection and analysis of evidentiary materials from mobile devices [2].

William Glisson explored the effectiveness of different forensic tools and techniques for extracting evidence on mobile devices [9]. In 2015, Nikos Virvilis presented studies on the security of web browsers and reported the shortcomings and vulnerabilities of browsers operated on desktop and mobile devices. It was found that some browsers using secure browsing protocols had actually limited their own protection level [23]. Dezfouli examined four social networking applications (Facebook, Twitter, LinkedIn, and Google+) on Android and iOS platforms to detect remnants of users’ activities that are of forensic interest [6]. In 2017, Yusoff reported the results of the investigation and analysis of three social media services (Facebook,

Twitter, and Google+) as well as three instant messaging services (Telegram, OpenWapp, and Line) for forensic investigators to examine residual remnants of forensic value in Firefox OS [27]. Song-Yang Wu described several forensic examinations of Android WeChat and provided corresponding technical methods [26]. Imam Riadi compared the performance of tools used to find digital evidence, including chats and pictures, from Instagram Messenger [18]. Although Zhang et al. analyzed the local artifacts of four instant messaging applications, the types of these local artifacts are incomplete [28]. Jusop Choi analyzed the personal data files in three instant messaging applications (KakaoTalk, NateOn, and QQ), which are the most popularly used in China and South Korea [3].

This paper investigates the user activities of Facebook Messenger through Android smart phones. We conducted forensics on Non-Rooted and Rooted Android platforms, and explored and compared the types of user behavior that leave digital evidence on the device. The results will serve as a reference for future researchers in social network cybercrime investigation or digital forensics.

3 Methodology

In our research, we use a smart phone with an installation of Facebook Messenger. The study focused on identifying data remnants of the activities of Messenger on an Android platform. This is undertaken to determine the remnants an examiner should search for when use of the instant messenger is suspected. Our research covers both Non-Rooted and Rooted Android platforms.

3.1 Research Goal

This paper studies user behaviors, including logging into Facebook Messenger, uploading images, exchanging information, GIS location sharing, and special application functions, under the Non-Rooted and Rooted Android environments. The study also explored and compared the types of user behavior that leave digital evidence on the device. We explore the differences between the artifacts that are left on Non-Rooted and Rooted Android platforms. We checked the changes and discrepancies in the residual digital data and relevant evidence on the Android smart phone.

3.2 Experimental Environment and Tools

In this paper, all the experiments were conducted on a real system. This study is built on a Sony Xperia Z1 C6902 with Android 4.3. Under the Android operating environment, the Facebook Messenger social networking application was installed to run the Messenger features directly. In addition, if Facebook Messenger has ever been installed, the "/com.facebook.orca" folder will appear in the directory structure.

Rooting is the process of allowing users to gain privileged control, known as root access, over various Android systems. Mobile phones, tablets, or any other electronic devices running the Android mobile operating system obtain the highest authority once they are rooted. Rooting is often carried out with the aim of overcoming limitations that mobile operators and developers put on some devices. In order to obtain more information from the mobile phone, investigators should execute a series of rooting processes before examining it.

XRY is a commercial digital forensics and mobile device forensics product by the Swedish company Micro Systemation. It is used to analyze and recover information from mobile devices. XRY is designed to recover the contents of a device in a forensic manner so that the contents of the data can be relied upon by the user. The XRY system allows for both logical examinations and physical examinations.

Autopsy is free computer software that makes it simpler to deploy many open source programs. The graphical user interface displays the results from the forensic search of the underlying volume, making it easier for investigators to flag pertinent sections of data. WinHex is a hex editor useful in data recovery and digital forensics. WinHex is a free powerful application that you can use as an advanced hex editor, a tool for data analysis, editing, and recovery, a data wiping tool, and a forensics tool used for evidence gathering.

SQLite is a software library that provides a relational database management system. The SQLite database is integrated with the application that accesses it: applications read and write directly from the database files stored on disk. SQLite is an open source SQL database that stores data in a single file on a device. Android comes with a built-in SQLite database implementation. The smart phone uses SQLite to read and analyze the database files on mobile devices. Android debug bridge (ADB) is a versatile command line tool that lets users communicate with connected Android devices or emulators. ADB commands also facilitate a variety of device actions, for example, installing or debugging applications. The specifications of all the tools we used are listed in Table 1.

Table 1: List of hardware and software used
Devices/Tools | Description | Specification/Versions
Sony Xperia Z1 C6902 | Android Smart Phone | Android 4.3, Memory 2GB/16GB
XRY | Mobile Forensics Tool | Version v7.4.1
Autopsy | Digital Forensics Tool | Version v4.4.1
WinHex | Digital Forensics Tool | Version v18.9
SQLite Expert Personal | Database Management Tool | Version v3.5.96.2516
Messenger | Social Networking App | Version v141.0.0.31.76
Minimal ADB and Fastboot | ADB Tool | Version v10.0.16299.371

3.3 Development of Experiments

Based on the experimental environment designed, we run the Messenger features, including logging in, sending messages, exchanging photos, videos, audio, and files, making a call, etc. After that, the relevant evidence on each device was extracted and analyzed using forensic tools.

3.3.1 Extract Data

XRY is a commercial tool specifically for mobile phone forensics. Besides using the XRY tool, we need to back up an image file of the smart phone for analysis with Autopsy and WinHex. We create an image file for physical memory on the smart

phone and use forensic tools to extract and analyze important digital evidence from the image file.

We can extract data and create an image file from physical memory. We connect the mobile phone to the computer using the phone cable. First, we enter the "adb devices" command to connect the two devices. If the two devices are connected, the output shows the list of devices attached. Next, we enter the "adb shell" command to open a remote shell, and the prompt becomes "$". Then, we need to obtain administrator-level permissions. Thus, we enter the "su" command, and the prompt becomes "#". Now, we can enter the "busybox df -h" command to inspect the system partitions, paths, volumes, space usage, available space, and so on. Most of the application data installed and stored on the phone is located on the data partition. Therefore, we are interested in this data partition. The path of the data partition is "/dev/block/by-name/data". Now, we use the "dd" command to create an image file for this partition. The "dd" command performs physical imaging with a bit-by-bit copy. We enter the "busybox dd if=/dev/block/by-name/data of=/storage/MicroSD/test conv=noerror bs=4096" command to create an image file. The string after "if" is the partition to be imaged. The string after "of" is the image output path. In the experiment, we name the output image file "test.img" and store it on the external SD card. The "conv=noerror" option tells dd to continue without interruption when a read error occurs. The "bs" option sets the block size that is read and written at a time.

3.3.2 Design of Analysis

In order to ensure the integrity of digital evidence and avoid interference between digital evidence, we divide the experiment into three cases according to the forensic tool used: XRY, Autopsy, and WinHex. The three experimental scenarios are Scenario 1: XRY, Scenario 2: Autopsy, and Scenario 3: WinHex. In each experimental scenario, the same experimental steps are performed.

In addition, each of the above experimental scenarios is further divided into two modes: Non-Rooted and Rooted Android platforms. We then performed the same experiments on the Non-Rooted and Rooted Android platforms and compared the relevant evidence.

3.3.3 Experiment Elaboration

The experiment steps are summarized as follows.

1) We install the Messenger software on the smart phone, and the forensic tool on the personal computer.

2) We log into Messenger and run its various features so that the users' activities leave material evidence.

3) After the activities are completed, the Messenger software is logged out.

4) We use the forensic tool to find all kinds of artifacts of the Messenger software on the smart phone.

5) We perform a comprehensive analysis of the evidence.

4 Results and Findings

In this section, we use three scenarios to describe the results and findings. Each scenario has two modes, the Non-Rooted mode and the Rooted mode. The details of the results and findings are as follows.

4.1 Scenario 1: XRY

In Scenario 1, we follow the experiment steps in Section 3.3 and use the XRY tool to find evidence of the activities on Messenger.

4.1.1 Non-Rooted Mode

Using the XRY tool to find evidence on Messenger, we cannot find the user's account credentials (account and password). Other artifacts of the user account can be found, as shown in Figure 1. The user's information in Figure 1 consists of the phone number +886985028322, the nickname Huang Gordon, the profile picture URL, and the Facebook number 100021304820523.

Figure 1: The artifacts of user account

The friend list can be found, as shown in Figure 2. It includes the names and the profile picture URLs of friends.

Figure 2: The artifacts of friend list

The artifacts of sending text, image, audio, video, GIS location, and GIF animation can be found. For example, the artifacts of sending video are shown in Figure 3. The video delivery time, the URL of the video's storage location, and the sender of the video can be found.

Figure 4: The artifacts of making a call
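The acquisition commands described in Section 3.3.1 can be collected into a small host-side script. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, it assumes `adb` is on the host's PATH, that `busybox` is installed on the phone, and that the phone's `su` binary accepts `-c` (typical of common root managers, but not stated in the paper). The partition path, output path, and dd options are the ones reported above. By default it only prints the commands.

```python
import shlex
import subprocess

DATA_PARTITION = "/dev/block/by-name/data"  # data partition path reported in the paper
IMAGE_OUTPUT = "/storage/MicroSD/test"       # output path used in the paper's dd command


def build_dd_command(partition: str, output: str, block_size: int = 4096) -> str:
    """Return the busybox dd command line for a bit-by-bit partition image."""
    return f"busybox dd if={partition} of={output} conv=noerror bs={block_size}"


def as_root(remote_command: str) -> list:
    """Wrap a device-side command so that it runs under su (assumes su supports -c)."""
    # The remote command is passed to adb as ONE argument so the device shell,
    # not the host, splits it; shlex.quote keeps it intact for `su -c`.
    return ["adb", "shell", f"su -c {shlex.quote(remote_command)}"]


def acquisition_steps(partition: str = DATA_PARTITION, output: str = IMAGE_OUTPUT) -> list:
    """List the host-side commands in the order described in Section 3.3.1."""
    return [
        ["adb", "devices"],                           # check that the phone is attached
        as_root("busybox df -h"),                     # inspect partitions and space usage
        as_root(build_dd_command(partition, output)),  # image the data partition
    ]


def run_acquisition(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in acquisition_steps():
        print(" ".join(shlex.quote(part) for part in command))
        if not dry_run:
            subprocess.run(command, check=True)
```

Calling `run_acquisition()` prints the three commands without touching a device; `run_acquisition(dry_run=False)` would execute them and stop at the first failure.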

Figure 3: The artifacts of sending video

The artifacts of making a call can be found. We can find the calling record, which includes the calling time, the caller, and the threads_db2 database. For example, the artifacts of making a call are shown in Figure 4.

There are three main databases for the recovered Messenger artifacts. The threads_db2 database contains the sent messages. The call_log_db_10021304820523 database contains the settings of the user account; 10021304820523 is the network login number. The contacts_db2 database contains the user information. The contacts_db2 database is located in /data/data/com.facebook.katana/databases/. The threads_db2 database and the call_log_db_10021304820523 database are located in /data/data/com.facebook.orca/databases. There are three significant tables in the threads_db2 database: the messages table, the thread_users table, and the message_reactions table. The messages table stores the sent messages. An example of the messages table is shown in Figure 5. Evidence of the text, sender, and timestamp can be found in the table. We can find the user name, nickname, and network number in the thread_users table. The stickers can be found in the message_reactions table. From the call_log_db_10021304820523 database, we can find the user network number. The contacts_db2 database contains the contacts table, which includes the friend list. The nickname and the URL of the profile picture are in this table. The path /data/data/com.facebook.orca/ is the main storage location of Messenger. The storage paths of the different artifacts are shown in Table 2.

Figure 5: The artifacts of message table

Table 2: The storage path of various traces
Artifacts | Storage Path
Profile Picture | /data/data/com.facebook.orca/files/image/v2.ols100.1/52/
Friend Picture | /data/data/com.facebook.orca/files/image/v2.ols100.1/88/
Image | /data/data/com.facebook.orca/cache/image/v2.ols100.1/84/
Video | /data/data/com.facebook.orca/files/ExoPlayerCacheDir/videocache/
GIF Animation | /data/data/com.facebook.orca/cache/image/v2.ols100.1/32/
Sticker | /data/data/com.facebook.orca/cache/image/v2.ols100.1/99/
Audio | /data/data/com.facebook.orca/cache/audio/v2.ols100.1/88/

4.1.2 Rooted Mode

Root is the highest privilege on the mobile phone, equivalent to the administrator privilege on a Windows computer system. After obtaining the root privilege, all the files of the mobile phone can be read and modified.

Using the forensic tool XRY, we can find the artifacts of the Messenger account. They include the nickname, friend nickname, profile picture, user name, phone number, network number, and related data URLs. But the account and password cannot be found. The sent text messages, pictures, videos, GIF animations, stickers, audio files, GIS locations, and calling records can also be found. They include the message content, sending time, name of the database, information about the sender and the recipient, etc. According to the experiment, the sent messages are stored in the threads_db2 database, the account data is stored in the contacts_db2 database, and the network login number of Messenger can be learned from the call_log_db_10021304820523 database. The Rooted mobile phone experiment gives the same results as the Non-rooted mobile phone experiment, and artifacts can be found in various tables of the different databases.

4.1.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using XRY is shown in Table 3.

Table 3: The comparison of Rooted Mode and Non-Rooted Mode using XRY
Evidences | Rooted | Non-Rooted
Account | None | None
Password | None | None
Profile Name | Found | Found
Nickname | Found | Found
Friend Nickname | Found | Found
Text | Found | Found
Image | Found | Found
Video | Found | Found
GIF Animation | Found | Found
Stickers | Found | Found
Audio | Found | Found
GIS Location | Found | Found
Calling | Found | Found

4.2 Scenario 2: Autopsy

In Scenario 2, we follow the experiment steps in Section 3.3 and use the Autopsy tool to find the artifacts of the activities on Messenger.

4.2.1 Non-Rooted Mode

Autopsy is a free information security forensics tool that provides a graphical interface for digital forensic investigation. It can analyze Windows and UNIX disks and file systems such as NTFS, FAT, UFS1/2, and Ext2/3. In the case of the Non-rooted mobile phone, we use the Autopsy tool to open the smart phone backup image file and analyze it. As a result, no artifacts of Messenger can be found.

4.2.2 Rooted Mode

We use the Autopsy tool to open the backup image file of the smart phone. Our nickname, friend nickname, text, image, video, GIF animation, sticker, audio file, GIS location, and calling record can be found. But the account number, password, and user name cannot be found. We also find databases such as threads_db2, call_log_db_10021304820523, and contacts_db2. There is no data in the message_reactions table of the threads_db2 database. The artifacts of Messenger displayed by Autopsy are shown in Figure 6.

Figure 6: The artifacts of Messenger using Autopsy

The user nickname and the network number can be found in the thread_users table of the threads_db2 database. We also find the sent text, sticker, GIS location, and calling record in the messages table of the same database. The artifacts of video are located in data/com.facebook.orca/files/ExoPlayerCacheDir/videocache/. We find the artifacts of audio in data/com.facebook.orca/cache/audio/. The image and GIF animation artifacts are in data/com.facebook.orca/cache/image/. For example, the artifact of GIS location is shown in Figure 7.

Figure 7: The artifacts of GIS location using Autopsy
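The database tables named above can also be inspected with a few lines of script once a database file has been exported from the image. The sketch below uses Python's standard `sqlite3` module and is only an illustration under stated assumptions: the helper names are hypothetical, and the column names in the stand-in database (`text`, `sender`, `timestamp_ms`, and so on) are invented for the demonstration, since the paper does not list the real schema. The helpers themselves make no assumption about columns; they list whatever tables exist and return rows keyed by the actual column names.

```python
import sqlite3


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open an exported database copy read-only so the evidence file is not altered."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def dump_table(conn: sqlite3.Connection, table: str, limit: int = 20) -> list:
    """Return up to `limit` rows of `table` as dictionaries keyed by column name."""
    if table not in list_tables(conn):  # also guards the identifier used below
        raise ValueError(f"no such table: {table}")
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,))
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def demo_database() -> sqlite3.Connection:
    """Build an in-memory stand-in; the table columns here are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (text TEXT, sender TEXT, timestamp_ms INTEGER)")
    conn.execute("CREATE TABLE thread_users (name TEXT, user_key TEXT)")
    conn.execute("INSERT INTO messages VALUES ('hello', 'user_a', 1500000000000)")
    return conn
```

On a real case, `open_read_only` would be pointed at a working copy of an exported file such as threads_db2, and `list_tables` would be run first to see which of the tables discussed in Sections 4.1.1 and 4.2.2 are present before dumping any of them.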

4.2.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using Autopsy is shown in Table 4.

Table 4: The comparison of Rooted Mode and Non-Rooted Mode using Autopsy
Evidences | Rooted | Non-Rooted
Account | None | None
Password | None | None
Profile Name | None | None
Nickname | Found | None
Friend Nickname | Found | None
Text | Found | None
Image | Found | None
Video | Found | None
GIF Animation | Found | None
Stickers | Found | None
Audio | Found | None
GIS Location | Found | None
Calling | Found | None

4.3 Scenario 3: WinHex

In Scenario 3, we follow the experiment steps in Section 3.3 and use the WinHex tool to find the artifacts of the activities on Messenger.

4.3.1 Non-Rooted Mode

WinHex is a disk editor and a hex editor useful in data recovery and digital forensics. WinHex is a free powerful application that you can use as an advanced hex editor, a tool for data analysis, editing, and recovery, and a forensics tool used for evidence gathering. In the case of the Non-rooted mobile phone, we use the WinHex tool to open the smart phone backup image file and analyze it. As a result, no artifacts of Messenger can be found.

4.3.2 Rooted Mode

We use the WinHex tool to open the backup image file of the smart phone. Our nickname, friend nickname, and text can be found. But the image, video, GIF animation, sticker, audio file, GIS location, calling record, account number, password, and user name cannot be found. For example, we find the user login email and nickname using the "username" keyword, as shown in Figure 8.

Figure 8: The artifacts of the user using WinHex

4.3.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using WinHex is shown in Table 5.

Table 5: The comparison of Rooted Mode and Non-Rooted Mode using WinHex
Evidences | Rooted | Non-Rooted
Account | None | None
Password | None | None
Profile Name | None | None
Nickname | Found | None
Friend Nickname | Found | None
Text | Found | None
Image | None | None
Video | None | None
GIF Animation | None | None
Stickers | None | None
Audio | None | None
GIS Location | None | None
Calling | None | None

5 Discussions

Using the XRY tool, the path /data/data/com.facebook.orca/ is the main storage location of Messenger. Under this path, two folders, files and cache, hold most of the multimedia artifacts. They include the profile picture, image, GIF animation, sticker, and audio files. The video artifacts are located in the ExoPlayerCacheDir folder. In addition, the storage path after the smart phone has been Rooted is the same as that of the Non-rooted phone, but the pathname is changed to /userdata/data/com.facebook.orca/. Both the Non-rooted smart phone and the Rooted smart phone have the same artifacts. The reason is that XRY downgrades the version of Messenger in order to obtain many artifacts.

According to the experiments, in the case of the Non-rooted smart phone, the forensic tools Autopsy and WinHex could not find any artifacts of Messenger. In contrast, the forensic tool XRY bypasses the security protection mechanism by downgrading the Messenger version, so we can obtain many artifacts of Messenger. But the account number and password cannot be found. This shows that XRY is the better tool for forensic analysis of the Non-rooted smart phone.

On the Rooted smart phone, the forensic tool Autopsy can find many traces, but the account number, password, and user name cannot be found. The forensic tool WinHex can find our nickname, friend nickname, and text. According to the results of the experiments, it is

Table 6: The findings of three forensic tools Forensic Tools XRY Autopsy WinHex Messenger Artifacts Non-Rooted Rooted Non-Rooted Rooted Non-Rooted Rooted Account None None None None None None Password None None None None None None User name Found Found None None None None My nickname Found Found None Found None Found Friend nickname Found Found None Found None Found Text Found Found None Found None Found Image Found Found None Found None None Video Found Found None Found None None GIF animation Found Found None Found None None Sticker Found Found None Found None None Audio Found Found None Found None None GIS location Found Found None Found None None Calling log Found Found None Found None None

Table 7: The results of forensic analysis of relevant researchers for Facebook messenger Researcher Artifacts Katie Corcoran [5] User name, my nickname, friend nickname, text, GIS location, timestamp Hao Zhang [28] User name, my nickname, friend nickname, text, image, audio, GIS location, times- tamp Ming-Sang Chang & Chih-Ping User name, my nickname, friend nickname, text, image, video, GIF animation, Yen (this work) sticker, audio, GIS location, calling log, timestamp

proved that the forensic tool XRY performs better for forensic analysis on both rooted and non-rooted smartphones. However, given funding or other restrictions, the free forensic tools Autopsy or WinHex can be used to assist the forensic analysis work.

The experiments show the investigation steps for instant messaging. First, we find the basic information of the user, and then search for the user's behavior through keyword strings such as the account number, nickname, and network number. The user artifacts are then used to estimate possible crimes. Other accounts that may be discovered during the search can also be used, which may help to expand the search scope to see whether there are accomplices or other victims.

We can also analyze the relationship between the criminal modus operandi and the timing chain. The investigator can infer the motives and tactics of possible crimes through the timing chain, and discover the criminal accomplices, transaction content, plans, time and place. When investigators find all kinds of traces of crimes, they can prevent crimes in advance.

Finally, we summarize the findings of the three forensic tools in non-rooted mode and rooted mode in Table 6. The results of forensic analysis by relevant researchers for Facebook Messenger are shown in Table 7.

6 Conclusions

Although instant messaging software has the advantages of convenience and immediacy, it is inevitable that it will be abused by cyber criminals. Crimes are often committed by exploiting software, website and web application vulnerabilities, using cloud services to spread malware, and further exploiting social media posts and links to trick users into fraud traps.

In this paper, we investigated the Facebook Messenger app to conduct a forensic analysis of user behaviors in Android environments. The study found that different activities can lead to discrepancies in the recording of user behaviors on the same social network.

While investigating cybercrime on Facebook Messenger, we recommend that the first goal should be finding the account number, nickname, and network number of the criminal suspect. Using the account number and nickname, the operational behaviors of the criminal suspect on the social network can be searched, such as uploading pictures, sending text, calling logs, and timestamps. Then, based on the contents of the operations, the possible criminal activity or victimization practice can be deduced or estimated. At the same time, using the additional account numbers that may be discovered during the evidence gathering phase, the scope of the investigation can be expanded to find possible accomplices or other victims. The full evidence scenario obtained through step-by-step and layer-by-layer outward expansion will be the key to solving the case.

International Journal of Network Security, Vol.22, No.5, PP.828-837, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).13) 836

References

[1] N. Al-Suwaidi, H. Nobanee, and F. Jabeen, “Estimating causes of cyber crime: Evidence from panel data fgls estimator,” International Journal of Cyber Criminology, vol. 12, pp. 392–407, 2018.
[2] A. Azfar, K. Choo, and L. Liu, “An android social app forensics adversary model,” in Proceedings of the International Conference on System Sciences, pp. 5597–5606, 2016.
[3] J. Choi, J. Yu, S. Hyun, and H. Kim, “Digital forensic analysis of encrypted database files in instant messaging applications on windows operating systems: Case study with kakaotalk, nateon and QQ messenger,” Digital Investigation, vol. 28, pp. 50–59, Apr. 2019.
[4] H. C. Chu, D. J. Deng, and J. H. Park, “Live data mining concerning social networking forensics based on a facebook session through aggregation of social data,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 7, pp. 1368–1376, 2011.
[5] K. Corcoran, A. Read, J. Brunty, and T. Fenger, “Messaging application analysis for android and iOS platforms,” Research and Seminar in Forensic Science Graduate Program, 2013. (http://www.marshall.edu/forensics/research-and-seminar/class-of-2013)
[6] F. N. Dezfouli, A. Dehghantanha, B. Eterovic-Soric, and K. R. Choo, “Investigating social networking applications on smartphones detecting facebook, twitter, linkedin and google+ artefacts on Android and iOS platforms,” Australian Journal of Forensic Sciences, vol. 48, pp. 469–488, 2016.
[7] M. Dickson, “An examination into msn messenger 7.5 contact identification,” Digital Investigation, vol. 3, no. 2, pp. 79–83, 2006.
[8] M. Dickson, “An examination into yahoo messenger 7.0 contact identification,” Digital Investigation, vol. 3, no. 3, pp. 159–165, 2006.
[9] W. B. Glisson, T. Storer, and J. Buchanan-Wollaston, “An empirical comparison of data recovered from mobile forensic toolkits,” Digital Investigation, vol. 10, no. 1, pp. 44–55, 2013.
[10] J. Golbeck, Introduction to Social Media Investigation, pp. 273–278, 2015. ISBN: 9780128016565.
[11] M. Levendoski, T. Datar, and M. Rogers, “Yahoo! messenger forensics on windows vista and windows 7,” Digital Forensics and Cyber Crime, vol. 88, pp. 172–179, 2012.
[12] A. Mahajan, M. S. Dahiya, and H. P. Sanghvi, “Forensic analysis of instant messenger applications on android devices,” International Journal of Computer Applications, vol. 68, no. 8, pp. 38–44, 2013.
[13] E. Martellozzo and E. A. Jane, Cybercrime and Its Victims, 2017. ISBN10: 1138639443.
[14] D. K. Mendoza, “The vulnerability of cyberspace – the cyber crime,” Journal of Forensic Sciences & Criminal Investigation, vol. 2, no. 1, 2017.
[15] N. Mutawa, I. Awadhi, I. Baggili, and A. Marrington, “Forensic artifacts of facebook’s instant messaging service,” in Proceedings of the International Conference for Internet Technology and Secured Transactions (ICITST’11), pp. 771–776, 2011.
[16] D. Quick and K. K. Choo, “Pervasive social networking forensics: Intelligence and evidence from mobile device extracts,” Journal of Network and Computer Applications, vol. 86, pp. 24–33, 2017.
[17] J. Reust, “Case study: Aol instant messenger trace evidence,” Digital Investigation, vol. 3, no. 4, pp. 238–243, 2006.
[18] I. Riadi, A. Yudhana, and M. C. F. Putra, “Forensic tool comparison on instagram digital evidence based on android with the nist method,” Scientific Journal of Informatics, vol. 5, no. 2, pp. 235–247, Nov. 2018.
[19] H. Said, A. Yousif, and H. Humaid, “Iphone forensics techniques and crime investigation,” in Proceedings of the International Conference and Workshop on Current Trends in Information Technology, pp. 120–125, 2011.
[20] C. Sgaras, M. T. Kechadi, and N. A. Le-Khac, “Forensics acquisition and analysis of instant messaging and voip applications,” Computational Forensics, pp. 188–199, 2015.
[21] P. Shakarian, A. Bhatnagar, A. Aleali, E. Shaabani, and R. Guo, “Diffusion in social networks,” Computer Science, 2015. ISBN: 978-3-319-23105-1.
[22] Statista, Most Popular Global Mobile Messenger Apps as of October 2019, Based on Number of Monthly Active Users [Online], 2020. (https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps) [last accessed 10.01.2020].
[23] N. Virvilis, A. Mylonas, N. Tsalis, and D. Gritzalis, “Security busters: Web browser security vs. rogue sites,” Computer & Security, vol. 52, pp. 90–105, 2015.
[24] D. Walnycky, I. Baggili, and A. Marrington, et al., “Network and device forensic analysis of android social-messaging applications,” Digital Investigation, vol. 14, pp. 77–84, 2015.
[25] K. Wong, A. Lai, J. Yeung, and W. Lee, et al., “Facebook forensics,” valkyrie-x security research group, 2011. (https://www.fbiic.gov/public/2011/jul/facebook_forensics-finalized.pdf)
[26] S. Y. Wu, Y. Zhang, and X. P. Wang et al., “Forensic analysis of wechat on android smartphones,” Digital Investigation, vol. 21, pp. 3–10, 2017.
[27] M. N. Yusoff, A. Dehghantanha, and R. Mahmod, “Chapter 4 – forensic investigation of social media and instant messaging services in firefox os: Facebook, twitter, google+, telegram, openwapp, and line as case studies,” Contemporary Digital Forensic Investigations of Cloud and Mobile Applications, pp. 41–62, 2017.

[28] H. Zhang, L. Chen, and Q. Liu, “Digital forensic analysis of instant messaging applications on android smartphones,” International Conference on Computing, Networking and Communications (ICNC’18), pp. 647–651, 2018.

Biography

Ming Sang Chang received the Ph.D. degree from National Chiao Tung University, Taiwan, in 1999. In 2001 he joined the faculty of the Department of Information Management, Central Police University, where he is now a Professor. His research interests include Computer Networking, Network Security, Digital Investigation, and Social Networks.

Chih Ping Yen is an Associate Professor in the Department of Information Management, Central Police University. He received his Ph.D. degree from the Department of Computer Science and Information Engineering, National Central University, Taiwan, in 2014. His research interests include Digital Investigation, Artificial Intelligence & Pattern Recognition, Image Processing, and Management Information Systems.

International Journal of Network Security, Vol.22, No.5, PP.838-844, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).14) 838

Protection of User Data by Differential Privacy Algorithms

Jian Liu1 and Feilong Qin2 (Corresponding author: Feilong Qin)

School of Automobile and Transportation, Chengdu Technological University1
School of Big Data and Artificial Intelligence, Chengdu Technological University2
No. 1, The second section of Zhongxin Avenue, Pidu District, Chengdu, Sichuan 611730, China
(Email: [email protected])
(Received Apr. 13, 2019; Revised and Accepted Jan. 5, 2020; First Online July 13, 2020)

Abstract

With the growing number of social software users, increasingly large social networks have appeared. These social networks contain a large amount of sensitive user information, so privacy protection processing is needed before social network information is released. This paper introduces the hierarchical random graph (HRG) based differential privacy algorithm and the single-source shortest path based differential privacy algorithm. The performance of the two algorithms was then tested on two unweighted artificial networks generated by the LFR tool and two weighted real networks crawled by crawler software. The results show that after a social network is processed by a differential privacy algorithm, the average clustering coefficient decreases and the expected distortion increases. The smaller the privacy budget, the larger the decrease and the larger the increase. Under the same privacy budget, the average clustering coefficient and expected distortion of the single-source shortest path differential privacy algorithm are smaller. In terms of execution efficiency, the larger the social network, the more time processing takes, and the single-source shortest path based differential privacy algorithm spends less time on the same network.

Keywords: Differential Privacy; Hierarchical Random Graph; Single Source Shortest Path Model; Social Network

1 Introduction

The popularity of wireless communication technology and intelligent mobile terminals makes communication more and more convenient, and a variety of community communication applications bring more and more registered users onto the Internet, building a vast and sophisticated social network [1, 12]. A social network contains many kinds of relevant information. Service providers of application software mine this information using big data mining technology, analyze users' preferences, and provide more accurate personalized services [9]. However, the social network also contains sensitive private information, which is usually collected and archived by service providers, so the protection of private and confidential data has become a critical issue for service providers.

Traditional privacy protection mainly encrypts sensitive data, but this method increasingly struggles to play an active role in the context of big data mining [13]. The differential privacy algorithm is a method for dealing with this problem. Its basic principle is to disturb the original data and network structure, including adding, deleting, and exchanging, so that the disturbed data differ from the original data, i.e., the original data are protected by publishing the disturbed data. To reduce the large amount of noise caused by separate privacy in related data sets, Zhu et al. [15] proposed an effective correlated differential privacy solution. They found that the scheme was superior to the traditional differential privacy scheme in terms of mean square error on a large group of queries. Li et al. [7] proposed segmentation mechanisms based on privacy perception and utility to deal with the personalized privacy parameters of every individual in the data set and maximize the efficiency of the differential privacy calculation. Experiments on a large number of original data sets verified the effectiveness of the method. Chen et al. [2] proposed two optimization techniques, PrivTHR and PrivTHREM, to optimize the differential privacy in wave clusters, and the simulation results showed that the optimization techniques were highly practical when the privacy budget allocation was appropriate. This paper briefly introduces the differential privacy algorithm based on a hierarchical random graph (HRG) and the differential privacy algorithm based on the single-source shortest path. The performance of the two algorithms was then tested on two unweighted artificial networks generated by the LFR tool and two weighted real networks crawled by crawler software.

2 Differential Privacy Algorithm

2.1 The Concept of Differential Privacy

Visually, a social network is a graph of points and lines. Every node represents a user, while a line represents the connection between two users. The points and lines in the social network diagram contain various vital data. At present, the commonly used social network privacy methods fall into two categories, both of which substantially change the overall structure of the social network graph. One is to cluster the network nodes into "clusters" by using a clustering algorithm [8] and then encrypt them; the other is to add disturbance to the network graph structure, including deleting, exchanging, and adding nodes and connections. Although the former can hide the private data well, it seriously destroys the local structure of the network and affects the typical mining of the network structure data. Although the latter disturbs the network structure, the overall scale stays the same, and the impact on the regular use of the data is not significant. Differential privacy is one of the protection methods of the latter kind. It is defined as follows. If the following inequality holds:

$$\Pr(F(D_1) \in S) \le e^{\varepsilon} \Pr(F(D_2) \in S),$$

then the algorithm F satisfies ε-differential privacy. Here D_1 and D_2 are two data sets that differ in only one record; S is any set of outputs of algorithm F on D_1 and D_2 (a subset of the range of F); ε is called the privacy budget [5], and its value determines the degree of protection that differential privacy gives the data: the smaller the value, the higher the degree of protection and the larger the added disturbance.

Figure 1: The schematic diagram of social network and one of its HRG

2.2 Differential Privacy Algorithm Based on HRG

HRG [6] divides the hierarchical structure of G = (V, E) using a binary tree, in which V is the set of nodes in the network and E is the set of relationships among network nodes. Figure 1 shows one possible HRG of a social network. The division of G by the binary tree is similar to a random dichotomy of the node set. As shown in Figure 1, the HRG dichotomizes five nodes into the groups (1, 2, 3) and (4, 5). The binary tree root of the two groups (i.e., the box in Figure 1) carries the connection probability of the two groups, given by:

$$p_r = \frac{e_r}{n_{L,r} \cdot n_{R,r}},$$

where r is an internal node (root node) in the sample tree, i.e., a box node of the HRG in Figure 1, n_{L,r} and n_{R,r} are the numbers of network nodes on the left and right sides under root node r, and e_r is the number of connection edges between the node sets on the two sides of the root node. After that, the dichotomy of the groups continues, and the connection probability is calculated until the segmentation completes. The HRG based differential privacy algorithm is as follows.

1) First, the sample tree of the HRG is sampled by the Markov Chain Monte Carlo (MCMC) method [3]. The details are as follows. A sample tree (HRG) T_0 is randomly selected. Then a neighbor tree is generated from the previous sample tree, and whether to update the sample tree is determined by the acceptance probability. The formulas are:

$$
\begin{aligned}
T_i &= \begin{cases} T' & \text{with probability } \alpha \\ T_{i-1} & \text{with probability } 1-\alpha \end{cases} \\
\alpha &= \min\left(1, \exp\left(\varepsilon_1 \left(\log L(T') - \log L(T_{i-1})\right)\right)\right) \\
\log L(T) &= -\sum_{r \in T} n_{L,r}\, n_{R,r}\, h(p_r) \\
h(p_r) &= -p_r \log p_r - (1 - p_r) \log(1 - p_r)
\end{aligned}
\qquad (1)
$$

where T_i, T_{i-1}, and T' are the sample tree after the update, the sample tree before the update, and the neighbor tree of T_{i-1}, respectively; α is the acceptance probability; log L(T) is the logarithm of the similarity between the sample tree and G; and h(p_r) is the Gibbs-Shannon entropy function. Through Equation (1), the sample tree is updated iteratively until the change in log L(T) between successive iterations is smaller than the set threshold. Samples are then selected after a certain number of iterations, and finally the stable sample tree set S_ST = (T_S1, T_S2, ..., T_SN) is obtained through sampling, where T_SN is the sample tree obtained by the N-th sampling after the sample tree has become stable through iterations.

2) Laplace noise [5] is added to the sample tree set S_ST. After noise addition, the connection probability of internal node r in the sample tree is calculated as:

$$p'_r = \min\left(1, \frac{e_r + \mathrm{laplace}(\varepsilon_2^{-1})}{n_{L,r} \cdot n_{R,r}}\right),$$

where p'_r is the connection probability of internal node r after noise addition.
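The quantities used in steps 1) and 2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a sample tree is reduced to a hypothetical list of `(e_r, n_left, n_right)` tuples, the function names are invented for this example, the Laplace sampler is built from two exponential draws, and the lower clamp at 0 on the noisy probability is an added assumption (the text only caps the value at 1).

```python
import math
import random


def connection_probability(e_r, n_left, n_right):
    """p_r = e_r / (n_{L,r} * n_{R,r}) for one internal node r."""
    return e_r / (n_left * n_right)


def entropy(p):
    """Gibbs-Shannon entropy h(p); taken as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def log_likelihood(internal_nodes):
    """log L(T) = -sum over r of n_L * n_R * h(p_r).

    internal_nodes is a list of (e_r, n_left, n_right) tuples, a
    simplified (hypothetical) stand-in for a real sample tree.
    """
    return -sum(
        n_l * n_r * entropy(connection_probability(e_r, n_l, n_r))
        for e_r, n_l, n_r in internal_nodes
    )


def acceptance_probability(log_l_new, log_l_old, eps1):
    """alpha = min(1, exp(eps1 * (log L(T') - log L(T_{i-1}))))."""
    diff = eps1 * (log_l_new - log_l_old)
    # Returning 1.0 directly for diff >= 0 avoids overflow in exp().
    return 1.0 if diff >= 0 else math.exp(diff)


def laplace(scale, rng=random):
    """Zero-mean Laplace sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_connection_probability(e_r, n_left, n_right, eps2, rng=random):
    """p'_r with Laplace noise of scale 1/eps2 added to the edge count."""
    noisy = (e_r + laplace(1.0 / eps2, rng)) / (n_left * n_right)
    # The lower clamp at 0 is an added assumption; the text only caps at 1.
    return min(1.0, max(0.0, noisy))


# Hypothetical root splitting five nodes into (1, 2, 3) and (4, 5),
# assuming 2 edges cross between the two groups.
p_root = connection_probability(2, 3, 2)
p_noisy = noisy_connection_probability(2, 3, 2, eps2=0.5, rng=random.Random(0))
```

A smaller `eps2` gives a larger noise scale, so `p_noisy` drifts further from `p_root` on average, which matches the role of the privacy budget described in Section 2.1.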

3) The lower triangular matrix of every HRG in S_ST after noise addition is calculated, and then the lower triangular mean value matrix of S_ST is calculated. Each element in the lower triangular matrix is the connection probability of a pair of network nodes, which can be obtained through the multiplication of p'_r in the HRG.

4) According to the connection probability of network nodes in the lower triangular mean value matrix, the connection edges between nodes are set.

2.3 Single Source Shortest Path Constraint Model Based Differential Privacy Algorithm

The HRG based differential privacy algorithm described above can effectively protect the privacy of social networks, but it is aimed mainly at unweighted social networks: although nodes in such networks differ in their degree of connection, every connection is treated as equally important, or its importance does not matter to the algorithm. However, with the expansion of the Internet and the increase of social software users, social networks not only grow in scale but also have different sensitivities between nodes. To describe social networks more accurately, in addition to the connections between nodes, weights are assigned to the connections to indicate their importance. The social network is then expressed as G = (V, E, W), where W is the weight set of the corresponding connection edges. For a weighted social network, the weights are also part of the sensitive information, so they must also be handled when differential privacy is applied to the network.

The HRG based differential privacy algorithm affects the edge weights when it processes a weighted social network. Once the edge weights change, the structure of the whole graph changes; although the information is protected, the data availability is seriously reduced. Therefore, in this study the constraint model of the social network was constructed with the single-source shortest path algorithm [10], and linear constraints were applied to the differential privacy disturbance on the basis of the constraint model. The single-source shortest path constraint model based differential privacy algorithm is divided into two steps: 1) building a single-source shortest path constraint model; 2) adding differential privacy noise.

1) The first step is to build a single-source shortest path constraint model. For G = (V, E, W), the nodes in the network are added to the corresponding spanning tree using the Dijkstra algorithm, and the constraint matrix representing the constraints is obtained. The relevant steps are as follows.

a. First, a node v_0 is selected from the network as the source point and added to set V_0, and then the nodes that can be reached in one step from v_0 are selected from the remaining nodes to form set Q.

b. The node µ that has the smallest edge weight with V_0 is selected from Q. Then a row is added to constraint matrix A according to the constraint condition $f(v_0, \mathrm{pre}_\mu) \le f(v_0, \mu)$. The values at the corresponding positions in the matrix are the constraint coefficients obtained from the set of constraint conditions, and pre_µ is the node µ previously selected from Q. Then µ is added to V_0, and the paths between V_0 and Q are updated. Set Q is updated, i.e., µ is deleted.

c. Step b repeats until Q becomes an empty set. Then the spanning tree composed of the nodes in the completed set V_0 and the corresponding connection edges is added to spanning tree sequence T.

d. A new source point is selected from the remaining nodes that have not yet been added, and then Steps a, b, and c repeat until all the nodes have been added. Finally, constraint matrix A and spanning tree sequence T are output.

2) After the single-source shortest path constraint model of the social network is obtained, differential privacy noise is added. The noise addition includes two aspects: one is to add constrained noise to the weights of the network connection edges, and the other is to disturb the network nodes. The algorithm steps of the former are as follows.

a. First, according to the spanning trees in spanning tree sequence T, edge set E in G is divided into E_T and E_N, where E_T is the edge set of the spanning trees and E_N is the remaining edge set.

b. Laplace noise is added to the edge weights in E_T, and the noise addition formula is:

$$w'_i = w_i + \mathrm{laplace}(S(f)/\varepsilon_1),$$

where S(f) stands for the sensitivity of f, i denotes the edge of the node pair queried by f, and w_i and w'_i are the edge weights of the corresponding node pair before and after noise addition, respectively.

c. The edge weights in E_N are solved based on the weights in E_T after noise addition, constraint matrix A, and the constraint inequalities.

d. Every spanning tree in spanning tree sequence T is processed as follows. A node pair that originally has no edge in the spanning tree is randomly selected, and a new edge is added between the node pair. The weight of the new edge is the smaller of the maximum weight in the spanning tree and the shortest path length of the node pair.

The algorithm steps of network node disturbance are as follows.

a. First, the number of nodes to be disturbed is calculated according to the set privacy budget:

$$N_n = |\mathrm{laplace}(1/\varepsilon_2)|.$$

b. To reduce the influence of disturbances such as the addition and deletion of nodes on the sensitivity of the query function, nodes whose degree is smaller than the set threshold are considered first. A node v is randomly selected, and then the node set V_1 connected with v is processed as follows. If $(\mu_1, v) \in E$, $(\mu_2, v) \in E$ and $f(\mu_1, \mu_2) = w(\mu_1, v) + w(v, \mu_2)$, then $w(\mu_1, \mu_2) = w(\mu_1, v) + w(v, \mu_2)$; if there is no edge between µ_1 and µ_2, then an edge is constructed. Here µ_1 and µ_2 are any two nodes in set V_1, and f is the query function in the single-source shortest path model, which returns the shortest path between two nodes. After the nodes in V_1 are processed, v and its edges are deleted.

Virtual nodes are added as follows. A node v is randomly selected, and a virtual node v_1 is added. A connection is added between v and v_1, and the weight of this edge is the average of the edge weights of the other nodes connected with v. Moreover, every node µ connected with v is also connected with v_1, and the weight is estimated as:

$$w(v_1, \mu) = w(\mu, v) + \mathrm{laplace}(S(f)/\varepsilon_2).$$

c. Step b repeats to disturb network nodes until the number of disturbed nodes reaches N_n. After the differential privacy noise has been added to the original social network G, the social network G' is output.

3 Simulation Experiment

3.1 Experimental Environment

In this study, the above algorithms were implemented in Python [4]. The experiment was carried out on a laboratory server configured with a Core i7 processor (2.6 GHz), the Windows 7 operating system, and 16 GB of memory.

3.2 Experimental Setup

The performance of the two differential privacy algorithms was tested on the artificial network data sets generated by the LFR tool and the Weibo social network data sets crawled by crawler software. The relevant parameters of the artificial network data sets generated by LFR and the Weibo social network data sets crawled from the Weibo interface are shown in Table 1.

Table 1: Data sets of artificial network and Weibo social network

Network | Number of nodes | Number of edges | Average node degree | Maximum number of nodes in the community | Minimum number of nodes in the community
LFR1 | 1200 | 4125 | 30 | 110 | 30
LFR2 | 5000 | 9230 | 35 | 250 | 50
Weibo 1 | 12530 | 151132 | 55 | 1123 | 122
Weibo 2 | 21650 | 213578 | 64 | 1624 | 231

LFR generated two artificial network data sets, and the artificial networks also form communities of different sizes to simulate real networks. In LFR1, there were 1200 nodes and 4125 edges, with an average node degree of 30; the maximum and minimum numbers of nodes in a community were 110 and 30, respectively. In LFR2, there were 5000 nodes and 9230 edges, with an average node degree of 35; the maximum and minimum numbers of nodes in a community were 250 and 50, respectively. The artificial networks generated by the LFR tool only contained node identification and connection relationships, i.e., they are undirected, unweighted network graphs. According to the preliminary statistics of the two networks composed of Weibo data crawled by crawler software, there were 12530 nodes and 151132 edges in Weibo 1, with an average node degree of 55, and the maximum and minimum numbers of nodes in a community were 1123 and 122, respectively; there were 21650 nodes and 213578 edges in Weibo 2, with an average node degree of 64, and the maximum and minimum numbers of nodes in a community were 1623 and 231, respectively. Besides the basic node identification and connection relationships, the real networks composed of Weibo data also included weight information such as attribute labels, so the real networks are weighted social networks.

For the above four social network data sets, the social networks were processed by the above two differential privacy algorithms. Privacy parameter ε of the two algorithms was set to 10, 1, and 0.1, respectively.

1) Privacy parameter ε = ε_1 + ε_2 was used when the social network was processed by the HRG based differential privacy algorithm (Algorithm 2.2), where ε_1 : ε_2 = 1 : 1.

2) Privacy parameter ε = ε_1 + ε_2 + ε_3 was used when the social network was processed by the single-source shortest path based differential privacy algorithm (Algorithm 2.3), where ε_1 : ε_2 : ε_3 = 2 : 1 : 2.

3.3 Performance Evaluation

In this study, the performance of the two algorithms was measured by the average clustering coefficient, the expected distortion degree, and the data processing time. The average clustering coefficient [14] reflects the structure of a social network. Comparing the average clustering coefficient of the network before and after differential privacy processing indicates the degree of privacy protection of an algorithm; the greater the difference, the higher the degree of privacy protection. The formula is:

$$C = \frac{1}{n} \sum_{i=1}^{n} \frac{2E_i}{k_i(k_i - 1)},$$

where n is the total number of nodes, E_i is the actual number of connections between the nodes adjacent to node i, and k_i is the number of nodes adjacent to node i.

The expected distortion degree [11] reflects the degree of distortion of the data after differential privacy processing and measures the availability of the data; the larger the value, the higher the degree of data distortion after processing and the lower the availability. The calculation formula is:

$$E[d(X, X')] = \sum_{X} \sum_{X'} p(x)\, q(x'|x)\, d(x, x'),$$

where X and X' are the data sets before and after differential privacy processing, respectively, d(x, x') is the Hamming distance between the data before and after processing, p(x) is the probability distribution of the data before processing, and q(x'|x) is the conditional transition probability of the differential privacy processing.

Due to the randomness of the noise added by the differential privacy algorithms, each algorithm was run 10 times on each social network under each privacy budget, and the average value was taken as the final result.

3.4 Experimental Results

The average clustering coefficient reflects the degree of clustering among nodes in the network, and it reflects the structural distribution of the network to some extent. In this study, the two differential privacy algorithms were applied to the four social networks under different privacy budgets. The average clustering coefficients before and after processing are shown in Figures 2 and 3. Algorithm 2.2 denotes the HRG based differential privacy algorithm, Algorithm 2.3 denotes the single-source shortest path based differential privacy algorithm, and the numbers in brackets after the algorithm denote the privacy budget adopted. Figures 2 and 3 show that the average clustering coefficients of the different social networks before and after differential privacy processing were different; the larger the scale of the social network, the larger the average clustering coefficient; the average clustering coefficient of the real Weibo networks was significantly larger than that of the artificial networks, because, in addition to the larger scale, connections between users in real networks are closer and more frequent.

Figure 2: Average clustering coefficients of two LFR obtained by two algorithms under different privacy budgets

Figure 3: Average clustering coefficients of two Weibo networks obtained by two algorithms under different privacy budgets

The comparison of the average clustering coefficient on the same network data set suggested that the average clustering coefficient decreased after differential privacy processing; the smaller the privacy budget, the larger the decrease. The comparison under the same privacy budget suggested that the average clustering coefficient of Algorithm 2.3 on the same social network was smaller. Overall, Algorithm 2.3 was better in the differential privacy protection

Figure 4: Expected distortion degrees of two LFR obtained by two algorithms under different privacy budgets

Figure 5: Expected distortion degrees of two Weibo networks obtained by two algorithms under different privacy budgets

of social networks.

The expected distortion degree reflects the average degree of distortion between the original data set and the data set after differential privacy processing. This index measures the loss of effective information during the differential privacy processing of social networks. Once the loss of effective information is too large, the social network no longer has value for information mining. Under different privacy budgets, the expected distortion degrees of the four social networks processed by the two differential privacy algorithms are shown in Figures 4 and 5. Figures 4 and 5 show that, for every network, the expected distortion degree after differential privacy processing increased as the privacy budget decreased; under the same privacy budget, the expected distortion degree of Algorithm 2.3 was smaller for every network; moreover, the expansion of the social network scale also increased the expected distortion degree of the networks after processing.

The purpose of applying a differential privacy algorithm to a social network is to add noise to the private information so as to achieve privacy protection. Therefore, in addition to the protection effect, the execution efficiency is also an important performance index. The average time taken by the two algorithms for the differential privacy processing of the four networks is shown in Figure 6. Figure 6 shows that the expansion of the social network scale and the existence of weights significantly increased the time required for differential privacy processing; on the same social network, the average time required by Algorithm 2.3 was significantly less than that of Algorithm 2.2, i.e., the single-source shortest path based differential privacy algorithm was more efficient for the differential privacy processing of social networks. The HRG based differential privacy algorithm needs to generate neighbor trees constantly while constructing the best-matched HRG and samples only after converging to stability; this process of converging and sampling takes time. The single-source shortest path constraint model is completed in one pass without repeated generation and convergence, so it took less time.

4 Conclusion

This paper briefly introduces the differential privacy algorithm based on HRG and the differential privacy algorithm based on the single-source shortest path. Then, two unweighted artificial networks generated by the LFR tool and two weighted real networks crawled by crawler software were used to test the performance of these two algorithms. The results are as follows.

1) After the two differential privacy algorithms process the social network, the average clustering coefficient is reduced; the lower the privacy budget, the

[5] A. Friedman, S. Berkovsky, M. A. Kaafar, “A dif- ferential privacy framework for matrix factorization recommender systems,” User Modeling and User- Adapted Interaction, vol. 26, no. 5, 2016. [6] K. Kalantari, L. Sankar, A. D. Sarwate, “Robust privacy-utility tradeoffs under differential privacy and hamming distortion,” IEEE Transactions on In- formation Forensics & Security, vol. 13, no. 11, pp. 1-1, 2018. [7] H. Li, L. Xiong, Z. Ji, X. Jiang, “Partitioning-based mechanisms under personalized differential privacy,” pp. 615-627, 2017. [8] R. Rogers, A. Roth, A. Smith, O. Thakkar, “Max- Figure 6: The average time of two algorithms for differ- information, differential privacy, and post-selection ential privacy processing hypothesis testing,” 2016. [9] T. Steinke, J. Ullman, “Between pure and approx- imate differential privacy,” Computer Science, vol. greater the reduction; under the same privacy bud- 8096, no. 2, pp. 363-378, 2015. get, the single-source shortest path algorithm can re- [10] X. C. Wang, Y. D. Li, “Geo-social network publica- duce more. tion based on differential privacy,” Frontiers of Com- puter Science, vol. 12, no. 6, 2018. 2) After the community network is processed by the dif- [11] X. Wu, Y. Wei, Y. Mao, L. Wang, “A differential ferential privacy algorithm, the smaller the privacy privacy DNA motif finding method based on closed budget of the algorithm, the greater the expected frequent patterns,” Cluster Computing, vol. 21, pp. distortion of the network; under the same privacy 1-13, 2018. budget, the expected distortion of the network pro- [12] B. Yang, I. Sato, H. Nakagawa, “Bayesian differential cessed by the algorithm based on the single-source privacy on correlated data,” in ACM Sigmod Inter- shortest path is smaller. national Conference on Management of Data, 2015. 3) As the scale of social networks increases, the time [13] D. Zhang, D. 
Kifer, “LightDP: Towards automating required for the two algorithms to process social net- differential privacy proofs,” ACM Sigplan Notices, works also increases, and the algorithm based on the vol. 52, no. 1, pp. 888-901, 2016. single-source shortest path requires less time to pro- [14] G. Q. Zhou, S. Qin, H. F. Zhou, “A differential pri- cess the same social network. vacy noise dynamic allocation algorithm for big mul- timedia data,” Multimedia Tools & Applications, vol. 78, no. C, pp. 1-19, 2018. References [15] T. Zhu, P. Xiong, G. Li, W. Zhou, “Correlated differ- ential privacy: Hiding information in non-IID data [1] A. Bhardwaj, V. Avasthi, S. Goundar, “Impact of set,” IEEE Transactions on Information Forensics Social Networking on Indian Youth: A Survey,” In- and Security, vol. 10, no. 2, pp. 229-242, 2015. ternational Journal of Electronics and Information Engineering, vol. 7, no. 1, pp. 41-51, 2017. [2] L. Chen, T. Yu, R. Chirkova, “Wave cluster with Biography differential privacy,” Computer Science, vol. 11, no. 2, pp. 191–198, 2015. Jian Liu, born in 1975, has received the doctor’s degree. [3] L. Chen, P. Zhu, “Preserving network privacy with He is a lecturer in Chengdu Technological University. He a hierarchical structure approach,” in 12th Interna- is interested in Internet of vehicles, big data analysis and tional Conference on Fuzzy Systems and Knowledge risk management. Discovery (FSKD’15), 2015. Feilong Qin, born in 1983, has received the doctor’s de- [4] G. Eibl, D. Engel, “Differential privacy for real smart gree. He is a lecturer in Chengdu Technological Univer- metering data,” Computer Science Research & De- sity. He is interested in mathematical geology and applied velopment, vol. 32, no. 1-2, pp. 173-182, 2016. statistics. International Journal of Network Security, Vol.22, No.5, PP.845-856, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).15) 845

Verifiable Attribute-based Keyword Search Encryption with Attribute Revocation for Electronic Health Record System

Zhenhua Liu1, Yan Liu1, Jing Xu1, and Baocang Wang2 (Corresponding author: Yan Liu)

School of Mathematics and Statistics, Xidian University1 Xi’an 710071, P. R. China State Key Laboratory of Integrated Services Networks, Xidian University2 (Email: ly10 [email protected]) (Received Apr. 7, 2019; Revised and Accepted Dec. 1, 2019; First Online Jan. 29, 2020)

Abstract

Considering the security requirements of electronic health record (EHR) system, we propose a ciphertext-policy attribute-based encryption scheme, which can support data retrieval, result verification and attribute revocation. In the proposed scheme, we make use of the BLS signature technique to achieve result verification for attribute-based keyword search encryption. In addition, key encrypting key (KEK) tree and re-encryption are utilized to achieve efficient attribute revocation. By giving thorough security analysis, the proposed scheme is proven to achieve: 1) Indistinguishability against selective ciphertext-policy and chosen plaintext attack under the decisional q-parallel bilinear Diffie-Hellman exponent hardness assumption; 2) Indistinguishability against chosen-keyword attack under the bilinear Diffie-Hellman assumption in the random oracle model. Moreover, the performance analysis results demonstrate that the proposed scheme is efficient and practical in electronic health record system.

Keywords: Attribute-Based Encryption; Attribute Revocation; Electronic Health Records; Keyword Search; Verifiability

1 Introduction

Electronic health record (EHR) system can provide health record storage service that allows patients to store, manage and share their EHR data with intended clients [13]. With the development of electronic health record system, much sensitive information from patients is being uploaded into the cloud. Since the cloud server may be dishonest, it is of vital importance to protect the confidentiality of the sensitive EHR data. Furthermore, how to securely share and search EHR data without revealing the information of patients remains to be solved.

Traditional public key encryption can only support the “one-to-one” model, which is not suitable for multi-client data sharing in EHR scenarios. Fortunately, Sahai and Waters [16] first proposed the concept of attribute-based encryption (ABE) in 2005, which can provide “one-to-many” service and is considered one of the most appropriate encryption technologies for cloud storage. Attribute-based encryption contains two variants: ciphertext-policy ABE (CP-ABE), where the ciphertext is associated with an access policy, and key-policy ABE (KP-ABE), where a client’s secret key is associated with an access policy. Furthermore, Narayan et al. [12] proposed a privacy preserving EHR system using attribute-based encryption technology in 2010, which enables patients to share their data among health care providers in a flexible, dynamic and scalable manner. Li et al. [9] designed a new ABE scheme for personal health records system using multi-authority ABE, which avoids the key escrow problem. Reedy et al. [14] proposed a secure framework for ensuring EHR’s integrity, and solved the key escrow issue by using a two-authority key generation scheme. Since then, some attribute-based encryption schemes [5, 8] for EHR system have been presented.

Although attribute-based encryption can achieve fine-grained data sharing, there are many problems to be considered in practical applications. For example, when a client leaves the system or discloses the secret key, it is essential to revoke the client’s attributes or secret key. In order to solve the problem, a lot of revocable ABE schemes (RABE) [3, 17, 23] have been put forward. Yu et al. [25] presented a revocable CP-ABE scheme by using proxy re-encryption, which allows an untrusted server to update a ciphertext into a new ciphertext without decryption. Hur et al. [7] proposed an attribute-based access control scheme with efficient revocation in data outsourcing system using key encrypting key (KEK) tree. By using the Chinese remainder theorem, Zhao et al. [26] introduced an efficient and revocable CP-ABE scheme in cloud computing. However, there are few RABE schemes for electronic health record system.

In addition, attribute-based encryption can protect data confidentiality, but hinders data retrieval from encrypted data in cloud storage. To address this issue, searchable encryption (SE) is proposed. SE contains two types: symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE). Song et al. [18] first proposed the concept of symmetric searchable encryption. Boneh et al. [1] introduced the first public-key encryption with keyword search (PEKS) scheme, and formalized a well-defined security notion of semantic security under chosen-keyword attack. After that, a lot of searchable encryption schemes [6, 15, 21] have been proposed. Furthermore, searchable encryption has been widely used in electronic health record system. For example, Xhafa et al. [24] presented an efficient fuzzy keyword search scheme with multi-user over encrypted EHR data. Florence et al. [4] proposed an enhanced secure sharing of personal health record system scheme with keyword search in cloud.

Searchable encryption allows a client to search over the encrypted data in cloud storage and retrieve the data of interest without decryption. Nevertheless, the semi-trusted cloud server may perform the search operation on the encrypted data and return only a fraction of the results. In order to resist the cloud server’s dishonest behavior, the verification technique [20] was introduced. Zheng et al. [27] proposed a verifiable attribute-based keyword search scheme using bloom filter and digital signature techniques, which has good performance in search efficiency, but needs huge computational overhead in the verification process. Sun et al. [19] introduced a verifiable attribute-based keyword search with fine-grained owner-enforced search authorization in the cloud, but the verification efficiency is low. Furthermore, Miao et al. [10] proposed a verifiable multi-keyword search over the encrypted cloud data for dynamic data-owner.

Unfortunately, the above existing schemes cannot simultaneously achieve fine-grained access control with attribute revocation, data retrieval and result verification for EHR system.

1.1 Our Contributions

Based on Waters’ scheme [22], we will propose a verifiable attribute-based keyword search encryption scheme with attribute revocation (VABKS-AR) for electronic health record system. Our contributions are described as follows:

1) We can achieve efficient attribute-level revocation by using a KEK tree and re-encryption. A KEK tree is utilized to distribute the attribute group key, and re-encryption assures that the updated ciphertext cannot be decrypted by the revoked clients.

2) Since the cloud service provider is semi-trusted, the result verification mechanism [10] is used to achieve the verifiability for attribute-based keyword search encryption, which can reduce the computational overhead of the client.

3) We provide thorough analysis of the security and performance of the proposed secure EHR sharing system. The performance analysis results show that the proposed scheme is efficient and practical for electronic health record system.

1.2 Organization

The rest of this paper is organized as follows. We describe some preliminaries and system architecture in Section 2. A formal definition and security model are given in Section 3. The proposed VABKS-AR scheme is presented in Section 4. The security proof and performance analysis are given in Section 5. Finally, we make the conclusions in Section 6.

2 Preliminaries and System Architecture

2.1 Bilinear Map

Let $G$ and $G_T$ be groups of prime order $p$, and $g$ be a generator of $G$. The map $\hat{e}: G \times G \rightarrow G_T$ is said to be an admissible map if it satisfies the following properties [22]:

1) Bilinearity: $\hat{e}(g^a, g^b) = \hat{e}(g, g)^{ab}$ for all $a, b \in \mathbb{Z}_p$.

2) Non-degeneracy: $\hat{e}(g, g) \neq 1$.

3) Computability: There is an efficient polynomial-time algorithm to compute $\hat{e}(g, g)$.

2.2 Access Structure

Let $\{P_1, P_2, \cdots, P_n\}$ be a set of parties. A collection $\mathbb{A} \subseteq 2^{\{P_1, P_2, \cdots, P_n\}}$ is monotone if, for all $B, C$: if $B \in \mathbb{A}$ and $B \subseteq C$, then $C \in \mathbb{A}$. An access structure [22] (respectively, monotone access structure) is a collection (respectively, monotone collection) $\mathbb{A}$ of non-empty subsets of $\{P_1, P_2, \cdots, P_n\}$, i.e., $\mathbb{A} \subseteq 2^{\{P_1, P_2, \cdots, P_n\}} \setminus \{\emptyset\}$. The sets in $\mathbb{A}$ are called the authorized sets, and the sets not in $\mathbb{A}$ are called the unauthorized sets.

2.3 Linear Secret Sharing Scheme (LSSS)

A linear secret-sharing scheme [22] $\Pi$ over a set of parties $P$ is described as follows:

1) The shares of each party form a vector over $\mathbb{Z}_p$.

2) There exists a share-generating matrix $M$ for $\Pi$, where $M$ has $\ell$ rows and $n$ columns. For all $i = 1, 2, \cdots, \ell$, the function $\rho$ labels the $i$-th row of $M$ as $\rho(i)$. Consider the vector $\vec{v} = (s, r_2, \cdots, r_n)$, where $s \in \mathbb{Z}_p$ is a secret to be shared, and $r_2, \cdots, r_n \in \mathbb{Z}_p$

are chosen at random. $\mu_i = M_i \cdot \vec{v}$ is one of the $\ell$ shares of the secret $s$ according to $\Pi$, where $M_i \in \mathbb{Z}_p^n$ is the $i$-th row of the matrix $M$. The share $M_i \cdot \vec{v}$ belongs to party $\rho(i)$.

Linear reconstruction property [22]: Suppose that $\Pi$ is an LSSS for the access structure $\mathbb{A}$. Let $S \in \mathbb{A}$ be any authorized set, and $I = \{i : \rho(i) \in S\} \subseteq \{1, 2, \cdots, \ell\}$. Then, there exist constants $\{\omega_i \in \mathbb{Z}_p\}_{i \in I}$ such that, if $\{\mu_i\}$ are valid shares of any secret $s$ according to $\Pi$, we have $\sum_{i \in I} \omega_i \mu_i = s$. These constants $\omega_i$ can be found in time polynomial in the size of the share-generating matrix $M$.

2.4 Bilinear Diffie-Hellman (BDH) Assumption

Let $G$ and $G_T$ be multiplicative cyclic groups with prime order $p$, and $g$ be a generator of $G$. Given a tuple $\vec{y} = (g, g^a, g^b, g^c)$, where $a, b, c$ are selected from $\mathbb{Z}_p$ randomly, the Bilinear Diffie-Hellman (BDH) problem [1] is to compute $\hat{e}(g, g)^{abc} \in G_T$. An algorithm $\mathcal{B}$ has at least advantage $\varepsilon$ in solving the BDH problem if

$$\Pr[\hat{e}(g, g)^{abc} \leftarrow \mathcal{B}(\vec{y})] \geq \varepsilon.$$

BDH Assumption: We say the BDH assumption [1] holds if no probabilistic polynomial time algorithm can solve the BDH problem with a non-negligible probability $\varepsilon$.

2.5 Decisional Parallel Bilinear Diffie-Hellman Exponent Assumption

Let $G$ be a group with prime order $p$, and $g$ be a generator of $G$. Given

$$\vec{y} = \Big( g, g^s, g^a, \cdots, g^{a^q}, g^{a^{q+2}}, \cdots, g^{a^{2q}},$$
$$\forall_{1 \leq j \leq q}:\ g^{s \cdot b_j}, g^{a/b_j}, \cdots, g^{a^q/b_j}, g^{a^{q+2}/b_j}, \cdots, g^{a^{2q}/b_j},$$
$$\forall_{1 \leq k, j \leq q,\, k \neq j}:\ g^{a \cdot s \cdot b_k/b_j}, \cdots, g^{a^q \cdot s \cdot b_k/b_j} \Big),$$

where $a, s, b_1, \cdots, b_q \in \mathbb{Z}_p$ are chosen randomly, the decisional q-parallel bilinear Diffie-Hellman exponent (BDHE) problem [22] is to distinguish a valid tuple $\hat{e}(g, g)^{a^{q+1} \cdot s} \in G_T$ from a random element $R \in G_T$. An algorithm $\mathcal{B}$ has advantage $\varepsilon$ in solving the q-parallel BDHE problem if

$$\left| \Pr[\mathcal{B}(\vec{y}, \hat{e}(g, g)^{a^{q+1} s}) = 0] - \Pr[\mathcal{B}(\vec{y}, R) = 0] \right| \geq \varepsilon.$$

Decisional q-Parallel BDHE Assumption: We say the decisional q-parallel BDHE assumption [22] holds if no probabilistic polynomial time algorithm can solve the decisional q-parallel BDHE problem with a non-negligible probability $\varepsilon$.

2.6 KEK Tree

Let $U = \{u_1, u_2, \cdots, u_n\}$ be the universe of clients and $L$ be the universe of descriptive attributes in the system. Let $G_j \subset U$ be a set of clients that hold the attribute $\lambda_j$ $(j = 1, 2, \cdots, q)$, which is referred to as an attribute group. $G_j$ will be used as a client access list to $\lambda_j$. Let $\mathcal{G} = \{G_1, G_2, \cdots, G_q\}$ be the universe of attribute groups and $GK_{\lambda_j}$ be the attribute group key that is shared among the non-revoked clients in $G_j \in \mathcal{G}$.

In a KEK tree [7], each node $v_j$ holds a $KEK_j$. The set of KEKs on the path from a leaf to the root is called the path keys. A KEK tree is constructed by the data service manager as follows:

1) Each client $u_{id}$ $(id = 1, 2, \cdots, n)$ in the universe $U$ is assigned to a leaf node of the tree. Random keys are generated and assigned to all leaf nodes and internal nodes.

2) Each client $u_{id} \in U$ securely obtains the path keys $PAK_{id}$ from its leaf node to the root node of the tree. For example, the client $u_4$ has the path keys $PAK_4 = \{KEK_{11}, KEK_5, KEK_2, KEK_1\}$ in Figure 1.

3) The minimum cover set [11] $node(G_j)$ is a minimum set of nodes in the tree that covers all of the leaf nodes associated with clients in $G_j$. $KEK(G_j)$ is the set of KEK values owned by $node(G_j)$. Taking the intersection of $PAK_{id}$ and $KEK(G_j)$, we have $KEK = KEK(G_j) \cap PAK_{id}$.

Let us give an example to illustrate the attribute groups $G_j$. Suppose $\{u_1, u_2, u_3\}$ are associated with $\{\lambda_1, \lambda_2\}$, $\{\lambda_1, \lambda_2, \lambda_3\}$, $\{\lambda_2, \lambda_3\}$, respectively. We have the attribute groups $G_1 = \{u_1, u_2\}$, $G_2 = \{u_1, u_2, u_3\}$, $G_3 = \{u_2, u_3\}$.

Consider the example in Figure 1. If the attribute group for attribute $\lambda_j$ is $G_j = \{u_2, u_3, u_5, u_6, u_7, u_8\}$ and $u_6$ is associated with leaf node $v_{13}$, we compute the minimum cover set $node(G_j) = \{v_9, v_{10}, v_3\}$ and get $KEK(G_j) = \{KEK_9, KEK_{10}, KEK_3\}$, which will be used to encrypt the attribute group key $GK_{\lambda_j}$ in the data re-encryption phase. Since $u_6$ stores the path keys $PAK_6 = \{KEK_{13}, KEK_6, KEK_3, KEK_1\}$, we have $KEK = KEK(G_j) \cap PAK_6 = \{KEK_3\}$; then $u_6$ can decrypt the header message to get the attribute group key $GK_{\lambda_j}$ using $KEK_3$.

2.7 System Architecture

As shown in Figure 2, a verifiable attribute-based keyword search encryption scheme with attribute revocation (VABKS-AR) system consists of five entities: Trusted Authority (TA), Cloud Service Provider (CSP), Data Owner/Patient, Client/Doctor, and Third Party Audit (TPA).

• Trusted Authority (TA): TA generates the public parameter, the master secret key and the clients’

secret key according to their attributes. TA is fully trusted in the system.

• Cloud Service Provider (CSP): CSP consists of a data server and a data service manager. The data server has huge storage space and strong computational power. The data service manager is in charge of managing the attribute group keys of each attribute group and providing the corresponding services. We assume the data service manager is honest-but-curious, i.e., it will honestly perform the operations but try to acquire as much information as possible about the sensitive data.

• Data Owner/Patient: A patient is viewed as the data owner, who encrypts the sensitive data (i.e., electronic health record data) and the keywords, and then uploads them to CSP in the form of ciphertext. Meanwhile, the data owner enforces the access policy for encrypted data, where the ciphertext will be shared with the clients whose attributes satisfy the access structure embedded in the ciphertext.

• Client/Doctor: A doctor is viewed as a client, who submits a search query to retrieve the encrypted EHR stored on the cloud server. Upon receiving the query, the cloud server searches for the intended ciphertext by the use of the trapdoor. If a client is not revoked and her or his attribute set satisfies the access policy, the client can decrypt the ciphertext.

• Third Party Audit (TPA): TPA verifies the search result by sending a challenge to CSP. Upon receiving the challenge, CSP returns a proof to TPA. Finally, TPA calculates a value to verify the integrity of the returned search results.

Figure 1: KEK tree (a binary tree with nodes $v_1$ to $v_{15}$; the leaves $v_8$ to $v_{15}$ are assigned to the clients $u_1$ to $u_8$)

Figure 2: System architecture of VABKS-AR (Trusted Authority, Data Owner/Patient, Cloud Service Provider, Client/Doctor and Third Party Audit, exchanging PP, SK, encrypted data, encrypted index, trapdoor, result, challenge and proof)

3 Formal Definition and Security Model

3.1 Formal Definition

In this section, the formal definition of a verifiable attribute-based keyword search encryption with attribute revocation is described as the following nine algorithms:

• Setup$(1^\lambda) \rightarrow (PP, MSK)$: TA takes a security parameter $\lambda$ as input, and outputs the public parameter $PP$ and a master secret key $MSK$.

• KeyGen$(MSK, id, S) \rightarrow SK$: TA runs this algorithm, which takes the master secret key $MSK$, the identifier $id$ of a legal client $u_{id}$ and an attribute set $S \subseteq L$ as input. This algorithm outputs a secret key $SK$ to the client $u_{id}$.

• Encrypt$(PP, (M, \rho), \mathcal{K}, W) \rightarrow (CT, I_W)$: The data owner runs this algorithm, which takes the public parameter $PP$, an access policy $(M, \rho)$, the symmetric key $\mathcal{K}$, and a set of keywords $W$ as input. Using key encapsulation technology, this algorithm outputs a ciphertext $CT$ and an encrypted index set $I_W$.

• Re-encrypt$(CT, G, RL) \rightarrow (CT', \hat{C})$: This algorithm is performed by the data service manager. Taking the ciphertext $CT$, the attribute groups $G \subseteq \mathcal{G}$ and a revocation list $RL$ as input, this algorithm outputs a re-encrypted ciphertext $CT'$ and a header message $\hat{C}$.

• Trapdoor$(SK, w) \rightarrow tk$: The client runs this algorithm, which takes a secret key $SK$ and a keyword $w$ as input. This algorithm outputs a search token $tk$ to CSP.

• Search$(PP, tk, I_W) \rightarrow (C', ID')$: CSP runs this algorithm, which takes the public parameter $PP$, a search token $tk$ and an encrypted index set $I_W$ as input. This algorithm outputs the intended encrypted file set $C'$ and the corresponding identifier set $ID'$ to TPA if the search token $tk$ matches the index set $I_W$; otherwise, it outputs $\perp$.

• Verify$(PK, C', ID') \rightarrow \{0, 1\}$: TPA runs this algorithm, which takes the data owner’s $PK$, the returned encrypted file set $C'$ and the corresponding identifier set $ID'$ as input. This algorithm outputs 1 if the result verification passes; otherwise, it outputs 0.

• Decrypt$(CT', SK) \rightarrow \mathcal{K}$: The client runs this algorithm, which takes the ciphertext $CT'$ and a secret key $SK$ as input. This algorithm outputs the symmetric key $\mathcal{K}$.
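The KEK-tree example of Section 2.6 can be reproduced with a short sketch. It assumes, as Figure 1 suggests, a complete binary tree in heap numbering (root $v_1$, children of $v_i$ are $v_{2i}$ and $v_{2i+1}$, leaves $v_8$ to $v_{15}$ for clients $u_1$ to $u_8$); the function names `leaf_of`, `path_nodes` and `min_cover` are illustrative and do not come from the paper, and only node indices are computed, not actual keys.

```python
def leaf_of(client, num_leaves=8):
    """Leaf index of client u_client, assuming u_1..u_n sit on v_n..v_{2n-1}."""
    return num_leaves + client - 1


def path_nodes(leaf):
    """Node indices on the path from a leaf up to the root (the path keys PAK)."""
    path = []
    v = leaf
    while v >= 1:
        path.append(v)
        v //= 2  # parent in heap numbering
    return path


def min_cover(group_leaves, num_leaves=8):
    """Minimum set of nodes whose subtrees cover exactly the given leaves."""
    covered = set(group_leaves)
    # Bottom-up: an internal node is fully covered when both children are.
    for v in range(num_leaves - 1, 0, -1):
        if 2 * v in covered and 2 * v + 1 in covered:
            covered.add(v)
    # Keep only the topmost fully covered nodes.
    return sorted(v for v in covered if v == 1 or (v // 2) not in covered)


# Attribute group G_j = {u2, u3, u5, u6, u7, u8} from the example.
group = [leaf_of(u) for u in (2, 3, 5, 6, 7, 8)]
cover = min_cover(group)                      # indices of node(G_j)
usable = set(cover) & set(path_nodes(leaf_of(6)))  # KEK(G_j) ∩ PAK_6
print(cover, path_nodes(leaf_of(4)), usable)
```

Under these assumptions the cover is $\{v_3, v_9, v_{10}\}$ and client $u_6$ shares exactly $KEK_3$ with it, which agrees with the worked example in the text.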

• CTUpdate$(CT', RL') \rightarrow (CT'', \hat{C}')$: The data service manager runs this algorithm, which takes the re-encrypted ciphertext $CT'$ and a new revocation list $RL'$ as input. This algorithm outputs an updated ciphertext $CT''$ and a new header message $\hat{C}'$.

3.2 Security Model

In this section, we give two security models: the indistinguishability against selective ciphertext-policy and chosen plaintext attack (IND-sCP-CPA) game and the indistinguishability against chosen keyword attack (IND-CKA) game. The security of our scheme is based on these two games.

Firstly, according to Waters’ scheme [22], we describe the IND-sCP-CPA game as follows:

• Init. The adversary $\mathcal{A}$ gives the challenge access policy $(M^*, \rho^*)$ and a revocation list $RL^*$, where $M^*$ has $n^* \leq q$ columns.

• Setup. The challenger $\mathcal{B}$ runs the Setup algorithm, sends the public parameter $PP$ to $\mathcal{A}$, and then keeps the master secret key $MSK$ for himself.

• Phase 1. The adversary $\mathcal{A}$ issues a polynomial number of secret key queries for $(id, S)$. The challenger $\mathcal{B}$ sends $SK$ to the adversary $\mathcal{A}$, with the restriction that:

1) if $u_{id} \notin RL^*$, then $S' = S$, and the attribute set $S'$ does not satisfy the challenge access policy $(M^*, \rho^*)$;

2) if $u_{id} \in RL^*$, then $S' = S \setminus \{\lambda_{j^*}\}$, and the attribute set $S'$ does not satisfy the challenge access policy $(M^*, \rho^*)$.

• Challenge. The adversary $\mathcal{A}$ submits two equal-length messages $k_0$ and $k_1$ to the challenger $\mathcal{B}$. Then $\mathcal{B}$ randomly selects one bit $b \in \{0, 1\}$ and encrypts $k_b$ under $(M^*, \rho^*)$ and the revocation list $RL^*$. Finally, $\mathcal{B}$ sends the challenge ciphertext $CT^*$ to $\mathcal{A}$.

• Phase 2. Same as Phase 1.

• Guess. The adversary $\mathcal{A}$ outputs its guess $b'$ of $b$ and wins the game if $b' = b$.

The advantage of the adversary $\mathcal{A}$ is defined as follows:

$$Adv_{\mathcal{A}}^{IND-sCP-CPA} = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$

Definition 1. A verifiable attribute-based keyword search encryption scheme with attribute revocation is IND-sCP-CPA secure if all polynomial time adversaries have at most a negligible advantage in the above game.

Secondly, according to Boneh’s scheme [1], we define the IND-CKA game as follows:

• Setup. The challenger $\mathcal{B}$ runs the Setup algorithm, sends the public parameter $PP$ to $\mathcal{A}$, and then keeps the master secret key $MSK$ for himself.

• Phase 1. The adversary $\mathcal{A}$ can adaptively query the challenger $\mathcal{B}$ for the trapdoor $T_w$ of any keyword $w \in \{0, 1\}^*$, a polynomial number of times.

• Challenge. The adversary $\mathcal{A}$ sends two equal-length keywords $w_0$ and $w_1$ to the challenger $\mathcal{B}$. The only restriction is that $w_0$ and $w_1$ have not been queried for the trapdoor. The challenger $\mathcal{B}$ randomly selects one bit $b \in \{0, 1\}$, generates the index $I_{w_b}$ for keyword $w_b$, and submits the challenge index $I_{w_b}$ to the adversary $\mathcal{A}$.

• Phase 2. The adversary $\mathcal{A}$ can issue more trapdoor queries for keyword $w$ with the restriction $w \neq w_0, w_1$.

• Guess. The adversary $\mathcal{A}$ outputs its guess $b'$ of $b$ and wins the game if $b' = b$.

The advantage of the adversary $\mathcal{A}$ is defined as follows:

$$Adv_{\mathcal{A}}^{IND-CKA} = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$

Definition 2. A verifiable attribute-based keyword search encryption scheme with attribute revocation is IND-CKA secure if all polynomial time adversaries have at most a negligible advantage in the above game.

4 Concrete Construction

The concrete construction is described as follows:

• Setup$(1^\lambda)$: This algorithm selects a bilinear map $\hat{e}: G \times G \rightarrow G_T$, such that $G$ and $G_T$ are cyclic groups of order $p$, a $\lambda$-bit prime, and lets $E(\cdot)$ be a probabilistic symmetric encryption algorithm. We define three hash functions $H: \mathbb{Z}_p \rightarrow G$, $H_1: \{0, 1\}^* \rightarrow G$, $H_2: G_T \rightarrow \{0, 1\}^{\log_2 p}$. Let $CL$ be a client list and $RL$ be a revocation list, where $CL$ and $RL$ are initially empty. TA runs this algorithm as follows:

1) Pick random $a, \alpha \in \mathbb{Z}_p$ and compute

$$h = g^a, \quad Y = \hat{e}(g, g)^\alpha.$$

2) Publish the public parameter

$$PP = \big( \hat{e}, h, Y, H, H_1, H_2, E(\cdot), CL, RL \big)$$

and keep the master secret key $MSK = \alpha$ himself.

• KeyGen$(MSK, id, S)$: A client sends its identifier $id$ and a set of attributes $S \subseteq L$ to TA. TA runs this algorithm as follows:

1) Select $t \in \mathbb{Z}_p$ randomly and compute a secret key

$$SK = \big( K = g^\alpha g^{at},\ L = g^t,\ \{K_j = H(\lambda_j)^t\}_{\lambda_j \in S} \big)$$

for the client $u_{id} \in U$.
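The linear secret sharing scheme of Section 2.3, which the construction relies on for the access policy $(M, \rho)$, can be illustrated with a minimal sketch over $\mathbb{Z}_p$. The prime `P`, the function names and the two-row example matrix (an AND of two attributes) are assumptions made for illustration only; the reconstruction constants $\omega_i$ are found by Gaussian elimination, which is one standard polynomial-time method and not necessarily the one the authors use.

```python
import random

P = 2**61 - 1  # a Mersenne prime standing in for the group order p (assumption)


def make_shares(M, s, p=P, rng=random):
    """Shares mu_i = M_i . v for v = (s, r_2, ..., r_n) with random r_k."""
    n = len(M[0])
    v = [s % p] + [rng.randrange(p) for _ in range(n - 1)]
    return [sum(m * x for m, x in zip(row, v)) % p for row in M]


def recon_constants(rows, p=P):
    """Solve sum_i w_i * rows[i] = (1, 0, ..., 0) over Z_p; None if impossible."""
    k, n = len(rows), len(rows[0])
    # Augmented system A w = e_1, with A the transpose of the selected rows.
    A = [[rows[i][j] % p for i in range(k)] + [1 if j == 0 else 0]
         for j in range(n)]
    pivots, r = [], 0
    for c in range(k):
        piv = next((i for i in range(r, n) if A[i][c]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        inv = pow(A[r][c], -1, p)  # modular inverse (Python 3.8+)
        A[r] = [x * inv % p for x in A[r]]
        for i in range(n):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(x - f * y) % p for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    if any(A[i][k] for i in range(r, n)):
        return None  # the selected rows do not span (1, 0, ..., 0)
    w = [0] * k
    for i, c in enumerate(pivots):
        w[c] = A[i][k]
    return w


# Hypothetical policy "attr_A AND attr_B": row 0 is labelled attr_A, row 1 attr_B.
M = [[1, 1], [0, -1]]
secret = 123456789
mu = make_shares(M, secret)
w = recon_constants(M)
recovered = sum(wi * mi for wi, mi in zip(w, mu)) % P
print(recovered == secret, recon_constants([M[0]]))
```

With both rows the constants recombine the shares into the secret, while a single row (an unauthorized set for this policy) yields no valid constants.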

2) Add $(id, g^{at})$ to the client list $CL$ and send $SK$ to the client.

• Encrypt$(PP, (M, \rho), \mathcal{K}, F, W)$: The data owner inputs the public parameter $PP$, an access policy $(M, \rho)$, a symmetric key $\mathcal{K}$, a set of data files $F$ and a set of keywords $W$. This algorithm is run by the data owner as follows:

1) Encrypt the data files $F = (f_1, f_2, \cdots, f_d)$ as $c_k = E_{\mathcal{K}}(f_k)$ $(1 \leq k \leq d)$ with the symmetric key $\mathcal{K}$.

2) Select random $s, y_2, \cdots, y_n \in \mathbb{Z}_p$ and set a column vector $\vec{v} = (s, y_2, \cdots, y_n)$. For $1 \leq i \leq \ell$, compute $\mu_i = M_i \cdot \vec{v}$, where $M_i$ is the $i$-th row of $M$. Choose random numbers $r_1, \cdots, r_\ell \in \mathbb{Z}_p$ and calculate

$$CT = \{ C_{0,0} = \mathcal{K} \cdot \hat{e}(g, g)^{\alpha s},\ C_{0,1} = g^s,\ C_{0,2} = h^s,$$
$$\forall i = 1, \cdots, \ell:\ C_i = g^{a \mu_i} H(\rho(i))^{-r_i},\ D_i = g^{r_i} \}.$$

3) Extract a set of keywords

$$W = (w_1, w_2, \cdots, w_m)$$

from the data files $F$. For each keyword $w_\delta$ $(1 \leq \delta \leq m)$, compute

$$\varphi_\delta = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}(g, H_1(w_\delta))^s$$

and

$$I_W = \{ I_{w_\delta} = H_2(\varphi_\delta) \}_{\delta=1}^{m},$$

where $I_W$ is the encrypted index set for the keyword set $W$.

4) Select a random $x \in \mathbb{Z}_p$, and compute $PK = g^x$ as its public key. For each encrypted data file $c_k$ with identifier $k$, calculate a signature $\sigma_k = (H_1(k) g^{c_k})^x$ with the data owner’s secret key $x$.

After the construction of $CT$, the data owner sends $(CT, I_W, \{c_k, \sigma_k\}_{k=1}^{d})$ to CSP.

• Re-encrypt$(CT, G, RL)$: This algorithm inputs a ciphertext $CT$, a set of attribute groups $G \subseteq \mathcal{G}$ and a revocation list $RL$. The data service manager runs this algorithm as follows:

1) For all $G_j \in G$, choose a random attribute group key $GK_{\lambda_j} \in \mathbb{Z}_p^*$, and re-encrypt $CT$ as:

$$CT' = \{ C'_{0,0} = C_{0,0},\ C'_{0,1} = C_{0,1},\ C'_{0,2} = C_{0,2},$$
$$\forall i = 1, \cdots, \ell:\ C'_i = C_i,$$
$$RL = \emptyset:\ D'_i = D_i,$$
$$RL \neq \emptyset:\ D'_i = D_i^{GK_{\lambda_j}} \}.$$

2) Compute the minimum cover set $node(G_j)$ of $G_j$ in the KEK tree, get the corresponding $KEK(G_j)$, and generate a ciphertext $\hat{C} = \{E_\kappa(GK_{\lambda_j})\}_{\kappa \in KEK(G_j)}$, which is called the header message.

• Trapdoor$(SK, w)$: A client with identifier $id$ and attribute set $S$ inputs a secret key $SK$ and a keyword $w$. The algorithm runs as follows:

1) The client selects $u \in \mathbb{Z}_p$ randomly, computes $q_u = g^{1/u}$, and sends $(id, q_u)$ to TA. Then, TA retrieves $g^{at}$ according to $id$ in the client list $CL$, computes $q_{id} = g^{at} q_u^{\alpha}$, and sends $q_{id}$ to the client.

2) The client calculates a search token $tk = \big( T_w = H_1(w) q_{id}^{u},\ L' = L^u,\ \{K'_j = K_j^u\}_{\lambda_j \in S} \big)$ and sends $tk$ to the data service manager.

• Search$(PP, tk, I_W)$: This algorithm inputs the public parameter $PP$, a search token $tk$, and the encrypted index set $I_W$. CSP runs this algorithm as follows:

1) Compute

$$l_w = \frac{\hat{e}(C'_{0,1}, T_w)}{\hat{e}(L', C'_{0,2})} = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}\big(g, H_1(w)\big)^s.$$

2) If there exists some encrypted index $I_{w_\delta}$ such that $H_2(l_w) = I_{w_\delta}$, send $(CT', \hat{C})$, the relevant encrypted file set $C' = \{c_1, c_2, \cdots, c_\tau\}$ and the corresponding identifier set $ID' = \{1, 2, \cdots, \tau\}$ to TPA, where $\tau$ is the number of returned files; otherwise, return $\perp$.

• Verify$(PK, C', ID')$: This algorithm inputs the data owner’s $PK$, the returned encrypted file set $C'$ and the corresponding identifier set $ID'$. TPA runs this algorithm as follows:

1) TPA randomly selects $v_r \in \mathbb{Z}_p$, and sends a challenge $chal = \langle r, v_r \rangle$ $(r \in [1, \tau])$ to CSP.

2) Upon receiving the $chal$ of TPA, CSP computes $\zeta = \sum_{r \in [1, \tau]} v_r c_r$ and $\sigma = \prod_{r \in [1, \tau]} \sigma_r^{v_r}$, where $\sigma_r = (H_1(r) g^{c_r})^x$. Then CSP sends $(\zeta, \sigma)$ to TPA.

3) TPA verifies whether the following equation holds. If it holds, return 1 and send $(CT', \hat{C}, C', ID')$ to the client; otherwise, return 0.

$$\hat{e}(\sigma, g) = \hat{e}\Big( g^{\zeta} \cdot \prod_{r \in [1, \tau]} H_1(r)^{v_r},\ PK \Big).$$

• Decrypt$(CT', SK)$: This algorithm inputs the ciphertext $CT'$ and a secret key $SK$, and runs as follows:

1) If a client has a valid attribute $\lambda_j$, i.e., $u_{id} \in G_j$, he can use a $KEK \in KEK(G_j) \cap PAK_{id}$ to get the attribute group key $GK_{\lambda_j}$. Then $u_{id}$ updates its secret key with the attribute group keys as follows:

$$SK = \big( K = g^\alpha g^{at},\ L = g^t,\ \{K_j = H(\lambda_j)^{t/GK_{\lambda_j}}\}_{\lambda_j \in S} \big).$$

2) Output $(0, \perp)$ if $S$ does not satisfy $(M, \rho)$.
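The Search and Verify equations above can be sanity-checked by working purely in the exponent. The sketch below is not a pairing implementation and offers no security: it assumes every element $g^x$ of $G$ is represented by its discrete logarithm $x$, so that multiplication becomes addition and the pairing $\hat{e}(g^x, g^y) = \hat{e}(g, g)^{xy}$ becomes a product modulo $p$. The prime `P`, the helper names and the random stand-ins for the hash values $H_1(\cdot)$ and the file values $c_r$ are all illustrative assumptions.

```python
import random

P = 2**61 - 1  # toy prime order (assumption); real schemes use pairing groups


def pair(x, y):
    """Exponent of e(g^x, g^y) = e(g, g)^(x*y)."""
    return x * y % P


def check_search(seed=0):
    """Check l_w = e(g,g)^(alpha*s) * e(g, H1(w))^s in the exponent."""
    rng = random.Random(seed)
    a, alpha, t, s, u, h_w = (rng.randrange(1, P) for _ in range(6))
    q_u = pow(u, -1, P)                    # q_u = g^(1/u)
    q_id = (a * t + alpha * q_u) % P       # q_id = g^(at) * q_u^alpha
    T_w = (h_w + u * q_id) % P             # T_w = H1(w) * q_id^u
    L_prime = t * u % P                    # L' = L^u = g^(tu)
    C01, C02 = s, a * s % P                # C_{0,1} = g^s, C_{0,2} = h^s
    l_w = (pair(C01, T_w) - pair(L_prime, C02)) % P  # division in G_T
    phi = (alpha * s + pair(1, h_w) * s) % P
    return l_w == phi


def check_verify(seed=0, tau=4, tamper=False):
    """Check e(sigma, g) = e(g^zeta * prod H1(r)^(v_r), PK) in the exponent."""
    rng = random.Random(seed)
    x = rng.randrange(1, P)                           # owner's secret key
    c = [rng.randrange(P) for _ in range(tau)]        # files as integers
    h = [rng.randrange(P) for _ in range(tau)]        # logs of H1(r)
    v = [rng.randrange(1, P) for _ in range(tau)]     # challenge coefficients
    sigma_r = [x * (h[r] + c[r]) % P for r in range(tau)]
    returned = list(c)
    if tamper:
        returned[0] = (returned[0] + 1) % P           # CSP alters one file
    zeta = sum(v[r] * returned[r] for r in range(tau)) % P
    sigma = sum(v[r] * sigma_r[r] for r in range(tau)) % P
    lhs = pair(sigma, 1)
    rhs = pair((zeta + sum(v[r] * h[r] for r in range(tau))) % P, x)
    return lhs == rhs


print(check_search(), check_verify(), check_verify(tamper=True))
```

In this toy model the honest runs satisfy both equations, and a returned file that differs from the signed one makes the verification equation fail.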

3) Otherwise, let $I \subset \{1, 2, \cdots, \ell\}$ be defined as $I = \{i : \rho(i) \in S\}$ and $\{\omega_i \in \mathbb{Z}_p \mid i \in I\}$ be a set of constants such that if $\mu_i$ are valid shares of any secret $s$ according to $M$, then $\sum_{i \in I} \omega_i \mu_i = s$. We have

$$Q_{CT} = \prod_{i \in I} \big( \hat{e}(C'_i, L) \cdot \hat{e}(D'_i, K_j) \big)^{\omega_i} = \hat{e}(g, g)^{ast}.$$

4) Decrypt the ciphertext and obtain the symmetric key: $\mathcal{K} = C'_{0,0} \cdot \dfrac{Q_{CT}}{\hat{e}(C'_{0,1}, K)}$.

5) Decrypt the encrypted data files $C'$ using $\mathcal{K}$.

• CTUpdate$(CT', RL')$: This algorithm inputs $CT'$ and a new revocation list $RL'$. If an attribute $\lambda_{j'}$ of the client is revoked, TA sends the updated membership list $G_{j'}$ to CSP. The data service manager runs this algorithm as follows:

1) Select random $s', y'_2, \cdots, y'_n \in \mathbb{Z}_p$, a new attribute group key $GK'_{\lambda_{j'}}$, and set a column vector $\vec{v}' = (s', y'_2, \cdots, y'_n) \in \mathbb{Z}_p^n$. For $1 \leq i \leq \ell$, compute $\mu'_i = M_i \cdot \vec{v}'$, where $M_i$ is the $i$-th row of $M$.

2) Choose random numbers $r'_1, \cdots, r'_\ell \in \mathbb{Z}_p$ and update the ciphertext $CT'$ as:

$$CT'' = \{ C''_{0,0} = C'_{0,0} \cdot \hat{e}(g, g)^{\alpha s'},\ C''_{0,1} = C'_{0,1} \cdot g^{s'},\ C''_{0,2} = C'_{0,2} \cdot h^{s'},$$
$$\forall i = 1, \cdots, \ell:\ C''_i = C'_i \cdot g^{a \mu'_i} H(\rho(i))^{-r'_i},$$
$$\rho(i) \in RL':\ D''_i = D'_i \cdot (g^{r'_i})^{GK'_{\lambda_{j'}}},$$
$$\rho(i) \notin RL':\ D''_i = D'_i \cdot (g^{r'_i})^{GK_{\lambda_j}} \}.$$

3) Compute a new minimum cover set and generate a new header message with the updated $KEK(G_{j'})$ as follows:

$$\hat{C}' = \big( \{E_\kappa(GK'_{\lambda_{j'}})\}_{\kappa \in KEK(G_{j'})},\ \forall \lambda_j \in S \setminus \{\lambda_{j'}\}: \{E_\kappa(GK_{\lambda_j})\}_{\kappa \in KEK(G_j)} \big).$$

Correctness. The proposed scheme is correct as the following equations hold:

$$l_w = \frac{\hat{e}(C'_{0,1}, T_w)}{\hat{e}(L', C'_{0,2})} = \frac{\hat{e}\big(g^s, H_1(w) g^{atu} g^{\alpha}\big)}{\hat{e}\big((g^t)^u, g^{as}\big)} = \frac{\hat{e}(g, g)^{\alpha s} \cdot \hat{e}\big(g, H_1(w)\big)^s \cdot \hat{e}(g, g)^{astu}}{\hat{e}(g, g)^{astu}} = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}\big(g, H_1(w)\big)^s$$

$$Q_{CT} = \prod_{i \in I} \big( \hat{e}(C'_i, L) \cdot \hat{e}(D'_i, K_j) \big)^{\omega_i} = \prod_{i \in I} \Big( \hat{e}\big(g^{a \mu_i} H(\rho(i))^{-r_i}, g^t\big) \cdot \hat{e}\big((g^{r_i})^{GK_{\lambda_j}}, H(\rho(i))^{t/GK_{\lambda_j}}\big) \Big)^{\omega_i} = \hat{e}(g, g)^{\sum_{i \in I} a \mu_i \omega_i t} = \hat{e}(g, g)^{ast}$$

$$C'_{0,0} \cdot \frac{Q_{CT}}{\hat{e}(C'_{0,1}, K)} = \mathcal{K} \cdot \hat{e}(g, g)^{\alpha s} \cdot \frac{\hat{e}(g, g)^{ast}}{\hat{e}(g^s, g^\alpha g^{at})} = \mathcal{K} \cdot \hat{e}(g, g)^{\alpha s} \cdot \frac{\hat{e}(g, g)^{ast}}{\hat{e}(g^s, g^\alpha) \cdot \hat{e}(g, g)^{ast}} = \mathcal{K} \cdot \hat{e}(g, g)^{\alpha s} \cdot \frac{1}{\hat{e}(g, g)^{\alpha s}} = \mathcal{K}$$

$$\hat{e}(\sigma, g) = \hat{e}\Big( \prod_{r \in [1, \tau]} \sigma_r^{v_r}, g \Big) = \hat{e}\Big( \prod_{r \in [1, \tau]} \big(H_1(r) g^{c_r}\big)^{x v_r}, g \Big) = \hat{e}\Big( \Big( \prod_{r \in [1, \tau]} H_1(r)^{v_r} \cdot g^{\sum v_r c_r} \Big)^{x}, g \Big) = \hat{e}\Big( \prod_{r \in [1, \tau]} H_1(r)^{v_r} \cdot g^{\zeta}, g^x \Big) = \hat{e}\Big( g^{\zeta} \cdot \prod_{r \in [1, \tau]} H_1(r)^{v_r}, PK \Big)$$

5 Security and Performance

5.1 Security Analysis

Theorem 1. If a probabilistic polynomial-time adversary $\mathcal{A}$ wins the IND-sCP-CPA game with non-negligible advantage $\varepsilon$, then we can construct a simulator $\mathcal{B}$ to solve the q-parallel BDHE problem with non-negligible advantage $\varepsilon' = \varepsilon/2$.

Proof. Suppose $\mathcal{A}$ is an adversary that has advantage $\varepsilon$ in breaking the IND-sCP-CPA game. We construct a simulator $\mathcal{B}$ that can solve the q-parallel BDHE problem with probability at least $\varepsilon'$.

• Init. The simulator $\mathcal{B}$ is given a decisional q-parallel challenge vector $\vec{y}$ and a random number $T$. The adversary $\mathcal{A}$ selects the challenge access policy $(M^*, \rho^*)$ and the revocation list $RL^*$, where $M^*$ has $n^* \leq q$ columns.

• Setup. The simulator $\mathcal{B}$ randomly selects $\alpha' \in \mathbb{Z}_p$ and computes $\hat{e}(g, g)^{\alpha} = \hat{e}(g^a, g^{a^q}) \cdot \hat{e}(g, g)^{\alpha'}$, which implies that $\alpha = \alpha' + a^{q+1}$. We use a list called H-list to run the random oracle $H$ for $\mathcal{B}$. The simulator $\mathcal{B}$ responds as follows:

1) If $H(j)$ has already appeared on the H-list, then $\mathcal{B}$ returns the value that was predefined before.

2) Otherwise, let $X$ be the set of indices $i$ such that $\rho^*(i) = \lambda_{j^*}$. $\mathcal{B}$ randomly selects a number $z_{j^*} \in \mathbb{Z}_p$, and executes the random oracle:

– If $X = \emptyset$, $H(j^*) = g^{z_{j^*}}$;

– If $X \ne \emptyset$, we have

$$H(j^*) = g^{z_{j^*}} \prod_{i \in X} g^{aM^*_{i,1}/b_i} \cdot g^{a^2 M^*_{i,2}/b_i} \cdots g^{a^{n^*} M^*_{i,n^*}/b_i} \quad (n^* \le q).$$

• Phase 1. The adversary $\mathcal{A}$ issues a polynomial number of secret key queries for $(id, S)$. Suppose the adversary sends the identifier and the corresponding set of attributes $(id, S)$ to the simulator $\mathcal{B}$, but the following restrictions must be satisfied:

1) If $u_{id} \notin RL^*$, $S' = S$, and the attributes set $S'$ does not satisfy $(M^*, \rho^*)$.

2) If $u_{id} \in RL^*$, then $S' = S \setminus \{\lambda_{j^*}\}$, and the attributes set $S'$ does not satisfy $(M^*, \rho^*)$.

The simulator $\mathcal{B}$ chooses a vector $\vec{\omega} = (\omega_1, \omega_2, \cdots, \omega_{n^*}) \in \mathbb{Z}_p^{n^*}$. For any $i$, such that $\rho^*(i) \in S'$ and $\omega_1 = -1$, we have $M^*_i \cdot \vec{\omega} = 0$. $\mathcal{B}$ randomly selects a number $r \in \mathbb{Z}_p$, and computes a secret key as follows:

$$K = g^{\alpha} g^{at} = g^{\alpha'} g^{ar} \prod_{i=2,\cdots,n^*} \big(g^{a^{q+2-i}}\big)^{\omega_i},$$

$$L = g^t = g^r \cdot \prod_{i=1,\cdots,n^*} \big(g^{a^{q+1-i}}\big)^{\omega_i},$$

which implies

$$t = r + \omega_1 a^q + \omega_2 a^{q-1} + \cdots + \omega_{n^*} a^{q-n^*+1}.$$

For all $\lambda_{j^*} \in S'$, when there is no $i$ such that $\rho^*(i) = \lambda_{j^*}$, we let $K_{j^*} = L^{z_{j^*}}$. For those attributes $\lambda_{j^*} \in S'$ that satisfy the access structure, $\mathcal{B}$ cannot simulate the items $g^{a^{q+1}/b_i}$. However, we have $M^*_i \cdot \vec{\omega} = 0$. Therefore, all of these terms of $g^{a^{q+1}/b_i}$ can be canceled. Let $X$ be the set of indices $i$ such that $\rho^*(i) = \lambda_{j^*}$. The simulator $\mathcal{B}$ computes $K_{j^*}$ as follows:

$$K_{j^*} = L^{z_{j^*}} \prod_{i \in X} \prod_{j=1,\cdots,n^*} \Big( \big(g^{(a^j/b_i) r}\big) \cdot \prod_{k=1,\cdots,n^*,\, k \ne j} \big(g^{a^{q+1+j-k}/b_i}\big)^{\omega_k} \Big)^{M^*_{i,j}}.$$

• Challenge. The adversary $\mathcal{A}$ submits two equal-length challenge messages $k_0$ and $k_1$ to the simulator $\mathcal{B}$. $\mathcal{B}$ randomly selects a number $s \in \mathbb{Z}_p$ and a random bit $\gamma \in \{0, 1\}$. Then it computes as follows:

$$C^*_{0,0} = k_\gamma \cdot T \cdot e(g^s, g^{\alpha'}),\quad C^*_{0,1} = g^s,\quad C^*_{0,2} = h^s.$$

It is difficult for $\mathcal{B}$ to simulate $C^*_i$ since it contains $g^{a^j s}$ that $\mathcal{B}$ cannot simulate. However, $\mathcal{B}$ randomly selects $y'_2, \cdots, y'_{n^*} \in \mathbb{Z}_p$ and $r'_1, \cdots, r'_\ell \in \mathbb{Z}_p$. Then simulator $\mathcal{B}$ shares the secret $s$ utilizing the vector

$$\vec{v}' = (s, sa + y'_2, sa^2 + y'_3, \cdots, sa^{n^*-1} + y'_{n^*}) \in \mathbb{Z}_p^{n^*}.$$

For $i = 1, \cdots, \ell$, we define $R_i$ as the set of all $k \ne i$ such that $\rho^*(i) = \rho^*(k)$. The challenge ciphertext $C^*_i$ is set as:

$$C^*_i = H(\rho^*(i))^{r'_i} \Big( \prod_{j=2,\cdots,n^*} (g^a)^{-M^*_{i,j} y'_j} \Big) (g^{s \cdot b_i})^{-z_{\rho^*(i)}} \cdot \Big( \prod_{k \in R_i} \prod_{j=1,\cdots,n^*} \big(g^{a^j \cdot s \cdot b_i/b_k}\big)^{-M^*_{k,j}} \Big).$$

1) For the non-revoked attribute $\rho^*(i)$, a challenge ciphertext is set as $D^*_i = g^{-r'_i} g^{-s b_i}$.

2) For the revoked attribute $\rho^*(i) = \lambda_{j^*}$ and $j \ne i$, by selecting a random number $GK'_{\lambda_{j^*}}$, $\mathcal{B}$ computes the challenge ciphertext $D^*_i = (g^{-r'_i} g^{-s b_i})^{GK'_{\lambda_{j^*}}}$.

$\mathcal{B}$ gives the challenge ciphertext

$$CT^* = (C^*_{0,0}, C^*_{0,1}, C^*_{0,2}, \{C^*_i, D^*_i\}_{i=1,\cdots,\ell})$$

to $\mathcal{A}$.

• Phase 2. Same as Phase 1.

• Guess. The adversary $\mathcal{A}$ outputs a guess $\gamma'$ of $\gamma$. If $\gamma' = \gamma$, $\mathcal{B}$ outputs $\mu' = 0$, guessing that $T = e(g, g)^{a^{q+1} \cdot s}$; otherwise, $\mathcal{B}$ outputs $\mu' = 1$, guessing that $T$ is a random element of $\mathbb{G}_T$.

If $\mu = 0$, $\mathcal{A}$ obtains a valid ciphertext of $k_\gamma$. The advantage of $\mathcal{A}$ in this situation is $\varepsilon$, therefore $\Pr[\gamma' = \gamma \mid \mu = 0] = 1/2 + \varepsilon$. Since $\mathcal{B}$ guesses $\mu' = 0$ when $\gamma' = \gamma$, we have $\Pr[\mu' = \mu \mid \mu = 0] = 1/2 + \varepsilon$.

If $\mu = 1$, we have $\Pr[\gamma' \ne \gamma \mid \mu = 1] = 1/2$. As $\mathcal{B}$ guesses $\mu' = 1$ when $\gamma' \ne \gamma$, we have $\Pr[\mu' = \mu \mid \mu = 1] = 1/2$.

The advantage of $\mathcal{B}$ in solving the decisional q-parallel BDHE problem is $\varepsilon' = \varepsilon/2$, as follows:

$$\Pr[\mu' = \mu] - \frac{1}{2} = \frac{1}{2}\Pr[\mu' = \mu \mid \mu = 0] + \frac{1}{2}\Pr[\mu' = \mu \mid \mu = 1] - \frac{1}{2} = \frac{1}{2}\Big(\frac{1}{2} + \varepsilon\Big) + \frac{1}{2} \cdot \frac{1}{2} - \frac{1}{2} = \frac{\varepsilon}{2}.$$

Theorem 2. If a probabilistic polynomial-time adversary $\mathcal{A}$ wins the IND–CKA game with non-negligible advantage $\epsilon$, then we can construct a simulator $\mathcal{B}$ to solve the BDH problem with non-negligible advantage $\epsilon' = \epsilon/(e \cdot q_T \cdot q_{H_2})$, where $e$ is the base of the natural logarithm.

Proof. Suppose $\mathcal{A}$ is an attack algorithm that has advantage $\epsilon$ in breaking the IND–CKA game. We construct an algorithm $\mathcal{B}$ that solves the BDH problem with probability at least $\epsilon'$.

Suppose that $\mathcal{A}$ makes at most $q_{H_2}$ hash function queries to $H_2$ and at most $q_T$ trapdoor queries. The algorithm $\mathcal{B}$ is given $g, u_1 = g^{\alpha}, u_2 = g^{\beta}, u_3 = g^{\gamma} \in \mathbb{G}$. It aims at outputting $\hat{e}(g, g)^{\alpha\beta\gamma} \in \mathbb{G}_T$.

• Setup. The algorithm $\mathcal{B}$ starts by giving $\mathcal{A}$ the public parameters $PP$. $\mathcal{B}$ simulates the challenger and interacts with $\mathcal{A}$ as follows:

• Phase 1. The adversary $\mathcal{A}$ can query the following random oracles at any time.

$O_{H_1}(w_\eta)$: The algorithm $\mathcal{B}$ creates a list of tuples $\langle w_\eta, h_\eta, a_\eta, c_\eta \rangle$ called the $H_1$-list. The list is initially empty. When $\mathcal{A}$ asks the random oracle $H_1$ at the point of $w_\eta \in \{0, 1\}^*$, $\mathcal{B}$ responds as follows:

1) If the query $w_\eta$ appears on the $H_1$-list in a tuple $\langle w_\eta, h_\eta, a_\eta, c_\eta \rangle$, then $\mathcal{B}$ responds with $H_1(w_\eta) = h_\eta$.

2) Otherwise, $\mathcal{B}$ selects a random $c_\eta \in \{0, 1\}$ so that $\Pr[c_\eta = 0] = 1/(q_T + 1)$.

3) $\mathcal{B}$ picks a random $a_\eta \in \mathbb{Z}_p$. If $c_\eta = 0$, $\mathcal{B}$ computes $h_\eta = u_2 \cdot g^{a_\eta}$; otherwise, $\mathcal{B}$ computes $h_\eta = g^{a_\eta}$. The algorithm $\mathcal{B}$ adds the tuple $\langle w_\eta, h_\eta, a_\eta, c_\eta \rangle$ to the $H_1$-list and responds to $\mathcal{A}$ with $H_1(w_\eta) = h_\eta$.

$O_{H_2}(\varphi_\eta)$: The $H_2$-list is initially empty. At any time the adversary $\mathcal{A}$ can issue a query to $H_2$. The algorithm $\mathcal{B}$ responds as follows:

1) If the query on $\varphi_\eta$ exists in the $H_2$-list, $\mathcal{B}$ responds $I_{w_\eta}$ to $\mathcal{A}$.

2) Otherwise, $\mathcal{B}$ picks a new random value $I_{w_\eta} \in \{0, 1\}^{\log p}$ for each new $\varphi_\eta$ and sets $H_2(\varphi_\eta) = I_{w_\eta}$. The algorithm $\mathcal{B}$ adds the pair $(\varphi_\eta, I_{w_\eta})$ to the $H_2$-list and sends $I_{w_\eta}$ to $\mathcal{A}$.

$O_{q_{id}}(id)$: The algorithm $\mathcal{B}$ creates a list of tuples $\langle SK, q_{id}, C \rangle$ called the table $T$. Upon receiving a secret key query from $\mathcal{A}$ together with a commitment value $C$, $\mathcal{B}$ checks whether the tuple appears in $T$.

1) If so, $\mathcal{B}$ returns $q_{id}$ to $\mathcal{A}$.

2) Otherwise, $\mathcal{B}$ sets $q_{id} = g^{\alpha/u} \cdot g^{at}$ and sends it to $\mathcal{A}$.

$O_{tk}(id, w_\eta)$: The adversary $\mathcal{A}$ issues a query for the trapdoor corresponding to the keyword $w_\eta$ and the client identifier $id$, and then $\mathcal{B}$ responds as follows:

1) $\mathcal{B}$ runs the above $H_1$-queries to obtain $h_\eta \in \mathbb{G}$ such that $H_1(w_\eta) = h_\eta$. Let $\langle w_\eta, h_\eta, a_\eta, c_\eta \rangle$ be the corresponding tuple on the $H_1$-list. If $c_\eta = 0$, then $\mathcal{B}$ reports failure and aborts the game;

2) Otherwise, we know $c_\eta = 1$ and $h_\eta = g^{a_\eta}$. $\mathcal{B}$ selects $u$ from $\mathbb{Z}_p$ randomly, searches the table $T$ for $SK$, and sets $tk = (g^{a_\eta} \cdot g^{\alpha} (g^{at})^u = g^{a_\eta} q_{id}^u,\ L' = L^u,\ \{K'_j = K_j^u\}_{j \in S})$. Therefore, $tk = (T_w, L', K'_j)$ is a valid search token. $\mathcal{B}$ returns $tk$ to $\mathcal{A}$.

• Challenge. The adversary $\mathcal{A}$ sends two equal-length keywords $w_0$ and $w_1$ to $\mathcal{B}$. The algorithm $\mathcal{B}$ generates a challenge index as follows:

1) $\mathcal{B}$ runs $H_1$-queries to obtain $h_0, h_1 \in \mathbb{G}$ such that $H_1(w_0) = h_0$ and $H_1(w_1) = h_1$. For $\eta \in \{0, 1\}$, let $\langle w_\eta, h_\eta, a_\eta, c_\eta \rangle$ be the corresponding tuples on the $H_1$-list. If both $c_0 = 1$ and $c_1 = 1$, then $\mathcal{B}$ reports failure and terminates.

2) We know that at least one of $c_0, c_1$ is equal to 0. $\mathcal{B}$ randomly picks $b \in \{0, 1\}$ such that $c_b = 0$.

3) The algorithm $\mathcal{B}$ selects $s$ from $\mathbb{Z}_p$ randomly, and sets $I = (C_{0,1} = g^s, C_{0,2} = h^s)$. Let $\varphi_b = \hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, u_2 g^{a_b})^{\gamma}$. $\mathcal{B}$ runs the above $H_2$-queries algorithm to obtain $J \in \{0, 1\}^{\log p}$. $\mathcal{B}$ stores the tuple $\langle \varphi_b, J \rangle$ in the $H_2$-list and responds to $\mathcal{A}$ with the challenge $I_{w_b} = J$ for a random $J \in \{0, 1\}^{\log p}$. Let $\gamma = s$; we have

$$H_2\big(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_b))^{\gamma}\big) = J,$$

i.e., $J = H_2\big(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_b))^{\gamma}\big) = H_2\big(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, u_2 g^{a_b})^{\gamma}\big)$.

• Phase 2. Same as Phase 1. $\mathcal{A}$ can continue to issue trapdoor queries for keywords $w_\eta$, where the only restriction is that $w_\eta \ne w_0, w_1$. $\mathcal{B}$ responds to these queries as before.

• Guess. The adversary $\mathcal{A}$ outputs its guess $b' \in \{0, 1\}$ of $b$. $\mathcal{A}$ computes $\varphi_b$ as follows:

$$\varphi_b = \frac{\hat{e}(C_{0,1}, T_w)}{\hat{e}(L', C_{0,2})} = \frac{\hat{e}\big(g^s, H_1(w) q_{id}^u\big)}{\hat{e}(L^u, h^s)} = \frac{\hat{e}(g^s, g^{a_\eta} g^{atu} g^{\alpha})}{\hat{e}((g^t)^u, g^{as})} = \frac{\hat{e}(g, g)^{\alpha s} \cdot \hat{e}(g, g^{a_\eta})^s \cdot \hat{e}(g, g)^{astu}}{\hat{e}(g, g)^{astu}} = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}(g, g^{a_\eta})^s.$$

If $\mathcal{A}$ can break our scheme, we have $\varphi_b = \hat{e}(u_1, u_2)^s \cdot \hat{e}(g, g^{a_b})^s$. $\mathcal{B}$ searches $I_{w_b}$ from the $H_2$-list for $\varphi_b$ and outputs $\varphi_b/\hat{e}(K_1, u_2 g^{a_b}) = \hat{e}(g, g)^{\alpha\beta\gamma}$. The adversary $\mathcal{A}$ must have issued a query for either $H_2(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_0))^{\gamma})$ or $H_2(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_1))^{\gamma})$. Therefore, with probability 1/2 the $H_2$-list contains a pair whose left hand side is $\varphi_\eta = \hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_b))^{\gamma}$. If $\mathcal{B}$ picks this pair $(\varphi_\eta, I)$ from the $H_2$-list, then $\varphi_\eta/\hat{e}(K_1, u_2 g^{a_b}) = \hat{e}(g, g)^{\alpha\beta\gamma}$ as required.

We now show that $\mathcal{B}$ correctly outputs $\hat{e}(g, g)^{\alpha\beta\gamma}$ with probability at least $\epsilon'$. The probability that $\mathcal{B}$ does not abort during the simulation phase is at least $1/e$, and the probability that $\mathcal{B}$ does not abort during the challenge phase is at least $1/q_T$. Therefore, $\mathcal{B}$ does not abort with probability at least $1/(e\,q_T)$. In a real attack game $\mathcal{A}$ issues a query for $\varphi_\eta = \hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_b))^{\gamma}$ with probability at least $\epsilon$. The adversary $\mathcal{A}$ issues an $H_2$ query for either $H_2(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_0))^{\gamma})$ or $H_2(\hat{e}(u_1, u_2)^{\gamma} \cdot \hat{e}(g, H_1(w_1))^{\gamma})$ with probability at least $2\epsilon$. The detailed analysis of the above results is shown in Boneh et al.'s scheme [1]. $\mathcal{B}$ will choose the correct pair with probability at least $1/q_{H_2}$. Assuming $\mathcal{B}$ does not abort during the simulation, it will produce the correct answer with probability $\epsilon/q_{H_2}$. Since $\mathcal{B}$ does not abort with probability at least $1/(e\,q_T)$, $\mathcal{B}$ successfully outputs $\hat{e}(g, g)^{\alpha\beta\gamma}$ with probability at least $\epsilon/(e \cdot q_T \cdot q_{H_2})$.

5.2 Performance Analysis

In this section, we give the performance analysis from the perspective of functional comparison, computation cost, and experiment result. In Table 1, we define some notations which will be used in this section.

Table 1: Notations

| Symbols | Description |
|---|---|
| $P$ | the pairing operation |
| $E$ | the group exponentiation in $\mathbb{G}$ |
| $E_T$ | the group exponentiation in $\mathbb{G}_T$ |
| $n$ | the number of attributes in the system |
| $n_{a,u}$ | the number of attributes a client possesses |
| $k$ | the number of attributes embedded in a ciphertext |

Functionality comparisons: In Table 2, we give a comprehensive comparison according to some important features, including expressiveness, attribute revocation, keyword search and verifiability. From Table 2, Hur et al.'s scheme [7] can achieve fine-grained attribute revocation, but does not support data retrieval and result verification. Zheng et al.'s scheme [27] can provide verifiability and fine-grained keyword search, but there is a huge computational overhead in the verification process. Sun et al.'s scheme [19] can achieve data retrieval, verifiability, and revocation, but the verification process is inefficient and it only supports system-level client revocation. Wang et al.'s scheme [21] can achieve attribute revocation and keyword search, but the verifiability of search results is not considered. In general, compared with the above schemes, our scheme has better functionality.

Computation cost: In Table 3, since we have the same functionality as Sun et al.'s scheme [19], we briefly compare our computational costs with Sun et al.'s. As the operation cost over $\mathbb{Z}_p$ is much less than that of group and pairing operations, we ignore the computational time over $\mathbb{Z}_p$. From Table 3, in the Setup algorithm, Sun et al.'s scheme needs $3n$ exponentiations in $\mathbb{G}$, one exponentiation in $\mathbb{G}_T$, and one pairing operation, while our scheme only requires one exponentiation in $\mathbb{G}$, one exponentiation in $\mathbb{G}_T$, and one pairing operation. In the KeyGen algorithm, our scheme needs $(3 + n_{a,u})$ exponentiations in $\mathbb{G}$, but Sun et al.'s scheme needs $(2n + 1)$ exponentiations in $\mathbb{G}$ and two exponentiations in $\mathbb{G}_T$. In the Encrypt algorithm, our scheme needs $(3k + 2)$ exponentiations in $\mathbb{G}$, three exponentiations in $\mathbb{G}_T$ and one pairing operation, but Sun et al.'s scheme needs $(n + 1)$ exponentiations in $\mathbb{G}$, one exponentiation in $\mathbb{G}_T$ and one pairing operation. The time cost of our scheme is a little larger than that of Sun et al.'s scheme. In the Trapdoor algorithm, our scheme needs $(k + 4)$ exponentiations in $\mathbb{G}$, whereas Sun et al.'s scheme needs $(2n + 1)$ exponentiations in $\mathbb{G}$, which is larger than our scheme. In the Search algorithm, Sun et al.'s scheme requires one exponentiation in $\mathbb{G}_T$ and $(n + 1)$ pairing operations, but our scheme only needs two exponentiations in $\mathbb{G}_T$ and two pairing operations.

Experiment result: We conduct our experiments using the Java Pairing-Based Cryptography (JPBC) library [2]. We implement the proposed scheme on a Windows machine with an Intel Core 2 processor running at 3.30 GHz and 4.00 GB of memory. The running environment of our scheme is Java Runtime Environment 1.7, and the Java Virtual Machine (JVM) used for our programs is 64-bit, matching our operating system.

In our experiments, we compare the proposed scheme with Sun et al.'s scheme [19] and Wang et al.'s scheme [21] in the search time. The modulus of the elements in the group is chosen to be 512 bits, and the number of attributes ranges from 10 to 50.

From Figure 3, we know that the search time cost grows linearly with the number of attributes in Sun et al.'s scheme, and the search time cost of Wang et al.'s scheme is less than that of Sun et al.'s scheme. However, the proposed scheme has the lowest search time of the three schemes. For example, when the number of attributes is 50, the search of the proposed scheme only needs 0.03s, while Sun et al.'s scheme needs about 0.8s and Wang et al.'s scheme needs 0.05s. Therefore, compared with the above two schemes, the proposed scheme is more efficient and practical.

6 Conclusions

In this paper, we have proposed a verifiable attribute-based keyword search encryption with attribute revocation for electronic health record system. By using a KEK tree and re-encryption techniques, the proposed scheme can achieve efficient revocation and assure that the updated ciphertext cannot be decrypted by the revoked

clients. In addition, we introduce TPA to verify the integrity of the returned search results, which can reduce the client's computation overhead. Furthermore, performance analysis shows that our scheme is efficient and practical for electronic health record system. However, the access policy may contain sensitive patient information, and the proposed scheme does not support policy hiding. For the future work, we intend to propose a privacy-preserving attribute-based keyword search encryption scheme.

Table 2: The comparisons of the functionality

| Schemes | Expressive | Revocation | Keyword Search | Verifiability |
|---|---|---|---|---|
| Hur et al.'s scheme [7] | access tree | √ | × | × |
| Sun et al.'s scheme [19] | AND gate | √ | √ | √ |
| Wang et al.'s scheme [21] | LSSS | √ | √ | × |
| Zheng et al.'s scheme [27] | access tree | × | √ | √ |
| Ours | LSSS | √ | √ | √ |

Figure 3: The time cost of EHR system (search time in ms versus the number of attributes in the system, from 10 to 50, for Sun et al.'s scheme, Wang et al.'s scheme, and our scheme)

Table 3: The comparisons of computation cost

| Operations | Sun et al.'s scheme [19] | Ours |
|---|---|---|
| Setup | $3nE + E_T + P$ | $E_T + E + P$ |
| KeyGen | $(2n + 1)E + 2E_T$ | $(3 + n_{a,u})E$ |
| Encrypt | $(n + 1)E + E_T + P$ | $(3k + 2)E + 3E_T + P$ |
| Trapdoor | $(2n + 1)E$ | $(4 + k)E$ |
| Search | $(n + 1)P + E_T$ | $2P + 2E_T$ |

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61807026, the Natural Science Basic Research Plan in Shaanxi Province of China under Grant No. 2019JM-198, the Plan For Scientific Innovation Talent of Henan Province under Grant No. 184100510012, and in part by the Program for Science and Technology Innovation Talents in the Universities of Henan Province under Grant No. 18HASTIT022.

References

[1] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, “Public key encryption with keyword search,” in International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506–522, May 2004.
[2] A. D. Caro and V. Iovino, “jPBC: Java pairing based cryptography,” in Proceedings of The 16th IEEE Symposium on Computers and Communications (ISCC’11), pp. 850–855, 2011.
[3] P. S. Chung, C. W. Liu, and M. S. Hwang, “A study of attribute-based proxy re-encryption scheme in cloud environments,” International Journal of Network Security, vol. 16, no. 1, pp. 1–13, 2014.
[4] M. L. Florence and D. Suresh, “Enhanced secure sharing of PHRs in cloud using attribute-based encryption and signature with keyword search,” Advances in Big Data and Cloud Computing, vol. 645, pp. 375–384, 2018.
[5] R. Gandikota, “A secure cloud framework to share EHRs using modified CP-ABE and the attribute bloom filter,” Education and Information Technologies, vol. 23, no. 5, pp. 2213–2233, 2018.
[6] M. T. Hu, H. Gao, and T. G. Gao, “Secure and efficient ranked keyword search over outsourced cloud data by chaos based arithmetic coding and confusion,” International Journal of Network Security, vol. 21, no. 1, pp. 105–114, 2019.
[7] J. Hur and D. K. Noh, “Attribute-based access control with efficient revocation in data outsourcing systems,” IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 7, pp. 1214–1221, 2011.
[8] M. Joshi, K. Joshi, and T. Finin, “Attribute based encryption for secure access to cloud based EHR systems,” in Proceedings of IEEE 11th International Conference on Cloud Computing, pp. 932–935, July 2018.
[9] M. Li, S. C. Yu, Y. Zheng, K. Ren, and W. J. Lou, “Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, pp. 131–143, 2013.
[10] Y. Miao, J. Ma, X. Liu, Z. Liu, L. Shen, and F. Wei, “VMKDO: Verifiable multi-keyword search over encrypted cloud data for dynamic data-owner,” Peer-to-Peer Networking and Applications, vol. 11, no. 2, pp. 287–297, 2018.

[11] D. Naor, M. Naor, and J. Lotspiech, “Revocation and tracing schemes for stateless receivers,” in Proceedings of The 21st Annual International Cryptology Conference, pp. 41–62, Aug. 2001.
[12] S. Narayan, M. Gagne, and R. S. Naini, “Privacy preserving EHR system using attribute-based infrastructure,” in Proceedings of The ACM Workshop on Cloud Computing Security Workshop, pp. 47–52, Oct. 2010.
[13] Y. S. Rao, “A secure and efficient ciphertext-policy attribute-based signcryption for personal health records sharing in cloud computing,” Future Generation Computer Systems, vol. 67, pp. 133–151, 2017.
[14] B. E. Reedy and G. Ramu, “A secure framework for ensuring EHR’s integrity using fine-grained auditing and CP-ABE,” in Proceedings of The IEEE 2nd International Conference on Big Data Security, pp. 85–89, Apr. 2016.
[15] S. Rezaei, M. A. Doostari, and M. Bayat, “A lightweight and efficient data sharing scheme for cloud computing,” International Journal of Electronics and Information Engineering, vol. 9, no. 2, pp. 115–131, 2018.
[16] A. Sahai and B. Waters, “Fuzzy identity-based encryption,” in Proceedings of The 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 457–473, May 2005.
[17] D. Sethia, H. Saran, and D. Gupta, “CP-ABE for selective access with scalable revocation: A case study for mobile-based healthfolder,” International Journal of Network Security, vol. 20, no. 4, pp. 689–701, 2018.
[18] D. Song, D. Wagner, and A. Perrig, “Practical techniques for searches on encrypted data,” in Proceedings of IEEE Symposium on Security and Privacy, pp. 44–55, May 2000.
[19] W. Sun, S. Yu, W. Lou, Y. Hou, and H. Li, “Protecting your right: Verifiable attribute-based keyword search with fine-grained owner-enforced search authorization in the cloud,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 4, pp. 1187–1198, 2016.
[20] S. F. Tzeng, C. C. Lee, and M. S. Hwang, “A batch verification for multiple proxy signature,” Parallel Processing Letters, vol. 21, no. 1, pp. 77–84, 2011.
[21] S. P. Wang, D. Zhang, Y. Zhang, and L. Liu, “Efficiently revocable and searchable attribute-based encryption scheme for mobile cloud storage,” IEEE Access, vol. 6, pp. 30444–30457, 2018.
[22] B. Waters, “Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization,” in Proceedings of The 14th International Conference on Practice and Theory in Public Key Cryptography (PKC’11), pp. 53–70, Mar. 2011.
[23] A. Wu, D. Zheng, Y. L. Zhang, and M. L. Yang, “Hidden policy attribute-based data sharing with direct revocation and keyword search in cloud computing,” Sensors, vol. 18, no. 7, pp. 1–17, 2018.
[24] F. Xhafa, J. F. Wang, X. F. Chen, J. K. Liu, J. Li, P. Krause, and D. S. Wong, “An efficient PHR service system supporting fuzzy keyword search and fine-grained access control,” Soft Computing, vol. 18, no. 9, pp. 1795–1802, 2014.
[25] S. Yu, C. Wang, K. Ren, and W. Lou, “Attribute based data sharing with attribute revocation,” in Proceedings of The 5th ACM Symposium on Information, Computer and Communications Security (ASIACCS’10), pp. 261–270, Apr. 2010.
[26] Y. Zhao, M. Ren, S. Jiang, G. Zhu, and H. Xiong, “An efficient and revocable storage CP-ABE scheme in the cloud computing,” Computing, pp. 1–25, 2018.
[27] Q. Zheng, S. Xu, and G. Ateniese, “VABKS: Verifiable attribute-based keyword search over outsourced encrypted data,” in Proceedings of IEEE Conference on Computer Communications, pp. 522–530, Apr. 2014.

Biography

Zhenhua Liu received the B. S. degree from Henan Normal University, and the master’s and Ph.D. degrees from Xidian University, China, in 2000, 2003, and 2009, respectively. He is currently a Professor with Xidian University. His research interests include cryptography and information security.

Yan Liu received the B. S. degree from Shenyang Agricultural University in 2017. She is currently pursuing the master’s degree in mathematics with Xidian University. Her research interests include cryptography and cloud security.

Jing Xu received the B. S. degree from Henan Normal University in 2017. She is currently pursuing the master’s degree in mathematics with Xidian University. Her research interests include cryptography and cloud security.

Baocang Wang received the B. S. degree in Computational Mathematics and Their Application Softwares from Xidian University, China, in 2001, the M. S. degree in Cryptology from Xidian University, China, in 2004, and the Ph.D. degree in Cryptology from Xidian University, China, in 2006. He is currently a Professor and Ph. D. supervisor with Xidian University. His research interests include postquantum cryptography, fully homomorphic cryptography, number theoretic algorithms, and cloud security.

International Journal of Network Security, Vol.22, No.5, PP.857-862, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).16) 857

An Unlinkable Key Update Scheme Based on Bloom Filters for Random Key Pre-distribution

Bin Wang (Corresponding author: Bin Wang)

Information Engineering College, Yangzhou University No. 196 West HuaYang Road, Yangzhou City, Jiangsu Province, P. R. China, 225127 (Email: [email protected]) (Received Apr. 16, 2019; Revised and Accepted Dec. 10, 2019; First Online Jan. 29, 2020)

Abstract sink node) in a secure manner to prevent an adversary from breaking data privacy or mounting a forgery at- Eschenauer et al. presented an efficient random key pre- tack [1, 13]. Hence security is an important issue to be distribution scheme for WSNs that assigns symmetric addressed for wide deployment of WSNs. To provide se- keys to sensor nodes by randomly sampling from a large curity services (e.g., data encryption or identity authenti- key pool. Most research in this line assume nodes ex- cation) for WSNs, it is necessary to establish shared keys change key identifiers to determine common keys between between nodes via appropriate key management mecha- them. However, an adversary can learn topology informa- nisms. However, as sensor nodes are resource-constrained tion of the underlying random key graph by intercepting equipments with limited storage and computational capa- exchanged key identifiers. In addition, when key expo- bility, traditional public key cryptographic schemes(e.g., sure occurs, compromised nodes should be revoked and Diffie-Hellman key agreement protocol [5]) are not appli- uncompromised nodes’ key rings should be updated se- cable for WSNs since computational cost of public key curely. In this paper, we design an unlinkable key update operations are too costly to be implemented for sensor mechanism that can revoke compromised nodes while an nodes [4]. That is, energy efficiency is an important fac- adversary is infeasible to link key identifiers with a node. tor to be considered when handling security challenges for A key update node is responsible for distributing a ran- WSNs [8]. dom seed among uncompromised nodes in order to update As senor nodes can only afford light-weight op- their key rings securely. 
The revoked keys are represented erations such as hash operations, symmetric encryp- by a bloom filter to avoid exchange of key identifiers when tion/decryption operations, Eschenauer and Gligor [6] checking whether a node is compromised. As a bloom fil- suggested a random key pre-distribution scheme for ter has zero false negative rate, we utilize negative answers WSNs in which each sensor node is equipped with a returned by a bloom filter to identify uncompromised keys fixed-sized key ring comprising symmetric keys randomly and nodes with high probability. Then a local broadcast sampled from a large key pool before network deploy- mechanism is used to speed up update of uncompromised ment. Afterwards, two nodes can compute a session nodes’ key ring securely. key for secure communication if their key rings share at Keywords: Bloom Filter; Key Pre-distribution; Unlink- least a common key. Connectivity of the induced ran- ablity dom key graph is proven to hold asymptotically under certain choices of system parameters [17]. Several im- provements or extensions to this kind of random key pre- 1 Introduction distribution schemes are suggested such as q-composite Wireless sensor networks (WSNs) are networks consist- random key pre-distribution scheme [3], random pairing ing of battery-powered sensor nodes that are able to per- key pre-distribution scheme [16]. On the other hand, de- form sensing tasks, data processing and multi-hop wire- terministic key pre-distribution schemes based on com- less communication. With the rapid development in sen- binatorial designs [14] are also presented as alternatives sor technologies and wireless communication, WSNs have to key pre-distribution schemes for WSNs. 
Determinis- been widely used in applications such as environment tic key pre-distribution schemes have the advantage that monitoring, target tracking, military operations and at- secure connectivity property can be proven to hold in a tracted a lot of attention from research communities [15]. deterministic way. As sensor nodes may be deployed in hostile environments, A subtle point inherent in random key pre-distribution they must forward data packets to a base station (a schemes for WSNs is how to determine common keys be- International Journal of Network Security, Vol.22, No.5, PP.857-862, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).16) 858 tween a pair of nodes. That is, a kind of key confirma- in our solution uses a bloom filter to represent the set tion mechanism is a pre-requisite in order to find common of revoked keys and is responsible for evolving key pool keys between nodes. Currently, it is generally assumed and uncompromised nodes’ key rings as well as excluding that nodes can exchange their own key identifiers directly compromised nodes. Then the key update node runs sev- as a solution for key confirmation. However, the poten- eral rounds of unlinkbale key confirmation process based tial security risk of this simple solution is that an adver- on this bloom filter to determine whether a node is com- sary may link observed key identifiers with nodes after promised or not. Recall that a node is compromised if its observing communication between nodes. These informa- key ring is a subset of the revoked keys. If we use positive tion can help an adversary to deduce topology structure answers from the bloom filter to identify compromised of the underlying key graph. keys, efficiency issues due to non-zero false-positive rate To counteract the above mentioned security risk of the of bloom filters will also be encountered. 
simple solution for key confirmation, Marek Klonowski and Piotr Syga [9] presented a novel unlinkable key confirmation solution based on bloom filters to determine the common keys between a pair of nodes. Each node computes a local bloom filter as a compact representation of the secret keys it holds. Two nodes then exchange their bloom filters rather than key identifiers. A node can check whether a key it holds is also an element of the other node's key ring by issuing set membership queries to the other node's bloom filter. As positive answers returned by bloom filters may be faulty with non-zero probability (the false positive rate), their key confirmation process must be repeated several times with fresh randomness between a pair of nodes to ensure that positive answers from the bloom filters can be considered correct with high probability. The additional communication overhead also consumes a large amount of the nodes' energy.

As the network structure of WSNs may vary due to factors such as node failure or malfunction, single-phase key pre-distribution is not able to adapt to dynamic changes in WSNs. To support a flexible secure infrastructure for new node deployments, Albert Levi and Salim Sarimurat [10] suggested the use of multiple generations of dynamic key pools. As nodes evolve their key rings by iterative hash computations at the end of each generation, their method implicitly requires all nodes to refresh their key rings synchronously. On the other hand, when WSNs are deployed in hostile environments, some compromised keys should be revoked since they may have been broken by an adversary. A node is compromised if its key ring is a subset of the compromised keys controlled by an adversary. The scheme in [10] is unable to revoke compromised nodes from WSNs. Moreover, an adversary is still able to intercept key identifiers exchanged between nodes during a specific generation. Generally speaking, revoking nodes from WSNs is more difficult than adding new nodes.

As a result, how to efficiently exclude compromised keys and nodes from WSNs and evolve uncompromised nodes' key rings in an unlinkable way is also an issue to be addressed for random key pre-distribution. In this paper, we suggest an unlinkable key update mechanism based on bloom filters to ensure that uncompromised nodes can refresh their key rings securely while it is infeasible for an adversary to link key identifiers with a node.

Given a set of keys to be revoked, a key update node

As a bloom filter has a zero false negative rate, we suggest that negative answers from the bloom filter be used as indications of uncompromised keys to identify uncompromised nodes. Analysis shows that an uncompromised node can determine its unrevoked keys shared with the key update node with high probability under the specified parameter setting. A random seed that can be used to refresh the uncompromised nodes' key rings is then generated and distributed securely among the uncompromised nodes. The key update node broadcasts the random seed, encrypted under a shared unrevoked key, to a selected uncompromised node and its neighboring nodes in the formed key graph. Finally, uncompromised nodes apply hash operations with the received random seed to update their key rings. Simulation results show that this local broadcast mechanism helps speed up propagation of the random seed.

Section 2 describes the concept of bloom filters and the CPA security of symmetric encryption. Section 3 describes a key update scheme for revoking compromised nodes in WSNs and defines an unlinkability notion for this kind of key update scheme. The presented key update scheme is proven to be unlinkable when the underlying symmetric encryption scheme is assumed to be CPA secure. Section 4 concludes this paper.

2 Preliminaries

2.1 Bloom Filter

The concept of the bloom filter was introduced in [2]; it is a compact data structure used for answering set membership queries. Given $l$ hash functions $h_1(\cdot), \cdots, h_l(\cdot)$ with range $[1, m]$, a bloom filter $BF_{m,l}$ is a bit vector of length $m$. To represent a set $S$ consisting of $n$ elements by $BF_{m,l}$, compute the $l$ entries $h_1(s), \cdots, h_l(s)$ for each element $s \in S$ and set $BF_{m,l}[h_1(s)] = 1, \cdots, BF_{m,l}[h_l(s)] = 1$. Given a membership query $x$, $BF_{m,l}$ returns a positive answer $BF_{m,l}(x) = 1$ to indicate that $x \in S$ if $BF_{m,l}[h_1(x)] == 1 \wedge \cdots \wedge BF_{m,l}[h_l(x)] == 1$ is true; otherwise it returns a negative answer $BF_{m,l}(x) = 0$ to indicate that $x \notin S$. It is well known that a bloom filter may return false positive answers for membership queries, but its false negative rate is always zero.
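The definition of $BF_{m,l}$ above can be illustrated with a minimal Python sketch. It is not taken from the paper: the class name `BloomFilter`, the use of salted SHA-256 to derive the $l$ hash functions, and the zero-based positions in $[0, m)$ are all assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Bit vector of length m queried through l hash functions (illustrative)."""

    def __init__(self, m: int, l: int):
        self.m = m
        self.l = l
        self.bits = [0] * m

    def _positions(self, item: bytes):
        # Assumption: the j-th hash function is SHA-256 salted with the byte j
        # (so l must be at most 256), reduced modulo m.
        for j in range(self.l):
            digest = hashlib.sha256(bytes([j]) + item).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: bytes) -> None:
        # Set the l entries selected by the hash functions to 1.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: bytes) -> bool:
        # Positive answer only if every selected entry is 1.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, l=2)
members = [b"key-1", b"key-2", b"key-3"]
for key in members:
    bf.add(key)
```

Every inserted element is reported as a member, which is the zero false negative property; a non-member may still receive a positive answer when all of its positions happen to be set by other elements.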

2.2 Semantic Security

Given a symmetric encryption scheme $\Pi = (KG, E, D)$, where $E$ is the encryption function and $D$ denotes the decryption function, define an experiment $CPA^{A}_{\Pi}$, where $A$ is a probabilistic polynomial time (PPT) adversary:

1) Challenger $S$ generates a secret key $k = KG(\cdot)$;

2) $A$ is given access to an encryption oracle $O_k(\cdot)$ that outputs a ciphertext $c = E_k(m)$ when taking as input a plaintext $m$.

3) $A$ outputs two distinct equal-length plaintexts $m_0$, $m_1$.

4) $S$ picks a random bit $b \leftarrow \{0, 1\}$ and provides $A$ with $c^* = E_k(m_b)$.

5) $A$ outputs a bit $b'$.

A symmetric encryption scheme is CPA secure if $|Pr[b' = b] - \frac{1}{2}|$ is negligible for any PPT adversary $A$ in the experiment $CPA^{A}_{\Pi}$.

3 An Unlinkable Key Update Scheme for WSNs

3.1 System Setup

Assume there are $n$ sensor nodes $N_j$, $1 \le j \le n$, distributed in a geographic region. Let $ID$ be the set of key identifiers and $K^i$ be the key pool with constant size $P$ at the start of the $i$-th round, $i \ge 1$. Each node $N_j$ holds a key ring $R^i_j \subseteq K^i$ with fixed size $r$ at the start of the $i$-th round. $M^i : K^i \rightarrow ID$ is a one-to-one mapping from the key pool of the $i$-th round to the set of key identifiers. Initially, the key ring $R^1_j$ held by $N_j$ contains symmetric keys randomly sampled from a large key pool $K^1$ at the start of the first round. Afterwards, key pool $K^{i+1}$ is derived from key pool $K^i$ by executing the key update process described in Subsection 3.2.

In case of key exposure, compromised keys should be revoked and uncompromised nodes' key rings should be evolved to exclude compromised nodes. Our key update process is divided into several rounds. A key update node $V$ constructs a set $RK^i$ of revoked keys with size $q$ at the start of the $i$-th round. For the sake of simplicity, we adopt the full visible communication assumption as in [9], which assumes that each pair of nodes can communicate with each other directly. This assumption helps speed up propagation of a secure random seed.

The key update node $V$ maintains a table $S_j$ that records the key identifiers associated with each node $N_j$. $V$ also picks a random secret seed $seed_i$ at the start of the $i$-th round and must ensure that the following hold by the end of the $i$-th round:

1) Key pool $K^i$ is replaced by $K^{i+1}$ by the end of the $i$-th round as follows: each key $k^i \in K^i$ is replaced by a key $k^{i+1} = G(k^i || seed_i) \in K^{i+1}$, where $G(\cdot)$ is a secure hash function.

2) A node $N_j$ is uncompromised at the start of the $i$-th round if the set $R^i_j \setminus RK^i$ is not empty. In other words, an uncompromised node must hold at least one unrevoked key in its current key ring. The key ring $R^i_j$ of an uncompromised node $N_j$ at the start of the $i$-th round is replaced by $R^{i+1}_j$ by the end of the $i$-th round as follows: each key $k^i_j \in R^i_j$ is replaced by a key $k^{i+1}_j = G(k^i_j || seed_i) \in R^{i+1}_j$.

3) A PPT adversary that has knowledge of the revoked keys $RK^i$ at the start of the $i$-th round should not have enough knowledge to link key identifiers of the key set $R^i_j \cap \{K^i \setminus RK^i\}$ with the corresponding uncompromised node $N_j$ by the end of the $i$-th round by intercepting communications between nodes.

Define an experiment $Link^{n,M}_{\Pi}$ as a notion of unlinkability, where $M$ is a PPT adversary and $\Pi$ is a symmetric encryption scheme.

1) Challenger $C$ generates a key pool by running the key generation algorithm of $\Pi$ and selects $n$ node identifiers $id_1, \cdots, id_n$;

2) $M$ is allowed to choose $h \le n - 2$ compromised nodes from $\{id_1, \cdots, id_n\}$ and keep their corresponding key rings. The compromised nodes' identifiers are kept in $RID$.

3) $M$ is given access to an oracle $O_C(\cdot)$ that outputs the communication transcript between a node $id$ and a key update node when taking as input a node identity $id$.

4) $C$ outputs two distinct uncompromised nodes' identifiers $mid_0$, $mid_1$ from $\{id_1, \cdots, id_n\} \setminus RID$.

5) $C$ picks a random bit $b \leftarrow \{0, 1\}$ and provides $M$ with the communication transcript between the chosen node $mid_b$ and a key update node.

6) $M$ outputs a bit $b'$.

A key update scheme is unlinkable if $|Pr[b' = b] - \frac{1}{2}|$ is negligible for any PPT adversary $M$ in the experiment $Link^{n,M}_{\Pi}$.

3.2 One Round of Key Update Process

The key update node $V$ first generates a bloom filter $RBF^i_{n_b,l}$ with $n_b$ bits at the start of the $i$-th round by choosing $l$ hash functions $h_0(\cdot), \cdots, h_{l-1}(\cdot)$, and initializes all entries of $RBF^i_{n_b,l}$ to zero. $V$ executes Algorithm 1 to construct $RBF^i_{n_b,l}$ associated with the revoked key set $RK^i$ of size $q$.

We use the notation $RBF^i_{n_b,l}(k)$ to denote the answer returned by the bloom filter $RBF^i_{n_b,l}$ for a membership query
$k$. $RBF^i_{n_b,l}(k) = 1$ is a positive answer indicating that $k \in RK^i$, and $RBF^i_{n_b,l}(k) = 0$ is a negative answer indicating that $k \notin RK^i$. Define a set $FRK^i = \{k_x \mid k_x \in K^i \setminus RK^i \wedge RBF^i_{n_b,l}(k_x) == 1\}$. That is, $FRK^i$ contains all keys that are not revoked but get positive answers from $RBF^i_{n_b,l}$.

Algorithm 1 Construction of RBF
Begin
  for each $k \in RK^i$ do
    for $j = 0$ to $l - 1$ do
      $RBF^i_{n_b,l}[h_j(k)] = 1$;
    end for
  end for
End

Step 1. $V$ broadcasts the bloom filter $RBF^i_{n_b,l}$ to all nodes. Having received $RBF^i_{n_b,l}$, a node executes Algorithm 2 to construct a set $USK^i_j \subseteq R^i_j \setminus RK^i$. Recall that $R^i_j$ is the key ring of node $N_j$ with size $r$ at the start of the $i$-th round.

Algorithm 2 Construction of unrevoked keys
Begin
  $USK^i_j = \emptyset$
  for $k \in R^i_j$ do
    Initialize the answer $RBF^i_{n_b,l}(k) = 1$;
    for $j = 0$ to $l - 1$ do
      if $RBF^i_{n_b,l}[h_j(k)] == 0$ (a) then
        Set the answer $RBF^i_{n_b,l}(k) = 0$ and break;
      end if
    end for
    if $RBF^i_{n_b,l}(k) == 0$ (b) then
      $USK^i_j = USK^i_j \cup \{k\}$;
    end if
  end for
End

Claim 1. We have $USK^i_j = R^i_j \setminus \{RK^i \cup FRK^i\}$ by the end of Algorithm 2.

Proof. By construction of the bloom filter $RBF^i_{n_b,l}$, if $k \in RK^i$, then $RBF^i_{n_b,l}(k) == 1$ holds. In addition, the set $FRK^i$ enumerates all keys $k \in K^i \setminus RK^i$ with false positive answer $RBF^i_{n_b,l}(k) == 1$. As a result, $RBF^i_{n_b,l}(k) == 0$ holds if and only if $k \in R^i_j \setminus \{RK^i \cup FRK^i\}$. As $R^i_j \subset K^i$, $R^i_j \setminus \{RK^i \cup FRK^i\}$ is a subset of $K^i \setminus \{RK^i \cup FRK^i\}$. When $R^i_j \setminus \{RK^i \cup FRK^i\}$ is not empty, $k \in R^i_j \setminus \{RK^i \cup FRK^i\}$ implies that $RBF^i_{n_b,l}(k) == 0$ holds, and we conclude that $k \in USK^i_j$ by condition (b) in Algorithm 2.

As $R^i_j \setminus \{RK^i \cup FRK^i\}$ is a subset of $R^i_j \setminus RK^i$, it is possible that $USK^i_j$ is empty for some uncompromised node. That is, some uncompromised node will be identified as compromised by the key update process of Subsection 3.2. Let $E_j$ denote the event that $USK^i_j$ is empty for some uncompromised node $N_j$. $T_j$ is a random variable counting the number of revoked keys in $R^i_j \cap RK^i$. We compute the probability of $E_j$ as follows:

$$Pr[E_j] = \sum_{0 \le i \le r-1} Pr[E_j \mid T_j = i] \, Pr[T_j = i]. \tag{1}$$

As $RBF^i_{n_b,l}$ is used to represent $q$ revoked keys, the probability of a false positive event $RBF^i_{n_b,l}(k) == 1$ for some unrevoked key $k \in R^i_j \setminus \{RK^i\}$ is approximately $(1 - (1 - \frac{1}{n_b})^{l \cdot q})^l$ [12] if we assume the hash functions are modeled by independent random functions. Given $T_j = i$, $E_j$ occurs if and only if $RBF^i_{n_b,l}(k) == 1$ occurs for each of the $r - i$ unrevoked keys in $R^i_j \setminus RK^i$. By the independence assumption, we have:

$$Pr[E_j \mid T_j = i] \approx \left(1 - \left(1 - \frac{1}{n_b}\right)^{l \cdot q}\right)^{l \cdot (r-i)}. \tag{2}$$

As key rings assigned to nodes are assumed to be randomly sampled:

$$Pr[T_j = i] \approx \binom{r}{i} \left(\frac{q}{P}\right)^{i} \left(1 - \frac{q}{P}\right)^{r-i}. \tag{3}$$

When taking parameters $P = 1000$, $q = 100$, $l = 2$, $r = 50$, $n_b = 64$, we get $Pr[E_j] \approx 0.0197$ by numerical computation. On the other hand, the optimum false positive rate of bloom filter $RBF^i_{n_b,l}$ in this setting is $\approx 0.6185^{n_b/q} \approx 0.7353$ [12]. Choosing larger values for parameters $l$ and $n_b$ can further reduce $Pr[E_j]$ at the cost of additional communication overhead.

Step 2. When $USK^i_j$ is not empty, the corresponding uncompromised node $N_j$ sends a response message to $V$, which contains the node identifier $id_j$ of $N_j$.

Step 3. Having received response messages from uncompromised nodes with non-empty set $USK^i_j$, $V$ randomly picks a node $N_j$ among the uncompromised nodes that have sent response messages. Then $V$ transmits a random number $r_V$ to the chosen node $N_j$.

Step 4. Having received $r_V$, $N_j$ randomly picks a key $k^*_j \in USK^i_j$ and transmits the ciphertext $c^*_j = E_{k^*_j}(id_j || r_V)$ to $V$, which is encrypted under the chosen symmetric key $k^*_j$.

Step 5. Having received $c^*_j$, $V$ performs Algorithm 3 to construct a shared key $SK^i_j$ with the uncompromised node $N_j$.

Remark: Condition (c) denotes extraction of the matching key $k$ by key identifier $kid$; condition (d) denotes decryption of the ciphertext $c^*_j$ under the extracted matching key $k$. By the correctness of decryption, we have $k^*_j = k$, where $k$ is the extracted matching key.
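The value $Pr[E_j] \approx 0.0197$ quoted after Equation (3) can be reproduced by evaluating Equations (1)-(3) directly. The Python sketch below does only that; the function name `prob_no_unrevoked_key` and its argument names are illustrative choices, not notation from the paper.

```python
from math import comb


def prob_no_unrevoked_key(P: int, q: int, l: int, r: int, n_b: int) -> float:
    """Evaluate Pr[E_j] by summing Equation (1) with Equations (2) and (3)."""
    # Approximate false positive probability of the filter for a single
    # unrevoked key: (1 - (1 - 1/n_b)^(l*q))^l.
    fp = (1 - (1 - 1 / n_b) ** (l * q)) ** l
    total = 0.0
    for i in range(r):  # i revoked keys in the key ring, 0 <= i <= r - 1
        p_t = comb(r, i) * (q / P) ** i * (1 - q / P) ** (r - i)  # Eq. (3)
        p_e_given_t = fp ** (r - i)                               # Eq. (2)
        total += p_e_given_t * p_t                                # Eq. (1)
    return total


p = prob_no_unrevoked_key(P=1000, q=100, l=2, r=50, n_b=64)
print(f"Pr[E_j] = {p:.4f}")
```

With the parameter setting used in the text, the sum evaluates to roughly 0.0197. Increasing `l` or `n_b` in the call shows how the probability shrinks as the filter grows.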

Algorithm 3 Key Extraction
Begin
  Set $SK^i_j = NULL$, $flag = 0$;
  for $kid \in S_j$ do
    $k = (M^i)^{-1}(kid)$; (c)
    if $D_k(c^*_j) == id_j || r_V$ (d) then
      $flag = 1$, $SK^i_j = k$, break;
    end if
  end for
End

Step 6. $V$ picks a random seed and broadcasts the ciphertext $c^* = E_{SK^i_j}(r_V + 1 || seed)$ to the chosen node $N_j$ and its neighboring nodes in the key graph. Having received $c^*$, $N_j$ decrypts $c^*$ to recover $r_V + 1 || seed = D_{SK^i_j}(c^*)$. Then $N_j$ utilizes $seed$ to update its key ring as follows:

Algorithm 4 Update Key Ring
Begin
  for each $k^i \in R^i_j$ do
    $k^{i+1} = G(k^i || seed) \in R^{i+1}_j$
  end for
End

In addition, each neighboring node of $N_j$ in the random key graph can also try to decrypt the ciphertext $c^*$ by iterating over the keys shared with $N_j$. If their decryption operations are consistent with condition (d), these neighboring nodes of $N_j$ can also use $seed$ to update their own key rings. The rest of the uncompromised nodes that do not get $seed$ continue to interact with $V$ by executing Steps 2-6 repeatedly until their key rings are successfully updated by the end of the $i$-th round.

When taking parameters $P = 1000$, $q = 100$, $l = 2$, $r = 50$, $n_b = 64$, $n = 100$, our simulation results show that Steps 2-6 must be looped 35 times on average to finish one round of the presented key update process under this parameter setting.

Claim 2. It is computationally infeasible for a PPT adversary that has knowledge of the revoked keys in $RK^i$ at the start of the $i$-th round to link key identifiers with any uncompromised node $N_j$ by intercepting communications between nodes, if the underlying symmetric encryption scheme is CPA secure.

Proof. Note that the bloom filter $RBF^i_{n_b,l}$ contains no information with respect to the unrevoked keys in $R^i_j \cap \{K^i \setminus RK^i\}$ associated with an uncompromised node $N_j$. The ciphertexts $c^*_j = E_{k^*_j}(id_j || r_V)$ and $c^* = E_{k^*_j}(r_V + 1 || seed)$ encrypted under the symmetric key $k^*_j$ are the only sources from which an adversary can gain information about unrevoked keys in $R^i_j \cap \{K^i \setminus RK^i\}$. Intuitively, by the semantic security of symmetric encryption schemes [11], it is computationally infeasible for a PPT adversary to learn information about the plaintext and the encryption key when he can only intercept ciphertexts. As a result, a PPT adversary is not able to link key identifiers with any uncompromised node $N_j$.

Assume there is an adversary $M$ that can break the unlinkability of our key update scheme with non-negligible probability. We construct an adversary $A$ against a symmetric encryption scheme $\Pi = (KG, E, D)$ in $CPA^{A}_{\Pi}$. $A$ simulates $Link^{n,M}_{\Pi}$ for $M$ in one round as follows:

$A$ generates $n > 1$ node identifiers $id_1, \cdots, id_n$. When $M$ chooses $h \le n - 2$ compromised nodes in the simulated $Link^{n,M}_{\Pi}$, we assume without loss of generality that they are $(id_1, \cdots, id_h)$.

$A$ picks two distinct uncompromised identifiers $mid_0$ and $mid_1$, and we implicitly assume that the corresponding two nodes share a common key $k$ that is the challenge secret key used by the challenger in $CPA^{A}_{\Pi}$. Then $A$ generates key rings for the nodes in $\{id_1, \cdots, id_n\} \setminus \{mid_0, mid_1\}$, and $M$ is provided with the key rings of the compromised nodes chosen by him.

Oracle $O_C(\cdot)$ in $Link^{n,M}_{\Pi}$ is simulated by $A$ as follows: given a node identifier $id$ as input, if $id \notin \{mid_0, mid_1\}$, $A$ can simply simulate the communication transcript between $id$ and a key update node. Otherwise, $A$ uses oracle access to $O_k(\cdot)$ in $CPA^{A}_{\Pi}$ to generate ciphertexts for simulating the communication transcript between $id \in \{mid_0, mid_1\}$ and a key update node. It is implicitly assumed that the challenge secret key $k$ in $CPA^{A}_{\Pi}$ is chosen as the shared key to generate the communication transcript in the simulated $Link^{n,M}_{\Pi}$.

$A$ outputs $mid_0$ and $mid_1$ in the simulated $Link^{n,M}_{\Pi}$. $A$ submits two distinct equal-length plaintexts $m_0 = mid_0 || r_V$, $m_1 = mid_1 || r_V$ to challenger $S$ in experiment $CPA^{A}_{\Pi}$. $S$ picks a random bit $b \leftarrow \{0, 1\}$ and provides $A$ with $c^* = E_k(mid_b || r_V)$.

$A$ concatenates $c^*$ with the rest of the communication transcript between node $mid_b$ and a key update node, which can be generated by access to $O_k(\cdot)$ in $CPA^{A}_{\Pi}$, and provides $M$ with the correctly generated full communication transcript. When $M$ outputs a bit $b'$ in the simulated $Link^{n,M}_{\Pi}$, $A$ also outputs the bit $b'$ in $CPA^{A}_{\Pi}$.

By the above construction, $A$ succeeds in $CPA^{A}_{\Pi}$ with non-negligible probability under the assumption that $M$ can break the unlinkability of our key update scheme with non-negligible probability. This contradicts the assumption that the underlying symmetric encryption scheme is CPA secure. Hence the presented key update scheme is unlinkable.

Furthermore, it is relatively straightforward to see that the ciphertexts $c^*_j = E_{k^*_j}(id_j || r_V)$ and $c^* = E_{k^*_j}(r_V + 1 || seed)$ also provide authentication functionality between an uncompromised node and a key update node, preventing an adversary from mounting a forgery attack.
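The key ring refresh of Algorithm 4, $k^{i+1} = G(k^i || seed)$, can be sketched in a few lines of Python. The paper only requires $G(\cdot)$ to be a secure hash function; using SHA-256 for $G$, representing keys as byte strings, and the name `update_key_ring` are assumptions made here for illustration.

```python
import hashlib


def update_key_ring(key_ring: list[bytes], seed: bytes) -> list[bytes]:
    """Replace every key k by G(k || seed); SHA-256 stands in for G (assumption)."""
    return [hashlib.sha256(k + seed).digest() for k in key_ring]


# Hypothetical key ring and seed, used only to exercise the function.
ring_round_1 = [b"key-a", b"key-b", b"key-c"]
seed = b"round-1-seed"
ring_round_2 = update_key_ring(ring_round_1, seed)
```

Because the update is deterministic, every node that holds a given key and recovers the same seed derives the same successor key, so keys shared between uncompromised nodes remain shared after the round, while a node that never learns the seed cannot follow the update.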

4 Conclusions

Random key pre-distribution is an important security technique for WSNs. In this paper, we consider an unlinkable key update mechanism for WSNs to revoke a subset of compromised nodes and evolve uncompromised nodes' key rings. Our scheme uses a bloom filter as a compact representation of the set of revoked keys. The use of a bloom filter makes it unnecessary for nodes to exchange key identifiers in order to identify revoked keys. As a bloom filter has a zero false negative rate, we suggest using negative answers returned by the bloom filter to identify uncompromised keys. Our analysis shows that uncompromised nodes can recover unrevoked keys with high probability. A key update node then broadcasts a random seed encrypted under a shared unrevoked key to a specified uncompromised node and its neighboring nodes in the key graph. These nodes can then use the recovered random seed to update their local key rings. In the future, the key update mechanism may be considered under other communication interference models such as the protocol interference model [7].

Acknowledgments

The research is supported by the Natural Science Foundation of Jiangsu Province, P.R. China (Grant No. BK20170512), the Natural Science Foundation of Colleges and Universities in Jiangsu Province, P.R. China (Grant No. 17KJB413003), the Natural Science Foundation of Yangzhou City, Jiangsu Province, P.R. China (Grant No. YZ2016113), and the Yangzhou University Science and Technology Innovation Foundation Project (Grant No. 2016CXJ025).

References

[1] S. Akleylek, A. Karakaya, “A survey on security threats and authentication approaches in wireless sensor networks,” in International Symposium on Digital Forensic and Security (ISDFS’18), pp. 1–4, Mar. 2018.
[2] B. H. Bloom, “Space/time trade-offs in hash coding with allowable errors,” Communications of the ACM, vol. 13, no. 7, pp. 422–426, 1970.
[3] H. Chan, A. Perrig, D. Song, “Random key predistribution schemes for sensor networks,” in Symposium on Security and Privacy, pp. 197–213, 2003.
[4] A. Diaz and P. Sanchez, “Simulation of attacks for security in wireless sensor network,” Sensors, vol. 16, no. 11, p. 1932, 2016.
[5] W. Diffie and M. Hellman, “New directions in cryptography,” IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 644–654, 1976.
[6] V. Gligor, L. Eschenauer, “A key-management scheme for distributed sensor networks,” in Proceedings of the 9th ACM conference on Computer and Communications Security (CCS’02), pp. 41–47, 2002.
[7] P. Gupta, P. R. Kumar, “The capacity of wireless networks,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 388–404, 2000.
[8] M. S. Hwang, C. T. Li, “A lightweight anonymous routing protocol without public key en/decryptions for wireless ad hoc networks,” Information Sciences, vol. 181, no. 23, pp. 5333–5347, 2011.
[9] M. Klonowski and P. Syga, “Enhancing privacy for ad hoc systems with predeployment key distribution,” Ad Hoc Networks, vol. 59, pp. 35–47, 2017.
[10] A. Levi and S. Sarimurat, “Utilizing hash graphs for key distribution for mobile and replaceable interconnected sensors in the IoT context,” Ad Hoc Networks, vol. 57, pp. 3–18, 2017.
[11] Y. Lindell, J. Katz, Introduction to Modern Cryptography, 2014. (https://repo.zenk-security.com/Cryptographie%20.%20Algorithmes%20.%20Steganographie/Introduction%20to%20Modern%20Cryptography.pdf)
[12] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, 2005. ISBN 13: 978-0521835404.
[13] J. Newsome, E. Shi, D. Song, A. Perrig, “The sybil attack in sensor networks: Analysis & defenses,” in The Third International Symposium on Information Processing in Sensor Networks (IPSN’04), pp. 259–268, 2004.
[14] M. B. Paterson and D. R. Stinson, “A unified approach to combinatorial key predistribution schemes for sensor networks,” Design, Codes and Cryptography, vol. 71, no. 3, pp. 433–457, 2014.
[15] K. D. Wong, Y. H. Hu, D. Li, and A. M. Sayeed, “Detection, classification, and tracking of targets,” IEEE Signal Processing Magazine, vol. 19, no. 2, pp. 17–29, 2002.
[16] O. Yagan and A. M. Makowski, “On the connectivity of sensor networks under random pairwise key predistribution,” IEEE Transactions on Information Theory, vol. 59, no. 9, pp. 5754–5762, 2012.
[17] O. Yagan and A. M. Makowski, “Zero-one laws for connectivity in random key graphs,” IEEE Transactions on Information Theory, vol. 58, no. 5, pp. 2983–2999, 2012.

Biography

Bin Wang received his Ph.D. degree in communication and information systems from Shanghai Jiaotong University, P. R. China. His research interests include cryptography and network security. Dr. Wang is now an associate professor in the Department of Electronics and Communication Engineering, Information Engineering College, Yangzhou University, located at No. 196 West HuaYang Road, Yangzhou City, Jiangsu Province, P. R. China.

International Journal of Network Security, Vol.22, No.5, PP.863-868, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).17)

Survey on Attribute-based Encryption in Cloud Computing

P. R. Ancy, Addapalli V. N. Krishna, K. Balachandran, M. Balamurugan, and O. S. Gnana Prakasi (Corresponding author: P. R. Ancy)

Department of Computer Science, Engineering, Christ University Bengaluru, Karnataka, India (Email: [email protected]) (Received Feb. 10, 2020; Revised and Accepted June 8, 2020; First Online July 26, 2020)

Abstract

Attribute-Based Encryption (ABE) is an appropriate solution to the access control and security issues in cloud computing. Cloud computing is an emerging technique that addresses problems such as storage, security, and efficiency. As it arises as a new technology, many issues and much confusion arise with it, mainly regarding the security of the data. Cloud providers are capable of storing enough data, but users doubt how much security the providers give to that data. For this reason, many organizations and companies are not willing to move to the cloud environment. Attribute-Based Encryption came as a solution to this problem. Based on our comprehensive survey of different ABE schemes, this paper covers the recent ABE schemes, the different access policies, and the problems of ABE together with their solutions. Finally, this paper also includes a security comparison and an efficiency comparison of different ABE schemes.

Keywords: Access Control; Ciphertext-Policy Attribute-Based Encryption; Multiauthority; Revocation Mechanism

1 Introduction

Security is one of the main aspects of cloud computing. With the development of cloud computing, many data owners store their data in the cloud server to simplify local IT management and reduce cost [17]. Cloud storage is one of the major services provided by cloud computing [12]. It enables data owners to remotely host their data by outsourcing them to cloud servers, which brings great convenience for both individuals and enterprises to share data over the Internet [16, 24, 25]. As every organization today is moving to the cloud for different services, their main concern is how secure their data is in the cloud provider's environment. This is therefore the main area in which much research is under way. Many methods have been introduced to enhance security in cloud storage, and one of them is ABE (Attribute-Based Encryption) [4]. ABE is a type of public-key encryption in which the sender encrypts the message using the receiver's public key and the receiver decrypts the message using the private key. The key point of ABE is outsourcing, which means that the data is encrypted within the source or sender's environment and then sent to the service provider's environment for storage.

In ABE, encryption and decryption are done using a set of attributes and an access policy. Attributes form the secret key of a user, and the ciphertext depends on attributes. Like traditional identity-based encryption, the sender in an ABE system only needs to know the receiver's description in order to determine their public key [2]. Decryption of the ciphertext is possible only if the set of attributes of the user key matches the attributes of the ciphertext.

There are two types of access structure: monotonic and non-monotonic. A monotonic access structure supports "AND" and "OR" between attributes, and a non-monotonic access structure additionally supports "NOT" together with "AND" and "OR" between attributes. [10] proposed a method that can convert any monotone circuit to an equivalent access tree. There are two types of ABE, Key Policy Attribute-Based Encryption (KP-ABE) and Ciphertext Policy Attribute-Based Encryption (CP-ABE). In KP-ABE, data is encrypted using attributes and decrypted using an access policy; in CP-ABE, data is encrypted using an access policy and decrypted using attributes. With respect to the number of authorities, there are two types of system: single authority systems and multiauthority systems. With respect to revocation, there are attribute revocation and user revocation. Attribute revocation changes a user's attributes because attributes expire, are revoked, or need to be added. In the case of user revocation, the system should satisfy both forward and backward security. In forward security, revoked users should not be able to access new ciphertexts using an old secret key, and in backward security newly joined users are not given the privilege to access previous ciphertexts.

2 Related Work

A new scheme that can withstand any number of corrupt authorities is suggested in [3]. The author applies this technique to attain a multi-authority form of identical access control Attribute-Based Encryption. According to this scheme, each user has to prove his set of attributes to the third party to obtain a secret key. The user can use this secret key to decrypt the message. The main challenge of single authority Attribute-Based Encryption is preventing collusion. This problem is addressed in that paper by introducing the multiauthority scheme. A scheme called CP-ABE is introduced by [1] for the identification of encrypted information with complex access control. According to the work, even if the storage server is untrusted, the information can be kept confidential. This method is also secure against collusion attacks. In that paper, the author used the concept of attributes.

The characteristics are used to define the credentials of a user, and the party encrypting the information determines a policy for who can decrypt. A new security model of honest but curious servers is used to identify possible attacks [27]. This scheme allows the authority to cancel any characteristic at any period while placing a minimum load on the user. This method applies to Key Policy ABE. The author addresses the attribute revocation issue in the attribute-based system. [11] addressed some challenging issues in different scenarios and different policy updates. Ciphertext-Policy ABE is a promising solution for issues such as access policy and data storage. One problem that can occur in these types of framework is the revocation problem. In that article, the author develops an access control policy that has an effective revocation capability for both users and attributes.

An access policy is an efficient way to ensure the safety of cloud data [26]. Storing data with an insecure cloud third party becomes an issue in deciding the access policy. Ciphertext-policy ABE is an adequate scheme for this because the owner decides the policy, but it is difficult to introduce due to the revocation issue. For this reason, the author introduces an effective revocation access policy that applies to multiple authority schemes. This system can also achieve both backward and forward safety. [6] suggests that access structures can be represented in a monotonic or non-monotonic way. One of the best techniques to store data with encryption is ABE. In that article, the author addresses the key escrow problem and resolves it by introducing 2pc protocols in the system. A new multilevel secret sharing system is introduced by [13], expanding Shamir's scheme to a global threshold that is strictly higher than the compartments' number of thresholds. The article also shows how to use polynomial interpolation-based threshold secret sharing systems. It proposes two efficient systems in which the number of public shares is linearly proportional to the number of respondents.

A novel multi secret-sharing system based on Hermite interpolation is implemented in [21]. The proposed scheme is dynamic with respect to changes in the threshold as well as in the value and number of secrets. This scheme has the key feature of multi-usability. The article also provides a multi-use function over elliptic curves based on the discrete logarithm problem. It assumes that there are n members and that the dealer is not one of them. The dealer sets the system up without bias and issues the standards. The members give their shares to a combiner, who is any one of the members, for reconstructing the secret. By extending Key-Policy ABE with attribute privacy preservation in cloud storage, [8] proposes an Effective Attribute-based Access Control with Authorized Search system. It is more efficient than current solutions in computation and storage costs. The aim is to propose Key Policy-ABE with partly concealed characteristics built on the search for sensitive keywords and to demonstrate that the system is safe under the q-2 decisional bilinear DH assumption. [14] defines a threshold version of the Localized Multi-secret Sharing Scheme (LMSS). The author provides lower limits on the decryption share size of localized multi-secret sharing systems in a particular setting and gives explicit constructions of such systems. The author also analyses various methods to relax the model, providing trade-offs between the share size and the safety assurances given by the system, to allow the development of systems with small shares.

3 Attribute-based Encryption

The concept of ABE was first introduced by Sahai and Waters in the paper Fuzzy Identity-Based Encryption [19]. In ABE, data is encrypted using a set of attributes. ABE is a scheme in which each user is identified by a set of attributes, and some function of those attributes is used to determine decryption ability for each ciphertext [3]. Attribute-Based Encryption (ABE) is a type of public-key encryption. With public-key encryption, the message is encrypted with a public key and decrypted with a private key for a specific receiver. Attribute-based encryption involves a set of attributes on which the secret key and the ciphertext depend. A user can decrypt the ciphertext only when the attributes of the ciphertext match the attributes of the user key. The user encrypts the data using an access policy or access structure. An example of an access policy is Area=Italy AND age <30 AND Business=Researcher. There are two types of access structure, monotonic and non-monotonic. Two kinds of ABE exist: Key-Policy ABE and Ciphertext-Policy ABE. Key-Policy ABE has some issues: for example, the access policy is stored with a third party, so there is a chance that the third party can change the access policy. We therefore focus mainly on Ciphertext-Policy ABE.

3.1 Key Policy Attribute-based Encryption

In KP-ABE, secret keys for users are generated based on an access structure, and encryption of data is done based on a set of attributes. The message can be decrypted when the user attributes satisfy the access structure. The most common access structure is a tree-based access structure. The main issue in Key-Policy ABE is that the access policy is stored with a third party, so there is a chance that the third party can change the access policy. In this paper, we focus mainly on ciphertext policy attribute-based encryption.

3.2 Ciphertext Policy Attribute-based Encryption

In CP-ABE, data is encrypted using access trees, and a set of attributes is used for generating the user's secret keys. There are two types of system model for the CP-ABE scheme, single authority and multi-authority. In the single authority system in Figure 1, there is only one attribute authority, called the central authority, and all attributes of the system are managed by this central authority. In the multi-authority system in Figure 2, there are multiple attribute authorities, and the attributes are shared across these authorities.

Figure 1: Single authority system (entities: Attribute Authority, Data Owners, Users, Cloud Servers; exchanged items: PK, SK, CT)

Figure 2: Multi-authority system (entities: multiple Attribute Authorities, Data Owners, Users, Cloud Servers; exchanged items: PK, SK, CT)

The main entities of the system architecture are:

Attribute Authority: The entity which checks the validity of user attributes and sets up the parameters and the secret key. It is responsible for the key updating and revocation process.

Data Owners: The entity that wants to store data safely on cloud storage. This is achieved by outsourcing the data, that is, sending encrypted data to the cloud. The data owner is the one who defines the access policy for the data.

Data Users: The entity which has a set of attributes and a secret key to access the ciphertext.

Cloud Servers: The entity which is in charge of storing encrypted data from data owners.

In the framework, five algorithms are used:

Setup: The attribute authority generates the parameters for the Public Key (PK) and Secret Key (SK).

KeyGeneration: The attribute authority generates the Public Key and Secret Key based on the parameters and the attribute set.

Encryption: The message is encrypted using a Public Key and an access structure.

Decryption: The ciphertext is decrypted by data users using the secret key.

Delegation: The updating and revocation process is handled by taking the private key and regenerating a new key.

3.2.1 CP-ABE schemes

In this section, we discuss recent CP-ABE schemes and the different access policies. A directly revocable attribute-based encryption (DR-CP-ABE) scheme [22] is introduced by constructing a complete subtree for the access structure and is solved by the subset cover method. For data confidentiality and to keep the privacy of the signcryptor, [18] uses a ciphertext-policy attribute-based signcryption (CP-ABSC) with an access policy of monotone span programs. Using this method, they achieved a shorter ciphertext size compared to existing schemes. A new system architecture was introduced by [15] for securely sharing data to a cloud by a user with limited resources. Using a linear secret sharing scheme, it achieves high efficiency, fine-grainedness, and data confidentiality. Managing access policy in a cloud storage system is an important task [23], and a scalable ciphertext-policy attribute-based encryption (SCP-ABE) is introduced for that.

The most common access policy is a monotonic access structure, but here a different access policy called the blocked linear secret sharing scheme (BLSSS) is used. A matrix is used to describe a tree circuit of the access structure; scalability is achieved due to this special scheme. Another recent ABE scheme is the ciphertext-policy attribute-based hierarchical document collection encryption (CP-ABHE) scheme [5], in which access trees are combined according to attribute sets. This scheme uses an access tree that contains only "AND" gates, and so it is called a monotone access structure. Security of outsourced data is the main concern for personal health records in the cloud [7], and a hierarchical CP-ABE scheme with multiple authorities was introduced. As hierarchical files are shared, the author used a layered model of the access structure. Access structures are combined to form a single one, as this paper deals with the sharing of hierarchical files.

An ABE scheme that is applicable to resource-constrained mobile devices [9] is a lightweight attribute-based encryption (LBE) scheme.

this challenge anyone can find out a better scheme that can achieve low processing time and that takes limited storage space, limited resources and limited energy. One major challenge addressed by [28] is protecting users' personal privacy. This is an important issue that has to be addressed at present. Along with this, one should concentrate on developing a scheme that is efficient in communication and computation cost. Some issues addressed by the author are data confidentiality, an efficient decryption test, and efficiencies such as parameter size and the time complexity of the algorithm. Scalable user revocation [24] is an important issue. Here one can develop an ABE scheme that provides both forward security and backward security. The issues discussed above are the major challenges faced by recent ABE schemes.

3.2.3 Security

The different frameworks used to prove system security are discussed in this section. The security of an ABE scheme can be proven using dual system encryption in the standard model [22]. This scheme is indistinguishable under chosen-plaintext attacks. Most of the
This is an important issue that has to ad- blocked linear secret sharing scheme (BLSSS) is used. A dress at present. Along with this one should concentrate matrix is used to describe a tree circuit of access structure on developing a scheme that is efficient in communica- due to this special scheme scalability is achieved. Another tion and computation cost. Some issues addressed by the recent ABE scheme is the ciphertext-policy attribute- author are data confidentiality, efficient decryption test, based hierarchical document collection encryption (CP- efficiencies such as parameter size and time complexity of ABHE) scheme [5] which access trees are combined ac- algorithm. Scalable user revocation [24] is an important cording to attribute sets. This scheme uses an access tree issue. Here one can develop an ABE scheme that provides that contains only “AND” gate and so it is called mono- both forward security and backward security. The above tone access structure. Security of outsourced data is the discussed are the major challenges faced by the recent main concern of personal health records in the cloud [7] ABE schemes. and a hierarchical CP-ABE scheme with multiple author- ities was introduced. As hierarchical files are sharing the 3.2.3 Security author used a layered model of access structure. Access structures are combined to form a single one as this paper The different frameworks used to prove the system se- is dealing with the sharing of hierarchical files. curity is discussed in this section. The security of an ABE scheme can be proven using dual system encryp- An ABE scheme that is applicable for resource- tion in the standard model [22]. This scheme is indis- constrained mobile devices [9] is a lightweight attribute- tinguishable under chosen-plaintext attacks. Most of the based encryption (LBE) scheme. 
This paper introduced ABE schemes prove their security under decisional bilin- a scheme that is applicable to cyber-physical systems and ear Diffie-Hellman exponent (dBDHE) [5, 7, 9, 18, 20, 28]. based on monotone access structure. This means AND For some schemes, security analysis is done with random gate are used as access policy. Another one scheme for oracles [18, 23]. Some paper proves the security of the mobile cloud is decentralized multi-authority attribute- scheme against chosen-ciphertext attacks [15]. Security based encryption [20] scheme which is an appropriate so- can also be proven against a collusion attack [23]. One lution of key escrow problem and is based on a monotone of the main security issue is Key escrow problem. Ex- access policy. For preserving the privacy of the user, a isting multi-authority attribute-based encryption schemes CP-ABE scheme with hidden access policy [28] is intro- however still require a trusted central authority to pub- duced called hidden access policy CP-ABE (HP-CP-ABE) lish system parameters and to generate user secret keys. scheme is introduced. This scheme improves data confi- They give to the trusted central authority enough priv- dentiality and protects the user’s privacy. This is achieved ileges to access the plaintext information meant for the by an access policy of a monotonic access structure with user, a problem referred to as key escrow issue [20]. This the “AND” gate. scheme solves the key escrow problem by removing the central authority, without making use of any global user 3.2.2 Challenges identity. In this section, we are discussing different challenges and proposed solutions. The main problem of the existing 3.3 Comparison ABE scheme is the lack of user revocation mechanism [22] In this section, we compare the various existing CP-ABE and one solution for this is by introducing one more en- schemes and is given in the following Table 1. 
we com- tity in the system architecture called user revocation cen- pare different scheme against the access structure used, tre (URC). URC is maintaining a revocation list and all the number of authorities, revocation mechanism is ad- revocation tasks are outsource to this third-party organi- dressed or not and the security assumption followed by zation. Lack of scheme which achieves security goals such the scheme. as data confidentiality, privacy, fine-grainedness [15, 18] of cloud data sharing still unresolved. Managing access 3.4 Conclusion policy [23] is a critical issue of the ABE scheme. The existing policy managing schemes have high computation In this paper, we have done a comprehensive survey of re- complexity and communication overhead. One solution cent CP-ABE schemes along with the access policy used for this is using a blocked linear secret sharing scheme by these schemes. We also discussed the major chal- (BLSSS) in which a matrix is used to describe tree-circuit. lenges faced by CP-ABE schemes and the existing so- Another one main challenge concerning limited resource lution for that issue. One who wants to work on this devices and mobile cloud [9,20] is to reduce computation can take any mentioned issue and can develop a better and communication overhead. CP-ABE scheme. The security analysis model of differ- Even though many ABE schemes are introduced for ent schemes also discussed. Finally, we compare different International Journal of Network Security, Vol.22, No.5, PP.863-868, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).17) 867

Table 1: Comparison of CP-ABE schemes

Scheme                           | Access structure | Authority | Revocation mechanism | Security assumption
Fu, J., and Wang, N. [5]         | Tree             | Single    | No                   | DBDH
Gadge, Snehlata, et al. [6]      | Tree             | Single    | Yes                  | -
Guo, R., et al. [7]              | Tree             | Multiple  | Yes                  | DBDH
Hao, Liu, et al. [8]             | Tree             | Single    | No                   | DBDH
He, Q., et al. [9]               | LSSS             | Single    | Yes                  | DBDH
Hur, Junbeom, et al. [11]        | Tree             | Single    | Yes                  | -
Li, J., et al. [15]              | LSSS             | Single    | No                   | CCA
Liu, Fan, et al. [17]            | LSSS             | Single    | No                   | DBDHE
Rao, Y. S. [18]                  | Tree             | Single    | Yes                  | DBDH
Arthur Sandor, V. K., et al. [20] | LSSS            | Multiple  | No                   | DBDH
Wang, H., et al. [22]            | LSSS             | Single    | Yes                  | CPA
Wang, J., et al. [23]            | BLSSS            | Single    | Yes                  | DBDH
Wei, J., et al. [24]             | LSSS             | Multiple  | Yes                  | DBDH
Yang, Kan, et al. [26]           | LSSS             | Multiple  | Yes                  | BDHE
Zhang, L., et al. [28]           | Tree             | Single    | No                   | DBDH
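The "Tree" entries in Table 1 refer to access trees, in which leaves are attributes and internal nodes are gates. As a rough illustration only, the following Python sketch checks whether a set of user attributes satisfies such a tree. It models the policy check alone and performs no cryptography; the names `Leaf`, `Gate`, `and_gate`, `or_gate` and `satisfies`, and the example attributes, are hypothetical and do not come from any of the surveyed schemes.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Leaf:
    """A leaf of the access tree: a single attribute name."""
    attribute: str


@dataclass
class Gate:
    """An internal node: satisfied when at least `threshold` children are."""
    threshold: int
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, Gate]


def and_gate(*children: Node) -> Gate:
    # AND is a threshold gate that requires every child.
    return Gate(len(children), list(children))


def or_gate(*children: Node) -> Gate:
    # OR is a threshold gate that requires any one child.
    return Gate(1, list(children))


def satisfies(node: Node, attributes: Set[str]) -> bool:
    """Return True if the attribute set satisfies the (sub)tree at `node`."""
    if isinstance(node, Leaf):
        return node.attribute in attributes
    satisfied = sum(1 for child in node.children if satisfies(child, attributes))
    return satisfied >= node.threshold


# Hypothetical policy: doctor AND (cardiology OR oncology)
policy = and_gate(Leaf("doctor"), or_gate(Leaf("cardiology"), Leaf("oncology")))
print(satisfies(policy, {"doctor", "oncology"}))  # attribute set meets the policy
print(satisfies(policy, {"nurse", "oncology"}))   # "doctor" is missing
```

Because only AND and OR (threshold) gates appear, adding attributes to a satisfying set can never make it fail, which is what makes such a structure monotone.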

schemes against the access structure used, the number of authorities, whether a revocation mechanism is addressed, and the security assumption followed by the scheme.

References

[1] J. Bethencourt, A. Sahai, and B. Waters, "Ciphertext-policy attribute-based encryption," in 2007 IEEE Symposium on Security and Privacy (SP'07), IEEE, pp. 321–334, 2007.
[2] Z. Cao, L. Liu, and Z. Guo, "Ruminations on attribute-based encryption," International Journal of Electronics and Information Engineering, vol. 8, no. 1, pp. 9–19, 2018.
[3] M. Chase, "Multi-authority attribute based encryption," in Theory of Cryptography Conference, Springer, pp. 515–534, 2007.
[4] P. S. Chung, C. W. Liu, and M. S. Hwang, "A study of attribute-based proxy re-encryption scheme in cloud environments," International Journal of Network Security, vol. 16, no. 1, pp. 1–13, 2014.
[5] J. Fu and N. Wang, "A practical attribute-based document collection hierarchical encryption scheme in cloud computing," IEEE Access, vol. 7, pp. 36218–36232, 2019.
[6] M. S. V. Gadge, "Analysis and security based on attribute based encryption for data sharing," International Journal of Emerging Research in Management & Technology, pp. 2278–9359, 2014.
[7] R. Guo, X. Li, D. Zheng, and Y. Zhang, "An attribute-based encryption scheme with multiple authorities on hierarchical personal health record in cloud," The Journal of Supercomputing, pp. 1–20, 2018.
[8] J. Hao, J. Liu, H. Wang, L. Liu, M. Xian, and X. Shen, "Efficient attribute-based access control with authorized search in cloud storage," IEEE Access, 2019.
[9] Q. He, N. Zhang, Y. Wei, and Y. Zhang, "Lightweight attribute based encryption scheme for mobile cloud assisted cyber-physical systems," Computer Networks, vol. 140, pp. 163–173, 2018.
[10] P. Hu and H. Gao, "A key-policy attribute-based encryption scheme for general circuit from bilinear maps," International Journal of Network Security, vol. 19, no. 5, pp. 704–710, 2017.
[11] J. Hur and D. K. Noh, "Attribute-based access control with efficient revocation in data outsourcing systems," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 7, pp. 1214–1221, 2010.
[12] M. S. Hwang, T. H. Sun, C. C. Lee, "Achieving dynamic data guarantee and data confidentiality of public auditing in cloud storage service," Journal of Circuits, Systems, and Computers, vol. 26, no. 5, 2017.
[13] P. S. Kumar, R. R. Kurra, A. N. Tentu, and G. Padmavathi, "Multi-level secret sharing scheme for mobile ad-hoc networks," International Journal of Advanced Networking and Applications, vol. 6, no. 2, pp. 22–53, 2014.
[14] T. M. Laing, K. M. Martin, M. B. Paterson, and D. R. Stinson, "Localised multisecret sharing," Cryptography and Communications, vol. 9, no. 5, pp. 581–597, 2017.
[15] J. Li, Y. Zhang, X. Chen, and Y. Xiang, "Secure attribute-based data sharing for resource-limited users in cloud computing," Computers & Security, vol. 72, pp. 1–12, 2018.
[16] C. W. Liu, W. F. Hsien, C. C. Yang, and M. S. Hwang, "A survey of attribute-based access control with user revocation in cloud data storage," International Journal of Network Security, vol. 18, no. 5, pp. 900–916, 2016.
[17] Z. Liu and Y. Fan, "Provably secure searchable attribute-based authenticated encryption scheme," International Journal of Network Security, vol. 21, no. 2, pp. 177–190, 2019.
[18] Y. S. Rao, "A secure and efficient ciphertext-policy attribute-based signcryption for personal health records sharing in cloud computing," Future Generation Computer Systems, vol. 67, pp. 133–151, 2017.
[19] A. Sahai and B. Waters, "Fuzzy identity-based encryption," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, Springer, pp. 457–473, 2005.
[20] V. K. A. Sandor, Y. Lin, X. Li, F. Lin, and S. Zhang, "Efficient decentralized multi-authority attribute based encryption for mobile cloud data storage," Journal of Network and Computer Applications, vol. 129, pp. 25–36, 2019.
[21] M. H. Tadayon, H. Khanmohammadi, and M. S. Haghighi, "Dynamic and verifiable multi-secret sharing scheme based on hermite interpolation and bilinear maps," IET Information Security, vol. 9, no. 4, pp. 234–239, 2014.
[22] H. Wang, Z. Zheng, L. Wu, and P. Li, "New directly revocable attribute-based encryption scheme and its application in cloud storage environment," Cluster Computing, vol. 20, no. 3, pp. 2385–2392, 2017.
[23] J. Wang, C. Huang, N. N. Xiong, and J. Wang, "Blocked linear secret sharing scheme for scalable attribute based encryption in manageable cloud storage system," Information Sciences, vol. 424, pp. 1–26, 2018.
[24] J. Wei, W. Liu, and X. Hu, "Secure and efficient attribute-based access control for multiauthority cloud storage," International Journal of Network Security, vol. 12, no. 2, pp. 1731–1742, 2016.
[25] C. Yang, Q. Chen, Y. Liu, "Fine-grained outsourced data deletion scheme in cloud computing," International Journal of Electronics and Information Engineering, vol. 11, no. 2, pp. 81–98, 2019.
[26] K. Yang and X. Jia, "Expressive, efficient, and revocable data access control for multi-authority cloud storage," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 7, pp. 1735–1744, 2013.
[27] S. Yu, C. Wang, K. Ren, and W. Lou, "Attribute based data sharing with attribute revocation," in Proceedings of the 5th ACM Symposium on Information, Computer and Communications Security, pp. 261–270, 2010.
[28] L. Zhang, Y. Cui, and Y. Mu, "Improving security and privacy attribute based data sharing in cloud computing," IEEE Systems Journal, 2019.

Biography

Ancy P. R. is currently pursuing her PhD in Computer Science and Engineering from CHRIST (Deemed to be University), Bangalore, India. She received her M.Tech in Computer Science and Engineering from Marian Engineering College, India. She has worked as Assistant Professor for over three years. Her research interests include Network Security, Cryptography and Cloud Computing.

Addepalli V. N. Krishna is working as Professor in the Computer Science and Engineering department at CHRIST (Deemed to be University), Bangalore, India. He received his Ph.D in 2010 at the Department of Computer Science and Engineering, Acharya Nagarjuna University, AP, India, and his M.Tech degree in 2001 from the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi, India. He has 25 years of teaching experience at undergraduate and postgraduate engineering level. His research interests include security algorithm design in wireless sensor networks, advanced key management algorithms for computer security, and protocol design for network security.

K. Balachandran is working as H.O.D and Professor in the Computer Science and Engineering department at CHRIST (Deemed to be University), Bangalore, India. He received his Ph.D in 2015 at the Department of Computer Science and Engineering, Anna University, India. He pursued his BSc and MCA from Bharathidasan University, MSc Physics from Annamalai University, MTech (IT) from Allahabad Agricultural University and MPhil from Alagappa University. He has worked as a scientific officer for two decades with the Department of Atomic Energy, India. Many of his research papers have won the best paper award at national and international conferences. His areas of research include Data Science, Networks, Big Data Analytics and Computer Architecture.

Balamurugan M. is working as Associate Professor in the Computer Science and Engineering department at CHRIST (Deemed to be University), Bangalore, India. He received his Ph.D in 2013 at the Department of Computer Science and Engineering, Anna University, India. He has 13 years of teaching experience at undergraduate and postgraduate engineering level. His areas of research include Machine Learning, Big Data Analytics and Data Mining.

O. S. Gnana Prakasi is working as Assistant Professor in the Computer Science and Engineering department at CHRIST (Deemed to be University), Bangalore, India. She received her Ph.D. in 2017 from Anna University, India. She received her B.Tech. degree in Information Technology and M.E. degree in Software Engineering from Anna University. Her research interests include Mobile Ad hoc Networks, IoT, Software Engineering and Machine Learning.

International Journal of Network Security, Vol.22, No.5, PP.869-873, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).18) 869

A Sequential Cipher Algorithm Based on Feedback Discrete Hopfield Neural Network and Logistic Chaotic Sequence

Shoulin Yin, Jie Liu, and Lin Teng (Corresponding author: Jie Liu)

Software College, Shenyang Normal University Shenyang, Liaoning 110034, China (Received Dec. 22, 2018; Revised and Accepted June 24, 2019; First Online Feb. 26, 2020)

Abstract

The traditional Logistic chaotic sequence is easy to reconstruct, and when it is used to encrypt data it easily leaks sensitive information. Therefore, an external key is introduced to encrypt the initial value and parameters of the Logistic equation. We then apply a sensitive and diffusion process to the feedback discrete Hopfield neural network, based on the sensitivity of the Logistic sequence to the initial value. By updating the control parameters, the network produces a good key sequence through random chaotic iterative operation. The algorithm analysis and simulation experiments show that the key of the algorithm has good sensitivity and the generated random sequence has good randomness, which satisfies the requirements of cryptography. The pseudo-random sequences constructed by the algorithm are characterized by good randomness and complexity.

Keywords: Feedback Discrete Hopfield Neural Network; Logistic Chaotic Sequence; Sensitive and Diffusion Process

1 Introduction

With the rapid development of computer technology and multimedia technology, multimedia communication has gradually become an important way for people to communicate with each other [2, 3, 9]. Information security has been an important issue that is closely related to our lives [7, 8, 17]. The most important way to protect information security is data encryption. The discrete Hopfield neural network was proposed by Liu [10], so the combination of chaos and neural networks for data encryption has been continuously developed. At present, the main research methods are as follows:

1) Using a known chaotic sequence to train the neural network model, so that the neural network can approximate the same chaotic sequence; the known chaotic sequence is then used as the public key, and the trained neural network weight and threshold parameters are used as the private key to implement data encryption;

2) Using the chaotic attraction of the discrete Hopfield network and the unidirectional mapping of the initial state and the attractor, the stable attractor of the discrete Hopfield network can be encrypted as a key.

The neural network mutual learning model and the chaotic system were mutually interfered, and a new composite stream cipher was proposed in reference [15]. A novel chaos-based hybrid encryption algorithm design for secure and effective image encryption was presented. To design the algorithm, the Zhongtang chaotic system had been selected because of its rich dynamic features, and its dynamical analysis was performed. On the basis of this system, a new chaos-based random number generator (RNG) was developed, and the usefulness of the designed RNG in an encryption process was shown over NIST 800-22 randomness tests in reference [21]. In reference [11], the Hermite orthogonal polynomial was introduced into the neural network excitation layer, and the "one-time-one-density" asynchronous encryption algorithm was realized. Pareek [12] proposed a chaotic encryption scheme by combining the chaotic attraction of the Hopfield network with the linear feedback shift register. The most important point in applying chaos and neural networks to data encryption is the use of the chaotic characteristics of chaotic sequences. However, Zhang [19] also pointed out that there are defects in the sequence reconstruction of chaotic sequences in data encryption. Moreover, reference [4] completely reconstructed the four-point and sixteen-point sequence fragments of the Logistic equation. Therefore, the simple chaotic sequence encryption method is at risk of being cracked. Aiming at this problem, this paper introduces an external key to encrypt the Logistic equation, and proposes a piecewise discrete Hopfield neural network model. Through a sensitive and diffusion process, the chaotic network model also has good sensitivity and generates a good random sequence.

The rest of the paper is organized as follows. Section 2 introduces the Logistic mapping. The discrete Hopfield neural network model is illustrated in Section 3. Section 4 and Section 5 outline the quantitative processing and the new algorithm analysis. Section 6 finally concludes this paper.

2 Logistic Mapping

Logistic mapping [5, 16, 18, 20] is a classic chaotic sequence mapping. Because of its simple implementation, it is easy for an attacker to reconstruct the form of the chaotic equation using phase space. Therefore, in order to enhance the confidentiality of the Logistic mapping, this paper introduces an external key to encrypt the initial values and parameters of the Logistic map.

The 24-bit binary number K1 is used as an external key to initialize the initial values and parameters of the Logistic map. Let K1 be expressed in hexadecimal as K1 = k1k2k3k4k5k6, so that "k1k2k3k4k5k6" is the external key. Here "k1k2k3" is used to generate the input control parameter r of the Logistic mapping, and "k4k5k6" is used to generate the initial value X0 of the Logistic mapping.

In this article, the Logistic mapping is as follows:

$$X_{n+1} = r \times X_n (1 - X_n), \quad X_n \in (0, 1).$$

Where r is the input control parameter and X0 is the initial value of the Logistic mapping. They are generated by an operation on the external key. During the operation, "k1k2k3k4k5k6" is converted into the corresponding decimal numbers. Therefore, we can get

$$r = 4 + \frac{k_1/16^2 + k_2/16 + k_3}{16^2},$$

$$X_0 = \frac{k_4/16^2 + k_5/16 + k_6}{16^3}. \qquad (1)$$

3 Discrete Hopfield Neural Network Model

Assuming that each neuron state is 0 or 1, the next state $S_i(t+1)$ depends on the current state. So

$$S_i(t+1) = \sigma\Big(\sum_{j=0}^{N-1} T_{i,j} S_j(t) + \theta_i\Big), \quad i = 0, 1, \cdots, N-1,$$

where the threshold of neuron i is $\theta_i$, and the connection weight between neuron j and i is $T_{i,j}$. $\sigma(x)$ can be any non-linear function; here it is set as the unit step function. Then the energy function of the system at time t is:

$$E(t) = -\frac{1}{2} \sum_{i,j} T_{i,j} S_i(t) S_j(t).$$

Hopfield has proved that this energy function decreases monotonically with the evolution of the system state, and eventually it will reach a stable state, namely the chaos attractor. There is an unpredictable relationship between the state messages contained in its attraction domain. If the connection weight matrix T is changed, the attractor and its corresponding attraction domain will change too. After the introduction of a random transformation matrix H, the initial state $S$ and the attractor $S^{\mu}$ are updated by $\hat{S} = SH$ and $\hat{S}^{\mu} = S^{\mu}H$ to give a new initial state $\hat{S}$ and attractor $\hat{S}^{\mu}$, and this process is unilateral and irreversible [14].

4 Quantitative Processing

In order to apply the random sequence generated by the discrete Hopfield neural network to data encryption, this paper introduces the conversion function T(x) to convert the generated random sequence into a 0-1 random sequence. T(x) is defined as follows:

$$T(x) = 0, \quad x \in [2n/N, (2n+1)/N].$$

Where N is an integer in the interval $[10, +\infty)$, n is an integer in the interval $[0, \tilde{N}]$, and $\tilde{N} = \lfloor N/2 \rfloor$. Since the random value generated by the network is in the interval (0, 1), this paper divides the interval (0, 1) into N equal parts. The larger N is, the finer the interval division and the higher the data precision.

4.1 Generating a 0-1 Random Sequence

1) Input the key K1 to initialize the Logistic mapping and iteratively calculate the Logistic mapping 200 times. Let X = [x101, x102, x103, ···, x199, x200] be used to store the last 100 chaotic values.

2) Use X to initialize the diffusion matrix W and the weights and thresholds of the discrete Hopfield neural network. W is a 4 × 4 matrix, W1 is a 2 × 4 matrix, B1 is a 2 × 1 matrix, W2 is a 1 × 2 matrix, and B2 is a 1 × 1 matrix. Extract elements from X to initialize W, W1, B1, W2, and B2, respectively.

3) Input the key K2 and obtain $\tilde{D}_1$ through initial grouping and transformation.

4) Input $\tilde{D}_1$ to the discrete Hopfield neural network, and a random value D3 is obtained through network operation; then the values of the control parameters Q1 and Q2 are continuously updated until a random sequence of the desired length is obtained.

5) The generated random sequence is converted to a corresponding binary random sequence by the quantization function T(x).

5 New Algorithm Analysis

This paper analyzes the key space size of the new algorithm, the number of 0/1 statistics, the correlation of random sequences and the sensitivity of the key through theoretical analysis and experimental simulation, and draws

Table 1: Statistical analysis results

Testing parameter | Sequence 1 | Sequence 2
n0                | 128        | 130
n1                | 129        | 127
r                 | 129        | 131
y1                | 5.37e-4    | 5.98e-5
y2                | 1.3214     | -6.7892
y3                | -0.1179    | 0.1225

Table 2: Statistical analysis result of LET

Testing parameter | Sequence 1 | Sequence 2
n0                | 127        | 126
n1                | 129        | 132
r                 | 130        | 128
y1                | 6.18e-5    | 2.51e-4
y2                | -0.972     | -0.864
y3                | 0.259      | -0.397
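The quantities n0, n1, r, y1, y2 and y3 reported in these tables can be illustrated with a small Python sketch. The paper does not state its test formulas, so the forms below are assumptions: the frequency and sequence (serial) statistics are written in their common chi-square form, whose 5% critical values are 3.84 (one degree of freedom) and 5.99 (two degrees of freedom), and the run statistic is written as a Wald-Wolfowitz z-score, whose two-sided 5% bound is 1.96. The function names are hypothetical.

```python
import math
from typing import List


def frequency_stat(bits: List[int]) -> float:
    """Frequency statistic (n0 - n1)^2 / n; compare with 3.84."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    return (n0 - n1) ** 2 / n


def serial_stat(bits: List[int]) -> float:
    """Serial statistic over overlapping bit pairs; compare with 5.99."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    pairs = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(bits, bits[1:]):
        pairs[(a, b)] += 1
    pair_squares = sum(c * c for c in pairs.values())
    return 4.0 / (n - 1) * pair_squares - 2.0 / n * (n0 * n0 + n1 * n1) + 1.0


def count_runs(bits: List[int]) -> int:
    """Total number of runs (maximal blocks of equal bits)."""
    return 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def runs_z(bits: List[int]) -> float:
    """Wald-Wolfowitz z-score of the run count; compare |z| with 1.96."""
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    mean = 2.0 * n0 * n1 / n + 1.0
    var = 2.0 * n0 * n1 * (2.0 * n0 * n1 - n) / (n * n * (n - 1.0))
    return (count_runs(bits) - mean) / math.sqrt(var)


alternating = [0, 1] * 128     # balanced, but far too many runs
blocks = [0, 0, 1, 1] * 64     # periodic, yet within all three bounds
for name, seq in (("alternating", alternating), ("blocks", blocks)):
    print(name, frequency_stat(seq), serial_stat(seq), runs_z(seq))
```

The second example sequence is plainly periodic but still stays within all three bounds, which is a reminder that these tests are necessary rather than sufficient evidence of randomness.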

corresponding conclusions. The simulation experiment data is as follows: n0 = 120, n1 = 8, n2 = 10, α = 2, N = 128, $Q_1 = [0.5, 0.5]^T$, Q2 = 0.5.

5.1 Key Space Analysis

In this paper, the encryption key consists of K1 and K2, and the key space is determined by the lengths of K1 and K2. Let their lengths be L1 and L2, respectively. The key space is $2^{(L_1+L_2)}$. The larger L1 + L2 is, the larger the key space is. In this paper, L1 = 24 and L2 = 16, so the key space size is $2^{40}$. The length L2 is variable, and increasing L2 increases the size of the key space. However, when L2 is increased, the number of inputs of the discrete Hopfield neural network also increases. The number of corresponding discrete Hopfield neural network layers increases too, and so does the time overhead of the network iterative operation. When the generated random sequence is very large, the time overhead of running the entire network will be very large too.

5.2 Statistical Analysis

A valid binary random sequence must satisfy the 0/1 ratio. Therefore, the purpose of this test is to determine whether the ratio of 0/1 in the sequence is approximately equal to the ratio of 0/1 in a true random sequence. At the same time, this paper refers to the method of [13] for the frequency test, sequence test and run test of the generated random sequence. Two random sequences of length 256 are randomly selected for statistical analysis. The analysis results are shown in Table 1.

In the tables, n0 denotes the number of 0s, n1 is the number of 1s, r is the total number of runs, y1 is the frequency test value, y2 is the sequence test value, and y3 is the run test value.

It can be seen from Table 1 that the numbers of "0" and "1" in sequence 1 and sequence 2 are nearly equal, satisfying the requirements of a random sequence. At the same time, the frequency test value y1 is less than 3.84, which passes the frequency test. The sequence test value y2 is less than 5.99, which passes the sequence test. The run test value y3 is much less than 1.96, which passes the run test. Therefore, the generated random sequence has good randomness.

In order to further verify the randomness of the generated sequences, the above two randomly selected random sequences are compared with the statistical analysis results in the references LET [6] and RBC [1]. The statistical analysis results of LET and RBC are shown in Tables 2 and 3.

Table 3: Statistical analysis result of RBC

Testing parameter | Sequence 1 | Sequence 2
n0                | 135        | 140
n1                | 121        | 116
r                 | 140        | 135
y1                | 5.34e-3    | 1.23e-2
y2                | 1.467      | 2.132
y3                | 1.792      | 1.235

From Table 2, it can be seen that the difference between "0" and "1" in the random sequence generated by the new algorithm is smaller than the difference between "0" and "1" in the literature [13]. Therefore, the 0/1 sequence generated in this paper is more random. At the same time, the frequency test value y1, the sequence test value y2 and the run test value y3 are all smaller than the corresponding values in [6]. Therefore, the 0/1 sequence can better pass the frequency test, sequence test and run test. Compared with the literature [6], the difference between "0" and "1" in the random sequence in this paper is close to the difference between "0" and "1" in the literature [6]. The frequency test value y1 and the run test value y3 in this paper are smaller than the corresponding values in the literature [6] in some cases, which indicates that the 0/1 sequence can better pass the frequency test and the run test in some cases. However, the sequence test value y2 in this paper is larger than the corresponding value in [6], which indicates that the 0/1 sequence generated in [6] can pass the sequence test better. In general, the proposed algorithm outperforms the literature [1] in the frequency test, sequence test and run test, and in some cases is superior to the literature [6].

5.3 Correlation Analysis

The smaller the change in the autocorrelation function of a sequence, the better its randomness. The closer the cross-correlation function of two sequences is to zero, the more unrelated the two sequences are. Figure 1 is the autocorrelation function of the 0/1 sequence generated when the initial keys are K1 and K2. Figure 2 is a comparison of the 0/1 sequences generated when the key K1 or K2 is randomly changed by one bit. The solid line represents the original sequence and the dotted line represents the new sequence. Figure 3 is a cross-correlation function diagram of the two sequences.

5.4 Key Sensitivity Analysis

The chaotic sequence generated by the Logistic map has good sensitivity to the initial value. Since the key K1 is used to generate the initial value of the Logistic equation, K1 also has good sensitivity. At the same time, this paper uses the Logistic map to apply the sensitive and diffusion process to the key K2, which makes K2 also have good sensitivity. In order to verify the sensitivities of the keys K1 and K2, one bit of the keys is changed, and the percentage of the difference between the new random sequence and the original random sequence is counted. Let i denote the bit position within K1 and K2, so that 1 ≤ i ≤ ϑ, where ϑ is the sum of the lengths of the keys K1 and K2. In this paper, ϑ = 40. NP represents the percentage of bits that differ between the new random sequence and the original random sequence, relative to the total number of bits. The formula of NP is:

$$NP(i) = \frac{\sum_{n=1}^{NK} A(n)}{NK} \times 100\%,$$

$$A(n) = \begin{cases} 0, & D(n) = D'(n) \\ 1, & D(n) \neq D'(n). \end{cases}$$

Where D(n) and D'(n) represent the original random sequence and the new random sequence, respectively. NK represents the total number of bits of the sequence D(n). NP(i) represents the percentage corresponding to the change of the i-th bit.

According to the strict avalanche criterion used to measure block ciphers, changing any bit in the key should result in a change of approximately 50% of the bits in the ciphertext. Figure 4 shows the percentage statistics obtained for the key length ϑ = 40 and the sequence length NK = 20000.

It can be seen that the percentage of the sequence that changes when the key changes by one bit is close to 50%, which satisfies the strict avalanche criterion. Therefore, both keys K1 and K2 are sensitive.

In summary, the 0/1 sequence generated by the algorithm has good randomness and the key has strong sensitivity. In addition, this paper uses a 24-bit external key to generate the chaotic initial values and control parameters. Compared with the literature [16], which directly uses chaotic initial values and control parameters as keys, the key of this paper is more convenient to manage. At the same time, it solves the problem that the Logistic sequence proposed in [15] is easy to reconstruct, which makes the security of the key better.

5.5 Anti-Matrix Analysis and Difference Analysis

For a public key encryption system, analyses based on orthogonal decomposition, singular value decomposition and triangular decomposition indicate that the security of the HNN network is reliable. Because the entire cryptosystem is irregular in the process of encryption, even if the same plaintext sequence is encrypted, the obtained ciphertext sequences are not the same. Moreover, because the decryption process uses the attracting method, differential cryptanalysis of the algorithm is invalid.

6 Conclusion

In this paper, we propose a sequential cipher algorithm based on a feedback discrete Hopfield neural network and a logistic chaotic sequence. The key is processed with a sensitive and diffusion process to enhance its sensitivity. The experiments indicate that the new algorithm has reliable security and higher efficiency than other methods.

Acknowledgments

This work is supported by Science & Technology Planning Project of Gansu Province under Grant No. 1610RJZE135.

References

[1] S. A. K. Albermany, F. R. Hamade, G. A. Safdar, "New random block cipher algorithm," in International Conference on Current Research in Computer Science & Information Technology, 2017. DOI: 10.1109/CRCSIT.2017.7965555.
[2] J. Gao, P. Li, Z. Chen, "A canonical polyadic deep convolutional computation model for big data feature learning in Internet of Things," Future Generation Computer Systems, vol. 99, pp. 508-516, Oct. 2019.
[3] J. Gao, J. Li and Y. Li, "Approximate event detection over multi-modal sensing data," Journal of Combinatorial Optimization, vol. 32, pp. 1002-1016, 2016.
[4] J. S. Gao, B. Y. Sun, W. Han, "Construction of the control orbit function based on the chaos theory," Electric Machines & Control, 2002-02, 2002.
[5] M. S. Hwang, S. F. Tzeng, C. S. Tsai, "Generalization of proxy signature based on elliptic curves," Computer Standards & Interfaces, vol. 26, no. 2, pp. 73-84, Mar. 2004.
[6] R. Karmakar, S. Chatopadhyay, R. Kapur, "Encrypt flip-flop: A novel logic encryption technique for sequential circuits," Computer Science, 2018. (https://arxiv.org/pdf/1801.04961.pdf)
[7] S. Khatoon, T. Thakur, B. Singh, "A provable secure and escrow-able authenticated group key agreement

protocol without NAXOS trick,” International Jour- nal Processing, vol. 7, no. 6, pp. 1215-1221, Nov. nal of Computer Applications, vol. 171, no. 3, pp. 1-8, 2016. 2017. [18] S. Yin, L. Teng, J. Liu, ”Distributed searchable [8] M. Liu and F. Long, ”Stream cipher algorithm based asymmetric encryption,” Indonesian Journal of Elec- on piecewise linear chaotic networks,” Computer Ap- trical Engineering and Computer Science, vol. 4, plications and Software, vol. 33, no. 9, pp. 306-309, no. 3, pp. 684-694, 2016. 2016. [19] X. Zhang, F. Han, Y. Niu, ”Chaotic image encryp- [9] J. Liu, S. L. Yin, H. Li and L. Teng, ”A density-based tion algorithm based on bit permutation and dy- clustering method for k-anonymity privacy protec- namic DNA encoding,” Computational Intelligence tion,” Journal of Information Hiding and Multime- and Neuroscience, vol. 2017, pp. 11, 2017. (https: dia Signal Processing, vol. 8, no. 1, pp. 12-18, Jan. //doi.org/10.1155/2017/6919675) 2017. [20] Q. Zhang, L. T. Yang, X. Liu, Z. Chen, and P. Li, [10] Z. Liu, L. Zhang, L. V. Xue, et al., ”Evaluation ”A tucker deep computation model for mobile multi- method about bus scheduling based on discrete hop- media feature learning,” ACM Transactions on Mul- field neural network,” Journal of Transportation Sys- timedia Computing, Communications and Applica- tems Engineering & Information Technology, vol. 11, tions, vol. 13, no. 3, pp. 1-39:18, 2017. no. 2, pp. 77-83, 2011. [21] U. Cavu?o?lu, S. Kacar, A. Zengin, I. Pehlivan, ”A [11] Q. Meng, S. Yu, H. Liu, et al., ”A novel blind novel hybrid encryption algorithm based on chaos detection algorithm based on improved compound and S-AES algorithm,” Nonlinear Dynamics, vol. 92, sine chaotic neural networks,” in IEEE International no. 4, pp. 1745-1759, 2018. Conference on Communication Technology, 2016. DOI: 10.1109/ICCT.2015.7399973. [12] N. K. 
Pareek, ”Design and analysis of a novel digital Biography image encryption scheme,” International Journal of Network Security & Its Applications, vol. 4, no. 2, Shoulin Yin received the B.Eng. and M.Eng. degree 2012. from Shenyang Normal University, Shenyang, Liaoning [13] L. Teng, H. Li, J. Liu and S. Yin, ”An efficient and province, China in 2016 and 2013 respectively. Now, he secure cipher-text retrieval scheme based on mixed is a doctor in Harbin Institute of Technology. His re- homomorphic encryption and multi-attribute sorting search interests include Multimedia Security, Network Se- method,” International Journal of Network Security, curity, Filter Algorithm, image processing and Data Min- vol. 20, no. 5, pp. 872-878, 2018. ing. Email:[email protected]. [14] L. Teng, H. Li, S. Yin, ”A multi-keyword search algo- Jie Liu is a full professor in Software College, Shenyang rithm based on polynomial function and safety inner- Normal University. He received his B.S. and M.S. de- product method in secure cloud environment,” Inter- grees from Harbin Institute of Technology. He is also a national Journal of Network Security, vol. 8, no. 2, master’s supervisor. He has research interests in wire- 2017. less networks, mobile computing, cloud computing, social [15] C. Tie-Ming, J. Rong-Rong, ”New hybrid stream networks, network security and quantum cryptography. cipher based on chaos and neural networks,” Acta Professor Liu had published more than 30 international Physica Sinica, vol. 62, no. 4, pp. 191-201, 2013. journal papers (SCI or EI journals) on the above research [16] S. F. Tzeng, M. S. Hwang, ”Digital signature with fields. Email:[email protected]. message recovery and its variants based on elliptic curve discrete logarithm problem,” Computer Stan- Lin Teng received the B.Eng. degree from Shenyang dards & Interfaces, vol. 26, no. 2, pp. 61-71, Mar. Normal University, Shenyang , Liaoning province, China 2004. in 2016. 
Now, she is a laboratory assistant in Software [17] S. L. Yin and J. Liu, ”A k-means approach for map- College, Shenyang Normal University. Her research inter- reduce model and social network privacy protection,” ests include Multimedia Security, Network Security, Filter Journal of Information Hiding and Multimedia Sig- Algorithm and Data Mining. Email:[email protected]. International Journal of Network Security, Vol.22, No.5, PP.874-884, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).19) 874

A Novel Blockchain-based Anonymous Handover Authentication Scheme in Mobile Networks

ChenCheng Hu, Dong Zheng, Rui Guo, AXin Wu, Liang Wang, and ShiYao Gao (Corresponding author: ChenCheng Hu)

School of Cyberspace Security, Xi’an University of Posts and Telecommunications
National Engineering Laboratory for Wireless Security, Xi’an University of Posts and Telecommunications
Xi’an 710121, China
(Email: [email protected])
(Received Apr. 23, 2019; Revised and Accepted Nov. 16, 2019; First Online Jan. 29, 2020)

Abstract

In the wireless mobile network (WMN), the handover authentication scheme is the key to ensuring fast and secure handoff of mobile nodes (MNs) among several access points. However, the inherent drawbacks of WMN make it difficult to design an appropriate handover authentication protocol. For example, the resources of the MN are limited and mobile nodes have low computing capability. Therefore, traditional authentication schemes are unsuited to WMN because of their low efficiency. To construct a fast handover authentication protocol, in this paper we design an anonymous handover authentication protocol with high efficiency by considering distributed storage, collective maintenance and tamper-resistance. Then, the security and efficiency of the proposal are analyzed, and it is concluded that our scheme, which uses chameleon hash with blockchain, achieves robust security and high efficiency. Meanwhile, our scheme also satisfies the properties of user anonymity, conditional privacy protection and robust key agreement.

Keywords: Anonymity; Blockchain; Chameleon Hash; Handover Authentication; Wireless Mobile Network

1 Introduction

With the rapid development of the wireless mobile network (WMN), various mobile Internet applications have been adopted in different fields of life. Due to the portability and mobility of mobile devices, the demand for multimedia services by mobile nodes has exploded. Therefore, providing secure and fast real-time services for mobile users will become an inevitable trend in the future. When a service provider offers these real-time services to mobile terminals in the wireless network, the mobile users often need a handoff between different access points (or base stations) because of the limited signal range. In detail, a new connection is established between the mobile terminal and the new access point depending on the handover authentication.

The handover authentication technology realizes the interconnection, intercommunication and mutual confidence between the mobile node and access points, which provides a guarantee for secure communication in the mobile internet. As shown in Figure 1, a typical handover authentication scenario consists of three entities: the mobile nodes (MNs), access points (APs) and authentication server (AS). For a secure handover authentication, when the MN moves from the current node (AP1) to the new node (AP2), the AP2 needs to authenticate the MN to keep out illegal users, and the MN also needs to authenticate the AP2 to prevent an attacker from impersonating the AP2. In addition, the MN should also establish a session key with the AP2 to protect the security of the user’s data. Based on this framework, handover authentication is employed in mobile communication and real-time services such as 5G, Voice over Internet Phone (VoIP), Video-Phone, mobile TV, Video Conference, and online games. However, the time delay in these services seriously affects user experience. Thus, to improve service quality it is crucial to reduce the time delay and the energy consumption of the handover authentication process.

To design an efficient and secure handover authentication protocol, two issues should be considered. First, because of the limited computing power of the MN, the protocol should be lightweight in computation and communication costs. Second, because of the openness of the wireless network, the protocol should have robust security to protect the privacy of the MN and defend the system against various attacks. In order to meet the requirements above, many handover authentication protocols have been put forward in the last several years.

1.1 Related Works

IEEE 802.11i [3] proposed a four-way handshake to create a Pairwise Transient Key (PTK) and then distribute a Group Transient Key (GTK) for broadcast

communication. However, the time cost of full handover authentication is unacceptable for today's real-time traffic. In order to improve the performance of handover authentication, the authentication, authorization, and accounting (AAA) based schemes [21,22] were proposed. These schemes use AAA servers to ensure secure handovers, and they adopt pre-authentication and proactive key distribution methods to increase the efficiency of authentication. However, they need to establish trust relationships and they generate a large amount of authentication traffic between network nodes, which increases the complexity of the entire system. Different from the AAA schemes, the Security Context Transfer (SCT) schemes [4, 27, 30] were proposed as an alternative that does not communicate with the AAA server. The SCT schemes need not establish communication between AAA and AP. Nevertheless, these solutions are based on the assumption that the APs are mutually trustworthy. In actual application scenarios, the APs cannot be trusted totally, which brings some security risks to these schemes.

Figure 1: Handover authentication overview

In order to solve the above problems in handover authentication, some works [2, 12–15, 18, 19, 28, 29] were proposed. In [15], an identity-based handover authentication scheme was proposed, which can be implemented only by ID between the MN and AP. This scheme has better efficiency than the AAA scheme because there is no need to make communication between the MN and AP, and it reduces the overall system complexity compared to the AAA-based and SCT-based schemes. Unfortunately, since there is a PKG to issue a private key, this solution has the problem of key escrow.

The study in [2] used low-cost functions to achieve security and efficiency; it also used nonces instead of timestamps to avoid the clock synchronization problem. However, Youn et al. [29] identified that the scheme of [2] cannot achieve anonymity under four attack strategies, and it is not efficient in password authentication. Liao and Wang [19] presented a dynamic ID-based remote user authentication scheme for multi-server environments; the scheme of [19] uses a simple hash function to enhance efficiency and it can preserve the user’s anonymity. Later on, Hsiang and Shih [14] showed that the scheme of [19] is vulnerable to insider attack. He et al. [12] proposed a strong user authentication scheme with smart cards for wireless communications. The scheme of [12] is suitable for low-power and resource-limited mobile devices since it only performs a symmetric encryption/decryption operation. However, [18] showed that He et al.’s scheme is unfair in key agreement. Then, He et al. [13] summarized the basic security requirements of handover authentication protocols and proposed a novel batch verification AHA protocol. However, their implementation calls for complex and time-consuming operations, such as bilinear pairing operations and point multiplication. After that, Xie et al. [28] proposed an improved AHA protocol using ECC. Unfortunately, this scheme does not support batch verification and is not suitable for practical applications.

Ramadan et al. [23] proposed a user-to-user mutual authentication and key agreement scheme, which is more compatible with the LTE security architecture. In recent years, the idea of proxy signature has been utilized to design handoff authentication schemes [7–9, 20]. The essential idea of these schemes is that the authentication server delegates its power to the MN, which grants the MN the ability to generate a proxy signature on behalf of the authentication server. Then, the new AP trusts the MN because of the proxy signature made on behalf of the authentication server. However, these schemes are vulnerable to various security issues.

Different from the above schemes, the works in [5, 10, 11] proposed handover authentication schemes based on the chameleon hash function. These schemes use the collision of the chameleon hash function for authentication to avoid certificate management problems. These protocols are lightweight authentication schemes with high efficiency, but they still have some shortcomings, such as key escrow, weak privacy preservation, vulnerability to redirection attack, and high communication and computation overhead.

In handover authentication, if the legal identity of the MN can be securely broadcast to all APs, the efficiency of the authentication process can be greatly improved. In order to apply this property in handover authentication, we use blockchain technology. Nowadays, blockchain technology has been applied in many fields [6, 17, 24, 25], including identity authentication. In [6], motivated by the shortcomings of traditional authentication, which relies on third-party centers and is vulnerable to man-in-the-middle attacks, a blockchain PKI scheme based on privacy protection was proposed. The scheme of [17] describes the concept of blockchain PKI, shows that it has significant advantages over traditional PKI, and implements PKI authentication based on Ethereum. However, none of these schemes solves the handover authentication problem.

1.2 Our Contribution

To address the above problems in current handover authentication, we propose a secure anonymous handover authentication scheme based on the chameleon hash function and blockchain technology, and design a blockchain certificate model. We summarize our main research contributions as follows:

1) In order to improve the efficiency of the authentication process, our scheme uses the distributed and difficult-to-tamper features of the blockchain, and it does not require an extra interaction between the AP node and the registration node.

2) We use a blockchain certificate and the chameleon hash function to solve the problem of certificate management.

3) To achieve robust security, we use pseudonyms to provide user anonymity, conditional privacy protection, and updatable key agreement.

4) Finally, we analyze the performance of our scheme and compare it with some existing schemes. The analysis results show that our scheme is more efficient than them.

1.3 Organization

The rest of the paper is organized as follows: Section 2 introduces some preliminaries, such as blockchain techniques, chameleon hash functions and the requirements for an ideal handover authentication. The anonymous handover authentication scheme based on blockchain is presented in Section 3. The security and performance analysis of the related schemes is discussed in Section 4. Finally, Section 5 concludes the paper.

2 Preliminaries

2.1 Blockchain

Blockchain is a new application mode of computer technologies such as distributed databases, point-to-point transmission, consensus protocols, and encryption algorithms [24]. It records all transaction information occurring on the nodes. The process is highly transparent and the data is highly secure. The data structure of the blockchain can be described at three levels: chain, block and transaction. All transactions in the same time period form a block, and the blocks are linked in chronological order to form a blockchain. When several transactions are packaged into a block, the data in all nodes can be updated. Each block is composed of a block header and a block body.

Each block header contains the hash value of the previous block, the timestamp, and the total hash value of the transaction data (Merkle root). In this way, the chain structure is formed by the interlocking of the hash values of each block. Because of these properties, blockchain has some important characteristics such as tamper resistance, data synchronization and traceability.

2.2 Chameleon Hash Function

The chameleon hash function was first proposed by Krawczyk and Rabin as a one-way hash function with trapdoors [16]. A chameleon hash function is associated with a pair of public and private keys; the private key is also known as the trapdoor. For participants who do not hold the trapdoor information, it is simply a one-way function that is strongly collision-resistant. But a user who holds the trapdoor information can easily compute collisions of the chameleon hash function.

Definition 1. (A chameleon hash function based on single trapdoor information) [1]

Generation of public and private key pairs: Let $p$ be a safe prime of bit length $\tau$ satisfying $p = 2q + 1$, where $q$ is a sufficiently large prime. Let $Z_p^*$ be the multiplicative group modulo $p$ and let $g \in Z_p^*$ be a generator of its subgroup of order $q$. The user chooses a random number $x \in Z_q^*$ as the private key CKR, and the corresponding public key HKR is computed as $y = g^x \bmod p$. Assume that the length of $q$ is $\lambda$. Let $H$ be a collision-resistant hash function mapping arbitrary-length bit strings to strings of fixed length $\lambda$, $H: \{0,1\}^* \rightarrow \{0,1\}^{\lambda}$.

Construction of chameleon hash function: To commit to a message $m \in Z_q^*$, choose random values $(r, s)$ from $Z_q^* \times Z_q^*$ and define the chameleon hash function as

$$CHAM\text{-}HASH(m, r, s) = r - (y^{e} g^{s} \bmod p) \bmod q,$$

where $e = H(m \| r)$.

Collision finding: Let $C = CHAM\text{-}HASH(m, r, s)$ be the output of the chameleon hash function. The user chooses a new random message $m'$ and a random value $k \in Z_q^*$, then computes $r' = C + (g^{k} \bmod p) \bmod q$, $e' = H(m' \| r')$ and $s' = k - e'x \bmod q$. Since $y^{e'} g^{s'} = g^{xe' + k - e'x} = g^{k} \pmod p$, we get the equation:

$$C = CHAM\text{-}HASH(m, r, s) = CHAM\text{-}HASH(m', r', s').$$

Computational Diffie-Hellman (CDH) problem: Given $x \cdot P$ and $y \cdot P$ (that is, $g^{x}$ and $g^{y}$), the task of the CDH problem is to compute $x \cdot y \cdot P$ (that is, $g^{x \cdot y}$), where $x, y \in Z_q^*$ are two unknown numbers.

2.3 Requirements of Handover Authentication

In wireless networks, an ideal handover authentication scheme should satisfy the following requirements:

1) Mutual authentication: The AP should authenticate the identity of the MN to determine that the MN is a legitimate user. At the same time, although the AP is trusted, the MN should also authenticate the AP, in order to prevent an attacker from impersonating the AP.

2) Conditional privacy protection of user: The identity of the user should not be made public. Apart from the initial registration node, no entity should know the true identity of the user, not even the AP. However, in some special cases, the AP can send a request to the registration node to obtain the true identity of the MN.

3) Key agreement: After mutual authentication is completed, a session key should be established between the MN and the AP to ensure the security of subsequent communication.

4) Robust security property: The handover authentication protocol should provide robust security attributes to defend against various attacks on the wireless network (such as eavesdropping, replay attacks, man-in-the-middle attacks, etc.).

5) Perfect forward secrecy: To protect the security of the session key, a handover authentication protocol should provide perfect forward secrecy, i.e., the adversary cannot extract the session key produced in a previous session even if he/she obtains the private keys of both the MN and the AP.

6) Efficiency: Since the computing power and storage capacity of mobile nodes in mobile networks are limited, the energy consumption of the authentication process should be as small as possible and the delay should be as low as possible.

The scheme proposed in this paper satisfies all of the above requirements for handover authentication.

3 Anonymous Handover Authentication Based on Blockchain

3.1 Blockchain Certificate

In this section, we design a blockchain certificate based on the X.509 digital certificate and the blockchain structure. When the MN completes registration at the registration node AS, the AS generates a unique blockchain certificate and uploads it to the blockchain for the next handover authentication. Our blockchain certificate structure is shown in Figure 2.

Figure 2: Blockchain certificate

According to [26], the write interface of the blockchain is defined as put(action, data), and the query interface of the blockchain is defined as get(condition). The registration node and the authentication server have the right to write and query. Valid users only have the right to query. The parameter action of the write interface indicates the user’s data processing intent, which can be "issue" or "revoke". Since data already stored in the blockchain cannot be changed, issue and revoke do not operate directly on the data; instead, the operation on the data is recorded in the blockchain, and a new block is generated and added to the blockchain. Our handover authentication model is shown in Figure 3.

Figure 3: Our handover authentication system model

3.2 Our Handover Authentication Scheme

This section introduces our proposed anonymous handover authentication scheme based on blockchain technology. The notations used in the protocol are shown in Table 1, where $i$ indexes the stage of the calculation. The protocol flow is shown in Figure 4 and Figure 5.

3.2.1 Initial Authentication Phase

The Initial Authentication phase is shown in Figure 4. In the Initial Authentication phase, the mobile node MN needs to register with the AS node using his real identity. If the MN is valid, the AS generates a blockchain certificate

of MN, and uploads it to the blockchain. Similarly, for each AP node, its own blockchain certificate is also recorded in the blockchain.

Table 1: The notations used in protocol

| Notations | Meanings |
| $h_1(): \{0,1\}^* \rightarrow Z_q^*$, $h_2(): \{0,1\}^* \rightarrow \{0,1\}^{\lambda}$ | One-way and collision-resistant hash functions |
| $ID_x$ | The identity of $x$ |
| $N_i$ | Random number |
| $C()_x$ | The blockchain certificate of $x$ |
| $T_{Exp}$, $T_{Curr}$ | Expiration and current time |
| $(X_x, Y_x)$ | The chameleon hash trapdoor key pair of $x$ |
| $CHAM(m, r, s)$ | Chameleon hash function, $CHAM(m, r, s) = r - (Y_x^{e} g^{s} \bmod p) \bmod q$ |
| $r(i)_x$ | The parameter of the chameleon hash function, where $r(i)_x = CHAM(m, r, s) + (g^{k} \bmod p) \bmod q$ |
| $s(i)_x$ | The parameter of the chameleon hash function, where $s(i)_x = k - eX_x \bmod q$ |
| $m_x$ | A message chosen by $x$, where $m_x \in Z_q^*$ |
| $e$ | A required parameter to calculate $s(i)_x$, where $e = h_2(m, r)$ |

System Parameters: Our scheme specifies two prime numbers $p$ and $q$, where $q$ is a large prime and $p = 2q + 1$. $g$ is selected as an element of order $q$ from $Z_p^*$. $h_1(): \{0,1\}^* \rightarrow Z_q^*$ and $h_2(): \{0,1\}^* \rightarrow \{0,1\}^{\lambda}$ are two secure and collision-resistant hash functions.

1) MN → AS: $h_2(ID_{MN})$. The mobile node MN sends the hash value of his identity $h_2(ID_{MN})$ to the AS. Upon receiving the parameter from the MN, the AS verifies the validity of the MN identity based on the stored identity hash value. If the identity is invalid, the MN’s access is denied; otherwise, the authentication proceeds to the next step.

2) AS → MN: $PID$. After confirming the identity of the MN, the AS chooses a pseudonym set $PID = \{pid_1, pid_2, \ldots\}$ in which the elements are unlinkable, and sends it to the MN.

3) MN → AS: $CHAM(m_{MN}, r(0)_{MN}, s(0)_{MN})$. After the MN receives the feedback from the AS, the MN randomly chooses $X_{MN} \in Z_q^*$ as his private chameleon hash key, and $Y_{MN} = g^{X_{MN}} \bmod p$ is the public chameleon hash key. Then the MN chooses the random values $(r(0)_{MN}, s(0)_{MN})$ from $Z_q^* \times Z_q^*$, and computes the value of the chameleon hash function:

$$C = CHAM(m_{MN}, r(0)_{MN}, s(0)_{MN}) = r(0)_{MN} - (Y_{MN}^{e_{MN}} g^{s(0)_{MN}} \bmod p) \bmod q,$$

where $e_{MN} = h_2(m_{MN}, r(0)_{MN})$. Then, the MN sends the value to the AS.

4) AS → Blockchain: $C(0)_{MN}$. Upon receiving the chameleon hash value sent by the MN, the AS generates a blockchain certificate of the MN and uploads it to the blockchain.

5) AS → MN: $T_{Exp}$. After the AS generates the blockchain certificate of the MN, it returns the time when the certificate expires to the MN. At the same time, the AS opens the query interface of the blockchain to the MN, so that the MN can query the data on the blockchain.

Figure 4: Initial authentication phase

3.2.2 Handover Authentication Phase

The Handover Authentication phase is shown in Figure 5. When the MN arrives at the new AP2, the AP2 needs to authenticate the legal identity of the MN to decide whether to provide services for the MN. Similarly, the MN also needs to authenticate the AP2.

1) MN → AP2: $Cert_{MN} \,\|\, m'_{MN} \,\|\, r(1)_{MN} \,\|\, s(1)_{MN} \,\|\, g^{h_1(pid_j + N_1)} \,\|\, T_{Curr}$. The MN chooses an unused $pid_j$ from $PID$ and a new random message $m'_{MN}$, and computes $g^{h_1(pid_j + N_1)}$. Then the MN sends $Cert_{MN} \,\|\, m'_{MN} \,\|\, r(1)_{MN} \,\|\, s(1)_{MN} \,\|\, g^{h_1(pid_j + N_1)} \,\|\, T_{Curr}$ to AP2, where

$$Cert_{MN} = (pid_j, g^{X_{MN}})$$
$$r(1)_{MN} = CHAM(m_{MN}, r(0)_{MN}, s(0)_{MN}) + (g^{h_1(pid_j + N_1)} \bmod p) \bmod q$$
$$s(1)_{MN} = h_1(pid_j + N_1) - e'_{MN} X_{MN} \bmod q$$
$$e'_{MN} = h_2(m'_{MN}, r(1)_{MN}).$$

2) AP2 ← Blockchain: $C(0)_{MN}$. Upon receiving the parameters from the MN, the AP2 uses these parameters to find the MN’s blockchain certificate. Then, the AP2 queries the status of the MN’s blockchain certificate. If the certificate status is "revoke", the MN’s access is denied. Otherwise, the AP2 computes:

$$CHAM(m'_{MN}, r(1)_{MN}, s(1)_{MN}) = r(1)_{MN} - (Y_{MN}^{e'_{MN}} g^{s(1)_{MN}} \bmod p) \bmod q,$$

and compares it with the chameleon hash value of the MN’s blockchain certificate $C(0)_{MN}$ to verify whether

$$CHAM(m_{MN}, r(0)_{MN}, s(0)_{MN}) = CHAM(m'_{MN}, r(1)_{MN}, s(1)_{MN})$$

holds. If the equation does not hold, the MN is determined to be an illegal user; otherwise, the AP2 authenticates the MN as a legitimate user, and the AP2 sends its own parameters to the MN, so that the MN can authenticate the identity of the AP2.

3) AP2 → MN: $Cert_{AP2} \,\|\, m'_{AP2} \,\|\, r(1)_{AP2} \,\|\, s(1)_{AP2} \,\|\, g^{k'} \,\|\, T_{Curr}$. Here $Cert_{AP2} = (ID_{AP2}, g^{X_{AP2}})$, and $g^{X_{AP2}}$ is the public chameleon hash key of AP2. The AP2 chooses random values $m'_{AP2} \in Z_q^*$ and $k' \in Z_q^*$, and computes $r(1)_{AP2}$, $s(1)_{AP2}$ and $g^{k'}$. Then the AP2 uses the values $g^{X_{MN}}$ and $g^{h_1(pid_j + N_1)}$ from the MN, together with his own parameters, to calculate the session key $K$ for communicating with the MN. The AP2 can get:

$$K = (g^{h_1(pid_j + N_1)})^{X_{AP2}} (g^{X_{MN}})^{k'}.$$

4) MN ← Blockchain: $C(0)_{AP2}$. The MN computes:

$$CHAM(m'_{AP2}, r(1)_{AP2}, s(1)_{AP2}) = r(1)_{AP2} - (Y_{AP2}^{e'_{AP2}} g^{s(1)_{AP2}} \bmod p) \bmod q$$

and compares it with the chameleon hash value of AP2’s blockchain certificate $C(0)_{AP2}$ to verify whether

$$CHAM(m_{AP2}, r(0)_{AP2}, s(0)_{AP2}) = CHAM(m'_{AP2}, r(1)_{AP2}, s(1)_{AP2})$$

holds, in order to authenticate the AP2. If the authentication is successful, the MN uses the parameters of the AP2 to calculate the session key for communicating with the AP2. The MN can get:

$$K = (g^{k'})^{X_{MN}} (g^{X_{AP2}})^{h_1(pid_j + N_1)}.$$

In the end, the session key shared between the MN and AP2 is established:

$$K = g^{h_1(pid_j + N_1) X_{AP2}} g^{k' X_{MN}}.$$

Figure 5: Handover authentication phase
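The handover steps above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the parameters p = 2039, q = 1019 and g = 4 are toy values chosen for readability, the helper names (`h`, `rand_zq`, `cham`, `collide`, `handover_demo`) are hypothetical, SHA-256 reduced modulo q stands in for both h1 and h2, and h1(pid_j + N1) is modelled by hashing the pair (pid_j, N1). The sketch checks that a collision computed with the trapdoor X_MN reproduces the registered value C(0)_MN, and that the MN and AP2 derive the same session key K.

```python
import hashlib
import secrets

# Toy parameters (assumption, far too small for real use): P = 2*Q + 1 is a safe prime.
P, Q = 2039, 1019
G = 4  # a quadratic residue mod P other than 1, so it has order Q


def h(*parts):
    """Stand-in for h1/h2: hash the parts to an integer in [1, Q-1]."""
    data = "|".join(str(part) for part in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (Q - 1) + 1


def rand_zq():
    """Random element of Z_q^* = {1, ..., Q-1}."""
    return secrets.randbelow(Q - 1) + 1


def cham(y, m, r, s):
    """CHAM(m, r, s) = r - (y^e * g^s mod p) mod q, with e = h(m, r)."""
    e = h(m, r)
    return (r - (pow(y, e, P) * pow(G, s, P)) % P) % Q


def collide(x, c, m_new, k):
    """Using trapdoor x, return (r', s') with CHAM(m_new, r', s') == c."""
    r_new = (c + pow(G, k, P)) % Q
    s_new = (k - h(m_new, r_new) * x) % Q
    return r_new, s_new


def handover_demo():
    # Registration: the MN fixes its key pair and the value C(0)_MN kept on chain.
    x_mn = rand_zq()
    y_mn = pow(G, x_mn, P)
    c0_mn = cham(y_mn, rand_zq(), rand_zq(), rand_zq())

    # AP2 key pair (its own certificate check is symmetric and omitted here).
    x_ap = rand_zq()
    y_ap = pow(G, x_ap, P)

    # Step 1: the MN derives k from a pseudonym and a nonce, then finds a collision.
    k = h("pid_j", rand_zq())
    m_new = rand_zq()
    r1, s1 = collide(x_mn, c0_mn, m_new, k)
    g_k = pow(G, k, P)

    # Step 2: AP2 recomputes the hash and compares it with C(0)_MN.
    mn_accepted = cham(y_mn, m_new, r1, s1) == c0_mn

    # Steps 3 and 4: both sides combine the exchanged values into K.
    k_prime = rand_zq()
    g_k_prime = pow(G, k_prime, P)
    key_ap = (pow(g_k, x_ap, P) * pow(y_mn, k_prime, P)) % P
    key_mn = (pow(g_k_prime, x_mn, P) * pow(y_ap, k, P)) % P
    return mn_accepted, key_mn, key_ap


accepted, key_mn, key_ap = handover_demo()
print(accepted, key_mn == key_ap)  # prints: True True
```

The check in step 2 succeeds for any random choices because $Y_{MN}^{e'} g^{s(1)} = g^{h_1(pid_j + N_1)} \pmod p$ whenever $g$ has order $q$, which is why the sketch picks $g$ from the order-$q$ subgroup.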

Finally, when the user no longer has a handover authentication request or needs to quit the system, the current AP node generates the user’s revocation block and uploads it to the blockchain. In this way, the revocation of the user’s blockchain certificate is achieved.

4 Security Analysis and Performance Evaluation

4.1 Security Analysis

4.1.1 Mutual Authentication

Our scheme ensures that only legitimate users can access the wireless network. After the MN is successfully registered, the AS allows the MN to query the data on the blockchain. In a handover authentication phase, after the AP authenticates the identity of the MN, the MN calculates $CHAM(m'_{AP_i}, r(1)_{AP_i}, s(1)_{AP_i})$ by using the values $r(1)_{AP_i}$, $s(1)_{AP_i}$ and $m'_{AP_i}$ provided by the AP, and then the MN authenticates the identity of the AP by querying the blockchain certificate $C(0)_{AP_i}$ on the blockchain. In this way, the mutual authentication between the MN and the AP is completed.

4.1.2 Conditional Privacy Preservation

The MN obtains the pseudonym set $PID$ from the AS during the Initial Authentication phase. The MN uses a different $pid$ instead of the real identity in each handover authentication phase. Since the elements in $PID$ are unlinkable to each other, when the MN reaches a new AP and uses a new pseudonym, there is no way for APs to collude to trace the MN according to the connection between the pseudonyms. However, in some special cases, the AP can send a request with the $pid$ provided by the MN to the AS. Upon receiving the request, the AS finds the pseudonym set to which the $pid$ provided by the AP belongs, and from it the true identity of the MN. Based on this, the conditional privacy protection of the user is achieved.

4.1.3 Key Agreement

During the authentication process, the MN uses his own chameleon hash public key $g^{X_{MN}}$ and the parameter $g^{h_1(pid_j + N_1)}$ generated from the random number and the $pid$; the AP2 uses his own chameleon hash public key $g^{X_{AP2}}$ and the parameter $g^{k'}$ generated from the newly selected element to establish a shared session key. In our construction, $K$ is shared by the MN and AP2 and satisfies $K = g^{h_1(pid_j + N_1) X_{AP2}} g^{k' X_{MN}}$. Moreover, whenever a new round of handover authentication is performed, the MN must choose a new $pid$ to protect its privacy. At the same time, since the session key contains the parameter $pid$, the session key is updated with each new round of handover authentication.

4.1.4 Resistance to Replay Attack

In the process of handoff authentication, an adversary may record the message that the MN sends to the AP and replay it. Our scheme uses timestamps, random numbers and $PID$ to prevent replay of previous messages. The MN updates his own $pid_j$ at every new round of handover authentication; when the MN requests verification, the timestamp $T_{Curr}$ is sent, and the parameter $g^{h_1(pid_j + N_1)}$ also contains a random number $N_1$. Therefore, if an attacker replays a message to try to enter the system, the AP can detect the replay attack by checking the $pid_j$, the timestamp $T_{Curr}$, or the value of $g^{h_1(pid_j + N_1)}$. Accordingly, replaying messages is of no use to an adversary.

4.1.5 Resistance to Man-in-the-Middle attack

In the process of key agreement between the MN and AP2, an adversary may replace the parameters with parameters he generates himself in order to obtain information. The attack process implemented by the attacker can be described as in Figure 6.

Figure 6: Man-in-the-Middle Attack model

As shown in Figure 6, the adversary E may replace the key parameters to get the session key between the MN and the AP. Upon receiving the parameters, the MN and the AP2 each calculate the session key. The key calculated by the MN is $K_1 = g^{h_1(pid_j + N_1) X_{AP2} + E' X_{MN}}$, and the key calculated by the AP2 is $K_2 = g^{E X_{AP2} + k' X_{MN}}$. This is not the shared key value they expect, so the MN and the AP2 cannot communicate with each other, and the adversary will be discovered by the MN and AP2. Moreover, since the attacker does not know the private exponents $X_{MN}$ and $X_{AP2}$ of the MN and the AP2, the attacker cannot calculate either $K_1$ or $K_2$. This ensures that the MN and the AP2 can be confident that only they can calculate the key value shared between them.

4.1.6 Resistance to Passive Eavesdropping Attack

During the process of handoff authentication, the information that the attacker most desires is the identity of the MN and the session key $K$. Firstly, the identity that the MN sends to the AS during the Initial Authentication phase is hashed, so it is not available to eavesdroppers. During the different handover authentication phases, the $pid_j$ in $PID$ are unlinkable with each other, so the attacker cannot associate the user MN with the different $pid$ appearing in different handover authentications. As a result, obtaining the identity of the MN is

Table 2: Security comparisons

Protocol | Mutual Authentication | Key Agreement | Conditional Privacy Preservation | Replay Attack | Man-in-the-Middle Attack | Passive Eavesdropping Attack | Perfect Forward Secrecy
[15] | YES | YES | NO | YES | NO | NO | NO
[13] | YES | YES | YES | YES | YES | YES | YES
[28] | YES | YES | NO | YES | YES | YES | YES
[7] | YES | YES | NO | YES | YES | YES | YES
[11] | YES | YES | NO | YES | NO | NO | NO
Ours | YES | YES | YES | YES | YES | YES | YES

difficult for the attacker. Secondly, if the adversary wants to get the private keys $X_{MN}$ and $X_{AP2}$, then the problem of getting $X_{MN}$ and $X_{AP2}$ from $g^{X_{MN}}$ and $g^{X_{AP2}}$ reduces to solving the discrete logarithm problem, which is hard. Therefore, our construction can resist passive eavesdropping.

4.1.7 Perfect Forward Secrecy

To get the session key $K = g^{h_1(pid_j + N_1) X_{AP2}} \, g^{k' X_{MN}}$, the adversary has to extract $g^{h_1(pid_j + N_1) X_{AP2}}$ from $g^{h_1(pid_j + N_1)}$ and $g^{X_{AP2}}$; that is, the adversary has to solve the CDH problem. Because the CDH problem is hard, the proposed protocol supports perfect forward secrecy.

4.2 Security Comparisons

According to the requirements of handover authentication in Section 2.3 and Section 4.1, the comparisons of security properties are listed in Table 2.

4.3 Performance Analysis

In this section, the performance of the proposed protocol is compared with some existing schemes. We then draw some conclusions about the efficiency of our scheme.

4.3.1 Computation Overhead

The notations used in this section are shown in Table 3.

Table 3: The notations used in efficiency calculation

T_E | Time for executing a modular exponentiation in G_T
T_P | Time for executing a bilinear map operation
T_ECSM | Time for executing a scalar multiplication operation
T_H | Time for executing a general hash function

Since the AS node only plays the role of registering and uploading the MN's blockchain certificate to the blockchain in our scheme, we only consider the operations and computational cost required by the MN and AP nodes in the efficiency analysis. In order to evaluate the efficiency of our scheme, we implement the above operations on a laptop (Lenovo with Intel I5-3320M 2.60GHz processor, 4G bytes memory and the Windows 7 operating system) using the JPBC library. The time cost of the primitive cryptography operations is shown in Table 4.

Table 4: Time cost of cryptography operations

Operation | T_E | T_P | T_ECSM | T_H
Time (ms) | 0.18 | 8.45 | 2.013 | 0.046

In the handover authentication phase of our scheme, the MN and the AP2 authenticate each other and negotiate a session key. During this phase, the computation cost of the MN is $5T_E + 3T_H \approx 1.038$ ms, and the computation cost of the AP is $5T_E + T_H \approx 0.946$ ms. Furthermore, the computation cost of schemes [7, 13, 15, 28] and ours is analyzed in Table 5 and is compared in Figure 9.

Figure 7: Comparison of the computation cost

Table 6 shows the energy consumption at the MN ($E_{MN}$). The energy consumption can be calculated as $E_{MN} = T_{MN} \times P$, where $E_{MN}$ is the energy consumption, $T_{MN}$ is the total computation time for handover authentication of the MN, and $P$ is the CPU maximum power (35 W).

Table 5: Comparison of the computation cost

Scheme | MN operations | AP operations
[15] | T_ECSM + 2T_P ≈ 18.913 ms | T_ECSM + 2T_P ≈ 18.913 ms
[13] | 4T_ECSM + 3T_E + 5T_H ≈ 8.822 ms | 2T_P + T_ECSM + 3T_E + 5T_H ≈ 19.683 ms
[28] | 4T_ECSM + 5T_H ≈ 8.282 ms | 5T_ECSM + 5T_H ≈ 10.295 ms
[7] | 3T_ECSM + 4T_H ≈ 6.223 ms | 3T_ECSM + 4T_H ≈ 6.223 ms
Ours | 5T_E + 3T_H ≈ 1.038 ms | 5T_E + T_H ≈ 0.946 ms
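The totals in Table 5 and the energy figures in Table 6 follow from the unit costs in Table 4 and the stated 35 W CPU power. The short Python sketch below recomputes them; the dictionary layout and the function names `cost_ms` and `energy_mj` are illustrative choices and not part of the original scheme.

```python
# Unit costs in milliseconds, taken from Table 4.
T_E, T_P, T_ECSM, T_H = 0.18, 8.45, 2.013, 0.046
P_WATTS = 35  # CPU maximum power stated in the text

# Operation counts (T_E, T_P, T_ECSM, T_H), transcribed from Table 5.
MN_OPS = {
    "[15]": (0, 2, 1, 0),
    "[13]": (3, 0, 4, 5),
    "[28]": (0, 0, 4, 5),
    "[7]": (0, 0, 3, 4),
    "Ours": (5, 0, 0, 3),
}
AP_OPS = {
    "[15]": (0, 2, 1, 0),
    "[13]": (3, 2, 1, 5),
    "[28]": (0, 0, 5, 5),
    "[7]": (0, 0, 3, 4),
    "Ours": (5, 0, 0, 1),
}


def cost_ms(ops):
    """Total computation time in ms for one side of the handover."""
    n_e, n_p, n_ecsm, n_h = ops
    return n_e * T_E + n_p * T_P + n_ecsm * T_ECSM + n_h * T_H


def energy_mj(time_ms):
    """E = T x P; milliseconds times watts gives millijoules."""
    return time_ms * P_WATTS


for scheme in MN_OPS:
    t_mn = cost_ms(MN_OPS[scheme])
    t_ap = cost_ms(AP_OPS[scheme])
    print(f"{scheme:>5}: MN {t_mn:.3f} ms, AP {t_ap:.3f} ms, "
          f"E_MN {energy_mj(t_mn):.3f} mJ")
```

Keeping the operation counts separate from the unit costs makes it easy to re-run the comparison with timings measured on a different platform.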

4.3.2 Transmission Overhead

For the transmission overhead, it is assumed that the expected authentication message delivery cost between the AP2 and the AAA server is e unit and that between the MN and the AP2 is δ unit, respectively. In our scheme, since we only view the data on the blockchain and do not need it to send us data, we only consider the time consumption between the MN and the AP2. The comparison of the transmission overhead is shown in Table 7.

Table 6: Energy consumption of MN

Scheme | [15] | [13] | [28] | [7] | Ours
E_MN (mJ) | 661.955 | 308.77 | 289.87 | 217.805 | 36.33

Table 7: Comparison of transmission overhead

Cost | [15] | [13] | [9] | Ours
T_MN−AP2 | 3δ | 2δ | 3δ | 2δ
T_AP1−AP2 | 0 | 0 | 0 | 0
T_AP2−AAA | 0 | 0 | 0 | 0
T_tot | 3δ | 2δ | 3δ | 2δ

* T_MN−AP2: The transmission cost between the MN and the AP2.
* T_AP1−AP2: The transmission cost between APs, i.e., AP1 and AP2.
* T_AP2−AAA: The transmission cost between the AP2 and the AAA server.
* T_tot: The total transmission cost.

Figure 8: Energy consumption of CPU

4.3.3 Communication Overhead

In the proposed handover protocol, two message exchanges are required to complete the handover authentication. In the protocol, the MN transmits $Cert_{MN} \,\|\, m'_{MN} \,\|\, r^{(1)}_{MN} \,\|\, s^{(1)}_{MN} \,\|\, g^{h_1(pid_j + N_1)} \,\|\, T_{Curr}$ to the AP2. Hence, the communication overhead incurred by the MN is $(2|p| + 3|q| + l_{id} + l_{time})$ bits. The AP2 transmits $Cert_{AP2} \,\|\, m'_{AP2} \,\|\, r^{(1)}_{AP2} \,\|\, s^{(1)}_{AP2} \,\|\, g^{k'} \,\|\, T_{Curr}$ to the MN. Hence, the communication overhead incurred by the AP2 is $(2|p| + 3|q| + l_{id} + l_{time})$ bits. Compared with [13], the proposed protocol increases the communication cost. The reason for the increase is that the MN and the AP2 send $g^{h_1(pid_j + N_1)}$ and $g^{k'}$ to each other to achieve perfect forward secrecy. It is worthwhile to achieve this important security attribute at the cost of increased communication only.

Figure 9: Comparison of handover authentication

Based on the above comparative analysis, it can be seen that our scheme requires less computation. Our scheme also provides user anonymity and conditional privacy protection. Therefore, our scheme is more suitable for practical application scenarios.
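As a sanity check on the key agreement of Section 4.1.3 and the man-in-the-middle analysis of Section 4.1.5, the following Python sketch runs both computations in a toy group. Everything concrete here is an assumption made for illustration: the modulus $2^{127}-1$, the base 3, SHA-256 as a stand-in for $h_1$, the string concatenation used for $pid_j + N_1$, and all function names. It does not reproduce the chameleon hash, the certificates, or the parameters used in the paper's implementation.

```python
import hashlib
import secrets

# Toy parameters (assumptions for illustration only, not from the paper).
P = 2**127 - 1   # a Mersenne prime used as the modulus
G = 3            # base element


def h1(pid: str, nonce: int) -> int:
    """Stand-in for h1(pid_j + N1): SHA-256 reduced to an exponent."""
    digest = hashlib.sha256(f"{pid}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1)


def rand_exp() -> int:
    """Random exponent in [1, P-2]."""
    return secrets.randbelow(P - 2) + 1


# Long-term private keys X_MN, X_AP2 and public keys g^X.
x_mn, x_ap2 = rand_exp(), rand_exp()
pk_mn, pk_ap2 = pow(G, x_mn, P), pow(G, x_ap2, P)

# Per-handover values: the MN derives h1(pid_j + N1), the AP2 picks k'.
e_mn = h1("pid_j", rand_exp())
k_prime = rand_exp()


def mn_key(received: int) -> int:
    # MN side: (g^X_AP2)^h1 * (received value)^X_MN
    return pow(pk_ap2, e_mn, P) * pow(received, x_mn, P) % P


def ap2_key(received: int) -> int:
    # AP2 side: (received value)^X_AP2 * (g^X_MN)^k'
    return pow(received, x_ap2, P) * pow(pk_mn, k_prime, P) % P


# Honest run: MN receives g^k', AP2 receives g^h1(pid_j + N1).
k_mn = mn_key(pow(G, k_prime, P))
k_ap2 = ap2_key(pow(G, e_mn, P))
print("honest keys match:", k_mn == k_ap2)

# Substitution: E sends g^E' to the MN and g^E to the AP2.
e, e_prime = rand_exp(), rand_exp()
k1 = mn_key(pow(G, e_prime, P))   # K1 = g^(h1*X_AP2 + E'*X_MN)
k2 = ap2_key(pow(G, e, P))        # K2 = g^(E*X_AP2 + k'*X_MN)
print("keys match under substitution:", k1 == k2)
```

In the honest run both sides arrive at $g^{h_1(pid_j + N_1) X_{AP2} + k' X_{MN}}$. Under substitution the two exponents differ unless they happen to coincide modulo the order of the base, so the second line is expected to print `False` for random choices.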

5 Conclusion

In wireless networks, secure and efficient handover authentication has been the focus of widespread attention. In this paper, we propose an anonymous handover authentication scheme based on the chameleon hash function and blockchain technology. The main idea of our scheme is to generate a blockchain certificate for the user by a registration node AS. When the handover authentication occurs, the AP compares the chameleon hash value provided by the user with the blockchain certificate to verify the legal identity of the user. Our scheme provides anonymity and conditional privacy protection. The AP can request the true identity of the MN from the AS when an incident occurs during the handover authentication phase. When the user no longer has a handover authentication request or needs to log off, the current AP node generates the user's revocation block and uploads it to the blockchain. In this way, the revocation of the user's blockchain certificate is achieved. Finally, when the MN performs a new handover authentication and chooses a new pid, the session key for the secure communication with the AP is also updated at the same time. The performance analysis shows that our scheme is efficient.

Acknowledgments

This work was supported by the Natural Science Foundation of China under Grants 61802303 and 61772418, the Innovation Ability Support Program in Shaanxi Province of China under Grant 2017KJXX-47, and the Natural Science Basic Research Plan in Shaanxi Province of China under Grants 2016JM6033 and 2018JZ6001.

References

[1] G. Ateniese and B. D. Medeiros, "On the key exposure problem in chameleon hashes," in International Conference on Security in Communication Networks, pp. 165–179, Sep. 2004.
[2] C. C. Chang, C. Y. Lee, and Y. C. Chiu, "Enhanced authentication scheme with anonymity for roaming service in global mobility networks," Computer Communications, vol. 32, no. 4, pp. 611–618, 2009.
[3] C. Chaplin, E. Qi, H. Ptasinski, J. Walker, and S. Li, 802.11i overview, IEEE.802.11–04/0123r1, 2005. (http://www.drizzle.com/~aboba/IEEE)
[4] J. Choi and S. Jung, "A secure and efficient handover authentication based on lightweight diffie-hellman on mobile node in FMIPv6," IEICE Transactions on Communications, vol. 91, no. 2, pp. 605–608, 2008.
[5] J. Choi and S. Jung, "A handover authentication using credentials based on chameleon hashing," IEEE Communications Letters, vol. 14, no. 1, pp. 54–56, 2009.
[6] C. Fromknecht, D. Velicanu, and S. Yakoubov, "Certcoin: A namecoin based decentralized authentication system 6.857 class project," Unpublished Class Project, 2014. (https://courses.csail.mit.edu/6.857/2014/files/19-fromknecht-velicann-yakoubov-certcoin.pdf)
[7] S. Gupta, B. L. Parne, and N. S. Chaudhari, "A lightweight handover authentication protocol based on proxy signature for wireless networks," in The 14th IEEE India Council International Conference (INDICON'17), pp. 1–6, July 2017.
[8] S. Gupta, B. L. Parne, and N. S. Chaudhari, "A proxy signature based efficient and robust handover AKA protocol for LTE/LTE-A networks," Wireless Personal Communications, vol. 103, no. 3, pp. 2317–2352, 2018.
[9] S. Gupta, B. L. Parne, and N. S. Chaudhari, "Pseh: A provably secure and efficient handover AKA protocol in LTE/LTE-A network," Peer-to-Peer Networking and Applications, vol. 12, no. 4, pp. 989–1011, 2018.
[10] S. Gupta, B. L. Parne, and N. S. Chaudhari, "An efficient handover aka protocol for wireless network using chameleon hash function," in The 4th International Conference on Recent Advances in Information Technology (RAIT'18), pp. 1–7, June 2018.
[11] Q. Han, Y. Zhang, X. Chen, H. Li, and J. Quan, "Efficient and robust Identity-Based handoff authentication in wireless networks," in International Conference on Network and System Security, pp. 180–191, 2012.
[12] D. He, M. Ma, Y. Zhang, C. Chen, and J. Bu, "A strong user authentication scheme with smart cards for wireless communications," Computer Communications, vol. 34, no. 3, pp. 367–374, 2011.
[13] D. He, D. Wang, Q. Xie, and K. F. Chen, "Anonymous handover authentication protocol for mobile wireless networks with conditional privacy preservation," Science China Information Sciences, vol. 60, no. 5, pp. 052104, 2017.
[14] H. C. Hsiang and W. K. Shih, "Improvement of the secure dynamic ID based remote user authentication scheme for multi-server environment," Computer Standards & Interfaces, vol. 31, no. 6, pp. 1118–1123, 2009.
[15] Y. Kim, W. Ren, J. Jo, M. Yang, Y. Jiang, and J. Zheng, "SFRIC: A secure fast roaming scheme in wireless lan using ID-based cryptography," in IEEE International Conference on Communications, pp. 1570–1575, June 2007.
[16] H. Krawczyk and T. Rabin, Chameleon Hashing and Signatures, Aug. 2000. US Patent 6,108,783.
[17] K. Lewison and F. Corella, Backing Rich Credentials with a Blockchain PKI, 2016. (https://pomcor.com/techreports/BlockchainPKI.pdf)
[18] C. T. Li and C. C. Lee, "A novel user authentication and privacy preserving scheme with smart cards for wireless communications," Mathematical and Computer Modelling, vol. 55, no. 1-2, pp. 35–44, 2012.
[19] Y. P. Liao and S. S. Wang, "A secure dynamic ID based remote user authentication scheme for multi-server environment," Computer Standards & Interfaces, vol. 31, no. 1, pp. 24–29, 2009.
[20] C. Ma, K. Xue, and P. Hong, "A proxy signature based re-authentication scheme for secure fast handoff in wireless mesh networks," International Journal Network Security, vol. 15, no. 2, pp. 122–132, 2013.
[21] A. Mishra, M. H. Shin, and W. A. Arbaugh, "Proactive key distribution using neighbor graphs," IEEE Wireless Communication Magazine, vol. 11, no. 1, pp. 26–36, 2004.
[22] S. Pack and Y. Choi, "Fast handoff scheme based on mobility prediction in public wireless lan systems," IEE Proceedings of Communications, vol. 151, no. 5, pp. 489–495, 2004.
[23] M. Ramadan, F. Li, C. X. Xu, and A. Mohamed, "User-to-user mutual authentication and key agreement scheme for LTE cellular system," International Journal Network Security, vol. 18, no. 4, pp. 769–781, 2016.
[24] N. Satoshi, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008. (https://bitcoin.org/bitcoin.pdf)
[25] C. Tang and L. Gao, "Multi-parties key agreement protocol in block chain," Netinfo Security (in Chinese), vol. 12, no. 9, pp. 19, 2017.
[26] W. T. Tsai, L. Yu, and R. Wang, "Blockchain application development techniques," Journal of Software (in Chinese), vol. 28, no. 6, pp. 1474–1487, 2017.
[27] H. Wang and A. R. Prasad, "Fast authentication for inter-domain handover," in Telecommunications and Networking (ICT'04), pp. 973–982, Aug. 2004.
[28] Y. Xie, L. Wu, N. Kumar, and J. Shen, "Analysis and improvement of a privacy-aware handover authentication scheme for wireless network," Wireless Personal Communications, vol. 93, no. 2, pp. 523–541, 2017.
[29] T. Y. Youn, Y. H. Park, and J. Lim, "Weaknesses in an anonymous authentication scheme for roaming service in global mobility networks," IEEE Communications Letters, vol. 13, no. 7, pp. 471–473, 2009.
[30] C. Zhang, R. Lu, P. Ho, and A. Chen, "A location privacy preserving authentication scheme in vehicular networks," in IEEE Wireless Communications and Networking Conference, Apr. 2008. DOI: 10.1109/WCNC.2008.447.

Biography

ChenCheng Hu received the B.Eng. degree from the Xi'an University of Posts and Telecommunications, in 2016, where he is currently pursuing the M.S. degree. He is doing research at the National Engineering Laboratory for Wireless Security. His current research interests include blockchain technology, user authentication, and information security.

Dong Zheng received the Ph.D. degree from Xidian University, in 1999. He joined the School of Information Security Engineering, Shanghai Jiao Tong University. He is currently a Professor with the Xi'an University of Posts and Telecommunications, China. His research interests include information theory, cryptography, and information security. He is also a Senior Member of the Chinese Association for Cryptologic Research and a member of the Chinese Communication Society.

Rui Guo received the Ph.D. degree from the State Key Laboratory of Networking and Switch Technology, Beijing University of Posts and Telecommunications, China, in 2014. He is currently a Lecturer with the National Engineering Laboratory for Wireless Security, Xi'an University of Posts and Telecommunications. His current research interests include attribute-based cryptography, cloud computing, and blockchain technology.

Axin Wu received the B.S. degree from Zhengzhou University of Light Industry in 2016. Since 2016, he has been in the M.Eng. program at the Xi'an University of Posts and Telecommunications, Xi'an, China. His research interests include cloud security and wireless network security.

Liang Wang received the B.S. degree from the Institute of Information Technology, GUET, in 2016. He is currently pursuing the M.S. degree with the National Engineering Laboratory for Wireless Security, Xi'an University of Posts and Telecommunications, China. His research interests include anonymous authentication, vehicular ad hoc networks, and blockchain.

ShiYao Gao received the B.S. degree from the Xi'an University of Posts and Telecommunications, in 2017. She is currently pursuing the M.S. degree with the Xi'an University of Posts and Telecommunications, China. She is doing research at the National Engineering Laboratory for Wireless Security. Her research interests include blockchain technology, electronic voting, and information security.

International Journal of Network Security, Vol.22, No.5, PP.885-896, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).20) 885

New Publicly Verifiable Data Deletion Supporting Efficient Tracking for Cloud Storage

Changsong Yang1, Xiaoling Tao1, and Qiyu Chen2 (Corresponding author: Changsong Yang and Xiaoling Tao)

School of Computer Science and Information Security, Guilin University of Electronic Technology1
No. 1 Jinji Road, Guilin, China, 541004
College of Information and Computer Engineering, Northeast Forestry University2
No. 26 HeXing Road, Harbin, Heilongjiang, China
(Email: [email protected] and [email protected])
(Received Apr. 26, 2019; Revised and Accepted Nov. 16, 2019; First Online Feb. 15, 2020)

Abstract

With the rapid development of cloud storage, an increasing number of data owners are willing to outsource their data to the cloud server to greatly reduce local storage overhead. However, in cloud storage, the ownership of the outsourced data is separated from its management, which makes outsourced data deletion a crucial security challenge: the cloud server might retain the data maliciously for economic interests, and return wrong deletion results to cheat the data owners. To solve this problem, we design a novel outsourced data deletion scheme. If the cloud server does not execute the deletion command honestly, the data owner can detect the dishonest data retention by checking the returned deletion evidence. Additionally, we adopt the Merkle hash tree to achieve public verifiability in outsourced data deletion without requiring any trusted third party. Meanwhile, the proposed scheme is able to achieve efficient data leakage source tracking to prevent the data owner and the cloud server from slandering each other. Finally, we prove that our scheme can satisfy the desired security requirements.

Keywords: Cloud Storage; Data Deletion; Efficient Tracking; Merkle Hash Tree; Public Verifiability

1 Introduction

Cloud computing, a newly developing and promising computing paradigm, can connect large-scale computing resources, network resources and storage resources together through the Internet [7]. Thanks to the rapid development of computer software and hardware technology, cloud computing can utilize its abundant resources to provide many attractive services, for instance, data storage and sharing service, outsourcing service, verifiable database service, and so on. These services have been widely adopted by the public, especially the cloud storage service. The cloud storage service provider can offer on-demand data storage service to the tenants [9]. By employing the high-quality cloud storage service, resource-constrained data owners can upload their personal data to the remote cloud server to save heavy local storage overhead. Because of these attractive advantages, an increasing number of data owners, including individuals and corporations, prefer to embrace cloud storage service.

Despite a number of advantages, cloud storage service inescapably suffers from a few novel security challenges [3,6]. First of all, the outsourced file might contain some of the data owner's private information, which should be kept secret. Therefore, data confidentiality has become a particularly severe security challenge for cloud storage. Generally speaking, the traditional encryption technique can be seen as a solution to this issue. However, it can only offer a partial solution because it is very difficult to execute meaningful operations over the ciphertext. The fully homomorphic encryption algorithm seems a potential solution, but the existing protocols are not efficient and practical. Secondly, both the data owner and the cloud server may be dishonest. Either of them might expose the data maliciously to slander the other. Therefore, how to precisely trace the data leakage source is a challenge which needs to be solved. Last but not least, the data owner cannot execute any operation over the outsourced data directly because he loses direct control over the data. All the operations over the outsourced data, such as the data deletion operation, might be executed by the cloud server. However, the cloud server might not remove the data sincerely for economic interests. Hence, how to securely remove outsourced data is also a security threat.

Although plenty of solutions have been proposed to realize data deletion, there are still some security challenges in processing the outsourced data deletion. Firstly, plenty of existing data deletion schemes achieve deletion by overwriting the physical disks [8, 13, 18]. To be specific, they use some random data to overwrite the data which needs to be deleted. To make the data deletion operation more secure, some researchers suggest that the disk should be overwritten more than once. Although overwriting the disk can theoretically solve the problem of data deletion, it still has some inherent limitations. On the one hand, overwriting the physical medium is not efficient for real applications. Especially for a distributed storage system, it is very difficult to overwrite every disk which maintains a data copy. On the other hand, there may be some physical remanence of the overwritten data left on the disk. An attacker equipped with advanced microscopy tools can recover the overwritten data from the physical remanence [5]. Therefore, it is desirable to improve the efficiency and the security of the data deletion schemes.

Boneh and Lipton [2] first utilized cryptographic techniques to delete the data instead of protecting them, which can make the data deletion operation more secure and efficient, and resulted in plenty of follow-up schemes [10, 14, 16]. To be specific, these schemes first use an encryption key to encrypt the data before storing. Then they destroy a very short decryption key to make a large amount of related ciphertext unavailable, and return an erasure outcome to the data owner. This approach can efficiently delete the digital data. However, a lot of existing cryptography-based data deletion schemes are not able to achieve verifiability. That is, the data owner must trust the returned deletion result since he cannot verify it conveniently. However, the storage server may retain the data maliciously and return a wrong outcome to cheat the data owner. Therefore, the requirement of deletion result verifiability should be introduced into the data deletion schemes.

Last but not least, some schemes have been put forward to provide the data owner with the ability to verify the data deletion result conveniently [11,20,22,25,26]. They delete the outsourced data and then return a related data deletion proof. The data owner can check the deletion result by verifying the returned deletion proof. However, these schemes all assume that the data owner is honest. If the deleted data is later discovered, the cloud server is deemed to have retained the data maliciously, and the data owner should be entitled to compensation. However, there are some dishonest data owners in the real world, and they may expose the data maliciously to slander the cloud server and obtain compensation. None of the existing schemes is able to determine the data leakage source under the dishonest data owner and cloud server model because both entities can obtain the same data backup. Therefore, we should offer the ability to track the data leakage source if the data is exposed.

Although various schemes have been put forward to deal with the problem of data deletion, most of them have a few deficiencies. First of all, in a lot of existing solutions, the data owner must trust the returned outcome since he cannot verify it. However, the cloud server might retain the data backup maliciously and return a wrong outcome to cheat the data owner, but the data owner cannot detect the malicious behavior. Secondly, although some existing deletion schemes provide the data owner with the ability to verify the deletion outcome, they cannot achieve traceability. When the data is leaked, they cannot trace the data leakage source. To the best of our knowledge, there is no research work on a publicly verifiable data deletion scheme that supports data leakage source tracking. Hence, we design a new scheme to delete the outsourced data and trace the data leakage source.

1.1 Our Contributions

In this paper, we put forward a novel publicly verifiable data deletion scheme for cloud storage, which can simultaneously achieve data leakage source tracking. The main contributions of our proposed scheme are as follows:

• We put forward a new Merkle hash tree-based publicly verifiable outsourced data deletion scheme, which can simultaneously achieve data leakage source tracking. To be specific, after executing the data deletion operation, the cloud server can utilize the Merkle hash tree to generate a deletion proof. If the cloud server retains the data backup maliciously, the data owner can detect the cloud server's dishonest behavior by verifying the proof.

• Our proposed scheme can satisfy the property of traceability, which is different from the previous solutions. That is, if the data is leaked, the proposed scheme can trace the data leakage source, which can prevent the data owner and the cloud server from exposing the data maliciously to slander each other. Additionally, the proposed scheme is also very efficient in computation and communication.

This paper is an extension of our previous work that was presented at SICBS [24]. In the following, we show the main differences between this paper and the conference version. Firstly, we describe the related work in more detail in Section 1. Secondly, we put forward a more detailed scheme, and add the high-level description of the proposed scheme in Section 4. We also identify the main security properties for the proposed scheme in Section 3.3, and we prove that our new scheme can satisfy these design goals in Section 5.1. Finally, we add the experimental simulation and performance comparison between our scheme and two previous schemes in Section 5.3.

1.2 Related Work

Data deletion has been studied for a long time. Perlman et al. [12] utilized a trusted third party (TTP) to solve the data deletion problem. First of all, they encrypt the data with a data key, then the TTP further encrypts the data key with a control key. When the data owner no longer needs the file, the TTP makes the data key unavailable by destroying the control key, thus the corresponding ciphertext cannot be decrypted anymore. In 2010, Tang et al. [15] designed a practical and implementable file assured deletion (FADE) system. They first use a data key to encrypt the file. After that, the data key is encrypted with a control key which is associated with a policy. Besides, one or multiple TTPs maintain the policies together. Finally, when they want to delete the file, they can remove the related policy, and instruct the TTP to delete the corresponding key.

To offer the data owner the ability to verify the deletion outcome, Hao et al. [5] presented a novel data deletion protocol. In their protocol, they store the private key in a trusted platform module's protected memory. Then they achieve data deletion by destroying the private key and finally return a signature as evidence. In 2016, Luo et al. [8] presented a permutation-based data deletion scheme. They suppose that the cloud server maintains only the latest version of the data. Additionally, all the backups will be consistent when they are updated. Then they achieve deletion by updating them with random data. Finally, the data owner is able to verify the deletion outcome through a challenge-response protocol. In 2018, Yang et al. [21] used blockchain to design a novel scheme to achieve publicly verifiable data deletion. In their scheme, they utilize blockchain to reach public verifiability without requiring any TTP, which is quite different from a lot of the previous schemes. After executing the deletion operation, the cloud server can generate a deletion proof, which will be published on the blockchain. Finally, the data owner can check the deletion result by verifying the proof.

Xue et al. [19] put forward a verifiable data deletion method, which could also achieve provable data transfer and data integrity verification. Their scheme gives the data owner the ability to move the outsourced data between two different clouds. Moreover, the data owner is able to check the transferred data integrity on the target cloud through a provable data possession (PDP) protocol. Then the original cloud deletes the transferred data blocks, and utilizes a Rank-based Merkle hash tree (RMHT) to generate a deletion proof. Wang et al. [17] presented a similar method in 2018. Recently, Yang et al. [23] put forward a novel verifiable outsourced data transfer and deletion scheme. In their scheme, the data owner is able to migrate the outsourced data between two different clouds, and then delete the transferred data from the original cloud server. Additionally, they utilize the primitive of vector commitment (VC) to realize public verifiability without requiring any TTP.

1.3 Organization

The remainder of this paper is organized as follows: We describe the preliminary of the Merkle hash tree in Section 2. In Section 3, we describe the problem statement in detail. To be specific, we first formalize the system model of our novel scheme. Then we present the main security challenges. Finally, the security goals are identified. In Section 4, we put forward our new publicly verifiable data deletion scheme in detail. A brief analysis of the proposed scheme and the performance evaluation are presented in Section 5. Finally, we conclude the paper in Section 6.

2 Merkle Hash Tree

The Merkle hash tree (MHT), a specific binary tree, is commonly used to authenticate digital data [1, 4]. By using an MHT, the communication and computation overhead during the verification process is decreased greatly. In an MHT, each leaf node maintains a hash value of the data block which needs to be authenticated, and every internal node keeps a hash value of the concatenation of its two children. For example, if we want to authenticate the data set $D = \{d_1, d_2, d_3, d_4\}$, the MHT is illustrated in Figure 1, where $h_{2,i} = H(d_i)$ for $i \in [1, 4]$, $h_{1,1} = H(h_{2,1} \,\|\, h_{2,2})$, and $H$ is a secure one-way hash function. Finally, a public key signature technique is used to sign the root node.

Figure 1: An example of MHT

The verifier can verify any subset of $D$ by utilizing the verification object $\Phi$, which is the set of all sibling nodes on the path from the authenticated leaf node to the root node. For instance, to verify $d_3$, $\Phi$ contains $h_{1,1}$ and $h_{2,4}$. The verifier first computes $h'_{0,1} = H(h_{1,1} \,\|\, H(h_{2,3} \,\|\, h_{2,4}))$. Then he checks whether $h'_{0,1} = h_{0,1}$ holds, and verifies the validity of the signature. If both verifications pass, $d_3$ is valid; otherwise, $d_3$ has been tampered with maliciously.

3 Problem Statement

3.1 System Model

In the following, we formalize the system model of our new scheme, which involves three entities: a data owner O, a cloud server S and a trusted agency TA, as illustrated in Figure 2.

Figure 2: The system model (labels: Trusted Agency, Data owner, Cloud Server, Deletion Request/Result)

• The data owner O. It is a resource-constrained entity, who prefers to outsource his personal data to the cloud server to save local storage overhead. When O no longer needs the data, he sends a data deletion command to the cloud server to delete the outsourced data permanently. Finally, O can check the deletion result by verifying the returned deletion proof.

• The cloud server S. It is an entity which has large-scale computing, network and storage resources, and thus can maintain a large amount of data for the resource-constrained O. When O no longer needs the outsourced data, S is required to delete the outsourced data from the disk. After that, S generates a deletion evidence for O to check the deletion outcome.

• The trusted agency TA. It is a trusted third party, which is absolutely righteous. That is to say, TA will never collude with O or S to cheat the other maliciously. TA always behaves honestly and righteously; therefore, both S and O trust TA unconditionally.

3.2 Security Threats

We assume that the cloud server S is "semi-honest-but-curious". As a result, S may not follow the protocol honestly for economic interests. Besides, attackers such as hackers or malicious users may try their best to access the outsourced data illegally. Therefore, we consider the following two types of attacks: internal attacks and external attacks. Internal attacks are launched by internal attackers, such as dishonest cloud administrators, who would try to dig some sensitive information from the outsourced data. Furthermore, the dishonest S may share the outsourced data with others for financial incentives. External attackers (e.g., hackers and illegal users) might try to access the outsourced file and dig private data. Therefore, we should consider the following three security challenges.

• Data privacy disclosure. Privacy disclosure is a very common and serious security threat in cloud storage. On the one hand, the internal attackers are so curious that they may dig some sensitive information from the outsourced data. Moreover, the selfish cloud server may move the outsourced data to other subcontractors to save storage overhead, or share them with other cooperators for economic interests. On the other hand, the external attackers may try their best to access the file to find some private data.

• Data corruption. The outsourced data may be polluted for the following reasons. First of all, erroneous operations by the manager and software or hardware malfunctions may all cause data loss. Secondly, the external attackers (e.g., hackers) may modify or delete the data arbitrarily. Last but not least, when the data owner downloads the file, the cloud server may send only part of the data to save bandwidth, or deliver some unrelated data to cheat the data owner.

• Malicious data reservation. When the data owner no longer needs the data, he sends a deletion command to the cloud server to delete the data permanently. However, the selfish cloud server might not execute the data deletion operation honestly for the following reasons: (1) it needs some computational cost to delete the data from the physical medium; (2) the cloud server might try to reserve the data to dig some private data.

3.3 Design Goals

In our new scheme, we aim to achieve publicly verifiable data deletion in cloud storage. Meanwhile, when the data is leaked, we can trace the data leakage source efficiently. Therefore, our scheme should realize the following four goals.

• Data confidentiality. To ensure the confidentiality of the outsourced data, the scheme should prevent attackers from accessing the data directly, because the data may contain some private information. That is, it is necessary to use a cryptographic algorithm to encrypt the file before uploading it to the cloud server. Moreover, the corresponding decryption key should be kept secret.

• Data integrity. To prevent the outsourced data from being polluted, the data owner should be given the ability to verify the data integrity and availability. If the outsourced data has been polluted, the data owner should be able to detect the malicious manipulation.

• Verifiable data deletion. To make the cloud server delete the data from the physical medium sincerely, the data owner should be given the ability to verify the deletion result. If the cloud server reserves the data dishonestly, the data owner can detect the malicious data reservation by verifying the returned proof.

• Accountable traceability. To prevent the data owner and the cloud server from slandering each other, the scheme should satisfy the property of accountable traceability. To be specific, we can trace the data leakage source precisely when the data is leaked. Additionally, once the data owner and the cloud server have executed some operations, they cannot deny their behaviors afterwards.

4 Our Construction

4.1 High-Level Description

In this paper, we study the problem of publicly verifiable cloud data deletion with efficient tracking under the commercial mode, which is very similar to schemes [5,19]. In our system model, there is a trust problem between the cloud server S and the data owner O: both of them might not fully believe each other. On the one hand, O might not believe that S will execute the data deletion operation honestly. On the other hand, S thinks that O may reserve the data deliberately, and expose them to slander that S did not sincerely delete the data. Plenty of data deletion methods have been proposed, but all of them assume that O is honest, so that O will not slander S maliciously. However, this assumption is not realistic. Therefore, we propose a novel scheme, which aims to make the data deletion publicly verifiable and the data leakage source traceable.

Figure 3: The main processes of our scheme (entities: Trusted Agency, Data owner, Cloud Server; labeled steps include Deletion Request, Deletion Result and Deletion Verification)

Our proposed scheme can achieve not only publicly verifiable data deletion but also efficient data leakage source tracking, and Figure 3 describes the main steps of the proposed scheme. First of all, O encrypts the file to protect the privacy, and then sends the ephemeral ciphertext to TA. Then TA further encrypts the received ephemeral ciphertext and sends the final ciphertext to S. After that, TA verifies the storage result, and O deletes the local backup. When O wants the outsourced file, he downloads the corresponding ciphertext and decrypts it to obtain the plaintext. If O no longer needs the file, he sends a deletion command to delete the data from S. Upon receiving the deletion request, S deletes the related data and returns a deletion evidence to O. Finally, O can check the data deletion result by verifying the proof. In our scheme, we utilize MHT to realize public verifiability, and the verification process does not need any TTP.

4.2 The Concrete Construction

In this part, we put forward our new scheme in detail. First of all, we define a few notations. Before embracing the cloud storage service, the data owner O must pass the authentication of the cloud server S. For simplicity, we assume that O has been authenticated and has become a legal user of S. Then O can set a unique identity $id$, which is kept secret by O and TA. Besides, we suppose that O, S and TA each have an ECDSA key pair, $(pk_o, sk_o)$, $(pk_s, sk_s)$ and $(pk_t, sk_t)$, respectively. $H_1(\cdot)$ and $H_2(\cdot)$ are two secure hash functions. Furthermore, we assume that every file is named with a secret and unique name, and the name is so secure that it can resist brute-force attacks. Without loss of generality, we assume that O wants to upload file $F$ to S, and $n_f$ is the name of $F$.

• Encrypt. To guarantee data confidentiality, the data owner O should encrypt the file $F$ before uploading, and the detailed processes are as follows.

– First of all, O encrypts the file $F$: $C_o = Enc_{k_o}(F)$, where $k_o = H_1(sk_o \| id \| n_f)$, and $Enc$ is an IND-CPA secure symmetric encryption algorithm. Then O computes a file tag $tag_f = H_1(n_f)$ and a hash value $h_o = H_1(C_o \| id \| tag_f)$. Finally, O sends the ephemeral ciphertext $C_{f_o}$ to TA, where $C_{f_o} = (C_o, tag_f, h_o)$.

– Upon receiving $C_{f_o}$, TA verifies whether the equation $h_o = H_1(C_o \| id \| tag_f)$ holds. If $h_o \neq H_1(C_o \| id \| tag_f)$, TA aborts and returns failure; otherwise, TA further encrypts $C_{f_o}$: $C_t = Enc_{k_t}(C_{f_o})$, where $k_t = H_1(sk_t \| id \| tag_f)$. Then TA computes a hash value $h_t = H_1(C_t \| id \| tag_f)$. Finally, TA sends the final ciphertext $C_{f_t}$ to S, where $C_{f_t} = (C_t, tag_f, h_t)$.

• StoreCheck. On receipt of $C_{f_t}$, the cloud server S maintains the data and returns a storage proof. For simplicity, assume that $m$ files are stored in the MHT.

– Upon receipt of $C_{f_t}$, S stores the data in a leaf node of the MHT. Here we take $m = 8$ as an example, and $C_{f_t}$ is stored in leaf node 6, as illustrated in Figure 4. Then S computes a signature on the hash value of the root node: $sig_r = Sign_{sk_s}(h_{0,1})$, where $Sign$ is an ECDSA signature generation algorithm. Finally, S returns the storage proof $\lambda = (sig_r, \Phi)$ to TA, where $\Phi$ is the verification object $(h_{1,1}, h_{2,4}, h_{3,5})$.
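As an illustration of the storage-proof step with $m = 8$, the following Python sketch builds the tree, extracts the verification object for leaf 6, and recomputes the root from that leaf and its siblings. It is a sketch under stated assumptions rather than the authors' implementation: SHA-256 stands in for $H_2$, the leaf contents are placeholder byte strings, the number of leaves is assumed to be a power of two, the function names are hypothetical, and the ECDSA signature on the root is left out.

```python
import hashlib

def H2(data: bytes) -> bytes:
    """Stand-in for H2 (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return every tree level, from the leaf hashes up to the root."""
    level = [H2(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H2(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def auth_path(levels, index):
    """Sibling hashes of the leaf at 0-based `index`, leaf level first."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])   # sibling differs in the lowest bit
        index //= 2
    return path

def recompute_root(leaf, index, path):
    """Recompute the root from one leaf and its sibling path."""
    node = H2(leaf)
    for sibling in path:
        node = H2(node + sibling) if index % 2 == 0 else H2(sibling + node)
        index //= 2
    return node

# Placeholder ciphertexts for m = 8 stored files; leaf 6 is index 5.
files = [f"ciphertext-{i}".encode() for i in range(1, 9)]
levels = build_levels(files)
root = levels[-1][0]
phi = auth_path(levels, 5)                    # (h35, h24, h11), leaf level first
hash_count = sum(len(level) for level in levels)   # 2m - 1 hashes in total
print(recompute_root(files[5], 5, phi) == root, hash_count)
```

Note that `auth_path` returns the siblings from the leaf level upwards, which is the reverse of the order $(h_{1,1}, h_{2,4}, h_{3,5})$ used in the text.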

Figure 4: MHT for storage proof

– On receiving $\lambda$, TA checks whether S stores the file and generates the storage proof honestly. To be specific, TA computes the following equations:

$$h'_{3,6} = H_2(C_{f_t});$$
$$h'_{2,3} = H_2(h_{3,5} \| h'_{3,6});$$
$$h'_{1,2} = H_2(h'_{2,3} \| h_{2,4});$$
$$h'_{0,1} = H_2(h_{1,1} \| h'_{1,2}).$$

Then TA checks whether the equation $h'_{0,1} = h_{0,1}$ holds. If $h'_{0,1} \neq h_{0,1}$, TA quits and outputs failure; otherwise, TA verifies whether $sig_r$ is a valid signature on $h_{0,1}$. If $sig_r$ is not valid, TA quits and outputs failure; otherwise, TA trusts that S stores the data honestly and sends $\lambda$ to O, and then O deletes the local backup.

• Decrypt. When the data owner O needs the file $F$, he should download the ciphertext from the cloud server S and decrypt it to obtain the plaintext.

– In order to download the ciphertext, O needs to generate a download request $R_d$. First of all, O computes a signature $sig_d = Sign_{sk_o}(download \| tag_f \| T_d)$, where $T_d$ is a timestamp. Then O sends the download request $R_d$ to S, where $R_d = (download, tag_f, T_d, sig_d)$. Upon receiving $R_d$, S checks the validity of $R_d$ through signature verification. If $R_d$ is not valid, S quits and outputs failure; otherwise, S sends $C_{f_t} = (C_t, tag_f, h_t)$ and $R_d$ to TA.

– Upon receipt of $C_{f_t}$ and $R_d$, TA firstly verifies the validity of $R_d$. If $R_d$ is not valid, TA quits and outputs failure; otherwise, TA checks whether $h_t = H_1(C_t \| id \| tag_f)$ holds. If $h_t \neq H_1(C_t \| id \| tag_f)$, TA quits and outputs failure; otherwise, TA decrypts $C_t$ to obtain the ephemeral ciphertext $C_{f_o} = Dec_{k_t}(C_t)$, where $Dec$ represents a traditional symmetric decryption algorithm, and $k_t = H_1(sk_t \| id \| tag_f)$. Finally, TA sends $C_{f_o} = (C_o, tag_f, h_o)$ to O.

– Upon receiving $C_{f_o} = (C_o, tag_f, h_o)$, O checks whether the equation $h_o = H_1(C_o \| id \| tag_f)$ holds. If $h_o \neq H_1(C_o \| id \| tag_f)$, O quits and outputs failure; otherwise, O executes the decryption operation to obtain the corresponding plaintext $F = Dec_{k_o}(C_o)$, where $k_o = H_1(sk_o \| id \| n_f)$.

• Delete. When the data owner O no longer needs the file $F$, he wants to permanently delete the file from the cloud server S.

– To delete the outsourced data, O needs to generate a deletion request $R_e$. Firstly, O computes a signature $sig_e = Sign_{sk_o}(delete \| tag_f \| T_e)$, where $T_e$ is a timestamp. Then O generates a deletion request $R_e = (delete, tag_f, T_e, sig_e)$, and sends $R_e$ to S.

– Upon receipt of $R_e$, S firstly checks the validity of $R_e$. If $R_e$ is not valid, S aborts and returns failure; otherwise, S deletes the data and computes a signature $sig_s = Sign_{sk_s}(delete \| tag_f \| T_e \| R_e)$. Then S utilizes $sig_s$ to replace $C_{f_t}$ and re-constructs the MHT, as illustrated in Figure 5. Finally, S computes a new signature on the hash value of the new root node, $sig^*_r = Sign_{sk_s}(h^*_{0,1})$, and returns a deletion proof $\tau$ to O, where $\tau = (R_e, sig_s, sig^*_r, \Phi)$ and $\Phi = (h_{1,1}, h_{2,4}, h_{3,5})$.

Figure 5: The MHT for deletion proofs

• DelCheck. After receiving $\tau$, the data owner O can check the deletion result by verifying $\tau$.

– Firstly, O checks whether the signature $sig_s$ is valid. If $sig_s$ is not valid, O quits and

outputs failure; otherwise, O computes the following equations:

$$h^{*\prime}_{3,6} = H_2(sig_s);$$
$$h^{*\prime}_{2,3} = H_2(h_{3,5} \| h^{*\prime}_{3,6});$$
$$h^{*\prime}_{1,2} = H_2(h^{*\prime}_{2,3} \| h_{2,4});$$
$$h^{*\prime}_{0,1} = H_2(h_{1,1} \| h^{*\prime}_{1,2}).$$

– Then O checks whether the equation $h^{*\prime}_{0,1} = h^*_{0,1}$ holds. If $h^{*\prime}_{0,1} \neq h^*_{0,1}$, O quits and outputs failure; otherwise, O verifies whether the signature $sig^*_r$ is a valid signature on $h^*_{0,1}$. If $sig^*_r$ cannot pass the verification, O quits and outputs failure; otherwise, O can trust that the deletion proof $\tau$ is valid. If the deleted data is discovered again, O should be entitled to compensation based on the evidence.

5 Scheme Analysis and Implementation

In the following section, we give a brief analysis of the proposed scheme. Firstly, we analyze the security properties of the proposed scheme in detail. Secondly, we present a theoretical comparison between our scheme and some previous schemes. Finally, we evaluate the performance through simulation experiments.

5.1 Security Analysis

In this part, we analyze the security of our proposed scheme, including data confidentiality, data integrity, public verifiability, traceability and non-repudiation.

5.1.1 Data Confidentiality

To protect the sensitive information, the data owner uses the IND-CPA secure AES algorithm to encrypt the file before uploading. Additionally, the data owner keeps the encryption key and decryption key secret. That is, no attacker can acquire the encryption key and decryption key maliciously. In other words, a malicious attacker cannot obtain any plaintext information from the ciphertext. Hence, our novel scheme can reach data confidentiality.

5.1.2 Data Integrity

Our proposed scheme is able to ensure the integrity of the outsourced data. In the decryption process, the cloud server S firstly sends the final ciphertext $C_{f_t}$ and the download request $R_d$ to the trusted agency TA, where $C_{f_t} = (C_t, tag_f, h_t)$ and $R_d = (download, tag_f, T_d, sig_d)$. On receipt of $C_{f_t}$ and $R_d$, TA checks $R_d$ and $C_{f_t}$ before decrypting. If $R_d$ is not valid, it means that O did not request to download the file, and TA aborts; otherwise, TA checks $C_{f_t}$. To be specific, TA checks whether the equation $h_t \stackrel{?}{=} H_1(C_t \| id \| tag_f)$ holds. The $id$ is kept secret by O and TA. Therefore, S cannot forge a new $C'_t$ to make the equation $h_t = H_1(C'_t \| id \| tag_f)$ hold. That is, the verifications pass if and only if $C_t$ is intact, and then TA decrypts $C_t$ to obtain $C_{f_o}$. Therefore, TA can always detect the malicious operation if S falsifies $C_t$.

Besides, upon receiving $C_{f_o} = (C_o, tag_f, h_o)$ from TA, O checks it before executing the decryption operation. To be specific, O verifies whether the equation $h_o \stackrel{?}{=} H_1(C_o \| id \| tag_f)$ holds. The attacker cannot falsify a new $C'_o$ to make the equation $h_o = H_1(C'_o \| id \| tag_f)$ hold because the $id$ is kept secret by O and TA. That is, the attacker cannot forge $C'_o$ to cheat O successfully. Therefore, the verification passes if and only if $C_{f_o}$ is intact.

From the analysis above, the proposed scheme can achieve data integrity.

5.1.3 Public Verifiability

Our new scheme can reach publicly verifiable data deletion in cloud storage. After executing the data deletion operation, the cloud server S generates a deletion proof $\tau$ to prove that he has performed data deletion honestly. Note that $\tau = (R_e, sig_s, sig^*_r, \Phi)$, where $\Phi = (h_{1,1}, h_{2,4}, h_{3,5})$. Then anyone who is given $\tau$ (called the verifier) can check the data deletion result by verifying the evidence $\tau$. Firstly, the verifier checks the validity of the deletion request $R_e$. If $R_e$ is not valid, the verifier aborts and returns failure; otherwise, the verifier checks the validity of the signature $sig_s$. If $sig_s$ is not valid, the verifier aborts and returns failure; otherwise, the verifier utilizes $H_2(sig_s)$ and $\Phi$ to re-compute $h^*_{0,1}$. Finally, the verifier checks whether the signature $sig^*_r$ is a valid signature on $h^*_{0,1}$. The verifier trusts that the deletion proof $\tau$ is valid if and only if all verifications pass. Note that the verification phases do not involve any private information, so any verifier can check the deletion outcome. Therefore, we think that our scheme is able to realize the property of public verifiability.

5.1.4 Traceability

The proposed scheme can trace the data leakage source precisely when the data is leaked. In our scheme, the data owner O owns the plaintext $F$ and the ephemeral ciphertext $C_{f_o}$. The trusted agency TA further encrypts the ephemeral ciphertext $C_{f_o}$ to obtain the final ciphertext $C_{f_t}$. Then the cloud server S maintains the final ciphertext $C_{f_t}$. Besides, O cannot access the final ciphertext $C_{f_t}$, and S cannot access the plaintext $F$ or the ephemeral ciphertext $C_{f_o}$. That is, only TA and O can obtain the ephemeral ciphertext $C_{f_o}$, and only TA and S can obtain the final ciphertext $C_{f_t}$. Note that TA is absolutely impartial, and it will never collude with O (or S) to cheat S (or O). Hence, on the one hand, the data leakage source must be O if $C_{f_o}$ is exposed. That is,

Table 1: Functionality comparison among three schemes

| Scheme | Scheme [5] | Scheme [22] | Our Scheme |
|---|---|---|---|
| Trusted Third Party | Yes | Yes | Yes |
| Public Verifiability | Yes | Yes | Yes |
| Data Confidentiality | Yes | Yes | Yes |
| Data Integrity | No | Yes | Yes |
| Non-repudiation | No | Yes | Yes |
| Traceability | No | No | Yes |

Table 2: Computational complexity comparison

| Scheme | Scheme [5] | Scheme [22] | Our Scheme |
|---|---|---|---|
| (Encrypt) | $2E + 4H$ | $1E + mH$ | $2E + 6H$ |
| (Decrypt) | $1E + 1D + 3H$ | - | $1S + 1V + 2D + 4H$ |
| (Store) | - | $1S + 1V + 46mH$ | $1S + 1V + (2^{n+1} + n)H$ |
| (Delete) | $1S$ | $2S + 1V + 23H$ | $3S + 1V + (n + 1)H$ |
| (DelCheck) | $1V$ | $1V + 20H$ | $2V + (n + 1)H$ |
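To make the hash terms in Table 2 concrete, the short Python sketch below evaluates them for a few tree heights $n$, using the paper's convention that the MHT has $m = 2^n$ leaves and that scheme [22] uses $m$ data blocks. The function name and dictionary keys are hypothetical, and only hash counts are covered (encryption, signature and verification terms are ignored).

```python
def hash_counts(n: int) -> dict:
    """Hash-operation counts from Table 2 for an MHT with m = 2**n leaves."""
    m = 2 ** n
    return {
        # Store: (2m - 1) hashes to build the tree plus (n + 1) to verify it.
        "store_ours": (2 * m - 1) + (n + 1),
        "store_scheme22": 46 * m,
        "delete_ours": n + 1,
        "delete_scheme22": 23,
        "delcheck_ours": n + 1,
        "delcheck_scheme22": 20,
    }

for n in (1, 4, 8):
    counts = hash_counts(n)
    # The Store term equals the closed form 2^(n+1) + n given in the table.
    assert counts["store_ours"] == 2 ** (n + 1) + n
    print(n, counts)
```

For $n = 8$ this gives 520 hashes for the Store phase of the proposed scheme against $46 \cdot 256 = 11776$ for scheme [22], while the Delete and DelCheck terms grow only linearly in $n$.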

the dishonest O cannot reserve $C_{f_o}$ and then expose it to successfully slander that S did not delete the data honestly. On the other hand, the data leakage source must be S if $C_{f_t}$ is leaked. Therefore, if S reserves $C_{f_t}$ maliciously and this results in data leakage, S cannot deny his dishonest data reservation. That is, the proposed scheme can reach data leakage source traceability.

5.1.5 Non-repudiation

In our scheme, we assume that both the data owner O and the cloud server S may deny their behaviors and thus slander the other. Without loss of generality, we analyze the non-repudiation when S is malicious and when O is dishonest, respectively.

Case 1: Malicious cloud server S. The malicious cloud server S may slander the data owner O. First of all, the malicious S deletes the outsourced data arbitrarily to save storage overhead, and then slanders that he performed the data deletion operation at O's command. For this scenario, O can require S to present the data deletion request $R_e = (delete, tag_f, T_e, sig_e)$, where $sig_e$ is a signature generated by O with private key $sk_o$. On the one hand, O had never generated and sent $R_e$ to S to delete the data. On the other hand, S cannot forge a valid $R_e$ since S does not have the private key $sk_o$. Therefore, S cannot present $R_e$ to prove that he deleted the outsourced data at O's command. Secondly, S reserves the data dishonestly and slanders that O had not required him to remove the outsourced data. Here, O can demonstrate the data deletion proof $\tau = (R_e, sig_s, sig^*_r, \Phi)$, where $sig_s$ is a signature generated by S with private key $sk_s$. The signature $sig_s$ can be seen as a proof that O has required S to delete the data, and that S has responded to this request. That is, the dishonest S cannot successfully slander O.

Case 2: Dishonest data owner O. The dishonest data owner O denies his behavior and slanders the cloud server S maliciously. Firstly, O had asked S to delete the data, and S had done it honestly. However, O declares that he had never asked S to delete the data, and slanders that S deleted the data arbitrarily. Here, S can show the deletion request $R_e = (delete, tag_f, T_e, sig_e)$, which contains a signature $sig_e$ generated by O. No one else can forge a valid signature $sig_e$ to further forge a valid deletion request $R_e$. Therefore, $R_e$ can be seen as an evidence which proves that O had required S to delete the data. Secondly, O had never required S to delete the data. Nevertheless, O declares that he had required S to delete the file and that S did not do it sincerely. In this case, S can require O to show the deletion proof $\tau = (R_e, sig_s, sig^*_r, \Phi)$, which is generated by S. However, S did not generate $\tau$ at all. In addition, O cannot forge a valid $\tau$. Therefore, O cannot slander S successfully.

5.2 Comparison

In the following, we compare the functionality and computational complexity of our new scheme and two previous schemes [5, 22] in theory; Table 1 and Table 2 present the comparison results, respectively.

From Table 1 we have the following findings. Firstly, all three schemes need to introduce a trusted third party to achieve publicly verifiable data deletion. Secondly, all of them can guarantee data confidentiality, which protects the sensitive information contained in the outsourced file. Thirdly, Hao et al. scheme [5] cannot satisfy data integrity and non-repudiation, which is different from our scheme and Yang et al. scheme [22]. Last but not least, only our new scheme is able to achieve traceability, which can prevent the data owner and the cloud server from slandering each other. Overall, our new scheme is more attractive than the other two schemes.

Then we compare the performance of the three schemes in theory, and the results are listed as time complexities in Table 2. For simplicity, we use symbols $E$ and $D$ to represent a symmetric encryption and decryption, respectively. Moreover, we denote by $H$ a hash computation, $S$ a signature generation operation, and $V$ a signature verification operation. Meanwhile, we assume that the MHT has $m = 2^n$ leaf nodes and the number of data blocks in scheme [22] is $m$. Finally, we ignore the other overhead, such as multiplication and communication overhead.

5.3 Performance Evaluation

In this part, we simulate our proposed scheme and two previous schemes [5, 22], then provide the performance evaluation. The related algorithms are implemented with the PBC library and the OpenSSL library on a Unix machine, which is equipped with Intel(R) Core(TM) i5-6200U processors running at 2.4 GHz and 8 GB main memory. For simplicity, we simulate all the entities on this Linux machine and ignore the communication overhead.

The outsourced file always contains some sensitive information, which should be kept secret. Therefore, the data owner needs to use a secure encryption algorithm to encrypt the file before uploading. We increase the size of the file from 0.125 MB to 1 MB with a step of 0.125 MB, and the number of data blocks in Yang et al. scheme [22] is fixed at 1024. The approximate time cost is shown in Figure 6. We can find that although the time overhead increases with the size of the file, the encryption operation is one-time. Moreover, our proposed scheme and Hao et al. scheme [5] cost almost the same time to encrypt a file of the same size. Meanwhile, Yang et al. scheme [22] needs more time than the other two schemes. Hence, we think our proposed scheme is efficient in encrypting the file.

Figure 6: Time cost of encryption (x-axis: the size of encrypted file (MB); y-axis: time cost (ms); series: Hao et al. scheme, our scheme, Yang et al. scheme)

After uploading the file to the cloud server, the data owner wants to check whether the cloud server maintains the data honestly. The main computation comes from storage proof generation and storage result verification. In our scheme, the cloud server needs to compute $(2m - 1)$ hash values and an ECDSA signature to generate the storage proof, where $m = 2^n$. Then the data owner needs to execute $(n + 1)$ hash calculations and a signature verification operation to verify the storage result. In Yang et al. scheme [22], the computation consists of a signature generation, a signature verification, and $46m$ hash computations. We increase the number $n$ from 1 to 8 with a step of 1, and Figure 7 presents the efficiency comparison. From Figure 7 we can see that although the computational overhead increases with $n$, the growth rate of our scheme is relatively lower than that of Yang et al. scheme [22]. Meanwhile, our proposed scheme costs less time. Therefore, our proposed scheme is more efficient than Yang et al. scheme [22].

Figure 7: Time cost of storage (x-axis: the number of n; y-axis: time cost (ms); series: our scheme, Yang et al. scheme)

To save storage overhead, the data owner does not maintain any local data backup after outsourcing the file to the remote cloud server. Therefore, when the data owner needs the file, he needs to download the corresponding ciphertext from the cloud server, and then decrypt it to obtain the plaintext. We increase the ciphertext size from 0.125 MB to 1 MB with a step of 0.125 MB, and test the approximate time overhead. The efficiency comparison of the decryption process between the two schemes is shown in Figure 8. From Figure 8 we can see that the time cost increases with the size of the decrypted ciphertext, and the growth rate of our scheme is lower than that of Hao et al. scheme [5]. Moreover, although our scheme costs a little more time when the ciphertext is smaller than 0.75 MB, the extra overhead is small and acceptable. Further, when the ciphertext is larger than 0.75 MB, the time cost of our scheme is less than that of Hao et al. scheme [5]. In real applications, the ciphertext is often larger than 0.75 MB. Therefore, we think that our scheme is more efficient than Hao et al. scheme [5] in decrypting a ciphertext of the same size.

Figure 8: Time cost of decryption (x-axis: the size of decrypted file (MB); y-axis: time cost (ms); series: our scheme, Hao et al. scheme)

When the data owner no longer needs the outsourced file, he wants to permanently delete the data from the cloud server. To delete the outsourced file, our scheme needs to execute three signature generation operations and a signature verification computation. Moreover, our scheme also needs to perform $(n + 1)$ hash calculations to update the MHT. However, Hao et al. scheme [5] merely needs to generate a signature. Meanwhile, Yang et al. scheme [22] needs to compute 23 hash values, generate two signatures and perform a signature verification operation. The efficiency comparison among the three schemes is shown in Figure 9. We can easily find that the time overhead of our scheme increases with the number $n$, but the growth rate is very low. However, the time overhead of the other two schemes is almost constant. Moreover, although our scheme needs a little more time to delete a file, the time cost is very small. For example, when $n = 40$, the time cost is about 1.5 microseconds. Meanwhile, note that the deletion operation is one-time. Therefore, our scheme is still very efficient.

Figure 9: Time cost of deletion (x-axis: the number of n; y-axis: time cost (us); series: Hao et al. scheme, Yang et al. scheme, our scheme)

After deletion, the data owner is able to check the data deletion result through verifying the returned deletion evidence. Our scheme needs to execute two signature verification operations and $(n + 1)$ hash computations to verify the deletion result. However, Hao et al. scheme [5] only needs to verify the validity of a signature. Meanwhile, Yang et al. scheme [22] needs to verify a signature and compute 20 hash values. The time cost comparison among the three schemes is demonstrated in Figure 10. We can easily find that the time cost of our scheme increases with $n$, while the time cost of the other two schemes is almost constant. However, the growth rate of our scheme is very low. Additionally, although our scheme costs a little more time than the other two schemes, the time cost is very small. Meanwhile, note that the verification operation can be finished off-line. Therefore, it will not affect the overall efficiency.

Figure 10: Time cost of deletion verification (x-axis: the number of n; y-axis: time cost (us); series: Hao et al. scheme, Yang et al. scheme, our scheme)

6 Conclusions

In this paper, we put forward a novel MHT-based publicly verifiable outsourced data deletion scheme, which also supports efficient data leakage source tracking. In cloud storage, both the data owner and the cloud server might think the other is dishonest. In our new scheme, we use the cryptographic primitive of MHT to deal with this trust problem. To be more specific, the cloud server should use MHT to compute an evidence after deletion. If the cloud server reserves the data maliciously, the data owner is able to easily detect the dishonest data reservation by verifying the deletion proof. In addition, our novel scheme can satisfy the property of data leakage source traceability, which can prevent the data owner and the cloud server from exposing the data to slander the other. In the future, we will study how to reach data deletion and leakage source traceability without requiring any TTP.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61962015), the Science and Technology Program of Guangxi (No.AB17195045), and the Natural Science Foundation of Guangxi (No.2016GXNSFAA380098). Moreover, the authors gratefully acknowledge the anonymous reviewers for their valuable comments.

References

[1] G. S. Aujla, R. Chaudhary, N. Kumar, A. K. Das, and J. J. Rodrigues, “Secsva: Secure storage, verification, and auditing of big data in the cloud environment,” IEEE Communications Magazine, vol. 56, no. 1, pp. 78–85, 2018.
[2] D. Boneh and R. Lipton, “A revocable backup system,” in Proceedings of the 6th Conference on USENIX Security Symposium, pp. 91–96, pp. 22-25, 1996.
[3] K. Brindha and N. Jeyanthi, “Securing portable document format file using extended visual cryptography to protect cloud data storage,” International Journal of Network Security, vol. 19, no. 5, pp. 684–693, 2017.
[4] N. Garg and S. Bawa, “Rits-mht: Relative indexed and time stamped merkle hash tree based data auditing protocol for cloud computing,” Journal of Network and Computer Applications, vol. 84, pp. 1–13, 2017.
[5] F. Hao, D. Clarke, and A. F. Zorzo, “Deleting secret data with public verifiability,” IEEE Transactions on Dependable and Secure Computing, vol. 13, no. 6, pp. 617–629, 2016.
[6] W. F. Hsien, C. C. Yang, and M. S. Hwang, “A survey of public auditing for secure data storage in cloud computing,” International Journal of Network Security, vol. 18, no. 1, pp. 133–142, 2016.
[7] G. Lou and Z. Cai, “A cloud computing oriented neural network for resource demands and management scheduling,” International Journal of Network Security, vol. 21, no. 3, pp. 477–482, 2019.
[8] Y. Luo, M. Xu, S. Fu, and D. Wang, “Enabling assured deletion in the cloud storage by overwriting,” in Proceedings of the 4th ACM International Workshop on Security in Cloud Computing, pp. 17–23, May 2016.
[9] H. Ma, X. Han, T. Peng, and L. Zhang, “Secure and efficient cloud data deduplication supporting dynamic data public auditing,” International Journal of Network Security, vol. 20, no. 6, pp. 1074–1084, 2018.
[10] Z. Mo, Y. Qiao, and S. Chen, “Two-party fine-grained assured deletion of outsourced data in cloud systems,” in Proceedings of the IEEE 34th International Conference on Distributed Computing Systems (ICDCS’14), pp. 308–317, July 2014.
[11] J. Ni, X. Lin, K. Zhang, Y. Yu, and X. X. Shen, “Secure outsourced data transfer with integrity verification in cloud storage,” in Proceedings of the IEEE/CIC International Conference on Communications in China (ICCC’16), pp. 1–6, July 2016.
[12] R. Perlman, “File system design with assured delete,” in Proceedings of the Third IEEE International Security in Storage Workshop (SISW’05), pp. 83–88, Dec. 2005.
[13] J. Reardon, D. Basin, and S. Capkun, “Sok: Secure data deletion,” in Proceedings of the IEEE Symposium on Security and Privacy (SP’13), pp. 301–315, May 2013.
[14] J. Reardon, S. Capkun, and D. A. Basin, “Data node encrypted file system: Efficient secure deletion for flash memory,” in Proceedings of the 21st USENIX Conference on Security Symposium, pp. 333–348, Aug. 2012.
[15] Y. Tang, P. P. Lee, J. C. Lui, and R. Perlman, “Fade: Secure overlay cloud storage with file assured deletion,” in Proceedings of the 6th International ICST Conference on Security and Privacy in Communication Systems, pp. 380–397, Sep. 2010.
[16] W. L. Wang, Y. H. Chang, P. C. Huang, C. H. Tu, H. W. Wei, and W. K. Shih, “Relay-based key management to support secure deletion for resource-constrained flash-memory storage devices,” in Proceedings of the 21st Asia and South Pacific Design Automation Conference (ASP-DAC’16), pp. 444–449, Jan. 2016.
[17] Y. Wang, X. Tao, J. Ni, and Y. Yu, “Data integrity checking with reliable data transfer for secure cloud storage,” International Journal of Web and Grid Services, vol. 14, no. 1, pp. 106–121, 2018.
[18] M. Y. C. Wei, L. M. Grupp, F. E. Spada, and S. Swanson, “Reliably erasing data from flash-based solid state drives,” in Proceedings of the 9th USENIX Conference on File and Storage Technologies (FAST’11), pp. 105–117, Feb. 2011.
[19] L. Xue, J. Ni, Y. Li, and J. Shen, “Provable data transfer from provable data possession and deletion in cloud storage,” Computer Standards & Interfaces, vol. 54, pp. 46–54, 2017.
[20] L. Xue, Y. Yu, Y. Li, M. H. Au, X. Du, and B. Yang, “Efficient attribute-based encryption with attribute revocation for assured data deletion,” Information Sciences, vol. 479, pp. 640–650, 2019.
[21] C. Yang, X. Chen, and Y. Xiang, “Blockchain-based publicly verifiable data deletion scheme for cloud storage,” Journal of Network and Computer Applications, vol. 103, pp. 185–193, 2018.
[22] C. Yang, X. Tao, F. Zhao, and Y. Wang, “A new outsourced data deletion scheme with public verifiability,” in Proceedings of the 14th International Conference on Wireless Algorithms, Systems, and Applications (WASA’19), pp. 631–638, June 2019.
[23] C. Yang, J. Wang, X. Tao, and C. Chen, “Publicly verifiable data transfer and deletion scheme for cloud storage,” in Proceedings of the 20th International Conference on Information and Communications Security (ICICS’18), pp. 445–458, Oct. 2018.
[24] C. Yang and X. Tao, “New publicly verifiable cloud data deletion scheme with efficient tracking,” in Pro-

ceedings of the 2th International Conference on Se- in 2014. And now he is currently undergoing his Ph.D curity with Intelligent Computing and Big-data Ser- degree at the School of Cyber Engineering at Xidian vices (SICBS’18), pp. 1–14, Dec.2018. university. His research interests are network security, [25] Y. Yu, J. Ni, W. Wu, and Y. Wang, “Prov- cloud computing, verifiable data deletion and blockchian. able data possession supporting secure data trans- Xiaoling Tao biography. Xiaoling Tao received the fer for cloud storage,” in Proceedings of the 10th M.S. degree in computer application technology from International Conference on Broadband and Wire- Guilin University of Electronic Technology, Guilin, less Computing, Communication and Applications China in 2008 and became the faculty member of this (BWCCA’15), pp. 38–42, Nov. 2015. university in 2000. She is currently a professor at the [26] Y. Yu, L. Xue, Y. Li, X. Du, M. Guizani, school of computer science and information security, and B. Yang, “Assured data deletion with fine- Guilin University of Electronic Technology. And now grained access control for fog-based industrial appli- she is studing for her Ph.D degree in information and cations,” IEEE Transactions on Industrial Informat- communication engineering at Guilin University of Elec- ics, vol. 14, no. 10, pp. 4538–4547, 2018. tronic Technology. Her current research interests include network security, cloud computing and computational Biography intelligence. Qiyu Chen biography. Qiyu Chen is a research scholar Changsong Yang biography. Changsong Yang received in the College of Information and Computer Engineering, the B.S. degree in school of telecommunications engi- Northeast Forestry University. neering from Xidian university, Xi’an, China in 2014 and became a master degree candidate of this university Guide for Authors International Journal of Network Security

IJNS is committed to the timely publication of very high-quality, peer-reviewed, original papers that advance the state of the art and applications of network security. Topics include, but are not limited to, the following: Biometric Security, Communications and Networks Security, Cryptography, Database Security, Electronic Commerce Security, Multimedia Security, System Security, etc.

1. Submission Procedure Authors are strongly encouraged to submit their papers electronically by using online manuscript submission at http://ijns.jalaxy.com.tw/.

2. General Articles must be written in good English. Submission of an article implies that the work described has not been published previously and that it is not under consideration for publication elsewhere. It will not be published elsewhere in the same form, in English or in any other language, without the written consent of the Publisher.

2.1 Length Limitation: All papers should be concisely written and be no longer than 30 double-spaced pages (12-point font, approximately 26 lines/page) including figures.

2.2 Title page The title page should contain the article title, author(s) names and affiliations, address, an abstract not exceeding 100 words, and a list of three to five keywords.

2.3 Corresponding author Clearly indicate who is willing to handle correspondence at all stages of refereeing and publication. Ensure that telephone and fax numbers (with country and area code) are provided in addition to the e-mail address and the complete postal address.

2.4 References References should be listed alphabetically, in the same way as follows:
For a paper in a journal: M. S. Hwang, C. C. Chang, and K. F. Hwang, “An ElGamal-like cryptosystem for enciphering large messages,” IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 2, pp. 445–446, 2002.
For a book: Dorothy E. R. Denning, Cryptography and Data Security. Massachusetts: Addison-Wesley, 1982.
For a paper in a proceeding: M. S. Hwang, C. C. Lee, and Y. L. Tang, “Two simple batch verifying multiple digital signatures,” in The Third International Conference on Information and Communication Security (ICICS2001), pp. 13–16, Xian, China, 2001.
In text, references should be indicated by [number].

Subscription Information Individual subscriptions to IJNS are available at the annual rate of US$200.00 or NT 7,000 (Taiwan). The rate is US$1000.00 or NT 30,000 (Taiwan) for institutional subscriptions. The price includes surface postage, packing, and handling charges worldwide. Please make your payment payable to "Jalaxy Technique Co., LTD." For detailed information, please refer to http://ijns.jalaxy.com.tw or Email to [email protected].