ISSN 1816-353X (Print) Vol. 22, No. 5 (September 2020) ISSN 1816-3548 (Online)
INTERNATIONAL JOURNAL OF NETWORK SECURITY
Editor-in-Chief
Prof. Min-Shiang Hwang, Department of Computer Science & Information Engineering, Asia University, Taiwan

Co-Editor-in-Chief:
Prof. Chin-Chen Chang (IEEE Fellow), Department of Information Engineering and Computer Science, Feng Chia University, Taiwan

Publishing Editors
Shu-Fen Chiou, Chia-Chun Wu, Cheng-Yi Yang

Board of Editors
Ajith Abraham, School of Computer Science and Engineering, Chung-Ang University (Korea)
Wael Adi, Institute for Computer and Communication Network Engineering, Technical University of Braunschweig (Germany)
Sheikh Iqbal Ahamed, Department of Math., Stat. and Computer Sc. Marquette University, Milwaukee (USA)
Vijay Atluri, MSIS Department Research Director, CIMIC Rutgers University (USA)
Mauro Barni, Dipartimento di Ingegneria dell’Informazione, Università di Siena (Italy)
Andrew Blyth, Information Security Research Group, School of Computing, University of Glamorgan (UK)
Chi-Shiang Chan, Department of Applied Informatics & Multimedia, Asia University (Taiwan)
Chen-Yang Cheng, National Taipei University of Technology (Taiwan)
Soon Ae Chun, College of Staten Island, City University of New York, Staten Island, NY (USA)
Stefanos Gritzalis, University of the Aegean (Greece)
Lakhmi Jain, School of Electrical and Information Engineering, University of South Australia (Australia)
Chin-Tser Huang, Dept. of Computer Science & Engr, Univ of South Carolina (USA)
James B D Joshi, Dept. of Information Science and Telecommunications, University of Pittsburgh (USA)
Çetin Kaya Koç, School of EECS, Oregon State University (USA)
Shahram Latifi, Department of Electrical and Computer Engineering, University of Nevada, Las Vegas (USA)
Cheng-Chi Lee, Department of Library and Information Science, Fu Jen Catholic University (Taiwan)
Chun-Ta Li, Department of Information Management, Tainan University of Technology (Taiwan)
Iuon-Chang Lin, Department of Management of Information Systems, National Chung Hsing University (Taiwan)
John C.S. Lui, Department of Computer Science & Engineering, Chinese University of Hong Kong (Hong Kong)
Kia Makki, Telecommunications and Information Technology Institute, College of Engineering, Florida International University (USA)
Gregorio Martinez, University of Murcia (UMU) (Spain)
Sabah M.A. Mohammed, Department of Computer Science, Lakehead University (Canada)
Lakshmi Narasimhan, School of Electrical Engineering and Computer Science, University of Newcastle (Australia)
Khaled E. A. Negm, Etisalat University College (United Arab Emirates)
Joon S. Park, School of Information Studies, Syracuse University (USA)
Antonio Pescapè, University of Napoli "Federico II" (Italy)
Chuan Qin, University of Shanghai for Science and Technology (China)
Yanli Ren, School of Commun. & Infor. Engineering, Shanghai University (China)
Mukesh Singhal, Department of Computer Science, University of Kentucky (USA)
Tony Thomas, School of Computer Engineering, Nanyang Technological University (Singapore)
Mohsen Toorani, Department of Informatics, University of Bergen (Norway)
Sherali Zeadally, Department of Computer Science and Information Technology, University of the District of Columbia, USA
Jianping Zeng, School of Computer Science, Fudan University (China)
Justin Zhan, School of Information Technology & Engineering, University of Ottawa (Canada)
Ming Zhao, School of Computer Science, Yangtze University (China)
Mingwu Zhang, College of Information, South China Agric University (China)
Yan Zhang, Wireless Communications Laboratory, NICT (Singapore)

PUBLISHING OFFICE
Min-Shiang Hwang, Department of Computer Science & Information Engineering, Asia University, Taichung 41354, Taiwan, R.O.C.
Email: [email protected]

International Journal of Network Security is published both in traditional paper form (ISSN 1816-353X) and in Internet (ISSN 1816-3548) at http://ijns.jalaxy.com.tw

PUBLISHER: Candy C. H. Lin
© Jalaxy Technology Co., Ltd., Taiwan 2005
23-75, P.O. Box, Taichung, Taiwan 40199, R.O.C.

Volume: 22, No: 5 (September 1, 2020)
International Journal of Network Security
1. Research on Malware Detection and Classification Based on Artificial Intelligence Li-Chin Huang, Chun-Hsien Chang, and Min-Shiang Hwang, pp. 717-727
2. A BLS Signature Scheme from Multilinear Maps Fei Tang and Dong Huang, pp. 728-735
3. Eighth Power Residue Double Circulant Self-Dual Codes Changsong Jiang, Yuhua Sun, and Xueting Liang, pp. 736-742
4. Identity-based Public Key Cryptographic Primitive with Delegated Equality Test Against Insider Attack in Cloud Computing Seth Alornyo, Acheampong Edward Mensah, and Abraham Opanfo Abbam, pp. 743-751
5. One-Code-Pass User Authentication Based on QR Code and Secret Sharing Yanjun Liu, Chin-Chen Chang, and Peng-Cheng Huang, pp. 752-762
6. Efficient Anonymous Ciphertext-Policy Attribute-based Encryption for General Structures Supporting Leakage-Resilience Xiaoxu Gao and Leyou Zhang, pp. 763-774
7. A Distributed Density-based Outlier Detection Algorithm on Big Data Lin Mei and Fengli Zhang, pp. 775-781
8. Binary Executable Files Homology Detection with Genetic Algorithm Jinyue Bian and Quan Qian, pp. 782-792
9. A New Diffusion and Substitution-based Cryptosystem for Securing Medical Image Applications L. Mancy and S. Maria Celestin Vigila, pp. 793-800
10. A LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation Shanshan Zhang, pp. 801-808
11. Verifiable Secret Sharing Based On Micali-Rabin's Random Vector Representations Technique Haiou Yang and Youliang Tian, pp. 809-814
12. A Privacy-Preserving Data Sharing System with Decentralized Attribute-based Encryption Scheme Li Kang and Leyou Zhang, pp. 815-827
13. Evidence Gathering of Facebook Messenger on Android Ming-Sang Chang and Chih-Ping Yen, pp. 828-837
14. Protection of User Data by Differential Privacy Algorithms Jian Liu and Feilong Qin, pp. 838-844
15. Verifiable Attribute-based Keyword Search Encryption with Attribute Revocation for Electronic Health Record System Zhenhua Liu, Yan Liu, Jing Xu, and Baocang Wang, pp. 845-856
16. An Unlinkable Key Update Scheme Based on Bloom Filters for Random Key Pre-distribution Bin Wang, pp. 857-862
17. Survey on Attribute-based Encryption in Cloud Computing P. R. Ancy, Addapalli V. N. Krishna, K. Balachandran, M. Balamurugan, and O. S. Gnana Prakasi, pp. 863-868
18. A Sequential Cipher Algorithm Based on Feedback Discrete Hopfield Neural Network and Logistic Chaotic Sequence Shoulin Yin, Jie Liu, and Lin Teng, pp. 869-873
19. A Novel Blockchain-based Anonymous Handover Authentication Scheme in Mobile Networks ChenCheng Hu, Dong Zheng, Rui Guo, AXin Wu, Liang Wang, and ShiYao Gao, pp. 874-884
20. New Publicly Verifiable Data Deletion Supporting Efficient Tracking for Cloud Storage Changsong Yang, Xiaoling Tao, and Qiyu Chen, pp. 885-896
International Journal of Network Security, Vol.22, No.5, PP.717-727, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).01)
Research on Malware Detection and Classification Based on Artificial Intelligence
Li-Chin Huang (1), Chun-Hsien Chang (2), and Min-Shiang Hwang (3,4)
(Corresponding author: Min-Shiang Hwang)

(1) Department of Information Management, Executive Yuan, Taipei 10058, Taiwan
(2) Department of Management Information Systems, National Chung Hsing University, Taiwan
(3) Department of Computer Science & Information Engineering, Asia University, Taichung, Taiwan; 500, Lioufeng Rd., Wufeng, Taichung 41354, Taichung, Taiwan, R.O.C.
(4) Department of Medical Research, China Medical University Hospital, China Medical University, Taiwan
(Email: [email protected])
(Received Apr. 13, 2020; Revised and Accepted July 8, 2020; First Online July 9, 2020)
Abstract

Malware remains one of the major threats to network security. As the types of network devices increase, the amount of malware that affects mobile phones and Internet of Things devices, in addition to computers, has also increased significantly. Malicious software can alter the regular operation of the victim's machine, damage user files, steal the user's private information, steal user permissions, and perform unauthorized activities on the device. For users, it is not only an inconvenience when using the device but also a threat to property and information. Detecting malware accurately and quickly, and then dealing with it, therefore helps reduce its impact. To improve the accuracy and efficiency of malware detection, this article uses deep learning technology from the field of artificial intelligence to study and implement high-precision classification models. We will use convolutional neural networks and long short-term memory as the primary training models. When training convolutional neural networks, we use malware visualization techniques: malware features are converted into images for input, and the input features and input methods are adjusted to find models with higher classification accuracy. For long short-term memory models, appropriate features and preprocessing methods are used to find models with high classification accuracy. Finally, a generative adversarial network is used to generate samples so as to improve the accuracy of training on small sample sets. In all of this training, we intend to use samples of malware that affect different kinds of devices. In this article, we propose three research topics: 1) a high-precision model for studying malware with image input; 2) a high-precision model for studying malware with non-image input; 3) building on these models, a generative adversarial network optimized for small-sample malware detection.

Keywords: Android Malware; Deep Learning; Machine Learning; Malware; Ransomware Detection

1 Introduction

Today, malware is still one of the significant cybersecurity threats [2,10]. According to Symantec's 2018 Cybersecurity Threat Report [23], the total number of variants of traditional malware that affects computers discovered in 2017 was as high as 669 million, an increase of 87.7% over 2016. For malware that affects mobile devices such as cell phones, the number of new variants increased from 17,000 to 27,000, an increase of about 55%. Due to the advancement and popularization of IoT technology, the amount of malware that affects IoT devices has increased significantly in recent years. The report mentions that malware variants affecting IoT devices have grown by about 600% in recent years, making them the main threat to the current IoT device environment.

Common malware currently includes Kovter Trojans, worms, ransomware, spyware, and Coinminer. In most infections, attackers either exploit security vulnerabilities in the system and software to implant malware into the victim's device, or trick the victim into downloading a file containing malware, which is then implanted into the victim's computer. Different malware can have different effects on infected devices. Usually, it may affect the regular operation of the device, damage user files, steal the user's private information, steal user rights, and perform unauthorized activities on the device. Most ransomware encrypts users' files and demands a ransom to unlock them, which poses a severe threat to users' data and property. In addition, due to the prevalence of virtual currencies such as Bitcoin, malware that secretly uses the computing power of the victim's computer for mining has also become widespread in recent years.

Traditionally, malware detection methods can be roughly divided into static analysis and dynamic analysis. Static analysis identifies malware through byte sequence analysis, control flow graphs, and opcode frequency distributions. Dynamic analysis currently relies on appropriate monitoring tools to monitor API calls and program behavior in controlled environments (such as sandboxes or virtual machines). With the development of artificial intelligence technology, malware identification has excellent potential, and many current studies aim to improve its accuracy and efficiency. At present, most artificial intelligence techniques are used to improve detection based on static analysis.

Many methods for detecting and distinguishing malware have been proposed in different studies. In 2011, Nataraj et al. proposed a method for classifying malware after visualization [17]. Also in 2011, Santos et al. proposed a method for detecting malware using its opcode sequence; in 2013, they combined that method with tracking the behavior of malware after execution, joining static and dynamic analysis to detect malware [20, 21]. In 2014, Zolotukhin et al. proposed using the n-gram method to find the essential features in the opcode sequence and then training support vector machines to detect malware [26]. In these earlier studies, extracting only a small number of opcodes reduces the performance overhead. However, when the training set is small, accuracy may be severely affected. To address this problem, Zhang et al. proposed in 2017 a method that converts opcode sequences into images [25]. In the same year, Kwon et al. proposed using API call patterns to identify the type of malware [15]. In 2018, Ni et al. proposed hashing with the SimHash method and converting the result into an image used as a training sample [18]; Kim et al. proposed a method for detecting Android malware that trains on multi-feature input [14]; Hamed et al. used LSTM to detect malware that affects IoT devices [8]; and Sajad et al. digitized the running records of ransomware and then used CNN and LSTM to train classifiers that recognize ransomware [9].

The following evaluation criteria are commonly used to evaluate the effectiveness of research on this topic:

1) Accuracy, recall rate, precision, and F-measure: These indicators are used to analyze the results of machine learning. Table 1 defines the quantities used to calculate them.

   Accuracy: The proportion of samples correctly classified by the classifier among the total number of samples:

   \[ \mathrm{Accuracy} = \frac{TP + TN}{FN + TP + FP + TN} \]

   This result indicates the classifier's ability to distinguish the entire sample set. However, in some cases this indicator fails. For example, if the data set contains 10,000 A samples and 100 B samples and the classifier judges all samples to be A, the accuracy is still 99%.

   Recall: The proportion of positive samples correctly classified by the classifier among all positive samples:

   \[ \mathrm{Recall} = \frac{TP}{FN + TP} \]

   Precision: The proportion of samples that are actually positive among all samples classified as positive by the classifier:

   \[ \mathrm{Precision} = \frac{TP}{TP + FP} \]

   The result indicates the actual accuracy of predicting positive samples.

   F-measure: If Recall is as important as Precision in the model, the F-measure can be used as an indicator. Its calculation considers both Precision and Recall:

   \[ F\text{-measure} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \]

2) Operating cost: Compare the time spent on training and on distinguishing samples, together with the operating resources required.

Table 1: The evaluation criteria

                     Actual True            Actual False
Prediction True      True Positive (TP)     False Positive (FP)
Prediction False     False Negative (FN)    True Negative (TN)

2 Research on Malware Input by Image

2.1 Motivations

Many types of machine learning models can identify malware. At present, the best-performing and most popular are the various neural network models. Among them, the convolutional neural network (CNN) is one of the most representative. CNN is very well suited to image recognition. When identifying malware, if the training samples are input as images, CNN can achieve good classification results.

Nataraj et al. proposed the use of image processing technology to visualize and classify malware in 2011 [17]. This method converts binary malware files into grayscale images for classification. Kancherla et al. proposed in 2013 to convert malware executable files into grayscale images called byteplots and to use a Support Vector Machine (SVM) as the training tool [11]. The method proposed by Zhang et al. in 2017 decompiles the binary executable file, converts the opcode sequence into an image, and then identifies whether the file is malware [25]. Ni et al. proposed in 2018 to use the SimHash method to hash decompiled malware for sequence similarity comparison, and then to train a CNN to distinguish different types of malware [18].

The research above shows that visualizing malware and then analyzing it has already been studied extensively. The methods differ, however, so we think it is still possible to achieve higher accuracy and efficiency in visualizing and analyzing malware. Beyond the research above, we also need to examine how converting different data from malware before machine learning, and how different preprocessing of the image input, affect training. In the first research topic, we propose to use several different feature input methods to convert malware to images, and then try different preprocessing methods on the images. Finally, the best feature selection and preprocessing methods found in the experiments are obtained.

2.2 Related Works

The studies in this section use images of malware as training samples. After the malware is visualized, it can be processed to identify the malware. Nataraj et al. proposed a method for visualizing and automatically classifying malware in 2011 [17]. Figure 1 is a schematic of the study. Having observed that malware of the same family shows similarities in the texture and layout of its image, the study first converts binary malware files into 8-bit vectors and then converts each 8-bit vector into a grayscale image. Finally, k-nearest neighbors with Euclidean distance is used for classification. On 9458 samples in 25 categories, the classification accuracy is 98%.

Figure 1: Method for visualizing and automatically classifying malware [17]
[Diagram: Malware Binary (01000010101000101001010100001...) → Convert Binary to 8-bit Vector → Convert an 8-bit Vector to a Graylevel Image → Use the KNN Classification]

The method proposed by Zhang et al. in 2017 decompiles the binary executable file into an opcode sequence. After converting it into an image, a convolutional neural network is used to compare the target image with images generated from known malware and so determine whether the file is malware [25]. The flow chart of the method is shown in Figure 2.

Figure 2: Flow chart of Zhang et al.’s method [25]

First, after the unknown file is decompiled, its opcode sequences and their frequencies of occurrence are found. Next, "Information Gain" is used to select which features to use in training. After the selected features are converted to an image, training is conducted to identify malware. The data set used in this study included ten types of malware, with a total of 9168 samples. The classification accuracy reported is about 93.7% to 96.7%.

In 2018, Ni et al. proposed the MCSC (Malware Classification using SimHash and CNN) method [18]. It uses the SimHash method to hash the decompiled malware so that similar functions hash to similar sequences. After hashing, the results are converted into grayscale images, which are then used to train a CNN to classify the malware. Figure 3 shows the research structure.

Figure 3: MCSC architecture

This method first decompiles the malware file into opcodes and then executes SimHash. The calculation method of SimHash is shown in Figure 4.

The results obtained by SimHash are converted into grayscale images and used as samples for CNN training. The method maintains a stable classification accuracy even for malware categories with relatively few samples, with an average accuracy of 98.86%.

2.3 Malware Detection Based on the High-precision Model for Image Input

This research topic, malware detection with a high-precision model for image input, focuses on finding the models that obtain the highest accuracy and performance when malware images are used as training input. The research architecture is shown in Figure 5.

This research topic will use different methods for the steps that convert samples into images. We evaluate how accuracy is affected by selecting appropriate features versus converting directly to an image. Then, after conversion to an image, we explore whether different image processing methods can improve accuracy. Finally, we adjust the training model to obtain a classifier that classifies accurately.

This research topic will collect malware samples that affect different devices, for example traditional computers, mobile phones, and Internet of Things devices. Then, when imaging the malware, we will use two different approaches. One is to extract features from the malware, mainly based on opcodes, and then convert the features into images; the other is to image the malware directly. After the image is processed, it is used as a training sample.

The convolutional neural network (CNN) is the most widely used deep learning technique in image processing [1]. We will input the samples obtained in the previous step to a CNN for training. A CNN is roughly composed of convolutional layers, pooling layers, and fully connected layers, as shown in Figure 6.

C1 and C2 in Figure 6 are convolutional layers. A convolutional layer consists of convolution units. The input malware image and the feature detector (filter) are convolved, and after an appropriate activation function (such as ReLU) is applied, we get the resulting output. The convolution operation extracts input features, and in a multi-layer network it repeatedly extracts high-level, sophisticated features from low-level features such as lines.

The pooling layer processes the output of the convolutional layer and can be seen as a subsampling process. Taking max pooling as an example, if the pooling window is 2 × 2, the output is the maximum value in each 2 × 2 block. The data becomes smaller after the pooling layer, so the number of parameters is also reduced, which helps reduce overfitting.

Figure 7 is an example of the convolution operation and max pooling. The yellow box in the figure is the convolution of a 3 × 3 area with the feature detector; the red block is the max pooling of a 2 × 2 area.

Finally, the fully connected layer flattens the previous output and feeds it to a neural network, whose output is the result. In this research topic, we first use the traditional CNN model and train it on samples of malware that affect different devices. After finding a better image input and processing method, we adjust the CNN model to obtain better accuracy and performance.
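As a rough illustration of the direct imaging approach (bytes to 8-bit values to a grayscale image, following the description of Nataraj et al.'s method in Section 2.2), the conversion can be sketched in a few lines. This is a minimal sketch, not the pipeline used in any of the cited studies; the function name `bytes_to_grayscale` and the fixed `width` of 16 pixels are assumptions made for the example.

```python
import math


def bytes_to_grayscale(data, width=16):
    """Lay out raw bytes row by row as an image of 8-bit gray levels (0-255).

    `width` is a hypothetical choice for this sketch; the last row is
    zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Stand-in for the bytes of a binary file (no real malware involved).
sample = bytes(range(40))
image = bytes_to_grayscale(sample, width=16)
print(len(image), "rows of", len(image[0]), "pixels")
```

Each inner list is one image row, so the result can be handed to any image library or fed to a CNN after normalization; the choice of width changes the texture of the resulting image and is one of the preprocessing decisions the text proposes to study.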
Figure 4: Example of SimHash method
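The idea behind SimHash, that similar inputs yield fingerprints differing in only a few bits, can be sketched as follows. This is an illustrative sketch over opcode tokens, not the exact MCSC procedure: the use of SHA-256 as the per-token hash, the 64-bit fingerprint length, and the function names are assumptions made for the example.

```python
import hashlib


def simhash(tokens, bits=64):
    """Compute a SimHash fingerprint of a token list.

    Each token is hashed; every bit position collects +1 for a 1 bit and -1
    for a 0 bit, and the sign of the total decides the fingerprint bit.
    `bits` must be a multiple of 8 and at most 256 (SHA-256 digest size).
    """
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical opcode sequences: the second differs from the first in one opcode.
seq_a = ["push", "mov", "sub", "mov", "call", "add", "pop", "ret"]
seq_b = ["push", "mov", "sub", "mov", "call", "xor", "pop", "ret"]
print(hamming_distance(simhash(seq_a), simhash(seq_b)))
```

To obtain an image from a fingerprint, each bit could be mapped to a black or white pixel; that mapping is left out here because the source does not specify it.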
[Figure 5 diagram, labels only: Binary Malware File; Malware File → Feature Selection → Malware Image Generation → Image Processing; CNN Model; Malware Classification]
Figure 5: A research framework for the malware detection based on the high-precision model for image input
Figure 6: Example of CNN architecture
Figure 7: Convolution operation and maximum pooling
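The two operations in Figure 7 can be reproduced with a small, dependency-free sketch. The 6 × 6 input, the vertical-edge feature detector, and the function names below are assumptions chosen for illustration; they are not the values shown in the figure.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum products.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 6 x 6 input whose left half is bright (1) and right half is dark (0).
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
# 3 x 3 feature detector that responds to a bright-to-dark vertical edge.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

feature_map = relu(convolve2d(image, kernel))  # 4 x 4 feature map
pooled = max_pool(feature_map, size=2)         # 2 x 2 after pooling
print(feature_map)
print(pooled)
```

The 6 × 6 input shrinks to 4 × 4 after the 3 × 3 convolution and to 2 × 2 after pooling, which is the size reduction the text credits with lowering the parameter count.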
Many sample sets are currently publicly available. However, malware samples that have appeared recently may be missing from them. There are two main solutions: one is to cooperate with a malware collection platform to obtain fresh samples; the other is to establish a platform and encourage users to upload malware samples. Several malware data sets are currently open to the public and are expected to be used in this research:

1) Microsoft Malware Classification Challenge [16];
2) The ultimate gaming malware research benchmark [4];
3) Android malware data set [24];
4) AndroZoo [3];
5) CTU-13 [22].

3 Research on Malware Input by Non-image

3.1 Motivations

In addition to using image input for machine learning on malware samples, directly using opcodes or character string sequences as training inputs is also one of the main methods. However, just as with image input, the selection of the input features of the sample, the preprocessing of the data, and the adjustment of the model all affect the accuracy and efficiency of the classification.

Kim et al. proposed a multimodal deep learning method that uses multiple types of features to detect Android malware. The method extracts various kinds of features, for example strings, permissions, and API calls, from decompiled apk files. It then trains a Multimodal Neural Network (MNN), which is finally used to determine whether an input file is benign software or malware [14]. Hamed et al. used RNN (Recurrent Neural Network) and LSTM (Long Short Term Memory) in 2018, taking the opcodes of IoT malware as input, to train a model that judges whether a sample is benign or malware [8]. Sajad et al. recorded and digitized the actions of ransomware while it was running, and then trained LSTM and CNN models separately to distinguish benign software from ransomware and to classify ransomware types [9].

The subject of this research is to test different types of samples (traditional computer malware, mobile malware, etc.) based on the features that can be selected from them, find the combination with the best accuracy and efficiency, and compare it with the best-performing image-based results obtained in the previous stage.

3.2 Related Works

This research topic uses non-image data as sample input. This section introduces research on different types of malware (such as Android, IoT, and ransomware).

In 2018, Kim et al. proposed an Android malware detection framework that uses multimodal deep learning with multiple input features [14]. Its architecture is shown in Figure 8. First, the apk file is decompiled to extract various types of features. The study divides them into seven categories: string features, method opcode features, method API features, shared library function opcode features, permission features, component features, and environment features. After the feature matrix is generated from the features, a multimodal neural network is used for training. Finally, input files can be distinguished as benign software or malware. The study used two sets of samples: 1,075 malware and 19,417 benign software samples, and 1,209 malware and 1,300 benign software samples. The model achieved 98% accuracy and an F-measure of 0.99 on the first set, and 99% accuracy and an F-measure of 0.99 on the second set.

Due to the rapid increase in malware targeting IoT devices, Hamed et al. used LSTM to detect IoT malware in 2018 [8]. Figure 9 shows the research architecture. After the sample is decompressed and decompiled, the opcodes are extracted and selected as features to generate the feature vector, and RNN and LSTM are then used for training. The study used 281 malware samples and 270 benign software samples, and trained three different LSTM configurations (one to three layers). Then 100 samples not used in training were used to evaluate the performance of the model. Finally, the results were compared with random forest, support vector machine, KNN, and other classification models. The two-layer LSTM configuration had the highest accuracy, 98.18%.

Sajad et al. proposed the DRTHIS (Deep ransomware threat hunting and intelligence system) method in 2018 [9]. The research architecture is shown in Figure 10. It is roughly divided into three parts: data conversion, threat detection (detecting whether a sample is ransomware), and identifying which kind of ransomware was detected. The method records and digitizes the actions performed within 10 seconds after the file is executed. The digitized data is merged with labels into a training data set. Then a binary classifier and a ransomware type classifier are trained with the CNN and LSTM models. The result is a system capable of judging whether software is benign or malicious and of distinguishing the types of ransomware. In this study, the LSTM model obtained relatively good results. Three different types of ransomware samples (Locky, Cerber, TeslaCrypt) were used in the experiment, with 220 samples of each ransomware type and 219 benign samples used for training. The F-measure is 0.96, and the true positive rate is 97.2%. The study also tested other types of ransomware not used for training: 99% of CryptWall samples, 75% of TorrentLocker samples, and 92% of Sage samples were correctly classified.

3.3 Malware Detection Based on the High-precision Model for Non-image Input

For this research topic, we focus on the case where non-image data is used as sample input. The research architecture is shown in Figure 11.

The input samples are adjusted according to different feature selection methods and combinations. At the same time, there are many variations of LSTM, for example GRU (Gated Recurrent Unit). The accuracy and performance of these variants are also worth discussing.
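Comparing models in this way rests on the evaluation criteria of Table 1. As a minimal sketch, the four indicators can be computed directly from the confusion-matrix counts; the function name `evaluate` and the convention of returning 0.0 when a ratio is undefined are assumptions made for this example.

```python
def evaluate(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, and F-measure from Table 1 counts.

    Ratios with a zero denominator are reported as 0.0 (a convention chosen
    for this sketch, not something the source specifies).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
    }


# The imbalanced example from the Introduction: 10,000 A samples, 100 B
# samples, and a classifier that labels everything A. Treating B as the
# positive class gives tp=0, fp=0, fn=100, tn=10000.
scores = evaluate(tp=0, fp=0, fn=100, tn=10000)
print(scores)
```

Accuracy comes out at roughly 99% while recall and F-measure are zero, which is why the text recommends not relying on accuracy alone.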
[Figure 8 diagram, labels only. Stages: Raw Data Extraction → Feature Extraction → Feature Vector Generation → Multimodal Neural Net. Raw data: decoding Manifest Files, decoding Dex Files, disassembling Shared Library. Features: Permission/Components/Environment Information; String/Dalvik Opcode Freq./API Invocation Freq.; Opcode Freq. A feature vector is generated for each kind of feature; each vector is input separately to a DNN, followed by a Final Net that outputs Malware or Benign.]
Figure 8: The framework of Kim et al. [14]
[Figure 9 diagram, labels only. Panels: ELF Parser; Feature Extraction; Deep Threat Classifier. Labels: ARM ELF Files; Updating ELF File; Creating ELF File; Decompiling ELF File; Extracting Opcode; Selecting Opcode; Feature Vector; Deep Net Tuning; Deep Training.]
Figure 9: Using LSTM to detect IoT malware [8]
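Turning an opcode sequence into a numeric feature vector can be done in many ways. The sketch below uses normalized n-gram frequencies, in the spirit of the n-gram features mentioned in the Introduction; it is an illustration only and not the feature extraction of [8]. The function names, the bigram setting, and the small vocabulary are assumptions.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return all contiguous n-grams of an opcode sequence as tuples."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def frequency_vector(opcodes, vocabulary, n=2):
    """Relative frequency of each vocabulary n-gram in the sequence.

    `vocabulary` fixes the order and length of the vector, so every sample
    maps to a vector of the same shape.
    """
    counts = Counter(opcode_ngrams(opcodes, n))
    total = sum(counts.values())
    return [counts[gram] / total if total else 0.0 for gram in vocabulary]


# Hypothetical opcode sequence and a hypothetical three-entry vocabulary.
opcodes = ["push", "mov", "mov", "add", "mov", "add"]
vocabulary = [("mov", "add"), ("push", "mov"), ("xor", "xor")]
vector = frequency_vector(opcodes, vocabulary, n=2)
print(vector)
```

In practice the vocabulary would be chosen by a feature selection step, such as the information gain criterion mentioned in Section 2.2, rather than written by hand.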
[Figure 10 diagram, labels only. Panels: Data Transformation; Threat Hunting; Threat Intelligence. Labels: Data Transformation; Combing and Labeling; Deep Learning: Training Binary Classifier; Binary Classifier; One class Classifier; Deep Learning: Training Multi-class Classifier; Multi-class Classifier.]
Figure 10: DRTHIS architecture [9]
[Diagram: Malware File → Feature Selection → LSTM Model → Malware Classification]
Figure 11: Research structure for the malware implemented with the high-precision model for non-image input
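As a rough illustration of the LSTM Model stage in Figure 11, a single LSTM time step with its input, forget, and output gates can be written out explicitly. This is a minimal scalar sketch using the standard LSTM equations; the parameter layout, the function names, and the toy input sequence are assumptions, and a real model would use vectors, learned weights, and a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell.

    `params` maps each of "input", "forget", "output", and "candidate" to a
    tuple (w_x, w_h, bias). Returns the new hidden state and cell state.
    """
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x + w_h * h_prev + bias

    i = sigmoid(pre_activation("input"))        # how much new information enters memory
    f = sigmoid(pre_activation("forget"))       # how much of the old memory is kept
    o = sigmoid(pre_activation("output"))       # how much of the memory is exposed
    g = math.tanh(pre_activation("candidate"))  # candidate value to store
    c = f * c_prev + i * g                      # updated long-term memory (cell state)
    h = o * math.tanh(c)                        # output (hidden state)
    return h, c


# Hypothetical weights and a toy numeric sequence (e.g., encoded opcodes).
params = {
    "input": (0.5, 0.1, 0.0),
    "forget": (0.3, 0.2, 1.0),
    "output": (0.4, 0.1, 0.0),
    "candidate": (0.6, 0.3, 0.0),
}
h, c = 0.0, 0.0
for x in [0.2, 0.7, 0.1]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

Because the cell state is updated additively (old memory scaled by the forget gate plus gated new input), gradients can flow across many time steps more easily than in a plain RNN.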
First, we select the features of malware samples based on their category characteristics. For example, malware that affects mobile phones has functions such as permission requirements, which can also be used as features. At this stage, we use LSTM and its variants as the training model.

When dealing with non-image input, especially sequence data, LSTM is a very suitable model. Because each node of a traditional RNN is connected to the previous node, the RNN suffers from the vanishing gradient problem. In LSTM, the weight of the self-loop can be changed through the input gate, output gate, and forget gate:

• The function of the input gate is to determine whether to add the current input to the long-term memory.

• The function of the output gate is to determine whether the current memory content is passed to the output.

• The function of the forget gate is to determine whether the information coming from the previous step should be kept in memory or forgotten.

Therefore, even when the model parameters are fixed, the problem of vanishing or exploding gradients can be avoided. Figure 12 shows the architecture of the LSTM model.

There are many variants of LSTM. One of them is the GRU (Gated Recurrent Unit). GRU merges the input gate and the forget gate of LSTM into a single update gate. Compared with LSTM, GRU has the advantage of simplifying the calculation. If its prediction performance is similar to that of LSTM, it may improve efficiency.

4 Research on Malware and Generative Adversarial Networks

4.1 Motivations

In current research, the results of machine learning depend on the number of samples. However, when a new malware threat appears, only a limited number of samples can be collected in a short period. The Generative Adversarial Network (GAN), which has emerged in recent years, has considerable potential in this regard [6, 7, 12]. A GAN is composed of a generative model and a discriminative model. The generative model takes random samples as input, and its output should be as close as possible to the real samples in the training set so as to deceive the discriminative network; the discriminative model takes real samples or the output of the generative model as input and distinguishes the non-real samples as well as possible. The two networks compete with each other and adjust their parameters, which ultimately enables the generative network to generate samples whose authenticity the discriminative network cannot determine.

Most GANs are used to process image samples. When this method is applied to the classification of malware, we convert the samples into images and study the accuracy of classifying a small number of samples using GANs. In addition, there are currently several variants of GANs. Improving the accuracy or efficiency of malware classification with them is also one of the research directions.

4.2 Related Works

In 2017, Kim et al. proposed the use of generative adversarial networks for malware classification [13]. In 2018, they continued this research and discussed protection measures against zero-day attacks [12]. Figure 13 shows their research architecture.

Traditional malware detection mechanisms usually rely on existing functions, so their defense against zero-day attacks is limited. Therefore, the method proposed in that study first visualizes the malware and then uses a generative adversarial network (GAN) for training. The GAN generates pseudo samples and adjusts its parameters through continuous confrontation with the discriminative network. Finally, the generated fake samples are similar to the actual samples, but not the same, so they can mimic variants of a virus.

The detector part uses an autoencoder to learn the features of the malware and feeds them back to the generative network to train the generator stably. The classification accuracy is 95.74%. In addition, in the experiments of that study, noise-added sample images were used to simulate virus variants. The SSIM (Structural Similarity Index) between the noise-added samples and the original samples is 0.6 to 0.69. The accuracy is between 98.16% and 98.99%, which indicates that the method also has good detection ability for new virus variants.

4.3 Malware Detection Based on GAN for Generating Small-sample Malware

For this research topic, we use the best-performing image input and processing methods from the first research topic to process the samples used at this stage. We then use GAN for training to improve the training effect for a small number of samples. It can be expected that when new types of malware appear, they can be effectively detected even with a small number of samples.

The research architecture at this stage is shown in Figure 14. In this research stage, we use a malware data set with a small number of samples, extracting a certain number of samples from the previous samples as the samples for this stage. We then use GAN as the model for this training phase.

The concept of GAN was proposed by Goodfellow et al. in 2014 [7]. Figure 15 shows the underlying architecture of the GAN.
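Since this stage feeds malware samples to the GAN as images, the conversion step can be illustrated with a minimal sketch. It assumes that each byte of a sample is treated as one 8-bit grayscale pixel, which is a common choice in the malware-image literature; the function name, the fixed width of 16, and the zero padding are illustrative assumptions rather than the procedure used in this article.

```python
def bytes_to_grayscale_rows(data, width=16):
    """Reshape raw bytes into rows of 8-bit grayscale pixels (values 0-255)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad the tail so that the last row is complete (an illustrative choice).
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


sample = bytes(range(40))  # stand-in for the raw bytes of a malware sample
image = bytes_to_grayscale_rows(sample, width=16)
print(len(image), len(image[0]))  # number of rows, number of columns
```

In practice the width would probably be chosen from the file size, and the rows would be handed to an image library or a tensor type before training.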
Figure 12: LSTM model
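The role of the three gates in the LSTM model of Figure 12 can be sketched with a single scalar cell. This is a minimal illustration using the standard LSTM update equations; the scalar weights, the dictionary layout of `w`, and the function names are assumptions made for the example, not details taken from this article.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))  # forget gate: how much old memory to keep
    i = sigmoid(pre("input"))   # input gate: how much new content to write
    o = sigmoid(pre("output"))  # output gate: how much memory to expose
    g = math.tanh(pre("cand"))  # candidate memory content
    c = f * c_prev + i * g      # self-loop on the cell state, weighted by f
    h = o * math.tanh(c)
    return h, c


# With all-zero weights every gate is 0.5 and the candidate is 0,
# so half of the previous memory survives the step.
zero = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "cand")}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero)
```

Because the forget gate multiplies the previous cell state directly, a gate value close to 1 lets information and gradients pass through many steps largely unchanged.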
[Figure 13 diagram: Generating Malware Image → Training by Using tDCGAN → Malware Type Classification From Its Pattern]
Figure 13: The research framework of Kim et al. [12]
[Figure 14 diagram labels: Malware Dataset, Malware Image Generation, GAN Model, DCGAN Model, Malware Classification]
Figure 14: Research structure of malware detection based on GAN for generating small-sample malware
[Figure 15 diagram labels: Random Index, Noise, Generator, Fake Samples, Real Samples, Discriminator]
Figure 15: Generative adversarial network model
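The opposing goals of the two networks in Figure 15 can be made concrete with the losses they minimize. The sketch below assumes the standard GAN objective of [7], in which the discriminator maximizes log D(x) + log(1 - D(G(z))), together with the commonly used non-saturating generator loss -log D(G(z)). The function names and the hand-written discriminator outputs are illustrative, and no network is actually trained here.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Negative of log D(x) + log(1 - D(G(z))), averaged over the batch."""
    terms = [math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake)]
    return -sum(terms) / len(terms)


def generator_loss(d_fake):
    """Non-saturating generator loss -log D(G(z)), averaged over the batch."""
    return -sum(math.log(f) for f in d_fake) / len(d_fake)


# D outputs the probability that its input is a real sample.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])  # D cannot tell the samples apart
confident = discriminator_loss([0.9, 0.9], [0.1, 0.1])  # D separates real from fake
fooled = generator_loss([0.9, 0.9])                     # G fools D: low generator loss
caught = generator_loss([0.1, 0.1])                     # D rejects fakes: high generator loss
```

A discriminator that separates the samples well has a low loss while the generator's loss is high, and the reverse holds when the generator succeeds, which is the confrontation described in the text.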
A GAN consists of a generative network and a discriminative network. The generative network takes random noise as input and produces output intended to deceive the discriminative network; the discriminative network takes real samples or the generative network's output as input and tries to distinguish the non-real samples as accurately as possible. The two networks compete with each other while adjusting their parameters, until the samples produced by the generative network are almost indistinguishable from real ones. With this technique, the number of samples available for malware analysis can be amplified from a small number of samples, and the results of the previous two research topics can be combined to optimize the accuracy of malware analysis for a small number of samples. GAN is not well suited to learning discrete data, such as text. Therefore, in this part, images are still used as the sample input form.

In addition, many studies have proposed variants of GAN, for example DCGAN [19] and WGAN [5].

5 Conclusions

In this article, we have proposed three research topics: 1) Research on malware input by image: Malware detection based on the high-precision model for image input; 2) Research on malware input by non-image: Malware detection based on the high-precision model for non-image input; 3) Research on malware and generative adversarial networks: Malware detection based on GAN for generating small-sample malware.

This article collected and used malware samples that affect different devices for training. The samples include malware that affects computers as well as malware and ransomware that affect mobile phones. Malware that affects different devices has different features. For example, due to the apk file structure of mobile phones, malware that affects mobile phones has many characteristics that differ from those of malware that affects computers; another example is the permission functions and component functions mentioned in the related research. The goal of this article is to propose a high-precision model for different types of samples.

Acknowledgments

This research was partially supported by the Ministry of Science and Technology, Taiwan (ROC), under contract no.: MOST 108-2410-H-468-023 and MOST 108-2622-8-468-001-TM1.

References

[1] D. S. Abdul Minaam and E. Amer, “Survey on machine learning techniques: Concepts and algorithms,” International Journal of Electronics and Information Engineering, vol. 10, no. 1, pp. 34–44, 2019.
[2] A. A. Al-khatib, W. A. Hammood, “Mobile malware and defending systems: Comparison study,” International Journal of Electronics and Information Engineering, vol. 6, no. 2, pp. 116–123, 2017.
[3] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon, “AndroZoo: Collecting millions of android apps for the research community,” in IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR’16), 2016.
[4] H. S. Anderson, P. Roth, “EMBER: An open dataset for training static PE malware machine learning models,” arXiv preprint, arXiv: 1804.04637, 2018.
[5] M. Arjovsky, S. Chintala, L. Bottou, “Wasserstein GAN,” arXiv preprint, arXiv: 1701.07875, 2017.
[6] J. Bruner, A. Deshpande, Generative Adversarial Networks for Beginners, July 8, 2020. (https://github.com/jonbruner/generative-adversarial-networks/blob/master/gan-notebook.ipynb)
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, “Generative adversarial nets,” in Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS’14), vol. 2, pp. 2672–2680, 2014.
[8] H. HaddadPajouh, A. Dehghantanha, R. Khayami, K. K. R. Choo, “A deep recurrent neural network based approach for internet of things malware threat hunting,” Future Generation Computer Systems, vol. 85, pp. 88-96, 2018.
[9] S. Homayoun, A. Dehghantanha, M. Ahmadzadeh, S. Hashemi, R. Khayami, K. K. R. Choo, D. E. Newton, “DRTHIS: Deep ransomware threat hunting and intelligence system at the fog layer,” Future Generation Computer Systems, vol. 90, pp. 94-104, 2019.
[10] S. Islam, H. Ali, A. Habib, N. Nobi, M. Alam, and D. Hossain, “Threat minimization by design and deployment of secured networking model,” International Journal of Electronics and Information Engineering, vol. 8, no. 2, pp. 135–144, 2018.
[11] K. Kancherla, S. Mukkamala, “Image visualization based malware detection,” in IEEE Symposium on Computational Intelligence in Cyber Security (CICS’13), pp. 40-44, 2013.
[12] J. Y. Kim, S. J. Bu, S. B. Cho, “Zero-day malware detection using transferred generative adversarial networks based on deep autoencoders,” Information Sciences, vol. 460–461, pp. 83-102, 2018.
[13] J. Y. Kim, S. J. Bu, S. B. Cho, “Malware detection using deep transferred generative adversarial networks,” in International Conference on Neural Information Processing, pp. 556-564, 2017.
[14] T. Kim, B. Kang, M. Rho, S. Sezer, E. G. Im, “A multimodal deep learning method for Android malware detection using various features,” IEEE Transactions on Information Forensics and Security, vol. 14, no. 3, pp. 773-788, 2019.
[15] I. Kwon, E. G. Im, “Extracting the representative API call patterns of malware families using recurrent neural network,” in Proceedings of the International Conference on Research in Adaptive and Convergent Systems, ACM, pp. 202-207, 2017.
[16] Microsoft, Microsoft Malware Classification Challenge (BIG 2015), July 8, 2020. (https://www.kaggle.com/c/malware-classification)
[17] L. Nataraj, S. Karthikeyan, G. Jacob, B. S. Manjunath, “Malware images: Visualization and automatic classification,” in Proceedings of the Eighth International Symposium on Visualization for Cyber Security, pp. 311–320, 2011.
[18] S. Ni, Q. Qian, R. Zhang, “Malware identification using visualization images and deep learning,” Computers & Security, vol. 77, pp. 871-885, 2018.
[19] A. Radford, L. Metz, S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” Under review as a conference paper at ICLR 2016, 2016. (https://arxiv.org/pdf/1511.06434.pdf)
[20] I. Santos, F. Brezo, X. Ugarte-Pedrero, P. G. Bringas, “Opcode sequences as representation of executables for data mining based malware variant detection,” Information Sciences, vol. 231, no. 9, pp. 64-82, 2011.
[21] I. Santos, J. Devesa, F. Brezo, J. Nieves, “OPEM: A static-dynamic approach for machine learning based malware detection,” in Proceedings of International Conference (CISIS’12), pp. 271-280, 2013.
[22] Stratosphereips Lab., The CTU-13 Dataset. A Labeled Dataset with Botnet, Normal and Background Traffic, July 8, 2020. (https://www.stratosphereips.org/datasets-ctu13/)
[23] Symantec, Internet Security Threat Report, vol.23, 2018. (https://www.symantec.com/content/dam/symantec/docs/reports/istr-23-2018-en.pdf)
[24] F. Wei, Y. Li, S. Roy, X. Ou, W. Zhou, “Deep ground truth analysis of current android malware,” in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer, pp. 252-276, 2017.
[25] J. Zhang, Z. Qin, H. Yin, L. Ou, Y. Hu, “IRMD: Malware variant detection using opcode image recognition,” in IEEE International Conference on Parallel and Distributed Systems (ICPADS’17), pp. 1175–1180, 2017.
[26] M. Zolotukhin, T. Hamalainen, “Detection of zero-day malware based on the analysis of opcode sequences,” in Proceedings of The 11th Annual IEEE CCNC Security, Privacy and Content Protection, pp. 386-391, 2014.

Biography

Li-Chin Huang received the B.S. in computer science from Providence University, Taiwan, in 1993 and M.S. in information management from Chaoyang University of Technology (CYUT), Taichung, Taiwan, in 2001; and the Ph.D. degree in computer and information science from National Chung Hsing University (NCHU), Taiwan in 2001. Her current research interests include information security, cryptography, medical image, data hiding, network, security, big data, and mobile communications.

Chun-Hsien Chang received B.S. in Department of Mechanical Engineering from National Cheng Kung University, Taiwan in 2017; M.S. in Department of Management Information Systems, National Chung Hsing University, Taiwan, in 2019. His current research interests include artificial intelligence and information security.

Min-Shiang Hwang received M.S. in industrial engineering from National Tsing Hua University, Taiwan in 1988, and a Ph.D. degree in computer and information science from National Chiao Tung University, Taiwan, in 1995. He was a professor and Chairman of the Department of Management Information Systems, NCHU, during 2003-2009. He was also a visiting professor at the University of California (UC), Riverside, and UC. Davis (USA) during 2009-2010. He was a distinguished professor of the Department of Management Information Systems, NCHU, during 2007-2011. He obtained the 1997, 1998, 1999, 2000, and 2001 Excellent Research Award of National Science Council (Taiwan). Dr. Hwang was a dean of the College of Computer Science, Asia University (AU), Taichung, Taiwan. He is currently a chair professor with the Department of Computer Science and Information Engineering, AU. His current research interests include information security, electronic commerce, database and data security, cryptography, image compression, and mobile computing. Dr. Hwang has published over 300 articles on the above research fields in international journals.

International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 728
A BLS Signature Scheme from Multilinear Maps
Fei Tang1 and Dong Huang2,3 (Corresponding author: Fei Tang)
School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications1 Chongqing, China Chongqing University of Science and Technology 2 Chongqing Vocational and Technical University of Mechatronics3 (Email: [email protected]) (Received Mar. 12, 2019; Revised and Accepted Dec. 10, 2019; First Online Apr. 6, 2020)
Abstract

The security of the BLS signature scheme is based on the random oracle model. Hence, there is a question of how to instantiate the BLS scheme without random oracles. In this work, by using a powerful tool, the multilinear map, we answer this question. The main contributions of this work are as follows. First, we describe the BLS scheme in the setting of multilinear groups and prove its security in the standard model. Then, we design a ring signature scheme based on the multilinear BLS scheme. In the proposed scheme, a ring signature consists of a single multilinear group element.

Keywords: BLS Signatures; Multilinear Map; Ring Signatures; Standard Model

1 Introduction

Digital signatures are one of the most fundamental and well studied cryptographic primitives. Research on digital signatures has two aspects: practicability and security. As for practicability, we mainly consider the efficiency of the signing and verification algorithms, the storage space of the state information, and the properties that the scheme can provide, such as aggregate signatures [17], ring signatures [23], blind signatures [16, 19, 20], and so on. As for security, we mainly focus on the assumptions that the scheme is based on, such as one-way functions, and on whether the scheme is secure in the random oracle model or in the standard model.

At ASIACRYPT 2001, Boneh, Lynn, and Shacham [6] designed a short signature scheme based on bilinear groups, the so-called BLS scheme. The BLS scheme has been shown to be very useful for constructing other cryptographic primitives, such as threshold signature schemes [7], blind signature schemes [7], signcryption schemes [10], and the key-generation algorithm of identity-based encryption (IBE) schemes [4]. The main reason the BLS scheme is so useful is that it has a relatively simple structure. The security of the BLS scheme relies on the random oracle model. However, it is well known that the random oracle is an idealized model. After the BLS scheme, several works presented signature schemes that are secure in the standard model, such as [3]. However, none of these schemes preserves the simple structure of the BLS scheme. This leads us to a question: Can we instantiate the BLS signature scheme without random oracles? In this work, by using a powerful tool, the multilinear map, we answer this question.

The notion of multilinear maps was introduced (but without concrete instantiation) by Boneh and Silverberg [8]. It was not until 2013 that Garg, Gentry, and Halevi [12] gave the first approximate candidate. Since then, many multilinear map schemes have been proposed or analyzed, e.g., [9, 15, 21].

Several relevant works have addressed the above question. Hohenberger et al. [18] instantiated the random oracle with an actual family of hash functions for the BLS scheme by using a more powerful tool, indistinguishability obfuscation [13]. Freire et al. [11] took advantage of multilinear maps to realize programmable hash functions and construct IBE, BLS signature, and SOK non-interactive key exchange schemes. In addition, Hohenberger et al. [17] made use of the multilinear BLS scheme to construct an (identity-based) aggregate signature scheme which admits unrestricted aggregation. In [17], Hohenberger et al. followed the Waters [25] framework and proved the adaptive security of the multilinear BLS signature scheme (footnote 1). However, their proof is based on a strong new assumption, the (n, k)-modified multilinear computational Diffie-Hellman exponent assumption, where n is a polynomial in the number of queries made by the adversary.

Footnote 1: Their adaptive proof of the aggregate signature scheme implies this result. (Please refer to Appendix D.2 of [17] for details.)

In this work, we also give an adaptive proof for the multilinear BLS scheme. However, our proof, which takes advantage of the technique of admissible hash functions [2], is based on a weaker assumption, the multilinear computational Diffie-Hellman assumption [12]. In addition, we consider the applications of the multilinear BLS signature scheme. It can serve as the key-generation algorithm of the multilinear IBE scheme [11]. It can also be used to construct aggregate signatures [17], threshold signature schemes [7], and so on. In this work, we take advantage of the structure of the multilinear BLS scheme to construct a ring signature scheme in the standard model. In a ring signature scheme [23], a signer can generate signatures on behalf of a group of users (i.e., a ring) if and only if he is a member of the ring. Then, any verifier can confirm that the message has been signed by one of the members of the ring, but he cannot know who the real signer is. Our ring signature scheme has the attractive feature that, for n members of a ring, a signature consists of just a single group element.

2 Preliminaries

2.1 Notations

The following notations will be used in this paper. Let $\mathbb{Z}$ be the set of integers and $\mathbb{Z}_p$ be the ring of integers modulo $p$. $1^\lambda$ denotes the string of $\lambda$ ones for $\lambda \in \mathbb{N}$. $|x|$ denotes the length of the bit string $x$. $[k]$ is a shorthand for the set $\{1, 2, \ldots, k\}$. Finally, we write PPT for probabilistic polynomial time.

2.2 Multilinear Maps

Let $(\mathbb{G}_1, \ldots, \mathbb{G}_k)$ be a sequence of groups each of large prime order $p$, and let $g_i$ be a generator of group $\mathbb{G}_i$, where we let $g = g_1$. There exists a set of bilinear maps $\{e_{i,j} : \mathbb{G}_i \times \mathbb{G}_j \to \mathbb{G}_{i+j} \mid i, j \ge 1 \wedge i + j \le k\}$, which satisfy:

$$e_{i,j}(g_i^a, g_j^b) = g_{i+j}^{ab} \quad \forall a, b \in \mathbb{Z}_p.$$

When the context is obvious, we omit the indexes $i$ and $j$, i.e., $e(g_i^a, g_j^b) = g_{i+j}^{ab}$. It will also be convenient to abbreviate $e(h_1, h_2, \ldots, h_j) = e(h_1, e(h_2, \ldots, e(h_{j-1}, h_j) \ldots)) \in \mathbb{G}_i$ for $h_j \in \mathbb{G}_{i_j}$ and $i_1 + i_2 + \ldots + i_j \le k$.

Let $\mathsf{MulGen}(1^\lambda, k)$ be a PPT multilinear group generator algorithm which takes as input a security parameter $\lambda$ and an integer $k$, where $k$ is the number of allowed pairing operations, and outputs the multilinear parameters $\mathsf{MP} = (\mathbb{G}_1, \ldots, \mathbb{G}_k, p, g = g_1, g_2, \ldots, g_k, e_{i,j})$ satisfying the above properties.

In recent years, many multilinear maps have been proposed, e.g., [9, 12, 14, 22]. However, some of them have been shown to be insecure, e.g., [15, 21]. Fortunately, there are still several multilinear maps that are beyond the reach of existing cryptanalysis, e.g., [1, 14]. Therefore, we can still use this tool to design cryptographic schemes. For example, [26, 28, 29] take advantage of multilinear maps to design different cryptographic schemes.

2.3 Complexity Assumption

We assume that the following assumption holds in the setting described above: the Multilinear Computational Diffie-Hellman (MCDH) assumption.

Definition 1. For any PPT algorithm $\mathcal{B}$, any polynomial $p(\cdot)$, any integer $k$, and all sufficiently large $\lambda \in \mathbb{N}$,

$$\Pr\left[\begin{array}{l} \mathsf{MP} \leftarrow \mathsf{MulGen}(1^\lambda, k); \\ c_1, \ldots, c_k \leftarrow_R \mathbb{Z}_p; \\ v \leftarrow \mathcal{B}(\mathsf{MP}, g^{c_1}, \ldots, g^{c_k}) \end{array} : v = g_{k-1}^{\prod_{i \in [k]} c_i}\right] < \frac{1}{p(\lambda)}.$$

This assumption can be viewed as an adaptation of the Bilinear Computational Diffie-Hellman (BCDH) assumption [4] in the setting of multilinear groups.

3 Digital Signatures

3.1 Definitions

For ease of description, we define digital signature schemes with four algorithms: Setup, KeyGen, Sign, and Vrfy. Formally, given a security parameter $\lambda$, the PPT algorithm Setup, run by a trusted authority, generates public parameters PP. The public parameters are used in all of the following three algorithms; for simplicity, we omit this fact. The PPT algorithm KeyGen outputs a signing/verification key pair $(SK, VK)$ for the signer. The PPT algorithm Sign takes as input a signing key $SK$ and a message $M$, then outputs a signature $\sigma$. Finally, the deterministic algorithm Vrfy processes a purported signature $\sigma$ with respect to a message $M$ and verification key $VK$; it outputs 1 to indicate a successful verification and 0 otherwise.

3.2 Existential Unforgeability

The security model for signature schemes is Existential Unforgeability against adaptive Chosen-Message Attacks (EU-CMA), which is defined by the following game.

1) Setup: The challenger runs the Setup and KeyGen algorithms to generate the public parameters and challenge keys $(SK^*, VK^*)$. Adversary $\mathcal{A}$ is given the public parameters and $VK^*$.

2) Signing queries: Adversary $\mathcal{A}$ is allowed to adaptively query the signing oracle at most $q$ times on messages $M_1, \ldots, M_q$. In its $i$-th query, it receives back a signature $\sigma_i \leftarrow \mathsf{Sign}(SK^*, M_i)$.

3) Output: Finally, adversary $\mathcal{A}$ outputs a tuple $(M^*, \sigma^*)$, where $M^* \ne M_i$ for all $i \in [q]$. $\mathcal{A}$ wins the game if $\mathsf{Vrfy}(VK^*, M^*, \sigma^*) = 1$.

We denote the success probability of a PPT adversary $\mathcal{A}$ (taken over the random choices of the challenger and adversary) to win the game as $\mathsf{Adv}_{\mathcal{A}}^{\text{eu-cma}}$.
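The flow of the EU-CMA game can be sketched as a small harness that works with any scheme exposing key generation, signing, and verification. This is only an illustration of the game's bookkeeping (query budget and freshness of the forged message). The harness name, the deliberately insecure toy scheme, and the two adversaries are hypothetical and are not part of the paper.

```python
import secrets


def run_eu_cma_game(keygen, sign, verify, adversary, q):
    """Play one EU-CMA game and return True if the adversary forges."""
    sk, vk = keygen()
    queried = []

    def signing_oracle(message):
        if len(queried) >= q:
            raise RuntimeError("query budget exhausted")
        queried.append(message)
        return sign(sk, message)

    forged_message, forged_signature = adversary(vk, signing_oracle)
    fresh = forged_message not in queried  # M* must differ from every queried M_i
    return fresh and verify(vk, forged_message, forged_signature)


# Deliberately insecure toy scheme (hypothetical): the "signature" ignores the key.
def toy_keygen():
    return secrets.token_bytes(16), b"public"


def toy_sign(sk, message):
    return message[::-1]


def toy_verify(vk, message, signature):
    return signature == message[::-1]


def forging_adversary(vk, oracle):
    oracle(b"hello")           # one legitimate signing query
    return b"fresh", b"hserf"  # forgery on a message that was never queried


def replay_adversary(vk, oracle):
    return b"hello", oracle(b"hello")  # replays a queried message, so it does not count


forged = run_eu_cma_game(toy_keygen, toy_sign, toy_verify, forging_adversary, q=1)
replayed = run_eu_cma_game(toy_keygen, toy_sign, toy_verify, replay_adversary, q=1)
```

The toy scheme is broken by the first adversary, while the second one fails only because of the freshness condition on the forged message.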
Definition 2. We say that a signature scheme is EU-CMA secure if no PPT adversary $\mathcal{A}$ can win the above game with non-negligible advantage.

Selective security. The model of selective security is a weaker notion than EU-CMA. In this model, we require that the adversary gives its challenge message $M^*$ before the setup phase; then it cannot make a signing query for $M^*$.

Definition 3. We say that a signature scheme is selectively secure if no PPT adversary can win the selective game with non-negligible advantage.

4 Multilinear BLS Scheme

In this section, we describe the multilinear BLS signature scheme and its security.

4.1 Construction

We specify the message space $\mathcal{M} := \{0, 1\}^\ell$; more generally, a collision-resistant hash function can be used to hash messages to this size. The construction of the multilinear BLS signature scheme is as follows:

• Setup$(1^\lambda, \ell)$: The trusted authority takes as input a security parameter $\lambda$ and the message length $\ell$ and runs this algorithm to generate the public parameters. It first runs $\mathsf{MP} = (\mathbb{G}_1, \ldots, \mathbb{G}_k, p, g, \ldots, g_k, e) \leftarrow \mathsf{MulGen}(1^\lambda, k = \ell + 1)$. Next, it chooses $2\ell$ random integers $(a_{1,0}, a_{1,1}), \ldots, (a_{\ell,0}, a_{\ell,1}) \in \mathbb{Z}_p^2$ and computes $A_{i,\beta} = g^{a_{i,\beta}} \in \mathbb{G}_1$ for $i \in [\ell]$ and $\beta \in \{0, 1\}$. The public parameters PP contain the group descriptions $\mathsf{MP}$ and $(A_{1,0}, A_{1,1}), \ldots, (A_{\ell,0}, A_{\ell,1})$.

• KeyGen(PP): Each user chooses a random element $x \in \mathbb{Z}_p$ as his signing key $SK$. The corresponding verification key is $VK = g^x \in \mathbb{G}_1$.

• Sign$(SK, M)$: Given a message $M$ of length $\ell$, let $m_1, \ldots, m_\ell$ be the bits of this message. The signer computes the signature as (footnote 2):

$$\sigma = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell})^x = \left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x \in \mathbb{G}_{k-1}.$$

• Vrfy$(VK, M, \sigma)$: Given a verification key $VK$ and a purported signature $\sigma$ on message $M$, verify the following equation:

$$e(\sigma, g) \stackrel{?}{=} e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK).$$

Footnote 2: In the bilinear BLS signature scheme [6], the signer computes a signature as $\sigma = H(M)^x$, where $H(\cdot)$ is a collision-resistant hash function that will be treated as a random oracle in the proof. In the multilinear BLS signature scheme, $H(M)$ is defined as $e(A_{1,m_1}, \ldots, A_{\ell,m_\ell})$, which can be computed from the public parameters of the (leveled) multilinear maps.

Correctness. To see the correctness, a signature $\sigma$ on message $M$ is $\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x$, and thus we have

$$e(\sigma, g) = e\left(\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}\right)^x, g\right) = e\left(g_{k-1}^{\prod_{i=1}^{\ell} a_{i,m_i}}, g^x\right) = e(A_{1,m_1}, \ldots, A_{\ell,m_\ell}, VK).$$

4.2 Security of Multilinear BLS Scheme in the Standard Model

We now prove the security of the multilinear BLS scheme in the standard model based on the MCDH assumption. Our proof needs an admissible hash function $h$, which can be used to partition the message space into two subsets with probability $1/\theta(q)$ (where $q$ is the upper bound on the number of the adversary's queries) so that: the adversary's query messages $M_i$ fall in one subset where we know a trapdoor that allows us to answer its queries, and the adversary's challenge message $M^*$ falls in the other subset where we do not know any trapdoor but hope to embed a challenge element. We show that we can leverage the structure of the multilinear BLS signature scheme to prove adaptive security. For simplicity of exposition, we assume that there is a polynomial $s(\lambda)$ which denotes the length of the messages to be signed. We use a function $h : \{0, 1\}^{s(\lambda)} \to \{0, 1\}^{\ell(\lambda)}$ that maps the messages to $\ell$ bits, and an efficient randomized algorithm Sample, that are $\theta$-admissible. The following definition of admissible hash functions is from [17], which is a slight variant of the simplified definition in [11].

Definition 4. Let $s$, $\ell$ and $\theta$ be efficiently computable univariate polynomials. We say that a function $h : \{0, 1\}^{s(\lambda)} \to \{0, 1\}^{\ell(\lambda)}$, and an efficient randomized algorithm Sample, are $\theta$-admissible if the following properties hold:

• For any $u \in \{0, 1, \bot\}^\ell$, define $P_u : \{0, 1\}^s \to \{0, 1\}$ as follows: $P_u(X) = 0$ iff $\forall i : h(X)_i \ne u_i$, and otherwise (if $\exists i : h(X)_i = u_i$) we have $P_u(X) = 1$.

• We require that for any efficiently computable polynomial $q(\lambda)$, for all $X_1, \ldots, X_q, Z \in \{0, 1\}^s$, where $Z \notin \{X_i\}$, we have $\Pr[P_u(X_1) = \ldots = P_u(X_q) = 1 \wedge P_u(Z) = 0] \ge 1/\theta(q)$, where the probability is taken only over $u \leftarrow \mathsf{Sample}(1^\lambda, q)$.

Theorem 1. For any efficiently computable polynomials $s$, $\ell$, there exists an efficiently computable polynomial $\theta$ such that there exist $\theta$-admissible function families mapping $s$ bits to $\ell$ bits.

The construction is identical to the multilinear BLS scheme, except that the Setup algorithm also creates the admissible hash function. The signing and verification algorithms then take as input $h(M)$ instead of $M$.

Theorem 2. If $h$ is a $\theta$-admissible function and the $k$-MCDH problem is hard in the multilinear groups, then the multilinear BLS signature scheme with an admissible hash is adaptively secure.
Proof. If there exists a PPT adversary $\mathcal{A}$ who can break the security of the multilinear BLS signature scheme with an admissible hash in the EU-CMA game with advantage $\epsilon$ for message length $s$, level $k$ of the multilinear maps, and security parameter $\lambda$, then we can construct a PPT challenger $\mathcal{B}$ to break the $k$-MCDH assumption with probability $\epsilon' \ge \epsilon/\theta(q)$. The challenger $\mathcal{B}$ takes as input a $k$-MCDH instance $(g^{c_1}, \ldots, g^{c_k})$ together with the group descriptions $\mathsf{MP}$ to interact with the adversary. The challenger's goal is to compute $g_{k-1}^{\prod_{i \in [k]} c_i}$.

We describe the proof as a sequence of hybrid games where the first hybrid corresponds to the original EU-CMA game. In the first hybrid step we do a “partitioning” of the message space. After the first proof step, we prove that the advantage of any PPT adversary differs by at most a negligible gap between successive hybrid games. We finally show that any PPT adversary in the final game that succeeds with non-negligible advantage can be used to break the $k$-MCDH assumption.

• Game0 is the original EU-CMA game.

1) Setup: Challenger $\mathcal{B}$ runs $\mathsf{MulGen}(1^\lambda, k)$ to produce the group parameters $\mathsf{MP}$. It then chooses a random exponent $x \in \mathbb{Z}_p$ for the secret key and sets the challenge verification key as $VK^* = g^x$. It also randomly chooses $(a_{1,0}, a_{1,1}), \ldots, (a_{\ell,0}, a_{\ell,1})$ from $\mathbb{Z}_p$ and computes $A_{i,\beta} = g^{a_{i,\beta}}$ for $i \in [\ell]$, $\beta \in \{0, 1\}$. Finally, it sets $\mathsf{PP} = (\mathsf{MP}, \{A_{i,\beta} \mid i \in [\ell], \beta \in \{0, 1\}\})$ and gives PP and $VK^*$ to the adversary $\mathcal{A}$.

2) Signing queries: Adversary $\mathcal{A}$ adaptively queries the signing oracle at most $q$ times on messages $M_1, \ldots, M_q$. In its $i$-th query, it receives back $e(A_{1,m_{i1}}, \ldots, A_{\ell,m_{i\ell}})^x$ from challenger $\mathcal{B}$.

3) Output: At some point $\mathcal{A}$ outputs a forgery $\sigma^*$ with respect to the challenge key $VK^*$ and message $M^*$; it wins the game if $\mathsf{Vrfy}(VK^*, M^*, \sigma^*) = 1$ and $M^* \ne M_i$ for $i \in [q]$.

• Game1 is the same as Game0 except that the challenger begins by sampling a string $u \in \{0, 1, \bot\}^\ell$ by invoking $u \leftarrow \mathsf{Sample}(1^\lambda, q)$. At the end of the game, the adversary is only considered to be successful if its output satisfies the winning conditions and, in addition, we have $P_u(M^*) = 0$ for the challenge message $M^*$ and $P_u(M_i) = 1$ for all queried messages $M_i$.

• Game2 is the same as Game1 except for the following modification. The challenger sets the parameters $(A_{1,0}, A_{1,1}), \ldots, (A_{\ell,0}, A_{\ell,1})$ in the following way: for $i \in [\ell]$ and $\beta \in \{0, 1\}$ it chooses random $b_{i,\beta} \in \mathbb{Z}_p$ and sets

$$A_{i,\beta} = \begin{cases} g^{b_{i,\beta}}, & \text{if } \beta = u_i \\ (g^{c_i})^{b_{i,\beta}}, & \text{if } \beta \ne u_i. \end{cases}$$

Lemma 1. Assume an adversary that makes at most a polynomial number of signing queries $q = q(\lambda)$ in Game0. If the advantage of an adversary in Game0 is $\epsilon$, then the advantage of the adversary in Game1 is at least $\epsilon/\theta(q)$. In particular, any PPT adversary with non-negligible advantage in Game0 will also have non-negligible advantage in Game1.

Proof. The lemma follows immediately from the fact that $h$ satisfies the definition of $\theta$-admissibility, since the independent choice of $u \leftarrow \mathsf{Sample}(1^\lambda, q)$ alone determines whether or not the game aborts.

Lemma 2. The advantage of any PPT adversary in Game2 is the same as its advantage in Game1.

Proof. The two games are equivalent, as all $A_{i,\beta} \in \mathbb{G}_1$ are still chosen uniformly at random in both games.

Lemma 3. If the $k$-MCDH assumption holds, then the advantage of any PPT adversary in Game2 is negligible.

Proof. We prove this lemma by giving a reduction to the $k$-MCDH assumption. To do so, we construct an algorithm $\mathcal{B}$.

$\mathcal{B}$ takes as input a $k$-MCDH problem instance $(\mathsf{MP}, g^{c_1}, \ldots, g^{c_k})$. Next, $\mathcal{B}$ runs $u \leftarrow \mathsf{Sample}(1^\lambda, q)$. It sets the challenge key as $VK^* = g^{c_k}$. All these steps together simulate the Setup phase of Game2. Now, it plays the game with the adversary $\mathcal{A}$ by using public parameters $\mathsf{PP} = (\mathsf{MP}, \{A_{i,\beta} \mid i \in [\ell], \beta \in \{0, 1\}\})$ and challenge key $VK^*$.

The adversary $\mathcal{A}$ will then adaptively make at most $q$ signing queries, each for a message $M_i$. If $P_u(M_i) = 0$, $\mathcal{B}$ aborts and quits. Otherwise, $P_u(M_i) = 1$ and there exists a $\gamma$ such that $h(M_i)_\gamma = u_\gamma$. Thus, $\mathcal{B}$ can compute the signature as

$$\sigma = e\left(A_{1,h(M_i)_1}, \ldots, A_{\gamma-1,h(M_i)_{\gamma-1}}, A_{\gamma+1,h(M_i)_{\gamma+1}}, \ldots, A_{\ell,h(M_i)_\ell}, VK^*\right)^{b_{\gamma,h(M_i)_\gamma}}$$

by knowing the exponent $b_{\gamma,h(M_i)_\gamma}$ of the parameter $A_{\gamma,h(M_i)_\gamma}$.

Finally, the adversary $\mathcal{A}$ outputs an attempted forgery $\sigma^*$ with respect to the challenge verification key $VK^*$ on some message $M^*$. $\mathcal{B}$ first checks the signature verification $\mathsf{Vrfy}(VK^*, M^*, \sigma^*)$ and aborts if it returns 0. Next, it checks if $P_u(M^*) = 1$ and aborts if that is the case. Otherwise, $P_u(M^*) = 0$ and for all $i$ we have $h(M^*)_i \ne u_i$. This means that the hash of $M^*$ will be $g_{k-1}^{\prod_{i \in [k-1]} c_i}$ raised to some known product of $b_{i,\beta}$ values. The signature therefore contains $g_{k-1}^{\prod_{i \in [k]} c_i}$ raised to some known product of $b_{i,\beta}$ values. This value can be recovered by taking the proper root of the signature, i.e.,

$$(\sigma^*)^{1/\prod_{i \in [\ell]} b_{i,h(M^*)_i}} = g_{k-1}^{\prod_{i \in [k]} c_i},$$

and thus if $\sigma^*$ is a successful forgery, then this root of the signature is a solution to the challenge instance of the $k$-MCDH problem.

By construction of the algorithm $\mathcal{B}$, the probability that $\mathcal{B}$ succeeds is exactly the advantage with which the adversary $\mathcal{A}$ succeeds in Game2. Whenever $\mathcal{B}$ aborted, the adversary by the rules of Game2 was not considered to be successful since its queries or forgery violated the partition. The lemma follows.

These three lemmas together yield the main theorem that the multilinear BLS signature scheme with an admissible hash is adaptively secure.

5 Ring Signatures from Multilinear BLS Scheme

In this section, we show the applications of the multilinear BLS signatures. The scheme can serve as the key-generation algorithm of the multilinear IBE scheme [11], and it can also be used to construct aggregate signatures [17]. In addition, based on Boldyreva's [7] work, we can easily obtain a threshold signature, a multi-signature, and a blind signature scheme, respectively, based on the multilinear BLS scheme in the standard model. Here, we take advantage of the multilinear BLS scheme to construct a ring signature scheme in the standard model. The resultant scheme has the attractive feature that, for $n$ members of a ring, our signatures consist of just a single group element.

5.1 Definition of Ring Signatures

For convenience, we define an algorithm Setup, run by a trusted authority, to generate public parameters and build the system of the ring signature scheme. The public parameters will be used in all of the following three algorithms. In addition, we refer to an ordered set $R = \{VK_1, \ldots, VK_n\}$ of verification keys as a ring, and let $R[i] = VK_i$. We will also freely use set notation, e.g., $VK \in R$ if there exists an index $i$ such that $R[i] = VK$.

Definition 5. A ring signature scheme contains the following four algorithms:

• Setup$(1^\lambda) \to \mathsf{PP}$: The system setup algorithm takes as input a security parameter $\lambda$ to produce the system public parameters PP.

• KeyGen$() \to (SK, VK)$: The key generation algorithm generates users' signing and verification keys $(SK, VK)$.

• Sign$(SK_s, R, M) \to \sigma$: The signing algorithm takes as input a message $M$ to be signed, a set of verification keys $R$ (i.e., the ring), and a user's signing key $SK_s$. It is required that $VK_s \in R$, meaning that the signer is a member of the signing ring. The algorithm outputs a signature $\sigma$.

5.2 Security Models of Ring Signatures

The security model of a ring signature scheme contains two parts: unforgeability and anonymity.

5.2.1 Ring Unforgeability

This security notion guarantees that an adversary can compute a valid signature on behalf of a ring only if he knows a secret key corresponding to one of its members. In this work, we use Bender et al.'s [5] model: unforgeability with respect to insider corruption (footnote 3), which is defined by the following game:

1) Setup: The challenger runs the Setup and KeyGen algorithms to generate public parameters and users' keys $\{(SK_i, VK_i)\}_{i=1}^{n(\lambda)}$. Then it gives the adversary $\mathcal{A}$ the system parameters and verification keys $S = \{VK_i\}_{i=1}^{n(\lambda)}$. In addition, the challenger maintains a set $C$ to record the corrupted users; initially, $C \leftarrow \emptyset$.

2) Signing queries: The adversary $\mathcal{A}$ can adaptively make signing queries on inputs $(M, R, s)$, where $M$ is the message to be signed, $R \subseteq S$ is a ring of verification keys, and $s$ is an index such that $VK_s \in R$. The challenger returns a ring signature $\sigma \leftarrow \mathsf{Sign}(SK_s, R, M)$ to $\mathcal{A}$.

3) Corruption queries: The adversary $\mathcal{A}$ can also adaptively make corruption queries on input $s \in [n(\lambda)]$. The challenger returns $SK_s$ to $\mathcal{A}$ and adds $VK_s$ to the set $C$.

4) Output: Finally, the adversary $\mathcal{A}$ outputs a tuple $(M^*, \sigma^*, R^*)$. We say that $\mathcal{A}$ wins the game if the following conditions hold: (1) $\mathsf{Vrfy}(R^*, M^*, \sigma^*) = 1$; (2) $R^* \subseteq S \setminus C$; (3) it never made a signing query $(M^*, R^*, s)$ for any $s$.

We denote the success probability of a PPT adversary $\mathcal{A}$ (taken over the random choices of the challenger and adversary) to win the above game as $\mathsf{Adv}_{\mathcal{A}}^{\mathrm{Unf}}$.

Definition 6. We say that a ring signature scheme has the property of unforgeability with respect to insider corruption if no PPT adversary $\mathcal{A}$ can win the above game with non-negligible advantage.

Selective security. We define a weaker notion, selective security, for the above model. In the game of selective security, the adversary $\mathcal{A}$ is required to give a forgery ring/message pair $(R^*, M^*)$ (footnote 4) to the challenger before the setup phase; then it cannot make

Footnote 3: We make use of a weaker notion of this security model in which corruptions of honest users are allowed but adversary-chosen public keys are not allowed.
This weaker notion has been used in [24, 27]. 4 n(λ) In the beginning, A does not given the keys S = {VKi}i=1 . • Vrfy(R, M, σ) → 0/1: The verification algorithm In order to obtain the forgery ring R∗, we require that A out- takes as input a purported signature σ on a ring R puts a set of index IR∗ = {i1, . . . , i|R∗|} ⊆ [n(λ)]. Then, after and a message M. It outputs 1 if σ is valid. Other- n(λ) ∗ the keys S = {VKi}i=1 be generated, the forgery ring R = wise, it outputs 0. {VKi1 ,...,VKi|R∗| } ⊆ S also be defined. International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 733
signing query on inputs (M ∗,R∗, s) for any s, it also • Setup(1λ, n, `): Trusted authority takes as input cannot make corruption query on input s for which a security parameter λ, the length ` of messages ∗ VKs ∈ R . and ring size n to runs this algorithm to gen- erate public parameters. It first runs MP = λ Definition 7. We say that a ring signature scheme is (G1,..., Gk, p, g, . . . , gk, ei,j) ← MulGen(1 , k = n + selectively unforgeable with respect to insider corruption, `). Next, it chooses 2` random values (a1,0, a1,1),..., 2 ai,β if for any PPT adversary A, it cannot win the selective (a`,0, a`,1) ∈ Zp and computes Ai,β = g ∈ G1, for game with non-negligible advantage. i ∈ [`] and β ∈ {0, 1}. The public parameters PP con- tain the group descriptions MP and group elements (A ,A ), ..., (A ,A ). 5.2.2 Ring Anonymity 1,0 1,1 `,0 `,1 • KeyGen( ): Each user i chooses a random value This security guarantees that any verifier can be con- PP x ∈ as his signing key SK . The corresponding vinced that someone in the ring has generated a valid ring i Zp i verification key is VK = gxi ∈ . signature, but the real signer remains unknown. In this i G1 paper, we make use of the notion of perfect anonymity. • Sign(M,SKs,R = {VK1,...,VKn}): Given a ring We say that a ring signature scheme is perfectly anony- of n verification keys, the holder of signing key SKs ∗ ∗ mous, if a signature on a message M under a ring R with s ∈ [n] can sign some message M ∈ M as σ = and key VK looks exactly the same as a signature on i0 e(A1,m ,...,A`,m ,VK1,...,VKs−1,VKs+1,..., ∗ ∗ 1 ` the same message M under the same ring R and a dif- xs VKn) . The signature consists of just a single group ferent key VK . This means that the signer’s key is hid- Q` Qn i1 ( i=1 ai,mi )·( j=1 xj ) den among all the honestly generated keys in the ring. element. In fact, σ = gk−1 ∈ Gk−1. 
Formally, it is defined by the following game: • Vrfy(M, σ, R = {VK1,...,VKn}): Given a ring of n verification keys and a purported signature σ on a 1) Setup: The challenger runs Setup and KeyGen algo- ? message M, check the following equation: e(σ, g) = rithms to generate public parameters and users’ keys n(λ) e(A1,m1 ,...,A`,m` ,VK1,...,VKn). {(SKi,VKi)}i=1 . Then it returns back the public n(λ) parameters and all keys {(SKi,VKi)}i=1 to the ad- Correctness. To see the correctness, a signature σ on Q` Qn versary A. ( i=1 ai,mi )·( j=1 xj ) message M and ring R is gk−1 , and Q` Qn ( i=1 ai,mi )·( j=1 xj ) 2) Challenge: The adversary A gives a tuple of thus we have e(σ, g) = e(gk−1 , g) = ∗ ∗ ∗ Q` ai,m Qn (M ,R , i0, i1), where M is the challenge message, i=1 i j=1 xj ∗ e(gk−1 , g ) = R is the challenge ring, i0 and i1 are two indices ∗ e(A1,m1 ,...,A`,m` ,VK1,...,VKn). such that {VKi0 ,VKi1 } ⊆ R , to the challenger. The challenger chooses random b ∈ {0, 1}, computes In the setting of the multilinear maps, the space to repre- σ∗ ← Sign(M ∗,SK ,R∗), and sends σ∗ to the ad- ib sent a group element might grow with k (which is n + `), versary. because this happens in the GGH [12] framework. To mit- igate this problem, we can use the method in [17], which 3) Guess: Finally, the adversary outputs b0, indicating differs the message alphabet size in a tradeoff between his guess for b. computation and storage. The above construction uses a binary message alphabet. If it uses an alphabet of 2d sym- We denote the advantage of an unbounded adversary A bols, then the ring signature could resident in the group (taken over the random choices of the challenger and the with `/d + n − 1 pairings required to compute it, adversary) to win the above game as AdvAno = |Pr[b0 = G`/d+n A at the cost of the public parameters requiring 2d · ` group b] − Pr[b0 6= b]|. elements in G1. Definition 8. 
A ring signature scheme has the property of perfect anonymity, if even an unbounded adversary can- 5.4 Security not win the above game with non-negligible advantage. Theorem 3. The ring signature scheme with message length ` and ring size n in the above is selectively unforge- 5.3 Construction able with respect to insider corruption under the (n + `)- MCDH assumption. We now construct a ring signature scheme based on the multilinear BLS scheme. We specify the message space Proof. If there exists a PPT adversary A who can break M := {0, 1}`, more generally, a collision resistant hash the selective security of the ring signature scheme with ad- function can be used to hash messages to this size. Let vantage for message length `, ring size n, level k = n + ` m1, . . . , m` be the bits of the message M ∈ M. The of multilinear maps, and security parameter λ, then we following construction is an n-user ring signature scheme, show that we can construct a PPT challenger B to break means that |R| = n. the k-MCDH assumption for security parameter λ with International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 734
∗ ` probability . Initially, A gives (M ∈ {0, 1} ,IR∗ = Given a ring signature, we show that any ring member ∗ {i1, . . . , in} ⊆ [n(λ)]) to B who is given an instance, could possibly have created it. Consider a signature σ c1 c ∗ ∗ (MP, g , . . . , g k ), of the k-MCDH assumption. on ring R = {VK1,...,VKn} and message M , that has been created using key SK . We will show that with the 1) Setup: The challenger B first sets signing and i0 same probability it could have been created using SK verification keys for the challenge ring (SK = i1 i1 with i 6= i . The proof is straight-forward. c`+1 ck 1 0 c`+1,VKi1 = g ),..., (SKin = ck,VKin = g ) (it does not know these ci). For indices i 6∈ IR∗ , it ∗ ∗ Proof. For any tuple (M ,R , i0, i1) which are chosen chooses random xi ∈ Zp and sets SKi = xi,VKi = xi by an unbounded adversary A, the signatures created g . It then generates parameters as follows: ∗ by the member i and i are σ = e(A ∗ ,..., 0 1 i0 1,m1 x A ∗ ,VK ,...,VK ,VK ,...,VK ) i0 • Choose random integers a1, . . . , a` ∈ Zp. `,m` 1 i0−1 i0+1 n ∗ ci and σ = e(A1,m∗ ,...,A`,m∗ , • For i ∈ [`], set Ai,m∗ = g and compute i1 1 ` i xi ai VK ,...,VK ,VK ,...,VK ) 1 , respectively. A ¯∗ = g . 1 i1−1 i1+1 n i,mi Q` Qn ( ai,m∗ )·( xi) However, σ∗ = σ∗ = g i=1 i i=1 since the Note that these parameters are distributed uniformly i0 i1 k−1 at random as in the real ring signature scheme. Then signing algorithm is deterministic. Therefore, any mem- ber of a ring can compute a same signature on a given B sets the public parameters PP = (MP, {Ai,β|i ∈ message and ring. The perfect anonymity follows easily [`], β ∈ {0, 1}}). Finally, it gives and {VK }n(λ) PP i i=1 from this observation. to A. 2) Signing queries: Conceptually, the challenger B can generate signatures for the adversary, because 6 Conclusion the adversary’s requests and the challenge ring or message will be different in at least one bit. 
Specifi- In this work, we consider the BLS signature scheme in the cally, when A makes a query to the signing oracle on setting of multilinear groups. First of all, we present a ∗ input (M,R = {VK1,...,VKn}, s). If R 6= R , we proof of adaptive security for the multilinear BLS scheme ∗ assume that VKj ∈ R but 6∈ R , then B ignores the based on MCDH assumption. Then, we construct a index s and signs M with SKj in the usual way since ring signature scheme that, based on the multilinear BLS ∗ B knows VKj’s singing key xj. If R = R , then we scheme, has an attractive feature that for n members of a ∗ ∗ ring the signatures consist of just a single group element. know M 6= M and assume that mγ 6= mγ , where mγ ∗ ∗ and mγ are the γ-th bit of the message M and M ,
respectively. Hence B can compute σ = e(A1,m1 , ..., aγ Acknowledgments Aγ−1,mγ−1 ,Aγ+1,mγ+1 ,...,A`,m` , VK1, ...,VKn) by knowing the exponent a of the parameter A ¯∗ . γ γ,mγ Finally, it returns σ to the adversary A. This study was supported by the National Natural Sci- ence Foundation of China under Grant 61702067, the 3) Corruption queries: When A makes a query to the Natural Science Foundation of Chongqing under Grants corruption oracle with input an index i for i 6∈ IR∗ , cstc2017jcyjAX0201 and cstc2019jcyj-msxmX0551, and B gives SKi to A and adds VKi to the set C of the the key project of science and technology research pro- corrupted users. gram of Chongqing Education Commission of China under Grants KJZD-K201803701 and KJQN201903701. 4) Output: Finally, A outputs a forgery σ∗ with re- The authors gratefully acknowledge the anonymous re- spect to the challenge ring R∗ = {VK ,...,VK } i1 in viewers for their valuable comments. and message M ∗. Then B outputs σ∗ as the so- lution to the given instance of the k-MCDH as- sumption. According to the setting of the public parameters and the verification keys of the chal- References lenge ring in the setup phase, and the assumption [1] M. R. Albrecht, P. Farshim, D. Hofheinz, E. Lar- that σ∗ is valid, we know that σ∗ should be equal raia, K. G. Paterson, ”Multilinear maps from ob- to e(A1,m∗ ,...,A`,m∗ ,VK1,...,VKj−1,VKj+1,..., 1 Q ` fuscation,” in Proceedings of Part I of the 13th In- ci c`+j i∈[1,k] VKn) = gk−1 , where c`+j is a certain sign- ternational Conference on Theory of Cryptography ing key A uses. It implies that σ∗ is a solution for the (TCC’16), vol. 9562, pp. 446-473, 2016. given instance to the k-MCDH problem, and thus B [2] D. Boneh, X. Boyen, ”Secure identity based encryp- breaks the k-MCDH assumption. tion without random oracles,” in Annual Interna- tional Cryptology Conference, pp. 443-459, 2004. It is clear that B succeeds whenever A does. [3] D. Boneh, X. 
Boyen, ”Short signatures without Theorem 4. The ring signature scheme with message random oracles,” in International Conference on length ` and ring size n in the above is anonymous against the Theory and Applications of Cryptographic Tech- any unbounded adversary. niques, vol. 3027, pp. 56-73, 2004. International Journal of Network Security, Vol.22, No.5, PP.728-735, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).02) 735
[4] D. Boneh, M. Franklin, "Identity-based encryption from the Weil pairing," in Annual International Cryptology Conference, vol. 2139, pp. 213-229, 2001.
[5] A. Bender, J. Katz, R. Morselli, "Ring signatures: Stronger definitions, and constructions without random oracles," Journal of Cryptology, vol. 22, no. 1, pp. 114-138, 2009.
[6] D. Boneh, B. Lynn, H. Shacham, "Short signatures from the Weil pairing," Journal of Cryptology, vol. 17, no. 4, pp. 297-319, 2004.
[7] A. Boldyreva, "Threshold signatures, multisignatures and blind signatures based on the gap-Diffie-Hellman-group signature scheme," in International Workshop on Public Key Cryptography, pp. 31-46, 2002.
[8] D. Boneh, A. Silverberg, "Applications of multilinear forms to cryptography," Contemporary Mathematics, vol. 324, pp. 71-90, 2002.
[9] J. S. Coron, T. Lepoint, M. Tibouchi, "New multilinear maps over the integers," in Annual Cryptology Conference, vol. 9215, pp. 267-286, 2015.
[10] L. Chen, J. Malone-Lee, "Improved identity-based signcryption," in International Workshop on Public Key Cryptography, vol. 3386, pp. 362-379, 2005.
[11] E. S. V. Freire, D. Hofheinz, K. G. Paterson, C. Striecks, "Programmable hash functions in the multilinear setting," in Annual Cryptology Conference, vol. 8042, pp. 513-530, 2013.
[12] S. Garg, C. Gentry, S. Halevi, "Candidate multilinear maps from ideal lattices," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 7881, pp. 1-17, 2013.
[13] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, B. Waters, "Candidate indistinguishability obfuscation and functional encryption for all circuits," in IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 40-49, 2013. ISBN: 978-0-7695-5135-7.
[14] C. Gentry, S. Gorbunov, S. Halevi, "Graph-induced multilinear maps from lattices," in Theory of Cryptography Conference, vol. 9015, pp. 498-527, 2015.
[15] Y. Hu, H. Jia, "Cryptanalysis of GGH map," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 9665, pp. 537-565, 2016.
[16] M. S. Hwang, C. C. Lee, Y. C. Lai, "An untraceable blind signature scheme," IEICE Transactions on Foundations, vol. E86-A, no. 7, pp. 1902-1906, 2003.
[17] S. Hohenberger, A. Sahai, B. Waters, "Full domain hash from (leveled) multilinear maps and identity-based aggregate signatures," in Annual Cryptology Conference, vol. 8042, pp. 494-512, 2013.
[18] S. Hohenberger, A. Sahai, B. Waters, "Replacing a random oracle: full domain hash from indistinguishability obfuscation," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 8441, pp. 201-220, 2014.
[19] M. S. Hwang, C. C. Lee, Y. C. Lai, "An untraceable blind signature scheme," IEICE Transactions on Foundations, vol. E86-A, no. 7, pp. 1902-1906, July 2003.
[20] C. C. Lee, M. S. Hwang, W. P. Yang, "A new blind signature based on the discrete logarithm problem for untraceability," Applied Mathematics and Computation, vol. 164, no. 3, pp. 837-841, May 2005.
[21] H. T. Lee, H. J. Seo, "Security analysis of multilinear maps over the integers," in Annual Cryptology Conference, vol. 8616, pp. 224-240, 2014.
[22] A. Langlois, D. Stehlé, R. Steinfeld, "GGHLite: More efficient multilinear maps from ideal lattices," Advances in Cryptology, vol. 8441, pp. 239-256, 2014.
[23] R. Rivest, A. Shamir, Y. Tauman, "How to leak a secret," in International Conference on the Theory and Application of Cryptology and Information Security, vol. 2248, pp. 552-565, 2001.
[24] S. Schäge, J. Schwenk, "A CDH-based ring signature scheme with short signatures and public keys," in International Conference on Financial Cryptography and Data Security, vol. 6052, pp. 129-142, 2010.
[25] B. Waters, "Efficient identity-based encryption without random oracles," in Annual International Conference on the Theory and Applications of Cryptographic Techniques, vol. 3494, pp. 114-127, 2005.
[26] H. Wang, D. He, J. Shen, Z. Zheng, X. Yang, M. H. Au, "Fuzzy matching and direct revocation: A new CP-ABE scheme from multilinear maps," Soft Computing, vol. 22, no. 7, pp. 2267-2274, 2017.
[27] F. Tang, H. Li, "Ring signatures of constant size without random oracles," in International Conference on Information Security and Cryptology, vol. 8957, pp. 93-108, 2015.
[28] F. Tang, H. Li, B. Liang, "Attribute-based signatures for circuits from multilinear maps," in International Conference on Information Security, vol. 8783, pp. 54-71, 2014.
[29] F. Tang, Y. Zhou, "Policy-based signatures for predicates," International Journal of Network Security, vol. 19, no. 5, pp. 811-822, 2017.

Biography

Fei Tang received his Ph.D. from the Institute of Information Engineering of the Chinese Academy of Sciences in 2015. He is currently an associate professor at the College of Cyberspace Security and Law, Chongqing University of Posts and Telecommunications. His research interests are public key cryptography and blockchain.

Dong Huang received his Ph.D. from Chongqing University in 2012. He is currently a professor at the Chongqing University of Science and Technology and the Chongqing Vocational and Technical University of Mechatronics. His research interest is public key cryptography.

International Journal of Network Security, Vol.22, No.5, PP.736-742, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).03) 736
Eighth Power Residue Double Circulant Self-Dual Codes
Changsong Jiang1,3, Yuhua Sun1,2, and Xueting Liang1 (Corresponding author: Yuhua Sun)
College of Science, China University of Petroleum1 Qingdao, Shandong 266580, China Provincial Key Laboratory of Applied Mathematics, Putian University, Putian, Fujian 351100, China2 School of Computer Science and Engineering, University of Electronic Science and Technology of China3 (Email: sunyuhua [email protected]) (Received Mar. 1, 2019; Revised and Accepted Sept. 16, 2019; First Online Jan. 23, 2020)
Abstract

Self-dual codes are one of the most important classes of linear codes. Power residue classes are widely used in the constructions of linear codes and pseudo-random sequences. In this paper, we give new constructions of self-dual codes over GF(2) and GF(4) by eighth power residues. We get multiple pure double circulant codes and bordered double circulant codes. Some of these new self-dual codes have large minimum distances.

Keywords: Cyclotomic Number; Double Circulant Code; Eighth Power Residues; Self-Dual Code

1 Introduction

The famous paper "A mathematical theory of communication" [26] by Shannon marked the beginning of coding theory. Codes with good properties have many applications in cryptography and communication systems. Most of the codes constructed in the initial stage were binary codes. Now, codes over finite fields and over finite rings are very common in both the mathematical and the engineering literature. Thanks to their neat mathematical structure and ease of encoding and decoding, linear codes play a decisive role in coding theory. It is worth noting that, among linear codes, there is one special class, the self-dual codes, which are widely used in data transmission and have become important tools to construct quantum error-correcting codes. Therefore various methods of construction and analysis of self-dual codes have been presented by coding researchers, and various classes of linear codes with the self-dual property have appeared successively in the literature. For example, readers can refer to [1–6,9,11,17–19,22,24,25,28,29] or to the survey paper [13] for the advances of early research in this field. It is well known that power residue classes have become an important tool to construct stream cipher sequences with good pseudo-random properties (for example, see [7, 23, 27]). In fact, they have also been used to construct error-correcting codes, and a very interesting method of constructing linear codes or self-dual codes is to combine double circulant matrices and residue classes to give the generator matrix of the codes (for example, see [8, 10, 12, 16, 21]). However, in most of the relevant literature at present, the residues used to construct codes are mainly quadratic residues.

Recently, Zhang and Ge introduced fourth power residue double circulant codes and obtained several new infinite families of self-dual codes over GF(2), GF(3), GF(4), GF(8), GF(9) [30]. Some of these codes have better minimum weight than previously known codes. In this paper, inspired by their methods, we construct double circulant self-dual codes by higher power residues, especially eighth power residues. We give new constructions of self-dual codes over GF(2) and GF(4) from primes $p$ of the form $16l + 9$, and some of these codes have good parameters. Examples of such codes are a binary self-dual [82, 41, 14] code, a quaternary self-dual [82, 41, 14] code and a quaternary self-dual [84, 42, 12] code. All computations have been done by MATLAB R2017b and MAGMA V2.12 on a 2.50 GHz CPU.

The paper is organized as follows. In Section 2, we give the relevant knowledge of double circulant codes and self-dual codes. In Section 3, we describe the detailed process of constructing linear codes by eighth power residues and discuss the parameter conditions satisfying the self-dual property. Section 4 considers the constructions over GF(2) and GF(4) respectively. A conclusion is given in Section 5.

2 Preliminaries

Self-Dual Codes. A linear $[n, k]$ code $C$ of length $n$ and dimension $k$ over the Galois field GF($q$) with $q$ elements is a linear subspace of dimension $k$ of GF($q$)$^n$, where $q$ is a prime power. An element of the code $C$ is called a codeword of $C$. A generator matrix of $C$ is a matrix whose rows generate $C$. Let $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ be two vectors of GF($q$)$^n$. The Euclidean inner product is defined by $(x, y) = \sum_{i=1}^{n} x_i y_i$. For a linear code $C$, the code $C^\perp = \{x \in \mathrm{GF}(q)^n \mid (x, c) = 0 \text{ for all } c \in C\}$ is called its Euclidean dual code. We say $C$ is self-orthogonal if $C \subseteq C^\perp$ and $C$ is self-dual if $C = C^\perp$.

Definition 1. [30]: Let $P_n(R)$ and $B_n(R)$ be codes with generator matrices of the form
\[
\begin{pmatrix} I_n & R \end{pmatrix} \tag{1}
\]
and
\[
\left(\begin{array}{c|cccc}
 & \alpha & 1 & \cdots & 1 \\
 & -1 & & & \\
I_{n+1} & \vdots & & R & \\
 & -1 & & &
\end{array}\right) \tag{2}
\]
respectively, where $\alpha \in \mathrm{GF}(q)$, $I$ is the identity matrix and $R$ is an $n \times n$ circulant matrix. An $n \times n$ circulant matrix has the form
\[
\begin{pmatrix}
r_0 & r_1 & r_2 & \cdots & r_{n-1} \\
r_{n-1} & r_0 & r_1 & \cdots & r_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r_1 & r_2 & r_3 & \cdots & r_0
\end{pmatrix} \tag{3}
\]
so that each successive row is a cyclic shift of the previous one. The codes $P_n(R)$ and $B_n(R)$ are called pure double circulant and bordered double circulant, respectively.

The (Hamming) distance between two codewords $x$ and $y$, denoted by $d(x, y)$, is defined to be the number of places at which $x$ and $y$ differ. The Hamming weight of a codeword is the number of its non-zero components. The minimum distance $d(C)$ of $C$ is defined by $d(C) = \min\{d(x, y) \mid x \neq y \in C\}$, and it also equals the minimum weight of the non-zero codewords of $C$.

Let $C$ be a self-dual code over GF($q$) of length $n$ and minimum distance $d(C)$. Then the following bounds are known in [14, 22, 24, 25]. For binary self-dual codes:
\[
d(C) \le \begin{cases} 4\lfloor n/24 \rfloor + 4, & \text{if } n \not\equiv 22 \pmod{24}, \\ 4\lfloor n/24 \rfloor + 6, & \text{if } n \equiv 22 \pmod{24}. \end{cases}
\]
The minimum distance of a self-dual ternary code $C$ satisfies $d(C) \le 3\lfloor n/12 \rfloor + 3$, and for quaternary Euclidean self-dual codes $d(C) \le 4\lfloor n/12 \rfloor + 4$. The code $C$ is called extremal if equality holds. If a code has the highest possible minimum weight for its length and dimension, we call it optimal.

In this paper, we construct a circulant matrix $R$ by eighth power residues and get a necessary condition such that the corresponding codes are self-dual. Further, under this condition, we get two kinds of codes called pure eighth power residue double circulant codes and bordered eighth power residue double circulant codes, respectively. Some codes have large minimum distances and almost reach the bounds on the minimum distance.

3 Generator Matrices of Eighth Power Residue Double Circulant Self-Dual Codes

Let $p = Nf + 1$ be a prime with a fixed primitive root $g$ modulo $p$. We define the $N$th cyclotomic classes $C_0, C_1, \ldots, C_{N-1}$ of GF($p$) by
\[
C_i = \{ g^{jN+i} \mid 0 \le j \le f - 1 \},
\]
where $0 \le i \le N - 1$. We call $C_0$ the set of $N$th power residues modulo $p$, and $C_i = g^i C_0$ for $0 \le i \le N - 1$. Define the cyclotomic number $(i, j)$ of order $N$ to be the number of integers $n \pmod p$ which satisfy
\[
n \equiv g^{Ns+i}, \quad 1 + n \equiv g^{Nt+j} \pmod p,
\]
where $s, t \in \{0, 1, 2, \ldots, f - 1\}$.

In order to give the necessary conditions, we give the eighth power residue cyclotomic numbers and derive the relationships between them when $p$ is an odd prime of the form $16l + 9$.

Lemma 1. [15]: Let $p = ef + 1$ be an odd prime. Then

1) $(i, j)_e = (i', j')_e$ when $i \equiv i' \pmod e$ and $j \equiv j' \pmod e$.

2) $(i, j)_e = (e - i, j - i)_e = \begin{cases} (j, i)_e, & \text{if } f \text{ is even}, \\ (j + \frac{e}{2}, i + \frac{e}{2})_e, & \text{if } f \text{ is odd}. \end{cases}$

3) $\sum_{i=0}^{e-1} (i, j)_e = f - \delta_j$, where $\delta_j = 1$ if $j \equiv 0 \pmod e$; otherwise $\delta_j = 0$.

Let $p$ be a prime of the form $p = 16l + 9$, where $l$ is a positive integer. From Lemma 1, the relationships of the cyclotomic numbers of order 8 are
\[
\begin{aligned}
&(0,0)_8 = (4,0)_8 = (4,4)_8, \quad (0,1)_8 = (3,7)_8 = (5,4)_8, \\
&(0,2)_8 = (2,6)_8 = (6,4)_8, \quad (0,3)_8 = (1,5)_8 = (7,4)_8, \\
&(0,4)_8, \quad (0,5)_8 = (1,4)_8 = (7,3)_8, \\
&(0,6)_8 = (2,4)_8 = (6,2)_8, \quad (0,7)_8 = (3,4)_8 = (5,1)_8, \\
&(1,0)_8 = (3,3)_8 = (4,1)_8 = (4,5)_8 = (5,0)_8 = (7,7)_8, \\
&(1,1)_8 = (3,0)_8 = (4,3)_8 = (4,7)_8 = (5,5)_8 = (7,0)_8, \\
&(1,2)_8 = (2,7)_8 = (3,6)_8 = (5,3)_8 = (6,5)_8 = (7,1)_8, \\
&(1,3)_8 = (1,6)_8 = (2,5)_8 = (6,3)_8 = (7,2)_8 = (7,5)_8, \\
&(1,7)_8 = (2,3)_8 = (3,5)_8 = (5,2)_8 = (6,1)_8 = (7,6)_8, \\
&(2,0)_8 = (2,2)_8 = (4,2)_8 = (4,6)_8 = (6,0)_8 = (6,6)_8, \\
&(2,1)_8 = (3,1)_8 = (3,2)_8 = (5,6)_8 = (5,7)_8 = (6,7)_8.
\end{aligned} \tag{4}
\]
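The cyclotomic classes and cyclotomic numbers defined above are easy to tabulate for a small prime, which gives a quick sanity check of Lemma 1 and of the chains in (4). The following is a minimal sketch, assuming the illustrative prime $p = 41 = 16 \cdot 2 + 9$ (so $e = 8$ and $f = 5$ is odd); the helper names `primitive_root`, `cyclotomic_classes` and `cyclotomic_numbers` are hypothetical and not taken from the paper, and the brute-force search is only meant for small $p$.

```python
# Illustrative sketch only: helper names are hypothetical, not from the paper.

def primitive_root(p):
    """Return the smallest primitive root modulo the odd prime p (brute force)."""
    for g in range(2, p):
        # g is primitive iff its powers cover all p - 1 nonzero residues
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def cyclotomic_classes(p, N):
    """Classes C_i = { g^(jN+i) mod p : 0 <= j < f }, where p = N*f + 1."""
    if (p - 1) % N != 0:
        raise ValueError("N must divide p - 1")
    g = primitive_root(p)
    f = (p - 1) // N
    return [{pow(g, j * N + i, p) for j in range(f)} for i in range(N)]


def cyclotomic_numbers(p, N):
    """Table cyc[i][j] = #{ n mod p : n in C_i and n + 1 in C_j }."""
    classes = cyclotomic_classes(p, N)
    index = {x: i for i, cls in enumerate(classes) for x in cls}
    cyc = [[0] * N for _ in range(N)]
    for n in range(1, p - 1):  # n and n + 1 must both be nonzero mod p
        cyc[index[n]][index[n + 1]] += 1
    return cyc


p, e = 41, 8                  # assumed example: 41 = 16*2 + 9, f = 5 is odd
f = (p - 1) // e
cyc = cyclotomic_numbers(p, e)

# Lemma 1, part 2 with f odd: (i,j) = (e-i, j-i) = (j + e/2, i + e/2)
for i in range(e):
    for j in range(e):
        assert cyc[i][j] == cyc[(-i) % e][(j - i) % e]
        assert cyc[i][j] == cyc[(j + e // 2) % e][(i + e // 2) % e]

# Lemma 1, part 3: each column sums to f - delta_j
for j in range(e):
    assert sum(cyc[i][j] for i in range(e)) == f - (1 if j == 0 else 0)

# Two of the chains listed in (4)
assert cyc[0][0] == cyc[4][0] == cyc[4][4]
assert cyc[1][0] == cyc[3][3] == cyc[4][1] == cyc[4][5] == cyc[5][0] == cyc[7][7]
```

The individual values in the table depend on which primitive root is chosen, since a different root permutes the classes, but the identities checked by the assertions hold for any choice.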
Remark 1. For simplicity, in the next we denote in equation (7) have the following relationships.
t t t t A1 = A5,A2 = A6,A3 = A7,A4 = A8, 2 A1 = AA1 + BA2 + CA3 + DA4 + EA5 + FA6 + GA7 + HA8, 2 A2 = HA1 + AA2 + BA3 + CA4 + DA5 + EA6 + FA7 + GA8, 2 A3 = GA1 + HA2 + AA3 + BA4 + CA5 + DA6 + EA7 + FA8, 2 A4 = FA1 + GA2 + HA3 + AA4 + BA5 + CA6 + DA7 + EA8, A := (0, 0)8,B := (0, 1)8,C := (0, 2)8, 2 A5 = EA1 + FA2 + GA3 + HA4 + AA5 + BA6 + CA7 + DA8, 2 A6 = DA1 + EA2 + FA3 + GA4 + HA5 + AA6 + BA7 + CA8, D := (0, 3)8,E := (0, 4)8,F := (0, 5)8, 2 A7 = CA1 + DA2 + EA3 + FA4 + GA5 + HA6 + AA7 + BA8, 2 G := (0, 6)8,H := (0, 7)8,I := (1, 0)8, (5) A8 = BA1 + CA2 + DA3 + EA4 + FA5 + GA6 + HA7 + AA8, A1A2 = A2A1 = IA1 + JA2 + KA3 + LA4 + FA5 + DA6 + LA7 + MA8, J := (1, 1) ,K := (1, 2) ,L := (1, 3) , A A = A A = NA + OA + NA + MA + GA + LA + CA + KA , 8 8 8 1 3 3 1 1 2 3 4 5 6 7 8 A1A4 = A4A1 = JA1 + OA2 + OA3 + IA4 + HA5 + MA6 + KA7 + BA8, M := (1, 7)8,N := (2, 0)8,O := (2, 1)8. A1A5 = A5A1 = (2l + 1)Ip + AA1 + IA2 + AA3 + JA4 + AA5 + IA6 +NA7 + JA8, A1A6 = A6A1 = IA1 + HA2 + MA3 + KA4 + BA5 + JA6 + OA7 + OA8, A1A7 = A7A1 = NA1 + MA2 + GA3 + LA4 + CA5 + KA6 + NA7 + OA8, A1A8 = A8A1 = JA1 + AA2 + LA3 + FA4 + DA5 + LA6 + MA7 + IA8, A2A3 = A3A2 = MA1 + IA2 + JA3 + KA4 + LA5 + FA6 + DA7 + LA8, A2A4 = A4A2 = KA1 + NA2 + OA3 + NA4 + MA5 + GA6 + LA7 + CA8, A2A5 = A5A2 = BA1 + JA2 + OA3 + OA4 + IA5 + HA6 + MA7 + KA8, Let p ≡ 1(mod 8) be a prime. Its 8th cyclo- A2A6 = A6A2 = JA1 + AA2 + IA3 + NA4 + JA5 + AA6 + IA7 + NA8, A2A7 = A7A2 = OA1 + IA2 + AA3 + MA4 + KA5 + BA6 + JA7 + OA8, tomic classes are C ,C ,C ,C ,C ,C ,C and C . A2A8 = A8A2 = OA1 + NA2 + MA3 + GA4 + LA5 + CA6 + KA7 + NA8, 0 1 2 3 4 5 6 7 A3A4 = A4A3 = LA1 + MA2 + IA3 + JA4 + KA5 + LA6 + FA7 + DA8, Suppose m , m , m , m , m , m , m , m , m are the A3A5 = A5A3 = CA1 + KA2 + NA3 + OA4 + NA5 + MA6 + GA7 + LA8, 0 1 2 3 4 5 6 7 8 A3A6 = A6A3 = KA1 + BA2 + JA3 + OA4 + OA5 + IA6 + HA7 + MA8, elements of GF(q). 
Then we construct the matrix $C_p(m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8)$, which is a $p \times p$ matrix on GF(q). The component $c_{ij}$, $1 \le i, j \le p$, is defined by

\[
c_{ij} =
\begin{cases}
m_0, & \text{if } j = i,\\
m_1, & \text{if } j - i \in C_0,\\
m_2, & \text{if } j - i \in C_1,\\
m_3, & \text{if } j - i \in C_2,\\
m_4, & \text{if } j - i \in C_3,\\
m_5, & \text{if } j - i \in C_4,\\
m_6, & \text{if } j - i \in C_5,\\
m_7, & \text{if } j - i \in C_6,\\
m_8, & \text{if } j - i \in C_7.
\end{cases}
\tag{6}
\]

Let $I_n$ be the identity matrix and $J_n$ be the all-one square matrix, so that $C_p(1, 0, 0, 0, 0, 0, 0, 0, 0) = I_p$ and $C_p(1, 1, 1, 1, 1, 1, 1, 1, 1) = J_p$. Denote

\[
\begin{aligned}
A_1 &:= C_p(0, 1, 0, 0, 0, 0, 0, 0, 0), & A_2 &:= C_p(0, 0, 1, 0, 0, 0, 0, 0, 0),\\
A_3 &:= C_p(0, 0, 0, 1, 0, 0, 0, 0, 0), & A_4 &:= C_p(0, 0, 0, 0, 1, 0, 0, 0, 0),\\
A_5 &:= C_p(0, 0, 0, 0, 0, 1, 0, 0, 0), & A_6 &:= C_p(0, 0, 0, 0, 0, 0, 1, 0, 0),\\
A_7 &:= C_p(0, 0, 0, 0, 0, 0, 0, 1, 0), & A_8 &:= C_p(0, 0, 0, 0, 0, 0, 0, 0, 1).
\end{aligned}
\tag{7}
\]

And the construction of the n × n circulant matrix R is given as follows:

\[
R = m_0 I_p + m_1 A_1 + m_2 A_2 + m_3 A_3 + m_4 A_4 + m_5 A_5 + m_6 A_6 + m_7 A_7 + m_8 A_8. \tag{8}
\]

Lemma 2. Let p = 16l + 9 be a prime, then the matrices $A_1, \ldots, A_8$ satisfy

\[
\begin{aligned}
A_3A_7 = A_7A_3 &= (2l + 1)I_p + N A_1 + J A_2 + A A_3 + I A_4 + N A_5 + J A_6 + A A_7 + I A_8,\\
A_3A_8 = A_8A_3 &= O A_1 + O A_2 + I A_3 + H A_4 + M A_5 + K A_6 + B A_7 + J A_8,\\
A_4A_5 = A_5A_4 &= D A_1 + M A_2 + L A_3 + I A_4 + J A_5 + K A_6 + L A_7 + F A_8,\\
A_4A_6 = A_6A_4 &= L A_1 + C A_2 + K A_3 + N A_4 + O A_5 + N A_6 + M A_7 + G A_8,\\
A_4A_7 = A_7A_4 &= M A_1 + K A_2 + B A_3 + J A_4 + O A_5 + O A_6 + I A_7 + H A_8,\\
A_4A_8 = A_8A_4 &= (2l + 1)I_p + I A_1 + N A_2 + J A_3 + A A_4 + I A_5 + N A_6 + J A_7 + A A_8,\\
A_5A_6 = A_6A_5 &= F A_1 + D A_2 + L A_3 + M A_4 + I A_5 + J A_6 + K A_7 + L A_8,\\
A_5A_7 = A_7A_5 &= G A_1 + L A_2 + C A_3 + K A_4 + N A_5 + O A_6 + N A_7 + M A_8,\\
A_5A_8 = A_8A_5 &= H A_1 + M A_2 + K A_3 + B A_4 + J A_5 + O A_6 + O A_7 + I A_8,\\
A_6A_7 = A_7A_6 &= L A_1 + F A_2 + D A_3 + L A_4 + M A_5 + I A_6 + J A_7 + K A_8,\\
A_6A_8 = A_8A_6 &= M A_1 + G A_2 + L A_3 + C A_4 + K A_5 + N A_6 + O A_7 + N A_8,\\
A_7A_8 = A_8A_7 &= K A_1 + L A_2 + F A_3 + D A_4 + L A_5 + M A_6 + I A_7 + J A_8.
\end{aligned}
\tag{9}
\]

Proof. The proof is straightforward from the definition of $A_i$ and Lemma 1.

Lemma 3. If p = 16l + 9 is a prime, then

\[
RR^t = \alpha_0 I_p + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3 + \alpha_4 A_4 + \alpha_5 A_5 + \alpha_6 A_6 + \alpha_7 A_7 + \alpha_8 A_8, \tag{10}
\]

where

\[
\alpha_0 = m_0^2 + \frac{p-1}{8}\,(m_1^2 + m_2^2 + m_3^2 + m_4^2 + m_5^2 + m_6^2 + m_7^2 + m_8^2),
\]

\[
\begin{aligned}
\alpha_1 = \alpha_5 ={}& (m_0m_1 + m_0m_5) + (m_1^2 + m_1m_5 + m_5^2)A\\
&+ (m_1m_2 + m_4m_8 + m_5m_6)B + (m_1m_3 + m_3m_7 + m_5m_7)C\\
&+ (m_1m_4 + m_2m_6 + m_5m_8)D + m_1m_5E\\
&+ (m_1m_6 + m_2m_5 + m_4m_8)F + (m_1m_7 + m_3m_5 + m_3m_7)G\\
&+ (m_1m_8 + m_2m_6 + m_4m_5)H\\
&+ (m_1m_2 + m_1m_6 + m_2m_5 + m_4^2 + m_5m_6 + m_8^2)I\\
&+ (m_1m_4 + m_1m_8 + m_2^2 + m_4m_5 + m_5m_8 + m_6^2)J\\
&+ (m_2m_3 + m_2m_8 + m_3m_8 + m_4m_6 + m_4m_7 + m_6m_7)K\\
&+ (m_2m_4 + m_2m_7 + m_3m_6 + m_3m_8 + m_4m_7 + m_6m_8)L\\
&+ (m_2m_7 + m_2m_8 + m_3m_4 + m_3m_6 + m_4m_6 + m_7m_8)M\\
&+ (m_1m_3 + m_1m_7 + m_3^2 + m_3m_5 + m_5m_7 + m_7^2)N\\
&+ (m_2m_3 + m_2m_4 + m_3m_4 + m_6m_7 + m_6m_8 + m_7m_8)O,
\end{aligned}
\]

\[
\begin{aligned}
\alpha_2 = \alpha_6 ={}& (m_0m_2 + m_0m_6) + (m_2^2 + m_2m_6 + m_6^2)A\\
&+ (m_1m_5 + m_2m_3 + m_6m_7)B + (m_2m_4 + m_4m_8 + m_6m_8)C\\
&+ (m_1m_6 + m_2m_5 + m_3m_7)D + m_2m_6E\\
&+ (m_1m_5 + m_2m_7 + m_3m_6)F + (m_2m_8 + m_4m_6 + m_4m_8)G\\
&+ (m_1m_2 + m_3m_7 + m_5m_6)H\\
&+ (m_1^2 + m_2m_3 + m_2m_7 + m_3m_6 + m_5^2 + m_6m_7)I\\
&+ (m_1m_2 + m_1m_6 + m_2m_5 + m_3^2 + m_5m_6 + m_7^2)J\\
&+ (m_1m_3 + m_1m_4 + m_3m_4 + m_5m_7 + m_5m_8 + m_7m_8)K\\
&+ (m_1m_4 + m_1m_7 + m_3m_5 + m_3m_8 + m_4m_7 + m_5m_8)L\\
&+ (m_1m_3 + m_1m_8 + m_3m_8 + m_4m_5 + m_4m_7 + m_5m_7)M\\
&+ (m_2m_4 + m_2m_8 + m_4^2 + m_4m_6 + m_6m_8 + m_8^2)N\\
&+ (m_1m_7 + m_1m_8 + m_3m_4 + m_3m_5 + m_4m_5 + m_7m_8)O,
\end{aligned}
\]

\[
\begin{aligned}
\alpha_3 = \alpha_7 ={}& (m_0m_3 + m_0m_7) + (m_3^2 + m_3m_7 + m_7^2)A\\
&+ (m_2m_6 + m_3m_4 + m_7m_8)B + (m_1m_5 + m_1m_7 + m_3m_5)C\\
&+ (m_2m_7 + m_3m_6 + m_4m_8)D + m_3m_7E\\
&+ (m_2m_6 + m_3m_8 + m_4m_7)F + (m_1m_3 + m_1m_5 + m_5m_7)G\\
&+ (m_2m_3 + m_4m_8 + m_6m_7)H\\
&+ (m_2^2 + m_3m_4 + m_3m_8 + m_4m_7 + m_6^2 + m_7m_8)I\\
&+ (m_2m_3 + m_2m_7 + m_3m_6 + m_4^2 + m_6m_7 + m_8^2)J\\
&+ (m_1m_6 + m_1m_8 + m_2m_4 + m_2m_5 + m_4m_5 + m_6m_8)K\\
&+ (m_1m_4 + m_1m_6 + m_2m_5 + m_2m_8 + m_4m_6 + m_5m_8)L\\
&+ (m_1m_2 + m_1m_4 + m_2m_4 + m_5m_6 + m_5m_8 + m_6m_8)M\\
&+ (m_1^2 + m_1m_3 + m_1m_7 + m_3m_5 + m_5^2 + m_5m_7)N\\
&+ (m_1m_2 + m_1m_8 + m_2m_8 + m_4m_5 + m_4m_6 + m_5m_6)O,
\end{aligned}
\]
\[
\begin{aligned}
\alpha_4 = \alpha_8 ={}& (m_0m_4 + m_0m_8) + (m_4^2 + m_4m_8 + m_8^2)A\\
&+ (m_1m_8 + m_3m_7 + m_4m_5)B + (m_2m_6 + m_2m_8 + m_4m_6)C\\
&+ (m_1m_5 + m_3m_8 + m_4m_7)D + m_4m_8E\\
&+ (m_1m_4 + m_3m_7 + m_5m_8)F + (m_2m_4 + m_2m_6 + m_6m_8)G\\
&+ (m_1m_5 + m_3m_4 + m_7m_8)H\\
&+ (m_1m_4 + m_1m_8 + m_3^2 + m_4m_5 + m_5m_8 + m_7^2)I\\
&+ (m_1^2 + m_3m_4 + m_3m_8 + m_4m_7 + m_5^2 + m_7m_8)J\\
&+ (m_1m_2 + m_1m_7 + m_2m_7 + m_3m_5 + m_3m_6 + m_5m_6)K\\
&+ (m_1m_3 + m_1m_6 + m_2m_5 + m_2m_7 + m_3m_6 + m_5m_7)L\\
&+ (m_1m_6 + m_1m_7 + m_2m_3 + m_2m_5 + m_3m_5 + m_6m_7)M\\
&+ (m_2^2 + m_2m_4 + m_2m_8 + m_4m_6 + m_6^2 + m_6m_8)N\\
&+ (m_1m_2 + m_1m_3 + m_2m_3 + m_5m_6 + m_5m_7 + m_6m_7)O.
\end{aligned}
\]

Proof. The result comes from Lemma 2, Lemma 3 and a complex computation.

To simplify the notation, we denote

\[
\begin{aligned}
\vec{m} &:= (m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8) \in \mathrm{GF}(q)^9,\\
D_0(\vec{m}) &:= \alpha_0,\\
D_1(\vec{m}) &:= \alpha_1 = \alpha_5,\\
D_2(\vec{m}) &:= \alpha_2 = \alpha_6,\\
D_3(\vec{m}) &:= \alpha_3 = \alpha_7,\\
D_4(\vec{m}) &:= \alpha_4 = \alpha_8.
\end{aligned}
\tag{11}
\]

Theorem 1. Let p be an odd prime of the form 16l + 9 and q be a prime power. Suppose $\alpha \in \mathrm{GF}(q)$ and $\vec{m} \in \mathrm{GF}(q)^9$. Then

(1) the pure eighth power residue double circulant code $P_p(\vec{m})$ is self-dual over GF(q) when the following conditions hold:

\[
D_0(\vec{m}) = -1,\quad D_1(\vec{m}) = 0,\quad D_2(\vec{m}) = 0,\quad D_3(\vec{m}) = 0,\quad D_4(\vec{m}) = 0. \tag{12}
\]

(2) the bordered eighth power residue double circulant code $B_p(\alpha, \vec{m})$ is self-dual over GF(q) when the following conditions hold:

\[
\begin{aligned}
&\alpha^2 + p = -1,\\
&-\alpha + m_0 + \frac{p-1}{8}\,(m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8) = 0,\\
&D_0(\vec{m}) = -2,\quad D_1(\vec{m}) = -1,\quad D_2(\vec{m}) = -1,\quad D_3(\vec{m}) = -1,\quad D_4(\vec{m}) = -1.
\end{aligned}
\tag{13}
\]

Proof. According to Lemma 3,

\[
\begin{aligned}
P_p(\vec{m})P_p(\vec{m})^t ={}& I_p + D_0(\vec{m})I_p + D_1(\vec{m})A_1 + D_2(\vec{m})A_2 + D_3(\vec{m})A_3 + D_4(\vec{m})A_4\\
&+ D_1(\vec{m})A_5 + D_2(\vec{m})A_6 + D_3(\vec{m})A_7 + D_4(\vec{m})A_8,
\end{aligned}
\tag{14}
\]

and

\[
B_p(\alpha, \vec{m})B_p(\alpha, \vec{m})^t = \begin{pmatrix} I_{p+1} & K \end{pmatrix}\begin{pmatrix} I_{p+1} \\ K^t \end{pmatrix} = I_{p+1} + KK^t, \tag{15}
\]

where

\[
KK^t =
\begin{pmatrix}
\alpha & 1 & \cdots & 1\\
-1 & & & \\
\vdots & & R & \\
-1 & & &
\end{pmatrix}
\cdot
\begin{pmatrix}
\alpha & -1 & \cdots & -1\\
1 & & & \\
\vdots & & R^t & \\
1 & & &
\end{pmatrix}
=
\begin{pmatrix}
\alpha^2 + p & S & \cdots & S\\
S & & & \\
\vdots & & X & \\
S & & &
\end{pmatrix}_{(p+1)\times(p+1)}
\tag{16}
\]

and

\[
\begin{aligned}
X ={}& J_p + D_0(\vec{m})I_p + D_1(\vec{m})A_1 + D_2(\vec{m})A_2 + D_3(\vec{m})A_3 + D_4(\vec{m})A_4\\
&+ D_1(\vec{m})A_5 + D_2(\vec{m})A_6 + D_3(\vec{m})A_7 + D_4(\vec{m})A_8,\\
S ={}& -\alpha + m_0 + \frac{p-1}{8}\,(m_1 + m_2 + m_3 + m_4 + m_5 + m_6 + m_7 + m_8).
\end{aligned}
\tag{17}
\]

Since $J_p = I_p + A_1 + \cdots + A_8$, requiring the products in (14) and (15) to be zero and comparing coefficients gives conditions (12) and (13). The result can thus be obtained by the definition of self-dual codes.

4 Eighth Power Residue Double Circulant Self-Dual Codes Over GF(2) and GF(4)

In this section, we give some constructions of self-dual codes over GF(2) and GF(4) by MATLAB and MAGMA, and the corresponding minimum Hamming distances are computed. Some codes have good minimum distances and nearly meet the bounds.

Theorem 2. Let p be an odd prime of the form 16l + 9. Several pure eighth power residue double circulant self-dual codes whose generator matrix has the form $P_p(m_0, m_1, m_2, m_3, m_4, m_5, m_6, m_7, m_8)$ of length 2p over GF(2) are obtained. The parameters that satisfy the conditions are listed in the following table when p = 41 = 16 × 2 + 9.

When p = 41, the length n of the code is 82, so the bound on the minimum Hamming distance is 16. By our method, the largest minimum Hamming distance obtained is 14, which is close to the bound. Self-dual [82, 41, 14] codes over GF(2) with good properties are obtained.

Theorem 3. Let ξ be the fixed primitive element of GF(4) satisfying $\xi^2 + \xi + 1 = 0$ and p be an odd prime of the form 16l + 9. Then pure eighth power residue double circulant self-dual codes $P_p(\vec{m})$ of length 2p over GF(4) and bordered eighth power residue double circulant codes $B_p(\alpha, \vec{m})$ of length 2(p + 1) over GF(4) can be obtained. Furthermore, it is obvious that the equation $\alpha^2 + p = -1$ holds if and only if α = 0, because p is an odd prime. The parameter values other than α that meet the conditions are listed in the following tables when p = 41 = 16 × 2 + 9 and p = 73 = 16 × 4 + 9.
Table 1: The parameters of Pp over GF(2) with p = 41
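| Serial number | m0 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | Min-distance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 10 |
| 2 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 10 |
| 3 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 10 |
| 4 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 10 |
| 5 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 14 |
| 6 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 14 |
| 7 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 12 |
| 8 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 12 |
| 9 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 14 |
| 10 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 12 |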
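The bound of 16 quoted for length 82 can be reproduced with a short calculation. The sketch below assumes that the bound meant is the standard upper bound for binary self-dual codes, d ≤ 4⌊n/24⌋ + 4, with 4⌊n/24⌋ + 6 when n ≡ 22 (mod 24); the text does not say which bound it uses, and the function name is hypothetical.

```python
def binary_self_dual_bound(n: int) -> int:
    """Upper bound on the minimum distance of a binary self-dual code of length n.

    Assumption: d <= 4*floor(n/24) + 4, or 4*floor(n/24) + 6 if n = 22 mod 24.
    """
    if n <= 0 or n % 2:
        raise ValueError("a self-dual code has positive even length")
    base = 4 * (n // 24)
    return base + 6 if n % 24 == 22 else base + 4


# Length 2p for p = 41, the case of Table 1.
bound_82 = binary_self_dual_bound(82)
```

Under this assumption the best codes in Table 1, with minimum distance 14, fall one step short of the bound, since binary self-dual codes only have even weights.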
Table 2: The parameters of Pp over GF(4) with p = 41
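| Serial number | m0 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | Min-distance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | ξ | 1 | 1 | 1 | ξ^2 | 14 |
| 2 | 1 | 1 | ξ | 1 | ξ^2 | 0 | 0 | ξ^2 | ξ | 12 |
| 3 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 14 |
| 4 | ξ | 1 | ξ | 1 | 0 | 0 | 1 | ξ^2 | ξ^2 | 14 |
| 5 | 0 | ξ | 0 | ξ | ξ^2 | ξ | ξ^2 | ξ^2 | 0 | 14 |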
Table 3: The parameters of Bp over GF(4) with p = 41
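| Serial number | m0 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | Min-distance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | ξ | 1 | 0 | ξ | ξ | ξ | 12 |
| 2 | 1 | 1 | ξ | 1 | ξ^2 | ξ^2 | ξ | ξ^2 | ξ | 8 |
| 3 | 1 | 1 | ξ^2 | 1 | 0 | ξ^2 | ξ^2 | ξ^2 | 1 | 8 |
| 4 | ξ^2 | 1 | 0 | 0 | ξ | ξ^2 | ξ | 1 | 0 | 12 |
| 5 | 0 | 1 | ξ^2 | ξ | 0 | ξ | 0 | ξ^2 | 1 | 12 |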
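The entries ξ and ξ^2 in the GF(4) tables, and the remark in Theorem 3 that α^2 + p = −1 forces α = 0, can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' MATLAB or MAGMA code; the integer encoding of GF(4) and the function names are assumptions made for illustration.

```python
# Assumed encoding of GF(4): 0 -> 0, 1 -> 1, 2 -> xi, 3 -> xi^2 (= xi + 1).
ZERO, ONE, XI, XI2 = 0, 1, 2, 3

_LOG = {ONE: 0, XI: 1, XI2: 2}   # discrete logs to base xi
_EXP = [ONE, XI, XI2]            # powers xi^0, xi^1, xi^2


def gf4_add(a: int, b: int) -> int:
    """Addition in characteristic 2 is bitwise XOR on this encoding."""
    return a ^ b


def gf4_mul(a: int, b: int) -> int:
    """Multiplication via logs, using xi^3 = 1."""
    if a == ZERO or b == ZERO:
        return ZERO
    return _EXP[(_LOG[a] + _LOG[b]) % 3]


# xi^2 + xi + 1 = 0, the defining relation used in Theorem 3.
defining_relation = gf4_add(gf4_add(gf4_mul(XI, XI), XI), ONE)

# In characteristic 2 an odd prime p reduces to 1 and -1 equals 1,
# so alpha^2 + p = -1 becomes alpha^2 + 1 = 1.
alpha_solutions = [a for a in range(4) if gf4_add(gf4_mul(a, a), ONE) == ONE]
```

Here `defining_relation` evaluates to the zero element and `alpha_solutions` contains only α = 0, which is why the bordered tables list no separate column for α.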
Table 4: The parameters of Pp over GF(4) with p = 73
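| Serial number | m0 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | Min-distance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | ξ | ξ | 0 | 1 | ξ^2 | ξ^2 | 0 | 12 |
| 2 | 1 | 1 | ξ | 0 | ξ^2 | 1 | ξ^2 | 0 | ξ | 12 |
| 3 | 0 | ξ | ξ | ξ^2 | 0 | ξ^2 | ξ^2 | ξ | 0 | 6 |
| 4 | 1 | 1 | 0 | ξ^2 | ξ | 1 | 0 | ξ | ξ^2 | 12 |
| 5 | 1 | ξ^2 | ξ^2 | ξ^2 | ξ^2 | ξ | ξ | ξ | ξ | 12 |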
Table 5: The parameters of Bp over GF(4) with p = 73
| Serial number | m0 | m1 | m2 | m3 | m4 | m5 | m6 | m7 | m8 | Min-distance |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | ξ^2 | ξ^2 | ξ | 1 | ξ | ξ | ξ^2 | 8 |
| 2 | 0 | 1 | ξ | 0 | ξ | 1 | ξ^2 | 0 | ξ^2 | 12 |
| 3 | 0 | 1 | 0 | ξ | ξ | 1 | 0 | ξ^2 | ξ^2 | 12 |
| 4 | 0 | ξ^2 | 1 | 0 | ξ^2 | ξ^2 | 1 | 0 | ξ | 12 |
| 5 | 0 | ξ | ξ | 1 | 0 | ξ^2 | ξ^2 | 1 | 0 | 12 |
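The construction behind the tables can be sketched in a few functions: build the eighth power cyclotomic classes, assemble the circulant matrix R from a parameter vector, and test whether the pure code with generator matrix (I | R) is self-dual over GF(2). This is a minimal sketch under stated assumptions, not the authors' program: the classes are taken as C_k = {g^(8t+k) mod p} for the smallest primitive root g, and all function names are hypothetical.

```python
def primitive_root(p: int) -> int:
    """Smallest primitive root modulo the prime p, found by brute force."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError(f"no primitive root found for {p}")


def cyclotomic_classes(p: int, g: int, order: int = 8) -> list:
    """Assumed definition: C_k = { g^(order*t + k) mod p }, k = 0..order-1."""
    classes = [set() for _ in range(order)]
    for k in range(p - 1):
        classes[k % order].add(pow(g, k, p))
    return classes


def circulant_R(p: int, m, classes) -> list:
    """R = m0*I + sum m_i*A_i: entry (i, j) is m0 if j == i,
    and m_{k+1} if j - i lies in C_k."""
    label = {0: 0}
    for k, cls in enumerate(classes):
        for d in cls:
            label[d] = k + 1
    return [[m[label[(j - i) % p]] for j in range(p)] for i in range(p)]


def is_pure_self_dual_gf2(R) -> bool:
    """Check (I | R)(I | R)^t = I + R R^t = 0 over GF(2),
    i.e. rows of R have odd weight and pairwise even overlap."""
    p = len(R)
    masks = [sum((R[i][j] & 1) << j for j in range(p)) for i in range(p)]
    for i in range(p):
        for j in range(i, p):
            dot = bin(masks[i] & masks[j]).count("1") & 1
            if dot != (1 if i == j else 0):
                return False
    return True


p = 41
classes = cyclotomic_classes(p, primitive_root(p))
```

A row of Table 1 can be passed as `m`, for example `is_pure_self_dual_gf2(circulant_R(p, (1, 0, 0, 0, 1, 0, 1, 1, 1), classes))`. Whether this returns `True` depends on the class labelling matching the primitive root used in the paper, which this section does not state.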
The pure double circulant self-dual [82, 41, 14] codes and bordered double circulant self-dual [84, 42, 12] codes over GF(4), which have good properties, are listed, especially the values of the parameters m0, m1, m2, m3, m4, m5, m6, m7, m8. Besides, we get some other codes when p = 73.

5 Conclusion

In this paper, we construct double circulant self-dual codes by higher power residues, especially eighth power residues.

First of all, the relationship of the eighth power residue cyclotomic numbers is given. Suppose that there are eight matrices with nine parameters over GF(q), and the expression for multiplying any two matrices is represented by the cyclotomic numbers. From the linear combination of eight circulant matrices, we can construct the circulant matrix R. Two kinds of codes are represented by R. One is pure circulant codes, and the other is bordered circulant codes. Combined with the necessary condition of a self-dual code (GG^T = 0), parameters can be determined to satisfy the condition of a self-dual code, so that the pure double circulant self-dual codes and bordered circulant self-dual codes can be obtained. By programming, the parameters that satisfy the conditions and the minimum Hamming distance are given.

We exploit a new way to construct self-dual codes over GF(2) and GF(4) by a prime p of the form 16l + 9, and some codes have good properties. Examples of such codes are a binary self-dual [82, 41, 14] code, a quaternary self-dual [82, 41, 14] code and a quaternary self-dual [84, 42, 12] code.

Acknowledgments

The work is financially supported by National Natural Science Foundation of China (No. 61902429, No. 11775306), Shandong Provincial Natural Science Foundation of China (No. ZR2017MA001, ZR2019MF070), Fundamental Research Funds for the Central Universities (No. 19CX02058A, No. 17CX02030A), the Open Research Fund from Shandong provincial Key Laboratory of Computer Networks, Grant No. SDKLCN-2017-03, Key Laboratory of Applied Mathematics of Fujian Province University (Putian University) (No. SX201702, No. SX201806), and International Cooperation Exchange Fund of China University of Petroleum (UPCIEF2019020).

Biography

Changsong Jiang was born in 1997 in Sichuan Province of China. He graduated from China University of Petroleum in 2019. He is currently studying for a postgraduate degree at University of Electronic Science and Technology of China. Email: [email protected]

Yuhua Sun was born in 1979. She graduated from Shandong Normal University, China, in 2001. In 2004, she received the M.S. degree in mathematics from Tongji University, Shanghai, and a Ph.D. in Cryptography from Xidian University. She is currently a lecturer at China University of Petroleum. Her research interests include cryptography, coding and information theory. Email: sunyuha [email protected]

Xueting Liang was born in 1997 in Anhui Province of China. She is studying at China University of Petroleum. Email: [email protected]

International Journal of Network Security, Vol.22, No.5, PP.743-751, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).04)
Identity-based Public Key Cryptographic Primitive with Delegated Equality Test Against Insider Attack in Cloud Computing
Seth Alornyo1,2, Acheampong Edward Mensah1, and Abraham Opanfo Abbam1 (Corresponding author: Seth Alornyo)
1 School of Information and Software Engineering, University of Electronic Science and Technology of China, 4 E 2nd Section, 1st Ring Rd, Jianshe Road, Chenghua, Chengdu, Sichuan, China
2 Computer Science Department, Koforidua Technical University, Koforidua, Ghana
(Email: [email protected])
(Received Mar. 3, 2019; Revised and Accepted Oct. 3, 2019; First Online Jan. 23, 2020)
Abstract

The notion of attacks perpetrated by an insider (cloud server) is paramount in this era of cloud computing and data analytics. When a cloud server is delegated certain responsibilities, it is possible for the cloud server to peddle users' encrypted data for profit. The cloud server takes advantage of its authorization to launch what we refer to as the insider attack. We put forward a new improved scheme on identity-based public key cryptographic primitive which integrates delegated equality test to resist the insider attack in cloud computing. Our scheme resists the insider attack perpetrated by the cloud server (insider). We refer to our new scheme as identity-based public key cryptographic primitive with delegated equality test against insider attack in cloud computing (IB-PKC-DETIA). We construct our scheme using a witness-based cryptographic primitive with an added pairing operation. Our scheme achieves weak indistinguishable identity chosen ciphertext (W-IND-ID-CCA) security using the random oracle model.

Keywords: Identity Based Encryption; Insider Attack; Witness Based Encryption

1 Introduction

The concept of searchable public key encryption which integrates keyword search (PEKS) was unveiled by [2]. Since then, there have been several works on this cryptographic primitive, in which a third party is allowed to search over encrypted data without revealing any information about the ciphertext. A public key cryptographic primitive with equality test was unveiled by [18] and was used to manage encrypted data for clients. Recently, [11] proposed an identity-based cryptosystem with integrated equality test (IBE-ET) in cloud computing, which enables the cloud server to verify whether two ciphertexts from user A and user B are encryptions of the same message.

However, there has been a recent attack perpetrated by an adversary who is able to launch what is referred to as the insider attack [15]. In this era of cloud computing, equality test functions are outsourced to a cloud server to examine whether two ciphertexts are encryptions of the same message [4]. Such a delegated responsibility gives the cloud server the leverage to launch the insider attack on users' ciphertexts. This attack, when successful, enables the cloud server to peddle encrypted data for economic gain. If the cloud server has legitimate access to all users' ciphertexts and can test their equality, then the cloud server (insider) should be prevented from peddling users' ciphertexts. Recent schemes on the insider attack have not been able to fully solve this problem.

Recent work on the insider attack by [15] and a security analysis and modification by [19] enable the user to generate a token tok_ID to prevent the tester from launching the insider attack. However, their scheme is susceptible to the insider attack because the cloud server was not delegated to perform equality test. When the cloud server is delegated to perform equality test on users' ciphertexts, it could guess the token tok_ID to launch the insider attack. The insider attack resistance scheme proposed by [15] and the security analysis and modification by [19] enable the tester, after receiving the secret trapdoor, to successfully guess the token tok_ID and launch the attack as follows:

1) The cloud server (insider) receives a valid trapdoor Td and tries to find out the message m and the token from Td.

2) The cloud server (insider) computes an IB-PKC-DETIA ciphertext C of a guessed message m′ and a guessed token tok′_ID.

3) The cloud server (insider) checks whether Test(C, Td, tok_ID) = 1. The equation holds if the guess of the message m′ and the token tok′_ID is successful for the adversary (cloud server). Otherwise, go back to Step 2.

Therefore, if the guesses of a message and a token are successful as indicated above, then the cloud server (insider) could launch the insider attack because of the delegated responsibility. Other works on IBE-ET [11] and a security modification by [17] assumed that it is possible to resist the insider attack. On the contrary, when the cloud server is delegated to conduct equality test, it is possible that the adversary can launch the insider attack. Therefore, we propose a new improved scheme to resist the insider attack perpetrated by an adversary (cloud server) delegated to undertake equality or equivalence test on ciphertexts.

1.1 Related Work

Boneh et al. [2] first unveiled the primitive of PEKS, which was later examined by [11] in their work on off-line keyword guessing attacks on recent keyword search schemes. Their work showed that the PEKS scheme was vulnerable to the insider attack. A related work on a delegated tester was later unveiled by [14], whereby only the designated tester (server) could perform equality test on the ciphertext. Their scheme concentrated on security of the trapdoor in PEKS and resistance to the insider attack. Chen et al. [9,17] also proposed a new general framework for secure public key cryptosystem with keyword search and a dual-server public key primitive with keyword search for secure cloud storage, to resist the insider attack so long as there was no collusion by the two servers [9]. However, keyword guessing attacks by the insider remained a challenging problem in PEKS until recently, when [12] proposed the notion of a witness-based searchable cryptographic primitive to resist the insider attack in PEKS.

A special type of searchable encryption was unveiled in [18] for a general equality test. However, [11] introduced identity based cryptosystem with equality test (IBE-ET) in cloud computing, which integrated the identity-based primitive into public key cryptosystem with equality test [2] and gains the advantages of the equality test in [18]. In their construction, search functions were delegated to the service provider.

Existing works on insider attacks mainly focused on PEKS schemes in [8,19], while a few works on PKEET extensions [7,11,13,15,18] and IBE-ET applications in [19] were not resistant to the insider attack. To solve the problem of the insider attack in IBE-ET, an ID-based primitive with equality test against the insider attack was recently put forward by Wu et al. [17]. They claimed that their scheme achieves confidentiality in IBE-ET, but the work of Lee et al. [9] refuted their claim of weak indistinguishability of IBE-ET. Lee et al. [9] modified their security analysis claims to achieve the weak indistinguishability as unveiled in [17]. While the scheme in [17] ensured that the designated users' token tok_ID was changed per corresponding identity, the scheme in [9] ensured that the token was fixed for all group users. Therefore, a fixed token tok_ID could successfully enable the cloud server to guess a new token tok′_ID to launch the insider attack. When a cloud server (insider) is delegated to perform equality test, it is possible for the cloud server to launch the insider attack because a guess of a token is possible. To the best of our knowledge, their scheme cannot resist the insider attack as explained above. A scheme that resists the insider attack in IBE-ET when the cloud server (insider) is authorized to perform equality test is still an open problem for the research community.

1.2 Our Contribution

Wu et al. [17] unveiled a variant of the IBE-ET scheme. However, their scheme allowed anyone to perform equality test between two ciphertexts and hence lacks authorization for equality test. The security analysis and modification in [9] did not authorize a third party (cloud server) by generating a trapdoor function for the cloud server to perform equality test. It is not clear whether a computed trapdoor given to the cloud server could resist the insider attack as claimed in their security analysis and modification scheme.

To address this problem, we added a pairing operation to the witness cryptographic primitive in [6] to resist the insider attack in IBE-ET. Witness based encryption ensures that, given a witness relation R(W, X) of an NP language L, an encryption of (m, w) can be tested by a generated trapdoor (m′, x). The tester checks if m′ = m. However, it is difficult to compute w from x under a defined witness relation.

Our scheme achieves Weak-IND-ID-CCA (W-IND-ID-CCA) security and resistance to the insider attack. Our scheme achieves a stronger notion of IND-ID-CCA security for IBE-ET using the random oracle model.

1.3 Organization

The rest of the paper is organized as follows. In Section 2, we provide some preliminaries for our construction. In Section 3, we formulate the notion of IB-PKC-DETIA. In Section 4, we give the construction of IB-PKC-DETIA, and we prove its security in Section 5. In Section 6, we compare our work with other related works. In Section 7, we conclude our paper.

2 Preliminaries

Definition 1. Bilinear map: Let $G$ and $G_T$ be two multiplicative cyclic groups of prime order p. Suppose that g is a generator of $G$. A bilinear map $e : G \times G \to G_T$ satisfies the following properties:

1) Bilinearity: For any $g \in G$ and $a, b \in Z_p$, $e(g^a, g^b) = e(g, g)^{ab}$.

2) Non-degenerate: $e(g, g) \ne 1$.
1) Setup: It takes as input security parameter k and returns the public key K and msk.
2) Extract: It takes as imput msk, an arbitrary ID ∈ {0, 1}∗ and returns a decryption key dk for that iden- tity.
3) WBInstGen: It takes as input the security param- eter k, an arbitrary ID ∈ {0, 1}∗ and returns a pri- vate witness key w ∈ W for that identity wID, where W InsGen(w) = x and x ∈ X where (w, x) satisfies the witness relation R.
4) Trapdoor: It takes as input decryption key dk, an arbitrary ID ∈ {0, 1}∗, an instance x ∈ X and re- Figure 1: System model for IB-PKC-DETIA turns a trapdoor td for that identity. 5) WBEncrypt: It takes as input an identity ID ∈ 3) Computable: There is an efficient algorithm to com- {0, 1}∗, a plaintext m ∈ M with a random cho- pute e(g, g) for any g ∈ G. sen witness w ∈ W and outputs a ciphertext C = (x, c) where x ∈ X from a generated witness Definition 2. Bilinear Diffie-Hellman (BDH) problem: W InsGen(w) = x, and (w,x) satisfies the witness Let G,GT be two groups of prime order p. Let e : G × relation R. G → GT be an admissible bilinear map and let g be a generator of G. The BDH problem in hp, G, GT, ei is as 6) WBDecrypt: The algorithm takes as input the ci- a b c ∗ follows: Given hg, g , g , g i, for random a,b,c ∈ Zp, for phertext c ∈ C, a private decrption key dk and a any randomized algorithm A computes value e(g, g)abc ∈ witness w ∈ W and returns a plaintext m ∈ M, if GT with advantage: and only if C is a valid ciphertext with the ID and a witness w ∈ W . BDH a b c abc ADVA P r[A(g, g , g , g ) = e(g, g) ]. The BDH assumption holds if for any polynomial-time algorithm 7) Test: It takes as input a ciphertext CA ∈ C of a BDH A, it’s advantage AdvA is negligible. receiver with IDA, a trapdoor tdA for the receiver with IDA, a ciphertext CB ∈ C of a receiver with IDB and trapdoor tdB for the receiver with IDB, and Definition 3. Witness Relation: Given a witness re- returns 1 if CA and CB contains the same message. lation R(W, X) on an NP language L [3], a randomly Otherwise return ⊥. chosen w ∈ W generates an instance x ∈ X defined over the relation R. For any polynomial algorithm: AIB−PKC−DETIA, P r[AIB−PKC−DETIA(k, w) = x] = 3 Security Model 1, and AIB−PKC−DETIA, P r[AIB−PKC−DETIA(k, x) = w] < ε(k), where k is a security parameter and ε a negli- Definition 4. (Weak-IND-ID-CCA). We let t = gible functionon on k. 
(Setup, Extract, W BInstGen, T rapdoor, W BEncrypt, W BDecrypt, T est) be the same scheme and a polynomial Our model has four roles which includes: users, PKG, time algorithm A. cloud server and with adversary (see Figure 1). Users stores their encrypted sensitive data in the cloud. The 1) Setup: The challenger runs the security parameter cloud server with adversary is resisted from peddling on input k and derives K and randomly takes a wit- with users encrypted sensitive data for economic gains. ness w ∈ W and generates an instance x ∈ X of a In this section, we give formal definitions of our scheme. witness relation R(W, X) defined on an NP language We employ a witness based cryptographic primitive to L. It gives the relation R to the adversary. resist the insider attack in IBE-ET. Our scheme achieves weak chosen ciphertext security (i.e. W-IND-ID-CCA) 2) Phase 1: The adversary issues query N1, N2, ··· , under the defined security model. Nm. Each query is of the form:
In identity-based public key cryptographic prim- • Query (IDi): The challenger run H(.) to gen- itive with delegated equality test against in- erate dki corresponding to the public identity sider attack scheme, we specify seven algorithms: (IDi). It sends dki to A. Setup, Extract, W BInstGen, T rapdoor, W BEncrypt, W BDecrypt, T est, where M • Trapdoor (IDi): The challenger runs the pri- and C are its plaintext space and ciphertext space, vate decryption on W InsGen using a randomly respectively: chosen witness w ∈ W of the relation R(W,X) International Journal of Network Security, Vol.22, No.5, PP.743-751, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).04) 746
on an NP language L. The algorithm generates an instance $x$ and computes a trapdoor $td_i$ using $dk_i$ via the trapdoor algorithm. Finally, it sends $td_i$ to A.

• Decryption queries $(ID_i, C_i, w)$: The challenger runs the extract algorithm to obtain $dk_i$ corresponding to the public key $ID_i$ and uses it in the decryption algorithm to decrypt the ciphertext $C_i$. Finally, it sends the plaintext $M_i$ to A.

3) Challenge: After Phase 1 is over, A submits two equal-length messages $(m_0, m_1)$ and an identity $ID^*$ to be challenged. Neither $m_0$ nor $m_1$ may have been issued in an encryption query, and $ID^*$ may not have appeared in an extract query in Phase 1. The challenger randomly picks $b \in \{0, 1\}$ and responds with $C^* \leftarrow Enc(M_b, ID^*, w^*)$. It also generates a challenge trapdoor $td^* = (ID^*, x^*)$ by running the trapdoor algorithm $td^* \leftarrow td(dk, M_b, x^*)$ and returns $td^*$ to A.

4) Phase 2: The adversary issues queries $N_1, N_2, \cdots, N_m$. Each query is of one of the following forms:

• Query $(ID_i)$: The challenger responds as in Phase 1, since $ID_i \neq ID^*$.
• Trapdoor query $(ID_i)$, where $x \neq x^*$: The challenger responds in the same way as in Phase 1.
• Decryption query $(ID_i, C_i)$, where $(ID_i, C_i) \neq (ID^*, C^*)$: The challenger responds in the same way as in Phase 1.

5) Output: A submits a guess $b'$ on $b$. If $b' = b$, we say A wins the game.

We define A's advantage in breaking the scheme as $Adv_{IB-PKC-DETIA}(k) = |\Pr[b' = b] - \frac{1}{2}|$, and the scheme is secure if this advantage is negligible.

In the Weak-IND-ID-CCA (W-IND-ID-CCA) model, the adversary has access to ciphertexts but cannot compute the witness $w \in W$ from $x \in X$ of a relation $R(W, X)$ over an NP language L.

4 Construction

Our scheme aims to resist attacks perpetrated by a cloud server that has been delegated to perform the equality test. The cloud server (insider) is considered as an adversary A who is authorized only to perform the equality test and should not be able to tamper with users' ciphertexts. Moreover, only the authorized cloud server can perform the equality test.

Definition 5. A witness relation $R(X, W)$ on an NP language L consists of the following polynomial-time algorithm. Given a randomly chosen $w \in W$ with $R(w, x) = 1$ defined on an NP language L, for any polynomial-time algorithm $A_{IB-PKC-DETIA}$ it holds that $\Pr[A_{IB-PKC-DETIA}(k, w) = x] = 1$ and $\Pr[A_{IB-PKC-DETIA}(k, x) = w] < \varepsilon(k)$, where $k$ is a security parameter and $\varepsilon$ is a negligible function of $k$.

1) Setup: The system takes a security parameter $k$ and returns the public parameter $K$ and the master secret key $msk$.

• The algorithm generates the pairing groups $G$ and $G_T$ of prime order $p$ and an admissible bilinear map $e : G \times G \rightarrow G_T$. It chooses a random generator $g \in G$.
• The system chooses cryptographic hash functions $H : \{0, 1\}^* \rightarrow G$, $H_1 : G_T \rightarrow G$ and $H_2 : R \times G_T \rightarrow \{0, 1\}^{\tau_1 + \tau_2}$, where $\tau_1$ and $\tau_2$ are security parameters. The elements of $G$ are represented in $\tau_1$ bits and the elements $x \in X$ of a witness relation $R$ are represented in $\tau_2$ bits.
• The algorithm randomly picks a witness $w \in W$, generates an instance $x \in X$ on the NP language L of a relation $R(W, X)$, and sets $g_1 = g^{w}$ and $g_2 = g^{x}$. The ciphertext space is $C \in (G^* \times \{0, 1\})^{\tau_1 + \tau_2}$ and the message space is $M \in G^*$. It publishes the public parameter $K = \{p, G, G_T, R, e, g, g_1, g_2, H, H_1, H_2\}$.

2) WBInstGen: The algorithm takes as input the security parameter $k$ and an arbitrary $ID \in \{0, 1\}^*$, computes $h_{ID} = H(ID) \in G$, randomly chooses a witness $w \in W$ and generates a corresponding instance $x \in X$ on a witness relation $R(W, X)$.

3) Extract: Given a string $ID \in \{0, 1\}^*$, the algorithm computes $h_{ID} = H(ID) \in G$ and sets the private decryption key $dk_{ID} = (h_{ID}^{w}, h_{ID}^{x})$, where $(w, x)$ is the master key corresponding to the relation $R$.

4) Trapdoor: On input a string $ID \in \{0, 1\}^*$, the algorithm computes $h_{ID} = H(ID) \in G$ and sets $td_{ID} = h_{ID}^{x}$, where $x$ is an instance of the relation $R$.

5) WBEncrypt: On input the public parameter $K$ and $ID$, the algorithm computes $h_{ID} = H(ID) \in G$ and encrypts $M \in G$ by choosing two random numbers $(r_1, r_2) \in Z_q^*$ together with a randomly chosen witness and instance $(w, x)$. The algorithm sets the ciphertext $C = (C_1, C_2, C_3, C_4)$ as:

$C_1 = M^{x} \cdot H_2(e(h_{ID}, g_2)^{r_1})$,
$C_2 = g^{r_1}$,
$C_3 = g^{r_2}$,
$C_4 = (M \| w) \oplus H_2(C_1 \| C_2 \| C_3 \| e(h_{ID}, g_1)^{r_2})$.

International Journal of Network Security, Vol.22, No.5, PP.743-751, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).04) 747
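As a consistency check on the ciphertext defined in WBEncrypt, the two pairing values fed to $H_2$ can be recomputed from $C_2$ and $C_3$ without knowing $r_1$ or $r_2$. The derivation below is a sketch that assumes only the bilinearity and symmetry of $e$, the public values $g_1 = g^{w}$ and $g_2 = g^{x}$, and the key components $dk_{ID} = (h_{ID}^{w}, h_{ID}^{x})$ and $td_{ID} = h_{ID}^{x}$ from Extract and Trapdoor.

```latex
\begin{align*}
e(C_2, td_{ID}) &= e(g^{r_1}, h_{ID}^{x}) = e(h_{ID}, g)^{x r_1} = e(h_{ID}, g_2)^{r_1},\\
e(C_3, h_{ID}^{w}) &= e(g^{r_2}, h_{ID}^{w}) = e(h_{ID}, g)^{w r_2} = e(h_{ID}, g_1)^{r_2}.
\end{align*}
```

Under these assumptions, the first identity is what lets a holder of $td_{ID}$ recompute the mask applied to $M^{x}$ in $C_1$, and the second is what lets a holder of $h_{ID}^{w}$ recompute the mask applied to $(M \| w)$ in $C_4$.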
6) Decrypt: To decrypt, the algorithm takes as input the ciphertext $C = (C_1, C_2, C_3, C_4)$ encrypted under $ID$ and the private decryption key $dk_{ID} = (h_{ID}^{w}, h_{ID}^{x})$. It computes:

$(M' \| w') = C_4 \oplus H_2(C_1 \| C_2 \| C_3 \| e(C_3, h_{ID}^{w}))$,

where $w \in W$ of the witness relation $R(W, X)$ of the NP language L. The algorithm checks whether $C_1 = (M' \mid x')$ and $C_2 = g^{r_1'}$. If both hold, it returns $M'$. Otherwise it returns $\perp$.

7) Test: The algorithm takes as input a ciphertext $C_A$, a trapdoor $td_A$ and a given sender's ciphertext $C_B$ (with trapdoor $td_B$). It tests whether $M_A^{x} = M_B^{x}$ by computing:

$T_A = \frac{C_{1A}}{H_2(e(C_{2A}, td_{ID_A}))}$ and $T_B = \frac{C_{1B}}{H_2(e(C_{2B}, td_{ID_B}))}$.

The algorithm outputs 1 if the following equation holds, and 0 otherwise:

$e(T_B, C_{2A}) = e(T_A, C_{2B})$.

Remark 1. In the work of [17], the generated token changes per identity, whereas the security analysis and modification in [9] uses a fixed token for all group users. We note that since the token is fixed in the latter construction, the insider attack is a primary concern in that scheme. A randomly chosen witness $w$ under the relation $R(W, X)$ is considered secure only if it is not reused. Reusing a randomly chosen witness compromises the security of our scheme, so the witness should be discarded as soon as the decryption process is completed.

Correctness: Let $C_A \leftarrow WBE(M_A, ID_A, g_{2A}, x_A)$ and $C_B \leftarrow WBE(M_B, ID_B, g_{2B}, x_B)$ be generated by user A and user B respectively. Then:

$C_A = (C_{A1}, C_{A2}, C_{A3}, C_{A4})$.

The Test algorithm computes:

$T_A = \frac{C_{1A}}{H_2(e(C_{2A}, td_{ID_A}))}$, $T_B = \frac{C_{1B}}{H_2(e(C_{2B}, td_{ID_B}))}$,

$T_A = \frac{M_A^{x_A} \cdot H_2(e(C_{2A}, td_{ID_A}))}{H_2(e(C_{2A}, td_{ID_A}))}$, $T_B = \frac{M_B^{x_B} \cdot H_2(e(C_{2B}, td_{ID_B}))}{H_2(e(C_{2B}, td_{ID_B}))}$,

$T_A = \frac{M_A^{x_A} \cdot H_2(e(g^{r_{1A}}, h_{ID_A}^{x_A}))}{H_2(e(g^{r_{1A}}, h_{ID_A}^{x_A}))}$, $T_B = \frac{M_B^{x_B} \cdot H_2(e(g^{r_{1B}}, h_{ID_B}^{x_B}))}{H_2(e(g^{r_{1B}}, h_{ID_B}^{x_B}))}$,

$T_A = M_A^{x_A}$ and $T_B = M_B^{x_B}$.

Therefore, $M_A^{x_A} = M_B^{x_B}$.

The algorithm outputs 1 if the following equation holds, and 0 otherwise:

$e(C_{2A}, T_B) = e(C_{2B}, T_A)$,

where

$e(C_{2A}, T_B) = e(g^{r_{1A}}, M_B^{x_B}) = e(g, M_B)^{r_{1A} x_B}$,
$e(C_{2B}, T_A) = e(g^{r_{1B}}, M_A^{x_A}) = e(g, M_A)^{r_{1B} x_A}$.

Remark 2. Given a witness $w \in W$, the user should be able to compute $x$ to recover $M$ on a witness relation $R$. However, given the instance $x$ of a witness relation $R$, it is difficult to recover its corresponding witness $w$. Therefore the cloud server can only perform the equality test but cannot generate a new ciphertext. If $M_A = M_B$, it implies that $e(C_{2A}, T_B) = e(C_{2B}, T_A)$. For the witness relation $R(W, X)$ defined over an NP language L, this means that $\Pr[A_{IB-PKC-DETIA}(k, w) = x] = 1$ and $\Pr[A_{IB-PKC-DETIA}(k, x) = w] < \varepsilon(k)$, where $k$ is a security parameter and $\varepsilon$ is a negligible function of $k$.

5 Security Analysis

We define W-IND-CCA security for IBE-ET via the following game, similar to the one in [18].

Suppose a probabilistic polynomial time (PPT) adversary A achieves the advantage $\varepsilon$ in breaking the scheme $u$ = (Setup, Extract, WBInstGen, Trapdoor, WBEncrypt, WBDecrypt, Test). Given a BDH instance, a PPT adversary B takes advantage of A to solve the BDH problem with probability $\varepsilon'$. Suppose B holds a tuple $(g, U, V, R)$ where $a = \log_g U$, $b = \log_g V$ and $c = \log_g R$ are unknown. Given the generator $g$ of $G$, B is supposed to output $e(g, g)^{abc} \in G_T$. The game between B and A runs as follows:

Setup. B sets $g_1 = g^{a \cdot r_1} = U^{r_1}$, where $r_1 \leftarrow Z_q^*$, and sets the trapdoor $td = x \leftarrow X$ from a witness relation $R(W, X)$. B gives $g_1$ to A.

Phase 1.

1) H Query: A queries the random oracle H on $ID_i$ to obtain $h_{ID}$. If $ID_i$ already appears in the H table as an entry $(ID_i, h_{ID_i}, w_i, coin_i)$, B responds with the stored $h_{ID}$. Otherwise, for each $ID_i$, B responds as follows:

• B tosses a coin with $\Pr[coin_i = 0] = \delta$. If $coin_i = 1$, B responds to A with $h_{ID} = g^{w_i}$, $w_i \leftarrow W$. Otherwise, B sets $h_{ID} = g^{b w_i} = V^{w_i}$.
• B responds with $h_{ID_i}$, then adds $(ID_i, h_{ID}, w_i, coin)$ to the H table, which is initially empty.

2) Extract Query: A queries the private key of $ID_i$. B responds as follows:
• B obtains $H(ID_i) = h_{ID}$ from the H table. If $coin_i = 0$, B responds with $\perp$ and terminates the game.
• Otherwise, B responds with $dk_{ID_i} = g_1^{w_i} = U^{r_1 \cdot w_i}$, where $(w_i, h_{ID_i})$ is in the H table.
• B sends $dk_{ID_i}$ to A, then stores $(dk_{ID}, ID_i)$ in the private key list, which is initially empty.

3) $H_2$ Query: A queries $D_i$ in the domain of $H_2 : R \times G_T \rightarrow \{0, 1\}^{\tau_1 + \tau_2}$. If $D_i$ is already in the $H_2$ table, B responds with the stored $S_i = H_2(D_i)$. Otherwise, for every $D_i$, B selects a random string $S_i \in \{0, 1\}^{\tau_1 + \tau_2}$ as $H_2(D_i)$. B responds to A with $H_2(D_i)$ and adds $(D_i, S_i)$ to the $H_2$ table, which is initially empty.

4) Trapdoor Query: B runs the private decryption key query on $ID_i$ to obtain $dk_{ID} = (g_1^{w_i}, g_1^{x_i})$ and responds to A with $td_{ID_i} = g_2^{x_i}$. $td_{ID_i}$ is the first element of the decryption key.

5) Encryption Query: A queries $M_i$ to be encrypted with $ID_i$. B responds as follows:

• B searches the H table to obtain $h_{ID}$ and computes $h_{ID_i} = (h_{ID_i}^{w_i}, h_{ID_i}^{x_i})$, where $(w, x)$ are randomly chosen from the witness relation $R(W, X)$.
• B selects $r_{1i}, r_{2i} \leftarrow Z_q^*$ and computes:

$C_{1i} = M_i^{x_i} \cdot H_2(e(h_{ID_i}, g_2)^{r_{1i}})$,
$C_{2i} = g^{r_{1i}}$,
$C_{3i} = g^{r_{2i}}$,
$D_i = (C_{1i} \| C_{2i} \| C_{3i} \| e(h_{ID_i}, g_1)^{r_{2i}})$.

• B queries $O_{H_2}$ to obtain $S_i = H_2(D_i)$.
• B computes $C_{4i} = (M_i \| w_i) \oplus S_i$.

Then B responds with $C_i = (C_{1i}, C_{2i}, C_{3i}, C_{4i})$.

6) Decryption Query: A queries $C_i$ to be decrypted under $ID_i$. B responds as follows:

• B searches the H table to obtain $h_{ID_i}$. If $coin_i = 1$, B obtains $dk_{ID_i}$ of $ID_i$ from the private key list to decrypt $C_i$. Then B computes the bilinear map with $dk_{ID_i}$ as:

$e(C_{3i}, h_{ID_i}) = e(g^{r_{2i}}, g^{w_i}) = e(g, g)^{r_{2i} w_i}$.

• B computes $D_i' = (C_{1i} \| C_{2i} \| C_{3i} \| e(h_{ID_i}, g_1)^{r_{2i}})$ and obtains $S_i'$ from the $H_2$ table. B obtains $M_i'$ and $r_{2i}'$ from $C_{4i} \oplus S_i'$.

Finally, B computes $C_{1i}^*, C_{2i}^*$ with $M_i$ and $r_{2i}$ decrypted from $C_i$. If the ciphertext is valid, that is, $C_{1i}^* = C_{1i}$ and $C_{2i}^* = C_{2i}$, B responds with $M_i'$. Otherwise it responds with $\perp$.

Challenge: Once Phase 1 is over, A outputs two messages $m_0, m_1$ of equal length and an identity $ID^*$ to be challenged, where neither $m_0$ nor $m_1$ has been issued in an encryption query and $ID^*$ has not been queried in an extract query in Phase 1. B responds as follows:

• B searches the H table. If $coin^* = 1$, B responds with $\perp$ and terminates the game, since $h_{ID}^* = g^{w^*}$. It is observed that for a given witness relation $R(W, X)$, it is difficult to compute the corresponding witness $w$ for a given instance $x$.
• Otherwise, B randomly selects $b \in \{0, 1\}$, with $dk_{ID} = (h_{ID}^{w^*}, h_{ID}^{x^*})$, and calculates:

$C_1^* = M_b^{x^*} \cdot H_2(e(h_{ID}, g_2)^{r_1^*})$,
$C_2^* = g^{r_1^*}$,
$C_3^* = g^{r_2^*}$,
$C_4^* = (M_b \| w^*) \oplus S^*$,

where $w \in W$ of a witness relation $R(W, X)$, $S^* = H_2(D^*)$ and $D^* = (C_1^* \| C_2^* \| C_3^* \| e(C_3^*, h_{ID}^{w}))$, where $h_{ID}^{w}$ is unknown and is what B wants A to compute. $C^* = (C_1^* \| C_2^* \| C_3^* \| C_4^*)$ is a valid ciphertext for $M_b$.
• B responds to A with $C^*$.

Phase 2:

1) H Query: A queries as in Phase 1.
2) Extract Query: A queries as in Phase 1, but with $ID_i \neq ID^*$.
3) $H_2$ Query: A issues the query as in Phase 1.
4) Trapdoor Query: A queries as in Phase 1, and B responds to trapdoor queries in the same way as in Phase 1. However, given the witness relation instance $x \in X$, the adversary cannot compute the corresponding witness $w \in W$ of the relation $R(W, X)$ to generate a new ciphertext $C^*$.
5) Encryption Query: For a message $M_i \in \{m_0, m_1\}$, A queries as in Phase 1.
6) Decryption Query: A queries as in Phase 1, except that the ciphertext satisfies $(C_i, ID_i) \neq (C^*, ID^*)$.

Result: Given a witness relation $R$, a randomly chosen witness $w \in W$ generates an instance $x \in X$. Given the instance $x$ used to compute $td_{ID} = h_{ID}^{x}$, it is difficult to compute the corresponding witness $w \in W$. A outputs a guess $b'$ on $b$. If $b' \neq b$ and $w' \neq w$, B responds with failure and terminates the game. If $b' = b$ and $w' = w$, then B obtains the result of the BDH tuple by guessing the inputs of the $H_2$ query. However, this is not possible under the defined witness relation $R$ on the NP language L. B aborts the game because

$|\Pr[b' = b] - \frac{1}{2}| \geq \frac{\varepsilon}{e(q_{td} + q_e + q_d + 1)}$.

Trapdoor Security: We further provide a trapdoor security (TD) experiment for our scheme:

$Exp^{W-IND-ID-CCA}_{IB-PKC-DETIA, A}(k)$.
Table 1: Efficiency comparisons of PKEET variants

PKEETs   IA   Enc        Dec        Test       Del    Security
[18]     N    3Exp       3Exp       2P         N/A    OW-CCA
[16]     N    4Exp       2Exp       4P         3Exp   OW/IND-CCA
[15]     N    5Exp       2Exp       4Exp       N/A    OW/IND-CCA
[13]     N    1P+5Exp    1P+4Exp    4P+2Exp    3Exp   OW/IND-CCA
[11]     N    6Exp       2P+2Exp    4P         2Exp   OW/IND-CCA
[17]     Y    1P+3Exp    1P+2Exp    2P         N/A    W-IND-ID-CCA
Ours     Y    2P+3Exp    1P+1Exp    4P         2Exp   W-IND-ID-CCA
Remark: In this table, "Exp" refers to the exponentiation computation, "P" refers to the pairing computation, "IA" refers to resistance to insider attack, "Y" ("Yes") means supported, "N" ("No") means not supported, and "Del" refers to delegation. W-IND-ID-CCA refers to weak indistinguishability under chosen ciphertext attack against identity, OW-ID-CCA refers to one-wayness under chosen ciphertext attack against identity, and IND-ID-CPA refers to indistinguishability under chosen plaintext attack against identity.
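To make the entries of Table 1 easier to compare, the short sketch below parses cost strings such as "1P+5Exp" into pairing and exponentiation counts and sums them over Enc, Dec and Test. The helper names `parse_cost` and `total_cost` and the unit costs passed to `total_cost` are illustrative assumptions, not measurements from the paper; only the operation counts are transcribed from the table.

```python
import re

# Operation counts transcribed from Table 1 (P = pairing, Exp = exponentiation).
# The Del column is left out because several schemes list it as N/A.
TABLE_1 = {
    "[18]": {"Enc": "3Exp", "Dec": "3Exp", "Test": "2P"},
    "[16]": {"Enc": "4Exp", "Dec": "2Exp", "Test": "4P"},
    "[15]": {"Enc": "5Exp", "Dec": "2Exp", "Test": "4Exp"},
    "[13]": {"Enc": "1P+5Exp", "Dec": "1P+4Exp", "Test": "4P+2Exp"},
    "[11]": {"Enc": "6Exp", "Dec": "2P+2Exp", "Test": "4P"},
    "[17]": {"Enc": "1P+3Exp", "Dec": "1P+2Exp", "Test": "2P"},
    "Ours": {"Enc": "2P+3Exp", "Dec": "1P+1Exp", "Test": "4P"},
}


def parse_cost(expr):
    """Return (pairings, exponentiations) for a string such as '1P+5Exp'."""
    pairings = 0
    exponentiations = 0
    for count, op in re.findall(r"(\d+)(P|Exp)", expr):
        if op == "P":
            pairings += int(count)
        else:
            exponentiations += int(count)
    return pairings, exponentiations


def total_cost(scheme, pairing_cost, exp_cost):
    """Weighted cost of Enc + Dec + Test under hypothetical unit costs."""
    total = 0.0
    for expr in TABLE_1[scheme].values():
        pairings, exponentiations = parse_cost(expr)
        total += pairings * pairing_cost + exponentiations * exp_cost
    return total
```

Calling `total_cost("Ours", pairing_cost, exp_cost)` with unit costs measured on a given machine gives a rough per-scheme figure; the ranking depends entirely on the unit costs chosen.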
Given a security parameter $k$ and a witness $w \in W$, let A be an adversary against trapdoor security (TD). The experiment between the adversary A and the challenger is as follows:

WInsGen Generation Phase: The challenger runs the WInsGen algorithm with a random witness parameter $w \in W$. It generates a corresponding instance $x \in X$. The adversary A computes an instance $x^*$ ($x^* \notin X$) from a randomly chosen witness $w^*$. Finally, the challenger gives $td_{ID}^* = h_{ID}^{x^*}$ to A.

Phase 1: A adaptively asks the challenger for the following trapdoor oracle:

1) Trapdoor oracle: On input a message $M$ and an instance $x \in X$ with $x \neq x^*$ submitted by A, it outputs the trapdoor $td_{ID} = h_{ID}^{x}$ by running the trapdoor algorithm.

2) Challenge Phase: A submits two messages $(m_0, m_1)$ of equal length. The challenger picks $b \leftarrow \{0, 1\}$, generates the challenge trapdoor $td_{ID}^* = h_{ID}^{x^*}$ corresponding to the challenge ciphertext $M_b \| x \oplus H_2(e(h_{ID}^{x}, C_2))$ by running the WInsGen algorithm, and returns $td_{ID}^*$ to A.

Phase 2:

1) $TD_{ID}$ Query: A continues to ask the oracle for trapdoor queries. The oracle responds as in Phase 1.

2) Output: A outputs its guess $b'$. The adversary wins the game if $b' = b$, in which case the output of the experiment is 1, and 0 otherwise. The advantage of adversary A in the above experiment is defined as:

$Adv^{W-IND-TD}_{IB-PKC-DETIA}(w) = |\Pr[Exp^{W-IND-TD}_{IB-PKC-DETIA} = 1] - \frac{1}{2}|$.

6 Comparison

In this section, we compare (Table 1) the efficiency of the algorithms adopted in our scheme with other PKEET variants. The other PKEET variants in Table 1 achieve one-way chosen ciphertext attack (OW-CCA) security, one-way/indistinguishable chosen ciphertext attack (OW/IND-CCA) security, or weak indistinguishable identity chosen ciphertext attack (W-IND-ID-CCA) security. The extended PKEET schemes take three to four steps to conduct the equality test, including analyzing the trapdoor and inverse-computing the trapdoor.

The comparison shows that our scheme can resist the insider attack, whereas the others cannot, except for [17]. Even though Wu et al.'s scheme resists the insider attack, it does not provide delegation to the cloud server (insider) to perform the equality test. Wu et al.'s scheme may fail to resist the insider attack when the cloud server is delegated to perform the equality test. In contrast, our scheme ensures that the equality test is delegated to the cloud server while the cloud server is prevented from launching the insider attack.

The experiment results are shown in Figure 2. The experiment was executed on a desktop computer with an i5-4460 CPU @3.2 GHz and 4 GB RAM, running 64-bit Windows 7 and VC++ 6.0, using the PBC Library [10]. The time consumption was obtained from repeated simulations to give an objective computational cost comparison (see Figure 2) of our scheme with those of Yang et al. [18], Tang [15, 16], Ma et al. [11, 13], and Wu et al. [17]. We assume all schemes were run on the same desktop computer.

Our encryption (Enc) computational cost is higher than that of the other related schemes. This is due to the extra computational overhead of generating an instance from a witness relation to resist the insider adversary. The decryption and test computations are comparable to those of the other schemes (see Figure 2). Although the time consumption of encryption (Enc) is slightly higher than in the scheme of [17], our scheme provides enhanced security to resist the insider attack.
Figure 2: Computational time consumption comparison

7 Conclusion

Our scheme improves on the security of [9, 17]. We delegate a cloud server to perform the equality test on users' ciphertexts. Such authorization could allow the cloud server to launch the insider attack, which our scheme resists. Our scheme ensures that even though the cloud server is authorized to perform the equality test, it cannot launch the insider attack on users' ciphertexts. This resistance to the insider attack is achieved by the adoption of a witness-based cryptographic primitive. Our scheme supports weak indistinguishable identity chosen ciphertext attack security (W-IND-ID-CCA) with extended trapdoor security (TD).

References

[1] S. Alornyo, M. Asante, X. Hu, and K. K. Mireku, "Encrypted traffic analytic using identity based encryption with equality test for cloud computing," in IEEE the 7th International Conference on Adaptive Science and Technology (ICAST'18), pp. 1-4, 2018.
[2] D. Boneh, G. D. Crescenzo, R. Ostrovsky, and G. Persiano, "Public key encryption with keyword search," in International Conference on the Theory and Applications of Cryptographic Techniques, pp. 506-522, 2004.
[3] J. W. Byun, H. S. Rhee, H. A. Park, and D. H. Lee, "Off-line keyword guessing attacks on recent keyword search schemes over encrypted data," in Workshop on Secure Data Management, pp. 75-83, 2006.
[4] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "A new general framework for secure public key encryption with keyword search," in Australasian Conference on Information Security and Privacy, pp. 59-76, 2015.
[5] R. Chen, Y. Mu, G. Yang, F. Guo, and X. Wang, "Dual-server public-key encryption with keyword search for secure cloud storage," IEEE Transactions on Information Forensics and Security, vol. 11, no. 4, pp. 789-798, 2016.
[6] S. Garg, C. Gentry, A. Sahai, and B. Waters, "Witness encryption and its applications," in Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pp. 467-476, 2013.
[7] K. Huang, R. Tso, Y. C. Chen, S. M. M. Rahman, A. Almogren, and A. Alamri, "PKE-AET: Public key encryption with authorized equality test," The Computer Journal, vol. 58, no. 10, pp. 2686-2697, 2015.
[8] I. R. Jeong, J. O. Kwon, D. Hong, and D. H. Lee, "Constructing PEKS schemes secure against keyword guessing attacks is possible?," Computer Communications, vol. 32, no. 2, pp. 394-396, 2019.
[9] H. T. Lee, H. Wang, and K. Zhang, "Security analysis and modification of ID-based encryption with equality test from ACISP 2017," Information Security and Privacy, pp. 780-786, 2018.
[10] B. Lynn, The Stanford Pairing based Crypto Library. (http://crypto.stanford.edu/pbc/)
[11] S. Ma, "Identity-based encryption with outsourced equality test in cloud computing," Information Sciences, vol. 328, pp. 389-402, 2016.
[12] S. Ma, Y. Mu, W. Susilo, and B. Yang, "Witness-based searchable encryption," Information Sciences, vol. 453, pp. 364-378, 2018.
[13] S. Ma, M. Zhang, Q. Huang, and B. Yang, "Public key encryption with delegated equality test in a multi-user setting," The Computer Journal, vol. 58, no. 4, pp. 986-1002, 2015.
[14] H. S. Rhee, J. H. Park, W. Susilo, and D. H. Lee, "Trapdoor security in a searchable public-key encryption scheme with a designated tester," Journal of Systems and Software, vol. 83, no. 5, pp. 763-771, 2010.
[15] Q. Tang, "Public key encryption schemes supporting equality test with authorisation of different granularity," International Journal of Applied Cryptography, vol. 2, no. 4, pp. 304-321, 2012.
[16] Q. Tang, "Public key encryption supporting plaintext equality test and user-specified authorization," Security and Communication Networks, vol. 5, no. 12, pp. 1351-1362, 2012.
[17] T. Wu, S. Ma, Y. Mu, and S. Zeng, "ID-based encryption with equality test against insider attack," in Australasian Conference on Information Security and Privacy, pp. 168-183, 2017.
[18] G. Yang, C. H. Tan, Q. Huang, and D. S. Wong, "Probabilistic public key encryption with equality test," in Cryptographers' Track at the RSA Conference, pp. 119-131, 2010.
[19] W. C. Yau, S. H. Heng, and B. M. Goi, "Off-line keyword guessing attacks on recent public key encryption with keyword search schemes," in International Conference on Autonomic and Trusted Computing, pp. 100-105, 2008.
Acknowledgments

We wish to thank the anonymous reviewers for their comments and contributions.

Biography

Seth Alornyo is a lecturer at Koforidua Technical University. He received his Master of Philosophy (M.Phil) degree from Kwame Nkrumah University of Science and Technology in 2014, a Bachelor of Science degree in computer science in 2012 and a higher national diploma in 2008. Currently, he is pursuing his Ph.D. in Software Engineering at University of Electronic Science and Technology of China. His research interests are Cryptography and Network Security. A Member of IEEE.

Acheampong Edward Mensah received his master's degree in Software Engineering from the University of Electronic Science and Technology of China in 2019. His research interests are in the areas of cryptography and information security.

Abraham Opanfo Abbam received his bachelor's degree in Information and Communication Technology from University of Education, Winneba. He is currently pursuing a master's degree in software engineering at University of Electronic Science and Technology of China. His research interest lies in the area of Deep Learning, specifically Natural Language Processing.

International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 752
One-Code-Pass User Authentication Based on QR Code and Secret Sharing
Yanjun Liu^1, Chin-Chen Chang^1, and Peng-Cheng Huang^{1,2}
(Corresponding author: Peng-Cheng Huang)

^1 Department of Information Engineering and Computer Science, Feng Chia University, Taichung 40724, Taiwan
^2 College of Computer and Information Engineering, Xiamen University of Technology, Xiamen, Fujian, China
(Email: [email protected])
(Received July 10, 2018; Revised and Accepted Feb. 7, 2019; First Online Oct. 6, 2019)
Abstract based, biometrics-based and image morphing-based pro- tocols. In a password-based user authentication proto- The quick response (QR) code has gained extensive pop- col [8, 18], the user first registers at the remote server by ularity in information storage and identification in our transmitting a selected password to the server privately. daily lives due to its small printout size, high storage ca- Then, the server can provide the user with required ser- pacity and error correction ability. Taking into account vices after the authentication process, in which the pass- these fascinating features of the QR code, we propose a word presented by the user is checked to confirm that the user authentication protocol based on QR code and secret user is legal. However, a cost-inefficient problem arises sharing. It is the first user authentication protocol that due to the fact that a verifier table maintaining users’ implements the “one-code-pass” functionality in which an private information for authentication must be stored on authorized user can use only one QR code to access vari- the server and the volume of the verifier table is in pro- ous services from all departments within an organization. portion to the increased number of users. Therefore, the In the proposed protocol, a secret is divided into many verifier table needs to occupy much storage space when a shares in the form of QR codes held among different de- lot of users need to be authenticated. Besides, private in- partments and users. The original secret must be restored formation of the users may be leaked out since the verifier by the cooperation of both the department’s share and the table is prone to be stolen with malicious intention. user’s share for the authentication. 
Experimental results To overcome the aforementioned shortcomings, the demonstrate that the proposed protocol has high robust- smart card is employed extensively in the design of a user ness and is secure against various typical attacks. authentication protocol. There are two main advantages Keywords: One-Code-Pass; QR Code; Security; Secret for the use of smart card [6]. Firstly, the level of secu- Sharing; User Authentication rity can be enhanced significantly. Each user individually owns a smart card which stores important and confiden- tial information for mutual authentication that can au- 1 Introduction thenticate not only the user but also the remote server. Furthermore, the private information of all users is fully The user authentication mechanism has been regarded as dispersed by smart cards rather than being centralized in a very important technique to efficiently verify the iden- a verifier table, substantially attenuating the risk of infor- tity of the user through various kinds of secure commu- mation disclosure. Secondly, the authentication protocol nications. In 1981, for the first time Lamport [8] intro- becomes more cost-efficient because there is no need to duced a password-based remote user authentication pro- maintain a great deal of users’ information on the server. tocol. Since then, numerous variants [6, 7, 9, 18] of Lam- A typical smart card based authentication protocol is con- port’s protocol have been developed in the literature, no- ducted as follows [9]. First of all, the user submits a regis- ticeably extending the scope of applications for user au- tration request to the server, and then, the server delivers thentication in the field of secure communications, such as a smart card storing private information of both the user the agreement and distribution of session keys, e-business and the server to the user over a secure channel. When systems and wireless network communications, etc. 
the user inserts the smart card into a card reader for spe- At present, user authentication protocols generally fall cific services provided by the server, the identity of the into four categories, i.e., password-based, smart card- user should be verified to confirm that the user is the International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 753 true card owner. The private information maintained in that the user is legal; otherwise, an unauthorized user is the smart card is extracted by the card reader and then successfully detected. Since the original face image of the transmitted to the server for mutual authentication. If card owner is no longer stored anywhere, this protocol the authentication is passed, a shared session key is usu- can furnish efficient privacy protection and higher secu- ally established by the authorized user and the server to rity level. In 2015, Mao et al. [15] introduced a proxy user ensure subsequent secure communication. authentication protocol. In their protocol, the proxy user Unfortunately, the smart card is liable to be forged has authority to act on behalf of the primary user and since an illegal user may easily invade the secret informa- all users are able to be authenticated by image exchange tion preserved in the smart card due to weak computing based on morphing technology. power [7]. Consequently, biometrics information [16] re- In 2017, Liu and Chang [13] innovatively extended the ferring to unique biometric feature of a person that is image morphing-based authentication protocol to a “one- unable to be stolen or forged, has been employed as a card-pass” scenario with high practical use. Under this popular tool to further increase the authentication secu- scenario, an organization contains various kinds of depart- rity. Fingerprint, face, iris, voice and gait are the most ments providing diverse services. 
An authorized user can common biometric information used to verify the identity register at any department to acquire a smart card stor- of the user since it is impossible for two people to present ing a generated morphed image. After that, the user can identical data of these biometric features [20, 21]. The utilize this smart card to obtain services from different biometric information of the true user is usually stored in departments without registering again if the user passes advance. For authentication, the current biometrics fea- the authentication such that the face image of the user ture is retrieved by the biometric reader and then com- is identical to that restored by de-morphing the morphed pared with the stored information. If they can match, image and another face image maintained on the cloud the identity of the user is authenticated; otherwise, the storage servers. Nevertheless, most of the morphed im- authentication process fails. ages generated in authentication protocols look unnatu- Recognition rate that indicates the identification ac- ral, which is prone to arouse suspicion among malicious curacy is one of the most important measurements for attackers. Therefore, the research of optimization algo- performance evaluation of a biometrics-based authenti- rithms [14] on the selection of control points to achieve cation protocol [21]. Therefore, the research subject on better visual effect of morphed images becomes a very recognition rate has attracted scholars’ attention in re- challenging task. cent years and great recognition rate has been obtained by Nowadays, the quick response (QR) code [11,12] plays unceasing improvements on feature matching algorithms. a very important role in information storage and identi- However, this type of authentication protocol has the fol- fication in our daily lives. The QR code has many fas- lowing weaknesses [16, 20, 21]. 
Firstly, it just provides cinating features [11, 12, 19], such as small printout size, weak privacy protection since personal biometric infor- high storage capacity and error correction ability. Since mation may be disclosed and then abused by malicious the QR code is very easy to be decoded by any stan- adversaries. Secondly, if the true user’s biometric infor- dard QR code reader, it also has gained extensive popu- mation is destroyed because of diseases or accidents, it larity in real-time applications. Due to these advantages, will induce incorrect authentication. At last, development the QR code is very suitable for storing the user’s per- of a robust biometric reader is both time-consuming and sonal information to verify the identity of the user. In money-consuming. 2018, Huang et al. [5] applied the aesthetic QR code and To make further efforts on the uplift in security and the single-key-lock mechanism to a smart-building access performance, a new type of authentication protocols control system. The single-key-lock mechanism produces based on image morphing has flourished. The image mor- keys for users and locks for doors. After encryption and phing technology suspends from the production of special fusion procedures, the keys are used to generate aesthetic image effects in the film industry in such a way that given QR codes as the user’s credential. This access control a source image and a target image, a morphed image is system has the attributes of high security and robust- created. The morphed image looks like both the source ness. Different from Huang et al.’s access control system, image and the target image. This attractive characteris- we consider the application of QR code from another as- tic can be well applied to identity authentication. Up to pect. 
The application scenario of our user authentication date lots of user authentication protocols combining im- protocol is similar to that of Liu and Chang’s morphing- age morphing with smart card are introduced [13,15,22]. based “one-card-pass” method, but ours offers “one-code- Zhao and Hsieh [22] presented a card user authentication pass” functionality via the combination of QR code and protocol in which a morphed image MI is first created secret sharing. That is to say, an authorized user can suc- via the card owner’s face image OI and a pre-selected cessfully use only one QR code to access various services face image SI and then printed on the smart card. When from all departments within an organization. Suppose a card user needs to be authenticated, the image OI is that there is a secret provided by the server of the organi- recovered by a de-morphing process operating on the im- zation in the proposed protocol. The secret is divided into ages MI and SI. Then, the image OI is compared with many shares in the form of beautified QR codes and each the face image of the user. If they are the same, it implies department holds its own share beforehand. During the International Journal of Network Security, Vol.22, No.5, PP.752-762, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).05) 754 registration, a user can register at any department and padding bits. Then, n parity bits are created for detect- acquire his/her share from the department. When the ing and correcting errors when scanning the QR code. user wants to access services of any of the departments, Figure 1 demonstrates how to place an RS code into a 2D the original secret must be restored by the cooperation of QR code. Especially, each message/padding/parity bit is both the department’s share and the user’s share; knowing located onto one module of the message/padding/error only one share is unable to derive any valuable informa- correction region. tion about the secret. 
If the restored secret is not correct, the user is illegal. Experimental results demonstrate that the proposed protocol is secure against various attacks and the generated beautified QR code has high robust- ness. The rest of the paper is organized as follows. In Section 2, we first introduce elementary knowledge of QR code and secret sharing, and then, review related works [13] and [5]. In Section 3, a novel “one-code-pass” user authentication protocol based on QR code and se- cret sharing is proposed. Section 4 demonstrates the ex- Figure 1: The structure of a QR code perimental results of the proposed protocol and Section 5 gives the security and performance analyses. Finally, our conclusions are presented in Section 6.
2 Preliminaries
In this section, we first introduce the main building blocks of the new user authentication protocol proposed in the next section. We then briefly review the morphing-based "one-card-pass" authentication protocol [13] and the QR code-based access control system [5].

2.1 QR Code

The QR code, invented by the Japanese company Denso-Wave in 1994 [4], is a two-dimensional graphical code that consists of black and white square modules representing bit information. As illustrated in Figure 1, the structure of a standard QR code contains a message region, a padding region and an error correction region, together with version information, format information, position detecting patterns, alignment patterns and timing patterns.

The QR code is extensively used in information storage and identification applications because of its small printout size, high storage capacity and error correction ability [11, 12, 19]. There are 40 QR code versions with different storage capacities, and a QR code with a higher version number provides a larger data payload. Another notable feature of the QR code is its error correction capability: four error correction levels (L, M, Q and H) allow the code to be decoded even when portions of it are dirty or damaged.

One of the most important processes in generating a QR code is encoding the information with a Reed-Solomon (RS) code. A t-bit RS code, shown in Figure 2, is composed of three segments: l message bits, m padding bits and n parity bits. The to-be-embedded message is first encoded into an l-bit binary stream, followed by m padding bits and n parity bits.

Figure 2: RS code

2.2 Shamir's Threshold Secret Sharing

The concept of secret sharing, independently introduced by Shamir [17] and Blakley [3] in 1979, is an efficient mechanism for data protection. Shamir's threshold secret sharing (TSS) has become popular because of its high efficiency and security. In a $(t, r)$ TSS scheme ($t \le r$), a secret is divided into r shares held by r participants. The secret can be recovered from any t or more shares; if fewer than t shares are collected, the original secret cannot be retrieved.

Assume that there is a dealer and r participants, and that the secret is denoted as s. The $(t, r)$ TSS scheme consists of the share establishment phase and the secret recovery phase, which are described as follows.

2.2.1 Share Establishment Phase

The dealer randomly selects a Lagrange interpolating polynomial g of degree $t-1$ such that

$$g(x) = s + a_1 x + a_2 x^2 + \cdots + a_{t-1} x^{t-1} \bmod p,$$

where p is a prime number, and $a_1, a_2, \ldots, a_{t-1}$ and s are in the finite field GF(p). Then, the dealer chooses r random numbers $x_d$ for $d = 1, 2, \ldots, r$ to establish r shares $s_d = g(x_d)$, and provides each participant with a share.
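The share establishment phase of Section 2.2.1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `make_shares` is hypothetical, the evaluation points are assumed to be distinct and non-zero, and the leading coefficient is assumed to be non-zero so that g really has degree $t-1$; none of these details is fixed by the text.

```python
import secrets


def make_shares(secret, t, r, p):
    """Return r shares (x_d, g(x_d)) of `secret` with threshold t over GF(p)."""
    if not (1 <= t <= r < p and 0 <= secret < p):
        raise ValueError("need 1 <= t <= r < p and 0 <= secret < p")
    # g(x) = s + a_1 x + ... + a_{t-1} x^(t-1) mod p with random coefficients
    coeffs = [secret] + [secrets.randbelow(p) for _ in range(t - 2)]
    if t > 1:
        # Assumption: a_{t-1} != 0, so the polynomial has degree exactly t-1
        coeffs.append(1 + secrets.randbelow(p - 1))

    def g(x):
        acc = 0
        for a in reversed(coeffs):      # Horner's rule
            acc = (acc * x + a) % p
        return acc

    # Assumption: the x_d are distinct and non-zero (x = 0 would expose g(0) = s)
    xs = set()
    while len(xs) < r:
        xs.add(1 + secrets.randbelow(p - 1))
    return [(x, g(x)) for x in sorted(xs)]


# Example call with the toy parameters used later in Section 4 (s = 1024, p = 1237)
shares = make_shares(secret=1024, t=2, r=5, p=1237)
print(shares)
```

With $t = 2$, any two of the returned points determine the line g, and its value at $x = 0$ is the secret.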
2.2.2 Secret Recovery Phase

Any t participants can cooperate to recover the secret s by releasing their shares. Suppose the t out of r shares are denoted as $s_{b_h} \in \{s_1, s_2, \ldots, s_r\}$ for $h = 1, 2, \ldots, t$. According to the principle of the Lagrange interpolating polynomial, $g(x)$ can be recovered by

$$g(x) = \sum_{h=1}^{t} s_{b_h} \prod_{k=1,\, k \neq h}^{t} \frac{x - x_{b_k}}{x_{b_h} - x_{b_k}} \bmod p.$$

Thus, the secret s can be immediately obtained by $s = g(0)$.

2.3 Related Works

2.3.1 Morphing-Based "One-Card-Pass" Authentication Protocol

Assume that an organization includes many independent departments. For instance, a university may contain a library, a digital information center, a fitness center, a student association, etc. The "one-card-pass" functionality should let a faculty member or a student use a single smart card to gain services from all of the departments. Accordingly, the objective of the "one-card-pass" user authentication protocol is to verify the identity of the card user in each department after the user registers at any one of the departments.

Recently, Liu and Chang [13] proposed the first "one-card-pass" user authentication protocol using image morphing technology. This protocol involves three entities, i.e., a card user, a terminal within a department, and cloud storage servers, and it consists of the registration phase and the authentication phase.

In the registration phase, a legal user U can register at any terminal T within any department. The registration proceeds as follows.

Step 1. The user U sends a registration request including his/her identity number to a terminal T.

Step 2. T takes a digital photo of U. This photo, denoted as OI, is used as the original face image of U.

Step 3. T selects a face image SI stored in the cloud storage servers C according to U's identity number, and then C sends SI to T.

Step 4. T generates a morphed image MI using OI as the source image and SI as the target image by a specific morphing algorithm.

Step 5. T stores the information that will be used in the authentication phase in a smart card and prints the morphed image MI on the smart card. Then, T delivers the smart card to U.

Later, in the authentication phase, a terminal T of a department recovers the face image of the true card owner via image de-morphing to verify the validity of the card user U. If the face image of U matches the recovered face image, U passes the authentication and can gain the service provided by this department. The detailed steps in this phase are listed below.

Step 1. User U presents the smart card to a terminal T.

Step 2. T scans the morphed image MI printed on the smart card and extracts the information for authentication from the smart card.

Step 3. T finds the face image SI stored in the cloud storage servers C according to the extracted information.

Step 4. C sends SI to T.

Step 5. T recovers the face image of the true card owner, OI, by de-morphing images MI and SI.

Step 6. T compares the face image of the user U with OI to see if they are identical. If they are, U is legal.

This protocol is practical since it implements the "one-card-pass" functionality. Because the real image of the true card owner is not stored anywhere but hidden in a morphed image, security is increased to a certain extent. However, the protocol requires an additional set of cloud storage servers that preserve large numbers of face images for the morphing operation, which may lead to potential security problems. Furthermore, since printing and scanning may introduce noise, the authentication result may be incorrect because of distortions in the recovered face image. To solve these problems, in this paper we combine QR code and secret sharing to design a new "one-code-pass" user authentication protocol in Section 3.

2.3.2 QR Code-Based Access Control System

Huang et al. [5] presented a novel access control system for smart buildings based on QR code. The enrollment procedure of this system consists of three steps, i.e., keys and locks generation, key encryption, and QR code beautification, as listed below:

Step 1. Keys and locks generation. The single-key-lock mechanism is used to generate keys for users and locks for doors on a random non-singular matrix.

Step 2. Key encryption. Each key is encrypted by a symmetric-key cryptography algorithm to prevent leakage of the key. Then, a QR code generated from the encrypted key is sent to the user.

Step 3. QR code beautification. The owner of an original QR code produced by Step 2 cannot be identified by human eyes, because the code is only a pattern of black and white modules. This makes the management of QR codes difficult when the number of users (keys) grows sharply. Therefore, these QR codes need to be beautified so that each one shows its user's photo. The beautification process includes:
a) Construct the basis vector matrix.

b) Generate an XORed QR code by performing the XOR operation between the Reed-Solomon message of the QR code and the basis vector matrix. In the XORed QR code, most modules in the padding region are modified and the middle modules are kept clear in preparation for embedding the user's photo.

c) Synthesize an aesthetic QR code from the user's photo and the XORed QR code as the user's credential.

When a user presents the aesthetic QR code to request access to a door, the reader on the door decrypts and retrieves the user's key from the aesthetic QR code, and then verifies the access right by conducting an operation on the key and the lock.

3 The Proposed Protocol

In this section, we propose a novel "one-code-pass" user authentication protocol. The proposed protocol consists of the initialization phase, the registration phase and the authentication phase. In the following, we first point out the contribution of the proposed protocol, then address its main idea, and finally elaborate on each phase.

3.1 Contribution

The contribution of the proposed protocol is described as follows:

1) It is the first user authentication protocol that can fully accomplish the goal of "one-code-pass" in a specified organization. More specifically, a user is able to register at any department to gain his/her own beautified QR code. By using the QR code, the user is immediately authenticated and directly accesses diverse services from all departments in this organization.

2) The QR code technique is combined with a secret sharing mechanism to implement the proposed protocol. A secret is divided into a number of shares that are held among different departments and authorized users. Each share is a beautified QR code containing some authentication information. Each department gains its share in advance, and each user obtains his/her share from the department at which the user registers. To verify the identity of the user, the original secret should be recovered by the cooperation of both the department's share and the user's share. If the restored secret is correct, the user is eligible to get services from all departments.

3) The proposed protocol offers strong security. Based on the secret sharing mechanism, the shares on both the user side and the department side must be collected to recover the original secret; a single share alone cannot reveal the secret. In addition, the proposed protocol is secure against various attacks, such as the impersonation attack, the collusion attack and the replay attack. It also achieves high robustness, as it tolerates various quality degradation situations.

3.2 Main Idea

To authenticate the identity of a user, a $(2, M+N)$ TSS is exploited in the proposed protocol for an organization containing M users and N departments. At first, some initialization work should be done. The server of the organization selects a secret s and generates N shares such that each share is held by a corresponding department. Then, each user can register at any department and acquire his/her own share from the department with the help of the server. Each share of a department/user is a beautified QR code that not only encodes authentication information, but also embeds a logo/photo of the department/user in its padding region for efficient management of QR codes. When a user wants to access services of any of the departments, the user is authenticated by recovering the original secret s through the cooperation of both the department's share and the user's share. The photo of the authorized user on the beautified QR code also provides an auxiliary way for user authentication. Figure 3 shows the architecture of the proposed protocol, in which a user first registers at department 1, and then is authenticated by departments 1 and 2, respectively.

Before addressing the detailed protocol, some notations used in the paper are described in Table 1.

Table 1: Notation description

Notation                    Description
S                           The server
$U_i$ ($1 \le i \le M$)     The user i
$ID_{U_i}$                  The identity number of $U_i$
$Q_{U_i}$                   The beautified QR code for $U_i$
$D_j$ ($1 \le j \le N$)     The department j
$ID_{D_j}$                  The identity number of $D_j$
$Q_{D_j}$                   The beautified QR code for $D_j$
s                           The secret provided by S
K                           The secret key shared among all departments
$g(\cdot)$                  The Lagrange interpolating polynomial
$h(\cdot)$                  A collision-free one-way hash function
$\|$                        The string concatenation operation

Figure 3: Architecture of the proposed protocol

3.3 The Initialization Phase

This phase allows the server to generate a share in the form of a beautified QR code for each department. Assume that all of the N departments within the organization share a secret key K. Firstly, the department $D_j$ computes $x_{D_j} = h(ID_{D_j} \,\|\, K)$ and sends it to the server S. Then, S selects a secret s, a number $a_1$ and a prime number p to generate a degree-one polynomial function g as

$$g(x) = s + a_1 x \pmod{p}. \qquad (1)$$

For each department, S computes $y_{D_j} = g(x_{D_j}) = s + a_1 x_{D_j} \pmod{p}$. Finally, S creates a beautified QR code $Q_{D_j}$ for the department $D_j$ as the share of $D_j$ by using the information $(x_{D_j}, y_{D_j})$ and $D_j$'s logo.

Here, we briefly explain how the beautified QR code for a department is generated. The information $(x_{D_j}, y_{D_j})$ is first placed into modules of the message region to produce a QR code. Then, a beautification algorithm is applied to the QR code for efficient management. There are many available beautification algorithms, and we use the one presented in [10]. In this algorithm, the modules in the padding region of the QR code are first modified to be kept "clean" with the help of a basis vector matrix, and then the department's logo is embedded into the padding region by a synthetic strategy. Interested readers can refer to [10] for details.

3.4 The Registration Phase

In this phase, a user can register at any department and acquire his/her own share, also in the form of a beautified QR code. This phase is described in detail as follows and demonstrated in Figure 4.

Step 1. The user $U_i$ sends his/her identity number $ID_{U_i}$ to the department $D_j$.

Step 2. $D_j$ computes $x_{U_i} = h(ID_{U_i} \,\|\, K)$ and sends $x_{U_i}$ to the server S.

Step 3. S computes $y_{U_i} = g(x_{U_i}) = s + a_1 x_{U_i} \pmod{p}$ by Equation (1).

Step 4. S generates a beautified QR code $Q_{U_i}$ for the user $U_i$ as the share of $U_i$ by using the information $(x_{U_i}, y_{U_i})$ and $U_i$'s photo. The method of generating the beautified QR code for the user is the same as that for the department described in the initialization phase.

Step 5. S sends $Q_{U_i}$ to $D_j$.

Step 6. $D_j$ sends $Q_{U_i}$ to $U_i$.

3.5 The Authentication Phase

The authentication is conducted by an authentication device within a department, which is equipped with a QR code reader and has a copy of the secret s. The authentication device verifies the identity of the user by recovering the original secret s through the collaboration of the user's QR code and the department's QR code. If the user passes the authentication, he/she can gain services provided by this department. The steps of this phase are as follows and are illustrated in Figure 5.

Step 1. The user $U_i$ shows the beautified QR code $Q_{U_i}$ to the authentication device.

Step 2. The authentication device scans the QR code $Q_{U_i}$ of $U_i$ and extracts $(x_{U_i}, y_{U_i})$.

Step 3. The authentication device scans the QR code $Q_{D_j}$ of the department $D_j$ where it is located and retrieves $(x_{D_j}, y_{D_j})$.

Step 4. The authentication device uses $(x_{U_i}, y_{U_i})$ and $(x_{D_j}, y_{D_j})$ as two shares to restore a secret $s'$. If $s' \neq s$, the authentication device terminates the phase and the authentication fails; otherwise, it executes Step 5.

Step 5. The authentication device computes $x'_{U_i} = h(ID_{U_i} \,\|\, K)$ and then checks whether $x'_{U_i}$ equals $x_{U_i}$. If it holds, the authentication device confirms
that the user is legal; otherwise, the phase is terminated and the authentication fails.

Figure 4: The registration phase of the proposed protocol

Figure 5: The authentication phase of the proposed protocol

In the authentication phase, it is worth noting that if the restored secret $s'$ differs from the original secret s in Step 4, the user is definitely illegal. However, if $s'$ is equal to s, this alone cannot confirm that the user is legal. The reason is that if an attacker impersonating a user embeds fake values $x''_{U_i}$ and $y''_{U_i}$ that satisfy $y''_{U_i} = g(x''_{U_i}) = s + a_1 x''_{U_i} \pmod{p}$ into the QR code, the correct secret s can still be obtained from the two points $(x_{D_j}, y_{D_j})$ and $(x''_{U_i}, y''_{U_i})$ on the function g. Therefore, Step 5 is added to check whether $x''_{U_i}$ equals $h(ID_{U_i} \,\|\, K)$ when $s' = s$, to ensure that $x''_{U_i}$ is genuine.

4 Experimental Results

To evaluate the performance of our proposed protocol, our experiments were implemented with the open source computer vision library OpenCV and the C++ programming language. In this section, three users and two departments are used in a concrete example to illustrate the experimental results. In the following example, we select $s = 1024$, $a_1 = 21$, $p = 1237$ and K = "@MSNLAB". Thus, the function g generated by the server S in the initialization phase becomes $g(x) = 1024 + 21x \pmod{1237}$. The beautified QR code $Q_{D_j}$ for each of the two departments is generated in the initialization phase and shown in Table 2, where the department's logo, the identity number $ID_{D_j}$ and the information $(x_{D_j}, y_{D_j})$ used for the generation of $Q_{D_j}$ are also provided. After registering at department $D_1$ or $D_2$, each of the three users obtains a beautified QR code $Q_{U_i}$ generated from the user's photo, the identity number $ID_{U_i}$ and the information $(x_{U_i}, y_{U_i})$, as shown in Table 3. The departments' logos in Table 2 are provided by Feng Chia University, and the users' photos in Table 3 come from the Yale Face Database [2].

Here, we only take the user authentication process performed by the department $D_1$ as an example. Let us demonstrate how the department $D_1$ verifies the identity of the user $U_1$ in the proposed protocol. Firstly, the authentication device in $D_1$ scans $D_1$'s beautified QR code $Q_{D_1}$ (see Table 2) and $U_1$'s beautified QR code $Q_{U_1}$ (see Table 3). Secondly, the information $(x_{D_1}, y_{D_1})$ and $(x_{U_1}, y_{U_1})$ is extracted from $Q_{D_1}$ and $Q_{U_1}$, respectively, and then used as two shares to reconstruct a function $g'$ as

$$g'(x) = y_{U_1}\frac{x - x_{D_1}}{x_{U_1} - x_{D_1}} + y_{D_1}\frac{x - x_{U_1}}{x_{D_1} - x_{U_1}} \pmod{1237} = 1024 + 21x \pmod{1237}.$$

Then, a secret $s'$ is derived by $s' = g'(0) = 1024$, which is equal to the original secret s. Finally, we compute $x'_{U_1} = h(ID_{U_1} \,\|\, K)$ and find that $x'_{U_1}$ equals $x_{U_1}$. This indicates that the user $U_1$ passes the authentication.

In the following, we also show how the department $D_1$ detects an illegal user Bob by the proposed protocol. Assume that Bob uses $x_B$ =
6ed3693a3ef2fa0421cd0d1bd36750691522ce29, $y_B = 671$ and his photo to forge his QR code $Q_B$, and then shows $Q_B$ to the authentication device. The authentication device uses $(x_B, y_B)$ extracted from $Q_B$ and $(x_{D_1}, y_{D_1})$ derived from $Q_{D_1}$ to reconstruct a function $g''$ as

$$g''(x) = y_B\frac{x - x_{D_1}}{x_B - x_{D_1}} + y_{D_1}\frac{x - x_B}{x_{D_1} - x_B} \pmod{1237} = 287 + 16x \pmod{1237}.$$

A secret $s'$ is then derived by $s' = g''(0) = 287 \neq 1024$, which implies that Bob is an illegal user.

Table 2: The beautified QR codes for the departments

                                    Department $D_1$                            Department $D_2$
Logo                                (image)                                     (image)
$ID_{D_j}$                          http://lib.fcu.edu.tw                       http://www.sa.fcu.edu.tw
$x_{D_j} = h(ID_{D_j} \,\|\, K)$    dc1cf84361a4986424325645bef5be802344dc55    742b34c092beb4eae998a104509635fe32c394f3
$y_{D_j} = g(x_{D_j})$              607                                         107
$Q_{D_j}$                           (image)                                     (image)

5 Analyses

In this section, we give the security and performance analyses of the proposed protocol, respectively.

5.1 Security Analysis

The proposed protocol can efficiently protect the user's privacy and resist various typical attacks, such as the impersonation attack, the collusion attack and the replay attack. The detailed security analysis from the theoretical aspect is given as follows.

5.1.1 Privacy Protection

One property of the QR code is that the information embedded in the message region of the QR code can be decoded and extracted by any QR code reader. As a result, a malicious attacker must be prevented from retrieving any personal knowledge about the user from the extracted information. To achieve this goal, the user $U_i$'s identity number $ID_{U_i}$ is not embedded directly into the QR code. Instead, it is first hashed with a one-way hash function as $x_{U_i} = h(ID_{U_i} \,\|\, K)$, and then the hashed value $x_{U_i}$ and the corresponding value $y_{U_i}$ computed from $x_{U_i}$ by Equation (1) are used to generate the user $U_i$'s QR code $Q_{U_i}$. When a malicious attacker scans the QR code $Q_{U_i}$ with a QR code reader, he/she only gets $x_{U_i}$ and $y_{U_i}$ but learns nothing about the user's identity number $ID_{U_i}$. In this way, the user's personal information is not disclosed, so privacy protection is achieved. Besides, it does not matter that the photo embedded in the QR code reveals the appearance of the user, since the photo plays no part in the cryptographic check.

5.1.2 Withstanding Impersonation Attack

In the proposed protocol, the impersonation attack refers to an attack in which a malicious intruder forges a QR code and then impersonates a user with the intention of passing the authentication. To launch this kind of attack, the intruder must produce fake values of $x_{U_i}$ and $y_{U_i}$ and use them to forge the QR code. It is very likely that the fake $x_{U_i}$ and $y_{U_i}$ do not satisfy the equation $y_{U_i} = g(x_{U_i}) = s + a_1 x_{U_i} \pmod{p}$, so the secret $s'$ restored from the two shares $(x_{U_i}, y_{U_i})$ and $(x_{D_j}, y_{D_j})$ differs from the original secret s. This implies that the impersonation attack is detected in this situation. However, there is also a situation in which the intruder knows the function $g(x) = s + a_1 x \pmod{p}$ and produces fake $x_{U_i}$ and $y_{U_i}$ satisfying g, such that the correct secret s can still be obtained. Since in this case the value of s cannot determine whether the user is legal, the proposed protocol further computes $x'_{U_i} = h(ID_{U_i} \,\|\, K)$. The value $x_{U_i}$ created by the intruder will not equal $x'_{U_i}$ because the intruder does not know the value of K, which indicates that $x_{U_i}$ is fake, and the impersonation attack fails.

5.1.3 Withstanding Collusion Attack

The proposed protocol must be collusion-resistant. The collusion attack in the proposed protocol means that multiple authorized users collude to obtain important information and use it to help an attacker pass the authentication. Since any two shares suffice to restore the secret s, two authorized users can release their shares and collude to recover the function $g(x) = s + a_1 x \pmod{p}$. The two users may share the function g with an attacker who attempts to impersonate a legal user. The attacker can create values of $x_{U_i}$ and $y_{U_i}$ satisfying the function g, so the correct secret s can be obtained in the authentication phase. Nevertheless, the attacker cannot pass the authentication, since the created value $x_{U_i}$ will not equal $x'_{U_i} = h(ID_{U_i} \,\|\, K)$, as addressed in Subsection 5.1.2. Therefore, the collusion attack cannot be launched successfully.
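The two checks discussed in Subsections 5.1.2 and 5.1.3 can be replayed with a small Python sketch. Several points are assumptions made only for illustration and are not taken from the paper: SHA-1 is used for $h$ (the 40-digit hexadecimal values in the tables are consistent with it, but the hash function is not named), the prime is chosen larger than any digest so that distinct x values stay distinct modulo p, the claimed identity is passed to the device as a parameter, and all function names are hypothetical.

```python
import hashlib

# Assumption: a prime larger than any SHA-1 digest (Section 4 uses p = 1237).
P = 2**521 - 1
S, A1 = 1024, 21          # secret and slope, values borrowed from Section 4
K = "@MSNLAB"             # department key, value borrowed from Section 4


def h(identity, key=K):
    """Hypothetical h(ID || K): SHA-1 of the concatenation, read as an integer."""
    return int(hashlib.sha1((identity + key).encode("utf-8")).hexdigest(), 16)


def g(x):
    """Equation (1): g(x) = s + a1*x mod p."""
    return (S + A1 * x) % P


def make_share(identity):
    """Share (x, y) as issued in the initialization or registration phase."""
    x = h(identity)
    return x, g(x)


def recover(share_a, share_b):
    """Value at 0 of the line through two shares (Lagrange interpolation, t = 2)."""
    (xa, ya), (xb, yb) = share_a, share_b
    inv = pow((xa - xb) % P, -1, P)   # raises ValueError if xa == xb mod P
    return (yb * xa - ya * xb) * inv % P


def authenticate(claimed_id, user_share, dept_share):
    """Steps 4 and 5 of the authentication phase."""
    if (user_share[0] - dept_share[0]) % P == 0:
        return False                       # two shares with the same x are unusable
    if recover(user_share, dept_share) != S:
        return False                       # Step 4: restored secret differs from s
    return user_share[0] == h(claimed_id)  # Step 5: x must equal h(ID || K)


dept = make_share("http://lib.fcu.edu.tw")
honest = make_share("WangJiaQing#M0557591")
on_line = (12345, g(12345))                # forged by someone who learned g
off_line = (12345, (g(12345) + 1) % P)     # forged without knowing g

for name, share in [("honest", honest), ("on_line", on_line), ("off_line", off_line)]:
    print(name, recover(share, dept) == S,
          authenticate("WangJiaQing#M0557591", share, dept))
```

In this sketch the share forged on the line still restores s, so only the hash comparison of Step 5 rejects it, while the share forged off the line is already rejected in Step 4.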
Table 3: The beautified QR codes for the users

                                    User $U_1$                                  User $U_2$                                  User $U_3$
Photo                               (image)                                     (image)                                     (image)
$ID_{U_i}$                          WangJiaQing#M0557591                        ZhuZhaoHua#M0419274                         LiJing#P0361138
$x_{U_i} = h(ID_{U_i} \,\|\, K)$    a2e5bc07294a34caaffcc79264e0f2aec912b37a    3cf5b5ffb9153545bd8bbf1f0edb154ec2691fc1    53132e6ffe935538492e213f9c232d308c9c0488
$y_{U_i} = g(x_{U_i})$              788                                         709                                         565
$Q_{U_i}$                           (image)                                     (image)                                     (image)
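Each column of Table 3, read together with a column of Table 2, supplies the two points used in Step 4 of the authentication phase. As a worked sketch, assuming only Equation (1) and $x_{U_i} \not\equiv x_{D_j} \pmod{p}$ (a condition the text does not state explicitly), the Lagrange recovery for $t = 2$ reduces to a closed form, where division denotes multiplication by the modular inverse:

```latex
\begin{align*}
s' = g'(0)
  &= y_{U_i}\,\frac{0 - x_{D_j}}{x_{U_i} - x_{D_j}}
   + y_{D_j}\,\frac{0 - x_{U_i}}{x_{D_j} - x_{U_i}} \pmod{p} \\
  &= \frac{y_{D_j}\,x_{U_i} - y_{U_i}\,x_{D_j}}{x_{U_i} - x_{D_j}} \pmod{p} \\
  &= \frac{(s + a_1 x_{D_j})\,x_{U_i} - (s + a_1 x_{U_i})\,x_{D_j}}{x_{U_i} - x_{D_j}}
   = s \pmod{p}.
\end{align*}
```

The last line holds only when both points lie on g; a forged pair that is off the line changes the numerator and therefore yields $s' \neq s$.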
5.1.4 Withstanding Replay Attack

In a replay attack, a valid data transmission is repeated or delayed by a malicious attacker without detection. In the proposed scheme, an attacker may steal or duplicate a valid QR code and try to use it to pass the authentication. One way to prevent this is to periodically change the value of the secret key K and accordingly update the QR codes for the users and departments based on the function g. A QR code held by the attacker then expires at the next update, since the attacker cannot obtain the latest version of the user's QR code.

5.2 Performance Analysis

To evaluate the performance of the proposed protocol, we analyze the decoding rate and robustness of the QR codes for users and departments in this subsection.

5.2.1 Decoding Rate of Beautified QR Codes

To evaluate the decoding rate of the beautified QR code, we conducted experiments on both Apple's iOS and Google's Android mobile operating systems. As shown in Table 4, several applications from the Apple App Store and the Google Play store were installed on two different mobile phones (iPhone 7, XiaoMi 4), and twenty beautified QR codes synthesized from users' authentication information and their photos were used to test the decoding rate. All 20 users' photos are from the Yale face database [2] and the AT&T Cambridge face database [1]. From Table 4, we can see that all the applications on the two selected phones decode the beautified QR code messages at a decoding rate of 100%.

5.2.2 Robustness of the Proposed Protocol

In the application scenario, when the beautified QR code is printed on a plastic card to serve as an ID card, or scanned by a camera under inadequate lighting, it usually suffers from several image degradation factors, such as pixel distortion, geometric distortion, noise and blur. These factors can be considered as a kind of image attack. Sometimes the quality of the attacked QR code image degrades significantly. Figure 6(a) shows the beautified QR code $Q_{U_1}$ of user $U_1$ in Table 3 after the print-and-scan attack. The beautified QR code was printed at 600 dpi with an HP LaserJet 500 color M551 printer and scanned at 200 dpi with an HP LaserJet M2727nf scanner. Figure 6(b) shows $Q_{U_1}$ after Gaussian blur with the parameter $\sigma = 5$. Figure 6(c) shows $Q_{U_1}$ after Gaussian noise with parameters $M = 0$, $V = 0.05$.

The authentication information embedded in these attacked beautified QR codes could still be decoded by any standard QR code reader. This demonstrates that the beautified QR code tolerates common attacks and that the one-code-pass user authentication protocol is usable in real-world applications.

6 Conclusions

In this paper, we proposed a novel "one-code-pass" user authentication protocol based on QR code and secret sharing such that an authorized user can use only one QR code to access various services from all departments within an organization. In the proposed protocol, a secret is divided into many shares in the form of QR codes held among different departments and users. Each user can register at any department and acquire his/her
share including authentication information from the department. When a user wants to access services of any of the departments, the user is authenticated by recovering the original secret through the cooperation of both the department's share and the user's share. Theoretical analyses and experimental simulations show that the proposed protocol can efficiently protect the user's privacy and resist various typical attacks, such as the impersonation attack, the collusion attack and the replay attack.

Table 4: Decoding rate in different applications

Mobile Phone             Applications (Developer)                                  Decoding Rate
iPhone 7 (iOS 10.3.2)    WoChaCha QR code (WoChaCha Information Technology)        100%
                         Quick scan (iHandy Inc.)                                  100%
                         Quick QR code reader & creator (Fellow Software)          100%
                         QR code kit (Sima Biswas)                                 100%
                         QuickMark (SimpleAct Inc.)                                100%
XiaoMi 4 (Android 7.0)   WoChaCha QR code (WoChaCha Information Technology)        100%
                         QR code extreme (FancyApp)                                100%
                         QR code reader (Scan.me)                                  100%
                         Free QR code scanner (TWMobile)                           100%
                         QuickMark (SimpleAct Inc.)                                100%

Figure 6: Results of the beautified QR code of user $U_1$ in Table 3 after image degradation processes. (a) After the print-and-scan attack; (b) after Gaussian blurring with $\sigma = 5$; (c) after adding Gaussian noise with parameters $M = 0$, $V = 0.05$

References

[1] AT&T Face Database, accessed 31 Oct. 2017. (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html)
[2] Yale Face Database, accessed 31 Oct. 2017. (http://cvc.yale.edu/projects/yalefaces/yalefaces.html)
[3] G. R. Blakley, "Safeguarding cryptographic keys," in Proceedings of the National Computer Conference, vol. 48, pp. 313–317, 1979.
[4] Denso-Wave Inc., QR code standardization, accessed 31 Oct. 2017. (http://www.qrcode.com/en/index.html)
[5] P. C. Huang, C. C. Chang, Y. H. Li, and Y. Liu, "Efficient access control system based on aesthetic QR code," Personal and Ubiquitous Computing, vol. 22, no. 1, pp. 81–91, 2018.
[6] Q. Jiang, J. Ma, G. Li, and X. Li, "Improvement of robust smart-card-based password authentication scheme," International Journal of Communication Systems, vol. 28, no. 2, pp. 383–393, 2015.
[7] S. Kumari, S. A. Chaudhry, F. Wu, X. Li, M. S. Farash, and M. K. Khan, "An improved smart card based authentication scheme for session initiation protocol," Peer-to-Peer Networking and Applications, vol. 10, no. 1, pp. 92–105, 2017.
[8] L. Lamport, "Password authentication with insecure communication," Communications of the ACM, vol. 24, no. 11, pp. 770–772, 1981.
[9] C. T. Li, "A new password authentication and user anonymity scheme based on elliptic curve cryptography and smart card," IET Information Security, vol. 7, no. 1, pp. 3–10, 2013.
[10] L. Li, J. Qiu, J. Lu, and C. C. Chang, "An aesthetic QR code solution based on error correction mechanism," Journal of Systems and Software, vol. 116, pp. 85–94, 2016.
[11] P. Y. Lin, "Distributed secret sharing approach with cheater prevention based on QR code," IEEE Transactions on Industrial Informatics, vol. 12, no. 1, pp. 384–392, 2016.
[12] S. S. Lin, M. C. Hu, C. H. Lee, and T. Y. Lee, "Efficient QR code beautification with high quality visual content," IEEE Transactions on Multimedia, vol. 17, no. 9, pp. 1515–1524, 2015.
[13] Y. Liu and C. C. Chang, "A one-card-pass user authentication scheme using image morphing," Multimedia Tools and Applications, vol. 76, no. 20, pp. 21247–21264, 2017.
[14] Q. Mao, K. Bharanitharan, and C. C. Chang, "Edge directed automatic control point selection algorithm for image morphing," IETE Technical Review, vol. 30, no. 4, pp. 343–243, 2013.
[15] Q. Mao, K. Bharanitharan, and C. C. Chang, "A proxy user authentication protocol using source-based image morphing," The Computer Journal, vol. 58, no. 7, pp. 1573–1584, 2014.
[16] V. Odelu, A. K. Das, and A. Goswami, "A secure biometrics-based multi-server authentication protocol using smart cards," IEEE Transactions on Information Forensics and Security, vol. 10, no. 9, pp. 1953–1966, 2015.
[17] A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612–613, 1979.
[18] H. M. Sun, Y. H. Chen, and Y. H. Lin, "oPass: A user authentication protocol resistant to password stealing and password reuse attacks," IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 651–663, 2012.
[19] I. Tkachenko, W. Puech, C. Destruel, O. Strauss, J. M. Gaudin, and C. Guichard, "Two-level QR code for private message sharing and document authentication," IEEE Transactions on Information Forensics and Security, vol. 11, no. 3, pp. 571–583, 2016.
[20] J. Yu, G. Wang, Y. Mu, and W. Gao, "An efficient generic framework for three-factor authentication with provably secure instantiation," IEEE Transactions on Information Forensics and Security, vol. 9, no. 12, pp. 2302–2313, 2014.
[21] L. Zhang, Y. Zhang, S. Tang, and H. Luo, "Privacy protection for e-health systems by means of dynamic authentication and three-factor key agreement," IEEE Transactions on Industrial Electronics, vol. 65, no. 3, pp. 2795–2805, 2018.
[22] Q. Zhao and C. H. Hsieh, "Card user authentication based on generalized image morphing," in 2011 3rd International Conference on Awareness Science and Technology (iCAST'11), pp. 117–122, Sep. 2011.

Biography

Yanjun Liu received her Ph.D. degree in 2010 from the School of Computer Science and Technology, University of Science and Technology of China (USTC), Hefei, China. She has been an assistant professor at Anhui University in China since 2010. She currently serves as a senior research fellow at Feng Chia University in Taiwan. Her specialties include E-Business security and electronic imaging techniques.

Chin-Chen Chang is a professor at Feng Chia University. He received the B.S. degree in Applied Mathematics in 1977 and the M.S. degree in Computer and Decision Sciences in 1979, both from the National Tsing Hua University, Taiwan. He received the Ph.D. degree in Computer Engineering in 1982 from the National Chiao Tung University, Taiwan. He is the author of more than 900 journal papers and has written 36 book chapters. His research interests include computer cryptography, data engineering, and image compression.

Peng-Cheng Huang is a lecturer at the Xiamen University of Technology. He received his B.S. degree from Xiamen University of Technology in 2007 and the M.S. degree in Computer Architecture from Fuzhou University in 2010. He is currently pursuing the Ph.D. degree at Feng Chia University. His current research interests include multimedia security, image processing, and the Internet of Things.

International Journal of Network Security, Vol.22, No.5, PP.763-774, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).06)
Efficient Anonymous Ciphertext-Policy Attribute-based Encryption for General Structures Supporting Leakage-Resilience
Xiaoxu Gao and Leyou Zhang (Corresponding author: Xiaoxu Gao)
School of Mathematics and Statistics, Xidian University Xi’an, Shaanxi 710071, China (Email: gxx [email protected]) (Received Mar. 5, 2019; Revised and Accepted Nov. 6, 2019; First Online Jan. 23, 2020)
Abstract

Traditional cryptographic schemes cannot guarantee data security and user privacy under side-channel attacks. Additionally, most existing leakage-resilient schemes cannot protect the privacy of the receivers. To achieve both leakage resilience and privacy preservation, two anonymous attribute-based encryption (ABE) schemes for general access structures are proposed. In the first scheme, the access structure is encoded as minimal sets, which lowers the cost of the decryption algorithm. Then we show how to obtain an anonymous leakage-resilient ABE for non-monotone access structures. Both schemes can tolerate continual leakage: an update algorithm is employed when the leaked information would exceed the allowable leakage bound. They are proven to be adaptively secure in the standard model under four static assumptions over composite order bilinear groups. The performance analyses confirm the efficiency of our schemes.

Keywords: Anonymous; Attribute-based Encryption; Ciphertext-Policy; Leakage-Resilience

1 Introduction

In traditional encryption schemes, security is based on the idealized assumption that the adversary cannot get any information about the private keys and internal state. However, practice shows that this assumption often fails. Many cryptographic schemes are vulnerable to side-channel attacks, where an adversary learns meaningful information about a system from physical information that the running algorithm emits, such as running time, power consumption, and fault detection. In order to better characterize the leaked information known to the adversary, various leakage models have been presented. Some of them are motivated by practical issues, while others serve theoretical needs.

• The only computation leaks information model was proposed by Micali [14]. It requires that leakage only occurs in the part of memory involved in the calculation being executed, and that the part of memory not participating in the calculation does not leak. The reason given in [14] is that “data can be placed in some form of storage where, when not being accessed and computed upon, it is totally secure.”

• The relative leakage model (also named the memory-attacks model) was proposed by Alwen et al. [1] to deal with cold boot attacks, where the part of memory not involved in the operation also leaks information. In this model, the leakage amount is bounded by a predetermined value.

• The bounded retrieval model is stronger than the relative leakage model [2, 21, 25, 27, 29]. In this model, the leakage parameter $l$ is an arbitrary and independent parameter of the system, and secret keys can be enlarged to allow $l$ bits of leakage without affecting the size of the public keys.

• The continuous leakage model [30] was put forward to handle the situation in which the leaked bits exceed the predetermined number: the leakage is unbounded over the lifecycle of the system, but it is bounded between consecutive updates.

• The auxiliary input model was presented by Dodis [8]. It requires that a polynomial time adversary can recover $sk$ from $f(sk)$ only with negligible probability. Meanwhile, Yuen et al. [22] proposed a model that combines the concepts of auxiliary inputs and continual memory leakage. The scheme [28] also comes from this model.

However, the ABE schemes constructed in the above leakage models cannot achieve anonymity, except [25, 27]. Additionally, the number of leakage bits in [25] is bounded, and [27] is inefficient because its decryption time depends not only on the leakage parameter but also on the number of attributes. Hence, it is natural to ask whether there is an efficient anonymous leakage-resilient ciphertext-policy attribute-based encryption scheme that is resilient to continual leakage.

1.1 Related Work

ABE [18] is regarded as a highly promising public key primitive for realizing scalable and fine-grained access control systems. Goyal et al. [9] formulated the idea of ABE and presented key-policy ABE (KP-ABE). Then Bethencourt et al. [4] put forward ciphertext-policy ABE (CP-ABE). In KP-ABE, the private keys are associated with access policies and ciphertexts are labeled with sets of attributes, while in CP-ABE, ciphertexts are associated with access policies and private keys are labeled with sets of attributes. ABE has become a research hotspot because it implements one-to-many encryption. Following this trend, many schemes have been proposed [5, 7, 13, 26].

Schemes [4, 9] adopted the access tree, which is a monotone access structure. Ostrovsky et al. [16] proposed a KP-ABE scheme that handles non-monotone access structures over attributes, where the access structure can be a boolean formula involving AND, OR, NOT, and threshold operations. Lewko et al. [12] first put forward a CP-ABE scheme and a KP-ABE scheme whose access structures are monotone span programs (MSP). Both schemes are proven to be fully secure under Decisional Subgroup assumptions in the standard model over composite order bilinear groups. Subsequently, Okamoto and Takashima [15] brought forward a KP-ABE and a CP-ABE for non-monotone access structures, which were shown to be fully secure under a standard assumption, the Decisional Linear (DLIN) assumption, in the standard model over prime order bilinear groups. Then Waters [19] proposed three efficient selectively secure CP-ABE constructions that employ MSP to express the access structure, in the standard model under the Decisional Parallel Bilinear Diffie-Hellman Exponent (PBDHE) assumption. Lately, Attrapadung et al. [3] introduced a KP-ABE scheme for non-monotone access structures with constant size ciphertexts, which is selectively secure under the Decisional q-parallel Bilinear Diffie-Hellman Exponent (q-DBDHE) assumption in the standard model over prime order bilinear groups. This scheme also adopted the method of Ostrovsky et al. [16] to convert non-monotone access structures into monotone access structures with negative attributes. Based on the fact that, for some monotone access structures, the size of the MSP is at least the number of minimal sets while the number of minimal sets is constant, Pandit and Barua [17] put forward an ABE scheme that uses minimal sets to describe general access structures. They also constructed a corresponding hierarchical (H)KP-ABE scheme. All of these schemes achieve full security in the standard model over composite order bilinear groups.

Though ABE can be directly applied to design secure access control, there is an increasing need to protect users' privacy in access control systems. In order to address this problem, the concept of anonymous ABE was introduced in schemes [10, 11]. For more related work, see [23, 24, 27]. In anonymous CP-ABE, ciphertexts do not reveal the attributes used in the access policies. A user can decrypt only if his/her attribute set satisfies the access structure embedded in the ciphertexts; a user whose attributes do not satisfy it can neither decrypt nor guess what access policy was specified by the data owner. To the best of our knowledge, there is no efficient anonymous CP-ABE that achieves constant size ciphertexts and adaptive leakage-resilient security in the standard model.

1.2 Our Contributions

Based on the works of [30] and [6], we put forward efficient anonymous leakage-resilient CP-ABE schemes for general access structures, which have a better leakage rate and are adaptively secure under four static assumptions in the standard model over composite order bilinear groups. In addition, the proposed schemes are built on the relative leakage model and implicitly use an update algorithm to tolerate continual leakage on the private keys. In the security proof, we use dual system encryption [20], where we extend the semi-functional keys into two types: truly semi-functional and nominally semi-functional. Normal keys and nominally semi-functional keys can decrypt normal ciphertexts and semi-functional ciphertexts, but truly semi-functional keys cannot decrypt the challenge semi-functional ciphertexts. The method to prove the indistinguishability of nominally semi-functional and truly semi-functional keys is similar to [30], so we omit it in this paper. The access structure used in the first construction is built from minimal sets with multi-valued attributes, which enables fast decryption.

1.3 Organizations

The rest of the paper is organized as follows. In Section 2, some preliminaries are given. Section 3 gives the definition of leakage resilience of ABE. The security definition is presented in Section 4. The construction of the scheme and its anonymity, performance, and efficiency analyses are given in Section 5. The security proof is introduced in Section 6. In Section 7, we give an anonymous leakage-resilient CP-ABE scheme for non-monotone access structures. Finally, we conclude this paper in Section 8.

2 Preliminaries

2.1 Notations

1) Angle brackets $\langle\cdot,\cdot\rangle$ denote the inner product of two vectors, and parentheses $(\cdot,\cdot,\cdot)$ denote vectors. The dot
product of vectors is denoted by ‘$\cdot$’ and component-wise multiplication is denoted by ‘$*$’.

2) The fact that $\chi$ is picked uniformly at random from a finite set $\Omega$ is denoted by $\chi \xleftarrow{\$} \Omega$, and the fact that $\psi, \omega, \zeta$ are all picked independently and uniformly at random from $\Omega$ is denoted by $\psi, \omega, \zeta \xleftarrow{\$} \Omega$.

3) Let $\vec\rho = (\rho_1, \rho_2, \ldots, \rho_n)$ and $\vec\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$. $g^{\vec\rho}$ denotes the vector of group elements $g^{\vec\rho} = (g^{\rho_1}, g^{\rho_2}, \ldots, g^{\rho_n})$, the inner product of the vectors $\vec\rho$ and $\vec\sigma$ is denoted by $\langle\vec\rho,\vec\sigma\rangle$, and the bilinear group inner product is denoted by $\hat e_n(g^{\vec\rho}, g^{\vec\sigma})$, i.e., $\langle\vec\rho,\vec\sigma\rangle = \sum_{i\in[n]} \rho_i\sigma_i$ and $\hat e_n(g^{\vec\rho}, g^{\vec\sigma}) = \prod_{i\in[n]} \hat e(g^{\rho_i}, g^{\sigma_i}) = \hat e(g,g)^{\langle\vec\rho,\vec\sigma\rangle}$.

4) A negligible function of $\lambda$ is denoted by $negl(\lambda)$.

5) $\{1, 2, \ldots, n\}$ is denoted by $[n]$.

2.2 Minimal Set and Its Critical Set

Definition 1. Let $\Gamma$ be a monotonic access structure over the set of attributes $V = \{a_1, a_2, \ldots, a_n\}$. If $B \in \Gamma$ and no $A \in \Gamma\setminus\{B\}$ satisfies $A \subset B$, then $B$ is called a minimal authorized set. The collection of all minimal sets in $\Gamma$ is called the basis of $\Gamma$.

Definition 2. (Dual of access structure) The dual $\Gamma^{\perp}$ of an access structure $\Gamma$ over $V$ is the collection of sets $A \subset V$ such that $V\setminus A = A^c \notin \Gamma$.

Definition 3. (Critical set of minimal sets) Let $\mathbb{B} = \{Y_1, Y_2, \ldots, Y_r\}$ be the set of minimal sets of an access structure $\Gamma$, and let $H \subseteq \mathbb{B}$ be a subset of minimal sets. Suppose every $Y_i \in H$ contains a set $B_i \subset Y_i$ with $|B_i| \ge 2$ such that:

(I) The set $B_i$ uniquely determines $Y_i$ in the set $H$, i.e., no other set in $H$ contains $B_i$.

(II) For every $Z \subset B_i$, the set $S_Z = \bigcup_{Y_j\in H,\, Y_j\cap Z\neq\emptyset} (Y_j\setminus Z)$ does not contain any element of $\mathbb{B}$.

If (I) and (II) hold, then $H$ is called a critical set of minimal sets for $\mathbb{B}$.

2.3 Complexity Assumptions

Assumption 1. Pick $g_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $T_1 \xleftarrow{\$} \mathbb{G}_{p_1p_4}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1p_2p_4}$ and set $E = (\mathbb{G}, g_1, g_3, g_4)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 1 to be

$$Adv^1_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E,T_1) = 1] - \Pr[\mathcal{A}(E,T_2) = 1]| \tag{1}$$

We say that Assumption 1 holds if for all PPT algorithms $\mathcal{A}$, $Adv^1_{\mathcal{A}}(\lambda) \le negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 2. Pick $g_1, U_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $U_2, W_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3, W_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $T_1 \xleftarrow{\$} \mathbb{G}_{p_1p_2p_3}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1p_3}$ and set $E = (\mathbb{G}, g_1, g_3, g_4, U_1U_2, W_2W_3)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 2 to be

$$Adv^2_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E,T_1) = 1] - \Pr[\mathcal{A}(E,T_2) = 1]| \tag{2}$$

We say that Assumption 2 holds if for all PPT algorithms $\mathcal{A}$, $Adv^2_{\mathcal{A}}(\lambda) \le negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 3. Pick $\alpha, s, r \xleftarrow{\$} \mathbb{Z}_N$, $g_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $U_2, W_2, g_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $T_1 = \hat e(g_1,g_1)^{\alpha s}$, $T_2 \xleftarrow{\$} \mathbb{G}_T$, and set $E = (\mathbb{G}, g_1, g_2, g_3, g_4, g_2^{r}, U_2^{r}, g_1^{\alpha}U_2, g_1^{s}W_2)$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 3 to be

$$Adv^3_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E,T_1) = 1] - \Pr[\mathcal{A}(E,T_2) = 1]| \tag{3}$$

We say that Assumption 3 holds if for all PPT algorithms $\mathcal{A}$, $Adv^3_{\mathcal{A}}(\lambda) \le negl(\lambda)$ holds for the security parameter $\lambda$.

Assumption 4. Pick $s, \hat r, \hat s \xleftarrow{\$} \mathbb{Z}_N$, $g_1, U_1 \xleftarrow{\$} \mathbb{G}_{p_1}$, $g_4, U_4 \xleftarrow{\$} \mathbb{G}_{p_4}$, $U_2, W_2, g_2 \xleftarrow{\$} \mathbb{G}_{p_2}$, $g_3 \xleftarrow{\$} \mathbb{G}_{p_3}$, $W_{24}, D_{24} \xleftarrow{\$} \mathbb{G}_{p_2p_4}$, $T_1 = U_1^{s}D_{24}$, $T_2 \xleftarrow{\$} \mathbb{G}_{p_1p_2p_4}$, and set $E = (\mathbb{G}, g_1, g_2, g_3, g_4, U_1U_4, U_1^{\hat r}U_2, g_1^{\hat r}W_2, g_1^{s}W_{24}, U_1g_3^{\hat s})$. Define the advantage of an algorithm $\mathcal{A}$ in breaking Assumption 4 to be

$$Adv^4_{\mathcal{A}}(\lambda) = |\Pr[\mathcal{A}(E,T_1) = 1] - \Pr[\mathcal{A}(E,T_2) = 1]| \tag{4}$$

We say that Assumption 4 holds if for all PPT algorithms $\mathcal{A}$, $Adv^4_{\mathcal{A}}(\lambda) \le negl(\lambda)$ holds for the security parameter $\lambda$.

3 Leakage Resilience of CP-ABE

A CP-ABE scheme in the continual leakage model is composed of the following five algorithms:

Setup$((\lambda, V, l) \to (PK, MSK))$: The setup algorithm takes a security parameter $\lambda$, a description of the attribute universe set $V$ and a leakage bound $l$ as input. It outputs the system public keys $PK$ and master secret keys $MSK$.

KeyGen$((PK, MSK, S) \to SK_S)$: The key generation algorithm takes the public keys $PK$, the master secret keys $MSK$ and an attribute set $S$ as input, and returns secret keys $SK_S$.

UpdateUSK$((PK, S, SK_S) \to SK'_S)$: On input the public keys $PK$, a set of attributes $S$ and the secret keys $SK_S$, it outputs re-randomized secret keys $SK'_S$.
Encrypt$((PK, M, \Gamma) \to CT_\Gamma)$: The encryption algorithm takes the public keys $PK$, a message $M$, and an access structure $\Gamma$ over the universe of attributes as input, and outputs ciphertexts $CT_\Gamma$ such that only users whose attribute sets satisfy the access structure $\Gamma$ are able to extract $M$.

Decrypt$((PK, CT_\Gamma, SK_S) \to M)$: The algorithm takes the public keys $PK$, ciphertexts $CT_\Gamma$ and secret keys $SK_S$ as input, and outputs the message $M$ if and only if the attribute set $S$ of the key $SK_S$ satisfies the access structure $\Gamma$.

4 Security Definition

For key leakage attacks, we use a game between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$ to define the security of an anonymous leakage-resilient ciphertext-policy attribute-based encryption scheme. The security parameter and the upper bound of leakage are denoted by $\lambda$ and $l$ respectively.

Setup: The challenger $\mathcal{C}$ runs the setup algorithm to generate the public keys and master keys $(PK, MSK)$, sends the public keys to the adversary $\mathcal{A}$ and keeps the master secret keys. At the same time, $\mathcal{C}$ creates two initially empty lists, $\mathcal{Q} = (hd, S, SK_S, L_{SK})$ and $\mathcal{R} = (hd, S)$, to store records, where all records are associated with a handle $hd$ and $L_{SK}$ is the total number of leaked bits.

Phase 1: In this stage, $\mathcal{A}$ can adaptively perform the following queries:

• Key Generation queries: $\mathcal{A}$ submits an attribute set $S$ to $\mathcal{C}$, and $\mathcal{C}$ runs the key generation algorithm to generate the private keys $SK_S$. The challenger sets $hd = hd + 1$, then adds $(hd, S, SK_S, 0)$ to the set $\mathcal{Q}$. In this query, $\mathcal{C}$ only gives $\mathcal{A}$ the handle $hd$ of the generated keys rather than the keys themselves.

• Leakage queries: $\mathcal{A}$ gives $\mathcal{C}$ an arbitrary polynomial-time computable function $f : \{0,1\}^* \to \{0,1\}^*$ together with the handle $hd$ of the queried keys. $\mathcal{C}$ finds the tuple $(hd, S, SK_S, L_{SK})$ and checks whether $L_{SK} + |f(SK)| \le l$ holds. If so, it returns $f(SK)$ to $\mathcal{A}$ and updates $L_{SK}$ to $L_{SK} + |f(SK)|$. If the check fails, it returns $\perp$ to $\mathcal{A}$.

• Reveal queries: $\mathcal{A}$ gives the handle $hd$ of a specified key $SK_S$ to $\mathcal{C}$. $\mathcal{C}$ scans $\mathcal{Q}$ to find the requested entry and returns the private keys $SK_S$ to $\mathcal{A}$. Then $\mathcal{C}$ removes the item from the set $\mathcal{Q}$ and adds it to the set $\mathcal{R}$.

• Update queries: $\mathcal{A}$ issues a key update query for $SK_S$. $\mathcal{C}$ searches for the record in $\mathcal{Q}$. If it is not found, $\mathcal{C}$ generates the keys with the key generation algorithm and sets $L_{SK} = 0$. Otherwise, $\mathcal{C}$ updates the keys with the UpdateUSK algorithm and resets the corresponding $L_{SK} = 0$.

Challenge: $\mathcal{A}$ outputs two pairs of message and access structure $(M_0, \Gamma_0)$, $(M_1, \Gamma_1)$ to $\mathcal{C}$, where every $S \in \mathcal{R}$ satisfies neither $\Gamma_0$ nor $\Gamma_1$. With the restriction that the length of the message $M_0$ equals the length of the message $M_1$, $\mathcal{C}$ selects $b \in \{0, 1\}$ randomly, encrypts the message $M_b$ under the access structure $\Gamma_b$, and sends the resulting ciphertexts to $\mathcal{A}$.

Phase 2: This phase is the same as Phase 1 with the additional restrictions that only reveal queries and update queries can be performed (leakage queries cannot be executed), and that the attributes of each query do not satisfy the challenge access structure.

Guess: $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$ and wins the game if $b' = b$.

We say that an anonymous attribute-based encryption scheme is $l$ leakage-resilient and adaptively secure against chosen plaintext attacks (ANON-IND-CPA) if for all polynomial time adaptive adversaries $\mathcal{A}$, the advantage of $\mathcal{A}$ in the above game is negligible, where the advantage of $\mathcal{A}$ is defined as $Adv^{ANON\text{-}IND\text{-}CPA}_{\mathcal{A}}(\lambda, l) = |\Pr[b' = b] - \frac{1}{2}|$.

5 Anonymous Leakage-Resilient CP-ABE for MAS

5.1 Concrete Construction

Our scheme relies on a composite order bilinear group of order $N = p_1p_2p_3p_4$, where $p_1, p_2, p_3, p_4$ are distinct primes. The main system is built in the subgroup $\mathbb{G}_{p_1}$, while the subgroup $\mathbb{G}_{p_2}$ acts as the semi-functional space. The subgroup $\mathbb{G}_{p_3}$ provides additional randomness on keys to isolate keys in our hybrid games. The subgroup $\mathbb{G}_{p_4}$ makes the scheme achieve anonymity. We then extend the composite order group to multiple dimensions to tolerate the possible leakage.

Let $V = \{attr_1, attr_2, \ldots, attr_n\}$ be a set of attributes. Each attribute $attr_i$ has $n_i$ possible values, $v_{i,j}$ represents the $j$th value of $attr_i$, and $I \subset \{1, 2, \ldots, n\}$ is the attribute name index.

Setup$((\lambda, V, l) \to (PK, MSK))$: The setup algorithm takes as input a security parameter $\lambda$, the attribute universe description $V$ and a leakage upper bound $l$. Then the algorithm generates the public keys and master secret keys as follows. Run the bilinear group generator to produce $\Phi = (N = p_1p_2p_3p_4, \mathbb{G}, \mathbb{G}_T, \hat e)$. Define $negl = p_2^{-\tau}$ as the allowable maximum probability of succeeding in a leakage guess and compute $\omega = \lceil 1 + 2\tau + \frac{l}{\log p_2} \rceil$, where $\tau$ is a positive constant. In practice, $\omega \approx \lceil 1 + \frac{l}{\log p_2} \rceil$. Select $g_1, X_1 \in \mathbb{G}_{p_1}$, $g_3 \in \mathbb{G}_{p_3}$, $g_4, X_4 \in \mathbb{G}_{p_4}$, $\alpha \in \mathbb{Z}_N$, $\vec\rho \in \mathbb{Z}_N^{\omega}$ randomly. For each $i \in [n]$, $j \in [n_i]$, choose random values $t_{i,j} \in \mathbb{Z}_N$ and set the public keys as follows.

$$PK = (N, g_1, g_4, g_1^{\vec\rho}, y, Y, T_{i,j};\ \forall i \in [n], j \in [n_i]),$$
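where

$$y = \hat e(g_1, g_1)^{\alpha},\quad Y = X_1X_4,\quad T_{i,j} = g_1^{t_{i,j}}.$$

The master secret keys are

$$MSK = (X_1, g_3, \alpha).$$

KeyGen$((PK, MSK, S) \to SK_S)$: This algorithm takes $PK$, $MSK$ and an attribute set $S = \{v_{1,x_1}, v_{2,x_2}, \ldots, v_{n',x_{n'}}\}$ as input, where $n' \le n$ and $1 \le x_i \le n_i$ for each $1 \le i \le n'$. Then the algorithm chooses random values $t, y_2, y_3 \in \mathbb{Z}_N$, $\vec y_1, \vec\sigma \in \mathbb{Z}_N^{\omega}$, and $y_{i,j} \in \mathbb{Z}_N$ for $v_{i,j} \in S$, and outputs the secret keys as follows.

$$SK_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S),$$

in which

$$\vec K_1 = g_1^{\vec\sigma} * g_3^{\vec y_1},\quad K_2 = g_1^{\alpha+\langle\vec\rho,\vec\sigma\rangle} X_1^{t} g_3^{y_2},\quad K_3 = g_1^{t} g_3^{y_3},\quad K_{i,j} = T_{i,j}^{t} g_3^{y_{i,j}}.$$

UpdateUSK$((PK, S, SK_S) \to SK'_S)$: The key update algorithm selects $\Delta t, \Delta y_2, \Delta y_3 \in \mathbb{Z}_N$, $\Delta\vec y_1, \Delta\vec\sigma \in \mathbb{Z}_N^{\omega}$ randomly, picks $\Delta y_{i,j} \in \mathbb{Z}_N$ for $v_{i,j} \in S$ at random, and outputs a new key $SK'_S$:

$$SK'_S = (S, \vec K'_1, K'_2, K'_3, K'_{i,j};\ \forall v_{i,j} \in S),$$

where

$$\vec K'_1 = \vec K_1 * g_1^{\Delta\vec\sigma} * g_3^{\Delta\vec y_1},\quad K'_2 = K_2\, g_1^{\langle\vec\rho,\Delta\vec\sigma\rangle} X_1^{\Delta t} g_3^{\Delta y_2},\quad K'_3 = K_3\, g_1^{\Delta t} g_3^{\Delta y_3},\quad K'_{i,j} = K_{i,j}\, T_{i,j}^{\Delta t} g_3^{\Delta y_{i,j}}.$$

Encrypt$((PK, M, \Gamma) \to CT_\Gamma)$: At first, this algorithm converts the monotonic access structure $\Gamma$ to the set of minimal sets $\mathbb{B} = \{B_1, B_2, \ldots, B_{\tilde m}\}$, where $B_k$ ($k \in [\tilde m]$) is a set of attribute values. It selects $s, s_1, s_2, \ldots, s_{\tilde m} \in \mathbb{Z}_N$, $\vec d \in \mathbb{Z}_N^{\omega}$, $W_k, V_k \in \mathbb{G}_{p_4}$ ($k \in [\tilde m]$) randomly, then outputs the resulting ciphertexts $CT_\Gamma$ and the index $I_{B_k} \subset \{1, 2, \ldots, n\}$ ($k \in [\tilde m]$) corresponding to the attribute set $B_k$ ($k \in [\tilde m]$).

$$CT_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]}, C_0, \vec C_1, C_2, \vec C_3, \vec C_4),$$

where

$$C_0 = M y^{s},\quad \vec C_1 = g_1^{s\vec\rho} * g_4^{\vec d},\quad C_2 = g_1^{s} g_4,$$

$$\vec C_3 = \{C_{3,k}\}_{k\in[\tilde m]} = \Big\{Y^{s}\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k\in[\tilde m]},\quad \vec C_4 = (C_{4,k})_{k\in[\tilde m]} = (g_1^{s_k} V_k)_{k\in[\tilde m]}.$$

Decrypt$((PK, CT_\Gamma, SK_S) \to M)$: If the attribute set $S$ satisfies the access structure specified by $\mathbb{B}$, then $S$ must be a superset of a minimal set in $\mathbb{B}$. Let $B_k \subset S$ for some $k \in [\tilde m]$; this algorithm calculates

$$M = \frac{C_0 \cdot \hat e_\omega(\vec C_1, \vec K_1)\, \hat e(C_{3,k}, K_3)}{\hat e(C_2, K_2)\, \hat e(C_{4,k}, \prod_{v_{i,j}\in B_k} K_{i,j})}.$$

5.2 Correctness

The correctness can be checked by applying the orthogonality of $\mathbb{G}_{p_i}$, where $i = 1, 2, 3, 4$. If the attribute set $S$ satisfies the access structure specified by $\mathbb{B}$, then the following equations hold.

$$\hat e_\omega(\vec C_1, \vec K_1) = \hat e_\omega(g_1^{s\vec\rho} * g_4^{\vec d},\ g_1^{\vec\sigma} * g_3^{\vec y_1}) = \hat e_\omega(g_1^{s\vec\rho}, g_1^{\vec\sigma}) = \hat e(g_1, g_1)^{s\langle\vec\rho,\vec\sigma\rangle}$$

$$\hat e(C_2, K_2) = \hat e(g_1^{s} g_4,\ g_1^{\alpha+\langle\vec\rho,\vec\sigma\rangle} X_1^{t} g_3^{y_2}) = \hat e(g_1, g_1)^{\alpha s + s\langle\vec\rho,\vec\sigma\rangle}\, \hat e(g_1, X_1)^{st}$$

$$\hat e(C_{3,k}, K_3) = \hat e\Big(Y^{s}\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k} W_k,\ g_1^{t} g_3^{y_3}\Big) = \hat e(g_1, Y)^{st}\, \hat e\Big(\prod_{v_{i,j}\in B_k} T_{i,j}, g_1\Big)^{s_k t}$$

$$\hat e\Big(C_{4,k}, \prod_{v_{i,j}\in B_k} K_{i,j}\Big) = \hat e\Big(g_1^{s_k} V_k,\ \prod_{v_{i,j}\in B_k} T_{i,j}^{t} g_3^{y_{i,j}}\Big) = \hat e\Big(g_1, \prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k t}$$

Since $\hat e(g_1, Y) = \hat e(g_1, X_1)$ by orthogonality and $C_0 = M\,\hat e(g_1,g_1)^{\alpha s}$, substituting these four values into the decryption formula cancels every factor except $M$.

5.3 Anonymity Analysis

This section shows that the proposed scheme achieves anonymity over the composite order bilinear group.

Compared with scheme [30], our scheme adds some random elements of $\mathbb{G}_{p_4}$ to each part of the ciphertexts. These random elements have no effect on the decryption process. However, they are necessary for the anonymity of the scheme: without such elements, for some minimal set $B^*_k$, the adversary may determine whether the ciphertext component $C_{3,k}$ of the ciphertext $\vec C_3$ is encrypted under $B^*_k$ or not. In our scheme, consider an adversary that uses the DDH-test $\hat e(C_{3,k}, g_1) \overset{?}{=} \hat e(Y, C_2)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k})$ to determine whether the ciphertext component $C_{3,k}$ is encrypted under $B^*_k$ or not. This DDH-test is the same as

$$\frac{\hat e(C_{3,k}, g_1)}{\hat e(Y, C_2)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k})} \overset{?}{=} 1.$$

The detailed analysis is as follows.

$$\hat e(C_{3,k}, g_1) = \hat e\Big(Y^{s}\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k} W_k,\ g_1\Big) = \hat e(Y^{s}, g_1)\, \hat e\Big(\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k}, g_1\Big) = \hat e(Y, g_1)^{s}\, \hat e\Big(\prod_{v_{i,j}\in B_k} T_{i,j}, g_1\Big)^{s_k}$$

$$\hat e(Y, C_2) = \hat e(Y, g_1)^{s}\, \hat e(Y, g_4)$$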
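The correctness equations of Section 5.2 and the pairing expansions above all rest on the orthogonality of the four subgroups. As an illustration only, the following Python sketch checks this in a toy "exponent" model: every element of the group of order $N$ is stored as its discrete logarithm modulo $N$, the subgroup of order $p_i$ consists of the multiples of $N/p_i$, and the pairing multiplies exponents. This model is an assumption made for the sketch and is not part of the scheme; it has no security whatsoever (discrete logarithms are in the clear). The prime sizes, the attribute labels `v11`, `v21`, `v31` and the helper names `sub`, `e`, `demo` are hypothetical.

```python
import random

# Toy primes p1..p4: far too small for any security, chosen only for illustration.
P = (1009, 1013, 1019, 1021)
N = P[0] * P[1] * P[2] * P[3]


def sub(i, rng):
    """Random non-identity element of G_{p_i}, stored as its exponent mod N."""
    p = P[i - 1]
    return (N // p) * rng.randrange(1, p) % N


def e(a, b):
    """Toy pairing e(g^a, g^b) = gT^(a*b); G_T elements are exponents mod N."""
    return a * b % N


def demo(seed=0, omega=3):
    rng = random.Random(seed)
    rz = lambda: rng.randrange(N)

    # Setup: group multiplication is addition of exponents in this model.
    g1, X1, g3 = sub(1, rng), sub(1, rng), sub(3, rng)
    g4, X4 = sub(4, rng), sub(4, rng)
    alpha, rho = rz(), [rz() for _ in range(omega)]
    attrs = ["v11", "v21", "v31"]              # hypothetical attribute values
    T = {v: g1 * rz() % N for v in attrs}      # T_{i,j} = g1^{t_{i,j}}
    y, Y = alpha * e(g1, g1) % N, (X1 + X4) % N

    # KeyGen for the attribute set S = attrs.
    t, y2, y3 = rz(), rz(), rz()
    sigma = [rz() for _ in range(omega)]
    K1 = [(g1 * s_i + g3 * rz()) % N for s_i in sigma]
    inner = sum(r * s_i for r, s_i in zip(rho, sigma))
    K2 = (g1 * (alpha + inner) + X1 * t + g3 * y2) % N
    K3 = (g1 * t + g3 * y3) % N
    K = {v: (T[v] * t + g3 * rz()) % N for v in attrs}

    # Encrypt M under a single minimal set B_k contained in S.
    Bk, M = ["v11", "v21"], rz()
    s, sk = rz(), rz()
    C0 = (M + s * y) % N
    C1 = [(g1 * s * r + g4 * rz()) % N for r in rho]
    C2 = (g1 * s + g4) % N
    C3 = (Y * s + sum(T[v] for v in Bk) * sk + sub(4, rng)) % N
    C4 = (g1 * sk + sub(4, rng)) % N

    # Decrypt: products in G_T become sums, the quotient becomes a difference.
    num = C0 + sum(e(c, k) for c, k in zip(C1, K1)) + e(C3, K3)
    den = e(C2, K2) + e(C4, sum(K[v] for v in Bk))
    return M, (num - den) % N
```

Calling `demo()` returns the encrypted message together with the value recovered by the decryption formula; in this model the two agree because every cross-subgroup pairing contributes a multiple of $N$ to the exponent.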
$$\hat e\Big(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k}\Big) = \hat e\Big(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*},\ g_1^{s_k} V_k\Big) = \hat e\Big(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*},\ g_1^{s_k}\Big) = \hat e\Big(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*},\ g_1\Big)^{s_k}$$

$$\frac{\hat e(C_{3,k}, g_1)}{\hat e(Y, C_2)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k})} = \frac{\hat e(\prod_{v_{i,j}\in B_k} T_{i,j}, g_1)^{s_k}}{\hat e(Y, g_4)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, g_1)^{s_k}}$$

If $B^*_k = B_k$, then $v_{i,j} = v_{i,j^*}$ for all $i$, $1 \le i \le n' \le n$. Therefore

$$\frac{\hat e(C_{3,k}, g_1)}{\hat e(Y, C_2)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k})} = \frac{1}{\hat e(Y, g_4)}.$$

If $B^*_k \ne B_k$, then there exists at least one $k'$, where $1 \le k' \le n' \le n$, such that $v_{k',j} \ne v_{k',j^*}$. Without loss of generality, let $v_{i,j} = v_{i,j^*}$ for all $1 \le i \le n' \le n$ except $i = k'$. Then $t_{i,j} = t_{i,j^*}$ for these $i$. Therefore

$$\frac{\hat e(C_{3,k}, g_1)}{\hat e(Y, C_2)\, \hat e(\prod_{v_{i,j^*}\in B^*_k} T_{i,j^*}, C_{4,k})} = \frac{\hat e(T_{k',j}, g_1)^{s_k}}{\hat e(Y, g_4)\, \hat e(T_{k',j^*}, g_1)^{s_k}}.$$

In both cases, $B^*_k = B_k$ and $B^*_k \ne B_k$, the DDH-test gives a random element of $\mathbb{G}_T$, so the adversary is not able to determine whether the component $C_{3,k}$ of the ciphertext $\vec C_3$ is encrypted under $B^*_k$ or not. So the access structure is hidden, and our scheme is anonymous.

5.4 Performance Analysis

As shown in Table 1, we compare schemes [21, 27, 30] and the proposed scheme in terms of access policy, leakage model, multi-show support and anonymity. All these schemes are constructed under the key leakage model. The access structure of [21, 27] is expressed by a linear secret sharing scheme (LSSS), while those of [30] and the proposed scheme are represented by minimal sets. In addition, [21, 30] do not support anonymity and schemes [21, 27] do not support the multi-show functionality. In contrast, our scheme achieves anonymity and attribute multi-show ability simultaneously.

5.5 Efficiency Analysis

We present the performance evaluation based on our DMA implementation prototype. Our experiment is implemented on the Pairing-Based Cryptography (PBC) library. We compare the computational efficiency of our scheme with schemes [21, 27, 30]. In Figures 1, 2 and 3, the leakage parameter is set to $\omega = 5$.

Figure 1 shows the comparison of key generation time with different numbers of attributes, where the number of attributes changes from 10 to 50.

Figure 2 presents the comparison of update time with different numbers of attributes.

Figure 3 provides the comparison of encryption time with different numbers of minimal sets, where the number of minimal sets varies from 5 to 25.

Figure 4 gives the comparison of decryption time with different leakage parameters, where the leakage parameter changes from 5 to 25.

From the analysis of the experimental results, we can see that our scheme has advantages over [27] when the number of attributes is between 10 and 20, and that the proposed scheme spends less time than [21, 30]. Moreover, the proposed scheme has an obvious advantage over [27] in decryption. In summary, our scheme is quite practical and efficient.

[Figure 1: KeyGen time with different number of attributes. x-axis: size of attribute set; y-axis: KeyGen time (ms); curves: [30], [21], [27], Ours.]

[Figure 2: UpdateUSK time with different number of attributes. x-axis: size of attribute set; y-axis: UpdateUSK time (ms); curves: [30], [21], Ours.]

6 Security Proof

In the proof, we first generate normal private keys and ciphertexts, which are used in the real scheme. Then we generate semi-functional keys and ciphertexts, which are used only in the proofs. They are defined as follows.

• KeyGenSF. Let $SK_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S)$ be the normal keys. The semi-functional keys are as follows.
Table 1: Performance analysis

Scheme   Access policy   Leakage model       Multi-show attr   Anonymity
[21]     LSSS            Continual leakage   No                No
[30]     Minimal sets    Continual leakage   Yes               No
[27]     LSSS            Bounded leakage     No                Yes
Ours     Minimal sets    Continual leakage   Yes               Yes

[Figure 3: Encryption time with different number of minimal sets. x-axis: the number of minimal sets; y-axis: Encryption time (ms); curves: [30], Ours.]

[Figure 4: Decryption time with different number of attributes. x-axis: size of attribute set; y-axis: Decryption time (ms); curves: [30], [27], Ours.]

– Type 1: $\overline{SK}_S = (S, \vec K_1 * g_2^{\vec d_1}, K_2 g_2^{d_2}, K_3 g_2^{d_3}, K_{i,j} g_2^{d_{i,j}};\ \forall v_{i,j} \in S)$, where $g_2$ is a generator of the group $\mathbb{G}_{p_2}$ and $\vec d_1, d_2, d_3, d_{i,j}$ are random elements in $\mathbb{Z}_N$.

– Type 2: $\overline{SK}_S = (S, \vec K_1, K_2 g_2^{d_2}, K_3, K_{i,j};\ \forall v_{i,j} \in S)$.

• EncSF. Let $CT_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]}, C_0, \vec C_1, C_2, \vec C_3, \vec C_4)$ be a normal ciphertext. The semi-functional ciphertexts are converted as $\overline{CT}_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]}, C_0, \vec C_1 * g_2^{\vec e_1}, C_2 g_2^{e_2}, \vec C_3 * g_2^{\vec e_3}, \vec C_4)$, where $\vec e_1, e_2, \vec e_3$ are random elements in $\mathbb{Z}_N$.

If we use a Type 1 semi-functional key to decrypt a semi-functional ciphertext, we obtain the extra term $\hat e(g_2, g_2)^{\langle\vec d_1,\vec e_1\rangle - d_2e_2 + d_3e_{3,k}}$. If $\langle\vec d_1,\vec e_1\rangle - d_2e_2 + d_3e_{3,k} = 0$, we call it a nominally semi-functional key; otherwise it is truly semi-functional.

The proof uses a series of indistinguishable games to prove the indistinguishability between $Game_{real}$ and $Game_{final1}$. There are $2Q + 4$ games between an adversary $\mathcal{A}$ and a challenger $\mathcal{C}$, where $Q$ is the number of key queries and $k'$ ranges from 0 to $Q$. The concrete definition of the games is as follows.

$Game_{real}$: This is the real anonymous ABE security game, where all private keys and the ciphertexts are in normal form.

$Game_0$: All private keys are in normal form, and the challenge ciphertexts are in semi-functional form.

$Game_{k',1}$: The challenge ciphertexts are semi-functional. The first $k' - 1$ keys are semi-functional of Type 2, and the $k'$th key is semi-functional of Type 1. The remaining keys are normal.

$Game_{k',2}$: The challenge ciphertexts are semi-functional. The first $k'$ keys are semi-functional of Type 2, and the remaining keys are normal.

$Game_{final0}$: All private keys are semi-functional keys of Type 2, and the challenge ciphertexts are semi-functional where $C_0$ is random in the group $\mathbb{G}_T$.

$Game_{final1}$: It is the same as $Game_{final0}$ except that $\vec C_3$ is random in the group $\mathbb{G}_{p_1p_2p_4}$.

We can see that in $Game_{Q,2}$, all of the keys are semi-functional. In the last game, the adversary has no advantage.

Lemma 1. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{real}$ and $Game_0$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 1.

Proof. $\mathcal{B}$ receives $E = (\mathbb{G}, g_1, g_3, g_4)$ and $T$ from the challenger $\mathcal{C}$ and simulates $Game_{real}$ or $Game_0$ with $\mathcal{A}$ depending on whether $T \xleftarrow{\$} \mathbb{G}_{p_1p_4}$ or $T \xleftarrow{\$} \mathbb{G}_{p_1p_2p_4}$.
Setup: $\mathcal{B}$ takes the security parameter $\lambda$ and the leakage upper bound $l$ as input and outputs the description of the group: $\Phi = (N = p_1p_2p_3p_4, \mathbb{G}, \mathbb{G}_T, \hat e)$. Then $\mathcal{B}$ generates the public keys as follows. Set $\omega = \lceil 1 + \frac{l}{\log p_2} \rceil$. Select $\alpha, a, b, t_{i,j} \in \mathbb{Z}_N$ randomly, and set $Y = X_1X_4 = g_1^{a} g_4^{b}$. Choose $\vec\rho \in \mathbb{Z}_N^{\omega}$ randomly and generate $PK = (N, g_1, g_4, \hat e(g_1, g_1)^{\alpha}, Y, g_1^{\vec\rho}, T_{i,j} = g_1^{t_{i,j}};\ \forall i \in [n], j \in [n_i])$.

Phase 1: $\mathcal{B}$ generates normal keys for the attribute sets in the key generation queries (keep in mind that the private keys of the queries of $\mathcal{A}$ are either in normal form or in semi-functional form). In addition, $\mathcal{B}$ can answer the leakage, reveal and update queries.

Challenge: $\mathcal{A}$ sends two equal length messages $M_0, M_1$ and access structures $\Gamma_0, \Gamma_1$. $\mathcal{B}$ selects a random $b \in \{0, 1\}$ and encrypts $M_b$ under the access structure $\Gamma_b$. $\mathcal{B}$ encodes the access structure as the set of minimal sets $\mathbb{B}^* = \{B_1, B_2, \ldots, B_{\tilde m}\}$, where $B_k$ ($k \in [\tilde m]$) is a set of attribute values. It selects random elements $s_1, s_2, \ldots, s_{\tilde m} \in \mathbb{Z}_N$ and generates the challenge ciphertexts

$$\overline{CT}_\Gamma = \Big(\{I_{B_k}\}_{k\in[\tilde m]},\ C_0 = M_b\, \hat e(g_1^{\alpha}, T),\ \vec C_1 = T^{\vec\rho} * g_4^{\vec d},\ C_2 = T g_4,\ \vec C_3 = \Big\{T^{a}\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k\in[\tilde m]},\ \vec C_4 = (g_1^{s_k} V_k)_{k\in[\tilde m]}\Big).$$

Phase 2: It is similar to Phase 1. $\mathcal{B}$ can answer the reveal and update queries with the restriction that the attribute sets of the adversary's queries cannot satisfy the challenge access structure.

Guess: The adversary $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$. If $b' = b$, $\mathcal{A}$ wins the game.

If $T \xleftarrow{\$} \mathbb{G}_{p_1p_2p_4}$, write the $\mathbb{G}_{p_2}$ part of $T$ as $g_2^{c}$. This implicitly sets the semi-functional factor of the challenge ciphertexts as $(c\vec\rho, c, ac\vec 1, 0)$. In this situation, $\mathcal{B}$ simulates $Game_0$. Otherwise, $\mathcal{B}$ simulates $Game_{real}$.

So if $\mathcal{A}$ can distinguish the two games with non-negligible advantage $\epsilon$, $\mathcal{B}$ can use $\mathcal{A}$ to break Assumption 1 with the same advantage.

Lemma 2. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{k'-1,2}$ and $Game_{k',1}$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 2.

Proof. $\mathcal{B}$ receives $E = (\mathbb{G}, g_1, g_3, g_4, U_1U_2, W_2W_3)$ and $T$, and simulates $Game_{k'-1,2}$ or $Game_{k',1}$ with $\mathcal{A}$ depending on whether $T \xleftarrow{\$} \mathbb{G}_{p_1p_3}$ or $T \xleftarrow{\$} \mathbb{G}_{p_1p_2p_3}$.

Setup: It is the same as in Lemma 1.

Phase 1: Because $\mathcal{B}$ knows the master keys, it can answer all private key queries. Let $i'$ be the index of the current key query.

If $i' > k'$, $\mathcal{B}$ generates normal keys.

If $i' < k'$, $\mathcal{B}$ selects $t, h, y_2, y_3 \in \mathbb{Z}_N$ and $\vec\sigma, \vec y_1 \in \mathbb{Z}_N^{\omega}$ randomly, and picks $y_{i,j} \in \mathbb{Z}_N$ at random to generate semi-functional keys of Type 2:

$$\overline{SK}_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S) = (S,\ g_1^{\vec\sigma} * g_3^{\vec y_1},\ g_1^{\alpha+\langle\vec\rho,\vec\sigma\rangle} X_1^{t} (W_2W_3)^{h} g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}};\ \forall v_{i,j} \in S).$$

If $i' = k'$, $\mathcal{B}$ selects $\vec\sigma' \in \mathbb{Z}_N^{\omega}$ that satisfies $\langle\vec\sigma', \vec\rho\rangle = 0$ and outputs the private key as follows:

$$\overline{SK}_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S) = (S,\ T^{\vec\sigma'} * g_3^{\vec y_1},\ g_1^{\alpha} T^{a} g_3^{y_2},\ T g_3^{y_3},\ T^{t_{i,j}} g_3^{y_{i,j}};\ \forall v_{i,j} \in S).$$

If $T \xleftarrow{\$} \mathbb{G}_{p_1p_3}$, the private key is a normal key and $\mathcal{B}$ simulates $Game_{k'-1,2}$. If $T \xleftarrow{\$} \mathbb{G}_{p_1p_2p_3}$, the private key is a semi-functional key of Type 1. In this case, $\mathcal{B}$ simulates $Game_{k',1}$. Writing the $\mathbb{G}_{p_2}$ part of $T$ as $g_2^{\theta}$, we have $\vec d_1 = \theta\vec\sigma'$, $d_2 = a\theta$, $d_3 = \theta$. In addition, $\mathcal{A}$ can ask the leakage and update oracles.

Challenge: It is similar to Lemma 1. Set $U_1 = g_1^{s}$, $U_2 = g_2^{\xi}$ and calculate the ciphertexts

$$CT_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]}, C_0, \vec C_1, C_2, \vec C_3, \vec C_4),$$

in which

$$C_0 = M_b\, \hat e(g_1^{\alpha}, U_1U_2),\quad \vec C_1 = (U_1U_2)^{\vec\rho} * g_4^{\vec d},\quad C_2 = (U_1U_2) g_4,$$

$$\vec C_3 = \Big\{(U_1U_2)^{a}\Big(\prod_{v_{i,j}\in B_k} T_{i,j}\Big)^{s_k} W_k\Big\}_{k\in[\tilde m]},\quad \vec C_4 = (g_1^{s_k} V_k)_{k\in[\tilde m]}.$$

Phase 2: It is similar to Phase 1. $\mathcal{B}$ can answer the reveal and update queries with the restriction that the attribute set corresponding to any queried private key cannot satisfy the challenge access structure.

Guess: The adversary $\mathcal{A}$ outputs a guess $b' \in \{0, 1\}$. If $b' = b$, $\mathcal{A}$ wins the game.

Based on the above descriptions, the semi-functional factor of the challenge ciphertexts is $(\xi\vec\rho, \xi, a\xi\vec 1, 0)$, and the equation $\langle\vec d_1,\vec e_1\rangle - d_2e_2 + d_3e_{3,k} = 0$ holds. Indeed, $\langle\theta\vec\sigma', \xi\vec\rho\rangle - (a\theta)\xi + \theta(a\xi) = \theta\xi\langle\vec\sigma',\vec\rho\rangle = 0$, since $\langle\vec\sigma',\vec\rho\rangle = 0$. If the attributes of the $k'$th key satisfy the challenge access structure, it is a nominally semi-functional key. Following Lemma 5 in [30], the leakage of the key cannot help the adversary detect whether the $k'$th private key is normal or semi-functional.

Lemma 3. Suppose there exists a PPT adversary $\mathcal{A}$ who can distinguish $Game_{k',1}$ and $Game_{k',2}$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm $\mathcal{B}$ with advantage $\epsilon$ in breaking Assumption 2.
Proof. Unlike the construction in Lemma 2, the construction of the $k'$-th key is as follows.

$\overline{SK}_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S) = (S,\ T^{\vec\sigma} \ast g_3^{\vec y_1},\ g_1^{\alpha} T^{a} g_3^{y_2} (W_2W_3)^{d},\ T g_3^{y_3},\ T^{t_{i,j}} g_3^{y_{i,j}};\ \forall v_{i,j} \in S)$.

If $T \xleftarrow{\$} G_{p_1p_3}$, this private key is a semi-functional key of Type 2, and B simulates $Game_{k',2}$. If $T \xleftarrow{\$} G_{p_1p_2p_3}$, this private key is a semi-functional key of Type 1. In this case, B simulates $Game_{k',1}$. So if A can distinguish these two games with advantage $\epsilon$, B can break Assumption 2 with the same advantage.

Lemma 4. Suppose there is a PPT adversary A that can distinguish $Game_{Q,2}$ and $Game_{final0}$ with non-negligible advantage $\epsilon$. Then there exists a PPT algorithm B with advantage $\epsilon$ in breaking Assumption 3.

Proof. B receives the instance $E = (G, g_1, g_2, g_3, g_4, g_2^{r}, U_2^{r}, g_1^{\alpha}U_2, g_1^{s}W_2)$ and simulates $Game_{Q,2}$ or $Game_{final0}$.

Setup: B selects $a, b, t_{i,j} \in \mathbb{Z}_N$ and sets $Y = X_1X_4 = g_1^{a}g_4^{b}$. It selects $\vec\rho \in \mathbb{Z}_N^{\omega}$ randomly and generates $PK = (N, g_1, g_4, \hat e(g_1^{\alpha}U_2, g_1), Y, g_1^{\vec\rho}, T_{i,j} = g_1^{t_{i,j}};\ \forall i \in [n], j \in [n_i])$.

Phase 1: All of the generated keys are semi-functional keys of Type 2. B selects $\vec y_1, \vec\sigma \in \mathbb{Z}_N^{\omega}$, $t, y_2, y_3 \in \mathbb{Z}_N$, $y_{i,j} \in \mathbb{Z}_N$ randomly, and outputs the secret keys

$\overline{SK}_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S) = (S,\ g_1^{\vec\sigma} \ast g_3^{\vec y_1},\ (g_1^{\alpha}U_2) X_1^{t} g_1^{\langle\vec\rho,\vec\sigma\rangle} g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}};\ \forall v_{i,j} \in S)$.

In addition, A can query the leak and update oracles.

Challenge: Similar to Lemma 1, B calculates the challenge ciphertext

$\overline{CT}_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]},\ C_0 = M_bT,\ \vec C_1 = (g_1^{s}U_2)^{\vec\rho} \ast g_4^{\vec d},\ C_2 = (g_1^{s}U_2) g_4,\ \vec C_3 = \{(g_1^{s}U_2)^{a} \cdot (\prod_{v_{i,j}\in B_k} T_{i,j})^{s_k} W_k\}_{k\in[\tilde m]},\ \vec C_4 = (g_1^{s_k} V_k)_{k\in[\tilde m]})$.

Phase 2: It is similar to Phase 1. B can answer the reveal and update queries with the restriction that the attribute sets corresponding to any queried private key cannot satisfy the challenge access structure.

Guess: The adversary A outputs a guess $b' \in \{0,1\}$. If $b' = b$, A wins the game.

If $T = \hat e(g_1, g_1)^{\alpha s}$, the result is a semi-functional ciphertext of the message $M_b$. Otherwise $T \xleftarrow{\$} G_T$ is a random element of $G_T$. So if the adversary A can distinguish these two games, B can break Assumption 3 with the same advantage.

Lemma 5. Suppose there exists a PPT adversary A that can distinguish $Game_{final0}$ and $Game_{final1}$ with non-negligible advantage $\epsilon$. Then there is a PPT algorithm B with advantage $\epsilon$ in breaking Assumption 4.

Proof. B receives the instance $E = (G, g_1, g_2, g_3, g_4, U_1U_4, U_1^{\hat r}U_2, g_1^{\hat r}W_2, g_1^{s}W_{24}, U_1g_3^{\hat s})$ and simulates $Game_{final0}$ or $Game_{final1}$.

Setup: B selects $t_{i,j}, \alpha \in \mathbb{Z}_N$, $\vec\rho \in \mathbb{Z}_N^{\omega}$ randomly, and sets $PK = (N, g_1, g_4, \hat e(g_1, g_1)^{\alpha}, Y = U_1U_4, g_1^{\vec\rho}, T_{i,j} = g_1^{t_{i,j}};\ \forall i \in [n], j \in [n_i])$.

Phase 1: B selects $\vec y_1, \vec\sigma \in \mathbb{Z}_N^{\omega}$, $t, y_2, y_3 \in \mathbb{Z}_N$, $y_{i,j} \in \mathbb{Z}_N$ at random, then calculates and outputs the secret keys as follows.

$\overline{SK}_S = (S, \vec K_1, K_2, K_3, K_{i,j};\ \forall v_{i,j} \in S) = (S,\ g_1^{\vec\sigma} \ast g_3^{\vec y_1},\ g_1^{\alpha+\langle\vec\rho,\vec\sigma\rangle} (U_1g_3^{\hat s})^{t} g_2 g_3^{y_2},\ g_1^{t} g_3^{y_3},\ T_{i,j}^{t} g_3^{y_{i,j}};\ \forall v_{i,j} \in S)$.

In addition, A can query the leak and update oracles.

Challenge: B generates the challenge ciphertext as follows: $C_0 \xleftarrow{\$} G_T$, $\vec C_1 = (g_1^{s}W_{24})^{\vec\rho} \ast g_4^{\vec d}$, $C_2 = (g_1^{s}W_{24}) \cdot g_4$, $\vec C_3 = \{T (\prod_{v_{i,j}\in B_k} T_{i,j})^{s_k} W_k\}_{k\in[\tilde m]}$, $\vec C_4 = (g_1^{s_k} V_k)_{k\in[\tilde m]}$.

Phase 2: It is similar to Phase 1. B can answer the reveal and update queries with the restriction that the attribute sets of the adversary's queries cannot satisfy the challenge access structure.

Guess: The adversary A outputs a guess $b' \in \{0,1\}$. If $b' = b$, A wins this game.

If $T = U_1^{s}D_{24}$, it is a semi-functional ciphertext, and B simulates $Game_{final0}$. If $T \xleftarrow{\$} G_{p_1p_2p_4}$, it is a random element, and B simulates $Game_{final1}$. These two games are indistinguishable.

Theorem 1. If Assumptions 1, 2, 3 and 4 hold and $l = (\omega - 1 - 2\tau)$, where $\tau$ is a positive constant, then the proposed scheme is anonymous and $l$ leakage-resilient.

Proof. Suppose Assumptions 1, 2, 3 and 4 hold. From Lemma 1 to Lemma 5, the adversary A cannot distinguish between $Game_{real}$ and $Game_{final1}$, so the value of $b$ is hidden from A. The upper bound of the leakage between consecutive updates is $l$, so the scheme is anonymous and $l$ leakage-resilient.

Table 2: Advantage differences of the adversary between two consecutive games
Adjacent games | Difference in the adversary's advantage | Related lemma
$Game_{real}$ and $Game_0$ | $|Adv_A^{Game_{real}} - Adv_A^{Game_0}| \le \epsilon$ | Lemma 1
$Game_{k'-1,2}$ and $Game_{k',1}$ | $|Adv_A^{Game_{k'-1,2}} - Adv_A^{Game_{k',1}}| \le \epsilon$ | Lemma 2
$Game_{k',1}$ and $Game_{k',2}$ | $|Adv_A^{Game_{k',1}} - Adv_A^{Game_{k',2}}| \le \epsilon$ | Lemma 3
$Game_{Q,2}$ and $Game_{final0}$ | $|Adv_A^{Game_{Q,2}} - Adv_A^{Game_{final0}}| \le \epsilon$ | Lemma 4
$Game_{final0}$ and $Game_{final1}$ | $|Adv_A^{Game_{final0}} - Adv_A^{Game_{final1}}| \le \epsilon$ | Lemma 5

7 Anonymous Leakage-Resilient CP-ABE for Non-MAS

In this section, we give the construction of the anonymous leakage-resilient CP-ABE for non-monotone access structures. In this scheme, a non-monotone access structure is represented by the set of authorized sets in the non-monotone access structure.

Setup($(\lambda, V, l) \to (PK, MSK)$): Similar to Section 5, the setup algorithm sets the public keys as

$PK = (N, g_1, g_4, g_1^{\vec\rho}, y, Y, T_{i,j};\ \forall i \in [n], j \in [n_i])$,
where

$y = \hat e(g_1, g_1)^{\alpha}$, $Y = X_1X_4$, $T_{i,j} = g_1^{t_{i,j}}$.

The master secret keys are

$MSK = (X_1, g_3, \alpha)$.

Finally the algorithm publishes PK and keeps MSK.

KeyGen($(PK, MSK, S) \to SK_S$): On input the public keys PK, the master keys MSK, and an attribute set $S = \{v_{1,x_1}, v_{2,x_2}, \cdots, v_{n',x_{n'}}\}$, where $n' \le n$ and $1 \le x_i \le n_i$ for each $1 \le i \le n'$, this algorithm selects $t, y_2, y_3, y_4 \in \mathbb{Z}_N$, $\vec y_1, \vec\sigma \in \mathbb{Z}_N^{\omega}$, then calculates and outputs the secret keys as follows.

$SK_S = (S, \vec K_1, K_2, K_3, K_4)$,

in which

$\vec K_1 = g_1^{\vec\sigma} \ast g_3^{\vec y_1}$, $K_3 = g_1^{t} g_3^{y_3}$,
$K_2 = g_1^{\alpha+\langle\vec\rho,\vec\sigma\rangle} X_1^{t} g_3^{y_2}$, $K_4 = (\prod_{v_{i,j}\in S} T_{i,j})^{t} g_3^{y_4}$.

UpdateUSK($(PK, S, SK_S) \to SK'_S$): The update algorithm selects $\Delta t, \Delta y_2, \Delta y_3, \Delta y_4 \in \mathbb{Z}_N$, $\Delta\vec y_1, \Delta\vec\sigma \in \mathbb{Z}_N^{\omega}$ at random, and outputs a new key $SK'_S$:

$SK'_S = (S, \vec K'_1, K'_2, K'_3, K'_4)$,

where

$\vec K'_1 = \vec K_1 \ast g_1^{\Delta\vec\sigma} \ast g_3^{\Delta\vec y_1}$, $K'_3 = K_3 g_1^{\Delta t} g_3^{\Delta y_3}$,
$K'_2 = K_2 g_1^{\langle\vec\rho,\Delta\vec\sigma\rangle} X_1^{\Delta t} g_3^{\Delta y_2}$, $K'_4 = K_4 (\prod_{v_{i,j}\in S} T_{i,j})^{\Delta t} g_3^{\Delta y_4}$.

Encrypt($(PK, M, \Gamma) \to CT_\Gamma$): Let $\Gamma$ be a non-monotone access structure where $\Gamma = \{B_1, B_2, \ldots, B_{\tilde m}\}$, $B_k$ ($k \in [\tilde m]$) is a set of attribute values, and $\tilde m$ is the size of the non-monotone access structure $\Gamma$. This algorithm selects $s, s_1, s_2, \ldots, s_{\tilde m} \in \mathbb{Z}_N$, $\vec d \in \mathbb{Z}_N^{\omega}$, $W_k, V_k \in G_{p_4}$ ($k \in [\tilde m]$) at random and outputs the ciphertext $CT_\Gamma$ and the index $I_{B_k} \subset \{1, 2, \ldots, n\}$ ($k \in [\tilde m]$) corresponding to attribute $B_k$ ($k \in [\tilde m]$).

$CT_\Gamma = (\{I_{B_k}\}_{k\in[\tilde m]}, C_0, \vec C_1, C_2, \vec C_3, \vec C_4)$,

where

$C_0 = M y^{s}$, $\vec C_1 = g_1^{s\vec\rho} \ast g_4^{\vec d}$,
$C_2 = g_1^{s} g_4$, $\vec C_4 = (C_{4,k})_{k\in[\tilde m]} = (g_1^{s_k} V_k)_{k\in[\tilde m]}$,
$\vec C_3 = \{C_{3,k}\}_{k\in[\tilde m]} = \{Y^{s} (\prod_{v_{i,j}\in B_k} T_{i,j})^{s_k} W_k\}_{k\in[\tilde m]}$.

Decrypt($(PK, CT_\Gamma, SK_S) \to M$): If the attribute set $S$ satisfies the non-monotone access structure $\Gamma$, then $S \in \Gamma$, i.e., $S = B_k$ ($k \in [\tilde m]$) for $B_k \in \Gamma$. This algorithm calculates

$M = \dfrac{C_0 \cdot \hat e_{\omega}(\vec C_1, \vec K_1)\,\hat e(C_{3,k}, K_3)}{\hat e(C_2, K_2)\,\hat e(C_{4,k}, K_4)}$.

Theorem 2. If Assumptions 1, 2, 3 and 4 hold, then the proposed scheme is anonymous and $l$ leakage-resilient.

Proof. The security proof of the anonymous leakage-resilient CP-ABE scheme for non-MAS can be derived from the proof of the scheme for MAS with a minor modification: in the simulation, the key components for each $v_{i,j} \in S$ are multiplied together to get a single key component $K_4 = \prod_{v_{i,j}\in S} K_{i,j}$.

8 Conclusion

In this paper, an anonymous leakage-resilient CP-ABE scheme for monotone access structures is proposed first, in which the access structure is converted into minimal sets that provide fast decryption. Using similar ideas, we present an anonymous leakage-resilient CP-ABE scheme with constant-size ciphertexts for non-monotone access structures. Both schemes are proven to be adaptively secure in the standard model under four static assumptions over composite order bilinear groups, and can tolerate continual leakage on the private keys when an update algorithm is implicitly employed to periodically update the private keys. However, our schemes cannot achieve the optimal leakage rate while ensuring efficiency, so designing an efficient ABE scheme with optimal leakage rate will be our future work.

Acknowledgments

This work was supported in part by the International S&T Cooperation Program of Shaanxi Province No. 2019KW-056, the National Cryptography Development Fund under grant (MMJJ20180209).

References

[1] J. Alwen, Y. Dodis, M. Naor, G. Segev, S. Walfish, and D. Wichs, “Public-key encryption in the bounded-retrieval model,” Lecture Notes in Computer Science, vol. 2009, no. 5, pp. 113–134, 2010.
[2] J. Alwen, Y. Dodis, and D. Wichs, “Leakage-resilient public-key cryptography in the bounded-retrieval model,” in Advances in Cryptology, pp. 36–54, 2009.
[3] N. Attrapadung, B. Libert, and E. D. Panafieu, “Expressive key-policy attribute-based encryption with constant-size ciphertexts,” in Public Key Cryptography (PKC’11), pp. 90–108, 2011.
[4] J. Bethencourt, A. Sahai, and B. Waters, “Ciphertext-policy attribute-based encryption,” in IEEE Symposium on Security and Privacy, pp. 321–334, 2007.
[5] Z. Cao, L. Liu, and Z. Guo, “Ruminations on attribute-based encryption,” International Journal of Electronics and Information Engineering, vol. 8, no. 1, pp. 9–19, 2018.
[6] A. D. Caro, V. Iovino, and G. Persiano, “Fully secure anonymous hibe and secret-key anonymous ibe with short ciphertexts,” in International Conference on Pairing-Based Cryptography, pp. 347–366, 2010.
[7] P. S. Chung, C. W. Liu, and M. S. Hwang, “A study of attribute-based proxy re-encryption scheme in cloud environments,” International Journal of Network Security, vol. 16, no. 1, pp. 1–13, 2014.
[8] Y. Dodis, Y. T. Kalai, and S. Lovett, “On cryptography with auxiliary input,” in Proceedings of the 41st Annual ACM symposium on Theory of Computing, pp. 621–630, 2009.
[9] V. Goyal, O. Pandey, A. Sahai, and B. Waters, “Attribute-based encryption for fine-grained access control of encrypted data,” in ACM Conference on Computer and Communications Security, pp. 89–98, 2006.
[10] A. Kapadia, P. Tsang, and S. W. Smith, “Attribute-based publishing with hidden credentials and hidden policies,” NDSS, vol. 7, pp. 179–192, 2007.
[11] J. Katz, A. Sahai, and B. Waters, “Predicate encryption supporting disjunctions, polynomial equations, and inner products,” in Advances in Cryptology, pp. 146–162, 2008.
[12] A. Lewko, T. Okamoto, A. Sahai, K. Takashima, and B. Waters, “Fully secure functional encryption: Attribute-based encryption and (hierarchical) inner product encryption,” in International Conference on Theory and Application of Cryptographic Techniques, pp. 62–91, 2010.
[13] L. Liu, Z. Cao, and C. Mao, “A note on one outsourcing scheme for big data access control in cloud,” International Journal of Electronics and Information Engineering, vol. 9, no. 1, pp. 29–35, 2018.
[14] S. Micali and L. Reyzin, “Physically observable cryptography,” in Theory of Cryptography, pp. 278–296, 2004.
[15] T. Okamoto and K. Takashima, “Fully secure functional encryption with general relations from the decisional linear assumption,” Crypto, vol. 6223, pp. 191–208, 2010.
[16] R. Ostrovsky, A. Sahai, and B. Waters, “Attribute-based encryption with non-monotonic access structures,” in Proceedings of the 14th ACM Conference on Computer and Communications Security (CCS’07), pp. 195–203, 2007.
[17] T. Pandit and R. Barua, “Efficient fully secure attribute-based encryption schemes for general access structures,” in Provable Security, pp. 193–214, 2012.
[18] A. Sahai and B. Waters, “Fuzzy identity-based encryption,” in Advances in Cryptology, pp. 457–473, 2005.
[19] B. Waters, “Ciphertext-policy attribute-based encryption: An expressive, efficient, and provably secure realization,” Lecture Notes in Computer Science, vol. 2008, pp. 321–334, 2008.
[20] B. Waters, “Dual system encryption: Realizing fully secure ibe and hibe under simple assumptions,” in International Cryptology Conference on Advances in Cryptology, pp. 619–636, 2009.
[21] Q. Yu and J. Li, “Continuous leakage resilient ciphertext-policy attribute-based encryption supporting attribute revocation,” Computer Engineering and Applications, vol. 52, no. 20, pp. 29–38, 2016.
[22] T. H. Yuen, S. S. M. Chow, Y. Zhang, and S. M. Yiu, “Identity-based encryption resilient to continual auxiliary leakage,” in International Conference on Theory and Applications of Cryptographic Techniques, pp. 117–134, 2012.
[23] L. Zhang, Y. Cui, and Y. Mu, “Improving privacy-preserving CP-ABE with hidden access policy,” in The 4th International Conference on Cloud Computing and Security, pp. 596–605, 2018.
[24] L. Zhang, G. Hu, Y. Mu, and F. Rezaeibagha, “Hidden ciphertext policy attribute-based encryption with fast decryption for personal health record system,” IEEE Access, no. 7, pp. 33202–33213, 2019.
[25] L. Zhang and Y. Shang, “Leakage-resilient attribute-based encryption with CCA2 security,” International Journal of Network Security, vol. 22, no. 6, pp. 1–9, 2019.
[26] L. Zhang and H. Yin, “Recipient anonymous ciphertext-policy attribute-based broadcast encryption,” International Journal of Network Security, vol. 20, no. 1, pp. 168–176, 2018.
[27] L. Zhang and J. Zhang, “Anonymous CP-ABE against side-channel attacks in cloud computing,” Journal of Information Science and Engineering, vol. 33, no. 3, pp. 789–805, 2017.
[28] L. Zhang, J. Zhang, and Y. Hu, “Attribute-based encryption resilient to continual auxiliary leakage with constant size ciphertexts,” Journal of China Universities of Posts and Telecommunications, vol. 23, no. 3, pp. 18–28, 2016.
[29] L. Zhang, J. Zhang, and Y. Mu, “Novel leakage-resilient attribute-based encryption from hash proof system,” Computer Journal, vol. 60, no. 4, pp. 541–554, 2017.
[30] M. Zhang, W. Shi, C. Wang, Z. Chen, and Y. Mu, “Leakage-resilient attribute-based encryption with fast decryption: Models, analysis and constructions,” in International Conference on Information Security Practice and Experience, pp. 75–90, 2013.

Biography

Xiaoxu Gao is a master's degree student in the School of Mathematics and Statistics, Xidian University. Her research interests focus on computer and network security.

Leyou Zhang is a professor in the School of Mathematics and Statistics at Xidian University, Xi’an, China. He received his PhD from Xidian University in 2009. From Dec. 2013 to Dec. 2014, he was a research fellow in the School of Computer Science and Software Engineering at the University of Wollongong. His current research interests include network security, computer security, and cryptography.

International Journal of Network Security, Vol.22, No.5, PP.775-781, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).07) 775
A Distributed Density-based Outlier Detection Algorithm on Big Data
Lin Mei1,2 and Fengli Zhang1 (Corresponding author: Fengli Zhang)
1 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China
2 School of Computer Science and Technology, Southwest Minzu University, Chengdu, China
(Email: [email protected])
(Received Mar. 11, 2019; Revised and Accepted Nov. 6, 2019; First Online Jan. 29, 2020)
Abstract

As one of the popular issues in the data mining area, outlier detection aims to find the objects whose behavior is abnormal with respect to the distribution of the original dataset. It can be applied in various applications such as bank fraud detection, network intrusion detection, system health monitoring, medical care, and public safety and security. Recently, density-based outlier detection has been proposed as an efficient and significant method for outlier detection. It uses the relative density of an object, compared with that of its neighbors, to indicate the degree to which the object is an outlier. Specifically, it computes the Local Outlier Factor (LOF). In this paper, we propose a novel distributed density-based outlier detection method for large-scale data processing, namely IGBP. First, we split the data space into several grids and then allocate these grids to the data nodes with a greedy algorithm in a distributed environment. Besides, we propose a distributed LOF computing method with a KD-tree for detecting density-based outliers in parallel. The validity of the proposed approaches is finally verified by experiments. Experimental results demonstrate that our proposed method outperforms the baselines.

Keywords: Density-based Outlier; Distributed Algorithm; Greedy Algorithm; Kd-Tree; LOF

1 Introduction

With the development of big data techniques for large-scale data processing, outlier detection has become one of the important but complex tasks in the data mining area. It can be widely used in various applications such as bank fraud detection, network intrusion detection, system health monitoring, medical care and public security protection. Outlier detection can help us to discover valuable knowledge and abnormal patterns. Therefore, it has recently become one of the hotspot directions in data mining.

Hawkins stated that an outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism [9]. In recent years, there have been many studies about outlier detection. For example, distance-based outlier detection [13] and density-based outlier detection [5] are representative works among traditional outlier detection methods. However, most of them only consider centralized environments or single-node processing. With the increasing scale of big data, the performance of these methods cannot satisfy the computing requirements of users. For instance, in the area of credit card fraud detection, we obtain the users' trade information as a dataset. If a credit card is stolen, its transaction pattern usually changes dramatically; in particular, the locations of transactions and the purchased items are often unusual for the authentic card owner. Therefore, we define these abnormal transaction records as outliers. Outlier detection techniques can then help us to identify these outliers, find out whether user accounts have been stolen, and avoid property damage. Besides, outlier detection technology can also be used to detect the cheating of game bots in Massively Multiplayer Online Role-playing Games [17].

Fortunately, some recent studies attempt to utilize a distributed computing environment to speed up the computation, and several related methods [10, 11, 14] for distributed outlier detection have been proposed. For example, E. Lozano and E. Acuña [16] propose a master-slave architecture for distributed computing. More specifically, each slave node computes its neighborhood set and sends it to the master node. The master node collects all the partial neighborhood sets and computes the LOFs of all the tuples. However, a large number of tuples in this method are aggregated to the master node, and then lots of calculations are needed to produce the result. Thus, the master node becomes the bottleneck as the data scale increases, and a high-performance master node is necessary. Recently, Mei Bai et al. adopted a coordinator and a number of datanodes for the outlier detection problem [3]. The coordinator is responsible for the overall scheduling. Each datanode stores several data subsets in grids, and calculates LOFs in the data subsets. The coordinator in this framework only takes charge of the scheduling, and all the actual computations are allocated to the datanodes. Thus, the coordinator does not become a bottleneck when the data scale is large. In order to reduce the network overhead, their grid allocation algorithm allocates adjacent grids to the same datanodes. However, their algorithm only needs a small quantity of network communications between pairs of datanodes, and the average numbers of tuples in each grid are likely to be very different. Hence, to address these limitations, we focus on improving the computational complexity, which is the greatest bottleneck of this problem.

In this paper, we aim to model density-based outlier detection in a distributed manner. The general idea is to compare the density around an object with the density around its local neighbors. The basic assumption of the density-based outlier detection method is that the density around a non-outlier object is similar to the density around its neighbors, but the density around an outlier object is significantly different from the density around its neighbors [8]. Through our analysis, we find that both the workload and the network communications in the computing architecture increase as the scale of data increases, but the workload grows faster than the network communication. Besides, if the dimensionality of the data increases, the performance will be much better. Therefore, we propose an improved algorithm in this paper which aims at detecting density-based outliers in distributed environments more efficiently than [3]. Moreover, our experiments demonstrate its effectiveness and high efficiency. We give the detailed description of the algorithm in Section 4.

The rest of this paper is organized as follows. In Section 2, we overview the related works. Section 3 states the problem of density-based outlier detection in a distributed environment. Section 4 presents our improved algorithm in detail. Section 5 gives the experimental results. In the end, we conclude this paper in Section 6.

2 Related Work

We first give a brief overview of outlier detection in Section 2.1. Then the previous methods of distributed outlier computing are described in Section 2.2.

2.1 Outlier Detection

Hawkins first gave a definition of outliers in 1980 [9]. Afterwards, Beckman and Cook [4] presented an improved definition and a survey. Markou and Singh [18, 19] present a review of statistical approaches for outlier detection; in [19] in particular, they present a review of neural network based approaches for outlier detection. Fujimaki et al. present a semi-supervised outlier detection method that uses a set of labeled "normal" objects [7]. Dasgupta and Majumdar also propose a semi-supervised method [6] for the outlier detection task. Subsequently, distance-based outliers were developed by Knorr and Ng [13], and index-based, nested loop-based and grid-based approaches are well explored to speed up distance-based outlier detection [12, 13].

Another proximity-based approach is density-based outlier detection [5]. In order to decide whether a tuple is an outlier or not, a local outlier factor (LOF), defined in [5], which represents the degree to which this tuple is an outlier, is assigned to each tuple [3]. LOF is based on a concept of local density, where locality is given by the k nearest neighbors, whose distance is used to estimate the density. By comparing the local density of a tuple to the local densities of its neighbors, one can identify the regions of similar density, and tuples that have a substantially lower density than their neighbors. These are considered to be outliers [8]. Besides, the HilOut algorithm was proposed by Angiulli and Pizzuti [2]. Aggarwal and Yu [1] develop the sparsity coefficient-based subspace outlier detection method. Kriegel et al. proposed angle-based outlier detection [15].

2.2 Outlier Detection in Distributed Environments

Lozano and Acuña propose a distributed algorithm to compute density-based outliers [16]. However, because all the tuples are transferred to the master node, the workload on the master node is quite heavy. Thus, this method is unable to achieve good performance when the data scale is huge.

Bai et al. adopt a coordinator and a number of datanodes [3]. The coordinator is responsible for the overall scheduling. Each datanode stores several data subsets in grids, and calculates LOFs in the data subsets. By comparison, the coordinator in this framework is only in charge of the scheduling, and all the actual computations are allocated to the datanodes. Hence, the coordinator does not become a bottleneck when the data scale is huge. In order to reduce the network overhead, their grid allocation algorithm allocates adjacent grids to the same datanodes. However, in fact, their algorithm only needs a small amount of network communications between pairs of datanodes. In addition, the time and space complexity of LOF are very high. In this paper, we present an improved algorithm based on theirs to efficiently detect density-based outliers in distributed environments. Moreover, our experiments demonstrate its validity.
3 Preliminaries

3.1 Problem Formalism

We first give some definitions that are used in our proposed method.

Local density: It is estimated by the typical distance at which a tuple can be reached from its neighbors. The reachability distance adopted in LOF is an additional measure to produce more stable results within clusters.

Reachability distance: Let $d(A, B)$ be the distance between $A$ and $B$, and let $k\text{-distance}(A)$ be the distance of the object $A$ to its $k$-th nearest neighbor. To simplify the description, we use the Euclidean distance in the rest of this paper, similar to [21]. Hence, $k\text{-distance}(A) = d(A, B)$ for a tuple $B$ such that:

1) There are at least $k$ tuples $A'$ such that $d(A, A') \le d(A, B)$.

2) There are at most $k - 1$ tuples $A''$ such that $d(A, A'') < d(A, B)$.

Note that the set of the $k$ nearest neighbors includes all tuples at this distance, which in the case of a "tie" can be more than $k$ tuples. We denote the set of $k$ nearest neighbors as $N_k(A) = \{A' \mid d(A, A') \le k\text{-distance}(A)\}$. This distance is used to define what is called the reachability distance:

$Rd_k(A, B) = \max\{k\text{-distance}(A), d(A, B)\}$.

Figure 1 illustrates the original intention of the reachability distance. Tuples B and D have the same reachability distance (k=3), while G is not a k nearest neighbor.

Figure 1: k-distance neighborhood and reachability distance when k=3 (the figure shows tuples A to G, with $Rd_k(A,B) = k\text{-distance}(A)$ and $Rd_k(A,G) = d(A,G)$)

Thus, the reachability distance of an object $A$ from $B$ is the true distance of the two objects, but at least $k\text{-distance}(A)$. Objects that belong to the $k$ nearest neighbors of $A$ are considered to be equally distant. The reason for this distance is to get more stable results. Note that this is not a distance in the mathematical sense, since it is not symmetric.

The local reachability density of an object $A$ is defined by

$LRD(A) = 1 \Big/ \dfrac{\sum_{B \in N_k(A)} Rd_k(B, A)}{|N_k(A)|}$,

which is the inverse of the average reachability distance of the object $A$ from its neighbors. Note that it is not the average reachability of the neighbors from $A$ (which by definition would be $k\text{-distance}(A)$), but the distance at which $A$ can be "reached" from its neighbors. With duplicate points, this value can become infinite.

The local reachability densities are then compared with those of the neighbors using

$LOF_k(A) = \dfrac{\sum_{B \in N_k(A)} \frac{LRD(B)}{LRD(A)}}{|N_k(A)|}$,

which is the average local reachability density of the neighbors divided by the object's own local reachability density. A value approximately equal to a given threshold indicates that the object is comparable to its neighbors (and thus it is not an outlier). A value less than the threshold indicates a denser region (which would be an inlier), while values significantly larger than the threshold indicate outliers.

The traditional methods usually form the all-pairs Euclidean distance matrix, and then run KNN queries to proceed further. This is $\Theta(n^2)$ in terms of both space and time complexity. However, it can be improved with a KD-tree [20].

3.2 Distributed Environment

As Figure 2 shows, we utilize a distributed framework that consists of a coordinator and a number of datanodes. The coordinator is responsible for the overall scheduling, similar to [3]. Each datanode stores a portion of the complete data set. Most previous algorithms utilize a master-slave architecture for outlier detection, in which lots of computations are performed on the master node. Following [3], the coordinator in our framework only takes charge of the scheduling, and all the actual computations are allocated to the datanodes. Density-based outlier detection computes the LOF of each tuple for a given integer k. First, we use the improved GBP (IGBP) algorithm to split the data set into several subsets and assign them to the datanodes. After that, our algorithm works via two main steps. First, each datanode processes the local tuples, and the LOFs of some tuples can be computed directly in the local nodes. Second, we output the LOFs of the rest of the tuples with only a few necessary network communications compared with [3].
Figure 2: Computation frame

Algorithm 1 Grid allocation
Input: the grid set G; the datanode set N;
Output: allocation plan;
sort the grids in G by the number of tuples in a grid, in descending order;
for each grid g in G do
    if there exist datanodes with no grid then
        n ← randomly choose a datanode with no grid;
    else
        initialize a datanode set N′;
        for each datanode ni in N do
            if ni has the least tuples then
                insert ni into N′;
            end if
        end for
        if there exists at least one datanode in N′ with grids which belong to the adjacent grids N(g) then
            n ← choose the datanode in N′ with the largest number of grids which belong to the adjacent grids N(g);
        else
            n ← randomly choose a datanode in N′;
        end if
    end if
    allocate g to n;
end for

4 The Improved GBP Algorithm (IGBP)

4.1 Grid-based Partition

In our IGBP algorithm, we first split the whole d-dimensional space into several isometric grids. Because the grid-based partition method has limitations on high-dimensional data, we cut each dimension into several equal segments (the number of segments is denoted by s). After that, the space is partitioned into $s^d$ grids. Let $g_{x_1,x_2,\cdots,x_d}$ be the grid that is at the $x_i$-th position in dimension $i$. Next, we give the definition of the adjacent grids:

\[
N(g_{x_1,x_2,\cdots,x_d}) = \{\, g_{y_1,y_2,\cdots,y_d} \mid \max(1, x_i - 1) \le y_i \le \min(s, x_i + 1),\ g_{y_1,y_2,\cdots,y_d} \ne g_{x_1,x_2,\cdots,x_d} \,\}.
\]
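The adjacency definition and Algorithm 1 can be sketched together as follows. This is a minimal sketch under stated assumptions: the names `adjacent_grids` and `allocate` are hypothetical, grids are represented as index tuples, and the random choices of Algorithm 1 are replaced by "lowest datanode index" so that the run is reproducible. The tuple counts are those of the worked example in Section 4.2 (Figure 4 and Table 1).

```python
from itertools import product
from statistics import pstdev


def adjacent_grids(g, s):
    """Grids whose index differs from g by at most 1 in every dimension, excluding g."""
    ranges = [range(max(1, x - 1), min(s, x + 1) + 1) for x in g]
    return {y for y in product(*ranges) if y != g}


def allocate(grid_counts, num_nodes, s):
    """Greedy grid allocation in the spirit of Algorithm 1.

    Assumption: where the paper chooses randomly, this sketch deterministically
    takes the datanode with the lowest index.
    """
    # Stable sort: grids with equal counts keep their input order.
    order = sorted(grid_counts, key=lambda g: -grid_counts[g])
    node_grids = [set() for _ in range(num_nodes)]
    node_load = [0] * num_nodes
    plan = {}
    for g in order:
        empty = [i for i in range(num_nodes) if not node_grids[i]]
        if empty:
            n = empty[0]
        else:
            least = min(node_load)
            candidates = [i for i in range(num_nodes) if node_load[i] == least]
            adj = adjacent_grids(g, s)
            # Prefer the candidate that already holds the most grids adjacent to g.
            n = max(candidates, key=lambda i: len(node_grids[i] & adj))
        node_grids[n].add(g)
        node_load[n] += grid_counts[g]
        plan[g] = n
    return plan, node_load


# Tuple counts of the 16 grids, listed in the order of Table 1.
COUNTS = {
    (1, 2): 11, (1, 1): 10, (3, 2): 10, (1, 4): 10, (3, 4): 9, (3, 3): 9,
    (3, 1): 8, (4, 1): 8, (4, 4): 7, (1, 3): 7, (2, 2): 6, (2, 4): 6,
    (2, 1): 6, (4, 2): 5, (4, 3): 4, (2, 3): 4,
}

if __name__ == "__main__":
    plan, load = allocate(COUNTS, num_nodes=10, s=4)
    print(load)                    # tuples per datanode n1..n10
    print(round(pstdev(load), 3))  # population standard deviation of the workload
```

With these inputs the sketch yields the per-datanode loads reported for IGBP in Section 4.2. Because ties are broken differently from the paper's random choice, individual grids may land on a different datanode than in Table 1 while the loads stay the same.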
Next, we allocate these grids to the datanodes. In order to speed up the computation of density-based outliers, we propose an allocation method that considers the following two factors.

1) To obtain high parallelism, we keep the number of tuples on each datanode almost the same (balance the workload).

2) According to Section 3.1, we compute the k-distance neighborhood for each tuple, so the network overhead is reduced if we allocate adjacent grids to the same datanode as far as possible.

The details of the proposed method are shown in Algorithm 1.

4.2 Distributed LOF Computing

In the part above, we have split the data set into several grids and allocated them to the corresponding datanodes. Next, we turn to computing the LOF for each tuple in parallel. By analyzing the definitions in Section 3, it is clear that LRD is the premise of LOF. In order to calculate LRDs effectively, we have to compute the k-distances and k-distance neighborhoods for all the tuples first, which is the core part of this section.

However, in distributed environments, the situation is more complex. It is difficult to compute the actual k-distances of all the tuples. For example, in Figure 3, considering the tuples in grid g1, the local k-distance neighborhood of tuple A is identical to its actual k-distance neighborhood. However, tuple B shows a more complex situation. Its actual k-distance neighborhood is {C, D, E}, which cannot be computed unless D and E are transmitted from grid g2 to grid g1.

Figure 3: Example of DLC (k=3)

The previous work in [3] classifies the tuples in a grid into two categories. A tuple whose neighborhood can be computed in the local grid is a grid-local tuple. Otherwise, it is a cross-grid tuple. After that, the authors propose an algorithm to solve this problem. This part is not the most significant point of our paper. For distributed LOF computing, we give an example to explain how our proposed method differs from previous work.

First, as shown in Figure 4, there is a set P with 120 tuples in 2-dimensional space, and the number of datanodes is 10. Thus, we set s = 4 and split the space into 16 grids. The number of tuples is shown at the bottom of each grid. According to Algorithm 1, we sort the grids by the number of tuples in a grid, and the result is shown in Table 1. Then, each of the first 10 grids is randomly allocated to a datanode with no grid. After that, 89 tuples in total have been allocated, and the average number of tuples per datanode is 8.9. When allocating the 11th grid g2,2, there are 4 datanodes whose numbers of tuples are not larger than 8.9, namely n7, n8, n9 and n10. We choose n10 because its number of tuples is the smallest (tied with n9) and it already holds g1,3, which is adjacent to g2,2. Using the same method, we allocate all the grids to the corresponding datanodes.

Figure 4: Example of grids (number of tuples in each grid; the neighbors of g2,2 are marked in the original figure)

| Dim 2 \ Dim 1 | 1 | 2 | 3 | 4 |
| 4 | g1,4: 10 | g2,4: 6 | g3,4: 9 | g4,4: 7 |
| 3 | g1,3: 7 | g2,3: 4 | g3,3: 9 | g4,3: 4 |
| 2 | g1,2: 11 | g2,2: 6 | g3,2: 10 | g4,2: 5 |
| 1 | g1,1: 10 | g2,1: 6 | g3,1: 8 | g4,1: 8 |

Table 1: Improved algorithm process

| Sequence number | Grid ID | Number of tuples | Allocated datanode | Average number of tuples |
| 1 | g1,2 | 11 | n1 | / |
| 2 | g1,1 | 10 | n2 | / |
| 3 | g3,2 | 10 | n3 | / |
| 4 | g1,4 | 10 | n4 | / |
| 5 | g3,4 | 9 | n5 | / |
| 6 | g3,3 | 9 | n6 | / |
| 7 | g3,1 | 8 | n7 | / |
| 8 | g4,1 | 8 | n8 | / |
| 9 | g4,4 | 7 | n9 | / |
| 10 | g1,3 | 7 | n10 | 8.9 |
| 11 | g2,2 | 6 | n10 | 9.5 |
| 12 | g2,4 | 6 | n9 | 10.1 |
| 13 | g2,1 | 6 | n7 | 10.7 |
| 14 | g4,2 | 5 | n8 | 11.2 |
| 15 | g4,3 | 4 | n6 | 11.6 |
| 16 | g2,3 | 4 | n5 | 12 |

After allocation, the numbers of tuples in datanodes {n1, n2, ···, n10} are {11, 10, 10, 10, 13, 13, 14, 13, 13, 13}. We use σ (standard deviation) to indicate how spread out the data distribution is. A low standard deviation means that the algorithm balances the workload well. With a mean of 120/10 = 12 tuples per datanode, the computational formula of the standard deviation gives:

\[
\sigma = \sqrt{\frac{11^2 + 10^2 + 10^2 + 10^2 + 13^2 + 13^2 + 14^2 + 13^2 + 13^2 + 13^2}{10} - 12^2} \approx 1.483
\]

Second, according to the previous algorithm in [3], the numbers of tuples in datanodes {n1, n2, ···, n10} are {11, 10, 10, 10, 15, 13, 14, 13, 11, 13}. In the same way we obtain σ ≈ 1.732.

5 Experimental Evaluation

We implement our proposed approaches in the Python programming language and evaluate the performance on a cluster (with 4 datanodes and 1 coordinator) where each node (coordinator or datanode) has an Intel Core i5 @ 2.53 GHz CPU and 32 GB of main memory. We first use a synthetic
dataset to verify the efficiency of our proposed method IGBP. Then we choose three open datasets, Shuttle, Census and Drug, which have been widely used in outlier detection, to further show the efficiency and effectiveness of IGBP.

5.1 IGBP with Synthetic Data

We generate various synthetic data sets to analyze the performance of our methods. Specifically, we compare our proposed method with [3, 16]. We generate several cluster center points, and the tuples in each cluster follow a Gaussian distribution. Finally, we add some generated noise to the data set. The dimension of the data set is 3. The detailed parameter settings and the runtimes are summarized in Table 2.

Table 2: The influence of data scale (runtime in ms)

| Method | 100000 tuples, k=10 | 100000 tuples, k=20 | 500000 tuples, k=10 | 500000 tuples, k=20 | 1000000 tuples, k=10 | 1000000 tuples, k=20 |
| [16] | 782 | 1243 | 5341 | 7988 | 13568 | 27755 |
| [3] | 512 | 910 | 4122 | 6278 | 9742 | 22495 |
| IGBP | 547 | 932 | 3907 | 5611 | 8472 | 16625 |

As shown in the table, as the data scale increases, the runtime of our algorithm becomes better than that of previous work [3, 16]. Besides, the experimental results demonstrate that workload balance is more significant than network overhead in this problem. Figure 5 illustrates the impact of different data scales: as the data scale increases, our proposed method increasingly outperforms the baselines.

Figure 5: Runtime comparison (k=10; x-axis: data scale, 100000 to 1000000 tuples; y-axis: runtime in ms; series: approach in [8], approach in [9], IGBP)

5.2 IGBP with Open Datasets

In this part, we use three popular real-world datasets to evaluate our proposed method: Shuttle, Census and Drug, which have been widely used in outlier detection. The details are shown in Table 3.

Table 3: Open datasets

| Dataset | Number of Instances | Number of Attributes |
| Shuttle | 58000 | 9 |
| Census | 48842 | 14 |
| Drug | 215063 | 6 |

Table 4 shows that our proposed method has a lower time consumption than [3, 16] on the three real-world datasets. More specifically, as the data scale increases, our proposed method is more efficient than [3, 16].

Table 4: Results in three datasets (runtime in ms)

| Method | Shuttle | Census | Drug |
| [16] | 12339 | 34987 | 47788 |
| [3] | 3374 | 7120 | 9759 |
| IGBP | 3115 | 5780 | 6880 |

6 Conclusions

In this paper, we focus on the problem of density-based outlier detection in distributed environments for high-dimensional and large-scale data sets. We first introduce the basic concept of LOF. We summarize in detail the approach in [3] for solving the problem of distributed LOF computing, and discuss its advantages and disadvantages. Then, we propose an improved algorithm based on a greedy algorithm, namely IGBP. The experimental results show the efficiency and effectiveness of the proposed approach compared with previous work and demonstrate that our algorithm outperforms the baselines. In the future, we will consider more efficient policies to balance time consumption and space consumption.
Acknowledgments

This research is supported by the National Natural Science Foundation of China (61472064, 61602096) and the Sichuan Provincial Science and Technology Department Project (2018GZ0087, 2016FZ0002).

References

[1] C. C. Aggarwal and P. S. Yu, "Outlier detection for high dimensional data," in ACM Sigmod Record, vol. 30, pp. 37–46, 2001.
[2] F. Angiulli and C. Pizzuti, "Outlier mining in large high-dimensional data sets," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 203–215, 2005.
[3] M. Bai, X. Wang, J. Xin, and G. Wang, "An efficient algorithm for distributed density-based outlier detection on big data," Neurocomputing, vol. 181, pp. 19–28, 2016.
[4] R. J. Beckman and R. D. Cook, "Outlier ...... s," Technometrics, vol. 25, no. 2, pp. 119–149, 1983.
[5] M. M. Breunig, H. P. Kriegel, R. T. Ng, and J. Sander, "Lof: Identifying density-based local outliers," in ACM Sigmod Record, vol. 29, pp. 93–104, 2000.
[6] D. Dasgupta and N. S. Majumdar, "Anomaly detection in multidimensional data using negative selection algorithm," in Proceedings of the Congress on Evolutionary Computation (CEC'02), vol. 2, pp. 1039–1044, 2002.
[7] R. Fujimaki, T. Yairi, and K. Machida, "An approach to spacecraft anomaly detection problem using kernel feature space," in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pp. 401–410, 2005.
[8] J. Han, J. Pei, and M. Kamber, "Data mining: Concepts and techniques," A volume in the Morgan Kaufmann Series in Data Management Systems, 2012. (https://www.sciencedirect.com/book/9780123814791/data-mining-concepts-and-techniques)
[9] D. Hawkins, "Identification of outliers," Monographs on Statistics and Applied Probability, vol. 11, 1980.
[10] Q. He, Y. Ma, Q. Wang, F. Zhuang, and Z. Shi, "Parallel outlier detection using kd-tree based on mapreduce," in IEEE Third International Conference on Cloud Computing Technology and Science, pp. 75–80, 2011.
[11] E. Hung and D. W. Cheung, "Parallel mining of outliers in large database," Distributed and Parallel Databases, vol. 12, no. 1, pp. 5–26, 2002.
[12] E. M. Knorr, R. T. Ng, and V. Tucakov, "Distance-based outliers: algorithms and applications," The VLDB Journal - The International Journal on Very Large Data Bases, vol. 8, no. 3-4, pp. 237–253, 2000.
[13] E. M. Knox and R. T. Ng, "Algorithms for mining distance-based outliers in large datasets," in Proceedings of the International Conference on Very Large Data Bases, pp. 392–403, 1998.
[14] A. Koufakou, J. Secretan, J. Reeder, K. Cardona, and M. Georgiopoulos, "Fast parallel outlier detection for categorical datasets using mapreduce," in IEEE International Joint Conference on Neural Networks, pp. 3298–3304, 2008.
[15] H. P. Kriegel, A. Zimek, et al., "Angle-based outlier detection in high-dimensional data," in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 444–452, 2008.
[16] E. Lozano and E. Acuña, "Parallel algorithms for distance-based and density-based outliers," in Fifth IEEE International Conference on Data Mining, pp. 4, 2005.
[17] Y. Lu, Y. Zhu, M. Itomlenskis, S. Vyaghri, and H. Fu, "Mmoprg bot detection based on traffic analysis," International Journal of Electronics and Information Engineering, vol. 2, no. 1, pp. 18–26, 2015.
[18] M. Markou and S. Singh, "Novelty detection: A review - part 1: Statistical approaches," Signal Processing, vol. 83, no. 12, pp. 2481–2497, 2003.
[19] M. Markou and S. Singh, "Novelty detection: A review - part 2: Neural network based approaches," Signal Processing, vol. 83, no. 12, pp. 2499–2521, 2003.
[20] F. P. Miller, A. F. Vandome, and J. McBrewster, Kd-Tree, 2009. (https://www.amazon.fr/kd-tree-Frederic-P-Miller/dp/6130094590)
[21] E. Schubert, A. Zimek, and H. P. Kriegel, "Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection," Data Mining and Knowledge Discovery, vol. 28, no. 1, pp. 190–237, 2014.

Biography

Lin Mei received the B.Sc and M.Sc degrees in computer science and technology from the University of Electronic Science and Technology of China in 2003 and 2006. He is currently a lecturer at Southwest Minzu University and is pursuing a Ph.D degree at the University of Electronic Science and Technology of China. His research interests include network security and cloud computing.

Fengli Zhang received the B.Sc, M.Sc and Ph.D degrees in computer science and technology from the University of Electronic Science and Technology of China in 1983, 1986 and 2007. She is currently a professor at the University of Electronic Science and Technology of China. Her research interests include database management, network security and big data processing.

International Journal of Network Security, Vol.22, No.5, PP.782-792, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).08) 782
Binary Executable Files Homology Detection with Genetic Algorithm
Jinyue Bian1 and Quan Qian1,2 (Corresponding author: Quan Qian)
School of Computer Engineering and Science, Shanghai University1 Shanghai 200444, China Materials Genome Institute, Shanghai University, Shanghai, China2 (Email: [email protected]) (Received Mar. 16, 2019; Revised and Accepted Dec. 29, 2019; First Online May 9, 2020)
Abstract

Software homology detection is very meaningful for software copyright protection and for the detection of malicious code variants. In this paper, we propose a genetic algorithm to judge binary code similarity. First, the binary executable files are converted into control flow graphs, and then a genetic algorithm is used to compute the similarity among the control flow graphs, which is regarded as the evaluation metric for software homology detection. The experimental results show that the method is not only effective, but also that its average running time is 0.3 times that of the classical graph edit distance algorithm.

Keywords: Binary Executable Files; Control Flow Graph; Genetic Algorithm; Homology Detection

1 Introduction

With the rapid development of software, code plagiarism keeps emerging, which seriously threatens software intellectual property rights. In addition, in March 2018, a total of 7,235,983 viruses were found by the National Computer Virus Emergency Center, 36,931 new viruses were added, and 76,606,087 computers were infected [13]. Viruses and malware escape detection through code variants, but the core code does not change much. Therefore, detection of code similarity is very necessary. The existing methods can be divided into two categories: source code based and binary code based. Since the source code is sometimes not available, binary code based homology detection is more promising. That is, similarity analysis based on binary code is very important.

For binary code similarity detection, graph based methods play an important role, among which the Control Flow Graph (CFG) is one of the most commonly used representations. Therefore, binary code homology detection can be transformed into a graph similarity matching problem. That is, given two graphs, graph matching involves establishing the correspondence between their vertices while considering the consistency of the edge sets at the same time. CFG similarity comparison methods include graph edit distance (GED), string matching, execution sequence comparison, matching program basic blocks, and so on. However, in general, the computational complexity of these methods is quite large. Therefore, in this paper, we propose a new graph matching algorithm for binary code similarity analysis. First, the binary file is converted into a CFG with relatively complete control flow information using a combination of dynamic and static techniques. Then a genetic algorithm (GA) is used to calculate the similarity between CFGs. The algorithm can accurately identify the isomorphic subgraph relationship and exactly identical CFGs, and it shortens the running time greatly, so that software homology can be judged effectively.

The organization of the paper is as follows: Section 2 describes some related work. Section 3 presents the framework of the proposed method. The experimental results and detailed analysis are discussed in Section 4. Section 5 concludes the whole paper and outlines some directions of future work.

2 Related Work

So far, there has been a certain amount of research in areas related to homology identification. Here, we give some background in two aspects that are closely related to this paper: sequence-based analysis methods and graph-based methods.

2.1 Sequence-based Analysis Method

Aiming at the problem of source code plagiarism, Guo proposed an improved code plagiarism detection algorithm based on the abstract syntax tree, which can detect plagiarism effectively [22]. Koschke demonstrated how suffix trees can be used to obtain a scalable comparison, and presented a method to improve the accuracy through user feedback and automatic data mining [10]. Liu presented an improved abstract syntax tree, which can effectively detect code plagiarism carried out by modifying variable types and adding meaningless variables [14].

However, the above mentioned methods are all sequence-based, and they can be bypassed by interference techniques such as instruction rearrangement, equivalent instruction sequence replacement, branch inversion, etc. The essence of interference is that malicious code can produce homogeneous code with different syntax but the same semantics. Therefore, the other direction is dynamic analysis, which generally relies on dynamic execution logs to analyze the program behaviour. For instance, by analyzing the anomaly and similarity of process access behavior in data flow dependent networks, Mao et al. introduced an active learning method by minimizing risk estimation, which clearly improves the detection of malicious code [23]. Although dynamic methods can extract the runtime features of the code, they rely on a virtual running environment and can also be challenged by anti-virtual machine attacks [21]. Yang et al. introduced a method of defect detection based on homology detection technology for open source software [28].

So, we can use more information, for instance, the function call graph, to further detect malicious code. In this respect, Chae proposed a software plagiarism detection system using an API labeled CFG (A-CFG) that abstracts the functionalities of a program [2]. Lim presented a method to compare CFGs by matching the basic blocks of a binary program, which can effectively identify similar CFGs [12]. Wu gave a parallel method to extract the function call graph from the source code, and introduced a new software structure information comparison algorithm to effectively check the homology of the software [24].

2.2 Graph Matching Based Method

Graph matching is a classical problem in computer science. At the same time, GA also has some applications in graph matching, code similarity detection and other security areas [20]. Moon proposed a malware detection system using a hybrid GA, in which a malware is represented as a directed dependency graph and the malware detection problem is transformed into the subgraph isomorphism problem [8]. Jaeun gave a multi-objective GA with a local search heuristic, which compares the degrees of each pair of mapped vertices of two graphs and counts the number of mismatched vertices [5]. Kim proposed a new cost function based on the program dependency graph and used the GA to measure the similarity of programs; the method was shown to be feasible [7]. Xiang presented an improved GA, which mainly studied the isomorphism of subgraphs, and designed a special crossover function and a new fitness function to measure the evolution process [25].

In this paper, we combine static and dynamic methods to generate the CFG, and then use graph matching to evaluate binary software homology. The main contributions of the paper are as follows:

• We design a special GA to evaluate the similarity of binary files, which can accurately identify isomorphic subgraphs and exactly identical CFGs, a task that is normally regarded as an NP-complete problem. The experimental results show that when the number of CFG nodes is large, our method can still obtain the CFG similarity, and compared with other classical methods, the time overhead of our algorithm is clearly reduced.

• For GA based CFG matching, we give a complete framework including the mapping of CFGs to chromosomes and the operation design for crossover, mutation, selection and fitness evaluation.

3 The Proposed Method

The proposed method can be divided into two steps. The first step is CFG extraction from the binary executable files, combining static and dynamic recovery methods. In the second step, the CFGs are encoded and read into the GA to compute the similarities among different graphs. The flowchart of the method is shown in Figure 1, and the implementation of each step is described in detail below.

3.1 Binary Files' CFG Extraction

Although the static method has high code coverage, it cannot obtain the complete control flow. Although the dynamic method can obtain the exact program execution information, its code coverage is low. So, the hybrid recovery method combines the dynamic and static methods, which not only ensures high code coverage but also solves the indirect jump issue. Symbolic execution and reverse slicing techniques are then used to further traverse the binary file and obtain complete control flow information.

3.1.1 Symbolic Execution

Symbolic execution substitutes symbols for real values when running a program. The advantage is that we can traverse as many paths of a program as possible by using symbolic variables. KLEE [9] and S2E [3] are two typical symbolic execution tools, based on source code and binary code respectively. In symbolic execution, we can convert the program operation process into a mathematical expression. Whenever a conditional or jump statement is encountered, symbolic execution adds the path constraints of the current execution path to the constraint set of that path. With a constraint solver, the path reachability can be obtained by solving the constraint set. By combining actual execution with symbolic execution, an approach that first emerged in 2005 in DART [18] and CUTE [19], many problems encountered in traditional static symbolic execution have been solved.
Figure 1: The flowchart of binary code homology detection
However, there are some challenges to symbolic execution, for instance, path explosion and constraints that the solver is unable to solve. For path explosion, heuristic functions, reliable program analysis and software verification techniques are common strategies for reducing the complexity of path exploration. For constraint solving problems, there are two kinds of optimization methods. One is the elimination of uncorrelated constraints and the other is incremental solving.

3.1.2 Reverse Slicing

Program slicing is an important technique for program analysis. Negi et al. highlight different test cases and give a comparative analysis of program slicing methods with respect to the applications in which they are usually utilized in software modification activities [15]. Given the slicing criterion <p, V>, the forward slice of program p contains all statements and control conditions affected by the variables in V, while the backward slice of program p contains all statements and control conditions that have direct or indirect effects on the variables in V.

3.1.3 CFG Generation and Optimization

Currently, the common tools for generating CFGs include FXE (forced execution engine) [26] and IDA Pro [17]. FXE can solve the problem of indirect branch jumps by dynamically executing code, but it is only suitable for binary executables in Windows PE format under the x86 architecture. IDA Pro can generate the CFG and the function call graph, but the repeated nodes in the graph are not merged or deleted. Moreover, binary code lacks structural information, so IDA Pro's disassembly results are far from ideal [29].

In this paper, we use a newer CFG tool, Angr [27], which is a binary analysis platform based on Python. It implements different symbolic execution strategies, such as veritesting [1]. The Angr system was originally designed for the DARPA Challenge. It can load binary files of different formats together with their dependent libraries, and it integrates many advanced binary analysis techniques to generate CFGs by combining dynamic and static methods. Moreover, Angr can optimize the basic block overlap in the CFG: the overlapping parts are merged and subdivided to obtain more accurate control flow information. Meanwhile, since Angr analyses and removes the edges generated by loop structures and pseudo-returns, it simplifies the CFG and alleviates the explosion of the program state space, which improves the ability to reconstruct the CFG.

3.2 GA for CFGs Similarity Calculation

Given two directed graphs, we first number the nodes in the graphs [4], for example, as shown in Figure 2. We then use the GA to select the best individual after several generations of evolution with operations such as crossover and mutation, until the evolution stop criteria are satisfied.

3.2.1 Chromosome Initialization

In each iteration, the GA adds one node of the matching graph to each chromosome, and a corresponding node of the matched graph is generated by a random function, until all
the nodes of the matching graph are added to the chromosome. For the graphs in Figure 2, the initialization process is shown in Table 1. All the left nodes in a chromosome come from the relatively small matching graph (the left one of Figure 2) and the right nodes come from the matched graph (the right one of Figure 2). The following rules apply during initialization:

• When the in-degree of the left node is zero and the out-degree of the right node is zero, they do not match. Likewise, if the out-degree of the left node is zero and the in-degree of the right node is zero, they do not match. For example, in Figure 2, node 5 of the left graph does not match node 7 of the right graph, because the in-degree of node 5 is zero and the out-degree of node 7 is zero.

• The current right node number must be different from the previous right node numbers in the chromosome, and the number must not exceed the total number of nodes in the matched graph. For example, in the first chromosome of Table 1, the node corresponding to the left node 2 must not be node 5, and its number does not exceed 7.

Figure 2: Two examples of numbering nodes for directed graphs

3.2.2 Chromosomes Selection

Both the crossover and the mutation operations in the GA rely on the selection function. The selection operation in this paper is similar to roulette-wheel selection. First, a random function produces a number between 0 and 1, denoted as a. The fitness values of all chromosomes in the population are summed as S. Next, the cumulative fitness value is calculated for each chromosome, that is, the sum of the fitness values from the first chromosome to the current one k, denoted as $S_k$. Dividing $S_k$ by $S$ gives the quotient $b_k$, which is compared with a, as shown in Equation (1). The first chromosome number k for which $a \le b_k$ is returned.

\[
b_k = \frac{S_k}{S} = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{L} f_i}. \tag{1}
\]

where $f_i$ denotes the fitness value of chromosome $i$, $L = sizepop$ and $k \le sizepop$.

For example, in Table 1, according to Equation (1), the cumulative fitness value of the first chromosome is $b_1 = \frac{f_1}{f_1+f_2+f_3+f_4}$, and the cumulative fitness value of the second chromosome is $b_2 = \frac{f_1+f_2}{f_1+f_2+f_3+f_4}$. If $b_1 < a \le b_2$, the chromosome number 2 is returned, and so on.

3.2.3 Chromosomes Crossover

Before a crossover, we generate a number between 0 and 1 through a random function, denoted as rand. If rand ≤ cross, the crossover operation is performed; otherwise, nothing is done.

Two chromosomes (denoted as Parent_one and Parent_two) are extracted from the population by the selection function, and their right nodes are crossed. The beginning position of the middle part is marked as "begin" and its ending position as "end". begin and end are generated by random functions and must satisfy begin < end, and their difference must not equal the length of the whole chromosome. The middle part of Parent_one becomes the middle part of child_one after crossing; then the nodes of Parent_two are filled into child_one in turn, with conflict detection. If there is a conflict, the traversal moves on to the next node in turn. After the traversal, if the nodes of the child_one chromosome are not filled up, the random function is used to generate the missing nodes, again followed by conflict detection. According to the above crossover rules, two child chromosomes are obtained after the crossover.

For example, take the right nodes of the first and second chromosomes in Table 1 for a crossover and assume that begin = 3, end = 5. The specific crossover steps with Parent_one as the basis are shown in Figure 3. Thus, the chromosome produced by the crossover is 1 → 6, 2 → 5, 3 → 4, 4 → 2, 5 → 1, 6 → 3.

The two chromosome populations before and after the crossover are combined and then sorted by fitness value in ascending order. If the population size is set to sizepop, the top sizepop chromosomes are selected to form the population after crossover.
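The roulette-wheel selection of Equation (1) can be sketched as follows. The function name `roulette_select` and the optional argument `a` (which lets a caller fix the random number, e.g. for testing) are assumptions made for illustration.

```python
import random
from itertools import accumulate


def roulette_select(fitness, a=None):
    """Return the 1-based number k of the first chromosome with a <= b_k = S_k / S."""
    if a is None:
        a = random.random()          # random number between 0 and 1
    total = sum(fitness)             # S: sum of all fitness values
    for k, s_k in enumerate(accumulate(fitness), start=1):
        if a <= s_k / total:         # b_k of Equation (1)
            return k
    return len(fitness)              # guard against rounding at the upper end


if __name__ == "__main__":
    # Hypothetical fitness values f1..f4 of four chromosomes.
    print(roulette_select([1, 2, 3, 4], a=0.25))
```

With the hypothetical fitness values 1, 2, 3, 4 the cumulative ratios are 0.1, 0.3, 0.6 and 1.0, so a = 0.25 selects chromosome 2.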
Table 1: An example of chromosome initialization

| Iterations # | First chromosome | Second chromosome | Third chromosome | Fourth chromosome |
| 1 | 1 → 5 | 1 → 6 | 1 → 1 | 1 → 2 |
| 2 | 1 → 5, 2 → 6 | 1 → 6, 2 → 1 | 1 → 1, 2 → 4 | 1 → 2, 2 → 3 |
| 3 | 1 → 5, 2 → 6, 3 → 4 | 1 → 6, 2 → 1, 3 → 4 | 1 → 1, 2 → 4, 3 → 2 | 1 → 2, 2 → 3, 3 → 6 |
| 4 | 1 → 5, 2 → 6, 3 → 4, 4 → 2 | 1 → 6, 2 → 1, 3 → 4, 4 → 2 | 1 → 1, 2 → 4, 3 → 2, 4 → 3 | 1 → 2, 2 → 3, 3 → 6, 4 → 4 |
| 5 | 1 → 5, 2 → 6, 3 → 4, 4 → 2, 5 → 1 | 1 → 6, 2 → 1, 3 → 4, 4 → 2, 5 → 5 | 1 → 1, 2 → 4, 3 → 2, 4 → 3, 5 → 6 | 1 → 2, 2 → 3, 3 → 6, 4 → 4, 5 → 1 |
| 6 | 1 → 5, 2 → 6, 3 → 4, 4 → 2, 5 → 1, 6 → 3 | 1 → 6, 2 → 1, 3 → 4, 4 → 2, 5 → 5, 6 → 3 | 1 → 1, 2 → 4, 3 → 2, 4 → 3, 5 → 6, 6 → 5 | 1 → 2, 2 → 3, 3 → 6, 4 → 4, 5 → 1, 6 → 5 |
| Fitness function value | f1 | f2 | f3 | f4 |
Figure 3: An example of crossover step
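The crossover of Section 3.2.3 can be sketched as follows. This is one possible reading of the rules, chosen because it reproduces the worked example: the middle segment is copied from the first parent, the remaining positions are filled with the second parent's right nodes in order while skipping nodes already present, and any positions still empty are filled with random unused nodes. The function name `crossover` and the parameter `num_right_nodes` (the number of nodes in the matched graph) are hypothetical.

```python
import random


def crossover(parent_one, parent_two, begin, end, num_right_nodes, rng=random):
    """Build one child from the right-node lists of two parents.

    begin and end are 1-based, inclusive positions of the middle segment.
    """
    length = len(parent_one)
    child = [None] * length
    child[begin - 1:end] = parent_one[begin - 1:end]   # keep the middle part of parent one
    used = set(child[begin - 1:end])

    donors = iter(parent_two)                          # nodes of parent two, taken in turn
    for pos in range(length):
        if child[pos] is not None:
            continue
        for node in donors:
            if node not in used:                       # conflict detection
                child[pos] = node
                used.add(node)
                break
        else:
            # Parent two is exhausted: generate a missing node at random (no conflict).
            free = [v for v in range(1, num_right_nodes + 1) if v not in used]
            child[pos] = rng.choice(free)
            used.add(child[pos])
    return child


if __name__ == "__main__":
    first = [5, 6, 4, 2, 1, 3]    # right nodes of the first chromosome in Table 1
    second = [6, 1, 4, 2, 5, 3]   # right nodes of the second chromosome in Table 1
    print(crossover(first, second, begin=3, end=5, num_right_nodes=7))
```

For the first and second chromosomes of Table 1 with begin = 3 and end = 5, the sketch returns the right nodes 6, 5, 4, 2, 1, 3, which is the child given in the text. The second child is obtained by swapping the roles of the two parents.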
3.2.4 Chromosome Mutation

After the crossover operation, each chromosome in the population is selected in turn. We generate a number between 0 and 1 through a random function, denoted as rand. If rand ≤ mutation, the mutation operation is performed; otherwise, there is no mutation. When not all the nodes of the second graph have been added to the chromosome, we use different random functions to generate a mutation position (position) and a mutation value (num) in the chromosome, and then perform conflict detection. If there is no conflict, the right node at that position is replaced by the mutation value. Otherwise, a new mutation is generated. If all the nodes of the second graph have been added to the chromosome, random functions are used to generate two different positions in the chromosome (i and j), and the right nodes at the two positions are exchanged to obtain the mutated chromosome.

For example, take the third chromosome in Table 1, where not all nodes of the second graph have been added to the chromosome at the sixth iteration, and suppose position = 3, num = 4. The mutation value generated at this point conflicts with an existing right node (2 → 4), so suppose we regenerate the mutation value num = 7, for which there is no conflict. The chromosome mutation is then as shown in the left part of Figure 4. Therefore, after the mutation, the third chromosome is 1 → 1, 2 → 4, 3 → 7, 4 → 3, 5 → 6, 6 → 5.

If there are 6 nodes in the second graph and all of them have been added to the chromosome, then assuming i = 2, j = 5, the chromosome mutation is as shown in the right part of Figure 4. Thus, after the mutation, the third chromosome is 1 → 1, 2 → 6, 3 → 2, 4 → 3, 5 → 4, 6 → 5.

3.2.5 Fitness Calculation for Each Chromosome

In this paper, the fitness function F is divided into two parts, F1 and F2, with F = F1 + F2. F1 is composed of the number of mismatched nodes f1 and the number of mismatched edges f2, that is, F1 = f1 + f2, where f1 and f2 are both counted with respect to the smaller graph. For example, in Table 1, for the 4th chromosome after 3 iterations, the result is 1 → 2, 2 → 3, 3 → 6, and at this time f1 = 3, f2 = 2.

However, relying on F1 alone is not sufficient, because F1 is zero both when the two graphs are exactly the same and when one graph is an isomorphic subgraph of the other. Therefore, we add the F2 part, as shown in Equation (2). If the numbers of nodes in the two CFGs are equal, then F2 is zero; otherwise it is not zero.

\[
F2 = 1 - \frac{node\_one}{node\_two}. \tag{2}
\]

where node_one represents the number of nodes of the smaller graph, and node_two that of the larger graph.

Therefore, the fitness function of the proposed method is as shown in Equation (3).

\[
F = F1 + F2 = f1 + f2 + 1 - \frac{node\_one}{node\_two}. \tag{3}
\]

With this fitness function, if F1 = 0 and F2 = 0, the nodes and edges of the two CFGs are exactly the same. If F1 = 0 and F2 ≠ 0, the two CFGs are in an isomorphic subgraph relationship.

3.2.6 Stop Criteria of Population Evolution

When any one of the following three rules is met, the GA stops iterating and returns the chromosome with the smallest fitness value.

• The number of evolutions reaches the predefined number;

• During evolution, the F1 value of a chromosome equals 0;

• After several iterations, comparing the population before and after each evolution, there is almost no change in the fitness value.
4 Experiments 4.1 Experimental Data and Environment Kernel32.dll is a very important dynamic link library file in Windows, which belongs to kernel level file. It con- Figure 4: An example of mutation process trols the system memory management, the data input and output operation and interrupt processing. When the Windows start, the kernel32.dll resides in a specific write-protection area in memory, preventing other pro- According to the rule of mutation, we perform the op- grams from occupying the memory area [16]. We select eration of chromosome variation, and we compare the fit- loadimagefile, loadappinitdlls and basepappinitdlls in ness value of chromosomes before and after mutation. If kernel32.dll and write.exe as experimental data. In de- the fitness value of the chromosome is not reduced after tail, we extract the experimental data files from Win- mutation, the mutation operation is canceled. dows 7 and Windows 10, altogether 8 files. write.exe, International Journal of Network Security, Vol.22, No.5, PP.782-792, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).08) 788 loadimagefile, loadappinitdlls, basepappinitdlls of Win- 4.3 Experimental Results and Analysis dows 10 system. For Windows 7 system, named them as write 7.exe, loadimagefile 7, loadappinitdlls 7 and 4.3.1 Comparison of CFG Generated by Angr basepappinitdlls 7, and numbered them as 1,... 8, for and IDA subsequent description. All the experiments is under the In order to verify the integrity and simplification of the Windows 10 system, and the related software include the CFG generated by Angr, we compare and analyze the IDA, Angr, and VS2017. In addition, for GA parame- CFG generated by Angr and IDA. For example, the CFG ters, we set the number of population (sizepop) 20, the of the CADET 0001 file given by the DARPA Challenge, number of evolution (maxgen) 2000, the crossover rate Figure 5 is a CFG generated using Angr, and Figure 6 by (cross) 0.3, and the mutation rate (mutation) 0.8. IDA. 
The number of basic blocks in the CFG generated by Angr is more than that of IDA, and the basic block of 4.2 Comparative Experimental Methods CFG generated by Angr also contains the basic block of CFG generated by IDA. After merging overlapped nodes Among all graph matching algorithms, GED is the clas- and eliminating cyclic and unreachable edges and nodes, sical one with good fault tolerance and be suitable for Angr can obtain more accurate CFG, which can provide various types of graphs. Although there are other graph a more concise CFG for subsequent analysis. similarity computation methods, especially the lately hot graph neural network (GNN), it needs not only massive data to train the network but also rich computational re- sources. Therefore, we select a series of GED algorithms as comparative ones, which can be more targeted when compared with the experimental results of the proposed method.
4.2.1 GED The GA proposed in this paper is mainly used to calcu- late the similarity between two directed graphs, which is consistent with the classical GED method. Therefore, our comparative experiment selects the edit distance method and its two variant methods. GED is the sum of the min- imum editing operation costs required to edit a source graph into a target graph. The cost of each step is ob- tained by defining the corresponding cost function [30]. In this paper, the cost of replacing, deleting and inserting the nodes and edges of the directed graph in the edit- distance method is all set to 1. Figure 5: CFG based blocks generated by Angr 4.2.2 Edit Distance Based on Binary Linear Pro- gramming Formula In 2015, Julien Lerouge proposed a new binary linear programming formulations for computing the exact GED between two graphs (GED linear) [11], the advantage of which is the universality of the formula, the similarity be- tween digraphs and undirected graphs can be calculated. But when the number of nodes is too large, the method does not work, and the time cost of this method is larger than that of other methods. Figure 6: CFG based block generated by IDA
4.2.3 Edit Distance Based on Hausdorff Match- ing 4.3.2 Comparison of GA with Other Experimen- In 2015, Andreas Fischer proposed a quadratic time tal Methods approximation of GED based on Hausdorff matching (GED distance) [6]. The advantage of the method is that First, we get the CFGs of the control group files, and the the time performance is greatly improved compared with nodes and edges of the CFGs are shown in Table 2. Then other methods. However, the disadvantage is that some take one of them and compare it with the CFGs of the accuracy is lost, that is, a small loss of precision. control group. The experimental results are demonstrated International Journal of Network Security, Vol.22, No.5, PP.782-792, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).08) 789 by similarity trend diagram, in which the solid line refers to the left Y axis and the dashed line refers to the right Y axis. In this paper, the smaller the result, the greater the similarity between the two graphs.
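Since every distance value reported below is a fitness value, it may help to see Equation (3) as code. The following is a minimal sketch, not the authors' implementation: the function name `fitness`, the dictionary representation of a chromosome, and the exact counting rules for f1 (nodes of the smaller graph without a partner) and f2 (edges of the smaller graph whose image is missing in the larger graph) are assumptions made for illustration.

```python
def fitness(mapping, small_nodes, small_edges, large_edges, n_large):
    """Sketch of Equation (3): F = f1 + f2 + (1 - node_one / node_two).

    mapping      -- dict: node of the smaller graph -> node of the larger graph
    small_nodes  -- list of nodes of the smaller graph
    small_edges  -- set of directed edges (u, v) of the smaller graph
    large_edges  -- set of directed edges (u, v) of the larger graph
    n_large      -- number of nodes of the larger graph (node_two)
    """
    # f1 (assumed definition): nodes of the smaller graph with no partner yet.
    f1 = sum(1 for u in small_nodes if u not in mapping)

    # f2 (assumed definition): edges of the smaller graph whose mapped image
    # is not an edge of the larger graph, or whose endpoints are unmapped.
    f2 = 0
    for u, v in small_edges:
        if u not in mapping or v not in mapping:
            f2 += 1
        elif (mapping[u], mapping[v]) not in large_edges:
            f2 += 1

    # F2 from Equation (2): zero only when both graphs have the same node count.
    f_size = 1 - len(small_nodes) / n_large
    return f1 + f2 + f_size


# Identical graphs: F1 = 0 and F2 = 0.
cycle = {(1, 2), (2, 3), (3, 1)}
identical = fitness({1: 1, 2: 2, 3: 3}, [1, 2, 3], cycle, cycle, 3)

# Smaller graph embedded in a 4-node graph: F1 = 0 but F2 = 1 - 3/4.
subgraph = fitness({1: 1, 2: 2, 3: 3}, [1, 2, 3], cycle, cycle | {(3, 4)}, 4)
```

Under these assumptions a value of 0 signals identical CFGs, while a value strictly between 0 and 1 signals that the smaller CFG is an isomorphic subgraph of the larger one.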
(2.1) Similarity between loadimagefile and contrast files.

The CFG of loadimagefile consists of 8 nodes and 12 edges, and the similarity trends are shown in Figure 7. The results of GA and the three comparative methods show that the descending order of similarity with loadimagefile is 1 = 2 > 4 > 3 > 5 > 6 > 8 > 7. However, for GED_linear and GED_distance, the edit distance between loadimagefile and itself is not zero, while GED and GA find no difference between loadimagefile and itself, that is, the two CFGs are exactly the same. It further shows that loadimagefile of kernel32.dll in Windows 10 is not modified and is the same as in Windows 7.

Figure 7: Similarity between loadimagefile and contrast files

The time performance of the four methods is shown in Figure 8. Figure 8 shows that the runtime of GA is relatively stable, whereas the other three methods need more time as the number of nodes increases. When computing the similarity between loadimagefile and file #8, GA needs the least running time, and both GED and GED_linear take a long time. On average, the running time of GED_linear is 9 times that of GA, GED 3 times, and GED_distance 0.6 times. But compared with GED_distance, GA is more accurate and reasonable for identifying identical CFGs.

Figure 8: Time performances of four different methods for computing loadimagefile

(2.2) Analysis of similarity between loadappinitdlls and contrast files.

The CFG of loadappinitdlls contains 17 nodes and 29 edges. As mentioned in Section 4.2, when the number of nodes is large, GED_linear cannot calculate the similarity (distance value) of two graphs. Such cases are marked NA in Table 3. Therefore, we mainly consider the other 3 methods, and the similarity results are shown in Figure 9.

Figure 9: Similarity comparison between loadappinitdlls and contrast files

Figure 9 shows that, for loadappinitdlls, the similarity order calculated by GED_distance is 4 > 1 = 2 > 3 > 5 > 6 > 8 > 7, and the distance between loadappinitdlls and itself is not 0. When using GED, the similarity order is 3 > 4 > 5 > 6 > 1 = 2 > 8 > 7, and loadappinitdlls is exactly the same as itself. For the GA method, the result is consistent with the GED method. Therefore, the results obtained by GED_distance and GED are inconsistent, while those of GA are consistent with GED. When the two programs are exactly the same, the distance value should be 0, for instance, loadappinitdlls and file #3. Therefore, the results
Table 2: Nodes and edges of experimental files CFG

filenum (#)     1    2    3    4    5    6    7    8
Node number     8    8   17   10   24   25   48   45
Edge number    12   12   29   16   36   39   68   64
Table 3: Distance calculation by the GED_linear method

filenum (#)   filename              distance value
1             loadimagefile         46
2             Loadimagefile_7       46
3             loadappinitdlls       NA
4             Loadappinitdlls_7     50
5             basepappinitdlls      NA
6             Basepappinitdlls_7    NA
7             Write.exe             NA
8             Write_7.exe           NA
obtained by GA and GED are more reasonable. Further, the experiment shows that loadappinitdlls of kernel32.dll in Windows 10 is slightly modified from Windows 7.

When calculating the similarity between loadappinitdlls and the contrast files, the time performance of the three methods is shown in Figure 10. It can be seen that the running time of GA is less than that of both GED and GED_linear, and when the number of nodes is too large, GED_linear cannot obtain a result in a reasonable time. When calculating the similarity between loadappinitdlls and file #3, the running time of GA is significantly longer than that of GED_distance. The main reason is that file #3 is the loadappinitdlls file itself, and GA evolves to a perfect match. When a chromosome fitness value is 0, the evolution stops. But for GED_distance, it is not 0. On average, the running time of GED is 3 times that of GA, and that of GED_distance 0.6 times.

Figure 10: Time performance of 4 different methods for computing loadappinitdlls

(2.3) Similarity comparison between loadappinitdlls_7 and contrast files.

The CFG of loadappinitdlls_7 contains 10 nodes and 16 edges. The similarity diagram is shown in Figure 11. According to Figure 11, the results calculated by the two variants of edit distance show that the order of similarity with loadappinitdlls_7 is 1 = 2 > 4 > 3 > 5 > 6 > 8 > 7, and loadappinitdlls_7 is not exactly the same as itself. For the GED and GA methods, the order of similarity with loadappinitdlls_7 is 4 > 1 = 2 > 3 > 6 > 5 > 8 > 7. So, for GA and GED, there is no difference between loadappinitdlls_7 and itself, and the two CFGs are identical. loadappinitdlls_7 should be most similar to itself, so the most similar file number should be 4. The results of the four methods show that the file with the least similarity to loadappinitdlls_7 is file #7. Therefore, the method proposed in this paper is reasonable to some extent.

Figure 11: Similarity comparison between loadappinitdlls_7 and contrast files

For loadappinitdlls_7, the time performance of the four methods is shown in Figure 12. It can be seen that the time cost of the GED and GED_linear methods gradually increases with the number of nodes, but the trend of GA is almost the same as that of GED_distance. On average, the running time of GED_linear is 7 times that of GA, GED is 3 times, and GED_distance is 0.5 times.

Figure 12: Time performance of 4 different methods for computing loadappinitdlls_7

5 Conclusions and Future Work

In this paper, we proposed a genetic algorithm method to calculate the similarity between two binary files. First, the binary file is converted into a CFG, and then the similarity between CFGs is calculated by GA. For similarity calculation, the proposed method is consistent with the GED method and can be used to accurately identify the isomorphic subgraph relationship and exactly identical CFGs, which is the basis of software homology detection. GED is a very classical graph similarity calculation algorithm. Compared with other methods, GED has the advantages of high accuracy and simple operation. The distance can be calculated simply by inputting a pair of graphs. The similarity trend of GA in experiment (2.1) is almost the same as that of the three comparison algorithms, and the results of GA in experiments (2.2) and (2.3) agree only with those of the GED algorithm. Since the difference between the target file and itself should be equal to 0, we can infer that the results of GA and GED are more reasonable. Moreover, the proposed method is highly efficient, especially for graphs with massive nodes. Also, according to the fitness function designed in this paper, GA can effectively identify whether the two graphs are isomorphic subgraphs or exactly the same. Regarding time performance, on average, the running time of GA is 0.3 times that of GED, 0.1 times that of GED_linear and 1.7 times that of GED_distance.

As for future work, several directions deserve attention. First of all, CFG is the analysis basis, and how to enrich the CFG is a direction worth further study. After that, GA based software homology detection should be reinforced in more real and wider application areas. Meanwhile, some hyper-parameters of GA should be optimized automatically and adaptively according to different applications.

Acknowledgment

This work is partially sponsored by National Key Research and Development Program of China (2018YFB0704400, 2016YFB0700504, 2017YFB0701601), Research and Development Program in Key Areas of Guangdong Province (2018B010113001). The authors gratefully appreciate the anonymous reviewers for their valuable comments.

References

[1] T. Avgerinos, A. Rebert, K. C. Sang, and D. Brumley, “Enhancing symbolic execution with veritesting,” in International Conference on Software Engineering, pp. 1083–1094, 2014.
[2] D. K. Chae, J. Ha, S. W. Kim, B. J. Kang, and E. G. Im, “Software plagiarism detection: A graph-based approach,” in ACM International Conference on Conference on Information Knowledge Management, pp. 1577–1580, 2013.
[3] V. Chipounov, V. Georgescu, C. Zamfir, and G. Candea, “Selective symbolic execution,” in The Workshop on Hot Topics in System Dependability, pp. 1286–1299, 2009.
[4] H. G. Choi, J. H. Kim, and B. R. Moon, “A hybrid incremental genetic algorithm for subgraph isomorphism problem,” in Conference on Genetic and Evolutionary Computation, pp. 445–452, 2014.
[5] J. Choi, Y. Yoon, and B. R. Moon, “An efficient genetic algorithm for subgraph isomorphism,” in Conference on Genetic and Evolutionary Computation, pp. 361–368, 2012.
[6] A. Fischer, C. Y. Suen, V. Frinken, K. Riesen, and H. Bunke, “Approximation of graph edit distance based on hausdorff matching,” Pattern Recognition, vol. 48, no. 2, pp. 331–343, 2015.
[7] J. Kim, H. G. Choi, H. Yun, and B. R. Moon, “Measuring source code similarity by finding similar subgraph with an incremental genetic algorithm,” in Genetic and Evolutionary Computation Conference, pp. 925–932, 2016.
[8] K. Kim and B. R. Moon, “Malware detection based on dependency graph using hybrid genetic algorithm,” in Conference on Genetic and Evolutionary Computation, pp. 211–1218, 2010.
[9] V. Kononov and V. A. Rusakov, “On the problems of developing klee based symbolic interpreter of binary files,” Procedia Computer Science, vol. 145, pp. 275–281, 2018.
[10] R. Koschke, “Large-scale inter-system clone detection using suffix trees,” in European Conference on Software Maintenance and Reengineering, pp. 309–318, 2012.
[11] J. Lerouge, Z. Abu-Aisheh, R. Raveaux, P. Héroux, and S. Adam, “Graph edit distance: A new binary linear programming formulation,” Computer Science, vol. 72, pp. 254–265, 2015.
[12] H. I. Lim, “Comparing control flow graphs of binary programs through match propagation,” in IEEE Computer Software and Applications Conference, pp. 598–599, 2014.
[13] Mudpo, “Analysis of computer virus epidemic situation in March 2018,” Information Network Security, no. 5, 2018.
[14] L. Nan, H. Lifang, and X. Kunfeng, “An improved software source code matching algorithm based on abstract syntax tree,” Information Network Security, no. 1, pp. 38–42, 2014.
[15] G. Negi, E. Elias, R. Kohli, and V. Bibhu, “Reliability analysis of test cases for program slicing,” in The 1st International Conference on Innovation and Challenges in Cyber Security (ICICCS’16), pp. 36–40, 2016.
[16] H. Qin, Study on Software Homology Detection Technology Based on AST Structure Optimization and CFG Comparison (in Chinese), Master Thesis, Beijing University of Posts and Telecommunications, 2011.
[17] Q. W. Qin, “Reverse analysis of software based on ida-Pro,” Computer Engineering, vol. 34, no. 22, pp. 86–88, 2008.
[18] K. Sen, P. Godefroid, N. Klarlund, “DART: Directed automated random testing,” ACM Sigplan Notices, vol. 40, no. 6, pp. 213–223, 2005.
[19] K. Sen, D. Marinov, and G. Agha, “Cute: A concolic unit testing engine for C,” ACM Sigsoft Software Engineering Notes, vol. 30, no. 5, pp. 263–272, 2005.
[20] M. H. R. A. Shaikhly, H. M. El-Bakry, and A. A. Saleh, “Cloud security using markov chain and genetic algorithm,” International Journal of Electronics and Information Engineering, vol. 8, no. 2, pp. 96–106, 2018.
[21] H. Shi, J. Mirkovic, and A. Alwabel, “Handling anti-virtual machine techniques in malicious software,” ACM Transaction on Privacy and Security, vol. 21, no. 1, 2018.
[22] G. Tao, G. Dong, Q. Hu, and B. Cui, “Improved plagiarism detection algorithm based on abstract syntax tree,” in International Conference on Emerging Intelligent Data Web Technologies, pp. 714–719, 2013.
[23] M. Weixuan, C. Zhongmin, and T. Li, “A malicious code detection method based on active learning,” Software Journal, vol. 28, no. 2, pp. 384–397, 2017.
[24] P. Wu, J. Wang, and B. Tian, “Software homology detection with software motifs based on function-call graph,” IEEE Access, no. 99, pp. 1–1, 2018.
[25] Y. Xiang, J. Han, H. Xu, and X. Guo, “An improved heuristic method for subgraph isomorphism problem,” IOP Conference Series: Materials Science and Engineering, vol. 231, no. 1, pp. 012050, 2017.
[26] L. Xu, F. Sun, and Z. Su, “Constructing precise control flow graphs from binaries,” University of California, 2012. (https://pdfs.semanticscholar.org/8a80/f0d173ec7420478e4b96a8264e21e0dafac0.pdf)
[27] S. Yan, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, and C. Kruegel, “Sok: (State of) the art of war: Offensive techniques in binary analysis,” in Security and Privacy, pp. 138–157, 2016.
[28] J. Yang, X. Song, Y. Xiong, and Y. Meng, “An open source software defect detection technique based on homology detection and pre-identification vulnerabilitys,” Advances in Intelligent Systems and Computing, vol. 773, pp. 932–940, 2019.
[29] Y. Zhibin, J. Xin, and S. Dawei, “A binary-oriented mixed recovery method for control flow graphs,” Computer Application Research, no. 7, 2018.
[30] X. Zhoubo, Z. Li, Ninglihua, and A. Tianlong, “Graphic editing distance summary,” Computer Science, vol. 45, no. 4, pp. 11–18, 2018.

Jinyue Bian is a master degree student in the School of Computer Science, Shanghai University. His research interests include big data analysis, machine learning, and computer and network security, especially host behaviour analysis.

Quan Qian is a full Professor at Shanghai University, China. His main research interests concern computer networks and network security, especially cloud computing, big data analysis and wide scale distributed network environments. He received his computer science Ph.D. degree from University of Science and Technology of China (USTC) in 2003 and conducted postdoc research in USTC from 2003 to 2005. After that, he joined Shanghai University, and he is now the dean of the department of machine intelligence and technology and the lab director of materials informatics and data science.

International Journal of Network Security, Vol.22, No.5, PP.793-800, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).09) 793
A New Diffusion and Substitution-based Cryptosystem for Securing Medical Image Applications
L. Mancy and S. Maria Celestin Vigila (Corresponding author: S. Maria Celestin Vigila)
Department of Information Technology, Noorul Islam University Kumaracoil, Tamilnadu, India (Email: [email protected]) (Received Mar. 21, 2017; Revised and Accepted June 26, 2018; First Online July 11, 2019)
Abstract

In recent periods, individual security is becoming increasingly endangered. Several methods of safeguarding a person's private, economic or health data are developed by entities, fabrications, and managements. Patient data are generally produced in the place of the health images for observing, and are thus effortlessly available to everyone. Patient data shown openly may be intercepted by other parties during automated broadcast. For analysis purposes, this health information needs to be readily available to the physicians. A unique feature of this scheme is that it builds on the Digital Imaging and Communications in Medicine (DICOM) standard, which specifies the form to which all digital health images should conform. Such an image carries several private data, such as the patient's name, date of birth, gender, and patient identity, with details about where the image was taken. So it is essential from the patient's point of view to keep the above information private. Therefore the security of medical images can be achieved through confidentiality, availability, reliability and authentication.

Keywords: Diffusion; Encryption; Histogram; Steganography; Substitution

1 Introduction

Cryptography has become common as a means to afford strong defense against various attacks. It deals with the development of systems for converting data between intelligible and unintelligible forms. In secure communications the cryptographic methods are controlled by one or more keys. When the keys are the same, the methods are known as private key cryptography. Conversely, when the keys are different, the cryptographic methods are known as public key cryptography. There are two vital properties which all encryption schemes must fulfill. The first is the confusion property, which requires that cipher texts should have a random appearance. The second is the diffusion property, which requires that similar keys should yield entirely different cipher texts for the same plaintext.

Digital images like magnetic resonance images, X-ray images, etc. are generally used in medical applications. They deal with patient records that are private and should only be available to authorized people. Though certain patients are unconcerned, a breach of privacy could cause serious embarrassment and disgrace. Therefore, there is a necessity to defend and sustain the privacy of patient information. In this paper, a secure image encryption for DICOM images, based on the substitution and diffusion transformations, is suggested. A secret key of 128-bit size is generated from the image histogram. Initially, the visual features of the DICOM image are broken up by the mixing process. The resulting image is divided into key dependent blocks, and these blocks are then passed through diffusion and substitution processes. In total, five rounds are used in the encryption method. Finally the generated secret key is embedded within the encrypted image by the process of steganography. At the receiver side the secret key is recovered from the embedded image and the decryption operation is performed in inverse order. Here steganography is the art of hiding data by means that avoid the recognition of secret communications. It contains a huge number of approaches to conceal a communication from being noticed. The main aim of steganography is to hide the presence of any secret message.

The above introductory section explains the medical image security and provides some security measures to overcome the problem. The second section presents a review of literature and relevant research associated with the problem addressed in this study. Then the third section explains the methodology and the steps of the proposed cryptographic algorithm. Finally the fourth section explains the analysis of the data and the presentation of the results. Then the final section explains the conclusion of medical image security applications.

2 Related Works

In the survey of the literature, several researchers have addressed the safety of medical images in their own way. Certain relevant mechanisms are highlighted in this section.

Zhou et al. introduced a technique for the certainty and reliability of digital mammography to meet the necessities of authenticity and integrity of images [36]. Zhang et al. (2007) established an image scrambling technique based on queue transformation. Zhang et al. (2007) implemented secured digital communication protocols under various conditions. Zhicheng et al. (2008) suggested a novel lossless data embedding technique. Michal Vossberg et al. (2008) acclimate the procedure to the globes grid safekeeping administration. Gouenou Coatrieux et al. (2009) proposed to make the image more serviceable by watermarking it with a precis of the data. Yong Feng et al. (2009) presented an invertible map, called Line map, for an image crypto system.

Luiz Octavio Massato Kobayashi et al. (2009) presented a method using cryptography to improve trust in medical images. Weihai Li et al. (2009) and Peizhenwang et al. (2010) anticipated a better image encryption technique based on hyper chaotic categorization. Vinod Patidar et al. (2010) recommended a diffusion-substitution system, based on a chaotic map, for image encryption. Sanfu Wangl et al. (2010) offered an image scrambling technique. Yi Wan et al. (2010) projected a histogram based robust estimator for the noise mixing probability. Guiliang Zhul et al. (2010) recommended three levels of multilayer scrambling. Stallings (2010) has documented security issues in his book. Sathish kumar et al. (2011) suggested a new algorithm for the image cryptographic system.

Thomas Neuberger et al. (2011) safeguard the medical records from prohibited access, in which the patient, as owner of the material, decides who the certified persons are. Mustafa Ultras et al. (2011) present a secret sharing system, which distributes the health images among a health team of ‘n’ doctors. Maria Celestin Vigila and Muneeswaran (2012) projected an Elliptic curve based key generation for stream cipher. Dalel Bouslimi et al. (2012) advocate a joint encryption watermarking method that combines Quantization index modulation and an encryption algorithm. Abir Awad et al. (2012) presented a novel chaotic replacement method based on the complementary rule.

Musheer Ahmad and Tanvir Ahmed (2012) recommended a framework to provide visual protection of medical images that withstands statistical attacks. Li Chin Huangc et al. (2013) put forward a histogram shifting method for high bit depth health images. Tsang IngRen et al. (2013) proposed a motion compensation method to be applied in both the encryption and decryption processes. Maria Celestin Vigila and Muneeswaran (2013) implemented the Elliptic curve cryptosystem for a text message. Chong Fu et al. (2013) offer a chaos medical image encryption pattern through a bit-level shuffling process. Viswanathan et al. (2014) advocated giving access to the results of medical images with secrecy, convenience and veracity. Abhilasha Sharma et al. (2015) direct the watermarking method based on DWT to embed multiple watermarks into the cover image. Jani Anbarasi et al. (2015) anticipated a multi secret image sharing scheme that shares the multiple disruption polynomial.

Mancy and Maria Celestin Vigila (2015) reviewed image encryption techniques and analyzed their functionalities. Maria Celestin Vigila and Muneeswaran (2015) proposed the implementation of reversible information hiding in spatial domain images rooted in neighbor mean image interruption without impairing the image quality. ZeinabFawaz et al. (2016) proposed a novel image encryption scheme based on two rounds of substitution–diffusion. Ritu Agrawal et al. (2016) implemented a lossless medical image watermarking method using modulation variation, which offers high healthiness and low entropy distance for safeguard. Akram Belazi et al. (2016) proposed a novel image encryption approach based on a permutation substitution network and chaotic systems. BalaKrishnan Ramalingam et al. (2017) present a permutation and diffusion based hybrid image crypto system in the transform domain using combined chaotic maps and the Haar Integer Wavelet Transform. Weijia Cao et al. (2017) present a medical image encryption algorithm using edge maps derived from a source image. Shahryar Toughi et al. (2017) utilize the Elliptic curve generator to generate a sequence of arbitrary numbers based on curves.

3 The Proposed Methodology

Medical image safety has become a significant problem during the storage and transmission of data. So, to protect the transmitted information against unwanted disclosure, the secret key used in the encryption process is generated from the image itself. In the mixing process, each pixel is combined with its previous pixel and a session key. The size of the block is decided by the session key, and the block may be of any one of ten different sizes. Then in the diffusion process, the pixels of each block are reorganized within the block by a spiral path pattern. In the substitution process, the pixels of each block are changed using one of their nearby pixels. Finally the generated secret key is embedded into the cipher image using the process of steganography.

To secure the medical DICOM image, the secret key of 128 bit size is generated from the image histogram. Then the medical image is encoded by using the diffusion and substitution procedures. In total, five rounds are used in the encryption method. At last the secret key is fixed within the encrypted image by the method of steganography. At the receiver side the key was recovered from the embedded
Figure 1: Block diagram of the proposed methodology
image and then decryption operation is performed in the inverse setup. Fig 1 shows the block diagram of the proposed approach.

The steps of the proposed cryptographic algorithm are explained as follows.

1) Key generation:
The private key used in the encryption process is generated from the image itself.

Step 1. At first the histogram of the image is calculated by means of histogram counts.
Step 2. Then histogram counts of 255 values are obtained by dividing the ith count by the (i+1)th count.
Step 3. The obtained result is rounded to the nearest integer and a modulo 10 operation is performed on the values.
Step 4. The resulting values will be in the range of [0-9].
Step 5. From the 255 values, the first 128 values are extracted and formed as a secret key with a size of 128 bits.
Step 6. Then this key is divided into blocks of 8 bits to form the session keys k = k1 k2 ··· k32 (in hexadecimal) given as K = K1 K2 ··· K16 (in ASCII).

Here, the ki's, referred to as sub-keys, are hexadecimal digits (0-9 and A-F) and Ki represents the session keys.

2) Mixing Process:
In the mixing process every pixel of the image is combined with its earlier pixel and the session key by an XOR operation. The algorithm for the mixing process is given in Algorithm 1. In this algorithm, ‘H’ and ‘W’ represent the height and width of the image respectively, and $P_{x,0} = 0$ when $x = 1$ and $P_{x,0} = P_{x-1,W}$ when $x > 1$.

Algorithm 1 The mixing process
i ← position of first session key, i.e. 1
for x = 1 : H do
    for y = 1 : W do
        P_{x,y} ← (P_{x,y} ⊕ P_{x,y−1} ⊕ K_i)
        i ← position of next session key, i.e. ((i mod 16) + 1)
    end for
end for

3) Block Size Decision:
The resulting image from the mixing process is divided into non-overlapping blocks B1, B2, ..., BN. Here the size of the block is decided by the session key Ki. The block may be of any one of ten different sizes. In total, five rounds are used to complete the encryption process. In each round a unique session key Ki is used. Table 1 shows the block size decision criteria.

Table 1: Block size decision table

Ki mod 10          0    1    2    3    4    5    6    7    8    9
Block Size (Bi)   16   24   32   40   48   56   64   72   80   96

4) Diffusion Process:
Here the scrambling of the image is based on a spiral scanning pattern and is done in a key dependent manner. The pixels of each block are rearranged within the same block along a spiral path of an 8 × 8 matrix. The location of the starting pixel for navigating in a block is made exclusively key dependent. For this purpose pairs of neighboring sub-keys are formed. The key pairs are given as (k1, k2), (k3, k4), (k5, k6), (k7, k8) and (k1, k2). When all sub-key pairs are used up, the process starts again from the first sub-key pair (k1, k2). At the end of the diffusion process, not only are all pixels altered, but their neighboring pixels are also rearranged broadly within the image block. The 8 × 8 matrix diagram is given in Figure 2.

Table 2: Pixel location table
Sub Key Directions of Location Surrounding value adjacent surrounding pixel Px,y ki pixel to a pixel w.r.t current pixel 0/F E P (i, j + 1) 1/E NE P (i − 1, j + 1) 2/D N P (i − 1, j) 3/C NW P (i − 1, j − 1) 4/B W P (i + 1, j − 1) 5/A SW P (i + 1, j) 6/9 S P (i + 1, j) 7/8 SE P (i + 1, j + 1)
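The key-generation steps, the mixing process of Algorithm 1 and the block-size lookup of Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, an empty histogram bin in the denominator is replaced by 1 (the paper does not say how such bins are handled), ties are rounded half-up, and the 16 session keys are passed in as ready-made byte values because the packing of the key digits into session keys is not fully specified in the text.

```python
# Hypothetical sketch of key-digit derivation, mixing and block-size lookup.
BLOCK_SIZES = [16, 24, 32, 40, 48, 56, 64, 72, 80, 96]  # Table 1, indexed by Ki mod 10


def key_digits(pixels, n_digits=128):
    """Steps 1-5: histogram ratios, rounded, reduced mod 10, first n_digits kept."""
    counts = [0] * 256
    for p in pixels:
        counts[p] += 1
    digits = []
    for i in range(255):
        denom = counts[i + 1] or 1          # assumption: avoid division by zero
        ratio = counts[i] / denom
        digits.append(int(ratio + 0.5) % 10)  # assumption: round half up
    return digits[:n_digits]


def _previous(img, x, y):
    """P(x, y-1), with P(x, 0) = 0 for the first row and P(x-1, W) otherwise."""
    if y > 0:
        return img[x][y - 1]
    if x > 0:
        return img[x - 1][len(img[0]) - 1]
    return 0


def mix(image, session_keys):
    """Algorithm 1: XOR each pixel with the already-mixed previous pixel and a key."""
    out = [row[:] for row in image]
    i = 0
    for x in range(len(out)):
        for y in range(len(out[0])):
            out[x][y] ^= _previous(out, x, y) ^ session_keys[i]
            i = (i + 1) % len(session_keys)
    return out


def unmix(mixed, session_keys):
    """Inverse of mix: the previous pixel is read from the mixed image."""
    out = [row[:] for row in mixed]
    i = 0
    for x in range(len(mixed)):
        for y in range(len(mixed[0])):
            out[x][y] = mixed[x][y] ^ _previous(mixed, x, y) ^ session_keys[i]
            i = (i + 1) % len(session_keys)
    return out


def block_size(session_key):
    """Table 1 lookup."""
    return BLOCK_SIZES[session_key % 10]
```

Because each mixed pixel depends on the previously mixed pixel, a change in one pixel propagates to every later pixel in scan order, and the inverse only needs the mixed image and the same session keys.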
Figure 2: Spiral path scrambling

5) Substitution Process:
In the substitution process the value of the selected pixel is altered by one of its neighboring pixels in the eight directions, i.e., the current pixel is XOR-ed with the neighboring pixel. The eight neighboring directions are North (N), North West (NW), North East (NE), South (S), South West (SW), South East (SE), East (E) and West (W). Here, the neighboring pixel is denoted $P_{x,y}$ and is XOR-ed with the current pixel $P_{i,j}$. When all the sub-keys have been used, the procedure begins again from the first sub-key $k_1$. In this step, some of the pixels lying on the boundary of a block may remain unaffected. The pixel location table is given in Table 2.

6) Key Embedding:
The generated secret key is embedded into the cipher image using the DWT, which is applied to the image to form four sub-bands: LL, LH, HL and HH. The LL band is obtained with a size of 128 × 128. The transformed coefficients in all the bands are then converted into binary form, and along the diagonal elements the 3 LSBs are replaced with the bits of the secret key, after each key digit has been converted into binary form with three digits. After the secret key has been embedded, the coefficients are converted back into decimal format and the image is restored to its original domain by the IDWT. The final output is the embedded image. The secret key is recovered at the receiver side through the extraction process, and the image is decrypted in the reverse order of the encryption stage.

4 Performance Analysis

Current computing power is capable of breaking encryption schemes in real time if the scheme is not designed with such attacks in mind. Hence, a good encryption system should withstand the possible attacks. Analyses such as histogram analysis, entropy and correlation analysis are used to assess the security of the scheme.

1) Statistical Analysis:
Statistical analysis involves collecting every data sample in a set of items from which samples can be drawn. The different types of statistical analysis are given as follows.

a. Histogram Analysis:
A histogram is a graphical representation of numerical data. An image histogram is a chart that shows the distribution of intensities in a grayscale image. To prevent the leakage of data to an adversary, it is important to guarantee that the cipher image does not have any statistical resemblance to the input image. The histogram of the input image has large spikes, whereas the histogram of the cipher image is nearly uniform, signifying a nearly equal probability of occurrence of each intensity level. Figure 3 shows the histograms of the original, cipher and stegno images.

b. Information Entropy:
Figure 3: Histogram of original, encrypted, and stegno-cipher images. (a) Original image, (b) Histogram of original image, (c) Encrypted image, (d) Histogram of encrypted image, (e) Stegno image, (f) Histogram of stegno image.
Information entropy is a concept from information theory. It measures how much information an event carries. In general, the more uncertain or random the event is, the more information it contains. It has applications in many areas, including lossless data compression, statistical inference and cryptography. The entropy is calculated as follows:

$$H(s) = \sum_{i} P(s_i) \log_2 \frac{1}{P(s_i)}, \qquad (1)$$

where $P(s_i)$ is the probability of the $i$-th symbol (intensity level) $s_i$. Table 3 shows the entropy of the original, cipher and stegno images. For a good image encryption scheme, the entropy of the encrypted image should be close to the ideal value of 8 for an 8-bit image. Therefore the information leakage in the proposed cipher is insignificant, and it is secure against the entropy attack.

2) Correlation Analysis:
The correlation is a statistical measure that shows
whether and how strongly pairs of variables are related. The correlation between an original image and its corresponding encrypted image produced by the proposed image encryption algorithm is evaluated by computing the correlation coefficients. The formulas for the correlation coefficients are given as follows.

$$C_{x,y} = \frac{\mathrm{cov}(x, y)}{\sqrt{D(x)}\,\sqrt{D(y)}}$$

$$E(x) = \frac{1}{N} \sum_{i=1}^{N} x_i$$

$$D(x) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))^2$$

$$\mathrm{cov}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - E(x))(y_i - E(y))$$

In the given formulas, there are N pairs of neighboring pixels, and x, y represent the intensity values of two neighboring pixels from an image. Table 4 shows the correlation of the original, cipher and stegno images.

Table 3: Entropy of original, cipher and stegno images

Image            Entropy
Original Image   7.2551
Cipher Image     7.5564
Stegno Image     7.6463

3) Key Sensitivity Analysis:
Even a variation in a single bit of the key should produce an entirely different cipher image, which makes it hard for attackers to identify the key. This makes the encryption procedure sufficiently sensitive to the secret key. To test the sensitivity of the proposed image cipher with respect to the key, the encrypted image corresponding to the plain image is decrypted with a key slightly different from the original one. The two cipher images are then compared. Since it is not easy to compare the cipher images by simple visual inspection, the correlation between the corresponding pixels of the two cipher images is calculated. Table 5 shows the entropy and correlation between the two cipher images.

Figure 4 shows the key sensitivity analysis of the original and two cipher images. A good diffusion and substitution method should resist all kinds of attacks. Hence analysis techniques such as statistical, correlation and key sensitivity analysis are discussed in the above section to show that the proposed algorithm is secure against the most common attacks. A new diffusion and substitution based cryptosystem for securing medical image applications is thus implemented, and the performance analysis indicates that the proposed cipher is secure.

5 Conclusion

To secure the medical image, a secret key of 128-bit size is generated by means of the image histogram, and the medical image is encrypted by using the diffusion and substitution processes. Finally the secret key is embedded within the encrypted image by steganography, which further enhances the security of the medical image. At the receiver side the key is recovered from the embedded image and decryption is then performed in the reverse order. Here we discussed the security analysis of the medical image encryption scheme, namely statistical analysis, correlation analysis and key sensitivity analysis, which indicates that the proposed cipher is safe against the most common attacks.

References

[1] R. Agrawala, M. Sharma, “Medical image watermarking technique in the application of diagnosis using M-ary modulation,” in International Conference on Computational Modeling and Security, 2016.
[2] M. Ahmed, T. Ahmed, “A framework to protect patient digital imagery for secure telediagnosis,” Procedia Engineering, vol. 38, pp. 1055–1066, 2012.
[3] J. S. Anbarasi, A. G. S. Malab, M. Narendrac, “DNA based multi-secret image sharing,” in International Conference on Information and Communication Technologies, 2015.
[4] A. Awad and A. Miri, “A new image encryption algorithm based on a chaotic DNA substitution method,” in IEEE International Conference on Communications (ICC’12), 2012.
[5] A. Belazi, A. A. Abd El-Latif, S. Belghith, “A novel image encryption scheme based on substitution-permutation network and chaos,” Signal Processing, vol. 128, pp. 155–170, 2016.
[6] D. Bouslimi, G. Coatrieux, M. Cozic, C. Roux, “A joint encryption/watermarking system for verifying the reliability of medical images,” IEEE Transactions on Information Technology in Biomedicine, vol. 16, no. 5, pp. 891–899, 2012.
[7] W. Cao, Y. Zhou, C. L. P. Chen, L. Xia, “Medical image encryption using edge maps,” Signal Processing, vol. 132, pp. 96–109, 2017.
[8] G. Coatrieux, C. Le Guillou, J. M. Cauvin, “Reversible watermarking for knowledge digest embedding and reliability control in medical images,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 2, pp. 158–165, 2009.
[9] Z. Fawaz, H. Noura and A. Mostefaoui, “An efficient and secure cipher scheme for images confidentiality preservation,” Signal Processing, vol. 42, pp. 90–108, 2016.
[10] Y. Feng, X. Yu, “A novel symmetric image encryption approach based on an invertible two-dimensional map,” in 35th Annual Conference of IEEE Industrial Electronics, pp. 4244–4649, 2009.
Table 4: Correlation of original, cipher and stegno images
Image            Vertical Correlation   Horizontal Correlation   Diagonal Correlation
Original Image   -0.0053                -0.0075                  -0.0068
Cipher Image     -0.0063                -0.0042                  -0.0130
Stegno Image     -0.0089                -0.0032                  -0.0073
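Correlation coefficients of the kind reported in Table 4 follow directly from the formulas for E(x), D(x) and cov(x, y). The sketch below is only an illustration: the function names are hypothetical, and it uses every adjacent pixel pair in the chosen direction, whereas the paper does not state how its N pairs were selected.

```python
import math

# Hypothetical helper: offsets (row step, column step) for each direction.
DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}


def correlation(xs, ys):
    """C(x, y) = cov(x, y) / (sqrt(D(x)) * sqrt(D(y))) for two equal-length lists."""
    n = len(xs)
    ex = sum(xs) / n
    ey = sum(ys) / n
    dx = sum((x - ex) ** 2 for x in xs) / n
    dy = sum((y - ey) ** 2 for y in ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    # Note: a constant image has zero variance and makes this division undefined.
    return cov / (math.sqrt(dx) * math.sqrt(dy))


def adjacent_correlation(image, direction):
    """Correlation between each pixel and its neighbor in the given direction."""
    dr, dc = DIRECTIONS[direction]
    xs, ys = [], []
    for r in range(len(image) - dr):
        for c in range(len(image[0]) - dc):
            xs.append(image[r][c])
            ys.append(image[r + dr][c + dc])
    return correlation(xs, ys)
```

A coefficient near zero means that neighboring pixels carry little information about each other, while a smooth gradient image yields a coefficient close to 1.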
Figure 4: Key sensitivity analysis of original, cipher image 1 and cipher image 2. (a) Original image, (b) Cipher image 1, (c) Cipher image 2.
Table 5: Entropy and Correlation between two cipher images
Image            Entropy   Vertical Correlation   Horizontal Correlation   Diagonal Correlation
Cipher Image 1   7.5564    -0.0063                -0.0042                  -0.0130
Cipher Image 2   7.6463    -0.0089                -0.0032                  -0.0073
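The entropy column in Tables 3 and 5 corresponds to Equation (1) applied to the pixel-intensity distribution of an image. A minimal sketch, with a hypothetical function name and the image given as a flat list of intensity values:

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """H = sum over observed intensities of P(s) * log2(1 / P(s))."""
    total = len(pixels)
    counts = Counter(pixels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

For an 8-bit image in which all 256 intensities occur equally often, the function returns the maximum value of 8 bits, and for a constant image it returns 0.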
[11] C. Fu, W. H. Meng, Y. F. Zhan, et al., “An efficient and secure medical image protection scheme based on chaotic maps,” Computers in Biology and Medicine, vol. 43, pp. 1000–1010, 2013.
[12] L. C. Huang, L. Y. Tseng, M. S. Hwang, “A reversible data hiding method by histogram shifting in high quality medical images,” The Journals of Systems and Software, vol. 86, pp. 716–727, 2013.
[13] L. O. M. Kobayashi, S. S. Furuie, P. S. L. M. Barreto, “Providing integrity and authenticity in DICOM images: A novel approach,” IEEE Transactions on Information Technology in Biomedicine, vol. 13, no. 4, pp. 582–589, 2009.
[14] W. Li, Y. Yuan, “Improving security of an image encryption algorithm based on chaotic circular shift,” in IEEE International Conference on Systems and Cybernetics, 2009.
[15] L. Mancy, S. M. C. Vigila, “A survey on protection of medical images,” in International Conference on Instrumentation, Communication and Computational Technologies, 2015.
[16] T. Neubauera, J. Heurixb, “A methodology for the pseudonymization of medical data,” International Journal of Medical Informatics, vol. 80, pp. 190–204, 2011.
[17] Z. Ni, Y. Q. Shi, N. Ansari, W. Su, Q. Sun, X. Lin, “Robust lossless image data hiding designed for semi fragile image authentication,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 4, pp. 497–509, 2008.
[18] V. Patidar, G. Purohit, K. K. Sud, N. K. Pareek, “Image encryption through a novel permutation-substitution scheme based on chaotic standard map,” in International Workshop on Chaos-Fractal Theory and Its Applications, 2010.
[19] B. Ramalingam, A. Rngarajan, J. B. B. Rayappan, “Hybrid image crypto system for secure image communication - A VLSI approach,” Microprocessor and Micro Systems, vol. 50, pp. 1–13, 2017.
[20] C. C. Sabino, L. S. Andrade, T. I. Ren, G. D. C. Cavalcanti, T. I. Jyh, J. Sijbers, “Motion compensation techniques in permutation-based video encryption,” in IEEE International Conference on Systems and Cybernetics, 2013.
[21] G. A. Sathishkumar, K. B. Bagan, N. Sriraam, “Image encryption based on diffusion and multiple chaotic maps,” International Journal of Network Security & Its Applications, vol. 3, no. 2, pp. 181–194, 2011.
[22] A. Sharma, A. K. Singh, S. P. Ghrera, “Secure hybrid robust watermarking technique for medical images,” in International Conference on Eco-friendly Computing and Communication Systems, 2015.
[23] W. Stallings, Cryptography and Network Security, Fifth Edition, 2010.
[24] S. Toughi, M. H. Fathi, Y. A. Sekhavat, “An image encryption scheme based on elliptic curve pseudo random and advanced encryption system,” Signal Processing, vol. 141, pp. 217–227, 2017.
[25] M. Ulutas, G. Ulutas, V. Nabiyev, “Medical image security and EPR hiding using Shamir’s secret sharing scheme,” The Journals of Systems and Software, vol. 84, pp. 341–353, 2011.
[26] S. M. C. Vigila, K. Muneeswaran, “Key generation based on elliptic curve over finite prime field,” International Journal on Electronic Security and Digital Forensics, vol. 4, no. 1, pp. 65–81, 2012.
[27] S. M. C. Vigila, K. Muneeswaran, “A new elliptic curve cryptosystem for securing sensitive data applications,” International Journal on Electronic Security and Digital Forensics, vol. 5, no. 1, 2013.
[28] S. M. C. Vigila, K. Muneeswaran, “Hiding of confidential data in spatial domain images using image interpolation,” International Journal of Network Security, vol. 17, no. 6, pp. 722–727, 2015.
[29] P. Viswanathan, V. P. Krishna, “A joint FED watermarking system using spatial fusion for verifying the security issues of teleradiology,” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 3, 2014.
[30] M. Vossberg, T. Tolxdorff, “DICOM image communication in Globus-based medical grids,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 2, pp. 145–153, 2008.
[31] Y. Wan, Q. Chen, Y. Yang, “Robust impulse noise variance estimation based on image histogram,” IEEE Signal Processing Letters, vol. 17, no. 5, pp. 485–488, 2010.
[32] P. Wang, H. Gao, M. Cheng, X. Ma, “A new image encryption algorithm based on hyperchaotic mapping,” in International Conference on Computer Application and System Modeling, 2010.
[33] S. Wang, Y. Zheng, Z. Gao, “A new image scrambling method through folding transform,” in International Conference on Computer Application and System Modeling, 2010.
[34] H. Y. Zhang, “A new image scrambling algorithm based on queue transformation,” in International Conference on Machine Learning and Cybernetics, pp. 19–22, 2007.
[35] J. Zhang, F. Yu, J. Sun, Y. Yang, C. Liang, “DICOM image secure communications with internet protocols IPv6 and IPv4,” IEEE Transactions on Information Technology in Biomedicine, vol. 11, no. 1, pp. 70–80, 2007.
[36] X. Q. Zhou, H. K. Huang, and S. L. Lou, “Authenticity and integrity of digital mammography images,” IEEE Transactions on Medical Imaging, vol. 20, no. 8, pp. 784–791, 2001.
[37] G. Zhu, W. Wang, X. Zhang, M. Wang, “ZGW-1 digital image encryption algorithm based on three levels and multilayer scramble,” in 2nd IEEE International Conference on Network Infrastructure and Digital Content, 2010.

Biography

S. Maria Celestin Vigila completed her B.E. in Computer Science and Engineering in 1996 and M.E. in Computer Science and Engineering in 1999. She completed her Ph.D. in the area of data security from Anna University, Chennai. She is currently Associate Professor in the Department of Information Technology, Noorul Islam University, Kumaracoil, and a member of ISTE and IET. She is a reviewer for several peer-reviewed international journals. Her research interests include cryptography and network security, wireless networks and information hiding.

International Journal of Network Security, Vol.22, No.5, PP.801-808, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).10)
A LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation
Shanshan Zhang^{1,2}
(Corresponding author: Shanshan Zhang)

1 State Key Laboratory of Integrated Services Networks, Xidian University, No. 2, Taibai South Road, Xi’an 710071, Shaanxi Province, China
2 School of Mathematics and Information Science, Baoji University of Arts and Sciences, No. 44, Baoguang Road, Baoji 721013, Shaanxi Province, China
(Email: [email protected])
(Received Mar. 18, 2019; Revised and Accepted Aug. 4, 2019; First Online Jan. 29, 2020)
Abstract

Oblivious transfer is an important cryptographic primitive and serves as a powerful tool in secure computation. Most existing oblivious transfer protocols are built upon the hardness of factoring or of computing discrete logarithms. However, these protocols would be broken directly in the presence of a quantum computer. Therefore, it is essential to construct OT protocols based on post-quantum cryptography. As a subarea of post-quantum cryptography, lattice-based cryptography has some attractive features. Specifically, the learning with errors (LWE) problem has been used as an amazingly versatile basic tool to design cryptographic schemes. We are inspired by a result which proposed an oblivious transfer protocol using the decisional Diffie-Hellman assumption and indistinguishability obfuscation. Therefore, we propose a new secure LWE-based oblivious transfer protocol from indistinguishability obfuscation. The main tools consist of an LWE-based dual-mode cryptosystem and a secure indistinguishability obfuscation, which guarantee the security of our oblivious transfer protocol.

Keywords: Dual-mode Cryptosystem; Indistinguishability Obfuscation; Lattice-based; Oblivious Transfer

1 Introduction

Oblivious transfer (OT) is a fundamental cryptographic primitive, first proposed by Rabin [13] in 1981. It involves two participants, a sender (denoted by S) and a receiver (denoted by R), and requires that S sends a message to R, who receives it with probability 1/2, while S is oblivious to whether or not the message was received by R. A well-known flavor of OT is called 1-out-of-2 OT (denoted by $OT^2_1$), where S has two inputs $m_0$ and $m_1$, and R has a chosen bit b ∈ {0, 1}. R wishes to obtain $m_b$ without S learning b, while S wants to ensure that R receives only one of the two messages. Due to the simple functionality of OT, it has been widely exploited to construct cryptographic schemes, such as contract signing, secure multi-party computation, the exchange of secrets, and key agreement. Therefore, it is of great significance to design efficient OT protocols.

1.1 Our Motivation

As far as we know, most existing OT protocols are built upon number-theoretic problems, mainly the hardness of factoring and of computing discrete logarithms. However, threatened by quantum computing [16], these protocols would be broken directly in the presence of a quantum computer. Therefore, it is essential to construct OT protocols based on post-quantum cryptography, such as lattice-based and code-based protocols. Lattice-based cryptography has some attractive properties when compared with other post-quantum research fields, for instance, strong security guarantees from worst-case hardness and algorithmic simplicity. Among lattice-based hard problems, the learning with errors (LWE) problem [14] has been used as an amazingly versatile basic tool to design cryptographic schemes. Specifically, Peikert et al. [11] proposed an efficient and universally composable OT protocol which is extracted from a dual-mode cryptosystem and can be instantiated with the decisional Diffie-Hellman assumption, the quadratic residuosity assumption and the worst-case lattice assumption, respectively.

Zheng et al. [17] studied the framework for composable oblivious transfer and proposed a secure OT protocol from indistinguishability obfuscation (iO), with a dual-mode cryptosystem and an iO as the main technical tools. Their work has the following main contributions. First, a k-out-of-n OT protocol was presented. Second, it explored the applications of iO. iO is a weaker notion of obfuscation that was first formally defined by [2]. They suggested a definition of virtual black box obfuscation, and proved that this notion is impossible to realize. In order to avoid the impossibility result, they presented the notion of iO, which only requires that if two circuits compute the same functionality, then their obfuscations should be computationally indistinguishable from each other. iO is both very useful and potentially achievable. Garg et al. [7] proposed the first candidate GGH13-based [6] iO for general circuits. Subsequently, many applications of iO were described in [15], such as public-key encryption, injective trapdoor functions, deniable encryption, and so on.

The OT protocol of Zheng et al. has a main tool that is based on the hardness of the decisional Diffie-Hellman (DDH) problem. However, the DDH assumption does not guarantee security against quantum attacks, and the iO selected there is the first candidate, which has been attacked by [4]. For these reasons, we aim to remedy these shortcomings and design a new OT protocol that is secure in the quantum setting. Therefore, we take advantage of the LWE problem and a secure iO. Furthermore, if the security of an OT protocol is proved only with respect to an ideal-world simulator that is shown only for a cheating receiver, then the protocol is not necessarily secure when integrated into a larger protocol. Thus, our protocol also needs to satisfy the property of universal composability.

1.2 Our Contribution

In this work, we combine the LWE-based dual-mode cryptosystem [11] and a secure iO to design a new OT protocol. The key technique is an obfuscator of the dual-mode cryptosystem based on the hardness of LWE, rather than on the DDH assumption as in [17]. Importantly, we choose a GGH13-based obfuscator, which resists quantum attacks when combined with the technique of [5] for preventing input partitioning. By utilizing these tools, we realize the oblivious transfer functionality and guarantee the security of our OT protocol.

1.3 Organization

The rest of this paper is organized as follows. In Section 2, we introduce two useful definitions, of LWE and of iO. Then the two corresponding building blocks are given in Section 3. In Section 4, we construct an LWE-based oblivious transfer protocol from iO, and the security proof of our OT protocol is presented in Section 5. Finally, conclusions are drawn in Section 6.

2 Preliminaries

In this section, we introduce some notation and fundamental definitions.

2.1 Notation

We let $\mathbb{N}$ denote the set of natural numbers; for $n \in \mathbb{N}$, $[n]$ denotes the set $\{1, \ldots, n\}$. For an integer $q \ge 1$, $\mathbb{Z}_q$ denotes the quotient ring $\mathbb{Z}/q\mathbb{Z}$. Let “←” denote sampling an element from a distribution, or uniformly at random from a set. We use bold lower-case letters to denote vectors in column form, and bold upper-case letters to denote matrices. Let $n \in \mathbb{N}$ denote the security parameter throughout this paper; all other quantities are functions of n. We use the standard notation o to classify the growth of functions. The function negl(n) denotes an unspecified function $f(n) = o(n^{-c})$ for every constant $c > 0$, and such a function is called negligible. We say a probability is overwhelming if it is $1 - \mathrm{negl}(n)$. We use the standard definition of computational indistinguishability, denoted by $\stackrel{c}{\approx}$.

2.2 Learning with Errors

The LWE problem was proposed by Regev [14]. Its hardness follows, via a quantum reduction, from the worst-case hardness of some standard problems on lattices.

For an integer $q = q(n) \ge 2$, some probability distribution $\chi$ over $\mathbb{Z}_q$ and a vector $s \in \mathbb{Z}_q^n$, we define $A_{s,\chi}$ as the distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$ of the tuples $(a, c) = (a, a^T s + e)$, where $a \leftarrow \mathbb{Z}_q^n$ is uniform and $e \leftarrow \chi$, and all operations are performed in $\mathbb{Z}_q$. There are two versions of the LWE problem, search-LWE and decision-LWE.

Definition 1 (Search-LWE and decision-LWE). For an integer $q = q(n)$ and a distribution $\chi$ on $\mathbb{Z}_q$, the goal of search-LWE is, for any $s \in \mathbb{Z}_q^n$, to find s given independent samples $(a, c)$ from $A_{s,\chi}$. The goal of decision-LWE is to distinguish between an oracle that returns independent samples from $A_{s,\chi}$ for some uniform $s \leftarrow \mathbb{Z}_q^n$, and an oracle that returns independent samples from the uniform distribution on $\mathbb{Z}_q^n \times \mathbb{Z}_q$.

Regev showed that these two versions are polynomially equivalent for q = poly(n). He proved that for certain choices of q and $\chi$, the decision-LWE problem is as hard as solving the shortest independent vectors problem (SIVP) using a quantum algorithm.

Theorem 1. Let $q = q(n)$ be a prime and let $\alpha = \alpha(n) \in (0, 1)$ be such that $\alpha q > 2\sqrt{n}$. If there exists an efficient algorithm that solves the decision-LWE problem, then there exists an efficient quantum algorithm for approximating SIVP to within $\tilde{O}(n/\alpha)$ in the worst case.

Due to the hardness of SIVP, we choose the decision-LWE problem as the underlying hardness assumption in this paper.

2.3 Indistinguishability Obfuscation

Program obfuscation aims to make computer programs “unintelligible” while preserving their functionality. The systematic study of program obfuscation was initiated by Barak et al. in 2001. In their work, they gave a potentially realizable notion of iO. iO requires that, given any two equivalent circuits of the same size, the obfuscations of these two circuits should be computationally indistinguishable. The specific definition is as follows.
Definition 2 (Indistinguishability obfuscation). A PPT algorithm iO is said to be an indistinguishability obfuscator for a class of circuits $\mathcal{C}$, if it satisfies:

Functionality: For any $C \in \mathcal{C}$ and security parameter n,

$$\Pr[\forall x : iO(C, 1^n)(x) = C(x)] = 1.$$

Indistinguishability: For any PPT distinguisher D, there exists a negligible function negl(·), such that for any two circuits $C_0, C_1 \in \mathcal{C}$ that compute the same function and are of the same size:

$$\left| \Pr[D(iO(C_0, 1^n)) = 1] - \Pr[D(iO(C_1, 1^n)) = 1] \right| \le \mathrm{negl}(n).$$

Starting with the work of [7], which constructed the first iO candidate for polynomial-size circuits, several iO candidates have appeared in the literature [1,3,9,10]. Unfortunately, many constructions are attacked by [10]. So we choose a secure iO [12] based on the GGH13 multilinear map, for which no classical or quantum attack has been found.

3 Building Blocks

In order to construct an LWE-based oblivious transfer protocol from iO, we need two building blocks: an LWE-based dual-mode cryptosystem and a secure iO.

3.1 LWE-based Dual-mode Cryptosystem

The dual-mode cryptosystem is a simple and general framework, proposed by Peikert et al. [11]. It is an encryption scheme that can operate in two modes, called messy mode and decryption mode. The trusted setup phase produces a common reference string (denoted by crs) and the corresponding trapdoor information according to one of the two chosen modes. The crs may be uniformly random or be drawn from some specified distribution.

The dual-mode cryptosystem has four security properties:

1) It satisfies decryption completeness with overwhelming probability over the randomness of the entire experiment;

2) Given crs, the first outputs of SetupMessy and SetupDec are computationally indistinguishable;

3) In messy mode, for every pk, at least one of the derived public keys statistically hides its encrypted message;

4) In decryption mode, the honest receiver's chosen bit σ is statistically hidden by its choice of the base key pk.

These security properties make the dual-mode cryptosystem able to derive a UC-secure OT protocol.

The instantiation of the dual-mode cryptosystem based on the hardness of LWE relies on existing techniques, including an LWE-based encryption scheme and an efficient algorithm for securely embedding a trapdoor. So, we first introduce an optimized version of the LWE-based encryption, and then instantiate the dual-mode cryptosystem, where the message space is $\mathbb{Z}_2 = \{0, 1\}$. Let the modulus q = poly(n) be a prime; all operations are performed over $\mathbb{Z}_q$. For every message $M \in \mathbb{Z}_2$, the “center” of M is defined as $t(M) = M \cdot \lfloor q/2 \rfloor \in \mathbb{Z}_q$. Let $\chi$ denote an error distribution over $\mathbb{Z}_q$.

LWEKeyGen($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a secret key $s \leftarrow \mathbb{Z}_q^{n \times 1}$, both uniformly at random. To generate the public key, choose an error vector $x \in \mathbb{Z}_q^{1 \times m}$ where each entry $x_i \leftarrow \chi$ is chosen independently for all $i \in [m]$. Then compute $p = s^T A + x$; the public key is $(A, p)$.

LWEEnc($pk = (A, p)$, M): To encrypt a message $M \in \mathbb{Z}_2$, choose a vector $e \in \mathbb{Z}_2^m$ uniformly at random. The ciphertext is the pair $(u, c) = (Ae, pe + M \cdot \lfloor q/2 \rfloor) \in \mathbb{Z}_q^n \times \mathbb{Z}_q$.

LWEDec($sk = s$, $(u, c)$): Compute $d = c - s^T u \in \mathbb{Z}_q$; output 0 if d is closer to 0 than to $\lfloor q/2 \rfloor$ modulo q, otherwise output 1.

We verify the completeness of the LWE-based encryption scheme. The decryption algorithm computes

$$d = c - s^T u = (s^T A + x)e + M \cdot \lfloor q/2 \rfloor - s^T A e = xe + M \cdot \lfloor q/2 \rfloor \in \mathbb{Z}_q.$$

If $xe + M \cdot \lfloor q/2 \rfloor$ is closer to 0 than to $\lfloor q/2 \rfloor$ modulo q, then the output is 0; otherwise the output is 1. Decryption is therefore correct whenever the accumulated error $xe$ has magnitude less than roughly q/4.

The LWE-based encryption scheme is secure under chosen-plaintext attack unless SIVP and GapSVP are easy for quantum algorithms.

We now give the construction of the LWE-based dual-mode cryptosystem using the LWE-based encryption. It consists of six probabilistic algorithms, and the last two algorithms are only used in the security proof.

SetupMessy($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ uniformly at random, together with a trapdoor $t = \{S, A\}$ as in [8]. Choose a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$. For each $b \in \{0, 1\}$, choose an independent row vector $\tau_b \in \mathbb{Z}_q^{1 \times m}$ uniformly at random. Let $crs = (A, \tau_0, \tau_1)$ and output $(crs, t)$.

SetupDec($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. For each $b \in \{0, 1\}$, choose a secret $s_b \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_b \leftarrow \chi^{1 \times m}$. Let $\tau_b = s_b^T A + x_b - w$, $crs = (A, \tau_0, \tau_1)$, $t = (w, s_0, s_1)$ and output $(crs, t)$.
KeyGen(σ): Choose a secret $s \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x \leftarrow \chi^{1 \times m}$. Let $pk = s^T A + x - \tau_\sigma$, $sk = s$, and output $(pk, sk)$.

Enc(pk, b, M): Output $y \leftarrow$ LWEEnc$(A, pk + \tau_b, M)$, where $y$ is the pair $(u, c)$.

Dec(sk, y): Output $M \leftarrow$ LWEDec$(sk, (u, c))$.

FindMessy(t, pk): Parse $t$ as $(S, A)$, run IsMessy$(S, A, pk + \tau_b)$ for each $b \in \{0, 1\}$, and output a $b$ for which IsMessy outputs messy. With overwhelming probability IsMessy correctly identifies at least one branch as messy.

TrapKeyGen(t): Parse $t$ as $(w, s_0, s_1)$, and output $(pk, sk_0, sk_1) = (w, s_0, s_1)$.

Program $P1$

Given input $(M, e_1, e_2, \phi)$, $P1^{(SK^1_{FHE}, g_1, g_2)}$ proceeds as follows:

1. Check if $\phi$ is a valid low-depth proof for the NP-statement:
   $e_1 = \mathrm{Eval}_{FHE}(PK^1_{FHE}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{FHE}(PK^2_{FHE}, U_\lambda(\cdot, M), g_2)$.
2. If the check fails output 0; otherwise, output $\mathrm{Decrypt}_{FHE}(e_1, SK^1_{FHE})$.

Figure 1: Program P1
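The decryption-mode algorithms above (SetupDec, KeyGen, Enc, Dec, TrapKeyGen) can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: the sizes `N`, `M_DIM`, `Q` are hypothetical and far too small to be secure, the error distribution χ is replaced by uniform values in {-1, 0, 1}, and LWEEnc is assumed to use a random binary vector $e$ with $u = Ae$ and $c = p \cdot e + M\lfloor q/2 \rfloor$, which matches the completeness derivation but is not spelled out in this section.

```python
import random

# Hypothetical toy parameters (assumption): insecure, chosen so that |x.e| <= 32 < Q/4.
N, M_DIM, Q = 8, 32, 4093

def rand_vec(k):
    """Uniform vector over Z_q."""
    return [random.randrange(Q) for _ in range(k)]

def small_vec(k):
    """Stand-in for the error distribution chi: entries in {-1, 0, 1}."""
    return [random.choice((-1, 0, 1)) for _ in range(k)]

def row_times_matrix(s, A):
    """Compute the row vector s^T A for an N x M_DIM matrix A."""
    return [sum(s[i] * A[i][j] for i in range(N)) % Q for j in range(M_DIM)]

def lwe_enc(A, p, bit):
    """Assumed LWEEnc: u = A e, c = p.e + bit * floor(q/2), with binary e."""
    e = [random.randrange(2) for _ in range(M_DIM)]
    u = [sum(A[i][j] * e[j] for j in range(M_DIM)) % Q for i in range(N)]
    c = (sum(p[j] * e[j] for j in range(M_DIM)) + bit * (Q // 2)) % Q
    return u, c

def lwe_dec(s, ct):
    """LWEDec: d = c - s^T u; output 0 if d is closer to 0 than to floor(q/2)."""
    u, c = ct
    d = (c - sum(s[i] * u[i] for i in range(N))) % Q
    return 0 if min(d, Q - d) < abs(d - Q // 2) else 1

def setup_dec():
    """SetupDec: tau_b = s_b^T A + x_b - w, trapdoor t = (w, s_0, s_1)."""
    A = [rand_vec(M_DIM) for _ in range(N)]
    w = rand_vec(M_DIM)
    s = [rand_vec(N) for _ in range(2)]
    tau = []
    for b in range(2):
        x, sA = small_vec(M_DIM), row_times_matrix(s[b], A)
        tau.append([(sA[j] + x[j] - w[j]) % Q for j in range(M_DIM)])
    return (A, tau), (w, s[0], s[1])

def keygen(crs, sigma):
    """KeyGen: pk = s^T A + x - tau_sigma, sk = s."""
    A, tau = crs
    s, x = rand_vec(N), small_vec(M_DIM)
    sA = row_times_matrix(s, A)
    return [(sA[j] + x[j] - tau[sigma][j]) % Q for j in range(M_DIM)], s

def enc(crs, pk, b, bit):
    """Enc: LWE-encrypt under the derived key pk + tau_b."""
    A, tau = crs
    return lwe_enc(A, [(pk[j] + tau[b][j]) % Q for j in range(M_DIM)], bit)
```

With an honestly generated key, only branch σ decrypts reliably, since $pk + \tau_\sigma = s^T A + x$. With the trapdoor key $pk = w$, both branches decrypt, because $w + \tau_b = s_b^T A + x_b$ is a valid LWE public key for $s_b$.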
According to [8], an efficient and UC-secure OT protocol based on LWE hardness can be derived directly once the LWE-based dual-mode cryptosystem is constructed. Although the LWE-based dual-mode cryptosystem is a relaxed version, it still yields a UC-secure OT protocol based on LWE hardness.

3.2 Secure Indistinguishability Obfuscation

In this section, we introduce a secure iO as the other tool in our scheme. Indistinguishability obfuscation is a powerful notion: for every pair of functionally equivalent circuits $C_0, C_1$, the obfuscations $iO(C_0)$ and $iO(C_1)$ are computationally indistinguishable. Almost all known candidate constructions of iO are based on multilinear maps, which have been the subject of various attacks. The first candidate branching program obfuscator can be attacked when the branching program has input partitioning. We therefore combine it with the technique for preventing input partitioning in order to resist cryptanalytic attacks.

We now describe the secure iO for all circuits. Firstly, we need to construct iO for $NC^1$ circuits. More specifically, an $NC^1$ circuit can be computed by branching programs. Let $f : \{0, 1\}^n \to \{0, 1\}$ be a function to be obfuscated. Fernando et al. [5] give a model which takes a partitionable $f$ as input and produces a function $g$ with the same functionality, where $g$ has no input partitions. In this way, GGH13-based iO can defend against the extension of annihilation attacks by [4]. Secondly, iO for $NC^1$ circuits is used together with Fully Homomorphic Encryption (FHE) to achieve iO for all circuits. The process contains an obfuscation algorithm and an evaluation algorithm. To obfuscate a circuit $C$, we choose and publish two FHE keys $PK^1_{FHE}$ and $PK^2_{FHE}$. Obfuscate$(1^\lambda, C \in \mathcal{C}_\lambda)$ outputs $\tau = (P, PK^1_{FHE}, PK^2_{FHE}, g_1, g_2)$, where $P = iO_{NC^1}(P1^{(SK^1_{FHE}, g_1, g_2)})$, $g_1 = \mathrm{Encrypt}_{FHE}(PK^1_{FHE}, C)$ and $g_2 = \mathrm{Encrypt}_{FHE}(PK^2_{FHE}, C)$. We describe the two program classes in Figure 1 and Figure 2. The evaluation algorithm takes as input the obfuscation output $\tau$ and a program input $M$, and is denoted by Evaluate$(\tau, M)$. It performs the following procedure, where $U_\lambda$ is a poly-sized universal circuit.

1) Compute
   $e_1 = \mathrm{Eval}_{FHE}(PK^1_{FHE}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{FHE}(PK^2_{FHE}, U_\lambda(\cdot, M), g_2)$.

2) Compute a low-depth proof $\phi$ that $e_1$ and $e_2$ were computed correctly.

3) Run $P(M, e_1, e_2, \phi)$ and output the result.

Program $P2$

Given input $(M, e_1, e_2, \phi)$, $P2^{(SK^2_{FHE}, g_1, g_2)}$ proceeds as follows:

1. Check if $\phi$ is a valid low-depth proof for the NP-statement:
   $e_1 = \mathrm{Eval}_{FHE}(PK^1_{FHE}, U_\lambda(\cdot, M), g_1)$,
   $e_2 = \mathrm{Eval}_{FHE}(PK^2_{FHE}, U_\lambda(\cdot, M), g_2)$.
2. If the check fails output 0; otherwise, output $\mathrm{Decrypt}_{FHE}(e_2, SK^2_{FHE})$.

Figure 2: Program P2

4 LWE-based Oblivious Transfer Protocol from Indistinguishability Obfuscation

4.1 Obfuscator for LWE-based Dual-mode Cryptosystem

The setup of a dual-mode cryptosystem has a messy mode and a decryption mode, and the two are computationally indistinguishable. Therefore, their obfuscations are still indistinguishable. SetupMessy and SetupDec each produce two vectors, one per branch, which gives four circuits $C_n$, $n = 1, 2, 3, 4$, described in Figures 3, 4, 5 and 6. We obfuscate these circuits to substitute for the two modes in the LWE-based dual-mode cryptosystem. The key is to describe the four circuits $C_n$, which are based on the LWE encryption scheme.

The circuits $C_1$ and $C_2$ output truly random vectors, and the circuits $C_3$ and $C_4$ output LWE instances. By obfuscating the circuits $C_1$, $C_2$, $C_3$ and $C_4$, we obtain the result that their outputs are computationally indistinguishable. The specific setups of the LWE-based dual-mode cryptosystem are called the two obfuscation branches, as follows.

Circuit $C_1$

Input: a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both chosen uniformly at random; a secret $s_0 \leftarrow \mathbb{Z}_q^n$ chosen uniformly at random and an error row vector $x_0 \leftarrow \chi^{1 \times m}$.
Output: a row vector $v_0 \leftarrow \mathbb{Z}_q^{1 \times m}$ chosen uniformly at random.

Figure 3: Circuit C1
Messy-obf-branch($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ uniformly at random, together with a trapdoor $t = \{S, A\}$ as in [8]. Choose a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$. For each $b \in \{0, 1\}$, choose a secret $s_b \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_b \leftarrow \chi^{1 \times m}$. Let $\tau_1 = \mathrm{Obfuscate}(1^\lambda, C_1)$, $\tau_2 = \mathrm{Obfuscate}(1^\lambda, C_2)$, $crs = (A, \tau_1, \tau_2)$ and output $(crs, t)$.

Dec-obf-branch($1^n$): Choose a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both uniformly at random. For each $b \in \{0, 1\}$, choose a secret $s_b \leftarrow \mathbb{Z}_q^n$ uniformly at random and an error row vector $x_b \leftarrow \chi^{1 \times m}$. Let $\tau_1 = \mathrm{Obfuscate}(1^\lambda, C_3)$, $\tau_2 = \mathrm{Obfuscate}(1^\lambda, C_4)$, $crs = (A, \tau_1, \tau_2)$, $t = (w, s_0, s_1)$ and output $(crs, t)$.

Next, we invoke the evaluation algorithm in the KeyGen process, where we let $V_0 = \mathrm{Evaluate}(\tau_1, A)$ and $V_1 = \mathrm{Evaluate}(\tau_2, A)$. Compared with the LWE-based dual-mode cryptosystem above, the rest of the steps are identical except that $V_\sigma$ is substituted for $v_\sigma$ (the branch vector written $\tau_\sigma$ in Section 3.1).

Circuit $C_2$

Input: a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both chosen uniformly at random; a secret $s_1 \leftarrow \mathbb{Z}_q^n$ chosen uniformly at random and an error row vector $x_1 \leftarrow \chi^{1 \times m}$.
Output: a row vector $v_1 \leftarrow \mathbb{Z}_q^{1 \times m}$ chosen uniformly at random.

Figure 4: Circuit C2

Circuit $C_3$

Input: a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both chosen uniformly at random; a secret $s_0 \leftarrow \mathbb{Z}_q^n$ chosen uniformly at random and an error row vector $x_0 \leftarrow \chi^{1 \times m}$.
Output: $v_0 = s_0^T A + x_0 - w$.

Figure 5: Circuit C3

Circuit $C_4$

Input: a matrix $A \in \mathbb{Z}_q^{n \times m}$ and a row vector $w \leftarrow \mathbb{Z}_q^{1 \times m}$, both chosen uniformly at random; a secret $s_1 \leftarrow \mathbb{Z}_q^n$ chosen uniformly at random and an error row vector $x_1 \leftarrow \chi^{1 \times m}$.
Output: $v_1 = s_1^T A + x_1 - w$.

Figure 6: Circuit C4

4.2 Oblivious Transfer Protocol from Indistinguishability Obfuscation

$OT^2_1$ (1-out-of-2 oblivious transfer) is a two-party protocol involving a sender S with inputs $M_0, M_1$ and a receiver R with a choice bit $\sigma \in \{0, 1\}$. The result is that R learns $M_\sigma$ and nothing about the other message, while S learns nothing at all. Our OT protocol operates in the common reference string model, denoted by $F^D_{crs}$, where $D$ denotes a PPT algorithm. $F^D_{crs}$ runs with two parties, and a trusted party produces the crs for both parties before they interact. Once the obfuscator for the LWE-based dual-mode cryptosystem is constructed, our LWE-based OT protocol from iO, denoted by $iOdm^{branch}$, can be derived directly. We describe the protocol in Table 1. It realizes the exact definition of the ideal OT functionality in the $F^D_{crs}$ model. $iOdm^{branch}$ operates in two branches: when $D$ = Messy-obf-branch, the protocol runs in the Messy-obf-branch; when $D$ = Dec-obf-branch, it runs in the Dec-obf-branch.
Table 1: Protocol $iOdm^{branch}$ for oblivious transfer

| Phase | Sender S | Message | Receiver R |
|---|---|---|---|
| Input | $(sid, ssid, M_0, M_1)$ | | $(sid, ssid, \sigma)$ |
| Setup | sends $(sid, S, R)$ to $F^D_{crs}$ | $F^D_{crs}$ returns $(sid, crs)$ to both parties | sends $(sid, S, R)$ to $F^D_{crs}$ |
| Multi-session OT | | R → S: $(sid, ssid, pk)$ | $(pk, sk) \leftarrow$ KeyGen-obf-branch$(crs, \sigma)$ |
| | $y_b \leftarrow$ Enc$(pk, b, M_b)$ for each $b \in \{0, 1\}$ | S → R: $(sid, ssid, y_0, y_1)$ | outputs $(sid, ssid, \mathrm{Dec}(sk, y_\sigma))$ |
5 Security Proof

The obfuscator for the LWE-based dual-mode cryptosystem includes the two obf-branches, the trapdoor generation of keys in the Dec-obf-branch, and the guaranteed existence and identification of messy branches in the Messy-obf-branch. Its properties are stated in the following theorems.

Theorem 2. In the obfuscator for the LWE-based encryption scheme construction, the Messy-obf-branch and the Dec-obf-branch are indistinguishable, assuming LWE is hard.

Proof. The output of the Dec-obf-branch is of the form $(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w)$. By the hardness of LWE, we have $(A, s_1^T A + x_1) \approx_c (A, w_1)$, where $w_1 \leftarrow \mathbb{Z}_q^{1 \times m}$ is uniformly random and independent. So we have

$$(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w) \approx_c (A, (s_0^T A + x_0) - w, w_1 - w).$$

The right-hand side is entirely uniform, because $w$ and $w_1$ are uniform and independent. Since the output of the Messy-obf-branch is also entirely uniform, we obtain

$$(A, (s_0^T A + x_0) - w, (s_1^T A + x_1) - w) \approx_c (A, v_0, v_1).$$

Theorem 3. In the obfuscator for the LWE-based encryption scheme construction, for every $(crs, t) \leftarrow$ Dec-obf-branch$(1^n)$, TrapKeyGen$(t)$ outputs $(pk, sk_0, sk_1)$ such that for every $\sigma \in \{0, 1\}$, $(pk, sk_\sigma) \approx$ KeyGen$(\sigma)$, assuming LWE is hard.

Proof. The cases $\sigma = 0$ and $\sigma = 1$ are symmetric, so we consider only one of them. For the case $\sigma = 0$, we will prove that

$$(\text{Dec-obf-branch}(1^n), \mathrm{KeyGen}(0)) \approx_c (crs, (pk, sk_0)),$$

where $(crs, t) \leftarrow$ Dec-obf-branch$(1^n)$ and $(pk, sk_0, sk_1) \leftarrow$ TrapKeyGen$(t)$. We obtain the result using a sequence of hybrid games.

Since the outputs of the two branches are indistinguishable, the first hybrid game expands as

$$(A, v_0, v_1, s^T A + x - v_0, s),$$

where $A$, $v_0$, $v_1$ and $s$ are uniform and $x \leftarrow \chi^{1 \times m}$.

By defining $w = s^T A + x - v_0$ and writing $s_0$ and $x_0$ in place of $s$ and $x$, the second game outputs

$$(A, s_0^T A + x_0 - w, v_1, w, s_0),$$

where $w$ is uniform. Because $v_1$ is uniform and independent of the other variables, the third game outputs

$$(A, s_0^T A + x_0 - w, v_1 - w, w, s_0).$$

The above three games are equivalent to each other. The hardness of LWE implies that $(A, v_1)$ is indistinguishable from $(A, s_1^T A + x_1)$, where $s_1 \leftarrow \mathbb{Z}_q^n$ and $x_1 \leftarrow \chi^{1 \times m}$. So the prior games are indistinguishable from the one that outputs

$$(A, s_0^T A + x_0 - w, s_1^T A + x_1 - w, w, s_0).$$

This completes the sequence; the final output is equivalent to $(crs, (pk, sk_0))$ by definition.

Theorem 4. In the obfuscator for the LWE-based encryption scheme construction with parameters $m \ge 2(n + 1)\log q$ and $t \ge \sqrt{qm} \cdot \log^2 m$, for $(crs, t) \leftarrow$ Messy-obf-branch$(1^n)$ and every key $pk$, FindMessy$(t, pk)$ outputs a messy branch with overwhelming probability.

Proof. The following facts are from [8]. Let $m \ge 2(n + 1)\log q$ and $t \ge \sqrt{qm} \cdot \log^2 m$. There is a negligible function negl$(m)$ such that, with overwhelming probability over the choice of $A, S$, for all but at most a $(1/2\sqrt{q})^m$ fraction of vectors $p \in \mathbb{Z}_q^{1 \times m}$, IsMessy$(S, A, p)$ outputs messy with overwhelming probability. Define $D \subseteq \mathbb{Z}_q^{1 \times m}$ to be the set of such vectors $p$; then we have

$$\Pr[v \notin D] \le (1/2\sqrt{q})^m, \quad \text{where } v \in \mathbb{Z}_q^{1 \times m} \text{ is uniform}.$$

For a key $pk \in \mathbb{Z}_q^{1 \times m}$, a branch $b \in \{0, 1\}$ is identified as messy when $pk + v_b \in D$, where $v_0, v_1 \in \mathbb{Z}_q^{1 \times m}$ are the vectors in the crs. For any fixed $pk$, we have

$$\Pr[pk + v_0 \notin D \text{ and } pk + v_1 \notin D] = (\Pr[v \notin D])^2 \le (1/4q)^m.$$
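The squaring step in this bound can be written out explicitly. The sketch below assumes, as the proof does, that $v_0$ and $v_1$ are independent and uniform, so that $pk + v_0$ and $pk + v_1$ are independent and uniform for any fixed $pk$:

$$\Pr[pk + v_0 \notin D \,\wedge\, pk + v_1 \notin D] = \Pr[v \notin D]^2 \le \left(\frac{1}{2\sqrt{q}}\right)^{2m} = \left(\frac{1}{4q}\right)^{m}.$$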
For $(crs, t) \leftarrow$ Messy-obf-branch$(1^n)$ and every key $pk$, FindMessy$(t, pk)$ outputs a messy branch with overwhelming probability: by a union bound over all $q^m$ possible keys $pk$, the probability that some $pk$ has both branches lying outside $D$ is at most $q^m \cdot (1/4q)^m = (1/4)^m = \mathrm{negl}(n)$.

From the above, we conclude that the obfuscator for the LWE-based encryption scheme is a slightly relaxed dual-mode cryptosystem. On this basis, we obtain an OT protocol. The protocol operates in either of the obf-branches, which are obfuscations of the dual-mode encryption branches. The OT protocol based on the dual-mode cryptosystem securely realizes the functionality $F_{OT}$. Therefore, our LWE-based oblivious transfer protocol from indistinguishability obfuscation is secure.

6 Conclusions

Most existing OT protocols are based on the hardness of number-theoretic problems. In this paper, we propose a secure LWE-based oblivious transfer protocol from iO and give a proof of its security. In addition to iO, our protocol is based on LWE-based dual-mode encryption, which is a framework for efficient and composable oblivious transfer. Thus, our protocol realizes the oblivious transfer functionality. Compared with the protocol of Zheng et al., our protocol is secure in the quantum environment. At present, the punctured programs technique is making iO a central primitive for cryptography, so we would like to use this technique to build another secure, efficient, and succinct oblivious transfer protocol.

Acknowledgments

This study was supported by the National Key R&D Program of China under Grant No.2017YFB0802000, the National Natural Science Foundations of China under Grant Nos.61972457, 61672412, 61402015, U1736111, the National Cryptography Development Fund under grant No.MMJJ20170104, the MOE Layout Foundation of Humanities and Social Sciences under Grant 19YJA790007, and the Research Program of Baoji University of Arts and Sciences under Grant No.ZK2018093. I acknowledge the anonymous reviewers for their valuable comments.

References

[1] S. Badrinarayanan, E. Miles, A. Sahai, and M. Zhandry, “Post-zeroizing obfuscation: New mathematical tools, and the case of evasive circuits,” in Advances in Cryptology, pp. 764–791, 2016.
[2] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan, and K. Yang, “On the (im)possibility of obfuscating programs,” in Advances in Cryptology, pp. 1–18, 2001.
[3] N. Bitansky and V. Vaikuntanathan, “Indistinguishability obfuscation from functional encryption,” in Proceedings of the IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS’15), pp. 171–190, 2015.
[4] Y. L. Chen, C. Gentry, and S. Halevi, “Cryptanalyses of candidate branching program obfuscators,” in Advances in Cryptology, pp. 278–307, 2017.
[5] R. Fernando, P. M. R. Rasmussen, and A. Sahai, “Preventing CLT attacks on obfuscation with linear overhead,” in Advances in Cryptology, pp. 242–271, 2017.
[6] S. Garg, C. Gentry, and S. Halevi, “Candidate multilinear maps from ideal lattices,” in Advances in Cryptology, pp. 1–17, 2013.
[7] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, “Candidate indistinguishability obfuscation and functional encryption for all circuits,” in IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 40–49, 2013.
[8] C. Gentry, C. Peikert, and V. Vaikuntanathan, “Trapdoors for hard lattices and new cryptographic constructions,” in Proceedings of 40th Annual ACM Symposium on Theory of Computing, pp. 197–206, 2008.
[9] H. Lin, “Indistinguishability obfuscation from constant-degree graded encoding schemes,” in Advances in Cryptology, pp. 28–57, 2016.
[10] E. Miles, A. Sahai, and M. Zhandry, “Annihilation attacks for multilinear maps: Cryptanalysis of indistinguishability obfuscation over GGH13,” in Advances in Cryptology, pp. 629–658, 2016.
[11] C. Peikert, V. Vaikuntanathan, and B. Waters, “A framework for efficient and composable oblivious transfer,” LNCS, vol. 5444, no. 72, pp. 554–571, 2009.
[12] A. Pellet-Mary, “Quantum attacks against indistinguishablility obfuscators proved secure in the weak multilinear map model,” in Advances in Cryptology – CRYPTO 2018, pp. 153–183, 2018.
[13] M. O. Rabin, “How to exchange secrets by oblivious transfer,” Technical Report, 1981. (https://eprint.iacr.org/2005/187.pdf)
[14] O. Regev, “On lattices, learning with errors, random linear codes, and cryptography,” in Proceedings of 37th Annual ACM Symposium on Theory of Computing, pp. 84–93, 2005.
[15] A. Sahai and B. Waters, “How to use indistinguishability obfuscation: Deniable encryption, and more,” Proceedings of the Annual ACM Symposium on Theory of Computing, pp. 475–484, 2014.
[16] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Review, vol. 41, no. 2, pp. 303–332, 1999.
[17] Y. Zheng, W. Mei, and F. Xiao, “Secure oblivious transfer protocol from indistinguishability obfuscation,” The Journal of China Universities of Posts and Telecommunications, vol. 23, no. 3, pp. 1–10, 2016.
Biography

Shanshan Zhang received her B.S. degree in 2004 from Baoji University of Arts and Sciences, and received her M.S. degree in 2007 from Huaibei Normal University. She is now a PhD student at Xidian University. Her main research interests include public key cryptography and indistinguishability obfuscation.

International Journal of Network Security, Vol.22, No.5, PP.809-814, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).11)
Verifiable Secret Sharing Based On Micali-Rabin’s Random Vector Representations Technique
Haiou Yang and Youliang Tian (Corresponding author: Youliang Tian)
School of Computer Science and Technology, Guizhou University Guizhou Province, Guiyang, China (Email: [email protected]) (Received Mar. 24, 2019; Revised and Accepted Nov. 16, 2019; First Online Jan. 29, 2020)
Abstract

Verifiable secret sharing is a core protocol of many cryptographic systems and is widely used for secure communication in network environments. There is a large body of research on verifiable secret sharing for threshold structures, which lack generality and flexibility compared with general access structures. However, it is difficult to realize a verifiable secret sharing scheme for general access structures. Existing generalized verifiable secret sharing schemes are few and have low efficiency. In this paper, we propose a new verifiable secret sharing scheme for general access structures. We use a knowledge commitment scheme based on bilinear pairing to ensure the security and concealment of public information, and adopt Micali-Rabin's random vector representations technique to improve the efficiency of the verification process. Our security and performance analysis shows that the new scheme is more efficient and practical than existing similar schemes.

Keywords: Bilinear Pairing; General Access Structure; Micali-Rabin's Random Vector Representations Technique; Verifiable Secret Sharing

1 Introduction

Secret sharing is a basic protocol for constructing cryptographic schemes such as secure multiparty computation and digital signatures. It is mainly used for the distribution, preservation and reconstruction of secrets and keys (or other secret information), to prevent information from being lost, damaged or tampered with. The fundamental idea of secret sharing is that a secret is divided into multiple parts by one dealer and shared among different participants; some subsets of participants can reconstruct the secret, while others cannot reconstruct it and get no information about it. Shamir [10] and Blakley [2] first proposed (t,n) threshold secret sharing schemes, based on Lagrange interpolation polynomials and projective geometry theory respectively. Asmuth [1] proposed a threshold secret sharing scheme based on the Chinese Remainder Theorem (CRT). Halpern and Teague [6] combined game theory with secret sharing and proposed the concept of rational secret sharing for the first time, in which all participants are rational rather than honest or malicious. Tian [12] analyzed the distribution mechanism and reconstruction mechanism of secret sharing under the framework of game theory, and studied the problem of one secret sharing based on a Bayesian game, which solved the cooperation problem of this kind of rational secret sharing system. But none of these schemes can properly detect and prevent malicious behavior by dealers and participants. To address possible dishonesty among the participants, Chor et al. [3] proposed the concept of verifiable secret sharing (VSS), based on the large integer factorization problem, for the first time. Stadler [11] improved Chor's scheme and proposed Publicly Verifiable Secret Sharing (PVSS) schemes based on the discrete logarithm. Tian [13] constructed a non-interactive publicly verifiable secret sharing scheme using bilinear pairings on elliptic curves, whose information rate reaches 2/3. Jhanwar [4] proposed a PVSS scheme and provided a formal proof of the IND-secrecy of his scheme, based on the (t, n)-multi-sequence of exponents Diffie-Hellman assumption. Since then, many results have been obtained in research on secret sharing schemes. However, almost all of the above schemes are designed for threshold structures, and threshold secret sharing is only a special case of generalized secret sharing.

Threshold structures lack flexibility and, compared with general access structures, are not applicable in some specific scenarios. For instance, suppose the dealer shares a secret among participants $U_1, U_2, U_3, U_4$ and specifies that the two subsets $\{U_1, U_2\}$ and $\{U_2, U_3, U_4\}$ can reconstruct the secret. A (t,n) threshold secret sharing scheme is no longer applicable in this case. In order to study secret sharing for general access structures with wider applicability, Ito et al. [8] first proposed a secret sharing scheme based on a general access structure, in which the cooperation of the participants of any authorized subset can reconstruct the secret. Harn et al. [7] applied integer programming to the generalized secret sharing scheme. They stipulate authorized subsets and non-authorized subsets and try to find a reasonable share allocation so as to realize the general access structure with the traditional (t,n) threshold scheme, but the scheme is expensive and has low computational efficiency. In the existing secret sharing schemes for general access structures, either the correctness of secret shares cannot be verified, or the computational overhead is increased to achieve the verifiability of secret shares.

Therefore, we propose an efficient generalized verifiable secret sharing scheme based on Micali-Rabin's random vector representations technique. In view of the complexity of the verification process and the low probability of verifying the correctness of the computing result, we adopt Micali-Rabin's random vector representation technique, based on Zero-Knowledge Proofs (ZKPs), proposed by Micali and Rabin [9]. Zero-knowledge proofs, proposed by Goldwasser [5], are one of the most remarkable innovations in information security. They refer to the ability of a prover to convince a verifier that a statement is true without providing any useful information to the verifier, and have been widely used in the field of information security. Rabin [9] developed a novel, secure and highly efficient way of verifying the correctness of the output of a transaction while keeping input values secret, based on ZKPs. Xin [15] proposed a new fair and rational delegation computation. To deal with the complexity of the verification problem, they adopted Micali-Rabin's random vector representation technique. Consequently, ZKPs are used to prove the correctness of the computing results, which provides a new direction for our research.

In this paper, a new generalized verifiable secret sharing (GVSS) scheme is proposed. The contributions of our proposed GVSS are as follows. First, we propose a GVSS scheme based on the difficulty of the Diffie-Hellman problem of bilinear pairings. In this scheme, secret shares are chosen by the participants themselves, effectively avoiding deception by the dealer. Also, compared with threshold schemes, this scheme can specify any authorized subsets to reconstruct the secret, which greatly increases the flexibility of secret sharing and expands the application scenarios of the scheme. Second, on account of the complexity of the verification phase, we adopt Micali-Rabin's random vector representation technique, that is, the secret shares are represented by the knowledge commitment scheme for bilinear pairing. To verify the correctness of secret shares, one only needs to execute an efficient process according to the public information on the bulletin board.

The rest of this paper is organized as follows. In Section 2 we give the relevant background. Our proposed GVSS is described in Section 3. In Section 4, we give a security analysis of our proposed GVSS and a performance comparison between our proposed GVSS and the VSS schemes proposed by Zhang [16] and Tsu-Yang [14]. The conclusion is given in Section 5.

2 Preliminaries

2.1 Bilinear Pairing

Let $G_1$ be an additive cyclic group and $G_2$ a multiplicative cyclic group, both of order $q$, where $q$ is a large prime number. Assume that the discrete logarithm problems on the groups $G_1$ and $G_2$ are hard. A map $e : G_1 \times G_1 \to G_2$ with the following properties is called a bilinear pairing:

1) Bilinear: $e(aP, bQ) = e(P, Q)^{ab}$ for all $P, Q \in G_1$ and $a, b \in \mathbb{Z}_q^*$.

2) Non-degenerate: There exist $P, Q \in G_1$ such that $e(P, Q) \ne 1$.

3) Computable: For all $P, Q \in G_1$, there exists an efficient algorithm to compute $e(P, Q)$.

2.2 Knowledge Commitment Scheme Based on Bilinear Pairing

Let $P, Q \in G_1$ be two generators of the group $G_1$. Nobody knows the discrete logarithm of $Q$ with respect to $P$ (that is, nobody knows the $n \in \mathbb{Z}_q^*$ such that $Q = nP$). To commit to $s \in \mathbb{Z}_q^*$, we just have to compute the commitment $COM(s) = e(P, Q)^s$. When revealing the commitment, only $s$ needs to be disclosed, and the verifier can check whether the commitment revealed by the dealer is correct according to $COM(s) = e(P, Q)^s$.

2.3 Micali-Rabin's Random Vector Representations

We adopt the knowledge commitment scheme based on bilinear pairing by Tian [?]; equality is proved by zero-knowledge proofs. The properties of the bilinear pairing satisfy the above assumption, $F_q$ is a finite field and $q$ is a large prime number of 128 bits.

Definition 1. A random vector representation of $x$ is a vector $X = (u, v)$, where $u, v \in \mathbb{Z}_q^*$, $u$ is randomly chosen, and $v = (x - u) \bmod q$. The value of the vector $X$ is $val(X) = (u + v) \bmod q$.

Definition 2. The commitment to a vector $X = (u, v)$ is $COM(X) = (COM(u), COM(v))$, where $COM(u) = e(P, Q)^u$ and $COM(v) = e(P, Q)^v$.

Definition 3. A list of commitments $COM(X^{(j)})$, $1 \le j \le m$, is called value consistent if $val(X^{(j)}) = val(X^{(j+1)})$ for every $1 \le j < m$.

To prove that $COM(X)$ and $COM(Y)$ are value consistent, where $X = (u_1, v_1)$ and $Y = (u_2, v_2)$, we prove $val(X) = val(Y)$. Note that $val(X) = val(Y)$ if and only
if there exists $w \in \mathbb{Z}_q^*$ such that $X = Y + (-w, w)$ (if no such $w$ exists, the values are inconsistent). A challenge $c \leftarrow \{1, 2\}$ is chosen at random. If $c = 1$, the prover reveals $u_1, u_2, -w$ to the verifier. The verifier computes $COM(u_1), COM(u_2)$ and compares them with the posted first coordinates of $COM(X), COM(Y)$. The verifier next checks that $u_1 = u_2 - w$ holds. Conversely, if $c = 2$, the prover reveals $v_1, v_2, w$ to the verifier. The verifier computes $COM(v_1), COM(v_2)$ and compares them with the posted second coordinates of $COM(X), COM(Y)$. The verifier next checks that $v_1 = v_2 + w$ holds. The verifier thus accepts a false claim with probability at most $\frac{1}{2}$.

Lemma 1. If more than $k$ commitments are false, then the probability that the verifier accepts is at most $(\frac{1}{2})^k$.

Proof. The probability that any single inconsistency $val(X^{(j)}) \ne val(X^{(j+1)})$ is not detected is at most $\frac{1}{2}$. So, with a randomly chosen challenge $c \leftarrow \{1, 2\}$ for each check, the probability that at least $k$ false claims all go undetected is at most $(\frac{1}{2})^k$.

3 Scheme

This scheme assumes that the dealer D needs to share the secret $s$ among $n$ participants. The scheme also assumes the existence of a secure bulletin board (SBB), which is used to publish data; once published, data cannot be deleted or modified, and it is visible to the dealer and all participants. The scheme includes a Distribution Phase, a Verification Phase and Secret Reconstruction.

Initialization: Assume that $F_q$ is a finite field and $q$ is a large prime number, $G_1$ and $G_2$ are respectively an additive cyclic group and a multiplicative cyclic group of order $q$, and $P, Q \in G_1$ are two generators of the group $G_1$. The properties of the bilinear pairing satisfy the above assumption, and there is an efficient algorithm for the mapping $e : G_1 \times G_1 \to G_2$. $H : G_2 \to \mathbb{Z}_q^*$ is a collision-resistant hash function.

The secret distributor specifies the authorized subsets. Assume that the participant set is $P = \{P_1, P_2, \cdots, P_n\}$, $\Gamma_0 = \{\delta_1, \delta_2, \cdots, \delta_t\}$ is the minimum access structure, and $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ is an authorized subset, where $|\delta_j|$ is the number of members in $\delta_j$. The secret distributor is SD, the secret reconstructor is SR, and the shared secret is $s$.

3.1 Distribution Phase

Step 1. Each participant $P_{ij}$ randomly chooses $s_{ij} \in \mathbb{Z}_q^*$ and computes $R_{ij} = e(P, Q)^{s_{ij}}$. The participant keeps $s_{ij}$ secret and delivers $R_{ij}$ to SD.

Step 2. SD randomly selects $s_0$ and computes $R_0 = e(P, Q)^{s_0}$. Then SD chooses $a \in \mathbb{Z}_q^*$ randomly and constructs a first-degree polynomial $f(x) = (s + ax) \bmod q$. SD also chooses $t$ different random numbers $d_1, d_2, \cdots, d_t$ to represent the $t$ authorized subsets in $\Gamma_0$ respectively. Next, SD computes $f(1)$, and for each authorized subset $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ in $\Gamma_0$ computes $H_j = f(d_j) \oplus H(R_{1j}^{s_0}) \oplus H(R_{2j}^{s_0}) \oplus \cdots \oplus H(R_{|\delta_j|j}^{s_0})$. Finally, SD publishes $R_0, f(1), H_1, H_2, \cdots, H_t, d_1, d_2, \cdots, d_t$ on the SBB.

3.2 Verification Phase

All participants of any authorized subset $\delta_j$ can cooperate to reconstruct the secret $s$. Assume that the participants of $\delta_j = \{P_{1j}, P_{2j}, \cdots, P_{|\delta_j|j}\}$ reconstruct the secret $s$.

Step 3. Each participant $P_{ij}$ computes $R_{ij}' = R_0^{s_{ij}}$ based on the public information $R_0$ on the SBB. Each participant $P_{ij}$ then posts on the SBB $3k$ rows of commitments to $R_{ij}'$: $COM(R_{1j}'^{(h)}), \cdots, COM(R_{|\delta_j|j}'^{(h)})$, $1 \le h \le 3k$. The $3k$ rows use Micali-Rabin's random vector representations technique: $COM(R_{ij}'^{(h)}) = (COM(u_{ij}'^{(h)}), COM(v_{ij}'^{(h)}))$, where $R_{ij}'^{(h)} = (u_{ij}'^{(h)}, v_{ij}'^{(h)})$ and $val(R_{ij}'^{(h)}) = (u_{ij}'^{(h)} + v_{ij}'^{(h)}) \bmod q$, $1 \le h \le 3k$.

Step 4. To begin with, half of the commitments among the $3k$ rows of $R_{ij}'$ are chosen at random, and $P_{ij}$ secretly reveals $R_{ij}'$ and the committed values $R_{ij}'^{(h)} = (u_{ij}'^{(h)}, v_{ij}'^{(h)})$ of these rows to SR.

Next, the value of $c \leftarrow \{1, 2\}$ is determined by flipping a coin, and part of the remaining commitments is opened. If $c = 1$, $P_{ij}$ secretly opens the commitments $COM(u_{ij}'^{(h)}), COM(u_{ij}'^{(h+1)})$ and reveals $-w$ to SR, where $w = (u_{ij}'^{(h+1)} - u_{ij}'^{(h)}) \bmod q$. If $c = 2$, $P_{ij}$ secretly opens the commitments $COM(v_{ij}'^{(h)}), COM(v_{ij}'^{(h+1)})$ and reveals $w$ to SR, where $w = (v_{ij}'^{(h)} - v_{ij}'^{(h+1)}) \bmod q$.

Step 5. SR privately receives the committed values $R_{ij}'^{(h)} = (u_{ij}'^{(h)}, v_{ij}'^{(h)})$ and $R_{ij}'$ sent by $P_{ij}$. SR first verifies that the equation $val(R_{ij}'^{(h)}) = (u_{ij}'^{(h)} + v_{ij}'^{(h)}) \bmod q$ holds and agrees with $R_{ij}'$.

Next, SR performs a value consistency check on the remaining commitments. If $c = 1$, SR checks the opened commitments $COM(u_{ij}'^{(h)}), COM(u_{ij}'^{(h+1)})$ and $-w$, then verifies that the equation $u_{ij}'^{(h)} = (u_{ij}'^{(h+1)} + (-w)) \bmod q$ holds. If $c = 2$, SR checks the opened commitments $COM(v_{ij}'^{(h)}), COM(v_{ij}'^{(h+1)})$ and $w$, then verifies that the equation $v_{ij}'^{(h)} = (v_{ij}'^{(h+1)} + w) \bmod q$ holds. Since only one coordinate of each pair is opened at a time, by Lemma 1 the probability that SR accepts a single false equation is at most $\frac{1}{2}$; if more than $k$ commitments are false, the probability that SR accepts is at most $(\frac{1}{2})^k$.
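The value consistency check used in Section 2.3 and in Steps 4 and 5 can be sketched as follows. This is a minimal illustration under loud assumptions: no pairing library is used, and the commitment $e(P, Q)^s$ is replaced by exponentiation $g^s \bmod p$ in a prime-order subgroup, with toy parameters `P_MOD = 2039`, `Q = 1019`, `G = 4` that are hypothetical and insecure. The function names are illustrative and do not come from the paper.

```python
import secrets

# Hypothetical toy group (assumption): G generates the subgroup of prime order Q
# in Z_p^*, standing in for the pairing value e(P, Q). Not secure.
P_MOD, Q, G = 2039, 1019, 4

def commit(s):
    """Stand-in for COM(s) = e(P, Q)^s."""
    return pow(G, s % Q, P_MOD)

def represent(x):
    """Random vector representation X = (u, v) with u + v = x mod q."""
    u = secrets.randbelow(Q)
    return (u, (x - u) % Q)

def val(X):
    """val(X) = (u + v) mod q."""
    return (X[0] + X[1]) % Q

def commit_vec(X):
    """COM(X) = (COM(u), COM(v))."""
    return (commit(X[0]), commit(X[1]))

def prover_open(X, Y, c):
    """Reveal coordinate c of X and Y plus w, where X = Y + (-w, w)."""
    w = (Y[0] - X[0]) % Q
    return X[c - 1], Y[c - 1], w

def verifier_check(com_X, com_Y, c, opening):
    """Check the opened coordinate against the posted commitments and w."""
    a, b, w = opening
    if commit(a) != com_X[c - 1] or commit(b) != com_Y[c - 1]:
        return False
    # c = 1 checks u1 = u2 - w; c = 2 checks v1 = v2 + w.
    return a == ((b - w) % Q if c == 1 else (b + w) % Q)
```

When $val(X) = val(Y)$, both challenges succeed. When the values differ, whatever $w$ the prover fixes can satisfy at most one of the two checks, which is where the factor $\frac{1}{2}$ per false claim in Lemma 1 comes from.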
3.3 Secret Reconstruction

Step 6. SR then receives the verified $R_{ij'}$. With these values, SR can compute $H_{j'} = H_j \oplus H(R_{1j'}) \oplus H(R_{2j'}) \oplus \cdots \oplus H(R_{|\delta_j|j'})$. With the two coordinate points $(1, f(1))$ and $(d_j, H_{j'})$, SR can reconstruct $f(x) = (x f(1) - x H_{j'} - d_j f(1) + H_{j'})(1 - d_j)^{-1}$. Finally, the shared secret is recovered by computing $s = f(0) \bmod q$.

4 Scheme Analysis

4.1 Security Analysis

Theorem 1. If the $R_{ij'}$ received by the secret reconstructor SR are verified by the Micali-Rabin random vector representation technique, then $R_{ij'}$ is correct and has not been modified. Hence the new scheme is verifiable.

Proof. In the verification phase, by adopting the Micali-Rabin random vector representation technique, each participant $P_{ij}$ commits $3k$ rows $COM(R^{(h)}_{1j'}), \cdots, COM(R^{(h)}_{|\delta_j|j'})$, $1 \le h \le 3k$, to $R_{ij'}$ on the SBB. SR verifies half of the commitments $u^{(h)}_{ij'}$, $v^{(h)}_{ij'}$ and $R_{ij'}$ sent by $P_{ij}$ (that is, it verifies that $val(R^{(h)}_{ij'}) = (u^{(h)}_{ij'} + v^{(h)}_{ij'}) \bmod q$ holds). If verification fails, SR rejects the $R_{ij'}$ sent by the dealer. If it succeeds, SR performs a value consistency check on the remaining commitment values, that is, it verifies $COM(u^{(h)}_{ij'}) = (COM(u^{(h+1)}_{ij'}) + (-w)) \bmod q$ or $v^{(h)}_{ij'} = (u^{(h+1)}_{ij'} + w) \bmod q$. According to Lemma 1, if more than $k$ commitment values are wrong, the probability of SR accepting the wrong results is $(\frac{1}{2})^k$. To sum up, the $R_{ij'}$ verified by the Micali-Rabin random vector representation technique is correct and has not been modified, so the scheme is verifiable.

Theorem 2. The knowledge commitment scheme based on bilinear pairing meets the requirements of complete hiding and computational binding.

Proof. Assume that there exists $s' \in Z_q^*$ with $s' \ne s$ such that $COM(s') = COM(s)$ (that is, the dealer can open the commitment in two ways). Write $s = s' + t$ with $0 < t < q$, so that $e(P,Q)^{s'} = e(P,Q)^{s}$. Because $P, Q \in G_1$ are two generators of the group $G_1$ and $q$ is the large prime order of $G_1$, we have $qP = 0$ and $qQ = 0$ (0 is the point at infinity of $G_1$). From $e(P,Q)^{s'} = e(P,Q)^{s} = e(sP,Q) = e(s'P,Q)$ we get $e(s'P,Q) = e(sP,Q)$, so $sP = s'P$. Hence $sP - s'P = tP = 0$ with $s = s' + t$ and $0 < t < q$. But $0 < t < q$ contradicts the fact that $P$ has order $q$, so it must be that $s = s'$; that is, the dealer can open the commitment in only one way. So the scheme meets the requirement of computational binding.

Also, $COM(s) = e(sP, Q) = e(P, sQ)$, and it is not computationally feasible for an attacker to obtain the specifics of the commitment, since the computational Diffie-Hellman problem (CDHP) of bilinear pairings is hard to solve. In conclusion, the knowledge commitment scheme based on bilinear pairing meets the requirements of complete hiding and computational binding.

Theorem 3. Assuming that the computational Diffie-Hellman problem (CDHP) of bilinear pairings is hard to solve, the proposed scheme is secure.

Proof. In the verification phase, attackers try to get $s_0$ and $s_{ij}$ from the commitment $R_0$ and the $3k$ rows of commitments $COM(R^{(h)}_{1j'}), \cdots, COM(R^{(h)}_{|\delta_j|j'})$, $1 \le h \le 3k$, posted on the SBB. They have to solve $R_0 = e(P,Q)^{s_0}$, $COM(u^{(h)}_{ij'}) = e(P,Q)^{u^{(h)}_{ij'}}$ and $COM(v^{(h)}_{ij'}) = e(P,Q)^{v^{(h)}_{ij'}}$. However, the discrete logarithm problem (DLP) on the elliptic curve and the CDHP of bilinear pairings are hard to solve. So it is not computationally feasible to get the specifics of the commitments.

At the same time, suppose the participant $P_{ij}$ tries to distribute a false $R_{ij'}$ to SR in the verification phase, that is, $COM(R'_{ij'}) = COM(R_{ij'})$, where $R'_{ij'} \ne R_{ij'}$ and $R'_{ij'}, R_{ij'} \in Z_q^*$. According to Theorem 2, the knowledge commitment scheme based on bilinear pairing meets the requirement of computational binding, so the dealer can open the commitment in only one way. So $P_{ij}$ cannot send a false $R_{ij'}$ to SR. Also, the secret shares $s_{ij}$ of each participant $P_{ij}$ in this scheme are chosen by the participants themselves, which avoids deception by the distributor.

4.2 Performance Analysis

This section briefly analyzes the performance of the proposed scheme by comparing it with existing schemes. $T_e$ denotes the time of executing a bilinear pairing, $T_m$ the time of executing a scalar multiplication in $G_1$, $T_{exp}$ the time of executing an exponentiation in $G_2$, and $T_p$ the time of computing a polynomial value. The time of a modular addition in $Z_q^*$ and of a one-way hash function is negligible compared with $T_e$ and $T_m$. Therefore, we consider only the time-consuming operations $T_e$, $T_m$, $T_{exp}$ and $T_p$; other computational overhead is ignored. The computational efficiency is shown in Table 1.

As mentioned above, $|\delta_j|$ denotes the number of members in an authorized subset, hence $|\delta_j| \le n$. Therefore, the performance analysis shows that in our scheme, with the adoption of the Micali-Rabin random vector representation technique, the verification process is much less computationally intensive. Compared with the above two schemes, the computational cost of the distribution phase is also significantly improved.
Table 1: Comparison of computational costs
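| Schemes | Distribution Phase | Verification Phase | Reconstruction Phase |
|---|---|---|---|
| TIAN's scheme [8] | $(3n + t)G_1$ | $tT_{exp} + G_1 + (t + 1)T_e$ | $3tT_m + 2tT_e$ |
| ZHANG's scheme [14] | $(n + 1)T_m + 2tT_{exp}$ | $tT_e + n(t + 1)T_{exp}$ | $tT_m$ |
| Tsu-Yang's scheme [15] | $nT_e + (4n + t)T_m + nT_{exp} + nT_p$ | $(n + 3)T_e + n(t + 1)T_m + nT_{exp} + ntT_p$ | $tT_m$ |
| TIAN's scheme [8] | $(n + 1)T_e + nT_{exp}$ | $6k\lvert\delta_j\rvert T_e + \lvert\delta_j\rvert T_{exp}$ | $T_m$ |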
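The two-point reconstruction of Step 6 in Section 3.3 can be sketched as follows. The sketch evaluates $f(x) = (x f(1) - x H_{j'} - d_j f(1) + H_{j'})(1 - d_j)^{-1}$ at $x = 0$ modulo $q$. The function name, the toy prime, and the sample line $f(x) = s + ax$ are assumptions made for illustration; in the scheme, the second point comes from the XOR of hashed values rather than from evaluating $f$ directly.

```python
def reconstruct_secret(f1: int, d: int, h: int, q: int) -> int:
    """Recover s = f(0) mod q from the points (1, f1) and (d, h) on a line."""
    if (1 - d) % q == 0:
        raise ValueError("d must differ from 1 modulo q")
    inv = pow((1 - d) % q, -1, q)  # modular inverse (Python 3.8+)
    # f(x) = (x*f1 - x*h - d*f1 + h) * (1 - d)^(-1), evaluated at x = 0
    return ((-d * f1 + h) * inv) % q

q = 101                 # toy prime modulus (assumption)
s, a = 42, 17           # toy secret and slope (assumption)

def f(x: int) -> int:
    """Toy first-degree polynomial f(x) = s + a*x mod q."""
    return (s + a * x) % q

print(reconstruct_secret(f(1), 5, f(5), q))  # prints 42
```

Only one modular inversion and a few multiplications are needed, which matches the low reconstruction cost claimed for the scheme.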
4.3 Simulation Analysis

For the generalized verifiable secret sharing scheme based on the Micali-Rabin random vector representation technique proposed in this paper, a simulation analysis was carried out in combination with an actual scenario. All data are averages over 10 experimental runs. The execution performance of the secret distribution process is shown in Figure 1. The execution time grows linearly with the number of participants, because the number of bilinear pairing and exponentiation operations increases with the number of participants. As can be seen from Figure 2, because the sub-secret is verified by the Micali-Rabin random vector representation technique in the verification phase, the calculation time of the secret verification process is less affected by the number of participants, and the calculation time of the secret reconstruction process does not change much with the number of participants. In general, the scheme has good application value in practical scenarios.

Figure 1: The curve of secret distribution calculations as the number of people changes

Figure 2: The curve of calculation costs as the number of people changes in the verification phase, and the curve of secret reconstruction calculations as the number of people changes

5 Conclusions

The (t,n) threshold secret sharing scheme has certain limitations in practical applications, so it is of great value to study secret sharing with a general access structure. This paper proposes a verifiable secret sharing scheme based on a general access structure. Firstly, our scheme adopts the knowledge commitment scheme based on bilinear pairing to guarantee the concealment and security of public information. Secondly, our scheme adopts the Micali-Rabin random vector representation technique, which greatly improves the efficiency of the verification phase. Finally, the analysis of the security and performance of the scheme shows that it satisfies the properties of verifiable secret sharing and is more efficient than the existing schemes. Future work is to design an efficient secret sharing scheme with a general access structure that can be publicly verified and applied to suitable application scenarios.

Acknowledgments

This work is supported by Key Projects of the Union Fund of the National Natural Science Foundation of China under Grant No. U1836205; The National Natural Science Foundation of China under Grant No. 61772008; The Science and Technology Top-notch Talent Support Project in Guizhou Province Department of Education under Grant No.Qian Education Combined KY word [2016]060; The Guizhou Province Science and Technology Major Special Plan No. 30183001; The Guizhou Provincial Science and Technology Plan Project under Grant No. [2017]5788; The Ministry of Education-China Mobile Research Fund Project under Grant No. MCM20170401; The Guizhou University Fostering Project No. [2017]5788; Research on Block Data Fusion Analysis Theory and Security Management Model of Data Sharing Application (No. U1836205); Research on Key Technologies of Blockchain for Big Data Applications (Grant No. [2019]1098).
References

[1] C. Asmuth and J. Bloom, "A modular approach to key safeguarding," IEEE Transactions on Information Theory, vol. 29, no. 2, pp. 208-210, 1983.
[2] G. R. Blakley, "Safeguarding cryptographic keys," IEEE Computer Society Digital Library, vol. 22, no. 11, pp. 612-613, 1979.
[3] B. Chor and S. Goldwasser and S. Micali and B. Awerbuch, "Verifiable secret sharing and achieving simultaneity in the presence of faults," in Foundations of Computer Science, pp. 383-395, 1985.
[4] Y. Gan and L. Wang and P. Pan and Y. Yang, "Publicly verifiable secret sharing scheme with provable security against chosen secret attacks," International Journal of Distributed Sensor Networks, pp. 1-9, 2013.
[5] S. Goldwasser and S. Micali and C. Rackoff, "The knowledge complexity of interactive proof systems," SIAM Journal on Computing, vol. 18, no. 1, pp. 186-208, 1989.
[6] J. Halpern and V. Teague, "Rational secret sharing and multiparty computation: Extended abstract," The 36th ACM Symposium on Theory of Computing, pp. 623-632, 2004.
[7] L. Harn and C. Hsu and M. Zhang and T. He and M. Zhang, "Realizing secret sharing with general access structure," Information Sciences, vol. 367-368, pp. 209-220, 2016.
[8] M. Ito and A. Saito and T. Nishizeki, "Secret sharing scheme realizing general access structure," Electronics and Communications in Japan Part III-Fundamental Electronic Science, vol. 72, no. 9, pp. 56-64, 1989.
[9] M. O. Rabin and Y. Mansour and S. Muthukrishnan and M. Yung, "Strictly-black-box zero-knowledge and efficient validation of financial transactions," in International Colloquium Conference on Automata, pp. 738-749, 2012.
[10] A. Shamir, "How to share a secret," Communications of the ACM, vol. 22, no. 11, pp. 612-613, 1979.
[11] M. Stadler, "Publicly verifiable secret sharing," in Theory and Application of Cryptographic Techniques, pp. 190-199, 1996.
[12] Y. L. Tian and J. Katz J. F. Ma and C. G. Peng, "One-time rational secret sharing scheme based on Bayesian game," Wuhan University Journal of Nature Science, vol. 16, pp. 430-434, 2011.
[13] Y. Tian and C. Peng, "Publicly verifiable secret sharing schemes using bilinear pairings," International Journal of Network Security, vol. 14, no. 3, pp. 142-148, 2012.
[14] T. Wu and Y. Tseng, "A pairing-based publicly verifiable secret sharing scheme," Journal of Systems Science and Complexity, vol. 24, no. 1, pp. 186-194, 2011.
[15] Y. Xin and M. O. Rabin, "Fair and rational delegation computation protocol," Journal of Software, vol. 29, no. 7, pp. 1953-1962, 2018.
[16] F. Zhang, "Efficient and information-theoretical secure verifiable secret sharing over bilinear groups," Chinese Journal of Electronics, vol. 23, no. 1, pp. 13-17, 2014.

Biography

Haiou Yang biography. He received the B.Sc. degree in Information Management and System from Dalian Commuication University in 2017. He is now a postgraduate student at Guizhou University. His research interests include Security, Cloud computing and Cryptographic protocols.

Youliang Tian biography. He received the B.Sc. degree in Mathematics and Applied Mathematics in 2004 and the M.Sc. degree in Applied Mathematics from Guizhou University in 2009. He received the Ph.D. degree in cryptography from Xidian University in 2012. From 2012 to 2015, he was a Postdoctoral Associate at the State Key Laboratory for Chinese Academy of Sciences. He is currently a professor and Ph.D. supervisor at College Of Computer Science and Technology, GuiZhou University. His research interests include algorithm game theory, cryptography and security protocol.

International Journal of Network Security, Vol.22, No.5, PP.815-827, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).12) 815
A Privacy-Preserving Data Sharing System with Decentralized Attribute-based Encryption Scheme
Li Kang and Leyou Zhang (Corresponding author: Li Kang)
School of Mathematics and Statistics, Xidian University Xi’an, Shaanxi 710071, China (Email: li [email protected]) (Received Mar. 26, 2019; Revised and Accepted Nov. 16, 2019; First Online Jan. 29, 2020)
Abstract

The huge storage space and powerful computing power make cloud servers the best place for people to store data. However, convenience comes with the risk of data leakage, as cloud servers are not completely reliable. To ensure data confidentiality and user privacy, decentralized attribute-based encryption (ABE) can provide a solution. Nevertheless, most existing works cannot provide a complete method, since they have vulnerabilities in user collusion resilience and privacy protection. In this paper, a decentralized ciphertext-policy (CP) ABE scheme is proposed for the secure sharing of data. In the proposed scheme, authorities can issue private keys for users independently through a privacy-preserving key generation protocol, without requiring any cooperation or knowing users' global identifiers (GID). Additionally, this scheme supports policy anonymity. The security of the proposed scheme is reduced to the decisional bilinear Diffie-Hellman (DBDH) assumption. Theoretical analysis and performance evaluations illustrate the efficiency of the scheme.

Keywords: Cloud Storage; Data Sharing; Decentralized CP-ABE; Privacy-preserving

1 Introduction

1.1 Background

The emergence of the big data era has made cloud storage one of the hottest technologies. The cloud has become the best choice for data users to store data, as its storage space is much larger than local storage. Cloud storage is a technology that puts data on the cloud for access, so that users can easily access data at any time and anywhere through any connected device. However, it also brings some challenges, such as data confidentiality.

In many cases, data owners are reluctant to expose their data stored in the cloud server to all users, as the data may be sensitive, such as the personal health record (PHR). A PHR is a summarized electronic record of an individual's medical data and information, which is often exchanged through cloud servers. However, the record may contain sensitive information about consumers, such as allergies and diseases. Placing the raw PHR data directly on an unreliable third-party server threatens the owner's privacy. Thus, to protect data confidentiality, an effective approach is for the owner to encrypt the data before uploading it to the cloud server.

In 2005, Sahai and Waters [30] proposed the concept of attribute-based encryption (ABE) to support fine-grained access control, in which the user can successfully recover the received ciphertext only when the attributes of the ciphertext match the decryption keys. Since then, many studies [4,22,25] have been proposed. The multi-authority ABE (MA-ABE) [5] was introduced to reduce the burden on the central authority (CA) and to divide its powers. In 2011, the decentralized ABE scheme [18] was proposed by Lewko and Waters as a new MA-ABE scheme. Neither a CA nor authority cooperation exists in this construction, so it better matches the practical requirement that authorities be independent. The basic properties [17,21] that all ABE schemes should satisfy are: fine-grained access control, scalability, data confidentiality, and collusion resistance.

In addition, the privacy protection of users cannot be ignored. Although the rapid development of cloud computing technology enables users to deal with a large amount of digital information, the privacy of personal data is also facing unprecedented challenges. After witnessing some information leakage incidents, people have gradually realized the importance of privacy protection. They want to keep their data private while gaining legal access. Therefore, there is an urgent need for an encryption
scheme that can protect users' privacy.

Figure 1: Colluding authorities gather information about the user

In MA-ABE schemes, in order to protect data confidentiality against collusion attacks, the global identifier (GID) is usually tied to the decryption keys. Although introducing the global identifier into secret keys enhances their uniqueness and can resist the collusion attack of unauthorized users to a certain extent, the user's privacy will inevitably be compromised if he/she directly sends the undisguised identifier to each authority in a private key request. As shown in Figure 1, suppose that the authority $A_i$ $(i = 1, 2, 3)$ manages the attribute set $\tilde{A}_i = \{a_{i,j}\}_{j \in \{1,2\}}$, and the user $U$, who owns attributes $\{a_{1,1}, a_{2,1}, a_{3,1}\}$ and identifier $u$, directly submits $(u, a_{i,1})$ to $A_i$ in a secret key request. By jointly tracking the same $u$ [6], the colluding authorities can easily gather all the attributes attached to it. As a result, the user's personal information $U(u, \{a_{1,1}, a_{2,1}, a_{3,1}\})$ is thoroughly exposed to them. Therefore, hiding the GID is the primary task in constructing a privacy-preserving encryption scheme. Besides, the access policy also needs to be hidden because it indicates the legal recipient. Malicious users can infer the attributes of the recipient based on the access policy, which also compromises user privacy. Based on this, a secure decentralized scheme dedicated to privacy protection is proposed.

1.2 Related Work

Many extended schemes [1, 29, 35] have been derived from the original ABE scheme [30]. In general, ABE comes in two categories: key-policy ABE (KP-ABE) [11] and ciphertext-policy ABE (CP-ABE) [2]. The CP-ABE scheme enables data owners to achieve absolute control over their data and decide who can and cannot access the data, since the access policy is determined by them. In a data encryption system, considering the computational and storage costs of the authorized organization, a single-authority ABE scheme like [36] is no longer applicable. Since there is only one trusted central authority (CA) to manage all users, the computational and communication costs of the CA undoubtedly increase. More notably, the excessive power of the CA is a potential risk because it holds all users' information and decryption keys, so that it can access all encrypted files. Once the CA is corrupted, the confidentiality of the data and the user privacy will be compromised. In addition, there are many categories of attributes in the real world, so it is impractical to have only one authority to manage them.

In 2007, Chase [5] put forward a multi-authority ABE (MA-ABE) scheme to weaken the power of the central authority (CA). Instead of a single authority, many authorities issue private keys for users. While sharing the load of a single authority, the multiple authorities divide its rights and provide a more secure and effective management mode. However, there is still a CA to manage the other attribute authorities, and the multiple authorities need to cooperate to initialize the system. Chase and Chow [6] first introduced the concept of privacy protection and proposed a privacy-preserving MA-ABE scheme, which employed an anonymous key issuing protocol to hide the user's GID and employed a distributed pseudorandom function (PRF) to get rid of the CA. Later, some privacy-preserving MA-ABE schemes for PHR systems were proposed [19,27,32]. These schemes employed the distributed PRF technique in [6] to protect the GID information. In these schemes, multiple authorities manage disjoint sets of attributes. Though a CA is not required, there is still a partnership among the authorities: every two authorities $(A_k, A_j)$ must perform a 2-party key exchange to share the PRF seed $s_{k,j} = s_{j,k}$, which increases communication and computing costs. In 2011, Lewko and Waters [18] introduced a novel MA-ABE, namely the decentralized attribute-based encryption scheme. Unlike the previous schemes, there is no CA in this scheme, and the authorities are independent of each other without any cooperation.

The first decentralized KP-ABE scheme considering user GID privacy was proposed by Han et al. [12] in 2012. Unfortunately, it turned out to be unsafe [10]. Then a series of improved schemes [28, 34] were proposed. In scheme [34], Zhang et al. modified the original scheme and came up with a more secure scheme against user collusion. All the schemes mentioned above exploited the technique of hiding the GID in [6] to produce the secret keys for users.

In 2015, Huang et al. [15] put forward a revocable MA-ABE scheme. In their scheme, a CA is needed to send the seed $s_k$ to each authority $A_k$ and generate the secret key component for users. To protect user GID and attributes from being leaked, Han et al. [13] proposed a PPDCP-ABE scheme. However, Wang et al. [31] found that it could neither resist the collusion attack nor achieve attribute hiding. In 2018, the scheme [14] gave a new idea. In their proposal, there exists an identity management (IDM) entity that actually acts as a fully trusted CA, together with multiple attribute authorities. IDM is responsible for generating a pseudonym $Pid_{U,j}$ for the user $U$ corresponding to the authority $A_j$, and generating an anonymous identity certificate $(AID_{Cred,j})$ for the user by signing the pseudonym and attribute information. The $AID_{Cred,j}$ is
Table 1: Comparisons of the proposed scheme with other MA-ABE schemes
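| Schemes | Central Authority | AA Cooperation | GID Anonymous | Policy Anonymous | Decryption Outsourced | Collusion Resistance (User) | Collusion Resistance (Authority) |
|---|---|---|---|---|---|---|---|
| Huang [15] | Yes | No | No | No | No | Yes | N-1 |
| Feng [9] | Yes | Yes | Yes | Yes | No | Yes | N-2 |
| Fan [7] | Yes | No | No | Yes | No | Yes | N-1 |
| Hu [14] | Yes | No | Yes | No | No | Yes | N-1 |
| Chase [6] | No | Yes | Yes | No | No | – | N-2 |
| Qian [27] | No | Yes | Yes | No | No | – | N-1 |
| Zhang [34] | No | No | Yes | No | No | Yes | N-1 |
| Qian [26] | No | No | Yes | No | No | No | N-1 |
| Feng [8] | No | No | No | Yes | No | Yes | – |
| Han [13] | No | No | Yes | No | No | No | N-1 |
| Lyu [23] | No | No | Yes | No | Yes | No | N-1 |
| Ours | No | No | Yes | Yes | Yes | Yes | N-1 |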
sent to $A_j$ for verification, and if the verification passes, the authority generates partial secret keys for the user. During the whole process, the authority cannot learn the user's GID and attribute information, so the user's privacy is protected. This scheme is resistant to user collusion and authority collusion; however, the existence of the IDM and the collaboration between the IDM and multiple authorities suggest that further improvement is needed.

In 2017, Lyu et al. [23] proposed a decentralized scheme that achieves GID anonymity and improves efficiency by using online/offline encryption and verifiable outsourced decryption. This scheme can tolerate $N - 1$ compromised authorities. However, the authors did not address the privacy of the access policy.

In consideration of GID and policy privacy, Qian et al. [26] employed AND and OR gates on multi-valued attributes and put forward a PPDCP-ABE scheme with fully hidden policy in 2013. But a malicious user can guess the access structure with a simple test: $e(C_{i,j,1}, g) \stackrel{?}{=} e(\prod_{k \in I_c} T^{k}_{i,j}, C_{i,j,2})$. Fan et al. [7] came up with an MA-ABE scheme with a hidden policy under the q-BDHE assumption. In 2018, a new access control system [8] was proposed by Feng et al., in which attributes are divided into attribute names and attribute values, and policy hiding is realized by hiding attribute values. In this scheme, the GID is bound to the time period, namely $H_1(GID \| Time)$, to resist the collusion attack of illegal users. In 2019, Feng et al. [9] improved the scheme [16] and proposed a privacy-preserving searchable CP-ABE scheme, which achieved attribute anonymity and collusion resistance. However, a CA is needed in this scheme to issue the random identity (RID) and user key (UK) for each user, where the RID replaces the user's real identity to achieve identity anonymity.

1.3 Contributions

As shown in Table 1, existing solutions either have weaknesses in privacy protection and collusion resistance, or require a CA or cooperation among multiple authorities to generate private keys for users. To improve user privacy and data confidentiality, a privacy-preserving decentralized CP-ABE scheme is proposed, in which the CA is no longer needed and each authority can independently issue a partial secret key for the user. The proposed scheme supports both user anonymity and collusion resistance, and a proxy server is introduced to undertake partial decryption computation. The main contributions of this paper are listed below.

• A decentralized CP-ABE scheme is presented to meet the requirements of a distributed system where there is no central authority and multiple attribute authorities can independently manage users' attributes and generate secret keys for them.

• By using the privacy-preserving key generation protocol (PPKeyGen) and composite order bilinear groups, we hide both the GID and the access policy, thus providing stronger privacy.

• The proposed construction can resist collusion attacks of unauthorized users and tolerate $N - 1$ corrupted authorities ($N$ is the number of attribute authorities involved in encryption), which provides a guarantee for data confidentiality.

• Theoretical analysis and experimental simulations are given to support the proposed scheme in terms of security and efficiency.

1.4 Organization

The rest of the paper is organized as follows. In Section 2, some preliminaries involved in this paper are listed. The system model and security model are described in Section 3. In Section 4, the proposed scheme is presented in detail. Sections 5 and 6 show the security analysis and performance analysis, respectively. A brief conclusion is given in Section 7.

2 Preliminaries

2.1 Composite Order Bilinear Groups

Assume that $\mathbb{G}$ and $\mathbb{G}_T$ are two cyclic groups with the same order $N = pp_1$, where $p$ and $p_1$ are two different primes, and $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$ is a bilinear map. $\mathbb{G}_p$ and $\mathbb{G}_{p_1}$ are subgroups of the group $\mathbb{G}$ having order $p$ and $p_1$, respectively. The map $e$ satisfies the following features:

1) Bilinearity: $\forall f, h \in \mathbb{G}$, $\forall u, v \in Z_N$, the equation $e(f^u, h^v) = e(f, h)^{uv}$ holds.
2) Non-degenerate: $\exists f \in \mathbb{G}$ such that $e(f, h)$ has order $N$ in the group $\mathbb{G}_T$.

3) Orthogonality: Let $g_p$ and $g_{p_1}$ be generators of the groups $\mathbb{G}_p$ and $\mathbb{G}_{p_1}$, respectively. Notice that $e$ is efficiently computable, and it satisfies another property: $e(g_p, g_{p_1}) = 1$.

2.2 Decisional Bilinear Diffie-Hellman (DBDH) Assumption

Suppose $g$ is a generator of the group $\mathbb{G}$, and $a, b, c, z$ are random numbers selected from $Z_p$. The DBDH assumption holds if, given a tuple $(A, B, C, Z) = (g^a, g^b, g^c, Z)$, there is no algorithm $\mathcal{B}$ that can distinguish $Z = e(g, g)^{abc}$ from $Z = e(g, g)^z$ in polynomial time with non-negligible advantage. $\mathcal{B}$'s advantage is defined as:

$Adv^{DBDH}_{\mathcal{B}} = |\Pr[\mathcal{B}(A, B, C, e(g, g)^{abc}) = 1] - \Pr[\mathcal{B}(A, B, C, e(g, g)^{z}) = 1]|$.

2.3 Access Structure

The AND-gate on multi-valued attributes is applied in this scheme as the access policy. Suppose $U = [a_1, a_2, \cdots, a_n]$ is an attribute set, and each attribute $a_i \in U$ has $n_i$ possible values $V_i = \{v_{i,1}, v_{i,2}, \cdots, v_{i,n_i}\}$. Let $S = [S_1, S_2, \cdots, S_n]$ be a user's attribute set where $S_i \in V_i$, and $W = [W_1, W_2, \cdots, W_n]$ be an access structure where $W_i \in V_i$. The symbol $S \models W$ indicates that the attribute set $S$ satisfies the access structure $W$ (namely, $S_i = W_i$), and $S \not\models W$ indicates the opposite.

2.4 Commitment

The commitment scheme adopted in this construction is the Pedersen commitment scheme [24], which is a perfectly hiding scheme. This scheme is stated as follows. Assume $\mathbb{G}$ is a prime-order group with generators $g_0, g_1, \cdots, g_k$. To commit to messages $(m_1, m_2, \cdots, m_k)$, the user picks $r \in_R Z_p$ and calculates $T = g_0^{r} \prod_{j=1}^{k} g_j^{m_j}$. The number $r$ is used by the user to decommit the commitment $T$ when needed.

2.5 Zero-Knowledge Proof

A zero-knowledge proof (ZKP) is a kind of interactive proof in which the prover proves some knowledge to the verifier without leaking it. The ZKP scheme proposed by Camenisch and Stadler [3] is briefly described as follows. Take the following formula as an example,

$PoK\{(\alpha, \beta, \gamma) : y = g^{\alpha} h^{\beta} \wedge y_0 = g_0^{\alpha} h_0^{\gamma}\}$,

where $\{g, h\}$ and $\{g_0, h_0\}$ are generators of the groups $\mathbb{G}$ and $\mathbb{G}_0$, respectively. The values $\alpha$, $\beta$ and $\gamma$ are the knowledge that needs to be proven; the remaining values are used by the verifier to check the equations.

3 Definition and Security Model

3.1 System Model

The system architecture is shown in Figure 2, where there are 5 entities: Data Owner, N Attribute Authorities, Cloud Storage Server, Proxy Server and Data User.

Data Owner: The owner encrypts a data file under his/her own specified access policy, then uploads the ciphertext to the cloud server.

Attribute Authorities: Authorities are responsible for generating secret keys for users. They are independent of each other and manage disjoint attribute sets. The authorities are not entirely trusted, as they will try to collude with other entities to gather useful information in order to obtain illegal profits.

Cloud Storage Server: The server is assumed to be an honest-but-curious entity that stores the encrypted data files uploaded by data owners and provides an access channel for the users. It acts normally most of the time but may attempt to collect as much information as possible.

Proxy Server: The proxy server has strong computing power and only provides a partial decryption computing service for users.

Data User: The user can issue secret key queries to the authorities and download any ciphertext on the cloud server. However, the data can be recovered only when the user's attributes satisfy the access structure. Users are assumed to be untrusted, and they may conspire with other entities to illegally access the data file.

3.2 Outline of Decentralized ABE Scheme

Let $A_1, A_2, \cdots, A_N$ represent $N$ attribute authorities, where each authority $A_k$ monitors a disjoint attribute set $\tilde{A}_k$, namely, $\tilde{A}_k \cap \tilde{A}_j = \emptyset$ $(k, j \in [1, N] \wedge k \ne j)$. $\tilde{L}$ represents a list of attributes for the user $U$.

A decentralized ABE scheme has five algorithms as follows:

Global Setup($1^{\lambda}$): With $\lambda$ as the security parameter, the algorithm outputs the system parameters $PP$.

Authority Setup($PP$): Taking the system parameters $PP$ as input, each authority $A_k$ executes this algorithm to generate its public-secret key pair $(PK_k, SK_k)$.

KeyGen($PP, SK_k, GID, \tilde{A}^k_u$): Given the system parameters $PP$, secret keys $SK_k$, the user's GID and the attribute set $\tilde{A}^k_u$, where $\tilde{A}^k_u = \tilde{A}_k \cap \tilde{L}$, authority $A_k$ performs this algorithm and returns secret keys $SK^k_U$ for the user.
Figure 2: System model of the proposed scheme
Encrypt($PP, PK_k, M, W$): With the system parameters $PP$, public keys $PK_k$, message $M$ and access policy $W$ as input, the encryption algorithm outputs $CT$ as the corresponding ciphertext.

Decrypt($PP, GID, SK^k_U, CT$): This algorithm consists of two phases: Pre-Decrypt and User-Decrypt. The first phase is performed by the proxy server and the second is executed by the data user. The user first modifies the obtained secret keys $SK^k_U$ to construct new keys $SK'^{k}_U$, and then sends $(SK'^{k}_U, CT)$ to the proxy server for partial decryption calculations. Finally, the user performs the remaining decryption calculations with the result $T^*$ returned by the proxy server.

3.3 Security Model

The security model is defined by a security game between adversary $\mathcal{A}$ and challenger $\mathcal{B}$.

Initialization: $\mathcal{A}$ provides an access policy $W^*$ that he/she wants to challenge and a list of corrupted authorities $C_A$ $(|C_A| < N)$ to $\mathcal{B}$.

Global Setup: $\mathcal{B}$ executes this algorithm and returns the system parameters $PP$ to $\mathcal{A}$.

Authority Setup: Two cases are discussed here:

1) For the corrupted authorities, namely $A_k \in C_A$, $\mathcal{B}$ performs the authority setup algorithm and returns the public-secret key pair $(PK_k, SK_k)$ to $\mathcal{A}$.

2) For the uncorrupted authorities, namely $A_k \notin C_A$, $\mathcal{B}$ performs the authority setup algorithm and sends the public keys $PK_k$ to $\mathcal{A}$.

Phase 1: $\mathcal{A}$ initiates a private key query to $\mathcal{B}$ by submitting a list of attributes $\tilde{L}$. The only limitation is that the submitted list of attributes does not satisfy the access policy to be challenged, namely, $\tilde{L} \not\models W^*$. Then $\mathcal{B}$ executes the algorithm KeyGen and outputs the corresponding secret keys to $\mathcal{A}$.

Challenge: The adversary $\mathcal{A}$ provides $M_0$ and $M_1$ to challenger $\mathcal{B}$, where $|M_0| = |M_1|$. Then $\mathcal{B}$ picks a random number $\xi \in \{0, 1\}$. The encryption algorithm is executed to encrypt the message $M_\xi$ under the access policy $W^*$ and outputs the ciphertext $CT^*$ accordingly. Then $\mathcal{B}$ returns the ciphertext $CT^*$ to adversary $\mathcal{A}$.

Phase 2: Same as Phase 1.

Guess: $\mathcal{A}$ returns $\xi'$ as a guess of $\xi$. If $\xi' = \xi$, $\mathcal{A}$ wins the game.

Definition 1. If there is no $t$-time adversary who can make $q$ secret key queries and win the above game with a non-negligible advantage, then the scheme is $(t, q, \epsilon)$ secure in the selective model.

4 Scheme Construction

4.1 Decentralized ABE Scheme

Global Setup($1^{\lambda}$): Taking the security parameter $\lambda$ as input, this algorithm produces two cyclic groups $\mathbb{G}, \mathbb{G}_T$ with the same order $N = pp_1$, where $p$ and $p_1$ are two different primes, and a bilinear map $e : \mathbb{G} \times \mathbb{G} \to \mathbb{G}_T$. Suppose that $\mathbb{G}_p$ and $\mathbb{G}_{p_1}$ are subgroups of $\mathbb{G}$ with order $p$ and $p_1$, respectively. Let $g_{p_1}$ be a generator of $\mathbb{G}_{p_1}$ and $g_p, g_0, g_1$ be generators of $\mathbb{G}_p$. $H : \{0, 1\}^* \to Z_N$ is a hash function. The public parameters are $PP = (e, p, p_1, g_p, g_0, g_1, g_{p_1}, H, \mathbb{G}, \mathbb{G}_T)$.

Authority Setup($PP$): Let $\tilde{A}_k$ be the attribute set monitored by authority $A_k$. Each $A_k$ randomly selects $\alpha_k, \beta_k, a_{k,i,j} \in_R Z_N$ and $R_k, R_{k,i,j} \in_R \mathbb{G}_{p_1}$ $(j = [1, n_i])$. Then $A_k$ computes $Y_k = e(g_p, g_p)^{\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$ and $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$ as its public keys, namely $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$. The secret keys are $SK_k = (\alpha_k, \beta_k, \{a_{k,i,j}\}_{j=[1,n_i]})$.

KeyGen($PP, SK_k, GID, \tilde{A}^k_u$): Let $u = H(GID)$. Given $PP$, $SK_k$ and the user information $(u, \tilde{L})$, $A_k$ performs the PPKeyGen algorithm to produce the decryption keys as follows. For $v_{k,i,j} \in \tilde{A}^k_u$, $A_k$ picks $t^u_{k,i,j} \in_R Z^*_N$ and sets $t_{k,u} = \sum t^u_{k,i,j}$. Then, $A_k$ calculates:

$D_{k,u} = g_p^{\alpha_k} g_0^{\frac{t_{k,u}}{\beta_k + u}} g_1^{u\beta_k}, \quad D_{k,i,j} = g_0^{\frac{t^u_{k,i,j}}{(\beta_k + u) a_{k,i,j}}}$
Here, $\tilde{A}_u^k = \tilde{A}_k \cap \tilde{L}$, and $\tilde{L}$ represents the attribute list of the user. Then, $A_k$ outputs the secret keys

$$SK_U^k = (D_{k,u}, \{D_{k,i,j}\}_{v_{k,i,j} \in \tilde{A}_u^k}).$$

Encrypt($PP, PK_k, M, W$): The owner employs the existing symmetric encryption algorithm to encrypt the data file $M$, and gets $CT' = Enc_K(M)$, where $K$ is the symmetric key. The decentralized ABE scheme is then applied to encrypt $K$ under the access structure $W$ defined by the encryptor. Finally, the owner uploads the ciphertext $\{CT, CT', CT''\}$ to the cloud server. Algorithm 1 gives the specific encryption steps. $I_c$ represents an index set of relevant authorities $A_k$.

Algorithm 1 Encrypt
1: Begin
2: Input $\{PP, PK_k, M, K, W\}$.
3: Select symmetric encryption algorithm $Enc$ and key $K$.
4: for data file $M$ do
5:   $CT' = Enc_K(M)$.
6: end for
7: for symmetric key $K$ do
8:   $CT'' = g^{H(K)}$.
9:   select $s \in_R \mathbb{Z}_N^*$, $R_1, R_2 \in_R \mathbb{G}_{p_1}$.
10:   $C = K \prod_{k \in I_c} Y_k^s$, $C_1 = g_p^s \cdot R_1$, $C_2 = (\prod_{k \in I_c} Z_k^s) \cdot R_2$.
11:   for each attribute $v_{k,x,y} \in W$ do
12:     select $R'_{k,x,y} \in_R \mathbb{G}_{p_1}$.
13:     $C_{k,x,y} = A_{k,x,y}^s \cdot R'_{k,x,y}$.
14:   end for
15:   $CT = (C, C_1, C_2, \{C_{k,x,y}\}_{v_{k,x,y} \in W})$.
16: end for
17: Output $\{CT, CT', CT''\}$.
18: End

Decrypt($PP, GID, SK_U^k, CT$): Any user can download the ciphertext $\{CT, CT', CT''\}$ from the cloud server for decryption. To reduce the computational burden, the user can outsource some decryption operations to the proxy server. The decryption algorithm has two phases: Pre-Decrypt and User-Decrypt. The user first chooses $z \in_R \mathbb{Z}_N^*$, and then calculates the new keys $NK$. Then, the user sends $(CT, NK)$ to the proxy server and keeps $z$ secret.

Pre-Decrypt: This phase is executed by the proxy server. Input $(CT, NK)$, the server performs partial decryption calculation and sends the result $T^*$ to the user.

User-Decrypt: The user decrypts $K$ through the secret value $z$ and the result $T^*$ returned by the server. Then, user uses the symmetric key $K$ to get $M$.

The detail steps are shown in Algorithm 2. In the check phase, the equation $g^{H(K)} \stackrel{?}{=} CT''$ is used to verify the correctness of the symmetric key $K$. Only the user whose attributes satisfy the access policy can get the correct $K$ and further run the symmetric decryption algorithm $Dec$ to obtain the correct data file $M$. It can be known from the algorithm that user only needs to perform one exponential operation and one pairing operation to obtain the symmetric secret key $K$, thus greatly reducing the computing cost of user.

Algorithm 2 Decrypt
1: Begin
2: Input $\{PP, SK_U^k, \{CT, CT', CT''\}\}$.
3: User selects a random number $z \in \mathbb{Z}_N^*$.
4: Calculate ${SK'}_U^k = (D'_{k,u}, D'_{k,i,j}) = (D_{k,u}^{1/z}, D_{k,i,j}^{1/z})$.
5: Set $NK = ({SK'}_U^k)$.
6: User sends $(CT, NK)$ to the proxy server.
7: Pre-Decrypt:
8: Calculate $T^* = \dfrac{\prod_{k \in I_c} e(D'_{k,u}, C_1)}{\prod_{v_{k,x,y} \in W} e(D'_{k,i,j}, C_{k,x,y})}$.
9: Send $T^*$ to the user.
10: User-Decrypt:
11: Calculate $B = e(C_2, g_1^u)$, $K = CB/(T^*)^z$.
12: Check $g^{H(K)} \stackrel{?}{=} CT''$
13: if $g^{H(K)} = CT''$ then
14:   $M = Dec_K(CT')$.
15: else
16:   return $\perp$.
17: end if
18: Output $M$ or $\perp$.
19: End

Correctness.

$$
\begin{aligned}
T^* &= \frac{\prod_{k \in I_c} e(D'_{k,u}, C_1)}{\prod_{v_{k,x,y} \in W} e(D'_{k,i,j}, C_{k,x,y})} \\
&= \frac{\prod_{k \in I_c} e\big((g_p^{\alpha_k} g_0^{\frac{t_{k,u}}{\beta_k+u}} g_1^{u\beta_k})^{1/z},\; g_p^s \cdot R_1\big)}{\prod_{v_{k,x,y} \in W} e\big((g_0^{\frac{t^u_{k,i,j}}{(\beta_k+u)a_{k,i,j}}})^{1/z},\; g_p^{s a_{k,x,y}} R_{k,x,y}^s R'_{k,x,y}\big)} \\
&= \frac{\prod_{k \in I_c} \big(e(g_p, g_p)^{\alpha_k s}\, e(g_p, g_0)^{\frac{s t_{k,u}}{\beta_k+u}}\, e(g_p, g_1)^{s u \beta_k}\big)^{1/z}}{\prod_{k \in I_c} \big(e(g_p, g_0)^{\frac{s t_{k,u}}{\beta_k+u}}\big)^{1/z}} \\
&= \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s/z}\, e(g_p, g_1)^{s u \beta_k/z}
\end{aligned}
$$

$$B = e(C_2, g_1^u) = e\Big(\prod_{k \in I_c} g_p^{\beta_k s} R_k^s \cdot R_2,\; g_1^u\Big) = \prod_{k \in I_c} e(g_p, g_1)^{s u \beta_k}$$

$$K = BC/(T^*)^z = \frac{K \cdot \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s} \cdot \prod_{k \in I_c} e(g_p, g_1)^{s u \beta_k}}{\big(\prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s/z}\, e(g_p, g_1)^{s u \beta_k/z}\big)^z}$$
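The correctness argument above can be checked numerically with a toy model. The sketch below is not the authors' implementation and offers no security: it assumes a stand-in "pairing" in which an element $g^x$ of $\mathbb{G}$ is stored as the exponent $x \bmod N$ and $e(g^x, g^y)$ is stored as $xy \bmod N$, so group multiplication becomes addition of exponents. The primes 1009 and 1013, the helper names (`mul`, `exp`, `pair`, `run_once`), and the simplification that the user holds exactly the attributes in $W$ are all illustrative assumptions.

```python
import hashlib
import math
import secrets

# Toy composite-order setting (assumption for illustration, NOT secure):
# an element g^x of G is stored as x mod N, and e(g, g)^y in G_T as y mod N.
P, P1 = 1009, 1013            # two small distinct primes
N = P * P1
G_P, G_P1 = P1, P             # g_p = g^{p1} has order p; g_{p1} = g^{p} has order p1


def mul(x, y):                # group multiplication = addition of exponents
    return (x + y) % N


def exp(x, k):                # exponentiation = multiplication of exponents
    return (x * k) % N


def pair(x, y):               # e(g^x, g^y) = e(g, g)^{xy}
    return (x * y) % N


def rand_p():                 # random exponent that is nonzero modulo p
    return secrets.randbelow(P - 1) + 1


def rand_gp1():               # random blinding element of G_{p1}
    return exp(G_P1, secrets.randbelow(P1 - 1) + 1)


def inv_p(x):                 # inverse in the exponent of an order-p element
    return pow(x % P, -1, P)


def run_once(gid="alice", n_auth=3, n_attr=2):
    """Encrypt a random K, decrypt with proxy help, return (K, recovered K)."""
    g0, g1 = exp(G_P, rand_p()), exp(G_P, rand_p())
    u = int.from_bytes(hashlib.sha256(gid.encode()).digest(), "big") % N
    s, k_sym = rand_p(), secrets.randbelow(N)
    z = 0
    while math.gcd(z, N) != 1:          # the user's secret z must be invertible mod N
        z = secrets.randbelow(N)
    z_inv = pow(z, -1, N)

    c = k_sym                            # C   = K * prod Y_k^s
    c1 = mul(exp(G_P, s), rand_gp1())    # C_1 = g_p^s * R_1
    c2 = rand_gp1()                      # C_2 = (prod Z_k^s) * R_2, starting from R_2
    t_star = 0                           # proxy's partial result T*
    for _ in range(n_auth):
        alpha, beta = rand_p(), rand_p()
        while (beta + u) % P == 0:       # beta_k + u must be invertible mod p
            beta = rand_p()
        y_k = exp(pair(G_P, G_P), alpha)             # Y_k = e(g_p, g_p)^{alpha_k}
        z_k = mul(exp(G_P, beta), rand_gp1())        # Z_k = g_p^{beta_k} * R_k
        c, c2 = mul(c, exp(y_k, s)), mul(c2, exp(z_k, s))
        t = [rand_p() for _ in range(n_attr)]
        d_u = mul(mul(exp(G_P, alpha), exp(g0, sum(t) * inv_p(beta + u))),
                  exp(g1, u * beta))                 # D_{k,u}
        t_star = (t_star + pair(exp(d_u, z_inv), c1)) % N       # numerator of T*
        for t_j in t:
            a = rand_p()
            a_pub = mul(exp(G_P, a), rand_gp1())     # A_{k,i,j}
            c_attr = mul(exp(a_pub, s), rand_gp1())  # C_{k,x,y}
            d_attr = exp(g0, t_j * inv_p((beta + u) * a))        # D_{k,i,j}
            t_star = (t_star - pair(exp(d_attr, z_inv), c_attr)) % N  # denominator
    b = pair(c2, exp(g1, u))                         # B = e(C_2, g_1^u)
    recovered = (c + b - z * t_star) % N             # K = C * B / (T*)^z
    return k_sym, recovered


if __name__ == "__main__":
    key, recovered = run_once()
    print(key == recovered)
```

In this model every product of a $\mathbb{G}_p$ exponent with a $\mathbb{G}_{p_1}$ exponent is a multiple of $N$, which is why the blinding factors $R_1, R_2, R_k, R'_{k,x,y}$ drop out of the pairings, mirroring the cancellation in the derivation.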
4.2 Privacy-Preserving Key Generation Protocol

Each user has a unique global identifier (GID) that distinguishes him/her from other users. If the GID is exposed to a malicious party, the user's privacy is directly endangered. To hide the GID, zero-knowledge proofs (ZKP) and a commitment scheme can be used to build a privacy-preserving key generation protocol between the relevant authority and the user. Without revealing the GID, the user can prove his/her legitimacy to the authority and get the secret keys generated by the authority. The PPKeyGen algorithm is presented as follows.

1) The user $U$ selects $u, \rho_0 \in \mathbb{Z}_N$, and the authority $A_k$ selects $\rho_1, \beta_k \in \mathbb{Z}_N$. Then $A_k$ executes the 2-party secure computing (2PC) protocol with $U$ and gets $\eta = (u + \beta_k)\rho_0\rho_1 \bmod N$.

2) $U$ picks $z^* \in_R \mathbb{Z}_N$ and runs the Commit algorithm on the GID $u$. Let $T$ be the commitment value. $U$ computes $(T, P_0, T', P_0')$ and sends them to $A_k$. Moreover, $U$ sends $PoK\{(u, \rho_0, z^*): T = g_p^{z^*} g_1^u \wedge P_0 = g_0^{\rho_0}\}$ to $A_k$ to anonymously prove that he/she has knowledge of $(u, \rho_0, z^*)$.

3) $A_k$ checks the proof first. If the proof is correct, then $A_k$ computes $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ and sends them to $U$. Otherwise, $A_k$ will terminate the algorithm. Then, $A_k$ sends $PoK\{(\alpha_k, \beta_k, \rho_1, t_{k,u}, a_{k,i,j}): \tilde{D}_{k,u} \wedge \tilde{D}_{k,i,j} \wedge P_1\}$ to $U$ to anonymously prove that it has knowledge of $(\alpha_k, \beta_k, \rho_1, t_{k,u}, a_{k,i,j})$.

4) Similarly, $U$ checks the zero-knowledge proof. If this proof is correct, $U$ can compute $D_{k,u}$ and $D_{k,i,j}$. Otherwise, $U$ stops this algorithm.

Algorithm 3 illustrates the specific steps.

Algorithm 3 PPKeyGen Protocol
1: Begin
2: $U$ selects $u, \rho_0 \in_R \mathbb{Z}_N$, $A_k$ selects $\rho_1, \beta_k \in_R \mathbb{Z}_N$
3: $U(u, \rho_0) \overset{\text{2PC}}{\longleftrightarrow} A_k(\rho_1, \beta_k)$: $\eta = (u + \beta_k)\rho_0\rho_1 \bmod N$
4: $U$ selects $z^*, z_1, z_2, z_3 \in_R \mathbb{Z}_N$, and computes $T = g_p^{z^*} g_1^u$, $P_0 = g_0^{\rho_0}$, $T' = g_p^{z_1} g_1^{z_2}$, $P_0' = g_0^{z_3}$
5: $U$ returns $(T, P_0, T', P_0')$ to $A_k$
6: $A_k$ picks $c \in_R \mathbb{Z}_N$ and sends $c$ to $U$
7: $U$ sets $a_1 = z_1 - cz^*$, $a_2 = z_2 - cu$, $a_3 = z_3 - c\rho_0$ and sends $(a_1, a_2, a_3)$ to $A_k$
8: $A_k$ checks the zero-knowledge proof
9: if $T' = g_p^{a_1} g_1^{a_2} T^c$ and $P_0' = g_0^{a_3} P_0^c$ then
10:   $\forall v_{k,i,j} \in \tilde{A}_u^k$, $A_k$ selects $t^u_{k,i,j} \in_R \mathbb{Z}_N$ and sets $t_{k,u} = \sum t^u_{k,i,j}$
11:   $A_k$ selects $y_1, y_2, y_3, y_4, y_5 \in_R \mathbb{Z}_N$ and computes $P_1 = g_0^{\rho_1}$, $P_1' = g_0^{y_4}$, $Z_k' = g_p^{y_2}$, $\tilde{D}_{k,i,j} = P_1^{\frac{t^u_{k,i,j}}{\eta a_{k,i,j}}}$, $\tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}}$, $\tilde{D}'_{k,i,j} = P_1^{y_5}$, $\tilde{D}'_{k,u} = g_p^{y_1} T^{y_2} P_0^{y_3}$
12:   $A_k$ sends $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j}, P_1', \tilde{D}'_{k,u}, \tilde{D}'_{k,i,j}, Z_k')$ to $U$
13: else
14:   Abort
15: end if
16: $U$ chooses $c' \in_R \mathbb{Z}_N$, and sends $c'$ to $A_k$
17: $A_k$ sets $y_1' = y_1 - c'\alpha_k$, $y_2' = y_2 - c'\beta_k$, $y_3' = y_3 - c'\frac{t_{k,u}\rho_1}{\eta}$, $y_4' = y_4 - c'\rho_1$, $y_5' = y_5 - c'\frac{t^u_{k,i,j}}{\eta a_{k,i,j}}$
18: $A_k$ sends $(y_1', y_2', y_3', y_4', y_5')$ to $U$.
19: $U$ checks the proof
20: if $P_1' = g_0^{y_4'} P_1^{c'}$, $\tilde{D}'_{k,u} = g_p^{y_1'} T^{y_2'} P_0^{y_3'} (\tilde{D}_{k,u})^{c'}$, $\tilde{D}'_{k,i,j} = \tilde{D}_{k,i,j}^{c'} P_1^{y_5'}$ and $Z_k' = g_p^{y_2'} (Z_k)^{c'}$ then
21:   $U$ computes $D_{k,u} = \tilde{D}_{k,u} / Z_k^{z^*}$, $D_{k,i,j} = \tilde{D}_{k,i,j}^{\rho_0}$, where $Z_k = g_p^{\beta_k}$
22: else
23:   Abort
24: end if
25: End

The PPKeyGen algorithm needs to meet two features: leak-freeness and selective-failure blindness. The first means that a malicious user running the PPKeyGen algorithm with trusted authorities gains no more information than by executing the KeyGen algorithm. The second means that a malicious authority cannot get any information about the user's GID, nor can it make the PPKeyGen algorithm fail depending on the user's choice of GID. The definitions of the two features are shown below.

Definition 2. (leak-freeness). A PPKeyGen algorithm is leak-free if for all efficient adversaries $U$, there exists a simulator $U'$ such that no efficient distinguisher $D$ can distinguish whether $U$ is executing in the real game or in the ideal game with non-negligible advantage. The two games are defined as follows:

Real Game: The distinguisher $D$ runs the setup algorithm as many times as it wants, the malicious user $U$ selects a GID $u$ and executes the PPKeyGen algorithm with the honest authority.

Ideal Game: The distinguisher $D$ runs the setup algorithm as many times as it wants, and the simulator $U'$ chooses a GID $u'$ and executes the KeyGen algorithm with a trusted authority.

Definition 3. (selective-failure blindness). A PPKeyGen algorithm is selective-failure blind if no probabilistic polynomial-time adversary $A_k$ can win the following game with non-negligible advantage.

1) The malicious authority $A_k$ submits public key $PK_k$ and two global identifiers $u_0, u_1$.

2) A random bit $b \in \{0, 1\}$ is selected.

3) $A_k$ is given two commitments $com_b$ and $com_{1-b}$, then it can black-box access oracles $U(PP, u_b, PK_k, com_b)$ and $U(PP, u_{1-b}, PK_k, com_{1-b})$.

4) The algorithm $U$ returns the secret keys $SK^k_{U_b}$ and $SK^k_{U_{1-b}}$, separately.
5) If $SK^k_{U_b} \neq \perp$ and $SK^k_{U_{1-b}} \neq \perp$, $A_k$ is given $(SK^k_{U_b}, SK^k_{U_{1-b}})$; if $SK^k_{U_b} \neq \perp$ and $SK^k_{U_{1-b}} = \perp$, $A_k$ is given $(\epsilon, \perp)$; if $SK^k_{U_b} = \perp$ and $SK^k_{U_{1-b}} \neq \perp$, $A_k$ is given $(\perp, \epsilon)$; if $SK^k_{U_b} = \perp$ and $SK^k_{U_{1-b}} = \perp$, $A_k$ is given $(\perp, \perp)$.

6) Finally, $A_k$ returns its guess $b'$ on $b$. $A_k$ wins the game if $b' = b$.

5 Security Analysis

5.1 Scheme Analysis

Identity Privacy: The PPKeyGen algorithm introduced in this paper is an interactive process between the user and each authority, which ensures that the user can obtain the correct secret key from the authority without directly providing the unencapsulated $u$. In this case, the identifier $u$ is confidential to all authorities, so they can no longer track it to aggregate user's information. Theorem 2 in Section 5.2 shows that this protocol satisfies leak-freeness and selective-failure blindness.

Collusion Resistance: In this construction, the authors bind all the user's key components to $u$, making the user's secret key unique. The power settings of $g_0$ and $g_1$ (namely, $\frac{t_{k,u}}{\beta_k+u}$ and $u\beta_k$, respectively) make it impossible for different users to achieve effective attacks by combining their secret keys. What's more, the scheme can prevent the collusion attack of multiple authorities. By observing the ciphertext component $C$ in $CT$ (namely, $K \prod_{k \in I_c} e(g_p, g_p)^{\alpha_k s}$), it can be known that the message $K$ can be restored only when all relevant $\alpha_k$ are obtained. Therefore, as long as one authority is honest in the set of authorities involved, the attack will not succeed.

Hidden Policy: The authors implement attribute anonymity in the access policy by applying some random elements $(R_1, R_2, R'_{k,x,y})$ in group $\mathbb{G}_{p_1}$ on some ciphertext components $(C_1, C_2, C_{k,x,y})$ [20,33]. Such a setting is necessary. The existence of these random elements makes it difficult for an attacker to guess the value of the attribute embedded by the encryptor in the access policy. Without the introduction of these random elements, the original ciphertext would be $C_1' = g_p^s$, $C_2' = (\prod_{k \in I_c} Z_k'^{\,s})$, $C'_{k,x,y} = A'^{\,s}_{k,x,y}$, where $Z_k' = g_p^{\beta_k}$, $A'_{k,i,j} = g_p^{a_{k,i,j}}$. An attacker needs only a simple test $e(A'_{k,i,j}, C_1') \stackrel{?}{=} e(g_p, C'_{k,x,y})$ to be able to guess whether the encryptor has embedded the attribute value $v_{k,i,j}$ in the access policy or not. In fact,

$$X' = e(A'_{k,i,j}, C_1') = e(g_p^{a_{k,i,j}}, g_p^s) = e(g_p, g_p)^{s a_{k,i,j}}$$

$$Y' = e(g_p, C'_{k,x,y}) = e(g_p, g_p^{s a_{k,x,y}}) = e(g_p, g_p)^{s a_{k,x,y}}$$

If $X' = Y'$, namely $a_{k,i,j} = a_{k,x,y}$, then $v_{k,i,j} \in W$; if $X' \neq Y'$, namely $a_{k,i,j} \neq a_{k,x,y}$, then $v_{k,i,j} \notin W$.

Do the same operations on the proposed scheme, then

$$X = e(A_{k,i,j}, C_1) = e(g_p^{a_{k,i,j}} \cdot R_{k,i,j},\; g_p^s \cdot R_1) = e(g_p, g_p)^{s a_{k,i,j}}\, e(R_{k,i,j}, R_1)$$

$$Y = e(g_p, C_{k,x,y}) = e(g_p,\; g_p^{s a_{k,x,y}} \cdot R_{k,x,y}^s \cdot R'_{k,x,y}) = e(g_p, g_p)^{s a_{k,x,y}}$$

An attacker cannot determine whether the attribute value $v_{k,i,j}$ belongs to $W$ by comparing the values of $X$ and $Y$. Therefore, the privacy of the recipient is protected to a certain extent.

5.2 Security Proof

Theorem 1. The proposed scheme is secure in the selective access policy model, assuming that the DBDH assumption holds.

Proof. Assuming that the proposed scheme can be broken by an adversary $\mathcal{A}$ with a non-negligible advantage $\epsilon$, then, using the ability of $\mathcal{A}$, a simulator $\mathcal{B}$ can be constructed to solve the DBDH problem with the advantage $\epsilon/2$.

Firstly, the challenger generates a bilinear group $(e, p, p_1, \mathbb{G}, \mathbb{G}_T)$, where $\mathbb{G} = \mathbb{G}_p \times \mathbb{G}_{p_1}$. Then he/she picks a random number $\varphi \in \{0, 1\}$, and sets:

$$Z = \begin{cases} e(g_p, g_p)^{abc}, & \varphi = 0 \\ e(g_p, g_p)^{z}, & \varphi = 1 \end{cases} \qquad (1)$$

where $z$ is a random number in group $\mathbb{G}$. The simulator gets the challenge tuple $(g_p, g_{p_1}, A, B, C, Z) = (g_p, g_{p_1}, g_p^a, g_p^b, g_p^c, Z)$ from the challenger and finally returns a guess $\varphi'$ on $\varphi$.

Initialization: The adversary $\mathcal{A}$ provides a set of corrupted authorities $C_A$ and an access structure $W^*$ which he/she wants to challenge. Suppose there is at least one completely honest authority $A^*$ in the security game.

Global Setup: $\mathcal{B}$ randomly chooses $\gamma, \theta \in_R \mathbb{Z}_N$ and sets $g_0 = A \cdot g_p^\gamma = g_p^{a+\gamma}$, $g_1 = g_p^\theta$. Then, $\mathcal{B}$ sends $PP = (e, p, p_1, g_p, g_0, g_1, g_{p_1}, H, \mathbb{G}, \mathbb{G}_T)$ to adversary $\mathcal{A}$.

Authority Setup: Three cases are discussed here:

1) For the corrupted authorities $A_k \in C_A$, $\mathcal{B}$ randomly selects $\alpha_k, \beta_k, a_{k,i,j} \in_R \mathbb{Z}_N$ and $R_k, R_{k,i,j} \in_R \mathbb{G}_{p_1}$, then calculates $Y_k = e(g_p, g_p)^{\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$ and $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$. Then, $\mathcal{B}$ sends the secret keys $SK_k = (\alpha_k, \beta_k, \{a_{k,i,j}\}_{j=[1,n_i]})$ and the public keys $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.
2) For the authority $A_k$ not corrupted and not $A^*$, $\mathcal{B}$ randomly chooses $\alpha_k, \beta_k, a_{k,i,j} \in_R \mathbb{Z}_N$ and $R_k, R_{k,i,j} \in_R \mathbb{G}_{p_1}$, computes $Y_k = e(g_p, g_p)^{b\alpha_k}$, $Z_k = g_p^{\beta_k} \cdot R_k$, $A_{k,i,j} = g_p^{a_{k,i,j}} \cdot R_{k,i,j}$ when $v_{k,i,j} \in W^*$, or $A_{k,i,j} = g_p^{b a_{k,i,j}} \cdot R_{k,i,j}$ when $v_{k,i,j} \notin W^*$. $\mathcal{B}$ sends the public keys $PK_k = (Y_k, Z_k, \{A_{k,i,j}\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.

3) For the authority $A^*$, $\mathcal{B}$ randomly selects $\beta^*, a^*_{i,j} \in_R \mathbb{Z}_N$ and $R^*, R^*_{i,j} \in_R \mathbb{G}_{p_1}$, then computes $Y^* = e(g_p^a, g_p^b) \cdot \prod_{A_k \in C_A} e(g_p, g_p)^{-\alpha_k} \cdot \prod_{A_k \notin C_A \cup A^*} e(g_p, g_p)^{-b\alpha_k}$ and $Z^* = g_p^{\beta^*} \cdot R^*$. $A^*_{i,j} = g_p^{a^*_{i,j}} \cdot R^*_{i,j}$ if $v_{i,j} \in W^*$; $A^*_{i,j} = g_p^{b a^*_{i,j}} \cdot R^*_{i,j}$ if $v_{i,j} \notin W^*$. Then, $\mathcal{B}$ sends the public keys $PK^* = (Y^*, Z^*, \{A^*_{i,j}\}_{j=[1,n_i]})$ to adversary $\mathcal{A}$.

Phase 1: The adversary $\mathcal{A}$ issues a secret key query for a global identifier $u^*$ with a list of attributes $L^*$, where $L^* \not\models W^*$.

1) For $A_k \in C_A$, $\mathcal{A}$ can generate the user secret keys directly by himself.

2) For $A_k \notin C_A \cup A^*$, $\forall v_{k,i,j} \in \tilde{A}^k_{u^*}$, $\mathcal{B}$ picks $t^{u^*}_{k,i,j} \in_R \mathbb{Z}_N$ and sets $t_{k,u^*} = \sum t^{u^*}_{k,i,j}$. Then, $\mathcal{B}$ computes $D_{k,u^*} = B^{\alpha_k} g_0^{\frac{t_{k,u^*}}{\beta_k+u^*}} g_1^{u^*\beta_k}$, $D_{k,i,j} = g_0^{\frac{t^{u^*}_{k,i,j}}{(\beta_k+u^*)a_{k,i,j}}}$.

3) For $A_k = A^*$, $\forall v_{i,j} \in \tilde{A}^*_{u^*}$, $\mathcal{B}$ chooses $t^{u^*}_{i,j} \in_R \mathbb{Z}_N$, sets $t_{u^*} = \sum t^{u^*}_{i,j}$, and computes $D^*_{u^*} = B^{-\gamma} g_0^{\frac{t_{u^*}}{\beta^*+u^*}} g_1^{u^*\beta^*} \prod_{A_k \in C_A} g_p^{-\alpha_k} \prod_{A_k \notin C_A \cup A^*} B^{-\alpha_k}$ and $D^*_{i,j} = g_0^{\frac{t^{u^*}_{i,j}}{(\beta^*+u^*)a^*_{i,j}}}$, where

$$
\begin{aligned}
D^*_{u^*} &= B^{-\gamma}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*} \prod_{A_k \in C_A} g_p^{-\alpha_k} \prod_{A_k \notin C_A \cup A^*} B^{-\alpha_k} \\
&= g_p^{-b\gamma}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*}\, g_p^{-(\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)} \\
&= g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k) - b(a+\gamma)}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*} \\
&= g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)}\, g_0^{\frac{t_{u^*}}{\beta^*+u^*} - b}\, g_1^{u^*\beta^*}
\end{aligned}
$$

Let $t'_{u^*} = t_{u^*} - b(\beta^* + u^*)$, then

$$D^*_{u^*} = g_p^{ab - (\sum_{A_k \in C_A} \alpha_k + \sum_{A_k \notin C_A \cup A^*} b\alpha_k)}\, g_0^{\frac{t'_{u^*}}{\beta^*+u^*}}\, g_1^{u^*\beta^*}$$

Therefore, $D^*_{u^*}$ is a valid secret key.

Challenge: The adversary $\mathcal{A}$ provides two messages of the same length $K_0$ and $K_1$. $\mathcal{B}$ chooses a random bit $\xi \in \{0, 1\}$ and then encrypts the message $K_\xi$: $CT^* = \{C^* = K_\xi \cdot Z,\; C_1^* = g_p^c \cdot R_1,\; C_2^* = (\prod_{k \in I_c} g_p^{\beta_k c} \cdot R_k^c) \cdot R_2,\; \forall v_{k,x,y} \in W^*: C^*_{k,x,y} = g_p^{c a_{k,x,y}} \cdot R^c_{k,x,y} \cdot R'_{k,x,y}\}$.

Phase 2: Same as Phase 1.

Guess: Adversary $\mathcal{A}$ returns the guess $\xi'$ on $\xi$. If $\xi' = \xi$, simulator $\mathcal{B}$ returns $\varphi' = 0$ to the challenger; otherwise, simulator $\mathcal{B}$ returns $\varphi' = 1$.

If $\varphi = 1$, $Z = e(g_p, g_p)^z$ is a random value, and adversary $\mathcal{A}$ cannot get any information about $\xi$, so $\Pr[\xi' \neq \xi \mid \varphi = 1] = \frac{1}{2}$. Since $\mathcal{B}$ returns $\varphi' = 1$ when $\xi' \neq \xi$, $\Pr[\varphi' = \varphi \mid \varphi = 1] = \frac{1}{2}$.

If $\varphi = 0$, then $Z = e(g_p, g_p)^{abc}$, and the adversary $\mathcal{A}$ will get a valid ciphertext of message $K_\xi$. The advantage of $\mathcal{A}$ in breaking the proposed scheme is $\epsilon$ (non-negligible) by definition, so $\Pr[\xi' = \xi \mid \varphi = 0] = \frac{1}{2} + \epsilon$. Since $\mathcal{B}$ returns $\varphi' = 0$ when $\xi' = \xi$, $\Pr[\varphi' = \varphi \mid \varphi = 0] = \frac{1}{2} + \epsilon$.

Therefore, the advantage of $\mathcal{B}$ to break the DBDH assumption is $\left|\frac{1}{2}\Pr[\varphi' = \varphi \mid \varphi = 0] + \frac{1}{2}\Pr[\varphi' = \varphi \mid \varphi = 1] - \frac{1}{2}\right| = \epsilon/2$ (non-negligible).

Theorem 2. The proposed PPKeyGen algorithm is leak-free and selective-failure blind.

Proof. (Leak-freeness) Suppose that a malicious user $U$ runs the PPKeyGen algorithm with an honest $A_k$ in the real game. There should also exist a simulator $\tilde{U}$ that runs the KeyGen algorithm with a trusted authority in the ideal game such that no distinguisher $D$ can effectively distinguish the two games. The simulator $\tilde{U}$ can simulate the communication between $D$ and $U$. $\tilde{U}$ works as follows:

1) $\tilde{U}$ sends $PP$ and the public key $PK_k$ of authority $A_k$ to the malicious user $U$.

2) $U$ needs to prove to $\tilde{U}$ in zero-knowledge that he/she owns $u$ by submitting two values $(T, P_0)$. If the proof succeeds, then $\tilde{U}$ gets $(u, \rho_0, z^*)$ using the rewind technique.

3) $\tilde{U}$ sends $u$ to the trusted party and gets secret keys $(D_{k,u}, D_{k,i,j})$.

4) $\tilde{U}$ chooses $\rho \in_R \mathbb{Z}_p$, and calculates $\rho_1 = \rho/\rho_0$, $P_1 = g_0^{\rho_1}$, $\tilde{D}_{k,u} = D_{k,u} \cdot Z_k^{z^*}$, $\tilde{D}_{k,i,j} = D_{k,i,j}^{1/\rho_0}$. Then $\tilde{U}$ returns $(P_1, \tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ to $U$.

If $(D_{k,u}, D_{k,i,j})$ are correct keys from the trusted AA in the ideal game, then $(\tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ are correct keys from $A_k$ in the real game. The distinguisher $D$ cannot distinguish the real game from the ideal game.

Proof. (Selective-failure blindness) The malicious authority $A_k$ provides $PK_k$ and two global identifiers $(u_0, u_1)$. A random bit $b \in \{0, 1\}$ is picked. $A_k$ can
Table 2: The comparison of computing cost

Scheme      Setup                 KeyGen                  Encryption         Decryption
Huang [15]  P + (N + 2)E          (2|A_U| + N + 1)E       (2|A_c| + 2)E      (2|A_c| + N + 1)P + |A_c|E
Qian [26]   P + (2N + m·|U|)E     (2|A_U| + 4N)E          (2|A_c| + 2)E      (|A_c| + 3N + 1)P
Qian [27]   P + (3N + |U|)E       (|A_U| + 3N(N − 1))E    (|A_c| + 2)E       (N|A_c| + 1)P + (|A_c| + 1)E
Ours        P + (2N + m·|U|)E     (|A_U| + 3N)E           (|A_c| + N + 2)E   (|A_c| + N + 1)P + E

1 P: the bilinear pairing operation. E: the exponential operation. |∗|: the number of elements in ∗.
2 A_U: the attribute set of U. A_c: the attribute set in ciphertext. m: the number of values for each attribute.
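To make the Decryption column of Table 2 concrete, the following Python sketch evaluates the four formulas for given sizes. It only restates the table: the function name `decryption_cost` and its parameter names are hypothetical, and the sample value $|A_c| = 10$ is an assumption chosen for illustration ($N = 5$ matches the number of authorities fixed in the simulation).

```python
def decryption_cost(n_auth, n_ct_attrs):
    """Evaluate the Decryption column of Table 2.

    n_auth is N (number of authorities) and n_ct_attrs is |A_c|.
    Each value is a pair (pairings P, exponentiations E).
    """
    return {
        "Huang [15]": (2 * n_ct_attrs + n_auth + 1, n_ct_attrs),
        "Qian [26]": (n_ct_attrs + 3 * n_auth + 1, 0),
        "Qian [27]": (n_auth * n_ct_attrs + 1, n_ct_attrs + 1),
        "Ours": (n_ct_attrs + n_auth + 1, 1),
    }


if __name__ == "__main__":
    # Sample sizes are assumptions for illustration only.
    costs = decryption_cost(n_auth=5, n_ct_attrs=10)
    for scheme, (pairings, exps) in costs.items():
        print(f"{scheme}: {pairings}P + {exps}E")
```

For these sample sizes the proposed scheme needs 16 pairings and 1 exponentiation in total, against 26 pairings for [15] and [26] and 51 for [27]; by the formulas in the table, its pairing count is never larger than that of the other three schemes when $N \geq 1$ and $|A_c| \geq 1$.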
have black-box access to $U(u_b, com_b, PK_k, PP)$ and $U(u_{1-b}, com_{1-b}, PK_k, PP)$. Then, $U$ runs the PPKeyGen algorithm with $A_k$ and returns secret keys $SK^k_{U_b}$ and $SK^k_{U_{1-b}}$: if $SK^k_{U_b} \neq \perp$ and $SK^k_{U_{1-b}} \neq \perp$, $A_k$ is given $(SK^k_{U_b}, SK^k_{U_{1-b}})$; if $SK^k_{U_b} \neq \perp$ and $SK^k_{U_{1-b}} = \perp$, $A_k$ is given $(\epsilon, \perp)$; if $SK^k_{U_b} = \perp$ and $SK^k_{U_{1-b}} \neq \perp$, $A_k$ is given $(\perp, \epsilon)$; if $SK^k_{U_b} = \perp$ and $SK^k_{U_{1-b}} = \perp$, $A_k$ is given $(\perp, \perp)$. Finally, $A_k$ outputs a guess $b'$ on $b$.

In the PPKeyGen algorithm, $U$ first computes $T, P_0$, and proves $PoK\{(u, \rho_0, z^*): T = g_p^{z^*} g_1^u \wedge P_0 = g_0^{\rho_0}\}$. So far, the two oracles should be computationally indistinguishable to $A_k$. Otherwise, this would violate the hiding property of the commitment scheme and the witness indistinguishability of the zero-knowledge proof. Assume that $A_k$ outputs secret keys for the first oracle with some computing strategies. The next thing to prove is that $A_k$ can predict secret keys for the user without interaction with the two oracles:

1) $A_k$ checks $PoK\{(\alpha_k, \beta_k, r_{k,u}, \rho_1, \eta): \tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}} \wedge P_1 = g_0^{\rho_1} \wedge \tilde{D}_{k,i,j} = P_1^{\frac{t^u_{k,i,j}}{\eta a_{k,i,j}}}\}$. If the proof fails, $A_k$ outputs $SK^k_{U_0} = \perp$.

2) $A_k$ generates different $(\tilde{D}_{k,u}, \tilde{D}_{k,i,j})$ for the second oracle and proves $PoK\{(\alpha_k, \beta_k, r_{k,u}, \rho_1, \eta): \tilde{D}_{k,u} = g_p^{\alpha_k} T^{\beta_k} P_0^{\frac{\rho_1 t_{k,u}}{\eta}} \wedge P_1 = g_0^{\rho_1} \wedge \tilde{D}_{k,i,j} = P_1^{\frac{t^u_{k,i,j}}{\eta a_{k,i,j}}}\}$. If the proof fails, $A_k$ outputs $SK^k_{U_1} = \perp$.

3) Finally, $A_k$ returns its prediction on $(u_0, u_1)$. If $SK^k_{U_0} \neq \perp$ and $SK^k_{U_1} \neq \perp$, the prediction is $(SK^k_{U_0}, SK^k_{U_1})$; if $SK^k_{U_0} \neq \perp$ and $SK^k_{U_1} = \perp$, the prediction is $(\epsilon, \perp)$; if $SK^k_{U_0} = \perp$ and $SK^k_{U_1} \neq \perp$, the prediction is $(\perp, \epsilon)$; if $SK^k_{U_0} = \perp$ and $SK^k_{U_1} = \perp$, the prediction is $(\perp, \perp)$.

The prediction has the same distribution as the oracles' outputs, since $A_k$ performs the same check as the honest user. Therefore, if $A_k$ can predict the user secret keys, then $A_k$, with or without the final outputs, has the same advantage. It means that $A_k$ can distinguish the two oracles before the prediction. However, based on the security of the commitment scheme and the zero-knowledge proof, the advantage of $A_k$ in distinguishing the two oracles is negligible.

6 Performance Analysis

In this section, Table 2 and Figure 3 show the comparisons of computation costs and efficiency, respectively. The simulation is executed on a Windows machine with 2.70 GHz Intel(R) Core(TM) i5-4210U CPU and 4 GB RAM. The implementation is based on Java Pairing-Based Cryptography (PBC) Library (version 0.5.14). Random elements in group $\mathbb{G}_{p_1}$ are introduced into our scheme to implement policy hiding. Apart from this function, the authors compare the scheme with schemes [15,26,27].

Figure 3(a) and Figure 3(b) show the comparisons of Setup time and KeyGen time with different numbers of attribute authorities. In this simulation, the number of attribute authorities is increased from 2 to 14, each authority monitors 5 attributes, and each attribute has 3 values. Scheme [15] uses a hash function to map the user's attributes to a random group element. Attribute authorities do not need to calculate the public keys related to the attributes, so the setup stage of that scheme takes less time. Figure 3(b) shows that the proposed scheme takes less time to generate the secret keys than the other three schemes. The secret key generation time of scheme [27] increases rapidly with the number of authorities because every two authorities have to interact with each other, so its KeyGen time is a quadratic function of the independent variable $N$.

Figure 3(c) and Figure 3(d) show the comparisons of encryption time and decryption time with different numbers of attributes in the access policy, with the number of attribute authorities in the system fixed at 5. The encryption time of the proposed scheme is longer than that of scheme [27], but the difference is very small. Figure 3(d) shows that the proposed scheme is superior to the other three schemes in the decryption phase.

7 Conclusion

To implement data sharing, a privacy-preserving decentralized ABE scheme is proposed in this paper. All attribute authorities are independent of each other and do not require any collaboration. All users can get their secret keys from authorities by using the PPKeyGen protocol without revealing their GIDs, which meets users' requirement to protect their private information. Besides, this scheme supports policy anonymity, so one cannot get any information about the user attributes. The security of the scheme is proved under a standard model and the simulation results verify the efficiency of the scheme. The disadvantage of the proposed scheme is that it only supports
[Figure 3 plots. Panels: (a) the time of setup (ms) and (b) the time of KeyGen (ms) versus the number of attribute authorities (2 to 14); (c) the time of encryption (ms) and (d) the time of decryption (ms) versus the number of attributes in the access structure. Each panel compares Huang [15], Qian [26], Qian [27], and Ours.]
Figure 3: Efficiency comparisons of the proposed scheme with others

the AND-gate access structure. Constructing encryption schemes that support more flexible access structures is left as future work.

Acknowledgments

This work was supported in part by the National Cryptography Development Fund under grant (MMJJ20180209).

References

[1] Y. Baseri, A. Hafid, and S. Cherkaoui, “Privacy preserving fine-grained location-based access control for mobile cloud,” Computers and Security, vol. 73, pp. 249–265, 2018.
[2] J. Bethencourt, A. Sahai, and B. Waters, “Ciphertext-policy attribute-based encryption,” in IEEE Symposium on Security and Privacy, pp. 321–334, 2007.
[3] J. Camenisch and M. Stadler, “Efficient group signature schemes for large groups,” in Advances in Cryptology (CRYPTO’97), pp. 410–424, 1997.
[4] Z. Cao, L. Liu, and Z. Guo, “Ruminations on attribute-based encryption,” International Journal of Electronics and Information Engineering, vol. 8, no. 1, 2018.
[5] M. Chase, “Multi-authority attribute based encryption,” in Conference on Theory of Cryptography, pp. 515–534, 2007.
[6] M. Chase and S. S. M. Chow, “Improving privacy and security in multi-authority attribute-based encryption,” in ACM Conference on Computer and Communications Security, pp. 121–130, 2009.
[7] Y. Fan, X. Wu, and J. Wang, “Multi-authority attribute-based encryption access control scheme with hidden policy and constant length ciphertext for cloud storage,” in IEEE Second International Conference on Data Science in Cyberspace, pp. 205–212, 2017.
[8] T. Feng and J. Guo, “A new access control system based on CP-ABE in named data networking,” International Journal of Network Security, vol. 20, no. 4, 2018.
[9] T. Feng, X. Yin, Y. Lu, J. Fang, and F. Li, “A searchable CP-ABE privacy preserving scheme,” International Journal of Network Security, vol. 21, no. 4, 2019.
[10] A. Ge, J. Zhang, R. Zhang, C. Ma, and Z. Zhang, “Security analysis of a privacy-preserving decentralized key-policy attribute-based encryption scheme,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 11, 2013.
[11] V. Goyal, O. Pandey, A. Sahai, and B. Waters, “Attribute-based encryption for fine-grained access control of encrypted data,” in ACM Conference on Computer and Communications Security, pp. 89–98, 2006.
[12] J. Han, W. Susilo, Y. Mu, and J. Yan, “Privacy-preserving decentralized key-policy attribute-based encryption,” IEEE Transactions on Parallel and Distributed Systems, vol. 23, no. 11, pp. 2150–2162, 2012.
[13] J. Han, W. Susilo, and Y. Mu, et al., “PPDCP-ABE: Privacy-preserving decentralized ciphertext-policy attribute-based encryption,” in Computer Security, pp. 73–90, 2014.
[14] S. Hu, J. Li, and Y. Zhang, “Improving security and privacy-preserving in multi-authorities ciphertext-policy attribute-based encryption,” KSII Transactions on Internet and Information Systems, vol. 12, no. 10, 2018.
[15] X. Huang, Q. Tao, B. Qin, and Z. Liu, “Multi-authority attribute based encryption scheme with revocation,” in International Conference on Computer Communication and Networks, pp. 1–5, 2015.
[16] T. Jung, X. Li, Z. Wan, and M. Wan, “Control cloud data access privilege and anonymity with fully anonymous attribute-based encryption,” IEEE Transactions on Information Forensics and Security, vol. 10, no. 1, 2015.
[17] C. C. Lee, P. S. Chung, and M. S. Hwang, “A survey on attribute-based encryption schemes of access control in cloud environments,” International Journal of Network Security, vol. 15, no. 4, 2013.
[18] A. Lewko and B. Waters, “Decentralizing attribute-based encryption,” in Advances in Cryptology-eurocrypt -international Conference on the Theory and Applications of Cryptographic Techniques, pp. 568–588, 2011.
[19] M. Li, S. Yu, Y. Zheng, K. Ren, and W. Lou, “Scalable and secure sharing of personal health records in cloud computing using attribute-based encryption,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 1, 2013.
[20] X. Li, D. Gu, Y. Ren, N. Ding, and K. Yuan, “Efficient ciphertext-policy attribute based encryption with hidden policy,” in Internet and Distributed Computing Systems, pp. 146–159, 2012.
[21] C. W. Liu, W. F. Hsien, C. C. Yang, and M. S. H. Wang, “A survey of attribute-based access control with user revocation in cloud data storage,” International Journal of Network Security, vol. 18, no. 5, 2016.
[22] L. Liu, Z. Cao, and C. Mao, “A note on one outsourcing scheme for big data access control in cloud,” International Journal of Electronics and Information Engineering, vol. 9, no. 1, 2018.
[23] M. Lyu, X. Li, and H. Li, “Efficient, verifiable and privacy preserving decentralized attribute-based encryption for mobile cloud computing,” in IEEE Second International Conference on Data Science in Cyberspace, 2017. DOI: 10.1109/DSC.2017.8.
[24] T. P. Pedersen, “Non-interactive and information-theoretic secure verifiable secret sharing,” in International Cryptology Conference on Advances in Cryptology, pp. 129–140, 1991.
[25] H. S. G. Pussewalage and V. A. Oleshchuk, “A distributed multi-authority attribute based encryption scheme for secure sharing of personal health records,” in ACM on Symposium on Access Control Models and Technologies, pp. 255–262, 2017.
[26] H. Qian, J. Li, and Y. Zhang, “Privacy-preserving decentralized ciphertext-policy attribute-based encryption with fully hidden access structure,” in International Conference on Information and Communications Security, pp. 363–372, 2013.
[27] H. Qian, J. Li, Y. Zhang, and J. Han, “Privacy-preserving personal health record using multi-authority attribute-based encryption with revocation,” International Journal of Information Security, vol. 14, no. 6, 2015.
[28] Y. Rahulamathavn, S. Veluru, J. Han, F. Li, M. Rajarajan, and R. Lu, “User collusion avoidance scheme for privacy-preserving decentralized key-policy attribute-based encryption,” IEEE Transactions on Computers, vol. 65, no. 9, 2016.
[29] S. Rezaei, M. A. Doostari, and M. Bayat, “A lightweight and efficient data sharing scheme for cloud computing,” International Journal of Electronics and Information Engineering, vol. 9, no. 2, 2018.
[30] A. Sahai and B. Waters, “Fuzzy identity-based encryption,” in Advances in Cryptology, pp. 457–473, 2005.
[31] M. Wang, Z. Zhang, and C. Chen, “Security analysis of a privacy-preserving decentralized ciphertext-policy attribute-based encryption scheme,” Concurrency and Computation: Practice and Experience, vol. 28, no. 4, 2015.
[32] F. Xhafa, J. Feng, Y. Zhang, X. Chen, and J. Li, “Privacy-aware attribute-based PHR sharing with user accountability in cloud computing,” Journal of Supercomputing, vol. 71, no. 5, 2015.
[33] R. Xu, Y. Wang, and B. Lang, “A tree-based CP-ABE scheme with hidden policy supporting secure data sharing in cloud computing,” in International Conference on Advanced Cloud and Big Data, pp. 51–57, 2013.
[34] L. Zhang, P. Liang, and Y. Mu, “Improving privacy-preserving and security for decentralized key-policy attributed-based encryption,” IEEE Access, vol. 6, pp. 12736–12745, 2018.
[35] L. Zhang and H. Yin, “Recipient anonymous ciphertext-policy attribute-based broadcast encryption,” International Journal of Network Security, vol. 20, no. 1, 2018.
[36] Y. Zhang, D. Zheng, and R. H. Deng, “Security and privacy in smart health: Efficient policy-hiding attribute-based access control,” IEEE Internet of Things Journal, vol. 5, pp. 2130–2145, 2018.

Biography

Li Kang is a master's degree student in the School of Mathematics and Statistics, Xidian University. Her research interests focus on computer and network security.

Leyou Zhang is a professor in the School of Mathematics and Statistics at Xidian University, Xi’an, China. He received his PhD from Xidian University in 2009. From Dec. 2013 to Dec. 2014, he was a research fellow in the School of Computer Science and Software Engineering at the University of Wollongong. His current research interests include network security, computer security, and cryptography.

International Journal of Network Security, Vol.22, No.5, PP.828-837, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).13) 828
Evidence Gathering of Facebook Messenger on Android
Ming Sang Chang and Chih Ping Yen (Corresponding author: Chih Ping Yen)
Department of Information Management, Central Police University
Taoyuan 33304, Taiwan
(Email: [email protected])
(Received Aug. 28, 2019; Revised and Accepted Feb. 5, 2020; First Online Feb. 26, 2020)
Abstract

Social networking is changing people’s lifestyles. Because smart phones and computers now connect to the same services, newly developed applications must serve both platforms to satisfy users, and the modes of cybercrime have changed along with the users’ activities. To identify crimes, appropriate forensic techniques are needed to retrieve the resulting traces and evidence. This study takes the social network application Facebook Messenger as its research subject. We analyze the artifacts left by the Facebook Messenger application and show how evidence of activities such as sending texts, pictures, and videos and making calls can be gathered on the Android platform. We also explore the differences between the traces left on Non-Rooted and Rooted Android platforms. The forensic analysis finds that differences in privacy control can lead to discrepancies in how user behaviors on the same social network are recorded. The results should help forensic analysts and practitioners map and find digital evidence of Facebook Messenger on Android smart phones.

Keywords: Cybercrime; Digital Forensics; Facebook Messenger; Social Network

1 Introduction

Nowadays, social networking sites are increasingly popular, and the number of people using them for business, recreation, and other purposes has risen accordingly. Over the past few years, social networking sites have become a significant medium for people to enhance their interpersonal relationships. Their prevalence has changed the living habits of many people, who share their emotions and daily lives with friends through text, photos, and videos. There is no doubt that people have integrated social networking sites into their lives and turned their use into a daily activity.

Facebook Messenger is a messaging application. Originally developed as Facebook Chat in 2008, the messaging service was revamped in 2010, and standalone iOS and Android apps were released in August 2011. Over the years, Facebook has released new apps on a variety of operating systems, launched a dedicated website interface, and separated the messaging functionality from the main Facebook app. Users can send messages and exchange photos, videos, stickers, audio, and files, as well as react to other users’ messages and interact with bots. The service also supports voice and video calling. After being separated from the main Facebook app, Facebook Messenger had 600 million users in April 2015. This grew to 900 million in June 2016, 1 billion in July 2016, and more than 1.3 billion monthly active users in October 2019 [22].

Social networking websites provide a virtual exchange space on the Internet where people with common interests, hobbies, and activities can easily share, discuss, and exchange their views without any limitation of space and time. Therefore, social networking websites continue to accumulate a large number of users. According to Metcalfe’s law, the value of a telecommunications network is proportional to the square of the number of connected users of the system. As a result, social networking has become a great force in today’s society. However, it has also brought about criminal activities on social networks, such as cyberbullying, social engineering, and identity theft. Because of the following characteristics, detecting cybercrime on social networks differs from detecting other cybercrime [10]. Therefore, to help investigators solve crimes more efficiently, research focusing on these emerging technologies is needed [16].

1) Anonymity: Users are often unaware of the true identity of their counterpart in a social network because they may be dealing with a fake account. Therefore, in the case of a social network cybercrime, it is difficult to obtain the suspect’s information and make an arrest immediately [1].
2) Diffuseness: Any news published on a social network is forwarded or shared immediately, which generates a diffusion effect [21]. Therefore, if a social network crime is not responded to immediately, the victim may suffer serious damage.

3) Cross-regional feature: Due to the nature of the Internet, the location of the cybercrime is not necessarily the place where the criminal suspects are located. The difficulty of locating the suspects creates a bottleneck in the crime investigation [13].

4) Vulnerability of evidence: The evidence obtained on social networks is in the form of digital data. Digital evidence is highly volatile throughout processing, from collection to storage, and it is easily changed, deleted, lost, or contaminated through anti-forensics operations by the suspects or negligence by the investigators [14].

This study takes the social network application Facebook Messenger as its subject. User activities are performed on Android smart phones, and forensic analysis is conducted to understand which types of user behavior leave digital evidence on Android. We explore the differences between the traces left on Non-Rooted and Rooted Android platforms. The results will serve as a reference for future researchers in social network cybercrime investigation and digital forensics.

The rest of this paper is organized as follows. In the next section, we present related works. In Section 3, we present our methodology. In Section 4, we present the results and findings of digital forensics on Facebook Messenger. In Section 5, we discuss the findings. Finally, we summarize our conclusions.

2 Related Works

Evidence left by an instant messenger (IM) program is stored in three principal areas: the hard drive, the memory, and the network. Some IM services are able to log information on the user’s hard drive. To use an IM, an account must be established to create a screen name provided with user information. Some instant messenger providers might assist the investigation with information about the account owner.

Evidence can be found in the various Internet file caches used by Internet Explorer for volatile IM, and each cache holds different pieces of data. Apart from normal files, files left by an instant messenger on a hard drive can be in temp file format; these are generally deleted and can be very difficult to retrieve once the machine is powered down. An operating system generally stores information about all installed and uninstalled applications in the system, so an uninstalled application also leaves evidence. If a user has deleted an instant messenger application, there is a chance that a record can be found in the registry to prove that the instant messenger was once installed on the system. Information is also stored within the memory. Since every application requires memory to execute, it is logical to expect that evidence could be left behind in the system’s memory. The analysis of live memory extends the possibility of providing additional contextual information for a case.

Presently, various studies focusing on the forensic analysis of social networking are being conducted. Artifacts of instant messaging have been of interest in many digital forensic studies. Early work focused on artifacts left behind by instant messaging applications such as MSN Messenger [7], Yahoo Messenger [8], and AOL Instant Messenger [17]. In 2013, Mahajan performed forensic analysis of Whatsapp and Viber on five Android phones using UFED and manual analysis [12]. Katie Corcoran examined seven messaging applications and first analyzed the evidence of Facebook Messenger [5]. Levendoski concluded that artifacts of the Yahoo Messenger client produced a different directory structure on Windows Vista and 7 [11]. Wong and Al Mutawa demonstrated that artifacts of the Facebook web application could be recovered from memory dumps and web browsing cache [15, 25].

Said investigated Facebook and other IM applications and determined that only the BlackBerry Bold 9700 and iPhone 3G/3GS provided unencrypted evidence of Facebook [19]. Sgaras analyzed Skype and several other VoIP applications for the iOS and Android platforms [20]. It was concluded that the Android apps store far fewer artifacts than the iOS apps. Chu focused on live data acquisition from a personal computer and was able to identify distinct strings that assist forensic practitioners with the reconstruction of previous Facebook sessions [4]. The analysis was conducted on an iPhone running iOS6 and a Samsung Galaxy Note running Android 4.1. Walnycky analyzed 20 popular instant messaging applications for Android; for Facebook Messenger, evidence such as text chat, voice call, audio, video, image, location, and stickers could be obtained [24]. Azfar adapted a widely used adversary model from the cryptographic literature to formally capture a forensic investigator’s capabilities during the collection and analysis of evidentiary materials from mobile devices [2].

William Glisson explored the effectiveness of different forensic tools and techniques for extracting evidence from mobile devices [9]. In 2015, Nikos Virvilis presented studies on the security of web browsers and reported the shortcomings and vulnerabilities of browsers operated on desktop and mobile devices. It was found that some browsers using secure browsing protocols had actually limited their own protection level [23]. Dezfouli examined four social networking applications, Facebook, Twitter, LinkedIn, and Google+, on the Android and iOS platforms to detect remnants of users’ activities that are of forensic interest [6]. In 2017, Yusoff reported the results of an investigation and analysis of three social media services (Facebook,
Twitter, and Google+) as well as three instant messaging services (Telegram, OpenWapp, and Line) for forensic investigators to examine residual remnants of forensic value in Firefox OS [27]. Song-Yang Wu described several forensic examinations of Android WeChat and provided corresponding technical methods [26]. Imam Riadi compared the performance of tools in finding digital evidence, such as chats and pictures, from Instagram Messenger [18]. Although Zhang et al. analyzed the local artifacts of four instant messaging applications, the types of these local artifacts are incomplete [28]. Jusop Choi analyzed the personal data files in three instant messaging applications (KakaoTalk, NateOn, and QQ), which are the most popular in China and South Korea [3].

This paper investigates the user activities of Facebook Messenger on Android smart phones. We conducted forensics on Non-Rooted and Rooted Android platforms, and explored and compared the types of user behavior that leave digital evidence on the device. The results will serve as a reference for future researchers in social network cybercrime investigation and digital forensics.

3 Methodology

In our research, we use a smart phone with Facebook Messenger installed. The study focuses on identifying data remnants of Messenger activities on an Android platform, in order to determine the remnants an examiner should search for when use of the instant messenger is suspected. Our research covers both Non-Rooted and Rooted Android platforms.

3.1 Research Goal

This paper studies user behaviors, including logging into Facebook Messenger, uploading images, exchanging information, GIS location sharing, and special application functions, under the Non-Rooted and Rooted Android environments. The study also explores and compares the types of user behavior that leave digital evidence on the device. We explore the differences between the artifacts left on Non-Rooted and Rooted Android platforms, and we check the changes and discrepancies in the residual digital data and relevant evidence on the Android smart phone.

3.2 Experimental Environment and Tools

In this paper, all the experiments were conducted on a real system. The study is built on a Sony Xperia Z1 C6902 with Android 4.3. Under the Android operating environment, the Facebook Messenger social networking application was installed to run the Messenger features directly. In addition, if Facebook Messenger has ever been installed, the “/com.facebook.orca” folder will appear in the directory structure.

Rooting is the process of allowing users to gain privileged control, known as root, over the various Android systems. Mobile phones, tablets, and any other electronic devices running the Android mobile operating system can be rooted to obtain the highest authority. Rooting is often carried out with the aim of overcoming limitations that mobile operators and developers put on some devices. To obtain more information from the mobile phone, investigators should execute a series of rooting processes before examining it.

XRY is a commercial digital forensics and mobile device forensics product by the Swedish company Micro Systemation. It is used to analyze and recover information from mobile devices. XRY is designed to recover the contents of a device in a forensic manner so that the contents of the data can be relied upon by the user. The XRY system allows for both logical and physical examinations.

Autopsy is free computer software that makes it simpler to deploy many open source programs. Its graphical user interface displays the results of the forensic search of the underlying volume, making it easier for investigators to flag pertinent sections of data. WinHex is a hex editor useful in data recovery and digital forensics. WinHex is a free powerful application that you can use as an advanced hex editor, a tool for data analysis, editing, and recovery, a data wiping tool, and a forensics tool used for evidence gathering.

SQLite is a software library that provides a relational database management system. The SQLite database is integrated with the application that accesses it: applications read and write directly from the database files stored on disk. SQLite is an open source SQL database that stores its data in a single file on the device, and Android comes with a built-in SQLite database implementation. In this study, SQLite is used to read and analyze the database files from the mobile device. Android debug bridge (ADB) is a versatile command line tool that lets users communicate with connected Android devices or emulators. ADB commands also facilitate a variety of device actions, for example, installing or debugging applications. The specifications of all the tools we used are listed in Table 1.

3.3 Development of Experiments

Based on the experimental environment designed, we run the Messenger features, including logging in, sending messages, exchanging photos, videos, audio, and files, making a call, etc. After that, the relevant evidence on each device is extracted and analyzed using forensic tools.

3.3.1 Extract Data

XRY is a commercial tool specifically for mobile phone forensics. Besides using the XRY tool, we need to back up an image file of the smart phone for analysis with Autopsy and WinHex. We create an image file for physical memory on the smart
Table 1: List of hardware and software used

Devices/Tools             Description               Specification/Versions
Sony Xperia Z1 C6902      Android Smart Phone       Android 4.3, Memory 2GB/16GB
XRY                       Mobile Forensics Tool     Version v7.4.1
Autopsy                   Digital Forensics Tool    Version v4.4.1
WinHex                    Digital Forensics Tool    Version v18.9
SQLite Expert Personal    Database Management Tool  Version v3.5.96.2516
Messenger                 Social Networking App     Version v141.0.0.31.76
Minimal ADB and Fastboot  ADB Tool                  Version v10.0.16299.371
phone and use forensic tools to extract and analyze important digital evidence from the image file.

We can extract data and create an image file from physical memory. We connect the mobile phone to the computer with the phone cable. First, we enter the “adb devices” command to check the connection between the two devices. If they are connected, the output shows the list of devices attached. Next, we enter the “adb shell” command to open a remote shell on the phone, and the prompt becomes “$”. We then need administrator-level permissions, so we enter the “su” command and the prompt becomes “#”. Now we can enter the “busybox df -h” command to inspect the system partitions, paths, volumes, space usage, available space, and so on. Most of the application data installed and stored on the phone is located on the data partition, so this is the partition of interest. Its path is “/dev/block/by-name/data”. We use the “dd” command, which performs physical imaging bit by bit, to create an image file of this partition. We enter the “busybox dd if=/dev/block/by-name/data of=/storage/MicroSD/test conv=noerror bs=4096” command to create the image file. The string after “if” is the partition to be imaged. The string after “of” is the output path of the image. In the experiment, we name the output image file “test.img” and store it on the external SD card. The “conv=noerror” option tells dd to continue rather than stop when a read error occurs. The “bs” option sets the block size read and written at a time.

3.3.2 Design of Analysis

To ensure the integrity of the digital evidence and avoid interference between pieces of evidence, we divide the experiment into three scenarios according to the forensic tool used: Scenario 1: XRY, Scenario 2: Autopsy, and Scenario 3: WinHex. The same experimental steps are performed in each scenario.

In addition, each scenario is further divided into two modes: Non-Rooted and Rooted Android platforms. We performed the same experiments on both platforms and compared the relevant evidence.

3.3.3 Experiment Elaboration

The experiment steps are summarized as follows.

1) Install the Messenger software on the smart phone and the forensic tool on the personal computer.

2) Log into Messenger and run its various features so that any material evidence is left by the user.

3) After the activities are completed, log out of the Messenger software.

4) Use the forensic tool to find all kinds of artifacts of the Messenger software on the smart phone.

5) Perform a comprehensive analysis of the evidence.

4 Results and Findings

In this section, we use three scenarios to describe the results and findings. Each scenario has two modes, the Non-Rooted mode and the Rooted mode. The details are as follows.

4.1 Scenario 1: XRY

In Scenario 1, we follow the experiment steps in Section 3.3 and use the XRY tool to find evidence of the activities on Messenger.

4.1.1 Non-Rooted Mode

Using the XRY tool, we cannot find the user’s account and password. Other artifacts of the user account can be found, as shown in Figure 1: the phone number +886985028322, the nickname Huang Gordon, the profile picture URL, and the Facebook number 100021304820523.

Figure 1: The artifacts of user account

The friend list can be found, as shown in Figure 2. It includes the names and the profile picture URLs of friends.

Figure 2: The artifacts of friend list

The artifacts of sending text, image, audio, video, GIS location, and GIF animation can be found. For example, the artifacts of sending a video are shown in Figure 3. The video delivery time, the URL of the video storage location, and the sender of the video can be found.

Figure 4: The artifacts of making a call
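The acquisition commands described in Section 3.3.1 can be collected into a short script. The following Python sketch only assembles the command strings in the order given there and does not execute anything. The function name `build_acquisition_commands` is hypothetical, and the default output path uses the file name “test.img” mentioned in the prose, which is an assumption.

```python
import shlex


def build_acquisition_commands(partition="/dev/block/by-name/data",
                               output="/storage/MicroSD/test.img",
                               block_size=4096):
    """Return the Section 3.3.1 commands as strings, in the order they are typed.

    Hypothetical helper: it builds the commands but never runs them.
    The default output name "test.img" is an assumption taken from the prose.
    """
    dd_args = [
        "busybox", "dd",
        f"if={partition}",   # input: the partition to be imaged
        f"of={output}",      # output: path of the image file
        "conv=noerror",      # keep going when a read error occurs
        f"bs={block_size}",  # block size read and written at a time
    ]
    return [
        "adb devices",        # host: list the attached devices
        "adb shell",          # host: open a shell on the phone (prompt "$")
        "su",                 # phone: switch to root (prompt "#")
        "busybox df -h",      # phone: inspect partitions and space usage
        shlex.join(dd_args),  # phone: image the data partition
    ]
```

Printing the returned list gives an examiner a checklist to follow. Actually running the commands would require a connected, rooted device with BusyBox installed.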
Figure 3: The artifacts of sending video

The artifacts of making a call can also be found. We can find the call record, which includes the call time and the caller, in the threads_db2 database. For example, the artifacts of making a call are shown in Figure 4.

There are three main databases for the recovered Messenger artifacts. The threads_db2 database contains the sent messages. The call_log_db_10021304820523 database contains the settings of the user account, where 10021304820523 is the network login number. The contacts_db2 database contains the user information. The contacts_db2 database is located in /data/data/com.facebook.katana/databases/. The threads_db2 database and the call_log_db_10021304820523 database are located in /data/data/com.facebook.orca/databases. There are three significant tables in the threads_db2 database: the message table, the thread_users table, and the message_reactions table. The message table stores the sent messages. An example of the message table is shown in Figure 5. Evidence of the text, sender, and timestamp can be found in this table. We can find the user name, nickname, and network number in the thread_users table. The stickers can be found in the message_reactions table. From the call_log_db_10021304820523 database, we can find the user network number. The contacts_db2 database contains the contacts table, which includes the friend list. The nickname and the URL of the profile picture are in this table. The path /data/data/com.facebook.orca/ is the main storage location of Messenger. The storage paths of the different artifacts are shown in Table 2.

Figure 5: The artifacts of message table

Table 2: The storage path of various traces

Artifacts        Storage Path
Profile Picture  /data/data/com.facebook.orca/files/image/v2.ols100.1/52/
Friend Picture   /data/data/com.facebook.orca/files/image/v2.ols100.1/88/
Image            /data/data/com.facebook.orca/cache/image/v2.ols100.1/84/
Video            /data/data/com.facebook.orca/files/ExoPlayerCacheDir/videocache/
GIF Animation    /data/data/com.facebook.orca/cache/image/v2.ols100.1/32/
Sticker          /data/data/com.facebook.orca/cache/image/v2.ols100.1/99/
Audio            /data/data/com.facebook.orca/cache/audio/v2.ols100.1/88/

4.1.2 Rooted Mode

Root is the highest privilege on the mobile phone, equivalent to the administrator privilege in the Windows operating system. After obtaining the root privilege, all the files of the mobile phone can be read and modified.

Using the forensic tool XRY, we can find the artifacts of the Messenger account. They include the nickname, friend nickname, profile picture, user name, phone number, network number, and related data URLs. However, the account and password cannot be found. The sent text messages, pictures, videos, GIF animations, stickers, audio files, GIS locations, and calling records can also be found. They include the message content, sending time, name of the database, information about the sender and the recipient, etc. According to the experiment, the sent messages are stored in the threads_db2 database, the account data is stored in the contacts_db2 database, and the network login number of Messenger is given by the call_log_db_10021304820523 database. The Rooted mobile phone experiment gives the same results as the Non-rooted mobile phone experiment, and artifacts can be found in the various tables of the different databases.

4.1.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using XRY is shown in Table 3.

Table 3: The comparison of Rooted Mode and Non-Rooted Mode using XRY

Evidences        Rooted  Non-Rooted
Account          None    None
Password         None    None
Profile Name     Found   Found
Nickname         Found   Found
Friend Nickname  Found   Found
Text             Found   Found
Image            Found   Found
Video            Found   Found
GIF Animation    Found   Found
Stickers         Found   Found
Audio            Found   Found
GIS Location     Found   Found
Calling          Found   Found

4.2 Scenario 2: Autopsy

In Scenario 2, we follow the experiment steps in Section 3.3 and use the Autopsy tool to find the artifacts of the activities on Messenger.

4.2.1 Non-Rooted Mode

Autopsy is a free information security forensics tool that provides a graphical interface for digital forensic investigation. It can analyze Windows and UNIX disks and file systems such as NTFS, FAT, UFS1/2, and Ext2/3. In the case of the Non-rooted mobile phone, we use the Autopsy tool to open the smart phone backup image file and analyze it. As a result, no artifacts of Messenger can be found.

4.2.2 Rooted Mode

We use the Autopsy tool to open the backup image file of the smart phone. Our nickname, friend nickname, text, image, video, GIF animation, sticker, audio file, GIS location, and calling record can be found. However, the account number, password, and user name cannot be found. We also find the databases threads_db2, call_log_db_10021304820523, and contacts_db2. There is no data in the message_reactions table of the threads_db2 database. The artifacts of Messenger displayed by Autopsy are shown in Figure 6.

Figure 6: The artifacts of Messenger using Autopsy

The user nickname and the network number can be found in the thread_users table of the threads_db2 database. We also find the sent text, sticker, GIS location, and calling record in the messages table of the same database. The artifacts of video are located in data/com.facebook.orca/files/ExoPlayerCacheDir/videocache/. We find the artifacts of audio in data/com.facebook.orca/cache/audio/. The image and GIF animation artifacts are in data/com.facebook.orca/cache/image/. For example, the artifact of GIS location is shown in Figure 7.

Figure 7: The artifacts of GIS location using Autopsy
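Once the message database (threads_db2) has been copied out of the image, its messages table can be read with any SQLite client. The following Python sketch shows one way this might be done. The helper names and the column names `text`, `sender`, and `timestamp_ms` are assumptions based on the fields the paper reports (text, sender, and timestamp); the actual schema should be checked on the extracted file.

```python
import sqlite3
from datetime import datetime, timezone


def open_read_only(db_path):
    """Open a copied database file read-only so the evidence is not modified."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def list_messages(conn):
    """Return (UTC time, sender, text) tuples ordered by time.

    Assumed schema: a table named messages with columns text, sender,
    and timestamp_ms (milliseconds since the Unix epoch).
    """
    query = ("SELECT timestamp_ms, sender, text "
             "FROM messages ORDER BY timestamp_ms")
    rows = []
    for ts_ms, sender, text in conn.execute(query):
        # Convert the millisecond timestamp to an ISO 8601 UTC string.
        when = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        rows.append((when.isoformat(), sender, text))
    return rows
```

A call such as `list_messages(open_read_only("threads_db2"))` would then print a timeline of messages, assuming a copy of the database is in the working directory. Opening the file read-only is a design choice intended to avoid altering the evidence during examination.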
4.2.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using Autopsy is shown in Table 4.

Table 4: The comparison of Rooted Mode and Non-Rooted Mode using Autopsy

Evidences        Rooted  Non-Rooted
Account          None    None
Password         None    None
Profile Name     None    None
Nickname         Found   None
Friend Nickname  Found   None
Text             Found   None
Image            Found   None
Video            Found   None
GIF Animation    Found   None
Stickers         Found   None
Audio            Found   None
GIS Location     Found   None
Calling          Found   None

4.3 Scenario 3: WinHex

In Scenario 3, we follow the experiment steps in Section 3.3 and use the WinHex tool to find the artifacts of the activities on Messenger.

4.3.1 Non-Rooted Mode

WinHex is a disk editor and a hex editor useful in data recovery and digital forensics. WinHex is a free powerful application that you can use as an advanced hex editor, a tool for data analysis, editing, and recovery, and a forensics tool used for evidence gathering. In the case of the Non-rooted mobile phone, we use the WinHex tool to open the smart phone backup image file and analyze it. As a result, no artifacts of Messenger can be found.

4.3.2 Rooted Mode

We use the WinHex tool to open the backup image file of the smart phone. Our nickname, friend nickname, and text can be found. However, the image, video, GIF animation, sticker, audio file, GIS location, calling record, account number, password, and user name cannot be found. For example, we find the user login email and nickname using the “username” keyword, as shown in Figure 8.

Figure 8: The artifacts of the user using WinHex

4.3.3 The Comparison of Findings on Rooted Mode and Non-Rooted Mode

The comparison of findings between Rooted Mode and Non-Rooted Mode using WinHex is shown in Table 5.

Table 5: The comparison of Rooted Mode and Non-Rooted Mode using WinHex

Evidences        Rooted  Non-Rooted
Account          None    None
Password         None    None
Profile Name     None    None
Nickname         Found   None
Friend Nickname  Found   None
Text             Found   None
Image            None    None
Video            None    None
GIF Animation    None    None
Stickers         None    None
Audio            None    None
GIS Location     None    None
Calling          None    None

5 Discussions

Using the XRY tool, the path /data/data/com.facebook.orca/ is the main storage location of Messenger. Under this path, two folders, files and cache, hold most of the multimedia artifacts. They include the profile picture, image, GIF animation, sticker, and audio files. The video artifacts are located in the ExoPlayerCacheDir folder. In addition, the storage path after the smart phone has been Rooted is the same as that of the Non-rooted phone, but the pathname is changed to /userdata/data/com.facebook.orca/. Both the Non-rooted smart phone and the Rooted smart phone have the same artifacts. The reason is that the forensic tool XRY downgrades the version of Messenger in order to obtain many artifacts.

According to the experiments, in the case of the Non-rooted smart phone, the forensic tools Autopsy and WinHex could not find any artifacts of Messenger. In contrast, the forensic tool XRY bypasses the security protection mechanism by downgrading the Messenger version, so we can obtain many artifacts of Messenger, although the account number and password cannot be found. Therefore, the forensic tool XRY is the better choice for forensic analysis of the Non-rooted smart phone.

Table 6: The findings of three forensic tools

                     XRY                  Autopsy              WinHex
Messenger Artifacts  Non-Rooted  Rooted   Non-Rooted  Rooted   Non-Rooted  Rooted
Account              None        None     None        None     None        None
Password             None        None     None        None     None        None
User name            Found       Found    None        None     None        None
My nickname          Found       Found    None        Found    None        Found
Friend nickname      Found       Found    None        Found    None        Found
Text                 Found       Found    None        Found    None        Found
Image                Found       Found    None        Found    None        None
Video                Found       Found    None        Found    None        None
GIF animation        Found       Found    None        Found    None        None
Sticker              Found       Found    None        Found    None        None
Audio                Found       Found    None        Found    None        None
GIS location         Found       Found    None        Found    None        None
Calling log          Found       Found    None        Found    None        None

Table 7: The results of forensic analysis of relevant researchers for Facebook Messenger

Researcher                                     Artifacts
Katie Corcoran [5]                             User name, my nickname, friend nickname, text, GIS location, timestamp
Hao Zhang [28]                                 User name, my nickname, friend nickname, text, image, audio, GIS location, timestamp
Ming-Sang Chang & Chih-Ping Yen (this work)    User name, my nickname, friend nickname, text, image, video, GIF animation, sticker, audio, GIS location, calling log, timestamp

In the Rooted smart phone, the forensic tool Autopsy can find many traces, but the account number, password, and user name cannot be found. The forensic tool WinHex can find our nickname, friend nickname, and text. According to the results of the experiments, it is
proved that the use of the forensic tool XRY to perform the forensic analysis in the Rooted smart phone and Non-Rooted smart phone is better. However, considering the funding or other restrictions, we can try to use the free forensic tool, Autopsy or WinHex, to assist in the forensic analysis work.

We know the investigation step of instant messaging from the research of the experiments. First, we find the basic information of the user, and then search for the behavior of the user through the keyword strings such as account number, nickname, and network number. Using the user artifacts to estimate possible crimes. It also can use other accounts that may be additionally discovered during the search period. It may help to expand the search scope to see if there are accomplices or other victims.

We also can analyze the relationship between the criminal modus operandi and the timing chain. The investigator can infer the motives and tactics of possible crimes through the timing chain, and discover the criminal accomplices, transaction content, plans, time and place. When investigators find all kinds of traces of crimes, they can prevent crimes in advance.

Finally, we summarize the findings of three forensic tools based on Non-rooted mode and Rooted mode as Table 6. It also presents the results of forensic analysis of relevant researchers for Facebook Messenger, as shown in Table 7.

6 Conclusions

Although the instant messaging software has the advantages of convenience and immediacy, it is always inevitable that it will be abused by cyber criminals. Crimes are often exploited from software, website and web application exploits, using cloud services to spread malware, and further exploiting social media posts and links to trick users into fraud traps.

In this paper, we investigated the apps of Facebook Messenger to conduct a forensic analysis of the user behaviors in Android environments. The study found that different activities can lead to the discrepancies in recording the user behaviors on the same social network.

While investigating cybercrime on Facebook Messenger, we recommend that the first goal should be finding the account number, nickname, and network number of the criminal suspect. Using the account number and nickname, the operational behaviors of the criminal suspect on the social network can be searched, such as, uploading pictures, sending text, calling log, and timestamps. Then, based on the contents of the operation, the possible criminal activity or victimization practice can be deduced or estimated. At the same time, using the additional account numbers that are possibly discovered during the evidence gathering phase, the scope of the investigation can be expanded to find the possible accomplices or other victims. The full evidence scenario obtained in a step-by-step and layer-by-layer outward expansion will be the key to solving the case.

References

[1] N. Al-Suwaidi, H. Nobanee, and F. Jabeen, “Estimating causes of cyber crime: Evidence from panel data fgls estimator,” International Journal of Cyber Criminology, vol. 12, pp. 392–407, 2018.
[2] A. Azfar, K. Choo, and L. Liu, “An android social app forensics adversary model,” in Proceedings of the International Conference on System Sciences, pp. 5597–5606, 2016.
[3] J. Choi, J. Yu, S. Hyun, and H. Kim, “Digital forensic analysis of encrypted database files in instant messaging applications on windows operating systems: Case study with kakaotalk, nateon and QQ messenger,” Digital Investigation, vol. 28, pp. 50–59, Apr. 2019.
[4] H. C. Chu, D. J. Deng, and J. H. Park, “Live data mining concerning social networking forensics based on a facebook session through aggregation of social data,” IEEE Journal on Selected Areas in Communications, vol. 29, no. 7, pp. 1368–1376, 2011.
[5] K. Corcoran, A. Read, J. Brunty, and T. Fenger, “Messaging application analysis for android and iOS platforms,” Research and Seminar in Forensic Science Graduate Program, 2013. (http://www.marshall.edu/forensics/research-and-seminar/class-of-2013)
[6] F. N. Dezfouli, A. Dehghantanha, B. Eterovic-Soric, and K. R. Choo, “Investigating social networking applications on smartphones detecting facebook, twitter, linkedin and google+ artefacts on Android and iOS platforms,” Australian Journal of Forensic Sciences, vol. 48, pp. 469–488, 2016.
[7] M. Dickson, “An examination into msn messenger 7.5 contact identification,” Digital Investigation, vol. 3, no. 2, pp. 79–83, 2006.
[8] M. Dickson, “An examination into yahoo messenger 7.0 contact identification,” Digital Investigation, vol. 3, no. 3, pp. 159–165, 2006.
[9] W. B. Glisson, T. Storer, and J. Buchanan-Wollaston, “An empirical comparison of data recovered from mobile forensic toolkits,” Digital Investigation, vol. 10, no. 1, pp. 44–55, 2013.
[10] J. Golbeck, Introduction to Social Media Investigation, pp. 273–278, 2015. ISBN: 9780128016565.
[11] M. Levendoski, T. Datar, and M. Rogers, “Yahoo! messenger forensics on windows vista and windows 7,” Digital Forensics and Cyber Crime, vol. 88, pp. 172–179, 2012.
[12] A. Mahajan, M. S. Dahiya, and H. P. Sanghvi, “Forensic analysis of instant messenger applications on android devices,” International Journal of Computer Applications, vol. 68, no. 8, pp. 38–44, 2013.
[13] E. Martellozzo and E. A. Jane, Cybercrime and Its Victims, 2017. ISBN10: 1138639443.
[14] D. K. Mendoza, “The vulnerability of cyberspace – the cyber crime,” Journal of Forensic Sciences & Criminal Investigation, vol. 2, no. 1, 2017.
[15] N. Mutawa, I. Awadhi, I. Baggili, and A. Marrington, “Forensic artifacts of facebook’s instant messaging service,” in Proceedings of the International Conference for Internet Technology and Secured Transactions (ICITST’11), pp. 771–776, 2011.
[16] D. Quick and K. K. Choo, “Pervasive social networking forensics: Intelligence and evidence from mobile device extracts,” Journal of Network and Computer Applications, vol. 86, pp. 24–33, 2017.
[17] J. Reust, “Case study: Aol instant messenger trace evidence,” Digital Investigation, vol. 3, no. 4, pp. 238–243, 2006.
[18] I. Riadi, A. Yudhana, and M. C. F. Putra, “Forensic tool comparison on instagram digital evidence based on android with the nist method,” Scientific Journal of Informatics, vol. 5, no. 2, pp. 235–247, Nov. 2018.
[19] H. Said, A. Yousif, and H. Humaid, “Iphone forensics techniques and crime investigation,” in Proceedings of the International Conference and Workshop on Current Trends in Information Technology, pp. 120–125, 2011.
[20] C. Sgaras, M. T. Kechadi, and N. A. Le-Khac, “Forensics acquisition and analysis of instant messaging and voip applications,” Computational Forensics, pp. 188–199, 2015.
[21] P. Shakarian, A. Bhatnagar, A. Aleali, E. Shaabani, and R. Guo, “Diffusion in social networks,” Computer Science, 2015. ISBN: 978-3-319-23105-1.
[22] Statista, Most Popular Global Mobile Messenger Apps as of October 2019, Based on Number of Monthly Active Users [Online], 2020. (https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps) [last accessed 10.01.2020].
[23] N. Virvilis, A. Mylonas, N. Tsalis, and D. Gritzalis, “Security busters: Web browser security vs. rogue sites,” Computer & Security, vol. 52, pp. 90–105, 2015.
[24] D. Walnycky, I. Baggili, and A. Marrington, et al., “Network and device forensic analysis of android social-messaging applications,” Digital Investigation, vol. 14, pp. 77–84, 2015.
[25] K. Wong, A. Lai, J. Yeung, and W. Lee, et al., “Facebook forensics,” valkyrie-x security research group, 2011. (https://www.fbiic.gov/public/2011/jul/facebook_forensics-finalized.pdf)
[26] S. Y. Wu, Y. Zhang, and X. P. Wang et al., “Forensic analysis of wechat on android smartphones,” Digital Investigation, vol. 21, pp. 3–10, 2017.
[27] M. N. Yusoff, A. Dehghantanha, and R. Mahmod, “Chapter 4 – forensic investigation of social media and instant messaging services in firefox os: Facebook, twitter, google+, telegram, openwapp, and line as case studies,” Contemporary Digital Forensic Investigations of Cloud and Mobile Applications, pp. 41–62, 2017.
[28] H. Zhang, L. Chen, and Q. Liu, “Digital forensic analysis of instant messaging applications on android smartphones,” International Conference on Computing, Networking and Communications (ICNC’18), pp. 647–651, 2018.

Biography

Ming Sang Chang received the Ph.D. degree from National Chiao Tung University, Taiwan, in 1999. In 2001 he joined the faculty of the Department of Information Management, Central Police University, where he is now a Professor. His research interest includes Computer Networking, Network Security, Digital Investigation, and Social Networks.

Chih Ping Yen is an Associate Professor, Department of Information Management, Central Police University. Received his Ph.D. degree from Department of Computer Science and Information Engineering, National Central University, Taiwan, in 2014. His research interest includes Digital Investigation, Artificial Intelligence & Pattern Recognition, Image Processing, and Management Information Systems.

International Journal of Network Security, Vol.22, No.5, PP.828-837, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).13)

International Journal of Network Security, Vol.22, No.5, PP.838-844, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).14) 838
Protection of User Data by Differential Privacy Algorithms
Jian Liu1 and Feilong Qin2 (Corresponding author: Feilong Qin)
School of Automobile and Transportation, Chengdu Technological University^1
School of Big Data and Artificial Intelligence, Chengdu Technological University^2
No. 1, The second section of Zhongxin Avenue, Pidu District, Chengdu, Sichuan 611730, China
(Email: [email protected])
(Received Apr. 13, 2019; Revised and Accepted Jan. 5, 2020; First Online July 13, 2020)
Abstract

With the emergence of more and more social software users, increasingly larger social networks have appeared. These social networks contain a large amount of sensitive user information, so privacy protection processing is needed before releasing social network information. This paper introduced the hierarchical random graph (HRG) based differential privacy algorithm and the single-source shortest path based differential privacy algorithm. Then, the performance of the two algorithms was tested on two unweighted artificial networks generated by the LFR tool and two weighted real networks crawled by crawler software. The results show that after processing the social network through the differential privacy algorithm, the average clustering coefficient decreases, and the expected distortion increases. The smaller the privacy budget, the larger the reduction and the more significant the increase. Under the same privacy budget, the average clustering coefficient and expected distortion of the single-source shortest path differential privacy algorithm are smaller. In terms of execution efficiency, the larger the size of the social network, the more time it takes, and the differential privacy algorithm based on the single-source shortest path spends less time on the same network.

Keywords: Differential Privacy; Hierarchical Random Graph; Single Source Shortest Path Model; Social Network

1 Introduction

The popularity of wireless communication technology and intelligent mobile terminals makes people's communication more and more convenient, and a variety of community communication application software brings more and more registered users onto the Internet, building a vast and sophisticated social network [1, 12]. A social network contains different kinds of relevant information. Service providers of application software mine information using big data mining technology, analyze users' preferences, and provide more accurate personalized services [9]. However, the social network also contains sensitive private information, which is usually collected and archived by service providers, so the protection of private and confidential data has become a critical issue for service providers.

Traditional privacy protection mainly encrypts sensitive data, but it is becoming increasingly difficult for this method to play an active role in the face of big data mining technology [13]. The differential privacy algorithm is a method to deal with the above problem. Its basic principle is to disturb the original data and network structure, including adding, deleting, exchanging, etc., to make the disturbed data different from the original data, i.e., protecting the original data by publishing the disturbed data. To reduce the large amount of noise caused by separate privacy in related data sets, Zhu et al. [15] proposed an effective correlated differential privacy solution. They found that the scheme was superior to the traditional differential privacy scheme in terms of mean square error on a large group of queries. Li et al. [7] proposed segmentation mechanisms based on privacy perception and utility to deal with the personalized privacy parameters of every individual in the data set and maximize the efficiency of the differential privacy calculation. Experiments on a large number of original data sets verified the effectiveness of the method.

Chen et al. [2] proposed two optimization techniques, PrivTHR and PrivTHREM, to optimize the differential privacy in wave clusters, and the simulation results showed that the optimization technique had high practicability when the privacy budget allocation was appropriate. This paper briefly introduced the differential privacy algorithm based on a hierarchical random graph (HRG) and the differential privacy algorithm based on the single-source shortest path. Then the performance of the two algorithms was tested on two unweighted artificial networks generated by the LFR tool and two weighted real networks crawled by crawler software.

2 Differential Privacy Algorithm

2.1 The Concept of Differential Privacy

Viewed as a picture, a social network is a network of points and lines. Every node represents a user, while a line represents the connection between users. Points and lines in the social network diagram contain various vital data. At present, the commonly used social network privacy methods are divided into two categories, both of which substantially change the overall structure of the social network graph. One is to cluster the network nodes into "clusters" by using a clustering algorithm [8] and then encrypt them; the other is to add disturbance to the network graph structure, including deleting, exchanging, and adding nodes and connections. Although the former can hide the private data well, it seriously destroys the local structure of the network and affects the typical mining of the network structure data. Although the latter disturbs the network structure, the overall scale is the same, and the impact on the regular use of the data is not significant. Differential privacy is one of the protection methods of the latter category. The definition of differential privacy is as follows. If the following inequality holds:

$$\Pr(F(D_1) \in S) \le e^{\varepsilon} \Pr(F(D_2) \in S),$$

then the algorithm $F$ satisfies $\varepsilon$-differential privacy. $D_1$ and $D_2$ are two data sets which differ in only one record; $S$ is the output result of algorithm $F$ on $D_1$ and $D_2$, and it lies in the range of algorithm $F$; $\varepsilon$ is called the privacy budget [5], and its value determines the degree of protection that differential privacy gives to the data: the smaller the value is, the higher the protection degree is and the larger the disturbance added to the data is.

2.2 Differential Privacy Algorithm Based on HRG

HRG [6] divides the hierarchical structure of $G = (V, E)$ using a binary tree, in which $V$ is the set of nodes in a network and $E$ is the set of relationships among network nodes. Figure 1 is one kind of HRG in the social network. The division of $G$ by the binary tree is similar to the random dichotomy of a node set. As shown in Figure 1, the HRG dichotomizes five nodes into the groups (1, 2, 3) and (4, 5). The binary tree root (i.e. the box in Figure 1) of the two groups shows the connection probability of the two groups, and the formula is:

$$p_r = \frac{e_r}{n_{L,r} \cdot n_{R,r}},$$

where $r$ is the internal node (root node) in the sample tree, i.e. the box node of the HRG in Figure 1, $n_{L,r}$ and $n_{R,r}$ are the numbers of network nodes on the left and right sides under root node $r$, and $e_r$ is the number of connection edges between the node sets on both sides of the root node. After that, the dichotomy of groups continues, and the connection probability is calculated until the segmentation completes.

Figure 1: The schematic diagram of social network and one of its HRG

The HRG based differential privacy algorithm is as follows.

1) Firstly, the sample tree of HRG is sampled by the Markov Chain Monte Carlo (MCMC) method [3], and the details are as follows. A sample tree (HRG) $T_0$ is randomly selected. Then a neighbor tree is generated according to the previous sample tree, and whether to update the sample tree is determined according to the acceptance probability. The formulas are:

$$T_i = \begin{cases} T' & \text{with probability } \alpha \\ T_{i-1} & \text{with probability } 1 - \alpha \end{cases}$$
$$\alpha = \min\left(1, \exp\left(\varepsilon_1 \left(\log L(T') - \log L(T_{i-1})\right)\right)\right) \qquad (1)$$
$$\log L(T) = -\sum_{r \in T} n_{L,r}\, n_{R,r}\, h(p_r)$$
$$h(p_r) = -p_r \log p_r - (1 - p_r) \log(1 - p_r)$$

where $T_i$, $T_{i-1}$, and $T'$ are the sample trees after and before the update and the neighbor tree of $T_{i-1}$, respectively, $\alpha$ is the probability of acceptance, $\log L(T)$ is the logarithm of similarity between the sample tree and $G$, and $h(p_r)$ is the Gibbs-Shannon entropy function. Through Equation (1), the sample tree is updated and iterated until the change in $\log L(T)$ before and after an update is smaller than the set threshold. The samples are selected after a certain number of iterations, and finally the stable sample tree set $S_{ST} = (T_{S1}, T_{S2}, \cdots, T_{SN})$ is obtained through sampling, where $T_{SN}$ stands for the sample tree obtained by the $N$-th sampling after the sample tree becomes stable through iterations.

2) Laplace noise [5] is added to the sample tree set $S_{ST}$. After noise addition, the calculation formula of the connection probability of node $r$ inside the sample tree is:

$$p'_r = \min\left(1, \frac{e_r + \mathrm{laplace}(\varepsilon_2^{-1})}{n_{L,r} \cdot n_{R,r}}\right),$$

where $p'_r$ is the connection probability of internal node $r$ after noise addition.
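The connection probability of Section 2.2 and its noisy counterpart in step 2 can be illustrated with a minimal Python sketch. Everything beyond the two formulas is an assumption made for illustration: the function names are hypothetical, `laplace(ε_2^{-1})` is read as zero-mean Laplace noise with scale 1/ε2, and the edge count e_r = 2 for the split (1, 2, 3) | (4, 5) is invented rather than taken from Figure 1. The sketch follows the formula literally, so the result is capped at 1 but not floored at 0.

```python
import random


def laplace(scale, rng):
    """Zero-mean Laplace sample, drawn as the difference of two
    independent exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def connection_probability(e_r, n_left, n_right):
    """p_r = e_r / (n_{L,r} * n_{R,r}) for one internal node r."""
    return e_r / (n_left * n_right)


def noisy_connection_probability(e_r, n_left, n_right, eps2, rng):
    """p'_r = min(1, (e_r + laplace(1/eps2)) / (n_{L,r} * n_{R,r})).

    Assumption: the noise scale is 1/eps2, so a smaller budget
    produces a larger perturbation.
    """
    noisy_edges = e_r + laplace(1.0 / eps2, rng)
    return min(1.0, noisy_edges / (n_left * n_right))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs match
    # Hypothetical root node: 3 nodes on the left, 2 on the right, 2 edges across.
    print("exact p_r:", connection_probability(2, 3, 2))
    for eps2 in (10.0, 1.0, 0.1):
        print("eps2 =", eps2, "noisy p_r:",
              noisy_connection_probability(2, 3, 2, eps2, rng))
```

Running the loop with decreasing budgets shows the noisy value drifting further from the exact one, which is the trade-off between privacy budget and disturbance described in Section 2.1.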
3) The lower triangular matrix of every HRG in $S_{ST}$ after noise addition is calculated, and then the lower triangular mean value matrix of $S_{ST}$ is calculated. Each element in the lower triangular matrix is the connection probability of a pair of network nodes, which can be obtained through the multiplication of $p'_r$ in the HRG.

4) According to the connection probability of network nodes in the lower triangular mean value matrix, the connection edges between nodes are set.

2.3 Single Source Shortest Path Constraint Model Based Differential Privacy Algorithm

The HRG based differential privacy algorithm described above can effectively protect the privacy of social networks, but it is aimed more at unweighted social networks, i.e. although the degree of connection between nodes in the social networks in this algorithm is different, the importance of each connection is similar, or it does not matter to the algorithm. However, with the expansion of the scale of the Internet and the increase of social software users, social networks not only increase in scale, but also have different sensitivities between nodes. In order to describe social networks more accurately, in addition to the connection between nodes, different weights are also given to the connections, which indicate the importance of each connection. The original expression of the social network transforms to $G = (V, E, W)$, where $W$ represents the weight set of the corresponding connection edges. For a social network with weights, the weights are also part of the sensitive information, and it is also necessary to deal with the weights when applying differential privacy to the social network, as the importance of connection edges is represented by the weights.

The HRG based differential privacy algorithm will affect the edge weights when it processes a weighted social network. Once the edge weights in the weighted social network change, the structure of the whole graph will change; although the encryption of the information is achieved, the data availability is seriously reduced. Therefore, the constraint model of the social network was constructed by the single source shortest path algorithm [10] in this study, and linear constraints were applied to the disturbance of differential privacy on the basis of the constraint model. The single source shortest path constraint model based differential privacy algorithm is divided into two steps: 1) building a single source shortest path constraint model; 2) adding noise to differential privacy.

1) The first step is to build a single source shortest path constraint model. For $G = (V, E, W)$, nodes in the network are induced into the corresponding spanning tree using the Dijkstra algorithm, and the constraint matrix representing the constraints is obtained. The relevant steps are as follows.

a. Firstly, a node $v_0$ is selected from the network as a source point and induced into set $V_0$, and then the nodes which can be reached in one step from $v_0$ are selected from the remaining nodes to form set $Q$.

b. Node $\mu$ which has the smallest edge weight with $V_0$ is selected from $Q$. Then a row is added in constraint matrix $A$ according to the constraint condition $\{f(v_0, pre_\mu) \le f(v_0, \mu)\}$. Values of the corresponding positions in the matrix are constraint coefficients obtained from the set of constraint conditions. $pre_\mu$ is the $\mu$ which was selected from $Q$ previously. Then $\mu$ is induced into $V_0$, and the path between $V_0$ and $Q$ is updated. Set $Q$ is updated, i.e., $\mu$ is deleted.

c. Step b is repeated until $Q$ becomes an empty set. Then the spanning tree, which is composed of the completely induced nodes in $V_0$ and the corresponding connection edges, is added to the spanning tree sequence $T$.

d. A new source point is selected from the remaining nodes which are not induced, and then Steps a, b, and c are repeated until all the nodes are induced. Finally, constraint matrix $A$ and spanning tree sequence $T$ are output.

2) After getting the single source shortest path constraint model of the social network, noise is added for differential privacy. The noise addition of the social network includes two aspects: one is to add constraint noise to the weights of the network connection edges, and the other is to disturb network nodes. The algorithm steps of the former are as follows.

a. Firstly, according to the spanning trees in spanning tree sequence $T$, edge set $E$ in $G$ is divided into $E_T$ and $E_N$, where $E_T$ is the edge set of the spanning trees and $E_N$ is the remaining edge set.

b. Laplace noise is added to the edge weights in $E_T$, and the formula of noise addition is:

$$w'_i = w_i + \mathrm{laplace}(\varepsilon_1),$$

where $S(f)$ stands for the sensitivity of $f$, $i$ stands for the edge of the node pair which is queried by $f$, and $w_i$ and $w'_i$ are the edge weights of the corresponding node pair before and after noise addition respectively.

c. The edge weights in $E_N$ are solved based on the weights in $E_T$ after noise addition, constraint matrix $A$ and the constraint inequalities.

d. Every spanning tree in spanning tree sequence $T$ is processed as follows. A node pair which has no edge originally in the spanning tree is randomly
selected. A new edge is added between the node pair. The weight of the new edge is the smaller of the maximum weight of the spanning tree and the shortest path of the node pair.

The algorithm steps of network node disturbance are as follows.

a. Firstly, the number of nodes to be disturbed is calculated according to the set privacy budget,

$$N_n = |\mathrm{laplace}(1/\varepsilon_2)|.$$

b. In order to reduce the influence of disturbance such as addition and deletion of nodes on the sensitivity of the query function, nodes whose node degree is smaller than the set threshold are selected first. Node $v$ is randomly selected, and then node set $V_1$ which is connected with $v$ is processed as follows.

If $(\mu_1, v) \in E$, $(\mu_2, v) \in E$ and $f(\mu_1, \mu_2) = w(\mu_1, v) + w(v, \mu_2)$, then $w(\mu_1, \mu_2) = w(\mu_1, v) + w(v, \mu_2)$; if there is no edge between $\mu_1$ and $\mu_2$, then an edge is constructed. $\mu_1$ and $\mu_2$ are any two nodes in set $V_1$. $f$ is a query function in the single source shortest path model, and it returns the shortest path between two nodes. After the nodes in $V_1$ are processed, $v$ and its edges are deleted.

Virtual nodes are added as follows. Node $v$ is randomly selected. Then virtual node $v_1$ is added. A connection line is added between $v$ and $v_1$, and the weight of the connection edge is the average value of the edge weights of the other nodes which are connected with $v$. Moreover, every node $\mu$ which connects with $v$ is also connected with $v_1$, and the estimation formula of its weight value is:

$$w(v_1, \mu) = w(\mu, v) + \mathrm{laplace}(S(f)/\varepsilon_2).$$

c. Step b is repeated to disturb network nodes until the number of disturbed nodes reaches $N_n$. After the noise addition of differential privacy for the original social network $G$, social network $G'$ is output.

3 Simulation Experiment

3.1 Experimental Environment

In this study, the above algorithms were implemented in Python [4]. The experiment was carried out on a laboratory server configured with a Core i7 processor (2.6 GHz), the Windows 7 operating system and 16 GB of memory.

3.2 Experimental Setup

The performance of the two differential privacy algorithms was tested on the artificial network data sets generated by the LFR tool and the Weibo social network data sets crawled by crawler software. The relevant parameters of the artificial network data sets generated by LFR and the Weibo social network data sets crawled from the Weibo interface by crawler software are shown in Table 1.

Table 1: Data sets of artificial network and Weibo social network

Network   Number of nodes   Number of edges   Average node degree   Maximum number of nodes in the community   Minimum number of nodes in the community
LFR1      1200              4125              30                    110                                        30
LFR2      5000              9230              35                    250                                        50
Weibo 1   12530             151132            55                    1123                                       122
Weibo 2   21650             213578            64                    1624                                       231

LFR generated two artificial network data sets, and the artificial networks also form communities of different sizes to simulate real networks. In LFR1, there were 1200 nodes and 4125 edges, with an average node degree of 30; the maximum and minimum numbers of nodes in the communities were 110 and 30 respectively. In LFR2, there were 5000 nodes and 9230 edges, with an average node degree of 35; the maximum and minimum numbers of nodes in the communities were 250 and 50 respectively. The artificial networks generated by the LFR tool only contained node identification and connection relationships, so they are undirected network graphs without weights. According to the preliminary statistics of the two Weibo data networks, which were composed of Weibo data crawled by crawler software, there were 12530 nodes and 151132 edges in Weibo 1, with an average node degree of 55, and the maximum and minimum numbers of nodes in the communities were 1123 and 122 respectively; there were 21650 nodes and 213578 edges in Weibo 2, with an average node degree of 64, and the maximum and minimum numbers of nodes in the communities were 1623 and 231 respectively. Besides the basic node identification and connection relationships, the real networks composed of Weibo data also included weight information such as attribute labels, so the real networks are social networks with weights.

The above four social network data sets were processed by the above two differential privacy algorithms. Privacy parameter $\varepsilon$ of the two algorithms in differential privacy processing was set as 10, 1 and 0.1 respectively.

1) Privacy parameter $\varepsilon = \varepsilon_1 + \varepsilon_2$ was used when the social network was processed by the HRG based differential privacy algorithm (Algorithm 2.2), where $\varepsilon_1 : \varepsilon_2 = 1 : 1$.

2) Privacy parameter $\varepsilon = \varepsilon_1 + \varepsilon_2 + \varepsilon_3$ was used when the social network was processed by the single source shortest path based differential privacy algorithm (Algorithm 2.3), where $\varepsilon_1 : \varepsilon_2 : \varepsilon_3 = 2 : 1 : 2$.

3.3 Performance Evaluation

In this study, the performance of the two algorithms was measured by the average clustering coefficient, the expected distortion degree and the data processing time. The average clustering coefficient [14] could reflect the structure of a social network. Comparing the average clustering coefficient of the network before and after the differential privacy processing shows the degree of privacy protection of an algorithm; the greater the difference was, the higher the degree of privacy protection was. The formula is:

$$C = \frac{1}{n} \sum_{i=1}^{n} \frac{2E_i}{k_i (k_i - 1)},$$

where $n$ is the total number of nodes, $E_i$ is the actual number of connections between nodes adjacent to node $i$, and $k_i$ is the number of nodes adjacent to node $i$.

The expected distortion degree [11] could reflect the degree of distortion of the data after differential privacy processing and could measure the availability of data; the larger the value was, the higher the degree of data distortion after processing was and the lower the availability was. The calculation formula is:

$$E[d(X, X')] = \sum_{X} \sum_{X'} p(x)\, q(x'|x)\, d(x, x'),$$

where $X$ and $X'$ are the data sets before and after differential privacy processing respectively, $d(x, x')$ is the Hamming distance of the data before and after processing, $p(x)$ is the probability distribution of the data before processing, and $q(x'|x)$ is the conditional transition probability of the differential privacy mechanism.

Due to the randomness of the noise added in the differential privacy algorithms, the differential privacy processing of each social network was repeated 10 times under each privacy budget, and the average value was taken as the final result.

3.4 Experimental Results

The average clustering coefficient could reflect the degree of clustering among nodes in the network, and it could reflect the structural distribution of the network to some extent. In this study, the two differential privacy algorithms were applied to the four social networks under different privacy budgets. The average clustering coefficients before and after the processing are shown in Figures 2 and 3. Algorithm 2.2 represents the HRG based differential privacy algorithm; Algorithm 2.3 represents the single source shortest path based differential privacy algorithm, and the numbers in brackets after the algorithm represent the privacy budget adopted. It was seen from Figures 2 and 3 that the average clustering coefficients of different social networks before and after differential privacy processing were different; the larger the scale of the social network was, the larger the average clustering coefficient was; the average clustering coefficient of the real Weibo networks was significantly larger than that of the artificial networks, because connections between users in real networks are closer and more frequent, in addition to the larger scale.

Figure 2: Average clustering coefficients of two LFR networks obtained by two algorithms under different privacy budgets

Figure 3: Average clustering coefficients of two Weibo networks obtained by two algorithms under different privacy budgets

The comparison of the average clustering coefficient under the same network data set suggested that the average clustering coefficient was reduced after differential privacy processing; the smaller the privacy budget was, the larger the reduction was. The comparison of the average clustering coefficient under the same privacy budget suggested that the average clustering coefficient of Algorithm 2.3 in the same social network was smaller. Overall, Algorithm 2.3 was better in the differential privacy protection of social networks.
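The two structural metrics defined in Section 3.3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the experimental code: the function names, the adjacency-dictionary representation, and the toy inputs are hypothetical, and a node with fewer than two neighbors is assumed to contribute 0 to the average (the formula is undefined there).

```python
def average_clustering(adj):
    """C = (1/n) * sum_i 2*E_i / (k_i * (k_i - 1)).

    `adj` maps each node to the set of its neighbors (undirected graph).
    Assumption: nodes with k_i < 2 contribute 0.
    """
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue
        nbrs = list(neighbors)
        # E_i: number of edges that actually exist among the neighbors of node i
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if nbrs[b] in adj[nbrs[a]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def hamming(x, y):
    """Hamming distance between two equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))


def expected_distortion(p, q):
    """E[d(X, X')] = sum_x sum_x' p(x) * q(x'|x) * d(x, x').

    `p[x]` is the probability of original record x and `q[x][x2]` is the
    probability that the mechanism outputs x2 when given x.
    """
    return sum(p[x] * q[x][x2] * hamming(x, x2) for x in p for x2 in q[x])


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print("average clustering:", average_clustering(graph))

    # Toy mechanism: a single edge indicator flipped with probability 0.25.
    p = {(0,): 0.5, (1,): 0.5}
    q = {(0,): {(0,): 0.75, (1,): 0.25}, (1,): {(1,): 0.75, (0,): 0.25}}
    print("expected distortion:", expected_distortion(p, q))
```

Computing both values on a network before and after perturbation gives the kind of comparison reported in the figures: a larger drop in the clustering coefficient indicates stronger structural disturbance, while a larger expected distortion indicates lower data availability.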
The expected distortion degree could reflect the average degree of distortion between the original data set and the data set after differential privacy processing. This index measured the loss of effective information in the process of differential privacy processing of social networks. Once the loss of effective information was too large, the social networks would no longer have value for information mining. Under different privacy budgets, the expected distortion degrees of the four social networks processed by the two differential privacy algorithms are shown in Figures 4 and 5.

Figure 4: Expected distortion degrees of two LFR networks obtained by two algorithms under different privacy budgets

Figure 5: Expected distortion degrees of two Weibo networks obtained by two algorithms under different privacy budgets

It was seen from Figures 4 and 5 that, for every network, the expected distortion degree after differential privacy processing increased as the privacy budget decreased; under the same privacy budget, the expected distortion degree of Algorithm 2.3 was smaller for every network; moreover, the expansion of social network scale also increased the expected distortion degree of the networks after processing by the algorithms.

The purpose of applying a differential privacy algorithm to a social network is to add noise to the private information, so as to achieve the effect of privacy protection. Therefore, in addition to the encryption effect, the execution efficiency of the encryption is also an important performance index. The average time of the two algorithms in the differential privacy processing of the four networks is shown in Figure 6. It was seen from Figure 6 that the expansion of social network scale and the existence of weights significantly increased the time required for differential privacy processing; under the same social network, the average time required by Algorithm 2.3 was significantly less than that of Algorithm 2.2, i.e. the single source shortest path based differential privacy algorithm was more efficient for differential privacy processing of social networks. The HRG based differential privacy algorithm needed to generate neighbor trees constantly while constructing the best-matched HRG and sampled only after converging to stability; in this process, it takes some time to converge to stability and to sample. The single source shortest path constraint model was completed in one pass without repeated generation and convergence, so it took less time.

4 Conclusion

This paper briefly introduces the differential privacy algorithm based on HRG and the differential privacy algorithm based on a single-source shortest path. Then, two artificial networks without weights generated by the LFR tool and two real networks with weights crawled by crawler software were used to test the performance of these two algorithms. The results are as follows.

1) After the two differential privacy algorithms process the community network, the average clustering coefficient is reduced; the lower the privacy budget, the
[5] A. Friedman, S. Berkovsky, M. A. Kaafar, “A dif- ferential privacy framework for matrix factorization recommender systems,” User Modeling and User- Adapted Interaction, vol. 26, no. 5, 2016. [6] K. Kalantari, L. Sankar, A. D. Sarwate, “Robust privacy-utility tradeoffs under differential privacy and hamming distortion,” IEEE Transactions on In- formation Forensics & Security, vol. 13, no. 11, pp. 1-1, 2018. [7] H. Li, L. Xiong, Z. Ji, X. Jiang, “Partitioning-based mechanisms under personalized differential privacy,” pp. 615-627, 2017. [8] R. Rogers, A. Roth, A. Smith, O. Thakkar, “Max- Figure 6: The average time of two algorithms for differ- information, differential privacy, and post-selection ential privacy processing hypothesis testing,” 2016. [9] T. Steinke, J. Ullman, “Between pure and approx- imate differential privacy,” Computer Science, vol. greater the reduction; under the same privacy bud- 8096, no. 2, pp. 363-378, 2015. get, the single-source shortest path algorithm can re- [10] X. C. Wang, Y. D. Li, “Geo-social network publica- duce more. tion based on differential privacy,” Frontiers of Com- puter Science, vol. 12, no. 6, 2018. 2) After the community network is processed by the dif- [11] X. Wu, Y. Wei, Y. Mao, L. Wang, “A differential ferential privacy algorithm, the smaller the privacy privacy DNA motif finding method based on closed budget of the algorithm, the greater the expected frequent patterns,” Cluster Computing, vol. 21, pp. distortion of the network; under the same privacy 1-13, 2018. budget, the expected distortion of the network pro- [12] B. Yang, I. Sato, H. Nakagawa, “Bayesian differential cessed by the algorithm based on the single-source privacy on correlated data,” in ACM Sigmod Inter- shortest path is smaller. national Conference on Management of Data, 2015. 3) As the scale of social networks increases, the time [13] D. Zhang, D. 
Kifer, “LightDP: Towards automating required for the two algorithms to process social net- differential privacy proofs,” ACM Sigplan Notices, works also increases, and the algorithm based on the vol. 52, no. 1, pp. 888-901, 2016. single-source shortest path requires less time to pro- [14] G. Q. Zhou, S. Qin, H. F. Zhou, “A differential pri- cess the same social network. vacy noise dynamic allocation algorithm for big mul- timedia data,” Multimedia Tools & Applications, vol. 78, no. C, pp. 1-19, 2018. References [15] T. Zhu, P. Xiong, G. Li, W. Zhou, “Correlated differ- ential privacy: Hiding information in non-IID data [1] A. Bhardwaj, V. Avasthi, S. Goundar, “Impact of set,” IEEE Transactions on Information Forensics Social Networking on Indian Youth: A Survey,” In- and Security, vol. 10, no. 2, pp. 229-242, 2015. ternational Journal of Electronics and Information Engineering, vol. 7, no. 1, pp. 41-51, 2017. [2] L. Chen, T. Yu, R. Chirkova, “Wave cluster with Biography differential privacy,” Computer Science, vol. 11, no. 2, pp. 191–198, 2015. Jian Liu, born in 1975, has received the doctor’s degree. [3] L. Chen, P. Zhu, “Preserving network privacy with He is a lecturer in Chengdu Technological University. He a hierarchical structure approach,” in 12th Interna- is interested in Internet of vehicles, big data analysis and tional Conference on Fuzzy Systems and Knowledge risk management. Discovery (FSKD’15), 2015. Feilong Qin, born in 1983, has received the doctor’s de- [4] G. Eibl, D. Engel, “Differential privacy for real smart gree. He is a lecturer in Chengdu Technological Univer- metering data,” Computer Science Research & De- sity. He is interested in mathematical geology and applied velopment, vol. 32, no. 1-2, pp. 173-182, 2016. statistics. International Journal of Network Security, Vol.22, No.5, PP.845-856, Sept. 2020 (DOI: 10.6633/IJNS.202009 22(5).15) 845
Verifiable Attribute-based Keyword Search Encryption with Attribute Revocation for Electronic Health Record System
Zhenhua Liu1, Yan Liu1, Jing Xu1, and Baocang Wang2 (Corresponding author: Yan Liu)
School of Mathematics and Statistics, Xidian University1, Xi’an 710071, P. R. China
State Key Laboratory of Integrated Services Networks, Xidian University2
(Email: ly10 [email protected])
(Received Apr. 7, 2019; Revised and Accepted Dec. 1, 2019; First Online Jan. 29, 2020)
Abstract

Considering the security requirements of electronic health record (EHR) systems, we propose a ciphertext-policy attribute-based encryption scheme that supports data retrieval, result verification and attribute revocation. In the proposed scheme, we make use of the BLS signature technique to achieve result verification for attribute-based keyword search encryption. In addition, a key encrypting key (KEK) tree and re-encryption are utilized to achieve efficient attribute revocation. Through a thorough security analysis, the proposed scheme is proven to achieve: 1) indistinguishability against selective ciphertext-policy and chosen plaintext attack under the decisional q-parallel bilinear Diffie-Hellman exponent hardness assumption; 2) indistinguishability against chosen-keyword attack under the bilinear Diffie-Hellman assumption in the random oracle model. Moreover, the performance analysis results demonstrate that the proposed scheme is efficient and practical for electronic health record systems.

Keywords: Attribute-Based Encryption; Attribute Revocation; Electronic Health Records; Keyword Search; Verifiability

1 Introduction

An electronic health record (EHR) system provides a health record storage service that allows patients to store, manage and share their EHR data with intended clients [13]. With the development of electronic health record systems, much sensitive information from patients is being uploaded into the cloud. Since the cloud server may be dishonest, it is of vital importance to protect the confidentiality of the sensitive EHR data. Furthermore, how to securely share and search EHR data without revealing the information of patients remains an open problem.

Traditional public key encryption can only support the “one-to-one” model, which is not suitable for multi-client data sharing in EHR scenarios. Fortunately, Sahai and Waters [16] first proposed the concept of attribute-based encryption (ABE) in 2005, which can provide a “one-to-many” service and is considered one of the most appropriate encryption technologies for cloud storage. Attribute-based encryption has two variants: ciphertext-policy ABE (CP-ABE), where the ciphertext is associated with an access policy, and key-policy ABE (KP-ABE), where a client’s secret key is associated with an access policy. Furthermore, Narayan et al. [12] proposed a privacy preserving EHR system using attribute-based encryption technology in 2010, which enables patients to share their data among health care providers in a flexible, dynamic and scalable manner. Li et al. [9] designed a new ABE scheme for personal health record systems using multi-authority ABE, which avoids the key escrow problem. Reedy et al. [14] proposed a secure framework for ensuring the integrity of EHRs, and solved the key escrow issue by using a two-authority key generation scheme. Since then, some attribute-based encryption schemes [5, 8] for EHR systems have been presented.

Although attribute-based encryption can achieve fine-grained data sharing, there are many problems to be considered in practical applications. For example, when a client leaves the system or discloses the secret key, it is essential to revoke the client’s attributes or secret key. In order to solve the problem, a lot of revocable ABE (RABE) schemes [3, 17, 23] have been put forward. Yu et al. [25] presented a revocable CP-ABE scheme by using proxy re-encryption, which allows an untrusted server to update a ciphertext into a new ciphertext without decryption. Hur et al. [7] proposed an attribute-based access control scheme with efficient revocation in data outsourcing systems using a key encrypting key (KEK) tree. By using the Chinese remainder theorem, Zhao et al. [26] introduced an efficient and revocable CP-ABE scheme in cloud computing. However, there are few RABE schemes for electronic health record systems.

In addition, attribute-based encryption can protect data confidentiality, but it hinders data retrieval from encrypted data in cloud storage. To address this issue, searchable encryption (SE) was proposed. SE has two types: symmetric searchable encryption (SSE) and asymmetric searchable encryption (ASE). Song et al. [18] first proposed the concept of symmetric searchable encryption. Boneh et al. [1] introduced the first public-key encryption with keyword search (PEKS) scheme, and formalized a well-defined security notion of semantic security under chosen-keyword attack. After that, a lot of searchable encryption schemes [6, 15, 21] have been proposed. Furthermore, searchable encryption has been widely used in electronic health record systems. For example, Xhafa et al. [24] presented an efficient multi-user fuzzy keyword search scheme over encrypted EHR data. Florence et al. [4] proposed an enhanced secure sharing scheme for personal health record systems with keyword search in the cloud.

Searchable encryption allows a client to search over the encrypted data in cloud storage and retrieve the data of interest without decryption. Nevertheless, the semi-trusted cloud server may perform the search operation on the encrypted data and return only a fraction of the results. In order to resist the cloud server’s dishonest behavior, the verification technique [20] was introduced. Zheng et al. [27] proposed a verifiable attribute-based keyword search scheme using Bloom filter and digital signature techniques, which has good search efficiency but needs huge computational overhead in the verification process. Sun et al. [19] introduced a verifiable attribute-based keyword search with fine-grained owner-enforced search authorization in the cloud, but the verification efficiency is low. Furthermore, Miao et al. [10] proposed a verifiable multi-keyword search over the encrypted cloud data for dynamic data owners.

Unfortunately, the above existing schemes cannot simultaneously achieve fine-grained access control with attribute revocation, data retrieval and result verification for EHR systems.

1.1 Our Contributions

Based on Waters’ scheme [22], we propose a verifiable attribute-based keyword search encryption scheme with attribute revocation (VABKS-AR) for electronic health record systems. Our contributions are described as follows:

1) We achieve efficient attribute-level revocation by using a KEK tree and re-encryption. A KEK tree is utilized to distribute the attribute group key, and re-encryption assures that the updated ciphertext cannot be decrypted by the revoked clients.

2) Since the cloud service provider is semi-trusted, the result verification mechanism [10] is used to achieve verifiability for attribute-based keyword search encryption, which can reduce the computational overhead of the client.

3) We provide a thorough analysis of the security and performance of the proposed secure EHR sharing system. The performance analysis results show that the proposed scheme is efficient and practical for electronic health record systems.

1.2 Organization

The rest of this paper is organized as follows. We describe some preliminaries and the system architecture in Section 2. A formal definition and security model are given in Section 3. The proposed VABKS-AR scheme is presented in Section 4. The security proof and performance analysis are given in Section 5. Finally, we draw conclusions in Section 6.

2 Preliminaries and System Architecture

2.1 Bilinear Map

Let $G$ and $G_T$ be groups of prime order $p$, and $g$ be a generator of $G$. The map $\hat{e}: G \times G \to G_T$ is said to be an admissible map if it satisfies the following properties [22]:

1) Bilinearity: $\hat{e}(g^a, g^b) = \hat{e}(g, g)^{ab}$ for all $a, b \in \mathbb{Z}_p$.

2) Non-degeneracy: $\hat{e}(g, g) \neq 1$.

3) Computability: There is an efficient polynomial-time algorithm to compute $\hat{e}(g, g)$.

2.2 Access Structure

Let $\{P_1, P_2, \cdots, P_n\}$ be a set of parties. A collection $\mathbb{A} \subseteq 2^{\{P_1, P_2, \cdots, P_n\}}$ is monotone if, for all $B, C$: if $B \in \mathbb{A}$ and $B \subseteq C$, then $C \in \mathbb{A}$. An access structure [22] (respectively, monotone access structure) is a collection (respectively, monotone collection) $\mathbb{A}$ of non-empty subsets of $\{P_1, P_2, \cdots, P_n\}$, i.e. $\mathbb{A} \subseteq 2^{\{P_1, P_2, \cdots, P_n\}} \setminus \{\emptyset\}$. The sets in $\mathbb{A}$ are called the authorized sets, and the sets not in $\mathbb{A}$ are called the unauthorized sets.

2.3 Linear Secret Sharing Scheme (LSSS)

A linear secret-sharing scheme [22] $\Pi$ over a set of parties $P$ is described as follows:

1) The shares of each party form a vector over $\mathbb{Z}_p$.

2) There exists a share-generating matrix $M$ for $\Pi$, where $M$ has $\ell$ rows and $n$ columns. For all $i = 1, 2, \cdots, \ell$, the function $\rho$ labels the $i$-th row of $M$ as $\rho(i)$. Consider the vector $\vec{v} = (s, r_2, \cdots, r_n)$, where $s \in \mathbb{Z}_p$ is a secret to be shared, and $r_2, \cdots, r_n \in \mathbb{Z}_p$
are chosen at random. $\mu_i = M_i \cdot \vec{v}$ is one of the $\ell$ shares of the secret $s$ according to $\Pi$, where $M_i \in \mathbb{Z}_p^n$ is the $i$-th row of the matrix $M$. The share $M_i \cdot \vec{v}$ belongs to party $\rho(i)$.

Linear reconstruction property [22]: Suppose that $\Pi$ is an LSSS for the access structure $\mathbb{A}$. Let $S \in \mathbb{A}$ be any authorized set, and $I = \{i : \rho(i) \in S\} \subseteq \{1, 2, \cdots, \ell\}$. Then, there exist constants $\{\omega_i \in \mathbb{Z}_p\}_{i \in I}$ such that, if $\{\mu_i\}$ are valid shares of any secret $s$ according to $\Pi$, we have $\sum_{i \in I} \omega_i \mu_i = s$. These constants $\omega_i$ can be found in time polynomial in the size of the share-generating matrix $M$.

2.4 Bilinear Diffie-Hellman (BDH) Assumption

Let $G$ and $G_T$ be multiplicative cyclic groups of prime order $p$, and $g$ be a generator of $G$. Given a tuple $\vec{y} = (g, g^a, g^b, g^c)$, where $a, b, c$ are selected from $\mathbb{Z}_p$ randomly, the Bilinear Diffie-Hellman (BDH) problem [1] is to compute $\hat{e}(g, g)^{abc} \in G_T$. An algorithm $\mathcal{B}$ has at least advantage $\varepsilon$ in solving the BDH problem if

$$\Pr[\hat{e}(g, g)^{abc} \leftarrow \mathcal{B}(\vec{y})] \geq \varepsilon.$$

BDH Assumption: We say the BDH assumption [1] holds if no probabilistic polynomial time algorithm can solve the BDH problem with a non-negligible probability $\varepsilon$.

2.5 Decisional Parallel Bilinear Diffie-Hellman Exponent Assumption

Let $G$ be a group of order $p$, and $g$ be a generator of $G$. Given

$$\vec{y} = \Big( g, g^s, g^a, \cdots, g^{a^q}, g^{a^{q+2}}, \cdots, g^{a^{2q}},$$
$$\forall_{1 \le j \le q}: \ g^{s \cdot b_j}, g^{a/b_j}, \cdots, g^{a^q/b_j}, g^{a^{q+2}/b_j}, \cdots, g^{a^{2q}/b_j},$$
$$\forall_{1 \le k, j \le q,\, k \ne j}: \ g^{a \cdot s \cdot b_k/b_j}, \cdots, g^{a^q \cdot s \cdot b_k/b_j} \Big),$$

where $a, s, b_1, \cdots, b_q \in \mathbb{Z}_p$ are chosen randomly, the decisional q-parallel bilinear Diffie-Hellman exponent (BDHE) problem [22] is to distinguish a valid tuple $\hat{e}(g, g)^{a^{q+1} \cdot s} \in G_T$ from a random element $R \in G_T$. An algorithm $\mathcal{B}$ has advantage $\varepsilon$ in solving the q-parallel BDHE problem if

$$\left| \Pr[\mathcal{B}(\vec{y}, \hat{e}(g, g)^{a^{q+1} s}) = 0] - \Pr[\mathcal{B}(\vec{y}, R) = 0] \right| \geq \varepsilon.$$

Decisional q-Parallel BDHE Assumption: We say the decisional q-parallel BDHE assumption [22] holds if no probabilistic polynomial time algorithm can solve the decisional q-parallel BDHE problem with a non-negligible probability $\varepsilon$.

2.6 KEK Tree

Let $U = \{u_1, u_2, \cdots, u_n\}$ be the universe of clients and $L$ be the universe of descriptive attributes in the system. Let $G_j \subset U$ be a set of clients that hold the attribute $\lambda_j$ $(j = 1, 2, \cdots, q)$, which is referred to as an attribute group. $G_j$ will be used as a client access list to $\lambda_j$. Let $\mathcal{G} = \{G_1, G_2, \cdots, G_q\}$ be the universe of attribute groups and $GK_{\lambda_j}$ be the attribute group key that is shared among the non-revoked clients in $G_j \in \mathcal{G}$.

In a KEK tree [7], each node $v_j$ holds a $KEK_j$. The set of KEKs on the path from a leaf to the root is called the path keys. A KEK tree is constructed by the data service manager as follows:

1) Each client $u_{id}$ $(id = 1, 2, \cdots, n)$ in the universe $U$ is assigned to a leaf node of the tree. Random keys are generated and assigned to all leaf nodes and internal nodes.

2) Each client $u_{id} \in U$ securely obtains the path keys $PAK_{id}$ from its leaf node to the root node of the tree. For example, the client $u_4$ has the path keys $PAK_4 = \{KEK_{11}, KEK_5, KEK_2, KEK_1\}$ in Figure 1.

3) The minimum cover set [11] $node(G_j)$ is a minimum set of nodes in the tree that covers all of the leaf nodes associated with clients in $G_j$. $KEK(G_j)$ is the set of KEK values owned by $node(G_j)$. Taking the intersection of $PAK_{id}$ and $KEK(G_j)$, we have $KEK = KEK(G_j) \cap PAK_{id}$.

Let us give an example to illustrate the attribute groups $G_j$. Suppose $\{u_1, u_2, u_3\}$ are associated with $\{\lambda_1, \lambda_2\}$, $\{\lambda_1, \lambda_2, \lambda_3\}$, $\{\lambda_2, \lambda_3\}$, respectively. We have the attribute groups $G_1 = \{u_1, u_2\}$, $G_2 = \{u_1, u_2, u_3\}$, $G_3 = \{u_2, u_3\}$.

Consider the example in Figure 1. If the attribute group for attribute $\lambda_j$ is $G_j = \{u_2, u_3, u_5, u_6, u_7, u_8\}$ and $u_6$ is associated with leaf node $v_{13}$, we compute the minimum cover set $node(G_j) = \{v_9, v_{10}, v_3\}$ and get $KEK(G_j) = \{KEK_9, KEK_{10}, KEK_3\}$, which will be used to encrypt the attribute group key $GK_{\lambda_j}$ in the data re-encryption phase. Since $u_6$ stores the path keys $PAK_6 = \{KEK_{13}, KEK_6, KEK_3, KEK_1\}$, we have $KEK = KEK(G_j) \cap PAK_6 = \{KEK_3\}$, and $u_6$ can then decrypt the header message to get the attribute group key $GK_{\lambda_j}$ using $KEK_3$.

2.7 System Architecture

As shown in Figure 2, a verifiable attribute-based keyword search encryption scheme with attribute revocation (VABKS-AR) system consists of five entities: Trusted Authority (TA), Cloud Service Provider (CSP), Data Owner/Patient, Client/Doctor, and Third Party Audit (TPA).

Trusted Authority (TA): TA generates the public parameter, the master secret key and the clients’
secret key according to their attributes. TA is fully trusted in the system.

Cloud Service Provider (CSP): CSP consists of a data server and a data service manager. The data server has huge storage space and strong computational power. The data service manager is in charge of managing the attribute group keys of each attribute group and providing the corresponding services. We assume the data service manager is honest-but-curious, i.e., it will honestly perform the operations but try to acquire as much information as possible about the sensitive data.

Data Owner/Patient: A patient is viewed as the data owner, who encrypts the sensitive data (i.e. electronic health record data) and the keywords, and then uploads them to CSP in the form of ciphertext. Meanwhile, the data owner enforces the access policy for the encrypted data, so that the ciphertext is shared only with clients whose attributes satisfy the access structure embedded in the ciphertext.

Client/Doctor: A doctor is viewed as a client, who submits a search query to retrieve the encrypted EHR stored on the cloud server. Upon receiving the query, the cloud server searches for the intended ciphertext by using the trapdoor. If a client is not revoked and his or her attribute set satisfies the access policy, the client can decrypt the ciphertext.

Third Party Audit (TPA): TPA verifies the search results by sending a challenge to CSP. Upon receiving the challenge, CSP returns a proof to TPA. Finally, TPA calculates a value to verify the integrity of the returned search results.

Figure 1: KEK tree (binary tree with root $v_1$, internal nodes $v_2$ to $v_7$, and leaf nodes $v_8$ to $v_{15}$ assigned to clients $u_1$ to $u_8$)

Figure 2: System architecture of VABKS-AR (entities: Trusted Authority, Data Owner/Patient, Cloud Service Provider, Client/Doctor, Third Party Audit; labeled flows: PP, SK, encrypted data, encrypted index, trapdoor, result, challenge, proof)

3 Formal Definition and Security Model

3.1 Formal Definition

In this section, the formal definition of a verifiable attribute-based keyword search encryption scheme with attribute revocation is described as the following nine algorithms:

$\mathrm{Setup}(1^\lambda) \to (PP, MSK)$: TA takes a security parameter $\lambda$ as input, and outputs the public parameter $PP$ and a master secret key $MSK$.

$\mathrm{KeyGen}(MSK, id, S) \to SK$: TA runs this algorithm, which takes the master secret key $MSK$, the identifier $id$ of a legal client $u_{id}$ and an attribute set $S \subseteq L$ as input. This algorithm outputs a secret key $SK$ to the client $u_{id}$.

$\mathrm{Encrypt}(PP, (M, \rho), K, W) \to (CT, I_W)$: The data owner runs this algorithm, which takes the public parameter $PP$, an access policy $(M, \rho)$, the symmetric key $K$, and a set of keywords $W$ as input. Using key encapsulation technology, this algorithm outputs a ciphertext $CT$ and an encrypted index set $I_W$.

$\mathrm{Re\text{-}encrypt}(CT, G, RL) \to (CT', \hat{C})$: This algorithm is performed by the data service manager. It takes the ciphertext $CT$, a set of attribute groups $G \subseteq \mathcal{G}$ and a revocation list $RL$ as input, and outputs a re-encrypted ciphertext $CT'$ and a header message $\hat{C}$.

$\mathrm{Trapdoor}(SK, w) \to tk$: The client runs this algorithm, which takes a secret key $SK$ and a keyword $w$ as input. This algorithm outputs a search token $tk$ to CSP.

$\mathrm{Search}(PP, tk, I_W) \to (C', ID')$: CSP runs this algorithm, which takes the public parameter $PP$, a search token $tk$ and an encrypted index set $I_W$ as input. This algorithm outputs the intended encrypted file set $C'$ and the corresponding identifier set $ID'$ to TPA if the search token $tk$ matches the index set $I_W$; otherwise, it outputs $\perp$.

$\mathrm{Verify}(PK, C', ID') \to \{0, 1\}$: TPA runs this algorithm, which takes the data owner’s $PK$, the returned encrypted file set $C'$ and the corresponding identifier set $ID'$ as input. This algorithm outputs 1 if the result verification passes; otherwise, it outputs 0.

$\mathrm{Decrypt}(CT', SK) \to K$: The client runs this algorithm, which takes the ciphertext $CT'$ and a secret key $SK$ as input. This algorithm outputs the symmetric key $K$.
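The header message $\hat{C}$ produced by Re-encrypt relies on the KEK tree of Section 2.6. As a minimal sketch only, the following Python code reproduces the path-key and minimum-cover-set example given there for Figure 1. It assumes, beyond what the text states, that the tree is a complete binary tree stored in heap order (node $v_k$ has children $v_{2k}$ and $v_{2k+1}$) and that the number of clients is a power of two. The function names `leaf_of`, `path_keys` and `min_cover` are hypothetical and introduced here for illustration; nodes are represented by their indices rather than by actual key material.

```python
from typing import List, Set


def leaf_of(client: int, n_clients: int) -> int:
    """Heap index of the leaf assigned to client u_client (1-based).

    Assumption: leaves are v_n .. v_{2n-1}, so u_1 -> v_8 when n = 8.
    """
    return n_clients + client - 1


def path_keys(client: int, n_clients: int) -> List[int]:
    """Indices of the nodes on the path from the client's leaf up to the root v_1."""
    node = leaf_of(client, n_clients)
    path = []
    while node >= 1:
        path.append(node)
        node //= 2  # parent in heap order
    return path


def min_cover(group: Set[int], n_clients: int) -> Set[int]:
    """Minimum set of nodes whose subtrees cover exactly the leaves of `group`."""
    member_leaves = {leaf_of(c, n_clients) for c in group}

    def cover(node: int) -> Set[int]:
        if node >= n_clients:  # leaf node
            return {node} if node in member_leaves else set()
        left, right = cover(2 * node), cover(2 * node + 1)
        # If both child subtrees are fully covered, the parent alone covers them.
        if left == {2 * node} and right == {2 * node + 1}:
            return {node}
        return left | right

    return cover(1)


# Example from Section 2.6: G_j = {u2, u3, u5, u6, u7, u8} with 8 clients.
group_j = {2, 3, 5, 6, 7, 8}
cover_nodes = min_cover(group_j, 8)
usable = cover_nodes & set(path_keys(6, 8))  # KEK(G_j) ∩ PAK_6
```

With these assumptions, `cover_nodes` corresponds to $node(G_j) = \{v_9, v_{10}, v_3\}$ and `usable` to $\{KEK_3\}$, matching the example in the text; a revoked client such as $u_4$ shares no node with the cover set and therefore cannot recover the group key.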
$\mathrm{CTUpdate}(CT', RL') \to (CT'', \hat{C}')$: The data service manager runs this algorithm, which takes the re-encrypted ciphertext $CT'$ and a new revocation list $RL'$ as input. This algorithm outputs an updated ciphertext $CT''$ and a new header message $\hat{C}'$.

3.2 Security Model

In this section, we give two security models: the indistinguishability against selective ciphertext-policy and chosen plaintext attack (IND-sCP-CPA) game and the indistinguishability against chosen keyword attack (IND-CKA) game. The security of our scheme is based on these two games.

Firstly, according to Waters’ scheme [22], we describe the IND-sCP-CPA game as follows:

Init. The adversary $\mathcal{A}$ gives the challenge access policy $(M^*, \rho^*)$ and a revocation list $RL^*$, where $M^*$ has $n^* \le q$ columns.

Setup. The challenger $\mathcal{B}$ runs the Setup algorithm, sends the public parameter $PP$ to $\mathcal{A}$, and keeps the master secret key $MSK$ for himself.

Phase 1. The adversary $\mathcal{A}$ issues polynomially many secret key queries for $(id, S)$. The challenger $\mathcal{B}$ sends $SK$ to the adversary $\mathcal{A}$, with the restriction that:

1) if $u_{id} \notin RL^*$, then $S' = S$, and the attribute set $S'$ does not satisfy the challenge access policy $(M^*, \rho^*)$;

2) if $u_{id} \in RL^*$, then $S' = S \setminus \{\lambda_{j^*}\}$, and the attribute set $S'$ does not satisfy the challenge access policy $(M^*, \rho^*)$.

Challenge. The adversary $\mathcal{A}$ submits two equal-length messages $k_0$ and $k_1$ to the challenger $\mathcal{B}$. Then $\mathcal{B}$ randomly selects one bit $b \in \{0, 1\}$ and encrypts $k_b$ under $(M^*, \rho^*)$ and the revocation list $RL^*$. Finally, $\mathcal{B}$ sends the challenge ciphertext $CT^*$ to $\mathcal{A}$.

Phase 2. Same as Phase 1.

Guess. The adversary $\mathcal{A}$ outputs its guess $b'$ of $b$ and wins the game if $b' = b$.

The advantage of the adversary $\mathcal{A}$ is defined as follows:

$$\mathrm{Adv}_{\mathcal{A}}^{\text{IND-sCP-CPA}} = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$

Definition 1. A verifiable attribute-based keyword search encryption scheme with attribute revocation is IND-sCP-CPA secure if all polynomial time adversaries have at most a negligible advantage in the above game.

Secondly, according to Boneh’s scheme [1], we define the IND-CKA game as follows:

Setup. The challenger $\mathcal{B}$ runs the Setup algorithm, sends the public parameter $PP$ to $\mathcal{A}$, and keeps the master secret key $MSK$ for himself.

Phase 1. The adversary $\mathcal{A}$ can adaptively query the challenger $\mathcal{B}$ for the trapdoor $T_w$ of any keyword $w \in \{0, 1\}^*$ in polynomial time.

Challenge. The adversary $\mathcal{A}$ sends two equal-length keywords $w_0$ and $w_1$ to the challenger $\mathcal{B}$. The only restriction is that $w_0$ and $w_1$ have not been queried for the trapdoor. The challenger $\mathcal{B}$ randomly selects one bit $b \in \{0, 1\}$, generates the index $I_{w_b}$ for keyword $w_b$, and submits the challenge index $I_{w_b}$ to the adversary $\mathcal{A}$.

Phase 2. The adversary $\mathcal{A}$ can issue more trapdoor queries for keywords $w$, with the restriction $w \neq w_0, w_1$.

Guess. The adversary $\mathcal{A}$ outputs its guess $b'$ of $b$ and wins the game if $b' = b$.

The advantage of the adversary $\mathcal{A}$ is defined as follows:

$$\mathrm{Adv}_{\mathcal{A}}^{\text{IND-CKA}} = \left| \Pr[b' = b] - \frac{1}{2} \right|.$$

Definition 2. A verifiable attribute-based keyword search encryption scheme with attribute revocation is IND-CKA secure if all polynomial time adversaries have at most a negligible advantage in the above game.

4 Concrete Construction

The concrete construction is described as follows:

$\mathrm{Setup}(1^\lambda)$: This algorithm selects a bilinear map $\hat{e}: G \times G \to G_T$ such that $G$ and $G_T$ are cyclic groups of order $p$, a $\lambda$-bit prime. Let $E(\cdot)$ be a probabilistic symmetric encryption algorithm. We define three hash functions $H: \mathbb{Z}_p \to G$, $H_1: \{0, 1\}^* \to G$, $H_2: G_T \to \{0, 1\}^{\log_2 p}$. Let $CL$ be a client list and $RL$ be a revocation list, where $CL$ and $RL$ are initially empty. TA runs this algorithm as follows:

1) Pick random $a, \alpha \in \mathbb{Z}_p$ and compute

$$h = g^a, \quad Y = \hat{e}(g, g)^\alpha.$$

2) Publish the public parameter

$$PP = \{\hat{e}, h, Y, H, H_1, H_2, E(\cdot), CL, RL\}$$

and keep the master secret key $MSK = \alpha$ for itself.

$\mathrm{KeyGen}(MSK, id, S)$: A client sends its identifier $id$ and a set of attributes $S \subseteq L$ to TA. TA runs this algorithm as follows:

1) Select $t \in \mathbb{Z}_p$ randomly and compute a secret key

$$SK = \{K = g^\alpha g^{at},\ L = g^t,\ \{K_j = H(\lambda_j)^t\}_{\lambda_j \in S}\}$$

for the client $u_{id} \in U$.
2) Add $(id, g^{at})$ to the client list $CL$ and send $SK$ to the client.

$\mathrm{Encrypt}(PP, (M, \rho), K, F, W)$: The data owner inputs the public parameter $PP$, an access policy $(M, \rho)$, a symmetric key $K$, a set of data files $F$ and a set of keywords $W$. This algorithm is run by the data owner as follows:

1) Encrypt the data files $F = (f_1, f_2, \cdots, f_d)$ as $c_k = E_K(f_k)$ $(1 \le k \le d)$ with the symmetric key $K$.

2) Select random $s, y_2, \cdots, y_n \in \mathbb{Z}_p$ and set a column vector $\vec{v} = (s, y_2, \cdots, y_n)$. For $1 \le i \le \ell$, compute $\mu_i = M_i \cdot \vec{v}$, where $M_i$ is the $i$-th row of $M$. Choose random numbers $r_1, \cdots, r_\ell \in \mathbb{Z}_p$ and calculate

$$CT = \{C_{0,0} = K \cdot \hat{e}(g, g)^{\alpha s},\ C_{0,1} = g^s,\ C_{0,2} = h^s,\ \forall i = 1, \cdots, \ell: C_i = g^{a \mu_i} H(\rho(i))^{-r_i},\ D_i = g^{r_i}\}.$$

3) Extract a set of keywords

$$W = (w_1, w_2, \cdots, w_m)$$

from the data files $F$. For each keyword $w_\delta$ $(1 \le \delta \le m)$, compute

$$\varphi_\delta = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}(g, H_1(w_\delta))^s$$

and

$$I_W = \{I_{w_\delta} = H_2(\varphi_\delta)\}_{\delta=1}^{m},$$

where $I_W$ is the encrypted index set for the keyword set $W$.

4) Select a random $x \in \mathbb{Z}_p$, and compute $PK = g^x$ as its public key. For each encrypted data file $c_k$ with identifier $k$, calculate a signature $\sigma_k = (H_1(k) g^{c_k})^x$ with the data owner’s secret key $x$.

After the construction of $CT$, the data owner sends $(CT, I_W, \{c_k, \sigma_k\}_{k=1}^{d})$ to CSP.

$\mathrm{Re\text{-}encrypt}(CT, G, RL)$: This algorithm inputs a ciphertext $CT$, a set of attribute groups $G \subseteq \mathcal{G}$ and a revocation list $RL$. The data service manager runs this algorithm as follows:

$\mathrm{Trapdoor}(SK, w)$: A client with identifier $id$ and attribute set $S$ inputs a secret key $SK$ and a keyword $w$. The algorithm runs as follows:

1) The client selects $u \in \mathbb{Z}_p$ randomly, computes $q_u = g^{1/u}$, and sends $(id, q_u)$ to TA. Then, TA retrieves $g^{at}$ according to $id$ in the client list $CL$, computes $q_{id} = g^{at} q_u^{\alpha}$, and sends $q_{id}$ to the client.

2) The client calculates a search token $tk = \{T_w = H_1(w) q_{id}^{u},\ L' = L^u,\ \{K'_j = K_j^u\}_{\lambda_j \in S}\}$ and sends $tk$ to the data service manager.

$\mathrm{Search}(PP, tk, I_W)$: This algorithm inputs the public parameter $PP$, a search token $tk$, and the encrypted index set $I_W$. CSP runs this algorithm as follows:

1) Compute

$$l_w = \frac{\hat{e}(C_{0,1}, T_w)}{\hat{e}(L', C_{0,2})} = \hat{e}(g, g)^{\alpha s} \cdot \hat{e}(g, H_1(w))^s.$$

2) If there exists some encrypted index $I_{w_\delta}$ such that $H_2(l_w) = I_{w_\delta}$, send $(CT', \hat{C})$, the relevant encrypted file set $C' = \{c_1, c_2, \cdots, c_\tau\}$ and the corresponding identifier set $ID' = \{1, 2, \cdots, \tau\}$ to TPA, where $\tau$ is the number of returned files; otherwise, return $\perp$.

$\mathrm{Verify}(PK, C', ID')$: This algorithm inputs the data owner’s $PK$, the returned encrypted file set $C'$ and the corresponding identifier set $ID'$. TPA runs this algorithm as the following steps:

1) TPA randomly selects $v_r \in \mathbb{Z}_p$, and sends a challenge $chal = \langle r, v_r \rangle$ $(r \in [1, \tau])$ to CSP.

2) Upon receiving the $chal$ from TPA, CSP computes $\zeta = \sum_{r \in [1, \tau]} v_r c_r$ and $\sigma = \prod_{r \in [1, \tau]} \sigma_r^{v_r}$, where $\sigma_r = (H_1(r) g^{c_r})^x$. Then CSP sends $(\zeta, \sigma)$ to TPA.

3) TPA verifies whether the following equation holds or not. If it holds, return 1 and send $(CT', \hat{C}, C', ID')$ to the client; otherwise, return 0.
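As a consistency check on the Trapdoor, Search and Verify steps above, the following derivation is a sketch that uses only the definitions of $q_u$, $q_{id}$, $T_w$, $L'$, $C_{0,1}$, $C_{0,2}$, $\sigma_r$, $\zeta$ and $PK$ given in this section together with bilinearity. The second identity is derived here and is only one pairing relation that the aggregated pair $(\zeta, \sigma)$ satisfies; it should not be read as the authors' exact statement of the verification equation.

```latex
% Search correctness: q_u^{\alpha} = g^{\alpha/u}, hence q_{id}^{u} = g^{atu} g^{\alpha}.
\begin{align*}
T_w &= H_1(w)\, q_{id}^{u} = H_1(w)\, g^{\alpha}\, g^{atu}, \\
\hat{e}(C_{0,1}, T_w) &= \hat{e}(g^{s}, H_1(w))\cdot \hat{e}(g, g)^{\alpha s}\cdot \hat{e}(g, g)^{atus}, \\
\hat{e}(L', C_{0,2}) &= \hat{e}(g^{tu}, g^{as}) = \hat{e}(g, g)^{atus}, \\
l_w &= \frac{\hat{e}(C_{0,1}, T_w)}{\hat{e}(L', C_{0,2})}
     = \hat{e}(g, g)^{\alpha s}\cdot \hat{e}(g, H_1(w))^{s}.
\end{align*}

% Aggregated signature: a relation implied by \sigma_r = (H_1(r) g^{c_r})^{x} and PK = g^{x}.
\begin{align*}
\sigma &= \prod_{r\in[1,\tau]} \sigma_r^{v_r}
        = \Big(\prod_{r\in[1,\tau]} H_1(r)^{v_r}\cdot g^{\sum_{r} v_r c_r}\Big)^{x}
        = \Big(\prod_{r\in[1,\tau]} H_1(r)^{v_r}\cdot g^{\zeta}\Big)^{x}, \\
\hat{e}(\sigma, g) &= \hat{e}\Big(\prod_{r\in[1,\tau]} H_1(r)^{v_r}\cdot g^{\zeta},\; PK\Big).
\end{align*}
```

The first chain shows that $l_w$ equals $\varphi_\delta$ exactly when $w = w_\delta$, so $H_2(l_w)$ matches the stored index $I_{w_\delta}$. The second shows why TPA can check the returned files using only $PK$, the identifiers and the challenge coefficients $v_r$, without access to the secret key $x$.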