Volume 3, Issue 4(2), April 2014 International Journal of Multidisciplinary Educational Research
Published by Sucharitha Publications, Visakhapatnam – 530 017, Andhra Pradesh, India. Email: [email protected]; website: www.ijmer.in
Editorial Board

Editor-in-Chief
Dr. Victor Babu Koppula, Faculty, Department of Philosophy, Andhra University – Visakhapatnam - 530 003, Andhra Pradesh – India

EDITORIAL BOARD MEMBERS

Prof. S. Mahendra Dev, Vice Chancellor, Indira Gandhi Institute of Development Research, Mumbai
Prof. Y. C. Simhadri, Director, Institute of Constitutional and Parliamentary Studies, New Delhi; formerly Vice Chancellor of Benaras Hindu University, Andhra University, Nagarjuna University and Patna University
Prof. (Dr.) Sohan Raj Tater, Former Vice Chancellor, Singhania University, Rajasthan
Prof. K. Sreerama Murty, Department of Economics, Andhra University, Visakhapatnam
Prof. K. R. Rajani, Department of Philosophy, Andhra University, Visakhapatnam
Prof. A. B. S. V. Rangarao, Department of Social Work, Andhra University, Visakhapatnam
Prof. S. Prasanna Sree, Department of English, Andhra University, Visakhapatnam
Prof. P. Sivunnaidu, Department of History, Andhra University, Visakhapatnam
Prof. P. D. Satya Paul, Department of Anthropology, Andhra University, Visakhapatnam
Dr. K. Chaitanya, Postdoctoral Research Fellow, Department of Chemistry, Nanjing University of Science and Technology, People's Republic of China
Prof. G. Veerraju, Department of Philosophy, Andhra University, Visakhapatnam
Prof. G. Subhakar, Department of Education, Andhra University, Visakhapatnam
Dr. B. S. N. Murthy, Department of Mechanical Engineering, GITAM University, Visakhapatnam
N. Suryanarayana (Dhanam), Department of Philosophy, Andhra University, Visakhapatnam
Dr. Ch. Prema Kumar, Department of Philosophy, Andhra University, Visakhapatnam
Dr. E. Ashok Kumar, Department of Education, North-Eastern Hill University, Shillong
Prof. Josef Höchtl, Department of Political Economy, University of Vienna, Vienna; Ex-Member of the Austrian Parliament, Austria
Prof. Alexander Chumakov, Chair of Philosophy Department, Russian Philosophical Society, Moscow, Russia
Prof. Fidel Gutierrez Vivanco, Founder and President, Escuela Virtual de Asesoría Filosófica, Lima, Peru
Prof. Igor Kondrashin, Member of the Russian Philosophical Society and the Russian Humanist Society; Expert of UNESCO, Moscow, Russia
Dr. Zoran Vujisić, Rector, St. Gregory Nazianzen Orthodox Institute, Universidad Rural de Guatemala, GT, U.S.A.
Swami Maheshwarananda, Founder and President, Shree Vishwa Deep Gurukul, Swami Maheshwarananda Ashram Education & Research Center, Rajasthan, India
Dr. Momin Mohamed Naser, Department of Geography, Institute of Arab Research and Studies, Cairo University, Egypt
I Ketut Donder, Depasar State Institute of Hindu Dharma, Indonesia
Prof. Roger Wiemers, Professor of Education, Lipscomb University, Nashville, USA
Dr. Merina Islam, Department of Philosophy, Cachar College, Assam
Dr. R. Dhanuja, PSG College of Arts & Science, Coimbatore
Dr. Bipasha Sinha, S. S. Jalan Girls' College, University of Calcutta, Calcutta
Dr. K. John Babu, Department of Journalism & Mass Communication, Central University of Kashmir, Kashmir
Dr. H. N. Vidya, Government Arts College, Hassan, Karnataka
Dr. Ton Quang Cuong, Dean of Faculty of Teacher Education, University of Education, VNU, Hanoi
Prof. Chanakya Kumar, University of Pune, Pune
© Editor-in-Chief, IJMER Typeset and Printed in India www.ijmer.in
IJMER, the International Journal of Multidisciplinary Educational Research, concentrates on critical and creative research in multidisciplinary traditions. This journal seeks to promote original research and cultivate a fruitful dialogue between old and new thought. Volume 3, Issue 4(2), April 2014
Contents

1. 3D Touchless Fingerprint Recognition with Identical Twin Fingerprints – Chetan G. Puri, Dipak S. Kapadane and S. M. Rokade (p. 1)
2. Applying QoS Based Data Replication Attributes in Clouds – A. J. Musmade and S. M. Rokade (p. 12)
3. Cloud Compiler for C, C# and Java – Sagorika Datta, Aradhana, Anjali Dewani and Priya Bankar (p. 26)
4. GSM Networked DTMF Based Smart Password Entry System – Charushila B. Bachhav, Poonam S. Bagul and Pradnya S. Sanap (p. 38)
5. Auditing Protocols: A New Approach for Security of Cloud Data – Sonali Pardeshi, Ankita Rathi, Shalini Shejwal and Pooja Kuyate (p. 46)
6. An Extraction Technique for Universal Distance Cache – Chotia Amit N., Joshi Abhishek A., Gosavi Darpan V. and Wagh Ganesh V. (p. 61)
7. Safety Management of Construction Workers – Vivek K. Kulkarni and R. V. Devalkar (p. 74)
8. Improving Startup Time and Providing Security to Snapshots on Linux Platform – Sheetal R. Tambe, Monika Shinde, Rohini Hire, Shanku Mandal and Rokade S. M. (p. 80)
9. Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System – Dhoot Suyog S., Naoghare M. M. and Shinde Girish R. (p. 92)
10. Cloud Based Mobile Service Delivery Using QoS Mechanism – Prachi B. Gaikwad and S. M. Rokade (p. 106)
11. Fault Diagnosis in Induction Motor – K. R. Gosavi and A. A. Bhole (p. 117)
12. A Review on Intrusion Detection System for Web Based Application – R. S. Jagale and M. M. Naoghare (p. 127)
13. Implementation of Enhanced Security on Vehicular Cloud Computing – Rajbhoj Supriya K. and Pankaj R. Chandre (p. 141)
14. Document Clustering for Forensic Analysis & Investigation – Dhokane R. M. and Rokade S. M. (p. 149)
15. Bluetooth File Transfer with Breakpoint – Priyanka V. Godse, Snehal P. Katore, Poonam A. Modani and Poonam B. Sonawane (p. 175)
16. Optimal Multiserver Configuration for Profit Maximization in Cloud Computing – Pravin Pokale, Vishal Agrawal and Rahul Wakchaure (p. 184)
17. Facilitating Effective User Navigation through Website Structure Improvement – Jyoti B. Kshirsagar and S. D. Jondhale (p. 196)
18. VeinSecured: A Detection and Prevention of DDoS Attacks – Sucheta Daware, Saurabh Chatterjee and Shrutika Jadhav (p. 204)
19. An Apotheosis Extraction Approach by Genetic Programming – Swati Shahi, Priyanka Tidke, Shital Vanis and Jayshree Sangale (p. 218)
20. Constriction Based Particle Filter to Denoise Video and Trace Multiple Moving Objects – A. R. Potdar and V. K. Shrivastava (p. 234)
21. Android Application on Latest Auditions Online Portal – Patil Kalpesh R., Sangale Swapnil V., Baviskar Sachin M. and Mahajan Vilas U. (p. 247)
22. A Survey on User Identity Verification via Keyboard and Mouse Dynamics – Ashwini Subhash Sonawane, Uzma Anis Shaikh, Vaishali Sitaram Kadu and Shedge Kishne N. (p. 254)
23. Intelligent Transportation System – Deepti Patil, Sheetal Sharma, Kiran Madihalli and Gunjan Deore (p. 261)
24. Green Cloud Energy Efficient: A New Methodology for Creating Autonomous Software Deployment Packages – Rohan Nagar, Sagar Karad and Rakesh Vaishnav (p. 267)
Editorial

Provoking fresh thinking has become the prime purpose of the International Journal of Multidisciplinary Educational Research (IJMER). The new world era we have entered, with its enormous contradictions, demands a unique understanding to face its challenges. IJMER's contents are overwhelmingly contributor-driven and distinctive, and they strike the right balance for readers with their varied knowledge.

We are happy to inform you that IJMER has received a high Impact Factor of 2.735 and an Index Copernicus Value of 5.16, and that it is listed and indexed in 34 popular indexing organizations around the world. This academic achievement of IJMER belongs entirely to the authors who contributed to past issues. I hope IJMER's journey will bring still greater benefit to the academic world.

In the present issue, we have taken up a range of multidisciplinary issues discussed in academic circles. There are well-written articles covering a wide range of topics that are thought-provoking as well as significant in the contemporary research world.

My thanks go to the Members of the Editorial Board and to the readers; in particular, I sincerely recognize the efforts of the contributors of articles. The journal receives its recognition from the rich contribution of assorted research papers presented by experienced scholars, and this implied commitment is generating the vision envisaged: spreading knowledge. I am happy to note that the readers are benefited. My personal thanks to one and all.
(Dr.Victor Babu Koppula)
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014

3D TOUCHLESS FINGERPRINT RECOGNITION WITH IDENTICAL TWIN FINGERPRINTS

Chetan G. Puri (1), Dipak S. Kapadane (2), Prof. Sharad M. Rokade (3)
(3) Associate Professor and Head, Computer Engg. Dept.
(1, 2, 3) Sir Visvesvaraya Institute of Technology, Nashik
[email protected], [email protected], [email protected]
Abstract

Fingerprint recognition with identical twins is a challenging task due to the closest genetics-based relationship existing in identical twins. Several pioneers have analyzed the similarity between twins' fingerprints. In this work we continue to investigate the similarity of identical twin fingerprints. (1) Two state-of-the-art fingerprint identification methods, P071 and VeriFinger 6.1, were used, rather than the single fingerprint identification method of previous studies. (2) A novel statistical analysis, which aims at showing the probability distribution of the fingerprint types for the corresponding fingers of identical twins which have the same fingerprint type, has been conducted. Our findings: (a) a state-of-the-art automatic fingerprint verification system can distinguish identical twins without drastic degradation in performance; (b) the chance that fingerprints from identical twins have the same type is 0.7440, compared to 0.3215 for non-identical twins; (c) for the corresponding fingers of identical twins which have the same fingerprint type, the probability distribution of the five major fingerprint types is similar to the probability distribution over all fingers' fingerprint types; (d) for each of the four fingers of identical twins, the probability of having the same fingerprint type is similar.

Fingerprints are traditionally captured based on contact of the finger on paper or a platen surface. This often results in partial or degraded images due to improper finger
placement, skin deformation, slippage and smearing, or sensor noise from wear and tear of surface coatings. A new generation of touchless live-scan devices that generate 3D representations of fingerprints is appearing on the market. This new sensing technology addresses many of the problems stated above. However, 3D touchless fingerprint images need to be compatible with the legacy rolled images used in Automated Fingerprint Identification Systems (AFIS). In order to solve this interoperability issue, we propose an unwrapping algorithm that unfolds the 3D fingerprint in such a way that it resembles the effect of virtually rolling the 3D finger on a 2D plane. Our preliminary experiments show promising results in obtaining touchless fingerprint images that are of high quality and at the same time compatible with legacy rolled fingerprint images.

Keywords: Fingerprint, dizygotic, monozygotic, minutiae points

Introduction

Biometrics refers to the automatic identification of a person based on his or her physiological or behavioural characteristics. These methods have advantages over traditional token-based identification approaches using a physical key or access card, and over knowledge-based identification approaches that use a password, for various reasons. First, the person to be identified is required to be physically present at the point of identification to provide his or her biometric traits. Second, identification based on biometric characteristics avoids the need to carry a card or remember a password. Finally, the biometric characteristics of the identified person cannot be lost or forged. During the past few decades, a number of verification systems based on different biometric characteristics have been proposed [3]. Fingerprints are the pattern of ridges on the tips of our fingers. They are one of the most mature biometric technologies and are
considered legitimate proofs of evidence in courts of law all over the world. Fingerprints are fully formed at about seven months of fetal development, and finger ridge configurations do not change throughout life except due to accidents such as bruises and cuts on the fingertips. More recently, an increasing number of civilian and commercial applications (e.g., welfare disbursement, cellular phone access, laptop computer log-in) are either using or actively considering fingerprint-based identification, because of the availability of inexpensive and compact solid-state scanners as well as its superior and proven matching performance over other biometric technologies.

There are two basic types of twins: dizygotic, commonly referred to as fraternal twins, and monozygotic, referred to as identical twins [4]. Dizygotic twins result from two eggs that are fertilized separately by two different sperms. This usually happens when the mother produces more than one egg at ovulation. The two fertilized eggs develop separately and have their own genes. They may or may not be the same gender. Monozygotic twins result from one fertilized egg. This egg divides into two individuals who share all of their genes in common. These twins are genetically identical, with the same chromosomes and similar physical characteristics, and therefore they cannot be distinguished using DNA (deoxyribonucleic acid) alone.

Figure 1. Some examples of fingerprint images in our database. (a) Fingerprint images of four fingers of the first twin, and (b) the fingerprint images of the corresponding four fingers of his/her identical twin. (c) and (d) show fingerprint images from a non-identical twin pair.

An automated fingerprint authentication system consists of
three components, namely image acquisition, feature extraction and matching. Among the three, image acquisition is often considered the most critical, as it determines the fingerprint image quality, which has a large effect on system performance [1]. Traditionally, fingerprint images are acquired by pressing or rolling a finger against a hard surface (e.g., glass, silicon, polymer) or paper (e.g., an index card). This often results in partial or degraded images due to improper finger placement, skin deformation, slippage and smearing, or sensor noise from wear and tear of surface coatings.

3D RECONSTRUCTION OF TOUCHLESS FINGERPRINTS

Touchless fingerprinting is essentially a remote sensing technique used to capture the ridge-valley pattern. While it is not a completely new approach to acquiring fingerprints [2, 3, 4], it did not generate sufficient interest in the market, in spite of its advantages with respect to contact-based technology. The main reason is the cost of this technology. In fact, in order to keep the production costs of these devices low, their manufacturers often use only one camera. This results in fingerprint images with less usable area, due to the curvature of the finger, compared to the contact-based approach. In a touchless fingerprint image, the apparent frequency of the ridge-valley pattern increases from the centre towards the sides until ridges and valleys become indistinguishable. Hence, dedicated algorithms are needed to correct the ridge-valley pattern, at the cost of an increase in the overall computational load.

Figure 2. Fingerprint acquisition using a set of cameras surrounding the finger.
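The growth of apparent ridge frequency toward the finger's sides follows directly from projection geometry: on a roughly cylindrical finger, a surface patch at angle θ from the optical axis is foreshortened by cos θ, so its ridge spacing shrinks (and the apparent frequency grows) by a factor of 1/cos θ toward the silhouette. A minimal sketch of this effect; the cylindrical-finger model and the 0.5 mm ridge period are illustrative assumptions, not values from the paper:

```python
import math

def apparent_ridge_period(true_period_mm, theta_deg):
    """Projected ridge period for a surface patch at angle theta
    from the optical axis, under a cylindrical finger model."""
    theta = math.radians(theta_deg)
    return true_period_mm * math.cos(theta)

# The period shrinks toward the silhouette (theta -> 90 degrees),
# i.e. the apparent frequency rises until ridges blur together.
for theta_deg in (0, 30, 60, 80):
    p = apparent_ridge_period(0.5, theta_deg)
    print(f"theta={theta_deg:2d} deg  apparent period={p:.3f} mm")
```

This is why single-camera touchless sensors need the dedicated frequency-correction algorithms mentioned above before the side regions of the image become usable.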
This paper presents continued investigations of the ability of fingerprint verification technology to distinguish between identical twins.

(1) Compared to the previous methods [2][5][7], two state-of-the-art fingerprint identification methods, P071 and VeriFinger 6.1, are used for twin fingerprint identification in this paper, rather than the single fingerprint identification method of [2][5][7].

(2) Compared to Jain's [2] and Srihari's [6] methods, six impressions per finger were captured rather than just one impression, which makes the genuine distribution of matching scores more realistic. As we know, the genuine distribution of matching scores needs to be estimated by matching multiple fingerprint impressions of the same finger. In both Jain's and Srihari's databases, because only a single impression of each finger was captured, the distribution of the genuine scores had to be synthesized, i.e., it does not come from real genuine matching.

(3) Compared to Sun et al.'s method [7], the fingerprint database is from the same source. However, only a part of the fingerprint dataset (51 pairs) was used in [7], while the whole fingerprint dataset (83 pairs) is used in this paper.

(4) A novel statistical analysis is conducted for the five major fingerprint types, which aims at showing the probability distribution of the fingerprint types for the corresponding fingers of identical twins which have the same fingerprint type. This is novel in our paper.

(5) A probability analysis is conducted for the four fingers from identical twins, which aims at showing which finger has the higher probability of having the same fingerprint type. This is also novel in our paper.

Methods

In this paper, two state-of-the-art methods are used to identify the similarity of twin fingerprints: P071 [13] and VeriFinger 6.1 SDK (VF6.1) [14]. The details are given as follows.
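Point (2) above turns on how the genuine score distribution is estimated: with several impressions per finger, genuine scores come from matching different impressions of the same finger, while impostor scores come from matching different fingers. A minimal sketch of that split; the `match` function and the toy minutiae sets below are placeholders, not the paper's matchers or data:

```python
from itertools import combinations

def score_distributions(impressions, match):
    """Split pairwise match scores into genuine (same finger, different
    impressions) and impostor (different fingers) distributions.

    impressions: dict mapping finger_id -> list of impression feature sets
    match: a scoring function (a stand-in for a matcher such as P071/VF6.1)
    """
    genuine, impostor = [], []
    fingers = list(impressions)
    for f in fingers:
        # Genuine scores need at least two impressions per finger; with a
        # single impression this distribution would have to be synthesized.
        for a, b in combinations(impressions[f], 2):
            genuine.append(match(a, b))
    for f1, f2 in combinations(fingers, 2):
        for a in impressions[f1]:
            for b in impressions[f2]:
                impostor.append(match(a, b))
    return genuine, impostor

# Toy usage with a trivial Jaccard similarity on sets of "minutiae" labels:
jaccard = lambda a, b: len(a & b) / len(a | b)
db = {
    "finger1": [{"m1", "m2", "m3"}, {"m1", "m2", "m4"}],
    "finger2": [{"m5", "m6"}, {"m5", "m7"}],
}
g, i = score_distributions(db, jaccard)
```

With six impressions per finger, as in this paper, each finger contributes fifteen genuine pairs rather than none, which is what makes the measured genuine distribution realistic.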
P071 Algorithm

The P071 algorithm, used here for the identification of twin fingerprints, was evaluated in the Fingerprint Verification Competition 2004 (FVC2004), where its performance was ranked No. 3 among all participating algorithms. The detailed performance of the algorithm on FVC2004 can be seen on the competition website [15]. The P071 method is based on a normalized fuzzy similarity measure. The algorithm has two main steps. First, the template and input fingerprints are aligned; in this process, local topological structure matching was introduced to improve the robustness of the global alignment. Second, a normalized fuzzy similarity measure is used to compute the similarity between the template and input fingerprints. Two features are selected for similarity computation: the number of matched sample points (n) and the mean distance difference of the matched minutiae pairs (d). Fuzzy features are used to represent n and d. Each feature is associated with a fuzzy feature that assigns a value (between 0 and 1) to each feature vector in the feature space. This value, named the degree of membership, indicates the degree of similarity of the template and input fingerprints.

Figure 3. ROC curves for identical-twin and non-twin matching by the P071 method.

VeriFinger 6.1 SDK

VeriFinger 6.1 SDK [14] is a well-known commercial fingerprint recognition software package, based on advanced fingerprint recognition technology and intended for biometric system developers and integrators. The technology
assures system performance with fast, reliable fingerprint matching in 1-to-1 and 1-to-many modes, and comparison speeds of up to 40,000 fingerprints per second. VeriFinger 6.1 SDK has many features: (1) NIST MINEX-proven reliability; (2) robust processing of poor-quality and deformed fingerprints; (3) support for more than 50 scanners. Some of the functions of VeriFinger 6.1 SDK are as follows:

1. Enroll fingerprint. A fingerprint can be enrolled from an image or by using the fingerprint scanner.
2. Enroll fingerprint with generalization. Using this option, several fingerprints can be enrolled and their features generalized.
3. Verification. Using this option, one fingerprint can be verified against another (1:1 matching).
4. Identification. Using this option, a fingerprint is identified against an internal database (1:N matching).

Parametric 3D Fingerprint Unwrapping

Figure 4. Parametric unwrapping using a cylindrical model (top-down view). Point (x, y, z) on the 3D finger is projected to (θ, z) on the 2D plane.

Non-parametric 3D Fingerprint Unwrapping

Figure 5. 3D representation of the finger. Vertices of the triangular mesh are naturally divided into slices.
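The cylindrical parametric unwrapping of Figure 4 can be sketched directly: each 3D point is converted to cylindrical coordinates about the finger's axis, and (θ, z) becomes the 2D image coordinate, with θ scaled by a nominal finger radius so that distances along the surface are roughly preserved. A minimal sketch under the cylindrical-model assumption (axis aligned with z, radius estimated from the data); this is not the authors' implementation:

```python
import math

def unwrap_cylindrical(points_3d):
    """Parametric unwrapping with a cylindrical finger model.

    points_3d: list of (x, y, z) vertices with the finger axis along z.
    Returns (u, v) plane coordinates: u = r * theta (arc length), v = z.
    """
    # Nominal radius: mean distance of the vertices from the z-axis.
    r = sum(math.hypot(x, y) for x, y, _ in points_3d) / len(points_3d)
    unwrapped = []
    for x, y, z in points_3d:
        theta = math.atan2(y, x)          # angular position around the axis
        unwrapped.append((r * theta, z))  # arc length preserves distances at radius r
    return unwrapped

# Two points on a radius-1 cylinder a quarter-turn apart land pi/2 apart in u.
pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
flat = unwrap_cylindrical(pts)
```

Because a real finger is not a perfect cylinder, this mapping distorts inter-point distances wherever the surface departs from the model, which is exactly the weakness the non-parametric, slice-based unwrapping of Figure 5 is designed to avoid.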
3D Fingerprint

Figures 6(a) and (b) show the unwrapped touchless fingerprint images using the cylindrical-based and the proposed method, respectively. Minutiae points (white arrows) are extracted using the feature extraction algorithm in [14], and distances between a few minutiae points (red solid lines) are shown in Figure 6. These figures show that the proposed unwrapping method better preserves the inter-point distances, with less distortion than the cylindrical-based method. To demonstrate the compatibility of the unwrapped touchless fingerprints with legacy rolled images, we have collected a small database of 38 fingers; each includes one ink-on-paper rolled print and one touchless print using the new line-scan sensor.

Figure 6. Unwrapping a 3D fingerprint captured with the Surround Imager using (a) the cylindrical-based parametric method and (b) the proposed non-parametric method.

Figure 7. Visualizing compatibility between (a) a touchless fingerprint from the line-scan sensor using the proposed non-parametric unwrapping, and (b) the corresponding ink-on-paper rolled fingerprint.

Conclusion

We present a study of the distinctiveness of the biometric characteristics of identical twins' fingerprints. We have assessed the capacity of state-of-the-art commercial biometric matchers to distinguish identical twins based on fingerprints. Although the unimodal fingerprint biometric system can discriminate two different persons who are not identical twins better than it can discriminate identical twins, this difference is not as large as for the face biometric system. In the fingerprint experiments, the identical-twin impostor distribution is shifted to the right, getting closer to the genuine distribution. This suggests a higher correlation between fingerprints of identical twins compared to fingerprints of unrelated persons. Previous studies have shown that the fingerprint type is much more likely to be the same in twins than in unrelated persons, and more recent studies confirm this. We also propose an unwrapping algorithm that unfolds the 3D fingerprint in such a way that it resembles the effect of virtually rolling the 3D finger on a 2D plane.

References

[1]. Tao X, Chen X, Yang X, Tian J, et al. (2012). Fingerprint Recognition with Identical Twin Fingerprints. PLoS ONE 7(4): e35704. doi:10.1371/journal.pone.0035704
[2]. Chen, Y., Parziale, G., Diaz-Santana, E. and Jain, A. K. 3D Touchless Fingerprints: Compatibility with Legacy Rolled Images.
[3]. Jain, A. K., Bolle, R. and Pankanti, S. (1999). Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers.
[4]. Jain, A. K., Prabhakar, S. and Pankanti, S. (2002). On the similarity of identical twin fingerprints. Pattern Recognition, 35: 2653–2663.
[5]. Ashbaugh, D. R. (1999). Quantitative-Qualitative Friction Ridge Analysis. Boca Raton, Florida: CRC Press LLC.
[6]. Cummins, H. and Kennedy, R. (1940). Purkinje's Observations (1823) on Fingerprints and Other Skin Features. American J. Police Sci. 31(3).
[7]. Fayrouz, N. E., Farida, N. and Irshad, A. H. (2011). Relation between fingerprints and different blood groups. Journal of Forensic and Legal Medicine, 19, 18-21.
[8]. Jain, A. K., Prabhakar, S. and Pankanti, S. (2001). Twin Test: On the Discriminability of Fingerprints. Audio- and Video-Based Biometric Person Authentication, 1(2091), 211-217.
[9]. Kong, A. W., Zhang, D. and Lu, G. (2006). A Study of Twins' Palmprints for Personal Verification. Pattern Recognition, 2149-2156.
[10]. Lee, H. C. and Gaensslen, R. E. (2001). Advances in Fingerprint Technology, 2nd Edition. United States of America: CRC Press.
[11]. Lin, C. H., Liu, J. H., Osterburg, J. W. and Nicol, J. D. (1982). Fingerprint Comparison I: Similarity of Fingerprints. Journal of Forensic Science, 27(2), 290-304.
[12]. Liu, Y. and Srihari, S. N. (2009). A Computational Discriminability Analysis on Twin Fingerprints. Computational Forensics, 5718, 43-54.
[13]. Neale, M. C. and Maes, H. H. M. (2004). Methodology for Genetic Studies of Twins and Families. Netherlands: Kluwer.
[14]. Simon, A. C. (2002). Suspect Identities: A History of Fingerprinting and Criminal Identification. United States of America: President and Fellows of Harvard College.
[15]. Temaj, G., Juric, T. S., Tomas, Z., Behluli, I., Narancic, N. S., Sopi, R., Jakupi, M. and Milicic, J. (2012). Qualitative dermatoglyphic traits in monozygotic and dizygotic twins of an Albanian population in Kosovo. Journal of Comparative Human Biology.
[16]. Yager, N. and Amin, A. (2004). Fingerprint verification based on minutiae features: a review. Pattern Analysis and Applications, 7, 94-113.
[17]. Can Identical Twins be Discriminated Based on Fingerprints? biometrics.cse.msu.edu/.../Fingerprint/JainetalTwinFpTechReport00.pdf
[18]. The Fingerprint Sourcebook. National Criminal Justice Reference Service. https://www.ncjrs.gov/pdffiles1/nij/225320.pdf
[19]. Fingerprint Identification Overview. CVIP Lab. www.cvip.uofl.edu/wwwcvip/education/ECE523/Slides.pdf
[20]. Introducing a New Multimodal Database from Twins. /PID2759199.pdf
APPLYING QoS BASED DATA REPLICATION ATTRIBUTES IN CLOUDS

Ms. A. J. Musmade (Computer Engineering Department, SVIT COE, Nasik / University of Pune, India)
Prof. S. M. Rokade (HOD, Computer Engineering Department, SVIT COE, Nasik / University of Pune, India)
Abstract - Cloud computing is an important mechanism for utilizing computing services. Because of the flexible nature of the cloud computing environment, most applications are developed in this environment. As there are a number of applications, they have different quality of service (QoS) requirements. Due to data corruption in data nodes, a number of applications are unable to reach their successful outcomes. To support such applications continuously, we perform data replication on the basis of the QoS requirements of the corresponding application. For this data replication we are developing an algorithm using some concepts of the HQFR (High QoS First Replication) algorithm. Along with the QoS requirement, our main goal is to minimize the data replication cost, so we are developing another algorithm inspired by the MCMF (Minimum Cost Maximum Flow) algorithm. In the end we will propose an efficient scheme for data replication on the basis of QoS requirements.

Index Terms—Cloud Computing, Data replication, Quality of service.

I. INTRODUCTION

Cloud computing is becoming an important mechanism for utilizing computing services worldwide. Cloud computing has different
features like transparency in resource allocation and service provisioning facilities. There is rapid growth in new information services and business-oriented applications via the internet. Because of the flexible nature of the cloud computing environment, most applications are developed in this environment. As there is a tremendous increase in data-intensive applications, there is a need for new, efficient techniques for processing huge volumes of data. Cloud computing focuses on the scalability and availability of large-scale applications. Apache Hadoop can be regarded as a typical example of cloud computing.

A cloud computing system processes a large volume of data in the network while running a huge number of applications. In the network there are a number of nodes, and with a huge number of nodes there is a higher possibility of hardware or system failure. Due to hardware failure, some data stored at a node may get corrupted. Simultaneously, an application in the running state which has requested that corrupted data cannot access it. Due to this data corruption, many applications are unable to reach their successful outcomes. To support such applications continuously by avoiding data corruption, a new technique was introduced in cloud computing systems: data replication. In data replication we maintain more than one replica, that is, copies of each data block, to avoid data corruption. There are different techniques used for data replication in cloud computing, but very few of them are concerned with the quality of service requirements of applications.

In this paper, we focus on the quality of service requirements of applications while performing data replication. To our knowledge, very few papers are concerned with this quality of service (QoS) requirement problem in data replication. Here the QoS requirement is considered from the aspect of the request
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 information of an application. We interested in finding efficient are trying to investigate the energy consumption of nodes in problem of applications regarding cloud computing system. Our QoS requirements while goal is to perform data performing data replication.We replication by considering the are trying to solve this QoS QoS requirements of application problem in data replication and and also the energy consumption will propose an efficient requirements of applications. technique to improve this Apart from this our paper problem. Along with this our goal is distributed in following is to minimize the total data sections. Section II describes the replication cost and to minimize related work. Section III gives the number of QoS violated data the system model. Section IV replicas [1]. describes efficient schemes for In our newly proposed data replication. And lastly we technique we will use some conclude our paper work. algorithms. The idea for first II. LITERATURE SURVEY algorithm is taken from HQFR To tolerate failures of algorithm. HQFR stands for High application in the cloud QoS First Replication. In this computing various concepts are algorithm the application having introduced. We will have look to high QoS will be considered first them. and it can perform data Some techniques are replication. Along with the introduced to avoid data minimization of replication cost corruption in hadoop distributed our aim is to minimize the file system (HDFS). For number of QoS violated data NameNode failure checkpoint replicas. For achieving such type method is used. In this method of goal we will introduce another NameNode periodically combines algorithm which gives optimal the existing checkpoint and solution to the QoS requirements journal to create a new in data replication. We are also checkpoint and an empty journal.
The checkpoint is a persistent record of the image stored in the local host's native file system, and the journal is a record of changes to the image. The CheckpointNode usually runs on a different host from the NameNode, since it has the same memory requirements as the NameNode. It takes the current checkpoint and journal files from the NameNode, merges them locally, and returns the new checkpoint to the NameNode. Creating checkpoints is one way to protect the file system metadata [2].

The concept of a BackupNode is discussed in the same paper. It is similar to the CheckpointNode in that it creates periodic checkpoints, but in addition it keeps an in-memory, up-to-date image of the file system namespace which is always synchronized with the NameNode. If the NameNode fails, the BackupNode's image in memory together with the checkpoint on disk is a record of the latest namespace state [2].

A further technique, data replication, is used to tolerate DataNode failure. Here more than one copy of each data block is maintained; by default HDFS creates two replicas in addition to the original, giving three copies in total. When a new block is created, HDFS places the first replica on the node where the writer is located and the second and third replicas on two different nodes in a different rack; any further replicas are placed on random nodes, with the restrictions that no more than one replica is placed at one node and no more than two replicas are placed in the same rack when the number of replicas is less than twice the number of racks [2].

Many techniques have been introduced to improve data availability and reliability in cloud computing systems, but very few of them investigate the QoS requirements of applications during data replication. One work investigates the problem of placing replicas of an object in content distribution systems to meet the QoS requirements of clients with the objective of minimizing the
replication cost. In this paper QoS requirements are specified in the form of a general distance metric. The authors consider two classes of service models, replica-aware services and replica-blind services, and propose efficient algorithms to compute the optimal locations of replicas under each model. Their goal is to find a replica placement that satisfies all requests without violating any range constraint, while minimizing the update and storage costs at the same time. They show that this QoS-aware replica placement problem is NP-complete for general graphs, and they provide two heuristic algorithms for general graphs, called l-Greedy-Insert and l-Greedy-Delete, together with a dynamic programming solution for tree topologies. Since l-Greedy-Insert starts by inserting replicas into an empty replica set and l-Greedy-Delete starts by deleting replicas from a full replica set, the execution time of these two algorithms depends on the number of replicas in the optimal solution: l-Greedy-Insert is more efficient when the optimal solution has few replicas, and l-Greedy-Delete becomes more efficient when it has many [3].

Another work first shows that a simple formulation of the problem is NP-complete. The authors present different types of heuristics for object replication that satisfy the specified access time deadlines while trying to achieve low storage overhead. Their goal is to minimize the number of replicas present in the system, and they propose a simple algorithm, known as Greedy MSC (Greedy Minimum Set Covering), which finds a solution to the QoS-aware problem [4].

Other authors presented a new heuristic algorithm for the QoS-aware problem based on the idea of a cover set. It determines the positions of a minimum number of replicas expected to satisfy certain quality requirements. Their placement algorithm exploits the data access history for popular data files and
computes replica locations so as to satisfy QoS for a given traffic pattern [5].

Further authors proposed a new heuristic algorithm, called Greedy-Cover, which finds good solutions for the QoS-aware replica placement problem in general graphs. The algorithm helps to decide the positions of the replicas so as to improve system performance and satisfy the quality requirements specified by the user simultaneously [6].

Finally, a replica replacement strategy has been proposed to make dynamic replica management effective: in a domain-based network, a weight is calculated for each replica to make the replacement decision [7].

None of the algorithms explained above is suitable for solving our QoS-aware problem. In our system the number of replicas is fixed for a specific data block, so it is not easy to minimize the replication cost with a fixed number of replicas. Our QoS-aware problem is concerned with minimizing the replication cost, minimizing the number of QoS-violated data replicas, and minimizing energy consumption.

III. SYSTEM MODEL

For designing our replication strategy we refer to the architecture of the Hadoop Distributed File System. The basic architecture of an HDFS cluster consists of two major node types, the NameNode and the DataNodes [2].

Figure 3.1: Hadoop Distributed File System

Figure 3.1 shows the architecture of HDFS. Multiple DataNodes are mounted in a rack, and a NameNode oversees a number of racks; they communicate with each other through switches. Files and
directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas. The file content is divided into a large number of blocks, and each block of the file is independently replicated at multiple DataNodes. The NameNode keeps a record of the data blocks stored in the DataNodes, and each DataNode sends heartbeats to the NameNode to indicate that it is functioning properly.

In a cloud computing cluster of thousands of nodes, failures of a node (most commonly storage faults) are daily occurrences. A replica stored on a DataNode may become corrupted because of faults in memory, disk, or network. To avoid data corruption, the data replication technique is used, which provides high data availability. Because of the heterogeneous nature of nodes, the data of a high-QoS application may be replicated on a low-performance node (a node with slow communication and disk access latencies). If data corruption later occurs on the node running the high-QoS application, the data of that application will be recovered from the low-performance node. Since the low-performance node has slow communication and disk access latencies, the QoS requirement of the high-QoS application may be violated. Note that the QoS requirement of an application is defined from the aspect of its request information.

We are trying to investigate the problem of satisfying QoS requirements while performing data replication in cloud computing. The problem is as follows: due to limited replication space, some high-QoS data get stored at lower-performance nodes and cannot reach the appropriate application in time to give correct outcomes. Conversely, some low-QoS data get stored at high-performance nodes and are not used for a long time, which wastes replication space at the high-performance nodes. This type of lower-frequency data block,
which is not in use by any application or cannot serve its appropriate application, is known as a QoS-violated data replica. We aim to solve this QoS problem in data replication and will propose an efficient technique for it. Along with this, our goal is to minimize the total data replication cost and the number of QoS-violated data replicas.

IV. AN EFFICIENT DATA REPLICATION TECHNIQUE

In this section we present two efficient algorithms for solving the QoS-aware problem in data replication. The main goal of our scheme is to minimize the replication cost and the number of QoS-unsatisfied data replicas. We are also interested in finding an efficient algorithm for performing data replication with minimum energy consumption. We make the following assumptions for solving our QoS-aware problem:

Assumption 1: Consider a cloud computing system with a set of storage nodes S. These nodes can run applications along with storing data. The functionality of the storage nodes is similar to that of the storage nodes in HDFS.

Assumption 2: Let r be the requested node, with r ∈ S. While running its application, node r writes a data block b to its disk. A replication request is then forwarded from node r to the cloud computing system, and copies of block b are replicated to other nodes in the system.

Assumption 3: Let q be the satisfied node; that is, a replica copy of block b from node r is stored as block dr at node q. Let T be the time required to store dr. With dr we associate the replication cost (RC) and the total access time of the replication (AC).

Assumption 4: When node r cannot read its data block b due to data corruption, it
will retrieve the data replica dr from node q; but if the access time AC is greater than the time T, then dr becomes a QoS-violated (unsatisfied) data replica.

Assumption 5: Our main goal is to minimize the total number of QoS-violated data replicas and the total replication cost over all data blocks.

Now we describe the algorithms.

4.1. HQFR algorithm: The name stands for High QoS First Replication. The key point is that we consider the QoS requirement only from the aspect of the request information and its access time. In HDFS the data is divided into 64 MB data blocks, and each block is replicated twice, i.e. there are two copies of a data block in addition to the original. Those two copies are stored on different DataNodes in different data racks to avoid rack failure, and the NameNode keeps track of all the replicas other than the original copy.

Basic idea of the algorithm: as the name indicates, the applications with high QoS should be replicated first. High-QoS applications have stricter requirements on the response time of a data access than normal applications, so a high-QoS application should take precedence over low-QoS applications when performing data replication.

In the cloud computing system, when any application performs a write operation, the node on which that application is executing forwards a replication request for the data block to the NameNode. The access time, i.e. the QoS requirement of that application, is attached to that request, which makes it a QoS-aware replication request. In this way multiple QoS-aware replication requests are issued in the cloud computing system from
different nodes. These requests are processed and sorted in ascending order according to their associated access times. If replication request ri has a higher QoS requirement than replication request rj, then ri has a smaller access time than rj; in such a case ri is processed first to store its data replicas.

While processing these replication requests we have to find the list of qualified nodes, which helps to satisfy the QoS requirements of the appropriate application while it is running. The QoS requirement is given in the form of the access time of the data block requested by an application. Note that a qualified node should satisfy two conditions:

1. The requested node Ri and its qualified node Qj should not be mounted in the same rack; they should belong to two different racks:

Rack(Ri) ≠ Rack(Qj)    (1)

where Rack() is the function that determines in which rack a node is located.

2. The total data replica access time from qualified node Qj to requested node Ri, Taccess(Ri, Qj), should be smaller than the QoS requirement Tqos of the application running on Ri:

Taccess(Ri, Qj) ≤ Tqos    (2)

After finding the qualified nodes using these two conditions, the data block stores one data replica in each qualified node, and the qualified nodes update their replication space accordingly. Now we calculate the total replication cost. In the HQFR algorithm the total replication cost is represented by the total storage cost taken by all the requested nodes to store their appropriate replicas. The replication cost is nothing but
the total summation of the storage costs of all data block replicas. But we are mainly interested in minimizing the replication cost and also the number of QoS-violated data replicas. To achieve the second objective we propose another algorithm for data replication.

4.2. An efficient optimal replica placement algorithm: As its name indicates, this algorithm gives an efficient solution to the QoS-aware replication problem. In this algorithm we transform the QoS-aware problem into the MCMF (Minimum Cost Maximum Flow) problem. As in the previous algorithm, we find SqRi, the set of qualified nodes for each requested node Ri, and then take the union of the set of qualified nodes Sq with each newly derived set SqRi. Using the sets Sr and Sq we form a network flow graph: the vertices of the graph come from both sets Sr and Sq, and each edge carries an appropriate capacity and cost for the data replication. By applying a suitable MCMF algorithm we find an efficient solution for that network flow graph. Afterwards we perform the same operation for the unqualified nodes corresponding to each requested node Ri: we form a new graph from both the sets described above and solve it using the same MCMF algorithm. Considering both solutions obtained previously, we perform an efficient optimal placement of all QoS-violated data replicas. Because of this optimal placement, the number of QoS-violated data replicas is minimized, which is our main goal. As we have used an MCMF algorithm in this
scheme, we get our solution in polynomial time. In this scheme the second part has one flow graph, whose value is the amount of flow leaving the requested node Ri. Here we consider the amount of flow leaving, which is not added to the total replication cost; this automatically helps in minimizing the total replication cost. Hence we achieve both of our objectives.

We are further interested in minimizing energy consumption in data replication, so we investigate another algorithm for energy optimization.

4.3. Efficient energy optimization algorithm: In this algorithm, similarly to the algorithm above, we find the set of qualified nodes corresponding to each requested node Ri and create a set of such nodes. After that we check the status of each qualified node: we collect the energy status of each node and form another set, Er, for this. Then, according to the energy status of the nodes, we sort them in descending order, so that the node with higher energy comes first. The replication request of that node is considered first from the set of requested nodes, so the replication request is performed in minimum time with efficient energy use. This algorithm has not been realized yet, but we are trying to develop it soon.
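The energy-aware ordering outlined in Section 4.3, collecting the energy status of each node and serving the node with the most remaining energy first, might look like the following sketch. The types, field names and values are our own illustration; the paper only outlines the algorithm.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the energy-aware ordering from Section 4.3: nodes are sorted
// by their reported energy status in descending order, so the node with
// the most remaining energy is considered first. The NodeEnergy type is
// our assumption; the paper does not give an implementation.
public class EnergyOrdering {

    public static class NodeEnergy {
        public final String node;
        public final double energy; // reported energy status (higher = better)

        public NodeEnergy(String node, double energy) {
            this.node = node;
            this.energy = energy;
        }
    }

    // Er: the set of (node, energy) pairs collected from the qualified nodes.
    public static List<NodeEnergy> byDescendingEnergy(List<NodeEnergy> er) {
        List<NodeEnergy> sorted = new ArrayList<>(er);
        sorted.sort(Comparator.comparingDouble((NodeEnergy n) -> n.energy).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<NodeEnergy> er = new ArrayList<>();
        er.add(new NodeEnergy("dn1", 40.0));
        er.add(new NodeEnergy("dn2", 75.0)); // highest energy: served first
        er.add(new NodeEnergy("dn3", 60.0));
        for (NodeEnergy n : byDescendingEnergy(er)) {
            System.out.println(n.node + " " + n.energy);
        }
    }
}
```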
V. CONCLUSION

We have investigated the QoS requirement problem in data replication in cloud computing systems and proposed an efficient scheme to solve the QoS-aware problem in data replication. Our first algorithm is inspired by the HQFR algorithm; it cannot give an optimal solution to the QoS-aware problem, so we proposed another algorithm which gives an optimal solution in polynomial time. This algorithm also helps in achieving both objectives of our paper, namely minimization of the replication cost and minimization of the number of QoS-violated data replicas.

In future work, we aim to find an efficient energy optimization algorithm for the energy consumption problem of nodes performing data replication in cloud computing systems. We want to develop a new algorithm which gives a proper solution to this problem, and we will try our best to implement that algorithm in real time.

REFERENCES

Journal Papers:
[1] Jenn-Wei Lin, Chien-Hung Chen, and J. Morris Chang, "QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing Systems," Digital Object Identifier 10.1109/TCC, 2013, IEEE.
[2] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. IEEE 26th Symp. Mass Storage Systems and Technologies (MSST), Jun. 2010, pp. 1-10.
[3] X. Tang and J. Xu, "QoS-Aware Replica Placement for Content Distribution," IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 10, pp. 921-932, Oct. 2005.
[4] Won J. Jeon, I. Gupta and K. Nahrstedt, "QoS-aware Object Replication in Overlay Networks," IEEE GLOBECOM, 2006.
[5] X. Jia, Deying Li, Hongwei Du and Jinli Cao, "On Optimal Replication of Data Object at Hierarchical and Transparent Web Proxies,"
IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8, Aug. 2005.
[6] H. Wang, Pangfeng Liu and Jan-Jan Wu, "A QoS-aware Heuristic Algorithm for Replica Placement," Grid Computing Conference, IEEE, 2006.
[7] K. Shashi and T. Santhanam, "Replica Replacement Algorithm for Data Grid Environment," ARPN Journal, vol. 8, no. 2, Feb. 2013.
[8] M. Creeger, "Cloud Computing: An Overview," Queue, vol. 7, no. 5, pp. 2:3-2:4, Jun. 2009.
[9] K. V. Vishwanath and N. Nagappan, "Characterizing Cloud Computing Hardware Reliability," in Proc. ACM Symp. Cloud Computing, Jun. 2010, pp. 193-204.
[10] Mohamed-K Hussein and Mohamed-H Mousa, "A Light-weight Data Replication for Cloud Data Centers Environment," International Journal of Engineering and Innovative Technology (IJEIT), vol. 1, issue 6, June 2012.
[11] Dejene Boru, Dzmitry Kliazovich, Fabrizio Granelli, Pascal Bouvry, and Albert Y. Zomaya, "Energy-Efficient Data Replication in Cloud Computing Datacenters."
[12] Nihat Altiparmak and Ali Saman Tosun, "Integrated Maximum Flow Algorithm for Optimal Response Time Retrieval of Replicated Data," in Proc. 41st International Conference on Parallel Processing, 2012.
[13] Da-Wei Sun and Xing-Wei Wang, "Modeling a Dynamic Data Replication Strategy to Increase System Availability in Cloud Computing Environments," Journal of Computer Science and Technology, 27(2): 256-272, Mar. 2012.
[14] E. Pinheiro, W.-D. Weber, and L. A. Barroso, "Failure Trends in a Large Disk Drive Population," in Proc. 5th USENIX Conf. File and Storage Technologies, Feb. 2007, pp. 17-28.
Cloud Compiler for C, C# and Java
Sagorika Datta, Aradhana, Anjali Dewani, Priya Bankar
Sir Visvesvaraya Institute of Technology, Nashik, University of Pune
Email: [email protected]
Abstract: A compiler converts higher-order language instructions into lower-order or assembly language instructions, whereas cloud computing is a metonym for distributed computing over a network, meaning the ability to run a program or application on demand on many connected computers at the same time with marginal management effort. The cloud compiler created by us is an amalgam of the best of both worlds, embedding not one but three compilers in the cloud. In today's technologically advanced day and age, there is always a new and advanced operating system or piece of software which makes software released a mere few years ago obsolete and unsupported on the new platform. For compilers, there is the issue of the excess space required to manually install compilers on each machine, and of the set-up options and configuration needed if they are not installed with the default settings and parameters. Platform independence is also an issue: once a program is compiled, it is nearly impossible to transport the same code to other machines. Also, usage of multiple languages implies the installation of multiple compilers. To avoid all these hindrances, our project, the online cloud compiler, provides easy access to the compilers of three majorly used programming languages, C, Java and C#, just by accessing them through any browser-enabled device with any network connection, thus providing remote access as well as platform
independence. We can just upload the program to the cloud and it will get compiled, along with a report of its space and time complexity. This promotes portability, conservation of space and less overhead.

Keywords: Compiler, Cloud Computing, Cloud Compiler, Platform Independence

I. INTRODUCTION

A. CLOUD COMPUTING

The National Institute of Standards and Technology (NIST) defines 'Cloud Computing' as 'a model for enabling easy, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.' The advantage of this is that the resources on the cloud can be accessed remotely through any browser-enabled device. Also, the client accessing the required cloud resources can be impervious to the settings and mechanisms of the system which actually provides said resources. Even though there is the slight drawback of loss of management over the infrastructure utilized by the users, the many benefits of cloud, including the conservation of memory and the minimization of costs, make cloud computing the new-age technology clamored for by everyone. Consequently, security on the cloud can also be enhanced through the use of various innovative techniques, such as aggregate keys, etc.

Cloud computing offers a varied range of services, the more common of which are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). The types of cloud include public, private, hybrid, community and combined clouds. Fig. 1 displays the overall layout of cloud computing.
Fig. 1. Cloud Computing

B. COMPILER

Compilers are used for the purpose of converting source code to object code, where the source code is in a higher-level programming language and the object code is in a lower-order programming language. A compiler usually entails operations like lexical analysis, preprocessing, parsing, semantic analysis (syntax-directed translation), code generation and code optimization [17].

Structure of a Compiler [17]

Compilers bridge source programs in high-level languages with the underlying hardware. The work done by compilers is:
1. Verification of code syntax.
2. Generation of efficient object code.
3. Run-time organization.
4. Formatting of output according to assembler and linker conventions.

The compiler comprises:
The Front End: It performs verification of syntax and semantics and generates an intermediate representation of the source code for processing by the middle end. Type checking is also done by collecting type information. It produces faults and warnings, if any, in a constructive way.
The Middle End: It performs:
1. Optimization, including removal of useless or unreachable code.
2. Discovery and propagation of constant values.
3. Relocation of computation to less frequently executed places.
4. Generation of another intermediate representation for the back end.
The Back End:
1. It generates the assembly code, allocating registers in the process.
2. Heuristic techniques are used for the NP-hard optimization problems involved.
3. It helps in optimizing the target code for utilization of the hardware.

The existing system involves manually installing compilers on our computers. Apart from occupying more space on the system, there is at times the overhead of needing separate editors for each compiler and of finding software compatible with the operating system on the machine. If the software is not compatible, for example the older versions of Turbo C being incompatible with Windows 8, we may need to take other measures like using DosBox to run programs, or installing a virtual machine with an operating system like Windows XP in which the software can be installed to compile and run the program. Additionally, there is the disadvantage of not being able to use the compiler remotely.

Our project, as stated before, involves embedding three major compilers in the cloud, hence the name cloud compiler. It not only enables remote access to the compilers but, after compiling the program (which can be created using the editor of one's own choice), also gives us the time and space complexity of the program. As a result, there is conservation of memory, platform independence and the much-needed portability that today's professionals on the move require. Fig. 2 provides an overview of the cloud compiler.

Fig. 2. Overview of Cloud Compiler

II. ONLINE CLOUD COMPILER

The Cloud Compiling™ Cloud Compiler™ (CC) is a family of cloud compiler licensed programs
for the IBM OS/390 and z/OS operating environments. A cloud compiler is a program that functions equivalently to an actual compiler but does not require the actual compiler to be installed or licensed on the machine on which it runs. The CC utilizes FTP (the File Transfer Protocol client of the IBM OS/390 or z/OS SecureWay Communications Server) to transmit the user's source code to another mainframe (on which the actual compiler is installed), compile it there, and return the output of the actual compiler (system messages, listing, object code, etc.) to the user's specified target datasets. Most of the options and features of an actual compiler are supported.

The principal function of our compiler in the cloud is undertaken in three very simple steps:
1. Choose File: you select the file to be uploaded from your device. The file should be the program that you want to compile, typed in the editor of your choice.
2. Load: this loads the selected file onto the cloud, where it is interpreted to categorize it into one of the three languages offered by our cloud compiler: C, Java or C#.
3. Compile: here the uploaded program is compiled using the appropriate compiler for the language of the program. If there are any errors, they are displayed in the error window along with the line numbers and details which a normal compiler would usually provide; if the program is error-free, the correct output is displayed in the output window.

Fig. 3, 4 and 5 display the user interface of the project, which exhibits the simple way in which one can make use of the project.
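The categorization performed in the Load step above can be sketched as follows. The paper does not state how the uploaded file is categorized, so mapping by file extension is our assumption, and the compiler commands (gcc, javac, csc) are stand-ins for whatever compilers the project actually embeds.

```java
// Illustrative sketch of the "Load" step: map the uploaded file's
// extension to one of the three supported languages, and to a compiler
// command the server side might invoke. Both mappings are assumptions
// for illustration; the paper does not specify the mechanism.
public class LanguageDetector {

    public static String detect(String fileName) {
        String lower = fileName.toLowerCase();
        if (lower.endsWith(".c")) return "C";
        if (lower.endsWith(".java")) return "Java";
        if (lower.endsWith(".cs")) return "C#";
        return "Unknown";
    }

    // Hypothetical compiler command per language (example names only).
    public static String compilerFor(String language) {
        switch (language) {
            case "C":    return "gcc";
            case "Java": return "javac";
            case "C#":   return "csc";
            default:     return null;
        }
    }

    public static void main(String[] args) {
        String lang = detect("HelloWorld.java");
        System.out.println(lang + " -> " + compilerFor(lang));
    }
}
```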
As opposed to any similar concept that may have been implemented before ours, we use the concept of Platform as a Service rather than Software as a Service. In the SaaS model, cloud providers install and operate application software in the cloud and cloud users access the software from cloud clients. Cloud users do not manage the cloud infrastructure and platform where the application runs. This eliminates the need to install and run the application on the cloud user's own computers, which simplifies maintenance and support. Cloud applications differ from other applications in their scalability, which can be achieved by cloning tasks onto multiple virtual machines at run-time to meet changing work demand [15].

Fig. 3. Homepage GUI

Fig. 4. Output of uploaded program

Fig. 5. Errors displayed in Error Window

In the PaaS model, cloud providers deliver a computing platform, typically including an operating system, a programming language execution environment, a database, and a web server. Application developers can develop and run their software
solutions on a cloud platform without the cost and complexity of buying and managing the underlying hardware and software layers. With some PaaS offerings like Windows Azure, the underlying compute and storage resources scale automatically to match application demand, so that the cloud user does not have to allocate resources manually. The latter has also been proposed by an architecture aiming to facilitate real-time operation in cloud environments [14].

The Platform as a Service offered by the cloud compiler is also advantageous in that, once registered as a user, the client can access the compilers indefinitely, as opposed to using software that is on lease and may expire after a certain period of time.

III. MICROSOFT .NET FRAMEWORK

The .NET Framework is a technology that supports building and running the next generation of applications and XML Web services. The .NET Framework is designed to fulfil the following objectives [16]:
- To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
- To provide a code-execution environment that minimizes software deployment and versioning conflicts.
- To provide a code-execution environment that promotes safe execution of code, including code created by an unknown or semi-trusted third party.
- To provide a code-execution environment that eliminates the performance problems of scripted or interpreted environments.
- To make the developer experience consistent across widely varying types of applications, such as Windows-based applications and Web-based applications.
- To build all communication on industry standards to ensure that code based on the .NET Framework can integrate with any other code.

IV. PROJECT ARCHITECTURE

The architecture used in the system is two-fold, i.e., an upper layer and a lower layer. The upper layer comprises the server, and the lower layer comprises clients of lower configuration.

The key components of the upper layer are:
1. A web framework, Visual Studio 2010: it handles the work of scripting and compilation of code.
2. Internet Information Services (IIS) Server: it handles the client requests.
3. Cloud hard disk: it is a shared resource.

Our project includes the following:
1. Compiler integration: the integration of the C, C# and Java compilers in the cloud in our project.
2. Compiler embedding: embedding the previously integrated compilers on the cloud.
3. Passing input: sending the input from the user side to the cloud compiler for compiling.
4. Retrieving output: after recognizing the language, the cloud compiler compiles the given input program and sends the result back to the user.
The protocols used are SOAP and WSDL.

Fig. 6 gives an explicit representation of the architecture of the cloud compiler.

Fig. 6. Architecture of Cloud Compiler
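The input-passing and output-retrieval steps above can be sketched as follows. This is an illustrative stand-in, not the project's actual ASP.NET/IIS code: the file-extension-based language recognition and the compiler command lines are assumptions made for the sketch.

```python
# Illustrative sketch of steps 3-4: the cloud side recognizes the
# submitted language and builds the matching compiler invocation.
# Command lines here are examples, not the paper's configuration.
COMPILERS = {
    ".c":    ["gcc", "{src}", "-o", "{out}"],
    ".cs":   ["csc", "{src}"],
    ".java": ["javac", "{src}"],
}

def recognize_language(filename):
    """Step 4's 'recognition of language', done by file extension."""
    for ext in COMPILERS:
        if filename.endswith(ext):
            return ext
    raise ValueError("unsupported source file: " + filename)

def build_compile_command(filename, out="a.out"):
    """Build the compiler command the cloud server would run on the input."""
    ext = recognize_language(filename)
    return [part.format(src=filename, out=out) for part in COMPILERS[ext]]
```

For example, `build_compile_command("hello.c")` yields the command list `["gcc", "hello.c", "-o", "a.out"]`, which the server-side script would execute before sending the result back to the user.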
V. PROJECT IMPLEMENTATION

While developing the software, it is of the essence to decide which programs will be executed on the server side and which on the client side. This approach is used when the program to be transferred to the user is moderate in size: the application is run on the server, and the data is transferred between client and server. With the online compiler, all the execution is done on the server side, so the information has to be sent to the server. After processing, the server sends the result back to the client that made the request.

The front end is designed to be as simple as possible, so that it loads quickly and is platform independent. ASP.NET handles the communication between a user and the compiler; the server side therefore implements this part of the application in ASPX pages written in ASP.NET. File management, running the compiler, and processing the compiler result are done by the script. The list of source code and the list of errors are sent as the result to the user.

VI. PROJECT USE

The main purpose of a project like this is to provide utmost convenience to students and professionals on the move, as mentioned before. It gives freedom from being restricted to older technologies just to use compilers that are unsupported by newer ones: one can easily update one's system and still be able to compile programs with ease. When it comes to the specifics of usage, we can recommend using our project in arenas like university online practical examinations, where instead of having to check each system individually for the compiled program, using different editors and compilers to do so, the cloud provides a centralized system on which we can easily compile
and run programs. The same goes for interviews of applicants for a technical job, where the company might need to test their coding skills in a technical aptitude round. Here, the cloud compiler provides a simple and sophisticated way for the personnel in charge to evaluate the coding prowess of the applicant, and the errors/output can easily be displayed without much hassle.

VII. CONCLUSION AND FUTURE SCOPE

By integrating and enhancing the capabilities of fundamental technologies, we introduce the "Cloud Compiler" to contribute to the ease of remotely compiling programs without the overhead of manually installing compilers separately on each system. As this eradicates the need to install compilers separately, professionals and students need not visit a specific system but can check their code at the centralized server. Another advantage of this project is that upgradation of the compiler package can be done easily, without installing it on each and every machine. In future we hope to implement compiler embedding of almost all available compilers.

REFERENCES

[1] Vouk, M. A., "Cloud Computing – Issues, Research and Implementations", ITI 2008 – 30th International Conference on Information Technology Interfaces.
[2] Sweet, W. and Geppert, L., "http:// It has changed everything, especially our engineering thinking," IEEE Spectrum, January 1997, pp. 23-37.
[3] Camposano, R.; Deering, S.; DeMicheli, G.; Markov, L.; Mastellone, M.; Newton, A.R.; Rabaey, J.; Rowson, J., "What's ahead for Design on the Web", IEEE Spectrum, September 1998, pp. 53-63.
[4] Hank Shiffman, Making Sense of Java,
http://www.disordered.org/Java-QA.html
[5] Hank Shiffman, Boosting Java Performance: Native Code and JIT Compilers, http://www.disordered.org/Java-JIT.html
[6] Gundavaram, S., CGI Programming on the World Wide Web, O'Reilly & Associates, Inc., 1996.
[7] Wall, L., Christiansen, T., Schwartz, R.L., Programming Perl, O'Reilly & Associates, Inc., 1996.
[8] Shufen Zhang, Shuai Zhang, Xuebin Chen, Shangzhuo, "Analysis and Research of Cloud Computing System Instance", Future Networks, 2010, ICFN '10, Second International Conference.
[9] Shuai Zhang, Shufen Zhang, Xuebin Chen, Xiuzhen Huo, "Cloud Computing Research and Development Trend", Future Networks, 2010, ICFN '10, Second International Conference.
[10] Grobauer, B., Walloschek, T., Stocker, E., "Understanding Cloud Computing Vulnerabilities", Security & Privacy, IEEE, March-April 2011.
[11] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, Zhenghu Gong, "The Characteristics of Cloud Computing", Parallel Processing Workshops (ICPPW), 2010 39th International Conference.
[12] Junjie Peng, Xuejun Zhang, Zhou Lei, Bofeng Zhang, Wu Zhang, Qing Li, "Comparison of Several Cloud Computing Platforms", Information Science and Engineering (ISISE), 2009 Second International Symposium.
[13] Aamir Nizam Ansari, Siddharth Patil, Arundhati Navada, Aditya Peshave, Venkatesh Borole, "Online C/C++ Compiler using Cloud Computing", Multimedia Technology (ICMT), July 2011 International Conference, pp. 3591-3594.
[14] Platform-as-a-Service Architecture for Real-Time Quality of Service Management in Clouds.
[15] Hamdaqa, Mohammad, A Reference Model for Developing Cloud Applications.
[16] http://msdn.microsoft.com/en-us/library/zw4w595w(v=vs.110).aspx
[17] Aho, Alfred V., Ravi Sethi, Ullman, Jeffrey D. (1986), Compilers: Principles, Techniques and Tools (1st ed.), Addison-Wesley, ISBN 9780201100884.
GSM NETWORKED DTMF BASED SMART PASSWORD ENTRY SYSTEM

1. Charushila Bhika Bachhav, SVIT, Nashik
2. Poonam Shivaji Bagul, SVIT, Nashik
3. Pradnya Somnath Sanap, SVIT, Nashik
Under the guidance of: S. A. Gade, SVIT, Nashik
[email protected] [email protected] [email protected]
Abstract— Most of the security problems met unexpectedly on the Internet are due to human faults. The first level of security leaks occurs at the time of website development. A hacker could extract confidential information from the website itself if the website developer does not correctly plan or proof-test his scripts. Although some viruses may contain damaging programs for your computer, or even allow a distant user to take control of it, most of them will not otherwise affect your computer. Such programs are known as "Trojan horses".

Hacking is the most likely and unusual security problem in itself. True hacking usually means that the hacker had no or little information on his target and achieves most of the breakthrough with his own knowledge. General users are usually not the target of hackers; hackers will usually try to get through the security barrier of big organizations' Internet servers, or try to hack website servers. They usually manage to do so by exploiting software engineering faults that have yet to be fixed. At large, there is no particular way to help in this case unless you are the developer of that software.
To prevent this problem, we developed a keyboard-less password entry system using a DTMF card through mobile phones.

Keywords— Decoder, DTMF, GSM, Microcontroller, Mobile phone.

I. INTRODUCTION

Today the most common Internet security problem is hacking. True hacking means that the hacker had only little or no information on his target and achieves most of the attack with the knowledge he already has. Generally, common users are not the target of hackers; hackers usually try to hack big organizations' Internet servers or website servers, using software engineering weaknesses that have yet to be fixed. In the past few months, the problem of hacking has continuously been in the news. We propose a system to prevent the problem of hacking: keyboard-less password entry through mobile phones using a DTMF card. The aim of the proposed system is to develop a cost-effective solution that provides security against attackers.

Generally, in every system we enter a password using the keyboard, through which an outsider can hack the password or access important data by hacking that system. When we press the keys on the keyboard, the keystrokes can be captured by a Trojan horse that has already entered the system through the network. It captures all the keystrokes we press and sends them to the other person. The Trojan horse is a small program that does not attempt to delete anything on the user's disk, but instead replicates itself on computers or networks. It enters the system silently during authentication. When we enter a password using the keyboard, the Trojan horse can gather important data from the system and send it back to the outsider, who can misuse that data. In big organizations like NASA, security is a must. To prevent these problems, our
system uses a DTMF card accessed through mobile phones.

II. SYSTEM DEVELOPMENT

A. Block Diagram

Fig I: Pictorial representation of complete system

One mobile phone, in the administrator's hand, is the sending phone. The receiver phone is attached to the DTMF card and is set to auto-answering mode, so that when a call is made it automatically picks up. An earphone is permanently attached to the receiving phone, and the 8870 DTMF decoder is connected to the receiver. After the call is answered by the receiving phone, the sending phone, i.e., the administrator's phone, acts as a remote: as any key is pressed on the sending phone, the corresponding 4-bit output appears on the output pins of the 8870. This 4-bit output is displayed on the DTMF circuit and is converted into the corresponding decimal number when passed to the server.

B. Circuit Diagram

Fig II: Decoding of DTMF MT-8870 (DECODER)

Fig III: Microcontroller output control device

The circuit diagram explains the working of the system's mechanism. Here, a mobile phone with a headset is connected to the control unit. When a call is made, the mobile phone in the control unit is automatically
answered, after which the password is pressed. The decoder MT-8870 converts the DTMF tones (as shown in Fig II) [1]. The decoded output is then sent to the microcontroller, which issues commands to the control devices connected to it (as shown in Fig III) [1]. Switching of the device is performed by a relay.

C. Voice Message Circuit

Fig IV: Circuit diagram for voice message

The circuit diagram for the voice message unit is shown in Fig IV. The numbers recorded on the SIM card of the mobile phone are called sequentially when the triggering signal is detected by the microcontroller from the scanned units, and the voice message unit is activated by the microcontroller (MC). The MC sends a deactivation signal when the recorded message has been played back. This operation continues in the same manner until the last call is performed. The speaker output of the ISD is connected to the cellular phone's speaker, so that the recorded message is heard directly by the receiving end of the phone that has been called.

D. DTMF Generation and Decoding

DTMF is a generic communication term for touch tone; it is a registered trademark of AT&T. DTMF, i.e., Dual Tone Multi Frequency, is a popular method of signaling used between switching centers and telephones, and also for signaling between computer networks and the telephone. When the digits of a mobile phone are pressed, DTMF tones are produced automatically, and these tones are different for every digit. Generally, mobile phones
have 12 keys: 1, 2, 3, ..., 9, 0, #, *. Each key on the mobile phone has a unique signal, called the DTMF signal. The extra keys A, B, C and D are not present on the cellphone, but these keys also have unique frequency pairs; they are special keys used for special purposes. When a call is connected, pressing any numeric key of the mobile phone generates a DTMF signal. These DTMF signals are audible to all of us. The DTMF tone for each key is a combination of two different frequencies: one higher frequency and one lower frequency. Each key has a unique frequency pair associated with it, and hence generates a unique tone, i.e., a unique DTMF signal. For example, the DTMF tone for key 7 is the sum of two sinusoidal waves of frequency 1209 Hz (higher frequency) and 852 Hz (lower frequency).

The following table gives the frequency pairs [2] for key presses:

Mobile key   Higher frequency (Hz)   Lower frequency (Hz)
0            1336                    941
1            1209                    697
2            1336                    697
3            1477                    697
4            1209                    770
5            1336                    770
6            1477                    770
7            1209                    852
8            1336                    852
9            1477                    852
*            1209                    941
#            1477                    941
A            1633                    697
B            1633                    770
C            1633                    852
D            1633                    941

The table above shows the frequency pair of each key; each key combines a unique pair of frequencies and hence generates a unique tone.

III. RESULT AND ANALYSIS
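As a first numerical check for this section, the tone construction described above can be reproduced in software. The sketch below synthesizes a key's tone as the sum of its two sinusoids and recovers the key with the Goertzel algorithm, a standard single-frequency detection method; the 8 kHz sampling rate, the 0.1 s duration, and the restriction to the 12 standard keys are our own illustrative choices, not from the paper.

```python
import math

# Low/high frequency (Hz) of each of the 12 standard keys.
LOW = {"1": 697, "2": 697, "3": 697, "4": 770, "5": 770, "6": 770,
       "7": 852, "8": 852, "9": 852, "*": 941, "0": 941, "#": 941}
HIGH = {"1": 1209, "2": 1336, "3": 1477, "4": 1209, "5": 1336, "6": 1477,
        "7": 1209, "8": 1336, "9": 1477, "*": 1209, "0": 1336, "#": 1477}
FS = 8000  # assumed sampling rate, samples per second

def dtmf_tone(key, duration=0.1):
    """Synthesize a key's tone as the sum of its two sinusoids."""
    n = int(FS * duration)
    return [math.sin(2 * math.pi * LOW[key] * t / FS) +
            math.sin(2 * math.pi * HIGH[key] * t / FS) for t in range(n)]

def goertzel_power(samples, freq):
    """Signal power at one target frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / FS)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def detect_key(samples):
    """Pick the strongest low and high frequency, then look up the key."""
    low = max({697, 770, 852, 941}, key=lambda f: goertzel_power(samples, f))
    high = max({1209, 1336, 1477}, key=lambda f: goertzel_power(samples, f))
    return next(k for k in LOW if LOW[k] == low and HIGH[k] == high)
```

Running `detect_key(dtmf_tone("7"))` recovers "7" from the 852 Hz + 1209 Hz pair, mirroring in software what the MT-8870 does in hardware.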
In our application we use a DTMF receiver IC. The MT8870 is the most commonly used DTMF receiver IC in electronic communication circuits. It is an 18-pin IC used in telephones and a number of other applications. A quick test of the MT8870 can save a lot of time in the manufacturing of communication instruments and in research labs. Here is a small and easy tester circuit for the DTMF IC: it can be assembled on a multipurpose PCB with an 18-pin IC base, or tested on a simple breadboard.

For optimum working of telephone equipment, the DTMF receiver is designed to accept successive digit tone-pairs that are more than 40 ms apart, and to recognize a valid tone pair longer than 40 ms in duration. However, for other applications like radio communications and remote controls, the tone duration may have to change due to noise considerations. Therefore, by adding an extra steering diode and resistor, the tone duration can be set to different values.

This circuit is configured in balanced-line mode. A balanced differential amplifier input is used to reject common-mode noise signals. The circuit also provides an excellent bridging interface across a properly terminated telephone line. Transient protection is achieved by splitting the input resistors and inserting zener diodes (ZD1 and ZD2) to achieve voltage clamping. This allows the transient energy to be dissipated in the diodes and resistors, and limits the maximum voltage that may appear at the inputs.

Whenever you press any key on your local telephone keypad, on receiving the tone-pair the delayed steering (StD) output of the IC goes high, causing LED5 (connected to pin 15 of the IC via resistor R15) to glow. How long it stays high depends on the values of the capacitor and resistors at pins 16 and 17. The optional circuit shown
within the dotted line is used for guard-time adjustment. The LEDs connected via resistors R11 to R14 at pins 11 through 14, respectively, indicate the output of the IC. The DTMF tone-pair generated by pressing a telephone button is converted into binary values internally in the IC, and the binary values are indicated by the LEDs at the output pins. LED4 represents the most significant bit (MSB) and LED1 the least significant bit (LSB).

A. Functional Table

The following table shows the decoding of DTMF tones into the binary output (Q4 Q3 Q2 Q1) and the corresponding decimal number:

F_LOW (Hz)   F_HIGH (Hz)   Key   TOW   Q4 Q3 Q2 Q1   Decimal No.
697          1209          1     H     0  0  0  1    247
697          1336          2     H     0  0  1  0    251
697          1477          3     H     0  0  1  1    243
770          1209          4     H     0  1  0  0    253
770          1336          5     H     0  1  0  1    245
770          1477          6     H     0  1  1  0    249
852          1209          7     H     0  1  1  1    241
852          1336          8     H     1  0  0  0    254
852          1477          9     H     1  0  0  1    246
941          1336          0     H     1  0  1  0    250

IV. FUTURE SCOPE
1. Missile firing.
2. Home security systems.
3. Mobile / wireless robot control.
4. Wireless radio control.

V. CONCLUSION
1. The main purpose of our project is to perform virtual password insertion for any system, so that a secure way of entering a password can be chosen: using a DTMF card through mobile phones over the GSM network.
2. The usual way to solve the problem of hacking is to enter passwords through a virtual password entry system, to plan better when coding your website, and to further test your scripts, especially those dealing with private data.
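The functional table above can also be exercised in software. The sketch below maps the decoder's 4-bit output (the Q4..Q1 codes of the table) back to key presses and assembles the entered password; the function names and the password-assembly step are our own illustration, not the paper's firmware.

```python
# Map of the MT8870's 4-bit output (Q4 Q3 Q2 Q1) to the pressed key,
# following the functional table above (key 0 decodes to binary 1010).
MT8870_CODES = {
    0b0001: "1", 0b0010: "2", 0b0011: "3",
    0b0100: "4", 0b0101: "5", 0b0110: "6",
    0b0111: "7", 0b1000: "8", 0b1001: "9",
    0b1010: "0",
}

def decode_nibble(q4, q3, q2, q1):
    """Convert the four decoder output bits into the pressed key."""
    code = (q4 << 3) | (q3 << 2) | (q2 << 1) | q1
    return MT8870_CODES[code]

def read_password(nibbles):
    """Assemble successive decoder outputs into the entered password."""
    return "".join(decode_nibble(*n) for n in nibbles)
```

For instance, the decoder outputs (0,1,1,1) then (1,0,1,0) correspond to the key sequence "70", which the server would compare against the stored password.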
REFERENCES

[1] Coskun, I. and Ardam, H., "A Remote Controller for Home and Office Appliances by Telephone", IEEE Trans. Consumer Electron., Vol. 44, No. 4, pp. 1291-1297, November 1998.
[2] Tuljappa M Ladwa, Sanjay M Ladwa, R Sudharshan Kaarthik, Alok Ranjan Dhara, Nayan Dalei, "Control of Remote Domestic System Using DTMF", ICICI-BME 2009, Bandung, Indonesia.
[3] Nehchal Jindal, "Wireless Control via Mobile Communication", IIT Kanpur, 2010.
AUDITING PROTOCOLS: A NEW APPROACH FOR SECURITY OF CLOUD DATA
Sonali Pardeshi1, Ankita Rathi2, Shalini Shejwal3, Pooja Kuyate4
1 M.E. Student of Computer, Pune; 2 Assistant Professor, SVIT, Nashik; 3 M.E. Student of Computer, Pune; 4 M.E. Student of Computer, Pune
Abstract— Cloud computing is a long-dreamed vision of computing as a utility, where data owners can remotely store their data in the cloud to enjoy on-demand, high-quality applications and services from a shared pool of computing resources. Data integrity protection in cloud computing is a mandatory task, as users no longer have physical possession of the outsourced data; users should be able to use the cloud storage as if it were local, without worrying about the need to verify its integrity. To check the integrity of outsourced data, users can resort to a third-party auditor (TPA). Enabling a public auditability process using a TPA should not burden the user with additional online work, and the data privacy of the user should not be made vulnerable. With public auditability, a trusted entity with expertise and capability that data owners do not possess can be delegated as an external audit party to assess the risk of the outsourced data when needed. Such an auditing system helps to save the data owner's computational resources and provides a cost-effective method to gain trust in the cloud. We describe approaches and system requirements that should be brought into consideration
for such publicly auditable cloud storage, and also show how the TPA can perform audits for multiple users simultaneously and efficiently.

Keywords— data storage, public auditability, cloud storage, batch verification.

I. INTRODUCTION

New computing paradigms keep emerging. One notable example is the cloud computing paradigm, a new economic computing model made possible by advances in networking technology, where a client can leverage a service provider's computing, storage, or networking infrastructure. With the unprecedented exponential growth rate of information, there is an increasing demand for outsourcing data storage to cloud services such as Microsoft's Azure and Amazon's S3, which assist in the strategic management of corporate data. Storing data remotely in the cloud in a flexible, on-demand manner brings appealing benefits: relief of the burden of storage management, universal data access from independent geographical locations, avoidance of capital expenditure on hardware, software, and personnel maintenance, etc. As a disruptive technology with profound implications, cloud computing is transforming the very nature of how businesses use information technology.

While the cloud makes these advantages beneficial, it also brings new and challenging threats to users' outsourced data. Basically, cloud service providers (CSPs) are separate administrative entities, so users give up ultimate control over the fate of their data, which puts the correctness of the data at risk. Even though clouds have more powerful infrastructures than personal computing devices, they still face a broad range of both internal and external threats to data integrity. Moreover, a CSP might behave unfaithfully toward the cloud users regarding their outsourced data status. They can reclaim storage for monetary
reasons by discarding data that have not been or are rarely accessed, or even hide data loss incidents to maintain a reputation. In conclusion, although the cloud is economically attractive for long-term, large-scale storage, it does not immediately offer any guarantee on data integrity and availability. This problem, if not properly addressed, may impede the success of the cloud architecture.

As users no longer physically possess the storage of their data, traditional cryptographic primitives for data security protection cannot be directly adopted. In particular, simply downloading all the data for integrity verification is not a practical solution, due to the expense of I/O and transmission cost across the network. It is also insufficient to detect data corruption only when accessing the data, as this gives no assurance for the unaccessed data. Considering the large size of the outsourced data and the user's constrained resource capability, the task of auditing data correctness in a cloud environment can be formidable and expensive for the cloud users. Moreover, the overhead of using cloud storage should be minimized as much as possible, such that a user does not need to perform too many operations to use the data. Users are also reluctant to go through a complex process to verify data integrity. Consider the example of an enterprise where more than one user accesses the same cloud storage; it is desirable that the cloud only entertain verification requests from a single designated party.

To solve the problem of data integrity checking, many schemes have been proposed under different system and security models. In all these works, great efforts are made to design solutions that meet various requirements: high scheme efficiency, stateless verification, unbounded use of queries, retrievability of data, etc. Considering the role of the verifier in the model, all the schemes presented before fall into two categories: private
auditability and public auditability. Although schemes with private auditability can achieve higher scheme efficiency, public auditability allows anyone, not just the client (data owner), to challenge the cloud server on the correctness of data storage while keeping no private information. To enable a public auditing service for cloud storage, users resort to an independent third-party auditor (TPA) to audit the outsourced data when required. The TPA, who has expertise and capabilities that users do not, can periodically check the integrity of all the data stored in the cloud on behalf of the users, which provides a much easier and more affordable way for the users to ensure their storage correctness in the cloud. The TPA helps users evaluate the risk of their subscribed cloud data services, and the audit results from the TPA are also beneficial for the CSP to improve its service platform. Public auditability allows an external party, in addition to the user himself, to verify the correctness of remotely stored data.

However, such schemes may potentially reveal user data to auditors and do not consider the privacy protection of data. This is a severe drawback which greatly hampers the security protocol in cloud computing. Users who own the data and rely on the TPA just for the storage security of their data do not wish to go through an auditing process that introduces new vulnerabilities of unauthorized information leakage toward their data security.

Simply applying data encryption before outsourcing could be one way to mitigate this privacy concern of data auditing, but it could also be an overkill when employed in the case of unencrypted/public cloud data. Besides, encryption does not completely solve the problem of protecting data privacy against third-party auditing; it just reduces the problem to the complex domain of key management. Unauthorized data leakage still remains possible due to the potential exposure of decryption keys.
Firstly, in this paper we tackle the problem of how to enable a privacy-preserving TPA protocol, independent of data encryption. Secondly, since auditing an increasing number of tasks individually is tedious and hectic, we look to enable the TPA to efficiently perform multiple auditing tasks in a batch manner, i.e., simultaneously.

To address these problems, our work utilizes the technique of the public-key-based homomorphic linear authenticator (HLA for short) [9], [13], [8], which enables the TPA to perform the auditing without demanding a local copy of the data. This approach automatically reduces the communication and computation overhead. By integrating the HLA with random masking, our protocol guarantees that the TPA cannot learn any knowledge about the data content stored in the cloud server (CS) during the auditing process.

Specifically, our paper's contributions can be summarized as follows:
1) Motivating a public auditing system for cloud data storage.
2) Providing a privacy-preserving auditing protocol.
3) Auditing the cloud data without learning the data content.
4) Achieving batch auditing in a privacy-preserving manner.

2 PROBLEM STATEMENT

We begin with the high-level architecture of cloud data storage, as shown in Figure 1.

Figure 1: The architecture of cloud data storage services.

The architecture consists of four different entities: the data owner, the user, the cloud server (CS), and the TPA. The data owner is the one who has a large amount of data. The cloud user is the one who
has to store a large amount of data in the cloud. The cloud server is managed by the CSP to provide data storage services and has significant storage to use. The TPA has expertise and capabilities that cloud users do not have, and is trusted to assess the cloud storage service on behalf of the users. The TPA provides a transparent yet cost-effective method for establishing trust between the data owner and the cloud server. Users rely on the CS for cloud data storage and maintenance. They may also dynamically interact with the CS to access and update their stored data for various application purposes. It is important to assure the users that their data are correctly stored and maintained. Cloud users may use the TPA for ensuring the integrity of the outsourced data, as that saves their computational resources while providing periodic storage verification.

The data integrity threats come from both internal and external attacks at the cloud server. These threats include software bugs, hardware failures, bugs in the network path, economically motivated hackers, malicious or accidental management errors, etc. The cloud server may, for its own benefit or to maintain its reputation, even decide to hide these data corruption incidents from users. The TPA is reliable and independent, and provides a cost-effective method to users. However, it may harm the user if the TPA could learn the outsourced data after the audit.

In this model, beyond the users' reluctance to leak data to the TPA, we also assume that cloud servers have no incentive to reveal their hosted data to external parties. On the one hand, there are regulations, e.g., HIPAA [16], requesting the CS to maintain users' data privacy. On the other hand, as users' data belong to their business assets [10], there also exist financial incentives for the CS to protect them from any external parties. Therefore, we assume that neither the CS nor the TPA has motivation to collude with the other during the auditing process. In other words, neither
entity will deviate from the prescribed protocol execution in the following presentation.

3 THE PROPOSED SCHEMES

This paper presents a solution for outsourcing data that also checks the integrity of the data. First we give the notation and preliminaries, then present our main scheme and show how to extend it to support batch auditing for the TPA upon delegations from multiple users.

3.1 Notation

F — the data file to be outsourced, denoted as a sequence of n blocks m1, ..., mi, ..., mn ∈ Zp for some large prime p.
MAC(.) — message authentication code (MAC) function, defined as K × {0,1}* → {0,1}*, where K denotes the key space.
H(.), h(.) — cryptographic hash functions.

We now introduce some necessary cryptographic background for our proposed scheme.

Bilinear Map. Let G1, G2, and GT be multiplicative cyclic groups of prime order p. Let g1 and g2 be generators of G1 and G2, respectively. A bilinear map is a map e: G1 × G2 → GT such that for all u ∈ G1, v ∈ G2 and a, b ∈ Zp, e(u^a, v^b) = e(u, v)^(ab). This bilinearity implies that for any u1, u2 ∈ G1, v ∈ G2, e(u1·u2, v) = e(u1, v)·e(u2, v). Of course, there exists an efficiently computable algorithm for computing e, and the map should be nontrivial, i.e., e is nondegenerate: e(g1, g2) ≠ 1.

Definitions. We follow a similar definition to previously proposed schemes in the context of remote data integrity checking [9], [11], [13], and adapt the framework for our privacy-preserving public auditing system. This system consists of four algorithms:
1. KeyGen: a key generation algorithm that is run by the user to set up the scheme.
2. SigGen: it consists of digital
signatures.
3. GenProof: generates a proof of correctness of data storage.
4. VerifyProof: the TPA runs this to audit the proof.
The auditing is done in two phases:
Setup: The setup consists of KeyGen and SigGen, which preprocess the data file F and the verification metadata. The file F is then kept at the cloud server for later verification of the data, and the user deletes the local copy.
Audit: The TPA audits to make sure that the CS has retained the integrity of the data file F; it uses GenProof and VerifyProof on the response message. The TPA is stateless, i.e., it does not need to maintain and update state between audits [13]. We can easily extend the framework to capture a stateful auditing system, by verifying data in two parts which are stored by the TPA and the cloud server.
3.3 The Basic Schemes
First we study two classes of schemes. The first is a MAC-based solution, which suffers from undesirable systematic demerits. The second is a system based on homomorphic linear authenticators, which covers many recent proof-of-storage systems.
MAC-based solution. There are two possible ways to make use of a MAC to authenticate the data. A trivial way is just to upload the data blocks with their MACs to the server and send the corresponding secret key sk to the TPA. Before data outsourcing, the cloud user chooses s random message authentication code keys {sk_t}, 1 ≤ t ≤ s, precomputes s MACs {MAC_skt(F)}, 1 ≤ t ≤ s, and publishes these verification metadata (the keys and the MACs) to the TPA. The TPA can reveal a secret key sk_t to the cloud server and ask for a fresh keyed MAC for comparison in each audit. This is privacy-preserving as long as it is impossible to recover F in full given MAC_skt(F) and sk_t.
Basic Scheme 1:
[Figure: the file is divided into blocks]
Basic Scheme 2:
HLA-based solution. To effectively support public auditability without having to retrieve the data blocks themselves, the HLA technique [9], [13], [8] can be used. HLAs, like MACs, are unforgeable verification metadata that authenticate the integrity of a data block. The difference is that HLAs can be aggregated: it is possible to compute an aggregated HLA which authenticates a linear combination of the individual data blocks.
3.4 Privacy-Preserving Public Auditing Scheme
Overview. To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic linear authenticator with a random masking technique. In our protocol, the linear combination of sampled blocks in the server's response is masked with randomness generated by the server. With random masking, the TPA no longer has all the necessary information to build up a correct group of linear equations and therefore cannot derive the user's data content, no matter how many linear combinations of the same set of file blocks are collected. On the other hand, the correctness validation of the block-authenticator pairs can still be carried out in a new way, which will be shown shortly, even in the presence of the randomness. Our design makes use of a public-key-based HLA to equip the auditing protocol with public auditability. Specifically, we use the HLA proposed in [13], which is based on the short signature scheme proposed by Boneh, Lynn, and Shacham (hereinafter
referred to as the BLS signature).
Scheme details. Let G1, G2, and GT be multiplicative cyclic groups of prime order p, and e : G1 × G2 → GT be a bilinear map as introduced in the preliminaries. Let g be a generator of G2. H(·) is a secure map-to-point hash function {0,1}* → G1, which maps strings uniformly to G1. Another hash function h(·) : GT → Zp maps group elements of GT uniformly to Zp. The scheme is as follows:
Setup Phase: The cloud user runs KeyGen to generate the public and secret parameters. Specifically, the user chooses a random signing key pair (spk, ssk), a random element x ← Zp, a random element u ← G1, and computes v ← g^x. The secret parameter is sk = (x, ssk) and the public parameter is pk = (spk, v, g, u, e(u, v)).
Given a data file F = {m_i}, the user runs SigGen to compute an authenticator σ_i ← (H(W_i) · u^(m_i))^x ∈ G1 for each i. Here, W_i = name||i, and name is chosen by the user uniformly at random from Zp as the identifier of file F. Denote the set of authenticators by Φ = {σ_i}, 1 ≤ i ≤ n.
The last part of SigGen is for ensuring the integrity of the unique file identifier name. One simple way to do this is to compute t = name||SSig_ssk(name) as the file tag for F, where SSig_ssk(name) is the signature on name under the private key ssk. For simplicity, we assume the TPA knows the number of blocks n. The user then sends F along with the verification metadata (Φ, t) to the server and deletes them from local storage.
Audit Phase: The TPA first retrieves the file tag t. With respect to the mechanism we describe in the Setup phase, the TPA verifies the signature SSig_ssk(name) via spk, and quits by emitting FALSE if the verification fails. Otherwise, the TPA recovers name.
Now let us see the "core" part of the auditing process. To generate the challenge message for the audit "chal," the TPA
picks a random c-element subset I = {s1, ..., sc} of block indices. For each element i ∈ I, the TPA also chooses a random value ν_i. The message "chal" specifies the positions of the blocks required to be checked. The TPA sends chal = {(i, ν_i)}, i ∈ I, to the server.
Upon receiving the challenge chal = {(i, ν_i)}, i ∈ I, the server runs GenProof to generate a response proof of data storage correctness. Specifically, the server chooses a random element r ← Zp and calculates R = e(u, v)^r ∈ GT. Let μ' denote the linear combination of sampled blocks specified in chal: μ' = Σ_{i∈I} ν_i · m_i. To blind μ' with r, the server computes μ = r + γμ' mod p, where γ = h(R) ∈ Zp. Meanwhile, the server calculates an aggregate authenticator σ = Π_{i∈I} σ_i^(ν_i) ∈ G1. It then sends {μ, σ, R} as the response proof of storage correctness to the TPA. With the response, the TPA runs VerifyProof to validate it by first computing γ = h(R) and then checking the verification equation
R · e(σ^γ, g) = e((Π_{i∈I} H(W_i)^(ν_i))^γ · u^μ, v).
The correctness of the equation can be elaborated by substituting σ, μ, and R and applying the bilinearity of e.
Properties of our protocol. It is easy to see that our protocol achieves public auditability: there is no secret keying material or state for the TPA to keep or maintain between audits, and the auditing protocol does not pose any potential online burden on users. The approach ensures the privacy of user data content during the auditing process by employing the random mask r to hide the linear combination of the data blocks. Note that the value R in our protocol, which enables the privacy-preserving guarantee, will not affect the validity of the equation, due to the circular relationship between R and γ through γ = h(R) and the verification equation. Storage correctness thus follows from that of the underlying
protocol [13]. The security of this protocol will be formally proven in Section 4. Besides, the HLA helps achieve constant communication overhead for the server's response during the audit: the size of {μ, σ, R} is independent of the number of sampled blocks c.
Previous work [9], [8] showed that if the server is missing a fraction of the data, then the number of blocks that need to be checked in order to detect server misbehavior with high probability is on the order of O(1). In particular, if a fraction t of the data is corrupted, then random sampling of c blocks, each chosen uniformly at random, reaches a detection probability P = 1 − (1 − t)^c. When t = 1% of the data F is corrupted, the TPA only needs to audit c = 300 or 460 randomly chosen blocks of F to detect this misbehavior with probability larger than 95 and 99 percent, respectively. Given the huge volume of data outsourced to the cloud, checking a portion of the data file is more affordable and practical for both the TPA and the cloud server than checking all the data, as long as the sampling strategy provides high-probability assurance. In Section 4, we will present experiment results based on these sampling strategies.
For some cloud storage providers, it is possible that certain information dispersal algorithms (IDA) may be used to fragment and geographically distribute the user's outsourced data for increased availability. We note that these cloud-side operations would not affect the behavior of our proposed mechanism, as long as the IDA is systematic, i.e., it preserves the user's data in its original form after encoding with redundancy. This is because, from the user's perspective, as long as there is a complete yet unchanged copy of his outsourced data in the cloud, the precomputed verification metadata (Φ, t) will remain valid. As a result, those metadata can still be utilized in our auditing mechanism to guarantee the correctness of the user's outsourced cloud data.
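The sampling argument above is easy to check numerically: a server missing a fraction t of the blocks escapes detection only if all c uniformly sampled blocks happen to be intact, so the detection probability is P = 1 − (1 − t)^c. A minimal sketch (the function name is ours, not from the paper):

```python
# Probability that auditing c uniformly sampled blocks catches a server
# that has corrupted or lost a fraction t of the file: the audit misses
# the misbehavior only if every one of the c sampled blocks is intact.
def detection_probability(t: float, c: int) -> float:
    return 1.0 - (1.0 - t) ** c

# Reproducing the figures quoted in the text for t = 1% corruption:
assert detection_probability(0.01, 300) > 0.95   # ≈ 0.951
assert detection_probability(0.01, 460) > 0.99   # ≈ 0.990
```

This also shows why the cost is O(1) in the file size: c depends only on t and the target confidence, not on the number of blocks n.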
Storage and communication trade-off. As described above, each block is accompanied by an authenticator of equal size |p| bits. This gives about 2× storage overhead on the server. However, as noted in [13], we can introduce a parameter s in the authenticator construction to adjust this storage overhead, at the cost of communication overhead in the auditing protocol between the TPA and the cloud server. In particular, we assume each block m_i consists of s sectors {m_ij} with 1 ≤ j ≤ s, where u1, u2, ..., us are randomly chosen from G1.
VI. CONCLUSION
In this paper, we propose a privacy-preserving public auditing system for data storage security in cloud computing. We utilize the homomorphic linear authenticator and random masking to guarantee that the TPA does not learn any knowledge about the data content stored on the cloud server during the efficient auditing process, which not only relieves the cloud user of the tedious and possibly expensive auditing task, but also alleviates the users' fear of their outsourced data leaking. Considering that the TPA may concurrently handle multiple audit sessions from different users for their outsourced data files, we further extend our privacy-preserving public auditing protocol into a multiuser setting, where the TPA can perform multiple auditing tasks in a batch manner for better efficiency. Extensive analysis shows that our schemes are provably secure and highly efficient. Our preliminary experiment conducted on an Amazon EC2 instance further demonstrates the fast performance of our design on both the cloud and the auditor side. We leave the full-fledged implementation of the mechanism on a commercial public cloud as an important future extension, which is expected to cope robustly with very-large-scale data and thus encourage users to adopt cloud storage services more confidently.
REFERENCES
[1] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Storage Security in Cloud Computing," Proc. IEEE INFOCOM '10, Mar. 2010.
[2] P. Mell and T. Grance, "Draft NIST Working Definition of Cloud Computing," http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, June 2009.
[3] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," Technical Report UCB-EECS-2009-28, Univ. of California, Berkeley, Feb. 2009.
[4] Cloud Security Alliance, "Top Threats to Cloud Computing," http://www.cloudsecurityalliance.org, 2010.
[5] M. Arrington, "Gmail Disaster: Reports of Mass Email Deletions," http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/, 2006.
[6] J. Kincaid, "MediaMax/TheLinkup Closes Its Doors," http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.
[7] Amazon.com, "Amazon S3 Availability Event: July 20, 2008," http://status.aws.amazon.com/s3-20080720.html, July 2008.
[8] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, "Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 5, pp. 847-859, May 2011.
[9] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores," Proc. 14th ACM Conf. Computer and Comm. Security (CCS '07), pp. 598-609, 2007.
[10] M.A. Shah, R. Swaminathan, and M. Baker, "Privacy-Preserving Audit and Extraction of Digital Contents," Cryptology ePrint Archive, Report 2008/186, 2008.
[11] A. Juels and B.S. Kaliski Jr., "PORs: Proofs of Retrievability for Large Files," Proc. ACM Conf. Computer and Comm. Security (CCS '07), pp. 584-597, Oct. 2007.
[12] Cloud Security Alliance, "Security Guidance for Critical Areas of Focus in Cloud Computing," http://www.cloudsecurityalliance.org, 2009.
[13] H. Shacham and B. Waters, "Compact Proofs of Retrievability," Proc. Int'l Conf. Theory and Application of Cryptology and Information Security: Advances in Cryptology (Asiacrypt), vol. 5350, pp. 90-107, Dec. 2008.
[14] C. Wang, K. Ren, W. Lou, and J. Li, "Towards Publicly Auditable Secure Cloud Data Storage Services," IEEE Network Magazine, vol. 24, no. 4, pp. 19-24, July/Aug. 2010.
[15] M.A. Shah, M. Baker, J.C. Mogul, and R. Swaminathan, "Auditing to Keep Online Storage Services Honest," Proc. 11th USENIX Workshop Hot Topics in Operating Systems (HotOS '07), pp. 1-6, 2007.
[16] 104th United States Congress, "Health Insurance Portability and Accountability Act of 1996 (HIPAA)," http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996.
[17] R. Curtmola, O. Khan, and R. Burns, "Robust Remote Data Checking," Proc. Fourth ACM Int'l Workshop Storage Security and Survivability (StorageSS '08), pp. 63-68, 2008.
An Extraction Technique for Universal Distance Cache
Chotia Amit Nandkishor#1 Joshi Abhishek Arvind#2 Gosavi Darpan Vinod#3 Wagh Ganesh Vijay#4 Sir Visvesvaraya Institute of Technology, Nashik#1, 2, 3, 4 Bachelor of Engineering in Computer Engineering#1, 2, 3, 4 [email protected]#1 [email protected]#2 [email protected]#3 [email protected]#4
1. Abstract
Due to fast internet connections, a huge amount of data is uploaded to data repositories, so extracting exactly the right data is a challenging task today. Another important issue is that extraction degrades as distance increases. Servers always try to extract data that is near to them, but on the world wide web, when a user searches for data from a different location, e.g., the USA, there is a chance that the data will not be extracted properly.
Performance is also a big factor while extracting data. Today, the common remedies are to add disks so that loading is minimized, to use clustering, in which a number of clusters are formed to reduce the load, or to use RAID (Redundant Array of Independent Disks), configured according to the requirements of the server. All of these techniques, however, require more cost and more maintenance, which a small firm cannot sustain. Our project removes these dependencies and increases performance automatically.
Here, we develop a technique for which distance does not matter.
We introduce a new extraction approach with caching distances, called disk-based caches. The user enters a distance range search and finds the results. Here, we use a parsing technique, which can extract the results from the desired caches and distances.
2. Introduction
Classical database methods are designed to handle data objects that have some predefined structure. This structure is usually captured by treating the various attributes associated with the objects as independent dimensions, and then representing the objects as records. These records are stored in the database using some appropriate model (e.g., relational, object-oriented, object-relational, hierarchical, network, etc.). The most common queries on such data are exact match, partial match, range, and join, applied to some or all of the attributes. Responding to these queries involves retrieving the relevant data. The retrieval process is facilitated by building an index on the relevant attributes. These indexes are often based on treating the records as points in a multidimensional space and using what are called point access methods.
More recent applications involve data that has considerably less structure and whose specification is therefore less precise. Some example applications include collections of more complex data such as images, videos, audio recordings, text documents, time series, DNA sequences, etc. The problem is that usually the data can neither be ordered nor is it meaningful to perform equality comparisons on it. Instead, proximity is a more appropriate retrieval criterion. Such data objects are often described via a collection of features, and the result is called a feature vector. For example, in the case of image data, the feature vector might include color, color moments, textures, shape descriptors, etc., all of which are usually described using scalar values. In the case of
text documents, we might have one dimension per word, which leads to prohibitively high dimensions. Correcting misspelled text or searching for semantic equivalents is even more difficult. Video retrieval involves finding overlapping frames, which is somewhat like finding subsequences in DNA sequences. The goal in these applications is often one of the following:
1. Find objects whose feature values fall within a given range, or where the distance, using a suitably defined distance metric, from some query object falls into a certain range.
2. Find objects whose features have values similar to those of a given query object or set of query objects (nearest neighbor queries). In order to reduce the complexity of the search process, the precision of the required similarity can be an approximation.
3. Find pairs of objects from the same set or different sets which are sufficiently similar to each other (closest pairs queries).
3. Existing System
There are mainly two problems in the existing system:
A. Problem of deep extraction based on distance:
There are already many problems in extraction, such as language dependency, scripting dependency, and version dependency. Nowadays, many techniques have been released, such as page-level extraction, FiVaTech extraction, and vision-based extraction, with which efficient extraction can be done, but the main problem remains extraction based upon distance. Many times, we observe that we do not find the type of result we want.
Suppose there is a website in the USA for a courier service (e.g., FedEx). That courier company may also have branches in other countries, such as India, China, Russia, etc. Obviously, all branches may have relevant
websites in different locations. The problem is that when any user searches for a branch of that courier company in the search box, he finds only the main branch (USA), which is a problem in extraction: the extraction approach fails to find the nearer branch, i.e., distance has not been considered in the extraction tool.
Another example: suppose we want information about the "Java programming language." We type this keyword into a search box; what does the server then do? It tries to find content about the Java programming language, and here the concept of similarity is used: the server first matches the data and then extracts it. Here too, does distance matter? Which data should be presented, the nearer or the more distant data? Which data has sufficient information for the user? This type of problem exists in the existing system.
B. Problem of data retrieval based on complexity and web dependency:
Another problem is complexity. When data is uploaded from different locations, there is a chance of ever more complex data. If we consider a digital library website such as Google, Yahoo, or Wikipedia, there exists too much unwanted data: one link may occur many times. As all the links carry some information behind them, links that occur more than once take up more space. Hence, performance automatically decreases and response time increases, which does not make for good extraction. This type of problem exists in the existing system.
3.1 Disadvantages:
1. Only minor performance is available.
2. Low efficiency and low performance.
3. It can take more time for retrieval.
4. Computational cost is high.
5. More complexity.
3.2 Existing Algorithms
A. Basic Algorithms
In this section, we present some basic algorithms on high-dimensional index structures for index construction and maintenance in a dynamic environment, as well as for query processing. Although some of the algorithms were published using a specific indexing structure, they are presented here in a more general way.
3.2.1 Insert, Delete and Update
Insert, delete, and update are the operations which are most specific to the corresponding index structures. Despite that, there are basic algorithms capturing all actions which are common to all index structures. In the GiST framework, the build-up of the tree via the insert operation is handled using three basic operations: Union, Penalty, and PickSplit. The Union operation consolidates information in the tree and returns a new key which is true for all data items in the considered subtree. The Penalty operation is used to find the best path for inserting a new data item into the tree, by providing a number representing how bad an insertion into that path would be. The PickSplit operation is used to split a data page in case of an overflow. The insert and delete operations of tree structures are usually the most critical operations, which heavily determine the structure of the resulting index and the achievable performance.
B. Exact Match Query
Exact match queries are defined as follows: given a query point q, determine whether q is contained in the database or not. Query processing starts with the root node, which is loaded into main memory. For all regions containing point q, the function ExactMatchQuery() is called recursively. As overlap between page regions is allowed in most index structures presented in this chapter, it is possible that several branches of the indexing structure have to be examined for processing an exact match query.
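The recursive descent just described can be sketched as follows. This is an illustration only: the node layout, field names, and box-shaped regions are our assumptions, standing in for GiST's page structure and its Consistent predicate.

```python
# Sketch of exact-match search in a GiST-like tree where page regions may
# overlap, so every region containing the query point must be examined.
# Node layout is hypothetical: inner nodes hold (region, child) pairs,
# data pages hold points; a region is an axis-aligned box (lo, hi).

def contains(region, q):
    lo, hi = region
    return all(l <= x <= h for l, x, h in zip(lo, q, hi))

def exact_match_query(node, q):
    if node["is_leaf"]:
        # Data page: true if one of the stored points fits.
        return q in node["points"]
    # Inner node: recurse into all subtrees whose region contains q
    # (the role of GiST's Consistent operation); true if any call is true.
    return any(exact_match_query(child, q)
               for region, child in node["entries"] if contains(region, q))

# Tiny example with two overlapping leaf regions:
leaf1 = {"is_leaf": True, "points": [(1, 1), (2, 3)]}
leaf2 = {"is_leaf": True, "points": [(2, 2), (4, 4)]}
root = {"is_leaf": False,
        "entries": [(((0, 0), (3, 3)), leaf1), (((1, 1), (5, 5)), leaf2)]}
assert exact_match_query(root, (2, 2)) is True   # found via second branch
assert exact_match_query(root, (3, 1)) is False
```

Note that (2, 2) lies in both regions but is stored only on the second page, so the first branch is searched in vain; this is exactly the overlap cost the text describes.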
In the GiST framework, this situation is handled using the Consistent operation, which is the generic operation that needs to be implemented for different instantiations of the generalized search tree. The result of ExactMatchQuery is true if any of the recursive calls returns true; for data pages, the result is true if one of the points stored on the data page fits. Underflow conditions can generally be handled by different actions:
• Balancing pages by moving objects from one page to another
• Merging pages
4. Proposed System
We introduce a new extraction approach with caching distances, based on a new database technique called disk-based caches. It does not search all the caches: the user enters a distance range search, and the results are found and displayed by searching the data in a limited number of caches. For searching in this limited number of caches, we use a parsing technique. It extracts the results from the desired caches and from useful distances; caches at unnecessary distance locations are not searched. It yields a smaller number of results, with low indexing cost. The D-cache reduces implementation cost, provides efficient indexing of the output, and gives faster extraction results.
4.1 Advantages
1. It avoids distance computation problems.
2. It enhances query performance.
3. It allows batch insertion of queries in cache servers.
4. It speeds up query result display.
4.2 Applications
1. Dynamic web
2. Digital libraries
3. Large files
4. Pattern recognition
5. Universal Distance Cache
The D-cache is a technique/tool for general metric access methods that helps to reduce the cost of both indexing and querying. The main task of the D-cache is to determine tight lower and upper bounds of an unknown distance between two objects. The desired functionality of the D-cache is twofold.
First, given a runtime-object/database-object pair (r, o), the D-cache should quickly determine the exact value δ(r, o) in case the distance is stored in the D-cache. However, as the exact value can only be found when the actual distance was already computed previously in the session, this functionality is limited to rather special cases, like rendering of data objects (or index rearrangements), repeated queries, or querying by database objects.
The second functionality, which is the main D-cache contribution, is more general. Given a runtime object r and a database object o on input, the D-cache should quickly determine the tightest possible lower or upper bound of δ(r, o) without the need for an explicit distance computation. This cheap determination of lower/upper-bound distances then serves a MAM in order to filter out a nonrelevant database object, or even a whole part of the index.
6. Metric Access Methods
First, we have to understand metric access methods: techniques used in situations where similarity searching can be applied, e.g., a search for SBI that can search the entire country, i.e., a similar search has been invoked. Let us first understand the concept of similarity searching. When any user submits a query in the search box or to any database, the process of responding to these queries is termed similarity searching. Given a query object q, this involves finding objects that are similar to q in
a database S of N objects, based on some similarity measure. Both q and the objects of S are drawn from some "universe" U of objects, but q is generally not in S. We assume that the similarity measure can be expressed as a distance metric d, such that d(o1, o2) becomes smaller as o1 and o2 become more similar; thus (S, d) is said to be a finite metric space.
Now, a metric access method will facilitate the retrieval process by building an index on the various features, which are analogous to attributes. These indexes are based on treating the records as points in a multidimensional space and using point access methods.
Metric access methods can use a structure for caching distances computed during the current runtime session. The distance cache ought to be an analogy to the classic disk cache widely used in DBMSs to optimize I/O cost: instead of sparing I/O, the distance cache should spare distance computations. Because main memory is always limited and the distance matrix could expand to an enormous size, we need to choose a compact data structure that consumes a user-defined portion of main memory. In order to also provide fast retrieval, the D-cache implements the distance matrix as a linear hash table consisting of entries; the hash key (pointing to a position in the hash table) is derived from the two ids of the objects whose distance is being retrieved or stored. In addition, there is a constant-size collision interval defined, which allows moving from the hashed position to a more suitable one. However, in order to keep the D-cache as fast as possible, the collision interval should be very small, preferably just one position in the hash table (i.e., only the hashed position).
7. Project Modules
There are mainly five modules:
7.1 Apply the Concept of D-Cache:
Any user can forward any type of distance-based query; this starts the searching process and creates the runtime object and database object. The session time and index of each result object are calculated here for the particular distance-based query. When another user forwards the same query, the results are extracted from the previously computed distances, and the index value increases automatically. This is the procedure of the D-cache: it starts the searching process and quickly displays the results. It calculates the lower bound and upper bound, and the results for the nearest locations are displayed as the final results. It returns only relevant distance-based cache results in the output.
7.2 Selection of Dynamic Pivots:
This module takes the output of the first module, i.e., the preprocessed or indexed data, as its input. The similarity search operations are performed on this data only. It automatically performs the dynamic pivot calculation and displays the final results in the output. It is very cheap for extraction of results and provides the results as minimized content.
7.3 Filtering the Data:
This module starts the searching process based on a radius. It searches the data within the region, and can search in all dimensions. It displays the results after collecting the multidimensional objects, and gives exact and accurate results in the output display.
7.4 Approximate Similarity Search:
This module performs an approximate similarity search. It saves cost during extraction of the results, while still retrieving exact results. It is a good incremental search without lower- and upper-bound distances, and a good hierarchy-related search mechanism.
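The lower/upper bounds mentioned in module 7.1 come from the triangle inequality: for any pivot p whose distances d(r, p) and d(p, o) are cached, |d(r, p) − d(p, o)| ≤ d(r, o) ≤ d(r, p) + d(p, o). A minimal sketch of this bounding step (function names are ours, loosely mirroring GetLowerBoundDistance/GetUpperBoundDistance):

```python
# Bounding an unknown distance d(r, o) from cached pivot distances via
# the triangle inequality: for every pivot p with d(r, p) and d(p, o)
# known,  |d(r,p) - d(p,o)| <= d(r,o) <= d(r,p) + d(p,o).
# The tightest bounds over all shared pivots are returned.

def lower_bound(d_r_p: dict, d_p_o: dict) -> float:
    """Greatest lower bound of d(r, o) over all shared pivots."""
    common = d_r_p.keys() & d_p_o.keys()
    return max((abs(d_r_p[p] - d_p_o[p]) for p in common), default=0.0)

def upper_bound(d_r_p: dict, d_p_o: dict) -> float:
    """Lowest upper bound of d(r, o) over all shared pivots."""
    common = d_r_p.keys() & d_p_o.keys()
    return min((d_r_p[p] + d_p_o[p] for p in common), default=float("inf"))

# Example in 1-D with metric d(x, y) = |x - y|: r = 0, o = 7,
# pivots p1 = 10 and p2 = 3 (ids map to precomputed distances).
d_r_p = {"p1": 10.0, "p2": 3.0}   # d(r, p) for each pivot p
d_p_o = {"p1": 3.0, "p2": 4.0}    # d(p, o) for each pivot p
lb, ub = lower_bound(d_r_p, d_p_o), upper_bound(d_r_p, d_p_o)
assert lb <= 7.0 <= ub            # true distance d(r, o) = 7
```

A MAM can use such bounds to discard a database object o whenever the lower bound already exceeds the query radius, with no explicit distance computation at all.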
7.5 Performance Upgradation:
The D-cache gives faster results, has fewer overhead problems, and reduces the distance computations, providing the results less expensively.
8. Algorithms Used
The D-cache is initialized by a MAM when loading the index (the session begins). Besides the initialization, the D-cache is also notified by a MAM whenever a new query/insertion is to be started (the MAM calls the method StartRuntimeProcessing on the D-cache). At that moment, a new runtime object r is announced to be processed, which also includes the computation of the distances from r to the k actual dynamic pivots. The following algorithms are used:
8.1 Distance Retrieval
The main D-cache functionality is operated by the methods GetDistance and GetLowerBoundDistance. The number of dynamic pivots (k = |DP|) used to evaluate GetLowerBoundDistance is set by the user; this parameter is an exact analogy to the number of pivots used by pivot tables, e.g., LAESA. There exists no general rule for the automatic determination of the number of pivots, especially when minimizing the real-time cost rather than just the number of distance computations. In general, the effective number of pivots depends on the (expected) size of the database, its intrinsic dimensionality (see Section 6.1.1), the computational complexity of the used metric, the pivot set quality itself, etc. The same reasons apply also to the D-cache.
8.2 Distance Insertion
Every time a distance δ(r, o) is computed by the MAM, the triplet (id(r), id(o), δ(r, o)) is inserted into the D-cache (the MAM calls the method InsertDistance on the D-cache). Since the storage capacity of the D-cache is limited, at some moment the collision interval in the hash table for a newly inserted distance entry is full. Then, some
older entry within the collision interval has to be replaced by the new entry. Or, alternatively, if it turns out that the newly inserted distance is less useful than all the distances in the collision interval, the insertion of the new distance is canceled. Note that we should prioritize replacing entries where neither of the objects oid1, oid2 still belongs to the current set of k dynamic pivots.

9. Future Enhancements
The D-cache supports three functions useful for metric access methods (MAMs): GetDistance (returning the exact distance between two objects, if available), GetLowerBoundDistance (returning the greatest lower-bound distance between two objects, by means of the dynamic pivots), and GetUpperBoundDistance (returning the lowest upper-bound distance). With these functions, the D-cache may be used to improve the construction of MAMs' index structures and the performance of similarity queries.

10. Conclusions
In this paper we presented the D-cache, a main-memory data structure which tracks computed distances while inserting objects or performing similarity queries in the metric space model. Since distance computations stored in the D-cache may be reused in further database operations, it is not necessary to compute them again. Also, the D-cache can be used to estimate distances between new objects and objects stored in the database, which can likewise avoid expensive distance computations. The D-cache aims to amortize the number of distance computations spent on querying/updating the database, much as disk-page buffering in traditional DBMSs aims to amortize the I/O cost. The D-cache structure is based on a hash table, making it efficient to retrieve stored distances for further use.
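The retrieval and insertion behaviour of Sections 8.1 and 8.2, including replacement inside a full collision interval, can be sketched as a toy model. The bucket count, bucket size, and replace-the-oldest policy below are simplifying assumptions, not the paper's exact heuristics.

```python
class DCache:
    """Toy model of the D-cache: a fixed-capacity hash table keyed by
    unordered object-id pairs, with replacement in a full collision interval."""
    def __init__(self, buckets=64, bucket_size=4):
        self.table = [[] for _ in range(buckets)]
        self.bucket_size = bucket_size

    def _bucket(self, id1, id2):
        # One bucket plays the role of a collision interval.
        return self.table[hash((min(id1, id2), max(id1, id2))) % len(self.table)]

    def insert_distance(self, id1, id2, d):
        bucket = self._bucket(id1, id2)
        entry = ((min(id1, id2), max(id1, id2)), d)
        if len(bucket) < self.bucket_size:
            bucket.append(entry)
        else:
            bucket[0] = entry      # interval full: replace an older entry

    def get_distance(self, id1, id2):
        key = (min(id1, id2), max(id1, id2))
        for k, d in self._bucket(id1, id2):
            if k == key:
                return d           # exact distance available, no recomputation
        return None

cache = DCache()
cache.insert_distance(7, 42, 0.25)
print(cache.get_distance(42, 7))   # 0.25
```

A MAM would call insert_distance after every real distance computation and try get_distance first, computing the metric only on a miss.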
REFERENCES

[1] J.S. Vitter, "External Memory Algorithms and Data Structures: Dealing with Massive Data," ACM Computing Surveys, vol. 33, no. 2, pp. 209-271, citeseer.ist.psu.edu/vitter01external.html, 2001.
[2] C. Böhm, S. Berchtold, and D. Keim, "Searching in High Dimensional Spaces—Index Structures for Improving the Performance of Multimedia Databases," ACM Computing Surveys, vol. 33, no. 3, pp. 322-373, 2001.
[3] S.D. Carson, "A System for Adaptive Disk Rearrangement," Software—Practice and Experience, vol. 20, no. 3, pp. 225-242, 1990.
[4] W. Effelsberg and T. Haerder, "Principles of Database Buffer Management," ACM Trans. Database Systems, vol. 9, no. 4, pp. 560-595, 1984.
[5] M. Batko, D. Novak, F. Falchi, and P. Zezula, "Scalability Comparison of Peer-to-Peer Similarity Search Structures," Future Generation Computer Systems, vol. 24, no. 8, pp. 834-848, 2008.
[6] P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer, 2005.
[7] E. Chávez, G. Navarro, R. Baeza-Yates, and J.L. Marroquín, "Searching in Metric Spaces," ACM Computing Surveys, vol. 33, no. 3, pp. 273-321, 2001.
[8] G.R. Hjaltason and H. Samet, "Index-Driven Similarity Search in Metric Spaces," ACM Trans. Database Systems, vol. 28, no. 4, pp. 517-580, 2003.
[9] H. Samet, Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
[10] T. Skopal and B. Bustos, "On Index-Free Similarity Search in Metric Spaces," Proc. 20th Int'l Conf. Database and Expert Systems Applications (DEXA '09), pp. 516-531, 2009.
[11] E. Vidal, "New Formulation and Improvements of the Nearest-Neighbour Approximating and Eliminating Search Algorithm
(AESA)," Pattern Recognition Letters, vol. 15, no. 1, pp. 1-7, 1994.
[12] M.L. Micó, J. Oncina, and E. Vidal, "An Algorithm for Finding Nearest Neighbours in Constant Average Time with a Linear Space Complexity," Proc. Int'l Conf. Pattern Recognition, 1992.
[13] M.L. Micó, J. Oncina, and R.C. Carrasco, "A Fast Branch & Bound Nearest-Neighbour Classifier in Metric Spaces," Pattern Recognition Letters, vol. 17, no. 7, pp. 731-739, 1996.
Safety Management of Construction Workers
Vivek K. Kulkarni¹, Professor R. V. Devalkar²
Dept. of Civil Engineering, N.D.M.V.P.S K.B.G.T., C.O.E., Nasik
Abstract
Construction is the most dangerous land-based work sector in the world. Accidents at workplaces are due to a sequence of events, which may be physical (hazardous situations) or incidents due to the behaviour of workers committing unsafe acts. Construction accidents can be reduced by taking some preventive precautions. Very little attention is given to managerial, organizational & human factors; thus, organizations lack an approach to developing & enforcing the effective performance measures, metrics, or models needed to achieve an efficient safety management system. To avoid such accidents, safety management of construction works should be the utmost priority of the concerned organization.

Keywords— DFS, GDP, Hagan, Bureau of Labor, prime

I. INTRODUCTION
Construction Safety Management consists of three phases:
1. Planning and Preparation Phase
2. Identification and Assessment Phase
3. Execution and Improvement Phase
A safety management system is the basis of safety performance and provides a powerful means for controlling & monitoring performance. Safety is the best approach to doing business: it maximizes the competitiveness of an organization through continual improvement of its products, services, people & environment by concentrating on customer
focus, commitment & solid team work. Providing a safe workplace is an important aspect of the system, and safety should be included as one of the strategic objectives of the construction work. The construction industry stands out from other employments as having one of the highest worker injury and fatality rates. Construction comprises a very small percentage of the overall workforce, yet its incidence rate for non-fatal injuries and illnesses exceeds that of many other industries, and it has the most fatalities of any industry sector (Bureau of Labor Statistics, 2004). Some studies have shown that a fairly large percentage of construction accidents could have been eliminated, reduced, or avoided by making better choices in the design and planning stages of a project (Hecker 2005). The problem is not that the hazards and risks are unknown; it is that they are very difficult to control in a constantly changing work environment. Construction safety (the intermediate phase between a finished design and a completed building) is the responsibility of the contractors and other site professionals. The success of a project depends on the planning and decisions that are made on site. Most construction accidents result from basic causes such as lack of proper training, deficient enforcement of safety, unsafe equipment, unsafe methods or sequencing, unsafe site conditions, not using the safety equipment that was provided, and a poor attitude towards safety (Toole, 2002). Often the role of the various contractors is unclear, as some contractors may try to transfer responsibility for safety to others. The most common construction project arrangement is that of general (prime) contractor/subcontractor. Before any excavation takes place, the contractor is responsible for notifying all applicable companies that excavation work is being performed; locating utilities is a must before breaking ground. During excavation, the
contractor is responsible for providing a safe work environment for employees and pedestrians. Access and egress are also an important part of excavation safety: ramps used by equipment must be designed by a competent person qualified in structural design.

II. Overview of Safety
Designing for Safety (DFS) is the process that incorporates hazard analysis at the beginning of a design (Hagan). This process starts with identifying the hazard(s). Engineering measures are then applied to eliminate the hazard(s) or reduce the risk. The hierarchy of design measures starts with eliminating the hazard(s) by engineering design. If the hazard(s) cannot be eliminated by engineering design, then safety device(s) are incorporated. If the risk of injury cannot be eliminated by engineering design or reduced by incorporating a safety device, then warnings, instruction, and training are the last resort. This process has been applied to the design of products, equipment, machines, facilities, buildings, and job tasks. Manufacture, assembly, and maintenance are considered during the design process.

Safety has a high degree of uncertainty. The four main factors of uncertainty are:
(1) Inherent Variability
(2) Estimation Error
(3) Model Imperfection
(4) Human Error
The sub-factor of Inherent Variability is the randomness in the characteristics of the workplace & the environment to which the workplace is exposed. The sub-factors of Estimation Error are incomplete statistical data and inaccurate estimates of the parameters of the probability models of Inherent Variability. The sub-factors of Model Imperfection arise from the application of an ideal model to
describe safety; the ideal model suffers from our ignorance of & inability to fully understand the phenomena behind safety measures. The sub-factors of Human Error are the uncertainties due to human errors: errors in design, construction & operations.

III. Need for Safety Management
Safety & productivity are co-related. It is a false belief in the construction industry that safety management is an additional expense that hinders productivity. Safety management strategies improve productivity through the curtailment of delays & distractions, enhanced team work, clean & orderly worksites & improved ergonomics. Construction workers are already at a higher risk of accidents than in any other industry, and the large influx of workers from Eastern European countries presents considerable additional challenges to employers' efforts to manage health and safety. In India the construction industry is the second largest employer next to agriculture, while in accidents it is second only to road accidents in our country. The annual turnover of the construction industry in India is about 4000 Billion Rupees, more than 6% of the national GDP, and it employs a large work force. While dealing with safety management, three main questions arise:
(1) What types of difficulties arise on the projects?
(2) To what extent do safety issues affect the project?
(3) What measures can be taken to improve the safety performance of a construction project?
Some of the basic objectives behind taking up this study are as follows:
1) To conduct a literature survey: to study various papers and literature on construction safety measures & the different situations in which measures are or are not followed, & to develop models for the safety climate.
2) To study the factors: to study the main factors which play an important part in safety measures & the allocation of those factors as per the perspectives of contractors & workers.
3) To examine safety management implementation, i.e., the efficiency with which it is applied in the industry.
4) To provide practical suggestions & recommendations towards safety management, to be included in the system of construction & to enhance performance, with a questionnaire.

IV. Methodology
This topic is very vast, with different types of problems at different scales of industry to deal with and to draw conclusions from. Health and safety must be taken extremely seriously at every level of the company, from the board of directors right through to the site teams and operatives. We are always striving to find new methods and initiatives in which the workforce can be involved while working towards a safe site environment, as well as reviewing and improving existing systems. Taking this into consideration, we will be working in the following manner:
(i) The questionnaire will be sent to large-scale construction firms.
(ii) A stratified sampling method will be used to select the sample of construction firms.
(iii) A random sampling method will be used for construction firms that are not ready to provide responses to the questionnaire given to them.
(iv) The primary data will be collected mainly from the construction labourers with the help of interviews.
(v) The conclusions will identify the different reasons for not following safety measures & suggest remedial procedures for following mandatory safety measures.

Conclusion
While concluding on this important topic of construction safety, I would like to remind you all that each site poses its own unique challenges in terms of industrial safety requirements, which have to be tackled with sincerity and professionalism. Modern management and machinery are helpful in achieving these objectives when used in a disciplined way.

References:-
(1) Sherif Mohamed, "Scorecard Approach to Benchmarking Organizational Safety Culture in Construction", Journal of Construction Engineering & Management, Vol-129, No-5, 2003, pp. 80-88.
(2) Enno Koehn, P.E., and Nirmal K. Datta, "Quality, Environmental, Health and Safety Management Systems for Construction Engineering", Journal of Construction Engineering & Management, Vol-129, No-5, October 2003, pp. 562-569.
(3) Frank Gross and Paul P. Jovanis, "Current State of Highway Safety Education: Safety Course.
(4) Journal of Structural Engineering, Vol-115, No-5, May 1989, pp. 1119-1140.
(5) Ove Ditlevsen, "Fundamental Postulate in Structural Safety", Journal of Engineering Mechanics, Vol-109, No-4, August 1984, pp. 1096-1102.
(6) Yiquan Chen, Sebastian Tan and Samuel Lim, "Singapore Work Place Safety & Health Research Agenda: Research-To-Practice", Journal of Safety, Health & Environmental Research, Vol-8, No-1, 2012, pp. 27-32.
IMPROVING STARTUP TIME AND PROVIDING SECURITY TO SNAPSHOTS ON LINUX PLATFORM
Sheetal R. Tambe, Monika Shinde, Rohini Hire, Shanku Mandal, Rokade S. M.
Department of Computer Engg., Sir Visvesvaraya Institute of Technology, Nashik.
Abstract: This paper provides an entirely new mechanism for traditional shutdown and boot that improves boot time and provides a user session more quickly. Generally, during shutdown all user-space applications are closed first, before the kernel threads and services are closed; finally, for a complete shutdown, all the devices prepare themselves. Resuming from a complete shutdown takes more time. Therefore, an enhanced technique is presented in this paper, in which closing the user session is followed by hibernation of the kernel session. Full hibernation includes many memory pages in use by user applications; in the proposed hibernation the data is much smaller, which takes substantially less time to write to disk. As a result, a fresh user session can be conducted more quickly than with the traditional mechanism.

Keywords: freeze, thaw, system image, kernel, boot kernel, target kernel, snapshot image, hibernation security, encryption on hibernated image.

1. Introduction
Nowadays it is common to have a suspend option on laptops. The suspend option saves the state of the machine to a file system or to a partition and switches to standby mode. The machine can continue its work by resuming from the saved state, which is loaded back into RAM. This has two benefits. The first is that we save ourselves the
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 time machine goes down and encryption mechanism for then boots up, when running hibernated snapshot image. from batteries the energy costs Hence, to overcome this are really high. The other gain problem we are providing is that the processes that are security i.e. we are going to add calculating something for a long an authentication test before time shouldn’t need to be accessing the snapshot image. written interruptible as we The system is so designed that don’t have to interrupt our it checks for the password programs. The technique to be before accessing the snapshot implemented saves the state of images. The level of security for the machine into active swaps the snapshots will be and then reboots or power maintained by using encryption downs. The user must explicitly algorithm to encrypt the image. specify the swap partition to To achieve this some changes in resume with “resume=” kernel the kernel level coding is firstly option. The saved state is required. The system loaded and restored if the implementation requires signature is found. The complete understanding of the resuming is skipped, if the entire flow of the system i.e. option “noresume” is specified how actually hibernation is as a boot parameter. The carried out in the system. The hibernation image is saved area is to be found out on which without compression if the the hibernated snapshot image option gets loaded on disk. The “hibernate=nocompress” is security at a higher level is specified as a boot parameter. achieved by adding encryption The existing hibernation algorithm. process has a problem that it The concept of power does not provide any security to management is implemented in the hibernated snapshot various different ways in images. Secondly, there is no computers viz. suspend, stand-by
and hibernate. Standby is sometimes referred to as power-on suspend, a low-latency power state. In the standby state, power is conserved by placing the CPU in a halt state and the devices in the D1 state (a class-specific low-power state). The response latency is minimal, typically less than 1 second, but the power savings are not significant. Suspend is also commonly known as suspend-to-RAM. In the suspend state, all devices are placed in the D3 state (the state in which a device is off and not running) and the entire system other than main memory is expected to maintain power. The memory content is not lost, as memory is placed in self-refresh mode. The response latency of suspend is higher than that of standby, yet still very low, between 3 and 5 seconds. The most power is conserved by hibernate, which turns off the entire system after saving its state to a persistent medium, usually a disk. All devices in the system are powered off unconditionally. The response latency is the highest (about 30 seconds) but still quicker than performing a full boot sequence.

In this paper we propose a technique which provides security to snapshot images at the kernel level. We will ask for a password for the corresponding image on hibernation, and then the user session will be logged off. During hibernation, the image of the kernel space will be created and stored on non-volatile disk. The proposed hibernation data is much smaller, so it takes substantially less time to write to disk compared to a full hibernate, which includes many memory pages in use by user applications. Finally, a suitable encryption algorithm is applied to provide high security.

Objectives
The main idea of the paper is to provide kernel-level security for the hibernation state and to reduce the time required for booting after a traditional shutdown. We also take into consideration resuming the previous state of the system
which has already been done using hibernation in the existing system. While resuming the hibernation image, the system will authenticate the user with the password before conducting a new user session. The future scope of our project can be extended to take a backup of the snapshot images of hibernated states. Further, higher security can be achieved by using a higher-level, more complex encryption algorithm.

2. Literature Survey
2.1 Power Management in Linux
Power management in a device is defined as the process by which the overall consumption of power is limited based on user requirements and the policy of a computer. As laptops and mobile phones have become more commonplace, power management has become a hot topic in the computer world in recent years, and users have become more conscious of the environmental and financial effects of limited power resources. Over the years, many advances in system and software architectures have been made to conserve the amount of energy being used. The power management that the OS must handle is of two types: System Power Management and Device Power Management. System Power Management deals with the state which the system as a whole governs; it includes shutting the system down as well as booting it up and bringing it back into a usable state. With the implementation of power management in Linux, it is also possible for the system to enter a low-power state, which ultimately saves power. Device Power Management deals with putting individual devices into a proper state as directed by user events; it describes the state in which a particular device is working. This module can put devices into the OFF or ON state as well as other power-saving states. In Linux, the Advanced Configuration and
Power Interface (ACPI) is the interface in which events are mapped to appropriate actions. An industry-standard interface for power management in laptops, embedded devices or desktops is established in an open-industry specification. Several mechanisms like suspend-to-disk, suspend-to-RAM, standby or shutdown are supported by this implementation. A state is defined for the system and for each of the devices in each mechanism. The most obvious area of benefit is conservation of power, and hence longer battery life, where power management of devices like laptops or embedded devices is concerned. Boot time is another important area of application for power management. To provide a more efficient and faster booting process, several mechanisms have been proposed in the Linux kernel or related patches. It is much desired at user level to get the user screen in less time, known as faster booting. The boot time of a system is reduced by restoring a previously saved state instead of reinitializing the entire system in System Power Management. Battery life in mobile devices is conserved, and the annoying wait for the computer to boot into a usable state is reduced.

2.2 Advanced Configuration and Power Interface (ACPI)
The Advanced Configuration and Power Interface (ACPI) specification was mainly developed to establish industry-common interfaces enabling robust operating system (OS)-directed motherboard device configuration and power management of both devices and entire systems. The key element in Operating System-directed configuration and Power Management (OSPM) is ACPI. All classes of computers, including (but not limited to) desktop, mobile, workstation, and server machines, are suitably specified within the concepts of the interfaces and OSPM. The
concept that the system should conserve energy by transitioning unused devices into low-power states, including placing the entire system in a low-power (sleeping) state whenever possible, is promoted by OSPM/ACPI from a power-management perspective. For instance, if a laptop's battery power is at a critical level, ACPI will detect the event and put the laptop into a low-power state like suspend-to-RAM or suspend-to-disk.

2.3 How ACPI Works
The ACPI subsystem is implemented on devices with the help of a background process called ACPID (the ACPI daemon). User-space programs are notified of ACPI events by ACPID; the power management of the system is controlled, and the rules defined against the events are executed as they occur. /etc/acpi/events is the default rule configuration location for ACPID. The default rule configuration comes with a number of predefined actions for triggered events, such as what happens if the power button is pressed or the laptop lid is closed. /etc/acpi/handler.sh is a shell script containing the action to be executed for each of the events for which actions are defined. The command acpi_listen is used to see how events are detected by ACPID: it is run in a shell, and when a certain power event occurs it displays the tokens generated after listening on the ACPI port. For instance, if the power button is pressed, the tokens generated are power/button PBTN 0000000000000b31. The parameters passed to handler.sh are the generated tokens, which are matched against the appropriate case, and the corresponding action is executed. By creating an event file and a corresponding action script, the user can define his/her own events and rules.

2.4 Suspend-to-RAM (Sleep)
The Suspend-to-RAM state offers significant power savings. In this state everything in the system is put into a low-power state, except for memory,
which is placed in self-refresh mode to retain its contents. The system and device state is saved and stored in memory.

2.5 Suspend-to-Disk (Hibernation)
At a higher level, ACPI is responsible for the execution of the suspend-to-disk operation; the kernel then implements the actions to take over, suspending the system and saving the system state before shutting down. The suspend-to-disk operation is composed of various steps, where a different part of swsusp performs each step. The actions performed by the kernel are as follows:
1. Non-boot CPUs are taken off-line.
2. Tasks are frozen.
3. Some memory is released, if necessary.
4. Devices are frozen.
5. An atomic copy of memory (aka the suspend image) is created.
6. Devices are woken up.
7. The suspend image is written to a swap partition.
8. The system is powered off.
However, if, for example, one of the devices refuses to freeze, then we need to wake up all already-frozen devices, thaw processes, and enable non-boot CPUs. By booting with the 'resume=<swap partition>' command-line parameter, the kernel-driven resume procedure may be started (where <swap partition> is the one the suspend image was written to in step 7). Then, the following actions are performed:
1. The suspend image is read into RAM.
2. Devices are prepared to resume.
3. System memory state is restored from the suspend image.
4. Devices are woken up.
5. Tasks are thawed.
6. Non-boot CPUs are enabled.
Freezing and thawing tasks, snapshotting memory and restoring its state, and saving and loading the kernel image are the most important steps involved here. The hibernation process is initiated from the function hibernate() in "hibernate.c".
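The password protection this paper proposes would wrap the suspend image written in step 7 above. The idea can be sketched in user-space Python; this is an illustration only, since the real implementation would live in kernel C code, and the PBKDF2 parameters, the salt/nonce/tag blob layout, and the SHA-256 counter-mode keystream (standing in for a real cipher such as AES) are all assumptions of the sketch.

```python
import hashlib, hmac, os

def keystream(key, nonce, length):
    """Toy keystream from SHA-256 in counter mode (illustration only)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal_snapshot(image, password):
    """Encrypt a snapshot image under a password-derived key and append
    an HMAC tag, so resume can authenticate before restoring."""
    salt, nonce = os.urandom(16), os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=64)
    enc_key, mac_key = key[:32], key[32:]
    cipher = bytes(a ^ b for a, b in zip(image, keystream(enc_key, nonce, len(image))))
    tag = hmac.new(mac_key, nonce + cipher, hashlib.sha256).digest()
    return salt + nonce + tag + cipher

def open_snapshot(blob, password):
    """Authentication test first; return the image, or None on a bad password."""
    salt, nonce, tag, cipher = blob[:16], blob[16:32], blob[32:64], blob[64:]
    key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000, dklen=64)
    enc_key, mac_key = key[:32], key[32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + cipher, hashlib.sha256).digest()):
        return None                      # wrong password: resume is refused
    return bytes(a ^ b for a, b in zip(cipher, keystream(enc_key, nonce, len(cipher))))

blob = seal_snapshot(b"kernel snapshot pages", b"hunter2")
assert open_snapshot(blob, b"hunter2") == b"kernel snapshot pages"
assert open_snapshot(blob, b"wrong") is None
```

Checking the authentication tag before decrypting mirrors the paper's requirement that the snapshot image be inaccessible without the password.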
2. Related Work
2.1 Architectural Design
The design of the proposed system (Improving Startup Time and Providing Security to Hibernated Snapshot Images) is as follows. When a shutdown or restart command is invoked, the entire code runs within the kernel. The working of the system design starts with the user giving the shutdown command, which in turn calls the proposed module. The following figures depict the sequential processes involved in the implementation.

The memory bitmap data structure is used for storing the snapshot images: it maps the memory pages that are to be saved and those that are forbidden. Basically, the pages that are to be included in the system image are indicated in the bitmap. A memory bitmap is a structure consisting of many linked lists of objects. The main elements of the list are of type struct zone_bitmap, and each of them corresponds to one zone. Each zone bitmap object consists of a list of objects of type struct bm_block, in each of which a block of bitmap information is stored. struct memory_bitmap contains a pointer to the main list of zone bitmap objects; a struct bm_position is used for browsing the bitmap, and a pointer to the list of pages is used for allocating all of the zone bitmap objects and bitmap block objects. A pointer to the list of bitmap block objects and a pointer to the bitmap block object most recently used for setting bits are contained in struct zone_bitmap. Additionally, struct zone_bitmap contains the pfns that correspond to the start and end of the represented zone. A pointer to the memory page in which the information is stored (in the form of a block of bitmap) is contained in struct bm_block. struct bm_block also contains the pfns that correspond to the start and end of the represented memory area. It also contains the numbers of normal and
highmem page frames allocated for the hibernation image before suspending devices. The memory bitmap is used for marking saveable pages (during hibernation) or hibernation image pages (during restore). During hibernation, the memory bitmap marks allocated page frames that will contain copies of saveable pages. During the restore process, the memory bitmap is initially used for marking hibernation image pages, but then the set bits from it are duplicated in @orig_bm and it is released.

Figure 2.1: Architectural Design – Shutdown
Figure 2.2: Architectural Design – Resume

3. Proposed System
The main idea of the proposed system is to provide kernel-level security for the hibernation state and to reduce the time required for booting after a traditional shutdown. It is concerned with resuming the previous state of the system, which is already done using hibernation in the existing system. During the process of resuming the hibernation image, the user will be authenticated using the password. The scope can be further extended to take a backup of the snapshot images of hibernated states. High security can be achieved using
a higher-level, more complex encryption algorithm.

Fig. 3.1: Flow Diagram for Shutdown

The main difference between the existing system and the proposed system is the checking of the swsusp signature before loading the hibernated image into RAM. The main aim is to provide security for the snapshots.

Fig. 3.2: Flow Diagram for Resume

A comparison of boot times between the normal boot mechanism and our improved mechanism has been worked out. From the observations we conclude that a normal boot takes about 40 seconds, whereas with the proposed system we see the idle screen within 3 seconds and the total boot time does not exceed 6 seconds. We realize an additional reduction of 0.5 seconds if the approach is applied at the boot-loader level. Some functions already implemented in the kernel must be implemented in the boot loader to have our mechanism applied at the boot-loader level: snapshot image loading, initializing some devices, and some other functions. As a result, another 0.5 seconds can be eliminated by applying these mechanisms at the boot-loader level. The required work is much less than for the snapshot boot, although the boot-loader-level approach requires additional management
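The key check named above — verifying the swsusp signature before the hibernated image is loaded into RAM — can be sketched as follows. The `S1SUSPEND` value matches the signature swsusp writes into the swap header; the surrounding function and parameter names are assumptions for illustration, and where exactly the signature bytes live depends on the page size and swap layout.

```python
SWSUSP_SIG = b"S1SUSPEND"  # signature swsusp writes into the swap header

def hibernation_image_present(sig_field: bytes) -> bool:
    """Decide, before loading anything into RAM, whether a snapshot
    image is present. `sig_field` stands for the signature bytes the
    boot loader has read back from the swap header (sketch only)."""
    return sig_field[:len(SWSUSP_SIG)] == SWSUSP_SIG
```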
and implementation. Finally, this result suggests that there is a trade-off between boot time and management cost.

4. Conclusion
Nowadays the security of data is a very important factor to be taken into consideration. It is necessary to maintain the consistency and security of the hibernated image when the system is hibernated and the user's work is saved on disk. Various technologies are implemented in computer systems, such as suspend-to-disk, suspend-to-RAM and TuxOnIce for hibernation, but none of the mentioned technologies provides security and encryption for the hibernated snapshot image. Thus the main aim was to design and implement a completely new mechanism for providing security and encryption. Presented here is a new design and implementation that uses the concept of suspend-to-disk to give the user an experience of secured hibernation. With the new hibernation technique, the security level of the data is increased to a great extent.

Bibliography
[1] Chris Simmonds, "Reducing boot time in Linux devices", Class WE-2.2, Embedded Live Conference, UK, 2010.
[2] "Efficient operating system switching using mode bit and hibernation mechanism", CSI Publications, 2012.
[3] System Power Management States, Documentation/power/states.txt.
[4] Swsuspend Porting Notes, http://tree.celinuxforum.org/CelfPubWiki/SwsuspendPortingNotes.
[5] swsusp for OSK, http://lists.osdl.org/pipermail/linux-pm/2005-July/001077.html.
[6] OMAP 5912 Starter Kit, http://tree.celinuxforum.org/CelfPubWiki/OSK.
[7] Das U-Boot – Universal Bootloader, http://sourceforge.net/projects/u-boot.
[8] Execute in Place (XIP), http://www.montavista.co.jp/
[9] Kernel XIP, http://tree.celinuxforum.org/CelfPubWiki/KernelXIP.
[10] Prelink, http://people.redhat.com/jakub/prelink.pdf.
[11] Making Mobile Phone with CELinux, http://tree.celinuxforum.org/CelfPubWiki/ITJ2005Detail1_2d2.
Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System
Mr. Dhoot Suyog S., Prof. Naoghare M. M., Mr. Shinde Girish R. (Department of Computer Engineering, SVITCOE, Pune University, India) Email: [email protected]
ABSTRACT: Cloud computing is an emerging technology that enables new, revolutionary approaches for users and society. Through the cloud, a decision-cum-feedback system has been developed to provide reports and precautions based on user health information at low cost. The system can be operated through mobile devices and provides a good market for health service providers. The privacy of user health information and the property of the health service provider must be kept secure in the cloud. The new system is required to maintain privacy with low overhead at the client side for secure computation. The programs through which feedback or decisions are returned to the user must be secured from inside and outside attacks. Decryption outsourcing and the newly proposed secure key duplicate 2-way encryption are selected to reduce overhead at the client side while securing the privacy of information. This paper demonstrates the effectiveness of the system in terms of security and computational performance.

Keywords: Health monitoring, health service provider, privacy, decryption outsourcing, secure key duplicate 2-way encryption

1. INTRODUCTION
The most important aspect of human life is health. A remote health monitoring system operated through mobile phones or a wireless network is an important aspect of the field of technology and is being adopted by developing countries. For remote areas of Caribbean countries, Microsoft launched the "MediNet" project, a health-monitoring feedback and decision system for diabetes and cardiovascular diseases. The user gives health-related information as input, which passes through web-based medical application programs that return a decision, feedback or precaution to the user based on how the program is set. This provides a good market sector for health service providers to deliver their services to users for various diseases. Users can select their medical health service provider based on the privacy of information and the efficient computing they provide. The health service provider operates through the cloud, so that less setup
cost is required, and the user can access the system from anywhere and get output at low cost, so user acceptance will also increase. Due to the involvement of the cloud, more work is performed by the cloud and less computational work is done at the user and health service provider side. For a developing nation, this type of system is very useful for maintaining the health of people who live in remote areas; the government can take the initiative in budget allocation for such a needful project in the field of medical and health science.

A cloud based health monitoring system is useful, but the security and privacy of information is an important factor while designing this type of system. Maintaining the privacy of user information, which contains sensitive data regarding health status, is required at a high level. Information may be breached during different operations such as storing, monitoring, applying and communicating. A survey shows that the majority of people are very careful about the privacy of their health information. Users are not ready to adopt this type of system because they think that information passed through electronic or wireless media can be broken into or tampered with at any level of operation. To make the system effective, privacy of user information is essential so that a large number of people get involved in such systems. Existing rules, regulations, laws and standards set by regulating agencies and standards committees are applicable only to static record systems and do not consider the cloud based environment. Some laws put limitations on the cloud to maintain the security of user information but do not provide any constraint on the health service provider.

Internal users, such as employees of the health service provider, might obtain useful information about a client and use it for multiple purposes. Insider attacks are very dangerous if information is shared with mediclaim or insurance agencies, third parties, regulating agencies or forums. This information can be used to maintain records of the user or for research purposes. An effective mechanism is required to deal with insider attacks from the health service provider by using different security schemes and by constraining how user information may be used when providing a decision or feedback. The anonymization technique deals with privacy of data, but it considers only the normal personal information of the user. Personally identifiable information is now diverse and contains multiple elements of the user's input, such as weight, blood group and biometric information regarding health. Any information that identifies or is associated with the user is personally identifiable information; it may relate to the user's history, geography, relationships, biology, vocation, genealogy, etc. Existing techniques deal only with basic user information such as name and address, and do not provide security for other identifiable information. The input information passed to the health service provider contains sensitive values such as blood group, blood pressure or other health information that can be misused by a third party, so its security breaks. Using this type of information, identification of an individual user is possible, so privacy is not maintained in the electronic or wireless environment. The proposed mechanism provides a good solution to protect user information and an alternative way to deal with user privacy. The health service provider has medical programs to provide feedback to the user, which act as its intellectual property; the privacy of these programs is also important so that correct, accurate and timely feedback is received by the user. If a user sends a blood pressure value as input to the service provider and the security of the service provider's program is compromised, the user gets a wrong decision or feedback, so the system does not provide good data communication. Maintaining the privacy of the health service provider's medical programs is therefore an important aspect of a secured system.

In order to maintain privacy of information, the computational workload of the involved parties must also be considered. An important factor is how to move more computational overhead to the cloud as compared to the client, while also securing the user information and the application programs. The proposed system puts more emphasis on insider attacks and also considers outsider attacks that compromise the privacy of information. Insider attacks must be handled at a high level; some predefined techniques are available to deal with outsider attacks, including cryptographic schemes to maintain data integrity, certificate authority schemes, digital signatures, etc. The feedback or decisions coming from the health service provider or cloud server must guarantee that the output contains accurate information.

In the proposed system, we first consider the privacy problems and try to provide some solutions or alternatives. In the advanced system, the outsourcing decryption technique secures the privacy of user information and the intellectual property rights of the health service provider in an effective way. Finally, a secure key duplicate 2-way encryption mechanism is designed to move computational overhead to the cloud, so that less computational work is required at the user and health service provider side. It is proposed that the health service provider works online only during the initialization or setup phase and then goes offline, so that the user can work with the service provider through the cloud. The computational work is moved to the cloud, where the user information is encrypted and decrypted, so that none of the attacks is possible. By using the new scheme, the
complexity required for setup, computing, validation and computation is reduced, so the scheme does not place any additional overhead on the cloud or the user. The user at the client side, the cloud server and the health service provider can communicate securely, maintaining the privacy of their information, for a fast and good data communication system. A number of mechanisms are available for maintaining privacy, but they deal only with some information of the client and do not consider the health service provider's programs. The user can use a mobile phone to access the cloud based health monitoring system, which consists of sensors to obtain health information; in that case the resource constraints of the device are also respected for low overhead.

Fig 1: Architecture of cloud data storage service

2. LITERATURE SURVEY

2.1 Johannes Barnickel, Hakan Karahan and Ulrike Meyer proposed a security and privacy architecture and an implementation of the HealthNet mobile electronic health monitoring and data collection system. Privacy and security are achieved through data avoidance, data minimization, decentralized storage, and the use of cryptography. This system does not deal with a centralized approach, and the health service provider's program is not secured.

2.2 Rifat Shahriyar, Md. Faizul Bari, Gourab Kundu, Sheikh Iqbal Ahamed and Md. Mostofa Akbar proposed the Intelligent Mobile Health Monitoring System (IMHMS) for improving communication among patients, physicians and other health care workers. Security in IMHMS is provided by using RFID: each patient is provided RFID tags that are used to uniquely identify the patient, and the IMS maintains patient profile information with the RFID in a central database. Malicious attacks can be blocked using this information because a patient can be easily tracked using RFID. As it requires large memory and cost, high computational complexity is required to secure personal information from unauthorized access.

2.3 Minho Shin, in a research article on Secure Remote Health Monitoring with Unreliable Mobile Devices, provided a risk analysis and presented a framework for secure remote health monitoring systems, including a health monitoring architecture that leverages a special monitoring unit playing the central security role by providing critical security services including authentication, audit, key management and data fusion. This system is not concerned with the security of the health service provider, and more records are required for the monitoring program.

2.4 D. D. Kouvatsos, G. Min and B. Qureshi researched Performance Issues in a Secure Health Monitoring Wireless Sensor Network. It is concerned with data privacy at the acquisition level, data security at the transmission level, and data security at the healthcare provider level. A new secure transmission protocol is therefore required, providing optimal transmission control and bandwidth utilization to incorporate multimedia (audio/video) data.

2.5 Benjamin C. M. Fung, Ke Wang, Rui Chen and Philip S. Yu present A Survey of Privacy-Preserving Data Publishing, which states that detailed person-specific data in its original form often contains sensitive information about individuals, and publishing such data immediately violates individual privacy. The current practice primarily relies on
policies and guidelines to restrict the types of publishable data and on agreements on the use and storage of sensitive data. The limitation of this approach is that it either distorts data excessively or requires a trust level that is impractically high in many data-sharing scenarios. For example, contracts and agreements cannot guarantee that sensitive data will not be carelessly misplaced and end up in the wrong hands.

2.6 K. Venkatasubramanian and S. K. Gupta proposed AYUSHMAN: A Secure, Usable Pervasive Health Monitoring System. It integrates health monitoring sensors with highly capable entities to robustly collect patient data, and utilizes physiological values for generating keys and securing inter-sensor communication. Security of the health result is not provided, a centralized approach is not possible, and it does not deal with insider attacks.

2.7 Cong Wang, Sherman S. M. Chow and Qian Wang proposed Privacy-Preserving Public Auditing for Secure Cloud Storage. Enabling public auditability for cloud storage is of critical importance so that users can resort to a third party auditor (TPA) to check the integrity of outsourced data and be worry-free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towards user data privacy and introduce no additional online burden to the user. This system does not deal with attribute based encryption, and no operation on encrypted or decrypted data can be performed.

2.8 Arvind Narayanan and Vitaly Shmatikov give viewpoints regarding Myths and Fallacies of "Personally Identifiable Information", which covers any information, recorded or otherwise, relating to an identifiable individual. It is worth noting that the information collected by an mHealth monitoring system could contain the client's personal physical data such as their height, weight and blood type, or even ultimate personally identifiable information such as fingerprints and DNA profiles.
3. PRELIMINARY CONCEPTS / ALGORITHMS
3.1 Branching program
It includes binary classification. Based on the input user value, a tree is traversed and the result present at a leaf node is returned. Let h be the vector of the client's information in terms of attributes; each element has an index and a value, forming an information component from the attribute index and the respective information value. The first element is a set of nodes in the branching tree. A node above a leaf node is a decision node, and the label associated with the decision or feedback is present at the leaf node.

Fig 2: Branching Program

3.2 Outsourcing decryption
It is a useful technique for identity-based encryption, used to move computational workload from the user to the cloud server. Using this, the secure key is transferred into a transformation key so that the user requires only a single computation in order to receive the feedback or decision. It works with multi-dimensional range query based anonymous IBE, which consists of different algorithms: initialization to start the setup, encryption of the user input value, transformation of the secure key into the transformation key, encryption of the key with the information, and decryption performed by the client
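The branching-program traversal of section 3.1 — compare one attribute of the client vector h at each internal node and return the label at the leaf — can be sketched as follows. The blood-pressure attribute and the thresholds are hypothetical examples for illustration, not values from the paper.

```python
def evaluate(node, h):
    """Walk a binary branching tree over the client attribute vector h.
    An internal node is (attr_index, threshold, left, right);
    a leaf is simply its decision label (a string)."""
    if isinstance(node, str):          # reached a leaf: return its label
        return node
    attr_index, threshold, left, right = node
    branch = left if h[attr_index] <= threshold else right
    return evaluate(branch, h)

# Hypothetical example: screen on systolic blood pressure, stored at h[0].
program = (0, 120, "normal", (0, 140, "elevated", "consult a doctor"))
```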
and the cloud. A third party auditor uses the above scheme to maintain secure and private communication between all parties involved in the system. Homomorphic encryption is used for encryption of the information vector given by the user.

3.3 Secure key duplicate 2-way encryption
This scheme transfers only the required ciphertext to the other party in order to maintain the security of the underlying message, with two features: key privateness and unidirectionality. It consists of the following algorithms:

3.3.1 Initialization: performed by the third party auditor after receiving the user information in the form of a vector.

3.3.2 Keygen: performed by the user and the third party auditor to create the secure key, i.e. the private key that is sent as the transformation key.

3.3.3 Rekeygen: performed by the third party auditor, which computes a new secure key to communicate with the service provider.

3.3.4 EncryptionHP: done by the health service provider, generating the ciphertext delivered to the user through the cloud.

3.3.5 ReencryptionHP: performed by the third party auditor and the duplicate server to re-encrypt the decisions or feedback delivered to the user.

3.3.6 Decryption: performed by the client after getting the decision or feedback, so that the user gets only valid information.

4. SETUP AND WORKING
A cloud based health monitoring system consists of four components or parties: the user at the client side, the cloud server, the health service provider, and the third party auditor. Figure 3 shows the architecture of the proposed system without the newly proposed secure key duplicate 2-way encryption, which does not give the same performance in terms of security and efficiency.
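The algorithm pipeline of section 3.3 (Keygen, EncryptionHP, Rekeygen, ReencryptionHP, Decryption) behaves like a proxy re-encryption flow: the auditor turns a provider-keyed ciphertext into a user-keyed one without the cloud ever seeing the plaintext. The toy model below uses XOR pads purely to show the data flow between the parties; XOR is not a secure cipher, and unlike the scheme in the paper this toy provides neither key privateness nor unidirectionality.

```python
import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def keygen(n: int = 16) -> bytes:                          # Keygen
    return secrets.token_bytes(n)

def encryption_hp(msg: bytes, k_provider: bytes) -> bytes:  # EncryptionHP
    # Health service provider encrypts the decision under its own key.
    return xor(msg, k_provider)

def rekeygen(k_provider: bytes, k_user: bytes) -> bytes:    # Rekeygen
    # Third party auditor derives the re-encryption key.
    return xor(k_provider, k_user)

def reencryption_hp(ct: bytes, rk: bytes) -> bytes:         # ReencryptionHP
    # Auditor/duplicate server: provider-keyed ct -> user-keyed ct.
    return xor(ct, rk)

def decryption(ct: bytes, k_user: bytes) -> bytes:          # Decryption
    # Client recovers the decision with its own key only.
    return xor(ct, k_user)
```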
Fig 3: Architecture of Proposed System

An advanced system uses secure key duplicate 2-way encryption for security and efficiency, in which encryption is done in two ways; it also reduces the computational workload at the user side by moving it to the cloud.

Fig 4: Module Structure of Proposed System

During initialization, the third party auditor runs the setup phase and generates the required system parameters, which it then publishes to the user, the cloud server and the health service provider. After initialization, the health service provider stores its medical application program in the cloud in the form of a branching tree; the branching tree is encrypted and the generated ciphertext is stored in the cloud. To identify the service provider, each provider gets one index, and the encrypted branching-tree program is stored in the cloud along with that index. When a particular user wants a decision or feedback from a service provider, a token generation operation is started in association with the third party auditor. The client sends the index value of the health service provider along with its input value vector, which consists of the user's health information passed in the form of a vector of information components. After getting the input query from the client, the third party auditor generates a token and sends it to the client; during this process the third party does not get any user-identifiable information. The client sends the token to the cloud server, which is required for getting the decision. The third party
101 International Journal of Multidisciplinary Educational Research IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 auditor will validate tokens and as it was getting in input vector cloud send this token to health and branching programs of service provider then service health service provider is not provider will generate feedback revealed to cloud by applying or decision based on decision tree outsourcing decryption structure and it was passes to technique. Decision or feedback cloud in cipher text format. is encrypted and user only gets Cloud will get partially decrypted that information so it was also cipher text which pass to client. secured. Due to token As this is partially message so generation mechanism cloud not get any useful communication between user, information of decision or service provider and cloud feedback. server is secured. A system is designed and security of information is maintained by creating tokens. Cloud had been formed and access through website.
5.2 Efficiency: The computational complexity at the service provider depends on the number of nodes in the branching program and on the computational mechanism. As the service provider goes offline after the setup phase, fewer computations are required there, and the computational overhead moves from the user side to the cloud server for a fast and timely communication system. Experiments will be conducted
in order to compute results after completion of the project.

5.3 Mobility: The system can be handled through a wireless network or electronic media on any platform.

6. CONCLUSION
The cloud based health monitoring system efficiently secures the privacy of user information and of the application programs of the health service provider. To maintain the privacy of user information in the cloud and against insider attacks, the anonymous Boneh-Franklin identity based encryption (IBE) is used to deal with personally identifiable information. The outsourcing decryption technique reduces the computational overhead at the user side and moves it to the cloud. Branching programs are encrypted using different branch node values to maintain their security. By applying the newly developed secure key duplicate 2-way encryption scheme, the computational overhead at the service provider side is reduced, so small organizations or companies can participate in the business and create their market in an efficient way. Security and effectiveness are achieved through the proposed system.

ACKNOWLEDGEMENTS
Whenever we are standing on the most difficult step of the dream of our life, we often remember the great almighty God for his blessings and kind help, and he always helps us in tackling the problems by some means in our lifetime. I feel great pleasure to present this seminar entitled Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System. I would like to convey sincere gratitude to my seminar guide and M.E. Coordinator Prof. M. M. Naoghare for her valuable guidance and support; she guided me and provided me with useful and valuable suggestions, and without her kind co-operation it would have been extremely difficult for me to complete this paper.
I would also like to extend my gratitude to our respected Prof. S. M. Rokade, Head of the Computer Engineering Department, for his kind co-operation towards the betterment and successful completion of this paper and for the support he has always provided to me. Last but not least, I would also like to thank my parents and all my friends for their encouragement from time to time. Finally, I am very grateful to the mighty God and my inspiring parents, whose loving and caring support contributed a major share in the completion of my task.

REFERENCES

Journal Papers:
[1] Huang Lin, Jun Shao, Chi Zhang and Yuguang Fang, "CAM: Cloud-Assisted Privacy Preserving Mobile Health Monitoring", IEEE Transactions on Information Forensics and Security, vol. 8, no. 6, 2013.
[2] E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song and A. Perrig, "Multi-dimensional range query over encrypted data", in IEEE Symposium on Security and Privacy, 2007, pp. 350–364.
[3] G. Clifford and D. Clifton, "Wireless technology in disease management and medicine", Annual Review of Medicine, vol. 63, pp. 479–492, 2012.
[4] Cong Wang, Sherman S. M. Chow, Qian Wang, Kui Ren and Wenjing Lou, "Privacy-Preserving Public Auditing for Secure Cloud Storage".

Books:
[5] Krishna K. Venkatasubramanian and Sandeep K. S. Gupta, "Security for Pervasive Health Monitoring Sensor Applications".
[6] W. Stallings, "Cryptography and Network Security: Principles and Practice", Prentice Hall.
[7] Johannes Barnickel, Hakan Karahan and Ulrike Meyer, UMIC Research Centre, "Security and Privacy for Mobile Electronic Health Monitoring and Recording Systems".

Proceedings Papers:
[8] J. Brickell, D. Porter, V. Shmatikov and E. Witchel,
"Privacy-preserving remote diagnostics", in Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, 2007, pp. 498–507.
[9] P. Mohan, D. Marin, S. Sultan and A. Deen, "MediNet: personalizing the self-care process for patients with diabetes and cardiovascular disease using mobile telephony", in Conference Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2008.
CLOUD BASED MOBILE SERVICE DELIVERY USING QOS MECHANISM

Ms. Prachi B. Gaikwad, SVIT, Chincholi
Prof. S. M. Rokade, SVIT, Chincholi
Abstract— Cloud computing is an emerging trend for large scale infrastructures. It has the advantage of reducing cost by sharing computing and storage resources, combined with an on-demand provisioning mechanism relying on a pay-per-use business model. Mobile devices maintain network connectivity through different network providers, so users who move around can still access cloud services without disadvantage. In the current model, when a user moves from one geographical area to another, he keeps accessing services from the previous cloud over a long distance, which results in more congestion on the network. There is a need for a different approach that maintains resources by improving the QoS and QoE of mobile services. This framework shows that services running on a public cloud are able to populate to other clouds in different locations. This paper argues that adding a resource pool for every cloud removes the ambiguity that occurs at the time of migrating services.

Key Words— cloud computing, QoS, QoE, service population

I. INTRODUCTION
Cloud computing has become popular nowadays because of its simple nature. It offers various computing and storage services over the internet. Cloud service providers rent data-center hardware and software to deliver storage and computing services through the internet. Internet
users can access services from the cloud. Instead of using their own devices, cloud users can store their data in the cloud, and they can run their applications on a cloud platform without a full installation of software. Cloud service providers provide various cloud services and resources according to user requirements and charge for them accordingly. Amazon EC2 and Apple's iCloud are very popular cloud based products [2]. Those vendors create their own cloud services and offer them to clients for business and individual use; they create cloud services as requirements come from the market, and each is different from the others.

Mobile computing has also become more popular due to smartphones and tablet PCs. Laptops and desktops cannot be operated easily on the move due to their size and form, which increases the demand for mobile devices of less weight and size than laptops and desktops. But these devices may lack the hardware resources needed to perform some critical tasks. In that situation there is a need to access those resources remotely through the network for storage and processing, a feature provided by cloud computing: it provides centre-based resources, while those devices require a decentred pool of resources. User mobility and high bandwidth services create traffic congestion problems on the internet, which affects the QoS and QoE factors of mobile services. This paper presents a framework that overcomes the problem by a service populating technique.

II. LITERATURE SURVEY
One project invents a reshaping of the physical footprint of virtual machines within a cloud [8]. It works towards lower operational costs for cloud providers and improvement of hosted application performance by taking into account affinities and conflicts between the placed virtual machines, achieved by mapping virtual machine footprints. After comparing, if
similarities are found in the memory footprints, the virtual machines are migrated to the same memory location, and content based memory sharing is also deployed to get consolidation [9][10][11]. The basic idea is to build a control system for the cloud that performs footprint reshaping to achieve higher-level objectives such as low power consumption, high reliability and better performance. It thereby reduces the cost for cloud providers and creates low cost cloud services for users.

The MEC (Media Edge Cloud) architecture improves the performance of cloud technology and also improves QoS and QoE for multimedia applications. To achieve that, "cloudlets" of servers run at the edge of a bigger cloud, handling requests closer to the client and thus reducing latency. If requests need further processing, they are sent to the inner cloud, so that the cloudlets remain reserved for QoS based multimedia applications [13]. Using that concept, the physical machines closer to the cloud's outer boundary are used to handle QoS sensitive services. As these machines are located on the outer boundary of the cloud, the data has to travel less distance within the cloud before being sent to the client, which improves QoE for the client and reduces the network congestion of the cloud.

All these researches aim only to improve cloud performance; none of them considers user mobility. Providing media services to mobile clients will become popular in the future. As mobility and multimedia content become more popular, high bandwidth data streams will have to travel greater distances, and reaching a moving target can create a problem. Cloud providers may need to create more clouds to handle the load and reduce the congestion.

In cloud computing, a client gets services by contacting a physical resource directly and then asking for the service. Clients need to connect to the cloud, and then they
can access the services from the cloud. In this approach, however, the client needs to know the name of the physical resource which offers the service, which creates a problem of redundancy. Some organizations solve this problem by running multiple servers and by using DNS for load balancing and failover [13]. This approach costs more, which is not affordable for small entities offering a service at a lower layer.

An alternative is the ability for clients to request services directly from the network instead of asking for the physical resources that offer those services [1]. It opens doors for future development. The client requests a service ID, and the network infrastructure is used to find whether the actual service is running and then connect it to the client. This approach makes it possible to run a service in multiple locations and to direct clients to the appropriate instance depending on their location and network status.

III. PROGRAMMERS DESIGN
A QoS-aware service delivery model is necessary to deliver the services. The network infrastructure is used to determine the network status between the client and the service. Service providers deliver services with the best QoS and QoE parameters to their clients. In the current model, clients of cloud services remain connected to the same cloud without considering its physical location and network status. If the network condition is not satisfactory and there is no redundant path, the service will be out of reach. Providers are then unable to meet their SLA standards, with the result that clients do not get the best QoE. Another issue is that a client at any location connects to the same cloud to get services, regardless of the cloud's distance from itself. This creates load on the cloud, which degrades the QoS of its services. It is not possible for cloud providers to build multiple clouds to provide services to the different
geographical areas. So there is a need for a new service delivery technique which provides various services to clients with proper QoS and QoE parameters; it should also provide better cloud management for the providers and reduce network congestion. In this service delivery model, clients request services and their requests are directed to the physical location at which the service is running while fulfilling QoS and QoE parameters. In the mobility case it is difficult to direct a client to a specific instance of a service. We can connect a client to a service instance based on its present location and network conditions, but if the client moves to another location with a different network area, this becomes difficult. If the user moves away from the cloud, it creates congestion on the network, which impacts the QoS of all services on the network. To solve this problem we could connect the client to a different instance of the service each time the QoS parameter degrades. At the same time, cloud providers cannot be expected to create multiple clouds: a single cloud provider may not own multiple clouds at different physical locations, so it is likely that many cloud providers have their clouds far apart, or at most at regional scale within a country. We therefore need to address the issue of service population across the boundaries of different cloud providers. This introduces a concept where service providers register their services globally rather than binding them to specific cloud providers. Services which are globally registered and not bound to specific cloud providers are free to populate or migrate to different clouds depending on QoS and the source of the service-request parameters. This is only possible when cloud providers open their boundaries so that services can move in and out of their clouds. It will change the model of service provision. Service providers will register their services with a service level agreement which defines the expected QoS parameters. Cloud
providers provide services with the best QoS so that their services populate widely and generate income for them. It is not possible for any single big cloud to take all the services, due to the network congestion problem. So services may populate from a bigger cloud to smaller clouds, keeping the network congestion-free and minimizing their distance from clients. When a service populates from one cloud to another, the receiving cloud can also reject the populating service if it is already under heavy load. This service population process is completely transparent to the user. To achieve all of this, a new service delivery framework is needed, and it should be QoS-aware and support service population. At the time of migration of a service from one cloud, there is a chance that another user is accessing the same service, so migrating the service away from the current cloud would starve that second client. To solve this problem we add a separate resource pool to each cloud, which is used to keep references (or objects) of all populated and non-populated services. Another client can then access the populating service without any interruption.

[Figure: System architecture — smartphone users, a recommendation/QoS engine, and the cloud]
Fig: System architecture

The figure above shows the system architecture. Smartphones and their users are the clients who access the services of the cloud. These are mobile clients, so if they move from one location to another, the services need to be populated to the new location. The engine gives recommendations based on the QoS parameters, and the other cloud decides whether or not to receive the populated services.

IV. SERVICE FRAMEWORK
The service populating model needs the concept of an open cloud.
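The accept-or-reject behaviour described above — a receiving cloud turning away a populating service when it is already heavily loaded, and services preferring clouds close to the client — can be sketched in a few lines. This is an illustrative sketch only; the function names, the cloud records, and the 80% load threshold are assumptions, not part of the paper's system.

```python
# Sketch of the population decision described above: a receiving cloud
# may reject an incoming service if accepting it would overload the cloud.
# Names and the 0.8 load threshold are illustrative assumptions.

def accept_populating_service(current_load, capacity, service_demand,
                              load_threshold=0.8):
    """Return True if the receiving cloud can host the populated service."""
    projected = (current_load + service_demand) / capacity
    return projected <= load_threshold

def choose_target_cloud(clouds, service_demand):
    """Pick the accepting cloud nearest to the client, or None if all reject."""
    candidates = [c for c in clouds
                  if accept_populating_service(c["load"], c["capacity"],
                                               service_demand)]
    return min(candidates, key=lambda c: c["distance"], default=None)
```

Under this sketch, a nearly full cloud rejects the service even if it is closest to the client, which matches the transparency requirement above: the user is simply served from whichever cloud accepted the populated service.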
The existing closed cloud runs only services controlled by its owner. An open cloud allows services from third-party providers to populate it. The open cloud is like a resource pool: anyone can use these resources to run their services, and anyone can provide such a resource pool and accept services from other providers to run on it. This is where the new framework comes in. The proposed framework consists of six layers, as follows:

Service Management Layer: It is used to check how services are registered in a cloud. Billing information between resource providers and service providers is processed here. It can be considered part of the application layer in the OSI model because it defines the applications and how they use the resources. When service providers want to publish a service they have to define security and QoS parameters, which are the requirements to run the service. Each service must therefore carry a list of parameters that must agree with the parameters defined by the cloud. This layer is also used during service migration to find suitable clouds that can accept the service. If at that time the service needs extra resources, they can be granted, and the service provider is billed accordingly.

Service Subscription Layer (SSL): It handles service subscriptions, keeping track of how many clients access a service and from which locations.

Service Delivery Layer (SDL): It delivers services to specific clients. It is responsible for publishing a service from one cloud to another: the appropriate cloud is found as per the necessary requirements, and the service is then populated to that cloud.

Service Migration Layer (SMiL): The migration of services between clouds is the responsibility of SMiL. To populate a service we first have to make sure that the target cloud can accept it. The decision of whether or not to move a service is made at the SDL. Using that decision, SMiL instructs the cloud about
which resources need to be allocated.

Service Connection Layer: It handles the client mobility issue and also checks the connection between clients and services.

Service Network Abstraction Layer: It provides an abstraction to simplify the migration process. It acts as the interface between the service delivery framework and new technology.

V. IMPLEMENTATION
To gather QoS data and network conditions we use a QoS manager. It collects this data by querying the clients for network conditions. It also resolves service names into unique service IDs. To deliver a service we need to connect the client to the proper service instance; Service Tracking and Resolution (STAR) is used to connect a service subscriber to the correct service instance. It also keeps records of service IDs and of the clouds in which their instances are running. STAR decides which cloud is best suited to receive the client request on the basis of the user's location. A service can decide whether to add a new client, or to reject it and pass it to another cloud if possible. STAR works like a DNS system: the service subscriber sends a request to STAR to obtain a cloud ID. Once the cloud ID is found, it is resolved into the IP addresses of a cloud that the client can connect to in order to access the service. A decision-making algorithm is used to decide whether to accept or reject the service at the delivery layer.
Algorithm:
1. Start.
2. Create a node and start the node.
3. Start the QoS manager, which checks the QoS of the various services.
4. Authenticate the user.
5. The authenticated user is connected to the service.
6. Suppose the user accesses video; video streaming is going on.
7. During streaming, the system tracks the QoS with the help of the QoS manager.
8. The QoS manager gives recommendations, which are tracked by the system.
9. On the basis of the recommendations, the system takes a migration decision.
10. The service is migrated to another cloud, or kept as it is.
11. If the service is migrated to another cloud, the system again checks the QoS.
12. Stop.

VI. APPLICATIONS
A QoS-based service delivery model supports various applications and services. It reduces network congestion for frequently accessed websites and for sites carrying a lot of multimedia data. Streaming consumes considerable bandwidth and requires appropriate QoS; here, the whole service is populated to the area where it is most in demand, which greatly benefits that area. Load balancing also becomes easier with this method. It provides an adaptable resource allocation scheme, since services can be replicated on demand. Cloud providers can share their resources with other providers, which gives them the flexibility to add more resources as their clouds need them. It is also useful in gaming, where it can be applied to rendering. Most network traffic is generated by video and audio streaming, so the method reduces the traffic generated by streaming. As migration reduces the distance between client and services, it decreases latency, which gives the user a more interactive feel in multimedia applications by improving QoE.

CONCLUSION
This paper gives a solution to the challenges presented by user mobility. The previous service delivery model is inefficient for the future requirements of mobile users. Cloud technology with the proposed model can bring a solution for proper management of network
resources. This paper introduced a technique which reduces the congestion on the network generated by streaming video and audio.

REFERENCES
[1] Fragkiskos Sardis, Glenford Mapp, Jonathan Loo, "On the Investigation of Cloud-Based Mobile Media Environments with Service-Populating QoS-Aware Mechanisms," IEEE Transactions on Multimedia, vol. 15, no. 4, June 2013.
[2] Apple, 2012, iCloud, Feb. 15, 2012. [Online]. Available: http://www.apple.com/icloud.
[3] J. Postel and J. Reynolds, ISI, RFC 948, A Standard for the Transmission of IP Datagrams Over IEEE 802 Networks, IETF, 1988.
[4] ETSI, 2011, Mobile Technologies GSM, Feb. 15, 2012. [Online]. Available: http://www.etsi.org/WebSite/Technologies/gsm.aspx.
[5] H. Inamura, G. Montenegro, R. Ludwig, A. Gurtov, and F. Khafizov, RFC 3481, TCP over Second (2.5G) and Third (3G) Generation Wireless Networks, IETF, 2003.
[6] Amazon, 2012, EC2, Feb. 28, 2012. [Online]. Available: http://aws.amazon.com/ec2/.
[7] Microsoft, 2011, Cloud Computing, Feb. 28, 2012. [Online]. Available: http://www.microsoft.com/enus/cloud/default.aspx?fbid
[8] J. Sonnek, J. Greensky, R. Reutiman, and A. Chandra, "Starling: Minimizing communication overhead in virtualized computing platforms using decentralized affinity-aware migration," in Proc. 39th Int. Conf. on Parallel Processing (ICPP'10), San Diego, CA, USA, Sep. 2010.
[9] J. Sonnek and A. Chandra, "Virtual putty: Reshaping the physical footprint of virtual machines," in Proc. Workshop on Hot Topics in Cloud Computing (HotCloud'09), San Diego, CA, USA, Jun. 2009.
[10] T. Wood, G. Tarasuk-Levin, P. Shenoy, P. Desnoyers, E. Cecchet, and M. Corner, "Memory buddies: Exploiting page sharing for smart colocation in virtualized data centers," in Proc. 5th ACM Int. Conf. on Virtual Execution Environments, 2009.
[11] D. Gupta, S. Lee, M. Vrable, S. Savage, A. C. Snoeren, G. Varghese, G. M. Voelker, and A. Vahdat, "Difference engine: Harnessing memory redundancy in virtual machines," in Proc. OSDI, 2008.
[12] W. Zhu, C. Luo, J. Wang, and S. Li, "Multimedia cloud computing," IEEE Signal Processing Magazine, vol. 28, no. 3, pp. 59–69, May 2011.
[13] T. Brisco, RFC 1794, DNS Support for Load Balancing, IETF, 1995.
[14] D. N. Thakker, "Prefetching and clustering techniques for network based storage," Ph.D. dissertation, Sch. Eng. Inf. Sci., Middlesex Univ., London, U.K., 2010.
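To make the Implementation section's algorithm (steps 1–12) concrete, the monitoring-and-migration loop it describes might be sketched as follows. All names (`QoSManager`, `migration_loop`), the 0.5 QoS threshold, and the return values are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the QoS-aware migration loop from the Implementation section.
# Class names, thresholds, and return values are hypothetical.

class QoSManager:
    """Collects QoS readings by querying clients (simulated by a callable)."""
    def __init__(self, probe):
        self.probe = probe  # callable returning a QoS score in [0, 1]

    def measure(self):
        return self.probe()

def migration_loop(qos_manager, authenticate, migrate, user,
                   threshold=0.5, max_checks=3):
    """Steps 4-12: authenticate, serve, track QoS, migrate if recommended."""
    if not authenticate(user):          # step 4: user authentication
        return "rejected"
    cloud = "origin-cloud"              # step 5: user connected to the service
    for _ in range(max_checks):         # steps 6-7: track QoS during streaming
        qos = qos_manager.measure()
        if qos < threshold:             # steps 8-9: recommendation -> decision
            cloud = migrate(cloud)      # step 10: migrate to another cloud
            # step 11: the next iteration re-checks QoS on the new cloud
        else:
            break                       # step 10: keep the service as it is
    return cloud
```

In a run where the first QoS reading is poor and the second (after migration) is good, the loop migrates once and then settles, which is the behaviour steps 10–11 describe.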
FAULT DIAGNOSIS IN INDUCTION MOTOR
Miss K. R. Gosavi, Department of Electrical Engineering, Government College of Engineering, Aurangabad, Maharashtra, India ([email protected])
Mrs. A. A. Bhole (Asst. Prof.), Department of Electrical Engineering, Government College of Engineering, Aurangabad, Maharashtra, India ([email protected])
Abstract— Although induction motors are highly reliable, they are susceptible to many types of faults that can become catastrophic and cause production shutdowns, personal injuries, and waste of raw material. Induction motor faults can be detected at an initial stage in order to prevent the complete failure of the system and unexpected production costs. The purpose of this paper is the analysis of various faults of an inverter-fed induction machine. The laboratory tests thus conducted have been reported, and it is hoped that the research investigations reported would be very useful to the power electronics circuit industry.

I. INTRODUCTION
The study of induction motor behavior during abnormal conditions due to the presence of faults, and the possibility of diagnosing these abnormal conditions, has been a challenging topic for many electrical machine researchers. The induction motor has been established as the workhorse of industry ever since the 20th century. Speed control of AC motors has been a continuously pressing requirement of industry, so as to ensure better production with a high degree of qualitative consistency. Although recent developments in Power Electronics and Controls have brought forth some very significant drive alternatives like the Switched Reluctance motor,
Permanent Magnet and Brushless DC motors, these have not yet become very popular and cost-effective for a wide range of applications, especially in damp-proof, dust-proof and flame-proof environments. Therefore, the widespread use of induction motors is still economically viable as well as popular, and is likely to continue for the next few decades. Variable speed drives are widely used in all application areas of industry. These include transport systems such as ships, railways, elevators and conveyors; material handling plants; and utility companies for mechanical equipment, e.g. machine tools, extruders, fans, pumps and compressors. The penetration of variable speed AC drives into these sectors has been further accelerated by the development of new power semiconductor devices and drive concepts, which allow new functions and performance characteristics to be realized. The application of new power electronic components has also initiated a significant change in the market breakup between AC drives and DC drives. The rugged construction of AC drives has opened up a host of new application areas, thereby providing the user and also the manufacturer additional potential to increase their productivity.

II. CONCEPT OF DRIVE SYSTEMS
While comparing the dynamic performance of a separately excited DC motor with that of an induction motor, the latter presents a much more complex control plant. This is due to the fact that the main flux and armature current distribution of a DC motor are fixed in space and can be controlled independently, whereas in the case of an AC motor these quantities are strongly interacting. This design constraint makes the induction motor drive structure more complex and non-linear. The drive hardware complexity increases as more and more stringent performance specifications are demanded by the user. The complexity further increases because of the variable frequency power supply, AC
signal processing, and the relatively complex dynamics of the AC machine.
PWM Inverters:
One of the best methods to control the torque and speed of an induction motor is to use variable-voltage, variable-frequency inverters. Inverters used for variable speed drive applications should be capable of varying both the voltage and the frequency in accordance with speed and other control requirements. The simplest method to achieve this control is a six-step inverter. But this method suffers from the following limitations:
(i) Presence of low order harmonics, because of which the motor losses are increased at all speeds, causing derating of the motor.
(ii) Torque pulsation at low speeds, owing to the presence of lower order harmonics.
(iii) The harmonic content increases at low speeds, thus increasing motor losses. Also, the increase in the V/f ratio at low speed, applied to compensate for the stator resistance drop, may cause a higher motor current to flow at light loads due to saturation. These effects may overheat the machine at low speeds.
These limitations of a six-step inverter drive are overcome in a pulse width modulated (PWM) inverter. The basic block diagram of a PWM inverter is shown in figure 1.

Figure 1. Block Diagram of Inverter System

Because of the low harmonic content in the output voltage of the diode bridge, and also due to the presence of harmonics in the input current of a PWM inverter, the filter size required in such systems is small. The drive system consequently delivers smooth low speed operation, free from torque pulsation, thus leading to lower derating of the motor and higher overall efficiency. Also
because of a constant DC bus voltage, a number of PWM inverters with their associated motors can be supplied from a common diode bridge. However, these advantages are obtained at the expense of a complex control system and higher switching loss due to high frequency operation.
Survey of Various Faults:
A wide range of motors is currently used in industrial applications. They deliver a wide range of characteristics demanded for specific tasks. Motors for all types of duties and with various characteristics require adequate protection. Hence it is essential that the characteristics of motors be carefully examined and considered before applying protection systems. A three-phase voltage-fed inverter can develop various types of faults, as shown in figure 2:
• Input supply single line to ground fault F1.
• Rectifier diode short circuit fault F2.
• Earth fault on DC bus F3.
• DC link capacitor short circuit fault F4.
• Transistor base drive open fault F5.
• Transistor short circuit fault F6.
• Line to line short circuit at machine terminals F7.
• Single line to ground fault at machine terminals F8.
• Single phasing at machine terminals F9.
A three-phase voltage-fed inverter can develop any of the above faults, of which the open base drive and shoot-through faults are the most common.
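The nine fault classes F1–F9 surveyed above lend themselves to an enumeration, so that a monitoring program can tag detected faults uniformly. The Python form below is only an illustrative encoding of the list, not part of the paper's test setup.

```python
# Illustrative encoding of the surveyed inverter/machine fault classes.
from enum import Enum

class InverterFault(Enum):
    F1 = "Input supply single line to ground fault"
    F2 = "Rectifier diode short circuit fault"
    F3 = "Earth fault on DC bus"
    F4 = "DC link capacitor short circuit fault"
    F5 = "Transistor base drive open fault"
    F6 = "Transistor short circuit fault"
    F7 = "Line to line short circuit at machine terminals"
    F8 = "Single line to ground fault at machine terminals"
    F9 = "Single phasing at machine terminals"

# The survey notes that open base drive (F5) and shoot-through (F6)
# are the most common fault types.
MOST_COMMON = {InverterFault.F5, InverterFault.F6}
```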
MOTOR CURRENT SIGNATURE ANALYSIS
Motor Current Signature Analysis (MCSA) is a system used for analyzing or trending dynamic, energized systems. Proper analysis of MCSA results assists the technician in identifying:
1. Incoming winding health
2. Stator winding health
3. Rotor health
4. Air gap static and dynamic eccentricity
5. Coupling health, including direct, belted and geared systems
6. Load issues
7. System load and efficiency
8. Bearing health

BASIC STEPS FOR ANALYSIS
There are a number of simple steps that can be used for analysis using MCSA. The steps are as follows:
1. Map out an overview of the system being analyzed.
2. Determine the complaints related to the system in question. For instance, is there reason for analysis due to improper operation of the equipment, and is there other data that can be used in the analysis?
3. Take data.
4. Review data and analyze: Review the 10 second snapshot of current to view the operation over that time period. Review low frequency demodulated current to view the condition of the rotor and identify any load-related issues. Review high frequency demodulated current and voltage in order to determine other faults, including electrical and mechanical health.

FAULT DETECTION
a) Detection of broken bars
It is well known that a 3-phase symmetrical stator winding fed from a symmetrical supply with frequency f1 will produce a resultant forward rotating magnetic field at synchronous speed, and if exact symmetry exists there will be no resultant backward rotating field. Any asymmetry of the supply or of the stator winding impedances will cause a resultant backward rotating field from the stator winding. When
applying the same rotating magnetic field fundamentals to the rotor winding, the first difference compared to the stator winding is that the frequency of the induced electromagnetic force and current in the rotor winding is at slip frequency, i.e. s·f1, and not at the supply frequency. The rotor currents in a cage winding produce an effective 3-phase magnetic field with the same number of poles as the stator field, but rotating at slip frequency f2 = s·f1 with respect to the rotating rotor. With a symmetrical cage winding, only a forward rotating field exists. If rotor asymmetry occurs, then there will also be a resultant backward rotating field at slip frequency with respect to the forward rotating rotor. As a result, the backward rotating field with respect to the rotor induces an e.m.f. and current in the stator winding at:

fsb = f1(1 - 2s) Hz    (1)

This is referred to as the lower twice slip frequency sideband due to broken rotor bars. There is therefore a cyclic variation of current that causes a torque pulsation at twice slip frequency (2sf1) and a corresponding speed oscillation, which is also a function of the drive inertia. This speed oscillation can reduce the magnitude (in amps) of the f1(1-2s) sideband, but an upper sideband current component at f1(1+2s) is induced in the stator winding due to the rotor oscillation. The upper sideband is enhanced by the third time harmonic flux. Broken rotor bars therefore result in current components being induced in the stator winding at frequencies given by [8]:

fsb = f1(1 ± 2s) Hz    (2)

These are the classical twice slip frequency sidebands due to broken rotor bars.
b) Detection of air gap eccentricity
Air-gap eccentricity in electrical machines can occur as static or dynamic eccentricity. Static eccentricity is defined as a stationary minimum air-gap. This can be caused by stator core ovality or incorrect positioning of the rotor or stator at the commissioning stage. At the position of minimum air-gap there is an unbalanced magnetic pull which tries to deflect the
rotor, thus increasing the amount of air-gap eccentricity. Dynamic eccentricity is defined as a rotating minimum air-gap. It can be caused by a bent shaft, mechanical resonances at critical speeds, or bearing wear. Either can lead to a rub between the rotor and stator, causing serious damage to the machine. The effects of air-gap eccentricity produce unique spectral patterns and can be identified in the stator current spectrum. The analysis is based on the rotating wave approach, whereby the magnetic flux waves in the air-gap are taken as the product of permeance and magnetomotive force (MMF) waves.
c) Detection of shorted turns in LV stator winding
The objective is to reliably identify current components in the stator winding that are only a function of shorted turns and are not due to any other problem or mechanical characteristic. There has been a range of papers published on the analysis of air gap and axial flux signals to detect shorted turns, and the detailed mathematics can be found in the references [9-10].
d) Detection of mechanical influences
Changes in air gap eccentricity result in changes in the air gap flux waveform. With dynamic eccentricity the rotor position can vary, and any oscillation in the radial air gap length results in variations in the air gap flux. Consequently this can induce current components given by [11-13]:

fe = f1 ± m·fr    (3)

where f1 = supply frequency, fr = rotational speed frequency of the rotor, m = 1, 2, 3, ... (harmonic number), and fe = current components due to air gap changes. This means that problems such as shaft/coupling misalignment, bearing wear, roller element bearing defects, and mechanical problems that result in dynamic rotor disturbances can potentially be detected due to changes in the current spectrum.
e) Influence of gearboxes
Mechanical oscillations will give rise to additional current components in the frequency spectrum. Gearboxes may also
give rise to current components at frequencies close to or similar to those of broken bar components. Hence, to perform a reliable diagnosis of a rotor winding for motors connected to a gearbox, the influence of gearbox components in the spectrum needs to be considered. Specifically, slowly revolving shafts will give rise to current components around the main supply frequency, as prescribed by equation (5), where the rotational speed frequency of a shaft rotating at Nr rpm may be calculated as fr = Nr/60 Hz.

V. CASE STUDIES
Case I: Current spectrum of a healthy motor
A 100 HP, 440 V standard efficiency motor driving a pump was tested in a sugar mill. The motor was operating at 95 Amps, corresponding to approximately 75% full load. The full load speed was 1775 rpm, yielding a frequency interval of 48.55 Hz to 51.77 Hz for detection of broken rotor bars. Figure 1 shows part of the frequency resolved current spectrum for the motor. The spectrum is completely free of any current components around the main supply frequency f1, and consequently the frequency range in which current components due to broken rotor bars are expected is empty. The motor thus shows no signs of broken rotor bars.
Case II: Rotor Asymmetry
Rotor asymmetry was detected in a 50 HP, 440 volt motor operating in a sugar mill during quality control analysis using MCSA. The full load speed is 940 rpm, yielding a frequency interval of 48.55 Hz to 51.66 Hz for detection of broken rotor bars. The motor was operating at 85 Amps, corresponding to approximately 60% full load. Based on the load conditions, the instrument predicted current components due to broken rotor bars to be positioned at 49.0 Hz and 51.0 Hz. A search band is applied around these positions. Figure 2 shows one current component to be present in each search band. The components are distributed symmetrically around f1, as expected, but with different magnitudes, 47.2 dB and 58.0 dB
from the main supply frequency. The components are a sign of initial rotor asymmetry, but are not yet indicative of an unhealthy motor.
Case III: Damaged Rotor
Figure 3 shows part of the frequency resolved current spectrum for a coal mill motor rated 440 V, 150 HP, operating in a utility plant. The full load speed is 955 rpm, yielding a frequency interval of 48 Hz to 52 Hz for detection of broken rotor bars. Based on the supply current, the instrument predicted sidebands due to broken rotor bars to be positioned at 49.70 Hz and 50.57 Hz. These frequency positions are close to that of the supply frequency. Figure 3 shows the supply frequency to have a somewhat wide, declining current component. This is caused by the motor being subjected to small changes in load, i.e. small changes in supply current, during the data acquisition process. However, the peak detection algorithms embedded in the instrument were able to detect the declining slopes of the supply frequency within the applied search bands and thus disregard these slopes from the analysis, thereby correctly identifying the current components due to broken rotor bars.
Case IV: Fault in gear box
Figure 4 shows part of the frequency resolved current spectrum for a coal mill motor rated 440 V, 240 HP, operating in a utility plant. The full load speed is 885 rpm, yielding a frequency interval of 48 Hz to 52 Hz for detection of broken rotor bars. The motor is driving a coal mill through a three-stage reduction gearbox; the gearbox thus contains three
shafts. The output speed of the gearbox at full load conditions is 18.46 rpm, and the individual shaft speeds internal to the gearbox are 48.60 rpm and 135.78 rpm at full load conditions.
Conclusion:
Motor Current Signature Analysis is an electric machinery monitoring technology. It provides a highly sensitive, selective, and cost-effective means of monitoring.
References:
[1] Vas, P., "Parameter Estimation, Condition Monitoring, and Diagnosis of Electrical Machines," Clarendon Press, Oxford, 1993.
[2] Kliman, G. B., Koegl, R. A., Stein, J., Endicott, R. D., Madden, M. W., "Noninvasive Detection of Broken Rotor Bars in Operating Induction Motors," IEEE Trans. Energy Conversion (1988), 873–879.
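The sideband positions used throughout the case studies follow directly from equation (2), fsb = f1(1 ± 2s). A quick numeric check is sketched below; the 50 Hz supply, 1000 rpm synchronous speed, and 990 rpm measured speed are illustrative assumptions, not values taken from the case studies.

```python
# Numeric check of the broken-rotor-bar sideband relation, equation (2):
# f_sb = f1 * (1 +/- 2s). Supply frequency and speeds are illustrative.

def slip(sync_rpm, measured_rpm):
    """Per-unit slip from synchronous and measured shaft speeds."""
    return (sync_rpm - measured_rpm) / sync_rpm

def broken_bar_sidebands(f1, s):
    """Lower and upper twice-slip-frequency sidebands around f1."""
    return f1 * (1 - 2 * s), f1 * (1 + 2 * s)

# e.g. a machine on a 50 Hz supply with 1000 rpm synchronous speed,
# measured at 990 rpm:
s = slip(1000, 990)                        # 0.01
low, high = broken_bar_sidebands(50.0, s)  # 49.0 Hz and 51.0 Hz
```

Reversing the calculation, a detected sideband pair tightly bracketing f1 implies a very small twice-slip offset, which is why the case studies apply narrow search bands around the supply frequency.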
A Review on Intrusion Detection System for Web Based Applications
R S Jagale, M M Naoghare (Department of Computer Engineering, SVIT / Pune University, India). Email: [email protected]
ABSTRACT: Today, the Internet and applications related to it have become an important part of our day-to-day life. The Internet gives us a strong communication medium. We can manage our personal information from anywhere in the world. The Internet provides banking services, e-mail, online shopping, and many other services. These applications generate a large volume of data and complexity. To manage this increase in application and data complexity, web services use a multitier design, where the web server runs the front-end application logic and data are stored in a database server as the back end. In this paper, we present a network-based Intrusion Detection System that models the network behavior of user sessions across both the front-end web server and the back-end database. By analyzing both web and database requests, we are able to detect attacks that other IDSs would not be able to identify. Furthermore, we quantify the limitations of any multitier IDS in terms of training sessions and functionality coverage. We propose an Intrusion Detection System using an Apache web server with MySQL and lightweight virtualization, used to detect attacks in multitier web applications. Our approach can create normality models of isolated user sessions that include both the web front-end (HTTP) and back-end (file or SQL) network transactions. To achieve this, we employ a lightweight virtualization technique to assign each user's web session to a dedicated container, an isolated virtual computing environment. We use the container ID concept to accurately associate each web request with the corresponding database queries. Thus, our IDS can build a mapping profile by taking both the web server requests and the database SQL queries into consideration.
Keywords: Anomaly, Intrusion, SQL
I. INTRODUCTION
There are four major attacks possible on multi-tier web-based applications.

1.1 User Privilege Escalation Attack
Let's assume that the website serves both regular users and administrators. For a regular user, the web request Ru will generate the set of SQL queries Qu; for an administrator, the request Ra will generate the set of admin-level queries Qa. Now assume that an attacker logs into the web server as a normal user and changes his
privileges by a user privilege escalation attack. Then he triggers admin queries so as to obtain an administrator's data. This type of attack can never be detected by either the web server IDS or the database IDS, since both Ru and Qa are legitimate requests and queries. Fig. 1.1 shows how a normal user may use admin queries to obtain privileged information.

Figure 1.1: User Privilege Escalation Attack

1.2 Hijack Future User Session Attack
This type of attack is mainly done at the web server side. An attacker first takes control of the web server and thereafter hijacks all subsequent user sessions to launch attacks. For instance, by hijacking other user sessions, the attacker can send spoofed replies and/or ignore user requests. A session hijacking attack can be further categorized as a Spoofing/Man-in-the-Middle attack, a Denial-of-Service/Packet Drop attack, or a Replay attack. Fig. 1.2 illustrates a scenario wherein a compromised web server can harm all future user sessions by not generating any database queries for normal user requests.

1.3 SQL Injection Attack
In a SQL Injection attack, the attacker can concatenate the dynamic part, such as data or string contents, to the static part of the SQL query. This generates the SQL Injection attack. This attack collects important information from the back-end database. Since our approach provides two-tier detection, even if a SQL Injection attack is accepted by the web server, the contents relayed to the database server would not take on the expected structure for the given web server request. For instance, since the SQL injection attack changes the structure of the SQL queries, even if the
injected data were to go through the web server side, it would generate SQL queries in a different format that could be detected as a deviation from the SQL query format that would normally follow such a web request. Fig. 1.3 illustrates the scenario of a SQL injection attack accepted by the web server.

Figure 1.2: Hijack Future Session Attack

Figure 1.3: SQL Injection Attack

1.4 Direct Database Attack
In a direct database attack, an attacker can bypass the web server or firewalls and connect directly to the database. An attacker could also have already taken control of the web server and trigger such queries from the web server without sending web requests. The web server IDS is not able to detect such unmatched web requests. Furthermore, if these database queries were within the set of allowed queries, then the database IDS would not detect them either. Fig. 1.4 illustrates the scenario wherein an attacker bypasses the web server to directly pass SQL queries to the database.

Figure 1.4: Direct Database Attack

II. LITERATURE SURVEY
A network Intrusion Detection System can be classified into two types: anomaly detection and misuse detection. Anomaly detection first requires the IDS to define and characterize the correct and acceptable static form and dynamic behaviour of the system, which can then be
used to detect abnormal changes or anomalous behaviours [11], [12]. The boundary between acceptable and anomalous forms of stored code and data is precisely definable. Behaviour models are built by performing a statistical analysis on historical data [13], [18], [21] or by using rule-based approaches to specify behaviour patterns [20]. An anomaly detector then compares actual usage patterns against the established models to identify abnormal events. Our detection approach belongs to anomaly detection, and we depend on a training phase to build the correct model. As some legitimate updates may cause model drift, there are a number of approaches [19] that try to solve this problem. Our detection may run into the same problem; in such a case, our model should be retrained for each shift. Intrusion alert correlation [18] provides a collection of components that transform intrusion detection sensor alerts into succinct intrusion reports, in order to reduce the number of replicated alerts, false positives, and non-relevant positives. It also fuses the alerts from different levels describing a single attack, with the goal of producing a succinct overview of security-related activity on the network. It focuses primarily on abstracting the low-level sensor alerts and providing compound, logical, high-level alert events to the users. Our IDS differs from this type of approach that correlates alerts from independent IDSs. Rather, our IDS operates on multiple feeds of network traffic using a single IDS that looks across sessions to produce an alert, without correlating or summarizing the alerts produced by other independent IDSs.

An IDS such as in [17] also uses temporal information to detect intrusions. Our IDS, however, does not correlate events on a time basis, which runs the risk of mistakenly considering independent but concurrent events as correlated events. Our IDS does not have
such a limitation, as it uses the container ID for each session to causally map the related events, whether they are concurrent or not.

Virtualization is used to isolate objects and enhance security performance. Full virtualization and para-virtualization are not the only approaches being taken. An alternative is lightweight virtualization, such as OpenVZ [3], Parallels Virtuozzo [4], or Linux-VServer [16]. In general, these are based on some sort of container concept. With containers, a group of processes still appears to have its own dedicated system, yet it is running in an isolated environment. On the other hand, lightweight containers can have considerable performance advantages over full virtualization or para-virtualization: thousands of containers can run on a single physical host. There are also some desktop systems [15], [14] that use lightweight virtualization to isolate different application instances. Such virtualization techniques are commonly used for isolation and containment of attacks. However, in our IDS, we utilize the container ID to separate session traffic as a way of extracting and identifying causal relationships between web server requests and database query events.

III. IMPLEMENTATION DETAILS
We set up the threat model to include our assumptions and the types of attacks we are aiming to protect against. Attacks are generated from the network and initiated by web clients; they can launch application-layer attacks to compromise the web servers they are connecting to. The attacker can also directly attack the web site database, bypassing the web server. We assume that the attacks can neither be detected nor prevented by the currently available web server IDS, that the attacker may take control of the web server after the attack, and that afterward they
can obtain full control of the web server to launch subsequent attacks. For example, the attackers could modify the application logic of the web applications, take control of other users' web requests, or intercept and modify database-related queries to access important data beyond their user privileges.

3.1 Architecture of proposed system
All network traffic, from both legitimate users and adversaries, is received and intermixed at the same web server. If an attacker takes charge of the web server, he can badly affect all future sessions. Assigning each session to a dedicated web server is not a practical option, as it would increase the load on web server resources. To achieve similar performance while maintaining a low performance and resource overhead, we use a lightweight virtualization technique. In our IDS design approach, we make use of lightweight process containers, called containers, as disposable web servers for client sessions. It is possible to create thousands of virtualized containers on a single physical server, and these virtualized containers can be discarded or quickly reinitialized to serve new client requests.

Figure 3.1: Classic three-tier model

A single physical web server runs many containers; each one is an exact copy of the original web server. Our approach dynamically generates new containers and recycles used ones. As a result, a single physical server can run continuously and serve all web requests. However, from a logical perspective, each session is assigned to a dedicated web server and isolated from other sessions. Since we initialize each
virtualized container using a read-only clean template, we can guarantee that each session will be served with a clean web server instance at initialization. We choose to separate communications at the session level so that a single user always deals with the same web server. Sessions can represent different users to some extent, and we expect the communications of a single user to go to the same dedicated web server, thereby allowing us to identify suspect behavior by both session and user. If we detect abnormal behavior in a session, we will treat all traffic within this session as tainted.

If an attacker compromises a vanilla web server, other sessions' communications can also be hijacked. In our Intrusion Detection System, an attacker can stay only within the web server container to which he is currently connected, with no knowledge of the existence of other session communications. We can thus ensure that legitimate sessions will not be compromised directly by an attacker.

Fig. 3.1 shows the classic three-tier architecture. The database server acts as the back end, and the web server plays the role of the front-end application logic. From the database end, we are not able to tell which transaction is associated with which client request. The communication between the database server and the web server is not separated, and it is difficult to understand the relationships among them. Fig. 3.2 shows how communications are divided into separate sessions and how database transactions can be associated with a corresponding session. As expressed in Fig. 3.1, if the session of Client 2 is malicious and takes charge of the web server, then it will infect all upcoming database-related transactions, as well as the responses to clients. But, as per Fig. 3.2, Client 2 will only use VE 2, and the respective database transaction set T2 will be the only affected section of data within the database.
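The container-ID-based association of web requests with database queries described above can be sketched in Python. This is a minimal illustration under our own naming (the `SessionRouter` class and its methods are hypothetical, not the authors' implementation; real isolation would use OpenVZ-style containers rather than an in-process table):

```python
from collections import defaultdict

class SessionRouter:
    """Assign each client session to a dedicated (virtual) container and
    tag every web request and database query with that container ID, so
    the two traffic streams can later be causally associated per session.
    A sketch only, not the paper's implementation."""

    def __init__(self):
        self._next_id = 0
        self.by_session = {}               # session key -> container ID
        self.traffic = defaultdict(list)   # container ID -> tagged events

    def container_for(self, session_key):
        # Reuse the session's container, or allocate a fresh one.
        if session_key not in self.by_session:
            self.by_session[session_key] = self._next_id
            self._next_id += 1
        return self.by_session[session_key]

    def log_http(self, session_key, request):
        self.traffic[self.container_for(session_key)].append(("http", request))

    def log_sql(self, session_key, query):
        self.traffic[self.container_for(session_key)].append(("sql", query))

# A malicious session taints only its own container's event stream.
router = SessionRouter()
router.log_http("client-2", "GET /account")
router.log_sql("client-2", "SELECT * FROM accounts WHERE id = 2")
router.log_http("client-1", "GET /index.html")
```

Because every logged query carries the container ID of the session that issued it, the per-session request/query pairs needed for a mapping model fall out directly, even when many sessions run concurrently.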
3.2 Design of Normality Model
This container-based and session-separated web server architecture not only enhances the security performance but also provides us with the isolated information flows that are separated in each container session. It allows us to identify the mapping between the web server requests and the subsequent database queries, and to utilize such a mapping model to detect abnormal behaviors on a client session level. In a typical three-tiered web server architecture, the web server receives HTTP requests from clients and then issues SQL queries to the database server to fetch and modify the data. The SQL queries are mostly dependent on the web server request. We wish to model such causal mapping relationships of all legitimate traffic so as to detect abnormal/attack traffic. It is impossible to build such a mapping under a classic three-tier setup, since the web server cannot distinguish sessions from different web clients; the SQL queries are mixed and all come from the same web server.

Figure 3.2: Web server instances running in containers

It is difficult for a database server to determine which SQL queries are the results of which web requests. Even if we knew the application logic of the web server and wanted to build a correct model, it would be impossible to use such a model to detect attacks within huge amounts of concurrent real-time traffic, unless we had a mechanism to identify the combination of the HTTP request and the SQL queries generated by that request. However, within our
container-based web servers, it is a straightforward matter to identify the causal pairs of web requests and resulting SQL queries in a given session. In addition, as traffic can easily be separated by session, it is possible for us to compare and analyze the requests and queries across different sessions. Section 3.4 discusses how to build the mapping model by profiling session traffic. To that end, we put sensors at both sides of the servers. At the web server, our sensors are deployed on the host system and cannot be attacked directly, since only the virtualized containers are exposed to attackers. Our sensors will not be attacked at the database server either, as we assume that the attacker cannot completely take control of the database server. In fact, we assume that our sensors cannot be attacked and can always capture correct traffic information at both ends. Fig. 3.2 displays the locations of our sensors.

After building the mapping model, it can be used to detect abnormal behaviors of user requests. The web requests and the database queries within each session should be associated with the model. If there exists any request or query that violates the normality model within a session, then the session will be considered a possible attack.

3.3 Inferring Mapping Relations
3.3.1 Deterministic Mapping Pattern
The deterministic mapping model is the most general and perfect pattern: the web request rm appears in all traffic with the SQL query set Qn. The mapping pattern is then expressed as rm → Qn (Qn ≠ ∅). For any user session in the testing phase with the request rm, the absence of a query set Qn matching the request indicates a possible intrusion. Otherwise, if Qn is present in the user session traffic without the corresponding rm, this may also be the sign of an intrusion. In static websites,
this type of mapping accounts for the majority of cases, because the same results should be returned each time a user visits the same link.

3.3.2 Empty SQL Query Set Pattern
In special cases, the SQL query set may be the empty set, so that the web request does not generate any database queries. For example, when a web request for retrieving an image GIF file from the same web server is made, a mapping relationship does not exist because only the web requests are observed. This type of mapping is represented as rm → ∅. During the testing phase, we store these types of web requests together in the query set EQS.

3.3.3 Requests not matched with any pattern
In some situations, the web server may periodically submit SQL queries to the database server to conduct prescheduled tasks, such as database backups. These queries are not generated by any web request, which is similar to the opposite case of the Empty Query Set mapping pattern. These kinds of SQL queries cannot match any web requests, and we keep these unmatched SQL queries in a set NMR. In the testing phase, any query set within the set NMR is considered valid. The size of NMR depends on the web server application logic, but most of the time it is small.

3.3.4 Nondeterministic Mapping
The same web request may result in different SQL query sets based on input parameters or the status of the web page at the time the web request is received. However, these different SQL query sets do not appear randomly; there exists a candidate pool of query sets (e.g., {Qa, Qb, Qc, ...}). Each time the same type of web request arrives, it always matches up with one (and only one) of the query sets in the pool. The mapping pattern is rm → Qi (Qi ∈ {Qa, Qb, Qc, ...}). Therefore, it is difficult to identify traffic that matches this
pattern. This happens only within dynamic websites, such as blogs or forum sites. Fig. 3.3 illustrates all four mapping patterns.

Figure 3.3: Overall representation of mapping patterns

IV. CONCLUSION
We presented an intrusion detection system that builds models of normal behavior for multitier web applications from both front-end web (HTTP) requests and back-end database (SQL) queries. Unlike previous approaches that correlated or summarized alerts generated by independent IDSs, DoubleGuard forms a container-based IDS with multiple input streams to produce alerts. We have shown that such correlation of input streams provides a better characterization of the system for anomaly detection, because the intrusion sensor has a more precise normality model that detects a wider range of threats. We achieved this by isolating the flow of information from each web server session with lightweight virtualization.

Acknowledgement
It is a great pleasure to acknowledge those who extended their support and contributed time and energy for the completion of this paper. At the outset, I would like to thank my guide, Prof. M M Naoghare, who served as a sounding board for both content and programming work. Her valuable and skilful guidance, assessment, and suggestions from time to time improved the quality of the work in all respects. I would like to take this opportunity to express my deep sense of gratitude towards her for her invaluable contribution to the completion of this paper. I am also thankful to Prof. S M Rokade, Head of the Computer Engineering Department, for his timely guidance, inspiration
and administrative support, without which my work would not have been completed. I am also thankful to all the staff members of the Computer Engineering Department and the Librarian, Sir Visvesvaraya Institute of Technology College of Engineering, Chincoli, Nasik. Also, I would like to thank colleagues and friends who helped me directly and indirectly to complete this paper. Lastly, my special thanks to my family members for their support and cooperation during this work.

REFERENCES
Proceedings Papers:
[1] Meixing Le, Angelos Stavrou, and Brent ByungHoon Kang, "DoubleGuard: Detecting Intrusions in Multitier Web Applications", IEEE Transactions on Dependable and Secure Computing, vol. 9, no. 4, July/August 2012.
[2] SANS, The Top Cyber Security Risks, http://www.sans.org/top-cyber-security-risks/, 2011.
[3] OpenVZ, http://wiki.openvz.org, 2011.
[4] Virtuozzo Containers, http://www.parallels.com/products/pvc45/, 2011.
[5] C. Anley, Advanced SQL Injection in SQL Server Applications, technical report, Next Generation Security Software, Ltd., 2002.
[6] K. Bai, H. Wang, and P. Liu, Towards Database Firewalls, Proc. Ann. IFIP WG 11.3 Working Conf. Data and Applications Security (DBSec 05), 2005.
[7] B. Parno, J.M. McCune, D. Wendlandt, D.G. Andersen, and A. Perrig, CLAMP: Practical Prevention of Large-Scale Data Leaks, Proc. IEEE Symp. Security and Privacy, 2009.
[8] T. Pietraszek and C.V. Berghe, Defending
against Injection Attacks through Context-Sensitive String Evaluation, Proc. Int'l Symp. Recent Advances in Intrusion Detection (RAID 05), 2005.
[9] R. Sekar, An Efficient Black-Box Technique for Defeating Web Application Attacks, Proc. Network and Distributed System Security Symp. (NDSS), 2009.
[10] GreenSQL, http://www.greensql.net/, 2011.
[11] H. Debar, M. Dacier, and A. Wespi, "Towards a Taxonomy of Intrusion-Detection Systems," Computer Networks, vol. 31, no. 9, pp. 805-822, 1999.
[12] T. Verwoerd and R. Hunt, "Intrusion Detection Techniques and Approaches," Computer Comm., vol. 25, no. 15, pp. 1356-1365, 2002.
[13] C. Kruegel and G. Vigna, "Anomaly Detection of Web-Based Attacks," Proc. 10th ACM Conf. Computer and Comm. Security (CCS '03), Oct. 2003.
[14] Y. Huang, A. Stavrou, A.K. Ghosh, and S. Jajodia, "Efficiently Tracking Application Interactions Using Lightweight Virtualization," Proc. First ACM Workshop Virtual Machine Security, 2008.
[15] S. Potter and J. Nieh, "Apiary: Easy-to-Use Desktop Application Fault Containment on Commodity Operating Systems," Proc. USENIX Ann. Technical Conf., 2010.
[16] Linux-VServer, http://linux-vserver.org/, 2011.
[17] A. Seleznyov and S. Puuronen, "Anomaly Intrusion Detection Systems: Handling Temporal Relations between Events," Proc. Int'l Symp. Recent Advances in Intrusion
Detection (RAID '99), 1999.
[18] F. Valeur, G. Vigna, C. Krügel, and R.A. Kemmerer, "A Comprehensive Approach to Intrusion Detection Alert Correlation," IEEE Trans. Dependable and Secure Computing, vol. 1, no. 3, pp. 146-169, July-Sept. 2004.
[19] A. Stavrou, G. Cretu-Ciocarlie, M. Locasto, and S. Stolfo, "Keep Your Friends Close: The Necessity for Updating an Anomaly Sensor with Legitimate Environment Changes," Proc. Second ACM Workshop Security and Artificial Intelligence, 2009.
[20] M. Roesch, "Snort, Intrusion Detection System," http://www.snort.org, 2011.
[21] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna, "Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications," Proc. Int'l Symp. Recent Advances in Intrusion Detection (RAID '07), 2007.
Implementation of Enhanced Security on Vehicular Cloud Computing
Ms. Rajbhoj Supriya K.1, Prof. Pankaj R. Chandre2
Abstract: In a VC, underutilized vehicular resources, including computing power, data storage, and Internet connectivity, can be shared or rented out over the Internet to various customers. If the VC concept is to see wide adoption and to have significant societal impact, security and privacy issues need to be addressed. The main contribution is to detect and examine a number of security challenges and potential privacy threats in VCs. Even though security issues have received attention in cloud computing and vehicular networks, we identified security challenges that are special to VCs, e.g., challenges of authentication of high-mobility vehicles, scalability, and the complexity of establishing trust relationships among multiple players caused by intermittent short-range communications. We begin by describing the VC models, i.e. ad-hoc-based models, and demonstrate algorithms to improve the scalability of security schemes and to establish trust relationships among multiple players despite intermittent short-range communications.

Index Terms— Challenge analysis, cloud computing, privacy, security, vehicular cloud.

I. INTRODUCTION
In an effort to make their vehicles competitive in the market, vehicle manufacturers are offering increasingly more potent onboard devices, including powerful computers and a large collection of wireless transceivers. These devices serve a set of customers that
expect their vehicles to provide a unified extension of their home environment, populated by refined entertainment centers, access to the Internet, and other similar requirements and needs. Powerful onboard devices support new applications, including location-specific services, online gaming, and various forms of mobile infotainment.

Security and privacy problems need to be addressed if the VC concept is to be widely adopted. Traditional network systems try to prevent attackers from entering a system. In a VC, all the users, including the attackers, are equal. The attackers can be physically located on the same machine. The attackers can utilize system ambiguities to reach their goals, such as obtaining confidential records and interfering with the integrity of information and the availability of resources. Suppose that an accident has taken place and is reported to the VC. The driver responsible for the accident can conquer the VC and modify the accident record. In future, when the law enforcement or the vehicle insurance company enquires about the accident, they cannot link the accident to the driver who caused it. Casually, the security issues faced in VCs may look deceptively similar to those experienced in other networks. However, a more careful analysis exposes that many of the classic security challenges are intensified by the characteristic features of VCs, to the point where they can be construed as VC-specific. The main contributions of this work are to recognize and evaluate security challenges and privacy threats that are VC-specific and to propose a reasonable security structure that addresses some of the VC challenges recognized in this paper.

II. RELATED WORK
The security challenges in VCs are a new, interesting topic. Vehicles will be independently shared to create a cloud that can provide services to certified users. This cloud can provide real-time services, such as mobile systematic laboratories, intelligent transportation systems, and smart electric power grids. Vehicles will share the
ability of computing power, Internet access, and storage to form conventional clouds. These researchers have only focused on providing a structure for VC computing, but, as already said, the problem of privacy and security has not yet been mentioned in the literature. As pointed out by Hasan [8], cloud security has become one of the major obstacles to a widespread adoption of traditional cloud services. Generalizing the recommendations of [8], we expect that the same problems will be present in VCs.

Nowadays, vehicular ad hoc network (VANET) security and privacy have been addressed by a large number of papers. Yan et al. [9], [10] proposed active and passive location security algorithms. Public Key Infrastructure (PKI) and digital-signature-based methods have been well explored in VANETs [11]. A certificate authority (CA) generates public and private keys for nodes. The digital signature is used to validate and authenticate the sender. The main use of encryption is to reveal the content of messages only to entitled users. PKI is a method that is well suited for security purposes, particularly for roadside infrastructure. GeoEncrypt in VANETs has been proposed by Yan et al. [12]. The idea is to use the geographic location of a movable device to create a secret key. Messages are encrypted with the secret key, and the encoded texts are sent to the receiving movable device. The receiving movable device must be physically present in a certain geographic region, specified by the sender, to be able to decrypt the message.

In recent times, some attention has been given to the general security problem in clouds, although not associated with vehicular networks [13]. The simple solution is to control access to the cloud hardware facilities. This can minimize risks from insiders [14]. Santos et al. [15] proposed a new platform to achieve trust in conventional clouds. A trust coordinator maintained by an external third party is imported to validate the delivered cloud manager, which makes a set of virtual machines (VMs) such as Amazon's EC2 (i.e., Infrastructure as a Service,
IaaS) available to users. Garfinkel et al. [16] proposed a solution to prevent the owner of a physical host from retrieving and snooping on the services on the host. Berger et al. [17] and Murray et al. [18] adopted a similar solution. When a VM boots up, system information such as the basic input/output system (BIOS), system programs, and all the service applications is recorded, and a hash value is generated and transmitted to a third-party Trust Center. Periodically, the system will collect the system information of the BIOS, system programs, and all the service applications and transmit the hash value of this system information to the third-party Trust Center. The Trust Center can then evaluate the trust value of the cloud. Krautheim [19] also proposed a third party to share the responsibility of security in cloud computing between the service provider and the client, decreasing the risk disclosure to both. Jensen et al. [20] stated technical security issues of using cloud services over Internet access. Wang et al. [21], [22] proposed a public-key-based homomorphic authenticator and random masking to secure cloud data and preserve the privacy of public cloud data. The bilinear aggregate signature has been extended to simultaneously audit multiple users. Ristenpart et al. [23] presented experiments on locating co-residence of other users in cloud VMs.

III. IMPLEMENTATION DETAILS
In previous papers, Prof. Olariu has promoted the vision of vehicular clouds (VCs), a nontrivial extension, along several dimensions, of conventional cloud computing. In a VC, underutilized vehicular resources, including computing power, storage, and Internet connectivity, can be shared or rented out over the Internet to various customers. The security challenges are addressed from a novel perspective of VANETs. This system first introduces the security and privacy challenges that VC computing networks have to face, and also addresses possible security solutions.
144 International Journal of Multidisciplinary Educational Research
techniques, there are many unique challenges. For example, attackers can be physically located on the same cloud server.

The figure below shows the system block diagram: the flow of how the trust relationship is generated.

Fig 1: System Block Diagram.

The signature of the safety message can be described as follows. Following the ElGamal signature scheme, the parameters are defined:
1. Generate the global user set.
2. If the current user is in the user set, then:
3. H: a collision-free hash function;
4. p: a large prime number that ensures that computing discrete logarithms modulo p is very difficult;
5. g (smaller than p): a randomly chosen generator of the multiplicative group of integers modulo p.

Each vehicle has a long-term PKI public/private key pair:
• private key: S;
• public key: (g, p, T), where T = g^S mod p.

It should be noted that a message m can be combined as m|T, where T is the timestamp; the timestamp ensures the freshness of the message. For each message m to be signed, three steps are followed:
1. Generate a per-message public/private key pair of Sm (private) and Tm = g^Sm mod p (public).
2. Compute the message digest dm = H(m|Tm) and the signature X = Sm + dm·S mod (p − 1), where mod is the modulo operation and | is the concatenation operator.
3. Send m, Tm, and X.

To verify the message, three steps are followed:
1. Compute the message digest dm = H(m|Tm).
2. Compute Y1 = g^X and Y2 = Tm·T^dm.
3. Compare Y1 and Y2. If Y1 = Y2, then the signature is correct. The reason is:
Y1 = g^X = g^(Sm + dm·S) = g^Sm · g^(dm·S) = Tm · g^(S·dm) = Tm · T^dm = Y2.
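The sign/verify steps above can be sketched in Python. The small prime, the base, and the SHA-256 digest reduction below are illustrative assumptions for the sketch, not the paper's parameters; a real VANET deployment would use standardized large group parameters.

```python
import hashlib
import random

# Toy parameters, for illustration only (real deployments use standardized groups).
P = 2**61 - 1          # a Mersenne prime
G = 3                  # fixed base, assumed suitable for this sketch

def H(data: bytes) -> int:
    """Collision-resistant hash mapped to an exponent modulo (P - 1)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (P - 1)

def keygen():
    """Long-term key pair: private S, public T = g^S mod p."""
    S = random.randrange(2, P - 1)
    T = pow(G, S, P)
    return S, T

def sign(m: bytes, S: int):
    """Per-message pair (Sm, Tm), digest dm = H(m|Tm), signature X = Sm + dm*S mod (p-1)."""
    Sm = random.randrange(2, P - 1)
    Tm = pow(G, Sm, P)
    dm = H(m + Tm.to_bytes(8, "big"))
    X = (Sm + dm * S) % (P - 1)
    return Tm, X

def verify(m: bytes, Tm: int, X: int, T: int) -> bool:
    """Check g^X == Tm * T^dm (mod p)."""
    dm = H(m + Tm.to_bytes(8, "big"))
    Y1 = pow(G, X, P)
    Y2 = (Tm * pow(T, dm, P)) % P
    return Y1 == Y2
```

The equality holds by Fermat's little theorem, since the exponent is reduced modulo p − 1 while the values are reduced modulo the prime p.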
Fig 2: System Architecture

The above figure shows the system architecture: the sender sends the data, which is then sliced and passed to the mobile devices, and desliced again at the receiver's side, so that other users are not able to use the data in transit. Using this, we maintain the security of VC computing.

IV. RESULT AND CONCLUSION

We have first presented the security and privacy challenges that VC computing networks have to face, and have also addressed possible security solutions. Even though some of the solutions can leverage existing security techniques, there are many distinctive challenges. For example, attackers can be physically located on the same cloud server. The vehicles have high mobility, and the communication is inherently unstable and intermittent. We have provided a directional security scheme to show an appropriate security architecture that handles several, though not all, challenges in VCs. We have investigated this brand-new area and designed solutions for each individual challenge. Many applications are being developed on VCs; in the proposed work, a special application will be needed to analyze and provide security solutions. Extensive work on security and privacy in VCs will grow into a complex system and needs a systematic and synthetic way to implement intelligent transportation systems.

REFERENCES
[1] G. Yan, D. Wen, S. Olariu, and M. C. Weigle, "Security challenges in vehicular cloud computing", IEEE Trans. Intell. Transp. Syst., vol. 14, no. 1, Mar. 2013.
[2] S. Arif, S. Olariu, J. Wang, G. Yan, W. Yang, and I. Khalil, "Datacenter at the airport: Reasoning about time-dependent parking lot
occupancy", IEEE Trans. Parallel Distrib. Syst., 2012.
[3] S. Olariu, M. Eltoweissy, and M. Younis, "Toward autonomous vehicular clouds", ICST Trans. Mobile Commun. Comput., vol. 11, no. 7-9, pp. 1-11, Jul.-Sep. 2011.
[4] S. Olariu, I. Khalil, and M. Abuelela, "Taking VANET to the clouds", Int. J. Pervasive Comput. Commun., vol. 7, no. 1, pp. 7-21, 2011.
[5] G. Yan and S. Olariu, "A probabilistic analysis of link duration in vehicular ad hoc networks", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1227-1236, Dec. 2011.
[6] D. Huang, S. Misra, G. Xue, and M. Verma, "PACP: An efficient pseudonymous authentication-based conditional privacy protocol for VANETs", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 736-746, Sep. 2011.
[7] J. Li, S. Tang, X. Wang, W. Duan, and F.-Y. Wang, "Growing artificial transportation systems: A rule-based iterative design process", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 322-332, Jun. 2011.
[8] R. Hasan, Cloud Security. [Online]. Available: http://www.ragibhasan.com/research/cloudsec.html
[9] G. Yan, S. Olariu, and M. C. Weigle, "Providing VANET security through active position detection", Comput. Commun., vol. 31, no. 12, pp. 2883-2897, Jul. 2008, Special Issue on Mobility Protocols for ITS/VANET.
[10] G. Yan, S. Olariu, and M. Weigle, "Providing location security in vehicular ad hoc networks", IEEE Wireless Commun., vol. 16, no. 6, pp. 48-55, Dec. 2009.
[11] J. Sun, C. Zhang, Y. Zhang, and Y. M. Fang, "An identity-based security system for user privacy in vehicular ad hoc networks", IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 9, pp. 1227-1239, Sep. 2010.
[12] G. Yan and S. Olariu, "An efficient geographic location-based security mechanism for vehicular ad hoc networks", in Proc. IEEE Int. Symp. TSP, Macau SAR, China, Oct. 2009, pp. 804-809.
[13] A. Friedman and D. West, "Privacy and security in cloud computing", Center for Technology Innovation: Issues in Technology Innovation, no. 3, pp. 1-11, Oct. 2010.
[14] J. A. Blackley, J. Peltier, and T. R. Peltier, Information Security Fundamentals. New York: Auerbach, 2004.
[15] N. Santos, K. P. Gummadi, and R. Rodrigues, "Toward trusted cloud computing", in Proc. HotCloud, Jun. 2009.
[16] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra: A virtual machine-based platform for trusted computing", in Proc. ACM SOSP, 2003, pp. 193-206.
[17] S. Berger, R. Cáceres, K. A. Goldman, R. Perez, R. Sailer, and L. van Doorn, "vTPM: Virtualizing the trusted platform module", in Proc. 15th USENIX Security Symp., Berkeley, CA, 2006, pp. 305-320.
[18] D. G. Murray, G. Milos, and S. Hand, "Improving Xen security through disaggregation", in Proc. 4th ACM SIGPLAN/SIGOPS Int. Conf., New York, 2008, pp. 151-160.
[19] F. J. Krautheim, "Private virtual infrastructure for cloud computing", in Proc. Conf. Hot Topics Cloud Comput., 2009, pp. 1-5.
[20] M. Jensen, J. Schwenk, N. Gruschka, and L. L. Iacono, "On technical security issues in cloud computing", in Proc. IEEE Int. Conf. Cloud Comput., 2009, pp. 109-116.
[21] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-preserving public auditing for data storage security in cloud computing", in Proc. IEEE INFOCOM, San Diego, CA, 2010, pp. 1-9.
[22] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing", in Proc. 14th ESORICS, 2009, pp. 355-370.
[23] F.-Y. Wang, "Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications", IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 630-638, Sep. 2010.
[24] H. Xie, L. Kulik, and E. Tanin, "Privacy-aware traffic monitoring", IEEE Trans. Intell. Transp. Syst., vol. 11, no. 1, pp. 61-70, Mar. 2010.
[25] L. Li, J. Song, F.-Y. Wang, W. Niehsen, and N. Zheng, "IVS 05: New developments and research trends for intelligent vehicles", IEEE Intell. Syst., vol. 20, no. 4, pp. 10-14, Jul./Aug. 2005.
DOCUMENT CLUSTERING FOR FORENSIC ANALYSIS & INVESTIGATION
Mr. Dhokane R.M.1, 1SVIT, Chincholi, Tal: Sinner, Dist: Nashik. [email protected]
Prof. Rokade S.M.2 2 SVIT, Chincholi, Tal: Sinner, Dist: Nashik. [email protected]
Abstract: In computer forensic analysis, millions of files are usually examined. Most of the data in those files consists of unstructured text, whose examination by computer examiners is difficult to perform. In this context, automatic methods of analysis are of great interest. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. I present an approach that applies document clustering algorithms to the forensic investigation of computers seized in police investigations. I illustrate the proposed approach by carrying out extensive experimentation with six well-known clustering algorithms (K-medoids, K-means, Complete Link, Single Link, Average Link, and CSPA) applied to five real-world datasets obtained from computers seized in real-world investigations. Experiments have been performed with different combinations of parameters, resulting in sixteen different instantiations of algorithms. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the
Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, I also present and discuss several practical results that can be useful for researchers and practitioners of forensic computing.

Keywords—Clustering, forensic computing, text mining.

I. INTRODUCTION

It is estimated that the volume of data in the digital world increased to about 18 times the amount of information present in all the books ever written, and it continues to grow exponentially. This large amount of data has a direct impact on Computer Forensics, which can be broadly defined as the discipline that combines elements of law and computer science to collect and analyze data from computer systems in a way that is admissible as evidence in a court of law. In our particular application domain, analysis usually involves examining hundreds of thousands of files per computer. This activity exceeds the expert's ability to analyze and interpret the data. Therefore, methods for automated data analysis, like those widely used in machine learning and data mining, are of paramount importance. In particular, algorithms for pattern recognition from the information present in text documents are promising, as will hopefully become evident later in the paper.

Clustering algorithms are usually used for exploratory data analysis, where there is little or no prior knowledge about the data [2], [3]. This is precisely the case in several applications of Computer Forensics, including the one addressed in our work. From a more technical point of
view, our datasets consist of unlabeled objects; the classes or categories of documents that can be found are a priori unknown. Moreover, even assuming that labeled datasets could be available from previous analyses, there is almost no hope that the same classes (possibly learned earlier by a classifier in a supervised learning setting) would still be valid for the upcoming data obtained from other computers and associated with different investigation processes. More precisely, it is likely that the new data sample would come from a different population. In this context, the use of clustering algorithms, which are capable of finding latent patterns in text documents found on seized computers, can enhance the analysis performed by the expert examiner.

The rationale behind clustering algorithms is that objects within a valid cluster are more similar to each other than they are to objects belonging to a different cluster [2], [3]. Thus, once a data partition has been induced from the data, the expert examiner might initially focus on reviewing representative documents from the obtained set of clusters. Then, after this preliminary analysis, (s)he may eventually decide to scrutinize other documents from each cluster. By doing so, one can avoid the hard task of examining all the documents individually, although, even if so desired, that still could be done. In a more practical and realistic scenario, domain experts (e.g., forensic examiners) are scarce and have limited time available for performing examinations. Thus, it is reasonable to assume that, after finding a relevant document, the examiner could prioritize the analysis of other documents belonging to the cluster of interest, because these are likely to be also relevant to the investigation. Such an approach, based on document clustering, can indeed improve the analysis of seized computers, as will be discussed in more detail later.
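As a toy illustration of this review-prioritization strategy, the helper below (the names are hypothetical, not from the paper) returns the other documents sharing a cluster with a document already found to be relevant:

```python
def same_cluster_documents(doc_ids, labels, relevant_id):
    """Given a partition (one label per document), return the other documents
    in the cluster of a document already judged relevant, so the examiner can
    review those first."""
    target = labels[doc_ids.index(relevant_id)]
    return [d for d, lab in zip(doc_ids, labels)
            if lab == target and d != relevant_id]
```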
Clustering algorithms have been studied for decades, and the literature on the subject is huge. Therefore, I decided to choose a set of six representative algorithms in order to show the potential of the proposed approach, namely: the partitional K-means [3] and K-medoids [4], the hierarchical Single/Complete/Average Link [5], and the cluster ensemble algorithm known as CSPA [6]. These algorithms were run with different combinations of their parameters, resulting in sixteen different algorithmic instantiations, as shown in Table I.

TABLE I: SUMMARY OF ALGORITHMS AND THEIR PARAMETERS
(Table not reproduced.)

Thus, as a contribution of our work, I compare their relative performances on the studied application domain using five real-world investigation cases conducted by the Brazilian Federal Police Department. In order to make the comparative analysis of the algorithms more realistic, two relative validity indexes (the silhouette [4] and its simplified version [7]) have been used to estimate the number of clusters automatically from the data. It is well known that the number of clusters is a critical parameter of many algorithms and is usually a priori unknown. As far as I know, however, the automatic estimation of the number of clusters has not been investigated in the Computer Forensics literature. Actually, I could not even locate one work that is reasonably close in its application domain and that reports the use of algorithms capable of estimating the number of clusters. Perhaps even more surprising is the lack of studies on hierarchical clustering algorithms, which date back to the sixties. Our study considers such classical algorithms, as well
as recent developments in clustering, such as the use of consensus partitions [6]. The present paper extends our previous work [23], where nine different instantiations of algorithms were analyzed. As previously mentioned, in our current work I employ sixteen instantiations of algorithms. In addition, I provide more insightful quantitative and qualitative analyses of their experimental results in our application domain.

The remainder of this paper is organized as follows. Section II presents the literature survey. Section III briefly addresses the implementation, i.e., the adopted clustering algorithms and preprocessing steps. Section IV reports our experimental results, and Section V addresses some future work of our study. Finally, Section VI concludes the paper.

II. LITERATURE SURVEY

1. The digital world increased from 161 exabytes in 2006 to 988 exabytes in 2010 [1], about 18 times the amount of information present in all the books ever written, and it continues to grow exponentially. This large amount of data has a direct impact on Computer Forensics, which can be broadly defined as the discipline that combines elements of law and computer science to collect and analyze data from computer systems in a way that is admissible as evidence in a court of law.

2. Clustering algorithms are typically used for exploratory data analysis, where there is little or no prior knowledge about the data [2], [3]. This is precisely the case in several applications of Computer Forensics, including the one addressed in our work. From a more technical viewpoint, our datasets consist of unlabeled objects; the classes or categories of documents that can be found are a priori unknown.

3. There are only a few studies reporting the use of clustering algorithms in the Computer Forensics field. Essentially, most of the studies describe the use of classic algorithms for clustering data, e.g., Expectation-
Maximization (EM) for unsupervised learning of Gaussian Mixture Models, K-means, Fuzzy C-means (FCM), and Self-Organizing Maps (SOM). These algorithms have well-known properties and are widely used in practice. For instance, K-means and FCM can be seen as particular cases of EM [21]. Algorithms like SOM [22], in their turn, generally have inductive biases similar to K-means, but are usually less computationally efficient. In [8], SOM-based algorithms were used for clustering files with the aim of making the decision-making process performed by the examiners more efficient. The files were clustered by taking into account their creation dates/times and their extensions. This kind of algorithm has also been used in [9] in order to cluster the results of keyword searches. The underlying assumption is that the clustered results can increase the information retrieval efficiency, because it would not be necessary to review all the documents found by the user anymore. An integrated environment for mining e-mails for forensic analysis, using classification and clustering algorithms, was presented in [10]. In a related application domain, e-mails are grouped by using lexical, syntactic, structural, and domain-specific features [11]; three clustering algorithms (K-means, Bisecting K-means, and EM) were used. The problem of clustering e-mails for forensic analysis was also addressed in [12], where a kernel-based variant of K-means was applied. The obtained results were analyzed subjectively, and the authors concluded that they are interesting and useful from an investigation perspective.

4. More recently [13], an FCM-based method for mining association rules from forensic data was described. The literature on Computer Forensics only reports the use of algorithms that assume that the number of clusters is known and fixed a priori by the user. Aimed at relaxing this assumption, which
is often unrealistic in practical applications, a common approach in other domains involves estimating the number of clusters from the data. Essentially, one induces different data partitions (with different numbers of clusters) and then assesses them with a relative validity index in order to estimate the best value for the number of clusters [2], [3], [14]. This work makes use of such methods, thus potentially facilitating the work of the expert examiner, who in practice would hardly know the number of clusters a priori.

III. IMPLEMENTATION DETAILS

A. Pre-Processing Steps
Before running clustering algorithms on the text datasets, I performed some preprocessing steps. In particular, stop words (prepositions, pronouns, articles, and irrelevant document metadata) have been removed. Also, the Snowball stemming algorithm for Portuguese words has been used. Then, I adopted a traditional statistical approach for text mining, in which documents are represented in a vector space model [15]. In this model, each document is represented by a vector containing the frequencies of occurrence of words, which are defined as delimited alphabetic strings whose number of characters is between 4 and 25. I also used a dimensionality reduction technique known as Term Variance (TV) [16], which can increase both the effectiveness and the efficiency of clustering algorithms. TV selects a number of attributes (in our case, 100 words) that have the greatest variances over the documents. In order to compute distances between documents, two measures have been used, namely: the cosine-based distance [15] and the Levenshtein-based distance [17]. The latter has been used to calculate distances between file (document) names only.
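The pipeline above can be sketched as follows. This is a minimal illustration of the vector space model, Term Variance selection, cosine distance, and Levenshtein distance, not the authors' implementation (which additionally applies stop-word removal and Snowball stemming):

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Delimited alphabetic strings of 4-25 characters, as in Section III-A."""
    return [w.lower() for w in re.findall(r"[A-Za-z]{4,25}", text)]

def tf_vectors(docs):
    """Represent each document as a term-frequency vector (vector space model)."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[w] for w in vocab] for c in counts], vocab

def term_variance_select(vectors, vocab, n_terms):
    """Keep the n_terms attributes with the greatest variance over the documents."""
    n_docs = len(vectors)
    variances = []
    for j in range(len(vocab)):
        col = [v[j] for v in vectors]
        mean = sum(col) / n_docs
        variances.append(sum((x - mean) ** 2 for x in col) / n_docs)
    keep = sorted(range(len(vocab)), key=lambda j: -variances[j])[:n_terms]
    return [[v[j] for j in keep] for v in vectors], [vocab[j] for j in keep]

def cosine_distance(u, v):
    """1 - cosine similarity; distance 1.0 for a zero vector by convention."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def levenshtein(a, b):
    """Edit distance, used in the paper for file names only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]
```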
B. Estimating the Number of Clusters from Data
In order to estimate the number of clusters, a widely used approach consists of obtaining a set of data partitions with different numbers of clusters and then selecting the particular partition that provides the best result according to a specific quality criterion (e.g., a relative validity index [2]-[5]). Such a set of partitions may result directly from a hierarchical clustering dendrogram or, alternatively, from multiple runs of a partitional algorithm (e.g., K-means) starting from different numbers and initial positions of the cluster prototypes (e.g., see [14] and references therein).

For the moment, let us assume that a set of data partitions with different numbers of clusters is available, from which I want to choose the best one according to some relative validity criterion. Note that, by choosing such a data partition, I am performing model selection and, as an intrinsic part of this process, I am also estimating the number of clusters.

A widely used relative validity index is the so-called silhouette [4], which has also been adopted as a component of the algorithms employed in our work. Therefore, it is helpful to define it even before I address the clustering algorithms used in our study. Let us consider an object i belonging to cluster A. The average dissimilarity of i to all other objects of A is denoted by a(i). Now let us take into account a cluster C ≠ A. The average dissimilarity of i to all objects of C will be called d(i, C). After computing d(i, C) for all clusters C ≠ A, the smallest one is selected, i.e., b(i) = min d(i, C), C ≠ A. This value represents the dissimilarity of i to its neighbor cluster, and the silhouette for a given object, s(i), is given by:

s(i) = (b(i) - a(i)) / max{a(i), b(i)}    (1)

It can be verified that -1 ≤ s(i) ≤ +1. Thus, the higher s(i), the better the assignment of object i to
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 a given cluster. In addition, if is centroid. Thus, it is necessary to equal to zero, then it is not clear compute only one distance to get whether the object should have the a (i) value, instead of been assigned to its current calculating all the distances cluster or to a neighboring one Between and the other objects of [4]. Finally, if cluster A is a A. Similarly, instead of singleton, then s (i) is not defined computing d (i,C) as the average and the most neutral choice is to dissimilarity of to all objects of set s (i) =0 [4]. Once I have C, C≠A, I can now compute the computed s (i) over i=1, 2, 3….N, distances between I and the where N is the number of objects centroid of C. in the dataset, I take the average Note that the over these values, and the computation of the original resulting value is then a Figure [4], as well as of its quantitative measure of the data simplified version [7], [14], partition in hand. Thus, the best depends only on the achieved clustering corresponds to the partition and not on the adopted data partition that has the clustering algorithm. Thus, these maximum average Figure. Figures can be applied to assess The average Figure just partitions (taking into account addressed depends on the the number of clusters) obtained computation of all distances by several clustering algorithms, among all objects. In order to as the ones employed in our come up with amore study and addressed in the computationally efficient sequel. criterion, called simplified C. Clustering Algorithms Figure, one can compute only the The clustering algorithms distances among the objects and adopted in our study the the centroids of the clusters. The partitional K-means [2] and K- term a (i) of (1) now corresponds medoids [4], the hierarchical to the dissimilarity of object to its Single/Complete/Average Link corresponding cluster (A) [5], and the cluster ensemble
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 based algorithm known as CSPA which distant objects from each [6] are popular in the machine other are chosen as starting learning and data mining fields, prototypes [18]. Unlike the and therefore they have been partitional algorithms such as K- used in our study. Nevertheless, means/medoids, hierarchical some of our choices regarding algorithms such as Single/ their use deserve further Complete/Average Link provide a comments. For instance, K- hierarchical set of nested medoids [4] is similar to K- partitions [3], usually means. However, instead of represented in the form of a computing centroids, it uses dendro gram, from which the best medoids, which are the number of clusters can be representative objects of the estimated. In particular, one can clusters. This property makes it assess the quality of every particularly interesting for partition represented by the applications in which (i) dendro gram, subsequently Centroids cannot be computed; choosing the one that provides and (ii) Distances between pairs the best results [14]. of objects are available, as for The CSPA algorithm [6] computing dissimilarities essentially finds a consensus between names of documents clustering from a cluster with the Levenshtein distance ensemble formed by a set of [17]. different data partitions. More Considering the precisely, after applying partitional algorithms, it is clustering algorithms to the data, widely known that both K-means a similarity (co association) and K-medoids are sensitive to matrix [19] is computed. Each initialization and usually element of this matrix represents converge to solutions that pair-wise similarities between represent local minima. Trying to objects. The similarity between minimize these problems, I used two objects is simply the fraction a nonrandom initialization in of the clustering solutions in
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 which those two objects lie in the the relative validity index) is same cluster. Later, this taken as the result of the similarity measure is used by a clustering process. For each clustering algorithm that can partitional algorithm (K- process a proximity matrix e.g., means/medoids), K-medoids to produce the final I execute it repeatedly for consensus clustering. The sets of an increasing number of clusters. data partitions (clustering) were For each value of K, a number of generated in two different ways: partitions achieved from different (a) by running K-means 100 initializations are assessed in times with different subsets of order to choose the best value of attributes (in this case CSPA K and its corresponding data processes 100 data partitions); partition, using the Figure [4] And (b) by using only two data and its simplified version [7], partitions, namely: one obtained which showed good results in by K-medoids from the [14] and is more computationally dissimilarities between the file efficient. In our experiments, I names, and another partition assessed all possible values of K achieved with K-means from the in the interval [2, N], where N is vector space model. In this case, the number of objects to be each partition can have different clustered. weights, which have been varied D. Dealing with Outliers between 0 and 1 (in increments I assess a simple approach to of 0.1 and keeping their sum remove outliers. This approach equals to 1). For the hierarchical makes recursive use of the algorithms Figure. Fundamentally, if the (Single/Complete/Average Link), best partition chosen by the I simply run them and then Figure has singletons (i.e., assess every partition from the clusters formed by a single object resulting dendro gram by means only), these are removed. Then, of the Figure [4]. 
Then, the best the clustering process is repeated partition (elected according to over and over again until a
IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 partition without singletons is I used five datasets found. At the end of the process, obtained from real-world all singletons are incorporated investigation cases conducted by into the resulting data partition the Brazilian Federal Police (for evaluation purposes) as Department. Each dataset was single clusters. Table I obtained from a different hard summarizes the clustering drive, being selected all the non- algorithms used in our work and duplicate documents with their main characteristics. extensions “doc”, “docx”, and “odt”. Subsequently, those E. Experimental evaluation documents were converted into 1. Datasets: plain text format and Sets of documents that preprocessed as described in appear in computer forensic Section III-A. The obtained data analysis applications are quite partitions were evaluated by diversified. In particular, any taking into account that I have a kind of content that is digitally reference partition (ground truth) compliant can be subject to for every dataset. Such reference investigation. In the datasets partitions have been provided by assessed in our study, for an expert examiner from the instance, there are textual Brazilian Federal Police documents written in different Department, who previously languages (Portuguese and inspected every document from English). Such documents have our collections. The datasets been originally created in contain varying amounts of different file formats, and some documents (N), groups (K), of them have been corrupted or attributes (D), singletons (S), and are actually incomplete in the number of documents per group sense that they have been (#), as reported in Table II. (partially) recovered from deleted data.
TABLE II: DATASET CHARACTERISTICS
(Table not reproduced.)

2. Evaluation Measure: From a scientific perspective, the use of reference partitions for evaluating data clustering algorithms is considered a principled approach. In controlled experimental settings, reference partitions are usually obtained from data generated synthetically according to some probability distributions. From a practical standpoint, reference partitions are usually obtained in a different way, but they are still employed to choose a particular clustering algorithm that is more appropriate for a given application, or to calibrate its parameters. In our case, reference partitions were constructed by a domain expert and reflect the expectations that (s)he has about the clusters that should be found in the datasets. In this sense, the evaluation method that I used to assess the obtained data partitions is based on the Adjusted Rand Index (ARI) [3], [20], which measures the agreement between a partition P, obtained from running a clustering algorithm, and the reference partition R given by the expert examiner. More specifically, ARI ∈ [0, 1], and the greater its value, the better the agreement between P and R.

F. Data Flow Diagram
(Diagram not reproduced.)
IV. RESULTS AND DISCUSSION

Table III summarizes the obtained ARI results for the algorithms listed in Table I. In general, AL100 (the Average Link algorithm using the 100 terms with the greatest variances, cosine-based similarity, and the Figure criterion) provided the best results with respect to both the average and the standard deviation, thus suggesting great accuracy and stability. Note also that an ARI value close to 1.00 indicates that the respective partition is very consistent with the reference partition; this is precisely the case here. In this table, I only report the best obtained results for the algorithms that search for a consensus partition between file name and content (NC100 and NC), i.e., partitions whose weights for name/content resulted in the greatest ARI value.

TABLE III. ADJUSTED RAND INDEX (ARI) RESULTS

Fig. 1. Figure values for Kms100, Kms100*, and AL100 (dataset E).

The ARI values for CL100 are similar to those found by AL100. Single Link (SL100), in its turn, presented worse results than its hierarchical counterparts, especially for datasets A and B. This result can be explained by the presence of outliers, whose chain effect is known to impact Single Link performance [2]. The results achieved by Kmd100* and KmsT100* were also very good and competitive to the best hierarchical algorithms (AL100 and CL100). I note that, as expected, a
good initialization procedure, such as the one described in [18], provides the best results. Particularly, the initialization on distant objects can minimize the K-means/medoids problems with respect to local minima. To illustrate this aspect, Fig. 1 shows the Figure values as a function of K for three algorithms (Kms100, Kms100*, and AL100). One can observe that Kms100 (with random initialization of prototypes) presents more local maxima for the Figure (recall that these were obtained from local minima of K-means), yielding less stable results. Conversely, Kms100* (initialization on distant objects [18]) has fewer local maxima, being more stable. This trend has also been observed in the other datasets. Surprisingly, Kms100* has curves similar to those of AL100, especially for higher values of K. This fact can be explained, in part, because both algorithms tend to separate outliers. It can also be observed that Kms100* got slightly better results than its variant with random initialization of prototypes (Kms100).

Fig. 2. Figure values for Kms100 (dataset D).

However, the ARI values for Kms100*, which uses the simplified Figure to estimate K, are worse than those obtained for KmsT100*, Kmd100*, and the hierarchical algorithms. Recall that these algorithms use the traditional version of the Figure. This observation suggests that the simplified Figure provided worse estimates for the values of K. This can be explained by the fact that the simplified Figure essentially computes distances between objects and centroids, whereas the traditional Figure takes all the pairwise distances between objects into
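The contrast between the two variants of the criterion can be sketched as follows. This is an assumption-laden illustration: the paper refers to the criterion only as "the Figure", and the code below implements the silhouette-style form (b − a)/max(a, b) suggested by the description (all pairwise distances versus object-to-centroid distances); the function names are mine:

```python
import math

def _group(X, labels):
    clusters = {}
    for x, l in zip(X, labels):
        clusters.setdefault(l, []).append(x)
    return clusters

def traditional_criterion(X, labels):
    """Uses all pairwise distances between objects (cost O(N^2 * D))."""
    clusters = _group(X, labels)
    scores = []
    for x, l in zip(X, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)                      # singleton convention
            continue
        # compactness: average distance to the other members of own cluster
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # separability: average distance to the nearest other cluster
        b = min(sum(math.dist(x, y) for y in c) / len(c)
                for k, c in clusters.items() if k != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def simplified_criterion(X, labels):
    """Uses only distances to cluster centroids (cost O(K * N * D))."""
    clusters = _group(X, labels)
    cent = {k: tuple(sum(col) / len(c) for col in zip(*c))
            for k, c in clusters.items()}
    scores = []
    for x, l in zip(X, labels):
        a = math.dist(x, cent[l])
        b = min(math.dist(x, cent[k]) for k in cent if k != l)
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)
```

On a toy partition the simplified variant indeed reports a higher value, because the object-to-centroid distance underestimates the average intra-cluster distance, as discussed in the text.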
account in order to make the estimate. In other words, there is some information loss in the computation of the simplified Figure, which indeed provided smaller values for K than both the Figure and the reference partition in four out of five datasets. I also point out that both the Figure and its simplified version estimate the number of clusters by taking into account two concepts: cluster compactness (average intra-cluster dissimilarity) and cluster separability (inter-cluster dissimilarity). Both are materialized by computing average distances. I observed that the average distance from a given object to all the objects of a cluster tends to be greater than the distance of that object to the cluster's centroid. Moreover, such a difference tends to be greater when computing intra-cluster distances (compactness) than when calculating inter-cluster distances (separability). In other words, it is more likely that the simplified Figure underestimates the intra-cluster distances.

As a consequence, in the trade-off relationship between cluster compactness and cluster separability, the simplified Figure value tends to be higher than the one found by the Figure for a given data partition. Also, the higher the number of clusters, the better the simplified Figure estimate for the compactness, approaching the value estimated by the Figure and becoming equal in the extreme case (K=N), as shown in Fig. 2 (dataset D). Similar figures have been observed for the other datasets. Thus, the simplified Figure favors lower values for K than the Figure.

The use of the file names to compute dissimilarities between documents in principle seemed to be interesting, because one usually chooses meaningful file names. However, it turns out that, empirically, the use of only the file names to compute
the dissimilarity between documents did not bring good results in general, e.g., see the results for KmdLev and KmdLevS in Table III. On the other hand, these results may not be considered surprising, because the name of the file provides less information than the file content. However, there is an exception to this general behavior that can be observed from the relatively good results of these algorithms on dataset D, thus suggesting that the file name, although less informative than the file content, can help the clustering process.

Fig. 3. ARI values for the NC and NC100 algorithms (dataset C).

In addition, let us now analyze Fig. 3 (for dataset C), which shows the ARI values for the algorithms that search for a consensus partition between those formed by name and content. The ARI values are shown for different weights that capture the (imposed) relative importance of name and content. Some particular combinations of values for the weights, like when the content weight is comprised between 0.1 and 0.5, provided worse results compared to the standalone use of either name or content, thus suggesting that mixing information of different natures may be prejudicial to the clustering process. However, one can see that NC100 shows a (secondary) peak of the ARI value (for content weight equal to 0.6). Although the primary peak suggests using only the data partition found from content information, it seems that adding information about the file name may indeed be useful. To wrap up this discussion, the use of the file name can help the clustering process, but it seems to be highly data dependent.

Let us now focus on the performance of E100. Recall from
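The name/content combination evaluated in Fig. 3 can be sketched as a convex combination of two dissimilarities; the KmdLev variants presumably use the Levenshtein edit distance [17] over file names. The helper names and the exact combination rule below are my assumptions, not the paper's implementation:

```python
def levenshtein(a, b):
    """Edit distance [17]; a natural dissimilarity for file names."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def combined_dissimilarity(name_d, content_d, w_content):
    """Convex combination of the two sources of information:
    w_content = 1.0 is pure content, 0.0 is pure file name."""
    return w_content * content_d + (1.0 - w_content) * name_d
```

Sweeping `w_content` from 0.0 to 1.0 and clustering with each combined dissimilarity reproduces the kind of weight-versus-ARI curve shown in Fig. 3.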
Table I that this is a particular instantiation of the CSPA [6] algorithm. More specifically, it obtains a consensus partition from a set of partitions generated by K-means by randomly selecting 100 different attributes. Because of both the high sparsity of the data matrix (common to text datasets) and the random attribute selection, many documents are represented by very sparse vectors. Consequently, such documents have been grouped into a single cluster, whose centroid is also very sparse and with component values close to zero. Such centroids induce misleading partitions, which are the inputs for the computation of the co-association matrix. Thus, a misleading consensus clustering is obtained. Therefore, the choice of random sets of attributes to generate partitions for consensus clustering algorithms seems to be an inappropriate approach for such text data.

TABLE IV. EXAMPLE OF THE INFORMATION FOUND IN THE CLUSTERS

Considering the algorithms that recursively apply the Figure for removing singletons (KmsS and Kms100S), Table III shows that their results are relatively worse when compared to the related versions that do not remove singletons (Kms and Kms100). However, KmdLevS, which is based on the similarities between file names, presented results similar to those found by its related version that does not remove singletons (KmdLev). In principle, one could expect that the removal of outliers, identified from carefully analyzing the singletons, could yield better clustering results. However, I have not observed this potentially good property in our experiments and, as
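The co-association matrix mentioned above accumulates, over the ensemble, how often each pair of documents is clustered together (evidence accumulation [19]); a consensus partition can then be derived from it. A minimal sketch, with the function name my own:

```python
def co_association(partitions):
    """Fraction of ensemble partitions in which objects i and j fall
    into the same cluster; every entry lies in [0, 1]."""
    n = len(partitions[0])
    m = len(partitions)
    C = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += 1.0 / m
    return C
```

If many documents collapse into one sparse-centroid cluster in most base partitions, their mutual co-association entries approach 1 even when the documents are unrelated, which is exactly the failure mode described above.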
expected, this aspect is rather data-dependent. As such, these algorithms may potentially be helpful for dealing with other datasets.

As far as the adopted dimensionality reduction technique is concerned, Term Variance (TV) [16], I observed that selecting the 100 attributes (words) that have the greatest variance over the documents provided better results than using all the attributes in three out of five datasets (see Table III). Compared to Kms and KmsS, the worse results obtained from feature selection by Kms100 and Kms100S, especially in dataset D, are likely due to k-means convergence to local optima from bad initialization. Considering all the results obtained from feature selection, I believe that it should be further studied, mainly because of the potentially advantageous computational efficiency gains.

Finally, from a practical viewpoint, a variety of relevant findings emerged from our study. It is worth stressing that, for all the investigated datasets, the best data partitions are formed by clusters containing either relevant or irrelevant documents. For example, in dataset C, the algorithm AL100 obtained a data partition formed by some singletons and by 15 other clusters (C1, C2, C3, …, C15) whose information is listed in Table IV.

Fig. 4. Dendrogram obtained by AL100 (dataset A).

For obvious reasons, I cannot reveal more detailed, confidential information. However, I can mention that, in this real-world financial crime investigation, clusters of relevant documents have been obtained, such as C10, C14, and C2, which contain financial or exchange transaction information. Also, other obtained
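Term Variance selection as described here (keep the terms whose frequency varies most across the documents) can be sketched as follows, with the document-term matrix encoded as a list of rows; names are mine:

```python
def term_variance_selection(doc_term, n_terms):
    """Keep the n_terms columns (terms) with the greatest variance
    over the documents (TV criterion [16])."""
    n_docs = len(doc_term)
    variances = []
    for j in range(len(doc_term[0])):
        col = [row[j] for row in doc_term]
        mean = sum(col) / n_docs
        variances.append(sum((v - mean) ** 2 for v in col) / n_docs)
    # rank attributes by variance, highest first, and keep the top n_terms
    ranked = sorted(range(len(variances)), key=variances.__getitem__,
                    reverse=True)
    keep = sorted(ranked[:n_terms])
    return [[row[j] for j in keep] for row in doc_term], keep
```

A term that appears with the same frequency in every document has zero variance and carries no information for separating clusters, so it is dropped first.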
clusters have only irrelevant documents, e.g., C12, which contains label designs. These clusters of either relevant or irrelevant documents can help computer examiners to efficiently focus on the most relevant documents without having to inspect them all. In summary, document clustering has great potential to be useful for computer inspection.

As a final remark, a desirable feature of hierarchical algorithms that makes them particularly interesting for expert examiners is the summarized view of the dataset in the form of a dendrogram, which is a tree diagram that illustrates the arrangement of the clusters. The root node of the dendrogram represents the whole data set (as a single cluster formed by all objects), and each leaf node represents a particular object (as a singleton cluster). The intermediate nodes, in their turn, represent clusters merged hierarchically. The height of an intermediate node is proportional to the distance between the clusters it merges. This representation provides very informative descriptions and visualizations of the potential data clustering structures [5], thus being a helpful tool for forensic examiners who analyze textual documents from seized computers.

As I already discussed, the ultimate clustering results can be obtained by cutting the dendrogram at different levels, e.g., by using a relative validity criterion like the Figure. For the sake of illustration, Figs. 4–8 show examples of dendrograms obtained by AL100. The dendrograms were cut horizontally at the height corresponding to the number of clusters estimated by the Figure (dashed line). Roughly speaking, subtrees with low height and large width represent both cohesive and large clusters. These clusters are good candidates for a starting point of the inspection. Moreover, the forensic examiner can, after finding a cluster of relevant documents, inspect the cluster most similar to the one just
found, because it is likely that it is also a relevant cluster. This can be done by taking advantage of the tree diagram.

Fig. 5. Dendrogram obtained by AL100 (dataset B).
Fig. 6. Dendrogram obtained by AL100 (dataset C).
Fig. 7. Dendrogram obtained by AL100 (dataset D).
Fig. 8. Dendrogram obtained by AL100 (dataset E).

V. FUTURE SCOPE

It is well known that the success of any clustering algorithm is data dependent, but for the assessed datasets some of our adaptations of existing algorithms have shown to be good enough. Scalability may be an issue, however. In order to deal with this issue, a number of sampling and other techniques can be used, e.g., see [24]–[29]. Also, algorithms such as bisecting k-means and related approaches can be used. These algorithms can also induce dendrograms, and have a similar inductive bias with respect to the hierarchical methods considered in our work. More precisely, and aimed at circumventing computational
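Cutting the dendrogram at the height that yields K clusters is equivalent to stopping an agglomerative Average Link run once K clusters remain. A naive, quadratic pure-Python sketch for illustration (production code would use an optimized library implementation):

```python
import math

def average_link_cut(points, k):
    """Agglomerative average-link clustering stopped at k clusters,
    i.e., a horizontal cut of the dendrogram; returns flat labels."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average of all pairwise distances between the two clusters
                avg = sum(math.dist(points[i], points[j])
                          for i in clusters[a] for j in clusters[b])
                avg /= len(clusters[a]) * len(clusters[b])
                if best is None or avg < best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] += clusters[b]     # merge the two closest clusters
        del clusters[b]
    labels = [0] * len(points)
    for ci, members in enumerate(clusters):
        for i in members:
            labels[i] = ci
    return labels
```

In practice the criterion ("the Figure") chooses k, and the cut at that height yields the flat partition handed to the examiner.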
difficulties, partitional clustering algorithms can be used to compute a hierarchical clustering solution by using repeated cluster bisectioning approaches. For instance, bisecting k-means has relatively low computational requirements, i.e., it is O(N·log N), versus the overall time complexity of O(N²·log N) for agglomerative methods. Since the inductive biases of bisecting k-means and the hierarchical algorithms used in our work are similar, I believe that, if the number of documents is prohibitively high for running agglomerative algorithms, then bisecting k-means and related approaches can be used.

Considering the computational cost of estimating the number of clusters, the Figure proposed in [4] depends on the computation of all distances between objects, leading to an estimated computational cost of O(N²·D), where N is the number of objects in the dataset and D is the number of attributes, respectively. As already mentioned in the paper, to alleviate this potential difficulty, especially when dealing with very large datasets, a simplified Figure [7] can be used. The simplified Figure is based on the computation of distances between objects and cluster centroids, thus making it possible to reduce the computational cost from O(N²·D) to O(K·N·D), where K, the number of clusters, is usually significantly less than N. It is also worth mentioning that there are several different relative validity criteria that can be used in place of the Figure adopted in our work. As discussed in [14], such criteria are endowed with particular features that may make each of them outperform the others in specific classes of problems. Also, they present different computational requirements. In this context, in practice one can try different criteria to estimate the number of clusters by taking into account both the quality of the obtained data partitions and the associated computational cost.
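The repeated cluster bisectioning idea can be sketched in plain Python as follows; this is an illustration under my own naming and a deterministic seed, not the paper's implementation:

```python
import math
import random

def two_means(points, iters=20, seed=0):
    """Plain 2-means split of one cluster; returns two non-empty groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)
    split = None
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer = math.dist(p, centers[0]) <= math.dist(p, centers[1])
            groups[0 if nearer else 1].append(p)
        if not groups[0] or not groups[1]:
            break                       # keep the last valid split
        split = groups
        centers = [tuple(sum(col) / len(g) for col in zip(*g))
                   for g in groups]
    return split

def bisecting_kmeans(points, k):
    """Repeatedly bisect the largest cluster until k clusters remain."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)          # split the largest cluster next
        largest = clusters.pop()
        clusters.extend(two_means(largest))
    return clusters
```

Each bisection only runs 2-means on one cluster, which is what keeps the overall cost low compared to computing all pairwise merges in an agglomerative run.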
Finally, as a cautionary note, I would like to mention that, in practice, it is not always of paramount importance to have scalable methods. In our particular application scenario, there are no hard time constraints on getting data partitions (like those present when analyzing streaming data with online algorithms). Instead, domain experts can usually spend months analyzing their data before reaching a conclusion.

VI. CONCLUSION

I presented an approach that applies document clustering methods to the forensic analysis of computers seized in police investigations. Also, I reported and discussed several practical results that can be very useful for researchers and practitioners of forensic computing. More specifically, in our experiments the hierarchical algorithms known as Average Link and Complete Link presented the best results. Despite their usually high computational costs, I have shown that they are particularly suitable for the studied application domain, because the dendrograms that they provide offer summarized views of the documents being inspected, thus being helpful tools for forensic examiners who analyze textual documents from seized computers. As already observed in other application domains, dendrograms provide very informative descriptions and visualization capabilities of data clustering structures [5]. The partitional K-means and K-medoids algorithms also achieved good results when properly initialized. Considering the approaches for estimating the number of clusters, the relative validity criterion known as the Figure has shown to be more accurate than its (more computationally efficient) simplified version. In addition, some of our results suggest that using the file names along with the document content information may be useful for cluster ensemble algorithms. Most importantly,
I observed that clustering algorithms indeed tend to induce clusters formed by either relevant or irrelevant documents, thus contributing to enhancing the expert examiner's job. Furthermore, our evaluation of the proposed approach in five real-world applications shows that it has the potential to speed up the computer inspection process. Aimed at further leveraging the use of data clustering algorithms in similar applications, a promising avenue for future work involves investigating automatic approaches for cluster labeling. The assignment of labels to clusters may enable the expert examiner to identify the semantic content of each cluster more quickly, eventually even before examining their contents. Finally, the study of algorithms that induce overlapping partitions (e.g., Fuzzy C-Means and Expectation-Maximization for Gaussian Mixture Models) is worth investigating.

References
[1] J. F. Gantz, D. Reinsel, C. Chute, W. Schlichting, J. McArthur, S. Minton, I. Xheneti, A. Toncheva, and A. Manfrediz, "The expanding digital universe: A forecast of worldwide information growth through 2010," Inf. Data, vol. 1, pp. 1–21, 2007.
[2] B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis. London, U.K.: Arnold, 2001.
[3] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[4] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: Wiley-Interscience, 1990.
[5] R. Xu and D. C. Wunsch II, Clustering. Hoboken, NJ: Wiley/IEEE Press, 2009.
[6] A. Strehl and J. Ghosh, "Cluster ensembles: A knowledge reuse framework for combining multiple partitions," J. Mach. Learning Res., vol. 3, pp. 583–617, 2002.
[7] E. R. Hruschka, R. J. G. B. Campello, and L. N. de Castro, "Evolving clusters in gene-expression data," Inf. Sci., vol. 176, pp. 1898–1927, 2006.
[8] B. K. L. Fei, J. H. P. Eloff, H. S. Venter, and M. S. Oliver,
"Exploring forensic data with self-organizing maps," in Proc. IFIP Int. Conf. Digital Forensics, 2005, pp. 113–123.
[9] N. L. Beebe and J. G. Clark, "Digital forensic text string searching: Improving information retrieval effectiveness by thematically clustering search results," Digital Investigation, Elsevier, vol. 4, no. 1, pp. 49–54, 2007.
[10] R. Hadjidj, M. Debbabi, H. Lounis, F. Iqbal, A. Szporer, and D. Benredjem, "Towards an integrated e-mail forensic analysis framework," Digital Investigation, Elsevier, vol. 5, no. 3–4, pp. 124–137, 2009.
[11] F. Iqbal, H. Binsalleeh, B. C. M. Fung, and M. Debbabi, "Mining writeprints from anonymous e-mails for forensic investigation," Digital Investigation, Elsevier, vol. 7, no. 1–2, pp. 56–64, 2010.
[12] S. Decherchi, S. Tacconi, J. Redi, A. Leoncini, F. Sangiacomo, and R. Zunino, "Text clustering for digital forensics analysis," Computat. Intell. Security Inf. Syst., vol. 63, pp. 29–36, 2009.
[13] K. Stoffel, P. Cotofrei, and D. Han, "Fuzzy methods for forensic data analysis," in Proc. IEEE Int. Conf. Soft Computing and Pattern Recognition, 2010, pp. 23–28.
[14] L. Vendramin, R. J. G. B. Campello, and E. R. Hruschka, "Relative clustering validity criteria: A comparative overview," Statist. Anal. Data Mining, vol. 3, pp. 209–235, 2010.
[15] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Inf. Process. Manage., vol. 24, no. 5, pp. 513–523, 1988.
[16] L. Liu, J. Kang, J. Yu, and Z. Wang, "A comparative study on unsupervised feature selection methods for text clustering," in Proc. IEEE Int. Conf. Natural Language Processing and Knowledge Engineering, 2005, pp. 597–601.
[17] V. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, pp. 707–710, 1966.
[18] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach. London, U.K.: Chapman & Hall, 2005.
[19] A. L. N. Fred and A. K. Jain, "Combining multiple clusterings using evidence accumulation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 835–850, Jun. 2005.
[20] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, pp. 193–218, 1985.
[21] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2006.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[23] L. F. Nassif and E. R. Hruschka, "Document clustering for forensic computing: An approach for improving computer inspection," in Proc. Tenth Int. Conf. Machine Learning and Applications (ICMLA), 2011, vol. 1, pp. 265–268, IEEE Press.
[24] C. C. Aggarwal and C. X. Zhai, Eds., "Chapter 4: A survey of text clustering algorithms," in Mining Text Data. New York: Springer, 2012.
[25] Y. Zhao, G. Karypis, and U. M. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Min. Knowl. Discov., vol. 10, no. 2, pp. 141–168, 2005.
[26] Y. Zhao and G. Karypis, "Evaluation of hierarchical clustering algorithms for document datasets," in Proc. CIKM, 2002, pp. 515–524.
[27] S. Nassar, J. Sander, and C. Cheng, "Incremental and effective data summarization for dynamic hierarchical clustering," in Proc. 2004 ACM SIGMOD Int. Conf. Management of Data (SIGMOD '04), 2004, pp. 467–478.
[28] K. Kishida, "High-speed rough clustering for very large document collections," J. Amer. Soc. Inf. Sci., vol. 61, pp. 1092–1104, 2010, doi: 10.1002/asi.2131.
[29] Y. Loewenstein, E. Portugaly, M. Fromer, and M. Linial, "Efficient algorithms for exact hierarchical clustering of huge datasets: Tackling the entire protein space," Bioinformatics, vol. 24, no. 13, pp. i41–i49, 2008.
BLUETOOTH FILE TRANSFER WITH BREAKPOINT
1Priyanka V. Godse 2Snehal P. Katore 3Poonam A. Modani 4Poonam B. Sonawane
[email protected] [email protected] [email protected] [email protected] Guided by : S. V. Londhe
Sir Visvesvaraya Institute of Technology, Chincholi (422101).
Abstract: In this proposed system, we provide a new methodology for transferring files via Bluetooth. Here a file can be transferred from one device to another device simultaneously. There will be a log file system on the sender for breakpoint and featuring purposes. The user can transfer a file from one Android device to another Android device via Bluetooth, with easy graphical, user-friendly interface access, and a log file maintained for the break point. In our system we provide a mechanism to handle the failure of a file transfer. If there is any failure while transferring the file, the system will store the current state of the transferred file, so the next time the user can transfer only the remaining part of the failed file. In this situation, there is no need to transfer the whole file again.

Keywords: Bluetooth, Breakpoint, File Transfer, Log File, Retransfer

I. INTRODUCTION

Nowadays many smartphones support wireless or Bluetooth technology for transferring files without any wire, but it provides users with limited accessibility. If there is any failure while transferring the file, the transfer will be cancelled automatically by the application.
If it is interrupted, it will not continue from that point; rather, the whole file will be required to be transferred again, so loss of data occurs and more time is consumed. So, rather than sending from the start, we will resend the file from the breakpoint; that is, the sending of the file will be resumed.

Need: As the use of Bluetooth file transfer is increasing day by day, there is a need for more advanced features in Bluetooth file transfer, such as the facility to retransfer a file from the breakpoint and the maintenance of log files.

II. LITERATURE SURVEY

Bluetooth is an open wireless technology standard for exchanging data over short distances using short-wavelength radio transmissions in the ISM band from 2400–2480 MHz, from fixed and mobile devices. It can connect several devices, overcoming problems of synchronization. Wireless Bluetooth technology is a universal standard that enables communication between mobile phones, laptops, and other portable devices. It supports short-range wireless transmissions by using the unlicensed 2.4 GHz short-range radio frequency bandwidth. Bluetooth allows users to form clusters of a maximum of 8 connected devices that form a star-shaped network named a piconet. The main device of the cluster is named the master; all other devices are named slaves. Two Bluetooth devices can transfer data with a maximum speed of 2.1 Mbps [1].

Bluetooth was originally developed as an alternative to RS-232 data cables: personal devices that communicate based on the RS-232 serial port protocol use proprietary connectors and pin arrangements that make it impossible to use the same set of cables to interconnect devices from different manufacturers and, sometimes, even from the same manufacturer [2]. A Bluetooth connection, in contrast, enables easy exchange of information
between devices from different as well as the same manufacturers.

The Android platform includes support for the Bluetooth network stack, which allows a device to wirelessly exchange data with other Bluetooth devices. The application framework provides access to the Bluetooth functionality through the Android Bluetooth APIs. These APIs let applications wirelessly connect to other Bluetooth devices, enabling point-to-point and multipoint wireless features [3]. Using the Bluetooth APIs, an Android application can perform the following:
- Scan for other Bluetooth devices.
- Query the local Bluetooth adapter for paired Bluetooth devices.
- Establish RFCOMM channels.
- Connect to other devices through service discovery.
- Transfer data to and from other devices.
The Android platform provides the basic functionalities through the Bluetooth libraries, which are necessary for the operations related to file transmission.

III. MATHEMATICAL MODEL

Let S be a system, S = {I, O, W}, where I = input set, O = output set, and W = phases of the system.

Input set: I = {F, D}, where F = files to be transferred and D = devices to which the files should be transferred.

Now, F = {F0, F1, F2, ..., Fn}, where Fi = {Fsi, Fbi, Fei} for i = 0, ..., n, with
Fsn = starting point of file n,
Fbn = breaking point of file n,
Fen = ending point of file n.

And D = {P, C}, where P = the list of paired Bluetooth devices and C = the visible Bluetooth devices. Again, P = {P0, P1, ..., Pm} and C = {C0, C1, ..., Cm}.

Output set: O = {Q0, Q1, Q2, Q3, Q4, Q5, Q6}, where
Q0 = list of paired and discoverable devices,
Q1 = connected device,
Q2 = list of files for transmission,
Q3 = actual file transmission,
Q4 = log files,
Q5 = file break point details,
Q6 = complete file transmission.

W = {M0, M1, M2, M3, M4, M5, M6}:

M0: D -> Q0, where M0 is a function to scan the devices, so that D will contain the list of paired and discoverable devices (it automatically enables Bluetooth discovery if the device is not discoverable).

M1: Q0 -> Q1, where M1 is a connection manager which connects the source and target devices for further data transmission.

M2: Q1 -> Q2, where M2 is a function that selects files on the source device to transmit to the target device.

M3: Q2 -> Q3, where M3 is a function in which the system transfers the file from the source device to the target device; the file transmission is parallel in the case of multiple files.

M4: Q3 -> Q4, where M4 maintains the log file in case of failure in the transmission of the file; the log file is updated on every data transfer and stored on both sides.

M5: Q4 -> Q5,
where, in M5, the system asks the target user for the filename, verifies the file availability on the source and target devices, and transfers the file from the break point.

M6: Q5 -> Q6, where M6 is the state at which the file has been transferred successfully.

Figure: DFA for Bluetooth file transfer with breakpoint, where
1 = scanning the devices (paired and new devices),
2 = connection manager,
3 = file selection,
4 = file transfer,
5 = log file for failure connection,
6 = file retransfer,
7 = successful file transfer.

Success: {M6}; Failure: {M0, M1, M2, M3, M4, M5}.

IV. PROPOSED SYSTEM

The current Bluetooth file transfer does not guarantee the transmission of a file from the break point if the transmission is interrupted in any manner. In our system there will be a log file on the sender for break point and featuring purposes. It is based on parallel data transfer and multitasking.

Features:
1. The user can transfer a file from one Android device to another Android device.
2. Easy graphical, user-friendly interface access.
3. Maintaining a log file for the break point.
4. In our system we provide a mechanism to handle the failure of a file transfer. If there is any failure while transferring the file, the system stores the current state of the transferred file, so the next time the user can transfer only the remaining part of the failed file. In this situation, there is no need to transfer the whole file again, so the loss of data and the extra time consumption are avoided by our proposed system.

1. Working: We use a novel algorithm to manage file transfer failure with a break point facility,
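The success path of the model above (M0 through M6) can be written down as a small transition table. The dictionary encoding below is my own illustration of the DFA, not code from the paper:

```python
# Phase functions M0..M6 and their outputs Q0..Q6, as a transition table:
# each phase produces its output set and hands control to the next phase.
TRANSITIONS = {
    "M0_scan":       ("Q0", "M1_connect"),     # M0: D  -> Q0
    "M1_connect":    ("Q1", "M2_select"),      # M1: Q0 -> Q1
    "M2_select":     ("Q2", "M3_transfer"),    # M2: Q1 -> Q2
    "M3_transfer":   ("Q3", "M4_log"),         # M3: Q2 -> Q3
    "M4_log":        ("Q4", "M5_retransfer"),  # M4: Q3 -> Q4
    "M5_retransfer": ("Q5", "M6_done"),        # M5: Q4 -> Q5
    "M6_done":       ("Q6", None),             # M6: Q5 -> Q6 (success)
}

def run(start="M0_scan"):
    """Walk the success path; on an interrupted transfer a real run
    would re-enter at M5_retransfer using the stored log file."""
    outputs, phase = [], start
    while phase is not None:
        out, phase = TRANSITIONS[phase]
        outputs.append(out)
    return outputs
```

The failure states {M0, ..., M5} are exactly the phases from which the log file permits re-entry without starting over.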
which is Current Object State Maintenance, to re/store the data into the targeted devices. The Current Object State Maintenance algorithm maintains a log file of the transferred data between the source and target devices. The following are the modules:
1. Scan Devices (Paired and New)
2. Connection Manager
3. File Selection
4. File Transfer Manager
5. Log File Manager for Failure Connection
6. File Re-Transfer Module

2. Modules:

2.1 Module 1 (Scan Devices (Paired and New)): In this module, the system finds all available Bluetooth-supported (Android) devices. The system displays all paired devices and new devices for pairing. It automatically enables Bluetooth discovery if it is not enabled.

2.2 Module 2 (Connection Manager): In this module, the system finds the target device (selected by the user from the scan list) and tries to create a connection between the source and target devices. The system displays a success message on the source and target devices if the connection succeeded; otherwise it displays a failure message on the source device.

2.3 Module 3 (File Selection in Source Device): In this module, the system asks the user to enter the filename of the file that is going to be transferred to the target device. The file should be on an external (SD card) memory card; the system does not support the internal memory file system. The system adds the file to the transfer list if the file selection is valid; otherwise it displays a failure message and does not add the file to the transfer list.

2.4 Module 4 (File Transfer Manager): In this module, the system transfers the file (which is
selected by the user) from the source device to the target device. The transfer is sequential, and the data transfer speed depends on the source and target devices.

2.5 Module 5 (Log File Manager for Failure Connection):
In this module, the system maintains a log file of the data transferred between the source and target devices. The log file is updated on every data transfer and is stored on both sides. The log file contains the filename, total file size, current index position, transferred bytes, total bytes to transfer, etc.

2.6 Module 6 (File Re-Transfer Module):
In this module, the system asks the target user for the filename and verifies the file's availability on the source and target devices. If the file is verified, it displays the related information, such as file size, total transferred bytes, break position, and remaining bytes to transfer. Finally, the source user transfers the file from the break position to the target user.

3. Block Diagram
3.1 Bluetooth File Transfer:
Figure 3.1: Bluetooth file transfer (the current Bluetooth file transfer system).
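The break-point log kept by Modules 5 and 6 can be sketched as follows. The log fields (filename, total file size, current index position, transferred bytes, remaining bytes) come from the text; the JSON file format and the function names are assumptions for illustration.

```python
import json
import os

# Sketch of the break-point log from Modules 5-6. The record fields mirror
# those listed in the text; JSON on disk is an illustrative assumption.

def update_log(log_path, filename, total_size, transferred):
    """Write the current transfer state; called on every data transfer."""
    record = {
        "filename": filename,
        "total_file_size": total_size,
        "current_index_position": transferred,  # break position on failure
        "transferred_bytes": transferred,
        "remaining_bytes": total_size - transferred,
    }
    with open(log_path, "w") as f:
        json.dump(record, f)
    return record

def resume_offset(log_path):
    """Return the byte offset to resume from, or 0 if no log exists."""
    if not os.path.exists(log_path):
        return 0
    with open(log_path) as f:
        return json.load(f)["current_index_position"]

# A 1000-byte transfer interrupted after 400 bytes resumes at offset 400:
update_log("transfer.log", "photo.jpg", 1000, 400)
print(resume_offset("transfer.log"))  # 400
```

Because the log is stored on both sides, either device can verify the break position before the re-transfer begins.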
3.2 Bluetooth File Re-transfer:
Figure 3.2: Bluetooth file retransfer (the proposed Bluetooth file transfer system).
V. APPLICATIONS:
The proposed system is itself an application, basically designed for mobile phones. A better GUI is provided for this application. Different Android phones can use this application for faster transmission and for retransmission from the break point. Multiple file transfer is also possible with this application.

VI. CONCLUSION:
Thus, the Bluetooth file transmission with breakpoint system will save the user's time when transmitting data.

VII. REFERENCES:
[1] Olaiya Folorunsho and Mariam Biola Bello, "Development of Tool for Managing Bluetooth Data Transfer Logs in Mobile Platform", International Journal of Advanced Research in Computer Science
and Software Engineering, Research Paper.
[2] Roch Guérin, Enyoung Kim, Saswati Sarkar, "Bluetooth Technology Key Challenges and Initial Research", U. Pennsylvania, Philadelphia, PA; Lucent Technologies, Holmdel, NJ.
[3] http://developer.android.com/guide/topics/connectivity/bluetooth.html
[4] Bluetooth SIG, http://www.bluetooth.com.
[5] Bluetooth support in Windows XP, http://www.microsoft.com/hwdev/tech/network/bluetooth.
[6] http://bluetooth.com/Bluetooth/Technology/Works/Security.
[7] Bluetooth Security Architecture white paper, http://www.bluetooth.com/developer/whitepaper/whitepaper.asp.
Optimal Multiserver Configuration for Profit Maximization in Cloud Computing
Pravin Pokale, Vishal Agrawal, Rahul Wakchaure
B.E. Computer Science Engg., S.V.I.T., Nashik, Maharashtra.
Abstract— The cloud computing paradigm has achieved widespread adoption in recent years. Its success is due largely to customers' ability to use services on demand with a pay-as-you-go pricing model, which has proved convenient in many respects. As cloud computing becomes more and more popular, understanding its costing becomes critically important. Our pricing model for a multiserver system helps us understand the pay-per-use services of various cloud service providers. Cloud data storage highlights the security issues targeting a customer's outsourced data, i.e., data that is not stored on or retrieved from the customer's own servers. Low costs and high flexibility make migrating data to the cloud possible. Our results show that our proposed model provides a better decision for customers according to their available budgets.

Index Terms—Cloud Computing, Pricing Model, Data Security, Cost-effective, Cloud Data Migration, Cloud Service Provider

INTRODUCTION
The end of this decade is marked by a paradigm shift of industrial information technology towards a subscription-based or pay-per-use service business model known as cloud computing [1]. One of the prominent services offered in cloud computing is cloud data storage, in which subscribers do not store their data on their own servers; instead, their data are stored on the cloud service provider's servers.
In cloud computing, customers are charged for the service provider's storage service. This service not only provides flexibility and scalability for data storage, it also gives customers the benefit of paying only for the amount of data they need to store for a particular period of time, without any concerns about efficient storage mechanisms and maintainability issues for large amounts of data. In addition to these benefits, customers can easily access their data from any geographical region where the cloud service provider's network or the Internet can be accessed. An example of cloud computing is shown in Fig. 1. Despite its obvious advantages, however, many companies hesitate to "move to the cloud", mainly because of concerns related to service availability, data lock-in, and legal uncertainties.

Lock-in causes further problems. For one thing, even though public cloud availability is generally high, outages still occur. Businesses locked into such a cloud come to a standstill until the cloud is back online. Moreover, public cloud providers generally don't guarantee particular service level agreements (SLAs); that is, businesses locked into a cloud have no guarantee that it will continue to provide the required quality of service (QoS). Finally, most public cloud providers' terms of service allow the provider to change pricing at any time. Hence, a business locked into a cloud has no real control over its own IT costs.

Fig. 1. Cloud computing architecture example

In addition, better privacy as well as data availability can be achieved by dividing the user's data block into data pieces and distributing them among the available SPs in such a
way that no less than a threshold number of SPs can take part in successful retrieval of the whole data block. To address these issues, in this paper we propose an economical distribution of data among the SPs available in the market, to provide customers with data availability as well as secure storage. In our model, the customer distributes his data among several SPs available in the market, based on his available budget. We also provide the customer with a decision about which SPs to access the data from, with respect to the data access quality of service offered by the SPs at the location of data retrieval. This removes the possibility of an SP misusing the customer's data or breaching data privacy, while easily ensuring data availability with a better quality of service. In our multiserver system we also introduce the concept of runtime data migration between different cloud service providers.

The problem, however, is that once an application has been developed based on one particular provider's cloud services and using its specific API, that application is bound to that provider; deploying it on another cloud would usually require completely redesigning and rewriting it. Such vendor lock-in leads to strong dependence on the cloud service operator. First, standardized programming APIs must enable developers to create cloud-independent applications that are not dependent on any single provider or cloud service. Cloud applications also depend on sophisticated features to actually deploy and install applications automatically. Predictable and controlled application deployment is a central issue for cost-effective and efficient deployments in the cloud.

Our proposed approach provides cloud computing users with a decision model that provides better security by distributing the data over multiple cloud service providers in such a way that none of the SPs can successfully retrieve meaningful information from the
data pieces allocated at their servers. In addition, we provide the user with better availability of data by maintaining redundancy in the data distribution. If a service provider suffers a service outage or fails, the user can still access his data by retrieving it from the other service providers. From the business point of view, since cloud data storage is a subscription service, the higher the data redundancy, the higher the cost to be paid by the user. Thus, we provide an optimization scheme to handle the tradeoff between the cost a cloud computing user is willing to pay and the level of security achieved for his data. In other words, we provide a scheme to maximize the security attainable for a given budget for the cloud data.

RELATED WORK
Different service providers employ different schemes and models for pricing [6]. However, the most common model employed in cloud computing is the "pay-as-you-go" model: customers pay a fixed price per unit of use. Amazon, considered the market leader in cloud computing, utilizes such a model by charging a fixed price for each hour of virtual machine usage. The "pay-as-you-go" model is also implemented by other leading enterprises such as Google App Engine and Windows Azure. Another common scheme employed by these leading enterprises is the "pay for resources" model: a customer pays for the amount of bandwidth or storage utilized. Subscription, where a customer pays in advance for the services he is going to receive for a pre-defined period of time, is also common.

A customer will evaluate a single service provider based on three main parameters: pricing approach, QoS, and utilization period. The pricing approach describes the process by which the price is determined. The pricing approach could be one of the following: fixed price independent of volume, fixed price plus per-unit rate, assured purchase volume plus per-unit
price rate, per-unit rate with a ceiling, and per-unit price. The quality of service describes the requirements for what a service provider should provide to his customers. QoS requirements include the availability of the service, security, privacy, scalability, and the integrity of the service provider. If the service provider ensures that these requirements are maintained at a high level, the quality of the service provided will increase. This will increase the number of customers and their loyalty to the service provider. The utilization period is the period in which the customer has the right to utilize the provider's services based on the SLAs between the two parties. It could be perpetual, based on the subscription period, or a pay-per-use model. Fig. 2 below describes the main aspects of pricing models.

Fig. 2. Aspects of Cloud Computing

Privacy preservation and data integrity are two of the most critical security issues related to user data. In the conventional paradigm, organizations had physical possession of their data and hence found it easier to implement strong data security policies. But in the case of cloud computing, the data is stored with an autonomous business party that provides data storage as a subscription service. The users have to trust the cloud service provider with the security of
their data. The criticality of the privacy issues in cloud computing has been discussed in the literature, which points out that obtaining information from a third party is much easier than obtaining it from the creator himself. An even bigger concern that arises in such cloud storage schemes is that there is no foolproof way to be certain that the service provider does not retain the user's data even after the user opts out of the subscription. Given enough time, such data can be decrypted, meaningful information can be retrieved, and user privacy can easily be breached. Since the user may no longer be availing of the storage services of that service provider, he will have no clue about such a passive attack.

To give users better and fairer chances of availing of efficient security services for their cloud storage at affordable costs, our model distributes the data pieces among more than one service provider, in such a way that none of the SPs can retrieve any meaningful information from the pieces of data stored on its servers without getting some more pieces of data from the other service providers. Therefore, the conventional single-service-provider scheme does not seem very promising.

Data migration is the method of moving a large amount of data and applications into a target cloud, where the target cloud can be a public, a private, or a hybrid cloud [2]. Since large numbers of applications are required to fulfill an organization's business needs and to improve its growth, various models of DaaS (Database as a Service) are now provided with the data migration process in view. The data can be migrated in several ways, such as from an organization to a target cloud or from one cloud to another [3]. But migrating data is quite a challenging task, and it involves various major security issues as well, such as data integrity, security, portability, data privacy, and data accuracy.
Fig. 3. Data Migration in Cloud

MODELS
In this paper we include three different concepts based on cloud computing. First we explain secured multi-cloud data storage and retrieval, second the security of data in the cloud, and third the pricing models.

Secured Multi-Cloud Data Storage and Retrieval
We consider the storage services for cloud data storage between two entities, cloud users (U) and cloud service providers (SP). The cloud storage service is generally priced on two factors: how much data is to be stored on the cloud servers, and for how long the data is to be stored. In our model, we assume that all the data is to be stored for the same period of time. We consider p cloud service providers (SP); each available cloud service provider is associated with a QoS factor, along with its cost of providing storage service per unit of stored data (C) [5]. Every SP offers a different level of quality of service (QoS) and has a different cost associated with it. Hence, the cloud user can store his data on more than one SP according to the required level of security and his affordable budget.

We use an example in Fig. 4 to illustrate our proposed threshold. In this example we assume that we have 9 cloud service providers (SP1, SP2, ..., SP9). Let us assume that a customer (C1) has divided the data he wishes to store on some SPs' servers into 9 data pieces. The customer is required to retrieve at least 6 data pieces from different SPs to reconstruct his data and get the full information; in our example, six SPs participate in the data retrieval (SP1, SP4, SP5, SP6, SP8 and SP9).
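A minimal sketch of the (k, n) threshold idea in this example: n = 9 data pieces, of which any k = 6 suffice to reconstruct the data. The paper does not name a concrete scheme; Shamir secret sharing over a prime field is used here as one standard way to realize such a threshold.

```python
import random

# Sketch of a (k=6, n=9) threshold, one share per service provider.
# Shamir secret sharing is an assumed instantiation, not the paper's own.

P = 2**61 - 1  # a Mersenne prime, large enough for a small secret

def split(secret, n=9, k=6):
    """Create n shares; any k of them reconstruct the secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):  # share x goes to provider SPx
        y = sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = split(123456789)
# Six shares from different providers (e.g. SP1, SP4, SP5, SP6, SP8, SP9):
subset = [shares[0], shares[3], shares[4], shares[5], shares[7], shares[8]]
print(reconstruct(subset))  # 123456789
```

Any 5 or fewer shares reveal nothing about the secret, which matches the requirement that no single SP (or small coalition) can recover meaningful information.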
Fig. 4. Data Storage and Retrieval

Security of Data in Cloud
In the proposed work, I am creating an encryption algorithm to provide strong security to data in the cloud that performs better than already existing encryption algorithms such as PBE (prediction based encryption), IBE (Identity Based Encryption), etc. In this encryption method, the concept of randomization is used. In the randomization concept, we initially encrypt a single plaintext P into a number of ciphertexts C1, C2, ..., Cn and then randomly select any one of the N ciphertexts; any of those ciphertexts maps back to the original plaintext, and the one who decrypts the text has no knowledge of which one has been picked. One thing that must be taken into consideration is that the ciphertext should be longer than the plaintext, otherwise a problem may occur. The enhanced algorithm is as follows:

1) For Encryption
   a. Initially, generate a random key.
   b. Encrypt the data using that random key.
   c. Encrypt the random key with the shared key.
   d. Forward the outputs of steps b and c together.
2) For Decryption
   a. Decrypt the encrypted random key with the shared key.
   b. Then decrypt the encrypted data with the decrypted random key.

Fig. 5. Enhanced Algorithm

Here, the shared key is re-used, but the random key is used only once for encrypting data. Hence, by applying this method, the data can be made more secure and reliable, as an outsider will have no idea about what data is encrypted with the persistent key.
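The enhanced algorithm above is essentially envelope encryption: a one-time random key encrypts the data, and the shared key encrypts only the random key. A minimal sketch, with a toy XOR keystream standing in for a real cipher such as AES:

```python
import hashlib
import secrets

# Sketch of the enhanced algorithm: a fresh random key encrypts the data,
# and the shared key encrypts only that random key. The XOR keystream
# below is a stdlib stand-in for a real cipher, not a secure primitive.

def keystream_xor(key, data):
    """Toy symmetric cipher: XOR data with a SHA-256-derived keystream."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

def encrypt(shared_key, plaintext):
    random_key = secrets.token_bytes(32)            # used only once
    enc_data = keystream_xor(random_key, plaintext)  # step b
    enc_key = keystream_xor(shared_key, random_key)  # step c
    return enc_key, enc_data                         # forwarded together

def decrypt(shared_key, enc_key, enc_data):
    random_key = keystream_xor(shared_key, enc_key)  # recover random key
    return keystream_xor(random_key, enc_data)       # recover the data

shared = b"shared-key-between-user-and-cloud"
ek, ed = encrypt(shared, b"customer data piece")
print(decrypt(shared, ek, ed))
```

Because a new random key is drawn per message, two encryptions of the same plaintext produce different ciphertexts, which is the property the randomization discussion above relies on.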
Pricing Models
Different service providers employ different schemes and models for pricing, as shown in Table 1. However, the most common model employed in cloud computing is the "pay-as-you-go" model: customers pay a fixed price per unit of use.

Table 1. Pricing Model Comparison
Pay-as-you-go model. Pricing approach: price is set by the service provider and remains constant (static). Fairness: unfair to the customer, because he might pay for more time than the resources are utilized. Pros: the customer is aware of the exact price to be paid. Cons: the service provider might reserve the resources for the customer for the whole paid period, longer than needed.

Subscription. Pricing approach: price is based on the period of subscription (static). Fairness: the customer might sometimes overpay or underpay. Pros: the customer might underpay for the reserved resources if he uses them extensively. Cons: the customer might overpay for the reserved resources if he does not use them extensively.

Value-based pricing. Pricing approach: prices set according to the value perceived by the customer (dynamic). Fairness: fair to producers, where prices are set on the perceived value. Pros: high revenue on each item sold (an advantage from the producer's point of view). Cons: difficult for a corporation to obtain and interpret data from customers, competitors, and one's own records in order to evaluate customer-perceived value.

Cost-based pricing [4]. Pricing approach: price set by adding a profit element on top of the cost (dynamic). Fairness: not fair to customers, where the perceived value of the product can be identified. Pros: simplicity in calculating the price. Cons: tends to ignore the role of consumers.

Competition-based pricing. Pricing approach: price set according to competitors' prices (dynamic). Fairness: fair to customers, where prices are always set according to competitive prices. Pros: easy to implement. Cons: does not take customers into account.

Customer-based pricing. Pricing approach: price set according to what the customer is prepared to pay (dynamic). Fairness: fair to customers, as customers are always taken into account. Pros: takes the customer's perspective into account. Cons: customers rarely indicate to the seller what they are willing to pay; data are difficult to obtain and to interpret.

Pay-for-resources model. Pricing approach: cost-based (static). Fairness: fair for both customers and the service provider. Pros: offers maximum utilization of the service provider's resources. Cons: hard to implement.
CONCLUSION
In this paper we have proposed a multiserver middleware system consisting of three models. Secured multi-cloud data storage and retrieval helps provide each customer with better cloud data storage by dividing and distributing the customer's data, giving the customer secure storage under his affordable budget. When data is transferred from one cloud to another during the migration process, we improve its security with the randomized encryption technique. With the help of our pricing models, a customer can choose the service provider whose pricing approach is most compatible with the customer's behavior.

ACKNOWLEDGEMENT
None of this work would have been possible without the selfless assistance of a great number of people. I would like to gratefully thank all those members for their valuable guidance, time, helpful discussion, and contribution to this work.

REFERENCES
[1] M. Armbrust et al., "A View of Cloud Computing," Comm. ACM, vol. 53, no. 4, 2010, pp. 50-58.
[2] "Secure Migration of Various Database over A Cross Platform Environment", International Journal of Engineering and Computer Science, ISSN: 2319-7242, vol. 2, issue 4, April 2013.
[3] Farah Habib Chanchary and Samiul Islam, "Data Migration: Connecting Databases in the Cloud", ICCIT 2012.
[4] S. Lehmann and P. Buxmann, "Pricing Strategies of Software Vendors", Business and Information Systems Engineering, 2009.
[5] S. H. Shin, K. Kobara, "Towards Secure Cloud Storage", demo for CloudCom 2010, Dec. 2010.
[6] S. Maxwell, "The Price is Wrong: Understanding What Makes a Price Seem Fair and the True Cost of Unfair Pricing", Wiley, 2008.
[7] Elias M. Awad, "System Analysis and Design"; Mike Snell, Glenn Johnson, Tony Northrup, "Microsoft .NET Framework 3.5 ASP.NET Application Development".
FACILITATING EFFECTIVE USER NAVIGATION THROUGH WEBSITE STRUCTURE IMPROVEMENT
Jyoti B. Kshirsagar Student, S.V.I.T., Chincholi, Nashik
Prof. S. D. Jondhale, Professor, P. R. Engg. College, Loni
[email protected], [email protected]
Abstract— Today, evaluating and improving website structure has become a crucial issue for website design and maintenance. Designing well-structured websites to facilitate effective user navigation has long been a challenge, partly because web developers' understanding of how a website should be organized can differ from that of its users. A number of approaches have been proposed to re-link web pages to improve navigability using web user navigation data, but a completely reorganized new structure can be highly unpredictable, and the cost of disorienting users after the changes remains unanalyzed. We therefore propose a mathematical programming model to improve user navigation on a website while minimizing alterations to its current structure.

Keywords— Mathematical Programming, User Navigation, Website Design, Web Mining

1. INTRODUCTION
With the explosive growth of web content, both website users and managers expect high-quality content service. Users wish to get the information they need
conveniently, and managers wish to attract more users to their websites. A user's access to a website can be considered as obtaining information from web pages under the restriction of the website structure. For a relatively static website, the link structure has a great influence on the quality of its content service. Therefore, how to evaluate and improve website structure becomes a crucial issue. Past user access patterns can be analyzed to discover common user access behavior, which is useful for improving the static website link structure or for dynamically inserting links into web pages. Prior approaches include a method to redesign a website by identifying user profiles from web logs; the Web Utilization Miner, which finds interesting navigation patterns for improving the organization of web pages; and a technique to discover the gap between website designers' expectations and users' behavior and suggest areas where the website can be improved. Finding the desired information on a website is not easy when the design is poor; indeed, poor website design has been highlighted as a key element in a number of high-profile site failures. Users whose targets are hard to find are very likely to leave a website even if its information is of high quality. Our goal is to minimize changes to the website while reducing information overload for its users.

2. RELATED WORK
We survey several studies about optimal ratios between depth and breadth of hierarchies. Hierarchies have the advantage of combining the concepts of aggregation and abstraction that are already present in information spaces for navigational purposes. The concept of multi-trees was adapted and integrated into our conceptual framework, culminating in the pessimistic statement that the web will suffer
a severe usability problem. We study empirical work which clearly indicates what a successful implementation of website structure improvement involves, and we focus on papers about website user navigation success and user expectation. When a website is poorly structured, the web developers' understanding of how the structure should be organized differs from the users'; these differences result in the desired information not being easily obtained from the website. Specifically, user satisfaction is the measure of website effectiveness for the web developers; generally, how pages should be organized should follow the users' model of the pages. We partition the real data set into a training set (the first three months) and a testing set (the last month). Two metrics are defined: the average number of paths per mini session, measuring whether the improved structure can facilitate users to reach their targets faster than the current one on average, as the first metric; and the percentage of mini sessions above a specified threshold, measuring how likely the users suffering navigation difficulty are to benefit from the improvements made to the site structure, as the second metric. The first metric consists of three steps:
i) From the training data, obtain the links of pages to be improved and the set of new links to be applied in the mathematical programming model.
ii) From the testing data, obtain the mini sessions having two or more paths, their lengths, the set of candidate links that can be used to improve them, and the number of paths.
iii) For each mini session from step 2, check whether any candidate link matches one of the links obtained in step 1 from the training data.

3. PROPOSED WORK
Users cannot locate the desired information in a website whose structure differs from their expectations. We
propose a mathematical programming model to improve user navigation on a website while minimizing the changes made to the current structure.

3.1 Relevant Mini Sessions
A mini session is relevant if its length is larger than the corresponding path threshold (p); R denotes the set of relevant mini sessions, and the set of irrelevant mini sessions is Im = I \ R. Any mini session T ∈ Im will not be considered in our model, since it already meets the goal of the path threshold. When the path threshold increases from 3 to 5, the number of relevant mini sessions reduces from several thousand to a few hundred. Thus, the larger p is, the more mini sessions can be deleted from consideration.

3.2 Relevant Candidate Links
The set of candidate links for irrelevant mini sessions is denoted by CIM, and the set of candidate links for relevant mini sessions is denoted by CRM. CIM \ CRM is the set of candidate links that improve only irrelevant mini sessions, and these links need not be considered.

3.3 Dominated Mini Sessions
Mini session Tp dominates mini session Tq if the set of relevant candidate links of Tp is contained in the set of relevant candidate links of Tq; in that case, any link selected to improve Tp also improves Tq, so only Tp needs to be considered.

Table 3.1 Summary of Notations

T: Mini session that contains the set of paths traversed by a user to locate one target page.
W: The set of all web pages.
I: The set of all identified mini sessions.
R: The set of relevant mini sessions.
C: The set of candidate links that can be selected for improving user navigation.
E_ij: 1 if page i has a link to page j in the current structure; 0 otherwise.
L: The set of relevant candidate links.
M: Multiplier for the penalty term in the objective function.
S: The set of source nodes of links in set C.
p_j: The path threshold for mini sessions in which page j is the target page.
N_i: The number of links that exceed the out-degree threshold O_i in page i.
O_i: The out-degree threshold for page i.
D_i: The current out-degree of page i.
t_ij: 1 if the link from i to j is selected; 0 otherwise.
q^T_ijkr: 1 if i is the rth page in the kth path of mini session T and j is the target page.
O^T_kr: 1 if, in mini session T, a link from the rth page of the kth path to the target is selected.
tgt(T): The target page of mini session T.

Illustrative Examples

An Example of Mini Sessions

Table 3.2 An Example of Mini Sessions
ID: Mini Session
T1: {(2,1),(4),(5,6)}
T2: {(4,3),(5,1),(2,6)}
T3: {(1,5,2),(6,4)}
T4: {(6,3),(2,1),(5,4)}
T5: {(4,1),(5),(3),(2,6)}
T6: {(5,3,1),(2,4)}

Table 3.3 The Set of All Candidate Links
ID: Candidate Links
T1: {(2,6),(1,6),(4,6)}
T2: {(4,6),(3,6),(5,6),(1,6)}
T3: {(1,4),(5,4),(2,4)}
T4: {(6,4),(3,4),(2,4),(1,4)}
T5: {(4,6),(1,6),(5,6),(3,6)}
T6: {(5,4),(3,4),(1,4)}

The six relevant mini sessions, whose lengths are larger than the path threshold, should be improved such that users can reach their targets in one path. The penalty term is not considered for now. The problem is formulated as:

Minimize  Σ_(i,j) t_ij [1 − E_ij (1 − ε)]
Subject to  O^T_kr = Σ_(i,j) q^T_ijkr t_ij ;  r = 1, 2, ..., F_p(k, T),  k = 1, 2, ..., F_m(T),  T ∈ R,

together with one covering constraint per relevant mini session, requiring that at least one of its candidate links (Table 3.3) is selected:

Σ_((i,j) ∈ C_T) t_ij ≥ 1,  for T = T1, T2, ..., T6.