International Journal of Multidisciplinary Educational Research
Volume 3, Issue 4(2), April 2014

Published by Sucharitha Publications, Visakhapatnam – 530 017, Andhra Pradesh, India. Email: [email protected] Website: www.ijmer.in

Editorial Board
Editor-in-Chief: Dr. Victor Babu Koppula, Faculty, Department of Philosophy, Andhra University, Visakhapatnam – 530 003, Andhra Pradesh, India

EDITORIAL BOARD MEMBERS

Prof. S. Mahendra Dev, Vice Chancellor, Indira Gandhi Institute of Development Research, Mumbai
Prof. Y. C. Simhadri, Director, Institute of Constitutional and Parliamentary Studies, New Delhi; formerly Vice Chancellor of Benaras Hindu University, Andhra University, Nagarjuna University and Patna University
Prof. (Dr.) Sohan Raj Tater, Former Vice Chancellor, Singhania University, Rajasthan
Prof. K. Sreerama Murty, Department of Economics, Andhra University, Visakhapatnam
Prof. K. R. Rajani, Department of Philosophy, Andhra University, Visakhapatnam
Prof. A. B. S. V. Rangarao, Department of Social Work, Andhra University, Visakhapatnam
Prof. S. Prasanna Sree, Department of English, Andhra University, Visakhapatnam
Prof. P. Sivunnaidu, Department of History, Andhra University, Visakhapatnam
Prof. P. D. Satya Paul, Department of Anthropology, Andhra University, Visakhapatnam
Dr. K. Chaitanya, Postdoctoral Research Fellow, Department of Chemistry, Nanjing University of Science and Technology, People's Republic of China
Prof. G. Veerraju, Department of Philosophy, Andhra University, Visakhapatnam
Prof. G. Subhakar, Department of Education, Andhra University, Visakhapatnam
Dr. B. S. N. Murthy, Department of Mechanical Engineering, GITAM University, Visakhapatnam
N. Suryanarayana (Dhanam), Department of Philosophy, Andhra University, Visakhapatnam
Dr. Ch. Prema Kumar, Department of Philosophy, Andhra University, Visakhapatnam
Dr. E. Ashok Kumar, Department of Education, North-Eastern Hill University, Shillong
Prof. Josef HÖCHTL, Department of Political Economy, University of Vienna, Vienna; Ex-Member of the Austrian Parliament, Austria
Prof. Alexander Chumakov, Chair of Philosophy Department, Russian Philosophical Society, Moscow, Russia
Prof. Fidel Gutierrez Vivanco, Founder and President, Escuela Virtual de Asesoría Filosófica, Lima, Peru
Prof. Igor Kondrashin, Member of the Russian Philosophical Society, the Russian Humanist Society and Expert of the UNESCO, Moscow, Russia
Dr. Zoran Vujisić, Rector, St. Gregory Nazianzen Orthodox Institute, Universidad Rural de Guatemala, GT, U.S.A.
Swami Maheshwarananda, Founder and President, Shree Vishwa Deep Gurukul, Swami Maheshwarananda Ashram Education & Research Center, Rajasthan, India
Dr. Momin Mohamed Naser, Department of Geography, Institute of Arab Research and Studies, Cairo University, Egypt
I Ketut Donder, Depasar State Institute of Hindu Dharma, Indonesia
Prof. Roger Wiemers, Professor of Education, Lipscomb University, Nashville, USA
Dr. Merina Islam, Department of Philosophy, Cachar College, Assam
Dr. R. Dhanuja, PSG College of Arts & Science, Coimbatore
Dr. Bipasha Sinha, S. S. Jalan Girls' College, University of Calcutta, Calcutta
Dr. K. John Babu, Department of Journalism & Mass Comm., Central University of Kashmir, Kashmir
Dr. H. N. Vidya, Government Arts College, Hassan, Karnataka
Dr. Ton Quang Cuong, Dean of Faculty of Teacher Education, University of Education, VNU, Hanoi
Prof. Chanakya Kumar, University of Pune, Pune

© Editor-in-Chief, IJMER Typeset and Printed in India www.ijmer.in

IJMER, the International Journal of Multidisciplinary Educational Research, concentrates on critical and creative research in multidisciplinary traditions. The journal seeks to promote original research and cultivate a fruitful dialogue between old and new thought.

Volume 3, Issue 4(2), April 2014

CONTENTS

1. 3D Touchless Fingerprint Recognition with Identical Twin Fingerprints (p. 1)
   Chetan G. Puri, Dipak S. Kapadane and S. M. Rokade

2. Applying QoS Base Data Replication Attributes in Clouds (p. 12)
   A. J. Musmade and S. M. Rokade

3. Cloud Compiler for C, C# and Java (p. 26)
   Sagorika Datta, Aradhana, Anjali Dewani and Priya Bankar

4. GSM Networked DTMF Based Smart Password Entry System (p. 38)
   Charushila B. Bachhav, Poonam S. Bagul and Pradnya S. Sanap

5. Auditing Protocols: A New Approach for Security of Cloud Data (p. 46)
   Sonali Pardeshi, Ankita Rathi, Shalini Shejwal and Pooja Kuyate

6. An Extraction Technique for Universal Distance Cache (p. 61)
   Chotia Amit N, Joshi Abhishek A, Gosavi Darpan V and Wagh Ganesh V

7. Safety Management of Construction Workers (p. 74)
   Vivek K. Kulkarni and R. V. Devalkar

8. Improving Startup Time and Providing Security to Snapshots on Platform (p. 80)
   Sheetal R. Tambe, Monika Shinde, Rohini Hire, Shanku Mandal and Rokade S. M.

9. Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System (p. 92)
   Dhoot Suyog S, Naoghare M. M. and Shinde Girish R.

10. Cloud Based Mobile Service Delivery Using QoS Mechanism (p. 106)
    Prachi B. Gaikwad and S. M. Rokade

11. Fault Diagnosis in Induction Motor (p. 117)
    K. R. Gosavi and A. A. Bhole

12. A Review on Intrusion Detection System for Web Based Application (p. 127)
    R. S. Jagale and M. M. Naoghare

13. Implementation of Enhanced Security on Vehicular Cloud Computing (p. 141)
    Rajbhoj Supriya K. and Pankaj R. Chandre

14. Document Clustering for Forensic Analysis & Investigation (p. 149)
    Dhokane R. M. and Rokade S. M.

15. Bluetooth File Transfer with Breakpoint (p. 175)
    Priyanka V. Godse, Snehal P. Katore, Poonam A. Modani and Poonam B. Sonawane

16. Optimal Multiserver Configuration for Profit Maximization in Cloud Computing (p. 184)
    Pravin Pokale, Vishal Agrawal and Rahul Wakchaure

17. Facilitating Effective User Navigation through Website Structure Improvement (p. 196)
    Jyoti B. Kshirsagar and S. D. Jondhale

18. VeinSecured: A Detection and Prevention of DDoS Attacks (p. 204)
    Sucheta Daware, Saurabh Chatterjee and Shrutika Jadhav

19. An Apotheosis Extraction Approach by Genetic Programming (p. 218)
    Swati Shahi, Priyanka Tidke, Shital Vanis and Jayshree Sangale

20. Constriction Based Particle Filter to Denoise Video and Trace Multiple Moving Objects (p. 234)
    A. R. Potdar and V. K. Shrivastava

21. Android Application on Latest Auditions Online Portal (p. 247)
    Patil Kalpesh R, Sangale Swapnil V, Baviskar Sachin M and Mahajan Vilas U

22. A Survey on User Identity Verification via Keyboard and Mouse Dynamics (p. 254)
    Ashwini Subhash Sonawane, Uzma Anis Shaikh, Vaishali Sitaram Kadu and Shedge Kishne N

23. Intelligent Transportation System (p. 261)
    Deepti Patil, Sheetal Sharma, Kiran Madihalli and Gunjan Deore

24. Green Cloud Energy Efficient: A New Methodology for Creating Autonomous Software Deployment Packages (p. 267)
    Rohan Nagar, Sagar Karad and Rakesh Vaishnav

Editorial ……..

Provoking fresh thinking is certainly becoming the prime purpose of the International Journal of Multidisciplinary Educational Research (IJMER). The new world era we have entered, with its enormous contradictions, demands a unique understanding to face its challenges. IJMER's contents are overwhelmingly contributed and distinctive, and they strike the right balance for its readers with their varied knowledge.

We are happy to inform you that IJMER has received a high Impact Factor of 2.735 and an Index Copernicus Value of 5.16, and that it is listed and indexed in 34 popular indexing organizations across the world. This academic achievement of IJMER is entirely due to the authors' contributions to past issues. I hope this journey of IJMER will bring even more benefit to the future academic world.

In the present issue, we have taken up details of multidisciplinary issues discussed in academic circles. There are well-written articles covering a wide range of issues that are thought provoking as well as significant in the contemporary research world.

My thanks to the Members of the Editorial Board and to the readers; in particular, I sincerely recognize the efforts of the contributors of articles. The journal receives its recognition from the rich contribution of assorted research papers presented by experienced scholars, and the implied commitment is generating the vision envisaged: spreading knowledge. I am happy to note that the readers are benefited. My personal thanks to one and all.

(Dr.Victor Babu Koppula)

IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014

3D TOUCHLESS FINGERPRINT RECOGNITION WITH IDENTICAL TWIN FINGERPRINTS

Chetan G. Puri1, Dipak S. Kapadane2, Prof. Sharad M. Rokade3
3Associate Professor and Head, Computer Engg. Dept.
1, 2, 3Sir Visvesvaraya Institute of Technology, Nashik
[email protected], [email protected], [email protected]

Abstract

Fingerprint recognition with identical twins is a challenging task due to the closest genetics-based relationship existing in identical twins. Several pioneers have analyzed the similarity between twins' fingerprints. In this work we continue to investigate the topic of the similarity of identical twin fingerprints. (1) Two state-of-the-art fingerprint identification methods, P071 and VeriFinger 6.1, were used, rather than the single fingerprint identification method of previous studies. (2) A novel statistical analysis, which aims at showing the probability distribution of the fingerprint types for the corresponding fingers of identical twins which have the same fingerprint type, has been conducted. (a) A state-of-the-art automatic fingerprint verification system can distinguish identical twins without drastic degradation in performance. (b) The chance that the fingerprints of identical twins have the same type is 0.7440, compared to 0.3215 for non-identical twins. (c) For the corresponding fingers of identical twins which have the same fingerprint type, the probability distribution of the five major fingerprint types is similar to the probability distribution over all fingers' fingerprint types. (d) For each of the four fingers of identical twins, the probability of having the same fingerprint type is similar.

Fingerprints are traditionally captured based on contact of the finger on paper or a platen surface. This often results in partial or degraded images due to improper finger

1 International Journal of Multidisciplinary Educational Research

placement, skin deformation, slippage and smearing, or sensor noise from wear and tear of surface coatings. A new generation of touchless live-scan devices that generate a 3D representation of fingerprints is appearing in the market. This new sensing technology addresses many of the problems stated above. However, 3D touchless fingerprint images need to be compatible with the legacy rolled images used in Automated Fingerprint Identification Systems (AFIS). In order to solve this interoperability issue, we propose an unwrapping algorithm that unfolds the 3D fingerprint in such a way that it resembles the effect of virtually rolling the 3D finger on a 2D plane. Our preliminary experiments show promising results in obtaining touchless fingerprint images that are of high quality and at the same time compatible with legacy rolled fingerprint images.

Keywords: Fingerprint, dizygotic, monozygotic, minutiae points

Introduction

Biometrics refers to the automatic identification of a person based on his or her physiological or behavioural characteristics. These methods have advantages, for various reasons, over traditional token-based identification approaches using a physical key or access card, and over knowledge-based identification approaches that use a password. First, the person to be identified is required to be physically present at the point of identification to provide his or her biometric traits. Second, identification based on biometric characteristics avoids the need to carry a card or remember a password. Finally, the biometric characteristics of the identified person cannot be lost or forged. During the past few decades, a number of verification systems based on different biometric characteristics have been proposed [3]. Fingerprints are the pattern of ridges on the tips of our fingers. They are one of the most mature biometric technologies and are


considered legitimate proofs of evidence in courts of law all over the world. Fingerprints are fully formed at about seven months of fetal development, and finger ridge configurations do not change throughout life except due to accidents such as bruises and cuts on the fingertips. More recently, an increasing number of civilian and commercial applications (e.g., welfare disbursement, cellular phone access, laptop computer log-in) are either using or actively considering using fingerprint-based identification because of the availability of inexpensive and compact solid-state scanners as well as its superior and proven matching performance over other biometric technologies.

There are two basic types of twins: dizygotic, commonly referred to as fraternal twins, and monozygotic, referred to as identical twins [4]. Dizygotic twins result from two eggs that are fertilized separately by two different sperms. This usually happens when the mother produces more than one egg at ovulation. The two fertilized eggs develop separately and have their own genes. They may or may not be the same gender. Monozygotic twins result from one fertilized egg. This egg divides into two individuals who share all of their genes in common. These twins are genetically identical, with the same chromosomes and similar physical characteristics, and therefore they cannot be distinguished using deoxyribonucleic acid alone.

Figure 1. Some examples of fingerprint images in our database. (a) Fingerprint images of four fingers of the first twin, and (b) fingerprint images of the corresponding four fingers of his/her identical twin. (c) and (d) show fingerprint images from a non-identical twin pair.

An automated fingerprint authentication system consists of


three components, namely image acquisition, feature extraction and matching. Among the three, image acquisition is often considered the most critical, as it determines the fingerprint image quality, which has a large effect on the system performance [1]. Traditionally, fingerprint images are acquired by pressing or rolling a finger against a hard surface (e.g., glass, silicon, polymer) or paper (e.g., an index card). This often results in partial or degraded images due to improper finger placement, skin deformation, slippage and smearing, or sensor noise from wear and tear of surface coatings.

3D RECONSTRUCTION OF TOUCHLESS FINGERPRINTS

Touchless fingerprinting is essentially a remote sensing technique used to capture the ridge-valley pattern. While it is not a completely new approach to acquiring fingerprints [2, 3, 4], it did not generate sufficient interest in the market, in spite of its advantages with respect to the contact-based technology. The main reason is the cost of this technology. In fact, in order to keep the production costs of these devices low, their manufacturers often use only one camera. This results in fingerprint images with less usable area, due to the curvature of the finger, compared to the contact-based approach. In a touchless fingerprint image, the apparent frequency of the ridge-valley pattern increases from the centre towards the sides until ridges and valleys become indistinguishable. Hence, dedicated algorithms are needed to correct the ridge-valley pattern, with an increase in the overall computational load.

Figure 2. Fingerprint acquisition using a set of cameras surrounding the finger.


This paper presents our continued investigation of the ability of fingerprint verification technology to distinguish between identical twins.

(1) Compared to all the methods in [2][5][7], two state-of-the-art fingerprint identification methods, P071 and VeriFinger 6.1, are used for twin fingerprint identification in this paper, rather than the single fingerprint identification method of [2][5][7].

(2) Compared to Jain's [2] and Srihari's [6] methods, six impressions per finger were captured rather than just one impression, which makes the genuine distribution of matching scores more realistic. As we know, the genuine distribution of matching scores needs to be estimated by matching multiple fingerprint impressions of the same finger. In both Jain's and Srihari's databases, because only a single impression was captured for each finger, the distribution of the genuine scores had to be synthesized, i.e., it did not come from real genuine matching.

(3) Compared to Sun et al.'s method [7], the fingerprint database is from the same source. However, only a part of the fingerprint dataset (51 pairs) was used in [7], while the whole fingerprint dataset (83 pairs) is used in this paper.

(4) A novel statistical analysis is conducted for the five major fingerprint types, which aims at showing the probability distribution of the fingerprint types for the corresponding fingers of identical twins which have the same fingerprint type. This is novel in our paper.

(5) A probability analysis is conducted for four fingers from identical twins, which aims at showing which finger has the higher probability of having the same fingerprint type. This is also novel in our paper.

Methods

In this paper, two state-of-the-art methods are used to identify the similarity of twin fingerprints: P071 [13] and VeriFinger 6.1 SDK (VF6.1) [14]. The details are given as follows.


P071 Algorithm

The P071 algorithm, used here first for the identification of twin fingerprints, was evaluated in the Fingerprint Verification Competition 2004 (FVC2004), where its performance was ranked No. 3 among all participating algorithms. The detailed performance of the algorithm on FVC2004 can be seen on the website [15]. The P071 method is based on a normalized fuzzy similarity measure. The algorithm has two main steps. First, the template and input fingerprints are aligned. In this process, local topological structure matching was introduced to improve the robustness of the global alignment. Second, a normalized fuzzy similarity measure was introduced to compute the similarity between the template and input fingerprints. Two features are selected in the process of similarity computing: the number of matched sample points (n) and the mean distance difference of the matched minutiae pairs (d). Fuzzy features are used to represent n and d. Each characteristic is associated with a fuzzy feature that assigns a value (between 0 and 1) to each feature vector in the feature space. This value, named the degree of membership, indicates the degree of similarity of the template and input fingerprints.

Figure 3. ROC curves for identical-twin and non-twin matching by the P071 method.

VeriFinger 6.1 SDK

VeriFinger 6.1 SDK [14] is a well-known commercial fingerprint recognition software package, which is based on advanced fingerprint recognition technology and is intended for biometric system developers and integrators. The technology


assures system performance with fast, reliable fingerprint matching in 1-to-1 and 1-to-many modes and comparison speeds of up to 40,000 fingerprints per second. VeriFinger 6.1 SDK has many features: (1) NIST MINEX proven reliability; (2) robust processing of poor-quality and deformed fingerprints; (3) support for more than 50 scanners. Some of the functions of VeriFinger 6.1 SDK are listed as follows:

1. Enroll fingerprint. A fingerprint can be enrolled from an image or by using the fingerprint scanner.
2. Enroll fingerprint with generalization. Using this option, several fingerprints can be enrolled and their features generalized.
3. Verification. Using this option, one fingerprint can be verified against another (1:1 matching).
4. Identification. Using this option, a fingerprint is identified against an internal database (1:N matching).

Parametric 3D Fingerprint Unwrapping

Figure 4. Parametric unwrapping using a cylindrical model (top-down view). Point (x, y, z) on the 3D finger is projected to (θ, z) on the 2D plane.

Non-parametric 3D Fingerprint Unwrapping

Figure 5. 3D representation of the finger. Vertices of the triangular mesh are naturally divided into slices.
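The cylinder-model projection in Figure 4 can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's implementation: it assumes the finger axis coincides with the z-axis and approximates the finger surface by a cylinder of fixed radius, so each surface point (x, y, z) maps to (r·θ, z), where θ is the angle around the axis.

```python
import math

def unwrap_cylindrical(points, radius):
    """Project 3D finger-surface points onto a 2D plane by 'unrolling'
    an enclosing cylinder whose axis is the z-axis.

    Each (x, y, z) maps to (u, v), where u = radius * theta is the arc
    length along the cylinder circumference and v = z is kept as-is.
    """
    unwrapped = []
    for x, y, z in points:
        theta = math.atan2(y, x)              # angle around the cylinder axis
        unwrapped.append((radius * theta, z))  # arc length, height
    return unwrapped

# Points on a cylinder of radius 10, at angles 0 and 90 degrees:
pts = [(10.0, 0.0, 5.0), (0.0, 10.0, 5.0)]
print(unwrap_cylindrical(pts, 10.0))
```

Because a real finger is not a perfect cylinder, arc lengths computed this way distort inter-point distances toward the sides of the print, which is exactly the weakness the non-parametric method is meant to address.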


3D Fingerprint

Figure 6. Unwrapping a 3D fingerprint captured with the Surround Imager using (a) the cylindrical-based parametric method and (b) the proposed non-parametric method.

Figures 6(a) and (b) show the unwrapped touchless fingerprint images using the cylindrical-based and the proposed method, respectively. Minutiae points (white arrows) are extracted using the feature extraction algorithm in [14], and distances between a few minutiae points (red solid lines) are shown in Figure 6. These figures show that the proposed unwrapping method better preserves the inter-point distances, with less distortion than the cylindrical-based method. To demonstrate the compatibility of the unwrapped touchless fingerprints with legacy rolled images, we have collected a small database with 38 fingers; each includes one ink-on-paper rolled print and one touchless print captured using the new line-scan sensor.

Figure 7. Visualizing compatibility between (a) a touchless fingerprint from the line-scan sensor using the


proposed non-parametric unwrapping, and (b) the corresponding ink-on-paper rolled fingerprint.

Conclusion

We have presented a study of the distinctiveness of the fingerprint as a biometric characteristic of identical twins. We have assessed the capacity of state-of-the-art commercial biometric matchers to distinguish identical twins based on fingerprints. Although the unimodal fingerprint biometric system can also discriminate two different persons who are not identical twins better than it can discriminate identical twins, this difference is not as large as for the face biometric system. In the fingerprint experiments, the identical-twin impostor distribution is shifted to the right, closer to the genuine distribution. This suggests a higher correlation between fingerprints of identical twins compared to fingerprints of unrelated persons. Previous studies have shown that the fingerprint type is much more likely to be the same in twins than in unrelated persons, and more recent studies confirm this. We also propose an unwrapping algorithm that unfolds the 3D fingerprint in such a way that it resembles the effect of virtually rolling the 3D finger on a 2D plane.

References

[1]. Tao X, Chen X, Yang X, Tian J, et al. (2012). Fingerprint Recognition with Identical Twin Fingerprints. PLoS ONE 7(4): e35704. doi:10.1371/journal.pone.0035704
[2]. Chen, Y., Parziale, G., Diaz-Santana, E. and Jain, A. K. 3D Touchless Fingerprints: Compatibility with Legacy Rolled Images.
[3]. Jain, A. K., Bolle, R. and Pankanti, S. (1999). Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers.
[4]. Jain, A. K., Prabhakar, S. and Pankanti, S. (2002). On the similarity of identical twin


fingerprints. Pattern Recognition, 35: 2653–2663.
[5]. Ashbaugh, D. R. (1999). Quantitative-Qualitative Friction Ridge Analysis. Boca Raton, Florida: CRC Press LLC.
[6]. Cummins, H. and Kennedy, R. (1940). Purkinje's Observations (1823) on Fingerprints and Other Skin Features. American J. Police Sci. 31(3).
[7]. Fayrouz, N. E., Farida, N. and Irshad, A. H. (2011). Relation between fingerprints and different blood groups. Journal of Forensic and Legal Medicine, 19, 18-21.
[8]. Jain, A. K., Prabhakar, S. and Pankanti, S. (2001). Twin Test: On the Discriminability of Fingerprints. Audio- and Video-Based Biometric Person Authentication, 1(2091), 211-217.
[9]. Kong, A. W., Zhang, D. and Lu, G. (2006). A Study of Twins' Palmprints for Personal Verification. Pattern Recognition, 2149-2156.
[10]. Lee, H. C. and Gaensslen, R. E. (2001). Advances in Fingerprint Technology, 2nd Edition. United States of America: CRC Press.
[11]. Lin, C. H., Liu, J. H., Osterburg, J. W. and Nicol, J. D. (1982). Fingerprint Comparison 1: Similarity of Fingerprints. Journal of Forensic Science, 27(2), 290-304.
[12]. Liu, Y. and Srihari, S. N. (2009). A Computational Discriminability Analysis on Twin Fingerprints. Computational Forensics, 5718, 43-54.
[13]. Neale, M. C. and Maes, H. H. M. (2004). Methodology for Genetic Studies of Twins and Families. Netherlands: Kluwer.
[14]. Simon, A. C. (2002). Suspect Identities: A History of Fingerprinting and Criminal Identification. United States of America: President and Fellows of Harvard College.


[15]. Temaj, G., Juric, T. S., Tomas, Z., Behluli, I., Narancic, N. S., Sopi, R., Jakupi, M. and Milicic, J. (2012). Qualitative dermatoglyphic traits in monozygotic and dizygotic twins of an Albanian population in Kosovo. Journal of Comparative Human Biology.
[16]. Yager, N. and Amin, A. (2004). Fingerprint verification based on minutiae features: a review. Pattern Analysis and Applications, 7, 94-113.
[17]. Can Identical Twins be Discriminated Based on Fingerprints? biometrics.cse.msu.edu/.../Fingerprint/JainetalTwinFpTechReport00.pdf
[18]. The Fingerprint Sourcebook. National Criminal Justice Reference Service. https://www.ncjrs.gov/pdffiles1/nij/225320.pdf
[19]. Fingerprint Identification Overview. CVIP Lab. www.cvip.uofl.edu/wwwcvip/education/ECE523/Slides.pdf
[20]. Introducing a New Multimodal Database from Twins. /PID2759199.pdf


Applying QoS Base Data Replication Attributes In Clouds

Ms. A. J. Musmade (Computer Engineering Department, SVIT COE, Nasik / University of Pune, India)
Prof. S. M. Rokade (HOD, Computer Engineering Department, SVIT COE, Nasik / University of Pune, India)

Abstract - Cloud computing is an important mechanism for utilizing computing services. Because of the flexible nature of the cloud computing environment, most applications are developed in this environment. As there are a number of applications, they have different quality of service (QoS) requirements. Due to data corruption in data nodes, a number of applications are unable to reach their successful outcomes. To support such applications continuously, we perform data replication on the basis of the QoS requirements of the corresponding application. For performing this data replication we are developing an algorithm using some concepts of the HQFR (High QoS First Replication) algorithm. Along with the QoS requirement, our main goal is to minimize the data replication cost, so we are developing another algorithm which is inspired by the MCMF (Minimum Cost Maximum Flow) algorithm. In the end we will propose an efficient scheme for data replication on the basis of QoS requirements.

Index Terms—Cloud Computing, Data replication, Quality of service.

I. INTRODUCTION

Cloud computing is becoming an important mechanism for utilizing computing services worldwide. Cloud computing has different


features, like transparency in resource allocation and service provisioning facilities. There is rapid growth in new information services and business-oriented applications via the internet. Because of the flexible nature of the cloud computing environment, most applications are developed in this environment. As there is a tremendous increase in data-intensive applications, there is a necessity for new, efficient techniques for processing a huge volume of data. Cloud computing focuses on the scalability and availability of large-scale applications. Apache Hadoop can be regarded as a typical example of cloud computing.

A cloud computing system processes a large volume of data in the network for performing a huge number of applications. In the network there are a number of nodes, and as there are a huge number of nodes there is more possibility of hardware or system failure. Due to hardware failure, some data stored at a node may get corrupted. Simultaneously, an application in the running state which has a request for that corrupted data cannot access it. Due to this data corruption, most of the applications are unable to reach their successful outcomes. So, to support such applications continuously by avoiding data corruption, a new technique is introduced in cloud computing systems, which is nothing but data replication. In data replication we maintain more than one replica, that is, copies of each data block, to avoid data corruption. There are different techniques used for data replication in cloud computing, but very few of them are concerned with the quality of service requirements of applications.

In this paper, we focus on the quality of service requirements of applications while performing data replication. To our knowledge, very few papers are concerned with this quality of service (QoS) requirement problem in data replication. Here the QoS requirement is considered from the aspect of the request


information of an application. We investigate the problem of applications' QoS requirements while performing data replication. We are trying to solve this QoS problem in data replication and will propose an efficient technique to address it. Along with this, our goal is to minimize the total data replication cost and to minimize the number of QoS-violated data replicas [1].

In our newly proposed technique we will use some algorithms. The idea for the first algorithm is taken from the HQFR algorithm. HQFR stands for High QoS First Replication. In this algorithm, the application having the highest QoS requirement is considered first and can perform data replication. Along with the minimization of replication cost, our aim is to minimize the number of QoS-violated data replicas. To achieve this goal we will introduce another algorithm which gives an optimal solution to the QoS requirements in data replication. We are also interested in finding the efficient energy consumption of nodes in the cloud computing system. Our goal is to perform data replication by considering the QoS requirements of applications and also their energy consumption requirements.

Apart from this, our paper is organized into the following sections. Section II describes the related work. Section III gives the system model. Section IV describes efficient schemes for data replication. And lastly we conclude our paper.

II. LITERATURE SURVEY

To tolerate failures of applications in cloud computing, various concepts have been introduced. We will have a look at them.

Some techniques have been introduced to avoid data corruption in the Hadoop Distributed File System (HDFS). For NameNode failure, a checkpoint method is used. In this method the NameNode periodically combines the existing checkpoint and journal to create a new checkpoint and an empty journal.
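The high-QoS-first idea sketched in the Introduction can be illustrated as a greedy loop: sort the replication requests by QoS priority, then place each replica on the cheapest node that satisfies the request's QoS bound, counting a request as QoS-violated when no node can satisfy it. The sketch below is a simplified illustration under assumed data structures, not the published HQFR algorithm; it ignores node capacity, and "latency" stands in for whatever QoS metric the request actually carries.

```python
def hqfr_schedule(requests, nodes):
    """Greedy 'high QoS first' replication sketch (illustrative only).

    requests: dicts with 'app', 'qos' (priority, higher handled first)
              and 'max_latency' (the QoS bound a replica node must meet).
    nodes:    dicts with 'name', 'latency' and 'cost'.
    Returns (placements, violated): chosen (app, node) pairs, plus the
    apps whose QoS bound no node could satisfy.
    """
    placements, violated = [], []
    # Applications with higher QoS requirements replicate first.
    for req in sorted(requests, key=lambda r: r["qos"], reverse=True):
        candidates = [n for n in nodes if n["latency"] <= req["max_latency"]]
        if not candidates:
            violated.append(req["app"])                      # QoS-violated replica
            continue
        best = min(candidates, key=lambda n: n["cost"])      # cheapest valid node
        placements.append((req["app"], best["name"]))
    return placements, violated

nodes = [{"name": "n1", "latency": 5, "cost": 3},
         {"name": "n2", "latency": 20, "cost": 1}]
reqs = [{"app": "A", "qos": 2, "max_latency": 10},
        {"app": "B", "qos": 9, "max_latency": 25},
        {"app": "C", "qos": 5, "max_latency": 2}]
print(hqfr_schedule(reqs, nodes))
```

A greedy pass like this minimizes cost only locally, per request; recasting the placement problem as a minimum-cost maximum-flow instance, as the MCMF-inspired algorithm mentioned above would, is what allows a globally cost-optimal assignment.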

In this method the checkpoint is a persistent record of the image stored in the local host's native file system, and the journal is a log of changes to the image. The CheckpointNode usually runs on a different host from the NameNode, since it has the same memory requirements as the NameNode. It takes the current checkpoint and journal files from the NameNode, merges them locally, and returns the new checkpoint back to the NameNode. Creating checkpoints is one way to protect the file system metadata [2].

Another concept, the BackupNode, is discussed in the same paper. It is similar to the CheckpointNode, creating periodic checkpoints, but in addition it keeps an in-memory, up-to-date image of the file system namespace which is always synchronized with the NameNode. If the NameNode fails, the BackupNode's image in memory and the checkpoint on disk form a record of the latest namespace state [2].

A further technique, which is nothing but data replication, is used to tolerate DataNode failure. In this technique more than one copy of each data block is maintained; in HDFS the default replication factor is two copies beyond the original. When a new block is created, HDFS places the first replica on the node where the writer is located and the second and third replicas on two different nodes in a different rack; the rest are placed on random nodes with the restrictions that no more than one replica is placed at any one node and no more than two replicas are placed in the same rack when the number of replicas is less than twice the number of racks [2].

There are many techniques introduced to improve data availability and reliability in cloud computing systems, but very few of them investigate the problem of the QoS requirements of applications in data replication.

One work investigates the problem of placing the replicas of an object in content distribution systems to meet the QoS requirements of clients with the objective of minimizing the

replication cost. In this paper QoS requirements are specified in the form of a general distance metric. The authors consider two classes of service models, replica-aware services and replica-blind services, and propose efficient algorithms to compute the optimal locations of replicas under the different models. Their goal is to find a replica placement that satisfies all requests without violating any range constraint while minimizing the update and storage cost at the same time. They show that this QoS-aware replica placement problem is NP-complete for general graphs, and they provide two heuristic algorithms for general graphs, called l-Greedy-Insert and l-Greedy-Delete, as well as a dynamic programming solution for tree topologies. Since l-Greedy-Insert starts by inserting replicas into an empty replica set and l-Greedy-Delete starts by deleting replicas from a full replica set, the execution time of these two algorithms depends on the number of replicas in the optimal solution: l-Greedy-Insert is more efficient when the optimal solution has few replicas, and l-Greedy-Delete becomes more efficient when the optimal solution has many replicas [3].

In another work, a simple problem formulation is first shown to be NP-complete. The authors present different types of heuristics for object replication that satisfy the specified access time deadlines while trying to achieve low storage overhead. Their goal was to minimize the number of replicas present in the system, and they proposed a simple algorithm, which may be known as Greedy MSC (Greedy Minimum Set Covering), that finds a solution to the QoS-aware problem [4].

Other authors presented a new heuristic algorithm for the QoS-aware problem which is based on the idea of a cover set. It determines the positions of a minimum number of replicas expected to satisfy certain quality requirements. Their placement algorithm exploits the data access history for popular data files and

computes replica locations by minimizing QoS satisfaction for a given traffic pattern [5].

Authors proposed a new heuristic algorithm, called Greedy-Cover, which finds good solutions for the QoS-aware replica placement problem in general graphs. The algorithm helps to decide the positions of the replicas so as to improve system performance and satisfy the quality requirements specified by the user simultaneously [6].

Authors also proposed a replica replacement strategy to make dynamic replica management effective. A dynamic replacement strategy is proposed for a domain-based network, where a weight for each replica is calculated to make the replacement decision [7].

All the above explained algorithms are not suitable for solving our QoS-aware problem. In our system the number of replicas is fixed for a specific data block, so it is not easy to minimize the replication cost with a fixed number of replicas. Our QoS-aware problem is concerned with the minimization of replication cost, the minimization of QoS-violated data replicas, and the minimization of energy consumption.

III. SYSTEM MODEL

For designing our replication strategy we refer to the architecture of the Hadoop Distributed File System. The basic architecture of an HDFS cluster consists of two major node types, the NameNode and the DataNode [2].

Figure 3.1: Hadoop Distributed File System

Figure 3.1 shows the architecture of HDFS. Multiple DataNodes are mounted in a rack, and there are a number of racks under a NameNode. They communicate with each other through switches. Files and

directories are represented on the NameNode by inodes, which record attributes like permissions, modification and access times, namespace and disk space quotas. The file content is divided into a large number of blocks, and each block of the file is independently replicated at multiple DataNodes. The NameNode keeps a record of the data blocks stored in the DataNodes, and each DataNode sends heartbeats to the NameNode to signal its proper functioning.

In a cloud computing cluster of thousands of nodes, failures of a node (most commonly storage faults) are daily occurrences. A replica stored on a DataNode may become corrupted because of faults in memory, disk, or network. To avoid data corruption, the data replication technique is used, which provides high data availability. Because of the heterogeneous nature of nodes, the data of a high-QoS application may be replicated on a low-performance node (a node having slow communication and disk access latencies). If data corruption later occurs in the node running the high-QoS application, the data of that application will be recovered from that low-performance node. Since the low-performance node has slow communication and disk access latencies, the QoS requirement of the high-QoS application may be violated. Note that the QoS requirement of an application is defined from the aspect of the request information.

We are trying to investigate the problem of QoS requirements satisfaction while performing data replication in cloud computing. The problem is as follows: due to limited space for replication, some high-QoS data get stored at a lower-performance node and cannot reach the appropriate application to give correct outcomes. Sometimes low-QoS data get stored at a high-performance node; if that data is not used for a long time, it wastes data replication space at the high-performance node. This type of lower-frequency data blocks,

which are not in use by any application or which cannot serve their appropriate application, are known as QoS-violated data replicas. We are trying to solve this QoS problem in data replication and will propose an efficient technique for it. Along with this, our goal is to minimize the total data replication cost and the number of QoS-violated data replicas.

IV. AN EFFICIENT DATA REPLICATION TECHNIQUE

In this section we present two efficient algorithms for solving the QoS-aware problem in data replication. The main goal of our scheme is to minimize the replication cost and the count of QoS-unsatisfied data replicas. We are also interested in finding an efficient algorithm for performing data replication with minimum energy consumption. We make some assumptions for solving our QoS-aware problem:

• Assumption 1: Consider a cloud computing system with a set of storage nodes S. These nodes can run applications along with storing data. The functionality of the storage nodes is similar to that of the storage nodes in HDFS.
• Assumption 2: Let r be the requested node, such that r ∈ S. Node r, while running its application, writes a data block b to its disk. Then a replication request is forwarded from node r to the cloud computing system, and copies of block b are replicated to other nodes in the system.
• Assumption 3: Now let q be the satisfied node. That means a replica copy of block b from node r is stored as block dr at node q. Let T be the time required to store dr. dr is associated with the replication cost (RC) and the total access time of replication (AC).
• Assumption 4: When node r cannot read its data block b due to data corruption, it

will retrieve the data replica dr from node q; but if the access time AC is greater than the time T, then dr becomes a QoS-violated (unsatisfied) data replica.
• Assumption 5: Our main goal is to minimize the total number of QoS-violated data replicas and the total replication cost for all data blocks.

Now we will see the algorithms.

4.1. HQFR algorithm: As the name indicates, this is the High QoS First Replication algorithm. The main point is that we consider the QoS requirement from the aspect of the request information and its access time only. In HDFS the data is divided into 64 MB data blocks, and the replication factor is two: there are two copies of a data block other than the original one, and those two copies are stored on different DataNodes or different data racks. The NameNode keeps track of all the replicas other than the original copy, and they are mounted on different data racks to avoid rack failure.

Basic idea of the algorithm: as the name indicates, the applications with high QoS should be replicated first. To our knowledge, a high-QoS application has stricter requirements on the response time of a data access than normal applications, so an application with a high QoS requirement should take precedence over one with a low QoS requirement in performing data replication.

In the cloud computing system, when any application performs a write operation, the node at which that application is executing forwards a replication request for a data block to the NameNode. The access time, meaning the QoS requirement of that application, is also attached to that request, which makes it a QoS-aware replication request. In this way multiple QoS-aware replication requests are issued in the cloud computing system from

different nodes. These requests are processed by sorting them in ascending order according to their associated access times. If replication request ri has a higher QoS requirement than replication request rj, that means ri has a smaller access time than rj; in such a case ri will be processed first to store its data replicas.

While processing these replication requests we have to find the list of qualified nodes, which helps to satisfy the QoS requirements of the appropriate application while it is running. The QoS requirement is given in the form of the access time of the data block which is requested by an application. Note that a qualified node should satisfy two conditions:

• The requested node Ri and its qualified node Qj should not be mounted in the same rack; they should belong to two different racks:

Rack(Ri) ≠ Rack(Qj)    (1)

where Rack() is the function that determines in which rack a node is located.

• The total data replica access time from qualified node Qj to requested node Ri, Taccess(Ri, Qj), should be smaller than the QoS requirement Tqos of the application running in Ri:

Taccess(Ri, Qj) ≤ Tqos    (2)

After finding the qualified nodes by using these two conditions, the data block can store one data replica in each qualified node, and the qualified nodes update their replication space respectively. Now we calculate the total replication cost. In the HQFR algorithm the total replication cost is represented by the total storage cost taken by all the requested nodes to store their appropriate replicas. The replication cost is nothing but

the total summation of the storage costs of all data block replicas. But we are mainly interested in minimizing the replication cost and also the number of QoS-violated data replicas. To achieve the second objective we propose another algorithm for data replication.

4.2. An efficient optimal replica placement algorithm: As its name indicates, this algorithm gives an efficient solution to the QoS-aware replication problem. In this algorithm we transform the QoS-aware problem into the MCMF (Minimum Cost Maximum Flow) problem. As in the previous algorithm, we first find SqRi, the set of qualified nodes for each requested node Ri, and we take the union of the set of qualified nodes Sq with the newly derived set SqRi corresponding to each requested node Ri. Then, using the sets Sr and Sq, we form a network flow graph: the vertices in the graph come from both sets Sr and Sq, and each edge carries an appropriate pair of capacity and cost for the data replication. By applying a suitable MCMF algorithm we find an efficient solution for that network flow graph. Afterwards we perform the same operation for the unqualified nodes corresponding to each requested node Ri: we form a new graph from both of the sets described above and solve it using the same MCMF algorithm. Considering both solutions obtained previously, we perform the optimal placement of all QoS-violated data replicas. Because of this optimal placement, the number of QoS-violated replicas is minimized, which is our main goal. As we have used an MCMF algorithm in this

scheme, we get our solution in polynomial time. In this scheme the second part has one flow graph, whose amount is the amount of flow leaving the requested node Ri. Here we consider the amount of flow leaving, which is not added to the total replication cost; this automatically helps in minimizing the total replication cost. Hence we have achieved both of our objectives. But we are also interested in minimizing energy consumption in data replication, so we are going to investigate another algorithm for energy optimization.

4.3. Efficient energy optimization algorithm: In this algorithm, similar to the above algorithms, we find a set of qualified nodes corresponding to each requested node Ri and create a set of such nodes. After that we check the status of each qualified node: we collect the energy status of each node and make another set, Er, for this. Then, according to the energy status of the nodes, we sort them in descending order, so that the node with higher energy comes first. The replication request of that node is considered first from the set of requested nodes. In this way the replication request is performed in minimum time with efficient energy. This algorithm is not derived in real time yet, but we are trying to develop it soon.
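The scheme of this section can be sketched end to end. The sketch below is illustrative only: the node and request structures and the cost model are invented for the example, not taken from the paper's implementation. It shows requests served in ascending order of access time (HQFR), the two qualified-node conditions (1) and (2), and the descending-energy ordering of Section 4.3.

```python
# Illustrative sketch of the replication scheme in this section.
# Data structures and the access-time model are assumptions for the example.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    rack: int
    free_space: int     # remaining replication slots
    energy: float = 1.0  # residual energy status

@dataclass
class Request:
    node: Node
    t_qos: float  # required access-time bound T_qos (smaller = stricter QoS)

def access_time(src: Node, dst: Node) -> float:
    # Placeholder cost model: cross-rack access is assumed slower.
    return 1.0 if src.rack == dst.rack else 2.5

def qualified(req: Request, q: Node) -> bool:
    # Condition (1): Rack(Ri) != Rack(Qj).
    # Condition (2): Taccess(Ri, Qj) <= Tqos.
    return q.rack != req.node.rack and access_time(req.node, q) <= req.t_qos

def hqfr(requests, nodes):
    """High QoS First Replication: process requests in ascending order of
    access time, placing one replica on a qualified node with free space."""
    placements = {}
    for req in sorted(requests, key=lambda r: r.t_qos):
        for q in nodes:
            if q is not req.node and q.free_space > 0 and qualified(req, q):
                q.free_space -= 1
                placements[req.node.name] = q.name
                break
    return placements

def energy_first(nodes):
    """Energy variant (Sec. 4.3): nodes sorted by residual energy, descending."""
    return sorted(nodes, key=lambda n: n.energy, reverse=True)

a = Node("a", rack=0, free_space=1, energy=0.9)
b = Node("b", rack=1, free_space=1, energy=0.5)
c = Node("c", rack=0, free_space=1, energy=0.7)
placements = hqfr([Request(a, t_qos=3.0)], [a, b, c])
print(placements)  # {'a': 'b'}: b is in another rack and meets the bound
```

Node c is rejected by condition (1) (same rack as a), while b satisfies both conditions, so the replica of a's block lands on b.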

V. CONCLUSION

We have investigated the QoS requirement problem in data replication in cloud computing systems and have proposed an efficient scheme to solve the QoS-aware problem in data replication. The first algorithm is inspired by the HQFR algorithm; this algorithm cannot give an optimal solution to the QoS-aware problem, so we proposed another algorithm which gives an optimal solution in polynomial time. This algorithm also helps in achieving both the objectives of our paper, which are the minimization of replication cost and the minimization of QoS-violated data replicas.

In future, we are going to find an efficient energy optimization algorithm for the energy consumption problem of nodes while performing data replication in a cloud computing system. We want to develop a new algorithm which gives a proper solution to this problem, and we will try our best to implement that algorithm in real time.

REFERENCES

Journal Papers:

[1] Jenn-Wei Lin, Chien-Hung Chen, and J. Morris Chang, "QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing Systems," IEEE Transactions on Cloud Computing, Digital Object Identifier 10.1109/TCC, 2013.
[2] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, "The Hadoop Distributed File System," in Proc. IEEE 26th Symp. Mass Storage Systems and Technologies (MSST), Jun. 2010, pp. 1-10.
[3] X. Tang and J. Xu, "QoS-Aware Replica Placement for Content Distribution," IEEE Trans. Parallel and Distrib. Syst., vol. 16, no. 10, pp. 921-932, Oct. 2005.
[4] Won J. Jeon, I. Gupta and K. Nahrstedt, "QoS-Aware Object Replication in Overlay Networks," IEEE GLOBECOM, 2006.
[5] X. Jia, Deying Li, Hongwei Du and Jinli Cao, "On Optimal Replication of Data Object at Hierarchical and Transparent Web Proxies," IEEE Transactions on Parallel and Distributed Systems, vol. 16, no. 8, Aug. 2005.
[6] H. Wang, Pangfeng Liu and Jan-Jan Wu, "A QoS-Aware Heuristic Algorithm for Replica Placement," Grid Computing Conference, IEEE, 2006.
[7] K. Shashi and T. Santhanam, "Replica Replacement Algorithm for Data Grid Environment," ARPN Journal, vol. 8, no. 2, Feb. 2013.
[8] M. Creeger, "Cloud Computing: An Overview," Queue, vol. 7, no. 5, pp. 2:3-2:4, Jun. 2009.
[9] K. V. Vishwanath and N. Nagappan, "Characterizing Cloud Computing Hardware Reliability," in Proc. ACM Symp. Cloud Computing, Jun. 2010, pp. 193-204.
[10] E. Pinheiro, W.-D. Weber, and L. A. Barroso, "Failure Trends in a Large Disk Drive Population," in Proc. 5th USENIX Conf. File and Storage Technologies, Feb. 2007, pp. 17-28.
[11] Mohamed-K Hussein and Mohamed-H Mousa, "A Light-weight Data Replication for Cloud Data Centers Environment," International Journal of Engineering and Innovative Technology (IJEIT), vol. 1, issue 6, June 2012.
[12] Dejene Boru, Dzmitry Kliazovich, Fabrizio Granelli, Pascal Bouvry, and Albert Y. Zomaya, "Energy-Efficient Data Replication in Cloud Computing Datacenters."
[13] Nihat Altiparmak and Ali Saman Tosun, "Integrated Maximum Flow Algorithm for Optimal Response Time Retrieval of Replicated Data," 2012 41st International Conference on Parallel Processing.
[14] Da-Wei Sun and Xing-Wei Wang, "Modeling a Dynamic Data Replication Strategy to Increase System Availability in Cloud Computing Environments," Journal of Computer Science and Technology, 27(2): 256-272, Mar. 2012.


Cloud Compiler for C, C# and Java

Sagorika Datta, Aradhana, Anjali Dewani, Priya Bankar
Sir Visvesvaraya Institute of Technology, Nashik
University of Pune
Email: [email protected]

Abstract: A compiler converts higher-order language instructions into lower-order or assembly language instructions, whereas cloud computing is a metonym for distributed computing over a network and means the ability to run a program or application on demand, on many connected computers at the same time, with marginal management effort. The cloud compiler created by us is an amalgam of the best of both worlds, embedding not one but three compilers in the cloud. In today's technologically advanced day and age, there is always new and advanced hardware or software which makes software released a mere few years ago obsolete and unsupported on the new platform. For compilers, there is the issue of the excess space required to manually install compilers on each machine, and of other set-up options and configuration when they are not installed using the default settings and parameters. Platform independence is also an issue: once a program is compiled, it is nearly impossible to transport the same code to other machines. Also, usage of multiple languages implies the installation of multiple compilers. To avoid all these hindrances, our project, the online cloud compiler, provides easy access to the compilers of three majorly used programming languages, C, Java and C#, just by access through any browser-enabled device with any network connection, thus providing remote access as well as platform

independence. We can just upload the program to the cloud and it will get compiled, along with its space and time complexity being reported. This promotes portability, conservation of space and lower overhead.

Keywords: Compiler, Cloud Computing, Cloud Compiler, Platform Independence

I. INTRODUCTION

A. CLOUD COMPUTING

The National Institute of Standards and Technology (NIST) defines 'Cloud Computing' as 'a model for enabling easy, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.' The advantage of this is that the resources on the cloud can be accessed remotely through any browser-enabled device. Also, the client accessing the required cloud resources can be impervious to the settings and mechanism of the system which actually provides said resources. Even though there is the slight drawback of loss of management over the infrastructure/surroundings utilized by the users, the many benefits of cloud, including the conservation of memory and the minimization of costs, make cloud computing the new-age technology clamored for by everyone.

Consequently, security on the cloud can also be enhanced through the use of various innovative techniques, such as the use of aggregate keys. Cloud computing offers a varied range of services, some of the more common of which are Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). The types of cloud include public, private, hybrid, community and combined cloud computing. Fig. 1 displays the overall layout of cloud computing.

Fig. 1. Cloud Computing

B. COMPILER

For the purpose of converting source code to object code, compilers are used, where the source code is in a higher-level programming language and the object code is in a lower-order programming language. A compiler usually entails executing operations like lexical analysis, preprocessing, parsing, semantic analysis (syntax-directed translation), code generation and code optimization [17].

Structure of a Compiler [17]

The bridging of source programs in high-level languages with the underlying hardware is done by compilers. The work done by compilers is:
1. Verification of code syntax.
2. Generation of efficient object code.
3. Run-time organization.
4. Formatting the output according to linker and assembler conventions.

The compiler comprises:
• The Front End: It performs verification of syntax and semantics and generation of an intermediate representation of the source code for processing by the middle end. Type checking is also done by collecting type information. It produces faults and warnings, if any, in a constructive way.
• The Middle End: It performs:
1. Optimization, including removal of useless or unreachable code.
2. Discovery and propagation of constant values.
3. Relocation of computation to less frequently executed paths.
4. Generation of another intermediate representation for the back end.
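Two of the middle-end tasks named above, propagation of constant values and removal of useless code, can be seen in a toy example. The three-address instruction format `(dest, op, args)` here is invented for illustration and is not a real compiler IR:

```python
# Toy middle-end pass: propagate/fold constants, then drop useless code.
# The (dest, op, args) instruction format is an assumption for the example.

def optimize(instrs, live_out):
    consts = {}
    out = []
    for dest, op, args in instrs:
        # Constant propagation: substitute operands with known constants.
        args = [consts.get(a, a) for a in args]
        if op == "const":
            consts[dest] = args[0]               # remember the constant
        elif op == "add" and all(isinstance(a, int) for a in args):
            consts[dest] = args[0] + args[1]     # fold the addition
        else:
            out.append((dest, op, args))
    # Useless-code removal: keep only values that are live on exit.
    out = [(d, op, a) for d, op, a in out if d in live_out]
    for name in live_out:
        if name in consts:
            out.append((name, "const", [consts[name]]))
    return out

prog = [
    ("x", "const", [2]),
    ("y", "const", [3]),
    ("z", "add", ["x", "y"]),     # folds to the constant 5
    ("dead", "add", ["x", "x"]),  # never used afterwards: removed
]
optimized = optimize(prog, live_out={"z"})
print(optimized)  # [('z', 'const', [5])]
```

The four-instruction program collapses to a single constant definition: the addition is folded and the unused `dead` value disappears.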

• The Back End:
1. It generates assembly code, allocating registers in the process.
2. Heuristic techniques for NP-hard problems are used for optimization.
3. It helps the target code in optimally utilizing the hardware.

The existing system includes manually installing compilers on our computers. Apart from occupying more space on the system, there is at times the overhead of needing a separate editor for each compiler and of finding software compatible with the operating system on the machine. If the software is not compatible (for example, the older version of Turbo C being incompatible with Windows 8), we may need to take other measures like the use of DOSBox to run programs, or installing a virtual machine with an operating system like Windows XP, which can then compile and run the program once the software is installed in it. Additionally, there is the disadvantage of not being able to use the compiler remotely.

Our project, as stated before, includes embedding three major compilers in the cloud, hence the name cloud compiler, which not only enables remote access to the compiler, but after compiling the program (which can be created using the editor of one's own choice) also gives us the time and space complexity of the program. As a result, there is conservation of memory, platform independence and the much-needed portability that today's professionals on the move require. Fig. 2 provides an overview of the cloud compiler.

Fig. 2. Overview of Cloud Compiler

II. ONLINE CLOUD COMPILER

The Cloud Compiling™ Cloud Compiler™ (CC) is a family of cloud compiler licensed programs

for the IBM OS/390 and z/OS operating environments. A cloud compiler is a program that functions equivalently to an actual compiler but does not require that the actual compiler be installed or licensed on the machine on which it runs. The CC utilizes FTP (the File Transfer Protocol client of the IBM OS/390 or z/OS SecureWay Communications Server) to transmit the user's source code to another mainframe (on which the actual compiler is installed), compile it there, and return the output of the actual compiler (system messages, listing, object code, etc.) to the user's specified target datasets. Most of the options and features of an actual compiler are supported.

The principal function of our compiler in the cloud is undertaken in three very simple steps:
1. Choose File: you select the file to be uploaded from your device. The file should be the program that you want to compile, typed in the editor of your choice.
2. Load: this loads the selected file onto the cloud, where it is interpreted to categorize it into one of the three languages offered by our cloud compiler: C, Java or C#.
3. Compile: here the uploaded program is compiled using the appropriate compiler for the language of the program. If there are any errors, they are displayed in the error window along with the line numbers and details which a normal compiler would usually provide. If the program is error-free, the correct output is displayed in the output window.

Figs. 3, 4 and 5 display the user interface of the project, which exhibits the simple way in which one can make use of the project.

Fig. 3. Homepage GUI

Fig. 4. Output of uploaded program

Fig. 5. Errors displayed in Error Window

As opposed to any similar concept that may have been implemented before ours, we use the concept of Platform as a Service rather than Software as a Service. In the SaaS model, cloud providers install and operate application software in the cloud, and cloud users access the software from cloud clients. Cloud users do not manage the cloud infrastructure and platform where the application runs. This eliminates the need to install and run the application on the cloud user's own computers, which simplifies maintenance and support. Cloud applications differ from other applications in their scalability, which can be achieved by cloning tasks onto multiple virtual machines at run-time to meet changing work demand [15]. In the PaaS model, cloud providers deliver a computing platform, typically including an operating system, a programming language execution environment, a database, and a web server. Application developers can develop and run their software

solutions on a cloud platform without the cost and complexity of buying and managing the underlying hardware and software layers. With some PaaS offers like Windows Azure, the underlying computer and storage resources scale automatically to match application demand, so that the cloud user does not have to allocate resources manually. The latter has also been proposed by an architecture aiming to facilitate real-time applications in cloud environments [14].

The Platform as a Service offered by the cloud compiler is also advantageous in that, once registered as a user, the client can access the compilers indefinitely, as opposed to using software that is on lease and may expire after a certain period of time.

III. MICROSOFT .NET FRAMEWORK

The .NET Framework is a technology that supports building and running the next generation of applications and XML Web services. The .NET Framework is designed to fulfil the following objectives [16]:
• To provide a consistent object-oriented programming environment whether object code is stored and executed locally, executed locally but Internet-distributed, or executed remotely.
• To provide a code-execution environment that minimizes software deployment and versioning conflicts.
• To provide a code-execution environment that promotes safe execution of code, including code created by an unknown or semi-trusted third party.
• To provide a code-execution environment that eliminates the performance problems of scripted or interpreted environments.
• To make the developer experience consistent across widely varying types of applications, such as Windows-based applications and Web-based applications.

32 International Journal of Multidisciplinary Educational Research

IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014

 To build all communication on industry standards to ensure that code based on the .NET Framework can integrate with any other code.

IV. PROJECT ARCHITECTURE

The architecture used in the system is two-fold, i.e. an upper layer and a lower layer. The upper layer comprises the server, and the lower layer comprises clients of lower configuration.
Following are the key components of the upper layer:
1. A web framework, Visual Studio 2010: It handles the work of scripting and compilation of code.
2. Internet Information Services Server: It handles the client requests.
3. Cloud hard disk: It is a shared resource.
Our project includes the following:
1. Compiler Integration: This entails the integration of the C, C# and JAVA compilers in the cloud in our project.
2. Compiler embedding: It involves embedding the previously integrated compilers on the cloud.
3. Passing input: Sending the input from the user side to the cloud compiler for compiling.
4. Retrieving output: After recognition of the language on the cloud compiler side, the compiler compiles the given input program and sends the result back to the user.
The protocols used are the SOAP and WSDL protocols.
Fig. 6 gives an explicit representation of the architecture of the cloud compiler.

Fig. 6. Architecture of Cloud Compiler
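The compile workflow described in this section (recognize the language, dispatch to the embedded compiler, return output) can be sketched as a minimal dispatcher. This is an illustrative Python sketch, not the paper's ASP.NET implementation; the compiler command names (gcc, csc, javac) are our assumptions.

```python
# Illustrative sketch of the cloud-compiler dispatch: recognize the
# language of a submitted source file and build the compile command
# the cloud side would run. Compiler names are assumptions.
COMPILERS = {
    ".c": ["gcc"],        # C
    ".cs": ["csc"],       # C#
    ".java": ["javac"],   # Java
}

def recognize_language(filename):
    """'Recognition of language' step, keyed on the file extension."""
    for ext in COMPILERS:
        if filename.endswith(ext):
            return ext
    raise ValueError("unsupported source file: " + filename)

def build_compile_command(filename):
    """Return the command the cloud compiler side would execute."""
    ext = recognize_language(filename)
    return COMPILERS[ext] + [filename]

print(build_compile_command("Hello.java"))  # ['javac', 'Hello.java']
```

In the real system this command would be executed on the server and its output sent back over SOAP, per step 4 above.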



V. PROJECT IMPLEMENTATION

While developing the software it is of the essence to decide which programs will be executed on the server side and which on the client side. When the program to be transferred to the user is moderate in size, this approach is used: the application is run on the server and the data is transferred between client and server. With the online compiler, all the execution is done on the server side, so the information has to be sent to the server. After execution, the server sends the result back to the client that made the request.
The front end is designed to be as simple as possible, so that it loads quickly and is platform independent.
ASP.NET handles the communication between a user and the compiler. The server side therefore implements this part of the application using ASPX pages written in ASP.NET. File management, running the compiler and processing the compiler result are done by the script. A listing of the source code and a list of errors are sent as the result to the user.

VI. PROJECT USE

The main purpose of the creation of a project like this is to provide utmost convenience to students/professionals on the move, as mentioned before. It gives freedom from being restricted to older technologies just to use compilers which are unsupported by newer ones, as one can easily update one's system and still be able to compile programs with ease.
When it comes to the specifics of usage, however, we can recommend using our project in arenas like the conduct of University Online Practical Examinations, where instead of having to specifically check each system for the compiled program, using different editors and compilers to do so, the cloud provides a centralized system whereby we can easily compile


and run programs. The same goes for interviews of applicants for a technical job, where the company might need to test their coding skills in a Technical Aptitude round. Here, the cloud compiler will provide a simple and sophisticated way through which the coding prowess of the applicant can be evaluated by the personnel in charge, and the errors/output can easily be displayed without much hassle.

VII. CONCLUSION AND FUTURE SCOPE

By integrating and enhancing the capabilities of fundamental technologies, we are introducing the "Cloud Compiler" to contribute to the ease of remotely compiling programs without the overhead of manually installing compilers separately on the system.
As this would eradicate the need to install compilers separately, professionals/students need not visit a specific system but can check their code at the centralized server. Another advantage of this project is that upgrading the compiler package can be done easily, without installing it on each and every machine.
In future we hope to implement a major compiler embedding of almost all available compilers.

REFERENCES

[1] Vouk, M. A., "Cloud Computing – Issues, Research and Implementations", ITI 2008 - 30th International Conference on Information Technology Interfaces.
[2] Sweet, W. and Geppert, L., "http:// It has changed everything, especially our engineering thinking", IEEE Spectrum, January 1997, pp. 23-37.
[3] Camposano, R.; Deering, S.; DeMicheli, G.; Markov, L.; Mastellone, M.; Newton, A.R.; Rabaey, J.; Rowson, J., "What's ahead for Design on the Web", IEEE Spectrum, September 1998, pp. 53-63.
[4] Hank Shiffman, Making Sense of Java, http://www.disordered.org/Java-QA.html


[5] Hank Shiffman, Boosting Java Performance: Native Code and JIT Compilers, http://www.disordered.org/Java-JIT.html
[6] Gundavaram, S., CGI Programming on the World Wide Web, O'Reilly & Associates, Inc., 1996.
[7] Wall, L., Christiansen, T., Schwartz, R.L., Programming Perl, O'Reilly & Associates, Inc., 1996.
[8] Shufen Zhang, Shuai Zhang, Xuebin Chen, Shangzhuo, "Analysis and Research of Cloud Computing System Instance", Future Networks, 2010. ICFN '10. Second International Conference.
[9] Shuai Zhang, Shufen Zhang, Xuebin Chen, Xiuzhen Huo, "Cloud Computing Research and Development Trend", Future Networks, 2010. ICFN '10. Second International Conference.
[10] Grobauer, B., Walloschek, T., Stocker, E., "Understanding Cloud Computing Vulnerabilities", IEEE Security & Privacy, March-April 2011.
[11] Chunye Gong, Jie Liu, Qiang Zhang, Haitao Chen, Zhenghu Gong, "The Characteristics of Cloud Computing", Parallel Processing Workshops (ICPPW), 2010 39th International Conference.
[12] Junjie Peng, Xuejun Zhang, Zhou Lei, Bofeng Zhang, Wu Zhang, Qing Li, "Comparison of Several Cloud Computing Platforms", Information Science and Engineering (ISISE), 2009 Second International Symposium.
[13] Aamir Nizam Ansari, Siddharth Patil, Arundhati Navada, Aditya Peshave, Venkatesh Borole, "Online C/C++ Compiler using Cloud Computing", Multimedia Technology (ICMT), July 2011 International Conference, pp. 3591-3594.
[14] Platform-as-a-Service Architecture for Real-Time Quality of Service Management in Clouds.


[15] Hamdaqa Mohammad, A Reference Model for Developing Cloud Applications.
[16] http://msdn.microsoft.com/en-us/library/zw4w595w(v=vs.110).aspx
[17] Aho, Alfred V., Ravi Sethi, Ullman, Jeffrey D. (1986). Compilers: Principles, Techniques and Tools (1st ed.), Addison-Wesley. ISBN 9780201100884.


GSM NETWORKED DTMF BASED SMART PASSWORD ENTRY SYSTEM

1. Charushila Bhika Bachhav, SVIT, Nashik
2. Poonam Shivaji Bagul, SVIT, Nashik
3. Pradnya Somnath Sanap, SVIT, Nashik
Under the guidance of: S. A. Gade, SVIT, Nashik
[email protected]
[email protected]
[email protected]

Abstract— Most of the security problems met unexpectedly on the Internet are due to human faults. The first level of security leak occurs at the time of website development: a hacker could extract confidential information from the website itself if the website developer does not correctly plan or proof-test his scripts. Although some viruses might contain damaging programs for your computer, or even allow a distant user to take control of your computer, most of them will usually not affect your computer. These programs are known as "Trojan Horses".
Hacking is the most likely and unusual security problem in itself. True hacking usually means that the hacker had no or little information on his target and achieves most of the breakthrough with his own knowledge. General users are usually not the target of hackers; hackers will usually try to get through the security barrier of big organizations' Internet servers or try to hack website servers. They usually manage to do so by using software engineering faults that have yet to be fixed. At large, there is no particular way to help in this case unless you are the developer of that software.


To prevent this problem we developed a system which uses keyboard-less password entry, using a DTMF card through mobile phones.

Keywords- Decoder, DTMF, GSM, Microcontroller, Mobile phone.

I. INTRODUCTION

Today the most common Internet security problem is hacking. True hacking means that the hacker had only little or no information on his hacking goal, and he achieves most of the technological advance with the knowledge he already has. Generally, common users are not the target of hackers; hackers always try to hack big organizations' Internet servers or website servers. They use software engineering techniques that have yet to be fixed. For the past few months the problem of hacking has continuously been in the news. We propose a system to prevent the problem of hacking. The system employs keyboard-less password entry through mobile phones using a DTMF card. The aim of the proposed system is to develop a cost-effective solution that will provide security against attackers.
Generally, in every system we enter a password using the keyboard, through which an outsider can hack the password or access important data by hacking that system. When we press the keys on the keyboard, the keystrokes can be captured by a Trojan horse which has already entered the system through the network. It captures all keystrokes which we press and sends them to the other person. The Trojan horse is a small program that does not attempt to delete anything on the user's disk, but instead replicates itself on computers or networks. It enters the system silently during authentication. When we enter a password using the keyboard on a system, the Trojan horse can gather important data from the system and send it back to the outsider, who can misuse that data. In big organizations like NASA security is a must. To prevent those problems our


system works in which we use a DTMF card through mobile phones.

II. SYSTEM DEVELOPMENT

A. Block Diagram

Fig I: Pictorial representation of complete system

One mobile is in the administrator's hand; this is the sending phone. The receiver phone is attached to the DTMF card. The receiver phone is set to auto-answering mode, so when a call is made it automatically picks up the call. An earphone is permanently attached to the receiving phone, and the 8870 DTMF decoder is connected to the receiver.
After the call is answered by the receiving phone, the sending phone (that is, the administrator's phone) acts as a remote. As any key is pressed on the sending phone, the corresponding 4-bit output appears on the output pins of the 8870. This 4-bit output is displayed on the DTMF circuit and is converted into the particular decimal number when passed to the server.

B. Circuit Diagram

Fig II: Decoding of DTMF MT-8870 (DECODER)

Fig III: Microcontroller output control device

The circuit diagram explains the working of the system's mechanism. Here, a mobile phone is connected in the control unit with a headset. When a call is made, the mobile phone in the control unit is automatically


answered, after which the password is pressed. The decoder MT-8870 converts these DTMF tones (as shown in Fig II) [1]. The decoded output is then sent to the microcontroller, which issues the command to control the devices connected to it (as shown in Fig III) [1]. Switching of a device is performed by a relay.

C. Voice Message Circuit

Fig IV: Circuit diagram for voice message

The circuit diagram for the voice message units is shown in Fig IV. The numbers recorded on the SIM card of the mobile phone are called sequentially when the triggering signal is detected by the microcontroller from the scanned units, and the voice message unit is activated by the MC. The MC sends a deactivation signal when the recorded message has been played back. This operation continues in the same manner until the last call is performed. The speaker output of the ISD is connected to the cellular phone speaker, so that the recorded message is directly heard by the receiving end of the phone that has been called.

D. The DTMF Generation and Decoding

DTMF is a generic communication term used for touch tone. It is a registered trademark of AT&T. DTMF, i.e. Dual Tone Multi Frequency, is a popular method of signaling used between switching centers and telephones. DTMF is also used for signaling between computer networks and the telephone. When we press the digits of a mobile phone, DTMF tones are produced automatically, and these DTMF tones are different for every digit. Generally, mobile phones


have 12 keys: 1, 2, 3, ..., 9, 0, #, *. Each key on the mobile phone has a unique signal, called the DTMF signal. The extra keys A, B, C and D are not present on the cellphone, but these keys also have unique frequency pairs; they are special keys used for special purposes. When a call is connected, pressing any numeric key of the mobile phone generates the DTMF signal. These DTMF signals are audible to all of us. The DTMF tone for each key is a combination of two different frequencies; each key has a unique frequency pair associated with it and hence generates a unique tone, or, we can say, a unique DTMF signal. The DTMF tone for each key is the combination of one higher frequency and one lower frequency. For example, the DTMF tone for key 7 is the sum of two sinusoidal waves, of frequency 1209 Hz as the higher frequency and 852 Hz as the lower frequency.
The following table gives the frequency pairs [2] for key presses:

Mobile key   Higher frequency (Hz)   Lower frequency (Hz)
0            1336                    941
1            1209                    697
2            1336                    697
3            1477                    697
4            1209                    770
5            1336                    770
6            1477                    770
7            1209                    852
8            1336                    852
9            1477                    852
*            1209                    941
#            1477                    941
A            1633                    697
B            1633                    770
C            1633                    852
D            1633                    941

The above table shows the frequency pair of each key. Each key combines a unique pair of frequencies and hence generates a unique tone.

III. RESULT AND ANALYSIS
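As a first analysis step, the dual-tone construction described above can be verified in code. The following is an illustrative Python sketch of our own (not part of the paper's hardware): it builds the two-sinusoid signal for a key and confirms, via a simple Fourier correlation probe of our choosing, that both the row and column frequencies are present.

```python
import math

# DTMF frequency pairs (low/row, high/column) for a few keys,
# matching the table above.
DTMF = {"7": (852, 1209), "1": (697, 1209), "0": (941, 1336)}

RATE = 8000  # samples per second (an assumed sampling rate)

def dtmf_tone(key, duration=0.1):
    """Sum of two sinusoids: the low and high frequency of the key."""
    low, high = DTMF[key]
    n = int(RATE * duration)
    return [math.sin(2 * math.pi * low * i / RATE) +
            math.sin(2 * math.pi * high * i / RATE) for i in range(n)]

def power_at(signal, freq):
    """Magnitude of the signal's correlation with a probe frequency."""
    re = sum(s * math.cos(2 * math.pi * freq * i / RATE)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / RATE)
             for i, s in enumerate(signal))
    return math.hypot(re, im) / len(signal)

tone = dtmf_tone("7")
# The 852/1209 Hz pair dominates an unrelated probe frequency.
print(power_at(tone, 852) > 10 * power_at(tone, 1000))   # True
print(power_at(tone, 1209) > 10 * power_at(tone, 1000))  # True
```

A real decoder such as the MT8870 performs an equivalent frequency detection in hardware.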


In our application we use a DTMF receiver IC. The MT8870 is the most commonly used DTMF receiver IC in electronic communications circuits. The MT8870 is an 18-pin IC, used in telephones and a number of other applications. A quick test of the MT8870 can save a lot of time in the manufacturing industries of communication instruments and in research labs. Here is a small and easy tester circuit for the DTMF IC: it can be assembled on a multipurpose PCB with an 18-pin IC base, and the IC can also be tested on a simple breadboard.
For optimum working of telephone equipment, the DTMF receiver is designed to accept successive digit tone-pairs that are greater than 40 ms apart and to recognize a valid tone pair greater than 40 ms in duration. However, for other applications like radio communications and remote controls, the tone duration may change due to noise considerations. Therefore, by adding an extra steering diode and resistor, the tone duration can be set to different values.
This circuit is configured in balanced-line mode. A balanced differential amplifier input is used to reject common-mode noise signals. The circuit also provides an excellent bridging interface across a properly terminated telephone line. Transient protection is achieved by splitting the input resistors and inserting zener diodes (ZD1 and ZD2) to achieve voltage clamping. This allows the transient energy to be dissipated in the diodes and resistors, and limits the maximum voltage that may appear at the inputs.
Whenever you press any key on your local telephone keypad, on receiving the tone-pair the delayed steering (Std) output of the IC goes high, causing LED5 (connected to pin 15 of the IC via resistor R15) to glow; how long it stays high depends on the values of the capacitor and resistors at pins 16 and 17.
The optional circuit shown


within the dotted line is used for guard time adjustment. The LEDs connected via resistors R11 to R14 at pins 11 through 14, respectively, indicate the output of the IC. The DTMF tone-pair generated by pressing a telephone button is converted into a binary value internally in the IC. The binary values are indicated with the help of the LEDs at the output pins of the IC. LED4 represents the most significant bit (MSB) and LED1 represents the least significant bit (LSB).

A. Functional Table

The following table shows the decoding of DTMF tones into a decimal number:

FLOW (Hz)  FHIGH (Hz)  Key  TOW  Q4 Q3 Q2 Q1  Decimal No.
697        1209        1    H    0  0  0  1   247
697        1336        2    H    0  0  1  0   251
697        1477        3    H    0  0  1  1   243
770        1209        4    H    0  1  0  0   253
770        1336        5    H    0  1  0  1   245
770        1477        6    H    0  1  1  0   249
852        1209        7    H    0  1  1  1   241
852        1336        8    H    1  0  0  0   254
852        1477        9    H    1  0  0  1   246
941        1336        0    H    1  0  1  0   250

IV. FUTURE SCOPE

1. Missile firing.
2. Home security systems.
3. Mobile / wireless robot control.
4. Wireless radio control.

V. CONCLUSION

1. The main purpose of our project is to perform virtual password insertion for any system, so we can choose a secure way to enter a password using a DTMF card through mobile phones over the GSM network.
2. The usual way to solve the problem of hacking is to make password entry use a virtual password entry system, to plan better when coding your website, and to further test your scripts, especially those dealing with private data.
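The functional table in Section III above maps each decoded key to the 4-bit output Q4..Q1 that the microcontroller reads. As a closing illustration, this small Python sketch (our own, not firmware from the paper) mirrors those rows:

```python
# 4-bit MT8870 output codes (Q4 Q3 Q2 Q1) for decoded keys,
# taken from the functional table above.
MT8870_CODE = {
    "1": 0b0001, "2": 0b0010, "3": 0b0011,
    "4": 0b0100, "5": 0b0101, "6": 0b0110,
    "7": 0b0111, "8": 0b1000, "9": 0b1001,
    "0": 0b1010,
}

def q_bits(key):
    """Return (Q4, Q3, Q2, Q1) as the microcontroller would read them."""
    code = MT8870_CODE[key]
    return tuple((code >> shift) & 1 for shift in (3, 2, 1, 0))

print(q_bits("7"))  # (0, 1, 1, 1)
print(q_bits("0"))  # (1, 0, 1, 0)
```

In the real system these four bits appear on the decoder's output pins and are read by the microcontroller, which then switches the relay-controlled devices.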


REFERENCES

[1] Coskun and H. Ardam, "A Remote Controller for Home and Office Appliances by Telephone", IEEE Trans. Consumer Electron., Vol. 44, No. 4, pp. 1291-1297, November 1998.
[2] Tuljappa M Ladwa, Sanjay M Ladwa, R Sudharshan Kaarthik, Alok Ranjan Dhara, Nayan Dalei, "Control of Remote Domestic System Using DTMF", ICICI-BME 2009, Bandung, Indonesia.
[3] Nehchal Jindal, "Wireless Control via Mobile Communication", IIT Kanpur, 2010.


AUDITING PROTOCOLS: A NEW APPROACH FOR SECURITY OF CLOUD DATA

Sonali Pardeshi1, Ankita Rathi2, Shalini Shejwal3, Pooja Kuyate4
1M.E. Student of Computer, Pune   2Assistant Professor, SVIT, Nashik
3M.E. Student of Computer, Pune   4M.E. Student of Computer, Pune

Abstract- Cloud computing is the long-dreamed vision of computing as a utility, where data owners can remotely store their data in the cloud to enjoy on-demand high-quality applications and services from a shared pool of computing resources. Data integrity protection in cloud computing is a mandatory task, as users no longer have physical possession of the outsourced data; users should be able to just use the cloud storage as if it were local, without worrying about the need to verify its integrity. To check the integrity of outsourced data, users can resort to a third-party auditor (TPA). Enabling a public auditability process using a TPA should not burden the user with additional online work, and the data privacy of the user should not be made vulnerable. With public auditability, a trusted entity with expertise and capability the data owner does not possess can be delegated as an external audit party to assess the risk of the outsourced data when needed. Such an auditing system helps save the data owner's computational resources and provides a cost-effective method to gain trust in the cloud. We describe approaches and system requirements that should be brought into consideration


for such publicly auditable cloud storage, and also show how the TPA can perform audits for multiple users simultaneously and efficiently.

Keywords- data storage, public auditability, cloud storage, batch verification.

I. INTRODUCTION

New computing paradigms keep emerging. One notable example is the cloud computing paradigm, a new economic computing model made possible by advances in networking technology, where a client can leverage a service provider's computing, storage or networking infrastructure. With the unprecedented exponential growth rate of information, there is an increasing demand for outsourcing data storage to cloud services such as Microsoft's Azure and Amazon's S3, which assist in the strategic management of corporate data. Storing data remotely in the cloud in a flexible, on-demand manner brings appealing benefits: relief of the burden of storage management, universal data access from independent geographical locations, and avoidance of capital expenditure on hardware, software, and personnel maintenance, etc. As a disruptive technology with profound implications, cloud computing is transforming the very nature of how businesses use information technology.
While the cloud makes these advantages more beneficial, it also brings new and challenging threats towards users' outsourced data. Basically, cloud service providers (CSPs) are separate administrative entities, so users ultimately relinquish control over the fate of their data, which puts the correctness of the data at risk. Even though clouds have powerful infrastructures compared to personal computing devices, they still face a broad range of both internal and external threats to data integrity.
Moreover, a CSP might behave unfaithfully toward the cloud users regarding their outsourced data status. It can reclaim storage for monetary


reasons by discarding data that have not been or are rarely accessed, or even hide data loss incidents to maintain a reputation. In conclusion, although the cloud is economically attractive for long-term large-scale storage, it does not immediately offer any guarantee of data integrity and availability. This problem, if not properly addressed, may impede the success of the cloud architecture.
As users no longer physically possess the storage of their data, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted. In particular, simply downloading all the data for integrity verification is not a practical solution, due to the expense of I/O and transmission cost across the network. It is also insufficient to detect data corruption only when accessing the data, as this gives no assurance for the unaccessed data. Considering the large size of the outsourced data and the user's constrained resource capability, the task of auditing the data correctness in a cloud environment can be formidable and expensive for the cloud users. Moreover, the overhead of using cloud storage should be minimized as much as possible, such that a user does not need to perform too many operations to use the data. Users are also reluctant to go through a complex process of verifying data integrity. Consider the example of an enterprise where more than one user may access the same cloud storage; it is then desirable that the cloud only entertain verification requests from a single designated party.
In order to solve the problem of data integrity checking, many schemes have been proposed under different system and security models. In all these works, great efforts are made to design solutions that meet various requirements: high scheme efficiency, stateless verification, unbounded use of queries, retrievability of data, etc.
Considering the role of the verifier in the model, all the schemes presented before fall into two categories: private


auditability and public auditability. Although schemes with private auditability can achieve higher scheme efficiency, public auditability allows anyone, not just the client (data owner), to challenge the cloud server on the correctness of data storage while keeping no private information. To enable a public auditing service for cloud storage, users resort to an independent Third Party Auditor (TPA) to audit the outsourced data when required. The TPA, who has expertise and capabilities that users do not, can periodically check the integrity of all the data stored in the cloud on behalf of the users, which provides a much easier and more affordable way for the users to ensure their storage correctness in the cloud. The TPA helps users evaluate the risk of their subscribed cloud data services, and the audit results from the TPA are also beneficial for the CSPs to improve their service platforms. Public auditability allows an external party, in addition to the user himself, to verify the correctness of the remotely stored data. However, such schemes may potentially reveal user data to auditors and do not consider privacy protection of the data. This is a severe drawback which greatly hampers the security protocol in cloud computing. Users who own the data and rely on the TPA just for the storage security of their data do not wish to go through this auditing process if it introduces new vulnerabilities of unauthorized information leakage toward their data security.
Simply applying data encryption before outsourcing could be one way to mitigate this privacy concern of data auditing, but it could also be overkill when employed in the case of unencrypted/public cloud data. Besides, encryption does not completely solve the problem of protecting data privacy against third-party auditing; it just reduces the problem to the complex key management domain. Unauthorized data leakage still remains possible due to the potential exposure of decryption keys.
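As noted earlier, downloading all the data for verification is impractical, which is why auditing schemes of this kind typically challenge only a random sample of blocks per audit. The following Python sketch is our own illustration (the sampling model is an assumption, not a protocol from this paper): it computes the probability that checking c of n blocks catches a corruption affecting t blocks.

```python
# Probability that auditing c randomly chosen blocks out of n detects
# a corruption affecting t blocks (sampling without replacement).
# The sampling model is an illustrative assumption, not this paper's
# protocol.
def detection_probability(n, t, c):
    miss = 1.0  # probability that every challenged block is intact
    for i in range(c):
        miss *= (n - t - i) / (n - i)
    return 1.0 - miss

# Challenging ~460 of 10,000 blocks catches a 1% corruption
# with probability above 0.99.
print(detection_probability(10000, 100, 460) > 0.99)  # True
```

This is why a TPA can audit cheaply: the challenge size needed for high detection probability grows with the corruption fraction, not with the total file size.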



Firstly, in this paper we are going to tackle the problem of how to enable a privacy-preserving TPA protocol, independent of data encryption. Secondly, the individual auditing of a growing number of tasks is tedious and hectic, so we look to enable the TPA to efficiently perform multiple auditing tasks in a batch manner, i.e., simultaneously.
To address these problems, our work utilizes the technique of the public-key-based homomorphic linear authenticator (HLA for short) [9], [13], [8], which enables the TPA to perform the auditing without demanding a local copy of the data. This approach automatically reduces the communication and computation overhead. By integrating the HLA with random masking, our protocol guarantees that the TPA cannot learn any knowledge about the data content stored in the cloud server (CS) during the auditing process.
Specifically, our paper's contributions are the following:
1) Motivating a public auditing system for cloud data storage.
2) Providing a privacy-preserving auditing protocol.
3) Auditing the cloud data without learning the data content.
4) Achieving batch auditing in a privacy-preserving manner.

2 PROBLEM STATEMENT

We begin with the high-level architecture of cloud data storage, as shown in Figure 1.

Figure 1: The architecture of cloud data storage services.

The architecture consists of four different entities: the data owner, the user, the cloud server (CS), and the TPA. The data owner is the one who has a large amount of data. The cloud user is the one who


has to store a large amount of data in the cloud. The cloud server is managed by the CSP to provide data storage services and has significant storage to use. The TPA has expertise and capability that cloud users do not have, and is trusted to assess the cloud storage service on behalf of the users. The TPA provides a transparent yet cost-effective method for establishing trust between the data owner and the cloud server. Users rely on the CS for cloud data storage and maintenance. They may also dynamically interact with the CS to access and update their stored data for various application purposes. It is important to assure the users that their data are correctly stored and maintained. Cloud users may use the TPA for ensuring the integrity of the outsourced data, as that saves computation resources while still bringing about periodic storage verification.
The data integrity threats come from both internal and external attacks at the cloud server. These threats include software bugs, hardware failures, bugs in the network path, economically motivated hackers, malicious or accidental management errors, etc. The cloud server may, for its own benefit or to maintain reputation, even decide to hide these data corruption incidents from users. The TPA is reliable and independent and provides a cost-effective method to users. However, it may harm the user if the TPA could learn the outsourced data after the audit.
In this model, beyond users' reluctance to leak data to the TPA, we also assume that cloud servers have no incentives to reveal their hosted data to external parties. On the one hand, there are regulations, e.g., HIPAA [16], requesting the CS to maintain users' data privacy. On the other hand, as users' data belong to their business assets [10], there also exist financial incentives for the CS to protect the data from any external parties. Therefore, we assume that neither the CS nor the TPA has motivations to collude with each other during the auditing process. In other words, neither


of the entities will deviate from the prescribed protocol execution in the following presentation.

3 THE PROPOSED SCHEMES

This paper presents a solution for outsourcing data that also checks the integrity of the data. First we give the notation and a preliminaries overview, then present our main scheme and show how to extend it to support batch auditing for the TPA upon delegations from multiple users.

3.1 Notation

F—the data file to be outsourced, denoted as a sequence of n blocks m1, ..., mi, ..., mn ∈ Zp for some large prime p.
MAC(.)—message authentication code (MAC) function, defined as: K × {0,1}* → {0,1}*, where K denotes the key space.
H(.), h(.)—cryptographic hash functions.
We now introduce some necessary cryptographic background for our proposed scheme.
Bilinear Map. Let G1, G2, and GT be multiplicative cyclic groups of prime order p. Let g1 and g2 be generators of G1 and G2, respectively. A bilinear map is a map e : G1 × G2 → GT such that for all u ∈ G1, v ∈ G2 and a, b ∈ Zp, e(u^a, v^b) = e(u, v)^(ab). This bilinearity implies that for any u1, u2 ∈ G1, v ∈ G2, e(u1·u2, v) = e(u1, v)·e(u2, v). Of course, there exists an efficiently computable algorithm for computing e, and the map should be nontrivial, i.e., e is nondegenerate: e(g1, g2) ≠ 1.
Definitions. We follow a similar definition to previously proposed schemes in the context of remote data integrity checking [9], [11], [13] and adapt the framework for our privacy-preserving public auditing system.
This system consists of four algorithms:
1. KeyGen: a key generation algorithm that is run by the user to set up the scheme.
2. SigGen: it consists of digital


signatures.
3. GenProof: run by the cloud server to generate a proof of correctness of data storage.
4. VerifyProof: run by the TPA to audit the proof.

The auditing is done in two phases:
Setup: consists of KeyGen and SigGen, which preprocess the data file F and produce the verification metadata. The user then keeps the file F at the cloud server for later verification of the data and deletes the local copy.
Audit: the TPA audits to make sure that the CS has retained the integrity of the data file F, using GenProof and VerifyProof in a challenge-response message exchange. The TPA is stateless, i.e., the TPA does not need to maintain and update state between audits [13]. We can easily extend the framework to capture a stateful auditing system, by verifying data in two parts which are stored by the TPA and the cloud server.

3.3 The Basic Schemes

First we study two classes of schemes. The first is a MAC-based solution, which suffers from undesirable systematic demerits. The second is a system based on homomorphic linear authenticators, which covers many recent proof-of-storage systems.

MAC-based solution. There are two possible ways to make use of MACs to authenticate the data. A trivial way is just to upload the data blocks with their MACs to the server and send the corresponding secret key sk to the TPA. Before data outsourcing, the cloud user chooses s random message authentication code keys {sk_t}, 1 ≤ t ≤ s, precomputes the MACs {MAC_sk_t(F)}, 1 ≤ t ≤ s, and publishes these verification metadata (the keys and the MACs) to the TPA. In each audit, the TPA can reveal a secret key sk_t to the cloud server and ask for a fresh keyed MAC for comparison. This is privacy preserving as long as it is impossible to recover F in full given MAC_sk_t(F) and sk_t.

Basic Scheme 1:
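As an illustration, this MAC-based audit can be sketched in a few lines of Python, assuming HMAC-SHA256 as the MAC function and illustrative names for the helpers (the paper does not fix a concrete MAC):

```python
# Sketch of the MAC-based audit (Basic Scheme 1). HMAC-SHA256 is an assumed
# instantiation of MAC(.); key and function names are illustrative.
import hashlib
import hmac
import secrets

def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

# Setup: the user precomputes s MACs of F under s random keys and
# publishes the keys and MACs to the TPA as verification metadata.
F = b"contents of the outsourced file F"
s = 4
keys = [secrets.token_bytes(16) for _ in range(s)]
metadata = [mac(k, F) for k in keys]

# Audit t: the TPA reveals key sk_t to the server and asks for a fresh
# keyed MAC over the server's copy of F, then compares.
def audit(t: int, server_copy: bytes) -> bool:
    fresh = mac(keys[t], server_copy)        # computed by the cloud server
    return hmac.compare_digest(fresh, metadata[t])

assert audit(0, F) is True                   # intact copy passes
assert audit(1, F + b"tampered") is False    # corruption is detected
```

Once sk_t has been revealed to the server it can no longer be trusted for auditing, so at most s audits are possible before the metadata must be regenerated — one of the systematic demerits mentioned above.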


The file is divided into blocks. (The figures for the basic schemes are omitted in the original.)

Basic Scheme 2: (figure omitted in the original)

HLA-based solution. To effectively support public auditability without having to retrieve the data blocks themselves, the HLA technique [9], [13], [8] can be used. HLAs, like MACs, are unforgeable verification metadata that authenticate the integrity of a data block. The difference is that HLAs can be aggregated: it is possible to compute an aggregated HLA which authenticates a linear combination of the individual data blocks.

3.4 Privacy-Preserving Public Auditing Scheme

Overview. To achieve privacy-preserving public auditing, we propose to uniquely integrate the homomorphic linear authenticator with a random masking technique. In our protocol, the linear combination of sampled blocks in the server's response is masked with randomness generated by the server. With random masking, the TPA no longer has all the necessary information to build up a correct group of linear equations and therefore cannot derive the user's data content, no matter how many linear combinations of the same set of file blocks are collected. On the other hand, the correctness validation of the block-authenticator pairs can still be carried out in a new way, which will be shown shortly, even in the presence of the randomness. Our design makes use of a public-key-based HLA to equip the auditing protocol with public auditability. Specifically, we use the HLA proposed in [13], which is based on the short signature scheme proposed by Boneh, Lynn, and Shacham (hereinafter
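The aggregation property can be illustrated with a deliberately insecure toy tag t_i = x·m_i mod p (a real HLA hides the secret inside group exponents, as in the BLS-based construction this scheme uses); all numbers below are illustrative:

```python
# Toy (insecure) linear authenticator showing why HLAs aggregate: a combined
# tag authenticates a linear combination of blocks without the verifier ever
# seeing the blocks individually. p, x, m, and v are illustrative values.
p, x = 7919, 1337                  # toy prime and secret
m = [5, 17, 42, 99]                # data blocks
tags = [x * mi % p for mi in m]    # per-block "authenticators" t_i = x*m_i

v = [3, 1, 4, 1]                   # challenge coefficients
mu = sum(vi * mi for vi, mi in zip(v, m)) % p          # combined blocks
sigma = sum(vi * ti for vi, ti in zip(v, tags)) % p    # aggregated tag

assert sigma == x * mu % p         # the aggregate authenticates mu
```

The verifier only ever handles mu and sigma, whose sizes do not grow with the number of blocks combined — the key property the audit protocol relies on.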


referred to as BLS signatures).

Scheme details. Let G1, G2, and GT be multiplicative cyclic groups of prime order p, and e : G1 × G2 → GT be a bilinear map as introduced in the preliminaries. Let g be a generator of G2. H(.) is a secure map-to-point hash function {0,1}* → G1, which maps strings uniformly to G1. Another hash function h(.) : GT → Zp maps group elements of GT uniformly to Zp. The scheme is as follows:

Setup Phase: The cloud user runs KeyGen to generate the public and secret parameters. Specifically, the user chooses a random signing key pair (spk, ssk), a random element x ← Zp, a random element u ← G1, and computes v ← g^x. The secret parameter is sk = (x, ssk) and the public parameter is pk = (spk, v, g, u, e(u, v)).

Given a data file F = {mi}, the user runs SigGen to compute an authenticator σi ← (H(Wi)·u^mi)^x ∈ G1 for each i. Here, Wi = name||i, and name is chosen by the user uniformly at random from Zp as the identifier of file F. Denote the set of authenticators by Φ = {σi}, 1 ≤ i ≤ n.

The last part of SigGen is for ensuring the integrity of the unique file identifier name. One simple way to do this is to compute t = name||SSig_ssk(name) as the file tag for F, where SSig_ssk(name) is the signature on name under the private key ssk. For simplicity, we assume the TPA knows the number of blocks n. The user then sends F along with the verification metadata (Φ, t) to the server and deletes them from local storage.

Audit Phase: The TPA first retrieves the file tag t. With respect to the mechanism we describe in the Setup phase, the TPA verifies the signature SSig_ssk(name) via spk, and quits by emitting FALSE if the verification fails. Otherwise, the TPA recovers name.

Now let us see the "core" part of the auditing process. To generate the challenge message for the audit "chal," the TPA


picks a random c-element subset I = {s1, ..., sc} of the block indices. For each element i ∈ I, the TPA also chooses a random value vi. The message "chal" specifies the positions of the blocks required to be checked. The TPA sends chal = {(i, vi)} i∈I to the server.

Upon receiving the challenge chal = {(i, vi)} i∈I, the server runs GenProof to generate a response proof of data storage correctness. Specifically, the server chooses a random element r ← Zp and calculates R = e(u, v)^r ∈ GT. Let µ′ denote the linear combination of sampled blocks specified in chal: µ′ = Σ i∈I vi·mi. To blind µ′ with r, the server computes µ = r + γµ′ mod p, where γ = h(R) ∈ Zp. Meanwhile, the server calculates an aggregate authenticator σ = Π i∈I σi^vi ∈ G1. It then sends {µ, σ, R} as the response proof of storage correctness to the TPA. With the response, the TPA runs VerifyProof to validate it, first computing γ = h(R) and then checking the verification equation

R · e(σ^γ, g) = e((Π i∈I H(Wi)^vi)^γ · u^µ, v).

The correctness of the equation is elaborated as follows: (the step-by-step derivation appears as a figure in the original).

Properties of our protocol. It is easy to see that our protocol achieves public auditability: there is no secret keying material or state for the TPA to keep or maintain between audits, and the auditing protocol does not pose any potential online burden on users. The approach ensures the privacy of user data content during the auditing process by employing the random mask r to hide µ′, a linear combination of the data blocks. Note that the value R in our protocol, which enables the privacy-preserving guarantee, will not affect the validity of the verification equation, due to the circular relationship between R and γ in γ = h(R) and the verification equation. Storage correctness thus follows from that of the underlying
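The round above can be simulated numerically by tracking each group element through its discrete-log exponent modulo a toy group order q, so both sides of the verification equation reduce to integers modulo q. The value of q, the stand-in hash h(.), and all sizes are assumptions for illustration, not a secure instantiation:

```python
# Toy simulation of the challenge-response audit: every group element is
# represented by its exponent mod the toy group order q, so the pairing
# equation becomes an equality of integers. Illustrative only.
import random
random.seed(7)

q = 7919                                       # toy prime group order
n = 8                                          # number of file blocks

def h(R):                                      # stand-in for h : GT -> Zp
    return (31 * R + 7) % q or 1               # avoid a degenerate zero mask

# Setup (exponents): v = g^x, u = g1^beta, H(W_i) = g1^hW[i],
# sigma_i = (H(W_i) * u^m_i)^x  ->  exponent x * (hW[i] + beta * m[i])
x, beta = random.randrange(1, q), random.randrange(1, q)
m = [random.randrange(q) for _ in range(n)]
hW = [random.randrange(1, q) for _ in range(n)]
sigma = [x * (hW[i] + beta * m[i]) % q for i in range(n)]

# TPA challenge: random c-element subset I with coefficients v_i
I = random.sample(range(n), 4)
v = {i: random.randrange(1, q) for i in I}

# Server's GenProof: R = e(u,v)^r, gamma = h(R), mu = r + gamma*mu' mod q
r = random.randrange(1, q)
R = beta * x * r % q                           # exponent of R in base e(g1,g2)
gamma = h(R)
mu = (r + gamma * sum(v[i] * m[i] for i in I)) % q
sigma_agg = sum(v[i] * sigma[i] for i in I) % q

# TPA's VerifyProof: R * e(sigma^gamma, g) == e((prod H(W_i)^v_i)^gamma * u^mu, v)
lhs = (R + gamma * sigma_agg) % q
rhs = x * (gamma * sum(v[i] * hW[i] for i in I) + beta * mu) % q
assert lhs == rhs                              # an honest server always passes

# Privacy of the masking: every candidate mu' is explained by *some* mask r
for cand in (0, 1, 4242):
    r_cand = (mu - gamma * cand) % q           # a mask consistent with cand
    assert (r_cand + gamma * cand) % q == mu
```

Because µ = r + γµ′ mod p, any candidate value of µ′ is consistent with the observed µ for some mask r, which is exactly the privacy guarantee discussed under "Properties of our protocol."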


protocol [13]. The security of this protocol will be formally proven in Section 4. Besides, the HLA helps achieve constant communication overhead for the server's response during the audit: the size of {µ, σ, R} is independent of the number of sampled blocks c.

Previous work [9], [8] showed that if the server is missing a fraction of the data, then the number of blocks that needs to be checked in order to detect server misbehavior with high probability is in the order of O(1). In particular, if a fraction t of the data is corrupted, then randomly sampling c blocks reaches a detection probability P = 1 − (1 − t)^c. Here, every block is chosen uniformly at random. When t = 1% of the data F is corrupted, the TPA only needs to audit c = 300 or 460 randomly chosen blocks of F to detect this misbehavior with probability larger than 95 and 99 percent, respectively. Given the huge volume of data outsourced to the cloud, checking a portion of the data file is more affordable and practical for both the TPA and the cloud server than checking all the data, as long as the sampling strategy provides high-probability assurance. In Section 4, we will present experiment results based on these sampling strategies.

For some cloud storage providers, it is possible that certain information dispersal algorithms (IDA) may be used to fragment and geographically distribute the user's outsourced data for increased availability. We note that these cloud-side operations would not affect the behavior of our proposed mechanism, as long as the IDA is systematic, i.e., it preserves the user's data in its original form after encoding with redundancy. This is because, from the user's perspective, as long as there is a complete yet unchanged copy of his outsourced data in the cloud, the precomputed verification metadata (Φ, t) will remain valid. As a result, those metadata can still be utilized in our auditing mechanism to guarantee the correctness of the user's outsourced cloud data.
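The quoted figures can be checked directly from the sampling formula P = 1 − (1 − t)^c:

```python
# Detection probability of random sampling: P = 1 - (1 - t)^c, where t is the
# corrupted fraction and c the number of uniformly sampled blocks.
def detection_probability(t: float, c: int) -> float:
    return 1.0 - (1.0 - t) ** c

print(round(detection_probability(0.01, 300), 3))   # 0.951
print(round(detection_probability(0.01, 460), 3))   # 0.990
```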


Storage and communication tradeoff. As described above, each block is accompanied by an authenticator of equal size, |p| bits. This gives about 2× storage overhead on the server. However, as noted in [13], we can introduce a parameter s in the authenticator construction to adjust this storage overhead, at the cost of communication overhead in the auditing protocol between the TPA and the cloud server. In particular, we assume each block mi consists of s sectors {mij} with 1 ≤ j ≤ s, where u1, u2, ..., us are randomly chosen from G1.

VI. CONCLUSION

In this paper, we propose a privacy-preserving public auditing system for data storage security in cloud computing. We utilize the homomorphic linear authenticator and random masking to guarantee that the TPA would not learn any knowledge about the data content stored on the cloud server during the efficient auditing process. This not only relieves the cloud user of the tedious and possibly expensive auditing task, but also alleviates the users' fear of their outsourced data leaking. Considering that the TPA may concurrently handle multiple audit sessions from different users for their outsourced data files, we further extend our privacy-preserving public auditing protocol into a multiuser setting, where the TPA can perform multiple auditing tasks in a batch manner for better efficiency. Extensive analysis shows that our schemes are provably secure and highly efficient. Our preliminary experiment conducted on an Amazon EC2 instance further demonstrates the fast performance of our design on both the cloud and the auditor side. We leave the full-fledged implementation of the mechanism on a commercial public cloud as an important future extension, which is expected to robustly cope with very large scale data and thus encourage users to adopt cloud storage services more confidently.

REFERENCES


[1] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Storage Security in Cloud Computing," Proc. IEEE INFOCOM '10, Mar. 2010.
[2] P. Mell and T. Grance, "Draft NIST Working Definition of Cloud Computing," http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, June 2009.
[3] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A. Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing," Technical Report UCB-EECS-2009-28, Univ. of California, Berkeley, Feb. 2009.
[4] Cloud Security Alliance, "Top Threats to Cloud Computing," http://www.cloudsecurityalliance.org, 2010.
[5] M. Arrington, "Gmail Disaster: Reports of Mass Email Deletions," http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/, 2006.
[6] J. Kincaid, "MediaMax/TheLinkup Closes Its Doors," http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.
[7] Amazon.com, "Amazon S3 Availability Event: July 20, 2008," http://status.aws.amazon.com/s3-20080720.html, July 2008.
[8] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, "Enabling Public Auditability and Data Dynamics for Storage Security in Cloud Computing," IEEE Trans. Parallel and Distributed Systems, vol. 22, no. 5, pp. 847-859, May 2011.
[9] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores," Proc. 14th ACM Conf. Computer and Comm. Security (CCS '07), pp. 598-609, 2007.
[10] M.A. Shah, R. Swaminathan, and M. Baker, "Privacy-Preserving Audit and Extraction of Digital Contents," Cryptology ePrint Archive, Report 2008/186, 2008.
[11] A. Juels and B.S. Kaliski Jr., "PORs: Proofs of Retrievability for Large Files," Proc. ACM Conf. Computer and Comm. Security (CCS '07), pp. 584-597, Oct. 2007.
[12] Cloud Security Alliance, "Security Guidance for Critical Areas of Focus in Cloud Computing," http://www.cloudsecurityalliance.org, 2009.
[13] H. Shacham and B. Waters, "Compact Proofs of Retrievability," Proc. Int'l Conf. Theory and Application of Cryptology and Information Security: Advances in Cryptology (Asiacrypt), vol. 5350, pp. 90-107, Dec. 2008.
[14] C. Wang, K. Ren, W. Lou, and J. Li, "Towards Publicly Auditable Secure Cloud Data Storage Services," IEEE Network Magazine, vol. 24, no. 4, pp. 19-24, July/Aug. 2010.
[15] M.A. Shah, M. Baker, J.C. Mogul, and R. Swaminathan, "Auditing to Keep Online Storage Services Honest," Proc. 11th USENIX Workshop Hot Topics in Operating Systems (HotOS '07), pp. 1-6, 2007.
[16] 104th United States Congress, "Health Insurance Portability and Accountability Act of 1996 (HIPAA)," http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996.
[17] R. Curtmola, O. Khan, and R. Burns, "Robust Remote Data Checking," Proc. Fourth ACM Int'l Workshop Storage Security and Survivability (StorageSS '08), pp. 63-68, 2008.


An Extraction Technique for Universal Distance Cache

Chotia Amit Nandkishor#1 Joshi Abhishek Arvind#2 Gosavi Darpan Vinod#3 Wagh Ganesh Vijay#4 Sir Visvesvaraya Institute of Technology, Nashik#1, 2, 3, 4 Bachelor of Engineering in Computer Engineering#1, 2, 3, 4 [email protected]#1 [email protected]#2 [email protected]#3 [email protected]#4

1. Abstract

Due to the fast internet, a huge amount of data is uploaded to data repositories, and hence extracting exactly the right data is a challenging task today. Another important issue is that extraction quality degrades as distance increases. As we know, a server always tries to extract data that is near to it, but on the world wide web, when a user searches for data from a different location, such as the USA, there is a chance that the data will not be extracted properly. Here, we are going to develop a technique for which distance will not matter.

Performance is also a big factor while extracting data. What do we do today? We simply add disks so that loading can be minimized, and then we use the concept of clustering, in which we build a number of clusters to reduce the load. Another concept is RAID (Redundant Array of Independent Disks). RAID can be used according to the requirements of the server, but all these concepts/techniques require more cost and more maintenance, which is why a small firm cannot maintain them. Our project will remove all these dependencies and will increase performance automatically.


We introduce a new extraction approach with distance caching, called disk-based caches. The user enters a distance range search and finds the results. Here, we are going to use a parsing technique, which can extract the results from the desired caches and distances.

2. Introduction

Classical database methods are designed to handle data objects that have some predefined structure. This structure is usually captured by treating the various attributes associated with the objects as independent dimensions, and then representing the objects as records. These records are stored in the database using some appropriate model (e.g., relational, object-oriented, object-relational, hierarchical, network, etc.). The most common queries on such data are exact match, partial match, range, and join, applied to some or all of the attributes. Responding to these queries involves retrieving the relevant data. The retrieval process is facilitated by building an index on the relevant attributes. These indexes are often based on treating the records as points in a multidimensional space and using what are called point access methods.

More recent applications involve data that has considerably less structure and whose specification is therefore less precise. Some example applications include collections of more complex data such as images, videos, audio recordings, text documents, time series, DNA sequences, etc. The problem is that usually the data can neither be ordered nor is it meaningful to perform equality comparisons on it. Instead, proximity is a more appropriate retrieval criterion. Such data objects are often described via a collection of features, and the result is called a feature vector. For example, in the case of image data, the feature vector might include color, color moments, textures, shape descriptors, etc., all of which are usually described using scalar values. In the case of


text documents, we might have one dimension per word, which leads to prohibitively high dimensions. Correcting misspelled text or searching for semantic equivalents is even more difficult. Video retrieval involves finding overlapping frames, which is somewhat like finding subsequences in DNA sequences. The goal in these applications is often one of the following:
1. Find objects whose feature values fall within a given range, or where the distance, using a suitably defined distance metric, from some query object falls into a certain range.
2. Find objects whose features have values similar to those of a given query object or set of query objects (nearest neighbor queries). In order to reduce the complexity of the search process, the precision of the required similarity can be an approximation.
3. Find pairs of objects from the same set or different sets which are sufficiently similar to each other (closest pairs queries).

3. Existing System

There are mainly the two following problems in the existing system:

A. Problem of deep extraction based on distance:

There are already a lot of problems in extraction, such as language dependency, scripting dependency, and version dependency. Nowadays many techniques have been released, such as page-level extraction, FiVaTech extraction, and vision-based extraction, with which efficient extraction can be done, but the main problem remains extraction based upon distance. Many times, we observe that we do not find the type of result we want.

Suppose there is a courier-service website in the USA (FedEx, for example). That courier company also has branches in other countries, such as India, China, and Russia. Obviously, all branches may have a relevant


website in a different location. The problem is that, when any user searches for a branch of that courier company in the search box, he finds only the main branch (USA), which is a problem in extraction. The extraction approach has trouble finding the nearer located branch; i.e., distance has not been considered in that extraction tool.

Another example: suppose we want information about the "Java programming language". We type this keyword in a search box; what does the server do? It tries to find documents containing information on the Java programming language, and this time the concept of similarity is used. The server will first match the data and then extract. Here also, distance matters: which data should be presented, the nearer or the farther data? Which data has sufficient information for the user? This type of problem exists in the existing system.

B. Problem of data retrieval based on complexity and web dependency:

Another problem is complexity. When data is uploaded from different locations, there is a chance of more and more complex data. If we consider a digital library website such as Google, Yahoo, or Wikipedia, there exists too much unwanted data. One link may occur many times, and all the links have some information behind them. If they occur more than once, more space is taken; hence performance automatically decreases and response time increases, which is not a solution for good extraction. So this type of problem exists in the existing system.

3.1 Disadvantages:
1. Only minor performance is available.
2. Less efficiency and less performance.
3. It can take more time during retrieval.
4. Computational cost is high.
5. More complexity.



3.2 Existing Algorithms

A. Basic Algorithms

In this section, we present some basic algorithms on high-dimensional index structures for index construction and maintenance in a dynamic environment, as well as for query processing. Although some of the algorithms were published using a specific indexing structure, they are presented here in a more general way.

3.2.1 Insert, Delete and Update

Insert, delete, and update are the operations that are most specific to the corresponding index structures. Despite that, there are basic algorithms capturing all actions which are common to all index structures. In the GiST framework, the build-up of the tree via the insert operation is handled using three basic operations: Union, Penalty, and PickSplit. The Union operation consolidates information in the tree and returns a new key which is true for all data items in the considered subtree. The Penalty operation is used to find the best path for inserting a new data item into the tree, by providing a number representing how bad an insertion into that path would be. The PickSplit operation is used to split a data page in case of an overflow. The insert and delete operations of tree structures are usually the most critical operations, which heavily determine the structure of the resulting index and the achievable performance.

B. Exact Match Query

Exact match queries are defined as follows: given a query point q, determine whether q is contained in the database or not. Query processing starts with the root node, which is loaded into main memory. For all regions containing point q, the function ExactMatchQuery() is called recursively. As overlap between page regions is allowed in most index structures presented in this chapter, it is possible that several branches of the indexing structure have to be examined for processing an exact match query.
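A minimal sketch of these operations, assuming one-dimensional interval keys and an enlargement-based Penalty as in R-tree-like GiST instantiations (all names are illustrative, not the real GiST API):

```python
# GiST-style insertion (Union/Penalty) and recursive exact-match search over
# possibly overlapping regions; 1-D intervals stand in for general keys.
class Node:
    def __init__(self, leaf=True):
        self.leaf = leaf
        self.entries = []          # leaf: points; internal: (interval, child)

def union(interval, x):            # Union: smallest key covering the old key and x
    lo, hi = interval
    return (min(lo, x), max(hi, x))

def penalty(interval, x):          # Penalty: how much the key must grow for x
    lo, hi = interval
    return max(lo - x, 0) + max(x - hi, 0)

def insert(node, x):
    if node.leaf:
        node.entries.append(x)     # (PickSplit would split on overflow)
        return
    # choose the subtree whose key needs the least enlargement
    best = min(range(len(node.entries)),
               key=lambda i: penalty(node.entries[i][0], x))
    interval, child = node.entries[best]
    node.entries[best] = (union(interval, x), child)
    insert(child, x)

def exact_match(node, q):
    if node.leaf:
        return q in node.entries   # data page: true if a stored point fits
    # regions may overlap, so every qualifying branch must be examined
    return any(lo <= q <= hi and exact_match(child, q)
               for (lo, hi), child in node.entries)

root = Node(leaf=False)
root.entries = [((0, 10), Node()), ((8, 20), Node())]   # overlapping regions
insert(root, 9); insert(root, 15)
assert exact_match(root, 9) and not exact_match(root, 99)
```

The result of exact_match is true exactly when some recursive call succeeds, mirroring the ExactMatchQuery behavior described above.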


In the GiST framework, this situation is handled using the Consistent operation, which is the generic operation that needs to be implemented for different instantiations of the generalized search tree. The result of ExactMatchQuery is true if any of the recursive calls returns true. For data pages, the result is true if one of the points stored on the data page fits. Underflow conditions can generally be handled by three different actions:
• Balancing pages by moving objects from one page to another
• Merging pages

4. Proposed System

We introduce a new extraction approach with caching distances. A new database technology is introduced here, called disk-based caches. It does not search the total caches. The user enters a distance range search, and the results are found and displayed. It searches the data in a limited number of caches. For searching in a limited number of caches, we use a parsing technique. It can extract the results from the desired caches and from useful distances. Caches at unnecessary distance locations are not searched. It can give fewer results, with low indexing cost. The D-cache gives reduced cost in implementation. Here we show efficient indexing results in the output. It gives faster extraction results.

4.1 Advantages
1. It avoids the distance computation problems.
2. Enhances the query performance.
3. Batch insertion of queries in cache servers.
4. Speeds up query result display.

4.2 Applications
1. In the Dynamic Web.
2. In Digital Libraries.
3. In Large Files.



4. In Pattern Recognition.

5. Universal Distance Cache

The D-cache is a technique/tool for general metric access methods that helps to reduce the cost of both indexing and querying. The main task of the D-cache is to determine tight lower and upper bounds of an unknown distance between two objects. The desired functionality of the D-cache is twofold:

First, given a runtime object/database object pair (r, o), the D-cache should quickly determine the exact value δ(r, o) in case the distance is stored in the D-cache. However, as the exact value can only be found when the actual distance was already computed previously in the session, this functionality is limited to rather special cases, like rendering of data objects (or index rearrangements), repeated queries, or querying by database objects.

The second functionality, which is the main D-cache contribution, is more general. Given a runtime object r and a database object o on input, the D-cache should quickly determine the tightest possible lower or upper bound of δ(r, o) without the need of an explicit distance computation. This cheap determination of lower/upper bound distances then serves a MAM in order to filter out a nonrelevant database object, or even a whole part of the index.

6. Metric Access Methods

First, we have to understand metric access methods. Metric access methods are techniques used in situations where similarity searching can be applied (e.g., a search for SBI can search the entire country, i.e., a similarity search has been invoked). First, let us understand the concept of similarity searching. When any user submits a query in the search box of any database, the process of responding to these queries is termed similarity searching. Given a query object q, this involves finding objects that are similar to q in


a database S of N objects, based on some similarity measure. Both q and the objects of S are drawn from some "universe" U of objects, but q is generally not in S. We assume that the similarity measure can be expressed as a distance metric d such that d(o1, o2) becomes smaller as o1 and o2 become more similar; thus (S, d) is said to be a finite metric space.

Now, a metric access method will facilitate the retrieval process by building an index on the various features, which are analogous to attributes. These indexes are based on treating the records as points in a multidimensional space and use point access methods.

Metric access methods use a structure for caching distances computed during the current runtime session. The distance cache ought to be an analogy to the classic disk cache widely used in DBMSs to optimize I/O cost. Hence, instead of sparing I/O, the distance cache should spare distance computations.

Because the main memory is always limited and the distance matrix could expand to an enormous size, we need to choose a compact data structure that consumes a user-defined portion of main memory. In order to also provide fast retrieval, the D-cache implements the distance matrix as a linear hash table consisting of entries. The hash key (pointing to a position in the hash table) is derived from the two ids of the objects whose distance is being retrieved or stored. In addition, there is a constant-size collision interval defined that allows moving from the hashed position to a more suitable one. However, in order to keep the D-cache as fast as possible, the collision interval should be very small, preferably just one position in the hash table (i.e., only the hashed position).

7. Project Modules

There are mainly five modules:

7.1 Apply the Concept of D-Cache:


Any user can forward any type of distance-based query, which starts the searching process and creates the runtime object and database object. Each and every object's session time and index are calculated here for a particular distance-based query. When another user forwards the same query, it extracts the results from the previously computed distances; the index value increases automatically. This is the procedure of the D-cache. The D-cache starts the searching process and quickly displays the results. It calculates lower and upper bounds, and the nearest locations' results are displayed as the final results. It gives only relevant distance-based cache results in the output.

7.2 Selection of Dynamic Pivots:

This module considers the input of the first module, called the preprocessed or indexed data. The similarity search operations are performed only on this particular data. It automatically carries out the dynamic pivot calculation and displays the final results in the output. It is very cheap for extraction of results and provides the results as output. It can give the results as minimized content.

7.3 Filtering the Data:

This module starts the searching process based on a radius. It searches the data within the region. It can start the search in any number of dimensions. The results are displayed after collecting the multidimensional objects. It can give exact and accurate results in the output display content.

7.4 Approximate Similarity Search:

This module starts the search by approximate similarity search. It can save cost during extraction of results, and this type of search retrieves exact results. It is a good incremental search without lower and upper bound distances, and relates to a good hierarchy-based search mechanism.


7.5 Performance Upgradation:

The D-cache gives faster results. It has fewer overhead problems. It reduces the distance computations and provides the results less expensively.

8. Algorithms Used

The D-cache is initialized by a MAM when loading the index (the session begins). Besides the initialization, the D-cache is also notified by a MAM whenever a new query/insertion is to be started (the MAM calls the method StartRuntimeProcessing on the D-cache). At that moment, a new runtime object r is announced to be processed, which also includes the computation of distances from r to the k actual dynamic pivots.

The following algorithms are used:

8.1 Distance Retrieval

The main D-cache functionality is operated by the methods GetDistance and GetLowerBoundDistance. The number of dynamic pivots (k = |DP|) used to evaluate GetLowerBoundDistance is set by the user; this parameter is an exact analogy to the number of pivots used by pivot tables, e.g., LAESA. There exists no general rule for the automatic determination of the number of pivots, especially when minimizing the real-time cost rather than just the number of distance computations. In general, the effective number of pivots depends on the (expected) size of the database, its intrinsic dimensionality (see Section 6.1.1), the computational complexity of the used metric, the pivot set quality itself, etc. The same reasons apply also to the D-cache.

8.2 Distance Insertion

Every time a distance δ(ri, oi) is computed by the MAM, the triplet (id(ri), id(oi), δ(ri, oi)) is inserted into the D-cache (the MAM calls the method InsertDistance on the D-cache). Since the storage capacity of the D-cache is limited, at some moment the collision interval in the hash table for a newly inserted distance entry is full. Then, some


older entry within the collision interval has to be replaced by the new entry. Alternatively, if it turns out that the newly inserted distance is less useful than all the distances in the collision interval, the insertion of the new distance is canceled. Note that we should prioritize replacing entries where neither of the objects oid1, oid2 belongs to the current set of k dynamic pivots anymore.

9. Future Enhancements
The D-cache supports three functions useful for metric access methods (MAMs): GetDistance (returning the exact distance between two objects, if available), GetLowerBoundDistance (returning the greatest lower-bound distance between two objects, by means of the dynamic pivots), and GetUpperBoundDistance (returning the lowest upper-bound distance). With these functions, the D-cache may be used to improve the construction of MAMs' index structures and the performance of similarity queries.

10. Conclusions
In this paper we presented the D-cache, a main-memory data structure which tracks computed distances while inserting objects or performing similarity queries in the metric space model. Since distance computations stored in the D-cache may be reused in further database operations, it is not necessary to compute them again. Also, the D-cache can be used to estimate distances between new objects and objects stored in the database, which can likewise avoid expensive distance computations. The D-cache aims to amortize the number of distance computations spent on querying/updating the database, similarly to how disk page buffering in traditional DBMSs amortizes the I/O cost. The D-cache structure is based on a hash table, making it efficient to retrieve stored distances for further usage.
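The distance retrieval and insertion operations described above can be sketched as a fixed-capacity hash table of distance triplets. This is an illustrative simplification, not the paper's implementation: the class and parameter names are ours, and the replacement policy is reduced to evicting the oldest entry of a full collision interval rather than the usefulness-based policy the paper describes.

```python
class DCache:
    """Toy fixed-capacity distance cache over a hash table whose buckets
    play the role of "collision intervals" holding at most bucket_size
    (key, distance) entries."""

    def __init__(self, n_buckets=64, bucket_size=4):
        self.n_buckets = n_buckets
        self.bucket_size = bucket_size
        self.table = [[] for _ in range(n_buckets)]

    def _bucket(self, id_a, id_b):
        # The metric is symmetric, so normalize the pair ordering.
        key = (min(id_a, id_b), max(id_a, id_b))
        return self.table[hash(key) % self.n_buckets], key

    def insert_distance(self, id_a, id_b, d):
        bucket, key = self._bucket(id_a, id_b)
        for i, (k, _) in enumerate(bucket):
            if k == key:              # refresh an existing entry
                bucket[i] = (key, d)
                return
        if len(bucket) >= self.bucket_size:
            bucket.pop(0)             # interval full: evict the oldest entry
        bucket.append((key, d))

    def get_distance(self, id_a, id_b):
        bucket, key = self._bucket(id_a, id_b)
        for k, d in bucket:
            if k == key:
                return d
        return None                   # not cached: caller must compute it
```

A MAM would call `get_distance` first and fall back to the real (expensive) metric only on a `None` result, inserting the freshly computed value afterwards.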



REFERENCES

[1] J.S. Vitter, "External Memory Algorithms and Data Structures: Dealing with Massive Data," ACM Computing Surveys, vol. 33, no. 2, pp. 209-271, citeseer.ist.psu.edu/vitter01external.html, 2001.
[2] C. Böhm, S. Berchtold, and D. Keim, "Searching in High-Dimensional Spaces—Index Structures for Improving the Performance of Multimedia Databases," ACM Computing Surveys, vol. 33, no. 3, pp. 322-373, 2001.
[3] S.D. Carson, "A System for Adaptive Disk Rearrangement," Software—Practice and Experience, vol. 20, no. 3, pp. 225-242, 1990.
[4] W. Effelsberg and T. Haerder, "Principles of Database Buffer Management," ACM Trans. Database Systems, vol. 9, no. 4, pp. 560-595, 1984.
[5] M. Batko, D. Novak, F. Falchi, and P. Zezula, "Scalability Comparison of Peer-to-Peer Similarity Search Structures," Future Generation Computer Systems, vol. 24, no. 8, pp. 834-848, 2008.
[6] P. Zezula, G. Amato, V. Dohnal, and M. Batko, Similarity Search: The Metric Space Approach (Advances in Database Systems). Springer, 2005.
[7] E. Chávez, G. Navarro, R. Baeza-Yates, and J.L. Marroquín, "Searching in Metric Spaces," ACM Computing Surveys, vol. 33, no. 3, pp. 273-321, 2001.
[8] G.R. Hjaltason and H. Samet, "Index-Driven Similarity Search in Metric Spaces," ACM Trans. Database Systems, vol. 28, no. 4, pp. 517-580, 2003.
[9] H. Samet, Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, 2006.
[10] T. Skopal and B. Bustos, "On Index-Free Similarity Search in Metric Spaces," Proc. 20th Int'l Conf. Database and Expert Systems Applications (DEXA '09), pp. 516-531, 2009.
[11] E. Vidal, "New Formulation and Improvements of the Nearest-Neighbour Approximating and Eliminating Search Algorithm (AESA)," Pattern Recognition Letters, vol. 15, no. 1, pp. 1-7, 1994.
[12] M.L. Micó, J. Oncina, and E. Vidal, "An Algorithm for Finding Nearest Neighbours in Constant Average Time with a Linear Space Complexity," Proc. Int'l Conf. Pattern Recognition, 1992.
[13] M.L. Micó, J. Oncina, and R.C. Carrasco, "A Fast Branch & Bound Nearest-Neighbour Classifier in Metric Spaces," Pattern Recognition Letters, vol. 17, no. 7, pp. 731-739, 1996.


Safety Management of Construction Workers

Vivek K. Kulkarni, Professor R. V. Devalkar
Dept. of Civil Engineering, N.D.M.V.P.S. K.B.G.T., C.O.E., Nasik

Abstract

Construction is the most dangerous land-based work sector in the world. Accidents at workplaces are due to a sequence of events. Events may be physical, due to hazardous situations, or incidents due to the behaviour of workers committing unsafe acts. Construction accidents can be reduced by taking some preventive precautions. Very little attention is given to managerial, organizational and human factors. Thus an organization lacks an approach towards the development and enforcement of the effective performance measures, metrics or models needed to achieve an efficient safety management system. To avoid such accidents, safety management of construction works should be the utmost priority of the concerned organization.

Keywords: DFS, GDP, Hagan, Bureau of Labor, prime

I. INTRODUCTION
Construction safety management consists of three phases:
1. Planning and Preparation Phase
2. Identification and Assessment Phase
3. Execution and Improvement Phase

A safety management system is the basis of safety performance, and can provide a powerful means for controlling and monitoring that performance. Safety is the best approach to doing business; it maximizes the competitiveness of the organization, through continual improvement of its products, services, people and environment, by concentrating on customer


focus, commitment and solid teamwork. Providing a safe workplace is an important aspect of the system. Safety should be included as one of the strategic objectives of the construction work. The construction industry stands out from other employments as having one of the highest worker injury and fatality rates. Construction comprises a very small percentage of the overall workforce; yet the incidence rate for non-fatal injuries and illnesses exceeds that of many other industries, and the construction industry has the most fatalities of any industry sector (Bureau of Labor Statistics, 2004). Some studies have shown that a fairly large percentage of construction accidents could have been eliminated, reduced, or avoided by making better choices in the design and planning stages of a project (Hecker, 2005). The problem is not that the hazards and risks are unknown; it is that they are very difficult to control in a constantly changing work environment. Construction safety (the intermediate phase between a finished design and a completed building) is the responsibility of the contractors and other site professionals. The success of a project depends on the planning and decisions that are made on site. Most construction accidents result from basic causes such as lack of proper training, deficient enforcement of safety, unsafe equipment, unsafe methods or sequencing, unsafe site conditions, not using the safety equipment that was provided, and a poor attitude towards safety (Toole, 2002). Often the role of the various contractors is unclear, as some contractors may try to transfer responsibility for safety to others. The most common construction project arrangement is that of general (prime) contractor/subcontractor. Before any excavation takes place, the contractor is responsible for notifying all applicable companies that excavation work is being performed. Locating utilities is a must before breaking ground. During excavation, the


contractor is responsible for providing a safe work environment for employees and pedestrians. Access and egress are also an important part of excavation safety. Ramps used by equipment must be designed by a competent person qualified in structural design.

II. Overview of Safety
Designing for Safety (DFS) is the process that incorporates hazard analysis at the beginning of a design (Hagan). This process starts with identifying the hazard(s). Engineering measures are then applied to eliminate the hazard(s) or reduce the risk. The hierarchy of design measures starts with eliminating the hazard(s) by engineering design. If the hazard(s) cannot be eliminated by engineering design, then safety device(s) are incorporated. If the risk of injury cannot be eliminated by engineering design, or reduced by incorporating a safety device, then warnings, instruction, and training are the last resort. This process has been applied to the design of products, equipment, machines, facilities, buildings, and job tasks. Manufacture, assembly, and maintenance are considered during the design process.

Safety has a high degree of uncertainty. The four main factors of uncertainty are:
(1) Inherent Variability
(2) Estimation Error
(3) Model Imperfection
(4) Human Error

The sub-factor of Inherent Variability is the randomness in the characteristics of the workplace and the environment to which the workplace is exposed. The sub-factors of Estimation Error are incomplete statistical data and inaccurate estimates of the parameters of the probability models of Inherent Variability. The sub-factors of Model Imperfection are due to the application of an ideal model to


describe safety. The ideal model arises from ignorance of, and inability to understand, the phenomenon of safety measures. The sub-factors of Human Error are the uncertainties due to human errors: errors in design, construction and operations.

III. Need for Safety Management
Safety and productivity are correlated. It is a false belief in the construction industry that safety management is an additional expense that hinders productivity. On the contrary, safety management strategies improve productivity through the curtailment of delays and distractions, enhanced teamwork, clean and orderly worksites, and improved ergonomics. Construction workers are already at a higher risk of accidents than in any other industry, and the large influx of workers from Eastern European countries presents considerable additional challenges to employers' efforts to manage health and safety. In India the construction industry is the second largest employer next to agriculture, while construction accidents are second only to road accidents in our country. The annual turnover of the construction industry in India is about 4000 billion rupees, which is more than 6% of the national GDP, and the industry employs a large workforce. While dealing with safety management, three main questions arise:


(1) What types of difficulties arise on projects?
(2) To what extent do safety issues affect the project?
(3) What measures can be taken to improve the safety performance of a construction project?

Some of the basic objectives behind taking up this study are as follows:
1) To conduct a literature survey: to study various papers and literature on construction safety measures and the different site situations in which measures are or are not followed, and to develop models for the safety climate.
2) To study the factors: to study the main factors which play an important part in safety measures, and the allocation of these factors as per the perspectives of contractors and workers.
3) To examine safety management implementation, i.e. the efficiency with which it is applied in the industry.
4) To provide practical suggestions and recommendations towards safety management, to be included in the system of construction and to enhance performance, with a questionnaire.

IV. Methodology
This topic is very vast, with different types of problems at different scales of industry to deal with and to draw conclusions from. Health and safety must be taken extremely seriously at every level of the company, from the board of directors right through to the site teams and operatives. We are always striving to find new methods and initiatives in which the workforce can be involved while working towards a safe site environment, as well as reviewing and improving existing systems. Taking this into consideration, we will work in the following manner:
(i) The questionnaire will be sent to large-scale construction firms.


(ii) A stratified sampling method will be used to select the sample of construction firms.
(iii) A random sampling method will be used if construction firms are not ready to provide responses to the questionnaire given to them.
(iv) The primary data will be collected mainly from the construction labourers with the help of interviews.
(v) We conclude by identifying the different reasons for not following safety measures and suggesting procedures for enforcing mandatory safety measures.

Conclusion
While concluding on this important topic of construction safety, I would like to remind you all that each site poses its own unique challenges in terms of industrial safety requirements, which have to be tackled with sincerity and professionalism. Modern management and machinery are helpful in achieving these objectives when used in a disciplined way.

References:
(1) Sherif Mohamed, "Scorecard Approach to Benchmarking Organizational Safety Culture in Construction", Journal of Construction Engineering & Management, Vol. 129, No. 5, 2003, pp. 80-88.
(2) Enno Koehn, P.E., and Nirmal K. Datta, "Quality, Environmental, Health and Safety Management Systems for Construction Engineering", Journal of Construction Engineering & Management, Vol. 129, No. 5, October 2003, pp. 562-569.
(3) Frank Gross and Paul P. Jovanis, "Current State of Highway Safety Education: Safety Course".
(4) Journal of Structural Engineering, Vol. 115, No. 5, May 1989, pp. 1119-1140.
(5) Ove Ditlevsen, "Fundamental Postulate in Structural Safety", Journal of Engineering Mechanics, Vol. 109, No. 4, August 1984, pp. 1096-1102.
(6) Yiquan Chen, Sebastian Tan and Samuel Lim, "Singapore Work Place Safety & Health Research Agenda: Research-To-Practice", Journal of Safety, Health & Environmental Research, Vol. 8, No. 1, 2012, pp. 27-32.


IMPROVING STARTUP TIME AND PROVIDING SECURITY TO SNAPSHOTS ON LINUX PLATFORM

Sheetal R. Tambe, Monika Shinde, Rohini Hire, Shanku Mandal, Rokade S. M.
Department of Computer Engg., Sir Visvesvaraya Institute of Technology, Nashik.

Abstract: This paper provides an entirely new mechanism for the traditional shutdown and boot, to improve the boot time and provide a user session more quickly. Generally, during a traditional shutdown, all the applications are closed first, before the kernel threads and services are closed; finally, for a complete shutdown, all the devices prepare themselves. It takes more time to resume from a complete shutdown. Therefore, the following technique is presented in this paper: the user session is closed, followed by hibernation of the kernel session. A full hibernation includes a lot of memory pages in use by the user applications, whereas in the proposed hibernation the data is much smaller, which substantially reduces the time needed to write it to the disk. As a result, a fresh user session can be conducted more quickly than with the traditional approach.

Keywords: freeze, thaw, system image, kernel, boot kernel, target kernel, snapshot image, security, encryption of hibernated image.

1. Introduction
Nowadays it is common to have a suspend option on laptops. The suspend option saves the state of the machine to a file system or to a partition and switches to standby mode. The machine can continue its work by resuming: the saved machine state is loaded back into RAM. This has two benefits. The first is that we save ourselves the


time the machine spends going down and then booting up; when running on batteries, the energy costs are really high. The other gain is that processes that have been computing something for a long time do not need to be written to be interruptible, as we do not have to interrupt our programs. The technique to be implemented saves the state of the machine into the active swap and then reboots or powers down. The user must explicitly specify the swap partition to resume from with the "resume=" kernel option. The saved state is loaded and restored if the signature is found. Resuming is skipped if the option "noresume" is specified as a boot parameter. The hibernation image is saved without compression if the option "hibernate=nocompress" is specified as a boot parameter.

The existing hibernation process has the problem that it does not provide any security for the hibernated snapshot images. Secondly, there is no encryption mechanism for the hibernated snapshot image. Hence, to overcome this problem, we provide security: we add an authentication test before the snapshot image can be accessed. The system is designed so that it checks for the password before accessing the snapshot images. The level of security for the snapshots will be maintained by using an encryption algorithm to encrypt the image. To achieve this, some changes to the kernel-level code are first required. The system implementation requires a complete understanding of the entire flow of the system, i.e. how hibernation is actually carried out and where on disk the hibernated snapshot image gets loaded. Security at a higher level is achieved by adding an encryption algorithm.

The concept of power management is implemented in various different ways in computers, viz. suspend, standby


and hibernate. Standby is sometimes referred to as power-on suspend, which is a low-latency power state. In the standby state, power is conserved by placing the CPU in a halt state, and the devices are placed in the D1 state (a class-specific low-power state). The response latency is minimal, typically less than 1 second, but the power savings are not significant. Suspend is also commonly known as suspend-to-RAM. In the suspend state, all devices are placed in the D3 state (the state in which a device is off and not running), and no part of the system other than main memory is expected to maintain power. The memory content is not lost, as memory is placed in self-refresh mode. The response latency of suspend is higher than that of standby, yet still very low, between 3 and 5 seconds. The most power is conserved by hibernate, which turns off the entire system after saving its state to a persistent medium, usually a disk. All the devices in the system are powered off unconditionally. The response latency is highest (about 30 seconds) but still quicker than performing a full boot sequence.

In this paper we propose a technique which will provide security to snapshot images at the kernel level. We will ask for a password on hibernation for the corresponding image, and then the user session will be logged off. During hibernation, the image of the kernel space will be created and stored on a non-volatile disk. The proposed hibernation data is much smaller, so it takes substantially less time to write to the disk compared to a full hibernate, which includes a lot of memory pages in use by user applications. Finally, a suitable encryption algorithm is applied to provide high security.

Objectives
The main idea of the paper is to provide kernel-level security for the hibernation state and to reduce the time required for booting after a traditional shut down.
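The combination of a password check and encryption of the snapshot image could look like the following sketch. This is a Python stand-in for the kernel-level code: the function names are ours, and a production design would use a vetted cipher (e.g. AES) rather than this illustrative hash-based keystream.

```python
import hashlib, hmac, os

def _keystream(key, n):
    """Derive n pseudo-random bytes from key via counter-mode SHA-256."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def seal_image(image, password):
    """Encrypt a snapshot image under a password and attach an auth tag."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    ct = bytes(a ^ b for a, b in zip(image, _keystream(key, len(image))))
    tag = hmac.new(key, ct, hashlib.sha256).digest()
    return salt + tag + ct            # salt (16) | tag (32) | ciphertext

def open_image(blob, password):
    """Return the decrypted image, or None if the password/tag check fails."""
    salt, tag, ct = blob[:16], blob[16:48], blob[48:]
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    if not hmac.compare_digest(tag, hmac.new(key, ct, hashlib.sha256).digest()):
        return None                   # wrong password or tampered image
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, len(ct))))
```

Authenticating before decrypting (verifying the HMAC tag first) means a wrong password is rejected without ever exposing garbled plaintext to the resume path.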
We take into consideration resuming the previous state of the system,


which has already been done using hibernation in the existing system. While resuming the hibernation image, the system will authenticate the user with the password before conducting a new user session. The future scope of our project can be extended to take a backup of the snapshot images of hibernated states. Further, higher security can be achieved by using a higher-level, complex encryption algorithm.

2. Literature Survey
2.1 Power Management in Linux
Power management in a device is defined as the process by which the overall consumption of power is limited, based on user requirements and the policy of a computer. As laptops and mobile phones have become more commonplace, power management has become a hot topic in the computer world in recent years, and users have become more conscious of the environmental and financial effects of limited power resources. Over the years, to conserve the amount of energy being used, there have been many advances in system and software architectures. The power management that the OS must handle is of two types: System Power Management and Device Power Management. System Power Management deals with the states which the system as a whole governs; it includes shutting down the system as well as booting it up and bringing the system back into a usable state. With the implementation of power management in Linux, it is also possible for the system to enter a low-power state, which ultimately saves power. Device Power Management deals with putting individual devices into a proper state as directed by user events; it describes the state in which a particular device is working. This module can put devices into the OFF or ON state as well as other power-saving states. In Linux, the Advanced Configuration and


Power Interface (ACPI) is the interface in which events are mapped to appropriate actions. An industry-standard interface for power management in laptops, embedded devices and desktops is established in an open industry specification. Several mechanisms such as suspend-to-disk, suspend-to-RAM, standby and shutdown are supported by this implementation; a state is defined for the system and for each of the devices in each mechanism. The most obvious area of benefit is the conservation of power, and hence longer battery life, where devices like laptops or embedded devices are concerned. Boot time is another important area of application for power management. To provide a more efficient and faster booting process, several mechanisms have been proposed in the Linux kernel or related patches. It is much desired at user level to get the user screen in less time, known as faster booting. The boot time of a system is reduced by restoring a previously saved state instead of reinitializing the entire system in System Power Management. The battery life of mobile devices is conserved, and the annoying wait for the computer to boot into a usable state is reduced.

2.2 Advanced Configuration and Power Interface (ACPI)
The Advanced Configuration and Power Interface (ACPI) specification was mainly developed to establish industry-common interfaces. These common interfaces enable robust operating system (OS)-directed configuration of motherboard devices and power management of both entire systems and devices. ACPI is the key element in operating-system-directed configuration and power management (OSPM). All classes of computers, including (but not limited to) desktop, mobile, workstation, and server machines, are suitably specified within the concepts of the interfaces and OSPM. The


concept that the system should conserve energy by transitioning unused devices into low-power states, including placing the entire system in a low-power (sleeping) state whenever possible, is promoted from the OSPM/ACPI perspective on power management. For instance, if a laptop's battery power is at a critical level, ACPI will detect the event and will put the laptop into a low-power state like suspend-to-RAM or suspend-to-disk.

2.3 How ACPI Works
The ACPI subsystem is implemented with the help of a process running in the background called ACPID (the ACPI daemon). User-space programs are notified of ACPI events by ACPID; the power management of the system is controlled, and the rules defined for the events are executed as they occur. /etc/acpi/events is the default rule configuration location for ACPID. The default rule configuration comes with a number of predefined actions for triggered events, such as what happens if the power button is pressed or the lid of a laptop is closed. /etc/acpi/handler.sh is a shell script containing the action to be executed for each of the events for which actions are defined. The command acpi_listen can be run in a shell to see how events are detected by ACPID: when a certain power event occurs, it displays the tokens generated after listening on the ACPI port. For instance, if the power button is pressed, the tokens generated are of the form power/button PBTN 0000000000000b31. The parameters to handler.sh are the generated tokens, which are matched against the appropriate case, and the corresponding action is executed. By creating an event file and a corresponding action script, the user can define his/her own events and rules.

2.4 Suspend-to-RAM (Sleep)
The suspend-to-RAM state offers significant power savings. In this state everything in the system is put into a low-power state, except for memory,


which is placed in self-refresh mode to retain its contents. The system and device state is saved and kept in memory.

2.5 Suspend-to-Disk (Hibernation)
At a higher level, ACPI is responsible for initiating the suspend-to-disk operation; the kernel then implements the actions needed to suspend the system, and the system state is saved before the system shuts down. The suspend-to-disk operation is composed of various steps, with a different part of the kernel performing each step. The actions performed by the kernel are as follows:
1. Non-boot CPUs are taken offline.
2. Tasks are frozen.
3. Some memory is released, if necessary.
4. Devices are frozen.
5. An atomic copy of the memory (aka the suspend image) is created.
6. Devices are woken up.
7. The suspend image is written to a swap partition.
8. The system is powered off.

However, if, for example, one of the devices refuses to freeze, then we need to wake up all already-frozen devices, thaw processes, and enable non-boot CPUs. By booting with the 'resume=<swap partition>' command-line parameter, the kernel-driven resume procedure may be started (where <swap partition> is the one the suspend image was written to in step 7). Then, the following actions are performed:
1. The suspend image is read into RAM.
2. Devices are prepared to resume.
3. The system memory state is restored from the suspend image.
4. Devices are woken up.
5. Tasks are thawed.
6. Non-boot CPUs are enabled.

Freezing and thawing tasks, snapshotting memory and restoring its state, and saving and loading the kernel image are the most important steps involved here. The hibernation process is initiated from the function hibernate() in "hibernate.c".
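The suspend ordering above, including the rollback when a device refuses to freeze, can be sketched as a toy model (this is illustrative Python, not kernel code; the class and function names are ours):

```python
class Task:
    def __init__(self): self.frozen = False
    def freeze(self): self.frozen = True
    def thaw(self): self.frozen = False

class Dev:
    def __init__(self, can_freeze=True):
        self.can_freeze, self.frozen = can_freeze, False
    def freeze(self):
        self.frozen = self.can_freeze   # a device may refuse to freeze
        return self.frozen
    def wake(self): self.frozen = False

def suspend_to_disk(tasks, devices):
    """Return the 'suspend image' on success, or None if a device refused
    to freeze, in which case everything frozen so far is woken/thawed."""
    for t in tasks:
        t.freeze()                      # step 2: freeze tasks
    frozen = []
    for d in devices:
        if not d.freeze():              # step 4 may fail...
            for fd in frozen:           # ...so wake already-frozen devices
                fd.wake()
            for t in tasks:             # and thaw the processes again
                t.thaw()
            return None
        frozen.append(d)
    image = {"atomic-copy-of-memory": True}   # step 5 (stand-in for the copy)
    for d in devices:
        d.wake()                        # step 6: wake devices back up
    return image                        # caller writes it to swap (step 7)
```

The essential point the model captures is that freezing must be transactional: a single uncooperative device unwinds every step taken so far, leaving the system running normally.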


2. Related Work
2.1 Architectural Design
The design of the proposed system, "Improving Startup Time and Providing Security to Hibernated Snapshot Images", is as follows. When a shutdown or restart command is invoked, the entire code runs within the kernel. The working of the system design starts with the user giving the shutdown command, which in turn calls the proposed module. The following figure depicts the sequential processes involved in the implementation.

The memory bitmap data structure is used for storing the snapshot images: it maps the memory pages that are to be saved and those that are forbidden. Basically, the pages that are to be included in the system image are indicated in the bitmap. A memory bitmap is a structure consisting of several linked lists of objects. The main elements of the list are of type struct zone_bitmap, and each of them corresponds to one zone. Each zone bitmap object consists of a list of objects of type struct bm_block, in each of which a block of bitmap information is stored. struct memory_bitmap contains a pointer to the main list of zone bitmap objects; a struct bm_position is used for browsing the bitmap, and a pointer to a list of pages is used for allocating all of the zone bitmap objects and bitmap block objects. A pointer to the list of bitmap block objects and a pointer to the bitmap block object that was most recently used for setting bits are contained in struct zone_bitmap. Additionally, struct zone_bitmap contains the pfns that correspond to the start and end of the represented zone. A pointer to the memory page in which the information is stored (in the form of a block of the bitmap) is contained in struct bm_block. struct bm_block also contains the pfns that correspond to the start and end of the represented memory area. It also contains the numbers of normal and


highmem page frames allocated for the hibernation image before suspending devices. The memory bitmap is used for marking saveable pages (during hibernation) or hibernation image pages (during restore). During hibernation, the memory bitmap is used for marking the allocated page frames that will contain copies of saveable pages. During the restore process, the memory bitmap is initially used for marking hibernation image pages, but then the set bits from it are duplicated in @orig_bm and it is released.

Figure 2.1: Architectural design – shutdown
Figure 2.2: Architectural design – resume

3. Proposed System
The main idea of the proposed system is to provide kernel-level security for the hibernation state and to reduce the time required for booting after a traditional shut down. We are concerned with resuming the previous state of the system, which has already been saved using hibernation. During the process of resuming the hibernation image, the user is authenticated using the password. The scope can be further extended to keep a backup of the snapshot images of hibernated states. High security can be achieved using


the higher-level, complex encryption algorithm.

Fig. 3.1: Flow diagram for shutdown
Fig. 3.2: Flow diagram for resume

The main difference between the existing system and the proposed system is the checking of the swsusp signature before loading the hibernated image into RAM. The main aim is to provide security for the snapshots.

A comparison of boot times between the normal boot mechanism and our improved mechanism has been worked out. From the observations we conclude that a normal boot takes about 40 seconds, whereas with the proposed system the idle screen appears within 3 seconds and the total boot time does not exceed 6 seconds. We realize an additional reduction of 0.5 seconds if the approach is applied at the boot-loader level. To have our mechanism applied at the boot-loader level, some functions that are already implemented in the kernel must be implemented in the boot loader: snapshot image loading, initializing some devices, and some other functions. As a result, another 0.5 seconds can be eliminated by applying these mechanisms at the boot-loader level. The required work is much less than for the snapshot boot, although the boot-loader-level approach requires additional management

89 International Journal of Multidisciplinary Educational Research

IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 and implementation. Finally, this security level of data get result suggests that there is a increased by great extend. trade-off between boot time and management cost. Bibliography
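The swsusp signature check mentioned above can be sketched as follows. On Linux, the suspend-to-disk code replaces the swap signature in the first page of the swap device with a resume signature; we assume here the common layout where the signature occupies the last 10 bytes of the first 4 KiB page and contains `S1SUSPEND` when an image is present. The function names and the exact offsets are illustrative assumptions, not a definitive implementation:

```python
PAGE_SIZE = 4096
SWSUSP_SIG = b"S1SUSPEND"   # signature swsusp writes into the swap header
SWAP_SIG = b"SWAPSPACE2"    # normal swap signature (no image present)

def read_resume_signature(header_page: bytes) -> bytes:
    """Return the signature field: the last 10 bytes of the first
    page of the swap device (assumed layout)."""
    return header_page[PAGE_SIZE - 10:PAGE_SIZE]

def hibernation_image_present(header_page: bytes) -> bool:
    """True if the header carries the swsusp resume signature, i.e. a
    hibernation image should be validated before loading it into RAM."""
    return read_resume_signature(header_page).startswith(SWSUSP_SIG)

# Illustrative use with a fabricated in-memory header page:
page = bytearray(PAGE_SIZE)
page[PAGE_SIZE - 10:PAGE_SIZE - 10 + len(SWSUSP_SIG)] = SWSUSP_SIG
```

In the proposed system, a positive check here would be followed by user authentication before the image is actually loaded.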

4. Conclusion

Nowadays the security of data is a very important factor to be taken into consideration. It is necessary to maintain the consistency and security of the hibernated image when the system is hibernated and your work is saved on disk. Various technologies are implemented in computer systems, such as suspend-to-disk, suspend-to-RAM, and TuxOnIce for hibernation, but none of the mentioned technologies provides security and encryption for the hibernated snapshot image. Thus the main aim was to design and implement a completely new mechanism for providing security and encryption. Presented here is a new design and implementation that uses the concept of suspend-to-disk to provide the user with an experience of secured hibernation. With the new hibernation technique, the security level of the data is increased to a great extent.

Bibliography

[1] Chris Simmonds, Reducing boot time in Linux devices, Class We-2.2, Embedded Live Conference, UK, 2010.
[2] Efficient operating system switching using mode bit and hibernation mechanism, CSI publications, 2012.
[3] System Power Management States, Documentation/power/states.txt.
[4] Swsuspend Porting Notes, http://tree.celinuxforum.org/CelfPubWiki/SwsuspendPortingNotes
[5] swsusp for OSK, http://lists.osdl.org/pipermail/linux-pm/2005-July/001077.html
[6] OMAP 5912 Starter Kit, http://tree.celinuxforum.org/CelfPubWiki/OSK
[7] Das U-Boot – Universal Bootloader, http://sourceforge.net/projects/u-boot
[8] Execute in Place (XIP), http://www.montavista.co.jp/



[9] Kernel XIP, http://tree.celinuxforum.org/CelfPubWiki/KernelXIP
[10] Prelink, http://people.redhat.com/jakub/prelink.pdf
[11] Making Mobile Phone with CELinux, http://tree.celinuxforum.org/CelfPubWiki/ITJ2005Detail1_2d2


Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System

Mr. Dhoot Suyog S., Prof. Naoghare M. M., Mr. Shinde Girish R.
(Department of Computer Engineering, SVITCOE, Pune University, India)
Email: [email protected]

ABSTRACT: Cloud computing is an emerging technology used for new revolutionary approaches for users and society. Through the cloud, a decision-cum-feedback system has been developed to obtain reports and precautions based on user health information at low cost. The system can be operated through mobile devices and provides a good market for health service providers. The privacy of user health information and the properties of the health service provider have to be kept secure in the cloud. A new system is required that maintains privacy with less overhead at the client side for secure computation. The programs through which feedback or decision replies reach the user have to be secured from inside and outside attacks. Decryption outsourcing and the newly proposed secure key duplicate 2-way encryption are selected to reduce overhead at the client side while securing the privacy of information. This paper demonstrates the effectiveness of the system in terms of security and computational performance.

Keywords: Health monitoring, Health service provider, Privacy, Decryption outsourcing, Secure key duplicate 2-way encryption

1. INTRODUCTION

The most important aspect of human life is health. A remote health monitoring system operated through mobile phones or a wireless network is an important aspect in the field of technology and has been adopted by developing countries. In remote areas of Caribbean countries, Microsoft launched the project "MediNet", a health monitoring feedback-cum-decision system for diabetes and cardiovascular diseases. The user gives health-related information as input, which passes through web-based medical application programs to give a decision, feedback, or precaution to the user according to how the program is set. It provides a good market sector for health service providers to deliver their services to users for various diseases. Users can select their medical health service provider based on the privacy of information and the efficient computing provided by them. The health service provider operates through the cloud, so that less setup cost is required.

The user can access this system from anywhere, and the user gets output at low cost, so user acceptance will also increase. Due to the involvement of the cloud, more work is performed by the cloud and less computational work is done at the user and health service provider side. For a developing nation, this type of system is very useful to maintain the health of people who live in remote areas. The government can take the initiative in budget allocation for such a needful project in the field of medical and health science.

A cloud based health monitoring system is useful, but the security or privacy of information is an important factor while designing this type of system. Maintaining the privacy of user information, which contains sensitive data regarding health status, is required at a high level. Information may be breached in different operations like storing, monitoring, applying, communicating, etc. A survey shows that the majority of people are very careful about the privacy of their health information. Users are not ready to adopt this type of system because they think that information passed through electronic or wireless media can be broken into or tampered with at any level of operation. To make the system effective, the privacy of user information is very needful, so that a large number of people get involved in such systems. Existing rules, regulations, laws, and standards set by regulating agencies and standards committees are applicable only to static record systems and are not designed for a cloud based environment. Some laws put limitations on the cloud to maintain the security of user information, but do not place any constraint on the health service provider.

Internal users, such as employees of the health service provider, might obtain useful information about a client and use it for multiple reasons.

Insider attacks are very dangerous if information is shared with mediclaim or insurance agencies, third party agencies/users, regulating agencies, or forums. This information can be used to maintain records of users or for research purposes. An effective mechanism is required to deal with insider attacks from the health service provider, using different security schemes and placing constraints on how user information may be used to provide decisions or feedback. Anonymization techniques deal with the privacy of data, but they consider only the normal personal information of the user. Personally identifiable information is now diverse and contains multiple elements of user input such as weight, blood group, and biometric health information. Any information that identifies or is associated with the user is personally identifiable information; it may relate to the user's history, geography, relationships, biology, vocation, genealogy, etc. Existing techniques deal only with basic user information such as name and address, and do not provide security for other identifiable information. The user input passed to the health service provider contains sensitive information such as blood group, blood pressure, or other health-related values that can be used by a third party, so its security is broken. Using this type of information, identification of an individual user is possible, so privacy is not maintained in an electronic or wireless environment. The proposed mechanism provides a good solution to maintain user information and provides an alternative way to deal with the privacy of the user. Health service providers have their own medical programs to provide feedback to users, which act as their intellectual property. The privacy of these programs is also important, so that correct, accurate, and timely feedback is received by the user. If a user sends a blood pressure value as input to the service provider and the security of the service provider's program is compromised, then the user gets a wrong decision or feedback, and the system does not provide good data communication.

Maintaining the privacy of the health service provider's medical programs is also an important aspect of a secured system. To maintain the privacy of information, the computational workload of the involved parties must also be considered. An important factor is how to move more of the computation overhead to the cloud as compared to the client, while also preserving the security of user information and application programs. The proposed system places more emphasis on insider attacks, but also considers outsider attacks that compromise the privacy of information. Insider attacks must be handled at a high level; several predefined techniques are available to deal with outsider attacks, including cryptographic schemes to maintain the integrity of data, certificate authority schemes, digital signatures, etc. The feedback or decisions obtained from the health service provider or cloud server must come with a guarantee that the output contains accurate information.

In this paper we propose a system in which we first consider the privacy problems and try to provide solutions or alternatives. In the advanced system, by using the outsourcing decryption technique, the privacy of user information and the intellectual property rights of the health service provider are secured in an effective way. Lastly, a secure key duplicate 2-way encryption mechanism is designed to move computational overhead to the cloud, so that less computational work is required at the user and health service provider side. It is proposed that the health service provider works online only during the initialization or setup phase and then goes offline, so that the user can work with the service provider through the cloud. Computational work is moved to the cloud, where the user information is encrypted and decrypted in such a way that none of the attacks is possible. By using the new scheme, the complexity required for setup, computing, validation, and computation is reduced, so no additional overhead is placed on the cloud or the user.

Users at the client side, the cloud server, and the health service provider can communicate securely, maintaining the privacy of their information, and efficiently, for a fast and good data communication system. A number of mechanisms are available for maintaining privacy, but they deal only with some client information and do not consider the health service provider's programs. Users can use mobile phones to access the cloud based health monitoring system, which consists of sensors to obtain health information; in that case the resource constraints of the device are also respected for less overhead.

Fig 1: Architecture of cloud data storage service

2. LITERATURE SURVEY

2.1 Johannes Barnickel, Hakan Karahan, and Ulrike Meyer proposed a security and privacy architecture and an implementation of the HealthNet mobile electronic health monitoring and data collection system. Privacy and security are achieved through data avoidance, data minimization, decentralized storage, and the use of cryptography. This system does not deal with a centralized approach, and the health service provider's program is not secured.

2.2 Rifat Shahriyar, Md. Faizul Bari, Gourab Kundu, Sheikh Iqbal Ahamed, and Md. Mostofa Akbar proposed the Intelligent Mobile Health Monitoring System (IMHMS) for improving communication among patients, physicians, and other health care workers. Security in IMHMS is provided by using RFID.

Each patient is provided RFID tags that are used to uniquely identify the patient. The IMS maintains the patient's profile information with the RFID in a central database, so malicious attacks can be blocked using this information, because a patient can be easily tracked using RFID. As it requires large memory and cost, high computational complexity is required to secure the user's personal information from unauthorized access.

2.3 Minho Shin, in a research article on Secure Remote Health Monitoring with Unreliable Mobile Devices, provided a risk analysis and presented a framework for secure remote health monitoring systems, together with a health monitoring architecture that leverages a special monitoring unit playing the central security role by providing critical security services including authentication, audit, key management, and data fusion. This system is not concerned with the security of the health service provider, and more IMS records are required for the monitoring program.

2.4 D. D. Kouvatsos, G. Min, and B. Qureshi researched Performance Issues in a Secure Health Monitoring Wireless Sensor Network. It is concerned with data privacy at the acquisition level, data security at the transmission level, and data security at the healthcare provider level. Therefore a new secure transmission protocol is required, providing optimal transmission control and bandwidth utilization to incorporate multimedia (audio/video) data.

2.5 Benjamin C. M. Fung, Ke Wang, Rui Chen, and Philip S. Yu present A Survey of Privacy-Preserving Data Publishing, which states that detailed person-specific data in its original form often contains sensitive information about individuals, and publishing such data immediately violates individual privacy.

The current practice primarily relies on policies and guidelines to restrict the types of publishable data, and on agreements on the use and storage of sensitive data. The limitation of this approach is that it either distorts data excessively or requires a trust level that is impractically high in many data-sharing scenarios. For example, contracts and agreements cannot guarantee that sensitive data will not be carelessly misplaced and end up in the wrong hands.

2.6 K. Venkatasubramanian and S. K. Gupta proposed AYUSHMAN: A Secure, Usable Pervasive Health Monitoring System. It integrates health monitoring sensors with highly capable entities to robustly collect patient data, and utilizes physiological values for generating keys and securing inter-sensor communication. The security of the health result is not provided, a centralized approach is not possible, and it does not deal with insider attacks.

2.7 Cong Wang, Sherman S. M. Chow, and Qian Wang proposed a service, namely Privacy-Preserving Public Auditing for Secure Cloud Storage. Enabling public auditability for cloud storage is of critical importance, so that users can resort to a third party auditor (TPA) to check the integrity of outsourced data and be worry-free. To securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities towards user data privacy and introduce no additional online burden to the user. This system does not deal with attribute based encryption, and no operation on encrypted and decrypted data can be performed.

2.8 Arvind Narayanan and Vitaly Shmatikov give viewpoints regarding Myths and Fallacies of "Personally Identifiable Information", stating that it is any information, recorded or otherwise, relating to an identifiable individual. It is worth noting that the information collected by an mHealth monitoring system could contain the client's personal physical data such as their heights, weights, and blood types, or even their ultimate personally identifiable information such as their fingerprints and DNA profiles.

3. PRELIMINARY CONCEPTS / ALGORITHMS

3.1 Branching program

It performs binary classification. Based on the user's input values, a tree is traversed, and the result present at a leaf node is returned. Let h be the vector of the client's information in terms of attributes. Each value has an index and a value, which together form an information component consisting of the information index and the respective information value. The first element is the set of nodes in the branching tree. Each node above a leaf node is a decision node, and the label associated with the decision or feedback is present at the leaf node.

Fig 2: Branching Program

3.2 Outsourcing decryption

This is a useful technique for identity based encryption, used to move computational workload from the user to the cloud server. The secure key is transformed into a transformation key, so that the user needs only a single computation to receive the feedback or decision. It works with a multi-dimensional range query based anonymous IBE, which consists of different algorithms: initialization to start the setup, encryption of the user's input values, transformation of the secure key into the transformation key, encryption of the key with the information, and decryption performed by the client and the cloud.
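As a concrete illustration, a branching program over attribute values can be sketched as a binary decision tree: internal nodes compare one component of the input vector h against a threshold, and leaves carry the decision or feedback label. The node layout, attribute meanings, and labels below are illustrative assumptions, not the paper's exact construction:

```python
# A node is either a leaf ("leaf", label) or a decision node
# ("node", attr_index, threshold, left, right):
# go left if h[attr_index] <= threshold, otherwise go right.

def evaluate(node, h):
    """Traverse the branching tree on input vector h and
    return the label at the reached leaf."""
    while len(node) == 5:
        _, idx, threshold, left, right = node
        node = left if h[idx] <= threshold else right
    return node[1]

# Toy program: index 0 = systolic blood pressure, index 1 = age.
tree = ("node", 0, 140,
        ("leaf", "normal: no action needed"),
        ("node", 1, 50,
         ("leaf", "elevated: recheck in a week"),
         ("leaf", "elevated: consult a physician")))

print(evaluate(tree, [120, 60]))  # -> normal: no action needed
print(evaluate(tree, [155, 62]))  # -> elevated: consult a physician
```

The cost of one query is one root-to-leaf path, which matches the later remark that the provider's computational complexity depends on the number of nodes in the branching program.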

99 International Journal of Multidisciplinary Educational Research IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014

A third party auditor uses the above scheme to maintain secure and private communication between all parties involved in the system. Homomorphic encryption is used for the encryption of the information vector given by the user.

3.3 Secure key duplicate 2-way encryption

This scheme transfers only the required cipher text to the other party in order to maintain the security of the underlying message, with two features: key privateness and unidirectionality. It consists of the following algorithms:

3.3.1 Initialization: performed by the third party auditor after receiving the user's information in the form of a vector.

3.3.2 Keygen: performed by the user and the third party auditor to create the secure key, i.e. a private key that is sent as the transformation key.

3.3.3 Rekeygen: performed by the third party auditor, which again computes a new secure key to communicate with the service provider.

3.3.4 EncryptionHP: performed by the health service provider; it generates the cipher text delivered to the user through the cloud.

3.3.5 ReencryptionHP: performed by the third party auditor and the duplicate server to re-encrypt the decisions or feedback delivered to the user.

3.3.6 Decryption: performed by the client after getting the decision or feedback, so that the user gets only valid information.

4. SETUP AND WORKING

A cloud based health monitoring system consists of four components or parties: the user at the client side, the cloud server, the health service provider, and the third party auditor. Figure 3 shows the architecture of the proposed system without the newly proposed secure key duplicate 2-way encryption, which does not give the same performance in terms of security and efficiency.
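The paper does not specify the secure key duplicate 2-way encryption scheme in implementable detail. To make the Rekeygen / ReencryptionHP / Decryption flow concrete, the sketch below uses a classic ElGamal-style proxy re-encryption in the spirit of Blaze–Bleumer–Strauss: the auditor's re-encryption key retargets a cipher text from one key to another without ever decrypting it. Note that this stand-in scheme is bidirectional and uses a tiny demo group, so it illustrates only the message flow, not the claimed unidirectionality or real-world security:

```python
# Demo group: safe prime p = 2q + 1; g = 4 generates the order-q subgroup.
p, q, g = 2579, 1289, 4

def keygen(sk):
    """Private key sk in [1, q-1]; public key g^sk mod p."""
    return pow(g, sk, p)

def encrypt(pk, m, r):
    """EncryptionHP-style step: m must be a subgroup element.
    Cipher text is (m * g^r, pk^r)."""
    return (m * pow(g, r, p) % p, pow(pk, r, p))

def rekeygen(sk_from, sk_to):
    """Rekeygen-style step: re-encryption key sk_to / sk_from mod q."""
    return sk_to * pow(sk_from, -1, q) % q

def reencrypt(rk, ct):
    """ReencryptionHP-style step: retarget ct without decrypting it."""
    c1, c2 = ct
    return (c1, pow(c2, rk, p))

def decrypt(sk, ct):
    """Client-side decryption: m = c1 / c2^(1/sk)."""
    c1, c2 = ct
    gr = pow(c2, pow(sk, -1, q), p)      # recovers g^r
    return c1 * pow(gr, p - 2, p) % p    # divide by g^r mod p

sk_user, sk_provider = 123, 456
m = pow(g, 77, p)                            # message encoded in the subgroup
ct = encrypt(keygen(sk_provider), m, r=99)   # encrypted for the provider
rk = rekeygen(sk_provider, sk_user)          # auditor's re-encryption key
assert decrypt(sk_user, reencrypt(rk, ct)) == m
```

The design point this illustrates is that the party holding only rk (the auditor or duplicate server) learns nothing about m; the client performs a single final decryption, matching the outsourcing-decryption goal.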

100 International Journal of Multidisciplinary Educational Research IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014

Fig 3: Architecture of Proposed System

The advanced system uses secure key duplicate 2-way encryption for security and efficiency, in which encryption is done in two ways. It also reduces the computational workload at the user side and moves it to the cloud.

Fig 4: Module Structure of Proposed System

During initialization, the third party auditor runs the setup phase and generates the required system parameters. After generation, the system parameters are published to the user, the cloud server, and the health service provider. After initialization, the health service provider stores its medical application program in the form of a branching tree in the cloud. This branching tree is encrypted, and the generated cipher text is stored in the cloud. To identify the service provider, each service provider gets one index, and the encrypted branching tree program is stored in the cloud along with that index. When a particular user wants a decision or feedback from a service provider, the user starts a token generation operation in association with the third party auditor. The client sends the index value of the health service provider along with its input vector, which consists of the user's health information; the input query is passed in the form of a vector of information components. After getting the input query from the client, the third party auditor generates one token and sends it to the client. During this process the third party does not get any user identifiable information. The client sends the token to the cloud server, where it is required for getting the decision.

The third party auditor validates tokens; the cloud sends the token to the health service provider, and the service provider generates the feedback or decision based on the decision tree structure and passes it to the cloud in cipher text format. The cloud gets a partially decrypted cipher text, which it passes to the client. As the message is only partially decrypted, the cloud does not get any useful information about the decision or feedback.

Fig 5: Final Structure of Proposed System

5. PERFORMANCE ANALYSIS

5.1 Security: The cloud server does not get any personally identifiable information of the user as received in the input vector, and the branching programs of the health service provider are not revealed to the cloud, by applying the outsourcing decryption technique. The decision or feedback is encrypted and only the user gets that information, so it is also secured. Due to the token generation mechanism, communication between the user, the service provider, and the cloud server is secured. A system is designed in which the security of information is maintained by creating tokens. The cloud has been formed and is accessed through a website.

5.2 Efficiency: The computational complexity at the service provider depends on the number of nodes in the branching program and the computational mechanism. As the service provider goes offline after the setup phase, fewer computations are required, and the computational overhead moves from the user side to the cloud server, for a fast and timely communication system. Experiments will be conducted to compute results after completion of the project.
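The token based flow above (client requests a token from the auditor, and the cloud accepts a query only if the token verifies) can be sketched with a message authentication code. The use of HMAC and a shared auditor–cloud key are our illustrative assumptions; the paper does not specify the token construction:

```python
import hmac
import hashlib
import secrets

AUDITOR_CLOUD_KEY = secrets.token_bytes(32)  # shared by auditor and cloud

def issue_token(provider_index: int) -> bytes:
    """Third party auditor: bind a fresh nonce to the provider index.
    Note the token carries no user health data or identity."""
    nonce = secrets.token_bytes(16)
    tag = hmac.new(AUDITOR_CLOUD_KEY,
                   nonce + provider_index.to_bytes(4, "big"),
                   hashlib.sha256).digest()
    return nonce + tag

def cloud_accepts(token: bytes, provider_index: int) -> bool:
    """Cloud server: validate the token before forwarding the
    (encrypted) query to the indexed health service provider."""
    nonce, tag = token[:16], token[16:]
    expected = hmac.new(AUDITOR_CLOUD_KEY,
                        nonce + provider_index.to_bytes(4, "big"),
                        hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

token = issue_token(provider_index=7)
assert cloud_accepts(token, 7)       # valid query is forwarded
assert not cloud_accepts(token, 8)   # token is bound to provider 7
```

Binding the token to the provider index (but not to the user) matches the claim that the auditor issues tokens without learning user identifiable information.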


5.3 Mobility: The system can be handled through a wireless network or electronic media on any platform.

6. CONCLUSION

The cloud based health monitoring system efficiently secures the privacy of user information and of the application programs of the health service provider. To maintain the privacy of user information in the cloud and against insider attacks, the anonymous Boneh-Franklin identity based encryption (IBE) is used to deal with personally identifiable information. The outsourcing decryption technique reduces the computational overhead at the user side and moves it to the cloud. Branching programs are encrypted using different branch node values to maintain their security. By applying the newly developed secure key duplicate 2-way encryption scheme, the computational overhead at the service provider side is reduced, so small organizations or companies can take part in the business and create their market in an efficient way. Security and effectiveness are achieved through the proposed system.

ACKNOWLEDGEMENTS

Whenever we stand on the most difficult step of the dream of our life, we often remember the great almighty God for his blessings and kind help, and he always helps us in tackling the problems by some means in our lifetime. I feel great pleasure in presenting this seminar entitled Efficiently Securing Privacy of User Information in Cloud Based Health Monitoring System. I would like to convey sincere gratitude to my seminar guide and M.E. Coordinator Prof. M. M. Naoghare for her valuable guidance and support; without her kind co-operation it would have been extremely difficult for me to complete this paper.


I would also like to extend my gratitude to our respected Prof. S. M. Rokade, Head of the Computer Engineering Department, for his kind co-operation towards the betterment and successful completion of this paper and the support he has always provided to me. Last but not least, I would also like to thank my parents and all my friends for their encouragement from time to time. Finally, I am very grateful to the mighty God and my inspiring parents, whose loving and caring support contributed a major share to the completion of my task.

REFERENCES

Journal Papers:

[1] Huang Lin, Jun Shao, Chi Zhang, and Yuguang Fang, "CAM: Cloud-Assisted Privacy Preserving Mobile Health Monitoring," IEEE Transactions on Information Forensics and Security, vol. 8, no. 6, 2013.

[2] E. Shi, J. Bethencourt, H. T.-H. Chan, D. X. Song, and A. Perrig, "Multi-dimensional range query over encrypted data," in IEEE Symposium on Security and Privacy, 2007, pp. 350–364.

[3] G. Clifford and D. Clifton, "Wireless technology in disease management and medicine," Annual Review of Medicine, vol. 63, pp. 479–492, 2012.

[4] Cong Wang, Sherman S. M. Chow, Qian Wang, Kui Ren, and Wenjing Lou, "Privacy-Preserving Public Auditing for Secure Cloud Storage."

Books:

[5] Krishna K. Venkatasubramanian and Sandeep K. S. Gupta, "Security for Pervasive Health Monitoring Sensor Applications."

[6] W. Stallings, "Cryptography and Network Security: Principles and Practice," Prentice Hall.

[7] Johannes Barnickel, Hakan Karahan, and Ulrike Meyer, UMIC Research Centre, "Security and Privacy for Mobile Electronic Health Monitoring and Recording Systems."

Proceedings Papers:


[8] J. Brickell, D. Porter, V. Shmatikov, and E. Witchel, "Privacy-preserving remote diagnostics," in Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, 2007, pp. 498–507.

[9] P. Mohan, D. Marin, S. Sultan, and A. Deen, "MediNet: personalizing the self-care process for patients with diabetes and cardiovascular disease using mobile telephony," in Conference Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2008.


CLOUD BASED MOBILE SERVICE DELIVERY USING QOS MECHANISM

Ms. Prachi B. Gaikwad, SVIT, Chincholi
Prof. S. M. Rokade, SVIT, Chincholi

Abstract— Cloud computing is an emerging trend for large scale infrastructures. It has the advantage of reducing cost by sharing computing and storage resources, combined with an on-demand provisioning mechanism relying on a pay-per-use business model. Mobile devices maintain network connectivity through different network providers, so if users move around they can still access cloud services. In the current model, when a user moves from one geographical area to another, he keeps accessing services from the previous cloud over a long distance, which results in more congestion on the network. There is a need for a different approach that maintains resources by improving the QoS and QoE of mobile services. This framework shows that services running on a public cloud are able to populate to another cloud in a different location. This paper argues that if we add a resource pool for every cloud, it is responsible for removing the ambiguity which occurs at the time of migrating services.

Key Words—cloud computing, QoS, QoE, service population

I. INTRODUCTION

Cloud computing has become popular nowadays because of its simple nature. It offers various computing and storage services over the internet. Cloud service providers rent data center hardware and software to deliver storage and computing services through the internet.


Internet users can access services from the cloud. Instead of on their own devices, cloud users can store their data on the cloud. They can run their applications on a cloud platform without a full installation of software. Cloud service providers provide various cloud services and resources according to user requirements and charge for them accordingly. Amazon EC2 and Apple's iCloud are very popular cloud based products [2]. These vendors create their own cloud services and offer them to clients for business and individual use. They create cloud services as requirements come from the market, and each differs from the others.

Mobile computing has also become more popular due to smartphones and tablet PCs. Laptops and desktops cannot be operated easily everywhere due to their size and form, which increases the demand for mobile devices of less weight and size than laptops and desktops. But these devices may lack some hardware resources needed to perform critical tasks. In that situation there is a need to access those resources remotely through the network for storage and processing. This feature is provided by cloud computing. It provides center based resources, while those devices require a decentralized pool of resources. This creates a traffic congestion problem on the internet due to user mobility and high bandwidth services, which affects the QoS and QoE factors of mobile services. This paper presents a framework which overcomes the problem by a service populating technique.

II. LITERATURE SURVEY

One project invents a reshaping of the physical footprint of virtual machines within a cloud [8]. It introduces a concept for lowering operational costs for cloud providers and improving hosted application performance by taking into account affinities and conflicts between placed virtual machines. This is achieved by mapping virtual machine footprints.


After comparison, if similarities are found in the memory footprints, the virtual machines are migrated to the same memory location, and content based memory sharing is also deployed to achieve consolidation [9][10][11]. The basic idea is to build a control system for the cloud which performs footprint reshaping to achieve higher level objectives like low power consumption, high reliability, and better performance. It then reduces the cost for cloud providers and creates low cost cloud services for users.

The MEC (Media Edge Cloud) architecture improves the performance of cloud technology. This architecture also improves QoS and QoE for multimedia applications. To achieve that, "cloudlets" of servers run at the edge of a bigger cloud, handling requests closer to the client and thus reducing latency. If requests need further processing, they are sent to the inner cloud, so that the "cloudlets" remain reserved for QoS based multimedia applications [13]. Using that concept, the physical machines closer to the cloud's outer boundary are used to handle QoS sensitive services. As these machines are located on the outer boundary of the cloud, the data has to travel less distance within the cloud before being sent to the client. This improves QoE for the client and reduces network congestion in the cloud.

All these researches aim only to improve cloud performance; none of them considers user mobility. Providing media services to mobile clients will become popular in the future. As mobility and multimedia content become more popular, high bandwidth data streams will have to travel longer distances, and reaching a moving target can create a problem in the future. Cloud providers may need to create more clouds to handle the load and reduce the congestion.
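The edge-cloud idea above, serving QoS-sensitive requests from whichever machine is closest to the client in network terms, can be sketched as a simple selection by measured latency. The cloudlet names, latency figures, and budget below are fabricated for illustration:

```python
def pick_cloudlet(latencies_ms, qos_budget_ms):
    """Choose the lowest-latency cloudlet for a request; requests whose
    best measured latency exceeds the QoS budget are escalated to the
    inner cloud, keeping cloudlets reserved for QoS-sensitive traffic."""
    name, rtt = min(latencies_ms.items(), key=lambda kv: kv[1])
    if rtt > qos_budget_ms:
        return "inner-cloud"
    return name

# Measured round-trip times from one client (fabricated demo values):
measured = {"edge-eu-1": 18.0, "edge-eu-2": 31.5, "edge-us-1": 92.0}
print(pick_cloudlet(measured, qos_budget_ms=50))  # -> edge-eu-1
```

A real MEC deployment would of course also weigh load and capacity, but the latency-first selection captures why outer-boundary machines improve QoE.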


can access the services from the cloud. But in this approach the client needs to know the name of the physical resource which offers the services, so it has the problem of redundancy. Some organizations solve this problem by running multiple servers and by using DNS for load balancing and failover [13]. This approach needs more cost, which is not affordable for small entities offering a service at a lower cost. The ability for clients to request services directly from the network, instead of asking for the physical resources that offer those services [1], opens doors for future development. The client requests a service ID, and the network infrastructure is used to find whether the actual service is running and then to connect it to the client. This approach makes it possible to run a service in multiple locations and to direct client requests to the appropriate instance depending on their location and network status.

III. PROGRAMMERS DESIGN
A QoS-aware service delivery model is necessary to deliver the services. The network infrastructure is used to decide the network status between the client and the service. Service providers provide services with the best QoS and QoE parameters to their clients. In this model a client of cloud services will remain connected to the same cloud without thinking about its physical location and network status. If the network condition is not satisfactory and there is no redundant path, the service will be out of reach over the network. So providers are not able to meet their SLA standards, which results in clients not getting the best QoE. Another thing is that a client at any location connects to the same cloud to get services without thinking about the distance of the cloud from itself. This creates load on the cloud, which degrades the QoS of services. It is not possible for cloud providers to build multiple clouds to provide services to the different


geographical areas. So there is a need for a new service delivery technique which provides various services to clients with proper QoS and QoE parameters; it should also provide better management of the cloud for providers and reduce network congestion. In this service delivery model we will have clients who request services, and their requests will be directed to the physical location at which the service is running while fulfilling the QoS and QoE parameters. In the mobility case it is difficult to direct a client to a specific instance of a service. We can connect the client to a service instance based on its present location and network conditions, but if the client moves to another location with a different network area then this becomes difficult. If the user moves away from the cloud, it creates congestion on the network and impacts the QoS of all services on the network. To solve this problem we could connect the client to a different instance of the service each time the expected QoS parameter degrades. At that time cloud providers are not expected to create multiple clouds. A single cloud provider may not own multiple clouds at different physical locations, so it is possible that many cloud providers have their clouds far apart or scaled down to a regional level within a country. So we are able to address the issue of service population across the boundaries of different cloud providers. This introduces a concept where service providers register their services globally and are not bound to specific cloud providers. Services which are globally registered and not bound to specific cloud providers are free to populate or migrate to different clouds depending on QoS and the source of the service request parameters. This will only be possible when cloud providers open their boundaries, so services can move in and out of their clouds. It will change the model of service providers. Service providers will register their services with a service level agreement which defines the QoS parameters. Cloud


providers provide services with the best QoS so that services will populate to them and give them income. It is not possible for any big cloud to take all the services, due to the network congestion problem. So services may populate from a bigger cloud to a smaller cloud, to keep the network free of congestion and to minimize the distance from the client. While populating services from one cloud to another, the receiving cloud can also reject the populating service if it is already under heavy load. This service population process is completely transparent to the user. To achieve all those things there is a need for a new service delivery framework, and it should be QoS aware and support service population. At the time of migration of any service from one cloud there may be a chance that another user is accessing the same service, so migration of the service from the current cloud would lead to starvation of the second client. To solve this problem we add a separate resource pool to each cloud which is used to keep references or objects of all populated and non-populated services. Another client can then access the populating service without any interruption.

[Fig: System architecture — smartphone users as clients, a recommendation engine, a QoS engine and the clouds]

The above figure shows the system architecture. Smartphones and users are the clients who access the services of the cloud. These are mobile clients, so if they move from one location to another then there is a need to populate the services to the other location. The engine gives recommendations depending on the QoS parameters. The other cloud decides whether or not to receive the populated services.

IV. SERVICE FRAMEWORK
The service populating model needs a concept of an open cloud.
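The population decision described above can be sketched in a few lines of code. This is an illustrative sketch only — the paper gives no code, and the class names, load threshold and QoS scores below are invented for illustration:

```python
# Hypothetical sketch of the QoS-aware service population decision
# described above. Names and thresholds are invented for illustration.

class Cloud:
    def __init__(self, name, load, capacity):
        self.name = name
        self.load = load          # current utilisation, 0.0 - 1.0
        self.capacity = capacity  # spare resource units

    def can_accept(self, service):
        # A receiving cloud may reject a populating service
        # if it is already under heavy load.
        return self.load < 0.8 and self.capacity >= service.required_resources

class Service:
    def __init__(self, name, required_resources, min_qos):
        self.name = name
        self.required_resources = required_resources
        self.min_qos = min_qos    # minimum acceptable QoS score

def choose_target(service, measured_qos, clouds):
    """Return a cloud to populate to when measured QoS degrades, else None."""
    if measured_qos >= service.min_qos:
        return None  # QoS still acceptable; keep the service where it is
    candidates = [c for c in clouds if c.can_accept(service)]
    # Prefer the least-loaded cloud among those willing to accept.
    return min(candidates, key=lambda c: c.load) if candidates else None

video = Service("video-stream", required_resources=2, min_qos=0.7)
clouds = [Cloud("edge-a", 0.9, 4), Cloud("edge-b", 0.4, 4)]
target = choose_target(video, measured_qos=0.5, clouds=clouds)  # -> edge-b
```

Note that, as in the text, the receiving cloud keeps the final say: `can_accept` models its right to reject a populating service when it is already heavily loaded.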


The existing closed cloud only runs services controlled by its owner. An open cloud allows services from third-party providers to populate it. The open cloud is like a resource pool: anyone can use these resources to run their services, and anyone can provide such a resource pool and accept services from other providers to run on it. This is where the new framework comes into the model. The proposed framework consists of six layers, as follows.

Service Management Layer:- It is used to check how services are registered in a cloud. Billing information between resource and service providers is processed here. It is considered part of the application layer in OSI because it defines the applications and how they use the resources. When service providers want to publish a service they have to define security and QoS parameters; these are the requirements to run the service. To do that, each service must have a list of parameters which must agree with the parameters defined by the cloud. This layer is also used, when migrating a service, to find proper clouds that can accept the service. At that time, if the service needs extra resources they can be given, and the service provider will be billed accordingly.

Service Subscription Layer (SSL):- It is used to keep track of the number of subscribers and the locations from which a service is accessed.

Service Delivery Layer (SDL):- It delivers services to specific clients. It is responsible for publishing a service from one cloud to another cloud. Finding the appropriate cloud as per the necessary requirements is done, and then the service is populated to this cloud.

Service Migration Layer (SMiL):- The migration of services between clouds is the responsibility of SMiL. To populate a service we first have to be sure that the target cloud can accept the service, so the decision of whether or not to move a service is taken at the SDL. Using that decision, SMiL instructs the cloud about


which resources need to be allocated.

Service Connection Layer:- It handles the client mobility issue and also checks the connection between clients and services.

Service Network Abstraction Layer:- It provides the abstraction needed to simplify the migration process. It acts as an interface between the service delivery framework and new technology.

IV. IMPLEMENTATION
To gather QoS data and network conditions we use a QoS manager. It collects such data by querying the clients for network conditions. It also resolves service names into unique service IDs. To deliver any service we need to connect the subscriber to the proper service instance; Service Tracking and Resolution (STAR) is used to connect a service subscriber to the correct service instance. It also keeps records of service IDs and of the clouds in which their instances are running. STAR decides which cloud is best suited to pass the client request to, on the basis of the user's location. A service can decide whether to add a new client, or to reject and pass them to another cloud if possible. STAR works like a DNS system: the service subscriber requests the cloud ID from STAR, and once the cloud ID is found it is resolved into IP addresses of the cloud that the client can connect to in order to access the service. A decision-making algorithm is used to decide whether to accept or reject the service at the delivery layer.

Algorithm:
1. Start.
2. Create node and start node.
3. Start the QoS manager, which checks the QoS of various services.
4. The next step is user authentication, which is used to authenticate the users.
5. The authenticated user is connected to the service.
6. Suppose the user accesses video; video streaming is going on.


7. At the time of streaming, the system tracks the QoS with the help of the QoS manager.
8. The QoS manager gives some recommendations, which are tracked by the system.
9. On the basis of the recommendations, the system takes a migration decision.
10. The service is migrated to another cloud or kept as it is.
11. If the service is migrated to another cloud, the system will again check the QoS.
12. Stop.

V. APPLICATIONS
A QoS-based service delivery model provides various applications and services. It reduces the network congestion for frequently accessed websites or those carrying more multimedia data. This method consumes bandwidth in streaming, which requires appropriate QoS. In this model the whole service is populated to the area where it is more in demand, which provides great benefits to that area. Load balancing becomes easier in this method. It provides an adaptable resource allocation scheme, as services can be replicated as per demand. Cloud providers can share their resources with other providers, which gives flexibility to add more resources as their clouds need them. It is also useful in gaming, as it can be used in rendering techniques. Most of the traffic is generated by video and audio streaming, so this method reduces the network traffic generated by streaming. As the distance between client and services is reduced by migration, latency decreases, which gives the user a more interactive feel in multimedia applications by improving QoE.

CONCLUSION
This paper gives a solution to the challenges presented by user mobility. The previous service delivery model is inefficient for the future requirements of mobile users. Cloud technology with the proposed model can bring a solution for proper management of network


resources. This paper introduced a technique which reduces the network congestion generated by streaming video and audio.

REFERENCES
[1] Fragkiskos Sardis, Glenford Mapp, Jonathan Loo, "On the Investigation of Cloud-Based Mobile Media Environments with Service-Populating QoS-Aware Mechanisms", IEEE Transactions on Multimedia, vol. 15, no. 4, June 2013.
[2] Apple, 2012, iCloud, Feb. 15, 2012. [Online]. Available: http://www.apple.com/icloud.
[3] J. Postel and J. Reynolds, ISI, RFC 948, A Standard for the Transmission of IP Datagrams Over IEEE 802 Networks, IETF, 1988.
[4] ETSI, 2011, Mobile Technologies GSM, Feb. 15, 2012. [Online]. Available: http://www.etsi.org/WebSite/Technologies/gsm.aspx.
[5] H. Inamura, G. Montenegro, R. Ludwig, A. Gurtov, and F. Khafizov, RFC 3481, TCP over Second (2.5G) and Third (3G) Generation Wireless Networks, IETF, 2003.
[6] Amazon, 2012, EC2, Feb. 28, 2012. [Online]. Available: http://aws.amazon.com/ec2/.
[7] Microsoft, 2011, Cloud Computing, Feb. 28, 2012. [Online]. Available: http://www.microsoft.com/enus/cloud/default.aspx?fbid
[8] J. Sonnek, J. Greensky, R. Reutiman, and A. Chandra, "Starling: Minimizing communication overhead in virtualized computing platforms using decentralized affinity-aware migration", in Proc. 39th Int. Conf. on Parallel Processing (ICPP'10), San Diego, CA, USA, Sep. 2010.
[9] J. Sonnek and A. Chandra, "Virtual Putty: Reshaping the physical footprint of virtual machines", in Proc. Workshop on Hot Topics in Cloud Computing (HotCloud'09), San Diego, CA, USA, Jun. 2009.


[10] T. Wood, G. Tarasuk-Levin, P. Shenoy, P. Desnoyers, E. Cecchet, and M. Corner, "Memory buddies: Exploiting page sharing for smart colocation in virtualized data centers", in Proc. 5th ACM Int. Conf. Virtual Execution Environments, 2009.
[11] D. Gupta, S. Lee, M. Vrable, S. Savage, A. C. Snoeren, G. Varghese, G. M. Voelker, and A. Vahdat, "Difference engine: Harnessing memory redundancy in virtual machines", in Proc. OSDI, 2008.
[12] W. Zhu, C. Luo, J. Wang, and S. Li, "Multimedia cloud computing", IEEE Signal Process. Mag., vol. 28, no. 3, pp. 59-69, May 2011.
[13] T. Brisco, RFC 1794, DNS Support for Load Balancing, IETF, 1995.
[14] D. N. Thakker, "Prefetching and clustering techniques for network based storage", Ph.D. dissertation, Sch. Eng. Inf. Sci., Middlesex Univ., London, U.K., 2010.


FAULT DIAGNOSIS IN INDUCTION MOTOR

Miss K. R. Gosavi, Department of Electrical Engineering, Government College of Engineering, Aurangabad, Maharashtra, India ([email protected])
Mrs. A. A. Bhole (Asst. Prof.), Department of Electrical Engineering, Government College of Engineering, Aurangabad, Maharashtra, India ([email protected])

Abstract— Although induction motors are highly reliable, they are susceptible to many types of faults that can become catastrophic and cause production shutdowns, personal injuries, and waste of raw material. Induction motor faults can be detected at an initial stage in order to prevent the complete failure of the system and unexpected production costs. The purpose of this paper is the analysis of various faults of an inverter-fed induction machine. The laboratory tests thus conducted have been reported, and it is hoped that the research investigations reported would be very useful to the power electronics circuit industry.

I. INTRODUCTION
The study of induction motor behavior during abnormal conditions due to the presence of faults, and the possibility of diagnosing these abnormal conditions, has been a challenging topic for many electrical machine researchers. The induction motor has been established as the workhorse of industry ever since the 20th century. Speed control of AC motors has been a continuously pressing requirement of industry, so as to ensure better production with a high degree of qualitative consistency. Although recent developments in power electronics and controls have brought forth some very significant drive alternatives like the Switched Reluctance motor,

117 International Journal of Multidisciplinary Educational Research

Permanent Magnet and Brushless DC motors, these have not yet become very popular and cost effective for a wide range of applications, especially in damp-proof, dust-proof and flame-proof environments. Therefore, the widespread use of induction motors is still economically viable as well as popular, and is likely to continue for the next few decades. Variable speed drives are widely used in all application areas of industry. These include transport systems such as ships, railways, elevators and conveyors; material handling plants; and utility companies for mechanical equipment, e.g. machine tools, extruders, fans, pumps and compressors. The penetration of variable speed AC drives into these sectors has been further accelerated by the development of new power semiconductor devices and drive concepts, which allow new functions and performance characteristics to be realized. The application of new power electronic components has also initiated a significant change in the market breakup between AC drives and DC drives. The rugged construction of AC drives has opened up a host of new application areas, thereby providing the user and also the manufacturer additional potential to increase their productivity.

II. CONCEPT OF DRIVE SYSTEMS
While comparing the dynamic performance of a separately excited DC motor with that of an induction motor, the latter presents a much more complex control plant. This is due to the fact that the main flux and armature current distribution of a DC motor are fixed in space and can be controlled independently, whereas in the case of an AC motor these quantities are strongly interacting. This design constraint makes the induction motor drive structure more complex and non-linear. The drive hardware complexity increases as more and more stringent performance specifications are demanded by the user. The complexity further increases because of the variable frequency power supply, AC


signal processing and the relatively complex dynamics of the AC machine.

PWM Inverters:
One of the best possible methods to control the torque and speed of an induction motor is to implement variable voltage and variable frequency inverters. Inverters used for variable speed drive applications should have the capability of varying both the voltage and frequency in accordance with speed and other control requirements. The simplest method to achieve this control is through a six-step inverter. But this method suffers from the following limitations:
(i) Presence of low order harmonics, because of which the motor losses are increased at all speeds, causing derating of the motor.
(ii) Torque pulsation is present at low speeds, owing to the presence of lower order harmonics.
(iii) The harmonic content increases at low speeds, thus increasing motor losses. Also, the increase in the V/f ratio at low speed to compensate for the stator resistance drop may cause a higher motor current to flow at light loads due to saturation.
These effects may overheat the machine at low speeds. These limitations of a six-step inverter drive are overcome in a pulse width modulated (PWM) inverter. The basic block diagram of a PWM inverter is shown in figure 1.

Figure 1. Block Diagram Of Inverter System

Because of the low harmonic content in the output voltage of the diode bridge, and also due to the presence of harmonics in the input current of a PWM inverter, the required filter size in such systems is small. The drive system consequently delivers smooth low speed operation, free from torque pulsation, thus leading to lower derating of the motor and higher overall efficiency. Also,


because of a constant DC bus voltage, a number of PWM inverters with their associated motors can be supplied from a common diode bridge. However, these advantages are obtained at the expense of a complex control system and higher switching loss due to high frequency operation.

Survey of Various Faults:
A wide range of motors are currently being used for industrial applications. They deliver a wide range of characteristics demanded for specific tasks. Motors for all types of duties and with various characteristics require adequate protection. Hence it is essential that the characteristics of motors be carefully examined and considered before applying protection systems. A three-phase voltage fed inverter can develop various types of faults, as shown in figure 2:
• Input supply single line to ground fault F1.
• Rectifier diode short circuit fault F2.
• Earth fault on DC bus F3.
• DC link capacitor short circuit fault F4.
• Transistor base drive open fault F5.
• Transistor short circuit fault F6.
• Line to line short circuit at machine terminals F7.
• Single line to ground fault at machine terminals F8.
• Single phasing at machine terminals F9.
A three-phase voltage fed inverter can develop any of the above stated faults, out of which the open base drive and shoot-through are the most common.
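The fault classes above lend themselves to a simple lookup table in diagnosis or logging software. The codes F1-F9 come from the survey above, but the table itself is only an illustrative sketch, not part of the paper:

```python
# Illustrative lookup table for the inverter fault classes F1-F9
# surveyed above (the table itself is not part of the paper).
INVERTER_FAULTS = {
    "F1": "Input supply single line to ground fault",
    "F2": "Rectifier diode short circuit fault",
    "F3": "Earth fault on DC bus",
    "F4": "DC link capacitor short circuit fault",
    "F5": "Transistor base drive open fault",
    "F6": "Transistor short circuit fault",
    "F7": "Line to line short circuit at machine terminals",
    "F8": "Single line to ground fault at machine terminals",
    "F9": "Single phasing at machine terminals",
}

def describe(code):
    """Map a fault code to its description, or 'unknown fault'."""
    return INVERTER_FAULTS.get(code, "unknown fault")
```

A protection or monitoring routine can then report faults by code while keeping the human-readable descriptions in one place.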


MOTOR CURRENT SIGNATURE ANALYSIS
Motor Current Signature Analysis (MCSA) is a system used for analyzing or trending dynamic, energized systems. Proper analysis of MCSA results assists the technician in identifying:
1. Incoming winding health
2. Stator winding health
3. Rotor health
4. Air gap static and dynamic eccentricity
5. Coupling health, including direct, belted and geared systems
6. Load issues
7. System load and efficiency
8. Bearing health

BASIC STEPS FOR ANALYSIS
There are a number of simple steps that can be used for analysis using MCSA. The steps are as follows:
1. Map out an overview of the system being analyzed.
2. Determine the complaints related to the system in question. For instance, is there reason for analysis due to improper operation of the equipment, and is there other data that can be used in the analysis?
3. Take data.
4. Review data and analyze: Review the 10 second snapshot of current to view the operation over that time period. Review low frequency demodulated current to view the condition of the rotor and identify any load-related issues. Review high frequency demodulated current and voltage in order to determine other faults, including electrical and mechanical health.

FAULT DETECTION
a) Detection of broken bars
It is well known that a 3-phase symmetrical stator winding fed from a symmetrical supply with frequency f1 will produce a resultant forward rotating magnetic field at synchronous speed, and if exact symmetry exists there will be no resultant backward rotating field. Any asymmetry of the supply or stator winding impedances will cause a resultant backward rotating field from the stator winding. When


applying the same rotating magnetic field fundamentals to the rotor winding, the first difference compared to the stator winding is that the frequency of the induced electromotive force and current in the rotor winding is at slip frequency, i.e. s.f1, and not at the supply frequency. The rotor currents in a cage winding produce an effective 3-phase magnetic field with the same number of poles as the stator field, but rotating at slip frequency f2 = s.f1 with respect to the rotating rotor. With a symmetrical cage winding, only a forward rotating field exists. If rotor asymmetry occurs then there will also be a resultant backward rotating field at slip frequency with respect to the forward rotating rotor. As a result, the backward rotating field with respect to the rotor induces an e.m.f. and current in the stator winding at:
fsb = f1(1 - 2s) Hz (1)
This is referred to as the lower twice slip frequency sideband due to broken rotor bars. There is therefore a cyclic variation of current that causes a torque pulsation at twice slip frequency (2sf1) and a corresponding speed oscillation, which is also a function of the drive inertia. This speed oscillation can reduce the magnitude (amps) of the f1(1-2s) sideband, but an upper sideband current component at f1(1+2s) is induced in the stator winding due to the rotor oscillation. The upper sideband is enhanced by the third time harmonic flux. Broken rotor bars therefore result in current components being induced in the stator winding at frequencies given by [8]:
fsb = f1(1 ± 2s) Hz (2)
These are the classical twice slip frequency sidebands due to broken rotor bars.
b) Detection of air gap eccentricity
Air-gap eccentricity in electrical machines can occur as static or dynamic eccentricity. Static eccentricity is defined as a stationary minimum air-gap. This can be caused by stator core ovality or incorrect positioning of the rotor or stator at the commissioning stage. At the position of minimum air-gap there is an unbalanced magnetic pull which tries to deflect the


rotor, thus increasing the amount of air-gap eccentricity. Dynamic eccentricity is defined as a rotating minimum air-gap. It can be caused by a bent shaft, mechanical resonances at critical speeds, or bearing wear. Either can lead to a rub between the rotor and stator, causing serious damage to the machine. The effects of air-gap eccentricity produce unique spectral patterns and can be identified in the stator current spectrum. The analysis is based on the rotating wave approach, whereby the magnetic flux waves in the air-gap are taken as the product of permeance and magneto motive force (MMF) waves.
c) Detection of shorted turns in LV stator winding
The objective is to reliably identify current components in the stator winding that are only a function of shorted turns and are not due to any other problem or mechanical characteristic. There has been a range of papers published on the analysis of air gap and axial flux signals to detect shorted turns, and the detailed mathematics can be found in the references [9-10].
d) Detection of mechanical influences
Changes in air gap eccentricity result in changes in the air gap flux waveform. With dynamic eccentricity the rotor position can vary, and any oscillation in the radial air gap length results in variations in the air gap flux. Consequently this can induce stator current components given by [11-13]:
fe = f1 ± m.fr (3)
where f1 = supply frequency, fr = rotational speed frequency of the rotor, m = 1, 2, 3, ... harmonic number, and fe = current components due to air gap changes. This means that problems such as shaft/coupling misalignment, bearing wear, roller element bearing defects and mechanical problems that result in dynamic rotor disturbances can be potentially detected due to changes in the current spectrum.
e) Influence of gearboxes
Mechanical oscillations will give rise to additional current components in the frequency spectrum. Gearboxes may also


give rise to current components at frequencies close or similar to those of broken bar components. Hence, to perform a reliable diagnosis of a rotor winding for motors connected to a gearbox, the influence of gearbox components in the spectrum needs to be considered. Specifically, slowly revolving shafts will give rise to current components around the main supply frequency as prescribed by equation (3), where the rotational speed frequency of a shaft rotating at Nr rpm may be calculated as fr = Nr/60 Hz.

V. CASE STUDIES
Case I: Current spectrum of a healthy motor
A 100 HP, 440 V standard efficiency motor driving a pump was tested in a sugar mill. The motor was operating at 95 Amps, corresponding to approximately 75% full load. The full load speed was 1775 rpm, yielding a frequency interval of 48.55 Hz to 51.77 Hz for detection of broken rotor bars. Figure 1 shows part of the frequency resolved current spectrum for the motor. The spectrum is completely free of any current components around the main supply frequency, f1, and consequently the frequency range in which current components due to broken rotor bars are expected is empty. The motor thus shows no signs of broken rotor bars.
Case II: Rotor Asymmetry
Rotor asymmetry was detected in a 50 HP, 440 volt motor operating in a sugar mill during quality control analysis using MCSA. The full load speed is 940 rpm, yielding a frequency interval of 48.55 Hz to 51.66 Hz for detection of broken rotor bars. The motor was operating at 85 Amps, corresponding to approximately 60% full load. Based on the load conditions, the instrument predicted current components due to broken rotor bars to be positioned at 49.0 Hz and 51.0 Hz. A search band is applied around these positions. Figure 2 shows one current component to be present in each search band. The components are distributed symmetrically around f1, as expected, but with different magnitudes, 47.2 dB and 58.0 dB


from the main supply frequency. The components are a sign of initial rotor asymmetry, but not yet indicative of an unhealthy motor.
Case III: Damaged Rotor
Figure 3 shows part of the frequency resolved current spectrum for a coal mill motor rated 440 V, 150 HP operating in a utility plant. The full load speed is 955 rpm, yielding a frequency interval of 48 Hz to 52 Hz for detection of broken rotor bars. Based on the supply current, the instrument predicted sidebands due to broken rotor bars to be positioned at 49.70 Hz and 50.57 Hz. These frequency positions are close to that of the supply frequency. Figure 3 shows the supply frequency to have a somewhat wide declining current component. This is caused by the motor being subjected to smaller changes in load, i.e. smaller changes in supply current, during the data acquisition process. However, the peak detection algorithms embedded in the instrument were able to detect the declining slopes of the supply frequency within the applied search bands, and thus disregard these slopes from the analysis, thereby correctly identifying the current components due to broken rotor bars.
Case IV: Fault in gear box
Figure 4 shows part of the frequency resolved current spectrum for a coal mill motor rated 440 V, 240 HP operating in a utility plant. The full load speed is 885 rpm, yielding a frequency interval of 48 Hz to 52 Hz for detection of broken rotor bars. The motor is driving a coal mill through a three-stage reduction gearbox, i.e. the gearbox contains three


shafts. The output speed of the gearbox at full load conditions is 18.46 rpm, and the individual shaft speeds internal to the gearbox are 48.60 rpm and 135.78 rpm at full load conditions.

Conclusion:
Motor Current Signature Analysis is an electric machinery monitoring technology. It provides a highly sensitive, selective, and cost-effective means of condition monitoring.

References:
[1] Vas, P., "Parameter Estimation, Condition Monitoring, and Diagnosis of Electrical Machines", Clarendon Press, Oxford, 1993.
[2] Kliman, G. B., Koegl, R. A., Stein, J., Endicott, R. D., Madden, M. W., "Noninvasive Detection of Broken Rotor Bars in Operating Induction Motors", IEEE Trans. Energy Conv. (1988), 873-879.
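The twice-slip-frequency sidebands of equation (2) can be computed directly from nameplate data. The sketch below uses illustrative numbers (a 50 Hz, 4-pole machine) rather than attempting to reproduce the instrument's exact search bands from the case studies; the function names are not from the paper:

```python
# Sketch of the broken-rotor-bar sideband calculation from the
# fault-detection section: fsb = f1 * (1 ± 2s).  Function names
# and the example machine data are illustrative only.

def slip(sync_rpm, actual_rpm):
    """Per-unit slip s = (Ns - Nr) / Ns."""
    return (sync_rpm - actual_rpm) / sync_rpm

def sidebands(f1, s):
    """Lower and upper twice-slip-frequency sidebands, in Hz."""
    return f1 * (1 - 2 * s), f1 * (1 + 2 * s)

# Example: a 50 Hz, 4-pole machine (synchronous speed 1500 rpm)
# running at 1470 rpm, i.e. slip s = 0.02.
s = slip(1500, 1470)
lower, upper = sidebands(50.0, s)
print(round(lower, 2), round(upper, 2))  # prints: 48.0 52.0
```

In practice the instrument searches narrow bands around these predicted positions, as described in the case studies, since the sidebands sit very close to the supply frequency at light load.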



A Review on Intrusion Detection System for Web Based Applications

R S Jagale, M M Naoghare (Department of Computer Engineering, SVIT/ Pune University, India) Email: [email protected]

ABSTRACT : Today, the Internet and applications related to the Internet have become an important part of our day-to-day life. The Internet gives us a strong communication medium. We can manage our personal information from anywhere in the world. The Internet provides banking services, e-mail, online shopping and many other services. These applications generate a large volume of data and increasing complexity. To manage this increase in application and data complexity, web services use a multitier design where the web server runs the application front-end logic and data are stored in a database server as a back end. In this paper, we present a network based Intrusion Detection System that models the network behavior of user sessions across both the front-end web server and the back-end database. By analyzing both web and database requests, we are able to detect attacks that other IDSs would not be able to identify. Furthermore, we quantify the limitations of any multitier IDS in terms of training sessions and functionality coverage. We propose an Intrusion Detection System using an Apache web server with MySQL and lightweight virtualization. It is an Intrusion Detection System used to detect attacks in multitier web applications. Our approach can create normality models of isolated user sessions that include both the web front-end (HTTP) and back-end (File or SQL) network transactions. To achieve this, we employ a lightweight virtualization technique to assign each user's web session to a dedicated container, an isolated virtual computing environment. We use the container ID concept to accurately associate the web request with the database queries. Thus, our IDS can build a mapping profile by taking both the web server requests and database SQL queries into consideration.

Keywords: Anomaly, Intrusion, SQL,
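The container-ID mapping between web requests and database queries described in the abstract can be illustrated with a minimal sketch. All identifiers and the trained profile below are invented for illustration and are not taken from the paper's actual system:

```python
# Minimal sketch of a container-ID mapping profile; all identifiers and the
# trained profile are hypothetical, not the paper's implementation.
from collections import defaultdict

profile = {"GET /login": {"SELECT pwd FROM users"}}   # trained normality model
by_container = defaultdict(lambda: {"requests": [], "queries": []})

def observe(container_id, request=None, query=None):
    """Sensors tag each web request and SQL query with its container ID."""
    if request:
        by_container[container_id]["requests"].append(request)
    if query:
        by_container[container_id]["queries"].append(query)

def session_is_normal(container_id):
    """A session is normal if its queries stay within the trained sets."""
    s = by_container[container_id]
    expected = set().union(*(profile.get(r, set()) for r in s["requests"]))
    return set(s["queries"]) <= expected

observe("c1", request="GET /login"); observe("c1", query="SELECT pwd FROM users")
observe("c2", request="GET /login"); observe("c2", query="DROP TABLE users")
print(session_is_normal("c1"), session_is_normal("c2"))  # True False
```

Because every query carries the container ID of the session that issued it, queries never have to be disentangled from the mixed traffic of a shared web server.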

I. INTRODUCTION
There are four major attacks possible on multitier web based applications.

1.1 User Privilege Escalation Attack
Let's assume that the website serves both regular users and administrators. For a regular user, the web request Ru will generate the set of SQL queries Qu; for an administrator, the request Ra will generate the set of admin level queries Qa. Now assume that an attacker logs into the web server as a normal user and changes his

privileges by a user privilege escalation attack. Then he triggers admin queries so as to obtain an administrator's data. This type of attack can never be detected by either the web server IDS or the database IDS, since both Ru and Qa are legitimate requests and queries. Fig. 1.1 shows how a normal user may use admin queries to obtain privileged information.

Figure 1.1: User Privilege Escalation Attack

1.2 Hijack Future User Session Attack
This type of attack is mainly done at the web server side. An attacker first takes control of the web server and thereafter hijacks all subsequent user sessions to launch attacks. For instance, by hijacking other user sessions, the attacker can send spoofed replies and/or ignore user requests. A session hijacking attack can be further categorized as a Spoofing/Man-in-the-Middle attack, a Denial-of-Service/Packet Drop attack, or a Replay attack. Fig. 1.2 illustrates a scenario wherein a compromised web server can harm all future user sessions by not generating any database queries for normal user requests.

1.3 SQL Injection Attack
In a SQL Injection attack, the attacker can concatenate a dynamic part, such as data or string contents, to the static part of the SQL query. This generates the SQL Injection attack, which collects important information from the back-end database. Since our approach provides two-tier detection, even if a SQL Injection attack is accepted by the web server, the relayed contents to the database server would not be able to take on the expected structure for the given web server request. For instance, since the SQL injection attack changes the structure of the SQL queries, even if the


injected data were to go through the web server side, it would generate SQL queries in a different format that could be detected as a deviation from the SQL query format that would normally follow such a web request. Fig. 1.3 illustrates the scenario in which SQL injection attacks are accepted by the web server.

Figure 1.2: Hijack Future Session Attack

Figure 1.3: SQL Injection Attack

1.4 Direct Database Attack
In a direct database attack, an attacker can bypass the web server or firewalls and connect directly to the database. An attacker could also have already taken control of the web server and trigger such queries from the web server without sending web requests. The web server IDS would not be able to detect the unmatched web request. Furthermore, if these database queries were within the set of allowed queries, then the database IDS would not detect it either. Fig. 1.4 illustrates the scenario wherein an attacker bypasses the web server to directly pass the SQL query to the database.

Figure 1.4: Direct Database Attack

II. LITERATURE SURVEY
A network Intrusion Detection System can be classified into two types: anomaly detection and misuse detection. Anomaly detection first requires the IDS to define and characterize the correct and acceptable static form and dynamic behaviour of the system, which can then be


used to detect abnormal changes or anomalous behaviours [11], [12]. The boundary between acceptable and anomalous forms of stored code and data is precisely definable. Behaviour models are built by performing a statistical analysis on historical data [13], [18], [21] or by using rule-based approaches to specify behaviour patterns [20]. An anomaly detector then compares actual usage patterns against the established models to identify abnormal events. Our detection approach belongs to anomaly detection, and we depend on a training phase to build the correct model. As some legitimate updates may cause model drift, there are a number of approaches [19] that try to solve this problem. Our detection may run into the same problem; in such a case, our model should be retrained for each shift.

Intrusion alert correlation [18] provides a collection of components that transform intrusion detection sensor alerts into succinct intrusion reports in order to reduce the number of replicated alerts, false positives, and nonrelevant positives. It also fuses the alerts from different levels describing a single attack, with the goal of producing a succinct overview of security-related activity on the network. It focuses primarily on abstracting the low-level sensor alerts and providing compound, logical, high-level alert events to the users. Our IDS differs from this type of approach that correlates alerts from independent IDSs. Rather, our IDS operates on multiple feeds of network traffic using a single IDS that looks across sessions to produce an alert without correlating or summarizing the alerts produced by other independent IDSs.

An IDS such as in [17] also uses temporal information to detect intrusions. Our IDS, however, does not correlate events on a time basis, which runs the risk of mistakenly considering independent but concurrent events as correlated events. Our IDS does not have

such a limitation, as it uses the container ID for each session to causally map the related events, whether they are concurrent or not.

Virtualization is used to isolate objects and enhance security performance. Full virtualization and para-virtualization are not the only approaches being taken. An alternative is lightweight virtualization, such as OpenVZ [3], Parallels Virtuozzo [4], or Linux-VServer [16]. In general, these are based on some sort of container concept. With containers, a group of processes still appears to have its own dedicated system, yet it is running in an isolated environment. On the other hand, lightweight containers can have considerable performance advantages over full virtualization or para-virtualization: thousands of containers can run on a single physical host. There are also some desktop systems [15], [14] that use lightweight virtualization to isolate different application instances. Such virtualization techniques are commonly used for isolation and containment of attacks. However, in our IDS, we utilized the container ID to separate session traffic as a way of extracting and identifying causal relationships between web server requests and database query events.

III. IMPLEMENTATION DETAILS
We set up the threat model to include our assumptions and the types of attacks we are aiming to protect against. Attacks are generated from the network and initiated by the web clients; they can launch application layer attacks to compromise the web servers they are connecting to. The attacker can also directly attack the web site database by bypassing the web server. We assume that the attacks can neither be detected nor prevented by the currently available web server IDS, that the attacker may take control of the web server after the attack, and that afterward they


can obtain full control of the web server to launch subsequent attacks. For example, the attackers could modify the application logic of the web applications, take control of other users' web requests, or intercept and modify database-related queries to access important data beyond their user privileges.

3.1 Architecture of proposed system
All network traffic, from both legitimate users and adversaries, is received and intermixed at the same web server. If an attacker takes charge of the web server, he can badly affect all future sessions. Assigning each session to a dedicated web server is not a practical option, as it would increase the load on web server resources. To achieve similar performance while maintaining a low performance and resource overhead, we use a lightweight virtualization technique. In our IDS design approach, we make use of lightweight process containers, i.e., disposable web servers for client sessions. It is possible to create thousands of virtualized containers on a single server computer, and these virtualized containers can be discarded or quickly reinitialized to serve new client requests.

Figure 3.1: Classic three-tier model

A single physical web server runs many containers; each one is an exact copy of the original web server. Our approach dynamically generates new containers and recycles used ones. As a result, a single physical server can run continuously and serve all web requests. However, from a logical perspective, each session is assigned to a dedicated web server and isolated from other sessions. Since we initialize each


virtualized container using a read-only clean template, we can guarantee that each session will be served with a clean web server instance at initialization. We choose to separate communications at the session level so that a single user always deals with the same web server. Sessions can represent different users to some extent, and we expect the communication of a single user to go to the same dedicated web server, thereby allowing us to identify suspect behavior by both session and user. If we detect abnormal behavior in a session, we will treat all traffic within this session as tainted.

If an attacker compromises a vanilla web server, other sessions' communications can also be hijacked. In our Intrusion Detection System, an attacker can stay only within the web server container to which he is currently connected, with no knowledge of the existence of other session communications. We can thus ensure that legitimate sessions will not be compromised directly by an attacker.

Fig. 3.1 shows the classic three-tier architecture. The database server acts as a back end, and the web server plays the role of the front-end application logic. From the database end, we are not able to tell which transaction associates to which client request. The communication between the database server and the web server is not separated, and it is difficult to understand the relationships among them. Fig. 3.2 shows how communications are divided into separate sessions and how database transactions can be associated to a corresponding session. As expressed in Fig. 3.1, if the session of Client 2 is malicious and takes charge of the web server, then it will infect all upcoming database-related transactions, as well as the response to the client. But, as per Fig. 3.2, Client 2 will only use VE 2, and the respective database transaction set T2 will be the only affected section of data within the database.
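The per-container isolation just described can be sketched as a toy model. The VE names follow the Client 2 / VE 2 / T2 example above, but the bookkeeping itself is invented for illustration:

```python
# Hypothetical toy model of per-session isolation; the VE names and
# transaction bookkeeping are invented for illustration.
sessions = {
    "VE1": {"tainted": False, "transactions": []},
    "VE2": {"tainted": False, "transactions": []},
}

def record_transaction(container_id, txn):
    """Database transactions are logged under the session's container ID."""
    sessions[container_id]["transactions"].append(txn)

def mark_tainted(container_id):
    """Abnormal behavior taints only the container it occurred in."""
    sessions[container_id]["tainted"] = True

record_transaction("VE2", "UPDATE accounts SET ...")  # Client 2's set T2
mark_tainted("VE2")                                   # Client 2 turns malicious
affected = [t for s in sessions.values() if s["tainted"] for t in s["transactions"]]
print(affected)  # only T2 is listed; VE1's traffic stays clean
```

The design choice this illustrates is that a compromise is confined to one container's transaction set rather than spreading through a shared server's mixed traffic.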


3.2 Design of Normality Model
This container based and session separated web server architecture not only enhances the security performance but also provides us with isolated information flows that are separated in each container session. It allows us to identify the mapping between the web server requests and the subsequent database queries, and to utilize such a mapping model to detect abnormal behaviors on a client session level. In a typical three-tiered web server architecture, the web server receives HTTP requests from clients and then issues SQL queries to the database server to fetch and modify the data. The SQL queries are mostly dependent on the web server request. We wish to model such causal mapping relationships of all legitimate traffic so as to detect abnormal/attack traffic. It is impossible to build such a mapping under a classic three-tier setup, as the web server cannot distinguish sessions from different web clients; the SQL queries are mixed and all come from the same web server.

Figure 3.2: Web server instances running in containers

It is difficult for a database server to determine which SQL queries are the results of which web requests. Even if we knew the application logic of the web server and wanted to build a correct model, it would be impossible to use such a model to detect attacks within a huge amount of concurrent real-time traffic unless we had a mechanism to identify the combination of the HTTP request and the SQL queries that are generated by that request. However, within our


container based web servers, it is a straightforward matter to identify the causal pairs of web requests and resulting SQL queries in a given session. In addition, as traffic can easily be separated by session, it is possible for us to compare and analyze the requests and queries across different sessions. Section 3.4 discusses how to build the mapping model by profiling session traffic. To that end, we put sensors at both sides of the servers. At the web server, our sensors are deployed on the host system and cannot be attacked directly, since only the virtualized containers are exposed to attackers. Our sensors will not be attacked at the database server either, as we assume that the attacker cannot completely take control of the database server. In fact, we assume that our sensors cannot be attacked and can always capture correct traffic information at both ends. Fig. 3.2 displays the locations of our sensors.

After building the mapping model, it can be used to detect abnormal behaviors of user requests. The web requests and the database queries within each session should be associated with the model. If there exists any request or query that violates the normality model within a session, then the session will be considered a possible attack.

3.3 Inferring Mapping Relations
3.3.1 Deterministic Mapping pattern
The deterministic mapping model is one of the general and perfect patterns. That means the web request rm appears in all traffic with the SQL query set Qn. The mapping pattern is then expressed as rm → Qn (Qn ≠ ∅). For any user session in the testing phase with the request rm, the absence of a query set Qn matching the request indicates a possibility of intrusion. Otherwise, if Qn is present in the user session traffic without the corresponding rm, this may also be the sign of an intrusion. In static websites,

this type of mapping covers the majority of cases, because the same results should be returned each time a user visits the same link.

3.3.2 Empty SQL Query Set Pattern
In special cases, the SQL query set may be the empty set, meaning the web request generates no database queries. For example, when a web request for retrieving an image GIF file from the same web server is made, a mapping relationship does not exist because only the web requests are observed. This type of mapping is represented as rm → ∅. During the testing phase, we store these types of web requests together in the query set EQS.

3.3.3 No Matched Request Pattern
In some situations, the web server may periodically submit SQL queries to the database server to conduct some prescheduled tasks, such as database backup. These queries are not generated by any web request, similar to the opposite case of the Empty Query Set mapping pattern. These kinds of SQL queries cannot match any web requests, and we keep these unmatched SQL queries in a set NMR. During the testing phase, any query within the set NMR is considered valid. The size of NMR depends on the web server application logic, but most of the time it is small.

3.3.4 Nondeterministic Mapping
The same web request may result in different SQL query sets based on input parameters or the status of the web page at the time the web request is received. However, these different SQL query sets do not appear randomly, and there exists a candidate pool of query sets (e.g. {Qa, Qb, Qc, ...}). Each time the same type of web request arrives, it always matches up with one (and only one) of the query sets in the pool. The mapping pattern is rm → Qi (Qi ∈ {Qa, Qb, Qc, ...}). Therefore, it is difficult to identify traffic that matches this


pattern. This happens only within dynamic websites, such as blogs or forum sites. Fig. 3.3 illustrates all four mapping patterns.

Figure 3.3: Overall representation of mapping patterns

IV. CONCLUSION
We presented an intrusion detection system that builds models of normal behavior for multitier web applications from both frontend web (HTTP) requests and backend database (SQL) queries. Unlike previous approaches that correlated or summarized alerts generated by independent IDSs, DoubleGuard forms a container based IDS with multiple input streams to produce alerts. We have shown that such correlation of input streams provides a better characterization of the system for anomaly detection, because the intrusion sensor has a more precise normality model that detects a wider range of threats. We achieved this by isolating the flow of information from each web server session with lightweight virtualization.

Acknowledgement
It is a great pleasure to acknowledge those who extended their support and contributed time and energy for the completion of this paper. At the outset, I would like to thank my guide, Prof. M M Naoghare, who served as a sounding board for both contents and programming work. Her valuable and skilful guidance, assessment and suggestions from time to time improved the quality of this work in all respects. I would like to take this opportunity to express my deep sense of gratitude towards her for her invaluable contribution in the completion of this paper. I am also thankful to Prof. S M Rokade, Head of the Computer Engineering Department, for his timely guidance, inspiration


and administrative support, without which my work would not have been completed. I am also thankful to all the staff members of the Computer Engineering Department and the Librarian, Sir Visvesvaraya Institute of Technology College of Engineering, Chincoli, Nasik. I would also like to thank colleagues and friends who helped me directly and indirectly to complete this paper. Lastly, my special thanks to my family members for their support and cooperation during this work.

REFERENCES
[1] Meixing Le, Angelos Stavrou, and Brent ByungHoon Kang, "DoubleGuard: Detecting Intrusions in Multitier Web Applications", IEEE Trans. Dependable and Secure Computing, vol. 9, no. 4, July/Aug. 2012.
[2] SANS, The Top Cyber Security Risks, http://www.sans.org/top-cyber-security-risks/, 2011.
[3] OpenVZ, http://wiki.openvz.org, 2011.
[4] Virtuozzo Containers, http://www.parallels.com/products/pvc45/, 2011.
[5] C. Anley, Advanced SQL Injection in SQL Server Applications, technical report, Next Generation Security Software, Ltd., 2002.
[6] K. Bai, H. Wang, and P. Liu, Towards Database Firewalls, Proc. Ann. IFIP WG 11.3 Working Conf. Data and Applications Security (DBSec '05), 2005.
[7] B. Parno, J.M. McCune, D. Wendlandt, D.G. Andersen, and A. Perrig, CLAMP: Practical Prevention of Large-Scale Data Leaks, Proc. IEEE Symp. Security and Privacy, 2009.
[8] T. Pietraszek and C.V. Berghe, Defending


against Injection Attacks through Context-Sensitive String Evaluation, Proc. Int'l Symp. Recent Advances in Intrusion Detection (RAID '05), 2005.
[9] R. Sekar, An Efficient Black-Box Technique for Defeating Web Application Attacks, Proc. Network and Distributed System Security Symp. (NDSS), 2009.
[10] GreenSQL, http://www.greensql.net/, 2011.
[11] H. Debar, M. Dacier, and A. Wespi, "Towards a Taxonomy of Intrusion-Detection Systems," Computer Networks, vol. 31, no. 9, pp. 805-822, 1999.
[12] T. Verwoerd and R. Hunt, "Intrusion Detection Techniques and Approaches," Computer Comm., vol. 25, no. 15, pp. 1356-1365, 2002.
[13] C. Kruegel and G. Vigna, "Anomaly Detection of Web-Based Attacks," Proc. 10th ACM Conf. Computer and Comm. Security (CCS '03), Oct. 2003.
[14] Y. Huang, A. Stavrou, A.K. Ghosh, and S. Jajodia, "Efficiently Tracking Application Interactions Using Lightweight Virtualization," Proc. First ACM Workshop Virtual Machine Security, 2008.
[15] S. Potter and J. Nieh, "Apiary: Easy-to-Use Desktop Application Fault Containment on Commodity Operating Systems," Proc. USENIX Ann. Technical Conf., 2010.
[16] Linux-VServer, http://linux-vserver.org/, 2011.
[17] A. Seleznyov and S. Puuronen, "Anomaly Intrusion Detection Systems: Handling Temporal Relations between Events," Proc. Int'l Symp. Recent Advances in Intrusion


Detection (RAID '99), 1999.
[18] F. Valeur, G. Vigna, C. Krügel, and R.A. Kemmerer, "A Comprehensive Approach to Intrusion Detection Alert Correlation," IEEE Trans. Dependable and Secure Computing, vol. 1, no. 3, pp. 146-169, July-Sept. 2004.
[19] A. Stavrou, G. Cretu-Ciocarlie, M. Locasto, and S. Stolfo, "Keep Your Friends Close: The Necessity for Updating an Anomaly Sensor with Legitimate Environment Changes," Proc. Second ACM Workshop Security and Artificial Intelligence, 2009.
[20] M. Roesch, "Snort, Intrusion Detection System," http://www.snort.org, 2011.
[21] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna, "Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications," Proc. Int'l Symp. Recent Advances in Intrusion Detection (RAID '07), 2007.


Implementation of Enhanced Security on Vehicular Cloud Computing

Ms. Rajbhoj Supriya K.1, Prof. Pankaj R. Chandre2

Abstract:- In a VC, underutilized vehicular resources including computing power, data storage, and Internet connectivity can be shared or rented out over the Internet to various customers. If the VC concept is to see wide adoption and to have significant societal impact, security and privacy issues need to be addressed. The main contribution is to detect and examine a number of security challenges and potential privacy threats in VCs. Even though security issues have received attention in cloud computing and vehicular networks, we identified security challenges that are special to VCs, e.g., challenges of authentication of high-mobility vehicles, scalability, and the complexity of establishing trust relationships among multiple players caused by intermittent short-range communications. We begin by describing the VC models, i.e. ad-hoc-based models, and demonstrate algorithms to improve the scalability of security schemes and to establish trust relationships among multiple players caused by intermittent short-range communications.

Index Terms— Challenge analysis, cloud computing, privacy, security, vehicular cloud.

I. INTRODUCTION
In an effort to make their vehicles more competitive in the market, vehicle manufacturers are offering increasingly more potent onboard devices, including powerful computers and a large collection of wireless transceivers. These devices serve a set of customers that


expect their vehicles to provide a unified extension of their home environment, populated by refined entertainment centers, access to the Internet, and other similar requirements and needs. Powerful onboard devices support new applications, including location-specific services, online gaming, and various forms of mobile infotainment.

Security and privacy problems need to be addressed if the VC concept is to be widely adopted. Traditional network systems try to prevent attackers from entering a system. In a VC, all the users, including the attackers, are equal. The attackers can be physically located on the same machine. The attackers can utilize system ambiguities to reach their goals, such as obtaining confidential records and interfering with the integrity of information and the availability of resources. Suppose that an accident has taken place, and the accident is reported to the VC. The driver responsible for the accident can compromise the VC and modify the accident record. In the future, when law enforcement or the vehicle insurance company inquires into the accident, they cannot link the accident to the driver who caused it. Superficially, the security issues faced in VCs may look deceptively similar to those experienced in other networks. However, a more careful analysis exposes that many of the classic security challenges are intensified by the characteristic features of VCs, to the point where they can be construed as VC-specific. The main contributions of this work are to recognize and evaluate security challenges and privacy threats that are VC-specific and to propose a reasonable security structure that addresses some of the VC challenges recognized in this paper.

II. RELATED WORK
The security challenges in VCs are a new, interesting topic. Vehicles will be independently shared to create a cloud that can provide services to certified users. This cloud can provide real-time services, such as mobile systematic laboratories, intelligent transportation systems and smart electric power grids. Vehicles will share the


ability of computing power, Internet access, and storage to form conventional clouds. These researchers have only focused on providing a structure for VC computing but, as already said, the problem of privacy and security has not yet been mentioned in the literature. As pointed out by Hasan [8], cloud security has become one of the major obstacles to a widespread adoption of traditional cloud services. Generalizing the recommendations of [8], we expect that the same problems will be present in VCs.

Nowadays, vehicular ad hoc network (VANET) security and privacy have been addressed by a large number of papers. Yan et al [9], [10] proposed active and passive location security algorithms. Public Key Infrastructure (PKI) and digital signature-based methods have been well explored in VANETs [11]. A certificate authority (CA) generates public and private keys for nodes. The use of a digital signature is to validate and authenticate the sender. The main use of encryption is to reveal the content of messages only to entitled users. PKI is a method that is well suited for security purposes, particularly for roadside infrastructure. Geo-encryption in VANETs has been proposed by Yan et al [12]. The idea is to use the geographic location of a movable device to create a secret key. Messages are encrypted with the secret key, and the encoded texts are sent to the receiving movable device. The receiving movable device must be physically present in a certain geographic region specified by the sender to be able to decrypt the message.

In recent times, some attention has been given to the general security problem in clouds, although not associated with vehicular networks [13]. The simple solution is to control access to the cloud hardware facilities. This can minimize risks from insiders [14]. Santos et al [15] proposed a new platform to achieve trust in conventional clouds. A trust coordinator maintained by an external third party is imported to validate the delivered cloud manager, which makes a set of virtual machines (VMs) such as Amazon's EC2 (i.e., Infrastructure as a Service,


IaaS) available to users. Garfinkel et al [16] proposed a solution to prevent the owner of a physical host from retrieving and snooping on the services running on the host. Berger et al [17] and Murray et al [18] adopted a similar solution. When a VM boots up, system information such as the basic input output system (BIOS), system programs, and all the service applications is recorded, and a hash value is generated and transmitted to a third-party Trust Center. Periodically, the system will collect system information on the BIOS, system programs, and all the service applications and transmit the hash value of this system information to the third-party Trust Center. The Trust Center can then evaluate the trust value of the cloud. Krautheim [19] also proposed a third party to share the responsibility of security in cloud computing between the service provider and the client, decreasing the risk exposure to both. Jensen et al [20] stated technical security issues of using cloud services over the Internet. Wang et al [21], [22] proposed a public-key-based homomorphic authenticator and random masking to secure cloud data and preserve the privacy of public cloud data. The bilinear aggregate signature has been extended to simultaneously audit multiple users. Ristenpart et al [23] presented experiments on locating co-residence of other users in cloud VMs.

III. IMPLEMENTATION DETAILS
In previous papers, Prof. Olariu has promoted the vision of vehicular clouds (VCs), a nontrivial extension, along several dimensions, of conventional cloud computing. In a VC, underutilized vehicular resources including computing power, storage, and Internet connectivity can be shared or rented out over the Internet to various customers. The security challenges are addressed from a novel perspective on VANETs. This system first introduces the security and privacy challenges that VC computing networks have to face, and also addresses the possible security solutions. Although some of the solutions can leverage existing security

144 International Journal of Multidisciplinary Educational Research

techniques, there are many unique challenges. For example, attackers can physically locate on the same cloud server. The figure below shows the system block diagram, i.e., the flow of how the trust relationship is generated.

Fig 1: System Block Diagram.

The signing of the safety message can be described as follows. Following the ElGamal signature scheme, the parameters are defined:
1. Generate the global user set.
2. If the current user is in the user set, then:
3. H: a collision-free hash function;
4. p: a large prime number that will ensure that computing discrete logarithms modulo p is very difficult;
5. g (smaller than p): a randomly chosen generator of the multiplicative group of integers modulo p.

Each vehicle has a long-term PKI public/private key pair:
• private key: S;
• public key: (g, p, T), where T = g^S mod p.

It should be noted that a message m can be combined as m|T, where T is the timestamp; the timestamp ensures the freshness of the message. For each message m to be signed, three steps are followed:
1. Generate a per-message public/private key pair of S_m (private) and T_m = g^(S_m) mod p (public).
2. Compute the message digest d_m = H(m|T_m) and the signature X = S_m + d_m * S mod (p - 1), where mod is the modulo operation and | is the concatenation operator.
3. Send m, T_m, and X.

To verify the message, three steps are followed:
1. Compute the message digest d_m = H(m|T_m).
2. Compute Y1 = g^X and Y2 = T_m * T^(d_m).
3. Compare Y1 and Y2. If Y1 = Y2, then the signature is correct. The reason is:
Y1 = g^X = g^(S_m + d_m*S) = g^(S_m) * g^(d_m*S) = T_m * T^(d_m) = Y2.
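The per-message signing and verification steps above can be sketched in a few lines. This is a minimal, hedged illustration: the prime p = 467 and g = 2 are toy placeholders I chose for readability, and SHA-256 stands in for the unspecified collision-free hash H; a real deployment would use a cryptographically large prime.

```python
# Sketch of the per-message ElGamal-style signature described above.
# The prime p and generator g are TOY values assumed for illustration
# only; real use requires a cryptographically large prime.
import hashlib
import secrets

p = 467   # toy prime (assumption, for illustration only)
g = 2     # element of the multiplicative group mod p (assumption)

def H(data: bytes) -> int:
    # Collision-resistant hash mapped into Z_{p-1}.
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (p - 1)

def keygen():
    S = secrets.randbelow(p - 2) + 1   # long-term private key S
    T = pow(g, S, p)                   # long-term public key T = g^S mod p
    return S, T

def sign(m: bytes, S: int):
    Sm = secrets.randbelow(p - 2) + 1  # per-message private key S_m
    Tm = pow(g, Sm, p)                 # per-message public key T_m = g^S_m mod p
    dm = H(m + Tm.to_bytes(4, "big"))  # digest d_m = H(m | T_m)
    X = (Sm + dm * S) % (p - 1)        # signature X = S_m + d_m*S mod (p-1)
    return Tm, X

def verify(m: bytes, Tm: int, X: int, T: int) -> bool:
    dm = H(m + Tm.to_bytes(4, "big"))
    Y1 = pow(g, X, p)                  # Y1 = g^X mod p
    Y2 = (Tm * pow(T, dm, p)) % p      # Y2 = T_m * T^{d_m} mod p
    return Y1 == Y2                    # equal iff the signature is valid
```

Verification works for exactly the algebraic reason given in the text: g^X = g^(S_m) * g^(d_m*S) = T_m * T^(d_m), and reducing X mod (p - 1) is sound because the order of g divides p - 1.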


Fig 2: System Architecture.

The above figure shows the system architecture: the sender sends the data, which is sliced and passed to the mobile devices; it is then desliced again, so that other users are not able to use the data, before being passed to the receiver. Using this mechanism, we maintain the security of VC computing.

IV. RESULT AND CONCLUSION
We have first proposed the security and privacy challenges that VC computing networks have to face, and have also addressed possible security solutions. Even though some of the solutions can leverage existing security techniques, there are many distinctive challenges. For example, attackers can physically locate on the same cloud server. The vehicles have high mobility, and the communication is inherently unstable and intermittent. We have provided a directional security scheme to show an appropriate security architecture that handles several, though not all, of the challenges in VCs. We have investigated this brand-new area and designed solutions for each individual challenge. Many applications are developed on VCs; in the proposed work, a special application will need to analyze and provide security solutions. Extensive work on security and privacy in VCs will become a complex system and needs a systematic and synthetic way to implement intelligent transportation systems.

REFERENCES
[1] G. Yan, D. Wen, S. Olariu, and M. C. Weigle, "Security challenges in vehicular cloud computing", IEEE Trans. Intell. Transp. Syst., vol. 14, no. 1, Mar. 2013.
[2] S. Arif, S. Olariu, J. Wang, G. Yan, W. Yang, and I. Khalil, "Datacenter at the airport: Reasoning about time-dependent parking lot



occupancy", IEEE Trans. Parallel Distrib. Syst., 2012.
[3] S. Olariu, M. Eltoweissy, and M. Younis, "Toward autonomous vehicular clouds", ICST Trans. Mobile Commun. Comput., vol. 11, no. 7–9, pp. 1–11, Jul.–Sep. 2011.
[4] S. Olariu, I. Khalil, and M. Abuelela, "Taking VANET to the clouds", Int. J. Pervasive Comput. Commun., vol. 7, no. 1, pp. 7–21, 2011.
[5] G. Yan and S. Olariu, "A probabilistic analysis of link duration in vehicular ad hoc networks", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1227–1236, Dec. 2011.
[6] D. Huang, S. Misra, G. Xue, and M. Verma, "PACP: An efficient pseudonymous authentication-based conditional privacy protocol for VANETs", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 3, pp. 736–746, Sep. 2011.
[7] J. Li, S. Tang, X. Wang, W. Duan, and F.-Y. Wang, "Growing artificial transportation systems: A rule-based iterative design process", IEEE Trans. Intell. Transp. Syst., vol. 12, no. 2, pp. 322–332, Jun. 2011.
[8] R. Hasan, Cloud Security. [Online]. Available: http://www.ragibhasan.com/research/cloudsec.html
[9] G. Yan, S. Olariu, and M. C. Weigle, "Providing VANET security through active position detection", Comput. Commun., vol. 31, no. 12, pp. 2883–2897, Jul. 2008, Special Issue on Mobility Protocols for ITS/VANET.
[10] G. Yan, S. Olariu, and M. Weigle, "Providing location security in vehicular ad hoc networks", IEEE Wireless Commun., vol. 16, no. 6, pp. 48–55, Dec. 2009.
[11] J. Sun, C. Zhang, Y. Zhang, and Y. M. Fang, "An identity-based security system for user privacy in vehicular ad hoc networks", IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 9, pp. 1227–1239, Sep. 2010.
[12] G. Yan and S. Olariu, "An efficient geographic location-based security mechanism for vehicular ad hoc networks", in Proc. IEEE Int. Symp. TSP, Macau SAR, China, Oct. 2009, pp. 804–809.
[13] A. Friedman and D. West, "Privacy and security in cloud computing", Center for Technology Innovation: Issues in Technology Innovation, no. 3, pp. 1–11, Oct. 2010.
[14] J. A. Blackley, J. Peltier, and T. R. Peltier, Information Security Fundamentals. New York: Auerbach, 2004.



[15] N. Santos, K. P. Gummadi, and R. Rodrigues, "Toward trusted cloud computing", in Proc. HotCloud, Jun. 2009.
[16] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra: A virtual machine-based platform for trusted computing", in Proc. ACM SOSP, 2003, pp. 193–206.
[17] S. Berger, R. Cáceres, K. A. Goldman, R. Perez, R. Sailer, and L. van Doorn, "vTPM: Virtualizing the trusted platform module", in Proc. 15th Conf. USENIX Sec. Symp., Berkeley, CA, 2006, pp. 305–320.
[18] D. G. Murray, G. Milos, and S. Hand, "Improving Xen security through disaggregation", in Proc. 4th ACM SIGPLAN/SIGOPS Int. Conf., New York, 2008, pp. 151–160.
[19] F. J. Krautheim, "Private virtual infrastructure for cloud computing", in Proc. Conf. Hot Topics Cloud Comput., 2009, pp. 1–5.
[20] M. Jensen, J. Schwenk, N. Gruschka, and L. L. Iacono, "On technical security issues in cloud computing", in Proc. IEEE Int. Conf. Cloud Comput., 2009, pp. 109–116.
[21] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-preserving public auditing for data storage security in cloud computing", in Proc. IEEE INFOCOM, San Diego, CA, 2010, pp. 1–9.
[22] Q. Wang, C. Wang, J. Li, K. Ren, and W. Lou, "Enabling public verifiability and data dynamics for storage security in cloud computing", in Proc. 14th ESORICS, 2009, pp. 355–370.
[23] F.-Y. Wang, "Parallel control and management for intelligent transportation systems: Concepts, architectures, and applications", IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 630–638, Sep. 2010.
[24] H. Xie, L. Kulik, and E. Tanin, "Privacy-aware traffic monitoring", IEEE Trans. Intell. Transp. Syst., vol. 11, no. 1, pp. 61–70, Mar. 2010.
[25] L. Li, J. Song, F.-Y. Wang, W. Niehsen, and N. Zheng, "IVS 05: New developments and research trends for intelligent vehicles", IEEE Intell. Syst., vol. 20, no. 4, pp. 10–14, Jul./Aug. 2005.


DOCUMENT CLUSTERING FOR FORENSIC ANALYSIS & INVESTIGATION

Mr. Dhokane R.M.1, 1SVIT, Chincholi, Tal: Sinner, Dist: Nashik. [email protected]

Prof. Rokade S.M.2 2 SVIT, Chincholi, Tal: Sinner, Dist: Nashik. [email protected]

Abstract: In computer forensic analysis, millions of files are usually examined. Much of the data in those files consists of unstructured text, which is difficult for computer examiners to analyze. In this context, automatic methods of analysis are of great interest. In particular, algorithms for clustering documents facilitate the discovery of new and useful knowledge from the documents under analysis. I present an approach that applies document clustering algorithms to the forensic investigation of computers seized in police investigations. I illustrate the proposed approach by carrying out extensive experimentation with six well-known clustering algorithms (K-medoids, K-means, Complete Link, Single Link, Average Link, and CSPA) applied to five real-world datasets obtained from computers seized in real-world investigations. Experiments have been performed with different combinations of parameters, resulting in 16 different instantiations of algorithms. In addition, two relative validity indexes were used to automatically estimate the number of clusters. Related studies in the literature are significantly more limited than our study. Our experiments show that the


Average Link and Complete Link algorithms provide the best results for our application domain. If suitably initialized, partitional algorithms (K-means and K-medoids) can also yield very good results. Finally, I also present and discuss several practical results that can be useful for researchers and practitioners of forensic computing.

Keywords: Clustering, forensic computing, text mining.

I. INTRODUCTION
It is estimated that the volume of data in the digital world increased to about 18 times the amount of information present in all the books ever written, and it continues to grow exponentially. This large amount of data has a direct impact on Computer Forensics, which can be broadly defined as the discipline that combines elements of law and computer science to collect and analyze data from computer systems in a way that is admissible as evidence in a court of law. In our particular application domain, it usually involves examining hundreds of thousands of files per computer. This activity exceeds the expert's ability to analyze and interpret the data. Therefore, methods for automated data analysis, like those widely used for machine learning and data mining, are of paramount importance. In particular, algorithms for pattern recognition from the information present in text documents are promising, as will hopefully become evident later in the paper.

Clustering algorithms are usually used for exploratory data analysis, where there is little or no prior knowledge about the data [2], [3]. This is precisely the case in several applications of Computer Forensics, including the one addressed in our work. From a more technical point of


view, our datasets consist of unlabeled objects: the classes or categories of documents that can be found are a priori unknown. Moreover, even assuming that labeled datasets could be available from previous analyses, there is almost no hope that the same classes (possibly learned earlier by a classifier in a supervised learning setting) would still be valid for the upcoming data, obtained from other computers and associated with different investigation processes. More precisely, it is likely that the new data sample would come from a different population. In this context, the use of clustering algorithms, which are capable of finding latent patterns in the text documents found on seized computers, can enhance the analysis performed by the expert examiner.

The rationale behind clustering algorithms is that objects within a valid cluster are more similar to each other than they are to objects belonging to a different cluster [2], [3]. Thus, once a data partition has been induced from the data, the expert examiner might initially focus on reviewing representative documents from the obtained set of clusters. Then, after this preliminary analysis, (s)he may eventually decide to scrutinize other documents from each cluster. By doing so, one can avoid the hard task of examining all the documents individually, but, even if so desired, it still could be done. In a more practical and realistic scenario, domain experts (e.g., forensic examiners) are scarce and have limited time available for performing examinations. Thus, it is reasonable to assume that, after finding a relevant document, the examiner could prioritize the analysis of other documents belonging to the cluster of interest, because these are likely to be also relevant to the investigation. Such an approach, based on document clustering, can indeed improve the analysis of seized computers, as will be discussed in more detail later.
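The triage workflow just described (review one representative document per cluster before drilling into the rest) can be sketched as below. Choosing the cluster medoid, i.e., the document with the smallest average distance to the rest of its cluster, as the representative is my illustrative assumption, not a choice fixed by the paper.

```python
# Pick one representative document per cluster for examiner triage.
# Using the cluster medoid (minimum summed distance to the other
# members) as the representative is an illustrative assumption.
def representatives(docs, labels, dist):
    reps = {}
    for label in set(labels):
        members = [d for d, l in zip(docs, labels) if l == label]
        # Medoid: the member minimizing its summed distance to the others.
        reps[label] = min(members,
                          key=lambda d: sum(dist(d, o) for o in members))
    return reps
```

With any pairwise distance function (cosine over term vectors, Levenshtein over file names), this gives the examiner one starting document per cluster.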


Clustering algorithms have been studied for decades, and the literature on the subject is huge. Therefore, I decided to choose a set of six representative algorithms in order to show the potential of the proposed approach, namely: the partitional K-means [3] and K-medoids [4], the hierarchical Single/Complete/Average Link [5], and the cluster ensemble algorithm known as CSPA [6]. These algorithms were run with different combinations of their parameters, resulting in sixteen different algorithmic instantiations, as shown in Table I.

TABLE I: SUMMARY OF ALGORITHMS AND THEIR PARAMETERS

Thus, as a contribution of our work, I compare their relative performances in the studied application domain using five real-world investigation cases conducted by the Brazilian Federal Police Department. In order to make the comparative analysis of the algorithms more realistic, two relative validity indexes (the silhouette [4] and its simplified version [7]) have been used to estimate the number of clusters automatically from the data.

It is well known that the number of clusters is a critical parameter of many algorithms and is usually a priori unknown. As far as I know, however, the automatic estimation of the number of clusters has not been investigated in the Computer Forensics literature. Actually, I could not even locate one work that is reasonably close in its application domain and that reports the use of algorithms capable of estimating the number of clusters. Perhaps even more surprising is the lack of studies on hierarchical clustering algorithms, which date back to the sixties. Our study considers such classical algorithms, as well


as recent developments in clustering, such as the use of consensus partitions [6]. The present paper extends our previous work [23], where nine different instantiations of algorithms were analyzed. As previously mentioned, in our current work I employ sixteen instantiations of algorithms. In addition, I provide more insightful quantitative and qualitative analyses of their experimental results in our application domain.

The remainder of this paper is organized as follows. Section II presents the literature survey. Section III briefly addresses the implementation, i.e., the adopted clustering algorithms and preprocessing steps. Section IV reports our experimental results, and Section V addresses some future work of our study. Finally, Section VI concludes the paper.

II. LITERATURE SURVEY
1. The digital world increased from 161 exabytes in 2006 to 988 exabytes in 2010 [1], about 18 times the amount of information present in all the books ever written, and it continues to grow exponentially. This large amount of data has a direct impact on Computer Forensics, which can be broadly defined as the discipline that combines elements of law and computer science to collect and analyze data from computer systems in a way that is admissible as evidence in a court of law.
2. Clustering algorithms are typically used for exploratory data analysis, where there is little or no prior knowledge about the data [2], [3]. This is precisely the case in several applications of Computer Forensics, including the one addressed in our work. From a more technical viewpoint, our datasets consist of unlabeled objects: the classes or categories of documents that can be found are a priori unknown.
3. There are only a few studies reporting the use of clustering


algorithms in the Computer Forensics field. Essentially, most of the studies describe the use of classic algorithms for clustering data, e.g., Expectation-Maximization (EM) for unsupervised learning of Gaussian Mixture Models, K-means, Fuzzy C-means (FCM), and Self-Organizing Maps (SOM). These algorithms have well-known properties and are widely used in practice. For instance, K-means and FCM can be seen as particular cases of EM [21]. Algorithms like SOM [22], in their turn, generally have inductive biases similar to K-means, but are usually less computationally efficient. In [8], SOM-based algorithms were used for clustering files with the aim of making the decision-making process performed by the examiners more efficient. The files were clustered by taking into account their creation dates/times and their extensions. This kind of algorithm has also been used in [9] in order to cluster the results from keyword searches. The underlying assumption is that the clustered results can increase the information retrieval efficiency, because it would not be necessary to review all the documents found by the user anymore. An integrated environment for mining e-mails for forensic analysis, using classification and clustering algorithms, was presented in [10]. In a related application domain, e-mails are grouped by using lexical, syntactic, structural, and domain-specific features [11]; three clustering algorithms (K-means, Bisecting K-means, and EM) were used. The problem of clustering e-mails for forensic analysis was also addressed in [12], where a kernel-based variant of K-means was applied. The obtained results were analyzed subjectively, and the authors concluded that they are interesting and useful from an investigation perspective.
4. More recently [13], an FCM-based method for mining association rules from forensic data was described. The literature on Computer Forensics only reports the use of algorithms that assume that the number of clusters is known and fixed a priori by the user. Aimed at relaxing this assumption, which


is often unrealistic in practical applications, a common approach in other domains involves estimating the number of clusters from the data. Essentially, one induces different data partitions (with different numbers of clusters) and then assesses them with a relative validity index in order to estimate the best value for the number of clusters [2], [3], [14]. This work makes use of such methods, thus potentially facilitating the work of the expert examiner, who in practice would hardly know the number of clusters a priori.

III. IMPLEMENTATION DETAILS
A. Pre-Processing Steps
Before running clustering algorithms on the text datasets, I performed some preprocessing steps. In particular, stop words (prepositions, pronouns, articles, and irrelevant document metadata) have been removed. Also, the Snowball stemming algorithm for Portuguese words has been used. Then, I adopted a traditional statistical approach for text mining, in which documents are represented in a vector space model [15]. In this model, each document is represented by a vector containing the frequencies of occurrences of words, which are defined as delimited alphabetic strings whose number of characters is between 4 and 25. I also used a dimensionality reduction technique known as Term Variance (TV) [16] that can increase both the effectiveness and the efficiency of clustering algorithms. TV selects a number of attributes (in our case 100 words) that have the greatest variances over the documents. In order to compute distances between documents, two measures have been used, namely: the cosine-based distance [15] and the Levenshtein-based distance [17]. The latter has been used to calculate distances between file (document) names only.
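The preprocessing pipeline of Section III-A can be sketched as follows. The helper names, the plain-Python style, and the omission of stop-word removal and Snowball stemming are my simplifying assumptions; this is an illustration of term-frequency vectors, TV attribute selection, cosine-based distance, and Levenshtein distance, not the authors' code.

```python
# Illustrative sketch of the preprocessing steps above: term-frequency
# vectors over 4-25 character alphabetic words, Term Variance (TV)
# attribute selection, cosine-based distance between documents, and
# Levenshtein distance between file names. Names are assumptions.
import math
import re
from collections import Counter

def tokenize(text):
    # Words are delimited alphabetic strings of 4 to 25 characters.
    return re.findall(r"[a-z]{4,25}", text.lower())

def term_vectors(docs):
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def top_variance_terms(vocab, vectors, k):
    # TV keeps the k attributes with the greatest variance over the docs.
    n = len(vectors)
    scored = []
    for j, term in enumerate(vocab):
        col = [v[j] for v in vectors]
        mean = sum(col) / n
        scored.append((sum((x - mean) ** 2 for x in col) / n, term))
    return [t for _, t in sorted(scored, reverse=True)[:k]]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def levenshtein(s, t):
    # Edit distance, used here only to compare file (document) names.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]
```

In the paper's setting, `top_variance_terms` would keep the 100 highest-variance words, `cosine_distance` would compare the resulting document vectors, and `levenshtein` would compare file names.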



B. Estimating the Number of Clusters from Data
In order to estimate the number of clusters, a widely used approach consists of obtaining a set of data partitions with different numbers of clusters and then selecting the particular partition that provides the best result according to a specific quality criterion (e.g., a relative validity index [2]-[5]). Such a set of partitions may result directly from a hierarchical clustering dendrogram or, alternatively, from multiple runs of a partitional algorithm (e.g., K-means) starting from different numbers and initial positions of the cluster prototypes (e.g., see [14] and references therein).

For the moment, let us assume that a set of data partitions with different numbers of clusters is available, from which I want to choose the best one according to some relative validity criterion. Note that, by choosing such a data partition, I am performing model selection and, as an intrinsic part of this process, I am also estimating the number of clusters.

A widely used relative validity index is the so-called silhouette [4], which has also been adopted as a component of the algorithms employed in our work. Therefore, it is helpful to define it even before I address the clustering algorithms used in our study. Let us consider an object i belonging to cluster A. The average dissimilarity of i to all other objects of A is denoted by a(i). Now let us take into account a cluster C. The average dissimilarity of i to all objects of C will be called d(i,C). After computing d(i,C) for all clusters C ≠ A, the smallest one is selected, i.e., b(i) = min d(i,C), C ≠ A. This value represents the dissimilarity of i to its neighbor cluster, and the silhouette for a given object, s(i), is given by:

s(i) = (b(i) - a(i)) / max{a(i), b(i)}    (1)

It can be verified that -1 ≤ s(i) ≤ +1. Thus, the higher s(i), the better the assignment of object i to


a given cluster. In addition, if s(i) is equal to zero, then it is not clear whether the object should have been assigned to its current cluster or to a neighboring one [4]. Finally, if cluster A is a singleton, then s(i) is not defined, and the most neutral choice is to set s(i) = 0 [4]. Once I have computed s(i) over i = 1, 2, ..., N, where N is the number of objects in the dataset, I take the average over these values; the resulting value is then a quantitative measure of the data partition in hand. Thus, the best clustering corresponds to the data partition that has the maximum average silhouette.

The average silhouette just addressed depends on the computation of all distances among all objects. In order to come up with a more computationally efficient criterion, called the simplified silhouette, one can compute only the distances between the objects and the centroids of the clusters. The term a(i) of (1) now corresponds to the dissimilarity of object i to its corresponding cluster (A) centroid. Thus, it is necessary to compute only one distance to get the a(i) value, instead of calculating all the distances between i and the other objects of A. Similarly, instead of computing d(i,C) as the average dissimilarity of i to all objects of C, C ≠ A, I can now compute the distances between i and the centroid of C.

Note that the computation of the original silhouette [4], as well as of its simplified version [7], [14], depends only on the achieved partition and not on the adopted clustering algorithm. Thus, these silhouettes can be applied to assess partitions (taking into account the number of clusters) obtained by several clustering algorithms, such as the ones employed in our study and addressed in the sequel.

C. Clustering Algorithms
The clustering algorithms adopted in our study, the partitional K-means [2] and K-medoids [4], the hierarchical Single/Complete/Average Link [5], and the cluster ensemble


based algorithm known as CSPA [6], are popular in the machine learning and data mining fields, and therefore they have been used in our study. Nevertheless, some of our choices regarding their use deserve further comments. For instance, K-medoids [4] is similar to K-means. However, instead of computing centroids, it uses medoids, which are the representative objects of the clusters. This property makes it particularly interesting for applications in which (i) centroids cannot be computed; and (ii) distances between pairs of objects are available, as when computing dissimilarities between names of documents with the Levenshtein distance [17].

Considering the partitional algorithms, it is widely known that both K-means and K-medoids are sensitive to initialization and usually converge to solutions that represent local minima. Trying to minimize these problems, I used a nonrandom initialization in which objects distant from each other are chosen as starting prototypes [18]. Unlike partitional algorithms such as K-means/medoids, hierarchical algorithms such as Single/Complete/Average Link provide a hierarchical set of nested partitions [3], usually represented in the form of a dendrogram, from which the best number of clusters can be estimated. In particular, one can assess the quality of every partition represented by the dendrogram, subsequently choosing the one that provides the best results [14].

The CSPA algorithm [6] essentially finds a consensus clustering from a cluster ensemble formed by a set of different data partitions. More precisely, after applying clustering algorithms to the data, a similarity (co-association) matrix [19] is computed. Each element of this matrix represents pair-wise similarities between objects. The similarity between two objects is simply the fraction of the clustering solutions in


which those two objects lie in the same cluster. Later, this similarity measure is used by a clustering algorithm that can process a proximity matrix, e.g., K-medoids, to produce the final consensus clustering. The sets of data partitions (clusterings) were generated in two different ways: (a) by running K-means 100 times with different subsets of attributes (in this case CSPA processes 100 data partitions); and (b) by using only two data partitions, namely: one obtained by K-medoids from the dissimilarities between the file names, and another partition achieved with K-means from the vector space model. In this case, each partition can have a different weight; the weights have been varied between 0 and 1 (in increments of 0.1 and keeping their sum equal to 1).

For the hierarchical algorithms (Single/Complete/Average Link), I simply run them and then assess every partition from the resulting dendrogram by means of the silhouette [4]. Then, the best partition (elected according to the relative validity index) is taken as the result of the clustering process. For each partitional algorithm (K-means/medoids), I execute it repeatedly for an increasing number of clusters. For each value of K, a number of partitions achieved from different initializations are assessed in order to choose the best value of K and its corresponding data partition, using the silhouette [4] and its simplified version [7], which showed good results in [14] and is more computationally efficient. In our experiments, I assessed all possible values of K in the interval [2, N], where N is the number of objects to be clustered.

D. Dealing with Outliers
I assess a simple approach to remove outliers. This approach makes recursive use of the silhouette. Fundamentally, if the best partition chosen by the silhouette has singletons (i.e., clusters formed by a single object only), these are removed. Then, the clustering process is repeated over and over again until a


partition without singletons is found. At the end of the process, all singletons are incorporated into the resulting data partition (for evaluation purposes) as single clusters. Table I summarizes the clustering algorithms used in our work and their main characteristics.

E. Experimental Evaluation
1. Datasets:
Sets of documents that appear in computer forensic analysis applications are quite diversified. In particular, any kind of content that is digitally compliant can be subject to investigation. In the datasets assessed in our study, for instance, there are textual documents written in different languages (Portuguese and English). Such documents have been originally created in different file formats, and some of them have been corrupted or are actually incomplete, in the sense that they have been (partially) recovered from deleted data. I used five datasets obtained from real-world investigation cases conducted by the Brazilian Federal Police Department. Each dataset was obtained from a different hard drive, selecting all the non-duplicate documents with extensions "doc", "docx", and "odt". Subsequently, those documents were converted into plain text format and preprocessed as described in Section III-A. The obtained data partitions were evaluated by taking into account that I have a reference partition (ground truth) for every dataset. Such reference partitions have been provided by an expert examiner from the Brazilian Federal Police Department, who previously inspected every document from our collections. The datasets contain varying amounts of documents (N), groups (K), attributes (D), singletons (S), and numbers of documents per group (#), as reported in Table II.
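The silhouette of (1) in Section III-B, and the model-selection step that picks the partition with the highest average silhouette, can be sketched as below. The 1-D points with absolute-difference distance are an illustrative assumption; the paper itself uses cosine and Levenshtein dissimilarities.

```python
# Sketch of the silhouette s(i) = (b(i) - a(i)) / max{a(i), b(i)} and
# of choosing the partition with the maximum average silhouette.
# The 1-D distance below is an assumption made for illustration.
def dist(x, y):
    return abs(x - y)

def avg_silhouette(points, labels):
    n = len(points)
    total = 0.0
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster: the neutral choice s(i) = 0
        # a(i): average dissimilarity to the other members of i's cluster.
        a = sum(dist(points[i], points[j]) for j in same) / len(same)
        # b(i): smallest average dissimilarity d(i, C) over clusters C != A.
        b = min(
            sum(dist(points[i], points[j])
                for j in range(n) if labels[j] == c) /
            sum(1 for j in range(n) if labels[j] == c)
            for c in set(labels) - {labels[i]}
        )
        total += (b - a) / max(a, b)
    return total / n

def best_partition(points, candidate_labelings):
    # Model selection: keep the labeling with the maximum average silhouette.
    return max(candidate_labelings,
               key=lambda labels: avg_silhouette(points, labels))
```

The simplified silhouette of Section III-B would replace the two averages with single distances to cluster centroids; the outlier-removal loop of Section III-D repeatedly reapplies `best_partition` after dropping singleton clusters.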


TABLE II: DATASET CHARACTERISTICS

2. Evaluation Measure
From a scientific perspective, the use of reference partitions for evaluating data clustering algorithms is considered a principled approach. In controlled experimental settings, reference partitions are usually obtained from data generated synthetically according to some probability distributions. From a practical standpoint, reference partitions are usually obtained in a different way, but they are still employed to choose a particular clustering algorithm that is more appropriate for a given application, or to calibrate its parameters. In our case, reference partitions were constructed by a domain expert and reflect the expectations that (s)he has about the clusters that should be found in the datasets. In this sense, the evaluation method that I used to assess the obtained data partitions is based on the Adjusted Rand Index (ARI) [3], [20], which measures the agreement between a partition P, obtained from running a clustering algorithm, and the reference partition R given by the expert examiner. More specifically, ARI ∈ [0, 1], and the greater its value, the better the agreement between P and R.

F. Data Flow Diagram
(Diagram not reproduced.)
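The ARI used for evaluation can be computed from the standard pair-counting contingency-table formulation, sketched below. One hedge: the textbook ARI can dip slightly below zero for worse-than-chance agreement, so the [0, 1] range stated above describes the typical case rather than a hard bound.

```python
# Sketch of the Adjusted Rand Index (ARI) comparing a computed
# partition p against the expert's reference partition r, via the
# standard pair-counting contingency-table formula.
from collections import Counter
from math import comb

def adjusted_rand_index(p, r):
    n = len(p)
    contingency = Counter(zip(p, r))                      # joint cluster counts
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in Counter(p).values())  # pairs within p
    sum_b = sum(comb(c, 2) for c in Counter(r).values())  # pairs within r
    expected = sum_a * sum_b / comb(n, 2)                 # chance agreement
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_ij - expected) / (max_index - expected)
```

ARI is invariant to cluster relabeling: only which objects are grouped together matters, which is exactly what comparing P against the examiner's R requires.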



IV. RESULTS AND DISCUSSION
Table III summarizes the obtained ARI results for the algorithms listed in Table I. In general, AL100 (the Average Link algorithm using the 100 terms with the greatest variances, cosine-based similarity, and the silhouette criterion) provided the best results with respect to both the average and the standard deviation, thus suggesting great accuracy and stability. Note also that an ARI value close to 1.00 indicates that the respective partition is very consistent with the reference partition; this is precisely the case here. In this table, I only report the best results obtained for the algorithms that search for a consensus partition between file name and content (NC100 and NC), i.e., partitions whose weights for name/content resulted in the greatest ARI value.

TABLE III
ADJUSTED RAND INDEX (ARI) RESULTS

The ARI values for CL100 are similar to those found by AL100. Single Link (SL100), in its turn, presented worse results than its hierarchical counterparts, especially for datasets A and B. This result can be explained by the presence of outliers, whose chain effect is known to impact Single Link performance [2]. The results achieved by Kmd100* and KmsT100* were also very good and competitive with the best hierarchical algorithms (AL100 and CL100). I note that, as expected, a

Fig. 1. Silhouette for Kms100, Kms100*, and AL100 (Dataset E).


good initialization procedure, such as the one described in [18], provides the best results. In particular, initialization on distant objects can minimize the K-means/medoids problems with respect to local minima. To illustrate this aspect, Fig. 1 shows the silhouette values as a function of K for three algorithms (Kms100, Kms100*, and AL100). One can observe that Kms100 (with random initialization of prototypes) presents more local maxima for the silhouette (recall that these were obtained from local minima of K-means), yielding less stable results. Conversely, Kms100* (initialization on distant objects [18]) has fewer local maxima, being more stable. This trend has also been observed in the other datasets. Surprisingly, Kms100* has curves similar to those of AL100, especially for higher values of K. This fact can be explained, in part, by the observation that both algorithms tend to separate outliers. It can also be observed that Kms100* got slightly better results than its variant with random initialization of prototypes (Kms100).

Fig. 2. Silhouette for Kms100 (Dataset D).

However, the ARI values of the variants that use the simplified silhouette to estimate K are worse than those obtained for KmsT100*, Kmd100*, and the hierarchical algorithms. Recall that these algorithms use the traditional version of the silhouette. This observation suggests that the simplified silhouette provided worse estimates for the values of K. This can be explained by the fact that the simplified silhouette essentially computes distances between objects and centroids, whereas the traditional silhouette takes all the pairwise distances between objects into account when estimating K.
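The contrast between the two criteria can be made concrete. The sketch below computes both for a single data partition; the four toy points, the Euclidean metric, and the plain mean over objects are illustrative assumptions. On well-separated clusters the simplified value comes out higher, as the text argues:

```python
def silhouettes(X, labels, dist):
    """Average traditional and simplified silhouette of one partition.
    The traditional criterion uses all pairwise distances; the
    simplified one replaces cluster averages by centroid distances."""
    clusters = {c: [i for i, l in enumerate(labels) if l == c]
                for c in set(labels)}
    centroids = {c: [sum(X[i][d] for i in idx) / len(idx)
                     for d in range(len(X[0]))]
                 for c, idx in clusters.items()}
    trad, simp = [], []
    for i, c in enumerate(labels):
        own = [j for j in clusters[c] if j != i]
        # a: compactness (intra-cluster), b: separability (nearest other)
        a_t = sum(dist(X[i], X[j]) for j in own) / max(len(own), 1)
        b_t = min(sum(dist(X[i], X[j]) for j in idx) / len(idx)
                  for cc, idx in clusters.items() if cc != c)
        trad.append((b_t - a_t) / max(a_t, b_t))
        a_s = dist(X[i], centroids[c])
        b_s = min(dist(X[i], centroids[cc]) for cc in clusters if cc != c)
        simp.append((b_s - a_s) / max(a_s, b_s))
    return sum(trad) / len(trad), sum(simp) / len(simp)

euclid = lambda u, v: sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
traditional, simplified = silhouettes(X, [0, 0, 1, 1], euclid)
```

Estimating K then amounts to running the clustering over a range of K values and keeping the partition with the highest criterion value; the simplified variant replaces the quadratic pairwise pass by a single pass over the centroids.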


In other words, there is some information loss in the computation of the simplified silhouette, which indeed provided smaller values for K than both the silhouette and the reference partition in four out of five datasets. I also point out that both the silhouette and its simplified version estimate the number of clusters by taking into account two concepts: cluster compactness (average intra-cluster dissimilarity) and cluster separability (inter-cluster dissimilarity). Both are materialized by computing average distances. I observed that the average distance from a given object to all the objects of a cluster tends to be greater than the distance of that object to the cluster's centroid. Moreover, such a difference tends to be greater when computing intra-cluster distances (compactness) than when calculating inter-cluster distances (separability). In other words, it is more likely that the simplified silhouette underestimates the intra-cluster distances. As a consequence, in the trade-off relationship between cluster compactness and cluster separability, the simplified silhouette value tends to be higher than the one found by the silhouette for a given data partition. Also, the higher the number of clusters, the better the simplified silhouette estimate for the compactness, approaching the value estimated by the silhouette and becoming equal in the extreme case (K = N), as shown in Fig. 2 (Dataset D). Similar figures have been observed for the other datasets. Thus, the simplified silhouette favors lower values for K than the silhouette.


The use of the file names to compute dissimilarities between documents seemed, in principle, to be interesting, because one usually chooses meaningful file names. However, it turns out that, empirically, the use of only the file names to compute the dissimilarity between documents did not bring good results in general, e.g., see the results for KmdLev and KmdLevS in Table III. On the other hand, these results may not be considered surprising, because the name of a file provides less information than its content. However, there is an exception to this general behavior that can be observed from the relatively good results of these algorithms on Dataset D, thus suggesting that the file name, although less informative than the file content, can help the clustering process. In addition, let us now analyze Fig. 3 (for Dataset C), which shows the ARI values for the algorithms that search for a consensus partition between those formed by name and content.

Fig. 3. ARI values for the NC and NC100 algorithms (Dataset C).

The ARI values are shown for different weights that capture the (imposed) relative importance of name and content. Some particular combinations of values for the weights, like when the content weight lies between 0.1 and 0.5, provided worse results than the standalone use of either name or content, thus suggesting that mixing information of different natures may be prejudicial to the clustering process. However, one can see that NC100 shows a (secondary) peak of the ARI value (for a content weight equal to 0.6). Although the primary peak suggests using only the data partition found from content information, it seems that adding information about the file name may indeed be useful. To wrap up this discussion, the use of the file name can help the clustering process, but it seems to be highly data dependent.
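A sketch of the name/content combination discussed above: the Levenshtein edit distance [17] supplies a name-based dissimilarity, and a convex combination mixes it with a content-based one. The two file names and the content dissimilarity of 0.9 below are made-up illustrative values, not measurements from the paper's datasets:

```python
def levenshtein(a, b):
    """Edit distance between two strings (file names) [17]."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[-1] + 1,                  # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def combined_dissimilarity(name_d, content_d, w_content):
    """Convex combination; w_content is the (imposed) content weight."""
    return (1 - w_content) * name_d + w_content * content_d

# similar names, (hypothetically) rather different contents
name_d = levenshtein("report_v1.doc", "report_v2.doc") / len("report_v1.doc")
mixed = combined_dissimilarity(name_d, 0.9, w_content=0.6)
```

Sweeping w_content from 0 to 1 reproduces the kind of curve plotted in Fig. 3, with the pure-name and pure-content partitions at the endpoints.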


Let us now focus on the performance of E100. Recall from Table I that this is a particular instantiation of the CSPA [6] algorithm. More specifically, it obtains a consensus partition from a set of partitions generated by K-means by randomly selecting 100 different attributes. Because of both the high sparsity of the data matrix (common to text datasets) and the random attribute selection, many documents are represented by very sparse vectors. Consequently, such documents have been grouped into a single cluster, whose centroid is also very sparse and has component values close to zero. Such centroids induce misleading partitions, which are the inputs for the computation of the co-association matrix. Thus, a misleading consensus clustering is obtained. Therefore, the choice of random sets of attributes to generate partitions for consensus clustering algorithms seems to be an inappropriate approach for such text data.

TABLE IV
EXAMPLE OF THE INFORMATION FOUND IN THE CLUSTERS

Considering the algorithms that recursively apply the silhouette for removing singletons (KmsS and Kms100S), Table III shows that their results are relatively worse when compared to the related versions that do not remove singletons (Kms and Kms100). However, KmdLevS, which is based on the similarities between file names, presented results similar to those found by its related version that does not remove singletons (KmdLev). In principle, one could expect that the removal of outliers, identified by carefully analyzing the singletons, could yield better clustering results. However, I have not observed this potentially good property in our experiments and, as expected, this aspect is rather data-dependent.
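The co-association computation named above can be sketched directly: each matrix entry is the fraction of base partitions that place a pair of documents in the same cluster. The three base partitions below are invented for illustration (e.g., K-means runs over different random attribute subsets):

```python
def co_association(partitions):
    """Pairwise evidence-accumulation matrix over base partitions."""
    n, m = len(partitions[0]), len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m
             for j in range(n)] for i in range(n)]

base = [[0, 0, 0, 1, 1],   # five documents, three base partitions
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1]]
C = co_association(base)   # C[0][1] = 1.0, C[2][3] = 1/3
```

A consensus partition is then obtained by clustering 1 - C as a dissimilarity matrix; with the very sparse vectors described above, many rows of C become near-identical, which is exactly how the misleading consensus arises.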


As such, these algorithms may still be potentially helpful for dealing with other datasets. As far as the adopted dimensionality reduction technique is concerned, namely Term Variance (TV) [16], I observed that selecting the 100 attributes (words) with the greatest variance over the documents provided better results than using all the attributes in three out of five datasets (see Table III). Compared to Kms and KmsS, the worse results obtained from feature selection by Kms100 and Kms100S, especially on dataset D, are likely due to k-means converging to local optima from bad initialization. Considering all the results obtained from feature selection, I believe that it should be studied further, mainly because of the potentially advantageous computational efficiency gains. Finally, from a practical viewpoint, a variety of relevant findings emerged from our study. It is worth stressing that, for all the investigated datasets, the best data partitions are formed by clusters containing either relevant or irrelevant documents. For example, in dataset C, the algorithm AL100 obtained a data partition formed by some singletons and by 15 other clusters (C1, C2, C3, ..., C15), whose information is listed in Table IV.

Fig. 4. Dendrogram obtained by AL100 (Dataset A).

For obvious reasons, I cannot reveal more detailed, confidential information. However, I can mention that, in this real-world financial crime investigation, clusters of relevant documents have been obtained, such as C10, C14, and C2, which contain financial or exchange transaction information. Also, other obtained


clusters have only irrelevant documents, e.g., C12, which contains label designs. These clusters of either relevant or irrelevant documents can help computer examiners to focus efficiently on the most relevant documents, without having to inspect them all. In summary, document clustering has great potential to be useful for computer inspection. As a final remark, a desirable feature of hierarchical algorithms that makes them particularly interesting for expert examiners is the summarized view of the dataset in the form of a dendrogram, which is a tree diagram that illustrates the arrangement of the clusters. The root node of the dendrogram represents the whole data set (as a single cluster formed by all objects), and each leaf node represents a particular object (as a singleton cluster). The intermediate nodes, in their turn, represent clusters merged hierarchically. The height of an intermediate node is proportional to the distance between the clusters it merges. This representation provides very informative descriptions and visualizations of the potential data clustering structures [5], thus being a helpful tool for forensic examiners who analyze textual documents from seized computers. As I already discussed, the ultimate clustering results can be obtained by cutting the dendrogram at different levels, e.g., by using a relative validity criterion like the silhouette. For the sake of illustration, Figs. 4-8 show examples of dendrograms obtained by AL100. The dendrograms were cut horizontally at the height corresponding to the number of clusters estimated by the silhouette (dashed line). Roughly speaking, subtrees with low height and large width represent clusters that are both cohesive and large. These clusters are good candidates for a starting point of the inspection. Moreover, after finding a cluster of relevant documents, the forensic examiner can inspect the cluster most similar to the one just found, because it is likely that it is also a relevant cluster.
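The dendrogram-and-cut idea can be sketched with a naive average-link agglomeration over a dissimilarity matrix: stopping the merging when k clusters remain corresponds to one horizontal cut at the level the validity criterion selects. The 4 x 4 matrix below is a toy example:

```python
def average_link_cut(D, k):
    """Agglomerate by smallest average inter-cluster dissimilarity
    until k clusters remain (i.e., cut the dendrogram at k)."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = (sum(D[i][j] for i in clusters[a] for j in clusters[b])
                     / (len(clusters[a]) * len(clusters[b])))
                if best is None or d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)   # merge the closest pair
    return clusters

D = [[0, 1, 6, 6],
     [1, 0, 6, 6],
     [6, 6, 0, 2],
     [6, 6, 2, 0]]
cut = average_link_cut(D, 2)   # two tight groups: {0, 1} and {2, 3}
```

Recording the merge order and heights, instead of discarding them, yields the full dendrogram, which the examiner can re-cut at other levels without re-clustering.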


This can be done by taking advantage of the tree diagram.

Fig. 7. Dendrogram obtained by AL100 (Dataset D).

V. FUTURE SCOPE
It is well known that the success of any clustering algorithm is data dependent, but for the assessed datasets some of our adaptations of existing algorithms have shown themselves to be good enough. Scalability may be an issue, however. In order to deal

Fig. 5. Dendrogram obtained by AL100 (Dataset B).

Fig. 8. Dendrogram obtained by AL100 (Dataset E).

with this issue, a number of sampling and other techniques can be used, e.g., see [24]-[29]. Also, algorithms such as bisecting k-means and related approaches can be used. These algorithms can also induce dendrograms, and they have an inductive bias similar to that of the hierarchical methods considered in our work. More precisely, and aiming at circumventing computational

Fig. 6. Dendrogram obtained by AL100 (Dataset C).


difficulties, partitional clustering algorithms can be used to compute a hierarchical clustering solution through repeated cluster bisectioning approaches. For instance, bisecting k-means has relatively low computational requirements, i.e., it is O(N·log N), versus the overall time complexity of O(N²·log N) for agglomerative methods. Since the inductive biases of bisecting k-means and the hierarchical algorithms used in our work are similar, I believe that, if the number of documents is prohibitively high for running agglomerative algorithms, then bisecting k-means and related approaches can be used. Considering the computational cost of estimating the number of clusters, the silhouette proposed in [4] depends on the computation of all distances between objects, leading to an estimated computational cost of O(N²·D), where N is the number of objects in the dataset and D is the number of attributes. As already mentioned in the paper, to alleviate this potential difficulty, especially when dealing with very large datasets, a simplified silhouette [7] can be used. The simplified silhouette is based on the computation of distances between objects and cluster centroids, thus making it possible to reduce the computational cost from O(N²·D) to O(k·N·D), where k, the number of clusters, is usually significantly smaller than N. It is also worth mentioning that there are several different relative validity criteria that can be used in place of the silhouette adopted in our work. As discussed in [14], such criteria are endowed with particular features that may make each of them outperform the others in specific classes of problems. They also present different computational requirements. In this context, in practice one can try different criteria to estimate the number of clusters, taking into account both the quality of the obtained data partitions and the associated computational cost.
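A minimal bisecting k-means sketch under stated assumptions (toy 2-D points, a fixed seed, squared-Euclidean assignment, and a short fixed-iteration inner loop): the largest cluster is bisected until k clusters remain, and the sequence of splits induces a dendrogram top-down.

```python
import random

def two_means(points, iters=10, seed=0):
    """One bisection step: a short 2-means run on a single cluster."""
    c = random.Random(seed).sample(points, 2)
    halves = ([], list(points))
    for _ in range(iters):
        new = ([], [])
        for p in points:
            d0 = sum((x - y) ** 2 for x, y in zip(p, c[0]))
            d1 = sum((x - y) ** 2 for x, y in zip(p, c[1]))
            new[d1 < d0].append(p)
        if not new[0] or not new[1]:
            break                      # degenerate split: keep previous
        halves = new
        c = [[sum(col) / len(h) for col in zip(*h)] for h in halves]
    return halves

def bisecting_kmeans(points, k):
    """Bisect the largest cluster until k clusters remain."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)
        left, right = two_means(clusters.pop(0))
        clusters += [left, right]
    return clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 20)]
parts = bisecting_kmeans(pts, 3)
```

Since each step runs 2-means on only one cluster, the overall work stays far below the all-pairs cost of the agglomerative methods, in line with the complexity argument above.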


Finally, as a cautionary note, I would like to mention that, in practice, it is not always of paramount importance to have scalable methods. In our particular application scenario, there are no hard time constraints on getting data partitions (like those present when analyzing streaming data with online algorithms). Instead, domain experts can usually spend months analyzing their data before reaching a conclusion.

VI. CONCLUSION
I presented an approach that applies document clustering methods to the forensic analysis of computers seized in police investigations. Also, I reported and discussed several practical results that can be very useful for researchers and practitioners of forensic computing. More specifically, in our experiments the hierarchical algorithms known as Average Link and Complete Link presented the best results. Despite their usually high computational costs, I have shown that they are particularly suitable for the studied application domain, because the dendrograms that they provide offer summarized views of the documents being inspected, thus being helpful tools for forensic examiners who analyze textual documents from seized computers. As already observed in other application domains, dendrograms provide very informative descriptions and visualization capabilities for data clustering structures [5]. The partitional K-means and K-medoids algorithms also achieved good results when properly initialized. Considering the approaches for estimating the number of clusters, the relative validity criterion known as the silhouette has shown itself to be more accurate than its (more computationally efficient) simplified version. In addition, some of our results suggest that using the file names along with the document content information may be useful for cluster ensemble algorithms. Most importantly,


I observed that clustering algorithms indeed tend to induce clusters formed by either relevant or irrelevant documents, thus contributing to enhancing the expert examiner's job. Furthermore, our evaluation of the proposed approach on five real-world applications shows that it has the potential to speed up the computer inspection process. Aiming to further leverage the use of data clustering algorithms in similar applications, a promising avenue for future work involves investigating automatic approaches for cluster labeling. The assignment of labels to clusters may enable the expert examiner to identify the semantic content of each cluster more quickly, eventually even before examining their contents. Finally, the study of algorithms that induce overlapping partitions (e.g., Fuzzy C-Means and Expectation-Maximization for Gaussian Mixture Models) is worthy of investigation.

References
[1] J. F. Gantz, D. Reinsel, C. Chute, W. Schlichting, J. McArthur, S. Minton, I. Xheneti, A. Toncheva, and A. Manfrediz, "The expanding digital universe: A forecast of worldwide information growth through 2010," Inf. Data, vol. 1, pp. 1–21, 2007.
[2] B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis. London, U.K.: Arnold, 2001.
[3] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[4] L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Hoboken, NJ: Wiley-Interscience, 1990.
[5] R. Xu and D. C. Wunsch, II, Clustering. Hoboken, NJ: Wiley/IEEE Press, 2009.
[6] A. Strehl and J. Ghosh, "Cluster ensembles: A knowledge reuse framework for combining multiple partitions," J. Mach. Learning Res., vol. 3, pp. 583–617, 2002.
[7] E. R. Hruschka, R. J. G. B. Campello, and L. N. de Castro, "Evolving clusters in gene-expression data," Inf. Sci., vol. 176, pp. 1898–1927, 2006.
[8] B. K. L. Fei, J. H. P. Eloff, H. S. Venter, and M. S. Oliver,



"Exploring forensic data with self-organizing maps," in Proc. IFIP Int. Conf. Digital Forensics, 2005, pp. 113–123.
[9] N. L. Beebe and J. G. Clark, "Digital forensic text string searching: Improving information retrieval effectiveness by thematically clustering search results," Digital Investigation, Elsevier, vol. 4, no. 1, pp. 49–54, 2007.
[10] R. Hadjidj, M. Debbabi, H. Lounis, F. Iqbal, A. Szporer, and D. Benredjem, "Towards an integrated e-mail forensic analysis framework," Digital Investigation, Elsevier, vol. 5, no. 3–4, pp. 124–137, 2009.
[11] F. Iqbal, H. Binsalleeh, B. C. M. Fung, and M. Debbabi, "Mining writeprints from anonymous e-mails for forensic investigation," Digital Investigation, Elsevier, vol. 7, no. 1–2, pp. 56–64, 2010.
[12] S. Decherchi, S. Tacconi, J. Redi, A. Leoncini, F. Sangiacomo, and R. Zunino, "Text clustering for digital forensics analysis," Computat. Intell. Security Inf. Syst., vol. 63, pp. 29–36, 2009.
[13] K. Stoffel, P. Cotofrei, and D. Han, "Fuzzy methods for forensic data analysis," in Proc. IEEE Int. Conf. Soft Computing and Pattern Recognition, 2010, pp. 23–28.
[14] L. Vendramin, R. J. G. B. Campello, and E. R. Hruschka, "Relative clustering validity criteria: A comparative overview," Statist. Anal. Data Mining, vol. 3, pp. 209–235, 2010.
[15] G. Salton and C. Buckley, "Term weighting approaches in automatic text retrieval," Inf. Process. Manage., vol. 24, no. 5, pp. 513–523, 1988.
[16] L. Liu, J. Kang, J. Yu, and Z. Wang, "A comparative study on unsupervised feature selection methods for text clustering," in Proc. IEEE Int. Conf. Natural Language Processing and Knowledge Engineering, 2005, pp. 597–601.
[17] V. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," Soviet Physics Doklady, vol. 10, pp. 707–710, 1966.
[18] B. Mirkin, Clustering for Data Mining: A Data Recovery Approach. London, U.K.: Chapman & Hall, 2005.



[19] A. L. N. Fred and A. K. Jain, "Combining multiple clusterings using evidence accumulation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 6, pp. 835–850, Jun. 2005.
[20] L. Hubert and P. Arabie, "Comparing partitions," J. Classification, vol. 2, pp. 193–218, 1985.
[21] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2006.
[22] S. Haykin, Neural Networks: A Comprehensive Foundation. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[23] L. F. Nassif and E. R. Hruschka, "Document clustering for forensic computing: An approach for improving computer inspection," in Proc. Tenth Int. Conf. Machine Learning and Applications (ICMLA), 2011, vol. 1, pp. 265–268, IEEE Press.
[24] C. C. Aggarwal and C. X. Zhai, Eds., "Chapter 4: A survey of text clustering algorithms," in Mining Text Data. New York: Springer, 2012.
[25] Y. Zhao, G. Karypis, and U. M. Fayyad, "Hierarchical clustering algorithms for document datasets," Data Min. Knowl. Discov., vol. 10, no. 2, pp. 141–168, 2005.
[26] Y. Zhao and G. Karypis, "Evaluation of hierarchical clustering algorithms for document datasets," in Proc. CIKM, 2002, pp. 515–524.
[27] S. Nassar, J. Sander, and C. Cheng, "Incremental and effective data summarization for dynamic hierarchical clustering," in Proc. 2004 ACM SIGMOD Int. Conf. Management of Data (SIGMOD '04), 2004, pp. 467–478.
[28] K. Kishida, "High-speed rough clustering for very large document collections," J. Amer. Soc. Inf. Sci., vol. 61, pp. 1092–1104, 2010, doi: 10.1002/asi.2131.
[29] Y. Loewenstein, E. Portugaly, M. Fromer, and M. Linial, "Efficient algorithms for exact hierarchical clustering of huge datasets: Tackling the entire protein space," Bioinformatics, vol. 24, no. 13, pp. i41–i49, 2008.


BLUETOOTH FILE TRANSFER WITH BREAKPOINT

Priyanka V. Godse, Snehal P. Katore, Poonam A. Modani, Poonam B. Sonawane

[email protected] [email protected] [email protected] [email protected] Guided by : S. V. Londhe

Sir Visvesvaraya Institute of Technology, Chincholi (422101).

Abstract: In this proposed system, we provide a new methodology for transferring files via Bluetooth. A file can be transferred from one Android device to another Android device, and several files can be transferred simultaneously, through an easy, graphical, user-friendly interface. A log-file system is maintained on the sender for break-point and restart purposes. In our system we provide a mechanism to handle the failure of a file transfer: if there is any failure while transferring the file, the system will store the current state of the transferred file, so the next time the user can transfer only the remaining part of the failed file. In this situation, there is no need to transfer the whole file again.

Keywords: Bluetooth, Breakpoint, File Transfer, Log File, Retransfer

I. INTRODUCTION
Nowadays many smartphones support wireless Bluetooth technology for transferring files without any wire, but it provides users with limited accessibility. If there is any failure while transferring the file, the transfer will be cancelled automatically by the application.


If it is interrupted, it will not continue from that point; rather, the whole file will have to be transferred again, so data loss occurs and more time is consumed. So, rather than sending from the start, we resend the file from the breakpoint; that is, sending of the file is resumed.
Need: As the use of Bluetooth file transfer is increasing day by day, there is a need for more advanced features in Bluetooth file transfer, such as the facility to retransfer a file from the breakpoint and the maintenance of log files.

II. LITERATURE SURVEY
Bluetooth is an open wireless technology standard for exchanging data over short distances, using short-wavelength radio transmissions in the ISM band from 2400 to 2480 MHz, between fixed and mobile devices. It can connect several devices, overcoming problems of synchronization. Wireless Bluetooth technology is a universal standard that enables communication between mobile phones, laptops and other portable devices. It supports short-range wireless transmissions by using the unlicensed 2.4 GHz short-range radio frequency bandwidth. Bluetooth allows users to form clusters of a maximum of 8 connected devices that form a star-shaped network named a piconet. The main device in the cluster is named the master; all other devices are named slaves. Two Bluetooth devices can transfer data at a maximum speed of 2.1 Mbps [1]. Bluetooth was originally developed as an alternative to RS-232 data cables: personal devices communicated based on the RS-232 serial port protocol, but proprietary connectors and pin arrangements made it impossible to use the same set of cables to interconnect devices from different manufacturers and, sometimes, even from the same manufacturer [2]. A Bluetooth connection, by contrast, enables easy exchange of information


between devices from different as well as the same manufacturers.
The Android platform includes support for the Bluetooth network stack, which allows a device to wirelessly exchange data with other Bluetooth devices. The application framework provides access to the Bluetooth functionality through the Android Bluetooth APIs. These APIs let applications wirelessly connect to other Bluetooth devices, enabling point-to-point and multipoint wireless features [3]. Using the Bluetooth APIs, an Android application can perform the following:
 Scan for other Bluetooth devices.
 Query the local Bluetooth adapter for paired Bluetooth devices.
 Establish RFCOMM channels.
 Connect to other devices through service discovery.
 Transfer data to and from other devices.
The Android platform provides, through the Bluetooth libraries, the basic functionalities that are necessary for the operations related to file transmission.

III. MATHEMATICAL MODEL
Let S be a system, S = {I, O, W};
Where,
I = Input set,
O = Output set,
W = Phases of the system.
Input set:
I = {F, D};
Where,
F = Files to be transferred,
D = Devices to which the files should be transferred.
Now,
F = {F0, F1, F2, ..., Fn};
Where,
F0 = {Fs0, Fb0, Fe0};
F1 = {Fs1, Fb1, Fe1};
F2 = {Fs2, Fb2, Fe2};
...
Fn = {Fsn, Fbn, Fen};
Where,
Fsn = starting point of file n,
Fbn = breaking point of file n,


IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 Fen=ending point of file n. And, D={P,C}; Where, P= list of paired bluetooth devices, C= Visible bluetooth devices. Again ,

P = {P0, P1, P2, ..., Pm};
C = {C0, C1, C2, ..., Cm};
Output set:
O = {Q0, Q1, Q2, Q3, Q4, Q5, Q6};
Q0 = list of paired and discoverable devices.
Q1 = connected device.
Q2 = list of files for transmission.
Q3 = actual file transmission.
Q4 = log files.
Q5 = file break-point details.
Q6 = complete file transmission.
W = {M0, M1, M2, M3, M4, M5, M6};
M0: D -> Q0;
Where, M0 is a function to scan the devices, so that D will contain the list of paired and discoverable devices (it automatically enables Bluetooth discovery if the device is not discoverable).
M1: Q0 -> Q1;
Where, M1 is a connection manager which connects the source and target devices for further data transmission.
M2: Q1 -> Q2;
Where, M2 is a function that selects files on the source device to transmit to the target device.
M3: Q2 -> Q3;
Where, M3 is a function in which the system transfers the file from the source device to the target device; the file transmission will be parallel in the case of multiple files.
M4: Q3 -> Q4;
Where, in M4 a log file is maintained in case of failure in the transmission of the file; the log file is updated on every data transfer and is stored on both sides.
M5: Q4 -> Q5;


Where, in M5 the system asks the target user for the filename, verifies the file's availability on the source and target devices, and transfers the file from the break point.
M6: Q5 -> Q6;
Where, M6 is the state at which the file has been transferred successfully.

Figure: DFA for Bluetooth file transfer with breakpoint.
Where,
1 = scanning the devices (paired and new devices).
2 = connection manager.
3 = file selection.
4 = file transfer.
5 = log file for failure connection.
6 = file retransfer.
7 = successful file transfer.
Success: {M6};
Failure: {M0, M1, M2, M3, M4, M5};

IV. PROPOSED SYSTEM
Current Bluetooth file transfer does not guarantee the transmission of a file from the break point if the transmission is interrupted in any manner. In our system there is a log-file mechanism on the sender for break-point and restart purposes, and it is based on parallel data transfer and multitasking.
Features:
1. The user can transfer a file from one Android device to another Android device.
2. Easy, graphical, user-friendly interface access.
3. Maintaining a log file for the break point.
4. A mechanism to handle the failure of a file transfer: if there is any failure while transferring the file, the system stores the current state of the transferred file, so the next time the user can transfer only the remaining part of the failed file. In this situation, there is no need to transfer the whole file again, so the loss of data and the extra time consumption are avoided by our proposed system.
1. Working: We use a novel algorithm to manage file transfer failure with a break-point facility.
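The break-point mechanism described above (store the transfer state in a log file; on failure, resume from the recorded offset) can be sketched as below. This is an illustrative file-copy simulation in Python, not the Android/Bluetooth implementation; the JSON log layout and the ".received" destination path are assumptions made for the sketch:

```python
import json, os, tempfile

def send_chunk(src_path, log_path, chunk_size):
    """Send (here: locally copy) up to chunk_size bytes, recording the
    current state -- filename, total size, transferred bytes -- in a
    log file so an interrupted transfer can resume at the break point."""
    state = {"filename": os.path.basename(src_path),
             "total": os.path.getsize(src_path), "transferred": 0}
    if os.path.exists(log_path):                 # resume: reload state
        with open(log_path) as f:
            state = json.load(f)
    with open(src_path, "rb") as src:
        src.seek(state["transferred"])           # jump to break point
        data = src.read(chunk_size)
    with open(src_path + ".received", "ab") as dst:
        dst.write(data)                          # stands in for the radio link
    state["transferred"] += len(data)
    with open(log_path, "w") as f:
        json.dump(state, f)                      # log updated every transfer
    return state

# a 10-byte file sent in two chunks, as if the first attempt broke at byte 6
d = tempfile.mkdtemp()
src, log = os.path.join(d, "a.doc"), os.path.join(d, "a.log")
with open(src, "wb") as f:
    f.write(b"0123456789")
send_chunk(src, log, 6)            # transfer interrupted after 6 bytes
state = send_chunk(src, log, 6)    # resume: only the remaining 4 bytes move
```

Keeping a copy of the log on both sides, as Module 5 below prescribes, lets either end verify the break position before the retransfer of Module 6.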


This is Current Object State Maintenance, used to store/restore the data onto the target devices. The Current Object State Maintenance algorithm maintains a log file of the data transferred between the source and target devices. The modules are:
1. Scan Devices (paired and new)
2. Connection Manager
3. File Selection on the source device
4. File Transfer Manager
5. Log File Manager for failed connections
6. File Re-Transfer Module

2. Modules
2.1 Module 1 (Scan Devices (Paired and New)):
In this module, the system finds all available Bluetooth-capable (Android) devices. The system displays all paired devices and new devices available for pairing. It automatically enables Bluetooth discovery if it is not enabled.

2.2 Module 2 (Connection Manager):
In this module, the system finds the target device (selected by the user from the scan list) and tries to create a connection between the source and target devices. The system displays a success message on the source and target devices if the connection succeeds; otherwise, it displays a failure message on the source device.

2.3 Module 3 (File Selection on the Source Device):
In this module, the system asks the user to enter the filename of the file to be transferred to the target device. The file must reside on the external (SD card) memory; the system does not support the internal memory file system. The system adds the file to the transfer list if the selection is valid; otherwise, it displays a failure message and does not add the file to the transfer list.

2.4 Module 4 (File Transfer Manager):
In this module, the system will transfer the file (which is


selected by the user) from the source device to the target device. The transfer is sequential, and the data-transfer speed depends on the source and target devices.

2.5 Module 5 (Log File Manager for Failed Connections):
In this module, the system maintains a log file of the data transferred between the source and target devices. The log file is updated on every data transfer and is stored on both sides. It contains the filename, total file size, current index position, transferred bytes, total bytes to transfer, etc.

2.6 Module 6 (File Re-Transfer Module):
In this module, the system asks the target user for the filename and verifies the file's availability on the source and target devices. If the file is verified, it displays related information such as the file size, total transferred bytes, break position, and remaining bytes to transfer. Finally, the source user transfers the file from the break position to the target user.

3. Block Diagram
3.1 Bluetooth File Transfer:

Figure 3.1: Bluetooth file transfer (the current Bluetooth file transfer system).


3.2 Bluetooth File Re-transfer:

Figure 3.2: Bluetooth file retransfer (the proposed Bluetooth file transfer system).
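The retransfer flow of Figure 3.2 — consult the log, seek to the break position, and send only the remaining bytes — can be sketched as below. The JSON log format, chunk size, and function name are assumptions for illustration, not the paper's actual implementation; a local file copy stands in for the Bluetooth channel.

```python
import json, os

def send_with_log(src, dst, log_path, fail_after=None):
    """Copy src to dst, recording progress in a JSON log file.
    fail_after simulates an interrupted transfer after roughly N bytes."""
    sent = 0
    if os.path.exists(log_path):                 # resume from the break point
        with open(log_path) as f:
            entry = json.load(f)
        if entry["filename"] == os.path.basename(src):
            sent = entry["transferred"]
    total = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "r+b" if sent else "wb") as fout:
        fin.seek(sent)
        fout.seek(sent)
        while sent < total:
            chunk = fin.read(4096)
            if fail_after is not None and sent + len(chunk) > fail_after:
                break                            # simulated connection failure
            fout.write(chunk)
            sent += len(chunk)
            # log updated on every data transfer, as in Module 5
            with open(log_path, "w") as f:
                json.dump({"filename": os.path.basename(src),
                           "total": total, "transferred": sent}, f)
    return sent == total
```

A first call with fail_after set stops partway and leaves the break position in the log; a second call reads the log, seeks both files to that position, and sends only the remaining bytes, so the whole file is never retransmitted.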

V. APPLICATIONS:
The proposed system is itself an application, designed primarily for mobile phones. A better GUI is provided for this application. Different Android phones can use this application for faster transmission and for retransmission from a breakpoint. Multiple-file transfer is possible with this application.

VI. CONCLUSION:
Thus, the Bluetooth file transmission with breakpoint system saves the user's time when transmitting data.

VII. REFERENCES:
[1] Olaiya Folorunsho and Mariam Biola Bello, "Development of Tool for Managing Bluetooth Data Transfer Logs in Mobile Platform", International Journal of Advanced Research in Computer Science


and Software Engineering.
[2] Roch Guérin, Enyoung Kim, and Saswati Sarkar, "Bluetooth Technology: Key Challenges and Initial Research", University of Pennsylvania, Philadelphia, PA / Lucent Technologies, Holmdel, NJ.
[3] http://developer.android.com/guide/topics/connectivity/bluetooth.html
[4] Bluetooth SIG, http://www.bluetooth.com.

[5] Bluetooth support in Windows XP, http://www.microsoft.com/hwdev/tech/network/bluetooth.
[6] http://bluetooth.com/Bluetooth/Technology/Works/Security.

[7] Bluetooth Security Architecture white paper, http://www.bluetooth.com/developer/whitepaper/whitepaper.asp.


Optimal Multiserver Configuration for Profit Maximization in Cloud Computing

Pravin Pokale, Vishal Agrawal, Rahul Wakchaure
B.E. Computer Science Engg., S.V.I.T., Nashik, Maharashtra.

Abstract— The cloud computing paradigm has achieved widespread adoption in recent years. Its success is due largely to customers' ability to use services on demand with a pay-as-you-go pricing model, which has proved convenient in many respects. As cloud computing becomes more and more popular, understanding its costs becomes critically important. Our pricing model for multiserver systems helps us to understand the pay-per-use services of various cloud service providers. Cloud data storage highlights the security issues targeted at a customer's outsourced data, i.e. data that is not stored on or retrieved from the customer's own servers. Low costs and high flexibility make migrating data to the cloud attractive. Our results show that our proposed model provides better decisions for customers according to their available budgets.

Index Terms— Cloud Computing, Pricing Model, Data Security, Cost-Effective, Cloud Data Migration, Cloud Service Provider

INTRODUCTION
The end of this decade is marked by a paradigm shift of industrial information technology towards a subscription-based or pay-per-use service business model known as cloud computing [1]. One of the prominent services offered in cloud computing is cloud data storage, in which subscribers do not store their data on their own servers; instead, their data is stored on the cloud service provider's servers.


In cloud computing, the customer is charged for the service provider's storage service. This service not only provides flexibility and scalability for data storage, it also gives customers the benefit of paying only for the amount of data they need to store for a particular period of time, without any concern for efficient storage mechanisms or the maintainability issues of large amounts of stored data. In addition to these benefits, customers can easily access their data from any geographical region with access to the cloud service provider's network or the Internet. An example of cloud computing is shown in Fig. 1.

Fig. 1. Cloud computing architecture example.

Despite its obvious advantages, however, many companies hesitate to "move to the cloud", mainly because of concerns related to service availability, data lock-in, and legal uncertainties. Lock-in causes several problems. For one thing, even though public cloud availability is generally high, outages still occur, and businesses locked into such a cloud come to a standstill until the cloud is back online. Moreover, public cloud providers generally don't guarantee service-level agreements (SLAs); that is, businesses locked into a cloud have no guarantee that it will continue to provide the required quality of service (QoS). Finally, most public cloud providers' terms of service allow the provider to change pricing at any time. Hence, a business locked into a cloud has no control over its own IT costs.

In addition, better privacy as well as data availability can be achieved by dividing the user's data block into pieces and distributing them among the available SPs in such a


way that no fewer than a threshold number of SPs must take part in a successful retrieval of the whole data block. To address these issues, in this paper we propose an economical distribution of data among the available SPs in the market, providing customers with data availability as well as secure storage. In our model, the customer distributes his data among several SPs available in the market, based on his available budget. We also provide the customer with a decision about which SPs to choose for data access, with respect to the data-access quality of service offered by the SPs at the location of retrieval. This removes the possibility of an SP misusing the customer's data or breaching its privacy, while still ensuring data availability with a better quality of service. In our multiserver system we also introduce the concept of runtime data migration between different cloud service providers.

The problem, however, is that once an application has been developed against one particular provider's cloud services, using its specific API, that application is bound to that provider; deploying it on another cloud would usually require completely redesigning and rewriting it. Such vendor lock-in leads to strong dependence on the cloud service operator. Standardized programming APIs must therefore enable developers to create cloud-independent applications that do not depend on any single provider or cloud service. Cloud applications also depend on sophisticated features to deploy and install applications automatically. Predictable and controlled application deployment is a central issue for cost-effective and efficient deployments in the cloud.

Our proposed approach provides cloud computing users with a decision model that offers better security by distributing the data over multiple cloud service providers in such a way that none of the SPs can successfully retrieve meaningful information from the


data pieces allocated at their servers. In addition, we provide the user with better availability of data by maintaining redundancy in the data distribution. If a service provider suffers a service outage or fails, the user can still access his data by retrieving it from the other service providers.

From the business point of view, since cloud data storage is a subscription service, the higher the data redundancy, the higher the cost to be paid by the user. Thus, we provide an optimization scheme to handle the tradeoff between the cost that a cloud computing user is willing to pay and the level of security achieved for his data. In other words, we provide a scheme to maximize security for a given budget for the cloud data.

RELATED WORK
Different service providers employ different schemes and models for pricing [6]. However, the most common model employed in cloud computing is the "pay-as-you-go" model: customers pay a fixed price per unit of use. Amazon, considered the market leader in cloud computing, utilizes such a model by charging a fixed price for each hour of virtual machine usage. The "pay-as-you-go" model is also implemented by other leading enterprises such as Google App Engine and Windows Azure. Another common scheme employed by these leading enterprises is the "pay for resources" model: a customer pays for the amount of bandwidth or storage utilized. Subscription, where a customer pays in advance for the services he is going to receive for a pre-defined period of time, is also common.

A customer will evaluate a single service provider based on three main parameters: pricing approach, QoS, and utilization period. The pricing approach describes the process by which the price is determined. It could be one of the following: fixed price independent of volume, fixed price plus per-unit rate, assured purchase volume plus per-unit


price rate, per-unit rate with a ceiling, and per-unit price. The quality of service describes the requirements for what a service provider should provide to his customers. QoS requirements include the availability of the service, security, privacy, scalability, and the integrity of the service provider. If the service provider ensures that these requirements are maintained at a high level, the quality of the service provided will increase; this in turn increases the number of customers and their loyalty to the service provider. The utilization period is the period in which the customer has the right to utilize the provider's services, based on the SLAs between the two parties. It could be perpetual, based on a subscription period, or a pay-per-use model. Figure 2 below describes the main aspects of pricing models.

Fig. 2. Aspects of Cloud Computing.

Privacy preservation and data integrity are two of the most critical security issues related to user data. In the conventional paradigm, organizations had physical possession of their data and hence could easily implement strong data security policies. But in the case of cloud computing, the data is stored with an autonomous business party that provides data storage as a subscription service. The users have to trust the cloud service provider with the security of


their data. The criticality of privacy issues in cloud computing has been discussed in the literature, pointing out that obtaining information from a third party is much easier than obtaining it from the creator himself. An even bigger concern that arises in such cloud storage schemes is that there is no foolproof way to be certain that the service provider does not retain the user's data even after the user opts out of the subscription. Given an enormous amount of time, such data can be decrypted, meaningful information can be retrieved, and user privacy can easily be breached. Since the user might no longer be availing himself of storage services from that provider, he will have no clue of such a passive attack. Therefore, the conventional single-service-provider scheme does not seem very promising.

To give users a better and fairer chance to avail themselves of efficient security services for their cloud storage at affordable costs, our model distributes the data pieces among more than one service provider, in such a way that none of the SPs can retrieve any meaningful information from the pieces of data stored on its own servers without obtaining more pieces of data from the other service providers.

Data migration is the method of moving a large amount of data and applications into a target cloud, where the target cloud can be a public, a private, or a hybrid cloud [2]. Since large numbers of applications are required to fulfill an organization's business needs and to improve its growth, various models of DaaS (Database as a Service) are now provided with the data migration process in view. The data can be migrated in several ways, such as from an organization to a target cloud or from one cloud to another cloud [3]. But migrating data is quite a challenging task, and it involves various major security issues as well, like data integrity, security, portability, data privacy, data accuracy, etc.


Fig. 3. Data Migration in Cloud.

MODELS
In this paper we include three different concepts based on cloud computing: first, secured multi-cloud data storage and retrieval; second, security of data in the cloud; and third, pricing models.

Secured Multi-Cloud Data Storage and Retrieval
We consider the storage services for cloud data storage between two entities, cloud users (U) and cloud service providers (SP). The cloud storage service is generally priced on two factors: how much data is to be stored on the cloud servers, and for how long the data is to be stored. In our model, we assume that all the data is to be stored for the same period of time. We consider p cloud service providers (SP); each available cloud service provider is associated with a QoS factor, along with its cost of providing storage service per unit of stored data (C) [5]. Every SP offers a different level of quality of service (QoS) and has a different cost associated with it. Hence, the cloud user can store his data on more than one SP according to the required level of security and his affordable budget.

We use an example in Fig. 4 to illustrate our proposed threshold. In this example we assume that we have 9 cloud service providers (SP1, SP2, ..., SP9). Let us assume that a customer (C1) has divided the data he wishes to store on the SPs' servers into 9 data pieces. The customer is required to retrieve at least 6 data pieces from different SPs to reconstruct his data and get the full information; in our example, six SPs participate in the data retrieval (SP1, SP4, SP5, SP6, SP8, and SP9).
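The 6-of-9 retrieval requirement above is an instance of a (k, n) threshold scheme. The paper does not name a concrete construction, so the sketch below uses one standard realization — Shamir-style polynomial splitting over a prime field — with n = 9 shares and threshold k = 6; the prime and function names are illustrative choices.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime, large enough for small secrets

def split(secret, n=9, k=6):
    """Create n shares of the secret; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):     # Horner evaluation of the polynomial
            acc = (acc * x + c) % PRIME
        return acc
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = split(123456789, n=9, k=6)
# e.g., providers SP1, SP4, SP5, SP6, SP8, SP9 respond:
subset = [shares[i] for i in (0, 3, 4, 5, 7, 8)]
print(reconstruct(subset))  # 123456789
```

Any 6 providers suffice to rebuild the data, while 5 or fewer shares reveal nothing about it, which is exactly the availability-plus-privacy property the distribution scheme relies on.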


Fig. 4. Data Storage and Retrieval.

Security of Data in Cloud
In the proposed work, we create an encryption algorithm to provide strong security for data in the cloud, with better performance than existing encryption algorithms such as PBE (prediction-based encryption), IBE (Identity-Based Encryption), etc. This encryption method uses the concept of randomization. In the randomization concept, we initially encrypt a single plaintext P into a number of ciphertexts C1, C2, ..., Cn and then randomly select any one of the n ciphertexts; any of those ciphertexts maps back to the original plaintext, and the one who decrypts the text has no knowledge of which one was picked. One thing that must be taken into consideration is that the ciphertext should be longer than the plaintext; otherwise a problem may occur. The enhanced algorithm is as follows:

1) For Encryption
- Initially, generate a random key.
- Encrypt the data using that random key.
- Encrypt the random key with the shared key.
- Forward the encrypted data and the encrypted key from steps 2 and 3 together.

2) For Decryption
- Decrypt the encrypted random key with the shared key.
- Then decrypt the encrypted data with the decrypted random key.

Fig. 5. Enhanced Algorithm.

Here, the shared key is reused, but the random key is used only once for encrypting the data. Hence, by applying this method, the data can be made more secure and reliable, as an outsider will have no idea what data is


encrypted with the persistent key.

Pricing Models
Different service providers employ different schemes and models for pricing, as shown in Table 1. However, the most common model employed in cloud computing is the "pay-as-you-go" model, in which customers pay a fixed price per unit of use.

Table 1. Pricing Model Comparison

Pay-as-you-go model
  Pricing approach: Price is set by the service provider and remains constant (static).
  Fairness: Unfair to the customer, because he might pay for more time than needed.
  Pros: Customer is aware of the exact price to be paid; resources are reserved for the customer for the paid period of time.
  Cons: Service provider might reserve the resources for longer than the customer needs.

Subscription
  Pricing approach: Price is based on the period of subscription (static).
  Fairness: Customer might sometimes overpay or underpay.
  Pros: Customer might underpay for the resources reserved if he uses them extensively.
  Cons: Customer might overpay for the resources reserved if he does not use them extensively.

Value-based pricing
  Pricing approach: Prices set according to the value perceived by the customer (dynamic).
  Fairness: Fair to producers, where prices are set on the perceived value (an advantage from the producer's point of view).
  Pros: High revenue on each item sold.
  Cons: Difficult to obtain and interpret data from customers, competitors, and one's own corporation to evaluate customer-perceived value.

Cost-based pricing [4]
  Pricing approach: Price set by adding a profit element on top of the cost (dynamic).
  Fairness: Not fair to customers where the perceived value of the product can be identified.
  Pros: Simplicity in calculating the price.
  Cons: Tends to ignore the role of consumers.

Competition-based pricing
  Pricing approach: Price set according to competitors' prices (dynamic).
  Fairness: Fair to customers, since prices are always set according to competitive prices.
  Pros: Easy to implement.
  Cons: Does not take customers into account.

Customer-based pricing
  Pricing approach: Price set according to what the customer is prepared to pay (dynamic).
  Fairness: Fair to customers, as customers are always taken into account.
  Pros: Takes the customer's perspective into account.
  Cons: Customers rarely indicate to the seller what they are willing to pay; data are difficult to obtain and to interpret.

Pay-for-resources model
  Pricing approach: Cost-based (static).
  Fairness: Fair for both customers and the service provider.
  Pros: Offers maximum utilization of the service provider's resources.
  Cons: Hard to implement.
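As a toy comparison of three of the schemes in Table 1, the following prices the same usage under pay-as-you-go, subscription, and pay-for-resources models. All rates are invented for illustration and are not any provider's actual tariff.

```python
def pay_as_you_go(hours, rate_per_hour=0.10):
    # fixed price per unit of use (e.g., per VM-hour)
    return hours * rate_per_hour

def subscription(months, monthly_fee=50.0):
    # customer pays in advance for a pre-defined period
    return months * monthly_fee

def pay_for_resources(gb_stored, gb_bandwidth,
                      storage_rate=0.02, bandwidth_rate=0.05):
    # customer pays for the amount of storage and bandwidth actually utilized
    return gb_stored * storage_rate + gb_bandwidth * bandwidth_rate

# A light user overpays on subscription; a heavy user underpays:
light = pay_as_you_go(100)    # far below one month's subscription fee
heavy = pay_as_you_go(2000)   # far above one month's subscription fee
print(light, heavy, pay_for_resources(500, 200))
```

This mirrors the fairness column of Table 1: the static subscription fee is indifferent to actual usage, while the usage-metered schemes track it directly.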



CONCLUSION
In this paper we have proposed a multiserver middleware system consisting of three models. Secured multi-cloud data storage and retrieval provides each customer with better cloud data storage by dividing and distributing the customer's data, giving the customer secure storage under his affordable budget. When data is transferred from one cloud to another during the migration process, we improve its security with the randomized encryption technique. With the help of our pricing models, a customer can choose the service provider whose pricing approach is most compatible with the customer's behavior.

ACKNOWLEDGEMENT
None of this work would have been possible without the selfless assistance of a great number of people. I would like to gratefully thank all those members for their valuable guidance, time, helpful discussion, and contribution to this work.

REFERENCES
[1] M. Armbrust et al., "A View of Cloud Computing," Comm. ACM, vol. 53, no. 4, 2010, pp. 50-58.
[2] "Secure Migration of Various Databases over a Cross Platform Environment", International Journal of Engineering and Computer Science, ISSN: 2319-7242, vol. 2, issue 4, April 2013.
[3] Farah Habib Chanchary and Samiul Islam, "Data Migration: Connecting Databases in the Cloud", ICCIT 2012.
[4] S. Lehmann and P. Buxmann, "Pricing Strategies of Software Vendors", Business and Information Systems Engineering, 2009.
[5] S. H. Shin and K. Kobara, "Towards Secure Cloud Storage", demo for CloudCom 2010, Dec. 2010.



[6] S. Maxwell, "The Price is Wrong: Understanding What Makes a Price Seem Fair and the True Cost of Unfair Pricing", Wiley, 2008.

[7] Elias M. Awad, "System Analysis and Design"; Mike Snell, Glenn Johnson, and Tony Northup, "Microsoft .NET Framework 3.5 ASP.NET Application Development".


FACILITATING EFFECTIVE USER NAVIGATION THROUGH WEBSITE STRUCTURE IMPROVEMENT

Jyoti B. Kshirsagar Student, S.V.I.T., Chincholi, Nashik

Prof. S. D. Jondhale, Professor, P.R. Engg. College, Loni. [email protected] [email protected]

Abstract— Today, evaluating and improving website structure has become a crucial issue for website design and maintenance. Designing well-structured websites that facilitate effective user navigation has long been a challenge, because web developers' understanding of how a site should be organized can differ considerably from that of its users. A number of works have proposed relinking web pages to improve navigability using web-user navigation data, but a completely reorganized new structure can be highly unpredictable, and the cost of disorienting users with such changes remains unanalyzed. We propose a mathematical programming model to improve user navigation on a website while minimizing alterations to its current structure.

Keywords— Mathematical Programming, User Navigation, Website Design, Web Mining

1. INTRODUCTION
With the explosive growth of web content, both website users and managers expect high-quality content service. Users wish to get the information they need


conveniently; managers wish to attract more users to their websites. A user's access to a website can be considered as obtaining information from web pages under the restriction of the website structure. When finding the desired information on a website is not easy, designers respond by adding ever more to the website design; poor website design has been highlighted as a key element in a number of high-profile site failures, and the information gained depends on a high quality of service. Users whose targets are hard to locate are very likely to leave a website even if its information is of high quality. For a relatively static website, the link structure has a great influence on the quality of the content service. Therefore, how to evaluate and improve website structure becomes a crucial issue, with the goals of minimizing changes to the website and reducing information overload for users.

Prior work has analyzed past user access patterns to discover common user access behavior, which is useful for improving the static website link structure or for dynamically inserting links into web pages; proposed a method to redesign a website by identifying user profiles from web logs; used Web Utilization Miner to find interesting navigation patterns for improving the organization of web pages; and developed a technique to discover the gap between website designers' expectations and users' behavior and to suggest areas where the website can be improved.

2. RELATED WORK
We survey several studies about optimal ratios between the depth and breadth of hierarchies. Hierarchies have the advantage of supporting planning together with the concepts of aggregation and abstraction, and they are already present in information spaces for navigational purposes. The concept of multitrees was adapted and integrated into our conceptual framework, culminating in the pessimistic statement that the web will suffer


a severe usability problem. Our study of the empirical work clearly indicates what a successful implementation of website structure improvement requires. We focus on papers about website user navigation success and user expectations. Web developers' understanding of how the structure of a website should be organized can be poor from the user's standpoint, and these differences result in the desired information not being easily obtained from the website. Specifically, user satisfaction is the measure of website effectiveness for the web developers; generally, how pages should be organized should follow the user's model of the pages.

We partition the real data set into a training set (the first three months) and a testing set (the last month). Two metrics are defined: the average number of paths per mini session and the percentage of mini sessions above a specified threshold, as the first metric; and how likely the users suffering navigation difficulty are to benefit from the improvements made to the site structure, as the second metric. The first metric is computed in three steps:
i) Use the training data to obtain the links of pages to be improved and the set of new links to apply in the mathematical programming model.
ii) Obtain the mini sessions from the testing data, keeping those having two or more paths, together with their lengths, i.e., the set of candidate links that can be used to improve them and the number of paths.
iii) For each mini session from step ii, check whether any candidate link matches one of the links from step i; these are the results from the training data.

3. PROPOSED WORK
Users cannot always locate the desired information in a website; we measure structure improvement by whether the improved structure can facilitate users in reaching their targets faster than the current one does on average. We


IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 have study to propose a are not improve of the irrelevant mathematical programming mini session. model on a website to improve 3.3 Dominated Mini Sessions

the user navigation wheather the Mini session Tp

changes has to be made on dominates mini session Tq i.e. the minimizing the current set of relevant candidate links of structure. Tp. If the session Tq has the set 3.1 Relevant Mini Sessions of relevant candidate links of Tq. The length of relevant For all the consideration of Tp as mini session is larger than the a relevant candidate links. corresponding path threshold (p) Notation Definition and it has denoted by Im.. We T Mini session that contains the set of paths traversed by a user to define Im = I \ R, any mini session locate one target page. T Im will not be considered in Є W The set of all web pages. our model. If S already meets the I The set of all identified mini goal of path threshold. If the session. path threshold increases from 3 R The set of relevant mini session. C The set of candidate links it can to 5, it reduces from several be selected for improving user thousands of a few hundred to navigation. the number of relevant mini E 1 if page 1 has a link to page j in ij session. Thus, p = 1 the larger the current structure; 0 otherwise.

number of relevant mini sessions L The set of relevant candidate links. M Multiplier for the penalty term in can be deleted from it’s the Objective function. consideration. S The set of source nodes of links in 3.2 Relevant Candidate Links set C. p The path threshold for mini The set of candidate links j sessions in which page j in the for irrelevant mini session is target page.

denoted by CIM and the set of N The number of links that exceed i candidate links for relevant mini the out-Degree threshold O in i page i. session is denoted by CRM.. CIM \ O The out degree threshold for page C is the candidate links which i RM i.



Di         The current out-degree of page i.
tij        1 if the link from i to j is selected; 0 otherwise.
qTijkr     1 if i is the rth page in the kth path and j is the target page of mini session T.
OTkr       1 if, in mini session T, a link from the rth page in the kth path to the target is selected.
tgt(T)     The target page of mini session T.

Table 3.1 Summary of Notations

Illustrative Examples

An Example of Mini Sessions

ID   Mini Session
T1   {(2,1),(4),(5,6)}
T2   {(4,3),(5,1),(2,6)}
T3   {(1,5,2),(6,4)}
T4   {(6,3),(2,1),(5,4)}
T5   {(4,1),(5),(3),(2,6)}
T6   {(5,3,1),(2,4)}

Table 3.2 An Example of Mini Sessions

ID   Candidate Links
T1   {(2,6),(1,6),(4,6)}
T2   {(4,6),(3,6),(5,6),(1,6)}
T3   {(1,4),(5,4),(2,4)}
T4   {(6,4),(3,4),(2,4),(1,4)}
T5   {(4,6),(1,6),(5,6),(3,6)}
T6   {(5,4),(3,4),(1,4)}

Table 3.3 The Set of All Candidate Links

All six mini sessions are relevant, since the length of each is larger than the path threshold. The structure should be improved such that users can reach their targets in one path. The penalty term is not considered for now. The problem is formulated as

Minimize  Σ tij [1 – Eij (1 – ε)]
Subject to  OTkr = Σ qTijkr tij ;  r = 1,2,…,Fp(k,T),  k = 1,2,…,F(T),  T ∈ Rm

(T1)  t26 + t16 >= 1
(T2)  t46 + t36 >= 1
(T3)  t14 + t54 + t24 >= 1
(T4)  t64 + t34 >= 1


(T5)  t46 + t16 >= 1
(T6)  t54 + t34 + t14 >= 1
tij ∈ {0,1},  (i,j) ∈ C

If tij = 1, the corresponding OTkr is also set to 1. The value ε = 1.0 × 10^-8 is used in the example.

ID   Relevant Candidate Links
T1   {(2,6),(1,6)}
T2   {(4,6),(3,6)}
T3   {(1,4),(5,4),(2,4)}
T4   {(6,4),(3,4)}
T5   {(4,6),(1,6)}
T6   {(5,4),(3,4),(1,4)}

Table 3.4 Relevant Candidate Links for p = 1

The optimal solution is t26 = t46 = t14 = t64 = 1, with all other variables set to 0. The only new link required is (4,6), since the other selected links are already present in the current structure, i.e., E26 = E14 = E64 = 1. The multiplier for the penalty term and the out-degree threshold are set as M = 5 and O = 3, and the path threshold is increased to p = 2. The problem is then formulated as:

Minimize  Σ tij [1 – Eij (1 – ε)] + M Σi∈S Ni
Subject to  OTkr = Σ qTijkr tij ;  r = 1,2,…,Fp(k,T),  k = 1,2,…,F(T),  T ∈ Rm

(T1)  t26 + t16 + t46 >= 1
(T2)  t46 + t36 + t56 + t16 >= 1
(T4)  t64 + t34 + t24 + t14 >= 1
(T5)  t46 + t16 + t56 >= 1
tij ∈ {0,1},  Ni ∈ {0} ∪ Z+,  (i,j) ∈ C,  i ∈ S

where Ni counts the links selected at page i in excess of the out-degree threshold Oi.

ID   Relevant Candidate Links
T1   {(2,6),(1,6),(4,6)}
T2   {(4,6),(3,6),(5,6),(1,6)}
T4   {(6,4),(3,4),(2,4),(1,4)}
T5   {(4,6),(1,6),(5,6)}

Table 3.5 Relevant Candidate Links for p = 2

The first formulation has six constraints, whereas the newly defined formulation has only four constraints over the relevant mini sessions.
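Because the example is tiny, the optimal selection can be double-checked by exhaustive search: pick the cheapest set of candidate links covering every relevant mini session of Table 3.4, where a link already in the structure costs only ε and a new link costs 1. A sketch (ε = 0.01 is an illustrative small value; any sufficiently small ε gives the same optimum):

```python
from itertools import combinations

# Relevant candidate links for p = 1 (Table 3.4)
sessions = {
    "T1": {(2, 6), (1, 6)}, "T2": {(4, 6), (3, 6)},
    "T3": {(1, 4), (5, 4), (2, 4)}, "T4": {(6, 4), (3, 4)},
    "T5": {(4, 6), (1, 6)}, "T6": {(5, 4), (3, 4), (1, 4)},
}
existing = {(2, 6), (1, 4), (6, 4)}  # E26 = E14 = E64 = 1
eps = 0.01

def cost(links):
    # an existing link costs only eps; a brand-new link costs 1
    return sum(eps if l in existing else 1.0 for l in links)

universe = sorted(set().union(*sessions.values()))
best = None
for r in range(1, len(universe) + 1):
    for subset in combinations(universe, r):
        chosen = set(subset)
        if all(chosen & s for s in sessions.values()):  # every session covered
            if best is None or cost(chosen) < cost(best):
                best = chosen
print(sorted(best))  # [(1, 4), (2, 6), (4, 6), (6, 4)] -> t26 = t46 = t14 = t64 = 1
```

The search confirms the solution reported above: three existing links plus the single new link (4,6).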


If T5 dominates T2, then when T5 is improved, T2 is also improved.

4. Discussion

4.1 Path Threshold

A small path threshold, i.e., p = 1, requires many links to be added. The largest improvements to the website therefore apply only for small path thresholds.

4.2 Out-Degree Threshold

Websites have content pages and index pages. Content pages carry a smaller number of links, and the threshold value for such web pages is increased accordingly.

4.3 Mini Session

A mini session is delimited by the choice of the page-stay time-out threshold, i.e., the time threshold determines which displayed information belongs to one mini session.

Conclusion

We have studied a mathematical programming model to improve the navigation on a website. It is best suited to informational websites whose contents are stable over time. The optimal solutions obtained meet the expectations of the user.

References

[1] J. Palmer, "Web Site Usability, Design, and Performance Metrics," Information Systems Research, vol. 13, no. 2, pp. 151-167, 2002; V. McKinney, K. Yoon, and F. Zahedi, "The Measurement of Web-Customer Satisfaction: An Expectation and Disconfirmation Approach," Information Systems Research, vol. 13, no. 3, pp. 296-315, 2002.
[2] Y. Fu, M.Y. Shih, M. Creado, and C. Ju, "Reorganizing Web Sites Based on User Access Patterns," Intelligent Systems in Accounting, Finance and Management, vol. 11, no. 1, pp. 39-53, 2002.
[3] J. Liu, S. Zhang, and J. Yang, "Characterizing Web Usage Regularities with Information Foraging Agents," IEEE Trans. Knowledge and Data Eng., vol. 16, no. 5, pp. 566-584, May 2004.
[4] J. Palmer, "Designing for Web Site Usability," Computer, vol.


35, no. 7, pp. 102-103, June 2002.
[5] C.S. Miller and R.W. Remington, "Modeling Information Navigation: Implications for Information Architecture," Human-Computer Interaction, vol. 19, pp. 225-271, 2004.
[6] R. Gupta, A. Bagchi, and S. Sarkar, "Improving Linkage of Web Pages," INFORMS J. Computing, vol. 19, no. 1, pp. 127-136, 2007.
[7] H. Kao, J. Ho, and M. Chen, "WISDOM: Web Intrapage Informative Structure Mining Based on Document Object Model," IEEE Trans. Knowledge and Data Eng., vol. 17, no. 5, pp. 614-627, May 2005.


VEINSECURED: A DETECTION AND PREVENTION OF DDOS ATTACKS

Sucheta Daware
BE Computer, S. V. I. T. Chincholi
Nashik, India
[email protected]

Saurabh Chatterjee
BE Computer, S. V. I. T. Chincholi
Nashik, India
[email protected]

Shrutika Jadhav
BE Computer, S. V. I. T. Chincholi
Nashik, India
[email protected]

Abstract— DoS and DDoS attacks make news headlines around the world daily, with stories recounting how a malicious individual or group was able to cause significant downtime for a website or use the disruption to breach security, causing financial and reputational damage. While information security researchers have yet to develop a standardized strategy to collect data regarding the number or nature of DoS and DDoS attacks that occur around the world, it is estimated that over 7,000 such attacks occur daily – a number that has grown rapidly in recent years. This paper elaborates on DDoS attacks and provides the theoretical foundation, architecture, and algorithms of VeinSecured. VeinSecured is mainly composed of intrusion prevention systems (IPSs), which work at the ISP (Internet service provider) level. The IPSs form virtual protection rings around the hosts to defend them, and collaborate by exchanging selected traffic information.

A very big network of compromised machines and


one central controller machine is called a botnet, where the centralized machine is the master and the other machines are called bots. The master can produce and trigger DDoS attacks using these bots. Botnet detection and identification remains complicated.

VeinSecured is able to detect not only DDoS attacks but also flooding DDoS attacks in particular. VeinSecured works at the ISP level and keeps hosts safe from such attacks. The overlay network of protection rings made by a number of IPSs is the main context on which VeinSecured largely depends.

Index Terms— Detection, distributed denial-of-service (DDoS), flooding, network security

I. INTRODUCTION

Distributed Denial of Service (DDoS) attacks are designed to prevent or degrade services provided by a computer at a given Internet Protocol (IP) address. A DDoS attack leverages multiple sources to create the denial-of-service condition. By using multiple sources to attack a victim, the mastermind is not only able to amplify the magnitude of the attack, but can also better hide his or her actual source IP address. The more layers the attacker can place between himself and the victim, the greater the chances of avoiding detection.

With the exponential growth of computer/network attacks, which are becoming more and more difficult to identify, the need for better and more efficient intrusion detection systems increases in step. The main problem with current intrusion detection systems is the high rate of false alarms. Load balancing between the traffic coming from clients and the traffic originated by attackers has generally not been implemented. Most recent works aim at countering DDoS attacks by fighting the underlying vector, which is usually the use of botnets. A botnet is a large network of compromised


machines (bots) controlled by one entity (the master). The master can launch synchronized attacks, such as DDoS, by sending orders to the bots via a Command & Control channel. Unfortunately, detecting a botnet is also hard, and efficient solutions may require participating actively in the botnet itself, which raises important ethical issues, or first detecting botnet-related malicious activities (attacks, infections, etc.), which may delay the mitigation.

To avoid these issues, this paper focuses on the detection of DDoS attacks per se and not on their underlying vectors. Although non-distributed denial-of-service attacks usually exploit a vulnerability by sending a few carefully forged packets to disrupt a service, DDoS attacks are mainly used for flooding a particular victim with massive traffic, as highlighted in the literature. In fact, the popularity of these attacks is due to their high effectiveness against any kind of service, since there is no need to identify and exploit any particular service-specific flaw in the victim. Hence, this paper focuses exclusively on flooding DDoS attacks.

A single intrusion prevention system (IPS) or intrusion detection system (IDS) can hardly detect such DDoS attacks, unless it is located very close to the victim. However, even in that latter case, the IDS/IPS may crash because it needs to deal with an overwhelming volume of packets (some flooding attacks reach 10–100 Gb/s). In addition, allowing such huge traffic to transit through the Internet and only detecting/blocking it at the host IDS/IPS may severely strain Internet resources.

This paper presents VeinSecured, a new collaborative system that detects flooding DDoS attacks as far as possible from the victim host and as close as possible to the attack source(s), at the Internet service provider (ISP) level. VeinSecured relies on a distributed architecture composed of multiple IPSs forming overlay networks of


protection rings around different subscribed customers.

VeinSecured is designed in a way that makes it a service to which customers can subscribe. Participating IPSs along the path to a subscribed customer collaborate (vertical communication) by computing and exchanging belief scores on potential attacks. The IPSs form virtual protection rings around the hosts they protect. The virtual rings use horizontal communication when the degree of a potential attack is high. In this way, the threat is measured based on the overall traffic bandwidth directed to the customer compared to the maximum bandwidth it supports. In addition to detecting flooding DDoS attacks, VeinSecured also helps in detecting other flooding scenarios, such as flash crowds, and botnet-based DDoS attacks.

This paper proceeds as follows. Section II describes the architecture and the global operation of VeinSecured. The leveraged metrics and components of the system are presented in Section III. Section IV presents the VeinSecured attack detection algorithms. Section V explains the mitigation technique used once an attack has been detected. Section VI presents the simulations we conducted in order to evaluate VeinSecured. Finally, Section VII concludes the paper and outlines future research directions.

II. VeinSecured ARCHITECTURE

A. Ring-Based Overlay Protection

The VeinSecured system (Fig. 1) maintains virtual rings, or shields, of protection around registered customers. A ring is composed of a set of IPSs that are at the same distance (number of hops) from the customer (Fig. 2). As depicted in Fig. 1, each VeinSecured IPS instance analyzes aggregated traffic within a configurable detection window. The metrics manager computes the frequencies and the entropies of each rule. A rule describes a specific traffic instance to


monitor and is essentially a traffic filter, which can be based on IP addresses or ports. Following each detection window, the selection manager measures the deviation of the current traffic profile from the stored ones, selects the out-of-profile rules, and then forwards them to the score manager. Using a decision table, the score manager assigns a score to each selected rule based on the frequencies, the entropies, and the scores received from upstream IPSs (vertical collaboration/communication). Using a threshold, a quite low score is marked as a low potential attack and is communicated to the downstream IPS, which will use it to compute its own score. A quite high score, on the other hand, is marked as a high potential attack and triggers ring-level (horizontal) communication (Fig. 2) in order to confirm or dismiss the attack, based on whether the computed actual packet rate crossing the ring surpasses the known, or estimated, customer capacity (Section II-B). However, since the entire traffic cannot possibly be monitored, we promote the usage of the multiple levels and the collaborative filtering described previously for an efficient selection of rules, and so of traffic, along the process.

Fig.1. VeinSecured Architecture

Fig.2. Horizontal and vertical communication in VeinSecured.

B. Subscription Protocol


VeinSecured protects subscribers (i.e., potential victims) based on defined rules. A VeinSecured rule matches a pattern of IP packets. Generally, this corresponds to an IP subnetwork or a single IP address. However, the rule definition can include any other information that can be monitored, such as the protocols or the ports used.

C. Multiple Customers

Because of their inherent, complete independence, VeinSecured allows the coexistence of multiple virtual protection rings for multiple customers across the same set of IPSs. Therefore, a single IPS may act at different levels with respect to the customers it protects, as depicted in Fig. 3. Although most of the figures in this paper represent overlay networks with a single route from an ISP to a customer, this figure highlights that alternative paths are possible. However, as discussed in the previous section, the rings are dependent on the routing at a certain time, which is quite stable compared to the typical duration of flooding attacks, and so only the current route is considered for building the rings.

Fig.3. VeinSecured with two customers: C1 and C2.

III. VeinSecured SYSTEM

VeinSecured Components

The VeinSecured system is composed of several collaborating IPSs, each enriched with the following components.

1) Packet Processor: The packet processor examines traffic and updates elementary metrics (counters and frequencies) whenever a rule is matched.

2) Metrics Manager: The metrics manager computes entropies and relative entropies.

3) Selection Manager: The detection_window_ended event (Fig. 1) is processed by the selection manager, which checks whether the traffic during the elapsed detection window was within profile. It does so by checking whether the traffic distribution represented by the frequencies follows the profile.
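A minimal sketch of the entropy the metrics manager computes and the high/low case analysis the score manager applies to a selected rule (detailed next); the thresholds alpha and beta and all names here are illustrative assumptions:

```python
import math

def norm_entropy(freqs):
    """Shannon entropy of the rule-frequency distribution, normalized to [0, 1]."""
    h = -sum(f * math.log2(f) for f in freqs if f > 0)
    return h / math.log2(len(freqs)) if len(freqs) > 1 else 0.0

def score_case(freqs, rule, alpha=0.5, beta=0.5):
    """Map one selected rule to one of the four cases (a-d) discussed below;
    alpha and beta are the entropy and frequency thresholds."""
    high_entropy = norm_entropy(freqs) > alpha
    high_freq = freqs[rule] > beta
    return {(True, True): "a", (False, True): "b",
            (True, False): "c", (False, False): "d"}[(high_entropy, high_freq)]

# One rule dominating the traffic gives low entropy and high frequency
print(score_case([0.97, 0.01, 0.01, 0.01], 0))  # prints "b"
```

With uniform frequencies the entropy is maximal and each individual frequency is low, which lands in case c of the discussion below.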


4) Score Manager: The score manager assigns a score to each of the selected rules depending on their frequencies and the entropy. The entropy and the frequency are considered high if they are greater than the thresholds alpha and beta, respectively.

a) High entropy and high rule frequency: In this case, the traffic is well distributed, meaning that most rules have about the same frequency (they cannot all be high, as the sum is one). Hence, having one rule that is quite different from the others is a good sign that it is a potential attack.

b) Low entropy and high rule frequency: In this case, the attack is only potential, but not as likely as when the entropy is high.

c) High entropy and low rule frequency: This case represents a potential threat. Here, all frequencies are about the same, making the rule not a threat yet, as its frequency is low. However, since it is increasing and deviates from the profile (first selection by the selection manager) [(5) and (6)], it may surpass other frequencies later in time.

d) Low entropy and low rule frequency: This case includes both high and low frequencies because of the low entropy. Thus, it is not possible to conclude anything about a threat.

5) Collaboration Manager: The collaboration manager is the last component, in charge of confirming potential attacks. We claim that a flooding attack can be confirmed only if the traffic it generates is higher than the customer's capacity. Hence, the IPS where the alert is triggered has to initiate a ring-level communication to calculate the average traffic throughput for subsequent comparison with the subscriber's capacity.

IV. VeinSecured ATTACK DETECTION ALGORITHMS

For each selected rule ri, the collaboration manager computes


the corresponding packet rate using the rule frequencies and the overall bandwidth (bwm) consumed during the last detection window. If the rate is higher than the rule capacity capi, an alert is raised. Otherwise, the computed rate is sent to the next IPS on the ring (Algorithm 1).

Algorithm 1: checkRule(IPS_id, i, ratei, capi)
1: if bi ∧ (IPS_id ≠ null) then
2:   if IPS_id == myID then
3:     bi = false
4:     return
5:   else
6:     ratei <- ratei + Fi
7:     if ratei > capi then
8:       bi = false
9:       raise DDoS alert
10:      return
11:    else
12:      nextIPS.checkRule(IPS_id, i, ratei, capi)
13:    end if
14:  end if
15: else
16:   bi = true
17:   nextIPS.checkRule(myID, i, 0, capi)
18: end if

When an IPS receives a request to calculate the aggregate packet rate for a given rule, it first checks whether it was the initiator. In that case, it deduces that the request has already made the round of the ring, and hence there is no potential attack. Otherwise, it calculates the new rate by adding in its own rate and checks whether the maximum capacity is reached, in which case an alert is raised. Otherwise, the investigation is delegated to the next horizontal IPS on the ring. Algorithm 1 shows the details of this procedure. It is initially called with an empty IPS_id. The first IPS fills it in and sets the Boolean bi to true (line 16). bi is reset after the computation finishes, i.e., when the request has made the round of the ring or when the alert is triggered. With simple adjustments, ring traversal overhead can further be reduced if several suspect rules are investigated in one pass.
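The ring traversal of Algorithm 1 can be sketched in executable form as below. Class and attribute names are assumptions, and the static local rates stand in for the Fi values computed from rule frequencies:

```python
class RingIPS:
    """One IPS on a protection ring (sketch of Algorithm 1)."""
    def __init__(self, ips_id, local_rate):
        self.ips_id = ips_id
        self.local_rate = local_rate  # Fi: locally measured rate for the rule
        self.busy = False             # the bi flag from Algorithm 1
        self.next_ips = None          # next IPS on the same ring

    def check_rule(self, initiator_id, rate, cap):
        if initiator_id is None:             # lines 15-17: initiate the round
            self.busy = True
            return self.next_ips.check_rule(self.ips_id, 0, cap)
        if initiator_id == self.ips_id:      # request made the full round
            self.busy = False
            return None                      # no potential attack
        rate += self.local_rate              # line 6: add the local rate
        if rate > cap:                       # line 7: capacity exceeded
            return "DDOS_ALERT"
        return self.next_ips.check_rule(initiator_id, rate, cap)

# Three IPSs on one ring protecting a customer with capacity 80 packets/s
a, b, c = RingIPS("A", 30), RingIPS("B", 40), RingIPS("C", 50)
a.next_ips, b.next_ips, c.next_ips = b, c, a
print(a.check_rule(None, 0, 80))  # 40 + 50 > 80 along the ring -> DDOS_ALERT
```

As in Algorithm 1, the initiator sends a rate of 0 and the traversal stops either when the aggregate rate exceeds the capacity or when the request comes back to the initiator.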


V. MITIGATION

A. Mitigation Shields

When an attack is detected, VeinSecured rings form protection shields around the victim. In order to block the attack as close as possible to its source(s), the IPS that detects the attack informs its upper-ring IPSs (upstream IPSs), which in turn apply the vertical communication process and enforce the protection at their ring level (Algorithm 2). To extend the mitigation, the IPS that detects the attack also informs its peer IPSs on the same ring, so that they block traffic related to the corresponding rule. This is done by forwarding the information in the same manner as done by the collaboration manager (Algorithm 1). Only traffic from suspected sources (i.e., sources that triggered some rule ri) is blocked, as shown in Fig. 7. This is performed by the block_IPs function in Algorithm 2, line 5.

Algorithm 2: mitigate(ri, firstRing)
1: for all ips ∈ upstreamIPSs do
2:   ips.mitigate(ri, False)
3: end for
4: for all a ∈ getAddr(ri) do
5:   block_IPs(a)
6: end for
7: if firstRing == True then
8:   nextIPS.mitigate(ri, True)
9: end if
10: setCautiousMode(ri)

This process entails the potential blocking of benign addresses. However, this is a temporary cost that is difficult to avoid if a flooding attack is to be stopped. For this reason, after the detection and mitigation of an attack against some host, VeinSecured continues the detection process, looking for additional attack sources. Furthermore, in order to limit the effect of potentially additional attack sources, after the blocking period elapses, the IPS may activate a cautious-mode phase wherein a rate limitation on packets corresponding to the triggered rule is applied.
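The propagation in Algorithm 2 (vertical to upstream rings, horizontal to ring peers, then cautious mode) can be sketched as follows. The class layout and names are assumptions, and the cautious-rule set doubles as the visited marker that terminates the horizontal round:

```python
class MitigatingIPS:
    """Sketch of Algorithm 2: block suspected sources and propagate."""
    def __init__(self, name):
        self.name = name
        self.upstream = []     # IPSs on the next outer ring
        self.peer = None       # next IPS on the same ring
        self.blocked = set()   # blocked source addresses
        self.cautious = set()  # rules in cautious (rate-limit) mode

    def mitigate(self, addrs, rule_id, first_ring):
        self.cautious.add(rule_id)       # line 10; also stops the ring loop
        self.blocked |= set(addrs)       # lines 4-6: block suspected sources
        for ips in self.upstream:        # lines 1-3: vertical propagation
            ips.mitigate(addrs, rule_id, False)
        if first_ring and self.peer and rule_id not in self.peer.cautious:
            self.peer.mitigate(addrs, rule_id, True)  # lines 7-9: ring peers

# Two first-ring IPSs, each with one upstream IPS
a, b = MitigatingIPS("A"), MitigatingIPS("B")
ua, ub = MitigatingIPS("UA"), MitigatingIPS("UB")
a.peer, b.peer = b, a
a.upstream, b.upstream = [ua], [ub]
a.mitigate({"203.0.113.9"}, "r1", True)
print(all("203.0.113.9" in ips.blocked for ips in (a, b, ua, ub)))  # True
```

Marking the rule as cautious before contacting the peer is what guarantees that the horizontal propagation makes at most one round of the ring.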


The actual duration of the blocking and caution period depends on the aggressiveness of the attack, i.e., on the difference between the observed packet rate ratei and the host capacity capi.

B. Careful Mitigation

This section gives an overview of common techniques to improve attack mitigation by blocking only attack-related IP sources. Only sources associated with high packet rates, or that have opened most of the sessions recently, might be blocked. Moreover, identifying not-yet-seen IP addresses is another way to detect potential spoofed addresses or zombies used to perform a DDoS attack. Other heuristics have been proposed based on the difference between incoming and outgoing traffic. A solution could be to capture all traffic associated with an alert triggered by the score manager and use signatures to clearly identify an attack. Furthermore, a general blacklist can be imported from external databases, like SpamHaus, which stores IP addresses related to spam, meaning that they are probably zombie computers. Nonassigned IP addresses or abnormal source IP addresses (multicast, private addresses, …) could also be a starting point for such blacklisting.

VI. EVALUATION

The objective of the experiments is to evaluate the accuracy of VeinSecured in different configurations. Furthermore, the robustness of VeinSecured is evaluated in abnormal situations, such as the existence of noncooperative routers or configuration errors.

A. Simulations

Although obtaining real router traces is possible, getting synchronized traffic and host states of a real network along with its detailed topology is quite difficult for security, privacy, and legal reasons. Thus, we mainly used a simulation-based approach for the evaluation of the VeinSecured system.

B. Metrics

The true positive rate (TPR) measures the proportion of rightly detected attacks. The false


positives (FPs) counter represents the amount of benign traffic wrongly flagged as malicious. As previously described, horizontal communication discards all of them by computing the real packet rates. However, the number of rules used to analyze the traffic has to be as low as possible, and so we consider the misselected rules as false positives. From a practical point of view, this corresponds to taking the output of the score manager (Section III) as the final result. In VeinSecured, an alert pertains to rules and may only be generated following the elapse of a detection window. Thus, both the TPR (a proportion) and the FPs (an absolute value) are computed on a time-window basis.

C. Efficiency of the MultiLevel Approach

Fig.4. False positives reduction according to manager activity.

Fig. 4 plots the relative number of FPs compared to the value obtained if no system is used. The first value represents the results when both the selection and score managers are enabled; the second value is obtained when only the selection manager is enabled. The threshold is fixed to have a detection rate higher than 0.9. The selection manager reduces the number of FPs by more than 50%, whereas the score manager is generally less efficient. However, it can be noticed that 49 FPs are avoided when a five-ring shield is used. The reduction of false alerts is more important for simulations with a lower number of virtual rings.

D. Percentage of Collaborative Routers

VeinSecured effectiveness relies on the collaboration between different IPSs. Since a real deployment of such a system is expected to be incremental, we provide here a way to check its performance when only a few routers support it. A router that does not support VeinSecured is


referred to as noncollaborative. We study two types of noncollaborative routers. The first type consists of routers that cannot perform detection but can forward score packets to downstream routers. An operator could use this type of router to test VeinSecured on only a few routers while still ensuring the IPS collaboration. The second type of routers acts as black holes and does not forward score packets. This can be due to software or hardware limitations, or to the fact that the routers have been compromised in preparation for a future attack.

VII. CONCLUSION AND FUTURE WORKS

The core of the VeinSecured system is composed of IPSs located at the ISP level. The IPSs form virtual protection rings around the hosts they defend and collaborate by exchanging selected traffic information. A real dataset is presented, showing effectiveness and low overhead. The IPSs mostly have to operate on encrypted packets, where per-packet analysis is complicated; in contrast to host-based IDSs, they do not see the impact of an attack. Being offered as an added-value service to customers, the accounting for VeinSecured is facilitated, which represents a good incentive for its deployment by ISPs. As future work, we plan to extend VeinSecured to support different IPS rule structures.

REFERENCES

[1] M. Vallentin, R. Sommer, J. Lee, C. Leres, V. Paxson, and B. Tierney, "The NIDS cluster: Scalable, stateful network intrusion detection on commodity hardware," in Proc. 10th RAID, Sep. 2007, pp. 107–126.
[2] G. Badishi, A. Herzberg, and I. Keidar, "Keeping denial-of-service attackers in the dark," IEEE Trans. Depend. Secure Comput., vol. 4, no. 3, pp. 191–204, Jul.–Sep. 2007.
[3] A. Yaar, A. Perrig, and D. Song, "SIFF: A stateless internet flow filter to mitigate DDoS flooding attacks," in Proc. IEEE Symp. Security Privacy, May 2004, pp. 130–143.


[4] D. Nashat, X. Jiang, and S. Horiguchi, "Router based detection for low-rate agents of DDoS attack," in Proc. HSPR, May 2008, pp. 177–182.
[5] H. Wang, D. Zhang, and K. Shin, "Change-point monitoring for the detection of DoS attacks," IEEE Trans. Depend. Secure Comput., vol. 1, no. 4, pp. 193–208, Oct.–Dec. 2004.
[6] P. Verkaik, O. Spatscheck, J. Van der Merwe, and A. C. Snoeren, "Primed: Community-of-interest-based DDoS mitigation," in Proc. ACM SIGCOMM LSAD, 2006, pp. 147–154.
[7] G. Koutepas, F. Stamatelopoulos, and B. Maglaris, "Distributed management architecture for cooperative detection and reaction to DDoS attacks," J. Netw. Syst. Manage., vol. 12, pp. 73–94, Mar. 2004.
[8] I. B. Mopari, S. G. Pukale, and M. L. Dhore, "Detection of DDoS attack and defense against IP spoofing," in Proc. ACM ICAC3, 2009, pp. 489–493.
[9] A. El-Atawy, E. Al-Shaer, T. Tran, and R. Boutaba, "Adaptive early packet filtering for defending firewalls against DoS attacks," in Proc. IEEE INFOCOM, Apr. 2009, pp. 2437–2445.
[10] H. Hamed, A. El-Atawy, and E. Al-Shaer, "Adaptive statistical optimization techniques for firewall packet filtering," in Proc. IEEE INFOCOM, Apr. 2006, pp. 1–12.
[11] A. El-Atawy, T. Samak, E. Al-Shaer, and H. Li, "Using online traffic statistical matching for optimizing packet filtering performance," in Proc. IEEE INFOCOM, May 2007, pp. 866–874.
[12] D. Das, U. Sharma, and D. K. Bhattacharyya, "Detection of HTTP flooding attacks in multiple scenarios," in Proc. ACM Int. Conf. Commun., Comput. Security, 2011, pp. 517–522.
[13] H. Liu, "A new form of DOS attack in a cloud and its avoidance mechanism," in Proc. ACM Workshop Cloud Comput. Security, 2010, pp. 65–76.
[14] A. Sardana, R. Joshi, and T. hoon Kim, "Deciding optimal entropic thresholds to calibrate the detection


mechanism for variable rate DDoS attacks in ISP domain," in Proc. ISA, Apr. 2008, pp. 270–275.
[15] B. Gupta, M. Misra, and R. Joshi, "FVBA: A combined statistical approach for low rate degrading and high bandwidth disruptive DDoS attacks detection in ISP domain," in Proc. 16th IEEE ICON, Dec. 2008, pp. 1–4.
[16] J. L. Berral, N. Poggi, J. Alonso, R. Gavaldà, J. Torres, and M. Parashar, "Adaptive distributed mechanism against flooding network attacks based on machine learning," in Proc. ACM Workshop Artif. Intell. Security, 2008, pp. 43–50.


AN APOTHEOSIS EXTRACTION APPROACH BY GENETIC PROGRAMMING

Swati Shahi#1, Priyanka Tidke#2, Shital Vanis#3, Jayshree Sangale#4
Sir Visvesvaraya Institute of Technology, Nashik#1, 2, 3, 4

Abstract— The World Wide Web is a tremendous source for digital libraries and e-commerce websites, and these digital libraries and e-commerce websites contain much duplicate data. Due to this duplication, users at different locations are unable to get appropriate results from a search engine. Previously, many techniques were available for removing such duplications, but they require more space, time, and computational resources in their implementation. To increase the data quality of digital libraries, we implement a new approach in the present system, which we call genetic programming. Genetic programming contains three main internal operations: selection, crossover, and mutation, all performed in an internal database. When executed, genetic programming applies a de-duplication function. This execution removes the duplicated records, and the system then applies the suggested function. When the suggested function works properly, it has lower complexity and provides alignment-based results, whereas previous approaches do not yield quality data. Compared to all previous approaches, the present approach imposes less burden and provides efficient and accurate results.

Keywords— De-duplication, Evolutionary Programming, TSIMMIS, Web OQL

I. INTRODUCTION

As we know, extracting structured data from any dynamic Web page is not easy, because


their internal structure is unknown to the system and database. Many previous approaches exist to solve this type of problem, but all of them have some demerits due to multiple dependencies. We also know that the WWW hosts more and more online databases, and their number is increasing day by day; hence data duplication is occurring very fast. When a query is submitted to a database, the information is retrieved and then extracted; but as the amount of data in the database grows rapidly, the previous approaches have not been updated, so it becomes very hard to detect duplicate data and extract non-duplicated data in an effective manner.

When data is uploaded into the database from several other locations, there is a chance of duplicate data. At any time, there may be multiple users who use the same type of links, and all the links carry some content. When a link is duplicated more than once, memory and space are wasted due to this duplication; hence performance decreases and computational cost increases automatically. Our main purpose here is therefore to develop an apotheosis extraction approach that removes duplicated data and displays quality-based, effective results.

As the name genetic programming suggests, this approach works naturally and detects repeated data. First, the evaluation is done by assigning each individual a value that measures how suitable that individual is for the proposed problem. In our GP experimental environment, individuals are evaluated on how well they learn to predict good answers to a given problem, using the set of functions and terminals available. The resulting value is also called raw fitness, and the evaluation functions are called fitness functions. After the evaluation step, each solution has a fitness value that measures how good or bad it is for the given problem. Thus, by using this


value, it is possible to select which individuals should be in the next generation. Strategies for this selection may involve very simple or complex techniques, varying from just selecting the best n individuals to randomly selecting individuals proportionally to their fitness. So, in this paper, we present the full survey, the architecture, and the internal and external workings.

II. LITERATURE SURVEY

Removing duplicated data using genetic programming is a vast and interesting topic in the field of data mining. Today, this problem arises mainly when data are collected from many different sources using different information description styles and metadata standards. Another common place for replicas is in data repositories created from OCR documents. These situations can lead to inconsistencies that may affect many systems, such as those that depend on searching and mining tasks. To remove these problems, it is necessary to design a de-duplication function that combines the information available in the data repositories in order to identify whether a pair of record entries refers to the same real-world entity.

In the realm of bibliographic citations, for instance, this problem was extensively discussed by Lawrence et al. They propose a number of algorithms for matching citations from different sources based on edit distance, word matching, phrase matching, and subfield extraction. As more strategies for extracting disparate pieces of evidence become available, many works have proposed new, distinct approaches to combine and use them. Elmagarmid et al. classify these approaches into the following two categories: 1) Ad hoc or domain knowledge approaches: this category includes approaches that usually depend on specific domain knowledge or specific string distance metrics. Techniques


that make use of declarative languages can also be classified in this category; 2) Training-based approaches: this category includes all approaches that depend on some sort of training (supervised or semi-supervised) in order to identify the replicas. Probabilistic and machine learning approaches fall into this category.

Next, we briefly comment on some works based on these two approaches (domain knowledge and training-based), particularly those that exploit domain knowledge and those that are based on probabilistic and machine learning techniques, which are the ones most related to our work.

Active Atlas is a system whose main goal is to learn rules for mapping records from two distinct files in order to establish relationships among them. During the learning phase, the mapping rules and the transformation weights are defined. The process of combining the transformation weights is executed using decision trees. This system differs from the others in the sense that it tries to reduce the amount of necessary training, relying on user-provided information about the most relevant cases for training. Before Marlin, this system was the state-of-the-art solution for the problem.

An approach distinct from the previous ones is presented in. The main idea is to generate individual rankings for each field based on generated similarity scores. The distance between these rankings is calculated using the well-known Spearman's footrule metric, which is minimized by a modified version of the Hungarian algorithm specifically tailored to this problem by the authors. Then, a merge algorithm based on a score scheme is applied to the resulting rankings. At the end of this process, the top records in this global ranking are considered to be the most similar to the input record. Notice that this approach requires no training.
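The edit-distance evidence used by such citation-matching algorithms can be sketched in a few lines of Python; the `looks_like_same_citation` helper and its 0.25 threshold are illustrative assumptions of ours, not details from the paper:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed by dynamic programming,
    keeping only the previous row to save memory."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution
        prev = cur
    return prev[n]

def looks_like_same_citation(c1: str, c2: str, threshold: float = 0.25) -> bool:
    """Treat two citation strings as probable duplicates when their
    normalized edit distance falls below a (hypothetical) threshold."""
    d = edit_distance(c1.lower(), c2.lower())
    return d / max(len(c1), len(c2), 1) <= threshold
```

A real matcher would combine this with the word-, phrase- and subfield-level evidence the survey mentions rather than rely on one metric.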


Unfortunately, the experiments conducted do not evaluate the quality of the global ranking with respect to record matching effectiveness.

In this project, we propose a GP-based approach to improve the results produced by Fellegi and Sunter's method. In particular, we use GP to balance the weight vectors produced by that statistical method, in order to generate a better evidence combination than the simple summation it uses. In comparison with our previous results, this paper presents a more general and improved GP-based approach to de-duplication, which is able to automatically generate effective de-duplication functions even when a suitable similarity function for each record attribute is not provided in advance. In addition, it also adapts the suggested functions to changes in the replica identification boundary values used to classify a pair of records as replicas or not. These two characteristics are extremely important, since they free the user from the burden of having to select the similarity function to use with each attribute required for the de-duplication task and of tuning the replica identification boundary accordingly.

2.1 History

Detection of duplicated records is the process of identifying different or multiple records that refer to one unique real-world entity or object. Typically, the process of duplicate detection is preceded by a data preparation stage, during which data entries are stored in a uniform manner in the database, resolving (at least partially) the structural heterogeneity problem. The data preparation stage includes a parsing step, a data transformation step, and a standardization step. The approaches that deal with data preparation are also described under the term ETL (Extraction, Transformation, Loading). These steps improve the quality of the in-flow data and make the data comparable and more usable. While data


preparation is not the focus of this survey, for completeness we briefly describe the tasks performed in that stage. A comprehensive collection of papers related to various data transformation approaches can be found in.

Parsing is the first critical component in the data preparation stage. Parsing locates, identifies and isolates individual data elements in the source files. Parsing makes it easier to correct, standardize, and match data because it allows the comparison of individual components rather than of long, complex strings of data. For example, the appropriate parsing of name and address components into consistent packets of information is a crucial part of the data cleaning process. Multiple parsing methods have been proposed recently in the literature, and the area continues to be an active area of research.

Data transformation refers to simple conversions that can be applied to the data in order for them to conform to the data types of their corresponding domains. In other words, this type of conversion focuses on manipulating one field at a time, without taking into account the values in related fields. The most common form of simple transformation is the conversion of a data element from one data type to another. Such a data type conversion is usually required when a legacy or parent application stored data in a data type that made sense within the context of the original application, but not in a newly developed or subsequent system. Renaming a field from one name to another is considered data transformation as well. Encoded values in operational systems and in external data are another problem addressed at this stage: these values should be converted to their decoded equivalents, so records from different sources can be compared in a uniform manner. Range checking is yet another kind of data transformation, which involves examining data in a field to ensure that it falls within the expected range, usually a numeric or date range.
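The type-conversion and range-checking transformations described above can be sketched as follows (the function names and fallback behaviour are our own illustrative choices, not from the paper):

```python
def to_int(value, default=None):
    """Simple data-type conversion: coerce a raw legacy field to int,
    falling back to a default when the value cannot be parsed."""
    try:
        return int(str(value).strip())
    except ValueError:
        return default

def in_range(value, lo, hi):
    """Range checking: confirm that a converted numeric field falls
    inside the expected interval [lo, hi]."""
    return value is not None and lo <= value <= hi
```

A cleaning pipeline would typically apply the conversion first and route out-of-range or unparseable records to manual review.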


Lastly, dependency checking is slightly more involved, since it requires comparing the value in a particular field to the values in another field, to ensure a minimal level of consistency in the data.

2.2 Purpose

Data standardization refers to the process of standardizing the information represented in certain fields to a common content format. It is used for information that can be stored in many different ways in various data sources and must be converted to a uniform representation before the duplicate detection process starts. Without standardization, many duplicate entries could erroneously be designated as non-duplicates, based on the fact that common identifying information cannot be compared.

One of the most common standardization applications involves address information. There is no single standardized way to capture addresses, so the same address can be represented in many different ways. Date and time formatting and name and title formatting pose other standardization problems in a database. Typically, when operational applications are designed and constructed, there is very little uniform handling of date and time formats across applications. Because most operational environments have many different formats for representing dates and times, there is a need to transform dates and times into a standardized format. Name standardization identifies components such as first names, last names, titles and middle initials, and records everything using some standardized convention. Data standardization is a rather inexpensive step that can lead to fast identification of duplicates. For example, if the only difference between two records is the differently recorded address (44 West Fourth Street vs. 44 W4th St.), then the data standardization step would make the two records identical, alleviating the need for the more expensive approximate matching approaches described in later sections.
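The address example above can be made concrete with a small sketch; the abbreviation table here is a hypothetical stand-in for the much larger, locale-specific dictionaries real standardizers use:

```python
# Hypothetical abbreviation table; a production system would use a
# much larger, locale-specific dictionary.
ABBREV = {"street": "st", "st.": "st", "west": "w", "w.": "w",
          "fourth": "4th", "avenue": "ave"}

def standardize_address(addr: str) -> str:
    """Lower-case the address, drop commas, and map each token through
    the abbreviation table so equivalent spellings compare equal."""
    tokens = addr.lower().replace(",", " ").split()
    return " ".join(ABBREV.get(t, t) for t in tokens)
```

Fused forms such as "W4th" would additionally need token splitting, which this sketch omits.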


III. EXISTING SYSTEM

The main problems in the existing system are:

• Data complexity: Because online databases increase day by day, a huge amount of data exists on the WWW. Owing to this complexity, previous approaches (such as the vision-based approach, page-level extraction, TSIMMIS, and Web OQL) do not work effectively, because they are inefficient and time-consuming.

• Web page programming language and version dependency: All previous approaches to web page extraction were HTML dependent. Take, for example, a website such as that of Mumbai University. Two years ago, one administrator maintained the website using HTML version 3.0. After one year, a new administrator was appointed who maintained it using HTML version 4.0, and now yet another administrator maintains it using other web page languages such as XHTML and XML. Will data be extracted effectively? Of course not. This is the problem of web page programming language and version dependency.

• Scripting dependency: Most previous work has not considered scripts such as JavaScript, VBScript and CSS, so extraction may fail on script-heavy pages.

• Record duplication: Importantly, when data is uploaded from different locations, there is a chance of data duplication. If we consider any digital library or search site such as Google, Yahoo or Microsoft, there exists much unwanted data. One data item repeats so


many times, and hence much space is wasted. Because of this, processing time is very high when a query is submitted to the database. Due to the above problems, the main disadvantages are operational and computational cost, the display of useless data, and high processing time.

IV. PROPOSED SYSTEM

This apotheosis extraction approach introduces genetic programming. Whether the same data record is present in two or more repositories is checked with a record de-duplication function. It removes duplicate data and delivers the data as a small dataset of content information. After unique records are identified, patterns are generated from their content, and those patterns arrange the data with the help of the suggested function. The suggested function for arranging the results uses only arithmetic functions, and it gives a good, effective arrangement of content. This is the effective de-duplication-based result approach.

Advantages:
a. It can remove the maximum of duplicate-based results.
b. It provides effective results with the help of machine learning techniques.
c. It gives the best evidence-based results.

V. FUNCTIONAL REQUIREMENTS

These are the functional requirements:
a. Admin can upload data into multiple databases.
b. Users can search the data in multiple databases.
c. Users can view the duplicated data.
d. The GP-based approach is applied to find data without duplicates.
e. Both users and admin can measure the searching time.
f. Admin can view user information and change the password.

VI. NON-FUNCTIONAL REQUIREMENTS

6.1 Performance Requirements


The system has been developed in high-level languages using advanced front-end and back-end technologies, so it responds to the end user within a very short time.

6.2 Safety Requirements
The system extracts safely even if something goes wrong, such as a power failure or a network communication fault.

6.3 Usability Requirements
The system is designed as a fully automated process, so little user intervention is needed.

6.4 Reliability Requirements
The system is more reliable because of the qualities inherited from the chosen platform, Java. Code built using Java is more reliable.

6.5 Supportability Requirements
The system is designed to be cross-platform supportable. It is supported on a wide range of hardware and software platforms.

6.6 Implementation Requirements
Apache Tomcat is used as the web server and Windows XP Professional as the platform.

VII. EXTERNAL INTERFACE REQUIREMENTS

7.1 User Interfaces
In this project, the user interface is built with HTML, CSS and the Tomcat server.

7.2 Hardware Interfaces
Processor: Pentium IV
Hard Disk: 40 GB
RAM: 512 MB or more

7.3 Software Interfaces
Operating System: Windows
Programming Language: Java
Web Applications: JDBC, Servlets, JSP
IDE/Workbench: MyEclipse 6.0
Database: Oracle 10g

VIII. GENETIC PROGRAMMING

We have used genetic programming in this extraction approach. As the name genetic suggests, data is selected automatically and duplicated data is detected with the help of the relevant programming.
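To make the tree representation used by this approach concrete, a GP individual combining similarity evidence through arithmetic operators might be encoded as nested tuples; the terminal names (`name_sim`, `addr_sim`) and this encoding are our illustrative assumptions, not code from the paper:

```python
import operator

# Function set: arithmetic operators placed at internal tree nodes.
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, terminals):
    """Recursively evaluate a GP tree given as nested tuples.
    Leaves are terminal names (pieces of similarity evidence) or
    numeric constants; internal nodes are (op, left, right)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, terminals),
                             evaluate(right, terminals))
    if isinstance(tree, str):
        return terminals[tree]   # named terminal, e.g. a field similarity
    return tree                  # numeric constant

# Example individual: name_sim * 2 + addr_sim
individual = ("+", ("*", "name_sim", 2), "addr_sim")
```

Comparing the evaluated score against a replica identification boundary would then classify a record pair as duplicate or not.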


Genetic programming is an evolutionary programming technique with the properties of natural selection, crossover and mutation. The main aspect that distinguishes genetic programming from other evolutionary techniques is that it represents the concept and interpretation of a problem as a computer program, and even the data are viewed and manipulated in this way. Genetic programming is able to discover the variables, relate them to each other, and find the correct functional form. It has mainly three operations: selection, crossover and mutation, all of which are included in the algorithm.

IX. SYSTEM ARCHITECTURE

This section describes the system architecture for this project.

[Figure 1: Selection]

Usually, GP evolves a population of length-free data structures, also called individuals, each one representing a single solution to a given problem. In our modeling, the trees represent arithmetic functions, as illustrated in Fig. 1. When using this tree representation in a GP-based method, a set of terminals and functions should be defined. Terminals are inputs, constants or zero-argument nodes that terminate a branch of a tree; they are also called tree leaves. The function set is the collection of operators, statements, and basic or user-defined functions that can be used by the GP evolutionary process to manipulate the terminal values. These functions are placed in the internal nodes of the tree, as illustrated in Fig. 1. During the evolutionary process, the individuals are handled and modified by genetic operations such as reproduction, crossover, and mutation, in an iterative way that is expected to spawn better individuals (solutions to the proposed problem) in the subsequent generations. Reproduction is the operation that copies individuals without


modifying them. Usually, this operator is used to implement an elitist strategy, adopted to keep the genetic code of the fittest individuals across the generations: if a good individual is found in an early generation, it will not be lost during the evolutionary process.

[Figure 2: Crossover]

The crossover operation allows genetic content (e.g. subtrees) to be exchanged between two parents, in a process that can generate two or more children. In a GP evolutionary process, two parent trees are selected according to a matching (or pairing) policy and then a random subtree is selected in each parent. Child trees result from swapping the selected subtrees between the parents, as illustrated in Fig. 2.

[Figure 3: Mutation]

Finally, the mutation operation has the role of keeping a minimum diversity level among individuals in the population, thus avoiding premature convergence. Every solution tree resulting from the crossover operation has an equal chance of undergoing mutation. In a GP tree representation, a random node is selected and the corresponding subtree is replaced by a new, randomly created subtree, as illustrated in Fig. 3.

X. MODULES DESCRIPTION

The modules of the project are:
1. Procedure of genetic programming
2. Genetic operations


3. Generational evolutionary algorithm
4. Record de-duplication
5. Precision and recall operations

10.1 Procedure of Genetic Programming:
Multiple users send queries and extract results from the search engine. During the extraction of results, the operation applied first is selection. Selection is performed over the different databases and extracts results through interactive query processing. It does not provide an optimal solution, and these results contain some duplicates; it displays only near-optimal results.

10.2 Genetic Operations:
This module has been developed to provide structure-based results. The first step is the selection of the root terminals; this gives the zero-level results. Next, we find the next level of children, and this procedure is applied until the leaf nodes are reached for the extraction of results. In this procedure, all internal nodes are identified in the implementation, and the structure is created with three operations: selection, crossover and mutation.

10.3 Generational Evolutionary Algorithm:
This algorithm initializes all the result nodes. The rating of each node is calculated, and fitness is computed from the rating value. After the fitness of a node is found, a reproduced tree containing the best nodes can be created in the implementation. The same process is applied repeatedly until the optimal tree is identified.

10.4 Record De-duplication:
This step adapts automatically according to the requirement in the implementation. It shows the efficient results in a tree data structure and produces a good evidence-based display of results.
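The genetic operations of Section 10.2 can be sketched over nested-tuple trees; this is a simplified illustration assuming binary internal nodes, with helper names of our own:

```python
import random

def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs for every node of a nested-tuple tree."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from nodes(tree[1], path + (1,))
        yield from nodes(tree[2], path + (2,))

def replace(tree, path, sub):
    """Return a copy of `tree` with the subtree at `path` replaced by `sub`."""
    if not path:
        return sub
    op, left, right = tree
    if path[0] == 1:
        return (op, replace(left, path[1:], sub), right)
    return (op, left, replace(right, path[1:], sub))

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen subtree between the two parents."""
    pa, sa = rng.choice(list(nodes(parent_a)))
    pb, sb = rng.choice(list(nodes(parent_b)))
    return replace(parent_a, pa, sb), replace(parent_b, pb, sa)

def mutate(tree, new_subtree_factory, rng=random):
    """Replace a randomly chosen subtree with a freshly generated one."""
    p, _ = rng.choice(list(nodes(tree)))
    return replace(tree, p, new_subtree_factory())
```

Selection would simply sample parents in proportion to their fitness values before these operators are applied.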


All those nodes are displayed with the help of a similarity function in the implementation process.

10.5 Precision and Recall Operations:
These two operations are performed only for correctly identified duplicated data:

P = number of correctly identified duplicated pairs / number of identified duplicated pairs

R = number of correctly identified duplicated pairs / number of true duplicated pairs

XI. ALGORITHM USED

Algorithm: Generational evolutionary algorithm
• First, all the gathered data is initialized, and from it the data is discovered.
• Then all the individuals are evaluated and a numeric fitness value is assigned to each one.
• The selection process is performed, in which n individuals are selected into the next-generation population without modification.
• Then the crossover operation is performed, in which the m individuals that will compose the next generation together with the best parents are selected and replace the existing generation; that is, two parent trees are selected according to a matching policy and then a random subtree is selected in each parent.
• Finally, the mutation operation is performed, in which the best individuals are produced in the population.

XII. FUTURE WORK

As future work, there are many directions in which this system can be enhanced. To this end, we plan experiments with data sets from other domains. More specifically, we intend to investigate the situations in which our proposed GP approach would not be the most adequate to use. Since record de-duplication is a very expensive and computationally demanding task, it is important to know in


which cases our approach would not be the most suitable option. In addition, we intend to improve the efficiency of the GP training phase by selecting the most representative examples for training. By doing so, we can minimize the training effort required by our GP-based approach without affecting the quality of the final solution.

XIII. CONCLUSIONS

This apotheosis extraction approach is able to automatically suggest de-duplication functions based on evidence present in the data repositories. The suggested functions properly combine the best evidence available in order to identify whether two or more distinct record entries are replicas or not. The conclusions of this project are:
• It is based on a machine learning method.
• It provides a less labour-intensive solution.
• It frees the user from the burden of choosing how to combine similarity functions and repository attributes.
• It frees the user from the burden of choosing the replica identification boundary value, since it is able to automatically select the de-duplication function.
• It is independent of web page programming language, version and scripting-related extraction.

Acknowledgement

We would like to thank Prof. S. M. Rokade, Head of the Computer Engineering Department, Sir Visvesvaraya Institute of Technology, Nasik, our college project guide Prof. M. M. Naoghare, and Mr. N V V Satyanarayana, Sr. Software Engineer, Verus IT Services Pvt. Ltd., Secunderabad, for their valuable guidance and moral support, without which this paper would not have been possible. We would also like to thank all the other people who have worked earlier on this topic, as their work helped us a lot.


REFERENCES

[1] H. Zhao, W. Meng, Z. Wu, and C. Yu, "Automatic Extraction of Dynamic Record Sections from Search Engine Result Pages," Proc. 32nd Int'l Conf. Very Large Data Bases (VLDB), 2006.
[2] V. Crescenzi, P. Merialdo, and P. Missier, "Clustering Web Pages Based on Their Structure," Data and Knowledge Eng., vol. 54, pp. 279-299, 2005.
[3] B. Liu, R.L. Grossman, and Y. Zhai, "Mining Data Records in Web Pages," Proc. Int'l Conf. Knowledge Discovery and Data Mining (KDD), pp. 601-606, 2003.
[4] K. Simon and G. Lausen, "ViPER: Augmenting Automatic Information Extraction with Visual Perceptions," Proc. Conf. Information and Knowledge Management (CIKM), pp. 381-388, 2005.
[5] M. Wheatley, "Operation Clean Data," CIO Asia Magazine.
[6] N. Koudas, S. Sarawagi, and D. Srivastava, "Record Linkage: Similarity Measures and Algorithms," Proc. ACM SIGMOD Int'l Conf. Management of Data, pp. 802-803, 2006.
[7] R. Bell and F. Dravis, "Is Your Data Dirty? And Does That Matter?" Accenture White Paper, http://www.accenture.com, 2005.
[8] J.R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.


CONSTRICTION BASED PARTICLE FILTER TO DENOISE VIDEO AND TRACE MULTIPLE MOVING OBJECTS

A. R. Potdar1, V. K. Shrivastava2
1 Student, M.Tech IT, ITM, Bhilwara, Rajasthan
2 HOD & Assistant Professor, M.Tech IT, ITM, Bhilwara, Rajasthan
[email protected]
[email protected]

Abstract: Video sequences are typically corrupted by noise during acquisition and processing. The noise degrades the visual quality and also affects the efficiency of further processing such as compression and segmentation. Hence, it becomes important to remove the noise while preserving the original video content. Tracking moving objects in video sequences is a central concern in computer vision, and reliable visual tracking is indispensable in many emerging vision applications such as automatic video surveillance, human-computer interfaces and artificial intelligence. Traditionally, the tracking problem is formulated as sequential recursive estimation: given an estimate of the probability distribution of the target in the previous frame, the problem is to estimate the target distribution in the new frame using all available prior information together with the new information brought by that frame. The Kalman filter provides a good solution to the linear-Gaussian filtering problem. However, where there is nonlinearity, either in the model specification or in the observation process, different methods are needed. We consider methods known generically as particle filters, which include the Constriction


algorithm and the Bayesian bootstrap or sampling importance resampling (SIR) filter. In this paper we propose a particle filter for efficient video denoising as well as for tracking visual objects in real time. As the particle filter family contains several algorithms, we employ a Constriction approach to implement the proposed system.

Keywords: Bayesian bootstrap, Constriction, video denoising, Kalman filter, linear-Gaussian filter, nonlinearity, object tracking, particle filters

I. INTRODUCTION

Image quality enhancement has long been an area of research. As low-end imaging devices such as web-cams and cell phones become omnipresent, there is ever more need for reliable digital image and video enhancement technologies to improve their output. Noise is a dominant factor that degrades image quality. We focus on video denoising in this paper. Our goal is to attain an efficient, adaptive and high-quality video denoising algorithm that can effectively remove the real, structured noise introduced by low-end camcorders and digital cameras. Unlike artificial, additive noise, the noise in real cameras can have strong spatial correlations. This structured noise can have many different causes, including the demosaicing process in CCD cameras. We find that computer vision research and techniques are helpful in addressing these noise problems.

The particle filter (PF), a numerical technique that permits finding an approximate solution to the sequential estimation problem, has been successfully employed in many target tracking and visual tracking problems. Its success, compared with the Kalman filter, can be explained by its capability to deal with multi-modal measurement densities and non-linear observation models.
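One iteration of the SIR-style particle filtering described here can be sketched in Python; the toy 1-D transition and likelihood models used in the example are illustrative assumptions, not the paper's models:

```python
import random

def sir_step(particles, weights, transition, likelihood, z, rng=random):
    """One SIR particle-filter iteration: propagate each particle through
    the system model, reweight by the measurement likelihood of the new
    observation z, then resample proportionally to the weights."""
    particles = [transition(x, rng) for x in particles]           # predict
    weights = [w * likelihood(z, x) for w, x in zip(weights, particles)]
    total = sum(weights) or 1.0                                   # avoid /0
    weights = [w / total for w in weights]                        # normalize
    particles = rng.choices(particles, weights=weights, k=len(particles))
    return particles, [1.0 / len(particles)] * len(particles)    # resampled
```

After resampling, the particle cloud concentrates around states that explain the measurement well, which is what lets the filter track multi-modal, non-Gaussian densities.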


In visual tracking, multi-modality of the measurement density is very frequent owing to the presence of distractors, scene parts that have a look similar to the target's. The observation model, which relates the state vector to the measurements, is non-linear because the (highly redundant) image information undergoes feature extraction, a highly non-linear operation.

1.1 Non-Linear/Non-Gaussian Particle Filter

Many problems in science require estimation of the state of a system that changes over time, using a sequence of noisy measurements made on the system. In this paper, we adopt the state-space approach to modelling dynamic systems, and the focus is on the discrete-time formulation of the problem. Thus, difference equations are used to model the evolution of the system with time, and measurements are assumed to be available at discrete times. For dynamic state estimation, the discrete-time approach is widespread and convenient. The state-space approach to time-series modelling focuses attention on the state vector of a system; the state vector contains all relevant information needed to describe the system under investigation. In order to analyse and make inferences about a dynamic system, at least two models are required: first, a model describing the evolution of the state with time (the system model) and, second, a model relating the noisy measurements to the state (the measurement model).

1.2 Non-Local Means Approach

Nonlocal means (NL-means) is an image denoising technique that replaces every pixel by a weighted average of all the pixels in the image. Unfortunately, the method requires the computation of the weighting terms for all possible pairs of pixels, making it computationally expensive. Some short-cuts assign a weight of zero to any pixel pairs whose neighbourhood averages are too dissimilar.
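The NL-means averaging just described can be illustrated with a toy 1-D sketch; the patch radius and filtering parameter h are arbitrary illustrative values:

```python
import math

def nl_means_1d(signal, patch_radius=1, h=0.5):
    """Toy 1-D NL-means: every sample becomes a weighted average of all
    samples, with weights driven by the similarity of their patches."""
    n = len(signal)

    def patch(i):
        # Neighbourhood around sample i, clamped at the borders.
        return [signal[min(max(i + d, 0), n - 1)]
                for d in range(-patch_radius, patch_radius + 1)]

    out = []
    for i in range(n):
        pi = patch(i)
        # Weight of sample j: similarity between patch(i) and patch(j).
        weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(pi, patch(j))) / h ** 2)
                   for j in range(n)]
        z = sum(weights)
        out.append(sum(w * s for w, s in zip(weights, signal)) / z)
    return out
```

Even this toy version is O(n^2) in the number of samples, which is exactly the cost the short-cuts mentioned above try to reduce.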


Noise filtering is one of the most important operations employed in image and video processing. Its importance for dynamic pictures is steadily growing with the increasing use of television and video systems in consumer, commercial, medical and communication applications. For a video sequence, the common degradation model may be expressed as

Yt = Xt + Vt

where Xt and Yt denote the original and observed image frames at time t, and Vt is additive random noise with a Gaussian probability distribution, Vt ~ N(0, Ct−1). The aim of the denoising algorithm is to estimate the original frame Xt from the noisy frame Yt.

2. Kalman Filtering Algorithm

The basic idea behind the Kalman filter is to model the object that is being tracked. At each time frame the Kalman filter performs the following steps:
1. Prediction: predict where the object should be.
2. Measurement: observe where the object went.
3. Assimilation: update the object model by combining the two.

2.1 System Model
1. The object is modelled by a state vector X and a set of equations called the system model.
2. The state is a time-dependent vector Xt of system variables.
3. The system model is a vector equation describing the evolution of the state in time.

2.2 Measurement Model
1. At each time step a measurement Zt of the state vector is taken.
2. A linear relationship between measurements and the system state is assumed:

zt = Ht xt + μt

where Ht is the measurement matrix and μt is a random vector modelling additive noise.

2.3 Limitations of the Kalman Filtering Approach
1. Kalman filtering is a linear filtering algorithm.


2. It is unimodal in nature (a single Gaussian probability distribution).
3. The object state is often non-Gaussian.
4. It is sensitive to background clutter.

II. REVIEW OF LITERATURE
The researchers in [1] develop a new measure, the method noise, to evaluate and compare the performance of digital image denoising methods. They first compute and analyse this method noise for a wide class of denoising algorithms, namely the local smoothing filters, and then propose a new algorithm, the non-local means (NL-means), based on a non-local averaging of all pixels in the image. The researchers in [2] propose a particle filter algorithm to track an object before detection in noisy conditions; detection relies on the probability density of the state given the past measurements. In [5], inference of 3D human motion is presented as a natural application, given the nonlinear dynamics of the body and the nonlinear relation between states and image observations. The paper describes a filter that uses hybrid Monte Carlo (HMC) to obtain samples in high-dimensional spaces; it uses multiple Markov chains that exploit posterior gradients to rapidly explore the state space, yielding fair samples from the posterior. The work in [6] concerns methods known generically as particle filters, which include the Condensation algorithm and the Bayesian bootstrap or sampling importance resampling (SIR) filter. These filters represent the posterior distribution of the state variables by a system of particles that evolves and adapts recursively as new information becomes available. In practice, large numbers of particles may be needed to provide adequate approximations, and in certain applications, after a


sequence of updates, the particle system can often collapse to a single point. The paper introduces a technique for monitoring the efficiency of these filters, which provides a simple quantitative assessment of sample impoverishment, and shows how to construct improved particle filters that are both structurally efficient, in terms of preventing the collapse of the particle system, and computationally efficient in their implementation. The researchers in [8] develop "A Real-Time Object Tracking System Using a Particle Filter" (Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference, 9-15 Oct. 2006). Particle filters have attracted much attention owing to their robust tracking performance in cluttered environments. Particle filters maintain multiple hypotheses at the same time and use a probabilistic motion model to predict the position of the moving object, and this constitutes a bottleneck to the use of particle filtering in real-time systems because of the expensive computations required. In order to track moving objects in real time and without loss from image sequences, a particle filter algorithm specifically designed for a circuit, together with the circuit of the object tracking algorithm using the particle filter, is proposed. The circuit is specified in VHDL (VHSIC hardware description language) and implemented in an FPGA (field-programmable gate array); all of the functions of the proposed particle filter used to track moving objects are implemented within the FPGA. The object tracking system using this circuit is implemented and its performance is measured. In [9], improvements to the non-local means image denoising method introduced by Buades et al. are presented. The original non-local means method replaces a noisy pixel by the


weighted average of pixels with similar surrounding neighbourhoods. While producing state-of-the-art denoising results, this method is computationally impractical. In order to accelerate the algorithm, the authors introduce filters that eliminate unrelated neighbourhoods from the weighted average. These filters are based on local average grey values and gradients; they preclassify neighbourhoods, thereby reducing the original quadratic complexity to a linear one and reducing the influence of less related areas in the denoising of a given pixel. The authors present the underlying framework and experimental results for grey-level and colour images as well as for video. In [12], a reconstruction framework is proposed that explicitly accounts for image geometry when processing the spatial interaction between pixels in the filtering process. To this end, the image structure is captured using local co-occurrence statistics and is incorporated into the enhancement algorithm in a sequential fashion using the particle filtering technique. In this context, the reconstruction process is modelled using a state space with multiple states, and its evolution is guided by the prior density describing the image structure. Towards optimal exploration of the image geometry, an evaluation of the state of the system is performed at every iteration. The researchers in [14] note that colour is a powerful feature for tracking deformable objects in image sequences with complex backgrounds, and that the colour particle filter has proved to be an efficient, simple and robust tracking algorithm. In that paper, a hybrid-valued sequential state estimation algorithm,


together with its particle filter-based implementation, extends the standard colour particle filter in two ways. First, target detection and deletion are embedded in the particle filter without relying on an external track initialization and cancellation algorithm. Second, the algorithm is able to track multiple objects sharing the same colour description while keeping the attractive properties of the original colour particle filter. The performance of the proposed filter is evaluated qualitatively on various real-world video sequences with appearing and disappearing targets.

III. PROPOSED METHOD
A. Condensation Approach
The primary difference between Kalman filters and the CONDENSATION algorithm is how the state probability density is represented: CONDENSATION uses factored sampling to approximate arbitrary pdfs (probability density functions). By overcoming this drawback of the Kalman filtering method, CONDENSATION allows multimodal probability distributions, and it also increases robustness to background clutter. CONDENSATION belongs to the family of particle filters and Monte Carlo methods. Unlike the Kalman filter, CONDENSATION allows arbitrary probability density functions (pdfs). The goal is to calculate Ƥ(Xt | Zt), the probability of the object state Xt given the history of measurements Zt. Factored sampling is a method for approximating probability densities. The CONDENSATION algorithm predicts multiple possible next states; this is achieved by drawing samples from the probability density function. The idea is that high-probability samples should be drawn more frequently and low-probability samples less frequently.

B. Some Important Features of the Condensation Algorithm
1. Can work for multi-modal distributions.
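The factored-sampling idea described above can be sketched as a minimal SIR-style filter on a one-dimensional state. This is a self-contained toy illustration with an assumed Gaussian observation density, not the paper's tracker:

```python
import math
import random

def factored_sampling_step(samples, observe, dynamics):
    """One CONDENSATION-style iteration on a 1-D state.
    samples  : list of state hypotheses approximating p(x_{t-1} | Z_{t-1})
    observe  : function giving the (unnormalised) observation density p(z | x)
    dynamics : function drawing a predicted state from p(x_t | x_{t-1})
    """
    n = len(samples)
    # Predict: propagate each sample through the system model
    predicted = [dynamics(s) for s in samples]
    # Weight: pi_i proportional to the observation density at each sample
    weights = [observe(x) for x in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Select: resample with replacement according to the weights, so
    # high-probability hypotheses are drawn more frequently
    return random.choices(predicted, weights=weights, k=n)

# Toy example: true state 2.0, noisy dynamics, Gaussian likelihood
random.seed(0)
observe = lambda x: math.exp(-0.5 * ((x - 2.0) / 0.3) ** 2)
dynamics = lambda x: x + random.gauss(0.0, 0.2)
samples = [random.uniform(-5, 5) for _ in range(500)]
for _ in range(10):
    samples = factored_sampling_step(samples, observe, dynamics)
mean = sum(samples) / len(samples)
print(round(mean, 1))  # the sample cloud settles near the true state 2.0
```

Because the density is carried by the samples themselves, nothing in this loop assumes linearity or Gaussianity, which is precisely the advantage over the Kalman filter noted above.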


2. Predicts multiple possible states for each object tracked.
3. Each possible state has a different probability.
4. Estimates the probabilities of the predicted states based on the observed data.

D. Modelling Shape and Motion
Symbols:
1. Xt is the object state vector at time t.
2. Zt is the vector of measured features at time t.
3. Xt = {X0, …, Xt} is the state history.
4. Zt = {Z0, …, Zt} is the history of measured features.
5. Ƥ(x) is the prior probability density for object state x.
6. Ƥ(Zt | Xt) is the observation density.

E. Factored Sampling: General Idea
Factored sampling is a method of approximating probability densities. The problem is to find Ƥ(X | Z), which represents all knowledge about X deducible from the data Z.
Details:
1. First, the set S = {s(1), …, s(N)} is sampled randomly from the prior density Ƥ(x).
2. Each element s(i) is assigned a weight πi proportional to the observation density Ƥ(z | x = s(i)).

F. Algorithm
The goal is to estimate the state probability density Ƥ(Xt | Zt) given Ƥ(Xt−1 | Zt−1) and Zt. Since the state density is approximated by a sample set, the algorithm needs to generate a new sample set at each time step. CONDENSATION follows these basic iterative steps:
1. Prediction – predict where the object should be.
2. Measurement – observe where the object went.
3. Assimilation – update the object model by combining the two.

1. INIT, t = 0
   for i = 1, …, N: sample x0(i) ~ p(x0); t := 1
2. IMPORTANCE SAMPLING
   for i = 1, …, N: sample xt(i) ~ p(xt | xt−1(i))


   x0:t(i) := (x0:t−1(i), xt(i))
   for i = 1, …, N: evaluate the importance weights wt(i) = p(zt | xt(i))
   Normalize the importance weights
3. SELECTION / RESAMPLING
   Resample with replacement N particles x0:t(i) according to the importance weights
   Set t := t + 1 and go to step 2

Block Diagram
[Figure: block diagram of the proposed filter – image not reproduced]

In order to analyse and make inference about a dynamic system, at least two models are required: first, a model describing the evolution of the state with time (the system model) and, second, a model relating the noisy measurements to the state (the measurement model). We will assume that these models are available in a probabilistic form. The probabilistic state-space formulation and the requirement for the updating of information on receipt of new measurements are ideally suited to the Bayesian approach, which provides a rigorous general framework for dynamic state estimation problems.

In the Bayesian approach to dynamic state estimation, one attempts to construct the posterior probability density function (pdf) of the state based on all available information, including the set of received measurements. Since this pdf embodies all available statistical information, it may be said to be the complete solution to the estimation problem. In principle, an optimal (with respect to any criterion) estimate of the state may be obtained from the pdf, and a measure of the accuracy of the estimate may also be obtained. For many problems, an estimate is required every time a measurement is received. In this case, a recursive filter is a convenient solution. A recursive filtering approach means that received data can be processed sequentially rather than as a batch, so that it is not necessary


to store the complete data set or to reprocess existing data if a new measurement becomes available. Such a filter consists of essentially two stages: prediction and update. The prediction stage uses the system model to predict the state pdf forward from one measurement time to the next. Since the state is usually subject to unknown disturbances (modelled as random noise), prediction generally translates, deforms, and spreads the state pdf. The update operation uses the latest measurement to modify the prediction pdf. This is achieved using Bayes' theorem, which is the mechanism for updating knowledge about the target state in the light of extra information from new data.

VI. CONCLUSION
We have shown how to reduce the computational cost of implementing particle filters. We have proposed an improved particle filter and demonstrated its superior performance. While the filter is more difficult to implement and, unlike the standard SIR filter, must be tailored to the problem at hand, the processing gains are substantial. Further, we have introduced a diagnostic for sampling inefficiency that permits us to check the performance of various Monte Carlo filters. We advocate the use of this diagnostic as a general tool in the analysis of sequential Monte Carlo algorithms.

REFERENCES
[1]. A. Buades, B. Coll, and J.-M. Morel, "A non-local algorithm for image denoising," in IEEE CVPR, 2005, pp. 60–65.
[2]. Y. Boers and J. N. Driessen, "Particle filter based detection for tracking," in Proc. Amer. Contr. Conf., vol. 6, pp. 4393–4397, 2001.
[3]. E. Bølviken, P. J. Acklam, N. Christophersen, and J.-M. Størdal, "Monte Carlo filters for nonlinear state estimation," Automatica, vol. 37, pp. 177–183, 2001.



[4]. K. Choo and D. J. Fleet, "People tracking with hybrid Monte Carlo filtering," in Proc. IEEE Int. Conf. Comp. Vis., vol. II, pp. 321–328, 2001.
[5]. M. J. Black and A. D. Jepson, "A probabilistic framework for matching temporal trajectories: Condensation-based recognition of gestures and expressions," ECCV-98, pp. 909–924.
[6]. J. Carpenter, P. Clifford, and P. Fearnhead, "Improved particle filter for nonlinear problems," IEE Proc.-Radar, Sonar Navig., 146:2–4, 1999.
[7]. M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking," IEEE Trans. on Signal Processing, 50(2):174–188, 2002.
[8]. Jung Uk Cho, Seung Hun Jin, Xuan Dai Pham, Jae Wook Jeon, Jong Eun Byun, and Hoon Kang, "A real-time object tracking system using a particle filter," Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference, 9-15 Oct. 2006.
[9]. M. Mahmoudi and G. Sapiro, "Fast image and video denoising via nonlocal means of similar neighborhoods," IEEE Signal Processing Letters, 2005.
[10]. A. Pizurica, V. Zlokolica, and W. Philips, "Combined wavelet domain and temporal video denoising," … Video and Signal Based …, 2003, IEEE.
[11]. Jayesh H. Kotecha and Petar M. Djurić, "Gaussian particle filtering," IEEE Transactions on Signal Processing, vol. 51, no. 10, October 2003.
[12]. Caifeng Shan, Tieniu Tan, and Yucheng Wei, "Real-time hand tracking using a mean shift embedded particle filter," Image and Vision Computing, 2007, Elsevier.
[13]. Noura Azzabou, Nikos Paragios, and Frédéric Guichard, "Image reconstruction using particle filters and multiple hypotheses testing," IEEE Transactions on Image Processing, vol. 19, no. 5, May 2010.
[14]. J. Czyz, B. Ristic, and B. Macq, "A particle filter for joint detection and tracking of color objects," Image and Vision Computing, 2007, Elsevier.
[15]. Yubing Han and Rushan Chen, "Efficient video denoising based on dynamic nonlocal means using Kalman filter," Image and Vision Computing, 30 (2012), pp. 78–85.


Android Application On Latest Auditions Online Portal

Patil Kalpesh R.1, Sangale Swapnil V.2, Baviskar Sachin M.3, Mahajan Vilas U.4
Computer Engineering Department, Sir Visvesvaraya Institute of Technology, Nashik.
2 [email protected]; 3 [email protected]

Abstract — Auditions India is a one-stop resource for the latest auditions and casting calls across India. It was formed with the mission to provide the most up-to-date information related to auditions and casting calls across India and to inform registered users via different modes of communication. This mobile application is convenient to use and provides a proper solution. It is the best opportunity for users of auditionsindia.com from different locations to choose or learn about current auditions. People who want to make a career as actors, artists, mimics, etc. can update their profile through the Android app, so a computer and power supply are not necessary, because this is an Android mobile app. Registered and authenticated users can apply to the auditions listed by production houses, actors, etc.

Index Terms — Android application, Database, Data Processing and Online Transaction.

I. Introduction
In the present era all competitions, also called "auditions", are carried out for different skills. Because of the growth of the film industry, which requires talent to act, proper management is required for such auditions, for the people who participate in them. For such management, one idea comes to mind: in this era everyone uses Android OS based smartphones, so an application is most helpful for people to interact with such auditions, e.g. to know where


and when an audition will be carried out, in which case the user can participate.

II. Related Work
A. Existing System:
[Figure: block diagram of the existing system – image not reproduced]
The existing system consists of 3 sections, as follows:

A.1 Production Houses:
Production houses such as B.R. Chopra, Yash Raj Films, Great Maratha, etc. can post their script or description about the required actors, artists, mimics, etc., and can also see the profiles of the requested Talent (the registered users of Auditionsindia.com). After that, they can select those Talent for their cast.

A.2 Talent:
The term Talent refers to those who want to become actors, artists, mimics, etc. They can see when and where auditions will be held, and they can apply for an audition. Talent can maintain their profile with the following details:
1) General Details
2) Profile Details
3) Physical Details
4) Contact Details
Talent can also upload and manage photos and videos. After uploading, photos and videos are sent for moderation, where the moderator is responsible for storing them. Both operations, from Talent and from Production Houses, are handled by the Moderator.

A.3 Moderator:
The Moderator always analyses the information coming from Talent or Production Houses. The Moderator discards vulgar photos and videos; after confirmation, the rest are sent to the database.
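The three-role workflow above (Talent upload → moderation → database) can be sketched as follows. The class and function names here are illustrative assumptions, not part of the actual system:

```python
# Illustrative sketch of the moderation workflow described in Section II.
# All names here are assumptions for illustration, not the real system's API.
from dataclasses import dataclass, field

@dataclass
class Upload:
    talent: str          # registered user who uploaded the item
    kind: str            # "photo" or "video"
    approved: bool = False

@dataclass
class Moderator:
    database: list = field(default_factory=list)

    def review(self, upload, is_vulgar):
        """Discard vulgar items; confirmed items are sent to the database."""
        if is_vulgar:
            return False           # discarded, never stored
        upload.approved = True
        self.database.append(upload)
        return True

mod = Moderator()
mod.review(Upload("talent_1", "photo"), is_vulgar=False)
mod.review(Upload("talent_2", "video"), is_vulgar=True)
print(len(mod.database))  # only the confirmed upload reaches the database
```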


B. Proposed System:
[Figure: block diagram of the proposed system – image not reproduced]

B.1 Frontend Operation:
In the front end, a Talent who wants to become an actor first logs in with a username and password; he/she is then able to view audition calls, change profile and physical details, and upload photos and videos from the inbuilt camera or gallery.

B.2 Backend Operation:
In the background, data is accessed from the server through a Web Service. This enables information to be exchanged between the database server and the Android application. Since the Android application cannot communicate directly with the database, SOAP actions are performed for that purpose.

Web Service:
A Web service is a method of communication between two electronic devices over the World Wide Web. It is a software function provided at a network address over the web, with the service always on, as in the concept of utility computing. Here, a Web service is a .NET component that replies to HTTP requests formatted using the SOAP syntax. Web services are one of the cornerstones of the .NET initiative in that they allow a degree of interoperability among applications over the Internet that was inconceivable before.

SQL Server:
Microsoft SQL Server is a relational database management system developed by Microsoft. As a database, it is a software product whose primary function is to store and retrieve data as requested by other software


applications, be they on the same computer or running on another computer across a network (including the Internet). There are at least a dozen different editions of Microsoft SQL Server aimed at different audiences and workloads, ranging from small single-machine applications to large Internet-facing applications with many concurrent users. Its primary query languages are T-SQL and ANSI SQL.

SOAP:
SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. It relies on the XML Information Set for its message format, and usually relies on other Application Layer protocols, most notably Hypertext Transfer Protocol (HTTP) or Simple Mail Transfer Protocol (SMTP), for message negotiation and transmission. SOAP can form the foundation layer of a web services protocol stack, providing a basic messaging framework upon which web services can be built. This XML-based protocol consists of three parts: an envelope, which defines what is in the message and how to process it; a set of encoding rules for expressing instances of application-defined datatypes; and a convention for representing procedure calls and responses. SOAP has three major characteristics: extensibility (security and WS-routing are among the extensions under development), neutrality (SOAP can be used over any transport protocol such as HTTP, SMTP, TCP, or JMS), and independence (SOAP allows for any programming model). As an example of how SOAP procedures can be used, a SOAP message could be sent to a web site that has web services enabled, such as a real-estate price database, with the parameters needed for a search. The site would then return an


IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 XML-formatted document with the resulting data, e.g., prices, location, features. With the data being returned in a standardized machine-parsable format, it can then be integrated directly into a third-party web site or application.

III. Conclusion
In this paper, we propose an Android application for Auditionsindia.com users, in order to provide a user-friendly application and to show how users can learn about auditions, where they are held, etc.

IV. Code snippets


V. Snapshots




A SURVEY ON USER IDENTITY VERIFICATION VIA KEYBOARD AND MOUSE DYNAMICS

Miss Ashwini Subhash Sonawane, Miss Uzma Anis Shaikh, Miss Vaishali Sitaram Kadu, Prof. Shedge Kishor N.
Department of Computer Engg., Sir Visvesvaraya Institute of Technology, Nashik.

ABSTRACT
Millions of people use computers and mobiles; these devices have a very important position in today's world. Most of these devices use passwords and PINs for authentication, but passwords can be easily hacked; thus, a stronger authentication system is needed to provide security. In this paper, we propose an authentication system based on behavioural biometrics. We introduce a method that verifies users according to the characteristics of their interaction with the mouse and keyboard.

Keywords
Biometric, authentication, classification, data mining.

1. INTRODUCTION
Many techniques have been used to deal with identity theft, and the most commonly used are logins and passwords. With time came different techniques, such as face recognition, retina patterns, iris patterns, fingerprints, palm prints, audio tone and many more. Over time, the use of smartcard-based authentication increased, and biometric authentication was a great leap in research. Biometrics refers to the identification of a user based on his characteristics. Biometrics is classified into two types: a) physiological biometrics, b) behavioural biometrics. Physiological biometrics refers to the physical characteristics of a person, which are unique, whereas behavioural biometrics is related to the behaviour of a person.


Behavioural biometric data can thus be collected without the knowledge of the user. These systems use human characteristics during interaction with the input devices and later use these profiles to verify the identity of the user.

2. RELATED WORK
According to [4], biometric-based authentication techniques are better at uniquely characterizing an individual than text-based (i.e. passwords and PINs) and physical (i.e. smartcards, etc.) techniques. [1] has stated that behavioural biometric techniques can be categorized along different dimensions, such as the type of learning: implicit or explicit. But the focus of this paper is on two techniques, i.e. mouse and key dynamics. In [1], we get a general idea of how mouse-based authentication methods work, which will be explained in the following sections. [1] has also stated the basic metrics to measure the performance of behavioural biometric devices, as follows:

Table 1: Metrics to evaluate the performance of behavioural biometrics
FAR – Measures the ratio between the number of attacks that were erroneously labeled as authentic interactions and the total number of attacks.
FRR – Measures the ratio between the number of legitimate interactions that were erroneously labeled as attacks and the total number of legitimate interactions.
ROC – A ROC curve is a graphical representation of the tradeoff between the FAR and the FRR for various threshold values.
AUC – Measures the area under the ROC curve.
EER – The rate at which


both acceptance and rejection error rates are equal. Low EER values indicate an accurate authentication system.

In this table:
FAR – False Acceptance Rate
FRR – False Rejection Rate
AUC – Area Under Curve
EER – Equal Error Rate

3. PROPOSED SYSTEM
There are three parts of security: text-based, physical authentication, and biometric authentication. Usually, passwords have been used as a means of security, but authentication using passwords and PINs has proven to be extremely susceptible to hackers. Also, other credentials such as smartcards can be stolen and misused. The best proven authentication scheme until now is the biometric system. It offers the highest degree of security, since it uses the unique characteristics of an individual.

Until now, the research suggests different biometric techniques, physiological as well as behavioural, as explained above. The drawback of physiological techniques is that they require expensive sensors and other hardware devices, which are not always affordable.

4. BEHAVIORAL BIOMETRICS
4.1 General architecture of a behavioural biometric system:
Figure 1 shows the general architecture of a behavioural biometric system, which consists of the following components:
• Events acquisition
• Feature extraction
• Classifier
• Signatures database

Figure 1: A typical architecture of a behavioural biometric system.
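The metrics of Table 1 can be computed from a set of verification scores. A minimal sketch with toy scores and a simple threshold sweep (the scores and threshold are invented for illustration, not taken from the paper):

```python
# Computing FAR, FRR and an approximate EER (see Table 1) from
# similarity scores. Higher score = more likely the legitimate user.

def far_frr(genuine, impostor, threshold):
    """FAR: impostor attempts wrongly accepted; FRR: genuine wrongly rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def approx_eer(genuine, impostor):
    """Sweep thresholds and return the rate where FAR and FRR are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

genuine = [0.9, 0.8, 0.85, 0.7, 0.95]    # scores of legitimate interactions
impostor = [0.2, 0.4, 0.3, 0.6, 0.1]     # scores of attack interactions
far, frr = far_frr(genuine, impostor, 0.65)
print(far, frr)  # at threshold 0.65 no impostor is accepted, none rejected
print(approx_eer(genuine, impostor))
```

Sweeping the threshold and recording (FAR, FRR) pairs is also exactly how the ROC curve of Table 1 is traced, and the AUC is the area under that curve.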



1. Event acquisition – captures the events generated by the various input devices used for the interaction (e.g. keyboard, mouse).
2. Feature extraction – constructs a signature which identifies the behavioural biometrics of the user.
3. Classifier – the classifier used to construct the model for each action type is the Random Forest. Each of the actions collected by the Action Collector is passed to the appropriate classifier according to the type of action.
4. Signature database – a signature database is used to store the behavioural signatures of the users. When a username is entered, the signature of that user is retrieved for the verification process.

4.2 Behavioural biometric techniques:
As such there are many techniques that recognize an individual by his behaviour, viz. interaction with input devices, audit logs, car driving style, gait or stride, etc. These features are captured and treated as raw data, which are used to characterize an individual. In this paper we focus on two techniques: a) user interaction with the mouse, and b) user interaction with the keyboard. Basically, timing, movement direction and clicking actions are used to build a profile of a user, which is then used for authentication purposes. Most systems require the user to interact with a program (such as a game) in order to derive sufficient statistical information regarding their mouse dynamics. The behavioural biometric of keystroke dynamics uses the manner and rhythm in which an individual types characters on a keyboard or keypad. The keystroke rhythms of a user are measured to develop a unique biometric template of the user's typing pattern for future authentication. Raw measurements available from almost every keyboard can be


recorded to determine dwell time and flight time. In keystroke dynamics, the biometric template used to identify an individual is based on the typing pattern, the rhythm and the speed of typing on a keyboard. The raw measurements used for keystroke dynamics are dwell time and flight time:
Dwell time is the time duration that a key is pressed.
Flight time is the time duration between releasing a key and pressing the next key.

4.3 Keystroke dynamics:
Keystroke features are based on the time durations between keystrokes, inter-key strokes and dwell times, overall typing speed, frequency of errors (use of backspace), use of the numpad, the order in which the user presses the shift key to get capital letters and, possibly, the force with which keys are hit, for specially equipped keyboards (Ilonen, 2006; Jain et al., 1999). Keystroke dynamics is probably the most researched type of HCI-based biometric (Bergadano et al., 2002; Monrose and Rubin, 2000), with novel research taking place in different languages (Gunetti et al., 2005), for long text samples (Bartolacci et al., 2005; Curtin et al., 2006) and for e-mail authorship identification (Gupta et al., 2004). In a similar fashion, Bella and Palmer (2006) have studied the finger movements of skilled piano players. They recorded finger motion from skilled pianists while playing a musical keyboard. The pianists' finger motion and the speed with which keys are struck were analysed using functional data analysis methods. Movement velocity and acceleration were consistent for the participants and in multiple musical contexts. Accurate classification of pianists was achieved by training a neural network classifier using velocity/acceleration trajectories preceding key presses [2].

4.4 Mouse dynamics
A unique profile can be generated by monitoring all mouse actions produced by the user during interaction with the GUI, which


4.4 Mouse dynamics

A unique profile can be generated by monitoring all the mouse actions produced by the user during interaction with the GUI, which can be used for user re-authentication (Pusara and Brodley, 2004) [2]. Mouse actions of interest include general movement, drag and drop, point and click, and stillness. From those, a set of features can be extracted, for example average speed against the distance travelled and average speed against the movement direction (Ahmed and Traore, 2005a; 2005b). Pusara and Brodley (2004) describe a feature extraction approach in which they split the mouse event data into mouse wheel movements, clicks, and menu and toolbar clicks. Click data is further subdivided into single- and double-click data.

Gamboa and Fred (2003; 2004) have tried to improve the accuracy of mouse-dynamics-based biometrics by restricting the domain of data collection to an online game instead of a more general GUI environment. As a result, the applicability of their results is somewhat restricted, and the methodology is more intrusive to the user: the system requires around 10–15 minutes of devoted game play instead of seamless data collection during the user's normal human-computer interaction. As for the extracted features, the x and y coordinates of the mouse, horizontal velocity, vertical velocity, tangential velocity, tangential acceleration, tangential jerk and angular velocity are computed over the mouse strokes to create a unique user profile.

5. CONCLUSION

Nowadays the security of the authentication system is a very important factor to be taken into consideration. Thus the main aim was to design and implement a completely new mechanism for providing security to authentication systems. User identity verification is used as a security layer in addition to the username and password, by validating the identity of logged-on users based on their physiological and behavioural characteristics. The keystroke and mouse dynamics


authentication is an interesting biometric modality as it does not require any additional sensor and is well accepted by users. With the new technique for mouse and keystroke dynamics, the security level of the data is increased to a great extent.

6. REFERENCES

[1] Clint Feher, Yuval Elovici, Robert Moskovitch, Lior Rokach, Alon Schclar, "User identity verification via mouse dynamics".
[2] Roman V. Yampolskiy, Venu Govindaraju, "Behavioral biometrics: a survey and classification".
[3] Romain Giot, Mohamad El-Abed, Baptiste Hemery, Christophe Rosenberger, GREYC Laboratory, ENSICAEN, University of Caen, CNRS, 6 Boulevard Maréchal Juin, 14000 Caen Cedex, France.
[4] Hafiz Zahid Ullah Khan, "Comparative study of authentication techniques", International Journal of Video & Image Processing and Network Security (IJVIPNS), Vol. 10, No. 04.
[5] Sushil Chauhan, A. S. Arora, Amit Kaul, "A survey of emerging biometric modalities", Procedia Computer Science 2 (2010) 213–218, ICEBT 2010.
[6] Ahmed Awad E. Ahmed, Issa Traore, "Mouse Dynamics Biometric Technology", DOI: 10.4018/978-1-60566-725-6.ch010.
[7] Chien Le, "A Survey of Biometrics Security Systems".


INTELLIGENT TRANSPORTATION SYSTEM

Deepti Patil, Sheetal Sharma, Kiran Madihalli, Gunjan Deore
Department of Computer Engineering, Pune University
Sir Visvesvaraya Institute of Technology, India
[email protected] [email protected] [email protected]

Abstract— This is a framework for the deployment of intelligent transport systems in the field of road transport and for interfaces with other modes of transport. ITS is a system which interfaces between information and communication technologies. It is applied in the field of road transport, including infrastructure, vehicles and users, in traffic management and mobility management, as well as for interfaces with other modes of transport. The system consists of the microcontroller 89C52, which collects and stores data with the help of the GSM modem SIM300. This data is then transmitted to the server at the petrol pump. The sensor gets activated when the vehicle arrives at the petrol pump to fill petrol. This project also includes an RFID transceiver at the toll naka to detect the vehicle's chassis number. The project aims at providing innovative services related to different modes of transport and traffic management. It enables various users to be better informed and to make safer, more coordinated and smarter use of the transport network. It requires less manpower at every signal, and it reduces the number of accidents and bribes to a great extent.

Keywords— RFID, ITS, GSM SIM300, AUTOPIA



I. INTRODUCTION

The intelligent transportation system has been designed to provide an interface between information and communication technologies. It aims at reducing manpower and bribes across the country. It helps to maintain speed synchronization of vehicles and to detect stolen vehicles. It enables various users to be better informed and to make safer, more coordinated and smarter use of the transport network.

II. LITERATURE SURVEY

The goal of an intelligent transportation system is to improve the effectiveness, efficiency and safety of transportation systems. Long-range planning for the deployment of ITS technologies depends in part on knowing which technologies are most effective; thus it is important to understand the benefits of emerging and existing technologies. Many of the benefits of urban traffic management systems have benefit-to-cost ratios of 10:1 or more, a value not usually seen in traditional capacity projects. ITS deployments have occurred at the national, state and local levels. Oregon's transportation infrastructure is being asked to serve a growing demand while financial resources are becoming increasingly limited, and ITS technologies offer a potential way to address these needs in Oregon's transportation.

When a vehicle breaks any of the rules in a speed-control zone, the sensor attached to the vehicle receives data from the node at the zone. The data is processed at the petrol pump during refilling.

A) Stolen-vehicle indication at the toll naka

Sometimes when we park our vehicle there is a chance of it getting stolen, so to find the vehicle we propose this system. The owner files a complaint at the RTO office, which is stored in the RTO database along with the chassis number, vehicle ID, etc. When the vehicle arrives at the toll naka, the RFID sensor at the toll naka sends the data to the server at the


petrol pump. The server matches it with the stored database, and if a match is found with a stolen vehicle, the stolen vehicle is identified.

The system consists of the microcontroller 89C52, which collects and stores data with the help of the GSM modem SIM300. This data is then transmitted to the server at the petrol pump. The sensor gets activated when the vehicle arrives at the petrol pump to fill petrol. The microcontroller 89C52 is used to store the data. Also, the toll naka has an RFID reader at the entrance, and each vehicle carries an RFID tag. When the vehicle enters the toll naka, the reader reads the data from the RFID tag on the vehicle, i.e. the vehicle ID and chassis number, to detect a stolen vehicle. If the vehicle has a fake number plate or none at all, it gets caught at the toll naka.

III. IMPLEMENTATION DETAILS

Figure 2: RFID tags

To build the Intelligent Transportation System we used DC motors, a GSM modem SIM300, an RFID tag and receiver, and the microcontroller 89C52.

A) Chassis number detection

To detect the chassis number of the vehicle, an RFID tag carrying a unique number for each vehicle is fitted to the vehicle. The RFID receiver placed near the toll naka reads the unique number from the tag. An RFID tag is comprised of a microchip containing identifying information and an antenna that transmits this data wirelessly to a reader. At its most basic, the chip contains a serialized identifier, or license-plate number, that uniquely identifies that item, similar to the way many bar codes are used today. The data storage capacity of the RFID tags is about 16 bits.
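The server-side match in the stolen-vehicle workflow can be sketched as a simple lookup keyed on the chassis number read from the tag. The table contents, field names and chassis numbers below are invented for illustration; a real deployment would query the RTO database rather than an in-memory dictionary.

```python
# Hypothetical stolen-vehicle records keyed by chassis number (invented data).
STOLEN_DB = {
    "MALA851ALDM273131": {"vehicle_id": "MH12AB1234", "owner": "A. Kumar"},
    "MA3EYD32S00256317": {"vehicle_id": "MH14XY9876", "owner": "S. Rao"},
}

def check_tag(chassis_number):
    """Return the stolen-vehicle record if the tag's chassis number matches,
    otherwise None (the vehicle passes through)."""
    record = STOLEN_DB.get(chassis_number)
    if record is not None:
        print(f"ALERT: stolen vehicle {record['vehicle_id']} at toll naka")
    return record

hit = check_tag("MALA851ALDM273131")   # match found, alert raised
miss = check_tag("MH00ZZ0000XXXXXXX")  # unknown tag, no alert
```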


Figure 3: RFID receiver

The RFID receiver is the source of the RF energy used to activate and power the passive RFID tags. The RF transceiver controls and modulates the radio frequencies that the antenna transmits and receives. The transceiver filters and amplifies the backscatter signal from a passive RFID tag. When the information is transferred to the PC, the corresponding chassis number is printed on the screen.

B) Detection of a stolen vehicle at the petrol pump

When the vehicle reaches the petrol pump, the petrol flap is opened. A microcontroller attached to the flap reads all the information about the vehicle, such as its vehicle number, owner's details, etc. These details are then transferred to the server PC at the petrol pump by using the GSM modem SIM300 and matched against the owner's database. If the numbers match, the thief is caught. The microcontroller used in this model is the 89C52, an 8-bit microcontroller with 8K of flash memory. The power supply needed is about 5 V. Here the GSM modem SIM300 is used to transfer the information; it is also used to send the fine-deduction SMS to the owner. The read and write instructions are issued using AT commands.

C) Detection of high speed in a speed-control zone

Detection of speed in the zone is done by using an IR sensor. At low-speed zones, such as near hospitals, schools, etc., a sensor is placed which senses the high speed of a vehicle. A DC motor is also embedded into the speedometer of the vehicle, which reads the speed of the vehicle. The vehicle also contains the GSM modem, which helps to send the information to the RTO office.
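The AT-command exchange used to send the fine-deduction SMS through the SIM300 can be sketched as follows. `AT+CMGF=1` (text mode) and `AT+CMGS` (send message) are standard GSM AT commands; the `FakePort` object stands in for a real serial connection (e.g. via pyserial), and the phone number and message text are invented.

```python
# Sketch of the AT-command sequence for sending the fine-deduction SMS via
# the SIM300 modem. FakePort records writes and always answers OK so the
# exchange can be shown without real hardware.

class FakePort:
    def __init__(self):
        self.log = []
    def write(self, data):
        self.log.append(data)
    def read_reply(self):
        return "OK"

def send_fine_sms(port, number, text):
    port.write("AT+CMGF=1\r")            # select SMS text mode
    if port.read_reply() != "OK":
        return False
    port.write(f'AT+CMGS="{number}"\r')  # set the recipient number
    port.write(text + "\x1a")            # message body, Ctrl-Z terminates it
    return port.read_reply() == "OK"

port = FakePort()
sent = send_fine_sms(port, "+911234567890",
                     "Speed limit violated in hospital zone. Fine: Rs 500.")
print(sent)  # True
```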


If the rules are broken, the fine can be easily deducted. This helps in reducing accidents.

Figure 4: DC motor

IV. CONCLUSION

This system is a combination of intelligent transport systems in the field of road transport. It provides interfaces with other modes of transport. An Intelligent Transportation System is a system which interfaces between information and communication technologies. It is applied in the field of road transport, including infrastructure, vehicles and users, in traffic management and mobility management, as well as for interfaces with other modes of transport. This model also helps in reducing bribes. Thus it will prove to be a very efficient technology in the field of transportation.





GREEN CLOUD ENERGY EFFICIENT: A NEW METHODOLOGY FOR CREATING AUTONOMOUS SOFTWARE DEPLOYMENT PACKAGES

Rohan Nagar, Sagar Karad and Rakesh Vaishnav

Abstract

Network-based cloud computing is rapidly expanding as an alternative to conventional office-based computing. As cloud computing becomes more widespread, the energy consumption of the network and computing resources that underpin the cloud will grow. This is happening at a time when increasing attention is being paid to the need to manage energy consumption across the entire information and communications technology (ICT) sector. While data center energy use has received much attention recently, less attention has been paid to the energy consumption of the transmission and switching networks that are key to connecting users to the cloud. In this paper, we present an analysis of energy consumption in cloud computing. The objective is to compare it against the energy consumed by conventional computing, by deploying the same task on a standard consumer personal computer (PC) that is connected to the Internet but does not utilize cloud computing.

The client connects to the cloud and demands a service on a pay-and-use software-service basis. In this scenario the cloud processes the user's request on its own infrastructure and returns the result to the client. The load on the cloud increases as the number of clients accessing services from the cloud grows. We are proposing a new deployment model for the cloud, in which the software requested by the user is deployed silently on the client machine, so that the client can access the software on its own machine without accessing the cloud network. The cloud


IJMER; ISSN: 2277-7881; IF-2.735; IC V:5.16; Vol 3, Issue 4(2), April 2014 monitor the services on paid does not require any user terms and automatically interaction. undeploys the software after the policy of use and pay is lapsed. Keywords: Autonomous An Installation that is performed installation, Silent unattended without user interaction and does installation, Network not display any message or GUI installation, Mobile agent for during its progress is known as installation, Multi-agent for Silent Unattended Installation. software deployment Silent unattended installation plays a key role when installing Objectives software over networks and time 1) Server Side Design is of immense importance 2) Client Side Design because it does not require user 3) Request for Software interaction/intervention during Installation the process and usually skips the 4) Remote Software non important steps which are Installation (Oracle , SQL usually part of installer wizards. Server, etc) An agent based system for 5) Online Installation activity monitoring on network is Monitoring responsible for the deployment of 6) Pay Use Services Packages on the specified 7) Renewal locations of the network. The 8) Auto Undeployment process is fully autonomous and

Requirements

Hardware requirements

Number | Description | Alternatives (if available)
1 | PC with 200 GB hard disk and 4 GB RAM | Not applicable
2 | Network configured for Internet service | —


Software requirements

Number | Description | Alternatives (if available)
1 | Windows XP/2007 | —
2 | Java | Not applicable

Manpower requirements

4 to 5 students can complete this in 4–6 months if they work full time on it.

Milestones and Timelines

No | Milestone Name | Milestone Description | Timeline (From Week – To Week) | Remarks
1 | Requirements / Functional Specification | Complete specification of the system (with appropriate assumptions), including listing the functionalities to be supported by the proxy server. A document detailing the same should be written and a presentation made on it. | 2–3 | An attempt should be made to add some more relevant functionalities other than those listed in this document.
2 | Technology familiarization & Task breakup | Understanding of the technology needed to implement the project. The requirements should be broken up among the individual members of the team, and they should come up with milestones in their project life-cycle and dates of completion. | 4–6 | The presentation should be from the point of view of being able to apply the technology to the project, rather than from a theoretical perspective. An Excel sheet containing the break-up of the tasks, with important milestones and dates, should be the output of this stage.
3 | High-level and Detailed Design | The design should detail the approach towards creating the proxy server. In case of an OOAD approach towards design, the output of this phase would be a design document containing the UML class diagram and sequence diagram. | 7–10 | The scenarios should map to the requirement specification (i.e., for each requirement that is specified, a corresponding approach or design should be there in the document).
4 | Implementation / Build | The actual implementation of the proxy server should be made in this phase. Unit testing for each unit/component of the system should be done so that the functionality of each unit is verified. | 11–16 | During this milestone period, it would be a good idea for the team (or one person from the team) to start working on a test plan for the entire system. This test plan can be updated as and when new scenarios come to mind.
5 | Integration Testing | The system should be thoroughly tested by running all the test cases written for the system (from milestone 4). | 17 | Another 2 weeks should be there to handle any issues found during testing of the system. After that, the final demo can be arranged.
6 | Final Review | Issues found during the previous milestone are fixed and the system is ready for the final review. | 18–19 | During the final review of the project, it should be checked that all the requirements specified during milestone 1 are fulfilled (or appropriate reasons given for not fulfilling them).




