The Early View of

“Global Journals of Computer Science and Technology”

In case of any minor update, modification, or correction, kindly inform us within 3 working days of receiving this issue.

Kindly note that research papers may be removed, added, or altered according to their final status.

© Global Journal of Computer Science and Technology. 2010.

All rights reserved.

This is a special issue published in Version 1.0 of "Global Journal of Computer Science and Technology."

All articles are open access articles distributed under the "Global Journal of Computer Science and Technology" Reading License, which permits restricted use. Entire contents are copyright of "Global Journal of Computer Science and Technology" unless otherwise noted on specific articles.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission.

The opinions and statements made in this book are those of the authors concerned. Ultraculture has not verified and neither confirms nor denies any of the foregoing, and no warranty or fitness is implied. Engage with the contents herein at your own risk.

The use of this journal, and the terms and conditions for our providing information, is governed by our Disclaimer, Terms and Conditions and Privacy Policy given on our website http://www.globaljournals.org/global-journals-research-portal/guideline/terms-and-conditions/menu-id-260/. By referring to, using, reading, or any type of association with or referencing of this journal, you acknowledge that you have read them and that you accept and will be bound by the terms thereof.

All information, journals, this journal, activities undertaken, materials, services and our website, terms and conditions, privacy policy, and this journal are subject to change anytime without any prior notice.

License No.: 42125/022010/1186. Registration No.: 430374. Import-Export Code: 1109007027.

Publisher's correspondence office: Global Journals, Headquarters Corporate Office, United States

Offset Typesetting: Global Journals, City Center Office, United States

Packaging & Continental Dispatching: Global Journals, India

Find a correspondence nodal officer near you: to find the nodal officer of your country, please email us at [email protected]

eContacts

Press Inquiries: [email protected]
Investor Inquiries: [email protected]
Technical Support: [email protected]
Media & Releases: [email protected]

Pricing (Including by Air Parcel Charges):
For Authors: 22 USD (B/W) & 50 USD (Color)
Yearly Subscription (Personal & Institutional): 200 USD (B/W) & 500 USD (Color)

Editorial Board Members

John A. Hamilton, "Drew" Jr., Ph.D., Professor, Management, Computer Science and Software Engineering, Director, Information Assurance Laboratory, Auburn University

Dr. Wenying Feng, Professor, Department of Computing & Information Systems, Department of Mathematics, Trent University, Peterborough, ON Canada K9J 7B8

Dr. Henry Hexmoor, IEEE senior member since 2004, Ph.D. Computer Science, University at Buffalo, Department of Computer Science, Southern Illinois University at Carbondale

Dr. Thomas Wischgoll, Computer Science and Engineering, Wright State University, Dayton, Ohio, B.S., M.S., Ph.D. (University of Kaiserslautern)

Dr. Osman Balci, Professor, Department of Computer Science, Virginia Tech, Virginia, M.S. and Ph.D., Syracuse University, Syracuse, New York, M.S. and B.S., Bogazici University, Istanbul, Turkey

Dr. Abdurrahman Arslanyilmaz, Computer Science & Information Systems Department, Youngstown State University, Ph.D., Texas A&M University, University of Missouri, Columbia, Gazi University, Turkey

Yogita Bajpai, M.Sc. (Computer Science), FICCT, U.S.A. Email: [email protected]

Dr. Xiaohong He, Professor of International Business, University of Quinnipiac, BS, Jilin Institute of Technology; MA, MS, PhD (University of Texas-Dallas)

Dr. T. David A. Forbes, Associate Professor and Range Nutritionist, Ph.D. Edinburgh University - Animal Nutrition, M.S. Aberdeen University - Animal Nutrition, B.A. University of Dublin - Zoology

Burcin Becerik-Gerber, University of Southern California, Ph.D. in Civil Engineering, DDes from Harvard University, M.S. from University of California, Berkeley & Istanbul University

Dr. Bart Lambrecht, Director of Research in Accounting and Finance, Professor of Finance, Lancaster University Management School, BA (Antwerp); MPhil, MA, PhD (Cambridge)

Dr. Söhnke M. Bartram, Department of Accounting and Finance, Lancaster University Management School, Ph.D. (WHU Koblenz), MBA/BBA (University of Saarbrücken)

Dr. Carlos García Pont, Associate Professor of Marketing, IESE Business School, University of Navarra, Doctor of Philosophy (Management), Massachusetts Institute of Technology (MIT), Master in Business Administration, IESE, University of Navarra, Degree in Industrial Engineering, Universitat Politècnica de Catalunya

Dr. Miguel Angel Ariño, Professor of Decision Sciences, IESE Business School, Barcelona, Spain (Universidad de Navarra), CEIBS (China Europe International Business School), Beijing, Shanghai and Shenzhen, Ph.D. in Mathematics, University of Barcelona, BA in Mathematics (Licenciatura), University of Barcelona

Dr. Fotini Labropulu, Mathematics - Luther College, University of Regina, Ph.D., M.Sc. in Mathematics, B.A. (Honors) in Mathematics, University of Windsor

Philip G. Moscoso, Technology and Operations Management, IESE Business School, University of Navarra, Ph.D. in Industrial Engineering and Management, ETH Zurich, M.Sc. in Chemical Engineering, ETH Zurich

Dr. Lynn Lim, Reader in Business and Marketing, Roehampton University, London, BCom, PGDip, MBA (Distinction), PhD, FHEA

Dr. Sanjay Dixit, M.D., Director, EP Laboratories, Philadelphia VA Medical Center, Cardiovascular Medicine - Cardiac Arrhythmia, Univ of Penn School of Medicine

Dr. Mihaly Mezei, Associate Professor, Department of Structural and Chemical Biology, Mount Sinai School of Medical Center, Ph.D., Eötvös Loránd University, Postdoctoral Training, New York University

Dr. Han-Xiang Deng, MD, Ph.D., Associate Professor and Research Department, Division of Neuromuscular Medicine, Davee Department of Neurology and Clinical Neurosciences, Northwestern University Feinberg School of Medicine

Dr. Pina C. Sanelli, Associate Professor of Public Health, Weill Cornell Medical College, Associate Attending Radiologist, NewYork-Presbyterian Hospital, MRI, MRA, CT, and CTA, Neuroradiology and Diagnostic Radiology, M.D., State University of New York at Buffalo, School of Medicine and Biomedical Sciences

Dr. Michael R. Rudnick, M.D., FACP, Associate Professor of Medicine, Chief, Renal Electrolyte and Hypertension Division (PMC), Penn Medicine, University of Pennsylvania, Presbyterian Medical Center, Philadelphia, Nephrology and Internal Medicine, Certified by the American Board of Internal Medicine

Dr. Roberto Sanchez, Associate Professor, Department of Structural and Chemical Biology, Mount Sinai School of Medicine, Ph.D., The Rockefeller University

Dr. Bassey Benjamine Esu, B.Sc. Marketing; MBA Marketing; Ph.D Marketing, Lecturer, Department of Marketing, University of Calabar, Tourism Consultant, Cross River State Tourism Development Department, Co-ordinator, Sustainable Tourism Initiative, Calabar, Nigeria

Dr. Wen-Yih Sun, Professor of Earth and Atmospheric Sciences, Purdue University, Director, National Center for Typhoon and Flooding Research, Taiwan, University Chair Professor, Department of Atmospheric Sciences, National Central University, Chung-Li, Taiwan, University Chair Professor, Institute of Environmental Engineering, National Chiao Tung University, Hsin-chu, Taiwan, Ph.D., MS, The University of Chicago, Geophysical Sciences, BS, National Taiwan University, Atmospheric Sciences, Associate Professor of Radiology

Dr. Aziz M. Barbar, Ph.D., IEEE Senior Member, Chairperson, Department of Computer Science, AUST - American University of Science & Technology, Alfred Naccash Avenue - Ashrafieh

Chief Author and Dean

Dr. R.K. Dixit (HON.), M.Sc., Ph.D., FICCT, Chief Author, India. Email: [email protected]

Vivek Dubey (HON.), MS (Industrial Engineering), MS (Mechanical Engineering), University of Wisconsin, FICCT, Editor-in-Chief, USA. Email: [email protected]

Er. Suyog Dixit, BE (HONS. in Computer Science), FICCT, SAP Certified Consultant, Technical Dean, India. Website: www.suyogdixit.com. Email: [email protected], [email protected]

Sangita Dixit, M.Sc., FICCT, Dean and Publisher, India. Email: [email protected]

Contents of the Volume

i. Copyright Notice
ii. Editorial Board Members
iii. Chief Author and Dean
iv. Table of Contents
v. From the Chief Editor's Desk
vi. Research and Review Papers

1. Load Balanced Clusters for Efficient Mobile Computing 2-6
2. Corporate Data Obesity: 50 Percent Redundant 7-11
3. Web Mining: A Key Enabler for Distance Education 12-13
4. Optimized Remote Network Using Specified Factors As Key Performance Indices 14-17
5. Analysis of the Routing Protocols in Real Time Transmission: A Comparative Study 18-22
6. An Empirical Study on Data Mining Applications 23-27
7. A Novel Decision Scheme for Vertical Handoff in 4G Wireless Networks 28-33
8. Hybrid Approach for Template Protection in Face Recognition System 34-38
9. QRS Wave Detection Using Multiresolution Analysis 39-42
10. A Review on Data Clustering Algorithms for Mixed Data 43-48
11. Optimization of Shop Floor Operations: Application of MRP and Lean Manufacturing Principles 49-54
12. A Study on Rough Clustering 55-58
13. Applying Software Metrics on Web Applications 59-63
14. Measuring Helpfulness of Personal Decision Aid Design Model 64-80
15. Security Provision for Miners Data Using Singular Value Decomposition in Privacy Preserving Data Mining 81-84
16. An Efficient Synchronous Checkpointing Protocol for Mobile Distributed Systems 85-89
17. A Fuzzy Co-Clustering Approach for Clickstream Data Pattern 90-95
18. A Survey on Topology for Bluetooth Based Personal Area Networks 96-101
19. Identification of Most Desirable Parameters in SIGN Language Tools: A Comparative Study 102-108
20. A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm for Mobile Distributed System 109-115
21. The Establishment of an AR-based Interactive Digital Artworks 116-120
22. AUTOCLUS: A Proposed Automated Cluster Generation Algorithm 121-123
23. Implementing Search Engine Optimization Technique to Dynamic/Model View Controller Web Application 124-132
24. Eye Detection in Video Images with Complex Background 133-136
25. Multi-Layer User Authentication Approach for Electronic Business Using Biometrics 137-141
26. Cloud Computing - A Paradigm Shift 142-146
27. On Security Log Management Systems 147-157
28. A Transformation Scheme for Deriving Symmetric Watermarking Technique into Asymmetric Version 158-162

vii. Auxiliary Memberships
viii. Process of Submission of Research Paper
ix. Preferred Author Guidelines
x. Index

We see a drastic momentum everywhere in all fields nowadays, which in turn urges everyone to excel in every possible way. The need of the hour is to pick the right key at the right time, with all extras. Citing computer versions, automobile models, infrastructures, and so on: they are not the result of any preplanning but the implementation of planning.

With these, we are constantly seeking to establish more formal links with researchers, scientists, engineers, specialists, technical experts, associations, and other entities, particularly those who are active in the field of research, articles, research papers, and so forth, by inviting them to become affiliated with the Global Journals.

This Global Journal is like a banyan tree whose branches are many and each branch acts like a strong root itself. Our intentions are very clear: to do our best in every possible way, with all care.

Dr. R. K. Dixit
Chief Author
[email protected]



Load Balanced Clusters for Efficient Mobile Computing

Dr. P. K. Suri¹ and Kavita Taneja²

Abstract — Mobile computing is distributed computing that involves components whose position changes during computation. It bestows a new paradigm of mobile ad hoc networks (MANET) for organizing and implementing computation on the fly. MANET is characterized by the flexibility to be deployed and functional in "on-demand" situations, combined with the capability to ship a wide spectrum of applications and the buoyancy to dynamically repair around broken links. The underlying issue is routing in such a dynamic topology. Numerous studies have shown the difficulty for a routing protocol to scale to large MANETs. For this, such a network relies on a combination of storing some information about the position of the Mobile Unit (MU) at selected sites and of forming some form of clustering. But the centralized Clusterhead (CH) can become a bottleneck and possibly lead to lower throughput for the MANET. We propose a mechanism in which communication outside the cluster is distributed through separate CHs. We prove that the overall averaged throughput increases by using distinct CHs for each neighboring cluster, although the increase in throughput reduces after one level of traffic rates due to the overhead induced by "many" CHs.

About¹ — Professor, Deptt. of Comp. Science & Applications, Kurukshetra University, Kurukshetra, Haryana, India (e-mail: [email protected])
About² — Asstt. Prof., M.M. Inst. of Comp. Tech. & B. Mgmt., M.M. University, Mullana, Haryana, India (e-mail: [email protected])

I. MOBILE COMPUTING: VISION AND CHALLENGES

Mobility originates from a desire to move toward a resource or to move away from scarcity, and in rare cases it may be just a nomadic move. Wireless mobile computing faces additional constraints induced by wireless communications and the demand for anytime-anywhere communication towards the vision of ubiquitous or pervasive computing. It is accepted that the new parameters in mobile computing [1] are the mobility of elements, the limited resources of the Mobile Units (MUs) and the limited wireless bandwidth. The "mobility" and "position" have a more significant effect on the development of middleware, simulators and services for the MU than the other parameters. These characteristics can be viewed in a hierarchical fashion where the basic elements influence higher, more complicated systems. The mobile computing challenges on the one hand irrevocably handicap the existing infrastructure in effectively supporting the exponentially rising demands, and on the other hand open new avenues and opportunities for Mobile Ad Hoc Networks (MANETs). In general, such solutions rely on a combination of storing, within one MU, some information about the position of the MU at selected sites and of forming some form of clustering. The MUs are grouped in distinct or overlapping clusters for the purpose of routing, and within the cluster MUs are in touch directly. However, MUs communicate outside the cluster through a centralized MU that is called the Clusterhead (CH). The CH is elected to be part of the backbone for the MANET system and is assigned for communication with all other clusters [2, 3, 4]. This provides a hierarchical MANET system which assists in making the routing scalable. CHs are elected according to several techniques. The CH allows for minimizing routing details overhead from other MUs within the cluster. Overlapping clusters might have MUs that are common among them, which are called gateways [5]. MANET requires an efficient routing algorithm in order to reduce the amount of signaling introduced due to maintaining valid routes, and therefore enhance the overall performance of the MANET system [6, 7]. As the CH is the central MU of routing for packets destined outside the cluster in the distinct clustering configuration, the CH computing machine pays a penalty of unfair resource utilization such as battery, CPU, and memory [8]. Several studies [9, 10, 11] have proposed a CH election in order to distribute the load among multiple hosts in the cluster. Our approach extends the same concept of load balancing among CHs too. Section 2 discusses the related work and outlines major challenges while clustering in MANETs, section 3 discusses the multi-CH approach, section 4 presents the system model, section 5 discusses the numerical results obtained, and finally the paper is concluded with future scope in section 6.

II. RELATED WORK

Several mechanisms of CH election exist with an objective to endow efficient mobile computing in terms of stable routing in the MANET system [12, 13]. Some mechanisms favor not changing the CH to reduce the signaling overhead involved in the process, which also makes the elected MU's usage of its own resources higher [14]. Another mechanism assigns the CH based on the highest MU ID, as in the Linked Cluster Algorithm, LCA [15]. However, this selection process burdens the MU due to its ID. The CH can become a bottleneck and lead to propagating congestion. One option is to elect a CH for a defined duration so that all MUs have a chance to be a CH [3]. This mechanism keeps the CH load within one MU for the CH duration budget, while it provides a balance of responsibilities for MUs within the cluster. Also, an MU with a high mobility rate may not get the chance to become a CH if its mobility rate is higher than the duration of CH rotation. But transition and the duration budget contribute greatly to overhead. Mobility is one of the most important challenges of MANETs, and it is the main factor that would change

network topology. A good elected CH does not move very quickly, because when the clusterhead changes fast, the MUs may move out of a cluster and join another existing cluster, thus reducing the stability of the network. Hence, CH election mechanisms consider relative MU mobility to ensure routing path availability [16, 17], however causing an added signaling overload and causing the elected CH to pay the higher resource utilization penalty. We can conclude from the existing research that several tradeoffs exist for the elected CH and the other cluster MUs. Firstly, the CH has to bear higher resource utilization such as power, which may deplete its battery sooner than other MUs in the cluster, in addition to possibly causing more delay for its own application routing due to the competition with the routing for other MUs. Secondly, despite fair-share responsibility of the CH role, it is possible that a heavy burst of traffic takes place causing some CHs to use maximum resources, while others encounter low traffic bursts resulting in minimum resource use. Thirdly, the fair-share or load-balancing technique [3] might result in a CH that will not provide the optimal path for routing, or yet a link breakage. Plus, non-CH MUs are privileged as they don't pay a routing penalty and have resources dedicated for their own usage only. Therefore, there is no one common CH election mechanism that is best for MANET systems without some hurting tradeoffs. The Zone Routing Protocol (ZRP) [18] provides a hybrid approach between proactive routing, which produces added routing control messages in the network due to keeping up-to-date routes, and reactive routing, which adds delays due to path discovery and floods the network for route determination. ZRP divides the network into overlapping zones, while clustering can have distinct, non-overlapping clusters. In ZRP, proactive routing is used within the zone, and reactive routing is used outside the zone, instead of using one type of routing for the whole network. In addition, [18, 19] suggest that the hybrid approach is suited for large networks and enhances the system efficiency, but adds more complexity. Each MU has a routing zone within a radius of n hops. All MUs at exactly n hops are called peripheral MUs, and the ones at less than n are called interior MUs. This process is repeated for all MUs in the network. A lookup in the MU's routing table helps in deciding if the destination MU is within the zone, resulting in proactive routing. Otherwise, the destination is outside the zone, and reactive routing is used, which triggers a routing request. As a result of a routing response, one of the peripheral MUs will be used as an exit route from the zone to the destination. Whereas, if clustering is applied, the same elected CH is used for routing outside the cluster without triggering any route discovery to the destination. As discussed above, the main focus of the existing work is on the election of a single CH for a cluster. Even though this minimizes the overall signaling overhead in the cluster, it mainly can make the central CH a bottleneck.
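To make the zone construction above concrete, here is a minimal sketch of the interior/peripheral classification and the resulting proactive/reactive decision. The adjacency-list representation and all function names are illustrative assumptions of ours, not code taken from ZRP [18].

```python
from collections import deque

def zone_membership(adj, source, n):
    # Breadth-first search out to the zone radius n; adj maps each MU id
    # to a list of neighboring MU ids (an assumed representation).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        mu = queue.popleft()
        if dist[mu] == n:            # do not expand past the zone boundary
            continue
        for nbr in adj.get(mu, []):
            if nbr not in dist:
                dist[nbr] = dist[mu] + 1
                queue.append(nbr)
    interior = {mu for mu, d in dist.items() if 0 < d < n}
    peripheral = {mu for mu, d in dist.items() if d == n}
    return interior, peripheral

def route_decision(adj, source, dest, n):
    # Proactive routing inside the zone; reactive route request outside,
    # forwarded through the peripheral MUs as described above.
    interior, peripheral = zone_membership(adj, source, n)
    if dest in interior or dest in peripheral:
        return "proactive: destination found in the zone routing table"
    return "reactive: flood a route request via peripheral MUs"
```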
In changes in the topology of the network. The key point in ZRP, Proactive routing is used within the zone, and reactive designing cluster management schemes should be if the routing is used outside the zone, instead of using one typey of algorithm is local and dynamic it will be easy for it to adapt routing for the whole network. In addition, [18, 19] suggest to topology changes. that hybrid approach is suited for large networks, enhances l D. Uniform Energy Consumption the system efficiency, but adds more complexity. Each MU has a routing zone within a radius of n hops. All MUs with Clustering schemes should ensure that energy dissipation exactly n hops are called peripheral MUs,r and the ones with across the network should be balanced and the CH should be less than n are called interior MUs. This process is repeated rotated in order to balance the network energy consumption. for all MUs in the network. A lookup in the MU‘s routing table helps in deciding if the destination MU is within the E. Multihop or Single Hop Communication zone resulting in proactivea routing. Otherwise, the The communication model that MANET uses is multi hop. destination is outside the zone, and reactive routing is used Since energy consumption in wireless systems is directly which triggers a routing request. As a result of a routing proportional to the square of the distance, most of the response, one of the peripheral MUs will be used as an exit routing algorithms use multi hop communication model route from the zone toE the destination. While, if clustering is since it is more energy efficient in terms of energy applied, the same elected CH is used for routing outside the consumption however, with multi hop communication the cluster without triggering any route discovery to the MUs which are closer to the CH are under heavy traffic and destination. As discussed above, the main focus of the can create gaps near the CH when their energy terminates. existing work focuses on an election of single CH for a cluster. Even though this minimizes the overall signaling F. Cluster Dynamics overhead in the cluster, but it mainly can make the central Cluster dynamics means how the different parameters of the CH a bottleneck. cluster are determined for example, the number of clusters A. Challenges And Issues In Clustering in a particular network. In some cases the number might be reassigned and in some cases it is dynamic. The CH Despite the tremendous potentials and its numerous advantages MANET pose various challenges to research P a g e | 4 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology performs the function of compression as well as transmission of data. The distance between the CHs is a major issue. It can be IV. SYSTEM MODEL dynamic or can be set in accordance with some minimum value. In case of dynamic, there is a possibility of forming We have used glomosim [22] simulator, running IEEE unbalanced clusters. While limiting it by some pre-assigned, 802.11 to prove our contribution. Our MANET system minimum distance can be effective in some cases but this is consists of four distinct non-overlapping clusters with a an open research issue. Also CH selection can either be physical terrain of 1500 meters by 1500 meters as shown in centralized or decentralized which both have advantages and Fig. 1. For the same cluster, we ran simulation experiments disadvantages. The number of clusters might be fixed or with one CH, and compared its performance results with dynamic. 
IV. SYSTEM MODEL

We have used the glomosim simulator [22], running IEEE 802.11, to prove our contribution. Our MANET system consists of four distinct non-overlapping clusters within a physical terrain of 1500 meters by 1500 meters, as shown in Fig. 1. For the same cluster, we ran simulation experiments with one CH and compared the performance results with tests using 3 CHs. Each CH has an independent queue for packets destined for the neighboring cluster for which that particular CH is meant. During the simulation, we maintained the same CHs in both cases (single, multiple CHs), since changing the CH was irrelevant to what we are proving. Our traffic consists of Constant Bit Rate (CBR) and File Transfer Protocol (FTP) traffic. The same traffic load was run for both cases (single, 3 CHs). The selected traffic load was chosen based on tests that allowed sufficient utilization of the channel.

Fig. 1. Multi-CH Simulation Setup (four clusters, with the CHs marked in each cluster)

In this model, Cluster 4 operates both as a cluster with one CH and as a cluster with many CHs; the remaining clusters operate with one CH. This work can be expanded by incrementing the number of CHs in a cluster such that it has one CH per neighboring cluster. Our traffic included FTP traffic generated between MUs in all clusters in the MANET system. The FTP sessions were established in both directions. In addition, CBR traffic was generated in both directions between MUs in cluster 4 and clusters 1 and 2. In order to focus on the objective of distributing the CH load, we set up static routes in our MANET system. Routing from cluster 4 to cluster 2 was done via the intermediate cluster 1/cluster 3, and vice versa. Therefore, since there are 3 neighboring clusters to cluster 4, the system allowed for the use of 3 CHs, one for routing to/from each neighboring cluster.

V. NUMERICAL RESULTS

Our simulation focused on the cumulative averaged throughput and response time. Fig. 2 shows the percentage of increase in throughput when running multiple CHs over using one CH. In all cases, the throughput increased for the multiple-CHs case. For the small simulation time of 1000S, and with the traffic load used, the increase was only about 18%, since the system was lightly loaded as a result of the short simulation time. Therefore, one CH operated well, since the channel was not well utilized. Our peak results show that at 7000S of simulation time we reached the maximum throughput improvement, as this case indicates the channel utilization was at its optimal condition. Therefore, for longer simulation times, beyond what we concluded as optimal, the throughput decreased due to the added traffic on the channel.

Fig. 2. Run length (sec) VS Throughput Improvement (%)

The optimal case of 7000S proves the advantage of distributing the load to multiple CHs: we gained about a 101% improvement in throughput. Our results are explained by the simple queuing theory model:

ρ = λ / μ  (1)

where ρ is the traffic intensity, λ is the traffic arrival rate, and μ is the service rate at each CH with queue length QLI(k, l), with k as the number of packets and l as the number of CHs per cluster. Eq. 1 indicates that ρ increases if λ increases while μ remains at the same rate. In addition, the overall averaged cumulative response time increases if a constant service rate is maintained while the traffic arrival rate increases. Our simulation showed that the response time remained constant, at about 0.5, when using one single CH and when using multiple CHs. The traffic rate in the system is given by the Box-Muller transformation (Eq. 2), with σ = 1, μ = 0, and rand1, rand2 as samples from U(0, 1):

s = (−2 ln(rand1))^(1/2) · cos(2π · rand2)  (2)
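A minimal numeric sketch of Eq. 1 and Eq. 2 follows. The sample rates are hypothetical, chosen only to mirror the argument that tripling the CHs (and hence the aggregate service rate) while the arrival rate grows in step leaves the utilization ρ unchanged; the guard against log(0) is our own addition.

```python
import math
import random

def traffic_intensity(arrival_rate, service_rate):
    # Eq. 1: rho = lambda / mu at a CH queue
    return arrival_rate / service_rate

def box_muller_sample(sigma=1.0, mu=0.0):
    # Eq. 2: s = sqrt(-2 ln(rand1)) * cos(2 pi rand2), rand1, rand2 ~ U(0, 1)
    rand1 = 1.0 - random.random()        # shift into (0, 1] to avoid log(0)
    rand2 = random.random()
    return mu + sigma * math.sqrt(-2.0 * math.log(rand1)) * math.cos(2.0 * math.pi * rand2)

# Hypothetical rates: one CH versus three CHs with tripled arrivals/service.
rho_single = traffic_intensity(arrival_rate=40.0, service_rate=80.0)
rho_triple = traffic_intensity(arrival_rate=120.0, service_rate=240.0)
assert rho_single == rho_triple          # same utilization, as argued in the text
```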

The traffic rate increased, as indicated by the throughput increase due to the multiple CHs, while maintaining the same response time. Normally, if the arrival rate increases while maintaining the same service rate, then the response time should increase accordingly. Therefore, we can conclude that, by maintaining the same response time, the added traffic rate, together with an increase in service rate, results in constant system utilization. In our topology, we increased the number of CHs to 3; however, our throughput only about doubled, as shown in Fig. 2. We should expect, by the distribution of work to 3 CHs and by having the same averaged delay for the MANET system, a 3-fold increase in throughput, since the service rate has tripled. However, we only gained double the throughput, due to the cumulative increase in overall overhead from the added traffic rate of having multiple queues, one for each CH. In addition, as the traffic arrival rate increased due to having the 3 CHs, the service rate also increased, resulting in the same utilization rate for the MANET system.

We ran additional tests to validate the traffic rate at our selected simulation time of 7000S. The tests were run with one CH and multiple CHs for cluster 4. The throughput results are presented in Fig. 3. The results show the percentage of increase in the averaged cumulative throughput for running multiple CHs over one CH. We ran tests at 4 traffic rates: high, medium (half of the high), low (half of the medium), and a rate much lower than the low traffic rate, which we called the very low traffic rate.

Fig. 3. Throughput Improvement (%) VS Traffic Rates

We have noticed, as shown in Fig. 3, that the percentage of throughput improvement for the very low rate was only nearly 50%. This is attributed to the low channel utilization by the low traffic rate. At the high traffic rate we have shown a reduced improvement in throughput, due to traffic overload and multi-queue overhead in the MANET system. This traffic overload was created by the higher arrival rate due to the added sessions. However, at the medium traffic rate, we obtained about the same level of throughput improvement as at our optimal selected rate. We conclude that at these rates we obtained system stability with the offered traffic and service rates with many CHs. Therefore, the results shown in Fig. 3 validate the traffic selected for our results above.

VI. CONCLUSIONS AND FUTURE WORK

Our contribution proves that one CH per cluster does not provide a maximized throughput for the MANET system, due to the added responsibility for the one CH. Using multiple CHs (with independent queues) per cluster distributes the load among multiple MUs, which enables simultaneous and shared responsibility of inter-cluster routing among multiple MUs. It is an interesting finding to note that the increase in throughput due to the added CHs is not proportional to the number of CHs, even with the number of CHs equal to the number of neighboring clusters. Depending on the topology and traffic pattern, if all CHs are simultaneously used to route traffic, the rate of throughput increase fails to be the multiplier of the original throughput when using one CH, due to the overhead of maintaining multiple CHs in a cluster.
It is suggested that further research be done with all clusters employing multiple CHs, one per neighboring cluster. Also, one expansion of the system model is to take one common queue and dispense the packets to the idle CH, irrespective of the neighboring cluster route. It is expected that the throughput will increase at a very high rate, as MANET is blessed with multi-hop communication, and minimizing the idle time of CHs will lead to balancing the overhead caused by their existence.

VII. REFERENCES

1) Buss, D., "Technology and design challenges for mobile communication and computing products," in proceedings of the 2005 International Symposium on Low Power Electronics and Design (San Diego, CA, USA, Aug. 08-10, 2005), ISLPED '05, ACM, New York, NY, 2005.
2) S. Sivavakeesar and G. Pavlou, "Stable clustering through mobility prediction for large-scale multihop intelligent ad hoc networks," in proceedings of the IEEE Wireless Communications and Networking Conference (WCNC'04), Georgia, USA, vol. 3, pp. 1488-1493, Mar. 2004.
3) Amis and R. Prakash, "Load-Balancing Clusters in Wireless Ad Hoc Networks," in proceedings of the 3rd IEEE Symposium on Application-Specific Systems and Software Engineering Technology (ASSET'00), pp. 25, Mar. 2000.
4) M. Gerla and J. Tsai, "Multicluster, Mobile, Multimedia Radio Network," ACM Journal on Wireless Networks, vol. 1, no. 3, pp. 255-265, 1995.
5) Nocetti, J. S. Gonzalez, and I. Stojmenovic, "Connectivity based k-hop clustering in wireless networks," Telecommunication Systems Journal, vol. 22, no. 1-4, pp. 205-220, 2003.
6) Arboleda C., L. M. and Nasser, N., "Cluster-based routing protocol for mobile sensor networks," in proceedings of the 3rd International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks (Waterloo, Ontario, Canada, August 07-09, 2006), QShine '06, vol. 191, ACM, New York, NY, 24.
7) Akkaya, K. and Younis, M., "A survey on routing protocols for wireless sensor networks," Elsevier Ad Hoc Network Journal, vol. 3, no. 3, pp. 325-349, 2005.
8) Cardei, I., Varadarajan, S., Pavan, A., Graba, L., Cardei, M., and Min, M., "Resource management for ad-hoc wireless networks with cluster organization," Cluster Computing, vol. 7, no. 1, pp. 91-103, Jan. 2004.
9) Wang, S., Pan, H., Yan, K., and Lo, Y., "A unified framework for cluster manager election and clustering mechanism in mobile ad hoc networks," Comput. Stand. Interfaces, vol. 30, no. 5, pp. 329-338, Jul. 2008.
10) V. S. Anitha and M. P. Sebastian, "Scenario-based diameter-bounded algorithm for cluster creation and management in mobile ad hoc networks," in proceedings of the 2009 13th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, pp. 97-104, Oct. 25-28, 2009.
11) Spohn, M. A. and Garcia-Luna-Aceves, J. J., "Bounded-distance multi-clusterhead formation in wireless ad hoc networks," Ad Hoc Networks, vol. 5, no. 4, pp. 504-530, May 2007.
12) Khac Tiep Mai, Dongkun Shin, and Hyunseung Choo, "Toward stable clustering in mobile ad hoc networks," in proceedings of the 23rd International Conference on Information Networking, Chiang Mai, Thailand, pp. 308-310, Jan. 21-24, 2009.
13) X. Hong, M. Gerla, Y. Yi, K. Xu, and T. J. Kwon, "Scalable ad hoc routing in large, dense wireless networks using clustering and landmarks," in proceedings of the IEEE International Conference on Communications (ICC'02), vol. 25, no. 1, pp. 3179-3185, Apr. 2002.
14) Er, I. I. and Seah, W. K., "Clustering overhead and convergence time analysis of the mobility-based multi-hop clustering algorithm for mobile ad hoc networks," in proceedings of the 11th International Conference on Parallel and Distributed Systems - Workshops (ICPADS), IEEE Computer Society, vol. 02, pp. 130-134, Washington, DC, Jul. 20-22, 2005.
15) Jane Y. Yu and Peter H. J. Chong, "A survey of clustering schemes for mobile ad hoc networks," IEEE Commun. Surveys & Tutorials, vol. 7, no. 1, pp. 32-48, Mar. 2005.
16) C. R. Lin and M. Gerla, "Adaptive clustering for mobile wireless networks," IEEE JSAC, vol. 15, pp. 1265-1275, Sept. 1997.
17) A. McDonald and T. F. Znati, "A mobility based framework for adaptive clustering in wireless ad hoc networks," IEEE JSAC, vol. 17, no. 8, pp. 1466-1486, Aug. 1999.
18) Z. J. Haas and M. R. Perlman, "The performance of query control schemes for the zone routing protocol," in proceedings of ACM Sigcomm '98, vol. 28, no. 4, pp. 167-177, Oct. 1998.
19) P. Y. Chen and A. L. Liestman, "Zonal algorithm for clustering ad hoc networks," International Journal of Foundations of Computer Science, special issue dedicated to Wireless Networks and Mobile Computing, vol. 14, no. 2, pp. 305-322, Apr. 2003.
20) Zang, C. and Tao, C., "A multi-hop cluster based routing protocol for MANET," in proceedings of the 2009 First IEEE International Conference on Information Science and Engineering (December 26-28, 2009), ICISE, IEEE Computer Society, Washington, DC, pp. 2465-2468, 2009.
21) Wang, C., Yu, Y., Xu, Y., Ma, M., and Diao, S., "A multi-hop clustering protocol for MANETs," in proceedings of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (Beijing, China, September 24-26, 2009), IEEE Press, Piscataway, NJ, pp. 3038-3041, 2009.
22) Website for the glomosim simulator, http://pcl.cs.ucla.edu/projects/glomosim/


Corporate Data Obesity: 50 Percent Redundant

Hae Kyung Rhee

Abstract — In this essay, we report what we have observed with regard to the status quo of corporate information systems in the real world, from our experiences of twenty years of data management practices. The situation is considered serious in that data are too conveniently and frequently replicated, making information systems behave improperly in terms of their quality standards, including response time. The average ratio of data replication in a site is astonishingly judged to be more than 50 percent of a whole corporate database; it is in reality about 65 percent on average, to our knowledge. Presenting this paper to academia has been motivated by our strong belief and evidence that most of the redundancy can effectively and systemically be removed from the very start of information system development. We also noted that field workers, including database administrators in corporate environments, tend to think of the data part of an IS and the program part of an IS mixed together from the start of IS design, and the popularity of this tendency eventually causes a lot of entanglement that can hardly be dealt with later by themselves. We therefore present a couple of mandates that must be respected in order not to get involved in such a perplexity.

Keywords — Corporate Data Obesity, Data Redundancy, Enterprise Data Map.

About — Associate professor at Dept. of Computer Game & Information in Yong-In Songdam College (telephone: 82-31-330-9234, email: [email protected]).

I. CONCEPT OF OBESITY

It is not unusual to think that if a person weighs more than about 20 percent over what needs to be maintained for fitness, then he or she is considered over-weight. This is what we understand with regard to the concept of obesity, and it is no different for data in a corporate environment. It will be astounding to recognize that the degree of data obesity in corporates is far more than 20 percent. It is in fact 65 percent on average for some dozens of large enterprises we have observed in depth for the past twenty years. To be exact in terms of terminology, the unit of obesity we mean is the data attribute. For example, if there is customer data and it is comprised of c-name and c-address, then c-name and c-address are the data attributes. So, in case c-name appears more than once in a corporate database, it is called redundant or replicated.

Although reports on data abundance in corporate environments have been made in the literature, as far as we know only the issue of the data deluge [Cukier2010, KaBoZe2010] has been dealt with a couple of times, in order to emphasize the world-wide phenomenon of the rapid increase of data in terms of volume. The issue of data obesity is new in the world-wide communities of database research and management information systems research. In this sense, it is almost impossible to find any past work in the literature made with regard to this issue. Note that the concept of data obesity is essentially irrelevant to data volume. Although the introduction of some upper-level data stores like data warehouses (DW) or data marts (DM), other than the lower-level operational data stores (ODS), in the corporate environment certainly contributes to the abundance of data, DWs and DMs are out of scope in this essay. If we stick only to ODSs, we can observe that a lot of obesity is already there in the corporate environment.

Note that, in a fairly large corporate such as General Electric or Samsung Electronics, there are approximately 15,000-to-20,000 data attributes in the database. Notice also that the level of redundancy in data attributes is not exactly the same as the level of redundancy in data volume. However, to make it comparatively simple to have some idea about redundancy in terms of data volume, since a lot of people in field work prefer this way of understanding: when we happen to hear that the database size of some company is, for instance, 100 terabytes, it is legitimate or reasonable to think that the company in reality has a database of approximately 35-to-50 TBs. So, 50-to-65 TBs of data could be totally eliminated from the corporate database, and this elimination would never harm the operation of the database at all. Redundancy demands a huge cost in terms of waste in storage and belatedness in response to database queries. Note that even 1 TB of data amounts to piling A4-size papers up about 100 kilometers high.

Redundancy or replication gives some illusion that it could contribute to the enhancement of response time, but on the other hand things can get messy if we consider the consistency of data. The quality of answers to data queries can always be in question, since making all the replica copies have the same value usually takes a substantial amount of time, due to the non-automatic processes of such data value propagation. Manual propagation by considerate programming nevertheless unfortunately incurs unforced human errors, and there is no guarantee of data consistency at all across a corporate database. Once an inconsistent value of data happens to be used to reply to queries, trust in the information system would unbelievably collapse. The issue of mistrust would then raise the question of integrity with regard to the whole information system. Therefore, limiting the occasions of data replication to a minimum is necessary whenever it is possible.
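As a small illustration of the data attribute as the unit of obesity, the sketch below counts how many attribute occurrences across a schema are copies beyond the first. The schema fragment and the counting rule are hypothetical simplifications of ours; note that this only catches same-name, syntactic duplication.

```python
from collections import Counter

def attribute_redundancy_ratio(tables):
    # tables: table name -> list of data attribute names (assumed shape).
    counts = Counter(attr for attrs in tables.values() for attr in attrs)
    total = sum(counts.values())
    redundant = sum(c - 1 for c in counts.values())  # every copy past the first
    return redundant / total

# Hypothetical schema fragment: c-name and c-address are replicated.
tables = {
    "customer": ["c-name", "c-address"],
    "order":    ["order-no", "c-name", "c-address"],
    "invoice":  ["invoice-no", "c-name"],
}
print(f"{attribute_redundancy_ratio(tables):.0%} of attribute occurrences are redundant")
```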

Unless the rate of data redundancy is substantially reduced, say to about 15 percent, by means of wary design from the outset of IS development, the data normalization theories [YuJa2008] that have been esteemed for almost the past thirty years turn out to be "useless" in the real world. To our knowledge and experience, they can contribute a mere 5 percent of data redundancy reduction. The other 45 percent of reduction comes well before any tabular form of data begins to emerge in the process of IS development, and that is where we start to lay out job descriptions, in non-technical terms. We will get back to this later in this essay, after a discussion of how people in the IT field are insensitive to the issue of redundancy.

II. UNNECESSARY REDUNDANCY

In an arena where data is represented in the form of a table, or relation in expert terminology, the concept of keys, like the primary key and foreign key, is technically inevitable. Basically, if a particular key of a table, say A, dubbed its primary key, is duplicated in another table, say B, as a part or component of the key of B, that key is denoted a foreign key in B, as it has been imported or borrowed from the other table, which is A. This clarifies that the origin of the key is A, not B. This way of designating and incorporating such externality of keys will bring an IS about 15 percent of intrinsically contained data redundancy, which is technically unavoidable if we stick to the tabular representation of data. This portion of redundancy can be called redundancy of necessity. So, if the data obesity ratio is said to be 65 percent, about 45 percent of the entire data is therefore classified as unnecessary or superfluous in nature. Whether to remove this much unnecessary redundancy or unwanted replication is up to the decision of an individual data manager, but unless their removal is done, the information system would definitely be hampered by lack of consistency, and further by eventual slowness in response time. Note that, normally among the database queries of any corporate, about half are update requests and the other half are retrieval requests. If this reality of the read-write ratio, i.e. 0.5, is ignored, we are soon tempted to allow data duplication by assuming that reads are much more frequent than writes, and subsequently a fatal disaster would be experienced sooner or later, due mainly to the data inconsistency dilemma.

The payoff for upholding this unnecessary redundancy is really enormous. Usually, it would be about five times more costly than the case where the level of redundancy is minimally enforced. So, it is going to be 10 million dollars versus 50 million dollars when a so-called next-generation, i.e. enhanced version, of an information system is to be developed. As the degree of data redundancy increases, data consistency tasks among operational databases increase exponentially as well, in proportion to the amount of increase in data redundancy. Note that there is inevitably redundancy between the lowest-level database and its upper-level data warehouses, since data in the database are in principle shoveled upward to its data warehouses in the process of generating data warehouses. It is also a natural consequence that another layer of redundancy is unavoidable between data warehouses and their upper-level data marts.

In case data redundancy is existent, it is not difficult to find that many of the duplications are intrinsically semantic. Syntactic duplication is easy to find out, but it is almost impossible to determine whether any data is a semantic derivative of some other data. This semantic data duplicity is the major malice that makes corporate databases incurably obese. So, it is necessary to remove syntactic duplication, but it is exceedingly more crucial not to forge any possibility of semantic duplicity from the very outset of IS development. It really is almost impossible to check semantic equivalence, even periodically, once an information system is in operation day to day.
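The redundancy of necessity can be seen in a two-table toy example: the primary key of one table reappears as a foreign key in the other, and it is exactly this duplicated key that makes the tables joinable. The tables, attribute names, and values below are hypothetical.

```python
# Hypothetical tables: dept-no is the primary key of `department` and is
# duplicated in `employee` as a foreign key -- redundancy of necessity.
department = [
    {"dept-no": 10, "dept-name": "Sales"},
    {"dept-no": 20, "dept-name": "Engineering"},
]
employee = [
    {"emp-no": 1, "emp-name": "Kim", "dept-no": 10},
    {"emp-no": 2, "emp-name": "Lee", "dept-no": 20},
]

def join_on_foreign_key(employees, departments):
    # The borrowed key is what lets the two tables be recombined.
    by_pk = {d["dept-no"]: d for d in departments}
    return [dict(e, **{"dept-name": by_pk[e["dept-no"]]["dept-name"]})
            for e in employees]

print(join_on_foreign_key(employee, department))
```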
III. DE-NORMALIZATION - PANACEA OR DEADLY HOMEOPATHY?

It is really unfortunate that we have never seen any data table, or relation, that even follows the rule of the well-known first normal form (1NF) in real-world corporate databases. So, it is sometimes ridiculed that real-world databases only contain tables of non-normal form or zero normal form, since they have properties significantly inferior to 1NF in terms of data quality, such as the degree of data redundancy and the dependability of non-key data attributes on key attributes. The beauty of table normalization, or table standardization, by applying 1NF, 2NF, 3NF or Boyce-Codd NF is that whenever there is a data redundancy in a table, it is possible to remove it by decomposing, or splitting, the table into two.

In the corporate IT field, unfortunately, the term "de-normalization" [JoJA2007] has gained much popularity, in the sense that field managers usually do not have the time to pay attention to and understand the theories behind normalization. They at first pretend to understand and use them, but in reality they sooner or later totally forget about them. By far, we are very unfortunate that we have never seen any database administrator who really does understand the basic difference between 1NF and 2NF. The reality is that they keep never trying or studying to grasp the meaning and benefit of making tables normalized, and keep feigning to have started with 1NF initially for IS development and to proceed forward to make tables in up to 3NF, and all of a sudden, for the sake of performance, they inevitably and eventually come to resort to 1NF again. But this could be a sort of fictional story and hence never true at all, since they have always failed to tell us what the intrinsic difference between 1NF and 3NF is. A number of experiments [KSLM2008] have already shown that having tables in 3NF always performs better than 2NF or 1NF, and that 3NF is considered quite optimal even in cases where seven-way table joins are conducted. Note that a 7-way join means combining seven different tables, each fairly large in our experiments, at the same time.
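The decomposition step that the normal forms prescribe can be sketched as follows: a customer's address, repeated on every order row, is split out so that it is stored once per customer. The rows and attribute names are hypothetical, and the split shown is the redundancy-removing table split described above, not a full normalization algorithm.

```python
# Hypothetical denormalized order rows: c-address depends on c-name, not
# on the key order-no, so it is repeated on every order of a customer.
orders = [
    {"order-no": 1, "c-name": "Acme", "c-address": "Seoul", "item": "bolt"},
    {"order-no": 2, "c-name": "Acme", "c-address": "Seoul", "item": "nut"},
    {"order-no": 3, "c-name": "Zeta", "c-address": "Busan", "item": "bolt"},
]

def decompose(rows):
    # Split the table in two so the address is stored once per customer.
    customers = {}
    for r in rows:
        customers[r["c-name"]] = {"c-name": r["c-name"],
                                  "c-address": r["c-address"]}
    slim_orders = [{"order-no": r["order-no"], "c-name": r["c-name"],
                    "item": r["item"]} for r in rows]
    return list(customers.values()), slim_orders

customer_table, order_table = decompose(orders)  # redundancy removed losslessly
```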

The real problem with IT field managers and even database administrators is that they hardly understand even what 1NF is. Note that in the data-related literature of the past forty years, the notion of "de-normalization" has never been introduced, but they are pretty fond of taking that jargon just in order to forget about the normalization stuff and to let themselves be totally unaware of any impending issues related to data consistency. They seem to be soon relieved to hear from someone else that normalization can always be compromised for the reason of performance. To our knowledge, they are misled mainly by outside IT consultants who have never been trained enough in basic knowledge of databases. So, it is actually a very demanding burden to make them understand what the normalization theories are all about.

However, this is not too bad if we know that having tables even in 3NF can contribute to reducing the degree of data redundancy by at most about 5 percent, which is not much. Consequently, the contribution of normalization would be only minor. But then, where does the majority of the contribution come from? It comes much prior to the formulation of tables. In order to realize this, we have to know what and where the origin of data essentially is in the corporate environment. Where is the place where redundancy really starts to build? It is at the very beginning of business processes, not where the normalization theories are just about to be applied. Wouldn't it be curious where all the data that are to eventually appear in tables come from?

IV. NECESSITY OF BUSINESS PROCESSES DESCRIPTION

Let us turn our attention to how business processes are described so that field workers can communicate with each other later on. They will certainly be in the form of business process descriptions, or job descriptions. The transformation of job descriptions into data tables might take a couple of interim stages, since descriptions themselves have a format different from tables and there is no direct, straightforward method that can map the descriptions into tables. Then, what is a job description comprised of? In it, there could appear data entities like employee or department, which have fixed values for the data attributes they are comprised of. For example, a data entity 'employee' might consist of data attributes 'address' and 'social security number', and their values are normally fixed, i.e., not changed over time. In case there is in a job description a statement like "An employee sells a machine.", the data entities 'employee' and 'machine' will have such fixed values, while on the other hand the entity 'sell' is different in that the values of the data attributes of 'sell', like selling date or selling volume, vary, i.e., change each time the action or behavior 'sell' is performed. So, action entities are at the focal point in terms of creating different data values in the database. It can be considered that the source entity of the action 'sell' is 'employee' and its destination entity is 'machine'. This way of writing job descriptions by taking an action-oriented, or behavior-oriented, approach [KDLM2007] is straightforward. It should be fairly easy to understand for employees who have the mission of writing a description of the jobs they actually perform.

Efforts to make job descriptions free from data redundancy are essential, and it is valuable to check whether there is redundancy of any sort for each particular action. This means the action 'sell' above appears at most once in the job descriptions of the whole business processes of a corporate. It is judged to be improper or abnormal if the action 'sell' appears more than once in the entire job descriptions of the corporate. This kind of effort in reducing or removing action redundancy has no relationship to what is known to be crucial, like 1NF, 2NF or 3NF, as emphasized in the literature. But the removal effort with regard to redundancy in the data attributes directly associated with actions is far more important than the removal of redundancy in tables at the later stage of database creation. If the removal effort is not sufficiently done, redundancy thus retained, intentionally or unintentionally, will automatically be transferred intact to the tables at the instance of table creation.

From the perspective of who or what is in charge of dynamically creating data in the corporate environment, it is fair to admit that behaviors, rather than fixed entities, play the major role in such creation. Fixed entities, which are always expressed as nouns in description statements, like 'employee' and 'department', normally generate only static data attributes and thus are said to be only at the outskirts of data-creating activities. In this sense, it is meaningful if we preferably write job descriptions in a behavior-by-behavior way. Each behavior then has a responsibility for creating only meaningful data attributes. In case a behavior does not contribute to generating certain attributes, it has no value of existence as something independent or stand-alone. This means that in that case it is reasonable to place that behavior so as to be subsumed by some other behavior that is directly relevant and superior to it.

V. BEHAVIOR-ORIENTED JOB DESCRIPTIONS

As we have observed over the past 20 years, the unit of resources that is assigned to an employee is normally a job. The definition of jobs has been, in a sense, pretty well established in corporates. For example, we could count the number of jobs in a corporate without much difficulty. In our experience, a mid-size corporate has about 500 to 1,000 jobs, and to perform those jobs it normally requires maintaining a number of employees about twice as large as the number of jobs, since it is a usual practice to assign two persons to a single job in order to prepare for emergencies, just in case. So far, we have seen a number of corporates that have about 500 jobs and 1,000 employees in the real world. This might be a kind of standard for a mid-size corporate.

We were able to observe from our experience that each job on average could be comprised of some 20-to-30 actions or behaviors, in case only data-creating actions are taken into account in job descriptions. So, if there are 500 different jobs in a corporate, then it means that there are about 10,000-to-15,000 behaviors altogether in that company. With no redundancy in actions, those some 10,000 behaviors must be unique, in that they do not incur redundancy of any type, so that each of them appears once and at most once throughout the entire corporate database.

VI. ENTERPRISE DATA MAP

These behaviors are, in a sense, interconnected with each other in such a way that each data-creating action has one fixed entity on its left and one more fixed entity on its right. If we denote a behavior by B and a fixed entity by E, then the web of those interconnections looks like a chain of type 'E—B—E'. The whole picture then looks something like a rectangle that allows data accesses or data retrievals in either direction, clockwise or counter-clockwise, as depicted by the arrows in Fig. 1.

Fig. 1. Rectangular Path Formed in Enterprise Data Map, where B Denotes Behavior and E Denotes Entity

Rectangularity guarantees balanced response time in either direction of access, while a case skewed toward one particular direction can induce degradation in response time. Although there are only seven actions in this picture, we could get a whole diagram that contains some 10,000 behaviors if we keep extending the picture by adding more behaviors to it. The entire picture of connections, without allowing isolation of any fragment, can be called an enterprise data map [Moon2004] of the corporation. With this EDM, we are able to judge where the origin of a particular data attribute is and how it flows throughout the entire set of data access paths obtained and depicted in the EDM. With the EDM, it is also very easy to find out visually where data redundancies are, if there are any. As a diagram, one EDM occupies about 20 pages of A3 size if a font size of 5 is used. Drawing is automatic if we use a software drawing tool such as ERwin [JoJB2007]. An EDM of that many pages would easily fit on the wall of a CEO's or CIO's office. It could also be displayed in a CFO's office, in case he is interested in figuring out the flow of all the data directly related to the financial status quo of his company. Unfortunately, at the moment only a few corporations have experienced the value of obtaining and maintaining an EDM, but we advocate that its use would significantly benefit many aspects of an information system, and its utilization can be plentiful depending on your perspective of looking at it.
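As a rough illustration of the E—B—E structure (our sketch, not part of the paper; the entity and behavior names are invented), an EDM can be held as a graph whose nodes are fixed entities and behaviors, with each behavior linked to one source and one destination entity, so the map can be walked in either direction as described above.

```python
# Minimal E-B-E graph sketch. Entity and behavior names are illustrative.
edges = [
    ("employee", "sell", "machine"),     # E -- B -- E triples
    ("machine", "ship", "warehouse"),
    ("warehouse", "bill", "customer"),
]

# Build an undirected adjacency map so the map can be traversed in either
# direction, matching the clockwise/counter-clockwise accesses of the EDM.
adjacency = {}
for src, behavior, dst in edges:
    adjacency.setdefault(src, []).append((behavior, dst))
    adjacency.setdefault(dst, []).append((behavior, src))

# Example access: list every behavior reachable from 'machine' in one hop.
print(adjacency["machine"])  # -> [('sell', 'employee'), ('ship', 'warehouse')]
```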
VII. SEPARATION OF DATA FROM PROGRAM

It is needless to say that the EDM must be secured and kept as an asset prior to the programming of the information system. We emphasize that any programming effort must be deferred until the finalization of the EDM. The EDM in this sense is the blueprint, as for any design of, for instance, a building or a road: to our knowledge, the EDM is definitely the blueprint for an information system, prior to any programming effort. What we emphasize is that data is essentially data, in that programming must begin to take place only after the data formulation has been completely wrapped up. A data-first, programming-later approach is crucial for the success of an information system. If data concerns and programming concerns are mixed together from the start of information system development, chaotic situations will duly be encountered in determining whether an impending problem at issue originates from the data part or the programming part. We emphasize that data cannot be represented, expressed or substituted by any programming means.
Note that if somebody happened to introduce a data item 'whether-a-student-is-registered-or-not', it is in fact a disguise as data, in that it essentially carries a sort of algorithmic logic inside. Presuming that a data item like 'registration date' already resides somewhere else in the database, the 'whether-or-not' type of decision can definitely be dealt with by conditional statements like 'if' in programming. Separation of data from programming must be strictly obeyed; without separation, a bunch of semantic redundancy like this sort of disguise can later insidiously come into the information system. If this kind of algorithmic logic is certainly present in a data item, then it is not real data, since only raw data is privileged to be called data. Anything impure, in the sense of generating artifacts, is not called real data. For example, if data C results from the addition of raw data A and raw data B, then C is in principle not treated as data. Note that in the lowest, infrastructural-level database only such raw data are entitled to reside. Anything else must be deported to reside somewhere else, such as data warehouses.
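The registration example can be made concrete with a small sketch of our own; the row layout and field names are assumptions for illustration. The database stores only the raw registration date, and the 'whether-or-not' decision lives in program logic, never as a stored column.

```python
from datetime import date

# Raw data only: the database row stores a registration date (or None).
student_row = {"student_id": 17, "registration_date": date(2010, 3, 2)}

def is_registered(row):
    # The whether-or-not decision is an 'if' in the program, derived on
    # demand from raw data -- it is never stored as a redundant column.
    return row["registration_date"] is not None

print(is_registered(student_row))   # -> True
```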

VIII. CONCLUSION

In sum, there are two major mandates that have to be obeyed to make information systems free from data obesity. The first is that efforts to remove data redundancy should be enforced from the start of information system development, that is, from the point of securing job descriptions. The second is the strict separation of the data arena and the programming arena in developing information systems. Questions like whether something belongs to data or to programs are better raised as frequently as possible, in order not to leave any chance of confusion about which comes first and which comes later. To our knowledge, the degree of data obesity is guaranteed to stay within at most 20 percent if these two mandates are strictly obeyed. Removal of another 5 percent of data redundancy is possible later if a certain set of technical details is carried out: the well-known data table normalization, or table decomposition, theories come into play for this further removal. So the benefit accrued from data redundancy removal by application of the normalization theories is considered far less than what we get from the efforts made at the stage of job description, which amount to a 30-to-45 percent removal of data redundancy in an entire corporate database. It adds one more flower to an already seized beauty if the normalization theories are applied to make tables fit best with minimal redundancy in them, but we might well have no regret at all when they happen not to be applied for some reason, under the premise that data redundancy of all sorts has already been sorted out and ruled out prior to table formulation. The adage ―Well begun is half done‖ still prevails in the world of information system development, and the fitness, or well-being, of an information system in any situation or environment comes true when we immerse ourselves in thinking in this manner. Consequently, the earlier we occupy ourselves with data redundancy removal, the better the outcome of the information system in terms of performance, clarity, transparency and promptness in response time.

IX. REFERENCES

1) [Cukier2010] Cukier, K. (2010, Feb. 25). Data, Data Everywhere. A Special Report on Managing Information, The Economist. Retrieved May 1, 2010, from http://www.economist.com/specialreports/displaystory.cfm?story_id=15557443.html
2) [KaBoZe2010] D. Katz, M. Bommarito, & J. Zelner (2010, March 1). The Data Deluge. The Economist print edition.

3) [JoJB2007] J. Jones & E. Johnson (2007). Building and Maintaining A Database from Any ER Model. White Papers: Computer Associates.
4) [Moon2004] S. Moon (2004). Data Architecture. Hyung-Seol Publishing Company.
5) [KDLM2007] N. Kim, D. Lee and S. Moon (2007). Behavior-Inductive Data Modeling for Enterprise Information Systems. Journal of Computer Information Systems, Vol. 48, No. 1, 105-116.

6) [KSLM2008] N. Kim, S. Lee & S. Moon (2008). Formalized Entity Extraction Methodology for Changeable Business Requirements. Journal of Information Science and Engineering, Vol. 24, No. 3, 649-671.

7) [YuJa2008] C. Yu & H. V. Jagadish (2008). XML Schema Refinement through Redundancy Detection and Normalization. VLDB Journal, Vol. 17, 203-223.


Web Mining: A Key Enabler for Distance Education

D. Santhi Jeslet1, Dr. K. Thangadurai2

Abstract-This paper introduces one of the applications of data mining, known as web mining. It discusses the various categories of web mining, deals with the application of web mining in distance education, and describes the possibilities of this application. In this fast world everyone wants to be educated by acquiring a great deal of knowledge in a short duration. They do not want to spend some fixed time on their education; whenever a person is free they can learn and gain knowledge.
Keywords-Data mining, web mining, distance education

I. INTRODUCTION

Nowadays many organizations accumulate huge amounts of data. This leads the size of the database to swell as time passes. Traditional database queries access a database using SQL queries, and the output is the data from the database that satisfies the query. This output cannot give any novel information or correlations among the data. So we need a technique that finds the hidden information in a large data collection in a database. This technique is called ―Data Mining‖. It discovers valid, novel, potentially useful new correlations and new trends from large amounts of data. Data mining uses pattern recognition techniques and statistical and mathematical techniques for its discovery.
In the recent trend, lots of databases are available on the web. Not only databases; much valuable information is also available on the WWW. So the search area for any information has become very vast. Web mining is an application of data mining which uses data mining techniques to automatically discover and extract information from Web documents and services. It can also be applied to semi-structured or unstructured data like free-form text. Web mining activities can be divided into three categories: content mining, structure mining, and usage mining. The taxonomy of Web mining is depicted in the figure.
1. Web Content Mining: It is the process of discovering useful information from the web, which may be in the form of text, images, audio and video. For the discovery it uses the techniques of Artificial Intelligence (AI), Database and, most specifically, Data Mining (DM).
2. Web Structure Mining: It helps to derive knowledge of the interconnection of documents, hyperlinks and their relationships. It uses graph theory to analyze the node and connection structure of a web site.
3. Web Usage Mining: It is also called web log mining. This helps to judge the usage of a web page. It uses computer network concepts, artificial intelligence and databases.

II. OBJECTIVES OF DISTANCE EDUCATION

In the last few decades education has undergone many changes. Classroom teaching is needed for face-to-face education, which comprises a classroom, the physical presence of some learners, and a teacher/tutor. Here the teacher/tutor plays a vital role. But with the introduction of distance education, the interactions between the tutor and the learner have been very much reduced; even the interaction between the learners has become almost zero. The main aim of distance education is to enable society to acquire more knowledge irrespective of where people are. Those who do not want to stick to the rules of the regular education system prefer to earn knowledge through distance education. It also encourages working people to attain their learning goals.

III. HOW WEB HELPS IN DISTANCE EDUCATION?

The communication between the tutor and the learner can be enhanced by the introduction of distance education through the web.
Here learners work individually at their own place, with the help of some study materials, i.e., a computer system, computer programs and the internet. The time and space limitations of education disappear. Tutors interact with the students and the learners interact with the tutor via the internet: the tutor supplies information and the learner receives it.

______

About-1: Department of Computer Science, M.G.R. College, Hosur, TN, India (e-mail: [email protected])
About-2: Department of Computer Science, Government Arts College (Men), Krishnagiri-635 001, India


Since many software packages are very simple and user friendly, there is no need for special training to work with the computer. The power of computers lets students improve their ability. The role of the tutor is entirely changed: the tutor communicates with learners and leads the course along their learning path. Learners are grouped; they learn from each other and they also assess each other. This allows the learners to apply their knowledge in different situations and to solve practical problems according to the feedback on their own actions. These changes in the educational system have developed constructivism. Constructivism means that learners are involved in actively constructing meaningful knowledge through experience.

IV. APPLICATION OF WEB MINING IN DISTANCE EDUCATION

An organization that is responsible for distance education collects a huge volume of data, generated automatically by web servers and collected in the server access logs. It also collects information from referrer logs, which contain information about the referring pages for each page, and from user registration. Through this, an organization can get an idea about thinking styles, learn the learners' expectations, and also learn about the web site structure. This helps to improve the efficiency of the web site that is responsible for improving the knowledge of the learners.
Before gathering histories using mining algorithms, a number of data preprocessing issues, such as data cleaning, have to be addressed. The major preprocessing task is data cleaning, which is used for removing irrelevant information from the server log. The extracted access histories of each individual learner represent the physical layout of the web site, with web pages and the hyperlinks between the pages. Once user access histories have been identified, web page traversal path analysis is performed for customized education, and web page association for virtual knowledge structures. By using different path analyses, such as graph representation, we can determine the most frequent traversal patterns from the physical layout of a web site. Path analysis is performed from two points of view: aggregate and individual path. Aggregate path analysis includes the process of clustering the registered learners. The web site database has the registered learners' details; these can be segmented by one of the clustering techniques to discover learners with similar characteristics. By using this we can determine the most frequently visited paths of learners. Individual path analysis helps to determine the set of frequently visited web pages accessed by a learner during their visits to the server.
Discovering such aggregate and individual paths for learners in distance education helps in the development of effective customized education. Associations and correlations among web pages can be discovered using association rules, as sketched below. This guides the discovery of correlations among references to the various web pages available on the server by a learner or learners. Based on this, the tutor can also judge the standard of the learner.
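As a rough sketch of the association-rule step (ours, not from the paper; the session data and thresholds are invented for illustration), pairwise rules between visited pages can be scored by support and confidence over the server-log sessions:

```python
from itertools import combinations
from collections import Counter

# Hypothetical learner sessions extracted from the server access log.
sessions = [
    {"syllabus", "lecture1", "quiz1"},
    {"lecture1", "quiz1"},
    {"syllabus", "lecture1"},
    {"lecture1", "quiz1", "forum"},
]

n = len(sessions)
pair_counts = Counter()
page_counts = Counter()
for s in sessions:
    page_counts.update(s)
    pair_counts.update(combinations(sorted(s), 2))

# Report rules A -> B with support and confidence above small thresholds.
for (a, b), cnt in pair_counts.items():
    support = cnt / n
    confidence = cnt / page_counts[a]
    if support >= 0.5 and confidence >= 0.7:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```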
V. CONCLUSION

Web mining in distance education provides a lot of open teaching resources, so that people can teach and learn anytime and anywhere. It helps the organization that is responsible for distance education to discover the learners' access habits and study interests. It guides the teacher to adjust his/her teaching techniques and the speed of teaching depending on the learners' knowledge. So web mining technology is a key enabler of distance education.

VI. REFERENCES

1) Youtian Qu, Lili Zhong, Huilai Zou, Chanonan Wang. Research About The Application Of Web Mining In Distance Education Platform. Eighth International Conference on Embedded Computing / Scalable Computing and Communications (SCALCOM-EMBEDDEDCOM'09), 2009.
2) Wang Jian and Li Zhuo-Ling. Research And Realization Of Long-Distance Education Platform Based On Web Mining. International Conference on Computational Intelligence and Software Engineering (CiSE 2009), 2009.
3) Sung Ho Ha, Sung Min Bae, Sang Chan Park. Web Mining For Distance Education. Proceedings of the 2000 IEEE Conference on Management of Innovation and Technology (ICMIT 2000), Volume 2.
4) Zhang Yuanyuan, Mo Quian. Research Of The Constructivism Remote Education Based On Web Mining. First International Conference on Education Technology and Computer Science (ETCS'09), 2009, Volume 2.
5) Margaret H. Dunham and S. Sridhar. Data Mining: Introductory And Advanced Topics.
6) Pieter Adriaans and Dolf Zantinge. Data Mining.

Optimized Remote Network Using Specified Factors As Key Performance Indices

John S.N.1, Okonigene R.E.2, Akinade B.A.3, Chukwu I.S.3

Abstract-This paper discusses the implementation of an optimized remote network, using latency, bandwidth and packet drop rate as key performance indicators (KPI) to measure network performance and quality of service (QoS). We compared the network performance characteristics obtained on the Wide Area Network (WAN) when using Fiber, VSAT and Point-to-Point VPN across the internet respectively as the network infrastructure. Network performance variables are measured across the various links (VSAT, Fiber and VPN across the internet), and the corresponding statistical data is analyzed and used as a base-line for the optimization of a corporate network's performance. The qualities of service offered on the network before and after optimization are analyzed and used to determine the level of improvement in network performance achieved.
Keywords-Key performance indicator, optimized remote network, latency, bandwidth, WAN, VSAT.

I. INTRODUCTION

Most network users often attribute the problem of a slow network and poor quality of service to a lack of sufficient bandwidth, which is not generally correct. Sometimes, poor network performance can be traced to network congestion, a high packet drop rate, chatty protocols or high latency [1], among others. This paper uses the technique of network base-lining to obtain the best combination of network metrics that can raise the performance of network resources up to maximum data flow energy (MDFE), which allows the maximum amount of data to be sent in the fastest amount of time using the optimum bandwidth capacity [2]. We assume that the server and client processing times are minimal relative to the total time it takes to complete a transaction; hence the cause of service transaction delays is attributed to WAN delay. We try to find out the causes of poor quality of service across the WAN and make recommendations on how to implement an efficient remote network with better quality of service (QoS) [3].
In the methodology, three sets of parallel links (Fiber, VSAT and Point-to-Point VPN across the internet) of equal bandwidth are set up between two geographically separate locations. Files of different sizes were sent between the locations across each link respectively. The key performance indicators (latency, bandwidth and packet drop rate) [4, 5, 6] were recorded using standard monitoring tools to monitor each of the experiments performed. Graphical analysis of the data obtained from the link performance was used as the basis for the conclusions made in this paper, using latency, bandwidth and packet drop rate as the key performance indicators for network performance.

______
About-1: Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria (e-mail: [email protected])
About-2: Department of Electrical & Electronics Engineering, Ambrose Alli University, Ekpoma, Nigeria (e-mail: [email protected])
About-3: Department of Electrical and Electronics Engineering, University of Lagos, Akoka, Yaba, Lagos, Nigeria (e-mail: [email protected]; [email protected])

II. NETWORK PERFORMANCE CRITERIA

A network performs well when users are able to access applications and carry out given tasks without undue perceived delay, error or irritation. The primary measures of user-perceived performance are availability and completion time. It is important to identify whether utilization factors, collision rate or bandwidth congestion are responsible for network problems [7]. In general, the performance of a computer network can be divided into three sections for easy analysis and trouble-shooting:
 The performance of the application,
 The performance of the servers,
 The performance of the network infrastructure.
Based on the end-user's perception of the network, we can also view network performance in terms of service-oriented and efficiency-oriented measures, as shown in Fig. 1.

Fig. 1. Block diagram of IT performance

It is noted that service-oriented performance measures how well an application provides service to the customer, whereas efficiency-oriented performance measures how much of the available channel resources are actually used to serve end-user requests. The latter tends to measure how much of the available channel resources are being wasted due to inefficiencies inherent in the communication channel.


III. METHODOLOGY

The performance of a wide area network can be verified by studying the effect of the network contribution to transaction time (NCTT) on the network [3]. In a high performance network, TCP packets are transferred across the WAN with minimal delay (low latency) within the optimum load limit. When the network becomes overloaded, congestion sets in and TCP packets are dropped and consequently re-transmitted, which adds to the total time required to complete a transaction in a busy network [8]. The network contribution to transaction time is the sum of the round-trip times necessary to complete a given transaction type, plus the time for recovery from any lost packets during the transaction [3]. It can be calculated as:

NCTT = E × RTT + L × RTO

where E – number of round-trip exchanges necessary to complete the transaction, RTT – round-trip time for packet transfer, L – number of round-trip exchanges that experience packet loss, RTO – retransmission time-out.

The number of losses experienced in the course of a transaction depends on the round-trip packet loss probability, p. For a two-way traffic path, the loss probability is given by:

P_RTT = 1 − (1 − P_oneway)(1 − P_otherway)

If each round-trip exchange takes A_i attempts to complete successfully, the total number of attempts to complete a transaction is given as:

A = Σ_{i=1..E} A_i,  with  Prob(A_i = a) = p^(a−1) (1 − p)

The expected value of A is given by:

E{A} = E × Σ_{a=1..∞} a p^(a−1) (1 − p)

which converges as:

E{A} = E / (1 − p)  for 0 ≤ p < 1

A is equal to the constant E plus a random number of losses L, so E{A} = E + E{L}, and

E{L} = E / (1 − p) − E = E p / (1 − p)

and the average

NCTT = E × RTT + E{L} × RTO

Note that the probability distribution of NCTT is a set of discrete values [11] at (E × RTT), {(E × RTT) + (1 × RTO)}, {(E × RTT) + (2 × RTO)}, and so on.

The performance of the WAN and the remote network can also be viewed in terms of effective throughput. Throughput is the quantity of error-free data that can be transmitted over a specified unit of time [9]:

Throughput = Bandwidth / Total Latency, bps

Also, Throughput ≈ MSS / (RTT × √p), bps

where MSS – maximum segment size (fixed for each internet path, typically 60 bytes), RTT – round-trip time (as measured by TCP), p – packet loss rate (%).

The efficiency of the WAN link can be calculated from statistical data on the link utilization, where Utilization (U) [7] is the percentage of the total channel capacity currently being consumed by aggregate traffic:

Utilization = (Traffic / Channel capacity) × 100

Also, Utilization = [(Data sent + Data received) × 8 / (Link speed × Sample time)] × 100

Furthermore, in this research three point-to-point WAN links were set up between two separate locations A and B using three different WAN technologies, namely:
(i) a 128/256 Kbps leased fiber line,
(ii) a 128/256 Kbps point-to-point VPN across the public internet,
(iii) a 128/256 Kbps VSAT link.
The key performance indicator (KPI) metrics for the research were latency, bandwidth and packet drop rate. The following approach methods were used to obtain the required performance characteristics of the various WAN technologies adopted:
(a) Files of various sizes were sent from Host A to Host B across the different WAN links.
(b) The KPI values were measured and recorded for each remote network infrastructure in use (Fiber, VSAT, Point-to-Point VPN across the internet, with bandwidths of 128/256 kbps respectively).
(c) The performance statistics obtained in each case were plotted in graphical form and analyzed.
(d) Recommendations for error correction and performance improvement were made.
(e) Conclusions were drawn based on the results obtained from the key performance indices.
The alternative WAN links between the two remote locations, shown in Fig. 1, were routed between Host A and Host B using the different connection links (Fiber, VSAT, P2P VPN) to measure the KPIs of the network.

Fig. 1. Schematic diagram of alternative WAN links (VSAT, Fiber, Pt-2-Pt VPN) between two remote locations (Host A and Host B)
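The formulas above can be checked numerically with a small sketch of our own; the sample values below are illustrative, not measurements from the paper.

```python
import math

def expected_nctt(E, rtt, rto, p):
    # Average NCTT = E*RTT + E{L}*RTO, with E{L} = E*p/(1-p) expected losses.
    expected_losses = E * p / (1.0 - p)
    return E * rtt + expected_losses * rto

def loss_limited_throughput(mss_bytes, rtt_s, p):
    # Throughput ~= MSS / (RTT * sqrt(p)), expressed here in bits/s.
    return (mss_bytes * 8) / (rtt_s * math.sqrt(p))

# Example: 10 exchanges, RTT 120 ms, RTO 1 s, 1% round-trip loss -> ~1.30 s.
print(expected_nctt(E=10, rtt=0.120, rto=1.0, p=0.01))
# Example: MSS 1460 bytes, RTT 120 ms, 1% loss -> ~973 kbps.
print(loss_limited_throughput(mss_bytes=1460, rtt_s=0.120, p=0.01))
```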

Table 1 shows the throughput obtained on the remote WAN link at different packet drop rates and latencies.

Table 1. Throughput of a network as affected by both the latency and the packet drop rate

Latency   TP1 (KBps)   TP2 (KBps)   TP3 (KBps)   TP4 (KBps)   TP5 (KBps)   TP6 (KBps)   TP7 (KBps)
(ms)      0.01% PDR    0.05% PDR    0.10% PDR    0.50% PDR    1.00% PDR    2.00% PDR    3.00% PDR
9         1822.22      814.95       576.29       257.70       182.22       128.85       105.20
30        546.67       244.48       172.89       77.31        54.67        38.66        31.56
60        273.33       122.24       86.44        38.66        27.33        19.33        15.78
90        182.22       81.50        57.63        25.77        18.22        12.86        10.52
120       136.67       61.12        43.22        19.33        13.67        9.66         7.87
150       109.33       48.90        34.58        15.46        10.93        7.73         6.32
300       54.67        24.45        17.29        7.73         5.47         3.87         3.16
500       32.80        14.70        10.37        4.64         3.28         2.32         1.90
800       20.50        9.17         6.48         2.90         2.05         1.45         1.18
1000      16.40        7.34         5.19         2.32         1.64         1.16         0.95

Fig. 3. Graph of throughput against latency for different packet drop rates

Fig. 3 shows the effect of packet drop rate on the network throughput at different latencies. The throughput of a network is affected by both the latency and the packet drop rate of the link: an increase in latency decreases the network throughput. Similarly, the throughput also decreases as the packet drop rate increases, which can degrade the network's quality of service. Analysis of the results indicates that the best quality of service is obtained by using a link whose latency is between 1 and 30 milliseconds with a packet drop rate of 0.01% or less. Such latency can only be achieved using fiber or radio links, where packets propagate at the speed of light with a very low bit-error rate. The worst quality of service occurs when the latency is between 800 and 1000 milliseconds and the packet drop rate stands at 3% or more.
Link latency of 800 milliseconds and above is usually associated with VSAT links because of the technological limitation caused by the distance along the propagation path between the two locations via the orbital satellite. However, VSAT links could still be used for non-delay-sensitive applications if there is no packet loss; the situation becomes worse when an increasing packet drop rate is associated with VSAT links. For a Point-to-Point virtual private network (VPN) across the public internet with an average latency of 250 milliseconds, the performance of most real-time and data-based applications is considered favourable. However, a Point-to-Point VPN is always associated with a higher packet drop rate than VSAT or Fiber links because of the large number of hops and routing protocols along the path from source to destination. This is even worse when considering the two-way traffic situation usually experienced in real-life scenarios.

IV. IMPROVEMENT IN QUALITY OF SERVICE

The improvement in quality of service (QoS) can be seen by comparing the network throughput of the Fiber, VPN and VSAT links. Assume minimal packet loss for all three infrastructures, with a latency of 850 ms for VSAT, 260 ms for the Point-to-Point VPN across the internet, and 25 ms for the Fiber link. The throughput for VSAT then gives 0.6168 Mbps, that of the VPN across the internet gives 2.016 Mbps, and the throughput for Fiber gives 20.97 Mbps. By replacing the VSAT infrastructure with a Fiber optic link, the following improvement in QoS would be achieved:

(20.97 − 0.6168) / 0.6168 × 100 ⇒ 3300%

Similarly, replacing the VPN with a Fiber optic link would achieve an improvement in quality of service (QoS) as follows:

(2.016 − 0.6168) / 0.6168 × 100 ⇒ 530%
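The percentage-improvement arithmetic is straightforward; a tiny sketch of our own reproduces the VSAT-to-Fiber figure from the throughputs quoted above.

```python
def qos_improvement(old_mbps, new_mbps):
    # Relative improvement in percent: (new - old) / old * 100.
    return (new_mbps - old_mbps) / old_mbps * 100.0

# Throughputs quoted in the text (Mbps): VSAT, VPN over internet, Fiber.
vsat, vpn, fiber = 0.6168, 2.016, 20.97
print(round(qos_improvement(vsat, fiber)))   # -> 3300
```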


V. CONCLUSION

The key performance indices of network services (packet drop rate, latency and throughput) affect network performance whenever one of the factors goes outside the optimized range of values obtained in this research work. Under perfect conditions (assuming a minimal percentage of packet loss), the use of a WAN link with low latency, together with optimized bandwidth, would significantly enhance the quality of service (QoS) experienced by a remote network user over a WAN link.

VI. REFERENCES

1) Bilal Haider, M. Zafrullah, M.K. Islam, ―Radio frequency optimization & quality of service evaluation in operational GSM network,‖ Proceedings of the World Congress on Engineering and Computer Science 2009, Volume 1, WCECS, Oct 20-22, 2009, San Francisco, USA.
2) Daniel Nassar, ―Network Performance Baselining,‖ Publisher MTP, 201 West 103rd Street, Indianapolis, IN 46290, USA, 2002.
3) Network contribution to transaction time. ITU-T Recommendation G.1040, ITU-T Study Group 12 under the ITU-T Recommendation A.8 procedure (2005-2008).
4) ―Effect of network latency on load sharing in distributed systems,‖ Journal of Parallel and Distributed Computing, Volume 66, Issue 6, Orlando, FL, USA (June 2006).
5) Jorg Widmer, Catherine Boutremans, Jean-Yves Le Boudec: End-to-end congestion control for TCP-friendly flows with variable packet size. Publisher: ACM, 2004.
6) Gregory W. Cermak, ―Multimedia Quality as a function of bandwidth, packet loss and latency,‖ International Journal of Speech Technology, Volume 8, Number 3, Publisher: Springer Netherlands, 2005.

7) Michael Lemm, ―How to improve WAN application performance,‖ 2007, http://ezinearticles.com.

8) Lai King Tee, ―Packet error rate and latency requirements for a mobile wireless access system in an IP network,‖ Vehicular Technology Conference, Pages 249-253, 2007.
9) J. Scott Haugdahl, ―Network Analysis and Troubleshooting,‖ Publisher: Addison-Wesley, 2003.


Analysis of the Routing Protocols in Real Time Transmission: A Comparative Study

Ikram Ud Din1, Saeed Mahfooz2

Muhammad Adnan3

Abstract-During routing, different routing protocols are used at the routers to route real time data (voice and video) to its destination. These protocols perform well under different circumstances. This paper evaluates the performance of RIP, OSPF, IGRP, and EIGRP for the parameters packet dropping, traffic received, end-to-end delay, and variation in delay (jitter). Simulations have been done in OPNET to evaluate these routing protocols against each parameter. The results, shown in graphs, indicate that IGRP performs best in packet dropping, traffic received, and end-to-end delay as compared to its companions (RIP, OSPF, and EIGRP), while in the case of jitter, RIP performs comparatively well.
Keywords-Routing, Protocol, Delay, Packet Loss, Jitter

I. INTRODUCTION

A protocol is a set of rules that reveals how computer systems communicate with each other across networks. A protocol also functions as the common medium by which different hosts, applications, or systems communicate. Data messages are exchanged when computers communicate with one another. Examples of messages are sending or receiving e-mail, establishing a connection to a remote machine, and transferring files and data. There are two classes of protocols at the network layer, i.e., routed and routing protocols. The transportation of data across a network is the responsibility of the routed protocols, while routing protocols permit routers to appropriately direct data from one place to another. In other words, protocols that transfer data packets from one host to another across routers are routed protocols, and to exchange routing information, routers use routing protocols. IP is considered a routed protocol, while routing protocols include: i) Routing Information Protocol (RIP), ii) Interior Gateway Routing Protocol (IGRP), iii) Open Shortest Path First (OSPF), and iv) Enhanced Interior Gateway Routing Protocol (EIGRP). To forward data packets, the Internet Protocol (IP) uses a routing table. RIP uses hop count to determine the path and distance to any link in the internetwork. In case of multiple paths to a destination, RIP selects the path that has the fewest hops. The only routing metric RIP uses is hop count; therefore, it does not necessarily opt for the fastest path to a destination [1]. IGRP was developed to address the problems associated with routing in large networks that are beyond the scope of RIP. IGRP can select the fastest path based on bandwidth, delay, reliability and load; by default, it uses only the bandwidth and delay metrics. To allow the network to scale, IGRP also has a much higher maximum hop-count limit than RIP. OSPF was developed by the Internet Engineering Task Force (IETF) in 1988. OSPF shares routing information between routers belonging to the same autonomous system. It was developed to address the needs of scalable, large internetworks that RIP could not. EIGRP is an advanced version of IGRP that provides superior operating efficiency, such as lower bandwidth overhead and faster convergence [1].
As we are examining video and voice packets during video conferencing and voice packet transmission in this paper, a short introduction of the protocols used for the transmission of these packets is also in order. In video conferencing, the Real Time Transport Protocol (RTP) is used for carrying video packets, and for session establishment between the two systems either H.323 or SIP is used. RTP provides end-to-end network transport functions designed for real time applications such as video and voice. Those functions comprise payload-type identification, time stamping, delivery monitoring and sequence numbering [2].
Voice over Internet Protocol (VoIP) is a means of compressing voice using a standardized codec, then encapsulating the results within IP for transport over data networks. For establishing and transporting VoIP traffic, H.323 is a standard protocol [3]. The H.323 standard has been developed by the ITU-T for vendors and equipment manufacturers who provide VoIP service. It was originally developed for multimedia conferencing on LANs, but was later extended to VoIP. The 1st and 2nd versions of H.323 were released in 1996 and 1998, respectively; currently, its version 4 is under consideration. The Session Initiation Protocol (SIP) is the Internet Engineering Task Force (IETF) standard for multimedia or voice session establishment over the Internet. It was proposed as a standard in February 1999. SIP is a detailed protocol that stipulates the commands and responses to set up and tear down calls. It also details features such as proxy, security, and transport (TCP or UDP) services. SIP describes end-to-end call signaling between devices and defines, as the name implies, how a session is established between two IP nodes, with or without media [2].

______
About-1,2,3: Department of Computer Science, University of Peshawar, Pakistan (e-mail: [email protected]; [email protected]; [email protected])


The goal of this study is to measure the performance of throughput, packet loss, jitter, and delay in real time transmission. The simulations have been done in OPNET, because OPNET was originally developed for network simulation and is fully usable as an ample simulation tool. OPNET provides a complete development environment for the specification, simulation and performance analysis of communication networks [4], [5], [6]. OPNET is able to simulate different network devices and various kinds of transmission lines, and to display such information as packet end-to-end delay, delay variation (jitter), and packet loss in the network. The main purpose is to analyze how the network behaves under speech activity. Voice quality can be characterized by two measurements: i) delay of the signal, and ii) distortion of the signal. Delay disturbs the interactivity, while distortion reduces the legibility [7]. Many factors, such as a heavy load in the network that creates higher traffic, may contribute to the congestion of a network interface [8]. Therefore, this research is important in order to measure and predict data transfers in real time applications. The remaining paper is structured as follows: Section 2 describes the work done in the evaluation of routing protocols. Section 3 illustrates the working environment for the implementation of these protocols. Section 4 explains the OPNET simulations of the mentioned protocols. Section 5 concludes our work, and references are given in Section 6.

II. RELATED WORK

Privacy and security have become necessary requirements for Voice over IP (VoIP) communications, which need security services such as integrity, confidentiality, non-replay, non-repudiation, and authentication. The Quality of Service (QoS) of voice is affected by jitter, delay, and packet loss [9].
Normally, a telecommunication network consists of routers which optimize the packets' transmission. Practically, a packet is transmitted through a number of paths from one router to another. The selection of a path is based on the routing tables' information, usually received according to a routing protocol. A routing protocol is one that provides techniques facilitating a router to build a routing table; it also shares routing information with other neighboring routers. When a router is switched off, the packets passing through that router are passed to another router. This operation is known as "routing protocol convergence". Packets are possibly lost during a routing protocol convergence [10].
Networks like the Internet are renowned today. Such networks consist of routers, switches and hubs, communication media, and firewalls. Servers and clients are usually interconnected by networks. During communication through the Internet, there may be many possible routing paths and many routers between a source and destination. When packets arrive at a router, the router decides on the next hop in a path to the destination. For making this decision, many algorithms are used, such as RIP, OSPF, IGRP, and EIGRP. RIP and OSPF try to route the packets to a destination via the path consisting of the fewest number of nodes (routers), while IGRP and EIGRP attempt to route the packets based on shortest-path, shortest-delay, and greatest-bandwidth factors.
The invention of Curtis et al. [11] makes routing decisions. In their invention, a best path is determined according to an IGRP, EIGRP, OSPF, BGP or other routing task that can provide multiple routing paths, and a first variety of routers in the best routing path is determined. Their invention also makes the decision for routing a received packet: if the first variety of routers had a noise level, the packet is forwarded to a next router in the best routing path; if not, then a second routing path is determined according to said IGRP, EIGRP, OSPF, BGP, or other routing function [11].
A network facilitates the delivery of packets from a source to a destination. This delivery is possible through routers. Packets have destination addresses that let routers determine how to route the data packets. A router has a routing table which stores network-topology information; with the help of this information, the router forwards packets to the destination. A routing protocol consists of methods to select the best path and exchange topology information. There are two main classes of routing protocols: distance-vector routing protocols, e.g. RIP and IGRP, and link-state routing protocols, e.g. OSPF. For enterprise networks, OSPF is often preferred [12], [13].
To exchange service availability and network reachability information, a router implements one or more routing protocols. In a specific implementation, the border router implements RIP, OSPF, IGRP, EIGRP, or BGP [14]. Routing protocols accept network state information and then, on the basis of such accepted information, update the network topology information. Routing protocols also distribute the network state information. Path generation and forwarding-information generation are also duties of the routing protocols [15], [16].

III. WORKING ENVIRONMENT

When a node wants to transmit real time applications (video or voice) over IP, the traffic must pass through a router. For the transmission of real time applications, the real time transport protocol (RTP) is used, and the session is established between two remote stations through the session initiation protocol (SIP) or H.323. Apart from these real time transmission protocols, some routing protocols are also used to route the real time applications to their destination. These are: RIP, OSPF, IGRP and EIGRP.
Consider the following scenario having two servers, i.e., VoIP and video, and two clients, VoIP and video. The servers and clients are at two different locations, i.e., the servers are located at site Lahore (in this case) and the clients at the other site (say Karachi).

Fig. 1: Structure of the network

A. IP Packet/Traffic Dropping
Packet loss (drop) occurs when a router or switch is unable to receive incoming data packets at a given time. Real time applications (video or voice) are drastically degraded by packet loss [17].

B. Video/Voice Traffic Receiving
Video/voice traffic is the total number of audio and video packets received during video conferencing or another type of real time communication (e.g., IP telephony).

C. End-to-End Delay
End-to-end delay depends on the end-to-end data/signal paths, the payload size of the packets, and the CODEC. Delay is the latency, one-way or round-trip, encountered when data packets are transmitted from one place to another. In order to maintain the expected voice quality for Voice over IP (VoIP), the round-trip delay must remain within roughly 120 milliseconds [17].

D. Variation in Delay (Jitter)
In computer networks, the term jitter means variation in the delay of received packets. Jitter is an essential quality of service (QoS) factor in the evaluation of network performance, and it is one of the significant issues in packet-based networks for real time applications [18]. The variation of inter-packet delay, or jitter, is one of the principal factors that disturb voice quality [19]. Jitter plays a vital role in the measurement of the Quality of Service (QoS) of real time applications. The effects of end-to-end delay, packet loss, and jitter can be heard as follows. The calling party says, ―Hello Sir, how are you?‖ With end-to-end delay, the called party hears, ―……Hello Sir, how are you?‖ With packet loss, the called party hears, ―He.lo….r, .w are you?‖ With jitter, the called party hears, ―Hello…Sir, how....are… you?‖ [2].
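As a rough sketch of the inter-packet delay-variation idea behind RFC 3393 [18], jitter can be computed from the one-way delays of consecutive packets. The sketch and the delay samples below are ours, for illustration only.

```python
def ipdv(delays_ms):
    # Inter-packet delay variation: difference between the one-way delays
    # of consecutive packets (RFC 3393 style).
    return [b - a for a, b in zip(delays_ms, delays_ms[1:])]

# Hypothetical one-way delays (ms) of five consecutive voice packets.
delays = [120.0, 122.5, 119.0, 130.0, 121.0]
variations = ipdv(delays)
print(variations)                          # -> [2.5, -3.5, 11.0, -9.0]
print(max(abs(v) for v in variations))     # worst-case jitter: 11.0 ms
```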

IV. SIMULATION RESULTS

In this section, a scenario was tested in which the delay, packet loss, and jitter were examined. Figure 2 shows the number of IP packets dropped per second. Figure 3 illustrates the traffic received during video conferencing, while the voice traffic received is shown in Figure 4. The end-to-end delay in voice packets is given in Figure 5, and the variation in delay (jitter) is clear from Figure 6.

A. Performance Evaluation
The number of packets dropped is given in Figure 2: the fewest packets are lost when IGRP is implemented at the routers, while a huge number of packets is dropped when OSPF works as the routing protocol. IGRP also works well in receiving video and voice packets, shown in Figures 3 and 4, respectively. The end-to-end delay and the variation in delay (jitter) in voice traffic are shown in Figures 5 and 6, respectively, in which IGRP is also the best protocol. In the given figures, the X-axis shows time; the Y-axis shows the number of packets in Figures 2, 3, and 4, and the value of delay and jitter in Figures 5 and 6.


Fig. 2: Number of packets dropped per second


Fig. 3: Video traffic received per second

Fig. 5: End-to-end delay in voice packets


Fig. 4: Voice traffic received per second

Fig. 6: Jitter in voice packets


V. CONCLUSION

The size of today's networks has been growing quickly, and they support complicated applications, e.g., video conferencing and voice messaging. Quality transmission is the demand of the time, and this requires routing protocols at the routers that produce good results. The work done in this paper analyzes the available routing protocols RIP, OSPF, IGRP and EIGRP for packet dropping, traffic received, end-to-end delay, and variation in delay (jitter). Our work is based on OPNET simulation for each of these parameters. The study presents a comprehensive result for each protocol against the parameters packet dropping, traffic received, end-to-end delay, and variation in delay (jitter), one by one. IGRP performs well in packet dropping, traffic received, and end-to-end delay as compared to its companions (RIP, OSPF, and EIGRP), while in the case of jitter, RIP performs a bit better than IGRP.

VI. REFERENCES

1) Cisco Systems, Inc., Cisco Networking Academy Program CCNA 1 and 2 Companion Guide, Third Edition, 2003.
2) Cisco Systems, Inc., Cisco Voice Over IP, Student Guide, ed. V. 4.2, 2004.
3) Shufang Wu, M.R., Riadul Mannan, and Ljiljana Trajkovic, OPNET Implementation of Megaco/H.248 Protocol, 2003.
4) Mohd Nazri Ismail and A.M. Zin, Emulation Network Analyzer Development for Campus Environment and Comparison between OPNET Application and Hardware Network Analyzer, European Journal of Scientific Research, 2008, 24(2), p. 270-291.
5) Mohd Nazri Ismail and A.M. Zin, Evaluation of Software Network Analyzer Prototyping Using Qualitative Approach, European Journal of Scientific Research, 2009, 26(3), p. 170-182.
6) Sood, A., Network Design by using OPNET™ IT GURU Academic Edition Software, Rivier Academic Journal, 2007, 3(1).
7) Sjögren, H.R.C., Voice over IP, simulated IP-network, in School of Mathematics and Systems Engineering, 2008, Växjö University.
8) Chang, W.K. and H. S, Evaluating the performance of a web site via queuing theory, in Software Quality-ECSQ, 2002, Springer-Verlag, Berlin, Heidelberg: Helsinki, Finland, p. 63-72.
9) Mohd Nazri Ismail and M.T. Ismail, Analyzing of Virtual Private Network over Open Source Application and Hardware Device Performance, European Journal of Scientific Research, 2009, 28(2), p. 215-226.
10) Nicolas Dubois and B. Fondeviole, Method for Control Management Based on a Routing Protocol, US Patent, 2008, No: US 2008/0025333 A1.
11) Richard Scott Curtis and J.D. Forrester, System, Method and Program for Network Routing, US Patent, 2008, No: US 2008/0317056 A1.
12) Thomas P. Chu, R.N. and Y.-T. Wang, Automatically Configuring Mesh Groups in Data Networks, US Patent, 2010, No: US 2010/0020726 A1.
13) Xiaode Xu, M.S. and D. Shah, Routing Protocol with Packet Network Attributes for Improved Route Selection, US Patent, 2009, No: US 2009/0059908 A1.
14) Rosenberg, J., Peer-to-Peer Network including Routing Protocol Enhancement, US Patent, 2009, No.: US 2009/0122724 A1.
15) Bruce Cole and A.J. Li, Routing Protocols for Accommodating Nodes with Redundant Routing Facilities, US Patent, 2009, No: US 2009/0219804 A1.
16) Russell I. White, S.E.M., James L. Ng, and Alvaro Enrique Retana, Determining an Optimal Route Advertisement in a Reactive Routing Environment, US Patent, 2009, No.: US 2009/0141651 A1.
17) Paul J. Fong, E.K., David Gray, et al., Configuring Cisco Voice Over IP, Second Edition.
18) C. Demichelis and P. Chimento, IP Packet Delay Variation Metric for IP Performance Metrics (IPPM), Request for Comments: 3393, 2002.
19) Pedrasa, J.R.I. and C.A.M. Festin, An Enhanced Framing Strategy for Jitter Management, in TENCON 2005 IEEE Region 10, 2005: Melbourne, Qld., p. 1-6.


An Empirical Study on Data Mining Applications

P. Sundari1, Dr. K. Thangadurai2

Abstract-The wide availability of huge amounts of data and the need to transform such data into knowledge have drawn the IT industry toward data mining. During the early years of the development of computer techniques for business, IT professionals were concerned with designing databases to store the data so that information could be easily and quickly accessed. The restrictions were storage space and the speed of retrieval of the data. Needless to say, the activity was restricted to a very few, highly qualified professionals. Then came an era when Database Management Systems simplified the task, and almost any business, whether small, medium or large scale, began using computers for day-to-day activities. Now what is the use of all this data? Up to the early 1990's the answer to this was ―NOT much‖. No one was really interested in utilizing the data accumulated during the course of daily activities. As a result, a new discipline in Computer Science, Data Mining, gradually evolved. Data mining is becoming a pervasive technology in activities as diverse as using historical data to predict the success of a marketing campaign, looking for patterns in financial transactions to discover illegal activities, or analyzing genome sequences. This paper deals with the application of data mining in various fields of our day-to-day life.
Keywords-Data Mining, Targeted Marketing, Market Basket Analysis, Customer Relations

______
About-1: Department of Computer Science, Government Arts College (Women), Krishnagiri-635 001, India (e-mail: [email protected])
About-2: Department of Computer Science, Government Arts College (Men), Krishnagiri-635 001, India

I. INTRODUCTION

Data Mining – An Overview

Data mining refers to extracting knowledge from large amounts of data. The data may be spatial data, multimedia data, time series data, text data or web data. Data mining is a young discipline with wide and diverse applications. In this paper we discuss a few application domains of data mining, such as Science and Engineering, Banking, Business, Telecommunication and Surveillance.
Data mining is the process of extracting interesting, nontrivial, implicit, previously unknown and potentially useful patterns or knowledge from huge amounts of data. It is the set of activities used to find new, hidden or unexpected patterns in data. Using information contained within a data warehouse, data mining can often provide answers to questions about an organization that a decision maker has previously not thought to ask:
 Which products should be promoted to a particular customer? – Targeted Marketing
 What is the probability that a certain customer will leave for a competitor? – Customer Relationship Management
 What is the appropriate medical diagnosis for this patient? – Biomedical
 What is the likelihood that a certain customer will default on or pay back a loan? – Banking
 Which products are bought most often together? – Market Basket Analysis
 How can fraudulent users be identified in the telecommunication industry? – Fraudulent pattern analysis
These types of questions can be answered quickly and easily if the information hidden among the huge amounts of data in the databases can be located and utilized. We discuss the applications of data mining in the following paragraphs.

II. APPLICATIONS OF DATA MINING

Although a large variety of data mining scenarios can be discussed, for the purpose of this paper the applications of data mining are divided into the following categories:
 Science and Engineering
 Business
 Banking
 Telecommunication
 Spatial data mining
 Surveillance

II. (A) Science and Engineering

Data mining has been widely used in areas of science and engineering such as bioinformatics, genetics, medicine, education and electrical power engineering.

i) Biomedical and DNA Data Analysis

The past decade has seen an explosive growth in biomedical research, ranging from the development of new pharmaceuticals and cancer therapies to the identification and study of the human genome by discovering large-scale sequencing patterns and gene functions. Recent research in DNA analysis has led to the discovery of genetic causes for many diseases and disabilities, as well as approaches for disease diagnosis, prevention and treatment. It is challenging to identify the particular gene sequence patterns that play roles in various diseases. DNA data analysis is done in the following ways [5]:

P a g e | 24 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology

 Semantic integration of heterogeneous, distributed genome databases
 Similarity search and comparison among DNA sequences
 Identification of co-occurring gene sequences
 Path analysis, including linking genes to different stages of disease development
 Visualization tools and genetic data analysis
The data mining technique that is used to perform this task is known as Multifactor Dimensionality Reduction [3].
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents [7]. Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions to medical diagnoses [8].

ii) Education

Another area of application for data mining in science/engineering is educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors which reduce their learning, and to understand the factors influencing university student retention [6]. A similar example of the social application of data mining is its use in expertise-finding systems, whereby descriptors of human expertise are extracted, normalized and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.

iii) Electrical Power Engineering

In the area of electrical power engineering, data mining techniques have been widely used for condition monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering such as the Self-Organizing Map (SOM) has been applied to the vibration monitoring and analysis of transformer On-Load Tap-Changers (OLTCs). Using vibration monitoring, it can be observed that each tap-change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal-condition signals for the exact same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities [4].
Data mining techniques have also been applied to Dissolved Gas Analysis (DGA) on power transformers. DGA, as a diagnostic for power transformers, has been available for many years. Data mining techniques such as SOM have been applied to analyze the data and to determine trends which are not obvious to standard DGA ratio techniques such as the Duval Triangle [4].
Data mining has also been applied to an integrated-circuit production line [2]. There, the data mining technique is applied in decision analysis to the problem of die-level functional test. Experiments demonstrate the ability of a system that mines historical die-test data to create a probabilistic model of patterns of die failure, which is then utilized to decide in real time which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.

b) Banking

Banking data mining applications may, for example, need to track client spending habits in order to detect unusual transactions that might be fraudulent. Most banks and large financial institutions offer a wide variety of banking services (such as checking, saving, and business and individual customer transactions), credit (such as business, mortgage, and automobile loans), and investment services (such as mutual funds) [5]. Many also offer insurance services and stock services. Data mining can, for example, help in fraud detection by detecting a group of people who stage accidents to collect insurance money. The following methods are used for financial data analysis; a toy sketch of the clustering item follows this list.
 Loan payment prediction and customer credit policy analysis
 Classification and clustering of customers for targeted marketing
 Detection of money laundering and other financial crimes
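The sketch below is our toy illustration of the "clustering of customers for targeted marketing" item, not a technique from the paper; the spending data are invented. A one-dimensional k-means pass splits customers into spending segments:

```python
# Minimal 1-D k-means (k=2) over monthly card spending, for illustration.
spend = [120.0, 150.0, 130.0, 900.0, 1100.0, 950.0, 140.0]

centers = [min(spend), max(spend)]   # naive initialisation at the extremes
for _ in range(10):                  # a few refinement passes
    groups = [[], []]
    for x in spend:
        i = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
        groups[i].append(x)
    # Both groups stay non-empty here because of the extreme initialisation.
    centers = [sum(g) / len(g) for g in groups]

print(centers)   # roughly [135.0, 983.3]: low- and high-spend segments
```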
Data mining techniques have also been applied to integrated-circuit production lines [2]. Here, data mining is applied in decision analysis to the problem of die-level functional test. Experiments demonstrate the ability of a system that mines historical die-test data to create a probabilistic model of patterns of die failure, which is then used to decide in real time which die to test next and when to stop testing. Based on experiments with historical test data, this system has been shown to have the potential to improve profits on mature IC products.
b) Banking
Banking data mining applications may, for example, need to track client spending habits in order to detect unusual transactions that might be fraudulent. Most banks and financial institutions offer a wide variety of banking services (such as checking, saving, and business and individual customer transactions), credit services (such as business, mortgage, and automobile loans), and investment services (such as mutual funds) [5]. They also offer insurance and stock services. Data mining can, for example, help in fraud detection by identifying groups of people who stage accidents to collect on insurance money. The following methods are used for financial data analysis:
• Loan payment prediction and customer credit policy analysis
• Classification and clustering of customers for targeted marketing
• Detection of money laundering and other financial crimes
c) Business
The retail industry collects huge amounts of data on sales, customer shopping history, goods transportation, consumption, service records and so on. The quantity of data collected continues to expand rapidly, especially due to the increasing ease, availability and popularity of business conducted on the web, or e-commerce. The retail industry therefore provides a rich source for data mining. Retail data mining can help identify customer behavior, discover customer shopping patterns and trends, improve the quality of customer service, achieve better customer retention and satisfaction, enhance goods consumption ratios, design more effective goods transportation and distribution policies, and reduce the cost of business [5]. A few examples of data mining in the retail industry are as follows:
• Design and construction of data warehouses based on the benefits of data mining
• Multidimensional analysis of sales, customers, products, time and region: the multi-feature data cube is a useful data structure in retail data analysis
Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data-mining system could identify those customers who favor silk shirts over cotton ones.

Although some explanations of such relationships may be difficult to find, taking advantage of them is easier. This example deals with association rules within transaction-based data. Not all data are transaction-based, and logical or inexact rules may also be present within a database. In a manufacturing application, an inexact rule may state that 73% of products which have a specific defect or problem will develop a secondary problem within the next six months.
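The association-rule idea behind market basket analysis can be illustrated with a few lines of Python. The transactions, item names and support/confidence thresholds below are invented for illustration; a production system would use an Apriori- or FP-growth-style miner over millions of baskets.

from itertools import combinations

# Hypothetical purchase transactions.
transactions = [
    {"silk shirt", "tie"}, {"silk shirt", "tie", "belt"},
    {"cotton shirt", "belt"}, {"silk shirt", "tie"}, {"cotton shirt"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Emit rules {a} -> {b} whose support and confidence clear the thresholds.
items = set().union(*transactions)
for a, b in combinations(sorted(items), 2):
    sup = support({a, b})
    if sup >= 0.4:
        conf = sup / support({a})
        if conf >= 0.7:
            print(f"{{{a}}} -> {{{b}}}  support={sup:.2f}  confidence={conf:.2f}")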
Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer. Alpha consumers are people that play key roles in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands. Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich history of customer transactions on millions of customers dating back several years. Data mining tools can identify patterns among customers and help identify the most likely customers to respond to upcoming mailing campaigns. Further retail examples include:
• Analysis of the effectiveness of sales campaigns
• Customer retention: analysis of customer loyalty
There is a wide variety of data mining applications available, particularly for business uses such as Customer Relationship Management (CRM). Goods purchased at different periods by the same customers can be grouped into sequences. Sequential pattern mining can then be used to investigate changes in customer consumption and suggest adjustments to the pricing and variety of goods in order to help retain existing customers and attract new ones. These applications enable marketing managers to understand the behaviors of their customers and also to predict the potential behavior of prospective customers. A data mining technique may assist the prediction of future customer retention: for example, a company may decide to increase prices, and could use data mining to predict how many customers might be lost for a particular percentage increase in product price.
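As a rough illustration of sequential pattern mining over customer purchase histories, the sketch below counts ordered item pairs across hypothetical per-customer sequences. Real sequence miners (e.g., GSP or PrefixSpan) additionally handle gaps, time windows and longer patterns; the sequences and support threshold here are invented.

from collections import Counter
from itertools import combinations

# Hypothetical per-customer purchase sequences, ordered by visit date.
sequences = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["phone", "case", "headset"],
    ["case", "charger"],
]

# Count ordered pairs (a before b, not necessarily adjacent), once per customer.
pair_counts = Counter()
for seq in sequences:
    seen = set()
    for a, b in combinations(seq, 2):   # combinations preserves order of appearance
        if (a, b) not in seen:
            seen.add((a, b))
            pair_counts[(a, b)] += 1

min_support = 2
for (a, b), n in pair_counts.most_common():
    if n >= min_support:
        print(f"{a} -> {b} appears in {n} customer sequences")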
Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees. Information obtained, such as the universities attended by highly successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels [1].
d) Telecommunication
The telecommunication industry offers local and long-distance telephone services and provides many other comprehensive communication services including voice, fax, pager, cellular phone, images, e-mail, computer and web data transmission and other data traffic. The integration of telecommunication, computer networks, the Internet and numerous other means of communication and computing is underway. Moreover, with the deregulation of the telecommunication industry in many countries and the development of new computer and communication technologies, the telecommunication market is rapidly expanding and highly competitive. This creates a great demand for data mining in order to help understand the business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service.
e) Spatial data mining
Spatial data mining is the application of data mining techniques to spatial data. It follows along the same functions as data mining, with the end objective of finding patterns in geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions and approaches to visualization and data analysis. In particular, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasizes the importance of developing data-driven, inductive approaches to geographical analysis and modeling. Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public- and private-sector organizations possessing huge databases with thematic and geographically referenced data begin to realize the huge potential of the information hidden there. Among those organizations are:
• Offices requiring analysis or dissemination of geo-referenced statistical data
• Public health services searching for explanations of disease clusters
• Environmental agencies assessing the impact of changing land-use patterns on climate change
• Geo-marketing companies doing customer segmentation based on spatial location
f) Surveillance
Data mining is used by intelligence agencies like the FBI and CIA to identify threats of terrorism. After the 9/11 incident it has become one of the prime means to uncover terrorist plots. However, this has led to public concern, as the data collected for such purposes undermines the privacy of a large number of people. Two plausible data mining techniques in the context of combating terrorism are "pattern mining" and "subject-based data mining".
i) Pattern mining
"Pattern mining" is a data mining technique that involves finding existing patterns in data. Regarding pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise." [9][10][11] Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported into classical knowledge discovery search techniques.
ii) Subject-based data mining
"Subject-based data mining" is a data mining technique involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum." [9]
g) Text Mining and Web Mining
Text mining is the process of searching large volumes of documents for certain keywords or key phrases. By searching literally thousands of documents, various relationships between the documents can be established. An extension of text mining is web mining. Web mining is an exciting new field that integrates data and text mining within a website. The web serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, government, e-commerce and many other information services. Web mining enhances a web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer. It is especially exciting because it enables tasks that were previously difficult to implement: web mining tools can be configured to monitor and gather data from a wide variety of locations and can analyze the data across one or multiple sites. Search engines, for example, work on the principle of data mining.
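A toy text-mining example: the keyword matching below is the simplest possible version of the document search described above, with an invented three-document corpus standing in for a large collection.

import re
from collections import Counter

# Hypothetical document collection; a real system would crawl files or web pages.
docs = {
    "doc1": "Data mining finds patterns in large databases.",
    "doc2": "Web mining applies data mining to web content and logs.",
    "doc3": "Search engines rank pages by analyzing link and text data.",
}

query = {"data", "mining"}
for name, text in docs.items():
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    hits = sum(words[w] for w in query)
    if hits:
        print(f"{name}: {hits} keyword occurrence(s)")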
III. NEED OF DATA MINING
The massive growth of data is due to the wide availability of data in automated form from various sources such as the WWW, business, science, society and many more. Data is useless if it cannot deliver knowledge. That is why data mining is gaining wide acceptance in today's world. A lot has been done in this field, and a lot more needs to be done.
IV. CONCLUSION
Since data mining is a young discipline with wide and diverse applications, there is still a nontrivial gap between the general principles of data mining and domain-specific, effective data mining tools for particular applications. The aim of this paper is the study of application domains of data mining such as science and engineering, banking, business and telecommunication. Data mining remains a young field with many issues that still need to be researched in depth. The diversity of data, data mining tasks and approaches poses many challenging research issues. The design of data mining languages, the development of efficient and effective data mining methods, the construction of interactive and integrated data mining environments, and the application of data mining techniques to solve large application problems are important tasks for data mining researchers.
V. REFERENCES
1) Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning, Second Edition. Thomson Course Technology, Boston, MA. ISBN 0-619-21663-8. OCLC 224465825.
2) Tony Fountain, Thomas Dietterich & Bill Sudyka (2000). Mining IC Test Data to Optimize VLSI Testing. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 18-25. ACM Press.
3) Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining: Challenges and Realities. Hershey, New York. p. 18. ISBN 978-159904252-7.
4) A. J. McGrail, E. Gulski et al. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRE WG 15.11 of Study Committee 15.
5) Jiawei Han & Micheline Kamber (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, CA, USA.
6) J. F. Superby, J-P. Vandamme, N. Meskens. "Determination of factors influencing the achievement of the first-year university students using data mining methods". Workshop on Educational Data Mining, 2006.
7) Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol. 1998 Jun;54(4):315-21.
8) Norén GN, Bate A, Hopstadius J, Star K, Edwards IR. Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), pages 963-971. Las Vegas NV, 2008.


9) National Research Council. Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment. Washington, DC: National Academies Press, 2008.

10) R. Agrawal et al. Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining, pp. 307-328. MIT Press, 1996.

11) Stephen Haag et al. (2006). Management Information Systems for the Information Age. Toronto: McGraw-Hill Ryerson. p. 28. ISBN 0-07-095569-7. OCLC 63194770.



A Novel Decision Scheme for Vertical Handoff in 4G Wireless Networks

E. Arun1 R.S Moni2

Abstract− Future wireless networks will consist of multiple heterogeneous access technologies such as UMTS, WLAN, and WiMAX. These technologies differ greatly regarding network capacity, data rates, and various other parameters such as power consumption, Received Signal Strength, and coverage areas. This paper presents two handoff decision schemes for heterogeneous networks. A good handoff decision can avoid redundant handoffs and reduce packet loss. The first scheme makes use of a score function to find the best network at the best time from a set of neighboring networks; the score function uses bandwidth, Received Signal Strength (RSS) and access fee as its parameters. The second scheme makes use of the classic triangle problem to find the best network from a set of neighboring networks: it treats the three parameters bandwidth, Received Signal Strength (RSS) and access fee as the three sides of a triangle. If an equilateral triangle is obtained with these parameters of a network, then that network is the best among the set of networks. The best decision model not only meets the individual user's needs but also improves the whole system performance by reducing unnecessary handoffs.
Keywords- MIHF, Received Signal Strength, Mobility Management, vertical handoff
______
About-1 Assistant Professor, Dept of Computer Science & Engineering, Noorul Islam University, Thuckalay, Tamil Nadu, India (e-mail: [email protected])
About-2 Senior Professor, Dept of Electronics & Communication Engineering, Noorul Islam University, Thuckalay, Tamil Nadu, India (e-mail: [email protected])
I. INTRODUCTION
Currently, there are various wireless networks deployed around the world. Examples include second and third generation (3G) cellular networks (e.g., GSM/GPRS, UMTS, CDMA2000), wireless local area networks (WLANs) (e.g., IEEE 802.11a/b/g), and personal area networks (e.g., Bluetooth). All these wireless networks are heterogeneous in the sense of their different radio access technologies. From this fact, it follows that no single access technology or service provider can offer the ubiquitous coverage expected by users requiring connectivity anytime and anywhere. The actual trend is to integrate complementary wireless technologies with overlapping coverage, to provide the expected ubiquitous coverage and to achieve the "Always Best Connected" (ABC) concept, which allows the user to use the best available access network. In order to accomplish the integration and inter-working between heterogeneous wireless networks and the ABC concept, many challenging research problems have to be solved, taking into account that all these new wireless technologies were designed without considering any interworking among them. In heterogeneous wireless networks, mobile devices or mobile stations will be equipped with multiple network interfaces to access different wireless networks. Users will expect to continue their connections without any disruption when they move from one network to another. This important process in wireless networks is referred to as handoff or handover. A handoff among networks using different access technologies is defined as a vertical handoff (VHO) [1]: the process of changing the connection among different types of wireless and mobile networks. Obviously, network selection and the vertical handoff decision are two important processes in an integrated wireless and mobile network. The handoff process is initiated by changes in different factors like Received Signal Strength (RSS), Signal-to-Noise Ratio (SNR), etc. When these factors fall below their threshold values, the Mobile Node (MN) has to search for another AP having an RSS greater than the threshold value [2, 3]. Wang et al. introduced policy-enabled handoff in [4], which was followed by several papers on similar approaches. Policy-enabled handoff systems separate the decision making (i.e., which is the "best" network and when to handoff) from the handoff mechanism. The Smart Decision Model [5] performs vertical handoff among the available network interfaces: using a well-defined score function, the model can properly handoff to the "best" network interface at the "best" moment according to the properties of the available network interfaces, system configuration information, and user preferences. A handoff decision scheme with guaranteed QoS [6] for heterogeneous networks makes the decision according to the user's communication type and the performance of the networks. A generic vertical handoff decision function [7] has been proposed that considers the different factors and metric qualities that indicate whether or not a handoff is needed; the decision function enables devices to assign weights to different network factors such as monetary cost, quality of service, power requirements, personal preferences, etc. A decision strategy [8] considers the performance of the whole system while making VHO decisions that meet individual needs; it selects the best network based on the highest received signal strength (RSS) and the lowest variation of received signal strength (VRSS), ensuring high system performance by reducing unnecessary handoffs. Nasser et al. [9] proposed a VHO decision (VHD) method that simply estimates the service quality of the available networks and selects the network with the best quality. However, many challenges still lie ahead in integrating cellular networks and WLANs.

This paper is organized as follows. In Section II, we introduce our proposed system model for an integrated wireless and mobile network. In Section III, different handoff decision strategies are presented. In Section IV, we analyze the performance of the proposed strategy. Finally, we conclude this paper in Section V.
II. SYSTEM MODEL
Fig 1: Vertical handoff in heterogeneous networks
As shown in the figure above, an MN can at a given time be in the coverage area of a UMTS network alone. However, due to mobility, it can move into regions covered by more than one access network, i.e., simultaneously within the coverage areas of, for example, a UMTS BS and an IEEE 802.11 AP. Multiple IEEE 802.11 WLAN coverage areas are usually contained within a UMTS coverage area. A Worldwide Interoperability for Microwave Access (WiMAX) coverage area can overlap with WLAN and/or UMTS coverage areas. In dense urban areas, even the coverage areas of multiple UMTS BSs can overlap. Thus, at any given time, the choice of an appropriate attachment point (BS or AP) for each MN needs to be made. These access technologies have different bandwidth, power consumption, RSS threshold, data rate, jitter, delay, etc., so during handoff it is required to find the best network according to user preferences. At hotspots, APs are made available. When the Received Signal Strength of an AP drops below some threshold value, the Mobile Host has to find another best network considering bandwidth, RSS and access fee as parameters; each of these parameters is given a weight according to preferences. If no suitable AP is available, handoff has to be performed to a Base Station of the UMTS network. Thus, multiple access technologies and multiple operators are typically involved in the Network Selection Decision. The network selection decision-making algorithm is implemented in Network Selection Decision Controllers (NSDCs) located in the access networks. Decision input for the NSDCs is obtainable via the MIHF. The MIHF of the NSDC facilitates standards-based message exchanges between the various access networks or attachment points to share information about the traffic load, available bandwidth, RSS and other network capabilities of each AP. The NSDC obtains Link Layer Triggers (LLTs) from the MN via the MIHF. An LLT regarding an MN indicates two possibilities: a) the RSS for an MN dropped below some specific threshold while the MN was in service at an AP; or b) the RSS for one or more APs exceeded a specific threshold while the MN was in service at a BS. Usually an AP is the preferred attachment point over a BS, since an AP is associated with higher bandwidth and a higher data rate. When the NSDC obtains an LLT it executes the network selection decision algorithm and finds the best AP; if no suitable AP is found for handoff, the cellular network is selected as the best available network.
III. NETWORK SELECTION DECISION MAKING ALGORITHMS
Most existing network selection strategies focus only on the individual user's needs. The motivation of this paper is to design a network-selection strategy from a system's perspective that can also meet a certain individual user's needs. In the following, we discuss how our proposed network-selection strategy works.
A. Algorithm 1
1) Handoff Initiation: The MN can be in service with an AP or a BS. When the RSS drops below some threshold value, or when the RSS of any AP rises above some threshold value while the MN is in service with a BS, the MN has to find the best network to which to perform handoff. When the RSS goes low, the MN sends a link layer trigger to the Network Selection Decision Controller in the network to which the MN is currently connected. Thus the handoff process is initiated.
2) Handoff Decision: When the handoff process is initiated, the Network Selection Decision Controller collects the condition of each neighboring network via the Media Independent Handover Function (MIHF) and executes the NSDC algorithm. The algorithm first calculates the score of the current network and compares it with each neighboring network's score. The score of a neighboring network is calculated only if all of its parameters have satisfactory values to accept a Mobile Host. Our proposed network-selection strategy prefers a call to be accepted by a network with a lower traffic load and a stronger received signal strength, which can achieve better traffic balance among different types of networks and good service quality. Consequently, we define a score function that combines these factors. The general form of the score function is

Score = Σ_{j=1..k} Wj · Normj    (1)

where k is the number of parameters, Wj is the weight assigned to parameter j, and Normj is the normalized value of parameter j. If a network with a higher score is available, handoff is performed to that network; if no network with a better score is available, handoff is performed to the BS.

Therefore, the score to use network Ni for a call is defined as

Scorei = wg·Gi + ws·Si + wf·Fi    (2)

where Gi is the complementary of the normalized utilization of network Ni, Si is the relative received signal strength from network Ni, Fi is the normalized access fee of network Ni, and wg (0 ≤ wg ≤ 1), ws (0 ≤ ws ≤ 1) and wf (0 ≤ wf ≤ 1) are weights that express the preferences given to Gi, Si and Fi respectively. The larger the weight of a specific factor, the more important that factor is to the user, and vice versa. The constraint between wg, ws and wf is

wg + ws + wf = 1    (3)

Even though we could simply add the different factors in the vertical handoff decision function (VHDF) to obtain a network score, each network parameter has a different unit, which leads to the necessity of normalization. The complementary of the normalized utilization Gi is defined by

Gi = Bi^f / Bi    (4)

where Bi^f is the number of available bandwidth units in network Ni and Bi is the total number of bandwidth units in network Ni. In general, a stronger received signal indicates better signal quality, so an originating call prefers to be accepted by a network that has a higher received signal strength. However, it is difficult to compare received signal strengths among different types of wireless and mobile networks because they have different maximum transmission powers and receiver thresholds. As a result, we propose to use the relative received signal strength to compare different types of wireless and mobile networks. Si in (2) is defined by

Si = (Pi^c − Pi^th) / (Pi^max − Pi^th)    (5)

where Pi^c is the current received signal strength from network Ni, Pi^th is the receiver threshold in network Ni, and Pi^max is the maximum transmitted signal strength in network Ni. Note that we consider only the path loss in the radio propagation model. Consequently, the received signal strength (in decibels) in network Ni is given by

Pi^c = Pi^max − 10·γ·log(ri)    (6)

where ri is the distance between the mobile user and the BS (or AP) of network Ni, and γ is the fading factor. Therefore, the receiver threshold in network Ni is given by

Pi^th = Pi^max − 10·γ·log(Ri)    (7)

where Ri is the radius of the cell of network Ni. The relative received signal strength from network Ni can then be rewritten as

Si = 1 − log(ri) / log(Ri)    (8)

The normalized access fee Fi is given by

Fi = 1 − φi / φmax    (9)

where φmax is the highest access fee that the mobile user is willing to pay, and φi is the access fee to use network Ni. The mobile user does not connect to a network that charges more than φmax. If an originating call has more than one connection option, the scores of all candidate networks are calculated using the score function in (2). The originating call is accepted by the network that has the largest score, which indicates the "best" network. If there is more than one "best" network, the originating call is randomly accepted by any one of these "best" networks.
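A minimal Python sketch of this scoring step, combining (2)-(4), (8) and (9), is given below. The candidate networks, weights and φmax are invented for illustration; the paper does not prescribe concrete values.

import math

def score(net, wg, ws, wf, phi_max):
    G = net["free_bw"] / net["total_bw"]              # (4): complementary utilization
    S = 1 - math.log(net["r"]) / math.log(net["R"])   # (8): relative received signal strength
    F = 1 - net["fee"] / phi_max                      # (9): normalized access fee
    return wg * G + ws * S + wf * F                   # (2)

networks = {
    "WLAN-1": {"free_bw": 1.2, "total_bw": 2.0, "r": 40.0, "R": 100.0, "fee": 1.0},
    "WLAN-2": {"free_bw": 0.3, "total_bw": 1.0, "r": 60.0, "R": 100.0, "fee": 1.0},
    "UMTS": {"free_bw": 5.0, "total_bw": 10.0, "r": 500.0, "R": 2000.0, "fee": 4.0},
}
wg, ws, wf = 0.4, 0.4, 0.2   # chosen so that wg + ws + wf = 1, as required by (3)
best = max(networks, key=lambda n: score(networks[n], wg, ws, wf, phi_max=5.0))
print("handoff target:", best)   # -> WLAN-1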

Flow chart
Fig 2: Handoff decision Algorithm 1
This algorithm checks only whether bandwidth is available, not whether it exceeds a threshold. As the available bandwidth decreases, i.e., as the load increases, there is a greater chance for the RSS to go low; thus the call dropping probability increases and the holding time decreases. Moreover, in this algorithm, if any one parameter has a high value the score increases even if the other parameters have low values.
B. Algorithm 2
1) Handoff Initiation: The MN can be in service with an AP or a BS. When the RSS drops below some threshold value, or when the RSS of any AP rises above some threshold value while the MN is in service with a BS, the MN has to find the best network to which to perform handoff. When the RSS goes low, the MN sends a link layer trigger to the Network Selection Decision Controller in the network to which the MN is currently connected. Thus the handoff process is initiated.
2) Handoff Execution: Handoff execution is based on the classic triangle problem, in which we consider triangles representing the conditions of the networks. Each side of the triangle corresponds to one parameter; the parameters considered in this paper are Received Signal Strength, bandwidth and access cost. If all the parameters have the desired value (the value the MN expects), the resulting triangle will be equilateral (S1 = S2 = S3 = a, three sides equal); if two of the parameters have the desired value, the triangle will be isosceles (S1 ≠ S2 = S3 or S1 = S2 ≠ S3, two sides equal); if S1 ≠ S2 ≠ S3, the triangle is scalene. The networks that give an equilateral triangle are placed in candidate list 1, and those that give an isosceles triangle in candidate list 2. One network from list 1 is selected as the best network; if list 1 is empty, the best network is selected from list 2. Handoff is then performed to the selected best network. If both lists are empty, the MN hands off to the BS.
Flow chart


Fig 3: Handoff decision Algorithm 2
The RSS can be measured as

Pi^c = Pi^max − 10·γ·log(ri)    (10)

where Pi^c is the current received signal strength from network Ni, ri is the distance between the mobile user and the BS (or AP) of the network, Pi^max is the maximum transmitted signal strength in network Ni, and γ is the fading factor.

Bandwidth is given by: available bandwidth of the network = bandwidth of the network − the sum of the bandwidth used by all MNs attached to the network.

Access fee is the fee assigned to each network's usage. It may vary from network to network; the user usually prefers a low network fee.
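Putting the three parameters together, the following sketch shows one way Algorithm 2's triangle test could be realized in Python. The mapping of parameters to side lengths, the tolerance, and the example networks are assumptions of this sketch; the paper specifies only the equilateral/isosceles/scalene classification and the two candidate lists.

def side(value, desired, tol=0.05):
    """Triangle side in (0, 1]; length 1.0 means the parameter meets the MN's expectation."""
    ratio = value / desired
    return 1.0 if ratio >= 1.0 - tol else ratio

def classify(rss, bw, fee, want_rss, want_bw, want_fee):
    sides = sorted([
        side(rss, want_rss),
        side(bw, want_bw),
        side(want_fee, fee),   # lower fee is better, so the ratio is inverted
    ])
    eq = lambda x, y: abs(x - y) < 1e-9
    if eq(sides[0], sides[1]) and eq(sides[1], sides[2]):
        return "equilateral"   # S1 = S2 = S3: all parameters as desired
    if eq(sides[0], sides[1]) or eq(sides[1], sides[2]):
        return "isosceles"     # two parameters as desired
    return "scalene"

list1, list2 = [], []
networks = {"AP-1": (70, 2.0, 1.0), "AP-2": (40, 1.0, 1.0), "AP-3": (70, 2.0, 3.0)}
for name, (rss, bw, fee) in networks.items():
    shape = classify(rss, bw, fee, want_rss=70, want_bw=2.0, want_fee=1.0)
    if shape == "equilateral":
        list1.append(name)      # candidate list 1
    elif shape == "isosceles":
        list2.append(name)      # candidate list 2
best = (list1 or list2 or ["UMTS BS"])[0]
print("handoff target:", best)  # -> AP-1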

IV. PERFORMANCE ANALYSIS
Simulations have been performed for the 3G cell overlay structure. In this scenario three networks with different data rates co-exist in the same wireless service area. Network 1 and Network 2 represent 802.11b wireless LANs, with bandwidths of 2 Mbps and 1 Mbps, respectively. Network 3 is modeled as a UMTS network, which supports multiple users simultaneously. The expected results are tabulated below.

Bandwidth (Mbps):              10   20   30   40   50   60   70   80   90   100
Holding time, algorithm 1 (s): 2.5  4.5  5.7  6.1  6.3  6.5  6.9  7    7    7
Holding time, algorithm 2 (s): 3.5  5.5  6.5  7    7.5  8    8.5  9    9.5  9.5

Fig 4: Holding time Vs Bandwidth

RSS (dBm):                           5     10    20    30    40    50    60
Call dropping prob., algorithm 1:    0.8   0.75  0.7   0.65  0.55  0.4   0.3
Call dropping prob., algorithm 2:    0.5   0.4   0.3   0.25  0.2   0.1   0.09

Fig 6: Call dropping probability Vs RSS

V. CONCLUSION
This paper described two different handoff decision algorithms. The first algorithm uses a score function to find the best network at the best time from a set of neighboring networks. The second algorithm uses the classic triangle problem to find the best network from a set of neighboring networks: if an equilateral triangle is obtained with the three parameters of a network, that network is the best among the set. Since the second algorithm performs handoff only if the constraints are above the threshold values, the call dropping probability is reduced and the holding time is increased.

VI. REFERENCES
1) Enrique Stevens-Navarro, Ulises Pineda-Rico, and Jesus Acosta-Elias, "Vertical Handover in beyond Third Generation (B3G) Wireless Networks", International Journal of Future Generation Communication and Networking, pp. 51-58, 2008.
2) K. Ayyappan and P. Dananjayan, "RSS measurement for vertical handoff in heterogeneous network", Journal of Theoretical and Applied Information Technology, pp. 989-994, 2005.
3) Kemeng Yang, Iqbal Gondal, Bin Qiu and Laurence S. Dooley, "Combined SINR Based Vertical Handoff Algorithm for Next Generation Heterogeneous Wireless Networks", Global Telecommunications Conference, 2007 (GLOBECOM '07), pp. 4483-4487, Nov 2007, Digital Object Identifier 10.1109/GLOCOM.2007.852.

4) Wang, R. Katz, and J. Giese, "Policy-enabled handoffs across heterogeneous wireless networks", WMCSA '99, Feb 1999, pp. 51-60, Digital Object Identifier 10.1109/MCSA.1999.749277.

5) L-J. Chen, T. Sun, B. Chen, V. Rajendran, and M. Gerla, "A Smart Decision Model for Vertical Handoff", The 4th ANWIRE International Workshop on Wireless Internet and Reconfigurability (ANWIRE 2004), May 2004.
6) Ying-Hong Wang, Chih-Peng Hsu, Kuo-Feng Huang, Wei-Chia Huang, "Handoff decision scheme with guaranteed QoS in heterogeneous network", pp. 138-143, 2008, Digital Object Identifier 10.1109/UMEDIA.2008.4570879.
7) Ahmed Hasswa, Nidal Nasser, Hossam Hassanein, "Generic Vertical Handoff Decision Function for Heterogeneous Wireless Networks", Wireless and Optical Communications Networks, 2005 (WOCN 2005), pp. 239-243, Mar 2005, Digital Object Identifier 10.1109/WOCN.2005.1436026.
8) Shen, W.; Zeng, Q.-A., "A Novel Decision Strategy of Vertical Handoff in Overlay Wireless Networks", Fifth IEEE International Symposium on Network Computing and Applications, 2006, pp. 227-230, Digital Object Identifier 10.1109/NCA.2006.5.
9) N. Nasser, A. Hasswa, and H. Hassanein, "Handoffs in fourth generation heterogeneous networks", IEEE Commun. Mag., vol. 44, no. 10, pp. 96-103, Oct. 2006, Digital Object Identifier 10.1109/MCOM.2006.1710420.
10) Olga Ormond, Philip Perry and John Murphy, "Network Selection Decision in Wireless Heterogeneous Networks", 2005 IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications, Volume 4, pp. 2680-2684, Sept 2005, Digital Object Identifier 10.1109/PIMRC.2005.1651930.
11) Wei Shen and Qing-An Zeng, "Cost-Function-Based Network Selection Strategy in Integrated Wireless and Mobile Networks", IEEE Trans. Veh. Technol., vol. 57, no. 6, pp. 3778-3788, Nov. 2008, Digital Object Identifier 10.1109/TVT.2008.917257.


Hybrid Approach for Template Protection in Face Recognition System

Sheetal Chaudhary1 Rajender Nath2

Abstract- Biometrics deals with identifying individuals with the help of their biological (physiological and behavioral) data. The security of biometric systems has, however, been questioned, and previous studies have shown that they can be fooled with artificial artifacts. Biometric recognition systems also face challenges arising from intra-class variations and attacks upon template databases. To tackle such problems, a hybrid approach for liveness detection and template protection in a face recognition system is proposed. Here, the system captures the input face image in three different poses (left, front, right) based upon the order chosen by a random select module. This approach performs live face detection based upon the complete body movement of the person to be recognized, and template protection by randomly shuffling and adding the components of the feature set resulting from the fusion of the three poses of the input face image. It overcomes the limitations imposed by intra-class variations and spoof attacks in face recognition systems. The resulting hybrid template is more secure, as the original biometric template is not stored in the database; rather, it is stored after applying some changes (shuffling and addition) to its components. Thus the proposed approach has higher security and better recognition performance compared to the case when no measures are used for live face checking and template protection in the database.
Keywords- Liveness detection, template protection, face recognition, multiple sample fusion, eigen-coefficients
______
About-1 University Research Scholar, Department of Comp. Sc. & App., K.U., Kurukshetra, Haryana, India (e-mail: [email protected])
About-2 Associate Professor, Department of Comp. Sc. & App., K.U., Kurukshetra, Haryana, India (e-mail: [email protected])
I. INTRODUCTION
The term biometrics is derived from the Greek words bios and metron, which translate as "life measurement". Biometrics are not secrets and therefore should be properly protected. A good biometric system should depend not only on the security of the biometric data; the authentication process must also check for the liveness of the biometric data. People leave fingerprints behind on everything they touch, and the iris can be observed anywhere they look. Our facial images are recorded every time we enter a bank, railway station, or supermarket [1]. Once biometric measurements are disclosed, they cannot be changed (unless the user is willing to have an organ transplant). The only way to make a system secure is to make sure that the data presented came from a real person and was obtained at the time of authentication. Liveness detection in a biometric system means the capability of the system to detect, during enrollment and identification/verification, whether or not the biometric sample presented to the system is alive. It must also check that the presented biometric sample belongs to the live human being who was originally enrolled in the system, and not just to any live human being. It is well known that fingerprint systems can be fooled with artificial fingerprints, static facial images can be used to fool face recognition systems, and static iris images can be used to fool iris recognition systems [2].
Multimodal biometric systems consolidate the evidence presented by multiple biometric sources of information and are expected to be more reliable due to the presence of multiple, fairly independent pieces of evidence [3]. Intra-class variations in a face recognition system can be overcome with multimodal biometric systems. Figure 1 shows the intra-class variation associated with an individual's face image: due to the change in pose, a face recognition system will not be able to match these three images successfully, even though they belong to the same individual [4]. A multibiometric system can be classified into five categories (multi-sensor, multi-algorithm, multi-instance, multi-sample and multimodal) depending upon the evidence presented by the multiple sources of biometric information. A multi-sample system can be used to tackle intra-class variations. Here, a single sensor is used to acquire multiple samples of the same biometric trait in order to account for the variations that can occur in the trait. It is an inexpensive way of improving system performance, since it requires neither multiple sensors nor multiple feature extraction and matching modules [5] [6].
Fig.1: Intra-class variation associated with an individual's face image
One of the properties that makes biometrics so attractive for authentication purposes is their invariance over time. One of the biggest vulnerabilities of biometrics is that once a biometric image or template is stolen, it is stolen forever and cannot be reissued, updated or destroyed [7]. One of the most potentially damaging attacks on a biometric system is against the biometric template database. Attacks on the template can lead to the following three vulnerabilities:


(i) a template can be replaced by an impostor's template to gain unauthorized access; (ii) a physical spoof can be created from the template to gain unauthorized access to the system (as well as to other systems which use the same biometric trait); and (iii) the stolen template can be replayed to the matcher to gain unauthorized access [8].
The proposed hybrid approach provides three main advantages: it handles intra-class variation, performs a live face check, and provides protection against attacks on the template database. The rest of the paper is organized as follows. Section 2 addresses the literature study. In Section 3, face feature set extraction using PCA is discussed. In Section 4, the architecture of the proposed approach is presented. Section 5 discusses the advantages of the proposed approach. Finally, the summary and conclusions are given in the last section.
II. RELATED WORK
In recent years face recognition has received substantial attention from both the research community and the market, but it still remains very challenging in real applications. A lot of face recognition algorithms have been developed during the past decades. Face recognition consists in localizing the most characteristic face components (eyes, nose, mouth, etc.) within images that depict human faces. This step is essential for the initialization of many face processing techniques like face tracking, facial expression recognition or face recognition. Among these, face recognition is a lively research area where a great effort has been made in recent years to design and compare different techniques [9]. Hong and Jain [10] designed a decision fusion scheme to combine faces and fingerprints for personal identification. Brunelli and Falavigna [11] presented a person identification system combining the outputs of classifiers based on audio and visual cues. Face recognition algorithms are categorized into appearance-based and model-based schemes. For the appearance-based methods, three linear subspace analysis schemes are prominent (PCA, LDA, and ICA) [12]. The model-based approaches include Elastic Bunch Graph Matching [13], Active Appearance Model [14] and 3D Morphable Model [15] methods. Among face recognition algorithms, the appearance-based approaches are the most popular; these approaches utilize pixel intensity or intensity-derived features.
The template protection schemes proposed in the literature can be broadly classified into two categories: the feature transformation approach and the biometric cryptosystem approach [8]. In the feature transformation approach, a transformation function is applied to the biometric template and only the transformed template is stored in the database. The same transformation function is applied to the query features, and the transformed query is directly matched against the transformed template. Depending on the characteristics of the transformation function, feature transform schemes can be further categorized into salting and non-invertible transforms. In a biometric cryptosystem, some public information about the biometric template is stored. This public information is referred to as helper data, and hence biometric cryptosystems are also known as helper data-based methods. While the helper data does not reveal any significant information about the original biometric template, it is needed during matching to extract a cryptographic key from the query biometric features. Matching is performed indirectly by verifying the validity of the extracted key. Biometric cryptosystems can be further classified as key binding and key generation systems, depending on how the helper data is obtained [16].
Liveness detection can be performed either at the acquisition stage or at the processing stage. There are two approaches to determining whether a biometric trait is alive or not: liveness detection and non-liveness detection [2]. Liveness detection, which aims at the recognition of human physiological activities as the liveness indicator to prevent spoofing attacks, is becoming a very active topic in the fields of fingerprint recognition and iris recognition, but efforts on live face detection are still very limited even though live face detection is highly desirable. The most common faking method is to use a facial photograph of a valid user to spoof a face recognition system. Most of the current face recognition systems with excellent performance are based on intensity images and are equipped with a generic camera. Thus, an anti-spoofing method without an additional device is preferable, since it can be easily integrated into existing face recognition systems [17] [18].
III. FEATURE EXTRACTION
Facial recognition is the identification of humans by the unique characteristics of their faces, and it has attracted a lot of attention because of its potential applications. Among face recognition algorithms, appearance-based approaches (PCA, LDA, and ICA) are the most popular; these approaches utilize pixel intensity or intensity-derived features [12]. In this paper, the PCA method using eigenfaces was adopted for face recognition. PCA is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences. The main idea of principal component analysis (or the Karhunen-Loeve transform) is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call "face space". Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces". Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. The eigenfaces are the principal components of a distribution of faces, or equivalently, the eigenvectors of the covariance matrix of the set of face images, where each image with N×N pixels is considered a point (or vector) in an N²-dimensional space [19]. The idea of using principal components to represent human faces was developed by Sirovich and Kirby [20] and used by Turk and Pentland [21] for face detection and recognition. Eigenfaces are mostly used to:
(a) Extract the relevant facial information, which may or may not be directly related to face features such as the eyes, nose, and lips. One way to do so is to capture the statistical variation between face images.
(b) Represent face images efficiently. To reduce the computation and space complexity, each face image can be represented using a small number of dimensions.
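The eigenface computation described above can be summarized in a short Python sketch. The random "faces" matrix is a stand-in for a real training set, and M = 10 is an arbitrary choice; the projection step yields the eigen-coefficient feature vectors used in the remainder of the paper.

import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((40, 32 * 32))            # 40 training images, each flattened 32x32 pixels

mean_face = faces.mean(axis=0)
A = faces - mean_face                        # center the data

# Eigenvectors of the covariance matrix via SVD; the rows of Vt are the eigenfaces.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
M = 10                                       # keep the M leading eigenfaces
eigenfaces = Vt[:M]

def project(image):
    """Feature vector: projection coefficients onto the M-dimensional face space."""
    return eigenfaces @ (image - mean_face)

probe = faces[0]
coeffs = project(probe)                      # M eigen-coefficients describing the face
print(coeffs.shape)                          # -> (10,)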

Mathematically, this is simply finding the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images, treating an image as a point or a vector in a very high-dimensional space. Each eigenvector accounts for a different amount of the variation among the face images, and the eigenvectors can be imagined as a set of features that together characterize the variation between face images [19].
IV. PROPOSED APPROACH
Figure 2 shows the block diagram of the proposed approach for template protection in a face recognition system. The main idea behind the proposed approach is to generate secure hybrid templates by integrating three different views (left, front and right) of the input face image and then changing the components of the resulting face feature set.
Fig. 2: Architecture of the proposed approach for template protection in a face recognition system
The proposed approach can be roughly divided into the following four steps:
A. Random selection of three facial views (Left, Front, Right) to perform liveness detection
B. Extraction of the feature sets of the three facial views
C. Fusion of the feature sets of the three facial views (Left + Front + Right)
D. Random shuffling and addition of the components of the eigen vector resulting after fusion
A. Random selection of three Facial views
This step is responsible for performing liveness detection. Here, the person to be recognized is required to stand in front of a camera which is focused upon the full height of the person. Based upon the random order (LFR, LRF, FLR, FRL, RFL or RLF; L: Left, F: Front, R: Right) generated by the random select module shown in Fig. 2, the person is asked to move left, move right or look at the front. The camera is focused upon the entire body to examine the actual body movement, but it captures only the images of the face in the order selected by the module. The module which generates the random order of the three views detects whether the person is live by instructing the person to move left, move right or look at the front: the complete body movement is examined through the camera, and face images are captured only if the person is live. To perform liveness detection, the random select module can be equipped with the following decision process, which first checks liveness and then performs person recognition:
if data = live
    perform acquisition and extraction
else if data = not live
    do not perform acquisition and extraction
B. Extraction of Feature sets of three Facial views
This step performs feature set extraction of the three views (Left, Front, Right) of the input face image by using the PCA (appearance-based) face recognition technique. The PCA method is applied individually to each view of the face image to extract the corresponding feature set. When using PCA, each face image is assumed to be a 2-dimensional array of intensity values. It is represented as a 1-dimensional vector by concatenating each row (or column) into a long thin vector. By projecting the face vector onto the basis vectors, the projection coefficients are used as the feature representation of each face image. The PCA method using eigenfaces consists of the following two stages [10]:
1) the training stage, in which a set of N face images is collected, the eigenfaces that correspond to the M highest eigenvalues are computed from the data set, and each face is represented as a point in the M-dimensional eigenspace; and
2) the operational stage, in which each test image is first projected onto the M-dimensional eigenspace; the M-dimensional face representation is then taken as a feature vector and fed to a classifier to establish the identity of the individual.
For each face image, we obtain a feature vector by projecting the image onto the subspace generated by the principal directions of the covariance matrix. After applying the projection, the input vector (face) in an n-dimensional space is reduced to a feature vector in an M-dimensional subspace (M << N) [9]. Thus, the feature vectors of the three individual face views can be represented in terms of eigen vectors as described below:
eigen vector for the left face view: VL = [a1, a2, a3, a4 … am]
eigen vector for the front face view: VF = [b1, b2, b3, b4 … bm]
eigen vector for the right face view: VR = [c1, c2, c3, c4 … cm]
where VL, VF and VR represent the feature sets, in terms of eigen-coefficients, of the three views of the face image respectively.
C. Fusion of Feature sets of three Facial views
Fusion involves consolidating the evidence presented by two or more biometric feature sets of the same individual. This step performs fusion of the feature sets of the three face views of the same image at the feature level [6]. Here, the three feature sets originate from the same feature extraction algorithm (PCA). Fusion of the three face views is performed by simply averaging them:

X = (VL + VF + VR) / 3    (1)

The resulting fused eigen vector can be represented as X = [x1, x2, x3, x4 … xm].
D. Random shuffling and addition of components of Eigen vector
This step of the proposed approach is responsible for performing changes to the eigen vector that is obtained after fusion of the three feature sets; it makes the resulting template more secure. The step is illustrated in Fig. 3 below.
Fig. 3: Steps to generate a secure hybrid template from the input face images
The coefficients of the eigen vector X are randomly shuffled. By shuffling, randomly chosen columns are interchanged, and every time a new eigen vector can be generated:

X′ = Hshuffle(X)    (2)

where Hshuffle is the function which performs shuffling on the eigen vector X, and X′ is the shuffled eigen vector. The number of coefficients in X and X′ is the same; shuffling just changes the order of the columns. After that, addition among the coefficients of the shuffled eigen vector is performed in some order. The addition function is described below:

Addition = Σ_{p=1}^{m−2} [xp + xp+2]    (3)

where after every two iterations p is incremented by 3. Random shuffling of the coefficients in the eigen vector obtained after fusion of the three feature sets, followed by addition among the coefficients of the shuffled eigen vector, generates the hybrid template that is finally stored in the system database. The resulting hybrid template contains half the number of coefficients of the original eigen vector obtained in the previous step; the number of coefficients is reduced by the addition function. This makes the template more secure against spoof attacks and takes less memory in the database.
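Steps C and D can be illustrated with the following sketch. The vector length, the random seed used as the shuffling rule, and the pairwise-addition pattern are assumptions of this sketch: the printed form of (3) is ambiguous about the exact index stepping, so the sketch uses simple pairwise addition, which likewise halves the template length.

import numpy as np

rng = np.random.default_rng(42)
m = 12
VL, VF, VR = (rng.random(m) for _ in range(3))   # eigen-coefficient vectors of the 3 views

X = (VL + VF + VR) / 3                           # (1): feature-level fusion by averaging

perm = rng.permutation(m)                        # reissuable shuffling rule
X_shuffled = X[perm]                             # (2): X' = Hshuffle(X)

# (3): adding pairs of shuffled coefficients halves the template length.
hybrid = X_shuffled[0::2] + X_shuffled[1::2]
print(hybrid.shape)                              # -> (6,)

If the stored template is compromised, a new permutation (a new seed) yields a fresh, unrelated hybrid template from the same face data, which is the revocability property claimed in the next section.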
3: Steps to generate securea hybrid template from input eigen vector by half and hence the final hybrid template face images generated will be compact and more secure. Thus the The coefficients of eigen vector X are randomly shuffled. proposed scheme provides higher template security and By shuffling, randomlyE chosen columns are interchanged better recognition performance as compared to the case and every time we can generate a new eigen vector. when no measures for liveness detection and template X′ = Hshuffle (X) (2) protection are taken as in existing face recognition system where Hshuffle is the function which performs shuffling on using eigenfaces approach. the eigen vector X and X′ is the shuffled eigen vector. The VI. CONCLUSION number of coefficients in both X and X′ are same, shuffling just changes the order of columns. After that, addition Biometric template protection has become one of the among coefficients of shuffled eigen vector is performed in important issues in deploying a practical biometric system. some order. Addition function is described below: In this paper, a hybrid approach for template protection in p=m-2 face recognition system is proposed. This approach is based Addition = ∑ [xp + xp+2], (3) on the fusion of three different views (left, front, right view p = 1 captured randomly) of input face image, random shuffling of after every two iteration p is incremented with 3. coefficients in the eigen vector (extracted using PCA P a g e | 38 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology method) obtained after fusion and addition among Trans. Pattern Analysis and Machine Intelligence, coefficients in the shuffled eigen vector. On the theoretical vol. 20, no. 12, pp. 1295–1307, 1998. basis, it has been proved that the proposed approach 11) R. Brunelli and D. Falavigna, ―Person provides better template protection against spoof attacks as identification using multiple cues,‖ IEEE Trans. compared to the existing method. One of the weaknesses of Pattern Analysis and Machine Intelligence, vol. 17, biometrics is that once a biometric data or template is stolen, no. 10, pp. 955–966, Oct. 1995. it is stolen forever and cannot be reissued, or discarded. 12) [12] Xiaoguang Lu, ―Image Analysis for Face Thus template security has become very critical in these Recognition – A brief survey‖, Dept. of Computer systems. The proposed scheme provides new measures Science & Engineering, Michigan State University, (shuffling and addition) for template protection by giving personal notes, May 2003. the ability to discard the lost template and reissue a new one. 13) L. Wiskott, J.M. Fellous, N. Kruger, and C. von der Malsburg, ―Face recognition by elastic bunch VII. REFERENCES graph matching,‖ IEEE Trans. Pattern Analysis and 1) Bori Toth, ―Biometric Liveness Detection‖, Machine Intelligence, vol. 19, no. 7, pp. 775–779, Information Security Bulletin, October 2005, 1997. Volume 10, pages 291-297. 14) G.J. Edwards, T.F. Cootes, and C.J. Taylor, ―Face recognition using active appearance models,‖ in 2) International Biometric Group. Liveness detection Proc. European Conference on Computer Vision, in biometric systems, 2003. White paper. Available 1998, vol. 2, pp. 581–695. w at http://www.biometricgroup.com/reports/public/ 15) V. Blanz and T. Vetter, ―A morphable model for reports/liveness.html. the synthesis of 3D faces,‖ in Proc. ACM 3) A. K. Jain, A. Ross, and S. Prabhakar, ―An SIGGRAPH, Mar. 1999,e pp. 187–194. 
3) A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition," IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, pp. 4-20, Jan. 2004.
4) Arun Ross and Anil K. Jain, "Multimodal biometrics: An overview," in Proc. of 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, pp. 1221-1224, September 2004.
5) Arun Ross, "An Introduction to Multibiometrics," EUSIPCO, 2007.
6) A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics, New York, Springer, 2006.
7) B. Schneier, "The uses and abuses of biometrics," Communications of the ACM, vol. 42, no. 8, p. 136, Aug. 1999.
8) A. K. Jain, K. Nandakumar, and A. Nagar, "Biometric Template Security," EURASIP Journal on Advances in Signal Processing, January 2008.
9) X. Lu, Y. Wang, and A. K. Jain, "Combining Classifiers for Face Recognition," in IEEE Conference on Multimedia & Expo, vol. 3, pp. 13-16, 2003.
10) L. Hong and A. K. Jain, "Integrating faces and fingerprint for personal identification," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1295-1307, 1998.
11) R. Brunelli and D. Falavigna, "Person identification using multiple cues," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 10, pp. 955-966, Oct. 1995.
12) Xiaoguang Lu, "Image Analysis for Face Recognition - A brief survey," Dept. of Computer Science & Engineering, Michigan State University, personal notes, May 2003.
13) L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997.
14) G. J. Edwards, T. F. Cootes, and C. J. Taylor, "Face recognition using active appearance models," in Proc. European Conference on Computer Vision, 1998, vol. 2, pp. 581-695.
15) V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces," in Proc. ACM SIGGRAPH, Mar. 1999, pp. 187-194.
16) U. Uludag, S. Pankanti, S. Prabhakar, and A. K. Jain, "Biometric Cryptosystems: Issues and Challenges," Proceedings of the IEEE, vol. 92, no. 6, pp. 948-960, June 2004.
17) Gang Pan, Zhaohui Wu, and Lin Sun, "Liveness Detection for Face Recognition," Recent Advances in Face Recognition, pp. 109-123, I-Tech, Vienna, Austria, December 2008.
18) Jiangwei Li, Yunhong Wang, Tieniu Tan, and A. K. Jain, "Live Face Detection Based on the Analysis of Fourier Spectra," Biometric Technology for Human Identification, Proceedings of SPIE, vol. 5404.
19) Y. Vijaya Lata, Chandra Kiran Bharadwaj Tungathurthi, H. Ram Mohan Rao, A. Govardhan, and L. P. Reddy, "Facial Recognition using Eigenfaces by PCA," International Journal of Recent Trends in Engineering, vol. 1, no. 1, May 2009.
20) L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," Journal of the Optical Society of America A, vol. 4, pp. 519-524, 1987.
21) M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, March 1991.


QRS Wave Detection Using Multiresolution Analysis

S. Karpagachelvi1, Dr. M. Arthanari, Prof. & Head2, M. Sivakumar3

Abstract-The electrocardiogram (ECG or EKG) is basically a diagnostic tool that measures and records the electrical activity of the heart. It is most commonly used to perform cardiac tests, since it acts as a screening tool for cardiac abnormalities; this is necessary because no single point provides a complete picture of what is going on in the heart. The signal mainly comprises the P, Q, R, S and T waves, with their corresponding times and frequencies. The P-QRS-T key feature detector is based on the wavelet transform, which is robust to time variation and noise, and it analyzes the waveform, including noise purification, for the sample design of a digital ECG; the R peak is mainly used for detection. In this work, we have developed an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform. It mainly includes two stages: in the first stage, the signal is de-noised using the discrete wavelet transform; in the second stage, multiresolution analysis is performed for QRS complex detection. Previously proposed schemes were mostly based on Fuzzy Logic methods, Artificial Neural Networks (ANN), Genetic Algorithms (GA), Support Vector Machines (SVM), and other signal analysis techniques.

Keywords-Cardiac Cycle, ECG signal, P-QRS-T waves, Feature Extraction, Haar wavelets.

I. INTRODUCTION

The investigation of the ECG has been extensively used for diagnosing many cardiac diseases. The ECG is a realistic record of the direction and magnitude of the electrical activity that is generated by depolarization and re-polarization of the atria and ventricles. One cardiac cycle in an ECG signal consists of the P-QRS-T waves; Figure 1 shows a sample ECG signal. The majority of the clinically useful information in the ECG is found in the intervals and amplitudes defined by its features (characteristic wave peaks and time durations). The development of precise and rapid methods for automatic ECG feature extraction is of chief importance, particularly for the examination of long recordings [1].

The ECG feature extraction system provides fundamental features (amplitudes and intervals) to be used in subsequent automatic analysis. In recent times, a number of techniques have been proposed to detect these features [2] [3] [4]. Earlier methods of ECG signal analysis were based on time-domain methods, but these are not always adequate to study all the features of ECG signals, so a frequency representation of the signal is also required. Deviations from the normal electrical patterns indicate various cardiac disorders; cardiac cells in the normal state are electrically polarized [5].

The ECG is essential for patient monitoring and diagnosis. The features extracted from the ECG signal play a vital role in diagnosing cardiac disease, so the development of accurate and quick methods for automatic ECG feature extraction is of major importance, and it is necessary that the feature extraction system performs accurately. The purpose of feature extraction is to find as few properties as possible within the ECG signal that allow successful abnormality detection and efficient prognosis.

Figure 1. A Sample ECG Signal showing the P-QRS-T Wave

In recent years, several research efforts and algorithms have been developed for the task of analyzing and classifying the ECG signal. The classification methods proposed during the last decade and under evaluation include digital signal analysis, Fuzzy Logic methods, Artificial Neural Networks, Hidden Markov Models, Genetic Algorithms, Support Vector Machines, Self-Organizing Maps, Bayesian and other methods, with each approach exhibiting its own advantages and disadvantages. In this work, we have developed an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform using Haar coefficients, and we also provide an overview of the various techniques and transformations used for extracting features from the ECG signal. This paper is structured as follows: Section 2 discusses the related work earlier proposed in the literature for ECG feature extraction, Section 3 gives a description of the DWT-based ECG feature detection algorithm, and Section 4 concludes the paper with a few discussions.

About-1: Doctoral Research Scholar, Mother Teresa Women's University, Kodaikanal, Tamilnadu, India (email: [email protected])
About-2: Dept. of Computer Science and Engineering, Tejaa Shakthi Institute of Technology for Women, Coimbatore-641 659, Tamilnadu, India (email: [email protected])
About-3: Doctoral Research Scholar, Anna University - Coimbatore, Tamilnadu, India (email: [email protected])

II. RELATED WORK

ECG feature extraction has been studied from early times, and many advanced techniques as well as transformations have been proposed for accurate and fast ECG feature extraction. This section of the paper discusses various techniques and transformations proposed earlier in the literature for extracting features from the ECG.

A novel approach for ECG feature extraction was put forth by Castro et al. in [6].
Their paper presents an algorithm, based on the wavelet transform, for feature extraction from an electrocardiograph (ECG) signal and recognition of abnormal heartbeats, exploiting the fact that wavelet transforms can be localized both in the frequency and time domains. They developed a method for choosing an optimal mother wavelet from a set of orthogonal and bi-orthogonal wavelet filter banks by means of the best correlation with the ECG signal. The coefficients, the approximations of the last scale level and the details of all levels, are used for the ECG analysis. They divided the coefficients of each cycle into three segments related to the P-wave, the QRS complex, and the T-wave; the summation of the values from these segments provided the feature vectors of single cycles.

Mahmoodabadi et al. in [1] described an approach for ECG feature extraction which utilizes the Daubechies wavelet transform. They developed and evaluated an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform. ECG signals from Modified Lead II (MLII) were chosen for processing. A wavelet filter with a scaling function closely similar to the shape of the ECG signal achieved better detection. The first step of their approach was to de-noise the ECG signal by removing the equivalent wavelet coefficients at higher scales. Then, QRS complexes are detected and each complex is used to trace the peaks of the individual waves, including the onsets and offsets of the P and T waves present in one cardiac cycle.

A feature extraction method using the Discrete Wavelet Transform (DWT) was proposed by Emran et al. in [7]. They used a discrete wavelet transform (DWT) to extract the relevant information from the ECG input data in order to perform the classification task. Their proposed work includes the following modules: data acquisition, pre-processing, beat detection, feature extraction and classification. In the feature extraction module, the DWT is designed to address the problem of non-stationary ECG signals. It is derived from a single generating function, called the mother wavelet, by translation and dilation operations. Using the DWT in feature extraction may lead to an optimal frequency resolution in all frequency ranges, as it has a varying window size: broad at lower frequencies and narrow at higher frequencies. The DWT characterization delivers features that are stable against the morphology variations of the ECG waveforms.

Tayel and Bouridy in [8] put forth a technique for ECG image classification by extracting features using the wavelet transform and neural networks. Features are extracted from the wavelet decomposition of the ECG image intensity and then further processed using artificial neural networks. The features are: mean, median, maximum, minimum, range, standard deviation, variance, and mean absolute deviation. The introduced ANN was trained with the main features of 63 ECG images of different diseases.

An algorithm was presented by Chouhan and Mehta in [9] for the detection of QRS complexes. The recognition of QRS complexes forms the basis for more or less all automated ECG analysis algorithms. The presented algorithm utilizes a modified definition of the slope of the ECG signal as the feature for QRS detection. A succession of transformations of the filtered and baseline-drift-corrected ECG signal is used for mining a new modified slope feature. In the presented algorithm, a filtering procedure based on moving averages [15] provides a smooth, spike-free ECG signal, which is appropriate for slope feature extraction. The foremost step is to extract the slope feature from the filtered and drift-corrected ECG signal by processing and transforming it in such a way that the extracted feature signal is significantly enhanced in the QRS region and suppressed in the non-QRS region.

Xu et al. in [10] described an algorithm using the Slope Vector Waveform (SVW) for ECG QRS complex detection and RR interval evaluation. In their method, variable-stage differentiation is used to achieve the desired slope vectors for feature extraction, and non-linear amplification is used to improve the signal-to-noise ratio. The method allows a fast and accurate search of the R location, QRS complex duration, and RR interval, and yields excellent ECG feature extraction results. In order to obtain QRS durations, feature extraction rules are needed.

A modified combined wavelet transform technique was developed by Saxena et al. in [11]. The technique has been developed to analyze multi-lead electrocardiogram signals for cardiac disease diagnostics. Two wavelets have been used, i.e.
a quadratic spline wavelet (QSWT) for QRS detection and the Daubechies six-coefficient (DU6) wavelet for P and T detection. A procedure has been evolved using electrocardiogram parameters with a point scoring system for the diagnosis of various cardiac diseases. The consistency and reliability of the identified and measured parameters were confirmed when both diagnostic criteria gave the same results. Table 1 shows the comparison of different ECG signal feature extraction techniques.

Fatemian et al. [12] proposed an approach for ECG feature extraction. They suggested a new wavelet-based framework for automatic analysis of the single-lead electrocardiogram (ECG) for application in human recognition. Their system utilizes a robust preprocessing stage, which enables it to handle noise and outliers, so that it can be applied directly to the raw ECG signal. In addition, the proposed system is capable of managing ECGs regardless of the heart rate (HR), which renders presumptions on the individual's stress level unnecessary. The substantial

reduction of the template gallery size decreases the storage requirements of the system appreciably. Additionally, the categorization process is speeded up by eliminating the need for dimensionality reduction techniques such as PCA or LDA. Their experimental results revealed that the proposed technique outperformed other conventional methods of ECG feature extraction.

III. DESCRIPTION OF ALGORITHM

A. Wavelet Selection

The large number of known wavelet families and functions provides a rich space in which to search for a wavelet which

will very efficiently represent a signal of interest in a large variety of applications. Wavelet families include Biorthogonal, Coiflet, Haar, Symmlet, Daubechies wavelets, etc. There is no absolute way to choose a certain wavelet; the choice of the wavelet function depends on the application. The Haar wavelet algorithm has the advantage of being simple to compute and easy to understand, and in the present work the Haar wavelet is chosen. Savitzky-Golay filtering is used to smooth the signal, and to identify the onsets and offsets of the wave, the wave is brought to zero base. To perform the wavelet analysis, we used the Matlab program, which contains a very good wavelet toolbox. First the considered signal was decomposed using the Haar wavelet, and orders 1-5 were evaluated. One of the key criteria of a good mother wavelet is its ability to fully reconstruct the signal from the wavelet decompositions. Fig. 2 shows the decomposed signal. The high-frequency components of the ECG signal decrease as lower details are removed from the original signal. As the lower details are removed, the signal becomes smoother and the noise disappears, since noise is marked by the high-frequency components picked up along the way of transmission. This is the contribution of the discrete wavelet transform, where noise filtration is performed implicitly.

Fig. 2. Multiresolution decomposition of the ECG signal from the 801.dat file (original signal, approximation a5 and details d5-d1)

B. Peaks identification

In order to detect the peaks, specific details of the signal were selected. R peaks are the largest-amplitude points, greater than the threshold, located in the wave. Those maxima points are stored and the R-R interval is determined; their mean value is found, which is used to find the portion of the single wave. The Q and S peaks occur about the R peak within 0.1 second. Calculating the distance from the zero point (or close to zero) on the left side of the R peak within the threshold limit gives the Q peak, and calculating the distance from the zero point (or close to zero) on the right side of the R peak within the threshold limit gives the S peak. The onset is the beginning of the Q wave (or R wave if the Q wave is missing) and the offset is the ending of the S wave (or R wave if the S wave is missing). Normally, the onset of the QRS complex contains high-frequency components, which are detected at finer scales.
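As an illustration of the two stages just described (de-noising by discarding fine detail levels, then threshold-based peak picking), the following Python sketch uses the PyWavelets package; the threshold factor, the window handling and the helper name are our assumptions rather than the exact algorithm of this paper.

import numpy as np
import pywt

def detect_qrs(ecg, fs):
    # Stage 1: five-level Haar multiresolution decomposition (cf. Fig. 2).
    coeffs = pywt.wavedec(ecg, 'haar', level=5)
    # Crude denoising: zero the two finest detail levels (high-frequency
    # noise) and reconstruct a smoothed signal.
    coeffs[-1] = np.zeros_like(coeffs[-1])
    coeffs[-2] = np.zeros_like(coeffs[-2])
    smoothed = pywt.waverec(coeffs, 'haar')[:len(ecg)]

    # Stage 2: R peaks are the largest-amplitude local maxima above a
    # threshold; an illustrative choice of threshold is used here, and in
    # practice adjacent detections of one beat would be merged.
    threshold = 0.6 * np.max(smoothed)
    window = int(0.1 * fs)                 # Q and S lie within 0.1 s of R
    r_peaks = [i for i in range(1, len(smoothed) - 1)
               if smoothed[i] > threshold
               and smoothed[i] >= smoothed[i - 1]
               and smoothed[i] >= smoothed[i + 1]]
    # Q: minimum just left of R; S: minimum just right of R.
    q_peaks = [max(0, r - window) + int(np.argmin(smoothed[max(0, r - window):r + 1]))
               for r in r_peaks]
    s_peaks = [r + int(np.argmin(smoothed[r:r + window])) for r in r_peaks]
    return r_peaks, q_peaks, s_peaks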
C. Results

The algorithm presented in this section is applied directly, in one run, over the whole digitized ECG signal, which is saved as data files provided by PhysioNet. QRS recognition is shown in Figure 3.

Fig. 3. Multiresolution process of wavelet-based peak detection in the 801.dat file (vector magnitude of the signal with the Q, R and S amplitudes marked)

IV. CONCLUSION

In this paper, a QRS key feature element detection algorithm based on multiresolution analysis was proposed. The performance of the peak detection was examined by testing the algorithm on data from the standardized MIT-BIH database. The DWT-based QRS detector performs well compared with standard techniques. The primary advantages of the DWT over existing techniques are noise removal and the ability to process time-varying ECG data. In this work we pointed out the advantage of using the wavelet transform associated with a thresholding strategy. Further, the possibility of detecting the positions of QRS complexes in ECG signals is investigated and a simple detection algorithm is proposed. The main advantage of this kind of detection is less time-consuming analysis of long-duration ECG signals. The QRS detection in the ECG signal is explained with screen shots. Future work mainly concentrates on improving the proposed algorithm for the various QRS waves of different patients. Moreover, additional statistical data will be utilized for evaluating the performance of the algorithm in ECG signal feature detection. Improving the accuracy of diagnosing cardiac disease at the earliest stage is necessary in patient monitoring systems; therefore our future work also has an eye on improvement in diagnosing cardiac disease.

V. REFERENCES

1) S. Z. Mahmoodabadi, A. Ahmadian, and M. D. Abolhasani, "ECG Feature Extraction using Daubechies Wavelets," Proceedings of the Fifth IASTED International Conference on Visualization, Imaging and Image Processing, pp. 343-348, 2005.
2) Juan Pablo Martinez, Rute Almeida, Salvador Olmos, Ana Paula Rocha, and Pablo Laguna, "A Wavelet-Based ECG Delineator: Evaluation on Standard Databases," IEEE Transactions on Biomedical Engineering, vol. 51, no. 4, pp. 570-581, 2004.
3) Krishna Prasad and J. S. Sahambi, "Classification of ECG Arrhythmias using Multi-Resolution Analysis and Neural Networks," IEEE Transactions on Biomedical Engineering, vol. 1, pp. 227-231, 2003.
4) Cuiwei Li, Chongxun Zheng, and Changfeng Tai, "Detection of ECG Characteristic Points using Wavelet Transforms," IEEE Transactions on Biomedical Engineering, vol. 42, no. 1, pp. 21-28, 1995.
5) Saritha, V. Sukanya, and Y. Narasimha Murthy, "ECG Signal Analysis Using Wavelet Transforms," Bulgarian Journal of Physics, vol. 35, pp. 68-77, 2008.
6) B. Castro, D. Kogan, and A. B. Geva, "ECG feature extraction using optimal mother wavelet," The 21st IEEE Convention of the Electrical and Electronic Engineers in Israel, pp. 346-350, 2000.
7) Emran M. Tamil, Nor Hafeezah Kamarudin, Rosli Salleh, M. Yamani Idna Idris, Noorzaily M. Noor, and Azmi Mohd Tamil, "Heartbeat Electrocardiogram (ECG) Signal Feature Extraction Using Discrete Wavelet Transforms (DWT)."
8) Mazhar B. Tayel and Mohamed E. El-Bouridy, "ECG Images Classification Using Feature Extraction Based On Wavelet Transformation And Neural Network," ICGST, International Conference on AIML, June 2006.
9) V. S. Chouhan and S. S. Mehta, "Detection of QRS Complexes in 12-lead ECG using Adaptive Quantized Threshold," IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 1, 2008.
10) Xiaomin Xu and Ying Liu, "ECG QRS Complex Detection Using Slope Vector Waveform (SVW) Algorithm," Proceedings of the 26th Annual International Conference of the IEEE EMBS, pp. 3597-3600, 2004.
11) S. C. Saxena, V. Kumar, and S. T. Hamde, "Feature extraction from ECG signals using wavelet transforms for disease diagnostics," International Journal of Systems Science, vol. 33, no. 13, pp. 1073-1085, 2002.
12) S. Z. Fatemian and D. Hatzinakos, "A new ECG feature extractor for biometric recognition," 16th International Conference on Digital Signal Processing, pp. 1-6, 2009.

Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver. 1.0 July 2010 P a g e | 43

A Review on Data Clustering Algorithms for Mixed Data

D. Hari Prasad1, Dr. M. Punithavalli2

Abstract-Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a method of dividing the data into groups of similar objects. One significant research area in data mining is to develop methods that modernize knowledge by using existing knowledge, since this can generally augment mining efficiency, especially for very bulky databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper also describes the efficiency of the Self-Organized Map (SOM) algorithm in enhancing mixed data clustering.

Keywords-Data Clustering, Data Mining, Mixed Data Clustering, Self-Organized Map algorithm.

About-1: Senior Lecturer, Department of Computer Applications, Sri Ramakrishna Institute of Technology, Coimbatore, India.
About-2: Director, Department of Computer Science, Sri Ramakrishna Arts College for Women, Coimbatore, India.

I. INTRODUCTION

Clustering is one of the standard workhorse techniques in the field of data mining. Its intention is to systematize a dataset into a set of groups, or clusters, which contain "similar" data items, as measured by some distance function. The major applications of clustering include document categorization, scientific data analysis, and customer/market segmentation. Data clustering has been considered a primary data mining method for knowledge discovery, and clustering using Gaussian mixture models is also extensively employed for exploratory data analysis. The six sequential, iterative steps of the data mining process are: 1) problem definition; 2) data acquisition; 3) data preprocessing and survey; 4) data modeling; 5) evaluation; 6) knowledge deployment [1]. The purpose of the survey before data preprocessing is to gain insight into the data possibilities and problems and to determine whether the data are sufficient; moreover, the survey assists in selecting the proper preprocessing and modeling tools. Typically, several different data sets and preprocessing strategies need to be considered, and for this reason efficient visualizations and summarizations are essential.

Primarily the focus must be on clusters, since they are important characterizations of data. The clustering method implemented should be fast, robust, and visually efficient. In clustering, the foremost step is partitioning a data set into a set of clusters Qi, where i = 1, ..., C. Data clustering techniques are gaining escalating reputation over traditional central grouping techniques, which are centered on the conception of "feature" (see e.g. [2], [3]). Several data clustering techniques have been put forth by researchers to assist in the development of knowledge. Fuzzy clustering [4] is a generalization of crisp clustering where each sample has a varying degree of membership in all clusters. In many real-world applications, in fact, a feasible feature-based description of objects might be difficult to obtain or inefficient for learning purposes while, on the other hand, it is often possible to obtain a measure of the similarity or dissimilarity between objects. Among the central algorithmic procedures for perceptual organization are clustering principles like generalized k-means methods or clustering methods for proximity data [15].

The remainder of this paper is organized as follows: Section II describes the background study related to clustering algorithms proposed earlier, Section III explains the challenging problems and areas of research, and Section IV concludes the paper with a few discussions.

II. BACKGROUND STUDY

A wealth of clustering techniques has been described in the literature. This section of the paper presents an overview of the clustering algorithms put forth by various researchers. In general, major clustering methods can be classified into five categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods.

A. Clustering of the Self-Organizing Map

A novel method [1] was put forth by Juha Vesanto and Esa Alhoniemi for clustering of the Self-Organizing Map.
data preprocessing is to gain insight knowledge into the data According to the method proposed in this paper the possibilities and problems to determine whether the data are clustering is carried out using a two-level approach, where sufficient. Moreover the surveya assists us to select the the data set is first clustered using the SOM, and then, the proper preprocessing and modeling tools. Typically, several SOM is clustered. The purpose of this paper was to evaluate different data sets and preprocessing strategies need to be if the data abstraction created by the SOM could be considered. For this reason, efficient visualizations and employed in clustering of data. The most imperative summarizations are essential.E advantage of this procedure is that computational load Primarily the focus must be on clustering since they are decreases noticeably, making it possible to cluster large data important characterizations of data. The clustering method sets and to consider several different preprocessing implemented should be fast, robust, and visually efficient. In strategies in a restricted time. Obviously, the approach is the case of clustering Q means, the foremost step is applicable only if the clusters found using the SOM are partitioning a data set into a set of clusters QiBB ,BB where i = 1 C. analogous to those of the original data. Data clustering techniques are gaining escalating reputation ______B. Kernel-Based Clustering

B. Kernel-Based Clustering

Mark Girolami presents a Mercer Kernel-Based Clustering [5] algorithm in feature space. This paper presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. The work utilizes the perception that performing a nonlinear data transformation into some high-dimensional feature space increases the probability of linear separability of the patterns within the transformed space and therefore simplifies the associated data structure. In this case, the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature-space partitioning of the data.

C. Grouping of Smooth Curves and Texture Segmentation using Path-Based Clustering

A Path-Based Clustering algorithm [6] was described by Fischer and Buhmann for grouping of smooth curves and texture segmentation. This paper proposed a new grouping approach, referred to as Path-Based Clustering [7], which measures local homogeneity rather than global similarity of objects. The new Path-Based Clustering method defines a connectedness criterion, which groups objects together if they are connected by a sequence of intermediate objects. Moreover, an efficient agglomerative algorithm is proposed to minimize the Path-Based Clustering cost function. This approach utilizes a bootstrap resampling scheme to measure the reliability of the grouping results.

D. Bagging for Path-Based Clustering

Fischer and Buhmann present bagging for path-based clustering [8]. A resampling scheme for clustering, with similarity to bootstrap aggregation (bagging), is presented in this paper. This aggregation (bagging) is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. In order to increase the reliability of clustering solutions, a stochastic resampling method is developed to deduce consensus clusters. Moreover, this paper also evaluates the quality of path-based clustering with resampling on a large image dataset of human segmentations.

E. Isoperimetric Graph Partitioning for Data Clustering

Leo Grady and Eric L. Schwartz together proposed an approach known as Isoperimetric Graph Partitioning for Data Clustering and Image Segmentation [9]. This paper adopts a different approach, based on finding partitions with a small isoperimetric constant in an image graph. The algorithm described in this paper generates the high-quality segmentations and data clusters of spectral methods, but with improved speed and stability. The term "partition" in this paper refers to the assignment of each node in the vertex set into two (not necessarily equal) parts. Graph partitioning has been strongly influenced by properties of a combinatorial formulation of the classic isoperimetric problem: for a fixed area, find the region with minimum perimeter.

F. Improving Classification Decisions by Multiple Knowledge

A new approach to combine multiple sets of rules for text categorization using Dempster's rule of combination [10] was described by Yaxin Bi et al. In this approach, a boosting-like technique for generating multiple sets of rules based on rough set theory is developed, and classification decisions from multiple sets of rules are modeled as pieces of evidence which can be combined by Dempster's rule of combination. The approach is applied to a set of benchmark data collections, both individually and in combination. The experimental results show that the performance of the best combination of the multiple sets of rules on the benchmark data is significantly better than that of the best single set of rules.

G. Clustering Algorithm for Data Mining

Zhijie Xu et al. presented a Modified Clustering Algorithm for Data Mining [11]. This paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the simulated annealing algorithm with CLARANS (Clustering Large Applications based upon RANdomized Search) in order to cluster large data sets efficiently. A parameter T is used to control the process of clustering. In every step of the search, if the cost of the neighbor is less than the current cost, the current solution is set to the neighbor; otherwise, the neighbor is accepted with probability exp(-(Scost - currentcost)/T).
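The acceptance rule just quoted is easy to state in code; the fragment below is an illustrative sketch of that single step (the medoid representation and cost function of CLARANS are problem-specific and omitted):

import math, random

def accept(current_cost, neighbour_cost, T):
    # A cheaper neighbour is always taken; a costlier one is accepted
    # with probability exp(-(Scost - currentcost)/T).
    if neighbour_cost < current_cost:
        return True
    return random.random() < math.exp(-(neighbour_cost - current_cost) / T)

# Typical use inside the search loop:
# if accept(cost(current), cost(neighbour), T): current = neighbour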
H. Dominant Sets and Pairwise Clustering

A graph-theoretic approach [12] for pairwise data clustering was developed by Massimiliano Pavan and Marcello Pelillo. A correspondence is established between dominant sets and the extrema of a quadratic form over the standard simplex, thereby allowing the use of straightforward and easily implementable continuous optimization techniques from evolutionary game theory. In order to study the robustness of the approach against random noise in the background, the level of clutter is allowed to vary, from 100 to 1,000 points. Extensions of the approach presented in this paper, involving hierarchical data partitioning and out-of-sample extensions of dominant-set clusters, can be found in [13] and [14], respectively.

I. A Conceptual Clustering Algorithm

Biswas et al. in [17] put forth a conceptual clustering algorithm for data mining. Their paper described an unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability. Their algorithm, ITERATE, employs: (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. The important task here is the interpretation of the generated patterns, and this is best addressed by creating groups of data that demonstrate cohesiveness within, but clear distinctions between, the groups. In clustering schemes, data objects are represented as vectors of feature-value pairs.


Features represent properties of an object that are relevant to the problem-solving task. Distinctness, or inter-class dissimilarity, was measured by an average of the variance of the distribution match between clusters. Additionally, their empirical results demonstrated the properties of the discovery algorithm and its applications to problem solving.

J. The New K-Windows Algorithm for Improving the K-Means Clustering Algorithm

The new k-windows algorithm for improving the k-means clustering algorithm was described by Vrahatis et al. in [18]. The process of partitioning a large set of patterns into disjoint and homogeneous clusters is fundamental in knowledge acquisition. It is called clustering in the literature and it is applied in various fields including data mining, statistical data analysis, compression and vector quantization. The k-means is a very popular algorithm and one of the best for implementing the clustering process. The k-means has a time complexity that is dominated by the product of the number of patterns, the number of clusters, and the number of iterations; also, it often converges to a local minimum. In their paper, they presented an improvement of the k-means clustering algorithm, aiming at a better time complexity and partitioning accuracy. Moreover, their approach reduces the number of patterns that need to be examined for similarity by using a windowing technique. The latter is based on well-known spatial data structures, namely the range tree, which allows fast range searches.

K. A Spectral-based Clustering Algorithm

Abdu et al. in [19] presented a novel spectral-based algorithm for clustering categorical data that combines attribute relationship and dimension reduction techniques found in Principal Component Analysis (PCA) and Latent Semantic Indexing (LSI). The new algorithm uses data summaries that consist of attribute occurrence and co-occurrence frequencies to create a set of vectors, each of which represents a cluster; they refer to these vectors as "candidate cluster representatives." The algorithm also uses spectral decomposition of the data summaries matrix to project and cluster the data objects in a reduced space. They refer to the algorithm as SCCADDS (Spectral-based Clustering algorithm for CAtegorical Data using Data Summaries). SCCADDS differs from other spectral clustering algorithms in several key respects. First, the algorithm uses the feature-categories similarity matrix instead of the data-object similarity matrix (as is the case with most spectral algorithms, which find the normalized cut of a graph of nodes of data objects), so SCCADDS scales well for large datasets. Second, non-recursive spectral-based clustering algorithms characteristically require k-means or some other iterative clustering method after the data objects have been projected into a reduced space; SCCADDS clusters the data objects directly by comparing them to candidate cluster representatives, without the need for an iterative clustering method. Third, unlike standard spectral-based algorithms, the complexity of SCCADDS is linear in the number of data objects. Results on datasets widely used to test categorical clustering algorithms show that SCCADDS produces clusters consistent with those produced by existing algorithms, while avoiding the computation of the spectra of large matrices and the problems inherent in methods that employ k-means-type algorithms.

L. A New Supervised Clustering Algorithm

A new supervised clustering algorithm was projected by Li et al. in [20]. They suggested their algorithm for data sets with mixed attributes. Because of the complexity of data sets with mixed attributes, the conventional clustering algorithms appropriate for this kind of dataset are few, and the results of clustering are not good. K-prototype clustering is one of the most commonly used methods in data mining for this kind of data. They borrowed ideas from multiple-classifier combination technology and used k-prototype as the basic clustering algorithm in order to design a multi-level clustering ensemble algorithm which adaptively selects attributes for re-clustering. Comparison experiments on the Adult data set from the UCI machine learning data repository show very competitive results, and the proposed method is suitable for data editing.

M. An Efficient Clustering Algorithm for Mixed Type Attributes in Large Dataset

Jian et al. in [21] proposed an efficient algorithm for clustering mixed-type attributes in large datasets. Clustering is an extensively used technique in data mining. At present there exist many clustering algorithms, but most existing clustering algorithms are either restricted to handling a single attribute type or can handle both data types but are not efficient when clustering large data sets; few algorithms can do both well.
In this article, they proposed a clustering algorithm that can handle large datasets with mixed types of attributes. They first used a CF*-tree (much like the CF-tree in BIRCH) to pre-cluster the datasets: the dense regions are stored in leaf nodes, each dense region is then treated as a single point, and an ameliorated k-prototype is used to cluster such dense regions. Experimental results showed that this algorithm is very efficient in clustering large datasets with mixed types of attributes.

N. A Robust and Scalable Clustering Algorithm

A robust and scalable clustering algorithm was put forth by Chiu et al. in [22]. They employed this clustering algorithm for mixed-type attributes in a large database environment. In their paper, they proposed a distance measure that enables clustering of data with both continuous and categorical attributes. This distance measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function as a result of merging. Calculation of this measure is memory efficient, as it depends only on the merging cluster pair and not on all the other clusters. The algorithm is implemented in the commercial data mining tool Clementine 6.0, which supports the PMML standard of data mining model deployment. For data with mixed types of attributes, their experimental results confirmed that the algorithm not only generates better quality clusters than the traditional k-means algorithms, but also exhibits good scalability properties and is able to identify the underlying number of clusters in the data correctly.
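Several of the surveyed methods ([20]-[22]) rest on a dissimilarity that combines numeric and categorical parts. The sketch below shows one common k-prototype-style form of such a measure; the function name, the gamma weight and the example records are our illustrative assumptions, not a specific algorithm from these papers.

import numpy as np

def mixed_distance(x_num, x_cat, y_num, y_cat, gamma=1.0):
    # Squared Euclidean distance on the numeric attributes, plus a weighted
    # count of mismatches on the categorical attributes; gamma balances the
    # influence of the two parts.
    numeric_part = float(np.sum((np.asarray(x_num) - np.asarray(y_num)) ** 2))
    categorical_part = sum(a != b for a, b in zip(x_cat, y_cat))
    return numeric_part + gamma * categorical_part

# e.g. two records with (age, income) numeric and (sex, dept) categorical:
d = mixed_distance([35, 4.2], ["M", "sales"], [29, 3.9], ["F", "sales"])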
O. Clustering Algorithms for Network Intrusion Detection Systems

Panda et al. in [23] described some clustering algorithms, such as k-means and fuzzy c-means, for network intrusion detection. The objective of intrusion detection is to construct a system which would automatically scan network activity and detect such intrusion attacks. They built a system which created clusters from its input data, then automatically labeled the clusters as containing either normal or anomalous data instances, and finally used these clusters to classify network data instances as either normal or anomalous. In their paper, they proposed a fuzzy c-means clustering technique which is capable of clustering with the most suitable number of clusters based on an objective function. Both the training and testing were done using 10% of the KDD Cup '99 data, which is a very well-liked and broadly used intrusion attack dataset.

P. Clustering Algorithm based on Quantum Games

A new clustering algorithm based on quantum games was projected by Li et al. in [24]. Mammoth successes have been made by quantum algorithms during the last decade. In their paper, they combined the quantum game with the problem of data clustering and developed a quantum-game-based clustering algorithm, in which the data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated; the player then uses a link-removing-and-rewiring (LRR) function to change his neighbors and regulate the strength of the links connecting to them in order to maximize his payoff. Further, the algorithms are discussed and analyzed for two cases of strategies, two payoff matrices and two LRR functions.
Accordingly, the simulation results have demonstrated that data points in datasets are clustered reasonably and efficiently, and that the clustering algorithms have fast rates of convergence; furthermore, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

Q. A GA-based Clustering Algorithm

Jie Li et al. in [25] proposed a GA-based clustering algorithm for large data sets with mixed numeric and categorical values. In the field of data mining, it is frequently necessary to execute cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only competent for numeric data rather than mixed data sets. For this reason, their paper presented a novel clustering algorithm for these mixed data sets by modifying the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering results. Experimental results illustrate that the GA-based clustering algorithm is reasonable for large data sets with mixed numeric and categorical values.

III. CHALLENGING PROBLEMS AND AREAS OF RESEARCH

The algorithms proposed by the researchers discussed in Section II of this paper have their own advantages and limitations. The main requirements that a clustering algorithm should satisfy are: scalability; dealing with different types of attributes; discovering clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; ability to deal with noise and outliers; insensitivity to the order of input records; high dimensionality; interpretability and usability. A number of problems are associated with conventional clustering algorithms. A few among them are: current clustering techniques do not address all the requirements adequately (and concurrently); dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity; the effectiveness of the method depends on the definition of "distance" (for distance-based clustering), and if an obvious distance measure doesn't exist one must "define" it, which is not always easy, especially in multi-dimensional spaces; and the result of the clustering algorithm (which in many cases can be arbitrary itself) can be interpreted in different ways [16]. Many algorithms for clustering data have been developed in recent decades; nonetheless, they all face a major challenge in scaling up to very large database sizes, an accelerating development brought on by advances in computer technology, the Internet, and electronic commerce.

The mainly focused research area is clustering of mixed data. A clustering Q means partitioning a data set into a set of clusters Qi, where i = 1, ..., C. In crisp clustering, each data sample belongs to exactly one cluster. Clustering algorithms may be classified as Exclusive Clustering, Overlapping Clustering, Hierarchical Clustering, and Probabilistic Clustering. Clustering objects into separated groups is an important topic in exploratory data analysis and pattern recognition. Many clustering techniques group the data objects together into "compact" clusters with the explicit or implicit assumption that all objects within one group are either mutually similar to each other or similar with respect to a common representative or centroid. Clustering can also be based on mixture models [1]: in this approach, the data are assumed to be generated by several parameterized distributions (typically Gaussians), the distribution parameters are estimated using, for example, the expectation-maximization algorithm, and data points are assigned to different clusters based on their probabilities under the distributions. The implementation of clustering algorithms for mixed data is one of the challenging issues.

IV. CONCLUSION

This paper describes various algorithms presented by researchers for data clustering. Most real-time applications need clustering of data, and this data clustering can be applied to mixed data, which is a combination of numeric and string attributes. Each clustering algorithm proposed in the

literature may have its own advantages and limitations, and developing an algorithm that meets all the requirements of the system is tangible. Different clustering algorithms, like k-means, path-based clustering and clustering of the self-organized map, are widely used for real-world applications. The future work mainly concentrates on developing a clustering algorithm that meets all the requirements; moreover, the future enhancement envisions developing a clustering algorithm that performs significantly well for mixed data sets.

V. REFERENCES

1) Juha Vesanto and Esa Alhoniemi, "Clustering of the Self-Organizing Map," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 586-600, May 2000.
2) J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
3) Y. Gdalyahu, D. Weinshall, and M. Werman, "Self-Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1053-1074, Oct. 2001.
4) J. C. Bezdek and S. K. Pal, Eds., "Fuzzy Models for Pattern Recognition: Methods that Search for Structures in Data," New York: IEEE, 1992.
5) Mark Girolami, "Mercer Kernel-based Clustering in Feature Space," IEEE Transactions on Neural Networks, vol. 13, no. 3, May 2002.
6) Bernd Fischer and J. M. Buhmann, "Path-Based Clustering for Grouping of Smooth Curves and Texture Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 4, April 2003.
7) B. Fischer, T. Zoller, and J. M. Buhmann, "Path Based Pairwise Data Clustering with Application to Texture Segmentation," Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 235-250, LNCS 2134, 2001.
8) Bernd Fischer and J. M. Buhmann, "Bagging for Path-Based Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, November 2003.
9) Leo Grady and Eric L. Schwartz, "Isoperimetric Graph Partitioning for Data Clustering and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
10) Yaxin Bi, Sally McClean, and Terry Anderson, "Improving Classification Decisions by Multiple Knowledge," Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, 2005.
11) Zhijie Xu, Laisheng Wang, Jiancheng Luo, and Jianqin Zhang, "A Modified Clustering Algorithm for Data Mining," IEEE, 2005.
12) Massimiliano Pavan and Marcello Pelillo, "Dominant Sets and Pairwise Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, January 2007.
13) M. Pavan and M. Pelillo, "Dominant Sets and Hierarchical Clustering," Proceedings of the IEEE International Conference on Computer Vision, vol. 1, pp. 362-369, 2003.
14) M. Pavan and M. Pelillo, "Efficient Out-of-Sample Extension of Dominant-Set Clusters," Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, eds., pp. 1057-1064, 2005.
15) J. M. Buhmann, "Data Clustering and Learning," Handbook of Brain Theory and Neural Networks, M. Arbib, ed., pp. 308-312, Bradford Books/MIT Press, second ed., 2002.
16) A Tutorial on Clustering Algorithms, http://home.dei.polimi.it/matteucc/Clustering/tutorial_html.
17) Gautam Biswas, Jerry B. Weinberg, and Douglas H. Fisher, "ITERATE: A Conceptual Clustering Algorithm for Data Mining," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 28, no. 2, pp. 100-111, 1998.
18) M. N. Vrahatis, B. Boutsinas, P. Alevizos, and G. Pavlides, "The New k-Windows Algorithm for Improving the k-Means Clustering Algorithm," Journal of Complexity, Elsevier, vol. 18, no. 1, pp. 375-391, 2002.
19) Eman Abdu and Douglas Salane, "A spectral-based clustering algorithm for categorical data using data summaries," International Conference on Knowledge Discovery and Data Mining, ACM, Article no. 2, 2009.
20) Shijin Li, Jing Liu, Yuelong Zhu, and Xiaohua Zhang, "A New Supervised Clustering Algorithm for Data Set with Mixed Attributes," Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, vol. 2, pp. 844-849, 2007.
21) Jian Yin, Zhi-Fang Tan, Jiang-Tao Ren, and Yi-Qun Chen, "An efficient clustering algorithm for mixed type attributes in large dataset," Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, vol. 3, pp. 1611-1614, 2005.
22) Tom Chiu, DongPing Fang, John Chen, Yao Wang, and Christopher Jeris, "A robust and scalable clustering algorithm for mixed type attributes in large database environment," International Conference on Knowledge Discovery and Data Mining, pp. 263-268, 2001.
23) Mrutyunjaya Panda and Manas Ranjan Patra, "Some Clustering Algorithms to Enhance the Performance of the Network Intrusion Detection System," Journal of Theoretical and Applied Information Technology, pp. 710-716, 2008.
24) Qiang Li, Yan He, and Jing-ping Jiang, "A novel clustering algorithm based on quantum games,"

Journal of Physics A: Mathematical and Theoretical, no. 44, 2009.
25) Jie Li, Xinbo Gao, and Li-cheng Jiao, "A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numeric and Categorical Values," Proceedings of the 5th International Conference on Computational Intelligence and Multimedia Applications, IEEE Computer Society, p. 102, 2003.



Optimization Of Shop Floor Operations: Application Of MRP And Lean Manufacturing Principles

Remy Uche1, J. A. Onuoha2

Abstract-This research work is concerned with the optimization of shop floor operations by the application of Material Requirements Planning (MRP) and lean manufacturing principles. The present research covers the involvement of MRP and lean manufacturing techniques in the manufacturing environment. The work is intended to decrease cycle time, reduce waste in material movement and inventory, improve the flow of material through improved system layouts and subsequently increase productivity in the shop floor environment.

Keywords-Material Planning, Lean Manufacturing, Scheduling, Production, Cycle Time.

About-1: Department of Mechanical Engineering, Federal University of Technology, Owerri, NIGERIA (Tel: +234 803 668 3339, E-mail: [email protected])
About-2: Department of Mechanical Engineering, Faculty of Engineering, University of Port Harcourt, Choba, Rivers State, NIGERIA (Tel: +234 806 234 0271, E-mail: [email protected])

I. INTRODUCTION

Increasing shop floor efficiency through the integration of Material Requirements Planning (MRP) and lean manufacturing principles has become one of the major concerns of manufacturing companies. In today's complex manufacturing sector, we are confronted with doing more with less, and also challenged with new philosophies and concepts that often push or pull us in different directions. A case in point is the ongoing integration of MRP and lean manufacturing principles. MRP systems are frequently condemned as one of the main reasons so many manufacturing companies are locked into push systems, while lean concepts imply that pull systems are the ideal. Nevertheless, one shouldn't throw one out for the other, as the two can coexist harmoniously and beneficially with a better definition of roles (Steinbrunner, 2004).

According to the American Production and Inventory Control Society, MRP consists of a set of techniques that use the master production schedule, the bill of materials and inventory data to calculate material requirements. In simple words, MRP is a technique used in determining when to order dependent-demand items and how to reschedule orders to adjust to changing needs.
A key question in the MRP process is the number of times a company procures inventory within a year; one can readily realize that a high inventory ratio is likely to be conducive to lowering production cost, since less capital is tied up in unused inventory.

MRP systems rely on four pieces of information in determining what material should be ordered and when, namely:
- The master production schedule, which describes when each product is scheduled to be manufactured;
- The bill of materials, which gives information about the product structure, i.e., the parts and raw material units necessary to manufacture one unit of the product of interest;
- Production cycle times and material needs at each stage of the production cycle; and
- Supplier lead times.

The master production schedule and bill of materials indicate what materials should be ordered; the master schedule, production cycle times and supplier lead times jointly determine when orders should be placed.

Lean manufacturing is a production method that calls for building products with as few steps and as little work-in-process inventory as possible. It relies on work centres or manufacturing cells that are capable of building multiple products, giving the company the flexibility to produce the exact mix and quantity of products required. Its fundamental objective is to provide perfect value to the customer through a perfect value creation process that has eliminated all unnecessary waste. To accomplish this, lean thinking changes the focus of management from optimizing separate technologies and assets to optimizing the flow of the product or family of products through the entire value stream. Eliminating waste along the entire value stream, instead of at isolated points, creates processes that need less human effort, space, capital and time. This allows companies to make products and services at far lower costs and with fewer defects, compared with traditional business systems, and to respond to changing customer desires with great variety, high quality, low cost and very fast throughput times. Also, with the application of visual methods to control material flow and work-in-process, information management on the shop floor becomes much simpler and more accurate.

II. PROCEDURE FOR THE IMPLEMENTATION OF MRP

The following procedures are followed while implementing Material Requirements Planning.

Demand for Products: The demand for end products stems from two main sources. The first is known customers who have placed specific orders, such as those generated by sales personnel or by interdepartmental transactions. The second source is forecast demand.

Bill of Materials File: This is simply known as the BOM file. It contains the complete product description, listing not only materials, parts, and components but also the sequence in which the product is created. The BOM file is often called the product structure file or product tree because it shows how a product is put together: it contains the information to identify each item and the quantity used per unit of the item of which it is a part.

Inventory Records File: The inventory records file under a computerized system can be quite lengthy. Each item in inventory is carried as a separate file, and the range of details carried about an item is almost limitless. The MRP program accesses the status segment of the file according to specific time periods; these files are accessed as needed while running the program.

A. Conditions for Implementation

Several requirements have to be met in order to give an MRP implementation project a chance of success, among the conditions:

A. Availability of a computer-based manufacturing system is a must. Although it is possible to obtain a material requirements plan manually, it would be impossible to keep it up to date because of the highly dynamic nature of manufacturing environments.
B. A feasible master production schedule must be drawn up, or else the accumulated planned orders of components might conflict with the resource restrictions and become infeasible.
C. The bills of material should be accurate. It is essential to update them promptly to reflect any engineering changes brought to the product; if a component part is omitted from the bill of material, it will never be ordered by the system.
D. Inventory records should be a precise representation of reality, or else the netting process and the generation of planned orders become meaningless.
E. Lead times for all inventory items should be known and given to the MRP system.
F. Shop floor discipline is necessary to ensure that orders are processed in conformity with the established priorities. Otherwise, the lead times passed to MRP will not materialize.
The number of units of each item required is then engineering changes brought to the product.l If a corrected for on hand amounts, and the net component part is omitted from the bill of material requirement is ―offset‖ to allow for the lead time it will never be ordered by the system. needed to obtain the material. D. Inventory records should be a precise r D. Output Reports representation of reality, or else the netting process and the generation of planned orders become Primary Reports: Primary reports are the main or normal meaningless. reports used for the inventory and production control. These E. Lead times for all inventorya items should be known report consist of and given to the MRP system. 1. Planned orders to be released at a future time. F. Shop floor discipline is necessary to ensure that 2. Order release notices to execute the planned orders. orders are E processed in conformity with the 3. Changes in due dates of open orders due to rescheduling. established priorities. Otherwise, the lead times 4. Cancellations or suspensions of open orders due to passed to MRP will not materialize. cancellation or suspension of orders on the master production schedule. B. Techniques for the implementation of 5. Inventory status data. MRP Secondary Reports: Additional reports, which are optional MRP represents an innovation in the manufacturing under the MRP system, fall into three main categories: environment. Thus, its effective implementation requires 1. Planning reports to be used, for example, in forecasting explicit management action. Steps need to be clearly inventory and specifying requirements over some future identified and necessary measures be taken to ensure time horizon. organizational responsiveness to the technique being implemented.
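Steps D and E above (netting gross requirements against on-hand stock, then offsetting planned orders by the lead time) can be illustrated with a short Python sketch. This is a minimal, single-item illustration under our own assumptions about the data layout; it is not the interface of any particular MRP package:

```python
# Minimal single-item MRP netting and lead-time offsetting (illustrative).
# gross[t] is the gross requirement in period t (from the master schedule
# and BOM explosion); on_hand is opening stock; lead_time is in periods.
def net_and_offset(gross, on_hand, lead_time):
    planned_orders = [0] * len(gross)
    available = on_hand
    for period, requirement in enumerate(gross):
        net = requirement - available              # net requirement after stock
        available = max(available - requirement, 0)
        if net > 0:                                # order must be released earlier
            release = max(period - lead_time, 0)   # offset by the lead time
            planned_orders[release] += net
    return planned_orders

# Example: 40 units on hand and a 2-period lead time.
print(net_and_offset([0, 30, 20, 50], on_hand=40, lead_time=2))  # [10, 50, 0, 0]
```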

D. Output Reports

Primary Reports: Primary reports are the main or normal reports used for inventory and production control. These reports consist of:
1. Planned orders to be released at a future time.
2. Order release notices to execute the planned orders.
3. Changes in due dates of open orders due to rescheduling.
4. Cancellations or suspensions of open orders due to cancellation or suspension of orders on the master production schedule.
5. Inventory status data.
Secondary Reports: Additional reports, which are optional under the MRP system, fall into three main categories:
1. Planning reports to be used, for example, in forecasting inventory and specifying requirements over some future time horizon.
2. Performance reports for purposes of pointing out inactive items and determining the agreement between actual and programmed item lead times, and between actual and programmed quantity usage and costs.
3. Exceptions reports that point out serious discrepancies, such as errors, out-of-range situations, late or overdue orders, excessive scrap, or nonexistent parts.
The figure below gives an overall view of a material requirements program and the reports generated by the program.

[Figure: firm orders from known customers, forecasts of demand from random customers, and inventory transactions feed the master production schedule, the bill of materials file and the inventory records file; these are the inputs to the materials planning (MRP computer) program, which generates primary reports (planned-order schedules for inventory and production control) and secondary reports (exceptions reports, planning reports, and reports of performance control).]

Figure 1. Overall View of the Inputs to a Standard Material Requirements Program and the Reports Generated by the Program

E. MRP objectives

The main theme of MRP is "getting the right materials to the right place at the right time". Specific organizational objectives often associated with MRP design and implementation may be identified along three main dimensions, namely inventory, priorities and capacity. The objective specifics of each dimension are listed below.

Inventory:
• Order the right part
• Order the right quantity
• Order at the right time

Priorities:
• Order with the right due date
• Keep the due date valid

Capacity:
• Plan for a complete load
• Plan for an accurate load
• Plan for an adequate time to view future load

III. LEAN MANUFACTURING

Lean manufacturing is a western adaptation of the Toyota Production System, developed by the Japanese carmaker and most famously studied (and the term "Lean" coined) in The Machine That Changed the World (Womack, 1996). The Internet offers some useful resources on this topic, including BCG Systems Inc. (http://www.mmsonline.com), which states that lean manufacturing is a production method that calls for building products with as few steps and as little work-in-process inventory as possible. It relies on work centres or manufacturing cells that are capable of building multiple products, giving the company the flexibility to produce the exact mix and quantity of products required.

According to Steinbrunner (2004), lean is centred on creating more value with less work. Lean manufacture is a generic process management philosophy derived mostly from the Toyota Production System (TPS) and identified as "Lean" only in the 1990s. It is renowned for its focus on reduction of the original Toyota seven wastes to improve overall customer value, but there are varying perspectives on how this is best achieved. The steady growth of Toyota, from a small company to the world's largest automaker, has focused attention on how it has achieved this.

Lean Manufacturing, or Lean production, often known simply as "Lean", is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful, and thus a target for elimination. Lean manufacturing is a variation on the theme of efficiency based on optimizing flow; it is a present-day instance of the recurring theme in human history toward increasing efficiency, decreasing waste, and using empirical methods to decide what matters, rather than uncritically accepting pre-existing ideas. Lean manufacturing is often seen as a more refined version of earlier efficiency efforts, building upon the work of earlier leaders.

Taiichi Ohno, the engineer commonly credited with the development of the Toyota Production System, and therefore of Lean, identified seven types of waste: defective products, unnecessary finished products, unnecessary work in process, unnecessary processing, unnecessary movement (of people), unnecessary transportation (of products) and unnecessary delays. Lean focuses on eliminating these wastes from a manufacturing system. In particular, this work is interested in the second and third types: unnecessary finished goods and work in process. The Lean answer to these wastes is to link production at each step in the process with the subsequent process (or with the consumer, for finished goods). At Toyota, kanban (a Japanese word for "shop sign") cards are attached to each sub-assembly and sent back to the producer each time one is used. The cards then become a signal to produce one more. As a result, the number of cards in the system controls the amount of work in process (a small sketch of this mechanism is given at the end of this section).

A. Steps to achieve lean systems

The following steps should be implemented to create the ideal lean manufacturing system:
1. Design a simple manufacturing system
2. Recognize that there is always room for improvement
3. Continuously improve the lean manufacturing system design

B. Basics for the design of a simple lean manufacturing system

A fundamental principle of lean manufacturing is demand-based flow manufacturing. In this type of production setting, inventory is only pulled through each production center when it is needed to meet a customer's order. The benefits of this goal include:

• decreased cycle time
• less inventory
• increased productivity
• increased capital equipment utilization

Liker (1997) describes a sequence of phases that a manufacturing facility must pass through to become Lean: process stabilization, continuous flow, synchronous production, pull authorization, and level production. Such accounts are useful advice for managers and provide a general framework for becoming Lean, although they do not provide specific strategies for changing production control schemes.
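The kanban mechanism described above, where a fixed number of cards caps the work-in-process, can be made concrete with a small Python sketch. The class and method names are invented for illustration and are not Toyota terminology:

```python
from collections import deque

class KanbanLoop:
    """Sketch of a kanban loop: cards are the only production authorization."""
    def __init__(self, num_cards):
        self.free_cards = deque(range(num_cards))  # cards not attached to work
        self.wip = {}                              # card -> sub-assembly in process

    def authorize_production(self, item):
        if not self.free_cards:
            return False            # no free card: upstream waits (pull, not push)
        card = self.free_cards.popleft()
        self.wip[card] = item
        return True

    def consume(self, card):
        # Downstream uses one sub-assembly; the returned card is the
        # signal to produce exactly one more.
        del self.wip[card]
        self.free_cards.append(card)

loop = KanbanLoop(num_cards=3)
print([loop.authorize_production(f"part-{i}") for i in range(4)])
# [True, True, True, False]: the card count caps the work in process.
```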


(a) There is always room for improvement

The core of lean is founded on the concept of continuous product and process improvement and the elimination of non-value-added activities. "The value adding activities are simply only those things the customer is willing to pay for; everything else is waste, and should be eliminated, simplified, reduced, or integrated" (Rizzardo, 2003). Improving the flow of material through new ideal system layouts at the customer's required rate would reduce waste in material movement and inventory.

(b) Continuously improve

A continuous improvement mindset is essential to reach a company's goals. The term "continuous improvement" means incremental improvement of products, processes, or services over time, with the goal of reducing waste to improve workplace functionality, customer service, or product performance (Suzaki, 1987).

C. Lean Goals

The four goals of Lean manufacturing systems are to:
• Improve quality: To stay competitive in today's marketplace, a company must understand its customers' wants and needs and design processes to meet their expectations and requirements.
• Eliminate waste: Waste is any activity that consumes time, resources, or space but does not add any value to the product or service. There are seven types of waste:
1. Overproduction (occurs when production should have stopped)
2. Waiting (periods of inactivity)
3. Transport (unnecessary movement of materials)
4. Extra Processing (rework and reprocessing)
5. Inventory (excess inventory not directly required for current orders)
6. Motion (extra steps taken by employees because of inefficient layout)
7. Defects (do not conform to specifications or expectations)
• Reduce time: Reducing the time it takes to finish an activity from start to finish is one of the most effective ways to eliminate waste and lower costs.
• Reduce total costs: To minimize cost, a company must produce only to customer demand. Overproduction increases a company's inventory costs because of storage needs.

IV. DISCUSSION

MRP can be used to set priorities for the production of finished goods in an environment where mixed mode is practised, and in the job shop environment in order to develop a plan for the common raw materials consumed. Uniform containers can be used to standardize lot sizes in production lines, to signal the need to replenish materials for unique items consumed, and to simplify transport between the vendor and customer. Materials can then be pulled into the production lines as needed to support the required production rate of finished goods. Sharing material plans can lead to partnerships with vendors that not only reduce lot sizes and lead times, but also result in reduced costs and less work-in-process at both vendor and customer locations. For the job shop environment, the planning and inventory tools of MRP can also be applied to set priorities for raw materials and manufactured products, in addition to developing plans for when and how much will be required.

Companies will continue to find ways to apply lean manufacturing concepts, if they are to remain competitive, to simplify material planning, reduce waste and improve their operations. But it may not be feasible to apply pull methods to all of a company's product lines. When MRP planning and inventory tools are needed to support the job shop environment, and pull methods make sense to support the repetitive production lines, manufacturers will find that a blend of MRP push methods and lean manufacturing pull methods can provide the right material planning mix for their mixed-mode environment. In order to have a successful implementation of MRP, the following recommended steps are to be followed:

A computer-based manufacturing system should be made available. Although it is possible to obtain a material requirements plan manually, it is time consuming and a daunting task, and it would be impossible to keep the plan up to date because of the highly dynamic nature of manufacturing environments.

A feasible master production schedule must be drawn up, or else the accumulated planned orders of components might conflict with the resource restrictions and become infeasible.

The bills of material should be updated and accurate. It is essential to update the BOM promptly to reflect any engineering changes brought to the product. If a component part is omitted from the bill of material it will never be ordered by the system.

Inventory records should be a precise representation of reality, or else the netting process and the generation of planned orders become meaningless.

Lead times for all inventory items should be known and given to the MRP system.

Last but not least is maintaining shop floor discipline. It is necessary to ensure that orders are processed in conformity with the established priorities. Otherwise, the lead times passed to MRP will not materialize.

V. CONCLUSION

MRP and lean are not only capable of co-existing, but they can also support one another, provided that the following concepts are understood and conditions exist:

Commitment to planning: First and foremost, there must be a commitment to planning. The "P" in MRP is for planning, yet its role is often overshadowed by the zeal to reduce waste. The importance of planning simply cannot be overlooked. Beyond better inventory control, planning enables you to have the right quality and quantity at the right location and time. Good material planning can help reduce the waste of downtime and reduce overtime. It also helps with overall product quality.

Communication with suppliers: While lean concepts reduce waste throughout every cycle of production, MRP can reduce waste in the supply chain through better relationships with suppliers. Planning enables better data and information that can be shared with vendors.

Dedication to data: While MRP systems can play an important role in synchronizing products, if changes occur, MRP can be slow to respond. This is usually a result of transactions not being entered in a timely manner. Effective product data management is critical to adapting traditional manufacturing systems to agile and lean manufacturing methods. However, it all begins with the data. By gaining an understanding about which bills of material and routing schemes are appropriate for given situations, you learn how they can be used to streamline operations, improve quality, reduce waste, minimize inventory and increase the use of manufacturing assets.

MRP is effective when people understand that the system cannot think for them. Too often, team members know that the information loaded into the system is useless, and they therefore have no faith in the resulting data that is intended to guide their ordering, systems, processes and operations, a classic case of garbage in, garbage out. However, if team members have confidence in the data, they will have confidence in the system.

Finally, when the principles are well integrated, the benefits described under the lean goals in Section III will be obtained: improved quality, elimination of waste, reduced time, and reduced total costs (overproduction increases a company's inventory costs because of storage needs and inventory carrying cost).

VI. REFERENCES

1) Agbu, O. (2007) The Iron and Steel Industry and Nigeria's Industrialization: Exploring Cooperation with Japan, Institute of Developing Economies, Chiba, Japan.
2) Auston, M.K. (1997) Lean Manufacturing Principles: A Comprehensive Framework for Improving Production Efficiency, University of California, Los Angeles.
3) Black, J.T. and Chen, J.C. (1994) Decoupler-improved output of an apparel assembly cell, The Journal of Applied Manufacturing Systems, Winter, pp. 47-58.
4) Edward, A.S. (1995) Inventory Management and Production Planning and Scheduling, John Wiley and Sons.
5) Gahagan, S.M. (2008) Simulation and Optimization of Production Control for Lean Manufacturing Transition, unpublished dissertation submitted to the faculty of the Graduate School, University of Maryland.
6) Jain, R.K. (2008) Production Technology, Sixteenth Edition, Khanna Publishers, 2-B, Nath Market, Nai Sarak, New Delhi.
7) James, H.G. (1997) American Production and Inventory Control Society Production and Inventory Control Handbook, McGraw-Hill.
8) John, F.P. (1998) Master Scheduling: A Practical Guide to Competitive Management, John Wiley and Sons.
9) Liker, J.K. (1997) Becoming Lean: Inside Stories of U.S. Manufacturers, Productivity Press, Portland, Oregon.
10) Mohommed, S.A. (2002) African Iron and Steel Industry [online]. 10(8). Available from: http://globle.steel.com/ [Accessed 22 February 2010].
11) Moustakis, V. (2000) Material Requirements Planning (MRP), Technical University of Crete.
12) Orlicky, J. (1976) Materials Requirements Planning, McGraw-Hill.
13) Salem, O. and Zimmer, E. (2006) Application of Lean Manufacturing Principles to Construction [online]. Available from: http://www.leanconstructionjournal.org/ [Accessed 15 February 2010].
14) Steinbrunner, D. (2004) Modern Machine Shop [online]. 6(4). Available from: http://mmsonline.com/ [Accessed 2 February 2010].
15) Waddell, B. (1984) International Journal of Production Research, Vol. 22, No. 2, pp. 193-233.
16) Womack, J.P. and Jones, D.T. (1996) Lean Thinking: Banish Waste and Create Wealth in Your Company. New York.


A Study On Rough Clustering

Dr. K. Thangadurai1, M. Uma2, Dr. M. Punithavalli3

Abstract- Clustering of data is an important data mining application. However, the data contained in today's databases is uncertain in nature. One of the problems with traditional partitioning clustering methods is that they partition the data into a hard-bound number of clusters. There have been recent advances in algorithms for clustering uncertain data; a rough set based indiscernibility relation, combined with an indiscernibility graph, leads to knowledge discovery in an elegant way as it creates natural clusters in the data. In this thesis, rough K-means clustering is studied and compared with the traditional K-means and weighted K-means clustering methods for different data sets available in the UCI data repository.
Keywords- Clusters, Boundary, Iteration, Attributes, Centroid

I. INTRODUCTION

Clustering is a technique to group together a set of items having similar characteristics. There are two kinds of clusters to be discovered in the web usage domain: usage clusters and page clusters. Clustering of users tends to establish groups of users exhibiting similar browsing patterns. Clustering of pages will discover groups of pages having related content. This information is useful for internet search engines and web assistance providers.
Clustering can be considered the most important unsupervised learning problem; as with every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A cluster is therefore a collection of objects which are similar to each other and dissimilar to the objects belonging to other clusters. A simple graphical example follows.

[Figure 1: Cluster Analysis, a scatter of points in which four clusters are easily identified.]

In this case we easily identify the four clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering. Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if it defines a concept common to all those objects. In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures [1].

About-1: Head in Computer Science, Govt. Arts College (Men), Krishnagiri, TN, India (e-mail: [email protected])
About-2: Research Scholar, Dravidian University, Kuppam, A.P., India
About-3: Director, Department of Computer Science, SRCW, Coimbatore, TN, India

II. GOALS OF CLUSTERING

The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. There is no absolute best criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs. For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding natural clusters and describing their unknown properties (natural data types), in finding useful and suitable groupings (useful data classes) or in finding unusual data objects (outlier detection).

A. The main requirements that a clustering algorithm should satisfy are

Scalability, dealing with different types of attributes, discovering clusters with arbitrary shape, minimal requirements for domain knowledge to determine input parameters, ability to deal with noise and outliers, insensitivity to the order of input records, high dimensionality, and interpretability and usability [2].

B. Numbers of problems with clustering are

Current clustering techniques do not address all the requirements adequately. Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity. The effectiveness of the method depends on the definition of distance; if an obvious distance measure doesn't exist we must define it, which is not always easy, especially in multi-dimensional spaces. The result of the clustering algorithm can be interpreted in different ways.

III. CLUSTERING ALGORITHMS

A large number of techniques have been proposed for forming clusters from distance matrices. The most important types are hierarchical techniques, optimization techniques and mixture models. We are going to discuss the first two types here.

C. Approaches to clustering

1. Centroid approaches, 2. hierarchical approaches.


Centroid approaches: We guess the centroids or central point in each cluster, and assign points to the cluster of their nearest centroid.
Hierarchical approaches: We begin assuming that each point is a cluster by itself. We repeatedly merge nearby clusters, using some measure of how close two clusters are, or how good a cluster the resulting group would be.

D. Hierarchical Clustering Algorithms

A hierarchical algorithm yields a dendrogram, representing the nested grouping of patterns and the similarity levels at which groupings change. The dendrogram can be broken at different levels to yield different clusterings of the data. Most hierarchical clustering algorithms are variants of the single-link, complete-link, and minimum-variance algorithms [3]. The single-link and complete-link algorithms are the most popular. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters. In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between patterns in the two clusters. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm.

IV. PARTITIONAL ALGORITHMS

A partitional clustering algorithm obtains a single partition of the data instead of a clustering structure such as the dendrogram produced by a hierarchical technique. Partitional methods have advantages in applications involving large data sets for which the construction of a dendrogram is computationally prohibitive. A problem accompanying the use of a partitional algorithm is the choice of the number of desired output clusters. The partitional techniques usually produce clusters by optimizing a criterion function defined either locally or globally.

A. Clustering Techniques

Let X be a data set, that is, X = {x_i, i = 1…N}. Now let ℜ be the partition of X into m sets C_j, j = 1…m. These sets are called clusters and need to satisfy the following conditions:
• C_i ≠ ∅, i = 1…m
• ∪_{i=1}^{m} C_i = X
• C_i ∩ C_j = ∅, i ≠ j, i, j = 1,…,m
It is important to say that the objects (vectors) contained in a cluster C_i are more similar to each other and less similar to the objects (vectors) contained in the other clusters. The intention in the clustering algorithms is to join (or separate) the most similar (or dissimilar) objects of a data set X; it is necessary to apply a function that can make a quantitative measure among vectors [8].

B. Types of partitional Algorithms

• Squared Error Algorithms
• Graph-Theoretic Clustering
• Mixture-Resolving
• Mode-Seeking Algorithms

K-Means Algorithm: The K-means method aims to minimize the sum of squared distances between all points and the cluster centre. This procedure consists of the following steps, as described by Tou and Gonzalez:
1. Choose K initial cluster centres z_1(1), z_2(1), …, z_K(1).
2. At the k-th iterative step, distribute the samples {x} among the K clusters using the relation
x ∈ C_j(k) if ||x − z_j(k)|| < ||x − z_i(k)|| for all i = 1, 2, …, K; i ≠ j,
where C_j(k) denotes the set of samples whose cluster centre is z_j(k).
3. Compute the new cluster centres z_j(k+1), j = 1, 2, …, K, such that the sum of the squared distances from all points in C_j(k) to the new cluster centre is minimized. The measure which minimizes this is simply the sample mean of C_j(k); therefore the new cluster centre is given by
z_j(k+1) = (1/N_j) Σ_{x ∈ C_j(k)} x,  j = 1, 2, …, K,
where N_j is the number of samples in C_j(k).
4. If z_j(k+1) = z_j(k) for j = 1, 2, …, K, then the algorithm has converged and the procedure is terminated.
5. Otherwise go to step 2.

C. Drawbacks of K-Means algorithm

The final clusters do not represent a global optimization result but only a local one, and completely different final clusters can arise from differences in the initially randomly chosen cluster centers. We also have to know in advance how many clusters we will have.

D. Working Principle

The K-Means algorithm's working principles are explained in the following algorithm steps.
Algorithm:
1) Initialize the number of clusters k.
2) Randomly select the centroids (c_1, c_2, …, c_k) in the given data set.
3) Compute the distance between the centroids and the objects using the Euclidean distance equation, d_ij = ||x_i − c_j||.
4) Update the centroids.
5) Stop the process when the new centroids are close to the old ones; otherwise, go to step 3.
A partitional algorithm is typically run multiple times with different starting states, and the best configuration obtained from all of the runs is used as the output clustering.
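A compact Python sketch of the Tou and Gonzalez procedure above may make the iteration concrete. The random initialization and the exact-equality convergence test are the simplest possible choices, not the only ones:

```python
import random

def euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def k_means(points, k, max_iter=100):
    centroids = random.sample(points, k)              # step 1: initial centres
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in points:                              # step 2: nearest centre
            j = min(range(k), key=lambda i: euclidean(x, centroids[i]))
            clusters[j].append(x)
        new_centroids = [                             # step 3: sample means
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:                # step 4: converged
            break
        centroids = new_centroids                     # step 5: iterate again
    return centroids, clusters

pts = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.8, 8.2)]
print(k_means(pts, 2)[0])   # two centres near (1.1, 0.9) and (7.9, 8.1)
```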


E. Weighted K-Means Algorithm

Weighted K-Means is one of the clustering algorithms based on the K-Means algorithm, calculating with weights. A natural extension of the K-Means problem allows us to include some more information, namely a set of weights associated with the data points. These might represent a measure of importance, a frequency count, or some other information. The algorithm is the same as the normal K-Means algorithm, just adding the weights. Weighted K-Means attempts to decompose a set of objects into a set of disjoint clusters, taking into consideration the fact that the numerical attributes of objects in the set often do not come from independent identical normal distributions. The weighted K-Means algorithm uses a weight vector to decrease the effect of irrelevant attributes and to reflect the semantic information of the objects. Weighted K-Means algorithms are iterative and use hill-climbing to find an optimal solution (clustering), and thus usually converge to a local minimum.
In the Weighted K-Means algorithm, the weights can be classified into two types.
Dynamic Weights: the weights are changed during the program.
Static Weights: the weights are not changed during the program.
The Weighted K-Means algorithm is used to cluster the objects; using this algorithm we can also calculate the weights dynamically while clustering the data in the dataset.
Working Principle
The working procedure is the same as that of the K-Means algorithm; only the weight is added.
Input: a set of n data points and the number of clusters (K)
Output: centroids of the K clusters
1. Initialize the number of clusters k.
2. Randomly select the centroids (c_1, c_2, …, c_k) in the data set.
3. Choose the static weight w, which ranges from 0 to 2.5 (or 5.0).
4. Find the distance between the centroids and the objects using the weighted Euclidean distance equation, d_ij = w · (x_i − c_k)².
5. Update the centroids.
6. Stop the process when the new centroids are close to the old ones; otherwise, go to step 4.

F. Rough Set Clustering Algorithm

Rough sets were introduced by Zdzislaw Pawlak [6][7] to provide a systematic framework for studying imprecise and insufficient knowledge. Rough sets are used to develop efficient heuristics searching for relevant tolerance relations that allow extracting objects in data. An attribute-oriented rough sets technique reduces the computational complexity of learning processes and eliminates unimportant or irrelevant attributes, so that the knowledge discovery in databases or in experimental data sets can be efficiently learned. Using rough sets has been shown to be effective for revealing relationships within imprecise data, discovering dependencies among objects and attributes, evaluating the classificatory importance of attributes, removing data redundancies, and generating decision rules [5]. Some classes, or categories, of objects in an information system cannot be distinguished in terms of the available attributes; they can only be roughly, or approximately, defined. The idea of rough sets is based on equivalence relations which partition a data set into equivalence classes, and consists of the approximation of a set by a pair of sets, called the lower and upper approximations. The lower approximation of a given set contains all objects which, based on the available attributes, can be classified as certainly belonging to the concept. The upper approximation of a set contains all objects that cannot be classified categorically as not belonging to the concept. A rough set is thus defined as an approximation of a set, given as a pair of sets: the upper and lower approximations of the set [7].

G. Rough K-Means Algorithm

Step 0: Initialization. Randomly assign each data object to exactly one lower approximation. By definition (Property 2) the data object also belongs to the upper approximation of the same cluster.
Step 1: Calculation of the new means. The means are calculated as follows:

m_k = w_l · Σ_{X_n ∈ C_k} X_n / |C_k| + w_B · Σ_{X_n ∈ C_k^B} X_n / |C_k^B|   for C_k^B ≠ ∅,
m_k = Σ_{X_n ∈ C_k} X_n / |C_k|   otherwise,

where the parameters w_l and w_B define the importance of the lower approximation and the boundary area of the cluster, |C_k| indicates the number of data objects in the lower approximation of the cluster, and |C_k^B| = |upper(C_k) − C_k| is the number of data objects in the boundary area.
Step 2: Assign the data objects to the approximations.
(i) For a given data object X_n, determine its closest mean m_h:
d(X_n, m_h) = min_{k=1…K} d(X_n, m_k).
Assign X_n to the upper approximation of the cluster h: X_n ∈ upper(C_h).
(ii) Determine the means m_t that are also close to X_n, i.e. those that are not farther away from X_n than d(X_n, m_h) plus a given threshold ε:
T = {t : d(X_n, m_t) − d(X_n, m_h) ≤ ε and t ≠ h}.
If T ≠ ∅ (X_n is also close to at least one other mean m_t besides m_h), then X_n ∈ upper(C_t), ∀t ∈ T.
Else X_n ∈ C_h, the lower approximation of cluster h.
Step 3: If the clusters have changed, the algorithm continues with Step 1. Else STOP.
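The Step 1 mean-update rule above can be sketched directly in Python. The argument names (`w_lower` and `w_boundary` for w_l and w_B) and the list-of-lists data layout are our own illustrative assumptions, and the lower approximations are assumed non-empty:

```python
def rough_means(lower, boundary, w_lower, w_boundary):
    # lower[k] and boundary[k] hold the data vectors in the lower
    # approximation and boundary area of cluster k (Step 1 above).
    means = []
    for lo, bo in zip(lower, boundary):
        dim = len(lo[0])
        mean = []
        for d in range(dim):
            lo_part = sum(x[d] for x in lo) / len(lo)
            if bo:   # non-empty boundary: weighted combination
                bo_part = sum(x[d] for x in bo) / len(bo)
                mean.append(w_lower * lo_part + w_boundary * bo_part)
            else:    # empty boundary: the lower approximation alone
                mean.append(lo_part)
        means.append(mean)
    return means

# Two clusters; one object sits in the boundary area of both of them.
lower = [[(1.0, 1.0), (1.2, 0.8)], [(8.0, 8.0)]]
boundary = [[(4.0, 4.0)], [(4.0, 4.0)]]
print(rough_means(lower, boundary, w_lower=0.7, w_boundary=0.3))
```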

H. Experimental Results And Discussion

The experimental analysis is carried out by considering three different data sets from the UCI data depository, and the algorithms are validated through the Xie-Beni index.

I. Xie-Beni Validity Index

In this thesis, the Xie-Beni index has been chosen as the cluster validity measure because it has been shown to be able to detect the correct number of clusters in several experiments. Xie-Beni validity is the combination of two functions: the first calculates the compactness of data in the same cluster, and the second computes the separateness of data in different clusters. Let S represent the overall validity index, π the compactness, and s the separation of the rough k-partition of the data set. The Xie-Beni validity can now be expressed as S = π / s, with

π = (1/n) Σ_{i=1}^{K} Σ_{j=1}^{n} μ_ij² ||x_j − z_i||²  and  s = (d_min)²,

where d_min = min_{i,j} ||z_i − z_j|| is the minimum distance between cluster centres, n is the number of users, K is the number of clusters, and z_i is the cluster centre of cluster C_i. Here w_l is taken as 0.7 for the elements that are placed in the lower approximation, w_u is taken as 0.3 for the elements that are placed in the upper approximation, and μ_ij is taken as 0.3 for the elements that are placed in the boundary region; μ_ij is the membership value of the user in the boundary region. Smaller values of π indicate that the clusters are more compact, and larger values of s indicate that the clusters are well separated; thus a smaller S reflects clusters that have greater separation from each other and are more compact. In this thesis, the Xie-Beni validity index is used to validate the clusters obtained after applying the clustering algorithms.
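With π and s reconstructed as above, the index is straightforward to compute. The sketch below is illustrative; `mu[i][j]` holds the membership value μ_ij of point j in cluster i:

```python
def xie_beni(points, centres, mu):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    n, k = len(points), len(centres)
    compactness = sum(                      # pi: weighted within-cluster scatter
        mu[i][j] ** 2 * sq_dist(points[j], centres[i])
        for i in range(k) for j in range(n)
    ) / n
    separation = min(                       # s: squared minimum centre distance
        sq_dist(centres[i], centres[j])
        for i in range(k) for j in range(k) if i != j
    )
    return compactness / separation         # S = pi / s; smaller is better
```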

  6) Lingras P and West C, ―Interval set clustering of n web users with rough K-means‖, Journal of Where Intelligent Information Systems 23 (1) (2004) 5-16. And s= (d ) 2 min 7) Lingras P, Yan R and M. Hogo, ―Rough set based dmin is the minimum distance between cluster centres, clustering: evolutionary, neural, and statistical given by y approaches‖, Proceedings of the First Indian dmin= minij ||zi-zj|| International Conference on Artificial Intelligence Where n is the number of users, k is the number of clusters, (2003) 1074-1087. and Zi is the cluster centre of cluster Ci, wl is takenl as 0.7 8) Lingras, P. ―Rough Set Clustering for Web for the elements that are placed in lower approximation, wu Mining‖, Proceedings of 2002 IEEE International is taken 0.3 for the elements that are placed in Upper Conference on Fuzzy Systems. 2002. approximation, µij is taken as 0.3 for ther elements that are 9) .Milligan G.W and Cooper M.C., ―An examination placed in boundary region. µij be the membership value of of procedures for determining the number of the user in boundary region. Smaller values of π indicate clusters in a data set‖, Psychometrika, vol. 50, pp. that the clusters are more compact and larger values of s a 159-179, 1985. indicate the clusters are well separated. Thus a smaller S 10) Monmarche N. Slimane M, and Venturini G. reflects that the clusters have greater separation from each Antclass, ―Discovery of cluster in numeric data by other and are more compact. In this thesis, Xie-Beni validity an hybridization of an ant colony with the k-means index is used to validateE the clusters obtained after applying algorithm‖, Technical Report 213, Ecole d‘ the clustering algorithms Ingenieurs en Informatique pour l‘Industrie (E3i), V. CONCLUSION Universite de Tours, Jan. 1999.

The K-Means, Weighted K-Means and Rough K-Means clustering algorithms have been studied and implemented. All the three algorithms are analyzed using the validity


Applying Software Metrics on Web Applications

Vikas Raheja1 Rajan Saluja2

Abstract- Web applications automate many daily business activities. Users interact with these web applications through the interfaces which these applications provide. Web applications are different from normal applications. The traditional software metrics can be applied to web applications, but some new metrics designed specifically for web applications are important and increase the performance of web applications. In this paper traditional software metrics as well as some new web metrics are described. In the new approach I have described the performance metric for web applications, security measures, and the navigability metric, which are useful for improving web applications. In the beginning I have given the basics of measurement, which are required for a better understanding of this paper.
Keywords- Web Metric, Navigability Metric, Performance Metric, Security Metric

I. INTRODUCTION

Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules [2]. Software metrics is a term that embraces many activities, all of which involve some degree of measurement. Software metrics provide a basis for improving the software process, increasing the accuracy of project estimates, enhancing project tracking, and improving software quality. There are many types of software metrics, of which some in the area of software engineering are:
1. Cost and Effort Estimation
2. Productivity Measures and Models
3. Data Collection
4. Quality Models and Measures
5. Reliability Models
6. Performance Evaluation and Models
7. Structural and Complexity Metrics
8. Capability-Maturity Assessment

About-1: Assistant Professor, N.C. Institute of Computer Sciences, Israna, Panipat (Haryana)
About-2: Assistant Professor, N.C. Institute of Computer Sciences, Israna, Panipat (Haryana)

II. THE BASICS OF MEASUREMENT

There are several theories of measurement; for example, the Representational Theory of Measurement [2]. The Representational Theory of Measurement seeks to formalize our intuition about the way the world works: the data we obtain as measures should represent the attributes of the entity, and our intuition is the starting point for all measurement.
Empirical Relation: Given any two people x and y, we can observe that x is taller than y, or y is taller than x; therefore we say that "taller than" is an empirical relation for height, where height is an attribute.
Mapping: After finding the empirical relation, one should go for a mapping from the empirical relation to a numerical relation, e.g. A is taller than B if and only if M(A) > M(B). When we convert that type of relation to a mathematical form, such a form is called a mapping.
The stages for measurement are:
• Identify attributes for some real world entities.
• Identify empirical relations for the attributes.
• Identify numerical relations corresponding to each empirical relation.
• Define a mapping from the real world entities to numbers.
• Check that the numerical relations preserve, and are preserved by, the empirical relations.

A. Direct and Indirect Measurement

Once we have a model of the entities and attributes involved, we can define the measures in terms of them. Direct measurement of an attribute of an entity involves no other attribute or entity; for example, the length of a physical object can be measured without reference to any other object or attribute. On the other hand, the density of a physical object can be measured only in terms of mass and volume; we then use a model to show us that the relationship among the three is density = mass / volume. Some direct measures in software engineering are length, duration of the testing process, number of defects discovered, and the time a programmer spends on the project. Indirect measurement is often useful in making visible the interactions between direct measurements [1].
Examples of common direct measurements: length, width, lines of code.
Examples of common indirect measurements:
Program Productivity = LOC produced / person-months of effort
Module Defect Density = number of defects / module size
Requirements Stability = number of initial requirements / total number of requirements
Test Effectiveness Ratio = effort spent fixing faults / total project effort

B. Measurement Scales and Scale types

There are five major types of scales:
• Nominal
• Ordinal
• Interval
• Ratio
• Absolute

C. Classifying Software Measures

Software measurement needs entities and attributes; we can divide our software into these three classes:
Processes: collections of software-related activities.
Products: any artifacts, deliverables or documents that result from a process activity.
Resources: entities required by the process activities.
Within each class of entities we distinguish internal and external attributes.
Internal attributes of a product, process or resource are those that can be measured purely in terms of the product, process or resource itself.
External attributes of a product, process or resource are those that can be measured only with respect to how the product, process or resource relates to the environment.

Table 1. Entities and their internal and external attributes:

Products
- Specification: internal: size, reuse, modularity, redundancy, functionality, syntactic correctness; external: comprehensibility, maintainability.
- Designs: internal: size, reuse, modularity, coupling, cohesiveness, functionality; external: reliability, usability, maintainability.
- Code: internal: size, reuse, modularity, coupling, functionality, algorithmic complexity, control-flow structuredness; external: reliability, usability, maintainability.
- Test data: internal: size, coverage level; external: quality.

Processes
- Constructing specification: internal: time, effort, number of requirements changes; external: quality, cost, stability.
- Detailed design: internal: time, effort, number of specification faults found; external: cost, cost effectiveness.
- Testing: internal: time, effort, number of coding faults found; external: cost, cost effectiveness, stability.

Resources
- Personnel: internal: age, price; external: productivity, experience, intelligence.
- Teams: internal: size, communication level, structuredness; external: productivity, quality.
- Software: internal: price, size; external: usability, reliability.
- Hardware: internal: price, speed, memory size; external: reliability.
- Offices: internal: size, temperature, light; external: comfort, quality.

III. WEB METRICS

A. Web Engineering Fundamentals

Web engineering is the application of engineering principles to obtain high quality web applications. Similar types of processes are followed to make web applications as in traditional software, but with new ideas. Nowadays, when the programming platform has changed, it is difficult to develop software only with traditional models; some changes in the models are required for the development of online applications. In previous years a web site consisted of little more than a set of hypertext files that presented information using text and limited graphics. As time passed, HTML was augmented by development tools that enabled web engineers to provide computing capability along with information.
As in traditional projects, attributes are needed for software metrics, whether internal or external. Similarly, attributes are needed by web metrics for the improvement of online projects or web applications. Some of the attributes which are useful for web metrics are:
Network Intensiveness: A web app resides on a network and must serve the needs of a diverse community of clients. Web applications are network dependent [5].
Concurrency: A large number of users may access the web application at one time [5].
Unpredictable Load: At one time 1000 users may access the web application, or only 10 users may access it [5].
Performance: If a user waits too long, he or she may decide to go elsewhere [5].
Availability: A web application should be available for the maximum time, e.g. on a 24/7/365 basis [5].
Data Driven: The primary function of many web applications is to present hypermedia files as well as to display graphics, but web applications may also be able to access databases [5].
Content Sensitive: The text present on web sites should be of high quality, because the contents always represent the quality of the site [5].
Continuous Evolution: Web applications evolve continuously; some may be updated every hour, some every minute.
Security: Web applications are on a world network, so there is a need to secure their contents; strong security measures are to be taken to protect their information and data.
Meet the Business Requirements: Web applications should serve the purpose of the business for which they are made.
Various types of web applications are:
• Informational
• Downloads
• Customizable
• Interaction
• User Input
• Transaction Oriented
• Portal
• Database Access
• Data Warehousing


B. Planning For Web Engineering Projects

Table 2. Traditional projects versus small and major e-projects:
• Requirement Gathering: traditional projects: rigorous; small e-projects: limited; major e-projects: rigorous.
• Technical Specifications: traditional projects: robust (UML models); small e-projects: descriptive overview spec; major e-projects: robust (UML models).
• Project Duration: traditional projects: measured in months or years; small e-projects: measured in days, weeks or months; major e-projects: measured in months and years.
• Testing & QA: focused on achieving quality targets.
• Risk Management: traditional projects: explicit; small e-projects: inherent; major e-projects: explicit.
• Half-Life of Deliverables: traditional projects: 18 months or longer; small e-projects: 3 to 4 months; major e-projects: 6 to 12 months.
• Release Process: traditional projects: rigorous; small e-projects: limited; major e-projects: rigorous.
• Post-Release Customer Feedback: traditional projects: requires proactive efforts; small e-projects: obtained automatically through user interaction; major e-projects: obtained both automatically and solicited.

In Table 2 a comparison of traditional projects with small e-projects and major e-projects is carried out. Traditional software projects and major e-projects have substantial similarities, while small e-projects have special characteristics which differentiate them from traditional projects. Even in the case of small e-projects, planning must occur and risk must be considered, a schedule must be established, and controls must be defined, so that confusion, frustration, and failure are avoided.

C. Project Management Issues for web applications

a) A business must choose one of two web engineering options: (1) the web application is outsourced, meaning the web engineering is performed by some third party who has the expertise, talent and resources that may be lacking within the business; or (2) the web application is developed in-house using web engineers that are employed by the business. There is also a third alternative, in which some work is carried out in-house and some work is outsourced [4].

D. Our Approach Towards Web Metrics

Web engineering uses metrics to improve the overall process for the development of web applications. These metrics show how web applications behave and what the quality of these online applications is. Software metrics provide a basis for improving the software process, increasing the accuracy of project estimates, enhancing project tracking, and improving software quality. Web metrics, if properly characterized, achieve all these benefits and also improve the usability, performance and user satisfaction of web applications [5]. The goal of web metrics is to provide better quality web applications from the technical and business points of view. Web metrics provide measures of the effort, time and complexity of web applications; some of these measures are given in Section IV below.

E. Performance Metric

Performance is related to the availability and concurrency of web applications. When an end user requires the service of a web application and the web application fails, such a condition reduces the performance of the web application. The cause of failure may be anything, either network failure or heavy load on the servers.
Figure 1 shows an example of a typical web application architecture: clients 1…n connect through a firewall to a web server, which takes requests from users and passes them to the database server through the application server; the result of the database query is then returned to the client machine [8]. A single set of web server, application server and database server gives service to a number of clients, and with such an architecture it is difficult to improve the performance of web applications.
But if we improve such a model, and make a new model as shown in Figure 2 in which two sets of web server, application server and database server give service to the clients, then the load on each server reduces. Our performance metric says that response time decreases as the total number of servers increases:

Response Time ∝ 1 / (total number of servers)

To reduce the response time, increase the number of servers.
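The inverse relationship claimed above can be phrased as a toy model in Python. The clean 1/N scaling is an assumption of this sketch; real systems rarely share load so perfectly:

```python
# Toy model of the performance metric: response time ~ 1 / (number of
# server sets), assuming load is shared perfectly between the sets.
def approx_response_time(single_set_time_ms, num_server_sets):
    return single_set_time_ms / num_server_sets

for n in (1, 2, 4):
    print(n, "server set(s):", approx_response_time(800, n), "ms")
```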

F. Security Metric

Web applications are on a world network, so there is a need to secure their contents [6]. Strong security measures should be taken to protect the information and data of web applications.
User input is the main avenue through which security can be compromised. While coding a web application, appropriate checks should be implemented on user inputs to maintain the security of the web application; e.g. an input field meant to take character data should not accept numeric data or other special characters. Apply user IDs and passwords to secure information. SQL injection attacks carried out by hackers should be prevented with positive tainting techniques [8]. HTTP cookies and server variables can also be a cause of poor security: if the user performs no action for some period of time, the cookies should expire and the application should ask for a re-login with the password. Defensive programming reduces attacks.

IV. MEASUREMENT OF TIME AND EFFORTS

A few measures of effort and time are given below.
Structuring effort: time to structure the web application.
Interlinking effort: time to interlink pages to build the web application.
Interface planning: time taken to plan the web application interface.
Interface building: time taken to implement the interface for the web application.
Link-testing effort: time taken to test all links in the web application.
Media-testing effort: time taken to test all media in the web application.
Total Effort = structuring effort + interlinking effort + interface building + link-testing effort + media-testing effort
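Because Total Effort is simply the sum of its component efforts, the bookkeeping reduces to a few lines. The dictionary keys and hour figures below are invented for illustration:

```python
efforts_hours = {
    "structuring": 12, "interlinking": 6, "interface_building": 20,
    "link_testing": 4, "media_testing": 3,
}
total_effort = sum(efforts_hours.values())  # Total Effort as defined above
print("Total effort:", total_effort, "hours")
```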

(1) Page Authoring

Text effort: time taken to author or reuse the text in a page.
Page linking effort: time taken to author the links in a page.
Page structuring effort: time taken to structure a page.
Total page effort = text effort + page linking effort + page structuring effort

(2) Media Authoring

Media effort: time taken to author or reuse media files.
Media digitizing effort: time taken to digitize media.
Total media effort = media effort + media digitizing effort [5]

(3) Programming Authoring

Programming effort: time taken to author HTML, Java or related language implementations.
Reuse effort: time taken to reuse or modify existing programming.

(4) Navigability Measures

Navigability describes the ease with which users find the desired information, and the navigability measure is important for usability; a proper model of navigability reduces the access time. There are certain measures through which navigability can be assessed, e.g. hyperlink depth, hyperlink breadth, and the topologies formed by the hyperlinks; some studies have examined the effect of hypertext topologies on usability [7].
In the breadth-maximum approach, all links are on a single page or home page, so that the user can move to the desired page with a single click; this approach is better only for informational websites like the Rediff home page. In the depth-maximum approach, links are spread over different pages in the web application; depth is the number of clicks required to reach a specific page from the home page. This approach is better where input is required from the user by following a specific number of steps. Web site navigability can be evaluated in three ways: with user surveys, with usage analysis, and with navigability measurements [7].
There are mainly four hypertext topologies: (1) linear topology, (2) strictly hierarchical, (3) mixed topology (hierarchical topology with cross-referential hyperlinks), and (4) non-linear topology (a complete network based on a large number of cross-referential links). Previous study finds that navigability decreases in the order (1) linear, (2) strict, (3) mixed, (4) complex. We can divide the mixed topology into three sub-categories: (1) mixed hierarchical with a link to the home page, (2) bottom-up approach, and (3) mixed hierarchical with links at the same level. In the first approach a link to the home page is present from every page; in the second approach a link to the previous page is present from every page; and in the third approach a link to every page at the same level is present.
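Since depth is defined as the number of clicks needed to reach a page from the home page, it can be computed with a breadth-first search over the site's link graph. The small site map below is hypothetical:

```python
from collections import deque

def click_depths(links, home="home"):
    # Breadth-first search: depth = minimum clicks from the home page.
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical mixed-hierarchy site with one cross-referential link.
site = {"home": ["products", "about"], "products": ["phone"], "about": ["phone"]}
print(click_depths(site))  # {'home': 0, 'products': 1, 'about': 1, 'phone': 2}
```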


V. REFERENCES

1) E. Stroulia, M. El-Ramly, P. Iglinski, and P. Sorenson, "User Interface Reverse Engineering in Support of Interface Migration to the Web," Automated Software Eng. J., vol. 10, no. 3, pp. 271-301, 2003.
2) Norman E. Fenton, "Software Metrics," Thomson Publications, Fifth Edition, 2005.
3) Sreedevi Sampath, Lori Pollock, "Applying Concept Analysis to User-Session-Based Testing of Web Applications," IEEE Trans. on Software Engineering, Vol 33, No. 10, pp. 643-657, Oct 2007.
4) Powell, T.A., Web Site Engineering, Prentice Hall, 1998.
5) Pressman, Roger S., Software Engineering, McGraw Hill, 2005.
6) Ying Zou, Qi Zhang, Xulin Zhao, "Improving the usability of e-commerce applications using business processes," IEEE Transactions on Software Engineering, Vol 33, No 12, pp. 837-853, Dec 2007.
7) Yuming Zhou, Hareton Leung, "MNav: A Markov Model-Based Web Site Navigability Measure," IEEE Trans. on Software Engineering, Vol 33, No. 12, pp. 869-889, Dec 2007.
8) William G.J., Alessandro Orso, Panagiotis, "WASP: Protecting Web Applications Using Positive Tainting and Syntax Awareness," IEEE Trans. on Software Engineering, Vol 34, No. 1, pp. 65-79, Jan/Feb 2008.


Measuring Helpfulness of Personal Decision Aid Design Model

Siti Mahfuzah Sarif 1

Norshuhada Shiratuddin 2

Abstract- The existence of countless computerized personal decision aids has triggered the interest to investigate which decision strategy and technique are ideal for a personal decision aid, and how helpful a decision aid is to non-expert users. Two categories of decision strategies have been reviewed, compensatory and non-compensatory, which results in fusing the two strategies in order to get the best of both worlds. Findings from the study of focus groups show that the multi criteria decision methods (MCDM) known as the Pugh matrix and the lexicographic method have been identified as the two most preferred techniques for solving personal decision problems. Both the strategies and the techniques are incorporated in the development of a personal decision aid design model (PDADM). The proposed model is then validated through a prototyping method in two different case studies (choosing a development methodology in a mobile computing course, and purchasing a mobile phone). In measuring the helpfulness of the prototypes, this study looks at four dimensions: reliability, decision making effort, confidence, and decision process awareness. The findings show that the respondents from different decision situations perceived the PDADM-driven prototypes as helpful.
Keywords- Computerized decision aid, decision strategy, multi criteria decision method, helpfulness

I. INTRODUCTION

Humans commonly make decisions of varying importance on a daily basis; thus, the idea of personal decision making as a researchable matter may seem odd. However, studies have proven that most humans are much poorer at decision making than they think. An understanding of what decision making involves, together with a few effective techniques, will help produce better decisions. This explains the existence of decision support technology at different levels in various fields, for instance in management, engineering and medicine.
To date, the attention given to the improvement of decision support at the organization level has been enormous. On the contrary, the study of improving the performance of decision aids in personal decision making is still lacking and out of date (Jungermann, 1980; Wooler, 1982; Bronner & de Hoog, 1983; Alidrisi, 1987; Todd & Benbasat, 1991). The existence of countless computerized personal decision aids (in the form of websites, software or spreadsheets) these days has triggered the interest to investigate the suitability and helpfulness of this technology to users, especially non-expert users.

II. BACKGROUND OF STUDY

Although most personal decisions made are minor in nature and in terms of their consequences, being able to make an actual decision in any situation is indeed essential (Rich, 1999). Living in the 21st century, it is almost impossible not to associate anything with computer technology, and this includes decision making. The evidence of human limitations in information processing is unquestionable; thus, the advantages of computerized decision aids can be a major benefit for the decision maker.

A. Research Problem Statement

Decisions are part of human life. Decision making majorly involves choices, and the hardest part is to make the right choice. It can be demanding to choose without being clear about what to choose and how to go about it, which later may lead to being indecisive. Moreover, indecisiveness may cause failed actions and a tendency of being controlled by others (McGuire, 2002; Arsham, 2004). This shows that, under appropriate circumstances, it is essential to apply a decision aid in making decisions.
Over the decades, there have been countless studies on decision support technology that proposed methods of improving the performance of such technology at the organization level. However, in more recent years, computerized personal decision aids (more examples and reviews in section 3.2) have mushroomed and are progressively getting attention from users; examples include "hunch" (www.hunch.com) and "Let Simon Decide" (www.letsimondecide.com). This shows the relevance of studying issues related to computerized decision aids pertaining to personal decisions.
For more than five decades, most of the research carried out on decision processes has focused either only on the descriptive aspect (studying how decisions are being made) or on the normative aspect (studying how some ideally logical decider would make decisions). "Decider" in this context refers to the decision aid. Prescriptive research on decision processes, on how to help the decider progress from the descriptive to the normative, has, however, been scarce (Brown, 2008); this was also mentioned earlier in (Bell et al., 1988).

About-1: Department of Computer Science, College of Arts and Sciences, Universiti Utara Malaysia; currently pursuing her PhD degree. She specializes in software application development and multimedia design. (email: [email protected])
About-2: Professor and Applied Science Chair at the College of Arts and Sciences, Universiti Utara Malaysia. She obtained her PhD from University of Strathclyde, Glasgow, UK and has published more than 100 papers in journals and proceedings. She specializes in design research and application engineering. (email: [email protected])


The term computerized decision aid refers to a very diverse set of tools based on varying techniques and complexity. Generally, decision aids are designed with the aim of helping humans choose the best decision possible with the knowledge they have available. However, creating effective decision aids is more than meets the eye (Power, 1998). Complex and structured mathematical techniques that correspond to the uncertainty of a decision situation have long held great theoretical appeal for helping decision makers make better decisions. Studies by Hayes and Akhavi (2008), Adam and Humphreys (2008), Zannier et al. (2007) and Law (1996) do not agree with this statement. Hayes and Akhavi (2008) also affirmed that ―decision aids based on mathematically correct and sophisticated models do not actually improve the decision making performance. This is due to how the decision aids frame the problem in a way that does not fit human decision making approaches‖. Furthermore, although uncertainty can be tackled using complex mathematical tools, more often than not the decision maker will not have the time to implement the structured mathematical strategies (McGuire, 2002; Arsham, 2004). These views are further supported in Alidrisi (1987) and Adam and Humphreys (2008). All these researchers agreed that, as far as personal decision making is concerned, complex and structured mathematical techniques are not preferred. Evidently, this indicates that a simple decision making model is a more needed solution when compared to rigorous criteria weighing analysis.

All else being equal, decision makers prefer more accurate and less effortful choices. Since these desires conflict, selecting a suitable strategy for the aid can be a tricky task (Payne, 1993; Naude, 1997; Al-Shemmeri et al., 1997; Zanakis et al., 1998). Then again, the appropriate use of decision strategies can contribute to effective decision making (Cosier & Dalton, 1986).

B. Research Objectives

With the nature of the problem in mind, this study aims to propose a personal decision aid design model that is perceived helpful. The following specific aims are outlined as means to support the general aim:

i. To identify the appropriate decision strategy and decision technique for personal decision making
ii. To incorporate the identified decision strategy and technique in the development of the personal decision aid design model
iii. To validate the personal decision aid design model in different situations via the prototyping method
iv. To measure the users‘ perceived helpfulness of the prototypes

III. INTRODUCTION TO DECISION TECHNIQUES

Apparently, a working knowledge of decision theory is needed before embarking on developing a decision aid design model. The design of the model includes two important expectations, which are to accomplish a better decision and to ensure the helpfulness of the model via the prototyping method. Among the topics reviewed from the literature are decision making, multi criteria decision making (MCDM) methods, computerized decision aids, related decision theories, and aspects of helpfulness of information systems in general and decision support in particular.

A. Decision Strategies and Techniques

A personal decision normally involves the evaluation of many choices and making a selection out of many. Generally, there are various strategies and techniques in making decisions. This study focuses on decision making problems where the number of criteria and alternatives is finite, and the alternatives are given explicitly. Problems of this type are called multi attribute decision making problems.

Compensatory and Non-compensatory Strategies

Decision strategies are commonly divided into two broad categories, non-compensatory and compensatory. Ullman (2002) defines non-compensatory strategies using the example of one well documented non-compensatory strategy: the lexicographic method. As for compensatory strategies, Ullman (2002) defines them as strategies which allow decision makers to evaluate alternatives by balancing the strong features of an alternative against its weaker features. Examples of methods that support the compensatory strategy are decision matrix and utility theory methods.

Lexicographic method

In the lexicographic method, criteria are ranked in order of their importance. The alternative with the best performance score on the most important criterion is chosen. If there are ties with respect to this criterion, the performance of the tied alternatives on the next most important criterion is compared, and so on, until a unique alternative is found (Linkov et al., 2004).

Maut

Multi-attribute utility theory (MAUT) is seen as an ideal approach for personal decision making by many previous researchers due to the nature of the decision problem. This is supported in a number of studies (Bronner & de Hoog, 1983; Alidrisi, 1987; Işıklar & Büyüközkan, 2007; Adam & Humphreys, 2008). In a study, Adam and Humphreys (2008) described that ―MAUT is simple enough to implement as compared to other models of decision making which require a more rigorous criteria weighing analysis that is not necessarily needed for the role of decision making‖.

Pugh’s Method

Pugh's method is known as the simplified MAUT; it was first introduced by Pugh (1990) as a method for concept selection in engineering decisions. In the Pugh approach, all alternatives are compared to a datum alternative on each criterion. Alternatives are either better (+1), worse (-1), or the same (0) as the datum for a given criterion. The score for each alternative is calculated as the number of occurrences of (+1) minus the number of occurrences of (-1). Emphasis is placed on using these comparisons to try to improve the weaknesses (i.e., the -1's) of an alternative without weakening any strength (i.e., the +1's).
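Since these two techniques are later fused in PDADM, a minimal sketch may help make them concrete. The following Python fragment, with invented laptop data and illustrative function names (none of this is taken from the paper's prototypes), implements lexicographic screening and Pugh scoring as defined above:

```python
# Minimal sketch of the two techniques described above; the laptop data,
# criterion names and function names are illustrative assumptions only.

def lexicographic_choice(alternatives, criteria_by_importance):
    """Screen criterion by criterion (most important first), keeping only
    the best-scoring alternatives, until a unique alternative remains."""
    candidates = list(alternatives)
    for criterion in criteria_by_importance:
        best = max(a[criterion] for a in candidates)
        candidates = [a for a in candidates if a[criterion] == best]
        if len(candidates) == 1:
            break
    return candidates

def pugh_scores(alternatives, datum, criteria):
    """Score each alternative as the number of criteria on which it beats
    the datum (+1 each) minus the number on which it is worse (-1 each)."""
    scores = {}
    for a in alternatives:
        better = sum(1 for c in criteria if a[c] > datum[c])
        worse = sum(1 for c in criteria if a[c] < datum[c])
        scores[a["name"]] = better - worse
    return scores

laptops = [
    {"name": "A", "price": 3, "battery": 5, "weight": 4},
    {"name": "B", "price": 4, "battery": 4, "weight": 4},
    {"name": "C", "price": 4, "battery": 3, "weight": 5},
]
# Lexicographic: price is most important, then battery, then weight.
print(lexicographic_choice(laptops, ["price", "battery", "weight"]))
# Pugh: compare all alternatives against laptop B as the datum.
print(pugh_scores(laptops, laptops[1], ["price", "battery", "weight"]))
```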
Weighted Decision Method

The weighted decision matrix involves mathematical reasoning in solving single or multi attribute decision problems. Two examples of the weighted decision matrix are the Weighted Sum Model (WSM) and the Weighted Product Model (WPM). WSM is probably the most widely used approach, especially in single dimensional problems (Triantaphyllou, 2000). If there are m alternatives and n criteria, the best alternative is the one that satisfies the following expression (Fishburn, 1967):

$$A^{*}_{\mathrm{WSM\text{-}score}} = \max_{i} \sum_{j=1}^{n} a_{ij} w_j, \quad \text{for } i = 1, 2, 3, \ldots, m$$

WPM shares almost the same concept as WSM. The main difference is that instead of addition the model uses multiplication. Each alternative is compared with the others by multiplying a number of ratios, one for each criterion. Each ratio is raised to the power of the relative weight of the corresponding criterion. In general, in order to compare two alternatives $A_K$ and $A_L$, the following product has to be calculated (Bridgman, 1922; Miller & Starr, 1969):

$$R(A_K | A_L) = \prod_{j=1}^{n} \left( a_{Kj} / a_{Lj} \right)^{w_j}$$

where n is the number of criteria, $a_{ij}$ is the actual value of the i-th alternative in terms of the j-th criterion, and $w_j$ is the weight of importance of the j-th criterion. If the term $R(A_K | A_L)$ is greater than or equal to one, it indicates that alternative $A_K$ is more desirable than alternative $A_L$. The best alternative is the one that is better than or at least equal to all other alternatives.
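As a quick numeric illustration of the two expressions above, the scores can be computed as follows; the alternative values and weights are invented sample data, not values from the paper:

```python
# Illustrative WSM and WPM computations for the formulas above; the
# alternative scores a[i][j] and weights w[j] are invented sample values.

a = [  # rows: alternatives A1..A3, columns: criteria C1..C3
    [25, 20, 15],
    [10, 30, 20],
    [30, 10, 30],
]
w = [0.2, 0.3, 0.5]  # criterion weights, summing to 1

# WSM: the best alternative maximizes the weighted sum of its scores.
wsm = [sum(aij * wj for aij, wj in zip(row, w)) for row in a]
print("WSM scores:", wsm)

# WPM: compare alternatives K and L via the product of ratios
# (a_Kj / a_Lj) ** w_j; R >= 1 means K is at least as desirable as L.
def wpm_ratio(K, L):
    r = 1.0
    for aKj, aLj, wj in zip(a[K], a[L], w):
        r *= (aKj / aLj) ** wj
    return r

print("R(A1|A2) =", round(wpm_ratio(0, 1), 3))
```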
Analytic Hierarchical Process

The Analytic Hierarchy Process (AHP) is a multi-criteria decision-making approach introduced by Saaty (1977 and 1994). The AHP has attracted the interest of many researchers, mainly due to the careful mathematical properties of the method and the fact that the required input data are rather easy to obtain. The AHP is a decision support tool which can be used to solve complex decision problems. It uses a multi-level hierarchical structure of objectives, criteria, sub-criteria and alternatives.

Pros and Cons Analysis

Pros and Cons Analysis is a qualitative comparison method in which good things (pros) and bad things (cons) are identified about each alternative. Lists of the pros and cons, based on the input of subject matter experts, are compared one to another for each alternative. The alternative with the strongest pros and weakest cons is preferred. The decision documentation should include an exposition which justifies why the preferred alternative's pros are more important and its cons less consequential than those of the other alternatives. Pros and Cons Analysis is suitable for simple decisions with few alternatives and few discriminating criteria of approximately equal value. It requires no mathematical skill and can be implemented rapidly (Baker et al., 2002).

B. Computerized Personal Decision Aids

A number of computerized decision aids have been identified. The aids come in varying mediums such as websites, spreadsheets, software and web applications. All of the identified aids can be used to assist in personal decision making and also in other types of decision problems, such as financial and management problems. Table 3.1 summarizes eight computerized decision aids along with the reviews. The number of aids reviewed in this study is meant to be representative.

Table 3.1: Computerized decision aids

1) Hunch (2009) (www.hunch.com) - Type: decision engine (web). Method/Technique: collective intelligence decision making, machine learning and decision trees. Description: a decision community website; uses machine learning based on statistical inferences (the system gets smarter as more users use it); uses a question selection algorithm to (a) find a question which discriminates well among the remaining possible recommendation outcomes for the user, and (b) look for a question which can help optimize and rank the remaining recommendation outcomes to present the ones the user will like most. Reviews: the interactivity is intuitive but involves a series of steps (answering questions); involves a lot of statistical analysis in the back end (very complex); does not involve defining the importance of criteria (ranking the criteria).

2) Let Simon Decide (2009) (www.letsimondecide.com) - Type: decision engine (web). Method/Technique: collective intelligence decision analysis. Description: consists of three decision making tools: (a) My Scores, for logical, fact-based decisions, weighted decision with multiple alternatives; (b) My Life Match, for big, life-changing decisions; (c) My Points of View, for quick decisions; combines user qualitative input with a weighted, mathematical formula (weighs alternatives against a proprietary profile); enables collective learning (share a decision summary with others); provides an action plan for every decision. Reviews: involves a complex mathematical approach to decision-making; requires many steps although the process is intuitive.

3) Choose It! (1999) (chooseit.sitesell.com) - Type: web application. Method/Technique: decision matrix. Description: an online decision making tool that uses the decision matrix concept; can be used to make important business, financial, and personal life decisions. Reviews: does not acknowledge the distinct difference between subjective and objective factors.

4) Management For The Rest of Us (MFTROU.com) Decision Making Tool (n.d.) (www.mftrou.com/decision-making-tool.html) - Type: spreadsheet. Method/Technique: decision matrix. Description: based on the classic decision grid concept; an Excel spreadsheet which contains (a) an overview of how to make decisions, (b) a decision making example, and (c) a template for making your own decision. Reviews: crowded text in the visual presentation; very formal presentation (in the Excel environment).

5) Decision Oven (2008) (decisionoven.com) - Type: software. Method/Technique: decision matrix with mathematical reasoning. Description: off-the-shelf decision support software; can be used to support personal or business decisions. Reviews: acknowledges the difference between defining subjective criteria and objective criteria.

6) EduTools Decision Engine (2009) (http://ocep.edutools.info/summative/index.jsp?pj=4) - Type: web application. Method/Technique: weighted decision matrix. Description: uses a rational decision making process. Reviews: only focuses on selecting a course management system, not generic decisions; users have to be familiar with the products and features they wish to compare.

7) Career Decision Making Tool (CDMT) (n.d.) (http://cte.ed.gov/acrn/cdmt/tool.htm) - Type: instructor-led, classroom-based online teaching/learning tool. Method/Technique: guidelines and teaching/learning material. Description: a career decision making tool; it suggests the following decision cycle: (a) Engaging, (b) Understanding, (c) Exploring, (d) Evaluating, (e) Acting, (f) Reflecting. Reviews: only focuses on career decision making, not generic decisions; to be implemented in a teaching/learning environment.

8) Super Decisions (2004) (http://www.superdecisions.com/) - Type: software. Method/Technique: Analytic Network Process. Description: extends the Analytic Hierarchy Process (AHP); uses the same fundamental prioritization process based on deriving priorities through judgments on pairs of elements or from direct measurements. Reviews: uses complex decision analysis with rigorous mathematical reasoning; solves complex decision problems.

C. Theories in Modeling Decision Aid Process

Decision theory is an attempt to explicate how humans make decisions and to help us understand the process of decision making. A grasp of the fundamentals of decision making is crucial to the effective design of the decision aid. Therefore, this study discusses a number of related theories that contribute to understanding multi criteria decision making. The related literature is summarized in Table 3.2.


Table 3.2: Literature survey of related decision theories

Decision Theories: References
Multi Attribute Utility Theory: Baker et al. (2001); Alidrisi (1987); Dyer et al. (1992); Keeney & Raiffa (1993); Collins et al. (2006)
Behavioral Decision Theory: Einhorn & Hogarth (1981); Westaby (2005)
Bounded Rationality Model: Bahl & Hunt (1984); March & Simon (1958); Newell & Simon (1972)
Implicit Favorite Model: Bahl & Hunt (1984); Soelberg (1967)
Dominance Theory: Easwaran (2007); Zsambok et al. (1992)
Satisficing Theory: Zsambok et al. (1992); Simon (1956)

IV. RESEARCH METHODOLOGY

This study employed the design science approach to address the research questions posed earlier. The selection of a suitable approach is based on the nature of the research, the phases involved and the research outcomes. March and Smith (1995) described design science research as a process which aims to ―produce and apply knowledge of tasks or situations in order to create effective artifacts‖ in order to enhance practice.

In general, the process in design science research can be structured into three main phases: ―problem identification‖, ―solution design‖ and ―evaluation‖. Clearly, design science research consists of a series of steps, but in practice they are not always executed in sequence; they are often performed iteratively. This study implemented the following steps, adapted from Offermann et al. (2009), and driven by the design science research approach.

A. Problem Identification

This phase is divided into the following steps: ―identify problem‖, ―literature research‖ and ―expert interviews‖. It specifies a research question and verifies its practical relevance. As a result of this phase, the research questions are defined.

Identify Problem

The existence of countless computerized personal decision aids these days has triggered the interest to investigate the relevance and helpfulness of ICT assistance in personal decision making. Offermann et al. (2009) provide support for the identification of the research problem in this study, in which they stated that researchable material ―may arise from a current business problem or opportunities offered by new technology‖.

Literature Search

In order to identify the research problem, a literature search was used. As a summary, a number of decision strategies, decision techniques (MCDM methods), computerized personal decision aids, and decision making related theories were reviewed in this study. This strengthened the need for a solution that proposes a proper decision making model for personal decisions.

Expert Interview

Interviews with experts in the related field were conducted to identify the relevance of the addressed problems. Discussion with the experts involved brainstorming of ideas, approval of ideas and reviews of research material. Three experts were referred to during this stage and also at certain stages of this study. The experts are professors and academics specializing in one of these fields: model-based systems and qualitative reasoning; quantitative analysis; and artificial intelligence.

B. Solution Design

In the second phase, the solution is designed and proposed. After identifying the research problems and evaluating their relevance, a solution is developed in the form of artifacts. Varying methods are used to produce the artifacts, including content analysis, expert review, focus group study, participatory design, prototyping and elicitation work.

C. Evaluation

In this study, evaluation is achieved by means of case studies and laboratory experiments. The findings of this stage are further explained in the Results section.

V. DEVELOPMENT OF PERSONAL DECISION AID DESIGN MODEL (PDADM)

This section describes the process of developing the PDADM. Prior to this, an appropriate decision strategy for personal decision making needs to be identified, followed by the selection of an appropriate decision technique (i.e. MCDM method). Afterward, both are incorporated in the development of the decision aid design model. The method used in developing PDADM involves content analysis, participatory design and expert review.

A. Decision Strategy Selection

From the literature search, two common decision strategy groups were studied: non-compensatory and compensatory. Findings indicate that non-compensatory strategies do not allow very good performance on one criterion to make up for poor performance on another. In other words, no matter how good an alternative is, if it fails on one evaluative criterion, it is eliminated from consideration. As for compensatory strategies, they allow decision makers to balance the good features of an alternative against its weaker features. Additionally, the compensatory strategies give greater accuracy in decisions, but the non-compensatory strategies take the least time to reach a decision.

In response to this, the study decided to combine the implementation of compensatory and non-compensatory strategies in order to obtain the ―best of both worlds‖. This is supported by Ullman (2002), whose work stated that ―a method that gives the accuracy of the compensatory strategy with the effort of the non-compensatory strategy would add value to human decision making activities‖.

B. Decision Technique Selection

In light of the numerous decision techniques available to decision makers, a focus group study was used in order to gain some understanding of which kinds of techniques are preferred by the (non-expert) decision maker. The study also decided that introducing more than one technique would enhance the focus group's ability to understand that there is not a single right way to resolve a decision.

Five techniques were introduced to the focus group of 51 (non-expert) participants of varying demographic backgrounds: the weighted sum method (WSM), Pugh matrix (PUG), Analytic Hierarchy Process (AHP), pros and cons analysis (PCA), and lexicographic method (LEX). All methods involve defining criteria on which to compare a set of alternatives. The group was encouraged to solve the same decision scenario (choosing a laptop from 4 different brands) using each, or at least three, of the techniques mentioned above, one at a time. This study did not make it compulsory for them to use all the techniques, because of the varying rate of understanding of the techniques after first being introduced to them. Hence, unutilized techniques indicate respondents' difficulty in understanding them and getting familiar with them.

After establishing the focus group's previous experience with each decision technique, the group was asked which technique helped the most and which they had more confidence in. Next, the group was asked which tool they think is ―least prone to bias‖.

The results from the survey are summarized for each question. The first two questions concerned (i) which technique they think would help the most if they were to use it in a real decision, and (ii) which technique they had the most confidence in. As shown in Table 5.1, techniques PUG and LEX scored among the highest number of respondents for both questions.

Table 5.1: Helpful and Confidence

                      WSM   PUG   AHP   PCA   LEX
Helpful                21    39     3    19    43
More confidence in     14    31     3    15    45

The next question asked the group which technique they felt was least prone to bias (that is, the most difficult to manipulate to achieve preconceived results). These results are shown in Table 5.2.

Table 5.2: Bias

                      WSM   PUG   AHP   PCA   LEX
Least prone to bias    34    41     2    18    22

Interestingly, even though the majority of participants had more confidence in LEX, the score changes when it comes to the bias of the technique. More than half of them felt that PUG was less prone to bias, followed by the second highest scored technique, the WSM. Nevertheless, the participants noted that it would take even more time and effort to reach a decision with PUG and WSM. It is noted that AHP scored the lowest response for all three questions, which is due to the refusal of most respondents to utilize it. Evidently, from this focus group study, PUG and LEX were selected as the potential techniques to be incorporated in the design of the proposed personal decision aid design model.

PUG, or the Pugh matrix, is originally a concept selection method used by engineers for design decisions (Pugh, 1990). Since it was introduced, there have been many different modified versions of Pugh matrix analysis in various examples of its applications. In line with this, a participatory design study was conducted to learn which implementation of the Pugh matrix is preferred and suitable for the non-expert decision making style. Five versions (see Appendix) of the Pugh matrix approach (including the original) were used in this participatory design study. A total of 66 participants of varying demographic backgrounds were involved in this study.

Firstly, the participants were briefly introduced to the different implementations of the Pugh matrix method. Then, they were asked to solve a designated decision problem (choosing a laptop from four different brands) using all the versions, one at a time. Later, the participants were asked ten questions (refer to Table 5.3) based on their experience using the different implementations of the Pugh matrix, and also three additional demographic questions on gender, IT skill and age.

Table 5.3: Questions asked in the participatory design study

No.  Question
Q1   Are you familiar with the use of the Pugh matrix?
Q2   Do you find it difficult to choose the first reference?
Q3   Do you prefer to weigh or not to weigh the criteria?
Q4   Do you prefer to use percentage (%) or scaled values (e.g. 1 to 5) as weight?
Q5   Do you prefer to use comparative symbols (+, -, S) or scaled values (e.g. 1 to 5) to rate the alternatives?
Q6   Which version of the Pugh matrix do you think is most helpful?
Q7   Which version of the Pugh matrix did you have more confidence in?
Q8   In your opinion, which version is least prone to bias?
Q9   Would you use any of these Pugh matrix approaches in your real life decisions?
Q10  Would it be easier if the Pugh matrix process were automated (i.e. in a computerized format)?

All the responses from participants were recorded and summarized in the following tables (Tables 5.4 to 5.12). The first question dealt with the previous experience of the participants with the Pugh matrix method. As shown in Table 5.4, the majority of the participants had not used the Pugh approach before this study.

Table 5.4: Familiar with Pugh matrix

            Yes   No   NA*
Familiar?     9   57     0
*=No answer

The next question asked about the participants' experience during the study when they were required to choose their own reference for the comparative analysis in the Pugh matrix to take place. As shown in Table 5.5, more than half of the participants claimed that it was not a problem for them to perform that task, but the number of participants who claimed the opposite was not far behind.

Table 5.5: Difficulty to choose first reference

            Yes   No   NA
Difficult?   24   42    0

The third and fourth questions asked about the participants' experience with the use of weights in defining the importance of each of the evaluative criteria. As shown in Table 5.6, the majority of participants preferred to weigh their criteria during the process. From this majority group, 35 preferred weighing the criteria using scaled values rather than percentages (Table 5.7). This number represents more than half of the participants.

Table 5.6: Weighing criteria

                   Yes   No   NA
Weighing criteria   42   21    3

Table 5.7: Use percentage or scaled values for weighing

                             Percentage   Scaled Values   NA
Preferred weighing criteria          26              35    5

The fifth question asked the participants whether they prefer to use symbols (+ for better, - for worse and S for equal) or scaled values to perform the comparative analysis of alternatives against the reference on each criterion. The majority agreed that the use of symbols is more convenient for the comparative analysis.

Table 5.8: Use symbols or scaled values

                             Symbols   Scaled Values   NA
Preferred evaluation styles       52              12    2

The next two questions (questions 6 and 7) dealt with the participants' experience after using the Pugh approach to solve the decision problem. As shown in Table 5.9, the dominant choice for both questions is the original version. The participants, as a whole, not only felt that the original version helped the most in assisting them with the decision problem, but also had more confidence in it.

Table 5.9: Helpful and confidence

                      Original   MV1   MV2   MV3   MV4   NA
Helpful                     22    11    13     7     8    5
More confidence in          21    10    14     8    10    3
MV=modified version

Even though the majority had more confidence in the original version, when asked which version they think is least prone to bias, the majority score shows a contrasting response. One third of the participants agreed that MV2 (modified version #2) is the one least prone to bias.

Table 5.10: Bias

                      Original   MV1   MV2   MV3   MV4   NA
Least prone to bias         15    11    22    10     4    4

Concerning the use of the Pugh approach in real decision situations, 49 of 66 participants indicated that they would consider using this approach, 16 indicated that they would not, and one did not respond to this question (refer to Table 5.11).

Table 5.11: Will use Pugh matrix in real situation

                                            Yes   No   NA
Will use Pugh approach in real situation?    49   16    1

Lastly, when asked whether automating the process of the Pugh matrix (in a computerized format) would make this approach easier to use, the majority answered yes. Of the 12 remaining participants who answered no, 7 claimed to have very little IT skill.

Table 5.12: Automate Pugh matrix

                                              Yes   No   NA
Automating Pugh approach makes it easier?      54   12    0

C. Incorporating the Decision Strategy and Decision Technique in PDADM

The results (decision strategies and techniques) obtained from the previous focus group studies are incorporated in the development of the personal decision aid design model. The model comprises the flow of the decision process and the relationship between the input and outcome of each step of the process. Figure 5.1 illustrates this.


[Figure: PDADM flow diagram showing a pre-decision process (problem identification and definition; choosing decision criteria; weighing objective and subjective criteria), a decision process combining the non-compensatory strategy (categorical filter on the set of alternatives and reduction of alternatives via modified lexicographic) with the compensatory strategy (selection of a first datum and evaluation of acceptable alternatives via modified Pugh's method, with suggestion and change identification), and a post-decision process (confirmation or re-evaluation, then action).]

Figure 5.1: Personal Decision Aid Design Model (PDADM)

VI. IMPLEMENTING PDADM IN DIFFERENT SITUATIONS

The proposed PDADM is validated through the development of two prototypes in two different case studies: choosing a development methodology in a mobile computing course, and purchasing a mobile phone. These case studies involve two very different decision situations, which were intended to showcase the flexibility and functionality of the proposed model.

A. Case study 1: Choosing a Development Methodology in a Mobile Programming Course

Over the last decade, mobile computing has received significant interest in the academic and industrial research community. As a result, demands from industry for graduates of mobile computing courses are rising (Gillespie, 2007). The graduates entering the mobile development world are expected to cope with the challenges imposed by the mobile environment. Heyes (2002) reported that mobile developers face twice as many challenges as when developing traditional system applications, due to the specific demands and technical constraints of the mobile environment. In addition, inadequate research in assisting developers with mobile development issues was also highlighted in the GI Dagstuhl Research Seminar in 2007 (König-Ries, 2009).

Within this perspective, it is believed that selecting a suitable development methodology is the key to these issues. The use of a methodology is important, as a project can be structured into small, well-defined activities where the sequence and interaction of these activities can be specified (Avison & Fitzgerald, 1990). Hence, students should be exposed to the importance of adopting a suitable methodology for a mobile development project.

Selecting a suitable development methodology for a mobile development project is another challenge in itself (Bertini et al., 2006; Heikkinen & Still, 2005; Atkinson & Olla, 2004; Heyes, 2002; Afonso et al., 1998). Less experienced developers will find the task even more challenging; thus, this study seeks to propose a solution by implementing the proposed PDADM via the development of a prototype named md-Matrix (as in mobile development methodology matrix).

Features and Screenshots of md-Matrix

This decision-making tool is mainly aimed at assisting developers (especially novices) in choosing the most appropriate development methodology for a mobile development project. The number of available development methodologies in md-Matrix is meant to be representative, only for the purpose of demonstrating the decision process in selecting a mobile development methodology. The prototype of md-Matrix features the following (see Table 6.1):

Table 6.1: Features of md-Matrix

Alternatives filter - Mobile application technologies: Generic, J2ME*, Flash Lite*, Native, Web based, Object Oriented, Platform dependent
Criteria - 12 objective, 12 subjective
Alternatives - Flash Lite (4 methodologies), J2ME (4 methodologies)
Feedback - Pop-up window, on-screen text, interface agent
* enabled in this prototype

The first step of md-Matrix enables the user to filter the available methodologies based on the preferred technology for development of a mobile application (Figure 6.1). As it proceeds to the second step (Figure 6.2), users make their selection of objective criteria to further filter the options (methodologies) following the non-compensatory strategy (lexicographic process). The three highest scored methods (see Figure 6.3), which pass the most selected criteria, are ranked accordingly, and the one in the highest rank is set as the first reference (datum). Next, the three identified methods from the previous step are compared to each other following the compensatory strategy (modified Pugh's method) based on the preferred subjective criteria (Figure 6.4). The steps can be iterated for a maximum of 3 cycles, where in each round the reference is changed until each methodology has been a reference once. The dominant methodology from the 3 rounds is suggested as the best selection. A sketch of this two-stage selection loop is given below, followed by screenshots of md-Matrix.
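The following is a hypothetical Python sketch of that two-stage loop (lexicographic filtering on objective criteria, then a rotating-datum modified Pugh comparison on subjective criteria); the data layout, criterion names and helper names are assumptions for illustration and are not taken from the md-Matrix source:

```python
# Hypothetical sketch of the md-Matrix selection loop described above;
# the option layout, criterion names and function names are illustrative.

def lexicographic_filter(options, objective_criteria, keep=3):
    """Rank options by how many selected objective criteria they pass,
    keeping the top `keep` (the non-compensatory stage)."""
    passes = lambda o: sum(1 for c in objective_criteria if o["objective"][c])
    return sorted(options, key=passes, reverse=True)[:keep]

def pugh_round(options, datum, subjective_criteria):
    """One modified Pugh round: +1 better / -1 worse / 0 same vs the datum."""
    return {
        o["name"]: sum(
            (o["subjective"][c] > datum["subjective"][c])
            - (o["subjective"][c] < datum["subjective"][c])
            for c in subjective_criteria
        )
        for o in options
    }

def pdadm_select(options, objective_criteria, subjective_criteria):
    shortlist = lexicographic_filter(options, objective_criteria)
    totals = {o["name"]: 0 for o in shortlist}
    # Rotate the datum so each shortlisted option is the reference once
    # (at most three rounds for a shortlist of three).
    for datum in shortlist:
        for name, score in pugh_round(shortlist, datum, subjective_criteria).items():
            totals[name] += score
    # The dominant option across the rounds is suggested as the best selection.
    return max(totals, key=totals.get)
```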

Figure 6.1: Alternatives filtered categorically


Figure 6.2: The 12 objective criteria used in non-compensatory (lexicographic) process


Figure 6.3: Result obtained in non-compensatory process

Figure 6.4: The 12 subjective criteria used in compensatory process

md-Matrix as a Learning Tool

Along with providing a solution to the selection of a development methodology, md-Matrix can also be utilized as an educational tool, whether in academia or industry. Learning institutions can utilize it for teaching purposes, to educate students on the need for a well-structured process for developing mobile applications. As for industry, this tool can be used as one of the materials for training new interns and apprentice developers.

B. Case study 2: Choosing a Mobile Phone

Consumers are faced with purchase decisions almost every time a purchase is required. But not all decisions are treated the same. Some decisions are more complex than others and thus require more effort by the consumer; other decisions are fairly frequent and require little effort. Consumers will not simply go to a store or online catalog and spend their money in a rush. Purchasing usually takes place as a result of a series of decision making steps. The implication of buying behavior shows the need for a reliable decision making tool to assist consumers in making a less regretful and more effective decision (Häubl & Trifts, 2000; Christ, 2008).

It is also important for consumers to be able to decide on the purchased item with confidence and ease. Thus, a comprehensive and undemanding decision aid is much needed in the process. Another important aspect is the use of the decision aid in raising awareness about the consequences of actually choosing and purchasing the item. This can be achieved by organizing data with the purpose of presenting or displaying it to the decision maker (consumer) in a much clearer way than simply making a list of the alternatives. Within this perspective, the proposed PDADM is implemented to assist consumers in making purchasing decisions via the use of the prototype known as ep-Matrix (as in electronic purchasing matrix).

Features and Screenshots of ep-Matrix

The prototype (ep-Matrix) is developed to demonstrate an example of making a purchasing decision for a mobile phone. A well known brand of mobile phone is used for three reasons: the convenience of getting all the required data, the familiarity factor among consumers, and for the purpose of evaluation later on. Table 6.2 summarizes the features of ep-Matrix developed for this case study:

Table 6.2: Features of ep-Matrix

Alternatives filter - Mobile phone styles: Bar, Slider*, Touch Screen, Folder/Flip, QWERTY
Criteria - 13 objective, 9 subjective
Alternatives - Slider (6 models)
Feedback - Pop-up window, on-screen text, interface agent
* enabled in this prototype

The first step of ep-Matrix enables the user to filter the available phone models based on the preferred style (Figure 6.5). As it proceeds to the second step (Figure 6.6), users make their selection of objective criteria to further filter the options (phone models) following the non-compensatory strategy (lexicographic process). The three highest scored models (see Figure 6.7), which pass the most selected criteria, are ranked accordingly, and the one in the highest rank is set as the first reference (datum). Next, the three identified models from the previous step are compared to each other following the compensatory strategy (modified Pugh's method) based on the preferred subjective criteria (Figure 6.8). The steps can be iterated for a maximum of 3 cycles, where in each round the reference is changed until each model has been a reference once. The dominant model from the 3 rounds is suggested as the best selection. The following are screenshots of ep-Matrix.

Figure 6.5: Alternatives filtered categorically


Figure 6.6: The 13 objective criteria used in non-compensatory (lexicographic) process

Figure 6.7: Result obtained in non-compensatory process

Figure 6.8: The 9 subjective criteria used in compensatory process (modified Pugh‘s method)

VII. HELPFULNESS OF PDADM DRIVEN PROTOTYPES

This study intends to investigate users‘ perception of the helpfulness of the PDADM driven prototypes in both case studies. In measuring helpfulness, quantitative data need to be gathered through an instrument. In addition, subjective input through interviews and observations might help enrich the collected data. To develop the instrument for measuring helpfulness, an elicitation work, as summarized in Figure 7.1, was performed (Ariffin, 2009).

Figure 7.1: Summary of elicitation work

Figure 7.1 illustrates the processes involved in the instrument development, beginning with elicitation work to determine measuring items until the instrument is ready for pilot testing. The instrument was constructed based on the dimensions identified from the elicitation work. Later, measuring items were added based on the reviewed literature. Some modifications were made to the measuring items, in terms of rewording some items and repositioning some items into another dimension of the instrument.

In measuring the helpfulness of the PDADM driven prototypes, this study looks at four important dimensions: reliability, decision making effort, confidence, and decision process awareness. The instrument was then named Q-HELP, and it contains these four dimensions.

Table 7.1 presents the reliability of Q-HELP by dimension. In the evaluation, respondents are required to rate the helpfulness level for each dimension using a seven point Likert scale: 1 = strongly disagree, 2 = disagree, 3 = somewhat disagree, 4 = undecided, 5 = somewhat agree, 6 = agree and 7 = strongly agree. The respective measuring items can be seen in Table 7.2.

Table 7.1: Reliability of dimensions in Q-HELP

Dimensions                   Cronbach Alpha value
Reliability                  0.755
Decision making effort       0.689
Confidence                   0.906
Decision process awareness   0.771
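As a side note on how reliability figures such as those in Table 7.1 are typically obtained, here is a small sketch of Cronbach's alpha over Likert responses; the response matrix is invented sample data, not the study's data:

```python
# Illustrative computation of Cronbach's alpha for one Q-HELP dimension.
# Rows are respondents, columns are the dimension's Likert items (1-7).
# The response matrix below is invented sample data, not the study's.

def cronbach_alpha(responses):
    k = len(responses[0])  # number of items in the dimension
    def var(xs):           # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[j] for row in responses]) for j in range(k)]
    total_var = var([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

sample = [
    [5, 6, 5, 6],
    [4, 4, 5, 5],
    [6, 6, 6, 7],
    [3, 4, 4, 4],
    [5, 5, 6, 5],
]
print(round(cronbach_alpha(sample), 3))
```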

One hundred and seven respondents participated in the lab experiment: 63 of them evaluated the first case study, whereas 44 evaluated the second. The experiment proceeded in two steps for each case study. In the first step, participants were required to accomplish the selection task aided by another tool or material; the main concern was to study the process they went through before they could actually make a selection. In the second step, participants solved the same decision problem by making a selection with the assistance of the proposed PDADM driven prototype in each case study.

Upon completion of both steps, participants were requested to answer 26 questions covering all four dimensions of helpfulness in Q-HELP. The instrument recorded their perceptions and experiences of making a selection for the same decision problem in the experiment. Table 7.2 also depicts the mean responses for each item in Q-HELP answered by participants in the respective case studies.

Table 7.2: Q-HELP items and mean responses by each item for each case study

Reliability (md-Matrix, n=63 / ep-Matrix, n=44)
{name of prototype}* can be relied upon to function properly.  5.22 / 5.84
{name of prototype}* is suitable to my style of decision making.  5.02 / 5.43
{name of prototype}* is capable of helping me in making a choice.  5.25 / 5.80
{name of prototype}* provides the help that I need to make a selection.  5.33 / 5.75
{name of prototype}* provides the advice that I require to make my decision.  5.08 / 5.64
I would use {name of prototype}* if I were attempting to make a choice that is ―good enough‖ but not necessarily the best.  4.95 / 5.82
{name of prototype}* is suitable even during limited time to make a decision.  5.03 / 5.82
Group Mean A  5.13 / 5.73

Decision making effort
It was very time consuming to choose a {item} from the available options.  4.81 / 5.39
It was very difficult to choose a {item} from the available options.  4.43 / 5.27
{name of prototype}* allowed me to carefully consider the decision made.  5.35 / 5.84
The decision process in {name of prototype}* is logical to me.  5.30 / 6.14
The decision process in {name of prototype}* is simple to me.  5.19 / 5.91
I understand how the decision process in {name of prototype}* works.  5.17 / 5.70
I found it very easy to interpret the decision justification provided by {name of prototype}*.  5.06 / 5.77
Group Mean B  5.04 / 5.72

Confidence
I am satisfied with the recommended solution.  5.27 / 5.75
The recommended solution reflects my initial preferences.  5.16 / 5.61
I am confident that I am able to make a selection with {name of prototype}*.  5.17 / 5.86
I am confident that I can justify the selection that I made with {name of prototype}*.  5.17 / 5.93
I feel that the problem in making a selection is solved.  5.05 / 5.45
I am very pleased with my experience using {name of prototype}*.  5.48 / 5.77
Group Mean C  5.22 / 5.73

Decision process awareness
{name of prototype}* makes me realize I cannot get everything from just one alternative.  5.44 / 5.93
{name of prototype}* is an aid for me in clarifying what I want.  5.27 / 5.84
{name of prototype}* shows my subconscious decision process.  5.11 / 5.73
{name of prototype}* helps me not to be easily influenced by others in making a selection.  5.29 / 5.98
{name of prototype}* makes me more independent of others in making a selection.  5.22 / 6.00
I learned a lot about the problem using {name of prototype}*.  5.48 / 6.00
Group Mean D  5.30 / 5.91

*replaced with md-Matrix or ep-Matrix based on the respective case study

VIII. RESULTS

As mentioned earlier, the instrument used in evaluating the helpfulness of the PDADM driven prototypes looks at four important dimensions: reliability, decision making effort, confidence, and decision process awareness. Table 7.2 presents the mean responses to the items measuring the helpfulness of the prototypes in both case studies.

Questions A1 to A7 are used to assess the users‘ perceptions of the reliability of the prototypes. For case study 1, the group mean score of the items in dimension A was 5.13, indicating a moderately high perception of reliability. In case study 2, the group mean score of the same items was 5.73, indicating a high level of reliability.

Questions B1 to B7 are used to assess the users‘ perceptions of the effort invested in the decision making process with the assistance of the PDADM driven prototypes. For case study 1, the group mean score for the items in dimension B was 5.04, signifying a moderately high perception of decision making effort among respondents. As for case study 2, the group mean score of the same items was 5.72, indicating a high perception of decision making effort.

Questions C1 to C6 are used to assess the respondents‘ confidence in the solution and the procedure applied in the decision aids. In case study 1, the group mean score was 5.22, representing a moderate confidence level among respondents.


As for the second case study, the group mean score was 5.73, indicating a higher confidence level among respondents after using the PDADM driven prototypes.

For the last dimension of the instrument, six items (D1 to D6) were asked of the respondents in order to measure their perception of decision process awareness. In case study 1, the group mean score of the last six items in Q-HELP was 5.30, representing a moderate perception score on decision process awareness among respondents. For case study 2, the group mean score was 5.91, signifying a high perception score on decision process awareness.

From the analysis above, and as summarized in Figure 6.9, the mean scores of each dimension generally fall under the moderately high or high category, indicating that participants were inclined to perceive the use of the PDADM driven prototypes as helpful even in different personal decision situations. In both prototypes, participants rated decision process awareness highly, followed by their perceived confidence in, and the reliability of, the decision aids. Upon further analysis, participants responded highly to the items under reliability and confidence, as depicted in Figures 6.10 and 6.11. Therefore, it can be concluded that both decision aids:

i. provide the help that participants needed to make a selection,
ii. can be relied upon to function properly, and
iii. are capable of helping participants to make a choice.

Also, the participants were:

i. very pleased with their experience using the decision aids,
ii. confident that they can justify the selection made with the decision aids, and
iii. satisfied with the recommended solution.

Figure 6.9: Group means for helpfulness dimensions

Figure 6.10: Perceived reliability of md-Matrix and ep-Matrix

Figure 6.11: Perceived confidence in md-Matrix and ep-Matrix

IX. CONCLUSION

Despite the existence of various computerized decision aids, decision maker perceptions of the ideal decision strategy and technique have not been subjected to systematic investigation. In addressing this, the study seeks to contribute the following, along with achieving the previously stated objectives:

i. In general, this study contributes to the decision making area as well as cross-disciplinary areas related to the decision situation.
ii. A proposed decision making model for personal decisions, with emphasis on non-expert use.
iii. Two prototypes utilizing the proposed decision model in two different situations: a purchasing decision and an educational decision.
iv. Algorithms of the developed prototypes.
v. Instruments to measure users‘ perceived helpfulness of the prototypes.
vi. A comparative analysis of five decision techniques, which provides a research basis for related future studies.

X. REFERENCES

1) Adam, F. and Humphreys, P. (2008). Encyclopedia of Decision Making and Decision Support Technologies. Idea Group Inc.

2) Alidrisi, M.M. (1987). Use of multi attribute utility theory for personal decision making. International Journal of Systems Science, 18(12), 2229-2237.
3) Al-Shemmeri, T., Al-Kloub, B. and Pearman, A. (1997). Model Choice in Multicriteria Decision Aid. European Journal of Operational Research, 97, 550-560.
4) Afonso, A.P., Regateiro, F.S., and Silva, M.J. (1998). Dynamic Channels: A New Development Methodology for Mobile Computing Applications. Retrieved Jan 22, 2007, from http://www.di.fc.ul.pt/biblioteca/tech-reports.
5) Ariffin, A.M. (2009). Conceptual Design Model of Reality Learning Media (RLM): Towards Entertaining and Fun Electronic Learning Materials (eLM) (Ph.D. Dissertation, Universiti Utara Malaysia).
6) Arsham, H. (2004). Decision Making: Overcoming Serious Indecisiveness. Retrieved March 10, 2009, from http://home.ubalt.edu/ntsbarsh/opre640/partXIII.htm.
7) Atkinson, C. and Olla, P. (2004). Developing a wireless reference model for interpreting complexity in wireless projects. Industrial Management & Data Systems, 104, 262-272.
8) Avison, D.E. and Fitzgerald, G. (1990). Information Systems Development: Methodologies, Techniques and Tools. London: Blackwell.
9) Bahl, H.C. and Hunt, R.G. (1984). Decision-Making Theory and DSS Design. Data Base, 15(4), 10-14.
10) Baker, D., Bridges, D., Hunter, R., Johnson, G., Krupa, J., Murphy, J. and Sorenson, K. (2002). Guidebook to Decision-Making Methods, WSRC-IM-2002-00002. Retrieved from Department of Energy, USA website: http://emi-web.inel.gov/Nissmg/Guidebook_2002.pdf.
11) Bell, D.E., Raiffa, H., and Tversky, A. (1988). Descriptive, normative, and prescriptive interactions in decision making. In D. Bell, H. Raiffa, and A. Tversky (Eds.), Decision making: descriptive, normative, and prescriptive interactions (pp. 9-32). Cambridge: Cambridge University Press.
12) Bertini, E., Gabrielli, S., and Kimani, S. (2006). Appropriating and Assessing Heuristics for Mobile Computing. Proceedings of the Working Conference on Advanced Visual Interfaces AVI‘06, Venezia, Italy. 119-126.
13) Bridgman, P.W. (1922). Dimensional Analysis. New Haven, CT: Yale University Press.
14) Brown, R. (2008). Decision Aiding Research Needs. In Adam, F. and Humphreys, P. (Eds.), Encyclopedia of Decision Making and Decision Support Technologies (pp. 141-147). IGI Global.
15) Bronner, F. & de Hoog, R. (1982). Non-Expert Use of a Computerized Decision Aid. In Humphreys, P., Svenson, O. and Vári, A. (Eds.), Analysing and Aiding Decision Processes (pp. 281-299). North Holland: Amsterdam.
16) Christ, P. (2008). KnowThis: Marketing Basics. KnowThis Media.
17) Collins, T.R., Rossetti, M.D., Nachtmann, H.L. & Oldham, J.R. (2006). The use of multi-attribute utility theory to determine the overall best-in-class performer in a benchmarking study. Benchmarking: An International Journal, 13, 431-446.
18) Cosier, R.A. and Dalton, D.R. (1986). The Appropriate Choice and Implementation of Decision Strategies. Journal of Industrial Management & Data Systems, 86(3/4), 18-21. Abstract retrieved from http://www.emeraldinsight.com/10.1108/eb057436
19) Dyer, J.S., Fishburn, P.C., Steuer, R.E., Wallenius, J. and Zionts, S. (1992). Multiple Criteria Decision Making, Multiattribute Utility Theory: The Next Ten Years. Management Science, 38(5), 645-654.
20) Easwaran, K. (2009). Dominance-based Decision Theory. Unpublished manuscript. Retrieved from http://www.ocf.berkeley.edu/~easwaran/papers/decision.pdf
21) Einhorn, H.J. and Hogarth, R.M. (1981). Behavioral Decision Theory: Process of Judgment and Choice. Annual Review of Psychology, 32, 53-88.
22) Fishburn, P.C. (1967). Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments. American Society of Operations Research (ORSA), Baltimore, MD: U.S.A.
23) Gillespie, M. (2007). Resource Guide for the UMPC Software Developer. Intel.com
24) Häubl, G. and Trifts, V. (2000). Consumer Decision Making in Online Shopping Environments: The Effects of Interactive Decision Aids. Marketing Science, 19(1), 4-21.
25) Hayes, C.C. & Akhavi, F. (2008). Creating Effective Decision Aids for Complex Tasks. Journal of Usability Studies, 3(4), 152-172.
26) Heikkinen, M.T. and Still, J. (2005). Business Networks and New Mobile Service Development. Proceedings of the International Conference on Mobile Business (ICMB‘05). 144-151.
27) Heyes, I.S. (2002). Just Enough Wireless Computing. Upper Saddle River, NJ: Prentice Hall.
28) Işıklar, G. and Büyüközkan, G. (2007). Using a multi-criteria decision making approach to evaluate mobile phone alternatives. Computer Standards & Interfaces, 29, 265-274.
29) Jungermann, H. (1980). Speculations about Decision Theoretic Aids for Personal Decision Making. In Acta Psychologica 45 (pp. 7-34). North Holland.
30) Keeney, R. and Raiffa, H. (1993). Decisions with Multiple Objectives: Preference and Value Tradeoffs. Cambridge: Cambridge University Press.


31) König-Ries, B. (2009). Challenges in Mobile Application Development. it – Information Technology, 51(2), 69-71.
32) Law, W.S. (1996). Evaluating imprecision in engineering design (Ph.D. Dissertation, California Institute of Technology, Pasadena, California).
33) Linkov, I., Varghese, A., Jamil, S., Seager, T.P., Kiker, G. and Bridges, T. (2004). Multi-criteria decision analysis: A framework for structuring remedial decisions at contaminated sites. In Linkov, I. and Ramadan, A.B. (Eds.), Comparative Risk Assessment and Environmental Decision Making (pp. 15-54). New York: Springer.
34) March, S.T. and Smith, G. (1995). Design and Natural Science Research on Information Technology. Decision Support Systems, 15(4), 251-266.
35) McGuire, R. (2002). Decision Making. The Pharmaceutical Journal, 269, 647-649.
36) Miller, D.W. & Starr, M.K. (1969). Executive decisions and operations research. Englewood Cliffs, NJ: Prentice-Hall, Inc.
37) Naude, P., Lockett, G. and Holms, K. (1997). A Case Study of Strategic Engineering Decision Making Using Judgmental Modeling and Psychological Profiling. Transactions on Engineering Management, 44(3), 237-247.
38) Offermann, P., Levina, O., Schonherr, M. and Bub, U. (2009). Outline of a Design Science Research Process. Proceedings of DESRIST‘09, Malvern, PA: USA.
39) Payne, J., Bettman, J. and Johnson, E. (1993). The Adaptive Decision Maker. Cambridge University Press.
40) Power, D.J. (1998). Designing and Developing a Computerized Decision Aid - A Case Study. Retrieved December 10, 2009, from http://dssresources.com/papers/decisionaids.html.
41) Pugh, S. (1990). Total Design: Integrated Methods for Successful Product Engineering. Great Britain: Addison Wesley.
42) Rich, P. (1999). A Process for Effective Decision Making. Retrieved 5 April 2009 from http://www.selfhelpmagazine.com/article/decision-making
43) Saaty, T.L. (1977). A Scaling Method for Priorities in Hierarchical Structures. Journal of Mathematical Psychology, 15, 57-68.
44) Saaty, T.L. (1994). Fundamentals of Decision Making and Priority Theory with the AHP. Pittsburgh, PA: RWS Publications.
45) Simon, H.A. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129-138.
46) Soelberg, P.O. (1967). Unprogrammed Decision Making. Industrial Management Review, 8, 19-29.
47) Todd, P. & Benbasat, I. (1991). An Experimental Investigation of the Impact of Computer Based Decision Aids on Decision Making Strategies. Information Systems Research, 2(2), 87-115.
48) Triantaphyllou, E. (2000). Multi-Criteria Decision Making Methods: A Comparative Study. Norwell, MA: Springer.
49) Ullman, D.G. (2002). The Ideal Engineering Decision Support System. Retrieved March 10, 2009, from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.1827&rep=rep1&type=pdf
50) Westaby, J.D. (2005). Behavioral reasoning theory: Identifying new linkages underlying intentions and behavior. Organizational Behavior and Human Decision Processes, 98, 97-120.
51) Wooler, S. (1982). A Decision Aid for Structuring and Evaluating Career Choice Options. Journal of the Operational Research Society, 33(4), 343-351.
52) Zanakis, S.H., Solomon, A., Wishart, N. and Dublish, S. (1998). Multi-attribute decision making: A simulation comparison of select methods. European Journal of Operational Research, 107, 507-529.
53) Zannier, C., Chaisson, M. and Maurer, F. (2007). A model of design decision making based on empirical results on interviews with software designers. Information and Software Technology, 49, 637-653.
54) Zsambok, C.E., Beach, L.R. & Klein, G. (1992). A Literature Review of Analytical and Naturalistic Decision Making. Final technical report, Fairborn, OH: Klein Associates Inc.

XI. APPENDIX

Original Pugh Matrix

Modified Pugh Matrix #1 (MV1)

Modified Pugh Matrix #2 (MV2)

Modified Pugh Matrix #3 (MV3)

Modified Pugh Matrix #4 (MV4)


Security Provision For Miners Data Using Singular Value Decomposition In Privacy Preserving Data Mining

Narendar Machha1, M. Y. Babu2

Abstract- Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may encounter when releasing data to outside parties. This has brought out a new branch of data mining, known as Privacy Preserving Data Mining (PPDM). Privacy preservation is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component for preserving privacy in security-related data mining applications; we propose a Singular Value Decomposition (SVD) method for data distortion. We focus primarily on privacy preserving data clustering. Our proposed method distorts only confidential numerical attributes to meet privacy requirements.

Keywords- Privacy-Preserving Data Mining, Matrix Decomposition, Singular Value Decomposition, Nonnegative Matrix Factorization, data distortion, data utility.

I. INTRODUCTION

Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes ranging from increasing profitability to enhancing national security. The widespread application of data mining technologies has raised concerns about the trade secrecy of corporations and the privacy of innocent people contained in the datasets collected and used for data mining. It is necessary that data mining technologies designed for knowledge discovery across corporations, and for security purposes towards the general population, have sufficient privacy awareness to protect corporate trade secrecy and individual private information. Unfortunately, most standard data mining algorithms are not very efficient in terms of privacy protection, as they were originally developed mainly for commercial applications, in which different organizations collect and own their private databases and mine them for specific commercial purposes. In the case of inter-corporation and security data mining applications, data mining algorithms may be applied to datasets containing sensitive or private information. Data warehouses and government agencies may potentially have access to many databases collected from different sources and may extract any information from these databases. This potentially unlimited access to data and information raises the fear of possible abuse and promotes the call for privacy protection and due process of law. Privacy-preserving data mining techniques have been developed to address these concerns. The general goal of privacy-preserving data mining techniques is to hide sensitive individual data values from the outside world or from unauthorized persons, and simultaneously preserve the underlying data patterns and semantics, so that a valid and efficient decision model based on the distorted data can be constructed. In the best scenarios, this new decision model should be equivalent to, or even better than, the model using the original data from the viewpoint of decision accuracy. There are currently at least two broad classes of approaches to achieving this goal. The first class of approaches attempts to distort the original data values so that the data miners (analysts) have no means (or a greatly reduced ability) to derive the original values of the data. The second is to modify the data mining algorithms so that they allow data mining operations on distributed datasets without knowing the exact values of the data or without directly accessing the original datasets. This paper only discusses the first class of approaches. Interested readers may consult (Clifton, Kantarcioglu, Vaidya, Lin, & Zhu, 2003) and the references therein for discussions on distributed data mining approaches.

______
About-1: Assistant Professor, HITS College of Engineering (e-mail: [email protected]).
About-2: Assistant Professor, Aurora Engineering College ([email protected]).

II. BACKGROUND

The input to a data mining algorithm can in many cases be represented by a vector-space model, where a collection of records or objects is encoded as an n x m object-attribute matrix (Frankes, & Baeza-Yates, 1992). For example, the set of vocabulary (words or terms) in a dictionary can be the items forming the rows of the matrix, and the occurrence frequencies of all terms in a document are listed in a column of the matrix. A collection of documents thus forms a term-document matrix, commonly used in information retrieval. In the context of privacy preserving data mining, each column of the data matrix can contain the attributes of a person, such as the person's name, income, social security number, address, telephone number, medical records, etc. Datasets of interest often lead to a very high dimensional matrix representation (Achlioptas, 2004). It is observable that many real-world datasets have nonnegative values for attributes. In fact, many of the existing data distortion methods inevitably fall into the context of matrix computation. For instance, the additive noise method, which has the longest history in the privacy protection area and works by adding random noise to the data, can be viewed as a random matrix, and therefore its properties can be understood by studying the properties of random matrices (Kargupta, Sivakumar, & Ghosh, 2002; Mahta, 1991). Matrix decomposition in numerical linear algebra typically serves the purpose of finding a computationally convenient means to obtain a solution to a linear system. In the context of data mining, the main purpose of matrix decomposition is to obtain some form of simplified low-rank approximation to the original dataset, for understanding the structure of the data, particularly the relationships within the objects, within the attributes, and between the objects and the attributes (Hubert, Meulman, & Heiser, 2000). The study of matrix decomposition techniques in data mining, particularly in text mining, is not new, but the application of these techniques as data distortion methods in privacy-preserving data mining is a recent interest (Xu, Zhang, Han, & Wang, 2005). A unique characteristic of the matrix decomposition techniques, a compact reduced-rank representation that preserves dominant data patterns, stimulates researchers' interest in utilizing them to achieve a win-win of both a high degree of privacy preservation and a high level of data mining accuracy.

III. MAIN FOCUS

Data distortion is one of the most important parts of many privacy-preserving data mining tasks. The desired distortion methods must preserve data privacy and, at the same time, must keep the utility of the data after the distortion (Verykios, Bertino, Fovino, Provenza, Saygin, & Theodoridis, 2004). The classical data distortion methods are based on random value perturbation (Agrawal, & Srikant, 2000). The more recent ones are based on data matrix decomposition strategies (Wang, Zhong, & Zhang, 2006; Wang, Zhang, Zhong, & Xu, 2007; Xu, Zhang, Han, & Wang, 2006).

IV. UNIFORMLY DISTRIBUTED NOISE

The original data matrix A is added with a uniformly distributed noise matrix Eu. Here Eu is of the same dimension as A, and its elements are random numbers generated from a continuous uniform distribution on the interval from C1 to C2. The distorted data matrix Au is denoted as Au = A + Eu.

V. NORMALLY DISTRIBUTED NOISE

Similar to the previous method, here the original data matrix A is added with a normally distributed noise matrix En, which has the same dimension as A. The elements of En are random numbers generated from the normal distribution with mean μ and standard deviation ρ. The distorted data matrix An is denoted as An = A + En.

VI. SINGULAR VALUE DECOMPOSITION

Singular Value Decomposition (SVD) is a popular matrix factorization method in data mining and information retrieval. It has been used in practice to reduce the dimensionality of, and remove the noise in, noisy datasets (Berry, Drmac, & Jessup, 1999). The use of the SVD technique in data distortion is proposed in (Xu, Zhang, Han, & Wang, 2005). In (Wang, Zhang, Zhong, & Xu, 2007), the SVD technique is used to distort portions of the datasets. The SVD of the data matrix A is written as

A = U Σ V^T

where U is an n x n orthonormal matrix, Σ = diag[σ1, σ2, ..., σs] (s = min{m, n}) is an n x m diagonal matrix whose nonnegative diagonal entries (the singular values) are in descending order, and V^T is an m x m orthonormal matrix. The number of nonzero diagonal entries of Σ is equal to the rank of the matrix A. Due to the arrangement of the singular values in the matrix Σ in descending order, the SVD transformation has the property that the maximum variation among the objects is captured in the first dimension, as σ1 ≥ σi for i ≥ 2. Similarly, much of the remaining variation is captured in the second dimension, and so on. Thus, a transformed matrix with a much lower dimension can be constructed to represent the structure of the original matrix faithfully. Define

Ak = Uk Σk Vk^T

where Uk contains the first k columns of U, Σk contains the first k nonzero singular values, and Vk^T contains the first k rows of V^T. The rank of the matrix Ak is k. With k usually being small, the dimensionality of the dataset is reduced dramatically from min{m, n} to k (assuming all attributes are linearly independent). It has been proved that Ak is the best k-dimensional approximation of A in the sense of the Frobenius norm. In data mining applications, the use of Ak to represent A has another important implication. The removed part, Ek = A - Ak, can be considered as the noise in the original dataset (Xu, Zhang, Han, & Wang, 2006). Thus, in many situations, mining on the reduced dataset Ak may yield better results than mining on the original dataset A. When used for privacy-preserving purposes, the distorted dataset Ak can provide protection for data privacy and, at the same time, keep the utility of the original data, as it can faithfully represent the original data structure.
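The three distortion operators above are straightforward to express in matrix terms. The following is a minimal illustrative sketch in Python/NumPy, not code from the paper; the dataset A, the interval endpoints c1 and c2, the noise parameters mu and sigma, and the rank k are placeholder values chosen only for the example.

import numpy as np

def distort_uniform(A, c1, c2, rng):
    # Section IV: Au = A + Eu, with Eu drawn from U(c1, c2).
    return A + rng.uniform(c1, c2, size=A.shape)

def distort_normal(A, mu, sigma, rng):
    # Section V: An = A + En, with En drawn from N(mu, sigma).
    return A + rng.normal(mu, sigma, size=A.shape)

def distort_svd(A, k):
    # Section VI: Ak = Uk Sigma_k Vk^T, the best rank-k approximation
    # of A in the Frobenius norm.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.random((100, 10))             # toy 100-object, 10-attribute dataset
A_u = distort_uniform(A, -0.1, 0.1, rng)
A_n = distort_normal(A, 0.0, 0.1, rng)
A_k = distort_svd(A, k=3)
E_k = A - A_k                         # the removed part, viewed as noise
print(np.linalg.norm(E_k, "fro"))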


VII. NONNEGATIVE MATRIX FACTORIZATION

Given an n x m nonnegative matrix dataset A with Aij ≥ 0 and a prespecified positive integer k ≤ min{n, m}, the nonnegative matrix factorization (NMF) finds two nonnegative matrices W ∈ R^(n x k) with Wij ≥ 0 and H ∈ R^(k x m) with Hij ≥ 0, such that A ≈ WH and the objective function

f(W, H) = ||A - WH||F

is minimized, where ||·||F is the Frobenius norm. The matrices W and H may have many other desirable properties in data mining applications. Several algorithms to compute nonnegative matrix factorizations for applications of practical interest are proposed in (Lee, & Seung, 1999; Pascual-Montano, Carazo, Kochi, Lehmann, & Pascual-Marqui, 2006). Some of these algorithms are modified in (Wang, Zhong, & Zhang, 2006) to compute nonnegative matrix factorizations for enabling privacy preservation in datasets for data mining applications. Similar to the sparsified SVD techniques, sparsification techniques can be used to drop small-size entries from the computed matrix factors to further distort the data values (Wang, Zhong, & Zhang, 2006). In text mining, NMF has an advantage over SVD in the sense that if the data values are nonnegative in the original dataset, NMF maintains their nonnegativity, but SVD does not. The nonnegative constraints can lead to a parts-based representation, because they allow only additive, not subtractive, combinations of the original basis vectors (Lee, & Seung, 1999). Thus, dataset values from NMF have some meaningful interpretations in the original sense. On the contrary, data values from SVD are no longer guaranteed to be nonnegative, and there is no obvious meaning for the negative values in the SVD matrices. In the context of privacy preserving, on the other hand, the negative values in the dataset may actually be an advantage, as they further obscure the properties of the original datasets.
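As a companion to the description above, here is a small Python/NumPy sketch of the classical multiplicative update rules of Lee and Seung for the Frobenius objective. It is an illustration only; the sparsified variants of Wang, Zhong, and Zhang (2006) add a further step (dropping small entries of W and H) that is not shown here, and the dataset and rank are placeholder values.

import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    # Multiplicative updates for min ||A - WH||F with W, H >= 0.
    # Nonnegativity is preserved because each update multiplies the
    # current factor by a ratio of nonnegative quantities.
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.random.default_rng(1).random((50, 17))  # nonnegative toy dataset
W, H = nmf(A, k=5)
A_wh = W @ H                   # distorted dataset, still nonnegative
print(np.linalg.norm(A - A_wh, "fro"))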
VIII. UTILITY OF THE DISTORTED DATA

Experimental results obtained in (Wang, Zhang, Zhong, & Xu, 2007; Wang, Zhong, & Zhang, 2006; Xu, Zhang, Han, & Wang, 2006; Xu, Zhang, Han, & Wang, 2005), using both synthetic and real-world datasets with a classification algorithm, show that both SVD and NMF techniques provide a much higher degree of data distortion than the standard data distortion techniques based on adding uniformly distributed or normally distributed noise. In terms of the accuracy of the data mining algorithm, techniques based on adding uniformly distributed or normally distributed noise sometimes degrade the accuracy of the classification results, compared with applying the algorithm on the original, undistorted datasets. On the other hand, both SVD and NMF techniques can generate distorted datasets that yield better classification results than applying the algorithm directly on the original, undistorted datasets. This is surprising, as we would intuitively expect data mining algorithms applied on the distorted datasets to produce less accurate results than when applied on the original datasets. It is not clear why the distorted data from SVD and NMF are better for the classification algorithm used to obtain the experimental results. The hypothesis is that both SVD and NMF may have some capacity to remove the noise from the original datasets by removing small-size matrix entries. Thus, the distorted datasets from SVD and NMF look like "cleaned" datasets. The distorted datasets from the techniques based on adding either uniformly distributed or normally distributed noise do not have this property; they actually generate "noisy" datasets in order to distort data values.

IX. FUTURE TRENDS

Using matrix decomposition-based techniques in data distortion for privacy-preserving data mining is a relatively new trend. This class of data privacy-preserving approaches has many desirable advantages over the more standard privacy-preserving data mining approaches. There are many unanswered questions in this new research direction. For example, a classical problem in SVD-based dimensionality reduction techniques is to determine the optimal rank of the reduced dataset matrix. Although in data distortion applications the rank of the reduced matrix does not seem to sensitively affect the degree of the data distortion or the level of the accuracy of the data mining results (Wang, Zhang, Zhong, & Xu, 2007), it is still of both practical and theoretical interest to be able to choose a good rank size for the reduced data matrix. Unlike the data distortion techniques based on adding either uniformly distributed or normally distributed noise, SVD and NMF do not maintain some statistical properties of the original datasets, such as the mean of the data attributes. Such statistical properties may or may not be important in certain data mining applications. It would be desirable to design matrix decomposition-based data distortion techniques that maintain these statistical properties. The SVD and NMF data distortion techniques have been used with support vector machine based classification algorithms (Xu, Zhang, Han, & Wang, 2006). It is not clear if they are equally applicable to other data mining algorithms. It is certainly of interest for the research community to experiment with these data distortion techniques on other data mining algorithms. There is also a need to develop techniques to quantify the level of data privacy preserved in the data distortion process. Although some measures for data distortion and data utility are defined in (Xu, Zhang, Han, & Wang, 2006), they are not directly related to the concept of privacy preservation in datasets.

X. CONCLUSION

We have presented two classes of matrix decomposition-based techniques for data distortion to achieve privacy preservation in data mining applications. These techniques are based on matrix factorization techniques commonly practiced in matrix computation and numerical linear algebra. Although their application in text mining is not new, their application in data distortion for privacy-preserving data mining is a recent attempt. Previous experimental results have demonstrated that these data distortion techniques are highly effective for high-accuracy privacy protection, in the sense that they can provide a high degree of data distortion and maintain a high level of data utility with respect to the data mining algorithms. The computational methods for SVD and NMF are well developed in the matrix computation community. Very efficient software packages are available, either in standard matrix computation packages such as MATLAB or from several websites maintained by individual researchers. The availability of these software packages greatly accelerates the application of these and other matrix decomposition and factorization techniques in data mining and other application areas.

XI. REFERENCES

1) Agrawal, R., & Srikant, R. (2000). Privacy-preserving data mining. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 439-450, Dallas, TX.
2) Berry, M. W., Drmac, Z., & Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41, 335-362.
3) Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., & Zhu, M. (2003). Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations, 4(2), 1-7.
4) Gao, J., & Zhang, J. (2003). Sparsification strategies in latent semantic indexing. Proceedings of the 2003 Text Mining Workshop, pp. 93-103, San Francisco, CA.
5) Hubert, L., Meulman, J., & Heiser, W. (2000). Two purposes for matrix factorization: a historical appraisal. SIAM Review, 42(4), 68-82.
6) Kargupta, H., Sivakumar, K., & Ghosh, S. (2002). Dependency detection in mobimine and random matrices. Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 250-262, Helsinki, Finland.
7) Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788-791.
8) Mahta, M. L. (1991). Random Matrices. 2nd edition. Academic Press, London.
9) Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., & Pascual-Marqui, R. D. (2006). Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 403-415.
10) Verykios, V. S., Bertino, E., Fovino, I. N., Provenza, L. P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, 33(1), 50-57.
11) Wang, J., Zhong, W. J., & Zhang, J. (2006). NNMF-based factorization techniques for high-accuracy privacy protection on non-negative-valued datasets. Proceedings of the IEEE Conference on Data Mining 2006, International Workshop on Privacy Aspects of Data Mining (PADM 2006), pp. 513-517, Hong Kong, China.
12) Xu, S., Zhang, J., Han, D., & Wang, J. (2006). Singular value decomposition based data distortion strategy for privacy protection. Knowledge and Information Systems, 10(3), 383-397.


An Efficient Synchronous Checkpointing Protocol for Mobile Distributed Systems

Parveen Kumar1, Rachit Garg2

Abstract- Recent years have witnessed the rapid development of mobile communications, which have become part of everyday life for most people. For transparently adding fault tolerance to mobile distributed systems, minimum-process coordinated checkpointing is preferable, but it may require blocking of processes, extra synchronization messages, or the taking of some useless checkpoints. All-process checkpointing may lead to exceedingly high checkpointing overhead. To balance the checkpointing overhead and the loss of computation on recovery, we propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has been executed a fixed number of times. In the minimum-process coordinated checkpointing algorithm, an effort has been made to optimize the number of useless checkpoints and the blocking of processes, using a probabilistic approach and by computing an interacting set of processes at the beginning. We try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others. We also reduce the size of the checkpoint sequence number piggybacked on each computation message.

I. BACKGROUND

Recent years have witnessed the rapid development of mobile communications, which have become part of everyday life for most people. In the future, we expect more and more people to use portable units such as notebooks or personal data assistants. With the increasing use of small portable computers, wireless networks and satellites, a trend to support "computing on the move" has emerged. This trend is known as mobile computing, or "anytime, anywhere" computing. It enables users to access and exchange information while they travel, roam in their home environments, or work at their desktop computers. Mobile Hosts (MHs) are increasingly common in distributed systems due to their availability, cost, and mobile connectivity. An MH is a computer that may retain its connectivity with the rest of the distributed system through a wireless network while on the move. An MH communicates with the other nodes of the distributed system via a special node called a mobile support station (MSS). A "cell" is a geographical area around an MSS in which it can support an MH. An MSS has both wired and wireless links and acts as an interface between the static network and a part of the mobile network. Static nodes are connected by a high-speed wired network [1].

A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. The state of the channels corresponding to a global state is the set of messages sent but not yet received. A global state is said to be "consistent" if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost [5]. To recover from a failure, the system restarts its execution from a previous consistent global state saved on stable storage during fault-free execution. This saves all the computation done up to the last checkpointed state, and only the computation done thereafter needs to be redone. In independent checkpointing, processes do not synchronize their checkpointing activity and are allowed to record their local checkpoints in an independent way. After a failure, the system searches for a consistent global state by tracking the dependencies from stable storage. The main advantage of this approach is that no control messages need to be exchanged during checkpointing. But it requires each process to keep several checkpoints in stable storage, and there is no certainty that a globally consistent state can be built; it may require cascaded rollbacks that lead back to the initial state due to the domino effect [6]. Acharya and Badrinath [1] were the first to present an uncoordinated checkpointing algorithm for mobile computing systems. In their algorithm, an MH takes a local checkpoint whenever a message reception is preceded by a message sent at that MH. If the sends and receives of messages are interleaved, the number of local checkpoints will be equal to half the number of computation messages, which may degrade the system performance.
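The orphan-message condition above is easy to state operationally. The sketch below, illustrative Python and not part of the paper, checks a recorded global state for consistency given the sets of message identifiers whose send and receive events were captured in the local checkpoints.

def is_consistent(sent_ids, received_ids):
    # A global state is consistent iff no recorded receive lacks a
    # recorded send, i.e., there are no orphan messages.
    orphans = set(received_ids) - set(sent_ids)
    return len(orphans) == 0

# m3's receive is recorded but its send is not: the state is inconsistent.
print(is_consistent({"m1", "m2"}, {"m1", "m2", "m3"}))   # False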
In coordinated or synchronous checkpointing, processes take checkpoints in such a manner that the resulting global state is consistent. Mostly it follows a two-phase commit structure [2], [5], [6], [7], [10], [15]. In the first phase, processes take tentative checkpoints, and in the second phase, these are made permanent. The main advantage is that only one permanent checkpoint and at most one tentative checkpoint need to be stored. In the case of a fault, processes roll back to the last checkpointed state [6]. The Chandy-Lamport [5] algorithm is the earliest non-blocking all-process coordinated checkpointing algorithm.

The existence of mobile nodes in a distributed system introduces new issues that need proper handling while designing a checkpointing algorithm for such systems [1], [4], [14], [16]. These issues are mobility, disconnections, finite power sources, vulnerability to physical damage, lack of stable storage, etc. Prakash and Singhal [14] proposed a nonblocking minimum-process coordinated checkpointing protocol for mobile distributed systems. They proposed that a good checkpointing protocol for mobile distributed systems should have low overheads on MHs and wireless channels, and should avoid awakening an MH in doze-mode operation. The disconnection of an MH should not lead to an infinite wait state. The algorithm should be non-intrusive and should force a minimum number of processes to take their local checkpoints. In minimum-process coordinated checkpointing algorithms, either some blocking of the processes takes place [3], [10], [11], or some useless checkpoints are taken [4], [15].

In minimum-process coordinated checkpointing algorithms, a process Pi takes its checkpoint only if it is a member of the minimum set (a subset of the interacting processes). A process Pi is in the minimum set only if the checkpoint initiator process is transitively dependent upon it. Pj is directly dependent upon Pk only if there exists a message m such that Pj receives m from Pk in the current checkpointing interval (CI) and Pk has not taken its permanent checkpoint after sending m. The ith CI of a process denotes all the computation performed between its ith and (i+1)th checkpoints, including the ith checkpoint but not the (i+1)th checkpoint.
Koo and Toueg [10] proposed a minimum-process coordinated checkpointing algorithm for distributed systems at the cost of blocking processes during checkpointing. This algorithm requires a minimum number of synchronization messages and checkpoints, but each process uses monotonically increasing labels in its outgoing messages. The initiator process sends the checkpoint request to Pi only if it has received a message m from Pi in the current CI. Similarly, Pi sends the checkpoint request to other processes. In this way, a checkpointing tree is formed, and finally the leaf-node processes take checkpoints. The time taken to collect a coordinated checkpoint in mobile systems may be too large due to mobility, disconnections, and unreliable wireless channels, and the extensive blocking of processes may degrade the system performance. Cao and Singhal [4] achieved non-intrusiveness in the minimum-process algorithm by introducing the concept of mutable checkpoints. Kumar and Kumar [21] proposed a minimum-process coordinated checkpointing algorithm for mobile distributed systems, where the number of useless checkpoints and the blocking of processes are reduced using a probabilistic approach. Singh and Cabillic [20] proposed a minimum-process non-intrusive coordinated checkpointing protocol for deterministic mobile systems, where anti-messages of selective messages are logged during checkpointing. Higaki and Takizawa [8], and Kumar et al. [17], proposed hybrid checkpointing protocols in which MHs checkpoint independently and MSSs checkpoint synchronously. Neves et al. [13] gave a time-based, loosely synchronized coordinated checkpointing protocol that removes the overhead of synchronization and piggybacks an integer csn (checkpoint sequence number). Pradhan et al. [19] showed that asynchronous checkpointing with message logging is quite effective for checkpointing mobile systems.

Most of the proposed checkpointing algorithms do not address multiple concurrent initiations, as these may exhaust the limited battery and congest the wireless channels. The authors of [4] claim that their algorithm supports concurrent initiations, but in [15] the authors prove that the algorithm in [4] is designed to handle only the situation where the system has one checkpoint initiator at a time, and that it can cause inconsistency when there are multiple forced checkpoints or multiple concurrent checkpoint initiations. In [22], the author points out the following problems in allowing concurrent initiations in minimum-process checkpointing protocols, particularly in the case of mobile distributed systems:
i) If Pi and Pj concurrently initiate checkpointing and Pj belongs to the minimum set of Pi, then Pj's initiation will be a redundant one. Some processes in Pj's minimum set will unnecessarily take multiple checkpoints while hardly advancing their recovery line. In other words, an MH may be asked to store multiple checkpoints in its local disk. It may also transfer multiple checkpoints to its local MSS.
ii) Sometimes, multiple triggers need to be piggybacked onto normal messages. A trigger contains the initiator process identification and its csn. Even if a process takes a checkpoint and no concurrent initiation is going on, it will piggyback its trigger unnecessarily. If we do not allow concurrent initiation, no trigger is required to be piggybacked onto normal messages. Hence, concurrent initiations increase message size.
The authors of [23] have proposed a minimum-process coordinated checkpointing algorithm for mobile distributed systems where no useless checkpoints are taken and an effort is made to minimize the blocking of processes. They captured the transitive dependencies during the normal execution. The Z-dependencies are well taken care of in this protocol. They also avoided collecting the dependency vectors of all processes to compute the minimum set. In [24], the authors propose a nonblocking coordinated checkpointing algorithm for mobile computing systems, which requires only a minimum number of processes to take permanent checkpoints. They reduce the message complexity as compared to the Cao-Singhal algorithm [4], while keeping the number of useless checkpoints unchanged.
II. INTRODUCTION

The system model is similar to [3], [4]. A mobile computing system consists of a large number of MHs and relatively fewer MSSs. The distributed computation we consider consists of n spatially separated sequential processes, denoted P0, P1, ..., Pn-1, running on fail-stop MHs or on MSSs. Each MH or MSS has one process running on it. The processes do not share common memory or a common clock. Message passing is the only way for processes to communicate with each other. Each process progresses at its own speed, and messages are exchanged through reliable channels whose transmission delays are finite but arbitrary. We assume the processes to be non-deterministic. Similar to [3], [21], [22], the initiator process collects the dependency vectors of all processes and computes the tentative minimum set.

Suppose that, during the execution of the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj. Pj receives m such that it has not taken its checkpoint for the current initiation and does not know whether it will get the checkpoint request. If Pj takes its checkpoint after processing m, m will become an orphan. In order to avoid such orphan messages, we use the following technique, as mentioned in [21]. If Pj has sent at least one message to a process, say Pk, and Pk is in the tentative minimum set, there is a good probability that Pj will get the checkpoint request. Therefore, Pj takes its mutable checkpoint before processing m [4]. In this case, most probably, Pj will get the checkpoint request and its mutable checkpoint will be converted into a permanent one. Alternatively, the message is buffered at Pj, and Pj processes m only after taking its tentative checkpoint or after getting commit, as in [22].

In minimum-process checkpointing, some processes may not be included in the minimum set for several checkpoint initiations due to a typical dependency pattern, and they may starve for checkpointing. In the case of a recovery after a fault, the loss of computation at such processes may be unreasonably high [22]. In mobile systems, the checkpointing overhead is quite high in all-process checkpointing [14]. Thus, to balance the checkpointing overhead and the loss of computation on recovery, we design a hybrid checkpointing algorithm for mobile distributed systems, where an all-process checkpoint is taken after a certain number of minimum-process checkpoints.

In coordinated checkpointing, if a single process fails to take its checkpoint, all the checkpointing effort goes to waste, because each process has to abort its tentative checkpoint. In order to take the tentative checkpoint, an MH needs to transfer large checkpoint data to its local MSS over wireless channels. Hence, the loss of checkpointing effort may be exceedingly high. Therefore, we propose that in the first phase, all concerned MHs take a soft checkpoint only. A soft checkpoint is similar to a mutable checkpoint [4] and is stored in the memory of the MH only. In this case, if some process fails to take its checkpoint in the first phase, the MHs need to abort their soft checkpoints only. The effort of taking a soft checkpoint is negligible compared to that of a tentative one. When the initiator comes to know that all relevant processes have taken their soft checkpoints, it asks all relevant processes to enter the second phase, in which a process converts its soft checkpoint into a tentative one. Finally, the initiator issues the commit request.

In the present study, we present a hybrid scheme, where an all-process checkpoint is enforced after executing the minimum-process algorithm a fixed number of times, as in [22]. In the first phase, the MHs in the minimum set are required to take soft checkpoints only. In the minimum-process algorithm, a process takes its forced checkpoint only if it has a good probability of getting the checkpoint request, as in [21].

III. THE PROPOSED CHECKPOINTING SCHEME

A. An Example

We explain the minimum-process checkpointing algorithm with the help of an example. In Figure 1, at time t1, P1 initiates the checkpointing process and sends a request to all processes for their dependency vectors. At time t2, P1 receives the dependency vectors from all processes and computes the tentative minimum set (mset[]) as in [21], which in the case of Figure 1 is {P0, P1, P2}. P1 sends this tentative minimum set to all processes. A process takes its soft checkpoint if it is a member of the tentative minimum set. When P0 and P2 get the mset[], they find themselves in the mset[]; therefore, they take their soft checkpoints. When P3, P4 and P5 get the mset[], they find that they are not among its members; therefore, they do not take their checkpoints. P1 sends m8 after taking its checkpoint, and P0 receives m8 before getting the mset[]. In this case, P0 buffers m8 and processes it only after taking its soft checkpoint. After taking its soft checkpoint, P1 sends m11 to P3. At the time of receiving m11, P3 has received the mset[] and has not taken its checkpoint; therefore, P3 takes the bitwise logical AND of sendv3[] and mset[] and finds that the resultant vector is not all zeroes (sendv3[1]=1 due to m3; mset[2]=1). P3 concludes that it will most probably get the checkpoint request in the current initiation; therefore, it takes its mutable checkpoint before processing m11. When P2 takes its soft checkpoint, it finds that it is dependent upon P3 and that P3 is not in the minimum set (known locally); therefore, P2 sends a checkpoint request to P3. On receiving the checkpoint request, P3 converts its mutable checkpoint into a soft one. After taking its checkpoint, P2 sends m13 to P4. P4 takes the bitwise logical AND of sendv4[] and mset[] and finds the resultant vector to be all zeroes (sendv4[] = [000001]; mset[] = [111000]). P4 concludes that it will most probably not get the checkpoint request in the current initiation; therefore, P4 does not take a mutable checkpoint but buffers m13. P4 processes m13 only after getting the commit request. P5 processes m14, because it has not sent any message since its last permanent checkpoint. After taking its checkpoint, P1 sends m12 to P2. P2 processes m12, because it has already taken its checkpoint in the current initiation. At time t3, P1 receives responses from all relevant processes and issues the tentative checkpoint request, along with the exact minimum set {P0, P1, P2, P3}, to all processes. On receiving the tentative checkpoint request, all relevant processes convert their soft checkpoints into tentative ones and inform the initiator. Finally, at time t4, initiator P1 issues commit. On receiving commit, the following actions are taken: a process in the minimum set converts its tentative checkpoint into a permanent one and discards its earlier permanent checkpoint, if any; a process not in the minimum set discards its mutable checkpoint, if any, or processes the buffered messages, if any.
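The decision rule used by P3 and P4 in this example can be summarized in a few lines. The sketch below is an illustration with hypothetical names, not the paper's pseudocode: a process that receives a computation message during an initiation ANDs its send vector with the tentative minimum set to estimate whether a checkpoint request is likely to reach it.

def on_message_during_initiation(sendv, mset):
    # sendv[i] = 1 if this process sent a message to P_i in the current
    # checkpointing interval; mset[i] = 1 if P_i is in the tentative
    # minimum set received from the initiator.
    if any(s & m for s, m in zip(sendv, mset)):
        return "take a mutable checkpoint, then process the message"
    return "buffer the message until the commit request arrives"

# P3 on receiving m11: sendv3 AND mset is nonzero -> mutable checkpoint.
print(on_message_during_initiation([0, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0]))
# P4 on receiving m13: sendv4 AND mset is all zeroes -> buffer m13.
print(on_message_during_initiation([0, 0, 0, 0, 0, 1], [1, 1, 1, 0, 0, 0]))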


(Figure 1, illustrating the example above, is not reproduced here.)

B. Handling Node Mobility and Disconnections

Suppose an MH, say MHi, disconnects from an MSS, say MSSk. It stores its own checkpoint, say disconnect_ckpti, and other support information, e.g. ddv[], at MSSk. During the disconnection period, MSSk acts on behalf of MHi as follows. If a checkpointing process is initiated and MHi is in the minimum set, MSSk converts its disconnected checkpoint into a permanent one. On global checkpoint commit, MSSk also updates MHi's ddv[], as if it were a normal process. On receipt of messages for MHi, MSSk stores them in a queue without updating ddv[]. When MHi enters the cell of MSSj, it is connected to MSSj if g_chkpt is reset; otherwise, it waits for g_chkpt to be reset. Before connection, MSSj collects MHi's ddv[] and buffered messages from MSSk, and MSSk discards MHi's support information and disconnect_ckpti. The buffered messages are processed by MHi in the order of their receipt at MSSk. MHi's ddv[] is updated on the processing of the buffered messages.

C. Comparison with an Existing Non-blocking Algorithm

In the Cao-Singhal algorithm [20], suppose Pi receives m from Pj before taking its checkpoint and Pi is in the minimum set. In this case, after taking its checkpoint, Pi sends a checkpoint request to Pj due to m. If Pj has taken some permanent checkpoint after sending m, the checkpoint request to Pj is useless. To enable Pj to decide whether the checkpoint request is useful, Pi also piggybacks csni[j] and a huge data structure MR[] along with the checkpoint request to Pj. These useless checkpoint requests and piggybacked data structures increase the message complexity of the algorithm, whereas in our algorithm no such useless checkpoint requests are sent and no such information is piggybacked onto checkpoint requests. The csni[j] is an integer; its size is 4 bytes. In the worst case, the size of MR[] is (4n + n/8) bytes, where n is the number of processes in the distributed system. Intuitively, we can say that the number of useless checkpoints in the proposed algorithm will be negligibly small as compared to the algorithm of [20].

The proposed protocol suffers from the following limitations with respect to the existing algorithm [20]. The initiator MSS collects the dependencies of all processes, computes the tentative minimum set, and broadcasts the tentative minimum set along with the checkpoint request to all MSSs. The initiator MSS also broadcasts the exact minimum set along with the commit request on the static network. Blocking of processes also takes place. Concurrent executions of the algorithm are avoided.

IV. CONCLUSIONS

We propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has been executed a fixed number of times. In minimum-process checkpointing, we try to reduce the number of useless checkpoints and the blocking of processes, and we have proposed a probabilistic approach to reduce the number of useless checkpoints. Thus, the proposed protocol is simultaneously able to reduce the useless checkpoints and the blocking of processes, at the small cost of maintaining and collecting dependencies and piggybacking checkpoint sequence numbers onto normal messages. Concurrent initiations of the proposed protocol do not cause its concurrent executions. We also try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others.

V. REFERENCES

1) Acharya A. and Badrinath B. R., "Checkpointing Distributed Applications on Mobile Computers," Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.
2) Cao G. and Singhal M., "On Coordinated Checkpointing in Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 12, pp. 1213-1225, December 1998.
3) Cao G. and Singhal M., "On the Impossibility of Min-Process Non-Blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems," Proceedings of the International Conference on Parallel Processing, pp. 37-44, August 1998.
4) Cao G. and Singhal M., "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001.
5) Chandy K. M. and Lamport L., "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computing Systems, vol. 3, no. 1, pp. 63-75, February 1985.
6) Elnozahy E. N., Alvisi L., Wang Y. M. and Johnson D. B., "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
7) Elnozahy E. N., Johnson D. B. and Zwaenepoel W., "The Performance of Consistent Checkpointing," Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 39-47, October 1992.
8) Higaki H. and Takizawa M., "Checkpoint-Recovery Protocol for Reliable Mobile Systems," Transactions of Information Processing Japan, vol. 40, no. 1, pp. 236-244, January 1999.
9) Kim J. L. and Park T., "An Efficient Protocol for Checkpointing Recovery in Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, pp. 955-960, August 1993.
10) Koo R. and Toueg S., "Checkpointing and Roll-Back Recovery for Distributed Systems," IEEE Transactions on Software Engineering, vol. 13, no. 1, pp. 23-31, January 1987.
11) Parveen Kumar and R. K. Chauhan, "A Coordinated Checkpointing Protocol for Mobile Computing Systems," International Journal of Information and Computing Science, vol. 9, no. 1, pp. 18-27, 2006.
12) Lalit Kumar, M. Misra and R. C. Joshi, "Low Overhead Optimal Checkpointing for Mobile Distributed Systems," Proceedings of the 19th IEEE International Conference on Data Engineering, pp. 686-688, 2003.
13) Neves N. and Fuchs W. K., "Adaptive Recovery for Mobile Environments," Communications of the ACM, vol. 40, no. 1, pp. 68-74, January 1997.
14) Prakash R. and Singhal M., "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 10, pp. 1035-1048, October 1996.
15) Weigang Ni, Susan V. Vrbsky and Sibabrata Ray, "Pitfalls in Nonblocking Checkpointing," Journal of Interconnection Networks, vol. 1, no. 5, pp. 47-78, March 2004.
16) Parveen Kumar, Lalit Kumar and R. K. Chauhan, "A Low Overhead Non-intrusive Hybrid Synchronous Checkpointing Protocol for Mobile Systems," Journal of Multidisciplinary Engineering Technologies, vol. 1, no. 1, pp. 40-50, 2005.
17) Lalit Kumar, Parveen Kumar and R. K. Chauhan, "Logging Based Coordinated Checkpointing in Mobile Distributed Computing Systems," IETE Journal of Research, vol. 51, no. 6, 2005.
18) Lamport L., "Time, Clocks and the Ordering of Events in a Distributed System," Communications of the ACM, vol. 21, no. 7, pp. 558-565, 1978.
19) Pradhan D. K., Krishna P. P. and Vaidya N. H., "Recovery in Mobile Wireless Environment: Design and Trade-off Analysis," Proceedings of the 26th International Symposium on Fault-Tolerant Computing, pp. 16-25, 1996.
20) Pushpendra Singh and Gilbert Cabillic, "A Checkpointing Algorithm for Mobile Computing Environment," LNCS, no. 2775, pp. 65-74, 2003.
21) Lalit Kumar Awasthi and P. Kumar, "A Synchronous Checkpointing Protocol for Mobile Distributed Systems: Probabilistic Approach," International Journal of Information and Computer Security, vol. 1, no. 3, pp. 298-314, 2007.
22) Parveen Kumar, "A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems," Mobile Information Systems, vol. 4, no. 1, pp. 13-32, 2007.
23) Kumar P. and Khunteta A., "A Minimum-Process Coordinated Checkpointing Protocol for Mobile Distributed Systems," International Journal of Computer Science Issues, vol. 7, issue 3, 2010.
24) Garg R. and Kumar P., "A Nonblocking Coordinated Checkpointing Algorithm for Mobile Computing Systems," International Journal of Computer Science Issues, vol. 7, issue 3, 2010.


A Fuzzy Co-Clustering approach for Clickstream Data Pattern

R. Rathipriya1, Dr. K. Thangavel2

Abstract- Web usage mining is a very important tool for extracting hidden business intelligence from large databases. The extracted information provides organizations with the ability to produce results more effectively, to improve their businesses and to increase sales. Co-clustering is a powerful bipartition technique which identifies groups of users associated with groups of web pages. These associations are quantified to reveal the users' interest in the different web page clusters. In this paper, a Fuzzy Co-Clustering algorithm is proposed for clickstream data, to identify subsets of users with similar navigational behavior/interest over a subset of web pages of a web site. Targeting user groups for various promotional activities is an important aspect of marketing practice. Experiments are conducted on a real dataset to prove the efficiency of the proposed algorithm. The results and findings of this algorithm could be used to enhance marketing strategy, for direct marketing, advertisements for web-based businesses, and so on.

Keywords- Web usage mining, Fuzzy Co-Clustering, Target marketing, Clickstream data

I. INTRODUCTION

Nowadays, the internet is a very fast communication medium between business organizations' services and their customers, with very low cost. Web data mining [1] is an intelligent data mining technique to analyze web data. It includes web content data, web structure data and web usage data. Analysis of usage data provides organizations with the information required to improve their performance.

In general, web clustering techniques are used to discover groups of users or groups of pages, called clusters, which are similar among themselves and dissimilar to the users/pages in the other clusters. User clustering approaches for usage data create groups with similar browsing patterns. Web page content data, structure data and usage data are used to cluster the web pages of a web site. Clustering results may be beneficial for a wide range of applications, such as web site personalization, system improvement, web caching and pre-fetching, recommendation systems, the design of collaborative filtering, and target marketing. These clustering techniques are one-dimensional, whereas co-clustering is a bi-dimensional clustering technique. The combination of a user cluster with the set of its significant web pages of a web site is called a Co-Cluster.

There are many applications for co-clustering [7], such as recommendation systems, direct marketing, text mining, identifying web communities and election analysis [1][2]. Co-clustering techniques can be used in collaborative filtering to identify subgroups of customers with similar preferences or behaviors towards a subset of products, with the goal of performing target marketing. Recommendation systems and target marketing are important applications in the e-commerce area. The main goal of these applications is to identify groups of web users or customers with similar behavior/interest, so that one can predict the customers' interest and make proper recommendations to improve sales.

Generally, co-clustering is a form of two-way clustering in which both dimensions are clustered simultaneously, and the generated co-clusters are refined using techniques such as the fuzzy approach. The goal of this paper is to provide a fuzzy Co-Clustering algorithm for clickstream data and to quantify the discovered co-clusters. User clusters, with their members, are related in different degrees to page clusters. The relation between these clusters is quantified using a fuzzy membership function, to show the distribution of users' interest over the web page clusters.

The organization of the paper is as follows. Section 2 summarizes some of the existing web clustering techniques and co-clustering approaches. Section 3 describes the problem statement. The proposed Fuzzy Co-Clustering algorithm is described briefly in Section 4. The experimental results of the proposed algorithm are discussed in Section 5. Section 6 concludes this paper.

II. BACKGROUND

A. Related Work

Web mining was first proposed by Etzioni in 1996. Web mining techniques automatically discover and extract information from World Wide Web documents and services. Cooley et al. [1,6] did in-depth research in web usage mining. Approaches proposed in [3,10] extend the one-dimensional clustering problem and focus on the simultaneous grouping of both users and web pages by exploiting their relations. The goal is to identify groups of related web users and pages which have similar interest across the same subset of pages. This behavior reveals users' interests as similar and highly related to the topic that the specific set of pages involves. The obtained results are particularly useful for applications such as e-commerce and recommendation engines, since relations between clients and products may be revealed. These relations are more meaningful than the one-dimensional clustering of users or pages.

______
About-1: Department of Computer Science, Periyar University, Salem, Tamilnadu, India (e-mail: [email protected], [email protected]).
Web page‘s Web mining was first proposed by Etzioni in 1996.Web content data, structure data and usage data are used to mining techniques automatically discover and extract cluster the web pages of a weba site. Clustering results may information from World Wide Web documents and services. be beneficial for wide range of application such as web site Cooley et al.[1,6] did in-depth research in web usage personalization, system improvement, web caching and pre- mining.Approaches proposed in [3,10] extend the one fetching, recommendation system, design of collaborative dimensional clustering problem and focus on the filtering and target marketing.E These clustering techniques simultaneous grouping of both users and web pages by are one dimensional where as Co-Clustering is the bi- exploiting their relations. Its goal is to identify groups of dimensional clustering technique. The combination of user related web users and pages, which has similar interest cluster with set of its significant web pages of a web site is across the same subset of pages. This behavior reveals called a Co-Cluster. users‘ interests as similar and highly related to the topic that There are many applications for Co-clustering[7] such as the specific set of pages involves. The obtained results are recommendations systems, direct marketing, text mining, particularly useful for applications such as e-commerce and ______recommendation engines, since relations between clients and products may be revealed. These relations are more About-1Department of Computer Science, Periyar University, meaningful than the one dimensional clustering of users or Salem,Tamilnadu,India. pages. e-mail;[email protected] ,[email protected]


Co-clustering algorithms fall into three categories. The first category models each type of object as a random variable, clustering the objects of different types simultaneously while preserving the mutual information between the random variables that model these objects. The second category models the relationship between different types of objects as a (nonnegative) matrix; this matrix is approximately decomposed into several matrices, which indicate the cluster memberships of the objects. The third category treats the relationship between different types of objects as a graph and performs co-clustering by graph partitioning based on spectral analysis [2]. A fuzzy biclustering approach to correlate web users and web pages based on the spectral clustering technique was proposed in [2].

B. Co-Clustering Approach

By definition, Co-Clustering [9] is the process of simultaneous categorization of users and web pages into user clusters and page clusters, respectively. The term co-cluster refers to each pair of a user cluster and a page cluster. In the matrix illustration, a co-cluster is represented by a sub-matrix of A in which the aij values of all elements are similar to one another. Thus, co-clustering is the task of finding these coherent sub-matrices of A. (In the original illustration, six square sub-matrices, A11 to A32, represent six co-clusters; the matrix itself is not reproduced here.)

This paper aims to provide a framework for the simultaneous clustering of web pages and users, called Fuzzy Co-Clustering. The relations between web users and pages in a co-cluster will be identified and quantified. Here, users grouped in the same user cluster may be related to more than one web page cluster, with different degrees of fuzzy membership value, and vice versa.

III. PROBLEM STATEMENT

This section gives the formal definitions of the problem and describes how the clickstream data from the web server log file is converted into matrix form.

Let A(U, P) be an n x m user associated matrix, where U = {U1, U2, ..., Un} is a set of users and P = {P1, P2, ..., Pm} is a set of pages of a web site. It is used to describe the relationship between web pages and the users who access these web pages. Let n be the number of web users and m the number of web pages. The element aij of A(U, P) represents the frequency with which user Ui of U visits page Pj of P during a given period of time:

aij = Hits(Ui, Pj), if Pj is visited by Ui; 0, otherwise    (1)

where Hits(Ui, Pj) is the count/frequency with which user Ui accesses page Pj during a given period of time.

A user cluster is a group of users that show similar behavior when navigating through a web site during a given period of time. A page cluster is a group of web pages that are related according to the users' perception; specifically, they are accessed by similar users during a given period of time.

The similarity measure used in this paper is fuzzy similarity. The fuzzy similarity measure between two fuzzy subsets X1 = {x11, x12, ..., x1n} and X2 = {x21, x22, ..., x2n} is defined as

fsim(X1, X2) = ( Σ_{k=1..m} min(x1k, x2k) ) / ( Σ_{k=1..m} max(x1k, x2k) )    (2)

This ratio defines the similarity between two fuzzy subsets, with values between 0 and 1. Using this similarity measure, similarity matrices are computed for the user vectors and page vectors of the user associated matrix A.

Fuzzy co-clustering is a technique that performs simultaneous clustering of objects and features, using a fuzzy membership function to correlate their relations. It allows user clusters to belong to several page clusters simultaneously, with different degrees of membership value. The membership value lies between 0 and 1.

A. Clustering Algorithm: K-Means

In this paper, the K-Means clustering technique [12] is used to create user clusters and page clusters. K-Means is one of the simplest unsupervised learning algorithms for the clustering problem. The procedure is a simple and easy way to classify a given data set through a certain fixed number of clusters (assume K clusters). The algorithm is composed of the following steps:
1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups, from which the metric to be minimized can be calculated.
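A small computational sketch of equation 2 may help. It is written in Python/NumPy under the assumption, made here because the extracted formula is garbled, that the numerator and denominator sum the element-wise minima and maxima of the two fuzzy subsets; the resulting similarity matrices are the inputs to the K-Means steps listed above.

import numpy as np

def fsim(x1, x2, eps=1e-12):
    # Equation 2 (as read here): ratio of summed element-wise minima to
    # summed element-wise maxima of two fuzzy subsets; lies in [0, 1].
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    return np.minimum(x1, x2).sum() / (np.maximum(x1, x2).sum() + eps)

def similarity_matrix(F):
    # Pairwise fuzzy similarity of the rows of F, each row being one
    # fuzzy subset (a user's or a page's membership values).
    n = F.shape[0]
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = fsim(F[i], F[j])
    return S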


The K-Means algorithm is significantly sensitive to the D. User fuzzy subset initial randomly selected cluster centers. Run K-Means For each user U , use the user accessing information on algorithm repeatedly with different random cluster i each web page P to describe the visiting pattern. Then centers(called centriods)approximately for ten times. j user fuzzy subset µ of ith user that reflects the user‘s Choose the best centriod whose Davis Bouldin Index Ui visiting behavior is defined as value is minimum.

B. Web Server Log File

A web server log file [3,5] is a log file automatically created and maintained by a server, recording the activity performed by it. The default log file format is the Common Log Format. It contains information about the request: the client IP address, the request date/time, the page requested, the HTTP status code, the bytes served, the user agent, and the referrer. These data can be combined into a single file, or separated into distinct logs, such as an access log, error log, or referrer log. From the web server log file, which user accessed which web page of a web site during a specified period of time can be obtained easily.
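As an illustration of the log format just described, the sketch below parses one line in the Common Log Format; the regular expression and field names are our own, and real deployments often use the Combined Log Format, which appends referrer and user-agent fields:

```python
import re

# One request in Common Log Format: host, identity, user, time,
# request line, status code, bytes served.
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<page>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\S+)'
)

line = '10.0.0.1 - - [28/Sep/1999:12:00:01 +0000] "GET /news HTTP/1.0" 200 2326'
m = CLF.match(line)
if m:
    # (user, page) pairs like this one are tallied into Hits(Ui, Pj).
    print(m.group('host'), m.group('page'), m.group('status'))
```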

C. Clickstream Data

Clickstream data [4] is a natural by-product of a user accessing world wide web pages, and refers to the sequence of pages visited and the time these pages were viewed. Clickstream data is valuable to Internet marketers and advertisers. An instance of real clickstream records is the MSNBC dataset, which describes the page visits of users who visited msnbc.com on a single day. There are 989,818 users and only 17 distinct items, because the items are recorded at the level of the URL category, not at the page level, which greatly reduces the dimensionality. The 17 categories, with their category numbers, are: Frontpage 1, News 2, Tech 3, Local 4, Opinion 5, On-air 6, Misc 7, Weather 8, Health 9, Living 10, Business 11, Sports 12, Summary 13, Bbs 14, Travel 15, msn-news 16, msn-sports 17.

Sample sequences:

1 1
2
3 2 2 4 2 2 2 3 3
6 7 7 7 6 6 8 8 8 8
6 9 4 4 4 10 3 10 5 10 4 4 4

Each row describes the hits of a single user. For example, the first user hits "frontpage" twice, and the second user hits "news" once.

D. User Fuzzy Subset

For each user Ui, the user's access information on each web page Pj is used to describe the visiting pattern. The user fuzzy subset µUi of the ith user, which reflects the user's visiting behavior, is defined as

µUi = { (Pj, fµUi(Pj)) | Pj ∈ P }

where fµUi(Pj) is the membership function, defined as

fµUi(Pj) = Hits(Ui, Pj) / Σk=1..m Hits(Ui, Pk)    (3)

and m is the number of web pages of the web site.

E. Page Fuzzy Subset

For each page Pj, the access information of all users on the web page Pj is used to describe the web page itself. The page fuzzy subset µPj, which reflects all users' visiting behavior on the jth page, is defined as

µPj = { (Ui, fµPj(Ui)) | Ui ∈ U }

where fµPj(Ui) is the membership function, defined as

fµPj(Ui) = Hits(Ui, Pj) / Σk=1..n Hits(Uk, Pj)    (4)

and n is the number of web users.
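Equations 3 and 4 amount to row-wise and column-wise normalization of the hit matrix A. A minimal NumPy sketch (illustrative function names; it assumes every user and page in A has at least one hit, as is guaranteed after the filtering step described later):

```python
import numpy as np

def user_fuzzy_subsets(A):
    """Equation 3: membership of user Ui in page Pj is the user's hits
    on Pj divided by the user's total hits (each row sums to 1)."""
    return A / A.sum(axis=1, keepdims=True)

def page_fuzzy_subsets(A):
    """Equation 4: membership of page Pj for user Ui is Ui's hits on Pj
    divided by all users' hits on Pj (each column sums to 1)."""
    return A / A.sum(axis=0, keepdims=True)

A = np.array([[5., 2., 0.], [4., 3., 0.], [0., 0., 7.]])
print(user_fuzzy_subsets(A))  # rows sum to 1
print(page_fuzzy_subsets(A))  # columns sum to 1
```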


IV. FUZZY CO-CLUSTERING ALGORITHM FOR CLICKSTREAM DATA

In this paper, the K-Means clustering method is applied to the user (row) and page (column) dimensions of the user access matrix A(U,P) separately, and the results are then combined to obtain small co-regulated submatrices called co-clusters. Given a user access matrix A, let ku be the number of clusters on the user dimension and kp the number of clusters on the page dimension after K-Means clustering is applied. Cu is the family of user clusters and Cp is the family of page clusters. Let ciu be a subset of users with ciu ∈ Cu (1 ≤ i ≤ ku), and let cjp be a subset of pages with cjp ∈ Cp (1 ≤ j ≤ kp). The pair (ciu, cjp) denotes a co-cluster of A. By combining the results of the user-dimensional clustering and the page-dimensional clustering, ku × kp co-clusters are obtained. The objective of the paper is to quantify these co-clusters in different degrees using a fuzzy membership function. The proposed Fuzzy Co-Clustering algorithm has three phases.

A. First Phase: User Clustering

1. Compute the user fuzzy subsets of the user association matrix A(U,P)nxm using equation 3.
2. Compute the user similarity matrix of size n x n using the fuzzy similarity measure defined in equation 2.
3. Apply K-Means to the user similarity matrix and generate ku user groups.

B. Second Phase: Page Clustering

1. Compute the page fuzzy subsets of the user association matrix A(U,P)nxm using equation 4.
2. Compute the page fuzzy similarity matrix of size m x m using equation 2.
3. Apply K-Means to the page similarity matrix and generate kp page groups.

C. Third Phase: Fuzzy Relation Coefficients

1. Combine the results of the user-dimensional clustering and the page-dimensional clustering to obtain ku × kp co-clusters.
2. Calculate the relation coefficient between the user cluster and page cluster of each co-cluster using equation 5, which indicates the distribution of the related users' interest over the page clusters.
3. Calculate the relation coefficient between the user cluster and page cluster of each co-cluster using equation 6, which shows which user cluster has more interest in that page cluster.

After performing the one-dimensional clustering on the user fuzzy subsets and the page fuzzy subsets, the ku user clusters and kp page clusters are related, and each user cluster's interest in the different page clusters is quantified in different degrees. This reveals the interest of groups of related users in the different groups of related web pages. The interpretation of the fuzzy co-clustering result can be used to improve direct and target marketing strategy and also to improve the quality of recommendation systems.
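Combining the three phases, an end-to-end sketch of the pipeline might look as follows. The functions `fsim`, `user_fuzzy_subsets`, `page_fuzzy_subsets` and `best_kmeans` refer to the illustrative sketches given earlier; `similarity_matrix` and `fuzzy_co_cluster` are likewise our own names, not the paper's.

```python
import numpy as np

def similarity_matrix(F):
    """Pairwise fuzzy similarity (equation 2) between the rows of F."""
    n = F.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = np.minimum(F[i], F[j]).sum() / np.maximum(F[i], F[j]).sum()
    return S

def fuzzy_co_cluster(A, ku, kp):
    # Phase 1: user clustering on the n x n user similarity matrix.
    user_labels, _ = best_kmeans(similarity_matrix(user_fuzzy_subsets(A)), ku)
    # Phase 2: page clustering on the m x m page similarity matrix
    # (transpose so that rows correspond to pages).
    page_labels, _ = best_kmeans(similarity_matrix(page_fuzzy_subsets(A).T), kp)
    # Phase 3: every (user cluster, page cluster) pair is a co-cluster,
    # to be quantified by the relation coefficients of equations 5 and 6.
    co_clusters = [(i, j) for i in range(ku) for j in range(kp)]
    return user_labels, page_labels, co_clusters
```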
V. EXPERIMENTATION AND RESULTS

In order to evaluate the performance of the proposed algorithm, an experiment is conducted on the benchmark clickstream dataset of MSNBC.com, which describes the sequences of page visits of users on 28 September 1999.

A. Data Preprocessing

Data preprocessing [8] transforms the data into a format that can be processed more easily and effectively for the purpose of the user. The techniques used to preprocess data include data cleaning, data integration, data transformation and data reduction. The clickstream records in the MSNBC dataset are converted into matrix format, where element aij of A(U,P) represents the frequency with which user Ui accesses web page Pj of the web site during a given period of time. For each user session, the visited web page categories are marked with the frequency with which that page was accessed, and with 0 otherwise.

B. Data Filtering

Data filtering is the task of extracting only those records of the web log files which are essential for the analysis, thus significantly reducing the data necessary for further processing. In this paper, data filtering aims to filter out the users who have visited fewer than 9 page categories of the web site. Initially there are 989,818 users; after this step the number of users is reduced to 1,720.

C. Results

The K-Means algorithm is applied to the resultant user association matrix of size 1720 x 17, with ku = 10 and kp = 3 fixed so as to create ten user clusters and three page clusters. Using equations 5 and 6, the relations between the user clusters and page clusters were quantified, as shown in the fuzzy relation coefficient matrices of Table 1 and Table 2.

VI. FUZZY RELATION CO-EFFICIENT

The fuzzy relation coefficient between a user cluster and a web page cluster is defined in two ways, as equations 5 and 6. Equation 5 quantifies each user cluster's interest in the different related web page clusters; equation 6 quantifies the different user clusters' interest in each web page cluster. Table 1 shows which user and page clusters are more related and indicates how each user cluster's interest is distributed over all page clusters. From Table 1, user cluster cu2 has the most interest in page cluster cp3, because that co-cluster's fuzzy relation value is the highest. Similarly, the pages of interest for each user cluster can be found easily and efficiently.

Clusters | cp1    | cp2    | cp3
cu1      | 0.0839 | 0.0162 | 0.0489
cu2      | 0.1615 | 0.1739 | 0.4192
cu3      | 0.0466 | 0.0763 | 0.0574
cu4      | 0.0847 | 0.0423 | 0.0456
cu5      | 0.0485 | 0.0019 | 0.0549
cu6      | 0.0921 | 0.3978 | 0.0877
cu7      | 0.0739 | 0.0346 | 0.0792
cu8      | 0.1049 | 0.0752 | 0.1165
cu9      | 0.1734 | 0.0562 | 0.0475
cu10     | 0.1305 | 0.1256 | 0.0431

Table 1: Users' Cluster Fuzzy Relation Coefficients for Page Clusters
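The displayed formulas for equations 5 and 6 are not reproduced above, so the following sketch is only one plausible reading of the two coefficients, an assumption on our part rather than the paper's exact definition: aggregate the fuzzy memberships of a co-cluster's users over its pages, then normalize per user cluster (equation 6 style, as in Table 2) or per page cluster (equation 5 style, as in Table 1).

```python
import numpy as np

def relation_coefficients(M, user_labels, page_labels, ku, kp):
    """Hypothetical aggregate: M is the user fuzzy-membership matrix of
    equation 3; R[i, j] sums memberships inside co-cluster (i, j).
    Normalizing R row-wise or column-wise then mimics the two views
    attributed to equations 5 and 6 (an assumed reading)."""
    R = np.zeros((ku, kp))
    for i in range(ku):
        for j in range(kp):
            R[i, j] = M[user_labels == i][:, page_labels == j].sum()
    per_user_cluster = R / R.sum(axis=1, keepdims=True)  # rows sum to 1
    per_page_cluster = R / R.sum(axis=0, keepdims=True)  # columns sum to 1
    return per_user_cluster, per_page_cluster
```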

Table 2 shows each co-cluster's fuzzy relation value, relating the user clusters and page clusters. It clearly pictures which user cluster has the most interest in each page cluster. In this way it is easy to identify the target user group for each page cluster, which is useful in target marketing for making recommendations according to the users' frequent access of web pages during a given period of time.

Clusters | cu1    | cu2    | cu3    | cu4    | cu5    | cu6    | cu7    | cu8    | cu9
cp1      | 0.4192 | 0.1389 | 0.2445 | 0.4309 | 0.274  | 0.2652 | 0.2805 | 0.2712 | 0.595
cp2      | 0.0103 | 0.0189 | 0.0507 | 0.0273 | 0.0014 | 0.1451 | 0.0167 | 0.0246 | 0.0244
cp3      | 0.5705 | 0.8422 | 0.7048 | 0.5419 | 0.7246 | 0.5898 | 0.7028 | 0.7041 | 0.3806

Table 2: Co-Clusters' Fuzzy Relation Coefficients relating Page Clusters to User Clusters

Interpretation of the co-cluster results with their fuzzy relation values is very helpful for realizing how, and with which patterns, the web site's page categories are visited by each user cluster. Such information is useful to web administrators for web site evaluation or reorganization. Recommending a set of related web page categories to a user group based on the fuzzy relation values is also possible.

VII. CONCLUSION

This paper proposed a Fuzzy Co-Clustering algorithm for clickstream data and evaluated it on a real dataset. The results proved its efficiency in correlating the relevant users and web pages of a web site. Thus, the interpretation of co-cluster results can be used by a company for focalized marketing campaigns aimed at an interesting target user cluster; this is a key feature in target marketing. Our Fuzzy Co-Clustering algorithm produces non-overlapping co-clusters; in future work it will be extended to generate overlapping co-clusters.


VIII. REFERENCES

1) Cooley, R., Srivastava, J. and Deshpande, M., "Data preparation for mining world wide web browsing patterns", Knowledge and Information Systems, Vol. 1, No. 1, pp. 5-32, 1999.
2) Koutsonikola, V.A. and Vakali, A., "A fuzzy bi-clustering approach to correlate web users and pages", Int. J. Knowledge and Web Intelligence, Vol. 1, No. 1/2, pp. 3-23, 2009.
3) Liu, X., He, P. and Yang, Q., "Mining user access patterns based on Web logs", Canadian Conference on Electrical and Computer Engineering, May, Saskatoon, Saskatchewan, Canada, pp. 2280-2283, 2005.
4) Antonellis, P., Makris, C. and Tsirakis, N., "Algorithms for Clustering ClickStream Data", Information Processing Letters, Vol. 109, Issue 8, pp. 381-385, 2009.
5) Song, Q. and Shepperd, M., "Mining web browsing patterns for E-commerce", Computers in Industry, Vol. 57, pp. 622-630, 2006.
6) Srivastava, J., Cooley, R., Deshpande, M. and Tan, P.-N., "Web usage mining: Discovery and applications of usage patterns from web data", SIGKDD Explorations, Vol. 1, No. 2, pp. 12-23, 2000.
7) Busygin, S., Prokopyev, O. and Pardalos, P.M., "Biclustering in data mining", Computers & Operations Research, Vol. 35, pp. 2964-2987, 2008.
8) Suneetha, K.R. and Krishnamoorthi, R., "Data Preprocessing and Easy Access Retrieval of Data through Data Warehouse", Proceedings of the World Congress on Engineering and Computer Science 2009, USA, Vol. 1, 2009.
9) Tjhi, W.C. and Chen, L., "Minimum sum-squared residue for fuzzy co-clustering", Intelligent Data Analysis, Vol. 9, pp. 1-13, 2006.
10) Zeng, H.-J., Chen, Z. and Ma, Y.-M., "A unified framework for clustering heterogeneous web objects", Proceedings of the 3rd International Conference on Web Information Systems Engineering, December, Singapore, pp. 161-172, 2002.



A Survey on Topology for Bluetooth Based Personal Area Networks

Prof. Anuradha.V1 Dr. Sivaprakasam. P2

Abstract- Bluetooth is a proficient technology for short range wireless communication and networking, fundamentally used as an alternative to connecting cables. Bluetooth is a Wireless Personal Area Network (WPAN) technology that enables devices to connect and communicate by means of short-range ad-hoc networks. Topology formation remains a challenging problem in most Bluetooth based Wireless Personal Area Networks (BT-WPANs). The problem of topology creation in WPANs can be divided into two sub-problems: the election of the nodes that have to act as masters, and the assignment of the slaves to the piconets. Topology creation is the procedure of defining the piconets and the interconnection of the nodes organized in the network area. Traffic load distribution and energy consumption by the nodes are the two major factors affected by improper topology design. Many researches have been carried out on topology for Bluetooth WPANs; these researches aim to develop an efficient topology for BT-WPANs that consumes less energy for communication between the master and the slaves. This paper presents a survey of various network topology distribution techniques for Bluetooth based WPANs. Additionally, as a part of future research, this paper also discusses some of the limitations of the available topologies and probable solutions to overcome them.

Keywords- Bluetooth, Bridges, Topology, Wireless Personal Area Network (WPAN), Nodes, Master, Slave, Piconets, Scatternets, Slots, Frequency Hopping

I. INTRODUCTION

In recent years, wireless ad-hoc networks have acquired significant importance. Correspondingly, a great deal of attention has been offered to short range radio systems operated using Bluetooth technology [1], [2] and IEEE 802.15 [3] Wireless Personal Area Networks (WPAN). Piconets form the fundamental architectural unit in WPANs. Bluetooth is a Wireless Personal Area Network (WPAN) technology that enables devices to connect and communicate via short-range ad-hoc networks [4]. Bluetooth WPANs (BT-WPANs) are characteristically used to turn stand-alone devices located in a range of about 10 m into networked equipment. In general, a piconet comprises a master device and a maximum of seven active slave devices. The slave devices are limited in operation, as they are permitted to communicate only with their master device. Additionally, a piconet can have an unlimited number of nodes, provided that they remain inactive; such excess nodes do not participate in piconet transmissions. A different frequency hopping sequence may be utilized by each piconet, normally derived from the master's address. Because of the use of different hopping sequences, a bridge cannot be active in more than one piconet at a time; thus, bridges have to switch between piconets on a time division basis and, while switching, they must re-synchronize with the current piconet. A full duplex connection can be established between the master and a slave by sending and receiving traffic alternately. A master or a slave involved in the activity of more than one piconet can act as a bridge, allowing piconets to form a larger network, a so-called scatternet. A slave is allowed to start transmission in a given slot only if the master has addressed it in the preceding slot. In Bluetooth technology, frequency hopping with time division duplex (FH/TDD) is used, with time divided into 625-µsec intervals termed slots. The master uses intra-piconet scheduling algorithms to schedule the traffic within a piconet, while inter-piconet scheduling algorithms decide the schedule of the bridges across diverse piconets [8]. Abundant intra- and inter-piconet scheduling algorithms have been proposed [5] [6] [7].

Network topology creation remains a most important aspect in WPANs. Topology creation is the process of defining the piconets and the interconnection of the nodes deployed in the network area. Certainly, topology design has an essential impact on the traffic load distribution within the WPAN and on the nodes' energy consumption. One of the most demanding problems in deploying a BT-WPAN consists in forming a scatternet that meets the constraints posed by the system specifications and the traffic requirements. This paper presents a survey of various network topology distribution techniques for Bluetooth based WPANs.
Additionally, as a part of future research, this paper also discusses some of the limitations of the available topologies and probable solutions to overcome them.

The remainder of this paper is organized as follows. Section II provides an insight into the different topologies for Bluetooth based wireless personal area networks proposed earlier in the literature. Section III gives directions for future research. Section IV concludes the paper with a brief discussion.

II. LITERATURE REVIEW

Numerous researches have been carried out on topology for Bluetooth WPANs; they aim to develop an efficient topology for BT-WPANs that consumes less energy for communication between the master and the slaves. This section of the paper provides a close study of the different topologies for Bluetooth based wireless personal area networks proposed earlier in the literature.

An effective topology for Bluetooth scatternets was proposed by Huang et al. in [9]. Bluetooth is a capable technology for

short range wireless communication and networking, essentially used as a replacement for connecting cables. Since the Bluetooth specification only defines how to build a piconet, several solutions have been proposed in the literature to construct a scatternet from the piconets. A tree-shaped scatternet is called a bluetree. In their paper, they proposed an approach to generate the bluetree hierarchically; namely, the nodes are added into the bluetree level by level. This kind of Hierarchical Grown Bluetree (HGB) topology resolves the defects of the conventional bluetree. While growing, HGB always remains balanced so as to maintain shorter routing paths. Besides, the links between siblings provide alternative paths for routing. As a result, the traffic load at parent nodes can be significantly alleviated, and only two separate parts are induced if a parent node is lost. The Bluetooth network therefore achieves better reliability.

L. Huang et al. in [10] described the impact of topology on multi-hop Bluetooth personal area networks. Their paper concentrates on the impact of topology on Bluetooth personal area networks. They initially described some observations on performance degradations of Bluetooth PANs due to network topologies, and then analyzed their causes. Based on their analysis, they described a flexible scatternet formation algorithm for multi-hop communication under a conference scenario. Using the proposed method, a scatternet can be formed flexibly with different topologies in a controlled way. In order to utilize topology information in multi-hop communication, they proposed a new link metric, the Load Metric (LM), instead of the number of hops. LM is derived from an estimation of node link bandwidth, which reflects the different roles of nodes in a Bluetooth scatternet. Furthermore, their proposal helped the routing protocol to bypass heavily loaded nodes and find routes with larger bandwidth. They presented some experimental results based on an implementation, which proved the effectiveness of their protocols.

Hsu et al. in [11] put forth a method of topology formation with the assistance of ns. Bluetooth is a promising technology in wireless applications, and many associated issues are yet to be explored both in academia and industry. Because of the complexity and the dynamics of computer networks, a good simulation tool plays an imperative role in the development stage. Of the existing simulation tools, ns is an accepted, open-source package that has considerable support for simulation of TCP, routing, and multicast protocols over wired and wireless networks. It also has BlueHoc as its extension for Bluetooth. Although BlueHoc offers many simulation functions for Bluetooth, all simulations must be done in a practically fixed topology; hence simulation of dynamic topology construction, the first and an important step in establishing a Bluetooth network, cannot be conducted. Besides, BlueHoc offers only restricted support for building a network. It also lacks flexibility in device control, in animated presentation, and in modeling mobility. The main contribution of their paper is therefore to enhance BlueHoc to support the aforementioned functions.
An optimal topology for Bluetooth was projected by Melodia et al. in [12]. Bluetooth is a hopeful technology for personal/local area wireless communications. A Bluetooth scatternet is composed of overlapping piconets, each with a low number of devices sharing the same radio channel. Their paper discusses the scatternet formation problem by analyzing topological characteristics of the scatternet formed. A matrix-based representation of the network topology is used to define metrics that are applied to estimate the key cost parameters and the scatternet performance. Numerical examples are presented and discussed, highlighting the impact of metric selection on scatternet performance. Then, a distributed algorithm for scatternet topology optimization is introduced that supports the formation of a locally optimal scatternet based on a selected metric. Numerical results obtained by adopting this distributed approach to optimize the network topology are shown to be close to the global optimum.

Lin et al. in [13] proposed the formation of a new BlueRing scatternet topology for Bluetooth WPANs. It is recommendable to have uncomplicated yet competent scatternet topologies with good support for routing protocols, considering that Bluetooth is to be used for personal area networks with design goals of simplicity and compactness. In the literature, even though many routing protocols have been proposed for mobile ad hoc networks, directly applying them poses a difficulty due to Bluetooth's special baseband and MAC-layer features. In their work, they proposed an attractive scatternet topology called BlueRing, which connects piconets as a ring interleaved by bridges between piconets, and addressed its formation, routing, and topology-maintenance protocols. The BlueRing architecture enjoys the following fine features. First, routing on BlueRing is stateless, in the sense that no routing information needs to be kept by any host once the ring is formed; this is favorable for environments such as Smart Homes where computing capability is limited. Second, the architecture scales easily to medium-size scatternets (e.g. around 50 to 70 Bluetooth units); in comparison, most star- or tree-like scatternet topologies can easily form a communication bottleneck at the root of the tree as the network enlarges. Third, maintaining a BlueRing is a trouble-free task as Bluetooth units join or leave the network. To endure single-point failure, they proposed a protocol-level solution mechanism; to tolerate multipoint failure, they proposed a recovery mechanism to reconnect the BlueRing. Graceful failure is tolerated as long as no two or more critical points fail at the same time. In addition, they evaluated the ideal network throughput at different BlueRing sizes and configurations by mathematical analysis. Simulation results are presented which demonstrate that BlueRing outperforms other scatternet structures, with higher network throughput and moderate packet delay.

A feasible topology formation algorithm for Bluetooth based WPANs was presented by Chiasserini et al. in [14]. In their paper, they address the problem of topology formation in Bluetooth Wireless Personal Area Networks (BT-WPANs). They initially overviewed and extended a previously

proposed centralized optimization approach and discussed its results. Then they outlined the main steps of two procedures that can lead to feasible distributed algorithms for the incremental construction of the topology of a BT-WPAN. The centralized optimization approach has the advantage of producing topologies that reduce the traffic load of the most congested node in the network while meeting the limitations on the BT-WPAN structure and capacity. On the other hand, the centralized nature and the high complexity of the optimization are a strong limitation of the proposed approach. Distributed algorithms for the topology formation of BT-WPANs are much more attractive, provided their algorithmic complexity and energy cost are sufficiently low to allow implementation in large BT-WPANs. Moreover, they discussed distributed procedures for the insertion and removal of a node in/from a BT-WPAN, which are easily implementable and able to trade off system efficiency against the ability to rapidly recover from topology changes. These procedures are the key building blocks for a distributed solution approach to the BT-WPAN topology formation problem.

Roy et al. in [15] proposed a new topology construction technique for Bluetooth WPANs. They proposed a Bluetooth topology construction protocol that works in combination with a priority-based polling scheme. A master assigns a priority to its slaves, including bridges, for each polling cycle and then polls them as many times as the assigned priority. The slaves can spend their idle time either in a power-saving mode or executing new node discovery. The topology construction algorithm works in a bottom-up manner in which isolated nodes join to form small piconets. These small piconets can come together to form larger piconets, and larger piconets can establish shared bridge nodes to form a scatternet. Individual piconets can also discover new nodes while participating in the master-driven polling process. The shutting down of master and slave nodes is detected for dynamic restructuring of the scatternet. The protocol can handle situations where not all Bluetooth nodes are within radio range of each other.

Scatternet formation of Bluetooth wireless networks was projected by Zhen et al. in [16]. In their paper, a protocol stack for Bluetooth group ad hoc networks and a "blue-star island" network formation algorithm are proposed. The network formation is located within the Bluetooth Network Encapsulation Protocol (BNEP) layer, underneath the routing protocol. The most important task of network formation is to establish and maintain the Bluetooth network topology with better performance in a fast and economic way, while the routing protocol generally finds the best routes within the existing network topology. The network formation communicates with the routing protocol and management entity using a "Routing Trigger" mechanism. The "blue-star island" algorithm is a distributed two-stage scheme. First, a group of neighbor nodes self-organize into a "blue-star island", where the joint node is a slave in the scatternet. Then, initiated by a "Routing Trigger" from the routing protocol, blue-star islands are bridged together. The "Routing Trigger" can be a "Route REQuest" message or a "HELLO" message. The design makes no assumptions on the number, distribution and mobility of the nodes. In addition, they presented discussion and simulation results showing that the proposed algorithm has lower formation latency and maintenance cost, and generates an efficient, good quality topology for packet forwarding.
A self-routing topology for Bluetooth WPANs was put forth by Sun et al. in [17]. The emerging Bluetooth standard is considered to be the most promising technology for constructing ad-hoc networks. It contains specifications on how to build a piconet but leaves out the details of how to automatically construct a scatternet from the piconets. Existing solutions only discussed the scatternet formation concern, without considering the ease of routing in such a scatternet. They presented algorithms to embed b-trees into a scatternet, which enables such a network to become self-routing. It requires only a fixed-size message header and no routing table at each node, regardless of the size of the scatternet. These properties make their solution scalable to networks of large size. Their solution is distributed and asynchronous. They also proved that their algorithm preserves the b-tree property when devices join or leave the scatternet and when one scatternet is merged into another.

Salonidis et al. in [18] proposed a distributed topology formation technique for Bluetooth personal area networks. In their paper they introduced and analyzed a randomized symmetric protocol that yields link establishment delay with predictable statistical properties. They then proposed the Bluetooth Topology Construction Protocol (BTCP), an asynchronous distributed protocol that extends the point-to-point symmetric mechanism to the case of several nodes. BTCP is based on a distributed leader election process where closeness information is discovered progressively and ultimately accumulated at an elected coordinator node. BTCP consists of three important phases: coordinator election; role determination; and connection establishment with leader election termination. Bluetooth link establishment is a two-step process that involves the Inquiry and Paging procedures. Leader election is an important tool for breaking symmetry in a distributed system. They implemented BTCP on top of an existing prototype implementation that emulates the Bluetooth environment on a Linux platform.

A distributed Bluetooth scatternet formation method was presented by Chang et al. in [19]. They devised a distributed Bluetooth scatternet formation algorithm using the parking property. The parking mechanism allows the master to manage more than seven slaves in its piconet. When a master-slave pair is formed, the slave is immediately parked, such that the master is not restricted by already having seven active slaves. This method is effortless and valuable, and is well-matched with the current Bluetooth specification. Since a straight line is the shortest way to connect two points in space, they named their algorithm Blueline, to indicate that the communicating path between two Bluetooth nodes is shorter compared to other scatternets. Their proposed scatternet formation algorithm allows two Bluetooth nodes to form a connection and communicate directly if they are within each other's transmission range.


The important purpose is to form a topology with the minimum number of hops for routes. One thing not described in the above algorithm is the switching policy of a bridge in the scatternet. In order to evaluate the performance of Blueline, they developed a Bluetooth extension to the VINT project network simulator.

Metin et al. in [20] discussed the construction of energy efficient Bluetooth scatternets. Bluetooth networks can be constructed as piconets or scatternets depending on the number of nodes in the network. Though piconet construction is a distinct process specified in the Bluetooth standards, scatternet formation policies and algorithms are not well specified, and among the many solution proposals for this problem, only a few focus on efficient usage of bandwidth in the resulting scatternets. In their paper, they proposed a distributed algorithm for the scatternet formation problem that dynamically constructs and maintains a scatternet based on estimated traffic flow rates between nodes. The algorithm is adaptive to changes and maintains a bandwidth-efficient scatternet when nodes come and go or when traffic flow rates change. Based on simulations, the paper also presented the improvements in bandwidth efficiency and the reduction in energy consumption provided by the proposed algorithm.

An algorithm for connected topologies in Bluetooth WPANs was described by Guerin et al. in [21]. They first described the fundamental characteristics of the Bluetooth technology that are relevant to topology formation. They formulated a mathematical model for the system objectives and constraints as an initial step towards a systematic investigation of the connectivity issue. They mainly focused on designing a topology where a node's degree does not exceed 7. They presented a topology design procedure based on an approximation algorithm guaranteed to generate a spanning tree with degree at most one more than the minimum possible value in any arbitrary graph. The minimum weighted spanning tree algorithm does not give an analytical guarantee on the degrees of the nodes in the 3-dimensional case; they therefore utilized the MST algorithm to form connected topologies for Bluetooth networks.
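To illustrate the flavor of this MST-based construction (a generic Kruskal sketch under our own assumptions, not the authors' degree-bounded approximation algorithm), the following code builds a minimum spanning tree over the nodes that are within mutual radio range and reports any node whose degree exceeds the Bluetooth limit of seven:

```python
import itertools, math

def scatternet_mst(positions, radio_range=10.0, max_degree=7):
    """Kruskal's MST over the 'in-range' graph, with edges weighted by
    distance. Returns the tree edges and the nodes violating the
    Bluetooth degree bound (a master plus at most seven active slaves)."""
    n = len(positions)
    parent = list(range(n))

    def find(x):
        # Union-find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(positions[i], positions[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
        if math.dist(positions[i], positions[j]) <= radio_range
    )
    tree, degree = [], [0] * n
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
            degree[i] += 1
            degree[j] += 1
    violators = [v for v in range(n) if degree[v] > max_degree]
    return tree, violators
```

Note that a plain MST gives no degree guarantee, which is exactly why [21] resorts to a degree-bounded approximation; the sketch only flags violations.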
Marsan et al. in [22] projected an approach for optimal topology design in WPANs. In their paper, they deal with the master election and the assignment of the slaves to the piconets, while they do not address the election of the bridge nodes. They defined an objective function to be optimized in the course of the network topology design, which represents the above requirements on traffic load distribution and energy consumption at the network nodes. Then, they devised topology design algorithms for WPAN systems that both maximize the objective function and satisfy the constraints on the maximum number of active slaves allowed per piconet and on the maximum transmission range of the radio devices. They initially assumed that a centralized procedure can be performed, and found the optimal set of masters as well as the optimal assignment of slaves to piconets. Then, maintaining the set of masters identified via the centralized algorithm, they developed a distributed assignment scheme which well approximates the performance of the centralized solution. Tabu search algorithms can be seen as an evolution of the classical local optimum search called Steepest Descent (SD); the approach they proposed to find the optimal network topology in a centralized manner relies completely on the tabu search (TS) methodology. Numerical results showed that the distributed algorithm closely approximates the performance of the centralized solution for almost any number of nodes in the network area.

In order to optimize the topology in Bluetooth PANs, Marsan et al. proposed a method in [23]. Their optimization approach is based on a model derived from constraints that are specific to the BT-WPAN technology, but the level of abstraction of the model is such that it can be related to the more general field of ad hoc networking. By using a min-max formulation, they determined the optimal topology that provides full network connectivity, fulfills the traffic requirements and the constraints posed by the system specification, and minimizes the traffic load of the most congested node in the network, or equivalently its energy consumption. Results showed that a topology optimized for some traffic requirements is also remarkably robust to changes in the traffic pattern. Due to the problem complexity, the optimal solution is attained in a centralized manner. Although this implies severe limitations, a centralized solution can be applied whenever a network coordinator is elected, and it provides a useful term of comparison for any distributed heuristics.

III. FUTURE ENHANCEMENT

In recent years, wireless ad hoc networks have been a growing area of research. While there has been considerable research on the topic of routing in such networks, the topic of topology creation has received less attention. Bluetooth is a promising new wireless technology which enables portable devices to form short-range wireless ad hoc networks, and it is based on a frequency hopping physical layer. However, network topology construction at present requires that devices are pairwise in range of each other. The issue of determining an optimal topology specifically for BT-WPANs is discussed in [18] but is not actually addressed there; the first attempt at finding a solution to the problem is represented by the work in [24]. Further research is needed to overcome this strong requirement while maintaining an easy construction process. In addition, it would be interesting to perform simulation studies in order to estimate the parameters of real schedules that yield a good tradeoff between achievable throughput, average path length and the medium access delay caused by the scheduling. The mobility support of the algorithm is not discussed in [19]; therefore, future work may take steps to make the algorithm support mobility by turning the neighbor discovery time to infinity. Future study may also aim to find a mathematical framework for Bluetooth scatternets, in order to allow the design of efficient scatternet topologies.

IV. CONCLUSION

Wireless networks are implemented in a variety of real time applications. Bluetooth is a capable technology in wireless applications, and many associated issues are yet to be explored both in academia and industry. The Bluetooth technology used to interface devices within a short range has therefore been widely used in recent years. The communication between the connected devices takes place by means of a network, which has to be assigned a topology. Topology determination for Bluetooth based WPANs is a serious problem in most applications. Topology creation resides in the election of a master and the assignment of the slaves to that particular elected master. A lot of techniques and methods have been proposed earlier in the literature for topology formation in Bluetooth wireless personal area networks. This paper presented a survey of various network topology distribution techniques for Bluetooth based WPANs. Future work mainly focuses on developing an approach for topology creation that accounts for minimum energy consumption between the master and slave nodes; the development of such an approach also considers the traffic load between the nodes.

V. REFERENCES

1) Haartsen, "The Bluetooth radio system," IEEE Personal Communications Magazine, pp. 28-36, February 2000.
2) "The Bluetooth core specification," 2001, http://www.bluetooth.com.
3) "IEEE 802.15 Working Group," 2001, http://www.ieee802.org/15/pub/TG2.html.
4) Bluetooth Special Interest Group, "Specification of the Bluetooth System - Version 2.0," Nov. 2004.
5) Baatz, M. Frank, C. Kuhl, P. Martini, and C. Scholz, "Bluetooth Scatternets: An Enhanced Adaptive Scheduling Scheme," in Proceedings of IEEE INFOCOM'02, pp. 782-790, 2002.
6) A. Capone, M. Gerla, and R. Kapoor, "Efficient Polling Schemes for Bluetooth Picocells," in Proceedings of IEEE ICC'01, vol. 7, pp. 1990-1994, 2001.
7) Har-Shai, R. Kofman, A. Segall, and G. Zussman, "Load Adaptive Inter-piconet Scheduling in Small-scale Bluetooth Scatternets," IEEE Communications Magazine, vol. 42, pp. 136-142, July 2004.
8) Gil Zussman, Adrian Segall and Uri Yechiali, "On the Analysis of the Bluetooth Time Division Duplex Mechanism," IEEE Transactions on Wireless Communications, vol. 6, no. 6, pp. 2149-2161, 2007.
9) Tsung-Chuan Huang, Chu-Sing Yang, Chao-Chieh Huang and Sheng-Wen Bai, "Hierarchical Grown Bluetrees (HGB): an effective topology for Bluetooth scatternets," International Journal of Computational Science and Engineering, vol. 2, no. 2, pp. 23-31, 2006.
10) Leping Huang, Hongyuan Chen, V. L. N. Sivakumar, Tsuyoshi Kashima and Kaoru Sezaki, "Impact of Topology on Multi-hop Bluetooth Personal Area Network," Book Chapter, Springer, pp. 131-138, 2004.
11) Chia-Jui Hsu and Yuh-Jzer Joung, "An ns-based Bluetooth Topology Construction Simulation Environment," Proceedings of the 36th Annual Symposium on Simulation, p. 145, 2003.
12) Tommaso Melodia and Francesca Cuomo, "Locally Optimal Scatternet Topologies for Bluetooth Ad Hoc Networks," Book Chapter on Wireless On-Demand Network Systems, Springer, pp. 19-24, 2004.
13) Ting-Yu Lin, Yu-Chee Tseng and Keng-Ming Chang, "A new BlueRing scatternet topology for Bluetooth with its formation, routing, and maintenance protocols," Research in Ad Hoc Networking, Smart Sensing and Pervasive Computing, vol. 3, no. 4, pp. 517-537, 2003.
14) Carla F. Chiasserini, Marco Ajmone Marsan, Elena Baralis and Paolo Garza, "Towards Feasible Topology Formation Algorithms for Bluetooth-based WPANs," 36th Annual Hawaii International Conference on System Sciences (HICSS'03), vol. 9, p. 313, 2003.
15) Rajarshi Roy, Mukesh Kumar, Navin K. Sharma and Shamik Sural, "Bottom-Up Construction of Bluetooth Topology under a Traffic-Aware Scheduling Scheme," IEEE Transactions on Mobile Computing, vol. 6, no. 1, pp. 72-86, January 2007.
16) Bin Zhen, Jonghun Park, and Yongsuk Kim, "Scatternet Formation of Bluetooth Ad Hoc Networks," 36th Annual Hawaii International Conference on System Sciences (HICSS'03), vol. 9, p. 312, 2003.
17) Min-Te Sun, Chung-Kuo Chang and Ten-Hwang Lai, "A Self-Routing Topology for Bluetooth Scatternets," International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'02), p. 17, 2002.
18) Theodoros Salonidis and Leandros Tassiulas, "Distributed Topology Construction of Bluetooth Wireless Personal Area Networks," in Proceedings of IEEE INFOCOM, 2001.
19) Ruay-Shiung Chang and Ming-Te Chou, "Blueline: A Distributed Bluetooth Scatternet Formation and Routing Algorithm," Journal of Information Science and Engineering, vol. 21, pp. 479-494, 2005.
20) Metin Tekkalmaz, Hasan Sozer and Ibrahim Korpeoglu, "Distributed Construction and Maintenance of Bandwidth and Energy Efficient Bluetooth Scatternets," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 9, pp. 963-974, 2006.
21) Guerin, J. Rank, S. Sarkar and E. Vergetis, "Forming Connected Topologies in Bluetooth Ad-hoc Networks - An Algorithmic Perspective," 2003.
22) Ajmone Marsan, C. F. Chiasserini, and A. Nucci, "Optimal Topology Design in Wireless Personal Area Networks," www.cercom.polito.it/Publication/Pdf/114.pdf.
23) Marco Ajmone Marsan, Carla F. Chiasserini, Antonio Nucci, Giuliana Carello, and Luigi De Giovanni, "Optimizing the Topology of Bluetooth Wireless Personal Area Networks," 2005.
24) O. Miklos, A. Racz, Z. Turanyi, A. Valko, and P. Johansson, "Performance Aspects of Bluetooth Scatternet Formation," First Annual Workshop on Mobile and Ad Hoc Networking and Computing (MobiHoc), pp. 147-148, August 2000.


Identification of Most Desirable Parameters in SIGN Language Tools: A Comparative Study

Yousef Al-Ohali

Abstract- Symbolic languages have gained popularity for undertaking communication within the community of people with special needs. The languages are used to help individuals with special needs. Research exploring the possibilities for suggesting better communication symbols can not only be taken as a scientific achievement but also as a great service to humanity. This paper focuses on the identification and comparison of tools that have been developed to present words in symbols. The careful selection of tools has been made so that the competitors are of adequate standard to generate valuable results. The comparative study has focused on the most desirable parameters, e.g. 3D animation, video based representation, sign editor, education tool, dictionary, text analysis and speech recognition. An ample number of tools are discussed and their merits and de-merits explored. In light of the discussion, the choice of an appropriate tool can be made based on customized requirements.

I. INTRODUCTION

A sign language uses visual sign patterns to convey meaning by combining hand shapes, movements and orientations of the diversified shapes of the hands, arms and other associated parts of the body. Facial expressions are also used to fully express the thoughts of the speaker. Sign languages are basically developed to help the deaf understand a message without listening. Diversity in the expressions has been observed throughout the world, governed by culture, traditions, symbolic sign representation and inter-symbol sequencing. Hundreds of sign languages are in use throughout the world simultaneously and have been greatly admired by the deaf culture. Sign languages have been observed to exist since the 5th century BC. In 1620, Juan Pablo published "Reduction of letters and art for teaching mute people to speak" in Madrid, which is considered to be the first symbolic representation of words and phonetics enabling the deaf to learn and present themselves by using signs. Charles-Michel's work has been revolutionary in this domain and is used in France and North America until the present time.
With the passage of time, the need to develop computerized systems that can help the deaf in conveying and understanding messages has increased. In the subsequent sections we discuss the available tools that can help in translating words into symbols, we examine the merits and de-merits of each tool, and finally a tabular view is provided to summarize and facilitate the easy choice of a tool for translating words and punctuation into their symbolic equivalents.

II. SIGN LANGUAGE TOOLS

In this section, we survey available tools that help designers and developers to develop new systems for sign language translation.

A. Vsign

Vsign [1] is a 3-D animation tool implemented in Macromedia ShockWave. It is sponsored by EMMA (European Media Masters of Art) 2001/2002. It models the sign animations by means of an editor. Vsign consists of two parts:

Vsign Builder: the builder is an editor that facilitates a way to set up the beginning, end and intermediate states of signs (Figure 1). It provides separate modeling for hands, body and arms. The animation is saved to text files (with the special file extension "gbr").

Vsign Player: this part facilitates playback of the animation from a properly stored text file.

Vsign is a good tool that can be utilized to implement sign language translation, since it contains 3D capability along with a sign language editor. Fortunately, it does not have extra hardware requirements. Furthermore, Vsign uses a simple file format to store animation information. However, the tool has some drawbacks: it does not have a user-friendly interface, for instance, and it produces unrealistic (far from natural) 3D viewing.

______

About- Deanship of E-Transactions and Communication, King Saud University, Riyadh, Saudi Arabia. [email protected]

Figure 1: Vsign Builder


B. The DePaul University American Sign Language Project

This is a large-scale, professional academic 3-D project that aims to translate English to American Sign Language (ASL) [2]. In order to improve the quality of the animation, the project emphasizes shadows and naturalness. Shadows and different light sources are implemented to make the animations look normal. To achieve naturalness, every animation is repeated hundreds of times to detect and correct sharp/unrealistic movement transitions. Furthermore, the project aims at comprehensible finger spelling; thus, a translation from every letter to a proper sign is kept in a video file in AVI format. The produced animations seem descriptive and realistic. However, there are only a few sample animations on the website (Figure 2), which does not give a concrete idea about the educational aspect and user interface of the project.

shapes are not caught easily since the avatar is small. w


Figure 2: The DePaul University American Sign Language example

C. Reading Power

Reading Power [3] is an educational software product for native signers focused on literacy and reading comprehension. The software includes storytelling, interactive conversation, and tools to build comprehension and vocabulary. Reading Power uses 3-D signing characters to unlock the power of reading and to add fun to the learning process. It also includes teacher support materials, activities, a starter dictionary and ideas for integrating technology into learning. The 3-D signing characters avoid the disadvantages of video based applications, and the 3-D virtual environment is a big advantage of Reading Power.

D. Ready Set Sign

Ready Set Sign (RSS) [4] is an online portal for teaching American Sign Language, but the main product is published and sold via CD. The portal has many lessons and many video clips for each lesson. The courses are organized as if they are intended to teach a foreign language. The site is easily understandable. Iconic explanations are widely used in the videos, which are helpful for reminding the user of the words. The site and the product use video clips to show motions and icons, but as the quality increases it gets harder to process and download those videos. The portal has many lessons that are educationally well-organized. It includes some games that are simultaneously educational and entertaining, thus providing an enjoyable user experience.

E. eSIGN

The eSIGN [5] project aims to provide sign language on websites with small software installed on clients. It uses both 3-D animations and videos as the expression medium. It animates original BBC news simultaneously, with a smart avatar near the news video. Moreover, eSIGN provides a user friendly interface (Figure 3). The animations are created with an intelligent sign language editor. The animations are based on motion-capture data and so they are more realistic than synthetic ones. Nevertheless, the hand shapes are not caught easily, since the avatar is small.

Figure 3: eSIGN (a- providing sign language on websites, b- sign language translator)

F. SignGenius

SignGenius [6] is a fast, interactive software package for learning Sign Language, developed by Moving Hand Enterprises and accredited by DEAFSA (South African National Council for the Deaf).

It uses video clips to demonstrate sign language. SignGenius is composed of six sections (Figure 4):

 Tips: overview of the basic hand shapes and movements that a user may need to know in order to use sign language correctly.

 Tutor: 2197 video clips grouped into 65 categories.

 Test: a test feature to check the ability to associate the video clips with the correct words.

 Score: a medium for parents, teachers and students to measure progress.
 Info: a comprehensive list of addresses of Deaf organizations, support groups, etc.
 Game: a built-in Hangman game.

SignGenius is not an animation based sign language tool, but it has a number of features such as an advanced search function, a user-friendly interface, and good categorization in the tutor. However, SignGenius has some shortcomings, e.g. low video quality and an insufficient educational perspective.


Figure 4: SignGenius

G. Personal Communicator

Personal Communicator [7] is a tool for learning and communicating in American Sign Language (ASL), developed by the Comm Tech Lab at MSU. Personal Communicator uses digital video and compression technology for presenting sign language features. It has four components: a word processor, a text to sign/speech converter, an English-ASL dictionary and the ASL playroom. The target audience is not limited to people who want to learn new signs, but extends to those who look for fun along the way. Interaction with the objects is provided in a user friendly and clever manner (Figure 5). It is a digital video oriented tool, not a 3-D based environment.


Figure 5: Personal Communicator (Word Processor, English-ASL Dictionary, ASL Playroom and ASL Browser panels)

H. ViSiCAST

ViSiCAST [8] is a project funded under the European Union Fifth Framework, part of the Information Society Technologies (IST) program. It is a large project consisting of three main parts:

Multimedia and WWW Applications: enables authors of web pages to provide signed material as part of the page's content.

Face-to-Face Transactions: provides a basis for dialogue between customer and clerk, through the incorporation of available moving image recognition technology to 'read' simple signs made by the deaf customer, which can then be translated into text or speech for the benefit of the clerk.

Television & Broadcast Transmission: concerns the provision of virtual human synthetic signing capabilities in the context of broadcast television, and has two related aspects: development of the necessary transmission technology, and the incorporation of ViSiCAST work into the relevant broadcast standards.

Figure 6: ViSiCAST

I. Auslan Tuition System

Auslan Tuition System [9] is a 3-D animation of Australian Sign Language, created by the School of Computer Science & Software Engineering, University of Western Australia. It consists of two parts: the Auslan Tuition System and the Auslan Sign Editor. The Auslan Tuition System is made up of several modes: a Tutorial mode that allows the user to select an Auslan phrase and learn the sign; a Finger-spelling mode where the user enters words that are then finger spelled; a Dialogue mode with two avatars signing a dialogue together, designed for learning phrases in conversations; and a Numbers mode used for number signing. The Auslan Sign Editor concentrates on building the signs, whereas the tuition part is the front end of the system, used to display the constructed signs in a tutorial manner (Figure 7). Only the Auslan Tuition System is available for download from the web. The shown animation demos seem detailed and realistic.

Figure 7: Auslan Tuition System

J. Sign Smith Studio & Gesture Builder

The Sign Smith Studio authoring tool from Vcom3D allows individuals to rapidly create Signing Avatar scripts for creating sign enabled content. Studio offers many powerful features for changing coordinated facial expression, eye gaze, role shifting and speech [10]. It contains over 2,500 ready to use signs in its dictionary. Sign Builder allows users of Studio to "spatially inflect" signs such as pronouns, verbs and classifiers. Sign Builder also allows users to create other signs that may not be part of Studio's core dictionary. These include:

Specialized technical and science vocabulary.
Signs which are standard in certain regions of the U.S.
Contextualized name signs for people and places.
Foreign sign languages such as British Sign Language (BSL), etc.

A key feature of this tool is Inverse Kinematics (IK) technology. It allows the user to focus on the hand position.

Once the user selects a hand shape and properly positions the hand, the IK software automatically places the joints of the wrist, elbow and shoulder in natural positions. These features give the user full creative power. It is an easy tool to learn and use (Figure 8).

Figure 8: Sign Smith Studio

K. SiSi (Say It Sign It) System

SiSi is a 3D animation tool developed by researchers at IBM [11]. SiSi translates spoken or written words to British Sign Language (BSL). In the case of spoken words, SiSi first translates the words to text and then to 3D animations (Figure 9). The system is useful in many situations where there is no sign language interpreter, such as radio, telephone calls and some television shows.

Figure 9: The 3D character of the SiSi system

L. Tawasoul

Tawasoul [12] is a research project conducted by the Computer Sciences Department at King Saud University. It was developed as an Arabic Sign Language (ARSL) educational tool for hearing impaired children, their parents, and others who are interested in learning ARSL. The system comprises four key features: 3D animations of ARSL expressions such as hand signs and mouth and eye expressions; morphological analyzers which analyze the Arabic text to show the related ARSL animation; a categorized ARSL vocabulary dictionary; and a sign language text editor (Figure 10). It consists of three parts:

Translator: provides text-to-sign translation. It allows users to enter an Arabic text and view the Arabic signs that are related to the entered text.

Dictionary: the dictionary of Tawasoul is a basic vocabulary guide for users who want to learn Arabic Sign Language. It consists of a number of categories, each one containing a

related group of words.

Finger Spelling: can be utilized as a sign language editor to help users write documents in sign alphabetic letters by converting the entered Arabic text to sign language text.

Figure 10: Tawasoul

M. 3D-Sign

3D-Sign is a Malaysian sign language project [13]. It aims to develop a package to assist the learning of Malaysian sign language in 3D format using 3D Poser Artist 4.0, which allows creating animations using 3D characters; the interface is easy to use (Figure 11). The package consists of the following functions: one of three human characters can be selected (male, female or child); learners can select between different levels (beginner, intermediate and advanced); the 3D animation enables learners to view hand/finger signing from different angles; and different ways of learning are offered, such as chatting and puzzle games.

Figure 11: 3D Sign



N. Sign to me

Simon Harvey developed a British Sign Language tool and introduced the "Sign to me" tool [14]. It provides videos of everyday signs aimed at adults and children who have reading difficulties or are of pre-reading age. It consists of many functions:
- Find a Sign (Alphabetical Dictionary): by writing the word or the phrase, a video demonstrates the corresponding symbol (Figure 12).
- Picture Signs (Picture Dictionary): each sign is represented by a symbol in clear categories; when the cursor is rolled over the symbol, a video clip of the sign for that symbol appears and the word is also spoken.
- Games: by showing a video clip of a sign, then letting the player choose the correct picture that represents that sign.
The main advantages are the ease of use and the colorful symbols, which make it an attractive way to learn.

Figure 12: Sign to me

O. Hand Speak

Hand Speak [15] is an American Sign Language (ASL) site that produced an online dictionary, grammar, storytelling and poetry, a manual alphabet (finger spelling), manual numerals, tutorials and articles. Hand Speak consists of ASL words in its constantly growing dictionary. All images after September 2007 are video-based; the rest of the older images are gif animations (which will be replaced continually). In spelling lessons, a teacher can vocally speak a word and the child fingerspells it out (Figure 13).

Figure 13: Hand Speak

P. Comparison of various tools

Examining these fifteen notable products gives a good overview of the technology solutions in this domain. Table 1 presents all fifteen products along with the main features of each one.

Tool / Feature | 3D Animation | Video Based | Sign Editor | Education Tool | Dictionary | Text Analyzer | Speech Recognition
Vsign | Yes | No | Yes | No | No | No | No
ASL Project | Yes | No | Yes | No | Yes | No | No
eSIGN | Yes | Yes | Yes | Yes | Yes | Yes | No
Reading Power | Yes | No | No | Yes | Yes | No | No
SignGenius | No | Yes | No | No | Yes | No | No
Ready Set Sign | No | Yes | No | Yes | Yes | No | Yes
ViSiCAST | Yes | No | No | No | No | Yes | Yes
Personal Communicator | No | Yes | No | Yes | Yes | Yes | No
Auslan Tuition | Yes | No | Yes | Yes | Yes | No | No
Sign Smith Studio | Yes | No | Yes | Yes | Yes | Yes | No
SiSi | Yes | No | No | Yes | No | Yes | Yes
Tawasoul | Yes | No | No | Yes | Yes | Yes | No
3D-Sign | Yes | Yes | No | Yes | Yes | No | No
Sign to me | No | Yes | No | Yes | Yes | No | No
Hand Speak | No | Yes | No | No | Yes | No | No

Table 1: Comparison of sign language products
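The per-tool feature counts discussed in the conclusion can be reproduced mechanically from Table 1. The small Java sketch below (a few rows copied by hand, with the column order assumed as above) simply tallies the "Yes" entries per tool; it is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A small tally over Table 1 that counts how many of the seven evaluation
// parameters each tool supports (Y/N per column, rows copied by hand).
public class FeatureTally {
    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        // Columns: 3D Animation, Video Based, Sign Editor, Education Tool,
        //          Dictionary, Text Analyzer, Speech Recognition.
        table.put("Vsign",             "YNYNNNN");
        table.put("eSIGN",             "YYYYYYN");
        table.put("Sign Smith Studio", "YNYYYYN");
        table.put("SiSi",              "YNNYNYY");
        table.put("Hand Speak",        "NYNNYNN");
        for (Map.Entry<String, String> row : table.entrySet()) {
            long supported = row.getValue().chars().filter(c -> c == 'Y').count();
            System.out.println(row.getKey() + " supports " + supported + "/7 features");
        }
    }
}
```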

III. CONCLUSION

This paper highlights certain parameters that are essential for evaluating the effectiveness of sign language tools. The parameters include, but are not limited to, 3D animation, video-based features, a sign editor, an education tool, a dictionary, a text analyzer and a speech recognition component. After a comprehensive discussion of fifteen different sign language tools based on the mentioned parameters, it has been observed that none of the existing tools meets all the parameters. Comparing all the available tools against the known parameters, we can identify the extent to which each tool supports these essential features. "VSign", "SignGenius" and "Hand Speak" support only two features each, while "Ready Set Sign", "Auslan Tuition" and a few others support three or four features respectively. "Sign Smith Studio" supports five of the major features required in a sign language tool. The SiSi tool has support for animation, a text analyzer and speech recognition, but lacks valuable features like a video base and a sign editor. Tawasoul also lacks the features that SiSi lacks. "3D-Sign" and "Sign to me" are very well video-based but lack features like animation, an editor, a text analyzer and a speech analyzer. "Hand Speak" also suffers from the fact that it does not support 3D animation, a sign editor, an education tool, or text and speech analyzers. The result of the study supports that the eSIGN tool provides the best functionality with respect to the features used for evaluation.

IV. REFERENCES

1) Vsign, Vsign Project, http://www.vsign.nl/EN/vsignEN.htm
2) DePaul ASL Synthesizer, The DePaul University American Sign Language Project, http://asl.cs.depaul.edu
3) Reading Power, http://voisales.com/items/sign-language-software/vcom3d/reading-power-detail.html
4) Ready! Set! Sign!, http://www.readysetsign.com/index2.html
5) eSIGN at UEA, eSign, http://www.visicast.cmp.uea.ac.uk/eSIGN/Introduction.html
6) Sign Language Software by SignGenius, http://www.signgenius.com/
7) Comm Tech Lab @ MSU, Personal Communicator, http://commtechlab.msu.edu/index.php
8) ViSiCAST Project, http://www.visicast.sys.uea.ac.uk/Publications.html
9) Auslan Tuition System, http://auslantuition.csse.uwa.edu.au/
10) Vcom3D, Sign Smith Studio & Gesture Builder, http://www.vcom3d.com/
11) IBM Recruitment, SiSi software, http://www-05.ibm.com/employment/uk/extreme-blue/cool-projects/sisi.html
12) Tawasoul, http://tawasoul4arsl.ksu.edu.sa/
13) DSpace@UM, 3D-Sign, http://dspace.fsktm.um.edu.my/handle/1812/200
14) British Sign Language, Sign to me, http://www.britishsign.co.uk/product_info.php?pName=sign-to-me-cdrom
15) Handspeak, http://www.handspeak.com/tour
16) Ahmad Mukhtar Omer, Muhammad Hamassa Abdul Latif, Mustafa Zahran, "Basic Language", Bachelor's degree thesis, Feb. 2000.
17) Aallissan, Samia geek, Maha Al-Rabiah and Faten Al-Qahtani, "A tool for analysis of the Arabic sentence," graduation project for a bachelor's degree, Riyadh, Feb. 2000.
18) Francik, J. and P. Fabian, "Animating Sign Language in Real Time," 20th IASTED International Multi-Conference Applied Informatics, Innsbruck, Austria, pp. 276-281, 2002.
19) Holden, E. J. and G. G. Roy, "The Graphical Translation of English Text into Signed English in the Hand Sign Translator System," Computer Graphics Forum (Eurographics'92), vol. 11, no. 3, pp. C357-C366, 1992.


A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm For Mobile Distributed System

Parveen Kumar1 Poonam Gahlan2

Abstract- A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. A mobile computing system is a distributed system where some of the processes run on mobile hosts (MHs), whose location in the network changes with time. The number of processes that take checkpoints is minimized to 1) avoid awakening of MHs in doze mode of operation, 2) minimize thrashing of MHs with checkpointing activity, and 3) save the limited battery life of MHs and the low bandwidth of wireless channels. In minimum-process checkpointing protocols, some useless checkpoints are taken or blocking of processes takes place. In this paper, we propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others.
Keywords- Checkpointing algorithms; parallel & distributed computing; rollback recovery; fault-tolerant systems; mobile computing

______
About-1: Department of Computer Science & Engineering, Meerut Institute of Engineering & Technology, Meerut, India - 250005 (e-mail: [email protected])
About-2: Department of Computer Sc & Engg, Singhania University, Pacheri Bari (Rajasthan), India (e-mail: [email protected])

I. INTRODUCTION

Parallel computing with clusters of workstations is being used extensively, as clusters are cost-effective and scalable and are able to meet the demands of high performance computing. An increase in the number of components in such systems increases the failure probability. It is thus necessary to examine both hardware and software solutions to ensure fault tolerance of such parallel computers. To provide fault tolerance, it is essential to understand the nature of the faults that occur in these systems. There are mainly two kinds of faults: permanent and transient. Permanent faults are caused by permanent damage to one or more components, and transient faults are caused by changes in environmental conditions. Permanent faults can be rectified by repair or replacement of components. Transient faults remain for a short duration of time and are difficult to detect and deal with. Hence it is necessary to provide fault tolerance particularly for transient failures in parallel computers. Fault-tolerant techniques enable a system to perform tasks in the presence of faults. It is easier and more cost effective to provide software fault tolerance solutions than hardware solutions to cope with transient failures [25].
A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. With the widespread proliferation of the Internet and the emerging global village, the notion of distributed computing systems as a useful and widely deployed tool is becoming a reality [24]. A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network and having the following features [25]:
No common physical clock: this is an important assumption because it introduces the element of "distribution" in the system and gives rise to the inherent asynchrony amongst the processors.
No shared memory: this is a key feature that requires message-passing for communication. It may be noted that a distributed system may still provide the abstraction of a common address space via the distributed shared memory abstraction.
Geographical separation: it is not necessary for the processors to be on a wide-area network (WAN). Recently, the network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN is also being increasingly regarded as a small distributed system. This NOW configuration is becoming popular because of the low-cost, high-speed, off-the-shelf processors now available. The Google search engine is based on the NOW architecture.
Autonomy and heterogeneity: the processors are "loosely coupled" in that they have different speeds and each can be running a different operating system. They are usually not part of a dedicated system, but cooperate with one another by offering services or solving a problem [25].
A local checkpoint is the saved state of a process at a processor at a given instance. A global checkpoint is a collection of local checkpoints, one from each process. A global state is said to be "consistent" if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost. To recover from a failure, the system restarts its execution from a previous consistent global state saved on stable storage during fault-free execution. In distributed systems, checkpointing can be independent, coordinated or quasi-synchronous. Message logging is also used for fault tolerance in distributed systems [14].
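To make the consistency definition above concrete, here is a minimal, illustrative Java check (not from the paper; message identifiers and sets are invented) that flags a recorded global state as inconsistent when some message's receive event is recorded but its send event is not:

```java
import java.util.Set;

// A minimal sketch of the consistency test defined above: a recorded global
// state is inconsistent if some message's receive event is recorded while its
// send event is not (an orphan message). Message ids are illustrative.
public class ConsistencyCheck {
    static boolean isConsistent(Set<String> recordedSends, Set<String> recordedReceives) {
        for (String msg : recordedReceives) {
            if (!recordedSends.contains(msg)) {
                return false; // msg is orphan: received but never sent in this state
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> sends = Set.of("m1", "m2");
        Set<String> recvs = Set.of("m1", "m3"); // m3 recorded as received only
        System.out.println(isConsistent(sends, recvs)); // false: m3 is orphan
    }
}
```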

Most of the existing coordinated checkpointing algorithms [9, 19] rely on the two-phase protocol and save two kinds of checkpoints on stable storage: tentative and permanent. In the first phase, the initiator process takes a tentative checkpoint and requests all or selected processes to take their tentative checkpoints. If all processes are asked to take their checkpoints, it is called all-process coordinated checkpointing [5, 7, 19]. Alternatively, if only selected communicating processes are required to take checkpoints, it is called minimum-process checkpointing. Each process informs the initiator whether it succeeded in taking a tentative checkpoint. If the initiator learns that all concerned processes have successfully taken their tentative checkpoints, the algorithm enters the second phase and the initiator asks the relevant processes to make their tentative checkpoints permanent. Alternatively, if some process fails to take its tentative checkpoint in the first phase, the initiator requests all processes to abort their tentative checkpoints. In order to record a consistent global checkpoint, when a process takes a checkpoint, it asks (by sending checkpoint requests to) all relevant processes to take checkpoints. Therefore, coordinated checkpointing suffers from the high overhead associated with the checkpointing process [20], [21], [22], [23].
In coordinated or synchronous checkpointing, processes coordinate their local checkpointing actions such that the set of all recent checkpoints in the system is guaranteed to be consistent [add reference list…]. In case of a fault, every process restarts from its most recent permanent/committed checkpoint. Hence, this approach simplifies recovery and does not suffer from the domino effect. Furthermore, coordinated checkpointing requires each process to maintain only one permanent checkpoint on stable storage, reducing storage overhead and eliminating the need for garbage collection. Its main disadvantage is the large latency involved in output commit.
A straightforward approach to coordinated checkpointing is to block communications while the checkpointing process executes. A coordinator takes a checkpoint and broadcasts a request message to all processes, asking them to take a checkpoint. When a process receives this message, it stops its execution, flushes all the communication channels, takes a tentative checkpoint, and sends an acknowledgement message back to the coordinator. After the coordinator receives acknowledgements from all processes, it broadcasts a commit message that completes the two-phase checkpointing protocol. After receiving the commit message, each process discards its old permanent checkpoint and makes the tentative checkpoint permanent. The process is then free to resume execution and exchange messages with other processes.
Much of the previous work [2, 3, 4, 20, 21, 22, 23] in coordinated checkpointing has focused on minimizing the number of synchronization messages and the number of checkpoints during the checkpointing process. However, some algorithms (called blocking algorithms) force all relevant processes in the system to block their computations during the checkpointing process [3, 9, 21, 22, 23]. Checkpointing includes the time to trace the dependency tree and to save the states of processes on stable storage, which may be long. Moreover, in mobile computing systems, due to the mobility of MHs, a message may be routed several times before reaching its destination. Therefore, blocking algorithms may dramatically reduce the performance of these systems [7]. Recently, non-blocking algorithms [7, 19] have received considerable attention. In these algorithms, processes need not block during checkpointing, as a checkpointing sequence number is used to identify orphan messages. However, these algorithms [4, 10] require all processes in the system to take checkpoints during checkpointing, even though many of them may not be necessary.
The coordinated checkpointing algorithms can also be classified into two categories: minimum-process and all-process algorithms. The Prakash-Singhal algorithm [13] forces only a minimum number of processes to take checkpoints and does not block the underlying computation during checkpointing. However, it was proved that their algorithm may result in an inconsistency [3]. Cao and Singhal [4] achieved non-intrusiveness in the minimum-process algorithm by introducing the concept of mutable checkpoints. The number of useless checkpoints in [4] may be exceedingly high in some situations [16]. Kumar et al. [16] and Kumar et al. [11] reduced the height of the checkpointing tree and the number of useless checkpoints while keeping non-intrusiveness intact, at the extra cost of maintaining and collecting dependency vectors, computing the minimum set and broadcasting it on the static network along with the checkpoint request. Some minimum-process blocking algorithms have also been proposed in the literature [3, 9, 21, 23].
A mobile computing system is a distributed system where some of the processes run on mobile hosts (MHs), whose location in the network changes with time. To communicate with MHs, mobile support stations (MSSs) act as access points for the MHs over wireless networks. Features that make traditional checkpointing algorithms for distributed systems unsuitable for mobile computing systems are: locating processes that have to take their checkpoints, energy consumption constraints, lack of stable storage in MHs, and low bandwidth for communication with MHs [1]. Minimum-process coordinated checkpointing is an attractive approach for transparently adding fault tolerance to distributed applications, since it avoids the domino effect, minimizes the stable storage requirement and forces only interacting processes to checkpoint.
In this paper, we propose an efficient checkpointing algorithm for mobile computing systems that forces only a minimum number of processes to take checkpoints. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We capture the partial transitive dependencies during normal execution by piggybacking dependency vectors onto computation messages. The Z-dependencies are well taken care of in this protocol. In order to reduce the message overhead, we also avoid collecting the dependency vectors of all processes to find the minimum set as in [3], [11], [21]. We also try to minimize the loss of checkpointing effort when any process fails to take its checkpoint.
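The blocking two-phase flow just described can be sketched as follows; this is only an illustrative skeleton in Java, with a hypothetical CkptProcess interface standing in for the real processes and channels:

```java
import java.util.List;

// A skeletal coordinator for the blocking two-phase protocol described above;
// CkptProcess and its methods are hypothetical placeholders, not a real API.
interface CkptProcess {
    boolean takeTentativeCheckpoint(); // stop, flush channels, save tentative state
    void commitCheckpoint();           // make the tentative checkpoint permanent
    void abortCheckpoint();            // discard the tentative checkpoint
}

class TwoPhaseCoordinator {
    // Phase 1: request tentative checkpoints; Phase 2: commit only if all succeeded.
    static void runCheckpoint(List<CkptProcess> processes) {
        boolean allOk = true;
        for (CkptProcess p : processes) {
            allOk &= p.takeTentativeCheckpoint();
        }
        for (CkptProcess p : processes) {
            if (allOk) p.commitCheckpoint(); else p.abortCheckpoint();
        }
    }
}
```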

II. PROPOSED CHECKPOINTING ALGORITHM

Our system model is similar to [4, 21]. We propose to handle node mobility and failures during checkpointing as proposed in [21].

A. The Proposed Algorithm

First phase of the algorithm: When a process, say Pi, running on an MH, say MHi, initiates a checkpointing, it sends a checkpoint initiation request to its local MSS, which will be the proxy MSS (if the initiator runs on an MSS, then that MSS is the proxy MSS). The proxy MSS maintains the dependency vector of Pi, say Ri. On the basis of Ri, the set of dependent processes of Pi is formed, say Sminset. The proxy MSS broadcasts ckpt(Sminset) to all MSSs. When an MSS receives the ckpt(Sminset) message, it checks whether any processes in Sminset are in its cell. If so, the MSS sends a mutable checkpoint request message to them. Any process receiving a mutable checkpoint request takes a mutable checkpoint and sends a response to its local MSS. After an MSS has received all response messages from the processes to which it sent mutable checkpoint requests, it sends a response to the proxy MSS. It should be noted that in the first phase, all processes take mutable checkpoints. For a process running on a static host, a mutable checkpoint is equivalent to a tentative checkpoint; but for an MH, a mutable checkpoint is different from a tentative checkpoint. In order to take a tentative checkpoint, an MH has to record its local state and transfer it to its local MSS, whereas a mutable checkpoint is stored on the local disk of the MH. It should be noted that the effort of taking a mutable checkpoint is very small as compared to the tentative one [4]. For a disconnected MH that is a member of the minimum set, the MSS that holds its disconnected checkpoint considers the disconnected checkpoint as the required one.

Second phase of the algorithm: After the proxy MSS has received the response from every MSS, the algorithm enters the second phase. If the proxy MSS learns that all relevant processes have taken their mutable checkpoints successfully, it asks them to convert their mutable checkpoints into tentative ones and also sends the exact minimum set along with this request. Alternatively, if the initiator MSS comes to know that some process has failed to take its checkpoint in the first phase, it issues an abort request to all MSSs. In this way the MHs need to abort only the mutable checkpoints, and not the tentative ones; thus we try to reduce the loss of checkpointing effort in case the checkpointing algorithm is aborted in the first phase. When an MSS receives the tentative checkpoint request, it asks all the processes in the minimum set that are running in its cell to convert their mutable checkpoints into tentative ones. When an MSS learns that all relevant processes in its cell have taken their tentative checkpoints successfully, it sends a response to the proxy MSS.

Third phase of the algorithm: Finally, when the proxy MSS learns that all processes in the minimum set have taken their tentative checkpoints successfully, it issues a commit request to all MSSs. When a process in the minimum set gets the commit request, it converts its tentative checkpoint into a permanent one and discards its earlier permanent checkpoint, if any.

B. Message Handling During Checkpointing

When a process takes its mutable checkpoint, it does not send any message till it receives the tentative checkpoint request. Suppose Pi sends m to Pj after taking its mutable checkpoint and Pj has not taken its mutable checkpoint at the time of receiving m. In this case, if Pj takes its mutable checkpoint after processing m, then m will become orphan. Therefore, we do not allow Pi to send any message unless and until every process in the minimum set has taken its mutable checkpoint in the first phase. Pi can send messages when it receives the tentative checkpoint request, because at this moment every concerned process has taken its mutable checkpoint and m cannot become orphan. The messages to be sent are buffered at the sender's end. During this period, a process is allowed to continue its normal computations and receive messages.
Suppose Pj gets the mutable checkpoint request at MSSp, and suppose there is some process Pk such that Pk does not belong to Sminset and Pk belongs to Rj[]. In this case, Pk is also included in the minimum set, and Pj sends a mutable checkpoint request to Pk. It should be noted that the Sminset, computed on the basis of the dependency vector of the initiator process, is only a subset of the minimum set. Due to zigzag dependencies, the initiator process may be transitively dependent upon some more processes which are not included in the Sminset.

C. An Example

The proposed algorithm can be better understood by the example shown in Figure 2. There are six processes (P0 to P5) denoted by straight lines. Each process is assumed to have an initial permanent checkpoint with csn equal to "0". Cix denotes the xth checkpoint of Pi. The initial dependency vectors of P0, P1, P2, P3, P4 and P5 are [000001], [000010], [000100], [001000], [010000] and [100000], respectively. The dependency vectors are maintained as explained in Section 2.1. P0 sends m2 to P1 along with its dependency vector [000001]. When P1 receives m2, it computes its dependency vector by taking the bitwise logical OR of the dependency vectors of P0 and P1, which comes out to be [000011]. Similarly, P2 updates its dependency vector on receiving m3, and it comes out to be [000111]. At time t1, P2 initiates the checkpointing algorithm with its dependency vector [000111] and finds that it is transitively dependent upon P0 and P1. Therefore, P2 computes the tentative minimum set [Sminset = {P0, P1, P2}]. P2 sends the mutable checkpoint request to P1 and P0 and takes its own mutable checkpoint C21. For an MH, the mutable checkpoint is stored on the disk of the MH. It should be noted that Sminset is only a subset of the minimum set.

When P1 takes its mutable checkpoint C11, it finds that it is dependent upon P3 due to m4, but P3 is not a member of Sminset; therefore, P1 sends a mutable checkpoint request to P3. Consequently, P3 takes its mutable checkpoint C31. After taking its mutable checkpoint C21, P2 generates m8 for P3. As P2 has already taken its mutable checkpoint for the current initiation and has not yet received the tentative checkpoint request from the initiator, P2 buffers m8 on its local disk. We define this duration as the uncertainty period of a process, during which a process is not allowed to send any message. The messages generated for sending are buffered on the local disk of the sender process. P2 can send m8 only after getting the tentative checkpoint request or the abort message from the initiator process. Similarly, after taking its mutable checkpoint, P0 buffers m10 for its uncertainty period. It should be noted that P1 receives m10 only after taking its mutable checkpoint. Similarly, P3 receives m8 only after taking its mutable checkpoint C31. A process receives all messages during its uncertainty period; for example, P3 receives m11. A process is also allowed to perform its normal computations during its uncertainty period.
At time t2, P2 receives responses to the mutable checkpoint requests from all processes in the minimum set (not shown in Figure 2) and finds that they have taken their mutable checkpoints successfully; therefore, P2 issues the tentative checkpoint request to all processes. On getting the tentative checkpoint request, processes in the minimum set [P0, P1, P2, P3] convert their mutable checkpoints into tentative ones and send the response to the initiator process P2; these processes also send the messages buffered on their local disks to the destination processes. For example, P0 sends m10 to P1 after getting the tentative checkpoint request [not shown in the figure]. Similarly, P2 sends m8 to P3 after getting the tentative checkpoint request. At time t3, P2 receives responses from the processes in the minimum set [not shown in the figure] and finds that they have taken their tentative checkpoints successfully; therefore, P2 issues the commit request to all processes. A process in the minimum set converts its tentative checkpoint into a permanent checkpoint and discards its old permanent checkpoint, if any.

D. Correctness Proof

We can show that the global state collected by the proposed protocol will be consistent. We prove the result by contradiction. Suppose there is some orphan message in the recorded global state. We explore the different possibilities with the help of Figure 2. Suppose P0 sends m10 after taking its mutable checkpoint and P1 receives m10 before taking its mutable checkpoint. This situation is not possible, because after taking its mutable checkpoint P0 enters its uncertainty period and cannot send any message unless and until it receives the tentative checkpoint request. P2 can issue the tentative checkpoint request only after getting confirmation that every concerned process (including P1) has taken its mutable checkpoint. Hence P1 cannot receive m10 before taking its mutable checkpoint C11. Suppose P5 sends m13 to P3 after C50 and P3 gets m13 before C31 (not shown in Figure 2). In this case, when P3 takes its mutable checkpoint C31, it will find that P5 does not belong to Sminset and that P3 is dependent upon P5; therefore, P3 will send a mutable checkpoint request to P5, and send(m13) will also be included in the global state. The other possibilities can be proved by similar, obvious arguments [21].
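For illustration, the dependency-vector bookkeeping used in the example above can be reproduced with plain bit masks; the Java sketch below (names invented) ORs vectors on message receipt and derives the tentative Sminset exactly as P2 does at t1:

```java
// A sketch of the dependency-vector bookkeeping from the example: vectors are
// bit masks (bit i set when the owner depends on Pi), OR-merged on receive.
public class DependencyVectors {
    public static void main(String[] args) {
        int[] dv = {0b000001, 0b000010, 0b000100, 0b001000, 0b010000, 0b100000};

        dv[1] |= dv[0]; // P1 receives m2 from P0 -> [000011]
        dv[2] |= dv[1]; // P2 receives m3 from P1 -> [000111]

        // P2 initiates: every set bit in its vector joins the tentative Sminset.
        System.out.print("Sminset = {");
        for (int i = 0; i < dv.length; i++) {
            if ((dv[2] & (1 << i)) != 0) System.out.print(" P" + i);
        }
        System.out.println(" }"); // prints { P0 P1 P2 }
    }
}
```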

III. COMPARATIVE ANALYSIS OF THE PROPOSED ALGORITHM WITH OTHER ALGORITHMS

We use the following notations to compare our algorithm with other algorithms. We suppose that all the processes are running on MHs.
Nmss: number of MSSs.
Nmh: number of MHs.
Cpp: cost of sending a message from one process to another.
Cst: cost of sending a message between any two MSSs.
Cwl: cost of sending a message from an MH to its local MSS (or vice versa).
Cbroadcast: cost of broadcasting a message over the static network.
Csearch: cost incurred to locate an MH and forward a message to its current local MSS from a source MSS.
Tst: average message delay in the static network.
Twl: average message delay in the wireless network.
Tch: average delay to save a checkpoint on stable storage; it also includes the time to transfer the checkpoint from an MH to its local MSS.
N: total number of processes.
Nmin: number of minimum processes required to take checkpoints.
Nmut: number of useless mutable checkpoints [4].
Tsearch: average delay incurred to locate an MH and forward a message to its current local MSS.
Nucr: average number of useless checkpoint requests in [4].
Ndep: average number of processes on which a process depends.
h1: height of the checkpointing tree in the Koo-Toueg algorithm [9].
h2: height of the checkpointing tree in the proposed algorithm.

IV. MESSAGE OVERHEAD OF THE PROPOSED ALGORITHM

A. Message overhead in the first phase
The initiator process sends the mutable checkpoint request to its local MSS (say MSSin) and gets the response from MSSin: 2Cwl.
MSSin broadcasts the mutable checkpoint request over the static network: Cbroadcast.
All the processes in the minimum set get the mutable checkpoint request from the local MSS and send responses to the local MSS: 2*Nmin*Cwl.
Every MSS sends a response to MSSin: Nmss*Cst.

B. Message overhead in the second phase
MSSin broadcasts the tentative checkpoint request over the static network: Cbroadcast.
Every process in the minimum set receives the tentative checkpoint request and sends a response via the local MSS: 2*Nmin*Cwl.
Every MSS sends a response to MSSin: Nmss*Cst.

C. Message overhead in the third phase
MSSin broadcasts the commit request over the static network: Cbroadcast.

Total average message overhead: 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst.

Our algorithm is a three-phase algorithm; therefore it suffers from an extra message overhead of Cbroadcast + 4*Nmin*Cwl. By doing so, we are able to reduce the loss of checkpointing effort in case the checkpointing procedure is aborted in the first phase. In other algorithms [2, 3, 4, 9], in case of an abort in the first phase, all concerned processes are forced to abort their tentative checkpoints, whereas in the proposed scheme all relevant processes abort their mutable checkpoints only. The effort of taking a mutable checkpoint is negligible as compared to a tentative one in a mobile distributed system [4]. Frequent aborts of checkpointing algorithms, due to exhausted battery power, abrupt disconnections, etc., may significantly increase the checkpointing overhead in two-phase algorithms [2, 3, 4, 9]. We try to minimize this by designing a three-phase algorithm. In our algorithm, only a minimum number of processes are required to take their checkpoints.
The blocking time of the Koo-Toueg [11] protocol is the highest, followed by the Cao-Singhal [4] algorithm. We claim that the blocking time in the proposed scheme will be significantly smaller as compared to the KT algorithm [9], because in algorithm [9] transitive dependencies are collected through direct dependencies. The checkpoint initiator process, say Pin, sends the checkpoint request to any process Pi if Pin is causally dependent upon Pi. Similarly, Pi sends the checkpoint request to any process Pj if Pi is causally dependent upon Pj. In this way, a checkpointing tree is formed. In the proposed algorithm, transitive dependencies are captured during normal execution as described in Section 2.1. Some zigzag dependencies may not be captured in the proposed scheme during normal execution and they may form a low-order checkpointing tree in some typical

situations. But, in general, the checkpointing tree formed in the proposed scheme will be negligibly small as compared to the KT algorithm [9], and hence the blocking time of processes will be small in the proposed scheme as compared to the KT algorithm [9]. Furthermore, in the proposed scheme, a process is blocked when it takes its mutable checkpoint and waits for the other concerned processes to take their mutable checkpoints in order to come out of the blocking state. In the KT algorithm [9], a process is blocked when it takes its tentative checkpoint and waits for the other concerned processes to take their tentative checkpoints in order to come out of the blocking state. In mobile distributed systems, the time to take a mutable checkpoint may be negligibly small as compared to a tentative checkpoint. Hence, in the proposed scheme, the blocking period of a process will be significantly small as compared to the KT algorithm [9]. Our blocking period is larger than in the CS algorithm [3], but that algorithm suffers from the extra message overhead of collecting dependency vectors from all processes and, moreover, it forces all the processes to block for a short duration. In our scheme, a process is blocked only if it is a member of the minimum set. Furthermore, a process is allowed to perform its normal computations and receive messages during its blocking period.
In the algorithms proposed in [4], [20], no blocking of processes takes place, but some useless checkpoints are taken, which are discarded on commit. In the Elnozahy et al. [7] algorithm, all processes take checkpoints. In the protocols [3], [9], and in the proposed one, only a minimum number of processes record their checkpoints. In algorithm [4], concurrent executions of the algorithm are allowed, but this may lead to inconsistencies [17]. We avoid concurrent executions of the proposed algorithm.

Table 1. A Comparison of System Performance

Algorithm | Avg. blocking time | Avg. no. of checkpoints | Avg. message overhead
Cao-Singhal [4] | 2Tst | Nmin | 3Cbroadcast + 2Cwl + 2Nmss*Cst + 3Nmh*Cwl
Cao-Singhal [5] | 0 | Nmin + Nmut | 2*Nmin*Cpp + Cbroadcast + Nucr*Cpp
Koo-Toueg [11] | h1*Tch | Nmin | 3*Nmin*Cpp*Ndep
Elnozahy et al. [8] | 0 | N | 2*Cbroadcast + N*Cpp
Proposed algorithm | h2*Tch | Nmin | 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst
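To see how these expressions behave, one can plug sample values into the overhead formulas of Table 1. The Java sketch below does that with invented costs and counts, so the resulting numbers are purely illustrative and imply no comparison claim beyond the formulas themselves:

```java
// Plugging illustrative values into the overhead expressions of Table 1;
// all costs and counts here are made up for the example.
public class OverheadEstimate {
    public static void main(String[] args) {
        double cWl = 4, cSt = 1, cBroadcast = 10, cPp = 2;
        int nMss = 5, nMin = 3, n = 20, nDep = 4;

        // Proposed: 2*Cwl + 3*Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst
        double proposed = 2 * cWl + 3 * cBroadcast + 4 * nMin * cWl + 2 * nMss * cSt;
        // Elnozahy et al.: 2*Cbroadcast + N*Cpp (all processes checkpoint)
        double elnozahy = 2 * cBroadcast + n * cPp;
        // Koo-Toueg: 3*Nmin*Cpp*Ndep (checkpointing-tree requests)
        double kooToueg = 3 * nMin * cPp * nDep;

        System.out.printf("proposed=%.0f elnozahy=%.0f koo-toueg=%.0f%n",
                proposed, elnozahy, kooToueg);
    }
}
```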

V. CONCLUSION

In this paper, we have proposed a minimum-process checkpointing protocol for deterministic mobile distributed systems, in which no useless checkpoints are taken and an effort has been made to minimize the blocking of processes. We try to reduce the checkpointing time and the blocking time of processes by limiting the checkpointing tree which may be formed in other algorithms [4, 9]. We capture the transitive dependencies during normal execution by piggybacking dependency vectors onto computation messages. The Z-dependencies are well taken care of in this protocol. We also try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others.

VI. REFERENCES

1) Acharya A. and Badrinath B. R., "Checkpointing Distributed Applications on Mobile Computers," Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.
2) Cao G. and Singhal M., "On Coordinated Checkpointing in Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 12, pp. 1213-1225, Dec. 1998.
3) Cao G. and Singhal M., "On the Impossibility of Min-process Non-blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems," Proceedings of the International Conference on Parallel Processing, pp. 37-44, August 1998.
4) Cao G. and Singhal M., "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001.
5) Chandy K. M. and Lamport L., "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computing Systems, vol. 3, no. 1, pp. 63-75, February 1985.
6) Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson D.B., "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
7) Elnozahy E.N., Johnson D.B. and Zwaenepoel W., "The Performance of Consistent Checkpointing," Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 39-47, October 1992.
8) Higaki H. and Takizawa M., "Checkpoint-recovery Protocol for Reliable Mobile Systems," Trans. of Information Processing Japan, vol. 40, no. 1, pp. 236-244, Jan. 1999.
9) Koo R. and Toueg S., "Checkpointing and Roll-Back Recovery for Distributed Systems," IEEE Trans. on Software Engineering, vol. 13, no. 1, pp. 23-31, January 1987.
10) Neves N. and Fuchs W. K., "Adaptive Recovery for Mobile Environments," Communications of the ACM, vol. 40, no. 1, pp. 68-74, January 1997.
11) Parveen Kumar, Lalit Kumar, R. K. Chauhan, V. K. Gupta, "A Non-Intrusive Minimum Process Synchronous Checkpointing Protocol for Mobile Distributed Systems," Proceedings of IEEE ICPWC-2005, pp. 491-495, January 2005.
12) Pradhan D.K., Krishana P.P. and Vaidya N.H., "Recovery in Mobile Wireless Environment: Design and Trade-off Analysis," Proceedings of the 26th International Symposium on Fault-Tolerant Computing, pp. 16-25, 1996.
13) Prakash R. and Singhal M., "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 10, pp. 1035-1048, October 1996.
14) Ssu K.F., Yao B., Fuchs W.K. and Neves N. F., "Adaptive Checkpointing with Storage Management for Mobile Environments," IEEE Transactions on Reliability, vol. 48, no. 4, pp. 315-324, December 1999.
15) J.L. Kim, T. Park, "An Efficient Protocol for Checkpointing Recovery in Distributed Systems," IEEE Trans. Parallel and Distributed Systems, pp. 955-960, Aug. 1993.
16) L. Kumar, M. Misra, R.C. Joshi, "Low Overhead Optimal Checkpointing for Mobile Distributed Systems," Proceedings of the 19th IEEE International Conference on Data Engineering, pp. 686-688, 2003.
17) Ni W., S. Vrbsky and S. Ray, "Pitfalls in Distributed Nonblocking Checkpointing," Journal of Interconnection Networks, vol. 1, no. 5, pp. 47-78, March 2004.
18) L. Lamport, "Time, Clocks and the Ordering of Events in a Distributed System," Comm. ACM, vol. 21, no. 7, pp. 558-565, July 1978.
19) Silva L.M. and J.G. Silva, "Global Checkpointing for Distributed Programs," Proc. 11th Symp. Reliable Distributed Systems, pp. 155-162, Oct. 1992.
20) Parveen Kumar, Lalit Kumar, R. K. Chauhan, "A Non-intrusive Hybrid Synchronous Checkpointing Protocol for Mobile Systems," IETE Journal of Research, vol. 52, no. 2&3, 2006.
21) Parveen Kumar, "A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems," Mobile Information Systems, vol. 4, no. 1, pp. 13-32, 2007.
22) Lalit Kumar Awasthi, Parveen Kumar, "A Synchronous Checkpointing Protocol for Mobile Distributed Systems: Probabilistic Approach," International Journal of Information and Computer Security, vol. 1, no. 3, pp. 298-314.
23) Sunil Kumar, R. K. Chauhan, Parveen Kumar, "A Minimum-process Coordinated Checkpointing Protocol for Mobile Computing Systems," International Journal of Foundations of Computer Science, vol. 19, no. 4, pp. 1015-1038, 2008.
24) A. Tanenbaum and M. Van Steen, Distributed Systems: Principles and Paradigms, Upper Saddle River, NJ, Prentice-Hall, 2003.
25) M. Singhal and N. Shivaratri, Advanced Concepts in Operating Systems, New York, McGraw Hill, 1994.


The Establishment of AR-based Interactive Digital Artworks

Min-Chai Hsieh1, Hao-Chiang Koong Lin2, Jin-Wei Lin3, Mei-Chi Chen4

Abstract- This work attempts to declare the background of a personal contemporary state through an immersion of "digital vacancy". The work is stacked on the identical digital space with concurrent portrait and enjoyment. Moreover, the work describes the doubt and depression in life, combined with the humor of predicament and the absurdities of the senses. We employ augmented reality to create digital artworks that present an interactive poem. This work is established where the digital poem is generated via the interaction between a video film and a text-based poem. After establishing the digital artwork, we exhibited it at the Digital Art Center (DAC), Taipei, Taiwan. The audience can interact with the digital poem in real time. In comparison to other AR equipment, the cost of this work is quite low. In the future, some usability evaluation will be performed on this work.
Keywords- Augmented reality, Digital artworks, Interactive poem.

______
About-1: Dept. of Information and Learning Technology, National University of Tainan, Tainan, Taiwan (telephone: +886-6-2133111#771, email: [email protected])
About-2: Dept. of Information and Learning Technology, National University of Tainan, Tainan, Taiwan (telephone: +886-6-2133111#771, email: [email protected])

I. INTRODUCTION

In the past, artists presented the creation of domain and private space. Most artworks were based on non-interactive visual creative expression. With the progress of information technology, people can create art by using digital multimedia rather than just working in a traditional manner. That is, the way of art-creating has changed dramatically, and digital art creation has become more lively and interesting. Furthermore, these materials and technologies enhance the artists' creativity. Artists are able to create artworks via technology and multimedia; that is to say, they can create artworks with multimedia besides the traditional ways of creating art, so they can create in more fashions to express their thoughts. Today, the ideas of artists can be implemented in real time via the powerful computation abilities of various computers.
The process of artwork creation is charming because it is no longer a phenomenon of the slice, but a manifestation of the experience. Interaction has been considered an important characteristic of digital artworks, but the evolution of the aesthetic point of view is seldom mentioned. Participating in the experience during the construction is the significance of creating all of the works; this gradually formed "interactive aesthetics". These are important concepts in new media art [1][2].
In "The End of Art", Arthur C. Danto mentioned that the function of art imitation and reappearance has already disappeared, and verisimilitude imitation is also redefined in art history [3]. The text should be opened to and created by the readers. The meaning of a text is interpreted by the readers instead of the author. This is the well-known "writable text" concept [4].
In this work, we employ Augmented Reality (AR) technology to create digital artworks that present a series of interactive poems. The audience can interact with the digital poem via pre-designed postcards. Notice that the postcard is a real object, while the digital poem is a virtual sight. Interestingly, the real lies in the virtual and, vice versa, virtual scenes render in the real environment. Therefore, the audience can feel themselves in an environment both virtual and real.

II. RELATED WORK

In recent years, many scholars and institutes have been carrying out research on Augmented Reality, one of the techniques of computer vision application. AR is also called Mixed Reality (MR), the extension of Virtual Reality (VR). By setting up a scene, VR can simulate objects in the real world and create an environment where users can interact with the simulated objects. AR consists of images, objects or scenes generated by the computer that blend into the real environment to strengthen our visual feelings; in sum, it adds virtual objects to our real environment. The technology has to possess three qualities: the combination of virtual objects and the real world, real-time interaction, and registration in 3D space.

Fig.1. Reality and Virtuality (RV) continuum

Milgram et al. [5] treat the real environment and the virtual one as a closed union, as shown in Fig. 1. On the far left side is the merely real environment, while on the far right side is the purely virtual environment. VR is inclined to take the place of the real world, while AR augments the real environment with virtual images produced by the computer. Presently, AR is being applied very extensively to fields such as education, medical technology, military training, engineering, industrial design, art, entertainment and so on [6][7][8][9][10][11][12]. AR combines virtual objects with the real environment and displays the virtual objects generated by computers in front of users' eyes. Milgram et al. [5] define two displaying ways of AR. One is See-Through AR: users can directly see the surrounding environment through the monitor, and the monitor also displays the virtual image in it.

Accordingly, the effect of the augmented environment can be the greatest via See-Through AR. The other is Monitor-Based AR: the computer combines the images captured by the webcam with the virtual images, and the final image after combination shows up on a Head Mounted Display (HMD) or a computer monitor. There are two kinds of HMD: one is the pure HMD and the other is the HMD with a small webcam. The former has a small volume and can be equipped with a head-mounted tracking instrument, which can track the present angle as well as the direction ahead of the user's head; it is more suitable for research and application of AR. The latter has an immersion effect.

III. CONCEPT OF WORK

This work attempts to declare the background of a personal contemporary state through an immersion of "digital vacancy"; the work is stacked on the identical digital space with concurrent portrait and enjoyment. The author allows the audience to generate an engagement of ideas from past created videos and poetry using interactive media, which further pushes the audience to wonder what they are expecting and the kind of attention they are involved in while waiting. Such as life, the crowd passes each other in the city, alternating and switching consciousness and predicament. Perhaps the image of dust generally contains an ingenious meaning due to naturally-born vision and wisdom, while the condensation of air, image and signs symbolizes the endless vacancy. Perhaps the audience has fallen into a conventional mindset. Often times, the audience needs to think again before understanding the definition for himself or herself.
Our thinking can be both naive and profound. Therefore, this work attempts to expand fragments of a series of identities from the phenomenology of inconspicuous things. The work describes the doubt and depression in life, combined with the humor of predicament and the absurdities of the senses. We are the city wanderers who observe various surrounding symbols through constant muttering without probing into their significance.
After we enter the kingdom of another dimension, we often start immersing in the beauty of ambiguity while thinking about the multilevel of possibility. Such a pattern forms a cognitive approach to reflect the nature and details of things, while estimating the length and scale of seemingly familiar yet strange surrounding sceneries, giving a little taste of such inspiration. As Claude Levi-Strauss said, "Our eyes have lost the ability to distinguish and we no longer know how to treat things." Subjective regularity helps us gain insight into streamer and freeze in true cleverness. Our creation no longer belongs to a part of theories and we can unlimitedly slow down our pace.

IV. DIGITAL POEM

A system (written in the Processing programming language) is established where the digital poem is generated via the interaction between a video film and a text-based poem. In other words, the system acquires two kinds of inputs: (1) a video file which was produced by the artist beforehand, and (2) a modern poem which was written by the artist. The poem consists of a sequence of Chinese characters. Fig. 2 shows the transformation program written in Processing.

Fig.2. The transformation program written in Processing.

After these two inputs are fed to the system, each frame in the video is transformed into an image constructed by texts. The transformation process is depicted as follows. A "cell size" is defined in the program, and each cell contains several pixels, for example, four pixels. The cell size determines the style of the resulting image. For each cell in the frame, we replace the content with a character from the poem; the order in which the characters are applied depends on their positions in the poem. Moreover, the color of each character is based on the color of the cell at the same position. Notice that the font size of the characters can also be defined by the designer. Therefore, if the font size is larger than the cell size, characters on the image may overlap with each other so that the colors blur, embellishing the frame to be "draw-like". When all the frames are generated and filled with colors via the above process, an interactive "digital poem" in video form is thus produced. Fig. 3 shows the video file before transformation, played with QuickTime, and Fig. 4 shows a frame after transformation.

Fig.3. The video file before transformation.
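The paper's transformation is a Processing sketch (Fig. 2), which is not reproduced here; the following plain-Java approximation of the described cell replacement (cell size, font, placeholder poem text and colors are all arbitrary stand-ins) may help clarify the process:

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// A rough re-creation of the cell transformation described above (the original
// is a Processing sketch); cell size, font and poem text are stand-ins.
public class PoemFrame {
    static BufferedImage transform(BufferedImage frame, String poem, int cell) {
        BufferedImage out = new BufferedImage(frame.getWidth(), frame.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        g.setFont(new Font("Serif", Font.PLAIN, cell + 2)); // font may exceed cell
        int idx = 0;
        for (int y = 0; y < frame.getHeight() - cell; y += cell) {
            for (int x = 0; x < frame.getWidth() - cell; x += cell) {
                // Color each character with the pixel color at the cell's origin,
                // and pick characters in the order they occur in the poem.
                g.setColor(new Color(frame.getRGB(x, y)));
                char ch = poem.charAt(idx++ % poem.length());
                g.drawString(String.valueOf(ch), x, y + cell);
            }
        }
        g.dispose();
        return out;
    }

    public static void main(String[] args) {
        BufferedImage demo = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        BufferedImage result = transform(demo, "天地一沙鷗", 8); // placeholder text
        System.out.println("produced " + result.getWidth() + "x" + result.getHeight());
    }
}
```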

Fig.4. The frame after transformation (pixels in this frame were replaced by texts in the poem).

V. IMPLEMENTATION

The webcam captures video of the real world and sends it to the computer. The system searches through each video frame for any square shapes of black color. If a square is found, the system uses some mathematics to calculate the position of the webcam relative to the black square. Once the position of the webcam is determined, a film of the digital poem is drawn from that same position. This film is drawn on top of the video of the real world and so appears stuck on the square marker. The final output is shown back and displayed via the projector. Therefore, when the audience looks through the display, they see the film of the digital poem overlaid on the real world.
Figure 6 shows the digital poem presentation based on our system written in the Processing language. We create image textures and corresponding vertices; the four vertices of the film then match the four vertices of the image texture, and the film is drawn on the image texture. The four vertices of the image texture are expressed as vertex(x, y, u, v), where x and y are the coordinates of the vertex, u is the horizontal coordinate for the texture mapping, and v is the vertical coordinate for the texture mapping.

Fig.6. The film draws on image texture.

About the development environment: we use a PC with a Pentium(R) Dual-Core 2.6GHz CPU and a Logitech Orbit webcam, which captures 30 frames per second. The frame size is 640×480. The distance between the webcam and the postcard is 50 centimeters. The AR marker on the postcard is 4.55 cm in both length and width.
The interactive content of the work contains video. Each poem from a postcard matches a virtual digital poem, and the audience can interact with the postcard by directly manipulating it. Figure 7 is an example of a postcard. Each postcard corresponds to a video of a digital poem; the total numbers of postcards and videos are both 12. Shown in Figure 8 are all of the 12 postcards.

Fig.7. (1) The back of the postcard is the AR marker. (2) The front of the postcard is the poem.
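The vertex(x, y, u, v) mapping described above can be illustrated with a small Processing sketch written as a plain Java PApplet subclass; the image file name and window size are placeholders, not the work's actual assets:

```java
import processing.core.PApplet;
import processing.core.PImage;

// How a frame is mapped onto a quad with vertex(x, y, u, v) in Processing;
// poem.png is a placeholder image, and the sketch is illustrative only.
public class TexturedQuad extends PApplet {
    PImage film;

    public void settings() { size(640, 480, P3D); }

    public void setup() { film = loadImage("poem.png"); }

    public void draw() {
        beginShape();
        texture(film); // bind the film frame as the quad's image texture
        // Each vertex pairs a screen position (x, y) with a texture coordinate
        // (u, v); by default u and v are given in image pixels.
        vertex(0,     0,      0,          0);
        vertex(width, 0,      film.width, 0);
        vertex(width, height, film.width, film.height);
        vertex(0,     height, 0,          film.height);
        endShape(CLOSE);
    }

    public static void main(String[] args) { PApplet.main("TexturedQuad"); }
}
```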

Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver. 1.0 July 2010 P a g e | 119

w

e

i

V

y

l

r

a

E

Fig. 8. The total postcards P a g e | 120 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology

The elements of the presentation include the webcam embedded in a lamp, a reading-desk, a projector and a white wall in the exhibition. Figure 9 shows the presentation of the artworks. There are several postcards on the reading-desk, and the lamp is installed at a higher position in order to present a broader view. The audience "reads" the content of these poems by manipulating the postcards, so that the digital films "hidden" behind the AR markers are displayed. The installation of this artwork is shown in Figure 10.

Fig.9. The presentation of artworks.

Fig.10. The audience interacts with AR digital poem.

VI. CONCLUSION

In this work, we employ augmented reality technologies to create digital artworks that present an interactive poem. This artwork was exhibited in the Digital Art Center, Taipei, Taiwan; the exhibition duration was from March 6, 2010 to April 11, 2010. We will extend this work so that audiences can interact with the AR digital poem via the internet. The artwork's setup and operation are easy: audiences only need to set up a webcam, with no additional hardware requirement. In comparison to other AR equipment, the cost of this work is quite low. In the future, some usability evaluation will be performed on this work.

VII. REFERENCES

1) Lev, M. (2001). The Language of New Media. Massachusetts: MIT Press.
2) Kirk, V., and Gopnik, A. (1990). High and Low: Modern Art and Popular Culture. New York: Museum of Modern Art.
3) Oliver, G. (2003). Virtual Art. Massachusetts: MIT Press.
4) Zucker, S. D. (1997). The Arts of Interaction: Interactivity, Performativity and Computers. Journal of Aesthetics and Art Criticism (Special Issue on Art and Technology), 55(2), 17-127.
5) Milgram, P., and Kishino, F. (1994). A Taxonomy of Mixed Reality Visual Displays. IEICE Trans. Information Systems, E77-D(12), 1321-1329.
6) Azuma, R. (1997). A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments, 6, 355-385.
7) Azuma, R., Baillot, Y., and Behringer, R. (2001). Recent advances in augmented reality. IEEE Computers and Graphics, 21, 34-47.
8) Dünser, A., and Hornecker, E. (2007). Lessons from an AR book study. In: Proceedings of the First International Conference on Tangible and Embedded Interaction, 179-182.
9) Liarokapis, F., Petridis, P., Lister, P.F., and White, M. (2002). Multimedia Augmented Reality Interface for E-learning (MARIE). World Trans. on Engineering and Technology Education, 1(2), 173-176.
10) Billinghurst, M., Kato, H., and Poupyrev, I. (2001). The MagicBook: A transitional AR interface. Computers and Graphics, 25, 745-753.
11) Kirner, C., and Zorzal, E.R. (2005). Educational applications of augmented reality collaborative environments. Proceedings of the Sixteenth Brazilian Symposium on Informatics in Education, 114-124.
12) Hsieh, M.C., and Lee, J.S. (2008). AR marker capacity increasing for kindergarten English learning. International Multiconference of Engineers and Computer Scientists, 663-666.
13) Kato, H., Billinghurst, M., Blanding, B., and May, R. (1999). ARToolKit. Technical Report (Hiroshima City University).


AUTOCLUS: A Proposed Automated Cluster Generation Algorithm

Samarjeet Borah1 Mrinal Kanti Ghose2

Abstract- Among all kinds of clustering algorithms, partition based and hierarchical methods have gained the most popularity among researchers. Both kinds of methods have their own advantages and disadvantages. In this paper an attempt has been made to propose a new clustering algorithm which includes selected features of both. While developing the proposed methodology, emphasis has been given to some of the disadvantages of both categories of algorithms. The proposed algorithm has been tested with various datasets and found to give satisfactory results.
Keywords- Clustering, Partition, Hierarchical, Automatic, Distance Measure.

I. INTRODUCTION

There is a huge amount of data in the world and it is increasing day by day. Every day new data are collected and stored in databases. To obtain implicit meaningful information from the data, the requirement for efficient analysis methods [1] arises. If a data set has thousands of entries and hundreds of attributes, it is impossible for a human being to extract meaningful information from it by means of visual inspection alone. Computer-based data mining techniques are essential in order to reveal the more complicated inner structure of the data. Among such techniques are the clustering solutions, which help in extracting information from large datasets.

II. CLUSTERING

Clustering [2][3][4] is a type of unsupervised learning method in which a set of elements is separated into homogeneous groups. Intuitively, patterns within a valid cluster are more similar to each other than they are to a pattern belonging to a different cluster. The variety of techniques for representing data, measuring similarity between data elements, and grouping data elements has produced a rich and often confusing assortment of clustering methods. Clustering is useful in several exploratory pattern-analysis, grouping, decision-making, and machine-learning situations, including data mining, document retrieval, image segmentation, and pattern classification [5][3]. Data clustering algorithms can be hierarchical or partitional [6]. Within each of the types, there exists a wealth of subtypes and different algorithms for finding the clusters.

A. Partition Based Clustering Methods

Given a database of n objects, a partition based [5] clustering algorithm constructs k partitions of the data, so that an objective function is optimized. Partition based clustering algorithms try to locally improve a certain criterion. The majority of them could be considered greedy algorithms, i.e., algorithms that at each step choose the best solution, which may not lead to optimal results in the end. The best solution at each step is the placement of a certain object in the cluster whose representative point is nearest to the object. This family of clustering algorithms includes the first ones that appeared in the data mining community. The most commonly used are K-means [7], PAM (Partitioning Around Medoids), CLARA (Clustering LARge Applications) and CLARANS (Clustering LARge ApplicatioNS). All of them are applicable to data sets with numeric attributes.

B. Hierarchical Clustering Algorithms

Hierarchical algorithms can be agglomerative (bottom-up) or divisive (top-down). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters. Hierarchical algorithms have two basic advantages [4]. First, the number of classes need not be specified a priori, and second, they are independent of the initial conditions. However, the main drawback of hierarchical clustering techniques is that they are static; that is, data points assigned to a cluster cannot move to another cluster. In addition, they may fail to separate overlapping clusters due to lack of information about the global shape or size of the clusters [8]. In hierarchical clustering, the output is a tree showing a sequence of clusterings, with each clustering being a partition of the data set [9].

III. AUTOMATED CLUSTERING (AUTOCLUS)

This work has been motivated by the issues mentioned above. Although the above algorithms are well established and quite efficient, their particular drawbacks may affect the clustering result. For example, many of these algorithms require the user to specify input parameters, where a wrong input parameter may result in bad clustering. The algorithm AUTOCLUS has been proposed keeping in mind some of the issues faced in the above algorithms. It is a hybrid algorithm which includes some features of both partition based and hierarchical based algorithms.

A. Proposed Methodology "AUTOCLUS"

Let the data set be given as X = {x_i, i = 1, 2, ..., N}, which consists of N data objects x_1, x_2, ..., x_N, where each object has M different attribute values corresponding to M
______
About-* Department of Computer Science & Engineering, Sikkim Manipal Institute of Technology, Majitar, Rangpo, East Sikkim-737136, India (e-mail: [email protected], [email protected])

different attributes. The value of the i-th object can be given by X_i = {x_i1, x_i2, ..., x_iM}. The relation x_i = x_k does not mean that x_i and x_k are the same object, but that the two objects have equal values for the attribute set A = {a_1, a_2, ..., a_M}. The main objective of the algorithm is to partition the dataset into k disjoint subsets, where k <= N. The algorithm tries to minimize the inter-cluster similarity and maximize the intra-cluster similarity.

B. Distance Measure

While searching for a certain structure in a given data set, the important thing is to find an appropriate distance function; the most important question in this context is what the criterion for selecting it should be. For distance calculation, the sum of squared Euclidean distance is used in this algorithm. It aims at minimizing the average square error criterion, which is a good measure of the within-cluster variation across all the partitions. Thus the average square error criterion tries to make the k clusters as compact and separated as possible. The algorithm combines the features of both hierarchical and partition based clustering. It creates a hierarchical decomposition of the given set of data objects, and at the same time it tries to group the objects based on a mean value. It is a simple algorithm which applies a top-down, or divisive, approach. Initially all the objects of the dataset are assumed to form a single cluster. The algorithm applies an iterative process to divide the given dataset into a set of clusters until the termination condition converges. The classification is done based on the popular clustering criterion, the within-group sum of squared error (WGSSE) function:

WGSSE = \sum_{i=1}^{n} \sum_{j=1}^{M} (x_{ij} - \bar{x}_j)^2

where n is the number of objects in the cluster and \bar{x}_j is the mean of the j-th attribute over the cluster.

The classical WGSSE function was originally designed to define the traditional hard c-means and ISODATA algorithms. With the emergence of fuzzy set theory, Dunn [10] first generalized WGSSE to a square weighting WGSSE function. Later, Bezdek [11] extended it to an infinite family of criterion functions which formed a universal clustering objective function of fuzzy c-means (FCM) type algorithms. The studies on criterion functions have mainly been focused on the measurements of similarity or distortion D(.), which are often expressed by the distances between the samples and prototypes. Different distance measurements are used to detect various structural subsets.

C. Phases of the Algorithm

The algorithm works in two different phases: the cluster generation phase and the cluster validation phase. The two phases work as follows.

D. Cluster Generation Phase

This phase involves the formation of new clusters by grouping the objects around a new mean value. The algorithm follows the reproduction process of the amoeba, a tiny, one-celled organism. Amoebas reproduce by binary fission: the nucleus of a parent cell divides, in a process called fission, and produces two smaller copies of itself. The same phenomenon is followed here. A cluster is divided into two smaller clusters by selecting a new mean value. This new mean value is selected at the furthest Euclidean distance from the current mean. Then the rest of the objects are redistributed between the two means (the old mean and the newly selected mean).

E. Cluster Validation Phase

This is the most important part of the algorithm. Whether a newly generated cluster is a stable cluster or not is checked by this cluster validation phase. The within-group sum of squares is taken as the criterion for cluster validation. If the total WGSSE of the newly generated clusters is smaller than the parent cluster's WGSSE, then the clusters are valid. Otherwise the newly generated clusters are discarded and the clustering process stops there.

F. The Pseudo Code for AUTOCLUS

1. Take an initial data set D.
2. Compute the grand mean: CALCULATE_GM(D).
3. Find the object with mean value closest to GM and call it Cluster_Head1.
4. Assign points to the cluster: ASSIGN_PT(X, C)
   // X = {x_i, i = 1, 2, ..., N}
   // C = {c_1, c_2, ..., c_k}, where k <= N
5. SS := CALCULATE_WGSS(M, C)
   // M = {m_1, m_2, ..., m_k}, where k <= n
6. Repeat the following steps while WGSSE_of_Parent > (Total_WGSSE_of_Children):
   a. Obtain the Euclidean distance (ED) from all other objects to Cluster_Head1.
   b. Select the object at the largest Euclidean distance from Cluster_Head1.
   c. Name the object at the largest distance Cluster_Head2.
   d. Rename Cluster_Head1 as Cluster_Head1.1.
   e. Reassign objects around Cluster_Head1.1 and Cluster_Head2.
   f. Calculate the WGSSE for Cluster_Head1.1 and Cluster_Head2 (SS1 & SS2).
   g. If WGSSE_of_Parent > (Total_WGSSE_of_Children), then the child clusters are accepted, else they are discarded.
   h. Go to step 6 and repeat the whole process for the accepted new clusters.
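To make the validation criterion concrete, the following is a minimal C# sketch of the WGSSE computation and the accept/discard test of step 6(g). The data layout (a cluster as a list of attribute vectors) and all identifiers are illustrative assumptions, not the paper's C code.

using System;
using System.Collections.Generic;

static class Wgsse
{
    // Within-group sum of squared error of one cluster:
    // the sum over all objects and attributes of (x_ij - mean_j)^2.
    public static double Of(List<double[]> cluster)
    {
        int m = cluster[0].Length;              // number of attributes M
        var mean = new double[m];
        foreach (var x in cluster)
            for (int j = 0; j < m; j++)
                mean[j] += x[j] / cluster.Count;

        double ss = 0.0;
        foreach (var x in cluster)
            for (int j = 0; j < m; j++)
            {
                double d = x[j] - mean[j];
                ss += d * d;
            }
        return ss;
    }

    // Validation rule of step 6(g): accept the split only if the
    // children's total WGSSE is smaller than the parent's WGSSE.
    public static bool AcceptSplit(List<double[]> parent,
                                   List<double[]> child1,
                                   List<double[]> child2)
        => Of(parent) > Of(child1) + Of(child2);
}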


IV. IMPLEMENTATION & RESULTS

The algorithm has been implemented in C using a synthetic data set with 10 dimensions. The data set consists of real values, both positive and negative, and is broadly similar to a gene expression data set. The program has different procedures for the various elements of the algorithm. For example, the procedure compute_mean_grandmean() computes the grand mean of the dataset, wgsse_cal() calculates the within-group sum of squares of a cluster, and computationss() generates the clusters. It is a top-down approach. The clusters that are generated are uniquely identified by a cluster number. The numbering of the clusters is done in such a way that the level of the cluster in the sub-tree can be found from the number itself. For example, 0 is the number assigned to the root of the tree, which is the cluster containing all the nodes of the dataset; 00 is assigned to the left sub-tree, 01 to the right sub-tree, and so on (the sketch below illustrates the scheme). Applying the algorithm to the given dataset, six clusters were finally found. The tree of the clusters generated is shown in Fig 1.
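The level-encoding numbering scheme described above can be sketched as follows; this is an illustrative reconstruction in C#, not the paper's C implementation, and the names are hypothetical.

// Cluster ids: the root is "0"; a split appends '0' for the left
// child and '1' for the right child, e.g. "0" -> "00" and "01".
static (string Left, string Right) SplitIds(string clusterId)
    => (clusterId + "0", clusterId + "1");

// The level of a cluster in the tree can be read off the id itself:
// "0" -> 0, "00" and "01" -> 1, "010" -> 2, and so on.
static int Level(string clusterId) => clusterId.Length - 1;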

0
00    01
000    001    010    011
0000    0001    0010    0011    0100    0101

Fig 1: The Tree of the Clusters Generated by AUTOCLUS

A small portion of the result of AUTOCLUS is given in Fig 2. The pair of centroids mentioned there are those which will be considered for the next decomposition of the cluster.

Fig 2: Results from AUTOCLUS implementation (a sample listing of the objects in Cluster No. 0 and Cluster No. 00 together with their centroids; the full listing is not reproduced here)

V. CONCLUSION

Partition based clustering algorithms face the problem that the number of partitions to be generated has to be entered by the user; algorithms of the K-means family in particular face this problem. As a result the clusters formed may not be up to the mark, because it is difficult for a user to select the appropriate number of clusters without sound domain knowledge beforehand. Hierarchical methods, in turn, suffer from the fact that once a step (merge/split) is done, it can never be undone. This algorithm overcomes that problem, but it increases the computation cost because of the cluster validity process. In the development of the AUTOCLUS algorithm, an attempt has been made to minimize these drawbacks as much as possible. From the experiments it has been found that the algorithm works properly with minimum user interaction. The number of clusters to be generated need not be entered beforehand, and in the cluster validation phase the algorithm can automatically accept or discard clusters based on the criterion function. The algorithm has been tested with datasets of varying size and found to give satisfactory results.

VI. REFERENCES

1) Yi Jiang, "Efficient Classification Method for Large Dataset," School of Informatics, Guangdong Univ. of Foreign Studies, Guangzhou.
2) Alexander Hinneburg and Daniel A. Keim, "Clustering Techniques for Large Data Sets: From the Past to the Future."
3) A. K. Jain (Michigan State University), M. N. Murty (Indian Institute of Science) and P. J. Flynn (The Ohio State University), "Data Clustering: A Review."
4) Lourdes Perez, "Data Clustering," Student Papers, University of California San Diego, http://cseweb.ucsd.edu/~paturi/cse91/papers.html
5) Raza Ali, Usman Ghani, and Aasim Saeed, "Data Clustering and Its Applications," http://members.tripod.com/asimsaeed/paper.htm
6) Pavel Berkhin, "Survey of Clustering Data Mining Techniques," Accrue Software Inc., San Jose, CA, 2002.
7) J. B. McQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, Univ. of California Press, Berkeley, 1967.
8) A. K. Jain, M. N. Murty, and P. J. Flynn, "Data Clustering: A Review."
9) F. Murtagh, "A Survey of Recent Advances in Hierarchical Clustering Algorithms," The Computer Journal, vol. 26, no. 4, pp. 354-359.
10) J. C. Dunn, "A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters," J. Cybernet., 1974, 3: 32.
11) J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1981.

Implementing Search Engine Optimization Technique to Dynamic / Model View Controller Web Application

R. Vadivel1 Dr. K. Baskaran2

Abstract- The main objective of this paper is to implement search engine optimization for dynamic / Model View Controller web applications. SEO concepts can be applied to both static and dynamic web applications. There is no particular difficulty in creating SEO content for a static web application (one whose content does not change until the site is re-hosted) while keeping to the SEO rules and conventions, but dynamic content poses a few significant challenges. These challenges must be overcome to obtain a fully functional dynamic site that is optimized as much as a static site can be. The goal is that whatever users search for, they get their information quickly. To that end, we use several search engine optimization methods for dynamic web applications, such as user friendly URLs, a URL redirector and HTML generic pages, together with other SEO methods and concepts such as a crawler, an index (or catalog), a search interface, search engine algorithms and page rank algorithms. Both internal and external elements of a site affect the way it is ranked in any given search engine, so all of these elements should be taken into consideration.
Keywords- Search Engine Optimization (SEO), Model View Controller (MVC), Dynamic web, Friendly URLs, ASP.NET

I. INTRODUCTION

If we have a website, we definitely need it to be friendly to search engines. There are several ways to attract visitors to our website, but in order to make searchers aware of it, the search engine is the tool through which we need to prove our content. If we just have static HTML content, there is not much of a problem in promoting it. But in today's world of content managed websites and eCommerce portals we need to look further and implement a few more techniques in order to make the site more prominent to robots. In this article we will discuss how to develop an SEO friendly website where the content is driven from a database with a content management system developed using ASP.NET. We will learn to build a simple CMS driven site with no-nonsense URLs, which search engines welcome.

Search Engine Optimization (SEO) is often considered the more technical part of Web marketing. This is true because SEO does help in the promotion of sites, and at the same time it requires some technical knowledge, at least familiarity with basic HTML. SEO is sometimes also called SEO copywriting because most of the techniques that are used to promote sites in search engines deal with text. Generally, SEO can be defined as the activity of optimizing Web pages or whole sites in order to make them more search engine friendly, thus getting higher positions in search results.

Search Engine Optimization (SEO) is a very popular term in the web application industry. SEO concepts can be applied to both static and dynamic web applications. Applying SEO to a static web application is straightforward: we simply follow the SEO rules and conditions. Applying it to a dynamic / MVC web application is somewhat more complicated and requires a few tricks. The specific objective here is to implement search engine optimization for model 1 and model 2 / Model View Controller (MVC) dynamic web applications. There is no prescribed web technology for dynamic web applications; one can use Microsoft software or that of any other vendor. In this work, .NET has played the major role.

To understand dynamic content, it is important to have an idea of its opposite, static content. The term static content refers to web content that is generated without using a data source such as a database. Essentially, the site viewer sees exactly what is coded in the web page's HTML. With dynamic pages, a site can display the same address for every visitor, yet have totally unique content for each one to view. For example, when I visit the social networking site Facebook (facebook.com), I see http://www.facebook.com/home.php as the address in my web browser, but I see a unique page that is different from what anyone else sees if they view that page at the same time. The site shows information about my friends in my account, and different information for each person in his account, or for someone who has no account. Not all dynamically generated content is unique to every viewer, but all dynamic content comes from a data source, whether it is a database or another source, such as an XML file.

A. SEO in web applications

Web applications play a most important role in online business.
______
About-1 Computer Science, Karpagam University, Pollachi Road, Eachanari, Coimbatore, Tamilnadu, India 641 024 (e-mail: [email protected])
About-2 Asst. Professor (RD), Dept. of CSE and IT, Govt. College of Technology, Coimbatore - 641 006


Millions of static and dynamic web pages are available on the internet, and millions of users use those pages to find the information they require. In these circumstances, search engine optimization plays the most important role between users and web applications. Among the millions of available pages, each user has specific search criteria: business people search for their own needs, students for theirs, and so on. Our aim is that whatever users search for, they get their information quickly. For that we use several search engine optimization methods and concepts, such as a crawler, an index (or catalog), a search interface, search engine algorithms and page rank algorithms.

Search engines take advantage of reverse broadcast networks to help save you time and money. Search allows you to "sell what your customers want, when they want it!". Search Engine Optimization is the science of customizing elements of your web site to achieve the best possible search engine ranking. That is really all there is to search engine optimization. But as simple as it sounds, don't let it fool you: both internal and external elements of the site affect the way it is ranked in any given search engine, so all of these elements should be taken into consideration. Good Search Engine Optimization can be very difficult to achieve, and great Search Engine Optimization seems pretty well impossible at times. Optimization involves making pages readable to search engines and emphasizing key topics related to your content. Basic optimization may involve nothing more than ensuring that a site does not unnecessarily become part of the invisible Web (the portion of the Web not accessible through Web search engines).

II. EXISTING SYSTEM

Previously, SEO has been implemented in static commercial / non-commercial web sites. In such implementations there is no dynamically generated site map, no well-defined RSS feed, and no specific way to find the back links.

A. Dirty URLs

Complex, hard-to-read URLs are often dubbed dirty URLs because they tend to be littered with punctuation and identifiers that are at best irrelevant to the ordinary user. URLs such as http://www.example.com/cgi-bin/gen.pl?id=4&view=basic are commonplace in today's dynamic web. Unfortunately, dirty URLs have a variety of troubling aspects, including the following.

B. Dirty URLs are difficult to type

The length, use of punctuation, and complexity of these URLs make typos commonplace.

C. Dirty URLs do not promote usability

Because dirty URLs are long and complex, they are difficult to repeat or remember, and they provide few clues for average users as to what a particular resource actually contains or the function it performs.

D. Dirty URLs are a security risk

The query string which follows the question mark (?) in a dirty URL is often modified by hackers in an attempt to perform a front door attack on a web application. The very file extensions used in complex URLs, such as .asp, .jsp, .pl, and so on, also give away valuable information about the implementation of a dynamic web site that a potential hacker may utilize.

E. Dirty URLs impede abstraction and maintainability

Because dirty URLs generally expose the technology used (via the file extension) and the parameters used (via the query string), they do not promote abstraction. Instead of hiding such implementation details, dirty URLs expose the underlying "wiring" of a site. As a result, changing from one technology to another is a difficult and painful process filled with the potential for broken links and numerous required redirects.

III. RELATED WORKS

Three technologies have been used: 1. user friendly URLs, 2. a URL redirector, and 3. HTML generic pages. A Model View Controller structure has been used in a Microsoft .NET web application with ASP.NET and C#. The application keeps the data model and the business layer in a separate module, like a DLL (Dynamic Link Library), and dynamic URLs are created and converted into static URLs. Fig 5 - Fig 8 show the implementation of these technologies. In the URL converting code, we must first grab the incoming URL and split off the extension of the page. Pages that have a ".html" extension are redirected to the related ".aspx" page; in the code-behind, the business logic, data manipulation, or whatever functionality is needed is executed, and the end user is shown the exact content for that particular page with the proper meta description and keywords. Throughout, the user only ever sees the ".html" page, while all the logic executes in the code-behind (a hedged sketch of this rewrite follows).
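As a hedged illustration of this conversion (the actual implementation is shown in Fig 5 - Fig 8 and is not reproduced here), a minimal ASP.NET handler in Global.asax might intercept each request and map the friendly ".html" URL onto the ".aspx" page that holds the code-behind logic; the page and parameter names below are assumptions.

// Global.asax.cs - rewrite friendly ".html" URLs to the ".aspx" pages
// that execute the business logic. Visitors only ever see ".html".
protected void Application_BeginRequest(object sender, EventArgs e)
{
    string path = Request.Url.AbsolutePath;
    if (path.EndsWith(".html", StringComparison.OrdinalIgnoreCase))
    {
        // e.g. /articles/seo-basics.html -> /Article.aspx?page=seo-basics
        string page = System.IO.Path.GetFileNameWithoutExtension(path);
        Context.RewritePath("~/Article.aspx?page=" + Server.UrlEncode(page));
    }
}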

A. Dynamic Content and SEO

SEO for dynamic content poses a few significant challenges. Luckily, there are ways to overcome these challenges and obtain a fully functional dynamic site that is optimized as much as a static site can be. This section discusses the pitfalls of dynamic sites, and how to overcome them to create fully optimized dynamic sites.

B. Challenges for Optimizing Dynamic Content

Here are some common areas of dynamic sites that create setbacks for humans as well as search engine spiders.

1) Dynamic URLs

A dynamic URL is the address of a dynamic web page, as opposed to a static URL, which is the address of a static web page. Dynamic URLs are typically fairly cryptic in appearance. Here is an example from http://www.financialadvisormatch.com/ for a product called Kindle:

http://www.financialadvisormatch.com/article/product/B000FI73MA/ref=amb_link_7646122_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=1FYB35NGH8MSMESECBX7&pf_rd_t=101&pf_rd_p=450995701&pf_rd_i=507846

Notice that the URL does not contain any information about the item's product type, or anything about the item's name. For a well-trusted site like Amazon, this is not a problem at all. But for a new site, or for a site that is still gaining credibility and popularity, a better solution can help search results by showing a searcher some relevant keywords in the page's URL. Here is an example of something a little more effective:

http://www.financialadvisormatch.com/article/products/electronics/kindle/

While search engines may not have problems indexing URLs with variables, it is important to note that highly descriptive URLs like the one just shown can get more clicks in searches than cryptic URLs, if searchers can clearly see keywords related to the content they are looking for in the page's URL.

2) Logins and other forms

Login forms can restrict access to pages not only for users, but also for search engines. In some cases, you want pages behind logins to be searchable. In those cases, you can place code in those pages that determines whether the person visiting has access to view that content, and decide what to do from there (a sketch of such a check follows Fig 1 below).

Fig - 1 Login and search engine validations
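Fig 1 suggests a per-page access check of the kind described under "Logins and other forms" above. A hedged sketch of such a check in an ASP.NET code-behind is given below; the panel controls and the idea of serving a public, indexable summary to anonymous visitors (including crawlers) are assumptions, not the paper's code.

// Page_Load of a content page: authenticated visitors get the full
// article, everyone else (including search engine spiders) gets an
// indexable public summary instead of being blocked outright.
protected void Page_Load(object sender, EventArgs e)
{
    bool loggedIn = User.Identity.IsAuthenticated;
    FullArticlePanel.Visible = loggedIn;   // restricted content
    SummaryPanel.Visible = !loggedIn;      // public, crawlable teaser
}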

Other web forms, referring to content in form tags, can restrict access to pages as well. While Google has revealed that googlebot can crawl through simple HTML forms (see http://googlewebmastercentral.blogspot.com/2008/04/crawling-through-html-forms.html), not all search engines follow this same process, which means content hidden behind forms may or may not be indexed.

3) Cookies

Web cookies are small bits of data that are stored in a user's web browser. Cookies are used frequently on the Web for storing temporary data like shopping cart information or user preferences. Pages that require cookies can block spiders, because spiders do not store cookies as web browsers do.

Fig - 2 Add cookies with help of a class library
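Fig 2 refers to writing cookies through the framework's class library; a minimal sketch of doing so from ASP.NET follows (the cookie name and lifetime are illustrative). Note the point made above: since spiders do not store cookies, a page should not require this value in order to render its content.

// Store a small piece of user state in a browser cookie.
var cart = new HttpCookie("cartId", Guid.NewGuid().ToString());
cart.Expires = DateTime.Now.AddDays(7);    // persist for a week
Response.Cookies.Add(cart);

// Read it back on a later request; crawlers typically send no cookies,
// so treat a missing cookie as a normal case.
HttpCookie c = Request.Cookies["cartId"];
string cartId = (c != null) ? c.Value : null;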


4) Session IDs

Fig - 3 Create a session id and store into database

Session IDs are similar to cookies in that if you need them to view pages, then spiders do not index those pages.

5) Hidden pages

Sometimes, pages on a website are hidden from search engines because they are buried too deep in a site's architecture. For example, a page more than three clicks deep from the home page of a website may not be crawled without an XML sitemap (a sketch of emitting one is given below). Other pages that may be hidden include pages only visible via a site search.
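A common remedy for deep or otherwise hidden pages is an XML sitemap. The following hedged C# sketch emits a minimal sitemap for a list of page URLs; the URL list and file location are placeholders, not part of the paper's implementation.

using System;
using System.Xml.Linq;

static XDocument BuildSitemap(string[] urls)
{
    XNamespace ns = "http://www.sitemaps.org/schemas/sitemap/0.9";
    var root = new XElement(ns + "urlset");
    foreach (string url in urls)
        root.Add(new XElement(ns + "url",
            new XElement(ns + "loc", url),
            new XElement(ns + "lastmod", DateTime.UtcNow.ToString("yyyy-MM-dd"))));
    return new XDocument(root);
}

// Usage (hypothetical): BuildSitemap(new[] {
//     "http://www.example.com/products/electronics/kindle/" })
//     .Save(Server.MapPath("~/sitemap.xml"));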


6) JavaScript

Fig - 4 Include JavaScript into a web form with a master page

Search engines don't index content that requires full-featured JavaScript. Remember that spiders view content in much the same way as you would if you were using a browser with JavaScript disabled. Text that is created using JavaScript, and is therefore only accessible with JavaScript enabled, will not be indexed.

C. Ways to Optimize Dynamic Content

Dynamic content is often necessary in websites. In addition, content that is easily changed through an outside data source helps keep a site's content fresh and relevant, which increases its value to search engines. You don't need to worry that, because your site is dynamic, your content won't be indexed. You just need to make sure you are following the appropriate guidelines when using dynamic content in order to keep your site optimized. Here are some things you can do to optimize sites that contain dynamic content.

7) Creating static URLs

Dynamic URLs, especially dynamic URLs with vague names, can be a turnoff to searchers. In order to have friendly URLs, you want to rewrite your dynamic URLs as static URLs (one way of doing this on ASP.NET is sketched below).
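On ASP.NET 4, one supported way to present static-looking, keyword-bearing URLs is page routing; the route pattern and page names in this sketch are assumptions.

// Global.asax.cs - map a friendly URL pattern onto a physical .aspx
// page, so visitors and crawlers see static-style URLs.
using System.Web.Routing;

void RegisterRoutes(RouteCollection routes)
{
    routes.MapPageRoute("product",
        "products/{category}/{name}",   // e.g. /products/electronics/kindle
        "~/Product.aspx");              // the page that renders the content
}

protected void Application_Start(object sender, EventArgs e)
{
    RegisterRoutes(RouteTable.Routes);
}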

Fig - 5 Creating / converting dynamic into static web application (Coding Part I)

Blogs powered by WordPress or Blogger make it easy to convert dynamic links to static links. Blogger automatically creates static URLs, and with WordPress you need only a simple change in your settings. For WordPress, log in to your administrator account, and then, under Settings, click the Permalink button. From there, you simply select a static URL publishing method or create a custom one and save the changes.

Fig - 6 Creating / converting dynamic into static web application (Coding Part II)

If your site isn't powered by a blogging application, you need to rewrite the URLs manually. The process is somewhat complex, and it requires modifying your .htaccess file. Because modifying your .htaccess file can cause permanent changes to your website, you want to either practice on a testing server or know exactly what you are doing before using these techniques on a production server. To test this process, you can download and install a testing server, and then download all or part of your website to your computer. That way, changes you make on your local computer don't affect your live site.


Fig – 7 Creating / converting dynamic into static web application (Coding Part III)


Fig - 8 Creating / converting dynamic into static web application (Coding Part IV)

8) Optimizing content hidden by forms

The fact that web forms can hide content can be a good thing, but sometimes forms hide content you may not want hidden. Login forms (forms that require a user name and password) can potentially block search engines if a login form is the only way to access that information. Of course, sometimes this feature is intentional, like for protecting bank account information on a banking site. For non-login forms, assuming that search engines index content that is accessible only by filling out text fields or other form elements is dangerous. Further, it is equally dangerous to assume that search engines don't index content that is accessible only via non-login forms. If you want your form's hidden content to be indexed, make sure to give access to it in ways other than through a form alone. If you don't want the content to be indexed, make sure to hide it from search engines via robots.txt, or some other method. Typically, content that is viewable only after a user is logged into an account isn't necessary to index. If you have content that you want indexed hidden in a login-only area, consider taking that content out of the restricted area so it can be indexed.

IV. RESULTS

Search engine optimization has been successfully implemented in the model view controller web application with the help of the technologies described above. A few mock-up screenshots are shown here.

Fig - 9 displays the "ASPX" page of the blog / forum / articles.
Fig - 10 shows a search for specific keywords in the Google search engine and the results for that "ASPX" blog / forum / articles.
Fig - 11 and Fig - 12 show that blog / forum / articles in HTML format.

Hence we have successfully implemented the Search Engine Optimization technique for a model view controller web application.


Fig – 9 ASPX page for article/forum/blog

Fig - 10 Search keywords to Google


Fig - 11 Search results from Google (Results - I)


Fig - 12 Search results from Google (Results - II)

V. CONCLUSIONS

Search Engine Optimization has been implemented in a dynamic web application, using an MVC web application and techniques such as a URL redirector, HTML generic pages, and .NET security tools. The proposed system implements multiple query searches and personalized concept-based clustering.

P a g e | 132 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology

Most of the tips presented here are fairly straightforward, with the partial exception of URL cleaning and rewriting. All of them can be accomplished with a reasonable amount of effort. The result of this effort should be cleaned URLs that are short, understandable, permanent, and devoid of implementation details. This should significantly improve the usability, maintainability and security of a web site. The potential objections that developers and administrators might have against next generation URLs will probably have to do with any performance problems they might encounter using server filters to implement them, or with issues involving search engine compatibility. As to the former, many of the required technologies are quite mature in the Apache world, and their newer IIS equivalents are usually explicitly modelled on the Apache exemplars, so that bodes well. As to the search engine concerns, fortunately, Google has so far not shown any issue at all with cleaned URLs. At this point, the main thing standing in the way of the adoption of next generation URLs is the simple fact that so few developers know they are possible, while some who do are too comfortable with the status quo to explore them in earnest. This is a pity, because while these improved URLs may not be the mythical URN-style keyword always promised to be just around the corner, they can substantially improve the web experience for both users and developers alike in the long run.

VI. REFERENCES

1) Kenneth Wai-Ting Leung, Wilfred Ng, and Dik Lun Lee, "Personalized Concept-Based Clustering of Search Engine Queries," IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 11, November 2008.
2) Aaron Matthew Wall, "Search Engine Optimization," June 2008.
3) Ernest Ackermann and Karen Hartman, "The Information Specialist's Guide to Searching & Researching on the Internet and World Wide Web," Fitzroy Dearborn Publishers, 1999.
4) R. Elmasri and S. B. Navathe, "Fundamentals of Database Systems," 2nd Edition, Menlo Park, CA: Addison-Wesley, 1994.
5) Jeff Ferguson, Brian Patterson, and Pierre Boutquin, "C# Bible," John Wiley and Sons, June 2002.
6) Wei Meng Lee, "C#.NET Web Developer's Guide," Syngress.
7) Jose Mojica, "C# Web Development for ASP.NET," Peachpit Press, March 2003.
8) http://www.macronimous.com/ (2009)
9) http://www.seochat.com/ (2009)
10) http://www.webtop.com.au/seo (2009)
11) http://www.seocompany.ca/seo/seo-techniques.html
12) http://searchengineland.com/21-essential-seo-tips-techniques-11580
13) http://msdn.microsoft.com/en-us/library/ms972974.aspx


Eye Detection in Video Images with Complex Background

M.Moorthi1 Dr. M.Arthanari2 M.Sivakumar3

Abstract- Detection of the human eye is a significant but difficult task. This paper presents an efficient eye detection approach for video images with complex backgrounds. The proposed method has two main phases for finding the eye pair: locating the face and eye region, and finding the eyes. In the first phase, a novel approach to fast locating of the face and eye region is developed. In the second phase, eye finding directed by knowledge is introduced in detail. Both phases were developed using Matlab 7.5. The proposed method is robust against moderate rotations, cluttered backgrounds, partial face occlusion and the wearing of glasses. We demonstrate the efficiency of the proposed method in detecting eyes against complex backgrounds, i.e., in both indoor and outdoor environments.
Keywords- Face recognition, Facial features extraction, Eye detection

I. INTRODUCTION

Detection of the eye is a crucial aspect of many useful applications, ranging from face recognition and face detection to human computer interface design, model based video coding, driver behavior analysis, compression techniques and automatic annotation of image databases. By locating the position of the eyes, the gaze can be determined. A large number of works have been published on this subject in the last decade, but their effectiveness is not satisfactory due to the complexity of the problem; an efficient and accurate method is yet to be found. Generally the detection of eyes is done in two phases: locating the face to extract the eye regions, and then eye detection. The face detection problem has been approached in different ways: neural networks, principal components, independent components, and skin color based methods. Recently, methods based on boosting have become the focus of active research. Eye detection is then performed in the face regions which have already been located [1], [2], [3]. Little research has been done, however, on the direct search for eyes in whole images. Some approaches are based on active techniques: they exploit the spectral properties of the pupil under near-IR illumination. For example, in [4] two near infrared multiple light sources synchronized with the camera frame rate are used to generate bright and dark pupil images. Pupils can be detected by using a simple threshold on the difference between the dark and the bright pupil images. In [5], iris geometrical information is used for determining a region candidate that contains an eye in the whole image, and then symmetry is used for selecting the pair of eyes. Although the detection rate is high, the assumption that the distance between the camera and the person does not change greatly limits its practical applicability.

In this paper, a knowledge-based algorithm for eye detection is presented. Knowledge guided eye contour searching effectively improves the system accuracy. So-called regional image processing techniques are also used, both in the preprocessing step and in the detection algorithm itself. They reduce the influence of illumination variations in the small region. Since the processing algorithms in the second step apply only to regions of small size, the efficiency of the system is improved.

This paper is organized as follows. A literature survey is given in section 2. Section 3 discusses the knowledge-based eye detection method in detail. Experimental results are reported in section 4. Conclusions are drawn in section 5.

II. LITERATURE SURVEY

Pitas et al. [8] use thresholding in HSV color space for skin color extraction. However, this technique is sensitive to illumination changes and race. Ahuja et al. [9] model human skin color as a Gaussian mixture and estimate the model parameters using the Expectation Maximization algorithm. Yang et al. [10] propose an adaptive bivariate Gaussian model to locate human faces. Baluja et al. [7] suggest a neural network based face detector with orientation normalization; approaches such as this require exhaustive training sets. Huang et al. [11] perform the task of eye detection using optimal wavelet packets for eye representation and radial basis functions for the subsequent classification of facial areas into eye and non-eye regions. Rosenfeld et al. [12] use filters based on Gabor wavelets to detect eyes in gray level images. Pitas et al. [8] adopt a similar approach using the vertical and horizontal relief for the detection of the eye pair, requiring pose normalization. Feng et al. [13] employ multiple cues for eye detection on gray images using the variance projection function; however, the variance projection function on an eye window is not very consistent. Campadelli et al. [6] propose binary template matching to find the feature image, searching for the two eyes; however, this method cannot deal with large out-of-plane face rotation because the structure of the eye region changes.
______
About-1 Assistant Professor, Department of Computer Applications, Kongu Arts and Science College, Erode - 638 107, Tamil Nadu, India, phone: 9842645643 (e-mail: [email protected])
About-2 Prof. & Head, Tejaa Sakthi Institute of Technology for Women, Coimbatore - 641 659 (e-mail: [email protected])
About-3 Doctoral Research Scholar, Anna University, Coimbatore (e-mail: [email protected])

III. PROPOSED METHOD

Detection of the human eye is a very difficult task because the contrast of the eye is very poor, and under such conditions a good edge image cannot be obtained. However, it is found that some eye marks have relatively much higher contrast, such as the boundary points between the eye white and the eyeball. Besides this, eyes also have good symmetry characteristics. These marks can be used as knowledge to find the eye. The proposed method has two main phases for finding the eye pair: locating the face and eye region, and finding the eyes. In the first phase, a novel approach to fast locating of the face and eye region is developed. In the second phase, eye finding directed by knowledge is introduced in detail. The proposed method is robust against moderate rotations, cluttered backgrounds, partial face occlusion and the wearing of glasses. Each step is discussed in the following subsections.

A. Face locating

Detecting the location of the human face in a scene is the first step in the face recognition system. In this step the region of the face candidate is roughly estimated using a histogram thresholding technique. To simplify segmentation we assume that there is only one face in the image to be located. The binary image B(x, y) consists of all active pixels, which include the eye features. Histogram smoothing and automatic thresholding techniques are employed at this stage to eliminate the noise in the image and select the threshold.

B. Eye region extraction

The purpose of this stage is to roughly extract the eye region, which encloses the two eyes, from the face. The subsequent eye detection algorithm is then applied only to this region, which improves the efficiency of the system. The eye region extraction has the following steps:
1) Find the hair region in the binary image.
2) Identify the lower boundary of the hair region. The left and right ending points are denoted by ledge and redge, respectively. The eye region enclosed by ledge and redge is called E.
3) Find a pair of dark areas in E that may represent the locations of the eyes. This pair of dark areas should satisfy the following conditions:
i. The eyes are situated on a line that is parallel to the line joining ledge and redge.
ii. The eyes are symmetric with respect to the perpendicular bisector of that line.
iii. The eyes are situated below the eyebrows.
The eye region can then be extracted from the image.

C. Eye detection

Since part of the eye information may have been lost in the image processed in the first step, the original eye region image is used at this second stage. It can be obtained by applying the eye region coordinates extracted in sections 3.1 and 3.2 to the original face image. Eye detection has the following two steps.

D. Preprocessing

In an ordinary face image, the contrast of the eye region is usually relatively weak. The Laplacian operator is used at this stage to enhance the contrast at edges. As this preprocessing is applied directly to the eye region, based on the image conditions in it, the edge information becomes more prominent.

E. Knowledge-oriented edge detection

The eye region image processed by the Laplacian operator is sensitive to edges. Automatic thresholding of this image conserves most of the edge information. Edge detection has the following steps:
1) Locate big dark areas as the iris candidates using the following properties of a right eye pattern:
i. the two dark areas have similar area;
ii. the line passing through the centres of the two dark areas is approximately parallel to the image x axis; and
iii. the two dark areas are ellipse shaped.
2) Find the top and bottom points of each iris. Let these points be called (topx, topy) and (bottomx, bottomy), respectively.
3) Find the upper eyelid, starting from (topx, topy) and moving towards the left and right parts of the eyelid respectively, using slope calculation. Apply the following knowledge to determine whether the last point is a corner point: the distance between the two corners is larger than that between the points (topx, topy) and (bottomx, bottomy), and the two corners are not lower than (bottomx, bottomy).
4) Find the lower eyelid. Illumination variation usually has a greater effect on the lower eyelids than on the upper eyelids, which makes the above algorithm less effective for detecting the lower eyelids. However, as the eye corners and the points (bottomx, bottomy) are known, a parabola passing through the three points of each eye can be found to approximate each lower eyelid (a sketch of this fit follows).

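The lower-eyelid step fits a parabola y = ax^2 + bx + c through three known points (the two eye corners and the iris bottom point). As a hedged illustration (the paper's own implementation is in Matlab and is not reproduced here), the three coefficients can be computed in closed form from the Lagrange interpolation formula, assuming the three x coordinates are distinct:

using System;

// Fit y = a*x^2 + b*x + c through three points, e.g. the two eye
// corners and (bottomx, bottomy), to approximate the lower eyelid.
static (double a, double b, double c) FitParabola(
    (double x, double y) p1, (double x, double y) p2, (double x, double y) p3)
{
    double d1 = (p1.x - p2.x) * (p1.x - p3.x);
    double d2 = (p2.x - p1.x) * (p2.x - p3.x);
    double d3 = (p3.x - p1.x) * (p3.x - p2.x);

    double a = p1.y / d1 + p2.y / d2 + p3.y / d3;
    double b = -(p1.y * (p2.x + p3.x) / d1
               + p2.y * (p1.x + p3.x) / d2
               + p3.y * (p1.x + p2.x) / d3);
    double c = p1.y * p2.x * p3.x / d1
             + p2.y * p1.x * p3.x / d2
             + p3.y * p1.x * p2.x / d3;
    return (a, b, c);
}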

IV. EXPERIMENTAL RESULTS

The proposed method was tested on real video images. Video images of 480 x 640 pixels of 75 different test persons were recorded during several sessions at different places. This set features a large variety of illumination, background and face size, and it stresses real world constraints, so it is believed to be more difficult than other datasets containing images with uniform illumination and background. The eye pair is selected successfully in most cases, no matter whether the face patterns differ in scale, expression or illumination conditions. The eye location rate is 93.3%. Typical results of eye detection with the proposed approach are shown in Fig 2 and Fig 3. The input images vary greatly in background, scale, expression and illumination, and also include partial face occlusions and the wearing of glasses. The correct judgment rate testing is shown in Fig 1 and the results in Fig 2.

Fig 1: Correct judgment rate testing

Fig 2: Real image captured from video using a web camera. (a) Original image. (b) Binary image. (c) Detection of binary eye. (d) Both eyes. (e) Left eye. (f) Right eye.

Fig 3: Results using a digital camera. (a) Original image. (b) Both eyes. (c) Left eye. (d) Right eye.

V. CONCLUSION

In this paper, an efficient method for detecting eyes in video images with unconstrained backgrounds is presented. To obtain the eyes, preprocessing is applied to the input images: homomorphic filtering is applied to enhance the contrast of dark regions, so that facial images with poor contrast are enhanced. Eye pairs are extracted using the knowledge oriented eye detection technique. The proposed method can deal with the wearing of glasses and partial face occlusions; however, the eye detection will fail if the reflection from the glasses is too strong, in which case the eyes cannot be extracted. Closed eyes do not influence the results of eye location. The advantage of this method is that its computational cost is very low. The eye form is then searched for based on the knowledge. Regional image processing techniques are also used in this paper to enhance the edge details; they improve the reliability of the system. The whole system has been successfully applied to eye form searching and the results are promising.

VI. REFERENCES

1) T. Kawaguchi and M. Rizon, "Iris detection using intensity and edge information," Pattern Recognition, vol. 36, pp. 549-562, 2003.
2) S. Baskan, M. Bulut and V. Atalay, "Projection based method for segmentation of human face and its evaluation," Pattern Recognition Letters, vol. 23, pp. 1623-1629, 2002.
3) S. Sirohey, A. Rosenfeld, and Z. Duric, "A method of detecting and tracking irises and eyelids in video," Pattern Recognition, vol. 35, pp. 1389-1401, 2002.
4) A. Haro, M. Flickner, and I. Essa, "Detecting and tracking eyes by using their physiological properties, dynamics, and appearance," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, pp. 163-168, 2000.
5) T. D'Orazio, M. Leo, G. Cicirelli, and A. Distante, "An algorithm for real time eye detection in face images," in Proc. 17th Int. Conf. on Pattern Recognition, vol. 3, pp. 278-281, 2004.
6) P. Campadelli and R. Lanzarotti, "Localization of facial features and fiducial points," in Proc. Int. Conf. Visualization, Imaging and Image Processing, pp. 491-495, 2002.
7) H. Rowley, S. Baluja, and T. Kanade, "Neural Network Based Face Detection," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, CA, 1996, pp. 203-207.
8) K. Sobottka and I. Pitas, "A Novel Method for Automatic Face Segmentation, Facial Feature Extraction and Tracking," Signal Processing: Image Communication, 12(3), 1998, pp. 263-281.
9) Ming-Hsuan Yang and Narendra Ahuja, "Detecting Human Faces in Color Images," Proceedings of the IEEE Int'l Conf. on Image Processing, Chicago, IL, 1998, pp. 127-130.
10) Jie Yang, Weier Lu and Alex Waibel, "Skin Color Modeling and Adaptation," Proceedings of ACCV'98 (tech report CMU-CS-97-146, CS Dept., CMU, 1997).
11) Jeffrey Huang and Harry Wechsler, "Eye Detection Using Optimal Wavelet Packets and Radial Basis Functions," International Journal of Pattern Recognition and Artificial Intelligence, vol. 13, no. 7, 1999.
12) S. A. Sirohey and Azriel Rosenfeld, "Eye detection in a face image using linear and nonlinear filters," Pattern Recognition, vol. 34, 2001, pp. 1367-1391.
13) Guo Can Feng and Pong C. Yuen, "Multi-cues eye detection on gray intensity image," Pattern Recognition, vol. 34, 2001, pp. 1033-1046.



Multi-Layer User Authentication Approach For Electronic Business Using Biometrics

Machha. Narendar1 M.Mohan Rao2 M.Y.Babu3

Abstract-With the increased awareness of the dangers of heavily upon single authentication using password-based cyberterrorism and identity fraud, the security of information access control because of its low implementation cost and systems has been propelled to the forefront as organizations end-user convenience. In the case of Internet web sites that strive to implement greater access control. Access control for require user authentication to provide personalized information systems is founded upon reliable user information or sensitive data, there does not appear to be a authentication, and the predominant form of authentication remains the username-password combination. There are cost-effective alternative to the use of the username- several vulnerabilities associated with an over-reliance upon password pair for access control. While major corporations this method of authentication, stemming from weaknesses in have the financial ability to augment the security of their both the user construction of passwords and single-layer intranets and extranets through the use of smart cards and authentication techniques. The growing number of Internet token devices, it is impractical andw far too costly for business transactions highlights the need for a more secure business-to-consumer web sites to deploy this same method of user authentication that is cost-effective as well as technology. Such an effort would require the manufacture practical. A multi-layer user authentication scheme is proposed and shipment of a special token or smart card for every as a solution, incorporating the use of dynamic biometric data registrant on every web site.e It is in this context of Internet selection and a timestamp to guard against reuse of intercepted authentication bit streams. transactions and electronic commerce (ecommerce) where the issue of i information system security will be I. INTRODUCTION explored. The purpose of this study is to analyze in detail omputer-based information systems are becoming the inherent flaws in password-based user authentication and C increasingly vital to the su ccess of many businesses. propose a solution to address how managers can effectively As organizations move to adapt to the digital era, the increase informationV system security and maintain access quantity of confidential and sensitive data stored in control through an improvement in the user authentication electronic form on computers continues to increase at a schemes at the root of current information security policy blinding pace. The growing trend in customer relationship II. VULNERABILITIES OF PASSWORD-BASED USER management and the need to offer personalization through AUTHENTICATION customized information further adds to the wealth of data Information security is concerned with enabling authorized contained in these information systems. Since the y users to access appropriate resources and denying such compromise of the sensitive data stored in information access to unauthorized users. The ability to authenticate systems can be disastrous for an organization, the security of l valid users is therefore at the foundation of any access these systems is of utmost concern. Given that "practically control system. Passwords have long been the access control every penetration of a computer system, at some stage, relies method of choice for most organizations, in part because of on the ability to compromise a password‖ it seems rather r their low implementation cost and convenience to users. 
surprising that passwords continue to form the basis of user However, password-based authentication schemes have authentication methods. In the password-based scheme, severalvulnerabilities users of the information system are required to enter a 1. Weaknesses in Password Construction username password combination.a The username establishes 2. Ease of Password Compromise the identity of an individual as a valid user, and the 3.Alternatives to Password-Based Authentication password serves to confirm that identity and provide access When juxtaposed with the tendency of individuals to reuse to authorized resources. Passwords are inherently weak the same usernamepassword combinations across several security constructs dueE to the ability of hackers to guess web sites, these factors reveal the lack of security offered them through brute-force methods enabled by the processing by information systems that grant access based solely on the power of today's computers. Despite their knowledge of the input of a valid username-password combination. vulnerability posed by password-based user authentication systems, information systems managers continue to rely A. Weaknesses in Password Construction .______often contributes to its weakness as a security mechanism.

About-1 Assistant Professor, HITS College of Engineering ([email protected])
About-2 Assistant Professor, Tirumala Engineering College ([email protected])
About-3 Assistant Professor, Aurora Engg College ([email protected])

When passwords are generated by computer, they are generally more secure at the expense of being harder to remember. User-selected passwords, on the other hand, have the added benefit of being easier to recall, but are usually less secure in their construction. Case studies of real users over the past two decades reveal that the characteristics of user-selected passwords have not changed significantly in response to publicity of information security breaches. According to a study by Morris and Thompson (1979), more than 85 percent of user passwords were dictionary words, words spelled backwards, names of people or places, or a sequence of numbers. Zviran and Haga (1999) further investigated the characteristics of user passwords and examined issues such as password length, character composition, reset frequency, and the use of personal information as passwords. They conducted a survey of a sample of computer users by using an anonymous questionnaire that inquired about the users' password features. According to the study, the majority of passwords contained either five or six characters. Furthermore, their survey revealed the following:
- 71.9% of respondents' passwords were between one and six characters
- 80.1% of passwords were composed of strictly alphabetic characters
- 79.6% of respondents never changed their passwords
- 65.2% of respondents had used personal information in their password
- 35.3% of users wrote down their passwords nearby to remember them
Computer users tend to choose passwords that are short, constant, and based on their surroundings or personal information. Studies of user-selected passwords highlight that the common features of the passwords have remained fairly consistent over time. The underlying reason for this consistency is a matter of convenience for the end-users, who choose a password that will be simple and easy for them to remember. Hackers are undoubtedly aware of this trend and can subsequently focus their efforts to guess passwords.

B. Ease of Password Compromise

The second vulnerability of the password-based user authentication scheme is the ease with which hackers can obtain these passwords. The common characteristics of the overwhelming majority of passwords are the primary reason that passwords are easily compromised. Knowing that most passwords are short, alphabetic or alphanumeric strings of characters, hackers can run brute-force attacks that use dictionary words, proper names, and numbers to gain illegal access to information systems. Passwords can also be obtained by intercepting their transmission over communications channels, a technique known as "password sniffing". Given the findings of Zviran and Haga (1999) that over a third of users write down their passwords nearby, a password in this case "is no longer something to be guessed but becomes something to be located". If an observer is able to see the user's password written down somewhere, or happens to see the user type the password, then the security of the password is subsequently compromised. Furthermore, despite creating passwords that are easy to remember, computer users are often faced with the burdensome task of remembering many passwords. As a result, users place repeated calls to help desks to obtain their password or to have it reset. An intruder with knowledge of a valid username can just as easily contact the help desk and obtain a working password, enabling him to then log in using the valid username-password combination.
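The construction weaknesses catalogued above lend themselves to a simple automated screen. The following sketch, in the spirit of the proactive password checking of Bishop and Klein (1995), rejects candidate passwords that match the patterns the surveys found most common; the length threshold and the dictionary source are illustrative assumptions, not part of any cited study.

    import java.util.Set;

    // A minimal proactive password checker (illustrative assumptions:
    // the 8-character threshold and the dictionary contents).
    public class PasswordChecker {
        private final Set<String> dictionary; // common words, names, places

        public PasswordChecker(Set<String> dictionary) {
            this.dictionary = dictionary;
        }

        public boolean isWeak(String password) {
            String p = password.toLowerCase();
            // Short passwords: the surveys above found most were 5-6 characters.
            if (p.length() < 8) return true;
            // Strictly alphabetic strings fall quickly to dictionary attacks.
            if (p.chars().allMatch(Character::isLetter)) return true;
            // Dictionary words, or dictionary words spelled backwards.
            if (dictionary.contains(p)) return true;
            if (dictionary.contains(new StringBuilder(p).reverse().toString())) return true;
            return false;
        }
    }

Such a check codifies exactly the attacker's first guesses, so rejecting what it flags removes the cheapest brute-force targets.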

C. Alternatives to Password-Based Authentication

The weaknesses associated with password-based access control can be overcome by taking advantage of alternative methods of user authentication. According to Liu and Silverman (2001), the following three categories can be used for authentication purposes:
1. Knowledge, "something you know"
2. Possession, "something you have"
3. Biometric, "something you are"
The first category includes information known by an individual, such as a password or personal identification number (PIN). The second category consists of an identifiable physical object that an individual possesses, such as a smart card, token, or identification badge. The third category, known as biometrics, includes biological attributes specific to an individual and involves an identifier that cannot be misplaced or forgotten (Liu and Silverman, 2001). Biometric devices use physical, biological features to identify and verify individuals. A biometric can take many forms; in fact, any physical or behavioral characteristic of a person that can serve to uniquely identify that person can be considered a biometric. Examples include a fingerprint, hand geometric pattern, iris, retina, voice, signature, facial pattern, or even DNA (Liu and Silverman, 2001). Commonly accepted as evidence in law enforcement, a biometric is considered by many people to be the most secure method of uniquely identifying individuals.

Regardless of the specific type of biometric used, the first step in implementing a biometric-based authentication system is generally to store a template of the biometric data of a user. Upon a subsequent attempt to access the information system, the user must present the required biometric identifier to a scanning or recording device, which then compares the input to the database of templates for a potential match. The need to store a biometric template for each user is, unfortunately, one of the major drawbacks of biometric-based authentication systems. Scalability becomes a major hurdle because the amount of time it takes a system to verify an individual increases significantly as more templates are added and checked against every input. Bandwidth likewise becomes a significant issue since the digitized data from biometric inputs can be quite large.

Biometric-based authentication is considered to be extremely reliable, but like any authentication system, it is not foolproof. It is necessary for information systems managers to decide upon an acceptable balance between the false acceptance rate (FAR) and the false rejection rate (FRR) of a biometric system. The false acceptance rate is a measure of what percentage of unauthorized users are granted access to the system due to similarities between the user and a stored template that are close enough to be considered a match. The false rejection rate, on the other hand, is a measure of the percentage of authorized users who are refused access, generally due to extraneous factors such as lighting or cleanliness variations of the subject. Higher success rates of user identification may come at the expense of other disadvantages, however. While some biometric systems can read a user's input through a non-invasive method (such as signature recognition), other methods such as retinal scanning may seem somewhat uncomfortable for users. Furthermore, in the unlikely but possible event that biometric data is compromised, new biological features cannot be distributed to an individual as easily as passwords can be reset. With their personal biometric data stored on servers, users are faced with the fear that their data will be compromised or their privacy violated. And finally, the financial cost of such a system cannot be ignored. Biometric systems are perhaps the most costly method of authentication to implement. Nonetheless, the additional security they provide must be factored into an analysis of their practicality and usefulness.

User authentication controls, whether based on passwords, tokens, cards, or biometrics, provide a layer of security to information systems. However, each of these access control methods relies upon a single layer of authentication and can be compromised in a single step. The security of information systems can be increased through a technique called double authentication, which relies upon a combination of methods to perform user authentication and verification. This technique is much more secure than access control based on single-layer authentication because even if one form of authentication is compromised, there is an additional check in place to prevent unauthorized access to information system resources. Double authentication can take many forms. Several of these dual-layer systems have been in existence for quite some time, while other combinations are just emerging. Perhaps the most well-known double authentication technique is the use of a magnetic card along with a PIN known by the user. Banks have been using this dual layer of authentication at automated teller machines (ATMs) and with debit cards. This system combines something that the user possesses (the magnetic card) with something that the user knows (the PIN), thereby providing two layers of authentication before granting access to finances. Even in the event that an individual's debit card is stolen, the thief must also know the user's PIN for the card to be of any use.

The combination of a PIN and a possession is also utilized by most token-based access control systems. Instead of a magnetic card, users possess a token that displays a dynamic numeric access code. This code must be entered along with the user's PIN in order to gain access to the system. Token-based systems involving a token and PIN combination are slightly more secure than a magnetic card and PIN combination because the former also makes use of a timestamp to prevent password re-use. A timestamp is a date and time associated with the moment a password is entered by the user, and it must fall within the time window during which the server considers that particular password still valid. If a hacker, for example, knows the account number on a user's magnetic card and knows the user's PIN, he is able to compromise the system. On the other hand, even if a hacker is able to determine what number a user entered from the token and what number the user entered as a PIN, the hacker is unable to use that information to subsequently compromise the system, since the number on the token will have changed; a sketch of this time-window check follows.
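This is a minimal illustration of the time-window mechanism, not any vendor's algorithm: a six-digit code is derived from a per-token secret and the current minute, so a code captured by an observer stops working when the window rolls over. The window length and code derivation are assumptions for illustration.

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.nio.ByteBuffer;

    public class TokenVerifier {
        private static final long WINDOW_MS = 60_000; // codes change every minute

        // Derive the code for a given time window from a per-token secret key.
        static int codeFor(byte[] secret, long windowIndex) throws Exception {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(secret, "HmacSHA1"));
            byte[] h = mac.doFinal(ByteBuffer.allocate(8).putLong(windowIndex).array());
            return Math.abs(ByteBuffer.wrap(h).getInt()) % 1_000_000; // 6 digits
        }

        // Accept the code only for the current window (or the previous one,
        // to tolerate clock skew); an old intercepted code is useless.
        static boolean verify(byte[] secret, int submittedCode, long nowMs) throws Exception {
            long window = nowMs / WINDOW_MS;
            return submittedCode == codeFor(secret, window)
                || submittedCode == codeFor(secret, window - 1);
        }
    }

The PIN would still be checked separately, so the attacker needs the secret-bearing token, the PIN, and a fresh window all at once.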

Biometrics can even be used in conjunction with other forms of authentication to provide a greater degree of reliability. "Hybrid technology" has emerged that encodes a user's biometric template on a smart card/sensor device. This device enables users to scan their fingerprint directly on a portable card, which compares the scanned information with the biometric data stored on the card. While biometrics offer increased security over other methods of authentication, this device can still be stolen and, with it, an individual's biometric data. As mentioned earlier, the consequences of compromising a full biometric template are severe, since an individual only possesses one set of fingerprints for life. In addition, the technology already exists to lift a fingerprint from a surface and manufacture a false finger capable of fooling some biometric sensors. As a result, new biometric sensors are emerging that can detect the presence of a live finger by reading a pulse.

The biometric smart card is nonetheless a step in the right direction. Because the biometric data is stored on the device itself, it is much faster than conventional biometric systems that store a user's biometric template on a server. When biometric data is stored on a central server, it takes a significant amount of time for the system to search every template in its database to find a match. As a result, some companies, including MasterCard, have implemented a system in which the user first indicates his or her name (which does not add much security), and that individual's biometric template is then retrieved from the database. The individual then submits to a biometric scan that is compared with the template pulled from the database. Using this method, the system does not have to compare a scan with every template in the database.

Of all the forms of user authentication, biometrics is considered to be the most difficult to compromise. The combination of biometrics with alternative forms of authentication therefore seems to provide the most secure method of access control.

III. PROPOSED SOLUTION TO USER AUTHENTICATION FOR E-BUSINESS APPLICATIONS

The proposed solution to increase the security of Internet transactions in a cost-effective manner makes use of all three aforementioned categories of user authentication. This multi-layer authentication method combines the use of a PIN (something the user knows), a magnetic card (something the user has), and a biometric sample (something the user is). However, simply combining all three technologies will not solve the problem of secure user authentication unless the result is cheap, versatile, and accepted by users. The proposed security solution satisfies all three criteria. In order for this approach to be implemented, a small hardware investment is required on the user's end. Specifically, this proposal would require the use of a specialized keyboard that contains a USB card reader as well as a small fingerprint scanner. Such an approach would be very cost-effective because each web site wishing to conduct electronic business would not have to spend time and money to distribute tokens or smart cards to each user, and instead could take advantage of a technology that can be purchased once and used for all web sites.

The proposed solution will require each individual to have a username stored on a magnetic strip card. The username must be a unique string of both alphanumeric and symbolic characters with a recommended length of ten such characters, such as "m49K2g#%6L". This string must not contain as a substring the individual's name or social security number. Since the majority of Internet transactions take place with the use of a major credit card, it would seem logical to work with the credit card companies to add an individual's username to the data currently stored on the magnetic strip of a credit card. An individual would specify, at the time of applying for a credit card, whether he desired that specific credit card to contain his username, and the card-issuing company would then create a username for the cardholder and encode it on the card. The credit card-username technique would not add any significant cost to the distribution of usernames, would be portable (in the user's wallet), and would enable the user to maintain the same username across multiple sites, eliminating the need to remember multiple usernames.

Secondly, the user would select a PIN upon registering at each web site. This PIN will be known only by the user and not recorded anywhere. In addition, the user can choose to have a separate PIN for each e-business web site but can also securely use the same PIN for each site. This apparent security vulnerability warrants further explanation. The PIN would not be stored anywhere and would serve to tell the fingerprint scanner which bits of the user's scanned biometric reading to use for authentication purposes.

Combining each of these features, the proposed solution for secure user authentication during Internet transactions would work according to the following procedure. Upon registering at a web site, the user would first enter his username and then swipe the credit card encoded with the same username for initial verification. The user would then be prompted to scan his finger on the biometric sensor attached to the keyboard. The biometric scanner would detect whether a live finger or a fake has been presented as input by reading a pulse from the source. The digitized biometric data would only be stored locally for 60 seconds. During that time, the user would enter his PIN (the user PIN), and the web site would send a numeric value (henceforth labeled the site PIN) as well as a timestamp. The PIN and the number sent by the web site would specify which combined bits of the biometric reading to use as the individual's password for that particular website, and this combination of biometric data bits (henceforth labeled the biopair value) would be sent back to the website with the user's timestamp. The biometric data sent to the website would only be a small portion of the user's full biometric reading, thereby maintaining the privacy of the actual biometric reading and avoiding the high-bandwidth transmission of an entire reading. The timestamp would ensure that the biometric reading occurred within a specified time range after the request by the website was issued. This procedure is illustrated in Figure 1.2.

Each e-business web site would only be responsible for storing a username, site PIN, and biopair value for each user, a feature that enables this multi-layer authentication technique to be completely scalable. Upon each subsequent visit to a site, the user would have to be authenticated and then permitted to perform any number of business-to-business or business-to-consumer transactions that are permitted for the specified user on that site. To perform each authentication, the site would query its database for the username and return the associated site PIN to the biometric scanner. The user would once again scan his finger and enter his PIN, and the scanner would combine the bits of the biometric reading specified by the user PIN with the bits specified by the site PIN. This combined value would be sent back to the web site, which would then compare this biopair with the biopair value in its database.

This multi-layer authentication technique solves a number of the problems with current methods of authentication. Privacy of biometric data is maintained and bandwidth resources are not overwhelmed, since only a portion of the full biometric reading is sent over the Internet or stored in a web site's database. User PINs, which are known by the user and not stored anywhere, ensure that a user can specify which bits from his biometric reading to use as part of his password. This technique enables a user to change his biometric site password at any time by merely changing his PIN. Even if a biometric transmission is intercepted, the timestamp ensures that the intercepted information cannot be reused, and the relationship of the biometric reading to the PIN ensures that the password can be changed if compromised. The problem with current biometric systems is that, since the full biometric data is used for the comparison, a compromise of the data would be severe because a user has a limited number of fingerprints. Since each web site associates its own site PIN with each user's biopair value, a password that is hacked from one web site cannot be used to gain access to other sites, even though the username is the same. This approach thereby enables users to use the same username and PIN for every web site without sacrificing security.

The downside to this technique lies in the added cost to the user of the additional hardware functionality. However, this one-time cost is far outweighed by the increased level of security offered by this multi-layer authentication method. In addition, the keyboard scanner is extremely versatile in that it can be used for every e-business site that chooses to implement this security method. This increased level of security across all sites can therefore be obtained at a cost much less than if the user had to purchase a token or proprietary biometric scanner from each e-business.
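The biopair construction above can be illustrated with a short sketch. The paper does not fix a concrete bit-selection encoding, so the rule below (each PIN digit addressing a bit offset into the local reading, with an assumed stride) is purely illustrative; what matters is that only the selected bits, never the full reading, leave the scanner.

    import java.util.BitSet;

    public class BiopairScanner {
        // Select one bit of the reading per PIN digit (PINs assumed numeric),
        // at an offset derived from the digit and an assumed stride.
        private static BitSet selectBits(byte[] reading, String pin, int stride) {
            BitSet out = new BitSet();
            for (int i = 0; i < pin.length(); i++) {
                int bitIndex = Character.getNumericValue(pin.charAt(i)) * stride + i;
                int b = reading[(bitIndex / 8) % reading.length] >> (bitIndex % 8);
                out.set(i, (b & 1) == 1);
            }
            return out;
        }

        // Combine the user-PIN-selected bits with the site-PIN-selected bits;
        // this biopair, not the full reading, is compared with the stored value.
        public static BitSet biopair(byte[] reading, String userPin, String sitePin) {
            BitSet pair = selectBits(reading, userPin, 13);
            BitSet site = selectBits(reading, sitePin, 29);
            for (int i = 0; i < sitePin.length(); i++) {
                pair.set(userPin.length() + i, site.get(i));
            }
            return pair;
        }
    }

Changing either PIN changes which bits are sampled, which is why a user can "reset" a biometric site password without acquiring a new fingerprint.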

IV. CONCLUSION AND FUTURE RESEARCH

Passwords remain the most prevalent method of user authentication for information systems, and especially for Internet transactions in e-business. Password-based authentication is lacking in security due to several factors contributing to the weakness of password construction and the ease with which passwords can subsequently be compromised. The Computer Security Institute (2002) reported in its "Computer Crime and Security Survey" that 38 percent of responding corporations had security breaches resulting from unauthorized access to areas of their web sites. Alternative forms of user authentication, including tokens, smart cards, and biometrics, attempt to address some of the vulnerabilities of password-based systems, but suffer from their own vulnerabilities when employed in a single-layer authentication scheme. Double authentication techniques have been developed which combine two authentication methods to increase the level of security of information systems. E-business transactions have yet to take advantage of increased security in user authentication methods due to the high costs associated with a large-scale deployment of double authentication technology by each individual site. The cost of increased security remains a principal inhibitor to its implementation, and the percentage of information technology budgets allotted to security has not kept pace with the increase in threats: in 2002, businesses reportedly spent only 11.8% of their information technology budget on security. The multi-layer authentication method proposed in this study can increase the security of web-based transactions by providing a more reliable way to authenticate valid users, and simultaneously reduce security implementation costs through the adoption of a single technology source usable by all e-business web sites.

While this study has laid the groundwork for a proposal to increase the security of Internet transactions, there are several areas that can be explored as further extensions of this research. Future study can look into the methods of authentication used by the 0.5 percent of Internet transactions identified as using more than strictly passwords (Radcliff, 2002). In addition, further research can analyze the multi-layer authentication method proposed in this study to determine its possible areas of vulnerability. And finally, the physical keyboard-scanner technology central to this proposal can be explored to reduce the cost of such a device and propel its adoption.

V. REFERENCES

1) Anthes, G. H. (1994). SecurID keeps passwords changing. Computerworld. Retrieved October 10, 2002.
2) Anthes, G. H. (1994). TGV's onetime passwords evade intruders. Computerworld. Retrieved October 10, 2002.
3) Anthes, G. H. (1998). Promising technology has yet to gain wide acceptance. Computerworld. Retrieved October 10, 2002.
4) Bishop, M., Klein, D.V. (1995). Improving system security via proactive password checking. Computers and Security, Vol. 14, No. 3, 233-249.
5) Black, J. (2002). A Growing Body of Biometric Tech. BusinessWeek Online. Retrieved October 10, 2002.
6) Computer Security Institute. (2002). Computer Crime and Security Survey. Retrieved November 5, 2002.
7) Two integrated schemes of user authentication and access control in a distributed computer network. IEE Proceedings: Computers and Digital Techniques, Vol. 145, No. 6, 419-423.
8) Liu, S., Silverman, M. (2001). A Practical Guide to Biometric Security Technology. IT Professional. Retrieved October 10, 2002.
9) Millman, H. (2002). Making Passwords Passe. Computerworld. Retrieved October 10, 2002.
10) Morris, R., Thompson, K. (1979). Password security: a case history. Communications of the ACM, Vol. 22, No. 11, 594-597.
11) Porter, S.N. (1982). A password extension for human factors. Computers and Security, Vol. 6, No. 5, 403-416.

Cloud Computing – A Paradigm Shift

Manjula K A

Abstract—Grid technology is finding its way out of the academic incubator and entering into commercial environments. Ensembles of distributed, heterogeneous resources, or Computational Grids, have emerged as popular platforms for deploying large-scale and resource-intensive applications, and large collaborative efforts are currently underway to provide the necessary software infrastructure. This paper explains Grid Computing and introduces its basic concepts. Clouds, another variant of Grids, and their significance are also discussed. GridGain, an open source product from GridGain Systems Inc., is an ideal platform for Native Cloud Applications: it provides developers a powerful and elegant technology to develop and run applications on private or public clouds, and it enthusiastically supports the MapReduce model of computation. This paper also discusses this fast-growing open source business model, with its noted characteristics of ease and transparency.
Keywords—Cloud Computing, Distributed Computing, Grid Computing, GridGain, Open Source, Middleware.

I. INTRODUCTION

This paper discusses Grid Computing, which is a form of Distributed Computing whereby the resources of many computers in a network are used at the same time to solve a single problem. Grid Computing is the use of hundreds, thousands, or millions of geographically and organisationally disperse and diverse resources to solve problems that require more computing power than is available from a single machine or from a local area distributed system [1]. This technology has been applied to computationally intensive scientific, mathematical, and academic problems through volunteer computing, and it is used in commercial enterprises for such diverse applications as drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce and Web services.

Compared to Grid Computing, Cloud Computing is a relatively newer concept, which has become popular recently with the availability of environments like Amazon EC2. Clouds leverage virtualization technology, and that makes them distinguishable from Grids. Cloud Computing is the use of a third-party service (Web Services) to perform computing needs; here the Cloud depicts the Internet. With Cloud Computing, companies can scale up to massive capacities in an instant without having to invest in new infrastructure, which is beneficial to small and medium-sized businesses. Basically, consumers use what they need on the Internet and pay only for what they use.

Cloud Computing poses a number of challenges: deployment, data sharing, load balancing, failover, discovery (nodes, availability), provisioning (add, remove), management, monitoring, development process, debugging, and internal and external clouds (syncing data, syncing code, failing over jobs). Cloud platforms like GridGain, GigaSpaces, Terracotta, Coherence, Hadoop, etc. make it affordable to grow and manage grids. GridGain is an open source computational grid framework that enables Java developers to improve the general performance of processing-intensive applications by splitting and parallelizing the workload. GridGain can also be thought of as a set of middleware primitives for building applications. In this paper, grid computing is discussed in the next section, followed by cloud computing and a comparison of these two technologies. The paper concludes with a discussion of the capabilities and characteristics of GridGain.

II. GRID COMPUTING

The main concept of Grid Computing is to extend the original ideas of the Internet to sharing widespread computing power, storage capacities, and other resources [2]. The term Grid Computing has become one of the latest buzzwords in the IT industry. Grid Computing can be thought of as distributed and large-scale Cluster Computing, and as a form of network-distributed parallel processing. This innovative approach to computing leverages existing IT infrastructure to optimize computing resources and manage data as well as computing workloads. Grids are collections of heterogeneous computation and storage resources scattered across distinct network domains. Grids provide tools that allow users to find, allocate, and use available resources [3]. Grid middleware provides users with seamless computing ability and uniform access to resources in the heterogeneous grid environment [4]. The structure of a grid is depicted in Fig. 1.

Grid computing appears to be a promising trend for three reasons [5]:
- Its ability to make more cost-effective use of a given amount of computer resources.
- As a way to solve problems that cannot be approached without an enormous amount of computing power.
- It suggests that the resources of many computers can be cooperatively, and perhaps synergistically, harnessed and managed as collaboration toward a common objective.
Grid Computing is becoming a critical component of science, business, and industry. Grids could allow the analysis of huge investment portfolios in minutes instead of hours, significantly accelerate drug development, and reduce design times and defects.
______
About - Department of Information Technology, Kannur University, India (e-mail: [email protected])

Larger bodies of scientific and engineering applications stand to benefit from grid computing, including molecular biology, financial and mechanical modeling, aircraft design, fluid mechanics, biophysics, biochemistry, drug design, tomography, data mining, nuclear simulations, environmental studies, climate modeling, neuroscience/brain activity analysis, and astrophysics [6].

III. CLOUD COMPUTING

Cloud Computing evolves from grid computing and provides on-demand resource provisioning. Grid computing may or may not be in the cloud, depending on what type of users are using it [7]. Cloud Computing is the convergence and evolution of several concepts from virtualization, distributed application design, grid, and enterprise IT management to enable a more flexible approach for deploying and scaling applications [Fig. 2]. To deliver a future state architecture that captures the promise of Cloud Computing, architects need to understand the primary benefits of Cloud Computing [8]:
- Decoupling and separation of the business service from the infrastructure needed to run it (virtualization).
- Flexibility to choose multiple vendors that provide reliable and scalable business services, development environments, and infrastructure that can be leveraged out of the box and billed on a metered basis, with no long-term contracts.
- Elastic nature of the infrastructure, to rapidly allocate and de-allocate massively scalable resources to business services on a demand basis.
- Cost allocation flexibility for customers wanting to move capital expenditure into operating expenditure.
- Reduced costs due to operational efficiencies and more rapid deployment of new business services.
Cloud computing eliminates the costs and complexity of buying, configuring, and managing the hardware and software needed to build and deploy applications; these applications are delivered as a service over the Internet (the cloud). Cloud Computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services. Cloud computing incorporates infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS), as well as Web 2.0.

From a hardware point of view, three aspects are new in Cloud Computing [9]:
- The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning.
- The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs.
- The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed, thereby rewarding conservation by letting machines and storage go when they are no longer useful.
Cloud computing is massively scalable, provides a superior user experience, and is characterized by new, internet-driven economics [10].

IV. GRID COMPUTING vs CLOUD COMPUTING

Cloud Computing and Grid Computing do have a lot in common; both are scalable. Scalability is accomplished through load balancing of application instances running separately on a variety of operating systems and connected through Web services. Both computing types involve multitenancy and multitasking, meaning that many customers can perform different tasks, accessing single or multiple application instances [7]. Cloud and grid computing provide service-level agreements (SLAs) for guaranteed uptime availability of, say, 99 percent. At the same time, Cloud Computing and Grid Computing do have differences [11]. One major difference is that while grids are typically used for job execution (i.e., limited-duration execution of a program, often as part of a larger set of jobs, consuming or producing all together a significant amount of data), clouds are more often used to support long-serving services. Grids also provide higher-level services that are not covered by Clouds: services enabling complex distributed scientific collaborations (i.e., virtual organisations) in order to share computing data and, ultimately, scientific discoveries. In Clouds, Amazon S3 provides a Web services interface for the storage and retrieval of data; an object as small as 1 byte and as large as 5 GB, or even several terabytes, can be stored. S3 uses the concept of buckets as containers for each storage location of objects. The data is stored securely using the same data storage infrastructure that Amazon uses for its e-commerce Web sites. Users are gaining confidence in cloud services and are now outsourcing production services and part of their IT infrastructure to cloud providers such as Amazon.
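The bucket/object model just described can be sketched with the AWS SDK for Java (v1); the bucket and key names below are placeholders, and credentials are assumed to be configured in the environment.

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class S3Example {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
            String bucket = "example-research-data"; // a bucket is a container for objects
            s3.createBucket(bucket);
            // An object can range from a single byte up to the service's size limit.
            s3.putObject(bucket, "results/run-1.txt", "payload of arbitrary size");
            String body = s3.getObjectAsString(bucket, "results/run-1.txt");
            System.out.println(body);
        }
    }

The simplicity of this interface, compared with grid middleware stacks, is part of what the EGEE/Amazon comparison in Fig. 3 highlights.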

A comparison of a grid (the EGEE Grid) and a cloud (the Amazon cloud) is depicted in Fig. 3, which points out the power of cloud computing [11]. Michael Sheehan [12] analysed the trends in search volume and news reference volume of the computing terms Grid Computing and Cloud Computing. He finds that the term Grid Computing, which has been around for a while, is trending downwards, while the newcomer Cloud Computing, which made its full entrance into this trend analysis around 2007, is rapidly gaining momentum. 2008 seems to be a pivotal time, when it surpassed Grid Computing (and it continues to grow).

V. GRIDGAIN

In order to control and manage the various resources that Grids can offer, various Grid middleware such as OptimalGrid [13], Ice [14], GridGain [15], GigaSpaces [16], and Terracotta [17] have been developed. One among these, GridGain, is an open source product released under the terms of the GNU General Public License (GPL) by GridGain Systems Inc. The developers of GridGain are of the opinion [15] that it is an ideal platform for Native Cloud Applications, and it is noted for the ease of use and transparency it renders with regard to deployment issues. GridGain, with its modern design, is based on the Java programming language and is adequate for networking systems and applications. GridGain provides developers with powerful and elegant technology to develop and run applications on private or public clouds, and it is focused on providing the best Java middleware for developing and running Grid applications on Cloud infrastructure in a simple and productive way. GridGain's open Cloud platform is a new breed of Cloud Computing software: it enables developers to write custom Grid-enabled applications, or to Grid-enable existing ones, and seamlessly deploy them on the Cloud, taking full advantage of concepts such as MapReduce, data grids, affinity load balancing, zero deployment, and peer-to-peer class loading, among many others. GridGain's SPI architecture is ideally suited for hybrid cloud deployment with any mix of internal and external Clouds, while allowing developers to build an entire application locally and then seamlessly deploy it on a virtualized Cloud without any changes to the business logic, the code, or how it was developed [15]. The key features of GridGain are:
- Hybrid Cloud Deployment.
- Cloud Aware Communication & Deployment.
- Advanced Affinity Map/Reduce.
- Annotation-based Grid-enabling with AOP.
- SPI-based integration and customization.
- Advanced load balancing and scheduling.
- Pluggable Fault-Tolerance.
- One compute grid, many data grids.
- Zero deployment model.
- JMX-based Management & Monitoring.
A sketch of the split-and-aggregate model behind these features follows this list.

The characteristics that are considered to be promoting the growth of GridGain are [18]:
- Cost: it is free.
- Source code: it is open source.
- Support: enterprise-level support.
- Java: it is made in Java and for Java.
- AOP: innovative AOP-based Grid enabling.
- Simplicity: ease of use.
- Features: best-of-breed grid computing features.
- Practicality: everything you need, nothing you do not.
- Integration: out-of-the-box integration with Spring, JBoss, AspectJ, etc.
- Agility: made with developers in mind.
According to the developers of GridGain [19], the next major release is expected to include the ability to execute a Grid task from any Ajax-based application via a REST/JSON combination, providing native integration between server-side GridGain and Web 2.0 client-side applications. GridGain is pioneering the field of mobile Grid Computing, and work is ongoing on developing GridGain for the Google Android platform. The next version of GridGain is also expected to feature improved management and monitoring for subscribers, based on VisualVM and featuring enterprise-grade capabilities such as role-based access control, global alerts, scheduling control, trend base-lining, reporting, and distributed monitoring.

VI. CONCLUSIONS

Computational Grids act as popular platforms for deploying large-scale and resource-intensive applications. A related technology, the newly emerging IT delivery model of Cloud Computing, can significantly reduce IT costs and complexities while improving workload optimization and service delivery. Cloud Computing is found to be massively scalable, provides a superior user experience, and is characterized by new, internet-driven economics. Grid Computing and Cloud Computing resemble each other in some respects, but there are differences. GridGain is a newly introduced and promising grid framework, which is an open source product. This middleware enables developers to write custom grid-enabled applications and performs well with its key features, such as SPI-based integration and customization, advanced affinity Map/Reduce, and AOP. The developers of GridGain expect to incorporate much-improved management and monitoring features into this product in coming releases.

VII. REFERENCES

1) Lewis, M. Grid Computing. Retrieved from http://grid.cs.binghamton.edu
2) Yang, C.T., Han, T.F., & Kan, H.C. (2009). G-BLAST: a Grid-based solution for mpiBLAST on computational Grids. Concurrency and Computation: Practice and Experience, vol. 21, no. 2, pp. 225-255.
3) Amador, G., Alexandre, R., & Gomes, A. (2009). Re-engineering to work on a grid. Retrieved from http://www.av.it.pt/conftele2009/Papers/96.pdf
4) Buyya, R., & Venugopal, S. (2005). A Gentle Introduction to Grid Computing and Technologies. CSI Communications, July 2005.
5) Berman, F., et al. (2003). Grid Computing. John Wiley and Sons.
6) Kaufman, J.H., et al. (2003). Grid Computing Made Simple. The Industrial Physicist, vol. 9, no. 4, pp. 31-33.
7) Myerson, J.M. (2009). Cloud computing versus grid computing. Retrieved from http://www.ibm.com
8) Bennett, S., Bhuller, M., & Covington, R. (2009). Oracle White Paper in Enterprise Architecture: Architectural Strategies for Cloud Computing. Retrieved from www.oracle.com


9) Armbrust, M., Fox, A., et al. (2009). Above the Clouds: A Berkeley View of Cloud Computing. Whitepaper, UC Berkeley Reliable Adaptive Distributed Systems Laboratory. Retrieved from http://radlab.cs.berkeley.edu/
10) Cloud Computing. Retrieved from http://www.ibm.com/ibm/cloud/
11) Klems, M. (2008). Comparative study: Grids and Clouds, Evolution or Revolution. Retrieved from www.eu.egee.org, 11/6/2008.
12) Sheehan, M. (2008). Trending Various Computing Terms: Clouds are getting Congested. Retrieved from www.gogrid.com
13) OptimalGrid. Retrieved from http://www.alphaworks.ibm.com/tech/optimalgrid
14) Ice. Retrieved from www.zeroc.com
15) GridGain. Retrieved from www.gridgain.com
16) GigaSpaces. Retrieved from www.gigaspaces.com
17) Terracotta. Retrieved from www.terracotta.org
18) 10 Reasons to use GridGain. Retrieved from http://www.gridgainsystems.com
19) GridGain: One Compute Grid, Many Data Grids. Retrieved from http://highscalability.com

Fig. 1. Example of a Grid structure.

Fig. 2. The architecture of cloud [Source: http://www.cloudup.net].



Fig. 3. Comparison between EGEE Grid and Amazon Cloud.



On Security Log Management Systems

Sabah Al-Fedaghi 1, Bader Mattar 2

Abstract—A log management system (LMS) is a system for creating, receiving, processing, releasing, and transferring security log data. Its main objectives include detecting and preventing unauthorised access and abuse, and meeting regulatory requirements. One of its main components is the classification of events to make decisions related to archiving and to invoking responses to certain events. Most current approaches to LMS design are system dependent and involve specific hardware (e.g., firewalls, servers) and commercial software systems. This paper presents a theoretical framework for LMS in terms of a flow-based conceptual model with emphasis on security-related events. The framework includes four separate flow systems: active system, log system, alarm system, and response system. All systems are composed of five inclusive stages: receiving, processing, creating, releasing, and transferring. The experimental part of the paper concentrates on log analysis in the processing stage of the log system. We select actual log entries and classify them according to these five stages.
Keywords—log management system, security-related events, conceptual model, logs classification.

I. INTRODUCTION

A log is a record of a computer event arising during processing in systems and networks. It is an "append-only, time stamped record representing some event that has occurred in some computer or network device" [23]. Logging refers to the action of recording events in a log database. Event logging is a major component in most critical systems. Applications, security systems, and operating system components can make use of a centralised log service to report events that have taken place, such as a failure to start a module, complete an action, or block or deny some connections. A centralised log service provides valuable security-related functions such as troubleshooting and monitoring. Logging tools can improve security for systems, applications, and storage, with benefits that include the following [19]:
- Detect/prevent unauthorised access and insider abuse
- Meet regulatory requirements
- Analyse and correlate forensic data
- Track suspicious behaviour
- IT troubleshooting and network operations
The use of computer security logs from servers, network devices, diagnostic tools, and security-specific devices has increased tremendously (89 percent of organisations in 2010, compared with 43 percent in 2005 [19]). This has created the need for log management. (Security) log management is the process "for generating, transmitting, storing, analyzing, and disposing of computer security log data" [18]. According to [18], a fundamental problem with log management is balancing log management resources with the quantity of log data: "Log generation and storage can be complicated by several factors, including a high number of log sources; inconsistent log content, formats, and timestamps among sources; and increasingly large volumes of log data" [18]. There are several widely used formats of log messages, e.g., the Apache Common Log Format [14] for Web servers and the Unix Syslog format [10]; however, there are no common standards for log message encoding.

To assist in facilitating more efficient and effective log management, Kent [18] recommends the establishment of a log management infrastructure that is used to generate, transmit, store, analyze, and dispose of log data, and that supports the policy and roles. When designing infrastructures, major factors to be considered include the volume to be processed, network bandwidth, storage, the security requirements, and the resources needed for staff to analyze the logs [18].

There are many log management systems on the market, and also many internally managed log management systems. It is reported that "because of difficulties in setup and integration, most organizations have only achieved partial automation of their log management and reporting processes" [19]. According to Patrick Mueller [21], the legal requirements around log management may make you feel like you are battling the Hydra: solve one problem, and two more pop up in its place. Analyzing and aggregating the incessant streams of information created by computer and network logs has always been a difficult, thankless task, but now it is taking on epic proportions because of regulatory compliance. For example, the 2006 PCI Data Security Standard (DSS) [13] requires that certain events be logged with specific details of each audit entry and with network time synchronisation among logging components. The Health Insurance Portability and Accountability Act (HIPAA) mandates "hardware, software, and/or procedural mechanisms that record and examine activity in information systems that contain or use electronic protected health information."

Modern log management systems are typically complex systems. Although an LMS is dependent on actual requirements and needs, theoretical study of these systems provides a foundation for requirement analysis, design, and implementation. Design of such a system needs all the system development methodologies that have been used in other computer systems. Insufficient effort has been invested in studying LMS in the abstract.
______
1 About: Kuwait University (e-mail: [email protected])
2 About: (e-mail: [email protected])
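As an illustration of one of the common formats mentioned above, the following sketch parses a single Apache Common Log Format entry (fields: host, identity, user, date, request, status, bytes) with a regular expression; the sample line is the canonical CLF example.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ClfParser {
        // host ident authuser [date] "request" status bytes
        private static final Pattern CLF = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)$");

        public static void main(String[] args) {
            String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
            Matcher m = CLF.matcher(line);
            if (m.matches()) {
                System.out.println("host=" + m.group(1) + " status=" + m.group(6)
                    + " bytes=" + m.group(7));
            }
        }
    }

The absence of a common encoding standard means a real LMS needs one such parser per source format, which is part of the integration difficulty reported above.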

This paper is concerned with developing high-level architectural issues for these systems. After reviewing current abstract architectures, we propose a conceptual model that depicts the various functionalities involved in the logging procedure. The paper concentrates on computer-security-related logs, which can be generated by many sources, including security software, intrusion detection systems, operating systems, and applications. An implementation-independent scheme of classification of log entries is also proposed, and the classification method is applied to actual entries of events. Our approach is based on a flow model, as reviewed in the next section.

II. FLOW MODEL

Recently, a flow model (FM) has been proposed and used in several applications, including communication and engineering requirement analysis. For the sake of making this paper self-contained, we review this model, introduced in several works [4], with additional material related to event classification.

Fig. 1. State transition diagram for the flow model with possible triggering mechanism. (States: created flowthing, e.g., new, constructed, manufactured; released flowthing, e.g., to be shipped out, for departure, to be disclosed; transferred flowthing, e.g., being transmitted, communicated, or transported; received flowthing, e.g., arrived or collected; processed flowthing, e.g., transformed, reshaped. A flow may trigger another type of flow.)

A. General view

In FM, the flow of "things" indicates movement inside and between spheres. The sphere is the environment of the flow and includes five stages that may be subspheres with their own five-stage schema. The stages may be named differently; for example, in an information sphere, a stage may be called "communication", while in raw material flow the same stage is called "transportation". The information creation stage may be called "manufacturing" in materials flow.

A flow model is a uniform method for representing things that "flow", i.e., things that are exchanged, processed, created, transferred, and communicated. "Things that flow", called flowthings, include information, materials (e.g., in manufacturing), and money. To simplify this review of FM, we introduce the model in terms of information flow. Information occurs in five states: transferred, received, processed, created, and released, as illustrated in Fig. 1. Here, we view the "state of information" in the sense of properties; for example, water occurs in nature in the states of liquid, solid, and gas. Fig. 1 also represents a transition graph, called a flowsystem, with five states and arrows representing flows among these states. Information can also be stored, copied, destroyed, used, etc., but these are secondary states of information within any of the five generic states. In Fig. 1, flows are denoted by solid arrows that may trigger other types of flow, denoted by dashed arrows, as will be discussed.

The environment in which information exists is called its sphere (e.g., computer, human mind, organisation information system, department information system). The flowsystem is reusable because a copy of it is assigned to each entity (e.g., software system, vendor, and user). An entity may have multiple flowsystems, each flowthing with its own flowsystem. As will be developed later, an improvement cycle can be described in terms of flowsystems of different flowthings: information, plans, actions, and systems. As forms of information, plans, actions, and systems are flowthings that can be received, processed, created, released, and transferred.

A flowsystem may not necessarily include all states; for example, conceptualisation of a physical airport can model the flow of passengers: received (arriving), processed (e.g., passports examined), released (waiting to board), and transferred (to planes); however, airports do not create passengers (ignoring the possibility of an emergency where a baby is born in the airport). In this case, the flowsystem of the airport includes only the passenger states of received, processed, released, and transferred.

As mentioned, we view a system as the environment in which information exists, called its sphere. A system is also viewed as a complex of flowsystems. From this perspective, many notions discussed in this paper take different forms. For example, "where events happen" is understood as the position of events in flowsystems and in stages of flowsystems. "Who is affected" is translated as which flowsystem is affected. "What subsystems are involved" asks which subflowsystems are involved. This also applies to flows, substages, and so forth related to events.


B. Exclusiveness of information states

The states shown in Fig. 1 are exclusive in the sense that if information is in one state, it is not in any of the other four states. Consider a piece of information π in the possession of a hospital. Then π is in the possession of the hospital and can be in only one of the following states:
1. π has just been collected (received) from some source, e.g., patient, friend, or agency, and stored in the hospital record waiting to be used. It is received (raw) information that has not been processed by the hospital.
2. π has been processed in some way: converted to another form (e.g., digital), translated, compressed, etc. In addition, it may be stored in the hospital information system as processed data waiting for some use.
3. π has actually been created in the hospital as the result of doctors' diagnoses, lab tests, processing of current information (e.g., data mining), and so forth. Thus, π is in the possession of the hospital as created data to be used.
4. π is being released from the hospital information sphere. It is designated as released information ready for transfer (e.g., sent via DHL). In an analogy with a factory environment, π would represent materials designated as ready to ship outside the factory. They may actually be stored for some period waiting to be transported; nevertheless, their designation as "for export" keeps them in such a state.
5. π is in a transferred state, i.e., it is being transferred between two information spheres. It has left the released state and will enter the received state, where it will become received information in the new information sphere.
It is not possible for processed information to directly become received information in the same flowsystem. Processed information can become received information in another flowsystem by first becoming released information, then transferred information, and then arriving at (being received by) another flowsystem.

Consider the seller and buyer information spheres shown in Fig. 2. Each contains two flowsystems: one for the flow of orders, and the other for the flow of invoices. In the seller's infosphere, processing of an order triggers (circle 3) the creation of an invoice in the seller's information sphere, thus initiating the flow of invoices.

Fig. 2. Order flow triggers invoice flow.

C. Formal View

Fig. 2 illustrates the triggering mechanism of flows of orders and invoices. An important principle of FM is the separation of flows: an order triggers an invoice, and each has its flowsystem in its own information sphere. Triggering in the context of FM means activation of a state or substate, which may generate a flow. Suppose the receive state is activated by triggering; when a flow is received, triggering may then result in:
(1) Activating the flow to release
(2) Activating the flow to process
(3) Mistriggering
Mistriggering indicates that the triggering has not succeeded. Triggering may specify a chain of flow. For example, a triggering in receive may specify flow to release, or flow to release and transfer; in the latter case the triggering is a chain of triggering. FM reflects a map of possible flows, just as a city map represents possible routes. Traffic lights internally trigger flows. The reflexive arrow of the transfer state shown in Fig. 1 denotes flow from the transfer state of one flowsystem to the transfer state of another.

Secondary stages include Copy, Store, Delete, and Destroy, which can occur in any of the five FM stages. For example, there is stored received information, stored created information, stored processed information, stored released information, and stored transferred information.

In Fig. 2, the Buyer creates an Order that flows by being released and is then transferred to the Seller. The "transfer components" of the Buyer and the Seller can be viewed as their transmission subsystems, while the arrow between them represents the actual transmission channel. This FM formalisation can be supplemented with rules and constraints that permit flow from one state to another. Additional "logic gates" (e.g., OR, AND) can also be built on top of the basic flowsystem. The notion of triggering will be used in our cycle of improvement, where information flow (e.g., data about a current system) triggers plan flow, which in turn triggers action flow that creates a new system (system flow).
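A flowsystem of this kind can be summarised as a small state machine. The transition table below is an illustrative assumption consistent with the constraints stated in this section (for example, processed information never flows directly to received within the same flowsystem), not a definitive rendering of Fig. 1.

    import java.util.EnumMap;
    import java.util.EnumSet;
    import java.util.Map;

    public class Flowsystem {
        enum Stage { CREATED, RECEIVED, PROCESSED, RELEASED, TRANSFERRED }

        private static final Map<Stage, EnumSet<Stage>> FLOWS = new EnumMap<>(Stage.class);
        static {
            FLOWS.put(Stage.CREATED,     EnumSet.of(Stage.PROCESSED, Stage.RELEASED));
            FLOWS.put(Stage.RECEIVED,    EnumSet.of(Stage.PROCESSED, Stage.RELEASED));
            FLOWS.put(Stage.PROCESSED,   EnumSet.of(Stage.RELEASED)); // never directly RECEIVED
            FLOWS.put(Stage.RELEASED,    EnumSet.of(Stage.TRANSFERRED));
            FLOWS.put(Stage.TRANSFERRED, EnumSet.of(Stage.RECEIVED)); // into another flowsystem
        }

        private Stage stage;
        Flowsystem(Stage initial) { this.stage = initial; }

        // Move the flowthing along a permitted arrow; a move may also trigger
        // a flow in a second flowsystem (e.g., an order triggering an invoice).
        void flowTo(Stage next, Flowsystem triggered, Stage triggeredStart) {
            if (!FLOWS.get(stage).contains(next))
                throw new IllegalStateException(stage + " -> " + next + " is not a flow");
            stage = next;
            if (triggered != null) triggered.stage = triggeredStart;
        }
    }

The separation of flows is reflected directly: the order and the invoice would be two Flowsystem instances, coupled only through the triggering argument.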

III. CONCEPTUALISING LOG MANAGEMENT SYSTEMS

The design of log management systems is a complex process that needs all the system development methodologies that have been utilised in other computer systems, including conceptual modelling. A conceptual model is viewed as a high-level abstract description of concepts that correspond to entities, processes, and relationships in real-world systems.

The resulting representation can be used as a first step in building a less abstracted description of the phenomenon under study. It is also useful for identifying common constructs and can serve as a general understanding in design and communication.

In the context of LMS, [22] presents a "conceptual" model of a "typical" log management solution utilising servers, a firewall, and so forth. According to [22], the Syslog standard is used to aggregate logs from key systems across the network; a correlation engine is then used to look at relationships between events. This event correlation step allows a view into collections of events that may point to malicious or other unwanted network behaviour. Clearly, such a description is a highly specific system in comparison with, say, database systems, which are described in such terms as objects, attributes, entities, and relationships.

Fischer [9] describes the transfer of a log message in a more abstract fashion, as shown in Fig. 3.

Fig. 3. Example transfer of a log message (modified from [9]): a source (e.g., the operating system) sends a message to the log system, which forwards it to a file or the network.

While this type of description is independent of a specific system, it falls short of drawing a complete conceptual picture in comparison with our FM-based model. Also, it does not distinguish clearly among functionalities and subsystems. According to Fischer [9], although the specific technical architecture of a log management system has to be adapted to specific requirements, the basic structure of a log management system is layered in three tiers:
- Input layer
- Processing layer
- Output layer
Fischer [9] then gives an example of a log management system, as shown in Fig. 4.

Fig. 4. Sample log management system (from [9]): input modules feed a processing module governed by a control interface, which routes messages to output modules.

The input layer receives, formats, and sends log messages to the processing layer. The processing layer includes the processing module and the rule store (control). Log messages are processed and filtered against rules to select different output modules. Typical processing includes interpretation, categorisation, normalisation, reduction, and execution. The rules are used to extract the needed information to decide which of the configured actions should be executed next; for example, "the result triggers an action, such as displaying an alert or running an external program" [9]. The output layer uses database back-ends for storing and retrieving received log messages. An important architectural issue in LMS is the separation of concerns: each functional subsystem or module of a system should be cleanly separated from other system components and should encapsulate exclusively the functionality that is absolutely required for the specific task. On a higher level of abstraction, this requirement might be fulfilled by structuring a log management system into the mentioned basic building blocks, namely the input, processing, and output layers [9] [6].

"The first step to implement a logging infrastructure is listing critical systems ... and determining what logging is turned on" [7]. At the analysis stage, raw log data are analysed and interpreted to consolidate logs. Sophisticated tools are needed to spot security problems. A number of software products are available to help collect and analyse audit and log data.

IV. FM-BASED LOG SYSTEM

We conceptualise the LMS infrastructure as a system of flowsystems created from an active system, a logging system, an alert system, and a response system, as illustrated in Fig. 5.
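The three-tier structure just reviewed can be sketched as a minimal pipeline: the input layer normalises a raw message, the processing layer filters it against rules, and matching rules route the message to output actions. The rule conditions and actions below are illustrative assumptions.

    import java.util.List;
    import java.util.function.Consumer;
    import java.util.function.Predicate;

    public class LogPipeline {
        record Rule(Predicate<String> matches, Consumer<String> output) {}

        private final List<Rule> rules; // the rule store ("control")
        LogPipeline(List<Rule> rules) { this.rules = rules; }

        // Input layer: receive and format; processing layer: filter against
        // rules; output layer: execute the configured action (store, alert, ...).
        void receive(String rawMessage) {
            String normalised = rawMessage.trim().toLowerCase();
            for (Rule r : rules) {
                if (r.matches().test(normalised)) r.output().accept(normalised);
            }
        }

        public static void main(String[] args) {
            LogPipeline p = new LogPipeline(List.of(
                new Rule(m -> m.contains("failed login"), m -> System.out.println("ALERT: " + m)),
                new Rule(m -> true,                       m -> System.out.println("store: " + m))
            ));
            p.receive("Failed login for user root from 10.0.0.7");
        }
    }

Each layer touches only its own concern, which is the separation-of-concerns requirement quoted above in miniature.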

Fig. 5. General view of FM-based LMS infrastructure.


The notion of "event" is a central concept in logging. Logged events are selected to cover different types of events at different security-relevant points in the system. In FM, events are flowthings that can be received (e.g., the event of log-in), processed (e.g., a request to retrieve data), created (e.g., activating a program), released (e.g., the action of buffering an outbound file), and transferred (e.g., transmission). Any of these events is a change of state in an active system, which is viewed as a flowsystem. Events trigger the creating and processing of log entries. Events are actions (in the common sense), while logs, in this case, are records of the actions.

Logs, in their turn, are flowthings. Some log entries trigger alarms. Alarms, in their different forms, are flowthings that can be created, processed, received, released, and transferred. Alarms, in turn, trigger responses. Typically, a response is a type of action (a purposeful event); hence, it is a flowthing. Fig. 6 shows the possible sequence of flowthings.

Fig. 6. General sequences of flowthings (event, log, alarm, response).

Nevertheless, in the context of security-related events, a system's events can be viewed as an outside action (e.g., an attack) followed by the system's reaction, as shown in Fig. 7.

Fig. 7. General sequences of flowthings in the context of security-related events (action, log, alarm, reaction/response).

Figure 8 shows a detailed representation of these flowsystems. Notice that the log system keeps recording all activities, including alarms and responses (circles 1 and 2). In the figure, instead of drawing too many dotted arrows, triggering mechanisms are drawn starting from the edge of the flowsystem box to denote that the origins of activities can be at any point in the flowsystem. These figures provide a theoretical framework for LMS in terms of a flow-based conceptual model. We assume that the log system includes all typical functions such as analysis, archiving, and reporting. Thus, the log system by itself is a complex of flowsystems of flowthings that are related to logging. The same flow-based methodology can be utilised at different levels of description. In the next section we concentrate on the classification of events as part of the log analysis.

Outside action y Response Receive Create System System l (reaction)

Process Release Transfer Process Release Transfer r Create Receive a 1 Log storage Alert E Create Create 2 Log System System

Process Release Transfer Process Release Transfer

Receive Log Receive analysis

Fig. 8. Conceptualisation of log system and related systems.
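The triggering chain of Figs. 6-8 can be sketched in a few lines. The following Python toy uses our own naming, and the alarm criterion is a hypothetical flag on the event; it shows an event creating a log entry, a designated log entry creating an alarm, and an alarm triggering a response, with the log system recording all three:

```python
# Toy illustration of the Event -> Log -> Alarm -> Response triggering chain.
# Stage names follow the FM flowsystem stages; the alarm rule is hypothetical.
LOG = []

def log_system(event):
    LOG.append(event)                 # the log system records all activities
    if event.get("alarm"):            # some log entries trigger alarms
        alarm_system(event)

def alarm_system(event):
    alarm = {"name": "alarm for " + event["name"], "stage": "alarm"}
    LOG.append(alarm)                 # alarms and responses are logged too
    response_system(alarm)

def response_system(alarm):
    # a response is itself a purposeful event, hence a flowthing
    LOG.append({"name": "response to " + alarm["name"], "stage": "response"})

log_system({"name": "outside udp request", "stage": "receive", "alarm": True})
for entry in LOG:
    print(entry["stage"], "-", entry["name"])
```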


V. FLOW-BASED LOGGING MODEL

FM provides a conceptual foundation for the classification of security-related events as received, processed, created, released, or transferred information events. Classification here is based on the information flow in the different stages of the (active) flowsystems. In addition, the flow model allows establishing rules for such issues as classifying the severity/sensitivity of processed events. In privacy-enhancing systems, the creation of personal information is generally a more sensitive event than the processing of it. In addition, releasing such information to outsiders is a more privacy-significant event than receiving it. In such cases it is possible to configure a type of filtering mechanism that would apply to these types of events; e.g., the rule: the event of releasing personally identifiable information to an outsider triggers sending an e-mail to alert the proprietor of that personally identifiable information, as required by some privacy regulations.

Separate protection mechanisms can be used for each class of log information. "The use of different mechanisms for different classifications gives the opportunity to use mechanisms that give more usability for information that doesn't need that strong a mechanism. This allows the possibility of using the same mechanism with lower assurance or strength" [17].
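Such stage-based rules can be encoded directly. The sketch below is a hypothetical illustration: the rule encoding, sensitivity ranks, and the alert action are our own assumptions, not taken from [17]:

```python
# Hypothetical stage-based filtering rule: releasing personally identifiable
# information (PII) to an outsider is treated as more privacy-significant
# than receiving or processing it, so only that event triggers an alert.

SENSITIVITY = {"create": 3, "release": 3, "transfer": 2, "process": 1, "receive": 1}

def send_alert(event):
    # stand-in for e-mailing the proprietor of the information
    print("alert proprietor: PII released in event:", event["name"])

def filter_event(event):
    if event["data"] == "PII" and event["stage"] == "release" and event["to_outsider"]:
        send_alert(event)
    return SENSITIVITY[event["stage"]]

filter_event({"name": "export customer file", "data": "PII",
              "stage": "release", "to_outsider": True})
```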
There are many security-oriented event classifications. We review a few methods as examples [4]. AIX [5] uses the following classifications (some subcategories are deleted for the sake of brevity):

A. Security Policy Events
- Subject Events: process creation, process deletion, setting subject security attributes
- Object Events: object creation, object deletion, object open, object close
- Import/Export Events: importing or exporting an object
- Accountability Events: adding a user, changing user attributes in the password database, user login, and user logoff
- General System Administration Events: use of privilege, file system configuration, device definition and configuration, normal system shutdown
- Security Violations (potential): access permission refusals, privilege failures, diagnostically detected faults and system errors

In such a long list of events, grouping is performed according to subject/object, then import/export, then violations. In this case, a list seems to be built from examining different rules (subject/object), functions (e.g., accountability), organisational units (system administration), and status (violations).

In the area of network intrusion events, Kazienko and Dorosz [11] list monitoring of the following event groups:
- Network traffic (packets) attempting to access the host
- Login activity on the networking layer
- Actions of a super-user (root)
- File system integrity, e.g., any changes to the files
- System register states
- The state of key operating system files and streams

We can observe no systemised grouping of event types: activities, actions, or changes to files, states, and streams, extracted from the network operating environment.

Windows event logs contain the following types of events [12]: error event, warning event, information event (successful operation of an application, driver, or service), success audit event, and failure audit event. Again, these classes of events seem to lack systemisation and are specified according to the hosting system.

HP recommends basic monitored events: admin event, login event, moddac self-auditing event, execv, execve, and pset event [12]. "Admin", "login", "exec", etc. seem to be events targeted according to the specifics of the system.

Web Services Architecture [15] contains an "audit guard" that monitors agents, resources, and services, and actions relative to one or more services. We again note the heterogeneous categorisation of monitoring types.

It can be concluded that a general theory of event classification for monitoring purposes is lacking. The required classification must be generic in the sense that it is not tied to any specific system activity. In the next section, we introduce the foundation of such an approach to event logging.

From the security point of view, monitoring for security breaches can be accomplished by way of network-level TCP/IP, server, and application monitoring, and through process-specific monitoring [7]. A record of monitoring typically shows the identity of any entity that has accessed a system and the types of operations executed. It may include attempted accesses and services, or a sequence of events that have resulted in modifying data. The purpose is mostly to provide evidence for the reconstruction of the sequence of events that led to a certain effect or change.

To accomplish its function, a log system runs in a privileged mode to oversee and monitor all operations. Key information in such a system includes information format, type of activity, identity, storage, location, time, cause, tools and mechanisms used, and so forth [13].

VI. EXPERIMENTATION: SECURITY-RELATED LOGS

There is no system that uses the proposed FM-based classification of security-related events. Consequently, we have opted to inspect one current log of events to identify entries that can be classified accordingly. We select a sample log file extracted from the Cisco PIX (Private Internet eXchange) firewall logging monitor tool [16].

PIX is a popular IP firewall and network address translation (NAT) appliance that runs a custom-written proprietary operating system originally called Finesse (Fast Internet Server Executive), now known simply as PIX OS. It is classified as a network-layer firewall with stateful inspection, which means it keeps track of the state of network connections (such as TCP streams and UDP communication) travelling across it. Technically, the PIX would more precisely be called a Layer 4 (Transport Layer) firewall, as its access is not restricted to network-layer routing but applies to socket-based connections. By default it allows outbound traffic, which is the traffic generated from an inside host to an outside one, and it allows only inbound traffic generated in response to a valid request from an insider or allowed by a predefined access control list (ACL) or conduit. The PIX can be configured to perform many functions, including network address translation (NAT) and port address translation (PAT), as well as acting as a virtual private network (VPN) endpoint appliance [16].

The PIX, like many other security devices, has a logging tool. This tool helps security engineers and network administrators to track, debug, and monitor any normal or abnormal security events. The log classifies events according to their predefined severity: Emergencies, Alerts, Critical, Errors, Warnings, Notifications, Informational, and Debugging.

Table 1 shows a view of the output screen, including tracked events in the PIX firewall. It shows some error logs, for example one stating that no translation was found for a specific IP address; informational logs of building and denying TCP connections according to the allow/deny traffic rules of the PIX firewall; and higher severity logs, such as the termination of some TCP or UDP connections because of a timeout or for other reasons. The meanings of the different events are shown in Table 2.

The entire sample log file is examined for entries that can be classified according to the five stages of FM. We can classify each row in the event log by matching the actual meaning of the log entry with the theoretical concept of the FM classification model (created, processed, received, disclosed, communicated), as shown in Table 3. The table contains only selected rows from Table 2.

The first row in Table 3,
Login permitted from 53.215.253.172/49810 to Inside:53.215.253.254/https for user ****
is a request to access, and the request is accepted according to authentication. It is a receive event. Note that the receiving component in the system has its own processing aspect. Received means that the original imported data has been kept in its original form. It could be compared with (operated on), as in this case, a stored file in the receiving state, but the data itself has not been changed in form.

Table 1. Sample logged entries.





Table 2. Different event meanings.

Seq | Event | Meaning
1 | Login permitted from 53.215.253.172/49810 to Inside:53.215.253.254/https for user **** | User has been logged in to the system
2 | Deny udp src Outside:53.215.0.241/49197 dst Inside:53.215.253.122/2056 | System denies the "udp" connection requested from an outside source to an inside agent
3 | Device completed SSL handshake with client Inside:53.215.253.172/49810 | Secure Sockets Layer used for transmitting private documents via the Internet
4 | Built inbound TCP connection for inside2:192.168.253.129/2608 to Inside:53.215.253.100/110 | Connection built on user request and traffic prepared to be sent to the outside
5 | portmap translation creation failed for udp src inside2:192.168.253.180/1025 dst Inside | Connection cannot be created between the client and the WWW server
6 | 53.215.253.179 Accessed URL 213.199.141.140 | The user accessed a URL
7 | Deny TCP (no connection) from 207.46.148.33/80 to 53.215.253.62/58806 flags SYN ACK | The connection closed (terminated)
8 | Permit TCP connection from 207.46.148.33/80 to 53.215.253.62/58806 flags Backup | The connection has been created
9 | SSL client Inside:53.215.253.172/52321 request to resume previous session | Request from user to restore the session
10 | Built outbound TCP connection for Outside:213.199.141.140/80 to Inside:53.215.253.179/1417 | Connection built on user request and traffic sent to the inside
11 | Deny TCP (no connection) from 65.55.15.123/80 to 53.215.253.62/43197 flags SYN ACK | The connection closed (terminated)
12 | portmap translation created for udp src inside2:192.168.253.180/1025 dst Inside:192.168.93.3/161 | Connection created between the client and the WWW server
13 | Teardown ICMP connection for faddr 196.221.174.51/0 laddr 192.168.253.115 | Connection closed because of SYN timeout
14 | Permit outbound UDP connection for Outside:77.28.78.85/10780 to inside2 | Connection built on user request and traffic sent to the inside
15 | Built inbound ICMP connection for faddr 196.221.174.51/0 laddr 192.168.253.115 | Connection built on user request and traffic sent to the outside
16 | Teardown dynamic TCP translation from inside2:192.168.253.109/1352 to Outside | Connection closed because of SYN timeout

Table 3. Event classifications according to the FM model.

Seq | Event | FM stages
1 | Login permitted from 53.215.253.172/49810 to Inside:53.215.253.254/https for user **** | Receive
2 | Deny udp src Outside:53.215.0.241/49197 dst Inside:53.215.253.122/2056 | Receive, Process
3 | Device completed SSL handshake with client Inside:53.215.253.172/49810 | Process, Release
5 | portmap translation creation failed for udp src inside2:192.168.253.180/1025 dst Inside | Receive, Process, Release, Transfer
6 | 53.215.253.179 Accessed URL 213.199.141.140 | Receive, Process, Create
8 | Permit TCP connection from 207.46.148.33/80 to 53.215.253.62/58806 flags Backup | Receive, Process, Create, Release
16 | Teardown dynamic TCP translation from inside2:192.168.253.109/1352 to Outside | Process, Transfer (no receiving stage; see discussion)


The second row in Table 3 shows a deny action taken by the security device, where the request was received and processed:
System denies the "udp" connection requested from an outside source to an inside agent
The request was processed because it involved an inside agent. This implies that the original request was analysed to break it down into components.

Note that in such a system, the alert is not incorporated as an independent notion. In our methodology, the process may be visualised as follows:
1. Receiving stage: Request Q has been received by user A. (Apparently, user A is a legitimate user; otherwise, it would be denied access, as in the previous case.)
2. Processing stage: the request has been processed as a "udp" connection requested from an outside source to an inside agent. The processing module denied permission.
3. Log system: registries of these events are created in the log system. If this event has been designated as an alarm, the alarm system is triggered.
4. Alarm system: an alarm description is created, and then the response system is triggered. Response system: an instruction to the receiving module to watch out for this user.

Row 3,
Device completed SSL handshake with client Inside:53.215.253.172/49810
shows a different classification of the event:
Process: the device performed a handshake with the host to ensure availability.
Release: after ensuring host availability, traffic is prepared to be sent to that host.

Row number 4 indicates:
Built inbound TCP connection for inside2:192.168.253.129/2608 to Inside:53.215.253.100/110
The entry comprises four different actions:
1- Received: authenticated user.
2- Processed: authorised operation.
3- Created: built inbound TCP connection.
4- Released: traffic was prepared to be sent ("disclosed") to the outside.
Note how the sequence of events was registered by these log entries, as shown in Fig. 8. If there is an alert, it involves the entire sequence of events, not only the last one.

Fig. 8. Map of compound events involved in row 4 of Table 3: log entries for Received, Processed, Created, Released, and Transferred.

Row 5 states:
portmap translation creation failed for udp src inside2:192.168.253.180/1025 dst Inside
This means the connection cannot be created between the client and the WWW server. In terms of the FM model, the sequence of events is:
1- Receiving: request from the host.
2- Processing: checking for permission.
3- Releasing: buffering the data to be sent.
4- Transferring: denying the requested connection.
Here the desired connection is not clear. Failure to establish a connection may indicate decisions made at several stages. Figure 9 shows the stream of flow of possible communication between source and destination, where the system acts as a mediator. The crossed circles denote positions where a decision is made to abort the connection. For example, it is possible that the destination is a blacklisted site; thus, any request to connect to that site is rejected at the transfer or receive stages.

Fig. 9. Streams of flow in line 5, Table 3: the source and destination flowsystems (Receive, Process, Release, Transfer stages), with the system as mediator.

This is in contrast to line 6, which states:
53.215.253.179 Accessed URL 213.199.141.140
Here the user requests access to a certain WWW server; after processing, the connection is created.

In row number 8,
Permit TCP connection from 207.46.148.33/80 to 53.215.253.62/58806 flags Backup
"Backup" results from transferring files from their original location to another location such as media storage. This would establish a communication channel between source and destination to release and transfer the backup information. Notice that storage in FM is divided logically into five stages. Suppose that the transferred files are log system files. These are created information and should be stored in the new location as such.

Line 16,
Teardown dynamic TCP translation from inside2:192.168.253.109/1352 to Outside
is an example of an event that does not involve the receiving stage. It could be because of a connection timeout; hence the system closes the session. Other events can be classified in the same way. The key issue is knowing the actual meaning of each event in the log so it can be easily classified by mapping it to the theoretical concept of the FM classification model.
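Such a mapping is straightforward to mechanise once the meaning of each event is known. The following sketch uses keyword heuristics of our own devising (not a PIX feature) to assign FM stages to entries like those in Table 3:

```python
# Keyword heuristics (our own assumption, not part of PIX) mapping raw
# PIX log text to the FM stages derived in Table 3.

RULES = [
    ("Login permitted", {"receive"}),
    ("Deny udp", {"receive", "process"}),
    ("completed SSL handshake", {"process", "release"}),
    ("Built inbound TCP", {"receive", "process", "create", "release"}),
]

def classify(line):
    for keyword, stages in RULES:
        if keyword in line:
            return stages
    return set()  # unknown event: its meaning must be established first

print(classify("Deny udp src Outside:53.215.0.241/49197 dst Inside:53.215.253.122/2056"))
```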


For security-related events, the FM approach provides a logical map for a clear description of the sequence of events that creates security-related awareness.

VII. CONCLUSION

This paper presents a theoretical framework for LMS in terms of a flow-based conceptual model with emphasis on security-related events. The framework includes four separate flowsystems: the active system, log system, alarm system, and response system. All systems are composed in terms of flowsystems. The experimental part of the paper concentrates on log analysis in the log system. The high-level modelling methodology provides a promising approach for LMS design. Further research aims at constructing a centralised LMS for a simple hardware/software system where the flowsystem and its classification are applied right from the beginning.

VIII. REFERENCES

1) Al-Fedaghi, S. (2010). Threat risk modeling. 2010 International Conference on Communication Software and Networks (ICCSN 2010), Singapore, 26-28 February.
2) Al-Fedaghi, S. (2009). On developing service-oriented Web applications. The 2009 AAAI (Advancement of Artificial Intelligence)/IJCAI (International Joint Conferences on Artificial Intelligence) Workshop on Information Integration on the Web (IIWeb:09), July 11, Pasadena, CA, USA. http://research.ihost.com/iiweb09/notes/9-P4-ALFEDAGHI.pdf
3) Al-Fedaghi, S. (2009). Flow-based description of conceptual and design levels. IEEE International Conference on Computer Engineering and Technology 2009, January 22-24, Singapore.
4) Al-Fedaghi, S., & Mahdi, F. (2010). Events classification in log audit. International Journal of Network Security & Its Applications (IJNSA), 2(2). http://airccse.org/journal/nsa/0410ijnsa5.pdf
5) AIX, System management concepts: Operating system and devices (1st ed.), Chapter 3, Auditing overview. (September 1999). http://www.chm.tu-dresden.de/edv/manuals/aix/aixbman/admnconc/audit.htm
6) Dijkstra, E. W. (2003). On the role of scientific thought. http://www.cs.utexas.edu/users/EWD/transcriptions/EWD04xx/EWD447.html
7) Becta. (2009). Good practice in information handling: Audit logging and incident handling. Becta, V2, March 2009. http://schools.becta.org.uk/upload-dir/downloads/audit_logging.doc
8) GFI. (2008). Auditing events. http://docstore.mik.ua/manuals/hp-ux/en/5992-3387/ch10s04.html
9) Fischer, R. (2007, April). Motivations and challenges in designing a distributed log management framework. Diploma Thesis, Institut für Softwaretechnik und interaktive Systeme. http://cocoon.ifs.tuwien.ac.at/lehre/diplomarbeiten/2007_Fischer.pdf
10) Gerhards, R. (2005, Oct.). The syslog protocol. http://tools.ietf.org/html/draft-ietf-syslog-protocol
11) Kazienko, P., & Dorosz, P. (2004). Intrusion detection systems (IDS) Part 2 - Classification; methods; techniques. WindowsSecurity.com. http://www.windowsecurity.com/articles/IDS-Part2-Classification-methods-techniques.html
12) GFI EventsManager. (2009, October). GFI Software Ltd. Manual. http://www.gfi.com/esm/esm8manual.pdf
13) SANS Consensus Project. (2007). Information system audit logging requirements. SANS Institute. http://www.sans.org/resources/policies/info_sys_audit.pdf
14) The Apache Software Foundation. (2006). Common log format. http://httpd.apache.org/docs/2.2/logs.html#accesslog
15) W3C Working Draft 8, Web Services Architecture, August 2003.
16) Cisco PIX. (Dec 2009). http://en.wikipedia.org/wiki/Cisco_PIX
17) Försvarets Materielverk, FMV (Swedish Defence Material Administration), Design Rule: Security aspects of information, 2008-04-30. https://www.fmv.se/upload/Bilder%20och%20dokument/Vad%20gor%20FMV/Uppdrag/LedsystT/FMLS%202010/FMLS%20Design%20Rules/LT1O%20P06-0108%20DR%20Security%20aspects%20of%20information%203.0.pdf
18) Kent, K., & Souppaya, M. (2006, Sept.). Guide to computer security log management. Recommendations of the National Institute of Standards and Technology, NIST Special Publication 800-92. http://csrc.nist.gov/publications/nistpubs/800-92/SP800-92.pdf
19) Shenk, J. (2008, Oct.). Log management in the cloud: A comparison of in-house versus cloud-based management of log data. SANS Whitepaper. http://www.sans.org/reading_room/analysts_program/LogMgmtCloud_Oct08.pdf
20) Shenk, J. (2010, Apr.). SANS sixth annual log management survey report. SANS Whitepaper. https://www.sans.org/reading_room/analysts_program/logmgtsurvey-2010.pdf
21) Mueller, P. (2008, May 31). Facing the monster: The labors of log management. InformationWeek. http://www.informationweek.com/news/global-cio/compliance/showArticle.jhtml?articleID=208400730
22) Olzak, T. (2010, Apr. 24). Five common gaps in logical network security. Bright Hub, edited & published by Michele McDonough. http://www.brighthub.com/computing/enterprise-security/articles/66298.aspx?p=2
23) Adam, S. (2002). A new architecture for managing enterprise log data. In: Proceedings of the 16th Systems Administration Conference (LISA-02). Philadelphia, PA: USENIX Association, 121-132.








A Transformation Scheme for Deriving Symmetric Watermarking Technique into Asymmetric Version

Rinaldi Munir1 Bambang Riyanto2 Sarwono Sutikno3 Wiseto P. Agung4

Abstract- This paper proposes a transformation scheme for rendering a symmetric watermarking technique into its asymmetric version. The asymmetric technique uses a secret watermark as the private key and a public watermark as the public key. The public watermark has a normal distribution, and the private watermark is a linear combination of the public watermark and a secret sequence. The detection process is implemented by a correlation test between the public watermark and the received image. The scheme is used to transform the Barni Algorithm, a symmetric watermarking technique, into an asymmetric version. Experiments showed that the asymmetric technique is as robust as its symmetric version against some typical image processing schemes.

Keywords- asymmetric watermarking, Barni Algorithm, transformation, correlation.

______
About-1,2,3: Rinaldi Munir, Bambang Riyanto, Sarwono Sutikno (School of Electrical Engineering and Informatics, Bandung Institute of Technology, Indonesia) (e-mail: [email protected], [email protected], [email protected]).
About-4: PT. Telekomunikasi, Indonesia (e-mail: [email protected]).

I. INTRODUCTION

Digital watermarking has been used widely as a tool for protecting the copyright of digital multimedia data (e.g. images) [1, 2]. Many digital watermarking techniques for still images have been proposed [1-3]. The particular problem with the state-of-the-art watermarking techniques is that the majority of these schemes are symmetric: watermark embedding and detection use the same key. The symmetric watermarking scheme has a security problem: once an attacker knows the secret key, the watermark not only can be detected, but can also be easily estimated and removed from the multimedia data completely, thereby defeating the goal of copyright protection.

A solution to the problem is the asymmetric watermarking scheme, in which different keys are used for watermark embedding and detection. An asymmetric watermarking system uses the private key to embed a watermark and the public key to verify the watermark. Anybody who knows the public key can detect the watermark, but the private key cannot be deduced from the public key. Also, knowing the public key does not enable an attacker to remove the watermark [3].

A review of several existing asymmetric watermarking techniques can be found in [3]. The asymmetric techniques proposed until now can be classified into two categories [8]. The first category is the watermark-characteristics-based method, where the watermark is a signal with special characteristics such as periodicity. The other is the transform-based method, which makes a public key from a given private key by a proper transform. The Legendre-sequence [4] and eigenvector [5] techniques belong to the first category, whereas Hartung and Girod's [6] and Gui's [7] techniques belong to the second category.

Many symmetric watermarking techniques have been proposed, and some of them have good results in robustness and imperceptibility. Thus we have the idea of deriving a symmetric watermarking technique into an asymmetric version, because designing a new asymmetric watermarking technique may need intensive effort and time. In this paper, we contribute a transformation scheme which can be used to derive a symmetric technique into its asymmetric version. We choose a classical symmetric watermarking technique which has good robustness and imperceptibility, i.e. the Barni Algorithm [9], and use the scheme to derive an asymmetric version of it.

II. PROPOSED SCHEME

In several symmetric techniques, the secret key is the watermark itself, which has a normal distribution. In the asymmetric version of a symmetric technique, the private key and the public key are referred to as the private watermark (Ws) and the public watermark (Wp), respectively. The public watermark should have a correlation with the private watermark, because detection is implemented by a correlation test between the public watermark and the received image.

In our scheme we map the symmetric method into the asymmetric version. Based on the compatibility between the symmetric and asymmetric watermarking methods, there is no change in the watermark embedding algorithm in the mapping. Watermark embedding in the asymmetric version is the same as in the original (symmetric) method, but the watermark detection algorithm changes slightly. The change is in the reference watermark used in the correlation test. Whereas in the symmetric method the correlation test is performed between the received image and the original watermark, in the asymmetric version the correlation test is performed between the received image and the public watermark.

A new process added to the mapping is a transformation of the private watermark to produce a public watermark. The public watermark Wp is generated by applying the transformation f to the private watermark Ws:

$W_p = f(W_s)$    (1)

Fig. 1 shows the transformation diagram. The transformation f is a one-way function, so that it is computationally almost impossible to derive the private watermark from the corresponding public watermark. This one-way property is important to provide security in the asymmetric watermarking method. The function f is the core process of transforming the symmetric watermarking method into its asymmetric version.
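As noted above, only the reference watermark passed to the correlation detector changes between the two versions. A minimal sketch of that shared test follows; the notation and function name are ours:

```python
# The same correlation detector serves both versions; only the reference
# watermark differs: the original secret watermark in the symmetric method,
# the public watermark Wp in the asymmetric version.
import numpy as np

def correlation_detect(v_received, w_reference):
    """c = (1/n) * sum over i of v*(i) * w(i); c is compared to a threshold."""
    return float(np.mean(v_received * w_reference))
```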

Fig. 1. Transformation of the private watermark into the public watermark (Ws passes through f to produce Wp).

We design both watermarks to have a normal distribution, as the choice of this distribution gives resilient performance against collusion attacks [1]. We use a concept from statistics to design the function f. In statistics, if we add two or more random variables as a linear combination, where each of them has a normal distribution, then the result has a normal distribution too. Let X be a sequence with mean $\mu_1$ and variance $\sigma_1^2$, and let Y be a sequence independent of X with mean $\mu_2$ and variance $\sigma_2^2$. A linear combination of X and Y is defined as Z = aX + bY, where a and b are parameters. The sequence Z has mean $\mu_3 = a\mu_1 + b\mu_2$ and variance $\sigma_3^2 = a^2\sigma_1^2 + b^2\sigma_2^2$ [10].

In generating the watermarks we have to ensure that the linear combination is secure. This means that the private watermark cannot be deduced from the public watermark. Furthermore, knowing the public watermark does not enable a user to remove the embedded watermark from the watermarked image. This characteristic is realized by adding the public watermark to a secret sequence. Let Wp be the public watermark and R be the secret sequence; the private watermark can be obtained by adding Wp and R as

$W_s = f(W_p, R) = \beta_0 W_p + \beta_1 R$    (2)

where each $\beta_i$ is a parameter in [0, 1] to control the trade-off between the two sequences, and $\beta_0 + \beta_1 = 1$. In order to make the sequence R more secure, we encrypt R by a random permutation before adding it to Wp. Thus, Eq. (2) can be written as

$W_s = f(W_p, R) = \beta_0 W_p + \beta_1 \tilde{R}$    (3)

where $\tilde{R}$ is the encrypted version of R. Fig. 2 shows the process of generating the public and the private watermark. The private watermark Ws is embedded into the image according to the equation used by its symmetric technique. On the detector side, using the public watermark Wp, the correlation test is computed to accomplish the watermark detection.

Fig. 2. Generation of the public and the private watermark: a pseudo-random generator (seed 1) produces Wp; a second pseudo-random generator (seed 2) produces R, which is encrypted with a permutation key into R̃; their weighted addition yields Ws.

III. SECURITY ANALYSIS

Because the public watermark, the detector, and the watermarked image are publicly available (not secret), an attacker may use this public information to deduce the private watermark Ws. Such attacks are called public attacks. Once Ws can be calculated, Ws is removed from the watermarked image by performing a subtraction operation on the watermark embedding formula (depending on the method). Security of this transformation scheme is based on the two following factors.

A. One-way function

One-way functions are commonly used in cryptography to enhance system security. With a one-way function, computing the function value from the variables is relatively easy, but recovering the variables from the function value is computationally difficult and even impossible. In Eq. (3) the parameter R can be viewed as a trapdoor: it is impossible to find Ws without knowing the trapdoor, because the attacker must invert the one-way function. The attacker knows Wp but does not know R. Because $\tilde{R}$ is an encrypted version of R, the attacker must know R before obtaining $\tilde{R}$. Security of this asymmetric version depends on the secret sequence.

B. Permutation

Suppose the attacker knows R; next he (or she) needs to know the random permutation used to encrypt R. Because the cardinality of R is n, the attacker must try n! permutations to find the right one. Remember that n is large enough (about 25% of the original image size), so that finding the right permutation needs O(n!) computation. For n = 10000, for example, there are 10000! permutations. We conclude that it is impossible for attackers to deduce the private watermark from this public information.

IV. CASE STUDY: TRANSFORMATION OF BARNI ALGORITHM

In this section we present the derivation of a symmetric watermarking method into its asymmetric version based on the transformation scheme described in Section II. The symmetric watermarking method is a classic method, i.e. the Barni Algorithm.

In the Barni Algorithm, the watermark consists of a pseudo-random sequence of n real numbers, W = {w(1), w(2), ..., w(n)}, that has a normal distribution with mean = 0 and variance = 1. The watermark W is inserted into selected DCT coefficients, V = {v(1), v(2), ..., v(n)}. Watermark detection is done by computing the correlation between the watermark and the selected DCT coefficients of a possibly corrupted image I*, i.e. V* = {v*(1), v*(2), ..., v*(n)}.

In the asymmetric version of the Barni Algorithm, we use two watermarks: the first is a private watermark, Ws = {ws(1), ws(2), ..., ws(n)}, that is embedded into the host image, and the second is a public watermark, Wp = {wp(1), wp(2), ..., wp(n)}, for the detection phase. Both watermarks are generated by the procedure explained in Section II.
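A NumPy sketch of this generation procedure follows; the seeds, weights, and permutation step mirror Fig. 2, while the variable names and seed values are our own illustration:

```python
# Generating the public watermark Wp and private watermark Ws (Fig. 2).
# Wp and R are N(0,1) sequences from two seeds; R is "encrypted" by a secret
# random permutation, and Ws = b0*Wp + b1*R_tilde with b0 + b1 = 1 (Eq. (3)).
import numpy as np

n, b0, b1 = 16000, 0.8, 0.2
wp = np.random.default_rng(seed=1).standard_normal(n)   # public watermark
r = np.random.default_rng(seed=2).standard_normal(n)    # secret sequence R
perm = np.random.default_rng(seed=3).permutation(n)     # secret permutation key
r_tilde = r[perm]                                       # encrypted version of R
ws = b0 * wp + b1 * r_tilde                             # private watermark

# Ws remains normally distributed (a linear combination of normal variables)
# and is correlated with Wp, which is what the public detector relies on.
print(round(float(np.corrcoef(ws, wp)[0, 1]), 3))
```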

The private watermark is embedded into the image according to the formula:

$v_w(i) = v(i) + \alpha \, v(i) \, w_s(i)$    (4)

On the detector side, using the public watermark Wp, the following correlation is computed:

$c = \frac{1}{n} \sum_{i=1}^{n} v^*(i) \, w_p(i)$    (5)

The watermark detection is finished by comparing c with a threshold T. The threshold depends on the received image and is calculated with the following formula:

$T = \frac{\alpha}{3n} \sum_{i=1}^{n} |v^*(i)|$    (6)
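The embedding rule and the public detection test can be sketched as follows; the DCT coefficient selection is abstracted away, and the stand-in coefficient values are our own illustration:

```python
# Embedding (Eq. 4) and public detection (Eqs. 5-6), with v standing in for
# the selected DCT coefficients; coefficient selection itself is omitted.
import numpy as np

alpha = 0.25

def embed(v, ws):
    return v + alpha * v * ws               # v_w(i) = v(i) + alpha*v(i)*ws(i)

def detect_public(v_received, wp):
    c = float(np.mean(v_received * wp))     # correlation with the PUBLIC watermark
    t = alpha / (3 * len(v_received)) * float(np.sum(np.abs(v_received)))
    return c, t, c > t                      # watermark present if c > T

rng = np.random.default_rng(0)
v = rng.uniform(10, 100, 16000)             # stand-in coefficient magnitudes
wp = rng.standard_normal(16000)
ws = 0.8 * wp + 0.2 * rng.standard_normal(16000)
c, t, found = detect_public(embed(v, ws), wp)
print(round(c, 3), round(t, 3), found)
```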

V. SIMULATION AND RESULTS

We apply our method to image watermarking using MATLAB as the programming tool. The test image is a 512 × 512 colour image, 'train'. The size of the private watermark is n = 16000. The watermark and the secret sequence R have a normal distribution with mean = 0 and variance = 1. The public watermark is generated by the function f explained in Section II, with β0 equal to 0.8 and β1 = 0.2. The embedding strength α is equal to 0.25. Histograms of the public watermark and the private watermark are shown in Fig. 3. From Fig. 3(b) we observe that the shape of the distribution graph of the private watermark is bell-like, as for common standard normal distributions.

Fig. 3. Histograms of the private and the public watermark.

Before embedding the private watermark, the original image is transformed from RGB to YCbCr. The watermark is embedded into the luminance component (Y) only, and the final result is retransformed from YCbCr to RGB.

Figure 4(a) shows the original image and Figure 4(b) shows the watermarked image (PSNR = 36.9833). Visually, the watermarked image quality is almost identical to the original image. Figure 4(c) shows the response of a public detector to 1000 random watermarks; one of them (index 250) is the public watermark that has a correlation with the private watermark. Such figures provide two interpretations [9]. In the first interpretation, the response to the public watermark is compared with the threshold T to decide on the existence of the (private) watermark within the image. In the second interpretation, if people do not know which watermark is embedded in the image, the responses to all the public watermarks are compared and the highest response is selected. The response to the correct public watermark should be the highest compared with the others, and this suggests the existence of a corresponding private watermark in the image.

Fig. 4. Output of watermark embedding and detection on the 'train' image: (a) original image, (b) watermarked image, (c) response of public detection, (d) response of private detection.

Figure 4(c) shows that in the correlation test with the correct public watermark, the correlation value is significantly higher than the others. The threshold T calculated analytically from equation (6) is 1.0188. In the case of no attack on the watermarked image, the detector gives the value c = 4.2512. Because c > T, it can be concluded that the image contains the (private) watermark. For comparison, private detection (using the private watermark in the correlation test) gives c = 5.3251 (Fig. 4(d)).

The next experiments were done to test the robustness of the watermark against some non-malicious attacks, i.e. general operations performed in image processing (cropping, compression, low-pass filtering, etc.). We use Jasc Paint Shop version 6.01 as the image processing software. The experiments and results are explained as follows.

V.1. Experiment 1: JPEG Compression

We tested the robustness against JPEG compression with extreme compression quality. For a compression quality of 6%, the watermark can be detected successfully (c = 1.1727, T = 0.9602). For comparison, private detection (using the private watermark) gives c = 1.4632. See Fig. 5.


Fig. 5. JPEG compression with compression quality 6%: (a) JPEG image with low compression quality, (b) response of public detection, (c) response of private detection. The watermark can be detected.

V.2. Experiment 2: Dithering

We convert the watermarked image to a binary image by a dithering operation. This means plenty of grey-level information is lost. It is shown in Fig. 6 that the watermark can still be detected. The response to the right watermark is the largest among the responses to all the watermarks (c = 3.4352, T = 2.4982).

Fig. 6. (a) Dithering. (b) Response of the public detector.

V.3. Experiment 3: Image Cropping

Image cropping removes some watermark information. In our simulation, we cut an unimportant part from the watermarked image (about 50%); the missing part of the image is replaced with black pixels (see Fig. 7(a)). In fact, we can still correctly detect the watermark, because the correlation value (c = 2.1752, T = 0.8579) is still significantly greater than the others. For comparison, private detection gives c = 2.7349.

Fig. 7. Image cropping and histogram equalization: (a) cropped image, (b) image after histogram equalization, with responses of public and private detection.

V.4. Experiment 4: Histogram Equalization

The watermarked image is adjusted so that the distribution of grey levels is uniform, using a histogram equalization operation (a typical low-pass filtering operation; see Fig. 7(b)). The experiment shows that the watermark can be detected, with c = 6.4877, T = 1.2269. For comparison, private detection gives c = 8.1186.

V.5. Experiment 5: Resizing

The watermarked image is resized down to 50% of the original size. The experiment shows that the watermark can still be detected. For resizing up to 200% of the original image, the watermark can still be detected well (we found c = 1.8030, T = 0.4520). For comparison, private detection gives c = 2.2571.

VI. DISCUSSION

Based on the series of experiments performed on the asymmetric version of the Barni algorithm, the results can be analysed as follows. The experimental results show that the asymmetric version remains robust to typical image processing operations such as JPEG compression, histogram equalization, dithering, cropping, and resizing. The detector response of the asymmetric method is not much different from that of the original symmetric version, and the correlation values yielded by the detector do not differ significantly.

VII. CONCLUSION

In this paper a scheme for deriving a symmetric watermarking technique into its asymmetric version has been proposed. As a test case, the Barni algorithm, a classical image watermarking technique, was successfully transformed into an asymmetric watermarking technique. This technique uses two watermarks: the first is a public watermark used for public detection, and the second is a private watermark that has a correlation to the public watermark. The private watermark is a linear combination of the public watermark and an encrypted version of a secret sequence. Security of this asymmetric technique is based on a one-way function with a trapdoor and the difficulty of finding the secret sequence, which needs O(n!) computation. Simulations against various attacks confirmed that this asymmetric technique is as robust as its symmetric version.

1) Ingemar J. Cox, et al, ―Secure Spread Spectrum

Watermarking for Multimedia‖, IEEE Trans. On

Image Processing, Vol. 6, No. 12, Dec 1997,

pp.1673-1687.

2) Mauro Barni, Franco Bartolini, Watermarking

Systems Engineering, Marcel Dekker Publishing,

2004.

3) Joachim J. Eggers, Jonathan K. Su, and Bernd

Girod, Asymmetric Watermarking Schemes, GMD

Jahrestagung, Proceedings, Springer-Verlag, 2000.

4) R.G. Schyndel, A.Z.,Tirkel, I.B. Svalbe, (1999):

―Key Independent Watermark Detection‖, in

Proceeding of the IEEE Intl. Conference on

Multimedia Computing and Systems, volume 1,

Florence, Italy.

5) J.J. Eggers, J.J., J.K. Su, B. Girod (2000): ―Public

Key Watermarking by Eigenvectors of Linear w Transform‖, EUSIPCO 2000.

6) F. Hartung, F. and B. Girod (1997): ―Fast Public-

Key Watermarking of Compressed Video‖,

Proceedings of the 1997 International. Conference e

on Image Processing (ICIP).

7) Guo-fu Gui, Ling-ge Jiang, and Chen He, ―A New i Asymmetric Watermarking Scheme for Copyright

Protection‖ IECE Trans, Fundamentals Vol. E89-

A, No 2 February 2006.

8) Geun-Sil Song, Mi-Ae-Kim, and Won-Hyung Lee, V ―Asymmetric Watermarking Scheme Using

Permutation Braids‖, Springer-Verlag, 2004.

9) Mauro Barni, F. Bartolini, V. Cappellini, A.Piva,

―A DCT-Domain System for Robust Image

Watermarking‖, Signal Processing 66, pp 357-372,

1998. y 10) Walpole, Ronald E., Myers, Raymond H., (1995),

Probability and Statistics for Engineers and

Scientists, Mc. Graw-Hill, 1995 l

r

a

E

w

e i V y l r a E

FELLOW OF INTERNATIONAL CONGRESS OF COMPUTER SCIENCE AND TECHNOLOGY (FICCT)

 FICCT' title will be awarded to the person after approval of Editor-in-Chief and Editorial Board. The title 'FICCT" can be added to name in the following manner e.g. Dr. Andrew Knoll, Ph.D., FICCT, .Er. Pettor Jone, M.E., FICCT  FICCT can submit two papers every year for publication without any charges. The paper will be sent to two peer reviewers. The paper will be published after the acceptance of peer reviewers and Editorial Board. w  Free unlimited Web-space will be allotted to 'FICCT 'along with subDomain to contribute and partake in our activities. e  A professional email address will be allotted free with unlimitedi email space.  FICCT will be authorized to receive e-Journals - GJCST for the Lifetime.  FICCT will be exempted from the registration fees of Seminar/Symposium/Conference/Workshop conductedV internationally of GJCST (FREE of Charge).  FICCT will be an Honorable Guest of any gathering hold. ASSOCIATE OF INTERNATIONALy CONGRESS OF COMPUTER SCIENCE AND TECHNOLOGY (AICCT) l  AICCT title will be awarded to the person/institution after approval of Editor-in- Chef and Editorial Board.r The title 'AICCTcan be added to name in the following manner: eg. Dr. Thomas aHerry, Ph.D., AICCT  AICCT can submit one paper every year for publication without any charges. The paper will be sent to two peer reviewers. The paper will be published after the acceptanceE of peer reviewers and Editorial Board.  Free 2GB Web-space will be allotted to 'FICCT' along with subDomain to contribute and participate in our activities.  A professional email address will be allotted with free 1GB email space.  AICCT will be authorized to receive e-Journal GJCST for lifetime.  A professional email address will be allotted with free 1GB email space.  AICHSS will be authorized to receive e-Journal GJHSS for lifetime. © Copyright by Global Journals | Guidelines Handbook

I

ANNUAL MEMBER

 Annual Member will be authorized to receive e-Journal GJCST for one year (subscription for one year).  The member will be allotted free 1 GB Web-space along with subDomain to contribute and participate in our activities. w  A professional email address will be allotted free 500 MB emaile space. PAPER PUBLICATION i  The members can publish paper once. The paper will be sent to two-peer reviewer. The paper will be published after the acceptance of peer reviewers and Editorial Board. V

y l

r

a E

© Copyright by Global Journals| Guidelines Handbook

II

The Area or field of specialization may or may not be of any category as mentioned in ‘Scope of Journal’ menu of the GlobalJournals.org website. There are 37 Research Journal categorized with Six parental Journals GJCST, GJMR, GJRE, GJMBR, GJSFR, GJHSS. For Authors should prefer the mentioned categories. There are three widely used systems UDC, DDC and LCC. The details are available as ‘Knowledge Abstract’ at Home page. The major advantage of this coding is that, the research work will be exposed to and shared with all over the world as we are being abstracted and indexed worldwide. w The paper should be in proper format. The format can be downloaded from first page of ‘Author Guideline’ Menu. The Author is expected to follow e the general rules as mentioned in this menu. The paper should be writteni in MS-Word Format (*.DOC,*.DOCX). The Author can submit the paper either online or offline. The authors should prefer online submission. V Online Submission: There are three ways to submit your paper: (A) (I) Register yourself using top right corner of Home page then Login from same place twice. If you are already registered, then login using your username and password. y (II) Choose corresponding Journal from “Research Journals” Menu. (III) Click ‘Submit Manuscript’. lFill required information and Upload the paper. (B) If you are using Internet Explorer (Although Mozilla Firefox is preferred), then Direct Submission through Homepager is also available. (C) If these two are not convenient, and then email the paper directly to [email protected] as an attachment. Offline Submission: Author can send the typed form of paper by Post. However, online submission shouldE be preferred.

© Copyright by Global Journals | Guidelines Handbook

III

MANUSCRIPT STYLE INSTRUCTION (Must be strictly followed)

Page Size: 8.27" X 11'"

 Left Margin: 0.65  Right Margin: 0.65  Top Margin: 0.75  Bottom Margin: 0.75  Font type of all text should be Times New Roman.  Paper Title should be of Font Size 24 with one Column section.  Author Name in Font Size of 11 with one column as of Title. w  Abstract Font size of 9 Bold, “Abstract” word in Italic Bold.  Main Text: Font size 10 with justified two columns section  Two Column with Equal Column with of 3.38 and Gaping of .2 e  First Character must be two lines Drop capped.  Paragraph before Spacing of 1 pt and After of 0 pt. i  Line Spacing of 1 pt  Large Images must be in One Column  Numbering of First Main Headings (Heading 1) must be in Roman Letters, Capital Letter, and Font Size of 10.  Numbering of Second Main Headings (Heading 2) must be in Alphabets,V Italic, and Font Size of 10. You can use your own standard format also.

Author Guidelines: 1. General, y 2. Ethical Guidelines, l 3. Submission of Manuscripts, 4. Manuscript’s Category, r 5. Structure and Format of Manuscript,

6. After Acceptance. a 1. GENERAL E Before submitting your research paper, one is advised to go through the details as mentioned in following heads. It will be beneficial, while peer reviewer justify your paper for publication.

© Copyright by Global Journals| Guidelines Handbook

IV

Scope

The Global Journals welcome the submission of original paper, review paper, survey article relevant to the all the streams of Philosophy and knowledge. The Global Journals is parental platform for Global Journal of Computer Science and Technology, Researches in Engineering, Medical Research, Science Frontier Research, Human Social Science, Management, and Business organization. The choice of specific field can be done otherwise as following in Abstracting and Indexing Page on this Website. As the all Global Journals are being abstracted and indexed (in process) by most of the reputed organizations. Topics of only narrow interest will not be accepted unless they have wider potential or consequences.

2. ETHICAL GUIDELINES

Authors should follow the ethical guidelines as mentioned below for publication of research paper and research activities.

Papers are accepted on strict understanding that the material in whole or in part has not been, nor is being, considered for publication elsewhere. If the paper once accepted by Global Journals and Editorial Board, will become the copyright of the Global Journals.

Authorship: The authors and coauthors should have active contribution to conception design, analysis and interpretation of findings. They should critically review the contents and drafting of the paper. All should approve the final version of the paper before submission w The Global Journals follows the definition of authorship set up by the Global Academy of Research and Development. According to the Global Academy of R&D authorship, criteria must be based on: e 1) Substantial contributions to conception and acquisition of data, analysis and interpretation ofi the findings. 2) Drafting the paper and revising it critically regarding important academic content. 3) Final approval of the version of the paper to be published. V All authors should have been credited according to their appropriate contribution in research activity and preparing paper. Contributors who do not match the criteria as authors may be mentioned under Acknowledgement.

Acknowledgements: Contributors to the research other than authors credited should be mentioned under acknowledgement. The specifications of the source of funding for the research if appropriate can be included. Suppliers of resources may be mentioned along with address. y Appeal of Decision: The Editorial Board’s decision on publicationl of the paper is final and cannot be appealed elsewhere. Permissions: It is the author's responsibility to have prior permission if all or parts of earlier published illustrations are used in this paper. r Please mention proper reference and appropriate acknowledgements wherever expected. If all or parts of previously publisheda illustrations are used, permission must be taken from the copyright holder concerned. It is the author's responsibility to take these in writing.

Approval for reproduction/modification of any information (including figures and tables) published elsewhere must be obtained by the authors/copyright holdersE before submission of the manuscript. Contributors (Authors) are responsible for any copyright fee involved. 3. SUBMISSION OF MANUSCRIPTS

Manuscripts should be uploaded via this online submission page. The online submission is most efficient method for submission of papers, as it enables rapid distribution of manuscripts and consequently speeds up the review procedure. It also enables authors to know the status of their own manuscripts by emailing us. Complete instructions for submitting a paper is available below.

Manuscript submission is a systematic procedure and little preparation is required beyond having all parts of your manuscript in a given format and a computer with an Internet connection and a Web browser. Full help and instructions are provided on-screen. As an author, © Copyright by Global Journals | Guidelines Handbook

V

you will be prompted for login and manuscript details as Field of Paper and then to upload your manuscript file(s) according to the instructions.

To avoid postal delays, all transaction is preferred by e-mail. A finished manuscript submission is confirmed by e-mail immediately and your paper enters the editorial process with no postal delays. When a conclusion is made about the publication of your paper by our Editorial Board, revisions can be submitted online with the same procedure, with an occasion to view and respond to all comments. Complete support for both authors and co-author is provided.

4. MANUSCRIPT’S CATEGORY

Based on potential and nature, the manuscript can be categorized under the following heads: Original research paper: Such papers are reports of high-level significant original research work.

Review papers: These are concise, significant but helpful and decisive topics for young researchers.

Research articles: These are handled with small investigation and applications Research letters: The letters are small and concise comments on previously published matters. w 5. STRUCTURE AND FORMAT OF MANUSCRIPT The recommended size of original research paper is less than seven thousand words, review papers fewere than seven thousands words also. Preparation of research paper or how to write research paper, are major hurdle, while writing manuscript. The research articles and research letters should be fewer than three thousand words, the structure original research paper; sometime review paper should be as follows: i

Papers: These are reports of significant research (typically less than 7000 words equivalent, including tables, figures, references), and comprise: V (a)Title should be relevant and commensurate with the theme of the paper.

(b) A brief Summary, “Abstract” (less than 150 words) containing the major results and conclusions. (c) Up to ten keywords, that precisely identifies the paper's subject,y purpose, and focus. (d) An Introduction, giving necessary background excluding subheadings; objectives must be clearly declared.

(e) Resources and techniques with sufficient completel experimental details (wherever possible by reference) to permit repetition; sources of information must be given and numerical methods must be specified by reference, unless non-standard.

(f) Results should be presented concisely, byr well-designed tables and/or figures; the same data may not be used in both; suitable statistical data should be given. All data must be obtained with attention to numerical detail in the planning stage. As reproduced design has been recognized to be important to experiments for a considerable time, the Editor has decided that any paper that appears not to have adequate numerical treatmentsa of the data will be returned un-refereed; (g) Discussion should cover the implications and consequences, not just recapitulating the results; conclusions should be summarizing.

(h) Brief Acknowledgements.E

(i) References in the proper form.

Authors should very cautiously consider the preparation of papers to ensure that they communicate efficiently. Papers are much more likely to be accepted, if they are cautiously designed and laid out, contain few or no errors, are summarizing, and be conventional to the approach and instructions. They will in addition, be published with much less delays than those that require much technical and editorial correction. The Editorial Board reserves the right to make literary corrections and to make suggestions to improve briefness. It is vital, that authors take care in submitting a manuscript that is written in simple language and adheres to published guidelines.

© Copyright by Global Journals| Guidelines Handbook

VI

Format Language: The language of publication is UK English. Authors, for whom English is a second language, must have their manuscript efficiently edited by an English-speaking person before submission to make sure that, the English is of high excellence. It is preferable, that manuscripts should be professionally edited.

Standard Usage, Abbreviations, and Units: Spelling and hyphenation should be conventional to The Concise Oxford English Dictionary. Statistics and measurements should at all times be given in figures, e.g. 16 min, except for when the number begins a sentence. When the number does not refer to a unit of measurement it should be spelt in full unless, it is 160 or greater.

Abbreviations supposed to be used carefully. The abbreviated name or expression is supposed to be cited in full at first usage, followed by the conventional abbreviation in parentheses.

Metric SI units are supposed to generally be used excluding where they conflict with current practice or are confusing. For illustration, 1.4 l rather than 1.4 × 10-3 m3, or 4 mm somewhat than 4 × 10-3 m. Chemical formula and solutions must identify the form used, e.g. anhydrous or hydrated, and the concentration must be in clearly defined units. Common species names should be followed by underlines at the first mention. For following use the generic name should be constricted to a single letter, if it is clear. Structure w All manuscripts submitted to Global Journals, ought to include: Title: The title page must carry an instructive title that reflects the content, a running title (less than 45 characters together with spaces), names of the authors and co-authors, and the place(s) wherever the work was carried out. The full postal address in addition with the e- mail address of related author must be given. Up to eleven keywords or very brief phrases have to be givene to help data retrieval, mining and indexing. Abstract, used in Original Papers and Reviews: i Optimizing Abstract for Search Engines Many researchers searching for information online will use search engines such as Google, Yahoo or similar. By optimizing your paper for search engines, you will amplify the chance of someone finding it. This in turn will make it more likely to be viewed and/or cited in a further work. Global Journals have compiled these guidelines to facilitate you to maximize the web-friendliness of the most public part of your paper. V

Key Words

A major linchpin in writing a research paper is the keyword search, which you will employ to find both library and Internet resources. One must be persistent and creative in using keywords: an effective keyword search requires a strategy, and planning a list of possible keywords and phrases to try. Most search engines use Boolean searching, which is somewhat different from a plain Internet search. A Boolean search uses "operators" - words such as and, or, not, and near - that enable you to expand or narrow your search. The following tips are a helpful guideline while preparing a research paper.
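As a minimal sketch of how such Boolean combinations can be enumerated, the Python fragment below builds candidate query strings from lists of keywords. The term lists, and the idea of scripting this at all, are purely illustrative assumptions on our part, not part of any Global Journals tooling.

    # Illustrative only: enumerate Boolean query strings from keyword lists.
    from itertools import product

    core_terms = ["intrusion detection", "anomaly detection"]  # synonyms to try
    qualifiers = ["machine learning", "data mining"]           # narrowing terms
    exclusions = ["survey"]                                    # terms to exclude

    for core, qual in product(core_terms, qualifiers):
        # Join a core term and a qualifier with AND; append any NOT terms.
        query = f'"{core}" AND "{qual}"' + "".join(f" NOT {x}" for x in exclusions)
        print(query)

Each printed line (e.g. "intrusion detection" AND "machine learning" NOT survey) is one candidate search to try against a database.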

Choice of keywords is the first of these tips. Research paper writing is an art. A few tips for deciding as strategically as possible about keyword searches:

 One should start brainstorming lists of possible keywords before even beginning to search. Think about the most important concepts related to the research work. Ask, "What words would a source have to include to be truly valuable in a research paper?" Then consider synonyms for the important words.
 It may take the discovery of only one relevant paper to steer you in the right keyword direction, because in most databases the keywords under which a research paper is abstracted are listed with the paper.
 One should avoid outdated words.

Keywords are the key that opens the door to sources for research work. Keyword searching is an art, and a researcher's skills are bound to improve with experience and time.

Numerical Methods: Numerical methods used should be clear and, where appropriate, supported by references.

Acknowledgements: Please make these as concise as possible.


References

References follow the Harvard scheme of referencing. References in the text should cite the authors' names followed by the year of publication, unless there are three or more authors, in which case only the first author's name is quoted, followed by et al. Unpublished work should be cited only where necessary, and only in the text. Copies of references in press in other journals must be supplied with submitted typescripts. It is essential that all citations and references be carefully checked before submission, as mistakes or omissions will cause delays.
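For illustration (hypothetical authors and year, not real sources): a single-author work would be cited in the text as (Smith 2009), a two-author work as (Smith and Jones 2009), and a work with three or more authors as (Smith et al. 2009).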

References to information on the World Wide Web can be given, but only if the information is available without charge to readers on an official site. Websites such as Wikipedia, where anyone can change the information, are not allowed. Authors will be asked to make available electronic copies of the cited information for inclusion on the Global Journals homepage at the discretion of the Editorial Board.

The Editorial Board and Global Journals recommend that citation of online-published papers and other material should be done via a DOI (digital object identifier). If an author cites anything that does not have a DOI, they run the risk of the cited material not being traceable.
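For illustration, a Harvard-style reference citing via DOI might be laid out as follows; the authors, title and DOI here are placeholders, not a real source:

    Smith, J. and Jones, A. (2009) 'Title of the article', Name of the Journal, 12(3), pp. 45-58. doi: 10.xxxx/xxxxxx.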

The Editorial Board and Global Journals recommend the use of a tool such as Reference Manager for reference management and formatting.

Tables, Figures and Figure Legends

Tables: Tables should be few in number, carefully designed, uncrowded, and include only essential data. Each must have an Arabic number, e.g. Table 4, a self-explanatory caption, and be on a separate sheet. Vertical lines should not be used.

Figures: Figures should be submitted as separate files. Always include a citation in the text for each figure, using Arabic numbers, e.g. Fig. 4. Artwork must be submitted online in electronic form by e-mailing it.

Preparation of Electronic Figures for Publication: Even though low-quality images are sufficient for review purposes, print publication requires high-quality images to prevent the final product being blurred or fuzzy. Submit (or e-mail) EPS (line art) or TIFF (halftone/photographs) files only. MS PowerPoint and Word graphics are unsuitable for printed pictures; do not use pixel-oriented software. Scans (TIFF only) should have a resolution of at least 350 dpi (halftone) or 700 to 1100 dpi (line drawings) in relation to the reproduction size. Please supply the data for figures in black and white, or submit a Color Work Agreement Form. EPS files must be saved with fonts embedded (and with a TIFF preview, if possible). For scanned images, the scanning resolution (at final image size) ought to be as follows to ensure good reproduction: line art: >650 dpi; halftones (including gel photographs): >350 dpi; figures containing both halftone and line images: >650 dpi.

Color Charges: It is the policy of Global Journals that authors pay the full cost for the reproduction of their color artwork. Hence, please note that if there is color artwork in your manuscript when it is accepted for publication, we will require you to complete and return a Color Work Agreement Form before your paper can be published.
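As an informal aid to the scanning-resolution guidance above, the Python sketch below (using the Pillow library) reads the resolution recorded in a TIFF file and flags it if it falls below the halftone minimum. The file name and the idea of automating the check are our own illustrative assumptions, not journal-supplied tooling.

    # Illustrative only: check the dpi recorded in a scanned TIFF (requires Pillow).
    from PIL import Image

    MIN_HALFTONE_DPI = 350   # minimum for halftones/photographs per the guidelines
    MIN_LINE_ART_DPI = 650   # minimum for line art or mixed halftone/line images

    with Image.open("figure1.tif") as img:   # hypothetical file name
        # TIFFs usually carry a (horizontal, vertical) dpi pair in their metadata.
        x_dpi, y_dpi = (float(v) for v in img.info.get("dpi", (0, 0)))
        print(f"figure1.tif: {x_dpi:.0f} x {y_dpi:.0f} dpi")
        if min(x_dpi, y_dpi) < MIN_HALFTONE_DPI:
            print("Below the 350 dpi halftone minimum; rescan at higher resolution.")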

Figure Legends: Self-explanatory legends for all figures should be incorporated separately under the heading 'Legends to Figures'. In the full-text online edition of the journal, figure legends may be truncated in abbreviated links to the full-screen version. Therefore, the first 100 characters of any legend should inform the reader about the key aspects of the figure.

6. AFTER ACCEPTANCE

Upon approval of a paper for publication, the manuscript will be forwarded to the dean, who is responsible for the publication of the Global Journals.

6.1 Proof Corrections

The corresponding author will receive an e-mail alert containing a link to a website where the proof can be collected, or the proof will be attached. A working e-mail address must therefore be provided for the corresponding author.


Acrobat Reader will be required in order to read the proof file. This software can be downloaded free of charge from www.adobe.com/products/acrobat/readstep2.html. It will enable the file to be opened, read on screen, and printed out so that any corrections can be added. Further instructions will be sent with the proof.

Proofs must be returned to the dean at [email protected] within three days of receipt.

As changes to proofs are costly, we ask that you correct only typesetting errors. All illustrations are retained by the publisher. Please note that the authors are responsible for all statements made in their work, including changes made by the copy editor.

6.2 Early View of Global Journals (Publication Prior to Print)

Global Journals articles are covered by our publisher's Early View service. Early View articles are complete, full-text articles published in advance of their appearance in print. Early View articles are complete and final: they have been fully reviewed, revised and edited for publication, and the authors' final corrections have been incorporated. Because they are in final form, no changes can be made after they are sent. The nature of Early View articles means that they do not yet have volume, issue or page numbers, so Early View articles cannot be cited in the conventional way.

6.3 Author Services

Online production tracking is available for your article through Author Services. Author Services enables authors to track their article - once it has been accepted - through the production process to publication online and in print. Authors can check the status of their articles online and choose to receive automated e-mails at key stages of production. The authors will receive an e-mail with a unique link that enables them to register and have their article automatically added to the system. Please ensure that a complete e-mail address is provided when submitting the manuscript.

6.4 Author Material Archive Policy

Please note that, unless specifically requested, the publisher will dispose of hardcopy and electronic material submitted two months after publication. If you require the return of any material submitted, please inform the Editorial Board or dean as soon as possible.

6.5 Offprint and Extra Copies

A PDF offprint of the online-published article will be provided free of charge to the corresponding author, and may be distributed according to the Publisher's terms and conditions. Additional paper offprints may be ordered by emailing us at: [email protected].

INFORMAL TIPS FOR WRITING A COMPUTER SCIENCE RESEARCH PAPER TO INCREASE READABILITY AND CITATION

Before starting to write a good quality Computer Science research paper, let us first understand what a Computer Science research paper is. A Computer Science research paper is a paper written by professionals or scientists who are associated with Computer Science and Information Technology, or who are doing research in these areas. If you are new to this field, you can consult your supervisor or guide about it.

Techniques for writing a good quality Computer Science Research Paper:

1. Choosing the topic: In most cases the topic is chosen by the interest of the author, but it can also be suggested by the guides. You can have several topics, and then judge in which topic or subject you find yourself most comfortable. This can be done by asking yourself several questions, like: Will I be able to carry out research in this area? Will I find all the necessary resources to accomplish the research? Will I be able to find all the information in this field area? If the answer to these kinds of questions is "Yes", then you can choose that topic. In most cases, you may have to conduct surveys and visit several places, because this field is related to Computer Science and Information Technology. Also, you may have to do a lot of work to find all the ups and downs of the various data on that subject. Sometimes, detailed information plays a vital role where brief information would not.



2. Evaluators are human: The first thing to remember is that evaluators are also human beings. They are not there only to reject papers; they are there to evaluate your paper. So present your best.

3. Think Like Evaluators: If you are confused or demotivated about whether your paper will be accepted by the evaluators, then try to evaluate your paper as an evaluator would. Try to understand what an evaluator wants in your research paper, and you will automatically have your answer.

4. Make blueprints of paper: The outline is the plan or framework that will help you to arrange your thoughts. It will make your paper logical. But remember that all points of your outline must be related to the topic you have chosen.

5. Ask your Guides: If you are having any difficulty in your research, do not hesitate to share your difficulty with your guide (if you have one). They will surely help you out and resolve your doubts. If you cannot clarify what exactly you require for your work, then ask your supervisor to help you with an alternative. He or she might also provide you with a list of essential readings.

6. Use of computer is recommended: As you are doing research in the field of Computer Science, this point is quite obvious.

7. Use right software: Always use good quality software packages. If you are not capable of judging good software, then you can lose the quality of your paper unknowingly. There are various software programs available to help you, which you can get through the Internet.

8. Use the Internet for help: An excellent start for your paper can be made by using Google. It is an excellent search engine, where you can have your doubts resolved. You may also read some answers to the frequent question "How do I write my research paper?" or find a model research paper. You can download books from Internet libraries. Once you have all the required books, do the important reading, selecting and analyzing the specified information, and then put together an outline of the research paper.

9. Use and get big pictures: Always use encyclopedias and Wikipedia to get the big picture, so that you can then go into depth.

10. Bookmarks are useful: When you read any book or magazine, you generally use bookmarks, right? It is a good habit which helps you not to lose your continuity. You should also use bookmarks while searching on the Internet, which will make your search easier.

11. Revise what you wrote: When you write anything, always read it, summarize it and then finalize it.

12. Make all efforts: Make all efforts to mention what you are going to write in your paper. That means always have a good start. Try to mention in the introduction why a particular research paper is needed. Polish your work with good writing skills, and always give the evaluator what he wants.

13. Have backups: When you are going to do any important work, like writing a research paper, you should always have backup copies of it, either on your computer or on paper. This will help you not to lose any of your important work.

14. Produce good diagrams of your own: Always try to include good charts or diagrams in your paper to improve quality. Using many unnecessary diagrams will degrade the quality of your paper by creating a "hotchpotch." So always try to make and include diagrams that you have made yourself, to improve the readability and understandability of your paper.

15. Use of direct quotes: When you do research relevant to literature, history or current affairs, the use of quotes becomes essential; but if the study is relevant to science, the use of quotes is not preferable.

16. Use proper verb tense: Use proper verb tenses in your paper. Use the past tense to present events that have happened. Use the present tense to indicate events that are going on. Use the future tense to indicate events that will happen in the future. Use of improper and wrong tenses will confuse the evaluator. Avoid sentences that are incomplete.

17. Never use an online paper: If you find any paper on the Internet, never use it as your research paper, because the evaluator may have already seen it, or it may be an outdated version.


18. Pick a good study spot: For your research studies, always try to pick a spot that is quiet. Not every spot is suitable for study; choose a spot that suits you and proceed further.

19. Know what you know: Always establish what you know by setting objectives; otherwise you will be confused and unable to achieve your target.

20. Use good quality grammar: Always use good grammar and words that will have a positive impact on the evaluator; using good grammar does not mean using tough words that send the evaluator to the dictionary for every word. Do not start a sentence with a conjunction. Do not fragment sentences. Eliminate one-word sentences. Avoid the passive voice. Do not ever use a big word when a diminutive one would suffice. Verbs have to be in agreement with their subjects. Prepositions are not expressions to finish sentences with. It is incorrect to ever divide an infinitive. Avoid clichés like the plague. Also, always shun irritating alliteration. Use language that is simple and straightforward. Put together a neat summary.

21. Arrangement of information: Each section of the main body should start with an opening sentence, and there should be a changeover at the end of the section. Give only valid and powerful arguments for your topic. You may also support your arguments with records.

22. Never start at the last minute: Always start at the right time and give enough time to the research work. Leaving everything to the last minute will degrade your paper and spoil your work.

23. Multitasking in research is not good: Doing several things at the same time proves a bad habit in the case of research activity. Research is an area where everything has a particular time slot. Divide your research work into parts and do a particular part in a particular time slot.

24. Never copy others' work: Never copy others' work and put your name on it, because if the evaluator has seen it anywhere, you will be in trouble.

25. Take proper rest and food: No matter how many hours you spend on your research activity, if you are not taking care of your health, then all your efforts will be in vain. For quality research, study is a must, and this can be done only by taking proper rest and food.

26. Go for seminars: Attend seminars if the topic is relevant to your research area. Utilize all your resources.

27. Refresh your mind after intervals: Try to give rest to your mind by listening to soft music or by sleeping at intervals. This will also improve your memory.

28. Make colleagues: Always try to make colleagues. No matter how sharp or intelligent you are, if you make colleagues you can gather several ideas, which will be helpful for your research.

29. Think technically: Always think technically. If anything happens, then search for its reasons, its benefits, and its demerits.

30. Think and then print: When you go to print your paper, check that tables are not split, headings are not detached from their descriptions, and the page sequence is maintained.

31. Adding unnecessary information: Do not add unnecessary information, like "I have used MS Excel to draw this graph." Do not add irrelevant and inappropriate material. All these create superfluity. Foreign terminology and phrases are not apropos. One should NEVER take a broad view. Analogy in writing is like feathers on a snake. Never use a large word when a very small one would be sufficient. Use words properly, regardless of how others use them. Remove quotations. Puns are for kids, not grown readers. Exaggeration is a billion times worse than understatement.

32. Never oversimplify everything: To add material to your research paper, never go for oversimplification; this will definitely irritate the evaluator. Be more or less specific. Also too, by no means, ever use rhythmic redundancies. Contractions aren't essential and shouldn't be used. Comparisons are as terrible as clichés. Give up ampersands and abbreviations, and so on. Remove commas, that are, not necessary.


Parenthetical words however should be enclosed in commas. Understatement is always the very best way to put forward earth-shaking thoughts. Give a detailed literature review.

33. Report concluded results: Use concluded results. From the raw data, filter the results, and then conclude your studies based on the measurements and observations taken. Significant figures and an appropriate number of decimal places should be used. Parenthetical remarks are prohibited. Proofread carefully at the final stage. At the end, give an outline of your arguments and point out perspectives for further study of the subject. Justify your conclusions by backing them up with sufficient justification and examples.

34. After the conclusion: Once you have concluded your research, the next most important step is to present your findings. Presentation is extremely important, as it is the definite medium through which your research is going to reach the rest of the world. Care should be taken to organize your thoughts well and present them in a logical and neat manner. A good quality research paper format is essential, because it serves to highlight your research paper and bring to light all the necessary aspects of your research.

INFORMAL GUIDELINES OF RESEARCH PAPER WRITING

Key points to remember:

 Submit all work in its final form.
 Write your paper in the form presented in the guidelines, using the template.
 Please note the criteria for grading the final paper by peer reviewers.

Final points:

The purpose of organizing a research paper is to allow people to interpret your work selectively. The journal requires the following sections, submitted in the order listed, with each section starting on a new page. The introduction will be compiled from reference matter and will reflect the design processes, or the outline of reasoning, that directed you to undertake the study. The methods and procedures section will be constructed to match the process of the study as you carried it out. The results segment will present the related statistics in roughly sequential order and will lead the reviewers along the same intellectual paths through the data that you took in carrying out your study. The discussion section will provide an interpretation of the data and projections as to the implications of the results. The use of good quality references throughout the paper will lend the work trustworthiness by demonstrating an awareness of prior work.

Writing a research paper is not an easy job, no matter how trouble-free the actual research or concept. Practice, excellent preparation, and careful record keeping are the only means of making the process straightforward.

General style:

Specific editorial requirements for compliance of a manuscript will always supersede directions in these general guidelines.

To make a paper clear:

· Adhere to recommended page limits

Mistakes to avoid:

 Inserting a title at the foot of a page with the subsequent text on the next page
 Separating a table/chart or figure - confine each figure/table to a single page
 Submitting a manuscript with pages out of sequence



In every section of your document:

· Use standard writing style, including articles ("a", "the", etc.)

· Keep focused on the research topic of the paper

· Use paragraphs to separate each significant point (except in the abstract)

· Indent the first line of each paragraph

· Present your points in a sound order

· Use the present tense to report well-accepted facts

· Use the past tense to describe specific results

· Avoid informal wording, don't address the reviewer directly, and don't use slang or colloquial language, or superlatives

· Avoid the use of extra pictures - include only those figures essential to presenting the results

Title Page:

Choose a revealing title. It should be short. It should not have non-standard acronyms or abbreviations. It should not exceed two printed lines. It should include the name(s) and address(es) of all authors.

Abstract:

The summary should be two hundred words or less. It should briefly and clearly explain the key findings reported in the manuscript and must contain precise statistics. It should not contain non-standard acronyms or abbreviations. It should be logical in itself. Shun citing references at this point.
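A quick way to verify the two-hundred-word limit is to count words mechanically; the tiny Python sketch below is our own illustration, not a journal requirement.

    # Illustrative only: check an abstract against the 200-word limit.
    abstract = "..."  # paste the abstract text here
    word_count = len(abstract.split())
    status = "over" if word_count > 200 else "within"
    print(f"{word_count} words - {status} the 200-word limit")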

An abstract is a brief, distinct paragraph summarizing finished work or work in development. In a minute or less, a reviewer can learn the rationale behind the study, the general approach to the problem, relevant results, and significant conclusions or new questions. Write your summary when your paper is completed, because how can you write the summary of anything that is not yet written? Economy of words is essential in an abstract; yet, use complete sentences and do not sacrifice readability for brevity. You can keep it succinct by phrasing sentences so that they serve more than one purpose. The author can then go straight to summarizing the outcome. Sum up the study with the following elements in any summary, and try to keep the first two items to no more than one sentence each:

 Reason for the study - theory, overall issue, purpose
 Fundamental goal
 To-the-point depiction of the research
 Consequences, including definite statistics - if the consequences are quantitative in nature, report quantitative data; results of any numerical analysis should be reported
 Significant conclusions or questions that follow from the research

Approach:

 A single paragraph, and succinct
 As an outline of work done, it is always written in the past tense
 An abstract should stand on its own, and not refer to any other part of the paper, such as a figure or table


 Focus on summarizing results - limit background information to a sentence or two, if absolutely necessary
 What you report in an abstract must be consistent with what you report in the manuscript
 Correct spelling, clarity of sentences and phrases, and proper reporting of quantities (proper units, significant figures) are just as important in an abstract as they are anywhere else

Introduction: The Introduction should "introduce" the manuscript. The reviewer should be presented with sufficient background information to be able to understand and evaluate the purpose of your study without having to refer to other works. The basis for the study should be offered. Give the most important references, but do not attempt a comprehensive appraisal of the topic. In the introduction, describe the problem clearly. If the problem is not stated in a logical, reasonable way, the reviewer will have no interest in your results. Speak in general terms about the techniques used to address the problem, if needed, but do not present any particulars about the protocols here. The following approach can create a valuable beginning:

 Explain the value (significance) of the study
 Defend the model - why did you employ this particular system or method? What are its advantages? You might remark on its suitability from a theoretical point of view, as well as point out practical reasons for using it.
 Present a rationale. State your particular hypothesis(es) or aim(s), and describe the logic that led you to choose them.
 Very briefly explain the experimental design and how it accomplished the stated objectives.

Approach:

 Use the past tense, except when referring to established facts; after all, the manuscript will be submitted after the entire job is done.
 Organize your thoughts; make one key point with every paragraph. If you make the four points listed above, you will need a minimum of four paragraphs.
 Present background information only as needed in order to support a position. The reviewer does not want to read everything you know about a topic.
 State the hypothesis/purpose specifically - do not take a broad view.
 As always, pay attention to spelling, clarity and correctness of sentences and phrases.

Procedures (Methods and Materials):

This part is supposed to be the easiest to write if you have good skills. A well-written Procedures section allows a capable scientist to replicate your results. Present precise information about your materials. The suppliers and purity of reagents can be helpful bits of information. Present methods in sequential order, but related methodologies can be grouped into a segment. Be concise when describing the protocols. Aim for the minimum amount of information that would permit another capable scientist to reproduce your outcome, but be careful that vital information is included. The use of subheadings is suggested and ought to be synchronized with the results section. When a technique is used that has been well described elsewhere, cite the specific reference describing the method, but state the basic principle while describing the situation. The purpose is to document all particular materials and general procedures, so that another person may use some or all of the methods in another study, or judge the scientific merit of your work. It is not meant to be a step-by-step report of everything you did, nor is a methods section a set of instructions.

Materials:
 Describe materials individually only if the study is so complex that it saves space this way.
 Include particular materials, and any tools or supplies that are not commonly found in laboratories.
 Do not include commonly found ones.
 Specify if a definite type of equipment was used.
 Materials may be reported in a separate section, or else they may be identified along with your procedures.

Methods:

 Report the method (not the particulars of each procedure that employed the same methodology)



 Describe the method completely
 To be succinct, present methods under headings dedicated to specific procedures or groups of procedures
 Generalize - report how procedures were done, not how they were specifically performed on a particular day.
 If well-known procedures were used, report the procedure by name, possibly with a reference, and that's all.

Approach:

 It is awkward or impossible to use the active voice when documenting methods without using the first person, which would focus the reviewer's attention on the researcher rather than the work. As a result, when writing up the methods, most authors use the third person passive voice.
 Use normal prose in this and in every other part of the paper - avoid informal lists, and use complete sentences.

What to avoid:

 Materials and methods are not a set of instructions.
 Omit all explanatory information and background - save it for the discussion.
 Leave out information that is irrelevant to a third party.

Results:

The purpose of a results section is to present and illustrate your findings. Make this part a completely objective account of the outcome, and save all interpretation for the discussion.

The page length of this section is set by the amount and types of data to be reported. Continue to be to the point, using statistics and tables, if suitable, to present the results most efficiently.

You must clearly differentiate material that would normally be included in a research article from any raw data or other appendix material that would not be published. In fact, such material should not be submitted at all unless requested by the instructor.

Content:

 Summarize your findings in text and illustrate them, if suitable, with figures and tables.
 In the manuscript, explain each of your results, pointing the reader to the observations that are most relevant.
 Provide a context, such as by describing the question that was addressed by making a particular observation.
 Describe the results of control experiments and include observations that are not presented in a formal figure or table, if appropriate.
 Analyze your data, then prepare the analyzed (transformed) data in the form of a figure (graph), a table, or manuscript text.

What to avoid:
 Do not discuss or interpret your results, report background information, or try to explain anything.
 Never include raw data or intermediate calculations in a research manuscript.
 Do not present the same data more than once.
 The text should complement any figures or tables, not duplicate the same information.
 Never confuse figures with tables - there is a difference.

Approach:
 As always, use the past tense when you refer to your results, and put everything in a logical order.
 Put figures and tables, appropriately numbered, in order at the end of the report.
 If you prefer, you may place your figures and tables appropriately within the text of your results section.

Figures and tables:
 If you place figures and tables at the end of the report, make certain that they are clearly distinguished from any attached appendix materials, such as raw data.
 Regardless of position, each figure must be numbered consecutively and be complete with a caption.
 Regardless of position, each table must be titled, numbered consecutively, and be complete with a heading.
 Each figure and table must be sufficiently complete that it could stand on its own, separate from the text.



Discussion:

The Discussion is probably the trickiest section to write and describe. Many papers submitted to journals are rejected because of problems with the Discussion. There is no rule for how long a discussion should be. Present your interpretation of the results clearly, to lead the reviewer through your conclusions, and then finish the paper with a summing-up of the implications of the study. The purpose here is to provide an interpretation of your results and support for all of your conclusions, using evidence from your research and generally accepted knowledge, if suitable. The implications of the results should be clearly described. Interpret your data in the discussion in suitable depth. This means that when you explain an observed phenomenon, you must explain the mechanisms that may account for the observation. If your results differ from your expectations, make clear why that may have happened. If your results agree, then explain the theory that the evidence supported. It is never suitable simply to state that the data agreed with expectations and let it drop at that.

 Decide whether each hypothesis is supported or rejected, or whether you cannot reach a conclusion with confidence. Do not simply dismiss a study or part of a study as "inconclusive."
 Research papers are not accepted if the work is incomplete. Draw what conclusions you can based upon the results you have, and treat the study as a finished work.
 You may suggest future directions, such as how the experiment might be modified to accomplish a new idea.
 Explain all of your observations as much as possible, focusing on mechanisms.
 Decide whether the experimental design adequately addressed the hypothesis, and whether or not it was properly controlled.
 Try to offer alternative explanations if sensible alternatives exist.
 One piece of research will not answer an overall question, so keep the big picture in mind: where do you go next? The best studies open new avenues of study. What questions remain?
 Recommendations for further work will offer additional suggestions.

Approach:
 When you refer to information, distinguish data generated by your own studies from published information.
 Refer to work done by specific persons (including yourself) in the past tense.
 Refer to generally accepted facts and principles in the present tense.

ADMINISTRATION RULES LISTED BEFORE SUBMITTING YOUR RESEARCH PAPER TO GLOBAL JOURNALS

Please carefully note the following rules and regulations before submitting your research paper to Global Journals:

Segment Draft and Final Research Paper: You have to strictly follow the template of the research paper; if this is not done, your paper may be rejected.

 The major constraint is that you must independently produce all the content, tables, graphs, and facts that are offered in the paper. You must write each part of the paper wholly on your own. The peer reviewers need to identify your own understanding of the concepts in your own terms. NEVER extract straight from any source, and never rephrase someone else's analysis.
 Do not give permission to anyone else to "PROOFREAD" your manuscript.

 Plagiarism-detection methods are applied by us to every paper. If found guilty, you will be blacklisted by all of our collaborating research groups, your institution will be informed of this, and strict legal action will be taken immediately.
 To guard yourself and others from possible illegal use, please do not permit anyone access to your paper and files.



CRITERION FOR GRADING A RESEARCH PAPER (COMPILATION) BY GLOBAL JOURNALS

Please note that the following table grades only the "Paper Compilation" and not the "Performed/Stated Research", whose grading depends solely on the Individually Assigned Peer Reviewer and Editorial Board Member. These grades are available only on request and after the decision on the paper. This report will remain the property of Global Journals.

Topics and grades (A-B, C-D, E-F):

Abstract
A-B: Clear and concise, with appropriate content and correct format; 200 words or below.
C-D: Unclear summary and no specific data; incorrect form; above 200 words.
E-F: No specific data, with ambiguous information; above 250 words.

Introduction
A-B: Contains all background details with clear goal and appropriate details, flow and specification; no grammar or spelling mistakes; well-organized sentences and paragraphs; references cited.
C-D: Unclear and confusing data; appropriate format, but grammar and spelling errors and unorganized matter.
E-F: Out-of-place depth and content; hazy format.

Methods and Procedures
A-B: Clear and to the point, with well-arranged paragraphs, precision and accuracy of facts and figures, and well-organized subheads.
C-D: Difficult to comprehend, with embarrassed text; too much explanation, but complete.
E-F: Incorrect and unorganized structure with hazy meaning.

Result
A-B: Well organized, clear and specific; correct units with precision and correct data; well-structured paragraphs; no grammar or spelling mistakes.
C-D: Complete but embarrassed text, difficult to comprehend.
E-F: Irregular format with wrong facts and figures.

Discussion
A-B: Well organized, meaningful specification, sound conclusion, logical and concise explanation, highly structured paragraphs, references cited.
C-D: Wordy, unclear conclusion, spurious.
E-F: Conclusion not cited; unorganized; difficult to comprehend.

References
A-B: Complete and correct format, well organized.
C-D: Beside the point, incomplete.
E-F: Wrong format and structuring.

