The Early View of
"Global Journal of Computer Science and Technology"

In case of any minor update, modification, or correction, kindly inform us within 3 working days of receiving this issue. Kindly note that research papers may be removed, added, or altered according to their final status.
© Global Journal of Computer Science and Technology. 2010. All rights reserved.
This is a special issue published in version 1.0 of "Global Journal of Computer Science and Technology." All articles are open access articles distributed under the "Global Journal of Computer Science and Technology" Reading License, which permits restricted use. Entire contents are copyright of "Global Journal of Computer Science and Technology" unless otherwise noted on specific articles.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission.

The opinions and statements made in this book are those of the authors concerned. Ultraculture has not verified and neither confirms nor denies any of the foregoing, and no warranty or fitness is implied. Engage with the contents herein at your own risk.

The use of this journal, and the terms and conditions for our providing information, is governed by our Disclaimer, Terms and Conditions and Privacy Policy given on our website http://www.globaljournals.org/global-journals-research-portal/guideline/terms-and-conditions/menu-id-260/. By referring to, using, reading, or any other type of association with or referencing of this journal, you acknowledge that you have read them and that you accept and will be bound by the terms thereof.

All information, journals, this journal, activities undertaken, materials, services, our website, terms and conditions, and privacy policy are subject to change anytime without any prior notice.

License No.: 42125/022010/1186
Registration No.: 430374
Import-Export Code: 1109007027

Publisher's correspondence office: Global Journals, Headquarters Corporate Office, United States
Offset Typesetting: Global Journals, City Center Office, United States
Packaging & Continental Dispatching: Global Journals, India

Find a correspondence nodal officer near you: to find the nodal officer of your country, please email us at [email protected]

eContacts:
Press Inquiries: [email protected]
Investor Inquiries: [email protected]
Technical Support: [email protected]
Media & Releases: [email protected]

Pricing (including by air parcel charges):
For Authors: 22 USD (B/W) & 50 USD (Color)
Yearly Subscription (Personal & Institutional): 200 USD (B/W) & 500 USD (Color)

Editorial Board Members

John A. Hamilton, "Drew" Jr.
Ph.D., Professor, Management; Computer Science and Software Engineering; Director, Information Assurance Laboratory, Auburn University

Dr. Wenying Feng
Professor, Department of Computing & Information Systems; Department of Mathematics, Trent University, Peterborough, ON Canada K9J 7B8

Dr. Henry Hexmoor
IEEE senior member since 2004; Ph.D. Computer Science, University at Buffalo; Department of Computer Science, Southern Illinois University at Carbondale

Dr. Thomas Wischgoll
Computer Science and Engineering, Wright State University, Dayton, Ohio; B.S., M.S., Ph.D. (University of Kaiserslautern)

Dr. Osman Balci, Professor
Department of Computer Science, Virginia Tech, Virginia; Ph.D. and M.S., Syracuse University, Syracuse, New York; M.S. and B.S., Bogazici University, Istanbul, Turkey

Dr. Abdurrahman Arslanyilmaz
Computer Science & Information Systems Department, Youngstown State University; Ph.D., Texas A&M University; University of Missouri, Columbia; Gazi University, Turkey

Yogita Bajpai
M.Sc. (Computer Science), FICCT, U.S.A.
Email: [email protected]

Dr. Xiaohong He
Professor of International Business, University of Quinnipiac; BS, Jilin Institute of Technology; MA, MS, PhD (University of Texas-Dallas)

Dr. T. David A. Forbes
Associate Professor and Range Nutritionist; Ph.D. Edinburgh University (Animal Nutrition); M.S. Aberdeen University (Animal Nutrition); B.A. University of Dublin (Zoology)

Burcin Becerik-Gerber
University of Southern California; Ph.D. in Civil Engineering; DDes from Harvard University; M.S. from University of California, Berkeley, and Istanbul University

Dr. Bart Lambrecht
Director of Research in Accounting and Finance, Professor of Finance, Lancaster University Management School; BA (Antwerp); MPhil, MA, PhD (Cambridge)

Dr. Söhnke M. Bartram
Department of Accounting and Finance, Lancaster University Management School; Ph.D. (WHU Koblenz); MBA/BBA (University of Saarbrücken)

Dr. Carlos García Pont
Associate Professor of Marketing, IESE Business School, University of Navarra; Doctor of Philosophy (Management), Massachusetts Institute of Technology (MIT); Master in Business Administration, IESE, University of Navarra; Degree in Industrial Engineering, Universitat Politècnica de Catalunya

Dr. Miguel Angel Ariño
Professor of Decision Sciences, IESE Business School, Barcelona, Spain (Universidad de Navarra); CEIBS (China Europe International Business School), Beijing, Shanghai and Shenzhen; Ph.D. in Mathematics, University of Barcelona; BA in Mathematics (Licenciatura), University of Barcelona

Dr. Fotini Labropulu
Mathematics, Luther College, University of Regina; Ph.D., M.Sc. in Mathematics; B.A. (Honors) in Mathematics, University of Windsor

Philip G. Moscoso
Technology and Operations Management, IESE Business School, University of Navarra; Ph.D. in Industrial Engineering and Management, ETH Zurich; M.Sc. in Chemical Engineering, ETH Zurich

Dr. Lynn Lim
Reader in Business and Marketing, Roehampton University, London; BCom, PGDip, MBA (Distinction), PhD, FHEA

Dr. Sanjay Dixit, M.D.
Director, EP Laboratories, Philadelphia VA Medical Center; Cardiovascular Medicine, Cardiac Arrhythmia, Univ of Penn School of Medicine

Dr. Mihaly Mezei
Associate Professor, Department of Structural and Chemical Biology, Mount Sinai School of Medical Center; Ph.D., Etvs Lornd University; Postdoctoral Training, New York University

Dr. Han-Xiang Deng
MD., Ph.D; Associate Professor and Research Department, Division of Neuromuscular Medicine, Davee Department of Neurology and Clinical Neurosciences, Northwestern University Feinberg School of Medicine

Dr. Pina C. Sanelli
Associate Professor of Public Health, Weill Cornell Medical College; Associate Professor of Radiology; Associate Attending Radiologist, NewYork-Presbyterian Hospital; MRI, MRA, CT, and CTA; Neuroradiology and Diagnostic Radiology; M.D., State University of New York at Buffalo, School of Medicine and Biomedical Sciences

Dr. Michael R. Rudnick
M.D., FACP; Associate Professor of Medicine; Chief, Renal Electrolyte and Hypertension Division (PMC), Penn Medicine, University of Pennsylvania; Presbyterian Medical Center, Philadelphia; Nephrology and Internal Medicine; Certified by the American Board of Internal Medicine

Dr. Roberto Sanchez
Associate Professor, Department of Structural and Chemical Biology, Mount Sinai School of Medicine; Ph.D., The Rockefeller University

Dr. Bassey Benjamine Esu
B.Sc. Marketing; MBA Marketing; Ph.D Marketing; Lecturer, Department of Marketing, University of Calabar; Tourism Consultant, Cross River State Tourism Development Department; Co-ordinator, Sustainable Tourism Initiative, Calabar, Nigeria

Dr. Wen-Yih Sun
Professor of Earth and Atmospheric Sciences, Purdue University; Director, National Center for Typhoon and Flooding Research, Taiwan; University Chair Professor, Department of Atmospheric Sciences, National Central University, Chung-Li, Taiwan; University Chair Professor, Institute of Environmental Engineering, National Chiao Tung University, Hsin-chu, Taiwan; Ph.D., MS, The University of Chicago, Geophysical Sciences; BS, National Taiwan University, Atmospheric Sciences

Dr. Aziz M. Barbar, Ph.D.
IEEE Senior Member; Chairperson, Department of Computer Science, AUST - American University of Science & Technology, Alfred Naccash Avenue, Ashrafieh

Dr. R.K. Dixit (HON.)
M.Sc., Ph.D., FICCT; Chief Author, India
Email: [email protected]

Vivek Dubey (HON.)
MS (Industrial Engineering), MS (Mechanical Engineering), University of Wisconsin; FICCT; Editor-in-Chief, USA
[email protected]

Er. Suyog Dixit
BE (HONS. in Computer Science), FICCT; SAP Certified Consultant; Technical Dean, India
Website: www.suyogdixit.com
Email: [email protected], [email protected]

Sangita Dixit
M.Sc., FICCT; Dean and Publisher, India
[email protected]

i. Copyright Notice
ii. Editorial Board Members
iii. Chief Author and Dean
iv. Table of Contents
v. From the Chief Editor's Desk
vi. Research and Review Papers
1. Load Balanced Clusters for Efficient Mobile Computing 2-6
2. Corporate Data Obesity: 50 Percent Redundant 7-11
3. Web Mining: A Key Enabler for Distance Education 12-13
4. Optimized Remote Network Using Specified Factors As Key Performance Indices 14-17
5. Analysis of the Routing Protocols in Real Time Transmission: A Comparative Study 18-22
6. An Empirical Study on Data Mining Applications 23-27
7. A Novel Decision Scheme for Vertical Handoff in 4G Wireless Networks 28-33
8. Hybrid Approach for Template Protection in Face Recognition System 34-38
9. QRS Wave Detection Using Multiresolution Analysis 39-42
10. A Review on Data Clustering Algorithms for Mixed Data 43-48
11. Optimization Of Shop Floor Operations: Application Of Mrp And Lean Manufacturing Principles 49-54
12. A Study On Rough Clustering 55-58
13. Applying Software Metrics on Web Applications 59-63
14. Measuring Helpfulness of Personal Decision Aid Design Model 64-80
15. Security Provision For Miners Data Using Singular Value Decomposition In Privacy Preserving Data Mining 81-84
16. An Efficient Synchronous Checkpointing Protocol for Mobile Distributed Systems 85-89
17. A Fuzzy Co-Clustering Approach for Clickstream Data Pattern 90-95
18. A Survey on Topology for Bluetooth Based Personal Area Networks 96-101
19. Identification of Most Desirable Parameters in SIGN Language Tools: A Comparative Study 102-108
20. A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm For Mobile Distributed System 109-115
21. The Establishment of an AR-based Interactive Digital Artworks 116-120
22. AUTOCLUS: A Proposed Automated Cluster Generation Algorithm 121-123
23. Implementing Search Engine Optimization Technique to Dynamic/Model View Controller Web Application 124-132
24. Eye Detection in Video Images with Complex Background 133-136
25. Multi-Layer User Authentication Approach For Electronic Business Using Biometrics 137-141
26. Cloud Computing – A Paradigm Shift 142-146
27. On Security Log Management Systems 147-157
28. A Transformation Scheme for Deriving Symmetric Watermarking Technique into Asymmetric Version 158-162

vii. Auxiliary Memberships
viii. Process of Submission of Research Paper
ix. Preferred Author Guidelines
x. Index

Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver. 1.0 July 2010 P a g e | 1
We see drastic momentum everywhere, in all fields, nowadays, which in turn urges everyone to excel in every possible way. The need of the hour is to pick the right key at the right time, with all extras. Citing computer versions, automobile models, infrastructures, etc.: these are not the result of preplanning but of the implementation of planning.

With these, we are constantly seeking to establish more formal links with researchers, scientists, engineers, specialists, technical experts, etc., associations, or other entities, particularly those who are active in the field of research, articles, research papers, etc., by inviting them to become affiliated with the Global Journals.

This Global Journal is like a banyan tree whose branches are many, and each branch acts like a strong root itself. Our intentions are very clear: to do our best in every possible way, with all care.

Dr. R. K. Dixit
Chief Author
[email protected]
Load Balanced Clusters for Efficient Mobile Computing
Dr. P.K. Suri¹ and Kavita Taneja²
Abstract- Mobile computing is distributed computing that involves components whose position changes dynamically during computation. It bestows a new paradigm of mobile ad hoc networks (MANETs) for organizing and implementing computation on the fly. A MANET is characterized by the flexibility to be deployed and functional in "on-demand" situations, combined with the capability to ship a wide spectrum of applications and the buoyancy to dynamically repair around broken links. The underlying issue is routing in such a dynamic topology. Numerous studies have shown the difficulty for a routing protocol to scale to large MANETs. For this, such a network relies on a combination of storing some information about the position of the Mobile Unit (MU) at selected sites and on forming some form of clustering. But the centralized Clusterhead (CH) can become a bottleneck and possibly lead to lower throughput for the MANET. We propose a mechanism in which communication outside the cluster is distributed through separate CHs. We prove that the overall averaged throughput increases by using distinct CHs for each neighboring cluster, although the increase in throughput reduces beyond one level of traffic rates due to the overhead induced by "many" CHs.

About-1: Professor, Deptt. of Comp. Science & Applications, Kurukshetra University, Kurukshetra, Haryana, India (e-mail: [email protected])
About-2: Asstt. Prof., M.M. Inst. of Comp. Tech. & B. Mgmt., M.M. University, Mullana, Haryana, India (e-mail: [email protected])

I. MOBILE COMPUTING: VISION AND CHALLENGES

Mobility originates from a desire to move toward a resource or to move away from scarcity; in rare cases it may be just a nomadic move. Wireless mobile computing faces additional constraints induced by wireless communications and by the demand for anytime-anywhere communication towards the vision of ubiquitous or pervasive computing. It is accepted that the new parameters in mobile computing [1] are the mobility of elements, the limited resources of the Mobile Units (MUs) and the limited wireless bandwidth. "Mobility" and "position" have a more significant effect on the development of middleware, simulators and services for the MU than the other parameters. These characteristics can be viewed in a hierarchical fashion where the basic elements influence higher, more complicated systems. The mobile computing challenges on the one hand irrevocably handicap the existing infrastructure in effectively supporting the exponentially rising demands, and on the other hand open new avenues and opportunities for Mobile Ad Hoc Networks (MANETs). In general, such solutions rely on a combination of storing some information about the position of the MU at selected sites and on forming some form of clustering. The MUs are grouped in distinct or overlapping clusters for the purpose of routing, and within the cluster MUs are in touch directly. However, MUs communicate outside the cluster through a centralized MU that is called the Clusterhead (CH). The CH is elected to be part of the backbone of the MANET system and is assigned for communication with all other clusters [2, 3, 4]. This provides a hierarchical MANET system which assists in making the routing scalable. CHs are elected according to several techniques. The CH allows for minimizing routing-details overhead from the other MUs within the cluster. Overlapping clusters might have MUs that are common among them, which are called gateways [5]. A MANET requires an efficient routing algorithm in order to reduce the amount of signaling introduced by maintaining valid routes, and therefore to enhance the overall performance of the MANET system [6, 7]. As the CH is the central MU of routing for packets destined outside the cluster in the distinct clustering configuration, the CH computing machine pays a penalty of unfair resource utilization such as battery, CPU, and memory [8]. Several studies [9, 10, 11] have proposed CH elections in order to distribute the load among multiple hosts in the cluster. Our approach extends the same concept of load balancing to the CHs too. Section 2 discusses the related work and outlines major challenges in clustering in MANETs, Section 3 discusses the multi-CH approach, Section 4 presents the system model, Section 5 discusses the numerical results obtained, and finally the paper is concluded with future scope in Section 6.

II. RELATED WORK

Several mechanisms of CH election exist with the objective of endowing efficient mobile computing in terms of stable routing in the MANET system [12, 13]. Some mechanisms favor not changing the CH, to reduce the signaling overhead involved in the process, which also makes the elected MU's usage of its own resources higher [14]. Another mechanism assigns the CH based on the highest MU ID, as in the Linked Cluster Algorithm (LCA) [15]. However, this selection process burdens the MU due to its ID: the CH can become a bottleneck and lead to propagating congestion. One option is to elect a CH for a defined duration, so that all MUs in turn have a chance to be the CH [3]. This mechanism keeps the CH load within one MU for the CH duration budget, while providing a balance of responsibilities for MUs within the cluster. However, an MU with a high mobility rate may not get the chance to become a CH if its mobility rate is higher than the duration of CH rotation, and the transitions and the duration budget contribute greatly to overhead. Mobility is one of the most important challenges of MANETs, and it is the main factor that would change
network topology. A good elected CH does not move very quickly, because when the clusterhead changes fast, MUs may move out of a cluster and join another existing cluster, thus reducing the stability of the network. Hence, CH election mechanisms consider relative MU mobility to ensure routing path availability [16, 17], however causing added signaling overload and causing the elected CH to pay a higher resource utilization penalty. We can conclude from the existing research that several tradeoffs exist for the elected CH and the other cluster MUs. Firstly, the CH has to bear higher resource utilization, such as power, which may deplete its battery sooner than the other MUs in the cluster, in addition to possibly causing more delay for its own application routing due to the competition with the routing for other MUs. Secondly, despite fair-share responsibility for the CH role, it is possible that a heavy burst of traffic takes place, causing some CHs to use maximum resources while others encounter low traffic bursts resulting in minimum resource use. Thirdly, the fair-share or load balancing technique [3] might result in a CH that does not provide the optimal path for routing, or even a link breakage. In addition, non-CH MUs are privileged, as they do not pay a routing penalty and have their resources dedicated to their own usage only. Therefore, there is no one common CH election mechanism that is best for MANET systems without some hurting tradeoffs. The Zone Routing Protocol (ZRP) [18] provides a hybrid approach between proactive routing, which produces added routing control messages in the network due to keeping routes up to date, and reactive routing, which adds delays due to path discovery and floods the network for route determination. ZRP divides the network into overlapping zones, while clustering can have distinct, non-overlapping clusters. In ZRP, proactive routing is used within the zone and reactive routing is used outside the zone, instead of using one type of routing for the whole network. In addition, [18, 19] suggest that the hybrid approach is suited for large networks and enhances system efficiency, but adds more complexity. Each MU has a routing zone within a radius of n hops. All MUs at exactly n hops are called peripheral MUs, and the ones at less than n hops are called interior MUs. This process is repeated for all MUs in the network. A lookup in the MU's routing table helps in deciding if the destination MU is within the zone, resulting in proactive routing. Otherwise, the destination is outside the zone, and reactive routing is used, which triggers a routing request. As a result of a routing response, one of the peripheral MUs will be used as an exit route from the zone toward the destination. In contrast, if clustering is applied, the same elected CH is used for routing outside the cluster without triggering any route discovery to the destination. As discussed above, the main focus of the existing work is on the election of a single CH for a cluster. Even though this minimizes the overall signaling overhead in the cluster, it can make the central CH a bottleneck.

A. Challenges and Issues in Clustering

Despite its tremendous potential and numerous advantages, a MANET poses various challenges to the research community. This section briefly summarizes some of the major challenges faced while clustering in such networks [12-15].

B. Heterogeneous Network

In most cases a MANET is heterogeneous, consisting of MUs with different energy levels. Some MUs are less energy constrained than others; usually the fraction of MUs which are less energy constrained is small. In such a scenario, the less energy-constrained MUs are chosen as the CHs and the energy-constrained MUs are the member MUs of the cluster. The problem arises when the network is deployed randomly and all cluster heads are concentrated in some particular part of the network, resulting in unbalanced cluster formation and also making some portion of the network unreachable. Also, even if the resulting distribution of the CHs is uniform, when we use multi-hop communication the MUs which are close to a CH are under a heavy load, as all the traffic routed from different areas of the network to the CH passes via the neighbours of the CH. This will cause rapid extinction of the MUs in the neighborhood of the CHs, resulting in gaps near the CHs, decreasing the network size and increasing the network energy consumption. Heterogeneous MANETs require careful management of the clusters in order to avoid the problems resulting from unbalanced CH distribution, as well as to ensure that the energy consumption across the network is uniform.

C. Network Scalability

In a MANET, new MUs come into the vicinity of the current network. The clustering scheme should be able to adapt to changes in the topology of the network. The key point in designing cluster management schemes is that if the algorithm is local and dynamic, it will easily adapt to topology changes.

D. Uniform Energy Consumption

Clustering schemes should ensure that energy dissipation across the network is balanced and that the CH is rotated in order to balance the network energy consumption.

E. Multihop or Single Hop Communication

The communication model that a MANET uses is multi-hop. Since energy consumption in wireless systems is directly proportional to the square of the distance, most routing algorithms use the multi-hop communication model, since it is more energy efficient. However, with multi-hop communication the MUs which are closer to the CH are under heavy traffic and can create gaps near the CH when their energy terminates.

F. Cluster Dynamics

Cluster dynamics means how the different parameters of the cluster are determined, for example the number of clusters in a particular network. In some cases the number is pre-assigned and in some cases it is dynamic. The CH performs the function of compression as well as transmission of data. The distance between the CHs is a major issue; it can be dynamic or can be set in accordance with some minimum value. In the dynamic case, there is a possibility of forming unbalanced clusters, while limiting it by some pre-assigned minimum distance can be effective in some cases, but this is an open research issue. Also, CH selection can be either centralized or decentralized, and both have advantages and disadvantages. The number of clusters might be fixed or dynamic. A fixed number of clusters causes less overhead, in that the network will not have to repeatedly go through the setup phase in which clusters are formed, but in terms of scalability it is poor.

III. MULTI-CH APPROACH

The existing clustering approach encourages the election of one CH [20, 21]. The proposed work enhances the architecture to use multiple CHs and distributes the load of the single CH amongst multiple CHs in the same cluster. The proposed mechanism does not mandate a specific CH election process; any of the prior work [9, 10] can be used to select the CHs for a cluster. By distributing the load, a single CH does not have to bear all the added responsibility of being the central point for routing in a cluster. Therefore, we believe this approach provides a fairer solution for sharing inter-cluster routing responsibilities within a cluster. In addition, other mechanisms can be applied to switch the responsibility of a CH to another MU, such as in [3]. In the case of one CH per cluster, a link breakage caused by the failure of the CH isolates all cluster MUs from communicating to/from outside the cluster. Our approach reduces the link breakage to only the direction towards a path where the failed CH forwards the data. Therefore, the reliability of routing in the MANET system is increased. We explore certain benefits of having multiple sinks in the network as follows:

Energy efficiency: In a MANET, long routing path lengths from MUs located at the cluster borders to the CH are observed. Adding an extra CH to the cluster decreases the average path length between an MU and the CH due to the shorter geographic distance between them. Therefore, the number of hops that a packet has to travel to reach a CH gets smaller. Since each traveled hop means the data packet consumes some energy at the visiting MU, traveling fewer hops results in consuming less energy.

Avoiding congestion near a CH: Using multiple CHs can also relieve the traffic congestion problem associated with a single-CH system.

Avoiding a single point of failure: A single CH is not robust against failure of the CH or of the MUs around the CH. Multiple CHs are therefore more resilient to MU failures. However, deploying many CHs does not solve the problem directly and evenly; it is essential to distribute the cluster load among the CHs and to choose optimal route(s) between an MU and the corresponding CH.

IV. SYSTEM MODEL

We have used the glomosim [22] simulator, running IEEE 802.11, to prove our contribution. Our MANET system consists of four distinct non-overlapping clusters within a physical terrain of 1500 meters by 1500 meters, as shown in Fig. 1. For the same cluster, we ran simulation experiments with one CH and compared its performance results with tests using 3 CHs. Each CH has an independent queue for the packets destined for the neighboring clusters for which that particular CH is meant. During the simulation, we maintained the same CHs in both cases (single and multiple CHs), since changing the CH was irrelevant to what we are proving. Our traffic consists of Constant Bit Rate (CBR) and File Transfer Protocol (FTP) traffic. The same traffic load was run for both cases (single and 3 CHs). The selected traffic load was chosen based on tests that allowed sufficient utilization of the channel.

Fig. 1. Multi-CH Simulation Setup (four non-overlapping clusters, Clusters 1-4, with their CHs)

In this model Cluster 4 operates as a cluster both with one CH and with many CHs; the remaining clusters operate with one CH. This work can be expanded by incrementing the number of CHs in a cluster such that it has one CH per neighboring cluster. Our traffic included FTP traffic generated between MUs in all clusters in the MANET system; the FTP sessions were established in both directions. In addition, CBR traffic was generated in both directions between MUs in cluster 4 and clusters 1 and 2. In order to focus on the objective of distributing the CH load, we set up static routes in our MANET system. Routing from cluster 4 to cluster 2 was done via the intermediate cluster 1/cluster 3, and vice versa. Therefore, since there are 3 neighboring clusters to cluster 4, the system allowed for the use of 3 CHs, one for routing to/from each neighboring cluster.

V. NUMERICAL RESULTS

Our simulation focused on the cumulative averaged throughput and response time. Fig. 2 shows the percentage of increase in throughput when running multiple CHs over using one CH. In all cases, the throughput increased for the multiple-CHs case. For the small simulation time of 1000 s
Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver. 1.0 July 2010 P a g e | 5 and with the traffic load used, the increase was only about 7000S. The tests were run with one CH and multiple CHs 18% since the system was lightly loaded as a result of a for cluster 4. The throughput results are presented in Fig. 3. short simulation time. Therefore, one CH operated well The results show the percentage of increase in the averaged since the channel was not well utilized. Our peak results cumulative throughput for running multiple CHs over one show that at 7000S of simulation time, we reached a CH. We ran test at 4 traffic rates: High, medium (half of the maximum throughput improvement as this case indicates the high), low traffic rate (half of the medium) and at much channel utilization was at its optimal condition. Therefore, lower traffic rate than the low traffic for the longer simulation times, beyond what we concluded as optimal, the throughput decreased due to the added traffic on the channel. 120
100
80
60 40
Increase in Throughput(%) Increase 20
Fig. 2. Run Length (sec) vs. Throughput Improvement (%)
Fig. 3. Throughput Improvement (%) vs. Traffic Rates

The optimal case of 7000 s run length proves the advantage of distributing the load to multiple CHs: we gained about 101% improvement in throughput. Our results are explained by the simple queuing theory model:

ρ = λ / μ (1)

where ρ is the traffic intensity, λ is the traffic arrival rate, and μ is the service rate at each CH with queue length QLI(k, l), with k the number of packets and l the number of CHs per cluster. Eq. 1 indicates that ρ increases if λ increases while μ remains at the same rate. In addition, the overall averaged cumulative response time increases if a constant service rate is maintained while the traffic arrival rate increases. Our simulation showed that the response time remained constant when using one single CH and when using multiple CHs, with utilization of about 0.5.

The traffic rate in the system is given by the Box-Muller transformation (Eq. 2), with σ = 1, μ = 0, and rand1, rand2 as samples from U(0, 1):

s = (−2 log(rand1))^(1/2) · cos(2π · rand2) (2)

The traffic rate is increased, as indicated by the throughput increase due to the multiple CHs, while the same response time is maintained. Normally, if the arrival rate increases while the service rate stays the same, the response time should increase accordingly. Therefore, we can conclude that, by maintaining the same response time, the added traffic rate due to an increase in service rate results in constant system utilization. In our topology, we increased the number of CHs to 3, yet our throughput only about doubled, as shown in Fig. 2. With the work distributed over 3 CHs, and with the same averaged delay for the MANET system, we should expect a 3-fold increase in throughput, since the service rate has tripled. However, we only gained double the throughput, due to the cumulative increase in overall overhead caused by the added traffic rate and by having multiple queues, one for each CH. In addition, as the traffic arrival rate increased due to having the 3 CHs, the service rate also increased, resulting in the same utilization rate for the MANET system.

We ran an additional test to validate the traffic rate at our selected simulation time, using a rate which we called very low rate traffic. We noticed, as shown in Fig. 3, that the percentage of throughput improvement for the very low rate was only nearly 50%. This is attributed to the low channel utilization at the low traffic rate. At the high traffic rate we observed a reduced improvement in throughput, due to traffic overload and multi-queue overhead in the MANET system. This traffic overload was created by the higher arrival rate due to the added sessions. However, at the medium traffic rate, we obtained about the same level of throughput improvement as at our optimal selected rate. We conclude that at these rates we obtained system stability with the offered traffic and service rates with many CHs. Therefore, the results shown in Fig. 3 validate the traffic rates selected for our results above.

VI. CONCLUSIONS AND FUTURE WORK

Our contribution shows that one CH per cluster does not provide maximized throughput for the MANET system, due to the added responsibility placed on that one CH. Using multiple CHs (each with an independent queue) per cluster distributes the load among multiple MUs, which enables simultaneous and shared responsibility for inter-cluster routing among multiple MUs. It is an interesting finding that the increase in throughput due to the added CHs is proportional to the number of CHs, at best with that number equal to the number of neighboring clusters. Depending on the topology and traffic pattern, if all CHs are simultaneously used to route traffic, the rate of throughput increase fails to be a multiple of the original single-CH throughput, owing to the overhead of maintaining multiple CHs in a cluster. Further research is suggested for the case where all clusters employ multiple CHs, one per neighboring cluster. Another expansion of the system model is to use one common queue and dispatch each packet to an idle CH irrespective of the neighboring cluster route. It is expected that the throughput will then increase at a very high rate, as MANET benefits from multi-hop communication, and minimizing the idle time of CHs will balance the overhead caused by their existence.

P a g e | 6 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology
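The Box-Muller sampling of Eq. 2 can be sketched as follows. This is an illustrative sketch only: the function name and defaults are ours, not the paper's simulator code, and the guard against log(0) is our addition.

```python
import math
import random

def box_muller_sample(mu=0.0, sigma=1.0):
    """One N(mu, sigma^2) sample via the Box-Muller transform of Eq. 2."""
    rand1 = 1.0 - random.random()  # uniform on (0, 1]; avoids log(0)
    rand2 = random.random()        # uniform on [0, 1)
    s = math.sqrt(-2.0 * math.log(rand1)) * math.cos(2.0 * math.pi * rand2)
    return mu + sigma * s

# With mu = 0 and sigma = 1, as in the paper, samples follow a standard normal.
samples = [box_muller_sample() for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
```

With 100,000 samples, the empirical mean and variance land very close to the nominal 0 and 1, which is one quick sanity check for a traffic generator built this way.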
Corporate Data Obesity: 50 Percent Redundant
Hae Kyung Rhee
Abstract- In this essay, we report what we have observed with regard to the status quo of corporate information systems in the real world, from our experiences of twenty years of data management practice. The situation is considered serious in that data are so conveniently and frequently replicated that information systems behave improperly in terms of their quality standards, including response time. The average ratio of data replication at a site is astonishingly judged to be more than 50 percent of a whole corporate database; in reality it is about 65 percent on average, to our knowledge. Presenting this paper to academia has been motivated by our strong belief, and evidence, that most of the redundancy can effectively and systemically be removed from the very start of information system development. We also noted that field workers, including database administrators in corporate environments, tend to mix the data part of an IS and the program part of an IS together from the start of IS design, and the popularity of this tendency eventually caused a lot of entanglement that could hardly be dealt with later by those workers themselves. We therefore present a couple of mandates that must be respected in order not to get involved in such a perplexity.
Keywords- Corporate Data Obesity, Data Redundancy, Enterprise Data Map.

About- Associate professor at Dept. of Computer Game & Information in Yong-In Songdam College (telephone: 82-31-330-9234, email: [email protected]).

I. CONCEPT OF OBESITY

It is not unusual to think that if a person weighs more than about 20 percent above what needs to be maintained for fitness, then he or she is considered overweight. This is what we understand with regard to the concept of obesity, and it is no different for data in the corporate environment. It is astounding to recognize that the degree of data obesity in corporates is far more than 20 percent: it is in fact 65 percent on average for the dozens of large enterprises we have observed in depth over the past twenty years. To be exact in terms of terminology, the unit of obesity we mean is the data attribute. For example, if there is customer data comprised of c-name and c-address, then c-name and c-address are the data attributes. So, in case c-name appears more than once in a corporate database, it is called redundant or replicated.

Although reports on data abundance in the corporate environment have been made in the literature, as far as we know only the issue of data deluge [Cukier2010, KaBoZe2010] has been dealt with, a couple of times, in order to emphasize the world-wide phenomenon of the rapid increase of data in terms of volume. The issue of data obesity is new in the world-wide communities of database research and management information systems research. In this sense, it is almost impossible to find any past work in the literature made with regard to this issue. Note that the concept of data obesity is essentially irrelevant to data volume. Although the introduction of upper-level data stores like data warehouses (DW) or data marts (DM), beyond the lower-level operational data stores (ODS), certainly contributes to the abundance of data in the corporate environment, DWs and DMs are out of scope in this essay. If we stick only to ODSs, we can observe that a lot of obesity is already there in the corporate environment.

Note that, in a fairly large corporate such as General Electric or Samsung Electronics, there are approximately 15,000-to-20,000 data attributes in the database. Notice also that the level of redundancy in data attributes is not exactly the same as the level of redundancy in data volume. However, to make it comparatively simple to get some idea of redundancy in terms of data volume, since a lot of people in field work prefer this way of understanding: when we happen to hear that the database size of some company is, for instance, 100 terabytes, it is legitimate or reasonable to think that the company in reality has a database of approximately 35-to-50 TB, so that 50-to-65 TB of data could be totally eliminated from the corporate database without harming the normal operation of the database at all. Redundancy demands a huge cost in terms of wasted storage and belatedness in response to database queries. Note that even 1 TB of data amounts to piling A4-size papers about 100 kilometers high.

Redundancy or replication gives some illusion that it could contribute to enhancement of response time, but on the other hand things can get messy if we consider the consistency of data. The quality of answers to data queries can always be in question, since making all the replica copies hold the same value usually takes a substantial amount of time, due to the non-automatic processes of such data value propagation. Manual propagation by considerate programming nevertheless incurs unforced human errors, and there is no guarantee of data consistency at all across a corporate database. Once an inconsistent value of data happens to be used to reply to queries, trust in the information system would collapse. The issue of mistrust would then raise the question of integrity with regard to the whole information system. Therefore, limiting the occasions of data replication to a minimum is necessary whenever possible. Unless the rate of data redundancy is substantially reduced, say to about 15 percent, by means of wary design from the outset of IS development, the data normalization theories [YuJa2008] that have been esteemed over almost the past thirty years turn out to be "useless" in the real world. To our knowledge
and experiences, they could contribute a mere 5 percent of data redundancy reduction. The other 45 percent of reduction comes well before any tabular form of data begins to emerge in the process of IS development, and that is where we start to lay out job descriptions, in non-technical terms. We will get back to this later in this essay, after a discussion of how people in the IT field are insensitive to the issue of redundancy.

II. UNNECESSARY REDUNDANCY

In an arena where data is represented in the form of a table, or relation in expert terminology, the concept of keys, like the primary key and foreign key, is technically inevitable. Basically, if a particular key of a table, say A, dubbed its primary key, is duplicated in another table, say B, as a part or component of the key of B, that key is denoted a foreign key in B, as it has been imported or borrowed from the other table, which is A. This clarifies that the origin of the key is A, not B. This way of designating and incorporating such externality of keys will bring an IS about 15 percent of intrinsically contained data redundancy, which is technically unavoidable if we stick to the tabular representation of data. This portion of redundancy can be called redundancy of necessity. So, if the data obesity ratio is said to be 65 percent, about 45 percent of the entire data is therefore classified as unnecessary or superfluous in nature. Whether to remove this much unnecessary redundancy or unwanted replication is up to the decision of an individual data manager, but unless their removal is done, the information system will definitely be hampered by lack of consistency and, further, by eventual slowness in response time. Note that, normally, of the database queries of any corporate, about half are update requests and the other half are retrieval requests. If this reality of the read-write ratio, i.e. 0.5, is ignored, we are soon tempted to allow data duplication by assuming that reads are much more frequent than writes, and subsequently a fatal disaster would be experienced sooner or later, due mainly to the data inconsistency dilemma.

The payoff for upholding this unnecessary redundancy is really enormous. Usually, it would be about five times more costly than the case where the level of redundancy is minimally enforced. So, it is going to be 10 million dollars versus 50 million dollars when the so-called next generation, i.e. enhanced version, of an information system is to be developed. As the degree of data redundancy increases, data consistency tasks among operational databases increase exponentially, in proportion to the amount of increase in data redundancy. Note that there is inevitably redundancy between the lowest-level database and its upper-level data warehouses, since data in the database are in principle shoveled upward to the data warehouses in the process of generating them. It is also a natural consequence that another layer of redundancy is unavoidable between data warehouses and their upper-level data marts.

In case data redundancy is existent, it is not difficult to find that many of the duplications are intrinsically semantic. Syntactic duplication is easy to find out, but it is almost impossible to determine whether any data is a semantic derivative of some other data. This semantic data duplicity is the major malice that makes a corporate database incurably obese. So, it is necessary to remove syntactic duplication, but it is exceedingly more crucial not to forge any possibility of semantic duplicity from the very outset of IS development. It really is almost impossible to check semantic equivalence, even periodically, once an information system is in operation day to day.

III. DE-NORMALIZATION—PANACEA OR DEADLY HOMEOPATHY?

It is really unfortunate that we have never seen any data table or relation that even follows the rule of the well-known first normal form (1NF) in real-world corporate databases. So, it is sometimes ridiculed that real-world databases only contain tables of non-normal form or zero normal form, since they have properties significantly inferior to 1NF in terms of data quality, such as the degree of data redundancy and the dependency of non-key data attributes on key attributes. The beauty of table normalization, or table standardization, by applying 1NF, 2NF, 3NF or Boyce-Codd NF is that whenever there is data redundancy in a table, it is possible to remove it by decomposing or splitting the table into two.

In the corporate IT field, unfortunately, the term "de-normalization" [JoJA2007] has gained much popularity, in the sense that field managers usually do not have time to pay attention to and understand the theories behind normalization. They at first pretend to understand and use them, but in reality they sooner or later totally forget about them. We are very unfortunate that we have never, by far, seen any database administrator who really understands the basic difference between 1NF and 2NF. The reality is that they never keep trying or studying to grasp the meaning and benefit of making tables normalized, and they keep feigning to have started with 1NF initially for IS development, to have proceeded forward to make tables in up to 3NF, and then, all of a sudden, for the sake of performance, to inevitably and eventually resort to 1NF again. But this could be a sort of fictional story and hence never true at all, since they always failed to tell us what the intrinsic difference between 1NF and 3NF is. A number of experiments [KSLM2008] have already shown that having tables in 3NF always performs better than 2NF or 1NF, and that 3NF is considered quite optimal even in cases where seven-way table joins are conducted. Note that a 7-way join means combining seven different tables, each fairly large in our experiments, at the same time.

The real problem with IT field managers and even database administrators is that they hardly understand even what 1NF is. Note that in the data-related literature of the past forty years of history, the notion of "de-normalization" has never been introduced, but they are pretty fond of taking up that jargon just in order to forget about normalization and to let themselves remain totally unaware of any
impending issues related to data consistency. They seem to be soon relieved to hear from someone else that normalization can always be compromised for the reason of performance. To our knowledge, they are misled mainly by outside IT consultants who have never been trained enough in basic database knowledge. So, it is actually a very demanding burden to make them understand what the normalization theories are all about.

However, this is not too bad if we know that having tables even in 3NF could contribute to reducing the degree of data redundancy by at most about 5 percent, which is not too much. Consequently, the contribution of normalization would be only minor. But then, where does the majority of the contribution come from? It comes much prior to the formulation of tables. To realize this, we have to know what and where the origin of data essentially is in the corporate environment. Where is the place where redundancy really starts to build? It is at the very beginning of business processes, not where the normalization theories are just about to be applied. Wouldn't it be curious to ask where all the data that are eventually to appear in tables come from?

IV. NECESSITY OF BUSINESS PROCESSES DESCRIPTION

Let us turn our attention to how business processes are described so that field workers can communicate with each other later on. They will certainly be in the form of a business processes description, or job description. The transformation of job descriptions into data tables might take a couple of interim stages, since descriptions themselves have a format different from tables, and there is no direct, straightforward method that can map the descriptions into tables. Then, what is a job description comprised of? In it, there could appear a data entity like employee or department, which has fixed values for the data attributes it is comprised of. For example, a data entity 'employee' might consist of data attributes 'address' and 'social security number', and their values are normally fixed, i.e., not changed over time. In case there is in a job description a statement like "An employee sells a machine.", the data entities 'employee' and 'machine' will have such fixed values, while on the other hand the entity 'sell' is different in that the values of the data attributes of 'sell', like selling date or selling volume, vary, i.e., change each time the action or behavior 'sell' is performed. So, action entities are at the focal point in terms of creating different data values in the database. The source entity of the action 'sell' can be considered to be 'employee', and its destination entity 'machine'. This way of writing job descriptions, taking an action-oriented or behavior-oriented approach [KDLM2007], is straightforward. It could be fairly easy to understand for employees who have the mission of writing a description of the jobs they actually perform.

Efforts to make job descriptions free from data redundancy are essential, and it is valuable to check whether there is redundancy of any sort for each particular action. This means that the action 'sell' above appears at most once in the job descriptions of the whole business processes of a corporate. It is judged to be improper or abnormal if the action 'sell' appears more than once in the entire job descriptions of the corporate. This kind of effort in reducing or removing action redundancy has no relationship to what is known to be crucial, like 1NF, 2NF or 3NF, as emphasized in the literature. But the removal effort with regard to redundancy in data attributes directly associated with actions is far more important than the removal of redundancy in tables at a later stage of database creation. If the removal effort is not sufficiently done, redundancy thus retained, intentionally or unintentionally, will automatically be transferred intact to tables at the instance of table creation.

From the perspective of who or what is in charge of dynamically creating data in the corporate environment, it is fair to admit that behaviors, rather than fixed entities, play the major role in such creation. Fixed entities, which are always expressed as nouns in description statements, like 'employee' and 'department', normally generate only static data attributes and thus are said to be only at the outskirts of data-creating activities. In this sense, it is meaningful to preferably write job descriptions behavior-by-behavior. Each behavior then has the responsibility of creating only meaningful data attributes. In case a behavior does not contribute to generating certain attributes, it has no value existing independently or standing alone. In that case it is reasonable for that behavior to be subsumed by some other behavior that is directly relevant and superior to it.

V. BEHAVIOR-ORIENTED JOB DESCRIPTIONS

As we have observed over the past 20 years, the unit of resources assigned to an employee is normally a job. The definition of jobs has, in a sense, been pretty well established in corporates. For example, we can count the number of jobs in a corporate without much difficulty. In our experience, a mid-size corporate has about 500 to 1,000 jobs, and to perform those jobs it normally needs to maintain about twice as many employees as jobs, since it is usual practice to assign two persons to a single job in order to prepare for just-in-case emergencies. So far, we have seen a number of corporates that have about 500 jobs and 1,000 employees in the real world. This might be a kind of standard for a mid-size corporate.

We were able to observe from our experience that each job on average may comprise some 20-to-30 actions or behaviors, in case only data-creating actions are taken into account in job descriptions. So, if there are 500 different jobs in a corporate, then there are about 10,000-to-15,000 behaviors altogether in that company. With no redundancy in actions, those some 10,000 behaviors must be unique, in that they do not incur redundancy of any type, so that each of them appears once and at most once throughout the entire corporate database.
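The uniqueness requirement on actions, each appearing at most once across all job descriptions, is easy to check mechanically. The following is a toy sketch; the job names and action lists are invented for illustration, not taken from any real corporate:

```python
from collections import Counter

# Toy job descriptions: each job maps to its data-creating actions.
jobs = {
    "sales_rep":     ["sell", "quote"],
    "store_manager": ["sell", "restock"],   # 'sell' duplicated across jobs
    "accountant":    ["invoice"],
}

def duplicated_actions(jobs):
    """Return, sorted, every action that appears in more than one place."""
    counts = Counter(a for actions in jobs.values() for a in actions)
    return sorted(a for a, n in counts.items() if n > 1)

dupes = duplicated_actions(jobs)  # flags 'sell' as redundancy to resolve
```

In the essay's terms, a non-empty result means the job descriptions still carry action redundancy that would later be transferred intact into the tables.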
VI. ENTERPRISE DATA MAP

These behaviors are in a sense interconnected with each other, in a way that each data-creating action has one fixed entity on its left and one more fixed entity on its right. If we denote a behavior by B and a fixed entity by E, then the web of those interconnections looks like a chain of type 'E—B—E'. So, the whole picture would look something like a rectangular shape that allows data accesses or data retrievals in either direction, clockwise or counter-clockwise, as depicted by the arrows in Fig. 1.

Fig. 1. Rectangular Path Formed in Enterprise Data Map, where B Denotes Behavior and E Denotes Entity

Rectangularity guarantees balance in response time in either direction of access, while an otherwise skewed case, toward one particular direction, could induce degradation in response time. Although there are only seven actions in this picture, we could get a whole diagram that contains some 10,000 behaviors if we keep extending the picture by adding more behaviors to it. The entire picture of connections, without allowing isolation of any picture fragment, can be called an enterprise data map (EDM) [Moon2004] of a corporate.

With this EDM, we are able to judge or realize where the origin of a particular data attribute is and how it flows throughout the entire set of data access paths already obtained and depicted in the EDM. With an EDM, it is very easy to find out visually where the data redundancies are, if there are any. As a diagram, one EDM can take about 20 pages of A3 size, in case a font size of 5 is used. Drawing would be automatic if we use a software drawing tool such as ERwin [JoJB2007]. An EDM of so many pages would then easily fit on the wall of the CEO's or CIO's office. It could also be displayed in the CFO's office, in case he is interested in figuring out the flow of all the data directly related to the financial status quo of his company. Unfortunately, at the moment only a few corporates have experienced the value of obtaining and maintaining an EDM, but we advocate that its use would significantly benefit many aspects of an information system, and that utilization of the EDM would thereafter be plentiful according to your perspective on looking at it.

VII. SEPARATION OF DATA FROM PROGRAM

It is needless to say that the EDM is a must, to be secured and kept as an asset prior to the programming of the information system. We emphasize that any programming effort must be deferred until the finalization of the EDM. The EDM in this sense is the blueprint, as for any design of, for instance, a building or a road. To our knowledge, the EDM is definitely the blueprint for an information system prior to any programming effort. What we emphasize is that data itself is essentially data, in that programming must begin to take place only after the data formulation has been made sure to be completely wrapped up. A data-first, programming-later approach is crucial for the success of an information system. If data stuff and programming stuff are mixed together from the start of information system development, chaotic situations will duly be encountered in determining whether an impending problem at issue originates from the data part or the programming part. We emphasize that data cannot be represented or expressed or substituted by any programming means.

Note that if somebody happened to introduce a data item 'whether-a-student-is-registered-or-not', then it is in fact a disguise as data, in that it essentially has a sort of algorithmic logic in it. Presuming that a data item like 'registration date' already resides somewhere else in the database, the 'whether-or-not' type of decision can then definitely be dealt with by conditional statements like 'if' in programming. Separation of data from programming must be strictly obeyed, in the sense that, without separation, a bunch of semantic redundancy like this sort of disguise could later insidiously come into the information system. If this kind of algorithmic logic is certainly in a data item, then it is not real data, since only raw data is privileged to be called data. Anything impure, in the way of generated artifacts, is not called real data. For example, if data C results from the addition of raw data A and raw data B, then C is not in principle treated as data. Note that in the lowest, infrastructural-level database, only such raw data are entitled to reside. Anything else must be deported to reside somewhere else, like the data warehouses.

VIII. CONCLUSION

In sum, there are two major mandates that have to be obeyed to make information systems free from data obesity. The first one is that efforts for removing data redundancy should be enforced from the start of information system development, which is from the starting point of securing job descriptions. The latter one is the strict separation of the data arena and the programming arena in developing information systems. Questions like whether something belongs to data or to programs are better raised as frequently as possible, in order not to allow any chance of confusion about which comes before and which comes after. To our knowledge, the degree of data obesity is guaranteed to be tolerated within at most 20 percent if these two mandates are strictly obeyed. Removal of another 5 percent of data redundancy is later possible if we conduct a certain set of technical details. The well-known data table normalization, or data table decomposition, theories come into play for this further removal. So, the benefit accrued from the data redundancy
removal efforts by application of the normalization theories is considered to be far less than what we get from the efforts made at the stage of job description, which accounts for about 30-to-45 percent of the removal of data redundancy in an entire corporate database. It adds one more flower to a beauty already seized if the normalization theories are applied to make tables fit best, with minimal redundancy in them, but we certainly might have no regret at all when they happen not to be applied for some reason, under the premise that data redundancy of all sorts has already been sorted out and ruled out prior to table formulation. The adage "well begun is half done" still prevails in the world of information system development, and making an IS fit, or well-being, in any situation or environment comes true when we immerse ourselves in thinking in this manner. Consequently, the earlier we are preoccupied with the trial of data redundancy removal, the better the outcome of the information system in terms of performance, clarity, transparency and promptness in response time.
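The second mandate can be made concrete with the essay's 'whether-a-student-is-registered-or-not' example from Section VII: store only the raw 'registration date' and derive the flag in program logic. A minimal sketch; the record layout and field names are our assumptions, not a schema from the essay:

```python
from datetime import date

# Raw data only: the database stores the registration date (or None).
students = [
    {"name": "Hana", "registration_date": date(2010, 3, 2)},
    {"name": "Min",  "registration_date": None},
]

def is_registered(record):
    """Derive the 'whether-or-not' decision with a conditional ('if'),
    instead of storing it as a redundant, program-like data attribute."""
    return record["registration_date"] is not None

registered_names = [s["name"] for s in students if is_registered(s)]
```

Because the flag is computed on demand, there is no stored copy that can drift out of sync with the registration date, which is exactly the consistency hazard the essay warns about.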
IX. REFERENCES

1) [Cukier2010] Cukier, K. (2010, Feb. 25). Data, Data Everywhere. A Special Report on Managing Information, The Economist. Retrieved May 1, 2010, from http://www.economist.com/specialreports/displaystory.cfm?story_id=15557443.html
2) [KaBoZe2010] D. Katz, M. Bommarito & J. Zelner (2010, March 1). The Data Deluge. The Economist print edition.
3) [JoJB2007] J. Jones & E. Johnson (2007). Building and Maintaining a Database from Any ER Model. White Papers: Computer Associates.
4) [Moon2004] S. Moon (2004). Data Architecture. Hyung-Seol Publishing Company.
5) [KDLM2007] N. Kim, D. Lee & S. Moon (2007). Behavior-Inductive Data Modeling for Enterprise Information Systems. Journal of Computer Information Systems, Vol. 48, No. 1, 105-116.
6) [KSLM2008] N. Kim, S. Lee & S. Moon (2008). Formalized Entity Extraction Methodology for Changeable Business Requirements. Journal of Information Science and Engineering, Vol. 24, No. 3, 649-671.
7) [YuJa2008] C. Yu & H. V. Jagadish (2008). XML Schema Refinement through Redundancy Detection and Normalization. VLDB Journal, Vol. 17, 203-223.
Web Mining: A Key Enabler for Distance Education
D. Santhi Jeslet1, Dr. K. Thangadurai2
Abstract-This paper gives an introduction to one of the applications of data mining, known as web mining. It discusses the various categories of web mining. It also deals with the application of web mining in distance education and describes the possibilities of that application. In this fast world everyone wants to be educated by acquiring huge knowledge in a short duration. They do not want to spend some fixed time on their education; whenever a person is free, they can learn and gain knowledge.
Keywords-Data mining, web mining, distance education

I. INTRODUCTION

Now-a-days many organizations accumulate a huge amount of data. This swells the size of the database as time passes. Traditional database queries access a database using SQL queries, and the output could be the data from the database that satisfy the query. This output cannot give any novel information or correlations among the data. So we need a technique that finds the hidden information in a data collection, in a database which is of large size. This technique is called "Data Mining". It discovers valid, novel, potentially useful new correlations and new trends from the large amount of data. Data mining uses pattern recognition techniques and statistical and mathematical techniques for its discovery.

In the recent trend, lots of databases are available on the web. Not only databases: much valuable information is also available on the WWW. So the search area for any information has become very vast. Web mining is an application of data mining which uses data mining techniques to automatically discover and extract information from Web documents and services. It can also be applied to semi-structured or unstructured data like free-form text. Web mining activities can be divided into three categories: content mining, structure mining, and usage mining. The taxonomy of Web mining is depicted in the figure.

1. Web Content Mining: It is the process of discovering useful information from the web, which may be in the form of text, images, audio and video. For the discovery it uses the techniques of Artificial Intelligence (AI), databases and, most specifically, Data Mining (DM).
2. Web Structure Mining: It helps to derive knowledge of the interconnection of documents, hyperlinks and their relationships. It uses graph theory to analyze the node and connection structuring of a web site.
3. Web Usage Mining: It is also called web logs mining. This helps to judge the usage of a web page. It uses computer network concepts, artificial intelligence and databases.

II. OBJECTIVES OF DISTANCE EDUCATION

In the last few decades education has undergone many changes. Class room teaching is needed for face to face education, which comprises a class room, the physical presence of some learners, and a teacher/tutor. Here the teacher/tutor plays a vital role. But with the introduction of distance education, the interactions between the tutor and the learner have been very much reduced; even the interaction between the learners has become almost zero. The main aim of distance education is to make society acquire more knowledge irrespective of the place where people are. Those who do not want to stick to the rules of the regular education system prefer to earn knowledge through distance education. It also encourages working people to attain their learning goals.

III. HOW WEB HELPS IN DISTANCE EDUCATION?

The communication between the tutor and the learner can be enhanced by the introduction of distance education through the web. Here learners work individually at their own place, with the help of some study materials, i.e., a system, computer
program and the internet. The time and space limitations of education disappear. Tutors interact with the students and the learners interact with the tutors via the internet. The tutor supplies information and the learner receives it.
______
About-1 Department of Computer Science, M.G.R. College, Hosur, TN, India (e-mail: [email protected])
About-2 Department of Computer Science, Government Arts College (Men), Krishnagiri-635 001, India
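The web usage mining category introduced above can be illustrated with a small sketch. The log format, learner IDs and page names below are invented for illustration; the sketch counts page visits per learner (an individual view) and across all learners (an aggregate view), the kind of analysis this paper applies to learner access logs:

```python
# Hypothetical sketch: mining page-visit frequencies from a web server
# access log of (learner_id, page) records; the data here is invented.
from collections import Counter, defaultdict

access_log = [
    ("learner1", "/intro"), ("learner1", "/lesson1"), ("learner1", "/quiz"),
    ("learner2", "/intro"), ("learner2", "/lesson1"),
    ("learner1", "/lesson1"), ("learner2", "/quiz"),
]

def individual_paths(log):
    """Pages each learner visits, with visit counts (individual path view)."""
    per_learner = defaultdict(Counter)
    for learner, page in log:
        per_learner[learner][page] += 1
    return per_learner

def aggregate_top_pages(log, k=2):
    """Most frequently visited pages across all learners (aggregate view)."""
    return Counter(page for _, page in log).most_common(k)

print(aggregate_top_pages(access_log))
```

Segmenting the per-learner counters with a clustering technique would then group learners with similar access habits, as described in this paper.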
Since much software is very simple and user friendly, there is no need for special training to work with a computer. The power of computers helps students to improve their abilities. The role of the tutor is entirely changed: the tutor communicates with the learners and leads the course of their learning path. Learners will be grouped; they learn from each other and they also assess each other. This allows the learners to apply their knowledge in different situations and to solve practical problems according to the feedback on their own actions. These changes in the educational system have developed constructivism. Constructivism means that learners are actively involved in constructing meaningful knowledge through experience.

IV. APPLICATION OF WEB MINING IN DISTANCE EDUCATION

An organization that is responsible for distance education collects a huge volume of data, which is generated automatically by web servers and collected in the server access logs. It also collects information from learner (referrer) logs, which contain information about the referring pages for each page, and from user registration. Through this the organization can get an idea about thinking styles, learn the learners' expectations, and also learn about the web site structure. This helps to improve the efficiency of the web site that is responsible for improving the knowledge of the learners.

Before gathering histories using mining algorithms, a number of data preprocessing issues such as data cleaning have to be handled. The major preprocessing task is data cleaning, which is used for removing irrelevant information from the server log. The extracted access histories of each individual learner represent the physical layout of web sites, with web pages and the hyperlinks between the pages. Once user access histories have been identified, web page traversal path analysis is performed for customized education, and web page association for virtual knowledge structures.

By using different path analyses, such as graph representation, we can determine the most frequent traversal patterns from the physical layout of a web site. Path analysis is performed from two points of view: aggregate and individual path. Aggregate path analysis includes the process of clustering the registered learners. The web site database has the registered learners' details, which can be segmented by one of the clustering techniques to discover learners with similar characteristics. By using this we can determine the most frequently visited paths of learners. Individual path analysis helps to determine the set of frequently visited web pages accessed by a learner during their visits to the server.

Discovering such aggregate and individual paths for learners in distance education helps in the development of effective customized education. Associations and correlations among web pages can be discovered using association rules. This guides the discovery of correlations among references to the various web pages available on the server by a learner or learners. Based on this the tutor can also judge the standard of the learner.

V. CONCLUSION

Web mining in distance education provides a lot of open teaching resources, so that people can teach and learn anytime and anywhere. It helps the organization that is responsible for distance education to discover the learners' access habits and study interests. It guides the teacher to adjust his/her teaching techniques and the speed of teaching depending on the learners' knowledge. So web mining technology is a key enabler of distance education.

VI. REFERENCES

1) Youtian Qu, Lili Zhong, Huilai Zou, Chanonan Wang, "Research About the Application of Web Mining in Distance Education Platform," Eighth International Conference on Scalable Computing and Communication / Embedded Computing (SCALCOM-EMBEDDEDCOM '09), 2009.
2) Wang Jian and Li Zhuo-Ling, "Research and Realization of Long-Distance Education Platform Based on Web Mining," International Conference on Computational Intelligence and Software Engineering (CiSE 2009), 2009.
3) Sung Ho Ha, Sung Min Bae, Sang Chan Park, "Web Mining for Distance Education," Proceedings of the 2000 IEEE Conference on Management of Innovation and Technology (ICMIT 2000), Vol. 2, 2000.
4) Zhang Yuanyuan, Mo Quian, "Research of the Constructivism Remote Education Based on Web Mining," First International Conference on Education Technology and Computer Science (ETCS '09), Vol. 2, 2009.
5) Margaret H. Dunham and S. Sridhar, Data Mining: Introductory and Advanced Topics.
6) Pieter Adriaans and Dolf Zantinge, Data Mining.
Optimized Remote Network Using Specified Factors As Key Performance Indices
John S.N.1, Okonigene R.E.2, Akinade B.A.3, Chukwu I.S.3
Abstract-This paper discusses the implementation of an optimized remote network, using latency, bandwidth and packet drop rate as key performance indicators (KPI) to measure network performance and quality of service (QoS). We compared the network performance characteristics derived on the Wide Area Network (WAN) when using Fiber, VSAT and Point-to-Point VPN across the internet respectively as the network infrastructure. Network performance variables are measured across the various links (VSAT, Fiber and VPN across the internet), and the corresponding statistical data is analyzed and used as a baseline for the optimization of corporate network performance. The qualities of service offered on the network before and after optimization are analyzed and used to determine the level of improvement in network performance achieved.
Keywords-Key performance indicator, optimized remote network, latency, bandwidth, WAN, VSAT.

I. INTRODUCTION

Most network users often attribute the problem of a slow network and poor quality of service to a lack of sufficient bandwidth, which is not generally correct. Sometimes, poor network performance can be traced to network congestion, high packet drop rate, chatty protocols and high latency [1], among others. This paper uses the technique of network baselining to obtain the best combination of network metrics that can enhance the performance of network resources up to maximum data flow energy (MDFE), which allows the maximum amount of data to be sent in the fastest amount of time using the optimum bandwidth capacity [2]. We assume that the server and client processing times are minimal relative to the total time it takes to complete a transaction; hence, the cause of service transaction delays is attributed to WAN delay. The paper tries to find out the causes of poor quality of service across the WAN and makes recommendations on how to implement an efficient remote network with better quality of service (QoS) [3]. In the methodology, three sets of parallel links (Fiber, VSAT and Point-to-Point VPN across the internet) of equal bandwidth are set up between two geographically separate locations. Files of different sizes were sent between the locations across each link respectively. The key performance indicators (latency, bandwidth and packet drop rate) [4, 5, 6] were recorded using standard monitoring tools for each of the experiments performed. Graphical analysis of the data obtained from the link performance was used as the basis for the conclusions made in this paper, using latency, bandwidth and packet drop rate as key performance indicators for network performance.

______
About-1 Department of Electrical and Information Engineering, Covenant University, Ota, Nigeria (e-mail: [email protected])
About-2 Department of Electrical & Electronics Engineering, Ambrose Alli University, Ekpoma, Nigeria (e-mail: [email protected])
About-3 Department of Electrical and Electronics Engineering, University of Lagos, Akoka, Yaba, Lagos, Nigeria (e-mail: [email protected]; [email protected])

II. NETWORK PERFORMANCE CRITERIA

A network performs well when its users are able to access applications and carry out given tasks without undue perceived delay, error or irritation. The primary measures of user-perceived performance are availability and completion time. It is important to identify whether utilization factors, collision rate or bandwidth congestion are responsible for network problems [7]. In general, the performance of a computer network can be divided into three sections for easy analysis and trouble-shooting:
- the performance of the application,
- the performance of the servers,
- the performance of the network infrastructures.
Based on end-user perception of the network, we can also view network performance in terms of service oriented and efficiency oriented, as shown in Fig. 1.

Fig. 1. Block diagram of IT performance

It is noted that service oriented performance measures how well an application provides service to the customer, whereas efficiency oriented performance measures how much of the available channel resources are actually used to serve end-user requests. The latter tends to measure how much of the available channel resources are being wasted due to inefficiencies inherent in the communication channel.
III. METHODOLOGY

The performance of a wide area network can be verified by studying the effect of network contribution to transaction time (NCTT) on the network [3]. In a high performance network, TCP packets are transferred across the WAN with minimal delay (low latency) within the optimum load limit. When the network becomes overloaded, congestion sets in and TCP packets are dropped and consequently re-transmitted, which adds to the total time required to complete a transaction in a busy network [8]. Network contribution to transaction time is the sum of the round-trip times necessary to complete a given transaction type, plus the time for recovery from any lost packets during the transaction [3]. The network contribution to transaction time can be calculated as:

    NCTT = E x RTT + L x RTO

where
E – number of round-trip exchanges necessary to complete the transaction,
RTT – round-trip time for packet transfer,
L – number of round-trip exchanges that experience packet loss,
RTO – retransmission time-out.

The number of losses experienced in the course of a transaction depends on the round-trip packet loss probability, p. For a two-way traffic path, the loss probability is given by:

    P_RTT = 1 - {(1 - P_oneway)(1 - P_otherway)}

If each round-trip exchange takes A_i attempts to complete successfully, the total number of attempts to complete a transaction is given as:

    A = sum_{i=1..E} A_i,  with  Prob(A_i = a) = p^(a-1) (1 - p)

The expected value of A is given by:

    E{A} = E x sum_{a>=1} a p^(a-1) (1 - p)

which converges as:

    E{A} = E / (1 - p)  for 0 <= p < 1

A is equal to the constant E plus a random number of losses L, so E{A} = E + E{L}, hence

    E{L} = E{A} - E = E / (1 - p) - E = E x p / (1 - p)

and the average

    NCTT = E x RTT + E{L} x RTO

Note that the probability distribution of NCTT is a set of discrete values [11] at (E x RTT), {(E x RTT) + (1 x RTO)}, {(E x RTT) + (2 x RTO)}, and so on.

The performance of the WAN and remote network can also be viewed in terms of its effective throughput. Throughput is the quantity of error-free data that can be transmitted over a specified unit of time [9]:

    Throughput (bps) = Bandwidth / Total Latency

Also,

    Throughput (bps) = MSS / (RTT x sqrt(p))

where
MSS – maximum segment size (fixed for each internet path, typically 60 bytes),
RTT – round-trip time (as measured by TCP),
p – packet loss rate (%).

The efficiency of the WAN link can be calculated from statistical data on the link utilization, where Utilization (U) [7] is the percentage of total channel capacity currently being consumed by aggregate traffic:

    Utilization = (Traffic / Channel capacity) x 100

Also,

    Utilization = [(Data sent + Data received) x 8] / (Link speed x Sample time) x 100

Furthermore, in this research three point-to-point WAN links were set up between two separate locations A and B using three different WAN technologies, namely:
(i) 128/256 Kbps leased fiber line
(ii) 128/256 Kbps point-to-point VPN across the public internet
(iii) 128/256 Kbps VSAT link

The key performance indicator (KPI) metrics for the research were latency, bandwidth and packet drop rate. The following approach was used to obtain the required performance characteristics of the various WAN technologies adopted:
(a) Files of various sizes were sent from Host A to Host B across the different WAN links.
(b) The KPI values were measured and recorded for each remote network infrastructure in use (Fiber, VSAT, and Point-to-Point VPN across the internet, with bandwidth of 128/256 kbps respectively).
(c) The performance statistics obtained in each case were plotted in graphical form and analyzed.
(d) Recommendations for error correction and performance improvement were made.
(e) Conclusions were drawn based on the results obtained from the key performance indices.

The alternative WAN links between the two remote locations, shown in Fig. 1, were routed to Host A and Host B using the different connection links (Fiber, VSAT, P2P VPN) to measure the KPI of the network.

Fig. 1. Schematic diagram of alternative WAN links (VSAT, Fiber, Pt-2-Pt VPN) between two remote locations (Host A and Host B)
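The loss and NCTT formulas of this section can be checked numerically. The sketch below implements the two-way loss probability, the expected number of losses E{L} = E·p/(1−p), and the average NCTT = E·RTT + E{L}·RTO; the parameter values are illustrative, not measurements from the paper:

```python
# Numerical check of the loss and NCTT formulas in this section.
# Parameter values are illustrative, not measurements from the paper.
def round_trip_loss(p_oneway, p_otherway):
    """P_RTT = 1 - (1 - P_oneway)(1 - P_otherway) for a two-way path."""
    return 1.0 - (1.0 - p_oneway) * (1.0 - p_otherway)

def expected_losses(E, p):
    """E{L} = E * p / (1 - p), valid for 0 <= p < 1."""
    return E * p / (1.0 - p)

def average_nctt(E, rtt, rto, p):
    """Average NCTT = E * RTT + E{L} * RTO."""
    return E * rtt + expected_losses(E, p) * rto

# Example: 10 exchanges, RTT = 50 ms, RTO = 1 s, 1% loss in each direction.
p = round_trip_loss(0.01, 0.01)                    # ~0.0199 round-trip loss
print(round(average_nctt(10, 0.050, 1.0, p), 4))   # average NCTT in seconds
```

Note how the retransmission time-out, not the round-trip time, dominates the average NCTT once the loss probability grows: each expected loss adds a full RTO to the transaction.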
Table 1 shows the throughput obtained from the remote links of the WAN for different packet drop rates.

Table 1. Throughput of a network as affected by both latency and packet drop rate
LATENCY (ms) | TP1 (0.01% PDR) | TP2 (0.05% PDR) | TP3 (0.10% PDR) | TP4 (0.50% PDR) | TP5 (1.00% PDR) | TP6 (2.00% PDR) | TP7 (3.00% PDR)
9            | 1822.22         | 814.95          | 576.29          | 257.70          | 182.22          | 128.85          | 105.20
30           | 546.67          | 244.48          | 172.89          | 77.31           | 54.67           | 38.66           | 31.56
60           | 273.33          | 122.24          | 86.44           | 38.66           | 27.33           | 19.33           | 15.78
90           | 182.22          | 81.50           | 57.63           | 25.77           | 18.22           | 12.86           | 10.52
120          | 136.67          | 61.12           | 43.22           | 19.33           | 13.67           | 9.66            | 7.87
150          | 109.33          | 48.90           | 34.58           | 15.46           | 10.93           | 7.73            | 6.32
300          | 54.67           | 24.45           | 17.29           | 7.73            | 5.47            | 3.87            | 3.16
500          | 32.80           | 14.70           | 10.37           | 4.64            | 3.28            | 2.32            | 1.90
800          | 20.50           | 9.17            | 6.48            | 2.90            | 2.05            | 1.45            | 1.18
1000         | 16.40           | 7.34            | 5.19            | 2.32            | 1.64            | 1.16            | 0.95
(All throughput values are in KBPS.)

Fig. 3. Graph of throughput against latency for different packet drop rates

Fig. 3 shows the effect of packet drop rate on network throughput over different latencies. The throughput of a network is affected by both the latency and the packet drop rate of the link: an increase in latency decreases the network throughput. Similarly, the throughput also decreases as the packet drop rate increases, which may degrade the network quality of service. Analysis of the achieved results indicates that the best quality of service is obtained by using a link whose latency is between 1 and 30 milliseconds with a packet drop rate of 0.01% or less. Such latency can only be achieved using a Fiber or radio link, where packets are propagated at the speed of light with a very low bit-error rate. The worst quality of service occurred when the latency is between 800 and 1000 milliseconds and the packet drop rate stands at 3% or more.

A link latency of 800 milliseconds and above is usually associated with VSAT links because of their technological limitation caused by the distance along the propagation path between the two locations via the orbital satellite. However, VSAT links could still be used for non-delay-sensitive applications if there is no packet loss. The situation becomes worse when an increasing packet drop rate is associated with VSAT links. For a Point-to-Point virtual private network (VPN) across the public internet with an average latency of 250 milliseconds, the performance of most real-time and data-based applications is considered favourable. However, a Point-to-Point VPN is always associated with a higher packet drop rate than VSAT or Fiber links, because of the large number of hops and routing protocols across the path from source to destination. This is even worse when considering the two-way traffic situation usually experienced in real-life scenarios.

IV. IMPROVEMENT IN QUALITY OF SERVICE

The improvement in quality of service (QoS) can be seen by comparing the network throughput of the Fiber, VPN and VSAT links of a network, if we assume minimal packet loss for all three infrastructures: a latency of 850 ms for VSAT, 260 ms for the Point-to-Point VPN across the internet, and 25 ms for the Fiber link. The throughput for VSAT gives 0.6168 Mbps, that of the VPN across the internet gives 2.016 Mbps, and the throughput for Fiber gives 20.97 Mbps. By replacing the VSAT infrastructure with a Fiber optic link, the following improvement in QoS would be achieved:

    (20.97 - 0.6168) / 0.6168 x 100  =>  3300%

Similarly, replacing the VPN with a Fiber optic link would be achieved with an improvement in quality of service (QoS) as follows:

    (2.016 - 0.6168) / 0.6168 x 100  =>  530%
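As a numerical cross-check, the throughput figures in Table 1 match a Mathis-style relation Throughput = C / (latency × sqrt(PDR)), where the constant C ≈ 164 is fitted to the published table rather than stated in the paper. The same sketch reproduces the VSAT-to-Fiber improvement figure from the quoted 0.6168 and 20.97 Mbps throughputs:

```python
import math

def throughput_kbps(latency_ms, pdr):
    """Mathis-style throughput; the constant 164 is fitted to Table 1,
    not stated in the paper."""
    return 164.0 / (latency_ms * math.sqrt(pdr))

def qos_improvement_pct(new_mbps, old_mbps):
    """Percentage improvement when replacing a link, as in Section IV."""
    return (new_mbps - old_mbps) / old_mbps * 100.0

# Reproduce two Table 1 entries and the VSAT -> Fiber improvement.
print(round(throughput_kbps(9, 0.0001), 2))       # ~1822.22 KBPS at 0.01% PDR
print(round(throughput_kbps(30, 0.0005), 2))      # ~244.48 KBPS at 0.05% PDR
print(round(qos_improvement_pct(20.97, 0.6168)))  # ~3300 %
```

The fitted relation makes the table's structure explicit: halving the latency doubles the throughput, and quadrupling the packet drop rate halves it.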
V. CONCLUSION

The key performance indices of network services (packet drop rate, latency and throughput) affect network performance whenever one of the factors goes out of the optimized range of values obtained in this research work. Under perfect conditions (assuming a minimal percentage of packet loss), the use of a WAN link with low latency and an optimized bandwidth would significantly enhance the quality of service (QoS) experienced by a remote network user over a WAN link.
VI. REFERENCES

1) Bilal Haider, M. Zafrullah, M. K. Islam, "Radio frequency optimization & quality of service evaluation in operational GSM network," Proceedings of the World Congress on Engineering and Computer Science 2009 (WCECS), Volume 1, Oct 20-22, 2009, San Francisco, USA.
2) Daniel Nassar, "Network Performance Baselining," Publisher: MTP, 201 West 103rd Street, Indianapolis, IN 46290, USA, 2002.
3) "Network contribution to transaction time," ITU-T Recommendation G.1040, ITU-T Study Group 12 under the ITU-T Recommendation A.8 procedure (2005-2008).
4) "Effect of network latency on load sharing in distributed systems," Journal of Parallel and Distributed Computing, Volume 66, Issue 6, Orlando, FL, USA, June 2006.
5) Jorg Widmer, Catherine Boutremans, Jean-Yves Le Boudec, "End-to-end congestion control for TCP-friendly flows with variable packet size," Publisher: ACM, 2004.
6) Gregory W. Cermak, "Multimedia Quality as a function of bandwidth, packet loss and latency," International Journal of Speech Technology, Volume 8, Number 3, Publisher: Springer Netherlands, 2005.
7) Michael Lemm, "How to improve WAN application performance," 2007, http://ezinearticles.com.
8) Lai King Tee, "Packet error rate and latency requirements for a mobile wireless access system in an IP network," Vehicular Technology Conference, pages 249-253, 2007.
9) J. Scott Haugdahl, "Network Analysis and Troubleshooting," Publisher: Addison-Wesley, 2003.
Analysis of the Routing Protocols in Real Time Transmission: A Comparative Study
Ikram Ud Din1, Saeed Mahfooz2, Muhammad Adnan3
Abstract-During routing, different routing protocols are used at the routers to route real time data (voice and video) to its destination. These protocols perform well under different circumstances. This paper evaluates the performance of RIP, OSPF, IGRP, and EIGRP for the parameters: packet dropping, traffic received, end-to-end delay, and variation in delay (jitter). Simulations have been done in OPNET to evaluate these routing protocols against each parameter. The results, shown in graphs, indicate that IGRP performs the best in packet dropping, traffic received, and end-to-end delay as compared to its companions (RIP, OSPF, and EIGRP), while in the case of jitter, RIP performs comparatively well.
Keywords-Routing, Protocol, Delay, Packet Loss, Jitter

I. INTRODUCTION

A protocol is a set of rules that reveals how computer systems communicate with each other across networks. A protocol also functions as the common medium by which different hosts, applications, or systems communicate. Data messages are exchanged when computers communicate with one another. Examples of messages are sending or receiving e-mail, establishing a connection to a remote machine, and transferring files and data. There are two classes of protocols at the network layer, i.e., routed and routing protocols. The transportation of data across a network is the responsibility of the routed protocols, and routing protocols permit routers to appropriately direct data from one place to another. In other words, protocols that transfer data packets from one host to another across router(s) are routed protocols, and to exchange routing information, routers use routing protocols. IP is considered a routed protocol, while routing protocols include: i) Routing Information Protocol (RIP), ii) Interior Gateway Routing Protocol (IGRP), iii) Open Shortest Path First (OSPF), and iv) Enhanced Interior Gateway Routing Protocol (EIGRP), etc. To forward data packets, the Internet Protocol (IP) uses a routing table.

RIP uses hop count to determine the path and distance to any link in the internetwork. In the case of multiple paths to a destination, RIP selects the path that has the fewest hops. The only routing metric RIP uses is hop count; therefore, it does not necessarily opt for the fastest path to a destination [1]. IGRP was developed to address the problems associated with routing in large networks that are beyond the scope of RIP. IGRP can select the fastest path based on the bandwidth, delay, reliability and load; by default, it uses only the bandwidth and delay metrics. To allow the network to scale, IGRP also has a much higher maximum hop-count limit than RIP. OSPF was developed by the Internet Engineering Task Force (IETF) in 1988. OSPF shares routing information between routers belonging to the same autonomous system. It was developed to address the needs of scalable, large internetworks that RIP could not. EIGRP is an advanced version of IGRP that provides superior operating efficiency, such as lower overhead bandwidth and faster convergence [1].

As we are examining video and voice packets during video conferencing and voice packet transmission in this paper, a short introduction of the protocols used for the transmission of these packets is also inevitable. In video conferencing, the Real Time Transport Protocol (RTP) is used for carrying video packets, and for session establishment between the two systems either H.323 or SIP is used. RTP provides end-to-end network transport functions intended for real time applications such as video and voice. Those functions comprise payload-type identification, time stamping, delivery monitoring and sequence numbering [2].

Voice over Internet Protocol (VoIP) is a means of compressing voice using a standardized codec, then encapsulating the results within IP for transport over data networks. For establishing and transporting VoIP traffic, H.323 is a standard protocol [3]. The H.323 standard has been developed by the ITU-T for vendors and equipment manufacturers who provide VoIP service. It was originally developed for multimedia conferencing on LANs, but was later extended to VoIP. The 1st and 2nd versions of H.323 were released in 1996 and 1998, respectively; currently, its version 4 is under consideration. The Session Initiation Protocol (SIP) is the Internet Engineering Task Force (IETF) standard for multimedia or voice session establishment over the Internet. It was proposed as a standard in February 1999. SIP is a detailed protocol that stipulates the commands and responses to set up and tear down calls. It also details features such as proxy, security, and transport (TCP or UDP) services. SIP describes end-to-end call signaling between devices. SIP defines, as the name implies, how the session is established between two IP nodes, with or without media [2].

______
About-1,2,3 Department of Computer Science, University of Peshawar, Pakistan
(e-mail1: [email protected])
(e-mail2: [email protected])
(e-mail3: [email protected])
The goal of this study is to measure the performance of throughput, packet loss, jitter, and delay in real-time transmission. The simulations have been done in OPNET, because OPNET was originally developed for network simulation and is fully usable as an ample simulation tool. OPNET provides a complete development environment for the specification, simulation, and performance analysis of communication networks [4], [5], [6]. OPNET is able to simulate different network devices and various kinds of transmission lines, and to display such information as packet end-to-end delay, delay variation (jitter), and packet loss in the network. The main purpose is to analyze how the network behaves under speech activity. Voice quality can be characterized by two measurements: i) delay of the signal, and ii) distortion of the signal. Delay disturbs the interactivity, while distortion reduces the legibility [7]. Many factors, such as a heavy load on the network that creates higher traffic, may contribute to congestion of the network interface [8]. Therefore, this research is important for measuring and predicting data transfers in real-time applications. The remaining paper is structured as follows: Section 2 describes the work done in the evaluation of routing protocols. Section 3 illustrates the working environment for the implementation of these protocols. Section 4 explains the OPNET simulations of the mentioned protocols. Section 5 concludes our work, and references are given in Section 6.
II. RELATED WORK
Privacy and security have become necessary requirements for Voice over IP (VoIP) communications, which need security services such as integrity, confidentiality, non-replay, non-repudiation, and authentication. The Quality of Service (QoS) of the voice is affected by jitter, delay, and packet loss [9]. Normally, a telecommunication network consists of routers which optimize the packets' transmission. Practically, a packet is transmitted through a number of paths from one router to another. The selection of a path is based on routing-table information, usually received according to a routing protocol. A routing protocol is one that provides techniques facilitating a router to build a routing table; it also shares routing information with neighboring routers. When a router is switched off, the packets passing through that router are passed to another router. This operation is known as "routing protocol convergence". Packets are likely to be lost during routing protocol convergence [10].
Networks like the Internet are renowned today. Such networks consist of routers, switches and hubs, communication media, and firewalls. Servers and clients are usually interconnected by networks. During communication through the Internet, there may be many possible routing paths and many routers between a source and a destination. When packets arrive at a router, the router decides on the next hop in a path to the destination. For making this decision, many algorithms are used, such as RIP, OSPF, IGRP, and EIGRP. RIP and OSPF try to route the packets to a destination via the path consisting of the fewest number of nodes (routers). IGRP and EIGRP attempt to route the packets based on shortest-path, shortest-delay, and greatest-bandwidth factors.
The invention of Curtis et al. [11] makes routing decisions. In their invention, a best path is determined according to an IGRP, EIGRP, OSPF, BGP, or other routing task that can provide multiple routing paths. A first variety of routers in the best routing path is determined. Their invention also makes decisions for routing a received packet: if the first variety of routers had a noise level, the packet is forwarded to the next router in the best routing path; if not, then, according to said IGRP, EIGRP, OSPF, BGP, or other routing function, a second routing path is determined [11].
A network facilitates the delivery of packets from a source to a destination. This delivery is possible through routers. Packets have destination addresses that let routers determine how to route the data packets. A router has a routing table which stores network-topology information. With the help of this network-topology information, the router forwards packets to the destination. A routing protocol consists of methods to select the best path and to exchange topology information. There are two main classes of routing protocols: distance-vector routing protocols, e.g. RIP and IGRP, and link-state routing protocols, e.g. OSPF. For enterprise networks, OSPF is often preferred [12], [13].
To exchange service availability and network reachability information, a router implements one or more routing protocols. In a specific implementation, the border router implements RIP, OSPF, IGRP, EIGRP, or BGP [14]. Routing protocols accept network state information and then, on the basis of such accepted information, update network topology information. Routing protocols also distribute the network state information. Path generation and forwarding information generation are also duties of the routing protocols [15], [16].
III. WORKING ENVIRONMENT
When a node wants to transmit real-time applications (video or voice) over IP, the traffic must pass through a router. For transmission of real-time applications, the real-time transport protocol (RTP) is used, and the session is established between two remote stations through the session initiation protocol (SIP) or H.323. Apart from these real-time transmission protocols, some routing protocols are also used to route the real-time applications to their destination. These are: RIP, OSPF, IGRP, and EIGRP. Consider the following scenario having two servers, i.e., a VoIP server and a video server, and two clients, i.e., a VoIP client and a video client. The servers and clients are at two different locations: the servers are located at site Lahore (in this case) and the clients at the other site (say Karachi).
P a g e | 20 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology
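The contrast between hop-count routing (RIP/OSPF with unit link costs) and a composite metric in the spirit of IGRP/EIGRP can be sketched with Dijkstra's shortest-path algorithm. The topology, bandwidths, and delays below are invented for illustration, and the composite cost is a simplified bandwidth-plus-delay expression loosely modeled on IGRP's default metric, not the protocol's exact formula.

```python
import heapq

def dijkstra(adj, src, dst, cost):
    """Shortest path from src to dst; cost(attrs) gives the weight of a link."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, attrs in adj[u].items():
            nd = d + cost(attrs)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    # Walk predecessors back from the destination.
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

# Hypothetical topology: bandwidth in kbit/s, delay in microseconds.
adj = {
    "A": {"B": {"bw": 64,    "delay": 1000}, "C": {"bw": 10000, "delay": 2000}},
    "B": {"A": {"bw": 64,    "delay": 1000}, "D": {"bw": 64,    "delay": 1000}},
    "C": {"A": {"bw": 10000, "delay": 2000}, "D": {"bw": 10000, "delay": 2000}},
    "D": {"B": {"bw": 64,    "delay": 1000}, "C": {"bw": 10000, "delay": 2000}},
}

hop_count = lambda attrs: 1  # RIP-style: every link costs one hop
# Simplified IGRP-like cost: inverse bandwidth plus delay in tens of microseconds.
igrp_like = lambda attrs: 1e7 / attrs["bw"] + attrs["delay"] / 10

print(dijkstra(adj, "A", "D", hop_count))  # either two-hop path ties on hop count
print(dijkstra(adj, "A", "D", igrp_like))  # prefers the high-bandwidth path via C
```

With unit costs both A-B-D and A-C-D look identical, while the composite metric heavily penalizes the slow 64 kbit/s links and selects the path through C, which mirrors why IGRP and RIP can disagree on the "best" route over the same topology.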
Fig. 1: structure of the network
A. IP Packet/Traffic Dropping
Packet loss/drop occurs when a router or switch is unable to receive incoming data packets at a given time. Real-time applications (video or voice) are drastically degraded by packet loss [17].
B. Video/Voice Traffic Receiving
Video/voice traffic is the total number of audio and video packets received during video conferencing or another type of real-time communication (e.g., IP telephony).
C. End-to-End Delay
End-to-end delay depends on the end-to-end data/signal paths, the payload size of the packets, and the CODEC. Delay is the latency, one-way or round-trip, encountered when data packets are transmitted from one place to another. In order to maintain the expected voice quality for Voice over IP (VoIP), the round-trip delay must remain within roughly 120 milliseconds [17].
D. Variation in Delay (Jitter)
In computer networks, the term jitter means variation in the delay of received packets. Jitter is an essential quality of service (QoS) factor in the evaluation of network performance, and it is one of the significant issues in packet-based networks for real-time applications [18]. The variation of inter-packet delay, or jitter, is one of the principal factors that disturbs voice quality [19]. Jitter plays a vital role in the measurement of the QoS of real-time applications. The effects of end-to-end delay, packet loss, and jitter can be heard as follows. The calling party says, "Hello Sir, how are you?" With end-to-end delay, the called party hears, "......Hello Sir, how are you?" With packet loss, the called party hears, "He.lo....r, w are you?" With jitter, the called party hears, "Hello...Sir, how....are... you?" [2].
IV. SIMULATION RESULTS
In this section, a scenario was tested in which the delay, packet loss, and jitter were examined. Figure 2 shows the number of IP packets dropped per second. Figure 3 illustrates the traffic received during video conferencing. The voice traffic received is shown in figure 4. The end-to-end delay in voice packets is given in figure 5, while the variation in delay, or jitter, is clear from figure 6.
A. Performance Evaluation
The number of packets dropped is given in figure 2; the fewest packets are lost when IGRP is implemented at the routers, while a huge number of packets is dropped when OSPF works as the routing protocol. IGRP also works well in the case of receiving video and voice packets, given in figures 3 and 4, respectively. The end-to-end delay and the variation in delay (jitter) in voice traffic are shown in figures 5 and 6, respectively, for which IGRP is also the best protocol. In the given figures, the X-axis shows time; the Y-axis shows the number of packets in figures 2, 3, and 4, and the value of delay and jitter in figures 5 and 6.
Fig. 2: Number of packets dropped per second
Fig. 3: video traffic received per second
Fig. 5: End-to-End Delay in voice Packets
Fig. 4: voice traffic received per second
Fig. 6: Jitter in Voice Packets
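The delay-variation metric plotted in figure 6 can be approximated offline from packet send/receive timestamps. The sketch below computes each packet's one-way delay and then the delay variation between consecutive packets, following the RFC 3393 [18] definition; the timestamp samples are invented for illustration.

```python
# Hypothetical (send_time, recv_time) pairs in milliseconds for consecutive packets.
samples = [(0.0, 20.0), (20.0, 41.5), (40.0, 59.0), (60.0, 83.0)]

# One-way delay of each packet.
delays = [recv - send for send, recv in samples]    # [20.0, 21.5, 19.0, 23.0]

# IP delay variation (RFC 3393): difference between the delays of consecutive packets.
ipdv = [b - a for a, b in zip(delays, delays[1:])]  # [1.5, -2.5, 4.0]

# One common scalar summary of jitter is the mean absolute delay variation.
jitter = sum(abs(v) for v in ipdv) / len(ipdv)
print(delays, ipdv, jitter)
```

A steady delay (even a large one) yields zero jitter; it is the fluctuation between successive packets, not the delay itself, that forces VoIP receivers to buffer and produces the choppy "Hello...Sir, how....are... you?" effect described above.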
V. CONCLUSION
The size of today's networks has been growing quickly, and networks must support complicated applications, e.g., video conferencing and voice messaging. Quality transmission is the demand of the time, and this requires routing protocols at the routers that produce good results. The work done in this paper analyzes the available routing protocols RIP, OSPF, IGRP, and EIGRP for packet dropping, traffic received, end-to-end delay, and variation in delay (jitter). Our work is based on OPNET simulation for each of these parameters, and the study presents a comprehensive result for each protocol against each parameter in turn. IGRP performs well in packet dropping, traffic received, and end-to-end delay compared with its companions (RIP, OSPF, and EIGRP), while in the case of jitter, RIP performs slightly better than IGRP.
VI. REFERENCES
1) Cisco Systems, Inc., Cisco Networking Academy Program CCNA 1 and 2 Companion Guide, Third Edition. 2003.
2) Cisco Systems, Inc., Cisco Voice Over IP. Student Guide, ed. v. 4.2. 2004.
3) Shufang Wu, M.R., Riadul Mannan, and Ljiljana Trajkovic, OPNET Implementation of Megaco/H.248 Protocol. 2003.
4) Mohd Nazri Ismail and A.M. Zin, Emulation Network Analyzer Development for Campus Environment and Comparison between OPNET Application and Hardware Network Analyzer. European Journal of Scientific Research, 2008. 24(2): p. 270-291.
5) Mohd Nazri Ismail and A.M. Zin, Evaluation of Network Analyzer Prototyping Using Qualitative Approach. European Journal of Scientific Research, 2009. 26(3): p. 170-182.
6) Sood, A., Network Design by using OPNET™ IT GURU Academic Edition Software. Rivier Academic Journal, 2007. 3(1).
7) Sjögren, H.R.C., Voice over IP, simulated IP-network, in School of Mathematics and Systems Engineering. 2008, Växjö University.
8) Chang, W.K. and H. S, Evaluating the performance of a web site via queuing theory, in Software Quality-ECSQ. 2002, Springer-Verlag, Berlin, Heidelberg: Helsinki, Finland. p. 63-72.
9) Mohd Nazri Ismail and M.T. Ismail, Analyzing of Virtual Private Network over Open Source Application and Hardware Device Performance. European Journal of Scientific Research, 2009. 28(2): p. 215-226.
10) Nicolas Dubois and B. Fondeviole, Method for Control Management Based on a Routing Protocol. US Patent, 2008. No.: US 2008/0025333 A1.
11) Richard Scott Curtis and J.D. Forrester, System, Method and Program for Network Routing. US Patent, 2008. No.: US 2008/0317056 A1.
12) Thomas P. Chu, R.N. and Y.-T. Wang, Automatically Configuring Mesh Groups in Data Networks. US Patent, 2010. No.: US 2010/0020726 A1.
13) Xiaode Xu, M.S. and D. Shah, Routing Protocol with Packet Network Attributes for Improved Route Selection. US Patent, 2009. No.: US 2009/0059908 A1.
14) Rosenberg, J., Peer-to-Peer Network including Routing Protocol Enhancement. US Patent, 2009. No.: US 2009/0122724 A1.
15) Bruce Cole and A.J. Li, Routing Protocols for Accommodating Nodes with Redundant Routing Facilities. US Patent, 2009. No.: US 2009/0219804 A1.
16) Russell I. White, S.E.M., James L. Ng, and Alvaro Enrique Retana, Determining an Optimal Route Advertisement in a Reactive Routing Environment. US Patent, 2009. No.: US 2009/0141651 A1.
17) Paul J. Fong, E.K., David Gray, et al., Configuring Cisco Voice Over IP, Second Edition.
18) C. Demichelis and P. Chimento, IP Packet Delay Variation Metric for IP Performance Metrics (IPPM). Request for Comments 3393, 2002.
19) Pedrasa, J.R.I. and C.A.M. Festin, An Enhanced Framing Strategy for Jitter Management, in TENCON 2005 IEEE Region 10. 2005: Melbourne, Qld. p. 1-6.
An Empirical Study on Data Mining Applications
P. Sundari1 Dr. K. Thangadurai2
Abstract- The wide availability of huge amounts of data and the need for transforming such data into knowledge has drawn the IT industry towards data mining. During the early years of the development of computer techniques for business, IT professionals were concerned with designing databases to store data so that information could be easily and quickly accessed. The restrictions were storage space and the speed of retrieval of the data. Needless to say, the activity was restricted to a very few, highly qualified professionals. Then came an era in which Database Management Systems simplified the task, and almost any business, whether small, medium, or large scale, began using computers for day-to-day activities. Now, what is the use of all this data? Up to the early 1990s the answer was "not much". No one was really interested in utilizing the data accumulated during the course of daily activities. As a result, a new discipline in Computer Science, Data Mining, gradually evolved. Data mining is becoming a pervasive technology in activities as diverse as using historical data to predict the success of a marketing campaign, looking for patterns in financial transactions to discover illegal activities, or analyzing genome sequences. This paper deals with the application of data mining in various fields of our day-to-day life.
Keywords- Data Mining, Targeted Marketing, Market Basket Analysis, Customer Relations
______
About-1 Department of Computer Science, Government Arts College (Women), Krishnagiri-635 001, India (e-mail: [email protected])
About-2 Department of Computer Science, Government Arts College (Men), Krishnagiri-635 001, India
I. INTRODUCTION
Data Mining – An Overview
Data mining refers to extracting knowledge from large amounts of data. The data may be spatial data, multimedia data, time series data, text data, or web data. Data mining is a young discipline with wide and diverse applications; in this paper we discuss a few application domains of data mining, such as science and engineering, banking, business, telecommunication, and surveillance. Data mining is the process of extracting interesting, nontrivial, implicit, previously unknown, and potentially useful patterns or knowledge from huge amounts of data. It is the set of activities used to find new, hidden, or unexpected patterns in data. Using the information contained within a data warehouse, data mining can often provide answers to questions about an organization that a decision maker has previously not thought to ask:
Which products should be promoted to a particular customer? – Targeted Marketing
What is the probability that a certain customer will leave for a competitor? – Customer Relationship Management
What is the appropriate medical diagnosis for this patient? – Biomedical
What is the likelihood that a certain customer will default on or pay back a loan? – Banking
Which products are bought most often together? – Market Basket Analysis
How can fraudulent users in the telecommunication industry be identified? – Fraudulent pattern analysis
These types of questions can be answered quickly and easily if the information hidden among the huge amount of data in the databases can be located and utilized. We will discuss the applications of data mining in the following paragraphs.
II. APPLICATIONS OF DATA MINING
Although a large variety of data mining scenarios can be discussed, for the purpose of this paper the applications of data mining are divided into the following categories: science and engineering, business, banking, telecommunication, spatial data mining, and surveillance.
II. (A) Science and Engineering
Data mining has been widely used in areas of science and engineering such as bioinformatics, genetics, medicine, education, and electrical power engineering.
i) Biomedical and DNA Data Analysis
The past decade has seen an explosive growth in biomedical research, ranging from the development of new pharmaceuticals and cancer therapies to the identification and study of the human genome by discovering large-scale sequencing patterns and gene functions. Recent research in DNA analysis has led to the discovery of genetic causes for many diseases and disabilities, as well as approaches for disease diagnosis, prevention, and treatment. It is challenging to identify particular gene sequence patterns that play roles in various diseases. DNA data analysis is done in the following ways.[5]
Semantic integration of heterogeneous, distributed genome databases
Similarity search and comparison among DNA sequences
Identification of co-occurring gene sequences
Path analysis, including linking genes to different stages of disease development
Visualization tools and genetic data analysis
The data mining technique that is used to perform this task is known as Multifactor Dimensionality Reduction.[3]
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in the WHO global database of 4.6 million suspected adverse drug reaction incidents.[7] Recently, similar methodology has been developed to mine large collections of electronic health records for temporal patterns associating drug prescriptions with medical diagnoses.[8]
ii) Education
Another area of application for data mining in science and engineering is educational research, where data mining has been used to study the factors leading students to engage in behaviors which reduce their learning, and to understand the factors influencing university student retention.[6] A similar example of the social application of data mining is its use in expertise-finding systems, whereby descriptors of human expertise are extracted, normalized, and classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this way, data mining can facilitate institutional memory.
iii) Electrical power engineering
In the area of electrical power engineering, data mining techniques have been widely used for condition monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health status of the equipment's insulation. Data clustering such as the Self-Organizing Map (SOM) has been applied to the vibration monitoring and analysis of transformer On-Load Tap-Changers (OLTCs). Using vibration monitoring, it can be observed that each tap-change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Obviously, different tap positions will generate different signals. However, there was considerable variability amongst normal-condition signals for the exact same tap position. SOM has been applied to detect abnormal conditions and to estimate the nature of the abnormalities.[4]
Data mining techniques have also been applied to Dissolved Gas Analysis (DGA) on power transformers. DGA, as a diagnostic for power transformers, has been available for many years. Data mining techniques such as SOM have been applied to analyze the data and to determine trends which are not obvious to standard DGA ratio techniques such as the Duval Triangle.[4]
Data mining has also been applied to an integrated-circuit production line [2], where it is used in decision analysis for the problem of die-level functional test. Experiments demonstrate the ability of a system that mines historical die-test data to create a probabilistic model of patterns of die failure, which is then utilized to decide in real time which die to test next and when to stop testing. This system has been shown, based on experiments with historical test data, to have the potential to improve profits on mature IC products.
b) Banking
Banking data mining applications may, for example, need to track client spending habits in order to detect unusual transactions that might be fraudulent. Most banks and large financial institutions offer a wide variety of banking services (such as checking, saving, and business and individual customer transactions), credit (such as business, mortgage, and automobile loans), and investment services (such as mutual funds) [5]. They also offer insurance and stock services. Data mining can, for example, help in fraud detection by identifying groups of people who stage accidents to collect insurance money. The following methods are used for financial data analysis:
Loan payment prediction and customer credit policy analysis
Classification and clustering of customers for targeted marketing
Detection of money laundering and other financial crimes
c) Business
The retail industry collects huge amounts of data on sales, customer shopping history, goods transportation, consumption and service records, and so on. The quantity of data collected continues to expand rapidly, especially due to the increasing ease, availability, and popularity of business conducted on the web, or e-commerce. The retail industry thus provides a rich source for data mining. Retail data mining can help identify customer behavior, discover customer shopping patterns and trends, improve the quality of customer service, achieve better customer retention and satisfaction, enhance goods consumption ratios, design more effective goods transportation and distribution policies, and reduce the cost of business [5]. A few examples of data mining in the retail industry are as follows.
Design and construction of data warehouses based on the benefits of data mining
Multidimensional analysis of sales, customers, products, time, and region: the multi-feature data cube is a useful data structure in retail data analysis.
Another example of data mining, often called market basket analysis, relates to its use in retail sales. If a clothing store records the purchases of customers, a data-mining system could identify those customers who favor silk shirts over cotton ones. Although some explanations of relationships may be difficult, taking advantage of them is easier. This example deals with association rules within transaction-based data. Not all data are transaction based, and logical or inexact rules may also be present within a database. In a manufacturing application, an inexact rule may state that 73% of products which have a specific defect or problem will develop a secondary problem within the next six months.
Market basket analysis has also been used to identify the purchase patterns of the Alpha consumer. Alpha consumers are people that play key roles in connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. Analyzing the data collected on this type of user has allowed companies to predict future buying trends and forecast supply demands.
Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich history of customer transactions on millions of customers dating back several years. Data mining tools can identify patterns among customers and help identify the customers most likely to respond to upcoming mailing campaigns.
Analysis of the effectiveness of sales campaigns
Customer retention – analysis of customer loyalty: Goods purchased at different periods by the same customers can be grouped into sequences. Sequential pattern mining can be used to investigate changes in customer consumption and to suggest adjustments to the pricing and variety of goods in order to help retain customers and attract new ones. These applications enable marketing managers to understand the behaviors of their customers and also to predict the potential behavior of prospective customers. There are a wide variety of data mining applications available, particularly for business uses, such as Customer Relationship Management (CRM). A data mining technique may assist in the prediction of future customer retention. For example, a company may decide to increase prices, and could use data mining to predict how many customers might be lost for a particular percentage increase in product price.
Data mining can also be helpful to human-resources departments in identifying the characteristics of their most successful employees. Information obtained, such as the universities attended by highly successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic Enterprise Management applications help a company translate corporate-level goals, such as profit and margin share targets, into operational decisions, such as production plans and workforce levels.[1]
d) Telecommunication
The telecommunication industry offers local and long-distance telephone services and provides many other comprehensive communication services, including voice, fax, pager, cellular phone, images, e-mail, computer and web data transmission, and other data traffic. The integration of telecommunication, computer networks, the Internet, and numerous other means of communication and computing is underway. Moreover, with the deregulation of the telecommunication industry in many countries and the development of new computer and communication technologies, the telecommunication market is rapidly expanding and highly competitive. This creates a great demand for data mining in order to help understand the business involved, identify telecommunication patterns, catch fraudulent activities, make better use of resources, and improve the quality of service.
e) Spatial data mining
Spatial data mining is the application of data mining techniques to spatial data. It follows the same functions as data mining, with the end objective of finding patterns in geography. So far, data mining and Geographic Information Systems (GIS) have existed as two separate technologies, each with its own methods, traditions, and approaches to visualization and data analysis. In particular, most contemporary GIS have only very basic spatial analysis functionality. The immense explosion in geographically referenced data occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS emphasizes the importance of developing data-driven inductive approaches to geographical analysis and modeling. Data mining, which is the partially automated search for hidden patterns in large databases, offers great potential benefits for applied GIS-based decision-making. Recently, the task of integrating these two technologies has become critical, especially as various public- and private-sector organizations possessing huge databases with thematic and geographically referenced data begin to realize the huge potential of the information hidden there. Among those organizations are:
Offices requiring analysis or dissemination of geo-referenced statistical data
Public health services searching for explanations of disease clusters
Environmental agencies assessing the impact of changing land-use patterns on climate change
Geo-marketing companies doing customer segmentation based on spatial location
f) Surveillance
Data mining is used by intelligence agencies like the FBI and CIA to identify threats of terrorism. After the 9/11 incident it has become one of the prime means to uncover terrorist plots. However, this has led to concerns among the people, as data collected for such work undermines the privacy of a large number of people. Two plausible data mining techniques in the context of combating terrorism include "pattern mining" and "subject-based data mining".
i) Pattern mining
"Pattern mining" is a data mining technique that involves finding existing patterns in data. Regarding pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity — these patterns might be regarded as small signals in a large ocean of noise."[9][10][11] Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported into classical knowledge discovery search techniques.
ii) Subject-based data mining
"Subject-based data mining" is a data mining technique involving the search for associations between individuals in data. In the context of combating terrorism, the National Research Council provides the following definition: "Subject-based data mining uses an initiating individual or other datum that is considered, based on other information, to be of high interest, and the goal is to determine what other persons or financial transactions or movements, etc., are related to that initiating datum."[9]
g) Text Mining and Web Mining
Text mining is the process of searching large volumes of documents for certain keywords or key phrases. By searching literally thousands of documents, various relationships between the documents can be established. An extension of text mining is web mining. Web mining is an exciting new field that integrates data and text mining within a website. The web serves as a huge, widely distributed, global information service center for news, advertisements, consumer information, financial management, education, government, e-commerce, and many other information services. Web mining enhances a web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer. Web mining is especially exciting because it enables tasks that were previously difficult to implement. Web mining tools can be configured to monitor and gather data from a wide variety of locations and can analyze the data across one or multiple sites. For example, search engines work on the principle of data mining.
III. NEED OF DATA MINING
The massive growth of data is due to the wide availability of data in automated form from various sources such as the WWW, business, science, society, and many more. Data is useless if it cannot deliver knowledge. That is why data mining is gaining wide acceptance in today's world. A lot has been done in this field, and a lot more needs to be done.
IV. CONCLUSION
Since data mining is a young discipline with wide and diverse applications, there is still a nontrivial gap between the general principles of data mining and domain-specific, effective data mining tools for particular applications. The aim of this paper was to study application domains of data mining such as science and engineering, banking, business, and telecommunication. Data mining remains a young field with many issues that still need to be researched in depth, and the diversity of data, data mining tasks, and approaches poses many challenging research issues. The design of data mining languages, the development of efficient and effective data mining methods, the construction of interactive and integrated data mining environments, and the application of data mining techniques to solve large application problems are important tasks for data mining researchers.
V. REFERENCES
1) Ellen Monk, Bret Wagner (2006). Concepts in Enterprise Resource Planning, Second Edition. Thomson Course Technology, Boston, MA. ISBN 0-619-21663-8. OCLC 224465825.
2) Tony Fountain, Thomas Dietterich & Bill Sudyka (2000). Mining IC Test Data to Optimize VLSI Testing, in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. (pp. 18-25). ACM Press.
3) Xingquan Zhu, Ian Davidson (2007). Knowledge Discovery and Data Mining: Challenges and Realities. Hershey, New York. pp. 18. ISBN 978-159904252-7.
4) A.J. McGrail, E. Gulski et al. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRE WG 15.11 of Study Committee 15.
5) Jiawei Han & Micheline Kamber (2001). Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, CA, USA.
6) J.F. Superby, J-P. Vandamme, N. Meskens. "Determination of factors influencing the achievement of the first-year university students using data mining methods". Workshop on Educational Data Mining 2006.
7) Bate A, Lindquist M, Edwards IR, Olsson S, Orre R, Lansner A, De Freitas RM. A Bayesian neural network method for adverse drug reaction signal generation. Eur J Clin Pharmacol. 1998 Jun;54(4):315-21.
8) Norén GN, Bate A, Hopstadius J, Star K, Edwards IR. Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining SIGKDD 2008, pages 963-971. Las Vegas NV, 2008.
9) National Research Council, Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment. Washington, DC: National Academies Press, 2008.
10) R. Agrawal et al., Fast discovery of association rules, in Advances in Knowledge Discovery and Data Mining, pp. 307-328. MIT Press, 1996.
11) Stephen Haag et al. (2006). Management Information Systems for the information age. Toronto: McGraw-Hill Ryerson. pp. 28. ISBN 0- 07-095569-7. OCLC 63194770.
A Novel Decision Scheme for Vertical Handoff in 4G Wireless Networks
E. Arun1 R.S Moni2
Abstract— Future wireless networks will consist of multiple heterogeneous access technologies such as UMTS, WLAN, and WiMAX. These technologies differ greatly in network capacity, data rates, and other parameters such as power consumption, Received Signal Strength, and coverage area. This paper presents two handoff decision schemes for heterogeneous networks. A good handoff decision can avoid redundant handoffs and reduce packet loss. The first scheme uses a score function, with bandwidth, Received Signal Strength (RSS), and access fee as its parameters, to find the best network at the best time from a set of neighboring networks. The second scheme uses the classic triangle problem to find the best network from a set of neighboring networks: it treats the three parameters (bandwidth, RSS, and access fee) as the three sides of a triangle. If an equilateral triangle is obtained with these parameters for a network, then that network is the best among the set of networks. The best decision model not only meets individual user needs but also improves whole-system performance by reducing unnecessary handoffs.

Keywords— MIHF, Received Signal Strength, Mobility Management, vertical handoff

I. INTRODUCTION

Currently, various wireless networks are deployed around the world. Examples include second and third generation (3G) cellular networks (e.g., GSM/GPRS, UMTS, CDMA2000), wireless local area networks (WLANs, e.g., IEEE 802.11a/b/g), and personal area networks (e.g., Bluetooth). All these wireless networks are heterogeneous in the sense of their different radio access technologies. From this fact, it follows that no single access technology or service provider can offer the ubiquitous coverage expected by users requiring connectivity anytime and anywhere. The current trend is to integrate complementary wireless technologies with overlapping coverage, to provide the expected ubiquitous coverage and to achieve the "Always Best Connected" (ABC) concept. The ABC concept allows the user to use the best available access network. To accomplish the integration and interworking between heterogeneous wireless networks and the ABC concept, many challenging research problems have to be solved, taking into account that all these new wireless technologies were designed without considering any interworking among them. In heterogeneous wireless networks, mobile devices or mobile stations will be equipped with multiple network interfaces to access different wireless networks. Users will expect to continue their connections without any disruption when they move from one network to another. This important process in wireless networks is referred to as handoff or handover. A handoff among networks using different access technologies is defined as a vertical handoff (VHO) [1]; that is, vertical handoff is the process of changing the connection among different types of wireless and mobile networks. Obviously, network selection and the vertical handoff decision are two important processes in an integrated wireless and mobile network. The handoff process is initiated by a change in factors such as Received Signal Strength (RSS) and Signal to Noise Ratio (SNR). When these factors fall below a threshold value, the Mobile Node (MN) has to search for another AP with RSS greater than the threshold value [2, 3]. Wang et al. introduced policy-enabled handoff in [4], which was followed by several papers on similar approaches. Policy-enabled handoff systems separate the decision making (i.e., which is the "best" network and when to handoff) from the handoff mechanism. The Smart Decision Model [5] performs vertical handoff among available network interfaces: using a well-defined score function, the model hands off to the "best" network interface at the "best" moment according to the properties of the available network interfaces, system configuration information, and user preferences. A handoff decision scheme with guaranteed QoS for heterogeneous networks [6] makes the decision according to the user's communication type and the performance of the networks. A generic vertical handoff decision function [7] considers the different factors and metric qualities that indicate whether or not a handoff is needed; it enables devices to assign weights to different network factors such as monetary cost, quality of service, power requirements, and personal preferences. A decision strategy in [8] considers the performance of the whole system while taking VHO decisions that meet individual needs: it selects the best network based on the highest received signal strength (RSS) and lowest Variation of Received Signal Strength (VRSS), thus ensuring high system performance by reducing unnecessary handoffs. Nasser et al. [9] proposed a VHO decision (VHD) method that simply estimates the service quality for

______
About-1: Assistant Professor, Dept. of Computer Science & Engineering, Noorul Islam University, Thuckalay, Tamil Nadu, India (e-mail: [email protected])
About-2: Senior Professor, Dept. of Electronics & Communication Engineering, Noorul Islam University, Thuckalay, Tamil Nadu, India (e-mail: [email protected])
Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver. 1.0 July 2010

available networks and selects the network with the best quality. However, many challenges still lie ahead in integrating cellular networks and WLANs.
This paper is organized as follows. In Section II, we introduce our proposed system model for an integrated wireless and mobile network. In Section III, different handoff decision strategies are presented. In Section IV, we analyze the performance of the proposed strategy. Finally, we conclude the paper in Section V.

II. SYSTEM MODEL

Fig 1: Vertical handoff in heterogeneous networks

As shown in Fig. 1, at a given time an MN can be within the coverage area of a UMTS network alone. However, due to mobility, it can move into regions covered by more than one access network, i.e., simultaneously within the coverage areas of, for example, a UMTS BS and an IEEE 802.11 AP. Multiple IEEE 802.11 WLAN coverage areas are usually contained within a UMTS coverage area. A Worldwide Interoperability for Microwave Access (WiMAX) coverage area can overlap with WLAN and/or UMTS coverage areas. In dense urban areas, even the coverage areas of multiple UMTS BSs can overlap. Thus, at any given time, a choice of an appropriate attachment point (BS or AP) must be made for each MN. These access technologies differ in bandwidth, power consumption, RSS threshold, data rate, jitter, delay, etc., so during handoff it is necessary to find the best network according to user preferences. At hotspots, APs are made available. When the Received Signal Strength of an AP drops below some threshold value, the Mobile Host has to find another best network, considering bandwidth, RSS, and access fee as parameters. Each of these parameters is given a weight according to preferences. If no suitable AP is available, handoff has to be performed to the Base Station of the UMTS network. Thus, multiple access technologies and multiple operators are typically involved in the network selection decision. The network selection decision making algorithm is implemented in Network Selection Decision Controllers (NSDCs) located in the access networks. Decision input for the NSDCs is obtained via the Media Independent Handover Function (MIHF). The MIHF of the NSDC facilitates standards-based message exchanges between the various access networks or attachment points to share information about the traffic load, available bandwidth, RSS, and other network capabilities of each AP. The NSDC obtains Link Layer Triggers (LLTs) from the MN via the MIHF. An LLT regarding an MN indicates two possibilities: a) the RSS for the MN dropped below some specific threshold while the MN was in service at an AP, or b) the RSS for one or more APs exceeded a specific threshold while the MN was in service at a BS. Usually an AP is the preferred attachment point over a BS, since an AP is associated with higher bandwidth, lower cost, and higher data rate. When the NSDC obtains an LLT, it executes the network selection decision algorithm and finds the best AP; if no better AP is found for handoff, the cellular network is selected as the best available network.

III. NETWORK SELECTION DECISION MAKING ALGORITHMS

Most existing network selection strategies focus only on the individual user's needs. The motivation of this paper is to design a network-selection strategy from a system's perspective that can also meet an individual user's needs. In the following, we discuss how our proposed network-selection strategy works.

A. Algorithm 1

1) Handoff Initiation:
The MN can be in service with an AP or a BS. When the RSS drops below some threshold value, or when the RSS of any AP rises above some threshold value while the MN is in service with a BS, the MN has to find the best network to which to perform handoff. When the RSS goes low, the MN sends a link layer trigger to the Network Selection Decision Controller in the network to which it currently connects. Thus the handoff process is initiated.

2) Handoff Decision:
When the handoff process is initiated, the Network Selection Decision Controller collects the condition of each neighboring network via the Media Independent Handover Function (MIHF) and executes the NSDC algorithm. The algorithm first calculates the score of the current network and compares it with the score of each neighboring network. The score of a neighboring network is calculated only if all its parameters have satisfying values to accept a Mobile Host. Our proposed network-selection strategy prefers a call to be accepted by a network with lower traffic load and stronger received signal strength, which achieves better traffic balance among the different types of networks and good service quality. Consequently, we define a score function that combines these two factors, the traffic load and the received signal strength. The score to use a network Ni for a call is defined as

Score = Σ (j = 1 to k) Wj · Normj    (1)

where k is the number of parameters, Wj is the weight assigned to parameter j, and Normj is the normalized value of parameter j. If a network with a higher score is available, handoff is performed to that network; if no network with an optimum score is available, handoff is performed to the BS.
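The selection step built around the score function of Eq. (1) can be sketched as a minimal Python stub. The network names, weights, and already-normalized parameter values below are illustrative assumptions, not data from the paper:

```python
# Hedged sketch of the NSDC decision step around Eq. (1):
# Score = sum_j W_j * Norm_j over pre-normalized parameters in [0, 1].

def score(norm_params, weights):
    """Eq. (1): weighted sum of normalized parameter values."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights sum to 1
    return sum(weights[j] * norm_params[j] for j in weights)

def select_network(current, neighbours, weights):
    """Hand off to the neighbour with the highest score if it beats the
    current network; otherwise fall back to the cellular BS."""
    best_name, best_score = None, score(current, weights)
    for name, params in neighbours.items():
        s = score(params, weights)
        if s > best_score:
            best_name, best_score = name, s
    return best_name if best_name is not None else "UMTS-BS"

weights = {"bandwidth": 0.5, "rss": 0.3, "fee": 0.2}       # illustrative
current = {"bandwidth": 0.3, "rss": 0.4, "fee": 0.6}       # score ~0.39
neighbours = {
    "WLAN-1": {"bandwidth": 0.9, "rss": 0.7, "fee": 0.8},  # score ~0.82
    "WLAN-2": {"bandwidth": 0.2, "rss": 0.3, "fee": 0.9},  # score ~0.37
}
print(select_network(current, neighbours, weights))  # WLAN-1
```

With an empty neighbour list the stub falls back to the BS, mirroring the "handoff to BS" branch of the algorithm.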
Scorei = wg·Gi + ws·Si + wf·Fi    (2)

where Gi is the complement of the normalized utilization of network Ni, Si is the relative received signal strength from network Ni, Fi is the normalized access fee of network Ni, and wg (0 ≤ wg ≤ 1), ws (0 ≤ ws ≤ 1), and wf (0 ≤ wf ≤ 1) are the weights that give preference to Gi, Si, and Fi, respectively. The larger the weight of a specific factor, the more important that factor is to the user, and vice versa. The constraint between wg, ws, and wf is given by

wg + ws + wf = 1    (3)

Even though we could simply add the different factors in the VHDF to obtain the network score, each network parameter has a different unit, which leads to the necessity of normalization. The complement of the normalized utilization Gi is defined by

Gi = Bi^f / Bi    (4)

where Bi^f is the number of available bandwidth units in network Ni and Bi is the total number of bandwidth units in network Ni.
In general, a stronger received signal strength indicates better signal quality. Therefore, an originating call prefers to be accepted by a network that has higher received signal strength. However, it is difficult to compare the received signal strength among different types of wireless and mobile networks because they have different maximum transmission powers and receiver thresholds. As a result, we propose to use the relative received signal strength to compare different types of wireless and mobile networks. Si in (2) is defined by

Si = (Pi^c − Pi^th) / (Pi^max − Pi^th)    (5)

where Pi^c is the current received signal strength from network Ni, Pi^th is the receiver threshold in network Ni, and Pi^max is the maximum transmitted signal strength in network Ni. Note that we consider only the path loss in the radio propagation model. Consequently, the received signal strength (in decibels) in network Ni is given by

Pi^c = Pi^max − 10γ log(ri)    (6)

where ri is the distance between the mobile user and the BS (or AP) of network Ni, and γ is the fading factor. Therefore, the receiver threshold in network Ni is given by

Pi^th = Pi^max − 10γ log(Ri)    (7)

where Ri is the radius of the cell of network Ni. The relative received signal strength from network Ni can then be rewritten as

Si = 1 − log(ri) / log(Ri)    (8)

The normalized access fee Fi is given by

Fi = 1 − φi / φmax    (9)

where φmax is the highest access fee that the mobile user is willing to pay and φi is the access fee to use network Ni. The mobile user does not connect to a network that charges more than φmax.
If an originating call has more than one connection option, the scores of all candidate networks are calculated using the score function in (2). The originating call is accepted by the network that has the largest score, which indicates the "best" network. If there is more than one "best" network, the originating call is randomly accepted by any one of these "best" networks.

Flow chart

Fig 2: Handoff decision Algorithm 1

Note that this algorithm checks only whether bandwidth is available, not whether it is greater than a threshold. As the available bandwidth decreases, i.e., as the load increases, there is a greater chance for the RSS to go low; thus the call dropping probability increases and the holding time decreases. Moreover, in this algorithm, if any one of the parameters has a high value, the score increases even if the others have low values.

B. Algorithm 2

1) Handoff Initiation:
The MN can be in service with an AP or a BS. When the RSS drops below some threshold value, or when the RSS of any AP rises above some threshold value while the MN is in service with a BS, the MN has to find the best network to which to perform handoff. When the RSS goes low, the MN sends a link layer trigger to the Network Selection Decision Controller in the network to which it currently connects. Thus the handoff process is initiated.

2) Handoff Execution:
Handoff execution is based on the classic triangle problem. We consider triangles representing the conditions of the networks, with each side of the triangle corresponding to one parameter. The parameters considered in this paper are Received Signal Strength, bandwidth, and access cost. If all the parameters have the desired value (the value the MN expects), the resulting triangle is equilateral (S1 = S2 = S3 = a, three sides equal); if two of the parameters have the desired value, the triangle is isosceles (S1 ≠ S2 = S3 or S1 = S2 ≠ S3, two sides equal). If S1 ≠ S2 ≠ S3, the triangle is scalene. The networks that give an equilateral triangle are placed in candidate list 1 and those that give an isosceles triangle in candidate list 2. One network from list 1 is selected as the best network; if list 1 is empty, the best network is selected from list 2. Handoff is then performed to the selected best network. If both lists are empty, handoff is performed to the BS.

Flow chart
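The triangle test just described can be sketched in Python as follows. The 0-to-1 "side lengths", the equality tolerance, and the network names are illustrative assumptions, not values from the paper:

```python
# Hedged sketch of Algorithm 2's triangle classification: each network's
# RSS, bandwidth, and access-cost scores become the sides S1, S2, S3.

TOL = 1e-6  # assumed tolerance for "equal" sides

def classify(s1, s2, s3):
    """Equilateral, isosceles, or scalene, per the classic definition."""
    eq12, eq23, eq13 = abs(s1 - s2) < TOL, abs(s2 - s3) < TOL, abs(s1 - s3) < TOL
    if eq12 and eq23:
        return "equilateral"
    if eq12 or eq23 or eq13:
        return "isosceles"
    return "scalene"

def pick_network(networks):
    """Candidate list 1: equilateral; list 2: isosceles; else handoff to BS."""
    list1 = [n for n, sides in networks.items() if classify(*sides) == "equilateral"]
    list2 = [n for n, sides in networks.items() if classify(*sides) == "isosceles"]
    if list1:
        return list1[0]
    if list2:
        return list2[0]
    return "UMTS-BS"

nets = {
    "WLAN-1": (0.8, 0.8, 0.8),   # all three parameters at the desired value
    "WLAN-2": (0.8, 0.8, 0.5),   # two of three
    "WiMAX":  (0.9, 0.5, 0.3),   # none
}
print(pick_network(nets))  # WLAN-1
```

If every candidate yields a scalene triangle, both lists stay empty and the stub returns the BS, matching the fallback described in the text.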
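For concreteness, the normalized factors of Eqs. (4), (8), and (9) and the score of Eq. (2) can be sketched as small Python helpers. The sample numbers are illustrative, not taken from the paper's simulation:

```python
import math

# Hedged sketch of the normalized factors feeding Eq. (2). Note that the
# fading factor gamma cancels out of the relative RSS in Eq. (8).

def g_factor(free_units, total_units):
    """Eq. (4): complement of normalized utilization, G_i = B_i^f / B_i."""
    return free_units / total_units

def s_factor(r, R):
    """Eq. (8): relative RSS, S_i = 1 - log(r_i)/log(R_i), with r the
    user-to-BS distance and R the cell radius (r <= R)."""
    return 1.0 - math.log(r) / math.log(R)

def f_factor(fee, fee_max):
    """Eq. (9): normalized access fee, F_i = 1 - phi_i / phi_max."""
    return 1.0 - fee / fee_max

def score_i(wg, ws, wf, G, S, F):
    """Eq. (2) under the constraint wg + ws + wf = 1 of Eq. (3)."""
    assert abs(wg + ws + wf - 1.0) < 1e-9
    return wg * G + ws * S + wf * F

G = g_factor(20, 50)            # 20 of 50 bandwidth units free -> 0.4
S = s_factor(r=10, R=100)       # user at 10 m in a 100 m cell -> 0.5
F = f_factor(fee=2.0, fee_max=5.0)  # -> 0.6
print(round(score_i(0.4, 0.4, 0.2, G, S, F), 3))
```

The log-ratio in `s_factor` is base-independent, so natural logarithms are used without loss of generality.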
Fig 3: Handoff decision Algorithm 2

The RSS can be measured as

Pi^c = Pi^max − 10γ log(ri)    (10)

where Pi^c is the current received signal strength from network Ni, ri is the distance between the mobile user and the BS (or AP) of the network, Pi^max is the maximum transmitted signal strength in network Ni, and γ is the fading factor.
The available bandwidth is given by:
Available bandwidth of the network = bandwidth of the network − sum of the bandwidth used by all MNs attached to the network.
The access fee is the fee assigned to the usage of each network. It may vary from network to network; users usually prefer a low network fee.

IV. PERFORMANCE ANALYSIS

Simulations have been performed for the 3G cell overlay structure. In this scenario, three networks of different data rates co-exist in the same wireless service area. Network 1 and Network 2 represent 802.11b wireless LANs, with bandwidths of 2 Mbps and 1 Mbps, respectively. Network 3 is modeled as a UMTS network, which supports multiple users simultaneously. The expected graphs are shown below.

Bandwidth (Mbps):                 10   20   30   40   50   60   70   80   90   100
Holding time (sec), Algorithm 1:  2.5  4.5  5.7  6.1  6.3  6.5  6.9  7    7    7
Holding time (sec), Algorithm 2:  3.5  5.5  6.5  7    7.5  8    8.5  9    9.5  9.5

Fig 4: Holding time vs Bandwidth

RSS (dBm):                                  5     10    20    30    40    50    60
Call dropping probability, Algorithm 1:     0.8   0.75  0.7   0.65  0.55  0.4   0.3
Call dropping probability, Algorithm 2:     0.5   0.4   0.3   0.25  0.2   0.1   0.09

Fig 6: Call dropping probability vs RSS

V. CONCLUSION

This paper describes two different handoff decision algorithms. The first algorithm uses a score function to find the best network at the best time from a set of neighboring networks. The second algorithm uses the classic triangle problem to find the best network from a set of neighboring networks: if an equilateral triangle is obtained with the three parameters of a network, then that network is the best among the set of networks. Since the second algorithm performs handoff only if the parameters are above their threshold values, the call dropping probability is reduced and the holding time is increased.

VI. REFERENCES

1) Enrique Stevens-Navarro, Ulises Pineda-Rico, and Jesus Acosta-Elias, "Vertical Handover in beyond Third Generation (B3G) Wireless Networks", International Journal of Future Generation Communication and Networking, pp. 51-58, 2008.
2) K. Ayyappan and P. Dananjayan, "RSS measurement for vertical handoff in heterogeneous network", Journal of Theoretical and Applied Information Technology, pp. 989-994, 2005.
3) Kemeng Yang, Iqbal Gondal, Bin Qiu and Laurence S. Dooley, "Combined SINR Based Vertical Handoff Algorithm for Next Generation Heterogeneous Wireless Networks", Global Telecommunications Conference, 2007 (GLOBECOM '07), pp. 4483-4487, Nov 2007,
Digital Object Identifier 10.1109/GLOCOM.2007.852
4) Wang, R. Katz, and J. Giese, "Policy-enabled handoffs across heterogeneous wireless networks", WMCSA '99, Feb 1999, pp. 51-60, Digital Object Identifier 10.1109/MCSA.1999.749277.
5) L.-J. Chen, T. Sun, B. Chen, V. Rajendran, and M. Gerla, "A Smart Decision Model for Vertical Handoff", The 4th ANWIRE International Workshop on Wireless Internet and Reconfigurability (ANWIRE 2004), May 2004.
6) Ying-Hong Wang, Chih-Peng Hsu, Kuo-Feng Huang, and Wei-Chia Huang, "Handoff decision scheme with guaranteed QoS in heterogeneous network", pp. 138-143, 2008, Digital Object Identifier 10.1109/UMEDIA.2008.4570879.
7) Ahmed Hasswa, Nidal Nasser, and Hossam Hassanein, "Generic Vertical Handoff Decision Function for Heterogeneous Wireless Networks", Wireless and Optical Communications Networks, 2005 (WOCN 2005), pp. 239-243, Mar 2005, Digital Object Identifier 10.1109/WOCN.2005.1436026.
8) W. Shen and Q.-A. Zeng, "A Novel Decision Strategy of Vertical Handoff in Overlay Wireless Networks", Fifth IEEE International Symposium on Network Computing and Applications, 2006, pp. 227-230, Digital Object Identifier 10.1109/NCA.2006.5.
9) N. Nasser, A. Hasswa, and H. Hassanein, "Handoffs in fourth generation heterogeneous networks", IEEE Commun. Mag., vol. 44, no. 10, pp. 96-103, Oct. 2006, Digital Object Identifier 10.1109/MCOM.2006.1710420.
10) Olga Ormond, Philip Perry and John Murphy, "Network Selection Decision in Wireless Heterogeneous Networks", 2005 IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications, Volume 4, pp. 2680-2684, Sept 2005, Digital Object Identifier 10.1109/PIMRC.2005.1651930.
11) Wei Shen and Qing-An Zeng, "Cost-Function-Based Network Selection Strategy in Integrated Wireless and Mobile Networks", IEEE Trans. Veh. Technol., vol. 57, no. 6, pp. 3778-3788, Nov. 2008, Digital Object Identifier 10.1109/TVT.2008.917257.
Hybrid Approach for Template Protection in Face Recognition System
Sheetal Chaudhary1, Rajender Nath2
Abstract— Biometrics deals with identifying individuals with the help of their biological (physiological and behavioral) data. The security of biometric systems has, however, been questioned, and previous studies have shown that they can be fooled with artificial artifacts. Biometric recognition systems also face challenges arising from intra-class variations and attacks upon template databases. To tackle such problems, a hybrid approach for liveness detection and template protection in a face recognition system is proposed. The system captures the input face image in three different poses (left, front, right) based upon the order chosen by a random select module. This approach performs live face detection based upon the complete body movement of the person to be recognized, and template protection by randomly shuffling and adding the components of the feature set resulting from the fusion of the three poses of the input face image. It overcomes the limitations imposed by intra-class variations and spoof attacks in face recognition systems. The resulting hybrid template is more secure, as the original biometric template is not stored in the database; rather, it is stored after applying some changes (shuffling and addition) to its components. Thus the proposed approach has higher security and better recognition performance compared to the case when no measures are used for live face checking and template protection in the database.

Keywords— Liveness detection, template protection, face recognition, multiple sample fusion, eigen-coefficients

I. INTRODUCTION

The term biometrics is derived from the Greek words bios and metron, which translate as "life measurement". Biometrics are not secrets and therefore should be properly protected. A good biometric system should depend not only on the security of the biometric data; the authentication process must also check the liveness of the biometric data. People leave fingerprints behind on everything they touch, and the iris can be observed anywhere they look. Our facial images are recorded every time we enter a bank, railway station, or supermarket [1]. Once biometric measurements are disclosed, they cannot be changed (unless the user is willing to have an organ transplant). The only way to make a system secure is to make sure that the data presented came from a real person and was obtained at the time of authentication. Liveness detection in a biometric system means the capability of the system to detect, during enrollment and identification/verification, whether or not the biometric sample presented to the system is alive. It must also check that the presented biometric sample belongs to the live human being who was originally enrolled in the system, and not just any live human being. It is well known that fingerprint systems can be fooled with artificial fingerprints, static facial images can be used to fool face recognition systems, and static iris images can be used to fool iris recognition systems [2].
Multimodal biometric systems consolidate the evidence presented by multiple biometric sources of information and are expected to be more reliable due to the presence of multiple, fairly independent pieces of evidence [3]. Intra-class variations in a face recognition system can be overcome with multimodal biometric systems. Figure 1 shows the intra-class variation associated with an individual's face image: due to the change in pose, a face recognition system will not be able to match these three images successfully, even though they belong to the same individual [4]. A multibiometric system can be classified into five categories (multi-sensor, multi-algorithm, multi-instance, multi-sample and multimodal) depending upon the evidence presented by multiple sources of biometric information. A multi-sample system can be used to tackle intra-class variations: a single sensor is used to acquire multiple samples of the same biometric trait in order to account for the variations that can occur in the trait. It is an inexpensive way of improving system performance, since it requires neither multiple sensors nor multiple feature extraction and matching modules [5] [6].

Fig. 1: Intra-class variation associated with an individual's face image

______
About-1: University Research Scholar, Department of Comp. Sc. & App., K.U., Kurukshetra, Haryana, India (e-mail: [email protected])
About-2: Associate Professor, Department of Comp. Sc. & App., K.U., Kurukshetra, Haryana, India (e-mail: [email protected])

One of the properties that makes biometrics so attractive for authentication purposes is their invariance over time. One of the most serious vulnerabilities of biometrics is that once a biometric image or template is stolen, it is stolen forever and cannot be reissued, updated or destroyed [7]. One of the most potentially damaging attacks on a biometric system is against the biometric template database. Attacks on the template can lead to the following three vulnerabilities: (i)
A template can be replaced by an impostor's template to gain unauthorized access; (ii) a physical spoof can be created from the template to gain unauthorized access to the system (as well as to other systems which use the same biometric trait); and (iii) the stolen template can be replayed to the matcher to gain unauthorized access [8].
The proposed hybrid approach provides three main advantages: it handles intra-class variation, performs a live face check, and provides protection against attacks on the template database. The rest of the paper is organized as follows. Section 2 addresses the literature study. In Section 3, face feature set extraction using PCA is discussed. In Section 4, the architecture of the proposed approach is presented. Section 5 discusses the advantages of the proposed approach. Finally, the summary and conclusions are given in the last section.

II. RELATED WORK

In recent years face recognition has received substantial attention from both research communities and the market, but it still remains very challenging in real applications. A lot of face recognition algorithms have been developed during the past decades. Face recognition starts by localizing the most characteristic face components (eyes, nose, mouth, etc.) within images that depict human faces. This step is essential for the initialization of many face processing techniques like face tracking, facial expression recognition or face recognition. Among these, face recognition is a lively research area where a great effort has been made in recent years to design and compare different techniques [9]. Hong and Jain [10] designed a decision fusion scheme to combine faces and fingerprints for personal identification. Brunelli and Falavigna [11] presented a person identification system combining the outputs of classifiers based on audio and visual cues. Face recognition algorithms are categorized into appearance-based and model-based schemes. For appearance-based methods, three linear subspace analysis schemes are presented (PCA, LDA, and ICA) [12]. The model-based approaches include Elastic Bunch Graph matching [13], Active Appearance Model [14] and 3D Morphable Model [15] methods. Among face recognition algorithms, appearance-based approaches are the most popular. These approaches utilize the pixel intensity or intensity-derived features.
The template protection schemes proposed in the literature can be broadly classified into two categories: the feature transformation approach and the biometric cryptosystem approach [8]. In the feature transformation approach, a transformation function is applied to the biometric template and only the transformed template is stored in the database. The same transformation function is applied to the query features, and the transformed query is directly matched against the transformed template. Depending on the characteristics of the transformation function, feature transform schemes can be further categorized as salting and non-invertible transforms. In a biometric cryptosystem, some public information about the biometric template is stored. This public information is referred to as helper data, and hence biometric cryptosystems are also known as helper data-based methods. While the helper data does not reveal any significant information about the original biometric template, it is needed during matching to extract a cryptographic key from the query biometric features. Matching is performed indirectly by verifying the validity of the extracted key. Biometric cryptosystems can be further classified as key binding and key generation systems depending on how the helper data is obtained [16].
Liveness detection can be performed either at the acquisition stage or at the processing stage. There are two approaches to determining if a biometric trait is alive or not: liveness detection and non-liveness detection [2]. Liveness detection, which aims at recognizing human physiological activities as the liveness indicator to prevent spoofing attacks, is becoming a very active topic in the fields of fingerprint recognition and iris recognition, but efforts on live face detection are still very limited, even though live face detection is highly desirable. The most common way of faking is to use a facial photograph of a valid user to spoof face recognition systems. Most current face recognition systems with excellent performance are based on intensity images and are equipped with a generic camera. Thus, an anti-spoofing method without additional devices is preferable, since it can be easily integrated into existing face recognition systems [17] [18].

III. FEATURE EXTRACTION

Facial recognition is the identification of humans by the unique characteristics of their faces. It has attracted a lot of attention because of its potential applications. Among face recognition algorithms, appearance-based approaches (PCA, LDA, and ICA) are the most popular. These approaches utilize the pixel intensity or intensity-derived features [12]. In this paper, the PCA method using eigenfaces was adopted for face recognition. PCA is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences. The main idea of principal component analysis (or the Karhunen-Loeve transform) is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call "face space". Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces". Eigenfaces are a set of eigenvectors used in the computer vision problem of human face recognition. The eigenfaces are the principal components of a distribution of faces or, equivalently, the eigenvectors of the covariance matrix of the set of face images, where each image with N×N pixels is considered a point (or vector) in an N²-dimensional space [19]. The idea of using principal components to represent human faces was developed by Sirovich and Kirby [20] and used by Turk and Pentland [21] for face detection and recognition. Eigenfaces are mostly used to:
(a) Extract the relevant facial information, which may or may not be directly related to face features such as the eyes, nose, and lips. One way to do so is to capture the statistical variation between face images.
(b) Represent face images efficiently. To reduce the computation and space complexity, each face image can be represented using a small number of dimensions.
Mathematically, it is simply finding the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images, treating an image as a point or a vector in a very high dimensional space. Each eigenvector accounts for a different amount of the variation among the face images. These eigenvectors can be imagined as a set of features that together characterize the variation between face images [19].

IV. PROPOSED APPROACH

Figure 2 shows the block diagram of the proposed approach for template protection in a face recognition system. The main idea behind the proposed approach is to generate secure hybrid templates by integrating three different views (left, front and right) of the input face image and then changing the components of the resulting face feature set.

A. Random selection of three Facial views
This step is responsible for performing liveness detection. Here, the person to be recognized is required to stand in front of a camera which is focused upon the full height of the person. Based upon the random order (LFR, LRF, FLR, FRL, RFL or RLF; L: Left, F: Front, R: Right) generated by the random select module shown in Fig. 2, the person is asked to move left or right or look at the front. The camera is focused upon the entire body to examine the actual body movement, but it captures only the images of the face, in the order selected by the module. The module which generates the random order of the three views detects whether the person is live by instructing the person to move left or right or look at the front. The complete body movement is examined through the camera, and face images are captured only if the person is live. To perform liveness detection, the random select module can be equipped with the following decision process which first checks liveness and then
w performs person recognition
if data = live
    perform acquisition and extraction
else if data = not live
    do not perform acquisition and extraction
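The decision process above, together with the random view-order generation, can be sketched as follows (a minimal illustration; the list of orders comes from Section IV-A, while the function names and placeholder acquisition strings are hypothetical, not part of the proposed system's implementation):

```python
import random

# The six possible presentation orders of the three views
# (L: Left, F: Front, R: Right), as listed in Section IV-A.
VIEW_ORDERS = ["LFR", "LRF", "FLR", "FRL", "RFL", "RLF"]

def choose_view_order(rng=random):
    """Random-select module: pick one order for this session."""
    return rng.choice(VIEW_ORDERS)

def acquire_views(data_is_live, order):
    """Acquire and extract the three views only if the subject is live."""
    if data_is_live:
        return [f"capture_{view}_view" for view in order]  # placeholder acquisition
    return []  # do not perform acquisition and extraction

order = choose_view_order()
print(order, acquire_views(True, order))
```

Because the order is drawn afresh for every session, a replayed photograph or video recorded for one order fails the movement check for the next.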
B. Extraction of Feature sets of three Facial views
This step performs feature set extraction of the three views (Left, Front,
Right) of input face image by using PCA (appearance based)
face recognition technique. The PCA method is applied individually on each view of the face image to extract the
corresponding feature set. When using PCA, each face
image is assumed to be a 2-dimensional array of intensity
values. It is represented as 1-dimensional vector by
concatenating each row (or column) into a long thin vector.
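The flattening and projection just described can be illustrated in a few lines of NumPy (a hypothetical sketch with random 4x4 "images"; here the eigenvectors of the covariance matrix are obtained via an SVD of the centered data, which is one standard way to compute them):

```python
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 4, 4))      # N = 4 face images of 4x4 pixels
vectors = images.reshape(4, -1)          # rows concatenated into 1-D vectors
mean_face = vectors.mean(axis=0)
centered = vectors - mean_face
# Principal directions (eigenvectors of the covariance matrix) via SVD.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
basis = vt[:2]                           # keep the M = 2 leading eigenvectors
features = centered @ basis.T            # projection coefficients per image
print(features.shape)                    # each image reduced to 2 coefficients
```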
By projecting the face vector onto the basis vectors, the projection coefficients are used as the feature representation
of each face image. The PCA method using eigenfaces
consists of the following two stages [10]:
1) training stage, in which a set of N face images are
collected; eigenfaces that correspond to the M highest
eigenvalues are computed from the data set; and each face is
represented as a point in the M dimensional eigenspace, and
2) operational stage, in which each test image is first projected onto the M-dimensional eigenspace; the M-dimensional face representation is then deemed a feature vector and fed to a classifier to establish the identity of the individual.
For each face image, we obtain a feature vector by projecting the image onto the subspace generated by the principal directions of the covariance matrix. After applying the projection, the input vector (face) in an n-dimensional space is reduced to a feature vector in an m-dimensional subspace (m << n) [9].

Fig. 2: Architecture of proposed approach for template protection in face recognition system

The proposed approach can be roughly divided into the following four steps:
A. Random selection of three Facial views (Left, Front, Right) to perform Liveness Detection
B. Extraction of Feature sets of three Facial views
C. Fusion of Feature sets of three Facial views (Left + Front + Right)
D. Random shuffling and addition of components of the Eigen vector (resulting after fusion)

Thus, the feature vectors of the three individual face views can be represented in terms of eigen vectors as described below:
eigen vector for left face view: VL = [a1, a2, a3, a4 … am]
eigen vector for front face view: VF = [b1, b2, b3, b4 … bm]
eigen vector for right face view: VR = [c1, c2, c3, c4 … cm]
where VL, VF, VR represent the feature sets, in terms of eigen-coefficients, of the three views of the face image respectively.

C. Fusion of Feature sets of three Facial views
Fusion involves consolidating the evidence presented by two or more biometric feature sets of the same individual. This step performs fusion of the feature sets of the three face views of the same image at feature level [6]. Here, the three feature sets originate from the same feature extraction algorithm (PCA). Fusion of the three face views is performed by simply averaging them as given below:
X = (VL + VF + VR)/3    (1)
The resulting fused eigen vector can be represented as X = [x1, x2, x3, x4 … xm].

D. Random shuffling and addition of components of Eigen vector
This step in the proposed approach is responsible for performing changes in the eigen vector that is obtained after fusion of the three feature sets. It will make the resulting template more secure. This step is illustrated in fig. 3 below.

Fig. 3: Steps to generate secure hybrid template from input face images

The coefficients of eigen vector X are randomly shuffled. By shuffling, randomly chosen columns are interchanged, and every time we can generate a new eigen vector:
X′ = Hshuffle(X)    (2)
where Hshuffle is the function which performs shuffling on the eigen vector X and X′ is the shuffled eigen vector. The number of coefficients in X and X′ is the same; shuffling just changes the order of the columns. After that, addition among the coefficients of the shuffled eigen vector is performed in some order. The addition function is described below:
Addition = Σ (p = 1 to m−2) [xp + xp+2]    (3)
where after every two iterations p is incremented by 3.
Random shuffling of the coefficients in the eigen vector obtained after fusion of the three feature sets, and addition among the coefficients of the shuffled eigen vector, will generate the hybrid template that is finally stored in the system database. The resulting hybrid template will contain half the number of coefficients of the original eigen vector obtained in the previous step; the number of coefficients is reduced by the addition function. This approach makes the template more secure against spoof attacks and takes less memory in the database.

V. ADVANTAGE OF USING PROPOSED APPROACH
The basic idea of the proposed approach is that instead of storing the original template in the database, it is stored after performing fusion, shuffling and addition on the coefficients of the eigen vector. The proposed approach offers advantages in terms of liveness detection, intra-class variation, and template security by providing the ability to discard stolen template information. Here, multiple samples of the same biometric trait (face) are captured in order to account for the intra-class variations that can occur in the trait and to check liveness of the acquired biometric sample. This approach to liveness detection is natural and non-intrusive, and no extra hardware is required; but it requires user collaboration, by instructing the user to move left, right or stand in front of the camera.
It provides template security by performing fusion of the feature sets of three facial views (Left, Front, Right), random shuffling of the eigen-coefficients in the fused eigen vector, and addition among the shuffled eigen-coefficients. If the template is found to be compromised, the proposed approach provides the ability to discard it and reissue with new shuffling rules. In this way, a number of eigen vectors can be generated through shuffling. Also, it is impossible for the attacker to convert the stolen template into the original face data (PCA eigen vector). It is well known that each eigenface represents certain characteristic features of faces and any original image can be reconstructed by combining the eigenfaces in the right proportion. Hence, the original eigen vector is not stored in the database; rather, it is stored after applying the shuffling rules and then adding the shuffled coefficients according to the addition function discussed in the previous section. Addition reduces the size of the eigen vector by half and hence the final hybrid template generated will be compact and more secure. Thus the proposed scheme provides higher template security and better recognition performance as compared to the case when no measures for liveness detection and template protection are taken, as in the existing face recognition system using the eigenfaces approach.

VI. CONCLUSION
Biometric template protection has become one of the important issues in deploying a practical biometric system. In this paper, a hybrid approach for template protection in a face recognition system is proposed. This approach is based on the fusion of three different views (left, front and right views captured in random order) of the input face image, random shuffling of the coefficients in the eigen vector (extracted using the PCA method) obtained after fusion, and addition among the coefficients of the shuffled eigen vector. On a theoretical basis, it has been argued that the proposed approach provides better template protection against spoof attacks as compared to the existing method. One of the weaknesses of biometrics is that once biometric data or a template is stolen, it is stolen forever and cannot be reissued or discarded. Thus template security has become very critical in these systems. The proposed scheme provides new measures (shuffling and addition) for template protection by giving the ability to discard the lost template and reissue a new one.

VII. REFERENCES
1) Bori Toth, "Biometric Liveness Detection", Information Security Bulletin, vol. 10, pp. 291-297, October 2005.
2) International Biometric Group, "Liveness detection in biometric systems", white paper, 2003. Available at http://www.biometricgroup.com/reports/public/reports/liveness.html.
3) A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition", IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, pp. 4-20, Jan. 2004.
4) Arun Ross and Anil K. Jain, "Multimodal biometrics: An overview", in Proc. of 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, pp. 1221-1224, September 2004.
5) Arun Ross, "An Introduction to Multibiometrics", EUSIPCO, 2007.
6) A. Ross, K. Nandakumar, and A. K. Jain, Handbook of Multibiometrics, New York, Springer, 2006.
7) B. Schneier, "The uses and abuses of biometrics", Communications of the ACM, vol. 42, no. 8, p. 136, Aug. 1999.
8) A. K. Jain, K. Nandakumar, and A. Nagar, "Biometric Template Security", EURASIP Journal on Advances in Signal Processing, January 2008.
9) X. Lu, Y. Wang, and A. K. Jain, "Combining Classifiers for Face Recognition", in IEEE Conference on Multimedia & Expo, vol. 3, pp. 13-16, 2003.
10) L. Hong and A. K. Jain, "Integrating faces and fingerprints for personal identification", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 12, pp. 1295-1307, 1998.
11) R. Brunelli and D. Falavigna, "Person identification using multiple cues", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, no. 10, pp. 955-966, Oct. 1995.
12) Xiaoguang Lu, "Image Analysis for Face Recognition – A brief survey", Dept. of Computer Science & Engineering, Michigan State University, personal notes, May 2003.
13) L. Wiskott, J. M. Fellous, N. Kruger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, 1997.
14) G. J. Edwards, T. F. Cootes, and C. J. Taylor, "Face recognition using active appearance models", in Proc. European Conference on Computer Vision, 1998, vol. 2, pp. 581-695.
15) V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces", in Proc. ACM SIGGRAPH, Mar. 1999, pp. 187-194.
16) U. Uludag, S. Pankanti, S. Prabhakar, and A. K. Jain, "Biometric Cryptosystems: Issues and Challenges", vol. 92, no. 6, pp. 948-960, June 2004.
17) Gang Pan, Zhaohui Wu, and Lin Sun, "Liveness Detection for Face Recognition", in Recent Advances in Face Recognition, pp. 109-123, I-Tech, Vienna, Austria, December 2008.
18) Jiangwei Li, Yunhong Wang, Tieniu Tan, and A. K. Jain, "Live Face Detection Based on the Analysis of Fourier Spectra", Biometric Technology for Human Identification, Proceedings of SPIE, vol. 5404.
19) Y. Vijaya Lata, Chandra Kiran Bharadwaj Tungathurthi, H. Ram Mohan Rao, A. Govardhan, and L. P. Reddy, "Facial Recognition using Eigenfaces by PCA", International Journal of Recent Trends in Engineering, vol. 1, no. 1, May 2009.
20) L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces", Journal of the Optical Society of America A, vol. 4, pp. 519-524, 1987.
21) M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, March 1991.
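Taken together, the template-generation pipeline of Sections IV-C and IV-D (fusion in Eq. (1), shuffling in Eq. (2), addition in Eq. (3)) can be sketched as below. This is a minimal illustration with synthetic vectors, and the addition schedule implements one reading of "after every two iterations p is incremented with 3" that yields exactly m/2 coefficients when m is divisible by 4, consistent with the stated halving of the template length:

```python
import numpy as np

rng = np.random.default_rng(42)
m = 8                                    # illustrative eigen vector length
VL, VF, VR = (rng.normal(size=m) for _ in range(3))

X = (VL + VF + VR) / 3                   # Eq. (1): feature-level fusion

perm = rng.permutation(m)                # secret, reissuable shuffling rule
X_shuffled = X[perm]                     # Eq. (2): X' = Hshuffle(X)

def addition(x):
    """Eq. (3), under the assumed schedule p = 1, 2, 5, 6, 9, 10, ...
    (1-indexed); each step emits x_p + x_{p+2}, halving the length."""
    out, p = [], 0                       # p is the 0-indexed counterpart
    while p + 3 < len(x):
        out.append(x[p] + x[p + 2])
        out.append(x[p + 1] + x[p + 3])
        p += 4
    return np.array(out)

hybrid = addition(X_shuffled)            # compact template stored in the DB
print(hybrid.shape)
```

Because each coefficient of X' contributes to exactly one sum, the stored template preserves no per-coefficient information, and a new permutation can be issued if the template is compromised.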
QRS Wave Detection Using Multiresolution Analysis
S. Karpagachelvi1, Dr. M. Arthanari, Prof. & Head2, M. Sivakumar3
Abstract- The electrocardiogram (ECG or EKG) is basically a diagnostic tool that measures and records the electrical activity of the heart. It is most commonly used to perform cardiac tests, since it acts as a screening tool for cardiac abnormalities. This is necessary because no single point provides a complete picture of what is going on in the heart. The ECG mainly comprises the P-QRS-T waves, showing the corresponding time and frequency behavior. The P-QRS-T key feature detector is based on the wavelet transform, which is robust to time variation and noise. It analyzes the waveform, including noise purification and sample design of the digital ECG; the R peak is mainly used for detection. In this work, we have developed an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform. It mainly includes two stages. In the first stage, the discrete wavelet transform is used to de-noise the signal; in the second stage, multiresolution analysis is performed for QRS complex detection.
Keywords- Cardiac Cycle, ECG signal, P-QRS-T waves, Feature Extraction, Haar wavelets.

I. INTRODUCTION
The investigation of the ECG has been extensively used for diagnosing many cardiac diseases. The ECG is a realistic record of the direction and magnitude of the electrical commotion that is generated by depolarization and re-polarization of the atria and ventricles. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. Figure 1 shows a sample ECG signal. The majority of the clinically useful information in the ECG originates in the intervals and amplitudes defined by its features (characteristic wave peaks and time durations). The improvement of precise and rapid methods for automatic ECG feature extraction is of chief importance, particularly for the examination of long recordings [1].
The ECG feature extraction system provides fundamental features (amplitudes and intervals) to be used in subsequent automatic analysis. In recent times, a number of techniques have been proposed to detect these features [2] [3] [4]. The previously proposed methods of ECG signal analysis were based on time domain methods. But this is not always adequate to study all the features of ECG signals; therefore the frequency representation of a signal is required. The deviations in the normal electrical patterns indicate various cardiac disorders. Cardiac cells, in the normal state, are electrically polarized [5].
The ECG is essentially responsible for patient monitoring and diagnosis. The features extracted from the ECG signal play a vital role in diagnosing cardiac disease. The development of accurate and quick methods for automatic ECG feature extraction is of major importance; therefore it is necessary that the feature extraction system performs accurately. The purpose of feature extraction is to find as few properties as possible within the ECG signal that would allow successful abnormality detection and efficient prognosis. The proposed schemes were mostly based on Fuzzy Logic Methods, Artificial Neural Networks (ANN), Genetic Algorithm (GA), Support Vector Machines (SVM), and other signal analysis techniques.
In recent years, several research efforts and algorithms have been developed for the task of analyzing and classifying the ECG signal. The classifying methods which have been proposed during the last decade and are under evaluation include digital signal analysis, Fuzzy Logic methods, Artificial Neural Network, Hidden Markov Model, Genetic Algorithm, Support Vector Machines, Self-Organizing Map, Bayesian and other methods, with each approach exhibiting its own advantages and disadvantages. In this work, we have developed an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform using Haar coefficients and also provide an overview of various techniques and transformations used for extracting features from the ECG signal. This paper is structured as follows. Section 2 discusses the related work that was earlier proposed in literature for ECG feature extraction. Section 3 gives a description of the DWT-based ECG feature detection algorithm and Section 4 concludes the paper with a few discussions.

Figure 1: A Sample ECG Signal showing the P-QRS-T Wave

About-1: Doctoral Research Scholar, Mother Teresa Women's University, Kodaikanal, Tamilnadu, India (email: [email protected])
About-2: Dept. of Computer Science and Engineering, Tejaa Shakthi Institute of Technology for Women, Coimbatore-641 659, Tamilnadu, India (email: [email protected])
About-3: Doctoral Research Scholar, Anna University – Coimbatore, Tamilnadu, India (email: [email protected])

II. RELATED WORK
ECG feature extraction has been studied from early times, and many advanced techniques as well as transformations have been proposed for accurate and fast ECG feature extraction. This section of the paper discusses various techniques and transformations proposed earlier in literature for extracting features from the ECG.
A novel approach for ECG feature extraction was put forth by Castro et al. in [6]. Their paper presents an algorithm, based on the wavelet transform, for feature extraction from an electrocardiograph (ECG) signal and recognition of abnormal heartbeats, since wavelet transforms can be localized both in the frequency and time domains. They developed a method for choosing an optimal mother wavelet from a set of orthogonal and bi-orthogonal wavelet filter banks by means of the best correlation with the ECG signal. The coefficients, approximations of the last scale level and the details of all levels, are used for the analyzed ECG. They divided the coefficients of each cycle into three segments that are related to the P-wave, QRS complex, and T-wave. The summation of the values from these segments provided the feature vectors of single cycles.
Mahmoodabadi et al. in [1] described an approach for ECG feature extraction which utilizes Daubechies wavelet transforms. They developed and evaluated an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform. The ECG signals from Modified Lead II (MLII) were chosen for processing. The wavelet filter with a scaling function more closely similar to the shape of the ECG signal achieved better detection. The foremost step of their approach was to de-noise the ECG signal by removing the equivalent wavelet coefficients at higher scales. Then, QRS complexes are detected and each complex is used to trace the peaks of the individual waves, including onsets and offsets of the P and T waves which are present in one cardiac cycle.
A feature extraction method using the Discrete Wavelet Transform (DWT) was proposed by Emran et al. in [7]. They used a discrete wavelet transform (DWT) to extract the relevant information from the ECG input data in order to perform the classification task. Their proposed work includes the following modules: data acquisition, pre-processing, beat detection, feature extraction and classification. In the feature extraction module, the DWT is designed to address the problem of non-stationary ECG signals. It is derived from a single generating function called the mother wavelet by translation and dilation operations. Using the DWT in feature extraction may lead to an optimal frequency resolution in all frequency ranges, as it has a varying window size, broad at lower frequencies and narrow at higher frequencies. The DWT characterization delivers features that are stable under morphology variations of the ECG waveforms.
Tayel and Bouridy together in [8] put forth a technique for ECG image classification by extracting features using wavelet transformation and neural networks. Features are extracted from the wavelet decomposition of the ECG image intensity. The obtained ECG features are then further processed using artificial neural networks. The features are: mean, median, maximum, minimum, range, standard deviation, variance, and mean absolute deviation. The introduced ANN was trained with the main features of 63 ECG images of different diseases.
An algorithm was presented by Chouhan and Mehta in [9] for detection of QRS complexes. The recognition of QRS complexes forms the basis for more or less all automated ECG analysis algorithms. The presented algorithm utilizes a modified definition of the slope of the ECG signal as the feature for detection of QRS. A succession of transformations of the filtered and baseline drift corrected ECG signal is used for mining a new modified slope-feature. In the presented algorithm, a filtering procedure based on moving averages [15] provides a smooth, spike-free ECG signal, which is appropriate for slope feature extraction. The foremost step is to extract the slope feature from the filtered and drift corrected ECG signal, by processing and transforming it in such a way that the extracted feature signal is significantly enhanced in the QRS region and suppressed in the non-QRS region.
Xu et al. in [10] described an algorithm using the Slope Vector Waveform (SVW) for ECG QRS complex detection and RR interval evaluation. In their proposed method, variable-stage differentiation is used to achieve the desired slope vectors for feature extraction, and non-linear amplification is used to improve the signal-to-noise ratio. The method allows for a fast and accurate search of the R location, QRS complex duration, and RR interval, and yields excellent ECG feature extraction results. In order to get QRS durations, feature extraction rules are needed.
A modified combined wavelet transform technique was developed by Saxena et al. in [11]. The technique has been developed to analyze multi-lead electrocardiogram signals for cardiac disease diagnostics. Two wavelets have been used, i.e. a quadratic spline wavelet (QSWT) for QRS detection and the Daubechies six-coefficient (DU6) wavelet for P and T detection. A procedure has been evolved using electrocardiogram parameters with a point scoring system for diagnosis of various cardiac diseases. The consistency and reliability of the identified and measured parameters were confirmed when both diagnostic criteria gave the same results. Table 1 shows the comparison of different ECG signal feature extraction techniques.
Fatemian et al. [12] proposed an approach for ECG feature extraction. They suggested a new wavelet-based framework for automatic analysis of single-lead electrocardiogram (ECG) for application in human recognition. Their system utilized a robust preprocessing stage, which enables it to handle noise and outliers; this allows it to be directly applied to the raw ECG signal. In addition, the proposed system is capable of managing ECGs regardless of the heart rate (HR), which renders making presumptions on the individual's stress level unnecessary. The substantial
reduction of the template gallery size decreases the storage requirements of the system appreciably. Additionally, the categorization process is sped up by eliminating the need for dimensionality reduction techniques such as PCA or LDA. Their experimental results revealed that the proposed technique outperformed other conventional methods of ECG feature extraction.

III. DESCRIPTION OF ALGORITHM

A. Wavelet Selection
The large number of known wavelet families and functions provides a rich space in which to search for a wavelet which will very efficiently represent a signal of interest in a large variety of applications. Wavelet families include Biorthogonal, Coiflet, Haar, Symmlet, Daubechies wavelets, etc. There is no absolute way to choose a certain wavelet; the choice of the wavelet function depends on the application. The Haar wavelet algorithm has the advantage of being simple to compute and easy to understand, and in the present work the Haar wavelet is chosen. Savitzky-Golay filtering is used to smooth the signal. To identify the onsets and offsets of the wave, the wave is brought to zero base. To perform the wavelet analysis, we used the Matlab program, which contains a very good wavelet toolbox. First the considered signal was decomposed using the Haar wavelet, and orders 1-5 have been evaluated. One of the key criteria of a good mother wavelet is its ability to fully reconstruct the signal from the wavelet decompositions. Fig. 2 shows the decomposed signal. The high-frequency components of the ECG signal decrease as lower details are removed from the original signal. As the lower details are removed, the signal becomes smoother and the noise disappears, since noise is marked by high-frequency components picked up along the way of transmission. This is the contribution of the discrete wavelet transform, where noise filtration is performed implicitly.

Fig. 2: Multiresolution decomposition of the ECG signal from the 801.dat file (signal with approximation a5 and details d1-d5; plot data omitted)

Fig. 3: Multiresolution process of wavelet-based peak detection in the 801.dat file (vector magnitude with Q, R and S amplitudes marked; plot data omitted)

B. Peaks identification
In order to detect the peaks, specific details of the signal were selected.
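The multiresolution Haar analysis of Section III-A can be sketched without the Matlab toolbox by implementing the Haar filter pair directly (a minimal, self-contained illustration; the synthetic signal, its length and the level count are assumptions for the sketch, not the 801.dat record):

```python
import numpy as np

def haar_step(x):
    """One Haar DWT level: approximation and detail coefficients."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def haar_decompose(signal, levels=5):
    """Multi-level decomposition (a5 and d1..d5 in the paper's notation)."""
    a, details = np.asarray(signal, dtype=float), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    return a, details

# Synthetic stand-in for an ECG trace; discarding the finest details
# (d1, d2, ...) before reconstruction suppresses high-frequency noise,
# which is the implicit filtration noted in Section III-A.
t = np.linspace(0.0, 1.0, 1024, endpoint=False)
noisy = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.default_rng(1).normal(size=1024)
a5, (d1, d2, d3, d4, d5) = haar_decompose(noisy, levels=5)
print(len(a5), len(d1), len(d5))  # 32 512 32
```

Because the Haar filter pair is orthonormal, the energy of the signal is preserved exactly across the approximation and detail bands, which is the reconstruction criterion mentioned above.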
R peaks are the largest-amplitude points, greater than the threshold, located in the wave. Those maxima points are stored and the R-R interval is determined. Their mean value is found, which is used to find the portion of a single wave. The Q and S peaks occur about the R peak within 0.1 second. Calculating the distance from the zero point (or close to zero) on the left side of the R peak within the threshold limit denotes the Q peak; calculating the distance from the zero point (or close to zero) on the right side of the R peak within the threshold limit denotes the S peak. The onset is the beginning of the Q wave (or R wave if the Q wave is missing) and the offset is the ending of the S wave (or R wave if the S wave is missing). Normally, the onset of the QRS complex contains the high-frequency components, which are detected at finer scales.

C. Results
The algorithm presented in this section is applied directly in one run over the whole digitized ECG signal, which is saved as data files provided by Physionet.

IV. CONCLUSION
In this paper, a QRS key feature element detection algorithm based on multiresolution analysis was proposed. The performance of the peak detection was examined by testing the algorithm on data from the standardized MIT-BIH database. The DWT-based QRS detector performs well compared with standard techniques. The primary advantages of the DWT over existing techniques are noise removal and the ability to process time-varying ECG data. In this work we pointed out the advantage of using the wavelet transform associated with a thresholding strategy. Further, the possibility of detecting positions of QRS complexes in ECG signals is investigated and a simple detection algorithm is proposed. The main advantage of this kind of detection is less time-consuming analysis for long-duration ECG signals. The QRS detection in the ECG signal is explained with screen shots. Future work mainly concentrates on improving the proposed algorithm for the various QRS waves of different patients. Moreover, additional statistical data will be utilized for evaluating the performance of the algorithm in ECG signal feature detection. Improving the accuracy of diagnosing cardiac disease at the earliest is necessary in the case of a patient monitoring system; therefore our future work also has an eye on improvement in diagnosing cardiac disease.

V. REFERENCES
1) S. Z. Mahmoodabadi, A. Ahmadian, and M. D. Abolhasani, "ECG Feature Extraction using Daubechies Wavelets", Proceedings of the fifth IASTED International Conference on Visualization, Imaging and Image Processing, pp. 343-348, 2005.
2) Juan Pablo Martínez, Rute Almeida, Salvador Olmos, Ana Paula Rocha, and Pablo Laguna, "A Wavelet-Based ECG Delineator: Evaluation on Standard Databases", IEEE Transactions on Biomedical Engineering, vol. 51, no. 4, pp. 570-581, 2004.
3) Krishna Prasad and J. S. Sahambi, "Classification of ECG Arrhythmias using Multi-Resolution Analysis and Neural Networks", IEEE Transactions on Biomedical Engineering, vol. 1, pp. 227-231, 2003.
4) Cuiwei Li, Chongxun Zheng, and Changfeng Tai, "Detection of ECG Characteristic Points using Wavelet Transforms", IEEE Transactions on Biomedical Engineering, vol. 42, no. 1, pp. 21-28, 1995.
5) Saritha, V. Sukanya, and Y. Narasimha Murthy, "ECG Signal Analysis Using Wavelet Transforms", Bulgarian Journal of Physics, vol. 35, pp. 68-77, 2008.
6) B. Castro, D. Kogan, and A. B. Geva, "ECG feature extraction using optimal mother wavelet", The 21st IEEE Convention of the Electrical and Electronic Engineers in Israel, pp. 346-350, 2000.
7) Emran M. Tamil, Nor Hafeezah Kamarudin, Rosli Salleh, M. Yamani Idna Idris, Noorzaily M. Noor, and Azmi Mohd Tamil, "Heartbeat Electrocardiogram (ECG) Signal Feature Extraction Using Discrete Wavelet Transforms (DWT)".
8) Mazhar B. Tayel and Mohamed E. El-Bouridy, "ECG Images Classification Using Feature Extraction Based On Wavelet Transformation And Neural Network", ICGST, International Conference on AIML, June 2006.
9) V. S. Chouhan and S. S. Mehta, "Detection of QRS Complexes in 12-lead ECG using Adaptive Quantized Threshold", IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 1, 2008.
10) Xiaomin Xu and Ying Liu, "ECG QRS Complex Detection Using Slope Vector Waveform (SVW) Algorithm", Proceedings of the 26th Annual International Conference of the IEEE EMBS, pp. 3597-3600, 2004.
11) S. C. Saxena, V. Kumar, and S. T. Hamde, "Feature extraction from ECG signals using wavelet transforms for disease diagnostics", International Journal of Systems Science, vol. 33, no. 13, pp. 1073-1085, 2002.
12) S. Z. Fatemian and D. Hatzinakos, "A new ECG feature extractor for biometric recognition", 16th International Conference on Digital Signal Processing, pp. 1-6, 2009.
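As a standalone illustration of the threshold-based peak rules of Section III-B, the R-peak search and R-R interval computation can be sketched as follows (a minimal sketch on a synthetic impulse train; the sampling rate, threshold and signal are assumptions, not data from the 801.dat record):

```python
import numpy as np

def detect_r_peaks(signal, threshold):
    """Indices of local maxima whose amplitude exceeds the threshold."""
    s = np.asarray(signal, dtype=float)
    return np.array([i for i in range(1, len(s) - 1)
                     if s[i] > threshold and s[i] >= s[i - 1] and s[i] > s[i + 1]])

fs = 250                                              # assumed sampling rate (Hz)
sig = 0.05 * np.sin(np.linspace(0.0, 20.0, 4 * fs))   # low-amplitude baseline
sig[::fs] = 1.0                                       # one "R peak" per second

r_peaks = detect_r_peaks(sig, threshold=0.5)
rr_intervals = np.diff(r_peaks) / fs                  # R-R intervals in seconds
print(list(r_peaks), rr_intervals.mean())             # [250, 500, 750] 1.0
```

On a real record the threshold would be derived from the detail coefficients of the wavelet decomposition rather than fixed by hand, and the Q and S peaks would then be searched within 0.1 s on either side of each detected R index.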
A Review on Data Clustering Algorithms for Mixed Data

D. Hari Prasad¹, Dr. M. Punithavalli²

Global Journal of Computer Science and Technology, Vol. 10, Issue 5, Ver. 1.0, July 2010

About-1: Senior Lecturer, Department of Computer Applications, Sri Ramakrishna Institute of Technology, Coimbatore, India.
About-2: Director, Department of Computer Science, Sri Ramakrishna Arts College for Women, Coimbatore, India.

Abstract- Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a method of dividing data into groups of similar objects. One significant research area in data mining is the development of methods that update knowledge by using existing knowledge, since this generally improves mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper also describes the role of the Self-Organizing Map (SOM) algorithm in enhancing mixed data clustering.

Keywords- Data Clustering, Data Mining, Mixed Data Clustering, Self-Organizing Map algorithm.

I. INTRODUCTION

Clustering is one of the standard workhorse techniques in the field of data mining. Its intention is to organize a dataset into a set of groups, or clusters, which contain "similar" data items, as measured by some distance function. The major applications of clustering include document categorization, scientific data analysis, and customer/market segmentation. Data clustering has been considered a primary data mining method for knowledge discovery. Clustering using Gaussian mixture models is also extensively employed for exploratory data analysis. The six sequential, iterative steps of the data mining process are: 1) problem definition; 2) data acquisition; 3) data preprocessing and survey; 4) data modeling; 5) evaluation; and 6) knowledge deployment [1]. The purpose of the survey before data preprocessing is to gain insight into the possibilities and problems of the data, in order to determine whether the data are sufficient. Moreover, the survey helps in selecting the proper preprocessing and modeling tools. Typically, several different data sets and preprocessing strategies need to be considered. For this reason, efficient visualizations and summarizations are essential.

Primarily, the focus must be on clustering, since clusters are important characterizations of data. The clustering method implemented should be fast, robust, and visually efficient. In the case of clustering, the foremost step is partitioning the data set into a set of clusters Qi, where i = 1, ..., C. Data clustering techniques are gaining increasing acceptance over traditional central grouping techniques, which are centered on the notion of a "feature" (see e.g. [2], [3]). Several data clustering techniques have been put forth by researchers to assist in the development of knowledge. Fuzzy clustering [4] is a generalization of crisp clustering in which each sample has a varying degree of membership in all clusters. In many real-world applications, a feasible feature-based description of objects may be difficult to obtain or inefficient for learning purposes, while, on the other hand, it is often possible to obtain a measure of the similarity or dissimilarity between objects. Among the central algorithmic procedures for perceptual organization are clustering principles such as generalized k-means methods or clustering methods for proximity data [15].

The remainder of this paper is organized as follows: Section II describes the background study related to clustering algorithms proposed earlier, Section III explains the challenging problems and areas of research, and Section IV concludes the paper with a brief discussion.

II. BACKGROUND STUDY

A wealth of clustering techniques has been described in the literature. This section presents an overview of clustering algorithms put forth by various researchers. In general, major clustering methods can be classified into five categories: partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods.

A. Clustering of the Self-Organizing Map

A novel method [1] was put forth by Juha Vesanto and Esa Alhoniemi for clustering of the Self-Organizing Map. In their method, clustering is carried out using a two-level approach: the data set is first clustered using the SOM, and then the SOM itself is clustered. The purpose of the paper was to evaluate whether the data abstraction created by the SOM could be employed in clustering of data. The most important advantage of this procedure is that the computational load decreases noticeably, making it possible to cluster large data sets and to consider several different preprocessing strategies in a limited time. Obviously, the approach is applicable only if the clusters found using the SOM are analogous to those of the original data.

B. Kernel-Based Clustering

Mark Girolami presents a Mercer kernel-based clustering algorithm in feature space [5]. The paper presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work utilizes the observation that performing a nonlinear data transformation into some high-dimensional feature space increases the probability of linear separability of the patterns within the transformed space, and therefore simplifies the associated data structure. In this case, the eigenvectors of the kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature-space partitioning of the data.

C. Grouping of Smooth Curves and Texture Segmentation Using Path-Based Clustering

A path-based clustering algorithm [6] was described by Fischer and Buhmann for grouping of smooth curves and texture segmentation. The paper proposed a new grouping approach referred to as Path-Based Clustering [7], which measures local homogeneity rather than global similarity of objects. The method defines a connectedness criterion, which groups objects together if they are connected by a sequence of intermediate objects. Moreover, an efficient agglomerative algorithm is proposed to minimize the path-based clustering cost function. The approach uses a bootstrap resampling scheme to measure the reliability of the grouping results.

D. Bagging for Path-Based Clustering

Fischer and Buhmann also present bagging for path-based clustering [8]. A resampling scheme similar to bootstrap aggregation (bagging) is presented in the paper. The aggregation is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. To increase the reliability of clustering solutions, a stochastic resampling method is developed to infer consensus clusters. The paper also evaluates the quality of path-based clustering with resampling on a large image dataset of human segmentations.

E. Isoperimetric Graph Partitioning for Data Clustering

Leo Grady and Eric L. Schwartz proposed an approach known as isoperimetric graph partitioning for data clustering and image segmentation [9]. The paper adopts a different approach, based on finding partitions with a small isoperimetric constant in an image graph. The algorithm generates segmentations and data clusters of a quality comparable to spectral methods, but with improved speed and stability. The term "partition" in the paper refers to the assignment of each node in the vertex set into two (not necessarily equal) parts. Graph partitioning has been strongly influenced by properties of a combinatorial formulation of the classic isoperimetric problem: for a fixed area, find the region with minimum perimeter.

F. Improving Classification Decisions by Multiple Knowledge

A new approach to combining multiple sets of rules for text categorization using Dempster's rule of combination [10] was described by Yaxin Bi et al. A boosting-like technique for generating multiple sets of rules based on rough set theory is developed, and classification decisions from the multiple sets of rules are modeled as pieces of evidence which can be combined by Dempster's rule of combination. The approach is applied to a set of benchmark data collections, both individually and in combination. The experimental results show that the performance of the best combination of the multiple sets of rules on the benchmark data is significantly better than that of the best single set of rules.

G. Clustering Algorithm for Data Mining

Zhijie Xu et al. present a modified clustering algorithm for data mining [11]. The paper describes a clustering method for unsupervised classification of objects in large data sets. The new methodology combines the simulated annealing algorithm with CLARANS (Clustering Large Applications based upon RANdomized Search) in order to cluster large data sets efficiently. A temperature parameter T is used to control the process of clustering. In every step of the search, if the cost of the neighbor is less than the current cost, the current solution is set to the neighbor; otherwise, the neighbor is accepted with probability exp(-(Scost - currentcost)/T).

H. Dominant Sets and Pairwise Clustering

A graph-theoretic approach [12] to pairwise data clustering was developed by Massimiliano Pavan and Marcello Pelillo. A correspondence is established between dominant sets and the extrema of a quadratic form over the standard simplex, thereby allowing the use of straightforward and easily implementable continuous optimization techniques from evolutionary game theory. To study the robustness of the approach against random noise in the background, the level of clutter is allowed to vary from 100 to 1,000 points. Extensions of the approach involving hierarchical data partitioning and out-of-sample extensions of dominant-set clusters can be found in [13] and [14], respectively.

I. A Conceptual Clustering Algorithm

Biswas et al. [17] put forth a conceptual clustering algorithm for data mining. Their paper described an unsupervised discovery method with biases geared toward partitioning objects into clusters that have improved interpretability. Their algorithm, ITERATE, employs: (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. The important task here is interpretation of the generated patterns, and this is best addressed by creating groups of data that demonstrate cohesiveness within, but clear distinctions between, the groups. In such clustering schemes, data objects are represented as vectors of feature-value pairs.
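The two-level procedure of Section II-A can be sketched in a few lines of code. In this sketch a plain k-means quantizer stands in for a trained SOM codebook; the function names, the prototype count `m`, and the seeding scheme are all illustrative choices of ours, not details taken from [1]:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centroids[c])))
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

def two_level_cluster(points, m, k):
    """First abstract the data into m prototypes (a stand-in for a trained
    SOM codebook), then cluster the m prototypes themselves into k groups."""
    prototypes, proto_of_point = kmeans(points, m)
    _, cluster_of_proto = kmeans(prototypes, k)
    return [cluster_of_proto[proto_of_point[i]] for i in range(len(points))]
```

The computational saving claimed in [1] comes from the second stage: it operates on m prototypes rather than on the full data set, with m typically far smaller than the number of points.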
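The key identity behind kernel-based clustering of the kind surveyed in Section II-B is that the squared distance between a mapped point φ(x_i) and the mean of a mapped cluster can be computed from kernel evaluations alone, without ever forming the feature space explicitly. A minimal sketch of that identity (the helper names are ours, not Girolami's):

```python
import math

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (Mercer) kernel: k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))

def feature_space_dist2(i, cluster, K):
    """||phi(x_i) - mean_{j in cluster} phi(x_j)||^2, expanded via the kernel
    trick: k(i,i) - (2/|C|) sum_j k(i,j) + (1/|C|^2) sum_{j,l} k(j,l).
    K is the precomputed Gram matrix, cluster a list of member indices."""
    m = len(cluster)
    cross = sum(K[i][j] for j in cluster)
    within = sum(K[j][l] for j in cluster for l in cluster)
    return K[i][i] - 2.0 * cross / m + within / (m * m)
```

With a linear kernel this reduces to the ordinary squared Euclidean distance to the cluster mean, which is a convenient sanity check on the expansion.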
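The acceptance rule quoted in Section II-G is the standard simulated-annealing criterion. A small sketch of it, with the function name and the injectable random source being our own illustrative choices:

```python
import math
import random

def accept_neighbor(current_cost, neighbor_cost, T, rng=random.random):
    """CLARANS-with-annealing move rule as described in [11]: a cheaper
    neighbor is always taken; a costlier one is accepted with probability
    exp(-(Scost - currentcost) / T)."""
    if neighbor_cost < current_cost:
        return True
    return rng() < math.exp(-(neighbor_cost - current_cost) / T)
```

As the temperature T cools toward zero the rule degenerates to pure hill-climbing, while a large T makes almost any move acceptable; the annealing schedule interpolates between the two.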
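The quadratic-form connection in Section II-H is what makes dominant sets computable by game dynamics: iterating the discrete replicator equation over the standard simplex climbs x'Ax, and the support of the limit point is a dominant set. The following is our own minimal rendering of that idea, not the exact procedure of [12]:

```python
def replicator_dynamics(A, iters=200):
    """Iterate the discrete replicator equation x_i <- x_i (Ax)_i / (x'Ax)
    over the standard simplex, for a symmetric similarity matrix A with a
    zero diagonal. Entries of the result that stay well above zero form
    the support of a dominant set."""
    n = len(A)
    x = [1.0 / n] * n  # start at the barycenter of the simplex
    for _ in range(iters):
        Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        xAx = sum(x[i] * Ax[i] for i in range(n))
        if xAx == 0:
            break
        x = [x[i] * Ax[i] / xAx for i in range(n)]
    return x
```

On a similarity matrix where three items are mutually similar and a fourth is an outlier, the outlier's weight decays geometrically toward zero while the clique's weights converge to equal shares.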
Features represent properties of an object that are relevant to the problem-solving task. Distinctness, or inter-class dissimilarity, was measured by an average of the variance of the distribution match between clusters. Additionally, their empirical results demonstrated the properties of the discovery algorithm and its applications to problem solving.

J. The New K-Windows Algorithm for Improving the K-Means Clustering Algorithm

The new k-windows algorithm for improving the k-means clustering algorithm was described by Vrahatis et al. [18]. The process of partitioning a large set of patterns into disjoint and homogeneous clusters is fundamental in knowledge acquisition. It is called clustering in the literature, and it is applied in various fields including data mining, statistical data analysis, compression and vector quantization. The k-means algorithm is very popular and one of the best for implementing the clustering process. Its time complexity is dominated by the product of the number of patterns, the number of clusters, and the number of iterations. Also, it often converges to a local minimum. In their paper, the authors presented an improvement of the k-means clustering algorithm, aiming at better time complexity and partitioning accuracy. Moreover, their approach reduces the number of patterns that need to be examined for similarity by using a windowing technique. The latter is based on a well-known spatial data structure, namely the range tree, which allows fast range searches.

K. A Spectral-Based Clustering Algorithm

Abdu et al. [19] presented a novel spectral-based algorithm for clustering categorical data that combines attribute relationships with the dimension reduction techniques found in Principal Component Analysis (PCA) and Latent Semantic Indexing (LSI). The new algorithm uses data summaries, consisting of attribute occurrence and co-occurrence frequencies, to create a set of vectors each of which represents a cluster. They refer to these vectors as "candidate cluster representatives." The algorithm also uses spectral decomposition of the data summaries matrix to project and cluster the data objects in a reduced space. They refer to the algorithm as SCCADDS (Spectral-based Clustering algorithm for CAtegorical Data using Data Summaries). SCCADDS differs from other spectral clustering algorithms in several key respects. First, the algorithm uses the feature-categories similarity matrix instead of the data-object similarity matrix (as is the case with most spectral algorithms, which find the normalized cut of a graph whose nodes are data objects), so SCCADDS scales well for large datasets. Second, non-recursive spectral-based clustering algorithms typically require k-means or some other iterative clustering method after the data objects have been projected into a reduced space; SCCADDS clusters the data objects directly by comparing them to candidate cluster representatives, without the need for an iterative clustering method. Third, unlike standard spectral-based algorithms, the complexity of SCCADDS is linear in the number of data objects. Results on datasets widely used to test categorical clustering algorithms show that SCCADDS produces clusters consistent with those produced by existing algorithms, while avoiding both the computation of the spectra of large matrices and the problems inherent in methods that employ k-means-type algorithms.

L. A New Supervised Clustering Algorithm

A new supervised clustering algorithm was proposed by Li et al. [20] for data sets with mixed attributes. Because of the complexity of data sets with mixed attributes, few conventional clustering algorithms are appropriate for this kind of dataset, and the quality of the resulting clusterings tends to be poor. K-prototype clustering is one of the most commonly used data mining methods for this kind of data. Borrowing ideas from multiple-classifier combination technology, the authors use k-prototype as the basic clustering algorithm to design a multi-level clustering ensemble algorithm which adaptively selects attributes for re-clustering. Comparison experiments on the Adult data set from the UCI machine learning repository show very competitive results, and the proposed method is suitable for data editing.

M. An Efficient Clustering Algorithm for Mixed Type Attributes in Large Datasets

Jian et al. [21] proposed an efficient algorithm for clustering mixed type attributes in large datasets. Clustering is an extensively used technique in data mining. At present many clustering algorithms exist, but most are either restricted to a single attribute type, or can handle both data types yet are not efficient when clustering large data sets; few algorithms can do both well. In this article, the authors proposed a clustering algorithm that can handle large datasets with mixed types of attributes. They first use a CF*-tree (similar to the CF-tree in BIRCH) to pre-cluster the dataset. The dense regions are then stored in leaf nodes, each dense region is treated as a single point, and an improved k-prototype algorithm is used to cluster the dense regions. Experimental results showed that this algorithm is very efficient in clustering large datasets with mixed types of attributes.

N. A Robust and Scalable Clustering Algorithm

A robust and scalable clustering algorithm was put forth by Chiu et al. [22] for mixed type attributes in large database environments. In their paper, they proposed a distance measure that enables clustering of data with both continuous and categorical attributes. This distance measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function resulting from merging them. Calculation of this measure is memory-efficient, as it depends only on the merging cluster pair and not on all the other clusters. The algorithm is implemented in the commercial data mining tool Clementine 6.0, which supports the PMML standard of data mining model deployment. For data with mixed types of attributes, their experimental results confirmed that the algorithm not only generates better quality clusters than the traditional k-means algorithms, but also exhibits good scalability properties and is able to identify the underlying number of clusters in the data correctly.

O. Clustering Algorithms for Network Intrusion Detection Systems

Panda et al. [23] described some clustering algorithms, such as k-means and fuzzy c-means, for network intrusion detection. The objective of intrusion detection is to construct a system which automatically scans network activity and detects intrusion attacks. They built a system which created clusters from its input data, then automatically labeled the clusters as containing either normal or anomalous data instances, and finally used these clusters to classify network data instances as either normal or anomalous. In their paper, they propose a fuzzy c-means clustering technique which is capable of clustering into the most suitable number of clusters based on an objective function. Both training and testing were done using 10% of the KDD Cup '99 data, which is a very popular and broadly used intrusion attack dataset.

P. Clustering Algorithm Based on Quantum Games

A new clustering algorithm based on quantum games was proposed by Li et al. [24]. Great successes have been achieved by quantum algorithms during the last decade. In their paper, the authors combined quantum games with the problem of data clustering and developed a quantum-game-based clustering algorithm, in which data points in a dataset are considered as players who can make decisions and implement quantum strategies in quantum games. After each round of a quantum game, each player's expected payoff is calculated. The player then uses a link-removing-and-rewiring (LRR) function to change his neighbors and regulate the strength of the links connecting to them, in order to maximize his payoff. Further, the algorithms are discussed and analyzed for two cases of strategies, two payoff matrices and two LRR functions. The simulation results demonstrated that data points in the datasets are clustered reasonably and efficiently, and that the clustering algorithms have fast rates of convergence. Furthermore, the comparison with other algorithms also provides an indication of the effectiveness of the proposed approach.

Q. A GA-Based Clustering Algorithm

Jie Li et al. [25] proposed a GA-based clustering algorithm for large data sets with mixed numeric and categorical values. In the field of data mining, one frequently needs to perform cluster analysis on large data sets with mixed numeric and categorical values. However, most existing clustering algorithms are only effective for numeric data rather than mixed data sets. For this reason, their paper presented a novel clustering algorithm for such mixed data sets, obtained by modifying the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function to obtain valid clustering results. Experimental results illustrate that the GA-based clustering algorithm is suitable for large data sets with mixed numeric and categorical values.

III. CHALLENGING PROBLEMS AND AREAS OF RESEARCH

The algorithms discussed in Section II each have their own advantages and limitations. The main requirements that a clustering algorithm should satisfy are: scalability; the ability to deal with different types of attributes; the discovery of clusters with arbitrary shape; minimal requirements for domain knowledge to determine input parameters; the ability to deal with noise and outliers; insensitivity to the order of input records; support for high dimensionality; and interpretability and usability. A number of problems are associated with conventional clustering algorithms. Among them: current clustering techniques do not address all of these requirements adequately (and concurrently); dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity; the effectiveness of a method depends on the definition of "distance" (for distance-based clustering), and if an obvious distance measure does not exist one must define it, which is not always easy, especially in multi-dimensional spaces; and the result of a clustering algorithm (which in many cases can be arbitrary itself) can be interpreted in different ways [16]. Many algorithms for clustering data have been developed in recent decades; nonetheless, they all face a major challenge in scaling up to very large database sizes, an accelerating development brought on by advances in computer technology, the Internet, and electronic commerce.

The most heavily focused research area is clustering of mixed data. A clustering Q means partitioning a data set into a set of clusters Qi, where i = 1, ..., C. In crisp clustering, each data sample belongs to exactly one cluster. Clustering algorithms may be classified as exclusive clustering, overlapping clustering, hierarchical clustering, and probabilistic clustering. Clustering objects into separated groups is an important topic in exploratory data analysis and pattern recognition. Many clustering techniques group the data objects into "compact" clusters under the explicit or implicit assumption that all objects within one group are either mutually similar to each other or similar with respect to a common representative or centroid. Clustering can also be based on mixture models [1]: the data are assumed to be generated by several parameterized distributions (typically Gaussians), the distribution parameters are estimated using, for example, the expectation-maximization algorithm, and data points are assigned to clusters based on their probabilities under the distributions. The application of clustering algorithms to mixed data remains one of the challenging issues.

IV. CONCLUSION

This paper describes various algorithms presented by researchers for data clustering. Most real-time applications need clustering of data. Data clustering can be applied to mixed data, which is a combination of numeric and string attributes. Each clustering algorithm proposed in the
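The complexity claim in Section II-J, that the cost of k-means is dominated by the product of the number of patterns, the number of clusters, and the number of iterations, is easy to see by instrumenting Lloyd's algorithm to count distance evaluations (the dimensionality adds a further factor inside each evaluation). This counting sketch is ours and is not the k-windows algorithm of [18]:

```python
def kmeans_counted(points, k, iters=5):
    """Lloyd's k-means with a counter on distance evaluations."""
    centroids = [list(p) for p in points[:k]]   # deterministic init: first k points
    labels = [0] * len(points)
    evals = 0
    for _ in range(iters):
        for i, p in enumerate(points):          # assignment: n * k distances per pass
            best, best_d = 0, float("inf")
            for c, cen in enumerate(centroids):
                d = sum((a - b) ** 2 for a, b in zip(p, cen))
                evals += 1
                if d < best_d:
                    best, best_d = c, d
            labels[i] = best
        for c in range(k):                      # update: recompute cluster means
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, evals
```

For n points, k clusters and t iterations the counter comes out at exactly n·k·t, which is the product quoted from [18]; windowing techniques attack precisely the n·k term by pruning which points need to be examined.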
literature may have its own advantages and limitations. Developing an algorithm that meets all of the requirements of the system remains a tangible challenge. Different clustering algorithms, such as k-means, path-based clustering, and clustering of the self-organizing map, are widely used in real-world applications. Future work will concentrate mainly on developing a clustering algorithm that meets all of these requirements; in particular, the envisioned enhancement is a clustering algorithm that performs significantly well on mixed data sets.

V. REFERENCES

1) Juha Vesanto and Esa Alhoniemi, "Clustering of the Self-Organizing Map," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 586-600, May 2000.
2) J. Shi and J. Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
3) Y. Gdalyahu, D. Weinshall, and M. Werman, "Self-Organization in Vision: Stochastic Clustering for Image Segmentation, Perceptual Grouping, and Image Database Organization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, pp. 1053-1074, Oct. 2001.
4) J. C. Bezdek and S. K. Pal, Eds., "Fuzzy Models for Pattern Recognition: Methods that Search for Structures in Data," New York: IEEE, 1992.
5) Mark Girolami, "Mercer Kernel-Based Clustering in Feature Space," IEEE Transactions on Neural Networks, vol. 13, no. 3, May 2002.
6) Bernd Fischer and J. M. Buhmann, "Path-Based Clustering for Grouping of Smooth Curves and Texture Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 4, April 2003.
7) B. Fischer, T. Zoller, and J. M. Buhmann, "Path Based Pairwise Data Clustering with Application to Texture Segmentation," Energy Minimization Methods in Computer Vision and Pattern Recognition, LNCS 2134, pp. 235-250, 2001.
8) Bernd Fischer and J. M. Buhmann, "Bagging for Path-Based Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 11, November 2003.
9) Leo Grady and Eric L. Schwartz, "Isoperimetric Graph Partitioning for Data Clustering and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
10) Yaxin Bi, Sally McClean, and Terry Anderson, "Improving Classification Decisions by Multiple Knowledge," Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, 2005.
11) Zhijie Xu, Laisheng Wang, Jiancheng Luo, and Jianqin Zhang, "A Modified Clustering Algorithm for Data Mining," IEEE, 2005.
12) Massimiliano Pavan and Marcello Pelillo, "Dominant Sets and Pairwise Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 1, January 2007.
13) M. Pavan and M. Pelillo, "Dominant Sets and Hierarchical Clustering," Proceedings of the IEEE International Conference on Computer Vision, vol. 1, pp. 362-369, 2003.
14) M. Pavan and M. Pelillo, "Efficient Out-of-Sample Extension of Dominant-Set Clusters," Advances in Neural Information Processing Systems 17, L. K. Saul, Y. Weiss, and L. Bottou, eds., pp. 1057-1064, 2005.
15) J. M. Buhmann, "Data Clustering and Learning," Handbook of Brain Theory and Neural Networks, M. Arbib, ed., pp. 308-312, Bradford Books/MIT Press, second ed., 2002.
16) A Tutorial on Clustering Algorithms, http://home.dei.polimi.it/matteucc/Clustering/tutorial_html.
17) Gautam Biswas, Jerry B. Weinberg, and Douglas H. Fisher, "ITERATE: A Conceptual Clustering Algorithm for Data Mining," IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 28, no. 2, pp. 100-111, 1998.
18) M. N. Vrahatis, B. Boutsinas, P. Alevizos, and G. Pavlides, "The New k-Windows Algorithm for Improving the k-Means Clustering Algorithm," Journal of Complexity, Elsevier, vol. 18, no. 1, pp. 375-391, 2002.
19) Eman Abdu and Douglas Salane, "A Spectral-Based Clustering Algorithm for Categorical Data Using Data Summaries," International Conference on Knowledge Discovery and Data Mining, ACM, Article no. 2, 2009.
20) Shijin Li, Jing Liu, Yuelong Zhu, and Xiaohua Zhang, "A New Supervised Clustering Algorithm for Data Set with Mixed Attributes," Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, vol. 2, pp. 844-849, 2007.
21) Jian Yin, Zhi-Fang Tan, Jiang-Tao Ren, and Yi-Qun Chen, "An Efficient Clustering Algorithm for Mixed Type Attributes in Large Dataset," Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, vol. 3, pp. 1611-1614, 2005.
22) Tom Chiu, DongPing Fang, John Chen, Yao Wang, and Christopher Jeris, "A Robust and Scalable Clustering Algorithm for Mixed Type Attributes in Large Database Environment," International Conference on Knowledge Discovery and Data Mining, pp. 263-268, 2001.
23) Mrutyunjaya Panda and Manas Ranjan Patra, "Some Clustering Algorithms to Enhance the Performance of the Network Intrusion Detection System," Journal of Theoretical and Applied Information Technology, pp. 710-716, 2008.
24) Qiang Li, Yan He, and Jing-ping Jiang, "A Novel Clustering Algorithm Based on Quantum Games," Journal of Physics A: Mathematical and Theoretical, no. 44, 2009.
25) Jie Li, Xinbo Gao, and Li-cheng Jiao, "A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numeric and Categorical Values," Proceedings of the 5th International Conference on Computational Intelligence and Multimedia Applications, IEEE Computer Society, p. 102, 2003.
Optimization Of Shop Floor Operations: Application Of Mrp And Lean Manufacturing Principles
Remy Uche1J.A. Onuoha2
Abstract-This research work is concerned with the optimization lowering production cost since less capital is tied up to of shop floor operations by the application of Material unused inventory. Requirements Planning (MRP) and lean manufacturing MRP systems relies on four pieces of information in principles. The present research covers the involvement of determining what material should be ordered and when. MRP and lean manufacturing techniques in manufacturing Namely: environment. The work is intended to decrease cycle time, The master production schedule: This describes when each reduce waste in material movement and inventory, improve the flow of material through improved system layouts and product is scheduled to be manufactured; subsequently increase productivity in shop floor environment. Bill of materials: Gives informationw about the product Keywords-Material Planning, Lean Manufacturing, structure, i.e., parts and raw material units necessary to Scheduling, Production, cycle time. manufacture one unit of the product of interest; Production cycle times and material needs at each stage of I. INTRODUCTION e the production cycle time and Supplier lead times. ncreasing shop floor efficiency through the integration of The master production schedule and bill of materials indicate what materialsi should be ordered; the master IMaterial Requirements Planning (MRP) and Lean manufacturing principles has become one of the major schedule, production cycle times and supplier lead times concerns of manufacturing companies. In today's complex jointly determine when orders should be placed. manufacturing sector, we are confronted to do more with The Master Production Schedule includes quantities of less, and also challenged with new philosophies and products toV be produced at a given time period. concepts that often push or pull us in different directions. 
A The Lean Manufacturing is a production method that calls case in point is the ongoing integration of MRP and lean for building products with as few steps and as little work-in- manufacturing principles. MRP systems are frequently process inventory as possible. It relies on work centres or condemned as one of the main reasons so many manufacturing cells that are capable of building multiple manufacturing companies, are locked into push systems, products, giving the company the flexibility to produce the while lean concepts imply that pull systems are the ideal.y exact mix and quantity of products required. Nevertheless, one shouldn't throw one out for the other, as Its fundamental objective is to provide perfect value to the the two can coexist harmoniously and beneficiallyl with a customer through a perfect value creation process that has better definition of roles (Steinbrunner, 2004). eliminated all unnecessary waste. According to the American production and control society, To accomplish this, lean thinking changes the focus of MRP constitutes of a set of techniquesr that use master management from optimizing separate technologies and production schedule, bill of material and inventory data to assets to optimizing the flow of the product or family of calculate material requirements. In simple words, MRP is a products through the entire value stream. Eliminating waste technique use in determining when to order dependent along the entire value stream, instead of at isolated points, demand items and how to reschedulea orders to adjust for the creates processes that need less human effort, space, capital changing needs. A key question to MRP process is the and time. This allows companies to make products and number of times a company procures inventory within a services at far lower costs and with fewer defects, compared year. One can readily realize that a high inventory ratio is with traditional business systems. 
Companies are able to E respond to changing customer desires with great variety, likely to be conducive to ______high quality, low cost and very fast throughput times. Also, with the application of visual methods to control material About-1Department of Mechanical Engineering, Federal University of flow and work-in-process, information management on the Technology, Owerri, NIGERIA shop floor becomes much simpler and more accurate (Tel: +234 803 668 3339 E-mail: [email protected]) 2 Abou-t Department of Mechanical Engineering, Faculty of Engineering, II. PROCEDURE FOR THE IMPLEMENTATION OF MRP University of Port Harcourt, Choba, Rivers State, NIGERIA The following procedures are followed while implementing (Tel: +234 806 234 0271 E-mail: [email protected]) Material Requirements Planning.
Page | 50 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology

Demand for Products: The demand for end products stems from two main sources. The first is known customers who have placed specific orders, such as those generated by sales personnel, or through interdepartmental transactions. The second source is forecast demand.

Bill of Materials File: This is simply known as the BOM file. It contains the complete product description, listing not only materials, parts, and components but also the sequence in which the product is created. The BOM file is often called the product structure file or product tree because it shows how a product is put together. It contains the information to identify each item and the quantity used per unit of the item of which it is a part.

Inventory Records File: Inventory records under a computerized system can be quite lengthy. Each item in inventory is carried as a separate file, and the range of details carried about an item is almost limitless. The MRP program accesses the status segment of the file according to specific time periods. These files are accessed as needed while running the program.

A. Conditions for implementation

Several requirements have to be met in order to give an MRP implementation project a chance of success, among them the following conditions:
A. Availability of a computer-based manufacturing system is a must. Although it is possible to obtain a material requirements plan manually, it would be impossible to keep it up to date because of the highly dynamic nature of manufacturing environments.
B. A feasible master production schedule must be drawn up, or else the accumulated planned orders of components might conflict with the resource restrictions and become infeasible.
C. The bills of material should be accurate. It is essential to update them promptly to reflect any engineering changes brought to the product. If a component part is omitted from the bill of material it will never be ordered by the system.
D. Inventory records should be a precise representation of reality, or else the netting process and the generation of planned orders become meaningless.
E. Lead times for all inventory items should be known and given to the MRP system.
F. Shop floor discipline is necessary to ensure that orders are processed in conformity with the established priorities. Otherwise, the lead times passed to MRP will not materialize.

B. Techniques for the implementation of MRP

MRP represents an innovation in the manufacturing environment. Thus, its effective implementation requires explicit management action. Steps need to be clearly identified and necessary measures taken to ensure organizational responsiveness to the technique being implemented. Each organization poses a unique environment, and that means that specific actions need to be taken with due regard to environment specifics. We approach MRP as an organizational innovation and identify the necessary measures which management should adopt in implementing it. Motivational influences underlying MRP implementation include:
1. Recognition of business opportunity for the timely acquisition of MRP.
2. Recognition of technical opportunity for the timely acquisition of the technologies supporting MRP implementation.
3. Recognition of need for solving manufacturing and/or inventory problems using MRP.
Given the above motivational factors, one may readily identify the what and how issues underlying MRP design and implementation. What refers to a generic process model composed of steps and indicative levels of effort to implement each step. How refers to management involvement with respect to the process.

C. MRP Computer Program

The MRP program works as follows:
A. A list of end items needed by time periods is specified by the master production schedule.
B. A description of the materials and parts needed to make each item is specified in the bill of materials file.
C. The number of units of each item and material currently on hand and on order are contained in the inventory file.
D. The MRP program "works" on the inventory file. In addition, it continuously refers to the bill of materials file to compute quantities of each item needed.
E. The number of units of each item required is then corrected for on-hand amounts, and the net requirement is "offset" to allow for the lead time needed to obtain the material.

D. Output Reports

Primary Reports: Primary reports are the main or normal reports used for inventory and production control. These reports consist of:
1. Planned orders to be released at a future time.
2. Order release notices to execute the planned orders.
3. Changes in due dates of open orders due to rescheduling.
4. Cancellations or suspensions of open orders due to cancellation or suspension of orders on the master production schedule.
5. Inventory status data.
Secondary Reports: Additional reports, which are optional under the MRP system, fall into three main categories:
1. Planning reports to be used, for example, in forecasting inventory and specifying requirements over some future time horizon.
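The netting and lead-time offsetting performed by the MRP program (steps D and E above) can be illustrated with a short sketch. This is a hypothetical single-item, single-level calculation written for illustration only; the function name and all quantities are invented, not taken from the paper.

```python
def mrp_net_and_offset(gross_requirements, on_hand, on_order, lead_time):
    """Single-item MRP record: net gross requirements against available
    inventory, then offset planned orders by the lead time (in periods).

    gross_requirements: quantity needed in each period.
    Returns the planned-order releases per period."""
    available = on_hand + on_order
    planned_receipts = []
    for gross in gross_requirements:
        net = max(0, gross - available)        # netting against stock on hand
        available = max(0, available - gross)  # consume available inventory
        planned_receipts.append(net)
    # Offsetting: an order due in period t must be released in period t - lead_time.
    releases = [0] * len(gross_requirements)
    for period, qty in enumerate(planned_receipts):
        if qty:
            releases[max(0, period - lead_time)] += qty
    return releases

# Hypothetical item: 40 units on hand, nothing on order, 2-period lead time.
print(mrp_net_and_offset([10, 30, 25, 50], on_hand=40, on_order=0, lead_time=2))
# -> [25, 50, 0, 0]: orders released early enough to cover periods 3 and 4
```

The on-hand stock covers the first two periods, so the net requirements of periods 3 and 4 become planned-order releases in periods 1 and 2 after the two-period offset.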
2. Performance reports for purposes of pointing out inactive items and determining the agreement between actual and programmed item lead times, and between actual and programmed quantity usage and costs.
3. Exception reports that point out serious discrepancies, such as errors, out-of-range situations, late or overdue orders, excessive scrap, or nonexistent parts.
The figure below shows an overall view of a material requirements program and the reports generated by the program.
[Figure 1 shows the inputs feeding the MRP computer program: firm orders from known customers, forecasts of demand from random customers, and inventory transactions flow into the master production schedule; the bill of materials file and the inventory records file feed the material planning (MRP computer program), which generates primary reports (planned-order schedules for inventory and production control) and secondary reports (exception reports, planning reports, and reports of performance control).]

Figure 1. Overall View of the Inputs to a Standard Material Requirements Program and the Reports Generated by the Program

E. MRP objectives

The main theme of MRP is "getting the right materials to the right place at the right time". Specific organizational objectives often associated with MRP design and implementation may be identified along three main dimensions, namely inventory, priorities and capacity:

Dimension: Objective specifics
Inventory
- Order the right part
- Order the right quantity
- Order at the right time
Priorities
- Order with the right due date
- Keep the due date valid
Capacity
- Plan for a complete load
- Plan for an accurate load
- Plan for an adequate time to view future load

III. LEAN MANUFACTURING

Lean manufacturing is a western adaptation of the Toyota Production System, developed by the Japanese carmaker and most famously studied (and the term "Lean" coined) in The Machine That Changed the World (Womack, 1996). The Internet offers some useful resources on this topic, including BCG Systems Inc. (http://www.mmsonline.com), which states that lean manufacturing is a production method that calls for building products with as few steps and as little work-in-process inventory as possible. It relies on work centres or manufacturing cells that are capable of building multiple products, giving the company the flexibility to produce the exact mix and quantity of products required.

Taiichi Ohno, the engineer commonly credited with development of the Toyota Production System, and therefore Lean, identified seven types of waste: defective products, unnecessary finished products, unnecessary work in process, unnecessary processing, unnecessary movement (of people), unnecessary transportation (of products) and unnecessary delays. Lean focuses on eliminating these wastes from a manufacturing system. In particular, this work is interested in the second and third types: unnecessary finished goods and work in process. The Lean answer to these wastes is to link production at each step in the process with the subsequent process (or the consumer, for finished goods). At Toyota, they use kanban (a Japanese word for "shop sign") cards attached to each sub-assembly that are sent back to the producer each time one is used. The cards then become a signal to produce one more. As a result, the number of cards in the system controls the amount of work in process.

Liker (1997) describes a sequence of phases that a manufacturing facility must visit to become Lean: process stabilization, continuous flow, synchronous production, pull authorization, and level production. Such anecdotes are useful advice for managers and provide a general framework for becoming Lean, although they do not provide specific strategies for changing production control schemes.

Lean manufacturing, or lean production, often known simply as "Lean", is a production practice that considers the expenditure of resources for any goal other than the creation of value for the end customer to be wasteful, and thus a target for elimination. According to Steinbrunner (2004), lean is centred on creating more value with less work. Lean manufacturing is a generic process management philosophy derived mostly from the Toyota Production System (TPS) and identified as "Lean" only in the 1990s. It is renowned for its focus on reduction of the original Toyota seven wastes to improve overall customer value, but there are varying perspectives on how this is best achieved. The steady growth of Toyota, from a small company to the world's largest automaker, has focused attention on how it has achieved this.

Lean manufacturing is a variation on the theme of efficiency based on optimizing flow; it is a present-day instance of the recurring theme in human history toward increasing efficiency, decreasing waste, and using empirical methods to decide what matters, rather than uncritically accepting pre-existing ideas. Lean manufacturing is often seen as a more refined version of earlier efficiency efforts, building upon the work of earlier leaders.

A. Steps to achieve lean systems

The following steps should be implemented to create the ideal lean manufacturing system:
1. Design a simple manufacturing system
2. Recognize that there is always room for improvement
3. Continuously improve the lean manufacturing system design

B. Basics for the design of a simple lean manufacturing system

A fundamental principle of lean manufacturing is demand-based flow manufacturing. In this type of production setting, inventory is only pulled through each production center when it is needed to meet a customer's order. The benefits of this goal include:
decreased cycle time
less inventory
increased productivity
increased capital equipment utilization
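The kanban mechanism described above, where a fixed number of cards caps the work in process, can be sketched in a few lines. The class, card count and job labels below are a hypothetical illustration, not a model of Toyota's actual system:

```python
from collections import deque

class KanbanCell:
    """Toy pull system: a job may enter work-in-process only when a free
    card is available, so WIP can never exceed the number of cards."""
    def __init__(self, num_cards):
        self.free_cards = num_cards
        self.wip = deque()

    def start(self, job):
        if self.free_cards == 0:
            return False          # no card free: the upstream step must wait (pull, not push)
        self.free_cards -= 1
        self.wip.append(job)
        return True

    def finish(self):
        job = self.wip.popleft()  # consuming a unit sends its card back to the producer
        self.free_cards += 1
        return job

cell = KanbanCell(num_cards=2)
print(cell.start("A"), cell.start("B"), cell.start("C"))  # True True False
cell.finish()                     # the card attached to "A" is returned
print(cell.start("C"))            # True
```

The third job is refused until a finished unit returns its card, which is exactly how the number of cards in circulation bounds the amount of work in process.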
(a) There is always room for improvement

The core of lean is founded on the concept of continuous product and process improvement and the elimination of non-value-added activities. "The value adding activities are simply only those things the customer is willing to pay for; everything else is waste, and should be eliminated, simplified, reduced, or integrated" (Rizzardo, 2003). Improving the flow of material through new ideal system layouts at the customer's required rate would reduce waste in material movement and inventory.

(b) Continuously improve

A continuous improvement mindset is essential to reach a company's goals. The term "continuous improvement" means incremental improvement of products, processes, or services over time, with the goal of reducing waste to improve workplace functionality, customer service, or product performance (Suzaki, 1987).

C. Lean Goals

The four goals of lean manufacturing systems are to:
Improve quality: To stay competitive in today's marketplace, a company must understand its customers' wants and needs and design processes to meet their expectations and requirements.
Eliminate waste: Waste is any activity that consumes time, resources, or space but does not add any value to the product or service. There are seven types of waste:
1. Overproduction (occurs when production should have stopped)
2. Waiting (periods of inactivity)
3. Transport (unnecessary movement of materials)
4. Extra processing (rework and reprocessing)
5. Inventory (excess inventory not directly required for current orders)
6. Motion (extra steps taken by employees because of inefficient layout)
7. Defects (products that do not conform to specifications or expectations)
Reduce time: Reducing the time it takes to finish an activity from start to finish is one of the most effective ways to eliminate waste and lower costs.
Reduce total costs: To minimize cost, a company must produce only to customer demand. Overproduction increases a company's inventory costs because of storage needs.

IV. DISCUSSION

MRP can be used to set priorities for the production of finished goods in an environment where mixed mode is practised, and in the job shop environment in order to develop a plan for common raw materials consumed. Uniform containers can be used to standardize lot sizes in production lines, for unique items consumed to signal the need to replenish materials, and to simplify transport between the vendor and customer. Materials can then be pulled into the production lines as needed to support the required production rate of finished goods. Sharing material plans can lead to partnerships with vendors that not only reduce lot sizes and lead times, but also result in reduced costs and less work-in-process at both vendor and customer locations. For the job shop environment, the planning and inventory tools of MRP can also be applied to set priorities for raw materials and manufactured products, in addition to developing plans for when and how much will be required.

Companies will continue to find ways to apply lean manufacturing concepts, if they are to remain competitive, to simplify material planning, reduce waste and improve their operations. But it may not be feasible to apply pull methods to all of a company's product lines. When MRP planning and inventory tools are needed to support the job shop environment, and pull methods make sense to support the repetitive production lines, manufacturers will find that a blend of MRP push methods and lean manufacturing pull methods can provide the right material planning mix for their mixed-mode environment.

In order to have a successful implementation of MRP, the following recommended steps are to be followed:
A computer-based manufacturing system should be made available. Although it is possible to maintain a material requirements plan manually, doing so is time consuming and a daunting task, and it would be impossible to keep the plan up to date because of the highly dynamic nature of manufacturing environments.
A feasible master production schedule must be drawn up, or else the accumulated planned orders of components might run into the resource restrictions and become infeasible.
The bills of material should be updated and accurate. It is essential to update the BOM promptly to reflect any engineering changes brought to the product. If a component part is omitted from the bill of material it will never be ordered by the system.
Inventory records should be a precise representation of reality, or else the netting process and the generation of planned orders become meaningless.
Lead times for all inventory items should be known and given to the MRP system.
Last but not least, shop floor discipline must be maintained. It is necessary to ensure that orders are processed in conformity with the established priorities. Otherwise, the lead times passed to MRP will not materialize.

V. CONCLUSION

MRP and lean are not only capable of co-existing, but they can also support one another, provided that the following concepts are understood and conditions exist:
Commitment to planning: First and foremost, there must be a commitment to planning. The "P" in MRP is for planning, yet its role is often overshadowed by the zeal to reduce waste. The importance of planning simply cannot be overlooked. Beyond better inventory control, planning enables you to have the right quality and quantity at the right location and time. Good material planning can help reduce the waste of downtime and reduce overtime. It also helps with overall product quality.
Communication with suppliers: While lean concepts reduce waste throughout every cycle of production, MRP can reduce waste in the supply chain through better relationships with suppliers. Planning enables better data and information that can be shared with vendors.
Dedication to data: While MRP systems can play an important role in synchronizing products, if changes occur, MRP can be slow to respond. This is usually a result of transactions not being entered in a timely manner. Effective product data management is critical to adapting traditional manufacturing systems to agile and lean manufacturing methods. However, it all begins with the data. By gaining an understanding of which bills of material and routing schemes are appropriate for given situations, you learn how they can be used to streamline operations, improve quality, reduce waste, minimize inventory and increase the use of manufacturing assets.
MRP is effective when people understand that the system cannot think for them. Too often, team members know that the information loaded into the system is useless, and they therefore have no faith in the resulting data that is intended to guide their ordering, systems, processes and operations: a classic case of garbage in, garbage out. However, if team members have confidence in the data, they will have confidence in the system.
Finally, when the principles are well integrated, the following benefits will be obtained:
Improve quality: To stay competitive in today's marketplace, a company must understand its customers' wants and needs and design processes to meet their expectations and requirements.
Eliminate waste: Waste is any activity that consumes time, resources, or space but does not add any value to the product or service.
Reduce time: Reducing the time it takes to finish an activity from start to finish is one of the most effective ways to eliminate waste and lower costs.
Reduce total costs: To minimize cost, a company must produce only to the customer's specification and demand. Overproduction increases a company's inventory costs because of storage needs and inventory carrying cost.

VI. REFERENCES

1) Agbu, O. (2007) The Iron and Steel Industry and Nigeria's Industrialization: Exploring Cooperation with Japan, Institute of Developing Economies, Chiba, Japan.
2) Auston, M.K. (1997) Lean Manufacturing Principles: A Comprehensive Framework for Improving Production Efficiency, University of California, Los Angeles.
3) Black, J.T. and Chen, J.C. (1994) Decoupler-improved output of an apparel assembly cell, The Journal of Applied Manufacturing Systems, winter, pp. 47-58.
4) Edward, A.S. (1995) Inventory Management and Production Planning and Scheduling, John Wiley and Sons.
5) Gahagan, S.M. (2008) Simulation and Optimization of Production Control for Lean Manufacturing Transition, unpublished dissertation submitted to the faculty of the Graduate School, University of Maryland.
6) Jain, R.K. (2008) Production Technology, sixteenth edition, Khanna Publishers, 2-B, Nath Market, Nai Sarak, New Delhi.
7) James, H.G. (1997) American Production and Inventory Control Society Production and Inventory Control Handbook, McGraw-Hill.
8) John, F.P. (1998) Master Scheduling: A Practical Guide to Competitive Management, John Wiley and Sons.
9) Liker, J.K. (1997) Becoming Lean: Inside Stories of U.S. Manufacturers, Productivity Press, Portland, Oregon.
10) Mohommed, S.A. (2002) African Iron and Steel Industry [online], 10(8). Available from: http://globle.steel.com/ [Accessed 22nd February 2010].
11) Moustakis, V. (2000) Material Requirements Planning (MRP), Technical University of Crete.
12) Orlicky, J. (1976) Materials Requirements Planning, McGraw-Hill.
13) Salem, O. and Zimmer, E. (2006) Application of Lean Manufacturing Principles to Construction [online]. Available from: http://www.leanconstructionjournal.org/ [Accessed 15th February 2010].
14) Steinbrunner, D. (2004) Modern Machine Shop [online], 6(4). Available from: http://mmsonline.com/ [Accessed 2nd February 2010].
15) Waddell, B. (1984) International Journal of Production Research, vol. 22, no. 2, pp. 193-233.
16) Womack, J.P. and Jones, D.T. (1996) Lean Thinking: Banish Waste and Create Wealth in Your Company. New York.
A Study On Rough Clustering
Dr. K. Thangadurai 1, M. Uma 2, Dr. M. Punithavalli 3
Abstract- Clustering of data is an important data mining application. However, the data contained in today's databases is uncertain in nature. One of the problems with traditional partitioning clustering methods is that they partition the data into a hard-bound number of clusters. There have been recent advances in algorithms for clustering uncertain data. A rough-set-based indiscernibility relation combined with an indiscernibility graph leads to knowledge discovery in an elegant way, as it creates natural clusters in data. In this thesis, rough K-means clustering is studied and compared with the traditional K-means and weighted K-means clustering methods for different data sets available in the UCI data repository.
Keywords- Clusters, Boundary, Iteration, Attributes, Centroid

I. INTRODUCTION

Clustering is a technique to group together a set of items having similar characteristics. There are two kinds of clusters to be discovered in the web usage domain: usage clusters and page clusters. Clustering of users tends to establish groups of users exhibiting similar browsing patterns. Clustering of pages will discover groups of pages having related content. This information is useful for internet search engines and web assistance providers.
Clustering can be considered the most important unsupervised learning problem; as with every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A cluster is therefore a collection of objects which are similar to each other and dissimilar to the objects belonging to other clusters. Here is a simple graphical example:

[Figure 1 shows a scatter of points forming four well-separated groups.]

Figure 1: Cluster Analysis

In this case we easily identify the 4 clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering. Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if this one defines a concept common to all those objects. In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures [1].

About-1 Head in Computer Science, Govt. Arts College (Men), Krishnagiri, TN, India (e-mail: [email protected])
About-2 Research Scholar, Dravidian University, Kuppam, A.P., India
About-3 Director, Department of Computer Science, SRCW, Coimbatore, TN, India

II. GOALS OF CLUSTERING

The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data. There is no absolute best criterion which would be independent of the final aim of the clustering. Consequently, it is the user who must supply this criterion, in such a way that the result of the clustering will suit their needs. For instance, we could be interested in finding representatives for homogeneous groups (data reduction), in finding natural clusters and describing their unknown properties (natural data types), in finding useful and suitable groupings (useful data classes) or in finding unusual data objects (outlier detection).

A. The main requirements that a clustering algorithm should satisfy are

Scalability, dealing with different types of attributes, discovering clusters with arbitrary shape, minimal requirements for domain knowledge to determine input parameters, ability to deal with noise and outliers, insensitivity to order of input records, high dimensionality, and interpretability and usability [2].

B. Numbers of problems with clustering are

Current clustering techniques do not address all the requirements adequately. Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity. The effectiveness of the method depends on the definition of distance; if an obvious distance measure doesn't exist we must define it, which is not always easy, especially in multi-dimensional spaces. The result of the clustering algorithm can be interpreted in different ways.

III. CLUSTERING ALGORITHMS

A large number of techniques have been proposed for forming clusters from distance matrices. The most important types are hierarchical techniques, optimization techniques and mixture models. We are going to discuss the first two types here.

C. Approaches to clustering

1. Centroid approaches, 2. Hierarchical approaches.
Centroid approaches: We guess the centroids or central point in each cluster, and assign points to the cluster of their nearest centroid.
Hierarchical approaches: We begin assuming that each point is a cluster by itself. We repeatedly merge nearby clusters, using some measure of how close two clusters are, or how good a cluster the resulting group would be.

D. Hierarchical Clustering Algorithms

A hierarchical algorithm yields a dendrogram, representing the nested grouping of patterns and the similarity levels at which groupings change. The dendrogram can be broken at different levels to yield different clusterings of the data. Most hierarchical clustering algorithms are variants of the single-link, complete-link, and minimum-variance algorithms [3]. The single-link and complete-link algorithms are the most popular. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters. In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between patterns in the two clusters. The clusters obtained by the complete-link algorithm are more compact than those obtained by the single-link algorithm.

IV. PARTITIONAL ALGORITHMS

A partitional clustering algorithm obtains a single partition of the data instead of a clustering structure, such as the dendrogram produced by a hierarchical technique. Partitional methods have advantages in applications involving large data sets for which the construction of a dendrogram is computationally prohibitive. A problem accompanying the use of a partitional algorithm is the choice of the number of desired output clusters. Partitional techniques usually produce clusters by optimizing a criterion function defined either locally or globally. Partitional algorithms are typically run multiple times with different starting states, and the best configuration obtained from all of the runs is used as the output clustering.

A. Clustering Techniques

Let X be a data set, that is, X = {x_i, i = 1, ..., N}. Now let R be a partition of X into m sets C_j, j = 1, ..., m. These sets are called clusters and need to satisfy the following conditions:
- C_i ≠ ∅, i = 1, ..., m
- C_1 ∪ C_2 ∪ ... ∪ C_m = X
- C_i ∩ C_j = ∅, i ≠ j, i, j = 1, ..., m
It is important to say that the objects (vectors) contained in a cluster C_i are more similar to each other and less similar to the objects (vectors) contained in the other clusters. The intention in the clustering algorithms is to join (or separate) the most similar (or dissimilar) objects of a data set X; to do so, it is necessary to apply a function that can make a quantitative measure among vectors [8].

B. Types of partitional Algorithms

Squared Error Algorithms
Graph-Theoretic Clustering
Mixture-Resolving
Mode-Seeking Algorithms

K-Means Algorithm: The K-means method aims to minimize the sum of squared distances between all points and the cluster centre. This procedure consists of the following steps, as described by Tou and Gonzalez:
1. Choose K initial cluster centres z_1(1), z_2(1), ..., z_K(1).
2. At the k-th iterative step, distribute the samples {x} among the K clusters using the relation
x ∈ C_j(k) if ||x − z_j(k)|| ≤ ||x − z_i(k)||
for all i = 1, 2, ..., K; i ≠ j; where C_j(k) denotes the set of samples whose cluster centre is z_j(k).
3. Compute the new cluster centres z_j(k+1), j = 1, 2, ..., K, such that the sum of the squared distances from all points in C_j(k) to the new cluster centre is minimized. The measure which minimizes this is simply the sample mean of C_j(k). Therefore, the new cluster centre is given by
z_j(k+1) = (1/N_j) Σ_{x ∈ C_j(k)} x,  j = 1, 2, ..., K
where N_j is the number of samples in C_j(k).
4. If z_j(k+1) = z_j(k) for j = 1, 2, ..., K, the algorithm has converged and the procedure is terminated.
5. Otherwise, go to step 2.

C. Drawbacks of K-Means algorithm

The final clusters do not represent a global optimization result but only a local one, and completely different final clusters can arise from differences in the initial randomly chosen cluster centers. We also have to know in advance how many clusters there will be.

D. Working Principle

The K-Means algorithm working principles are explained in the following algorithm steps.
Algorithm:
1) Initialize the number of clusters k.
2) Randomly select the centroids (c_1, c_2, ..., c_k) in the given data set.
3) Compute the distance between the centroids and the objects using the Euclidean distance equation: d_ij = ||x_i − c_k||.
4) Update the centroids.
5) Stop the process when the new centroids are nearer to the old ones; otherwise, go to step 3.
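The Tou-Gonzalez steps above map directly onto a few lines of code. Below is a minimal sketch of plain K-means; the sample points, the seed, and the naive handling of ties and empty clusters are illustrative choices, not part of the original algorithm description:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of numeric tuples (Tou-Gonzalez steps 1-5)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)                      # step 1: initial centres
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                 # step 2: nearest-centre rule
            j = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[j].append(p)
        new_centres = [                                  # step 3: sample means
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centres == centres:                       # step 4: convergence test
            break
        centres = new_centres                            # step 5: repeat
    return centres, clusters

pts = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
centres, clusters = kmeans(pts, k=2)
print(sorted(centres))  # two centres near (1.1, 0.9) and (5.1, 4.95)
```

On this toy data the procedure converges in a couple of iterations to the two sample means, illustrating both the nearest-centre assignment and the mean update.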
E. Weighted K-Means Algorithm

The weighted K-Means algorithm is one of the clustering algorithms, based on the K-Means algorithm but calculating with weights. A natural extension of the K-Means problem allows us to include some more information, namely a set of weights associated with the data points. These might represent a measure of importance, a frequency count, or some other information. This algorithm is the same as the normal K-Means algorithm, just adding the weights. Weighted K-Means attempts to decompose a set of objects into a set of disjoint clusters, taking into consideration the fact that the numerical attributes of objects in the set often do not come from independent identical normal distributions. The weighted K-Means algorithm uses a weight vector to decrease the effects of irrelevant attributes and to reflect the semantic information of the objects. Weighted K-Means algorithms are iterative and use hill-climbing to find an optimal solution (clustering), and thus usually converge to a local minimum.
In the weighted K-Means algorithm, the weights can be classified into two types.
Dynamic weights: the weights are changed during the program.
Static weights: the weights are not changed during the program.
The weighted K-Means algorithm is used to cluster the objects. Using this algorithm we can also calculate the weights dynamically and cluster the data in the dataset.

Working Principle

The weighted K-Means working procedure is the same as that of the K-Means algorithm; the only difference is that a weight is included. The working procedure is given in the following algorithm steps.
Input: a set of n data points and the number of clusters (K)
Output: centroids of the K clusters
1. Initialize the number of clusters k.
2. Randomly select the centroids (c_1, c_2, ..., c_k) in the data set.
3. Choose the static weight w, which ranges from 0 to 2.5 (or 5.0).
4. Find the distance between the centroids and the objects using the Euclidean distance equation: d_ij = w · (x_i − c_k)².
5. Update the centroids using this equation.
6. Stop the process when the new centroids are nearer to the old ones. Otherwise, go to step 4.

F. Rough Set Clustering Algorithm

Rough sets were introduced by Zdzislaw Pawlak [6][7] to provide a systematic framework for studying imprecise and insufficient knowledge. Rough sets are used to develop efficient heuristics that search for relevant tolerance relations allowing objects to be extracted from data. An attribute-oriented rough sets technique reduces the computational complexity of learning processes and eliminates unimportant or irrelevant attributes, so that knowledge discovery in databases or in experimental data sets can be efficiently learned. Using rough sets has been shown to be effective for revealing relationships within imprecise data, discovering dependencies among objects and attributes, evaluating the classificatory importance of attributes, removing data redundancies, and generating decision rules [5].
Some classes, or categories, of objects in an information system cannot be distinguished in terms of the available attributes. They can only be roughly, or approximately, defined. The idea of rough sets is based on equivalence relations which partition a data set into equivalence classes, and consists of the approximation of a set by a pair of sets, called the lower and upper approximations. The lower approximation of a set contains all objects that, given the set of attributes, can be classified as certainly belonging to the concept. The upper approximation of a set contains all objects that cannot be classified categorically as not belonging to the concept. A rough set is thus an approximation of a set, defined as a pair of sets: the upper and lower approximations of the set [7].

G. Rough K-Means Algorithm

Step 0: Initialization. Randomly assign each data object to exactly one lower approximation. By definition (Property 2) the data object also belongs to the upper approximation of the same cluster.

Step 1: Calculation of the new means. The means are calculated as follows:

m_k = w_l · (Σ_{X_n ∈ C_k} X_n) / |C_k| + w_b · (Σ_{X_n ∈ C_k^B} X_n) / |C_k^B|,  if C_k^B ≠ ∅
m_k = (Σ_{X_n ∈ C_k} X_n) / |C_k|,  otherwise

where the parameters w_l and w_b define the importance of the lower approximation and the boundary area of the cluster. The expression |C_k| indicates the number of data objects in the lower approximation of the cluster, and |C_k^B| is the number of data objects in the boundary area (the upper approximation minus the lower approximation).

Step 2: Assign the data objects to the approximations.
(i) For a given data object X_n, determine its closest mean m_h:
d_{n,h} = d(X_n, m_h) = min_{k=1,...,K} d(X_n, m_k)
Assign X_n to the upper approximation of the cluster h.
(ii) Determine the means m_t that are also close to X_n; they are not farther away from X_n than d(X_n, m_h) plus a given threshold ε:
T = { t : d(X_n, m_t) − d(X_n, m_h) ≤ ε and t ≠ h }
from 0 to 2.5 or (5.0) The expression |Ck| indicates the numbers of data objects in 4. Find the distance betweena the centroids using lower approximation of the cluster and |CBk | = |Ck −Ck| is the Euclidean the number of data objects in the boundary areas. Distance equation. Step 2: Assign the data objects to the approximations. (i) For 2 a given data object Xn dij =E 푤 . ∗ (푥푖−푐푘 ) determine its closest mean mh: 5. Update the centroids using this equation. 6. Stop the process when the new centroids are 푚푖푛 푑푛,ℎ = 푑 푋푛 , 푚푘 = 푚푖푛푘=1…푘 푑 푋푛 , 푚푘 nearer to old one. Otherwise, go to step-4. F. Rough Set Clustering Algorithm Assign Xn to the upper approximation of the cluster h:Xn ∈ Ch. Rough sets were introduced by Zdzislaw Pawlak [6][7] to (ii) Determine the means mt that are also close to Xn—they provide a systemic framework for studying imprecise and are not farther away from Xn than d( Xn,mh)where is a insufficient knowledge. Rough sets are used to develop given threshold: efficient heuristics searching for relevant tolerance relations that allow extracting objects in data. An attribute-oriented 푇 = {푡: 푑 푋푛 , 푚푘 − 푑 푋푛 , 푚ℎ ≤ 휀 ∩ ℎ ≠ 푘} P a g e | 58 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology
measure of Xie - Bien Index for three different UCI data If T= ∅ (Xn is also close to at least one other mean mt sets. It is observed that Rough K-Means algorithm is besides mh) performing well comparatively Then Xn ∈ Ct , ∀t ∈ T . • Else Xn ∈ Ch. VI. REFERENCES Step 3: If the algorithms continue with Step 1. Else STOP. 1) Agrawal R, Imielinski T and Swami A. ―Mining association rules between sets of items in large H. Experimental Results And Discussion databases‖, In Proc. 1993 Int. Conf. Management The experimental analysis is carried out in this chapter by of Data (SIGMOD-93), 207-216. May 1993 considering three different data sets from UCI data depository and the algorithms are validated through XIE – 2) Agrawal R, Mannila H, Srikant R, Toivonen H and BIEN index Verkamo AI. ― Fast discovery of association rules.‖, In: Fayyad UM, Piatetsky-Shapiro G, I. Xie-Beni Validity Index Smyth P and Uthurusamy R. (Eds.) Advances in In this thesis, the Xie-Beni index has been chosen as the Knowledge Discovery and Data Mining, 307-328, cluster validity measure because it has been shown to be 1996. able to detect the correct number of clusters in several 3) Bhattacharyya S, Pictet O, Zumbach G. experiments. Xie-Beni validity is the combination of two ―Representational semanticsw for genetic functions. The first calculates the compactness of data in the programming based learning in high-frequency same cluster and the second computes the separateness of financial data.‖, Genetic Programming 1998: Proc. data in different clusters. Let S represent the overall validity 3rd Annual Conf., 11e-16. Morgan Kaufmann, 1998. index, π be the compactness and s be the separation of the 4) Jiawei Han and Micheline Kamber, ―Data Mining rough k-partition of the data set. The Xie-Beni validity can Concepts and Techniques‖, Morgan Kaufmann now be expressed as: Publishers, USA,i 2001. 
5) Kusiak M, ―Rough set theory: A Data Mining tool K n 2 2 for semiconductor manufacturing‖, IEEE ij || x zi || Transactions on Electronics Packaging i1 j1 ManufacturingV 24 (1) (2001) 44-50
6) Lingras P and West C, ―Interval set clustering of n web users with rough K-means‖, Journal of Where Intelligent Information Systems 23 (1) (2004) 5-16. And s= (d ) 2 min 7) Lingras P, Yan R and M. Hogo, ―Rough set based dmin is the minimum distance between cluster centres, clustering: evolutionary, neural, and statistical given by y approaches‖, Proceedings of the First Indian dmin= minij ||zi-zj|| International Conference on Artificial Intelligence Where n is the number of users, k is the number of clusters, (2003) 1074-1087. and Zi is the cluster centre of cluster Ci, wl is takenl as 0.7 8) Lingras, P. ―Rough Set Clustering for Web for the elements that are placed in lower approximation, wu Mining‖, Proceedings of 2002 IEEE International is taken 0.3 for the elements that are placed in Upper Conference on Fuzzy Systems. 2002. approximation, µij is taken as 0.3 for ther elements that are 9) .Milligan G.W and Cooper M.C., ―An examination placed in boundary region. µij be the membership value of of procedures for determining the number of the user in boundary region. Smaller values of π indicate clusters in a data set‖, Psychometrika, vol. 50, pp. that the clusters are more compact and larger values of s a 159-179, 1985. indicate the clusters are well separated. Thus a smaller S 10) Monmarche N. Slimane M, and Venturini G. reflects that the clusters have greater separation from each Antclass, ―Discovery of cluster in numeric data by other and are more compact. In this thesis, Xie-Beni validity an hybridization of an ant colony with the k-means index is used to validateE the clusters obtained after applying algorithm‖, Technical Report 213, Ecole d‘ the clustering algorithms Ingenieurs en Informatique pour l‘Industrie (E3i), V. CONCLUSION Universite de Tours, Jan. 1999.
The K-Means, Weighted K-Means and Rough K-Means clustering algorithms have been studied and implemented. All the three algorithms are analyzed using the validity
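The Rough K-Means steps above can be sketched as follows. This is an illustration only, not the authors' implementation: the lower/boundary weights 0.7/0.3 follow the values quoted with the Xie-Beni discussion, while the threshold ε, the toy data, the deterministic initialisation (the paper's Step 0 assigns randomly), and the fallback to the plain lower-approximation mean when the boundary is empty are assumptions.

```python
def rough_kmeans(points, k, w_lower=0.7, w_boundary=0.3, eps=0.5, max_iter=50):
    """Sketch of rough K-Means: objects clearly closest to one mean go to
    its lower approximation; ambiguous objects go to the upper
    approximations (boundary) of every nearby cluster."""
    # Deterministic initialisation for reproducibility (assumption).
    means = [list(points[i]) for i in range(k)]
    dims = len(points[0])

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    lower = upper = None
    for _ in range(max_iter):
        # Step 2: assign each object to lower/upper approximations
        new_lower = [[] for _ in range(k)]
        new_upper = [[] for _ in range(k)]
        for p in points:
            d = [dist(p, m) for m in means]
            h = d.index(min(d))
            T = [t for t in range(k) if t != h and d[t] - d[h] <= eps]
            new_upper[h].append(p)
            if T:                       # ambiguous: boundary object
                for t in T:
                    new_upper[t].append(p)
            else:                       # unambiguous: lower approximation
                new_lower[h].append(p)
        # Step 3: stop when the assignments no longer change
        if (new_lower, new_upper) == (lower, upper):
            break
        lower, upper = new_lower, new_upper
        # Step 1: weighted combination of lower-approximation mean and
        # boundary mean (plain lower mean when the boundary is empty)
        for i in range(k):
            boundary = [p for p in upper[i] if p not in lower[i]]
            if not (lower[i] or boundary):
                continue                # empty cluster: keep the old mean
            for j in range(dims):
                if lower[i] and boundary:
                    lo = sum(p[j] for p in lower[i]) / len(lower[i])
                    bo = sum(p[j] for p in boundary) / len(boundary)
                    means[i][j] = w_lower * lo + w_boundary * bo
                elif lower[i]:
                    means[i][j] = sum(p[j] for p in lower[i]) / len(lower[i])
                else:
                    means[i][j] = sum(p[j] for p in boundary) / len(boundary)
    return means, lower, upper
```

On a toy set of six points forming two well-separated groups, the lower approximations recover the two groups and the means converge to the group centroids, with the boundary emptying out as the means separate.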
Applying Software Metrics on Web Applications
Vikas Raheja1 Rajan Saluja2
Abstract- Web applications automate many daily business activities. Users interact with these web applications through the interfaces the applications provide. Web applications are different from conventional applications. Traditional software metrics can be applied to web applications, but new metrics designed specifically for web applications are important and improve the performance of web applications. In this paper traditional software metrics as well as some new web metrics are described. In the new approach I describe a performance metric for web applications, security measures, and a navigability metric, which are useful for improving web applications. At the beginning I give the basics of measurement, which are required for a better understanding of this paper.
Keywords- Web Metric, Navigability Metric, Performance Metric, Security Metric

I. INTRODUCTION

Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules [2]. Software metrics is a term that embraces many activities, all of which involve some degree of measurement. Software metrics provide a basis for improving the software process, increasing the accuracy of project estimates, enhancing project tracking, and improving software quality. There are many types of software metrics, of which some areas in software engineering are:
1. Cost and effort estimation
2. Productivity measures and models
3. Data collection
4. Quality models and measures
5. Reliability models
6. Performance evaluation and models
7. Structural and complexity metrics
8. Capability-maturity assessment

II. THE BASICS OF MEASUREMENT

There are several theories of measurement; the one used here is the representational theory of measurement [2]. The representational theory of measurement seeks to formalize our intuition about the way the world works: that is, the data we obtain as measures should represent the attributes of the entity. Our intuition is the starting point for all measurement.
Empirical relation: Given any two people x and y, we can observe that x is taller than y, or y is taller than x; therefore we say that "taller than" is an empirical relation for height, where height is an attribute.
Mapping: After finding the empirical relation, one should map from the empirical relation to a numerical relation:
   A is taller than B if and only if M(A) > M(B)
If we convert that type of relation to a mathematical form, such a form is called a mapping.
The stages of measurement are:
- Identify attributes for some real-world entities.
- Identify empirical relations for the attributes.
- Identify numerical relations corresponding to each empirical relation.
- Define a mapping from real-world entities to numbers.
- Check that the numerical relations preserve, and are preserved by, the empirical relations.

A. Direct and Indirect Measurement

Once we have a model of the entities and attributes involved, we can define measures in terms of them. Direct measurement of an attribute of an entity involves no other attribute or entity; for example, the length of a physical object can be measured without reference to any other object or attribute. On the other hand, the density of a physical object can be measured only in terms of mass and volume; we then use a model to show us that the relationship among the three is density = mass / volume. Some direct measures in the area of software engineering are length, duration of the testing process, number of defects discovered, and the time a programmer spends on the project. Indirect measurement is often useful in making visible the interactions between direct measurements [1].
Examples of common direct measurements: length, width, lines of code.
Examples of common indirect measurements:
- Program productivity = LOC produced / person-months of effort
- Module defect density = number of defects / module size
- Requirements stability = number of initial requirements / total number of requirements
- Test effectiveness ratio = effort spent fixing faults / total project effort

B. Measurement Scales and Scale Types

There are five major types of scales: nominal, ordinal, interval, ratio, and absolute.

About-1 Assistant Professor, N.C. Institute of Computer Sciences, Israna, Panipat (Haryana)
About-2 Assistant Professor, N.C. Institute of Computer Sciences, Israna, Panipat (Haryana)
C. Classifying Software Measures

Software measurement needs entities and attributes; we can divide software into these three classes:
Processes: collections of software-related activities.
Products: any artifacts, deliverables or documents that result from a process activity.
Resources: entities required by the process activities.
Within each class of entities we distinguish internal and external attributes.
Internal attributes of a product, process or resource are those that can be measured purely in terms of the product, process or resource itself.
External attributes of a product, process or resource are those that can be measured only with respect to how the product, process or resource relates to its environment.

Table 1

Entities                   | Internal attributes                                  | External attributes
Products
  Specification            | Size, Reuse, Modularity, Redundancy, Functionality,  | Comprehensibility, Maintainability
                           | Syntactic correctness                                |
  Design                   | Size, Reuse, Modularity, Coupling, Cohesiveness,     | Reliability, Usability, Maintainability
                           | Functionality                                        |
  Code                     | Size, Reuse, Modularity, Coupling, Functionality,    | Reliability, Usability, Maintainability
                           | Algorithmic complexity, Control-flow structuredness  |
  Test data                | Size, Coverage level                                 | Quality
Processes
  Constructing specification | Time, Effort, Number of requirements changes       | Quality, Cost, Stability
  Detailed design          | Time, Effort, Number of specification faults found   | Cost, Cost effectiveness
  Testing                  | Time, Effort, Number of coding faults found          | Cost, Cost effectiveness, Stability
Resources
  Personnel                | Age, Price                                           | Productivity, Experience, Intelligence
  Teams                    | Size, Communication level, Structuredness            | Productivity, Quality
  Software                 | Price, Size                                          | Usability, Reliability
  Hardware                 | Price, Speed, Memory size                            | Reliability
  Offices                  | Size, Temperature, Light                             | Comfort, Quality

III. WEB METRICS

A. Web Engineering Fundamentals

Web engineering is the application of engineering principles to obtain high-quality web applications. Similar types of processes are followed to build web applications as in traditional software, but with new ideas. Nowadays, when the programming platform has changed, it is difficult to develop software with only traditional models; some changes in the models are required for the development of online applications. In previous years a web site consisted of little more than a set of hypertext files that presented information using text and limited graphics. As time passed, HTML was augmented by development tools that enabled web engineers to provide computing capability along with information.
As in traditional projects, attributes are needed for software metrics, whether internal or external. Similarly, attributes are needed by web metrics for the improvement of online projects or web applications. Some of the attributes which are useful for web metrics are:
Network intensiveness: A web application resides on a network and must serve the needs of a diverse community of clients; web applications are network dependent [5].
Concurrency: A large number of users may access the web application at one time [5].
Unpredictable load: At one time 1000 users may access the web application, at another only 10 [5].
Performance: If a user waits too long, he or she may decide to go elsewhere [5].
Availability: A web application should be available for the maximum time, ideally on a 24/7/365 basis [5].
Data driven: The primary function of many web applications is to present hypermedia files as well as to display graphics, but web applications may also access databases [5].
Content sensitive: The text present on a web site should be of high quality, because the content always represents the quality of the site [5].
Continuous evolution: Web applications evolve continuously; some may be updated every hour, some every minute.
Security: Web applications are on a world network, so there is a need to secure their contents; strong security measures must be taken to protect the information and data of web applications.
Meeting the business requirements: Web applications should serve the purpose of the business for which they are made.
The various types of web applications are:
- Informational
- Customizable
- Interaction
- User input
- Transaction oriented
- Portal
- Database access
- Data warehousing

B. Planning For Web Engineering Projects

Table 2

                               Traditional Projects      | Small e-Projects              | Major e-Projects
Requirement gathering:         Rigorous                  | Limited                       | Rigorous
Technical specifications:      Robust models and spec    | Descriptive overview          | Robust UML models and spec
Project duration:              Measured in months or years | Measured in days, weeks or months | Measured in months or years
Testing & QA:                  Focused on achieving quality targets
Risk management:               Explicit                  | Inherent                      | Explicit
Half-life of deliverables:     18 months or longer       | 3 to 4 months                 | 6 to 12 months
Release process:               Rigorous                  | Limited                       | Rigorous
Post-release customer feedback: Requires proactive efforts | Obtained automatically       | Obtained automatically; user interaction and feedback solicited

In Table 2 the comparison of traditional projects with small e-projects and major e-projects is carried out. Traditional software projects and major e-projects have substantial similarities. Small e-projects have special characteristics which differentiate them from traditional projects. Even in the case of small e-projects, planning must occur, risks must be considered, a schedule must be established, and control must be defined, so that confusion, frustration, and failure are avoided.

C. Project Management Issues for Web Applications

a) A business must choose one of two web engineering options: (1) the web application is outsourced, i.e. the web engineering is performed by some third party who has the expertise, talent and resources that may be lacking within the business; or (2) the web application is developed in-house using web engineers that are employed by the business. A third alternative is one in which some work is carried out in-house and some work is outsourced [4].

D. Our Approach Towards Web Metrics

Web engineering uses metrics to improve the overall process for the development of web applications. These metrics describe how web applications behave and what the quality of these online applications is. Software metrics provide a basis for improving the software process, increasing the accuracy of project estimates, enhancing project tracking, and improving software quality. Web metrics, if properly characterized, achieve all these benefits and also improve the usability, performance, and user satisfaction of web applications [5]. The goal of web metrics is to provide better quality of web applications from technical and business points of view. Web metrics provide measures of the effort, time and complexity of web applications. Some of the measures of web applications are:

E. Performance Metric

Performance is related to the availability and concurrency of web applications. When an end user requires the service of a web application and the web application fails, that condition reduces the performance of the web application. The cause of failure may be anything, either a network failure or a heavy load on the servers.
Fig. 1 shows an example of a typical web application architecture, in which the web server takes a request from a user and passes it to the database server through the application server; the result of the database query is then shifted back to the client machine [8].

[Figure 1: A typical web application architecture: one firewall-protected set of web server, application server and database server serving clients 1 to n.]

A single set of web server, application server and database server gives service to a number of clients. With such an architecture it is difficult to improve the performance of web applications. But if we improve this model, as shown in Fig. 2, so that two sets of web server, application server and database server give service to the clients, then the load on each server reduces. Our performance metric says that response time decreases as the total number of servers increases:

   Response time ∝ 1 / total number of servers

To reduce the response time, increase the number of servers.

[Figure 2: The improved architecture: two replicated sets of web, application and database servers behind the firewall serving the same clients.]

F. Security Metric

Web applications are on a world network, so there is a need to secure the contents of web applications [6]. Strong security measures should be taken to protect the information and data of web applications.
Input from users is one way through which security can be compromised; while coding web applications, appropriate checks should be implemented on user inputs to maintain security. For example, an input field meant to take character data should not accept numeric data or other special characters. Apply user IDs and passwords to secure information. SQL injection attacks carried out by hackers should be prevented by positive tainting techniques [8]. HTTP cookies and server variables can be a cause of poor security: if the user does not perform any action for some period of time, cookies should expire and the application should ask for re-login with the password. Defensive programming reduces attacks.

IV. MEASUREMENT OF TIME AND EFFORTS

A few measures of effort and time are given below.
Structuring effort: time to structure the web application.
Interlinking effort: time to interlink pages to build the web application.
Interface planning: time taken to plan the web application interface.
Interface building: time taken to implement the interface for the web application.
Link-testing effort: time taken to test all links in the web application.
Media-testing effort: time taken to test all media in the web application.
Total effort = structuring effort + interlinking effort + interface building + link-testing effort + media-testing effort.

(1) Page Authoring
Text effort: time taken to author or reuse text in a page.
Page-linking effort: time taken to author links in a page.
Page-structuring effort: time taken to structure a page.
Total page effort = text effort + page-linking effort + page-structuring effort.

(2) Media Authoring
Media effort: time taken to author or reuse media files.
Media digitizing: time taken to digitize media.
Total media effort = media effort + media-digitizing effort [5].

(3) Programming Authoring
Programming effort: time taken to author HTML, Java or related language implementations.
Reuse effort: time taken to reuse/modify existing programming.

(4) Navigability Measures
Navigability describes the ease with which a user finds the desired information; the navigability measure is important for usability. A proper model of navigability reduces access time. There are certain measures through which navigability can be increased, e.g. hyperlink depth, hyperlink breadth, and the topologies formed by hyperlinks; some studies have examined the effect of hypertext topologies on usability [7]. In the breadth-maximum approach, all links are on a single page or home page, so that the user can move to the desired page with a single click; this approach is better only for informational websites such as the Rediff home page. In the depth-maximum approach, links are spread over different pages in the web application; depth is the number of clicks required to get to a specific page from the home page. This approach is better where input is required from the user by following a specific number of steps. Web site navigability can be evaluated in three ways: with user surveys, with usage analysis, and with navigability measurements [7].
Mainly there are four hypertext topologies: (1) linear topology, (2) strictly hierarchical, (3) mixed topology (hierarchical topology with cross-referential hyperlinks), and (4) non-linear topology (a complete network based on a large number of cross-referential links). Previous studies find that navigability decreases in the order (1) linear, (2) strict, (3) mixed, (4) complex. We can divide the mixed topology into three sub-categories: (1) mixed hierarchical with a link to the home page, (2) bottom-up approach, and (3) mixed hierarchical with links at the same level. In the first approach a link to the home page is present from every page; in the second approach a link to the previous page is present from every page; and in the third approach a link to every page at the same level is present.
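Hyperlink depth (the number of clicks needed to reach a page from the home page) can be computed with a breadth-first search over the site's link graph. This is a sketch; the site graph below is a made-up example, not one measured in this paper.

```python
from collections import deque

def link_depths(site, home):
    """Breadth-first search over the hyperlink graph: the depth of a page
    is the minimum number of clicks needed to reach it from the home page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for linked in site.get(page, []):
            if linked not in depth:
                depth[linked] = depth[page] + 1
                queue.append(linked)
    return depth

# Hypothetical mixed-hierarchical site: every page links back to "home"
site = {
    "home": ["products", "about"],
    "products": ["phones", "laptops", "home"],
    "phones": ["home"],
    "laptops": ["home"],
    "about": ["home"],
}
depths = link_depths(site, "home")
print(depths["phones"])      # 2 clicks from the home page
print(max(depths.values()))  # overall depth of the site: 2
```

A breadth-maximum site would have every page at depth 1; a depth-maximum site pushes pages to larger depths, which this measure makes directly comparable.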
V. REFERENCES

1) E. Stroulia, M. El-Ramly, P. Iglinski, and P. Sorenson, "User Interface Reverse Engineering in Support of Interface Migration to the Web", Automated Software Eng. J., vol. 10, no. 3, pp. 271-301, 2003.
2) Norman E. Fenton, "Software Metrics", Thomson Publications, Fifth Edition, 2005.
3) Sreedevi Sampath, Lori Pollock, "Applying Concept Analysis to User-Session-Based Testing of Web Applications", IEEE Trans. on Software Engineering, Vol 33, No. 10, pp. 643-657, Oct 2007.
4) Powell, T.A., "Web Site Engineering", Prentice Hall, 1998.
5) Pressman, Roger S., "Software Engineering", McGraw Hill, 2005.
6) Ying Zou, Qi Zhang, Xulin Zhao, "Improving the usability of e-commerce applications using business processes", IEEE Transactions on Software Engineering, Vol 33, No 12, pp. 837-853, Dec 2007.
7) Yuming Zhou, Hareton Leung, "MNav: A Markov …", IEEE Trans. on Software Engineering, Vol 33, No. 12, pp. 869-889, Dec 2007.
8) William G.J. Halfond, Alessandro Orso, Panagiotis Manolios, "WASP: Protecting Web Applications Using Positive Tainting and Syntax Awareness", IEEE Trans. on Software Engineering, Vol 34, No. 1, pp. 65-79, Jan/Feb 2008.
Measuring Helpfulness of Personal Decision Aid Design Model
Siti Mahfuzah Sarif 1
Norshuhada Shiratuddin 2
Abstract- The existence of countless computerized personal decision aids has triggered the interest to investigate which decision strategy and technique are ideal for a personal decision aid, and how helpful a decision aid is to non-expert users. Two categories of decision strategies have been reviewed, compensatory and non-compensatory, which results in fusing the two strategies in order to get the best of both worlds. Findings from the study of focus groups show that the multi-criteria decision methods (MCDM) known as the Pugh matrix and the lexicographic method have been identified as the two most preferred techniques for solving personal decision problems. Both the strategies and the techniques are incorporated in the development of a personal decision aid design model (PDADM). The proposed model is then validated through a prototyping method in two different case studies (choosing a development methodology in a mobile computing course, and purchasing a mobile phone). In measuring the helpfulness of the prototypes, this study looks at four dimensions: reliability, decision-making effort, confidence, and decision process awareness. The findings show that the respondents from different decision situations perceived the PDADM-driven prototypes as helpful.
Keywords- Computerized decision aid, decision strategy, multi criteria decision method, helpfulness

I. INTRODUCTION

Humans commonly make decisions of varying importance on a daily basis; thus, the idea of treating personal decision making as a researchable matter seems odd. However, studies have proven that most humans are much poorer at decision making than they think. An understanding of what decision making involves, together with a few effective techniques, will help produce better decisions. This explains the existence of decision support technology at different levels in various fields, for instance in management, engineering and medicine.
To date, the attention given to the improvement of decision support at the organization level has been enormous. On the contrary, the study of improving the performance of decision aids in personal decision making is still lacking and out of date (Jungermann, 1980; Wooler, 1982; Bronner & de Hoog, 1983; Alidrisi, 1987; Todd & Benbasat, 1991). The existence of countless computerized personal decision aids (in the form of websites, software or spreadsheets) these days has triggered the interest to investigate the suitability and helpfulness of this technology to users, especially non-expert users.

II. BACKGROUND OF STUDY

Although most personal decisions made are minor in nature and in terms of their consequences, being able to make an actual decision in any situation is indeed essential (Rich, 1999). Living in the 21st century, it is almost impossible not to associate anything with computer technology, and this includes decision making. The evidence of human limitations in information processing is unquestionable; thus, the advantage of computerized decision aids can be a major benefit for the decision maker.

A. Research Problem Statement

Decisions are part of human life. Decisions mainly involve choices, and the hardest part is to make the right choice. It can be demanding to choose without being clear about what to choose and how to go about it, which may later lead to being indecisive. Moreover, indecisiveness may cause failed actions and a tendency to be controlled by others (McGuire, 2002; Arsham, 2004). This shows that, under appropriate circumstances, it is essential to apply a decision aid in making decisions.
Over the decades there have been countless studies on decision support technology that proposed methods of improving the performance of such technology at the organization level. However, in more recent years computerized personal decision aids (more examples and reviews in section 3.2) have been mushrooming and progressively getting attention from users, for example "hunch" (www.hunch.com) and "Let Simon Decide" (www.letsimondecide.com). This shows the relevance of studying issues related to computerized decision aids pertaining to personal decisions.
For more than five decades, most research carried out on decision processes has focused either only on the descriptive aspect (studying how decisions are being made) or on the normative aspect (studying how some ideally logical decider would make decisions). Decider in this context refers to the decision aid. Prescriptive research on decision processes, on how to help the decider progress from the descriptive to the normative, has, however, been scarce (Brown, 2008). This has also been mentioned earlier in (Bell et al., 1988).

About-1 Department of Computer Science, College of Arts and Sciences, Universiti Utara Malaysia; currently pursuing her PhD degree. She specializes in software application development and multimedia design. (email: [email protected])
About-2 Professor and Applied Science Chair at the College of Arts and Sciences, Universiti Utara Malaysia. She obtained her PhD from the University of Strathclyde, Glasgow, UK, and has published more than 100 papers in journals and proceedings. She specializes in design research and application engineering. (email: [email protected])
The term computerized decision aid refers to a very diverse set of tools based on varying techniques and levels of complexity. Generally, decision aids are designed to help humans choose the best decision possible with the knowledge they have available. However, creating effective decision aids is more than meets the eye (Power, 1998). Complex and structured mathematical techniques that correspond to the uncertainty of a decision situation have long held great theoretical appeal for helping decision makers make better decisions. Studies by Hayes and Akhavi (2008), Adam and Humphreys (2008), Zannier et al. (2007) and Law (1996) do not agree with that position. Hayes and Akhavi (2008) also affirmed that "decision aids based on mathematically correct and sophisticated models do not actually improve the decision making performance. This is due to how the decision aids frame the problem in a way that does not fit human decision making approaches".

Furthermore, although uncertainty can be tackled using complex mathematical tools, more often than not the decision maker will not have the time to implement the structured mathematical strategies (McGuire, 2002; Arsham, 2004). This is further supported in Alidrisi (1987) and Adam and Humphreys (2008). All these researchers agree that, as far as personal decision making is concerned, complex and structured mathematical techniques are not preferred. Evidently, this indicates that a simple decision making model is more needed than rigorous criteria weighing analysis.

All else being equal, decision makers prefer more accurate and less effortful choices. Since these desires conflict, selecting a suitable strategy for the aid can be a tricky task (Payne, 1993; Naude, 1997; Al-Shemmeri et al., 1997; Zanakis et al., 1998). Then again, the appropriate use of decision strategies can contribute to effective decision making (Cosier & Dalton, 1986).

B. Research Objectives

With the nature of the problem in mind, this study aims to propose a personal decision aid design model that is perceived as helpful. The following specific aims are outlined to support the general aim:

i. To identify the appropriate decision strategy and decision technique for personal decision making
ii. To incorporate the identified decision strategy and technique in the development of the personal decision aid design model
iii. To validate the personal decision aid design model in different situations via a prototyping method
iv. To measure the users' perceived helpfulness of the prototypes

III. INTRODUCTION TO DECISION TECHNIQUES

A working knowledge of decision theory is needed before embarking on developing a decision aid design model. The design of the model includes two important expectations, which are to accomplish a better decision and to ensure the helpfulness of the model via a prototyping method. The topics reviewed from the literature include decision making, multi criteria decision making (MCDM) methods, computerized decision aids, related decision theories, and aspects of the helpfulness of information systems in general and decision support in particular.

A. Decision Strategies and Techniques

A personal decision normally involves evaluating many choices and making a selection among them. Generally, there are various strategies and techniques for making decisions. This study focuses on decision making problems where the number of criteria and alternatives is finite, and the alternatives are given explicitly. Problems of this type are called multi attribute decision making problems.

Compensatory and Non-compensatory Strategies

Decision strategies are commonly divided into two broad categories: non-compensatory and compensatory. Ullman (2002) defines non-compensatory strategies using the example of one well documented non-compensatory strategy, the lexicographic method. As for compensatory strategies, Ullman (2002) defines them as strategies which allow decision makers to evaluate the alternatives by balancing the strong features of the alternatives against their weaker features. Examples of methods that support the compensatory strategy are decision matrix and utility theory methods.

Lexicographic Method

In the lexicographic method, criteria are ranked in the order of their importance. The alternative with the best performance score on the most important criterion is chosen. If there are ties with respect to this criterion, the performance of the tied alternatives on the next most important criterion is compared, and so on, until a unique alternative is found (Linkov et al., 2004).

MAUT

Multi-attribute utility theory (MAUT) is seen as an ideal approach for personal decision making by many previous researchers due to the nature of the decision problem. This is supported in a number of studies (Bronner & Hoog, 1983; Alidrisi, 1987; Işıklar & Büyüközkan, 2007; Adam & Humphreys, 2008). Adam and Humphreys (2008) described that "MAUT is simple enough to implement as compared to other models of decision making which require a more rigorous criteria weighing analysis that is not necessarily needed for the role of decision making".

Pugh's Method

Pugh's method is known as a simplified MAUT and was first introduced by Pugh (1990) as a method for concept selection in engineering decisions. In the Pugh approach, all alternatives are compared to a datum alternative on each criterion. Alternatives are either better (+1), worse (-1), or the same (0) as the datum for a given criterion. The score for each alternative is calculated as the number of occurrences of +1 minus the number of occurrences of -1. Emphasis was placed on using these comparisons to try to improve the weaknesses (i.e., the -1's) of an alternative without weakening any strength (i.e., the +1's).

Weighted Decision Method

A weighted decision matrix involves mathematical reasoning in solving single or multi attribute decision problems. Two examples of weighted decision matrices are the Weighted Sum Model (WSM) and the Weighted Product Model (WPM). WSM is probably the most widely used approach, especially in single dimensional problems (Triantaphyllou, 2000). If there are m alternatives and n criteria, then the best alternative is the one that satisfies the following expression (Fishburn, 1967):

A*_WSM-score = max_i Σ_{j=1..n} a_ij w_j , for i = 1, 2, 3, …, m

WPM shares almost the same concept as WSM. The main difference is that instead of addition there is multiplication in the model. Each alternative is compared with the others by multiplying a number of ratios, one for each criterion. Each ratio is raised to a power equivalent to the relative weight of the corresponding criterion. In general, in order to compare two alternatives A_K and A_L, the following product has to be calculated (Bridgman, 1922; Miller & Starr, 1969):

R(A_K / A_L) = Π_{j=1..n} (a_Kj / a_Lj)^(w_j)

where n is the number of criteria, a_ij is the actual value of the i-th alternative in terms of the j-th criterion, and w_j is the weight of importance of the j-th criterion. If the term R(A_K / A_L) is greater than or equal to one, it indicates that alternative A_K is more desirable than alternative A_L. The best alternative is the one that is better than or at least equal to all other alternatives.

Analytic Hierarchy Process

The Analytic Hierarchy Process (AHP) is a multi-criteria decision-making approach introduced by Saaty (1977 and 1994). The AHP has attracted the interest of many researchers mainly due to the mathematical properties of the method and the fact that the required input data are rather easy to obtain. The AHP is a decision support tool which can be used to solve complex decision problems. It uses a multi-level hierarchical structure of objectives, criteria, sub-criteria and alternatives.

Pros and Cons Analysis

Pros and Cons Analysis is a qualitative comparison method in which good things (pros) and bad things (cons) are identified about each alternative. Lists of the pros and cons, based on the input of subject matter experts, are compared one to another for each alternative. The alternative with the strongest pros and weakest cons is preferred. The decision documentation should include an exposition which justifies why the preferred alternative's pros are more important and its cons less consequential than those of the other alternatives. Pros and Cons Analysis is suitable for simple decisions with few alternatives and few discriminating criteria of approximately equal value. It requires no mathematical skill and can be implemented rapidly (Baker et al., 2002).

B. Computerized Personal Decision Aids

A number of computerized decision aids have been identified. The aids come in varying mediums such as websites, spreadsheets, software and web applications. All of the identified aids can be used to assist in personal decision making and also in other types of decision problems, such as financial and management problems. Table 3.1 summarizes eight computerized decision aids along with the reviews. The number of aids reviewed in this study is meant to be representative.
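To make the WSM and WPM scoring rules described in the preceding section concrete, the following sketch applies both to a small set of alternatives. All alternatives, scores and weights here are hypothetical, invented purely for illustration.

```python
# Illustrative sketch of the WSM and WPM rules described above.
# All alternatives, criterion scores and weights are hypothetical.

def wsm_score(scores, weights):
    """Weighted Sum Model: sum over criteria j of a_ij * w_j."""
    return sum(a * w for a, w in zip(scores, weights))

def wpm_ratio(scores_k, scores_l, weights):
    """Weighted Product Model: R(A_K/A_L) = product of (a_Kj / a_Lj) ** w_j.
    A value >= 1 means alternative K is at least as desirable as L."""
    r = 1.0
    for a_k, a_l, w in zip(scores_k, scores_l, weights):
        r *= (a_k / a_l) ** w
    return r

weights = [0.5, 0.3, 0.2]          # importance of the three criteria
alternatives = {
    "A1": [25, 20, 15],
    "A2": [10, 30, 20],
    "A3": [30, 10, 30],
}

# WSM picks the alternative with the largest weighted sum.
best = max(alternatives, key=lambda n: wsm_score(alternatives[n], weights))
print(best)  # A3 (WSM scores: A1 = 21.5, A2 = 18.0, A3 = 24.0)

# WPM confirms the pairwise preference: R(A3/A1) >= 1.
print(wpm_ratio(alternatives["A3"], alternatives["A1"], weights) >= 1)  # True
```

Note that WSM compares absolute weighted sums, while WPM only ever produces pairwise ratios, which is why the WPM winner is found by checking it against every other alternative.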
Table 3.1: Computerized decision aids

1) Hunch (2009) (www.hunch.com)
   Type: Decision engine (web)
   Method/Technique: Collective intelligence, statistical decision making, machine learning and decision trees
   Description: A decision community website. Uses machine learning based on statistical inferences (the system gets smarter as more users use it). Uses a question selection algorithm to a) find a question which will discriminate well among the remaining possible recommendation outcomes for the user, and b) look for a question which can help optimize and rank the remaining recommendation outcomes to present the user with the ones they will like the most.
   Reviews: The interactivity is intuitive but involves a series of steps (answering questions). Involves a lot of statistical analysis in the back end (very complex). Does not involve defining the importance of criteria (ranking the criteria).

2) Let Simon Decide (2009) (www.letsimondecide.com)
   Type: Decision engine (web)
   Method/Technique: Collective intelligence, weighted decision with multi-alternatives, decision analysis
   Description: Consists of three decision making tools: a. My Scores, for logical, fact based decision making; b. My Life Match, for big, life-changing decisions; c. My Points of View, for quick decisions (weighs alternatives against a proprietary profile). Combines the user's qualitative input with a weighted, mathematical formula. Enables collective learning by sharing the decision summary with others. Provides an action plan for every decision.
   Reviews: Involves a complex mathematical approach to decision making. Requires many steps, although the process is intuitive.

3) Choose It! (1999) (chooseit.sitesell.com)
   Type: Web application
   Method/Technique: Decision matrix
   Description: Online decision making tool that uses the decision matrix concept. Can be used to make important business, financial, and personal life decisions.
   Reviews: Does not acknowledge the distinct difference between subjective and objective factors.

4) Management For The Rest of Us (MFTROU.com) Decision Making Tool (n.d.) (www.mftrou.com/decision-making-tool.html)
   Type: Spreadsheet
   Method/Technique: Decision matrix
   Description: Based on the classic decision grid concept in Excel spreadsheet format, which contains: a. an overview of how to make decisions; b. a decision making example; c. a template for making your own decision.
   Reviews: Crowded text in the visual presentation. Very formal presentation (in the Excel environment).

5) Decision Oven (2008) (decisionoven.com)
   Type: Software
   Method/Technique: Decision matrix with mathematical reasoning
   Description: Off-the-shelf decision support software. Can be used to support personal or business decisions.
   Reviews: Acknowledges the difference between defining subjective criteria and objective criteria.

6) EduTools Decision Engine (2009) (ocep.edutools.info/summative/index.jsp?pj=4)
   Type: Web application
   Method/Technique: Weighted decision matrix
   Description: Uses a rational decision making process.
   Reviews: Only focuses on selecting a course management system, not on generic decisions. Users have to be familiar with the products and features that they wish to compare.

7) Career Decision Making Tool (CDMT) (n.d.) (cte.ed.gov/acrn/cdmt/tool.htm)
   Type: Instructor-led, classroom-based teaching/learning material
   Method/Technique: Guidelines and teaching/learning material
   Description: A career decision making tool. It suggests the following decision cycle: a) Engaging, b) Understanding, c) Exploring, d) Evaluating, e) Acting, f) Reflecting.
   Reviews: Only focuses on career decision making, not on generic decisions. To be implemented in an online teaching/learning environment.

8) Super Decisions (2004) (www.superdecisions.com)
   Type: Software
   Method/Technique: Analytic Network Process (ANP)
   Description: Extends the Analytic Hierarchy Process (AHP). Uses the same fundamental prioritization process based on deriving priorities through judgments on pairs of elements or from direct measurements.
   Reviews: Uses complex decision analysis with rigorous mathematical reasoning. Solves complex decision problems.

C. Theories in Modeling Decision Aid Process

Decision theory is an attempt to explain how humans make decisions, and it helps us understand the process of decision making. A grasp of the fundamentals of decision making is crucial to the effective design of the decision aid. Therefore, this study discusses a number of related theories that contribute to understanding multi criteria decision making. The related literature is summarized in Table 3.2.
Table 3.2: Literature survey of related decision theories

Multi Attribute Utility Theory: Baker et al. (2001); Alidrisi (1987); Dyer et al. (1992); Keeney & Raiffa (1993); Collins et al. (2006)
Behavioral Decision Theory: Einhorn & Hogarth (1981); Westaby (2005)
Bounded Rationality Model: Bahl & Hunt (1984); March & Simon (1958); Newell & Simon (1972)
Implicit Favorite Model: Bahl & Hunt (1984); Soelberg (1967)
Dominance Theory: Easwaran (2007); Zsambok et al. (1992)
Satisficing Theory: Zsambok et al. (1992); Simon (1956)

IV. RESEARCH METHODOLOGY

This study employed a design science approach to address the research questions posed earlier. The selection of a suitable approach is based on the nature of the research, the phases involved and the research outcomes. March and Smith (1995) described design science research as a process which aims to "produce and apply knowledge of tasks or situations in order to create effective artifacts" in order to enhance practice. In general, the process in design science research can be structured into three main phases: "problem identification", "solution design" and "evaluation". Clearly, design science research consists of a series of steps, but in practice they are not always executed in sequence; they are often performed iteratively. This study implemented the following steps, adapted from Offermann et al. (2009), and driven by the design science research approach.

A. Problem Identification

This phase is divided into the following steps: "identify problem", "literature research" and "expert interviews". It specifies a research question and verifies its practical relevance. As a result of this phase, the research questions are defined.

Identify Problem

The existence of countless computerized personal decision aids these days has triggered the interest to investigate the relevance and helpfulness of ICT assistance in personal decision making. Offermann et al. (2009) provide support for the identification of the research problem in this study, stating that researchable material "may arise from a current business problem or opportunities offered by new technology".

Literature Search

In order to identify the research problem, a literature search was used. As a summary, a number of decision strategies, decision techniques (MCDM methods), computerized personal decision aids, and decision making related theories were reviewed in this study. This resulted in strengthening the need for a solution to propose a proper decision making model for personal decisions.

Expert Interview

Interviews with experts in the related fields were conducted to identify the relevance of the addressed problems. Discussions with the experts involved brainstorming of ideas, approval of ideas and reviews of research material. Three experts were referred to during this stage and also at certain stages of this study. The experts are professors and academics specializing in one of these fields: model-based systems and qualitative reasoning; quantitative analysis; and artificial intelligence.

B. Solution Design

In the second phase, the solution is designed and proposed. After identifying the research problems and evaluating their relevance, a solution is developed in the form of artifacts. Varying methods are used to produce the artifacts, including content analysis, expert review, focus group study, participatory design, prototyping and elicitation work.

C. Evaluation

In this study, evaluation is achieved by means of case studies and laboratory experiments. The findings of this stage are further explained in the Results section.

V. DEVELOPMENT OF PERSONAL DECISION AID DESIGN MODEL (PDADM)

This section describes the process of developing the PDADM. Prior to this, an appropriate decision strategy for personal decision making needs to be identified, followed by a selection of an appropriate decision technique (i.e. MCDM method). Afterward, both are incorporated in the development of the decision aid design model. The method used in developing the PDADM involves content analysis, participatory design and expert review.

A. Decision Strategy Selection

From the literature search, two common decision strategy groups were studied: non-compensatory and compensatory. Findings indicate that non-compensatory strategies do not allow very good performance on one criterion to make up for poor performance on another. In other words, no matter how good an alternative is, if it fails on one evaluative criterion, it is eliminated from consideration. As for compensatory strategies, they allow decision makers to balance the good features of an alternative against its weaker features. Additionally, the compensatory strategies give greater accuracy in decisions, but the non-compensatory strategies take the least time to accomplish a decision.

In response to the earlier discussion, this study decided to combine the implementation of compensatory and non-compensatory strategies in order to obtain the "best of both worlds". This is supported by Ullman (2002), whose work stated that "a method that gives the accuracy of the compensatory strategy with the effort of the non-
compensatory strategy would add value to human decision making activities".

B. Decision Technique Selection

In light of the numerous decision techniques available to decision makers, a focus group study was used in order to gain some understanding of which kinds of techniques are preferred by the (non-expert) decision maker. The study also decided that introducing more than one technique would enhance the focus group's ability to understand that there is not a single right way to resolve a decision.

Five techniques were introduced to the focus group of 51 (non-expert) participants of varying demographic backgrounds: the weighted sum method (WSM), the Pugh matrix (PUG), the Analytic Hierarchy Process (AHP), pros and cons analysis (PCA), and the lexicographic method (LEX). All the methods involve defining criteria on which to compare a set of alternatives. The group was encouraged to solve the same decision scenario (choosing a laptop from 4 different brands) using each, or at least three, of the techniques mentioned above, one at a time. This study did not make it compulsory for the participants to use all the techniques, because of the varying rates of understanding of the techniques after first being introduced to them. Hence, unutilized techniques indicate respondents' difficulty in understanding and getting familiar with them.

After establishing the focus group's previous experience with each decision technique, the group was asked which technique helped the most and which they had more confidence in. Next, the group was asked which tool they think is "least prone to bias". The results from the survey are summarized for each question. The first two questions concerned (i) which technique they think helped the most if they were to use it in a real decision and (ii) which technique they had the most confidence in. As shown in Table 5.1, PUG and LEX scored among the highest number of respondents for both questions.

Table 5.1: Helpful and Confidence

                      WSM  PUG  AHP  PCA  LEX
Helpful                21   39    3   19   43
More confidence in     14   31    3   15   45

The next question asked the group which technique they felt was least prone to bias (that is, the most difficult to manipulate to achieve preconceived results). These results are shown in Table 5.2.

Table 5.2: Bias

                      WSM  PUG  AHP  PCA  LEX
Least prone to bias    34   41    2   18   22

Interestingly, even though the majority of the participants had more confidence in LEX, the score changes when it comes to the bias of the technique. More than half of them felt that PUG was less prone to bias, followed by the second highest scoring technique, the WSM. Nevertheless, the participants noted that it would take even more time and effort to achieve a decision with PUG and WSM. It is noted that AHP scored the lowest response for all three questions, which is due to the refusal of most respondents to utilize it. Evidently, from this focus group study, PUG and LEX were selected as the potential techniques to be incorporated in the design of the proposed personal decision aid design model.

PUG, or the Pugh matrix, is originally a concept selection method used by engineers for design decisions (Pugh, 1990). Since it was introduced, there have been many different modified versions of Pugh matrix analysis in various examples of its applications. In line with this, a participatory design study was conducted to learn which implementation of the Pugh matrix is preferred and suitable for the non-expert decision making style. There are five versions (see Appendix) of the Pugh matrix approach (including the original) used in this participatory design study. A total of 66 participants of varying demographic backgrounds were involved in this study.

Firstly, the participants were briefly introduced to the different implementations of the Pugh matrix method. Then, they were asked to solve a designated decision problem (choosing a laptop from four different brands) using the versions, one at a time. Later, the participants were asked ten questions (refer to Table 5.3) based on their experience using the different implementations of the Pugh matrix, and also three additional demographic questions on gender, IT skill and age.

Table 5.3: Questions asked in the participatory design study

Q1. Are you familiar with the use of the Pugh matrix?
Q2. Do you find it difficult to choose the first reference?
Q3. Do you prefer to weigh or not to weigh the criteria?
Q4. Do you prefer to use percentage (%) or scaled values (e.g. 1 to 5) as weights?
Q5. Do you prefer to use comparative symbols (+, -, S) or scaled values (e.g. 1 to 5) to rate the alternatives?
Q6. Which version of the Pugh matrix do you think is most helpful?
Q7. Which version of the Pugh matrix did you have more confidence in?
Q8. In your opinion, which version is least prone to bias?
Q9. Would you use any of these Pugh matrix approaches in your real life decisions?
Q10. Would it be easier if the Pugh matrix process were automated (i.e. in a computerized format)?

All the responses from the participants were recorded and summarized in the following tables (Tables 5.4 to 5.12). The first question dealt with the previous experience of the participants with the Pugh matrix method. As shown in Table 5.4, the majority of the participants had not used the Pugh approach before this study.
Table 5.4: Familiar with the Pugh matrix

            Yes  No  NA*
Familiar?     9  57    0
(*NA = no answer)

The next question asked about the participants' experience during the study when they were required to choose their own reference (datum) for the comparative analysis in the Pugh matrix. As shown in Table 5.5, more than half of the participants claimed that it was not a problem for them to perform that task, but the number of participants who claimed the opposite was not far behind.

Table 5.5: Difficulty choosing the first reference

            Yes  No  NA
Difficult?   24  42   0

The third and fourth questions asked about the participants' experience with the use of weights in defining the importance of each of the evaluative criteria. As shown in Table 5.6, the majority of the participants preferred to weigh their criteria during the process. From this majority group, 35 preferred weighing the criteria using scaled values rather than percentages (Table 5.7); this number represents more than half of the participants.

Table 5.6: Weighing criteria

                   Yes  No  NA
Weighing criteria   42  21   3

Table 5.7: Use percentage or scaled values for weighing

                             Percentage  Scaled Values  NA
Preferred weighing criteria          26             35   5

The fifth question asked the participants whether they prefer to use symbols (+ for better, - for worse and S for same) or scaled values to perform the comparative analysis of alternatives against the reference on each criterion. The majority agreed that the use of symbols is more convenient for the comparative analysis.

Table 5.8: Use symbols or scaled values

                            Symbols  Scaled Values  NA
Preferred evaluation style       52             12   2

The next two questions (questions 6 and 7) dealt with the participants' experience after using the Pugh approach to solve the decision problem. As shown in Table 5.9, the clearly dominant choice for both questions is the original version. The participants, as a whole, not only felt that the original version helped the most in assisting them with the decision problem, but also had more confidence in it.

Table 5.9: Helpful and confidence

                      Original  MV1  MV2  MV3  MV4  NA
Helpful                     22   11   13    7    8   5
More confidence in          21   10   14    8   10   3
(MV = modified version)

Even though the majority had more confidence in the original version, when asked which version they think is least prone to bias, the majority score shows a contrasting response. One third of the participants agreed that MV2 (modified version #2) is the one least prone to bias.

Table 5.10: Bias

                      Original  MV1  MV2  MV3  MV4  NA
Least prone to bias         15   11   22   10    4   4

Concerning the use of the Pugh approach in real decision situations, 49 of the 66 participants indicated that they would consider using this approach, 16 indicated that they would not, and one did not respond to this question (refer to Table 5.11).

Table 5.11: Will use the Pugh matrix in real situations

                                          Yes  No  NA
Will use Pugh approach in real situation?  49  16   1

Lastly, when asked whether automating the process of the Pugh matrix (in a computerized format) would make the approach easier to use, the majority answered yes. Of the 12 remaining participants who answered no, 7 claimed to have very little IT skill.

Table 5.12: Automate the Pugh matrix

                                          Yes  No  NA
Automating Pugh approach makes it easier?  54  12   0

C. Incorporating the Decision Strategy and Decision Technique in the PDADM

The results (decision strategies and techniques) obtained from the previous studies are incorporated in the development of the personal decision aid design model. The model comprises the flow of the decision process and the relationship between the input and outcome of each step of the process. Figure 5.1 illustrates this.
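The two-stage flow of the PDADM, a non-compensatory filter over objective criteria followed by a compensatory, weighted Pugh-style comparison over subjective criteria, can be sketched as follows. The item names, criteria, weights and scores are hypothetical, chosen only to show one pass through the model.

```python
# Minimal sketch (hypothetical data) of the PDADM flow: stage 1 filters
# alternatives non-compensatorily on objective criteria; stage 2 compares
# the survivors with a weighted Pugh-style matrix on subjective criteria.

def filter_stage(alternatives, required):
    """Non-compensatory: drop any alternative that fails a required
    objective criterion, no matter how strong it is elsewhere."""
    return {name: attrs for name, attrs in alternatives.items()
            if all(attrs[c] for c in required)}

def pugh_stage(alternatives, datum, weights):
    """Compensatory: weighted Pugh comparison against the datum
    (+1 better, -1 worse, 0 same on each subjective criterion)."""
    d = alternatives[datum]
    return {name: sum(w * ((v[c] > d[c]) - (v[c] < d[c]))
                      for c, w in weights.items())
            for name, v in alternatives.items()}

phones = {
    "model_x": {"has_wifi": True, "has_gps": True,  "camera": 4, "design": 3},
    "model_y": {"has_wifi": True, "has_gps": False, "camera": 5, "design": 5},
    "model_z": {"has_wifi": True, "has_gps": True,  "camera": 2, "design": 5},
}

# Stage 1: model_y fails the GPS requirement and is eliminated outright.
survivors = filter_stage(phones, ["has_wifi", "has_gps"])

# Stage 2: model_z's weaker camera is compensated by its stronger design.
result = pugh_stage(survivors, "model_x", {"camera": 1, "design": 2})
print(max(result, key=result.get))  # model_z
```

The sketch shows why the combination gives the "best of both worlds": the cheap filter removes unacceptable options quickly, and only the shortlist pays the cost of the weighted comparison.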
[Figure 5.1 depicts the PDADM flow in three stages: a pre-decision process (problem identification and definition, choosing objective and subjective decision criteria, criteria weighing); a decision process combining a non-compensatory strategy (a modified lexicographic filter that categorically reduces the set of alternatives to the acceptable ones and sets the first datum) with a compensatory strategy (evaluation of the selected alternatives using the modified Pugh's method, with change identification and a suggestion); and a post-decision process (confirmation or re-evaluation, followed by action).]
Figure 5.1: Personal Decision Aid Design Model (PDADM)
VI. IMPLEMENTING PDADM IN DIFFERENT SITUATIONS
The proposed PDADM is validated through the development of two prototypes in two different case studies: choosing a development methodology in a mobile computing course, and purchasing a mobile phone. These case studies involve two very different decision situations and are intended to showcase the flexibility and functionality of the proposed model.

A. Case Study 1: Choosing a Development Methodology in a Mobile Programming Course

Over the last decade, mobile computing has received significant interest in the academic and industrial research community. As a result, demand from the industry for graduates of mobile computing courses is rising (Gillespie, 2007). The graduates who are entering the mobile development world are expected to cope with the challenges imposed by the mobile environment. Heyes (2002) reported that mobile developers face twice as many challenges as those developing traditional system applications, due to the specific demands and technical constraints of the mobile environment. In addition, inadequate research in assisting developers with mobile development issues was also highlighted at the GI Dagstuhl Research Seminar in 2007 (König-Ries, 2009).

Within this perspective, it is believed that selecting a suitable development methodology is the key to these issues. The use of a methodology is important, as a project can be structured into small, well-defined activities where the sequence and interaction of these activities can be specified (Avison & Fitzgerald, 1990). Hence, students should be exposed to the importance of adopting a suitable methodology for a mobile development project. Selecting a suitable development methodology for a mobile development project is a challenge in itself (Bertini et al., 2006; Heikkinen & Still, 2005; Atkinson & Olla, 2004; Heyes, 2002; Afonso et al., 1998). Less experienced developers will find the task even more challenging; thus, this study seeks to propose a solution by implementing the proposed PDADM via the development of a prototype named md-Matrix (as in mobile development methodology matrix).

Features and Screenshots of md-Matrix

This decision-making tool is mainly aimed at assisting developers (especially novices) in choosing the most appropriate development methodology for a mobile development project. The number of available development methodologies in md-Matrix is meant to be representative, only for the purpose of demonstrating the decision process in selecting a mobile development methodology. The prototype of md-Matrix features the following (see Table 6.1):

Table 6.1: Features of md-Matrix

Alternatives filter: Mobile application technologies: Generic, J2ME*, Flash Lite*, Native, Web based, Object Oriented, Platform dependent
Criteria: 12 objective, 12 subjective
Alternatives: Flash Lite (4 methodologies), J2ME (4 methodologies)
Feedback: Pop-up window, on-screen text, interface agent
(* enabled in this prototype)

The first step of md-Matrix enables the user to filter the available methodologies based on the preferred technology for the development of a mobile application (Figure 6.1). In the second step (Figure 6.2), users make their selection of objective criteria to further filter the options (methodologies), following the non-compensatory strategy (lexicographic process). The three highest scoring methodologies (see Figure 6.3), which pass most of the selected criteria, are ranked accordingly, and the one in the highest rank is set as the first reference (datum). Next, the three identified methodologies from the previous step are compared to each other following the compensatory strategy (modified Pugh's method), based on the preferred subjective criteria (Figure 6.4). The steps can be iterated in a maximum of 3 cycles, where in each round the reference is changed until each methodology has been the reference once. The dominant methodology from the 3 rounds is suggested as the best selection. The following are screenshots of md-Matrix.
Figure 6.1: Alternatives filtered categorically
Figure 6.2: The 12 objective criteria used in non-compensatory (lexicographic) process
Figure 6.3: Result obtained in the non-compensatory process
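The three-round rotation that follows the ranking in Figure 6.3 can be sketched as below: each shortlisted alternative takes a turn as the datum, and the Pugh scores are totalled across rounds to find the dominant one. All names and scores here are hypothetical.

```python
# Hypothetical sketch of the three-round Pugh comparison used by the
# prototypes: each of the top-three alternatives serves as datum once,
# and the alternative that dominates across the rounds is suggested.

def pugh_round(alternatives, datum, criteria):
    """One Pugh round: +1/-1/0 per criterion against the current datum."""
    d = alternatives[datum]
    return {n: sum((v[c] > d[c]) - (v[c] < d[c]) for c in criteria)
            for n, v in alternatives.items()}

def rotate_datum(alternatives, criteria):
    """Run one Pugh round per alternative-as-datum and total the scores."""
    totals = {n: 0 for n in alternatives}
    for datum in alternatives:          # three rounds for three finalists
        for n, s in pugh_round(alternatives, datum, criteria).items():
            totals[n] += s
    return max(totals, key=totals.get), totals

top3 = {
    "alt_1": {"c1": 3, "c2": 4, "c3": 2},
    "alt_2": {"c1": 4, "c2": 3, "c3": 4},
    "alt_3": {"c1": 2, "c2": 5, "c3": 3},
}
winner, totals = rotate_datum(top3, ["c1", "c2", "c3"])
print(winner)  # alt_2 (totals: alt_1 = -2, alt_2 = 2, alt_3 = 0)
```

Rotating the datum removes the sensitivity of a single Pugh round to the choice of reference, which is why the prototypes iterate until every finalist has served as the datum once.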
Figure 6.4: The 12 subjective criteria used in compensatory process

Global Journal of Computer Science and Technology, Vol. 10, Issue 5, Ver. 1.0, July 2010

md-Matrix as a Learning Tool
Along with providing a solution to the selection of a development methodology, md-Matrix can also be utilized as an educational tool, either in academia or in industry. Learning institutions can utilize it for teaching purposes to educate students on the need to have a well-structured process of developing mobile applications. As for industry, this tool can be used as one of the materials for training new interns and apprentice developers.

B. Case study 2: Choosing a Mobile Phone
Consumers are faced with purchase decisions almost every time a purchase is required. But not all decisions are treated the same. Some decisions are more complex than others and thus require more effort by the consumer. Other decisions are fairly frequent and require little effort. Consumers will not simply go to a store or online catalog and spend their money in a rush. Purchasing usually takes place as the result of a series of decision making steps. The implication of buying behavior shows the need for a reliable decision making tool to assist consumers in making a less regretful and effective decision (Häubl & Trifts, 2000; Christ, 2008). It is also important for consumers to be able to decide on the purchased item with confidence and ease. Thus, a comprehensive and undemanding decision aid is much needed in the process. Another important aspect is the use of a decision aid in raising awareness about the consequences of actually choosing and purchasing the item. This can be obtained by organizing data with the purpose of presenting or displaying it to the decision maker (consumer) in a much clearer way than simply making a list of the alternatives. Within this perspective, the proposed PDADM is implemented to assist consumers in making a purchasing decision via the use of the prototype known as ep-Matrix (as in electronic purchasing matrix).

Features and Screenshots of ep-Matrix
The prototype (ep-Matrix) is developed to demonstrate an example of making a purchasing decision for a mobile phone. A well-known brand of mobile phone is used for three reasons: the convenience of getting all the required data, the familiarity factor among consumers, and for the purpose of evaluation later on. Table 6.2 summarizes the features of ep-Matrix developed for this case study:

Table 6.2: Features of ep-Matrix
Alternatives filter   Mobile phone styles: Bar, Slider*, Touch Screen, Folder/Flip, QWERTY
Criteria              13 objective, 9 subjective
Alternatives          Slider (6 models)
Feedback              Pop-up window, on-screen text, interface agent
* enabled in this prototype

The first step of ep-Matrix enables users to filter the available phone models based on preferred style (Figure 6.5). As they proceed to the second step (Figure 6.6), users make their selection of objective criteria to further filter the options (phone models) following the non-compensatory strategy (lexicographic process). The three highest-scored models (see Figure 6.7), which pass most of the selected criteria, will be ranked accordingly, and the one with the highest rank will be set as the first reference (datum). Next, the three identified models from the previous step will be compared to each other following the compensatory strategy (modified Pugh's method) based on preferred subjective criteria (Figure 6.8). The steps can be iterated for a maximum of 3 cycles, where in each round the reference is changed until each model has been a reference once. The dominant model from the 3 rounds will be suggested as the best selection. The following are screenshots of ep-Matrix.
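The two-stage selection process described above (a non-compensatory lexicographic filter over objective criteria, followed by compensatory comparisons against a rotating datum, as in the modified Pugh's method) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' ep-Matrix code: the phone models, criteria names, scores, and the simple pass-count scoring are hypothetical.

```python
# Illustrative sketch of the two-stage ep-Matrix selection process.
# Phone models, criteria values, and preferences below are hypothetical.

def lexicographic_filter(models, selected_criteria, top_n=3):
    """Non-compensatory step: rank models by how many of the
    user-selected objective criteria they pass; keep the top_n."""
    scored = [(sum(model[c] for c in selected_criteria), name)
              for name, model in models.items()]
    scored.sort(reverse=True)  # highest pass-count first
    return [name for _, name in scored[:top_n]]

def pugh_round(models, candidates, datum, subjective_criteria):
    """Compensatory step: score each candidate +1/0/-1 against the
    datum on each subjective criterion (modified Pugh's method)."""
    totals = {}
    for name in candidates:
        if name == datum:
            continue
        score = 0
        for c in subjective_criteria:
            diff = models[name][c] - models[datum][c]
            score += (diff > 0) - (diff < 0)  # +1 better, -1 worse, 0 same
        totals[name] = score
    return totals

# Hypothetical data: objective criteria are pass/fail (1/0),
# subjective criteria are ratings on a 1-5 scale.
models = {
    "Model A": {"camera": 1, "bluetooth": 1, "wifi": 0, "design": 4, "ease": 3},
    "Model B": {"camera": 1, "bluetooth": 0, "wifi": 1, "design": 4, "ease": 5},
    "Model C": {"camera": 1, "bluetooth": 1, "wifi": 1, "design": 2, "ease": 4},
    "Model D": {"camera": 0, "bluetooth": 0, "wifi": 1, "design": 5, "ease": 2},
}

shortlist = lexicographic_filter(models, ["camera", "bluetooth", "wifi"])

# Iterate at most 3 cycles, each shortlisted model serving as datum once;
# count how often each model dominates a round.
wins = {name: 0 for name in shortlist}
for datum in shortlist:
    totals = pugh_round(models, shortlist, datum, ["design", "ease"])
    best = max(totals, key=totals.get)
    wins[best] += 1

recommendation = max(wins, key=wins.get)
print(shortlist, recommendation)
```

The rotating datum is the key design point: because Pugh scoring is relative to the reference, letting every shortlisted model serve as the datum once removes the bias a single fixed reference would introduce.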
Figure 6.5: Alternatives filtered categorically
Figure 6.8: The 9 subjective criteria used in compensatory process (modified Pugh's method)
VII. HELPFULNESS OF PDADM DRIVEN PROTOTYPES
This study intends to investigate users' perception of the helpfulness of the PDADM driven prototypes in both case studies. In measuring helpfulness, quantitative data needs to be gathered through an instrument. In addition, subjective input through interviews and observations may help enrich the collected data. To develop the instrument for measuring helpfulness, an elicitation exercise, summarized in Figure 7.1, was performed (Ariffin, 2009).
Figure 7.1: Summary of elicitation work

Figure 7.1 illustrates the processes involved in the instrument development, beginning with elicitation work to determine measuring items until the instrument is ready for pilot testing. The instrument was constructed based on the dimensions identified from the elicitation work. Later, measuring items were added based on the reviewed literature. Some modifications were made to the measuring items, in terms of rewording some items and repositioning some items into another dimension of the instrument. In measuring the helpfulness of the PDADM driven prototypes, this study looks at four important dimensions: reliability, decision making effort, confidence, and decision process awareness. The instrument was then named Q-HELP, and it contains these four dimensions.

Figure 6.6: The 13 objective criteria used in non-compensatory (lexicographic) process
Figure 6.7: Result obtained in non-compensatory process

Table 7.1 illustrates the reliability of Q-HELP by dimension. In the evaluation, respondents are required to rate the helpfulness level on each dimension using a seven-point Likert scale: 1 = strongly disagree, 2 = disagree, 3 = somewhat disagree, 4 = undecided, 5 = somewhat agree, 6 = agree, and 7 = strongly agree. The respective measuring items can be seen in Table 7.2.

Table 7.1: Reliability of dimensions in Q-HELP
Dimension                    Cronbach Alpha value
Reliability                  0.755
Decision making effort       0.689
Confidence                   0.906
Decision process awareness   0.771
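The Cronbach alpha values reported in Table 7.1 can be reproduced from raw Likert responses with the standard formula. The sketch below is generic, not the authors' analysis code, and the response matrix is made up for illustration.

```python
import numpy as np

def cronbach_alpha(responses):
    """Cronbach's alpha for an (n_respondents x n_items) Likert matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]
    item_vars = responses.var(axis=0, ddof=1)      # per-item sample variance
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of row totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 7-point Likert responses for one dimension
# (5 respondents x 4 items).
scores = [[5, 6, 5, 6],
          [4, 4, 5, 4],
          [6, 7, 6, 6],
          [3, 4, 3, 4],
          [5, 5, 6, 5]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

Values above roughly 0.7, like those in Table 7.1, are conventionally read as acceptable internal consistency for a dimension's items.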
One hundred and seven respondents participated in the lab experiment; 63 of them evaluated the first case study, whereas 44 evaluated the second. The experiment proceeded in two steps for each case study. In the first step, participants were required to accomplish the selection task aided by other tools or materials. The main concern was to study the process that they went through before they could actually make a selection. In the second step, participants solved the same decision problem by making the selection with the
assistance of the proposed PDADM driven prototypes in each case study. Upon completion of both steps, participants were requested to answer 26 questions from all four dimensions of helpfulness in Q-HELP. The instrument recorded their perceptions and experiences of making a selection for the same decision problem in the experiment. Table 7.2 also depicts the mean responses for each item in Q-HELP answered by participants in the respective case studies.
Table 7.2: Q-HELP items and mean responses by each item for each case study
Reliability                                                                      md-Matrix (n=63)  ep-Matrix (n=44)
{name of prototype}* can be relied on to function properly.                              5.22            5.84
{name of prototype}* is suitable to my style of decision making.                         5.02            5.43
{name of prototype}* is capable of helping me in making a choice.                        5.25            5.80
{name of prototype}* provides the help that I need to make a selection.                  5.33            5.75
{name of prototype}* provides the advice that I require to make my decision.             5.08            5.64
I would use {name of prototype}* if I were attempting to make a choice that is
"good enough" but not necessarily the best.                                              4.95            5.82
{name of prototype}* is suitable even during limited time to make a decision.            5.03            5.82
Group Mean A                                                                             5.13            5.73

Decision making effort
It was very time consuming to choose a {item} from the available options.                4.81            5.39
It was very difficult to choose a {item} from the available options.                     4.43            5.27
{name of prototype}* allowed me to carefully consider the decision made.                 5.35            5.84
The decision process in {name of prototype}* is logical to me.                           5.30            6.14
The decision process in {name of prototype}* is simple to me.                            5.19            5.91
I understand how the decision process in {name of prototype}* works.                     5.17            5.70
I found it very easy to interpret the decision justification provided by
{name of prototype}*.                                                                    5.06            5.77
Group Mean B                                                                             5.04            5.72

Confidence
I am satisfied with the recommended solution.                                            5.27            5.75
The recommended solution reflects my initial preferences.                                5.16            5.61
I am confident that I am able to make a selection with {name of prototype}*.             5.17            5.86
I am confident that I can justify the selection that I made with
{name of prototype}*.                                                                    5.17            5.93
I feel that the problem in making a selection is solved.                                 5.05            5.45
I am very pleased with my experience using {name of prototype}*.                         5.48            5.77
Group Mean C                                                                             5.22            5.73

Decision process awareness
{name of prototype}* makes me realize I cannot get everything from just one
alternative.                                                                             5.44            5.93
{name of prototype}* is an aid for me in clarifying what I want.                         5.27            5.84
{name of prototype}* shows my subconscious decision process.                             5.11            5.73
{name of prototype}* helps me not to be easily influenced by others in making
a selection.                                                                             5.29            5.98
{name of prototype}* makes me more independent of others in making a selection.          5.22            6.00
I learned a lot about the problem using {name of prototype}*.                            5.48            6.00
Group Mean D                                                                             5.30            5.91

* replaced with md-Matrix or ep-Matrix based on the respective case study

VIII. RESULTS
As mentioned earlier, the instrument used in evaluating the helpfulness of the PDADM driven prototypes looks at four important dimensions: reliability, decision making effort, confidence, and decision process awareness. Table 7.2 presents the means of responses to the items measuring the helpfulness of the prototypes in both case studies.
Questions A1 to A7 are used to assess the users' perceptions of the reliability of the prototypes. For case study 1, the group mean score of the items in dimension A was 5.13, indicating a moderately high perception of reliability. In case study 2, the group mean score of the same items was 5.73, indicating a high level of reliability.
Questions B1 to B7 are used to assess the users' perceptions of the effort invested in the decision making process with the assistance of the PDADM driven prototypes. For case study 1, the group mean score for the items in dimension B was 5.04, signifying a moderately high perception of decision making effort among respondents. As for case study 2, the group mean score of the same items was 5.72, indicating a high perception of the decision making effort.
Questions C1 to C6 are used to assess the confidence level of respondents in the solution and procedure applied in the decision aids. In case study 1, the group mean score was
5.22, representing a moderate confidence level among respondents. As for the second case study, the group mean score was 5.73, indicating a higher confidence level among respondents after using the PDADM driven prototypes.
For the last dimension of the instrument, six items (D1 to D6) were asked of the respondents in order to measure their perception of decision process awareness. In case study 1, the group mean score of the last six items in Q-HELP was 5.30, representing a moderate perception score on decision process awareness among respondents. For case study 2, the group mean score was 5.91, signifying a high perception score on decision process awareness.
From the analysis above, and as summarized in Figure 6.9, the mean scores of each dimension generally fall under the category moderately high or high, indicating that participants were inclined to perceive the use of the PDADM driven prototypes as helpful even in different personal decision situations. In both prototypes, participants rated decision process awareness highest, followed by their perceived confidence and reliability in the decision aids. Upon further analysis, participants responded highly on the items under reliability and confidence, as depicted in Figures 6.10 and 6.11. Therefore, it can be concluded that both decision aids:
i. provide the help that participants needed to make a selection,
ii. can be relied on to function properly, and
iii. are capable of helping participants in making a choice.
Also, the participants were:
i. very pleased with their experience using the decision aids,
ii. confident that they can justify the selections made with the decision aids, and
iii. satisfied with the recommended solution.

Figure 6.9: Group means for helpfulness dimensions
Figure 6.10: Perceived reliability of md-Matrix and ep-Matrix
Figure 6.11: Perceived confidence in md-Matrix and ep-Matrix

IX. CONCLUSION
Despite the existence of various computerized decision aids, decision maker perceptions of the ideal decision strategy and technique have not been subjected to systematic investigation. This study therefore seeks to contribute the following, along with achieving the previously stated objectives:
i. In general, this study contributes to the decision making area as well as to cross-disciplinary areas related to the decision situation.
ii. A proposed decision making model for personal decisions with emphasis on non-expert use.
iii. Two prototypes which utilize the proposed decision model in two different situations: a purchasing decision and an educational decision.
iv. Algorithms of the developed prototypes.
v. Instruments to measure users' perceived helpfulness of the prototypes.
vi. A comparative analysis of five decision strategies, which provides a research basis for related future studies.
X. REFERENCES
1) Adam, F. and Humphreys, P. (2008). Encyclopedia of Decision Making and Decision Support Technologies. Idea Group Inc.
2) Alidrisi, M.M. (1987). Use of multi attribute utility theory for personal decision making. International Journal of Systems Science, 18(12), 2229-2237.
3) Al-Shemmeri, T., Al-Kloub, B. and Pearman, A. (1997). Model Choice in Multicriteria Decision Aid. European Journal of Operational Research, 97, 550-560.
4) Afonso, A.P., Regateiro, F.S., and Silva, M.J. (1998). Dynamic Channels: A New Development Methodology for Mobile Computing Applications. Retrieved Jan 22, 2007, from http://www.di.fc.ul.pt/biblioteca/tech-reports.
5) Ariffin, A.M. (2009). Conceptual Design Model of Reality Learning Media (RLM): Towards Entertaining and Fun Electronic Learning Materials (eLM) (Ph.D. Dissertation, Universiti Utara Malaysia).
6) Arsham, H. (2004). Decision Making: Overcoming Serious Indecisiveness. Retrieved March 10, 2009 from http://home.ubalt.edu/ntsbarsh/opre640/partXIII.htm.
7) Atkinson, C. and Olla, P. (2004). Developing a wireless reference model for interpreting complexity in wireless projects. Industrial Management & Data Systems, 104, 262-272.
8) Avison, D.E. and Fitzgerald, G. (1990). Information Systems Development: Methodologies, Techniques and Tools. London: Blackwell.
9) Bahl, H.C. and Hunt, R.G. (1984). Decision-Making Theory and DSS Design. Data Base, 15(4), 10-14.
10) Baker, D., Bridges, D., Hunter, R., Johnson, G., Krupa, J., Murphy, J. and Sorenson, K. (2002). Guidebook to Decision-Making Methods, WSRC-IM-2002-00002. Retrieved from Department of Energy, USA website: http://emi-web.inel.gov/Nissmg/Guidebook_2002.pdf.
11) Bell, D.E., Raiffa, H., and Tversky, A. (1988). Descriptive, normative, and prescriptive interactions in decision making. In D. Bell, H. Raiffa, and A. Tversky (Eds.), Decision making: descriptive, normative, and prescriptive interactions (pp. 9-32). Cambridge: Cambridge University Press.
12) Bertini, E., Gabrielli, S., and Kimani, S. (2006). Appropriating and Assessing Heuristics for Mobile Computing. Proceedings of the Working Conference on Advanced Visual Interfaces AVI'06, Venezia, Italy. 119-126.
13) Bridgman, P.W. (1922). Dimensional analysis. New Haven, CT: Yale University Press.
14) Brown, R. (2008). Decision Aiding Research Needs. In Adam, F. and Humphreys, P. (Eds.), Encyclopedia of Decision Making and Decision Support Technologies (pp. 141-147). IGI Global.
15) Bronner, F. & de Hoog, R. (1982). Non-Expert Use of a Computerized Decision Aid. In Humphreys, P., Svenson, O. and Vári, A. (Eds.), Analysing and Aiding Decision Processes (pp. 281-299). Amsterdam: North Holland.
16) Christ, P. (2008). KnowThis: Marketing Basics. KnowThis Media.
17) Collins, T.R., Rossetti, M.D., Nachtmann, H.L. & Oldham, J.R. (2006). The use of multi-attribute utility theory to determine the overall best-in-class performer in a benchmarking study. Benchmarking: An International Journal, 13, 431-446.
18) Cosier, R.A. and Dalton, D.R. (1986). The Appropriate Choice and Implementation of Decision Strategies. Journal of Industrial Management & Data Systems, 86(3/4), 18-21. Abstract retrieved from http://www.emeraldinsight.com/10.1108/eb057436.
19) Dyer, J.S., Fishburn, P.C., Steuer, R.E., Wallenius, J. and Zionts, S. (1992). Multiple Criteria Decision Making, Multiattribute Utility Theory: The Next Ten Years. Management Science, 38(5), 645-654.
20) Easwaran, K. (2009). Dominance-based Decision Theory. Unpublished manuscript. Retrieved from http://www.ocf.berkeley.edu/~easwaran/papers/decision.pdf.
21) Einhorn, H.J. and Hogarth, R.M. (1981). Behavioral Decision Theory: Processes of Judgment and Choice. Annual Review of Psychology, 32, 53-88.
22) Fishburn, P.C. (1967). Additive Utilities with Incomplete Product Set: Applications to Priorities and Assignments. Operations Research Society of America (ORSA), Baltimore, MD, U.S.A.
23) Gillespie, M. (2007). Resource Guide for the UMPC Software Developer. Intel.com.
24) Häubl, G. and Trifts, V. (2000). Consumer Decision Making in Online Shopping Environments: The Effects of Interactive Decision Aids. Marketing Science, 19(1), 4-21.
25) Hayes, C.C. & Akhavi, F. (2008). Creating Effective Decision Aids for Complex Tasks. Journal of Usability Studies, 3(4), 152-172.
26) Heikkinen, M.T. and Still, J. (2005). Business Networks and New Mobile Service Development. Proceedings of the International Conference on Mobile Business (ICMB'05). 144-151.
27) Heyes, I.S. (2002). Just Enough Wireless Computing. Upper Saddle River, NJ: Prentice Hall.
28) Işıklar, G. and Büyüközkan, G. (2007). Using a multi-criteria decision making approach to evaluate mobile phone alternatives. Computer Standards & Interfaces, 29, 265-274.
29) Jungermann, H. (1980). Speculations about Decision Theoretic Aids for Personal Decision Making. Acta Psychologica, 45, 7-34. North Holland.
30) Keeney, R. and Raiffa, H. (1993). Decisions with Multiple Objectives: Preference and Value Tradeoffs. Cambridge: Cambridge University Press.
31) König-Ries, B. (2009). Challenges in Mobile Application Development. it - Information Technology, 51(2), 69-71.
32) Law, W.S. (1996). Evaluating imprecision in engineering design (Ph.D. Dissertation, California Institute of Technology, Pasadena, California).
33) Linkov, I., Varghese, A., Jamil, S., Seager, T.P., Kiker, G. and Bridges, T. (2004). Multi-criteria decision analysis: A framework for structuring remedial decisions at contaminated sites. In Linkov, I. and Ramadan, A.B. (Eds.), Comparative Risk Assessment and Environmental Decision Making (pp. 15-54). New York: Springer.
34) March, S.T. and Smith, G. (1995). Design and Natural Science Research on Information Technology. Decision Support Systems, 15(4), 251-266.
35) McGuire, R. (2002). Decision Making. The Pharmaceutical Journal, 269, 647-649.
36) Miller, D.W. & Starr, M.K. (1969). Executive decisions and operations research. Englewood Cliffs, NJ: Prentice-Hall, Inc.
37) Naude, P., Lockett, G. and Holms, K. (1997). A Case Study of Strategic Engineering Decision Making Using Judgmental Modeling and Psychological Profiling. Transactions on Engineering Management, 44(3), 237-247.
38) Offermann, P., Levina, O., Schonherr, M. and Bub, U. (2009). Outline of a Design Science Research Process. Proceedings of DESRIST'09, Malvern, PA, USA.
39) Payne, J., Bettman, J. and Johnson, E. (1993). The Adaptive Decision Maker. Cambridge University Press.
40) Power, D.J. (1998). Designing and Developing a Computerized Decision Aid - A Case Study. Retrieved December 10, 2009 from http://dssresources.com/papers/decisionaids.html.
41) Pugh, S. (1990). Total Design: Integrated Methods for Successful Product Engineering. Great Britain: Addison Wesley.
42) Rich, P. (1999). A Process for Effective Decision Making. Retrieved 5 April 2009 from http://www.selfhelpmagazine.com/article/decision-making.
43) Saaty, T.L. (1977). A Scaling Method for Priorities in Hierarchical Structures. Journal of Mathematical Psychology, 15, 57-68.
44) Saaty, T.L. (1994). Fundamentals of Decision Making and Priority Theory with the AHP. Pittsburgh, PA: RWS Publications.
45) Simon, H.A. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129-138.
46) Soelberg, P.O. (1967). Unprogrammed Decision Making. Industrial Management Review, 8, 19-29.
47) Todd, P. & Benbasat, I. (1991). An Experimental Investigation of the Impact of Computer Based Decision Aids on Decision Making Strategies. Information Systems Research, 2(2), 87-115.
48) Triantaphyllou, E. (2000). Multi-Criteria Decision Making Methods: A Comparative Study. Norwell, MA: Springer.
49) Ullman, D.G. (2002). The Ideal Engineering Decision Support System. Retrieved March 10, 2009 from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.1827&rep=rep1&type=pdf.
50) Westaby, J.D. (2005). Behavioral reasoning theory: Identifying new linkages underlying intentions and behavior. Organizational Behavior and Human Decision Processes, 98, 97-120.
51) Wooler, S. (1982). A Decision Aid for Structuring and Evaluating Career Choice Options. Journal of the Operational Research Society, 33(4), 343-351.
52) Zanakis, S.H., Solomon, A., Wishart, N. and Dublish, S. (1998). Multi-attribute decision making: A simulation comparison of select methods. European Journal of Operational Research, 107, 507-529.
53) Zannier, C., Chaisson, M. and Maurer, F. (2007). A model of design decision making based on empirical results of interviews with software designers. Information and Software Technology, 49, 637-653.
54) Zsambok, C.E., Beach, L.R. & Klein, G. (1992). A Literature Review of Analytical and Naturalistic Decision Making. Final technical report. Fairborn, OH: Klein Associates Inc.

XI. APPENDIX

Original Pugh Matrix
Modified Pugh Matrix #1 (MV1)
Modified Pugh Matrix #2 (MV2)
Modified Pugh Matrix #3 (MV3)
Modified Pugh Matrix #4 (MV4)
Security Provision For Miners Data Using Singular Value Decomposition In Privacy Preserving Data Mining
Narendar Machha1, M.Y. Babu2
Abstract- Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for government statistical agencies. Recent advances in data mining and machine learning algorithms have increased the disclosure risks that one may encounter when releasing data to outside parties. This brings out a new branch of data mining, known as Privacy Preserving Data Mining (PPDM). Privacy preservation is a major concern in the application of data mining techniques to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component to preserve privacy in security-related data mining applications; we propose a Singular Value Decomposition (SVD) method for data distortion. We focus primarily on privacy preserving data clustering. Our proposed Singular Value Decomposition (SVD) method distorts only confidential numerical attributes to meet privacy requirements.
Keywords- Privacy-Preserving Data Mining, Matrix Decomposition, Singular Value Decomposition, Nonnegative Matrix Factorization, data distortion, data utility.

About-1: Assistant Professor, HITS College of Engineering (e-mail: [email protected])
About-2: Assistant Professor, Aurora Engineering College ([email protected])

I. INTRODUCTION
Data mining technologies have now been used in commercial, industrial, and governmental businesses, for various purposes ranging from increasing profitability to enhancing national security. The widespread application of data mining technologies has raised concerns about the trade secrecy of corporations and the privacy of innocent people contained in the datasets collected and used for the data mining purpose. It is necessary that data mining technologies designed for knowledge discovery across corporations and for security purposes towards the general population have sufficient privacy awareness to protect corporate trade secrecy and individual private information. Unfortunately, most standard data mining algorithms are not very efficient in terms of privacy protection, as they were originally developed mainly for commercial applications, in which different organizations collect and own their private databases, and mine their private databases for specific commercial purposes. In the cases of inter-corporation and security data mining applications, data mining algorithms may be applied to datasets containing sensitive or private information. Data warehouses and government agencies may potentially have access to many databases collected from different sources and may extract any information from these databases. This potentially unlimited access to data and information raises the fear of possible abuse and promotes the call for privacy protection and due process of law. Privacy-preserving data mining techniques have been developed to address these concerns. The general goal of privacy-preserving data mining techniques is to hide sensitive individual data values from the outside world or from unauthorized persons, and simultaneously preserve the underlying data patterns and semantics so that a valid and efficient decision model based on the distorted data can be constructed. In the best scenarios, this new decision model should be equivalent to, or even better than, the model using the original data from the viewpoint of decision accuracy. There are currently at least two broad classes of approaches to achieving this goal. The first class of approaches attempts to distort the original data values so that the data miners (analysts) have no means (or greatly reduced ability) to derive the original values of the data. The second is to modify the data mining algorithms so that they allow data mining operations on distributed datasets without knowing the exact values of the data or without directly accessing the original datasets. This paper only discusses the first class of approaches. Interested readers may consult (Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., and Zhu, M., 2003) and the references therein for discussions on distributed data mining approaches.

II. BACKGROUND
The input to a data mining algorithm in many cases can be represented by a vector-space model, where a collection of records or objects is encoded as an n × m object-attribute matrix (Frankes & Baeza-Yates, 1992). For example, the set of vocabulary (words or terms) in a dictionary can be the items forming the rows of the matrix, and the occurrence frequencies of all terms in a document are listed in a column of the matrix. A collection of documents thus forms a term-document matrix commonly used in information retrieval. In the context of privacy preserving data mining, each column of the data matrix can contain the attributes of a person, such as the person's name, income, social security number, address, telephone number, medical records, etc. Datasets of interest often lead to a very high dimensional matrix representation (Achlioptas, 2004). It is observable that many real-world datasets have nonnegative values for attributes. In fact, many of the existing data distortion methods inevitably fall into the context of matrix computation. For instance, the additive noise method, which has the longest history in the privacy protection area and adds random noise to the data, can be viewed as a random matrix, and therefore its properties can be understood by studying the properties of random matrices (Kargupta, Sivakumar, & Ghosh, 2002; Mehta, 1991). Matrix decomposition in numerical linear algebra typically serves the purpose of finding a computationally convenient means to obtain a solution to a linear system. In the context of data mining, the main purpose of matrix decomposition is to obtain some form of simplified low-rank approximation to the original dataset for understanding the structure of the data, particularly the relationships within the objects, within the attributes, and how the objects relate to the attributes (Hubert, Meulman, & Heiser, 2000). The study of matrix decomposition techniques in data mining, particularly in text mining, is not new, but the application of these techniques as data distortion methods in privacy-preserving data mining is a recent interest (Xu, Zhang, Han, & Wang, 2005). A unique characteristic of the matrix decomposition techniques, a compact representation with reduced rank while preserving dominant data patterns, stimulates researchers' interest in utilizing them to achieve a win-win task of both a high degree of privacy preservation and a high level of data mining accuracy.

III. MAIN FOCUS
Data distortion is one of the most important parts of many privacy-preserving data mining tasks. The desired distortion methods must preserve data privacy and, at the same time, must keep the utility of the data after the distortion (Verykios, Bertino, Fovino, Provenza, Saygin, & Theodoridis, 2004). The classical data distortion methods are based on random value perturbation (Agrawal & Srikant, 2000). The more recent ones are based on data matrix decomposition strategies (Wang, Zhong, & Zhang, 2006; Wang, Zhang, Zhong, & Xu, 2007; Xu, Zhang, Han, & Wang, 2006).

IV. UNIFORMLY DISTRIBUTED NOISE
The original data matrix A is added with a uniformly distributed noise matrix Eu. Here Eu is of the same dimension as that of A, and its elements are random numbers generated from a continuous uniform distribution on the interval from C1 to C2. The distorted data matrix Au is denoted as: Au = A + Eu.

V. NORMALLY DISTRIBUTED NOISE
Similar to the previous method, here the original data matrix A is added with a normally distributed noise matrix En, which has the same dimension as that of A. The elements of En are random numbers generated from the normal distribution with mean μ and standard deviation ρ. The distorted data matrix An is denoted as: An = A + En.

VI. SINGULAR VALUE DECOMPOSITION
Singular Value Decomposition (SVD) is a popular matrix factorization method in data mining and information retrieval. It has been used to reduce the dimensionality of (and remove the noise in) noisy datasets in practice (Berry, Drmac, & Jessup, 1999). The use of the SVD technique in data distortion is proposed in (Xu, Zhang, Han, & Wang, 2005). In (Wang, Zhang, Zhong, & Xu, 2007), the SVD technique is used to distort portions of the datasets.
The SVD of the data matrix A is written as

A = U Σ V^T

where U is an n × n orthonormal matrix, Σ = diag[σ1, σ2, ..., σs] (s = min{m, n}) is an n × m diagonal matrix whose nonnegative diagonal entries (the singular values) are in descending order, and V^T is an m × m orthonormal matrix. The number of nonzero diagonal entries of Σ is equal to the rank of the matrix A.
Due to the arrangement of the singular values in the matrix Σ (in descending order), the SVD transformation has the property that the maximum variation among the objects is captured in the first dimension, as σ1 ≥ σi for i ≥ 2. Similarly, much of the remaining variation is captured in the second dimension, and so on. Thus, a transformed matrix with a much lower dimension can be constructed to represent the structure of the original matrix faithfully. Define

Ak = Uk Σk Vk^T

where Uk contains the first k columns of U, Σk contains the first k nonzero singular values, and Vk^T contains the first k rows of V^T. The rank of the matrix Ak is k. With k usually being small, the dimensionality of the dataset has been reduced dramatically from min{m, n} to k (assuming all attributes are linearly independent). It has been proved that Ak is the best k-dimensional approximation of A in the sense of the Frobenius norm.
In data mining applications, the use of Ak to represent A has another important implication. The removed part Ek = A - Ak can be considered as the noise in the original dataset (Xu, Zhang, Han, & Wang, 2006). Thus, in many situations, mining on the reduced dataset Ak may yield better results than mining on the original dataset A. When used for the privacy-preserving purpose, the distorted dataset Ak can provide protection for data privacy; at the same time, it keeps the utility of the original data, as it can faithfully represent the original data structure.

VII. NONNEGATIVE MATRIX FACTORIZATION
Given an n × m nonnegative matrix dataset A with Aij ≥ 0 and a prespecified positive integer k ≤ min{n, m}, the nonnegative matrix factorization (NMF) finds two nonnegative matrices W ∈
Global Journal of Computer Science and Technology, Vol. 10, Issue 5, Ver. 1.0, July 2010 | Page 83
R^(k x m) with Hij ≥ 0, such that A ≈ WH and the objective function ||A - WH||F is minimized. Here ||.||F is the Frobenius norm. The matrices W and H may have many other desirable properties in data mining applications. Several algorithms to compute nonnegative matrix factorizations for applications of practical interest are proposed in (Lee & Seung, 1999; Pascual-Montano, Carazo, Kochi, Lehmann, & Pascual-Marqui, 2006). Some of these algorithms are modified in (Wang, Zhong, & Zhang, 2006) to compute nonnegative matrix factorizations for enabling privacy preservation in datasets for data mining applications. Similar to the sparsified SVD techniques, sparsification techniques can be used to drop small-size entries from the computed matrix factors to further distort the data values (Wang, Zhong, & Zhang, 2006).

In text mining, NMF has an advantage over SVD in the sense that if the data values are nonnegative in the original dataset, NMF maintains their nonnegativity, but SVD does not. The nonnegativity constraints can lead to a parts-based representation because they allow only additive, not subtractive, combinations of the original basis vectors (Lee & Seung, 1999). Thus, dataset values from NMF have some meaningful interpretations in the original sense. On the contrary, data values from SVD are no longer guaranteed to be nonnegative, and there is no obvious meaning for the negative values in the SVD matrices. In the context of privacy preservation, on the other hand, the negative values in the dataset may actually be an advantage, as they further obscure the properties of the original datasets.

VIII. UTILITY OF THE DISTORTED DATA

Experimental results obtained in (Wang, Zhang, Zhong, & Xu, 2007; Wang, Zhong, & Zhang, 2006; Xu, Zhang, Han, & Wang, 2006; Xu, Zhang, Han, & Wang, 2005), using both synthetic and real-world datasets with a classification algorithm, show that both SVD and NMF techniques provide a much higher degree of data distortion than the standard data distortion techniques based on adding uniformly distributed or normally distributed noise. In terms of the accuracy of the data mining algorithm, techniques based on adding uniformly or normally distributed noise sometimes degrade the accuracy of the classification results, compared with applying the algorithm on the original, undistorted datasets. On the other hand, both SVD and NMF techniques can generate distorted datasets that are able to yield better classification results than applying the algorithm directly on the original, undistorted datasets. This is surprising, as we intuitively expect data mining algorithms applied on distorted datasets to produce less accurate results than when applied on the original datasets. It is not clear why the distorted data from SVD and NMF are better for the classification algorithm used to obtain the experimental results. The hypothesis is that both SVD and NMF may have some functionality to remove the noise from the original datasets by removing small-size matrix entries. Thus, the distorted datasets from SVD and NMF look like "cleaned" datasets. The distorted datasets from the techniques based on adding either uniformly distributed or normally distributed noise do not have this property; they actually generate "noisy" datasets in order to distort the data values.

IX. FUTURE TRENDS

Using matrix decomposition-based techniques in data distortion for privacy-preserving data mining is a relatively new trend. This class of privacy-preserving approaches has many desirable advantages over the more standard privacy-preserving data mining approaches, yet there are a lot of unanswered questions in this new research direction. For example, a classical problem in SVD-based dimensionality reduction techniques is to determine the optimal rank of the reduced dataset matrix. Although in data distortion applications the rank of the reduced matrix does not seem to sensitively affect the degree of the data distortion or the level of the accuracy of the data mining results (Wang, Zhang, Zhong, & Xu, 2007), it is still of both practical and theoretical interest to be able to choose a good rank size for the reduced data matrix. Unlike the data distortion techniques based on adding either uniformly distributed or normally distributed noise, SVD and NMF do not maintain some statistical properties of the original datasets, such as the mean of the data attributes. Such statistical properties may or may not be important in certain data mining applications. It would be desirable to design matrix decomposition-based data distortion techniques that maintain these statistical properties. The SVD and NMF data distortion techniques have been used with support vector machine based classification algorithms (Xu, Zhang, Han, & Wang, 2006). It is not clear whether they are equally applicable to other data mining algorithms. It is certainly of interest for the research community to experiment with these data distortion techniques on other data mining algorithms. There is also a need to develop techniques to quantify the level of data privacy preserved in the data distortion process. Although some measures for data distortion and data utility are defined in (Xu, Zhang, Han, & Wang, 2006), they are not directly related to the concept of privacy preservation in datasets.

X. CONCLUSION

We have presented two classes of matrix decomposition-based techniques for data distortion to achieve privacy preservation in data mining applications. These techniques are based on matrix factorization techniques commonly practiced in matrix computation and numerical linear algebra. Although their application in text mining is not new, their application in data distortion for privacy-preserving data mining is a recent attempt. Previous experimental results have demonstrated that these data distortion techniques are highly effective for high-accuracy privacy protection, in the sense that they can provide a high degree of data distortion and maintain a high level of data utility with respect to the data mining algorithms. The computational methods for SVD and NMF are well developed in the matrix computation community. Very efficient software packages are available, either in standard matrix computation packages such as MATLAB or from several websites maintained by individual researchers. The availability of these software packages greatly accelerates the application of these and other matrix decomposition and factorization techniques in data mining and other application areas.
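As an illustrative wrap-up, the four distortion schemes discussed above (uniform noise, normal noise, rank-k SVD, and NMF) can be sketched in a few lines of NumPy. This is a minimal sketch, not the implementation from the cited papers: the toy matrix size, the noise interval [C1, C2], the mean and deviation, the rank k, the iteration count, and the sparsification threshold are all our own arbitrary choices. The NMF factors are computed with the Lee-Seung multiplicative update rule cited in Section VII.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(1.0, 10.0, size=(20, 8))   # toy nonnegative dataset: 20 objects, 8 attributes

# IV. Uniformly distributed noise on [C1, C2]
C1, C2 = -0.5, 0.5
A_u = A + rng.uniform(C1, C2, size=A.shape)

# V. Normally distributed noise with mean mu and standard deviation sigma
mu, sigma = 0.0, 0.3
A_n = A + rng.normal(mu, sigma, size=A.shape)

# VI. Rank-k SVD distortion: A_k = U_k Sigma_k V_k^T
k = 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # singular values come back in descending order
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# VII. NMF distortion (A ~= W H) via Lee-Seung multiplicative updates
W = rng.uniform(0.1, 1.0, size=(A.shape[0], k))
H = rng.uniform(0.1, 1.0, size=(k, A.shape[1]))
for _ in range(500):
    H *= (W.T @ A) / (W.T @ W @ H + 1e-9)
    W *= (A @ H.T) / (W @ H @ H.T + 1e-9)
W[W < 1e-3] = 0.0        # optional sparsification: drop small-size factor entries
A_nmf = W @ H            # stays nonnegative, unlike the SVD reconstruction

for name, D in [("uniform", A_u), ("normal", A_n), ("SVD", A_k), ("NMF", A_nmf)]:
    print(name, "Frobenius distortion:", np.linalg.norm(A - D))
```

By the Eckart-Young theorem, A_k achieves the smallest Frobenius distortion among all rank-k matrices, so the NMF reconstruction (also rank at most k) can never beat it on that measure; the trade-off NMF buys instead is nonnegativity of the distorted values.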
XI. REFERENCES
1) Agrawal, R., & Srikant, R. (2000). Privacy-preserving data mining. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 439-450, Dallas, TX.
2) Berry, M. W., Drmac, Z., & Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41, 335-362.
3) Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., & Zhu, M. (2003). Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations, 4(2), 1-7.
4) Gao, J., & Zhang, J. (2003). Sparsification strategies in latent semantic indexing. Proceedings of the 2003 Text Mining Workshop, pp. 93-103, San Francisco, CA.
5) Hubert, L., Meulman, J., & Heiser, W. (2000). Two purposes for matrix factorization: a historical appraisal. SIAM Review, 42(4), 68-82.
6) Kargupta, H., Sivakumar, K., & Ghosh, S. (2002). Dependency detection in MobiMine and random matrices. Proceedings of the 6th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 250-262, Helsinki, Finland.
7) Lee, D. D., & Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401, 788-791.
8) Mehta, M. L. (1991). Random Matrices, 2nd edition. Academic Press, London.
9) Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., & Pascual-Marqui, R. D. (2006). Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 403-415.
10) Verykios, V. S., Bertino, E., Fovino, I. N., Provenza, L. P., Saygin, Y., & Theodoridis, Y. (2004). State-of-the-art in privacy preserving data mining. ACM SIGMOD Record, 33(1), 50-57.
11) Wang, J., Zhong, W. J., & Zhang, J. (2006). NNMF-based factorization techniques for high-accuracy privacy protection on non-negative-valued datasets. Proceedings of the IEEE International Conference on Data Mining, Workshop on Privacy Aspects of Data Mining (PADM 2006), pp. 513-517, Hong Kong, China.
12) Xu, S., Zhang, J., Han, D., & Wang, J. (2006). Singular value decomposition based data distortion strategy for privacy protection. Knowledge and Information Systems, 10(3), 383-397.
An Efficient Synchronous Checkpointing Protocol for Mobile Distributed Systems
Parveen Kumar¹, Rachit Garg²
Abstract- Recent years have witnessed the rapid development of mobile communications, which have become part of everyday life for most people. For transparently adding fault tolerance to mobile distributed systems, minimum-process coordinated checkpointing is preferable, but it may require blocking of processes, extra synchronization messages, or taking some useless checkpoints. All-process checkpointing may lead to exceedingly high checkpointing overhead. In order to balance the checkpointing overhead and the loss of computation on recovery, we propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has been executed a fixed number of times. In the minimum-process coordinated checkpointing algorithm, an effort has been made to optimize the number of useless checkpoints and the blocking of processes, using a probabilistic approach and by computing an interacting set of processes at the beginning. We try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others, and we reduce the size of the checkpoint sequence number piggybacked on each computation message.

I. BACKGROUND

Recent years have witnessed the rapid development of mobile communications, which have become part of everyday life for most people. In the future, we expect that more and more people will use portable units such as notebooks or personal digital assistants. With the increasing use of small portable computers, wireless networks, and satellites, a trend to support "computing on the move" has emerged. This trend is known as mobile computing, or "anytime, anywhere" computing. It enables users to access and exchange information while they travel, roam in their home environments, or work at their desktop computers. Mobile Hosts (MHs) are increasingly common in distributed systems due to their availability, cost, and mobile connectivity. An MH is a computer that may retain its connectivity with the rest of the distributed system through a wireless network while on the move. An MH communicates with the other nodes of the distributed system via a special node called a mobile support station (MSS). A "cell" is the geographical area around an MSS in which it can support an MH. An MSS has both wired and wireless links and acts as an interface between the static network and a part of the mobile network. Static nodes are connected by a high-speed wired network [1].

A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. The state of the channels corresponding to a global state is the set of messages sent but not yet received. A global state is said to be "consistent" if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost [5]. To recover from a failure, the system restarts its execution from a previous consistent global state saved on stable storage during fault-free execution. This saves all the computation done up to the last checkpointed state, and only the computation done thereafter needs to be redone.

In independent checkpointing, processes do not synchronize their checkpointing activity and are allowed to record their local checkpoints in an independent way. After a failure, the system searches for a consistent global state by tracking the dependencies from stable storage. The main advantage of this approach is that there is no need to exchange any control messages during checkpointing. But it requires each process to keep several checkpoints in stable storage, and there is no certainty that a globally consistent state can be built; it may require cascaded rollbacks that lead back to the initial state due to the domino effect [6]. Acharya and Badrinath [1] were the first to present an uncoordinated checkpointing algorithm for mobile computing systems. In their algorithm, an MH takes a local checkpoint whenever a message reception is preceded by a message sent at that MH. If the sends and receives of messages are interleaved, the number of local checkpoints will be equal to half the number of computation messages, which may degrade system performance.

In coordinated or synchronous checkpointing, processes take checkpoints in such a manner that the resulting global state is consistent. Mostly it follows the two-phase commit structure [2], [5], [6], [7], [10], [15]. In the first phase, processes take tentative checkpoints, and in the second phase, these are made permanent. The main advantage is that only one permanent checkpoint and at most one tentative checkpoint are required to be stored. In the case of a fault, processes roll back to the last checkpointed state [6]. The Chandy-Lamport algorithm [5] is the earliest non-blocking all-process coordinated checkpointing algorithm.

The existence of mobile nodes in a distributed system introduces new issues that need proper handling while designing a checkpointing algorithm for such systems [1], [4], [14], [16]. These issues include mobility, disconnections, a finite power source, vulnerability to physical damage, and lack of stable storage. Prakash and Singhal [14] proposed a nonblocking minimum-process coordinated checkpointing protocol for mobile distributed systems. They proposed that a good checkpointing protocol for mobile distributed systems should impose low overheads on MHs and wireless channels, and should avoid awakening an MH in doze-mode operation. The disconnection of an MH should not lead to an infinite wait state. The algorithm should be non-intrusive and should force a minimum number of processes to take their local checkpoints. In minimum-process coordinated checkpointing algorithms, either some blocking of the processes takes place [3], [10], [11], or some useless checkpoints are taken [4], [15].

In minimum-process coordinated checkpointing algorithms, a process Pi takes its checkpoint only if it is a member of the minimum set (a subset of the interacting processes). A process Pi is in the minimum set only if the checkpoint initiator process is transitively dependent upon it. Pj is directly dependent upon Pk only if there exists a message m such that Pj receives m from Pk in the current checkpointing interval (CI) and Pk has not taken its permanent checkpoint after sending m. The ith CI of a process denotes all the computation performed between its ith and (i+1)th checkpoint, including the ith checkpoint but not the (i+1)th checkpoint.

Koo and Toueg [10] proposed a minimum-process coordinated checkpointing algorithm for distributed systems at the cost of blocking the processes during checkpointing. This algorithm requires a minimum number of synchronization messages and checkpoints, but each process uses monotonically increasing labels in its outgoing messages. The initiator process sends the checkpoint request to Pi only if it has received a message m from Pi in the current CI; similarly, Pi sends the checkpoint request to other processes. In this way, a checkpointing tree is formed, and finally the leaf-node processes take checkpoints. The time taken to collect a coordinated checkpoint in mobile systems may be too long due to mobility, disconnections, and unreliable wireless channels, and the extensive blocking of processes may degrade system performance. Cao and Singhal [4] achieved non-intrusiveness in the minimum-process algorithm by introducing the concept of mutable checkpoints. Kumar and Kumar [21] proposed a minimum-process coordinated checkpointing algorithm for mobile distributed systems, where the number of useless checkpoints and the blocking of processes are reduced using a probabilistic approach. Singh and Cabillic [20] proposed a minimum-process non-intrusive coordinated checkpointing protocol for deterministic mobile systems, where anti-messages of selective messages are logged during checkpointing. Higaki and Takizawa [8], and Kumar et al. [17], proposed hybrid checkpointing protocols where MHs checkpoint independently and MSSs checkpoint synchronously. Neves et al. [13] gave a time-based, loosely synchronized coordinated checkpointing protocol that removes the overhead of synchronization and piggybacks an integer csn (checkpoint sequence number). Pradhan et al. [19] showed that asynchronous checkpointing with message logging is quite effective for checkpointing mobile systems.

Most of the proposed checkpointing algorithms do not address multiple concurrent initiations, as these may exhaust the limited battery and congest the wireless channels. The authors of [4] claim that their algorithm supports concurrent initiations, but in [15] the authors prove that the algorithm in [4] is designed to handle only the situation where the system has a single checkpoint initiator at a time, and that it can cause inconsistency when there are multiple forced checkpoints or multiple concurrent checkpoint initiations. In [22], the author points out the following problems in allowing concurrent initiations in minimum-process checkpointing protocols, particularly in the case of mobile distributed systems:

i) If Pi and Pj concurrently initiate checkpointing and Pj belongs to the minimum set of Pi, then Pj's initiation will be a redundant one. Some processes in Pj's minimum set will unnecessarily take multiple checkpoints while hardly advancing their recovery line. In other words, an MH may be asked to store multiple checkpoints on its local disk. It may also transfer multiple checkpoints to its local MSS.

ii) Sometimes, multiple triggers need to be piggybacked onto normal messages. A trigger contains the initiator process identification and its csn. Even if a process takes a checkpoint while no concurrent initiation is going on, it will piggyback its trigger unnecessarily. If we do not allow concurrent initiations, no trigger is required to be piggybacked onto normal messages. Hence, concurrent initiations increase the message size.

The authors of [23] have proposed a minimum-process coordinated checkpointing algorithm for mobile distributed systems, where no useless checkpoints are taken and an effort is made to minimize the blocking of processes. They captured the transitive dependencies during the normal execution. The Z-dependencies are well taken care of in this protocol. They also avoided collecting the dependency vectors of all processes to compute the minimum set. In [24], the authors propose a nonblocking coordinated checkpointing algorithm for mobile computing systems which requires only a minimum number of processes to take permanent checkpoints. They reduce the message complexity as compared to the Cao-Singhal algorithm [4], while keeping the number of useless checkpoints unchanged.

II. INTRODUCTION

The system model is similar to [3], [4]. A mobile computing system consists of a large number of MHs and relatively fewer MSSs. The distributed computation we consider consists of n spatially separated sequential processes, denoted by P0, P1, ..., Pn-1, running on fail-stop MHs or on MSSs. Each MH or MSS has one process running on it. The processes do not share common memory or a common clock. Message passing is the only way for processes to communicate with each other. Each process progresses at its own speed, and messages are exchanged through reliable channels whose transmission delays are finite but arbitrary. We assume the processes to be non-deterministic.

Similar to [3], [21], [22], the initiator process collects the dependency vectors of all processes and computes the tentative minimum set. Suppose that, during the execution of the checkpointing algorithm, Pi takes its checkpoint and sends m to Pj.
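The collection of dependency vectors and the computation of the tentative minimum set described above can be sketched as follows. This is an illustrative sketch only; the function name `tentative_minimum_set`, the variable `dep`, and the list encoding of the vectors are our own choices, not the paper's, and the dependency pattern loosely mirrors the example of Figure 1 (P1 initiates, having received messages from P0 and P2).

```python
# Sketch: computing the tentative minimum set from dependency vectors.
# dep[i][k] == 1 means Pi has received a message from Pk in the current
# checkpointing interval, i.e., Pi is directly dependent upon Pk.

def tentative_minimum_set(dep, initiator):
    n = len(dep)
    mset = {initiator}
    changed = True
    while changed:                  # transitive closure over direct dependencies
        changed = False
        for i in list(mset):
            for k in range(n):
                if dep[i][k] and k not in mset:
                    mset.add(k)
                    changed = True
    return mset

dep = [
    [0, 0, 0, 0, 0, 0],   # P0
    [1, 0, 1, 0, 0, 0],   # P1 (the initiator) received messages from P0 and P2
    [0, 0, 0, 0, 0, 0],   # P2
    [0, 1, 0, 0, 0, 0],   # P3 received from P1, but that does not put P3 in P1's set
    [0, 0, 0, 0, 0, 0],   # P4
    [0, 0, 0, 0, 0, 0],   # P5
]
print(sorted(tentative_minimum_set(dep, 1)))   # → [0, 1, 2]
```

Note the asymmetry the definition demands: the initiator's set contains the processes it transitively depends on, so P3's dependency on P1 does not pull P3 into P1's tentative minimum set.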
Pj receives m such that it has not taken its checkpoint for the current initiation, and it does not know whether it will get the checkpoint request. If Pj takes its checkpoint after processing m, m will become orphan. In order to avoid such orphan messages, we use the following technique, as mentioned in [21]: if Pj has sent at least one message to some process, say Pk, and Pk is in the tentative minimum set, there is a good probability that Pj will get the checkpoint request. Therefore, Pj takes its mutable checkpoint before processing m [4]. In this case, most probably, Pj will get the checkpoint request and its mutable checkpoint will be converted into a permanent one. Otherwise, the message is buffered by Pj, and Pj processes m only after taking its tentative checkpoint or after getting the commit, as in [22].

In minimum-process checkpointing, some processes may not be included in the minimum set for several checkpoint initiations due to a typical dependency pattern, and they may starve for checkpointing. In the case of a recovery after a fault, the loss of computation at such processes may be unreasonably high [22]. In mobile systems, the checkpointing overhead is quite high in all-process checkpointing [14]. Thus, to balance the checkpointing overhead and the loss of computation on recovery, we design a hybrid checkpointing algorithm for mobile distributed systems, where an all-process checkpoint is taken after a certain number of minimum-process checkpoints.

In coordinated checkpointing, if a single process fails to take its checkpoint, all the checkpointing effort goes to waste, because each process has to abort its tentative checkpoint. In order to take the tentative checkpoint, an MH needs to transfer a large amount of checkpoint data to its local MSS over wireless channels; hence, the loss of checkpointing effort may be exceedingly high. Therefore, we propose that, in the first phase, all concerned MHs take a soft checkpoint only. A soft checkpoint is similar to a mutable checkpoint [4]; it is stored in the memory of the MH only. In this case, if some process fails to take its checkpoint in the first phase, the MHs need to abort only their soft checkpoints. The effort of taking a soft checkpoint is negligible as compared to a tentative one. When the initiator comes to know that all relevant processes have taken their soft checkpoints, it asks all relevant processes to come into the second phase, in which a process converts its soft checkpoint into a tentative one. Finally, the initiator issues the commit request.

In the present study, we present a hybrid scheme, where an all-process checkpoint is enforced after executing the minimum-process algorithm a fixed number of times, as in [22]. In the first phase, the MHs in the minimum set are required to take soft checkpoints only. In the minimum-process algorithm, a process takes a forced checkpoint only if it has a good probability of getting the checkpoint request, as in [21].

III. THE PROPOSED CHECKPOINTING SCHEME

A. An Example

We explain the minimum-process checkpointing algorithm with the help of an example. In Figure 1, at time t1, P1 initiates the checkpointing process and sends a request to all processes for their dependency vectors. At time t2, P1 receives the dependency vectors from all processes and computes the tentative minimum set (mset[]) as in [21], which in the case of Figure 1 is {P0, P1, P2}. P1 sends this tentative minimum set to all processes. A process takes its soft checkpoint if it is a member of the tentative minimum set. When P0 and P2 get the mset[], they find themselves in it; therefore, they take their soft checkpoints. When P3, P4 and P5 get the mset[], they find that they are not its members; therefore, they do not take their checkpoints.

P1 sends m8 after taking its checkpoint, and P0 receives m8 before getting the mset[]. In this case, P0 buffers m8 and processes it only after taking its soft checkpoint. After taking its soft checkpoint, P1 sends m11 to P3. At the time of receiving m11, P3 has received the mset[] and has not taken its checkpoint; therefore, P3 takes the bitwise logical AND of sendv3[] and mset[] and finds that the resultant vector is not all zeroes (sendv3[1] = 1 due to m3, and mset[1] = 1). P3 concludes that, most probably, it will get the checkpoint request in the current initiation; therefore, it takes its mutable checkpoint before processing m11. When P2 takes its soft checkpoint, it finds that it is dependent upon P3 and that P3 is not in the minimum set (known locally); therefore, P2 sends a checkpoint request to P3. On receiving the checkpoint request, P3 converts its mutable checkpoint into a soft one.

After taking its checkpoint, P2 sends m13 to P4. P4 takes the bitwise logical AND of sendv4[] and mset[] and finds the resultant vector to be all zeroes (sendv4[] = [000001]; mset[] = [111000]). P4 concludes that, most probably, it will not get the checkpoint request in the current initiation; therefore, P4 does not take a mutable checkpoint but buffers m13. P4 processes m13 only after getting the commit request. P5 processes m14, because it has not sent any message since its last permanent checkpoint. After taking its checkpoint, P1 sends m12 to P2. P2 processes m12, because it has already taken its checkpoint in the current initiation.

At time t3, P1 receives responses from all relevant processes and issues the tentative checkpoint request along with the exact minimum set {P0, P1, P2, P3} to all processes. On receiving the tentative checkpoint request, all relevant processes convert their soft checkpoints into tentative ones and inform the initiator. Finally, at time t4, the initiator P1 issues the commit. On receiving the commit, the following actions are taken: a process in the minimum set converts its tentative checkpoint into a permanent one and discards its earlier permanent checkpoint, if any; a process not in the minimum set discards its mutable checkpoint, if any, or processes the buffered messages, if any.
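The "good probability" test that P3 and P4 apply in the example, a bitwise AND of a process's send vector with mset[], can be sketched as follows. The function name `likely_to_get_request` and the plain-list encoding of the bit vectors are our own illustrative choices.

```python
# Sketch of the mutable-checkpoint decision from the example: a process that
# receives a message before taking its checkpoint ANDs its send vector (whom
# it has sent messages to in the current CI) with the tentative minimum set.
# A nonzero result means it will probably be asked to checkpoint, so it takes
# a mutable checkpoint before processing; otherwise it buffers the message.

def likely_to_get_request(sendv, mset):
    return any(s and m for s, m in zip(sendv, mset))

mset = [1, 1, 1, 0, 0, 0]           # {P0, P1, P2}, as computed by the initiator

sendv4 = [0, 0, 0, 0, 0, 1]         # P4 has sent only to P5, outside the mset
print(likely_to_get_request(sendv4, mset))   # → False: buffer m13

sendv3 = [0, 1, 0, 0, 0, 0]         # P3 has sent m3 to P1, which is in the mset
print(likely_to_get_request(sendv3, mset))   # → True: take a mutable checkpoint
```

The check can err in both directions, which is exactly why the paper treats it as probabilistic: a True only means a checkpoint request is likely, and the mutable checkpoint is cheap to discard if the request never arrives.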
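The two-phase soft/tentative/permanent flow described in Section III can be summarized as a small state machine. This is our own illustrative encoding (the state and event names are not from the paper), sketching the key property of the scheme: a first-phase failure discards only the cheap, memory-resident soft checkpoint.

```python
# Sketch of the checkpoint states a process moves through in the proposed
# hybrid scheme: a soft checkpoint lives only in MH memory, is promoted to
# tentative on the initiator's request, and becomes permanent on commit.

NONE, SOFT, TENTATIVE, PERMANENT = "none", "soft", "tentative", "permanent"

def step(state, event):
    transitions = {
        (NONE, "take_soft"): SOFT,
        (SOFT, "tentative_request"): TENTATIVE,
        (TENTATIVE, "commit"): PERMANENT,
        (SOFT, "abort"): NONE,        # first-phase failure: only memory is lost
        (TENTATIVE, "abort"): NONE,
    }
    return transitions.get((state, event), state)   # unknown events are ignored

s = NONE
for e in ["take_soft", "tentative_request", "commit"]:
    s = step(s, e)
print(s)   # → permanent
```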
B. Handling Node Mobility and Disconnections

Suppose an MH, say MHi, disconnects from an MSS, say MSSk. It stores its own checkpoint, say disconnect_ckpti, and other support information, e.g. ddv[], at MSSk. During the disconnection period, MSSk acts on behalf of MHi as follows. If the checkpointing process is initiated and MHi is in the minimum set, MSSk converts its disconnected checkpoint into a permanent one. On global checkpoint commit, MSSk also updates MHi's ddv[], as if it were a normal process. On the receipt of messages for MHi, MSSk stores them in a queue without updating ddv[]. When MHi enters the cell of MSSj, it is connected to MSSj if g_chkpt is reset; otherwise, it waits for g_chkpt to be reset. Before the connection, MSSj collects MHi's ddv[] and buffered messages from MSSk, and MSSk discards MHi's support information and disconnect_ckpti. The buffered messages are processed by MHi in the order of their receipt at MSSk. MHi's ddv[] is updated on the processing of the buffered messages.

C. Comparison with the Existing Non-blocking Algorithm

In the Cao-Singhal algorithm [4], suppose Pi receives m from Pj before taking its checkpoint and Pi is in the minimum set. In this case, after taking its checkpoint, Pi sends a checkpoint request to Pj due to m. If Pj has taken some permanent checkpoint after sending m, the checkpoint request to Pj is useless. To enable Pj to decide whether the checkpoint request is useful, Pi also piggybacks csni[j] and a huge data structure MR[] along with the checkpoint request to Pj. These useless checkpoint requests and piggybacked data structures increase the message complexity of the algorithm. In our algorithm, by contrast, no such useless checkpoint requests are sent and no such information is piggybacked onto checkpoint requests. The csni[j] is an integer; its size is 4 bytes. In the worst case, the size of MR[] is (4n + n/8) bytes, where n is the number of processes in the distributed system. Intuitively, we can say that the number of useless checkpoints in the proposed algorithm will be negligibly small as compared to the algorithm in [4].

The proposed protocol suffers from the following limitations with respect to the existing algorithm [4]. The initiator MSS collects the dependencies of all processes, computes the tentative minimum set, and broadcasts the tentative minimum set along with the checkpoint request to all MSSs. The initiator MSS also broadcasts the exact minimum set along with the commit request on the static network. Some blocking of processes takes place. Concurrent executions of the algorithm are avoided.

IV. CONCLUSIONS

We propose a hybrid checkpointing algorithm, wherein an all-process coordinated checkpoint is taken after the minimum-process coordinated checkpointing algorithm has been executed a fixed number of times. In minimum-process checkpointing, we try to reduce the number of useless checkpoints and the blocking of processes, and we have proposed a probabilistic approach to reduce the number of useless checkpoints. Thus, the proposed protocol is simultaneously able to reduce the useless checkpoints and blocking of
Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver.1.0 July 2010 P a g e | 89 processes at very less cost of maintaining and collecting Systems‖, International Journal of Information and dependencies and piggybacking checkpoint sequence Computing Science, Vol. 9, No. 1, pp. 18-27, 2006. numbers onto normal messages. Concurrent initiations of 12) Lalit Kumar, M. Misra, R.C. Joshi, ―Low overhead the proposed protocol do not cause its concurrent optimal checkpointing for mobile distributed executions. We try to reduce the loss of checkpointing effort systems‖ Proceedings. 19th International when any process fails to take its checkpoint in coordination Conference on IEEE Data Engineering, pp 686 – with others. 88, 2003. 13) Neves N. and Fuchs W. K., ―Adaptive Recovery V. REFERENCES for Mobile Environments,‖ Communications of the 1) Acharya A. and Badrinath B. R., ―Checkpointing ACM, vol. 40, no. 1, pp. 68-74, January 1997. Distributed Applications on Mobile Computers,‖ 14) Prakash R. and Singhal M., ―Low-Cost Proceedings of the 3rd International Conference on Checkpointing and Failure Recovery in Mobile Parallel and Distributed Information Systems, pp. Computing Systems,‖ IEEE Transaction On 73-80, September 1994. Parallel and Distributed Systems, vol. 7, no. 10, pp. 2) Cao G. and Singhal M., ―On coordinated 1035-1048, October1996. checkpointing in Distributed Systems‖, IEEE 15) Weigang Ni, Susan V. Vrbsky and Sibabrata Ray, ― Transactions on Parallel and Distributed Systems, Pitfalls in nonblocking checkpointing‖ World vol. 9, no.12, pp. 1213-1225, Dec 1998. Science‘s journal of Interconnected Networks. Vol. 3) Cao G. and Singhal M., ―On the Impossibility of 1 No. 5, pp. 
47-78, March 2004.w Min-process Non-blocking Checkpointing and an 16) Parveen Kumar, Lalit Kumar, R K Chauhan, ―A Efficient Checkpointing Algorithm for Mobile low overhead Non-intrusive Hybrid Synchronous Computing Systems,‖ Proceedings of International checkpointing protocole for mobile systems‖, Conference on Parallel Processing, pp. 37-44, Journal of Multidisciplinary Engineering August 1998. Technologies, Vol.1, No. 1, pp 40-50, 2005. 4) Cao G. and Singhal M., ―Mutable Checkpoints: A 17) Lalit Kumar,i Parveen Kumar, R K chauhan New Checkpointing Approach for Mobile ―Logging based Coordinated Checkpointing in Computing systems,‖ IEEE Transaction On Mobile Distributed Computing Systems‖, IETE Parallel and Distributed Systems, vol. 12, no. 2, pp. journal of research, vol. 51, no. 6, 2005. 157-172, February 2001. 18) LamportsV L., ―Time, clocks and ordering of events 5) Chandy K. M. and Lamport L., ―Distributed in distributed systems‖ Comm. ACM, 21(7), 1978, Snapshots: Determining Global State of Distributed pp 558-565. Systems,‖ ACM Transaction on Computing 19) Pradhan D.K., Krishana P.P. and Vaidya N.H., Systems, vol. 3, No. 1, pp. 63-75, February 1985. ―Recovery in Mobile Wireless Environment: 6) Elnozahy E.N., Alvisi L., Wang Y.M. and Johnson Design and Trade-off Analysis,‖ Proceedings 26th D.B., ―A Survey of Rollback-Recovery Protocolsy International Symposium on Fault-Tolerant in Message-Passing Systems,‖ ACM Computing Computing, pp. 16-25, 1996. Surveys, vol. 34, no. 3, pp. 375-408, 2002.l 20) Pushpendra Singh, Gilbert Cabillic, ―A 7) Elnozahy E.N., Johnson D.B. and Zwaenepoel W., Checkpointing Algorithm for Mobile Computing ―The Performance of Consistent Checkpointing,‖ Environment‖, LNCS, No. 2775, pp 65-74, 2003. Proceedings of the 11th Symposiumr on Reliable 21) Lalit Kumar Awasthi, P.Kumar, ―A Synchronous Distributed Systems, pp. 39-47, October 1992. Checkpointing Protocol for Mobile Distributed 8) Higaki H. 
and Takizawa M., ―Checkpoint-recovery Systems: Probabilistic Approach‖ International Protocol for Reliable Mobile Systems,‖ Trans. of Journal of Information and Computer Security, Information processinga Japan, vol. 40, no.1, pp. Vol.1, No.3 pp 298-314, 2007. 236-244, Jan. 1999. 22) Parveen Kumar, ―A Low-Cost Hybrid 9) J.L. Kim, T. Park, ―An efficient Protocol for Coordinated Checkpointing Protocol for Mobile checkpointing Recovery in Distributed Systems,‖ Distributed Systems‖, Mobile Information Systems E pp 13-32, Vol. 4, No. 1, 2007. IEEE Trans. Parallel and Distributed Systems, pp. 955-960, Aug. 1993. 23) [23] Kumar, P., & Khunteta, A. A Minimum- 10) Koo R. and Toueg S., ―Checkpointing and Roll- Process Coordinated Checkpointing Protocol For Back Recovery for Distributed Systems,‖ IEEE Mobile Distributed System. International Journal of Trans. on Software Engineering, vol. 13, no. 1, pp. Computer Science issues, Vol. 7, Issue 3, 2010. 23-31, January 1987. 24) [24] Garg, R., & Kumar, P.(2010). A Nonblocking 11) Parveen Kumar, R K Chauhan, ―A Coordinated Coordinated Checkpointing Algorithm for Mobile Checkpointing Protocol for Mobile Computing Computing Systems. International Journal of Computer Science issues, Vol. 7, Issue 3, 2010.
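The worst-case piggyback overhead quoted above is simple arithmetic and can be sketched as follows (an illustration only; the function name and the example process count are mine, while the byte sizes are those stated in the text):

```python
def piggyback_overhead_bytes(n: int) -> float:
    """Worst-case bytes piggybacked per checkpoint request in the
    compared scheme: a 4-byte integer csn[j] plus the data structure
    MR[], whose worst-case size is (4n + n/8) bytes for n processes."""
    csn_size = 4              # csn[j] is a 4-byte integer
    mr_size = 4 * n + n / 8   # worst-case size of MR[]
    return csn_size + mr_size

# For a 64-process system: 4 + 256 + 8 = 268 bytes per checkpoint request.
print(piggyback_overhead_bytes(64))  # 268.0
```

The proposed algorithm avoids this per-request overhead entirely, at the cost of piggybacking checkpoint sequence numbers onto normal messages.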
A Fuzzy Co-Clustering Approach for Clickstream Data Pattern
R. Rathipriya1, Dr. K. Thangavel2
Abstract- Web usage mining is an important tool to extract hidden business intelligence from large databases. The extracted information provides organizations with the ability to produce results more effectively, improving their businesses and increasing sales. Co-clustering is a powerful bipartition technique which identifies groups of users associated with groups of web pages. These associations are quantified to reveal the users' interest in the different web page clusters. In this paper, a Fuzzy Co-Clustering algorithm is proposed for clickstream data to identify subsets of users with similar navigational behavior/interest over a subset of web pages of a website. Targeting user groups for various promotional activities is an important aspect of marketing practice. Experiments are conducted on a real dataset to prove the efficiency of the proposed algorithm. The results and findings of this algorithm could be used to enhance marketing strategy for direct marketing, advertisements for web-based businesses, and so on.

Keywords- Web usage mining, Fuzzy Co-Clustering, Target marketing, Clickstream data

I. INTRODUCTION

Nowadays, the internet is a very fast, low-cost communication medium between business organizations' services and their customers. Web data mining [1] is an intelligent data mining technique to analyze web data, which includes web content data, web structure data and web usage data. Analysis of usage data provides organizations with the information required to improve their performance.

In general, web clustering techniques are used to discover groups of users or groups of pages, called clusters, which are similar among themselves and dissimilar to the users/pages in the other clusters. User clustering approaches on usage data create groups with similar browsing patterns. A web page's content data, structure data and usage data are used to cluster the web pages of a web site. Clustering results may be beneficial for a wide range of applications such as web site personalization, system improvement, web caching and pre-fetching, recommendation systems, design of collaborative filtering and target marketing. These clustering techniques are one-dimensional, whereas Co-Clustering is a bi-dimensional clustering technique. The combination of a user cluster with the set of its significant web pages of a web site is called a Co-Cluster.

There are many applications for Co-Clustering [7], such as recommendation systems, direct marketing, text mining, identification of web communities and election analysis [1][2]. Co-Clustering techniques can be used in collaborative filtering to identify subgroups of customers with similar preferences or behaviors towards a subset of products, with the goal of performing target marketing. Recommendation systems and target marketing are important applications in the e-commerce area. The main goal of these applications is to identify groups of web users or customers with similar behavior/interest, so that one can predict a customer's interest and make proper recommendations to improve sales.

Generally, Co-Clustering is a form of two-way clustering in which both dimensions are clustered simultaneously and the generated Co-Clusters are refined using techniques such as a fuzzy approach. The goal of this paper is to provide a fuzzy Co-Clustering algorithm for clickstream data and to quantify the discovered Co-Clusters. User clusters and their members are related in different degrees with page clusters. The relation between these clusters is quantified using a fuzzy membership function to show the distribution of the users' interest over the web page clusters.

The organization of the paper is as follows. Section 2 summarizes some of the existing web clustering techniques and co-clustering approaches. Section 3 describes the problem statement. The proposed Fuzzy Co-Clustering algorithm is described in Section 4. The experimental results of the proposed algorithm are discussed in Section 5. Section 6 concludes this paper.

II. BACKGROUND

A. Related work

Web mining was first proposed by Etzioni in 1996. Web mining techniques automatically discover and extract information from World Wide Web documents and services. Cooley et al. [1,6] did in-depth research in web usage mining. Approaches proposed in [3,10] extend the one-dimensional clustering problem and focus on the simultaneous grouping of both users and web pages by exploiting their relations. The goal is to identify groups of related web users and pages which share similar interest across the same subset of pages. This behavior reveals the users' interests as similar and highly related to the topic that the specific set of pages involves. The obtained results are particularly useful for applications such as e-commerce and recommendation engines, since relations between clients and products may be revealed. These relations are more meaningful than those from one-dimensional clustering of users or pages.

______
About- 1,2 Department of Computer Science, Periyar University, Salem, Tamilnadu, India. e-mail: [email protected], [email protected]
Co-Clustering algorithms fall into three categories. The first category models each type of object as a random variable, clustering the objects of different types simultaneously while preserving the mutual information between the random variables that model these objects. The second category models the relationship between different types of objects as a (nonnegative) matrix; this matrix is approximately decomposed into several matrices, which indicate the cluster memberships for the objects. The third category treats the relationship between different types of objects as a graph and performs co-clustering by graph partitioning based on spectral analysis [2]. A fuzzy biclustering approach to correlate web users and web pages based on a spectral clustering technique was proposed in [2].

B. Co-Clustering Approach

By definition, Co-Clustering [9] is the process of simultaneous categorization of users and web pages into user clusters and page clusters, respectively. The term co-cluster refers to each pair of a user cluster and a page cluster. Using the matrix illustration, a co-cluster is represented by a sub-matrix of A where the aij values of all its elements are similar to one another. Thus co-clustering is the task of finding these coherent sub-matrices of A. One illustration of co-clustering is a matrix partitioned into six square sub-matrices representing six co-clusters (A11 to A32).

This paper aims to provide a framework for the simultaneous clustering of web pages and users called Fuzzy Co-Clustering. The relations between the web users and pages in a co-cluster will be identified and quantified. Here, users grouped in the same user cluster may be related to more than one page cluster with different degrees of fuzzy membership value, and vice versa.

III. PROBLEM STATEMENT

This section gives the formal definitions of the problem and describes how the clickstream data from a web server log file is converted into matrix form.

Let A(U, P) be an 'n x m' user associated matrix, where U = {U1, U2, ..., Un} is a set of users and P = {P1, P2, ..., Pm} is a set of pages of a web site. It describes the relationship between the web pages and the users who access them. Let 'n' be the number of web users and 'm' the number of web pages. The element aij of A(U, P) represents the frequency with which user Ui of U visits page Pj of P during a given period of time:

    aij = Hits(Ui, Pj), if Pj is visited by Ui
        = 0,            otherwise                                   (1)

where Hits(Ui, Pj) is the count/frequency of accesses by user Ui to page Pj during a given period of time.

A user cluster is a group of users that have similar behavior when navigating through a web site during a given period of time. A page cluster is a group of web pages that are related according to the users' perception; specifically, they are accessed by similar users during a given period of time.

The similarity measure used in this paper is fuzzy similarity. The fuzzy similarity measure between two fuzzy subsets X1 = {x11, x12, ..., x1m} and X2 = {x21, x22, ..., x2m} is defined as

    fsim(X1, X2) = [ Σ(k=1..m) min(x1k, x2k) ] / [ Σ(k=1..m) max(x1k, x2k) ]        (2)

This ratio defines the similarity between two fuzzy subsets, with values between 0 and 1. Using this similarity measure, similarity matrices are computed for the user vectors and the page vectors of the user associated matrix A.

Fuzzy co-clustering is a technique that performs simultaneous clustering of objects and features using a fuzzy membership function to correlate their relations. It allows user clusters to belong to several page clusters simultaneously with different degrees of membership value. The membership value lies between 0 and 1.

A. Clustering algorithm: K-Means

In this paper, the K-Means clustering technique [12] is used to create the user clusters and page clusters. K-Means is one of the simplest unsupervised learning algorithms for the clustering problem. The procedure is a simple and easy way to classify a given data set into a certain fixed number of clusters (assume K clusters). The algorithm is composed of the following steps:

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
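The four K-Means steps above can be sketched in a few lines of Python (a minimal illustration under my own naming, not the authors' implementation):

```python
import random

def kmeans(points, k, iters=100):
    """Minimal K-Means over a list of equal-length numeric tuples."""
    centroids = random.sample(points, k)       # Step 1: initial centroids
    for _ in range(iters):
        groups = [[] for _ in range(k)]        # Step 2: assign to closest centroid
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            groups[dists.index(min(dists))].append(p)
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
               for i, g in enumerate(groups)]  # Step 3: recompute centroids
        if new == centroids:                   # Step 4: stop when centroids settle
            break
        centroids = new
    return centroids, groups

centroids, groups = kmeans([(0.0, 0.0), (0.1, 0.1), (9.9, 10.0), (10.0, 10.0)], k=2)
```

In the paper the clustered objects are the rows of the n x n user similarity matrix and the m x m page similarity matrix; any standard K-Means implementation can be substituted here.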
The K-Means algorithm is significantly sensitive to the D. User fuzzy subset initial randomly selected cluster centers. Run K-Means For each user U , use the user accessing information on algorithm repeatedly with different random cluster i each web page P to describe the visiting pattern. Then centers(called centriods)approximately for ten times. j user fuzzy subset µ of ith user that reflects the user‘s Choose the best centriod whose Davis Bouldin Index Ui visiting behavior is defined as value is minimum.
B. Web Server Log File µµUii P,f j U P j | P jò P Web server log file[3,5] is a log file automatically created and maintained by server of activity performed by it. where fµUi (Pj) is the membership function which is defined Default log file format is Common log File format. It as contains information about the request, client IP address, Hits Uij , P fµ i P request date/time, page requested, HTTP code, bytes served, Uj m user agent, and referrer. These data can be combined into a Hits Uik , P single file, or separated into distinct logs, such as an access k1 (3) log, error log, or referrer log. From web server log file, and m is the number of web pages of a web site. which user access which web page of a web site during a E. Page Fuzzy subset specified period of time can be obtained easily. For each page Pj , use the all user accessingw information on C. Clickstream Data the web page Pj to describe the web page itself. Then page Clickstream data[4] is a natural by-product of a user fuzzy subset µPj that reflects all users‘ visiting behavior on accessing world wide web pages, and refers to the sequence the jth page is defined as e of pages visited and the time these pages were viewed. Clickstream data is to Internet marketers and advertisers. µµPjj U i , f P U i | U iò U An instance of real clickstream records is the MSNBC i dataset, which describes the page visits of users who visited where fµPj (Ui) is the membership function which is msnbc.com on a single day. There are 989,818 users and defined as only 17 distinct items, because these items are recorded at the level of URL category, not at V
page level, which greatly reduces the dimensionality. The 17 Hits Uij , P categories are tabulated with their category number. fµUPj i n Hits Ukj , P Frontpage 1 News 2 k1 y (4) Tech 3 Local 4 Opinion 5 On-air 6 and n is the number of web users. Misc 7 Weather 8 l Health 9 Living 10 IV. FUZZY CO-CLUSTERING ALGORITHM FOR CLICKSTREAM Business 11 Sports 12 DATA Summary 13 Bbs r14 In this paper, K-Means clustering method is applied on the Travel 15 msn-news 16 user(row) and page(column) dimensions of the user access msn-sports 17 matrix A(U,P) separately and, then combine the results to Sample Sequences a obtain small co-regulated submatrices called Co-Clusters. Given a user access matrix A, let ku be the number of 1 1 clusters on user dimension and kp be the number of clusters 2 on page dimension after K-Means clustering is applied. Cu 3 2 2 4 2 2 2 3 3 E is the family of user clusters and Cp is the family of page 6 7 7 7 6 6 8 8 8 8 clusters. Let ciu be a subset of users and ciu ϵ Cu (1≤ i≥ 6 9 4 4 4 10 3 10 5 10 4 4 4 ku). Let cjp be a subset of pages and cjp ϵ Cp (1≤ j≥ kp). The pair (ciu,cjp) denotes a Co-Cluster of A . By combining Each row describes the hits of a single user. For example, the results of user dimensional clustering and page the first user hits "frontpage" twice, and the second user hits dimensional clustering, ku × kp Co-clusters are obtained. "news" once. The objective of the paper is to quantify these Co-clusters in different degree using fuzzy membership function. The proposed Fuzzy Co-Clustering algorithm has three phases .
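Equations (1), (3) and (4), together with the fuzzy similarity of equation (2), can be sketched directly on the sample sequences listed above (an illustration with my own variable names; the full dataset is handled the same way):

```python
from collections import Counter

sequences = ["1 1", "2", "3 2 2 4 2 2 2 3 3"]   # sample rows from the text
m = 17                                          # number of page categories

# Equation (1): a_ij = Hits(Ui, Pj) -- the user associated count matrix A(U, P).
A = []
for seq in sequences:
    hits = Counter(int(c) for c in seq.split())
    A.append([hits.get(j, 0) for j in range(1, m + 1)])

# Equation (3): user fuzzy subset -- each row normalized by the user's total hits.
user_fuzzy = [[a / sum(row) for a in row] for row in A]

# Equation (4): page fuzzy subset -- each column normalized by the page's total hits.
col_tot = [sum(row[j] for row in A) for j in range(m)]
page_fuzzy = [[row[j] / col_tot[j] if col_tot[j] else 0.0 for j in range(m)]
              for row in A]

# Equation (2): fuzzy similarity -- sum of minima over sum of maxima.
def fsim(x1, x2):
    return sum(map(min, x1, x2)) / sum(map(max, x1, x2))

print(A[0][:4])  # [2, 0, 0, 0] -- the first user hits "frontpage" twice
```

On the first two sample users the similarity is 0, since they share no category; in the algorithm these row and column fuzzy subsets feed the n x n and m x m similarity matrices of the first two phases.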
A. First Phase: User Clustering

1. Compute the user fuzzy subsets of the user associated matrix A(U, P)nxm using equation (3).
2. Compute the user similarity matrix of size 'n x n' using the fuzzy similarity measure defined in equation (2).
3. Apply K-Means to the user similarity matrix and generate ku user groups.

B. Second Phase: Page Clustering

1. Compute the page fuzzy subsets of the user associated matrix A(U, P)nxm using equation (4).
2. Compute the page fuzzy similarity matrix of size 'm x m' using equation (2).
3. Apply K-Means to the page similarity matrix and generate kp page groups.

C. Third Phase: Fuzzy Relation Coefficients

1. Combine the results of the user-dimensional clustering and the page-dimensional clustering to obtain ku x kp Co-Clusters.
2. Calculate the relation coefficient between the user cluster and the page cluster of each Co-Cluster using equation (5), which indicates the distribution of the related users' interest over the page clusters.
3. Calculate the relation coefficient between the user cluster and the page cluster of each Co-Cluster using equation (6), which shows which user cluster has more interest in that page cluster.

After performing one-dimensional clustering on the user fuzzy subsets and the page fuzzy subsets, the ku user clusters and kp page clusters are related, and the user clusters' interest is quantified to a different degree for the different page clusters. This reveals the interest of groups of related users in the different groups of related web pages. The fuzzy relation coefficient between a user cluster and a web page cluster is defined in two ways, as equations (5) and (6). Equation (5) quantifies each user cluster's interest in the different related web page clusters. Equation (6) quantifies the different user clusters' interest in each web page cluster. The interpretation of the fuzzy co-clustering result can be used to improve direct and target marketing strategy, and also to improve the quality of recommendation systems.

V. EXPERIMENTATION AND RESULTS

In order to evaluate the performance of the proposed algorithm, an experiment is conducted on the benchmark clickstream dataset of MSNBC.com, which describes the sequences of page visits of users on 28 September 1999.

A. Data Preprocessing

Data preprocessing [8] transforms the data into a format that will be more easily and effectively processed for the purpose of the user. The techniques used to preprocess data include data cleaning, data integration, data transformation and data reduction. The clickstream records in the MSNBC dataset are converted into matrix format, where element aij of A(U, P) represents the frequency with which user Ui accesses web page Pj of the web site during a given period of time. During the user session, the web page categories visited by the user are marked with the frequency of access to that page, and with 0 otherwise.

B. Data Filtering

Data filtering is the task of extracting only those records of the web log files which are essential for the analysis, thus significantly reducing the data necessary for further processing. In this paper, data filtering aims to filter out the users who have visited fewer than 9 page categories of the web site. Initially there are 989,818 users; after this step the number of users is reduced to 1,720.

C. Results

The K-Means algorithm is applied to the resultant user associated matrix of size 1720 x 17, where ku = 10 and kp = 3 were fixed to create ten user clusters and three page clusters. Using equations (5) and (6), the relations between the user clusters and page clusters were quantified, as shown in the fuzzy relation coefficient matrices of Table 1 and Table 2.

VI. FUZZY RELATION CO-EFFICIENT

Table 1 shows which user and page clusters are more related, and it indicates how each user cluster's interest is distributed over all page clusters. From Table 1, user cluster c2u has more interest in page cluster c3p, because that Co-Cluster's fuzzy relation value is the highest. Similarly, the pages of interest for each user cluster can be found easily and efficiently.

Table 1: Users' cluster fuzzy relation coefficients for the page clusters

    Clusters    c1p       c2p       c3p
    c1u         0.0839    0.0162    0.0489
    c2u         0.1615    0.1739    0.4192
    c3u         0.0466    0.0763    0.0574
    c4u         0.0847    0.0423    0.0456
    c5u         0.0485    0.0019    0.0549
    c6u         0.0921    0.3978    0.0877
    c7u         0.0739    0.0346    0.0792
    c8u         0.1049    0.0752    0.1165
    c9u         0.1734    0.0562    0.0475
    c10u        0.1305    0.1256    0.0431

Table 2 shows the Co-Clusters' fuzzy relation values relating the user and page clusters. It clearly pictures which user cluster has more interest in a page cluster. In this way it is easy to identify the target user group related to each page cluster, which is useful in target marketing for making recommendations according to the users' frequent access of web pages during a given period of time.

Table 2: Co-Cluster fuzzy relation values by page cluster

    Clusters    c1u      c2u      c3u      c4u      c5u      c6u      c7u      c8u      c9u
    c1p         0.4192   0.1389   0.2445   0.4309   0.274    0.2652   0.2805   0.2712   0.595
    c2p         0.0103   0.0189   0.0507   0.0273   0.0014   0.1451   0.0167   0.0246   0.0244
    c3p         0.5705   0.8422   0.7048   0.5419   0.7246   0.5898   0.7028   0.7041   0.3806

Interpretation of the Co-Cluster results with their fuzzy relation values is very helpful to realize how, and with which patterns, the web site's page categories are visited, and by which user cluster. Such information is useful to web administrators for web site evaluation or reorganization. Recommending a set of related web page categories to a user group based on the fuzzy relation value is also possible.
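The reading of Table 1 described above (each user cluster's dominant page cluster is the column holding its maximum coefficient) can be reproduced directly from the table's values; the following sketch only transcribes Table 1 and takes a per-row argmax:

```python
# Rows: user clusters c1u..c10u; columns: page clusters c1p, c2p, c3p (Table 1).
table1 = [
    [0.0839, 0.0162, 0.0489],   # c1u
    [0.1615, 0.1739, 0.4192],   # c2u
    [0.0466, 0.0763, 0.0574],   # c3u
    [0.0847, 0.0423, 0.0456],   # c4u
    [0.0485, 0.0019, 0.0549],   # c5u
    [0.0921, 0.3978, 0.0877],   # c6u
    [0.0739, 0.0346, 0.0792],   # c7u
    [0.1049, 0.0752, 0.1165],   # c8u
    [0.1734, 0.0562, 0.0475],   # c9u
    [0.1305, 0.1256, 0.0431],   # c10u
]

# Each user cluster's most related page cluster is the argmax of its row;
# for c2u this is c3p, as noted in the text.
for i, row in enumerate(table1, start=1):
    print(f"c{i}u -> c{row.index(max(row)) + 1}p")
```

The same one-line argmax, applied to the rows of Table 2, yields the most interested user cluster for each page cluster.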
VII. CONCLUSION

This paper proposed a Fuzzy Co-Clustering algorithm for clickstream data and evaluated it on a real dataset. The results proved its efficiency in correlating the relevant users and web pages of a web site. Thus, the interpretation of the Co-Cluster results can be used by a company for focalized marketing campaigns aimed at an interesting target user cluster. This is a key feature in target marketing. Our Fuzzy Co-Clustering algorithm produces non-overlapping co-clusters; in future work it will be extended to generate overlapping co-clusters.
VIII. REFERENCES
1) Cooley, R., Srivastava, J., Deshpande, M., "Data preparation for mining world wide web browsing patterns", Knowledge and Information Systems, Vol. 1, No. 1, pp. 5-32, 1999.
2) Koutsonikola, V.A. and Vakali, A., "A fuzzy bi-clustering approach to correlate web users and pages", Int. J. Knowledge and Web Intelligence, Vol. 1, No. 1/2, pp. 3-23, 2009.
3) Liu, X., He, P. and Yang, Q., "Mining user access patterns based on Web logs", Canadian Conference on Electrical and Computer Engineering, Saskatoon, Saskatchewan, Canada, pp. 2280-2283, May 2005.
4) Antonellis, P., Makris, C., Tsirakis, N., "Algorithms for Clustering ClickStream Data", Information Processing Letters, Vol. 109, Issue 8, pp. 381-385, 2009.
5) Song, Q., Shepperd, M., "Mining web browsing patterns for E-commerce", Computers in Industry, Vol. 57, pp. 622-630, 2006.
6) Srivastava, J., Cooley, R., Deshpande, M., and Tan, P.-N., "Web usage mining: Discovery and applications of usage patterns from web data", SIGKDD Explorations, Vol. 1, No. 2, pp. 12-23, 2000.
7) Busygin, S., Prokopyev, O., and Pardalos, P.M., "Biclustering in data mining", Computers & Operations Research, Vol. 35, pp. 2964-2987, 2008.
8) Suneetha, K.R. and Krishnamoorthi, R., "Data Preprocessing and Easy Access Retrieval of Data through Data Warehouse", Proceedings of the World Congress on Engineering and Computer Science 2009, USA, Vol. 1, 2009.
9) Tjhi, W.C. and Chen, L., "Minimum sum-squared residue for fuzzy co-clustering", Intelligent Data Analysis, Vol. 9, pp. 1-13, 2006.
10) Zeng, H.-J., Chen, Z. and Ma, W.-Y., "A unified framework for clustering heterogeneous web objects", Proceedings of the 3rd International Conference on Web Information Systems Engineering, Singapore, pp. 161-172, December 2002.
A Survey on Topology for Bluetooth Based Personal Area Networks
Prof. Anuradha. V1, Dr. Sivaprakasam. P2
Abstract-Bluetooth is a proficient technology for short range between piconets on a time division basis, and, while wireless communication and networking, fundamentally used switching, they must re-synchronize with the current as a alternate for connected cables. Bluetooth is a Wireless piconet. An intended full duplex connection can be Personal Area Network (WPAN) technology, which enables established between the master and the slave by sending and devices to connect and communicate by means of short-range receiving the traffic alternatively. A master or a slave ad-hoc networks. Topology formation remains to be challenging problem in most of the Bluetooth based Wireless involved in the activity of more than one piconet can act as a Personal Area Networks (BT-WPANs). The problem of bridge allowing piconets to form a larger network, a so- topology creation in WPANs can be divided into two sub called scatternet. A slave is allowed to start transmission in a problems: the election of the nodes that have to act as master, given slot, if the master has addressed it in the preceding and the assignment of the slaves to the piconets. Topology slot. In Bluetooth technology, frequencyw hopping or time creation is the procedure of defining the piconets, and the division duplex (FH/TDD) is used for time division into interconnection of the nodes organized in the network area. 625- sec intervals, termed as slots. The master uses intra- Traffic load distribution and energy consumption by the nodes piconet scheduling algorithms to schedule the traffic within are the two major factors that are affected by improper a piconet. Inter-piconet schedulinge algorithms are used to topology design. Many researches have been conceded on topology study for Bluetooth WPANs. These researches decide schedule the existence of the bridges in diverse piconets [8]. 
to develop an efficient topology for BT-WPANs that may Abundant intra and inter-piconeti scheduling algorithms have consume less energy for communication between the master been proposed [5] [6] [7]. and the slave. This paper presents a survey on various network Network topology creation remains to be a most important topology distribution techniques for Bluetooth based WPANs. aspect in WPANs. Topology creation is the process of Additionally, as a part of future research, this paper also defining the piconets, and the interconnection of the nodes discusses some of the limitations of the available topologies and V deployed in the network area. Certainly, topology design has the probable solutions to over come the limitations an essential impact on the traffic load distribution within the Keywords- Bluetooth, Bridges, Topology, Wireless Personal WPAN, and on the nodes energy consumption. One of the Area Network (WPAN), Nodes, Master, Slave, Piconets, most demanding problems in deploying a BT-WPAN Scatternets, Slots, Frequency Hopping consists in forming a scatternet that meets the constraints I. INTRODUCTION y posed by the system specifications and the traffic requirements. This paper presents a survey on various n recent years, wireless ad-hoc networks have acquired network topology distribution techniques for Bluetooth I significant importance. Correspondingly, a greatl deal of based WPANs. Additionally, as a part of future research, attention is offered towards short range radio systems that this paper also discusses some of the limitations of the are operated using Bluetooth technology [1], [2] and IEEE available topologies and the probable solutions to over come 802.15 [3] Wireless Personal Area Networksr (WPAN). the limitations. Piconets form the fundamental architectural unit in WPANs. The remainder of this paper is organized as follows. 
Section Bluetooth is a Wireless Personal Area Network (WPAN) II of this paper provides an insight view on different technology, which enables a devices to connect and topologies for Bluetooth based wireless personal area communicate via short-range ad-hoc networks [4]. networks that were proposed earlier in literature. Section III Bluetooth WPANs (BT-WPANs) are characteristically used gives directions for future research. Section IV concludes to twist stand-alone devices located in the range of about 10 the paper with fewer discussions. m into networked E equipment. In general, a piconet comprises of a master device and a maximum of seven slave II. LITERATURE REVIEW devices. The slave devices are limited in operation as they Numerous researches have been carried on topology study are permitted to communicate only with their master device. for Bluetooth WPANs. These researches determine to Additionally, a piconet can have unlimited number of nodes, develop an efficient topology for BT-WPANs that may provided that they remain inactive. In other words, the consume less energy for communication between the master excess nodes will not participate in piconet transmissions. and the slave. This section of the paper provides a close A different frequency hopping sequence may be utilized by study on different topologies for Bluetooth based wireless each piconet. This frequency hopping sequence is normally personal area networks that were proposed earlier in derived from the master address. Because of the exercise of literature. different hopping sequences, a bridge cannot be active in An effective topology for Bluetooth scatternet was proposed more than one piconet at a time; thus, bridges have to switch by Huang et al. in [9]. Bluetooth is a capable technology for
Global Journal of Computer Science and Technology, Vol. 10 Issue 5 Ver. 1.0, July 2010, Page 97

Bluetooth is a capable technology for short-range wireless communication and networking, essentially used as a replacement for connecting cables. Since the Bluetooth specification only defines how to build a piconet, several solutions have been proposed in the literature to construct a scatternet from the piconets. A tree-shaped scatternet is called a bluetree. In their paper, they proposed an approach to generate the bluetree hierarchically; namely, the nodes are added into the bluetree level by level. This kind of Hierarchically Grown Bluetree (HGB) topology resolves the defects of the conventional bluetree. While growing, the HGB always remains balanced so as to maintain shorter routing paths. Besides, the links between siblings provide alternative paths for routing. As a result, the traffic load at parent nodes can be significantly relieved, and only two separate parts are induced if a parent node is lost. The Bluetooth network therefore achieves better reliability.

L. Huang et al. in [10] described the impact of topology on multi-hop Bluetooth personal area networks. Their paper concentrates on the impact of topology on Bluetooth personal area networks. They initially described some observations on performance degradations of a Bluetooth PAN due to network topologies, and then analyzed their causes. Based on their analysis, they described a lithe scatternet formation algorithm for multi-hop communication under a conference scenario. Using the proposed method, a scatternet can be formed flexibly with different topologies in a controlled way. In order to utilize topology information in multi-hop communication, they proposed a new link metric, the Load Metric (LM), instead of the number of hops. LM is derived from an estimation of a node's link bandwidth, which reflects the different roles of nodes in a Bluetooth scatternet. Furthermore, their proposal helped the routing protocol to bypass heavily loaded nodes and find routes with larger bandwidth. They presented some experimental results based on an implementation, which proved the effectiveness of their protocols.

Hsu et al. in [11] put forth a method of topology formation with the assistance of ns. Bluetooth is a promising technology in wireless applications, and many associated issues are yet to be explored both in academia and industry. Because of the complexity and the dynamics of computer networks, a good simulation tool plays an imperative role in the development stage. Of the existing simulation tools, ns is an accepted, open-source package that has considerable support for simulation of TCP, routing, and multicast protocols over wired and wireless networks. It also has BlueHoc as its extension for Bluetooth. Although BlueHoc offers many simulation functions for Bluetooth, all simulations must be done in a practically fixed topology. Hence, simulation of dynamic topology construction, the first and an important step in establishing a Bluetooth network, cannot be conducted. Besides, BlueHoc offers only restricted support for building a network. It also lacks flexibility in device control, in animated presentation, and in modeling mobility. The main contribution of their paper is therefore to enhance BlueHoc to support the aforementioned functions.

An optimal topology for Bluetooth was projected by Melodia et al. in [12]. Bluetooth is a hopeful technology for personal/local area wireless communications. A Bluetooth scatternet is composed of overlapping piconets, each with a low number of devices sharing the same radio channel. Their paper discusses the scatternet formation problem by analyzing topological characteristics of the scatternet formed. A matrix-based representation of the network topology is used to define metrics that are applied to estimate the key cost parameters and the scatternet performance. Numerical examples are presented and discussed, highlighting the impact of metric selection on scatternet performance. Then, a distributed algorithm for scatternet topology optimization is introduced that supports the formation of a locally optimal scatternet based on a selected metric. Numerical results obtained by adopting this distributed approach to optimize the network topology are shown to be close to the global optimum.

Lin et al. in [13] proposed the formation of a new BlueRing scatternet topology for Bluetooth WPANs. It is recommendable to have uncomplicated yet competent scatternet topologies with good support for routing protocols, considering that Bluetooth is to be used for personal area networks with design goals of simplicity and compactness. In the literature, even though many routing protocols have been proposed for mobile ad hoc networks, directly applying them poses a difficulty due to Bluetooth's special baseband and MAC-layer features. In their work, they proposed an attractive scatternet topology called BlueRing, which connects piconets as a ring interleaved by bridges between piconets, and addressed its formation, routing, and topology-maintenance protocols. The BlueRing architecture enjoys the following fine features. First, routing on a BlueRing is stateless in the sense that no routing information needs to be kept by any host once the ring is formed. This is favorable for environments such as Smart Homes where computing capability is limited. Second, the architecture scales easily to medium-size scatternets (e.g., around 50 ~ 70 Bluetooth units). In comparison, most star- or tree-like scatternet topologies can easily form a communication bottleneck at the root of the tree as the network enlarges. Third, maintaining a BlueRing is a trouble-free task as Bluetooth units join or leave the network. To endure single-point failure, they proposed a protocol-level solution mechanism. To tolerate multipoint failure, they proposed a recovery mechanism to reconnect the BlueRing. Graceful failure is tolerated as long as no two or more critical points fail at the same time. In addition, they also evaluated the ideal network throughput at different BlueRing sizes and configurations by mathematical analysis. Simulation results are presented which demonstrated that BlueRing outperforms other scatternet structures with higher network throughput and moderate packet delay.

A feasible topology formation algorithm for Bluetooth-based WPANs was presented by Chiasserini et al. in [14].
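The statelessness of ring routing, which BlueRing [13] exploits, can be illustrated in miniature: on a ring of n piconets a node can compute the next hop from ring positions alone, so no routing tables are needed once the ring is formed. The sketch below is our own simplification for illustration, not the authors' protocol; piconets are simply numbered 0..n-1 around the ring.

```python
def ring_route(src, dst, n):
    """Stateless next-hop routing on a ring of n piconets: forward
    clockwise or counter-clockwise, whichever direction is shorter.
    Returns the list of piconet positions visited after src."""
    if src == dst:
        return []
    cw = (dst - src) % n   # clockwise distance
    ccw = (src - dst) % n  # counter-clockwise distance
    step = 1 if cw <= ccw else -1
    path, cur = [], src
    while cur != dst:
        cur = (cur + step) % n
        path.append(cur)
    return path

# On a 6-piconet ring, 0 -> 4 goes counter-clockwise in two hops.
assert ring_route(0, 4, 6) == [5, 4]
```

Because each hop is recomputed from (src, dst, n) alone, a host needs no per-destination state, which is the property the BlueRing authors highlight for resource-limited environments.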
In their paper, Chiasserini et al. begin with the problem of topology formation in Bluetooth Wireless Personal Area Networks (BT-WPANs). They initially overviewed and extended a previously proposed centralized optimization approach and discussed its results. Then they outlined the main steps of two procedures that can lead to feasible distributed algorithms for the incremental construction of the topology of a BT-WPAN. The centralized optimization approach has the advantage of producing topologies that reduce the traffic load of the most congested node in the network while meeting the limitations on the BT-WPAN structure and capacity. On the other hand, the centralized nature and the high complexity of the optimization are a strong limitation of the proposed approach. Distributed algorithms for the topology formation of BT-WPANs are much more attractive, provided their algorithmic complexity and energy cost are sufficiently low to allow implementation in large BT-WPANs. Moreover, they discussed distributed procedures for the insertion and removal of a node in/from a BT-WPAN, which are easily implementable and able to trade off between system efficiency and the ability to rapidly recover from topology changes. These procedures are the key building blocks for a distributed solution approach to the BT-WPAN topology formation problem.

Roy et al. in [15] proposed a new topology construction technique for Bluetooth WPANs. They proposed a Bluetooth topology construction protocol that works in combination with a priority-based polling scheme. A master assigns a priority to its slaves, including bridges, for each polling cycle and then polls them as many times as the assigned priority. The slaves can spend their idle time either in a power-saving mode or executing new node discovery. The topology construction algorithm works in a bottom-up manner in which isolated nodes join to form small piconets. These small piconets can come together to form larger piconets. Larger piconets can establish bridge nodes to form a scatternet. Individual piconets can also discover new nodes while participating in the master-driven polling process. The shutting down of master and slave nodes is detected for dynamic restructuring of the scatternet. The protocol can handle situations where not all Bluetooth nodes are within radio range of each other.

Scatternet formation of Bluetooth wireless networks was projected by Zhen et al. in [16]. In their paper, a protocol stack for Bluetooth group ad hoc networks and a "blue-star island" network formation algorithm are proposed. The network formation is located within the Bluetooth Network Encapsulation Protocol (BNEP) layer and is underneath the routing protocol. The most important task of network formation is to establish and maintain the Bluetooth network topology with better performance and in a fast and economic way. The routing protocol generally finds the best routes over the existing network topology. The network formation communicates with the routing protocol and the management entity using a "Routing Trigger" mechanism. The "blue-star island" algorithm is a distributed two-stage scheme. First, a group of neighbor nodes self-organize into a "blue-star island," where the joint node is a slave in the scatternet. Then, initiated by a "Routing Trigger" from the routing protocol, blue-star islands are bridged together. The "Routing Trigger" can be a "Route REQuest" message or a "HELLO" message. The design makes no assumption on the number, distribution or mobility of nodes. In addition, they presented discussion and simulation results which showed that the proposed algorithm has lower formation latency and maintenance cost, and generates an efficient, good-quality topology for packet forwarding.

A self-routing topology for Bluetooth WPANs was put forth by Sun et al. in [17]. The emerging Bluetooth standard is considered to be the most promising technology to construct ad-hoc networks. It contains specifications on how to build a piconet but leaves out the details on how to automatically construct a scatternet from the piconets. Existing solutions only discussed the scatternet formation concern without considering the ease of routing in such a scatternet. They presented algorithms to embed b-trees into a scatternet, which enables such a network to become self-routing. It requires only a fixed-size message header and no routing table at each node, regardless of the size of the scatternet. These properties make their solution scalable to networks of large sizes. Their solutions are of distributed control and asynchronous. They also proved that their algorithm preserves the b-tree property when devices join or leave the scatternet and when one scatternet is merged into another.

Salonidis et al. in [18] proposed a distributed topology formation technique for Bluetooth personal area networks. In their paper they introduced and analyzed a randomized symmetric protocol that yields link establishment delay with predictable statistical properties. They then proposed the Bluetooth Topology Construction Protocol (BTCP), an asynchronous distributed protocol that extends the point-to-point symmetric mechanism to the case of several nodes. BTCP is based on a distributed leader election process where closeness information is discovered in a progressive sharing way and ultimately accumulated at an elected coordinator node. BTCP consists of three important phases: coordinator election; role determination; and connection establishment and leader election termination. Bluetooth link establishment is a two-step process that involves the Inquiry and Paging procedures. Leader election is an important tool for breaking symmetry in a distributed system. They have implemented BTCP on top of an existing prototype implementation that emulates the Bluetooth environment on a Linux platform.

A distributed Bluetooth scatternet formation method was presented by Chang et al. in [19]. They devised a distributed Bluetooth scatternet formation algorithm using the parking property. This parking mechanism allows the master to manage more than seven slaves in its piconet. When a master-slave pair is formed, the slave is immediately parked so that the master will not be restricted by already having seven active slaves. This method is effortless and valuable, and is well matched with the current Bluetooth specification. Since a straight line is the shortest way to connect two points in space, they named their algorithm Blueline to indicate that the communication path between two Bluetooth nodes is shorter compared to other scatternets. Their proposed scatternet formation algorithm allows two Bluetooth nodes to form a connection and communicate directly if they are within each other's transmission range.
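Stepping back to BTCP [18], the distributed coordinator election at its heart can be illustrated with a toy sketch: nodes meet pairwise, and the winner of each encounter absorbs the loser's accumulated membership information, so the last survivor is a coordinator that knows every node. The random pairing and the lower-id-wins tie-break below are our own simplifications for illustration, not the actual protocol, which builds on the randomized symmetric link-establishment mechanism described above.

```python
import random

def elect_coordinator(node_ids, seed=0):
    """Toy pairwise election: in each round two remaining contenders
    meet; the winner absorbs the loser's set of discovered nodes.
    Returns the coordinator and the node set it has accumulated."""
    rng = random.Random(seed)
    contenders = {n: {n} for n in node_ids}  # node -> nodes it knows
    while len(contenders) > 1:
        a, b = rng.sample(sorted(contenders), 2)
        winner, loser = (a, b) if a < b else (b, a)  # lower id wins (toy rule)
        contenders[winner] |= contenders.pop(loser)
    (leader, known), = contenders.items()
    return leader, known

leader, known = elect_coordinator(range(8))
assert leader == 0 and known == set(range(8))
```

The point of the sketch is the accumulation property: whoever survives has discovered the whole network, and can therefore assign master/slave/bridge roles centrally, as BTCP's later phases do.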
The important purpose of Blueline is to form a topology with the minimum number of hops for routes. One thing not described in the above algorithm is the switching policy of a bridge in the scatternet. In order to evaluate the performance of Blueline, they developed a Bluetooth extension to the VINT project network simulator.

Tekkalmaz et al. in [20] discussed the construction of energy-efficient Bluetooth scatternets. Bluetooth networks can be constructed as piconets or scatternets depending on the number of nodes in the network. Though piconet construction is a distinct process specified in the Bluetooth standards, scatternet formation policies and algorithms are not well specified. Among the many solution proposals for this problem, only a few focus on efficient usage of bandwidth in the resulting scatternets. In their paper, they proposed a distributed algorithm for the scatternet formation problem that dynamically constructs and maintains a scatternet based on estimated traffic flow rates between nodes. The algorithm is adaptive to changes and maintains a constructed scatternet for bandwidth efficiency when nodes come and go or when traffic flow rates change. Based on simulations, the paper also presented the improvements in bandwidth efficiency and the reduction in energy consumption provided by the proposed algorithm.

An algorithm for connected topologies in Bluetooth WPANs was described by Guerin et al. in [21]. They first described the fundamental characteristics of the Bluetooth technology that are relevant to topology formation. They formulated a mathematical model for the system objectives and constraints, as an initial step towards a systematic investigation of the connectivity issue. They mainly focused on designing a topology where a node's degree does not exceed 7. They presented a topology design procedure based on an approximation algorithm guaranteed to generate a spanning tree with degree at most one more than the minimum possible value in any arbitrary graph. The minimum weighted spanning tree algorithm does not give any analytical guarantee on the degrees of the nodes in the 3-dimensional case. Therefore they utilized the MST algorithm to form connected topologies for Bluetooth networks.

Marsan et al. in [22] projected an approach for optimal topology design in WPANs. In their paper, they deal with the master election and the assignment of the slaves to the piconets, while they do not address the election of the bridge nodes. They defined an objective function to be optimized in the course of the network topology design, which represents the above requirements on traffic load distribution and energy consumption at the network nodes. Then, they devised topology design algorithms for WPAN systems that both maximize the objective function and satisfy the constraints on the maximum number of active slaves allowed per piconet and on the maximum transmission range of the radio devices. They initially assumed that a centralized procedure can be performed, and they found the optimal set of masters as well as the optimal assignment of slaves to piconets. Then, maintaining the set of masters identified via the centralized algorithm, they developed a distributed assignment scheme which well approximates the performance of the centralized solution. Tabu search algorithms can be seen as an evolution of the classical local optimum search called Steepest Descent (SD). The approach they proposed to find the optimal network topology in a centralized manner relies completely on the use of the tabu search (TS) methodology. Numerical results showed that the distributed algorithm closely approximates the performance of the centralized solution for almost any number of nodes in the network area.

In order to optimize the topology in Bluetooth PANs, Marsan et al. proposed a method in [23]. Their optimization approach is based on a model derived from constraints that are specific to the BT-WPAN technology, but the level of abstraction of the model is such that it can be related to the more general field of ad hoc networking. By using a min-max formulation, they determined the optimal topology that provides full network connectivity, fulfills the traffic requirements and the constraints posed by the system specification, and minimizes the traffic load of the most congested node in the network, or equivalently its energy consumption. Results showed that a topology optimized for some traffic requirements is also remarkably robust to changes in the traffic pattern. Due to the problem complexity, the optimal solution is attained in a centralized manner. Although this implies severe limitations, a centralized solution can be applied whenever a network coordinator is elected, and it provides a useful term of comparison for any distributed heuristics.

III. FUTURE ENHANCEMENT

In recent years, wireless ad hoc networks have been a growing area of research. While there has been considerable research on the topic of routing in such networks, the topic of topology creation has received less attention. Bluetooth is a promising new wireless technology which enables portable devices to form short-range wireless ad hoc networks, and it is based on a frequency-hopping physical layer. However, network topology construction at present requires that devices are pairwise in range of each other. The issue of determining an optimal topology specifically for BT-WPANs is discussed in [18] but is not actually addressed there. The first attempt at finding a solution to the problem is represented by the work in [24]. Further research is needed to overcome this strong requirement while maintaining an easy construction process. In addition, it would be interesting to perform simulation studies in order to estimate the parameters of real schedules that yield a good tradeoff between achievable throughput, average path length and medium access delay caused by the scheduling. The mobility support of the algorithm is not discussed in [19]. Therefore, future work may take steps to make the algorithm support mobility by tuning the neighbor discovery time to infinity. Future studies may also aim to find a mathematical framework for Bluetooth scatternets, in order to allow the design of efficient scatternet topologies.
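As a concrete illustration of the min-max objective pursued in [22], [23], the toy sketch below scores candidate tree topologies by the load of their most congested node and keeps the best one. Exhaustive comparison of two hand-made trees stands in for the tabu search used in the cited work; the topologies, traffic demands and all names are invented for illustration only.

```python
def max_node_load(tree_edges, demands, n):
    """Load of the most congested node when each demand (s, t, rate)
    is routed along the unique tree path; every node on the path
    (endpoints included) carries the demand's rate."""
    adj = {v: set() for v in range(n)}
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)

    def path(s, t):
        # Depth-first search; the path in a tree is unique.
        stack, seen = [(s, [s])], {s}
        while stack:
            v, p = stack.pop()
            if v == t:
                return p
            for w in adj[v] - seen:
                seen.add(w)
                stack.append((w, p + [w]))

    load = [0.0] * n
    for s, t, rate in demands:
        for v in path(s, t):
            load[v] += rate
    return max(load)

def best_tree(candidate_trees, demands, n):
    # Min-max selection: pick the topology whose most congested
    # node carries the least traffic.
    return min(candidate_trees, key=lambda t: max_node_load(t, demands, n))

# Two 4-node topologies: a star centred at node 0 and a chain 0-1-2-3.
star  = [(0, 1), (0, 2), (0, 3)]
chain = [(0, 1), (1, 2), (2, 3)]
demands = [(0, 1, 1.0), (2, 3, 1.0), (0, 2, 1.0)]
assert best_tree([star, chain], demands, 4) == chain  # chain spreads load better here
```

For these demands the star funnels everything through node 0 (load 3.0), while the chain caps every node at 2.0, which is exactly the congestion-minimizing behaviour the min-max formulation rewards.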
IV. CONCLUSION

Wireless networks are implemented in a variety of real-time applications. Bluetooth is a capable technology in wireless applications, and many associated issues are yet to be explored both in academia and industry. The Bluetooth technology, which is used to interface devices within a short range, has been widely used in recent years. The communication between the connected devices takes place by means of a network, which has to be assigned a topology. Topology determination for Bluetooth-based WPANs is a serious problem in most applications. Topology creation rests on the election of a master and the assignment of the slaves to that particular elected master. A lot of techniques and methods have been proposed earlier in the literature for topology formation in Bluetooth wireless personal area networks. This paper presents a survey on various network topology distribution techniques for Bluetooth-based WPANs. Future work mainly focuses on developing an approach for topology creation that accounts for minimum energy consumption between the master and slave nodes. The development of such an approach should also consider the traffic load between the nodes.

V. REFERENCES

1) Haartsen, "The Bluetooth radio system," IEEE Personal Communications Magazine, pp. 28–36, February 2000.
2) "The Bluetooth core specification," 2001, http://www.bluetooth.com.
3) "IEEE 802.15 Working Group," 2001, http://www.ieee802.org/15/pub/TG2.html.
4) Bluetooth Special Interest Group, "Specification of the Bluetooth System – Version 2.0," Nov. 2004.
5) Baatz, M. Frank, C. Kuhl, P. Martini, and C. Scholz, "Bluetooth Scatternets: An Enhanced Adaptive Scheduling Scheme," in Proceedings of IEEE INFOCOM'02, pp. 782-790, 2002.
6) A. Capone, M. Gerla, and R. Kapoor, "Efficient Polling Schemes for Bluetooth Picocells," in Proceedings of IEEE ICC'01, vol. 7, pp. 1990-1994, 2001.
7) Har-Shai, R. Kofman, A. Segall, and G. Zussman, "Load Adaptive Inter-piconet Scheduling in Small-scale Bluetooth Scatternets," IEEE Communications Magazine, vol. 42, pp. 136–142, July 2004.
8) Gil Zussman, Adrian Segall and Uri Yechiali, "On the Analysis of the Bluetooth Time Division Duplex Mechanism," IEEE Transactions on Wireless Communications, vol. 6, no. 6, pp. 2149-2161, 2007.
9) Tsung-Chuan Huang, Chu-Sing Yang, Chao-Chieh Huang and Sheng-Wen Bai, "Hierarchical Grown Bluetrees (HGB): an effective topology for Bluetooth scatternets," International Journal of Computational Science and Engineering, vol. 2, no. 2, pp. 23-31, 2006.
10) Leping Huang, Hongyuan Chen, V. L. N. Sivakumar, Tsuyoshi Kashima and Kaoru Sezaki, "Impact of Topology on Multi-hop Bluetooth Personal Area Network," book chapter, Springer, pp. 131-138, 2004.
11) Chia-Jui Hsu and Yuh-Jzer Joung, "An ns-based Bluetooth Topology Construction Simulation Environment," Proceedings of the 36th Annual Symposium on Simulation, p. 145, 2003.
12) Tommaso Melodia and Francesca Cuomo, "Locally Optimal Scatternet Topologies for Bluetooth Ad Hoc Networks," book chapter in Wireless On-Demand Network Systems, Springer, pp. 19-24, 2004.
13) Ting-Yu Lin, Yu-Chee Tseng and Keng-Ming Chang, "A new BlueRing scatternet topology for Bluetooth with its formation, routing, and maintenance protocols," Research in Ad Hoc Networking, Smart Sensing and Pervasive Computing, vol. 3, no. 4, pp. 517-537, 2003.
14) Carla F. Chiasserini, Marco Ajmone Marsan, Elena Baralis and Paolo Garza, "Towards Feasible Topology Formation Algorithms for Bluetooth-based WPANs," 36th Annual Hawaii International Conference on System Sciences (HICSS'03), vol. 9, p. 313, 2003.
15) Rajarshi Roy, Mukesh Kumar, Navin K. Sharma and Shamik Sural, "Bottom-Up Construction of Bluetooth Topology under a Traffic-Aware Scheduling Scheme," IEEE Transactions on Mobile Computing, vol. 6, no. 1, pp. 72-86, January 2007.
16) Bin Zhen, Jonghun Park, and Yongsuk Kim, "Scatternet Formation of Bluetooth Ad Hoc Networks," 36th Annual Hawaii International Conference on System Sciences (HICSS'03), vol. 9, p. 312, 2003.
17) Min-Te Sun, Chung-Kuo Chang and Ten-Hwang Lai, "A Self-Routing Topology for Bluetooth Scatternets," International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN'02), p. 17, 2002.
18) Theodoros Salonidis and Leandros Tassiulas, "Distributed Topology Construction of Bluetooth Wireless Personal Area Networks," in Proceedings of IEEE INFOCOM, 2001.
19) Ruay-Shiung Chang and Ming-Te Chou, "Blueline: A Distributed Bluetooth Scatternet Formation and Routing Algorithm," Journal of Information Science and Engineering, vol. 21, pp. 479-494, 2005.
20) Metin Tekkalmaz, Hasan Sozer and Ibrahim Korpeoglu, "Distributed Construction and Maintenance of Bandwidth and Energy Efficient Bluetooth Scatternets," IEEE Transactions on Parallel and Distributed Systems, vol. 17, no. 9, pp. 963-974, 2006.
21) Guerin, J. Rank, S. Sarkar and E. Vergetis, "Forming Connected Topologies in Bluetooth Ad-
hoc Networks - An Algorithmic Perspective," 2003.
22) Ajmone Marsan, C. F. Chiasserini, and A. Nucci, "Optimal Topology Design in Wireless Personal Area Networks," www.cercom.polito.it/Publication/Pdf/114.pdf.
23) Marco Ajmone Marsan, Carla F. Chiasserini, Antonio Nucci, Giuliana Carello, and Luigi De Giovanni, "Optimizing the Topology of Bluetooth Wireless Personal Area Networks," 2005.
24) O. Miklos, A. Racz, Z. Turanyi, A. Valko, and P. Johansson, "Performance Aspects of Bluetooth Scatternet Formation," First Annual Workshop on Mobile and Ad Hoc Networking and Computing (MobiHoc), pp. 147–148, August 2000.
Identification of Most Desirable Parameters in SIGN Language Tools: A Comparative Study
Yousef Al-Ohali
Abstract- Symbolic languages have gained popularity for communication within communities of people with special needs. These languages are used to help individuals with special needs. Research exploring the possibilities of suggesting better communication symbols can be taken not only as a scientific achievement but also as a great service to humanity. This paper focuses on the identification and comparison of tools that have been developed to present words as symbols. The selection of tools has been made carefully, so that the competitors are of adequate standard to generate valuable results. The comparative study has focused on the most desirable parameters, e.g., 3D animation, video-based representation, sign editor, education tool, dictionary, text analysis and speech recognition. An ample number of tools are discussed and their merits and demerits explored. In light of the discussion, the choice of an appropriate tool can be made based on customized requirements.

I. INTRODUCTION

A sign language uses visual sign patterns to convey meanings by combining hand shapes, movements and orientations of the diversified shapes of the hands, arms and other associated parts of the body. Facial expressions are also used to fully express the thoughts of the speaker. Sign languages are basically developed to help the deaf understand a message without listening. Diversity in the expressions has been observed throughout the world, governed by culture, traditions, symbolic sign representation and inter-symbol sequencing. Hundreds of sign languages are being used throughout the world simultaneously and have been greatly admired by the deaf culture. Sign languages have been observed to exist since the 5th century BC. In 1620, Juan Pablo published "Reduction of letters and art for teaching mute people to speak" in Madrid, which is considered to be the first symbolic representation of words and phonetics enabling the deaf to learn and present themselves by using signs. Charles-Michel's work has been revolutionary in this domain and is used in France and North America until the present time. With the passage of time, the need to develop computerized systems that can help the deaf in conveying and understanding messages has increased. In the subsequent sections we discuss the available tools that can help in translating words into symbols, we also find the merits and demerits of each tool, and finally a tabular view is provided to summarize and facilitate the easy choice of a tool for translating words and punctuation into their symbolic equivalents.

______
About - Deanship of E-Transactions and Communication, King Saud University, Riyadh, Saudi Arabia ([email protected])

II. SIGN LANGUAGE TOOLS

In this section, we survey available tools that help designers and developers to develop new systems for sign language translation.

A. Vsign

Vsign [1] is a 3-D animation tool implemented in Macromedia Shockwave. It is sponsored by EMMA (European Media Masters of Art) 2001/2002. It models the sign animations by means of an editor. Vsign consists of two parts:

Vsign Builder: The builder is an editor that facilitates a way to set up the beginning, end and intermediate states of signs (Figure 1). It provides separate modeling for hands, body and arms. The animation is saved to text files (with the special file extension "gbr").

Vsign Player: This part facilitates playback of the animation file from a properly stored text file.

Vsign is a good tool that can be utilized to implement sign language translation, since it contains 3D capability along with a sign language editor. Fortunately, it does not have an extra hardware requirement. Furthermore, Vsign uses a simple file format to store animation information. However, this tool has some drawbacks. It does not have a user-friendly interface, for instance. In addition, it produces unrealistic (far from natural) 3D viewing.

Figure 1: Vsign Builder
B. The DePaul University American Sign Language Project

This is a large-scale and professional academic 3-D project that aims to translate English to American Sign Language (ASL) [2]. In order to improve the quality of the animation, the project emphasizes shadows and naturalness. Shadows and different light sources are implemented to make animations look normal. To achieve naturalness, every animation is repeated hundreds of times to detect and correct sharp/unrealistic movement transitions. Furthermore, the project aims towards comprehensible finger spelling. Thus, a translation from every letter to a proper sign is kept as a video file in AVI format. The produced animations seem descriptive and realistic. However, there are only a few sample animations on the website (Figure 2), which does not give a concrete idea about the educational aspect and user interface of the project.

Figure 2: The DePaul University American Sign Language example

C. Reading Power

Reading Power [3] is an educational software product for native signers focused on literacy and reading comprehension. The software includes storytelling, interactive conversation, and tools to build comprehension and vocabulary. Reading Power uses 3-D signing characters to unlock the power of reading and to add fun to the learning process. Reading Power also includes teacher support materials, activities, a starter dictionary and ideas for integrating technology into learning. Reading Power uses 3-D signing characters, which avoid the disadvantages of video-based applications. Reading Power has a big advantage with its 3-D virtual environment.

D. Ready Set Sign

Ready Set Sign (RSS) [4] is an online portal for teaching American Sign Language, but the main product is published and sold via CD. The portal has many lessons and many video clips for each lesson. The courses are organized as if they are intended to teach a foreign language. The site is easily understandable. Iconic explanations are widely used in the videos, which are helpful for reminding the user of the words. The site and the product use video clips to show motions and icons, but as the quality increases it gets harder to process and download those videos. The portal has many lessons that are educationally well organized. It includes some games that are simultaneously educational and entertaining, thus providing an enjoyable user experience.

E. eSIGN

The eSIGN [5] project aims to provide sign language on websites with small software installed on clients. It uses both 3-D animations and videos as the expression medium. It animates original BBC news simultaneously with a smart avatar near the news video. Moreover, eSIGN provides a user-friendly interface (Figure 3). The animations are created with an intelligent sign language editor. The animations are based on motion-capture data, and so they are more realistic than synthetic ones. Nevertheless, the hand shapes are not caught easily since the avatar is small.
Figure 3: eSIGN (a- provide sign language on websites, b- components: word processor, text to sign/speech converter, sign language translator) an English-ASL dictionary and the ASL playroom. The target audience is not limited to people who want to F. SignGenius learn new signs, but extends to those who look for fun along SignGenius [6] is a fast, interactive software package to the way. Interaction with the objects is provided in a user learn Sign Language developed by Moving Hand friendly and clever manner (Figure 5). This is digital video oriented tool, and is not a 3-D based environment Enterprises and accredited by DEAFSA (South African
National Council for the Deaf). It uses video clips to demonstrate sign language. SignGenius is composed of six sections (Figure 4): Tips: Overview of the basic hand shapes and movements that a user may need to know in order to use sign language correctly.
Tutor: 2197 video clips grouped into 65 categories.
Test: a test feature to assess the ability to associate the video clips with the correct words.
Score: a progress-measuring medium for parents, teachers and students.
Info: a comprehensive list of addresses of Deaf organizations, support groups, etc.
Game: a built-in Hangman game.
SignGenius is not an animation-based sign language tool, but it has a number of features such as an advanced search function, a user-friendly interface, and good categorization in the tutor. However, SignGenius has some shortcomings, e.g. low video quality and an insufficient educational perspective.
Figure 4: SignGenius
G. Personal Communicator
Personal Communicator [7] is a tool for learning and communicating in American Sign Language (ASL), developed by the Comm Tech Lab at MSU. Personal Communicator uses digital video and compression technology to present sign language features. It has four main parts (Figure 5).
Global Journal of Computer Science and Technology Vol. 10 Issue 5 Ver.1.0 July 2010 P a g e | 105
Figure 5: Personal Communicator (panels include the English-ASL Dictionary, the ASL Playroom, and the ASL Browser)

I. Auslan Tuition System
The Auslan Tuition System [9] is a 3-D animation of Australian Sign Language, created by the School of Computer Science & Software Engineering, University of Western Australia. It consists of two parts: the Auslan Tuition System and the Auslan Sign Editor. The tuition system is made up of several modes: a Tutorial mode that allows the user to select an Auslan phrase and learn the sign; a Finger-spelling mode where the user enters words that are then finger-spelled; a Dialogue mode in which two avatars sign a dialogue together, designed for learning phrases in conversation; and a Numbers mode used for number signing. The Auslan Sign Editor concentrates on building the signs, whereas the tuition part is the front end of the system and is used to display the constructed signs in a tutorial manner (Figure 7). Only the Auslan Tuition System is available for download from the web. The animation demos shown appear detailed and realistic.

H. ViSiCAST
ViSiCAST [8] is a project funded under the European Union Fifth Framework, part of the Information Society Technologies (IST) programme. It is a large project consisting of three main parts:
Multimedia and WWW Applications: enables authors of web pages to provide signed material as part of the page's content.
Face-to-Face Transactions: provides a basis for dialogue between customer and clerk, through the incorporation of available moving-image recognition technology to 'read' simple signs made by the deaf customer, which can then be translated into text or speech for the benefit of the clerk.
Television & Broadcast Transmission: concerns the provision of virtual human synthetic signing capabilities in the context of broadcast television, and has two related aspects: development of the necessary transmission technology, and incorporation of ViSiCAST work into the relevant broadcast standards.

Figure 6: ViSiCAST
Figure 7: Auslan Tuition System

J. Sign Smith Studio & Gesture Builder
The Sign Smith Studio authoring tool from Vcom3D allows individuals to rapidly create Signing Avatar scripts for creating sign-enabled content. Studio offers many powerful features for changing coordinated facial expression, eye gaze, role shifting and speech [10]. It contains over 2,500 ready-to-use signs in its dictionary. Sign Builder allows users of Studio to "spatially inflect" signs such as pronouns, verbs and classifiers. Sign Builder also allows users to create other signs that may not be part of Studio's core dictionary. These include:
Specialized technical and science vocabulary.
Signs which are standard in certain regions of the U.S.
Contextualized name signs for people and places.
Foreign sign languages such as British Sign Language (BSL), etc.
A key feature of this tool is Inverse Kinematics (IK) technology, which allows the user to focus on the hand position.
Once the user selects a hand shape and properly positions the hand, the IK software automatically places the joints of the wrist, elbow and shoulder in natural positions. These features give full creative power to the user. It is an easy tool to learn and use (Figure 8).

Figure 8: Sign Smith Studio

K. SiSi (Say It Sign It) System
SiSi is a 3-D animation tool developed by researchers at IBM [11]. SiSi translates spoken or written words into British Sign Language (BSL). In the case of spoken words, SiSi first translates the words to text and then to 3-D animations (Figure 9). The system is useful in many situations where there is no sign language interpreter, such as radio, telephone calls, and some television shows.

Figure 9: The 3D character of the SiSi system

L. Tawasoul
Tawasoul [12] is a research project conducted by the Computer Sciences Department at King Saud University. It was developed as an Arabic Sign Language (ARSL) educational tool for hearing-impaired children, their parents, and others who are interested in learning ARSL. The system comprises four key features: 3-D animations of ARSL expressions such as hand signs and mouth and eye expressions; morphological analyzers which analyze the entered Arabic text to show the related ARSL animation; a categorized ARSL vocabulary dictionary; and a sign language text editor (Figure 10). It consists of three parts:
Translator: allows users to enter an Arabic text and view the Arabic signs that are related to the entered text.
Dictionary: the dictionary of Tawasoul is a basic vocabulary guide for users who want to learn Arabic Sign Language. It consists of a number of categories; each one contains a related group of words.
Finger Spelling: can be utilized as a sign language editor to help users write documents in sign alphabetic letters by converting the entered Arabic text to sign language text.

Figure 10: Tawasoul

M. 3D-Sign
3D-Sign is a Malaysian sign language project [13]. It aims to develop a package to assist the learning of Malaysian sign language in 3-D format using 3D Poser Artist 4.0, which allows creating animations using 3-D characters; the interface is easy to use (Figure 11). The package consists of the following functions:
One of three human characters can be selected: male, female or child.
Learners can select between different levels: beginner, intermediate and advanced.
The 3-D animation enables learners to view hand/finger signing from different angles.
Different ways of learning, such as chatting and puzzle games.

Figure 11: 3D-Sign
N. Sign to me
Simon Harvey developed a British Sign Language tool and introduced the 'Sign to me' tool [14]. It provides videos of everyday signs aimed at adults, and at children of reading and pre-reading age who have difficulties. It consists of several functions:
Find a Sign (Alphabetical Dictionary): the user writes a word or phrase and a video demonstrates the corresponding sign (Figure 12).
Picture Signs (Picture Dictionary): each sign is represented by a symbol in clear categories; when the cursor is rolled over a symbol, a video clip of the sign for that symbol appears and the word is also spoken.
Games: a video clip of a sign is shown, and the player chooses the correct picture that represents that sign.
The main advantages are the ease of use and the colorful symbols, which make it an attractive way to learn.

O. Hand Speak
Hand Speak [15] is an American Sign Language (ASL) site that produced an online dictionary, grammar, storytelling and poetry, a manual alphabet (finger spelling), manual numerals, tutorials, and articles. Hand Speak's constantly growing dictionary contains ASL words. All images after September 2007 are video-based; the rest of the older images are GIF animations (which will be replaced continually). In spelling lessons, a teacher can vocally speak a word and the child fingerspells it (Figure 13).
Figure 12: Sign to me
Figure 13: Hand Speak

P. Comparison of various tools
Examining these fifteen notable products gives a good overview of the technology solutions in this domain. Table 1 presents all fifteen products along with the main features of each one.
Tool/Feature | 3D Animation | Video Based | Sign Editor | Education Tool | Dictionary | Text Analyzer | Speech Recognition
Vsign | Yes | No | Yes | No | No | No | No
ASL Project | Yes | No | Yes | No | Yes | No | No
eSIGN | Yes | Yes | Yes | Yes | Yes | Yes | No
Reading Power | Yes | No | No | Yes | Yes | No | No
SignGenius | No | Yes | No | No | Yes | No | No
Ready Set Sign | No | Yes | No | Yes | Yes | No | Yes
ViSiCAST | Yes | No | No | No | No | Yes | Yes
Personal Communicator | No | Yes | No | Yes | Yes | Yes | No
Auslan Tuition | Yes | No | Yes | Yes | Yes | No | No
Sign Smith Studio | Yes | No | Yes | Yes | Yes | Yes | No
SiSi | Yes | No | No | Yes | No | Yes | Yes
Tawasoul | Yes | No | No | Yes | Yes | Yes | No
3D-Sign | Yes | Yes | No | Yes | Yes | No | No
Sign to me | No | Yes | No | Yes | Yes | No | No
Hand Speak | No | Yes | No | No | Yes | No | No
Table 1: Comparison of sign language products

III. CONCLUSION
This paper highlights certain parameters that are essential for evaluating the effectiveness of sign language tools. The parameters include, but are not limited to, 3-D animation, video-based features, a sign editor, an education tool, a dictionary, a text analyzer and a speech recognition component. After a comprehensive discussion of the different sign language tools based on the mentioned parameters, it has been observed that none of the existing tools meets all the parameters. Comparing all the available tools against the known parameters, we can identify the extent to which each tool supports these essential features. 'VSign', 'SignGenius' and 'Hand Speak' support only two features each; 'Ready Set Sign', 'Auslan Tuition' and a few others support three or four features. 'Sign Smith Studio' supports five of the major features required in a sign language tool. The SiSi tool has support for animation, text analysis and speech recognition, but lacks valuable features like video-based content and a sign editor. Tawasoul also lacks the features that SiSi lacks. '3D-Sign' and 'Sign to me' are strongly video-based but lack features like animation, an editor, a text analyzer and a speech analyzer. 'Hand Speak' likewise does not support 3-D animation, a sign editor, an education tool, or text and speech analysis. The results of the study support that the eSIGN tool provides the best functionality with respect to the features used for evaluation.

IV. REFERENCES
1) Vsign, Vsign Project, http://www.vsign.nl/EN/vsignEN.htm
2) DePaul ASL Synthesizer, The DePaul University American Sign Language Project, http://asl.cs.depaul.edu
3) Reading Power, http://voisales.com/items/sign-language-software/vcom3d/reading-power-detail.html
4) Ready! Set! Sign!, http://www.readysetsign.com/index2.html
5) eSIGN at UEA, eSign, http://www.visicast.cmp.uea.ac.uk/eSIGN/Introduction.html
6) Sign Language Software by SignGenius, http://www.signgenius.com/
7) Comm Tech Lab @ MSU, Personal Communicator, http://commtechlab.msu.edu/index.php
8) ViSiCAST Project, http://www.visicast.sys.uea.ac.uk/Publications.html
9) Auslan Tuition System, http://auslantuition.csse.uwa.edu.au/
10) Vcom3D, Sign Smith Studio & Gesture Builder, http://www.vcom3d.com/
11) IBM Recruitment, SiSi software, http://www-05.ibm.com/employment/uk/extreme-blue/cool-projects/sisi.html
12) Tawasoul, http://tawasoul4arsl.ksu.edu.sa/
13) DSpace@UM, 3D-Sign, http://dspace.fsktm.um.edu.my/handle/1812/200
14) British Sign Language, Sign to me, http://www.britishsign.co.uk/product_info.php?pName=sign-to-me-cdrom
15) Handspeak, http://www.handspeak.com/tour
16) Ahmad Mukhtar Omer, Muhammad Hamassa Abdul Latif, Mustafa Zahran, "Basic Language", Bachelor's degree thesis, Feb. 2000.
17) Aallissan, Samia, Maha Al-Rabiah and Faten Al-Qahtani, "A tool for analysis of the Arabic sentence," graduation project for a bachelor's degree, Riyadh, Feb. 2000.
18) Francik, J. and P. Fabian, "Animating Sign Language in the Real Time," 20th IASTED International Multi-Conference Applied Informatics, Innsbruck, Austria, pp. 276-281, 2002.
19) Holden, E. J. and G. G. Roy, "The Graphical Translation of English Text into Signed English in the Hand Sign Translator System," Computer Graphics Forum (Eurographics'92), vol. 11, no. 3, pp. C357-C366, 1992.
A Low-Overhead Minimum Process Coordinated Checkpointing Algorithm For Mobile Distributed System
Parveen Kumar1, Poonam Gahlan2
Abstract- A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. A mobile computing system is a distributed system where some of the processes run on mobile hosts (MHs), whose location in the network changes with time. The number of processes that take checkpoints is minimized to 1) avoid awakening MHs in doze mode of operation, 2) minimize thrashing of MHs with checkpointing activity, and 3) save the limited battery life of MHs and the low bandwidth of wireless channels. In minimum-process checkpointing protocols, some useless checkpoints are taken or blocking of processes takes place. In this paper, we propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with others.
Keywords- Checkpointing algorithms; parallel & distributed computing; rollback recovery; fault-tolerant systems; mobile computing

About-1: Department of Computer Science & Engineering, Meerut Institute of Engineering & Technology, Meerut, India, 250005 (e-mail: [email protected])
About-2: Department of Computer Sc & Engg, Singhania University, Pacheri Bari (Rajasthan), India (e-mail: [email protected])

I. INTRODUCTION
Parallel computing with clusters of workstations is being used extensively, as such clusters are cost-effective and scalable and are able to meet the demands of high-performance computing. An increase in the number of components in such systems increases the failure probability. It is thus necessary to examine both hardware and software solutions to ensure fault tolerance of such parallel computers. To provide fault tolerance, it is essential to understand the nature of the faults that occur in these systems. There are mainly two kinds of faults: permanent and transient. Permanent faults are caused by permanent damage to one or more components, and transient faults are caused by changes in environmental conditions. Permanent faults can be rectified by repair or replacement of components. Transient faults remain for a short duration of time and are difficult to detect and deal with. Hence it is necessary to provide fault tolerance particularly for transient failures in parallel computers. Fault-tolerant techniques enable a system to perform tasks in the presence of faults. It is easier and more cost-effective to provide software fault tolerance solutions than hardware solutions to cope with transient failures [25].

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. With the widespread proliferation of the Internet and the emerging global village, the notion of distributed computing systems as a useful and widely deployed tool is becoming a reality [24]. A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network and having the following features [25]:
No common physical clock: this is an important assumption because it introduces the element of "distribution" in the system and gives rise to the inherent asynchrony amongst the processors.
No shared memory: this is a key feature that requires message-passing for communication. It may be noted that a distributed system may still provide the abstraction of a common address space via the distributed shared memory abstraction.
Geographical separation: it is not necessary for the processors to be on a wide-area network (WAN). Recently, the network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN is also being increasingly regarded as a small distributed system. This NOW configuration is becoming popular because of the low-cost, high-speed off-the-shelf processors now available. The Google search engine is based on the NOW architecture.
Autonomy and heterogeneity: the processors are "loosely coupled" in that they have different speeds and each can be running a different operating system. They are usually not part of a dedicated system, but cooperate with one another by offering services or solving a problem [25].

A local checkpoint is the saved state of a process at a processor at a given instant. A global checkpoint is a collection of local checkpoints, one from each process. A global state is said to be "consistent" if it contains no orphan message, i.e., a message whose receive event is recorded but whose send event is lost. To recover from a failure, the system restarts its execution from a previous consistent global state saved on stable storage during fault-free execution. In distributed systems, checkpointing can be independent, coordinated or quasi-synchronous. Message logging is also used for fault tolerance in distributed systems [14].
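The consistency condition just defined lends itself to a direct check. The sketch below is ours, not from the paper; the function and field names ('sent', 'received') are illustrative. It flags a global checkpoint as inconsistent when it records a receive event whose matching send is missing:

```python
# Sketch (not from the paper): a recorded global state is consistent iff it
# contains no orphan message, i.e. no message whose receive event is
# recorded while its send event is lost.

def is_consistent(global_checkpoint):
    """global_checkpoint: dict mapping process id ->
    {'sent': set of message ids, 'received': set of message ids}."""
    recorded_sends = set()
    for local_state in global_checkpoint.values():
        recorded_sends |= local_state['sent']
    # An orphan message is received in the recorded state but never sent in it.
    return all(m in recorded_sends
               for local_state in global_checkpoint.values()
               for m in local_state['received'])

consistent = {'P0': {'sent': {'m1'}, 'received': set()},
              'P1': {'sent': set(), 'received': {'m1'}}}
orphaned   = {'P0': {'sent': set(), 'received': set()},
              'P1': {'sent': set(), 'received': {'m2'}}}  # m2's send is lost
print(is_consistent(consistent), is_consistent(orphaned))  # True False
```

Recovery then amounts to rolling each process back to its entry in the most recent global checkpoint that passes this test.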
Most of the existing coordinated checkpointing algorithms [9, 19] rely on the two-phase protocol and save two kinds of checkpoints on stable storage: tentative and permanent. In the first phase, the initiator process takes a tentative checkpoint and requests all or selected processes to take their tentative checkpoints. If all processes are asked to take checkpoints, the approach is called all-process coordinated checkpointing [5, 7, 19]. Alternatively, if only selected communicating processes are required to take checkpoints, it is called minimum-process checkpointing. Each process informs the initiator whether it succeeded in taking a tentative checkpoint. If the initiator learns that all concerned processes have successfully taken their tentative checkpoints, the algorithm enters the second phase and the initiator asks the relevant processes to make their tentative checkpoints permanent. Alternatively, if some process fails to take its tentative checkpoint in the first phase, the initiator requests all processes to abort their tentative checkpoints.

In order to record a consistent global checkpoint, when a process takes a checkpoint it asks (by sending checkpoint requests) all relevant processes to take checkpoints. Therefore, coordinated checkpointing suffers from the high overhead associated with the checkpointing process [20], [21], [22], [23]. Much of the previous work [2, 3, 4, 20, 21, 22, 23] in coordinated checkpointing has focused on minimizing the number of synchronization messages and the number of checkpoints taken during the checkpointing process. However, some algorithms (called blocking algorithms) force all relevant processes in the system to block their computations during the checkpointing process [3, 9, 21, 22, 23]. Checkpointing includes the time to trace the dependency tree and to save the states of processes on stable storage, which may be long. Moreover, in mobile computing systems, due to the mobility of MHs, a message may be routed several times before reaching its destination. Therefore, blocking algorithms may dramatically reduce the performance of these systems [7]. Recently, non-blocking algorithms [7, 19] have received considerable attention. In these algorithms, processes need not block during checkpointing; a checkpointing sequence number is used to identify orphan messages. However, these algorithms [4, 10] require all processes in the system to take checkpoints during checkpointing, even though many of them may not be necessary.

A mobile computing system is a distributed system where some of the processes run on mobile hosts (MHs), whose location in the network changes with time. To communicate with MHs, mobile support stations (MSSs) act as access points for the MHs over wireless networks. Features that make traditional checkpointing algorithms for distributed systems unsuitable for mobile computing systems are: locating processes that have to take their checkpoints, energy consumption constraints, lack of stable storage on MHs, and low bandwidth for communication with MHs [1]. Minimum-process coordinated checkpointing is an attractive approach for transparently adding fault tolerance to distributed applications, since it avoids the domino effect, minimizes the stable storage requirement and forces only interacting processes to checkpoint.

In coordinated or synchronous checkpointing, processes coordinate their local checkpointing actions such that the set of the most recent checkpoints in the system is guaranteed to be consistent [add reference list……]. In case of a fault, every process restarts from its most recent permanent/committed checkpoint. Hence, this approach simplifies recovery and does not suffer from the domino effect. Furthermore, coordinated checkpointing requires each process to maintain only one permanent checkpoint on stable storage, reducing storage overhead and eliminating the need for garbage collection. Its main disadvantage is the large latency involved in output commit.

A straightforward approach to coordinated checkpointing is to block communications while the checkpointing process executes. A coordinator takes a checkpoint and broadcasts a request message to all processes, asking them to take a checkpoint. When a process receives this message, it stops its execution, flushes all its communication channels, takes a tentative checkpoint, and sends an acknowledgement message back to the coordinator. After the coordinator receives acknowledgements from all processes, it broadcasts a commit message that completes the two-phase checkpointing protocol. After receiving the commit message, each process removes its old permanent checkpoint and makes the tentative checkpoint permanent. The process is then free to resume execution and exchange messages with other processes. Coordinated checkpointing algorithms can also be classified into two categories: minimum-process and all-process algorithms.

The Prakash-Singhal algorithm [13] forces only a minimum number of processes to take checkpoints and does not block the underlying computation during checkpointing. However, it was proved that their algorithm may result in an inconsistency [3]. Cao and Singhal [4] achieved non-intrusiveness in the minimum-process algorithm by introducing the concept of mutable checkpoints. The number of useless checkpoints in [4] may be exceedingly high in some situations [16]. Kumar et al. [16] and Kumar et al. [11] reduced the height of the checkpointing tree and the number of useless checkpoints while keeping non-intrusiveness intact, at the extra cost of maintaining and collecting dependency vectors, computing the minimum set and broadcasting it on the static network along with the checkpoint request. Some minimum-process blocking algorithms have also been proposed in the literature [3, 9, 21, 23].

In this paper, we propose an efficient checkpointing algorithm for mobile computing systems that forces only a minimum number of processes to take checkpoints. An effort has been made to minimize the blocking of processes and the synchronization message overhead. We capture the partial transitive dependencies during normal execution by piggybacking dependency vectors onto computation messages. Z-dependencies are well taken care of in this protocol. In order to reduce the message overhead, we also avoid collecting the dependency vectors of all processes to find the minimum set as in [3], [11], [21].
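The blocking two-phase scheme described above can be condensed into a minimal single-threaded simulation. This sketch is ours, not the paper's protocol: class and method names are illustrative, and a real implementation would use message passing, channel flushing, and stable storage rather than in-memory fields.

```python
# Minimal simulation (ours) of blocking two-phase coordinated checkpointing:
# phase 1 collects tentative checkpoints, phase 2 commits them (or aborts all).

class Process:
    def __init__(self, pid, can_checkpoint=True):
        self.pid = pid
        self.state = 0            # stand-in for application state
        self.tentative = None
        self.permanent = None
        self.can_checkpoint = can_checkpoint

    def take_tentative(self):
        if not self.can_checkpoint:
            return False          # e.g. a host that failed or is unreachable
        self.tentative = self.state
        return True

def coordinate(processes):
    # Phase 1: request tentative checkpoints and collect acknowledgements.
    acks = [p.take_tentative() for p in processes]
    if all(acks):
        # Phase 2: commit - every tentative checkpoint becomes permanent.
        for p in processes:
            p.permanent, p.tentative = p.tentative, None
        return 'committed'
    # Any negative acknowledgement aborts all tentative checkpoints.
    for p in processes:
        p.tentative = None
    return 'aborted'

print(coordinate([Process(0), Process(1)]))                        # committed
print(coordinate([Process(0), Process(1, can_checkpoint=False)]))  # aborted
```

The cost this sketch hides, and which the proposed algorithm attacks, is that every process blocks between phase 1 and phase 2.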
We also try to minimize the loss of checkpointing effort when any process fails to take its checkpoint.

II. PROPOSED CHECKPOINTING ALGORITHM
Our system model is similar to [4, 21]. We propose to handle node mobility and failures during checkpointing as proposed in [21].

A. The Proposed Algorithm
First phase: When a process, say Pi, running on an MH, say MHi, initiates checkpointing, it sends a checkpoint initiation request to its local MSS, which becomes the proxy MSS (if the initiator runs on an MSS, then that MSS is the proxy MSS). The proxy MSS maintains the dependency vector of Pi, say Ri. On the basis of Ri, the set of dependent processes of Pi, say Sminset, is formed. The proxy MSS broadcasts ckpt(Sminset) to all MSSs. When an MSS receives the ckpt(Sminset) message, it checks whether any processes in Sminset are in its cell. If so, the MSS sends a mutable checkpoint request message to them. Any process receiving a mutable checkpoint request takes a mutable checkpoint and sends a response to its local MSS. After an MSS has received all the response messages from the processes to which it sent mutable checkpoint requests, it sends a response to the proxy MSS. It should be noted that in the first phase all relevant processes take mutable checkpoints. For a process running on a static host, a mutable checkpoint is equivalent to a tentative checkpoint. But for an MH, a mutable checkpoint is different from a tentative one: in order to take a tentative checkpoint, an MH has to record its local state and transfer it to its local MSS, whereas a mutable checkpoint is stored on the local disk of the MH. It should be noted that the effort of taking a mutable checkpoint is very small compared to a tentative one [4]. For a disconnected MH that is a member of the minimum set, the MSS that holds its disconnected checkpoint considers that checkpoint the required one.

Second phase: After the proxy MSS has received the response from every MSS, the algorithm enters the second phase. If the proxy MSS learns that all relevant processes have taken their mutable checkpoints successfully, it asks them to convert their mutable checkpoints into tentative ones and also sends the exact minimum set along with this request. Alternatively, if the initiator MSS comes to know that some process has failed to take its checkpoint in the first phase, it issues an abort request to all MSSs. In this way the MHs need to abort only the mutable checkpoints, not the tentative ones, which reduces the loss of checkpointing effort if the algorithm is aborted in the first phase. When an MSS receives the tentative checkpoint request, it asks all processes in the minimum set that are running in its cell to convert their mutable checkpoints into tentative ones. When an MSS learns that all relevant processes in its cell have taken their tentative checkpoints successfully, it sends a response to the proxy MSS.

Third phase: Finally, when the proxy MSS learns that all processes in the minimum set have taken their tentative checkpoints successfully, it issues a commit request to all MSSs. When a process in the minimum set gets the commit request, it converts its tentative checkpoint into a permanent one and discards its earlier permanent checkpoint, if any.

B. Message Handling During Checkpointing
When a process takes its mutable checkpoint, it does not send any message until it receives the tentative checkpoint request. Suppose Pi sends m to Pj after taking its mutable checkpoint, and Pj has not taken its mutable checkpoint at the time of receiving m. In this case, if Pj takes its mutable checkpoint after processing m, then m becomes orphan. Therefore, we do not allow Pi to send any message unless and until every process in the minimum set has taken its mutable checkpoint in the first phase. Pi can send messages once it receives the tentative checkpoint request, because at that moment every concerned process has taken its mutable checkpoint and m cannot become orphan. The messages to be sent are buffered at the sender's end. During this period, a process is allowed to continue its normal computation and to receive messages.

Suppose Pj gets the mutable checkpoint request at MSSp, and there is some process Pk such that Pk does not belong to Sminset but Pk belongs to Rj[]. In this case, Pk is also included in the minimum set, and Pj sends a mutable checkpoint request to Pk. It should be noted that Sminset, computed on the basis of the dependency vector of the initiator process, is only a subset of the minimum set. Due to zigzag dependencies, the initiator process may be transitively dependent upon some processes that are not included in Sminset.

C. An Example
The proposed algorithm can be better understood through the example shown in Figure 2. There are six processes (P0 to P5), denoted by straight lines. Each process is assumed to have an initial permanent checkpoint with csn equal to "0". Cix denotes the xth checkpoint of Pi. The initial dependency vectors of P0, P1, P2, P3, P4, P5 are [000001], [000010], [000100], [001000], [010000], and [100000], respectively. The dependency vectors are maintained as explained in Section 2.1.

P0 sends m2 to P1 along with its dependency vector [000001]. When P1 receives m2, it computes its dependency vector by taking the bitwise logical OR of the dependency vectors of P0 and P1, which comes out to be [000011]. Similarly, P2 updates its dependency vector on receiving m3, and it comes out to be [000111]. At time t1, P2 initiates the checkpointing algorithm with dependency vector [000111]. At time t1, P2 finds that it is transitively dependent upon P0 and P1. Therefore, P2 computes the tentative minimum set [Sminset = {P0, P1, P2}]. P2 sends the mutable checkpoint request to P1 and P0 and takes its own mutable checkpoint C21. For an MH, the mutable checkpoint is stored on the disk of the MH. It should be noted that Sminset is only a subset of the minimum set. When P1 takes its mutable checkpoint C11, it finds that it is dependent upon P3 due to m4, but P3 is not a member of Sminset; therefore, P1 sends a mutable checkpoint request to P3. Consequently, P3 takes its mutable checkpoint C31.

After taking its mutable checkpoint C21, P2 generates m8 for P3. As P2 has already taken its mutable checkpoint for the current initiation and has not yet received the tentative checkpoint request from the initiator, P2 buffers m8 on its local disk. We define this duration as the uncertainty period of a process, during which a process is not allowed to send any message. The messages generated for sending are buffered on the local disk of the sender process. P2 can send m8 only after getting the tentative checkpoint request or an abort message from the initiator process. Similarly, after taking its mutable checkpoint, P0 buffers m10 for its uncertainty period. It should be noted that P1 receives m10 only after taking its mutable checkpoint. Similarly, P3 receives m8 only after taking its mutable checkpoint C31. A process can receive all messages during its uncertainty period; for example, P3 receives m11. A process is also allowed to perform its normal computations during its uncertainty period.

At time t2, P2 receives responses to the mutable checkpoint requests from all processes in the minimum set (not shown in Figure 2) and finds that they have taken their mutable checkpoints successfully; therefore, P2 issues the tentative checkpoint request to all processes. On getting the tentative checkpoint request, the processes in the minimum set [P0, P1, P2, P3] convert their mutable checkpoints into tentative ones and send responses to the initiator process P2; these processes also send the messages buffered on their local disks to the destination processes. For example, P0 sends m10 to P1 after getting the tentative checkpoint request [not shown in the figure]. Similarly, P2 sends m8 to P3 after getting the tentative checkpoint request. At time t3, P2 receives responses from the processes in the minimum set [not shown in the figure] and finds that they have taken their tentative checkpoints successfully; therefore, P2 issues the commit request to all processes. A process in the minimum set converts its tentative checkpoint into a permanent checkpoint and discards its old permanent checkpoint, if any.

D. Correctness Proof
We show that the global state collected by the proposed protocol is consistent, proving the result by contradiction. Suppose there is some orphan message in the recorded global state. We explore the different possibilities with the help of Figure 2. Suppose P0 sends m10 after taking its mutable checkpoint and P1 receives m10 before taking its mutable checkpoint. This situation is not possible because, after taking its mutable checkpoint, P0 enters its uncertainty period and cannot send any message unless and until it receives the tentative checkpoint request. P2 can issue the tentative checkpoint request only after confirming that every concerned process (including P1) has taken its mutable checkpoint. Hence P1 cannot receive m10 before taking its mutable checkpoint C11. Suppose P5 sends m13 to P3 after C50 and P3 gets m13 before C31 (not shown in Figure 2). In this case, when P3 takes its mutable checkpoint C31, it will find that P5 does not belong to Sminset and that P3 is dependent upon P5; therefore, P3 will send a mutable checkpoint request to P5, and send(m13) will also be included in the global state. The other possibilities can be proved similarly [21].
to Sminset and P3 is dependent upon P5; therefore, P3 will send a mutable checkpoint request to P5, and the send event of m13 will also be included in the global state. The other possibilities can be proved similarly [21].

III. COMPARATIVE ANALYSIS OF THE PROPOSED ALGORITHM WITH OTHER ALGORITHMS

We use the following notations to compare our algorithm with other algorithms. We suppose that all the processes are running on MHs.

Nmss: number of MSSs.
Nmh: number of MHs.
Cpp: cost of sending a message from one process to another.
Cst: cost of sending a message between any two MSSs.
Cwl: cost of sending a message from an MH to its local MSS (or vice versa).
Cbroadcast: cost of broadcasting a message over the static network.
Csearch: cost incurred to locate an MH and forward a message to its current local MSS from a source MSS.
Tst: average message delay in the static network.
Twl: average message delay in the wireless network.
Tch: average delay to save a checkpoint on the stable storage; it also includes the time to transfer the checkpoint from an MH to its local MSS.
N: total number of processes.
Nmin: number of minimum processes required to take checkpoints.
Nmut: number of useless mutable checkpoints [4].
Tsearch: average delay incurred to locate an MH and forward a message to its current local MSS.
Nucr: average number of useless checkpoint requests in [4].
Ndep: average number of processes on which a process depends.
h1: height of the checkpointing tree in the Koo-Toueg algorithm [9].
h2: height of the checkpointing tree in the proposed algorithm.

IV. MESSAGE OVERHEAD OF THE PROPOSED ALGORITHM

A. Message overhead in the first phase
The initiator process sends the mutable checkpoint request to its local MSS (say MSSin) and gets a response from MSSin: 2Cwl.
MSSin broadcasts the mutable checkpoint request over the static network: Cbroadcast.
All the processes in the minimum set get the mutable checkpoint request from their local MSS and send responses to the local MSS: 2*Nmin*Cwl.
Every MSS sends a response to MSSin: Nmss*Cst.

B. Message overhead in the second phase
MSSin broadcasts the tentative checkpoint request over the static network: Cbroadcast.
Every process in the minimum set receives the tentative checkpoint request and sends a response to its local MSS: 2*Nmin*Cwl.
Every MSS sends a response to MSSin: Nmss*Cst.

C. Message overhead in the third phase
MSSin broadcasts the commit request over the static network: Cbroadcast.

Total average message overhead: 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst.

Our algorithm is a three-phase algorithm; therefore, it suffers from an extra message overhead of Cbroadcast + 4*Nmin*Cwl. By doing so, we are able to reduce the loss of checkpointing effort in case the checkpointing procedure aborts in the first phase. In other algorithms [2, 3, 4, 9], in case of an abort in the first phase, all concerned processes are forced to abort their tentative checkpoints, whereas in the proposed scheme all relevant processes abort their mutable checkpoints only. The effort of taking a mutable checkpoint is negligible as compared to a tentative one in a mobile distributed system [4]. Frequent aborts of checkpointing algorithms, due to exhausted battery power, abrupt disconnections, etc., may significantly increase the checkpointing overhead in two-phase algorithms [2, 3, 4, 9]. We try to minimize this by designing a three-phase algorithm. In our algorithm, only a minimum number of processes is required to take checkpoints.

The blocking time of the Koo-Toueg [11] protocol is highest, followed by the Cao-Singhal [4] algorithm. We claim that the blocking time in the proposed scheme will be significantly smaller as compared to the KT algorithm [9], because in algorithm [9] transitive dependencies are collected through direct dependencies. The checkpoint initiator process, say Pin, sends the checkpoint request to any process Pi if Pin is causally dependent upon Pi. Similarly, Pi sends the checkpoint request to any process Pj if Pi is causally dependent upon Pj. In this way, a checkpointing tree is formed. In the proposed algorithm, transitive dependencies are captured during normal execution as described in Section 2.1. Some zigzag dependencies may not be captured in the proposed scheme during normal execution, and they may form a low-order checkpointing tree in some typical situations. But, in general, the checkpointing tree formed in the proposed scheme will be negligibly small as compared to the KT algorithm [9], and hence the blocking time of processes will be small in the proposed scheme as compared to the KT algorithm [9]. Furthermore, in the proposed scheme, a process is blocked when it takes its mutable checkpoint and waits for the other concerned processes to take their mutable checkpoints to come out of the blocking state; in the KT algorithm [9], a process is blocked when it takes its tentative checkpoint and waits for the other concerned processes to take their tentative checkpoints to come out of the blocking state. In mobile distributed systems, the time to take a mutable checkpoint may be negligibly small as compared to a tentative checkpoint. Hence, in the proposed scheme, the blocking period of a process will be significantly small as compared to the KT algorithm [9]. Our blocking period is larger than that of the CS algorithm [3], but the CS algorithm suffers from the extra message overhead of collecting dependency vectors from all processes and, moreover, it forces all the processes to block for a short duration. In our scheme, a process is blocked only if it is a member of the minimum set. Furthermore, a process is allowed to perform its normal computations and receive messages during its blocking period.

In the algorithms proposed in [4], [20], no blocking of processes takes place, but some useless checkpoints are taken, which are discarded on commit. In the Elnozahy et al. [7] algorithm, all processes take checkpoints. In the protocols [3], [9], and in the proposed one, only a minimum number of processes record their checkpoints. In algorithm [4], concurrent executions of the algorithm are allowed, but doing so may lead to inconsistencies [17]. We avoid concurrent executions of the proposed algorithm.

Table 1. A Comparison of System Performance
Algorithm: avg. blocking time; avg. no. of checkpoints; avg. message overhead
Cao-Singhal [4]: 2Tst; Nmin; 3Cbroadcast + 2Cwl + 2Nmss*Cst + 3Nmh*Cwl
Cao-Singhal [5]: 0; Nmin + Nmut; 2*Nmin*Cpp + Cbroadcast + Nucr*Cpp
Koo-Toueg [11]: h1*Tch; Nmin; 3*Nmin*Cpp*Ndep
Elnozahy et al. [8]: 0; N; 2*Cbroadcast + N*Cpp
Proposed algorithm: h2*Tch; Nmin; 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst
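The phase-by-phase costs listed above sum to the stated total of 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst. A quick arithmetic check of this bookkeeping can be sketched as follows (illustrative Python, not part of the paper; the sample cost values are arbitrary):

```python
# Sum the three phase costs of the proposed protocol and compare against
# the closed-form total given in the text. All numeric values are
# arbitrary sample costs, chosen only to exercise the formulas.

def phase1(Cwl, Cbroadcast, Cst, Nmin, Nmss):
    # initiator <-> MSSin, broadcast, minimum-set processes <-> local MSS,
    # and every MSS responding to MSSin
    return 2 * Cwl + Cbroadcast + 2 * Nmin * Cwl + Nmss * Cst

def phase2(Cwl, Cbroadcast, Cst, Nmin, Nmss):
    # tentative-checkpoint request broadcast plus the same response pattern
    return Cbroadcast + 2 * Nmin * Cwl + Nmss * Cst

def phase3(Cbroadcast):
    # commit request broadcast only
    return Cbroadcast

Cwl, Cb, Cst, Nmin, Nmss = 5, 20, 3, 4, 6
total = (phase1(Cwl, Cb, Cst, Nmin, Nmss)
         + phase2(Cwl, Cb, Cst, Nmin, Nmss)
         + phase3(Cb))
# Matches the stated total: 2Cwl + 3Cbroadcast + 4*Nmin*Cwl + 2*Nmss*Cst
assert total == 2 * Cwl + 3 * Cb + 4 * Nmin * Cwl + 2 * Nmss * Cst
```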
V. CONCLUSION

In this paper, we have proposed a minimum-process checkpointing protocol for deterministic mobile distributed systems, where no useless checkpoints are taken and an effort has been made to minimize the blocking of processes. We try to reduce the checkpointing time and the blocking time of processes by limiting the checkpointing tree which may be formed in other algorithms [4, 9]. We capture the transitive dependencies during normal execution by piggybacking dependency vectors onto computation messages. The Z-dependencies are well taken care of in this protocol. We also try to reduce the loss of checkpointing effort when any process fails to take its checkpoint in coordination with the others.

VI. REFERENCES

1) Acharya A. and Badrinath B. R., "Checkpointing Distributed Applications on Mobile Computers," Proceedings of the 3rd International Conference on Parallel and Distributed Information Systems, pp. 73-80, September 1994.
2)-3) Cao G. and Singhal M., "On Coordinated Checkpointing in Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 9, no. 12, pp. 1213-1225, Dec. 1998.
4)-5) Cao G. and Singhal M., "On the Impossibility of Min-process Non-blocking Checkpointing and an Efficient Checkpointing Algorithm for Mobile Computing Systems," Proceedings of the International Conference on Parallel Processing, pp. 37-44, August 1998.
6) Cao G. and Singhal M., "Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 2, pp. 157-172, February 2001.
7) Chandy K. M. and Lamport L., "Distributed Snapshots: Determining Global States of Distributed Systems," ACM Transactions on Computer Systems, vol. 3, no. 1, pp. 63-75, February 1985.
8) Elnozahy E. N., Alvisi L., Wang Y. M. and Johnson D. B., "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," ACM Computing Surveys, vol. 34, no. 3, pp. 375-408, 2002.
9) Elnozahy E. N., Johnson D. B. and Zwaenepoel W., "The Performance of Consistent Checkpointing," Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 39-47, October 1992.
10) Higaki H. and Takizawa M., "Checkpoint-recovery Protocol for Reliable Mobile Systems," Transactions of Information Processing Japan, vol. 40, no. 1, pp. 236-244, Jan. 1999.
11) Koo R. and Toueg S., "Checkpointing and Roll-Back Recovery for Distributed Systems," IEEE Transactions on Software Engineering, vol. 13, no. 1, pp. 23-31, January 1987.
12) Neves N. and Fuchs W. K., "Adaptive Recovery for Mobile Environments," Communications of the ACM, vol. 40, no. 1, pp. 68-74, January 1997.
13) Parveen Kumar, Lalit Kumar, R. K. Chauhan and V. K. Gupta, "A Non-Intrusive Minimum Process Synchronous Checkpointing Protocol for Mobile Distributed Systems," Proceedings of IEEE ICPWC-2005, pp. 491-495, January 2005.
14) Pradhan D. K., Krishna P. and Vaidya N. H., "Recovery in Mobile Wireless Environment: Design and Trade-off Analysis," Proceedings of the 26th International Symposium on Fault-Tolerant Computing, pp. 16-25, 1996.
15) Prakash R. and Singhal M., "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 7, no. 10, pp. 1035-1048, October 1996.
16) Ssu K. F., Yao B., Fuchs W. K. and Neves N. F., "Adaptive Checkpointing with Storage Management for Mobile Environments," IEEE Transactions on Reliability, vol. 48, no. 4, pp. 315-324, December 1999.
17) Kim J. L. and Park T., "An Efficient Protocol for Checkpointing Recovery in Distributed Systems," IEEE Transactions on Parallel and Distributed Systems, pp. 955-960, Aug. 1993.
18) Kumar L., Misra M. and Joshi R. C., "Low Overhead Optimal Checkpointing for Mobile Distributed Systems," Proceedings of the 19th IEEE International Conference on Data Engineering, pp. 686-688, 2003.
19) Ni W., Vrbsky S. and Ray S., "Pitfalls in Distributed Nonblocking Checkpointing," Journal of Interconnection Networks, vol. 1, no. 5, pp. 47-78, March 2004.
20) Lamport L., "Time, Clocks and the Ordering of Events in a Distributed System," Communications of the ACM, vol. 21, no. 7, pp. 558-565, July 1978.
21) Silva L. M. and Silva J. G., "Global Checkpointing for Distributed Programs," Proceedings of the 11th Symposium on Reliable Distributed Systems, pp. 155-162, Oct. 1992.
22) Parveen Kumar, Lalit Kumar and R. K. Chauhan, "A Non-intrusive Hybrid Synchronous Checkpointing Protocol for Mobile Systems," IETE Journal of Research, vol. 52, no. 2&3, 2006.
23) Parveen Kumar, "A Low-Cost Hybrid Coordinated Checkpointing Protocol for Mobile Distributed Systems," Mobile Information Systems, vol. 4, no. 1, pp. 13-32, 2007.
24) Lalit Kumar Awasthi and Parveen Kumar, "A Synchronous Checkpointing Protocol for Mobile Distributed Systems: Probabilistic Approach," International Journal of Information and Computer Security, vol. 1, no. 3, pp. 298-314.
25) Sunil Kumar, R. K. Chauhan and Parveen Kumar, "A Minimum-process Coordinated Checkpointing Protocol for Mobile Computing Systems," International Journal of Foundations of Computer Science, vol. 19, no. 4, pp. 1015-1038, 2008.
26) Tanenbaum A. and Van Steen M., Distributed Systems: Principles and Paradigms, Upper Saddle River, NJ, Prentice-Hall, 2003.
27) Singhal M. and Shivaratri N., Advanced Concepts in Operating Systems, New York, McGraw Hill, 1994.
The Establishment of an AR-based Interactive Digital Artworks
Min-Chai Hsieh1, Hao-Chiang Koong Lin2, Jin-Wei Lin3, Mei-Chi Chen4
Abstract- This work attempts to declare the background of a personal contemporary state through an immersion of "digital vacancy". The work is stacked on the identical digital space with concurrent portrait and enjoyment. Moreover, the work describes the doubt and depression in life, combined with the humor of predicament and the absurdities of the senses. We employ augmented reality to create digital artworks that present an interactive poem. This work is established where the digital poem is generated via the interaction between a video film and a text-based poem. After establishing the digital artwork, we exhibited the work at the Digital Art Center (DAC), Taipei, Taiwan. The audience can interact with the digital poem in real time. In comparison to other AR equipment, the cost of this work is quite low. In the future, some usability evaluation will be performed on this work.

Keywords- Augmented reality, Digital artworks, Interactive poem.

I. INTRODUCTION

In the past, artists presented their creations within their own domain and private space, and most artworks were based on non-interactive visual creative expression. With the progress of information technology, people can create art by using digital multimedia rather than just working in a traditional manner. That is, the way of art-creating has changed dramatically, and digital art creation becomes more lively and interesting. Furthermore, these materials and technologies enhance the artists' creativity. Artists are able to create artworks via technology and multimedia; that is to say, they can create artworks with multimedia besides the traditional ways of creating art, so they can create in more fashions to express their thoughts. Today, the ideas of artists can be implemented in real time via the powerful computation abilities of various computers.

The process of artwork creation is charming because it is no longer a phenomenon of the slice, but a manifestation of the experience. Interaction has been considered an important characteristic of digital artworks, but the evolution of the aesthetic point of view is seldom mentioned. Participating in the experience during the construction is the significance of creating all of the works; this gradually formed the notion of "interactive aesthetics". These are important concepts in new media art [1][2]. In "The End of Art", Arthur C. Danto mentioned that the function of art as imitation and reproduction has already disappeared, emphasizing that verisimilar imitation is also redefined in art history [3]. The text should be opened to and created by the readers: the meaning of a text is interpreted by the readers instead of the author. This is the well-known "writable text" concept [4].

In this work, we employ Augmented Reality (AR) technology to create digital artworks that present a series of interactive poems. The audience can interact with the digital poem via pre-designed postcards. Notice that the postcard is a real object, while the digital poem is a virtual sight. Interestingly, the real lies in the virtual and, vice versa, virtual scenes render in the real environment. Therefore, the audience can feel themselves in an environment both virtual and real.

About-1: Dept. of Information and Learning Technology, National University of Tainan, Tainan, Taiwan (telephone: +886-6-2133111#771, email: [email protected]).
About-2: Dept. of Information and Learning Technology, National University of Tainan, Tainan, Taiwan (telephone: +886-6-2133111#771, email: [email protected]).

II. RELATED WORK

In recent years, many scholars and institutes have been carrying out research on Augmented Reality, one of the application techniques of computer vision. AR is also called Mixed Reality (MR), an extension of Virtual Reality (VR). By setting up a scene via computer graphics, VR can simulate objects of the real world and create an environment where users can interact with the simulated objects. AR consists of images, objects or scenes generated by the computer that blend into the real environment to strengthen our visual feelings; in sum, it adds virtual objects to our real environment. The technology has to possess three qualities: the combination of virtual objects and the real world, real-time interaction, and registration in 3D space.

Fig.1. Reality and Virtuality (RV) continuum

Milgram et al. [5] treat the real environment and the virtual one as a closed continuum, shown in Fig. 1. On the left side is the merely real environment, while on the right side is the purely virtual environment. VR is inclined to take the place of the real world; AR augments the real environment with virtual images produced by the computer. Presently, AR is being applied very extensively to fields such as education, medical technology, military training, engineering, industrial design, art, entertainment and so on [6][7][8][9][10][11][12]. AR combines virtual objects with the real environment and displays the virtual objects generated by computers in front of users' eyes. Milgram et al. [5] define two displaying ways of AR. One is See-Through AR: users can directly see the surrounding environment through the monitor, and the monitor also displays the virtual image in it. Accordingly, the
effect of the augmented environment can be the greatest via See-Through AR. The other is Monitor-Based AR: the computer combines the images captured by the webcam with the virtual images, and the final combined image shows up on a Head-Mounted Display (HMD) or computer monitor. There are two kinds of HMD: one is the pure HMD and the other is the HMD with a small webcam. The former has a small volume and can be equipped with a head-mounted tracking instrument, which can track the present angle as well as the direction ahead of the user's head; it is more suitable for research and application of AR. The latter has an immersion effect.

III. CONCEPT OF WORK

This work attempts to declare the background of a personal contemporary state through an immersion of "digital vacancy"; the work is stacked on the identical digital space with concurrent portrait and enjoyment. The author allows the audience to generate an engagement of ideas from previously created videos and poetry using interactive media, which further pushes the audience to wonder what they are expecting and what kind of attention they are involved in while waiting. Like life, the crowd passes each other in the city, alternating and switching between consciousness and predicament. Perhaps the image of dust generally contains an ingenious meaning due to naturally-born vision and wisdom, while the condensation of air, image and signs symbolizes the endless vacancy. Perhaps the audience has fallen into a conventional mindset; oftentimes, the audience needs to think again before understanding the definition for himself or herself. Our thinking can be both naive and profound. Therefore, this work attempts to expand fragments of a series of identities from the phenomenology of inconspicuous things. The work describes the doubt and depression in life, combined with the humor of predicament and the absurdities of the senses. We are the city wanderers who observe various surrounding symbols through constant muttering, without probing into their significance.

After we enter the kingdom of another dimension, we often start immersing in the beauty of ambiguity while thinking about the multiple levels of possibility. Such a pattern forms a cognitive approach to reflect the nature and details of things, while estimating the length and scale of seemingly familiar yet strange surrounding sceneries, giving a little taste of such inspiration. As Claude Levi-Strauss said, "Our eyes have lost the ability to distinguish and we no longer know how to treat things." Subjective regularity helps us gain insight to streamer and freeze in true cleverness. Our creation no longer belongs to a part of theories, and we can unlimitedly slow down our pace.

III. DIGITAL POEM

A system (written in the Processing programming language) is established where the digital poem is generated via the interaction between a video film and a text-based poem. In other words, the system acquires two kinds of inputs: (1) a video file which was produced by the artist beforehand, and (2) a modern poem which was written by the artist. The poem consists of a sequence of Chinese characters. Fig. 2 shows the transformation program written in Processing.

Fig.2. The transformation program written in Processing.

After these two inputs are fed to the system, each frame in the video is transformed into an image constructed from texts. The transformation process is as follows. A "cell size" is defined in the program, and each cell contains several pixels, for example, four pixels. The cell size determines the style of the resulting image. For each cell in the frame, we replace the content with a character from the poem; the order in which the characters are applied depends on their positions in the poem. Moreover, the color of each character is based on the color of the cell at the same position. Notice that the font size of the character can also be defined by the designer. Therefore, if the font size is larger than the cell size, characters on the image may overlap with each other, so the colors will blur and embellish the frame to be "draw-like". When all the frames are generated and filled with colors via the above process, an interactive "digital poem" in video form is thus produced. Fig. 3 is the video file before transformation, played with QuickTime, and Fig. 4 is the frame after transformation.
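The cell-replacement idea described above can be sketched as follows. This is a simplified, illustrative re-implementation in Python (the authors' system is written in Processing); the function name and data layout are assumptions, not the authors' code.

```python
# Reduce a frame to a grid of cells; replace each cell with the next
# character of the poem, colored with that cell's color. A frame is a 2D
# list of (r, g, b) pixels; the result is one placement per cell.

def frame_to_text_mosaic(frame, poem, cell_size):
    """Return a list of (row, col, char, color) placements, one per cell,
    with characters applied in poem order (wrapping if needed)."""
    rows, cols = len(frame), len(frame[0])
    placements, k = [], 0
    for y in range(0, rows, cell_size):
        for x in range(0, cols, cell_size):
            char = poem[k % len(poem)]   # characters taken in poem order
            color = frame[y][x]          # character color from the cell
            placements.append((y, x, char, color))
            k += 1
    return placements

# 4x4 frame with 2x2 cells -> 4 cells, so 4 characters of the poem are used.
frame = [[(i * 10 + j, 0, 0) for j in range(4)] for i in range(4)]
mosaic = frame_to_text_mosaic(frame, "城市漫遊者", cell_size=2)
assert len(mosaic) == 4
assert mosaic[0][2] == "城" and mosaic[1][2] == "市"
```

Rendering each placement as text at its (row, col) position, in the given color, yields the "draw-like" frame described in the text.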
Fig.3. The video file before transformation.
Fig.4. The frame after transformation (pixels in this frame were replaced by texts in the poem).

IV. IMPLEMENTATION

The webcam captures video of the real world and sends it to the computer. The system searches through each video frame for any square shapes of black color. If a square is found, the system uses some mathematics to calculate the position of the webcam relative to the black square. Once the position of the webcam is determined, a film of the digital poem is drawn from that same position. This film of the digital poem is drawn on top of the video of the real world and so appears stuck on the square marker. The final output is sent back and displayed via the projector. Therefore, when the audience looks at the display, they see the film of the digital poem overlaid on the real world.

Figure 6 shows the digital poem presentation based on our system written in the Processing language. We create image textures and corresponding vertices; the four vertices of the film match the four vertices of the image texture, and the film is drawn on the image texture. The four vertices of the image texture are expressed as vertex(x, y, u, v), where x and y are the coordinates of the vertex, u is the horizontal coordinate for the texture mapping, and v is the vertical coordinate for the texture mapping.

Fig.6. The film drawn on the image texture.

About the development environment, we use a PC with a Pentium(R) Dual-Core 2.6 GHz CPU and a Logitech Orbit webcam, which captures 30 frames per second. The frame size is 640×480. The distance between the webcam and the postcard is 50 centimeters. The AR marker on the postcard is 4.55 cm in both length and width.

The interactive content of the work contains video. Each poem from a postcard matches a virtual digital poem, and the audience can interact with a postcard by directly manipulating it. Figure 7 is an example of a postcard; each postcard corresponds to a video of a digital poem. The total numbers of postcards and videos are both 12. Shown in Figure 8 are all of the 12 postcards.

Fig.7. (1) The back of the postcard is the AR marker. (2) The front of the postcard is the poem.
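The vertex(x, y, u, v) pairing described above amounts to attaching normalized texture coordinates to the four detected corner positions, so the film is stretched onto the marker. A minimal sketch (illustrative Python; the corner ordering and names are assumptions, not the authors' code):

```python
# Pair the four detected marker corners (x, y) with the four corners of
# the image texture via normalized texture coordinates (u, v).

def film_quad(marker_corners):
    """marker_corners: four (x, y) points in the order
    top-left, top-right, bottom-right, bottom-left."""
    tex_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    return [(x, y, u, v)
            for (x, y), (u, v) in zip(marker_corners, tex_corners)]

quad = film_quad([(100, 80), (260, 90), (250, 240), (95, 230)])
assert quad[0] == (100, 80, 0.0, 0.0)   # top-left maps to texture (0, 0)
assert quad[2] == (250, 240, 1.0, 1.0)  # bottom-right maps to (1, 1)
```

Each resulting tuple corresponds to one vertex(x, y, u, v) call, so the renderer interpolates the film texture across the quadrilateral covering the marker.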
Fig.8. All of the 12 postcards.
The elements of the presentation include the webcam embedded in the lamp, a reading-desk, a projector and a white wall in the exhibition. Figure 9 shows the presentation of the artworks. There are several postcards on the reading-desk, and the lamp is installed at a higher position in order to present a broader view. The audience "read" the content of these poems by manipulating the postcards, so that the digital films "hidden" behind the AR markers are displayed. The installation of this artwork is shown in Figure 10.

Fig.9. The presentation of artworks.

Fig.10. The audience interacts with the AR digital poem.

VI. CONCLUSION

In this work, we employ augmented reality technologies to create digital artworks that present an interactive poem. This artwork was exhibited in the Digital Art Center, Taipei, Taiwan; the exhibition duration was from March 6, 2010 to April 11, 2010. We will extend this work so that audiences can interact with the AR digital poem via the internet. The artwork setting and operation are easy: audiences only need to set up a webcam, with no additional hardware requirement. In comparison to other AR equipment, the cost of this work is quite low. In the future, some usability evaluation will be performed on this work.

VII. REFERENCES

1) Manovich, L. (2001). The Language of New Media. Massachusetts: MIT Press.
2) Varnedoe, K., and Gopnik, A. (1990). High and Low: Modern Art and Popular Culture. New York: Museum of Modern Art.
3) Grau, O. (2003). Virtual Art. Massachusetts: MIT Press.
4) Zucker, S. D. (1997). The Arts of Interaction: Interactivity, Performativity and Computers. Journal of Aesthetics and Art Criticism (Special Issue on Art and Technology), 55(2), 17-127.
5) Milgram, P., and Kishino, F. (1994). A Taxonomy of Mixed Reality Visual Displays. IEICE Transactions on Information Systems, E77-D(12), 1321-1329.
6) Azuma, R. (1997). A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments, 6, 355-385.
7) Azuma, R., Baillot, Y., and Behringer, R. (2001). Recent advances in augmented reality. IEEE Computer Graphics and Applications, 21, 34-47.
8) Dünser, A., and Hornecker, E. (2007). Lessons from an AR book study. Proceedings of the First International Conference on Tangible and Embedded Interaction, 179-182.
9) Liarokapis, F., Petridis, P., Lister, P. F., and White, M. (2002). Multimedia Augmented Reality Interface for E-learning (MARIE). World Transactions on Engineering and Technology Education, 1(2), 173-176.
10) Billinghurst, M., Kato, H., and Poupyrev, I. (2001). The MagicBook: A transitional AR interface. Computers & Graphics, 25, 745-753.
11) Kirner, C., and Zorzal, E. R. (2005). Educational applications of augmented reality collaborative environments. Proceedings of the Sixteenth Brazilian Symposium on Informatics in Education, 114-124.
12) Hsieh, M. C., and Lee, J. S. (2008). AR marker capacity increasing for kindergarten English learning. International MultiConference of Engineers and Computer Scientists, 663-666.
13) Kato, H., Billinghurst, M., Blanding, B., and May, R. (1999). ARToolKit. Technical Report, Hiroshima City University.
AUTOCLUS: A Proposed Automated Cluster Generation Algorithm
Samarjeet Borah1, Mrinal Kanti Ghose2
Abstract- Among all kinds of clustering algorithms, partition based and hierarchical based methods have gained the most popularity among researchers. Both methods have their own advantages and disadvantages. In this paper an attempt has been made to propose a new clustering algorithm which includes selected features of both. While developing the proposed methodology, emphasis has been given to some of the disadvantages of both categories of algorithms. The proposed algorithm has been tested with various datasets with satisfactory results.
Keywords-Clustering, Partition, Hierarchical, Automatic, Distance Measure.

I. INTRODUCTION

There is a huge amount of data in the world and it is increasing day by day. Every day new data are collected and stored in databases. To obtain implicit meaningful information from the data, efficient analysis methods [1] are required. If a data set has thousands of entries and hundreds of attributes, it is impossible for a human being to extract meaningful information from it by visual inspection alone. Computer-based data mining techniques are essential in order to reveal the more complicated inner structure of the data. Such techniques include the clustering solutions, which help in extracting information from large datasets.

II. CLUSTERING

Clustering [2][3][4] is a type of unsupervised learning method in which a set of elements is separated into homogeneous groups. Intuitively, patterns within a valid cluster are more similar to each other than they are to a pattern belonging to a different cluster. The variety of techniques for representing data, measuring similarity between data elements, and grouping data elements has produced a rich and often confusing assortment of clustering methods. Clustering is useful in several exploratory pattern-analysis, grouping, decision-making, and machine-learning situations, including data mining, document retrieval, image segmentation, and pattern classification [5][3]. Data clustering algorithms can be hierarchical or partitional [6]. Within each of the types, there exists a wealth of subtypes and different algorithms for finding the clusters.

A. Partition Based Clustering Methods

Given a database of n objects, a partition based [5] clustering algorithm constructs k partitions of the data, so that an objective function is optimized. Partition based clustering algorithms try to locally improve a certain criterion. The majority of them could be considered greedy algorithms, i.e., algorithms that at each step choose the best solution, which may not lead to optimal results in the end. The best solution at each step is the placement of a certain object in the cluster whose representative point is nearest to the object. This family of clustering algorithms includes the first ones that appeared in the Data Mining community. The most commonly used are K-means [7], PAM (Partitioning Around Medoids), CLARA (Clustering LARge Applications) and CLARANS (Clustering LARge ApplicatioNS). All of them are applicable to data sets with numerical attributes.

B. Hierarchical Clustering Algorithms

Hierarchical algorithms can be agglomerative (bottom-up) or divisive (top-down). Agglomerative algorithms begin with each element as a separate cluster and merge them into successively larger clusters. Divisive algorithms begin with the whole set and proceed to divide it into successively smaller clusters. Hierarchical algorithms have two basic advantages [4]. First, the number of classes need not be specified a priori, and second, they are independent of the initial conditions. However, the main drawback of hierarchical clustering techniques is that they are static; that is, data points assigned to a cluster cannot move to another cluster. In addition, they may fail to separate overlapping clusters due to lack of information about the global shape or size of the clusters [8]. In hierarchical clustering, the output is a tree showing a sequence of clusterings, with each cluster being a partition of the data set [9].

III. AUTOMATED CLUSTERING (AUTOCLUS)

This work has been motivated by the issues mentioned above. Although the above algorithms are well established and quite efficient, their particular drawbacks may affect the clustering result. For example, many of these algorithms require the user to specify input parameters, and a wrong input parameter may result in bad clustering. The algorithm AUTOCLUS has been proposed keeping in mind some of the issues faced by the above algorithms. It is a hybrid algorithm which includes features of both partition based and hierarchical based algorithms.

A. Proposed Methodology "Autoclus"
Let the data set be given as X = {x_i, i = 1, 2 … N}, which consists of N data objects x_1, x_2 … x_N, where each object has M different attribute values corresponding to the M
______
About-*Department of Computer Science & Engineering, Sikkim Manipal Institute of Technology, Majitar, Rangpo, East Sikkim-737136, India (e-mail: [email protected], [email protected])
Page | 122 Vol. 10 Issue 5 Ver. 1.0 July 2010 Global Journal of Computer Science and Technology
different attributes. The value of the i-th object is given by X_i = {x_i1, x_i2 … x_iM}. The relation x_i = x_k does not mean that x_i and x_k are the same object, but that the two objects have equal values for the attribute set A = {a_1, a_2, …, a_M}. The main objective of the algorithm is to partition the dataset into k disjoint subsets, where k ≤ N. The algorithm tries to minimize the inter-cluster similarity and maximize the intra-cluster similarity.

B. Distance Measure

While searching for a certain structure in a given data set, the important thing is to find an appropriate distance function, and the most important question in this context is the criterion for selecting one. For distance calculation, the sum of squared Euclidian distance is used in this algorithm. It aims at minimizing the average square error criterion, which is a good measure of the within-cluster variation across all the partitions. Thus the average square error criterion tries to make the k clusters as compact and separated as possible.

The algorithm combines the features of both hierarchical and partition based clustering. It creates a hierarchical decomposition of the given set of data objects, and at the same time it tries to group the objects based on a mean value. It is a simple algorithm which applies a top-down or divisive approach. Initially all the objects of the dataset are assumed to form a single cluster. The algorithm applies an iterative process to divide the given dataset into a set of clusters until the termination condition converges. The classification is done based on the popular clustering criterion, the within-group sum of squared error (WGSSE) function:

WGSSE = Σ_{i=1}^{n} Σ_{j=1}^{M} (x_ij − x̄_i)²

The classical WGSSE function was originally designed to define the traditional hard c-means and ISODATA algorithms. With the emergence of fuzzy set theory, Dunn [10] first generalized WGSSE to a square-weighting WGSSE function. Later, Bezdek [11] extended it to an infinite family of criterion functions which formed a universal clustering objective function of fuzzy c-means (FCM) type algorithms. The studies on criterion functions have mainly been focused on the measurements of similarity or distortion D(.), which are often expressed by the distances between the samples and prototypes. Different distance measurements are used to detect various structural subsets.

C. Phases of the Algorithm

The algorithm works in two different phases: the cluster generation phase and the cluster validation phase. They work as follows.

D. Cluster Generation Phase

This phase involves the formation of new clusters by grouping the objects around a new mean value. The algorithm follows the reproduction process of the amoeba, a tiny, one-celled organism. Amoebas reproduce by binary fission: a parent cell divides, the nucleus also divides in a process called fission, and two smaller copies of the parent are produced. The same phenomenon has been followed here. A cluster is divided into two smaller clusters by selecting a new mean value. This new mean value is selected at the furthest Euclidian distance from the current mean. Then the rest of the objects are redistributed among the two means (the old mean and the newly selected mean).

E. Cluster Validation Phase

This is the most important part of the algorithm. Whether a newly generated cluster is a stable cluster or not is checked by this cluster validation phase. The within-group sum of squares has been taken as the criterion for cluster validation. If the total WGSSE of the newly generated clusters is smaller than the parent cluster's WGSSE, then the clusters are valid. Otherwise the newly generated clusters are discarded and the clustering process stops there.

F. The pseudo code for AUTOCLUS

1. Take an initial data set D.
2. Compute Grand Mean: CALCULATE_GM(D).
3. Find the object with mean value closest to GM and call it Cluster_Head1.
4. Assign points to the cluster: ASSIGN_PT(X, C)
   // X = {xi, i = 1, 2 … N}
   // C = {c1, c2 … ck} where k ≤ N
5. SS := CALCULATE_WGSS(M, C).
   // M = {m1, m2 … mk} where k ≤ n
6. Repeat the following steps while WGSSE_of_Parent > (Total_WGSSE_of_Childs):
   a. Obtain the Euclidian Distance (ED) from all other objects to Cluster_Head1.
   b. Select the object at the largest Euclidian distance from Cluster_Head1.
   c. Name the object at the largest distance Cluster_Head2.
   d. Rename Cluster_Head1 as Cluster_Head1.1.
   e. Reassign objects around Cluster_Head1.1 and Cluster_Head2.
   f. Calculate WGSSE for Cluster_Head1.1 and Cluster_Head2 (SS1 & SS2).
   g. If WGSSE_of_Parent > (Total_WGSSE_of_Childs), then the child clusters are accepted, else discarded.
   h. Go to step 6 and repeat the whole process for accepted new clusters.
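The pseudo code above can be sketched in executable form. The following is a minimal Python sketch, not the authors' code (the paper's implementation is in C): all function names are invented, and it simplifies step 3 by using the cluster mean itself as the first head. It computes WGSSE, picks the object farthest from the head as the second head, reassigns objects, and accepts the split only if the children's total WGSSE is smaller than the parent's.

```python
import math

def wgsse(cluster, center):
    """Within-group sum of squared error: squared Euclidean
    distances from each object to the given center, summed."""
    return sum(sum((x - c) ** 2 for x, c in zip(obj, center)) for obj in cluster)

def mean(cluster):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def split(cluster):
    """One generation step: split around the old head and the object
    farthest from it; keep the split only if it lowers total WGSSE
    (the validation phase)."""
    head1 = mean(cluster)
    parent_ss = wgsse(cluster, head1)
    head2 = max(cluster, key=lambda obj: dist(obj, head1))
    child1 = [o for o in cluster if dist(o, head1) <= dist(o, head2)]
    child2 = [o for o in cluster if dist(o, head1) > dist(o, head2)]
    child_ss = wgsse(child1, head1) + wgsse(child2, head2)
    if child1 and child2 and child_ss < parent_ss:
        return [child1, child2]   # accepted: recurse on these
    return [cluster]              # discarded: cluster is stable

def autoclus(data):
    """Divisive loop starting from one cluster holding all objects;
    children are labeled with the 0/00/01... numbering scheme, so a
    cluster's level in the tree is simply the length of its label."""
    result, stack = {}, [("0", data)]
    while stack:
        label, cluster = stack.pop()
        parts = split(cluster)
        if len(parts) == 1:
            result[label] = cluster
        else:
            stack.append((label + "0", parts[0]))
            stack.append((label + "1", parts[1]))
    return result
```

On a toy set of two well-separated pairs, the first split separates the pairs; with this strict "children's WGSSE smaller than parent's" rule the sketch keeps splitting down to singleton leaves, which illustrates why the validation criterion matters.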
IV. IMPLEMENTATION & RESULTS

The algorithm has been implemented in C using a synthetic data set having 10 dimensions. The data set consists of real values, both positive and negative, and is roughly similar to a gene expression data set. The program has different procedures for the implementation of the various elements of the algorithm. For example, the procedure compute_mean_grandmean() computes the grand mean of the dataset, wgsse_cal() calculates the within-group sum of squares of a cluster, and computationss() generates the clusters. It is a top-down approach. The generated clusters are uniquely identified by a cluster number. The numbering of the clusters has been done in such a way that the level of a cluster in the sub-tree can be found from the number itself. For example, 0 is the number assigned to the root of the tree, which is the cluster containing all the nodes of the dataset; 00 is assigned to the left sub-tree, 01 is assigned to the right sub-tree, and so on. Applying the algorithm to the given dataset, six clusters were finally found. The tree of the generated clusters is shown below:

                        0
            00                      01
       000       001          010       011
    0000 0001 0010 0011    0100 0101

Fig 1: The Tree of the Clusters Generated by AUTOCLUS

A small portion of the result of AUTOCLUS is given in the following figure. The pairs of centroids mentioned there are those which will be considered for the next decomposition of the cluster.

Cluster No. 0: 0 8.00 4.30 9.00 7.10 9.10 6.00 2.30 ) ( 2.00 5.00 11.00 88.00 3.00 5.00 66.00 22.00 7.77 4.30 ) ( -3.00 9.00 -2.00 -4.00 0.90 5.00 6.00 1.00 1.10 4.50 ) ( 5.00 1.00 0.20 9.00 0.40 -8.00 -1.00 -4.00 -5.00 6.30 ) ( 8.00 9.00 2.00 1.00 6.00 -4.00 7.00 6.10 9.00 0.80 ) ( 9.00 7.80 9.00 6.10 -9.00 -3.00 0.50 8.00 9.00 -9.00 ) ( -1.00 -3.00 6.00 7.00 3.00 3.00 5.70 55.50 32.50 -66.00 ) --> Centroid: (6.16, 21.41)

Cluster No. 00: ( 5.00 5.00 6.00 8.00 -9.00 4.40 22.00 8.50 7.70 4.00 ) ( 5.40 6.00 2.10 8.00 4.30 9.00 7.10 9.10 6.00 2.30 ) ( -3.00 9.00 -2.00 -4.00 0.90 5.00 6.00 1.00 1.10 4.50 ) ( 5.00 1.00 0.20 9.00 0.40 -8.00 -1.00 -4.00 -5.00 6.30 ) ( 8.00 9.00 2.00 1.00 6.00 -4.00 7.00 6.10 9.00 0.80 ) ( 9.00 7.80 9.00 6.10 -9.00 -3.00 0.50 8.00 9.00 -9.00 ) ( -1.00 -3.00 6.00 7.00 3.00 3.00 5.70 55.50 32.50 -66.00 ) --> Centroid: (2.84, 4.27)

Fig 2: Results from AUTOCLUS implementation

V. CONCLUSION

Partition based clustering algorithms face the problem that the number of partitions to be generated has to be entered by the user. Generally, algorithms of the K-means family face this problem. As a result, the clusters formed may not be up to the mark, because it is difficult for a user to select the appropriate number of clusters without sound domain knowledge in advance. Again, hierarchical methods suffer from the fact that once a step (merge/split) is done, it can never be undone. This algorithm overcomes that problem, but it increases the computation cost because of the cluster validity process. In the development of the AUTOCLUS algorithm, an effort has been made to minimize these drawbacks as much as possible. From the experiments it has been found that the algorithm works properly with minimum user interaction. The number of clusters to be generated need not be entered in advance, and in the cluster validation phase the algorithm can automatically accept or discard clusters based on the criterion function. The algorithm has been tested with datasets of varying size, with satisfactory results.

VI. REFERENCES

1) Yi Jiang, Efficient Classification Method for Large Dataset, School of Informatics, Guangdong Univ. of Foreign Studies, Guangzhou.
2) Alexander Hinneburg, Daniel A. Keim, Clustering Techniques for Large Data Sets from the Past to the Future.
3) A.K. Jain, M.N. Murty and P.J. Flynn, Data Clustering: A Review.
4) Lourdes Perez, Data Clustering, Student Papers, University of California San Diego, http://cseweb.ucsd.edu/~paturi/cse91/papers.html
5) Raza Ali, Usman Ghani, Aasim Saeed, Data Clustering and Its Applications, http://members.tripod.com/asimsaeed/paper.htm
6) Pavel Berkhin, Survey of Clustering Data Mining Techniques, Accrue Software Inc, San Jose, CA, 2002.
7) McQueen, J.B., Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, volume 1, pages 281–297, Univ. of California Press, Berkeley, 1967.
8) A.K. Jain, M.N. Murty and P.J. Flynn, Data Clustering: A Review.
9) F. Murtagh, A Survey of Recent Advances in Hierarchical Clustering Algorithms, The Computer Journal, Vol. 26, No. 4, pp. 354-359.
10) Dunn, J.C., A fuzzy relative of the ISODATA process and its use in detecting compact well separated clusters, J. Cybernet, 1974, 3: 32.
11) Bezdek, J.C., Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press, 1981.
Implementing Search Engine Optimization Technique to Dynamic / Model View Controller Web Application
R. Vadivel1 Dr. K. Baskaran2
Abstract- The main objective of this paper is implementing search engine optimization for dynamic / Model Viewer Controller (MVC) web applications. SEO concepts can be applied to both static and dynamic web applications. There is no issue in creating SEO content for a static web application (one whose content does not change until the site is re-hosted) while keeping to the SEO regulations. Dynamic content, however, poses a few significant challenges. These challenges can be overcome to obtain a fully functional dynamic site that is optimized as much as a static site can be, so that whatever users search for, they get their information quickly. For this we use a few search engine optimization methods for dynamic web applications, such as User Friendly URLs, URL Redirector and HTML Generic, and a few other SEO methods and concepts such as a crawler, an index (or catalog), a search interface, search engine algorithms and page rank algorithms. Both internal and external elements of the site affect the way it is ranked in any given search engine, so all of these elements should be taken into consideration.
Keywords— Search Engine Optimization (SEO), Model Viewer Controller (MVC), Dynamic web, Friendly URLs, ASP.Net

I. INTRODUCTION

If we have a website, we definitely need it to be friendly to search engines. There are several ways to attract visitors to our website, but in order to make searchers know about our website, the search engine is the tool through which we need to prove our content. If we just have static HTML content, there is not much of a problem in promoting it. But in today's world of Content Managed Websites and eCommerce Portals we need to look further and implement a few more techniques in order to make the site more prominent to robots. In this article we will discuss how we can develop an SEO-friendly website where the content is driven from the database by a Content Management System developed using ASP.NET. We will learn to build a simple CMS-driven site with no-nonsense URLs, which search engines invite. Search Engine Optimization (SEO) is often considered the more technical part of Web marketing. This is true because SEO does help in the promotion of sites and at the same time it requires some technical knowledge – at least familiarity with basic HTML. SEO is sometimes also called SEO copywriting because most of the techniques that are used to promote sites in search engines deal with text. Generally, SEO can be defined as the activity of optimizing Web pages or whole sites in order to make them more search engine-friendly, thus getting higher positions in search results.

Search Engine Optimization (SEO) is a very popular term in the web application industry. We can implement SEO concepts in both static and dynamic web applications. Implementing SEO in a static web application is straightforward: we just follow the SEO rules and conditions. Implementing it in a dynamic / MVC web application is somewhat more complicated and requires some tricks. The specific objective is to implement search engine optimization for model 1 and model 2/Model Viewer Controller (MVC) dynamic web applications. There is no specified web technology for dynamic web applications; we can use software from Microsoft or any other corporation. In my work, .NET has played the major role.

To understand dynamic content, it is important to have an idea of its opposite, static content. The term static content refers to web content that is generated without using a data source such as a database. Essentially, the site viewer sees exactly what is coded in the web page's HTML. With dynamic pages, a site can display the same address for every visitor, and have totally unique content for each one to view. For example, when I visit the social networking site Facebook (facebook.com), I see http://www.facebook.com/home.php as the address in my web browser, but I see a unique page that is different from what anyone else sees if they view that page at the same time. The site shows information about my friends in my account, different information for each person in his own account, or something else for someone who has no account. Not all dynamically generated content is unique to every viewer, but all dynamic content comes from a data source, whether it is a database or another source, such as an XML file.

A. SEO in web application

A web application plays a most important role in online business.
______
About-1 Computer Science, Karpagam University, Pollachi Road, Eachanari, Coimbatore, Tamilnadu, India 641 024 (e-mail: [email protected])
About-2 Asst. Professor (RD), Dept. of CSE and IT, Govt. College of Technology, Coimbatore – 641 006
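The static/dynamic distinction described above can be illustrated in a few lines of code. This is a hypothetical sketch (not from the paper, and not Facebook's implementation): a static page is the same bytes for everyone, while a dynamic page at the same address renders different markup per viewer from a data source.

```python
# A static page: identical for every visitor.
STATIC_PAGE = "<html><body>Welcome to our site!</body></html>"

# Stand-in for the data source (e.g., a database) behind a dynamic page.
FRIENDS_DB = {"alice": ["bob", "carol"], "bob": ["alice"]}

def render_home(user=None):
    """Dynamic page: same address (e.g. /home.php), unique content
    per viewer, and a fallback for visitors with no account."""
    if user is None:
        return "<html><body>Please log in.</body></html>"
    friends = ", ".join(FRIENDS_DB.get(user, []))
    return f"<html><body>{user}'s friends: {friends}</body></html>"
```

Calling render_home("alice") and render_home("bob") yields different pages for the same address, which is exactly the property that complicates indexing for search engine spiders.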
Millions of static and dynamic web pages are available on the internet, and millions of users use those web pages for their required information. In these circumstances, search engine optimization plays the most important role between the user and web applications. Among the millions of available web pages, each user has specific search criteria: a businessman searches for his own needs, students search for their own needs, and so on. Our aim is that whatever users search for, they get their information quickly. For that we use a few search engine optimization methods and concepts such as a crawler, an index (or catalog), a search interface, search engine algorithms and page rank algorithms.

Search engines take advantage of reverse broadcast networks to help save you time and money. Search allows you to "sell what your customers want, when they want it!". Search Engine Optimization is the science of customizing elements of your web site to achieve the best possible search engine ranking. That's really all there is to search engine optimization. But as simple as it sounds, don't let it fool you. Both internal and external elements of the site affect the way it's ranked in any given search engine, so all of these elements should be taken into consideration. Good Search Engine Optimization can be very difficult to achieve, and great Search Engine Optimization seems pretty well impossible at times. Optimization involves making pages readable to search engines and emphasizing key topics related to your content. Basic optimization may involve nothing more than ensuring that a site does not unnecessarily become part of the invisible Web (the portion of the Web not accessible through Web search engines).

II. EXISTING SYSTEM

Previously, SEO has been implemented in static commercial / non-commercial web sites. In such implementations there is no dynamic site map, no well-defined RSS feed, and no specific way to find the back links.

A. Dirty URLs

Complex, hard-to-read URLs are often dubbed dirty URLs because they tend to be littered with punctuation and identifiers that are at best irrelevant to the ordinary user. URLs such as http://www.example.com/cgi-bin/gen.pl?id=4&view=basic are commonplace in today's dynamic web. Unfortunately, dirty URLs have a variety of troubling aspects, including:

B. Dirty URLs are difficult to type

The length, use of punctuation, and complexity of these URLs make typos commonplace.

C. Dirty URLs do not promote usability

Because dirty URLs are long and complex, they are difficult to repeat or remember and provide few clues for average users as to what a particular resource actually contains or the function it performs.

D. Dirty URLs are a security risk

The query string which follows the question mark (?) in a dirty URL is often modified by hackers in an attempt to perform a front door attack on a web application. The very file extensions used in complex URLs, such as .asp, .jsp, .pl, and so on, also give away valuable information about the implementation of a dynamic web site that a potential hacker may utilize.

E. Dirty URLs impede abstraction and maintainability

Because dirty URLs generally expose the technology used (via the file extension) and the parameters used (via the query string), they do not promote abstraction. Instead of hiding such implementation details, dirty URLs expose the underlying "wiring" of a site. As a result, changing from one technology to another is a difficult and painful process filled with the potential for broken links and numerous required redirects.

III. RELATED WORKS

Three technologies have been used: 1. User Friendly URLs, 2. URL Redirector and 3. HTML Generic. A Model View Controller Microsoft .NET web application with ASP.NET and C# has been used. In this application the data model and business layer are kept in a separate module, like a DLL (Dynamic Link Library), and dynamic URLs are created and converted into static URLs. Fig 5 – Fig 8 show the implementation of the technologies mentioned above. In the URL converting code, we first grab the incoming URL and split off the page's extension. Pages that have the ".html" extension are redirected to the related ".aspx" page, where the code-behind executes the business logic, data manipulation, or whatever functionality is needed, and displays to the end user the exact content for that particular page with the proper Meta description and keywords. The user only ever sees the ".html" page, but all the other logic executes in the code-behind.

A. Dynamic Content and SEO

SEO for dynamic content poses a few significant challenges. Luckily, you have ways to overcome these challenges to have a fully functional dynamic site that is optimized as much as a static site can be optimized. This section discusses the pitfalls of dynamic sites, and how to overcome them to create fully optimized dynamic sites.

B. Challenges for Optimizing Dynamic Content

Here are some common areas of dynamic sites that provide setbacks for humans as well as search engine spiders.
1) Dynamic URLs

A Dynamic URL is the address of a dynamic web page, as opposed to a Static URL, which is the address of a static web page. Dynamic URLs are typically fairly cryptic in their appearance. Here's an example from http://www.financialadvisormatch.com/ for a product called Kindle:

http://www.financialadvisormatch.com/article/product/B000FI73MA/ref=amb_link_7646122_1?pf_rd_m=ATVPDKIKX0DER&pf_rd_s=center-1&pf_rd_r=1FYB35NGH8MSMESECBX7&pf_rd_t=101&pf_rd_p=450995701&pf_rd_i=507846

Notice that the URL doesn't contain any information about the item's product type, or anything about the item's name. For a well-trusted site like Amazon, this is not a problem at all. But for a new site, or for a site that's gaining credibility and popularity, a better solution can help search results by showing a searcher some relevant keywords in the page's URL. Here's an example of something a little more effective:

http://www.financialadvisormatch.com/article/products/electronics/kindle/

While search engines may not have problems indexing URLs with variables, it's important to note that highly descriptive URLs like the one just shown can get more clicks in searches than cryptic URLs, if searchers can clearly see keywords related to the content they're looking for in your page's URL.

2) Logins and other forms

Login forms can restrict access to pages not only for users, but also for search engines. In some cases, you want pages behind logins made searchable. In those cases, you can place code in those pages that determines whether the visitor has access to view that content, and decides what to do from there.
Fig – 1 Login and search engine validations
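Both directions of URL handling discussed above — serving a friendly address from a dynamic page (the ".html"-to-".aspx" redirection under Related Works) and building keyword-bearing paths instead of cryptic query strings (Dynamic URLs) — can be sketched in a few lines. This is an illustrative Python sketch, not the paper's ASP.NET/C# implementation; the mapping table and all names here are invented for the example:

```python
import re
from urllib.parse import urlsplit

# Hypothetical table mapping friendly addresses to internal dynamic pages.
REWRITE_MAP = {
    "/article/products/electronics/kindle/": "/article.aspx?id=B000FI73MA",
}

def slugify(text):
    """Lowercase, drop punctuation, hyphenate words for use in a path."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

def descriptive_url(category, product):
    """Build a keyword-bearing path instead of an opaque query string."""
    return f"/article/products/{slugify(category)}/{slugify(product)}/"

def rewrite(url):
    """Map an incoming friendly URL to the dynamic page that actually
    handles it; unmapped paths pass through unchanged."""
    path = urlsplit(url).path
    return REWRITE_MAP.get(path, path)
```

Here descriptive_url("Electronics", "Kindle") yields "/article/products/electronics/kindle/", and rewrite() hands that address to the internal dynamic handler, so the visitor and the search engine only ever see the friendly URL.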
3) Cookies

Other web forms, referring to content in