Designing & Implementing a Java Web Application to Interact With

Designing & Implementing a Java Web Application to Interact With

St. Cloud State University theRepository at St. Cloud State Culminating Projects in Information Assurance Department of Information Systems 12-2016 Designing & Implementing a Java Web Application to Interact with Data Stored in a Distributed File System Punith Reddy Etikala St. Cloud State University, [email protected] Follow this and additional works at: https://repository.stcloudstate.edu/msia_etds Recommended Citation Etikala, Punith Reddy, "Designing & Implementing a Java Web Application to Interact with Data Stored in a Distributed File System" (2016). Culminating Projects in Information Assurance. 11. https://repository.stcloudstate.edu/msia_etds/11 This Starred Paper is brought to you for free and open access by the Department of Information Systems at theRepository at St. Cloud State. It has been accepted for inclusion in Culminating Projects in Information Assurance by an authorized administrator of theRepository at St. Cloud State. For more information, please contact [email protected]. Designing & Implementing a Java Web Application to Interact with Data Stored in a Distributed File System by Punith Reddy Etikala A Starred Paper Submitted to the Graduate Faculty of St. Cloud State University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Assurance December, 2016 Starred Paper Committee: Dr. Dennis Guster, Chairperson Dr. Lynn Collen Dr. Keith Ewing 2 Abstract Every day there is an exponential increase of information and this data must be stored and analyzed. Traditional data warehousing solutions are expensive. Apache Hadoop is a popular open source data store which implements map-reduce concepts to create a distributed database architecture. In this paper, a performance analysis project was devised that compares Apache Hive, which is built on top of Apache Hadoop, with a traditional database such as MySQL. Hive supports HiveQueryLanguage, a SQL like directive language which implements MapReduce jobs. These jobs can then be executed using Hadoop. Hive also has a system catalog – Metastore which is used to index data components. The Hadoop framework is developed to include a duplication detection system which helps managing multiple copies of the same data at the file level. The Java Server Pages and Java Servlet framework were used to build a Java web application to provide a web interface for the clients to access and analyze large data sets present in Apache Hive or MySQL databases. 3 Acknowledgement This research paper about designing and implementing Java web applications to interact with data stored in a distributed file system was undertaken using resources provided by the Business Computing Research Laboratory of St. Cloud State University. Data used for the analyses came from the St. Cloud State University library. 4 Table of Contents Page List of Tables ........................................................................................................................ 7 List of Figures ........................................................................................................................ 8 Chapter 1: Introduction .......................................................................................................... 11 Introduction ................................................................................................................ 11 Problem Statement ..................................................................................................... 14 Definition of Terms.................................................................................................... 15 Chapter 2: Literature Review and Background ..................................................................... 16 Challenges of working with Big Data ........................................................................ 16 Need for Distributed File System .............................................................................. 17 Architectures to support Big Data .............................................................................. 18 Performance Issues with Big Data ............................................................................. 18 Advantages of Hadoop and Map Reduce ................................................................... 22 Challenges of Effectively Integrating a Web with Hadoop and Map Reduce ........... 23 Chapter 3: Methodology ........................................................................................................ 26 Design of the Study .................................................................................................... 26 Web Application Technical Architecture .................................................................. 27 Data Collection .......................................................................................................... 28 Tools and Techniques ................................................................................................ 34 Hardware Environment .............................................................................................. 37 Software Environment ............................................................................................... 38 5 Page Chapter 4: Implementation .................................................................................................... 39 Installing Java ............................................................................................................ 39 Installing SSH ............................................................................................................ 39 Installing and Configuring Apache Tomcat ............................................................... 40 Installing and Configuring MySQL Server 5.6.......................................................... 41 Disabling IPv6 ........................................................................................................... 42 Installing and Configuring Apache Sqoop ................................................................. 42 Installing and Configuring Apache Hive ................................................................... 44 Installing and Configuring Apache Hadoop .............................................................. 47 Enabling Secure Connection using SSH .................................................................... 52 Configuring Hostname ............................................................................................... 53 Loading Twitter Data from Text file to MySQL ....................................................... 54 Importing Data from MySQL to HDFS ..................................................................... 54 Creating Tables in Hive ............................................................................................. 55 Loading Data from HDFS to Hive Tables ................................................................. 61 Chapter 5: Analysis and Results ............................................................................................ 62 Access to Hadoop Cluster .......................................................................................... 62 Access to Apache Tomcat .......................................................................................... 70 Access to Java Web Application................................................................................ 72 UML Diagrams .......................................................................................................... 82 Summary .................................................................................................................... 85 6 Page Chapter 6: Conclusion and Future Work ............................................................................... 88 Conclusion ................................................................................................................. 88 Future Work ............................................................................................................... 88 References……. ..................................................................................................................... 89 Appendix………… ................................................................................................................ 93 7 List of Tables Table Page 1. Definition of Terms.................................................................................................... 15 2. Students table in MySQL ........................................................................................... 29 3. Majors table in MySQL ............................................................................................. 30 4. CirculationLog table in MySQL ................................................................................ 31 5. Virtual Machine Details ............................................................................................. 37 6. Students table in Hive ................................................................................................ 57 7. Majors table in Hive................................................................................................... 58 8. CirculationLog table in Hive ..................................................................................... 59 9. TwitterAnalysis table in MySQL and Hive ............................................................... 60 10. Comparison of Computation time of Hive vs MySQL .............................................

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    186 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us