Hadoop 2.0 Introduction – with HDP for Windows

Hadoop 2.0 Introduction – with HDP for Windows
Seele Lin

Who am I
• Speaker:
  – 林彥辰, a.k.a. Seele Lin
  – Mail: [email protected]
• Experience:
  – 2010–Present
  – 2013–2014: Trainer for Hortonworks Certified Training lectures
    • HCAHD (Hortonworks Certified Apache Hadoop Developer)
    • HCAHA (Hortonworks Certified Apache Hadoop Administrator)

Agenda
• What is Big Data
  – The Need for Hadoop
• Hadoop Introduction
  – What is Hadoop 2.0
• Hadoop Architecture Fundamentals
  – What is HDFS
  – What is MapReduce
  – What is YARN
  – Hadoop eco-systems
• HDP for Windows
  – What is HDP
  – How to install HDP on Windows
  – The advantages of HDP
  – What's Next
• Conclusion
• Q&A

What is Big Data?
1. In what timeframe do we now create the same amount of information that we created from the dawn of civilization until 2003? Answer: every 2 days.
2. 90% of the world's data was created in the last how many years? Answer: 2 years.
3. And this is data from a 2010 report!
Sources:
http://www.itbusinessedge.com/cm/blogs/lawson/just-the-stats-big-numbers-about-big-data/?cs=48051
http://techcrunch.com/2010/08/04/schmidt-data/

How large can it be?
• 1 ZB = 1,000 EB = 1,000,000 PB = 1,000,000,000 TB

Every minute…
• http://whathappensontheinternetin60seconds.com/

The definition?
"Big Data is like teenage sex: Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it too." ── Dan Ariely

Big Data Includes All Types of Data
• Structured (a database): pre-defined schema; relational database system; stored as rows in a single table.
• Semi-structured (a set of files): inconsistent structure; cannot be stored as rows in a single table; often has nested structure (e.g., logs, tweets).
• Unstructured (a single file): irregular structure, or parts of it lack structure (e.g., pictures, video).
• Any of it may also be time-sensitive and immutable.

6 Key Hadoop Data Types
1. Sentiment – how your customers feel
2. Clickstream – website visitors' data
3. Sensor/Machine – data from remote sensors and machines
4. Geographic – location-based data
5. Server Logs
6. Text – millions of web pages, emails, and documents

4 V's of Big Data
• http://www.datasciencecentral.com/profiles/blogs/data-veracity

Next Product to Buy (NPTB)
Business Problem
• Telecom product portfolios are complex
• There are many cross-sell opportunities to the installed base
• Sales associates use in-person conversations to guess at NPTB recommendations, with little supporting data
Solution
• Hadoop gives telcos the ability to make confident NPTB recommendations, based on data from all of their customers
• Confident NPTB recommendations empower sales associates and improve their interactions with customers
• Use the HDP data lake to reduce sales friction and create an NPTB advantage like Amazon's advantage in eCommerce

Use case – Walmart prediction
• Diapers + Beer + Friday → Revenue?
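The Walmart example reduces to counting how often items appear together in the same basket. Below is a minimal, illustrative Java sketch of that counting step; the class name, the toy transactions, and the support threshold are all invented, and a real pipeline would run the same logic as a distributed batch job (e.g., MapReduce) over billions of receipts.

```java
import java.util.*;

// Toy market-basket co-occurrence counter, in the spirit of the
// diapers-and-beer story. The transactions below are invented; a real
// pipeline would run the same counting as a distributed batch job.
public class CoOccurrence {
    public static void main(String[] args) {
        List<Set<String>> baskets = List.of(
                Set.of("diapers", "beer", "chips"),
                Set.of("diapers", "beer"),
                Set.of("milk", "bread"),
                Set.of("diapers", "beer", "milk"));

        Map<String, Integer> pairCounts = new HashMap<>();
        for (Set<String> basket : baskets) {
            List<String> items = new ArrayList<>(basket);
            Collections.sort(items);                  // canonical pair ordering
            for (int i = 0; i < items.size(); i++)
                for (int j = i + 1; j < items.size(); j++)
                    pairCounts.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
        }

        // Frequently co-occurring pairs are candidates for promotions.
        pairCounts.entrySet().stream()
                .filter(e -> e.getValue() >= 2)       // invented support threshold
                .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

Pairs with high co-occurrence counts, such as diapers and beer, become candidates for shelf placement and promotions.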
Localized, Personalized Promotions
Business Problem
• Telcos can geo-locate their mobile subscribers
• They could create localized and personalized promotions
• This requires connections with both deep historical data and real-time streaming data
• Those connections have been expensive and complicated
Solution
• Hadoop brings the data together to inexpensively localize and personalize promotions delivered to mobile devices
• Notify subscribers about local attractions, events and sales that align with their preferences and location
• Telcos can sell these promotional services to retailers

360° View of the Customer
Business Problem
• Retailers interact with customers across multiple channels
• Customer interaction and purchase data is often siloed
• Few retailers can correlate customer purchases with marketing campaigns and online browsing behavior
• Merging data in relational databases is expensive
Solution
• Hadoop gives retailers a 360° view of customer behavior
• Store data longer & track phases of the customer lifecycle
• Gain competitive advantage: increase sales, reduce supply chain expenses and retain the best customers

Use case – Target
• Target mined their customer data and sent coupons to shoppers with a high "pregnancy prediction" score.
• One angry father stormed into a Target to yell at them for sending his daughter coupons for baby clothes and cribs.
• Guess what: she was pregnant, and hadn't told her father yet.

Changes in Analyzing Data
Big data is fundamentally changing the way we analyze information:
– Ability to analyze vast amounts of data rather than evaluating sample sets.
– Historically we had to look at causes. Now we can look at patterns and correlations in data that give us a much better perspective.

Recent cases 1:
• http://www.ibtimes.co.uk/global-smartphone-data-traffic-increase-eightfold-17-exabyte-2020-1475571
• Practice on LINE: http://tech.naver.jp/blog/?p=2412

Recent cases 2: in Taiwan
– Media analysis of the 2014 Taipei City mayoral election
• IMHO, 黑貘來說: http://gene.speaking.tw/2014/11/blog-post_28.html
• 破解社群與APP行銷: http://taiwansmm.wordpress.com/2014/11/26/行銷絕不等於買廣告-2014年台北市長選舉柯文哲與連/

Scale up or Scale out?

Guess what…
• Traditionally, computation has been processor-bound
• For decades, the primary push was to increase the computing power of a single machine
  – Faster processor, more RAM
• Distributed systems evolved to allow developers to use multiple machines for a single job
  – At compute time, data is copied to the compute nodes

Scaling with a traditional database
• Scaling with a queue
• Sharding the database (see the shard-routing sketch below)
• Fault-tolerance issues
• Corruption issues
• Problems compound as the system grows
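Manual sharding is where the operational pain usually starts. Here is a minimal sketch of hash-based shard routing; the ShardRouter class and the fixed shard count are assumptions for illustration only.

```java
// Minimal hash-based shard router. The class name and shard count are
// invented for illustration.
public class ShardRouter {
    private final int shardCount;

    public ShardRouter(int shardCount) {
        this.shardCount = shardCount;
    }

    /** Maps a record key to one of the database shards. */
    public int shardFor(String key) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(4);
        System.out.println(router.shardFor("user:42"));
        // Growing the cluster means a new ShardRouter(5), which remaps
        // almost every key -- the resharding problem mentioned above.
    }
}
```

The code makes the fault-tolerance and corruption bullets concrete: because the application owns the routing logic, adding a shard, losing a shard, or a bug that writes to the wrong shard all become the application's problem.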
• "What is this person's name?” • "How many friends does this person have?” • A bank account web page answers questions like "What is my current balance?” • "What transactions have occurred on my account recently?" First principles(2/2) • Data is often used interchangeably with the word "information". • You answer questions on your data by running functions that take data as input • The most general purpose data system can answer questions by running functions that take in the as input. In fact, any query can be answered entire dataset by running a function on the complete dataset Desired Properties of a Big Data System • Robust and fault-tolerant • Low latency reads and updates • Scalable • General • Extensible • Allow ad hoc queries • Minimal maintenance The Lambda Architecture • There is no single tool that provides a complete solution. • You have to use The Lambda Architecture a variety of tools and techniques to build a complete Big Data system. • solves the problem of computing arbitrary functions on arbitrary data in realtime by decomposing the problem into three layers • batch layer, the serving layer, and the speed layer The Lambda Architecture model Batch Layer 1 • The batch layer stores the master copy of the dataset and precomputes batch views on that master dataset. • The master dataset can be thought of us a very large list of records. • two things : – store an immutable, constantly growing master dataset, – and compute arbitrary functions on that dataset. • If you're going to precompute views on a dataset, you need to be able to do so for any view and any dataset. • There's a class of systems called "batch processing systems" that are built to do exactly what the batch layer requires Batch Layer(2 What is Batch View • Everything starts from the "query = function(all data)" equation. • You could literally run your query functions on the fly on the complete dataset to get the results. • it would take a huge amount of resources to do and would be unreasonably expensive • Instead of computing the query on the fly, you read the results from the precomputed view. How we get Batch View Serving Layer 1 Serving Layer 2 • The batch layer emits batch views as the result of its functions. • The next step is to load the views somewhere so that they can be queried. • The serving layer indexes the batch view and loads it up so it can be efficiently queried to get particular values out of the view. Batch and serving layers satisfy almost all properties • Robust and fault tolerant • Scalable • General • Extensible • Allows ad hoc queries • Minimal maintenance Speed Layer 1 Speed Layer 2 • speed layer as similar to the batch layer in that it produces views based on data it receives. • One big difference is that in order to achieve the fastest latencies possible, the speed layer doesn't look at all the new data at once. • it updates the realtime view as it receives new data instead of recomputing them like the batch layer does. • "incremental updates” vs "recomputation updates". • Page view example Speed Layer 3 • complexity isolation:complexity is pushed into a layer whose results are only temporary • The last piece of the Lambda Architecture is merging the resutlts from the batch and realtime views Summary of the Lambda Architecture Summary of the Lambda Architecture