Hadoop 2.0 Introduction – with HDP for Windows
Seele Lin

Who am I
• Speaker:
  – 林彥辰
  – A.K.A. Seele Lin
  – Mail:
    [email protected]
• Experience
  – 2010~Present
  – 2013~2014
• Trainer of Hortonworks Certified Training lectures
  – HCAHD (Hortonworks Certified Apache Hadoop Developer)
  – HCAHA (Hortonworks Certified Apache Hadoop Administrator)

Agenda
• What is Big Data
  – The Need for Hadoop
• Hadoop Introduction
  – What is Hadoop 2.0
• Hadoop Architecture Fundamentals
  – What is HDFS
  – What is MapReduce
  – What is YARN
  – Hadoop eco-systems
• HDP for Windows
  – What is HDP
  – How to install HDP on Windows
  – The advantages of HDP
  – What’s Next
• Conclusion
• Q&A

What is Big Data

What is Big Data?
1. In what timeframe do we now create the same amount of information that we created from the dawn of civilization until 2003? Answer: 2 days.
2. 90% of the world’s data was created in the last (how many years)? Answer: 2 years.
3. This is data from a 2010 report!
Sources:
http://www.itbusinessedge.com/cm/blogs/lawson/just-the-stats-big-numbers-about-big-data/?cs=48051
http://techcrunch.com/2010/08/04/schmidt-data/

How large can it be?
• 1 ZB = 1,000 EB = 1,000,000 PB = 1,000,000,000 TB

Every minute…
• http://whathappensontheinternetin60seconds.com/

The definition?
“Big Data is like teenage sex: Everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it too.” – Dan Ariely

A set of files
A database
A single file

Big Data Includes All Types of Data
• Pre-defined schema
• Inconsistent structure
• Irregular structure or.