Course Introduction
Apache Hadoop is one of the most popular tools for big data processing. It has been successfully deployed in production by many companies for several years. Though Hadoop is considered a reliable, scalable, and cost-effective solution, it is constantly being improved by a large community of developers. As a result, the 2.0 version offers several revolutionary features, including Yet Another Resource Negotiator (YARN), HDFS Federation, and high availability, which make the Hadoop cluster much more efficient, powerful, and reliable.
The most serious limitations of classical MapReduce relate to scalability, resource utilization, and support for workloads other than MapReduce. In the MapReduce framework, job execution is controlled by two types of processes: a single master process called JobTracker and a number of subordinate processes called TaskTrackers.
Apache Hadoop 2.0 includes YARN, which separates the resource management and processing components. The YARN-based architecture is not constrained to MapReduce. In YARN, MapReduce is reduced to the role of one distributed application among others (though still a very popular and useful one) and is now called MRv2. MRv2 is a re-implementation of the classic MapReduce engine (now called MRv1) that runs on top of YARN.
The course reviews MapReduce 1 (MRv1) and provides insight into the design and implementation of YARN: ResourceManager instead of a cluster manager, ApplicationMaster instead of a dedicated and short-lived JobTracker, NodeManager instead of TaskTracker, and a distributed application instead of a MapReduce job.
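The division of roles described above can be illustrated with a toy model. This is only a conceptual sketch in plain Python, not the Hadoop API: the class names mirror the YARN components, but the fields (`free_containers`, `app_name`), the methods (`allocate`, `run`), and the greedy first-fit scheduling policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent; in YARN it takes the place of the MRv1 TaskTracker."""
    name: str
    free_containers: int  # hypothetical capacity counter


@dataclass
class ResourceManager:
    """Cluster-wide resource arbiter; it grants containers but runs no job logic."""
    nodes: list = field(default_factory=list)

    def allocate(self, wanted):
        """Grant up to `wanted` containers and return the hosting node names."""
        granted = []
        for node in self.nodes:
            # Greedy first-fit policy, assumed here only to keep the sketch short.
            while node.free_containers > 0 and len(granted) < wanted:
                node.free_containers -= 1
                granted.append(node.name)
        return granted


@dataclass
class ApplicationMaster:
    """Per-application coordinator; takes over the job-tracking half of JobTracker."""
    app_name: str

    def run(self, rm, tasks):
        """Request one container per task and pair each task with a node."""
        placements = rm.allocate(len(tasks))
        # Tasks that received no container are simply left out of the result.
        return dict(zip(tasks, placements))


rm = ResourceManager(nodes=[NodeManager("node1", 2), NodeManager("node2", 1)])
am = ApplicationMaster("wordcount")
print(am.run(rm, ["map-0", "map-1", "reduce-0"]))
# prints {'map-0': 'node1', 'map-1': 'node1', 'reduce-0': 'node2'}
```

The point of the sketch is the separation of concerns: the ResourceManager knows nothing about map or reduce tasks, so the same allocation path could serve any distributed application, with MapReduce (MRv2) being just one of them.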
Big Data University has been chosen by IBM as one of the issuers of badges as part of the IBM Open Badge program. Share your achievements through LinkedIn, Facebook, Twitter, and more!
Big Data University uses the services of Pearson VUE Acclaim to assist in the administration of the IBM Open Badge program. By enrolling in this course, you agree to Big Data University sharing your details with Pearson VUE Acclaim strictly for the purpose of issuing your badge once you complete the badge criteria.
Course Outline
Prerequisites
Before taking this course, you should have the following background:
Completion of Hadoop Fundamental v3 on BDU, or an equivalent course
Basic understanding of Big Data, Apache Hadoop, and HDFS
Basic Linux Operating System knowledge
Some knowledge of Java and XML
Basic understanding of the Scala, Python, or Java programming languages.
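Assessment Criteria
Courseware viewing 100%, objective exercises 0%, subjective exercises 0%, in-class discussion 0%.
The course content is updated continually, and grades are based on the content at the time the course is taken. Once you pass, you can apply for a certificate. After you apply for the certificate, the course is treated as completed and the grade will no longer change.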