Training: Hadoop Operations
Hadoop
42 hours
English (US)


Quick navigation:

  • Information
  • Contents
  • Features
  • More information
  • Reviews
  • FAQ

Product information

In this Hadoop training course you are introduced to Hadoop clusters in depth. The course covers designing Hadoop clusters as well as running Hadoop in the cloud. It then continues with deploying and securing Hadoop clusters. The course concludes with capacity management, Cloudera Manager, and performance tuning of Hadoop clusters.

Topics covered include EC2, AWS, EMR jobs, Hive daemons, installing Hadoop, NameNode failure, DataNode, YARN, HDFS, event management, JobHistoryServer logs, performance tuning, and much more.

Contents of the training

Hadoop Operations

42 hours

Designing Hadoop Clusters

  • start the course
  • describe the principles of supercomputing
  • recall the roles and skills needed for the Hadoop engineering team
  • recall the advantages and shortcomings of using Hadoop as a supercomputing platform
  • describe the three axioms of supercomputing
  • describe the dumb hardware/smart software and share-nothing design principles
  • describe the design principles for move processing not data, embrace failure, and build applications not infrastructure
  • describe the different rack architectures for Hadoop
  • describe the best practices for scaling a Hadoop cluster
  • recall the best practices for different types of network clusters
  • recall the primary responsibilities for the master, data, and edge servers
  • recall some of the recommendations for a master server and edge server
  • recall some of the recommendations for a data server
  • recall some of the recommendations for an operating system
  • recall some of the recommendations for hostnames and DNS entries
  • describe the recommendations for HDD
  • calculate the correct number of disks required for a storage solution
  • compare the use of commodity hardware with enterprise disks
  • plan for the development of a Hadoop cluster
  • set up flash drives as boot media
  • set up a kickstart file as boot media
  • set up a network installer
  • identify the hardware and networking recommendations for a Hadoop cluster
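
To illustrate the disk-sizing objective in this module, here is a minimal Python sketch of a common back-of-the-envelope calculation; the figures (100 TB of source data, 3x replication, 25% overhead for temporary data, 4 TB disks) are illustrative assumptions, not values from the course.

    import math

    # Rough disk-count estimate for an HDFS storage solution (illustrative assumptions only)
    raw_data_tb = 100        # amount of source data to store, in TB (assumption)
    replication = 3          # default HDFS replication factor
    temp_overhead = 1.25     # ~25% extra for intermediate/temporary data (assumption)
    disk_size_tb = 4         # capacity of a single data disk, in TB (assumption)

    required_capacity_tb = raw_data_tb * replication * temp_overhead
    disks_needed = math.ceil(required_capacity_tb / disk_size_tb)

    print(f"Total capacity needed: {required_capacity_tb:.0f} TB")
    print(f"Disks of {disk_size_tb} TB required: {disks_needed}")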

Hadoop in the Cloud

  • start the course
  • describe how cloud computing can be used as a solution for Hadoop
  • recall some of the most common services of the EC2 service bundle
  • recall some of the most common services that Amazon offers
  • describe how the AWS credentials are used for authentication
  • create an AWS account
  • describe the use of AWS access keys
  • describe AWS identification and access management
  • set up AWS IAM
  • describe the use of SSH key pairs for remote access
  • set up S3 and import data
  • provision a micro instance of EC2
  • prepare to install and configure a Hadoop cluster on AWS
  • create an EC2 baseline server
  • create an Amazon machine image
  • create an Amazon cluster
  • describe what the command line interface is used for
  • use the command line interface
  • describe the various ways to move data into AWS
  • recall the advantages and limitations of using Hadoop in the cloud
  • recall the advantages and limitations of using AWS EMR
  • describe EMR end-user connections and EMR security levels
  • set up an EMR cluster
  • run an EMR job from the web console
  • run an EMR job with Hue
  • run an EMR job with the command line interface
  • write an Elastic MapReduce script for AWS
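
As a companion to the EMR objectives in this module, the sketch below launches a small EMR cluster with the boto3 library. It assumes AWS credentials are already configured and that the default EMR IAM roles exist in the account; the cluster name, release label, instance types, and "my-keypair" are placeholder values.

    import boto3

    # Minimal sketch: provision a small EMR cluster (placeholder names and sizes)
    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="hadoop-ops-demo",                     # hypothetical cluster name
        ReleaseLabel="emr-6.10.0",                  # example release label
        Applications=[{"Name": "Hadoop"}, {"Name": "Hive"}],
        Instances={
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "KeepJobFlowAliveWhenNoSteps": True,
            "Ec2KeyName": "my-keypair",             # hypothetical SSH key pair
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print("Cluster id:", response["JobFlowId"])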

Deploying Hadoop Clusters

  • start the course
  • describe configuration management tools
  • simulate a configuration management tool
  • build an image for a baseline server
  • build an image for a data server
  • build an image for a master server
  • provision an admin server
  • describe the layout and structure of the Hadoop cluster
  • provision a Hadoop cluster
  • distribute configuration files and admin scripts
  • use init scripts to start and stop a Hadoop cluster
  • configure a Hadoop cluster
  • configure logging for the Hadoop cluster
  • build images for required servers in the Hadoop cluster
  • configure a MySQL database
  • build the Hadoop clients
  • configure Hive daemons
  • test the functionality of Flume, Sqoop, HDFS, and MapReduce
  • test the functionality of Hive and Pig
  • configure HCatalog daemons
  • configure Oozie
  • configure Hue and Hue users
  • install Hadoop onto the admin server
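
The configuration-distribution step in this module can be scripted in many ways; the sketch below pushes a common configuration directory to each node over SSH and restarts the DataNode daemon. The hostnames, paths, and the service name hadoop-hdfs-datanode are assumptions (the service name depends on how Hadoop was packaged).

    import subprocess

    # Minimal sketch: distribute Hadoop configuration files and restart the DataNode
    nodes = ["data01.example.com", "data02.example.com", "data03.example.com"]
    conf_dir = "/etc/hadoop/conf"

    for node in nodes:
        # copy the XML configuration files to the node (assumes the SSH user may write there)
        subprocess.run(f"scp {conf_dir}/*.xml {node}:{conf_dir}/", shell=True, check=True)
        # restart the DataNode so it picks up the new configuration
        subprocess.run(["ssh", node, "sudo systemctl restart hadoop-hdfs-datanode"], check=True)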

Hadoop Cluster Availability

  • start the course
  • describe how Hadoop leverages fault tolerance
  • recall the most common causes for NameNode failure
  • recall the uses for the Checkpoint node
  • test the availability for the NameNode
  • describe the operation of the NameNode during a recovery
  • swap to a new NameNode
  • recall the most common causes for DataNode failure
  • test the availability for the DataNode
  • describe the operation of the DataNode during a recovery
  • set up the DataNode for replication
  • identify and recover from a missing data block scenario
  • describe the functions of Hadoop high availability
  • edit the Hadoop configuration files for high availability
  • set up a high availability solution for NameNode
  • recall the requirements for enabling an automated failover for the NameNode
  • create an automated failover for the NameNode
  • recall the most common causes for YARN task failure
  • describe the functions of YARN containers
  • test YARN container reliability
  • recall the most common causes of YARN job failure
  • test application reliability
  • describe the system view of the Resource Manager configurations set for high availability
  • set up high availability for the Resource Manager
  • move the Resource Manager HA to alternate master servers
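
For the NameNode high-availability objectives above, a quick way to check which NameNode is active is the hdfs haadmin command. The sketch below wraps it in Python; the service ids nn1 and nn2 are the conventional example names from hdfs-site.xml and are assumptions here.

    import subprocess

    # Minimal sketch: report the HA state of both NameNodes
    def ha_state(nn_id):
        result = subprocess.run(["hdfs", "haadmin", "-getServiceState", nn_id],
                                capture_output=True, text=True)
        return result.stdout.strip() or result.stderr.strip()

    for nn in ("nn1", "nn2"):
        print(nn, "->", ha_state(nn))   # expected output: "active" or "standby"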

Securing Hadoop Clusters

  • start the course
  • describe the four pillars of the Hadoop security model
  • recall the ports required for Hadoop and how network gateways are used
  • set up security groups for AWS
  • describe Kerberos and recall some of the common commands
  • diagram Kerberos and label the primary components
  • prepare for a Kerberos installation
  • install Kerberos
  • configure Kerberos
  • describe how to configure HDFS and YARN for use with Kerberos
  • configure HDFS for Kerberos
  • configure YARN for Kerberos
  • describe how to configure Hive for use with Kerberos
  • configure Hive for Kerberos
  • describe how to configure Pig, Sqoop, and Oozie for use with Kerberos
  • configure Pig and HTTPFS for use with Kerberos
  • configure Oozie for use with Kerberos
  • configure Hue for use with Kerberos
  • describe how to configure Flume for use with Kerberos
  • describe the security model for users on a Hadoop cluster
  • describe the use of POSIX and ACL for managing user access
  • create access control lists
  • describe how to encrypt data in motion for Hadoop, Sqoop, and Flume
  • encrypt data in motion
  • describe how to encrypt data at rest
  • recall the primary security threats faced by the Hadoop cluster
  • describe how to monitor Hadoop security
  • configure HBase for Kerberos
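
As a small illustration of working on a Kerberized cluster, the sketch below obtains a ticket from a keytab and then runs an HDFS command as that principal. The principal name and keytab path are hypothetical.

    import subprocess

    # Minimal sketch: authenticate with Kerberos before touching HDFS
    principal = "hdfs/admin.example.com@EXAMPLE.COM"        # hypothetical principal
    keytab = "/etc/security/keytabs/hdfs.keytab"            # hypothetical keytab path

    subprocess.run(["kinit", "-kt", keytab, principal], check=True)   # obtain a ticket
    subprocess.run(["hdfs", "dfs", "-ls", "/"], check=True)           # runs as that principal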

Operating Hadoop Clusters

  • start the course
  • monitor and improve service levels
  • deploy a Hadoop release
  • describe the purpose of change management
  • describe rack awareness
  • write configuration files for rack awareness
  • start and stop a Hadoop cluster
  • write init scripts for Hadoop
  • describe the tools fsck and dfsadmin
  • use fsck to check the HDFS file system
  • set quotas for the HDFS file system
  • enable and configure the HDFS trash feature
  • manage an HDFS DataNode
  • use include and exclude files to replace a DataNode
  • describe the operations for scaling a Hadoop cluster
  • add a DataNode to a Hadoop cluster
  • describe the process for balancing a Hadoop cluster
  • balance a Hadoop cluster
  • describe the operations involved for backing up data
  • use distcp to copy data from one cluster to another
  • describe MapReduce job management on a Hadoop cluster
  • perform MapReduce job management on a Hadoop cluster
  • plan an upgrade of a Hadoop cluster
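
The fsck, dfsadmin, and distcp objectives in this module map directly to command-line tools; the sketch below strings a few of them together from Python. The cluster URIs are illustrative assumptions.

    import subprocess

    # Minimal sketch of routine operational checks and a cross-cluster copy
    subprocess.run(["hdfs", "fsck", "/", "-files", "-blocks"], check=True)    # file system health
    subprocess.run(["hdfs", "dfsadmin", "-report"], check=True)               # capacity and DataNode status
    subprocess.run(["hadoop", "distcp",
                    "hdfs://cluster-a:8020/data",                             # hypothetical source cluster
                    "hdfs://cluster-b:8020/backup/data"], check=True)         # hypothetical backup cluster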

Stabilizing Hadoop Clusters

  • start the course
  • describe the importance of event management
  • describe the importance of incident management
  • describe the different methodologies used for root cause analysis
  • recall what Ganglia is and what it can be used for
  • recall how Ganglia monitors Hadoop clusters
  • install Ganglia
  • describe Hadoop Metrics2
  • install Hadoop Metrics2 for Ganglia
  • describe how to use Ganglia to monitor a Hadoop cluster
  • use Ganglia to monitor a Hadoop cluster
  • recall what Nagios is and what it can be used for
  • install Nagios
  • manage Nagios contact records
  • manage Nagios Push
  • use Nagios commands
  • use Nagios to monitor a Hadoop cluster
  • use Hadoop Metrics2 for Nagios
  • describe how to manage logging levels
  • describe how to configure Hadoop jobs for logging
  • describe how to configure log4j for Hadoop
  • describe how to configure JobHistoryServer logs
  • configure Hadoop logs
  • describe the problem management lifecycle
  • recall some of the best practices for problem management
  • describe the categories of errors for a Hadoop cluster
  • conduct a root cause analysis on a major problem
  • use different monitoring tools to identify problems, failures, errors and solutions
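
Monitoring tools such as Ganglia and Nagios typically read the same metrics that Hadoop exposes over its JMX servlet. The sketch below polls the NameNode's /jmx endpoint directly; the hostname, port 9870 (the Hadoop 3.x default), and the exact metric names are assumptions and can differ per Hadoop version.

    import requests

    # Minimal sketch: read a few NameNode health metrics from the JMX servlet
    url = "http://namenode.example.com:9870/jmx"
    query = {"qry": "Hadoop:service=NameNode,name=FSNamesystem"}
    bean = requests.get(url, params=query, timeout=10).json()["beans"][0]

    print("Missing blocks:", bean["MissingBlocks"])
    print("Capacity used :", 100 * bean["CapacityUsed"] / bean["CapacityTotal"], "%")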

Capacity Management for Hadoop Clusters

  • start the course
  • compare the trade-offs between availability and performance
  • describe different strategies of resource capacity management
  • describe how schedulers manage cluster resources
  • set quotas for the HDFS file system
  • recall how to set the maximum and minimum memory allocations per container
  • describe how the fair scheduling method allows all applications to get equal amounts of resource time
  • describe the primary algorithm and the configuration files for the Fair Scheduler
  • describe the default behavior of the Fair Scheduler methods
  • monitor the behavior of Fair Share
  • describe the policy for single resource fairness
  • describe how resources are distributed over the total capacity
  • identify different configuration options for single resource fairness
  • configure single resource fairness
  • describe the minimum share function of the Fair Scheduler
  • configure minimum share on the Fair Scheduler
  • describe the preemption functions of the Fair Scheduler
  • configure preemption for the Fair Scheduler
  • describe dominant resource fairness
  • write service levels for performance
  • use the Fair Scheduler with multiple users
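
To make the dominant resource fairness objective concrete, the sketch below simulates the DRF idea: each user's dominant share is the largest fraction of any single cluster resource their tasks use, and the next task always goes to the user with the smallest dominant share. The cluster size and per-task demands are illustrative textbook-style numbers, not course data.

    # Minimal DRF simulation (illustrative numbers)
    cluster = {"cpu": 9, "memory_gb": 18}
    demand = {"A": {"cpu": 1, "memory_gb": 4},    # user A: memory-heavy tasks
              "B": {"cpu": 3, "memory_gb": 1}}    # user B: CPU-heavy tasks
    tasks = {"A": 0, "B": 0}
    used = {"cpu": 0, "memory_gb": 0}

    def dominant_share(user):
        return max(tasks[user] * demand[user][r] / cluster[r] for r in cluster)

    while True:
        user = min(tasks, key=dominant_share)             # most "starved" user goes next
        need = demand[user]
        if any(used[r] + need[r] > cluster[r] for r in cluster):
            break                                         # simplification: stop when that user no longer fits
        for r in cluster:
            used[r] += need[r]
        tasks[user] += 1

    print(tasks)   # -> {'A': 3, 'B': 2}: both end up with a dominant share of 2/3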

Performance Tuning of Hadoop Clusters

  • start the course
  • recall the three main functions of service capacity
  • describe different strategies of performance tuning
  • list some of the best practices for network tuning
  • install compression
  • describe the configuration files and parameters used in performance tuning of the operating system
  • describe the purpose of Java tuning
  • recall some of the rules for tuning the DataNode
  • describe the configuration files and parameters used in performance tuning of memory for daemons
  • describe the purpose of memory tuning for YARN
  • recall why the NodeManager kills containers
  • performance tune memory for the Hadoop cluster
  • describe the configuration files and parameters used in performance tuning of HDFS
  • describe the sizing and balancing of the HDFS data blocks
  • describe the use of TestDFSIO
  • performance tune HDFS
  • describe the configuration files and parameters used in performance tuning of YARN
  • configure speculative execution
  • describe the configuration files and parameters used in performance tuning of MapReduce
  • tune up MapReduce for performance reasons
  • describe the practice of benchmarking on a Hadoop cluster
  • describe the different tools used for benchmarking a cluster
  • perform a benchmark of a Hadoop cluster
  • describe the purpose of application modeling
  • optimize memory and benchmark a Hadoop cluster
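
The YARN memory-tuning objectives above often start from a rule-of-thumb calculation: reserve memory for the operating system and Hadoop daemons, then derive the container count and per-container size from cores, disks, and the remaining RAM. The node specification below and the 2*cores / 1.8*disks heuristic are assumptions, not official values.

    # Minimal sketch of a common YARN container-sizing heuristic (illustrative node specs)
    node_ram_gb = 64
    cores = 16
    disks = 8
    reserved_gb = 8                  # reserved for OS, DataNode and NodeManager (assumption)
    min_container_gb = 2

    containers = int(min(2 * cores, 1.8 * disks,
                         (node_ram_gb - reserved_gb) / min_container_gb))
    container_gb = max(min_container_gb, (node_ram_gb - reserved_gb) // containers)

    print("yarn.nodemanager.resource.memory-mb  =", containers * container_gb * 1024)
    print("yarn.scheduler.minimum-allocation-mb =", container_gb * 1024)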

Cloudera Manager and Hadoop Clusters

  • start the course
  • describe what cluster management entails and recall some of the tools that can be used
  • describe different tools from a functional perspective
  • describe the purpose and functionality of Cloudera Manager
  • install Cloudera Manager
  • use Cloudera Manager to deploy a cluster
  • use Cloudera Manager to install Hadoop
  • describe the different parts of the Cloudera Manager Admin Console
  • describe the Cloudera Manager internal architecture
  • use Cloudera Manager to manage a cluster
  • manage Cloudera Manager's services
  • manage hosts with Cloudera Manager
  • set up Cloudera Manager for high availability
  • use Cloudera Manager to manage resources
  • use Cloudera Manager's monitoring features
  • manage logs through Cloudera Manager
  • improve cluster performance with Cloudera Manager
  • install and configure Impala
  • install and configure Sentry
  • implement security administration using Hive
  • perform backups, snapshots, and upgrades using Cloudera Manager
  • configure Hue with MySQL
  • import data using Hue
  • use Hue to run a Hive job
  • use Hue to edit Oozie workflows and coordinators
  • format HDFS, create an HDFS directory, import data, run a WordCount, and view the results
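
Much of what Cloudera Manager does through its console is also exposed over a REST API. The sketch below lists clusters and their services; the host, port 7180, API version v19, credentials, and field names are assumptions based on a typical installation and may differ per deployment.

    import requests

    # Minimal sketch: query the Cloudera Manager REST API for clusters and services
    base = "http://cm.example.com:7180/api/v19"        # hypothetical host and API version
    auth = ("admin", "admin")                          # placeholder credentials

    clusters = requests.get(f"{base}/clusters", auth=auth, timeout=10).json()["items"]
    for cluster in clusters:
        print("Cluster:", cluster["name"])
        services = requests.get(f"{base}/clusters/{cluster['name']}/services",
                                auth=auth, timeout=10).json()["items"]
        for service in services:
            print("  ", service["name"], "-", service.get("serviceState"))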

Features

Instructor included
Prepares for official exam
English (US)
42 hours
Hadoop
90 days of online access
HBO

More information

Target audience: software developers, web developers, database administrators
Prerequisites

Basic knowledge of cloud computing, big data, and databases is a plus.

Result

After completing this training, you will have gained extensive knowledge of Hadoop clusters.

Positive feedback from participants

Training: Leading the AI Transformation

Useful training. The ordering process went smoothly and I could start right away.

- Mike van Manen

Unlimited Learning subscription

I purchased Unlimited Learning because you get a lot of value for your money. I have only been using it for a short time, but my first impression is good.

- Floor van Dijk

Training: Leading the AI Transformation

For years, icttrainingen.nl has been our loyal partner in developing the knowledge of our IT staff. We are pleased that the icttrainingen.nl platform allows us to offer our employees customized training and a wide range of courses.

- Loranne, Team Lead at Inwork

How does it work?

1

Order the training

After you have ordered the training, you will receive a confirmation by e-mail.

2

Access to the learning platform

The e-mail contains a link that gives you access to our learning platform.

3

Start right away

You can start immediately. From now on, study wherever and whenever you like.

4

Complete the training

Complete the training successfully and receive a certificate from us!

Frequently asked questions

What payment methods can I use?

You can pay with iDEAL, PayPal, credit card, Bancontact, or by invoice. If you pay by invoice, you can start the training as soon as the payment has been received.

How long do I have access to the training?

This varies per training, but is usually 180 days. You can find this under the 'Features' heading.

Where can I go if I have questions?

You can reach our Learning & Development colleagues during office hours via support@icttrainingen.nl or by phone on 026-8402941.


Unlimited Learning

With our Unlimited concept you can make unlimited use of the trainings on the website for a fixed monthly fee.

View the benefits

Still have doubts?

Or simply a question about the training? Don't hesitate to ask. We are happy to help; that's what we're here for!

Contact options