
12.2. Installing the Hadoop FileSystem Plugin for Red Hat Storage

12.2.1. Adding the Hadoop Installer for Red Hat Storage

You must have the big-data channel added and the hadoop components installed on all the servers to use the Hadoop feature on Red Hat Storage. Run the following command on the Ambari Management Server, the YARN Master Server, and all the servers within the Red Hat Storage trusted storage pool:

# yum install rhs-hadoop rhs-hadoop-install
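
You can verify that both packages were installed on each node with rpm, for example:

# rpm -q rhs-hadoop rhs-hadoop-install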

On the YARN Master Server

The YARN Master Server is required to FUSE mount all Red Hat Storage volumes that are used with Hadoop. It must have the Red Hat Storage Client Channel enabled so that the setup_cluster script can install the Red Hat Storage Client Libraries on it.

  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:

    # subscription-manager repos --enable=rhel-6-server-rhs-client-1-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:

    # rhn-channel --add --channel=rhel-x86_64-server-rhsclient-6
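
To confirm that the client channel is available before you run the setup scripts, you can list the enabled repositories, for example:

# yum repolist enabled | grep -i rhs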

12.2.2. Configuring the Trusted Storage Pool for use with Hadoop

Red Hat Storage provides a series of utility scripts that allow you to quickly set up Red Hat Storage for use with Hadoop and install the Ambari Management Server. You must first run the Hadoop cluster configuration initial script to install the Ambari Management Server, set up the YARN Master Server to host the Resource Manager and Job History Server services for Red Hat Storage, and build a trusted storage pool if it does not exist.

You must run the script given below irrespective of whether you have an existing Red Hat Storage trusted storage pool or not.

To run the Hadoop configuration initial script:

  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.

  2. Run the hadoop cluster configuration script as given below:

    setup_cluster.sh [-y] [--hadoop-mgmt-node <node>] [--yarn-master <node>]  <node-list-spec>

    where <node-list-spec> is

    <node1>:<brickmnt1>:<blkdev1>  <node2>[:<brickmnt2>][:<blkdev2>]  [<node3>[:<brickmnt3>][:<blkdev3>]] ... [<nodeN>[:<brickmntN>][:<blkdevN>]]

    where

    • <brickmnt> is the name of the XFS mount for the above <blkdev>, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created its bricks have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. Example: if a new volume is named HadoopVol then its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.

    • <blkdev> is the name of a Logical Volume device path, for example, /dev/VG1/LV1 or /dev/mapper/VG1-LV1. Since LVM is a prerequisite for Red Hat Storage, the <blkdev> is not expected to be a raw block path, such as /dev/sdb.

    Given below is an example of running the setup_cluster.sh script on the YARN Master server and four Red Hat Storage Nodes which have the same logical volume and mount point intended to be used as a Red Hat Storage Brick.

    ./setup_cluster.sh --yarn-master yarn.hdp rhs-1.hdp:/mnt/brick1:/dev/rhs_vg1/rhs_lv1 rhs-2.hdp rhs-3.hdp rhs-4.hdp

    If a brick mount is omitted, the brick mount of the first node is used, and if a block device is omitted, the block device of the first node is used.
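
    The <blkdev> entries above assume that each storage node already has a logical volume formatted with XFS and mounted at the brick mount path. The following is a minimal sketch of preparing such a brick on one node; the volume group name, logical volume name, size, and mount point are illustrative and match the example above, and the XFS inode size of 512 is the value commonly recommended for Red Hat Storage bricks:

    # lvcreate -L 100G -n rhs_lv1 rhs_vg1
    # mkfs.xfs -i size=512 /dev/rhs_vg1/rhs_lv1
    # mkdir -p /mnt/brick1
    # mount /dev/rhs_vg1/rhs_lv1 /mnt/brick1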

12.2.3. Creating Volumes for use with Hadoop

To use an existing Red Hat Storage Volume with Hadoop, skip this section and continue with the section Adding the User Directories for the Hadoop Processes on the Red Hat Storage Volume.

Whether you have a new or existing Red Hat Storage trusted storage pool, to create a volume for use with Hadoop, the volume needs to be created in such a way as to support Hadoop workloads. The supported volume configuration for Hadoop is a Distributed Replicated volume with replica count 2. You must not name the Hadoop enabled Red Hat Storage volume as hadoop or mapredlocal.

Run the script given below to create new volumes that you intend to use with Hadoop. The script provides the necessary configuration parameters to the volume as well as updates the Hadoop Configuration to make the volume accessible to Hadoop.

  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.

  2. Run the hadoop cluster configuration script as given below:

    create_vol.sh [-y] <volName> <volMountPrefix> <node-list>

    where

    • <node-list> is: <node1>:<brickmnt> <node2>[:<brickmnt2>] <node3>[:<brickmnt3>] ... [<nodeN>[:<brickmntN>]]

    • <brickmnt> is the name of the XFS mount for the block devices used by the above nodes, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created its bricks will have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. For example, if a new volume is named HadoopVol then its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.

    The node-list for create_vol.sh is similar to the node-list-spec used by setup_cluster.sh except that a block device is not specified in create_vol.

    Given below is an example of how to create a volume named HadoopVol, using four Red Hat Storage Servers, each with the same brick mount, and mount the volume on /mnt/glusterfs:

    ./create_vol.sh HadoopVol /mnt/glusterfs rhs-1.hdp:/mnt/brick1 rhs-2.hdp rhs-3.hdp rhs-4.hdp
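
    After create_vol.sh completes, you can verify the result from any storage node; gluster volume info should report a Distributed-Replicate volume with replica count 2, and the volume should be FUSE mounted under the mount point given above:

    # gluster volume info HadoopVol
    # df -h /mnt/glusterfs/HadoopVol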

12.2.4. Adding the User Directories for the Hadoop Processes on the Red Hat Storage Volume

After creating the volume, you need to set up the user directories for all the Hadoop ecosystem component users that you created in the prerequisites section. This is required for completing the Ambari distribution successfully.

Perform the steps given below only when the volume is created and enabled to be used with Hadoop.

Open the terminal window of the Red Hat Storage server within the trusted storage pool and run the following commands:

# mkdir /mnt/glusterfs/HadoopVol/user/mapred
# mkdir /mnt/glusterfs/HadoopVol/user/yarn
# mkdir /mnt/glusterfs/HadoopVol/user/hcat
# mkdir /mnt/glusterfs/HadoopVol/user/hive
# mkdir /mnt/glusterfs/HadoopVol/user/ambari-qa

# chown ambari-qa:hadoop /mnt/glusterfs/HadoopVol/user/ambari-qa
# chown hive:hadoop /mnt/glusterfs/HadoopVol/user/hive
# chown hcat:hadoop /mnt/glusterfs/HadoopVol/user/hcat
# chown yarn:hadoop /mnt/glusterfs/HadoopVol/user/yarn
# chown mapred:hadoop /mnt/glusterfs/HadoopVol/user/mapred
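
Equivalently, the same directories can be created and assigned ownership with a single shell loop:

# for u in mapred yarn hcat hive ambari-qa; do mkdir -p /mnt/glusterfs/HadoopVol/user/$u; chown $u:hadoop /mnt/glusterfs/HadoopVol/user/$u; done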

12.2.5. Deploying and Configuring the HDP 2.0.6 Stack on Red Hat Storage using Ambari Manager

Perform the following steps to deploy and configure the HDP stack on Red Hat Storage:

This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage option in the HDP 2.0.6.GlusterFS stack is not supported. If you want to deploy HDFS, then you must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions of the Hortonworks documentation.

  1. Launch a web browser and enter http://hostname:8080 in the URL by replacing hostname with the hostname of your Ambari Management Server.

    If the Ambari Console fails to load in the browser, it is usually because iptables is still running. Stop iptables by opening a terminal window and running the service iptables stop command.
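
    For example, to stop iptables and, optionally, prevent it from starting again on reboot:

    # service iptables stop
    # chkconfig iptables off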

  2. Enter admin and admin for the username and password.

  3. Assign a name to your cluster, such as MyCluster.

  4. Select the HDP 2.0.6.GlusterFS Stack (if not already selected by default) and click Next.

  5. On the Install Options screen:

    1. For Target Hosts, add the YARN server and all the nodes in the trusted storage pool.

    2. Select the Perform manual registration on hosts and do not use SSH option.

    3. Accept any warnings you may see and click the Register and Confirm button.

    4. Click OK on the Before you proceed warning. The Ambari Agents have all been installed for you during the setup_cluster.sh script.

  6. For Confirm Hosts, the progress must show as green for all the hosts. Click Next and ignore the Host Check warning.

  7. For Choose Services, unselect HDFS and as a minimum select GlusterFS, Ganglia, YARN+MapReduce2, and ZooKeeper.

    • Do not select the Nagios service, as it is not supported. For more information, see subsection 21.1. Deployment Scenarios of chapter 21. Administering the Hortonworks Data Platform on Red Hat Storage in the Red Hat Storage 3.0 Administration Guide.

    • The use of HBase has not been extensively tested and is not yet supported.

    • This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage option in the HDP 2.0.6.GlusterFS stack is not supported. If users wish to deploy HDFS, then they must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions in the Hortonworks documentation.

  8. For Assign Masters, set all the services to your designated YARN Master Server. For ZooKeeper, select at least three separate nodes within your cluster.

  9. For Assign Slaves and Clients, select all the nodes as NodeManagers except the YARN Master Server. You must also ensure that the Client checkbox is selected for each node.

  10. On the Customize Services screen:

    1. Click the YARN tab, scroll down to the yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs properties, and remove any entries that begin with /mnt/glusterfs/.

    2. Click the MapReduce2 tab, scroll down to the Advanced section, and modify the following property:

      Key Value
      yarn.app.mapreduce.am.staging-dir glusterfs:///user
    3. Click the MapReduce2 tab, scroll down to the bottom, and under the custom mapred-site.xml, add the following four custom properties and then click the Next button:

      Key Value
      mapred.healthChecker.script.path glusterfs:///mapred/jobstatus
      mapred.task.tracker.history.completed.location glusterfs:///mapred/history/done
      mapred.system.dir glusterfs:///mapred/system
      mapreduce.jobtracker.staging.root.dir glusterfs:///user
    4. Review other tabs that are highlighted in red. These require you to enter additional information, such as passwords for the respective services.

  11. Review your configuration and then click the Deploy button. Once the deployment is complete, it will state that the deployment is 100% complete and the progress bars will be colored in orange.

    The deployment process is susceptible to network and bandwidth issues. If the deployment fails, try clicking "Retry" to attempt the deployment again. This often resolves the issue.

12.2.6. Enabling Existing Volumes for use with Hadoop

This section is mandatory for every volume you intend to use with Hadoop. It is not sufficient to run the create_vol.sh script; you must follow the steps listed in this section as well.

If you have an existing Red Hat Storage trusted storage pool with volumes that contain data that you would like to analyze with Hadoop, the volumes need to be configured to support Hadoop workloads. Execute the script given below on every volume that you intend to use with Hadoop. The script provides the necessary configuration parameters for the volume and updates the Hadoop Configuration to make the volume accessible to Hadoop.

The supported volume configuration for Hadoop is a Distributed Replicated volume with replica count 2.

  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.

  2. Run the Hadoop Trusted Storage pool configuration script as given below:

    # enable_vol.sh [-y]  [--hadoop-mgmt-node <node>] [--user <admin-user>] [--pass <admin-password>] [--port <mgmt-port-num>] [--yarn-master <node>] [--rhs-node <storage-node>] <volName>

    For example:

    ./enable_vol.sh --yarn-master yarn.hdp --rhs-node rhs-1.hdp HadoopVol

    If the --yarn-master and/or --rhs-node options are omitted then the default of localhost (the node from which the script is being executed) is assumed. --rhs-node is the hostname of any of the storage nodes in the trusted storage pool. This is required to access the gluster command. Default is localhost and it must have gluster CLI access.

12.2.7. Configuring the Linux Container Executor

The Container Executor program used by the YARN framework defines how any container is launched and controlled. The Linux Container Executor sets up restricted permissions and the user/group ownership of local files and directories used by the containers, such as the shared objects, jars, intermediate files, log files, and so on. Perform the following steps to configure the Linux Container Executor program:

  1. In the Ambari console, click Stop All in the Services navigation panel. You must wait until all the services are completely stopped.

  2. On each server within the Red Hat Storage trusted storage pool:

    1. Open the terminal and navigate to the /usr/share/rhs-hadoop-install/ directory.

    2. Execute the setup_container_executor.sh script.

  3. On each server inside the Red Hat Storage trusted storage pool and the YARN Master server:

    1. Open the terminal and navigate to /etc/hadoop/conf/ directory.

    2. Replace the contents of the container-executor.cfg file with the following:

      yarn.nodemanager.linux-container-executor.group=hadoop
      banned.users=yarn
      min.user.id=1000
      allowed.system.users=tom

      Ensure that there is no additional whitespace at the end of each line and at the end of the file. Also, tom is an example user. Hadoop ignores the allowed.system.users parameter, but we recommend having at least one valid user. You can modify this file on one server and then use Secure Copy (or any other approach) to copy the modified file to the same location on each server.
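
      For example, assuming the file was edited on the YARN Master Server and using the storage node hostnames from the earlier examples, it could be copied out with scp:

      # for h in rhs-1.hdp rhs-2.hdp rhs-3.hdp rhs-4.hdp; do scp /etc/hadoop/conf/container-executor.cfg ${h}:/etc/hadoop/conf/; done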