1 Environment preparation
Cluster plan
| IP | Hostname | Installed software / roles |
| --- | --- | --- |
| 192.168.232.132 | HA01 | jdk, hadoop, NameNode, DFSZKFailoverController (zkfc) |
| 192.168.232.133 | HA02 | jdk, hadoop, NameNode, DFSZKFailoverController (zkfc) |
| 192.168.232.134 | HA03 | jdk, hadoop, ResourceManager |
| 192.168.232.135 | HA04 | jdk, hadoop, ResourceManager |
| 192.168.232.136 | HA05 | jdk, hadoop, zookeeper, DataNode, NodeManager, JournalNode |
| 192.168.232.137 | HA06 | jdk, hadoop, zookeeper, DataNode, NodeManager, JournalNode |
| 192.168.232.138 | HA07 | jdk, hadoop, zookeeper, DataNode, NodeManager, JournalNode |
1.1 Prepare the 7 machines. When cloning the VMs you may hit the problem that the cloned NIC comes up as eth1.
Solution: https://blog.csdn.net/zhou920786312/article/details/84778234
1.2 On each machine (see the sketch after this list):
- Change the Linux hostname
- Map hostnames to IPs in /etc/hosts
- Disable the firewall
- Set up passwordless SSH:
  - from node 1 to nodes 1-7
  - from node 3 to nodes 4, 5, 6, 7
  - from node 5 to nodes 6, 7
- Install the JDK and configure environment variables
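A rough sketch of these preparation steps on one node (assuming CentOS 6 and the hostnames/IPs from the plan above; adjust for your environment):
# set the hostname (CentOS 6 style: edit HOSTNAME=HA01 in /etc/sysconfig/network)
vim /etc/sysconfig/network
# map hostnames to IPs on every node
cat >> /etc/hosts << 'EOF'
192.168.232.132 HA01
192.168.232.133 HA02
192.168.232.134 HA03
192.168.232.135 HA04
192.168.232.136 HA05
192.168.232.137 HA06
192.168.232.138 HA07
EOF
# disable the firewall (CentOS 6)
service iptables stop
chkconfig iptables off
# passwordless SSH, e.g. from HA01 to all seven nodes; repeat the same idea on HA03 and HA05
ssh-keygen -t rsa
for h in HA01 HA02 HA03 HA04 HA05 HA06 HA07; do ssh-copy-id $h; done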
2 Install ZooKeeper (on HA05)
1 Unpack: tar -zxvf zookeeper-3.4.5.tar.gz -C /home/hadoop/app/
2 Edit the configuration
cd /home/hadoop/app/zookeeper-3.4.5/conf/
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
Change: dataDir=/home/hadoop/app/zookeeper-3.4.5/tmp
Append at the end:
server.1=HA05:2888:3888
server.2=HA06:2888:3888
server.3=HA07:2888:3888
Save and exit.
Then create the tmp directory and write this node's id:
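For reference, the complete zoo.cfg should then look roughly like this (a sketch; the remaining values are the zoo_sample.cfg defaults):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/hadoop/app/zookeeper-3.4.5/tmp
clientPort=2181
server.1=HA05:2888:3888
server.2=HA06:2888:3888
server.3=HA07:2888:3888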
mkdir /home/hadoop/app/zookeeper-3.4.5/tmp
echo 1 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
Copy the configured ZooKeeper to the other nodes:
scp -r /home/hadoop/app/ HA06:/home/hadoop/app/
scp -r /home/hadoop/app/ HA07:/home/hadoop/app/
Note: on HA06 and HA07, update /home/hadoop/app/zookeeper-3.4.5/tmp/myid accordingly:
HA06:
echo 2 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
HA07:
echo 3 > /home/hadoop/app/zookeeper-3.4.5/tmp/myid
3 Start ZooKeeper (on HA05, HA06, HA07)
cd /home/hadoop/app/zookeeper-3.4.5/bin
./zkServer.sh start
./zkServer.sh status
[root@HA06 bin]# ./zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/app/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
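To check all three nodes in one go, something like this can be run from HA05 (a sketch, assuming passwordless SSH from HA05 to HA06 and HA07):
for h in HA05 HA06 HA07; do
  echo "== $h =="
  ssh $h /home/hadoop/app/zookeeper-3.4.5/bin/zkServer.sh status
done
# expected: one node reports Mode: leader, the other two Mode: follower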
3 Install and configure the Hadoop cluster (run on HA01)
Unpack
[root@HA01 ~]# tar -zxvf cenos-6.5-hadoop-2.6.4.tar.gz -C /home/hadoop/app/
Configure HDFS
vim /etc/profile
export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
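After editing /etc/profile, reload it and make a quick sanity check that the hadoop command is on the PATH:
source /etc/profile
hadoop version    # should report Hadoop 2.6.4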
Edit hadoop-env.sh
[root@HA01 ~]# cd /home/hadoop/app/hadoop-2.6.4/etc/hadoop
[root@HA01 hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk1.7.0_45
Edit core-site.xml
<configuration>
<!-- Set the HDFS nameservice to bi -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://bi/</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/app/hdpdata/</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>HA05:2181,HA06:2181,HA07:2181</value>
</property>
</configuration>
Edit hdfs-site.xml
<configuration>
<!-- Set the HDFS nameservice to bi; must match core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>bi</value>
</property>
<!-- There are two NameNodes under bi: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.bi</name>
<value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.bi.nn1</name>
<value>HA01:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.bi.nn1</name>
<value>HA01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.bi.nn2</name>
<value>HA02:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.bi.nn2</name>
<value>HA02:50070</value>
</property>
<!-- Where the NameNode edits (shared metadata) are stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://HA05:8485;HA06:8485;HA07:8485/bi</value>
</property>
<!-- Where each JournalNode stores its data on local disk -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Proxy provider the HDFS client uses to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.bi</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; list multiple mechanisms on separate lines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence mechanism (ms) -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
Edit mapred-site.xml
[root@HA01 hadoop]# mv mapred-site.xml.template mapred-site.xml
[root@HA01 hadoop]# vi mapred-site.xml
<configuration>
<!-- Run MapReduce on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml
<configuration>
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Cluster id of the RM pair -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Logical names of the RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostnames of the two RMs -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>HA03</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>HA04</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>HA05:2181,HA06:2181,HA07:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit slaves
[root@HA01 hadoop]# vi slaves
Delete localhost and add:
HA05
HA06
HA07
[root@HA01 hadoop]# cat slaves
HA05
HA06
HA07
Copy the configured Hadoop to the other nodes (HA02, HA03, HA04, HA05, HA06, HA07):
scp -r /home/hadoop/app/ HA02:/home/hadoop/app/
scp -r /home/hadoop/app/ HA03:/home/hadoop/app/
scp -r /home/hadoop/app/ HA04:/home/hadoop/app/
scp -r /home/hadoop/app/ HA05:/home/hadoop/app/
scp -r /home/hadoop/app/ HA06:/home/hadoop/app/
scp -r /home/hadoop/app/ HA07:/home/hadoop/app/
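The six copies can also be scripted (a sketch, assuming passwordless SSH from HA01 to the other nodes):
for h in HA02 HA03 HA04 HA05 HA06 HA07; do
  scp -r /home/hadoop/app/ $h:/home/hadoop/app/
done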
Start the JournalNodes (run on each of HA05, HA06, HA07):
cd /home/hadoop/app/hadoop-2.6.4
sbin/hadoop-daemon.sh start journalnode
# Run jps to verify: HA05, HA06 and HA07 should each show a JournalNode process
[root@HA07 hadoop-2.6.4]# sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-journalnode-HA07.out
[root@HA07 hadoop-2.6.4]# jps
2232 Jps
2134 QuorumPeerMain
2186 JournalNode
[root@HA07 hadoop-2.6.4]#
Format HDFS (run on HA01)
hdfs namenode -format
Formatting generates files under the directory configured by hadoop.tmp.dir in core-site.xml, which here is /home/hadoop/app/hdpdata.
Then copy /home/hadoop/app to /home/hadoop/ on HA02:
scp -r /home/hadoop/app HA02:/home/hadoop/
## Alternatively (recommended): run hdfs namenode -bootstrapStandby on HA02 instead of scp
[root@HA01 app]# scp -r /home/hadoop/app HA02:/home/hadoop/
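If you go the bootstrapStandby route instead of scp, a sketch of the sequence (the freshly formatted NameNode on HA01 must be running so HA02 can pull the metadata from it):
# on HA01: start the newly formatted NameNode
sbin/hadoop-daemon.sh start namenode
# on HA02: copy the NameNode metadata from the running NameNode
hdfs namenode -bootstrapStandby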
Format ZKFC (run once, on HA01)
hdfs zkfc -formatZK
[root@HA01 app]# hdfs zkfc -formatZK
19/03/04 08:07:07 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at HA01/192.168.232.132:9000
19/03/04 08:07:08 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
19/03/04 08:07:08 INFO zookeeper.ZooKeeper: Client environment:host.name=HA01
19/03/04 08:07:08 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_45
19/03/04 08:07:08 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/03/04 08:07:08 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/local/jdk1.7.0_45/jre
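To confirm that formatting created the failover znode, you can look in ZooKeeper from any zk node (a quick check):
/home/hadoop/app/zookeeper-3.4.5/bin/zkCli.sh -server HA05:2181
ls /hadoop-ha    # should list the nameservice, i.e. [bi]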
Start HDFS (run on HA01)
[root@HA01 hadoop-2.6.4]# pwd
/home/hadoop/app/hadoop-2.6.4
[root@HA01 hadoop-2.6.4]# ./sbin/start-dfs.sh
Starting namenodes on [HA01 HA02]
HA02: starting namenode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-namenode-HA02.out
HA01: starting namenode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-namenode-HA01.out
HA05: starting datanode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-datanode-HA05.out
HA06: starting datanode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-datanode-HA06.out
HA07: starting datanode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-datanode-HA07.out
Starting journal nodes [HA05 HA06 HA07]
HA05: journalnode running as process 2187. Stop it first.
HA06: journalnode running as process 2180. Stop it first.
HA07: journalnode running as process 2186. Stop it first.
Starting ZK Failover Controllers on NN hosts [HA01 HA02]
HA01: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-zkfc-HA01.out
HA02: starting zkfc, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-zkfc-HA02.out
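At this point you can ask which NameNode is active (one should report active, the other standby):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2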
Start YARN (run start-yarn.sh on HA03). The NameNodes and ResourceManagers are placed on separate machines for performance reasons: both consume a lot of resources, and because they live on different machines, YARN has to be started on its own machines.
HA03
[root@HA03 ~]# cd /home/hadoop/app/hadoop-2.6.4
[root@HA03 hadoop-2.6.4]# ./sbin/start-yarn.sh
starting yarn daemons
resourcemanager running as process 2236. Stop it first.
HA06: nodemanager running as process 2837. Stop it first.
HA05: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.4/logs/yarn-root-nodemanager-HA05.out
HA07: starting nodemanager, logging to /home/hadoop/app/hadoop-2.6.4/logs/yarn-root-nodemanager-HA07.out
HA04
[root@HA04 sbin]# cd /home/hadoop/app/hadoop-2.6.4/sbin
[root@HA04 sbin]# yarn-daemon.sh start resourcemanager
-bash: yarn-daemon.sh: command not found
[root@HA04 sbin]# ./yarn-daemon.sh start resourcemanager
starting resourcemanager, logging to /home/hadoop/app/hadoop-2.6.4/logs/yarn-root-resourcemanager-HA04.out
[root@HA04 sbin]# jps
2668 ResourceManager
2721 Jps
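Similarly, the ResourceManager HA state can be checked from either RM node (a quick check):
yarn rmadmin -getServiceState rm1    # e.g. active
yarn rmadmin -getServiceState rm2    # e.g. standby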
The full startup sequence
2.5 Start the ZooKeeper cluster (on HA05, HA06, HA07):
cd /home/hadoop/app/zookeeper-3.4.5/bin/
./zkServer.sh start
# check status: one leader, two followers
./zkServer.sh status
2.6 Start the JournalNodes (on HA05, HA06, HA07):
cd /home/hadoop/app/hadoop-2.6.4
sbin/hadoop-daemon.sh start journalnode
# run jps to verify: HA05, HA06 and HA07 each show a JournalNode process
2.7 Format HDFS (on HA01):
hdfs namenode -format
# formatting generates files under the hadoop.tmp.dir configured in core-site.xml (here /home/hadoop/app/hdpdata); then copy /home/hadoop/app to HA02:
scp -r /home/hadoop/app HA02:/home/hadoop/
## alternatively (recommended): run hdfs namenode -bootstrapStandby on HA02
2.8 Format ZKFC (once, on HA01):
hdfs zkfc -formatZK
2.9 Start HDFS (on HA01):
./sbin/start-dfs.sh
2.10 Start YARN (note: run start-yarn.sh on HA03; NameNode and ResourceManager are separated because both are resource-hungry, so they are started on their respective machines):
sbin/start-yarn.sh
hadoop-2.6.4 configuration is complete.
Verification
http://ha01:50070/
Verify HDFS HA (kill one NameNode and the other takes over)
On HA01:
[root@HA01 hadoop-2.6.4]# hadoop fs -put /etc/profile /profile
[root@HA01 hadoop-2.6.4]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 1968 2019-03-04 08:59 /profile
[root@HA01 hadoop-2.6.4]# jps
3344 Jps
2637 DFSZKFailoverController
2366 NameNode
[root@HA01 hadoop-2.6.4]# kill -9 2366
Check in the browser.
Run a command on HA01 (HDFS still works):
[root@HA01 hadoop-2.6.4]# hadoop fs -ls /
Found 1 items
-rw-r--r-- 3 root supergroup 1968 2019-03-04 08:59 /profile
Manually restart the NameNode that was killed:
[root@HA01 hadoop-2.6.4]# ./sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/app/hadoop-2.6.4/logs/hadoop-root-namenode-HA01.out
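After the restart, the previously killed NameNode should come back in standby state (the other NameNode stays active), which can be confirmed with:
hdfs haadmin -getServiceState nn1    # expected: standby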
Some commands for checking cluster status:
bin/hdfs dfsadmin -report                # show status information for each HDFS node
bin/hdfs haadmin -getServiceState nn1    # get the HA state of a NameNode
sbin/hadoop-daemon.sh start namenode     # start a single NameNode process
./hadoop-daemon.sh start zkfc            # start a single zkfc process