Because the big-data ecosystem contains so many frameworks, solving a single problem usually involves several of them, and each framework ships its own web monitoring UI on its own port — for example HDFS (9870), YARN (8088), and the MapReduce JobHistory Server (19888). A single unified web UI for managing the commonly used frameworks therefore makes big-data development, monitoring, and operations much more convenient. Hue was born precisely to solve this one-web-UI-per-framework problem.

Hue official website: https://gethue.com/
Hue official user manual: https://docs.gethue.com/
Official installation docs: https://docs.gethue.com/administrator/installation/install/
Hue download: "Hue - The open source SQL Assistant for Data Warehouses" (linked from the official site above)

Hue source package (Baidu Netdisk mirror)
Link: https://pan.baidu.com/s/10UPgRfejKpwdV6qT4WuJog
Extraction code: yyds
npm
First download and install Node.js, which bundles npm — I won't walk through every step here (remember to add the environment variables):
- wget https://nodejs.org/dist/v14.15.4/node-v14.15.4-linux-x64.tar.xz
- tar -xf node-v14.15.4-linux-x64.tar.xz
Configure the environment variables:
sudo vi /etc/profile.d/my_env.sh
- #NPM_HOME
- NPM_HOME=/home/bigdata/node-v14.15.4-linux-x64
- export PATH=$PATH:$NPM_HOME/bin:$NPM_HOME/sbin
source /etc/profile.d/my_env.sh
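To confirm that Node.js and npm are now on the PATH (the version should match the tarball downloaded above):
- node -v    # expect v14.15.4
- npm -v     # npm ships bundled with Node.js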
Configure the Taobao registry mirror:
npm config set registry https://registry.npm.taobao.org
Check that the registry switch took effect:
npm config get registry
If npm doesn't work well, use cnpm instead:
- npm install -g cnpm --registry=https://registry.npm.taobao.org
- cd /usr/bin
- # link against wherever your Node installation actually lives (the path below matches the NPM_HOME set earlier)
- sudo ln -s /home/bigdata/node-v14.15.4-linux-x64/bin/cnpm cnpm
Extract the Hue source package:
tar -zxvf hue-4.5.0.tgz
Install the build dependencies (ideally, compile on a machine where MySQL has never been installed):
- # Python is required (Python 2.7+ / Python 3.5+)
- python --version
- # On CentOS, install the libraries needed to build Hue
- sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
The dependencies above apply only to CentOS/RHEL 7.x; for anything else see https://docs.gethue.com/administrator/installation/dependencies/
Ideally the node where Hue is installed has never had MySQL on it, otherwise version conflicts are possible.
The build needs internet access; with a poor network all kinds of strange problems appear.
Compiling Hue

- # Enter the Hue source directory and build. PREFIX (used with make install below) specifies where Hue gets installed
- cd hue-4.5.0
- make apps
If you run into the following problem (a MySQL-related build error — screenshot omitted):
sudo yum install -y mysql-devel
Then clear out the target directory that was given as the build PREFIX (note: this article mixes hue-4.5.0 and hue-release-4.4.0 in its paths — use whichever version you actually extracted):
- PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target
-
- cd /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target
- rm -rf ./*
If the following error appears (a libxslt-related failure — screenshot omitted):
sudo yum install -y libxslt-devel
If the following error appears (an sqlite3-related failure — screenshot omitted), search for the matching dependency:
sudo yum search sqlite3

Then install the packages it reports:
- sudo yum install -y libsqlite3x.x86_64
- sudo yum install -y libsqlite3x-devel.x86_64
- sudo yum install -y gmp-devel.x86_64
Compile again:
PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target make install
Wait a little while... and congratulations, the build succeeds!

The compiled package cannot simply be moved to another machine, because many absolute paths are baked in at build time — unless the other environment is identical. To archive the build output:
tar -zcvf hue.tar.gz hue
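If you do have an identical environment elsewhere, a minimal sketch of shipping the archive over (node3 here is a hypothetical target; adjust user, host, and paths):
- scp hue.tar.gz bigdata@node3:/home/bigdata/
- ssh bigdata@node3 "cd /home/bigdata && tar -zxvf hue.tar.gz"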
Modifying the Hadoop configuration
Add the following to hdfs-site.xml:
- <!-- HUE -->
- <property>
- <name>dfs.webhdfs.enabled</name>
- <value>true</value>
- </property>
- <property>
- <name>dfs.permissions.enabled</name>
- <value>false</value>
- </property>
Add the following to core-site.xml:
- <!-- HUE -->
- <property>
- <name>hadoop.proxyuser.bigdata.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.bigdata.groups</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.hdfs.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.hdfs.groups</name>
- <value>*</value>
- </property>
Create an httpfs-site.xml file containing:
- <configuration>
- <!-- HUE -->
- <property>
- <name>httpfs.proxyuser.bigdata.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>httpfs.proxyuser.bigdata.groups</name>
- <value>*</value>
- </property>
- </configuration>
Note: after modifying the HDFS-related configs, scp them to every machine in the cluster and restart the HDFS service.
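A hedged sketch of distributing the three files and bouncing HDFS, using the hostnames and Hadoop path that appear elsewhere in this article — substitute your own:
- cd /home/bigdata/module/hadoop-3.1.3/etc/hadoop
- for host in master1 master2 node1 node2 node3; do
-   scp core-site.xml hdfs-site.xml httpfs-site.xml bigdata@$host:/home/bigdata/module/hadoop-3.1.3/etc/hadoop/
- done
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-dfs.sh && /home/bigdata/module/hadoop-3.1.3/sbin/start-dfs.sh"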
Modifying the Hue configuration
- cd /home/bigdata/apache-maven-3.8.6/hue-4.5.0/desktop/conf
- vi hue.ini
- # [desktop]
- http_host=node2
- http_port=8000
- time_zone=Asia/Shanghai
- server_user=bigdata
- server_group=bigdata
- default_user=bigdata
- app_blacklist=search
- # [[database]] — Hue records its metadata in SQLite by default; switch it to MySQL
- engine=mysql
- host=master
- port=3306
- user=root
- password=root
- # database name
- name=hue
- # around line 1003: path to the Hadoop configuration directory
- hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop
Create the database:
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
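The hue.ini above connects as root/root. If you would rather give Hue its own MySQL account, a sketch (the hue/hue credentials are hypothetical — change the user/password lines in hue.ini to match):
- mysql -u root -p <<'SQL'
- CREATE USER 'hue'@'%' IDENTIFIED BY 'hue';
- GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
- FLUSH PRIVILEGES;
- SQL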
Initializing the Hue database
- # Initialize the database
- cd /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target/hue/build/env/bin
- ./hue syncdb
- ./hue migrate
- # Check the result
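One quick way to check that the migration created Hue's tables (enter the password from hue.ini when prompted):
- mysql -u root -p -e "USE hue; SHOW TABLES;"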

Starting Hue
Run the supervisor from the build output (the path below comes from a different install; under the PREFIX used above it would be /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target/hue/build/env/bin/supervisor):
/data/hue/build/env/bin/supervisor
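To keep Hue running after you log out, and then reach the UI, something like the following (http_host/http_port come from the hue.ini above; the first account created at login becomes the admin):
- nohup /data/hue/build/env/bin/supervisor >/dev/null 2>&1 &
- # then open http://node2:8000 in a browser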
For reference, here are the complete cluster configuration files used in this setup. First, core-site.xml:
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
- <configuration>
-
- <property>
- <!-- URI of the NameNode's HDFS protocol filesystem -->
- <name>fs.defaultFS</name>
- <!-- points at the HDFS HA nameservice -->
- <value>hdfs://bigdatacluster</value>
- </property>
- <property>
- <!-- directory where the Hadoop cluster stores temporary files -->
- <name>hadoop.tmp.dir</name>
- <value>/home/bigdata/module/hadoop-3.1.3/data</value>
- </property>
-
- <!-- static user for the HDFS web UI: bigdata -->
- <property>
- <name>hadoop.http.staticuser.user</name>
- <value>bigdata</value>
- </property>
-
- <!-- trash -->
- <property>
- <name>fs.trash.interval</name>
- <value>1</value>
- </property>
-
- <property>
- <name>fs.trash.checkpoint.interval</name>
- <value>1</value>
- </property>
-
- <!-- hosts from which the bigdata (superuser) may proxy -->
- <property>
- <name>hadoop.proxyuser.bigdata.hosts</name>
- <value>*</value>
- </property>
- <!-- groups the bigdata (superuser) may impersonate -->
- <property>
- <name>hadoop.proxyuser.bigdata.groups</name>
- <value>*</value>
- </property>
- <!-- users the bigdata (superuser) may impersonate -->
- <property>
- <name>hadoop.proxyuser.bigdata.users</name>
- <value>*</value>
- </property>
-
- <!-- ZooKeeper servers the ZKFC connects to -->
- <property>
- <name>ha.zookeeper.quorum</name>
- <value>node1:2181,node2:2181,node3:2181</value>
- </property>
-
- <!-- Hue -->
- <property>
- <name>hadoop.proxyuser.hdfs.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.hdfs.groups</name>
- <value>*</value>
- </property>
-
- <property>
- <name>hadoop.proxyuser.httpfs.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.httpfs.groups</name>
- <value>*</value>
- </property>
-
- <property>
- <name>hadoop.proxyuser.hue.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>hadoop.proxyuser.hue.groups</name>
- <value>*</value>
- </property>
-
- </configuration>
Next, the full hdfs-site.xml:
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
- <configuration>
-
- <!-- NameNode data directory -->
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>file://${hadoop.tmp.dir}/name</value>
- </property>
- <!-- DataNode data directory -->
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>file://${hadoop.tmp.dir}/data</value>
- </property>
- <!-- JournalNode edits directory -->
- <property>
- <name>dfs.journalnode.edits.dir</name>
- <value>${hadoop.tmp.dir}/jn</value>
- </property>
- <!-- nameservice ID; must match fs.defaultFS in core-site.xml -->
- <property>
- <name>dfs.nameservices</name>
- <value>bigdatacluster</value>
- </property>
- <!-- the NameNodes in the cluster -->
- <property>
- <name>dfs.ha.namenodes.bigdatacluster</name>
- <value>nn1,nn2</value>
- </property>
- <!-- NameNode RPC addresses -->
- <property>
- <name>dfs.namenode.rpc-address.bigdatacluster.nn1</name>
- <value>master1:8020</value>
- </property>
- <property>
- <name>dfs.namenode.rpc-address.bigdatacluster.nn2</name>
- <value>master2:8020</value>
- </property>
- <!-- NameNode HTTP addresses -->
- <property>
- <name>dfs.namenode.http-address.bigdatacluster.nn1</name>
- <value>master1:9870</value>
- </property>
- <property>
- <name>dfs.namenode.http-address.bigdatacluster.nn2</name>
- <value>master2:9870</value>
- </property>
- <!-- where NameNode edits are stored on the JournalNodes -->
- <property>
- <name>dfs.namenode.shared.edits.dir</name>
- <value>qjournal://node1:8485;node2:8485;node3:8485/bigdatacluster</value>
- </property>
- <!-- proxy provider the client uses to determine the active NameNode -->
- <property>
- <name>dfs.client.failover.proxy.provider.bigdatacluster</name>
- <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
- </property>
- <!-- fencing, so that only one NameNode responds at a time -->
- <property>
- <name>dfs.ha.fencing.methods</name>
- <value>sshfence</value>
- </property>
- <!-- sshfence requires key-based SSH login -->
- <property>
- <name>dfs.ha.fencing.ssh.private-key-files</name>
- <value>/home/bigdata/.ssh/id_rsa</value>
- </property>
-
- <!-- decommission blacklist -->
- <property>
- <name>dfs.hosts.exclude</name>
- <value>/home/bigdata/module/hadoop-3.1.3/etc/blacklist</value>
- </property>
-
- <!-- enable automatic NameNode failover -->
- <property>
- <name>dfs.ha.automatic-failover.enabled</name>
- <value>true</value>
- </property>
-
- <!-- HUE -->
- <property>
- <name>dfs.webhdfs.enabled</name>
- <value>true</value>
- </property>
- <property>
- <name>dfs.permissions.enabled</name>
- <value>false</value>
- </property>
-
- </configuration>
The full mapred-site.xml:
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <!-- Put site-specific property overrides in this file. -->
-
- <configuration>
-
- <!-- enable JVM reuse -->
- <property>
- <name>mapreduce.job.jvm.numtasks</name>
- <value>10</value>
- <description>How many tasks to run per jvm,if set to -1 ,there is no limit</description>
- </property>
-
- <!--
- <property>
- <name>mapreduce.job.tracker</name>
- <value>hdfs://master1:8001</value>
- <final>true</final>
- </property>
- -->
- <property>
- <!-- run MapReduce jobs on YARN -->
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
-
- <property>
- <name>yarn.app.mapreduce.am.env</name>
- <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
- </property>
- <property>
- <name>mapreduce.map.env</name>
- <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
- </property>
- <property>
- <name>mapreduce.reduce.env</name>
- <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
- </property>
-
- <!-- JobHistory Server address -->
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>master1:10020</value>
- </property>
- <!-- JobHistory Server web UI address -->
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>master1:19888</value>
- </property>
-
- </configuration>
The full yarn-site.xml:
- <?xml version="1.0"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
-
- <configuration>
-
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
-
- <!-- enable ResourceManager HA -->
- <property>
- <name>yarn.resourcemanager.ha.enabled</name>
- <value>true</value>
- </property>
-
- <!-- cluster ID covering the two ResourceManagers -->
- <property>
- <name>yarn.resourcemanager.cluster-id</name>
- <value>cluster-yarn1</value>
- </property>
- <!-- logical IDs of the ResourceManagers -->
- <property>
- <name>yarn.resourcemanager.ha.rm-ids</name>
- <value>rm1,rm2</value>
- </property>
- <!-- ========== rm1 ========== -->
- <!-- rm1 hostname -->
- <property>
- <name>yarn.resourcemanager.hostname.rm1</name>
- <value>master1</value>
- </property>
- <!-- rm1 web UI address -->
- <property>
- <name>yarn.resourcemanager.webapp.address.rm1</name>
- <value>master1:8088</value>
- </property>
- <!-- rm1 IPC address -->
- <property>
- <name>yarn.resourcemanager.address.rm1</name>
- <value>master1:8032</value>
- </property>
- <!-- address ApplicationMasters use to request resources from rm1 -->
- <property>
- <name>yarn.resourcemanager.scheduler.address.rm1</name>
- <value>master1:8030</value>
- </property>
- <!-- address NodeManagers connect to -->
- <property>
- <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
- <value>master1:8031</value>
- </property>
- <!-- ========== rm2 ========== -->
- <!-- rm2 hostname -->
- <property>
- <name>yarn.resourcemanager.hostname.rm2</name>
- <value>master2</value>
- </property>
- <property>
- <name>yarn.resourcemanager.webapp.address.rm2</name>
- <value>master2:8088</value>
- </property>
- <property>
- <name>yarn.resourcemanager.address.rm2</name>
- <value>master2:8032</value>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.address.rm2</name>
- <value>master2:8030</value>
- </property>
- <property>
- <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
- <value>master2:8031</value>
- </property>
-
- <!-- ZooKeeper quorum address -->
- <property>
- <name>yarn.resourcemanager.zk-address</name>
- <value>node1:2181,node2:2181,node3:2181</value>
- </property>
-
- <!-- enable automatic recovery -->
- <property>
- <name>yarn.resourcemanager.recovery.enabled</name>
- <value>true</value>
- </property>
-
- <!-- store ResourceManager state in the ZooKeeper cluster -->
- <property>
- <name>yarn.resourcemanager.store.class</name>
- <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
- </property>
- <!-- environment variable inheritance -->
- <property>
- <name>yarn.nodemanager.env-whitelist</name>
- <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
- </property>
-
- <!-- enable log aggregation -->
- <property>
- <name>yarn.log-aggregation-enable</name>
- <value>true</value>
- </property>
- <!-- log aggregation server address -->
- <property>
- <name>yarn.log.server.url</name>
- <value>http://master1:19888/jobhistory/logs</value>
- </property>
- <!-- retain logs for 7 days -->
- <property>
- <name>yarn.log-aggregation.retain-seconds</name>
- <value>604800</value>
- </property>
-
- <!-- whether a thread checks each task's physical memory use and kills tasks that exceed their allocation; default is true -->
- <property>
- <name>yarn.nodemanager.pmem-check-enabled</name>
- <value>false</value>
- </property>
-
- <!-- whether a thread checks each task's virtual memory use and kills tasks that exceed their allocation; default is true -->
- <property>
- <name>yarn.nodemanager.vmem-check-enabled</name>
- <value>false</value>
- </property>
-
- <property>
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>24576</value>
- </property>
-
- </configuration>
The full httpfs-site.xml:
- <?xml version="1.0" encoding="UTF-8"?>
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
- <configuration>
-
- <!-- HUE -->
- <property>
- <name>httpfs.proxyuser.bigdata.hosts</name>
- <value>*</value>
- </property>
- <property>
- <name>httpfs.proxyuser.bigdata.groups</name>
- <value>*</value>
- </property>
-
-
- </configuration>
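Hue will talk to HDFS through HttpFS on port 14000 (see the webhdfs_url setting later in hue.ini), so HttpFS must actually be running. A hedged smoke test:
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/hdfs --daemon start httpfs"
- # List the HDFS root through HttpFS; expect a JSON FileStatuses response
- curl "http://master1:14000/webhdfs/v1/?op=LISTSTATUS&user.name=bigdata"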
The full capacity-scheduler.xml, defining the high and low queues:
- <!--
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
- <configuration>
-
- <property>
- <name>yarn.scheduler.capacity.maximum-applications</name>
- <value>10000</value>
- <description>
- Maximum number of applications that can be pending and running.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
- <value>0.3</value>
- <description>
- Maximum percent of resources in the cluster which can be used to run
- application masters i.e. controls number of concurrent running
- applications.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.resource-calculator</name>
- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
- <description>
- The ResourceCalculator implementation to be used to compare
- Resources in the scheduler.
- The default i.e. DefaultResourceCalculator only uses Memory while
- DominantResourceCalculator uses dominant-resource to compare
- multi-dimensional resources such as Memory, CPU etc.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.queues</name>
- <value>high,low</value>
- <description>
- The queues at the this level (root is the root queue).
- </description>
- </property>
- <!-- queue capacity percentages -->
- <property>
- <name>yarn.scheduler.capacity.root.high.capacity</name>
- <value>70</value>
- <description>Default queue target capacity.</description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.capacity</name>
- <value>30</value>
- <description>Default queue target capacity.</description>
- </property>
-
-
- <!-- user limit factor -->
- <property>
- <name>yarn.scheduler.capacity.root.high.user-limit-factor</name>
- <value>1</value>
- <description>
- Default queue user limit a percentage from 0.0 to 1.0.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.user-limit-factor</name>
- <value>1</value>
- <description>
- Default queue user limit a percentage from 0.0 to 1.0.
- </description>
- </property>
-
-
- <!-- maximum capacity and queue state -->
- <property>
- <name>yarn.scheduler.capacity.root.high.maximum-capacity</name>
- <value>100</value>
- <description>
- The maximum capacity of the default queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.state</name>
- <value>RUNNING</value>
- <description>
- The state of the default queue. State can be one of RUNNING or STOPPED.
- </description>
- </property>
-
- <!-- ACLs: who may submit applications -->
- <property>
- <name>yarn.scheduler.capacity.root.high.acl_submit_applications</name>
- <value>*</value>
- <description>
- The ACL of who can submit jobs to the default queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.acl_submit_applications</name>
- <value>*</value>
- <description>
- The ACL of who can submit jobs to the default queue.
- </description>
- </property>
-
-
- <!-- ACLs: who may administer the queues -->
- <property>
- <name>yarn.scheduler.capacity.root.high.acl_administer_queue</name>
- <value>*</value>
- <description>
- The ACL of who can administer jobs on the default queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.acl_administer_queue</name>
- <value>*</value>
- <description>
- The ACL of who can administer jobs on the default queue.
- </description>
- </property>
-
- <!-- ACLs: application priority -->
- <property>
- <name>yarn.scheduler.capacity.root.high.acl_application_max_priority</name>
- <value>*</value>
- <description>
- The ACL of who can submit applications with configured priority.
- For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.acl_application_max_priority</name>
- <value>*</value>
- <description>
- The ACL of who can submit applications with configured priority.
- For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
- </description>
- </property>
-
- <!-- maximum application lifetime -->
- <property>
- <name>yarn.scheduler.capacity.root.high.maximum-application-lifetime
- </name>
- <value>-1</value>
- <description>
- Maximum lifetime of an application which is submitted to a queue
- in seconds. Any value less than or equal to zero will be considered as
- disabled.
- This will be a hard time limit for all applications in this
- queue. If positive value is configured then any application submitted
- to this queue will be killed after exceeds the configured lifetime.
- User can also specify lifetime per application basis in
- application submission context. But user lifetime will be
- overridden if it exceeds queue maximum lifetime. It is point-in-time
- configuration.
- Note : Configuring too low value will result in killing application
- sooner. This feature is applicable only for leaf queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.maximum-application-lifetime
- </name>
- <value>-1</value>
- <description>
- Maximum lifetime of an application which is submitted to a queue
- in seconds. Any value less than or equal to zero will be considered as
- disabled.
- This will be a hard time limit for all applications in this
- queue. If positive value is configured then any application submitted
- to this queue will be killed after exceeds the configured lifetime.
- User can also specify lifetime per application basis in
- application submission context. But user lifetime will be
- overridden if it exceeds queue maximum lifetime. It is point-in-time
- configuration.
- Note : Configuring too low value will result in killing application
- sooner. This feature is applicable only for leaf queue.
- </description>
- </property>
-
-
- <!-- default application lifetime -->
- <property>
- <name>yarn.scheduler.capacity.root.high.default-application-lifetime
- </name>
- <value>-1</value>
- <description>
- Default lifetime of an application which is submitted to a queue
- in seconds. Any value less than or equal to zero will be considered as
- disabled.
- If the user has not submitted application with lifetime value then this
- value will be taken. It is point-in-time configuration.
- Note : Default lifetime can't exceed maximum lifetime. This feature is
- applicable only for leaf queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.low.default-application-lifetime
- </name>
- <value>-1</value>
- <description>
- Default lifetime of an application which is submitted to a queue
- in seconds. Any value less than or equal to zero will be considered as
- disabled.
- If the user has not submitted application with lifetime value then this
- value will be taken. It is point-in-time configuration.
- Note : Default lifetime can't exceed maximum lifetime. This feature is
- applicable only for leaf queue.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.node-locality-delay</name>
- <value>40</value>
- <description>
- Number of missed scheduling opportunities after which the CapacityScheduler
- attempts to schedule rack-local containers.
- When setting this parameter, the size of the cluster should be taken into account.
- We use 40 as the default value, which is approximately the number of nodes in one rack.
- Note, if this value is -1, the locality constraint in the container request
- will be ignored, which disables the delay scheduling.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
- <value>-1</value>
- <description>
- Number of additional missed scheduling opportunities over the node-locality-delay
- ones, after which the CapacityScheduler attempts to schedule off-switch containers,
- instead of rack-local ones.
- Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
- attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
- after 40+20=60 missed opportunities.
- When setting this parameter, the size of the cluster should be taken into account.
- We use -1 as the default value, which disables this feature. In this case, the number
- of missed opportunities for assigning off-switch containers is calculated based on
- the number of containers and unique locations specified in the resource request,
- as well as the size of the cluster.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.queue-mappings</name>
- <value></value>
- <description>
- A list of mappings that will be used to assign jobs to queues
- The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
- Typically this list will be used to map users to queues,
- for example, u:%user:%user maps all users to queues with the same name
- as the user.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
- <value>false</value>
- <description>
- If a queue mapping is present, will it override the value specified
- by the user? This can be used by administrators to place jobs in queues
- that are different than the one specified by the user.
- The default is false.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
- <value>1</value>
- <description>
- Controls the number of OFF_SWITCH assignments allowed
- during a node's heartbeat. Increasing this value can improve
- scheduling rate for OFF_SWITCH containers. Lower values reduce
- "clumping" of applications on particular nodes. The default is 1.
- Legal values are 1-MAX_INT. This config is refreshable.
- </description>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.application.fail-fast</name>
- <value>false</value>
- <description>
- Whether RM should fail during recovery if previous applications'
- queue is no longer valid.
- </description>
- </property>
-
- </configuration>
- # This mainly fixes "Java not found" errors — append it to the relevant env script (e.g. hadoop-env.sh)
- export JAVA_HOME=/home/bigdata/module/jdk1.8.0_161
A start/stop script for the whole Hadoop cluster (JournalNodes, HDFS, YARN, the JobHistory Server, and HttpFS):
- #!/bin/bash
- if [ $# -lt 1 ]
- then
- echo "No Args Input..."
- exit ;
- fi
- case $1 in
- "start")
- echo " =================== starting hadoop cluster ==================="
- echo "starting journalnode on node1"
- ssh node1 "hdfs --daemon start journalnode"
- echo "starting journalnode on node2"
- ssh node2 "hdfs --daemon start journalnode"
- echo "starting journalnode on node3"
- ssh node3 "hdfs --daemon start journalnode"
-
-
- echo " --------------- starting hdfs ---------------"
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/start-dfs.sh"
- echo " --------------- starting yarn ---------------"
- ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/start-yarn.sh"
-
- echo " --------------- starting historyserver ---------------"
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
- echo " --------------- starting httpfs ---------------"
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh start"
- # recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon start httpfs
- ;;
- "stop")
- echo " --------------- stopping httpfs ---------------"
- # recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon stop httpfs
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh stop"
- echo " =================== stopping hadoop cluster ==================="
- echo " --------------- stopping historyserver ---------------"
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
-
- echo " --------------- stopping yarn ---------------"
- ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-yarn.sh"
- echo " --------------- stopping hdfs ---------------"
- ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-dfs.sh"
-
- echo "stopping journalnode on node1"
- ssh node1 "hdfs --daemon stop journalnode"
- echo "stopping journalnode on node2"
- ssh node2 "hdfs --daemon stop journalnode"
- echo "stopping journalnode on node3"
- ssh node3 "hdfs --daemon stop journalnode"
- ;;
- *)
- echo "Input Args Error..."
- ;;
- esac
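Usage sketch, assuming the script is saved as hdp.sh (a hypothetical name) on a node with passwordless SSH to the others:
- chmod +x hdp.sh
- ./hdp.sh start
- ./hdp.sh stop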
The [hadoop] section of hue.ini, pointing Hue at HDFS (through HttpFS) and at the HA ResourceManagers:
- [hadoop]
-
- # Configuration for HDFS NameNode
- # ------------------------------------------------------------------------
- [[hdfs_clusters]]
- # HA support by using HttpFs
-
- [[[default]]]
- # Enter the filesystem uri
- fs_defaultfs=hdfs://master1:8020
-
- # NameNode logical name.
- ## logical_name=
-
- # Use WebHdfs/HttpFs as the communication mechanism.
- # Domain should be the NameNode or HttpFs host.
- # Default port is 14000 for HttpFs.
- # the corresponding HttpFS/WebHDFS service must be started separately
- webhdfs_url=http://master1:14000/webhdfs/v1
-
- # Change this if your HDFS cluster is Kerberos-secured
- ## security_enabled=false
-
- # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
- # have to be verified against certificate authority
- ## ssl_cert_ca_verify=True
-
- # Directory of the Hadoop configuration
- hadoop_conf_dir=/home/bigdata/module/hadoop-3.1.3/etc/hadoop
- hadoop_bin=/home/bigdata/module/hadoop-3.1.3/bin
- hadoop_hdfs_home=/home/bigdata/module/hadoop-3.1.3
-
- # Configuration for YARN (MR2)
- # ------------------------------------------------------------------------
- [[yarn_clusters]]
-
- [[[default]]]
- # Enter the host on which you are running the ResourceManager
- resourcemanager_host=cluster-yarn1
-
- # The port where the ResourceManager IPC listens on
- resourcemanager_port=8032
-
- # Whether to submit jobs to this cluster
- submit_to=True
-
- # Resource Manager logical name (required for HA)
- logical_name=rm1
-
- # Change this if your YARN cluster is Kerberos-secured
- ## security_enabled=false
-
- # URL of the ResourceManager API
- resourcemanager_api_url=http://master1:8088
-
- # URL of the ProxyServer API
- proxy_api_url=http://master1:8088
-
- # URL of the HistoryServer API
- history_server_api_url=http://master1:19888
-
- # URL of the Spark History Server
- ## spark_history_server_url=http://localhost:18088
-
- # Change this if your Spark History Server is Kerberos-secured
- ## spark_history_server_security_enabled=false
-
- # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
- # have to be verified against certificate authority
- ## ssl_cert_ca_verify=True
-
- # HA support by specifying multiple clusters.
- # Redefine different properties there.
- # e.g.
-
- [[[ha]]]
- # Resource Manager logical name (required for HA)
- logical_name=rm2
-
- # Un-comment to enable
- submit_to=True
-
- # URL of the ResourceManager API
- resourcemanager_api_url=http://master2:8088
- history_server_api_url=http://master1:19888
- # ...
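Before starting Hue it is worth confirming that the ResourceManager REST APIs configured above are reachable from the Hue node, for example:
- curl http://master1:8088/ws/v1/cluster/info
- curl http://master2:8088/ws/v1/cluster/info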
When connecting Hue to Hive, lengthen the server connection timeout (under the [beeswax] section of hue.ini), otherwise tasks fail very easily:
server_conn_timeout=3600
An HBase start/stop script (it also starts the Thrift server that Hue's HBase browser talks to):
- #!/bin/bash
- case $1 in
- "start"){
- for i in master2
- do
- echo " --------starting hbase on $i-------"
- ssh $i "/home/bigdata/module/hbase-2.4.9/bin/start-hbase.sh"
- ssh $i "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh start thrift"
- done
- };;
- "stop"){
- for i in master2
- do
- echo " --------stopping hbase on $i-------"
- ssh $i "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh stop thrift"
- ssh $i "/home/bigdata/module/hbase-2.4.9/bin/stop-hbase.sh"
- done
- };;
- esac
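Usage sketch (hbase.sh is a hypothetical name). Note that the hue.ini below points at node3:9090, so make sure the Thrift server is actually listening there — hbase-daemons.sh starts thrift on the hosts listed in the regionservers file:
- ./hbase.sh start
- # verify the Thrift server is listening on its default port
- ssh node3 "ss -ltn | grep 9090"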
The [hbase] section of hue.ini:
- [hbase]
- # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
- # Use full hostname. If hbase.thrift.ssl.enabled in hbase-site is set to true, https will be used otherwise it will use http
- # If using Kerberos we assume GSSAPI SASL, not PLAIN.
- hbase_clusters=(Cluster|node3:9090)
-
- # HBase configuration directory, where hbase-site.xml is located.
- hbase_conf_dir=/home/bigdata/module/hbase-2.4.9/conf
-
- # Hard limit of rows or columns per row fetched before truncating.
- ## truncate_limit = 500
-
- # Should come from hbase-site.xml, do not set. 'framed' is used to chunk up responses, used with the nonblocking server in Thrift but is not supported in Hue.
- # 'buffered' used to be the default of the HBase Thrift Server. Default is buffered when not set in hbase-site.xml.
- ## thrift_transport=buffered
-
- # Choose whether Hue should validate certificates received from the server.
- ## ssl_cert_ca_verify=true
Reference
"Hue编译安装" (Hue build and install) — Endless在路上's blog on CSDN