• Compiling, Installing, and Using Hue


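    Introduction

          There are many big data frameworks, and solving a single problem usually involves several of them. Each framework has its own web UI on its own port, for example HDFS (9870), YARN (8088), and the MapReduce JobHistory server (19888). A single web UI that manages the commonly used frameworks makes development, monitoring, and operations much more convenient. Hue was created to solve exactly this problem of every framework having its own web interface.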

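    Build and installation

    Hue official website: https://gethue.com/
    Hue user manual: https://docs.gethue.com/
    Official installation guide: https://docs.gethue.com/administrator/installation/install/
    Hue download page: Hue - The open source SQL Assistant for Data Warehouses

    Download (use the Hue download link above; the address below is no longer valid)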

    Hue - The open source SQL Assistant for Data Warehouses

    Required packages

    1. CentOS 7+
    2. Hue 4.5
    3. Node.js v10.6.0 (as recommended by the official site; newer versions have build problems)

    Hue source package

    Link: https://pan.baidu.com/s/10UPgRfejKpwdV6qT4WuJog
    Extraction code: yyds

    npm

     First download and install Node.js, which bundles npm. I won't go into detail here (remember to add it to the PATH).

    wget https://nodejs.org/dist/v14.15.4/node-v14.15.4-linux-x64.tar.xz
    tar -xf node-v14.15.4-linux-x64.tar.xz

    Configure environment variables

     sudo vi /etc/profile.d/my_env.sh

    # NPM_HOME
    NPM_HOME=/home/bigdata/node-v14.15.4-linux-x64
    export PATH=$PATH:$NPM_HOME/bin:$NPM_HOME/sbin

     source /etc/profile.d/my_env.sh
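After sourcing the profile script, it can help to confirm that the new PATH entry really resolves `node` and `npm`. The following is a minimal sketch for a POSIX shell; `require_cmd` is a hypothetical helper name introduced here for illustration, not part of Node.js or Hue.

```shell
# Report whether a command is reachable through PATH.
# $1: command name to look up
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '%s found at %s\n' "$1" "$(command -v "$1")"
    else
        printf '%s not found on PATH\n' "$1" >&2
        return 1
    fi
}

# Usage after sourcing my_env.sh:
#   require_cmd node && node --version
#   require_cmd npm  && npm --version
```

If either lookup fails, the usual cause is that the profile script was not sourced in the current shell or that NPM_HOME points at the wrong directory.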

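    Configure the Taobao mirror

    npm config set registry https://registry.npm.taobao.org

    Check that the switch took effect

    npm config get registry

    If npm does not work well, use cnpm

    npm install -g cnpm --registry=https://registry.npm.taobao.org
    cd /usr/bin
    # Adjust the source path to wherever cnpm was actually installed
    sudo ln -s /usr/local/node/bin/cnpm cnpm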

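    Build

    tar -zxvf hue-4.5.0.tgz

    Install the dependencies (preferably build on a machine that has never had MySQL installed)

    # Python is required (Python 2.7+ / Python 3.5+)
    python --version
    # Install the libraries needed to build Hue on CentOS
    sudo yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel

    The dependencies above apply only to CentOS/RHEL 7.x; for other systems see https://docs.gethue.com/administrator/installation/dependencies/
    The node where Hue is installed should preferably never have had MySQL installed, otherwise versions may conflict
    The installation needs network access; a poor connection causes all sorts of odd failures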
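Before starting a long build, it may be worth listing which of the dependencies are still missing. The sketch below assumes a POSIX shell; `missing_pkgs` is a hypothetical helper, and passing `rpm -q` as the checker is an assumption that fits CentOS/RHEL only.

```shell
# Print every package for which the checker command fails.
# $1: checker command (for example "rpm -q"); remaining args: package names
missing_pkgs() {
    checker=$1
    shift
    for pkg in "$@"; do
        # $checker is intentionally unquoted so "rpm -q" splits into two words
        $checker "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Usage on CentOS/RHEL:
#   missing_pkgs "rpm -q" ant asciidoc gcc gcc-c++ krb5-devel libxslt-devel mysql-devel
```

An empty result means every listed package is already installed.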

    Edit the hue.ini file

    # [desktop]
    http_host=node2
    http_port=8000
    time_zone=Asia/Shanghai
    server_user=bigdata
    server_group=bigdata
    default_user=bigdata
    app_blacklist=search
    # [[database]]: Hue stores its metadata in SQLite by default; switch it to MySQL
    engine=mysql
    host=master
    port=3306
    user=root
    password=root
    # Database name
    name=hue
    # Around line 1003: path to the Hadoop configuration files
    hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop
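The settings above are listed flat, but in hue.ini they live in nested sections. The sketch below prints them in place. The section placement follows the stock hue.ini template as I understand it and should be treated as an assumption, especially `hadoop_conf_dir` under `[[[default]]]`. `render_hue_ini` and the `HUE_HOST`, `HUE_PORT`, and `DB_HOST` variables are hypothetical names.

```shell
# Print a hue.ini fragment with each setting placed in its section.
# Optional overrides: HUE_HOST, HUE_PORT, DB_HOST (hypothetical variable names).
render_hue_ini() {
    cat <<EOF
[desktop]
  http_host=${HUE_HOST:-node2}
  http_port=${HUE_PORT:-8000}
  time_zone=Asia/Shanghai
  server_user=bigdata
  server_group=bigdata
  default_user=bigdata
  app_blacklist=search

  [[database]]
    engine=mysql
    host=${DB_HOST:-master}
    port=3306
    user=root
    password=root
    name=hue

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop
EOF
}

# Usage: review the output, then merge it into desktop/conf/hue.ini by hand.
#   render_hue_ini
```

Printing to standard output rather than overwriting hue.ini keeps the rest of the shipped template intact.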


    Building Hue

    # Enter the Hue source directory and build. Use PREFIX to specify where Hue is installed
    cd hue-4.5.0
    make apps

     If the build reports an error about missing MySQL development files, install them

    sudo yum install -y mysql-devel

    Then delete the files in the target directory used as the build PREFIX

    PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target
    cd "$PREFIX" && rm -rf ./*

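    If the build reports a libxslt-related error

     sudo yum install -y libxslt-devel

     If the build reports a sqlite3-related error, look up the matching packages

    sudo yum search sqlite3

    Install the packages that the search found

    sudo yum install -y libsqlite3x.x86_64
    sudo yum install -y libsqlite3x-devel.x86_64
    sudo yum install -y gmp-devel.x86_64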

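    Build again

    PREFIX=/home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target make install

    Wait a little while, and the build should succeed.

    The built package cannot be used on other machines, because many absolute paths are baked into it, unless the environments are identical.

    tar -zcvf hue.tar.gz hue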

    Integration

    HDFS

    Modify the Hadoop configuration

    Add the following to hdfs-site.xml

    <!-- HUE -->
    <property>
      <name>dfs.webhdfs.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.permissions.enabled</name>
      <value>false</value>
    </property>
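Once WebHDFS is enabled and HDFS has been restarted, the REST endpoint can be probed directly. The sketch below only builds the URL; `webhdfs_list_url` is a hypothetical helper. Port 9870 is the NameNode web port mentioned in the introduction, and 14000 is assumed to be the HttpFS port (its default).

```shell
# Build a WebHDFS REST URL that lists a directory.
# $1: host, $2: HTTP port, $3: absolute HDFS path, $4: user name
webhdfs_list_url() {
    printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS&user.name=%s\n' "$1" "$2" "$3" "$4"
}

# Usage against a NameNode (9870) or HttpFS (14000 by default):
#   curl -s "$(webhdfs_list_url master1 9870 / bigdata)"
```

In an HA cluster the request has to reach the active NameNode; a standby is expected to answer with an error instead of a listing.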

    Add the following to core-site.xml

    <!-- HUE -->
    <property>
      <name>hadoop.proxyuser.bigdata.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.bigdata.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hdfs.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.hdfs.groups</name>
      <value>*</value>
    </property>
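Hue impersonates the logged-in user, so the account it runs as (`server_user=bigdata` in hue.ini) needs matching `hadoop.proxyuser.*` entries. A quick way to check a file for a given property is sketched below; `has_property` is a hypothetical helper, and it matches on the literal `<name>` element only, without parsing the XML.

```shell
# Succeed if the Hadoop XML file declares the given property name.
# $1: path to a *-site.xml file, $2: exact property name
has_property() {
    # -F treats the pattern as a fixed string, so dots are not wildcards
    grep -qF "<name>$2</name>" "$1"
}

# Usage:
#   has_property "$HADOOP_CONF_DIR/core-site.xml" hadoop.proxyuser.bigdata.hosts \
#       || echo "proxyuser entry for bigdata is missing"
```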

    Create an httpfs-site.xml file with the following configuration

    <configuration>
      <!-- HUE -->
      <property>
        <name>httpfs.proxyuser.bigdata.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>httpfs.proxyuser.bigdata.groups</name>
        <value>*</value>
      </property>
    </configuration>

    Note: after changing the HDFS configuration, scp the files to every machine in the cluster and restart the HDFS service.
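The copy step in the note can be scripted. The sketch below assumes passwordless SSH between the nodes and the same configuration directory on every host; `sync_hadoop_conf` and the `DRY_RUN` variable are hypothetical names. With `DRY_RUN` set, the commands are printed instead of executed.

```shell
# Copy the edited Hadoop config files to each host.
# $1: configuration directory (identical on every host); remaining args: host names
# Set DRY_RUN=1 to print the scp commands instead of running them.
sync_hadoop_conf() {
    conf_dir=$1
    shift
    for host in "$@"; do
        for f in core-site.xml hdfs-site.xml httpfs-site.xml; do
            ${DRY_RUN:+echo} scp "$conf_dir/$f" "$host:$conf_dir/$f"
        done
    done
}

# Usage with the host names that appear in this article:
#   DRY_RUN=1 sync_hadoop_conf /home/bigdata/module/hadoop-3.1.3/etc/hadoop master2 node1 node2 node3
```

Running it once with `DRY_RUN=1` makes it easy to review the target paths before anything is overwritten.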

    Modify the Hue configuration

    cd /home/bigdata/apache-maven-3.8.6/hue-4.5.0/desktop/conf
    vi hue.ini

    # [desktop]
    http_host=node2
    http_port=8000
    time_zone=Asia/Shanghai
    server_user=bigdata
    server_group=bigdata
    default_user=bigdata
    app_blacklist=search
    # [[database]]: Hue stores its metadata in SQLite by default; switch it to MySQL
    engine=mysql
    host=master
    port=3306
    user=root
    password=root
    # Database name
    name=hue
    # Around line 1003: path to the Hadoop configuration files
    hadoop_conf_dir=/home/bigdata/hadoop/hadoop/etc/hadoop

    Create the database

    CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

    Initialize the Hue database

    # Initialize the database
    cd /home/bigdata/apache-maven-3.8.6/hue-release-4.4.0/target/hue/build/env/bin
    ./hue syncdb
    ./hue migrate
    # Check the data

    Start Hue

    /data/hue/build/env/bin/supervisor
    

    Full configuration

    core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <property>
        <!-- Address of the HDFS file system served by the NameNode -->
        <name>fs.defaultFS</name>
        <!-- Name of the HDFS high-availability cluster -->
        <value>hdfs://bigdatacluster</value>
      </property>
      <property>
        <!-- Directory where the Hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/bigdata/module/hadoop-3.1.3/data</value>
      </property>
      <!-- Static user for logging in to the HDFS web UI: bigdata -->
      <property>
        <name>hadoop.http.staticuser.user</name>
        <value>bigdata</value>
      </property>
      <!-- Trash -->
      <property>
        <name>fs.trash.interval</name>
        <value>1</value>
      </property>
      <property>
        <name>fs.trash.checkpoint.interval</name>
        <value>1</value>
      </property>
      <!-- Hosts from which bigdata (superuser) may act as a proxy -->
      <property>
        <name>hadoop.proxyuser.bigdata.hosts</name>
        <value>*</value>
      </property>
      <!-- Groups whose members bigdata (superuser) may impersonate -->
      <property>
        <name>hadoop.proxyuser.bigdata.groups</name>
        <value>*</value>
      </property>
      <!-- Users that bigdata (superuser) may impersonate -->
      <property>
        <name>hadoop.proxyuser.bigdata.users</name>
        <value>*</value>
      </property>
      <!-- ZooKeeper servers that ZKFC connects to -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>node1:2181,node2:2181,node3:2181</value>
      </property>
      <!-- Hue -->
      <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hdfs.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.httpfs.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.httpfs.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hue.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.proxyuser.hue.groups</name>
        <value>*</value>
      </property>
    </configuration>

    hdfs-site.xml 

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <!-- NameNode data directory -->
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file://${hadoop.tmp.dir}/name</value>
      </property>
      <!-- DataNode data directory -->
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file://${hadoop.tmp.dir}/data</value>
      </property>
      <!-- JournalNode data directory -->
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>${hadoop.tmp.dir}/jn</value>
      </property>
      <!-- Nameservice of the fully distributed cluster; matches fs.defaultFS in core-site.xml -->
      <property>
        <name>dfs.nameservices</name>
        <value>bigdatacluster</value>
      </property>
      <!-- NameNodes in the cluster -->
      <property>
        <name>dfs.ha.namenodes.bigdatacluster</name>
        <value>nn1,nn2</value>
      </property>
      <!-- NameNode RPC addresses -->
      <property>
        <name>dfs.namenode.rpc-address.bigdatacluster.nn1</name>
        <value>master1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.bigdatacluster.nn2</name>
        <value>master2:8020</value>
      </property>
      <!-- NameNode HTTP addresses -->
      <property>
        <name>dfs.namenode.http-address.bigdatacluster.nn1</name>
        <value>master1:9870</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.bigdatacluster.nn2</name>
        <value>master2:9870</value>
      </property>
      <!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node1:8485;node2:8485;node3:8485/bigdatacluster</value>
      </property>
      <!-- Failover proxy provider: clients use it to find the active NameNode -->
      <property>
        <name>dfs.client.failover.proxy.provider.bigdatacluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <!-- Fencing method, so that only one server responds at any given time -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <!-- SSH private key required by the fencing method -->
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/bigdata/.ssh/id_rsa</value>
      </property>
      <!-- Exclude list (blacklist) -->
      <property>
        <name>dfs.hosts.exclude</name>
        <value>/home/bigdata/module/hadoop-3.1.3/etc/blacklist</value>
      </property>
      <!-- Enable automatic NameNode failover -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <!-- HUE -->
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
    </configuration>

    mapred-site.xml

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
      <!-- Enable JVM reuse -->
      <property>
        <name>mapreduce.job.jvm.numtasks</name>
        <value>10</value>
        <description>How many tasks to run per jvm,if set to -1 ,there is no limit</description>
      </property>
      <!--
      <property>
        <name>mapreduce.job.tracker</name>
        <value>hdfs://master1:8001</value>
        <final>true</final>
      </property>
      -->
      <property>
        <!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
      </property>
      <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
      </property>
      <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/home/bigdata/module/hadoop-3.1.3</value>
      </property>
      <!-- JobHistory server address -->
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master1:10020</value>
      </property>
      <!-- JobHistory server web address -->
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master1:19888</value>
      </property>
    </configuration>

     yarn-site.xml

    <?xml version="1.0"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <!-- Enable ResourceManager HA -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <!-- Cluster ID shared by the two ResourceManagers -->
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
      </property>
      <!-- Logical IDs of the ResourceManagers -->
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <!-- ========== rm1 settings ========== -->
      <!-- Host name of rm1 -->
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master1</value>
      </property>
      <!-- Web address of rm1 -->
      <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>master1:8088</value>
      </property>
      <!-- Internal communication address of rm1 -->
      <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>master1:8032</value>
      </property>
      <!-- Address the ApplicationMaster uses to request resources from rm1 -->
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>master1:8030</value>
      </property>
      <!-- Address the NodeManagers connect to -->
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>master1:8031</value>
      </property>
      <!-- ========== rm2 settings ========== -->
      <!-- Host name of rm2 -->
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>master2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>master2:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>master2:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>master2:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>master2:8031</value>
      </property>
      <!-- Address of the ZooKeeper ensemble -->
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node1:2181,node2:2181,node3:2181</value>
      </property>
      <!-- Enable automatic recovery -->
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <!-- Store the ResourceManager state in the ZooKeeper ensemble -->
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
      <!-- Environment variables inherited by containers -->
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
      </property>
      <!-- Enable log aggregation -->
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <!-- Address of the log aggregation server -->
      <property>
        <name>yarn.log.server.url</name>
        <value>http://master1:19888/jobhistory/logs</value>
      </property>
      <!-- Keep logs for 7 days -->
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
      </property>
      <!-- Whether to start a thread that checks the physical memory used by each task and kills the task if it exceeds its allocation; default is true -->
      <property>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>false</value>
      </property>
      <!-- Whether to start a thread that checks the virtual memory used by each task and kills the task if it exceeds its allocation; default is true -->
      <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>24576</value>
      </property>
    </configuration>

    httpfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->
    <configuration>
      <!-- HUE -->
      <property>
        <name>httpfs.proxyuser.bigdata.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>httpfs.proxyuser.bigdata.groups</name>
        <value>*</value>
      </property>
    </configuration>

    capacity-scheduler.xml 

    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
      <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
        <description>
          Maximum number of applications that can be pending and running.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.3</value>
        <description>
          Maximum percent of resources in the cluster which can be used to run
          application masters i.e. controls number of concurrent running
          applications.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
        <description>
          The ResourceCalculator implementation to be used to compare
          Resources in the scheduler.
          The default i.e. DefaultResourceCalculator only uses Memory while
          DominantResourceCalculator uses dominant-resource to compare
          multi-dimensional resources such as Memory, CPU etc.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>high,low</value>
        <description>
          The queues at the this level (root is the root queue).
        </description>
      </property>
      <!-- Queue capacity shares -->
      <property>
        <name>yarn.scheduler.capacity.root.high.capacity</name>
        <value>70</value>
        <description>Default queue target capacity.</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.capacity</name>
        <value>30</value>
        <description>Default queue target capacity.</description>
      </property>
      <!-- User limit factor -->
      <property>
        <name>yarn.scheduler.capacity.root.high.user-limit-factor</name>
        <value>1</value>
        <description>
          Default queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.user-limit-factor</name>
        <value>1</value>
        <description>
          Default queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
      <!-- Maximum capacity and running state -->
      <property>
        <name>yarn.scheduler.capacity.root.high.maximum-capacity</name>
        <value>100</value>
        <description>
          The maximum capacity of the default queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.state</name>
        <value>RUNNING</value>
        <description>
          The state of the default queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
      <!-- Permissions: who may submit applications -->
      <property>
        <name>yarn.scheduler.capacity.root.high.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the default queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the default queue.
        </description>
      </property>
      <!-- Permissions: who may administer the queue -->
      <property>
        <name>yarn.scheduler.capacity.root.high.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the default queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the default queue.
        </description>
      </property>
      <!-- Permissions: application priority -->
      <property>
        <name>yarn.scheduler.capacity.root.high.acl_application_max_priority</name>
        <value>*</value>
        <description>
          The ACL of who can submit applications with configured priority.
          For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.acl_application_max_priority</name>
        <value>*</value>
        <description>
          The ACL of who can submit applications with configured priority.
          For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
      </property>
      <!-- Maximum application lifetime -->
      <property>
        <name>yarn.scheduler.capacity.root.high.maximum-application-lifetime</name>
        <value>-1</value>
        <description>
          Maximum lifetime of an application which is submitted to a queue
          in seconds. Any value less than or equal to zero will be considered as
          disabled.
          This will be a hard time limit for all applications in this
          queue. If positive value is configured then any application submitted
          to this queue will be killed after exceeds the configured lifetime.
          User can also specify lifetime per application basis in
          application submission context. But user lifetime will be
          overridden if it exceeds queue maximum lifetime. It is point-in-time
          configuration.
          Note : Configuring too low value will result in killing application
          sooner. This feature is applicable only for leaf queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.maximum-application-lifetime</name>
        <value>-1</value>
        <description>
          Maximum lifetime of an application which is submitted to a queue
          in seconds. Any value less than or equal to zero will be considered as
          disabled.
          This will be a hard time limit for all applications in this
          queue. If positive value is configured then any application submitted
          to this queue will be killed after exceeds the configured lifetime.
          User can also specify lifetime per application basis in
          application submission context. But user lifetime will be
          overridden if it exceeds queue maximum lifetime. It is point-in-time
          configuration.
          Note : Configuring too low value will result in killing application
          sooner. This feature is applicable only for leaf queue.
        </description>
      </property>
      <!-- Default application lifetime -->
      <property>
        <name>yarn.scheduler.capacity.root.high.default-application-lifetime</name>
        <value>-1</value>
        <description>
          Default lifetime of an application which is submitted to a queue
          in seconds. Any value less than or equal to zero will be considered as
          disabled.
          If the user has not submitted application with lifetime value then this
          value will be taken. It is point-in-time configuration.
          Note : Default lifetime can't exceed maximum lifetime. This feature is
          applicable only for leaf queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.low.default-application-lifetime</name>
        <value>-1</value>
        <description>
          Default lifetime of an application which is submitted to a queue
          in seconds. Any value less than or equal to zero will be considered as
          disabled.
          If the user has not submitted application with lifetime value then this
          value will be taken. It is point-in-time configuration.
          Note : Default lifetime can't exceed maximum lifetime. This feature is
          applicable only for leaf queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
        <description>
          Number of missed scheduling opportunities after which the CapacityScheduler
          attempts to schedule rack-local containers.
          When setting this parameter, the size of the cluster should be taken into account.
          We use 40 as the default value, which is approximately the number of nodes in one rack.
          Note, if this value is -1, the locality constraint in the container request
          will be ignored, which disables the delay scheduling.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
        <value>-1</value>
        <description>
          Number of additional missed scheduling opportunities over the node-locality-delay
          ones, after which the CapacityScheduler attempts to schedule off-switch containers,
          instead of rack-local ones.
          Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
          attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
          after 40+20=60 missed opportunities.
          When setting this parameter, the size of the cluster should be taken into account.
          We use -1 as the default value, which disables this feature. In this case, the number
          of missed opportunities for assigning off-switch containers is calculated based on
          the number of containers and unique locations specified in the resource request,
          as well as the size of the cluster.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value></value>
        <description>
          A list of mappings that will be used to assign jobs to queues
          The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
          Typically this list will be used to map users to queues,
          for example, u:%user:%user maps all users to queues with the same name
          as the user.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
        <description>
          If a queue mapping is present, will it override the value specified
          by the user? This can be used by administrators to place jobs in queues
          that are different than the one specified by the user.
          The default is false.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
        <value>1</value>
        <description>
          Controls the number of OFF_SWITCH assignments allowed
          during a node's heartbeat. Increasing this value can improve
          scheduling rate for OFF_SWITCH containers. Lower values reduce
          "clumping" of applications on particular nodes. The default is 1.
          Legal values are 1-MAX_INT. This config is refreshable.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.application.fail-fast</name>
        <value>false</value>
        <description>
          Whether RM should fail during recovery if previous applications'
          queue is no longer valid.
        </description>
      </property>
    </configuration>

    yarn-env.sh

    # Mainly fixes the "java not found" problem when the daemons start
    export JAVA_HOME=/home/bigdata/module/jdk1.8.0_161
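
    Before restarting YARN, it can help to confirm that the path exported above really contains a Java binary. The following is a minimal sketch; check_java_home is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/bash
# check_java_home is a hypothetical helper, not a Hadoop command.
# It succeeds only if <dir>/bin/java exists and is executable.
check_java_home() {
    local java_home=$1
    if [ -x "$java_home/bin/java" ]; then
        echo "ok: $java_home/bin/java"
    else
        echo "missing: $java_home/bin/java" >&2
        return 1
    fi
}

# Example, using the path from yarn-env.sh:
# check_java_home /home/bigdata/module/jdk1.8.0_161
```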

    hadoop-server.sh 

    #!/bin/bash
    # Start or stop the whole Hadoop HA cluster from a single node.
    if [ $# -lt 1 ]
    then
        echo "No Args Input..."
        exit 1
    fi
    case $1 in
    "start")
        echo " =================== Starting the Hadoop cluster ==================="
        echo "Starting journalnode on node1"
        ssh node1 "hdfs --daemon start journalnode"
        echo "Starting journalnode on node2"
        ssh node2 "hdfs --daemon start journalnode"
        echo "Starting journalnode on node3"
        ssh node3 "hdfs --daemon start journalnode"
        echo " --------------- Starting hdfs ---------------"
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/start-dfs.sh"
        echo " --------------- Starting yarn ---------------"
        ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/start-yarn.sh"
        echo " --------------- Starting historyserver ---------------"
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
        echo " --------------- Starting httpfs ---------------"
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh start"
        # Recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon start httpfs
        ;;
    "stop")
        echo " --------------- Stopping httpfs ---------------"
        # Recommended instead: /home/bigdata/hadoop/hadoop/bin/hdfs --daemon stop httpfs
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/httpfs.sh stop"
        echo " =================== Stopping the Hadoop cluster ==================="
        echo " --------------- Stopping historyserver ---------------"
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
        echo " --------------- Stopping yarn ---------------"
        ssh master2 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-yarn.sh"
        echo " --------------- Stopping hdfs ---------------"
        ssh master1 "/home/bigdata/module/hadoop-3.1.3/sbin/stop-dfs.sh"
        echo "Stopping journalnode on node1"
        ssh node1 "hdfs --daemon stop journalnode"
        echo "Stopping journalnode on node2"
        ssh node2 "hdfs --daemon stop journalnode"
        echo "Stopping journalnode on node3"
        ssh node3 "hdfs --daemon stop journalnode"
        ;;
    *)
        echo "Input Args Error..."
        ;;
    esac
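
    The order matters in this script: the journalnodes come up first and go down last, and HttpFS is started last and stopped first. To review that order without touching the cluster, the sequence can be sketched as a dry run that only prints the commands. The names plan, plan_journalnodes and HADOOP_DIR are hypothetical and exist only for this illustration; hosts and paths are copied from the script.

```shell
#!/bin/bash
# Dry-run sketch: print the commands hadoop-server.sh would run, without
# running them. plan, plan_journalnodes and HADOOP_DIR are hypothetical names.
HADOOP_DIR=/home/bigdata/module/hadoop-3.1.3   # install path used in the script

# Print one journalnode command per host; $1 is "start" or "stop".
plan_journalnodes() {
    local action=$1 host
    for host in node1 node2 node3; do
        echo "ssh $host \"hdfs --daemon $action journalnode\""
    done
}

# Print the full command sequence for "start" or "stop"; fail on anything else.
plan() {
    case $1 in
    start)
        plan_journalnodes start
        echo "ssh master1 \"$HADOOP_DIR/sbin/start-dfs.sh\""
        echo "ssh master2 \"$HADOOP_DIR/sbin/start-yarn.sh\""
        echo "ssh master1 \"$HADOOP_DIR/bin/mapred --daemon start historyserver\""
        echo "ssh master1 \"$HADOOP_DIR/sbin/httpfs.sh start\""
        ;;
    stop)
        echo "ssh master1 \"$HADOOP_DIR/sbin/httpfs.sh stop\""
        echo "ssh master1 \"$HADOOP_DIR/bin/mapred --daemon stop historyserver\""
        echo "ssh master2 \"$HADOOP_DIR/sbin/stop-yarn.sh\""
        echo "ssh master1 \"$HADOOP_DIR/sbin/stop-dfs.sh\""
        plan_journalnodes stop
        ;;
    *)
        echo "usage: plan start|stop" >&2
        return 1
        ;;
    esac
}
```

    Calling plan start or plan stop after sourcing the snippet lists the ssh commands in execution order.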

    Integrating Hue with the HDFS and YARN clusters

    hue.ini

    [hadoop]
    # Configuration for HDFS NameNode
    # ------------------------------------------------------------------------
    [[hdfs_clusters]]
    # HA support by using HttpFs
    [[[default]]]
    # Enter the filesystem uri
    fs_defaultfs=hdfs://master1:8020
    # NameNode logical name.
    ## logical_name=
    # Use WebHdfs/HttpFs as the communication mechanism.
    # Domain should be the NameNode or HttpFs host.
    # Default port is 14000 for HttpFs.
    # The HttpFs service behind this URL must be started separately
    webhdfs_url=http://master1:14000/webhdfs/v1
    # Change this if your HDFS cluster is Kerberos-secured
    ## security_enabled=false
    # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
    # have to be verified against certificate authority
    ## ssl_cert_ca_verify=True
    # Directory of the Hadoop configuration
    hadoop_conf_dir=/home/bigdata/module/hadoop-3.1.3/etc/hadoop
    hadoop_bin=/home/bigdata/module/hadoop-3.1.3/bin
    hadoop_hdfs_home=/home/bigdata/module/hadoop-3.1.3
    # Configuration for YARN (MR2)
    # ------------------------------------------------------------------------
    [[yarn_clusters]]
    [[[default]]]
    # Enter the host on which you are running the ResourceManager
    resourcemanager_host=cluster-yarn1
    # The port where the ResourceManager IPC listens on
    resourcemanager_port=8032
    # Whether to submit jobs to this cluster
    submit_to=True
    # Resource Manager logical name (required for HA)
    logical_name=rm1
    # Change this if your YARN cluster is Kerberos-secured
    ## security_enabled=false
    # URL of the ResourceManager API
    resourcemanager_api_url=http://master1:8088
    # URL of the ProxyServer API
    proxy_api_url=http://master1:8088
    # URL of the HistoryServer API
    history_server_api_url=http://master1:19888
    # URL of the Spark History Server
    ## spark_history_server_url=http://localhost:18088
    # Change this if your Spark History Server is Kerberos-secured
    ## spark_history_server_security_enabled=false
    # In secure mode (HTTPS), if SSL certificates from YARN Rest APIs
    # have to be verified against certificate authority
    ## ssl_cert_ca_verify=True
    # HA support by specifying multiple clusters.
    # Redefine different properties there.
    # e.g.
    [[[ha]]]
    # Resource Manager logical name (required for HA)
    logical_name=rm2
    # Un-comment to enable
    submit_to=True
    # URL of the ResourceManager API
    resourcemanager_api_url=http://master2:8088
    history_server_api_url=http://master1:19888
    # ...
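
    Hue reaches HDFS through the webhdfs_url set above, so it is worth checking that HttpFS answers before starting Hue. The sketch below builds a WebHDFS LISTSTATUS request URL from that base. httpfs_list_url and WEBHDFS_BASE are hypothetical names, and the user name bigdata is an assumption inferred from the /home/bigdata paths.

```shell
#!/bin/bash
# Build a WebHDFS LISTSTATUS URL on top of the webhdfs_url from hue.ini.
# httpfs_list_url and WEBHDFS_BASE are hypothetical names for this sketch.
WEBHDFS_BASE=http://master1:14000/webhdfs/v1   # same value as webhdfs_url

# $1: absolute HDFS path, $2: user name passed as user.name (simple auth)
httpfs_list_url() {
    local path=$1 user=$2
    echo "${WEBHDFS_BASE}${path}?op=LISTSTATUS&user.name=${user}"
}

# Example (needs network access to master1, so it is left commented out;
# the user "bigdata" is an assumption):
# curl -s "$(httpfs_list_url / bigdata)"
```

    If HttpFS is running, the curl call should return a JSON document containing a FileStatuses object; a refused connection suggests the service has not been started.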

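    When connecting Hue to Hive, increase the connection timeout (server_conn_timeout, in seconds, in the [beeswax] section of hue.ini); otherwise jobs fail easily.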

    server_conn_timeout=3600

    Integrating Hue with HBase

    HBase start/stop script (the Thrift server must be started, mainly for Hue's use)

    #!/bin/bash
    # Start or stop HBase together with its Thrift server (used by Hue).
    case $1 in
    "start")
        for i in master2
        do
            echo " -------- Starting HBase on $i -------"
            ssh "$i" "/home/bigdata/module/hbase-2.4.9/bin/start-hbase.sh"
            ssh "$i" "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh start thrift"
        done
        ;;
    "stop")
        for i in master2
        do
            echo " -------- Stopping HBase on $i -------"
            ssh "$i" "/home/bigdata/module/hbase-2.4.9/bin/hbase-daemons.sh stop thrift"
            ssh "$i" "/home/bigdata/module/hbase-2.4.9/bin/stop-hbase.sh"
        done
        ;;
    esac

    Hue configuration

    [hbase]
    # Comma-separated list of HBase Thrift servers for clusters in the format of '(name|host:port)'.
    # Use full hostname. If hbase.thrift.ssl.enabled in hbase-site is set to true, https will be used otherwise it will use http
    # If using Kerberos we assume GSSAPI SASL, not PLAIN.
    hbase_clusters=(Cluster|node3:9090)
    # HBase configuration directory, where hbase-site.xml is located.
    hbase_conf_dir=/home/bigdata/module/hbase-2.4.9/conf
    # Hard limit of rows or columns per row fetched before truncating.
    ## truncate_limit = 500
    # Should come from hbase-site.xml, do not set. 'framed' is used to chunk up responses, used with the nonblocking server in Thrift but is not supported in Hue.
    # 'buffered' used to be the default of the HBase Thrift Server. Default is buffered when not set in hbase-site.xml.
    ## thrift_transport=buffered
    # Choose whether Hue should validate certificates received from the server.
    ## ssl_cert_ca_verify=true
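
    Hue can only browse HBase if the Thrift server named in hbase_clusters is actually listening. As a rough sketch, the entry can be split into host and port and the port probed from the Hue node. parse_hbase_cluster is a hypothetical helper, and the probe relies on bash's /dev/tcp feature, so it assumes bash rather than a plain POSIX shell.

```shell
#!/bin/bash
# parse_hbase_cluster is a hypothetical helper for this sketch.
# It turns one hbase_clusters entry, "(name|host:port)", into "host port".
parse_hbase_cluster() {
    local entry=$1
    entry=${entry#\(}               # drop the leading "("
    entry=${entry%\)}               # drop the trailing ")"
    local hostport=${entry#*|}      # drop "name|"
    echo "${hostport%%:*} ${hostport##*:}"
}

# Example probe (needs network access to the cluster, so it is commented out):
# read -r host port <<< "$(parse_hbase_cluster '(Cluster|node3:9090)')"
# timeout 3 bash -c "</dev/tcp/$host/$port" && echo "Thrift port $port is open on $host"
```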


  • Original article: https://blog.csdn.net/S1124654/article/details/128185250