To install CentOS 7.5 on a virtual machine, refer to https://blog.csdn.net/a111111__/article/details/117230257 (for the disk mount configuration) and https://blog.csdn.net/weixin_45309636/article/details/108504978 (for the other settings).
CentOS 7.5 is required; choose the development edition when installing the OS.
Run the following commands:
sudo yum install -y epel-release
sudo yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static
sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static   # change from dhcp to static
NAME="ens33"
# Add the IP address, gateway, and DNS server entries below
# Note: the gateway here is the one configured for the VM's virtual network and must match it; in a production environment, obtain the cluster's gateway in advance
IPADDR=192.168.10.100
PREFIX=24
GATEWAY=192.168.10.2
DNS1=192.168.10.2
vim /etc/hostname
hadoop100
sudo vim /etc/hosts
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
192.168.10.105 hadoop105
192.168.10.106 hadoop106
192.168.10.107 hadoop107
192.168.10.108 hadoop108
If you are running your own virtual cluster, you can also configure the hosts file on your Windows machine:
1. Go to C:\Windows\System32\drivers\etc
2. Open the hosts file and add the following entries
192.168.10.100 hadoop100
192.168.10.101 hadoop101
192.168.10.102 hadoop102
192.168.10.103 hadoop103
192.168.10.104 hadoop104
192.168.10.105 hadoop105
192.168.10.106 hadoop106
192.168.10.107 hadoop107
192.168.10.108 hadoop108
Note: editing the file in place is not permitted; copy the hosts file elsewhere, edit the copy, and then overwrite the original with it.
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo useradd wt
sudo passwd wt
Note: this step can actually be done during OS installation. If it was not, create your own user now; from here on, almost all cluster operations are performed as this user, and you only switch to root when modifying system files.
With root privileges, edit /etc/sudoers (visudo is recommended) and add a line for wt below the root entry:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
wt ALL=(ALL) ALL
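A quick, optional check that the sudoers entry works (user name wt as above):
su - wt
sudo ls /root    # should list root's home after entering wt's password, with no "not in the sudoers file" error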
1. Create the module and software directories under /opt
sudo mkdir /opt/module /opt/software
2. Change the owner of the module and software directories to wt
sudo chown wt:wt /opt/module /opt/software
Remove any pre-installed JDK:
rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps
hadoop-3.1.3.tar.gz
jdk-8u212-linux-x64.tar.gz
tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
1. Create a new file /etc/profile.d/my_env.sh
sudo vim /etc/profile.d/my_env.sh
2. Add the following:
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
3. Reload the environment
source /etc/profile
4. Verify the installation
java -version
java version "1.8.0_212"
hadoop version
Hadoop 3.1.3
5. Reboot the server
6. Use the example jar shipped with Hadoop to check that Hadoop runs correctly
1. Create a wcinput directory under /opt/module/hadoop-3.1.3
mkdir wcinput
2. Create a word.txt file in the wcinput directory with the following content:
hadoop yarn
hadoop mapreduce
spark
spark
3. Back in the Hadoop directory /opt/module/hadoop-3.1.3, run the example
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
Check the result:
cat wcoutput/part-r-00000
hadoop 2
mapreduce 1
spark 2
yarn 1
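If you rerun the example, note that MapReduce refuses to write into an output directory that already exists, so clean it up first (same paths as above):
rm -rf wcoutput
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput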
1. Take a snapshot of the VM, then clone it one machine at a time
2. On each clone, change the IP address (sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33)
DEVICE=ens33
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static   # change from dhcp to static
NAME="ens33"
# Add the IP address, gateway, and DNS server entries below
# Note: the gateway must match the virtual gateway configured for the VM network; in production, obtain the cluster's gateway in advance
IPADDR=192.168.10.101
PREFIX=24
GATEWAY=192.168.10.2
DNS1=192.168.10.2
3. Change the hostname on each machine
vim /etc/hostname
hadoop101
At this point the base configuration is essentially done. Next we start working on the Hadoop configuration files, but to make it easy to keep configuration files in sync across machines, first set up a distribution script:
1. Create an xsync file in /home/wt
cd /home/wt
vim xsync
with the following content:
#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
echo Not Enough Arguments!
exit;
fi
#2. Loop over every machine in the cluster
for host in hadoop102 hadoop103 hadoop104
do
echo ==================== $host ====================
#3. Loop over all the files/directories given and send each one
for file in $@
do
#4 Check whether the file exists
if [ -e $file ]
then
#5. Get the parent directory (resolving symlinks)
pdir=$(cd -P $(dirname $file); pwd)
#6. Get the file name
fname=$(basename $file)
ssh $host "mkdir -p $pdir"
rsync -av $pdir/$fname $host:$pdir
else
echo $file does not exist!
fi
done
done
Make the xsync script executable
chmod +x xsync
Move the script to /bin so it can be called from anywhere
sudo mv xsync /bin/
Test the script. From now on, whenever you modify a file, you can sync it to the other servers in the cluster:
xsync /bin/xsync
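You can confirm on another node that the script arrived (until passwordless SSH is configured below, this will prompt for wt's password):
ssh hadoop103 ls -l /bin/xsync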
1. Cluster deployment plan
Note: do not put the NameNode and the SecondaryNameNode on the same server.
Note: the ResourceManager also uses a lot of memory; do not put it on the same machine as the NameNode or the SecondaryNameNode.
(Deployment plan, reconstructed from the configuration below: hadoop102 runs NameNode, DataNode, NodeManager; hadoop103 runs ResourceManager, DataNode, NodeManager; hadoop104 runs SecondaryNameNode, DataNode, NodeManager.)
2. Configure the core files
(1) Configure core-site.xml
cd $HADOOP_HOME/etc/hadoop
vim core-site.xml
# add:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop102:8020</value>
</property>
<property>
<name>hadoop.data.dir</name>
<value>/opt/module/hadoop-3.1.3/data</value>
</property>
<!-- proxy-user settings; this guide uses the wt user (the original tutorial used atguigu) -->
<property>
<name>hadoop.proxyuser.wt.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.wt.groups</name>
<value>*</value>
</property>
</configuration>
(2) Configure hdfs-site.xml
vim hdfs-site.xml
# add:
<configuration>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop102:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop104:9868</value>
</property>
</configuration>
(3) Configure yarn-site.xml
vim yarn-site.xml
# add:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop103</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
(4) Configure mapred-site.xml
vim mapred-site.xml
# add:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
3. Configure the workers file (vim $HADOOP_HOME/etc/hadoop/workers) with the three hostnames:
hadoop102
hadoop103
hadoop104
Note: this file must not contain any extra spaces or blank lines.
4. Distribute the configured files
xsync /opt/module/hadoop-3.1.3/etc/hadoop/
You can log in to the other servers to confirm that the files have been updated.
1. If this is the first time the cluster is started, format the NameNode
hdfs namenode -format
2. According to the deployment plan, start the NameNode on hadoop102, then run the DataNode command on hadoop102, hadoop103 and hadoop104 (all three)
# on hadoop102, start the NameNode
hdfs --daemon start namenode
# on hadoop102, hadoop103 and hadoop104 (all three), start the DataNode
hdfs --daemon start datanode
Starting daemons one by one like this is tedious; we can bring the whole cluster up with the group start scripts, but that first requires passwordless SSH.
1. Go to /home/wt/.ssh/ and generate a key pair
ssh-keygen -t rsa
2. Copy the public key to every server in the cluster. (Keys are per-user: setting this up for wt does not enable it for root; to use another user, switch to it and repeat the configuration.)
ssh-copy-id hadoop102
ssh-copy-id hadoop103
ssh-copy-id hadoop104
3. Repeat the same configuration on the other servers.
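A quick check that passwordless login now works for wt (hostnames as above):
ssh hadoop103 hostname    # should print hadoop103 without asking for a password
ssh hadoop104 hostname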
1. If this is the first start of the cluster, format the NameNode on hadoop102. (Before formatting, be sure to stop all previously started NameNode and DataNode processes, and then delete the data and logs directories.)
hdfs namenode -format
2. Start HDFS (on hadoop102)
sbin/start-dfs.sh
3. Start YARN on the node where the ResourceManager is configured (hadoop103)
sbin/start-yarn.sh
4. Check what is running
jps
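With the deployment plan above, jps should show roughly the following daemons on each node (pids differ; the JobHistoryServer only appears once it is started in the next section):
hadoop102: NameNode, DataNode, NodeManager
hadoop103: ResourceManager, NodeManager, DataNode
hadoop104: SecondaryNameNode, DataNode, NodeManager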
1. Configure mapred-site.xml (add):
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop102:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop102:19888</value>
</property>
2. Distribute the configuration file
xsync $HADOOP_HOME/etc/hadoop/mapred-site.xml
3. Start the history server on hadoop102
mapred --daemon start historyserver
Log aggregation: after an application finishes, its container logs are uploaded to HDFS.
Benefit: you can conveniently inspect the details of a run, which makes development and debugging easier.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager and HistoryServer.
Steps:
1. Configure yarn-site.xml (add):
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop102:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
2. Distribute the configuration
xsync $HADOOP_HOME/etc/hadoop/yarn-site.xml
3. Stop the cluster
4. Restart it
On hadoop103: start-yarn.sh
On hadoop102: mapred --daemon start historyserver
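A hedged end-to-end check (the HDFS paths are just examples): run a job on YARN, then open http://hadoop102:19888/jobhistory and confirm that the per-container logs are now viewable.
hadoop fs -mkdir /input
hadoop fs -put /opt/module/hadoop-3.1.3/wcinput/word.txt /input
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output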
We can see that starting the Hadoop cluster means starting HDFS and the history server on hadoop102 and YARN on hadoop103, which is tedious and easy to get wrong. To simplify this, use a group start/stop script.
1. Create a myhadoop.sh script with the following content:
#!/bin/bash
if [ $# -lt 1 ]
then
echo No Argument Input!
exit;
fi
case $1 in
"start")
echo "================= starting the Hadoop cluster ========================"
echo "------------------ starting hdfs -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
echo "------------------ starting yarn -----------------------------"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
echo "------------------ starting historyserver --------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
;;
"stop")
echo "================== stopping the Hadoop cluster ==================="
echo "------------------ stopping historyserver -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
echo "------------------ stopping yarn --------------------------------------"
ssh hadoop103 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
echo "------------------ stopping hdfs -----------------------------"
ssh hadoop102 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
2. Put the script in /bin/ so it can be used anywhere
3. Make it executable
chmod 777 myhadoop.sh
4. Start the whole cluster
myhadoop.sh start
To see how the components are running on each machine in the cluster, we can also add a cluster-wide jps script.
1. vim jpsall
#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
echo =============== $host ===============
ssh $host jps
done
2. Make the script executable
chmod 777 jpsall
3. Put the script in /bin/ for global use
4. Run jpsall
jpsall
Note: time-server configuration (must be done as root)
1. On all nodes, stop the ntp service and disable it at boot
systemctl stop ntpd
systemctl disable ntpd
2. Edit the ntp configuration file (on hadoop102)
vim /etc/ntp.conf
# Change 1: authorize all machines on the 192.168.10.0/24 network (the cluster subnet configured above) to query and sync time from this server; uncomment and adjust the restrict line
restrict 192.168.10.0 mask 255.255.255.0 nomodify notrap
# Change 2: the cluster is on a LAN, so do not use the internet time sources; comment these out
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst
# Addition 3: if this node loses its network connection, it can still serve its local clock as a time source for the other nodes
server 127.127.1.0
fudge 127.127.1.0 stratum 10
3. Edit the /etc/sysconfig/ntpd file (hadoop102)
vim /etc/sysconfig/ntpd
# add the following (keep the hardware clock in sync with the system clock)
SYNC_HWCLOCK=yes
4. Restart the ntpd service
systemctl start ntpd
5. Enable ntpd at boot
systemctl enable ntpd
6. Configure the other machines (as root, on hadoop103 and hadoop104): add a cron job that syncs from hadoop102 every 10 minutes
crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop102
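A hedged way to verify the setup: deliberately set a wrong time on hadoop103, then force a sync (or wait for the cron job) and check that the clock is corrected.
sudo date -s "2021-9-11 11:11:11"
sudo /usr/sbin/ntpdate hadoop102
date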
---------------------------------- At this point the cluster is fully configured ----------------------------------------------------
[wt@hadoop102 hadoop-3.1.3]$ bin/hadoop fs
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]
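A few hedged usage examples of the shell commands listed above (the HDFS paths are made up for illustration):
hadoop fs -mkdir -p /user/wt/input
hadoop fs -put wcinput/word.txt /user/wt/input
hadoop fs -ls /user/wt/input
hadoop fs -cat /user/wt/input/word.txt
hadoop fs -rm -r /user/wt/input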
[wt@hadoop102 hadoop-3.1.3]$ ll
total 180
drwxr-xr-x. 2 wt wt 183 Sep 12 2019 bin
drwxrwxr-x. 4 wt wt 37 May 6 16:08 data
drwxr-xr-x. 3 wt wt 20 Sep 12 2019 etc
drwxr-xr-x. 2 wt wt 106 Sep 12 2019 include
drwxr-xr-x. 3 wt wt 20 Sep 12 2019 lib
drwxr-xr-x. 4 wt wt 288 Sep 12 2019 libexec
-rw-rw-r--. 1 wt wt 147145 Sep 4 2019 LICENSE.txt
drwxrwxr-x. 3 wt wt 4096 May 7 12:14 logs
-rw-rw-r--. 1 wt wt 21867 Sep 4 2019 NOTICE.txt
-rw-rw-r--. 1 wt wt 1366 Sep 4 2019 README.txt
drwxr-xr-x. 3 wt wt 4096 Sep 12 2019 sbin
drwxr-xr-x. 4 wt wt 31 Sep 12 2019 share
drwxrwxr-x. 2 wt wt 22 May 6 11:15 wcinput
drwxr-xr-x. 2 wt wt 88 May 6 11:18 wcoutput
bin directory: scripts for operating the Hadoop services (HDFS, YARN)
etc directory: Hadoop's configuration directory, holding the Hadoop configuration files
lib directory: Hadoop's native libraries (data compression and decompression)
sbin directory: scripts for starting and stopping the Hadoop services
share directory: Hadoop's dependency jars, documentation and official examples
data directory: where HDFS block data is actually stored (HDFS files live on the cluster, inside this data directory)
For example, the files currently stored in the cluster live under:
[wt@hadoop102 subdir0]$ pwd
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-2030843570-192.168.10.102-1651821962452/current/finalized/subdir0/subdir0
[wt@hadoop102 subdir0]$ ll
total 272
-rw-rw-r--. 1 wt wt 23 May 6 17:23 blk_1073741825
-rw-rw-r--. 1 wt wt 11 May 6 17:23 blk_1073741825_1001.meta
-rw-rw-r--. 1 wt wt 31 May 6 17:24 blk_1073741832
-rw-rw-r--. 1 wt wt 11 May 6 17:24 blk_1073741832_1008.meta
-rw-rw-r--. 1 wt wt 25554 May 6 17:24 blk_1073741834
-rw-rw-r--. 1 wt wt 207 May 6 17:24 blk_1073741834_1010.meta
-rw-rw-r--. 1 wt wt 214456 May 6 17:24 blk_1073741835
-rw-rw-r--. 1 wt wt 1683 May 6 17:24 blk_1073741835_1011.meta
-rw-rw-r--. 1 wt wt 50 May 8 13:35 blk_1073741836
-rw-rw-r--. 1 wt wt 11 May 8 13:35 blk_1073741836_1012.meta
[wt@hadoop102 subdir0]$ cat blk_1073741836
hello spark kafka hive flume atals zookeeper datax
logs directory: Hadoop's log files
Step 1: stop the HDFS daemons
[wt@hadoop102 hadoop-3.1.3]$ sbin/stop-dfs.sh
Step 2: delete data/ and logs/ on every node of the cluster
[wt@hadoop102 hadoop-3.1.3]$ rm -rf data/ logs/
Step 3: reformat the NameNode
[wt@hadoop102 hadoop-3.1.3]$ hdfs namenode -format
Step 4: start the cluster
[wt@hadoop102 hadoop-3.1.3]$ sbin/start-dfs.sh
To start a single daemon on one node, e.g.:
hdfs --daemon start namenode
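A few more examples of per-daemon control following the same pattern (these are the standard Hadoop 3.x daemon names):
hdfs --daemon start datanode
hdfs --daemon start secondarynamenode
yarn --daemon start resourcemanager
yarn --daemon start nodemanager
hdfs --daemon stop namenode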
1. Download apache-maven-3.2.2-bin.zip and unzip it
2. Configure the Maven environment variables (MAVEN_HOME, and add its bin directory to PATH)
3. Check that the configuration works
C:\Users\HP>mvn -v
Apache Maven 3.2.2 (45f7c06d68e745d05611f7fd14efb6594181933e; 2014-06-17T21:51:42+08:00)
Maven home: C:\maven\apache-maven-3.2.2\bin\..
Java version: 1.8.0_92, vendor: Oracle Corporation
Java home: C:\Java\jdk1.8.0_92\jre
Default locale: zh_CN, platform encoding: GBK
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"
4. If it does not work, check whether JAVA_HOME is fully configured.
For example, whether PATH includes
%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;
and whether CLASSPATH is set to
.;%JAVA_HOME%\lib;%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
5. Edit settings.xml under C:\maven\apache-maven-3.2.2\conf and add the mirror and local repository below; the <build> section that follows belongs in your project's pom.xml.
<mirror>
<id>alimaven</id>
<name>aliyun maven</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
<mirrorOf>central</mirrorOf>
</mirror>
<localRepository>C:\maven\repository</localRepository>
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.3.2</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<archive>
<manifest>
<mainClass>com.atguigu.mr.WordcountDriver</mainClass>
</manifest>
</archive>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
Note: if the project shows a red cross, right-click the project -> Maven -> Update Project.
1. Configure the Hadoop environment variables on Windows (an unpacked copy of Hadoop must be prepared in advance)
2. Add the dependencies in the IDEA project's pom.xml
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
<version>2.12.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-api</artifactId>
<version>3.1.3</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client-runtime</artifactId>
<version>3.1.3</version>
</dependency>
</dependencies>
3. Write the HDFS client code
package com.dtdream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.Test;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
public class HdfsClient{
@Test
public void testMkdirs() throws URISyntaxException, IOException, InterruptedException {
URI uri = new URI("hdfs://hadoop102:8020");
Configuration configuration = new Configuration();
String user = "wt";
FileSystem fs = FileSystem.get(uri, configuration, user);
fs.mkdirs(new Path("/wt/test"));
fs.close();
}
@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
// 1 Get the file system
Configuration configuration = new Configuration();
configuration.set("dfs.replication", "2");
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Upload the file
fs.copyFromLocalFile(new Path("C:\\IDEA_test_files\\JAVA_TEST02\\src\\main\\resources\\word.txt"), new Path("/wt/test/word.txt"));
// 3 Close the resource
fs.close();
System.out.println("over");
}
@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException{
// 1 Get the file system
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Download the file
// boolean delSrc: whether to delete the source file
// Path src: the HDFS path of the file to download
// Path dst: the local path to download to
// boolean useRawLocalFileSystem: whether to use the raw local file system (skip the .crc checksum file)
fs.copyToLocalFile(false, new Path("/wt/test/word.txt"), new Path("C:\\IDEA_test_files\\JAVA_TEST02\\src\\main\\resources\\word2.txt"), true);
// 3 Close the resource
fs.close();
}
@Test
public void Mytest() throws URISyntaxException, IOException, InterruptedException {
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"),configuration,"wt");
fs.mkdirs(new Path("/idea_test"));
fs.close();
}
@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException{
// 1 Get the file system
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Perform the delete (true = recursive)
fs.delete(new Path("/idea_test"), true);
// 3 Close the resource
fs.close();
}
@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException{
// 1 Get the file system
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Rename the file
fs.rename(new Path("/wt/test/word.txt"), new Path("/wt/test/word_new.txt"));
// 3 Close the resource
fs.close();
}
@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException{
// 1 Get the file system
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Get file details
RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
while(listFiles.hasNext()){
LocatedFileStatus status = listFiles.next();
// Print the details
// File name
System.out.println(status.getPath().getName());
// Length
System.out.println(status.getLen());
// Permissions
System.out.println(status.getPermission());
// Group
System.out.println(status.getGroup());
// Get the block locations
BlockLocation[] blockLocations = status.getBlockLocations();
for (BlockLocation blockLocation : blockLocations) {
// Hosts storing this block
String[] hosts = blockLocation.getHosts();
for (String host : hosts) {
System.out.println(host);
}
}
System.out.println("---------------------");
}
// 3 Close the resource
fs.close();
}
@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException{
// 1 Get the file system
Configuration configuration = new Configuration();
FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:8020"), configuration, "wt");
// 2 Determine whether each entry is a file or a directory
FileStatus[] listStatus = fs.listStatus(new Path("/"));
for (FileStatus fileStatus : listStatus) {
// If it is a file
if (fileStatus.isFile()) {
System.out.println("f:"+fileStatus.getPath().getName());
}else {
System.out.println("d:"+fileStatus.getPath().getName());
}
}
// 3 Close the resource
fs.close();
}
}
1. Download apache-zookeeper-3.5.7-bin.tar.gz, extract it onto the cluster, and configure the environment variables
2. In the conf directory, rename zoo_sample.cfg to zoo.cfg
3. In the zookeeper directory, create a zkData directory (it will hold ZooKeeper's data)
4. In zkData, create a myid file and set this node's ZooKeeper id
vim myid
2
5. Edit zoo.cfg: point dataDir at the zkData directory and append the cluster entries
# add
#######################cluster##########################
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888
6. Distribute zookeeper to the other servers (xsync)
7. On each server, edit zkData/myid so that every id is unique (2 on hadoop102, 3 on hadoop103, 4 on hadoop104, matching the server.N entries)
8. Start ZooKeeper on each server
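For reference, the per-node commands (standard ZooKeeper scripts, run from /opt/module/zookeeper-3.5.7 on each node):
bin/zkServer.sh start
bin/zkServer.sh status     # once all three are up, one node reports leader and the others follower
bin/zkCli.sh -server hadoop102:2181    # optional: connect a client to verify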
1. Create a zk.sh script in /home/wt
vim zk.sh
#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
case $1 in
"start"){
echo "------------ $host zookeeper -----------"
ssh $host "source /etc/profile; zkServer.sh start"
};;
"stop"){
echo "------------ $host zookeeper -----------"
ssh $host "source /etc/profile; zkServer.sh stop"
};;
"status"){
echo "------------ $host zookeeper -----------"
ssh $host "source /etc/profile; zkServer.sh status"
};;
esac
done
2. Make the script executable
chmod +x zk.sh
3. Move the script to /bin/ so it can be called from anywhere
sudo mv zk.sh /bin/zk.sh
Test the group start:
[wt@hadoop102 bin]$ zk.sh start
------------ hadoop102 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
------------ hadoop103 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
------------ hadoop104 zookeeper -----------
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.5.7/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[wt@hadoop102 bin]$
1. Download and extract the apache-flume-1.7.0-bin.tar.gz package
2. Rename flume-env.sh.template under flume/conf to flume-env.sh and configure it (JAVA_HOME should point at your actual JDK install path; the path below matches the one used earlier in this guide)
export JAVA_HOME=/opt/module/jdk1.8.0_212
3. That completes the Flume configuration.
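A hedged sanity check using Flume's standard CLI:
bin/flume-ng version    # run from the Flume install directory; it should report Flume 1.7.0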
1. Download and extract the kafka_2.12-3.0.1.tgz package
2. Configure the Kafka environment variables
3. Create a logs directory under $KAFKA_HOME
4. Edit the config/server.properties configuration file
with the following content:
#globally unique broker id; must not repeat across brokers
broker.id=0
#allow topics to be deleted
delete.topic.enable=true
#number of threads handling network requests
num.network.threads=3
#number of threads handling disk I/O
num.io.threads=8
#send buffer size of the socket
socket.send.buffer.bytes=102400
#receive buffer size of the socket
socket.receive.buffer.bytes=102400
#maximum request size accepted by the socket
socket.request.max.bytes=104857600
#path where Kafka data (log segments) is stored
log.dirs=/opt/module/kafka-3.0.1/logs
#number of partitions per topic on this broker
num.partitions=1
#threads used to recover and clean up data per data directory
num.recovery.threads.per.data.dir=1
#maximum time a segment file is retained before deletion
log.retention.hours=168
#ZooKeeper cluster connection string
zookeeper.connect=hadoop102:2181,hadoop103:2181,hadoop104:2181/kafka
5. Distribute Kafka (xsync), then make broker.id unique in each server's config/server.properties (e.g. 0, 1, 2)
6. Configure the Kafka environment variables on each server
7. Test Kafka (ZooKeeper must already be running)
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
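A hedged functional test with the standard Kafka CLI tools (the topic name first is just an example):
kafka-topics.sh --bootstrap-server hadoop102:9092 --create --topic first --partitions 1 --replication-factor 3
kafka-topics.sh --bootstrap-server hadoop102:9092 --list
kafka-console-producer.sh --bootstrap-server hadoop102:9092 --topic first
kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first --from-beginning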
1. Write a Kafka group start/stop script kf.sh
sudo vim kf.sh
#!/bin/bash
case $1 in
"start"){
for i in hadoop102 hadoop103 hadoop104
do
echo " -------- starting Kafka on $i -------"
ssh $i "/opt/module/kafka-3.0.1/bin/kafka-server-start.sh -daemon /opt/module/kafka-3.0.1/config/server.properties "
done
};;
"stop"){
for i in hadoop102 hadoop103 hadoop104
do
echo " -------- stopping Kafka on $i -------"
ssh $i "/opt/module/kafka-3.0.1/bin/kafka-server-stop.sh"
done
};;
esac
2. Make the script executable
chmod 777 kf.sh
3. Test the script
kf.sh start
1. Extract mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar into /opt/software/mysql
tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar -C ./mysql
2. Remove the MySQL/MariaDB packages that ship with the system
# check what is installed
rpm -qa|grep mariadb
rpm -qa|grep mysql
sudo rpm -e --nodeps mariadb-libs-5.5.56-2.el7.x86_64
3. Install the RPMs (do not change the order)
sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
4. Initialize MySQL
sudo mysqld --initialize --user=mysql
5. Look up the temporary root password generated by the initialization
[wt@hadoop102 mysql]$ sudo cat /var/log/mysqld.log
2022-05-10T05:10:39.513714Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-05-10T05:10:39.662586Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-05-10T05:10:39.687784Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-05-10T05:10:39.746734Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 88b0625e-d01f-11ec-ac64-000c299e0128.
2022-05-10T05:10:39.748791Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2022-05-10T05:10:40.608922Z 0 [Warning] CA certificate ca.pem is self signed.
2022-05-10T05:10:41.029552Z 1 [Note] A temporary password is generated for root@localhost: 1(VjXf=cIMDI
6. Start the MySQL service
sudo systemctl start mysqld
7. Log in to MySQL (the temporary password from the log above, here 1(VjXf=cIMDI )
[wt@hadoop102 mysql]$ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
8. Set a new password for the root user (required, otherwise later statements will fail)
mysql> set password = password("123456");
Query OK, 0 rows affected, 1 warning (0.00 sec)
9. Change the root user's host so that root can log in remotely
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> select host,user from user;
+-----------+---------------+
| host | user |
+-----------+---------------+
| localhost | mysql.session |
| localhost | mysql.sys |
| localhost | root |
+-----------+---------------+
3 rows in set (0.00 sec)
mysql> update user set host = '%' where user = 'root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select host,user from user;
+-----------+---------------+
| host | user |
+-----------+---------------+
| % | root |
| localhost | mysql.session |
| localhost | mysql.sys |
+-----------+---------------+
3 rows in set (0.00 sec)
Flush privileges to apply the change
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
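Optionally, verify remote access from another node (password as set above; this assumes port 3306 is reachable, which it is here since the firewall was disabled earlier):
mysql -h hadoop102 -uroot -p123456 -e "show databases;"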
1. Extract apache-hive-3.1.2-bin.tar.gz into /opt/module and rename the directory to hive-3.1.2
2. Configure the Hive environment variables
sudo vim /etc/profile.d/my_env.sh
#HIVE_HOME
export HIVE_HOME=/opt/module/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin
3. Install MySQL (see above) and create the metastore database in it
create database metastore;
4. Copy the mysql-connector-java-5.1.27-bin.jar JDBC driver into hive's lib directory
cp /opt/software/mysql-connector-java-5.1.27-bin.jar /opt/module/hive-3.1.2/lib/
5. Create a hive-site.xml file under hive-3.1.2/conf with the following configuration
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.event.db.notification.api.auth</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>
6. Rename hive-env.sh.template under /opt/module/hive-3.1.2/conf to hive-env.sh and set:
vim hive-env.sh
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export HIVE_CONF_DIR=/opt/module/hive-3.1.2/conf
7. Initialize the Hive metastore schema
schematool -initSchema -dbType mysql -verbose
To use a standalone metastore service, add the following to hive-site.xml:
vim hive-site.xml
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop102:9083</value>
</property>
Once the metastore service is configured (hive.metastore.uris), it must be running; otherwise bin/hive cannot be used, as the following attempt shows.
[wt@hadoop102 hive-3.1.2]$ bin/hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/home/wt/.local/bin:/home/wt/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 68c09d3a-0258-4ecc-8df2-01617ef7b28a
Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show tables;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Command to start the metastore service (it runs in the foreground, so the terminal window it is started from stays occupied):
[wt@hadoop102 hive-3.1.2]$ bin/hive --service metastore
2022-05-10 14:45:36: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Open Hive from a new terminal window
[wt@hadoop102 hive-3.1.2]$ bin/hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/home/wt/.local/bin:/home/wt/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.3/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.3/bin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 66417456-280b-4197-8d1a-09ae42b82292
Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = c677d7c2-6e76-40fd-8c89-3d060756d341
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
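A small hedged smoke test from the shell (the table name test_tbl is made up; the insert launches a MapReduce job, so it takes a moment):
hive -e "create table test_tbl(id int, name string);"
hive -e "insert into test_tbl values (1, 'hadoop');"
hive -e "select * from test_tbl;"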
1. Add the following to hive-site.xml
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop102</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
2. When using JDBC connections, start the metastore service first
[wt@hadoop102 hive-3.1.2]$ bin/hive --service metastore
2022-05-10 16:35:48: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
3. Start hiveserver2 (also a foreground process, so use a separate window)
[wt@hadoop102 hive-3.1.2]$ bin/hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/kafka-3.0.1/bin:/opt/module/hive-3.1.2/bin:/home/wt/.local/bin:/home/wt/bin)
2022-05-10 16:38:29: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = ddd237be-01f3-477a-a0c3-bf1b053ab258
Hive Session ID = 70777be5-1bf7-4935-8dab-572300f56ab3
Hive Session ID = d19fe2fe-1131-4bb2-88ab-be9e19abcf40
Hive Session ID = ac661263-4ed4-44a9-8d01-9aeea1068636
4. Start the beeline client
[wt@hadoop102 hive-3.1.2]$ bin/beeline -u jdbc:hive2://hadoop102:10000 -n wt
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://hadoop102:10000>
If beeline fails with a proxy-user permission error, make sure Hadoop's core-site.xml contains the entries below (they were added earlier in this guide for the wt user), then redistribute it and restart the cluster
<property>
<name>hadoop.proxyuser.wt.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.wt.groups</name>
<value>*</value>
</property>
Hive's log file is /tmp/wt/hive.log; follow it with:
tail -f /tmp/wt/hive.log
1. Before using this script, make sure hive-site.xml contains
<property>
<name>hive.metastore.uris</name>
<value>thrift://hadoop102:9083</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hadoop102</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
2. Write the start/stop script (sudo vim /bin/hiveservice.sh)
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
# Create the log directory if it does not exist
if [ ! -d $HIVE_LOG_DIR ]
then
mkdir -p $HIVE_LOG_DIR
fi
# Check whether a process is running: argument 1 is the process name, argument 2 is its port
function check_process()
{
pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
echo $pid
[[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
# Start the services
function hive_start()
{
# Start the Metastore
metapid=$(check_process HiveMetastore 9083)
cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
[ -z "$metapid" ] && eval $cmd || echo -e "\033[47;36m Metastore is already running\033[0m"
# Start HiveServer2
server2pid=$(check_process HiveServer2 10000)
cmd="nohup hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
[ -z "$server2pid" ] && eval $cmd || echo -e "\033[47;36m HiveServer2 is already running\033[0m"
}
# Stop the services
function hive_stop()
{
# Stop the Metastore
metapid=$(check_process HiveMetastore 9083)
[ "$metapid" ] && kill $metapid || echo -e "\033[47;33m Metastore is not running\033[0m"
# Stop HiveServer2
server2pid=$(check_process HiveServer2 10000)
[ "$server2pid" ] && kill $server2pid || echo -e "\033[47;33m HiveServer2 is not running\033[0m"
}
# Argument handling
case $1 in
"start")
echo -e "\033[47;32m Starting services; HiveServer2 takes a while to start, please wait!\033[0m"
hive_start
;;
"stop")
echo -e "\033[47;32m Stopping services, please wait!\033[0m"
hive_stop
;;
"restart")
echo -e "\033[47;32m Restarting services; HiveServer2 takes a while to start, please wait!\033[0m"
hive_stop
sleep 2
hive_start
;;
"status")
check_process HiveMetastore 9083 >/dev/null && echo -e "\033[47;36m Metastore is running normally\033[0m" || echo -e "\033[47;31m Metastore is not running properly\033[0m"
check_process HiveServer2 10000 >/dev/null && echo -e "\033[47;36m HiveServer2 is running normally\033[0m" || echo -e "\033[47;31m HiveServer2 is not running properly\033[0m"
;;
*)
echo -e "\033[47;31m Invalid Args!\033[0m"
echo 'Usage: '$(basename $0)' start|stop|restart|status'
;;
esac
3. Make it executable
chmod 777 hiveservice.sh
4. Start the services
hiveservice.sh start
1. Upload spark-3.0.0-bin-hadoop3.2.tgz to Linux, extract it, and put it in the right place:
tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz -C /opt/module
cd /opt/module
mv spark-3.0.0-bin-hadoop3.2 spark-yarn
2. Modify the Hadoop configuration file /opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml (disable the physical and virtual memory checks) and distribute it
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
3. Rename conf/spark-env.sh.template to spark-env.sh and add the JAVA_HOME and YARN_CONF_DIR settings
mv spark-env.sh.template spark-env.sh
...
export JAVA_HOME=/opt/module/jdk1.8.0
YARN_CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop
Test the submission to YARN:
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
1. Rename spark-defaults.conf.template to spark-defaults.conf
mv spark-defaults.conf.template spark-defaults.conf
2. Edit spark-defaults.conf and configure the event-log location
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop102:8020/spark-directory
Note: the Hadoop cluster must be running, and the directory must already exist on HDFS.
hadoop fs -mkdir /spark-directory
3. Edit spark-env.sh and add the history-server settings
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://hadoop102:8020/spark-directory
-Dspark.history.retainedApplications=30"
4. Add to spark-defaults.conf
spark.yarn.historyServer.address=hadoop102:18080
spark.history.ui.port=18080
5. Start the Spark history server
sbin/start-history-server.sh
6. Resubmit an application (client mode this time)
bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
./examples/jars/spark-examples_2.12-3.0.0.jar \
10
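Two hedged follow-up checks: the finished application should show up in the Spark history UI at http://hadoop102:18080, and an interactive shell can also be run on YARN:
bin/spark-shell --master yarn    # run from the spark-yarn directory; type :quit to exit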