Flume 1.11.0 Installation and Deployment

1. Prepare the installation package apache-flume-1.11.0-bin.tar.gz.

Upload it to the server.

2. Install Flume 1.11.0.

Extract the archive:

tar -zxvf apache-flume-1.11.0-bin.tar.gz -C /opt/server

Go to the conf directory, edit flume-env.sh, and set JAVA_HOME:


cd /opt/server/apache-flume-1.11.0-bin/conf
 
# First make a copy of the flume-env.sh.template file
cp flume-env.sh.template flume-env.sh
 
# Edit the copy and add the JAVA_HOME line shown below
vim flume-env.sh
export JAVA_HOME=/opt/server/jdk1.8.0_221
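
The same edit can be scripted instead of done by hand in vim. The following is a minimal sketch, not part of the original steps; the function name configure_flume_env is made up for illustration, and it assumes the conf directory contains the stock flume-env.sh.template.

```shell
# Create flume-env.sh from the template (if missing) and set JAVA_HOME in it.
# $1: Flume conf directory, $2: JDK home directory
# (configure_flume_env is a hypothetical helper name, not a Flume command.)
configure_flume_env() {
  conf_dir="$1"
  java_home="$2"
  env_file="$conf_dir/flume-env.sh"

  # Keep an existing flume-env.sh; otherwise start from the template.
  [ -f "$env_file" ] || cp "$conf_dir/flume-env.sh.template" "$env_file"

  # Drop any previous uncommented JAVA_HOME export, then append the new one,
  # so running the function twice does not leave duplicate lines.
  grep -v '^export JAVA_HOME=' "$env_file" > "$env_file.tmp" || true
  mv "$env_file.tmp" "$env_file"
  echo "export JAVA_HOME=$java_home" >> "$env_file"
}
```

With the paths used in this article, it would be called as: configure_flume_env /opt/server/apache-flume-1.11.0-bin/conf /opt/server/jdk1.8.0_221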

3. Collect nginx log data with Flume and save it to HDFS.

Install nginx:


yum install epel-release
 
yum update
 
yum -y install nginx

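The "yum update" command reported an error at the end, but it did not appear to affect the nginx installation; it was probably a version compatibility issue.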

nginx commands:


systemctl start nginx # start the nginx service
 
systemctl stop nginx # stop the nginx service
 
systemctl restart nginx # restart the nginx service

After starting nginx, open port 80 in a browser to confirm it is serving.

The access log for requests on port 80 is stored in:

cd /var/log/nginx

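4. Integrate Flume (1.9 and later) with Hadoop 3.x.

Note: some online sources say that "before Hadoop 3.x, guava-11.0.2.jar has to be deleted from Flume's lib directory or an error is raised; from Hadoop 3.1.0 onward (with Flume 1.9) it is compatible and does not need to be deleted". I did not verify this claim and did not delete the jar in this installation. If a Guava-related NoSuchMethodError shows up when the agent starts, removing the older jar is the commonly suggested fix.

The jar in question is "/opt/server/apache-flume-1.11.0-bin/lib/guava-11.0.2.jar".

Copy the relevant Hadoop 3.x jars into the lib directory of Flume 1.11.0: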


 
 
cp /opt/server/hadoop-3.3.1/share/hadoop/common/*.jar /opt/server/apache-flume-1.11.0-bin/lib
 
cp /opt/server/hadoop-3.3.1/share/hadoop/common/lib/*.jar /opt/server/apache-flume-1.11.0-bin/lib
 
cp /opt/server/hadoop-3.3.1/share/hadoop/hdfs/*.jar /opt/server/apache-flume-1.11.0-bin/lib
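
After copying, the lib directory may hold more than one Guava version (the one shipped with Flume plus the one from Hadoop). The sketch below is one way to check for that; the function names list_guava_jars and has_guava_conflict are made up for illustration, and whether two versions actually cause trouble depends on the setup, as the note above says.

```shell
# List every Guava jar directly inside a lib directory, one path per line.
# (Hypothetical helper, not part of Flume or Hadoop.)
list_guava_jars() {
  find "$1" -maxdepth 1 -name 'guava-*.jar' | sort
}

# Succeed (exit status 0) only if more than one Guava jar is present.
has_guava_conflict() {
  count=$(list_guava_jars "$1" | wc -l | tr -d ' ')
  [ "$count" -gt 1 ]
}
```

For example: if has_guava_conflict /opt/server/apache-flume-1.11.0-bin/lib; then list_guava_jars /opt/server/apache-flume-1.11.0-bin/lib; fi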

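5. Configure and run the Flume agent that collects the nginx log and writes it to HDFS.

Create the configuration file taildir-hdfs.conf in the directory "/opt/server/apache-flume-1.11.0-bin/conf/" and edit its content.

taildir-hdfs.conf: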


a3.sources = r3
a3.sinks = k3
a3.channels = c3
 
# Describe/configure the source
a3.sources.r3.type = TAILDIR
a3.sources.r3.filegroups = f1
 
# Regular expressions are supported here (for the file name only)
a3.sources.r3.filegroups.f1 = /var/log/nginx/access.log
 
# Records the position up to which each file has been read
a3.sources.r3.positionFile = /opt/server/apache-flume-1.11.0-bin/tail_dir.json
 
# Describe the sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://server:8020/user/tailDir
a3.sinks.k3.hdfs.fileType = DataStream
 
# Roll files at roughly 128 MB. Default: 1024. When the temporary file reaches this size (in bytes), it is rolled into a target file. 0 disables size-based rolling.
a3.sinks.k3.hdfs.rollSize = 134217700
 
# Default: 10. When this many events have been written, the temporary file is rolled into a target file. 0 disables count-based rolling.
a3.sinks.k3.hdfs.rollCount = 0
 
# Roll every 60 seconds. Default: 30. 0 disables time-based rolling.
a3.sinks.k3.hdfs.rollInterval = 60
 
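# When Flume detects that HDFS is replicating blocks, it rolls the file automatically and the roll* settings above stop taking effect. Set this to 1 so that block replication does not trigger a roll.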
a3.sinks.k3.hdfs.minBlockReplicas = 1
 
# Use a channel which buffers events in memory
a3.channels.c3.type = memory
a3.channels.c3.capacity = 1000
a3.channels.c3.transactionCapacity = 100
 
# Bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
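
The rollSize value above sits just under the default HDFS block size of 128 MiB, so a rolled file fits in a single block. The arithmetic can be checked quickly in the shell; the variable names are only for illustration.

```shell
# Default HDFS block size of 128 MiB, expressed in bytes.
block_bytes=$((128 * 1024 * 1024))

# Value used for hdfs.rollSize in taildir-hdfs.conf.
roll_size=134217700

echo "block size: $block_bytes bytes"
echo "rollSize:   $roll_size bytes"
echo "headroom:   $((block_bytes - roll_size)) bytes"
```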

Flume start command, run from /opt/server/apache-flume-1.11.0-bin: "./bin/flume-ng agent -c ./conf -f ./conf/taildir-hdfs.conf -n a3 -Dflume.root.logger=INFO,console"

Press Ctrl+C to stop the running agent.

The logs have now been written to HDFS.
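
One way to confirm this from the command line is to list the sink directory with "hdfs dfs -ls /user/tailDir". Files the HDFS sink is still writing carry the default in-use suffix .tmp until they are rolled. The sketch below filters the listing down to those files; list_in_progress and the HDFS_CMD variable are hypothetical names introduced here, not part of Hadoop or Flume.

```shell
# Command used to talk to HDFS. HDFS_CMD is a made-up override hook so the
# function can be tried without a cluster; by default it is the hdfs CLI.
HDFS_CMD="${HDFS_CMD:-hdfs}"

# Print the paths under an HDFS directory that still end in .tmp,
# i.e. files the sink has not rolled yet.
# Returns a non-zero status when there are none.
list_in_progress() {
  "$HDFS_CMD" dfs -ls "$1" | awk '{ print $NF }' | grep '\.tmp$'
}
```

With the configuration above, list_in_progress /user/tailDir should stop listing a file once the 60-second rollInterval has passed and the file has been renamed.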

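Note: in Flume 1.10 and later, passing "-Dflume.root.logger=INFO,console" on the command line no longer prints logs to the console. The main reason is that, starting with 1.10, Flume replaced Log4j 1.x with Log4j 2.x, and log4j.properties with log4j2.xml.

Articles describing workarounds can be found online.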
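
A commonly described workaround is to reference a console appender from the root logger in conf/log4j2.xml. The sketch below assumes the file already defines an appender named "Console" and that the root logger contains the line <AppenderRef ref="LogFile" />; both are assumptions about the shipped file, so check it before running anything like this. The function name enable_console_appender is made up for illustration.

```shell
# Add a Console AppenderRef after the LogFile AppenderRef in a log4j2.xml.
# $1: path to log4j2.xml
# Assumes an appender named "Console" is already defined in the file.
enable_console_appender() {
  file="$1"

  # Nothing to do if the console appender is already referenced.
  if grep -q '<AppenderRef ref="Console"' "$file"; then
    return 0
  fi

  # Keep a backup, then rewrite the file with the extra line inserted.
  cp "$file" "$file.bak"
  awk '
    { print }
    /<AppenderRef ref="LogFile"/ { print "      <AppenderRef ref=\"Console\" />" }
  ' "$file.bak" > "$file"
}
```

It would be called as enable_console_appender /opt/server/apache-flume-1.11.0-bin/conf/log4j2.xml, after which the agent has to be restarted for the change to take effect.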


Original article: https://blog.csdn.net/shanxiderenheni/article/details/132729592