ES8 in Production: Pod Log Collection (ELK Solution)


    Introduction to the ELK Collection Solution

    Overview

    When collecting massive volumes of logs from a large cluster, Filebeat offers higher performance than Fluent Bit, so a Filebeat collector container is run on every Kubernetes node as a DaemonSet to pick up the logs produced by business containers and buffer them in a Kafka message queue. Using Kafka's consumer group mechanism, multiple Logstash replicas are deployed so that the Logstash cluster consumes the messages and writes them to Elasticsearch at its own pace, preventing write failures caused by sudden traffic spikes and improving both throughput and availability.

    Collection Architecture

    [Figure: ELK Stack architecture - log collection with ELK]

    Kafka Deployment

    For production, the recommended way to deploy Kafka is with an operator, and Strimzi is currently the most widely used operator option. If the cluster's data volume is small, NFS shared storage is sufficient; for larger data volumes, local PV storage is preferable.

    Deploy the Operator

    The operator can be deployed either with Helm or with plain YAML manifests; the example below uses Helm:

    [root@tiaoban kafka]# helm repo add strimzi https://strimzi.io/charts/
    "strimzi" has been added to your repositories
    [root@tiaoban kafka]# helm install strimzi -n kafka strimzi/strimzi-kafka-operator
    NAME: strimzi
    LAST DEPLOYED: Sun Oct  8 21:16:31 2023
    NAMESPACE: kafka
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    Thank you for installing strimzi-kafka-operator-0.37.0
    
    To create a Kafka cluster refer to the following documentation.
    
    https://strimzi.io/docs/operators/latest/deploying.html#deploying-cluster-operator-helm-chart-str
    
    [root@tiaoban strimzi-kafka-operator]# kubectl get pod -n kafka
    NAME                                        READY   STATUS    RESTARTS   AGE
    strimzi-cluster-operator-56fdbb99cb-gznkw   1/1     Running   0          17m
    

    Review the Example Files

    The official Strimzi repository provides example files for various scenarios. The manifests can be downloaded from: https://github.com/strimzi/strimzi-kafka-operator/releases

    [root@tiaoban kafka]# ls
    strimzi-kafka-operator
    [root@tiaoban kafka]# wget https://github.com/strimzi/strimzi-kafka-operator/releases/download/0.37.0/strimzi-0.37.0.tar.gz
    [root@tiaoban kafka]# tar -zxf strimzi-0.37.0.tar.gz
    [root@tiaoban kafka]# cd strimzi-0.37.0/examples/kafka
    [root@tiaoban kafka]# ls
    kafka-ephemeral-single.yaml  kafka-ephemeral.yaml  kafka-jbod.yaml  kafka-persistent-single.yaml  kafka-persistent.yaml  nodepools
    
    • kafka-persistent.yaml: deploys a persistent cluster with three ZooKeeper and three Kafka nodes. (recommended)
    • kafka-jbod.yaml: deploys a persistent cluster with three ZooKeeper and three Kafka nodes, each using multiple persistent volumes.
    • kafka-persistent-single.yaml: deploys a persistent cluster with a single ZooKeeper node and a single Kafka node.
    • kafka-ephemeral.yaml: deploys an ephemeral cluster with three ZooKeeper and three Kafka nodes.
    • kafka-ephemeral-single.yaml: deploys an ephemeral cluster with three ZooKeeper nodes and a single Kafka node.

    Create PVC Resources

    This example uses NFS storage. The PVCs are created ahead of time and serve as persistent storage for the three ZooKeeper and three Kafka nodes respectively (an apply sketch follows the manifest).

    [root@tiaoban kafka]# cat kafka-pvc.yaml
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-my-cluster-zookeeper-0
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-my-cluster-zookeeper-1
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-my-cluster-zookeeper-2
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-0-my-cluster-kafka-0
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-0-my-cluster-kafka-1
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    ---
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: data-0-my-cluster-kafka-2
      namespace: kafka
    spec:
      storageClassName: nfs-client
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
    
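    A minimal apply-and-check sketch, assuming the nfs-client StorageClass from earlier in this series already exists; the PVC names must match the names the Strimzi StatefulSets expect (data-my-cluster-zookeeper-N and data-0-my-cluster-kafka-N):

    # create the PVCs and confirm they bind against the nfs-client StorageClass
    kubectl apply -f kafka-pvc.yaml
    kubectl get pvc -n kafka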

    Deploy Kafka and ZooKeeper

    Following the kafka-persistent.yaml example from the official repository, deploy a persistent cluster with three ZooKeeper and three Kafka nodes (an apply sketch follows the manifest).

    [root@tiaoban kafka]# cat kafka.yaml
    apiVersion: kafka.strimzi.io/v1beta2
    kind: Kafka
    metadata:
      name: my-cluster
      namespace: kafka
    spec:
      kafka:
        version: 3.5.1
        replicas: 3
        listeners:
          - name: plain
            port: 9092
            type: internal
            tls: false
          - name: tls
            port: 9093
            type: internal
            tls: true
        config:
          offsets.topic.replication.factor: 3
          transaction.state.log.replication.factor: 3
          transaction.state.log.min.isr: 2
          default.replication.factor: 3
          min.insync.replicas: 2
          inter.broker.protocol.version: "3.5"
        storage:
          type: jbod
          volumes:
          - id: 0
            type: persistent-claim
            size: 100Gi
            deleteClaim: false
      zookeeper:
        replicas: 3
        storage:
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
      entityOperator:
        topicOperator: {}
        userOperator: {}
    
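    A sketch of applying the cluster definition and waiting until the Strimzi cluster operator reports it ready (the wait uses the Ready condition that Strimzi sets on the Kafka resource):

    kubectl apply -f kafka.yaml
    # block until the cluster operator marks the Kafka custom resource as Ready
    kubectl wait kafka/my-cluster --for=condition=Ready --timeout=300s -n kafka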

    Verification

    Check the resources: the related pod and svc resources have been created successfully.

    [root@tiaoban kafka]# kubectl get pod -n kafka
    NAME                                          READY   STATUS    RESTARTS   AGE
    my-cluster-entity-operator-7c68d4b9d9-tg56j   3/3     Running   0          2m15s
    my-cluster-kafka-0                            1/1     Running   0          2m54s
    my-cluster-kafka-1                            1/1     Running   0          2m54s
    my-cluster-kafka-2                            1/1     Running   0          2m54s
    my-cluster-zookeeper-0                        1/1     Running   0          3m19s
    my-cluster-zookeeper-1                        1/1     Running   0          3m19s
    my-cluster-zookeeper-2                        1/1     Running   0          3m19s
    strimzi-cluster-operator-56fdbb99cb-gznkw     1/1     Running   0          97m
    [root@tiaoban kafka]# kubectl get svc -n kafka
    NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                        AGE
    my-cluster-kafka-bootstrap    ClusterIP   10.99.246.133   <none>        9091/TCP,9092/TCP,9093/TCP                     3m3s
    my-cluster-kafka-brokers      ClusterIP   None            <none>        9090/TCP,9091/TCP,8443/TCP,9092/TCP,9093/TCP   3m3s
    my-cluster-zookeeper-client   ClusterIP   10.109.106.29   <none>        2181/TCP                                       3m28s
    my-cluster-zookeeper-nodes    ClusterIP   None            <none>        2181/TCP,2888/TCP,3888/TCP                     3m28s
    

    Deploy kafka-ui

    Create ConfigMap and ingress resources; the ConfigMap specifies the Kafka connection address. Using Traefik as an example, an IngressRoute is created so kafka-ui can be reached by domain name.

    [root@tiaoban kafka]# cat kafka-ui.yaml 
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kafka-ui-helm-values
      namespace: kafka
    data:
      KAFKA_CLUSTERS_0_NAME: "kafka-cluster"
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: "my-cluster-kafka-brokers.kafka.svc:9092"
      AUTH_TYPE: "DISABLED"
      MANAGEMENT_HEALTH_LDAP_ENABLED: "FALSE" 
    ---
    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRoute
    metadata:
      name: kafka-ui
      namespace: kafka
    spec:
      entryPoints:
      - web
      routes:
      - match: Host(`kafka-ui.local.com`) 
        kind: Rule
        services:
          - name: kafka-ui
            port: 80
    [root@tiaoban kafka]# kubectl apply -f kafka-ui.yaml 
    configmap/kafka-ui-helm-values created
    ingressroute.traefik.containo.us/kafka-ui created
    

    Deploy kafka-ui with Helm, pointing it at the ConfigMap above
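    If the kafka-ui chart repository has not been added yet, it must be registered first; the repository URL below is taken from the upstream provectus project and is an assumption for this environment:

    helm repo add kafka-ui https://provectus.github.io/kafka-ui-charts
    helm repo update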

    [root@tiaoban kafka]# helm install kafka-ui kafka-ui/kafka-ui -n kafka --set existingConfigMap="kafka-ui-helm-values"
    NAME: kafka-ui
    LAST DEPLOYED: Mon Oct  9 09:56:45 2023
    NAMESPACE: kafka
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    1. Get the application URL by running these commands:
      export POD_NAME=$(kubectl get pods --namespace kafka -l "app.kubernetes.io/name=kafka-ui,app.kubernetes.io/instance=kafka-ui" -o jsonpath="{.items[0].metadata.name}")
      echo "Visit http://127.0.0.1:8080 to use your application"
      kubectl --namespace kafka port-forward $POD_NAME 8080:8080
    

    To verify access, add a hosts record 192.168.10.100 kafka-ui.local.com and then open the UI.
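    On the machine used to access the UI this is just a hosts entry pointing at the Traefik entrypoint (192.168.10.100 is the address used in this environment):

    echo "192.168.10.100 kafka-ui.local.com" >> /etc/hosts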

    Filebeat Deployment and Configuration

    Resource Manifests

    • rbac.yaml: creates the filebeat ServiceAccount and a filebeat ClusterRole that grants read access to cluster resources, and binds the role to the ServiceAccount.
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: filebeat
      namespace: elk
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: filebeat
      namespace: elk
    rules:
      - apiGroups: ["","apps","batch"]
        resources: ["*"]
        verbs:
          - get
          - watch
          - list
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: filebeat
      namespace: elk
    subjects:
      - kind: ServiceAccount
        name: filebeat
        namespace: elk
    roleRef:
      kind: ClusterRole
      name: filebeat
      apiGroup: rbac.authorization.k8s.io
    
    • filebeat-conf.yaml: uses filebeat.autodiscover to pick up pod logs automatically, so logs from newly created pods are not missed, and sends the events to the Kafka message queue.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: filebeat-config
      namespace: elk
    data:
      filebeat.yml: |-
        filebeat.autodiscover:
          providers:  # enable autodiscover to collect pod logs
          - type: kubernetes
            node: ${NODE_NAME}
            hints.enabled: true
            hints.default_config:
              type: container
              paths:
              - /var/log/containers/*${data.kubernetes.container.id}.log
              exclude_files: ['.*filebeat-.*'] # do not collect filebeat's own logs
          multiline: # merge wrapped log lines into a single event
            pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}' 
            negate: true 
            match: after
        
        processors:
        - add_kubernetes_metadata: # enrich events with kubernetes metadata
            in_cluster: true
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
        - drop_event: # do not collect DEBUG logs
            when: 
              contains:
                message: "DEBUG"
      
        output.kafka:
          hosts: ["my-cluster-kafka-brokers.kafka.svc:9092"]
          topic: "pod_logs"
          partition.round_robin:
            reachable_only: false
          required_acks: -1
          compression: gzip
        
        monitoring: # monitoring-related settings
          enabled: true
          cluster_uuid: "ZUnqLCRqQL2jeo5FNvMI9g"
          elasticsearch:
            hosts:  ["https://elasticsearch-es-http.elk.svc:9200"]
            username: "elastic" 
            password: "2zg5q6AU7xW5jY649yuEpZ47"
            ssl.verification_mode: "none"
    
    • filebeat.yaml: runs one Filebeat container per node as a DaemonSet, mounting the Filebeat configuration file, the data directory, and the host log directories; applying the three manifests together is sketched after this block.
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: filebeat
      namespace: elk
      labels:
        app: filebeat
    spec:
      selector:
        matchLabels:
          app: filebeat
      template:
        metadata:
          labels:
            app: filebeat
        spec:
          serviceAccountName: filebeat
          dnsPolicy: ClusterFirstWithHostNet
          containers:
            - name: filebeat
              image: harbor.local.com/elk/filebeat:8.9.1
              args: ["-c","/etc/filebeat/filebeat.yml","-e"]
              env:
                - name: NODE_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: spec.nodeName
              securityContext:
                runAsUser: 0
              resources:
                limits:
                  cpu: 500m
                  memory: 1Gi
              volumeMounts:
                - name: timezone
                  mountPath: /etc/localtime
                - name: config
                  mountPath: /etc/filebeat/filebeat.yml
                  subPath: filebeat.yml
                - name: data
                  mountPath: /usr/share/filebeat/data
                - name: containers
                  mountPath: /var/log/containers
                  readOnly: true
                - name: logs
                  mountPath: /var/log/pods
          volumes:
            - name: timezone
              hostPath:
                path: /usr/share/zoneinfo/Asia/Shanghai
            - name: config
              configMap:
                name: filebeat-config
            - name: data
              hostPath:
                path: /var/lib/filebeat-data
                type: DirectoryOrCreate
            - name: containers
              hostPath:
                path: /var/log/containers
            - name: logs
              hostPath:
                path: /var/log/pods
    
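    A sketch of rolling out the three Filebeat manifests together (the elk namespace and the Elasticsearch cluster referenced in the monitoring section are assumed to exist already):

    kubectl apply -f rbac.yaml -f filebeat-conf.yaml -f filebeat.yaml
    kubectl -n elk get daemonset filebeat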

    Verification

    Check the pods: one Filebeat collector container is running on every node in the cluster.

    [root@tiaoban ~]# kubectl get pod -n elk | grep filebeat
    filebeat-8p24s             1/1     Running        0      29s
    filebeat-chh9b             1/1     Running        0      29s
    filebeat-dl28d             1/1     Running        0      29s
    filebeat-gnkt6             1/1     Running        0      29s
    filebeat-m4rfx             1/1     Running        0      29s
    filebeat-w4pdz             1/1     Running        0      29s
    

    Looking at the Kafka topics, a topic named pod_logs has been created automatically. At this point we increase its partition count to 2 so that multiple Logstash replicas can consume it in parallel.
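    One way to pin the partition count declaratively, assuming the topic is managed by the Strimzi topic operator rather than adjusted through the kafka-ui screen described above, is a KafkaTopic resource (a sketch):

    apiVersion: kafka.strimzi.io/v1beta2
    kind: KafkaTopic
    metadata:
      name: pod-logs              # resource names cannot contain underscores
      namespace: kafka
      labels:
        strimzi.io/cluster: my-cluster
    spec:
      topicName: pod_logs         # actual Kafka topic name used by filebeat and logstash
      partitions: 2
      replicas: 3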

    Logstash Deployment and Configuration

    Build the Image

    The default Logstash image does not include the GeoIP geolocation database file, so resolving IP location information would fail. We therefore build a Logstash image that bundles the GeoIP database file ahead of time and push it to the Harbor registry.

    [root@tiaoban elk]# cat Dockerfile
    FROM docker.elastic.co/logstash/logstash:8.9.1
    ADD GeoLite2-City.mmdb /etc/logstash/GeoLite2-City.mmdb
    [root@tiaoban elk]# docker build -t harbor.local.com/elk/logstash:v8.9.1 .
    [root@tiaoban elk]# docker push harbor.local.com/elk/logstash:v8.9.1
    

    Resource Manifests

    • logstash-log4j2.yaml: when running in a container, Logstash logs to the console by default and writes nothing to log files; the logs directory only contains gc.log. By adjusting the log4j2 settings we make Logstash write its logs to files as well, so that Fleet can collect and analyse them.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-log4j2
      namespace: elk
    data:
      log4j2.properties: |
        status = error
        name = LogstashPropertiesConfig
    
        appender.console.type = Console
        appender.console.name = plain_console
        appender.console.layout.type = PatternLayout
        appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c]%notEmpty{[%X{pipeline.id}]}%notEmpty{[%X{plugin.id}]} %m%n
    
        appender.json_console.type = Console
        appender.json_console.name = json_console
        appender.json_console.layout.type = JSONLayout
        appender.json_console.layout.compact = true
        appender.json_console.layout.eventEol = true
    
        appender.rolling.type = RollingFile
        appender.rolling.name = plain_rolling
        appender.rolling.fileName = ${sys:ls.logs}/logstash-plain.log
        appender.rolling.filePattern = ${sys:ls.logs}/logstash-plain-%d{yyyy-MM-dd}-%i.log.gz
        appender.rolling.policies.type = Policies
        appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
        appender.rolling.policies.time.interval = 1
        appender.rolling.policies.time.modulate = true
        appender.rolling.layout.type = PatternLayout
        appender.rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c]%notEmpty{[%X{pipeline.id}]}%notEmpty{[%X{plugin.id}]} %m%n
        appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
        appender.rolling.policies.size.size = 100MB
        appender.rolling.strategy.type = DefaultRolloverStrategy
        appender.rolling.strategy.max = 30
        appender.rolling.avoid_pipelined_filter.type = PipelineRoutingFilter
    
        appender.json_rolling.type = RollingFile
        appender.json_rolling.name = json_rolling
        appender.json_rolling.fileName = ${sys:ls.logs}/logstash-json.log
        appender.json_rolling.filePattern = ${sys:ls.logs}/logstash-json-%d{yyyy-MM-dd}-%i.log.gz
        appender.json_rolling.policies.type = Policies
        appender.json_rolling.policies.time.type = TimeBasedTriggeringPolicy
        appender.json_rolling.policies.time.interval = 1
        appender.json_rolling.policies.time.modulate = true
        appender.json_rolling.layout.type = JSONLayout
        appender.json_rolling.layout.compact = true
        appender.json_rolling.layout.eventEol = true
        appender.json_rolling.policies.size.type = SizeBasedTriggeringPolicy
        appender.json_rolling.policies.size.size = 100MB
        appender.json_rolling.strategy.type = DefaultRolloverStrategy
        appender.json_rolling.strategy.max = 30
        appender.json_rolling.avoid_pipelined_filter.type = PipelineRoutingFilter
    
        appender.routing.type = PipelineRouting
        appender.routing.name = pipeline_routing_appender
        appender.routing.pipeline.type = RollingFile
        appender.routing.pipeline.name = appender-${ctx:pipeline.id}
        appender.routing.pipeline.fileName = ${sys:ls.logs}/pipeline_${ctx:pipeline.id}.log
        appender.routing.pipeline.filePattern = ${sys:ls.logs}/pipeline_${ctx:pipeline.id}.%i.log.gz
        appender.routing.pipeline.layout.type = PatternLayout
        appender.routing.pipeline.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %m%n
        appender.routing.pipeline.policy.type = SizeBasedTriggeringPolicy
        appender.routing.pipeline.policy.size = 100MB
        appender.routing.pipeline.strategy.type = DefaultRolloverStrategy
        appender.routing.pipeline.strategy.max = 30
    
        rootLogger.level = ${sys:ls.log.level}
        rootLogger.appenderRef.console.ref = ${sys:ls.log.format}_console
        rootLogger.appenderRef.rolling.ref = ${sys:ls.log.format}_rolling
        rootLogger.appenderRef.routing.ref = pipeline_routing_appender
    
        # Slowlog
    
        appender.console_slowlog.type = Console
        appender.console_slowlog.name = plain_console_slowlog
        appender.console_slowlog.layout.type = PatternLayout
        appender.console_slowlog.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %m%n
    
        appender.json_console_slowlog.type = Console
        appender.json_console_slowlog.name = json_console_slowlog
        appender.json_console_slowlog.layout.type = JSONLayout
        appender.json_console_slowlog.layout.compact = true
        appender.json_console_slowlog.layout.eventEol = true
    
        appender.rolling_slowlog.type = RollingFile
        appender.rolling_slowlog.name = plain_rolling_slowlog
        appender.rolling_slowlog.fileName = ${sys:ls.logs}/logstash-slowlog-plain.log
        appender.rolling_slowlog.filePattern = ${sys:ls.logs}/logstash-slowlog-plain-%d{yyyy-MM-dd}-%i.log.gz
        appender.rolling_slowlog.policies.type = Policies
        appender.rolling_slowlog.policies.time.type = TimeBasedTriggeringPolicy
        appender.rolling_slowlog.policies.time.interval = 1
        appender.rolling_slowlog.policies.time.modulate = true
        appender.rolling_slowlog.layout.type = PatternLayout
        appender.rolling_slowlog.layout.pattern = [%d{ISO8601}][%-5p][%-25c] %m%n
        appender.rolling_slowlog.policies.size.type = SizeBasedTriggeringPolicy
        appender.rolling_slowlog.policies.size.size = 100MB
        appender.rolling_slowlog.strategy.type = DefaultRolloverStrategy
        appender.rolling_slowlog.strategy.max = 30
    
        appender.json_rolling_slowlog.type = RollingFile
        appender.json_rolling_slowlog.name = json_rolling_slowlog
        appender.json_rolling_slowlog.fileName = ${sys:ls.logs}/logstash-slowlog-json.log
        appender.json_rolling_slowlog.filePattern = ${sys:ls.logs}/logstash-slowlog-json-%d{yyyy-MM-dd}-%i.log.gz
        appender.json_rolling_slowlog.policies.type = Policies
        appender.json_rolling_slowlog.policies.time.type = TimeBasedTriggeringPolicy
        appender.json_rolling_slowlog.policies.time.interval = 1
        appender.json_rolling_slowlog.policies.time.modulate = true
        appender.json_rolling_slowlog.layout.type = JSONLayout
        appender.json_rolling_slowlog.layout.compact = true
        appender.json_rolling_slowlog.layout.eventEol = true
        appender.json_rolling_slowlog.policies.size.type = SizeBasedTriggeringPolicy
        appender.json_rolling_slowlog.policies.size.size = 100MB
        appender.json_rolling_slowlog.strategy.type = DefaultRolloverStrategy
        appender.json_rolling_slowlog.strategy.max = 30
    
        logger.slowlog.name = slowlog
        logger.slowlog.level = trace
        logger.slowlog.appenderRef.console_slowlog.ref = ${sys:ls.log.format}_console_slowlog
        logger.slowlog.appenderRef.rolling_slowlog.ref = ${sys:ls.log.format}_rolling_slowlog
        logger.slowlog.additivity = false
    
        logger.licensereader.name = logstash.licensechecker.licensereader
        logger.licensereader.level = error
    
        # Silence http-client by default
        logger.apache_http_client.name = org.apache.http
        logger.apache_http_client.level = fatal
    
        # Deprecation log
        appender.deprecation_rolling.type = RollingFile
        appender.deprecation_rolling.name = deprecation_plain_rolling
        appender.deprecation_rolling.fileName = ${sys:ls.logs}/logstash-deprecation.log
        appender.deprecation_rolling.filePattern = ${sys:ls.logs}/logstash-deprecation-%d{yyyy-MM-dd}-%i.log.gz
        appender.deprecation_rolling.policies.type = Policies
        appender.deprecation_rolling.policies.time.type = TimeBasedTriggeringPolicy
        appender.deprecation_rolling.policies.time.interval = 1
        appender.deprecation_rolling.policies.time.modulate = true
        appender.deprecation_rolling.layout.type = PatternLayout
        appender.deprecation_rolling.layout.pattern = [%d{ISO8601}][%-5p][%-25c]%notEmpty{[%X{pipeline.id}]}%notEmpty{[%X{plugin.id}]} %m%n
        appender.deprecation_rolling.policies.size.type = SizeBasedTriggeringPolicy
        appender.deprecation_rolling.policies.size.size = 100MB
        appender.deprecation_rolling.strategy.type = DefaultRolloverStrategy
        appender.deprecation_rolling.strategy.max = 30
    
        logger.deprecation.name = org.logstash.deprecation, deprecation
        logger.deprecation.level = WARN
        logger.deprecation.appenderRef.deprecation_rolling.ref = deprecation_plain_rolling
        logger.deprecation.additivity = false
    
        logger.deprecation_root.name = deprecation
        logger.deprecation_root.level = WARN
        logger.deprecation_root.appenderRef.deprecation_rolling.ref = deprecation_plain_rolling
        logger.deprecation_root.additivity = false
    
    • logstash-conf.yaml: adjusts the Logstash settings, disabling the default metrics collection and specifying the ES cluster UUID for monitoring.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-config
      namespace: elk
    data:
      logstash.conf: |
        api.enabled: true
        api.http.port: 9600
        xpack.monitoring.enabled: false
        monitoring.cluster_uuid: "ZUnqLCRqQL2jeo5FNvMI9g"
    
    • pod-pipeline.yaml: the pipeline rules for pod logs; events are read from Kafka, unnecessary fields are removed, and the result is written to the ES cluster.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-pod-pipeline
      namespace: elk
    data:
      pipeline.conf: |
        input {
            kafka {
                bootstrap_servers=>"my-cluster-kafka-brokers.kafka.svc:9092"
                auto_offset_reset => "latest"
                topics=>["pod_logs"]
                codec => "json"
                group_id => "pod"
            }
        }
        filter {
          mutate {
            remove_field => ["agent","event","ecs","host","[kubernetes][labels]","input","log","orchestrator","stream"]
          }
        }
        output{
          elasticsearch{
            hosts => ["https://elasticsearch-es-http.elk.svc:9200"]
            data_stream => "true"
            data_stream_type => "logs"
            data_stream_dataset => "pod"
            data_stream_namespace => "elk"
            user => "elastic"
            password => "2zg5q6AU7xW5jY649yuEpZ47"
            ssl_enabled => "true"
            ssl_verification_mode => "none"
          }
        }
    
    • pod-logstash.yaml: deploys Logstash with 2 replicas, mounting the pipeline, log4j2 and Logstash configuration files as well as the log directory.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: logstash-pod
      namespace: elk
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: logstash-pod
      template:
        metadata:
          labels:
            app: logstash-pod
            monitor: enable
        spec:
          securityContext:
            runAsUser: 0
          containers:
          - image: harbor.local.com/elk/logstash:v8.9.1
            name: logstash-pod
            resources:
              limits:
                cpu: "1"
                memory: 1Gi
            args:
            - -f
            - /usr/share/logstash/pipeline/pipeline.conf
            env:
            - name: XPACK_MONITORING_ENABLED
              value: "false"
            ports:
              - containerPort: 9600
            volumeMounts:
            - name: timezone
              mountPath: /etc/localtime
            - name: config
              mountPath: /usr/share/logstash/config/logstash.conf
              subPath: logstash.conf
            - name: log4j2
              mountPath: /usr/share/logstash/config/log4j2.properties
              subPath: log4j2.properties
            - name: pipeline
              mountPath: /usr/share/logstash/pipeline/pipeline.conf
              subPath: pipeline.conf
            - name: log
              mountPath: /usr/share/logstash/logs
          volumes:
          - name: timezone
            hostPath:
              path: /usr/share/zoneinfo/Asia/Shanghai
          - name: config
            configMap:
              name: logstash-config
          - name: log4j2
            configMap:
              name: logstash-log4j2
          - name: pipeline
            configMap:
              name: logstash-pod-pipeline
          - name: log
            hostPath:
              path: /var/log/logstash
              type: DirectoryOrCreate
    
    • logstash-svc.yaml: creates a Service that exposes the Logstash monitoring endpoint; applying all of the Logstash manifests together is sketched after this block.
    apiVersion: v1
    kind: Service
    metadata:
      name: logstash-monitor
      namespace: elk
    spec:
      selector:
        monitor: enable
      ports:
      - port: 9600
        targetPort: 9600
    
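    With the manifests prepared, they can be applied in one go (a sketch; the file names follow the manifest names listed above):

    kubectl apply -f logstash-log4j2.yaml -f logstash-conf.yaml -f pod-pipeline.yaml -f pod-logstash.yaml -f logstash-svc.yaml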

    Add Monitoring Metric Collection

    In the Fleet agent policy, add the Logstash integration and set the metrics endpoint to http://logstash-monitor.elk.svc:9600
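    Before wiring up Fleet, the monitoring endpoint can be spot-checked from inside the cluster; this is only a sketch, and the curlimages/curl image is an arbitrary choice:

    # the Logstash node API on port 9600 should return JSON if the Service selector matches
    kubectl run curl-test -n elk --rm -it --restart=Never --image=curlimages/curl -- \
      curl -s http://logstash-monitor.elk.svc:9600/_node/pipelines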

    Verification

    Check the pods: the 2 Logstash replicas are running normally.

    [root@tiaoban ~]# kubectl get pod -n elk | grep logstash
    logstash-pod-7bb6f6c8c6-ffc4b       1/1     Running   0       58s
    logstash-pod-7bb6f6c8c6-qv9kd       1/1     Running   0       58s
    

    Log in to Kibana and check the monitoring data: Filebeat and Logstash metrics and logs are being collected successfully.
    Check the data streams: a data stream named logs-pod-elk has been created.
    Inspect the data stream contents: the pod's node, namespace, container, log message and other fields are stored and parsed correctly.

    Custom Log Parsing

    Requirement Analysis

    By default, Filebeat collects the logs of every pod and automatically adds namespace, pod, container and other metadata, while the entire log line is stored in the log field.
    Take the logs of the log-demo application as an example: because the whole log line is kept in a single field, the log content cannot be filtered or analysed by specific conditions, so we configure Logstash parsing rules to extract custom fields from the log content.

    Resource Manifests

    • myapp-pipeline.yaml: reads from Kafka; when the [kubernetes][deployment][name] field equals log-demo the event is parsed further, otherwise it is dropped. For detailed Logstash configuration see the earlier article: https://www.cuiliangblog.cn/detail/article/63
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-myapp-pipeline
      namespace: elk
    data:
      pipeline.conf: |
        input {
            kafka {
                bootstrap_servers=>"my-cluster-kafka-brokers.kafka.svc:9092"
                auto_offset_reset => "latest"
                topics=>["pod_logs"]
                codec => "json"
                group_id => "myapp"
            }
        }
        filter {
          if [kubernetes][deployment][name] == "log-demo" {
            grok{
              # NOTE: the name of the capture group below was garbled in the original post; "origin" is a reconstructed placeholder
              match => {"message" => "%{TIMESTAMP_ISO8601:log_timestamp} \| %{LOGLEVEL:level} %{SPACE}* \| (?<origin>[__main__:[\w]*:\d*]+) \- %{GREEDYDATA:content}"}
            }
            mutate {
              gsub =>[
                  "content", "'", '"'
              ]
              lowercase => [ "level" ]
            }
            json {
              source => "content"
            }
            geoip {
              source => "remote_address"
              database => "/etc/logstash/GeoLite2-City.mmdb"
              ecs_compatibility => disabled
            }
            mutate {
              remove_field => ["agent","event","ecs","host","[kubernetes][labels]","input","log","orchestrator","stream","content"]
            }
          }
          else {
            drop{}
          }
        }
        output{
          elasticsearch{
            hosts => ["https://elasticsearch-es-http.elk.svc:9200"]
            data_stream => "true"
            data_stream_type => "logs"
            data_stream_dataset => "myapp"
            data_stream_namespace => "elk"
            user => "elastic"
            password => "2zg5q6AU7xW5jY649yuEpZ47"
            ssl_enabled => "true"
            ssl_verification_mode => "none"
          }
        }
    
    • myapp-logstash.yaml: identical to pod-logstash.yaml except that it mounts the logstash-myapp-pipeline ConfigMap; an apply sketch follows the block.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: logstash-myapp
      namespace: elk
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: logstash-myapp
      template:
        metadata:
          labels:
            app: logstash-myapp
            monitor: enable
        spec:
          securityContext:
            runAsUser: 0
          containers:
          - image: harbor.local.com/elk/logstash:v8.9.1
            name: logstash-myapp
            resources:
              limits:
                cpu: "1"
                memory: 1Gi
            args:
            - -f
            - /usr/share/logstash/pipeline/pipeline.conf
            env:
            - name: XPACK_MONITORING_ENABLED
              value: "false"
            ports:
              - containerPort: 9600
            volumeMounts:
            - name: timezone
              mountPath: /etc/localtime
            - name: config
              mountPath: /usr/share/logstash/config/logstash.conf
              subPath: logstash.conf
            - name: log4j2
              mountPath: /usr/share/logstash/config/log4j2.properties
              subPath: log4j2.properties
            - name: pipeline
              mountPath: /usr/share/logstash/pipeline/pipeline.conf
              subPath: pipeline.conf
            - name: log
              mountPath: /usr/share/logstash/logs
          volumes:
          - name: timezone
            hostPath:
              path: /usr/share/zoneinfo/Asia/Shanghai
          - name: config
            configMap:
              name: logstash-config
          - name: log4j2
            configMap:
              name: logstash-log4j2
          - name: pipeline
            configMap:
              name: logstash-myapp-pipeline
          - name: log
            hostPath:
              path: /var/log/logstash
              type: DirectoryOrCreate
    
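    Applying the two myapp manifests follows the same pattern as before (a sketch):

    kubectl apply -f myapp-pipeline.yaml -f myapp-logstash.yaml
    kubectl -n elk get pod | grep logstash-myapp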

    Verification

    Check the data streams: a data stream named logs-myapp-elk has been created.
    Inspect the data stream contents: the log fields are parsed into separate fields successfully.

    Notes and Caveats

    Kafka Partition Count

    Keep in mind that within a consumer group, each partition is consumed by at most one consumer: when the number of consumers in a group exceeds the number of partitions, only as many consumers as there are partitions actually consume, and the rest sit idle. To raise Logstash consumption throughput you can therefore increase the topic's partition count, but too many partitions in Kafka also lengthens recovery time when the cluster fails over.

    Logstash Replica Count

    Number of Logstash replicas = number of Kafka partitions / consumer threads per Logstash instance (the default is 1; for large data volumes the thread count can be increased, preferably to no more than 4).
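    For example, the pod_logs topic above has 2 partitions, so 2 single-threaded Logstash replicas are enough. If the partition count were raised to 8 (a hypothetical figure), you could run 8 single-threaded replicas, or 4 replicas with 2 consumer threads each by setting consumer_threads in the kafka input, as sketched below:

    input {
      kafka {
        bootstrap_servers => "my-cluster-kafka-brokers.kafka.svc:9092"
        topics            => ["pod_logs"]
        group_id          => "pod"
        codec             => "json"
        consumer_threads  => 2   # replicas x consumer_threads should not exceed the partition count
      }
    }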

    Complete Resource Manifests

    All YAML files used in this walkthrough have been uploaded to the git repositories below:

    github

    https://github.com/cuiliang0302/blog-demo

    gitee

    https://gitee.com/cuiliang0302/blog_demo

    References

    Deploying Strimzi with Helm: https://strimzi.io/docs/operators/latest/deploying#deploying-cluster-operator-helm-chart-str
    Filebeat hints-based autodiscover for Kubernetes logs: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-autodiscover-hints.html
    Running Filebeat on Kubernetes: https://www.elastic.co/guide/en/beats/filebeat/current/running-on-kubernetes.html
    Filebeat add_kubernetes_metadata processor: https://www.elastic.co/guide/en/beats/filebeat/current/add-kubernetes-metadata.html
    Filebeat drop_event processor: https://www.elastic.co/guide/en/beats/filebeat/current/drop-event.html

    See More

    WeChat Official Account

    The WeChat official account is updated in sync; follow "崔亮的博客" to get new articles first.

    Blog

    崔亮的博客 focuses on DevOps and automated operations and shares quality IT operations articles. For more original content on operations and development, visit https://www.cuiliangblog.cn
