1.1 Loki builds its index around log labels instead of full-text indexing like Elasticsearch
1.2 Multi-tenancy
Multi-tenancy is implemented via a tenant ID; if multi-tenancy is disabled, the single default tenant is fake
1.3 Deployment modes
1.1 Single-process mode
All components run in one process; suitable for test environments or smaller production environments
1.2 Scalable microservices mode
Each component runs separately and can be scaled horizontally; see the illustration below
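As a rough illustration (assuming the same single Loki binary used later in this post), the mode is selected with the -target flag: -target=all runs every component in one process, while naming a single component runs only that piece.
# single-process mode: all components in one process
./loki-linux-amd64 -target=all -config.file=loki-local-config.yaml
# microservices mode (sketch): one component per process, e.g. only the ingester
./loki-linux-amd64 -target=ingester -config.file=loki-local-config.yaml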
1. Distributor
Handles log writes from clients: it receives log data, splits it into batches, and sends them to the ingesters in parallel
The distributor communicates with the ingesters over gRPC
2. Hashing
The distributor uses consistent hashing together with a configurable replication factor to decide which ingester instances should receive a given log stream
The hash is computed from the log labels and the tenant ID
A hash ring stored in Consul is used to implement the consistent hashing: every ingester registers itself in Consul with the set of tokens it owns, and the distributor finds the token that most closely matches the hash of the log stream and sends the log data to that token's owner
3. Ingester
Responsible for writing log data to the persistent storage backend (e.g. S3, OSS)
The ingester requires the log lines of each stream to arrive in ascending timestamp order; out-of-order lines are rejected with an error
Logs from each unique set of labels are built up into "chunks" in memory and then flushed to the backing storage backend
If an ingester process crashes before the in-memory chunks have been flushed, that data is lost
4. Querier
Handles queries written in LogQL
It first queries all ingesters for in-memory data, then falls back to loading data from the backend store
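For example, a LogQL query selects streams by label matchers and then filters or aggregates the lines; the first query below returns lines containing "error", the second counts lines per host over 5 minutes (the service_name and hostname labels match the promtail configuration used later in this post):
{service_name="var-log-messages"} |= "error"
sum by (hostname) (count_over_time({service_name="var-log-messages"}[5m]))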
5. Chunk Store
The chunk store is Loki's long-term data store, designed to support interactive queries and sustained writes
It consists of:
1.1 An index for the chunks
1.2 A key-value store for the chunk data itself
Note: the chunk store is not a separate service; it is a library embedded in the services that need to access Loki data: the querier and the ingester
Write path:
1.1 The distributor receives log data, splits it into batches, and sends them to the ingesters in parallel
1.2 The ingester receives the batches from the distributor, buffers them in memory, and periodically flushes them to the persistent chunk store
Read path:
The ingester receives query requests from the querier and looks up the requested chunks via the chunk index; if they are not in memory, the data is fetched from the persistent chunk store and returned
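To make the write path concrete, any client (promtail in this deployment, or curl for a quick test) pushes streams to the push endpoint exposed by the distributor. A minimal hand-rolled push might look like this, assuming Loki is listening on the default port 3100 used later in this post (timestamps are in nanoseconds; the label set is only illustrative):
curl -s -H "Content-Type: application/json" -X POST "http://127.0.0.1:3100/loki/api/v1/push" \
  -d "{\"streams\": [{\"stream\": {\"job\": \"curl-test\"}, \"values\": [[\"$(date +%s%N)\", \"hello loki\"]]}]}"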
This deployment uses the single-process mode and reuses the deployment approach of "阿果阿郭", for study and review purposes only. Original article: k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩
The official documentation describes several deployment methods:
please refer to them and study as needed
The deployment steps are as follows:
Install supervisor
yum install epel-release -y
yum install supervisor -y
Adjust the memory, process, and open-file limits
sed -i '/forking/a LimitNOFILE=65536' /usr/lib/systemd/system/supervisord.service;
sed -i '/forking/a LimitNPROC=65536' /usr/lib/systemd/system/supervisord.service ;
sed -i '/forking/a LimitMEMLOCK=infinity' /usr/lib/systemd/system/supervisord.service
Start the service
systemctl start supervisord.service
Upload the loki-linux-amd64.zip archive to the /data/loki directory
Unpack the archive
unzip loki-linux-amd64.zip
Verify the version
./loki-linux-amd64 --version
Manage Loki with systemd
cat << EOF > /usr/lib/systemd/system/loki.service
[Unit]
Description=loki.service
After=rc-local.service nss-user-lookup.target
[Service]
Type=simple
LimitMEMLOCK=infinity
LimitNPROC=65536
LimitNOFILE=65536
WorkingDirectory=/data/loki
ExecStart=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml
[Install]
WantedBy=multi-user.target
EOF
Manage Loki with supervisord
cat << EOF > /etc/supervisord.d/loki.ini
[program:loki]
command=/data/loki/loki-linux-amd64 -log.level=info -target all -config.file=loki-local-config.yaml
autorestart=true
autostart=true
stderr_logfile=/tmp/loki_err.log
stdout_logfile=/tmp/loki_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/loki
EOF
Configure the Loki config file
cat << EOF > /data/loki/loki-local-config.yaml
auth_enabled: false   # whether to enable authentication; this concerns multi-tenancy, and we use a single tenant here
server:
  http_listen_port: 3100
  grpc_server_max_concurrent_streams: 0
ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 10485760
  chunk_retain_period: 30s
  max_transfer_retries: 0
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h
# storage configuration
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache   # index cache location
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks   # chunk storage directory
compactor:
  working_directory: /data/loki/boltdb-shipper-compactor   # compactor working directory
  shared_store: filesystem
limits_config:
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  ingestion_rate_mb: 200
  # ingestion_burst_size_mb: 400
  # max_streams_per_user: 0
  # max_chunks_per_query: 20000000
  # max_query_parallelism: 140
  # max_query_series: 5000
  # cardinality_limit: 1000000
  # max_streams_matchers_per_query: 10000
chunk_store_config:
  max_look_back_period: 0s
# data retention period
table_manager:
  retention_deletes_enabled: true
  retention_period: 24h
ruler:
  storage:
    type: local
    local:
      directory: /data/loki/rules
  rule_path: /data/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
EOF
Start Loki
supervisorctl status
supervisorctl update
supervisorctl status
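Once supervisorctl reports the loki program as RUNNING, Loki's readiness and metrics endpoints give a quick sanity check (assuming the default HTTP port 3100 from the config above):
curl 127.0.0.1:3100/ready
curl -s 127.0.0.1:3100/metrics | head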
Install promtail
mkdir /data/promtail/{bin,config,logs} -p
cd /data/promtail/bin
curl -O -L "https://github.com/grafana/loki/releases/download/v2.3.0/promtail-linux-amd64.zip"
unzip "promtail-linux-amd64.zip"
chmod a+x "promtail-linux-amd64"
Create the promtail configuration file
cat << EOF > /data/promtail/config/promtail.conf
server:   # promtail's own server settings
  http_listen_address: 0.0.0.0
  http_listen_port: 19080
  grpc_listen_port: 0
positions:
  filename: ./logs/loki_positions.yaml
  ignore_invalid_yaml: true
clients:   # address of the Loki server
  - url: http://127.0.0.1:3100/loki/api/v1/push
scrape_configs:
  - job_name: service_log
    file_sd_configs:   # logs to scrape, discovered via file-based service discovery
      - files:
          - ./config/*.yaml
        refresh_interval: 1m
EOF
Configure supervisor to manage promtail
cat << EOF > /etc/supervisord.d/promtail.ini
[program:promtail]
command=/data/promtail/bin/promtail-linux-amd64 -config.expand-env=true -config.file=/data/promtail/config/promtail.conf
autorestart=true
autostart=true
stderr_logfile=/tmp/promtail_err.log
stdout_logfile=/tmp/promtail_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/promtail/
EOF
Define the log collection config
cat << EOF > /data/promtail/config/varlogmessage.yaml
- targets:
    - localhost
  labels:
    __path__: /var/log/messages
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-messages
    log_type: var-log-messages
- targets:
    - localhost
  labels:
    __path__: /var/log/secure
    env: {{ENV}}
    hostname: {{BINDIP}}
    service_name: var-log-secure
    log_type: var-log-secure
EOF
Note: the {{ENV}} and {{BINDIP}} placeholders use Jinja2-style syntax and are substituted with sed below
ENV=test
BINDIP=192.168.161.118
sed -i "s/{{ENV}}/$ENV/g" /data/promtail/config/varlogmessage.yaml
sed -i "s/{{BINDIP}}/$BINDIP/g" /data/promtail/config/varlogmessage.yaml
Start promtail
supervisorctl status
supervisorctl update
supervisorctl status
Verify that Loki has collected logs
curl 127.0.0.1:3100/loki/api/v1/labels
curl 127.0.0.1:3100/loki/api/v1/label/service_name/values
curl 127.0.0.1:3100/loki/api/v1/label/filename/values
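Beyond listing labels, the log lines themselves can be pulled back over Loki's HTTP API; the query below assumes the var-log-messages stream defined in the promtail config above, and promtail's own /targets page (on the 19080 port configured earlier) shows which files are being tailed:
curl -G -s "http://127.0.0.1:3100/loki/api/v1/query_range" --data-urlencode 'query={service_name="var-log-messages"}' --data-urlencode 'limit=5'
curl 127.0.0.1:19080/targets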
1.1 Download the Grafana binary package
Download URL: wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.14.linux-amd64.tar.gz
For downloads inside China, a mirror is recommended:
https://mirrors.huaweicloud.com/grafana/8.5.9/grafana-enterprise-8.5.9.linux-amd64.tar.gz
tar xf grafana-enterprise-8.5.9.linux-amd64.tar.gz -C /data
cd /data/
mv grafana-8.5.9/ grafana
Configure supervisor to manage Grafana
cat << EOF > /etc/supervisord.d/grafana.ini
[program:grafana]
command=/data/grafana/bin/grafana-server web
autorestart=true
autostart=true
stderr_logfile=/tmp/grafana_err.log
stdout_logfile=/tmp/grafana_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/grafana
EOF
Start Grafana
supervisorctl status
supervisorctl update
supervisorctl status
Add the Loki data source
View the Loki data via Explore
Import a Grafana Loki dashboard to view the data
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"target": {
"limit": 100,
"matchAny": false,
"tags": [],
"type": "dashboard"
},
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 8,
"iteration": 1655978337467,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "${ENV}",
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 5,
"w": 24,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 4,
"legend": {
"alignAsTable": true,
"avg": false,
"current": false,
"max": false,
"min": false,
"rightSide": true,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "8.1.5",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname)",
"hide": true,
"legendFormat": "",
"queryType": "randomWalk",
"refId": "A"
},
{
"expr": "sum (count_over_time({service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"}[2m] )) by (hostname,filename)",
"hide": false,
"legendFormat": "{{hostname}}/{{filename}}",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "日志量统计",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"hashKey": "object:319", "format": "short", "label": null, "logBase": 1, "max": null, "min": null, "show": true }, { "hashKey": "object:320",
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"datasource": "${ENV}",
"description": "",
"gridPos": {
"h": 21,
"w": 24,
"x": 0,
"y": 5
},
"id": 2,
"options": {
"dedupStrategy": "exact",
"enableLogDetails": false,
"prettifyLogMessage": false,
"showCommonLabels": false,
"showLabels": false,
"showTime": true,
"sortOrder": "Descending",
"wrapLogMessage": true
},
"pluginVersion": "7.4.3",
"targets": [
{
"expr": "{service_name=~\"$app_name\",filename=~\"$log_type\",hostname=~\"$hostname\"} |~ \"(?i)$log_level\"",
"maxLines": 1000,
"queryType": "randomWalk",
"refId": "A"
}
],
"timeFrom": null,
"timeShift": null,
"title": "日志",
"transparent": true,
"type": "logs"
}
],
"refresh": false,
"schemaVersion": 30,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "crm-cd",
"value": "crm-cd"
},
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "选择环境",
"multi": false,
"name": "ENV",
"options": [],
"query": "loki",
"queryValue": "",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
},
{
"allValue": null,
"current": {
"selected": true,
"text": "neo-pharma-service",
"value": "neo-pharma-service"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=~\".+\"},service_name)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "服务名",
"multi": false,
"name": "app_name",
"options": [],
"query": "label_values({service_name=~\".+\"},service_name)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
},
{
"allValue": null,
"current": {
"selected": false,
"text": "/logs/gc.log",
"value": "/logs/gc.log"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=\"$app_name\"}, filename)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "日志名",
"multi": false,
"name": "log_type",
"options": [],
"query": "label_values({service_name=\"$app_name\"}, filename)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"allValue": ".*",
"current": {
"selected": true,
"text": "neo-pharma-service-7c87d876d5-js77h",
"value": "neo-pharma-service-7c87d876d5-js77h"
},
"datasource": "${ENV}",
"definition": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "主机名",
"multi": false,
"name": "hostname",
"options": [],
"query": "label_values({service_name=\"$app_name\",filename=\"$log_type\"}, hostname)",
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": "(^\\\\S|^\\\\s)",
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"description": "可以直接输入搜索的关键字进行过滤",
"error": null,
"hide": 0,
"includeAll": true,
"label": "关键字过滤",
"multi": false,
"name": "log_level",
"options": [
{
"selected": true,
"text": "All",
"value": "$__all"
},
{
"selected": false,
"text": "warning",
"value": "warning"
},
{
"selected": false,
"text": "unknown",
"value": "unknown"
},
{
"selected": false,
"text": "info",
"value": "info"
},
{
"selected": false,
"text": "error",
"value": "error"
},
{
"selected": false,
"text": "直接输入关键字搜索",
"value": "直接输入关键字搜索"
}
],
"query": "warning,unknown,info,error,直接输入关键字搜索",
"queryValue": "",
"skipUrlSync": false,
"type": "custom"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "日志中心",
"uid": "NlV_8QD7k",
"version": 21
}
Result screenshot
Install Alertmanager
cd /data
tar xf alertmanager-0.24.0.linux-amd64.tar.gz
mv alertmanager-0.24.0.linux-amd64 alertmanager
Configure supervisor to manage alertmanager
cat << EOF > /etc/supervisord.d/alertmanager.ini
[program:alertmanager]
command=/data/alertmanager/alertmanager
autorestart=true
autostart=true
stderr_logfile=/tmp/alertmanager_err.log
stdout_logfile=/tmp/alertmanager_out.log
user=root
stopsignal=INT
startsecs=10
startretries=3
directory=/data/alertmanager
EOF
Configure the alertmanager config file
cat << EOF > /data/alertmanager/alertmanager.yml
global:
  smtp_smarthost: 'smtp.qq.com:465'     # SMTP server address
  smtp_from: '4506259@qq.com'           # sender address
  smtp_auth_username: '4507259@qq.com'  # mailbox user
  smtp_auth_password: 'gbrqbrcace'      # mailbox password
  smtp_require_tls: false
templates:
  - '/usr/local/alertmanager/template/*.tmpl'
route:
  group_by: ["instance"]     # labels to group alerts by
  group_wait: 30s            # after an alert arrives, wait 30s to see whether more alerts come in so they can be sent together
  group_interval: 5m         # interval between notifications for a group
  repeat_interval: 3h        # interval before a still-firing alert is re-sent
  receiver: mail             # default receiver; required, and must match a receiver name below
receivers:
  - name: 'mail'             # receiver (alert group) name
    email_configs:
      - to: '187171160@163.com'   # recipient
        send_resolved: true       # also send a notification when the alert resolves
EOF
Configure the alerting rules (the ruler reads rules for the default fake tenant from /data/loki/rules/fake)
mkdir -p /data/loki/rules/fake
cat <<'EOF'> /data/loki/rules/fake/rules.yaml
groups:
  - name: service OutOfMemoryError
    rules:
      # keyword monitoring
      - alert: loki check words java.lang.OutOfMemoryError
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "java.lang.OutOfMemoryError" [5m]) > 0)
        labels:
          severity: critical
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has {{ $value }} error'
          summary: java.lang.OutOfMemoryError
      # performance alert from Java application logs
      - alert: loki java full gc count check
        expr: sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |= "Full GC (Allocation" [5m]) > 5)
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} {{$labels.filename}} {{ $value }}'
          summary: java full gc count check
      # example of alert matching with a regular expression
      - alert: dbperform slowlog sql 慢查询
        expr: 'sum by (env, hostname, log_type, filename) (count_over_time({env=~"\\w+"} |~ "time: [1-9]\\d{4,}" [5m]) > 5)'
        labels:
          severity: warning
        annotations:
          description: '{{$labels.env}} {{$labels.hostname}} file {{$labels.filename}} has {{ $value }} error'
          summary: sql slowlog
EOF
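After Loki picks up the rule file (restart it via supervisorctl, or wait for the ruler's poll interval), the ruler API exposed by enable_api: true in the config above can be used to confirm the rule group was loaded:
curl -s 127.0.0.1:3100/loki/api/v1/rules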
Test the alert
echo 'The String object java.lang.OutOfMemoryError is used to represent and manipulate a sequence of characters.' >> /var/log/messages
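If the rule fires, the alert should reach Alertmanager within a few evaluation cycles; it can be checked in the web UI on port 9093 or via the API:
curl -s http://localhost:9093/api/v2/alerts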
EFK:
1.1 Data in Elasticsearch is stored on disk as unstructured JSON objects. The keys of each object and the contents of each key are indexed.
Data can then be queried with a JSON-based query definition (the Query DSL) or with the Lucene query language.
1.2 EFK uses fluentd as its log collector
Loki:
1.1 Single-process mode stores log data on local disk, while the scalable microservices mode stores data in cloud storage. Logs are tagged with labels and only the labels are indexed, so there is less indexing and lower cost
1.2 Loki uses promtail as its log collector. Promtail discovers log files stored on disk, associates them with labels, and forwards them to Loki
Promtail can also run as a sidecar of a Pod to collect that Pod's logs, read logs from specified files, and tail the system journal; a minimal sidecar sketch is shown below
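A minimal sketch of the sidecar pattern, assuming Kubernetes and purely illustrative names (the app image, ConfigMap name, and paths are not part of this deployment): the application writes its log files to a shared emptyDir volume, and a promtail container in the same Pod tails that directory and pushes to Loki.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-promtail            # illustrative Pod name
spec:
  containers:
    - name: app
      image: my-app:latest           # hypothetical application image that writes logs to /logs
      volumeMounts:
        - name: app-logs
          mountPath: /logs
    - name: promtail                 # sidecar tailing the shared log directory
      image: grafana/promtail:2.3.0
      args: ["-config.file=/etc/promtail/promtail.yaml"]
      volumeMounts:
        - name: app-logs
          mountPath: /logs
        - name: promtail-config
          mountPath: /etc/promtail
  volumes:
    - name: app-logs
      emptyDir: {}
    - name: promtail-config
      configMap:
        name: promtail-config        # hypothetical ConfigMap holding promtail.yaml with a __path__ under /logs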
References:
k8s loki 容器日志解决方案-4. alertmanager 报警及loki rules - 哔哩哔哩