Using the Prometheus Monitoring Platform and the Grafana Visualization Platform
Check which ports are already listening, to avoid port conflicts
netstat -ntlp
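Open the port in the firewall (9100 is the node_exporter port used later; Prometheus on 9090 and Grafana on 3000 need the same rule if they are accessed from other hosts)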
firewall-cmd --zone=public --add-port=9100/tcp --permanent
firewall-cmd --reload
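The port check above can also be scripted. The following is a minimal sketch, not part of the original steps: `port_in_use` and `check_ports` are hypothetical helpers that read a listener table in the `netstat -ntl` or `ss -ntl` layout (local address in the fourth column) from standard input. The port list 9090, 9100, 3000 in the usage note assumes the default ports of Prometheus, node_exporter, and Grafana used in this article.

```shell
# port_in_use PORT
# Reads `netstat -ntl` or `ss -ntl` output on stdin; both print the local
# address in column 4. Succeeds (exit 0) if any local address ends in :PORT.
port_in_use() {
  awk -v port="$1" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# check_ports PORT...
# Reads one listener table on stdin and prints one status line per port.
check_ports() {
  local table port
  table=$(cat)
  for port in "$@"; do
    if printf '%s\n' "$table" | port_in_use "$port"; then
      echo "port $port: in use"
    else
      echo "port $port: free"
    fi
  done
}
```

A possible invocation is `netstat -ntl | check_ports 9090 9100 3000`. Matching on the `:PORT` suffix keeps 9090 from being confused with, say, 19090.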
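Download: https://prometheus.io/download/
GitHub: https://github.com/prometheus/prometheus
This article installs from the binary release: prometheus-2.37.0.linux-amd64.tar.gz
Installation is simple: extract the archive into the deployment directory and run the binary directly.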
tar -zxvf prometheus-2.37.0.linux-amd64.tar.gz -C /usr/local
ln -sv /usr/local/prometheus-2.37.0.linux-amd64 /usr/local/prometheus
cd /usr/local/prometheus
./prometheus # start the server
Open http://192.168.28.131:9090/ in a browser
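Before opening the browser, it can help to wait until the server reports that it is ready. Prometheus serves a `/-/ready` management endpoint that answers with HTTP 200 once it can serve traffic. The sketch below is an illustration only: `retry` is a hypothetical helper, and the 30 attempts and 1 second delay in the usage note are arbitrary assumptions.

```shell
# retry TRIES DELAY CMD [ARGS...]
# Runs CMD until it succeeds, at most TRIES times, sleeping DELAY seconds
# after each failed attempt. Returns 0 on the first success and 1 if every
# attempt fails.
retry() {
  local tries="$1" delay="$2" i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `retry 30 1 curl -fsS -o /dev/null http://192.168.28.131:9090/-/ready` returns as soon as the endpoint answers successfully. Once basic authentication is enabled, curl would also need `-u <user>`.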
The available startup flags can be listed with:
./prometheus -h
usage: prometheus [<flags>]
The Prometheus monitoring server
Flags:
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--version Show application version.
--config.file="prometheus.yml"
Prometheus configuration file path.
--web.listen-address="0.0.0.0:9090"
Address to listen on for UI, API, and
telemetry.
--web.config.file="" [EXPERIMENTAL] Path to configuration file that
can enable TLS or authentication.
--web.read-timeout=5m Maximum duration before timing out read of the
request, and closing idle connections.
--web.max-connections=512 Maximum number of simultaneous connections.
--web.external-url=<URL> The URL under which Prometheus is externally
reachable (for example, if Prometheus is served
via a reverse proxy). Used for generating
relative and absolute links back to Prometheus
itself. If the URL has a path portion, it will
be used to prefix all HTTP endpoints served by
Prometheus. If omitted, relevant URL components
will be derived automatically.
--web.route-prefix=<path> Prefix for the internal routes of web
endpoints. Defaults to path of
--web.external-url.
--web.user-assets=<path> Path to static asset directory, available at
/user.
--web.enable-lifecycle Enable shutdown and reload via HTTP request.
--web.enable-admin-api Enable API endpoints for admin control actions.
--web.enable-remote-write-receiver
Enable API endpoint accepting remote write
requests.
--web.console.templates="consoles"
Path to the console template directory,
available at /consoles.
--web.console.libraries="console_libraries"
Path to the console library directory.
--web.page-title="Prometheus Time Series Collection and Processing Server"
Document title of Prometheus instance.
--web.cors.origin=".*" Regex for CORS origin. It is fully anchored.
Example: 'https?://(domain1|domain2)\.com'
--storage.tsdb.path="data/"
Base path for metrics storage. Use with server
mode only.
--storage.tsdb.retention=STORAGE.TSDB.RETENTION
[DEPRECATED] How long to retain samples in
storage. This flag has been deprecated, use
"storage.tsdb.retention.time" instead. Use with
server mode only.
--storage.tsdb.retention.time=STORAGE.TSDB.RETENTION.TIME
How long to retain samples in storage. When
this flag is set it overrides
"storage.tsdb.retention". If neither this flag
nor "storage.tsdb.retention" nor
"storage.tsdb.retention.size" is set, the
retention time defaults to 15d. Units
Supported: y, w, d, h, m, s, ms. Use with
server mode only.
--storage.tsdb.retention.size=STORAGE.TSDB.RETENTION.SIZE
Maximum number of bytes that can be stored for
blocks. A unit is required, supported units: B,
KB, MB, GB, TB, PB, EB. Ex: "512MB". Based on
powers-of-2, so 1KB is 1024B. Use with server
mode only.
--storage.tsdb.no-lockfile
Do not create lockfile in data directory. Use
with server mode only.
--storage.tsdb.allow-overlapping-blocks
Allow overlapping blocks, which in turn enables
vertical compaction and vertical query merge.
Use with server mode only.
--storage.tsdb.head-chunks-write-queue-size=0
Size of the queue through which head chunks are
written to the disk to be m-mapped, 0 disables
the queue completely. Experimental. Use with
server mode only.
--storage.agent.path="data-agent/"
Base path for metrics storage. Use with agent
mode only.
--storage.agent.wal-compression
Compress the agent WAL. Use with agent mode
only.
--storage.agent.retention.min-time=STORAGE.AGENT.RETENTION.MIN-TIME
Minimum age samples may be before being
considered for deletion when the WAL is
truncated Use with agent mode only.
--storage.agent.retention.max-time=STORAGE.AGENT.RETENTION.MAX-TIME
Maximum age samples may be before being
forcibly deleted when the WAL is truncated Use
with agent mode only.
--storage.agent.no-lockfile
Do not create lockfile in data directory. Use
with agent mode only.
--storage.remote.flush-deadline=<duration>
How long to wait flushing sample on shutdown or
config reload.
--storage.remote.read-sample-limit=5e7
Maximum overall number of samples to return via
the remote read interface, in a single query. 0
means no limit. This limit is ignored for
streamed response types. Use with server mode
only.
--storage.remote.read-concurrent-limit=10
Maximum number of concurrent remote read calls.
0 means no limit. Use with server mode only.
--storage.remote.read-max-bytes-in-frame=1048576
Maximum number of bytes in a single frame for
streaming remote read response types before
marshalling. Note that client might have limit
on frame size as well. 1MB as recommended by
protobuf by default. Use with server mode only.
--rules.alert.for-outage-tolerance=1h
Max time to tolerate prometheus outage for
restoring "for" state of alert. Use with server
mode only.
--rules.alert.for-grace-period=10m
Minimum duration between alert and restored
"for" state. This is maintained only for alerts
with configured "for" time greater than grace
period. Use with server mode only.
--rules.alert.resend-delay=1m
Minimum amount of time to wait before resending
an alert to Alertmanager. Use with server mode
only.
--alertmanager.notification-queue-capacity=10000
The capacity of the queue for pending
Alertmanager notifications. Use with server
mode only.
--query.lookback-delta=5m The maximum lookback duration for retrieving
metrics during expression evaluations and
federation. Use with server mode only.
--query.timeout=2m Maximum time a query may take before being
aborted. Use with server mode only.
--query.max-concurrency=20
Maximum number of queries executed
concurrently. Use with server mode only.
--query.max-samples=50000000
Maximum number of samples a single query can
load into memory. Note that queries will fail
if they try to load more samples than this into
memory, so this also limits the number of
samples a query can return. Use with server
mode only.
--enable-feature= ... Comma separated feature names to enable. Valid
options: agent, exemplar-storage,
expand-external-labels,
memory-snapshot-on-shutdown,
promql-at-modifier, promql-negative-offset,
promql-per-step-stats, remote-write-receiver
(DEPRECATED), extra-scrape-metrics,
new-service-discovery-manager, auto-gomaxprocs.
See
https://prometheus.io/docs/prometheus/latest/feature_flags/
for more details.
--log.level=info Only log messages with the given severity or
above. One of: [debug, info, warn, error]
--log.format=logfmt Output format of log messages. One of: [logfmt,
json]
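By default, the Prometheus Web UI and the related exporters can be accessed by anyone, without authentication.
Generate an account and a password hash
# Install httpd-tools
[root]# yum install -y httpd-tools
# Generate the bcrypt hash with htpasswd from httpd-tools
[root]# htpasswd -nbBC 12 penngo 123456
penngo:$2y$12$HBw06HgxQlm3z6I85OPH.eNqeUCbqP.w7xFnb0ch60RcK9p3ZFLea # bcrypt hash of the password 123456, used in config.yml
Configure the user file
vi /usr/local/prometheus-2.37.0.linux-amd64/config.yml
# Contents of config.yml
basic_auth_users:
  # Multiple users can be configured, one "name: hash" entry per user.
  # The value is a hash printed by htpasswd; the salt is random, so every run prints a different string.
  penngo: $2y$12$PoUH4HDg3hxWqqcrfWDUB.f52O/oW0J6wRP5/Epwf5k2qd0XNhFVe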
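Typing the file by hand is error-prone because the bcrypt hash is full of `$` characters that a shell would try to expand. As a rough sketch, the file can be generated instead; `write_web_config` is a hypothetical helper introduced only for illustration, and it handles a single user.

```shell
# write_web_config FILE USER HASH
# Writes a web config containing a single basic-auth user. HASH is the bcrypt
# string printed by htpasswd; it is passed through printf's %s so that its
# `$` characters are never expanded by the shell.
write_web_config() {
  printf 'basic_auth_users:\n  %s: %s\n' "$2" "$3" > "$1"
}
```

A call would look like `write_web_config /usr/local/prometheus/config.yml penngo '<hash printed by htpasswd>'`, with the hash in single quotes. The `promtool` binary shipped in the same tarball can then validate the result with `promtool check web-config /usr/local/prometheus/config.yml`.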
Add the flag --web.config.file=/usr/local/prometheus/config.yml to the startup parameters so that the Prometheus Web UI can only be accessed after logging in.
vi /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus server daemon
After=network.target
[Service]
Type=simple
User=root
Group=root
# With --web.enable-lifecycle, the configuration can be reloaded with:
#   curl -X POST -u penngo http://127.0.0.1:9090/-/reload
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --web.config.file=/usr/local/prometheus/config.yml \
  --web.enable-lifecycle \
  --storage.tsdb.path=/usr/local/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --web.console.templates=/usr/local/prometheus/consoles \
  --web.console.libraries=/usr/local/prometheus/console_libraries \
  --web.max-connections=512 \
  --web.external-url=http://192.168.28.131:9090 \
  --web.listen-address=0.0.0.0:9090
Restart=on-failure
[Install]
WantedBy=multi-user.target
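With `--web.enable-lifecycle` set, Prometheus reloads its configuration when it receives an HTTP POST on `/-/reload`, so a restart is not needed after editing prometheus.yml. The sketch below validates the file first with `promtool check config` (promtool ships in the Prometheus tarball) and only then triggers the reload. `reload_prometheus` and its argument order are assumptions made for this example.

```shell
# reload_prometheus PROMTOOL CONFIG BASE_URL [CURL_ARGS...]
# Validates CONFIG with `promtool check config`; only if that succeeds, sends
# the POST request that the lifecycle API expects on /-/reload.
# Extra arguments (for example: -u penngo) are passed through to curl.
reload_prometheus() {
  local promtool="$1" config="$2" base_url="$3"
  shift 3
  "$promtool" check config "$config" || return 1
  curl -fsS -X POST "$@" "$base_url/-/reload"
}
```

With the paths used in this article, a call could be `reload_prometheus /usr/local/prometheus/promtool /usr/local/prometheus/prometheus.yml http://127.0.0.1:9090 -u penngo`; curl prompts for the password when only a user name follows `-u`.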
prometheus service commands
systemctl daemon-reload # tell systemd to reload its unit files
systemctl enable prometheus # start on boot
systemctl disable prometheus # do not start on boot
systemctl start prometheus # start the service
systemctl restart prometheus # restart the service
systemctl stop prometheus # stop the service
systemctl status prometheus # show the status
Download: https://grafana.com/grafana/download
Installation on Linux
Ubuntu and Debian(64 Bit)
sudo apt-get install -y adduser libfontconfig1
wget https://dl.grafana.com/enterprise/release/grafana-enterprise_8.5.11_amd64.deb
sudo dpkg -i grafana-enterprise_8.5.11_amd64.deb
Read the Ubuntu / Debian installation guide for more information. We also provide an APT package repository.
Standalone Linux Binaries(64 Bit)
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-8.5.11.linux-amd64.tar.gz
Red Hat, CentOS, RHEL, and Fedora(64 Bit)
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11-1.x86_64.rpm
sudo yum install grafana-enterprise-8.5.11-1.x86_64.rpm
OpenSUSE and SUSE
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11-1.x86_64.rpm
sudo rpm -i --nodeps grafana-enterprise-8.5.11-1.x86_64.rpm
This article uses the RPM package, since the host is managed with yum:
wget https://dl.grafana.com/enterprise/release/grafana-enterprise-8.5.11-1.x86_64.rpm
sudo yum install grafana-enterprise-8.5.11-1.x86_64.rpm
systemctl daemon-reload # tell systemd to reload its unit files
systemctl enable grafana-server # start on boot
systemctl disable grafana-server # do not start on boot
systemctl start grafana-server # start the service
systemctl stop grafana-server # stop the service
systemctl status grafana-server # show the status
ps -ef | grep grafana # check that the process is running
Binary location: /usr/sbin/grafana-server
Init script: /etc/init.d/grafana-server
Default environment file: /etc/sysconfig/grafana-server
Default configuration file: /etc/grafana/grafana.ini
systemd service name: grafana-server.service
Default log file: /var/log/grafana/grafana.log
Default sqlite3 database file: /var/lib/grafana/grafana.db
Open http://127.0.0.1:3000 and log in with the default username/password: admin/admin.
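Prometheus offers two ways to integrate a monitored system:
Client library integration: https://prometheus.io/docs/instrumenting/clientlibs/
Client libraries for different languages make it easy to instrument an application and expose its metrics to Prometheus.
Exporter integration: https://prometheus.io/docs/instrumenting/exporters/
Exporters are ready-made monitoring components for databases, hardware, message queues, storage, HTTP services, API services, logs, and more.
Both kinds of integration are available as official and as community-maintained projects.
Download: https://prometheus.io/download/
GitHub: https://github.com/prometheus/node_exporter
This article uses: node_exporter-1.3.1.linux-amd64.tar.gz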
# Extract to the target directory
tar -zxvf node_exporter-1.3.1.linux-amd64.tar.gz -C /usr/local
# Start
/usr/local/node_exporter-1.3.1.linux-amd64/node_exporter --web.listen-address=":9100"
Create a system service
vi /usr/lib/systemd/system/node_exporter.service
# Contents of node_exporter.service
[Unit]
Description=node_exporter
Wants=network-online.target
After=network-online.target
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/local/node_exporter-1.3.1.linux-amd64/node_exporter
[Install]
WantedBy=multi-user.target
node_exporter service commands
systemctl daemon-reload # tell systemd to reload its unit files
systemctl enable node_exporter # start on boot
systemctl disable node_exporter # do not start on boot
systemctl start node_exporter # start the service
systemctl stop node_exporter # stop the service
systemctl status node_exporter # show the status
View the exported metrics locally
curl http://127.0.0.1:9100/metrics
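The `/metrics` endpoint returns several hundred lines of plain text, one sample per line in the form `name value`, with `# HELP` and `# TYPE` comment lines in between. To look at a single value, the output can be filtered. This is a small sketch; `metric_value` is a hypothetical helper, and `node_load1` (the 1-minute load average) is used as the example metric in the usage note.

```shell
# metric_value NAME
# Reads Prometheus text exposition format on stdin and prints the value of the
# first sample whose name is exactly NAME. Only label-free samples such as
# `node_load1 0.42` match, because a labelled sample's first field includes
# its `{...}` label set.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For instance, `curl -s http://127.0.0.1:9100/metrics | metric_value node_load1` prints just the current load value. The exact comparison on the first field means `node_load1` does not also match `node_load15`.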
Edit prometheus.yml
vi /usr/local/prometheus-2.37.0.linux-amd64/prometheus.yml
prometheus.yml configuration
# Global configuration
global:
  scrape_interval: 15s # Scrape targets every 15 seconds. The default is 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is 1 minute.
  # scrape_timeout is left at the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # The Web UI is protected by basic auth, so Prometheus needs credentials to scrape itself.
    basic_auth:
      username: penngo
      password: "123456"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  # Add the following job to integrate node_exporter with Prometheus
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.28.136:9100']
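After the configuration is loaded, one way to confirm that the targets are being scraped is to evaluate the built-in `up` metric through the HTTP API endpoint `/api/v1/query`. The following sketch wraps that call; `prom_query` is a hypothetical helper, and the base URL and user in the usage note are taken from this article's setup.

```shell
# prom_query BASE_URL EXPR [CURL_ARGS...]
# Sends EXPR to the instant-query endpoint /api/v1/query. --data-urlencode
# together with -G puts the URL-encoded expression into the query string.
# Extra arguments (for example: -u penngo) are passed through to curl.
prom_query() {
  local base_url="$1" expr="$2"
  shift 2
  curl -fsS -G "$@" --data-urlencode "query=$expr" "$base_url/api/v1/query"
}
```

Running `prom_query http://192.168.28.131:9090 up -u penngo` returns a JSON document with one result per target; a value of "1" means the last scrape succeeded and "0" means it failed.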
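GitHub: https://github.com/prometheus-community/windows_exporter
windows_exporter is not covered in this article; if you need it, see the documentation in the windows_exporter repository.
Use this host monitoring dashboard template: https://grafana.com/grafana/dashboards/16098-1-node-exporter-for-prometheus-dashboard-cn-0417-job/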
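A dashboard template can only show data once Grafana has a Prometheus data source. Besides adding it in the Grafana UI, Grafana can load data sources from provisioning files at startup. The sketch below writes such a file; it is an assumption-laden illustration rather than a step from the original article. `write_datasource` is a hypothetical helper, the data source name "Prometheus" is arbitrary, and the password is stored in plain text in the file, so the file permissions deserve attention.

```shell
# write_datasource FILE URL USER PASSWORD
# Writes a Grafana data source provisioning file for a Prometheus server that
# is protected by basic auth. PASSWORD is placed inside double quotes, so it
# must not itself contain double quotes or backslashes.
write_datasource() {
  cat > "$1" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
    basicAuth: true
    basicAuthUser: $3
    secureJsonData:
      basicAuthPassword: "$4"
EOF
}
```

On a package install, the file would typically go under /etc/grafana/provisioning/datasources/, for example `write_datasource /etc/grafana/provisioning/datasources/prometheus.yaml http://192.168.28.131:9090 penngo 123456`, followed by `systemctl restart grafana-server`.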