node_exporter-1.4.0.linux-amd64.tar.gz
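Install node_exporter on every machine you want to monitor. If there are several servers, install and start one node_exporter on each of them.
Start node_exporter: [root@centos7 node_exporter-1.4.0.linux-amd64]# nohup ./node_exporter &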
[root@centos7 node_exporter-1.4.0.linux-amd64]# nohup node_exporter &
[1] 30117
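[root@centos7 node_exporter-1.4.0.linux-amd64]# nohup: ignoring input and appending output to 'nohup.out'
nohup: failed to run command 'node_exporter': No such file or directory
failed to run command 'node_exporter': No such file or directory: the shell could not find node_exporter because the current directory is not on PATH. Start it with the ./ prefix (nohup ./node_exporter &) as in the first command. If you suspect an instance is already running, check first with ps -ef | grep node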
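The start step above can be wrapped in a small guard. This is only a minimal sketch and not part of the original setup: start_once is a hypothetical helper name, and it assumes bash, pgrep, and that the current directory is the extracted node_exporter directory. It starts the binary through ./ (so PATH does not matter) and only when no process with that name is running.

```shell
#!/usr/bin/env bash
# start_once NAME: hypothetical helper (not shipped with node_exporter).
# Starts ./NAME in the background unless a process named NAME already exists.
start_once() {
  local name="$1"
  # pgrep -x matches the exact process name
  if pgrep -x "$name" >/dev/null 2>&1; then
    echo "$name is already running"
    return 0
  fi
  # the binary must be in the current directory and executable
  if [ ! -x "./$name" ]; then
    echo "./$name not found or not executable in $(pwd)" >&2
    return 1
  fi
  # the ./ prefix is what the failing "nohup node_exporter &" was missing
  nohup "./$name" >"$name.log" 2>&1 &
  echo "started $name with PID $!"
}
```

From inside node_exporter-1.4.0.linux-amd64, run start_once node_exporter. Afterwards curl -s http://localhost:9100/metrics | head should print metrics, assuming the exporter listens on port 9100 as in the targets used in these notes.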
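prometheus-2.40.3.linux-amd64.tar.gz: after extracting it, edit prometheus.yml. Pay attention to the indentation of job_name relative to static_configs while editing. If two static_configs keys end up under the same job, Prometheus refuses to load the file with an error like this: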
yaml: unmarshal errors:\n line 30: field static_configs already set in type config.ScrapeConfig\n line 33: field static_configs already set in type config.ScrapeConfig
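The modified prometheus.yml is shown below. It lists the addresses of the servers to scrape. For several servers, put them in one targets list: - targets: ["192.168.1.156:9100", "192.168.1.157:9200"]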
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.156:9100"]
  - job_name: "jvm"
    static_configs:
      - targets: ["192.168.1.156:3388"]
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.156:9100"]
  - job_name: "jvm"
    static_configs:
      - targets: ["192.168.1.156:3388"]
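Indentation mistakes in this file can be caught before the server is started. The sketch below assumes that promtool, which ships in the same tarball as the prometheus binary, is in the current directory. check_prom_config is a hypothetical wrapper name, not something Prometheus provides.

```shell
#!/usr/bin/env bash
# check_prom_config [FILE]: hypothetical wrapper around "promtool check config".
# FILE defaults to prometheus.yml in the current directory.
check_prom_config() {
  local cfg="${1:-prometheus.yml}"
  if [ ! -f "$cfg" ]; then
    echo "config file not found: $cfg" >&2
    return 2
  fi
  if [ ! -x ./promtool ]; then
    echo "promtool not found in $(pwd)" >&2
    return 3
  fi
  # promtool exits non-zero when the file has syntax or schema errors
  ./promtool check config "$cfg"
}
```

Run check_prom_config from the prometheus-2.40.3.linux-amd64 directory. A non-zero exit status means the file should be fixed before starting Prometheus.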
After editing prometheus.yml, start Prometheus. If it starts without errors, the configuration is fine. To run it in the background, use nohup ./prometheus &
- [root@centos7 prometheus-2.40.3.linux-amd64]# vim ./prometheus.yml
- [root@centos7 prometheus-2.40.3.linux-amd64]# ./prometheus
- ts=2022-11-27T18:14:06.505Z caller=main.go:512 level=info msg="No time or size retention was set so using the default time retention" duration=15d
- ts=2022-11-27T18:14:06.505Z caller=main.go:556 level=info msg="Starting Prometheus Server" mode=server version="(version=2.40.3, branch=HEAD, revision=84e95d8cbc51b89f1a69b25dd239cae2a44cb6c1)"
- ts=2022-11-27T18:14:06.505Z caller=main.go:561 level=info build_context="(go=go1.19.3, user=root@72aff466572b, date=20221124-09:08:44)"
- ts=2022-11-27T18:14:06.505Z caller=main.go:562 level=info host_details="(Linux 3.10.0-1160.80.1.el7.x86_64 #1 SMP Tue Nov 8 15:48:59 UTC 2022 x86_64 centos7 (none))"
- ts=2022-11-27T18:14:06.505Z caller=main.go:563 level=info fd_limits="(soft=4096, hard=4096)"
- ts=2022-11-27T18:14:06.505Z caller=main.go:564 level=info vm_limits="(soft=unlimited, hard=unlimited)"
- ts=2022-11-27T18:14:06.506Z caller=web.go:559 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
- ts=2022-11-27T18:14:06.507Z caller=main.go:993 level=info msg="Starting TSDB ..."
- ts=2022-11-27T18:14:06.509Z caller=head.go:562 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
- ts=2022-11-27T18:14:06.509Z caller=head.go:606 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=4.645µs
- ts=2022-11-27T18:14:06.509Z caller=head.go:612 level=info component=tsdb msg="Replaying WAL, this may take a while"
- ts=2022-11-27T18:14:06.512Z caller=tls_config.go:232 level=info component=web msg="Listening on" address=[::]:9090
- ts=2022-11-27T18:14:06.512Z caller=tls_config.go:235 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
- ts=2022-11-27T18:14:06.513Z caller=head.go:683 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=1
- ts=2022-11-27T18:14:06.514Z caller=head.go:683 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=1
- ts=2022-11-27T18:14:06.514Z caller=head.go:720 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=14.183µs wal_replay_duration=4.687682ms wbl_replay_duration=121ns total_replay_duration=4.719326ms
- ts=2022-11-27T18:14:06.515Z caller=main.go:1014 level=info fs_type=XFS_SUPER_MAGIC
- ts=2022-11-27T18:14:06.515Z caller=main.go:1017 level=info msg="TSDB started"
- ts=2022-11-27T18:14:06.515Z caller=main.go:1197 level=info msg="Loading configuration file" filename=prometheus.yml
- ts=2022-11-27T18:14:06.519Z caller=main.go:1234 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=3.68688ms db_storage=2.02µs remote_storage=2.83µs web_handler=189ns query_engine=433ns scrape=3.42652ms scrape_sd=33.153µs notify=16.474µs notify_sd=4.906µs rules=2.199µs tracing=10.114µs
- ts=2022-11-27T18:14:06.519Z caller=main.go:978 level=info msg="Server is ready to receive web requests."
- ts=2022-11-27T18:14:06.519Z caller=manager.go:944 level=info component="rule manager" msg="Starting rule manager..."
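When Prometheus runs in the background, the "Server is ready to receive web requests." line is no longer visible on the terminal. As a rough sketch, readiness can be polled over HTTP instead. wait_ready is a hypothetical helper name. It assumes curl is installed and uses the /-/ready endpoint on port 9090, the address shown in the log above.

```shell
#!/usr/bin/env bash
# wait_ready [URL] [TRIES]: hypothetical helper that polls a URL once per second
# until it answers with a success status or the attempts run out.
wait_ready() {
  local url="${1:-http://localhost:9090/-/ready}"
  local tries="${2:-10}"
  local i=1
  while [ "$i" -le "$tries" ]; do
    # -s: silent, -f: treat HTTP errors as failure, -o /dev/null: discard body
    if curl -sf -o /dev/null "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not ready after $tries attempt(s): $url" >&2
  return 1
}
```

For example, nohup ./prometheus & followed by wait_ready reports whether the server came up within about ten seconds.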
Then restart Grafana:
[root@centos7 prometheus-2.40.3.linux-amd64]# systemctl restart grafana-server
In Grafana, select Prometheus as the data source and fill in the server address.


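Import a dashboard template: 11074

Data source: Prometheus

Result on the Grafana monitoring dashboard

top: the java process shows a high %CPU, and us (user time) is higher than sy (system time), so the increased system load is caused by user-space CPU usage in Java.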

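Download arthas-boot.jar and start it with java -jar:
Download arthas-boot.jar into a folder of your choice and run java -jar arthas-boot.jar. It lists the Java processes currently running on the system, each with a number in front. Enter the number of the process you want to inspect.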
- curl -O https://arthas.aliyun.com/arthas-boot.jar
- java -jar arthas-boot.jar
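If the PID of the Java process is already known (for example from top, or from jps -l when a JDK is installed), the interactive selection can be skipped by passing the PID to arthas-boot. The following is a sketch under that assumption. attach_arthas is a hypothetical helper name, and arthas-boot.jar is expected in the current directory.

```shell
#!/usr/bin/env bash
# attach_arthas PID: hypothetical helper that attaches Arthas to a known PID.
attach_arthas() {
  local pid="$1"
  # kill -0 sends no signal; it only tests that the process exists and is accessible
  if [ -z "$pid" ] || ! kill -0 "$pid" 2>/dev/null; then
    echo "no running process with PID '$pid'" >&2
    return 1
  fi
  if [ ! -f arthas-boot.jar ]; then
    echo "arthas-boot.jar not found in $(pwd)" >&2
    return 2
  fi
  # arthas-boot accepts the target PID as a positional argument
  java -jar arthas-boot.jar "$pid"
}
```

Calling attach_arthas with the PID shown by top opens the same Arthas console as the numbered menu.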
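Once started, you are dropped into the Arthas console. Type help and press Enter to see the list of commands.
heapdump: dumps the heap
thread --help: shows detailed usage of the thread command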

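thread: Display thread info, thread stack
Threads are what actually do the work inside a program.
The thread stack shows what each working thread is doing right now.
thread: lists all threads
thread -n 5: shows the 5 busiest threads together with their stack traces. Here it pointed to a problem in UserController.java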