• Troubleshooting a Pacemaker HA cluster that fails to start the Oracle listener


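    Project background:

    I previously built an HA cluster (Pacemaker) on Oracle Linux 7.9 for a customer. During recent testing I found a bug that appears after a cluster failover.
    Because the software is installed entirely on shared storage, the listener only needs to be configured with the VIP.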


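    Problem ① (hostnames anonymized)

    After the failover, the listener and the database instance were in the Stopped state on node 2. The cluster's hint was fairly clear: listener_orac_start_0 on cxl-pcs02 'unknown error'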

    [root@cxl-pcs01 ~]# pcs status
    Cluster name: cluster01
    Stack: corosync
    Current DC: cxl-pcs01 (version 1.1.23-1.0.1.el7-9acf116022) - partition with quorum
    Last updated: Wed Sep 14 19:03:29 2022
    Last change: Wed Sep 14 18:40:13 2022 by root via cibadmin on cxl-pcs01
    
    2 nodes configured
    8 resource instances configured
    
    Online: [ cxl-pcs01 cxl-pcs02 ]
    
    Full list of resources:
    
     Resource Group: oracle
         clustervip01	(ocf::heartbeat:IPaddr2):	Started cxl-pcs02
         vg01	(ocf::heartbeat:LVM):	Started cxl-pcs02
         vg02	(ocf::heartbeat:LVM):	Started cxl-pcs02
         data1	(ocf::heartbeat:Filesystem):	Started cxl-pcs02
         data2	(ocf::heartbeat:Filesystem):	Started cxl-pcs02
         listener_orac	(ocf::heartbeat:oralsnr):	Stopped
         orac	(ocf::heartbeat:oracle):	Stopped
     sbd_fencing	(stonith:fence_sbd):	Started cxl-pcs01
    
    Failed Resource Actions:
    * listener_orac_start_0 on cxl-pcs01 'unknown error' (1): call=48, status=complete, exitreason='Listener listener appears to have started, but is not running properly: ',
        last-rc-change='Wed Sep 14 19:01:46 2022', queued=0ms, exec=60196ms
    * listener_orac_start_0 on cxl-pcs02 'unknown error' (1): call=44, status=complete, exitreason='Listener listener appears to have started, but is not running properly: ',
        last-rc-change='Wed Sep 14 19:00:38 2022', queued=0ms, exec=61821ms
    
    Daemon Status:
      corosync: active/enabled
      pacemaker: active/enabled
      pcsd: active/enabled
      sbd: active/enabled
    

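    With that hint I went to check the listener.ora configuration and did find a problem: for an earlier host test I had added a line with the local host's IP address.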

    [oracle@cxl-pcs01 admin]$ cat listener.ora
    # listener.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
    # Generated by Oracle configuration tools.
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
          # Local host IP:
          (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.173)(PORT = 1521))
          # Virtual IP:
          (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521))
        )
      )
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (GLOBAL_DBNAME = orac)
          (ORACLE_HOME = /data1/app/oracle/product/11.2.0/db_1)
          (SID_NAME = orac)
        )
      )
    
    
    ADR_BASE_LISTENER = /data1/app/oracle
    

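    Problem ② (hostnames anonymized)

    After I removed the local host IP of cxl-pcs01 from the listener, the problem persisted: the VIP turned out to be on a different subnet from the physical IP. I had not considered this before, so I changed the virtual IP to an address on the same subnet.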

    [root@cxl-pcs01 ~]# pcs resource update clustervip01 ip=192.168.84.167 cidr_netmask=24 op monitor interval=30s
    
    [oracle@cxl-pcs01 admin]$ cat listener.ora
    # listener.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
    # Generated by Oracle configuration tools.
    LISTENER =
      (DESCRIPTION_LIST =
        (DESCRIPTION =
          (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
          # Virtual IP:
          (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.167)(PORT = 1521))
        )
      )
    SID_LIST_LISTENER =
      (SID_LIST =
        (SID_DESC =
          (GLOBAL_DBNAME = orac)
          (ORACLE_HOME = /data1/app/oracle/product/11.2.0/db_1)
          (SID_NAME = orac)
        )
      )
    
    
    ADR_BASE_LISTENER = /data1/app/oracle
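
Whether a candidate VIP falls in the same network as a node address can be checked before touching the cluster. The following is a minimal sketch in POSIX shell. The helper names `ip_to_int` and `same_subnet` are made up for illustration, and treating the node address 192.168.84.173 as a /24 is an assumption based on the cidr_netmask=24 used above.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Exit 0 when both addresses share the same network for the given prefix length.
same_subnet() {
    prefix=$3
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Compare the old and the new VIP against the node address (assumed /24).
for vip in 192.168.125.231 192.168.84.167; do
    if same_subnet 192.168.84.173 "$vip" 24; then
        echo "$vip: same subnet as 192.168.84.173/24"
    else
        echo "$vip: different subnet from 192.168.84.173/24"
    fi
done
```

The loop reports the old VIP 192.168.125.231 as being on a different subnet and the new VIP 192.168.84.167 as being on the same one.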
    

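    Problem ③ (hostnames anonymized)

    It still did not work, which made no sense! I then deleted and recreated the resources and changed the resource dependencies and start order, but that turned out to be unrelated.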

    [root@cxl-pcs01 ~]# pcs resource delete orac
    [root@cxl-pcs01 ~]# pcs resource delete listener_orac
    [root@cxl-pcs01 ~]# pcs resource create orac oracle sid="orac" --group=oracle
    Assumed agent name 'ocf:heartbeat:oracle' (deduced from 'oracle')
    [root@cxl-pcs01 ~]# pcs constraint colocation add orac with listener_orac
    [root@cxl-pcs01 ~]# pcs constraint order start listener_orac then start orac
    Adding listener_orac orac (kind: Mandatory) (Options: first-action=start then-action=start)
    
    [root@cxl-pcs01 ~]# pcs constraint show --full
    Location Constraints:
      Resource: clustervip01
        Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-clustervip01)
      Resource: vg01
        Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-vg01)
      Resource: vg02
        Enabled on: cxl-pcs01 (score:INFINITY) (role: Started) (id:cli-prefer-vg02)
    Ordering Constraints:
      start clustervip01 then start vg01 (kind:Mandatory) (id:order-clustervip01-vg01-mandatory)
      start clustervip01 then start vg02 (kind:Mandatory) (id:order-clustervip01-vg02-mandatory)
      start vg01 then start data1 (kind:Mandatory) (id:order-vg01-data1-mandatory)
      start vg02 then start data2 (kind:Mandatory) (id:order-vg02-data2-mandatory)
      start data1 then start orac (kind:Mandatory) (id:order-data1-listener_orac-mandatory)
      start orac then start listener_orac (kind:Mandatory) (id:order-listener_orac-orac-mandatory)
    Colocation Constraints:
      vg01 with clustervip01 (score:INFINITY) (id:colocation-vg01-clustervip01-INFINITY)
      vg02 with clustervip01 (score:INFINITY) (id:colocation-vg02-clustervip01-INFINITY)
      data1 with vg01 (score:INFINITY) (id:colocation-data1-vg01-INFINITY)
      data2 with vg02 (score:INFINITY) (id:colocation-data2-vg02-INFINITY)
      orac with data1 (score:INFINITY) (id:colocation-listener_orac-data1-INFINITY)
      listener_orac with orac (score:INFINITY) (id:colocation-orac-listener_orac-INFINITY)
    Ticket Constraints:
    

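    After changing the resource order I could see that only the listener was really at fault, and that the cluster itself was stopping it. Starting the listener resource manually in debug mode revealed the following: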
    > stderr: Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orac)))

    [root@cxl-pcs01 ~]# pcs resource debug-start listener_orac
    Operation start for listener_orac (ocf:heartbeat:oralsnr) returned: 'unknown error' (1)
     >  stderr: ocf-exit-reason:tnsping orac failed: 
     >  stderr: TNS Ping Utility for Linux: Version 11.2.0.4.0 - Production on 14-SEP-2022 18:34:36
     >  stderr: 
     >  stderr: Copyright (c) 1997, 2013, Oracle.  All rights reserved.
     >  stderr: 
     >  stderr: Used parameter files:
     >  stderr: /data1/app/oracle/product/11.2.0/db_1/network/admin/sqlnet.ora
     >  stderr: 
     >  stderr: 
     >  stderr: Used TNSNAMES adapter to resolve the alias
     >  stderr: Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.125.231)(PORT = 1521)) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = orac)))
     >  stderr: TNS-12535: TNS:operation timed out
     >  stderr: ocf-exit-reason:Listener listener appears to have started, but is not running properly: 
     >  stderr: LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 14-SEP-2022 18:34:36
     >  stderr: 
     >  stderr: Copyright (c) 1991, 2013, Oracle.  All rights reserved.
     >  stderr: 
     >  stderr: Starting /data1/app/oracle/product/11.2.0/db_1/bin/tnslsnr: please wait...
     >  stderr: 
     >  stderr: TNSLSNR for Linux: Version 11.2.0.4.0 - Production
     >  stderr: System parameter file is /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
     >  stderr: Log messages written to /data1/app/oracle/diag/tnslsnr/cxl-pcs01/listener/alert/log.xml
     >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
     >  stderr: Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.84.167)(PORT=1521)))
     >  stderr: 
     >  stderr: Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC1521)))
     >  stderr: STATUS of the LISTENER
     >  stderr: ------------------------
     >  stderr: Alias                     listener
     >  stderr: Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
     >  stderr: Start Date                14-SEP-2022 18:34:36
     >  stderr: Uptime                    0 days 0 hr. 0 min. 0 sec
     >  stderr: Trace Level               off
     >  stderr: Security                  ON: Local OS Authentication
     >  stderr: SNMP                      OFF
     >  stderr: Listener Parameter File   /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora
     >  stderr: Listener Log File         /data1/app/oracle/diag/tnslsnr/cxl-pcs01/listener/alert/log.xml
     >  stderr: Listening Endpoints Summary...
     >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
     >  stderr:   (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.84.167)(PORT=1521)))
     >  stderr: Services Summary...
     >  stderr: Service "orac" has 1 instance(s).
     >  stderr:   Instance "orac", status UNKNOWN, has 1 handler(s) for this service...
     >  stderr: The command completed successfully
     >  stderr: Last login: Wed Sep 14 18:28:25 CST 2022 on pts/0
     >  stderr: Sep 14 18:35:36 ERROR: Probable Oracle configuration error
    

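    I had already changed the subnet, so why was an address from the 125 subnet still showing up? Then it hit me: the configuration in tnsnames.ora had never been updated!
    After the fix: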

    [oracle@cxl-pcs01 admin]$ cat tnsnames.ora 
    # tnsnames.ora Network Configuration File: /data1/app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora
    # Generated by Oracle configuration tools.
    
    ORAC =
      (DESCRIPTION =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.84.167)(PORT = 1521))
        (CONNECT_DATA =
          (SERVER = DEDICATED)
          (SERVICE_NAME = orac)
        )
      )
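
Since the same address has to appear in both listener.ora and tnsnames.ora, a quick scan for leftover HOST values can catch this kind of mismatch. This is a minimal sketch, assuming a grep that supports -o (GNU grep does) and a hypothetical helper name `stale_hosts`. It only looks at literal (HOST = ...) pairs and skips comment lines.

```shell
#!/bin/sh
# Print "file: host" for every HOST value in the given Oracle Net config
# files that differs from the expected address (first argument).
stale_hosts() {
    expected=$1
    shift
    for f in "$@"; do
        # Drop comment lines, extract each HOST = value pair, keep only the value.
        grep -v '^[[:space:]]*#' "$f" |
            grep -oiE 'HOST[[:space:]]*=[[:space:]]*[^)[:space:]]+' |
            sed -E 's/^[^=]*=[[:space:]]*//' |
            while read -r host; do
                [ "$host" = "$expected" ] || echo "$f: $host"
            done
    done
}
```

On the cluster node it could be called as `stale_hosts 192.168.84.167 /data1/app/oracle/product/11.2.0/db_1/network/admin/listener.ora /data1/app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora`. Any line it prints names a file that still points somewhere other than the VIP.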
    

    This time, after restarting the pacemaker service, everything worked.


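    Root cause analysis:

    When Pacemaker starts the listener, the oralsnr agent runs tnsping against the service name. If the ping fails, the start is reported as failed and the cluster stops the listener. The tnsnames.ora configuration file therefore has to be updated as well.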
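
The behaviour can be modelled roughly as follows. This is a simplified sketch inferred from the error messages shown earlier, not the agent's actual source. The function name `model_listener_start` and its two command arguments are hypothetical.

```shell
#!/bin/sh
# Simplified model of a listener start: run the start command, then
# require a successful ping of the service alias before reporting success.
model_listener_start() {
    start_cmd=$1   # e.g. "lsnrctl start"
    ping_cmd=$2    # e.g. "tnsping orac"

    # Start the listener; in this model its exit status does not decide the outcome.
    $start_cmd >/dev/null 2>&1 || true

    # The ping acts as the health check.
    if $ping_cmd >/dev/null 2>&1; then
        echo "listener running"
        return 0
    fi

    echo "listener appears to have started, but is not running properly" >&2
    return 1   # generic OCF error, shown by pcs as 'unknown error' (1)
}
```

With stand-in commands, `model_listener_start true false` returns 1 even though the "start" succeeded. That mirrors what happened here: lsnrctl brought the listener up on 192.168.84.167, but tnsping still resolved the alias to 192.168.125.231 and timed out.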

  • Original article: https://blog.csdn.net/weixin_41607523/article/details/126868699