Reference documents:
Auto disk management feature in Exadata (Doc ID 1484274.1)
EXADATA AUTO MANAGEMENT INITIATE DROP AND ADD OF THE GRIDDISKS (Doc ID 1599448.1)
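The last time I replaced a disk on Exadata, I did not pay much attention to the log messages.
This time I watched the related logs and noticed a number of XDWK processes being started.
Disk replacement steps:
Monday: drop the grid disks at the ASM level.
Friday: once the replacement disk arrived, pull the failed disk and insert the new one.
After the grid disks had been dropped normally on Monday, the ASM alert log showed the following messages about every 15 minutes (the timestamps below are 15 minutes 3 seconds apart).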
- Thu Nov 03 22:50:43 2022
- XDWK started with pid=40, OS id=382866
- Thu Nov 03 23:05:46 2022
- Starting background process XDWK
- Thu Nov 03 23:05:46 2022
- XDWK started with pid=40, OS id=31173
- Thu Nov 03 23:20:49 2022
- Starting background process XDWK
- Thu Nov 03 23:20:49 2022
- XDWK started with pid=40, OS id=79593
- Thu Nov 03 23:35:52 2022
- Starting background process XDWK
- Thu Nov 03 23:35:52 2022
- XDWK started with pid=40, OS id=127598
- Thu Nov 03 23:50:55 2022
- Starting background process XDWK
- Thu Nov 03 23:50:55 2022
- XDWK started with pid=40, OS id=173112
- Fri Nov 04 00:05:58 2022
- Starting background process XDWK
- Fri Nov 04 00:05:58 2022
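The spacing between XDWK starts can be checked from the timestamps rather than by eye. The following is a minimal sketch, assuming the alert log uses the `Thu Nov 03 22:50:43 2022` timestamp format seen in the excerpt; the helper names `xdwk_start_times` and `intervals_seconds` are made up for this illustration, and the sample holds only the first few lines of the excerpt.

```python
from datetime import datetime

# Sample lines copied from the ASM alert log excerpt (the "- " prefix removed).
ALERT_LOG = """\
Thu Nov 03 22:50:43 2022
XDWK started with pid=40, OS id=382866
Thu Nov 03 23:05:46 2022
Starting background process XDWK
Thu Nov 03 23:05:46 2022
XDWK started with pid=40, OS id=31173
Thu Nov 03 23:20:49 2022
Starting background process XDWK
Thu Nov 03 23:20:49 2022
XDWK started with pid=40, OS id=79593
"""

# Assumed timestamp layout, taken from the excerpt.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def xdwk_start_times(text):
    """Return the timestamp preceding each 'XDWK started' line."""
    starts = []
    last_ts = None
    for raw in text.splitlines():
        line = raw.lstrip("- ").strip()
        try:
            last_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        if line.startswith("XDWK started") and last_ts is not None:
            starts.append(last_ts)
    return starts


def intervals_seconds(times):
    """Seconds between consecutive start times."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


starts = xdwk_start_times(ALERT_LOG)
print(intervals_seconds(starts))
```

For the sample lines this gives 903 seconds between starts, i.e. just over 15 minutes.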
Meanwhile, the XDWK trace file contained the following:
- *** 2022-11-02 11:30:33.033
- *** SESSION ID:(788.329) 2022-11-02 11:30:33.033
- *** CLIENT ID:() 2022-11-02 11:30:33.033
- *** SERVICE NAME:() 2022-11-02 11:30:33.033
- *** MODULE NAME:() 2022-11-02 11:30:33.033
- *** ACTION NAME:() 2022-11-02 11:30:33.033
-
- 2022-11-02 11:30:33.032987 : kxdam_is_disk_offline: Operation ID: 614109: in diskgroup Failed.
- SQL : /* Exadata Auto Mgmt: Is Disk in the given MODE_STATUS */
- select count(disk_number) from v$asm_disk_stat
- where
- name='DATA_PROD_CD_02_PRODCEL03'
- and
- mode_status='OFFLINE'
- and
- group_number in
- (
- select group_number from v$asm_diskgroup_stat
- where
- name='DATA_PROD'
- and
- state in ('MOUNTED', 'RESTRICTED')
- )
-
- Cause : Disk not found in offline state.
-
- Action : Check if disk has been dropped from the diskgroup.
- If so manually add disk back to the diskgroup.
- Ignore this error if disk is part of the diskgroup.
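I also ran a similar query by hand, here for the RECO_PROD grid disk and with an extra condition on the disk path. It returned a count of 0, meaning no matching disk was found.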
- [grid@xxxx01 trace]$ sqlplus /nolog
-
- SQL*Plus: Release 11.2.0.4.0 Production on Wed Nov 2 11:37:22 2022
- Copyright (c) 1982, 2013, Oracle. All rights reserved.
-
- SQL> conn / as sysasm
- Connected.
- SQL> select count(disk_number) from v$asm_disk_stat
- 2 where
- 3 name='RECO_PROD_CD_02_PRODCEL03'
- 4 and
- 5 (path='o/192.168.0.5/RECO_PROD_CD_02_prodcel03' or mode_status='OFFLINE')
- 6 and
- 7 group_number in
- 8 (
- 9 select group_number from v$asm_diskgroup_stat
- 10 where
- 11 name='RECO_PROD'
- 12 and
- 13 state in ('MOUNTED', 'RESTRICTED')
- 14 );
-
- COUNT(DISK_NUMBER)
- ------------------
- 0
-
- SQL>
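MOS describes the XDWK process as follows. (According to the document below, the disk should be added back to ASM automatically. In practice it was not, and it had to be added manually. This is probably because the disk was dropped by hand rather than being dropped automatically by Exadata as a faulty disk.)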
3. Automatic Storage Management
The Automatic Storage Management (ASM) instance runs on the compute (database) node and has two processes that implement the auto disk management feature:
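After the disk was pulled, the ASM alert log no longer showed any XDWK messages.
Next, replace the disk.
After the replacement, the alert.log on the storage cell showed the following; the cell disk and grid disks were created automatically.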
- 2022-11-04T09:37:34.508930+08:00
- create CELLDISK CD_02_prodcel03 on device /dev/sds
- 2022-11-04T09:37:34.604668+08:00
- create GRIDDISK DATA_prod_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
- GridDisk name=DATA_prod_CD_02_prodcel03 guid=5d3d5a26-b1fd-44c2-9e97-5cd3d640a107 (2749608172) status=GDISK_ACTIVE
- 2022-11-04T09:37:34.655927+08:00
- create GRIDDISK RECO_prod_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
- GridDisk name=RECO_prod_CD_02_prodcel03 guid=7743d6e9-1537-4bd8-bc6c-a43a10985ede (3983708148) status=GDISK_ACTIVE
- 2022-11-04T09:37:34.685691+08:00
- create GRIDDISK DBFS_DG_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
- GridDisk name=DBFS_DG_CD_02_prodcel03 guid=60463395-c176-4edd-8d57-875900f06b9e (4287003124) status=GDISK_ACTIVE
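To confirm which grid disks the cell recreated, the `create GRIDDISK` lines can be pulled out of the cell alert.log. This is a small sketch, assuming the line layout shown in the excerpt above; the function name `created_griddisks` is hypothetical and not part of any Oracle tool.

```python
import re

# Sample lines copied from the cell alert.log excerpt (the "- " prefix removed).
CELL_ALERT_LOG = """\
2022-11-04T09:37:34.508930+08:00
create CELLDISK CD_02_prodcel03 on device /dev/sds
2022-11-04T09:37:34.604668+08:00
create GRIDDISK DATA_prod_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
2022-11-04T09:37:34.655927+08:00
create GRIDDISK RECO_prod_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
2022-11-04T09:37:34.685691+08:00
create GRIDDISK DBFS_DG_CD_02_prodcel03 on CELLDISK CD_02_prodcel03 type 0
"""

# Matches "create GRIDDISK <griddisk> on CELLDISK <celldisk>".
CREATE_GRIDDISK = re.compile(r"create GRIDDISK (\S+) on CELLDISK (\S+)")


def created_griddisks(text):
    """Return (griddisk, celldisk) pairs for every create GRIDDISK line."""
    return [m.groups() for m in CREATE_GRIDDISK.finditer(text)]


for griddisk, celldisk in created_griddisks(CELL_ALERT_LOG):
    print(f"{griddisk} on {celldisk}")
```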
Finally, add the grid disks back at the ASM level, and the replacement is complete.
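The manual add can be prepared by generating one `ALTER DISKGROUP ... ADD DISK` statement per grid disk. The sketch below rests on several assumptions: the disk group name is taken to be the upper-cased part of the grid disk name before `_CD_` (this matches DATA_PROD and RECO_PROD in the trace and query above; DBFS_DG is a guess), the cell IP 192.168.0.5 is taken from the path in the manual RECO query, and `add_disk_statement` is a hypothetical helper. Review the generated statements before running them as SYSASM.

```python
def add_disk_statement(griddisk, cell_ip):
    """Build an ALTER DISKGROUP ... ADD DISK statement for one grid disk.

    Assumption: the disk group name is the upper-cased prefix of the
    grid disk name before "_CD_".
    """
    prefix, sep, _ = griddisk.partition("_CD_")
    if not sep:
        raise ValueError(f"unexpected grid disk name: {griddisk}")
    diskgroup = prefix.upper()
    path = f"o/{cell_ip}/{griddisk}"  # Exadata grid disk path: o/<cell ip>/<griddisk>
    return f"ALTER DISKGROUP {diskgroup} ADD DISK '{path}';"


# Grid disk names from the cell alert.log; cell IP assumed from the manual query.
GRIDDISKS = [
    "DATA_prod_CD_02_prodcel03",
    "RECO_prod_CD_02_prodcel03",
    "DBFS_DG_CD_02_prodcel03",
]
CELL_IP = "192.168.0.5"

for name in GRIDDISKS:
    print(add_disk_statement(name, CELL_IP))
```

Adding a disk triggers a rebalance, which can be followed in `v$asm_operation` until it finishes.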
END