• 验证ADG的坏块检测和自动修复


    环境: Oracle 19c ADG(主库:单实例;备库:RAC)

    1.主库新建测试文件

    主库在AWR的PDB中做测试,为了不影响其他测试,创建一个新的测试表空间tbs_test及对应数据文件:
    SQL> conn awr@awr
    Enter password:
    Connected.
    SQL> create tablespace tbs_test datafile '/flash/oradata/DEMO/awr/tbs_test01.dbf' size 30m;
    
    Tablespace created.
    

    2.主库创建测试表

    主库在新建表空间上创建测试表awr.test:
    SQL> create table awr.test tablespace tbs_test as select * from dba_users;
    
    Table created.
    
    SQL> select count(*) from awr.test;
    
      COUNT(*)
    ----------
    	37
    

    3.查询表对应数据文件信息

    通过dbms_rowid查看awr.test表对应行数据的文件号(rel_fno)、块号(blockno,)和行号(rowno):
    select rowid, 
         dbms_rowid.rowid_relative_fno(rowid) rel_fno,        
         dbms_rowid.rowid_block_number(rowid) blockno,  
         dbms_rowid.rowid_row_number(rowid) rowno 
    from awr.test    
    order by rowid;
    
    ROWID		      REL_FNO	 BLOCKNO      ROWNO
    ------------------ ---------- ---------- ----------
    AAATR2AAdAAAACDAAA	   29	     131	  0
    AAATR2AAdAAAACDAAB	   29	     131	  1
    AAATR2AAdAAAACDAAC	   29	     131	  2
    AAATR2AAdAAAACDAAD	   29	     131	  3
    AAATR2AAdAAAACDAAE	   29	     131	  4
    AAATR2AAdAAAACDAAF	   29	     131	  5
    AAATR2AAdAAAACDAAG	   29	     131	  6
    AAATR2AAdAAAACDAAH	   29	     131	  7
    AAATR2AAdAAAACDAAI	   29	     131	  8
    AAATR2AAdAAAACDAAJ	   29	     131	  9
    AAATR2AAdAAAACDAAK	   29	     131	 10
    ...
    

    4.模拟数据文件物理坏块

    使用dd模拟数据文件的物理坏块:
    dd if=/dev/zero of=/flash/oradata/DEMO/awr/tbs_test01.dbf bs=8192 conv=notrunc seek=131 count=1
    

    5.查询对应测试表

    再次查询被破坏数据文件上的表awr.test,发现客户端只是卡顿一下就正常出了结果,并没有任何显示的报错:
    ALTER SYSTEM Flush buffer_cache;
    select count(*) from awr.test;
    

    6.进一步查询日志信息

    上面查询表没有报错,但是从主库的alert日志中可以看到:
    2023-04-03T12:08:02.602504+08:00
    AWR(6):create tablespace tbs_test datafile '/flash/oradata/DEMO/awr/tbs_test01.dbf' size 30m
    AWR(6):Completed: create tablespace tbs_test datafile '/flash/oradata/DEMO/awr/tbs_test01.dbf' size 30m
    2023-04-03T12:15:21.021834+08:00
    AWR(6):ALTER SYSTEM: Flushing buffer cache inst=0 container=6 global
    AWR(6):TABLE AUDSYS.AUD$UNIFIED: ADDED INTERVAL PARTITION SYS_P368 (106) VALUES LESS THAN (TIMESTAMP' 2023-05-01 00:00:00')
    2023-04-03T12:15:24.751443+08:00
    AWR(6):Hex dump of (file 29, block 131) in trace file /u01/app/oracle/diag/rdbms/demo/demo/trace/demo_ora_11735.trc
    AWR(6):
    AWR(6):Corrupt block relative dba: 0x07400083 (file 29, block 131)
    AWR(6):Completely zero block found during multiblock buffer read
    AWR(6):
    AWR(6):Reading datafile '/flash/oradata/DEMO/awr/tbs_test01.dbf' for corrupt data at rdba: 0x07400083 (file 29, block 131)
    AWR(6):Reread (file 29, block 131) found same corrupt data (no logical check)
    AWR(6):Starting background process ABMR
    2023-04-03T12:15:24.763408+08:00
    Corrupt Block Found
             TIME STAMP (GMT) = 04/03/2023 12:15:24
             CONT = 6, TSN = 5, TSNAME = TBS_TEST
             RFN = 29, BLK = 131, RDBA = 121634947
             OBJN = 78966, OBJD = 78966, OBJECT = TEST, SUBOBJECT =
             SEGMENT OWNER = AWR, SEGMENT TYPE = Table Segment
    2023-04-03T12:15:24.766521+08:00
    ABMR started with pid=132, OS id=11983
    2023-04-03T12:15:24.767751+08:00
    Automatic block media recovery service is active.
    2023-04-03T12:15:24.767981+08:00
    AWR(6):Automatic block media recovery requested for (file# 29, block# 131)
    2023-04-03T12:15:27.096763+08:00
    Automatic block media recovery successful for (file# 29, block# 131)
    2023-04-03T12:15:27.097189+08:00
    AWR(6):Automatic block media recovery successful for (file# 29, block# 131)
    

    日志中显示自动启用了ABMR(Automatic block media recovery)成功修复了物理坏块。

    7.确认当前参数设置

    如果查询 db_block_checking 、 db_lost_write_protect 这些参数,会发现我这里并没有去特殊设置:
    SQL> show parameter db_block
    
    NAME				     TYPE	 VALUE
    ------------------------------------ ----------- ------------------------------
    db_block_buffers		     integer	 0
    db_block_checking		     string	 FALSE
    db_block_checksum		     string	 TYPICAL
    db_block_size			     integer	 8192
    SQL> show parameter db_lost
    
    NAME				     TYPE	 VALUE
    ------------------------------------ ----------- ------------------------------
    db_lost_write_protect		     string	 NONE
    SQL>
    

    那么那些参数的意义呢?其实MOS文档:

    • Best Practices for Corruption Detection, Prevention, and Automatic Repair - in a Data Guard Configuration (Doc ID 1302539.1)

    文档中有说明,物理坏块默认ADG就能检测,逻辑坏块要配合这些参数设置。包括上一步的日志信息中,在发现数据损坏时,也标注了(no logical check)非逻辑检查的提示。

    当然,如果您想要获得更全面的保护,还是要按文档说明,额外设置这些参数。

  • 相关阅读:
    3.程序控制
    时间轴-新年倒计时(实操java)
    ssm+微信小程序网易云音乐设计与实现毕业设计源码261620
    阿里云无影研发负责人任晋奎:无影核心技术能力突破,打造云网端融合的全新体验
    (科目三)数据库基础知识
    Java实现登录功能(一)
    nginx配置proxy_pass之后返回404问题及Nginx host相关变量说明
    打造个人的NAS云存储-通过Nextcloud搭建私有云盘实现公网远程访问
    【深入理解TcaplusDB技术】如何实现Tmonitor单机安装
    图数据挖掘(二):网络的常见度量属性
  • 原文地址:https://www.cnblogs.com/jyzhao/p/17282819.html