
Incident notes: an unexpected outage while expanding ASM space on a RAC cluster

2025-07-27 19:19:52

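Background

  The archive log disk group (ARCH) on the cluster recently hit its 80% usage alarm. A surge in business transactions had pushed the archive log volume up from its earlier level to 150G, so ARCH was to be expanded by another 500GB. Earlier expansions of this kind had never caused a problem; this one nearly ended in disaster.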

Expanding the archive space (ARCH) on the RAC cluster

I. Storage allocates a 500G arch2 volume to the RAC cluster
II. Run on every node of the RAC cluster
  • 1. Rescan to discover the new disk
[root@dbrac1 ~]# echo "- - -" >  /sys/class/scsi_host/host0/scan
[root@dbrac1 ~]# echo "- - -" >  /sys/class/scsi_host/host1/scan
[root@dbrac1 ~]# echo "- - -" >  /sys/class/scsi_host/host2/scan
[root@dbrac1 ~]# echo "- - -" >  /sys/class/scsi_host/host3/scan
[root@dbrac1 ~]# echo "- - -" >  /sys/class/scsi_host/host4/scan
[root@dbrac1 ~]# multipath -l
mpathv (560002ac0000000000000007a00020406) dm-24 PARdata,VV
size=500G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 1:0:0:17 sdbv 68:144 active undef unknown
  |- 3:0:0:17 sdbx 68:176 active undef unknown
  |- 1:0:1:17 sdbw 68:160 active undef unknown
  `- 3:0:1:17 sdby 68:192 active undef unknown
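The echo commands above differ only in the host number, so they can be folded into a loop over whatever SCSI hosts exist on the node. The following is a minimal sketch, not the author's procedure: the function name `rescan_scsi_hosts` and its optional directory argument (present only so the loop can be dry-run against a scratch directory) are assumptions made for illustration.

```shell
# Hypothetical helper: write "- - -" (all channels, targets and LUNs) to the
# scan file of every SCSI host found under the given sysfs directory.
rescan_scsi_hosts() {
    local base="${1:-/sys/class/scsi_host}"   # override only for dry runs
    local scan count=0
    for scan in "$base"/host*/scan; do
        [ -e "$scan" ] || continue            # the glob matched nothing
        echo "- - -" > "$scan" || return 1
        count=$((count + 1))
    done
    echo "$count"                             # number of hosts rescanned
}
```

Run as root with no argument, it prints how many hosts were rescanned. As the rest of this write-up shows, a rescan by itself does not guarantee that a node sees the new LUN correctly, so the `multipath -l` output still has to be read carefully afterwards.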
  • 2. Configure an alias for the new device: /etc/
[root@dbrac1  ~]# vim /etc/ 
defaults {
        user_friendly_names yes
}
multipaths {
        multipath {
            no_path_retry fail
            wwid 560002ac0000000000000007a00020406
            alias ASM-ARCH2
    }
}
-- Restart multipathd
[root@dbrac1  ~]# /etc/init.d/multipathd restart
ok
Stopping multipathd daemon:                                [  OK  ]
Starting multipathd daemon:                                [  OK  ]
[root@dbrac1 ~]# multipath -l
ASM-ARCH2 (560002ac0000000000000007a00020406) dm-24 PARdata,VV
size=500G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 1:0:0:17 sdbv 68:144 active undef unknown
  |- 3:0:0:17 sdbx 68:176 active undef unknown
  |- 1:0:1:17 sdbw 68:160 active undef unknown
  `- 3:0:1:17 sdby 68:192 active undef unknown
  • 3. Set ownership and permissions on the device
 [root@dbrac1 ~]#chown grid.asmadmin  /dev/mapper/ASM-ARCH2
 [root@dbrac1 ~]#chmod 660 /dev/mapper/ASM-ARCH2
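Because these two commands have to be repeated on every node, one node is easily missed, and a missed node was in fact the first suspicion once the incident began. A small check along the following lines can confirm the result on each node before moving on. It is a sketch under assumptions: the function name `check_asm_device_perms` is made up for this example, and it relies on GNU `stat` (`-L` follows the /dev/mapper symlink to the underlying device node).

```shell
# Hypothetical helper: confirm that a device has the owner, group and mode
# that ASM expects. Returns 0 on match, 1 on mismatch, 2 if stat fails.
check_asm_device_perms() {
    local dev="$1" want_owner="$2" want_mode="$3" actual
    actual=$(stat -L -c '%U:%G %a' "$dev") || return 2
    if [ "$actual" = "$want_owner $want_mode" ]; then
        echo "OK $dev $actual"
    else
        echo "MISMATCH $dev: have '$actual', want '$want_owner $want_mode'"
        return 1
    fi
}
```

For the device in this article the call would be `check_asm_device_perms /dev/mapper/ASM-ARCH2 grid:asmadmin 660`.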
III. Expand ARCH from one node of the RAC cluster
  • 1. Check the disk groups:
SQL>  set linesize 200; 
SQL>  col name format a20;  
SQL>  select group_number,name,TOTAL_MB, FREE_MB from v$asm_diskgroup; 
GROUP_NUMBER NAME                   TOTAL_MB    FREE_MB
------------ -------------------- ---------- ----------
           1 ARCH                     512000     14955
 ......
6 rows selected.
  • 2. Check that ASM can see the new disk path: /dev/mapper/ASM-ARCH2
SQL> col name format a20;  
SQL> col path format a30;
SQL> select name,path,mode_status,state,disk_number,failgroup from v$asm_disk; 
NAME                 PATH                           MODE_ST STATE    DISK_NUMBER FAILGROUP
-------------------- ------------------------------ ------- -------- ----------- -----------------------
                     /dev/mapper/ASM-ARCH2          ONLINE  NORMAL             0
ARCH_0000            /dev/mapper/ASM-ARCH1          ONLINE  NORMAL             0 ARCH_0000
......
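  • 3. Add the disk to ARCH and rebalance. The rebalance power level ranges from 0 to 11, with 11 the highest, and is capped by the ASM_POWER_LIMIT initialization parameter (on 11.2.0.2 and later the upper bound rises to 1024 when the disk group's COMPATIBLE.ASM is 11.2.0.2 or higher)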
 SQL> alter diskgroup ARCH add disk  '/dev/mapper/ASM-ARCH2' rebalance power 1;
  • 4. Verify that the expansion succeeded
SQL> set line 800
SQL> select group_number,name,TOTAL_MB, FREE_MB from v$asm_diskgroup;

GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB
------------ ------------------------------ ---------- ----------
           1 ARCH                              1024000     634424
           ......
6 rows selected.
  • 5. Watch the ASM rebalance. Once rebalancing has finished, v$asm_operation returns no rows.
SQL> select * from v$asm_operation;
no rows selected
SQL>  select group_number,name,total_mb,free_mb,total_mb-free_mb used_mb from v$asm_disk_stat;
GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB    USED_MB
------------ ------------------------------ ---------- ---------- ----------
           1 ARCH_0000                          512000     317197     194803
           1 ARCH_0001                          512000     317227     194773
......
18 rows selected.

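The incident

  On this 4-node RAC cluster, alarms suddenly came in that the database was down on the other 3 nodes. The only node still carrying the workload was the one the expansion was being run from, and its session count shot straight up to 100 (fortunately the database's maximum session setting was fairly high: 2500). The first suspicion was that ownership and permissions on the new disk path had not been set on the other 3 nodes.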

I. The command history on every node looked normal, which ruled out a permissions problem
  835  2024-11-20 14:49:34 echo "- - -" >  /sys/class/scsi_host/host0/scan
  836  2024-11-20 14:49:34 echo "- - -" >  /sys/class/scsi_host/host1/scan
  837  2024-11-20 14:49:34 echo "- - -" >  /sys/class/scsi_host/host2/scan
  838  2024-11-20 14:49:34 echo "- - -" >  /sys/class/scsi_host/host3/scan
  839  2024-11-20 14:49:35 echo "- - -" >  /sys/class/scsi_host/host4/scan
  840  2024-11-20 14:49:37 multipath -l
  841  2024-11-20 14:49:50 /etc/init.d/multipathd reload
  842  2024-11-20 14:49:54 multipath -l
  843  2024-11-20 14:51:05 exit
  844  2024-11-20 15:58:27 vim /etc/ 
  845  2024-11-20 15:58:47 /etc/init.d/multipathd restart
  846  2024-11-20 15:58:49 multipath -l
  847  2024-11-20 15:58:58 vim /etc/ 
  848  2024-11-20 15:59:09 /etc/init.d/multipathd restart
  849  2024-11-20 15:59:11 multipath -l
  850  2024-11-20 15:59:46 cat /var/log/messages
  851  2024-11-20 16:02:08 chown grid.asmadmin  /dev/mapper/ASM-ARCH2
  852  2024-11-20 16:03:16 chmod 660 /dev/mapper/ASM-ARCH2
II. Check the logs
  • Excerpt of the abnormal system log. The paths of the newly added map mpathv (sdbr, sdbs, sdbt, sdbu) report an error, couldn't get asymmetric access state, which points to a multipath problem
Nov 20 14:42:24 dbrac2 kernel: sd :0:0:1: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Nov 20 14:42:24 dbrac2 kernel: sd 1:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Nov 20 14:42:24 dbrac2 kernel: sd :0:1:5: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Nov 20 14:42:24 dbrac2 kernel: sd 1:0:1:11: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatically remap LUN assignments.
Nov 20 14:47:16 dbrac2 puppet-agent[1926]: Finished catalog run in 4.88 seconds
Nov 20 14:47:2 dbrac2 sshd[22245]: Accepted password for hnyunwei from 10.10.6.15 port 10266 ssh2
Nov 20 14:47:28 dbrac2 kernel: scsi: host 0 channel 0 id 0 lun419404 has a LUN larger than allowed by the host adapter
Nov 20 14:47:29 dbrac2 kernel: scsi: host 0 channel  id 0 lun419404 has a LUN larger than allowed by the host adapter
Nov 20 14:48:55 dbrac2 kernel: scsi: host 0 channel 0 id 0 lun419404 has a LUN larger than allowed by the host adapter
Nov 20 14:48:56 dbrac2 kernel: scsi: host 0 channel  id 0 lun419404 has a LUN larger than allowed by the host adapter.
......
Nov 20 14:50:02 dbrac2 multipathd: sdbr: couldn't get asymmetric access state
Nov 20 14:50:02 dbrac2 multipathd: sdbs: couldn't get asymmetric access state
Nov 20 14:50:02 dbrac2 multipathd: sdbt: couldn't get asymmetric access state
Nov 20 14:50:02 dbrac2 multipathd: sdbu: couldn't get asymmetric access state
Nov 20 14:50:0 dbrac2 kernel: device-mapper: table: 25:24: multipath: error getting device
Nov 20 14:50:0 dbrac2 kernel: device-mapper: ioctl: error adding target to table
Nov 20 14:50:0 dbrac2 multipathd: mpatha: ignoring map
......
Nov 20 14:50:05 dbrac2 multipathd: mpathv: load table [0 20971520 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 4 1 68:144 1 68:176 1 68:160 1 68:192 1]
......
Nov 20 14:50:05 dbrac2 multipathd: mpathv: event checker started
Nov 20 14:50:05 dbrac2 kernel: sd 1:0:0:17: alua: port group 01 state A preferred supports tolusnA
Nov 20 14:50:05 dbrac2 kernel: sd 3:0:0:17: alua: port group 01 state A preferred supports tolusnA
Nov 20 14:50:05 dbrac2 kernel: sd 1:0:1:17: alua: port group 01 state A preferred supports tolusnA
Nov 20 14:50:05 dbrac2 kernel: sd 3:0:1:17: alua: port group 01 state A preferred supports tolusnA
Nov 20 14:50:05 dbrac2 multipathd: dm-24: remove map (uevent)
Nov 20 14:50:05 dbrac2 multipathd: mpathv: stop event checker thread (1407745021696)
Nov 20 14:50:05 dbrac2 multipathd: dm-24: remove map (uevent)
Nov 20 14:50:05 dbrac2 multipathd: dm-24: devmap not registered, can't remove
Nov 20 14:50:05 dbrac2 multipathd: dm-24: adding map
Nov 20 14:50:05 dbrac2 multipathd: mpathv: event checker started
Nov 20 14:50:05 dbrac2 multipathd: mpathv: devmap dm-24 added
......
Nov 20 16:21:47 dbrac2 kernel: rport-1:0-16: blocked FC remote port time out: removing rport
Nov 20 16:21:47 dbrac2 kernel: rport-2:0-85: blocked FC remote port time out: removing rport
Nov 20 16:26:19 dbrac2 kernel: rport-4:0-2: blocked FC remote port time out: removing rport
Nov 20 16:26:19 dbrac2 kernel: rport-:0-5: blocked FC remote port time out: removing rport
Nov 20 16:26:19 dbrac2 kernel: rport-2:0-4: blocked FC remote port time out: removing rport
Nov 20 16:26:19 dbrac2 kernel: rport-1:0-5: blocked FC remote port time out: removing rport
Nov 20 16:0:48 dbrac2 sshd[1298]: Accepted password for hnyunwei from 10.10.6.9 port 55959 ssh2
  • Database alert log
Wed Nov 20 16:06:6 2024
NOTE: ASMB terminating
Errors in file /u01/oracle/diag/rdbms/dbrac/rac2/trace/rac2_asmb_:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 216 Serial number: 
Errors in file /u01/oracle/diag/rdbms/dbrac/rac2/trace/rac2_asmb_:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 216 Serial number: 
Wed Nov 20 16:06:6 2024
System state dump requested by (instance=2, osid=77922 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /u01/oracle/diag/rdbms/dbrac/rac2/trace/rac2_diag_
ASMB (ospid: 77922): terminating the instance due to error 15064
Wed Nov 20 16:06:6 2024
opiodr aborting process unknown ospid (16621) as a result of ORA-1092
Wed Nov 20 16:06:6 2024
opiodr aborting process unknown ospid (8007) as a result of ORA-1092
Wed Nov 20 16:06:6 2024
opiodr aborting process unknown ospid (50410) as a result of ORA-1092
Wed Nov 20 16:06:6 2024
ORA-1092 : opitsk aborting process
Wed Nov 20 16:06:6 2024
opiodr aborting process unknown ospid (140718) as a result of ORA-1092
Wed Nov 20 16:06:6 2024
ORA-1092 : opitsk aborting process
Wed Nov 20 16:06:7 2024
ORA-1092 : opitsk aborting process
Wed Nov 20 16:06:7 2024
ORA-1092 : opitsk aborting process
Wed Nov 20 16:06:7 2024
ORA-1092 : opitsk aborting process
Wed Nov 20 16:06:8 2024
  • Clusterware log
2024-11-20 16:06:6.265:
[/u01/grid/11.2.0./product/bin/oraagent.bin(11611)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLS00006:)" in "/u01/grid/11.2.0./product/log/dbrac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2024-11-20 16:06:6.689:
[ohasd(10948)]CRS-2765:Resource 'ora.asm' has failed on server 'dbrac2'.
2024-11-20 16:06:6.694:
[/u01/grid/11.2.0./product/bin/oraagent.bin(11611)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLS00006:)" in "/u01/grid/11.2.0./product/log/dbrac2/agent/ohasd/oraagent_grid/oraagent_grid.log"
2024-11-20 16:06:6.768:
[crsd(77204)]CRS-2765:Resource 'ora.dbrac.db' has failed on server 'dbrac4'.
2024-11-20 16:06:6.772:
[crsd(77204)]CRS-2765:Resource 'ora.asm' has failed on server 'dbrac4'.
2024-11-20 16:06:6.972:
[/u01/grid/11.2.0./product/bin/oraagent.bin(77425)]CRS-5011:Check of resource "dbrac" failed: details at "(:CLS00007:)" in "/u01/grid/11.2.0./product/log/dbrac2/agent/crsd/oraagent_oracle/oraagent_oracle.log"
2024-11-20 16:06:7.06:
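The system log shows multipathd already reporting errors at 14:50:02, well before ASM failed shortly after 16:06. A check like the following, run on each node after the rescan and before the disk is handed to ASM, might have surfaced the problem earlier. It is a sketch under assumptions: the function name `multipath_warning_count` is invented here, and the three signature strings are simply copied from the log excerpt above, not an exhaustive list of multipath failure messages.

```shell
# Hypothetical helper: count syslog lines that match the multipath error
# signatures seen in this incident. Prints the count; like grep, it exits
# non-zero when nothing matches.
multipath_warning_count() {
    local logfile="${1:-/var/log/messages}"
    grep -c -F \
        -e "couldn't get asymmetric access state" \
        -e "multipath: error getting device" \
        -e "error adding target to table" \
        "$logfile"
}
```

A non-zero count on any node is a reason to stop and inspect `multipath -l` on that node before running `alter diskgroup ... add disk`.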
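III. Recovery
  • The new paths were clearly at fault, but a proper investigation was going to take time. With only one node still holding up the cluster, the first thing tried was the bluntest fix: reboot the failed nodes one at a time. After rebooting, each node rejoined the cluster and service returned to normal.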

Analysis summary:

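  • Judging from the logs, the problem lay in the newly allocated storage paths, which caused ASM to fail across the cluster. A similar case turned up online, "Redhat6主机系统Oracle11g数据库异常重启问题" (abnormal restarts of an Oracle 11g database on a Red Hat 6 host), which attributes the failure to an OS multipath bug; the log output in that write-up matches ours.
  • The path status recorded during the operation shows that the new map ASM-ARCH2 should have been size=500G, yet every other node reported it as size=10G. The dbrac2 system log agrees: the mpathv table was loaded with a length of 20971520 sectors, which at 512 bytes per sector is exactly 10G. That brought to mind a 10G test disk that had once been allocated to the cluster and later removed. The symptoms suggest that the system picked up the new 500G disk using the stale 10G information: the UUID was the new one, but the disk geometry was the old one (see below). The node that stayed up had been rebooted a year earlier because of a hardware fault, which had already cleared the leftover information. This was a real scare and a close call: we came within one node of losing the whole cluster.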
-- Before the reboot
ASM-ARCH2 (560002ac0000000000000007a00020406) dm-24 PARdata,VV
size=10G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
  |- 1:0:0:17 sdbv 68:144 active undef unknown
  |- 3:0:0:17 sdbx 68:176 active undef unknown
  |- 1:0:1:17 sdbw 68:160 active undef unknown
  `- 3:0:1:17 sdby 68:192 active undef unknown

-- After the reboot
ASM-ARCH2 (560002ac0000000000000007a00020406) dm-24 PARdata,VV
size=500G features='0' hwhandler='1 alua' wp=rw
`-+- policy='round-robin 0' prio=50 status=active
  |- 1:0:0:17 sdbr 68:80  active ready running
  |- 3:0:0:17 sdbt 68:112 active ready running
  |- 1:0:1:17 sdbs 68:96  active ready running
  `- 3:0:1:17 sdbu 68:128 active ready running
  • Follow-up testing is written up separately: "HBA & multipath operations and problem summary"
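The size mismatch described above (size=10G reported for a 500G volume) is something that can be checked mechanically on each node before the disk is added to ASM. The following is a minimal sketch, assuming `multipath -l` output in the format shown in this article; the function names `multipath_map_size` and `check_multipath_size` are invented for the example.

```shell
# Hypothetical helper: read `multipath -l` output on stdin and print the
# size= value of the map whose first field equals the given alias.
multipath_map_size() {
    local alias="$1"
    awk -v a="$alias" '
        $1 == a           { found = 1; next }   # header line of the map
        found && /^size=/ { sub(/^size=/, "", $1); print $1; exit }
    '
}

# Succeeds only when the reported size equals the expected one, e.g. 500G.
check_multipath_size() {
    local alias="$1" expected="$2" actual
    actual=$(multipath_map_size "$alias")
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A possible use on each node is `multipath -l | check_multipath_size ASM-ARCH2 500G || echo "size mismatch, do not add to ASM"`. Fed the "before the reboot" output above, the check fails; fed the "after the reboot" output, it passes.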
