Recently, while I was creating a database on a Grid Infrastructure cluster, dbca failed with the message that it wasn’t able to establish the dependency between the database and the corresponding grid infrastructure resource of the DATA ASM disk group I was using for my data files. Its name would have been “ora.DATA.dg”. (ERROR: failed to establish dependency between database MYDB and diskgroup resource ora.DATA.dg)
The Problem
Running
crsctl stat res -t
showed me that there was no “ora.DATA.dg”, so no wonder. But a SELECT on v$asm_diskgroup confirmed that the diskgroup WAS there, and it was even mounted!
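A quick way to script this check is to look for the resource name in the crsctl listing. The following is only a minimal sketch: the helper name has_dg_resource is made up for illustration, and it assumes the resource name appears as the first word on a line of the "crsctl stat res -t" output.

```shell
# Hypothetical helper (not part of any Oracle tool): reads the output of
# `crsctl stat res -t` on stdin and succeeds only if a resource named
# ora.<DG>.dg shows up as the first word of a line.
has_dg_resource() {
    dg="$1"
    # -F: treat the name as a fixed string (the dots are literal),
    # -x: the whole word must match, -q: no output, exit status only
    awk '{print $1}' | grep -Fxq "ora.${dg}.dg"
}
```

Usage would look like: crsctl stat res -t | has_dg_resource DATA || echo "ora.DATA.dg is missing"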
Exploring possibilities
Okay, whatever might have gone wrong when creating the diskgroup, let’s just create the resource, and off we go. But Oracle refused to cooperate. SRVCTL only offers operations such as START, STOP, STATUS, ENABLE, DISABLE and REMOVE for a diskgroup, but no way to add one. I was not able to find anything in the official documentation that describes definitively how a resource for a diskgroup is created. All sources pointed to something automagic. When testing with GI and ASM on my test cluster, it became clear that the resource ora.DATA.dg is added to the resource list when the disk group DATA is mounted by ASM for the very first time. (Some folks on the net claim that this happens when the RDBMS uses this DG for the first time. They are mistaken, and are maybe confused by the experience that the DB access triggers ASM to mount this diskgroup.) But knowing this was no help to begin with, because on my new production box this DID NOT happen, and nobody seemed to know why.
This was my manual mount command:
(+ASM1)$ sqlplus / as sysasm
<...>
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL>
No error, but as expected, no such disk group resource in crsctl. Now let’s look into the alert log of my first ASM instance (alert_+ASM1.log):
(+ASM1)$ tail -1000f $ORACLE_BASE/diag/+asm/+ASM1/trace/alert_+ASM1.log
<...>
NOTE: diskgroup resource ora.DATA.dg is online
ERROR: failed to update diskgroup resource ora.DATA.dg
Ah, here we go. But no trace file shed any light on this; in fact, a grep for ora.DATA.dg in my trace directory returned nothing. Still, this error message leaves two options open for consideration: a) ASM tried to update an existing resource and failed because it isn’t there, or b) ASM tried to create the resource and failed for an unknown reason. Option a) is circular (the resource is missing because it was never created), so what about b)?
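If you want to scan an alert log for this symptom instead of watching it scroll by, something like the following could help. It is a rough sketch: failed_dg_resources is a made-up name, and the only message format it knows is the one quoted above.

```shell
# Hypothetical helper: prints each disk group resource for which the
# given ASM alert log contains the "failed to update" error, once each.
failed_dg_resources() {
    # $1: path to an alert log such as alert_+ASM1.log
    sed -n 's/^ERROR: failed to update diskgroup resource \(.*\)$/\1/p' "$1" | sort -u
}
```

Called with the path of the alert log, it prints nothing when the log is clean.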
A Theory
Facts:
- All ASM resources depend on ASM, ASM depends on a listener, and a listener depends on the network. Oracle Grid Infrastructure usually checks for possibly failing dependencies BEFORE doing anything, to avoid causing needless outages.
- All ASM processes run as OS user oracle (on this system).
- My manually created admin VIP “admin.vip1.res” was created by root (to bring along enough permissions for ifconfig) and oracle has no rights there (not even READ).
(+ASM1)$ crsctl getperm resource admin.vip1.res
Name: admin.vip1.res
owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x
Maybe GI tries to look up something and can’t…?
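To see at a glance whether a given OS user has its own entry in such an ACL, the getperm output can be split at the commas. This is just a sketch: acl_user_perm is a hypothetical helper, it only looks at explicit "user:" entries (not owner, pgrp or other), and it assumes the user name contains no characters that are special to sed.

```shell
# Hypothetical helper: reads `crsctl getperm resource <name>` output on
# stdin and prints the permission string of the explicit user:<USER>
# ACL entry, or nothing if there is no such entry.
acl_user_perm() {
    user="$1"
    # one ACL entry per line, then keep only the matching user entry
    tr ',' '\n' | sed -n "s/^user:${user}:\(.*\)\$/\1/p"
}
```

Usage would look like: crsctl getperm resource admin.vip1.res | acl_user_perm oracle
With the ACL shown above this prints nothing, because oracle has no entry of its own.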
Check
So let’s check something. Give oracle the right to fuss around with my handmade VIP:
[root@db_node1 ~]$ (+ASM1) crsctl setperm resource admin.vip1.res -u user:oracle:r-x
[root@db_node1 ~]$ (+ASM1) crsctl getperm resource admin.vip1.res
Name: admin.vip1.res
owner:root:rwx,pgrp:root:r-x,other::r--,user:root:r-x,user:oracle:r-x
Dismount/mount the disk group: same problem. 🙁 But at the last second, before abandoning this idea: why not restart the VIP before trying again?
[root@db_node1 ~]$ (+ASM1) crsctl stop res admin.vip1.res
CRS-2673: Attempting to stop 'klug.fs.klug' on 'db_node1'
CRS-2677: Stop of 'klug.fs.klug' on 'db_node1' succeeded
[root@db_node1 ~]$ (+ASM1) crsctl start res admin.vip1.res
CRS-2672: Attempting to start 'admin.vip1.res' on 'db_node1'
CRS-2676: Start of 'admin.vip1.res' on 'db_node1' succeeded
And retry:
(+ASM1)$ sqlplus / as sysasm
<...>
SQL> alter diskgroup DATA dismount;
Diskgroup altered.
SQL>
Still no resource in the list.
SQL> alter diskgroup DATA mount;
Diskgroup altered.
SQL>
Check again!
[oracle@db_node1 ~]$ (+ASM1) crsctl stat res -t
-------------------------------------------------
NAME           TARGET  STATE        SERVER
-------------------------------------------------
Local Resources
-------------------------------------------------
<...>
ora.DATA.dg
               ONLINE  ONLINE       db_node1
               OFFLINE OFFLINE      db_node2
<...>
YAY! Ok, it’s still offline on Node 2, but the mount was a local ASM command, so +ASM2 had no way of knowing about it.
Result
Give your admin-made VIPs at least the ACL rights “-u user:oracle:r-x” (I tried “r--” as well, but it’s not enough) to let ASM hook its resources into grid infrastructure. Then restart the VIP and dismount/mount the disk group. Definitely not documented, and thus maybe nice to know.
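For reference, here is the whole sequence from this post as a small dry-run helper. It only prints the commands and executes nothing. The function name print_vip_acl_fix is made up, and the commands are just the ones used above with the names filled in: the three crsctl lines were run as root, the two ALTER DISKGROUP lines in sqlplus as sysasm.

```shell
# Hypothetical dry-run helper: prints the steps from this post for a
# given VIP resource, OS user and disk group. Nothing is executed.
print_vip_acl_fix() {
    vip="$1"
    os_user="$2"
    dg="$3"
    # first three lines: run as root; last two: in sqlplus / as sysasm
    cat <<EOF
crsctl setperm resource ${vip} -u user:${os_user}:r-x
crsctl stop res ${vip}
crsctl start res ${vip}
alter diskgroup ${dg} dismount;
alter diskgroup ${dg} mount;
EOF
}
```

Usage would look like: print_vip_acl_fix admin.vip1.res oracle DATA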
Carry on
Martin