Recently I had an issue with a two-node Grid Infrastructure cluster on Windows Server 2012 R2. After an infrastructure-caused cluster restart (irresponsible SAN hardware patching 🙂), everything was running on Node 2, and Node 1 could not rejoin the cluster anymore.
There was no easy solution. At the CSSD level, everything looked fine: according to ocssd.log, both the network and the disk heartbeat worked. It turned out that the ASM instance on Node 1 started, but its LMON process could not communicate with the already running ASM instance on Node 2, ending with "Instance terminated by LMON". Its alert log contained no meaningful ORA error messages.
On the healthy Node 2, however, the ASM alert log showed:
ORA-27300: OS system dependent operation:IPC_TCPConnectCheck failed with status: -1
Judging from the function name (IPC_TCPConnectCheck), I suspected the network, and indeed, somebody had enabled the Windows Firewall on Node 1. Oddly, the error did not show up on the node causing it, but I was glad to have found the culprit.
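If the firewall is not the first suspect that comes to mind, a plain TCP connection test from the healthy node can point in the same direction. The following is a minimal Python sketch, not an Oracle tool: the function name tcp_connect_check is hypothetical, and since I would not rely on knowing which ports ASM uses for its interconnect traffic, it assumes you test a port you know is listening on the target node. It distinguishes a refused connection (host reachable, nothing listening) from a timeout, which is what a firewall that silently drops packets typically produces.

```python
import socket


def tcp_connect_check(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (hypothetical helper)."""
    try:
        # Complete the TCP handshake, then close the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target answered with a reset: reachable, but no process listens on that port.
        return "refused"
    except socket.timeout:
        # No answer at all: typical for a firewall that silently drops packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or unreachable network.
        return f"error: {exc}"
```

Usage on Node 2 would be something like tcp_connect_check("node1-priv", 1521), where "node1-priv" and 1521 are placeholders for Node 1's private interconnect address and a port known to be listening there. A result of "timeout" for a port that should be open is a strong hint to look at the firewall on the target node.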
How to check the Windows Firewall state:
netsh advfirewall show currentprofile
The command that always helps with annoying firewalls (but clarify it with your security team first):
netsh advfirewall set currentprofile state off
Lessons learned:
- People tend to introduce new problems while fixing others (in this case, by changing the Windows Firewall configuration while looking for a SAN problem), so DBAs, adapt your thinking accordingly.
- Obvious, but easy to forget: when diagnosing RAC / Clusterware issues, look at the logs on all nodes (or build a central ADR).
As usual, take care and think about the (other) box.
Martin