Once upon a time, there was an Oracle RAC 12.2.0.1 cluster on SUSE Linux Enterprise Server (SLES) 12 SP2 that did not do well. Its database and ASM instances kept failing with:
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable
ORA-27302: failure occurred at: skgpspawn3
Of course, in such a case you will check ulimit -u / limits.conf (nproc) and sysctl.conf (kernel.pid_max). But what if this does not help?
With SLES 12 SP2, SUSE introduced a new thread limit that uses the PID cgroup controller. Its default value is 512, which is obviously too small for any production RAC I have ever seen. If your ohasd cgroup exceeds it, you will get many of the unspecific ORA-27300 “fork failed with status: 11” errors shown above. And did you know what “11” is? It is errno EAGAIN, which means “Try again”. 🙂 Thank you. The only real hint is one tiny, single message in dmesg:
cgroup: fork rejected by pids controller in /system.slice/ohasd.service
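To see how close a service is to its limit before the forks start failing, you can read the counters that the pids controller exposes. The following is only a minimal sketch: it assumes a cgroup v1 layout with the controller mounted under /sys/fs/cgroup/pids, and the helper names read_pids_usage and describe are made up for this illustration.

```python
from pathlib import Path


def read_pids_usage(cgroup_dir):
    """Return (current, limit) for one pids-controller cgroup directory.

    limit is None when pids.max contains the literal string "max" (unlimited).
    """
    base = Path(cgroup_dir)
    current = int((base / "pids.current").read_text().strip())
    raw_limit = (base / "pids.max").read_text().strip()
    limit = None if raw_limit == "max" else int(raw_limit)
    return current, limit


def describe(current, limit):
    """Format the task count and, if a limit is set, the percentage in use."""
    if limit is None:
        return f"{current} tasks, no limit"
    percent = 100 * current // limit if limit else 100
    return f"{current} of {limit} tasks ({percent}% used)"


if __name__ == "__main__":
    # Assumed path: cgroup v1 mount point plus the cgroup named in the dmesg line.
    path = "/sys/fs/cgroup/pids/system.slice/ohasd.service"
    try:
        print(describe(*read_pids_usage(path)))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

If the mount point differs on your system, only the path in the last block needs to change.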
Next time you run Oracle on SLES 12 SP2, you may want to check /etc/systemd/system.conf, too. It has a parameter that sets the default maximum number of tasks (processes and threads) per cgroup:
#DefaultTasksMax=512
Since 512 is the default, the line is commented out. In my case, I wanted 65535, so now I am happy with:
DefaultTasksMax=65535
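A commented-out line is easy to overlook, so here is a small sketch that tells you whether DefaultTasksMax is actually set in the file. It is an illustration only: the function name effective_default_tasks_max is made up, and it looks at the main file alone, not at any drop-in directories that may override it.

```python
def effective_default_tasks_max(conf_text):
    """Return the active DefaultTasksMax value from system.conf text.

    Returns None when the setting is absent or only present as a comment,
    which means the built-in default applies.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == "DefaultTasksMax":
            value = val.strip()  # a later assignment overrides an earlier one
    return value


if __name__ == "__main__":
    path = "/etc/systemd/system.conf"
    try:
        with open(path) as handle:
            setting = effective_default_tasks_max(handle.read())
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        print(setting if setting is not None else "not set, built-in default applies")
```

As far as I know, systemd only picks up a change to this file after a reboot or a systemctl daemon-reexec, and services that are already running keep their old limit until they are restarted. To see what a unit really got, systemctl show ohasd.service -p TasksMax should print the value in effect.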
I hope you come across this blog post before spending as many hours on this pointless issue as I did. 🙂
Regards
Martin Klier
PS: Here is the link that says “Yes, we changed a detail in our system with SP2”:
https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12-SP2/#fate-320358
PS2: Thank you to Stefan Köhler and many other idea-dispensers who helped to narrow down the issue!