I am expecting something good to come my way soon. Why? I spent a few days trying to get a very simple cluster to work, and it consistently resisted all of my efforts, refusing to let the first node start the HA services and join the cluster. This is not an extraordinary cluster. There is nothing special about it: just two nodes and one resource group, and that is all.
But first, this is how the cluster was created.
# clmgr add cluster lawmsmpa1 \
    nodes=lawmsmpa1c1,lawmsmpa1c2 \
    type=NSC \
    heartbeat_type=unicast \
    repositories=hdisk2
hdisk2 has an identical PVID on both nodes, and its reserve_policy is set to no_reserve on both nodes.
# lsattr -El hdisk2 | grep reserve_policy
reserve_policy no_reserve
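The PVID half of that check can be scripted as well. The following is only a sketch: it assumes the output of lspv from each node has been saved to a file, and that the columns are in the usual order (disk name, PVID, volume group, state). The function names pvid_of and same_pvid are made up for this example.

```shell
# Sketch: extract the PVID of one disk from saved `lspv` output.
# Assumes the usual lspv column order: name, PVID, volume group, state.
pvid_of() {
    # $1 = disk name (for example hdisk2); lspv output is read from stdin
    awk -v disk="$1" '$1 == disk { print $2 }'
}

# Succeed only if the disk has the same, non-empty PVID in both listings.
# $1 = disk name, $2 and $3 = files holding lspv output from each node
same_pvid() {
    a=$(pvid_of "$1" < "$2")
    b=$(pvid_of "$1" < "$3")
    [ -n "$a" ] && [ "$a" != "none" ] && [ "$a" = "$b" ]
}
```

With the two listings saved under names of your choosing, something like `same_pvid hdisk2 lspv.node1 lspv.node2 && echo "PVID matches"` would confirm that both nodes see the same disk.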
Next, I defined the application controller aka the “entity” identifying the highly available application start and stop scripts.
# clmgr add application_controller lawmsmp \
    startscript=/usr/es/sbin/cluster/scripts/start_cluster.ksh \
    stopscript=/usr/es/sbin/cluster/scripts/stop_cluster.ksh
The service address was defined next.
# clmgr add service_ip 10.27.45.1 \
    netmask=255.255.255.0 \
    network=net_ether_01
Finally, the cluster resource group is defined as follows.
# clmgr add resource_group lawmsmpRG \
    nodes=lawmsmpa1c1,lawmsmpa1c2 \
    startup=OHN fallover=FNPN fallback=NFB \
    service_label=lawmsmpa1 \
    applications=lawmsmp \
    volume_group=lawson_vg \
    fs_before_ipaddr=true
At this point, the caavg_private volume group should be present on hdisk2 on both nodes. Well, this was not the case. It was present on the second node (lawmsmpa1c2) but not on the first one, which by the way is the primary node of this cluster (it was declared first while defining the cluster).
Rebooting both nodes did not help; the situation did not change. Every attempt to start cluster services failed on the primary node. The error message was always the same:
lawmsmpa1c1: rc.cluster: Error: CAA cluster services are not active on this node.
RSCT cluster services (cthags) are not active on this node
The clconfd service was not running.
# lssrc -g caa
Subsystem         Group            PID          Status
 clcomd           caa              13041908     active
 clconfd          caa                           failed
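If you want to spot this state without reading the table by eye, a small filter over the lssrc output is enough. This is a minimal sketch; inactive_subsystems is a hypothetical helper name, and it assumes the status is always the last column (the PID column is empty for a stopped subsystem, so a fixed field position would not work).

```shell
# Print every subsystem whose status (last column) is not "active".
# Reads `lssrc -g <group>` output from stdin and skips the header line.
inactive_subsystems() {
    awk 'NR > 1 && NF >= 2 && $NF != "active" { print $1 }'
}
```

On the failing node, `lssrc -g caa | inactive_subsystems` would print clconfd for the output shown above.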
Indeed, it did not work. The cluster services and the resource group could be started and brought online on the second node but not on the first one. Looking at the services file on lawmsmpa1c1, I noticed that one HA entry was missing. The missing entry was
Well, this could explain at least some of the reasons behind this disaster, but not all. The cluster sync following the update to the services file did not help. The repository was still mangled, and the caavg_private volume group was still present only on the second node. It is time for some scrubbing!
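As an aside, a missing services entry like this one is easy to catch by comparing the file from the working node against the one from the broken node. The sketch below assumes both files have been copied to one place; missing_services is a name invented for this example, and it compares entries by service name and port/protocol only.

```shell
# List entries present in a reference services file but absent from another.
# $1 = reference file (from the working node), $2 = file to check
missing_services() {
    awk '
        { sub(/#.*/, "") }                        # drop comments
        NF < 2 { next }                           # skip blank or malformed lines
        pass == 1 { have[$1 "/" $2] = 1; next }   # first pass: file to check
        !(($1 "/" $2) in have) { print $1, $2 }   # second pass: reference file
    ' pass=1 "$2" pass=2 "$1"
}
```

Each line of output is a service name and port/protocol that the reference file has and the checked file lacks.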
Executed on both cluster nodes:
# export CAA_FORCE_ENABLED=1
# rmcluster -f -r hdisk2
# lsattr -El cluster0
# rmdev -dl cluster0

# mkvg -f -y scrubvg hdisk2
# varyoffvg scrubvg

# importvg -f -y scrubvg hdisk2
# varyoffvg scrubvg
# exportvg scrubvg
# exportvg scrubvg
Yes, we validated it: the disk for the repository volume group is accessible from both nodes.
On both nodes
# shutdown -Fr
After the reboot, the next command, executed on both nodes, showed that the cluster definition was still present (do not be surprised, we only cleaned the repository disk!).
# odmget -q name=cluster0 CuAt
CuAt:
        name = "cluster0"
        attribute = "node_uuid"
        value = "749fb3f2-05f9-11e5-95f1-56bdbf443b02"
        type = "R"
        generic = "DU"
        rep = "s"
        nls_index = 3
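To compare what each node still remembers, the stanza can be boiled down to one line per attribute. This is a rough sketch that assumes the odmget stanza layout shown above (one `key = "value"` pair per line); cuat_pairs is a hypothetical helper name.

```shell
# Print "attribute=value" for each CuAt stanza read from stdin (odmget format).
cuat_pairs() {
    awk -F' = ' '
        NF == 2 {
            key = $1; val = $2
            gsub(/^[ \t]+/, "", key)    # strip leading indentation
            gsub(/"/, "", val)          # strip the surrounding quotes
            if (key == "attribute") attr = val
            if (key == "value") print attr "=" val
        }'
}
```

Running `odmget -q name=cluster0 CuAt | cuat_pairs` on each node gives short lines that are easy to compare side by side.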
I crossed my fingers and asked to synchronize the cluster which, if everything works as intended, will create the caavg_private volume group on both nodes, the way it was designed to be!
# clmgr sync cluster
It worked like a charm, and the cluster started as if nothing bad had ever happened. Thank God it is Friday! 🙂
By the way, PowerHA 7.1.3 has the following requirement:
CAAnodename = hostname = COMMUNICATION_PATH
hostname and COMMUNICATION_PATH are both in the short-name format, and the contents of /etc/hosts also follow this formula.
100.127.145.2 lawmsmpa1c1
100.127.145.3 lawmsmpa1c2
100.127.145.1 lawmsmpa1
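The short-name part of that requirement can be checked with a few lines of shell. This is a sketch only: check_node_name is a made-up helper, it verifies just that a name contains no dots and appears as a whole word in a hosts file, and it does not query CAA or the PowerHA configuration.

```shell
# Check that a node name is in short-name format and listed in a hosts file.
# $1 = node name, $2 = hosts file (defaults to /etc/hosts)
check_node_name() {
    name=$1
    hosts=${2:-/etc/hosts}
    case $name in
        *.*) echo "$name: not a short name"; return 1 ;;
    esac
    # Look for the name as a whole field after the address column.
    if awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }' "$hosts"
    then
        echo "$name: ok"
    else
        echo "$name: missing from $hosts"
        return 1
    fi
}
```

On each node, `check_node_name "$(hostname)"` would then flag a fully qualified hostname or a name that /etc/hosts does not carry.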