XenServer: Changing the pool master when the old pool master is not available


Scenario: a two-node XenServer pool with HA. One node is assigned the pool master role, and that is the node XenCenter, for example, connects to.

Now exactly that node goes down. It could have been the other one, but you know it never is.

I had configured HA and assigned which guests I want restarted after a failure. While the guests restarted fine on the other node, I was still unable to connect to the XenServer pool using XenCenter.

XenCenter keeps trying to connect to the IP of Node00, which is exactly the node that went down.
I could try the IP of the remaining pool member. But that would be too easy, and it resulted in the following message:

Server 'xxx-pool' is in a pool. To connect to a pool, you must connect to the pool master. Do you want to connect to the pool master '192.xxx.xxx.xx'?

That IP is, of course, down. I gave it some time, hoping it would sort out the pool master by itself, but it didn't. So apparently the pool master role does not move to one of the remaining pool members when the old pool master goes down.

I will need to do some research on why this is, or whether it is deliberate. It doesn't make much sense to me, since the pool master role is the control role for the whole pool.
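As a quick sanity check (a suggestion on my part, not strictly required), you can SSH into the surviving node and look at /etc/xensource/pool.conf; if it still says slave and points at the IP of the dead node, the master role indeed never moved:

cat /etc/xensource/pool.conf
slave:192.xxx.xxx.xx   # still pointing at the unreachable old pool master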

However, to recover from such a situation, use the command line of one of the remaining nodes.

1) Assign the pool master role to the node you have SSH'd into:

xe pool-emergency-transition-to-master

After a few seconds you should be able to connect XenCenter to that server and reclaim pool management.
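To verify that the transition actually took (an optional check), the node should now report itself as master:

cat /etc/xensource/pool.conf       # should now print "master"
xe pool-list params=name-label,master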

2) Some weird HA situation, where HA is blocking the transition:

xe host-emergency-ha-disable
xe-toolstack-restart
xe pool-emergency-transition-to-master

This should disable pool HA and reclaim the pool master role on that node.
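Once the toolstack is back up, a quick check (again optional) should show HA disabled and this node as master:

cat /etc/xensource/pool.conf       # should print "master"
xe pool-list params=ha-enabled     # should report ha-enabled as false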

3) Assign the pool master role to a particular node, from any pool member:

For this you need to be on a pool member node and disable HA first.

xe pool-ha-disable

Then list your configured XenServer hosts:

xe host-list
uuid ( RO) : f54a8c38-56f2-4dc4-878c-e8ed3fc36889
name-label ( RW): xenserver01.local
name-description ( RW): Default install of XenServer

uuid ( RO) : 3f6db05d-bef6-4f9f-a2f3-a01f1786f60d
name-label ( RW): xenserver00.local
name-description ( RW): Default install of XenServer
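If you only want the fields relevant here, you can narrow the output with the standard params= option of the xe CLI:

xe host-list params=uuid,name-label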

Using the example list above, select a new pool master by specifying the UUID of the chosen host:

xe pool-designate-new-master host-uuid=<uuid>
xe pool-designate-new-master host-uuid=f54a8c38-56f2-4dc4-878c-e8ed3fc36889

Now the new pool master should be active and you can connect XenCenter to its IP address.
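If you prefer to confirm the change from the CLI before reconnecting XenCenter, the pool's master field should now reference the UUID of the host you designated (f54a8c38-… in the example above):

xe pool-list params=master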

We can now re-enable HA for the pool.

xe pool-ha-enable
Lost connection to the server.

The “Lost connection to the server” message is normal, since the toolstack restarts and the pool master has been reassigned. You can now connect XenCenter to the IP of the new pool master.
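If you want to double-check that HA is active again, the pool's ha-enabled flag should now read true:

xe pool-list params=ha-enabled     # should report ha-enabled as true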

 

4) (Untested / Unverified) Recover after a split brain or pool outage:

I came across the following statement in a post.

“When High Availability is disabled and the pool master and one or more of the slave servers in the pool self-fence or lose power at the same time, the other servers nominate a new master.
After the old master server and the hosts start up again, the servers do not contain the updated details of the changed pool master server. These servers enter into an emergency mode and elect each other as the master server.
If the master server fails when it starts again, it cannot communicate to any other machines in the pool, it displays itself as a slave and points to itself.”

You can check in /etc/xensource/pool.conf which node is the master and which is a slave, and which master a slave points to. The following example would be correct.

[root@xenserver01 ~]# cat /etc/xensource/pool.conf
master
[root@xenserver00 ~]# cat /etc/xensource/pool.conf
slave:192.xxx.xxx.xxx   [the IP of my master xenserver01]

If you have two masters or only slaves, run the following commands on any of the remaining pool members or on the newly designated pool master.

xe host-emergency-ha-disable --force  # (This operation is extremely dangerous and may cause data loss. It must be forced with --force.)
xe-toolstack-restart
xe pool-emergency-transition-to-master

cat /etc/xensource/pool.conf
      master
xe pool-recover-slaves    # tell the other pool members that they are now slaves of this master.

That should recover your pool and make it manageable again.
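As a final, optional sanity check, every remaining member should now point to the new master and the master should see all hosts again:

cat /etc/xensource/pool.conf       # on each slave: slave:<IP of the new master>
xe host-list params=name-label     # on the master: should list every pool member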
