In a vSAN stretched cluster, the vSAN Witness might show as isolated. Running the command below confirms the isolation:
esxcli vsan cluster get
The witness is isolated if the output of the command contains:
Sub-Cluster Member Count: 1
Local Node State: STANDALONE
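When this check has to be repeated on several hosts, the command output can be scanned programmatically. The following is a minimal Python sketch, assuming the output consists of "Key: Value" lines like the two shown above. The function names and the sample text are illustrative and are not part of any VMware tool.

```python
def parse_cluster_get(output: str) -> dict:
    """Turn the 'Key: Value' lines of `esxcli vsan cluster get` into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that have no colon, such as section titles
            fields[key.strip()] = value.strip()
    return fields


def looks_isolated(output: str) -> bool:
    """Return True when the node reports itself as the only cluster member."""
    fields = parse_cluster_get(output)
    return fields.get("Sub-Cluster Member Count") == "1"


# Illustrative sample built from the two lines quoted in this article.
SAMPLE = """\
Sub-Cluster Member Count: 1
Local Node State: STANDALONE
"""

if __name__ == "__main__":
    state = parse_cluster_get(SAMPLE).get("Local Node State")
    print("isolated:", looks_isolated(SAMPLE), "state:", state)
```

The sketch keys on the member count only and reports the node state alongside it, so the state value can be reviewed by hand.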
In a vSAN Stretched Cluster, the Witness plays an important role: it holds the witness components of the vSAN objects. To ensure proper TCP/IP communication between the data hosts and the Witness, these requirements exist:
- Round-Trip Time (RTT) latency between the Witness and the ESXi hosts must be < 500 ms.
- Full-size frames must pass between the ESXi hosts and the Witness. With an MTU of 1500, test with a ping payload of 1472 bytes (1500 minus 28 bytes of IP and ICMP headers).
- To verify that the full payload can be sent, run this command from one of the ESXi hosts, replacing VSANvmknic and WitnessIP with your own values: vmkping -I VSANvmknic WitnessIP -s 1472 -d -c 20
If the ping fails, something on the network is not allowing the full payload to travel between the ESXi host and the Witness host.
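The 1472-byte figure comes from subtracting the 20-byte IPv4 header and the 8-byte ICMP header from the MTU. The sketch below does that arithmetic for any MTU and assembles the matching vmkping command line. The helper names, the interface name vmk1, and the address 192.0.2.10 are placeholders chosen for illustration.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    overhead = IPV4_HEADER_BYTES + ICMP_HEADER_BYTES
    if mtu <= overhead:
        raise ValueError(f"MTU {mtu} leaves no room for a payload")
    return mtu - overhead


def vmkping_command(vmknic: str, witness_ip: str, mtu: int = 1500, count: int = 20) -> str:
    """Build the vmkping test from this article for a given MTU."""
    size = max_ping_payload(mtu)
    # -d sets the don't-fragment bit, so an undersized path MTU makes the ping fail.
    return f"vmkping -I {vmknic} {witness_ip} -s {size} -d -c {count}"


if __name__ == "__main__":
    print(vmkping_command("vmk1", "192.0.2.10"))            # MTU 1500
    print(vmkping_command("vmk1", "192.0.2.10", mtu=9000))  # jumbo frames
```

With jumbo frames (MTU 9000), the same arithmetic gives a payload of 8972 bytes.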
What if ping works over the vSAN witness interface to the data nodes and routing is set up correctly, but the witness node is still isolated from the vSAN cluster?
In that case, there are two main things to check. First, check the unicast agent table with the command below:
esxcli vsan cluster unicastagent list
The witness should appear with the value 1 in the "IsWitness" column. If it is not there, add it manually with the command:
esxcli vsan cluster unicastagent add -t witness -u <local_UUID> -U true -a <vSAN IP address> -p 12321
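Checking the table by eye is easy to get wrong on larger clusters. The sketch below looks for a witness row in the output of the list command. It assumes a whitespace-separated table whose first four columns are the node UUID, the witness flag, the unicast flag, and the IP address. That column layout, the sample rows, and the function name are assumptions for illustration, so compare them against the real output on your build before relying on this.

```python
def witness_entries(output: str) -> list:
    """Return (node_uuid, ip_address) for every row flagged as a witness."""
    found = []
    for line in output.splitlines():
        tokens = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(tokens) < 4 or tokens[0] == "NodeUuid" or set(tokens[0]) == {"-"}:
            continue
        node_uuid, is_witness, _supports_unicast, ip_address = tokens[:4]
        if is_witness == "1":
            found.append((node_uuid, ip_address))
    return found


# Made-up sample in the ASSUMED column layout; UUIDs and IPs are placeholders.
SAMPLE = """\
NodeUuid                              IsWitness  Supports Unicast  IP Address  Port
------------------------------------  ---------  ----------------  ----------  -----
00000000-0000-0000-0000-000000000001          0              true  192.0.2.11  12321
00000000-0000-0000-0000-000000000099          1              true  192.0.2.99  12321
"""

if __name__ == "__main__":
    entries = witness_entries(SAMPLE)
    print(entries if entries else "no witness entry found")
```

An empty result is the case where the witness has to be added manually.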
Second, if the vSAN witness is present in the unicast table but is still isolated from the vSAN cluster, check the UDP traffic. To do that, capture packets on the witness with the command below:
tcpdump-uw -i <vmk witness interface> port 12321
The capture should show the IP addresses of the data nodes' witness VMkernel interfaces. If they are missing, UDP is being blocked at the switch level or by a firewall, or the witness traffic is being NATed. vSAN does not support NAT, and it needs UDP to function.
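To see quickly which data nodes never show up in a capture, the tcpdump text can be compared against the list of expected addresses. This is a minimal sketch, assuming the capture prints numeric addresses in the usual "IP src.port > dst.port:" form. The sample lines, the addresses, and the function name are made up for illustration.

```python
import re

# Matches the "IP <src>.<port> > <dst>.<port>:" part of a tcpdump line.
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def silent_peers(capture: str, expected_ips) -> set:
    """Return the expected source IPs that never appear as a packet source."""
    seen_sources = set()
    for line in capture.splitlines():
        match = PACKET_RE.search(line)
        if match:
            seen_sources.add(match.group(1))
    return set(expected_ips) - seen_sources


# Illustrative capture text; timestamps, addresses, and lengths are placeholders.
SAMPLE = """\
12:00:01.000000 IP 192.0.2.11.12321 > 192.0.2.99.12321: UDP, length 200
12:00:02.000000 IP 192.0.2.99.12321 > 192.0.2.12.12321: UDP, length 200
"""

if __name__ == "__main__":
    missing = silent_peers(SAMPLE, ["192.0.2.11", "192.0.2.12"])
    print("no inbound traffic from:", sorted(missing))
```

Only source addresses are counted, so a data node that the witness sends to but never hears from is still reported as missing.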