Monday, February 15, 2021

ESX Hosts disconnected from vCenter, Unable to connect them back.


I had a couple of hosts disconnected from vCenter and could not reconnect them.

The first step is to restart the management agents.

 A. Restart the management agents using ESXi Shell or Secure Shell (SSH):

  1. Log in to ESXi Shell or SSH as root.

    To enable ESXi Shell or SSH, see VMware KB 2004746, "Using ESXi Shell in ESXi 5.x and 6.x".
     
  2. Restart the ESXi host daemon and vCenter Agent services using these commands:

    /etc/init.d/hostd restart

    /etc/init.d/vpxa restart
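
The two restarts above can be wrapped in a small helper that runs them in order and reports which one failed. This is only a sketch: the function name restart_agents and the INIT_DIR variable are made up for illustration, and it relies on each init script returning a non-zero exit code on failure.

```shell
#!/bin/sh
# Hypothetical helper: restart hostd, then vpxa, and report each result.
# INIT_DIR is an illustrative override; on an ESXi host it defaults to /etc/init.d.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

restart_agents() {
    failed=0
    for svc in hostd vpxa; do
        # Run the same command as the manual step, e.g. /etc/init.d/hostd restart
        if "$INIT_DIR/$svc" restart; then
            echo "$svc: restart OK"
        else
            echo "$svc: restart FAILED" >&2
            failed=1
        fi
    done
    # Non-zero if any of the restarts failed.
    return "$failed"
}
```

After pasting the function into the SSH session, run restart_agents and check the printed result for each agent.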

B. Restart all management agents on the host (note the cautions below):

  • Run the command:

    services.sh restart


Caution:

  • If LACP is enabled and configured, do not restart the management services with the services.sh command. Instead, restart the individual services with the /etc/init.d/module restart command, where module is the name of the service.
  • If the issue is not resolved and you need to restart all the services that are part of the services.sh script, schedule downtime before running the script.
  • If NSX is configured in the environment, do not run the /sbin/services.sh restart command, because it restarts all services on the ESXi host. If you need to restart the management agents on the ESXi host, restart vpxa, hostd, and fdm individually. If restarting each management agent does not work and you still need to run /sbin/services.sh restart, first migrate all the VMs off the ESXi host and put the host in maintenance mode if possible.

 

If restarting the hostd and vpxa services does not work (as in my case), you are on ESXi 6.7 U1, and esxcli commands run over SSH fail with a "connection refused" error, refer to this KB:

https://kb.vmware.com/s/article/78124

Also run this command to check if there is an issue with libcimsvc:

less /var/log/vobd.log

If you see this entry in vobd.log over and over:

[UserWorldCorrelator] 995067852232us: [vob.uw.core.dumped] /bin/sfcbd(5856860) /var/core/sfcb-intelcim-zdump.000

then libcimsvc is failing even though the hostd service reports that it is running. Perform the workaround from the VMware KB above.
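
Instead of paging through the log by eye, the repeated entries can be counted with grep. A minimal sketch, assuming the entries always contain "vob.uw.core.dumped" followed by an sfcb path as in the sample line; the function name count_sfcb_dumps is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count sfcb core-dump events in a vobd log.
# Takes an optional log path; defaults to /var/log/vobd.log.
count_sfcb_dumps() {
    log="${1:-/var/log/vobd.log}"
    # grep -c prints the number of matching lines. It exits with 1 when
    # nothing matches, which is not an error here, so treat 1 as success.
    grep -c 'vob\.uw\.core\.dumped.*sfcb' "$log" || [ "$?" -eq 1 ]
}
```

Running count_sfcb_dumps with no argument prints the number of matching entries in /var/log/vobd.log. A count that keeps growing between runs points to the repeated crash described here.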

To apply the workaround, follow these steps:

1. /etc/init.d/hostd stop
2. Edit /etc/vmware/hostd/config.xml

Find the block:
     <cimsvc>
        <path>libcimsvc.so</path>
        <enabled>true</enabled>
     </cimsvc>

and set it to: <enabled>false</enabled>

3. Save the file
4. /etc/init.d/hostd start
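
The edit in step 2 can also be done without an editor. The sketch below is an assumption-laden example, not the KB's method: disable_cimsvc is a made-up name, and it assumes <enabled>true</enabled> sits on a single line inside the <cimsvc> block. It copies the original file to config.xml.bak first and only touches lines between <cimsvc> and </cimsvc>.

```shell
#!/bin/sh
# Hypothetical helper: set <enabled>false</enabled> inside the <cimsvc> block only.
# Takes an optional config path; defaults to /etc/vmware/hostd/config.xml.
disable_cimsvc() {
    cfg="${1:-/etc/vmware/hostd/config.xml}"
    # Keep a copy of the original next to it with a .bak suffix.
    cp "$cfg" "$cfg.bak" || return 1
    # The address range restricts the substitution to the <cimsvc> block,
    # so other <enabled> elements in the file are left alone.
    sed '/<cimsvc>/,/<\/cimsvc>/ s|<enabled>true</enabled>|<enabled>false</enabled>|' \
        "$cfg.bak" > "$cfg"
}
```

Run it between the hostd stop and hostd start steps, then compare config.xml with config.xml.bak to confirm that only the one line changed.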

 

-----------------------------------------------------------------------------------------------------------------------------------------------

 

Here are other known bugs that can cause hosts to disconnect from vCenter:

 

https://kb.vmware.com/s/article/70597 : 

ESXi 6.x host is disconnected from vCenter Server due to dcism exhausting inodes 

https://kb.vmware.com/s/article/74966 : 

ESXi 6.5/6.7 hangs during certain tasks like maintenance mode, connecting to vCenter, or after a reboot.

https://kb.vmware.com/s/article/67920 :

Multiple attempts to log in to an ESXi host with incorrect credentials might cause the hostd service to stop responding (CVE-2019-5528)