In this blog post, we will take a look at an issue I encountered during a VCSA migration and PSC cleanup (see other blogpost).
During the reboot of a migrated vCenter appliance, we got the error "400 failed to connect to VMware Lookup service" when connecting through the browser.
But first, let me give you some context on how we got here in the first place.
I was tasked with migrating multiple VCSAs with external PSCs from 6.7 to 7, as well as separating some workloads, which meant deploying an additional VCSA in the SSO domain. We encountered the issue when we joined this new VCSA to the SSO domain.
The join completed successfully, but after a clean reboot we received the error 400: failed to connect to VMware Lookup service.
I quickly checked whether the VAMI interface (https://VCSA:5480) was online. It was, and the health was OK, but the SSO domain showed Status: unknown.
So, the first thing I checked was DNS and NTP. Perhaps I had made a typo during the deployment that resulted in some strange behavior during boot.
But a quick check showed me that both settings were correct. The VCSA had the correct time and was able to do forward and reverse DNS lookups.
Continuing the troubleshooting, we opened an SSH session to the vCenter.
A quick check of the running services indicated that only the following services had started successfully during boot.
root@VCSA [ /var/log/vmware/vmdird ]# service-control --status --all
Running:
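To see at a glance which services did not come up, the status output can be filtered down to the stopped ones. The following is a minimal sketch, not an official tool: list_stopped_services is a hypothetical helper name, and it assumes the output consists of header lines such as "Running:" and "Stopped:", each followed by space-separated service names.

```shell
#!/bin/sh
# Hypothetical helper: print the services listed under the "Stopped:"
# header of `service-control --status --all`, one name per line.
# Assumption: headers look like "Running:" / "Stopped:" on their own
# line, followed by lines of space-separated service names.
list_stopped_services() {
    awk '
        # A header line: remember whether we just entered the Stopped section.
        /^[A-Za-z]+:[[:space:]]*$/ { in_stopped = ($1 == "Stopped:"); next }
        # Inside the Stopped section: emit every service name on the line.
        in_stopped { for (i = 1; i <= NF; i++) print $i }
    '
}

# On the appliance (not run here):
#   service-control --status --all | list_stopped_services
```

Piping the real command into the helper keeps the check read-only; it only reformats what service-control already reports.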
A quick Google search showed me that the next service to start was the vmdird service (thanks to David Pasek and his blog article). According to that article, the services start in the following order:
- lwsmd (Likewise Service Manager)
- vmafdd (VMware Authentication Framework)
- vmdird (VMware Directory Service)
- vmcad (VMware Certificate Service)
- vmware-sts-idmd (VMware Identity Management Service)
- vmware-stsd (VMware Security Token Service)
- vmdnsd (VMware Domain Name Service)
- vmware-psc-client (VMware Platform Services Controller Client)
- vmon (VMware Service Lifecycle Manager)
So, the next thing to try was getting the vmdird service started, or at least getting a glimpse of why it was failing to start.
Here, I tried multiple commands, service-control --start --all and service-control --start vmdird, but both gave me the same error:
2021-02-23T08:58:42.857Z {
The error did not really give me any indication of the cause, but it referred to systemctl and journalctl.
The first just shows the systemd status of that service, so it was not really helpful.
The second, journalctl, shows the log that captures the messages produced by the kernel, services, etc.
So after a quick look here, we found the issue:
Here the log referred to an old PSC entry that the customer had removed some time ago. So the entry was indeed unavailable as it was long gone and deleted from the environment.
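If you suspect a similar stale entry, one way to narrow down the journal is to search it for the host name of the removed PSC. This is only a sketch under assumptions: find_host_mentions is a hypothetical helper, vmdird.service is assumed to be the systemd unit name, and old-psc.example.local is a placeholder FQDN.

```shell
#!/bin/sh
# Hypothetical helper: print only the log lines that mention a given host.
# Reads log text on stdin. The match is case-insensitive (-i) and literal
# (-F), so the dots in an FQDN are not treated as regex wildcards.
find_host_mentions() {
    grep -i -F -- "$1"
}

# On the appliance (not run here; unit name and FQDN are assumptions):
#   journalctl -u vmdird.service --no-pager | find_host_mentions old-psc.example.local
```

The helper exits non-zero when nothing matches, which makes it easy to use in a quick if-check as well.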
Solution
With the issue identified, we had to somehow trick the VCSA into skipping the LDAP communication.
Thanks to GSS engineer Michael O’Sullivan, we learned that we had to disconnect the NIC from the VCSA VM and restart the services once more (temporary solution).
Success! The VCSA was now able to boot in an offline mode. All services started successfully and the web interface came up without any further issues.
After the boot, we reconnected the NIC of the VCSA, and Enhanced Linked Mode between the VCSAs worked again.
Of course, if we rebooted the vCenter again, we would face the same issue for as long as the stale PSC entry remains in the SSO domain. If you would like to know how to resolve the root cause, head over to my blogpost: Resolving stale PSC entries from your vSphere environment
Thanks for reading!