When decommissioning or modifying SSO domains, it is easy to skip the correct procedure and end up with stale PSC entries in your vSphere environment.
In this blog, we will take a look at how to identify and resolve these stale entries.
NOTE
This guide is not a quick fix. The commands we are running can have a massive impact on the operation of your vSphere environment if they are not performed correctly.
If you are unsure and have stale entry issues, please open a VMware support ticket with GSS.
Intro – how does this even happen?
Let’s begin with what the PSC stale entry looks like and how it even gets there in the first place.
The most common cause of stale PSC or vCenter objects is an improperly decommissioned SSO domain member.
VMware has published a KB article on how to properly remove a member from the SSO domain using the cmsso-util command.
The following command needs to be run on a remaining member:
cmsso-util unregister --node-pnid PSC_FQDN_YOU_WANT_TO_REMOVE --username administrator@your_domain_name --passwd vCenter_Single_Sign_On_password
If this command isn’t used during decommissioning and the vCenter or PSC VM is simply powered off and removed, all entries remain in the SSO domain database.
This is likely to cause issues when you later change your vSphere environment (upgrades, consolidations, expansions, ...).
Identifying stale entries in your environment
With the explanation of how this can happen done, let’s continue and start identifying stale entries in our vSphere environment.
There are two easy ways to validate the current health and topology of the environment.
Open your vCenter Server and navigate to Menu -> Administration -> System Configuration. On the right, click Topology and a nice diagram of all the members is shown.
From the CLI standpoint, we can do the same but it’s spread across multiple commands. The first command is getting a list of all known members of the environment. This will also give you an indication if there are any stale entries present.
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost -u administrator -w PASSWORD
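When the member list is long, it can help to reduce the output to bare host names. The following is a minimal sketch, assuming each line of the showservers output is a DN of the form cn=<FQDN>,cn=Servers,... (verify this against your own output first). The function name list_sso_members is hypothetical and not part of any VMware tool.

```shell
#!/bin/bash
# Hypothetical helper: reads `vdcrepadmin -f showservers` output on stdin
# and prints one host name per line.
# Assumption: each line is a DN such as
#   cn=host.example.com,cn=Servers,cn=Default-First-Site,cn=Sites,cn=Configuration,dc=vsphere,dc=local
list_sso_members() {
  # Print only lines that start with "cn=", reduced to the value of the first RDN.
  sed -n 's/^cn=\([^,]*\),.*$/\1/p'
}

# Example usage on a vCenter / PSC node (not executed here):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showservers -h localhost \
#     -u administrator -w PASSWORD | list_sso_members
```

Any host name in the result that belongs to a machine you have already powered off and removed is a candidate stale entry.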
Next, the following command lists the replication partners of the node on which you have opened the SSH session.
This lets you quickly identify which member still has a replication partner referring to the stale entry, or whether no member refers to it anymore and it is simply left over in the DB.
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u administrator -w PASSWORD
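To check whether a node still lists the stale member as a partner, the output can be piped through a small test. This is only a sketch, assuming the partners are printed one per line as ldap://<FQDN>. The function name partner_refers_to is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: exits 0 if the `vdcrepadmin -f showpartners` output
# on stdin contains the given host name as a partner, ignoring case.
# Assumption: partners are printed one per line, e.g. ldap://host.example.com
partner_refers_to() {
  local stale_host="$1"
  # Strip the ldap:// prefix, then match the whole line (-x) as a
  # fixed string (-F), case-insensitively (-i), without output (-q).
  sed 's|^ldap://||' | grep -qixF -- "$stale_host"
}

# Example usage on a vCenter / PSC node (not executed here):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost \
#     -u administrator -w PASSWORD | partner_refers_to STALE_FQDN \
#     && echo "this node still replicates with the stale entry"
```

Run it on every remaining member in turn, since each node only reports its own partners.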
Finally, the last command shows the replication status between the node and its replication partners.
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w PASSWORD
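The status output can also be filtered so that only problematic partners remain. The sketch below assumes the output consists of blocks with a "Partner:" line, a "Host available:" line and a "Partner is N changes behind." line. Treat that format as an assumption and compare it with what your own nodes print. The function name flag_unhealthy_partners is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `vdcrepadmin -f showpartnerstatus` output on
# stdin and prints one line per partner that looks unhealthy.
# Assumed output format per partner:
#   Partner: host.example.com
#   Host available:   Yes
#   Status available: Yes
#   My last change number:             1234
#   Partner has seen my change number: 1234
#   Partner is 0 changes behind.
flag_unhealthy_partners() {
  awk '
    # Remember which partner the following lines belong to.
    /^Partner:/ { partner = $2 }
    # Report partners whose host is not reachable.
    /^Host available:/ { if ($3 != "Yes") print partner ": host not available" }
    # Report partners that lag behind in replication.
    /^Partner is [0-9]+ changes behind/ { if ($3 + 0 > 0) print partner ": " $3 " changes behind" }
  '
}

# Example usage on a vCenter / PSC node (not executed here):
#   /usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost \
#     -u administrator -w PASSWORD | flag_unhealthy_partners
```

A partner that is reported as not available and that matches a decommissioned machine is a strong hint that you are looking at a stale entry.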
Based on the output of these commands, we can validate whether we have a stale entry in the environment.
Removing stale entries
The following screenshots were taken during a troubleshooting session at a customer. I have therefore anonymized the VCSA / PSC names in the screenshots.
I have changed them into the vCenter names that I use in my personal lab.
Make sure you have created a file-based backup of the vCenter servers, and take snapshots of all vCenter servers while they are powered off (all of them at the same time).
With the stale entry CD-VVCSA02 identified, we will now try to remove it using the correct procedure given in the VMware KB article.
The cmsso-util unregister command will be used to attempt to remove the stale entry from the SSO database.
cmsso-util unregister --node-pnid PSC_FQDN_YOU_WANT_TO_REMOVE --username administrator@your_domain_name --passwd vCenter_Single_Sign_On_password
The command will give the following output if the command ran successfully:
root@CD-VVCSA01 [ ~ ]# cmsso-util unregister --node-pnid cd-VVCSA02.agisko.int --username administrator@cloudduo.local --passwd PASSWORD
Solution users, computer account and service endpoints will be unregistered
2021-02-03T13:38:22.588Z Running command: ['/usr/lib/vmware-vmafd/bin/dir-cli', 'service', 'list', '--login', 'administrator@cloudduo.local']
2021-02-03T13:38:22.620Z Done running command
Stopping all the services …
The vCenter services will restart and the stale entry is fully removed from the environment.
Of course, I wouldn’t be writing a blog post about a command that is already well documented in a KB article.
The command failed in my case and gave the following output:
Note: I have replaced all FQDN entries with shortened entries from my homelab. Make sure to use the full FQDN when using cmsso-util.
In the command output, we can clearly see that the reference to the stale CD-VVCSA02 object is missing. The command gives a “No such object” error.
Because the command automatically stops and restarts all services, you can run into an issue where the vCenter service does not start correctly.
You can find more info on how to resolve this in the following blog post: vCenter error 400 failed to connect to VMware Lookup service
Removing stale entries – the more aggressive part
Following the official procedure did not solve our stale entry issue.
Therefore we will use the underlying command that cmsso-util uses, to hopefully force-remove the stale entry.
/usr/lib/vmware-vmdir/bin/vdcleavefed -h PSC_FQDN_YOU_WANT_TO_REMOVE -u administrator -w PASSWORD
Unfortunately, this command did not work for our issue either. The stale entry simply could not be removed by any command.
Removing stale entries – the unsupported part
As a final solution, I called in the help of my GSS colleagues to solve the issue once and for all.
They attempted the same steps as mentioned above but came to the same conclusion as me: we had to manually remove the stale entry from the database.
If you have come this far, please open a ticket with GSS, as the next steps can result in fully corrupting your vSphere environment. You have been warned!
To connect to the vCenter database, you need to download JXplorer: http://jxplorer.org/
Use the following settings:
Host: FQDN
Port: 389
Protocol: LDAP v3
Level: User + Password
User DN:
cn=administrator,cn=users,dc=vsphere,dc=local
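The User DN above is written for an SSO domain named vsphere.local. If your SSO domain has a different name, the dc= components have to change with it. As a rough sketch, assuming the directory follows the same layout as shown above, the DNs can be derived from the domain name like this. The function names sso_base_dn and sso_admin_dn are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: derive LDAP DNs from an SSO domain name.
# Assumption: the directory uses the layout shown in this post,
# i.e. the domain vsphere.local maps to dc=vsphere,dc=local.
sso_base_dn() {
  # Turn "vsphere.local" into "dc=vsphere,dc=local".
  printf 'dc=%s\n' "$1" | sed 's/\./,dc=/g'
}

sso_admin_dn() {
  # Prefix the base DN with the administrator account under cn=users.
  printf 'cn=administrator,cn=users,%s\n' "$(sso_base_dn "$1")"
}

# Example usage:
#   sso_admin_dn vsphere.local
```

These helpers only build strings and do not touch the directory, so they are safe to run anywhere.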
After connecting to the database, open the configuration tab on the right.
We can clearly see that our stale entry CD-VVCSA02 is present in the database. With our offline snapshots in place, we can now try to remove the stale entry manually.
Open the entry and delete the replication partner entries first. If this does not give any error, we can remove the entry itself as well.
The change (the delete) is replicated instantly across the replication partners.
This should resolve our stale PSC entries issue in our vSphere environment. In our case, we were able to fully start our newly deployed vCenter.
I hope this blog has helped you in any way! Like, share, or comment if you have any feedback!
Nice article, thank you so much
Thank you very much
No worries, I’m happy that my blogpost was able to help you!
Nice!!
One thing to add: my stale object didn’t go away until I deleted it from the locations listed below.
1. World > local > vsphere > Configuration > Sites > Default-First-Site > Servers > *
2. World > local > vsphere > Domain Controllers > *
Thanks for the write up!!!
Hi Michael,
Great to know! Thanks for letting me know. 🙂