While upgrading to a recent version of the NVIDIA host driver, I ran into an error I hadn't seen before.
Google wasn't much help, so I decided to write a quick blog post about it myself.
EDIT: the fix is mentioned in the NVIDIA user guide, but I will keep this post up because the error message itself is so vague.
I was upgrading a VDI cluster from 7.0.2 to 8.0 and everything had gone well. The only remaining step was upgrading the NVIDIA VIBs.
The customer was still running an older version of GRID (the previous LTSB), so we wanted to upgrade the hypervisor drivers to 16.x to move to the latest LTSB.
Every time I ran the remove command, I received the following error:
    [root@esx1:~] esxcli software vib remove --maintenance-mode -n NVIDIA-VMware_Esxi_6.7_Host_Driver
     [LiveInstallationError]
     Error in running rm /tardisks/nvidia_v.v00:
     Return code: 1
     output: rm: can't remove '/tardisks/nvidia_v.v00': Device or resource busy
     It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
          cause = Error in running rm /tardisks/nvidia_v.v00:
     Return code: 1
     output: rm: can't remove '/tardisks/nvidia_v.v00': Device or resource busy
     It is not safe to continue. Please reboot the host immediately to discard the unfinished update.
     Please refer to the log file for more details.
    [root@esx1:~]
So, after reading the error message, I rebooted the server and tried the removal again... but got the same error.
TLDR
After quite a while of searching for what might have caused the issue, I looked at all the running services on the ESXi host and saw it!
With the newer versions of the NVIDIA driver, you have to stop the nvdGpuMgmtDaemon service before you can remove the VIB.
    [root@esx1:~] /etc/init.d/nvdGpuMgmtDaemon stop
    [root@esx1:~]
After stopping this service, the removal succeeds and you can install the newer NVIDIA VIB.
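Put together, the whole swap could be sketched as a small shell script. This is only a minimal sketch under a few assumptions, not something taken from the NVIDIA guide: the VIB name is copied from my output above (check yours with `esxcli software vib list`), the depot path is a hypothetical placeholder, and installing the new driver from an offline bundle with `esxcli software vib install -d` is my assumption about how the new package is delivered. The `run` helper and the `DRY_RUN` switch are made up for this example, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Sketch: stop the NVIDIA GPU management daemon, then swap the host driver VIB.
# DRY_RUN defaults to 1, so nothing is executed unless you set DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# VIB name as it appears in this post; confirm the exact name on your host
# with `esxcli software vib list` before running for real.
OLD_VIB="${OLD_VIB:-NVIDIA-VMware_Esxi_6.7_Host_Driver}"

# Hypothetical location of the new driver bundle (assumption, replace it).
NEW_DEPOT="${NEW_DEPOT:-/vmfs/volumes/datastore1/nvidia-host-driver.zip}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_nvidia_vib() {
    # 1. Stop the daemon that keeps the tardisk busy.
    run /etc/init.d/nvdGpuMgmtDaemon stop || return 1
    # 2. Remove the old VIB (same command as in the post).
    run esxcli software vib remove --maintenance-mode -n "$OLD_VIB" || return 1
    # 3. Install the new driver from the offline bundle.
    run esxcli software vib install -d "$NEW_DEPOT" || return 1
}

upgrade_nvidia_vib
```

Run it once as-is to review the three printed commands, then rerun it with `DRY_RUN=0` on the host once the names and paths match your environment.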
I hope this small but to-the-point blog helps you!
More blogs on NVIDIA: NVIDIA vGPU less framebuffer available