Lately at $employer we have been planning an upgrade of VMware vCenter from version 5.5, installed on a Windows server, to version 6.0u1 running as the vCenter Server Appliance (VCSA).
Fortunately, plenty of smart people have been in the same situation as us, and some of them have developed an unofficial migration tool (the VCS to VCVA Converter fling) to move from Windows vCenter 5.5 to the corresponding appliance version. After that migration, the idea is to run the usual upgrade from VCSA 5.5 to VCSA 6.0u1, which in theory should complete without any issues.
Sadly, in our test lab we ran into a few issues with this process. Since some of the remediation steps don’t appear to have been published, I decided to document the issues and their resolutions here in the hope that others will find them helpful.
The first issue in the migration process was the memory allocated to the migration fling appliance. The OVA deployment configures the VM with 4GB of RAM, but I found this to be insufficient for the 4.8GB MSSQL database attached to the Windows vCenter. Increasing the RAM to 10GB allowed the database migration, along with event history, to proceed in a reasonable timeframe, although it still took a couple of hours.
The next issue in our lab, this time with the upgrade from VCSA 5.5 to VCSA 6, related to SSLv3 being disabled in vCenter 5.5u3b. Disabling SSLv3 is obviously a good thing from a security point of view, but parts of the upgrade process expect SSLv3 to work. A bit of searching turned up VMware KB 2139396. Follow the steps listed under “Security Token Service (sts) – Port 7444” to remove the XML key ‘sslEnabledProtocols’ and its associated value, then restart the vmware-stsd service.
Re-enabling SSLv3 allowed the VCSA 6 upgrade to commence, but we were met with the dreaded “failed to start services. Firstboot error” message displayed in the DCUI of the resulting VCSA 6 appliance. VMware support suggested that a “tiny” VCSA 6 deployment with 8GB RAM was too small, and that 16GB might address the issue. However, the issue persisted after deploying a “small” VCSA 6 with 16GB RAM.
Further analysis of the support bundle pointed towards a problem with the database migration to the new appliance. The details of exactly what went wrong were in the log file /var/log/vmware/vpxd/vcdb_import.err on the VCSA 6 appliance. This file contained the following entry:
psql.bin:/storage/seat/cis-export-folder/vcdb/create_constr.sql:914: ERROR: insert or update on table "vpx_event_arg" violates foreign key constraint "fk_vpx_event_arg_ref_event"
DETAIL: Key (event_id)=(1826342) is not present in table "vpx_event".
VMware support’s solution for this was to truncate the vpx_event and vpx_event_arg tables on the VCSA 5.5 appliance and re-run the VCSA 6 upgrade. This wasn’t something we wanted to do, since in our real production environment we wanted to keep this historical data. I’d never used PostgreSQL before, but after some reading I came up with the following solution.
The error log indicated that a specific entry in the database was causing the migration to fail, so I decided to take a look in the database on the VCSA 5.5 appliance. To access this database, use the following command:
/opt/vmware/vpostgres/1.0/bin/psql -d VCDB vc
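As a quick sanity check before hunting for every bad row, the single event_id named in the import error can be looked up directly. This is only a sketch: it assumes the event_id column exists in both tables (as the error message implies) and uses the value 1826342 from the log entry above, which will differ in other environments.

```sql
-- Sketch: 1826342 is the event_id reported in vcdb_import.err above.
-- Does the parent event exist? The import error suggests it does not.
SELECT count(*) FROM vpx_event WHERE event_id = 1826342;

-- How many argument rows still reference that event?
SELECT count(*) FROM vpx_event_arg WHERE event_id = 1826342;
```

If the first count is 0 and the second is greater than 0, the row is an orphan of the kind the foreign key constraint rejects.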
Once there, I could check which entries were present in vpx_event_arg but not present in vpx_event. For the sake of completeness I decided to check for rows in the opposite condition too: present in vpx_event but not in vpx_event_arg. This last step may or may not be required, but it’s what I did to perform the upgrade successfully, so I’ll document it here anyway.
Running the following query indicated that 27 rows (out of approximately 1.8 million) in the vpx_event_arg table referenced an event that no longer existed in vpx_event.
SELECT vpx_event_arg.event_id, vpx_event_arg.vm_id, vpx_event_arg.host_id FROM vpx_event_arg WHERE NOT EXISTS (SELECT vpx_event.event_id FROM vpx_event WHERE vpx_event.event_id = vpx_event_arg.event_id);
Changing the SELECT to a DELETE got rid of the offending rows. Repeating the process in reverse, looking for events in vpx_event with no corresponding rows in vpx_event_arg, identified a further 701 rows.
SELECT vpx_event.event_id, vpx_event.event_type, vpx_event.vm_name, vpx_event.host_name FROM vpx_event WHERE NOT EXISTS (SELECT vpx_event_arg.event_id FROM vpx_event_arg WHERE vpx_event.event_id = vpx_event_arg.event_id);
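For reference, the DELETE forms of the two checks might look something like the sketch below. Wrapping them in a transaction is my own addition here rather than something the original procedure required. It simply allows the row counts to be compared against the SELECT results before anything is made permanent. Taking a snapshot or backup of the appliance first is still a sensible precaution.

```sql
-- Sketch only: run inside psql on the VCSA 5.5 appliance.
BEGIN;

-- Remove argument rows whose event_id has no parent row in vpx_event.
DELETE FROM vpx_event_arg
WHERE NOT EXISTS (
    SELECT 1 FROM vpx_event
    WHERE vpx_event.event_id = vpx_event_arg.event_id
);

-- Remove events that have no argument rows (the reverse check, which may not be required).
DELETE FROM vpx_event
WHERE NOT EXISTS (
    SELECT 1 FROM vpx_event_arg
    WHERE vpx_event_arg.event_id = vpx_event.event_id
);

-- psql reports "DELETE n" after each statement. If n matches the earlier
-- SELECT results, keep the changes; otherwise issue ROLLBACK instead.
COMMIT;
```

When running this interactively, it is probably safer to type the statements one at a time and only issue COMMIT once both counts look right.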
After removing these entries, I re-ran the VCSA 6 upgrade and it completed successfully.