When a local disk fails in an instance, you end up with an empty disk after live migration. The disk won't disappear; you'll get IO errors, and once the migration completes the IO errors go away, but the disk will be empty.
> you'll get IO errors, and then the IO errors will go away once the migration completes but your disk will be empty.
This seems extremely dangerous: nothing notifies the OS to unmount the filesystem and flush its caches, so stale cached writes can trash the new disk as well. The only way to recover would be to manually unmount, drop all IO caches, then reformat and remount.
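A rough sketch of that manual recovery sequence, assuming an ext4 filesystem. The device and mount-point names are hypothetical; the block only prints the commands (remove the `echo` wrapper, i.e. run the string, to execute them for real, noting the `mkfs` step is destructive):

```shell
DEV=/dev/nvme0n1   # hypothetical local-SSD device; adjust for your VM
MNT=/mnt/localssd  # hypothetical mount point

# Recovery sequence, printed rather than executed:
RECOVERY="umount -f $MNT
sync; echo 3 > /proc/sys/vm/drop_caches
mkfs.ext4 -F $DEV
mount $DEV $MNT"
echo "$RECOVERY"
```

The `drop_caches` write discards the page cache plus dentries and inodes, so nothing stale from the dead disk gets flushed onto the fresh one.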
When standard filesystems like ext4 and XFS hit enough IO errors, they take the filesystem offline (ext4 remounts read-only or panics depending on its errors= setting; XFS shuts the filesystem down). I find this happens pretty reliably on AWS at least, and I can't imagine the filesystem continuing to do very much when 100% of the underlying data has disappeared.
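For ext4, the on-error policy is configurable and worth checking ahead of time. A sketch, with a hypothetical device name; the commands are printed rather than executed since both need root on a real system:

```shell
DEV=/dev/sda1  # hypothetical ext4 device; adjust for your instance

# Show the on-error policy recorded in the superblock
# (continue / remount-ro / panic):
CHECK="tune2fs -l $DEV | grep -i 'errors behavior'"
# Or mount with an explicit policy: go read-only on the first error.
REMOUNT="mount -o errors=remount-ro $DEV /mnt/data"
echo "$CHECK"
echo "$REMOUNT"
```

With `errors=remount-ro` the filesystem fences itself on the first IO error instead of continuing to scribble over whatever replaces the disk.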
That said, from further reading of the GCP docs, it does sound like if they detect a disk failure they will reboot the VM as part of the not-so-live migration.
That's the failure mode for a bad disk, but are you saying that in the normal case of a live migration (e.g. a BIOS update needs to be applied to the host machine), the data on the local SSD is also moved to the new host, seamlessly and invisibly to the guest VM?
Yes, under a graceful live migration with no hardware failure, the data is seamlessly moved to the new machine. Moving local data is ultimately no different from live migrating the machine's actual RAM. Performance does degrade briefly during the migration, but typically the window is very short.