
How does NVMe make local storage more reliable?


I think they were saying it shouldn’t be failing, not that it’s reliable. From personal experience, I got a server with two local drives. Both drives ended up failing within 15 minutes of each other. That was an annoying day…


Cascading drive failures are fairly common unless you're actively trying to avoid them. People typically buy several of the same drive from the same source, because it's easy and seems to make sense. What you've likely done is buy several of the same drive from the same manufacturing batch and bring them all online at the same time. If there's a manufacturing issue, or just an expected lifetime for those drives, you're going to hit it at almost the same moment across all your drives.

The ideal here is to buy similarly specced drives from multiple manufacturers. At the very least, if this is something you care about, buy from multiple suppliers to reduce your chance of getting drives from the same batch.
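
If you want to sanity-check a pool before bringing it online, here's a rough sketch of the idea in Python. The inventory format, the field names (`manufacturer`, `batch`), and the 50% threshold are all made up for illustration; you'd have to fill them in from whatever your purchasing records or drive labels actually give you.

```python
from collections import Counter

def largest_shared_group(drives, key):
    """Return (value, count) for the most common value of `key` among drives."""
    counts = Counter(d[key] for d in drives)
    return counts.most_common(1)[0]

def diversity_warnings(drives, keys=("manufacturer", "batch"), max_share=0.5):
    """Warn when more than `max_share` of the pool shares one value of a key.

    The keys and the threshold are assumptions for this sketch, not a standard.
    """
    warnings = []
    for key in keys:
        value, count = largest_shared_group(drives, key)
        if count / len(drives) > max_share:
            warnings.append(f"{count}/{len(drives)} drives share {key}={value}")
    return warnings

# Hypothetical inventories: one bought in a single order, one deliberately mixed.
same_order = [
    {"id": "drive0", "manufacturer": "VendorA", "batch": "batch-1"},
    {"id": "drive1", "manufacturer": "VendorA", "batch": "batch-1"},
]
mixed = [
    {"id": "drive0", "manufacturer": "VendorA", "batch": "batch-1"},
    {"id": "drive1", "manufacturer": "VendorB", "batch": "batch-7"},
]

for line in diversity_warnings(same_order):
    print("WARNING:", line)
```

The single-order pool trips both checks and the mixed pool trips neither. It won't tell you whether a drive will fail, only whether your drives are likely to fail together.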


Here’s one fairly recent example:

https://www.zdnet.com/article/hpe-says-firmware-bug-will-bri... [2020]


"Well NVMe drives shouldn't fail at a high rate"

What is it about NVMe that means they shouldn't fail at a high rate? I don't understand how the protocol should matter much.


I, personally, think a protocol should be resilient and never fail unless there is absolutely no other choice (like TCP, where failures are so common that recovery is part of the protocol).



