Postmortem: September 17th outage and rollback

I'm very thankful I get free enterprise help regarding the disks because I hate touching disks. I have a phobia of doing any disk operations.

Bro, it aint gay to touch disk, as long as its your own disk and you do so in the privacy of your house. Its when you start touching other disks that you've gotta get a check up every month or you might end up with RAIDS.

Do you want RAIDS? Cause thats how you get RAIDS.

Reminds me, I had a coworker once who had disklexia, couldn't tell the 1's from the 0's. He was non-binary, kind of a fag too IIRC.
 
How recent was the backup? Sounds like we nearly experienced Total Kiwi Death.

Also LMAO at the fact that trannies can't do as much damage to KF as a hard drive error can.
Trannies weak, hard drives strong.
 
Just wanted to drop in and thank you for what you do, Null. I appreciate having a forum where I don't get bashed for saying "nigger" or "tranny." :semperfidelis:
 
Well, this sounds strange. Maybe we need some further investigation into how all the hard drives in a RAID can break at the same time.
 
Well, this sounds strange. Maybe we need some further investigation into how all the hard drives in a RAID can break at the same time.
In absolute layman terms, and ignoring all other means of failure, a solid state drive can only be written to a specific number of times before it will absolutely, certainly fail. In a RAID, all the drives are kept in sync with each other so that if one fails, the data is still intact on the other working drives. But if they are all the exact same make and model, and installed at the same time, they will all have the same amount of remaining writes before they fail, and will essentially always write at the same time to keep the data in sync between all of them. It's understandable that it would sound sketchy for a bunch to fail at once if you didn't know this specific scenario. As some others have touched on, many companies would prefer to get drives from different manufacturers, or at least not replace every drive in the array at once, to avoid this kind of all-at-once failure.
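A rough sketch of that scenario, for anyone who wants to see it in numbers. It assumes a plain mirror where every write lands on every drive, and `RATED_WRITES` is a made-up endurance figure, not a real drive spec. The function name is hypothetical too.

```python
# Toy model: in a mirror, every write goes to every member drive, so the
# only thing that separates their wear-out points is how worn each drive
# already was when it joined the array.
RATED_WRITES = 1000  # made-up endurance figure, not a real spec


def wear_out_points(writes_already_used, rated_writes=RATED_WRITES):
    """Return, per drive, how many more mirrored writes it can take."""
    return [rated_writes - used for used in writes_already_used]


# Four identical drives installed on the same day: all hit the limit together.
same_batch = wear_out_points([0, 0, 0, 0])
print("same batch:", same_batch)

# Drives replaced at different times: the wear-out points are spread out.
staggered = wear_out_points([0, 250, 500, 750])
print("staggered: ", staggered)
```

With the same-batch array there is no window between the first failure and the last one, which is the whole problem. Staggering replacements buys that window back.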
 
In absolute layman terms, and ignoring all other means of failure, a solid state drive can only be written to a specific number of times before it will absolutely, certainly fail.
I'm pretty sure that it's a game of chance and not a specific number. The numbers you see on SSDs are the amount of writes you can do before the chance of failure gets high enough to be an issue.
 
I'm pretty sure that it's a game of chance and not a specific number. The numbers you see on SSDs are the amount of writes you can do before the chance of failure gets high enough to be an issue.
Basically correct, but at an enterprise level it's fairly predictable, and the margins aren't so far apart that one drive is likely to last significantly longer than the others... unless one fails very early anyway. The difference is likely to be a matter of minutes at most when the drives are under heavy use.
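To put a shape on "fairly predictable": here is a minimal simulation sketch. Everything in it is an assumption for illustration: wear-out times are drawn from a normal distribution, and the mean and spread are invented numbers, not measurements from any real drive or from this site's hardware.

```python
import random


def simulate_wearout_hours(num_drives, mean_hours, spread_fraction, seed=0):
    """Draw a hypothetical wear-out time (in hours) for each drive.

    Assumes a normal distribution centred on mean_hours with a standard
    deviation of mean_hours * spread_fraction. Purely illustrative.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sigma = mean_hours * spread_fraction
    return sorted(rng.gauss(mean_hours, sigma) for _ in range(num_drives))


# Invented figures: 20,000 hours of heavy use, 0.1% spread between drives.
hours = simulate_wearout_hours(num_drives=4, mean_hours=20_000, spread_fraction=0.001)
gap = hours[-1] - hours[0]
print(f"first failure at ~{hours[0]:.0f} h, last at ~{hours[-1]:.0f} h, gap ~{gap:.0f} h")
```

The tighter the spread, the closer together the failures land. How tight it really is for a given drive model is something only the vendor's data or your own monitoring can tell you.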
 
Null, the most based peruvian doom nigger i've ever seen
thank you
 
In a RAID, all the drives are kept in sync with each other so that if one fails, the data is still intact on the other working drives. But if they are all the exact same make and model, and installed at the same time, they will all have the same amount of remaining writes before they fail, and will essentially always write at the same time to keep the data in sync between all of them.
This is only accurate for a RAID 1 mirror, which would be pretty wasteful if you have 4 drives (especially SSDs). You only get the storage space of a single drive, and you can afford to have 3 of the 4 drives fail before you'd actually lose any data. And, as you said, you'll be writing to all 4 drives exactly the same, so they're more likely to fail at the exact same time.

Typically you'd have a RAID configuration where you can afford to have 1-2 drives fail without losing any data. You get more capacity because all of the drives contain different data and you're not writing to all of them exactly the same. Error correction algorithms are used to rebuild the missing data from a failed drive.
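For the curious, the rebuild step can be sketched with plain XOR parity, which is the single-parity idea behind RAID 5 (RAID 6 adds a second, different parity calculation so it can survive two failures). This is a toy under stated assumptions, not how any particular controller does it: the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives holding different data, plus one parity block.
data_blocks = [b"blk0", b"blk1", b"blk2"]
parity = xor_blocks(data_blocks)

# Pretend the drive holding data_blocks[1] died. XOR of everything that
# survived (the other data blocks plus parity) gives the lost block back.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)

print("rebuilt matches lost block:", rebuilt == data_blocks[1])
```

With 4 drives set up this way you get 3 drives' worth of space and can lose any one drive, versus 1 drive's worth of space with a 4-way mirror.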
 
I'm pretty sure that it's a game of chance and not a specific number. The numbers you see on SSDs are the amount of writes you can do before the chance of failure gets high enough to be an issue.
Very occasionally it's actually etched in stone, like when the firmware has a bug that literally kills the drive.
Typically you'd have a RAID configuration where you can afford to have 1-2 drives fail without losing any data. You get more capacity because all of the drives contain different data and you're not writing to all of them exactly the same. Error correction algorithms are used to rebuild the missing data from a failed drive.
Something like RAID 3, with its dedicated parity drive, generally gives you at least single-drive failure tolerance and time to move to another array.

I don't really know how this site is configured, but I suspect that, despite its advantages, that wouldn't really be an option here.
 