Postmortem September 17th outage and rollback

SanicBlackMagic · Sep 20, 2023

Null said:
I'm very thankful I get free enterprise help regarding the disks because I hate touching disks. I have a phobia of doing any disk operations.

Bro, it aint gay to touch disk, as long as its your own disk and you do so in the privacy of your house. Its when you start touching other disks that you've gotta get a check up every month or you might end up with RAIDS.

Do you want RAIDS? Cause thats how you get RAIDS.

Reminds me, I had a coworker once who had disklexia, couldn't tell the 1's from the 0's. He was non-binary, kind of a fag too IIRC.

No. 7 cat · Sep 21, 2023

TheGuntinator said:
How recent was the backup? Sounds like we nearly experienced Total Kiwi Death.

Also LMAO at the fact that trannies can't do as much damage to KF as a hard drive error can.

Trannies weak, hard drives strong.

Big Brutus · Sep 21, 2023

Just wanted to drop in and thank you for what you do, Null. I appreciate having a forum where I don't get bashed for saying "nigger" or "tranny." :semperfidelis:

Stoneheart · Sep 22, 2023

Well, this sounds strange, and maybe we need some further investigation into how all the hard drives in a RAID can break at the same time.

Maximum Girl · Sep 22, 2023

Stoneheart said:
Well, this sounds strange, and maybe we need some further investigation into how all the hard drives in a RAID can break at the same time.

In absolute layman terms, and ignoring all other means of failure, a solid state drive can only be written to a specific number of times before it will absolutely, certainly fail. In a RAID, all the drives are kept in sync with each other so that if one fails, the data is still intact on the other working drives. But if they are all the exact same make and model, and installed at the same time, they will all have the same amount of remaining writes before they fail, and will essentially always write at the same time to keep the data in sync between all of them. It's understandable that it would sound sketchy for a bunch to fail at once if you didn't know this specific scenario. As some others have touched on, many companies would prefer to get drives from different manufacturers, or at least not replace every drive in the array at once, to avoid this kind of all-at-once failure.

Stoneheart · Sep 22, 2023

Maximum Girl said:
In absolute layman terms, and ignoring all other means of failure, a solid state drive can only be written to a specific number of times before it will absolutely, certainly fail.

I'm pretty sure it's a game of chance and not a specific number. Those numbers you see on SSDs are the number of writes you can do before the chance of failure gets high enough to be an issue.

Maximum Girl · Sep 22, 2023

Stoneheart said:
I'm pretty sure it's a game of chance and not a specific number. Those numbers you see on SSDs are the number of writes you can do before the chance of failure gets high enough to be an issue.

Basically correct, but at an enterprise level it's fairly predictable, and the margins aren't so far apart that one drive is likely to last significantly longer than the others... unless one fails very early anyway. The difference is likely to be a matter of minutes at most when the drives are under heavy use.
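For anyone who wants to poke at the "how far apart do they fail" question, here is a rough sketch. Everything in it is an assumption for illustration: the function name `failure_gap_days`, the normal distribution for per-drive endurance, and the numbers in the example call. It is not a vendor wear model, just a way to see how the gap between the first and last failure shrinks as the drive-to-drive spread shrinks.

```python
import random

def failure_gap_days(rated_tbw, spread, writes_tb_per_day, drives=4, seed=0):
    """Days between the first and last wear-out in an evenly written array.

    rated_tbw          -- hypothetical rated endurance per drive, in TB written
    spread             -- assumed relative standard deviation between drives
    writes_tb_per_day  -- assumed write load, identical on every drive
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    # Assumption: each drive's real endurance is normally distributed
    # around the rated figure.
    endurance = [rng.gauss(rated_tbw, rated_tbw * spread) for _ in range(drives)]
    # Every drive sees the same writes, so lifetime is endurance / write rate.
    days = [e / writes_tb_per_day for e in endurance]
    return max(days) - min(days)

# Compare a tight spread with a loose one (made-up numbers).
for spread in (0.01, 0.10):
    gap = failure_gap_days(rated_tbw=1000, spread=spread, writes_tb_per_day=10)
    print(f"spread {spread:.0%}: gap of about {gap:.1f} days")
```

The tighter the manufacturing spread and the heavier the write load, the closer together the failures land, which is the point being made above.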

Alber · Sep 23, 2023

Null, the most based peruvian doom nigger i've ever seen
thank you

Kosher Salt · Sep 23, 2023

Maximum Girl said:
In a RAID, all the drives are kept in sync with each other so that if one fails, the data is still intact on the other working drives. But if they are all the exact same make and model, and installed at the same time, they will all have the same amount of remaining writes before they fail, and will essentially always write at the same time to keep the data in sync between all of them.

This is only accurate for a RAID 1 mirror, which would be pretty wasteful if you have 4 drives (especially SSDs). You only get the storage space of a single drive, and you can lose 3 of the 4 drives before you'd actually lose any data. And, as you said, you'll be writing to all 4 drives exactly the same, so they're more likely to fail at the same time.
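To put numbers on that trade-off, a small sketch of the standard capacity and fault-tolerance arithmetic for a few common levels. The helper name `raid_summary` and the 1 TB drive size are made up for the example, and it assumes all drives are the same size.

```python
def raid_summary(level: str, drives: int, size_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of drive failures tolerated)."""
    if level == "raid1":   # every drive holds a full copy of the data
        return size_tb, drives - 1
    if level == "raid5":   # one drive's worth of space goes to parity
        return size_tb * (drives - 1), 1
    if level == "raid6":   # two drives' worth of space goes to parity
        return size_tb * (drives - 2), 2
    raise ValueError(f"unsupported level: {level}")

# Four hypothetical 1 TB drives under each layout.
for level in ("raid1", "raid5", "raid6"):
    usable, tolerated = raid_summary(level, drives=4, size_tb=1.0)
    print(f"{level}: {usable:.0f} TB usable, survives {tolerated} failed drive(s)")
```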

Typically you'd have a RAID configuration where you can afford to have 1-2 drives fail without losing any data. You get more capacity because all of the drives contain different data and you're not writing to all of them exactly the same. Error correction algorithms are used to rebuild the missing data from a failed drive.
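As a rough illustration of the rebuild step, here is single-parity XOR, which is the idea behind RAID 5 style parity. It is a toy sketch: the block contents and the `xor_blocks` helper are made up, real arrays work on stripes across whole disks, and two-failure setups like RAID 6 need a second, more involved code on top of plain XOR.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data "drives" holding different blocks, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Pretend drive 1 died: XOR the survivors with the parity to get it back.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data[lost])  # True
```

This works because XOR-ing a value in twice cancels it out, so whatever is left after combining the survivors with the parity is exactly the missing block.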

AnOminous · Sep 23, 2023

Stoneheart said:
im pretty sure that its a chance game and not a specific number. those numbers you see on SSDs are the amount of writes you can do before the chance of failure gets high enough to be an issue.

Very occasionally it's actually etched in stone, like when the firmware has a bug that literally kills the drive.

Kosher Salt said:
Typically you'd have a RAID configuration where you can afford to have 1-2 drives fail without losing any data. You get more capacity because all of the drives contain different data and you're not writing to all of them exactly the same. Error correction algorithms are used to rebuild the missing data from a failed drive.

Something like RAID 3 with a parity drive generally at least gives you some failure tolerance and time to move to another array.

I don't really know how this site is configured, but I suspect that despite its advantages, that wouldn't really be an option here.

The Ugly One · Sep 24, 2023

sounds like the solution here is to just not write to drives, maybe write to a napkin or something
