Postmortem September 17th outage and rollback

I'm surprised at how well you've been handling all the site's technical problems given that you're basically a one-man operation, at least at the top-level of things.
 
NVM means non-volatile memory, a category that includes flash memory, so yes.
My point is that the term "hard drive" specifically refers to drives that use spinning magnetic platters to store the data on.

Your drive has spinning platters, a read/write head, is noisy and heavy? That's a hard drive.
Your drive has a bunch of chips and circuitry on it, weighs nothing and doesn't make a sound? That's a solid state drive.

That's all. If it's a spinning platter drive, it's a hard drive. If it's a solid state memory drive, it's an SSD. It can be a SATA SSD or an NVMe SSD in this case.

But you get the point: four NVMe drives went tits up because they were from the same batch, so they failed at the same time, and that brought the entire site down.

There's a reason companies like Backblaze buy various drives from various manufacturers for the same array. The more varied your drive selection is, the lower the chance of several of them going out at the same time.
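
To put rough numbers on that, here is a minimal sketch of the same-batch argument. Everything in it is an assumption made up for illustration: the array size, the failure probabilities, the chance of a bad batch, and the function names. None of it comes from the actual postmortem or from Backblaze's data. Both scenarios give each drive the same average chance of failing; the only difference is whether a bad batch hits all drives at once or each drive separately.

```python
from math import comb


def prob_at_least_k_fail(n: int, k: int, p: float) -> float:
    """P(at least k of n drives fail) when each fails independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def prob_same_batch(n: int, k: int, p_good: float, p_bad: float, q: float) -> float:
    """All n drives come from one batch, which is defective with probability q.

    A defective batch raises every drive's failure probability to p_bad at once.
    """
    return q * prob_at_least_k_fail(n, k, p_bad) + (1 - q) * prob_at_least_k_fail(n, k, p_good)


def prob_mixed_batches(n: int, k: int, p_good: float, p_bad: float, q: float) -> float:
    """Each drive comes from its own batch, each defective with probability q.

    Per-drive risk is the same on average, but the failures are independent.
    """
    p_each = q * p_bad + (1 - q) * p_good
    return prob_at_least_k_fail(n, k, p_each)


# Hypothetical numbers, chosen only to show the effect.
N_DRIVES, K_FAILED = 8, 4
P_GOOD, P_BAD, Q_DEFECTIVE = 0.02, 0.5, 0.05

same = prob_same_batch(N_DRIVES, K_FAILED, P_GOOD, P_BAD, Q_DEFECTIVE)
mixed = prob_mixed_batches(N_DRIVES, K_FAILED, P_GOOD, P_BAD, Q_DEFECTIVE)
print(f"same batch:    P(>= {K_FAILED} of {N_DRIVES} fail) = {same:.6f}")
print(f"mixed batches: P(>= {K_FAILED} of {N_DRIVES} fail) = {mixed:.6f}")
```

With these made-up inputs the single-batch array comes out far more likely to lose four drives together, even though each individual drive is no more likely to die. That is the whole case for mixing batches.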
 
Only oddity I've noticed is image thumbnails not displaying properly in posts but that'll probably pass. This is what I'm talking about:

[Attached image: 1694971892009.png]
 
I'm still getting the 502 Bad Gateway error when opening any chat. Have other kiwis noticed, or is it a problem on my side?
 
it only took about 7 hours, where most of that was just waiting on the database to import over 120,000,000 post stickers.
Truly the most important and vital piece of the entire site.
 
Every time the site is down I'm reminded of how fucking dogshit the internet has become. Went on /b/ for the first time in years and two posts of the first line of the catalog were actual child porn and the rest was tranny shit. /pol/ is like a fraction of a degree above Facebook at this point. Thank you for keeping one of the few remaining bastions of entertaining websites alive Jewsh,
 
I am perfectly fine with nuking all the old stickers left on everything and starting fresh if they really cause that much trouble.
B-but my updoots! I worked hard for my reddit kar-- stickers, mister! I need my validation!


Unironically though, if the stickers take up that much fuckin' space and the majority of the reinstall time, get rid of them or simplify 'em to like 5.
 
I'm surprised at how well you've been handling all the site's technical problems given that you're basically a one-man operation, at least at the top-level of things.
I'm very thankful I get free enterprise help regarding the disks because I hate touching disks. I have a phobia of doing any disk operations.
 
Hum, four enterprise hard drives failed simultaneously. Surely that's just a coincidence and not something more serious?
It really sounds like it's just a problem with the RAID card or the back panel unit for the hot-swap front loader. Null mentioned that he was running an ASRock motherboard before. That would use software RAID with storage connected directly to the board via PCIe slots. He recently moved over to a previously used enterprise server, probably a 2U from Dell, HP, or Supermicro. Those use hardware for their RAID and storage. The drives are not patched directly into the motherboard but go through a subsystem that handles the I/O and connects all of the drives to the motherboard at once. It's prohibitively expensive to buy or lease a current-gen HP or Dell, so you would most likely be going with last gen, which may have been running this subsystem for 5 years at max load. It's more likely that the storage subsystem failed than multiple U.2 drives, at least without the drives failing in sequence and logging errors on each failure. Something I'm sure was checked.
 