Postmortem September 17th outage and rollback

Toolbox · Sep 17, 2023

ClashCity said:
It's the exact opposite over here. I can only access the onion site, and st has just shown this every time for the past 2 hours.

I worded that a bit poorly. I didn't mean to say I couldn't log in; I just figured that exiting the browser and logging in again after a bit might fix some things. I couldn't post, as they said; the button to show the password when logging in didn't work; and attempting to give posts stickers failed to open the menu, which happens quite a lot when something goes wrong. Replying to profile posts also did not work.

AnOminous · Sep 17, 2023

IT Dude said:
There was no foul play at hand here. At best a firmware bug, at worst, the drives sudoku'd.

Good to hear that because I was initially suspicious of troon shenanigans from some insider.

reptile baht spaniard rid · Sep 17, 2023

I would recommend adding a totally separate drive of a different type on a different controller to reduce single-path failures.

ZFS is rock solid, but underlying hardware? Not so much these days.

DavidS877 · Sep 17, 2023

St.Davis said:
I'm surprised that took up much time, surely it's just a few integers attached to each post? Or is this sort of data stored in some other ridiculous fashion? If you're merely being facetious then disregard.

Hopefully MySQL dump/restore is smart enough to disable indexes (and FK constraints and similar) during the restore and re-enable them afterwards. If not, I could see 120,000,000 stickers taking a while.
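For what it's worth, here is a rough sketch of the idea, not what the site actually runs. It wraps a list of restore statements so that MySQL's `foreign_key_checks` and `unique_checks` session variables are off while the rows load and are switched back on afterwards. The function name and the `post_reactions` table are made up for illustration. Note that mysqldump's `--disable-keys` (part of the default `--opt`) only affects nonunique indexes on MyISAM tables, so on InnoDB the session variables are what matter.

```python
def wrap_for_bulk_restore(statements):
    """Wrap restore statements so integrity checks are off during the load.

    The SET statements use real MySQL session variables; everything else
    here (function name, example table) is hypothetical.
    """
    prologue = [
        "SET autocommit=0;",          # commit once at the end, not per row
        "SET unique_checks=0;",       # defer secondary unique index checks
        "SET foreign_key_checks=0;",  # skip FK validation while loading
    ]
    epilogue = [
        "SET foreign_key_checks=1;",
        "SET unique_checks=1;",
        "COMMIT;",
    ]
    return prologue + list(statements) + epilogue


if __name__ == "__main__":
    # Hypothetical single-row insert standing in for the real dump contents.
    dump = ["INSERT INTO post_reactions VALUES (1, 2, 3);"]
    for line in wrap_for_bulk_restore(dump):
        print(line)
```

Without something like this, every inserted row pays for constraint checks on the way in, which is where a nine-digit row count would start to hurt.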

SaidNoOneEver · Sep 17, 2023

Null said:
chat up

The ross and juice emotes are gone!

1996 Toyota Camry · Sep 17, 2023

Resunoit said:
I thought it was another Troon attack.

Troons working for WD have had a backdoor in the SSD firmware just in case Joshua Moon of Kiwi Farms, racist, homophobic, koumpounophobic, swatting forum got a hold of some and then and nuked all of them at once

Fluoxetine Man · Sep 17, 2023

200MB said:
My reaction score will never financially recover from this.

Wait, how did you get 6 million reaction stickers in 4 posts?

AltisticRight · Sep 17, 2023

Even harddrives aren't free from consent ACKcidents.

UERISIMILITUDO · Sep 17, 2023

St.Davis said:
I'm surprised that took up much time, surely it's just a few integers attached to each post? Or is this sort of data stored in some other ridiculous fashion? If you're merely being facetious then disregard.

When I first joined this website, I was impressed at the number of fancy things it does for its users. Many of them are append-only sequences of indefinite size. Stickers would best be represented as a list of reaction type and user elements, but there are probably other representations needed for efficient implementation of other features. If all of the stickers are verified and transformed into other representations, I can see it taking a while, although not hours.
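To make that concrete, here is a toy sketch of the idea, assuming stickers live in an append-only log of (post, user, reaction type) entries and that per-post counts and per-user totals are derived from it. All names and data are invented; I have no idea how the forum software actually stores this.

```python
from collections import Counter, defaultdict

# Hypothetical append-only log: one (post_id, user_id, reaction_type) per sticker.
reaction_log = [
    (101, 7, "like"),
    (101, 9, "informative"),
    (101, 12, "like"),
    (102, 7, "agree"),
]

# Hypothetical mapping from post to the user who wrote it.
post_authors = {101: 3, 102: 4}


def build_post_counts(log):
    """Derive per-post sticker counts (what is shown under each post)."""
    counts = defaultdict(Counter)
    for post_id, _user_id, reaction in log:
        counts[post_id][reaction] += 1
    return counts


def build_received_totals(log, authors):
    """Derive how many stickers each author has received in total."""
    totals = Counter()
    for post_id, _user_id, _reaction in log:
        totals[authors[post_id]] += 1
    return totals


post_counts = build_post_counts(reaction_log)
received = build_received_totals(reaction_log, post_authors)
```

Each derived view is a full pass over the log, so rebuilding several of them over a very large log adds up, even though each individual entry is only a few integers.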

reptile baht spaniard rid said:
ZFS is rock solid, but underlying hardware? Not so much these days.

Software wishes it could be as reliable as hardware. Unfortunately, yes, the trend is for hardware to become as reliable as software, in the worst possible way.

Seething Troon Collector · Sep 17, 2023

Here’s what kills me. How does an entire shelf of NVMe drives fail. I’ve never seen that. Spinning rust sure, but solid state storage is pretty resilient. I can buy the firmware issue though, disk firmware can fuck storage faster than you can blink.

Smar Mijou · Sep 17, 2023

Seething Troon Collector said:
Here’s what kills me. How does an entire shelf of NVMe drives fail. I’ve never seen that. Spinning rust sure, but solid state storage is pretty resilient. I can buy the firmware issue though, disk firmware can fuck storage faster than you can blink.

My thought is an input failure, as stated by others, or a single SSD failure that overloaded the others, leading to a cascade failure of the entire setup. The IT guy seems to think a firmware bug is likely, which is also possible. I HIGHLY doubt that 4 drives all completely failed at once, even if they were made on the same day, one right after another. That's just a hair short of impossible.

Yankee Shogun · Sep 17, 2023

The Hero of Kvatch said:
I thought trannies were sending mortar fire at Kiwi Farms HQ again.

This makes me think of Kiwifarms HQ as some kind of Barad-Dûr esque fortress, though this time the Orcs are attacking us.

Q !!Hs1Jq13jV6 · Sep 17, 2023

Seething Troon Collector said:
How does an entire shelf of NVMe drives fail.

it is not uncommon for entire ssd raid arrays to fail at (roughly) the same time as the drives all roughly have the same lifetime
i've seen this mentioned in a couple of raid guides
hdds are a lot more unlikely to fail at the same time as long as someone isn't shouting NIGGER at the drives at the time of failure

the common poor man's solution to this is to buy an extra drive and swap it in for each of the drives in turn after several days of use, so that no two drives reach the end of their lifetime at the same time

Sir Baz the Intolerant · Sep 17, 2023

IT Dude said:
There was no foul play at hand here. At best a firmware bug, at worst, the drives sudoku'd.

Imagine being a troon who's wasted years of your already shortened life trying to bring down an autistic gossip site filled with harmless retards, only for you to fail spectacularly at every instance. But then, out of nowhere, the site drops and you are vindicated, all the federal crimes you committed to bring those transphobes down came to fruition and you can now harass women and children in peace.

But nope, all of your gay-ops did fuck all except cost us time, and it was a fucking drive failure of all things that almost caused us some actual damage, and we still tanked our way through that. The coping, seething, and dilating must be on another fucking level. Beautiful.

(And thanks for the help you've given to our Dear Feeder in keeping the site up It Dude, it's very much appreciated).

Lithuophile · Sep 17, 2023

Seething Troon Collector said:
Here’s what kills me. How does an entire shelf of NVMe drives fail

Skill issue

AnOminous · Sep 17, 2023

Smar Mijou said:
I HIGHLY doubt that 4 drives all completely failed all at once, even if they were made on the same day, sequentially after each other. That's just a hair short of impossible.

There were the HP SSDs that all failed after an exact number of hours, so if you started with them brand new and put them in an array together, they'd all fail practically on the same nanosecond.
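A quick back-of-the-envelope sketch of why that bites arrays so hard, assuming a firmware bug that kills a drive at a fixed power-on-hour count. The 32,768 figure and the drive ages below are illustrative assumptions, not numbers from this incident.

```python
# Assumed power-on-hour count at which the hypothetical firmware bug triggers.
FAILURE_HOURS = 32_768


def hours_until_failure(power_on_hours, limit=FAILURE_HOURS):
    """Remaining hours before each drive reaches the assumed limit."""
    return [max(limit - h, 0) for h in power_on_hours]


def failure_spread(power_on_hours, limit=FAILURE_HOURS):
    """Gap in hours between the first and the last drive to reach the limit."""
    remaining = hours_until_failure(power_on_hours, limit)
    return max(remaining) - min(remaining)


# Four drives installed brand new on the same day.
same_batch = [1000, 1000, 1000, 1000]
# Hypothetical alternative: each drive entered service a week (168 h) later.
staggered = [1000, 832, 664, 496]

print(failure_spread(same_batch))  # 0: all four hit the limit together
print(failure_spread(staggered))   # 504: three weeks between first and last
```

With identical power-on hours there is no window in which the array is merely degraded; staggering the install dates at least spreads the failures out so a rebuild has a chance to finish.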

Unknown Recipient #12 · Sep 17, 2023

Null said:
Yes. I have it set up so if a nuclear bomb detonates in the datacenter, we won't lose anything.

Is there a deadman switch in place to upload a torrent of the site in case something happens to you? I don't think trannies are going to assassinate you, but people die in traffic every day.

elastic eye · Sep 17, 2023

To me it sounds like the backplane the drives slot into failed and fried the drives. That's the only decent explanation besides all 4 drives coming from the same batch. That backplane should also be discarded and replaced; hopefully the server chassis has more than 4 slots.

reptile baht spaniard rid · Sep 17, 2023

Seething Troon Collector said:
Here’s what kills me. How does an entire shelf of NVMe drives fail. I’ve never seen that. Spinning rust sure, but solid state storage is pretty resilient. I can buy the firmware issue though, disk firmware can fuck storage faster than you can blink.

It's much much MUCH more likely that something happened on one drive that "took them all down" from the viewpoint of the machine, though usually rebooting it recovers mostly. ZFS can go sideways but the admin sounds competent enough to notice the old "off by one" errors that cause all drives to be foreign and need importing.

But a drive locking up on a return, causing the driver (which is shared with all the drives) to lock up? That shit happens in Linux all the time.

soapy124 · Sep 17, 2023

all hail holy nool. thx for keeping the site up no matter what, i adore the hard work. seriously irreplaceable.
