Postmortem September 17th outage and rollback

Artificial Stupidity · Sep 17, 2023

Voidoom said:
Is there a deadman switch in place to upload a torrent of the site in case something happens to you? I don't think trannies are going to assassinate you, but people die in traffic every day.

Isn't that what the oasis dev did?
It was some form of front page but with links to everything, he died or something and dropped the source.

Osama Bin Laden · Sep 17, 2023

and now we wait

oramge cat · Sep 17, 2023

Null said:
I'm very thankful I get free enterprise help regarding the disks because I hate touching disks. I have a phobia of doing any disk operations.

Chad oramge cat vacuuming the dust out of his PC while it's on and running Ark on max settings vs virgin null scared to look at his own hard drives lest they self-destruct when they sense a person breathing in the same room as them

Jump · Sep 17, 2023

verymuchawful said:
Total Sticker Death.

Wrong. The sticker King @AnOminous Lives.

Long live the King.

BrainSand63 · Sep 17, 2023

I like to imagine that the time Null spent getting the server back up was prolonged by all of my relatively worthless shitposts made over the years.
Kinda like this one.

Seriously though, thanks for all of your work Josh, you’re the best

Pomniman · Sep 17, 2023

This site is always just one ounce of misplaced pizza day mookbong grease away from total self-inflicted destruction.

General Tug Boat · Sep 17, 2023

Well at least the men who wear dresses still have not been successful at getting our stickers. I would hate to lose the amount of puzzle pieces that I have worked tirelessly to collect over the years. It’s like losing the equivalent of all the bathtub HRT that can be cooked up in an evening, would be an absolute tragedy. Resulting in absolute Kiwi Death.

Another L for the elements of nature that try to demolish this forum. At this point the forum is going to get knocked down by a bumble bee knocking into the severs power supply and will somehow result in the same result. Some down time, but for the forum to return in its former glory. Proving that not only men in dresses, but God, nature, and even the disabled have no chance in stopping this wild ride.

Shitty User Stole a Good Name · Sep 17, 2023

I’ll make sure to rate every single post I see with stickers just to make dear feeder’s life a constant misery forever

Haramburger · Sep 17, 2023

Null said:
it only took about 7 hours, where most of that was just waiting on the database to import over 120,000,000 post stickers.

No wonder you dislike the idea of new reactions so much.

Cucktry Roads said:
If post stickers are taking so much time, maybe consider not backing them up. We can live without imaginary internet awards.

You think you do, but you don't. The sticker system merely existing prevents so many unnecessary 1-word shitposts and bannings it would make your head spin. Threads would be practically unreadable, it would take ten times the effort to separate the useful info and content from the chaff, and unlimited paid staff could never hope to parse the rules being violated. React stickers are an indelible part of site culture at this point, removing them would be a catastrophic collapse.

GeorgeHWBush · Sep 17, 2023

Almost a kiwifarms 9/11

innocent jogger · Sep 17, 2023

Damn, I completely missed this whole happening due to Saturday being my day of rest during the "busy seasons" at work, and to me being too busy working all day today to even get on the Farms. It's pretty awesome that it was all taken care of and explained before I even noticed though. Viva la Sneed.

>Why don't you Farm on Saturdays, Jogger?

Because, Fresh Meat, Saturday is SneedlessHangoverDay, the Jogger's day of rest. That means that I don't work, I don't get in a car, I don't ride in a car, I don't pick up the phone, I don't turn on the oven, and I sure as shit DON'T FUCKING FARM. SneedlessHangoverDay!

RedMage · Sep 18, 2023

Dull Pencil said:
How likely is it that 4 disks can break simultaneously?

It was a consent accident.

Scarlett Johansson · Sep 18, 2023

God I thought Fong Jones or Keffals was trying to destroy the site

Patrick Bait-man · Sep 18, 2023

"Hello Joshua. I want to play a game."

"You are currently strapped to your chair at your desk. If you'll look to the side, you'll see a bucket of water hanging above your datacenter."

"Within the next 60 seconds, this bucket will be dropped, resulting in irreversible damage to all data pertaining to the infamous stalker site known as Kiwi Farms; effectively killing it."

"If you wish to save this precious gossip site of yours, you must perform one small task: Chop off your dick."

"To complete this task, a butter knife has been placed in your right hand."

"Sneed or feed, the choice is yours."

Shartavius · Sep 18, 2023

Scarlett Johansson said:
God I thought Fong Jones or Keffals was trying to destroy the site

They are. Did you think they were trying to be women, too?

Smar Mijou · Sep 18, 2023

AnOminous said:
There were the HP SSDs that all failed after an exact number of hours, so if you started with them brand new and put them in an array together, they'd all fail practically on the same nanosecond.

Yes, but wasn't that more of a firmware issue than the actual drives failing? It's been a while since I've kept up on that stuff though, so I might be misremembering. It might just be me sperging, but I would really only consider it a "failure" if it was something that was basically unavoidable or had no workaround beforehand. I know flash/SSDs are limited by their read/write cycles and bit-rot, so the excess backups and all that could also cause an artificially short life as well.

AnOminous · Sep 18, 2023

General Tug Boat said:
Well at least the men who wear dresses still have not been successful at getting our stickers.

They may take our drives, but they'll never take our STIIIIICKERS!!!

Smar Mijou said:
Yes, but wasn't that more of a firmware issue than the actual drives failing?

Exactly that.

Scarlett Johansson · Sep 18, 2023

Marc said:
They are. Did you think they were trying to be women, too?

My understanding is that Keffals is too wasted at the moment to attempt

oramge cat · Sep 18, 2023

Jump said:
Wrong. The sticker King @AnOminous Lives.

Long live the King.

I thought Chris had the highest reaction score from when a mod set his to like a billion as a goof? Was it undone?

trash cat · Sep 18, 2023

Null said:
I am guessing there is some sort of adapter on the frontloader's backplate which combines them to a single mobo PCIe slot or something and that shit out.

Dull Pencil said:
How likely is it that 4 disks can break simultaneously?

Autism follows. Feel free to correct my math if I got it wrong.

Drive: Western Digital Ultrastar DC SN620
Mean Time Between Failures (MTBF): 2 million hours (aka 228 years)
Odds one drive fails before MTBF: ~63% (1 − 1/e, assuming exponentially distributed failures)
Annualized Failure Rate (AFR): 0.44%
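
For anyone who wants to check the spec-sheet numbers, here is a minimal sketch of the conversion. It assumes a constant failure rate (exponential lifetime model), which is the usual reading of an MTBF figure but not something the datasheet guarantees. The function names `afr_from_mtbf` and `p_fail_before` are made up for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate, assuming a constant (exponential) hazard."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def p_fail_before(t_hours: float, mtbf_hours: float) -> float:
    """Probability that a drive has failed by time t under the same model."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)


mtbf = 2_000_000  # hours, the MTBF figure quoted above

print(f"MTBF in years:       {mtbf / HOURS_PER_YEAR:.0f}")
print(f"AFR:                 {afr_from_mtbf(mtbf):.2%}")
print(f"P(fail before MTBF): {p_fail_before(mtbf, mtbf):.1%}")
```

Under this model the share of drives that fail before reaching the MTBF is 1 − 1/e regardless of what the MTBF is, which is why MTBF should not be read as a typical lifespan.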

Ultrastar Drive Specs

MTBF probability distribution

Odds of four independent SSD drives all failing over the course of a year:

1 in 1÷(0.0044^4)=~ 1 in 2.7 billion

Odds of four independent SSD drives all failing on a single day:

(1-p)^365 = (1-0.0044)
p= 1-(.9956^(1/365))

1 in 1÷((1-(.9956^(1/365)))^4)=~ 1 in 5×10^19

This number (on the order of 10 to the 19th) is more than the number of grains of sand on the earth.
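
The arithmetic above can be reproduced with a short script. This is only a sketch under the same assumptions as the post: the four drives fail independently, each with the 0.44% AFR, and the daily probability is derived from the annual one. Variable names are my own.

```python
AFR = 0.0044      # annualized failure rate per drive, from the spec above
N_DRIVES = 4
DAYS_PER_YEAR = 365

# Per-drive daily failure probability p, solved from (1 - p)^365 = 1 - AFR
p_day = 1.0 - (1.0 - AFR) ** (1.0 / DAYS_PER_YEAR)

# Independent failures multiply
p_all_in_year = AFR ** N_DRIVES
p_all_same_day = p_day ** N_DRIVES

print(f"All four within one year: 1 in {1 / p_all_in_year:.2e}")
print(f"All four on a given day:  1 in {1 / p_all_same_day:.2e}")
```

Note the second figure is for one specific day. The chance of it happening on any one of the 365 days in a year is roughly 365 times larger, which is still absurdly small and does not change the conclusion.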

Conclusion: The drive failures were not independent, and must have had a common cause.

Root causes from most to least likely:

(1) Firmware, Kernel, or ZoL Bug
(2) Failed PCH Electrical Component
(3) Power surge
(4) Sabotage / Exploit

Drive failures can be clustered in time due to common flaws, increased load, etc., but four failing at exactly the same time is very unusual, although not unheard of.

Many modern devices are brought down via bugs and/or shitty components.

ZoL is pretty stable in general, but when it has bugs they can be catastrophic, because it was ported from Solaris and its codebase is very complex. PCH firmware has similar complexity issues.

The most common circuit hardware component that fails (in any device) is the lowly electrolytic capacitor.

Manufacturers use garbage-tier Chinese capacitors which cannot take any thermal stress.

Example of a capacitor failure: It typically bulges at the top when it goes bust.

My money is on (a) a firmware, kernel, or ZoL bug or (b) an electrical (capacitor) failure between the backplane and the PCH.

In both cases I expect you will recover at least two or three of the drives depending on the configuration.

Dean Pentel said:
So more likely a part failure related to the drives and not four different drives simultaneously failing?

Four drives didn't fail simultaneously without a common cause.

IT Dude said:
Made an account to clarify things. I'm the dude that helps Null with this stuff when he needs it.

There are a few reasons this could have happened, from most likely to less likely:
- BIOS/UEFI Firmware stopped communicating with the NVMe drives. This happened with a certain BIOS setting when it was initially set up
- The drives actually died from the workload. Unlikely considering these can handle 1.7 Drive Writes per day. But very feasible. These are 2nd hand enterprise drives
- The backplane/JNVMe headers exploded. Super unlikely

The drives are likely still alive, and the server's firmware probably took a shit.
We need to inspect the server's BIOS settings or possibly even update the firmware. Then we can determine if the drives are toast or useless.
There was no foul play at hand here. At best a firmware bug, at worst, the drives sudoku'd.

Great info thank you sir. 100% agree. Two or three of the drives are probably fine. Would be curious to know when you find out.

One drive (or the PCH) may have experienced an unusual failure mode and that bug cascaded up the stack. The system as a whole may have been unable to deal with the problem.

Thanks for your hard work together with @Null to track down the issue and bring the site back online.

Seething Troon Collector said:
Here’s what kills me. How does an entire shelf of NVMe drives fail. I’ve never seen that. Spinning rust sure, but solid state storage is pretty resilient. I can buy the firmware issue though, disk firmware can fuck storage faster than you can blink.

It's strange. I think it's a low-level firmware bug or an electrical problem in the motherboard.

Smar Mijou said:
I HIGHLY doubt that 4 drives all completely failed all at once, even if they were made on the same day, sequentially after each other. That's just a hair short of impossible.

Yeah, agreed, there is no way they all failed at once unless there is an underlying proximal cause.
