Postmortem September 17th outage and rollback

Frontloader on the backplate of what? I guess the injection is made with another server on wherever the fuck KF connects to the internet?
He almost certainly means something like this, so the NVMe drives can be swapped without disassembling the system:
[Image: 2023-09-17_11-54.png]
I didn't know they made M.2 adapters for these these days; I may have to upgrade. U.2 drives are fairly esoteric and usually only come in "Enterprise" flavors.
 
Thankfully I didn't try to load up the site till after hearing the cause.. So no worries about another troon take down attempt or something. Good work Null and thanks for all that you do.


Have you ever thought it possible that someone might try to physically sabotage Kiwi Farms?

I've thought about the possibility of it being only a matter of time before someone tries at this point. We know how obsessed/extreme troons and their allies can get.


Stickers are heavy?

I suspect quite so, in fact, given how they are the first thing to shit out when the site is under stress. Not just here, either; it seems to be Xeno's whole implementation of them.
 
Question for the nerds: how do enterprises, critical systems, etc. stop this from happening? I guess it makes sense that if you have 4 drives made at the same time and used in the same way, they will fail at the same time or very close to each other. Doesn't that make RAID riskier?
Replication, redundancy, and load-balancing across many servers. So, money, basically.
 
Question for the nerds: how do enterprises, critical systems, etc. stop this from happening? I guess it makes sense that if you have 4 drives made at the same time and used in the same way, they will fail at the same time or very close to each other. Doesn't that make RAID riskier?

They have multiple servers that cost tens of thousands of dollars each and which are dedicated to a single purpose and have staff that constantly monitor them.
 
Stickers are heavy?

Reminds me of what happened on sherdog.com
The Likepocalypse.


Alright, here's the deal, man. Tech has reset Like counters to zero. They did this because the system was getting too clogged up; you may have noticed serious browsing issues over the last few months, and this is why.

Please don't e-suicide your accounts because of this. We know this sucks, and that they meant a lot to some of you, but it was either this or have a janky forum. The parent company went with the former.

Tech has also implemented a limiter for standard accounts, so posters will only be able to like a certain number of posts each day. This also applies to profile posts and comments.


Direct quote from Tech:
Likes are the reason your servers have been going off the rails. They're also making it difficult to perform some routine administrative tasks such as moderating users, changing forum permissions and usergroup permissions, and complying with GDPR requests. They're in too many of the necessary database queries and they cause locks. We would have to rewrite core XF files to work around it. Our techs advise against that because the site would break every time we updated and it could create security vulnerabilities. We're going to have to remove them all. Your admins may put restrictions on their use afterward to cut down on excessive usage. You'll be able to like this post when we're done (or not).
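For the curious, a per-account daily like limit like the one announced above could work roughly like this. This is a hypothetical Python sketch, not how XenForo or this site actually implements it: the class name `DailyLikeLimiter`, the default cap of 50, and the in-memory storage are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class DailyLikeLimiter:
    """Hypothetical per-account cap on likes per calendar day, kept in memory."""

    def __init__(self, max_likes_per_day: int = 50):
        # Assumed default; the real limit was not stated.
        self.max_likes_per_day = max_likes_per_day
        # (user_id, day) -> number of likes that user has given on that day
        self._counts = defaultdict(int)

    def try_like(self, user_id: int, today: Optional[date] = None) -> bool:
        """Count one like and return True, or return False once the daily cap is hit."""
        day = today if today is not None else date.today()
        key = (user_id, day)
        if self._counts[key] >= self.max_likes_per_day:
            return False
        self._counts[key] += 1
        return True
```

Because the count is keyed on the calendar day, each account's allowance resets at midnight. A real forum would keep these counts in its database or a cache rather than in process memory, and would have to decide which timezone "a day" means.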
 
It was a brief outage. Rollbacks always suck, but it was only half a day, really.

I think it was handled very well. Good job Null.
 
Question for the nerds: how do enterprises, critical systems, etc. stop this from happening? I guess it makes sense that if you have 4 drives made at the same time and used in the same way, they will fail at the same time or very close to each other. Doesn't that make RAID riskier?
Once you get into mid-range server hardware, a lot of SANs will have two RAID cards, complete with power and everything, servicing the same array. If one card fails, it offloads everything to the other card at the cost of degraded IO. Likewise, you usually have drives built to a certain spec coming from multiple manufacturers/batches to prevent simultaneous drive failures.

Can't speak for REALLY big hardware, but that's one method.
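The dual-controller setup described above can be modeled as a toy: writes go to the active controller, and if it dies, the partner takes over and the array keeps running in a degraded state. This is a hypothetical Python sketch; the class names and the `degraded` flag are illustrative assumptions, not any SAN vendor's API, and the performance hit of real degraded IO is not modeled.

```python
class ControllerFailed(Exception):
    """Raised by a controller that can no longer service I/O."""


class Controller:
    def __init__(self, name: str):
        self.name = name
        self.healthy = True

    def write(self, block: int, data: bytes) -> str:
        if not self.healthy:
            raise ControllerFailed(self.name)
        # Stand-in for actually writing to the shared array.
        return f"{self.name} wrote block {block}"


class DualControllerArray:
    """Toy model: two controllers front the same array; one is active at a time."""

    def __init__(self, primary: Controller, secondary: Controller):
        self.controllers = [primary, secondary]
        self.active = 0
        self.degraded = False

    def write(self, block: int, data: bytes) -> str:
        try:
            return self.controllers[self.active].write(block, data)
        except ControllerFailed:
            if self.degraded:
                raise  # both controllers are gone; nothing left to fail over to
            # Fail over to the partner controller and keep serving I/O, degraded.
            self.active = 1 - self.active
            self.degraded = True
            return self.controllers[self.active].write(block, data)
```

For example, after `ctrl_a.healthy = False`, the next `array.write(...)` is served by controller B and `array.degraded` becomes `True`; only when B also fails does the write raise.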
 
Every piece of hardware ever made, from the simplest roller bearing to the most precise circuit board, has a finite number of usable cycles. Just like our gay dying bodily organs.
In some instances, you install components from the same batch on purpose so that they wear evenly, like chains and belts.
In other instances, like redundant arrays, you kind of want the opposite approach. But it isn't always possible to hand-pick your components, especially if you aren't personally there to purchase and install them.
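The batch effect discussed above can be illustrated with a small simulation. It is a sketch under made-up assumptions: each batch gets a mean lifetime drawn from N(50,000 h, 10,000 h), and each drive varies around its batch mean by N(0, 1,000 h). These numbers are illustrative, not real drive statistics. The code compares the average time between the first and second failure in a 4-drive array built from one batch versus one built from four different batches.

```python
import random
import statistics


def first_gap_hours(lifetimes: list[float]) -> float:
    """Hours between the first and second drive failure in an array."""
    first, second = sorted(lifetimes)[:2]
    return second - first


def simulate(same_batch: bool, drives: int = 4, trials: int = 2000, seed: int = 0) -> float:
    """Mean gap between the first two failures under an assumed toy lifetime model.

    Assumed model (illustrative numbers only): batch mean lifetime ~ N(50_000, 10_000)
    hours; each drive's lifetime ~ batch mean + N(0, 1_000) hours.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    gaps = []
    for _ in range(trials):
        if same_batch:
            # All drives share one batch, so they share one mean lifetime.
            batch_mean = rng.gauss(50_000, 10_000)
            means = [batch_mean] * drives
        else:
            # Each drive comes from its own batch.
            means = [rng.gauss(50_000, 10_000) for _ in range(drives)]
        lifetimes = [m + rng.gauss(0, 1_000) for m in means]
        gaps.append(first_gap_hours(lifetimes))
    return statistics.mean(gaps)


if __name__ == "__main__":
    print(f"same batch:  {simulate(same_batch=True):8.0f} h between 1st and 2nd failure")
    print(f"mixed batch: {simulate(same_batch=False):8.0f} h between 1st and 2nd failure")
```

Under these assumed numbers, the mixed-batch array gets a window several times longer between the first and second failure, and that window is the time you have to swap the dead drive and finish a rebuild.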
Thanks for the grand effort, Commander Ween.
 
If post stickers are taking so much time, maybe consider not backing them up. We can live without imaginary internet awards.
 
Likewise, you usually have drives built to a certain spec coming from multiple manufacturers/batches to prevent simultaneous drive failures.

It still happens all the fukkin time. One drive fails, leading the other drives to become more stressed, and then another drive fails.
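The cascade described above (one failure putting extra load on the survivors until another one dies) can be put into rough numbers. This is a toy model, not a reliability standard: it assumes each surviving drive fails independently with exponential lifetimes at a hypothetical MTTF, and uses a made-up `stress_factor` as a crude stand-in for rebuild load. Drives from the same batch are not really independent, which would make the real risk worse than this estimate.

```python
import math


def p_second_failure_during_rebuild(
    surviving_drives: int,
    rebuild_hours: float,
    mttf_hours: float,
    stress_factor: float = 1.0,
) -> float:
    """Probability that at least one surviving drive fails before the rebuild finishes.

    Toy model: each surviving drive fails independently at a constant rate of
    stress_factor / mttf_hours (exponential lifetimes). stress_factor > 1 is an
    assumed multiplier for the extra load a rebuild puts on the remaining drives.
    """
    if surviving_drives < 0 or rebuild_hours < 0 or mttf_hours <= 0 or stress_factor < 0:
        raise ValueError("invalid parameters")
    # Independent exponential failures: the combined rate is the sum of the rates.
    combined_rate = surviving_drives * stress_factor / mttf_hours
    return 1.0 - math.exp(-combined_rate * rebuild_hours)


if __name__ == "__main__":
    for stress in (1.0, 3.0):
        p = p_second_failure_during_rebuild(3, rebuild_hours=24, mttf_hours=100_000, stress_factor=stress)
        print(f"stress x{stress}: {p:.4%}")
```

With three surviving drives, a hypothetical 100,000-hour MTTF, and a 24-hour rebuild, tripling the stress factor roughly triples the chance of losing a second drive mid-rebuild, because for small x, 1 - e^(-x) is approximately x. Longer rebuilds on bigger drives push the number up the same way.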
 