The Year of Endless Technical Problems

Diggus Bickus · Sep 10, 2025

Just had this happen by merely clicking "What's new" by the way.

Null · Sep 10, 2025

Vecr said:
If anything looks weird in here, don't run it. It's pretty straightforward though:

this has absolutely and completely and totally slaughtered the system. iowait is 25%+ with no cpu usage.
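For anyone wanting to put a number on the iowait figure quoted above, here is a minimal sketch that derives it from two samples of the aggregate `cpu` line in /proc/stat. The function name `iowait_pct` is made up for illustration, and the sketch assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# iowait_pct BEFORE AFTER
# BEFORE and AFTER are two "cpu ..." summary lines read from /proc/stat.
# Prints the share of elapsed CPU time spent in iowait, as a percentage.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # fields 2..9: user nice system idle iowait irq softirq steal
            # (guest time is already included in user/nice, so it is skipped)
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            t[NR] = total
            w[NR] = $6
        }
        END {
            dt = t[2] - t[1]
            if (dt > 0) printf "%.1f\n", 100 * (w[2] - w[1]) / dt
            else print "0.0"
        }'
}

# Live usage (two samples, five seconds apart):
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   echo "iowait: $(iowait_pct "$a" "$b")%"
```

A sustained high value here with little user or system time usually points at storage rather than CPU, which matches what is being described.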

Null · Sep 10, 2025

SNEED must be mq-deadline or the site eats shit.

oh yeah oh yeah oh yeah oh yeah

babaisyou · Sep 10, 2025

If SNEED is having issues with BFQ then you might benefit from setting slice_idle to zero, per this kernel document. But I believe ZFS has its own IO scheduler so it shouldn't be using one.

CHUCK should also have no IO scheduler on it as it is NVMe backed.
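A rough sketch of how to see which scheduler each block device is actually using. The kernel marks the active one with square brackets in /sys/block/DEV/queue/scheduler. The helper names `active_sched` and `list_scheds` are hypothetical, and `sdX` in the comments is a placeholder, since the thread never says which devices back SNEED and CHUCK.

```shell
#!/bin/sh
# active_sched "none [mq-deadline] kyber bfq"  prints: mq-deadline
# Extracts the bracketed (active) entry from a scheduler sysfs line.
active_sched() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Print every block device together with its active scheduler.
list_scheds() {
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s\t%s\n' "$dev" "$(active_sched "$(cat "$f")")"
    done
}

# Changing it needs root and does not survive a reboot (sdX is a placeholder):
#   echo mq-deadline > /sys/block/sdX/queue/scheduler
# With BFQ active, the tunable mentioned above lives here:
#   echo 0 > /sys/block/sdX/queue/iosched/slice_idle
```

Running `list_scheds` before and after a change is a cheap way to confirm the setting stuck.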

Vecr · Sep 10, 2025

Null said:
this has absolutely and completely and totally slaughtered the system. iowait is 25%+ with no cpu usage.

Is

Code:

$ zcat /proc/config.gz | grep CONFIG_HZ_1000
CONFIG_HZ_1000=y

on your machine?
Sorry for not telling you about that. You need a fast tick rate. Back to the drawing board, I'll try to test with a load that creates higher IOWAIT on a system I have access to.
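If /proc/config.gz is missing (it only exists when the kernel was built with CONFIG_IKCONFIG_PROC), distro kernels usually ship their config under /boot. A minimal sketch, with the helper name `kernel_hz` made up for illustration:

```shell
#!/bin/sh
# kernel_hz [CONFIG_FILE]
# Prints the numeric tick rate (CONFIG_HZ) of a kernel config.
# Defaults to the config of the running kernel under /boot,
# falling back to /proc/config.gz when that file is absent.
kernel_hz() {
    cfg=${1:-/boot/config-$(uname -r)}
    if [ -r "$cfg" ]; then
        sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$cfg"
    elif [ -r /proc/config.gz ]; then
        zcat /proc/config.gz | sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p'
    else
        echo "no kernel config found" >&2
        return 1
    fi
}

# Usage:
#   kernel_hz        # prints 1000 on a CONFIG_HZ_1000=y kernel
```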

Do you have a particular NUMA setup? What program gets pinned to what node, that sort of thing. That might be useful to know for testing.
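For the NUMA question, something like the following would show which CPUs belong to which node straight from sysfs. This is only a sketch: `numa_layout` is a hypothetical helper, and a single-socket machine will simply report node0.

```shell
#!/bin/sh
# numa_layout [SYSFS_NODE_DIR]
# Prints one line per NUMA node: node name, a tab, then its CPU list.
numa_layout() {
    root=${1:-/sys/devices/system/node}
    for n in "$root"/node[0-9]*; do
        [ -r "$n/cpulist" ] || continue
        printf '%s\t%s\n' "${n##*/}" "$(cat "$n/cpulist")"
    done
}

# To see where a running process is allowed to run (PID is a placeholder):
#   taskset -cp PID
```

Comparing the `taskset` output for the database and web server processes against this list tells you whether they share a node.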

Ah yeah, the scheduler behaves differently when IOWAIT is high but CPU is low.

Null · Sep 10, 2025

Vecr said:
on your machine?

No.

I've reconfigured MariaDB to be less resource hoggy. I am seeing windows where the site is blazing fast and I am trying to figure out how.

Lian Xing · Sep 10, 2025

>an unexpected database error occurred.
I thought we'd lost the ability to sneed forever.

Harvey Danger · Sep 10, 2025

Lian Xing said:
>an unexpected database error occurred.
I thought we'd lost the ability to sneed forever.

Buckle up cowboy, we're testing in PROD.

Null said:
SNEED must be mq-deadline or the site eats shit.

That's weird. You said SNEED is an SSD, have you tried explicitly setting it to none already?

Null · Sep 10, 2025

Harvey Danger said:
That's weird. You said SNEED is an SSD, have you tried explicitly setting it to none already?

yes as previously explained it killed the site

gentoofag · Sep 10, 2025

Do you have compression enabled on any filesystems or zfs datasets? If so turn it off right now. I had a problem where my system would stall for a minute after using a Windows VM. Turns out it was caused by f2fs's kernel threads compressing all the writes that were done to the VM image.
A way to check for similar issues is to show kernel threads in htop (shift-k) and look for ones with high priority and CPU usage. Also keep a window open with dmesg -w -H and watch for anything interesting to show up.
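A quick sketch of the compression check for ZFS, assuming the `zfs` CLI is installed. `zfs get -H` emits tab-separated output, so a small filter is enough. The helper name `compressed_datasets` is made up for illustration.

```shell
#!/bin/sh
# compressed_datasets
# Reads "name<TAB>value" lines on stdin and prints only the datasets
# whose compression property is something other than "off".
compressed_datasets() {
    awk -F '\t' '$2 != "off" { print $1 "\t" $2 }'
}

# Usage against live pools:
#   zfs get -H -o name,value -t filesystem,volume compression | compressed_datasets
#
# Kernel threads (children of kthreadd, PID 2) sorted by CPU usage:
#   ps --ppid 2 -o pid,pri,pcpu,comm --sort=-pcpu | head
```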

Vecr · Sep 10, 2025

I didn't make much progress. I don't want to bother you more. If you get a CONFIG_HZ_1000=y kernel and details on your NUMA setup (if you have one) I can try to help again.

SCV · Sep 10, 2025

I had some more thoughts about this since yesterday. The correct way is still to add monitoring until the problem becomes apparent, but seeing as we're doin' the cowboy thing I have a few things to try that so far haven't been suggested (in this thread at least).

Have you tried turning pcie power management off? Just add "pcie_aspm=off" to the grub linux command, update grub, and reboot. I've seen a few times where buggy power management can tank performance or imitate a flaky pcie device or connection. And since you have nvme drives...
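After the reboot it is worth confirming the parameter actually took effect. A minimal sketch, where `has_karg` is a hypothetical helper that checks for an exact, whole-word kernel argument. On Debian the argument would normally go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `update-grub`.

```shell
#!/bin/sh
# has_karg CMDLINE ARG
# Succeeds if ARG appears as a complete, space-separated word in CMDLINE.
has_karg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   has_karg "$(cat /proc/cmdline)" pcie_aspm=off && echo "ASPM disabled at boot"
# The policy currently in force is shown in [brackets] here:
#   cat /sys/module/pcie_aspm/parameters/policy
```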

I assume the server has ECC memory, but do you have rasdaemon set up so you will actually see ECC (and other machine check) errors? ECC errors will tank performance but can be sporadic, depending on what (if anything) is using that memory or even on memory temperature, and the machine won't necessarily crash if the ECC can recover. Since the site hasn't been down for several days recently you probably haven't run memtest, but at this point it might be worth it. You MUST use the free version of memtest86+ from the company website. The one bundled with most linux distros WILL NOT REPORT CORRECTED ECC ERRORS.
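Even without rasdaemon, the kernel's EDAC driver keeps running totals of corrected and uncorrected errors in sysfs, provided an EDAC module for the platform is loaded. A rough sketch for reading them; the helper name `ecc_counts` is made up for illustration:

```shell
#!/bin/sh
# ecc_counts [EDAC_MC_DIR]
# Prints one line per memory controller with its corrected (ce) and
# uncorrected (ue) error totals since boot.
ecc_counts() {
    root=${1:-/sys/devices/system/edac/mc}
    for mc in "$root"/mc[0-9]*; do
        [ -r "$mc/ce_count" ] || continue
        printf '%s\tce=%s\tue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}

# With rasdaemon installed and running, its own tool gives a similar view:
#   ras-mc-ctl --error-count
```

No output at all means no memory controller is registered with EDAC, which is itself worth knowing.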

I know you said you use debian but we are on a new-ish server. What kernel version are we on currently? If it's older a yolo upgrade to the newest LTS might just werk (YeeHaw!)

I presume you've checked dmesg for anything suspicious. But giving us a copy of dmesg to look at might yield some clues.

Edit: I feel like this must've been checked but during the slowness there's no packet loss right?

Null · Sep 10, 2025

It's so weird, I've had this burning all consuming desire to fix the site all week, I sat down and did 6 hours of work on it today, and almost as soon as I got it working great, Charlie got shot.

Looseleaf Paper · Sep 10, 2025

Null said:
It's so weird, I've had this burning all consuming desire to fix the site all week, I sat down and did 6 hours of work on it today, and almost as soon as I got it working great, Charlie got shot.

Is the traffic from the shooting killing the clear net? I had to dust off my TOR browser to post this.

Null · Sep 10, 2025

No. DNS issue. Will resolve itself.

The Noise · Sep 11, 2025

how it feels to finally be able to use 3000+ page threads again without the site shitting itself and doing nothing

thanks null

skunt · Sep 18, 2025

any updates on this @Null, did you manage to fix it?
did anything here help?

Provably Wrong · Sep 18, 2025

In last week’s MATI, Josh said AI suggested an issue with having lots of requests allocating and releasing lots of memory each and reducing that happened to solve the issue. Not because of not enough memory but because you can’t do infinity of these memory operations at once and apparently we hit the limit because Josh was feeling RAM rich and upped the spending limits like a nigger getting his first credit card.

At this rate he might just abandon us and just post to his AI so be can get the answers he wants, and just have AI Josh niggerpost in a random thread every other day.

Margo Martindale · Sep 19, 2025

It still feels kinda slow, like half the time the reaction image icons are not even loading for me, and some images

MerelyAPlateOfSpaghetti · Sep 19, 2025

Margo Martindale said:
It still feels kinda slow, like half the time the reaction image icons are not even loading for me, and some images

I don't disagree, but it's been reliably slow. No more random 504 errors, very few "clicked a link and it took 15 seconds to load" issues. That's a major step in the right direction.
