RAID B is a 7TB RAID 6 NVMe array named 'CHUCK'.
This post got long as shit so I've split it up into sections. The section I highlighted is the one I reckon is the most valuable in terms of low risk / high potential reward and very little effort required.
RAID
What's the purpose of CHUCK? If it's running database shit, RAID 10 is probably a better choice: rebuilds after a failure are much faster, and since it's a stripe of mirrors, read and write performance scales with the number of mirror pairs you have.
Also, you mentioned ZFS for the first RAID but just called this one RAID 6. Is CHUCK actually some form of hardware RAID, or Linux software RAID (md or LVM)? If it's not ZFS I'd suggest converting it into a ZFS pool of some type.
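If you do go that route, the two layouts look roughly like this. These are sketches only: the pool name and the nvme device names are made up, so substitute your actual drives.

# striped mirrors (ZFS's RAID 10 equivalent): fast rebuilds, performance scales with mirror pairs
zpool create chuck mirror /dev/nvme0n1 /dev/nvme1n1 mirror /dev/nvme2n1 /dev/nvme3n1

# raidz2 (ZFS's RAID 6 equivalent): more usable capacity, slower rebuilds
zpool create chuck raidz2 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

Either way you get checksumming, snapshots and sane tooling on top, which you don't get out of md or a hardware controller.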
nginx
Last year I wrote a long ass post talking about proxy request buffering in nginx in response to you providing samples of the middle node's config. The idea was to address uploads stalling at 100% due to the middle nodes soaking up the request buffer before sending it on. The config you were running at that time looked vulnerable to goloris-style worker exhaustion attacks, and if you haven't tightened up the timeouts since then, you should. The default worker_connections is 512 per worker, so with worker_processes set to auto you cap out at roughly 512 * cpu_cores connections.
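To be clear about the sort of knobs I mean, a rough sketch follows. The values are illustrative, not tuned for your traffic, and the 'backend' upstream name is a placeholder:

events {
    worker_connections 8192;            # default is 512 per worker
}

http {
    # cap concurrent connections per client IP so one slow client can't hog workers
    limit_conn_zone $binary_remote_addr zone=per_ip:10m;

    server {
        limit_conn per_ip 20;

        # drop slow/idle clients quickly (the goloris-style slow POST scenario)
        client_header_timeout 10s;
        client_body_timeout   10s;
        send_timeout          10s;

        location / {
            # stream large uploads straight through instead of soaking them up on the middle node
            proxy_request_buffering off;
            proxy_pass http://backend;
        }
    }
}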
TCP Congestion Control Algorithms
The default TCP congestion control algorithm in the Linux kernel is CUBIC and frankly it sucks shit if you have any packet loss whatsoever. It uses packet loss as its feedback mechanism to say "Holy shitballs we're overloading this link!" and back off, creating a sawtooth pattern all too familiar to anyone who stares at network graphs all day.
Google created an algorithm relatively recently called BBR that builds a model of the path's bottleneck bandwidth and round-trip time instead of reacting to loss. The effect is that transfer rates stay high even if you're experiencing some packet loss, because loss isn't used as a signal to back off. I switched to BBR a few weeks ago on BMJ TV and it did show an improvement in estimated connection speeds for clients.
If you do set this up, you'll need to switch the algorithm to BBR from the backend all the way to the L4 frontends to experience any real benefit. No reboot required and it can coexist fine with CUBIC end-users.
Add net.ipv4.tcp_congestion_control = bbr to your sysctl config to use BBR.
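Something like the following, assuming a reasonably recent kernel (BBR has shipped as a module since 4.9); the file name is just a convention:

# check BBR is actually available on the box
sysctl net.ipv4.tcp_available_congestion_control

# /etc/sysctl.d/99-bbr.conf
net.core.default_qdisc = fq              # fq pacing is commonly recommended alongside BBR
net.ipv4.tcp_congestion_control = bbr

# apply without a reboot, then confirm
sysctl --system
sysctl net.ipv4.tcp_congestion_control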
MTU
Between your zero trust L4 proxies, Kiwi Flare and the Super Secret backend, are you doing any tunneling? IPsec, GRE, WireGuard, OpenVPN... anything? The reason I ask is that encapsulation adds overhead, and if not every hop along the journey is fully aware of the extent of that overhead, you will have issues.
What I would suggest is if you're using WireGuard (as an example) between the L4 proxy and Kiwi Flare, drop the MTU of the public-facing interface to match the tunnel MTU (1420 bytes by default). The reason is that an MSS of 1460/1440 bytes (IPv4 and IPv6 respectively) gets negotiated because both ends think they have 1500 bytes to work with, but when a full-size packet hits the tunnel it won't fit and will either get fragmented or dropped (depending on the DF flag).
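Concretely, something like this. eth0 is a placeholder interface name, so point the commands at whatever you're actually running:

# drop the public-facing interface to match the tunnel MTU
ip link set dev eth0 mtu 1420

# or, if you'd rather not touch the interface, clamp TCP MSS to the path MTU on the tunnel box
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu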
This all being said, it's a tad unlikely that's the major cause of performance issues. MTU size issues generally cause packet drops, not inexplicable sluggishness, and it's actually pretty easy to rule in or out: just drop your client to the practical minimum MTU (1280 bytes, the IPv6 floor) and see if the site magically becomes more usable. I've already tried this and honestly the site still runs like shit. Still, it doesn't hurt to spend a few minutes with a calculator to make sure there are no unaddressed bottlenecks.
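You can also probe the path without touching interface MTUs at all: ping with the don't-fragment bit set and step the payload size down until replies come back. The hostname here is a stand-in for whatever endpoint you want to test:

# 1472 bytes of payload + 8 ICMP + 20 IP = a full 1500-byte packet
ping -M do -s 1472 example.com

# 1392 bytes of payload = a 1420-byte packet, which a WireGuard hop should pass
ping -M do -s 1392 example.com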
Virtualization
Are you doing any virtualization at all? Is everything running bare metal? If so, I'd suggest you seriously consider a re-architecture: convert the host into a hypervisor and split the services out into separate guest VMs.
MariaDB, the self-hosted S3 shit, Redis, web frontends, the reverse proxies themselves: they should all be on separate VMs. You're not able to effectively utilize the resources you have because you're butting up against OS limits. If you find yourself having to go down the path of investigating TCP port exhaustion or ulimits, then you're at the point you need to figure out "How do I spread my resources out?"
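If you want to see whether you're actually bumping into those limits, a couple of quick checks (the defaults mentioned are stock Linux values, nothing specific to your boxes):

# ephemeral port range available for outbound connections (default is 32768-60999, about 28k ports)
cat /proc/sys/net/ipv4/ip_local_port_range

# socket summary, including how many are parked in TIME-WAIT
ss -s

# open file descriptor limits for the shell and for a running service
ulimit -n
cat /proc/<pid>/limits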
My suggestion is to create a separate VM for each of the services which can't be easily clustered (or where clustering is just too painful). Focus on scaling out the web frontends (the shit running PHP-FPM) to where you have something like 4 upstreams in nginx serving up XenForo pages. When backends start crumbling under pressure, just add more instead of trying to figure out how to stretch their resources.
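The nginx side of that fan-out is just an upstream block; the name and addresses below are made up for illustration:

upstream xenforo_php {
    least_conn;                  # hand new requests to whichever backend is least busy
    server 10.0.0.11:80;
    server 10.0.0.12:80;
    server 10.0.0.13:80;
    server 10.0.0.14:80;         # adding capacity later is just another line here
}

server {
    location / {
        proxy_pass http://xenforo_php;
    }
}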
All of this can operate on an internal network that exists within the hypervisor itself so you aren't exhausting external IP resources and having to fret over firewall rules. Use a software router like OPNsense to act as the gateway for this network and establish tunnels back to Kiwi Flare nodes, or expose one or more reverse proxies to the Internet and set up iptables rules in Proxmox to manage inbound traffic. This isn't the only answer; you've got a lot of options for how to do this.
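If you go the Proxmox route, the inbound plumbing is a couple of iptables rules on the host. vmbr0 is the stock Proxmox bridge name; the internal subnet and proxy VM address are placeholders:

# requires net.ipv4.ip_forward = 1 on the host
# forward inbound HTTPS from the public bridge to a reverse proxy VM on the internal network
iptables -t nat -A PREROUTING -i vmbr0 -p tcp --dport 443 -j DNAT --to-destination 10.10.10.2:443

# let the VMs reach out through the host's public address
iptables -t nat -A POSTROUTING -s 10.10.10.0/24 -o vmbr0 -j MASQUERADE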
Server Sperg User Group
Last thing I'll bring up is that it sounds like you're pretty much trying to do everything alone and only reaching out to the community as an absolute last resort. A lot of the replies in this thread are useless noise, even the ones that are well intentioned, and it makes it much harder to figure out what has already been suggested and find additional information you've shared.
What I reckon might be helpful is a post in somewhere like Supporters, Inner Circle or I&T which is strictly a place for you to ask questions and get suggestions. Nothing off topic allowed, no glib answers tolerated, instant thread ban for shit stirrers or brainlets.
Edit: Thank you XenForo for adding [ICODE] into every fucking paragraph for no reason! Very Cool!