AI Megathread

Llama 4 failed because it was the first model to launch with broken day-one support (that's now industry standard and expected), but more importantly it effectively dropped support for GPUs with less than 80GB of VRAM. People got mad when it didn't work out of the box on their 3080 and shit their pants that they couldn't run it, so all their opinions are second-hand from people who had a bad experience with an "added support for Llama 4" patch that didn't work perfectly. A bad first impression caused by bad implementations of a broken chat template killed it.
What is it with the number 4 and shit-tier support? Gemma 4 came out without a valid template for its think tags ready in SillyTavern and other consumer frontends. On top of that, it can't keep the opening think tags straight in any of the quants, despite Google being the ones who shipped the turbo quant themselves.
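For anyone who wants to check this themselves instead of trusting a frontend: the quickest sanity test is to render the model's bundled chat template directly and look at what the generation prompt actually opens with. A minimal sketch using Hugging Face transformers (the model name is a placeholder, substitute whatever checkpoint you're debugging):

```python
# Render a model's bundled Jinja chat template and inspect the raw prompt.
# If the frontend's expectations (e.g. a leading <think> tag) don't match
# what this prints, the template is the problem, not the weights.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/some-reasoning-model")  # placeholder
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,             # return the raw string instead of token ids
    add_generation_prompt=True, # append the assistant-turn prefix
)
print(repr(rendered))  # check how (or whether) the think tag gets opened
```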
 
Llama 4 failed because it was the first model to launch with broken day-one support (that's now industry standard and expected), but more importantly it effectively dropped support for GPUs with less than 80GB of VRAM. […] A bad first impression caused by bad implementations of a broken chat template killed it.
I never heard anything about "day 1 support" being brought up when it comes to Llama 4; it's universally agreed that the models were simply dogshit, and that's why it flopped.


Artificial Analysis estimates that Scout (109B-A17B-16E, i.e. 109B total parameters, 17B active parameters, and 16 experts) has an intelligence index of 14, while Maverick (400B-A17B-128E) scores 18.
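To unpack that notation for anyone following along: total parameters set the memory floor (every expert has to be resident somewhere), while active parameters set the per-token compute. A back-of-the-envelope sketch, my arithmetic rather than anything from Artificial Analysis:

```python
# Rough weight-memory math for the "109B-A17B-16E" shorthand. Ignores KV
# cache, activations and runtime overhead, so real usage is strictly higher.
def weight_gib(n_params: float, bits_per_weight: int) -> float:
    return n_params * bits_per_weight / 8 / 2**30

total_params = 109e9   # Scout: all 16 experts combined
active_params = 17e9   # what actually runs per token

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gib(total_params, bits):6.1f} GiB")
# 16-bit: ~203 GiB, 8-bit: ~102 GiB, 4-bit: ~51 GiB -- even aggressively
# quantized, the full expert set dwarfs any consumer card's VRAM.
```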

[attached benchmark screenshot]
If you look at benchmarks you'll find that Maverick is one of the worst-performing frontier models ever released, and a score of 18 is so disgusting that even Gemma 4 E4B (an 8B model with 4.5B active parameters) beats it, a model that's a fraction of Maverick's size (roughly 50x smaller).
[attached benchmark screenshot]

[attached screenshot of online discussions]
As you can see, discussion online is focused entirely on its performance, its possibly fraudulent benchmarks, and the fact that Llama 4 was so disappointing, despite all the pre-launch hype, that Meta cancelled Behemoth altogether. So I genuinely don't know what you're talking about when you're describing people "shitting their pants" because of "day 1 support" and "hardware problems".
 
So I genuinely don't know what you're talking about when you're describing people "shitting their pants" because of "day 1 support" and "hardware problems".
I'm not dying on the hill of Llama 4, I'm just saying it was never given a fair shot at life.
1. What do you think all these benchmarks are based on? They were all run on day-one launch implementations. This was the first open-source MoE model, to my recollection, and I don't think anyone got the implementation right. Just compare this to the release of Gemma 4, where tons of people couldn't get anything but retarded garbage out of it for the first month while others swore by it.
2. I never mentioned hardware problems; I was alluding to how the model was designed to run on an H100 rather than locally on a 3060. People were anticipating a Llama 4 model small enough that they'd be able to run it, but there was no 9B Llama 4. Expert offloading wasn't a thing yet, so the only way to run it locally was entirely on CPU, and only if you had enough RAM (see the sketch below). That's obviously not the same thing as "bad performance". This made people very angry and uncharitable, and it's the reason none of the benchmarks were ever corrected: because everyone abandoned it within a week, none of the issues ever got discovered and fixed like they were with Gemma 4.
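To make the expert-offloading point concrete, here's a toy sketch of the idea (pure PyTorch, toy dimensions, nothing like Llama 4's real architecture): the full expert set stays parked in system RAM, and only the couple of experts the router actually picks get touched per token. Roughly what current runtimes do for you, and what simply didn't exist at Llama 4's launch.

```python
# Toy mixture-of-experts forward pass where expert weights live on CPU and
# only the routed top-k experts are moved to the accelerator per token.
import torch

n_experts, top_k, d = 16, 2, 64
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]  # parked in RAM
router = torch.nn.Linear(d, n_experts)

@torch.no_grad()
def moe_forward(x: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    weights, idx = torch.topk(torch.softmax(router(x), dim=-1), top_k)
    out = torch.zeros_like(x)
    for w, i in zip(weights, idx):
        expert = experts[i].to(device)         # ship just this expert over
        out += w * expert(x.to(device)).cpu()  # compute, accumulate on CPU
        experts[i] = expert.cpu()              # evict so VRAM stays small
    return out

print(moe_forward(torch.randn(d)).shape)  # runs fine with device="cpu" too
```

The punchline is that per-token traffic is two experts' worth of weights rather than sixteen, which is why a big MoE is now tractable on one consumer GPU plus system RAM.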

Also, are you really comparing it to a model over a year newer, with reasoning? Why not compare it to a contemporary like GPT-4o? That's what it was designed to go up against.
[attached: Artificial Analysis Intelligence Index - Results (13 May '26)]
[attached: Artificial Analysis Intelligence Index - Token Usage (13 May '26)]
[attached: Artificial Analysis Intelligence Index - Cost Breakdown (13 May '26)]
According to the very website you're referencing, Gemma 4 E4B without reasoning was more expensive to run than both GPT OSS 120B with reasoning and Llama 4 Maverick, which is obviously bullshit. So no, I don't trust anything but real-world usage.
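For context on how a number like that can even come out of such an index (whether or not you buy it): as far as I can tell these cost figures are just tokens-used-per-task multiplied by the hosted price per token, so a small model that rambles, or is only hosted at an unfavourable price, can land above a bigger one. Illustrative arithmetic only, every number below is made up:

```python
# Total benchmark cost = tokens generated per task x price per token, so the
# ordering doesn't have to follow model size. All figures are invented.
price_per_mtok = {"small-verbose-model": 0.40, "big-moe-model": 0.25}  # $/1M tokens
tokens_per_task = {"small-verbose-model": 9_000, "big-moe-model": 2_500}

for name in price_per_mtok:
    cost = tokens_per_task[name] * price_per_mtok[name] / 1e6
    print(f"{name}: ${cost:.4f} per task")
# small-verbose-model: $0.0036 per task vs big-moe-model: $0.0006 per task
```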
 