AI Megathread

Llama 4 failed because it was the first model to launch with broken day-one support (that's now industry standard and expected), but more importantly it effectively dropped support for GPUs with less than 80GB of VRAM. People got mad when it didn't work out of the box on their 3080 and shit their pants that they couldn't run it, so all their opinions are second-hand from people who had a bad experience with an "added support for Llama 4" patch that didn't work perfectly. A bad first impression caused by bad implementations of a broken chat template killed it.
What is it with the number 4 and shit-tier support? Gemma 4 came out without a valid template for its think tags ready in SillyTavern and other consumer frontends. On top of that, it can't keep the opening think tags straight in any of the quants, despite Google being the ones who shipped the turbo quant themselves.
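For anyone who wants to check this themselves instead of trusting a frontend: the quickest sanity test is to render the model's bundled chat template directly and look at what the generation prompt actually opens with. A minimal sketch using Hugging Face transformers (the model name is a placeholder, substitute whatever checkpoint you're debugging):

```python
# Render a model's bundled Jinja chat template and inspect the raw prompt.
# If the frontend's expectations (e.g. a leading <think> tag) don't match
# what this prints, the template is the problem, not the weights.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-org/some-reasoning-model")  # placeholder
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "hello"}],
    tokenize=False,             # return the raw string instead of token ids
    add_generation_prompt=True, # append the assistant-turn prefix
)
print(repr(rendered))  # check how (or whether) the think tag gets opened
```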
 
Llama 4 failed because it was the first model to launch with broken day-one support (that's now industry standard and expected), but more importantly it effectively dropped support for GPUs with less than 80GB of VRAM. […] A bad first impression caused by bad implementations of a broken chat template killed it.
I never heard anything about "day 1 support" being brought up when it comes to Llama 4; it's universally agreed that the models were simply dogshit, and that's why it flopped.


Artificial Analysis estimates that Scout (109B-A17B-16E, i.e. 109B total parameters, 17B active parameters, and 16 experts) has an intelligence index of 14, while Maverick (400B-A17B-128E) scores 18.
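To unpack that notation for anyone following along: total parameters set the memory floor (every expert has to be resident somewhere), while active parameters set the per-token compute. A back-of-the-envelope sketch, my arithmetic rather than anything from Artificial Analysis:

```python
# Rough weight-memory math for the "109B-A17B-16E" shorthand. Ignores KV
# cache, activations and runtime overhead, so real usage is strictly higher.
def weight_gib(n_params: float, bits_per_weight: int) -> float:
    return n_params * bits_per_weight / 8 / 2**30

total_params = 109e9   # Scout: all 16 experts combined
active_params = 17e9   # what actually runs per token

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gib(total_params, bits):6.1f} GiB")
# 16-bit: ~203 GiB, 8-bit: ~102 GiB, 4-bit: ~51 GiB -- even aggressively
# quantized, the full expert set dwarfs any consumer card's VRAM.
```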

[attached benchmark screenshot]
If you look at benchmarks you'll find that Maverick is one of the worst-performing frontier models ever released, and a score of 18 is so disgusting that even Gemma 4 E4B (an 8B model with 4.5B active parameters) beats it, a model that's a fraction of Maverick's size (roughly 50x smaller).
[attached benchmark screenshot]

[attached screenshot of online discussions]
As you can see, discussion online is focused entirely on its performance, its possibly fraudulent benchmarks, and the fact that Llama 4 was so disappointing, despite all the pre-launch hype, that Meta cancelled Behemoth altogether. So I genuinely don't know what you're talking about when you're describing people "shitting their pants" because of "day 1 support" and "hardware problems".
 
So I genuinely don't know what you're talking about when you're describing people "shitting their pants" because of "day 1 support" and "hardware problems".
I'm not dying on the hill of Llama 4, I'm just saying it was never given a fair shot at life.
1. What do you think all these benchmarks are based on? They were all run on day-one launch implementations. This was the first open-source MoE model, to my recollection, and I don't think anyone got the implementation right. Just compare this to the release of Gemma 4, where tons of people couldn't get anything but retarded garbage out of it for the first month while others swore by it.
2. I never mentioned hardware problems; I was alluding to how the model was designed to run on an H100 rather than locally on a 3060. People were anticipating a Llama 4 model small enough that they'd be able to run it, but there was no 9B Llama 4. Expert offloading wasn't a thing yet, so the only way to run it locally was entirely on CPU, and only if you had enough RAM (see the sketch below). That's obviously not the same thing as "bad performance". This made people very angry and uncharitable, and it's the reason none of the benchmarks were ever corrected: because everyone abandoned it within a week, none of the issues ever got discovered and fixed like they were with Gemma 4.
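To make the expert-offloading point concrete, here's a toy sketch of the idea (pure PyTorch, toy dimensions, nothing like Llama 4's real architecture): the full expert set stays parked in system RAM, and only the couple of experts the router actually picks get touched per token. Roughly what current runtimes do for you, and what simply didn't exist at Llama 4's launch.

```python
# Toy mixture-of-experts forward pass where expert weights live on CPU and
# only the routed top-k experts are moved to the accelerator per token.
import torch

n_experts, top_k, d = 16, 2, 64
experts = [torch.nn.Linear(d, d) for _ in range(n_experts)]  # parked in RAM
router = torch.nn.Linear(d, n_experts)

@torch.no_grad()
def moe_forward(x: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    weights, idx = torch.topk(torch.softmax(router(x), dim=-1), top_k)
    out = torch.zeros_like(x)
    for w, i in zip(weights, idx):
        expert = experts[i].to(device)         # ship just this expert over
        out += w * expert(x.to(device)).cpu()  # compute, accumulate on CPU
        experts[i] = expert.cpu()              # evict so VRAM stays small
    return out

print(moe_forward(torch.randn(d)).shape)  # runs fine with device="cpu" too
```

The punchline is that per-token traffic is two experts' worth of weights rather than sixteen, which is why a big MoE is now tractable on one consumer GPU plus system RAM.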

Also, are you really comparing it to a model over a year newer, with reasoning? Why not compare it to a contemporary like GPT-4o? That's what it was designed to go up against.
[attached: Artificial Analysis Intelligence Index - Results (13 May '26)]
[attached: Artificial Analysis Intelligence Index - Token Usage (13 May '26)]
[attached: Artificial Analysis Intelligence Index - Cost Breakdown (13 May '26)]
According to the very website you're referencing, Gemma 4 E4B without reasoning was more expensive to run than both GPT OSS 120B with reasoning and Llama 4 Maverick, which is obviously bullshit. So no, I don't trust anything but real-world usage.
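For context on how a number like that can even come out of such an index (whether or not you buy it): as far as I can tell these cost figures are just tokens-used-per-task multiplied by the hosted price per token, so a small model that rambles, or is only hosted at an unfavourable price, can land above a bigger one. Illustrative arithmetic only, every number below is made up:

```python
# Total benchmark cost = tokens generated per task x price per token, so the
# ordering doesn't have to follow model size. All figures are invented.
price_per_mtok = {"small-verbose-model": 0.40, "big-moe-model": 0.25}  # $/1M tokens
tokens_per_task = {"small-verbose-model": 9_000, "big-moe-model": 2_500}

for name in price_per_mtok:
    cost = tokens_per_task[name] * price_per_mtok[name] / 1e6
    print(f"{name}: ${cost:.4f} per task")
# small-verbose-model: $0.0036 per task vs big-moe-model: $0.0006 per task
```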
 