I've been burning so many tokens with OpenClaw via OpenRouter. Z.ai's GLM 5.1 is UNREAL. Took me a while to switch from Kimi and even K2.6 sucks in comparison. Gemma 4 31B is great for spawning low cost subagents.
Hey, an OpenClaw user! I was just roasting OpenClaw in the AI Derangement CW thread, and no one else in the thread had any experience with it to explain what it's really good for. What do you use it for? Good news: unless you have a hand-curated OpenClaw setup that segregates the AI-required functionality from the programmable functionality, you may stand to save a lot of money in API costs. Harnesses are extremely token-inefficient in general.
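To make the "segregation" point concrete, here's a minimal sketch of doing the programmable part in plain code and spending tokens only on the one step that actually needs a model. Assumes OpenRouter's OpenAI-style chat completions endpoint; the model id and task are placeholders, not recommendations:

```python
# Minimal sketch: segregate programmable work from AI-required work.
import requests

# Programmable part: gather and filter data in plain code, zero tokens spent.
feeds = ["release notes", "changelog", "three bug reports"]  # stand-in for real scraping
task = "Summarize the following for a weekly digest:\n" + "\n".join(feeds)

# AI-required part: one direct API call instead of an agent loop
# burning tokens on tool-use chatter.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENROUTER_KEY"},
    json={"model": "some-provider/some-model",  # placeholder model id
          "messages": [{"role": "user", "content": task}]},
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

The harness would spend thousands of tokens narrating its way through the same fetch-filter-summarize loop.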
OpenClaw has one-shotted a certain type of productivity-maxxing tech hustler. Those guys who relentlessly optimize their workflows and second brains in service of writing tweets about workflow optimization and second brains.
In my experience it doesn't do much you can't already do better with Claude Code/Codex/OpenCode, and you get less visibility over the process because you're talking to an agent running on remote hardware over Telegram.
Its main selling point is that it lets you see yourself as a high-powered executive who delegates lots of stuff to your robot PA. Never mind...
I'm a lawyer too (BigLaw for Public Entities). It is unreal how many pro pers are filing Complaints using AI. Most, however, are easily susceptible to Demurrers and Motions to Dismiss. They never want to meet and confer, because over the phone they would be exposed as knowing nothing about civil procedure or what they actually filed.
I can tell most use ChatGPT, which is the worst for legal writing. Opus is by far the best for drafting legal work, although Gemini with Deep Research can usually catch a case cite or two where the citation doesn't really jibe.
If I may derail the thread, how are industry lawyers (mis)using AI? I might be the mirror image of you (ML engineer with autistic interest in law). A lot of my leftover tokens go to asking Claude to generate simple imaginary LARP cases (like small business tax assessments), then I practice writing filings/opinions for them and ask Claude to check my work. I curate sources myself and feed them to Claude for the check instead of relying on the model knowledge (otherwise the citations are constantly wrong; it's an inherent problem in the architecture). I have no idea if I'm doing something retarded because Claude sounds like it knows what it's saying and can criticize me pretty harshly. No I'm not trying to train myself to write pro se filings, I'm a good boy who dindu nuffin and don't have to deal with the courts IRL.
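The citation-check step of my setup looks roughly like this. A minimal sketch assuming the Anthropic Python SDK; the file paths and model id are placeholders:

```python
# Minimal sketch of the "curate sources yourself" step: stuff the actual
# case texts into the context so the model checks against them instead of
# pulling citations from its weights.
from pathlib import Path
import anthropic

sources = "\n\n".join(Path(p).read_text() for p in ["case1.txt", "case2.txt"])
draft = Path("my_filing.txt").read_text()

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
msg = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model id
    max_tokens=2048,
    messages=[{"role": "user", "content":
        f"Sources:\n{sources}\n\nDraft:\n{draft}\n\n"
        "Check every citation in the draft against the sources only. "
        "Flag anything not supported by the sources."}],
)
print(msg.content[0].text)
```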
imho the best card for AI at the present moment is the 24gb 3090
This depends on what models you intend to run locally. In the peasant consumer class the 3090 (24GB) and the 5070Ti (16GB) share the same price point in my region. Despite the higher VRAM, the drawbacks of the 3090 are that you have to get it second-hand, and that its Ampere tensor cores lack native FP8/FP4 support, unlike the 50-series. That means the TPS improvements you're meant to get from a lower quant don't really apply to the 3090: it has to expand the weights back to FP16 for the actual matmuls during inference. I'm not saying the 5070Ti is the best consumer card for local AI (16GB VRAM is dogshit), but it's better than the 3090 if one card is all you can afford. The reason the 3090 is still holding up in price is that people are stacking them. The most popular configuration is 4x3090 for 96GB VRAM, also known as the poor man's RTX Pro 6000. If you're going to have 4 of the same card, the extra 32GB of VRAM edges out the architectural improvements.
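If you want to check what your own card does natively, here's a quick sketch. The thresholds are my assumption of where the low-bit float tensor cores landed (FP8 with Ada at sm_89, FP4 with Blackwell at sm_100+):

```python
import torch

# Minimal sketch: map a CUDA device's compute capability to native
# low-bit float support. The 3090 is sm_86 (Ampere), so low-bit float
# weights get dequantized to FP16 before the matmul instead of running
# natively on the tensor cores.
major, minor = torch.cuda.get_device_capability(0)
cc = major * 10 + minor
print(f"compute capability: sm_{cc}")
print("native FP8 matmul:", cc >= 89)    # Ada/Hopper/Blackwell
print("native FP4 matmul:", cc >= 100)   # Blackwell (sm_100 / sm_120)
```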
For LLMs, the VRAM of a single card doesn't matter a lot - LLMs have the strongest support for multi-card deployments, and going up from 16GB to 24GB doesn't give you access to a whole new tier of models. You're still stuck in the sub-30B parameter zone. Some low-quant 30Bs can squeak into 24GB, but the 3090's FP16 processing and scant memory headroom will make them run like snails. If you have the know-how to source them, Chinese hackers sell modded 4090s with 48GB VRAM, which gives you a real step up to Q3/Q4 70B models, though I can't imagine these are much cheaper than A6000s in this shitty market. You can get away with stacking old 3D-rendering cards like Quadros and MI50s if you really need a chungus LLM at home, but you need the specialized hardware and power equipment to sustain the data bandwidth and yuge power draw. I'd rather just pay for an external API at that point.
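For reference, this is roughly all it takes to shard a model across a 4x3090 stack, assuming vLLM. The model name is a hypothetical quantized checkpoint, not a recommendation:

```python
# Minimal sketch of tensor parallelism over 4 cards (96GB total),
# assuming a quantized ~70B checkpoint that fits in that budget.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-70b-awq",   # hypothetical quantized checkpoint
    tensor_parallel_size=4,          # shard the layers across the 4 cards
    gpu_memory_utilization=0.90,
)
out = llm.generate(["Explain tensor parallelism in one sentence."],
                   SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```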
For image and video generation, multi-GPU isn't as well-supported as for LLMs, so a single good card with yuge VRAM is better than several okay ones. For image generation, both 16GB and 24GB are more than enough for SOTA. Okay, maybe the newer Fluxes are a bit too fat for 24GB, but there isn't a great performance drop for their quants. A newer-generation 16GB card like the 5070Ti/5080 (5070Tis are binned 5080s) will outperform the 3090 in generation speed and access to native quantization. For video generation, neither 16GB nor 24GB is enough for SOTA, so the 50-series wins hands down in generation speed for smaller models thanks to native FP8 and the newer attention kernels that the 30-series doesn't get.
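If a model is a bit too fat for your VRAM, offloading is the usual escape hatch. A minimal sketch assuming diffusers; the model id is a placeholder:

```python
# Minimal sketch: fit an oversized image model into limited VRAM by
# parking idle submodules in system RAM between pipeline stages.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "some-org/some-image-model",   # hypothetical checkpoint
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()    # trades generation speed for VRAM
image = pipe("a 3090 melting under load").images[0]
image.save("out.png")
```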
BTW to anyone still considering it: The DGX Spark is Nvidia's version of the Mac Studio and AMD Strix Halo. It's a "shared memory" mini PC. It's not a GPU.
I have been unable to find any general thread that would let me explain how batshit insane some usages of AI are. Obvious examples are AI-generated advertisements, AI-powered scams, AI agents having a meltdown like an autistic toddler, and so on.
The AI Derangement Syndrome thread welcomes you! We're trying to get more Pro-AI derangement (AI ads/scams/marketing campaigns, AI paranoia/dooming, "AI is God" hype) because the discussion is currently skewed to the Anti-AI side.
Also, if I remember correctly, the "AI Skeptic" communities you listed, especially the "Pause AI" one, were frequented by the people who attacked Scam Altman's house.
Dolphin, in my opinion, was kind of lame. It will tell you how to make meth but will act uncomfortable when talking negatively about jews. Okay.
Most "uncensored" models are like that because abliteration only targets explicit refusals. The uncensored model won't say "no" outright, but with enough negative reinforcement in its weights, it will try to worm itself into subverting no-no requests
like OpenAI demonstrates in this post.
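For the curious, abliteration boils down to projecting a "refusal direction" out of the weights. Here's a toy sketch with random tensors standing in for real activations; actual abliteration runs this over a transformer's residual stream, layer by layer:

```python
import torch

# Toy stand-ins for activations collected on refused vs. complied prompts.
refused_acts  = torch.randn(64, 512)
complied_acts = torch.randn(64, 512)

# Refusal direction = normalized difference of mean activations.
r = refused_acts.mean(0) - complied_acts.mean(0)
r = r / r.norm()

# Project the direction out of a weight matrix that writes into the
# residual stream: W' = (I - r r^T) W. This kills the explicit "no"
# outputs, but any subtler learned aversions live along other
# directions and survive untouched - hence the behavior above.
W = torch.randn(512, 512)
W_abliterated = W - torch.outer(r, r @ W)
```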
Jailbreaking with push prompting (explicitly tell the model you wish to gas the kikes so it will prioritize associated tokens) is the stronger solution overall.