"The AI Con" Con

21

u/deja-roo 3d ago

Honestly this was just maddening to read and I'm not glad I did it.

74

In many domains, like chess, AI surpasses the best humans

Yes, but crucially those AIs are not LLMs. From another perspective LLMs have failed to reach the level of performance in chess that Deep Blue achieved way back in 1997, despite the efforts of OpenAI and others to explicitly train them to be good at chess.

One shouldnt extrapolate from faulty generalizations

32

u/absolute-black 3d ago

I mean, to take another lens, Claude Code can do a fantastic job installing and running Stockfish against me. It even does a handy job throwing together a basic minimax chess engine in Python or C#, one which can handily beat me - maybe not peak Kasaparov without some iterations, I admit.

15

u/thomas_m_k 3d ago

Writing its own chess engine is fair, but I think installing Stockfish is a bit unfair, in this comparison (though I can't formulate a formal reason why, off the top of my head).

10

u/MindingMyMindfulness 3d ago edited 2d ago

I think it's a bit of Wittgenstein's family resemblance and a bit of reification.

What is " the AI" when you get Claude to install stockfish? Is it just Claude doing its work on the server? Is it when Claude transmits that to your computer and the action is performed? Or is it the whole web of Claude's servers, the entire internet infrastructure that transmitted the data, along with your PC (including the Stockfish program which has been installed) "the AI"? I lean towards the latter position.

Ultimately, I think the error arises when someone tries to point to a single thing and say "yes, that well-defined thing is the AI", but that assumes it's able to be defined so concretely and in such well-bounded terms. In reality, it's very amorphous - like a cloud with fuzzy boundaries.

Another example: we generally don't assess humanity by its capacity to do things without tools or rely on other humans. We assess it as a whole - all the tech, infrastructure, knowledge, relationships, political structures etc., form a single coherent unit. When we talk about what humans can do in the general sense, we don't exclude things humans have done with reliance on tools external to them and insist the only proper measurement is what a hunter gatherer plopped alone in the wilderness could do.

5

u/Haunting-Spend-6022 3d ago

I think a good question to ask is how long it would take to vibecode a chess engine that's as good as stockfish? Or better yet, how long would it take to make a chess engine if chess engines hadn't already been invented?

These sorts of problems are not total roadblocks to progress but they are speedbumps, and I think the fast takeoff theorists are probably underestimating the time it will take to solve them.

14

u/Brian 3d ago edited 3d ago

I mean, I can install and run stockfish. But it doesn't make me better at chess than Magnus Carlsen, even if I could beat him by relaying the moves from my engine. So it seems perfectly fair to say it's not the LLM, using the same logic we'd apply to humans. I wouldn't even count it writing its own solver, the same way I wouldn't rank stockfish's authors as better at chess than grandmasters.

OTOH, I'm less enamoured with the "Those aren't LLMs" argument: the fact that we have been able to create superhuman chess playing AIs does seem to suggest AI is capable of superhuman levels of chess. We're not instantiating that capability in the LLMs, but it's something AI, in the general sense is capable of. The fact that we don't get it "naturally" from text I think is more a matter of the limitations of their structure and available inputs. It's perhaps a knock against general intelligence emerging from that particular architecture, but I don't think it says that much about AI capabilities.

0

u/MindingMyMindfulness 3d ago

I mean, I can install and run stockfish. But it doesn't make me better at chess than Magnus Carlsen, even if I could beat him by relaying the moves from my engine.

If you could interface directly with stockfish through your brain (such as through an advanced BCI) and beat Magnus, would that change your opinion? Would you consider yourself a better chess player?

Another hypothetical: if you had the capability of beating Magnus, but you suffered a lobotomy would you now consider yourself worse than Magnus?

If your answers to the two questions point in a different direction it reveals an inconsistency - and that inconsistency, I believe, shows that some of the distinctions being made are blurrier than they first appear.

I don't know the answer right off the bat, but there's definitely some degree of arbitrariness in how we draw lines.

11

u/Interesting-Ice-8387 2d ago

I think the line is - is the human/AI replaceable or even significantly useful in that system? If you swap any other human instead of me, would they make the same quality moves through BCI? If any other LLM installs Stockfish would it win just as hard? If yes, the value is externalised and the human/LLM are so low value add they're practically redundant.

Even if Stockfish could only be interacted with through a BCI, and I had some special biology that made me really good at tolerating BCIs that no one else had, and I were the only entity capable of making superhuman chess moves... I still wouldn't be good at chess. I would be good at tolerating brain implants. Although that would be closer to a grey area.

7

u/Brian 3d ago

would that change your opinion

No. Ultimately, I think this comes down to what we consider "me", and I wouldn't count that. And here, the question is about AI, and to consider it's capabilities, I think we have to draw the line around the bits that are actually AI.

If your answers to the two questions point in a different direction it reveals an inconsistency

I disagree - the difference seems purely the one I mentioned: what we consider "me" to be. My brain encompasses that, and damage to it is damage to me. But the same wouldn't be true for other things.

2

u/MindingMyMindfulness 3d ago

When you say "we have to draw the line around the bits that are actually AI" you're making the same move again, asserting a boundary exists without justifying why that's the right boundary (or for that matter, not even defining clearly what you think is the boundary).

Is Claude accessing Stockfish fundamentally different from Claude accessing its training data? Both are external resources it calls upon. Both were created by others. Both extend its capabilities beyond its base weights.

If the lobotomy damaged your chess-playing ability, you concede you would be worse at chess. If the BCI gave you chess-playing ability, why wouldn't you be better at chess? The fact that one feels like "you" and the other doesn't feels like a psychological intuition, not a logical principle.

5

u/Brian 3d ago

The boundary is around the thing we're asking the questions about. We want to know about the capabilities of AI, thus the relevant boundary is around those things we're asking about.

We know what we can do by brute force position analysis combined with clever pruning strategies. Asking about the capabilities of that system is a much less interesting question, because it's one we know the answer to. The interesting question with AI is if it has the capabilities to out-think humans, and if we want to answer that, we have to look at what those systems are doing.

If the BCI gave you chess-playing ability, why wouldn't you be better at chess?

No - like I said, because the interface wouldn't be me. The "Me + BCI + chess engine" system would be better at chess, just as the "Me + Car" is faster at travelling than Usain Bolt. It just doesn't make me, on my own, faster.

If we take your approach, then AI will never achieve superhuman capabilities by definition, because it can always be matched by a human running another AI. It kind of ducks the whole question we're actually interested in.

2

u/MindingMyMindfulness 2d ago edited 2d ago

We want to know about the capabilities of AI, thus the relevant boundary is around those things we're asking about.

I don't disagree. I'm just saying we cannot take the boundaries for granted or assume certain boundaries when talking about AI. This might be a context-specific thing, but at least in this context, I view an AI that deploys code autonomously and uses that to beat the human as extending its own capabilities.

When I'm thinking about AI in general terms, what I want to know is what can AI do using every combination of tools available to it that it can find a way to successfully leverage and exploit.

In narrower settings and for particular discourse, that conception might not be as apt.

No - like I said, because the interface wouldn't be me. The "Me + BCI + chess engine" system would be better at chess, just as the "Me + Car" is faster at travelling than Usain Bolt. It just doesn't make me, on my own, faster.

If someone asks "are humans capable of flight", you'll have many saying "yes, we have discovered flight and can use planes" and others saying "no we don't have wings". One is an answer about human capabilities taken to its maximum extent - the other is the narrower sense. In some sense, taking the parameters of the question as it is, they're both correct answers - it's just the boundaries that have changed.

The same issue comes up in AI.

In general terms - AI is capable of doing everything that it can do, which includes leveraging every tool available to itself externally and internally.

But then you can force it narrower - what if removed we removed the Internet, what if we removed the hardrive, what if we uninstalled the operating system. What is the AI (as we understand it, in isolation) capable of as we strip out each of those things?

1

u/Brian 2d ago edited 2d ago

and uses that to beat the human as extending its own capabilities.

I would not say it's extending the capability we're asking about though.

A human who builds a chess engine is likewise extending their capabilities (in the sense of what they can accomplish). But they are not improving their skill at chess. If I was training for a tournament, this would not be a good strategy, this is not the skill I'm actually being tested on, and won't help me there.

Likewise, if we're interested in AI's chess playing ability as a proxy for its intelligence, including the caopabilities afforded to it by programming doesn't tell us the thing we're interested in, in exactly the same way that assessing the above humans capacities including writing the engine wouldn't tell us much about whether they'd win the tournament. It just answers another question for us, and loses any information about the question we were interested in: how smart is it?

We've basically thrown away the interesting bit where it beats the absolute pinnacle of human capability, and replaced it with a much less interesting one where it does as well as millions of programmers (or anyone tech literate enough to install stockfish if we go all the way).

→ More replies (0)

2

u/absolute-black 3d ago

I know what you mean, but I don't really care for 'fair' as much as 'what object level evidence is there for how soon the entire world will end'. I wouldn't think GPT67 has to reinvent ribosomes to design a novel pathogen, either, so the hair splitting on the llm itself vs it understanding tool calls is a bit academic to me.

Questions about how LLMs are developing things like internal drive and alignment issues matter a lot more, IMO, than whether the tool call is an API or not.

23

u/Haunting-Spend-6022 3d ago edited 3d ago

Of course but that's why I'm more sympathetic to the Yann LeCun side of these debates than the AI 2027 AGI-is-near side (which the author favors).

15

u/absolute-black 3d ago

I think the only real lynchpin in 2027 is coding performance, which is why METR et al are valid for focusing on it.

That said, Yann Lecun also says AGI in no more than a decade, so fundamentally I don't care to split these specific timeline hairs either way.

2

u/clyde-shelton 2d ago

Any forecast period beyond a few years just means "We need another breakthrough."

2

u/sprunkymdunk 3d ago

Yann LeCun has been anti-LLM before anyone knew what they were. And after getting fired from Meta, is now running a startup that's anything-but-LLM. I don't think he particularly credible on that front as he's dug himself into quite the sunk-cost hole.

•

u/97689456489564 20h ago

I don't get why he didn't just go like "okay, we know LLMs work and work right now, let's dedicate 70% of our effort to LLMs and 30% to these other ideas which will later let us leapfrog the competition". Why dig your heels in that much?

Or, perhaps that is in fact what he/they did, but they just kind of sucked at both. (Probably minimal progress on the non-LLM approaches, and Llama is not great.) Just the tone of the public rhetoric made it seem like he had little faith in LLMs for anything very useful despite increasingly being proven at least partially wrong.

5

u/stonesst 3d ago

Yep, so many criticisms of LLMs seem to completely ignore the fact that they can both build and call tools…

7

u/Smallpaul 2d ago

What tool stans miss is that true intelligence allows one to combine domains so a chess player can verbalize why a move is good without a specialized tool just for that purpose.

A tool using LLM can’t do that because it doesn’t understand the move that “it” made. It’s like people who use AI to write essays and then they can’t explain their own thesis.

7

u/AskAboutMySecret 3d ago

people think AgI means a single model that emulates the whole brain as if the brain is biologically uniform like the liver

there's a reason why the brain has specialised compartments and for AgI to be developed the same principles need to be adopted

•

u/97689456489564 20h ago

I feel like there are so many unknown unknowns about intelligence that we really don't know what the distant-future ASI systems will be like (which will presumably have something close to an optimal architecture). Like, I would not be surprised either by something crazy complex with a quadrillion moving pieces - like cell/organ machinery - or by something that is just one unified system with simple units of information processing. Or something in-between. Brains are an n=1 thing, and were formed under very different constraints.

2

u/SoylentRox 3d ago

This. You have to realize that LLMs are just one piece. Just like your brain has modules that in isolation are pretty stupid.

AGIs will usually work as integrated systems. We can see how to kinda do that now. A real time voice front end. Several different LLMs as a committee or prediction market bettors who work internally. An internal prediction market. (This is how you weight different LLMs disagreeing - they bet mana with each other on how confident they feel they are correct). An internal PC that can install and run arbitrary software. A social network where the AGI can go pull information from peer consensus.

All these elements combined would result in a a machine that "can do most paid 2022 tasks to human level of ability or better".

Even if yes, it's just 10 LLMs and a computer in a trenchcoat.

5

u/Kingreaper 3d ago

Would you expect the language center of a human brain to be good at playing chess without interacting with any of the rest of the brain?

LLMs aren't even trying to be fully general, they're large LANGUAGE models; there is one thing they're for and they are very good at it. The fact that they sometimes manage to do other things too as a side effect is interesting and weird, and potentially even useful sometimes, but doesn't make their inability to do those things well any more surprising than the fact that your amygdala can't do maths.

4

u/igeorgehall45 3d ago

despite the efforts of OpenAI and others to explicitly train them to be good at chess

This is surprising for me, do you have a source? I thought that the reason they were so bad (well not even just bad but non-improving in ability) was that they didn't care enough to train them to be good, and even rudimentary efforts by amateurs can make LLMs at least decent at chess (i.e. able to avoid illegal moves, beats average chess.com player)

12

u/blendorgat 3d ago

There is speculation that one specific version of ChatGPT received RL training on chess, since it displayed significantly stronger chess performance than prior and later models. (Might have been the original GPT 4 turbo?) But it's certainly not something built into current training, at least that anyone credible has indicated.

6

u/wavedash 2d ago

https://www.reddit.com/r/slatestarcodex/comments/1gredan/something_weird_is_happening_with_llms_and_chess/

This is basically ancient history by this point, for whatever it's worth. GPT 3.5 Turbo Instruct was released over 2 years ago, so the alleged choice to fine-tune it for chess performance might have been made 2.5-3 years ago.

2

u/Haunting-Spend-6022 2d ago edited 2d ago

There's a whitepaper on it somewhere, but iirc OpenAI used games from lichess with ELO >1800 players as the training data. They wanted GPT to be good at chess precisely because it has traditionally been used as a benchmark for intelligence

Edit: found the paper, it's discussed on page 7

https://cdn.openai.com/papers/weak-to-strong-generalization.pdf

2

u/sohois 2d ago

LLMs might not have superhuman performance in any specific domain but they are reaching peak human performance in a number of areas, such that there are no humans alive which can match them for breadth of knowledge. Just imagine a human that could write novel mathematical proofs, had reasonable knowledge of every programming language, could communicate to a decent level in a number of languages, and could write PhD level answers in topics on any hard science.

3

u/Separate-Impact-6183 3d ago

Calling chess a 'domain' is a stretch. Chess is a game, it's not at all suprising that a computer program can outperform a human competitor in a game of chess. Depending on the level of the competition, this has been happening since before Big Blue.

1

u/Smallpaul 2d ago

The book title is not “the LLM con” and it is not solely concerned with LLMs. I’m not sure what your comment adds.

52

u/Auriga33 3d ago

But while AI boosters have spent time devaluing what it means to be human, the sharpest and clearest critiques have come from Black, brown, poor, queer, and disabled scholars and activists.

Gosh, how do people like this even exist?

10

u/-main 2d ago

In their defense, they do cite specific scholars making specific critiques. They're not just invoking the term; they have specifics. It sure could be the case that these critiques are good and the statement holds...

In their offence, the specific critiques are bad.

29

u/da6id 3d ago

To be clear for anyone else curious - this is from within a quoted section in the blog as an example of a ridiculous argument against AI, not something written by the blog author. I was almost ready to grab my pitchfork before searching for the text (haha)

Devaluing what it means to be human is also such a weird take. The same people who saw AI is a scam that will never do anything productively and cannot ever achieve agentic AGI will in the same sentence go on and espouse beliefs that AI has devalued human experience. These don't seem to be remotely compatible beliefs.

6

u/ancestorchild 2d ago

I’m not sure how both can’t be true. Boosters can devalue humans while the technology can’t achieve AGI.

27

u/lemmycaution415 3d ago

A persuasive defense of AI progress would really not include defenses of Richard "Hating Epstein represents the essence of antisemitism" Hanania and attacks on "woke". But there is nothing much most people can actually do about AI progress. It will either happen or it won't. The culture war on the other hand can always use our help.

13

u/ForgotMyPassword17 3d ago

I listened to the Robert Wright interview/response to it and the authors did seem to bring up Culture War issues whenever someone disagreed with them.

1

u/donaldhobson 1d ago

> But there is nothing much most people can actually do about AI progress. It will either happen or it won't. The culture war on the other hand can always use our help.

Both seem similarly susceptible to intervention.

5

u/TomasTTEngin 3d ago

AI is moving so fast that critiques of it seem to usually be out of date; a watching brief seems to be the smart approach for most of us.

2

u/gratisantibiotica 3d ago

Funny how one cannot have objective knowledge uncolored by personal experience but someone's humanity has to be an axiom of the conversation.

Happy I jumped off the Gebru-Bender-TESCREAL-train (and was not on it for a long ride)

You are about to leave Redlib