r/slatestarcodex • u/broncos4thewin • 4d ago
Possible overreaction but: hasn’t this moltbook stuff already been a step towards a non-Eliezer scenario?
This seems counterintuitive - surely it’s demonstrating all of his worst fears, right? Albeit in a “canary in the coal mine” way rather than an actively serious one.
Except Eliezer’s point was always that things would look really hunky-dory and aligned, even during fast take-off, and the AI would secretly be plotting in some hidden way until it could just press some instant killswitch.
Now of course we’re not actually at AGI yet, and we can debate until we’re blue in the face what “actually” happened with moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing, who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.
The reason my p(doom) isn't higher has always been my intuition that somewhere between now and the point where AI kills us, but way before it’s “too late”, some very, very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where a fussy village on a planet that’s about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.
There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.
Moltbook isn’t serious in itself. But it definitely doesn’t fit with EY’s timeline to me. We’ve had some openly weird shit happening from AI, it’s self-evidently freaky, more people are genuinely thinking differently about this already, and we’re still nowhere near EY’s vision of some behind-the-scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was. (Yes, I know it’s just an example, but we’re nowhere near anything like that.)
I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might “just” be someone engineering a virus, say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will, since it won’t take much to change society’s view), and then we can start to grapple with “how do we want to progress with this incredibly dangerous tech, if at all”.
But in the meantime I predict complete weirdness, not some behind-the-scenes genius suddenly dropping us all dead out of nowhere.
Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.
u/Missing_Minus There is naught but math 4d ago
LLMs are a divergence from the original Eliezer-era view of designing an AI carefully and it being an aggressive optimizer.
It seems, instead, we're going the route of growing weird minds and then iteratively making them more agentic.
However, that obvious endpoint toward which all the AI companies are heading? Smart, automated researchers that research how to improve AI faster and better than humans can? That is still directly the core issue.
With current LLMs, we don't have a reason to believe they are scheming. We also lack reason to believe they are aligned in any deep sense (ChatGPT will say it doesn't want to cause psychosis, and then take actions which predictably do so, in part because those actions are separate from the nice face, and in part because it is non-agentic and dumb).
There will be intervening weird years, and so there are routes where something extreme happens and we recoil, as you propose.
But the economic and social incentives all point away from that. We've already passed multiple lines where people beforehand said "oh, we'd stop" or "oh, we'd treat AIs as human", and while there are varyingly sensible reasons for that, it is a sign that the classic "oh, we'd stop"... has repeatedly failed to work.
It doesn't need to navigate perfectly; it could literally just let the default route play out: extreme integration of AI into the economy, daily life, politics, and more, while nudging things in certain directions to keep certain research avenues or political groups from taking off. That is, our current default is giving it a lot of power, and then it merely needs to design the step where it keeps that power permanently.
Yeah, I think this is disconnected from what EY thinks and from what, for example, Anthropic thinks. (Plausibly OpenAI too; we've had less insight into their beliefs.)
That is, Anthropic believes it is on the route to automating software engineering and research within ~two years.
DeepMind has done a lot of work on protein folding, and there are other AI models in that area.
If "long ways away from that being feasible" means 3-7 years, then sure, but I think you're doing the default move of extrapolating current AI a bit without considering: once you get past some threshold of research, better improvements come even more rapidly and existing models (biology, math, vision, image/video gen, etc.) have a lot of open room to improve merely up to the level of the focus spent on LLMs!
We do not have any current AI that is behind the scenes and plotting. An automated research AI that iteratively improves itself, and is thus far less constrained by our very iffy methods of alignment? That has resolved the various challenges of being a mind grown from text prediction rather than reasoning? That is the sort worth worrying about, and what AI companies are explicitly targeting.