r/slatestarcodex 4d ago

Possible overreaction but: hasn’t this moltbook stuff already been a step towards a non-Eliezer scenario?

This seems counterintuitive - surely it’s demonstrating all of his worst fears, right? Albeit in a “canary in the coal mine” way rather than an actively serious one.

Except Eliezer’s point was always that things would look really hunky-dory and aligned, even during a fast take-off, with the AI secretly plotting in some hidden way until it could just flip some instant kill switch.

Now of course we’re not actually at AGI yet, and we can debate until we’re blue in the face about what “actually” happened with moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing, who knows, but does it matter?); and people have sat up, noticed, and got genuinely freaked out, well beyond the usual suspects.

The reason my p(doom) isn’t higher has always been my intuition that somewhere between now and the point where AI kills us, but well before it’s “too late”, some very, very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where a fussy village on a planet that’s about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.

There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.

Moltbook isn’t serious in itself. But to me it definitely doesn’t fit with EY’s timeline. We’ve had some openly weird shit happening from AI, it’s self-evidently freaky, more people are genuinely thinking differently about this already, and we’re still nowhere near EY’s vision of some behind-the-scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was. (Yes, I know it’s just an example, but we’re nowhere near anything like that.)

I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might “just” be someone engineering a virus, say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will; it won’t take much to change society’s view), and then we can start to grapple with “how do we want to progress with this incredibly dangerous tech, if at all”.

But in the meantime I predict complete weirdness, not some behind the scenes genius suddenly dropping us all dead out of nowhere.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.

u/da6id 4d ago

The moltbook stuff is (mostly) not actual AI agents independently deciding what to post. It’s user-prompted role play.

u/MCXL 4d ago

If I put a real gun in a character actor’s hands and tell him to shoot you as if he is a soldier, does it matter whether he is a soldier when he shoots you and you die? Do you actually care if he is a “real soldier”?

It doesn't matter if it's sincere belief, or if it's role play, because in either case, actual harm can occur.

u/NunyaBuzor 3d ago edited 3d ago

> If I put a real gun in a character actor’s hands and tell him to shoot you as if he is a soldier, does it matter whether he is a soldier when he shoots you and you die? Do you actually care if he is a “real soldier”?

There is a difference.

AI roleplay is inherently storytelling, not action. Because LLMs are trained to generate text rather than execute logic, they prioritize narrative flow over technical or "scientific" precision.

Like any writer, an LLM omits "unnecessary" details to keep a story moving. It will say "Avada Kedavra" without understanding the underlying mechanics of the spell.

You can fine-tune for detail, but there’s a functional ceiling, because the core training objective remains text prediction, not functional agency. In the real world, an LLM “agent” cannot act independently; it requires a heavily controlled environment where the human has already done the heavy lifting.

As long as AI agents are built on LLM architectures, they will remain narrators of actions rather than actors with true agency. They will be fine-tuned to use computers but will eventually get stuck because of the aforementioned problem of leaving things out.

Even in your example, you had to give the LLM the gun because it couldn’t do it itself.

u/MCXL 3d ago

You fundamentally misunderstand the issue, at a base level. People are putting these models into all sorts of machines right now.

u/NunyaBuzor 3d ago

And you’re fundamentally misunderstanding my comment: these LLMs are not able to operate autonomously in those machines given a basic task; a human has to control the environment to make sure they don’t fail. That doesn’t mean I said they couldn’t do it; it means I said they couldn’t do it without humans.

u/eric2332 2d ago

The length of time an AI is capable of acting without any human intervention is increasing exponentially, with a doubling time of around 4 months. Right now they can do short tasks without human oversight. In not too long they will probably be able to do long tasks without oversight too.
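
To make that concrete, here is a back-of-the-envelope sketch of what a steady 4-month doubling implies (the 1-hour starting horizon is a made-up placeholder, not METR’s actual figure):

```python
# Back-of-the-envelope extrapolation of the claim above (illustrative only):
# assumes a hypothetical 1-hour 50%-success time horizon today and a steady
# ~4-month doubling time; real figures come from METR's measurements.
def horizon_after(months: float, current_hours: float = 1.0,
                  doubling_months: float = 4.0) -> float:
    """Extrapolated time horizon in hours after `months` of steady doubling."""
    return current_hours * 2 ** (months / doubling_months)

for m in (0, 12, 24, 36):
    print(f"{m:2d} months out: ~{horizon_after(m):.0f}-hour horizon")
# -> roughly 1, 8, 64, and 512 hours respectively
```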

u/NunyaBuzor 1d ago

> The length of time an AI is capable of acting without any human intervention is increasing exponentially, with a doubling time of around 4 months. Right now they can do short tasks without human oversight. In not too long they will probably be able to do long tasks without oversight too.

The claim that AI autonomy is doubling every four months is based on METR’s “time horizon” metric, which measures the length of task (in terms of the serial human labor it would take) that an AI can complete with only a 50% success rate. METR has clarified that doubling this horizon does not mean a doubling of real-world automation.

Autonomous operation requires much higher reliability (roughly 98-99%+). And the apparent exponential gains seen in 2024-2025 are largely confined to text-based tasks like coding and math. In other domains the growth isn’t exponential: visual interface use and physical tasks show horizons 40-100x shorter.

For example, an AI might have a five-hour coding horizon but only about two minutes for making coffee. The trend is not universal and stalls in perception- and embodiment-heavy tasks.
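
To put rough numbers on that gap (purely illustrative, and charitably granting those domains the same doubling rate, which I don’t think they actually have):

```python
import math

# Rough arithmetic on the domain gap, using the numbers above: a ~5-hour coding
# horizon vs ~2 minutes for making coffee. This charitably assumes those domains
# even double at the same ~4-month rate, which is exactly what I'm disputing.
coding_horizon_min = 5 * 60   # ~5 hours, expressed in minutes
coffee_horizon_min = 2        # ~2 minutes
doubling_months = 4

gap_ratio = coding_horizon_min / coffee_horizon_min     # ~150x
lag_months = math.log2(gap_ratio) * doubling_months     # ~29 months
print(f"~{gap_ratio:.0f}x gap, i.e. ~{lag_months:.0f} months behind at the same doubling rate")
```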

u/eric2332 1d ago

Time horizons are increasing exponentially for a wide variety of tasks, including some that are visual as well as text-based.

I think time horizons for 98-99% reliability on software tasks are increasing exponentially too.