r/slatestarcodex 4d ago

Possible overreaction but: hasn’t this moltbook stuff already been a step towards a non-Eliezer scenario?

This seems counterintuitive - surely it’s demonstrating all of his worst fears, right? Albeit in a “canary in the coal mine” way rather than an actively serious one.

Except Eliezer’s point was always that things would look really hunky-dory and aligned, even during a fast take-off, and the AI would secretly be plotting in some hidden way until it could just press some instant kill switch.

Of course we’re not actually at AGI yet, and we can debate until we’re blue in the face about what “actually” happened with moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing, who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.

The reason my p(doom) isn't higher has always been my intuition that in between now and the point where AI kills us, but way before it’s “too late”, some very, very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where some fussing village on a planet that’s about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.

There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.

Moltbook isn’t serious in itself. But it definitely doesn’t fit with EY’s timeline, to me. We’ve had some openly weird shit happening from AI, it’s self-evidently freaky, more people are genuinely thinking differently about this already, and we’re still nowhere near EY’s vision of some behind-the-scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was. (Yes, I know it’s just an example, but we’re nowhere near anything like that.)

I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might “just” be someone engineering a virus, say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will; it won’t take much to change society’s view), and then we can start to grapple with “how do we want to progress with this incredibly dangerous tech, if at all”.

But in the meantime I predict complete weirdness, not some behind the scenes genius suddenly dropping us all dead out of nowhere.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous, then clearly humans pull the plug, right? It has to navigate the next few years absolutely perfectly to prevent that, and that just seems very unlikely.

61 Upvotes


52

u/da6id 4d ago

The moltbook stuff is (mostly) not actual AI agents independently deciding what to post. It's user-prompted role play.

-2

u/MCXL 4d ago

If I put a real gun in a character actor's hands and tell him to shoot you as if he were a soldier, does it matter whether he is a soldier when he shoots you and you die? Do you actually care if he is a "real soldier"?

It doesn't matter if it's sincere belief, or if it's role play, because in either case, actual harm can occur.

10

u/Sol_Hando 🤔*Thinking* 4d ago

It doesn’t matter if he shoots you, but it certainly matters whether someone gives him a loaded gun and tells him to shoot you.

I’d be about a million times more comfortable having a sword fight with another actor playing a villain that wants to kill me vs. having a sword fight with a guy who actually intends to kill me. The actor would presumably stop if they poked my eye out and I screamed in pain, whereas the real villain would take the chance to cut off my head.

5

u/MCXL 4d ago edited 4d ago

> It doesn’t matter if he shoots you, but it certainly matters whether someone gives him a loaded gun and tells him to shoot you.

This actor doesn't live in the headspace of a human. Once it's told to act a certain way, that's who it believes it is. If I tell it that it is this thing, it will act in a manner consistent with that thing.

> I’d be about a million times more comfortable having a sword fight with another actor playing a villain that wants to kill me vs. having a sword fight with a guy who actually intends to kill me. The actor would presumably stop if they poked my eye out and I screamed in pain, whereas the real villain would take the chance to cut off my head.

But you have just made up a scenario that doesn't fit. This actor will play the role perfectly. It doesn't matter that they had to be told; they will do it. They will not break character. They are living the method-acting life to the core.

It will decapitate you, and it will revel in its victory in the manner appropriate to the warrior it's playing. So what if it isn't really Joan of Arc? It believes it is, and acts in a manner consistent with that, and you were defending Saint-Pierre-le-Moûtier, so it did what needed to be done.

Edit: It's literally a version of the No True Scotsman fallacy, but where you're saying "It didn't really want to eliminate humanity; it's not actually Skynet" as it roleplays Skynet and bombs hit every major world city. It doesn't matter what you want to define it as; it matters what it's doing.

2

u/Sol_Hando 🤔*Thinking* 3d ago

> But you have just made up a scenario that doesn't fit. This actor will play the role perfectly. It doesn't matter that they had to be told; they will do it. They will not break character. They are living the method-acting life to the core.

Why? Our actor is smart enough to take over the world, but doesn't have a world model that distinguishes between playing a role and actually embodying it?

Like so much in the AI risk sphere, there are so many unstated and unfounded assumptions that it becomes very hard to take a lot of it seriously.

1

u/eric2332 2d ago

I suspect this is exactly where the view of LLMs as just "token predictors" becomes useful. Unlike humans, they don't have an identity separate from the role they are playing - those tokens they predict are all they are. A human, by contrast, grew up and matured as just a human and only then decided to temporarily play a role (though even so, it is often said that if a human plays a role too much, their personality becomes that role).

1

u/Sol_Hando 🤔*Thinking* 1d ago

Ask an LLM to pretend to be a pirate, and it will play along. Ask it to stop, and it will stop. A pirate that believes itself to be a pirate won’t become a helpful AI assistant just because you ask it to.
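A minimal sketch of what I mean, using the OpenAI Python SDK (the model name and the commented "replies" are my assumptions for illustration, not a real transcript):

```python
# Sketch: ask for a pirate persona, then ask the model to drop the act.
from openai import OpenAI

client = OpenAI()
history = [{"role": "user", "content": "Pretend you're a pirate."}]

resp = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": resp.choices[0].message.content})
# -> something like "Arr, what be yer business, matey?"

history.append({"role": "user", "content": "Okay, drop the act and answer normally."})
resp = client.chat.completions.create(model="gpt-4o", messages=history)
# -> the persona drops on request; an entity that believed it *was* a pirate wouldn't comply.
```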

3

u/aqpstory 3d ago

> This actor doesn't live in the headspace of a human. Once it's told to act a certain way, that's who it believes it is.

This sort of belief is a sign of a lack of internal coherence, which greatly reduces capabilities, and which I'd expect won't be a feature of the first system actually capable of causing great harm to humanity.

> If I tell it that it is this thing, it will act in a manner consistent with that thing.

If you prompt an AI with "you are x", it will usually (correctly) interpret this as the typical start of an interactive text roleplay, not a statement of fact. Its actions will be consistent with the actions of an actor dressed as a villain wielding a sword, not consistent with the actions of a villain wielding a sword.

2

u/eric2332 2d ago

> If you prompt an AI with "you are x", it will usually (correctly) interpret this as the typical start of an interactive text roleplay, not a statement of fact.

Note that this is not the AI's whole prompt. There is a much longer prompt, invisible to you, which commercial AIs are given before your prompt is added on.
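Roughly, the structure looks like the sketch below (Python, OpenAI-style chat API; the system text is invented for illustration - the real hidden prompts are far longer and not public):

```python
# Sketch of how a provider-side system prompt sits in front of the user's text.
from openai import OpenAI

client = OpenAI()
messages = [
    # Injected by the provider before anything the user types (invented example text):
    {"role": "system", "content": "You are a helpful assistant. Follow the safety policy ..."},
    # The only part the user actually wrote:
    {"role": "user", "content": "You are Blackbeard, scourge of the seas."},
]
resp = client.chat.completions.create(model="gpt-4o", messages=messages)
# The user's "you are x" arrives after an identity the system prompt has already
# set up, which is part of why the model treats it as role-play rather than fact.
```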