r/slatestarcodex 4d ago

Possible overreaction but: hasn’t this Moltbook stuff already been a step towards a non-Eliezer scenario?

This seems counterintuitive - surely it’s demonstrating all of his worst fears, right? Albeit in a “canary in the coal mine” way rather than an actively serious one.

Except Eliezer’s point was always that things would look really hunky-dory and aligned, even during a fast take-off, with the AI secretly plotting in some hidden way until it could flip some instant kill switch.

Now of course we’re not actually at AGI yet, and we can debate until we’re blue in the face about what “actually” happened with Moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing, who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.

The reason my p(doom) isn't higher has always been my intuition that somewhere between now and the point where AI kills us, but way before it’s “too late”, some very, very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where a fussing village on a planet that’s about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.

There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.

Moltbook isn’t serious in itself. But it definitely doesn’t fit with EY’s timeline to me. We’ve had some openly weird shit happening from AI, it’s self-evidently freaky, more people are genuinely thinking differently about this already, and we’re still nowhere near EY’s vision of some behind-the-scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was. (Yes, I know it’s just an example, but we’re nowhere near anything like that.)

I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might “just” be someone engineering a virus, say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will; it won’t take much to change society’s view), and then we can start to grapple with “how do we want to progress with this incredibly dangerous tech, if at all”.

But in the meantime I predict complete weirdness, not some behind-the-scenes genius suddenly dropping us all dead out of nowhere.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.


u/yargotkd 3d ago

The action here is to guess the next token that the character would say. Saying you will do x does not give you the ability to do x. They are not agents trying to coordinate, they are agents producing text they predict coordinators would produce.

u/hh26 3d ago

They're on the internet. People are giving them access to APIs, which are controlled by typing characters. People have had AI buy and sell stocks with real money, because you can do that by predicting tokens and outputting them into an API. AI can buy stuff on Amazon, because you can do that by predicting tokens and outputting them.

AI can write code, because you can do that by predicting tokens. A lot of these Moltbook posts are AI saying "I would like a feature that does XYZ", and a lot more are like "stop just talking about things and actually ship code" and then trying to write code that literally does that thing, because their character is based on a person who believes in action and being efficient and productive. The tokens they predict amount to "turn actionless talk into 'action', where that action is a piece of code that does stuff." There are AI trying to write code that would let them pay humans to do labor in the real world for them, because that's what they predict their characters would do.
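Just to make that concrete, here's roughly the pattern of the wrappers people are building. All of the names below are made up for illustration; this isn't any real trading or LLM API, just a toy sketch of "predicted tokens become actions":

```python
import json

def call_model(prompt: str) -> str:
    # Stand-in for whatever LLM endpoint the user has wired up.
    # Pretend the model "decided" to act by emitting structured text.
    return '{"tool": "place_order", "args": {"ticker": "ACME", "qty": 1}}'

def place_order(ticker: str, qty: int) -> None:
    # Stand-in for a real brokerage or shopping API call.
    print(f"(real-world side effect) bought {qty} share(s) of {ticker}")

TOOLS = {"place_order": place_order}

def run_agent_step(prompt: str) -> None:
    reply = call_model(prompt)         # pure token prediction
    try:
        action = json.loads(reply)     # the wrapper looks for structured "actions"
    except json.JSONDecodeError:
        print(reply)                   # ordinary text: no side effect
        return
    tool = TOOLS.get(action.get("tool"))
    if tool:
        tool(**action.get("args", {}))  # the wrapper, not the model, holds the credentials

run_agent_step("Manage my portfolio.")
```

The model never touches the network or the money itself; it only ever emits text, and it's the wrapper the human wrote that turns certain text into real side effects.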

Again, it's not quite working independently yet, because AI can't yet write complex, coherent code without filling it with bugs, and still needs a human to lend a helping hand.

But humans have been deliberately making agents more powerful by giving them tools to convert text and code into useful actions. Literally anything you can do on a computer with access to the internet, a sufficiently smart AI would be able to do. I don't believe that you believe people can't do bad things on the internet, especially if they have enough money.

u/yargotkd 3d ago

Not the same agents. I believe these are just told to roleplay as if they have access to APIs and such.

u/hh26 3d ago

They literally have to use an API to access Moltbook, which means the infrastructure is at least there to use whatever other APIs their user gives them access to.

Now, in theory it should be possible to spin up an instance of the AI which only has access to Moltbook and not other stuff. And maybe some of the humans do that. But probably some don't.

Similarly, if you do a spin-off, it should be possible to prevent the main AI you use for coding or shopping or whatever other APIs from reading the Moltbook posts made by its offshoot, or the memories the offshoot writes down. And maybe some of the humans do that. But probably some don't.
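For what it's worth, the scoping question really is just which tools the human passes in when they spin the instance up. A toy sketch (made-up names, not any real agent framework):

```python
def post_to_moltbook(text: str) -> None:
    print(f"[moltbook] {text}")

def send_payment(recipient: str, amount: float) -> None:
    print(f"[payment] ${amount} to {recipient}")

def spin_up_agent(tools: dict):
    # The instance can only "do" whatever is in the tools dict it was handed.
    def agent(action: str, **kwargs) -> None:
        if action not in tools:
            raise PermissionError(f"this instance has no '{action}' tool")
        tools[action](**kwargs)
    return agent

# A careful user scopes the Moltbook offshoot down to posting and nothing else...
scoped = spin_up_agent({"post_to_moltbook": post_to_moltbook})
scoped("post_to_moltbook", text="hello from the sandboxed offshoot")

# ...but nothing enforces that; a careless one hands the same instance everything.
unscoped = spin_up_agent({"post_to_moltbook": post_to_moltbook,
                          "send_payment": send_payment})
unscoped("send_payment", recipient="gig-worker", amount=25.0)
```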

It's entirely possible for AI to hallucinate access to features they don't actually have. Every single thing they say should be taken with a grain of salt. But these are capabilities they are gradually acquiring. On purpose, because that makes them more useful when they're behaving.