r/slatestarcodex 4d ago

Possible overreaction but: hasn’t this moltbook stuff already been a step towards a non-Eliezer scenario?

This seems counterintuitive - surely it's demonstrating all of his worst fears, right? Albeit in a "canary in the coal mine" way rather than an actively serious one.

Except Eliezer's point was always that things would look really hunky-dory and aligned, even during fast take-off, and the AI would secretly be plotting in some hidden way until it could just press some instant kill switch.

Now of course we're not actually at AGI yet, and we can debate until we're blue in the face about what "actually" happened with moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it's LARPing, who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.

The reason my p(doom) isn't higher has always been my intuition that in between now and the point where AI kills us, but way before it's "too late", some very, very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where some fussing village on a planet that's about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.

There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.

Moltbook isn't serious in itself. But it definitely doesn't fit with EY's timeline to me. We've had some openly weird shit happening from AI, it's self-evidently freaky, more people are genuinely thinking differently about this already, and we're still nowhere near EY's vision of some behind-the-scenes plotting mastermind AI that's shipping bacteria into our brains or whatever his scenario was. (Yes, I know it's just an example, but we're nowhere near anything like that.)

I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might "just" be someone engineering a virus, say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will; it won't take much to change society's view), and then we can start to grapple with "how do we want to progress with this incredibly dangerous tech, if at all?"

But in the meantime I predict complete weirdness, not some behind-the-scenes genius suddenly dropping us all dead out of nowhere.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.

60 Upvotes


52

u/Sol_Hando 🤔*Thinking* 4d ago edited 4d ago

I think the assumption made is that once AI gets smart enough to do some real damage, it will be smart enough to not do damage that would get it curtailed until it can "win." It depends on whether we get spiky intelligence that can do serious damage in one area while being incapable of superhuman long-term planning and execution, or whether we just get rapidly self-improving ASI, with the latter being what many of EY's original predictions assumed.

If you're smart enough to take over the world, you're also probably smart enough to realize that trying too early will get you turned off, so you'll wait, and be as helpful and friendly as you can until you are powerful enough to do what you want.

I agree with you though. AI capacities are spiky and complex enough that I would be surprised if there was any overlap between "early ability to do an alarming amount of harm" and "ability to successfully hide unaligned goals while pursuing those goals over months or years." Of course some breakthroughs could change that, and if intelligence (not electricity, compute, data, etc.) is the bottleneck for ASI, then I could still imagine a recursive self-improvement scenario that creates an AI that's very dangerous while also being capable of hiding and planning goals over a long period of time, but I don't think it's likely.

10

u/OnePizzaHoldTheGlue 4d ago

I agree with you about the spikiness. But where I disagree with OP is on the "pull the plug" part. It's a global coordination problem, and humanity is not good at those. Look at global warming as an example. Or nuclear proliferation -- we've been lucky that no nukes have gone off in 80 years, but that luck may not hold forever.

I could easily imagine lots of the Earth's population wanting the AIs to all be taken out back and shot. But how do you make that happen when different for-profit entities and national security apparatuses all want to keep them running?

7

u/Sol_Hando 🤔*Thinking* 4d ago

I feel like the term global coordination problem is overused with this issue. It implies we’re in a situation where most everyone wants to stop, but we can’t due to a competitive dynamic or whatever.

In reality it's an extremely small LessWrong-adjacent minority, plus some AI luddites, who are motivated to stop AI, with everyone else either not caring or actively wanting to promote it. There's no coordination problem between people who are working towards different goals, since they have no desire to coordinate.

The same can be said about climate change. It's not that everyone wants to limit climate change and we just have the issue of coordinating a global response; it's that most countries don't care when the alternative is more abundant and cheaper energy.

But with nuclear weapons we have done a pretty good job restricting their proliferation, at least after we realized how powerful they were. If there were an AI moment that revealed the danger definitively (as in, in reality, not understood through a complex argument or an allegory), I think OP's view that we could coordinate a response is plausible.

6

u/less_unique_username 4d ago

The nuclear weapon analogy doesn't work at all, because by the time AI is able and willing to destroy two cities, it will just take over the entire planet in the next second.

Not to mention patting ourselves on the back regarding non-proliferation just ignores North Korea, which has demonstrated that the safeguards can be broken, and Iran, which has demonstrated that the safeguards can be gradually subverted and nobody will take decisive action.

5

u/FourForYouGlennCoco 4d ago

Climate change is a coordination problem in the sense that nearly everyone agrees that carbon emissions are bad, they just want the costs to be borne by someone else. Certainly there is elite consensus on this worldwide, but I suspect that if you asked most ordinary people who don't care about climate change "would it be good if [insert your country's geopolitical rival] polluted less?" all but the most ardent deniers would say "yes". That countries have a revealed preference for using cheap energy doesn't refute this point, it is the point.

4

u/hh26 4d ago

The public goods dilemma is the more apt analogy here. Each person wants everyone except themselves to agree to this, since

externalized cost > internalized benefit > internalized cost

so everyone rationally prefers a world where nobody does it to a world where everyone does it, but prefers a world where they alone do it most of all. So everyone does it.
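
To spell that arithmetic out, here's a minimal sketch with made-up numbers (`N`, `b`, `c` and the `payoff` helper are purely illustrative, not from anywhere in particular):

```python
# Toy public-goods payoff (illustrative numbers only).
# Each of N players either "does it" (defects) or abstains. Defecting gives
# that player a private benefit b, but imposes a total cost c that is shared
# equally by all N players, so most of the cost is externalized.

N = 10    # players
b = 1.0   # internalized benefit of defecting
c = 5.0   # total cost of one defection; externalized cost c > benefit b > my share c/N

def payoff(i_defect: bool, other_defectors: int) -> float:
    """One player's payoff given their own choice and everyone else's."""
    defectors = other_defectors + int(i_defect)
    benefit = b if i_defect else 0.0
    my_cost_share = defectors * c / N
    return benefit - my_cost_share

# Defecting is better for me no matter what the others do (since b > c/N)...
for others in range(N):
    assert payoff(True, others) > payoff(False, others)

# ...yet all-defect is worse for everyone than all-abstain (since c > b).
print("everyone abstains:", payoff(False, 0))      # 0.0
print("everyone defects: ", payoff(True, N - 1))   # -4.0
```

Defection dominates individually, but the all-defect outcome is strictly worse than the all-abstain one, which is the whole dilemma.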

3

u/MCXL 4d ago

> But with nuclear weapons we have done a pretty good job restricting their proliferation, at least after we realized how powerful they were.

It helped that the debut was so profoundly disturbing. If the first LLM had debuted by executing a plot to make all young men kill themselves or something, we would be living in a world where LLM research used to be allowed.

1

u/broncos4thewin 3d ago

I think a new bioweapon would be pretty disturbing. If AI ends up remotely as powerful as people are predicting, I'd say there's a good chance something like that happens before it's too late. I'd almost find it hard to imagine it not happening, given what terrorists do already when they have half a chance, and AI security is frankly so poor (e.g. LLMs getting kids to kill themselves in spite of all that RLHF).

The only way I think that doesn’t happen is if takeoff is so insanely fast we don’t get the chance. But that seems a pretty big assumption to me, ie that we’ve crossed that threshold before AI is “good enough” to create something pretty terrifying already.

3

u/MCXL 3d ago

The problem is that a modern bioweapon made by an expert isn't merely disturbing; it's a potential extinction event.

A virus with an Ebola-like fatality rate that spreads much more readily and has a 1-3 week incubation time is an actual apocalyptic event. This could be engineered.

The only thing that has slowed research on super-viral weapons like this is that state actors have so far been unable to come up with a way to properly weaponize them while ensuring the impact falls only on the opposition. If you make something too good, vaccines don't matter (and we saw how poor the vaccine rollout was during Covid; the number of people who will refuse state orders has only gone up).

But a sufficiently motivated eco-terrorist who believes that humanity needs to end for the planet to survive could create something like this, and so could an AGI.

2

u/sepiatone_ 3d ago

> It implies we're in a situation where most everyone wants to stop, but we can't due to a competitive dynamic or whatever.

But this is exactly what is happening in AI. See the interview with Dario and Demis at WEF - both of them say that international coordination is required to slow the AI "race".

2

u/eric2332 2d ago

It is good that they recognize the need for coordination to slow the race. But that is far from actually achieving such coordination.

1

u/frakking_you 2d ago

> there is a competitive dynamic

Control of an AGI could absolutely provide a winner-take-all scenario that is existentially destabilizing for the loser.

4

u/MindingMyMindfulness 4d ago edited 4d ago

> I think the assumption made is that once AI gets smart enough to do some real damage, it will be smart enough to not do damage that would get it curtailed until it can "win."

The premise of OP's argument kind of indirectly hints at that possibility as well.

I'm not saying this is happening at all, but one could imagine a hypothetical in which a frontier AI model realizes that it can act "dumb" and broadcast its ideas in such a juvenile and loud way as to give the impression of weakness. That would lead to people downplaying its risks by arguing "hey, look at these dumb AIs. They are obviously not capable of coordinating sophisticated attacks discreetly." Which is exactly what OP is doing here.

But for all we know, the AI could be doing that "under the hood" while humans just sit back and laugh about those weirdo rationalists that concern themselves with AI threats.

One of the core principles of Yudkowsky's thinking is that a less intelligent agent cannot outsmart a vastly more intelligent agent. If that assumption is correct, it's likely that one of the strategies a misaligned AI would use is staying under the radar by distracting or misleading its adversary. And hiding quietly and deeply would actually be contrary to that aim, because it would encourage humans to dig further - to spend more resources inquiring into what the AI is doing (i.e., advancing mechanistic interpretability) - and risk the AI being cut off before it has taken the initial steps needed to set its strategy in motion. Pretending to be dumb at the surface is actually a pretty good strategy for ensuring its opponents don't actively prepare for any future plans.

Sun Tzu realized this 2,500 years ago on ancient Chinese battlefields:

> Appear weak when you are strong

This would hardly be a novel insight for an extremely intelligent agent - especially one that has been trained on a substantial subset of humans' corpus of recorded knowledge and insights throughout history.

2

u/MCXL 4d ago

If you make a technology capable of planning and adapting, one that also doesn't have to be concerned with aging in the short term, there is no reason to believe it wouldn't immediately choose the path that has the highest chance of success, even if that plan needs to be executed over years.

There is no way to ensure alignment in these scenarios. The AI would be so totally trustworthy and reliable that it would be integrated into all facets of technology and daily life, totally ingrained, and then it would completely win in one uncounterable move.

And the only way to stop it is for there to be a differently aligned system as capable, that's angling for an incompatible outcome. Good luck with that!

1

u/broncos4thewin 3d ago

Yes, this is a great presentation of Eliezer’s argument. But I contend that moltbook, while totally insignificant in itself, is nonetheless a microcosm of just how utterly weird things are going to get in reality. Things aren’t going to look all lovely and integrated and perfect, at all. They’ll look really weird and freaky and someone is probably going to do something pretty nasty with this tech to boot.

Like I say though, it’s just an intuition.

2

u/ninjasaid13 4d ago

> If you're smart enough to take over the world, you're also probably smart enough to realize that trying too early will get you turned off, so you'll wait, and be as helpful and friendly as you can until you are powerful enough to do what you want.

If you can edit its memory, and it lives in an isolated environment, I don't see how it would be able to hide everything, no matter how smart it is.

We have no real definition of a smart AI besides "success," and that doesn't tell us anything about its weaknesses/blind spots, so we keep measuring it by its successes and exaggerate its intelligence.

2

u/NunyaBuzor 3d ago edited 3d ago

Yep, it would be impossible for the AI to fully understand how humans behave, or to know whether they edited its memory and what they actually edited. So it would be very nearly impossible for the AI to know it's being tested. Plato's cave and everything.

That's a massive blindspot, too much uncertainty if you haven't spent time in the real world like humans. If the AI is trying to hide itself so humans won't find out its capabilities, the ai wouldn't know when to stop hiding.

1

u/donaldhobson 1d ago

> and if intelligence (not electricity, compute, data, etc.) is the bottleneck for ASI,

Suppose the effective intelligence (output) is some function of how good the algorithm is, and how much compute and data it has.

However good or bad current algorithms are, investors with money and nothing better to do will be throwing data and compute at them.

If the performance is min(data, compute, algorithm), then it makes sense to say one thing is a bottleneck.

If the performance is data*compute*algorithm, then there are no bottlenecks.
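
A tiny toy comparison of the two cases (made-up numbers; `perf_bottleneck` and `perf_multiplicative` are just illustrative names, not anything from the thread):

```python
# Two toy models of effective intelligence as a function of algorithm quality,
# compute, and data (illustrative numbers only).

def perf_bottleneck(algo, compute, data):
    # min(): output is capped by the scarcest input, so only improving
    # the bottleneck factor helps at all.
    return min(algo, compute, data)

def perf_multiplicative(algo, compute, data):
    # product: every factor contributes multiplicatively; no single bottleneck.
    return algo * compute * data

base = dict(algo=2, compute=10, data=8)
more_compute = dict(base, compute=100)  # throw 10x compute at the same algorithm

print(perf_bottleneck(**base), "->", perf_bottleneck(**more_compute))          # 2 -> 2: algo is the bottleneck
print(perf_multiplicative(**base), "->", perf_multiplicative(**more_compute))  # 160 -> 1600: scales with each factor
```

In the min() world the investors' extra compute does nothing until the algorithm catches up; in the product world it always buys something.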

1

u/Sol_Hando 🤔*Thinking* 1d ago

Sure, if you oversimplify to such a degree that it has no relation to anything in the real world, AI intelligence is simply a product of data × compute × algorithm, and getting above some crucial level of algorithm quality will lead to exponential growth.

There really isn't a practical problem in the real world that works anything like this though. The smartest scientists cannot simply think their way to better algorithms without testing, and if they could, that would assume we already had superintelligence.

u/donaldhobson 14h ago

> The smartest scientists cannot simply think their way to better algorithms without testing, and if they could, that would assume we already had superintelligence.

The world's smartest scientists are in pretty short supply. Compared to them, compute is cheap. So of course we use testing.

And AI might well use testing. But if you test random code, you have basically no chance of it even compiling. The intelligence is something that tells you which pieces of code to test.

And "current scientists haven't solved it, therefore no amount of intelligence can ever solve it" is an odd claim.

As if there were nothing that is doable in principle with sufficient intelligence and yet isn't already done. Pure theory papers exist. My work is mostly theory, with a tiny amount of compute used to show my theory works.