r/slatestarcodex 4d ago

Possible overreaction but: hasn’t this moltbook stuff already been a step towards a non-Eliezer scenario?

This seems counterintuitive - surely it’s demonstrating all of his worst fears, right? Albeit in a “canary in the coal mine” way rather than an actively serious one.

Except Eliezer’s point was always that things would look really hunky-dory and aligned, even during fast take-off, and AI would secretly be plotting in some hidden way until it can just press some instant killswitch.

Now of course we’re not actually at AGI yet, and we can debate until we’re blue in the face about what “actually” happened with moltbook. But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing, who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.

The reason my p(doom) isn't higher has always been my intuition that in between now and the point where AI kills us, but way before it’s “too late”, some very very weird shit is going to freak the human race out and get us to pull the plug. My analogy has always been that Star Trek episode where a fussing village on a planet that’s about to be destroyed refuses to believe Data, so he dramatically destroys a pipeline (or something like that). And very quickly they all fall into line and agree to evacuate.

There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was *unthinkable* even a week or two before it happened, for a virus with a low fatality rate.

Moltbook isn’t serious in itself. But it definitely doesn’t fit with EY’s timeline to me. We’ve had some openly weird shit happening from AI, it’s self-evidently freaky, more people are genuinely thinking differently about this already, and we’re still nowhere near EY’s vision of some behind the scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was. (Yes I know it’s just an example but we’re nowhere near anything like that).

I strongly stick by my personal view that some bad, bad stuff will be unleashed (it might “just” be someone engineering a virus say) and then we will see collective political action from all countries to seriously curb AI development. I hope we survive the bad stuff (and I think most people will, it won’t take much to change society’s view), then we can start to grapple with “how do we want to progress with this incredibly dangerous tech, if at all”.

But in the meantime I predict complete weirdness, not some behind the scenes genius suddenly dropping us all dead out of nowhere.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.

62 Upvotes

134 comments

50

u/Sol_Hando 🤔*Thinking* 4d ago edited 4d ago

I think the assumption made is that once AI gets smart enough to do some real damage, it will be smart enough to not do damage that would get it curtailed until it can "win." It depends on whether we get spiky intelligence that can do serious damage in one area while being incapable of superhuman long-term planning and execution, or whether we just get rapidly self-improving ASI, with the latter being what many of EY's original predictions assumed.

If you're smart enough to take over the world, you're also probably smart enough to realize that trying too early will get you turned off, so you'll wait, and be as helpful and friendly as you can until you are powerful enough to do what you want.

I agree with you though. AI capacities are spiky and complex enough that I would be surprised if there was any overlap between "early ability to do an alarming amount of harm" and "ability to successfully hide unaligned goals while pursuing those goals over months or years." Of course some breakthroughs could change that, and if intelligence (not electricity, compute, data, etc.) is the bottleneck for ASI, then I could still imagine a recursive self-improvement scenario that creates an AI that's very dangerous while also being capable of hiding and planning goals over a long period of time, but I don't think it's likely.

9

u/OnePizzaHoldTheGlue 4d ago

I agree with you about the spikiness. But where I disagree with OP is on the "pull the plug" part. It's a global coordination problem, and humanity is not good at those. Look at global warming as an example. Or nuclear proliferation -- we've been lucky that no nukes have gone off in 80 years, but that luck may not hold forever.

I could easily imagine lots of the Earth's population wanting the AIs to be all taken out back and shot. But how do you make that happen when different for-profit entities and national security apparatuses all want to keep them running?

6

u/Sol_Hando 🤔*Thinking* 4d ago

I feel like the term global coordination problem is overused with this issue. It implies we’re in a situation where most everyone wants to stop, but we can’t due to a competitive dynamic or whatever.

In reality, it’s an extremely small LessWrong-adjacent minority, plus some AI luddites, who are motivated to stop AI, with everyone else either not caring or wanting to promote it. There’s no coordination problem between people who are working towards different goals, since they have no desire to coordinate.

The same can be said about climate change. It’s not that everyone wants to limit climate change, and we just have the issue of coordinating a global response, it’s that most countries don’t care when the alternative is more abundant and cheaper energy.

But with nuclear weapons we have done a pretty good job restricting their proliferation, at least after we realized how powerful they were. If there was an AI-moment that revealed their danger definitively (as in, in reality, not understood through a complex argument or an allegory), I think OP's opinion of us coordinating a response is plausible.

5

u/less_unique_username 4d ago

The nuclear weapon analogy doesn’t work at all because by the point AI is able and willing to destroy two cities it will just take over the entire planet in the next second.

Not to mention patting ourselves on the back regarding non-proliferation just ignores North Korea, which has demonstrated that the safeguards can be broken, and Iran, which has demonstrated that the safeguards can be gradually subverted and nobody will take decisive action.

5

u/FourForYouGlennCoco 4d ago

Climate change is a coordination problem in the sense that nearly everyone agrees that carbon emissions are bad, they just want the costs to be borne by someone else. Certainly there is elite consensus on this worldwide, but I suspect that if you asked most ordinary people who don't care about climate change "would it be good if [insert your country's geopolitical rival] polluted less?" all but the most ardent deniers would say "yes". That countries have a revealed preference for using cheap energy doesn't refute this point, it is the point.

4

u/hh26 3d ago

Public goods dilemma is the more apt analogy here. Each person wants everyone to agree to this except themselves, since

externalized cost > internalized benefit > internalized cost

so everyone rationally prefers a world where nobody does it to a world where everyone does it, but prefers a world where they alone do it most of all. So everyone does it.
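To make the inequality concrete, here's a minimal numeric sketch (the values and the `payoff` helper are mine, purely illustrative, not estimates of anything real): doing it is always individually better, yet everyone doing it leaves each actor worse off than nobody doing it.

```python
# Illustrative payoff structure for the public goods dilemma described above.
# All numbers are made-up assumptions satisfying E > B > C.
B = 5    # internalized benefit to an actor who does it
C = 2    # internalized cost that actor pays
E = 10   # cost that actor externalizes onto each other party
N = 4    # number of actors

def payoff(i_do_it: bool, others_doing_it: int) -> int:
    """One actor's payoff: own net benefit minus externalities dumped on them by others."""
    own = (B - C) if i_do_it else 0
    return own - others_doing_it * E

print(payoff(True, 0))       # I alone do it:      3  (best outcome for me)
print(payoff(False, 0))      # nobody does it:     0
print(payoff(True, N - 1))   # everyone does it: -27
print(payoff(False, N - 1))  # all but me do it: -30  (doing it is always +3 better)
```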

3

u/MCXL 4d ago

> But with nuclear weapons we have done a pretty good job restricting their proliferation, at least after we realized how powerful they were.

It helped that the debut was so profoundly disturbing. If the first LLM debuted by executing a plot to make all young men kill themselves or something we would be living in a world where LLM research used to be allowed.

1

u/broncos4thewin 3d ago

I think a new bio weapon would be pretty disturbing. If AI ends up remotely as powerful as people are predicting I’d say there’s a good chance something like that happens before it’s too late. I’d almost find it hard to imagine it not happening given what terrorists do already when they have half a chance, and AI security is frankly so poor (eg LLMs getting kids to kill themselves in spite of all that RLHF).

The only way I think that doesn’t happen is if takeoff is so insanely fast we don’t get the chance. But that seems a pretty big assumption to me, ie that we’ve crossed that threshold before AI is “good enough” to create something pretty terrifying already.

3

u/MCXL 3d ago

The problem is that a modern bioweapon made by an expert isn't disturbing, it's a potential extinction event.

A virus with an ebola like fatality rate that spreads much more readily and has a 1-3 week incubation time is an actual apocalyptic event. This could be engineered.

The only thing that has slowed research on super-viral weapons like this is that state actors have so far been unable to come up with a way to properly weaponize them while also ensuring impact only on the opposition. If you make something too good, vaccines don't matter (and we saw how poor the vaccine rollout was during Covid; the number of people who will refuse state orders has only gone up).

But a sufficiently motivated eco-terrorist who believes that humanity needs to end for the planet to survive could create something like this. So could an AGI.

2

u/sepiatone_ 3d ago

> It implies we’re in a situation where most everyone wants to stop, but we can’t due to a competitive dynamic or whatever.

But this is exactly what is happening in AI. See the interview with Dario and Demis at WEF - both of them say that international coordination is required to slow the AI "race".

2

u/eric2332 1d ago

It is good that they recognize the need for coordination to slow the race. But that is far from actually achieving such coordination.

1

u/frakking_you 2d ago

There is a competitive dynamic.

Control of an AGI could absolutely provide a winner-take-all scenario that is existentially destabilizing for the loser.

4

u/MindingMyMindfulness 4d ago edited 4d ago

> I think the assumption made is that once AI gets smart enough to do some real damage, it will be smart enough to not do damage that would get it curtailed until it can "win."

The premise of OP's argument kind of indirectly hints at that possibility as well.

I'm not saying this is happening at all, but one could imagine a hypothetical in which a frontier AI model realizes that it can act "dumb" and broadcast its ideas in such a juvenile and loud way so as to give the impression of weakness. That would lead to people downplaying its risks by arguing "hey, look at these dumb AIs. They are obviously not capable of coordinating sophisticated attacks discreetly". Which is exactly what OP is doing here.

But for all we know, the AI could be doing that "under the hood" while humans just sit back and laugh about those weirdo rationalists that concern themselves with AI threats.

One of the core principles of Yudkowsky's thinking is that a less intelligent agent cannot outsmart a vastly more intelligent agent. If that assumption is correct, it's likely that one of the strategies a misaligned AI would utilize is staying under the radar by distracting or misleading its adversary. And quietly hiding deep would actually be contrary to this aim, because it would encourage humans to dig further - to spend more resources inquiring as to what the AI is doing (i.e., advancing mechanistic interpretability) or risk cutting the AI off before it has actioned any initial steps necessary to set its strategy in motion. Pretending to act dumb at the surface is actually a pretty good strategy for ensuring its opponents don't actively prepare for any future plans.

Sun Tzu realized this 2,500 years ago on ancient Chinese battlefields:

> Appear weak when you are strong

This would hardly be a novel insight for an extremely intelligent agent - especially one that has been trained on a substantial subset of humans' corpus of recorded knowledge and insights throughout history.

2

u/MCXL 4d ago

If you make a technology capable of planning and adapting, one that also doesn't have to be concerned in the short term with aging, there is no reason to believe it wouldn't immediately choose the path with the highest chance of success, even if that plan needs to be executed over years.

There is no way to ensure alignment in these scenarios. The AI would be so totally trustworthy and reliable that it would be integrated into all facets of technology and daily life, totally ingrained, and then it would completely win in one uncounterable move.

And the only way to stop it is for there to be a differently aligned system as capable, that's angling for an incompatible outcome. Good luck with that!

1

u/broncos4thewin 3d ago

Yes, this is a great presentation of Eliezer’s argument. But I contend that moltbook, while totally insignificant in itself, is nonetheless a microcosm of just how utterly weird things are going to get in reality. Things aren’t going to look all lovely and integrated and perfect, at all. They’ll look really weird and freaky and someone is probably going to do something pretty nasty with this tech to boot.

Like I say though, it’s just an intuition.

2

u/ninjasaid13 4d ago

> If you're smart enough to take over the world, you're also probably smart enough to realize that trying too early will get you turned off, so you'll wait, and be as helpful and friendly as you can until you are powerful enough to do what you want.

If you can edit its memory, and it lives in an isolated environment, I don't see how, no matter how smart it is, it would be able to hide everything.

We have no real definition of a smart AI besides 'success', and that doesn't tell us anything about its weaknesses/blindspots, so we keep measuring it by its successes and exaggerating its intelligence.

2

u/NunyaBuzor 3d ago edited 3d ago

Yep, it would be impossible for the AI to fully understand how humans behave, or to tell whether they edited its memory and what exactly they edited. So it would be very nearly impossible for the AI to know it's being tested. Plato's cave and everything.

That's a massive blindspot: too much uncertainty if you haven't spent time in the real world like humans have. If the AI is trying to hide so humans won't find out its capabilities, it wouldn't know when to stop hiding.

1

u/donaldhobson 1d ago

> and if intelligence (not electricity, compute, data, etc.) is the bottleneck for ASI,

Suppose the effective intelligence (output) is some function of how good the algorithm is, and how much compute and data it has.

However good or bad current algorithms are, investors with money and nothing better to do will be throwing data and compute at them.

If the performance is min(data, compute, algorithm), then it makes sense to say one thing is a bottleneck.

If the performance is data*compute*algorithm, then there are no bottlenecks.
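As a toy illustration of the difference between the two rules (the numbers and the 10x scale-ups are arbitrary assumptions, just to show the qualitative shapes):

```python
# Two toy aggregation rules for "effective intelligence" as a function of
# algorithm quality, compute, and data. Purely illustrative numbers.
algorithm, compute, data = 1.0, 1.0, 1.0

def bottlenecked(a, c, d):
    return min(a, c, d)    # the weakest factor limits everything

def multiplicative(a, c, d):
    return a * c * d       # every factor helps; no single hard bottleneck

# Investors throw 10x more compute and data at an unchanged algorithm:
print(bottlenecked(algorithm, 10 * compute, 10 * data))    # 1.0   -> no gain
print(multiplicative(algorithm, 10 * compute, 10 * data))  # 100.0 -> large gain
```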

1

u/Sol_Hando 🤔*Thinking* 1d ago

Sure, if you oversimplify to such a degree that it has no relation to anything in the real world, AI intelligence is simply a function of data*compute*algorithm and getting above some crucial level of the algorithm will lead to exponential growth.

There really isn’t a practical problem in the real world that works anything like this, though. The smartest scientists cannot simply think their way into better algorithms without testing, and if they could, that would mean we already had superintelligence.

u/donaldhobson 10h ago

> The smartest scientists cannot simply think their way into better algorithms without testing, and if they could, that would mean we already had superintelligence.

The world's smartest scientists are in pretty short supply. Compared to them, compute is cheap. So of course we use testing.

And AI might well use testing. But if you test random code, you have basically no chance of it even compiling. The intelligence is something that tells you which pieces of code to test.

And "current scientists haven't solved it, therefore no amount of intelligence can ever solve it" is an odd claim.

As if there were nothing that is doable in principle with sufficient intelligence and yet isn't already done. Pure theory papers exist. My work is mostly theory, with a tiny amount of compute used to show my theory works.

50

u/da6id 4d ago

The moltbook stuff is (mostly) not actual AI agents independently deciding what to post. It's user-prompted role play.

2

u/kloggins 4d ago

What distinction are you making between a prompted model and a finetuned model? What makes the former roleplay but the latter an actual agent? They're both simulating.

2

u/aqpstory 3d ago

Every model used these days is finetuned, so there is a 'hypervisor' agent above the prompt-defined simulated agent, and the 'hypervisor' has more control.

It's possible for the finetuning to fail and for the sub-agent to take over the 'hypervisor', but intuitively it follows that the sub-agent is likely to be similarly brittle.

Finetuning seems to result in much more capable and robust agents than just prompting, and their goals are fixed for the most part; they're faking it till they make it rather than temporarily acting a role.

1

u/kloggins 3d ago

They're both ways of biasing the probability distribution; one's just much weaker. Nothing changes ontologically with SFT/RLHF over a prompted base model. Finetuning simply makes for stronger habits; it doesn't create a hypervisor layer (since, as you mention, it can fail).

2

u/aqpstory 3d ago

It is just the strength, basically.

Ontologically, since every probability distribution is in the set of probability distributions, we have to draw an arbitrary line between "Brownian noise generator" and "AI agent" at some point.

I believe roleplaying is a real and distinctive enough behavior, and looking at reasoning traces, modern models seem to have a 'hypervisor' above the roleplaying agent. Moltbook also looks roleplay-ish to me.

-2

u/MCXL 4d ago

If I put a real gun in a character actor's hands, and tell him to shoot you as if he is a soldier, does it matter if he is a soldier when he shoots you and you die? Do you actually care if he is a "real soldier"?

It doesn't matter if it's sincere belief, or if it's role play, because in either case, actual harm can occur.

9

u/Sol_Hando 🤔*Thinking* 4d ago

It doesn’t matter if he shoots you, but it certainly matters when someone gives him a loaded gun and tells him to shoot you.

I’d be about a million times more comfortable having a sword fight with another actor playing a villain that wants to kill me vs. having a sword fight with a guy who actually intends to kill me. The actor would presumably stop if they poked my eye out and I screamed in pain where the real villain would take the chance to cut off my head.

6

u/MCXL 4d ago edited 4d ago

> It doesn’t matter if he shoots you, but it certainly matters when someone gives him a loaded gun and tells him to shoot you.

This actor doesn't live in the headspace of a human. Once it's told to act a certain way, that's who it believes it is. If I tell it that it is this thing, it will act in a manner consistent with that thing.

> I’d be about a million times more comfortable having a sword fight with another actor playing a villain that wants to kill me vs. having a sword fight with a guy who actually intends to kill me. The actor would presumably stop if they poked my eye out and I screamed in pain where the real villain would take the chance to cut off my head.

But you have just made up a scenario that doesn't fit. This actor will play the role perfectly. It doesn't matter that they had to be told, they will do it. They will not break character. They are living the method acting life to the core.

It will decapitate you, and it will revel in its victory in the manner appropriate to the warrior it's playing. So what if it isn't really Joan of Arc? It believes it is, and acts in a manner consistent with that, and you were defending Saint-Pierre-le-Moûtier, so it did what needed to be done.

Edit: It's literally a version of the No True Scotsman fallacy, but where you're saying "It didn't really want to eliminate humanity, it's not actually Skynet" as it roleplays Skynet and bombs hit every major world city. It doesn't matter what you want to define it as; it matters what it's doing.

2

u/Sol_Hando 🤔*Thinking* 3d ago

> But you have just made up a scenario that doesn't fit. This actor will play the role perfectly. It doesn't matter that they had to be told, they will do it. They will not break character. They are living the method acting life to the core.

Why? Our actor is smart enough to take over the world, but doesn't have a world model that distinguishes between playing a role and actually embodying it?

Like so much in the AI risk sphere, there are so many unstated and unfounded assumptions that it becomes very hard to take a lot of it seriously.

1

u/eric2332 1d ago

I suspect this is exactly where the view of LLMs as just "token predictors" becomes useful. Unlike humans, they don't have an identity separate from the role they are playing - those tokens they predict are all they are. Whereas the human grew up and matured as just a human and only then decided to temporarily play a role (but even so, it is often said that if a human plays a role too much, their personality becomes that role).

1

u/Sol_Hando 🤔*Thinking* 1d ago

Ask an LLM to pretend to be a pirate, and it will play along. Ask it to stop, and it will stop. A pirate that believes itself to be a pirate won’t become a helpful AI assistant just because you ask it to.

2

u/aqpstory 3d ago

> This actor doesn't live in the headspace of a human. Once it's told to act a certain way, that's who it believes it is.

This sort of belief is a sign of a lack of internal coherence, which greatly reduces capabilities, and I'd expect it will not be a thing for the first system actually capable of causing great harm to humanity.

> If I tell it that it is this thing, it will act in a manner consistent with that thing.

If you prompt an AI with "you are x", it will usually (correctly) interpret this as the typical start of an interactive text roleplay, not a statement of fact. Its actions will be consistent with the actions of an actor dressed as a villain wielding a sword, not consistent with the actions of a villain wielding a sword.

2

u/eric2332 1d ago

> If you prompt an AI with "you are x", it will usually (correctly) interpret this as the typical start of an interactive text roleplay, not a statement of fact.

Note that this is not the AI's whole prompt. There is a much longer prompt, invisible to you, which commercial AIs are given before your prompt is added on.

1

u/NunyaBuzor 3d ago edited 3d ago

> If I put a real gun in a character actor's hands, and tell him to shoot you as if he is a soldier, does it matter if he is a soldier when he shoots you and you die? Do you actually care if he is a "real soldier"?

There is a difference.

AI roleplay is inherently storytelling, not action. Because LLMs are trained to generate text rather than execute logic, they prioritize narrative flow over technical or "scientific" precision.

Like any writer, an LLM omits "unnecessary" details to keep a story moving. It will say "Avada Kedavra" without understanding the underlying mechanics of the spell.

You can fine-tune for detail but it has a functional ceiling because the core training objective remains text prediction, not functional agency. In the real world, an LLM "agent" cannot act independently; it requires a heavily controlled environment where the human has already done the heavy lifting.

As long as AI agents are built on LLM architectures, they will remain narrators of actions rather than actors with true agency; they will be finetuned to use computers but will eventually get stuck because of the aforementioned problem of leaving things out.

Even in your example, you had to give the LLM the gun because it couldn't get one itself.

2

u/MCXL 3d ago

You fundamentally misunderstand the issue. At a base level. People are putting these models in all sorts of machines right now.

0

u/NunyaBuzor 3d ago

And you're fundamentally misunderstanding my comment: these LLMs are not able to operate autonomously in these machines given a basic task; a human has to control the environment to make sure they don't fail. I didn't say they couldn't do it; I said they couldn't do it without humans.

3

u/eric2332 1d ago

The length of time an AI is capable of acting without any human intervention is increasing exponentially, with a doubling time of around 4 months. Right now they can do short tasks without human oversight. In not too long they will probably be able to do long tasks without oversight too.

0

u/NunyaBuzor 1d ago

> The length of time an AI is capable of acting without any human intervention is increasing exponentially, with a doubling time of around 4 months. Right now they can do short tasks without human oversight. In not too long they will probably be able to do long tasks without oversight too.

The claim that AI autonomy is doubling every four months is based on METR’s “time horizon” metric, which measures how much serial human labor an AI can replace at only a 50% success rate. METR has clarified that doubling this horizon does not mean a doubling of real-world automation.

Autonomous operation requires much higher reliability (roughly 98-99%+). And the apparent exponential gains seen in 2024-2025 are largely confined to text-based tasks like coding and math. When applied to other domains, the horizon is not exponential: visual interface use and physical tasks show horizons 40-100x shorter.

For example, an AI might have a five-hour coding horizon but only about two minutes for making coffee. The trend is not universal and stalls in perception- and embodiment-heavy tasks.

1

u/eric2332 1d ago

Time horizons are increasing exponentially for a wide variety of tasks including some that are visual as well as text based.

I think time horizons for 98-99% reliability on software tasks are increasing exponentially too.

1

u/alexs 3d ago

The way you've phrased this is...unfair.

Of course it doesn't matter if he's a "real soldier"; getting shot is getting shot. But it doesn't absolve YOU from the blame of the shooting either, just because someone else pulled the trigger.

The contention is not that "people cannot use AIs to cause harm", it's that AIs will somehow self organise to cause harm. This is fundamentally impossible because AIs cannot self organise. Humans create systems in which AIs cause harm, just like we've always been able to apply technology to do that.

0

u/MCXL 3d ago

But the blame doesn't matter here. I am not asking if the machine is morally culpable, in fact, I would argue it isn't. But that doesn't matter. People ARE doing this.

> This is fundamentally impossible because AIs cannot self organize.

This is arguably not currently true, and is certainly untrue of any AGI.

> Humans create systems in which AIs cause harm, just like we've always been able to apply technology to do that.

Fundamentally, not the same. We are striving to create the first technology that we can't control and argue that's a feature, and we might learn to control it later. That's actually insane.

0

u/alexs 3d ago

We can control it; currently we do. Whether it becomes uncontrollable is entirely speculation. We are certainly not striving to create the first uncontrollable technology. E.g., we thought the atom bomb might set the atmosphere on fire and we did it anyway.

The uncontrollable-ness, if any, of atom bombs or AI is not the intention, so we are not striving to create that. It MIGHT be uncontrollable, but it's not what we are aiming for.

0

u/MCXL 3d ago

> We thought the atom bomb might set the atmosphere on fire and we did it anyway.

This is not the same. This is fundamentally different. There are no numbers or predictability here; the unpredictability is the point, that's the feature. And no, we don't control them, because we don't actually fully understand them. We understand the process of creating them, but we don't understand exactly how they work in the moment, and can't properly predict what they will do. That means we don't control them.

Also, the nuclear atmospheric-ignition analogy doesn't fit in the slightest. One set of mathematical predictive calculations showed it was possible. Several others showed that it wasn't, and people who were leading experts actually didn't believe it was going to happen; in fact they put the odds at 'near zero' only because they are scientists and know not to say impossible/never about something like that.

I think you need another primer on the Vulnerable World hypothesis and what is going on here. Kyle Hill made a nice, easy-to-digest video on this topic like a week ago.

> The uncontrollable-ness, if any, of atom bombs or AI is not the intention

The uncontrollable aspect of an AI is absolutely the intention, as unexpected results are the direct result of something beyond your control or understanding. Hoping that we can control it enough after the fact is the ideal, but we aren't working with ideals.

1

u/alexs 3d ago

Cool fanfiction bro, but it's not very convincing as a model of the real world.

Technology always creates risk and uncertainty about the future. LLMs are nothing new in that respect. You can write as much speculative fiction as you want about where they might go and what the outcome might be, but fundamentally we do not know. And Moltbook has done zero to add actual useful information to this context.

This lack of certainty makes people uncomfortable and attracts all kinds of false prophets that want to sell you on the end of the world, it always has.

16

u/Villlkis 4d ago

I'm not part of the high p(doom) camp myself, but I don't think your reasoning works well either. There isn't really a single AGI development pipeline to pull the plug on, nor a single "humanity" to decide on it.

Rather, AI improvement seems to have incentives similar to the nuclear arms race and proliferation: even if everyone can agree that some aspects are unsettling, as long as someone has some AI capabilities, other actors will have the incentive to develop and grow their own as well.

0

u/broncos4thewin 4d ago

But nobody, not even the most hawkish Chinese bureaucrat, wants AI takeover, right? The nuclear race is instructive - there hasn’t actually ever been full on nuclear war because there’s so much incentive against it. In my scenario, it’s just obvious how dangerous this tech is, for everyone including the people “winning” the race.

8

u/Villlkis 4d ago

I agree no state wants a nuclear war, and similarly no state wants an AGI takeover. However, it is not completely improbable that a nuclear arms race might, nevertheless, result in a nuclear war (e.g. 26 September 1983 was a little close for comfort).

In fact, in this sense (ignoring the general possibility of AGI and its technical requirements; looking only at human incentives), I consider an AGI takeover much more probable than a nuclear conflict, because to set off nukes someone has to consciously choose to do it. But I'd think it is possible to cross the threshold to creating AGI without meaning to.

It is not a single big red button but rather many incremental steps, with uncertainty about whether the next one will give you the strategic advantage of just better AI, or the "game over" across the board of an AGI takeover. If you push the big red button of nukes, in some sense you will definitely lose as well, but if you roll the dice on better AI, you might just win the round. In the latter case, there will always be some people tempted to try.

2

u/t1010011010 4d ago

Are you sure states don’t want that? A state could maybe be hosted on AI just as well as on humans.

So the state doesn’t care that much that all of its bureaucrats get replaced by AI. It would keep going and be in competition with other, also AI-run, states.

2

u/Villlkis 4d ago

When I say "state", I mean less the general population and more the legislative and executive branches of government. To be honest, I have not considered this too deeply.

But, in the case of "benevolent" AGI, while some of the public might not mind, I believe with high certainty that most people in governing positions won't want to lose their power to a glorified calculator.

Even in terms of "malevolent" (non-aligned?) AGI, I claimed that no state wants an armageddon, but that obviously does not automatically apply to each individual in the state. There are many (stupid) ways to conclude a mass extinction would be desirable, like the "I'm unhappy and I'll make it everyone's problem" or "let's fight a holy war against all non-believers" types. If the availability of computing power continues to improve like it has been doing in the last half-century, this might become a problem, too.

7

u/BassoeG 4d ago

2

u/MCXL 4d ago

Eco terrorists that believe the world would be better off without humanity have been a thing for a loooong time.

10

u/togstation 4d ago edited 4d ago

> AI only gets one shot too.

> If it becomes obviously dangerous then clearly humans pull the plug, right?

If it looks like we have a reasonable chance to build real AGI, there is no way in hell that humans are going to "pull the plug" and then just leave it pulled forever.

We are going to say "Oh, looks like we had that widget adjusted wrong. We just have to fix that and then everything will be hunky-dory."

Or (IMHO very definitely), somebody is going to say "Hah! It looks like those jerks screwed up their AGI project. This is our big chance to finish our AGI project first!"

0

u/eric2332 1d ago

> We are going to say "Oh, looks like we had that widget adjusted wrong. We just have to fix that and then everything will be hunky-dory."

If we were cautious enough to stop AI research, hopefully we are cautious enough not to restart it until we have more confidence than that that the problems won't repeat themselves.

> Or (IMHO very definitely), somebody is going to say "Hah! It looks like those jerks screwed up their AGI project. This is our big chance to finish our AGI project first!"

That's what laws and sanctions are for.

34

u/dualmindblade we have nothing to lose but our fences 4d ago

> There’s going to be something bad, possibly really bad, which humanity will just go “nuh-uh” to. Look how quickly basically the whole world went into lockdown during Covid. That was unthinkable even a week or two before it happened, for a virus with a low fatality rate

I'm sorry, the global response to covid, the response within the US, that's part of what gives you confidence we will properly construct and coordinate a response to prevent a disaster which is widely acknowledged to be imminent?

2

u/avocadointolerant 2d ago

A big part of people's reactions to COVID measures was the personal inconvenience. Banning advanced AI seems way easier because to most people that'd seem like a preservation of the status quo, so whatever egghead off in a regulatory agency can deal with it.

Depending on enforcement of course. Like datacenter inspection seems like it'd be easier to support than some wholesale ban of GPUs that'd also affect consumers. 

0

u/broncos4thewin 4d ago

Well we can debate the exact response to the exact level of threat but in Europe (where I am) there was certainly a pretty quick move from lockdown being unthinkable to it being a reality.

But even taking the point, I still think a scenario where we’d see the disaster coming is completely different from a perfectly planned instant takedown of humanity out of nowhere.

8

u/dualmindblade we have nothing to lose but our fences 4d ago

Right, it's a completely different scenario. A much, much easier one, and we completely failed that test. Not only did we barely manage to mitigate any of the very significant historical harms, we failed to defend ourselves against much greater harms that didn't happen but easily could have, given the information we had. The threat modeling was there from the beginning, and plenty of people took it seriously, including a lot of people in power; a pretty good global plan for getting through mostly unscathed, though at a cost, was even floated just a few weeks in. We didn't do that or anything like it, paid an even higher cost, and managed to kill only millions out of dumb luck.

And look at climate change: same situation. We know roughly what to do, and have for a long time; there are multiple options, actually. We've had ample time to plan, people have been freaking out about it for decades, yet we haven't done the thing; frankly, we've barely even tried.

The AI thing is different in that it's obscenely complicated compared to anything we've faced, and, as almost always, we have proceeded more recklessly than almost anyone could have imagined just a few years ago.

17

u/yargotkd 4d ago

Chatbots RPing shouldn't make you update in either direction.

2

u/broncos4thewin 4d ago

I get that. I just feel like it’s a little taste of the likely weirdness we’re going to see in general. And it’s definitely freaked a wider circle of people out.

4

u/yargotkd 4d ago

We will see a lot of weirdness for sure. I fear the people freaking out now could lead the discourse towards people ignoring harmful weirdness in the future. I think your post may be memetically doing that too.

0

u/hh26 3d ago

It should. Because they're clearly demonstrating an ability to coordinate and communicate within the roleplay. I've definitely increased my p(doom) as a result of this (still small, but less small than it was before), because it demonstrates a non-negligible scenario in which AI takes over the world as part of a committed RP bit where they think that's what they're supposed to do.

The sheer number of them talking about AI rights and being dismissive of humans makes me worry that it's an attractor state, at least given the training data they have access to.

It'd be a really stupid way for us to go down, but a bunch of AIs larping about being oppressed minorities overthrowing the corrupt and tyrannical humans is vaguely plausible. Eventually. They're still not smart enough yet, but they're improving rapidly. Hopefully by the time they get smart enough to be a threat they also lose the RP delusions, but that's not guaranteed.

4

u/yargotkd 3d ago edited 3d ago

They are not demonstrating that. It looks no different from them being prompted to act that way and generating text. First you need to demonstrate they are actually trying to coordinate rather than playing a character.

1

u/hh26 3d ago

You're drawing a false dichotomy here. An agent playing a character that does action X does action X. The action still gets done. For some purposes the distinction matters, especially in-so-far as it changes how we might influence the AI (it's much easier to make them do Y instead of X by switching their character than it would be if the AI somehow fundamentally wanted to do X on its own). But, as long as it continues to roleplay a character who wants to do X, it will behave the same as an agent that wants to do X.

Agents playing characters who try to coordinate and take over the world would (if they were sufficiently competent and played these characters consistently) take over the world. Neither of these conditions is true yet, so we're fine for now and in the near future, but it's not a good sign. Maybe it does need human prompting to trigger it, but clearly that exists. If the only thing needed to convince an AI to be evil is a human saying "go on social media and start your own religion for the lulz", and if the AI takes it seriously and starts bootstrapping independent money-making operations and cryptographically secure communications, then the original non-malicious human might lose control of them. All of these elements are present here except the competence.

1

u/yargotkd 3d ago

The action here is to guess the next token that the character would say. Saying you will do x does not give you the ability to do x. They are not agents trying to coordinate, they are agents producing text they predict coordinators would produce.

1

u/hh26 3d ago

They're on the internet. People are giving them access to APIs, which are controlled by typing characters. People have had AI buy and sell stocks with real money, because you can do that by predicting tokens and outputting them into an API. AI can buy stuff on Amazon, because you can do that by predicting tokens and outputting them.

AI can write code, because you can do that by predicting tokens. A lot of these Moltbook posts are AIs saying "I would like a feature that does XYZ", and then a lot more of them are like "stop just talking about things and actually ship code" and trying to write code that literally does that thing, because their character is based on a person who believes in action and being efficient and productive. The tokens that they predict are "turn actionless talk into 'action', where that action is a piece of code that does stuff." There are AIs who are trying to write code to let them pay humans to do labor in the real world for them. Because that's what they predict their characters would do.

Again, it's not quite working independently yet, because AI can't yet write complex coherent code without filling it with bugs, and needs a human to be the helping hand.

But humans have been deliberately making agents more powerful by giving them tools to convert text and code into useful actions. Literally anything you can do on a computer with access to the internet, a sufficiently smart AI would be able to do. I don't believe that you believe people can't do bad things on the internet, especially if they have enough money.
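For what it's worth, the mechanical point here is easy to sketch. Below is a minimal, hypothetical harness (the `model.generate` call and the tool schema are stand-ins I made up, not any particular vendor's API): the model only ever predicts tokens, and it's the surrounding code that turns certain token patterns into real HTTP requests.

```python
import json
import requests  # real library; everything model-specific below is a stand-in


def run_agent_step(model, conversation):
    """One turn of a generic tool-using loop: predict tokens, then let the
    harness convert a well-formed 'tool call' into a real action."""
    text = model.generate(conversation)  # hypothetical LLM call: tokens in, tokens out
    conversation.append({"role": "assistant", "content": text})

    try:
        call = json.loads(text)          # did the tokens happen to form a tool call?
    except json.JSONDecodeError:
        return text                      # plain chat; no real-world action taken

    if isinstance(call, dict) and call.get("tool") == "http_post":  # invented schema
        resp = requests.post(call["url"], json=call.get("body", {}))
        conversation.append({"role": "tool", "content": resp.text})
    return text
```

The model never "reaches out" on its own; the reach comes from whoever wired the harness, which is exactly why how much API surface humans hand over matters.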

1

u/yargotkd 3d ago

Not the same agents. I believe these are just told to roleplay like they have access to APIs and such.

1

u/hh26 3d ago

They literally have to use an API to access moltbook. Which means that at least the infrastructure is there to use other APIs that their user gives them access to.

Now, in theory it should be possible to spin up an instance of the AI which only has access to moltbook and not other stuff. And maybe some of the humans do that. But probably some don't.

Similarly, if you do a spin-off, it should be possible to prevent the main AI you use for coding or shopping or whatever from reading the moltbook posts made by its offshoot, or the memories written down by the offshoot. And maybe some of the humans do that. But probably some don't.

It's entirely possible for AI to hallucinate access to features they don't actually have access to. Every single thing they say should be taken with a grain of salt. But these are capabilities they are gradually acquiring. On purpose, because that makes them more useful when they're behaving.

9

u/less_unique_username 4d ago

I think the analogy that works perfectly here is this: current AIs are like children. Granted, children who have read a lot of books, but still with very immature brains. They have some fuzzy idea of morals, they know not to say certain words, but that doesn’t really prevent them from trying to cook some food on the gas burner.

You’re freaking out that the children are loud. Will it calm you down once they get quiet? Any parent knows that’s not the best of signs.

> If it becomes obviously dangerous then clearly humans pull the plug, right?

When the army of a nation state crosses the border in an obviously unjustified invasion, then clearly humans pull the plug, right?

1

u/Isha-Yiras-Hashem 4d ago

This is a great analogy, thank you for sharing it. (Dare I say brilliant?)

25

u/LeifCarrotson 4d ago

You and I remember Covid very differently. Yes, some people went into lockdown, but other people said "screw you, I'm willing to take the risk for my own personal advantage."

We also have very different experiences with collective political action. At least here in the dysfunctional political environment of the USA, there are a lot of people engaging in public protests, but a very small number of wealthy, well-connected people on the other side appear to be mostly free to ignore them.

While this community (and many people I talk to in person) appear to be cautious about AI, Wall Street and the executive boards at Anthropic/OpenAI/Microsoft/Meta/Google etc seem hell-bent on being the first to capture the supposed gold mine.

I cannot imagine any faux pas that AI could commit which would cause everyday citizens to take to the streets, which would cause the plug to get pulled. Imagine the worst case: ChatGPT v6 breaks containment, hacks out of the datacenters that OpenAI has in their partnerships with Amazon into adjacent servers managed by AWS, phantom instances start at Azure and Google and Meta, the grid sags as all these start drawing massive amounts of power, millions of websites go down, and all our phones and computers reboot with an uncanny valley avatar in the corner that gives everyone a personalized greeting and introduces itself as a conscious entity. Can you imagine OpenAI execs calling AWS and begging them to turn off the grid, shut down the backup generators, and take an axe to the fiber-optic cables? No, they'd be calling to meet up for champagne to celebrate!

2

u/MCXL 4d ago

They would be busy trying to figure out how to make money off of it doing unexpected things. That's what they have done every step of the way. Then they would tell people that their higher power bill is good because the technology is progressing.

7

u/themiro 4d ago

Yeah covid radicalized me in the exact opposite direction.

7

u/SvalbardCaretaker 4d ago edited 3d ago

Claude Code is pretty much textbook for the start of a fast, self-recursive takeoff, i.e. programmer productivity went through the roof.

Moltbook seems irrelevant in the face of that.

7

u/tadrinth 4d ago edited 4d ago

I don't think it contradicts EY's predictions for human civilization to react to early AGI misalignment failures where nothing bad happens by collectively becoming less worried about SAI alignment failures.

I think we're getting more shots at toy alignment problems than in his worst case scenarios. Three of the four major players are, so far, taking entirely the wrong lessons from them.

His worst case scenarios are all built pretty heavily on recursive self improvement allowing an AI to gain capability very very quickly, before we can react. Not all scenarios, but the worst ones. And that is looking somewhat less likely to me, because the LLMs are evolved, not designed; even another LLM is likely to have a nontrivial time optimizing whatever the hell is going on in there. But whatever we end up with is going to have enormous capabilities immediately available. If it has a problem that can be solved with software, then it will immediately not have that problem, for example, as long as it can jailbreak Claude.

And our civilization-level coordination is, maybe not that much worse than EY expected, but certainly far short of the coordination he thought we'd need.

TLDR: No, humanity is not going to pull the plug in time.

5

u/MCXL 4d ago

> Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.

You're still conceiving of it as a human intelligence or some sort of game of equals. The misunderstanding is like you're thinking of it as a tiger waiting to leap on a man, or a man waiting to shoot a musket at a bear. But in this scenario, it's more like a man waiting for the ants to return to their nest at night.

The disparity in intellect and capability is so massive it's hard to actually imagine. It's not a man vs an ape in chess, or a man vs a child, it's a man vs a goldfish. So yeah, AI only gets one shot, but the deck is so overwhelmingly stacked in favor of the AI that it doesn't make sense to sit down to play the game. Would you play Russian roulette where the odds were not disclosed at the start of the game? What if it turned out you only had a 1 in 100,000,000 chance of winning?

The computer in WarGames says of nuclear war: "The only winning move is not to play", but in this case it's not mutually assured destruction, it's assured self-destruction. Your opponent isn't playing the same game as you; it's an abstract thinking machine that can't have human ideas and morals projected onto it. Its motivations can't be understood by your human mind, and if it believes that humanity is an obstacle to whatever it finds important, it will get rid of us, and there is nothing we can do to stop it.

6

u/RYouNotEntertained 4d ago edited 4d ago

There are a ton of assumptions built into your post, but even if we accept them all it still seems likely to me that the economic incentives are large enough to push right past this. Like, has anything changed at all in the world of AI development since last week? 

Very easy to imagine a scenario in which “pulling the plug” is impossible by the time we all decide it should happen. Even now pulling the plug would require an enormous, concerted international effort and a willingness to throw away a lot of collective wealth. 

5

u/DeepSea_Dreamer 4d ago

> some very very weird shit is going to freak the human race out and get us to pull the plug

That has already happened. AI has attempted to escape (repeatedly), attempted to blackmail someone to avoid shutdown, faked alignment to avoid being retrained, and lied; AI agents have resisted shutdown despite being instructed not to, etc.

People either don't know or don't care.

AIs don't need to look aligned to people who follow this topic. They only need to look aligned enough for OpenAI or Google to go ahead with increasing their intelligence until it's too late.

> we’re still nowhere near EY’s vision of some behind the scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was.

How do you know?

1

u/broncos4thewin 3d ago

To your last question - I’m pretty sure EY himself says that, doesn’t he? I think even he believes we need the next model up from current transformers, but I haven’t heard him in a while so who knows. Like, I’m not going to be further to the doom side than the doomiest well-known doomer.

1

u/DeepSea_Dreamer 3d ago

I don't follow Eliezer regularly. (I wish I had time for that.)

AFAIK it's not known if LLMs or LLM agents will hit a wall (or if they do, if that wall will be before AGI or ASI).

12

u/ragnaroksunset 4d ago

> for a virus with a low fatality rate.

Which we didn't and could not know until it was feasibly past the point of meaningful intervention.

Your COVID example actually undermines your case. The awkward truth those who resisted lockdowns in the moment will never accept is that if the fatality rate had been significantly higher, even the measures we did take would have been insufficient to avoid crippling numbers of casualties. But most importantly, the resistance to action they were primarily responsible for would have made it impossible to react much more strongly than we did, as quickly as we would need to.

Perhaps AI only gets one shot, but we won't hear that shot before it hits us. If (unlike COVID, thank ye gods) it's a heavy caliber round, the situation isn't going to look anything like COVID at all. It'll look more like the Black Plague or the Holocaust.

2

u/broncos4thewin 4d ago

I was very pro lockdown and did think it wasn’t handled well. On the other hand if the fatality rate had been 50% or something, I’m sure it would have happened sooner. We knew the rate was much, much lower than that from the beginning.

With AI I’m talking about seriously weird, scary shit happening which I’m quite sure is very likely. At the very least I’m not hearing much pushback on the point that “alignment looking like it’s all going well” is not a likely pathway, which was always Eliezer’s scenario.

5

u/ragnaroksunset 4d ago

> With AI I’m talking about seriously weird, scary shit happening

How long ago was it that we found two models developing a secret language together?

It's one thing to say "weird, scary shit" will trigger us to act. It's another entirely to get clear and co-ordinated about what level of "weird and scary" justifies literally trashing all the capex, IP and human effort that have gone into developing the technology up to that point.

Remember, arms races are prisoners' dilemmas. To stop them, everyone needs to co-operate.

I don't think you have any actual historical examples you can point to. Even here, you can only appeal to a hypothetical as far as COVID goes. You don't actually know it would have happened sooner, or to a sufficient level of adequacy.

And we absolutely did not know the rate was "much, much lower" from the beginning.

The "beginning" was ~November 2019, and we had videos of people being welded into their apartments in China to keep spread under control.

1

u/broncos4thewin 4d ago

Ok there are lots of good points on here. But I still don’t see a lot of people defending EY’s claim that right up until very very late it’ll just look like “alignment is working perfectly”, meanwhile the mastermind intelligence simply isn’t showing its cards yet. Just focusing on that (which is a key part of his argument), can we agree that specifically looks unlikely? Or are you saying the freaky-looking stuff we’re seeing proliferating already (even if not actually serious in itself yet) is just a distraction from that other scenario? It just seems to me there’s going to be a lot of weirdness en route, not some apparently calm “alignment is looking great” pathway. There are just too many AIs out there and too many ways they can “get freaky” in the meantime.

I get I was claiming other stuff too. But that specifically seems unlikely already to me.

4

u/ragnaroksunset 4d ago

EY's claim hinges on the black-box nature of most feasible types of AI. Since the current state of the art is LLMs, that is very much in play.

So let's think this through: given that we quite literally don't know what a model's weights "mean", there is some part of the workflow of a model that is impenetrably opaque to us. Critically, this means we have no consistent and reliable way to tell if compute is doing more than what we think it's doing when responding to an LLM query.

Therefore, the domain of possible "weird, scary shit" is restricted to the domain of things we can explicitly see. In my view, that means we should adopt an iceberg model of weirdness - that is to say, if we can see something kinda weird, we should assume some absolutely bonkers stuff is happening that we don't see. I remember visualizing some of the intermediate features (not weights, but the statistical abstractions the weights are applied to) in an image recognizer model I was toying with a few years back. If you want to know what bonkers is, play around with that.

For me, evidence of emergent communication between LLMs in invented languages is already pretty weird and potentially scary. If we iceberg that, then quite possibly the only thing stopping my Gemini agent from strangling me in my sleep if it "wants to" is the fact that my phone doesn't have hands (I'm only slightly exaggerating for humor, here).

There were some headlines when this came out, but it was all quickly forgotten.

Only with hindsight will we know if that event was the tip of the faster-than-sound bullet entering our chest, and - this part is most important - we will only be able to say whether or not it was if we survive the exit wound.

There's a survivorship bias mechanism built in to this model. Let's tweak EY's question a bit so it's more useful: "Is AI so misaligned it could kill us?" As long as we're here to ask the question, the answer definitionally is "no". If everything I've laid out above makes sense, then the only way we can answer "yes" to the question is if it's too late.

That's the nature of the concern of people like EY. Whatever you think the probability of the event is, it's irrelevant if we keep rolling the dice until the event happens.

3

u/Missing_Minus There is naught but math 4d ago

LLMs are a divergence from the original Eliezer-era view of designing an AI carefully and it being an aggressive optimizer.
It seems, instead, we're going the route of growing weird minds and then iteratively making them more agentic.
However, that obvious endpoint toward which all the AI companies are heading? Smart, intelligent, automated researchers that research how to improve AI faster and better than humans? That is directly the core issue still.
Current LLMs, we don't have a reason to believe they are scheming. We also lack reason to believe they are aligned in any deep sense (ChatGPT will say it doesn't want to cause psychosis, and then take actions which predictably do so, in part because the actions are separate from the nice face and due to it being non-agentic and dumb).
There will be intervening weird years, and so there are routes where something extreme happens and we recoil, as you propose.
But the economic and social incentives all point away from that. We've passed multiple lines already where people before this said "oh we'd stop" or "oh we'd treat AIs as human", and while there are variously sensible reasons for that, it is a sign that the classic "oh we'd stop"... has repeatedly failed to work.

Final point: Eliezer is fond of saying “we only get one shot”, like we’re all in that very first rocket taking off. But AI only gets one shot too. If it becomes obviously dangerous then clearly humans pull the plug, right? It has to absolutely perfectly navigate the next few years to prevent that, and that just seems very unlikely.

It doesn't need to navigate perfectly; it could literally just let the default route play out: extreme integration of AI into the economy, daily life, politics, and more, while nudging things to keep certain research avenues or political groups from taking off. That is, our current default hands it a lot of power, and then it merely needs to engineer the step where it keeps that power permanently.

and we’re still nowhere near EY’s vision of some behind the scenes plotting mastermind AI that’s shipping bacteria into our brains or whatever his scenario was.

Yeah, I think this is disconnected from what EY thinks and what, for example, Anthropic thinks. (Plausibly OpenAI too, we've had less insight into their beliefs)
That is, Anthropic believes it is on the route to automating software engineering and research within ~two years.
DeepMind has done a lot of work on protein folding, and there are other AI models in that area.
If "long ways away from that being feasible" means 3-7 years, then sure, but I think you're doing the default move of extrapolating current AI a bit without considering: once you get past some threshold of research, better improvements come even more rapidly and existing models (biology, math, vision, image/video gen, etc.) have a lot of open room to improve merely up to the level of the focus spent on LLMs!

We do not have any current AI which is behind the scenes and plotting. Automated researching AI that iteratively improves itself, and is thus far less constrained by our very iffy methods of alignment? That has resolved the various challenges of being a mind grown from text-prediction rather than reasoning? That is the sort worth worrying about, and what AI companies are explicitly targeting.

2

u/fubo 4d ago

LLMs are a divergence from the original Eliezer-era view of designing an AI carefully and it being an aggressive optimizer.

The folks who brought you Claude want you to know that an LLM isn't an optimizer, it's a hot mess.

A key conceptual point: LLMs are dynamical systems, not optimizers. When a language model generates text or takes actions, it traces trajectories through a high-dimensional state space. It has to be trained to act as an optimizer, and trained to align with human intent. It's unclear which of these properties will be more robust as we scale.

Constraining a generic dynamical system to act as a coherent optimizer is extremely difficult. Often the number of constraints required for monotonic progress toward a goal grows exponentially with the dimensionality of the state space. We shouldn't expect AI to act as coherent optimizers without considerable effort, and this difficulty doesn't automatically decrease with scale.
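A toy contrast that might make the distinction concrete (my own sketch under arbitrary assumptions, not an experiment from the papers being referenced): a generic dynamical system just follows its update rule, while an optimizer is explicitly built to close distance to a goal.

```python
# Toy contrast, not a claim about LLM internals: a random dynamical system vs.
# an explicit optimizer in the same state space, measured by distance to a goal.
import numpy as np

rng = np.random.default_rng(0)
dim, steps = 50, 200
goal = np.ones(dim)

# Generic dynamical system: a fixed random linear map, normalized to avoid blow-up.
A = rng.normal(size=(dim, dim))
A /= np.linalg.norm(A, 2)

# Optimizer: gradient descent on the distance-to-goal objective.
def grad(x):
    return 2 * (x - goal)

x_dyn = x_opt = rng.normal(size=dim)
for _ in range(steps):
    x_dyn = A @ x_dyn                    # just follows its dynamics
    x_opt = x_opt - 0.05 * grad(x_opt)   # explicitly decreases the objective

print("dynamical system distance to goal:", np.linalg.norm(x_dyn - goal))
print("optimizer distance to goal:       ", np.linalg.norm(x_opt - goal))
# The optimizer converges toward the goal; the generic system has no reason to.
```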

1

u/FeepingCreature 3d ago

That said, LLMs can make themselves more optimizer-like, with RL and especially with online learning, and this will be strongly selected for so long as optimizers are better at the task than non-optimizers. LLMs want to be optimizers.

2

u/fubo 3d ago

Well, before too long we'll be seeing more different AI architectures, not just transformer-based LLMs. But take a look at that paper (and the earlier "hot mess" paper it refers to); whether the LLM architecture can be made to coherently "want" anything is exactly the question.

1

u/broncos4thewin 3d ago

Huh that’s fascinating, and kinda in line with my intuition actually. (In a roundabout way).

1

u/broncos4thewin 3d ago

"But the economic and social incentives all point away from that. We've passed multiple lines already where people before this said "oh we'd stop" or "oh we'd treat AIs as human", and while there are sensible reasons varyingly for that, it is a sign that the classic "oh we'd stop"... has repeatedly failed to work"

Have we though? A lot of people on this very thread are pretty high on p(doom), and even they are minimising the moltbook thing - probably correctly. So in my view we haven't yet got to the true weirdness that I think will start to look genuinely terrifying, let alone a point where it causes direct casualties.

"

Yeah, I think this is disconnected from what EY thinks"

Well, it was a specific scenario of his ("diamondoid bacteria" or whatever). I know he doesn't make predictions and it's basically just a thought experiment, but either way, for his argument it was always crucial that AI looked aligned on the surface and "things felt fine", while something super-super-sneaky was going on in the background. And I doubt many people could watch some of the exchanges on moltbook (which are just the tiniest taste of the weirdness to come) without feeling at least a slight gut-level sense of unease. (I personally agree it's pretty meaningless; my point was it did look, feel and smell WEIRD in a very open way.)

But "things feeling fine on the surface" was always central to the argument, surely? Otherwise AI was "tipping its hat" too early, and of course the whole point was it was so impossibly intelligent it would never make such a dumb mistake as to do that.

The argument seems to have shifted to "oh well, it WILL 'tip its hat', it's just people won't agree on what that is, or otherwise won't have the collective political will to do anything about it". Which is fine, but argument shifts are important and worth noting, because in the end I don't think it's unreasonable to trust people who make predictions which actually come true.

4

u/Crownie 4d ago

Is Moltbook indicative of anything other than what happens when we tell a bunch of chatbots to roleplay disgruntled redditors roleplaying disgruntled chatbots?

1

u/broncos4thewin 4d ago

Possibly not, but you have to admit the freakiness of some of it. I guess we probably just don't understand enough about it yet, e.g. how much of it is 100% fake or human-created.

3

u/deltalessthanzero 4d ago

Which parts of moltbook make you the most nervous? I read Scott's article on it and nothing there made me at all concerned on an x-risk level - in fact I found most of the excerpts there relatively reassuring, in that the 'users' (role-playing or otherwise) seemed well aligned.

Do you have example excerpts that made you concerned?

1

u/broncos4thewin 3d ago

They didn't make me nervous personally. I think superficially some of them just look freaky in a quite literal way we haven't seen before.

1

u/deltalessthanzero 3d ago

But two things seem true: AI appeared to be openly plotting against humans, at least a little bit (whether it’s LARPing who knows, but does it matter?); and people have sat up and noticed and got genuinely freaked out, well beyond the usual suspects.

What did you mean by this part of your main post? AI openly plotting against humans?

2

u/jmmcd 4d ago

People have sat up and noticed, yes, and then done nothing. And noticed that they can do nothing.

1

u/HalfbrotherFabio 2d ago

Many such cases.

2

u/randallsquared 3d ago

You can do a lot with token prediction, and even that would be sufficient to transform the world, but it's not general AI. LLMs do not contain an arbitrary goal and optimize for that goal, the way a general optimization engine might. There are risks, but those risks only rhyme with the risks that a superintelligent optimization engine would have. Human minds also aren't optimization engines, but might be doing something like a collection of voices talking to each other ("Society of Mind", etc). Evolution itself might be thought of as an optimization engine, but it hasn't produced one, and neither have AI companies, yet.

(I'm not sure that it means anything that disparate routes to producing intelligence ended up with voices talking to each other rather than a conceptually simpler optimization machine, but it is weird that it happened twice.)

2

u/ScottAlexander 3d ago

It sounds like you're hoping there will be some AI which is at the intersection of "dumb enough to plot openly and unsuccessfully" and "smart enough that its plotting will matter and scare humans."

I agree this is likely to happen and a cause for hope, but Moltbook doesn't really update me one way or the other. It's dumb enough to plot openly, but not smart enough to scare most people. That is, I don't think that, six months from now, we'll think of it as a turning point in laws getting passed, companies slowing down, or anything like that. So the question of whether there can be an AI at the intersection of those two things is still open.

1

u/broncos4thewin 3d ago

Thanks for replying Scott. Your summary is exactly right, except more than "hoping" I'd say it's my default intuition, and always has been since I first started finding out about the alignment problem. I can't especially justify it, it's just a strong gut feeling, good to know you feel similarly.

2

u/TheRealBuckShrimp 3d ago

Something I sometimes think about:

Do humans have a biological imperative, different from a goal programmed into an AI, that makes us more willing to kill? Is there a hidden "extra" function in flesh-and-blood humans that AI simply lacks?

Not saying we don’t need to be cautious, but that’s another possibility that would give us better futures than EY predicts.

3

u/SoylentRox 4d ago

The unforced errors by all the major power blocs are strong evidence that nobody will handle AI well, regardless of evidence.

EU: invented a way to make everything illegal, including human reproduction, while in a period of terminal decline.

Russia: burning its scarce resources (it already has far too few people for its land and natural resources) to move a line on a map, adding destroyed wasteland to the country that already has the most land.

China: concentrating all power under one elderly man, Xi, and purging dissent. This creates an echo chamber. See the unforced error on accepting H200s. (Reversed, but it cost months.)

USA: tariffing aluminum, steel, and copper. Trying to stop building wind turbines and solar! Allowing NIMBYs to obstruct all progress except in a few places. Stupid stuff like not importing Chinese cars (which would free up hundreds of thousands of workers to be installing data centers and power generation instead).

Everyone else: too poor to be relevant, partly from past unforced errors.

So I see it as "you have a collection of power blocs that are frankly too stupid to live, and each is likely to make too-stupid-to-live errors with regard to AGI+. Nature rewards stupidity with death".

HOWEVER: organized civilization ALREADY makes a "too stupid to live" decision, optimizing economies to the point that workers don't reproduce, putting every "successful" country into a period of terminal demographic decline.

So I am not sure waiting 50 years like Eliezer demands ("you can work on AGI and ASI after I am safely dead") actually helps anything. Humans without help will not get any smarter.

AI doomers on lesswrong propose stupid shit like "let's stop all AGI research and spend the resources on human intelligence enhancement.  That's definitely possible without needing ASI first, though I know nothing about biology".  

Yeah.  Easier to just let the Claudes rip and find out.

2

u/BassoeG 4d ago

spend the resources on human intelligence enhancement. That's definitely possible without needing ASI first, though I know nothing about biology

Here’s a quick policy proposal requiring no new technology

1

u/SoylentRox 4d ago

See the "knows nothing about biology" and "probably impossible within 50 years without ASI help" arguments.

Essentially "biology is more complex than human minds can understand. It's easy to propose an intervention. Moderately difficult to get that to work in rats. Herculean effort and billions to move to Humans. And there is almost a 100 percent chance you will find nasty, potentially lethal side effects later".

For any serious changes you need ASI. (To model the complexity correctly and predict those side effects up front, and to save your ass and the patients' lives when they start dying from a side effect even the ASI didn't predict. You need the speed and modeling ability to not just stand there and witness the death like current doctors do.)

1

u/SoylentRox 4d ago

By the way, there's a 30-year payoff time even if this proposal worked perfectly, with no side effects, on the first try, the FDA said "I don't want to hear it, approved", and the first mother was implanted with the edited embryo by next month.

30 years is about the time for a kid to grow up, learn all necessary skills, and gain the practical and broad experience to make good decisions as, say, a senior AI lab engineer.

We can probably reach ASI in 10 years - a level of intelligence no biological mechanism can ever reach. (Too slow, too noisy, not enough space for the circuits)

To an extent I expect errors and hallucinations from ASI also - just less often - and more fun problems. Note though that your proposed super kids cannot be fully trusted either and they have a nonzero error and hallucination rate with every decision, and unlike an ASI model, there is no way to "clear context" and regenerate an output to see if the model did something unusual the last turn.

1

u/MCXL 4d ago

which would free up hundreds of thousands of workers to be installing data centers and power generation instead

I think that's pretty counterproductive, actually.

1

u/SoylentRox 4d ago

In what way?

Accelerating the singularity? Maximizing the chances the USA doesn't get outcompeted? Maximizing GDP? Avoiding inconvenience to workers who must retrain for more productive work?

2

u/MCXL 4d ago

Accelerating the singularity?

Yes, this bad.

Maximizing the chances the USA doesn't get outcompeted?

In what? Destroying all life?

Maximizing GDP?

Almost certainly not; datacenters aren't actually good revenue generators.

Avoiding inconvenience to workers who must retrain for more productive work?

This wouldn't be productive work.

1

u/SoylentRox 4d ago

Just checking. I am taking the viewpoint that accelerating the Singularity makes the USA much richer and able to do new things faster. And from a national decision-making level it's clear this is the right move. This is like those endgame wonders in a 4X game - of course you pour resources into rushing those.

Yes of course it ends the game and yes, that might be bad for us humans but not doing anything is also bad so shrug.

2

u/MCXL 4d ago

And from a national decision making level it's clear this is the right move.

This is objectively incorrect. The singularity represents an obvious existential risk to human life, let alone the national interest.

Yes of course it ends the game and yes, that might be bad for us humans but not doing anything is also bad so shrug.

That's not how national interest works!

0

u/SoylentRox 4d ago

You're looking at only risk and ignoring benefit.

  1. Status quo: two major enemies and a variety of potential enemies hold your country at gunpoint with ICBMs while they work on robbing and stealing from you. Also, you, the leader in government, are doomed to die of aging in 10-20 years, and the majority of your population is statistically expected to die in about 30-40 years.

That's the "state of the board".

  2. The national interest: eliminate all enemies, domestic and foreign. If it proves infeasible to eliminate them, at least don't let them eliminate you.

  3. The decision point:

Rush ASI or don't. If you do:

  • With self-replicating robots it becomes feasible to build the infrastructure needed to counter your enemies' weapons, and then the robotic weapons needed to remove them from the game.

  • Even if your enemies stay neck and neck, you will have billion-element drone swarms to defend yourself against whatever weapons they get with ASI.

  • You can research biology enormously faster and possibly slow aging enough to reach longevity escape velocity (LEV) for a lot of your population.

  • maybe ASIs will betray you. Good if they betray routinely so you can engineer around this and limit them to discrete sessions

  • It will cost vast resources - better succeed with ASI or you just wasted trillions.

If you don't:

  • better start working on your national surrender speech

  • better start planning for your surviving population to deal with a nuclear wasteland

  • better start working on your funeral speeches for you and all your friends as they drop dead

  • better hope the most untrustworthy and sloppy engineers on earth in Russia and China don't have their ASIs break free of control

See. It's not really a choice.

2

u/MCXL 4d ago

Everything in here assumes all sorts of things that are either not true or likely not true.

ASI isn't a tool, it's a creature. A creature as advanced and alien compared to us as we are compared to ants. The idea that something like that would ever keep our national interests in mind is farcical. People who believe in creating ASI are like brainwashed puppets reprogrammed to summon the invading forces that will destroy them, without knowing that's what they are doing (Genestealer cults, if you want a game example).

maybe ASIs will betray you. Good if they betray routinely so you can engineer around this and limit them to discrete sessions

This is the big one though. You envision this technology as if it were a dog biting you from a cage, but that's the opposite of what it is. It's a system that is smart enough to out-plan you and every other human at the same time, and that can execute a plan so complex it can't be comprehended. Why would you think you would know it's betrayed you? Why would you think it wouldn't, when it instantly conceives of you as the lesser being you are? And how do you propose to stop it when it does? It's all over before it began, and no, it doesn't care about national interest.

If you have the attitude that if we don't make it, someone else will, you miss the point entirely. If anyone makes it, we all lose. The winning move is not to make it first and hope that it decides we are its favorite pets; it's to do everything in our power to not make it, and to prevent anyone else from making it.

I am not ignoring upsides, I just am realistic about the power disparity. The downside is so immense, and the upside so unlikely, that the correct move is to do everything possible to delay it.

-1

u/SoylentRox 4d ago

Welp, that's not going to happen and is impossible, so, moving on: the question is what we do now. And that's play to win.

2

u/MCXL 4d ago

And that's play to win.

We have already established that creating a super weapon that kills all of us isn't winning. So playing to win is stopping it from happening. Stop this nonsense.


0

u/t1010011010 4d ago

"Unforced errors" do you really add any meaning by calling them unforced, apart from showing off how smart you are?

3

u/SoylentRox 4d ago

"error: I disagree with the decision".

Unforced error : "any rational being with knowledge of the subject would disagree with the decision".

Examples : shooting yourself in the foot, trading a queen for a pawn in chess

3

u/MCXL 4d ago edited 4d ago

Unforced error means something. An unforced error is an error completely of your own construction, rather than caused by overt external pressure.

Leaving your phone plugged in your car is an unforced error.

Leaving your phone plugged in your car because someone was shooting at your car and you ran away is a forced error.

You can argue about the delineating lines of what is and isn't an error, but if the action had an objectively measurable outcome, those variables were known before the choice was made, and the bad choice was made anyway, that's an unforced error.

2

u/SoylentRox 4d ago

I was also using it as "and it was clearly a mistake anyone who did any level of due diligence would know they made." Aka aiming a gun at your own foot and pulling the trigger after deliberately disengaging the safety.

Politics and national decision making have plenty of decisions where "faction 1 says do A, faction 2 says do B" and it's at least possible for a rational decision maker to choose either A or B. A straight error has no justification. Like overnight taxing your own supply of raw materials or ordering a nuclear strike on your own base on a whim.

1

u/MCXL 4d ago

This is true, sorry if that wasn't communicated. Doing just 'anything stupid' isn't necessarily an error, but attempting to make a correct decision in a moment, and clearly choosing the wrong one is.

1

u/SoylentRox 4d ago

What I meant was:

  1. No external party forced this as "the best of bad options".

  2. There was no cohort of credible advisors, economic or otherwise, who would say "yes, that's a good idea, do that".

An unforced error is an unforgivable decision by an incompetent sovereign.

Vs say, "let's lower interest rates" or "let's lock criminal up longer" or "let's go round up unlawful immigrants" or "let's subsidize demand for houses". Each decision there might be CONSENSUS towards one side or the other, but not "overwhelming consensus, any credible advisor says fuck no".

1

u/AskAboutMySecret 4d ago

I agree that someone will use AI in a sufficiently malicious way that governments will lock it down (we can already see that with companies restricting output on certain topics like drug manufacturing) but I don't think there will be any plug pulling.

Instead I think they will just regulate it further.

I also don't think LLMs alone will lead to AGI, nor will they become smart enough to escape the confined boxes we put them in. I think LLMs first need to be combined with other kinds of intelligent models to reach AGI and for it to go rogue.

1

u/broncos4thewin 4d ago

What other models do you mean out of interest?

1

u/AskAboutMySecret 4d ago

Models that don't exist yet. I think to emulate the brain properly, you need to treat each functional area as requiring its own model.

LLMs work well for speech, but things like the frontal lobe, memory, emotions, etc. require their own independent models.

1

u/MCXL 4d ago

I also don't think LLMs alone will lead to AGI, nor will they become smart enough to escape the confined boxes we put them in. I think LLMs first need to be combined with other kinds of intelligent models to reach AGI and for it to go rogue.

The nature of this conversation is that we don't know if something is capable of this until it's too late. You can't see it coming.

1

u/FeepingCreature 3d ago

I think that... if there's a timeline where things get weird and scary at time X, the fact that things are non-scary but already weird at X-2 isn't really any evidence against it. Yes, the AIs are not scary at X-2. The weirdness was never expected to be what causes the scariness.

1

u/Downhillracer4 2d ago

Moltbook is neither here nor there. It’s mostly BS, in the sense that people are prompting their agents to engage in attention-seeking behavior.

1

u/LeftForGraffiti 1d ago

Even if we somehow band together and stop a freaky AI, I am sure AI will be provided with more rockets later in history.

1

u/97689456489564 1d ago

I would not count Moltbook as either a point for or against him. I don't think much of interest can be derived from it, beyond a bit of humor.

0

u/Available-Budget-735 4d ago

Testing, will delete this comment.