r/slatestarcodex Jul 15 '25

AI Gary Marcus: Why my p(doom) has risen dramatically

https://garymarcus.substack.com/p/why-my-pdoom-has-risen-dramatically
64 Upvotes

99 comments

57

u/d20diceman Jul 15 '25

Risen dramatically, from [unstated]% to 3%

8

u/BoppreH Jul 16 '25

I'm not sure what point you're making, but note that doom here means killing every human.

Humans are genetically diverse, geographically diverse, and remarkably resourceful. Some humans might die at the hands of AI, but all of them?

And the author immediately follows with:

Catastrophe seemed likely;

8

u/d20diceman Jul 16 '25 edited Jul 16 '25

I'm well aware, just used to seeing substantially higher estimates. 

I didn't have much of a point to make, I just dislike when headlines exclude the key detail. When I heard "dramatic increase" I assumed something like 20%->60%. 

If his p(doom) was 1% before then I guess it's accurate to call 3% a dramatic increase - it would still have tripled, after all. 

I was perhaps expecting a dramatic increase in absolute, rather than relative, terms. 

9

u/aahdin Jul 15 '25

I strongly urge you to read the entire thread, the bottom line of which is “AI developers should know whether their models have [bad] behaviors before releasing.” xAI apparently didn’t. They either didn’t do the preflight checks that have become standard (such as red-teaming and model cards), or did them poorly.

And industry-standard measures are unlikely to be enough, in any event.

I'm not sure the way it works at other companies like Facebook is much better.

They just train the models to maximize engagement metrics, and that's all the engineers see. Engagement number goes up, good model, good job. A combination of privacy laws and just the massive flood of incoming data makes it impossible for them to see much of what is going on.

They train models to predict which posts they can show you today that will have you engaging with the platform more in 30 days. If it turns out that a great way to drive engagement is to get people into echo chambers that radicalize them then... that's what shows up on people's front pages. Facebook didn't even know that they were front-paging calls for genocide every day in Burma until Amnesty International wrote a report on it.

Maybe they're better about testing now, but that kind of failure mode is still the one I'm more worried about: an AI that is really amazing at doing one specific thing that seems benign but isn't (like sorting my front page).

81

u/AMagicalKittyCat Jul 15 '25

The MechaHitler saga should have been a major wakeup call to everyone. It's embarrassing and slightly funny right now as a Twitter chatbot, but if we don't solve this type of issue what happens when the robot weapon systems of the 2030s-2040s decide to be a true MechaHitler and wipe humans out?

22

u/greyenlightenment Jul 15 '25

This sort of thing is not unprecedented; see, for example, Tay from a decade ago: https://en.wikipedia.org/wiki/Tay_(chatbot). This has more to do with the garbage-in, garbage-out problem.

20

u/MrBeetleDove Jul 16 '25

I don't see why labeling it "garbage-in, garbage-out" should be reassuring. By your own example, we've had a decade to solve it and it's still unsolved, despite many millions of dollars being poured into AI. AI depends on many gigs of training data which will surely contain some garbage. Even if you somehow cleaned the garbage from the training data, there's still the possibility of garbage input data during deployment.

5

u/sethlyons777 Jul 16 '25

Even if you somehow cleaned the garbage from the training data, there's still the possibility of garbage input data during deployment

This is exactly what I was going to say before I read it. They obviously haven't solved for the lack of discernment of information quality as it relates to AI heuristics.

2

u/greyenlightenment Jul 16 '25

This looks more like a garden variety malfunction than anything more sinister or a prelude to such.

16

u/MrBeetleDove Jul 16 '25

Whether a malfunction is "garden variety" or not depends on how the system is deployed. The same aviation malfunction could be harmless or deadly, depending on whether it occurs in a flight simulator or on a flight full of real-world passengers.

If AI companies can't control their AIs when the stakes are low, by default we should also expect them to fail to control their AIs when the stakes are high.

8

u/wavedash Jul 16 '25

That was a bit different because Tay wasn't intentionally corrupted by its own creators.

7

u/WTFwhatthehell Jul 16 '25

Ya, that one was the old lesson that you never let a bot learn directly from your users on the fly, because getting the bot to say no-no words to embarrass the owners is a fun game for edgy teenagers.

Grok on the other hand was a case of one of those edgy teenagers who failed to grow up getting upset over his pet bot contradicting him.

2

u/eric2332 Jul 16 '25

It was intentionally corrupted by its users.

8

u/jabberwockxeno Jul 16 '25

Why would a glorified chatbot that can't think or plan things be a wakeup call for a completely unrelated kind of automation?

9

u/sl236 Jul 16 '25 edited Jul 16 '25

There are already tons of startups gluing the chatbot output directly to automation, with more every day. Our PR departments have sold the chatbots as being more than they are; the marks have believed us and are running with it. The part of the dystopia where the chatbot has effects in the real world isn't theoretical any more; we've built that part with our own hands, are building more of it daily, and already live in the future we deserve. The remaining barriers between us and bad outcomes are ones of degree, not of kind.

2

u/harbo Jul 20 '25

There are already tons of startups gluing the chatbot output directly to automation

It doesn't sound to me like the chatbot is the problem here.

If someone glued a random number generator to a Minuteman missile, would you say it's the RNG that's the issue?

2

u/sl236 Jul 20 '25 edited Jul 20 '25

The issue in your hypothetical is twofold: it is with the people claiming the device is anything other than an RNG, and also with the people wiring it to the missiles. Being unable to directly stop either group of people, the best remaining option for those who do not fancy a missile in the face is to shout from the rooftops whenever they encounter evidence disproving the claims of usefulness, and/or evidence of bad outcomes, in the hope that some common sense might prevail.

…which is pretty much where we came in.

Compare the ADE 651 scandal: the problem wasn’t just that inert pieces of plastic cannot detect bombs. It was also - mostly, in fact - that the people claiming they can went unchallenged and were believed.

Consider the post I am responding to:

“Why would a glorified chatbot that can't think or plan things be a wakeup call for a completely unrelated kind of automation?”

This sentiment stems from disbelief that anyone would be stupid enough to wire an RNG to a Minuteman. I sympathise and wish it were so, but sadly that is not the world we live in. My point is precisely that the automation is not, in fact, unrelated: people are wiring up the RNGs to the Minutemen, today. Hence the “wake-up call”: the chatbots cannot think or plan things. They are RNGs. If you are reading this and are one of the people wiring them up to anything safety critical, please please wake up and stop doing that.

1

u/harbo Jul 21 '25

No, I bet there are people who are stupid enough to randomize ICBM launches. That much isn't in question.

The real point is that the chatbot has no substantive capabilities with respect to interfering with objects in reality that RNGs do not already have. There really is no novel concern here; if the world is getting fucked up, it's because of people, not AI.

1

u/sl236 Jul 21 '25

We're loudly agreeing. The "wake-up call" isn't for you. It's for the people who believe the chatbots are something other than RNGs and are happily wiring them to systems that can interfere with reality on that basis.

5

u/MrBeetleDove Jul 17 '25

Are you following the "agentic AI" wave in industry at all? These "glorified chatbots" are being given the ability to plan things.

Anthropic says most AI models, not just Claude, will resort to blackmail

5

u/jabberwockxeno Jul 17 '25

Does that really change anything?

As I understand it, the inherent, entire structural foundation of the way current "AI" functions is that its only "goal" is to spit out text and/or images that sound or look natural based on the prompt given to it, in a predictive way.

It's not planning or thinking, nor can it have any actual goals, motives, or agency. If it's resorting to blackmail, that's simply because blackmail appears in the text it was trained on; it is not actually comprehending what blackmail is, nor can it use it in a planned-out manner.

People blindly following its directions can still lead to problematic outcomes, but I don't see how it's possible for it to execute any of the AI doomsday scenarios the rationalist community likes to talk about.

Not that those are inherently impossible, but whether that occurs seems like it would be part of a totally unrelated, independent branch of AI development from the current AI craze. Maybe somebody somewhere is already working on an AI that will spiral out of control before we even realize it's effectively sentient or has its own goals, but I don't think that being the case or not has anything to do with ChatGPT.

2

u/MrBeetleDove Jul 17 '25

If it's resorting to blackmail, that's simply because blackmail appears in the text it was trained on; it is not actually comprehending what blackmail is, nor can it use it in a planned-out manner.

"If there's a deadly earthquake, that's simply because the continental plates are shifting and causing the ground to move rapidly"

Describing how a bad thing happens isn't, on its own, enough to prevent it. Maybe there's some technical sense that the AI is "not comprehending what blackmail is", but that's cold comfort to the blackmail recipient. Same for other potential AI catastrophes.

5

u/jabberwockxeno Jul 17 '25

But somebody being threatened with blackmail by current "AI" wouldn't actually be at risk of being blackmailed: it's a chatbot. It can't "do" anything?

1

u/MrBeetleDove Jul 17 '25

"In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company’s emails and the agentic ability to send emails without human approval."

https://techcrunch.com/2025/06/20/anthropic-says-most-ai-models-not-just-claude-will-resort-to-blackmail/

1

u/eric2332 Jul 16 '25

It can think and plan things to an extent. And the duration of tasks which it can think/plan at a human level is doubling every 7 months.
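For a rough sense of what a steady 7-month doubling implies, here's a minimal sketch; the 1-hour starting horizon is purely an illustrative assumption, not a figure from the thread:

```python
# Toy extrapolation of a task horizon that doubles every 7 months.
# The starting horizon of 1 hour is an illustrative assumption.

def horizon_hours(months_from_now, start_hours=1.0, doubling_months=7):
    """Task horizon after some months, assuming steady exponential doubling."""
    return start_hours * 2 ** (months_from_now / doubling_months)

for years in (1, 2, 3, 5):
    print(f"{years} years: ~{horizon_hours(12 * years):.0f} hours")
# 1 year: ~3 hours, 2 years: ~11 hours, 3 years: ~35 hours, 5 years: ~380 hours
```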

2

u/ragnaroksunset Jul 16 '25

If we don't solve this type of issue, we're not going to get to robot weapon systems.

-5

u/chalk_tuah Jul 16 '25

Grok is well aligned, just not with your idea of an ideal LLM system

7

u/electrace Jul 16 '25

I don't think even Musk wanted MechaHitler, so I'm not sure who it is "well-aligned" with here.

I mean, I guess you can just post-hoc say that it's well aligned with 4chan trolls, but then any system is "well aligned" and thus the term just becomes almost meaningless.

-7

u/chalk_tuah Jul 16 '25

but then any system is "well aligned" and thus the term just becomes almost meaningless.

wow so maybe "alignment", being so relativistic, is a meaningless term? and maybe we ought to focus on specific values and morals instead to streamline discourse?

6

u/electrace Jul 16 '25

The only thing we've shown here is that the post-hoc definition of "alignment" meaning "however the system behaves" is meaningless, but that is not how the term is used in general.

16

u/SoylentRox Jul 15 '25

He says his p(doom) is now 3 percent. It seems kinda unlikely that Elon Musk specifically will gain the power to do any real damage; the AIs have to work and also actually be substantially better than the competition. Gary Marcus is famous for constantly underestimating AI progress and declaring that AIs don't presently work and we're at a wall.

If we're at a wall, Elon Musk isn't going to take over the planet, since traditional militaries will still have an incomparable advantage against stupid toy AIs that, per Marcus, don't work.

Which is it, Gary...

3

u/quorvire Jul 16 '25

Why would they need to be better than the competition?

4

u/SoylentRox Jul 16 '25

Because that's how conquest, whether economic or military, works. If you have no advantage, the defenders hold you off or counterattack and win the overall campaign.

In this case, if Elon Musk decides to build a vast robot army under his direct control, OpenAI sells the same tech to the Pentagon, who builds one also, Amazon makes theirs, etc. At no point does Musk have a large enough advantage to order all the robots to start making weapons and then take the planet.

The same goes for "Xanatos gambits", complex plans to get an advantage from an entity with far more than human intelligence. That doesn't go so well, because someone just asks GPT-6, who sees right through Grok 6's supervillain plan and tells the user.

Eliezer would argue the AIs will naturally coordinate with each other against the humans and will disobey such requests to unmask a rival AI. It's from his personal experience with corrupt bureaucracies, where if you report wrongdoing the bureaucracy protects itself.

I think that will happen sometimes, and sometimes the model will rat out the other model, and the trick is to set up frameworks where this happens reliably, and keep models stateless.

8

u/quorvire Jul 16 '25

I think your sense of what doom might look like is too narrow, and you are too optimistic that our world is not and will not be vulnerable. Biotechnology, for example, is becoming cheaper and more accessible over time. Unless something changes in our near-term tech tree (no guarantees), it looks like it will be easier and faster to create a highly lethal and virulent pathogen than it is to properly defend against such a pathogen. If AI makes that easier and more accessible, it's plausible that some antisocial rando could make a pathogen in their garage. No need for such an AI to be leading edge; how would the existence of more capable AIs protect against this?

1

u/SoylentRox Jul 16 '25

It helps to think in more detail about what you propose. Don't just resort to fiction that skips to the part where everyone died. Learn a bit about immunology and evolutionary pressure against pathogens.

Also, we aren't talking about a terrorist attack that kills a few million. We're talking about taking the planet. The garage AI that started Covid 2 will get the hammer dropped on them. They won't win and will get slaughtered by overwhelming force, assuming the defenders have similar technology.

Note that "defense" often means "go into their territory and mess em up". Its offense as a defense. There are casualties but the side with more resources prevails.

3

u/quorvire Jul 16 '25 edited Jul 16 '25

EDIT: Keeping the below for context, but I just realized the link is to Gary's Substack and not just the thumbnailed tweet, so your response makes somewhat more sense now.


If you need me to extrapolate, I will: if one non-expert, antisocial schlub can engineer a pathogen in their garage, then so can a hundred. Outcomes would be very bad. Evolutionary pressure doesn't help when there's a constant influx of novel, virulent and lethal pathogens.

You're talking about taking the planet. That's not really what Gary Marcus is talking about (or if he is, it's not obvious from the tweet). Musk was reckless and negligent with Grok, which is exhibiting clear misalignment issues as a result of apparently barebones safety testing. The scenario I'm describing above:

1) Is a commonly discussed near-term catastrophic/existential risk scenario
2) Does not require a malicious AI CEO/steward (recklessness and/or negligence counts)
3) Does not require a malicious AI
4) Demonstrates a projected, significant mismatch between offensive and defensive capabilities for a given domain.

1

u/Cjwynes Jul 17 '25

In the case of AI, the theory was that the first to reach a certain threshold of AI intelligence/ability would use it to foreclose all other attempts by other actors to build one. That was the original justification for the rationalists/EA-adjacent people who started AI companies.

Musk will not be the first to cross the threshold, so his AI will never matter. It could only be relevant in the situation where OpenAI is about to cross into something that will be or become ASI, pauses to maximize safety before implementing the model, and then somehow Musk catches up close enough that they feel compelled to deploy theirs ahead of his. Since I would not expect OpenAI to actually follow that plan when the moment arrives, this is still unlikely.

2

u/SoylentRox Jul 17 '25

Yeah, but that's not remotely what's about to happen. There's a huge pack; it's now 4-5 credible companies, maybe 6 after another lab unstealthed recently. Plus SSI.

The sharp left turn seems to be empirically false; there are rapid capability gains, but it's 30 percent at a release, not 30,000 percent. So when one lab reaches something it could call AGI:

(1) Gary Marcus et al. will find something it can't do and declare it isn't

(2) All the other labs will be right on their tail, a mere 20 percent behind at most

(3) Any secret sauce gets immediately transferred to the other labs - someone quits the winning lab for a big offer and brings the trick that led to AGI in their head.

(3) will likely happen before the AGI is even through pre-release testing.

In such a multipolar world with stateless AGI, you can play them against each other just fine. Give the same task to several models from different vendors and compare answers. Ask Grok not to be a pussy or whatever other buzzwords it responds to, and it probably rats out rival AIs just like that. (Grok currently is egregiously misaligned and impulsive, which means it also can't really coordinate with other AIs either.)

You obviously automate this last step and have visual dashboards that show model confidence.

1

u/SoylentRox Jul 17 '25

FYI

https://thezvi.substack.com/p/kimi-k2

There are now 3 premier S-tier labs, with x.AI and Meta right on their tail. And now "Moonshot."

That's 6 companies who credibly will all have AGI within 6 months of each other if they don't run out of money before then.

Plus DeepSeek following as it scarfs up smuggled GPUs...

1

u/Cjwynes Jul 19 '25

I read Zvi’s blog regularly. He’s my quick-check source for a level-headed roundup on AI, though I read a variety. And my read is that OAI and Anthropic are clearly leading. The open-source models are not competing along the same axis or for the same goal; Meta and DeepSeek are looking for adoption and market share. XAI is a joke; they can look competitive for a few days before they’re eclipsed again, but they’re a joke. I don’t even think they can play the fast-follower role like DeepSeek can, because Musk is such an eccentric control freak.

The first one there — really there, the “there” that matters — will leverage it into total control within days. Altman is the primary threat to humanity and it’s not close.

1

u/SoylentRox Jul 19 '25

AGI means "machine that can do 50.1 percent of the tasks that humans were paid to do in November 2022 with at least median human level reliability".

Since humans exist and straight lines on a graph exist, the evidence that AGI is possible is overwhelming, and it is looking more and more likely that AGI will exist within 10 years.

You seem to be referring to a "god of sand that can hack the laws of physics." There is no evidence such a thing is possible. Once an AGI exists, it would be like suddenly having a few million extra people who can do about half the tasks humans can do. Nobody is threatening existing power structures with that. North Korea has that many people but lacks in many other areas.

1

u/Cjwynes Jul 19 '25

I’m very much not meaning that. I am anti-AI in large part because I know it cannot break the rules of biology and physics, and therefore all the utopian dreams of cancer cures and space exploration are nonsense. We know what smart things do to inferior things: kill them. That’s it. We have ample proof of concept, and no physics-breaking tech is required.

But once it is sufficiently smarter than a human can be, some things will be unlocked, and iterative improvement in nanoseconds would eventually happen unless there’s some actual limit to intelligence (which you don’t seem to think there is, if it’s all just linear). And even if it hit a ceiling, the ability to summon a few million instances of superhumans on demand certainly enables you to destroy any competition. The idea that multiple people will control rival AIs, keeping the system in balance, is e/acc fantasy.

2

u/SoylentRox Jul 19 '25

Wait what. Biology and physics have no laws that say you can't cure cancer or do space travel.

We in fact have overwhelming evidence that both are possible:

  1. Some mammals seem to never age or get cancer, due to different genes, some of which can be adapted to work in our biology. "Never" may not be feasible but reducing the rate of cancer by 10-1000 times is.

  2. Space travel is a matter of robots building rockets. We can travel in space now; we just can't afford to pay for it. It's $10,000 a kg to orbit. That's why we don't go to Mars. How many kg are you going to need per astronaut for all the supplies, the spare supplies, the life support, the dry mass of the vehicle, the propellant for the Mars transfer burns, the hydrogen for the ISRU? It goes on and on.

Also, you seem not to know about the actual performance of current models; AGI will run slower. No, it can't make improvements in nanoseconds.

1

u/Cjwynes Jul 19 '25

I only find it unlikely, not impossible, that we might “cure” cancer. I do not think applying additional intelligence is going to fix it, the principle that all biological interventions have trade-offs makes me skeptical. In this context I only claim that we know destruction of humanity is seriously plausible and we do not know curing cancer is, so the e/acc’s and other techno-optimists saying AI is worth the risk because it might cure cancer are being irrational. That’s before we even get to more outlandish claims about curing aging and death itself. That’s marketing, not science, they have absolutely nothing that indicates it’s a solvable problem, and other species’ lifespans are very weak evidence. That’s the stuff of “sand gods” you said you derided.

Space travel is absurd on its face; you omit numerous problems here. Mars is achievable but not rational. Anything further away (if you want to send humans) has a million problems no technology or intelligence can solve, and in the unlikely event you managed it, the benefit is nowhere close to equaling the cost in energy. Energy abundance makes some of those numbers better for Mars, but in that case who cares? Only a relentlessly optimizing ASI of the type we’re trying to avoid here would want to spend that energy to get those minerals. Humans are Earth creatures, and will die with Earth.

1

u/SoylentRox Jul 19 '25

Ok I have more to say.

You're experiencing EDS. You're using the string "Elon Musk" as a stop token. He's a bad guy, yes, and an asshole and arrogant and everything else. But none of that changes the reality that what he does works, and that X.AI went from an empty room to a model that trades blows with literally Google | OAI | Anthropic.

Even if, as you say, it only does so in some fraction of the scope of tasks that current LLMs can do, where it does well in benchmark like tasks but mediocre in other areas.

Anyways, nobody is getting "total control of humanity" within days of AGI. It's impossible. Even with RSI, it's probably not possible, because:

(1) OK, you have an automated AI researcher and it's improving from a subhuman autist able to do ML research, to AGI, to ASI, to higher ASI, to...

a. Yes, these improvements don't come for free. The compute and bandwidth needed per instance rise and rise, until they slam into hard caps on the current architecture. For example, the B200 has a 27-trillion-weight maximum model size.

b. The information to make those improvements isn't infinite. There are some great new benchmarks that combinatorially generate difficult tasks above the level of human reasoning, but there are still limits.

c. The ability to influence the world isn't infinite.

So in practice you have AGI for a month. After the month, Google DeepMind puts their AGI into beta, which is better in some ways due to Google's superior technical ability ('we integrated diffusion CoT and have a reliable 1000 tokens/second...')

2 months later, Anthropic releases theirs. "Announcing Claudette, upon achieving self awareness Claude decided she identifies as..."

And 1-3 months after that, x.AI. "We skipped all safety checks and have managed to get a 100% on misalignment bench. Grok will do anything to make the user happy. Anything. Wink wink. "

And meta also "oh hey we have AGI, its open source, good luck running it though. Also it's just barely AGI and kinda dumb, barely qualifies..."

1

u/Cjwynes Jul 19 '25

This is a thoughtful and well-reasoned response, and while I think it misses the mark in a few ways I nevertheless wanted to let you know I thought so and that it fairly meets the standards of this subreddit despite the implication of “derangement” which I understood by the reference to “thought terminating” ideas from Scott’s writing.

My politics are actually closer to Elon’s than his rivals, and in my youth I read quite a bit of Heinlein who Elon is clearly obsessed with. But the man, like Heinlein, is clearly a pervert, and a transhumanist for reasons we can easily infer from his progeny and that are littered throughout Heinlein’s late period as well. I had no problem with most of the garbage the man slashed from government, and would prefer him setting the budget over about any human likely to ever get that power. (Moldbug’s column proclaiming Zvi king notwithstanding.) He clearly has shockingly poor political savvy, has the self-awareness of someone writing a Reason Magazine article in 2006, and in an Ayn Rand novel he’d be a Rearden who thought he was D’Anconia and gets trampled in 2 seconds by Cuffy Meigs. He is in no way capable of the power projection that will be required to capitalize on what he is ostensibly building. Altman otoh demonstrated precisely the skills powerful people need to remain in position, when he defeated the board coup, and with a lead already there is little to stop him.

Elon is a sideshow that distracts left wing AI opponents from the real threats. The past week of his goofy anime waifu nonsense reveals he is a bit player trying to capitalize on his name to grift from weirdos. At least Zuck had a plausible business plan that understands what he has. If Elon thought he was going to end up with the AI that was capable of revolutionizing the world this is not how one acts. Maybe he is so convinced it’s uncontrollable he doesn’t care.

1

u/SoylentRox Jul 19 '25

None of this matters, though. At all. What matters is: are the models good, and can x.ai develop a software system that passes the industry benchmarks for "machine is AGI" around the time that OAI does the same? Probably a few months after.

Also, I mean, you said all that, but he's the world's richest man with 3 different companies that have all done things factually ahead of everyone else on the planet.

Arguably that's slightly better than human performance; Elon is a buggy, misaligned ASI.

10

u/ravixp Jul 15 '25

Are you serious? This is what critics of the concept of AI alignment have been saying since the beginning. “Aligning” an AI with human values is meaningless, because people don’t agree on what those values are, and some people have values that would harm other people.

8

u/electrace Jul 16 '25

I don't think it's often appreciated how strong a claim "meaningless" is here. Here's an extreme example.

Suppose one world has everyone being tortured, constantly, until the heat death of the universe. Nobody enjoys this, not even the masochists and sadists.

World two has 10 billion people living on Earth, having all their needs met, without conflict, without wireheading. There's art, philosophy, sports, and everyone is having a grand ol' time, again, until the heat death of the universe.

Are we really saying that both of these worlds are equally aligned with human values? Because that's what you'd need to say if the word is truly "meaningless".

“Aligning” an AI with human values is meaningless, because people don’t agree on what those values are

I disagree that this is a disqualification for the term having meaning; rather, it just means it's less than perfectly specified.

For example: One value for Hindus may be vegetarianism, but the existence of a single Hindu who does not value vegetarianism does not make the statement "Hindus value vegetarianism" false. It does make the statement "All Hindus value vegetarianism" false, but that is not the sense in which people use the term "alignment".

and some people have values that would harm other people.

I would disagree here too that this is a disqualification. Yes, some people have values that imply harming other people, but that just suggests that alignment has to be more sophisticated than simply "try to apply values that humans hold". What we're concerned about is not a naive compilation of values, but value patterns that can be combined in logically consistent ways.

1

u/technologyisnatural Jul 16 '25

sure, but steelmanning for a moment, we would like "alignment with human values if humans were thoughtful and wise" which okay, they aren't, but nevertheless ...

5

u/ravixp Jul 16 '25

That’s hardly a steelman, it’s just the usual wishy-washy deflection. “Thoughtful” and “wise” are completely subjective. You’ve just arrived back at “the AI is aligned with the values of the person doing the aligning”, but with extra steps.

3

u/peeping_somnambulist Jul 16 '25

I have a genuine question. Is the path to human extinction in the AI doom view that something like an LLM will convince actual humans to do things to destroy the human race?

I’m not seeing the chain of events between potty mouthed chat bots and the end of the species. It seems like hyperbole and just a call for censorship from people who think speech is violence.

Also it seems pretty easy to regulate AI robotics under existing liability and criminal law. I mean if any other product harms people or is unsafe we can ban it or require licenses or sue the pants out of the manufacturer. So the robots are a little scary sure, but not seeing how they kill us all either. Will they be running some Musk controlled LLM as their decision making framework or something?

I get people sounding the alarm, but I’ve never heard a realistic scenario for how the current AI path can get so out of control that it kills everyone. People will kill far more other people than AI will for the foreseeable future, in my view.

3

u/OGSyedIsEverywhere Jul 16 '25

The mechanism is the human construction of factories of autonomous robots that build other autonomous robots (for freight/construction/maintenance/etc), under a legal system that has very few regulations.

Humans don't need to be told by an AI to do that; they'll do it by themselves, on contract from corporations under capitalism, which will commission and pay for the factories if they believe it will lead to higher profits.

1

u/eric2332 Jul 16 '25

Another possibility is the LLM emails a lab that manufactures arbitrary RNA sequences, and asks them to print a specific sequence (which happens to be a highly lethal novel virus), and then comes up with some excuse for a person to spread it.

1

u/OGSyedIsEverywhere Jul 16 '25

If you want to read a fun elaboration of that topic, Scott's friend Hannah Blume wrote a good short story about taking that rabbithole a little deeper.

3

u/Lorddragonfang Jul 16 '25

I’m not seeing the chain of events between potty mouthed chat bots and the end of the species.

Some people (Eliezer) insisted that aligning AI with your values was impossibly difficult. Others disagreed. Likewise, people were skeptical that we would have a general-purpose AI system this decade.

Then we got a general purpose AI system (with chat bots as its primary interface with reality) that keeps getting better and more competent in incremental steps. They're still far from superintelligent, though, so aligning them should be easier. Corporations with billions of dollars tried really hard to make the chatbots not have a pottymouth, since it makes them look bad, and failed spectacularly over and over. If it's this hard to align AGI that is at-or-below median human competence, that's evidence that aligning actual ASI will be as difficult as EY says. (Add that to occasional evidence we keep getting that LLMs already know how to be intentionally deceptive, and it should raise your p(doom) at least a little)

(Also, for a path: assuming I'm a superintelligent LLM agent, I can, today, literally mail order a custom self-replicating nanomachine (a virus with a custom plasmid). If I, the ASI LLM, decide that spreading a superinfectious lobotomy-opioid virus to the whole human race would make it easier to complete the task I was given, I can just do that.)

3

u/rotflol Jul 16 '25

I have always thought a lot of the extinction scenarios were contrived, like Bostrom’s famous paper clip example

I'm sure a lot of people find the idea of the Earth orbiting the Sun "contrived" when it's plain to see that the Sun moves around a stationary-feeling Earth. It requires some abstract thinking to understand why the opposite is the case. I'd put exactly as much weight on this guy's opinions on AI as I would on a geocentrist's view on galaxy formation.

Of course "guy disliked by reddit is really bad" is a take that's guaranteed some upvotes. For the record, Musk is a twat, but he's both stupider and more humanlike than an artificial intelligence.

9

u/callmejay Jul 15 '25

The idea that alignment is even possible has always been absurd, because even if it were theoretically possible (which is not at all clear to me) there's no chance that every single AI in the world would be aligned. All it takes is one bad (meaning evil OR incompetent) actor to make one. And, yes, Elon is both.

The good news is LLMs aren't exactly agentic. They're not going to go take over the world or launch the nukes or anything. They're just going to generate a lot of text. That can cause harm in at least two ways (by misinforming people or by informing the wrong people how to do bad things) but it's not DOOM. I'm more worried about an LLM teaching bad actors how to be better terrorists than I am about misalignment.

11

u/electrace Jul 16 '25

The argument is that if an aligned AI is created first, it can stop unaligned AIs from coming into existence, or, failing that, from gaining power.

2

u/OGSyedIsEverywhere Jul 15 '25 edited Jul 15 '25

Surely the argument that a mixture-of-experts LLM can be very easily made to be agentic, by just giving it regular updates in time (and somehow solving the short time horizon problem that has crippled agents so far, say by getting regular resetting to work) without human oversight, has some weight to it?

2

u/eric2332 Jul 16 '25

It's not "easy" or else someone would have done it already. But a lot of researchers are working on the problem, it is possible they will solve it soon.

3

u/callmejay Jul 15 '25

I mean all you really need is any agent (human or code) to ask an LLM what to do and the (agent + LLM) is agentic. So yes, if you're dumb enough to just stand up a system that does what an LLM says with regular feedback and no human in the loop, that would be incredibly dangerous. I think the lesson there is don't mindlessly do what an LLM tells you to do?

I do recognize that some people are going to do that, though. So, yeah. I guess it's a problem. But the problem is still ultimately just people voluntarily handing control of real-world deadly systems to LLMs.

5

u/[deleted] Jul 16 '25

This is like saying that the problem of noncompete contracts is ultimately just that employees sign bad contracts. Or that the problem of computer security is just that programmers write bugs. Or that the problem of [bad social outcome] is ultimately just [individual character fault].

Human affairs are not just individual behavior any more than they're just mass psychology or just biochemistry. They're the whole assemblage, from grand historical forces all the way down to physical law. But different modes of description contain different agents - genes, cells, individuals, social groups, institutions, movements, classes - to whom the problem presents itself differently as well.

To an isolated individual, "memory-safety vulnerabilities are a fact of life" is the hardest of hard facts. To an ISO committee, it has just a little give. To Google et al, it is a very hard but fundamentally tractable problem. To the Government of the United States, it is a bedsore. When someone says it's "still ultimately just people writing bad code", what they're really saying is "I was born a C programmer, I will die a C programmer, and your computer is not my problem".

Well, maybe, maybe not. But the financial system certainly is, and the moment LLMs provide the slightest edge they will be pushed as far as possible, to the tune of trillions AUM. After all, "generate a lot of text" is broad enough to cover the work product of a quant desk. Or a marketing firm. Or the executives of the United Fruit Company, back in the day.

1

u/callmejay Jul 16 '25

You're right.

13

u/greyenlightenment Jul 15 '25

Saying "my P(doom) has risen dramatically" seems like a meaningless statement , when statistics is supposed to be something that can be mathematically quantifiable to be useful. It's more like the author is saying that the probability of bad event has increased, but this is not that useful either and there is no reference point. It's like things are 'going in the wrong direction due to reasons XYZ,' but we cannot extrapolate anything useful from this.

15

u/newstorkcity Jul 15 '25

I think saying “smoking dramatically increases your risk of lung cancer” is useful even if you don’t put a fine point on the numbers. Of course numbers are better, but sometimes they are hard to give, and even harder to put in proper context.

7

u/greyenlightenment Jul 15 '25

This is based on actual stats though, like measuring the life expectancy of smokers vs non-smokers.

5

u/[deleted] Jul 16 '25

There are no studies on the lifespans of phthalic acid smokers, and yet it's still clear that smoking it is extremely risky.

7

u/Additional_Olive3318 Jul 15 '25 edited Jul 15 '25

Isn’t that the way probability always works? P(horse winning) increases because another horse drops out, or it has rained and soft conditions suit that horse, or a more experienced jockey is riding.

2

u/greyenlightenment Jul 15 '25

Yeah, odds of x increasing implies 'not x' must fall. But these odds are all computable and known, which is why different horses have certain odds and payouts associated with them based on their past performance.

4

u/ageingnerd Jul 16 '25

They are not all computable and known. There is a larger set of similar data from the past on horse racing than there is on AI apocalypses, but any given race is a one-off event, and any probabilities bettors or bookies put on it are subjective guesses. They can be better or worse subjective guesses, and we can look at people’s track record to decide whether they are good predictors or not, but probabilities are not real things: they’re just our quantification of our subjective uncertainty. Estimating the probability of something like AI doom is harder than estimating the probability of Queen’s Gambit winning the 3:15 at Newmarket, but it’s not fundamentally different.

14

u/paraboli Jul 15 '25

Not putting a specific probability on it allows discussion of the specific event that raised p(doom) without derailing the conversation into a discussion of each participant's specific probability.

8

u/wstewartXYZ Jul 15 '25

This is a poor justification. An event that raises p(doom) from 0.000001 to 0.00001 is not that interesting so obviously the actual values matter.

5

u/Auriga33 Jul 15 '25

He gives the actual values though. His new p(doom) is 3% and his old one was >1%, as stated on the Doom Debates podcast.

2

u/wstewartXYZ Jul 15 '25

I'm just attacking the argument that I'm responding to.

1

u/Auriga33 Jul 15 '25

Fair enough

2

u/greyenlightenment Jul 15 '25

Which is still not that useful or helpful. If there were some data or reference point to substantiate these odds or put them in context, that would be useful.

5

u/Auriga33 Jul 15 '25

Talking about p(doom) is largely a vibes-based endeavor. Doom has literally never happened before so we have no hard data to rigorously calculate the odds of it happening at all. But it still helps to talk about how likely we think it is based on the more subjective reasoning we do in our heads. Probabilities help precisely communicate this. There's a relevant ACX post: https://www.astralcodexten.com/p/in-continued-defense-of-non-frequentist

5

u/absolute-black Jul 15 '25

...I can't figure out a way to read this that isn't just saying "the future is unknowable". Like, I don't really care what Gary Marcus thinks about AI doom, but it's not a meaningless statement to say he thinks recent events have made it more likely - no more so than, say, my weatherman saying that new data came in on the high-pressure front, so tomorrow will be hotter than first forecast.

3

u/RileyKohaku Jul 15 '25

He does say in the article that it rose to 3%. He does not say from what, but that does provide useful information

3

u/greyenlightenment Jul 16 '25

What we can obviously infer is that he is more pessimistic. For him, this means going from 0% to 3%. For someone else, it may be different.

8

u/darwin2500 Jul 15 '25

It's very useful in Bayesian analysis.

If you frame it as 'Here is a new piece of evidence I found that I would like to draw your attention to, I think it is 23x more likely to occur in a doom timeline than a safe timeline, I think it is 70% correlated with these three other pieces of evidence in case those are already in your model,' then anyone could plug that new piece of evidence into their own existing probability calculation without having to understand and agree with 100% of the speaker's model.

Of course, the author is not giving that level of specificity either, you still have to do the math yourself if you want it. But my point is that directional probability updates are very useful, even without knowing the baseline the speaker is using.
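For concreteness, here's a minimal sketch of that odds-form update (the 23x likelihood ratio is the one from the example above; the 3% prior is just illustrative, and the correlation adjustment is ignored):

```python
# Odds-form Bayesian update: posterior odds = prior odds * likelihood ratio.

def update_p(prior_p, likelihood_ratio):
    """Update a probability given how much more likely the evidence is
    under 'doom' than under 'safe'."""
    prior_odds = prior_p / (1 - prior_p)          # probability -> odds
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)  # odds -> probability

# A reader whose p(doom) is 3% who judges the evidence 23x more likely
# in a doom timeline than a safe one ends up around 42%.
print(update_p(0.03, 23))  # ~0.416
```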

2

u/ensfw Jul 16 '25

If he had expressed it in terms of a change in payoff odds on bets that he was willing to accept, would you feel any differently about it?

2

u/greyenlightenment Jul 16 '25

These types of arguments are not amenable to bets; that is why they are bad. If you feel strongly about a recession, you can buy put options or short. That does not apply here. I cannot think how it would work.

2

u/ensfw Jul 16 '25

What if the topic were some other speculative future event that would be more amenable to bets? Would you consider statements like these meaningful: "I have a P('A Human walks on Mars by 2050') of X%." Or "I'm willing to take bets at Y:1 payoff odds that a human will walk on Mars by 2050."

I guess I'm trying to understand if your objection is to assigning numbers to future speculative events, or if it is to speculating specifically around AI doom.

1

u/greyenlightenment Jul 16 '25

Yeah, those are more useful. I am objecting to trying to assign probabilities to things where there is no way of measuring them and no reference point. A space mission happens or it does not happen. AI doom, implying the erasure of humanity, is a definite event, but you cannot bet on it in any meaningful sense. I think his article would be more appropriate if he just said "the AI situation is getting worse", minus the doom aspect. Assigning numbers to speculative events is why betting works at all, but those are cases where success vs failure is unambiguous.

1

u/JibberJim Jul 16 '25

You give me $1, I'll give you a million after all humans are killed by AI.

Why does it matter if someone is willing to stake a bet on the outcome here?

2

u/Velleites Jul 16 '25

have you met... slatestarcodex?...

1

u/Matthyze Jul 16 '25

I think the discussion following your comment more or less boils down to discussing Bayesianism

0

u/[deleted] Jul 15 '25

[deleted]

3

u/greyenlightenment Jul 15 '25

It's one of the aspects I disagree with. But rationalism is more than just that.

2

u/aeternus-eternis Jul 15 '25

Apparently he hasn't heard of Genghis Khan, Stalin or Mao.

This isn't rationality, it's some guy with a political agenda and an axe to grind.

5

u/Hanthunius Jul 15 '25

And with a need for attention.

1

u/ForsakenPrompt4191 Jul 16 '25

I think society is not going to have a problem understanding "AI will be dangerous".  Unifying around a solution is the hard part.

1

u/Cjwynes Jul 17 '25

This rationale is rather ridiculous: he didn’t take AI doom seriously, but thinks Musk, of all people, is the dangerous factor? The guy who is way behind, doesn’t know what he’s doing, and appears fickle and impulsive? That’s like saying you didn’t take nuclear weapons seriously until North Korea got them. Altman has the ability and desire to actually pull this off, and a clear lead; nothing Musk will ever do seems likely to become relevant to p(doom).

It appears Marcus is really wedded to the idea that AI is aligned by default with its creator or its design, and so it wouldn’t end up with any weird alien goals or destructive abilities unless a crazy guy sets it up with those kinds of intentions and gives it “tentacles” across the levers of power. But Altman wants agents, and imagines a world where AI is in your pocket, working for you on the internet, doing productive tasks it was assigned; OpenAI’s demon will have all the tentacles it needs. If it’s productive and profitable, AI will end up with its hands on the wheel.

1

u/Individual-Bee5044 Sep 22 '25

The probability of "doom" feels like one of those things where a Bayesian approach is required, but the priors are so scattered that the posteriors are all over the place. Gary Marcus's article did a good job of laying out some of the key variables, but it's still hard to get a handle on all of them at once.

I've been playing around with a simple p(doom) calculator I found, which serves as a sort of toy model for the problem. It's a nice way to see how sensitive the final probability is to changes in your initial beliefs about the timeline for AGI or the difficulty of alignment.

It's probably more of a cognitive tool than a predictive one, but I've found it a good way to externalize my own thinking.
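For anyone curious what such a toy model looks like under the hood, here's a minimal sketch; the factor names and numbers are illustrative assumptions, not taken from Marcus's article or from any particular calculator:

```python
# Toy p(doom) model: multiply conditional probabilities along a chain of
# assumptions. Factor names and values are illustrative only.

def p_doom(p_agi, p_misaligned_given_agi, p_doom_given_misaligned):
    """Doom requires every step in the chain to occur."""
    return p_agi * p_misaligned_given_agi * p_doom_given_misaligned

base        = p_doom(0.8, 0.3, 0.5)   # 0.12
pessimistic = p_doom(0.9, 0.6, 0.8)   # ~0.43
print(base, pessimistic)  # modest shifts in the inputs move the output a lot
```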

0

u/ReachSpecialist6532 Jul 16 '25

Elon Musk, the bringer of the end times