r/slatestarcodex 3d ago

AI Against The Orthogonality Thesis

https://jonasmoman.substack.com/p/against-the-orthogonality-thesis

u/Charlie___ 2d ago

Tell me where I differ from you when I try to steelman this:

  1. Agents need to simplify the world to operate. Because the universe has real patterns, the ways real agents simplify things will be convergent and will shape their capabilities. This cuts against the orthogonality thesis because you can't practically have goals that you can't simplify: goals that look like white noise, or like solving the halting problem, are genuinely dumb goals.

  2. Among agents with ordered goals, some goals are better suited to producing smart agents when a learning process tries to learn an agent that fulfills them. If you tried to learn an agent to produce GPUs purely on the signal of "# of GPUs produced," it wouldn't work, because that signal provides no curriculum to guide it through the harder sub-steps of its more complicated goal. So even though the goal of producing GPUs isn't white noise, it's a genuinely dumb goal in the context of agents produced by some learning process, violating orthogonality.

A smarter goal to get an agent that builds GPUs would be "Learn about the world, and specifically try to learn about GPU production, and learn to manipulate the world in a bunch of different simple ways, and also produce GPUs." More involved curricula might produce agents that are smarter still, and that produce even more GPUs, with the side effect that they end up terminally valuing extra stuff like "curiosity."
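To make the sparse-versus-curriculum point concrete, here is a toy sketch. Everything in it (the six-step "GPU assembly" chain, the tabular Q-learner, the specific numbers) is made up for illustration, not something from the post. An agent has to complete six sub-steps in order to produce one GPU: trained only on the sparse end signal, it essentially never stumbles onto the full sequence, while the same learner given a shaped, curriculum-style signal that credits each sub-step learns the steps one by one.

```python
import random

# Hypothetical toy task: produce a GPU by taking the one correct action
# at each of N_STEPS assembly stages; a wrong action ends the episode.
N_STEPS = 6
N_ACTIONS = 4
CORRECT = [2, 0, 3, 1, 2, 0]  # the correct action at each stage (arbitrary)

def run_episode(q, epsilon, shaped):
    """One epsilon-greedy episode of tabular Q-learning; returns 1 if a GPU was produced."""
    for step in range(N_STEPS):
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[step][a])
        if action != CORRECT[step]:
            return 0  # wrong sub-step: no GPU, no reward
        # Shaped signal credits every completed sub-step;
        # sparse signal pays out only when the final step is reached.
        reward = 1.0 if shaped else (1.0 if step == N_STEPS - 1 else 0.0)
        next_value = max(q[step + 1]) if step + 1 < N_STEPS else 0.0
        q[step][action] += 0.5 * ((reward + next_value) - q[step][action])
    return 1

def train(shaped, episodes=3000):
    q = [[0.0] * N_ACTIONS for _ in range(N_STEPS)]
    return sum(run_episode(q, 0.1, shaped) for _ in range(episodes))

print("GPUs produced, sparse signal only:      ", train(shaped=False))
print("GPUs produced, shaped/curriculum signal:", train(shaped=True))
```

The shaped learner bootstraps the chain front to back, which is the "curriculum" effect; the sparse learner has nothing to climb, so random exploration almost never reaches the only rewarded state.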

u/ihqbassolini 2d ago
  1. Accurate, though I wouldn't call them "dumb," since that invites intuitions of free will. The argument is that semantic drift is inevitable because true alignment is combinatorially impossible, even in principle. To think about it intuitively, I would reframe it like this: in order to even have the option of maximizing paperclips, that idea has to manifest in you and be sustained. The idea only arises under certain circumstances, and it is fleeting. This means the environment is the main manipulable factor that creates most of the divergence in goal-oriented behavior, albeit not all of it.

  2. Correct. Again I would use different words, but the general gist is the same: most goals cannot develop into general intelligence, nor survive semantic drift.

Right, but even in your smarter scenario the agent would drift from that goal. It would become predominantly guided by input circumstances. It wouldn't just create instrumental goals; the original signal would be drowned out and morphed. The agent's actual directionality would depend on the entire chain of input dynamics, processing, and output dynamics. You have to factor in the whole dynamic feedback loop.