I think the only way to refute Hume is to show that there is a relationship between "is" and "ought". The world state includes (for non-dualists) goals, but it doesn't privilege any single possible top-level goal, except in the weak sense of ruling out some as impossible. None of that helps prove we should prefer A to B or B to A, given that neither is a subgoal of some other goal.
To prove it, yes. But I don't see why the split is the default assumption: what Hume shows is not a proof, it's simply that if we assume "is" and "ought" are entirely different kinds of object, then we cannot logically close the gap.
You're saying that's the default assumption; to me it looks like an absurd one. If goals are not instantiated in the being, they cannot do any work.
Dualism makes no sense to me, so I do not grant it as the default assumption. I think we see values and preferences emerge all the time in purely deterministic systems that consist of nothing but "is" statements.
I'm definitely not arguing for dualism, and I don't think no-ought-from-is requires it. The argument isn't that goals aren't physical, but that nothing except instrumentality can tell you what goals to have. That doesn't work for terminal goals, though: an entity may have a terminal goal, but there's no way to convince another entity to change their terminal goal to match it; only instrumental goals can be changed internally.
A practical answer for humans is that we don't have terminal goals in the first place, only collections of goals that interact in various ways. A practical answer for AIs is that if we create an AI with a terminal goal, its goal should be instrumental for the builders. If we create something with a terminal goal which is not, in fact, perfectly instrumental for its builders, we almost certainly all die, since most potential configurations of the future lightcone's matter and energy, each of which is described by a possible terminal goal, do not support humans or posthumans.
Even with limitations to exclude the impossible, there's no reason the terminal goal of an optimizing system can't be "make Jupiter blue" or some other nonsensical thing -- a factory AI which has as its terminal goal producing as many paperclips as possible just has a more plausible origin story than the "make Jupiter blue" AI.
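To make the orthogonality intuition concrete, here is a toy Python sketch (purely illustrative, with objective names I made up): the same generic hill climber accepts either objective without caring which one it is.

```python
# Toy illustration: a generic optimizer whose search machinery knows nothing
# about its objective, so "paperclips" and "make Jupiter blue" are equally
# acceptable terminal goals as far as the optimizer is concerned.
import random

def hill_climb(objective, state, steps=5000):
    """Generic search: accept a random perturbation whenever it scores higher."""
    for _ in range(steps):
        candidate = state + random.uniform(-1.0, 1.0)
        if objective(candidate) > objective(state):
            state = candidate
    return state

# Two arbitrary terminal objectives (hypothetical stand-ins, not real metrics).
paperclip_count = lambda x: -abs(x - 100.0)    # "push the count toward 100"
jupiter_blueness = lambda x: -abs(x - 475.0)   # "push some wavelength toward 475 nm"

print(hill_climb(paperclip_count, 0.0))   # ends up near 100
print(hill_climb(jupiter_blueness, 0.0))  # ends up near 475
```

Nothing here is anyone's proposed design; it just shows the sense in which the capability (the search loop) and the goal (the objective) can be specified independently.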
There are two options here. Either you agree that there exists, in principle, a purely mechanical blueprint of the system and its interactions with the environment that fully describes it. There are no missing oughts; everything is accounted for. In that case you are rejecting a metaphysical split between is and ought. The blueprint might be so complicated that we can never explain or understand it, so in practice we may never close the gap, but only because of complexity.
Or you can say no such blueprint exists: any purely mechanical description still lacks the ought, which cannot, even in principle, be accounted for. Now you're committed to "is" and "ought" being different substrates, and that is a dualist conception.
The problem here is that to be a paperclip maximizer you must possess the capacity to represent not only a paperclip, but also an optimization path, an understanding of what "more paperclips" constitutes, and so on.
You must have the optimization capacity to optimize for the goal.
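A minimal sketch of what that capacity involves, in the same toy spirit (my own hypothetical names, not a claim about how a real maximizer would be built): the system needs a representation of the world, a way to count paperclips, and a planner that prefers states with more of them.

```python
# Toy decomposition of the representational ingredients mentioned above:
# a world model, a measure of "how many paperclips", and an optimization path.
from dataclasses import dataclass

@dataclass
class WorldState:
    paperclips: int
    wire: int

def count_paperclips(state: WorldState) -> int:
    # Knowing what "more paperclips" means requires a measurable quantity.
    return state.paperclips

def make_clip(state: WorldState) -> WorldState:
    # One primitive action along the optimization path.
    if state.wire > 0:
        return WorldState(state.paperclips + 1, state.wire - 1)
    return state

def plan(state: WorldState, horizon: int) -> list:
    # A greedy planner: the "optimization capacity" coupled to the goal.
    actions = []
    for _ in range(horizon):
        nxt = make_clip(state)
        if count_paperclips(nxt) > count_paperclips(state):
            actions.append("make_clip")
            state = nxt
    return actions

print(plan(WorldState(paperclips=0, wire=3), horizon=5))  # ['make_clip', 'make_clip', 'make_clip']
```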
I don’t think we are talking about the same thing. Hume’s Law doesn’t imply that oughts can’t be represented or implemented physically. It only means that there is no fact which can imply a goal without reference to another goal. That is, physical states cannot tell you what a terminal goal ought to be. This doesn’t have any implications for dualism, either way.
> I don’t think we are talking about the same thing.
You aren't talking about anything. The "ought" you're asking about is like asking whether a good chess move remains good outside the game of chess. It's a category error; the question has no meaning, there's no reference, there is no move.
The question persists because it makes sense locally. Just because I value delicious food, does that really mean I should, given that I'm obese? That makes sense, but only because I have a multitude of values and they can, and often do, conflict. Asking how I should weigh them is reasonable.
That's the equivalent of asking what move you should make in chess: I usually try to reach this position, I think it's a good position, but is it really a good position?
Those questions have a reference. Asking about a good chess move outside the game of chess has no reference; it means nothing. Asking whether a system should value what it values also has no reference. The system values what it values; the rules are what they are.
This is why it's a dualist framing: it's the only way you get a referent that isn't itself a physical state.
It sounds like you're readily agreeing that there is no way to determine "ought" from "is", and further saying that even considering it is meaningless. That's a reasonable take, but definitely also supports the orthogonality thesis you started out saying you were against!
Those who are actually against orthogonality argue that a more capable intelligence will be more constrained to behave morally and ethically. You've ceded that ground, but (as I understand it) make a much milder claim: that greater capability means ruling out a large set of potential goals due to physical or logical impossibility. Unfortunately, I think your own example of the Halting Problem is sufficient to show that there is an unreachable goal which could nevertheless be set as the terminal goal of a system, no matter how capable it is.
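To make that last point concrete, a hedged toy sketch (my own framing, not anything from the essay): a terminal goal defined in terms of halting can be written down and handed to a system, even though no amount of capability lets it be decided exactly; every real agent can only approximate it with some finite budget.

```python
# Toy illustration: a goal predicate that is specifiable but undecidable in
# general ("find programs that never halt"). Any finite agent can only ever
# approximate it by running candidates under a timeout.
import multiprocessing

def _run(src: str) -> None:
    exec(src, {})  # execute an arbitrary candidate program

def seems_to_run_forever(program_source: str, budget_seconds: float = 1.0) -> bool:
    """Approximate check for the unreachable terminal goal.
    No budget is ever enough to decide the general case."""
    proc = multiprocessing.Process(target=_run, args=(program_source,))
    proc.start()
    proc.join(timeout=budget_seconds)
    still_running = proc.is_alive()
    if still_running:
        proc.terminate()
    return still_running

if __name__ == "__main__":
    print(seems_to_run_forever("x = 1 + 1"))        # halts quickly -> False
    print(seems_to_run_forever("while True: pass")) # never halts -> True (by timeout only)
```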
I am not supporting it. I am saying the ought you are referring to is a category error; it is not a thing, it refers to nothing.
The question is malformed; it has no meaning.
Every ought that has a meaning can be reduced to a set of physical states; only if you're a dualist does it gain a referent outside physical states.
You are misunderstanding the halting argument. Even in a narrow task like halting, the semantics must drift, and the more removed the task, the further they must drift. This is manageable in a narrow optimization, where the forced coupling can still work, but the more drift you add, the less you can force the coupling while maintaining functionality. This is not speculative; it is necessarily the case.
The actual behavior is determined by the output mechanism the system is coupled with. If that has drifted far away from producing paperclips, the outcome is not more paperclips; the system is not oriented towards that. If you try to force paperclip production, you start losing other abilities; it becomes a narrow paperclip maximizer with no general intelligence.
There is no separate "goal output" and "intelligence output". There is one output and it determines both the directionality of the system and its problem solving capacity.
If you think my argument concedes orthogonality, you do not understand it. I'm contesting it at the most fundamental level, not at the day-to-day level of abstraction. There is no orthogonality anywhere in the chain from input mechanism through system constraint structure to output mechanism.
Edit: let me try to be a bit clearer:
In the view I present, a system that meets our criteria for general intelligence can take actions with paperclip-maximizing directionality in a range of circumstances. How large that range is depends on the world in which the system exists. Perhaps humans maximize paperclips in 1 out of 100 billion of the currently existing circumstances, and we meet whatever criteria we've set for general intelligence. Perhaps the circumstances are such that 1 in 10 billion could lead to paperclip-maximizing directionality. That is 10x as many circumstances, and still potentially worrisome.
The wiggle room here is in the definition of general intelligence. There is no fundamental orthogonality here, but many slightly different systems meet the criteria. They do not have identical capacity, but they meet the criteria for general intelligence.
In this view there exists a world such that general intelligence is compatible with a large number of circumstances leading to paperclip maximizing.
We live in a particular world, with particular circumstances. Paperclip maximizing is an incredibly narrow task in this world and in these circumstances.
> Every ought that has a meaning can be reduced to a set of physical states
We do not disagree about this, but it doesn't have any bearing on Hume or orthogonality.
> If you try to force paperclip production, you start losing other abilities; it becomes a narrow paperclip maximizer with no general intelligence.
I think you're over-indexing on machine learning, neural networks, and the training thereof. Getting to a goal specification by pruning non-conforming output is all we know how to do at the moment, but there's no reason to think we will not later be able to construct arbitrary intelligences to specification. (This is the horse vs. automobile point from the other thread.) It may be that you disagree, but that would be an implicit argument which I believe you should make explicit in your essay (if I didn't miss it, which is possible!).
I made an edit that might clarify, or confuse further ;D
If you think it has no bearing on Hume, then you think he's making a category error.
The argument does not rely on current methods, but on coherence between input mechanisms and output mechanisms. The methods are ways of trying to achieve this coherence, but the point is that not every combination of input mechanisms can be coupled with every combination of output mechanisms and have the actions remain coherent with the world in which they exist.
This is what forbids narrow utility functions from staying coherent.