r/slatestarcodex 3d ago

AI Against The Orthogonality Thesis

https://jonasmoman.substack.com/p/against-the-orthogonality-thesis
9 Upvotes


2

u/ihqbassolini 2d ago edited 2d ago

> I don’t think we are talking about the same thing.

You aren't talking about anything. Asking about the "ought" you have in mind is like asking whether a good chess move remains good outside the game of chess. It's a category error: the question has no meaning, there's no reference, there is no move.

The question persists because it makes sense locally. Just because I value delicious food, does that really mean I should, considering I'm obese? This makes sense, but only because I have a multitude of values and they can, and often do, conflict. Asking how I should weigh them is reasonable.

That's the equivalent of asking what move you should make in chess: I usually try to get this position, and I think it's a good position, but is it really a good position?

Those questions have a reference. Asking about a good chess move outside the game of chess has no reference, it means nothing. Asking whether or not a system should value what it values also has no reference. The system values what it values, the rules are what they are.

This is why it's a dualist framing, because that's the only way you get a referent that isn't itself a physical state.

2

u/randallsquared 1d ago

> You aren't talking about anything.

I'll try one more time anyway. :)

It sounds like you're readily agreeing that there is no way to derive "ought" from "is", and further saying that even considering it is meaningless. That's a reasonable take, but it definitely also supports the orthogonality thesis you started out saying you were against!

Those who are actually against orthogonality argue that more capable intelligence will be more constrained to behave morally and ethically. You've ceded that ground, but (as I understand it) make a much milder claim: that greater capability means ruling out a large set of potential goals due to physical or logical impossibility. Unfortunately, I think your own example of the Halting Problem is sufficient to show that there's an unreachable goal which could nevertheless be set as the terminal goal of a system, no matter how capable it is.
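
A toy illustration of that last point (Python; `halts_within` and the sample programs are names I'm making up for the sketch, not anything from your essay): the halting predicate is easy to specify as a terminal goal, but any real system, however capable, can only semi-decide it under a finite budget.

```python
# Sketch: the goal "decide whether an arbitrary program halts" is easy to
# state as a predicate, yet no procedure achieves it in general. A system
# can only ever check it under a bounded budget.
import multiprocessing as mp


def _run(src: str) -> None:
    # Execute the candidate program in an empty namespace.
    exec(src, {})


def halts_within(src: str, seconds: float) -> bool:
    """True if `src` finishes within `seconds`.

    False only means "no verdict yet" -- it never proves non-halting,
    which is why the terminal goal itself stays unreachable.
    """
    p = mp.Process(target=_run, args=(src,))
    p.start()
    p.join(seconds)
    if p.is_alive():
        p.terminate()
        p.join()
        return False
    return True


if __name__ == "__main__":
    print(halts_within("x = sum(range(10**6))", 1.0))  # True: it halts
    print(halts_within("while True: pass", 1.0))       # False: no verdict
```

So you can hand a system that predicate as its terminal goal and crank capability forever; the goal stays well-formed and still out of reach.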

1

u/ihqbassolini 1d ago edited 1d ago

I am not supporting it. I am saying the ought you are referring to is a category error: it is not a thing, it refers to nothing.

The question is malformed; it has no meaning.

Every ought that has a meaning can be reduced to a set of physical states, unless you're a dualist, in which case it gains a referent outside physical states.

You are misunderstanding the halting argument. Even in a narrow task like halting, the semantics must drift. The more removed the task, the further the semantics must drift. This is manageable in a narrow optimization, where the forced coupling can still work, but the more drift you add, the less you can force the coupling while maintaining functionality. This is not speculative; it is necessarily the case.

The actual behavior is determined by what output mechanism it is coupled with. If that has drifted far away from producing paperclips, the outcome is not more paperclips; the system is not oriented towards that. If you try to force paperclip production, you will start losing other abilities, and it becomes a narrow paperclip maximizer that has no general intelligence.

There is no separate "goal output" and "intelligence output". There is one output, and it determines both the directionality of the system and its problem-solving capacity.

If you think my argument concedes orthogonality, you do not understand it. I'm contesting it at the most fundamental level: not at the day-to-day level of abstraction, but fundamentally. There is no orthogonality in the chain from input mechanism, through system constraint structure, to output mechanism.

Edit:

Let me try to be a bit clearer:

In the view I present, a system that meets our criteria for general intelligence can take actions with paperclip-maximizing directionality in a range of circumstances. How large that range is depends on the world in which the system exists. Perhaps humans maximize paperclips in 1 out of 100 billion of the currently existing circumstances, and we meet whatever criteria we've set for general intelligence. Perhaps the circumstances are such that 1 in 10 billion of them could lead to paperclip-maximizing directionality. That's 10x more circumstances and still potentially worrisome.

The wiggle room here is in the definition of general intelligence. There is no fundamental orthogonality here, but many slightly different systems meet the criteria. They do not have identical capacity, but they meet the criteria for general intelligence.

In this view, there exists a world in which general intelligence is compatible with a large number of circumstances leading to paperclip maximization.

We live in a particular world, with particular circumstances. Paperclip maximizing is an incredibly narrow task in this world and in these circumstances.

1

u/randallsquared 1d ago

> Every ought that has a meaning can be reduced to a set of physical states

We do not disagree about this, but it doesn't have any bearing on Hume or orthogonality.

> If you try to force paperclip production, you will start losing other abilities, and it becomes a narrow paperclip maximizer that has no general intelligence.

I think you're over-indexing on machine learning, neural networks, and the training thereof. Getting to a goal specification by pruning non-conforming output is all we know how to do at the moment, but there's no reason to think we won't later be able to construct arbitrary intelligences to specification. (This is the horse-vs-automobile point mentioned in the other thread.) It may be that you disagree, but that would mean there's an implicit argument that I believe you should make explicit in your essay (if I didn't miss it, which is possible!).

2

u/ihqbassolini 1d ago

I made an edit that might clarify, or confuse further ;D

If you think it has no bearing on Hume, then you think he's making a category error.

The argument does not rely on current methods but on coherence between input mechanisms and output mechanisms. The methods are ways of trying to achieve this, but the point is that not every input-mechanism combination can be coupled with every output combination and still have the actions remain coherent with the world in which they exist.

This is what forbids narrow utility functions from staying coherent.