Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

The Unexpected Clanging

Adversarial simulation by a very powerful entity is so obviously unsolvable that I'm caught off-guard every time it comes up.

Let Omega be an AI that can perfectly simulate your entire deliberation process ... such as to mess with you

Ok, the result is that you're messed with. Can we move on to more interesting questions?

Note that it does get a little more interesting (possibly not above 0, though), if you remove the anthropomorphization of Omega, either remove or model fully your time-bounded process (I think remove here, as there's no change during the experiment), and clearly specify whether you know that Omega uses that formula, or if you are predicting without any knowledge. I assume, but it's nice to specify, that you are indifferent to which direction you're wrong (that is, you don't care whether you guess 1 when it's 2 or guess 2 when it's 1, but you prefer to guess correctly over incorrectly).

I think, as the victim, I will recognize the paradox and give up - I have no clue what that reads as to Omega - either 0.5 or 1-epsilon or 0+epsilon, or "no answer". If Omega is taking my spoken answer, rather than my belief, and the payout utilities match the odds, I'll say whatever comes to mind, and take my coinflip.

I will recognize the paradox and give up

A better decision procedure is possible, or better diagnostics of issues that arise in particular procedures.

Can we move on to more interesting questions?

More specifically, obviously breaking normal operational conditions in various ways is useful in highlighting what's normally broken in much more subtle ways. Mind-reading Omegas that perfectly predict your future decisions are shadowed by human minds that weakly and noisily guess future decisions of others. If these aren't accounted for in decision theory, there is a systematic leak in the abstraction, which won't be visible directly, without either much better tools or such thought experiments. This effect is ubiquitous and can't be addressed merely by giving up in the instances where it's recognizably there. Normal procedure should get good enough to be able to cope.

Is a better decision procedure possible? I don't see it in this thought experiment. A pointer to such a thing would help a lot.

breaking normal operational conditions in various ways is useful

I agree in many cases, but breaking fundamental decision causality in adversarial ways is not among those cases.

Ahh, that's an interesting take - I don't agree AT ALL, which I generally expect means I'm misunderstanding something. In what way are there active agents with more knowledge of my relevant future decisions than I have, and with sufficient predictive ability that even randomization is not effective?

The given thought experiment is effectively either "Omega reads my mind, and finds I don't have a probability, because I recognize the paradox", or "Omega takes my statement, then sets the probability to half of what I say". I'm getting a coinflip, yay!

When other people have even the slightest sense of your future decisions or beliefs, but you don't account for that effect (when deciding or reasoning formally, rather than intuitively), then your decision procedure would be systematically wrong in that respect, built on invalid premises. It would only be slightly wrong to the extent that others' edge on knowing your future decisions is weak, but wrong nonetheless. So it's worth building theory that accounts for that effect. You might take away that predictive ability by "giving up" in various ways, but not when you're reasoning and deciding routinely, which is when the ability of others to predict your decisions and beliefs is still there to some extent.

I suspect our disagreement may be on whether this post (or most of the Omega-style examples) is a useful extension of that. Comparing "very good prediction that you know about but can't react to, because you're a much weaker predictor than Omega" with "weak prediction that you don't know about, but could counter-model if you bothered" seems like a largely different scenario to me. To the point that I don't think you can learn much about one from the other.

"weak prediction that you don't know about, but could counter-model if you bothered"

That's why the distinction between reasoning intuitively and reasoning formally. If it's an explicit premise *of the theory that built the procedure* that this doesn't happen, the formal procedure won't be allowing you to "counter-model if you bothered". An intuitive workaround doesn't fix the issue *in the theory*.

I think it’s often valuable to provide a short post describing a phenomenon clearly so that you can then reference it in future posts without going on a massive detour.

Unfortunately, getting onto more interesting matters sometimes requires a bunch of setup first. I could skip the setup, but then everyone would end up confused.

There is incentive for hidden expectation/cognition that Omega isn't diagonalizing (things like creating new separate agents in the environment). Also, at least you can know how ground truth depends on official "expectation" of ground truth. Truth of knowledge of this dependence wasn't diagonalized away, so there is opportunity for control.

Omega can simulate *me* perfectly by assumption, therefore, in order to win (at least probabilistically) I need to base my estimate on something external that I have reason to believe Omega cannot simulate. One approach (for example) might be to bring a quantum random number generator (given our current best models of physics such a device cannot be predicted in advance by any simulation), and use that to introduce randomness into my decision making.

You could choose to make a decision that is, in relevant aspects, equivalent to simulating Omega. This subjects Omega to the Halting Problem. If you make the Halting Problem irrelevant by limiting time, you've also limited Omega's ability to perfectly simulate you, contradicting the conditions of the problem.

Yes, if Omega accurately simulates me and wants me to be wrong, Omega wins. But why do I need to get the answer *exactly* "right"? What does it matter if I'm slightly off?

This would be a (very slightly) more interesting problem if Omega was offering a bet or a reward and my goal was to maximize reward or utility or whatever. It sure looks like for this setup, combined with a non-adversarial reward schedule, I can get arbitrarily close to maximizing the reward.

Perhaps it would be fruitful to consider the two participants, who I'll call Alpha and Omega, as finite computer programs, for which Omega has access to Alpha's source code. Maybe Alpha also has access to Omega's source code. Each of them chooses a number (from the natural numbers, the reals, [0,1], {0,1}, or whatever). It is common knowledge between them that Omega's goal is to choose differently from Alpha, and Alpha's goal is to choose the same as Omega.

Given various constraints on the computational or proof-theoretic capabilities of Alpha and Omega, under what circumstances does either player have a winning strategy?

If they each have access to a source of randomness, the game could be generalised to Omega trying to maximise the probability that they differ, and Alpha trying to minimise that probability.
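As a rough illustration of the randomised version, here is a minimal sketch under assumptions that are not in the comment above: both players choose from {0,1}, Omega can read Alpha's mixing probability from the source code but cannot see the random bits Alpha will actually draw, and Omega best-responds. The names `omega_best_response`, `match_probability` and `alpha_value` are hypothetical.

```python
from fractions import Fraction

def omega_best_response(a: Fraction) -> Fraction:
    """Omega's probability of choosing 1, given that Alpha chooses 1 with probability a.

    Assumption: Omega knows a (from Alpha's source) but not Alpha's random bits.
    Omega simply picks whichever number Alpha is less likely to pick.
    """
    return Fraction(0) if a > Fraction(1, 2) else Fraction(1)

def match_probability(a: Fraction, b: Fraction) -> Fraction:
    """Probability that both choose the same number when they randomise independently."""
    return a * b + (1 - a) * (1 - b)

def alpha_value(a: Fraction) -> Fraction:
    """Alpha's chance of matching when Omega best-responds to a."""
    return match_probability(a, omega_best_response(a))

# A deterministic Alpha (a = 0 or a = 1) never matches; a fair coin matches half the time.
grid = [Fraction(k, 100) for k in range(101)]
best_a = max(grid, key=alpha_value)
```

Under these assumptions Alpha's matching probability works out to min(a, 1-a), so a deterministic Alpha always loses and the best Alpha can do is a fair coin, which Omega cannot push below one half.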

Interesting. This prank seems to be one you could play on a Logical Inductor, I wonder what the outcome would be? One fact that's possibly related is that computable functions are continuous. This would imply that whatever computable function Omega applies to your probability estimate, there exists a fixed point probability you can choose where you'll be correct about the monkey probability. Of course if you're a bounded agent thinking for a finite amount of time, you might as well be outputting rational probability estimates, in which case functions like become computable for Omega.
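The fixed-point remark can be sketched numerically. This is only an illustration, assuming (as the comment does) that Omega's function is continuous and maps [0,1] to [0,1]. The helper `fixed_point` and the example functions are hypothetical, and bisection is used here simply as one convenient way to locate a fixed point.

```python
from typing import Callable

def fixed_point(f: Callable[[float], float], tol: float = 1e-12) -> float:
    """Bisection on g(p) = f(p) - p over [0, 1].

    For continuous f with values in [0, 1], g(0) >= 0 and g(1) <= 0,
    so the intermediate value theorem guarantees a point where f(p) == p.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid   # the sign change lies at or to the right of mid
        else:
            hi = mid   # the sign change lies to the left of mid
    return (lo + hi) / 2

def continuous_rule(p: float) -> float:
    """A hypothetical continuous rule an Omega might apply."""
    return (1 - p) ** 2

def post_rule(p: float) -> float:
    """The rule from the post: q = p/2, except q = 1 at p = 0 (discontinuous)."""
    return 1.0 if p == 0 else p / 2

p_star = fixed_point(continuous_rule)  # a self-consistent estimate exists here
r = fixed_point(post_rule)             # bisection collapses toward 0, where the jump is
```

For the continuous rule the search lands on a probability that Omega's function maps to itself. For the rule in the post it only closes in on the discontinuity at 0: the returned `r` is tiny but still twice `post_rule(r)`, and at exactly 0 the rule jumps to 1.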

Suppose that I decide that my opinion on the location of the monkey will be left or right dependent on one bit of quantum randomness, which I will sample sufficiently close to the deadline that my doing so is outside Omega's backward lightcone at the time of the deadline, say a few tens of nanoseconds before the deadline if Omega is at least a few tens of feet away from me and the two boxes? By the (currently believed to be correct) laws of quantum mechanics, qubits cannot be cloned, and by locality, useful information cannot propagate faster than light, so unless Omega is capable of breaking very basic principles of (currently hypothesized) physical laws – say, by having access to faster-than-light travel or a functioning time loop not enclosed by an event horizon, or by having root access to a vast quantum-mechanics simulator that our entire universe is in fact running on – then it physically cannot predict this opinion. Obviously we have some remaining Knightian uncertainty as to whether the true laws of physics (as opposed to our current best guess of them) allow either of these things or our universe is in fact a vast quantum simulation, but it's quite possible that the answer to the physics question is in fact 'No', as all current evidence suggests, in which case no matter how much classical or quantum computational power Omega throws at the problem there are random processes that it simply cannot reliably predict the outcome of.

[Also note that there is some actual observable evidence on the subject of the true laws of physics in this regard: the Fermi paradox, of why no aliens colonized Earth geological ages ago, gets even harder to explain if our universe's physical laws allow those aliens access to FTL and/or time loops.]

Classically, any computation can be simulated given its initial state and enough computational resources. In quantum information theory, that's also true, but a very fundamental law, the no-cloning theorem, implies that the available initial state information has to be classical rather than quantum, which means that the random results of quantum measurements in the real system and any simulation are not correlated. So quantum mechanics means that we *do* have access to real randomness that no external attacker can predict, regardless of their computational resources. Both quantum mechanical coherence and information not being able to travel faster than light-speed also provide ways for us to keep a secret so that it's physically impossible for it to leak for a short time.

So as long as Omega is causal (rather than being acausal or the sysop of our simulated universe) and we're not badly mistaken about the fundamental nature of physical laws, there are things that it's actually physically impossible for Omega to do, and beating the approach I outlined above is one of them. (As opposed to, say, using nanotech to sabotage my quantum-noise generator, or indeed to sabotage me, which *are* physically possible.)

So designing ideal decision theories for the correct way to act in a classical universe in the presence of other agents with large computational resources able to predict you perfectly doesn't seem very useful to me. We live in a quantum universe, initial state information will never be perfect, agents are highly non-linear systems, so quantum fluctuations getting blown up to classical scales by non-linear effects will soon cause a predictive model to fail after a few coherence times followed by a sufficient number of Lyapunov times. It's quite easy to build a system whose coherence and Lyapunov times are deliberately made short so that it's impossible to predict over quite short periods, if it wants to be (for example, continuously feed the output from a quantum random noise generator into perturbing the seed of a high-cryptographic-strength pseudo-random-number generator run on well-shielded hardware, ideally quantum hardware).

Of course, in a non-linear system, it's still possible to predict the climate far further ahead than you can predict the weather: if Omega has sufficiently vast quantum computational resources, it can run many full-quantum simulations of the entire system of every fundamental particle in me and my environment as far as I can see (well, apart from incoming photons of starlight, whose initial state it doesn't have access to), and extract statistics from this ensemble of simulations. But that doesn't let Omega predict if I'll actually guess left vs right, just determine that it's 50:50. Also (unless physical law is a great deal weirder than we believe), Omega is not going to be able to run these simulations as fast as real physics can happen. Humans are 70% warm water, which contains a vast amount of quantum thermal entropy being shuffled extremely fast, some of it moving at light-speed as infra-red photons, and human metabolism is strongly and non-linearly coupled to this vast quantum-random-number-generator via diffusion and Brownian motion, so because of light-speed limits, the quantum processing units that the simulation was run on would need to be smaller than individual water molecules to be able to run the simulation in real-time. [It just might be possible to build something like that in the outer crust of a neutron star if there's some sufficiently interesting nucleonic chemistry under some combination of pressure and magnetic field strength there, but if so Omega is a *long* way away, and has many years of light-speed delay on anything they do around here.]

What Omega can do is run approximate simulations of some simplified heuristic of how I work, if one exists. If my brain was a digital computer, this might be very predictive. But a great deal of careful engineering has gone into making digital computers behave (under their operating conditions) in a way that's extremely reliably predictable by a specific simplified heuristic that doesn't require a full atomic-scale simulation. Typical physical or biological systems just don't have this property. Engineering something to ensure that it definitely *doesn't* have this property is easy, and in any environment containing agents with more computation resources than you, seems like a very obvious precaution.

So, an agent *can* easily arrange to act unpredictably, by acting randomly based on a suitably engineered randomness source rather than optimizing. Doing so makes its behavior depend on the unclonable quantum details of its initial state, so the probabilities can be predicted but the outcome cannot. In practice, even though they haven't been engineered for it, humans probably also have this property over some sufficiently long timescale (seconds, minutes or hours, perhaps), when they're not attempting to optimize the outcome of their actions.

[Admittedly, humans leak a vast amount of quantum information about their internal state in the form of things like infra-red photons emitted from their skin, but attempting to interpret those to try to keep a vast full-quantum simulation of every electron and nucleus inside a human (and everything in their environment) on-track would clearly require running at least that much quantum calculation (almost certainly many copies of it) in at least real-time, which again due to light-speed limits would require quantum computing elements smaller than water-molecule size. So again, it's not just way outside current technological feasibility, it's actually physically impossible, with the conceivable exception of inside the crust of a neutron star.]

As a more general version of this opinion, while we may have to worry about Omegas whose technology is far beyond ours, as long as they live in the same universe as us, there are some basic features of physical law that we're pretty sure are correct and would thus also apply even to Omegas. If we had managed to solve the alignment problem contingent on basic physical assumptions like information not propagating faster than the speed of light, time loops being impossible outside event horizons, and quanta being unclonable, then personally (as a physicist) I wouldn't be too concerned. Your guess about the Singularity may vary.

There are two boxes in front of you. In one of them, there is a little monkey with a cymbal, whilst the other box is empty. In precisely one hour the monkey will clang its cymbal.

While you wait, you produce an estimate of the probability of the monkey being in the first box. Let's assume that you form your last estimate, p, three seconds before the monkey clangs its cymbal. You can see the countdown and you know that it's your final estimate, partly because you're slow at arithmetic.

Let Omega be an AI that can perfectly simulate your entire deliberation process. Before you entered the room, Omega predicted what your last probability estimate would be and decided to place the monkey in a box such as to mess with you. Let q be the probability of Omega placing the monkey in the first box. In particular, Omega sets q=p/2, unless p=0 or you haven't formed a probability estimate, in which case q=1.

What probability should you expect that the monkey is in the first box?

I think it's fairly clear that this is a no-win situation. No matter what the final probability estimate you form before clanging, as soon as you've locked it in, you know that it is incorrect, even if you haven't heard the clanging yet. You can try to escape this, but there's no reason that the universe has to play nice.
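To make the no-win structure concrete, here is a minimal sketch that encodes the rule from the setup (q=p/2, with q=1 when p=0 or no estimate was formed) and checks a grid of candidate estimates for self-consistency. The names `omega_q` and `gap` are illustrative only, and exact fractions are used so that equality checks are not blurred by floating-point rounding.

```python
from fractions import Fraction
from typing import Optional

def omega_q(p: Optional[Fraction]) -> Fraction:
    """Omega's rule from the setup: q = p/2, except q = 1 if p == 0 or no estimate exists."""
    if p is None or p == 0:
        return Fraction(1)
    return p / 2

def gap(p: Optional[Fraction]) -> Fraction:
    """How far a final estimate p is from the true probability q (a missing estimate counts as 0)."""
    estimate = Fraction(0) if p is None else p
    return abs(estimate - omega_q(p))

# Check every estimate on a grid of thousandths for self-consistency (p == q).
grid = [Fraction(k, 1000) for k in range(1001)]
fixed_points = [p for p in grid if omega_q(p) == p]
smallest_gap = min(gap(p) for p in grid)
```

No point on the grid is self-consistent, and the algebra says the same for every p: solving p=p/2 forces p=0, but at p=0 Omega sets q=1. For any p>0 the estimate is exactly twice the true probability, so the absolute error p/2 can be made small but never zero.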

This problem can be seen as a variation on Death in Damascus. I designed this problem to reveal that the core challenge Death in Damascus poses isn't just that another process in the world can depend upon your decision, but that it can depend upon your expectations even if you don't actually make a decision based upon those expectations.

I also find this problem a useful intuition pump, as I think it's clearer that it's a no-win situation than in other similar problems. In Newcomb's problem, it's easy to get caught up thinking about the Principle of Dominance. In Death in Damascus, you can confuse yourself trying to figure out whether CDT recommends staying or fleeing. At least to me, in this problem it is clearer that it is a dead end and that there's no way to beat Omega.

This is also a useful intuition pump for the Evil Genie Puzzle. When I first discovered this puzzle, I felt immensely confused that no matter which decision you made, you would immediately regret it. However, the complexity of the puzzle made it hard for me to figure out exactly what to make of it, so when trying to solve it I came up with this problem as something easier to grok. I guess my position after considering the Unexpected Clanging is that you just have to accept that a sufficiently powerful agent may be able to mess with you like this and that you just have to deal with it. (I'll leave a more complete analysis to a future post).