johnswentworth

There's an asymmetry between local differences from the true model which can't match the true distribution (typically too few edges) and differences which can (typically too many edges). The former get about O(n) bits against them per local difference from the true model, the latter about O(log(n)), as the number of data points n grows.

Conceptually, the story for the log(n) scaling is: with n data points, we can typically estimate each parameter to ~log(n) bit precision. So, an extra parameter costs ~log(n) bits.

Just that does usually work pretty well for (at least a rough estimate of) the undirected graph structure, but then you don't know the directions of any arrows.

I've tried this before experimentally - i.e. code up a gaussian distribution with a graph structure, then check how well different graph structures compress the distribution. Modulo equivalent graph structures (e.g. A -> B -> C vs A <- B <- C vs A <- B -> C), the true structure is pretty consistently favored.

I'm not sure what motivation for worst-case reasoning you're thinking about here. Maybe just that there are many disjunctive ways things can go wrong other than bad capability evals and the AI will optimize against us?

This getting very meta, but I think my Real Answer is that there's an analogue of You Are Not Measuring What You Think You Are Measuring for plans. Like, the system just does not work any of the ways we're picturing it at all, so plans will just generally not at all do what we imagine they're going to do.

(Of course the plan could still in-principle have a high chance of "working", depending on the problem, insofar as the goal turns out to be easy to achieve, i.e. most plans work by default. But even in that case, the planner doesn't have counterfactual impact; just picking some random plan would have been about as likely to work.)

The general solution which You Are Not Measuring What You Think You Are Measuring suggested was "measure tons of stuff", so that hopefully you can figure out what you're actually measuring. The analogy of that technique for plans would be: plan for tons of different scenarios, failure modes, and/or goals. Find plans (or subplans) which generalize to tons of different cases, and there might be some hope that it generalizes to the real world. The plan can maybe be robust enough to work *even though* the system does not work at all the ways we imagine.

But if the plan doesn't even generalize to all the low-but-not-astronomically-low-probability possibilities we've thought of, then, man, it sure does seem a lot less likely to generalize to the real system. Like, that pretty strongly suggests that the plan will work only insofar as the system operates basically the way we imagined.

And for this exact failure mode, I think that improvements upon various relatively straight forward capability evals are likely to quite compelling as the most leveraged current interventions, but I'm not confident.

Personally, my take on basically-all capabilities evals which at all resemble the evals developed to date is You Are Not Measuring What You Think You Are Measuring; I expect them to mostly just not measure whatever turns out to matter in practice.

This answer clears the bar for at least some prize money to be paid out, though the amount will depend on how far other answers go by the deadline.

One thing which would make it stronger would be to provide a human-interpretable function for each equivalence class (so Alice can achieve the channel capacity by choosing among those functions).

The suggestions for variants of the problem are good suggestions, and good solutions to those variants would probably also qualify for prize money.

Yes, there is a story for a canonical factorization of , it's just separate from the story in this post.

Sounds like we need to unpack what "viewing as a latent which generates " is supposed to mean.

I start with a distribution . Let's say is a bunch of rolls of a biased die, of unknown bias. But I don't know that's what is; I just have the joint distribution of all these die-rolls. What I want to do is look at that distribution and somehow "recover" the underlying latent variable (bias of the die) and factorization, i.e. notice that I can write the distribution as , where is the bias in this case. Then when reasoning/updating, we can usually just think about how an individual die-roll interacts with , rather than all the other rolls, which is useful insofar as is much smaller than all the rolls.

Note that is not supposed to match ; then the representation would be useless. It's the marginal which is supposed to match .

The lightcone theorem lets us do something similar. Rather all the 's being independent given , only those 's sufficiently far apart are independent, but the concept is otherwise similar. We express as (or, really, , where summarizes info in relevant to , which is hopefully much smaller than all of ).

So I saw the Taxonomy Of What Magic Is Doing In Fantasy Books and Eliezer’s commentary on ASC's latest linkpost, and I have cached thoughts on the matter.

My cached thoughts start with a somewhat different question - not "what role does magic play in fantasy fiction?" (e.g. what fantasies does it fulfill), but rather... insofar as magic is a natural category, what does it denote? So I'm less interested in the relatively-expansive notion of "magic" sometimes seen in fiction (which includes e.g. alternate physics), and more interested in the pattern called "magic" which recurs among tons of real-world ancient cultures.

Claim (weakly held): the main natural category here is

symbols changing the territory. Normally symbols represent the world, and changing the symbols just makes them not match the world anymore - it doesn't make the world do something different. But if the symbols are "magic", then changing the symbols changes the things they represent in the world. Canonical examples:I would guess that most historical "magic" was of this type.