magfrump

Mathematician turned software engineer. I like swords and book clubs.

Introduction To The Infra-Bayesianism Sequence

I think I see what I was confused about: there is a specific countable family of properties, and these properties are discrete, so you aren't worried about **locally** distinguishing between hypotheses.

Introduction To The Infra-Bayesianism Sequence

I am confused about how the mechanisms and desiderata you lay out here can give meaningful differences of prediction over complete spaces of environments. Maybe it is possible to address this problem separately.

In particular, imagine the following environments:

E1: the outcome is deterministically 0 at even time steps and 1 at odd time steps.

E2: the outcome is deterministically 0 at even time steps and 1 at odd time steps up to step 100, then is drawn randomly based on some uncomputable process.

E3: the outcome is drawn deterministically based on the action taken, in a way which happens to give 0 for the first 100 even-step actions and 1 for the odd-step actions.

All of these deterministically predict all of the first 200 observations with probability 1. I have an intuition that if you get that set of 200 observations, you should favor E1, but I don't see how your update rule makes that possible without some prior measure over environments or some notion of Occam's Razor.
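The three environments can be sketched as follows. This is a minimal illustration, not code from the post: E2's uncomputable process and E3's action-dependent rule are stand-ins I've invented, since neither can actually be implemented.

```python
# Sketch of the three environments. E2's uncomputable draw and E3's
# action-dependence are placeholders -- the point is only that all three
# agree, with probability 1, on the shared deterministic prefix.
import random

def E1(t, action=None):
    return t % 2  # 0 on even steps, 1 on odd steps, forever

def E2(t, action=None):
    if t <= 100:
        return t % 2
    return random.randint(0, 1)  # placeholder for the uncomputable process

def E3(t, action=None):
    # Deterministic in the action taken; here a placeholder rule that
    # happens to agree with E1 on the observed prefix.
    return t % 2

# On the shared prefix all three assign probability 1 to the same
# observations, so a pure likelihood update cannot separate them.
prefix = [(E1(t), E2(t), E3(t)) for t in range(101)]
assert all(a == b == c for a, b, c in prefix)
```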

In the examples you give, there are systematic differences between the environments, but it isn't clear to me how the update is handled "locally" for environments that give the same predictions for all observed actions but diverge in the future, which seems sticky to me in practice.

Radical Probabilism

So one of the first thoughts I had when reading this was whether you can model any Radical Probabilist as a Bayesian agent that has some probability mass on "my assumptions are wrong," and whose probability mass there increases so that it questions its assumptions over a "reasonable timeframe," for some definition of reasonable.

For the case of coin flips, there is a clear assumption in the naive model: the coin flips are independent of each other, which can be fairly simply expressed as $P(flip_i = H \mid flip_{j} = H) = P(flip_i = H \mid flip_{j} = T) \; \forall j < i$. In the case of the coin that flips 1 heads, 5 tails, 25 heads, 125 tails, just evaluating $j = i - 1$ through the 31st flip gives P(H | last flip heads) = 24/25 and P(H | last flip tails) = 1/5. Under independence that is unlikely, at p ≈ 1e-4, which is approximately the difference in Bayesian weight between the hypothesis H1, "the coin flips heads 26/31 of the time" (P(E|H1) ≈ 1e-6), and H0, "the coin flips heads unpredictably, 1/2 the time" (P(E|H0) ≈ 4e-10); H0 is the better hypothesis in the long run, until you expand your hypothesis space.
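The arithmetic above can be checked directly. This is just a verification sketch of the numbers in my comment, using the first 31 flips of the 1-heads, 5-tails, 25-heads sequence:

```python
# Check the conditional frequencies and the two likelihoods for the
# first 31 flips: 1 head, 5 tails, 25 heads.
flips = "H" + "T" * 5 + "H" * 25  # 31 flips total
heads = flips.count("H")          # 26 heads, 5 tails

# Conditional frequencies for j = i - 1:
after_h = [flips[i] for i in range(1, 31) if flips[i - 1] == "H"]
after_t = [flips[i] for i in range(1, 31) if flips[i - 1] == "T"]
p_h_given_h = after_h.count("H") / len(after_h)  # 24/25
p_h_given_t = after_t.count("H") / len(after_t)  # 1/5

# Likelihood of the data under each hypothesis:
p_e_h1 = (26 / 31) ** 26 * (5 / 31) ** 5  # H1: heads 26/31 of the time, ~1e-6
p_e_h0 = 0.5 ** 31                        # H0: fair coin, ~4e-10
```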

So in this case, the "I don't have the hypothesis in my space" hypothesis actually wins out right around the 30th-32nd flip, possibly about the same time a human would be identifying the alternate hypothesis. That seems helpful!

However, this relies on the fact that this specific hypothesis has a single very clear assumption, and there is a single very clear calculation that can be done to test that assumption. Even in this case, though, the "independence of all coin flips" assumption makes a bunch more predictions, like that coin flips two apart are independent, etc. Calculating all of these may be theoretically possible, but it's arduous in practice and would give rise to far too much false evidence. For example, in real life there are often distributions that look a lot like normal distributions in the general sense that over half the data is within one standard deviation of the mean and 90% of the data is within two standard deviations, but if you apply an actual hypothesis test of whether the data is normally distributed, it will point out some ways in which it isn't exactly normal (only 62% of the data is in this region, not 68%! etc.).
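As an illustration of that last point (my own example, not from the post): a standard logistic distribution passes the eyeball "roughly 68% within one standard deviation, roughly 95% within two" check, yet its tail fractions differ enough that an exact normality test would flag it.

```python
# A bell-shaped distribution that looks normal by the rule of thumb
# but isn't: the standard logistic, sampled via the inverse CDF.
import math
import random

random.seed(0)
data = [math.log(u / (1 - u)) for u in (random.random() for _ in range(100_000))]

mean = sum(data) / len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))

within_1sd = sum(abs(x - mean) <= sd for x in data) / len(data)
within_2sd = sum(abs(x - mean) <= 2 * sd for x in data) / len(data)

print(within_1sd)  # ~0.72 -- close to, but not, the normal's 0.68
print(within_2sd)  # ~0.95 -- nearly indistinguishable from the normal's 0.954
```

So the one-sigma fraction is off by about four percentage points: invisible to casual inspection, but exactly the kind of deviation a formal test will report as evidence against the model.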

It seems like the idea of having a specific hypothesis in your space labeled "I don't have the right hypothesis in my space" can work okay under the conditions

1. You have a clearly stated assumption which defines your current hypothesis space

2. You have a clear statistical test which shows when data doesn't match your hypothesis space

3. You know how much data needs to be present for that test to be valid--both a minimum, so the test can distinguish itself and you don't follow conspiracy theories, and something like a maximum (maybe this will naturally emerge from tracking the probability of the data given the null hypothesis, maybe not).

I have no idea whether these conditions are reasonable "in practice," whatever that means, so I'm not really clear on whether this framework is useful. But it's what I thought of, and I want to share even negative results in case other people had the same thoughts.

Price Gouging and Speculative Costs

So is there a way to run a charitable organization that accomplishes this?

For example, if an org started ordering respirators to be built to expand capacity starting in January, would that be a good fit for Open Phil funding?

Would LW be able to convince people to move money to it at that point?

What will the economic effects of COVID-19 be?

One piece of this is how many businesses will go out of business.

My cousin, who owns a small business, suggested that 50-75% of small businesses might go under after a couple of months of being unable to earn money. I don't know enough specifics to dissect exactly what that means, but just to name some more specific questions that probably have answers:

How much operating runway do most small businesses have?

How much of that determination is based on rent, and how likely is it that they will have rent payments suspended for a month or more?

How much of that is from payroll, and how much lower will payroll costs be? Presumably much lower for businesses that are not operating at all.

Will employees return to work without difficulty in general?

What will be provided in terms of government assistance?

If a large number of small businesses go under, this will have significant downstream effects on the economy.

Will business lot rents go down?

Where will business owners transition for work?

Will some job markets be flooded? Which ones? How much will this vary by area?

How to Identify an Immoral Maze

I don't like the specific description of levels of hierarchy for reasons I'm not quite certain about. This is at least partly just the phrasing, not the deeper point.

One piece of this is that, as is mentioned in some other comments, not every level of management has the same size. For example, if I regularly speak with my manager's manager and they have both explicit awareness of the object level work I do and share some responsibility for the outcome of that work, are they really two levels above me? What if I only talk to them once a month? Once a quarter? There may be some calculus that one can use to compute whether these count as 2 levels or 1.27 levels, but it seems like it has a strong interaction with the other criteria, such as how much slack I'm given, whether excellence is measurable, and who has skin in the game.

Maybe this is an optimistic take based on good luck with the team I currently work on, but my expectation is that usually there will be more levels of hierarchy than there are levels of non-interaction; that is, I expect most actual ranked titles to correspond to 1/2 or less of a level of hierarchy, which makes it a bit difficult to measure the depth of an organization.

Maybe this is just pessimistic of me, though, and it's actually easy to find an up-to-date, readable org chart when joining an organization? That doesn't match my experience.

Anyway, I don't really feel satisfied that I've found my true objection, but maybe this will help someone else, or future me, identify something.

What is Life in an Immoral Maze?

If your value at a company is 90% determined by your relationships, then moving to a new company in most cases means giving up 90% of your value to the company, since you will have no relationships at the new company outside of a coordinated move.

If you aren't in middle management, you potentially have other ways of demonstrating value, so you are better placed and don't have to sacrifice as much career progress.

A LessWrong Crypto Autopsy

This post distinguishes between the LW community's success at identifying crypto and its relative failure at acting on crypto, in a way that reminds me how important it is to actually act on information instead of just processing it mentally.

I think this failure mode of understanding a problem but failing to act on that understanding is a very common one for me, and I would expect for other readers. Both emphasizing that this is a part of the problem to be solved, and illustrating specific benefits of solving it in a historical context where you can actually assign monetary value to the outcomes, are a great way to emphasize the specific value involved in rationality.

Also, the discussion quickly converges on a relatively cheap solution: writing up tutorial-style documentation for processes like this that you've found to be high value. That kind of intro tutorial is one of the most valuable things to read for exactly this reason: it can close that understanding-to-action gap. I would love to read more articles inspired by this notion that there are plots of value ready to be grasped.

Jacob's Twit, errr, Shortform

I certainly have the moral instinct to.

I don't have a lot of experience with people within my friend group hooking up, or much experience hearing about hookups in enough detail to have explicitly been put in that situation.

I have had several personal experiences where I reciprocated advances from women and was later hit by the fallout of the lack of explicit verbal negotiation of what was going to happen. And I certainly reprimand friends (including women) for failing to communicate in their relationships at a broader level, when I do know about it.

I mean distinguishing between hypotheses that give very similar predictions--like the difference between a coin coming up heads 50% vs. 51% of the time.

As I said in my other comment, I think the assumption that you have discrete hypotheses is what I was missing.

Though for any countable set of hypotheses, you can expand that set by prepending some finite number of deterministic outcomes for the first several actions. The limit of this expansion is still countable, and the set of hypotheses that assign probability 1 to your observations is the same at every time step. I'm confused in this case about (1) whether or not this set of hypotheses is discrete and (2) whether hypotheses with shorter deterministic prefixes assign enough probability to allow meaningful inference in this case anyway.
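One way to make the prefix expansion concrete (a sketch with names I've invented, treating a hypothesis as a function from an observation history to the probability that the next observation is 1):

```python
# Given a hypothesis and a finite deterministic prefix, build an expanded
# hypothesis that predicts the prefix with probability 1 and then defers
# to the original hypothesis on the remaining history.
def with_prefix(hypothesis, prefix):
    def expanded(history):
        t = len(history)
        if t < len(prefix):
            return 1.0 if prefix[t] == 1 else 0.0
        return hypothesis(history[len(prefix):])
    return expanded
```

As long as the observed sequence matches the prefix, every such expansion of a hypothesis assigns probability 1 to the observations, which is what makes me unsure whether the expanded set stays discrete in the relevant sense.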

I may mostly be confused about more basic statistical inference things that don't have to do with this setting.