aphyer

I am Andrew Hyer, currently living in New Jersey and working in New York (in the finance industry).

That would be a convenient resolution to the Mugging, but seems unlikely to in fact be true? By the time you get up to numbers around $1 million, the probability of you being paid is very low, but most of it is in situations like 'Elon Musk is playing a prank on me,' and in many of these situations you could also get paid $2 million.

It seems likely that 'probability of payment given offer of $2 million' is substantially more than half of 'probability of payment given offer of $1 million'.
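To spell out why that ratio matters, here is a short sketch in notation of my own: $p(X)$ stands for 'probability of payment given an offer of $X$', and $E(X)$ for the expected payout. Neither symbol appears in the original discussion.

$$E(X) = p(X)\,X, \qquad \frac{E(2X)}{E(X)} = \frac{2\,p(2X)}{p(X)} > 1 \iff p(2X) > \tfrac{1}{2}\,p(X).$$

So if the probability of payment falls off more slowly than $1/X$, doubling the offer still raises the expected payout, and the Mugging is not resolved.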

I liked this one a lot. I imagined that 'train a linear classifier' would be the next step, but didn't do it due to laziness: it looks like that would probably have worked.

I do feel like my approach should have worked worse than it did - I did most of my evaluation by ignoring the scores and looking only at your historical classifications, and the one place where I let scores guide me into overriding my initial model (moving Student P from Humblescrumble to Serpentyne) it turned out I was incorrect and the initial model would have scored better (oops).

I think there are several similar such markets - the one I was looking at was at https://manifold.markets/Gabrielle/will-russia-use-chemical-or-biologi-e790d5158f6a and lacks such a comment.

EDITED: Ah, you are correct and I am wrong, the text you posted is present, it's just down in the comments section rather than under the question itself. That does make this question less bad, though it's still a bit weird that the question had to wait for someone to ask the creator that (and, again, the ambiguity remains).

I'll update the doc with links to reduce confusion - did not do that originally out of a mix of not wanting to point too aggressively at people who wrote those questions and feeling lazy.

Current model of how your mistakes work:

Your mistakes have always taken the form of giving random answers to a random set of students. You did not e.g. get worse at solving difficult problems earlier, and then gradually lose the ability to solve easy problems as well.

The probability of you giving a random answer began at 10% in 1511. (You did not allocate perfectly even then). Starting in 1700, it began to increase linearly, until it reached 100% in 2000.

This logic is based on: student 37 strongly suggesting that you can make classification mistakes early, and even in obvious cases; and looking at '% of INT<10 students in Thought-Talon' and '% of COU<10 students in Dragonslayer' as relatively unambiguous mistakes we can track the frequency of.
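Written out as a function, that model looks roughly like the sketch below. The function name is made up, and the flat 10% between 1511 and 1700 is my reading of 'began at 10%' followed by 'starting in 1700, it began to increase'.

```python
def random_answer_probability(year: int) -> float:
    """Guessed probability that a student sorted in `year` got a random house."""
    base = 0.10          # error rate from 1511 up to 1700 (assumed flat)
    ramp_start = 1700    # year the decline begins
    ramp_end = 2000      # year answers become fully random

    if year <= ramp_start:
        return base
    if year >= ramp_end:
        return 1.0
    # Linear interpolation from 10% in 1700 to 100% in 2000.
    fraction = (year - ramp_start) / (ramp_end - ramp_start)
    return base + (1.0 - base) * fraction
```

Under this sketch, a student sorted in 1850 sits halfway up the ramp.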

No objection to you commenting. The main risk on my end is that my fundamental contrariness will lead me to disagree with you wherever possible, so if you do end up being right about everything you can lure me into being wrong just to disagree with you.

P is a very odd statblock, with huge Patience and incredibly low Courage and Integrity. (P-eter Pettigrew?) I might trust your models more than my approach on students like B, who have middle-of-the-road stats but happen to be sitting near a house boundary. I'm less sure how much I trust your models on extreme cases like P, and think there might be more benefit there to an approach that just looks at a dozen or so students with similar statblocks rather than trying to extrapolate a model out to those far values.

Based on poking at the score figures, I think I'm currently going to move student P from Humblescrumble to Serpentyne but not touch the other ambiguous ones:

Thought-Talon: A, J, O, S

Serpentyne: C, F, P

Dragonslayer: D, G, H, K, N, Q

Humblescrumble: B, E, I, L, M, R, T

Robustness analysis: seeing how the above changes when we tweak various aspects of the algorithm.

- Requiring Ofstev Rating at least 20 (fewer samples, less likely mis-sorted, might be some bias introduced if e.g. some houses have higher variance than others):
  - B shifts from Humblescrumble to Thought-Talon.
  - I shifts from Humblescrumble to Serpentyne.
  - K shifts from Dragonslayer to Serpentyne.
  - P shifts from Humblescrumble to Serpentyne.

- Changing threshold year to 1800 (closer samples, more of them mis-sorted):
  - F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
  - K shifts from Dragonslayer to Serpentyne.
  - P ambiguously might shift from Humblescrumble to Serpentyne (4-4-1-1).

- Changing threshold year to 1600 (fewer samples, less likely mis-sorted):
  - F ambiguously might shift from Serpentyne to Thought-Talon (5-5).
  - K ambiguously might shift from Dragonslayer to Serpentyne (5-5).
  - P shifts from Humblescrumble to Serpentyne.

- Increasing # of samples used to 20 (less risk of one of them being mis-sorted, but they are less good comparisons):
  - K shifts from Dragonslayer to Serpentyne (just barely, 10-9-1).

I'm not certain whether this will end up changing my views, but K in particular looks very close between Dragonslayer and Serpentyne, and P plausibly better in Serpentyne.

Good catch, fixed.

Had trouble making further progress using that method, realized I was being silly about this and there was a much easier starting solution:

Rather than trying to figure out anything whatsoever about scores, we're trying for now just to mimic what we did in the past.

Define a metric of 'distance' between two people equal to the sum of the absolute values of the differences between their stats.

To evaluate a person:

- Find the 10* students with the smallest distances from them who were sorted pre-1700*
- Assume that those students were similar to them, and were sorted correctly. Sort them however the majority were sorted.

*These numbers may be varied to optimize. For example, moving the year threshold earlier makes you more certain that the students you find were correctly sorted, at the expense of selecting them from a smaller population, so they end up further away from the person you're evaluating. I may twiddle these numbers in the future and see if I can do better.
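A minimal sketch of the procedure above in Python. It assumes each past student is stored as a `(year, stats, house)` tuple; that record layout, the function names, and the toy data are all made up for illustration and are not taken from the actual dataset.

```python
from collections import Counter

def distance(a, b):
    """Sum of absolute differences between two stat tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def classify(stats, history, k=10, year_cutoff=1700):
    """Return house vote counts among the k nearest students sorted before year_cutoff.

    The result is a list of (house, votes) pairs, most votes first, so that
    close splits such as 5-5 stay visible instead of being hidden by a
    single 'winner'.
    """
    pool = [(distance(stats, s), house)
            for year, s, house in history
            if year < year_cutoff]
    pool.sort(key=lambda pair: pair[0])
    votes = Counter(house for _, house in pool[:k])
    return votes.most_common()

# Toy data with two stats per student (the real problem has five).
history = [
    (1600, (10, 10), "A"),
    (1601, (11, 10), "A"),
    (1602, (50, 50), "B"),
    (1800, (10, 10), "B"),  # ignored: sorted after the cutoff year
]
result = classify((10, 11), history, k=2)
```

Both `k` and `year_cutoff` are plain arguments here, so the starred numbers can be varied without touching the rest of the code.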

We can test this algorithm by trying it on the students from 1511 (and using students from 1512-1699 to find close matches). When we do this:

- 49 students are sorted by this method into the same house we sorted them into in 1511.
- 3 students are ambiguous (e.g. we see a 5-5 split among the 10 closest students, one of which is the house we chose).
- 8 students are sorted differently.
  - Some of these are **very dramatically** different. For example, student 37 had Intellect 7 and Integrity 61. All students with stats even vaguely near that were sorted into Humblescrumble, which makes sense given that house's focus on Integrity. However, Student 37 was sorted into Thought-Talon, which seems **very odd** given their extremely low Intellect.
    - The most likely explanation for this is that our sorting wasn't perfect even in 1511. Student 37 did quite badly, which suggests this is plausible.
    - The less likely but scarier explanation is that our sorting in 1511 was based on something other than stats (a hidden stat that we can no longer see? Cohort effects?)
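The 1511 check can be sketched as a small backtest. As before, this assumes students are `(year, stats, house)` tuples, and the names and toy records are hypothetical. One simplification to flag: any tie for first place among the neighbours is counted as 'ambiguous' here, whether or not the actual house is one of the tied ones.

```python
from collections import Counter

def manhattan(a, b):
    """Sum of absolute differences between two stat tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))

def neighbour_votes(stats, pool, k=10):
    """Count houses among the k records in pool closest to stats."""
    nearest = sorted(pool, key=lambda rec: manhattan(stats, rec[1]))[:k]
    return Counter(house for _, _, house in nearest)

def backtest(records, test_year=1511, last_year=1699, k=10):
    """Compare nearest-neighbour votes with the houses actually assigned in test_year."""
    test = [r for r in records if r[0] == test_year]
    pool = [r for r in records if test_year < r[0] <= last_year]
    tally = Counter()
    for _, stats, actual in test:
        votes = neighbour_votes(stats, pool, k)
        top = max(votes.values())
        winners = [house for house, n in votes.items() if n == top]
        if len(winners) > 1:
            tally["ambiguous"] += 1   # tie for first place
        elif winners[0] == actual:
            tally["same"] += 1
        else:
            tally["different"] += 1
    return tally

# Toy records with two stats each, purely for illustration.
records = [
    (1511, (0, 0), "A"), (1511, (10, 10), "A"), (1511, (5, 5), "B"),
    (1600, (0, 1), "A"), (1601, (1, 0), "A"),
    (1602, (10, 9), "B"), (1603, (9, 10), "B"),
    (1604, (4, 5), "A"), (1605, (6, 5), "B"),
]
summary = backtest(records, k=2)
```

On the toy records this puts one student in each bucket; the real counts quoted above came from the actual dataset, not from this sketch.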

Sadly this method provides no insight whatsoever into the underlying world. We're copying what we did in the past, but we're not actually learning anything. I still think it's better than any explicit model I've built so far.

This gives the following current allocations for our students (still subject to future meddling):

Thought-Talon: A, J, O, S

Serpentyne: C, F*

Dragonslayer: D, H, G*, K*, N*, Q*

Humblescrumble: B*, E*, I, L, M*, P*, R, T

where entries marked with a * are those where the nearby students were a somewhat close split, while those without are those where the nearby students were clearly almost all in the same house.

And some questions for the GM based on something I ran into doing this (if you think these are questions you're not comfortable answering that's fine, but if they were meant to be clear one way or the other from the prompt please let me know):

The problem statement says we were 'impressively competent' at assigning students when first enchanted.

- Should we take this to mean we were perfect, or should we take this to mean that we were fairly good but could possibly be even better?
- When first enchanted, did we definitely still only use the five stats specified here to classify students, or is it possible that we were able to identify an additional stat (~~Fated~~? Protagonist-hood?) that we can no longer perceive, and sorted students based on that?

AI has solved DEFCON! Oh no!