Ulisse Mini

Born too late to explore Earth; born too early to explore the galaxy; born just the right time to save humanity.



Alignment stream of thought

Wiki Contributions


I wasn't in a flaming asshole mood, it was a deliberate choice. I think being mean is necessary to accurately communicate vibes & feelings here, I could serialize stuff as "I'm feeling XYZ and think this makes people feel ABC" but this level of serialization won't activate people's mirror neurons & have them actually internalize anything.

Unsure if this worked, it definitely increased controversy & engagement but that wasn't my goal. The goal was to shock one or two people out of bad patterns.

Sorry, I was more criticizing a pattern I see in the community rather than you specifically

However, basically everyone I know who takes innate intelligence as "real and important" is dumber for it. It is very liable to mode collapse into fixed mindsets, and I've seen this (imo) happen a lot in the rat community.

(When trying to criticize a vibe / communicate a feeling it's more easily done with extreme language, serializing loses information. sorry.)

EDIT: I think this comment was overly harsh, leaving it below for reference. The harsh tone was contributed from being slightly burnt out from feeling like many people in EA were viewing me as their potential ender wiggin, and internalizing it.[1]

The people who suggest schemes like what I'm criticizing are all great people who are genuinely trying to help, and likely are.

Sometimes being a child in the machine can be hard though, and while I think I was ~mature and emotionally robust enough to take the world on my shoulders, many others (including adults) aren't.

An entire school system (or at least an entire network of universities, with university-level funding) focused on Sequences-style rationality in general and AI alignment in particular.


Genetic engineering, focused-training-from-a-young-age, or other extreme "talent development" setups.

Please stop being a fucking coward speculating on the internet about how child soldiers could solve your problems for you. Enders game is fiction, it would not work in reality, and that isn't even considering the negative effects on the kids. You aren't smart enough for galaxy brained plans like this to cause anything other than disaster.

In general rationalists need to get over their fetish for innate intelligence and actually do something instead of making excuses all day. I've mingled with good alignment researchers, they aren't supergeniuses, but they did actually try.

(This whole comment applies to Rationalists generally, not just the OP.)

  1. I should clarify this mostly wasn't stuff the atlas program contributed to. Most of the damage was done from my personality + heroic responsibility in rat fiction + dark arts of rationality + death with dignity post. Nor did atlas staff do much to extenuate this, seeing myself as one of the best they could find was most of it, cementing the deep "no one will save you or those you love" feeling. ↩︎


Excited to see what comes out of this. I do want to raise attention to this failure mode covered in the sequences. however. I'd love for those who do the program try to bind their results to reality in some way, ideally having a concrete result of how they're substantively stronger afterwards, and how this replicated with other participants who did the training.

Really nice post. One thing I'm curious about is this line:

This provides some intuitions about what sort of predictor you'd need to get a non-delusional agent - for instance, it should be possible if you simulate the agent's entire boundary.

I don't see the connection here? Haven't read the paper though.

Quick thoughts on creating a anti-human chess engine.

  1. Use maiachess to get a probability distribution over opponent moves based on their ELO. for extra credit fine-tune on that specific player's past games.
  2. Compute expectiminimax search over maia predictions. Bottom out with stockfish value when going deeper becomes impractical. (For MVP bottom out with stockfish after a couple ply, no need to be fancy.) Also note: We want to maximize (P(win)) not centipawn advantage.
  3. For extra credit, tune hyperparameters via self-play against maia (simulated human). Use lichess players as a validation set.
  4. ???
  5. Profit.

This might actually be a case where a chess GM would outperform an AI: they can think psychologically, so they can deliberately pick traps and positions that they know I would have difficulty with.

Emphasis needed. I expect a GM to beat you down a rook every time, and down a queen most times.

Stockfish assumes you will make optimal moves in planning and so plays defensive when down pieces, but an AI optimized to trick humans (i.e. allowing suboptimal play when humans are likely to make a mistake) would do far better. You could probably build this with maiachess, I recall seeing someone build something like this though I can't find the link right now.

Put another way, all the experiments you do are making a significant type error, Stockfish down a rook against a worse opponent does not play remotely like a GM down a rook. I would lose to a GM every time, I would beat Stockfish most times.

Haha nice work! I'm impressed you got TransformerLens working on Colab, I underestimated how much CPU ram they had. I would have shared a link to my notebook & Colab but figured it might be good to keep under the radar so people could preregister predictions.

Maybe the knowledge that you're hot on my heels will make me finish the LLAMAs post faster now ;)

From my perspective 9 (scaling fast) makes perfect sense since Conjecture is aiming to stay "slightly behind state of the art", and that requires engineering power.

Added italics. For the next post I'll break up the abstract into smaller paragraphs and/or make a TL;DR.

Load More