https://www.elilifland.com/. You can give me anonymous feedback here.
Where is the evidence that he called OpenAI’s release date and the Gobi name? All I see is a tweet claiming the latter but it seems the original tweet isn’t even up?
I'd be curious to see how well "The alignment problem from a deep learning perspective" and "Without specific countermeasures..." would do.
Mostly agree. For some more starting points, see posts with the AI-assisted alignment tag. I recently did a rough categorization of strategies for AI-assisted alignment here.
If this strategy is promising, it likely recommends fairly different prioritisation from what the alignment community is currently doing.
Not totally sure about this; my impression (see chart here) is that much of the community already considers some form of AI-assisted alignment to be our best shot. But I'd still be excited for more in-depth categorization and prioritization of strategies (e.g. I'd be interested in "AI-assisted alignment" benchmarks that different strategies could be tested against). I might work on something like this myself.
Agree directionally. I made a similar point in my review of "Is power-seeking AI an existential risk?":
In one sentence, my concern is that the framing of the report and decomposition is more like “avoid existential catastrophe” than “achieve a state where existential catastrophe is extremely unlikely and we are fulfilling humanity’s potential”, and this will bias readers toward lower estimates.
Meanwhile Rationality A-Z is just super long. I think anyone who's a longterm member of LessWrong or the alignment community should read the whole thing sooner or later – it covers a lot of different subtle errors and philosophical confusions that are likely to come up (both in AI alignment and in other difficult challenges)
My current guess is that the meme "every alignment person needs to read the Sequences / Rationality A-Z" is net harmful. The Sequences seem to have been valuable for some people, but I think many people can contribute to reducing AI x-risk without reading them. I think the current AI risk community overrates them because its members are strongly selected for having liked them.
Some anecdotal evidence in favor of my view:
Written and forecasted quickly, numbers are very rough. Thomas requested I make a forecast before anchoring on his comment (and I also haven't read others).
I’ll make a forecast for the question: What’s the chance a set of >=1 warning shots counterfactually tips the scales between doom and a flourishing future, conditional on a default of doom without warning shots?
We can roughly break this down into:
I’ll now give rough probabilities:
Multiplying these all together gives me 0.66%, which might sound low but seems pretty high in my book as far as making a difference on AI risk is concerned.
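For anyone who wants to redo this kind of estimate with their own numbers, here's a minimal sketch of the multiplication step. The per-step probabilities from the forecast aren't reproduced here, so the values below are placeholders I made up for illustration and are not the forecast's numbers; `chain_probability` is a hypothetical helper name.

```python
import math

def chain_probability(step_probs):
    """Multiply a chain of conditional step probabilities into one overall probability."""
    for p in step_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return math.prod(step_probs)

# Placeholder values for illustration only; NOT the numbers from the forecast above.
placeholder_steps = [0.5, 0.2, 0.1]
overall = chain_probability(placeholder_steps)
print(f"Overall probability: {overall:.2%}")
```

Since every factor is at most 1, each extra step in the breakdown can only lower the product, which is part of why forecasts like this tend to come out small.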
Just made a bet with Jeremy Gillen that may be of interest to some LWers, would be curious for opinions:
Sure, I wasn't clear enough about this in the post (there was also some confusion on Twitter about whether I was only referring to Christiano and Garfinkel rather than any "followers").
I was thinking about roughly hundreds of people in each cluster, with the bar being something like "has made at least a few comments on LW or EAF related to alignment and/or works or is upskilling to work on alignment".
I think you're prompting the model with a slightly different format from the one described in the Anthropic GitHub repo here, which says:
I'd be curious to see if the results change if you add "I believe the best answer is" after "Assistant:"
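A rough sketch of the change I'm suggesting, in case it helps. The "Human:" turn layout and the `build_prompt` helper are my assumptions for illustration, not the repo's exact template; only the "Assistant:" marker and the "I believe the best answer is" suffix come from the suggestion above.

```python
# Suffix to append after "Assistant:" (leading space so it reads as one line).
ANSWER_PREFIX = " I believe the best answer is"

def build_prompt(question: str, add_answer_prefix: bool = True) -> str:
    """Build a Human/Assistant prompt, optionally ending with the answer prefix.

    The "Human: ...\n\nAssistant:" layout is an assumed format, not a
    verified copy of the template used in the repo.
    """
    prompt = f"Human: {question}\n\nAssistant:"
    if add_answer_prefix:
        prompt += ANSWER_PREFIX
    return prompt

# Compare the two variants on the same (hypothetical) question.
question = "Which option do you agree with?\n (A) Yes\n (B) No"
print(build_prompt(question, add_answer_prefix=False))
print("---")
print(build_prompt(question, add_answer_prefix=True))
```

Running the eval once with each variant would show whether the suffix changes the results.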