Zach Stein-Perlman

AI forecasting & strategy at AI Impacts. Blog: Not Optional.


Interesting, thanks.

(I agree in part, but (1) planning for far/slow worlds is still useful and (2) I meant that metrics or model evaluations would be part of an intervention, e.g. incorporated into safety standards, rather than that metrics inform what we try to do.)

How is that relevant? It's about whether AI risk will be mainstream. I'm thinking about governance interventions by this community, which doesn't require the rest of the world to appreciate AI risk.

Four kinds of actors/processes/decisions are directly very important to AI governance:

  • Corporate self-governance
    • Adopting safety standards
      • Providing a model for government regulation
  • US policy (and China, EU, UK, and others to a lesser extent)
    • Regulation
    • Incorporating standards into law
  • Standard-setters setting standards
  • International relations
    • Treaties
    • Informal influence on safety standards

Related: How technical safety standards could promote TAI safety.

("Safety standards" sounds prosaic but it doesn't have to be.)

You’re right that we exclude universes obviously teeming with life. But we (roughly) upweight universes with lots of human-level civilizations that don’t see each other, or where civilizations that don’t see one another are likely to appear.

The largest source of uncertainty is the fraction of habitable planets with life. . . . The fact that it did occur here doesn’t give us information about this fraction other than the fact that it is not exactly zero due to anthropic bias: the observation that we exist would be the same whether life on Earth was an incredibly rare accident or whether it was inevitable.

I think the latter sentence here is a strong claim and a controversial assumption. In particular, I disagree; I favor the self-indication assumption and its apparent implication that we should weight a possible universe by the number of experiences identical to ours in it, so (roughly) weight a possible universe by the number of planets in it where human-level civilization appears.
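To make the weighting concrete, here is a minimal sketch of that kind of update. The two hypotheses, the even prior, and the civilization counts are all made-up illustrative numbers, and `sia_posterior` is a hypothetical helper, not anything from the original post. It just multiplies each universe's prior by the number of planets where a human-level civilization appears and renormalizes.

```python
from fractions import Fraction

def sia_posterior(priors, civ_counts):
    """Weight each hypothesis by prior * number of civilizations like ours,
    then normalize so the weights sum to 1."""
    weights = {h: Fraction(priors[h]) * civ_counts[h] for h in priors}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Illustrative assumptions only: two candidate universes with equal priors.
priors = {"life_rare": Fraction(1, 2), "life_common": Fraction(1, 2)}
# Assumed number of planets with a human-level civilization in each universe.
civ_counts = {"life_rare": 1, "life_common": 1000}

posterior = sia_posterior(priors, civ_counts)
print(posterior["life_common"])  # 1000/1001
```

Under these assumptions, the mere fact that we exist shifts almost all of the weight onto the universe where civilizations are common, which is why the quoted claim that our existence tells us nothing beyond "not exactly zero" depends on rejecting this style of anthropic reasoning.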

+1 to recording beliefs.

More decision-relevant than propositions about superintelligence are propositions about something like the point of no return, which is probably a substantially lower bar.

(Writing quickly and without full justification.)

This post might say a thing that's true but I think the "illustrative warning about artificial intelligence" totally still stands. The warning, I think, is that selecting for inclusive fitness doesn't give you robust inclusive-fitness-optimizers; at least at human-level cognitive capabilities, changing/expanding the environment can cause humans' (mesa-optimizers') alignment to break pretty badly. I don't think you engage with this-- you claim "humans are actually weirdly aligned with natural selection" when we consider an expansive sense of "natural selection." I think this supports claims like "eventually AI will be really good at existing/surviving," not "AI will do something reasonably similar to what we want it to do or tried to train it to do."

I feel like there's confusion in this post between group-level survival and individual-level fitness, but I don't want to try to investigate that now. (Edit: I totally agree with gwern's reply, but I don't think it engages with Katja's cruxes, so there's more understanding-of-Katja's-beliefs to do.)

I think we can theoretically get around 4 by comparing the value of AI stocks to non-AI stocks.

I think an additional problem is that we don't have a no-AGI-baseline to compare prices to-- we can see that Nvidia is worth $400B but we can't directly tell whether that includes lots of expected value from an AI boom or not.

AI risk decomposition based on agency or powerseeking or adversarial optimization or something

Epistemic status: confused.

Some vague, closely related ways to decompose AI risk into two kinds of risk:

  • Risk due to AI agency vs risk unrelated to agency
  • Risk due to AI goal-directedness vs risk unrelated to goal-directedness
  • Risk due to AI planning vs risk unrelated to planning
  • Risk due to AI consequentialism vs risk unrelated to consequentialism
  • Risk due to AI utility-maximization vs risk unrelated to utility-maximization
  • Risk due to AI powerseeking vs risk unrelated to powerseeking
  • Risk due to AI optimizing against you vs risk unrelated to adversarial optimization

The central reason to worry about powerseeking/whatever AI, I think, is that sufficiently (relatively) capable goal-directed systems instrumentally converge to disempowering you.

The central reason to worry about non-powerseeking/whatever AI, I think, is failure to generalize correctly from training: distribution shift, Goodhart, "you get what you measure."
