Former AI safety research engineer, now AI governance researcher at OpenAI. Blog:


Replacing fear
Shaping safer goals
AGI safety from first principles



The ability to do so in general probably requires a super strong understanding. The ability to do so in specific limited cases probably doesn't. For example, suppose I decide to think about strawberries all day every day. It seems reasonable to infer that, after some period of doing this, my values will end up somehow more strawberry-related than they used to be. That's roughly analogous to what I'm suggesting in the section you quote.

+1, and in particular the paper claims that g is about twice as strong in language models as in humans and some animals.

I'm not confident that this is good research, but the original post really seems like it had a conclusion pre-written and was searching for arguments to defend it, rather than paying any attention to what other people might actually believe.

This comment feels like a central example of the kind of unhealthy thinking that I describe in this post: specifically, setting an implicit unrealistically high standard and then feeling viscerally negative about not meeting that standard, in a way that's divorced from action-relevant considerations.

It reassures me, and I think it's the right thing to do in this case, because policy discussions follow strong contextualizing norms. Using a layer of indirection, as Eliezer does here, makes it clearer that this is a theoretical discussion, rather than an attempt to actually advocate for that specific intervention.

There are some similarities, although I'm focusing on AI values, not human values. Also, it seems like the value change stuff is thinking about humanity at the level of society as a whole, whereas I'm thinking about value systematization mostly at the level of an individual AI agent. (Of course, widespread deployment of an agent could have a significant effect on its values, if it continues to be updated. But I'm mainly focusing on the internal factors.)

The stronger version is: EUT is inadequate as a theory of agents (for the same reasons, and in the same ways) during an agent's "growing up" period as well as all the time. I think the latter is the case for several reasons, for example:

  • agents get exposed to novel "ontological entities" continuously (that e.g. they haven't yet formed evaluative stances with respect to), and not just while "growing up"
  • there is a (generative) logic that governs how an agent "grows up" (develops into a "proper agent"), and that same logic continues to apply throughout an agent's lifespan

I think this is a very important point; my post on value systematization is a (very early) attempt to gesture towards what an agent "growing up" might look like.

See Sam Altman here:

As we create successively more powerful systems, we want to deploy them and gain experience with operating them in the real world. We believe this is the best way to carefully steward AGI into existence—a gradual transition to a world with AGI is better than a sudden one. We expect powerful AI to make the rate of progress in the world much faster, and we think it’s better to adjust to this incrementally.

A gradual transition gives people, policymakers, and institutions time to understand what’s happening, personally experience the benefits and downsides of these systems, adapt our economy, and to put regulation in place. It also allows for society and AI to co-evolve, and for people collectively to figure out what they want while the stakes are relatively low.

And Sam has been pretty vocal in pushing for regulation in general.

There are over 100 companies globally with a market cap of more than $100 billion. If we're indexing on the $10 billion figure, these companies could have a bigger financial impact by doing "conspiracy-type" things that swung their value by <10%. How many of them have actually done that? No idea, but "dozens" doesn't seem implausible (especially when we note that many of them are based in authoritarian countries).

Re NSA: measuring the impact of the NSA in terms of inputs is misleading. The problem is that they're doing very highly-leveraged things like inserting backdoors into software, etc. That's true of politics more generally. It's very easy for politicians to insert clauses into bills that have >$10 billion of impact. How often are the negotiations leading up to that "conspiratorial"? Again, very hard to know.

in terms of things that are as clearly some kind of criminal or high-stakes government conspiracy, I think FTX stands among the biggest ones

This genuinely seems bizarre to me. A quick quote I found from googling:

The United Nations estimated in a 2011 report that worldwide proceeds from drug trafficking and other transnational organized crime were equivalent to 1.5 percent of global GDP, or $870 billion in 2009.

That's something like 100 FTXs per year; we mostly just don't see them. Basically I think that you're conflating legibility with impact. I agree FTX is one of the most legible ways in which people were defrauded this century; I also think it's a tiny blip on the scale of the world as a whole. (Of course, that doesn't make it okay by any means; it was clearly a big fuck-up, there's a lot we can and should learn from it, and a lot of people who were hurt.)

(I think you're thinking of Spinning Silver not Uprooted btw.)

There's a story on some blog (maybe Ozy's, or something similar in concept-space) about an analogy between children and mind-controlling aliens. I can't remember what it's called, but would appreciate a link if anyone has one; it does a great job of raising interesting questions about identity and agency while packing an emotional punch.
