lc

Sorry in advance.

Sequences

The Territories
Mechanics of Tradecraft

Comments

I too tried to read this post and couldn't figure out what its point was most of the time.

The attack surface differs enormously from car to car.

  1. Why would CEV be difficult to learn?

I'm not an alignment researcher, so someone might be cringing at my answers. That said, responding to some aspects of the initial comment:

Humans are relatively dumb, so why can't even a relatively dumb AI learn the same ability to distinguish utopias from dystopias?

The problem is not building AIs that are capable of distinguishing human utopias from dystopias - that's largely a given if you have general intelligence. The problem is building AIs that target human utopia safely, first try. It's not a matter of giving AIs some internal module native to humans that lets them discern good outcomes from bad outcomes; it's getting them to care about that nuance at all.

If CEV is impossible to learn first try, why not shoot for something less ambitious? Value is fragile, OK, but aren't there easier utopias?

I would suppose (as mentioned, being empirically bad at this kind of analysis) that the problem is inherent to giving AIs open-ended goals that require wresting control of the Earth and its resources from humans, which is what "shooting for utopia" would involve. Strawberry tasks, which naively seem more amenable to things like power-seeking penalties and oversight via interpretability tools, sound easier to perform safely than strict optimization of any particular target.


I would be more impressed if he had used the information bottleneck as a simple example of a varying training condition, instead of authoritatively declaring it The Difference, accompanied by its own just-so story to explain discrepancies in implementation that haven't even been demonstrated. I'm not even sure the analogy is correct; is the 7.5MB storing the training parameters or the Python code?

What's up with the back-to-back shootings in California by two Asian men over 65?

The oldest woman I've ever dated was in her thirties. Forty would be a little weird, and I probably wouldn't instigate, but if I thought she seemed open to it I wouldn't rule it out.

I don't think we ever had a chance.


The biggest surprise to me was when he said that he thought short timelines were safer than long timelines. The reason for that is not obvious to me. Maybe something to do with contingent geopolitics.

What do you expect him to say? "Yeah, longer timelines and consolidated AGI development efforts are great, I'm shorting your life expectancies as we speak"? The only way you can be a Sam Altman is by convincing yourself that nuclear proliferation makes the world safer.

Man, I don't doubt you're telling the truth, but I find this bizarre. As a 22yo just starting his career, I would kill to go on a couple of dates with some women "in the cluster"; I just mostly never get the chance because I don't know any such women.
