Ben Pace

I'm an admin of this site; I work full-time on trying to help people on LessWrong refine the art of human rationality. (Longer bio.)

Sequences

AI Alignment Writing Day 2019
Transcript of Eric Weinstein / Peter Thiel Conversation
AI Alignment Writing Day 2018
Share Models, Not Beliefs

Wiki Contributions

Load More

Comments

Briefly registering disagreement: my first thought was an order of magnitude higher than yours. 

Brief sketch of my reasoning: Losing a staff member for 1-2 months really cuts out our ability to maintain the infrastructure we have responsibility for (like Lighthaven and Lightspeed grants and LW) while running at the organizational top priority — right now that's dialogues — and we're already stretched thin with only 2 people working on the top-priority full-time who don't have any side commitments (plus 2 other people working on it as their main focus but with side commitments). I've not got a definite sense of how we'd rearrange, but I can see worlds where it would cut our focus on the top priority by as much as 30% during that period, and that's not just the cost measured in the staff member's time, but reduces the value of everyone's time in a big way.

I just gave this a re-read, I forgot what a trip it is to read the thoughts of Eliezer Yudkowsky. It continues to be some of my favorite stuff in recent years written on LessWrong.

It's hard to relate to the world with a level of mastery over basic ideas as Eliezer has. I don't mean with this to vouch that his perspective is certainly correct, but I believe it is at least possible, and so I think he aspires to a knowledge of reality that I rarely if ever aspire to. Reading it inspires me to really think about how the world works, and really figure out what I know and what I don't. +9

(And the smart people dialoguing with him here are good sports for keeping up their side of the argument.)

This post helped me distinguish between having good reasons for my beliefs, and being able to clearly communicate and explain my reasoning, and (to me) painted the latter as pro-social and as a virtue rather than a terrible cost I was expected to pay.

Strong agree, crucial points, +4.

This isn't a post that I feel compelled to tell everyone they need to read, but nonetheless the idea immediately entered my lexicon, and I use it to this day, which is a pretty rare feat. +4

(Sometimes, instead of this phrase, I say that I'm feeling precious about my idea.)

A few points:

  • This post caused me to notice the ways that I find emergencies attractive. I am myself drawn to scenarios where all the moves are forced moves. If there's a fire, there's no uncertainty, no awkwardness, I just do whatever I can right now to put it out. It's like reality is just dragging me along and I don't really have to take responsibility for the rest of it, because all the tradeoffs are easy and any harder evaluations must be tightly time-bounded. I have started to notice the unhealthy ways in which I am drawn to things that have this nature, even when it's not worth it.
  • This post also pointed out the ways that emergencies are memetically fit, in that if you can convince people that your issue is an emergency, you can draw a lot of action and attention out from a lot of people right now. That's a powerful force. That made me update downward substantially on the background rate of emergencies in the world, even when everyone around me is screaming (euphemistically).
  • I think this post also helped me think about the long-term future more clearly. I often feel like my primary bottleneck in life is "believing that the future exists", and this helped me notice another way that I would behave differently if I believed that the future existed.

Also a great comment section (including but not limited to the comment by Anna on what she cut, which gave me a lot of the first two bullets).

+9.

Feedback: I had formed a guess as to who you meant to which I assigned >50% probability, and my guess was incorrect.

A few related thoughts:

  • One story we could tell is that the thing these people have in common is that they take alignment seriously, not that they are generally pessimists. 
  • I think alignment is unsolved in the general case and so this makes it harder to strongly argue that it will get solved for future systems, but I don't buy that people would not update on seeing a solution or strong arguments for that conclusion, and I think that some of Quintin's and Nora's arguments have caused people I know to rethink their positions and update some in that direction.
  • I think the rationalist and EA spaces have been healthy enough for people to express quite extreme positions of expecting an AI-takeover-slash-extinction. I think it would be a strongly negative sign for everyone in these spaces to have identical views or for everyone to give up all hope on civilization's prospects; but in the absence of that I think it's a sign of health that people are able to be open about having very strong views. I also think the people who most confidently anticipate an AI takeover sometimes feel and express hope.
  • I don't think everyone is starting with pessimism as their bottom line, and I think it's inaccurate to describe the majority of people in these ecosystems as temperamentally pessimistic or epistemically pessimistic.

It's correct that there's a distinction between whether people identify as pessimistic and whether they are pessimistic in their outlook. I think the first claim is false, and I actually also think the second claim is false, though I am less confident in that. 

Interview with Rohin Shah in Dec '19

Rohin reported an unusually large (90%) chance that AI systems will be safe without additional intervention. His optimism was largely based on his belief that AI development will be relatively gradual and AI researchers will correct safety issues that come up.

Paul Christiano in Dec '22

...without AI alignment, AI systems are reasonably likely to cause an irreversible catastrophe like human extinction. I think most people can agree that this would be bad, though there’s a lot of reasonable debate about whether it’s likely. I believe the total risk is around 10–20%, which is high enough to obsess over.

Scott Alexander, in Why I Am Not (As Much Of) A Doomer (As Some People) in March '23

I go back and forth more than I can really justify, but if you force me to give an estimate it’s probably around 33%; I think it’s very plausible that we die, but more likely that we survive (at least for a little while).

John Wentworth in Dec '21 (also see his to-me-inspiring stump speech from a month later):

What’s your plan for AI alignment?

Step 1: sort out our fundamental confusions about agency

Step 2: ambitious value learning (i.e. build an AI which correctly learns human values and optimizes for them)

Step 3: …

Step 4: profit!

… and do all that before AGI kills us all.

That sounds… awfully optimistic. Do you actually think that’s viable?

Better than a 50/50 chance of working in time.

Davidad also feels to me like an optimist to me about the world — someone who is excited about solving the problems and finding ways to win, and is excited about other people and ready to back major projects to set things on a good course. I don't know his probability of an AI takeover but I stand by that he doesn't seem pessimistic in personality.

On occasion when talking to researchers, I talk to someone who is optimistic that their research path will actually work. I won't name who but I recently spoke with a long-time researcher who believes that they have a major breakthrough and will be able to solve alignment. I think researchers can trick themselves into thinking they have a breakthrough when they don't, and this field is unusually lacking in feedback, so I'm not saying I straightforwardly buy their claims, but I think it's inaccurate to describe them all as pessimistic.

I think that short, critical comments can sometimes read as snarky/rude, and I don't want to speak that way to Nora. I also wanted to take some space to try to invoke the general approach to thinking about tribalism and show how I was applying it here, to separate my point from one that is only arguing against this particular tribal line that Nora is reifying, but instead to encourage restraint in general. Probably you're right that I could make it substantially shorter; writing concisely is a skill I want to work on.

I don't know who the "ai nihilists" are supposed to be. My sense is that you could've figured out from my comment objecting to playing and fast and loose with group names that I wouldn't think that phrase carved reality and that I wasn't sure who you have in mind!

Load More