This post is meant to be a linkable resource. It contains a short list of guidelines that are meant to be fairly straightforward and uncontroversial, for the purpose of nurturing and strengthening a culture of clear thinking, clear communication, and collaborative truth-seeking.
"Alas," said Dumbledore, "we all know that what should be, and what is, are two different things. Thank you for keeping this in mind."
There is also (for those who want to read past the simple list) substantial expansion/clarification of each specific guideline, along with justification for the overall philosophy behind the set.
Once someone has a deep, rich understanding of a complex topic, they are often able to refer to that topic with short, simple sentences that correctly convey the intended meaning to other people with...
This is a lightly edited transcript of a chatroom conversation between Scott Alexander and Eliezer Yudkowsky last year, following up on the Late 2021 MIRI Conversations. Questions discussed include "How hard is it to get the right goals into AGI systems?" and "In what contexts do AI systems exhibit 'consequentialism'?".
[Yudkowsky][13:29] @ScottAlexander ready when you are |
[Alexander][13:31] Okay, how do you want to do this? |
[Yudkowsky][13:32] If you have an agenda of Things To Ask, you can follow it; otherwise I can start by posing a probing question or you can? We've been very much winging it on these and that has worked... as well as you have seen it working! |
[Alexander][13:34] Okay. I'll post from my agenda. I'm assuming we both have the right to edit logs |
Fair enough. Nonetheless, I have had this experience many times with Eliezer, including when dialoguing with people with much more domain-experience than Scott.
Truffles are pretty great: gooey fat-softened flavored chocolate inside, protected by a shell of hard chocolate so you don't get your fingers messy. Sometimes I'll make them with the kids:
This is a good amount of work: you need to prepare the filling, chill, shape little spheres, chill, and then dip them. Worth it for making presents or sharing, but not if I'm making something for myself. Instead I just make a jar of ganache:
I'll eat it with a spoon over a week or two, 1-3oz after lunch each day.
At it's simplest it's chocolate and fat. I normally use Callebaut semi-sweet which I can get in bulk for $3.42/lb ($0.21/oz):
I'll usually use butter and cream, though coconut cream works well if you want to make it vegan. Often I'll mix in some...
We had some discussions of the AGI ruin arguments within the DeepMind alignment team to clarify for ourselves which of these arguments we are most concerned about and what the implications are for our work. This post summarizes the opinions of a subset of the alignment team on these arguments. Disclaimer: these are our own opinions that do not represent the views of DeepMind as a whole or its broader community of safety researchers.
This doc shows opinions and comments from 8 people on the alignment team (without attribution). For each section of the list, we show a table summarizing agreement / disagreement with the arguments in that section (the tables can be found in this sheet). Each row is sorted from Agree to Disagree, so a column does not...
They haven't managed to do it so far for climate change, which has received massively more attention than AGI. I have seen many times this example being used to argue that we can indeed be successful at coordinating for major challenges, but I think this case is misleading: CFC never played a major role in the economy and they were easily replaceable, so forbidding them was not such an important move.
[Neurotic status: Until recently, I've felt entirely comfortable telling friends, family, and co-workers that I'm a rationalist (or an aspiring rationalist)—that it wouldn't out-group me among them, or if it did, it was probably worth being out-grouped over. I will happily continue to use Bayesian reasoning, but I can't rep the label "IRL" right now. The distal reasons for this change are (1) I believe rationalism and effective altruism as movements are losing their ability to self-correct, (2) I'm not in a position to help fix (1). I hope this is temporary or that I'm wrong. I started writing last week, and there may have been development since then that I missed. I love you all like a milkshake.
I avoided using easily searchable terms below to keep...
I too tried to read this post and couldn't figure out what its point was most of the time.
This weekend I was in San Luis Obispo for a gig, about halfway between SF and LA. It's possible to fly into SBP, but since I was traveling with two of my kids it was a lot cheaper to fly into SFO and drive down, and only slightly slower.
I'm signed up with Hertz's reward program ("Gold") and one of the benefits is that you pick out your own car. When I got to SFO, there were several Tesla Model 3s in the "Gold" area. This was somewhat surprising—I had only paid for a small sedan ("B") and the Teslas are fancy ("E7")—but it seemed like it would be interesting to try one out and I liked the idea of not paying for gas. I got us loaded up but when I got to the exit...
The attack surface differs enormously from car to car.
Let’s say I know how to build / train a human-level (more specifically, John von Neumann level) AGI. And let’s say that we (and/or the AGI itself) have already spent a few years[1] on making the algorithm work better and more efficiently.
Question: How much compute will it take to run this AGI?
(NB: I said "running" an AGI, not training / programming an AGI. I'll talk a bit about “training compute” at the very end.)
Answer: I don’t know. But that doesn’t seem to be stopping me from writing this post. ¯\_(ツ)_/¯ My current feeling—which I can easily imagine changing after discussion (which is a major reason I'm writing this!)—seems to be:
Still mulling this over. I may end up revising the post and/or writing a follow-up. :)
At the end of the second and final round of the Inverse Scaling Prize, we’re awarding 7 more Third Prizes. The Prize aimed to identify important tasks on which language models (LMs) perform worse the larger they are (“inverse scaling”). Inverse scaling may reveal cases where LM training actively encourages behaviors that are misaligned with human preferences. The contest started on June 27th and concluded on October 27th, 2022 – thanks to everyone who participated! Across the two rounds, we had over 80 unique submissions and gave out a total of 11 Third Prizes.
We are also accepting updates to two previous prize-winners (quote-repetition and redefine-math). For more details on the first round winners, see the Round 1 Announcement Post.
We didn't find the kind of robust, major long-term-relevant problems that...
I think its also not obvious how it solves the problem, whether its about the model only being capable of doing the reasoning required using multiple steps(though why the inverse scale then) or something more like writing an explanation makes the model more likely to use the right kind of reasoning.
And inside of that second option there's a lot of ways that could work internally whether its about distributions of kinds of humans it predicts, or something more like different circuits being activated in different contexts in a way that doesn't have to ...
...On January 26, 2023, NIST released the AI Risk Management Framework (AI RMF 1.0) along with a companion NIST AI RMF Playbook, AI RMF Explainer Video, an AI RMF Roadmap, AI RMF Crosswalk, and various Perspectives. Watch the event here.
In collaboration with the private and public sectors, NIST has developed a framework to better manage risks to individuals, organizations, and society associated with artificial intelligence (AI). The NIST AI Risk Management Framework (AI RMF) is intended for voluntary use and to improve the ability to incorporate trustworthiness considerations into the design, development, use, and evaluation of AI products, services, and systems.
Released on January 26, 2023, the Framework was developed through a consensus-driven, open, transparent, and collaborative process that included a Request for Information, several draft versions for public comments, multiple workshops, and
Does it say anything about AI risk that is about the real risks? (Have not clicked the links, the text above did not indicate to me one way or another).
for a basics, this post is long, and I have a lot of critique I'd like to write that I'd hope to see edited. However, this post has been posted to a blogging platform, not a wiki platform; it is difficult to propose simplifying refactors for a post. I've downvoted for now and I think I'm not the only one downvoting, would be curious to hear reasons for downvotes from others and what would reverse them. would be cool if lesswrong was suddenly a wiki with editing features and fediverse publishing. you mention you want to edit; looking forward to those, hoping to upvote once edited a bit.
unrelated, did you know lesswrong has a "hide author until hovered" feature that for some reason isn't on by default with explanation? :D