Running Lightcone Infrastructure, which runs LessWrong. You can reach me at habryka@lesswrong.com
Promoted to curated: I think this post overstates some claims. In particular, I think it underweights some information-theoretic arguments for sexual selection, and I do have a feeling that it tells an overly neat story that leaves out a bunch of important bits. But I overall still feel like I learned a lot of useful things from this post, and it was clearly very well written.
Thank you for writing this!
Curation is a thing where the post shows up at the top of the frontpage for 1-2 weeks, and we send out the post to something like 30k people who are subscribed to get updates whenever we curate a post.
This is one comment that seems good to respond to: https://www.lesswrong.com/posts/yA8DWsHJeFZhDcQuo/the-talk-a-brief-explanation-of-sexual-dimorphism?commentId=LmHAALLKHkyrDtvei
I am quite interested in curating this, however, I do think there are a bunch of kind of important questions and objections in the comments. I would probably curate this post if you respond to them, even if I don't find the responses that compelling, because I do think this post seems quite good, but I want to get one more level of sanity-check on the content by the commenters.
Promoted to curated. I do think the top comments are pretty important context, in that some of the quotes and source material in this kind of post are probably pretty biased in how they present things. Nevertheless, I still find these case studies really interesting, and I think there is a quite natural category of organizations here that deserves to be studied. I also think, in general, that this kind of post, which tries to extract key quotes and material from longer existing works, is quite valuable, and I would love to see more of that on the margin.
The ETHICS dataset has little to do with human values, it's just random questions with answers categorized by simplistic moral systems. Seeing that an LLM has a concept correlated with it has about as much to do with human values as it being good at predicting Netflix watch time.
This makes me confused about what this post is trying to argue for. The evidence here seems about as relevant to alignment as figuring out whether LLM embeddings have a latent direction for "how much is something like a chair" or "how much is a set of concepts associated with the field of economics". It is a relevant question, but invoking the ETHICS dataset here as an additional interesting datapoint strikes me as confused. Did we have any reason to assume that the AI would be incapable of modeling what an extremely simplistic model of a hedonic utilitarian would prefer? Also, this doesn't really have much to do with what humans value (naive hedonic utilitarianism really is an extremely simplified model of human values that lacks the vast majority of the complexity of what humans care about).
I think this post makes a few good points, but I think the norm of "before you claim that someone is overconfident or generally untrustworthy, start by actually showing that any of their object-level points are inaccurate" seems pretty reasonable to me, and seems to me more like what Eliezer was talking about.
Like, your post here seems to create a strong distinction between "arguing against Eliezer on the issues of FDT" and "arguing that Eliezer is untrustworthy based on his opinion on FDT", but like, I do think that the first step to either should be to actually make object-level arguments (omnizoid's post did that a bit, but as I commented on the post, the ratio of snark to object-level content was really quite bad).
A related thing that's coming to mind is that I have mediated a handful of disputes under conditions of secrecy. I currently don't view this as a betrayal of you (that I've accepted information that I cannot share with you) but do you view it as me betraying you somehow?
I think if, during those disputes, you committed to only say positive things about either party (in pretty broad generality, as non-disparagement clauses tend to do), and that you promised to keep that commitment of yours secret, and if because of that I ended up with a mistaken impression on reasonably high-stakes decisions, then yeah, I would feel betrayed by that.
I think accepting confidentiality is totally fine. It's costly, but I don't see a way around it in many circumstances. The NDA situation feels quite different to me, where it's really a quite direct commitment to providing filtered evidence, combined with a promise to keep that filtering secret, which seems very different from normal confidentiality to me.
I mean, yeah, sometimes there are pretty widespread deceptive or immoral practices, but I wouldn't consider their being widespread that great of an excuse to do them anyway. (I think it's somewhat of an excuse, but not a huge one. It also matters to me whether employees are informed that their severance is conditional on signing a non-disparagement clause when they leave, and whether anyone has ever complained about these, such that you had the opportunity to reflect on your practices here.)
I feel like the setup of a combined non-disclosure and non-disparagement agreement should have obviously raised huge flags for you, independently of its precedent in Silicon Valley.
I think a non-disparagement clause can make sense in some circumstances, but I find really very little excuse for combining it with a non-disclosure clause. This is directly asking the other person to engage in a deceptive relationship with anyone who wants to have an accurate model of what it's like to work for you. They are basically forced to lie when asked about their takes on the organization, since answering with "I cannot answer that" is no longer an option, as it would reveal the non-disparagement agreement. And because of the non-disparagement clause, they are only allowed to answer positively. This just seems like a crazy combination to me.
I think this combination is really not a reasonable thing to ask of people in a community like ours, where people put huge amounts of effort into sharing information on the impact of different organizations, where people freely share information about past employers, their flaws, and their advantages, and where people (like me) have invested years of their lives into building out talent pipelines and trying to cooperate on helping people find the most impactful places for them to work.
Like, I don't know what you mean by over-indexing. De-facto I recommended that people work for Wave, on the basis of information that you filtered for me, and most importantly, you contractually paid people off to keep that filtering hidden from me. How am I supposed to react with anything but betrayal? Like, yeah, it sounds to me like you paid at least tens (and maybe hundreds) of thousands of dollars explicitly so that I and other people like me would walk away with this kind of skewed impression. What does it mean to over-index on this?
I don't generally engage in high-trust relationships with random companies in Silicon Valley, so the costs for me there are much lower. I also generally don't recommend that people work there in the same way that I did for Wave, and didn't spend years of my life helping build a community that feeds into companies like Wave.
They were hidden up until this very moment, from me, presumably with a clause in the NDA that contractually committed everyone who signed them to keep them hidden from me.
I am pretty sure many past Wave employees would have brought them up to me had they not been asked to sign an NDA in order to get their severance package. I agree it's worth something that Lincoln just said it straightforwardly, though my sense is this only happened because Jeff did something slightly risky under his NDA by leaking some relevant information (there are not that many places Jeff has worked, so him saying he knew about one organization, and having to check for permission, leaked a decent number of bits, possibly enough to risk a suit if Lincoln wanted one). Me finding this out was sheer luck, and in most worlds I would never have found out.
The variance in different rooms is quite large, but we also did a pretty huge amount of work making rooms nicer. So my guess is it's a mixture of "rooms look a lot more like the ones shown here than when you last visited" and "some rooms do just look very different than this".