Wiki Contributions


I'm curious if anyone made a serious attempt at the shovel-ready math here and/or whether this approach to counterfactuals still looks promising to Abram? (Or anyone else with takes.)

GPT-3- a text-generating language model.

PaLM-540B- a stunningly powerful question-answering language model.

Great Palm- A hypothetical language model that combines the powers of GPT-3 and PaLM-540B.

I would've thought that palm was better at text generation then gpt-3 by default. They're both pretrained on internet next-word prediction and palm is bigger with more data. What makes you think GPT-3 is better at text generation?

Interesting, thanks! To check my understanding:

  • In general, as time passes, all the researcheres increase their compute usage at a similar rate. This makes it hard to distinguish between improvements caused by compute and algorithmic progress.
  • If the correlation between year and compute was perfect, we wouldn't be able to do this at all.
  • But there is some variance in how much compute is used in different papers, each year. This variance is large enough that we can estimate the first-order effects of algorithmic progress and compute usage.
  • But complementarity is a second-order effect, and the data doesn't contain enough variation/data-points to give a good estimate of second-order effects.

Thanks for this!

Question: Do you have a sense of how strongly compute and algorithms are complements vs substitutes in this dataset?

(E.g. if you compare compute X in 2022, compute (k^2)X in 2020, and kX in 2021: if there's a k such that the last one is better than both the former two, that would suggest complementarity)

I'm curious how much of a concern you think this is, now, 1 year later. I haven't heard the "total number of mRNA shots (for any disease)"-concern from other places, and I'm wondering if that's for good reasons.

Competence does not seem to aggressively overwhelm other advantages in humans: 


g. One might counter-counter-argue that humans are very similar to one another in capability, so even if intelligence matters much more than other traits, you won’t see that by looking at  the near-identical humans. This does not seem to be true. Often at least, the difference in performance between mediocre human performance and top level human performance is large, relative to the space below, iirc. For instance, in chess, the Elo difference between the best and worst players is about 2000, whereas the difference between the amateur play and random play is maybe 400-2800 (if you accept Chess StackExchange guesses as a reasonable proxy for the truth here).

The usage of capabilities/competence is inconsistent here. In points a-f, you argue that general intelligence doesn't aggressively overwhelm other advantages in humans. But in point g, the ELO difference between the best and worst players is less determined by general intelligence than by how much practice people have had.

If we instead consistently talk about domain-relevant skills: In the real world, we do see huge advantages from having domain-specific skills. E.g. I expect elected representatives to be vastly better at politics than medium humans.

If we instead consistently talk about general intelligence: The chess data doesn't falsify the hypothesis that human-level variation in general intelligence is small. To gather data about that, we'd want to analyse the ELO-difference between humans who have practiced similarly much but who have very different g.

(There are some papers on the correlation between intelligence and chess performance, so maybe you could get the relevant data from there. E.g. this paper says that (not controlling for anything) most measurements of cognitive ability correlates with chess performance at about ~0.24 (including IQ iff you exclude a weird outlier where the correlation was -0.51).)

Another fairly common argument and motivation at OpenAI in the early days was the risk of "hardware overhang," that slower development of AI would result in building AI with less hardware at a time when they can be more explosively scaled up with massively disruptive consequences. I think that in hindsight this effect seems like it was real, and I would guess that it is larger than the entire positive impact of the additional direct work that would be done by the AI safety community if AI progress had been slower 5 years ago.

Could you clarify this bit? It sounds like you're saying that OpenAI's capabilities work around 2017 was net-positive for reducing misalignment risk, even if the only positive we count is this effect. (Unless you think that there's substantial reason that acceleration is bad other than giving the AI safety community less time.) But then in the next paragraph you say that this argument was wrong (even before GPT-3 was released, which vaguely gestures at the "around 2017"-time). I don't see how those are compatible.

(If 1 firing = 1 bit, that should be 34 megabit ~= 4 megabyte.)

This random article (which I haven't fact-checked in the least) claims a bandwidth of 8.75 megabit ~= 1 megabyte. So that's like 2.5 OOMs higher than the number I claimed for chinchilla. So yeah, it does seem like humans get more raw data.

(But I still suspect that chinchilla gets more data if you adjust for (un)interestingness. Where totally random data and easily predictable/compressible data are interesting, and data that is hard-but-possible to predict/compress is interesting.)

Load More