You can get a complementary analysis by comparing the US to its past self, looking at the incarceration rate against the homicide rate. Between 1975 and 2000, the incarceration rate grew five-fold while the homicide rate fell by half.
Bit of a tangent, but while we might plausibly run out of cheap oil in the near future, the supply of expensive, unconventional oil is vast. By vast I mean 'several trillion barrels of known reserves', against an annual consumption of 30bn.
Question is just how much of those reserves are accessible at each price point. This is really hard to answer well, so instead here's an anecdote that'll stick in your head: recent prices ($50-$100/bbl) are sufficient that the US is now the largest producer of oil in the world, and a net exporter to boot.
For what it's worth, this whole unconventional oil thing has appeared from nowhere over the last ten years, and it's been a shock to a lot of people.
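To put the reserves-versus-consumption figures in the same units, here's a back-of-envelope sketch. The 30bn bbl/year consumption figure is the one quoted above; the specific reserve totals are just assumed readings of "several trillion", and holding consumption flat is a simplification.

```python
# Back-of-envelope: years of supply if consumption stays flat.
ANNUAL_CONSUMPTION_BBL = 30e9  # figure quoted in the comment


def years_of_supply(reserves_bbl, annual_consumption_bbl=ANNUAL_CONSUMPTION_BBL):
    """Reserves divided by yearly consumption, ignoring growth and price effects."""
    return reserves_bbl / annual_consumption_bbl


# Assumed readings of "several trillion barrels"; not sourced figures.
for reserves in (2e12, 3e12, 5e12):
    print(f"{reserves / 1e12:.0f}tn bbl lasts about {years_of_supply(reserves):.0f} years")
```

Whichever reading you pick, the answer comes out somewhere between several decades and a couple of centuries. It says nothing about how much of that is recoverable at a given price.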
Thanks for the feedback! I've cleaned up the constraints section a bit, though it's still less coherent than the first section.
Out of curiosity, what was it that convinced you this isn't an infohazard-like risk?
While you're here and chatting about D.5 (assume you meant 5), another tiny thing that confuses me - Figure 21. Am I right in reading the bottom two lines as 'seeing 255 tokens and predicting the 256th is exactly as difficult as seeing 1023 tokens and predicting the 1024th'?
e: Another look and I realise Fig 20 shows things much more clearly - never mind, things continue to get easier with token index.
Though it's not mentioned in the paper, I feel like this could be because the scaling analysis was done on 1024-token sequences. Maybe longer sequences can go further.
It's indeed strange no-one else has picked up on this, which makes me feel I'm misunderstanding something. The breakdown suggested in the scaling law does imply that this specific architecture doesn't have much further to go. Whether the limitation is in something as fundamental as 'the information content of language itself', or if it's a more-easily bypassed 'the information content of 1024-token strings', is unclear.
My instinct is for the latter, though again, given that no-one else has mentioned it - not even the paper authors - I get the uncomfortable feeling I'm misunderstanding something. That said, being able to write that quote a few days ago and have no-one pull me up on it since has increased my confidence that it's a viable interpretation.
'Why the hell has our competitor got this transformative capability that we don't?' is not a hard thought to have, especially among tech executives. I would be very surprised if there wasn't a running battle over long-term perspectives on AI in the C-suite of both Google Brain and DeepMind.
If you do want to think along these lines though, the bigger question for me is why OpenAI released the API now, and gave concrete warning of the transformative capabilities they intend to deploy in six? twelve? months' time. 'Why the hell has our competitor got this transformative capability that we don't?' is not a hard thought now, but that's largely because the API was a piece of compelling evidence thrust in all of our faces.
Maybe they didn't expect it to latch into the dev-community consciousness like it has, or for it to be quite as compelling a piece of evidence as it's turned out to be. Maybe it just seemed like a cool thing to do and in line with their culture. Maybe it's an investor demo for how things will be monetised in future, which will enable the $10bn punt they need to keep abreast of Google.
hey man wanna watch this language model drive my car
Thinking about this a bit more, do you have any insight on Tesla? I can believe that it's outside DM and GB's culture to run with the scaling hypothesis, but watching Karpathy's presentations (which I think is the only public information on their AI program?) I get the sense they're well beyond $10m/run by now. Considering that self-driving is still not there - and once upon a time I'd have expected driving to be easier than Harry Potter parodies - it suggests that language is special in some way. Information density? Rich, diff'able reward signal?
I'd say it's at least 30% likely that's the case! But if you believe that, you'd be pants-on-head loony not to drop a billion on the 'residual' 70% chance that you'll be first to market on a world-changing trillion-dollar technology. VCs would sacrifice their firstborn for that kind of deal.
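Taking those numbers at face value, the bet can be written out as a crude expected-value sum. This is only a sketch: it assumes the whole trillion-dollar payoff is captured on success, that the billion is lost entirely on failure, and that the 70% figure is the probability of success, none of which is more than a rough reading of the comment.

```python
def expected_net_value(p_success, payoff, cost):
    """Expected payoff minus the up-front cost, in the same currency units."""
    return p_success * payoff - cost


def break_even_probability(payoff, cost):
    """Smallest success probability at which the bet stops losing money on average."""
    return cost / payoff


P_FAILURE = 0.30   # the "at least 30% likely" scenario from the comment
PAYOFF = 1e12      # "trillion-dollar technology"
COST = 1e9         # "drop a billion"

ev = expected_net_value(1 - P_FAILURE, PAYOFF, COST)
print(f"expected net value: ${ev / 1e9:.0f}bn")
print(f"break-even probability: {break_even_probability(PAYOFF, COST):.1%}")
```

The break-even line is the relevant one. On these assumptions the bet pays on average unless you think the chance of success is below about a tenth of a percent, so even a much gloomier estimate than 70% leaves it looking attractive.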
Entirely seriously: I can never decide whether the drunkard's search is a parable about the wisdom in looking under the streetlight, or the wisdom of hunting around in the dark.