Stephen McAleese

Software engineer from Ireland who's interested in EA and AI safety research.

Wiki Contributions


I personally use Toggl to track how much time I spend working per day. I usually aim for at least four hours of focused work per day.

Thanks for the post! I think it does a good job of describing key challenges in AI field-building and funding.

The talent gap section describes a lack of positions in industry organizations and independent research groups such as SERI MATS. However, there doesn't seem to be much content on the state of academic AI safety research groups. So I'd like to emphasize the current and potential importance of academia for doing AI safety research and absorbing talent. The 80,000 Hours AI risk page says that there are several academic groups working on AI safety including the Algorithmic Alignment Group at MIT, CHAI in Berkeley, the NYU Alignment Research Group, and David Krueger's group in Cambridge.

The AI field as a whole is already much larger than the AI safety field so I think analyzing the AI field is useful from a field-building perspective. For example, about 60,000 researchers attended AI conferences worldwide in 2022. There's an excellent report on the state of AI research called Measuring Trends in Artificial Intelligence. The report says that most AI publications come from the 'education' sector which is probably mostly universities. 75% of AI publications come from the education sector and the rest are published by non-profits, industry, and governments. Surprisingly, the top 9 institutions by annual AI publication count are all Chinese universities and MIT is in 10th place. Though the US and industry are still far ahead in 'significant' or state-of-the-art ML systems such as PaLM and GPT-4.

What about the demographics of AI conference attendees? At NeurIPS 2021, the top institutions by publication count were Google, Stanford, MIT, CMU, UC Berkeley, and Microsoft which shows that both industry and academia play a large role in publishing papers at AI conferences.

Another way to get an idea of where people work in the AI field is to find out where AI PhD students go after graduating in the US. The number of AI PhD students going to industry jobs has increased over the past several years and 65% of PhD students now go into industry but 28% still go into academic jobs.

Only a few academic groups seem to be working on AI safety and many of the groups working on it are at highly selective universities but AI safety could become more popular in academia in the near future. And if the breakdown of contributions and demographics of AI safety will be like AI in general, then we should expect academia to play a major role in AI safety in the future. Long-term AI safety may actually be more academic than AI since universities are the largest contributor to basic research whereas industry is the largest contributor to applied research.

So in addition to founding an industry org or facilitating independent research, another path to field-building is to increase the representation of AI safety in academia by founding a new research group though this path may only be tractable for professors.

Thanks for the post. It's great that people are discussing some of the less-frequently discussed potential impacts of AI.

I think a good example to bring up here is video games which seem to have similar risks. 

When you think about it, video games seem just as compelling as AI romantic partners. Many video games such as Call of Duty, Civilization, or League of Legends involve achieving virtual goals, leveling up, and improving skills in a way that's often more fulfilling than real life. Realistic 3D video games have been widespread since the 2000s but I don't think they have negatively impacted society all that much. Though some articles claim that video games are having a significant negative effect on young men.

Personally, I've spent quite a lot of time playing video games during my childhood and teenage years but I mostly stopped playing them once I went to college. But why replace an easy and fun way to achieve things with reality which is usually less rewarding and more frustrating? My answer is that achievements in reality are usually much more real, persistent, and valuable than achievements in video games. You can achieve a lot in video games but it's unlikely that you'll achieve goals that increase your status to as many people over a long period of time as you can in real life.

A relevant quote from the article I linked above:

"After a while I realized that becoming master of a fake world was not worth the dozens of hours a month it was costing me, and with profound regret I stashed my floppy disk of “Civilization” in a box and pushed it deep into my closet. I hope I never get addicted to anything like “Civilization” again."

Similarly, in the near term at least, AI romantic partners could be competitive with real relationships in the short term, but I doubt it will be possible to have AI relationships that are as fulfilling and realistic as a marriage that lasts several decades.

And as with the case of video games, status will probably favour real relationships causing people to value real relationships because they offer more status than virtual ones. One possible reason is that status depends on scarcity. Just as being a real billionaire offers much more status than being a virtual one, having a real high-quality romantic partner will probably yield much more status than a virtual one and as a result, people will be motivated to have real partners.

I agree that the difficulty of the alignment problem can be thought of as a diagonal line on the 2D chart above as you described.

This model may make having two axes instead of one unnecessary. If capabilities and alignment scale together predictably, then high alignment difficulty is associated with high capabilities, and therefore the capabilities axis could be unnecessary.

But I think there's value in having two axes. Another way to think about your AI alignment difficulty scale is like a vertical line in the 2D chart: for a given level of AI capability (e.g. pivotal AGI), there is uncertainty about how hard it would be to align such an AGI because the gradient of the diagonal line intersecting the vertical line is uncertain.

Instead of a single diagonal line, I now think the 2D model describes alignment difficulty in terms of the gradient of the line. An optimistic scenario is one where AI capabilities are scaled and few additional alignment problems arise or existing alignment problems do not become more severe because more capable AIs naturally follow human instructions and learn complex values. A highly optimistic possibility is that increased capabilities and alignment are almost perfectly correlated and arbitrarily capable AIs are no more difficult to align than current systems. Easy worlds correspond to lines in the 2D chart with low gradients and low-gradient lines intersect the vertical line corresponding to the 1D scale at a low point.

A pessimistic scenario can be represented in the chart as a steep line where alignment problems rapidly crop up as capabilities are increased. For example, in such hard worlds, increased capabilities could make deception and self-preservation much more likely to arise in AIs. Problems like goal misgeneralization might persist or worsen even in highly capable systems. Therefore, in hard worlds, AI alignment difficulty increases rapidly with capabilities and increased capabilities do not have helpful side effects such as the formation of natural abstrations that could curtail the increasing difficulty of the AI alignment problem. In hard worlds, since AI capabilities gains cause a rapid increase in alignment difficulty, the only way to ensure that alignment research keeps up with the rapidly increasing difficulty of the alignment problem is to limit progress in AI capabilities.

What you're describing above sounds like an aligned AI and I agree that convergence to the best-possible values over time seems like something an aligned AI would do.

But I think you're mixing up intelligence and values. Sure, maybe an ASI would converge on useful concepts in a way similar to humans. For example, AlphaZero rediscovered some human chess concepts. But because of the orthogonality thesis, intelligence and goals are more or less independent: you can increase the intelligence of a system without its goals changing.

The classic thought experiment illustrating this is Bostrom's paperclip maximizer which continues to value only paperclips even when it becomes superintelligent.

Also, I don't think neuromorphic AI would reliably lead to an aligned AI. Maybe an exact whole-brain emulation of some benevolent human would be aligned but otherwise, a neuromorphic AI could have a wide variety of possible goals and most of them wouldn't be aligned.

I suggest reading The Superintelligent Will to understand these concepts better.

If you don’t know where you’re going, it’s not helpful enough not to go somewhere that’s definitely not where you want to end up; you have to differentiate paths towards the destination from all other paths, or you fail.

I'm not exactly sure what you meant here but I don't think this claim is true in the case of RLHF because, in RLHF, labelers only need to choose which option is better or worse between two possibilities, and these choices are then used to train the reward model. A binary feedback style was chosen specifically because it's usually too difficult for labelers to choose between multiple options.

A similar idea is comparison sorting where the algorithms only need the ability to compare two numbers at a time to sort a list of numbers.

Thanks for the comment.

I think there's a possibility that there could be dangerous emergent dynamics from multiple interacting AIs but I'm not too worried about that problem because I don't think you can increase the capabilities of an AI much simply by running multiple copies of it. You can do more work this way but I don't think you can get qualitatively much better work.

OpenAI created GPT-4 by training a brand new model not by running multiple copies of GPT-3 together. Similarly, although human corporations can achieve more than a single person, I don't consider them to be superintelligent. I'd say GPT-4 is more capable and dangerous than 10 copies of GPT-3.

I think there's more evidence that emergent properties come from within the AI model itself and therefore I'm more worried about bigger models than problems that would occur from running many of them. If we could solve a task using multiple AIs rather than one highly capable AI, I think that would probably be safer and I think that's part of the idea behind iterated amplification and distillation.

There's value in running multiple AIs. For example, OpenAI used multiple AIs to summarize books recursively. But even if we don't run multiple AI models, I think a single AI running at high speed would also be highly valuable. For example, you can paste a long text into GPT-4 today and it will summarize it in less than a minute.

In my opinion, much of the value of interpretability is not related to AI alignment but to AI capabilities evaluations instead.

For example, the Othello paper shows that a transformer trained on the next-word prediction of Othello moves learns a world model of the board rather than just statistics of the training text. This knowledge is useful because it suggests that transformer language models are more capable than they might initially seem.

I highly recommend this interview with Yann LeCun which describes his view on self-driving cars and AGI.

Basically, he thinks that self-driving cars are possible with today's AI but would require immense amounts of engineering (e.g. hard-wired behavior for corner cases) because today's AI (e.g. CNNs) tends to be brittle and lacks an understanding of the world.

My understanding is that Yann thinks we basically need AGI to solve autonomous driving in a reliable and satisfying way because the car would need to understand the world like a human to drive reliably.

Load More