Concept Safety
Multiagent Models of Mind
Keith Stanovich: What Intelligence Tests Miss


When can Fiction Change the World?

I don't think that "we manage to find a smart way to avoid a disaster, though we almost lose anyway" implies "being smart automatically means that we win".

nostalgebraist: Recursive Goodhart's Law

Could you elaborate on that? The two posts seem to be talking about different things as far as I can tell: e.g. nostalgebraist doesn't say anything about the Optimizer's Curse, whereas your post relies on it.

I do see that there are a few paragraphs that seem to reach similar conclusions (both say that overly aggressive optimization of any target is bad), but the reasoning used for reaching that conclusion seems different.

(By the way, I don't quite get your efficiency example? I interpret it as saying that you spent a lot of time and effort on optimizations that didn't pay for themselves. I guess you might mean something like "I had a biased estimate of how much time my optimizations would save, so I chose expensive optimizations that turned out to be less effective than I thought." But the example already suggests that you knew beforehand that the time saved would be on the order of a minute or so, so I'm not sure how the example is about Goodhart's Curse.)

nostalgebraist: Recursive Goodhart's Law

But "COVID-19 cases decreasing" is probably not your ultimate goal: more likely, it's an instrumental goal for something like "prevent humans from dying" or "help society" or whatever... in other words, it's a proxy for some other value. And if you walk back the chain of goals enough, you are likely to arrive at something that isn't well defined anymore.

The two-layer model of human values, and problems with synthesizing preferences

Good question. I think that at least some approaches to no-self do break down the mechanisms by which the appearance of a character is maintained, but the extent to which this actually gives insight into the nature of the player (as opposed to insight into the non-existence of the character) is unclear to me.

When can Fiction Change the World?

Nice post!

Related excerpt from "Misinformation and Its Correction: Continued Influence and Successful Debiasing" on people's tendency to pick up beliefs from fiction (note that this is a pre-replication-crisis social psychology paper, so take it with a grain of salt):

A related but perhaps more surprising source of misinformation is literary fiction. People extract knowledge even from sources that are explicitly identified as fictional. This process is often adaptive, because fiction frequently contains valid information about the world. For example, non-Americans’ knowledge of U.S. traditions, sports, climate, and geography partly stems from movies and novels, and many Americans know from movies that Britain and Australia have left-hand traffic. By definition, however, fiction writers are not obliged to stick to the facts, which creates an avenue for the spread of misinformation, even by stories that are explicitly identified as fictional. A study by Marsh, Meade, and Roediger (2003) showed that people relied on misinformation acquired from clearly fictitious stories to respond to later quiz questions, even when these pieces of misinformation contradicted common knowledge. In most cases, source attribution was intact, so people were aware that their answers to the quiz questions were based on information from the stories, but reading the stories also increased people’s illusory belief of prior knowledge. In other words, encountering misinformation in a fictional context led people to assume they had known it all along and to integrate this misinformation with their prior knowledge (Marsh & Fazio, 2006; Marsh et al., 2003).

The effects of fictional misinformation have been shown to be stable and difficult to eliminate. Marsh and Fazio (2006) reported that prior warnings were ineffective in reducing the acquisition of misinformation from fiction, and that acquisition was only reduced (not eliminated) under conditions of active on-line monitoring—when participants were instructed to actively monitor the contents of what they were reading and to press a key every time they encountered a piece of misinformation (see also Eslick, Fazio, & Marsh, 2011). Few people would be so alert and mindful when reading fiction for enjoyment. These links between fiction and incorrect knowledge are particularly concerning when popular fiction pretends to accurately portray science but fails to do so, as was the case with Michael Crichton’s novel State of Fear. The novel misrepresented the science of global climate change but was nevertheless introduced as “scientific” evidence into a U.S. Senate committee (Allen, 2005; Leggett, 2005)

Mesa-Search vs Mesa-Control

It sounds a bit absurd: you've already implemented a sophisticated RL algorithm, which keeps track of value estimates for states and actions, and propagates these value estimates to steer actions toward future value. Why would the learning process re-implement a scheme like that, nested inside of the one you implemented? Why wouldn't it just focus on filling in the values accurately?

I've thought of two possible reasons so far.

  1. Perhaps your outer RL algorithm is getting very sparse rewards, and so does not learn very fast. The inner RL could implement its own reward function, which gives faster feedback and therefore accelerates learning. This is closer to the story in Evan's mesa-optimization post, just replacing search with RL.
  2. More likely perhaps (based on my understanding), the outer RL algorithm has a learning rate that might be too slow, or is not sufficiently adaptive to the situation. The inner RL algorithm adjusts its learning rate to improve performance.

Possibly obvious, but just to point it out: both of these seem like they also describe the case of genetic evolution vs. brains.

Matt Botvinick on the spontaneous emergence of learning algorithms

Good point, I wasn't thinking of social effects changing the incentive landscape.

Matt Botvinick on the spontaneous emergence of learning algorithms

That seems like a reasonable paraphrase, at least if you include the qualification that the "quickly" is relative to the amount of structure that the inner layer has accumulated, so might not actually happen quickly enough to be useful in all cases.

For example, it seems plausible to me that the inner layer might come to optimize for its proxy estimations of outer reward more than for outer reward itself, and that those two things could become decoupled.

Sure, e.g. lots of exotic sexual fetishes look like that to me. Hmm, though actually that example makes me rethink the argument that you just paraphrased, given that those generally emerge early in an individual's life and then generally don't get "corrected".

Does crime explain the exceptional US incarceration rate?

Related to this is that so far we’ve basically taken the homicide rate as exogenous, but of course there’s reverse causality. Having a large chunk of the population in prison will affect the murder rate. [...] Another way out for them is that maybe all the countries with similar homicide rates should imprison people as much as the US, but their institutions don’t function well enough.

Note that some people make the reverse argument: that a high imprisonment rate makes things worse, especially if the sentences are long and prison conditions are harsh and tending towards punishment rather than rehabilitation. People in prison end up socialized into interacting with other prisoners, which gets first-timers into a stronger criminal mindset. Once they get out, they might not have many opportunities available other than going back into crime.

At least this article notes that e.g. Finland has a low incarceration rate as well as a low recidivism rate, though the report that it cites for the recidivism figure explicitly concludes that the rates are not directly comparable between countries, so take that with a grain of salt.

Building up to an Internal Family Systems model

Happy to hear that the post was useful to you!

After identifying a part that I want to work with, I immediately intellectualize that part and build a predictive model of what the part may possibly respond to some inquiries that I have in mind

First piece of advice: don't do that. :-) I feel pretty comfortable saying that this approach is guaranteed not to produce any results. Intellectualizing parts will basically only give you the kind of information that you could produce by intellectual analysis, and for intellectual analysis you don't need IFS in the first place. Even if your guesses are right, they will not produce the kind of emotional activation that's necessary for change.

A few thoughts on what to do instead...

Is Procrastination its own part? Maybe so. I'll give him a character. I had a roommate ("John") who had a lot of issues with procrastination, so his visual image feels appropriate.

It sounds (correct me if I'm wrong) like you are giving the part a visual appearance by thinking of the nature of the problem, and choosing an image which seems suitably symbolic of it; then you try to interact with that image.

In that case, you are picking a mental image, but the image isn't really "connected" to the part, so the interaction is not going to work. What you want to do is to first get into contact with the part, and then let a visual image emerge on its own. (An important note: parts don't have to have a visual appearance! I expect that one could do IFS even if one had aphantasia. If you try to get a visual appearance and nothing comes up, don't force it, just work with what you do have.)

So I would suggest doing something like this:

  • Think of some concrete situation in which you usually procrastinate. If you have a clear memory of a particular time, let that memory come back to mind. Or you could imagine that you are about to do something that you've usually been procrastinating on. Or you could just pick something that you've been procrastinating on and try doing it right now, to get that procrastination response.
  • Either way, what you are going for are the kinds of thoughts, feelings, and bodily sensations that are normally associated with you procrastinating. Pay particular attention to any sensations in your body. Whatever it is that you are experiencing, try describing it out loud. For example: "when I think of working on my project, I get an unpleasant feeling in my back... it's a kind of nervous energy. And when I try to focus my thoughts on what I'm supposed to do, I... my attention just keeps kind of sliding off to something else."
    • The ellipses in that example are to highlight that there's no rush. Take your time settling into the sensations. Often, if you start with a coarse description, such as "an unpleasant feeling", you might get more details if you just keep your attention on it and see whether you could describe it more precisely: "... it's a kind of nervous energy".
    • You're not thinking about parts yet. You're just imagining yourself in a situation and then describing whatever sensations and thoughts are coming up.
    • If you find yourself describing everything very quickly, you are probably not paying attention to the actual sensations. If you find yourself pausing, looking for the right word, finding a word that's almost it but having an even better one lurking on the tip of your tongue... then you're much more likely to be doing it right.
    • Sometimes you don't get bodily sensations, but you might get various thoughts, mental images, or desires. That's fine too. Describe them in a similar way.
    • If you find yourself being too impatient to do this properly, working with a friend whose only job is to sit there and listen often helps. You can think of yourself as doing your best to communicate the experience to your friend.
  • Once you have a good handle on the sensations, you can let your attention rest on them and ask yourself, "if these sensations had a visual appearance, what would it be?".
    • Don't try to actively come up with an answer. Just keep your attention on the sensations, ask yourself the question, and see if any visual image emerges on its own. If you get a sense of something but it's vague, you can try saying a few words of what you do manage to make out and see if that brings out additional details.
    • "Ask yourself" here doesn't mean that you would need to address any external entity, or do anything else special. Rather, just... kind of let your mind wonder about the question, and see if any answer emerges.
    • The image doesn't need to look like anything in particular. It doesn't need to be a human, or even a living being. Though it can be! But it can be a swirling silver vortex, or a wooden duck, or whatever feels right.
    • If no visual image emerges, don't sweat it, and don't try to force one. Just stay with the sensations.
  • At this point, you can see if you could give this bundle of sensations and (maybe) images a name. Again, don't think about it too intellectually, just see if there would be anything that fits your experience. If you had a nervous energy in your back, maybe it's called "nervousness". If the mental image you got was of a swirling silver vortex in your back, maybe it's "silver vortex".
  • Now you can start doing things like seeing if you could communicate with this part, check how you feel towards it, etc.
    • When you are asking the part questions, its answers don't need to actually be any kind of mental speech. For instance, if you ask it what it is trying to do, you might get a vague intuition, a flash of memory, or a mental image. The answer might feel cryptic at first. If so, you can again describe it out loud, and wait to see if more details emerge.
      • If you think you have a hunch of what it's about, you can try asking the part whether you've understood correctly. Asking verbally is one way, but you can also just kind of... hold up your current understanding against the part, and see whether you get a sense of it resonating.
        • If the part tells you that you did understand it correctly, you can then use the same approach to ask it whether you've understood everything about this, or whether there are still more pieces that you are missing.
      • Generally avoid the temptation to go into intellectual analysis to figure out what this is about. (You can ask any intellectualizing parts to move aside.) Often there's an emotional logic which will make sense in retrospect, but which is impossible to figure out on an intellectual level beforehand. If you, say, get a particular memory which you recognize but don't understand how it's related to this topic, just stay with the memory, maybe describe it out loud, and see whether more details emerge.
      • It's okay if you don't figure it out during one session. Let your brain process it.
  • You might arrive at something like a "classic IFS" situation, where a part has a distinct anthropomorphic appearance and you are literally having a conversation with it. Or your parts might be nothing like this, and be just a bundle of sensations whose "answers" consist of more sensations and memories coming to your mind. Either one is fine.
  • Throughout the process, the main thing is to work with that which comes naturally, and not try to force anything. (If you do feel a desire to force things into a particular shape, or guide the process to happen in a particular way, that's a part. See what it's trying to do and whether it would be willing to move aside.)