What do you mean by a goodhearting problem, & why is it a lossy compression problem? Are you using "goodhearting" to refer to Goodhart's Law?
I'll preface this by saying that I don't see why it's a problem, for purposes of alignment, for human values to refer to non-existent entities. This should manifest as humans and their AIs wasting some time and energy trying to optimize for things that don't exist, but this seems irrelevant to alignment. If the AI optimizes for the same things that don't exist as humans do, it's still aligned; it isn't going to screw things up any worse than humans do.
But I think it's more important to point out that you're joining the same metaphysical goose chase that has made Western philosophy nonsense since before Plato.
You need to distinguish between the beliefs and values a human has in its brain, and the beliefs & values it expresses to the external world in symbolic language. I think your analysis concerns only the latter. If that's so, you're digging up the old philosophical noumena / phenomena distinction, which itself refers to things that don't exist (noumena).
Noumena are literally ghosts; "soul", "spirit", "ghost", "nature", "essence", and "noumena" are, for practical purposes, synonyms in philosophical parlance. The ghost of a concept is the metaphysical entity which defines what assemblages in the world are and are not instances of that concept.
But at a fine enough level of detail, not only are there no ghosts, there are no automobiles or humans. The Buddhist and post-modernist objections to the idea that language can refer to the real world are that the referents of "automobiles" are not exactly, precisely, unambiguously, unchangingly, completely, reliably specified, in the way Plato and Aristotle thought words should be. I.e., the fact that your body gains and loses atoms all the time means, for these people, that you don't "exist".
Plato, Aristotle, Buddhists, and post-modernists all assumed that the only possible way to refer to the world is for noumena to exist, which they don't. When you talk about "valuing the actual state of the world," you're indulging in the quest for complete and certain knowledge, which requires noumena to exist. You're saying, in your own way, that knowing whether your values are satisfied or optimized requires access to what Kant called the noumenal world. You think that you need to be absolutely, provably correct when you tell an AI that one of two worlds is better. So those objections apply to your reasoning, which is why all of this seems to you to be a problem.
The general dissolution of this problem is to admit that language always has slack and error. Even direct sensory perception always has slack and error. The rationalist, symbolic approach to AI safety, in which you must specify values in a way that provably does not lead to catastrophic outcomes, is doomed to failure for these reasons, which are the same reasons that the rationalist, symbolic approach to AI was doomed to failure (as almost everyone now admits). These reasons include the fact that claims about the real world are inherently unprovable, which has been well-accepted by philosophers since Kant's Critique of Pure Reason.
That's why continental philosophy is batshit crazy today. They admitted that facts about the real world are unprovable, but still made the childish demand for absolute certainty about their beliefs. So, starting with Hegel, they invented new fantasy worlds for our physical world to depend on, all pretty much of the same type as Plato's or Christianity's, except instead of "Form" or "Spirit", their fantasy worlds are founded on thought (Berkeley), sense perceptions (phenomenologists), "being" (Heidegger), music, or art.
The only possible approach to AI safety is one that depends not on proofs using symbolic representations, but on connectionist methods for linking mental concepts to the hugely-complicated structures of correlations in sense perceptions which those concepts represent, as in deep learning. You could, perhaps, then construct statistical proofs that rely on the over-determination of mental concepts to show almost-certain convergence between the mental languages of two different intelligent agents operating in the same world. (More likely, the meanings which two agents give to the same words don't necessarily converge, but agreement on the probability estimates given to propositions expressed using those same words will converge.)
Fortunately, all mental concepts are over-determined. That is, we can't learn concepts unless the relevant sense data that we've sensed contains much more information than do the concepts we learned. That comes automatically from what learning algorithms do. Any algorithm which constructed concepts that contained more information than was in the sense data, would be a terrible, dysfunctional algorithm.
You are still not going to get a proof that two agents interpret all sentences exactly the same way. But you might be able to get a proof which shows that catastrophic divergence is likely to happen less than once in a hundred years, which would be good enough for now.
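The claim above that agents' probability estimates can converge even when their starting points differ can be illustrated with a toy model. This is only a sketch under strong assumptions, not the connectionist method I have in mind: two hypothetical agents hold opposite Beta priors about a single yes/no proposition, see the same stream of observations, and update by ordinary Bayesian conjugate updating. All function names, the priors, and the synthetic "world" are made up for illustration.

```python
import random


def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a Beta(prior_a, prior_b) prior after Bernoulli data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)


def estimate_gap(observations, prior_1=(1.0, 9.0), prior_2=(9.0, 1.0)):
    """Absolute disagreement between two agents who saw the same observations.

    The two default priors are hypothetical: one agent starts out nearly sure
    the proposition is false, the other nearly sure it is true.
    """
    successes = sum(observations)
    trials = len(observations)
    estimate_1 = posterior_mean(*prior_1, successes, trials)
    estimate_2 = posterior_mean(*prior_2, successes, trials)
    return abs(estimate_1 - estimate_2)


# Synthetic shared world: the proposition holds about 30% of the time.
rng = random.Random(0)
world = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]

for n in (0, 10, 100, 1000):
    print(n, round(estimate_gap(world[:n]), 4))
```

With these two priors the disagreement works out to 8 / (10 + n) after n shared observations, whatever the data turn out to be, so it shrinks toward zero but never reaches it. That is agreement on estimates without any proof that the two agents mean exactly the same thing, which is the weaker kind of guarantee I'm arguing is the one actually available.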
Perhaps what I'm saying will be more understandable if I talk about your case of ghosts. Whether or not ghosts "exist", something exists in the brain of a human who says "ghost". That something is a mental structure, which is either ultimately grounded in correlations between various sensory perceptions, or is ungrounded. So the real problem isn't whether ghosts "exist"; it's whether the concept "ghost" is grounded, meaning that the thinker defines ghosts in some way that relates them to correlations in sense perceptions. A person who thinks ghosts fly, moan, and are translucent white with fuzzy borders, has a grounded concept of ghost. A person who says "ghost" and means "soul" has an ungrounded concept of ghost.
Ungrounded concepts are a kind of noise or error in a representational system. Ungrounded concepts give rise to other ungrounded concepts, as "soul" gave rise to things like "purity", "perfection", and "holiness". I think it highly probable that grounded concepts suppress ungrounded concepts, because all the grounded concepts usually provide evidence for the correctness of the other grounded concepts. So probably sane humans using statistical proofs don't have to worry much about whether every last concept of theirs is grounded, but as the number of ungrounded concepts increases, there is a tipping point beyond which the ungrounded concepts can be forged into a self-consistent but psychotic system such as Platonism, Catholicism, or post-modernism, at which point they suppress the grounded concepts.
Sorry that I'm not taking the time to express these things clearly. I don't have the time today, but I thought it was important to point out that this post is diving back into the 19th-century continental grappling with Kant, with the same basic presupposition that led 19th-century continental philosophers to madness. TL;DR: AI safety can't rely on proving statements made in human or other symbolic languages to be True or False, nor on having complete knowledge about the world.
When you write of "A belief in human agency," it's important to distinguish between the different conceptions of human agency on offer, corresponding to the 3 main political groups:
Someone who wants us united under a document written by desert nomads 3000 years ago, or someone who wants the government to force their "solutions" down our throats and keep forcing them no matter how many people die, would also say they believe in human agency; but they don't want private individuals to have agency.
This is a difficult but critical point. Big progressive projects, like flooding desert basins, must be collective. But movements that focus on collective agency inevitably embrace, if only subconsciously, the notion of a collective soul. This already happened to us in 2010, when a large part of the New Atheist movement split off and joined the Social Justice movement, and quickly came to hate free speech, free markets, and free thought.
I think it's obvious that the enormous improvements in material living standards in the last ~200 years you wrote of were caused by the Enlightenment, and can be summarized as the understanding of how liberating individuals leads to economic and social progress. Whereas modernist attempts to deliberately cause economic and social progress are usually top-down and require suppressing individuals, and so cause the reverse of what they intend. This is the great trap that we must not fall into, and it hinges on our conception of human agency.
A great step forward, or backwards (towards Athens), was made by the founders of America when they created a nation based in part on the idea of competition and compromise as being good rather than bad, basically by applying Adam Smith's invisible hand to both economics and politics. One way forward is to understand how to do large projects that have a noble purpose. That is, progressive capitalism. Another way would be to understand how governments have sometimes managed to do great things, like NASA's Apollo project, without them degenerating into economic and social disasters like Stalin's or Mao's 5-Year-Plans. Either way, how you conceptualize human agency will be a decisive factor in whether you produce heaven or hell.
I think it would be more graceful of you to just admit that there may be more than one reason for people to be in terror of the end of the world, and likewise qualify your other claims to certainty and universality.
That's the main point of what gjm wrote. I'm sympathetic to the view you're trying to communicate, Valentine; but you used words that claim that what you say is absolute, immutable truth, and that's the worst mind-killer of all. Everything you wrote just above seems to me to be just equivocation trying to deny that technical yet critical point.
I understand that you think that's just a quibble, but it really, really isn't. Claiming privileged access to absolute truth on LessWrong is like using the N-word in a speech to the NAACP. It would do no harm to what you wanted to say to use phrases like "many people" or even "most people" instead of the implicit "all people", and it would eliminate a lot of pushback.
I say that knowing particular kinds of math, the kind that let you model the world more precisely, and that give you a theory of error, isn't like knowing another language. It's like knowing language at all. Learning these types of math gives you as much of an effective intelligence boost over people who don't, as learning a spoken language gives you above people who don't know any language (e.g., many deaf-mutes in earlier times).
The kinds of math I mean include:
These things are what I call the correct Platonic forms. The Platonic forms were meant to be perfect models for things found on earth. These kinds of math actually are. The concept of "perfect" actually makes sense for them, as opposed to for Earthly categories like "human", "justice", etc., for which believing that the concept of "perfect" is coherent demonstrably drives people insane and causes them to come up with things like Christianity.
They are, however, like Aristotle's Forms, in that the universals have no existence on their own, but are (like the circle, but even more like the normal distribution) perfect models which arise from the accumulation of endless imperfect instantiations of them.
There are plenty of important questions that are beyond the capability of the unaided human mind to ever answer, yet which are simple to give correct statistical answers to once you know how to gather data and do a multiple regression. Also, the use of these mathematical techniques will force you to phrase the answer sensibly, e.g., "We cannot reject the hypothesis that the average homicide rate under strict gun control and liberal gun control are the same with more than 60% confidence" rather than "Gun control is good."
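To make the "phrase the answer sensibly" point concrete, here is a minimal sketch. It swaps in a two-sample permutation test for the multiple regression mentioned above, purely because it fits in a few lines of standard-library Python. The function names, the 60% threshold wiring, and the numbers are all hypothetical; the rates below are invented for illustration and are not real homicide data.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def phrase_result(p_value, confidence=0.60):
    """Turn a p-value into a hedged sentence instead of a slogan."""
    alpha = 1 - confidence
    if p_value < alpha:
        return (f"We can reject the hypothesis that the group averages are "
                f"the same at {confidence:.0%} confidence (p = {p_value:.3f}).")
    return (f"We cannot reject the hypothesis that the group averages are "
            f"the same with more than {confidence:.0%} confidence "
            f"(p = {p_value:.3f}).")


# Invented rates per 100,000 for two hypothetical sets of regions.
strict = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1]
liberal = [4.9, 5.0, 4.2, 6.3, 3.9, 5.6]
print(phrase_result(permutation_p_value(strict, liberal, n_resamples=2000)))
```

Whatever the data say, the output is a sentence about a hypothesis and a confidence level, never "X is good."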
Agree. Though I don't think Turing ever intended that test to be used. I think what he wanted to accomplish with his paper was to operationalize "intelligence". When he published it, if you asked somebody "Could a computer be intelligent?", they'd have responded with a religious argument about it not having a soul, or free will, or consciousness. Turing sneakily got people to look past their metaphysics, and ask the question in terms of the computer program's behavior. THAT was what was significant about that paper.
It's a great question. I'm sure I've read something about that, possibly in some pop book like Thinking, Fast and Slow. What I read was an evaluation of the relationship of IQ to wealth, and the takeaway was that your economic success depends more on the average IQ in your country than it does on your personal IQ. It may have been an entire book rather than an article.
Google turns up this 2010 study from Science. The summaries you'll see there are sharply self-contradictory.
First comes an unexplained box called "The Meeting of Minds", which I'm guessing is an editorial commentary on the article, and it says, "The primary contributors to c appear to be the g factors of the group members, along with a propensity toward social sensitivity."
Next is the article's abstract, which says, "This “c factor” is not strongly correlated with the average or maximum individual intelligence of group members but is correlated with the average social sensitivity of group members, the equality in distribution of conversational turn-taking, and the proportion of females in the group."
These summaries directly contradict each other: Is g a primary contributor, or not a contributor at all?
I'm guessing the study of group IQ is strongly politically biased, with Hegelians (both "right" and "left") and other communitarians, wanting to show that individual IQs are unimportant, and individualists and free-market economists wanting to show that they're important.
But what makes you so confident that it's not possible for subject-matter experts to have correct intuitions that outpace their ability to articulate legible explanations to others?
That's irrelevant, because what Richard wrote was a truism. An Eliezer who understands his own confidence in his ideas will "always" be better at inspiring confidence in those ideas in others. Richard's statement leads to a conclusion of import (Eliezer should develop arguments to defend his intuitions) precisely because it's correct whether Eliezer's intuitions are correct or incorrect.
The way to dig the bottom deeper today is to get government bailouts, like bailing out companies or lenders, and like Biden's recent tuition debt repayment bill. Bailouts are especially perverse because they give people who get into debt a competitive advantage over people who don't, in an unpredictable manner that encourages people to see taking out a loan as a lottery ticket.