Yes, check eg https://www.lesswrong.com/posts/H5iGhDhQBtoDpCBZ2/announcing-the-alignment-of-complex-systems-research-group or https://ai.objectives.institute/ or also partially https://www.pibbss.ai/
You won't find much of this on LessWrong, due to LW being an unfavorable environment for this line of thinking.
It's probably worth noting you seem to be empirically wrong: I'm pretty confident I'd be able to do >half of human jobs, with maybe ~3 weeks of training, if I was able to understand all human languages (obviously not in parallel!) Many others here would be able to do the same.
The criterion is not as hard as it seems, because there are many jobs like cashiers or administratrative workers or assembly line workers which are not that hard to learn.
It's probably worth noting that I take the opposite update from the covid crisis: it was much easier to get governments listen to us and do marginally more sensible things than expected. With better preparation and larger resources, it would have been possible to cause order of magnitude more sensible things to happen. Also it's worth noting some governments were highly sensible and agentic about covid
Similary to johnswentworth: My current impression is core alignment problems are the same and manifest at all levels - often sub-human version just looks like a toy version of the scaled-up problem, and the main difference is, in the sub-human version problem, you can often solve it for practical purposes by plugging in human at some strategic spot. (While I don't think there are deep differences in the alignment problem space, I do think there are differences in the "alignment solutions" space, where you can use non-scalable solutions, or in risk space, where dangers being small due to the systems being stupid.)
I'm also unconvinced about some of practical claims about differences for wildly superintelligent systems.
One crucial concern related to "what people want" is this seems underdefined, un-stable in interactions with wildly superintelligent systems, and prone to problems with scaling of values within systems where intelligence increases. By this line of reasoning, if the wildly superintelligent system is able to answer me these sort of questions "in a way I want", it very likely must be already aligned. So it feels like part of the worries was assumed away. Paraphrasing the questions about human values again, one may ask "how did you get to the state where you have this aligned wildly superintelligent system which is able to answer questions about human values, as opposed to e.g. overwriting what humans believe about themselves by it's own non-human-aligned values?".
Ability to understand itself seems a special case of competence: I can imagine systems which are wildly superhuman in their ability to understand the rest of the world, but pretty mediocre at understanding themselves, e.g. due to some problems with recursion, self-references, reflections, or different kinds of computations being used at various levels of reasoning. As a result, it seems unclear whether the ability to clearly understand itself is a feature of all wildly super-human systems. (Toy counterexample: imagine a device which would connect someone in ancient Greece with our modern civilization, and our civilization dedicating about 10% of global GDP to answering questions from this guy. I would argue this device is for most practical purposes wildly superhuman compared to this individual guy in Greece, but at the same time bad at understanding itself)
Fundamentally inscrutable thoughts seems like something which you can study with present day systems as toy models. E.g., why does AlphaZero believe something is a good go move? Why does a go grand-master believe something is a good move? What counts as a 'true explanation'? Who is the recipient of the explanation? Are you happy with explanation of the algorithm like 'upon playing myriad games, my general functional approximator is approximating the expected value of this branch of an unimaginably large choice tree is larger than for other branches?'? If yes, why? If no, why not?
Inscrutable influence-seeking plans seem also a present problem. Eg, if there are already some complex influence-seeking patterns now, how would we notice?
Getting oriented fast in complex/messy real world situations in fields in which you are not an expert
Actually trying to change something in the world where the system you are interacting with has significant level of complexity & somewhat fast feedback loop (&it's not super-high-stakes)
One thing I'm a bit worried about in some versions of LW rationality & someone should write a post about is something like ... 'missing opportunities to actually fight in non-super-high-stakes matters'', in the martial arts metaphor.
I like the metaphor!
Just wanted to note: in my view the original LW Sequences are not functional as a stand-alone upgrade for almost any human mind, and you can empirically observe it: You can think about any LW meet-up group around the world as an experiment, and I think to a first approximation it's fair to say aspiring Rationalists running just on the Sequences do not win, and good stuff coming out of the rationalist community was critically dependent of presence of minds Eliezer & others. (This is not say Sequences are not useful in many ways)
I basically agree with Vanessa:
the correct rule is almost always: first think about the problem yourself, then go read everything about it that other people did, and then do a synthesis of everything you learned inside your mind.
Thinking about the problem myself first often helps me understand existing work as it is easier to see the motivations, and solving solved problems is good as a training.
I would argue this is the case even in physics and math. (My background is in theoretical physics and during my high-school years I took some pride in not remembering physics and re-deriving everything when needed. It stopped being a good approach for physics ca since 1940 and somewhat backfired.)
The mistake members of "this community" (LW/rationality/AI safety) are sometimes making is skipping the second step / bouncing off the second step if it is actually hard.
Second mistake is not doing the third step in a proper way, which leads to somewhat strange and insular culture which may be repulsive for external experts. (E.g. people partially crediting themselves for discoveries which are know to outsiders)
Epistemic status: Wild guesses based on reading del Guidice's Evolutionary psychopathology and two papers trying to explain autism in terms of predictive processing. Still maybe better than the "tower hypothesis"
0. Let's think in terms of two parametric model, where one parameter tunes something like capacity of the brain, which can be damaged due to mutations, disease, etc., and the other parameter is explained bellow.
1. Some of the genes that increase risk of autism tune some parameter of how sensory prediction is handled, specifically, making the system to expect higher precision from sensory inputs/being less adaptive about it. (lets call it parameter p)
2. Several hypothesis - Mildly increased p sounds like something which should be somewhat correlated with increased learning / higher intelligence;
3. But note: tune it up even more, and the system starts to break; too much weight is put on sensory experience, "normal world experience" becomes too surprising which leads to seek more repetitive behaviours and highly predictable environments. In the abstract, it becomes difficult to handle fluidity, rules which are vague and changing,...
4. In the two-parameter space of capacity and something like surprisal handling, this creates a picture like this
Parts of the o.p. can be reinterpreted as
Even if this is simple, it makes some predictions (in the sense that the results are likely already somewhere in the literature, just I don't know whether this is true or not when writing this)
With a map of brains/minds into two dimensional space it is a priori obvious that it will fail in explaining the original high-dimensional problem, in many ways; many other dimensions are not orthogonal but actually "project" to the space (e.g. something like "brain masculinisation" has nonzero projection on p), there are various regulatory system like g means better ability to compensate via metacognition, or social support.
(For example, if subagents are assigned credit based on who's active when actual reward is received, that's going to be incredibly myopic -- subagents who have long-term plans for achieving better reward through delayed gratification can be undercut by greedily shortsighted agents, because the credit assignment doesn't reward you for things that happen later; much like political terms of office making long-term policy difficult.)
it seems to me you have in mind a different model than me (sorry if my description was confusing). In my view, you have the world-modelling, "preference aggregation" and action generation done by the "predictive processing engine". The "subagenty" parts basically extract evolutionary relevant features of this (like:hunger level), and insert error signals not only about the current state, but about future plans. (Like: if the p.p. would be planning a trajectory which is harmful to the subagent, it would insert the error signal.).
Overall your first part seems to assume more something like reinforcement learning where parts are assigned credit for good planning. I would expect the opposite: one planning process which is "rewarded" by a committee.
parsimonious theory which matched observations well
With parsimony... predictive processing in my opinion explains a lot for a relatively simple and elegant model. On the theory side it's for example
On the how do things feel for humans from the inside, for example
On the neuroscience side
I don't think predictive processing should try to explain all about humans. In one direction, animals are running on predictive processing as well, but are missing some crucial ingredient. In the opposite direction, simpler organisms had older control systems (eg hormones),we have them as well, and p.p. must be in some sense be stacked on top of that.
+1 I was really really upset safe.ai decided to use an established acronym for something very different