Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
I intend to use my shortform feed for two purposes:
1. To post thoughts that I think are worth sharing that I can then reference in the future in order to explain some belief or opinion I have.
2. To post half-finished thoughts about the math or computer science thing I'm learning at the moment. These might be slightly boring and for that I apologize.
There is a large set of people who went around, and are still going around, telling people that "the coronavirus is nothing to worry about" despite the fact that robust evidence has existed for about a month that this virus could result in a global disaster. (Don't believe me? I wrote a post about it a month ago.)
So many people have bought into the "don't worry about it" attitude, as a case of pretending to be wise, that I have become more pessimistic about humanity responding correctly to global catastrophic risks in the future. I too used to be one of those people who assumed that the default mode of thinking for an event like this was panic, but I'm starting to think that the real default mode is actually high-status people going around saying, "Let's not be like that ambiguous group over there panicking."
Now that the stock market has plummeted, in a way that from my perspective appeared entirely predictable given my inside-view information, I am also starting to doubt the efficiency of the stock market in response to historically unprecedented events. And this outbreak could be even worse than some of the most doomy media headlin... (read more)
Just this Monday evening, a professor at the local medical school emailed someone I know, "I'm sorry you're so worried about the coronavirus. It seems much less worrying than the flu to me." (He specializes in rehabilitation medicine, but still!) "Pretending to be wise" seems right to me, though another way to look at it is through the lens of signaling and counter-signaling:
Here's another example, which has actually happened 3 times to me already:
I think the main reason is that the social dynamic probably favors them in the long run. I worry that there is a higher social risk to being alarmist than to being calm. Let me try to illustrate one scenario:
My current estimate is that there is only a 15-20% probability of a global disaster (>50 million deaths within 1 year), mostly because the case fatality rate could be much lower than the currently reported rate, and previous illnesses like the swine flu ended up looking much less serious after more data came out. [ETA: I did a lot more research. I now put the risk of this at more like 5%.]
Let's say that the case fatality rate turns out to be 0.3% or something, the illness does start looking like an abnormally bad flu, and people stop caring within months. "Experts" face no criticism, since they remained calm and were vindicated. People like us sigh in relief, and are perhaps reminded by the "experts" that there was nothing to worry about.
But let's say that the case fatality rate actually turns out to be 3%, and 50% of the... (read more)
There's a phenomenon I currently hypothesize to exist where direct attacks on the problem of AI alignment are criticized much more often than indirect attacks.
If this phenomenon exists, it could be advantageous to the field in the sense that it encourages thinking deeply about the problem before proposing solutions. But it could also be bad because it disincentivizes work on direct attacks to the problem (if one is criticism averse and would prefer their work be seen as useful).
I have arrived at this hypothesis from my observations: I have watched people propose solutions only to be met with immediate and forceful criticism, while others proposing non-solutions and indirect analyses receive little criticism at all. If this hypothesis is true, I suggest it is partly or mostly because direct attacks on the problem are easier to defeat via argument, since their assumptions are made plain.
If this is so, I consider it a potential hindrance to thought, since direct attacks are often the type of thing that leads to the most deconfusion -- not because the direct attack actually worked, but because in explaining how it failed, we learned what definitely doesn't work.
Nod. This is part of a general problem where vague things that can't be proven not to work are met with less criticism than "concrete enough to be wrong" things.
A partial solution is a norm wherein "concrete enough to be wrong" is seen as praise, and something people go out of their way to signal respect for.
Occasionally, I will ask someone who is very skilled in a certain subject how they became skilled in that subject so that I can copy their expertise. A common response is that I should read a textbook in the subject.
Eight years ago, Luke Muehlhauser wrote,
However, I have repeatedly found that this is not good advice for me.
I want to briefly list the reasons why I don't find sitting down and reading a textbook that helpful for learning. Perhaps, in doing so, someone else might appear and say, "I agree completely. I feel exactly the same way," or someone might appear and say, "I used to feel that way, but then I tried this..." This is what I have discovered:
- When I sit down to read a long textbook, I find myself subconsciously constantly checking how many pages I have read. For instance, if I have been
... (read more)
I used to feel similarly, but then a few things changed for me and now I am pro-textbook. There are caveats - namely that I don't work through them continuously.
This is a big one for me, and probably the biggest change I made is being much more discriminating in what I look for in a textbook. My concerns are invariably practical, so I only demand enough formality to be relevant; otherwise I care about a reputation for explaining intuitions well, good graphics and examples, and ease of reading. I would go so far as to say that style is probably the most important feature of a textbook.
As I mentioned, I don't work through them front to back, because that actually is homework. Instead I treat them more like a reference-with-a-hook; I look at them when I need to understand the particular thing in more depth, and then get out when I have what I need. But because it is contained in a textbook, this knowledge now has a natural link to steps before and after, so I have obvious places to go for regression and advancement.
I spend a lot of time thinking about what I need to learn, why I need to learn it, and how it relates to what I already know. Thi... (read more)
I bet Robin Hanson on Twitter my $9k to his $1k that de novo AGI will arrive before ems. He wrote,
I get the feeling that for AI safety, some people believe that it's crucially important to be an expert in a whole bunch of fields of math in order to make any progress. In the past I took this advice and tried to deeply study computability theory, set theory, type theory -- with the hopes of it someday giving me greater insight into AI safety.
Now, I think I was taking a wrong approach. To be fair, I still think being an expert in a whole bunch of fields of math is probably useful, especially if you want very strong abilities to reason about complicated systems. But, my model for the way I frame my learning is much different now.
My main model describing my current perspective is that employing a lazy style of learning is superior for AI safety work. Lazy is meant in the computer science sense of only learning something when it seems like you need to know it in order to understand something important. I will contrast this with the model that one should learn a set of solid foundations first before going any further.
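To make the computer-science sense of "lazy" concrete, here is a minimal Python sketch contrasting eager and lazy evaluation. The studying/topics framing is just my own illustrative analogy, not a claim about how anyone actually studies:

```python
# A minimal sketch of eager vs. lazy evaluation, using "studying topics"
# as an illustrative stand-in for doing expensive work up front.

def study(topic):
    print(f"studying {topic}")  # stands in for hours of real effort
    return topic

def eager_foundations(topics):
    # "Foundations first": do all the work immediately, whether or not
    # any of it is ever needed downstream.
    return [study(t) for t in topics]

def lazy_foundations(topics):
    # "Lazy": a generator defers each piece of work until something
    # downstream actually demands it.
    return (study(t) for t in topics)

topics = ["computability theory", "set theory", "type theory"]

eager = eager_foundations(topics)  # prints all three lines right away
lazy = lazy_foundations(topics)    # prints nothing yet
next(lazy)                         # only now: "studying computability theory"
```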
Obviously neither model can be absolutely correct in an extreme sense. I don't, as a silly example, think that people who can't do ... (read more)
I have mixed feelings and some rambly personal thoughts about the bet Tamay Besiroglu and I proposed a few days ago.
The first thing I'd like to say is that we intended it as a bet, and only a bet, and yet some people seem to be treating it as if we had made an argument. Personally, I am uncomfortable with the suggestion that our post was "misleading" because we did not present an affirmative case for our views.
I agree that LessWrong culture benefits from arguments as well as bets, but it seems a bit weird to demand that every bet come with an argument attached. A norm that all bets must come with arguments would substantially dampen the incentive to make bets, because then each time people must spend what will likely be many hours painstakingly outlining their views on the subject.
That said, I do want to reply to people who say that our post was misleading on other grounds. Some said that we should have made different bets, or at different odds. In response, I can only say that coming up with good concrete bets about AI timelines is actually really damn hard, and so if you wish to come up with alternatives, you can be my guest. I tried my best, at least.
More people ... (read more)
I think there is some serious low-hanging fruit for making people productive that I haven't seen anyone write about (not that I've looked very hard). Let me just introduce a proof of concept:
Final exams in university are typically about 3 hours long. And many people are able to do multiple finals in a single day, performing well on all of them. During a final exam, I notice that I am substantially more productive than usual. I make sure that every minute counts: I double check everything and think deeply about each problem, making sure not to cut corners unless absolutely required because of time constraints. Also, if I start daydreaming, then I am able to immediately notice that I'm doing so and cut it out. I also believe that this is the experience of most other students in university who care even a little bit about their grade.
Therefore, it seems like we have an example of an activity that can just automatically produce deep work. I can think of a few reasons why final exams would bring out the best of our productivity:
1. We care about our grade in the course, and the few hours in that room are the most impactful to our grade.
2. We are in an environment where ... (read more)
Related to: The Lottery of Fascinations, other posts probably
I will occasionally come across someone who I consider to be extraordinarily productive, and yet when I ask what they did on a particular day they will respond, "Oh I basically did nothing." This is particularly frustrating. If they did nothing, then what was all that work that I saw!
I think this comes down to what we mean by doing nothing. There's a literal meaning to doing nothing. It could mean sitting in a chair, staring blankly at a wall, without moving a muscle.
More practically, what people mean by doing nothing is that they are doing something unrelated to their stated task, such as checking Facebook, chatting with friends, browsing Reddit etc.
When productive people say that they are "doing nothing" it could just be that they are modest, and don't want to signal how productive they really are. On the other hand, I think that there is a real sense in which these productive people truly believe that they are doing nothing. Even if their "d... (read more)
Many people have argued that recent language models don't have "real" intelligence and are just doing shallow pattern matching. For example, see this recent post.
I don't really agree with this. I think real intelligence is just a word for deep pattern matching, and our models have been getting progressively deeper at their pattern matching over the years. The machines are not stuck at some very narrow level. They're just at a moderate depth.
I propose a challenge:
The challenge is to come up with the best prompt that demonstrates that even after 2-5 years of continued advancement, language models will still struggle to do basic reasoning tasks that ordinary humans can do easily.
Here's how it works.
Name a date (e.g. January 1st, 2025) and a prompt (e.g. "What food would you use to prop a book open and why?"). Then, on that date, we should commission a Mechanical Turk task asking humans to answer the prompt, and ask the best publicly available language model at that time to answer the same prompt.
Then, we will ask LessWrongers to guess which replies were real human replies, and which ones were machine generated. If LessWrongers can't do better than random guessing, then the machine wins.
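As a sketch of how the "better than random guessing" criterion could be resolved, here is one possible scoring procedure in Python. The one-sided binomial test, the significance threshold, and the function name are my own assumptions (and require scipy), not part of the proposal itself:

```python
# A hedged sketch of a resolution procedure: the machine "wins" if the
# judges' accuracy is statistically indistinguishable from coin-flipping.
from scipy.stats import binomtest

def machine_wins(correct_guesses: int, total_guesses: int,
                 alpha: float = 0.05) -> bool:
    # Null hypothesis: judges guess at chance (p = 0.5). The machine
    # wins if we cannot reject the null in favor of better-than-chance
    # discrimination between human and machine replies.
    result = binomtest(correct_guesses, total_guesses, p=0.5,
                       alternative="greater")
    return result.pvalue >= alpha

# Under these assumptions, 60 correct calls out of 100 rejects chance
# (judges can tell the difference), while 52 out of 100 does not.
print(machine_wins(60, 100))  # False -- the machine loses
print(machine_wins(52, 100))  # True  -- the machine wins
```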
So, in 2017 Eliezer Yudkowsky made a bet with Bryan Caplan that the world would end by January 1st, 2030, in order to save the world by taking advantage of Bryan Caplan's perfect betting record — a record which, for example, includes a 2008 bet that the UK would not leave the European Union by January 1st, 2020 (it left on January 31st, 2020 after repeated delays).
What we need is a short story about people in 2029 realizing that a bunch of cataclysmic events are imminent, but all of them seem to be stalled, waiting for... something. And no one knows what to do. But by the end people realize that to keep the world alive they need to make more bets with Bryan Caplan.
The case for studying mesa optimization
Early elucidations of the alignment problem focused heavily on value specification. That is, they focused on the idea that given a powerful optimizer, we need some way of specifying our values so that the powerful optimizer can create good outcomes.
Since then, researchers have identified a number of additional problems besides value specification. One of the biggest problems is that in a certain sense, we don't even know how to optimize for anything, much less a perfect specification of human values.
Let's assume we could get a utility function containing everything humanity cares about. How would we go about optimizing this utility function?
The default mode of thinking about AI right now is to train a deep learning model that performs well on some training set. But even if we were able to create a training environment for our model that reflected the world very well, and rewarded it each time it did something good, exactly in proportion to how good it really was in our perfect utility function... this still would not be guaranteed to yield a positive artificial intelligence.
This problem is not a superficial one either -- it is intri... (read more)
Signal boosting a Lesswrong-adjacent author from the late 1800s and early 1900s
Via a friend, I recently discovered the zoologist, animal rights advocate, and author J. Howard Moore. His attitudes towards the world reflect contemporary attitudes within effective altruism about science, the place of humanity in nature, animal welfare, and the future. Here are some quotes which readers may enjoy,
... (read more)
I agree with Wei Dai that we should use our real names for online forums, including Lesswrong. I want to briefly list some benefits of using my real name,
That said, there are some significant downsides, and I sympathize with people who don't want to use their real names.
- It makes it much easier for people to dox you. There are some very bad ways that this can manifest.
- If
... (read more)
Bertrand Russell's advice to future generations, from 1959
... (read more)
When I look back at things I wrote a while ago, say months back, or years ago, I tend to cringe at how naive many of my views were. Faced with this inevitable progression, and the virtual certainty that I will continue to cringe at views I now hold, it is tempting to disconnect from social media and the internet and only comment when I am confident that something will look good in the future.
At the same time, I don't really think this is a good attitude for several reasons:
- Writing things up forces my thoughts to be more explicit, improving my ability
... (read more)
People who don't understand the concept of "This person may have changed their mind in the intervening years" aren't worth impressing. I can imagine scenarios where your economic and social circumstances are so precarious that the incentives leave you with no choice but to let your speech and your thought be ruled by unthinking mob social-punishment mechanisms. But you should at least check whether you actually live in that world before surrendering.
Related to: Realism about rationality
I have talked to some people who say that they value ethical reflection, and would prefer that humanity reflected for a very long time before colonizing the stars. In a sense I agree, but at the same time I can't help but think that "reflection" is a vacuous feel-good word that has no shared common meaning.
Some forms of reflection are clearly good. Epistemic reflection is good if you are a consequentialist, since it can help you get what you want. I also agree that narrow forms of reflection can also be ... (read more)
It's now been about two years since I started seriously blogging. Most of my posts are on Lesswrong, and most of the rest are scattered about on my substack and the Effective Altruism Forum, or on Facebook. I like writing, but I have a problem which I feel impedes me greatly.
In short: I often post garbage.
Sometimes when I post garbage, it isn't until way later that I learn that it was garbage. And when that happens, it's not that bad, because at least I grew as a person since then.
But the usual case is that I realize that it's garbage right after I... (read more)
Should effective altruists be praised for their motives, or their results?
It is sometimes claimed, perhaps by those who have recently read The Elephant in the Brain, that effective altruists have not risen above the failures of traditional charity, and are every bit as mired in selfish motives as non-EA causes. From a consequentialist view, however, this critique is not by itself valid.
To a consequentialist, it doesn't actually matter what one's motives are as long as the actual effect of their action is to do as much good as possible. This is the pri... (read more)
Sometimes people will propose ideas, and then those ideas are met immediately with harsh criticism. A very common tendency for humans is to defend our ideas and work against these criticisms, which often gets us into a state that people refer to as "defensive."
According to common wisdom, being in a defensive state is a bad thing. The rationale here is that we shouldn't get too attached to our own ideas. If we do get attached, we become liable to become crackpots who can't give an idea up because it would make us look bad if we ... (read more)
I keep wondering why many AI alignment researchers aren't using the Alignment Forum. I have met quite a few people working on alignment whom I've never encountered online. I can think of a few reasons why this might be,
I've often wished that conversation norms shifted towards making things more consensual. The problem is that when two people are talking, it's often the case that one party brings up a new topic without realizing that the other party didn't want to talk about that, or doesn't want to hear it.
Let me provide an example: Person A and person B are having a conversation about the exam that they just took. Person A bombed the exam, so they are pretty bummed. Person B, however, did great and wants to tell everyone. So then person B comes up to... (read more)
Reading through the recent Discord discussions with Eliezer, and reading and replying to comments, has given me the following impression of a crux of the takeoff debate. It may not be the crux. But it seems like a crux nonetheless, unless I'm misreading a lot of people.
Let me try to state it clearly:
The foom theorists are saying something like, "Well, you can usually-in-hindsight say that things changed gradually, or continuously, along some measure. You can use these measures after-the-fact, but that won't tell you about the actual gradual-ness of t... (read more)
There have been a few posts about the obesity crisis here, and I'm honestly a bit confused by some theories that people are passing around. I'm one of those people who thinks that the "calories in, calories out" (CICO) theory is largely correct, relevant, and helpful for explaining our current crisis.
I'm not actually sure to what extent people here disagree with my basic premises, or whether they just think I'm missing a point. So let me be more clear.
As I understand, there are roughly three critiques you can have against the CICO theory. You can think it... (read more)
A common heuristic argument I've seen recently in the effective altruism community is the idea that existential risks are low probability because of what you could call the "People really don't want to die" (PRDWTD) hypothesis. For example, see here,
(Note that I hardly mean to strawman MacAskill here. I'm not arguing against him ... (read more)
After writing the post on using transparency regularization to help make neural networks more interpretable, I have become even more optimistic that this is a potentially promising line of research for alignment. This is because I have noticed that there are a few properties about transparency regularization which may allow it to avoid some pitfalls of bad alignment proposals.
To be more specific, in order for a line of research to be useful for alignment, it helps if
- The line of research doesn't require unnecessarily large amounts of computation to p
... (read more)
Forgive me for cliche scientism, but I recently realized that I can't think of any major philosophical developments in the last two centuries that occurred within academic philosophy. If I were to try to list major philosophical achievements since 1819, these would likely appear on my list, but none of them were from those trained in philosophy:
- A convincing, simple explanation for the apparent design we find in the living world (Darwin and Wallace).
- The unification of time and space into one fabric (Einstein).
- A solid foundation for axiomatic mathematics
... (read more)
I would name the following:
NVIDIA's stock price is extremely high right now. It's up 134% this year, and up about 6,000% since 2015! Does this shed light on AI timelines?
Here are some notes,
- NVIDIA is the top GPU company in the world, by far. This source says that they're responsible for about 83% of the market, with 17% coming from their primary competitor, AMD.
- By market capitalization, it's currently at $764.86 billion, compared to the largest company, Apple, at $2.655 trillion.
- This analysis estimates their projected earnings based on their stock price on September 2nd and comes u
... (read more)
Rationalists are fond of saying that the problems of the world are not from people being evil, but are instead a result of the incentives of our system, which are such that this bad outcome is an equilibrium. There's a weaker thesis here that I agree with, but otherwise I don't think this argument actually follows.
In game theory, an equilibrium is determined by both the setup of the game, and by the payoffs for each player. The payoffs are basically the values of the players in the game—their utility functions. In other words, you get different equilibria if p... (read more)
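To illustrate the point that equilibria are a function of payoffs and not just of the game's structure, here is a small Python sketch. The payoff numbers are purely illustrative assumptions of mine:

```python
# A minimal sketch: the same 2x2 game structure yields different pure
# Nash equilibria when the players' payoffs (their values) change.

def pure_nash_equilibria(payoffs):
    # payoffs[(i, j)] = (row player's utility, column player's utility),
    # where action 0 = cooperate and action 1 = defect.
    actions = (0, 1)
    equilibria = []
    for i in actions:
        for j in actions:
            row_ok = all(payoffs[(i, j)][0] >= payoffs[(k, j)][0] for k in actions)
            col_ok = all(payoffs[(i, j)][1] >= payoffs[(i, k)][1] for k in actions)
            if row_ok and col_ok:  # neither player gains by deviating
                equilibria.append((i, j))
    return equilibria

# Purely self-interested players: a prisoner's dilemma.
selfish = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
# Same structure, but players who intrinsically value cooperating.
altruistic = {(0, 0): (6, 6), (0, 1): (2, 4), (1, 0): (4, 2), (1, 1): (1, 1)}

print(pure_nash_equilibria(selfish))     # [(1, 1)] -- mutual defection
print(pure_nash_equilibria(altruistic))  # [(0, 0)] -- mutual cooperation
```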
I've heard a surprising number of people criticize parenting recently, using some pretty harsh labels. I've seen people call it a form of "Stockholm syndrome," a breach of liberty, morally unnecessary, etc. This seems kind of weird to me, because it doesn't really match my experience as a child at all.
I do agree that parents can sometimes violate liberty, and so I'd prefer a world where children could break free from their parents without penalties. But I also think that most children genuinely love their parents and so would... (read more)
I think that human-level capability in natural language processing (something like GPT-2 but much more powerful) is likely to occur in some software system within 20 years.
Since human-level natural language processing is a very rich real-world task, I would consider a system with that capability to be adequately described as a general intelligence, though it would likely not be very dangerous due to its lack of world-optimization capabilities.
This belief of mine is based on a few heuristics. Below I have collected a few claims which I... (read more)
[ETA: Apparently this was misleading; I think it only applied to one company, Alienware, and it was because they didn't get certification, unlike the other companies.]
In my post about long AI timelines, I predicted that we would see attempts to regulate AI. An easy path for regulators is to target power-hungry GPUs and distributed computing in an attempt to minimize carbon emissions and electricity costs. It seems regulators may be moving even faster than I expected in this case, with new bans on high-performance personal computers now taking effect in six ... (read more)
Is it possible to simultaneously respect people's wishes to live, and others' wishes to die?
Transhumanists are fond of saying that they want to give everyone the choice of when and how they die. Giving people the choice to die is clearly preferable to our current situation, as it respects their autonomy, but it leads to the following moral dilemma.
Suppose someone loves essentially every moment of their life. For tens of thousands of years, they've never once wished that they did not exist. They've never had suicidal thoughts, and have a... (read more)
I generally agree with the heuristic that we should "live on the mainline," meaning that we should mostly plan for events which capture the dominant share of our probability. This heuristic gives me a tendency to do some of the following things:
- Work on projects that I think have a medium-to-high chance of succeeding and quickly abandon things that seem like they are failing.
- Plan my career trajectory based on where I think I can plausibly maximize my long term values.
- Study subjects only if I think that I will need to understand them at som
... (read more)
In discussions about consciousness I find myself constantly repeating the same basic argument against the existence of qualia. I don't do this just to be annoying: it is just my experience that
1. People find consciousness really hard to think about, and it has been known to cause a lot of disagreements.
2. Personally I think that this particular argument dissolved perhaps 50% of all my confusion about the topic, and was one of the simplest, clearest arguments that I've ever seen.
I am not being original either. The argument is the same one that has b... (read more)
"Immortality is cool and all, but our universe is going to run down from entropy eventually"
I consider this argument wrong for two reasons. The first is the obvious reason, which is that even if immortality is impossible, it's still better to live for a long time.
The second reason I think this argument is wrong is that I'm currently convinced that literal physical immortality is possible in our universe. Usually when I say this out loud I get an audible "what" or something to that effect, but I'm not kidding.
It... (read more)
I now have a Twitter account that tweets my predictions.
I don't think I'm willing to bet on every prediction that I make. However, I pledge the following: if, after updating on the fact that you want to bet me, I still disagree with you, then I will bet. The disagreement must be non-trivial though.
For obvious reasons, I also won't bet on predictions that are old, and have already been replaced by newer predictions. I also may not be willing to bet on predictions that have unclear resolution criteria, or are about human extinction.
I have discovered recently that while I am generally tired and groggy in the morning, I am well-rested and happy after a nap. I am unsure whether this matches other people's experiences, and I haven't explored much of the research. Still, I think this is worth thinking through fully.
What is the best way to apply this knowledge? I am considering purposely sabotaging my sleep so that I am tired enough to take a nap by noon, which would refresh me for the entire day. But this plan may have some significant drawbacks, including being excessively tired for a few hours in the morning.