The obvious way to quickly and intuitively illustrate whether reactions are positive or negative would seem to be color; another option would be grouping them horizontally or vertically with some kind of separator. The obvious way to quickly and intuitively make it visible which reactions were had by more readers would seem to be showing a copy of the same icon for each person who reacted a certain way, not a number next to the icon.
I make no claim that either of these changes would be improvements overall. Clearly the second would require a way to handl...
In the current UI, the list of reactions from which to choose is scrollable, but that's basically impossible to actually see. While reading the comments I was wondering what the heck people were talking about with "Strawman" and so forth. (Like... did that already get removed?) Then I discovered the scrolling by accident after seeing a "Shrug" reaction to one of the comments.
I've had similar thoughts. Two counterpoints:
This is basically misuse risk, which is not a weird problem that people need to be convinced even needs solving. To the extent AI appears likely to be powerful, society at large is already working on this. Of course, its efforts may be ineffective or even counterproductive.
They say power corrupts, but I'd say power opens up space to do what you were already inclined to do without constraints. Some billionaires, e.g. Bill Gates, seem to be sincerely trying to use their resources to help people. It isn't har
On SBF, I think a large part of the issue is that he was working in an industry, cryptocurrency, that basically has fraud as its bedrock. There was nothing real about crypto, so the collapse of FTX was basically inevitable.
I don't deny that the cryptocurrency "industry" has been a huge magnet for fraud, nor that there are structural reasons for that, but "there was nothing real about crypto" is plainly false. The desire to have currencies that can't easily be controlled, manipulated, or implicitly taxed (seigniorage, inflation) by gove...
Thank you for writing these! They've been practically my only source of "news" for most of the time you've been writing them, and before that I mostly just ignored "news" entirely because I found it too toxic and it was too difficult+distasteful to attempt to decode it into something useful. COVID the disease hasn't directly had a huge effect on my life, and COVID the social phenomenon has been on a significant decline for some time now, but your writing about it (and the inclusion of especially notable non-COVID topics) have easily kept me interested enou...
I found it to be a pretty obvious reference to the title. SPAM is a meatcube. A meatcube is something that has been processed into uniformity. Any detectable character it had, whether faults, individuality, or flashes of brilliance, has been ground, blended, and seasoned away.
I don't know how far a model trained explicitly on only terminal output could go, but it makes sense that it might be a lot farther than a model trained on all the text on the internet (some small fraction of which happens to be terminal output). Although I also would have thought GPT's architecture, with a fixed context window and a fixed number of layers and tokenization that isn't at all optimized for the task, would pay large efficiency penalties at terminal emulation and would be far less impressive at it than it is at other tasks.
Assuming it does work, could we get a self-operating terminal by training another GPT to roleplay the entering commands part? Probably. I'm not sure we should though...
Sure, I understood that's what was being claimed. Roleplaying a Linux VM without error seemed extremely demanding relative to other things I knew LLMs could do, such that it was hard for me not to question whether the whole thing was just made up.
Thanks! This is much more what I expected. Things that look generally like outputs that commands might produce, with some mind-blowing correct outputs (e.g. the effect of `tr` on the source code) but also some wrong outputs (e.g. the section after `echo A >a; echo X >b; echo T >c; echo H >d`; the output being consistent between `cat a a c b d d` and `cat a a c b d d | sort` (but inconsistent with the "actual contents" of the files) is especially the kind of error I'd expect an LLM to make).
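For reference, here's what an actual shell produces for that sequence, so the inconsistency is with these:

```shell
echo A >a; echo X >b; echo T >c; echo H >d
cat a a c b d d          # prints A A T X H H, one per line
cat a a c b d d | sort   # prints A A H H T X, one per line
```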
That works too!
Got it. This post also doesn't appear to actually be part of that sequence though? I would have noticed if it was and looked at the sequence page.
EDIT: Oh, I guess it's not your sequence.
EDIT2: If you just included "Alignment Stream of Thought" as part of the link text in your intro where you do already link to the sequence, that would work.
What do you mean by this acronym? I'm not aware of its being in use on LW, you don't define it, and to me it very definitely (capitalization and all) means Armin van Buuren's weekly radio show A State of Trance.
Counterpoint #2a: A misaligned AGI whose capabilities are high enough to use our safety plans against us will succeed with an equal probability (e.g., close to 100%), if necessary by accessing these plans whether or not they were posted to the Internet.
If only relative frequency of genes matters, then the overall size of the gene pool doesn't matter. If the overall size of the gene pool doesn't matter, then it doesn't matter if that size is zero. If the size of the gene pool is zero, then whatever was included in that gene pool is extinct.
Yes, it's true people make all kinds of incorrect inferences because they think genes that increase the size of the gene pool will be selected for or those that decrease it will be selected against. But it's still also true that a gene that reduces the size of the po...
I mean, just lag, yes, but there are also plain old incorrect readings. But yes, it would be cool to have a system that incorporated glucagon. Though, diabetics' bodies still produce glucagon AFAIK, so it'd really be better to just have something that senses glucose and releases insulin the same way a working pancreas would.
Context: I am a type 1 diabetic. I have a CGM, but for various reasons use multiple daily injections rather than an insulin pump; however, I'm familiar with how insulin pumps work.
A major problem with a closed-loop CGM-pump system is data quality from the CGM. My CGM (Dexcom G6) has ~15 minutes of lag (because it reads interstitial fluid, not blood). This is the first generation of Dexcom that doesn't require calibrations from fingersticks, but I've occasionally had CGM readings that felt way off and needed to calibrate anyway. Accuracy and noisiness v...
We'll build the most powerful AI we think we can control. Nothing prevents us from ever getting that wrong. If building one car with brakes that don't work made everyone in the world die in a traffic accident, everyone in the world would be dead.
There's also the problem of an AGI consistently exhibiting aligned behavior due to low risk tolerance, until it stops doing that (for all sorts of unanticipated reasons).
This is especially compounded by the current paradigm of brute-forcing randomly generated neural networks, since the resulting systems are fundamentally unpredictable and unexplainable.
How much did that setup cost? I'm curious about similar use cases.
The best way to actually schedule or predict a project is to break it down into as many small component tasks as possible, identify dependencies between those tasks, produce most-likely, optimistic, and pessimistic estimates for each task, and then run a simulation over the chain of dependencies to see what the expected project completion looks like. Use a Gantt chart. This is a boring answer because it's the "learn project management" answer, and people will hate on it because *gestures vaguely* at all of the projects that overrun their schedules. There are m
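The simulation step is only a few lines. A minimal sketch, assuming the tasks form a single dependency chain (so durations just add) and using made-up task names and three-point estimates:

```python
import random

# Hypothetical tasks: (optimistic, most likely, pessimistic) duration in days.
tasks = {"design": (2, 4, 10), "build": (5, 8, 20), "test": (1, 3, 9)}

def simulate(tasks, n=10_000, seed=0):
    """Draw each task's duration from a triangular distribution and
    sum along the chain; returns the sorted project totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        totals.append(sum(rng.triangular(lo, hi, mode)
                          for lo, mode, hi in tasks.values()))
    return sorted(totals)

totals = simulate(tasks)
print("median:", totals[len(totals) // 2])
print("90th percentile:", totals[int(len(totals) * 0.9)])
```

The interesting output is the whole distribution: the median is typically well above the sum of the "most likely" guesses, which is the planning-fallacy point in numeric form.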
In other words, asking people for a best guess or an optimistic prediction yields a biased prediction that is almost always earlier than the real delivery date. The pessimistic question, while not more accurate (it has the same absolute error margins), is unbiased: the study found that people asked the pessimistic question were equally likely to overestimate their completion time as to underestimate it. If you don't think a question that gives you a distribution centered on the right answer is useful, I'
I have a sense that this is a disagreement about how to decide what words "really" mean, and I have a sense that I disagree with you about how to do that.
I had already (weeks ago) approvingly cited and requested for my wife and my best friend to read that particular post, which I think puts it at 99.5th percentile or higher of LW posts in terms of my wanting its message to be understood and taken to heart, so I think I disagree with this comment about as strongly as is possib...
It didn't work for the students in the study in the OP. That's literally why the OP mentioned it!
You're right - "you failed, what happened" does create a mental frame that "what could go wrong" does not. I don't think "how long could it take if everything goes as poorly as possible" creates any more useful of a frame than "you failed, what happened". But it does, formally, request a number. I don't think that number, itself, is good for anything. I'm not even convinced asking for that number is very effective for eliciting the "you failed, what happened" mindset. I definitely don't think it's more effective for that than just asking directly "you failed, what happened".
Given the context, I imagine what they were doing is making up a number that was bigger than another number they'd just made up. Humans are cognitive misers. A student would correctly guess that it doesn't really matter if they get this question right and not try very hard. That's actually what I would do in a context where it was clear that a numeric answer was required, I was expected to spend little time answering, and I was motivated not to leave that particular question blank.
My answer of "never" also took little thought (for me). I thought a bit ...
Yes, given that question, IMO they should have answered "never". 55.5 days isn't the true answer, because in reality everything didn't go as poorly as possible. You're right, it's a bad question that a brick wall would do a better job of answering correctly than a human who's trying to be helpful.
The answer to your question is useful, but not because of the number. "What could go wrong to make this take longer than expected?" would elicit the same useful information without spuriously forcing a meaningless number to be produced.
"Assume it gets done but the process is super shitty. How long will it take?"
I would, in fact, consider that interpretation to be unambiguously "not understanding what they were being asked", given the question in the post. Not understanding what is being asked is something that happens a fair bit.
I'll give you that if they had asked "assuming it gets done but everything goes as poorly as possible, how long did it take?", it takes a bit of a strange mind to look for some weird scenario that strings things along for years, or centuries, or eons. But "...
in a classic experiment, 37 psychology students were asked to estimate how long it would take them to finish their senior theses “if everything went as poorly as it possibly could,” and they still underestimated the time it would take, as a group (the average prediction was 48.6 days, and the average actual completion time was 55.5 days).
That's nuts. Does anyone really think that "if everything went as poorly as it possibly could" that the thesis would ever get done at all? It's so bizarre it makes me question whether the students actually understood what they were being asked.
IME the sense that this is nuts seems to be a quirk of STEM thinking. In practice, most non-rationalists seem to interpret "How long will this take if everything goes as poorly as possible?" as something like "Assume it gets done but the process is super shitty. How long will it take?"
It's a quirk of rationalist culture (and a few others — I've seen this from physicists too) to take the words literally and propose that "infinitely long" is a plausible answer, and be baffled as to how anyone could think otherwise.
Many smart political science and English majors don't seem to go down that line of reasoning, for instance.
Thermodynamic? Thermodynamics seems to be about using a small number of summary statistics (temperature, pressure, density, etc.) because the microstructure of the system isn't necessary to compute what will happen at the macro level.
"Building an actual aligned AI, of course, would be a pivotal act." What would an aligned AI do that would prevent anybody from ever building an unaligned AI?
My guess is that it would implement universal surveillance and intervene, when necessary, to directly stop people from doing just that. Sorry, I should've been clearer that I was talking about an aligned superintelligent AI. Since unaligned AI killing everyone seems pretty obviously extremely bad according to the vast majority of humans' preferences, preventing that would be a very high priority for any sufficiently powerful aligned AI.
I can't see how "publishing papers with alignment techniques" or "encouraging safe development with industry groups and policy standards" could be pivotal acts. To prevent anyone from building unaligned AI, building an unaligned AI in your garage needs to be prevented. That requires preventing people who don't read the alignment papers or policy standards and aren't members of the industry groups from building unaligned AI.
That, in turn, appears to me to require at least one of 1) limiting access to computation resources from your garage, 2) limiting kno...
I’m going to predict a somewhat faster rise this week because I doubt the Midwest drop will get sustained.
The deaths number going up this much shows that my prediction the previous week was indeed far too high, despite this coming in substantially higher than my median guess, confirming that last week was a cross between slower real growth than expected and the Easter holiday. This week had a huge jump in the South region.
These pieces of text don't seem to line up with their associated charts and graphs? I notice I am confused.
Wow, the air conditioner systematically sucking the cold air it's generated back into the intake sort of seems like another problem with this design. (Possibly the same problem in another guise, thermodynamically, but in any case, different in terms of actual produced experience.)
I am also not quite clear why North Korea destroying the world would be so much worse than DeepMind doing it.
I think the argument about this part would be that DeepMind is much more likely (which is not to say "likely" on an absolute scale) to at least take alignment seriously enough to build (or even use!) interpretability tools and maybe revise their plans if the tools show the AGI plotting to kill everyone. So by the time DeepMind is actually deploying an AGI (even including accidental "deployments" due to foom during testing), it's less likely to b...
I didn't look at the tags before reading. I did notice it was fiction pretty quickly but "is this dath ilan" was still a live question for me until the reveal. (Though Eliezer might want to continue writing some non-dath ilan fiction occasionally, if he wants that to continue to be a likely thought process.)
I think I have that intuition because the great majority of seatbelt unbucklings in my experience happen while traveling at a speed of zero (because they're in cars, not planes). The sentence has no cues to indicate the unusual context of being in a plane (and in fact, figuring that out is the point of the example). So my mental process reading that sentence is "that's obviously false" -> "hmm, wonder if I'm missing something" -> "oh, maybe in a plane?" and the first step there seems a lot more reliable (in other reasoners as well, not just me) than the second or third.
The "inference" "We can also infer that she is traveling at a high speed because she is unbuckling her seatbelt." is also nonsensical. People don't typically unbuckle their seatbelts when traveling at high speed. (Albeit, this does maybe happen to be true for airplane travel because one isn't allowed to unbuckle one's seatbelt while traveling at low speed, i.e. during taxi, takeoff and landing; but that's enough of a non-central case that it needs to be called out for the reasoning not to sound absurd.)
Why is it a non-central example when this is, in fact, about commercial airplane travel where you will be moving fastest at cruising altitude and that is when you're allowed to unbuckle and move about the cabin?
I'm pretty confused that this is as necessary as it is, particularly with writing that involves a lot of math and math notation. I don't understand how people get the insight and motivation necessary to write that kind of thing without explaining what the point of it is or giving examples of how to apply it as part of their expositions.
Your (johnswentworth's) posts don't seem to suffer nearly as much from this, at least from a quick skim of the ones you said on another thread you'd like distilled, but e.g. Infra-Bayesianism seems maybe important (it's abo...
I'm not saying there is a clearly better structure available for this purpose - I think the weirdness comes from the fact that it's so unclear who should go in the box normally reserved for "Shareholders" or "Voters."
Isn't the obvious answer for nonprofits "Donors"? (Yes, it's not immediately obvious how amount or recency of donations should translate into power to ultimately direct the organization, but this at least tells you who should go in the box.)
That's not to say that this is how nonprofits are run (it seems not to be) but that it would be the obvious translation of the ways that for-profit companies and governments are run.
Creating totally artificial rewards ("if I put away the groceries, I can have a piece of candy") kind of straddles the boundary between "Rewards" and "Make the task less unpleasant", IMO. I've occasionally managed to do N things I didn't want to do by allowing myself to watch N episodes of a show I'm binging, but only if I do a thing between episodes. Clearly this is a second-best solution, but it does seem to stop the "what happens next? Keep poking him until he watches the next episode so we can find out" demon from being quite so disruptive.
One aspect of habits you didn't mention is automaticity. I automatically brush my teeth as part of taking a shower. Often I notice myself doing this, but even if not it's still part of the "taking a shower" mental program.
I agree that this isn't reliable and sometimes one has to muscle through to retain a habit, but I think the balance between automaticity and aversion to losing the habit varies a lot from person to person, and it would be worth having a better understanding of what enhances or inhibits automaticity.
I wish you (or someone) would make a little book of this.
I can certainly see why one might do that – if you allow requests, then refusing a request is going to be quite socially awkward at best.
Can't tell whether you're being sarcastic, but I actually think this is true. I feel like there might have been a time when these things weren't weaponised and even if it were okay to ask, no one would do it unless they were immunocompromised or something. But these days, as an employer who wants employees spending as little time and attention on masking (or lack thereof) as possible, the only maybe-stable equilib...
I felt dumb recently when I noticed that the determinant is sort of "the absolute value for matrices", considering that it's literally written using the same signs as the absolute value. Although I guess the determinant of the representation of a complex number as a matrix is |z|^2, not |z|. The "signed volume" idea seems related to this, insofar as multiplying a complex number by another, z, will stretch / smush it by |z| (in addition to rotating it).
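A quick check of that determinant claim, using the standard matrix representation [[a, -b], [b, a]] of z = a + bi:

```python
# The 2x2 matrix representation of z = a + bi has determinant
# a^2 + b^2 = |z|^2 (not |z|); checked here for z = 3 + 4i.
def det2(m):
    (p, q), (r, s) = m
    return p * s - q * r

z = complex(3, 4)
m = [[z.real, -z.imag], [z.imag, z.real]]
print(det2(m))   # 25.0, i.e. abs(z) ** 2, while abs(z) is 5.0
```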
Thanks. Yeah, I knew there was some qualifier missing that would make it true, I just couldn't intuit exactly what it was.
Edited to add: Actually I would say that the determinant distributes through multiplication. Commutativity: ab = ba. Distributivity: a(b+c) = ab + ac. Neither is a perfect analog, because the determinant is a unary operation, but distributivity at least captures that there are two operations involved. But unlike my other comment, this one doesn't actually impair comprehension, as there's not really a different thing you could be...
Now, we see a connection with the sign of a permutation: it's the only nontrivial way we know (and in fact it's the only way to do it at all!) to assign a scalar value to a permutation, which in this special case we know the determinant must do.
Huh? Off the top of my head, here's another way to assign a scalar value to a permutation: multiply together the lengths of all the cycles it contains. (No idea whether this is useful for anything. Taking the least common multiple of the lengths of all the cycles tells you the order of the permutation, i.e. how many times you have to apply it before you get the identity, though.)
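To make the two scalars concrete, here's a small sketch (the example permutation is hypothetical) computing the cycle lengths, their product, and their lcm, the latter being the permutation's order:

```python
from math import lcm

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a list,
    where perm maps index i to perm[i]."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        n, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            n += 1
        lengths.append(n)
    return lengths

perm = [1, 2, 0, 4, 3]            # one 3-cycle (0 1 2) and one 2-cycle (3 4)
lengths = cycle_lengths(perm)
product = 1
for n in lengths:
    product *= n
print(lengths, product, lcm(*lengths))   # [3, 2] 6 6
```

Here the two summaries happen to coincide (3 × 2 = lcm(3, 2) = 6); a permutation with two 2-cycles would give product 4 but order 2.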
I have a little experience with this. I'm a type 1 diabetic and insulin needs to be kept refrigerated or it denatures and doesn't work.
I've just gotten a bunch of EcoFlow products for this. They work well together, but are maybe not the cheapest. Many are out of stock at the moment, anyway.
My 400W portable solar panel appears to actually generate around 140W in bright sunlight when pointed in roughly the right direction. This is a substantial amount for powering phones and probably even laptops! Not so much for refrigerators, air conditioners, e
See also Evolution of Modularity. Using your quantilizer to choose training examples from which to learn appears to be a very simple, natural way of accomplishing modularly varying reward functions. (I left a comment there too.)
Aren't "modularly varying reward functions" exactly what D𝜋's Self-Organised Neural Networks accomplish? Each example in the training data is a module of the reward function. By only learning on the training examples that are currently hardest for the network, we make those examples easier and thus implicitly swap them out of the "examples currently hardest for the network" set.
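The selection scheme I have in mind can be sketched in a toy setting (my own illustration of hardest-example training, not D𝜋's actual SONN code): rank examples by current loss, take a gradient step only on the worst k, repeat.

```python
import math
import random

# Toy "modularly varying reward": each example is its own module of the
# objective, and we only ever train on the k modules the model currently
# fails hardest. 1-D logistic regression on separable data for simplicity.
random.seed(0)
data = [(x, 1.0 if x > 0 else 0.0)
        for x in (random.uniform(-2, 2) for _ in range(40))]
w, b, lr, k = 0.0, 0.0, 0.5, 5

def predict(x):
    return 1 / (1 + math.exp(-(w * x + b)))

def loss(x, y):
    p = min(max(predict(x), 1e-9), 1 - 1e-9)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

for _ in range(200):
    # Select the k currently-hardest examples; as they get learned, their
    # loss drops and they're implicitly swapped out of this set.
    hardest = sorted(data, key=lambda xy: loss(*xy), reverse=True)[:k]
    for x, y in hardest:
        g = predict(x) - y        # d(loss)/d(logit) for logistic loss
        w -= lr * g * x
        b -= lr * g

accuracy = sum((predict(x) > 0.5) == (y == 1.0) for x, y in data) / len(data)
print(f"accuracy: {accuracy:.2f}")
```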
I think this comment demonstrates that the list of reacts should wrap, not extend arbitrarily far to the right.