I lurk and tag stuff.




  • What are these AIs going to do that is immensely useful but not at all dangerous? A lot of useful capabilities that people want are adjacent to danger. Tool AIs Want to be Agent AIs.
  • If two of your AIs would be dangerous when combined, clearly you can't make them publicly available, or someone would combine them. If your publicly-available AI is dangerous if someone wraps it with a shell script, someone will create that shell script (see AutoGPT). If no one but a select few can use your AI, that limits its usefulness.
  • An AI ban that stops dangerous AI might be possible. An AI ban that allows development of extremely powerful systems but has exactly the right safeguard requirements to render those systems non-dangerous seems impossible.

When people calculate utility, they often use exponential discounting over time. For example, if your discount factor is 0.99 per year, getting something in one year is only 99% as good as getting it now, getting it in two years is only 99% as good as getting it in one year, and so on. Getting it in 100 years would be discounted to 0.99^100 ≈ 37% of the value of getting it now.
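As a rough sketch of that arithmetic (the function name `discounted_value` is just an illustrative choice, not anything standard):

```python
def discounted_value(value, discount_factor, years):
    """Present value of `value` received `years` from now under exponential discounting."""
    return value * discount_factor ** years

# Value today of one unit of utility received after 1, 2, and 100 years,
# using the 0.99-per-year discount factor from the example.
for years in (1, 2, 100):
    print(years, round(discounted_value(1.0, 0.99, years), 3))
```

Each extra year multiplies the value by the same factor, which is why the 100-year figure comes out to a little over a third rather than zero.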

The sharp left turn is not some crazy theoretical construct that comes out of strange math. It is the logical and correct strategy of a wide variety of entities, and also we see it all the time.

I think you mean Treacherous Turn, not Sharp Left Turn.

Sharp Left Turn isn't a strategy; it just describes an AI that is aligned in some training domains staying capable, but not aligned, in new ones.

This post is tagged with some wiki-only tags. (If you click through to the tag page, you won't see a list of posts.) Usually it's not even possible to apply those. Is there an exception for when creating a post?

Based on my incomplete understanding of transformers:

A transformer does its computation on the entire sequence of tokens at once, and ends up predicting the next token for each token in the sequence.

At each layer, the attention mechanism gives the stream for each token the ability to look at the previous layer's output for the other tokens before it in the sequence.

The stream for each token doesn't know if it's the last in the sequence (and thus that its next-token prediction is the "main" prediction), or anything about the tokens that come after it.

So each token's stream has two tasks in training: predict the next token, and generate the information that later tokens will use to predict their next tokens. 

That information could take many different forms, but in some cases it could look like a "plan" (a prediction about the large-scale structure of the piece of writing that begins with the observed sequence so far from this token-stream's point of view).
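To make the "can't see later tokens" point concrete, here is a toy sketch of single-head causal attention in plain Python. It is a simplification under stated assumptions: the token vectors are used directly as queries, keys, and values (a real transformer applies learned projections, multiple heads, and many layers), and the name `causal_attention` is mine.

```python
import math

def causal_attention(queries, keys, values):
    """Single-head attention where position i attends only to positions 0..i."""
    outputs = []
    for i, q in enumerate(queries):
        # Scores against the current and earlier positions only (the causal mask).
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(len(q))
            for j in range(i + 1)
        ]
        # Softmax over the visible positions.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the visible value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(weights[j] * values[j][d] for j in range(i + 1)) for d in range(dim)]
        )
    return outputs

# Toy sequence of three token vectors (assumed values, for illustration only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Same sequence with a different final token.
changed = tokens[:2] + [[-5.0, 3.0]]

before = causal_attention(tokens, tokens, tokens)
after = causal_attention(changed, changed, changed)

# The streams for the first two positions are identical in both runs:
# they have no access to whatever comes after them.
print(before[:2] == after[:2])
print(before[2] != after[2])
```

Swapping the last token changes only the last position's output, so the earlier streams have to produce something useful without knowing whether they are the end of the sequence.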

In the blackmail scenario, FDT refuses to pay if the blackmailer is a perfect predictor and the FDT agent is perfectly certain of that, and perfectly certain that the stated rules of the game will be followed exactly. However, with stakes of $1M against $1K, FDT might pay if the blackmailer had a 0.1% chance of guessing the agent's action incorrectly, or if the agent was less than 99.9% confident that the blackmailer was a perfect predictor.

(If the agent is concerned that predictably giving in to blackmail by imperfect predictors makes it exploitable, it can use a mixed strategy that refuses to pay just often enough that the blackmailer doesn't make any money in expectation.)

In Newcomb's Problem, the predictor doesn't have to be perfect - you should still one-box if the predictor is 99.9% or 95% or even 55% likely to predict your action correctly. But this scenario is extremely dependent on how many nines of accuracy the predictor has. This makes it less relevant to real life, where you might run into a 55% accurate predictor or a 90% accurate predictor, but never a perfect predictor.
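For the Newcomb's Problem half of that comparison, a quick expected-value sketch shows why even a 55% predictor is enough. The payoffs are assumed to be the usual $1M in the opaque box and $1K in the transparent one, `accuracy` is treated as the probability that the prediction matches whatever you actually do, and the function names are illustrative.

```python
def one_box_ev(accuracy, big=1_000_000):
    """Expected payoff of one-boxing: the big box is full only if that was predicted."""
    return accuracy * big

def two_box_ev(accuracy, big=1_000_000, small=1_000):
    """Expected payoff of two-boxing: the big box is full only if the predictor erred."""
    return (1 - accuracy) * big + small

def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both choices have equal expected payoff."""
    # Solve accuracy * big == (1 - accuracy) * big + small for accuracy.
    return 0.5 + small / (2 * big)

for accuracy in (0.999, 0.95, 0.55, 0.5):
    better = "one-box" if one_box_ev(accuracy) > two_box_ev(accuracy) else "two-box"
    print(accuracy, better)

print(break_even_accuracy())
```

Under these assumptions the break-even point is just above 50% accuracy, so the recommendation barely moves as the predictor gets worse, unlike the blackmail case where a fraction of a percent matters.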

I'm not familiar with LeCun's ideas, but I don't think the idea of having an actor, critic, and world model is new in this paper. For a while, most RL algorithms have used an actor-critic architecture, including OpenAI's old favorite PPO. Model-based RL has been around for years as well, so probably plenty of projects have used an actor, critic, and world model.

Even though the core idea isn't novel, this paper getting good results might indicate that model-based RL is making more progress than expected, so if LeCun predicted that the future would look more like model-based RL, maybe he gets points for that.

Merge candidate with Philosophy of Language?
