Reinforcement Learning Gone Wrong

Last week’s episode on artificial intelligence gets a big payoff this week: we explore a wonderful pair of papers about all the ways artificial intelligence can go wrong. Malevolent actors? You bet. Collateral damage? Of course. Reward hacking? Naturally! It’s fun to think about, and the discussion starting now will reverberate for decades to come.

Relevant links:

How to create a malevolent artificial intelligence
Unethical research: how to create a malevolent artificial intelligence
Concrete problems in AI safety