EA - Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans) by titotal
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans), published by titotal on May 17, 2023 on The Effective Altruism Forum.

[Disclaimer: While I have dabbled in machine learning, I do not consider myself an expert.]

Introduction

When introducing newcomers to the idea of AI existential risk, a typical story of destruction will involve some variation of the “paperclip maximiser” story. The idea is that some company wishes to use an AGI to perform some seemingly simple and innocuous task, such as producing paperclips. So they set the AI with a goal function of maximizing paperclips. But, foolishly, they haven't realized that taking over the world and killing all the humans would allow it to maximize paperclips, so it deceives them into thinking it's friendly until it gets a chance to defeat humanity and tile the universe with paperclips (or wiggles that the AI interprets as paperclips under its own logic).

What is often not stated in these stories is an underlying assumption about the structure of the AI in question. These AIs are fixed-goal utility function maximisers, hellbent on making an arbitrary number as high as possible, by any means necessary. I'll refer to this model as “fanatical” AI, although I've seen other posts refer to them as “wrapper” AI, referring to their overall structure.

Increasingly, the assumption that AGIs will be fanatical in nature is being challenged. I think this is reflected in the “orthodox” and “reform” AI split. This post was mostly inspired by Nostalgebraist's excellent “why optimise for fixed goals” post, although I think there is some crossover with the arguments of the “shard theory” folks.

Humans are not fanatical AI. They do have goals, but the goals change over time, and can only loosely be described by mathematical functions. Traditional programming does not fit this description either, being merely a set of instructions executed sequentially. None of the massively successful recent machine-learning-based AI fits this description, as I will explain in the next section. In fact, nobody even knows how to make such a fanatical AI.

These days AI is being designed by trial-and-error techniques. Instead of hand-designing every action it makes, we're jumbling its neurons around and letting it try stuff until it finds something that works. The inner workings of even a very basic machine learning model are somewhat opaque to us. What ultimately guides the AI's development is some form of evolution: the strategies that work survive, and the strategies that don't get discarded.

This is ultimately why I do not believe that most AI will end up as fanatical maximisers: in the world that an AI grows up in, trying to be a fanatical global optimizer is likely to get you killed.

This post relies on two assumptions: that there will be a fairly slow takeoff of AI intelligence, and that world takeover is not trivially easy. I believe both to be true, but I won't cover my reasons here for the sake of brevity.

In part 1, I flesh out the argument for why selection pressures will prevent most AI from becoming fanatical.
In part 2, I will point out some ways that catastrophe could still occur, if AI is trained by fanatical humans.

Part 1: Why AI won't be fanatical maximisers

Global and local maxima

[I've tried to keep this to machine learning 101 for easy understanding.]

Machine learning, as it exists today, can be thought of as an efficient trial-and-error machine. It contains a bazillion different parameters, such as the “weights” of a neural network, that go into one gigantic linear algebra equation. You throw in an input, compute the output of the equation, and “score” the result based on some goal function G. So if you were training an object recognition program, G might be “number of objects correctly identified”. ...
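
To make the “score against a goal function G” picture concrete, here is a minimal sketch in Python. This is my own illustration, not code from the original post: the names `model`, `goal_function_G`, the toy data, and the random hill-climbing loop are all assumptions standing in for a real (gradient-based) training setup, but the selection flavour is the same: jumble the parameters, keep the change only if the score improves.

```python
import numpy as np

rng = np.random.default_rng(0)

# "A bazillion different parameters" -- here, just a tiny 4x3 weight matrix.
weights = rng.normal(size=(4, 3))          # 4 input features -> 3 classes

def model(x, weights):
    # "One gigantic linear algebra equation", in miniature: raw class scores.
    return x @ weights

def goal_function_G(predictions, labels):
    # G = "number of objects correctly identified".
    return int(np.sum(predictions.argmax(axis=1) == labels))

# Toy "object recognition" data: 10 examples, 4 features each, 3 classes.
x = rng.normal(size=(10, 4))
labels = rng.integers(0, 3, size=10)

score = goal_function_G(model(x, weights), labels)

# Crude trial and error: perturb the weights and keep the change only if
# G does not get worse. Real training uses gradients, but the strategies
# that work survive and the ones that don't get discarded, just as above.
for _ in range(200):
    candidate = weights + 0.1 * rng.normal(size=weights.shape)
    if goal_function_G(model(x, candidate), labels) >= score:
        weights = candidate
        score = goal_function_G(model(x, weights), labels)

print(f"G (correct identifications) = {score} / {len(labels)}")
```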
