EA - Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans) by titotal
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans), published by titotal on May 17, 2023 on The Effective Altruism Forum.

[Disclaimer: While I have dabbled in machine learning, I do not consider myself an expert.]

Introduction

When introducing newcomers to the idea of AI existential risk, a typical story of destruction will involve some variation of the “paperclip maximiser” story. The idea is that some company wishes to use an AGI to perform some seemingly simple and innocuous task, such as producing paperclips. So they set the AI with a goal function of maximizing paperclips. But, foolishly, they haven't realized that taking over the world and killing all the humans would allow it to maximize paperclips, so it deceives them into thinking it's friendly until it gets a chance to defeat humanity and tile the universe with paperclips (or wiggles that the AI interprets as paperclips under its own logic).

What is often not stated in these stories is an underlying assumption about the structure of the AI in question. These AIs are fixed-goal utility function maximisers, hellbent on making an arbitrary number as high as possible, by any means necessary. I'll refer to this model as “fanatical” AI, although I've seen other posts refer to them as “wrapper” AI, referring to their overall structure.

Increasingly, the assumption that AGIs will be fanatical in nature is being challenged. I think this is reflected in the “orthodox” and “reform” AI split. This post was mostly inspired by Nostalgebraist's excellent “why optimise for fixed goals” post, although I think there is some crossover with the arguments of the “shard theory” folks.

Humans are not fanatical AI. They do have goals, but the goals change over time, and can only loosely be described by mathematical functions. Traditional programming does not fit this description either, being merely a set of instructions executed sequentially. None of the massively successful recent machine-learning-based AI fits this description, as I will explain in the next section. In fact, nobody even knows how to make such a fanatical AI.

These days AI is being designed by trial-and-error techniques. Instead of hand-designing every action it makes, we're jumbling its neurons around and letting it try stuff until it finds something that works. The inner workings of even a very basic machine learning model are somewhat opaque to us. What ultimately guides the AI's development is some form of evolution: the strategies that work survive, and the strategies that don't get discarded.

This is ultimately why I do not believe that most AI will end up as fanatical maximisers: in the world that an AI grows up in, trying to be a fanatical global optimizer is likely to get you killed.

This post relies on two assumptions: that there will be a fairly slow takeoff of AI intelligence, and that world takeover is not trivially easy. I believe both to be true, but I won't cover my reasons here for the sake of brevity.

In part 1, I flesh out the argument for why selection pressures will prevent most AI from becoming fanatical.
In part 2, I will point out some ways that catastrophe could still occur, if AI is trained by fanatical humans.

Part 1: Why AI won't be fanatical maximisers

Global and local maxima

[I've tried to keep this to machine learning 101 for easy understanding.]

Machine learning, as it exists today, can be thought of as an efficient trial-and-error machine. It contains a bazillion different parameters, such as the “weights” of a neural network, that go into one gigantic linear algebra equation. You throw in an input, compute the output of the equation, and “score” the result based on some goal function G. So if you were training an object recognition program, G might be “number of objects correctly identified”. ...
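
To make the “score against a goal function G” picture concrete, here is a minimal sketch in Python. This is my own illustration, not code from the original post: the names `model`, `goal_function_G`, the toy data, and the random hill-climbing loop are all assumptions standing in for a real (gradient-based) training setup, but the selection flavour is the same: jumble the parameters, keep the change only if the score improves.

```python
import numpy as np

rng = np.random.default_rng(0)

# "A bazillion different parameters" -- here, just a tiny 4x3 weight matrix.
weights = rng.normal(size=(4, 3))          # 4 input features -> 3 classes

def model(x, weights):
    # "One gigantic linear algebra equation", in miniature: raw class scores.
    return x @ weights

def goal_function_G(predictions, labels):
    # G = "number of objects correctly identified".
    return int(np.sum(predictions.argmax(axis=1) == labels))

# Toy "object recognition" data: 10 examples, 4 features each, 3 classes.
x = rng.normal(size=(10, 4))
labels = rng.integers(0, 3, size=10)

score = goal_function_G(model(x, weights), labels)

# Crude trial and error: perturb the weights and keep the change only if
# G does not get worse. Real training uses gradients, but the strategies
# that work survive and the ones that don't get discarded, just as above.
for _ in range(200):
    candidate = weights + 0.1 * rng.normal(size=weights.shape)
    if goal_function_G(model(x, candidate), labels) >= score:
        weights = candidate
        score = goal_function_G(model(x, weights), labels)

print(f"G (correct identifications) = {score} / {len(labels)}")
```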
