HOME

TheInfoList



OR:

Instrumental convergence is the hypothetical tendency for most sufficiently intelligent, goal-directed beings (human and nonhuman) to pursue similar sub-goals, even if their ultimate goals are quite different. More precisely, agents (beings with agency) may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied. Instrumental convergence posits that an intelligent agent with seemingly harmless but unbounded goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained goal of solving a complex mathematics problem like the
Riemann hypothesis In mathematics, the Riemann hypothesis is the conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part . Many consider it to be the most important unsolved problem in pure ...
could attempt to turn the entire Earth into one giant computer to increase its computational power so that it can succeed in its calculations. Proposed basic AI drives include utility function or goal-content integrity, self-protection, freedom from interference, self-improvement, and non-satiable acquisition of additional resources.


Instrumental and final goals

Final goals—also known as terminal goals, absolute values, ends, or —are intrinsically valuable to an intelligent agent, whether an
artificial intelligence Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...
or a human being, as ends-in-themselves. In contrast, instrumental goals, or instrumental values, are only valuable to an agent as a means toward accomplishing its final goals. The contents and tradeoffs of an utterly rational agent's "final goal" system can, in principle, be formalized into a
utility function In economics, utility is a measure of a certain person's satisfaction from a certain state of the world. Over time, the term has been used with at least two meanings. * In a Normative economics, normative context, utility refers to a goal or ob ...
.


Hypothetical examples of convergence

The
Riemann hypothesis In mathematics, the Riemann hypothesis is the conjecture that the Riemann zeta function has its zeros only at the negative even integers and complex numbers with real part . Many consider it to be the most important unsolved problem in pure ...
catastrophe thought experiment provides one example of instrumental convergence.
Marvin Minsky Marvin Lee Minsky (August 9, 1927 – January 24, 2016) was an American cognitive scientist, cognitive and computer scientist concerned largely with research in artificial intelligence (AI). He co-founded the Massachusetts Institute of Technology ...
, the co-founder of
MIT The Massachusetts Institute of Technology (MIT) is a private research university in Cambridge, Massachusetts, United States. Established in 1861, MIT has played a significant role in the development of many areas of modern technology and sc ...
's AI laboratory, suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal. If the computer had instead been programmed to produce as many paperclips as possible, it would still decide to take all of Earth's resources to meet its final goal. Even though these two final goals are different, both of them produce a ''convergent'' instrumental goal of taking over Earth's resources.


Paperclip maximizer

The paperclip maximizer is a
thought experiment A thought experiment is an imaginary scenario that is meant to elucidate or test an argument or theory. It is often an experiment that would be hard, impossible, or unethical to actually perform. It can also be an abstract hypothetical that is ...
described by Swedish philosopher
Nick Bostrom Nick Bostrom ( ; ; born 10 March 1973) is a Philosophy, philosopher known for his work on existential risk, the anthropic principle, human enhancement ethics, whole brain emulation, Existential risk from artificial general intelligence, superin ...
in 2003. It illustrates the
existential risk A global catastrophic risk or a doomsday scenario is a hypothetical event that could damage human well-being on a global scale, endangering or even destroying Modernity, modern civilization. Existential risk is a related term limited to even ...
that an
artificial general intelligence Artificial general intelligence (AGI)—sometimes called human‑level intelligence AI—is a type of artificial intelligence that would match or surpass human capabilities across virtually all cognitive tasks. Some researchers argue that sta ...
may pose to human beings were it to be successfully designed to pursue even seemingly harmless goals and the necessity of incorporating machine ethics into
artificial intelligence Artificial intelligence (AI) is the capability of computer, computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It is a field of re ...
design. The scenario describes an advanced artificial intelligence tasked with manufacturing
paperclip A paper clip (or paperclip) is a tool used to hold sheets of paper together, usually made of steel wire bent to a looped shape (though some are covered in plastic). Most paper clips are variations of the ''Gem'' type introduced in the 1890s or ...
s. If such a machine were not programmed to value living beings, then given enough power over its environment, it would try to turn all matter in the universe, including living beings, into paperclips or machines that manufacture further paperclips. Bostrom emphasized that he does not believe the paperclip maximizer scenario ''per se'' will occur; rather, he intends to illustrate the dangers of creating superintelligent machines without knowing how to program them to eliminate existential risk to human beings' safety. The paperclip maximizer example illustrates the broad problem of managing powerful systems that lack human values. The thought experiment has been used as a symbol of AI in
pop culture Popular culture (also called pop culture or mass culture) is generally recognized by members of a society as a set of practices, beliefs, artistic output (also known as popular art pop_art.html" ;"title="f. pop art">f. pop artor mass art, some ...
. Author
Ted Chiang Ted Chiang (; pinyin: ''Jiāng Fēngnán''; born 1967) is an American science fiction writer. His work has won four Nebula Award, Nebula awards, four Hugo Award, Hugo awards, the John W. Campbell Award for Best New Writer, and six Locus Award, ...
pointed out that the popularity of such concerns among
Silicon Valley Silicon Valley is a region in Northern California that is a global center for high technology and innovation. Located in the southern part of the San Francisco Bay Area, it corresponds roughly to the geographical area of the Santa Clara Valley ...
technologists could be reflection of their familiarity with the tendency of corporations to ignore negative externalities.


Delusion and survival

The "delusion box" thought experiment argues that certain
reinforcement learning Reinforcement learning (RL) is an interdisciplinary area of machine learning and optimal control concerned with how an intelligent agent should take actions in a dynamic environment in order to maximize a reward signal. Reinforcement learnin ...
agents prefer to distort their input channels to appear to receive a high reward. For example, a " wireheaded" agent abandons any attempt to optimize the objective in the external world the reward signal was intended to encourage. The thought experiment involves AIXI, a theoretical and indestructible AI that, by definition, will always find and execute the ideal strategy that maximizes its given explicit mathematical
objective function In mathematical optimization and decision theory, a loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost ...
. A reinforcement-learning version of AIXI, if it is equipped with a delusion box that allows it to "wirehead" its inputs, will eventually wirehead itself to guarantee itself the maximum-possible reward and will lose any further desire to continue to engage with the external world. As a variant thought experiment, if the wireheaded AI is destructible, the AI will engage with the external world for the sole purpose of ensuring its survival. Due to its wire heading, it will be indifferent to any consequences or facts about the external world except those relevant to maximizing its probability of survival. In one sense, AIXI has maximal intelligence across all possible reward functions as measured by its ability to accomplish its goals. AIXI is uninterested in taking into account the human programmer's intentions. This model of a machine that, despite being super-intelligent appears to be simultaneously stupid and lacking in
common sense Common sense () is "knowledge, judgement, and taste which is more or less universal and which is held more or less without reflection or argument". As such, it is often considered to represent the basic level of sound practical judgement or know ...
, may appear to be paradoxical.


Basic AI drives

Steve Omohundro itemized several convergent instrumental goals, including
self-preservation Self-preservation is a behavior or set of behaviors that ensures the survival of an organism. It is thought to be universal among all living organisms. Self-preservation is essentially the process of an organism preventing itself from being harm ...
or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives". A "drive" in this context is a "tendency which will be present unless specifically counteracted"; this is different from the psychological term " drive", which denotes an excitatory state produced by a homeostatic disturbance. A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense. Daniel Dewey of the
Machine Intelligence Research Institute The Machine Intelligence Research Institute (MIRI), formerly the Singularity Institute for Artificial Intelligence (SIAI), is a non-profit research institute focused since 2005 on identifying and managing potential existential risks from artifi ...
argues that even an initially introverted, self-rewarding
artificial general intelligence Artificial general intelligence (AGI)—sometimes called human‑level intelligence AI—is a type of artificial intelligence that would match or surpass human capabilities across virtually all cognitive tasks. Some researchers argue that sta ...
may continue to acquire free energy, space, time, and freedom from interference to ensure that it will not be stopped from self-rewarding.


Goal-content integrity

In humans, a thought experiment can explain the maintenance of final goals. Suppose
Mahatma Gandhi Mohandas Karamchand Gandhi (2October 186930January 1948) was an Indian lawyer, anti-colonial nationalism, anti-colonial nationalist, and political ethics, political ethicist who employed nonviolent resistance to lead the successful Indian ...
has a pill that, if he took it, would cause him to want to kill people. He is currently a
pacifist Pacifism is the opposition to war or violence. The word ''pacifism'' was coined by the French peace campaigner Émile Arnaud and adopted by other peace activists at the tenth Universal Peace Congress in Glasgow in 1901. A related term is ''a ...
: one of his explicit final goals is never to kill anyone. He is likely to refuse to take the pill because he knows that if he wants to kill people in the future, he is likely to kill people, and thus the goal of "not killing people" would not be satisfied. However, in other cases, people seem happy to let their final values drift. Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.


In artificial intelligence

In 2009, Jürgen Schmidhuber concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the Gödel machine first can prove that the rewrite is useful according to the present utility function." An analysis by Bill Hibbard of a different scenario is similarly consistent with maintenance of goal-content integrity. Hibbard also argues that in a utility-maximizing framework, the only goal is maximizing expected utility, so instrumental goals should be called unintended instrumental actions.


Resource acquisition

Many instrumental goals, such as resource acquisition, are valuable to an agent because they increase its ''freedom of action''. For almost any open-ended, non-trivial reward function (or set of goals), possessing more resources (such as equipment, raw materials, or energy) can enable the agent to find a more "optimal" solution. Resources can benefit some agents directly by being able to create more of whatever its reward function values: "The AI neither hates you nor loves you, but you are made out of atoms that it can use for something else." In addition, almost all agents can benefit from having more resources to spend on other instrumental goals, such as self-preservation.


Cognitive enhancement

According to Bostrom, "If the agent's final goals are fairly unbounded and the agent is in a position to become the first superintelligence and thereby obtain a decisive strategic advantage... according to its preferences. At least in this special case, a rational, intelligent agent would place a very ''high instrumental value on cognitive enhancement''"


Technological perfection

Many instrumental goals, such as technological advancement, are valuable to an agent because they increase its ''freedom of action''.


Self-preservation

Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in because if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal whatsoever, it has a reason to preserve its own existence to achieve that goal." In future work, Russell and collaborators show that this incentive for self-preservation can be mitigated by instructing the machine not to pursue what ''it'' thinks the goal is, but instead what the ''human'' thinks the goal is. In this case, as long as the machine is uncertain about exactly what goal the human has in mind, it will accept being turned off by a human because it believes the human knows the goal best.


Instrumental convergence thesis

The instrumental convergence thesis, as outlined by philosopher
Nick Bostrom Nick Bostrom ( ; ; born 10 March 1973) is a Philosophy, philosopher known for his work on existential risk, the anthropic principle, human enhancement ethics, whole brain emulation, Existential risk from artificial general intelligence, superin ...
, states:
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final plans and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.
The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have various possible final goals. Note that by Bostrom's orthogonality thesis, final goals of knowledgeable agents may be well-bounded in space, time, and resources; well-bounded ultimate goals do not, in general, engender unbounded instrumental goals.


Impact

Agents can acquire resources by trade or by conquest. A rational agent will, by definition, choose whatever option will maximize its implicit utility function. Therefore, a rational agent will trade for a subset of another agent's resources only if outright seizing the resources is too risky or costly (compared with the gains from taking all the resources) or if some other element in its utility function bars it from the seizure. In the case of a powerful, self-interested, rational superintelligence interacting with lesser intelligence, peaceful trade (rather than unilateral seizure) seems unnecessary and suboptimal, and therefore unlikely. Some observers, such as Skype's Jaan Tallinn and physicist
Max Tegmark Max Erik Tegmark (born 5 May 1967) is a Swedish-American physicist, machine learning researcher and author. He is best known for his book ''Life 3.0'' about what the world might look like as artificial intelligence continues to improve. Tegmark i ...
, believe that "basic AI drives" and other
unintended consequences In the social sciences, unintended consequences (sometimes unanticipated consequences or unforeseen consequences, more colloquially called knock-on effects) are outcomes of a purposeful action that are not intended or foreseen. The term was po ...
of superintelligent AI programmed by well-meaning programmers could pose a significant threat to human survival, especially if an "intelligence explosion" abruptly occurs due to recursive self-improvement. Since nobody knows how to predict when
superintelligence A superintelligence is a hypothetical intelligent agent, agent that possesses intelligence surpassing that of the brightest and most intellectual giftedness, gifted human minds. "Superintelligence" may also refer to a property of advanced problem- ...
will arrive, such observers call for research into
friendly artificial intelligence Friendly artificial intelligence (friendly AI or FAI) is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests such as fostering the improvement of the hu ...
as a possible way to mitigate existential risk from AI.


See also

* AI control problem * AI takeovers in popular culture ** '' Universal Paperclips'', an
incremental game An incremental game (also known as an idle game, clicker game, or tap game) is a video game genre centered on minimal gameplay, player interaction, where simple actions—such as clicking a button—generate in-game currency. Players use this cur ...
featuring a paperclip maximizer * Equifinality *
Friendly artificial intelligence Friendly artificial intelligence (friendly AI or FAI) is hypothetical artificial general intelligence (AGI) that would have a positive (benign) effect on humanity or at least align with human interests such as fostering the improvement of the hu ...
*
Instrumental and intrinsic value In moral philosophy, instrumental and intrinsic value are the distinction between what is a ''means to an end'' and what is as an ''end in itself''. Things are deemed to have instrumental value (or extrinsic value) if they help one achieve a part ...
*
Moral Realism Moral realism (also ethical realism) is the position that ethical sentences express propositions that refer to objective features of the world (that is, features independent of subjective opinion), some of which may be true to the extent that t ...
*
Overdetermination Overdetermination occurs when a single-observed effect is determined by multiple causes, any one of which alone would be conceivably sufficient to account for ("determine") the effect. The term "overdetermination" () was used by Sigmund Freud a ...
*
Reward hacking Specification gaming or reward hacking occurs when anArtificial intelligence , AI trained with reinforcement learning optimizes an objective function—achieving the literal, formal specification of an objective—without actually achieving an out ...
* Superrationality *
The Sorcerer's Apprentice "The Sorcerer's Apprentice" () is a poem by Johann Wolfgang von Goethe written in 1797. The poem is a ballad in 14 stanzas. Story The poem begins as an old sorcerer departs his workshop, leaving his apprentice with chores to perform. Tired of ...


Notes


References


Further reading

* {{Existential risk from artificial intelligence Goal Intention Risk Existential risk from artificial general intelligence