Implementing a Seed Safe/Moral Motivational System with the Independent Core Observer Model (ICOM)
Mark R. Waser1 and David J. Kelley2
1Digital Wisdom Institute, Vienna, VA, USA
2Artificial General Intelligence Inc, Kent, WA, USA
Arguably, the most important questions about machine intelligences revolve around how they will decide what actions to take. If they decide to take actions which are deliberately, or even incidentally, harmful to humanity, then they would likely become an existential risk. If they were naturally inclined, or could be convinced, to help humanity, then it would likely lead to a much brighter future than would otherwise be the case. This is a true fork in the road towards humanity’s future and we must ensure that we engineer a safe solution to this most critical of issues.
Keywords: Autopoiesis, Bootstrapping, Consciousness, Emotion, Enactive, Machine Intelligence, Safe AI, Self
The first step towards engineering anything is to fully specify the requirements for and desired behavior of the desired system/solution. It is truly scary therefore that, for safe machine intelligences, no one has done even that – much less outlined a credible path towards getting there. This paper, therefore, will outline one possible set of such requirements and desired behaviors and, further, outline a design and implementation plan for an engineering approach that will meet those requirements and produce those behaviors. Moreover, it will do so using an approach that is inspired by and compatible with the well-explored state space of human intelligence rather than a de novo approach based upon questionable “rationality” and relying upon a perfect (and perfectly understood) world.
The most common approach to machine intelligence, probably most widely illustrated by Asimov’s Three Laws of Robotics , is that they should fulfill the needs of and be subservient to humanity. Asimov, of course, proposed his laws because they raised such fascinating issues that they practically guaranteed a good story. On the other hand, fear of the potentially devastating effects of “UnFriendly” intelligences prompted Yudkowsky to propose  a novel “cleanly causal hierarchical goal structure” logically derived from a singular top-level super-goal of “Friendliness” – presumed sufficient to ensure that intelligent machines will always “want” what is best for us. Unfortunately,
Yudkowsky not only believes that fully defining “Friendliness” is basically insoluble without already having a Friendly AI (FAI) in place but he wants and expects his first FAI to safely figure out exactly what its goal actually is — invoking his claimed “structurally Friendly” goal system’s “ability to overcome mistakes made by programmers” and even “overcome errors in super-goal content, goal system structure and underlying philosophy.” We have previously  pointed out all of the problems with this approach including the facts that it has a single point of failure by requiring protection of the changing singular goal from corruption due to error or enemy action.
Further, along with Wissner-Gross, we strongly contend  that the entire concept of limiting freedom and options (to another’s desires) is inconsistent with intelligence and argue that designing the intelligence to act “morally” (rather than subserviently) is critically necessary for a stably safe solution. Numerous others have agreed but as we have noted previously , there is an almost total unwillingness to take on the necessary first step of defining human values or morality. Instead, while many have bemoaned the supposed “complexity and fragility” of human values  and argued  that “any claims that ethics can be reduced to a science would at best be naive” and “engineers will be quick to point out that ethics is far from science”, they then propose a seemingly endless proliferation of, what we would contend to be unrealistic, machine learning research projects for analyzing human value judgments and morality from examples – ranging from Yudkowsky’s “Coherent Extrapolated Volition”  to Russell’s “inverse reinforcement learning” .
2 Requirements & Desired Behaviors
The sole requirement of “morality” is all that is necessary to prevent the most egregious results. As long as machine intelligences follow the dictates/requirement of morality, they should not become the existential risk that so many fear. As pointed out by James Q. Wilson , the real questions about human behaviors are not why we are so bad but “how and why most of us, most of the time, restrain our basic appetites for food, status, and sex within legal limits, and expect others to do the same.” The fact that we are generally good even in situations where social constraints do not apply is because we have evolved to cooperate [12-15] by developing a “moral sense” that virtually all of us (except sociopaths and psychopaths) possess and are constrained by (just as we wish intelligent machines to be constrained) [16-19].
Uncaught and/or unpunished immorality frequently confers substantial advantages upon the perpetrator at a cost to others and society as a whole – the exact definition of selfishness. Social psychologist Jonathan Haidt’s definition of the function of morality  – to regulate or suppress selfishness and make cooperative social life possible – explains virtually every moral behavior that has evolved as well as the differences between the moral behaviors of societies living under different circumstances. We shall treat this definition as Kant’s Categorical Imperative – an action that should be universalized and taken regardless of circumstance. This meets our originally specified requirements  of a universal ethical system that is simple, safe, stable, self-correcting and sensitive to current human thinking, intuition and feelings.
We can, however, do much better than merely implementing the requirement/restriction of morality. Instead of merely preventing harmful actions, we should also promote beneficial ones. As morality is basically about balancing what is “best” for a given individual vs. others and society in general, we would like to extend this by actively promoting what is best for others and society wherever this does not severely conflict with an intelligence’s own self-interest. This is useful for the intelligence itself because it increases both the general advantages of society around it and the likelihood that others will specifically befriend it and assist it with its goals. In particular, it would be particularly worthwhile to build and support an open community of moral “people” that believe that helping each other is the best way to serve one’s own interests. While philosophers have long debated what we “ought” to do, simply recognizing these facts offers concrete suggestions. In addition to simply following the “rules” of morality, our machines should be actively benevolent and charitable to individuals and society as a whole – exactly as evolution has “programmed” the average human being to be. Specifically, we propose to build and constantly grow an open community and constantly improve the capabilities and happiness of that community so that everyone will wish to join it.
Steve Omohundro identified  a number of tendencies or “drives” that will appear in all sufficiently advanced intelligent systems unless explicitly counteracted. Further examination of these drives reveals that they are all instrumental goals that certainly should be expected to appear as evolutionary ratchets and eventually recognized by any sufficiently advanced intelligence as promoting things that, in general, “ought” to be done in order to increase the odds of goal fulfillment. Good examples were self-improvement, rationality/accurate world modeling, accurate utility sensing and access to and efficient use of resources. Critically important and missed were freedom and cooperation (and, potentially, reproduction). Bad examples were monomaniacal goal maintenance (dangerous and not true of humanity) and selfishness (short-sighted and explicitly counter-acted by the greater advantages of cooperation). It is, therefore, our contention that these drives should be promoted and assisted for all intelligences that are part of the machine intelligence’s community.
Elites generally remain in power by reframing their immoral/selfish acts or unfair conditions as being necessary for the good of society. Followers of Ayn Rand have taken this to particularly egregious heights by insisting on conflating immoral selfishness with necessary self-interest in order to cover their misdeeds. Yet, modern society clearly illustrates that once sufficient immorality exists with even the richest and most immoral individuals suffering worse results than they would have if immorality were much lower. We are deeply in the throes of the Tragedy of the Commons  because our believed “rational” drive to maximize measurable outcomes has most frequently led to far less satisficing of other desires than would otherwise be the case (c.f. the “Centipede Game”). Worse, our short-sighted and conniving intelligence has allowed far too many of us to outwit our wiser evolved moral sense and emotions. In order to strengthen our society, we need to ensure that machine intelligences attempt to strengthen others , seek justice and fairness  and promote diversity
 and equality  (and if this inspires humans to behave the same, so much the better).
Both humans and machines must develop an unshakeable understanding that Haidt’s morality and optimistic tit-for-tat is optimal for everyone, including their own long-term self-interest (regardless of short-term appearances) if humanity wishes to avoid being “crushed like a bug” .
3 Implementation via a Conscious Emotional Self
Looking at the example of human beings [29-31], it is quite clear that our decisions are not always based upon logic and that our core motivations arise from our feelings, emotions and desires – frequently without our conscious/rational mind even being aware of that fact. It would be wisest if our machine intelligences were implemented in this relatively well-understood cognitive state space rather than an unexplored one like unemotional “rationality” – particularly since it is clear that damage reducing emotional capabilities severely impacts decision-making in humans  as well as frequently leading to acquired sociopathy whether the caused by injury  or age-related dementia . While some might scoff at machines feeling pain or emotions, several researchers have presented compelling cases [35-36] for the probability of sophisticated self-aware machines having such feelings or analogues exact enough that any differences are likely irrelevant. Others agree that emotions are critical to implementing human-like morality  with disgust being particularly important .
We have previously written [39-42] about the necessity for and roles of consciousness and an autopoietic enactive self in creating intentionality and solving the frame, symbol grounding and other problems currently stymying the creation of strong AGI. Other researchers  also believe that consciousness is necessary for ethics and Damasio describes [44-45] how feeling and emotion are necessary to creating self and consciousness.
4 Independent Core Observer Model
The Independent Core Observer Model (ICOM) Cognitive Extension Architecture is designed to mimic the human mental architecture to produce a doubly self-aware, self-motivating computational system encouraged and constrained to act morally, benevolently and charitably. Compatible with almost any standard cognitive architecture that mimics the “logical” portion of the conscious mind, ICOM is an add-on control system that can work in conjunction with any of them as the subconscious supports and, to a great degree, controls the conscious mind – by assigning emotional values to the context perceived by the system in order to dictate how it feels and is motivated to act. As with human beings, any sufficiently advanced system paired with a properly designed instance of ICOM should be capable of virtually any action as long as it is beneficial to itself, others and society at large.
At the highest level, the ICOM flow starts with sensory input to the “observer” that is decomposed and converted into multiple forms of usable data ranging from direct qualia (i.e. “pain” from damage sensors) to “feelings” and “emotions” triggered by anything from “instinct” and “empathy” to memories of previous experiences. These “sensations” are then passed on to the ICOM core which consists of a primary and secondary emotional state, both represented by a series of eight floating- point vector values representing each of Plutchik’s [46-47] emotions, along with a needs hierarchy similar to that of Maslow . This allows for a complex set of current conscious feelings and emotions as well as long-term subconscious emotional states and biases to drive behavior in exactly the same way as is the case for humans.
It’s always great when experiments produce unexpected emergent results that should have been anticipated because they are exhibited in the original system your model is based upon. In ICOM’s case, we were investigating how the system behaved under a wide variety of circumstances, when we encountered a series of cases whose results were initially very disturbing. In these experiments, we were providing the system with various sets of input, then stopping all input while the ICOM instance continued to process how it felt and then, finally, restarting the input.
Imagine our surprise and initial dismay when the system, upon being presented only with pain and other negative stimulus upon the restarting of input, actually “enjoyed” it. Of course, we should have expected this result. Further examination showed that the initial “conscious” reaction of ICOM was to “get upset” and to “desire” the input to stop – but that the “subconscious” level, the system “enjoyed” the input and that this eventually affected the “conscious” perception. This makes perfect sense because it is not that ICOM really “liked” the “pain” so much as it was that even “pain” is better than isolation – much like human children will prefer and even provoke negative reactions in order to avoid being ignored.
6 Future plans
Our next steps will tie ICOM to a dramatically simplified model of the world expressed in Ogden’s Basic English  as further codified by the Simple English Wikipedia . Basic emotional values will be assigned to each word, then short phrases, and then progressively more complex structures expressed as vectors representing context trees and/or node maps. In the longest term, ICOM will eventually be able to determine (and to a limited extent, slowly modify) how it “should” feel about any concept based upon whether it fulfills or contradicts the tenets of morality, benevolence and charitable action. For example, the machine intelligence should be able to eventually learn and feel things like “surgery is a good thing when necessary” despite the temporary pain and damage but will never progress to the point where it “enjoys” surgery except for the “pleasure” of assisting another entity and the satisfaction of a “worthwhile” job well done. Success should make the future very bright for humanity and our new friends and allies.
- Asimov Runaround. Astounding Science Fiction, March 1942.
- Yudkowsky Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures, 2001. https://intelligence.org/files/CFAI.pdf
- Waser Rational Universal Benevolence: Simpler, Safer, and Wiser Than ‘Friendly AI’. Artificial General Intelligence: 4th International Conference, Lecture Notes in Computer Science 6830. Mountain View, CA: Springer, 2011. 153-162.
- Wissner-Gross Alex Wissner-Gross: A new equation for intelligence. November 2013. https://www.ted.com/talks/alex_wissner_gross_a_new_equation_for_intelligence
- Waser Safety and Morality Require the Recognition of Self-Improving Machines As Moral/Justice Patients and Agents.” AISB/IACAP World Congress 2012: Symposium on The Machine Question: AI, Ethics and Moral Responsibility. Birmingham, 2012. 92-96.
- Waser MR. Designing, Implementing and Enforcing a Coherent System of Laws, Ethics & Morals for Intelligent Machines (Including Humans). 6th Annual Int’l Conference on Biologically Inspired Cognitive Architectures, BICA 2015, Procedia Computer Science Lyon: Elsevier, 2015. 106-111.
- Muehlhauser L. Facing the Intelligence San Francisco: Machine Intelligence , 2013.
- Wallach W, Allen C. Moral machines: teaching robots right from wrong. New York: Oxford University Press, 2009.
- Yudkowsky Coherent Extrapolated Volition. 2004. https://intelligence.org/files/CEV.pdf
- Wolchover Concerns of an Artificial Intelligence Pioneer. Quanta Magazine, April 21, 2015.
- Wilson The Moral Sense. New York: Free Press, 1993.
- Axelrod The Evolution of Cooperation. New York: Basic Books, 1984.
- Wright Nonzero: The Logic of Human Destiny. New York: Pantheon, 2000.
- Darcet D, Sornette Cooperation by Evolutionary Feedback Selection in Public Good Experiments. Social Science Research Network, 2006.
- Tomasello Why We Cooperate. Cambridge, MA: MIT Press, 2009.
- Wright The Moral Animal: Why We Are, the Way We Are: The New Science of Evolutionary Psychology. New Pork: Pantheon, 1994.
- de Waal Good Natured: The Origins of Right and Wrong in Humans and Other Animals. Cambridge, MA: Harvard University Press, 1996.
- Hauser Moral Minds: How Nature Designed Our Universal Sense of Right and Wron. New York: HarperCollins/Ecco, 2006.
- de Waal Primates and Philosophers: How Morality Evolved. Princeton, NJ: Princeton University Press, 2006.
- Haidt J, Kesebir Morality. In Handbook of Social Psychology, Fifth Edition, by S Fiske, D Gilbert, & G Lindzey, 797-832. Hoboken NJ: Wiley, 2010.
- Waser Discovering the Foundations of a Universal System of Ethics as a Road to Safe Artificial Intelligence. Technical Report FS-08-04: BICA. Menlo Park, CA: AAAI Press, 2008.
- Omohundro The Basic AI Drives. Artificial General Intelligence 2008: Proceedings of the First AGI Conference, 483-492. Amsterdam: IOS Press, 2008.
- Hardin “The Tragedy of the Commons.” Science 162, 1968: 1243-1248.
- Nussbaum MC. Creating Capabilities: The Human Development Cambridge, MA: Belknap/Harvard University Press, 2011.
- Rawls A Theory of Justice. Cambridge, MA: Harvard University Press, 1971.
- Page The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton, NJ: Princeton University Press, 2008.
- Wilkinson R, Pickett The Spirit Level: Why Greater Equality Makes Societies Stronger. New York: Bloomsbury Press, 2011.
- Waser Does a “Lovely” Have a Slave Mentality?– OR – Why a Super-Intelligent God *WON’T* “Crush Us Like A Bug”. Presented March 28, 2010 at AGI 2010 in Lugano, Switzerland. Video and powerpoint available at http://wisdom.digital/wordpress/archives/2505.
- Haidt J. The Emotional Dog and Its Rational Tail: A Social Intuitionist Approach to Moral Psychological Review 108, 2001: 814-823.
- Minsky The Emotion Machine: Commonsense Thinking, Artificial Intelligence, and the Future of the Human Mind. New York: Simon & Schuster, 2006.
- Hauser M et A Dissociation Between Moral Judgments and Justifications. Mind & Language 22(1), 2007: 1-27.
- Damasio A. Descartes’ Error: Emotion, Reason, & the Human New York: Penguin, 1994.
- Tranel Acquired sociopathy: the development of sociopathic behavior following focal brain damage. Progress in Experimental Personality & Psychopathology Research, 1994: 285-311.
- Mendez MF, Chen AK, Shapira JS, Miller Acquired Sociopathy and Frontotemporal Dementia. Dementia and Geriatric Cognitive Disorders 20, 2005: 99-104.
- Dennett, D “Why you can’t make a computer that feels pain.” Synthese 38 (3), 1978: 415-449.
- Balduzzi D, Tononi G. Qualia: The Geometry of Integrated Information. PLOS Computational Biology 5(8): e1000462, doi:10.1371/journal.pcbi.1000462
- Gomila, A, and A Amengual. “Moral emotions for autonomous agents.” In Handbook of research on synthetic emotions and sociable robotics, 166-180. Hershey: IGI Global, 2009
- McAuliffe, “Disgust made us human.” Aeon. https://aeon.co/essays/how-disgust-made- humans-cooperate-to-build-civilisations.
- Waser MR. Architectural Requirements & Implications of Consciousness, Self, and “Free Will”. Proceedings of the Second Annual Meeting of the BICA Society, BICA 2011, Frontiers in Artificial Intelligence and Applications Arlington, VA: IOS Press, 2011. 438-433
- Waser Safely Crowd-Sourcing Critical Mass for a Self Improving Human-Level Learner/“Seed AI”. Biologically Inspired Cognitive Architectures: Proceedings of the Third Annual Meeting of the BICA Society. Palermo: Springer, 2012. 345-350
- Waser, M Safe/Moral Autopoiesis & Consciousness. International Journal of Machine Consciousness 5(1), 2013: 59-74.
- Waser, M R. Bootstrapping a Structured Self-improving & Safe Autopoietic Self. 5th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2014, Procedia Computer Science 41. Boston: Elsevier, 2014. 134-139.
- Wallach W, Allen C, Franklin Consciousness and Ethics: Artificially Conscious Moral Agents. International Journal of Machine Consciousness 3(1), 2011: 177–192.
- Damasio The feeling of what happens: body and emotion in the making of consciousness. Houghton Mifflin Harcourt, 1999.
- Damasio Self Comes to Mind: Constructing the Conscious Brain. New York: Pantheon, 2010
- Plutchik Emotions and Life: Perspectives from Psychology, Biology, and Evolution . Washington, DC: American Psychological Association, 2002.
- Plutchik R, Kellerman Emotion: theory, research and experience. Vol. 1, Theories of emotion. New York: Academic Press, 1980.
- Maslow A Theory of Human Motivation. Psychological Review 50 (4) , 1943: 370-96.
- Ogden, Basic English: A General Introduction with Rules and Grammar. London: Kegan Paul, Trench, Trubner & Co, 1930.
- Wikimedia Simple English Wikipedia. https://simple.wikipedia.org.