iScience. 2021 Feb 19; 24[2]: 102130.

Summary

Robots are likely to become important social actors in our future and so require more human-like ways of assisting us. We argue that collaboration between humans and robots is fostered by two cognitive skills: intention reading and trust. An agent possessing these abilities would be able to infer the non-verbal intentions of others and to evaluate how likely they are to achieve their goals, thereby understanding what kind and what degree of collaboration they require. For this reason, we propose a developmental artificial cognitive architecture that integrates unsupervised machine learning and probabilistic models to imbue a humanoid robot with intention reading and trusting capabilities. Our experimental results show that the synergistic implementation of these cognitive skills enables the robot to cooperate in a meaningful way, with the intention reading model allowing correct goal prediction and the trust component enhancing the likelihood of a positive outcome for the task.

Subject areas: Artificial Intelligence, Human-Centered Computing, Human-Computer Interaction

Introduction

Human beings are social creatures held together by communal bonds and organized into complex social structures. This tendency to aggregate and to work as part of groups is not to be dismissed as a quirk; rather, it is an important characteristic that has been shown to be at least partially hardwired in our genes [Ebstein et al., 2010]. The ability to collaborate with others to achieve common goals has been one of the key factors for our success as a species.

Researchers in the social sciences agree on distinguishing collaboration from cooperation, as they represent two different types of interaction [Roschelle and Teasley, 1995]. In particular, we refer to “cooperation” when the involved parties work toward a shared goal by solving sub-tasks individually and then assembling their partial results. In contrast, “collaboration” refers to a mutual, coordinated effort in which the participants solve the problem together. Given these definitions, the main difference between cooperation and collaboration is that the latter implies a deeper level of interaction, shared understanding, and coordination [Dillenbourg, 1999].

A body of scientific evidence points toward the early development of collaborative behaviors in human infants, who are able to engage in coordinated actions as early as their first birthday. This ability continues to evolve with time and experience, in parallel with their cognitive development, and by 30 months of age they become able to perform complementary actions [Henderson and Woodward, 2011].

Our hypothesis on collaborative intelligence stems from two premises. Bauer et al. [2008] break the collaboration process into a series of sequential tasks, namely perception, intention estimation, planning, and joint action. In other words, before an agent can collaborate with another, it needs to recognize the pursued goal and select appropriate actions to maximize the chances of a successful outcome. Groom and Nass [2007] argue that trust is an essential component for successfully performing joint activities with common tasks. From these premises, we state that the two cognitive skills essential for successful collaboration are “intention reading” and “trust”.

We refer to intention reading as the ability to understand the goals of other agents based on the observation of their physical cues, for example, body posture, movements, and gaze direction. Generally speaking, humans do not perceive biological motion as meaningless trajectories through space but are able to view it in relation to an end objective [Malle et al., 2001]. The cognitive process of estimating intentions is performed by dividing the observed continuous stream of actions into discrete intervals, which are then individually decoded [Baldwin and Baird, 2001]. By giving us the ability to understand what is happening around us, this skill lays the foundation of social awareness [Woodward et al., 2009], allowing us to reason about the behavior of other agents in our environment and to act accordingly.

Trust shares with intention reading the same importance in scaffolding our social abilities, as it affects every interaction we experience. Mayer et al. [1995] define it as the willingness of the trustor to rely on the actions of the trustee despite the former having no control over the latter. The ability to direct our trust correctly has deep consequences for the success of our relationships, for our personal safety [Das and Teng, 2004], and for team cooperation [Jones and George, 1998].

Neither of these cognitive skills is innate in humans, meaning that newborns do not automatically possess them. Instead, human phylogeny has provided each individual with the tools to develop them over the course of their personal ontogeny: these traits gradually arise during childhood and are refined through social interactions and experiences until reaching full maturity. In particular, intention reading in human beings is facilitated by the mirror neuron system in the brain [Rizzolatti and Craighero, 2004]: a collection of neurons that activate both when the individual executes an action and when he or she observes a similar action being performed by someone else. By mapping visual perception onto the organism's own motor representation, this neurological system enables action understanding and imitation learning [Gallese and Goldman, 1998]. This system is tuned by epigenetic processes during postnatal development [Ferrari et al., 2013], so it is correct to say that intention reading is perfected through experience. This is also confirmed by the fact that children are initially only able to recognize biological motion; with time, they start associating social cues such as biological motion and eye gaze [Tomasello et al., 2005] with goals, and they finally manage to understand the choice of plans [Woodward et al., 2009]. In contrast, the developmental evolution of trust is still under debate. Erikson [1993] theorized the stages of psychosocial development, the first of which is known as the “trust vs mistrust” stage and occurs around the second year of age: during this phase, the child's propensity to trust is directly influenced by the quality of care he or she receives. This happens because infants depend entirely on their caregivers for sustenance, so if their needs are regularly satisfied, they will learn that the world is a secure and trustworthy place, and vice versa if they are not.

Both of these cognitive traits depend on a third one: theory of mind [ToM], the ability to understand that other beings around us possess different sets of mental states, such as beliefs, goals, and desires [Vanderbilt et al., 2011]. Mastery of this capacity is a fundamental requirement for both of the collaborative skills we are analyzing. In particular, intention reading can be performed only if it is possible to determine which desires are driving the actions of another agent, and trust can be estimated only if it is possible to compare beliefs and motivations to verify their alignment with one's own [Premack and Woodruff, 1978]. This dependency is emphasized by the fact that both skills fully mature around the fifth year of age, which is also the age at which ToM fully develops [Tomasello et al., 2005; Vanderbilt et al., 2011; Wellman et al., 2001].

Given the importance of collaborative behavior for humans, it seems natural to transpose its value to artificial agents, in particular to social robots, which are expected to act in human-shaped environments and interact with us on a daily basis. If we aim to design robots able to blend into our present and future societies, a strict requirement will be for them to adapt to our social expectations and fit into our natural environments. In other words, in a future where interactions between humans and robots will be more common, we do not want to robotize people; rather, we hope to make the minds of these mechanical companions a little more human. For this purpose, collaborative intelligence may be one of the most important skills for these agents to possess.

Collaborative intelligence, from a technical perspective, can be defined as a multi-agent system in which each agent has the autonomy to contribute to a problem-solving network [Gill, 2012]. For the purpose of this paper, we are interested in the special case of two agents, a human and a robot, collaborating to complete some task. In this work, we intend to expand the general collaboration architecture for cognitive robotics provided by Bauer et al. [2008] by adding trust estimation between the intention reading and the action planning steps. Our proposed architecture is shown in Figure 1.

Figure: The proposed artificial cognitive architecture, which integrates intention reading and trust mechanisms for the purpose of collaborative intelligence. Please refer to the Transparent methods section of the Supplemental information for the details of each component.

Figure: Overview of the mechanisms leading to joint action, expanded from Bauer et al. [2008] through the addition of trust estimation.

The scientific community has been investigating computational models for artificial intention reading for many years, as this is an important skill for collaborative machines [Vinanzi et al., 2019a]. Some techniques are more common than others: in particular, hidden Markov models [Kelley et al., 2008] and Bayesian networks [BNs] [Dindo et al., 2015] have gained a large consensus, as have machine learning methods such as neural networks [Singh et al., 2016] and support vector machines [Manzi et al., 2017]. Hybrid approaches have also been investigated: for example, Granada et al. [1995] used a neural architecture to extract low-level features from camera images, which are then fed to a probabilistic plan recognizer. The use of embodied agents such as robots for the exploration of intention reading capabilities is promoted by Sciutti et al. [2015], who underline the importance of sharing the same action space with the human partner. Robots have in fact been successfully used to investigate intention understanding and sharing in turn-based games that possess a strong learning-by-demonstration aspect [Dominey and Warneken, 2011; Jansen and Belpaeme, 2006].

Trust has also been extensively researched in the context of human-robot interaction [HRI], the main reason being that the quality of the interaction is usually shaped by how trustworthy the robot appears to the human. This means that even a perfect machine will not be able to perform at its fullest if the human partner is not willing to trust its decisions and actions. This problem has generated a branch of research focused on determining which behavioral and esthetic elements of a robot influence how it is perceived by the people who interact with it; in other words, there is a vast literature on human-centered trust in HRI [Floyd et al., 2014; Zanatto, 2019]. Here, we propose that the opposite, i.e., the trustworthiness of a human as estimated by a robot, is also fundamental during a collaborative activity: just as a robot can fail, so can a person, and it is important to keep this in mind when making decisions aimed at optimizing the achievement of the shared goal. Unfortunately, the literature on this kind of robot-centered trust is scarce. Patacchiola and Cangelosi [2016] proposed a probabilistic model that unifies trust and ToM, used in a simulation of Vanderbilt's experiment on children's willingness to trust [Vanderbilt et al., 2011]. This model was subsequently expanded into a cognitive architecture for a humanoid robot [Vinanzi et al., 2019b] enhanced with an episodic memory system. The latter is a subcategory of long-term declarative memory that stores memories about temporally dated episodes or events and the temporal-spatial relations among them [Tulving, 1972]. This feature is relevant because the positive influence of one's personal history on cognitive capabilities has been demonstrated not only for the biological brain but also for artificial agents [De Castro and Gudwin, 2010; Jockel et al., 2008]. Episodic memory is also the key to reproducing the “trust vs mistrust” stage theorized by Erikson [1993] in a developmental cognitive system.

In this paper, we present an integration of our previous studies on artificial intention reading [Vinanzi et al., 2019a, 2020] and trust estimation [Vinanzi et al., 2019b] to create a collaborative intelligent embodied agent able to direct its efforts toward providing assistance in a shared activity with a human partner. Through the use of this computational model, we aim to demonstrate the positive influence of trust on the synergistic efforts of the two agents. Given this premise, our main contribution comes in the form of the novel artificial cognitive architecture for human-robot collaboration shown in Figure 2, which is capable of performing both intention reading and trust estimation on human partners. To achieve this, we have made use of a set of state-of-the-art techniques ranging from unsupervised machine learning to probabilistic modeling. We have validated this architecture through a set of simulated HRI experiments involving several humans and a robot collaborating in a block placing game. The results we collected demonstrate that the pairing of these two cognitive skills can greatly enhance the outcome of the joint action by providing the robot with decision-making parameters that are used to fine-tune its assistive behavior.

Results

Experiments

Many of the considerations made throughout this section refer directly to the methodology involved in this line of research. For this reason, we invite the reader to consult the Transparent methods section of the Supplemental information to gain better insight into the experiments described below.

Having already validated the performance of our intention reading [Vinanzi et al., 2020] and trust [Vinanzi et al., 2019b] models in our previous publications, the aim of our current experiment is to verify our hypothesis on the positive influence of trust mechanisms on the overall collaborative performance. For this reason, we use the same experimental setup as our previous investigation on robotic mind reading and compare the results achieved by our new, integrated architecture [referred to as the trust architecture, or TA] with the baseline obtained from our previous intention reading model [Vinanzi et al., 2020], which we will hereafter refer to as the no trust architecture, or NTA.

The experimental setup is shown in Figure 3A. A Sawyer robot and a human face each other across a table. Four differently colored blocks are positioned at the corners of the playing area; anticlockwise from the top left, they are blue [B], orange [O], red [R], and green [G]. The central area of the table is denoted as the building space.

Figure 3: Experimental setup for the block building game

[A] Schematic of the playing table, depicting the positions of the 4 colored blocks: blue [B], orange [O], red [R], and green [G].

[B] The 8 admissible block sequences obtained by picking blocks alternately from each side. These sequences constitute the goals for this scenario.

The aim of the game is to use the 4 available blocks to form a line, following a simple rule known only to the demonstrator: the blocks must be chosen one by one, alternating between the two sides of the table [left and right]. The 8 legal combinations of blocks are reported in Figure 3B, and each of them constitutes a goal for our intention reading purposes. During the training phase, the human will demonstrate each goal once and the robot will learn to associate the demonstrator's body posture and eye gaze direction with their intentions. Additionally, the robot will always naively trust its teacher, while the beliefs regarding subsequent partners will be generated using episodic memory. For more details on our adopted methodology, please refer to the Transparent methods section of the Supplemental information.
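
As an illustration of the rule, the following Python sketch enumerates the admissible sequences and checks a candidate line against them. The assignment of blocks to table sides [B and O on the left, R and G on the right] is our reading of the anticlockwise layout in Figure 3A; the code is a minimal sketch under that assumption, not part of the original implementation.

    from itertools import permutations

    # Assumed side assignment, inferred from the anticlockwise layout in Figure 3A.
    SIDE = {"B": "left", "O": "left", "R": "right", "G": "right"}

    def is_legal(sequence):
        """A line is legal if consecutive blocks always come from different sides."""
        return (len(sequence) == 4
                and set(sequence) == set(SIDE)
                and all(SIDE[a] != SIDE[b] for a, b in zip(sequence, sequence[1:])))

    # Enumerating all permutations of the 4 blocks yields exactly the 8 legal goals.
    GOALS = [p for p in permutations(SIDE) if is_legal(p)]
    assert len(GOALS) == 8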

During the execution phase, the robot will follow the workflow described in the Transparent methods [Section S1.3]. In our setting, a total output represents a full line of 4 colored blocks, while the partial output [PO] is the sequence of cubes that the human has arranged before the artificial agent was able to perform intention reading. If the human is trusted or the PO is valid, the robot will collect the next predicted blocks and hand them over to him or her. If not, the robot will position the blocks itself in the building area in what it considers to be the correct order, attempting to rectify the errors that have been committed. In the latter case, the robot will also offer an explanation of why it thinks the PO is invalid [in our experimental setting, this happens when two blocks from the same side of the table are placed next to each other].
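
The branch between handing over blocks and rebuilding the line can be summarized by the sketch below. It reuses the SIDE mapping from the previous sketch; the names trusted, partial_output, and predicted_goal are illustrative placeholders, not identifiers from the actual architecture.

    def partial_is_valid(partial_output):
        """A PO is valid if no two adjacent blocks come from the same table side."""
        return all(SIDE[a] != SIDE[b]
                   for a, b in zip(partial_output, partial_output[1:]))

    def assist(trusted, partial_output, predicted_goal):
        """Decide how the robot helps to complete the line for the predicted goal."""
        remaining = [b for b in predicted_goal if b not in partial_output]
        if trusted or partial_is_valid(partial_output):
            # Cooperative hand-over: the human keeps building the line.
            return {"action": "hand_over", "blocks": remaining}
        # Corrective take-over: the robot rebuilds the line itself and explains why.
        return {"action": "rebuild", "blocks": list(predicted_goal),
                "explanation": "two adjacent blocks were taken from the same side"}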

In the scope of this experiment, an interaction will be considered successful if its outcome is a structure that follows the game's rules, in other words one of those listed in Figure 3B. This holds even if the true goal was not the one predicted by the robot: we do not wish to measure the performance of the intention reading model [which has already been quantified] but rather to evaluate the collaborative effort itself. From here on, we define a “positive” interaction as one in which the human correctly achieves a valid goal and a “negative” one as one in which he or she takes an unsuccessful course of action. The human might violate the rules more or less intentionally, but for our purposes, we consider both cases as failures that lead to a decrease in their trust level.

To verify and measure the trust model's impact on the collaborative effort driven by the intention reading architecture, we have conducted a batch of simulated experiments [the use of virtual agents in a simulated environment was a COVID-19 lockdown contingency choice] using a virtual robot modeled in accordance with the empirical data collected during our latest experiment on intention reading [Vinanzi et al., 2020].

After training the robot, we let it interact with a set of simulated humans with different behavior patterns. It is important to note that in most of these experiments we do not make explicit use of episodic memory. This is because, having familiarized itself only with the demonstrator, the robot would generate a fully trustful network for any novel informant, as it would be sampling episodes from a batch of entirely positive memories. This means that, for the purpose of the simulated experiment, we can simply assume that the robot will naively trust its new informant. Thereafter, we continue not to use the memory system because we do not want our results to depend on the order in which the robot has experienced the users; rather, we want to study how the robot would respond to each user independently. For completeness, one of our simulated humans is initialized with a distrustful BN to simulate the effects of the episodic memory.

We have divided the simulated humans into two groups. The first comprises the “deterministic” agents, which have a fixed behavioral pattern, as follows:

  • H1: always negative;

  • H2: 50% positive, then 50% negative;

  • H3: 50% negative, then 50% positive;

The second group contains the “stochastic” agents: these possess different success-to-failure ratios, but the order of their actions is randomized rather than fixed. In particular, we have the following:

  • H4: 50% success rate;

  • H5: 80% success rate;

  • H6: 20% success rate;

  • H7: 80% success rate, but initialized with a distrustful BN;

The deterministic humans have been tested through a batch of 100 interactions each. For the stochastic ones, we have performed 10 random initializations and, for each of them, executed 100 interactions with the simulated robot. The only exception is H4, for which we performed 20 random initializations due to its high variance. During each test, we recorded the success rate and the opinion value, both of which are described in the following section.
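
As a rough illustration of this evaluation protocol, the sketch below generates the behavior patterns of the simulated partners and runs the interaction batches. The run_interaction callback stands in for one full round of the collaborative game with the cognitive architecture; it, like every other name here, is a hypothetical placeholder rather than code from the released implementation.

    import random

    def deterministic_pattern(kind, n=100):
        """Fixed behavior patterns for H1-H3 (True marks a positive interaction)."""
        if kind == "H1":
            return [False] * n                               # always negative
        if kind == "H2":
            return [True] * (n // 2) + [False] * (n // 2)    # positive half, then negative half
        if kind == "H3":
            return [False] * (n // 2) + [True] * (n // 2)    # negative half, then positive half
        raise ValueError(kind)

    def stochastic_pattern(success_rate, n=100, seed=None):
        """Randomized behavior for H4-H7 with a given success-to-failure ratio."""
        rng = random.Random(seed)
        return [rng.random() < success_rate for _ in range(n)]

    def run_batch(pattern, run_interaction):
        """Run one batch of interactions and return the empirical success rate."""
        outcomes = [run_interaction(positive) for positive in pattern]
        return sum(outcomes) / len(outcomes)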

Evaluation metrics

Success rate

Given a human partner H_i, we define the success rate S as follows:

S[H_i] = \frac{\text{successful goals}}{\text{total interactions}} \in [0, 1]

[Equation 1]

We wish to compare the integrated cognitive architecture with the NTA. To do so, we refer to the success rate calculated on the latter as S^\star[H_i] and formalize the difference between the two systems as follows:

\Delta S[H_i] = S[H_i] - S^\star[H_i]

[Equation 2]

Positive values of \Delta S[H_i] denote a better collaborative performance of our current architecture over the NTA, and vice versa.
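
Both metrics reduce to a few lines of arithmetic; the helper below is a hypothetical convenience wrapper around the recorded outcome logs, not code from the original study.

    def success_rate(outcomes):
        """S[H_i]: fraction of interactions ending in a rule-compliant structure."""
        return sum(outcomes) / len(outcomes)      # value in [0, 1]

    def delta_success(outcomes_ta, outcomes_nta):
        """Delta S[H_i]: gain of the trust architecture over the no trust baseline."""
        return success_rate(outcomes_ta) - success_rate(outcomes_nta)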

Artificial opinion

We define a quantitative index which reflects the willingness of the robot to change its opinion about a partner. For a partner H_i at a certain time step t, this artificial opinion is calculated as follows:

O[H_i, t] = \frac{n_p - n_n}{n_p + n_n} \in [-1, 1]

[Equation 3]

where n_p and n_n indicate, respectively, the number of positive and negative episodes experienced by the robot with partner H_i at time t. We will sometimes use a simpler notation, indicating the opinion of the robot toward a generic partner at a certain time step simply as O[H].

When the robot trusts the person, that is, when PXI[a] > PXI[b], it also holds that O[H] ≥ 0; vice versa, when the BN is distrustful toward them, O[H] < 0.
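
For completeness, a minimal sketch of the opinion computation is given below. The sign relation in the comment mirrors the link with the trust BN stated above; the zero-episode fallback is our assumption, not something specified in the text.

    def artificial_opinion(n_positive, n_negative):
        """O[H, t] = (n_p - n_n) / (n_p + n_n), bounded in [-1, 1]."""
        total = n_positive + n_negative
        if total == 0:
            return 0.0            # no episodes yet: neutral opinion (assumption)
        return (n_positive - n_negative) / total

    # Example: 7 positive and 3 negative episodes give an opinion of 0.4,
    # consistent with a trustful Bayesian network (O[H] >= 0).
    assert artificial_opinion(7, 3) == 0.4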
