Most research on human-robot handovers focuses on developing comfortable and efficient HRI; few studies have examined handover failures. If a failure occurs at the beginning of the interaction, it prevents the whole handover process and destroys trust.
Here we analyze the underlying reasons why people want explanations in a handover scenario where a robot cannot pick up the object. Results suggest that participants set expectations for their request and that a robot should provide explanations rather than non-verbal cues after failing. Participants also expect that a robot can carry out their handover request and, if it cannot, would like to be able to fix the robot or change the request based on the provided explanations.
Handing an object to someone else is seemingly easy if done by a human, but becomes challenging when a robot is the giver. The handover process can be separated into three phases: approach, signal, and transfer. The robot giver that possesses an object first approaches the human receiver, signals the intent that the robot is ready to hand over the object, and transfers the object to the receiver.
All three phases have been investigated for comfortable and efficient handovers, e.g., approaching a person behind a cluttered table and in terms of arm trajectory and end effector height, object configuration, gaze effects on handover timing [4, 5], and proactive release [6].
However, all of this work assumes that the handover process is successful. There is work on robustness (e.g., robust transfer using force to detect object perturbation [7, 8]), but unrecoverable handover failures remain unexamined. In this paper, we focus on the case where a robot has failed to pick up the object during the approach phase, preventing the whole handover process from happening. In addition, early failures have been shown to hurt humans’ trust in the robot more than middle or later failures [9].
To close the gap, we contribute a qualitative analysis of the textual data we collected during a human-subjects study (N = 372) about robot explanations. In particular, we address the following question: Why would people want the robot to explain after a possession failure?
A. User Study
We conducted an online experiment on Amazon Mechanical Turk to investigate the perceived need for, and the desired content of, robot explanations in a handover failure scenario. 372 participants contributed valid data (Age: 18–74, M = 37; 210 males, 158 females, 3 who preferred not to answer, and 1 transgender person). In the experiment, participants first watched a Baxter robot encounter an unrecoverable pre-handover failure: it was unable to possess a cup when asked to hand it over. Participants were not told the cause of the failure, namely that the cup was slightly out of reach.
We controlled the amount of causal information provided through how the robot executed the task and whether it shook its head (H) or not (X), resulting in 6 conditions. Executions included doing nothing (Clueless, C), looking at the cup (Opaque, O), and, in addition to looking, repeatedly moving its arm towards the cup (Legible, L; see Fig. 1). The latter two used non-verbal cues to hint or convey to participants that the cup was not reachable. All execution videos are available at https://bit.ly/2U6VR0L.
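The six conditions follow from crossing the three execution styles with the two headshake levels. The snippet below is only an illustrative sketch of that crossing: the two-letter labels (CX, CH, OX, OH, LX, LH) are the ones used in the results, while the dictionary names and the short descriptions are assumptions made for readability, not part of the study materials.

```python
from itertools import product

# Execution styles and headshake levels as described in the text.
# The wording of the descriptions is paraphrased for this sketch.
EXECUTIONS = {
    "C": "Clueless (does nothing)",
    "O": "Opaque (looks at the cup)",
    "L": "Legible (looks at the cup and repeatedly reaches for it)",
}
HEADSHAKE = {
    "X": "no headshake",
    "H": "headshake",
}


def build_conditions():
    """Cross 3 execution styles with 2 headshake levels into 6 labels."""
    return {
        e + h: (EXECUTIONS[e], HEADSHAKE[h])
        for e, h in product(EXECUTIONS, HEADSHAKE)
    }


conditions = build_conditions()
for label, (execution, headshake) in conditions.items():
    print(f"{label}: {execution}, {headshake}")
```

Keeping the label construction in one place makes it harder for condition names in the analysis to drift from the ones used when assigning videos.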
Surprisingly, participants reported that the robot should explain regardless of the condition. Without explanations, the non-verbal cues were confusing to participants. The headshake was interpreted as disobeying, whereas the intention of the arm movement was deemed unclear. For explanation content, the robot should explain why it failed, why it disobeyed them during headshake executions without any arm motion, and why it kept moving its arm, i.e., the intention. When the robot did nothing, people wanted to know about its previous behavior. A detailed accounting can be found in Han et al. [10].
B. Qualitative Analysis Approach
Closely related to this work, we also asked why participants would like the robot to explain. To analyze the qualitative data, we coded all open-ended responses. At a high level, we went through all the responses in two full iterations to develop the codes and revise them. Specifically, as we went through the responses, we first coded each with up to 4 codes and generalized the codes as we saw more responses. The generalization process led to frequent review of previous responses, especially during the first hundred. To account for later responses and limited working memory, we went through the responses again after the first iteration. After reading all of the responses, we revised the codes and created new composite codes, each joining codes of very similar meaning in an "or" relationship.
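One way to picture the composite codes is as a many-to-one mapping from fine-grained codes to a shared label, with each response counted at most once per composite. The sketch below is hypothetical: the two composite names appear in the results, but the spellings of the fine-grained codes, the helper names, and the example responses are invented for illustration and do not reproduce the actual codebook or data.

```python
from collections import Counter

# Composite names taken from the results; the fine-grained code spellings
# on the left are illustrative assumptions.
CCC = "Confirm | Correct | Capable"
FIX = "Fix | Troubleshoot | Help | Get Fixed"
COMPOSITES = {
    "confirm": CCC,
    "correct": CCC,
    "capable": CCC,
    "fix": FIX,
    "troubleshoot": FIX,
    "help": FIX,
    "get fixed": FIX,
}


def to_composites(codes):
    """Map one response's codes to a set of composite (or unchanged) codes."""
    return {COMPOSITES.get(code, code) for code in codes}


def count_participants(coded_responses):
    """Count how many responses mention each code, once per response."""
    counts = Counter()
    for codes in coded_responses:
        counts.update(to_composites(codes))
    return counts


# Made-up responses, each carrying up to 4 codes.
responses = [
    ["confirm", "capable"],       # same composite, counted once
    ["fix"],
    ["troubleshoot", "correct"],  # contributes to both composites
    ["unmet expectation"],        # not part of any composite
]
counts = count_participants(responses)
for code, n in counts.most_common():
    print(f"{n}  {code}")
```

Using a set per response reflects the "or" relationship: a participant who mentions both "confirm" and "capable" still counts as one participant for the composite.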
III. RESULTS & ANALYSIS
Of 372 participants, 353 answered the questions. After coding, we had 106 unique codes, about half of which (55, 52%) appeared only once due to the open-ended nature of the question. We measured inter-coder reliability using Cohen’s κ. An independent coder coded 10% of the samples, selected randomly, while the experimenter coded all responses. After merging codes with similar meanings, we achieved a κ value of 0.84, considered almost perfect agreement by Landis and Koch [11].
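For readers unfamiliar with the statistic, Cohen’s κ compares the observed agreement p_o between two coders with the agreement p_e expected by chance from each coder's label frequencies: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch under a simplifying assumption of exactly one code per response (the study allowed up to 4); the function name and the ten example labels are made up and are not the study data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who each assign one label per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_e == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two coders on ten responses (one disagreement).
coder_1 = ["expect"] * 4 + ["fix"] * 3 + ["understand"] * 3
coder_2 = ["expect"] * 4 + ["fix"] * 2 + ["understand"] * 4
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example p_o = 0.9 and p_e = 0.34, so κ = 0.56 / 0.66, roughly 0.85; the value is a property of the invented labels only.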
Ninety-eight participants (26.3%) wanted the robot to explain because the handover failure did not meet their expectation. While there are 20 cases each for the Clueless conditions (CX, CH) and the Opaque condition with headshake (OH), there are only around 10 each for the Opaque condition without headshake (OX) and the Legible conditions (LX, LH). Comparing the Clueless and Opaque conditions shows that participants expect the robot to turn its head towards the cup (OX) but not to shake its head (OH), which explains why the count drops for OX. The takeaway here is that participants expect a successful handover from the robot after their request, and the robot should explain when it cannot achieve one. Comparing the Opaque and Legible conditions, the additional arm movement does not lead to any change when there is no headshake, but the count is halved with a headshake. In that case, however, more participants instead want the robot to explain so that they can fix the robot, or for the reasons in the second composite code, revealing how problems are perceived. The takeaway here is that when the robot cannot complete the task yet exhibits unclear behaviors without explanation, participants interpret them as problems and conclude that the robot needs to be fixed. For other codes, the differences across conditions are not large, usually within 5–10, so we do not discuss them per condition below.
As seen in Fig. 2, the second most frequent code, given by 82 participants (22%), is a composite one: Confirm | Correct | Capable, indicating that the robot should explain in order to confirm that it will do the task, that it will do it correctly, and whether it is capable of finishing the task. Around 53 participants (14.3%) expressed general reasoning: robot explanation helps them better understand the robot. Interestingly, in the fourth code, the composite Fix | Troubleshoot | Help | Get Fixed, 43 participants (11.6%) expressed interest in solving the robot's problem, either by themselves or by contacting the manufacturer of the robot. Related to this, we found that 21 participants (5.6%) would like to correct themselves to make the robot work, coded as Human-Correction | -Correctness.
Due to the open-ended nature of the question, all other codes found in the responses are fragmented and limited to fewer than 10% of participants. Some interesting reasons include understanding the decision-making process and solving future problems. If these reasons were offered explicitly as options, more participants might choose them.
We explored the reasoning behind robot explanation when a robot cannot possess an object for a handover request. Results suggest that participants set expectations and that the robot should not rely only on non-verbal cues but should explain after failing. Participants also showed interest in fixing the robot or correcting themselves after getting robot explanations.
This work has been supported in part by the Office of Naval Research (N00014-18-1-2503).
[1] K. Strabala, M. K. Lee, A. Dragan, J. Forlizzi, S. S. Srinivasa, M. Cakmak, and V. Micelli, “Toward seamless human-robot handovers,” Journal of Human-Robot Interaction, vol. 2, no. 1, pp. 112–132, 2013.
[2] J. Mainprice, M. Gharbi, T. Siméon, and R. Alami, “Sharing effort in planning human-robot handover tasks,” in 2012 IEEE RO-MAN: The 21st IEEE International Symposium on Robot and Human Interactive Communication. IEEE, 2012, pp. 764–770.
[3] M. Cakmak, S. S. Srinivasa, M. K. Lee, J. Forlizzi, and S. Kiesler, “Human preferences for robot-human hand-over configurations,” in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2011, pp. 1986–1993.
[4] A. Moon, D. M. Troniak, B. Gleeson, M. K. Pan, M. Zheng, B. A. Blumer, K. MacLean, and E. A. Croft, “Meet me where I’m gazing: how shared attention gaze affects human-robot handover timing,” in Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, 2014, pp. 334–341.
[5] M. Zheng, A. Moon, E. A. Croft, and M. Q.-H. Meng, “Impacts of robot head gaze on robot-to-human handovers,” International Journal of Social Robotics, vol. 7, no. 5, pp. 783–798, 2015.
[6] Z. Han and H. Yanco, “The effects of proactive release behaviors during human-robot handovers,” in 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2019, pp. 440–448.
[7] A. G. Eguíluz, I. Rano, S. A. Coleman, and T. M. McGinnity, “Towards robot-human reliable hand-over: Continuous detection of object perturbation force direction,” in 2017 26th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN). IEEE, 2017, pp. 510–515.
[8] ——, “Reliable robotic handovers through tactile sensing,” Autonomous Robots, vol. 43, no. 7, pp. 1623–1637, 2019.
[9] M. Desai, P. Kaniarasu, M. Medvedev, A. Steinfeld, and H. Yanco, “Impact of robot failures and feedback on real-time trust,” in 2013 8th ACM/IEEE International Conference on Human-Robot Interaction (HRI). IEEE, 2013, pp. 251–258.
[10] Z. Han, E. Phillips, and H. A. Yanco, “The need for verbal robot explanations and how people would like a robot to explain itself,” ACM Transactions on Human-Robot Interaction, vol. 10, no. 4, 2021.
[11] J. R. Landis and G. G. Koch, “The measurement of observer agreement for categorical data,” Biometrics, pp. 159–174, 1977.