During the last decade, robots have become increasingly ubiquitous. They have moved out of laboratories and been deployed to environments where they have to interact with humans. Examples include public places such as warehouses, hotels, factories, retail stores, and streets, as well as the most anticipated setting, private homes. In these increasingly unstructured environments, the requirements for human-robot interaction and collaboration grow more demanding as the tasks that robots complete become more complex. For people to build trust in robots, there is a pressing need for robots to explain their actions and behaviors explicitly, rather than implicitly and vaguely through cues such as eye gaze or arm movement. This dissertation centers on robot explanations and examines four interconnected aspects of the robot explanation process: what explanations humans prefer, how to generate explanations, how to communicate them explicitly, and how to explain causal information about past actions that is missing because the environment has changed.
The contributions of this work are fourfold. First, in a human-subjects study, we found strong evidence that people prefer verbal explanations coupled with non-verbal cues. For verbal explanations, people prefer that robots get their attention first and then explain concisely, and they are willing to ask only a few follow-up questions for more details. Second, I contributed algorithms that generate high-level and failure explanations from Behavior Trees (BTs), a simple yet powerful method for sequencing robot tasks. We framed BTs into semantic sets to generate explanations from the resulting shallow tree and demonstrated the algorithms with a complex mobile manipulation task and a taxi domain navigation task. BTs were also made dynamically modifiable so that behaviors can be inserted after users’ follow-up questions. Third, we contributed a complete projection mapping implementation for instant and salient robot communication, covering how to choose an off-the-shelf projector, how to calibrate it, the underlying principle, and all the code and files needed to readily integrate the solution into any ROS system. Finally, I investigated physical replays, verbal markers, and projection markers as ways for robots to help people infer missing causal information about a robot’s past actions. We found that a multimodal approach combining physical replay, verbal, and projection markers better supported inference-making, lowered mental workload, and made the robot more trustworthy.