Selective imitation on the basis of reward function similarity
Published in Proceedings of the 45th Annual Meeting of the Cognitive Science Society, 2023
Recommended citation: Max Taylor-Davies, Stephanie Droop & Christopher G. Lucas. (2023). "Selective imitation on the basis of reward function similarity." Proceedings of the 45th Annual Meeting of the Cognitive Science Society.
We suggest that people preferentially imitate others whose reward functions they judge to be similar to their own. We further argue that such judgments can be made from very sparse or indirect data by leveraging an inductive bias toward positing distinct groups or types of people who share similar reward functions. This bias allows learners to select imitation targets without direct evidence of alignment.
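To make the idea of group-based inference concrete, here is a toy sketch in Python. It is not the model from the paper. It assumes, purely for illustration, two latent groups whose members share a reward function, a single observable binary cue whose frequency differs by group, and a learner who already knows its own group. The names `posterior_same_group`, `choose_imitation_target`, `PRIOR`, and `CUE_PROB`, as well as all numeric values, are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def posterior_same_group(
    cue: int,
    own_group: int,
    prior: Sequence[float],
    cue_prob: Sequence[float],
) -> float:
    """Return P(agent's group == own_group | observed binary cue) via Bayes' rule.

    prior[g]    : assumed prior probability that an agent belongs to group g
    cue_prob[g] : assumed probability that a member of group g shows cue == 1
    """
    # Likelihood of the observed cue under each candidate group.
    likelihoods = [p if cue == 1 else 1.0 - p for p in cue_prob]
    # Unnormalised posterior over groups.
    joint = [pr * lk for pr, lk in zip(prior, likelihoods)]
    return joint[own_group] / sum(joint)


def choose_imitation_target(
    cues: Dict[str, int],
    own_group: int,
    prior: Sequence[float],
    cue_prob: Sequence[float],
) -> Tuple[str, Dict[str, float]]:
    """Pick the agent most likely to share the learner's group.

    Shared group membership stands in for reward function similarity here,
    so no direct evidence about any agent's rewards is used.
    """
    scores = {
        name: posterior_same_group(cue, own_group, prior, cue_prob)
        for name, cue in cues.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical setup: two equally likely groups; the cue is common in group 0.
PRIOR = [0.5, 0.5]
CUE_PROB = [0.9, 0.2]

target, scores = choose_imitation_target(
    {"A": 1, "B": 0}, own_group=0, prior=PRIOR, cue_prob=CUE_PROB
)
print(target, scores)
```

In this setup the learner, who belongs to group 0, favors agent A because A displays the cue that is more common in the learner's own group. A single indirect observation per agent is enough to rank the candidates, which is the sense in which the group prior substitutes for direct evidence of alignment.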