Nobuyuki Hanaki, Alan Kirman, and Paul Pezanis-Christou
Paper #: 2017-07-022
Abstract. We examine, experimentally and theoretically, how individuals learn about an undisclosed inter-temporal payoff structure in a very simple multi-armed bandit framework. We propose a baseline reinforcement learning model that allows for pattern recognition and an associated change in the strategy space, as well as three augmented versions that accommodate observational learning from the actions and/or payoffs of another player with whom one is matched. The models reproduce the distributional properties of observed discovery times well. Our study further shows that, once one member of a pair has discovered the hidden pattern, observing another's actions and/or payoffs improves discovery compared to the baseline case.