An Investigation of Decision Noise and Horizon-Adaptive Exploration in the Explore-Exploit Dilemma

Item Type text; Electronic Dissertation

Authors Wang, Siyu

Publisher The University of Arizona.

Rights Copyright © is held by the author. Digital access to this material is made possible by the University Libraries, University of Arizona. Further transmission, reproduction, presentation (such as public display or performance) of protected items is prohibited except with permission of the author.


Link to Item http://hdl.handle.net/10150/642134

AN INVESTIGATION OF DECISION NOISE AND HORIZON-ADAPTIVE EXPLORATION IN THE EXPLORE-EXPLOIT DILEMMA

by

Siyu Wang

Copyright © Siyu Wang 2020

A Dissertation Submitted to the Faculty of the

DEPARTMENT OF PSYCHOLOGY

In Partial Fulfillment of the Requirements

For the Degree of

DOCTOR OF PHILOSOPHY

In the Graduate College

THE UNIVERSITY OF ARIZONA

2020


THE UNIVERSITY OF ARIZONA GRADUATE COLLEGE

As members of the Dissertation Committee, we certify that we have read the dissertation prepared by Siyu Wang, titled An Investigation of Decision Noise and Horizon-Adaptive Exploration in the Explore-Exploit Dilemma,

and recommend that it be accepted as fulfilling the dissertation requirement for the Degree of Doctor of Philosophy.

Robert C Wilson    Date: May 29, 2020

John JB Allen    Date: Jun 8, 2020

Lynn Nadel    Date: May 29, 2020

Jessica Andrews-Hanna    Date: May 29, 2020

Final approval and acceptance of this dissertation is contingent upon the candidate’s submission of the final copies of the dissertation to the Graduate College.

I hereby certify that I have read this dissertation prepared under my direction and recommend that it be accepted as fulfilling the dissertation requirement.

Robert C Wilson, Psychology    Date: May 29, 2020

Acknowledgements

I would first like to thank my advisor, Dr. Robert C. Wilson, for being both a great mentor and a role model as a scientist to me. I will also take this opportunity to thank

Dr. Jean-Marc Fellous, my second advisor in graduate school who opened the door of animal research to me.

Dr. Hashem Sadeghiyeh, for being a great colleague and valuable friend.

Jack-Morgan Mizell, Bryan Kromenacker, Dr. Jane Keung, Sarah Cook, Todd Hagen, for being such wonderful and supportive lab mates.

Blaine Harper, Maddie Souder, Kristine Gradisher, Yuxin Qin, Jerry Anderson, Zhuocheng Xiao, for all the training or help you offered in the Fellous lab, and Blake Gerken who helped me significantly in running experiments.

Ali Gilliland, Maggie Calder, Sylvia Zarnescu, Yifei Xiang, Weixi Kang, Zeyi Chen, Hannah Kyllo, Filipa Santos, Carlos Velazquez, Kathleen Ge, undergraduates who worked closely with me on various projects, as well as my fantastic research assistants who worked with me or helped me

run experiments, including Chrysta Andrade, Abigail Foley, Kathryn Kellohen, Daniel Carrera, Vera Thornton, Tausif Chowdhury, Audrey Fierro, Aidan Smith, Gabe Sulser, Nick Adragna, Kailyn Teel, Chelsea Goldberger, Haley Gordy, Julia Sochin, Cami Rivera, Michala Carlson, Colin Lynch, James Barley-Fuentes.

My committee members, Dr. Lynn Nadel, Dr. John Allen, Dr. Jessica Andrews-Hanna, Dr. Ying-hui Chou, for all the support and advice.

Beth Owens, Stephanie O’Donnell, Sarah Winters, for all the help and kindness.

My dear rats, Drs. Scragg Gradisher, Hachi Wang, Tianqi Wang, Ratzo Wang, Rizzo Wang, Gerald Gerken and Twenty Lu, for your significant sacrifice and contribution to science.

In addition, I’d like to thank my family and friends

Randy Spalding, who embraced me as part of his extended family and made Tucson my second home. Jim Cook, Michelle Morden, Patsy Spalding, Nancy Cook, Shubham Jain, Birkan Kolcu, Anault Allihien, Bob Cook, Friederike Almstedt, Cindy Cook, Thayer Keller, Dr. Adam Ussishkin, Dr. Andy Wedel, Aldo Wedel-Ussishkin, Dhruv Gajaria, Prajakta Vaishampayan, Sathyan Padmanabhan, members of this extended family and friends who significantly enriched my life in Tucson.

Yuru Zhu, Dewei Zhang, for standing with me in my ups and downs over the past years.

Mengtian Lu, for being the best friend I can hope for.

My parents, Wei Wang and Yaling Xu for everything.

Dedication

To my dog, Tudou, who passed away recently & my rats, Scragg, Hachi, Tianqi, Ratzo, Rizzo, Gerald, Twenty

Contents

Abstract

1 Introduction
  The explore-exploit tradeoff
  Multi-armed bandit problem
  Exploration strategies in humans and animals
  Current studies
  References

2 The nature of decision noise in random exploration
  Abstract
  Introduction
  Results
  Discussion
  Methods
  References

3 Deep exploration accounts for the stopping threshold and behavioral variability in an optimal stopping task
  Abstract
  Introduction
  Methods
  Results
  Discussion
  References

4 The importance of action in explore-exploit decisions
  Abstract
  Introduction
  Methods
  Results
  Discussion
  References

Abstract

Humans and animals constantly face the tradeoff between exploring something new and exploiting what they have learned to be good. In this dissertation, I studied the properties of the heuristics that humans use to make explore-exploit decisions. In the first study, I examined the nature of the randomness in human behavior that is adaptive in exploration. Human decision making is inherently variable. While this variability is often seen as a sign of sub-optimality in human behavior, recent work suggests that randomness can actually be adaptive. A little randomness in explore-exploit decisions is remarkably effective as it encourages us to explore options we might otherwise ignore. From a modeling perspective, behavioral variability is essentially the variance that cannot be explained by a model and is modeled as the level of decision noise. However, what we have called "decision noise" in previous research could actually just be deterministic components missing from the model, so it is difficult to tell whether decision noise truly arises from a stochastic process. Here we show that, while both random and deterministic noise drive variability in behavior, the noise driving random exploration is predominantly random. In the second study, we further asked where the randomness in behavior comes from. In particular, we examined one candidate theory, known as deep exploration, in which decisions are made through mental simulation and behavioral variability can potentially arise from the stochastic sampling process during that simulation. In the context of a stopping problem, we showed that deep exploration successfully accounts for both the strategic adaptation of the stopping threshold and the adaptation of the level of behavioral variability in the task, suggesting a potential mechanism by which adaptive behavioral variability in human behavior is achieved. In the third study, we examined factors that modulate the adaptation of strategy and behavioral variability to the horizon context in explore-exploit decisions. One key factor in explore-exploit decisions is the planning horizon, i.e. how far ahead one plans when making the decision. Previous work has shown that humans can adapt their level of exploration to the horizon context; specifically, people are more biased towards the less-known option (known as directed exploration) and behave more randomly (known as random exploration) in a longer horizon context. However, Sadeghiyeh et al. (2018) showed that this horizon-adaptive exploration critically depends on how the value information about the options is obtained by participants: participants show horizon-adaptive exploration only when the value information is gained through action-triggered responses (Active version), and do not show horizon adaptation if the information is presented without actions to retrieve it (Passive version). In the Passive version, participants showed no horizon-adaptive directed or random exploration. This is true even if the same participant has played the Active version first. I conducted a series of experiments to further investigate which behavioral factors eliminate horizon-adaptive exploration in the passive condition. This work reveals a more complicated nature of explore-exploit decisions and suggests an influence of action on how subjective utility is computed in the brain.

Introduction

Siyu Wang1

1Department of Psychology, University of Arizona, Tucson AZ USA

The explore-exploit tradeoff

Imagine you are deciding which restaurant to go to for dinner. Would you exploit your favorite restaurant, the one you always enjoy going to, or would you explore a new restaurant that you've never been to? This is an example of the so-called explore-exploit dilemma. Exploiting your favorite restaurant ensures a good meal, but you won't learn anything new, whereas exploring new restaurants has the potential of finding an even better restaurant that you can enjoy for the rest of your life, but at the risk of sometimes getting a bad meal. This type of explore-exploit dilemma is ubiquitous: birds face it when deciding whether to explore and forage for food (Krebs et al., 1978), lab rats face it when deciding whether to explore new routes in a maze to maximize food gain (Jackson et al., 2020), companies face it when deciding between exploiting the current business model and exploring new lines of business (Tushman and O'Reilly, 2011), management teams face it when deciding whether to explore and investigate innovation projects (Ericson and Kastensson, 2011), websites face it when trying to maximize user clicks by deciding whether to exploit links that are known to have high hit rates or explore new links (Agarwal et al., 2009), and recommendation systems face it when deciding whether to recommend novel items to users (Celma, 2016). Balancing exploration and exploitation is crucial to solving all of these problems. The explore-exploit dilemma is in general computationally intractable (see the discussion of optimal solutions below), yet humans and animals succeed in solving these problems all the time. As a result, in recent years there has been significant interest in understanding how humans and animals balance exploration and exploitation.

Multi-armed bandit problem

One family of problems, known as multi-armed bandit problems, has been used extensively to investigate how humans, animals, and machines solve the explore-exploit dilemma. Nearly all of the explore-exploit problems described above can be formally formulated as a variant of the multi-armed bandit problem.

Description of multi-armed bandit problem

The multi-armed bandit problem is an abstraction of the problem that gamblers face when choosing which slot machines to play, how many times to play each slot machine, in which order to play them, and whether to continue with the current slot machine, all in order to maximize their total gain. The original multi-armed bandit problem refers to the following learning problem: you are faced repeatedly with a choice among n different options (or actions, bandits); after each choice you receive a numerical reward drawn from a stationary probability distribution that depends on the action chosen; and your goal is to maximize your expected total reward over some period of time (100 plays, for example, where each play refers to selecting one of the actions) (Sutton and Barto, 2018).

More formally, in the multi-armed bandit problem, at each play (or time point) t you have n actions a_1, a_2, ..., a_n to choose from, and each action leads to a random reward r^t_{a_t} drawn from a stationary probability distribution that depends on the action a_t you select. Your goal is to maximize the total reward you receive over a period of T trials, $\sum_{t=1}^{T} r^t_{a_t}$. In order to do this, you need to explore all the actions to get a good estimate of the expected reward Q(a|S) associated with each action a in environmental state S, and at the same time exploit the action with the best expected reward as much as you can to maximize the total gain. Hence a tradeoff between exploration and exploitation is necessary in solving the multi-armed bandit problem. More broadly, the term multi-armed bandit problem refers to a generalization of the problem described above (for commonly studied versions, see the experimental paradigms listed below).
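
To make this setup concrete, the following minimal Python sketch implements a stationary two-armed Gaussian bandit and accumulates the total reward $\sum_t r^t_{a_t}$ earned by an arbitrary policy over T plays. The means, standard deviation, and the random policy shown here are illustrative placeholders, not settings taken from the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stationary two-armed Gaussian bandit: each action has a fixed mean payout.
means = np.array([45.0, 55.0])   # illustrative mean rewards for actions 0 and 1
sigma = 10.0                     # common reward standard deviation

def play(action):
    """Draw a reward r_a ~ Norm(mu_a, sigma) for the chosen action."""
    return rng.normal(means[action], sigma)

def run(policy, T=100):
    """Play T trials and return the total reward, i.e. the sum over t of r_{a_t}."""
    return sum(play(policy(t)) for t in range(T))

# Example: a policy that chooses at random and so never exploits what it has learned.
random_policy = lambda t: int(rng.integers(2))
print("total reward over 100 plays:", round(run(random_policy), 1))
```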

Optimal solutions to multi-armed bandit problem

In general, the optimal solution to the multi-armed bandit problem is mathematically intractable. The difficulty can be seen by considering the formal solution to the explore-exploit dilemma provided by Bellman (1954). To optimally solve the multi-armed bandit problem, we need to select the action that maximizes the long-term expected total reward. In this case, we compute the expected long-term gain of action a_t in state S at time t, Q_t(a_t|S). The optimal action to take is the a_t that satisfies

$$a_t = \arg\max_a Q_t(a|S) \qquad (1)$$

The difficulty is in estimating the Q_t(a_t|S) function. Q_t(a_t|S) can be written as a sum of two terms: an immediate reward associated with the action (how much I enjoy the meal today at restaurant A), and a predicted future value (how much I will enjoy coming back to restaurant A in the future):

$$Q_t(a_t|S) = r_t(a_t|S) + \gamma \cdot \sum_{S'} T(S'|S, a_t)\, V_{t+1}(S') \qquad (2)$$

Here r_t(a_t|S) is the immediate reward if action a_t is taken in state S. By trying out action a_t in state S a few times (n times), we can estimate this expected gain as $r_t(a_t|S) = \frac{r^1_{a_t} + r^2_{a_t} + \cdots + r^n_{a_t}}{n}$. γ is a discounting factor on future rewards. T(S'|S, a_t) is the state transition probability from S to S' when action a_t is taken. (For example, the amount of cash you have when ordering at a cash-only restaurant will obviously limit the range of dinner choices. Here, the state S refers to the remaining cash you have for food. Ordering a $50 steak at a high-end restaurant will then transfer you from a state of having $70 to a state of having $20, and you will have fewer actions available in the new state with less cash available for your next meal.) T(S'|S, a_t) is sometimes known to the decision maker, but at other times it needs to be learned through experience. V_{t+1}(S') is the expected total future reward from time t + 1 onward when you get to state S':

$$V_{t+1}(S') = \max_{a_{t+1}} Q_{t+1}(a_{t+1}|S') \qquad (3)$$

The future expected value at S' is essentially Q_{t+1}(a_{t+1}|S') with the best possible action a_{t+1} taken in state S' at time t + 1. If the goal is to maximize the total reward in a time period from t = 1 to t = T, then we can compute Q_T(a|S) first (this is easy to compute because we do not need to consider future value at the last time step T, so Q_T(a|S) = r_T(a|S)), and then use backward induction to calculate Q_{T-1}(a|S), ..., Q_1(a|S) (using equations 2 and 3). Using equation 1, we can then select the optimal action a_1 to take right now at t = 1. This process is known as dynamic programming (Bellman, 1954). In order to fully solve for the optimal action a_1, as discussed above, we need to compute Q_t(a|S) at all time points, for all action and state pairs involved. In other words, in order to compute the best action now, we need to simulate all future outcomes for all possible sequences of actions; this becomes mathematically intractable quickly, and impossible to solve if either the action space or the state space is infinite.
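
The backward induction described above can be written out directly when the reward and transition functions of a small problem are fully known. The sketch below uses a made-up three-state, two-action example (all tables are arbitrary, not taken from the dissertation) to compute Q_T first and then apply equations 2 and 3 backwards; the full tables it relies on are exactly what is unavailable, or intractably large, in realistic explore-exploit problems.

```python
import numpy as np

n_states, n_actions, T = 3, 2, 5
gamma = 0.9
rng = np.random.default_rng(1)

# Toy problem: r[s, a] is the immediate reward, trans[s, a, s'] the transition probability.
r = rng.uniform(0, 1, size=(n_states, n_actions))
trans = rng.uniform(0, 1, size=(n_states, n_actions, n_states))
trans /= trans.sum(axis=2, keepdims=True)   # normalize each row into a probability distribution

# Q[t, s, a]: expected total reward from taking a in s at time t and acting optimally afterwards.
Q = np.zeros((T + 1, n_states, n_actions))
Q[T] = r                                    # last step: Q_T(a|S) = r_T(a|S)
for t in range(T - 1, 0, -1):               # backward induction using equations 2 and 3
    V_next = Q[t + 1].max(axis=1)           # V_{t+1}(S') = max_a Q_{t+1}(a|S')
    Q[t] = r + gamma * trans @ V_next       # Q_t(a|S) = r(a|S) + gamma * sum_S' T(S'|S,a) V_{t+1}(S')

best_first_action = Q[1].argmax(axis=1)     # optimal action a_1 for each possible starting state
print("best first action per state:", best_first_action)
```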

Simple heuristics to balance exploration and exploitation in the multi-armed bandit problem

Given that the optimal solution to explore-exploit problems is generally not tractable, approximations and heuristics have been studied for balancing the exploration-exploitation tradeoff and solving the multi-armed bandit problem. Here we list some of the simplest heuristics through which we can balance exploration and exploitation. For illustration purposes, we evaluated each of the heuristics described below using a simple 2-armed Gaussian bandit problem (see the experimental paradigms described later): there are only 2 actions available in a single state, and the rewards associated with the two actions are drawn from Gaussian distributions with different average payouts. We ran 1000 simulations for each heuristic and calculated the percentage of times the best action is taken and the percentage of simulations in which the best action becomes the current exploit option.

ε-greedy policy (Sutton and Barto, 2018)

The simplest way is to explore a small fraction of the time and to exploit the current best option (the action that leads to the largest immediate reward) the rest of the time. In this heuristic, explorative choices are made ε of the time, whereas exploitative choices are made 1 − ε of the time.

Figure 1: Simulation of the ε-greedy policy at different levels of ε

The exploration rate ε controls the balance between exploration and exploitation. When ε = 0.01 is small, among the 1000 simulations the better action has been found after 100 trials only 80% of the time; when ε = 0.2 is large, the algorithm finds the better action by trial 100 more than 95% of the time, but only chooses the best action about 80% of the time because of the high exploration rate of 20%. The best performance comes from ε = 0.1, which is large enough to explore and find the better action while still choosing the better option most of the time. One obvious drawback of this algorithm is that even after the better action has been found, the algorithm keeps exploring at the same rate.
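
A minimal sketch of this kind of ε-greedy simulation is shown below. The reward means, trial count, and number of games are assumed placeholder values, so the exact percentages it prints will not match Figure 1, but the qualitative pattern (too little exploration often fails to find the better action; too much keeps choosing the worse one) should hold.

```python
import numpy as np

def epsilon_greedy_game(epsilon, T=100, means=(45.0, 55.0), sigma=10.0, seed=None):
    """Play one 2-armed Gaussian bandit game with an epsilon-greedy policy and
    return the fraction of trials on which the truly better action was chosen."""
    rng = np.random.default_rng(seed)
    est = np.zeros(2)      # running estimate of each action's mean reward
    count = np.zeros(2)    # how many times each action has been taken
    best = int(np.argmax(means))
    best_taken = 0
    for t in range(T):
        if rng.random() < epsilon:
            a = int(rng.integers(2))     # explore: pick an action at random
        else:
            a = int(np.argmax(est))      # exploit: pick the currently best-looking action
        reward = rng.normal(means[a], sigma)
        count[a] += 1
        est[a] += (reward - est[a]) / count[a]   # incremental sample-mean update
        best_taken += (a == best)
    return best_taken / T

# Average over many simulated games at a few exploration rates.
for eps in (0.01, 0.1, 0.2):
    rate = np.mean([epsilon_greedy_game(eps, seed=s) for s in range(1000)])
    print(f"epsilon = {eps:4.2f}: best action chosen on {rate:.0%} of trials")
```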

Softmax policy (Sutton and Barto, 2018)

Instead of forcing all options to be explored equally, in the softmax policy actions associated with higher immediate rewards are more likely to be chosen, with probability

$$p(a_k|S) = \frac{e^{r(a_k|S)/\sigma}}{\sum_i e^{r(a_i|S)/\sigma}}$$

Here r(a_i|S) is the utility of action a_i, and σ is known as the decision noise and controls the level of exploration (the higher σ is, the less predictable choices are given the expected rewards r(a_i|S) of the actions). In the extreme cases, the policy always exploits if σ = 0 (choices are fully determined by the reward estimates) and always explores (choosing all actions with equal probability) as σ → ∞.

Figure 2: Simulation of softmax policy at different levels of σ

The level of decision noise σ controls the balance between exploration and exploitation. When σ = 0.01 is small, among the 1000 simulations the better action is found after 100 trials less than 80% of the time; when σ = 20 is large, it takes the algorithm only about 30 trials to find the best action more than 90% of the time, but because of the high noise the best action is then taken only 60% of the time. The best performance comes from σ = 10, which is large enough to explore and find the better action while still choosing the better option most of the time once the best action is found. One limitation of the softmax policy is that uncertainty in the r(a|S) estimates does not affect the action selection policy; only the magnitude of the estimate r(a|S) itself matters, with higher r(a|S) giving a higher chance of being selected. So even after all the options have been sampled many times, the softmax algorithm still explores to the same extent. In contrast, Thompson sampling (described below) takes uncertainty in the estimates into consideration, and in Thompson sampling action selection is overall more random when uncertainty in the r(a|S) estimates is high.
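
A comparable sketch for the softmax policy is shown below, again with assumed placeholder values for the reward means and the number of simulated games rather than the exact settings behind Figure 2.

```python
import numpy as np

def softmax_choice(est, sigma, rng):
    """Choose an action with probability proportional to exp(r(a|S) / sigma)."""
    z = est / sigma
    p = np.exp(z - z.max())          # subtract the max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(est), p=p))

def softmax_game(sigma, T=100, means=(45.0, 55.0), reward_sd=10.0, seed=None):
    """One 2-armed Gaussian bandit game under the softmax policy."""
    rng = np.random.default_rng(seed)
    est, count = np.zeros(2), np.zeros(2)
    best = int(np.argmax(means))
    best_taken = 0
    for t in range(T):
        a = softmax_choice(est, sigma, rng)
        reward = rng.normal(means[a], reward_sd)
        count[a] += 1
        est[a] += (reward - est[a]) / count[a]   # incremental sample-mean update
        best_taken += (a == best)
    return best_taken / T

for sig in (0.01, 10.0, 20.0):
    rate = np.mean([softmax_game(sig, seed=s) for s in range(500)])
    print(f"sigma = {sig:5.2f}: best action chosen on {rate:.0%} of trials")
```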

Thompson sampling (Thompson, 1933)

Thompson sampling considers the uncertainty in r(a|S) and keeps track of the probability distribution of r(a|S). In this case, we assume that r(a|S) follows a Gaussian distribution with mean $\hat{r}(a|S)$ and standard deviation σ_a; this is known as the posterior distribution of r(a|S). At time t,

$$r^t_a \sim \mathrm{Norm}\!\left(\hat{r}^t(a|S),\, {\sigma^t_a}^2\right)$$

To select an action, in Thompson sampling a random outcome r^t_a is drawn from the distribution $\mathrm{Norm}(\hat{r}^t(a|S), {\sigma^t_a}^2)$ for every available action a, and the action a_t with the best associated draw r^t_a is selected:

$$a_t = \arg\max_a r^t_a$$

Intuitively, when the uncertainties σ_a are high, there is a higher chance that a currently non-exploit action will be chosen, which encourages exploration, whereas small uncertainties σ_a encourage exploitation. Importantly, as action a gets selected over and over, the uncertainty term σ_a decreases, so there is naturally a transition from a more explorative state to a more exploitative state as play goes on. Once an action a_t is taken, a Kalman filter (Kalman, 1960) is used to update the posterior of r^t(a_t|S) and the uncertainty term σ_{a_t} for the chosen action:

$$r^{t+1}_{a_t} \sim \mathrm{Norm}\!\left(\hat{r}^{t+1}(a_t|S),\, {\sigma^{t+1}_{a_t}}^2\right)$$

In this simplified example,

$$\hat{r}^{t+1}(a_t|S) = \hat{r}^t(a_t|S) + \frac{{\sigma^t_{a_t}}^2}{{\sigma^t_{a_t}}^2 + \sigma_r^2}\left(R^t_{a_t} - \hat{r}^t(a_t|S)\right)$$

$$\frac{1}{{\sigma^{t+1}_{a_t}}^2} = \frac{1}{{\sigma^t_{a_t}}^2} + \frac{1}{\sigma_r^2}$$

Here, R^t_{a_t} is the reward actually received when action a_t was taken at time t, and σ_r is the true standard deviation of the Gaussian bandit (assumed to be known to the agent here). An initial uncertainty σ_0 is defined before the first action is taken. Figure 3 simulates learning for 3 levels of initial uncertainty σ_0.

When σ_0 is small, there is not enough exploration to begin with, the initial outcomes from the two actions largely determine the estimated r(a|S), and the learning agent only finds the better bandit around 70% of the time; when σ_0 is large enough, the agent can quickly find the best bandit and gradually transition from exploration to exploitation. Because σ_a decreases as action a gets selected, a large initial σ_0 will only encourage more exploration (and possibly more errors) at the beginning but will decrease and promote exploitation eventually. Compared to a smaller σ_0 of 5, a larger σ_0 of 100 converges to the best action sooner.
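
The sketch below implements Thompson sampling with the Gaussian posterior and Kalman-style updates described above, using assumed placeholder values for the bandit means and for σ_r; it is meant to illustrate the σ_0-dependent transition from exploration to exploitation rather than to reproduce Figure 3.

```python
import numpy as np

def thompson_game(sigma0, T=100, means=(45.0, 55.0), sigma_r=10.0, seed=None):
    """One 2-armed Gaussian bandit game under Thompson sampling with Kalman updates."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)                       # posterior mean of r(a|S) for each action
    var = np.full(2, float(sigma0) ** 2)   # posterior variance, starting at sigma0^2
    best = int(np.argmax(means))
    best_taken = 0
    for t in range(T):
        draws = rng.normal(mu, np.sqrt(var))     # one sample from each action's posterior
        a = int(np.argmax(draws))                # take the action with the best draw
        reward = rng.normal(means[a], sigma_r)
        gain = var[a] / (var[a] + sigma_r ** 2)              # Kalman gain
        mu[a] += gain * (reward - mu[a])                     # posterior mean update
        var[a] = 1.0 / (1.0 / var[a] + 1.0 / sigma_r ** 2)   # precision update
        best_taken += (a == best)
    return best_taken / T

for s0 in (1.0, 5.0, 100.0):
    rate = np.mean([thompson_game(s0, seed=s) for s in range(500)])
    print(f"sigma0 = {s0:6.1f}: best action chosen on {rate:.0%} of trials")
```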

Figure 3: Simulation of Thompson sampling at different levels of initial uncertainty in the estimation σ0

Upper confidence bound (UCB) (Auer et al., 2002)

The upper confidence bound policy explicitly accounts for differences in uncertainty of the estimated r(a|S): actions with larger uncertainty in the r(a|S) estimate are favored. Instead of selecting the action that maximizes r(a|S), in this policy the action is selected to maximize

$$Q(a|S) = r(a|S) + c \cdot U(a|S)$$

where U(a|S) is the upper bound of confidence (uncertainty) in the estimate of r(a|S), and c is a scale factor. c controls the tradeoff between exploration and exploitation: a larger c favors options with more uncertainty, i.e. actions that have not yet been explored enough, and a small c favors exploitation (c = 0 is pure exploitation). One commonly used formula for U(a|S) is

$$U(a|S) = \sqrt{\frac{\ln n + 1}{n_a}}$$

where n_a is the number of times action a has been taken, and $n = \sum_a n_a$ is the total number of plays.

Figure 4: Simulation of the upper confidence bound policy at different levels of the weight c on the uncertainty term

In this simulation, with a large c (c = 100) the algorithm quickly finds the best action, but it keeps alternating even after figuring out which action is better (the less often chosen option always carries an uncertainty bonus). With c = 0 there is no exploration, so the algorithm depends heavily on the initial outcomes and is only able to detect the best action 75% of the time. With an appropriate c (c = 10), this policy is able to balance exploration and exploitation and achieve better performance.
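
A corresponding sketch of the upper confidence bound policy is shown below. The handling of actions that have never been taken (each action is forced once at the start so n_a > 0) and the reward settings are my assumptions, not details of the simulations behind Figure 4.

```python
import numpy as np

def ucb_game(c, T=100, means=(45.0, 55.0), sigma=10.0, seed=None):
    """One 2-armed Gaussian bandit game under the upper confidence bound policy."""
    rng = np.random.default_rng(seed)
    est = np.zeros(2)        # estimated mean reward r(a|S)
    count = np.zeros(2)      # n_a: number of times each action has been taken
    best = int(np.argmax(means))
    best_taken = 0
    for a in (0, 1):         # assumption: try each action once so the estimates are defined
        est[a] = rng.normal(means[a], sigma)
        count[a] += 1
    for t in range(T - 2):
        n = count.sum()
        bonus = np.sqrt((np.log(n) + 1.0) / count)        # U(a|S) = sqrt((ln n + 1) / n_a)
        a = int(np.argmax(est + c * bonus))               # maximize Q(a|S) = r(a|S) + c * U(a|S)
        reward = rng.normal(means[a], sigma)
        count[a] += 1
        est[a] += (reward - est[a]) / count[a]
        best_taken += (a == best)
    return best_taken / (T - 2)

for c in (0.0, 10.0, 100.0):
    rate = np.mean([ucb_game(c, seed=s) for s in range(500)])
    print(f"c = {c:6.1f}: best action chosen on {rate:.0%} of trials")
```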

Directed and random exploration

In general, there are at least two classes of heuristics for guiding exploration: directed and random exploration. As a generalization of the upper confidence bound policy, directed exploration encourages exploration by biasing choices towards the more uncertain option. Mathematically, this can be formulated as adding an "uncertainty bonus" or "information bonus" to the less known option:

$$Q(a|S) = r(a|S) + IB(a|S)$$

In directed exploration, choices are biased towards the more uncertain option. IB(a|S) here is a function of the uncertainty in the r(a|S) estimate.

As a generalization of the ε-greedy, softmax and Thompson sampling policies, random exploration encourages exploration by increasing behavioral variability. Mathematically, this can be formulated as adding "decision noise" to the value of the options:

$$Q(a|S) = r(a|S) + n(a|S)$$

In random exploration, exploratory choices are driven by chance. n(a|S) is a zero-mean random signal sampled from some probability distribution.
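
The two formulations can be contrasted in a few lines of code. In the sketch below, the reward estimates, uncertainty values, and the bonus and noise scales are all illustrative placeholders; the point is only that the information bonus shifts choices deterministically towards the uncertain option, while zero-mean decision noise flips the choice only occasionally and unpredictably.

```python
import numpy as np

rng = np.random.default_rng(3)

r_est = np.array([55.0, 50.0])       # estimated rewards: option 0 looks better
uncertainty = np.array([0.0, 1.0])   # option 1 is the less well-known option

def choose_directed(info_bonus):
    """Directed exploration: Q(a|S) = r(a|S) + IB(a|S), a deterministic bias
    towards the more uncertain option."""
    Q = r_est + info_bonus * uncertainty
    return int(np.argmax(Q))

def choose_random(noise_sd):
    """Random exploration: Q(a|S) = r(a|S) + n(a|S), zero-mean decision noise
    that occasionally flips the choice towards the lower-valued option."""
    Q = r_est + rng.normal(0.0, noise_sd, size=2)
    return int(np.argmax(Q))

print("directed, bonus = 10: chooses option", choose_directed(10.0))
picks = np.array([choose_random(10.0) for _ in range(1000)])
print(f"random, noise sd = 10: option 1 chosen on {np.mean(picks == 1):.0%} of decisions")
```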

Summary

Although explore-exploit problems are generally computationally intractable (see the discussion of optimal solutions above), simple heuristics that either bias actions towards more uncertain options (directed exploration) or use decision noise to induce variability in action selection (random exploration) are able to balance exploration and exploitation. Humans and animals face and solve explore-exploit problems all the time, and there is significant interest in understanding the algorithms and heuristics that humans and animals use in making explore-exploit decisions.

Exploration strategies in humans and animals

Experimental paradigms of the multi-armed bandit problem

Many variants of the multi-armed bandit task have been designed to investigate how humans and animals solve the explore-exploit dilemma. Some of the most commonly studied paradigms include:

1. Gaussian bandit

Each action a is associated with a stationary Gaussian distribution of rewards with mean μ_a and standard deviation σ_a; every time action a is chosen, a random reward is drawn from the associated Gaussian distribution:

$$r_a \sim \mathrm{Norm}(\mu_a, \sigma_a)$$

2. Binary multi-armed bandit (or Bernoulli multi-armed bandit)

Each action a is associated with a reward of 1 with probability p_a, and 0 otherwise:

$$r_a \sim \mathrm{Bernoulli}(p_a)$$

3. Independent Markov machine (or drifting bandit)

Each time a bandit is played, the underlying reward structure of all bandits shifts to a new state according to Markov state evolution probabilities. Here, the probability distribution of reward associated with each action is no longer stationary but instead drifts in time. A drifting Gaussian bandit, for example, assumes that the underlying mean associated with each action, μ_a, changes over time:

$$r^t_a \sim \mathrm{Norm}(\mu^t_a, \sigma_a)$$

Often it is assumed that the change in μ_a at each time step also follows a Gaussian distribution with mean 0 and a fixed variance σ_0^2, i.e. $\epsilon_t \sim \mathrm{Norm}(0, \sigma_0^2)$ and $\mu^{t+1}_a = \mu^t_a + \epsilon_t$ (a minimal sketch of this generative process is shown after this list).

4. Infinite bandit (Agrawal, 1995)

A multi-armed bandit problem with infinitely many arms (bandits). In this case, there is no way to exhaust exploring all the options. The simplest case of the infinite bandit assumes that each arm (action) is associated with a fixed reward r_a, but there are infinitely many possible actions to choose from.

5. Contextual bandit

In this class of bandit problems, the reward associated with each action, r(a|S), depends on the state of the environment, and there is more than one state involved in the task. For example, the pizza in restaurant A is great but the pasta in restaurant A is bad, whereas in restaurant B the pizza is bad and the pasta is good. So the reward associated with choosing either pasta or pizza is restaurant dependent.
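
As referenced in the drifting bandit entry above, here is a minimal sketch of that generative process; the starting means, the drift and reward standard deviations, and the trial count are illustrative assumptions.

```python
import numpy as np

def drifting_gaussian_bandit(T=200, n_arms=2, sigma_a=4.0, sigma_0=2.0, seed=0):
    """Generate rewards from a drifting Gaussian bandit: each arm's mean mu_a follows
    a Gaussian random walk, mu_{t+1} = mu_t + eps_t with eps_t ~ Norm(0, sigma_0^2),
    and the reward available from each arm is r_t ~ Norm(mu_t, sigma_a)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(30.0, 70.0, size=n_arms)   # illustrative starting means
    means = np.zeros((T, n_arms))
    rewards = np.zeros((T, n_arms))
    for t in range(T):
        means[t] = mu
        rewards[t] = rng.normal(mu, sigma_a)                  # rewards on trial t
        mu = mu + rng.normal(0.0, sigma_0, size=n_arms)       # the random-walk step
    return means, rewards

means, rewards = drifting_gaussian_bandit()
print("arm means at start:", means[0].round(1), "and at end:", means[-1].round(1))
```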

Directed and random exploration in humans and animals

Because the optimal solution to explore-exploit decisions is computationally intractable, humans and animals are thought to use approximations or heuristics in making explore-exploit decisions.

Directed exploration is information driven: action is biased towards the more uncertain option, and directed exploration is often quantified as an "information bonus" (see above). A number of studies have reported observing an information bonus in humans (Banks et al., 1997, Frank et al., 2009, Lee et al., 2011, Meyer and Shi, 1995, Payzan-LeNestour and Bossaerts, 2012, Steyvers et al., 2009, Zhang and Yu, 2013) as well as in animals (Krebs et al., 1978), whereas other studies failed to observe such an uncertainty bonus (Daw et al., 2006, Payzan-Lenestour and Bossaerts, 2011). Wilson et al. (2014) pointed out that in many studies uncertainty and reward are confounded: higher-reward options are sampled more and hence have less uncertainty. By manipulating reward and uncertainty independently, Wilson et al. (2014) were able to confirm that humans use directed exploration in solving the explore-exploit dilemma.

Random exploration is variability driven: actions with lower estimated reward are sometimes sampled by chance, and random exploration is often quantified as "decision noise" (see above). There is evidence that humans and animals also use random exploration. For example, songbirds produce more variable songs during practice periods (exploration phase) than when singing to a female bird (exploitation phase) (Brainard and Doupe, 2002, Kao et al., 2005). It has been shown that humans also use random exploration (Gershman, 2018, Wilson et al., 2014). In addition, Gershman (2018, 2019) showed that relative uncertainty modulates directed exploration whereas total uncertainty modulates random exploration.

Effect of horizon on exploration

One factor that has been shown to modulate exploration is the time horizon. In the restaurant example, if you are leaving town tomorrow and this is your last meal, you probably want to exploit and make sure that you have a good last meal; however, if you are new in town, you are more likely to choose to explore. When choosing between two food stations (a binary bandit task in which each bandit has a fixed probability of giving a reward), great tits explore more in their first 50 choices in the long horizon condition, when they have a total of 250 choices to make, than in the short horizon condition, where the total number of choices is 50 (Kacelnik, 1979). In a 2-armed bandit task, humans have been shown to be more biased towards the uncertain option (directed exploration) and more random in their behavior (random exploration) in a long horizon context compared to a short horizon context (Wilson et al., 2014). However, none of the above-mentioned algorithms (UCB, ε-greedy, softmax or Thompson sampling) alone can account for this horizon adaptation. Wilson et al. (2020) proposed that deep exploration (Osband et al., 2016), the idea of making decisions through mental simulation, can provide a unifying account of horizon-adaptive directed and random exploration.

Neural correlates of directed vs random exploration

The frontopolar cortex (FPC) and intraparietal sulcus (IPS) have been linked to exploration in several fMRI studies (Daw et al., 2006, Laureiro-Martínez et al., 2014). Consistent with this, EEG studies showed increased bilateral activation in frontal and parietal areas during exploration (Bourdaud et al., 2008). In addition, Badre et al. (2012) showed that rostrolateral PFC activity is correlated with the information bonus in directed exploration; consistently, Cavanagh et al. (2012) showed that frontal and parietal theta oscillations in EEG correlate with the information bonus. Recent studies suggest that directed and random exploration may rely on dissociable neural systems. Tomov et al. (2020) showed that relative uncertainty (which relates to directed exploration) and total uncertainty (which relates to random exploration) are represented in right rostrolateral prefrontal cortex and right dorsolateral prefrontal cortex respectively. Transcranial magnetic stimulation that inhibits the right frontopolar cortex selectively inhibits directed exploration but not random exploration (Zajkowski et al., 2017).

Current studies

This dissertation consists of three separate studies.

Experiment 1: Is random exploration truly random?

In random exploration, having variability in behavior is beneficial for exploration; however, we ask whether this behavioral variability is stochastic or whether it comes from a deterministic source. From a modeling perspective, behavioral variability is essentially the variance that cannot be explained by a model and is modeled as the level of decision noise. However, what we have called "decision noise" in previous research could actually just be deterministic components missing from the model, so it is difficult to tell whether decision noise truly arises from a stochastic process. In this experiment, we investigate which source of noise, deterministic vs stochastic, drives random exploration in humans in a modified version of the Horizon Task (Wilson et al., 2014). To distinguish between the two types of noise, we have people make the exact same explore-exploit decision twice. If decision noise is purely deterministic, then people's choices should be identical both times, that is, their choices should be consistent, since the stimulus is the same both times. Meanwhile, if decision noise is truly random, their choices should be less consistent, since random noise can differ between the two occasions. Through both model-free and model-based estimation, we can quantify the magnitude of both random and deterministic noise in driving random exploration.

22 Experiment 2: What accounts for horizon-adaptive behavioral variability?

Here we test the theory of "deep exploration" (Osband et al., 2016) in making explore-exploit decisions. Deep exploration states that people make decisions by simulating future events and the outcomes of actions. We ask whether this mechanism can naturally give rise to behavioral variability in people's choices that is adaptive to the horizon context. We designed a simple card game that we refer to as the Card Stopping Task. In this task, participants are presented with a row of 5 or 10 face-down cards. Each card can have any number from 1 to 100 (uniformly distributed), which represents the amount of reward available if that card is chosen. On each trial one of the cards is flipped and participants must decide whether to accept or reject this card. If they accept the card the game stops and they receive a points reward equal to the value of the accepted card. If they reject the card, the next card in the sequence is flipped and the process repeats. A key factor in the Card Stopping Task is the 'horizon', the number of face-down cards remaining, which plays a central role in deciding whether to stop. Behavior in the Card Stopping Task can be quantified with two parameters: the stopping threshold, i.e. the card value above which people are more likely to stop than continue, and the decision noise, i.e. the variability in the stopping threshold. Preliminary fits of these parameters to the behavioral data suggested that as the horizon decreases (1) the stopping threshold decreases and (2) the decision noise increases. That is, as the game goes on, participants are more likely to accept low-valued cards, but are also more random in their choices. By fitting a deep exploration model with two free parameters, (1) the number of samples and (2) the planning horizon, we ask whether deep exploration can account for the horizon-dependent changes in stopping threshold and decision noise.
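
To illustrate the intuition (and only the intuition) behind this idea, the sketch below is my own toy implementation of decision-by-mental-simulation in a Card Stopping style task: the value of rejecting a card is estimated by sampling imagined future cards, so a small number of samples yields a noisy effective threshold, while the threshold itself falls as fewer cards remain. The treatment of the final card, the default of three samples, and all names here are assumptions, not the deep exploration model evaluated in this dissertation.

```python
import random

def value_of_rejecting(h, n_samples=3):
    """Monte-Carlo ('mental simulation') estimate of the value of rejecting the current
    card when h face-down cards would remain after rejecting it. Each imagined future
    card is itself accepted or rejected by the same rule. Cost grows like n_samples**h,
    so keep both small."""
    if h == 1:
        return 50.5                      # only the last card would remain, and it must be accepted
    total = 0.0
    for _ in range(n_samples):
        imagined = random.randint(1, 100)                        # simulate flipping the next card
        total += max(imagined, value_of_rejecting(h - 1, n_samples))
    return total / n_samples

def accept(card_value, h, n_samples=3):
    """Accept the flipped card if it beats the simulated value of continuing.
    h is the number of face-down cards remaining; if none remain, accepting is forced."""
    if h == 0:
        return True
    return card_value >= value_of_rejecting(h, n_samples)

# The simulated stopping threshold falls as fewer cards remain, and with only a few
# samples each individual estimate of it is noisy.
for h in (9, 4, 2):
    estimates = [value_of_rejecting(h) for _ in range(20)]
    print(f"{h} cards left: mean threshold {sum(estimates) / len(estimates):5.1f}, "
          f"spread {max(estimates) - min(estimates):4.1f}")
```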

Experiment 3: Resolving a failed replication, factors that modulate horizon-adaptive exploration

Sadeghiyeh et al. (2018) found that whether reward information is gained through action (active condition) or presented passively (passive condition) can moderate horizon-dependent exploration. The same information, presented either actively (requiring an action to reveal it) or passively, can drastically change how participants' exploration depends on the horizon context. Participants no longer show horizon-adaptive directed or random exploration in the passive presentation condition, whereas they do in the active condition.

In the active version of the Horizon Task, participants are instructed to choose between two one-armed bandits that give out random rewards from different Gaussian distributions whose means are initially unknown. Sometimes they need to make 1 choice (short horizon) and sometimes 6 choices (long horizon). To give participants some information about the relative value of the two bandits before they make their own decisions, in the original task participants are instructed to press the arrow keys according to a preset sequence to reveal some sample outcomes from the two bandits; these are referred to as "sample plays". After the sample plays, on their first free choice, people are more biased towards the less-known option (directed exploration) and behave more randomly (random exploration) in the longer horizon (Wilson et al., 2014). In the passive version of the Horizon Task, instead of actively pressing arrow keys to reveal the outcomes of the sample plays, all of the sample outcomes are presented passively to participants without key presses. In this Passive condition, people show no horizon-dependent directed or random exploration. This is true even if the same participant has played the Active version, i.e. the original Horizon Task, first (Sadeghiyeh et al., 2018). In this study, we investigate what factors drive this dichotomy in horizon-dependent exploration behavior using different variants of the Horizon Task; specifically, we tested how timing, sequential order, motor response and attention modulate horizon-adaptive exploration.

References

Agarwal, D., Chen, B. C., and Elango, P. (2009). Explore/exploit schemes for web content optimization. In Proceedings - IEEE International Conference on Data Mining, ICDM.

Agrawal, R. (1995). The Continuum-Armed Bandit Problem. SIAM Journal on Control and Optimization.

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning.

Badre, D., Doll, B. B., Long, N. M., and Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron.

Banks, J., Olson, M., and Porter, D. (1997). An experimental analysis of the bandit problem. Economic Theory.

Bellman, R. (1954). The Theory of Dynamic Programming. Bulletin of the American Mathematical Society.

Bourdaud, N., Chavarriaga, R., Galán, F., and Millán, J. D. R. (2008). Characterizing the EEG correlates of exploratory behavior. IEEE Transactions on Neural Systems and Rehabilitation Engineering.

Brainard, M. S. and Doupe, A. J. (2002). What songbirds teach us about learning. Nature, 417(6886):351–358.

Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., and Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral Cortex.

Celma, O. (2016). The Exploit-Explore Dilemma in Music Recommendation.

Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., and Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature.

Ericson, Å. and Kastensson, Å. (2011). Exploit and explore: Two ways of categorizing innovation projects. In ICED 11 - 18th International Conference on Engineering Design - Impacting Society Through Engineering Design.

Frank, M. J., Doll, B. B., Oas-Terpstra, J., and Moreno, F. (2009). Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nature Neuroscience.

Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173:34–42.

Gershman, S. J. (2019). Uncertainty and exploration. Decision.

Jackson, B. J., Fatima, G. L., Oh, S., and Gire, D. H. (2020). Many paths to the same goal: balancing exploration and exploitation during probabilistic route planning. eneuro.

Kacelnik, A. (1979). Studies of foraging, behaviour and time budgeting in great tits (Parus major). University of Oxford, PhD dissertation.

Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Fluids Engineering, Transactions of the ASME.

Kao, M. H., Doupe, A. J., and Brainard, M. S. (2005). Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. Nature, 433(7026):638–643.

Krebs, J. R., Kacelnik, A., and Taylor, P. (1978). Test of optimal sampling by foraging great tits. Nature, 275(5675):27–31.

Laureiro-Martínez, D., Canessa, N., Brusoni, S., Zollo, M., Hare, T., Alemanno, F., and Cappa, S. F. (2014). Frontopolar cortex and decision-making efficiency: Comparing brain activity of experts with different professional background during an exploration-exploitation task. Frontiers in Human Neuroscience.

Lee, M. D., Zhang, S., Munro, M., and Steyvers, M. (2011). Psychological models of human and optimal performance in bandit problems. Cognitive Systems Research.

Meyer, R. J. and Shi, Y. (1995). Sequential Choice Under Ambiguity: Intuitive Solutions to the Armed- Bandit Problem. Management Science.

Osband, I., Blundell, C., Pritzel, A., and Van Roy, B. (2016). Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems.

Payzan-Lenestour, E. and Bossaerts, P. (2011). Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Computational Biology.

Payzan-LeNestour, É. and Bossaerts, P. (2012). Do not bet on the unknown versus try to find out more: Estimation uncertainty and "unexpected uncertainty" both modulate exploration. Frontiers in Neuroscience.

Sadeghiyeh, H., Wang, S., and Wilson, R. C. (2018). Lessons from a "failed" replication: The importance of taking action in exploration. PsyArXiv.

Steyvers, M., Lee, M. D., and Wagenmakers, E. J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology.

Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA.

Thompson, W. R. (1933). On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples. Biometrika.

Tomov, M. S., Truong, V. Q., Hundia, R. A., and Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature Communications.

Tushman, M. L. and O'Reilly, C. A. (2011). Organizational Ambidexterity in Action: How Managers Explore and Exploit. California Management Review.

Wilson, R. C., Geana, A., White, J. M., Ludvig, E. A., and Cohen, J. D. (2014). Humans use directed and random exploration to solve the explore-exploit dilemma. Journal of Experimental Psychology: General.

Wilson, R. C., Wang, S., Sadeghiyeh, H., and Cohen, J. D. (2020). Deep exploration as a unifying account of explore-exploit behavior.

Zajkowski, W. K., Kossut, M., and Wilson, R. C. (2017). A causal role for right frontopolar cortex in directed, but not random, exploration. eLife, 6:1–18.

Zhang, S. and Yu, A. J. (2013). Forgetful Bayes and myopic planning: Human learning and decision- making in a bandit setting. In Advances in Neural Information Processing Systems.

The nature of decision noise in random exploration

Siyu Wang1 and Robert C. Wilson1,2

1Department of Psychology, University of Arizona, Tucson AZ USA 2Cognitive Science Program, University of Arizona, Tucson AZ USA

Abstract

Human decision making is inherently variable. While this variability is often seen as a sign of sub-optimality in human behavior, recent work suggests that randomness can actually be adaptive. An example arises when we must choose between exploring unknown options or exploiting options we know well. A little randomness in these 'explore-exploit' decisions is remarkably effective as it encourages us to explore options we might otherwise ignore. Moreover, people appear to use such 'random exploration' in practice, increasing their behavioral variability when it is more valuable to explore. From a modeling perspective, behavioral variability is essentially the variance that cannot be explained by a model and is modeled as the level of decision noise. However, what we have called "decision noise" in previous research could actually just be deterministic components missing from the model, so it is difficult to tell whether decision noise truly arises from a stochastic process. Here we show that, while both random and deterministic noise drive variability in behavior, the noise driving random exploration is predominantly random. This suggests that random exploration depends on adaptive noise processes in the brain which are subject to cognitive control.

Introduction

Imagine trying to decide where to go to dinner. You can go to your favorite restaurant, the one you really enjoy and always go to, or you can try a new restaurant that you know nothing about. Such decisions, in which we must choose between a well-known 'exploit' option and a lesser known 'explore' option, are known as explore-exploit decisions. From a theoretical perspective, making optimal explore-exploit choices, i.e. choices that maximize long-term reward, is computationally intractable in most cases (Basu et al., 2018, Gittins and Jones, 1974). In part because of this difficulty, there is considerable interest in how humans and animals solve the explore-exploit dilemma in practice (Auer et al., 2002, Banks et al., 1997, Bridle, 1990, Daw et al., 2006, Frank et al., 2009, Gittins, 1979, Krebs et al., 1978, Lee et al., 2011, Meyer and Shi, 1995, Payzan-LeNestour and Bossaerts, 2011, Payzan-Lenestour and Bossaerts, 2012, Steyvers et al., 2009, Thompson, 1933, Watkins, 1989, Wilson et al., 2014, Zhang and Yu, 2013).

One particularly effective strategy for solving the explore-exploit dilemma is choice randomization (Bridle, 1990, Thompson, 1933, Watkins, 1989). In this strategy, the decision process between exploration and exploitation is corrupted by 'decision noise,' meaning that high value 'exploit' options are not always chosen and exploratory choices are sometimes made by chance. In theory, such 'random exploration' is surprisingly effective and, if implemented correctly, can come close to optimal performance (Agrawal and Goyal, 2011, Bridle, 1990, Chapelle and Li, 2011, Thompson, 1933). It has recently been shown that humans appear to actually use random exploration, actively adapting their decision noise to solve simple explore-exploit problems and increasing that noise when it is more beneficial to explore (Gershman, 2018, Wilson et al., 2014). In one of these tasks, known as the Horizon Task, the key manipulation is the horizon condition, i.e. the number of decisions remaining for the participant to make. Increasing the horizon makes exploration more valuable as there is more time to use the information gained by exploration to maximize future rewards. For example, if you are leaving town tomorrow (short horizon), you will probably exploit the restaurant you know and love, but if you are in town for a while (long horizon), you would be more likely to explore the new restaurant. Using such a horizon manipulation it has been shown that people's behavior is more variable in long horizons than short horizons, suggesting that they use adaptive decision noise to solve the explore-exploit dilemma (Wilson et al., 2014).

It is, however, difficult in these tasks to tell whether what is measured as decision noise is really random; what we have called 'noise' in previous research could actually just be deterministic components missing from the model. Decision noise as defined in previous research is more or less a quantification of what is not predictable by the model. In the restaurant case, an example of deterministic noise would be if you happen to spot an old friend walking into one of the restaurants. If the model does not account for the agent deterministically favoring the behavior of following a friend, the behavior of going to a less favorable restaurant because of a friend will appear to be 'random' when it is really a deterministic effect; hence this deterministic factor will be modeled as random decision noise. Crucially, however, this 'deterministic noise' is very much in the stimulus, and if you saw the same friend go into the same restaurant at a later date you might follow them again. Conversely, truly 'random noise' would arise from stochastic mental processes tossing a metaphorical coin in your head. Such a process would not be influenced by the friend going into the restaurant, and if you saw the same friend again, you might make a different choice.

In this paper, we investigate which source of noise, deterministic vs random, drives random exploration in humans in a modified version of the Horizon Task. To distinguish between the two types of noise, we had people make the exact same explore-exploit decision twice. If decision noise is purely deterministic noise, then people's choices should be identical both times, that is, their choices should be consistent, since the stimulus is the same both times. Meanwhile, if decision noise is truly random their choices should be less consistent, since random noise can be different both times. By analyzing behavior on this task in both a model-free and model-based manner, we show that, while both types of noise are present in explore-exploit decisions, the variability related to random exploration is dominated by random noise. The missing deterministic component is much smaller than the non-deterministic component in random exploration.

Results

The Repeated-Games Horizon Task

We used a modified version of the ‘Horizon Task’ (Wilson et al., 2014) to show the influence of random vs deterministic noise on people’s decisions (Figure 1). In this task, participants make repeated choices between two slot machines, or ‘one-armed bandits,’ that pay out probabilistic rewards. Because they are initially unsure as to the mean payoff of each bandit, this task requires that participants carefully balance exploration of the lesser known bandit with exploitation of the better known bandit to maximize their

overall rewards. Crucially, before people make their first choice in the Horizon Task, they are given information about the mean payoff from each bandit in the form of four example plays distributed either unequally between bandits (i.e. 1 play of one bandit, 3 plays of the other, the [1 3] condition) or equally (2 plays each, the [2 2] condition). These example plays allow us to manipulate exactly what people know about each option before they make their first choice. Thus, by giving people the exact same example plays twice in two separate games (separated by several minutes in time so as to avoid detection), the example plays allow us to probe how participants respond to the exact same explore-exploit choice twice. These 'repeated games' are the key manipulation in this paper and allow us to distinguish between deterministic and random sources of noise. Specifically, if noise is deterministically driven, then choices on repeated games should be consistent. Conversely, if noise is randomly driven, then choices on repeated games should be independent and can be inconsistent.

Both behavioral variability and information seeking increase with horizon

Before discussing the results for repeated games, we first confirm that the basic behavior in this task is consistent with our previously reported results (Wilson et al., 2014). As in our previous work, we find evidence for two types of exploration in the Horizon Task: random exploration, which is the main focus of this paper, in which exploration is driven by noise, and directed exploration, in which exploration is driven by information. Random exploration is quantified in a model-free way as the probability of choosing the low mean option, p(low mean). This value increases with horizon in both conditions, consistent with the idea that behavior is more random in horizon 6 (t(64) = 6.55, p < 0.001 for [1 3], t(64) = 7.99, p < 0.001 for [2 2]). Directed exploration is measured as the probability of choosing the more informative option, p(high info), in the unequal, or [1 3], condition. Again this measure increases with horizon, showing that people are more information seeking in horizon 6 (t(64) = 6.92, p < 0.001).


Figure 1: Schematic of the experiment. (A) Dynamics of an example horizon 6 game. Here the first four trials are forced trials in which participants are instructed which option to play. After the forced trials, participants are free to choose between the two options for the remainder of the game. (B) Different possible states of the game after the first free choice over the course of the experiment. Overall participants play about 160 such games, with varying horizon (1 vs 6), uncertainty condition ([1 3] vs [2 2]) and observed rewards. In addition, all games are repeated (as Game 18 and 100 are here) such that participants will be faced with the exact same pattern of forced trials and exact same outcomes from those forced trials twice within each experiment. These repeated games allow us to compute the relative contribution of deterministic and random noise by analyzing the extent to which choices are consistent across the repeated games.

33 Figure 2: Replication of previous findings. Both p(low mean) (A) and p(high info) (B) increase with horizon suggesting that people use both random and directed exploration in this task.

Model-free analysis shows that random exploration may involve both random and deterministic noise

Next we asked whether participants' choices were consistent or inconsistent across the two repetitions of each game. The idea behind this measure is that purely deterministic noise should lead to consistent choices, as the deterministic stimulus is identical both times. Conversely, purely random noise should lead to independent choices, and hence to choices that are sometimes inconsistent across the two repetitions. To quantify choice inconsistency we computed the frequency with which participants made different responses for pairs of repeated games (Figure 3). Using this measure we found that participants made inconsistent choices in both the unequal ([1 3]) and equal ([2 2]) information conditions, suggesting that not all of the noise was deterministic (t-tests vs zero revealed that inconsistency was greater than zero for all horizon and uncertainty conditions: for the [1 3] condition, t(64) = 13.72, p < 0.001 for horizon 1 and t(64) = 16.71, p < 0.001 for horizon 6; for the [2 2] condition, t(64) = 9.55, p < 0.001 for horizon 1 and t(64) = 17.93, p < 0.001 for horizon 6). In addition, we found that choice inconsistency was higher in horizon 6 than in horizon 1 for both the [1 3] (t(64) = 5.41, p < 0.001) and [2 2] (t(64) = 6.26, p < 0.001) conditions, suggesting that at least some of the horizon-dependent noise is random.

34 Figure 3: Model-free analysis suggests that both deterministic and random noise contribute to the choice variability in random exploration. For both the [1 3] (A) and [2 2] (B) condition, people show greater choice inconsistency in horizon 6 than horizon 1. However, the extent to which their choices are inconsistent lies between what is predicted by purely deterministic and random noise, suggesting that both noise sources influence the decision.

To gain more quantitative insight into these results, we computed theoretical values for the choice inconsistency for the purely deterministic and purely random noise cases. For purely deterministic noise this computation is simple because people should make the exact same decisions each time in repeated games, meaning that p(inconsistent) = 0 in this case. For purely random noise, the two games should be treated independently, allowing us to compute the choice inconsistency in terms of the probability of choosing the low mean option, p(low mean), as

$$p(\text{consistent}) = p(\text{low mean})^2 + p(\text{high mean})^2 = p(\text{low mean})^2 + (1 - p(\text{low mean}))^2$$

hence,

$$p(\text{inconsistent}) = 1 - p(\text{consistent}) = 2\, p(\text{low mean})\,(1 - p(\text{low mean}))$$

As shown in Figure 3, people's behavior falls in between the pure deterministic noise prediction and the pure random noise prediction (behavior differs from the pure random noise prediction in both the [1 3] condition, t(64) = 8.66, p < 0.001 for horizon 1, t(64) = 9.48, p < 0.001 for horizon 6, and the [2 2] condition, t(64) = 6.94, p < 0.001 for horizon 1, t(64) = 7.47, p < 0.001 for horizon 6; and behavior differs from the pure deterministic noise prediction in both the [1 3] condition, t(64) = 13.72, p < 0.001 for horizon 1, t(64) = 16.71, p < 0.001 for horizon 6, and the [2 2] condition, t(64) = 9.55, p < 0.001 for horizon 1, t(64) = 17.93, p < 0.001 for horizon 6), suggesting that both deterministic and random noise contribute to driving this choice inconsistency. Since choice inconsistency only reflects random noise, Figure 3 suggests that random noise increases with horizon.
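
The pure-random-noise benchmark is easy to check numerically. The short sketch below, using an arbitrary illustrative value of p(low mean), simulates independent choices on pairs of repeated games and recovers the 2p(1 − p) prediction; the purely deterministic case gives p(inconsistent) = 0 by construction.

```python
import numpy as np

rng = np.random.default_rng(4)
p_low = 0.2          # illustrative probability of choosing the low-mean option
n_pairs = 100_000    # number of simulated pairs of repeated games

# Purely random noise: the two plays of a repeated game are independent draws.
first = rng.random(n_pairs) < p_low
second = rng.random(n_pairs) < p_low
print("simulated p(inconsistent):", round(float(np.mean(first != second)), 3))
print("predicted 2p(1-p):        ", round(2 * p_low * (1 - p_low), 3))
```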

Model-based analysis shows that random exploration is dominated by random noise

To more precisely quantify random and deterministic noise, we turned to model fitting. We modeled behavior on the first free choice of the Horizon Task using a version of the logistic choice model in (Wilson et al., 2014) that was modified to differentiate random and deterministic noise. In particular, we assume that in repeated games, deterministic noise remains the same whereas random noise can change.

Overview of model

As with our model-free analysis, the model-based analysis focuses only on the first free-choice trial, since that is the only free choice for which we have full control over the experience participants have had with the two bandits. To model participants' choices on this first free-choice trial, we assume that they make decisions by computing the difference in value ∆Q between the right and left options, choosing right when ∆Q > 0 and left otherwise. Specifically, we write

∆Q = ∆R + A∆I + b + n_det + n_ran     (1)

where the experimentally controlled variables are ∆R = R_right − R_left, the difference between the means of the rewards shown on the forced trials, and ∆I, the difference in the information available for playing the two options on the first free-choice trial. For simplicity, and because information is manipulated categorically in the Horizon Task, we define ∆I to be +1, −1 or 0: +1 if one reward is drawn from the right option and three from the left in the [1 3] condition, −1 if one is drawn from the left and three from the right, and 0 in the [2 2] condition. n_det and n_ran are the deterministic and random noise respectively, both assumed to come from logistic distributions with mean 0. The subject-and-condition-specific parameters are: the spatial bias, b, which determines the extent to which participants prefer the option on the right; the information bonus A, which controls the level of directed exploration; n_det, the deterministic noise, which is identical on the repeat versions of each game; and n_ran, the random noise, which is uncorrelated between repeat plays and changes every game. For each pair of repeated games, the set of forced-choice trials is exactly the same, so the deterministic noise, n_det, should be the same while the random noise, n_ran, may differ. This is exactly how we distinguish deterministic noise from random noise. In symbolic terms, for repeated games i and j, n^i_det = n^j_det and n^i_ran ≠ n^j_ran.
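
A minimal simulation of Equation 1 for one pair of repeated games might look as follows; the parameter values are illustrative rather than fitted estimates, and the function name is ours, not the authors'.

    import numpy as np

    rng = np.random.default_rng(2)

    def simulate_repeated_pair(dR, dI, A, b, sigma_det, sigma_ran):
        """Simulate the first free choice (1 = right, 0 = left) on both repeats of one game.

        Deterministic noise is drawn once and shared by the two repeats, whereas random
        noise is redrawn on every play. Both noise terms are logistic with mean zero.
        """
        n_det = rng.logistic(0.0, sigma_det)           # identical across the repeat pair
        choices = []
        for _ in range(2):                             # the two repetitions of the game
            n_ran = rng.logistic(0.0, sigma_ran)       # changes on every play
            dQ = dR + A * dI + b + n_det + n_ran       # Equation 1
            choices.append(int(dQ > 0))
        return choices

    # Example: right option looks 10 points worse but is less well known (a [1 3] game).
    print(simulate_repeated_pair(dR=-10.0, dI=1.0, A=5.0, b=0.0, sigma_det=3.0, sigma_ran=8.0))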

Model fitting

We used hierarchical Bayesian analysis to fit the parameters of the model (see Figure 6 for a graphical representation of the model in the style of Lee and Wagenmakers (2014a)). In particular, we fit values of the information bonus A, the spatial bias b, the variance of the random noise, σ^2_ran, and the variance of the deterministic noise, σ^2_det, for each participant in each horizon. Model fitting was performed using the MATJAGS and JAGS software (Depaoli et al., 2016, Steyvers, 2011), with full details given in the Methods.

Model fitting results

Posterior distributions over the group-level means of the deterministic and random noise variance are shown in Figure 4. Consistent with our model-free results, we see that both random and deterministic noise variances are non-zero and that random noise is about 2-3 times larger than the deterministic noise. In addition, we find that random noise increases dramatically with horizon (M = 4.55, 100% of samples showed an increase in random noise with horizon) whereas the increase in deterministic noise is smaller (M = 1.78, 98.12% of samples showed an increase in deterministic noise with horizon). Taken together, these results suggest that random exploration is dominated by random noise.

Figure 4: Model-based analysis showing the posterior distributions over the group-level mean of the standard deviations of random and deterministic noise. Both random (A, B) and deterministic (C, D) noise are nonzero (A, C) and change with horizon (B, D). However, random noise has both a greater magnitude overall (A, C) and a greater change with horizon (B, D) than deterministic noise.

Model comparison

The previous section suggests that behavioral variability in random exploration is dominated by random noise. To test this more explicitly, we built a series of models making different assumptions about the presence or absence of each type of noise and about whether each type of noise, if present, is horizon dependent (see Table 1). In models A-D, we assumed the existence of both random and deterministic noise: in models A and B, random noise is assumed to be horizon dependent, whereas in models A and C, deterministic noise is assumed to be horizon dependent. In model E, we assumed no deterministic noise. In model F, we assumed no random noise.

Model   Deterministic noise   Random noise
A       Horizon dependent     Horizon dependent
B       Fixed                 Horizon dependent
C       Horizon dependent     Fixed
D       Fixed                 Fixed
E       None                  Horizon dependent
F       Horizon dependent     None

Table 1: Model description.

To evaluate and compare the models, we simulated choice behavior using the subject-level parameters from the hierarchical Bayesian fits of each model. The same model-free analysis as described in the previous section was then applied to the 6 sets of simulated data for the 6 models respectively (see Figure 5). The original measure of random exploration, p(low mean), as used in Wilson et al. (2014), can be explained either by deterministic noise alone (Figure 5, Panel F2) or by random noise alone (Figure 5, Panel E2). The qualitative finding that participants exploit the high-mean option less and choose the low-mean option more in horizon 6 can be explained by either pure deterministic or pure random noise, as long as that noise is horizon dependent. If both deterministic and random noise are assumed to be the same in both horizons (Figure 5, Panel D), p(low mean) becomes completely flat and no horizon-dependent random exploration is observed. On the other hand, when we examine the percentage of inconsistent choices in repeated pairs of games, p(inconsistent), deterministic noise alone can no longer account for behavior (Figure 5, Panels F3, F4). Moreover, models C and D are ruled out because the increase of choice inconsistency with horizon can only be qualitatively accounted for when random noise is horizon dependent (Figure 5, Panels A, B, E). Among models A, B and E, in which random noise is horizon dependent, model A provides the best quantitative fit. If there is no deterministic noise (model E), then we overestimate the level of choice inconsistency in both horizons by a constant. In addition, horizon-dependent deterministic noise (model A) gives slightly better fits than assuming deterministic noise is the same in both horizons (model B). Overall, these model simulations confirm that the horizon dependence of random noise is the main source of random exploration.
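
The simulate-and-score logic behind this comparison can be sketched as follows for a few of the model variants; the noise values are arbitrary illustrations, not the fitted subject-level parameters.

    import numpy as np

    rng = np.random.default_rng(3)

    def simulate_pair(dR, sigma_det, sigma_ran):
        """One repeated-game pair in the [2 2] condition (no information bonus or bias).

        Returns the two first free choices, coded 1 if the low-mean option is chosen."""
        n_det = rng.logistic(0.0, sigma_det) if sigma_det > 0 else 0.0
        choices = []
        for _ in range(2):
            n_ran = rng.logistic(0.0, sigma_ran) if sigma_ran > 0 else 0.0
            choices.append(int(dR + n_det + n_ran > 0))  # dR < 0: low-mean option on the right
        return choices

    def score(sigma_det, sigma_ran, dR=-10.0, n_pairs=5000):
        pairs = np.array([simulate_pair(dR, sigma_det, sigma_ran) for _ in range(n_pairs)])
        p_low = pairs.mean()                            # p(low mean)
        p_incon = np.mean(pairs[:, 0] != pairs[:, 1])   # p(inconsistent)
        return round(p_low, 3), round(p_incon, 3)

    # Illustrative settings for models A (both noises), E (random only) and F (deterministic only).
    settings = {"A": {"h1": (2.0, 5.0), "h6": (4.0, 12.0)},
                "E": {"h1": (0.0, 5.0), "h6": (0.0, 12.0)},
                "F": {"h1": (5.0, 0.0), "h6": (12.0, 0.0)}}
    for model, horizons in settings.items():
        for h, (sd, sr) in horizons.items():
            print(model, h, score(sd, sr))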

Figure 5: Model comparison: model A - both deterministic and random noise are horizon dependent, model B - only random noise is horizon dependent, model C - only deterministic noise is horizon dependent, model D - neither random nor deterministic noise is horizon dependent, model E - only random noise is assumed to be present, model F - only deterministic noise is assumed to be present.

Discussion

In this paper, we investigated whether random exploration is driven by random noise, putatively arising in the brain, or by deterministic noise, arising from the environment. Using a version of the Horizon Task with repeated games, we found evidence for both types of noise in explore-exploit decisions. In addition, we see that both random and deterministic noise increase with horizon, but that the horizon effect is much larger for random noise. Taken together, our results suggest that random exploration, i.e. the use and adaptation of decision noise to drive exploration, is primarily driven by random noise. Perhaps the main limitation of this work is in the interpretation of the different types of noise as being random and deterministic. In particular, while we controlled many aspects of the stimulus across repeated games (e.g. the outcomes and the order of the forced trials), we could not perfectly control all stimuli the participant received, which would vary, for example, based on exactly what they were looking at or whether they were scratching their nose. Thus, our estimate of deterministic noise is likely a lower bound. Likewise, our estimate of random noise is likely an upper bound, as these ‘missing’ sources of deterministic noise would be interpreted as random noise in our model. Despite this, it seems hard to imagine that these additional noise sources could be enough to account for the large differences between random and deterministic noise that we found in Figure 4, where random noise is 2-3 times the size of deterministic noise. Taken at face value, the horizon-dependent increase in random noise is consistent with the idea that random exploration is driven by intrinsic variability in the brain. This is in line with work in the bird song literature in which variability during song learning has been tied to neural variability arising from specific areas of the brain (Brainard and Doupe, 2002, Kao et al., 2005). In addition, this work is consistent with a recent report from Ebitz et al. (2017) in which the behavioral variability of monkeys in an ‘explore’ state was also tied to random rather than deterministic sources of noise. Whether such a noise-controlling area exists in the human brain is less well established, but one candidate theory (Aston-Jones and Cohen, 2005) suggests that norepinephrine (NE) from the locus coeruleus may play a role in modulating the level of random noise. Indeed, changes in the NE system have been associated with changes in behavioral variability in both humans and other animals in a variety of tasks (Keung et al., 2018, Tervo et al., 2014). In addition, there is some evidence that NE plays a direct role in random exploration (Warren et al., 2017), although this finding is complicated by other work showing no effect of NE drugs on exploration (Jepma et al., 2012, Nieuwenhuis et al., 2005).

More generally, our finding that random noise dominates behavioral variability over deterministic noise is consistent with the findings of Drugowitsch et al. (2016). In particular, these authors show that randomness in behavior arises from imperfections in mental inference, which happen inside the brain, rather than in peripheral processes such as sensory processing and response selection. This suggests that most noise in behavior is generated randomly and may arise from computational errors in computing the correct strategy. In the context of the Horizon Task, such computational errors would likely be larger in the long horizon condition, as the correct course of action in these cases is much harder to compute.

Methods

Participants

80 participants (ages 18-25, 37 male, 43 female) from the University of Arizona undergraduate subject pool participated in the experiment. 15 were excluded on the basis of performance, using the same exclusion criterion as in (Wilson et al., 2014). This left 65 for the main analysis. Note that including the 15 poorly performing subjects did not change the main results (Supplementary Figures 1-3).

Task

The task was a modified version of the Horizon Task (Wilson et al., 2014). In this task, participants played a set of games in which they made choices between two slot machines (one-armed bandits) that paid out rewards from different Gaussian distributions. In each game they made multiple decisions between the two options. Each option paid out a random reward between 1 and 100 points sampled from a Gaussian distribution. The means of the underlying Gaussians were different for the two bandit options, remained the same within a game, but changed with each new game. One of the bandits always had a higher mean than the other. Participants were instructed to maximize the points earned over the entire task. To maximize their rewards in each game, participants needed to exploit the slot machine with the higher mean, but they could not identify this best option without exploring both options first. The number of games participants played depended on how well they performed, which acted as the primary incentive for performing the task. Thus, the better participants performed, the sooner they got to leave the experiment. On average, participants played 153.7 games (minimum = 90 games, maximum = 192 games) and the whole task lasted between 12.34 and 32.12 minutes (mean 22.75 minutes).

As in the original paper, the payoffs tied to the bandits were independent between games and drawn from Gaussian distributions with variable means and a fixed standard deviation of 8 points. The difference between the mean payouts of the two slot machines was set to either 4, 8, 12 or 20 points. One of the means was always equal to either 40 or 60 and the second was set accordingly. Participants were informed that in every game one of the bandits always had a higher mean reward than the other. The order of games was randomized, and mean sizes and order of presentation were counterbalanced. Each game consisted of 5 or 10 choices. Every game started with a fixation cross, after which a bar of boxes appeared indicating the horizon for that game. On the first 4 trials of each game (the instructed trials), we highlighted the box on one of the bandits to instruct the participant to choose that option, and they had to press the corresponding key to reveal the outcome. From the 5th trial, boxes on both bandits were highlighted and participants were free to make their own decision. There was no time limit for decisions. During free choices they could press either the left or right arrow key to indicate their choice of the left or right bandit. The score feedback was presented for 300 ms. The task was programmed using Psychtoolbox in MATLAB (Brainard, 1997, Pelli, 1997) (see Figure 1). The first four trials of each game were forced-choice trials, in which only one of the options was available for the participant to choose. We used these forced-choice trials to manipulate the relative ambiguity of the two options, by providing the participant with different amounts of information about each bandit before their first free choice. The four forced-choice trials set up two uncertainty conditions: unequal uncertainty (or [1 3]), in which one option was forced to be played once and the other three times, and equal uncertainty (or [2 2]), in which each option was forced to be played twice. After the forced-choice trials, participants made either 1 or 6 free choices (the two horizon conditions; Figure 1).
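
As a sketch of how the reward schedule described above could be generated (this is not the authors' MATLAB implementation; the rounding and clipping to the 1-100 range, and the direction of the mean offset, are assumptions):

    import numpy as np

    rng = np.random.default_rng(4)

    def make_game(n_trials):
        """Draw the two bandit means and a payout sequence for one game."""
        gap = rng.choice([4, 8, 12, 20])       # difference between the two mean payouts
        anchor = rng.choice([40, 60])          # one mean is always 40 or 60
        sign = rng.choice([-1, 1])             # whether the other mean is higher or lower (assumed)
        means = np.array([anchor, anchor + sign * gap], dtype=float)
        rng.shuffle(means)                     # randomize which side is the better bandit
        payouts = rng.normal(means, 8.0, size=(n_trials, 2))       # fixed SD of 8 points
        return means, np.clip(np.round(payouts), 1, 100).astype(int)

    means, payouts = make_game(n_trials=10)    # a horizon-6 game: 4 forced + 6 free trials
    print(means, payouts[:4])                  # the 4 forced-trial outcomes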

Data and code

Behavioral data, as well as MATLAB code to recreate the main figures from this paper, will be made available on the Dataverse website upon publication.

Model-based analysis

We modeled behavior on the first free choice of the Horizon Task using a version of the logistic choice model in (Wilson et al., 2014) that was modified to differentiate random and deterministic noise. In particular, we assume that in repeated games, deterministic noise remains the same whereas random noise can change.

Hierarchical Bayesian Model

To model participants' choices on this first free-choice trial, we assume that they make decisions by computing the difference in value ∆Q between the right and left options, choosing right when ∆Q > 0 and left otherwise. Specifically, we write

∆Q = ∆R + A∆I + b + n_det + n_ran     (2)

where the experimentally controlled variables are ∆R = R_right − R_left, the difference between the means of the rewards shown on the forced trials, and ∆I, the difference in the information available for playing the two options on the first free-choice trial. For simplicity, and because information is manipulated categorically in the Horizon Task, we define ∆I to be +1, −1 or 0: +1 if one reward is drawn from the right option and three from the left in the [1 3] condition, −1 if one is drawn from the left and three from the right, and 0 in the [2 2] condition. n_det and n_ran are the deterministic and random noise respectively. The other variables are: the spatial bias, b, which determines the extent to which participants prefer the option on the right; the information bonus A, which controls the level of directed exploration; n_det, the deterministic noise, which is identical on the repeat versions of each game; and n_ran, the random noise, which is uncorrelated between repeat plays and changes every game. Each subject's behavior in each horizon condition is described by 4 free parameters: the information bonus, A, the spatial bias, b, the standard deviation of the deterministic noise, σ_det, and the standard deviation of the random noise, σ_ran (Table 2, Figure 6). Each of the free parameters is fit to the behavior of each subject using a hierarchical Bayesian approach (Allenby et al., 2005). In this approach to model fitting, each parameter for each subject is assumed to be sampled from a group-level prior distribution whose parameters, the so-called ‘hyperparameters’, are estimated using a Markov Chain Monte Carlo (MCMC) sampling procedure. The hyperparameters themselves are assumed to be sampled from ‘hyperprior’ distributions whose parameters are defined such that these hyperpriors are broad. The particular priors and hyperpriors for each parameter are shown in Table 2. For example, we assume that the information bonus, A_is, for each horizon condition i and each participant s, is sampled from a Gaussian prior with mean µ^A_i and standard deviation σ^A_i. These prior parameters are sampled in turn from their respective hyperpriors: µ^A_i from a Gaussian distribution with mean 0 and standard deviation 10, and σ^A_i from an Exponential distribution with parameter 0.1.

Parameter                                    Prior                                 Hyperparameters                 Hyperpriors
information bonus, A_is                      A_is ∼ Gaussian(µ^A_i, σ^A_i)         θ^A_i = (µ^A_i, σ^A_i)          µ^A_i ∼ Gaussian(0, 100); σ^A_i ∼ Exponential(0.01)
spatial bias, b_is                           b_is ∼ Gaussian(µ^b_i, σ^b_i)         θ^b_i = (µ^b_i, σ^b_i)          µ^b_i ∼ Gaussian(0, 100); σ^b_i ∼ Exponential(0.01)
deviation of deterministic noise, σ^det_is   σ^det_is ∼ Gamma(k^det_i, λ^det_i)    θ^det_i = (k^det_i, λ^det_i)    k^det_i ∼ Exponential(0.01); λ^det_i ∼ Exponential(10)
deviation of random noise, σ^ran_is          σ^ran_is ∼ Gamma(k^ran_i, λ^ran_i)    θ^ran_i = (k^ran_i, λ^ran_i)    k^ran_i ∼ Exponential(0.01); λ^ran_i ∼ Exponential(10)

Table 2: Model parameters, priors, hyperparameters and hyperpriors.
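
To make the generative structure in Table 2 concrete, the sketch below forward-samples one subject's parameters in Python. It assumes Gaussian(mean, variance) notation (so Gaussian(0, 100) has a standard deviation of 10, consistent with the text) and Gamma(shape, rate) notation, and it is purely illustrative rather than the fitting code used here.

    import numpy as np

    rng = np.random.default_rng(5)

    def draw_group_level():
        """Sample one set of group-level hyperparameters from the hyperpriors in Table 2."""
        return {"mu_A": rng.normal(0.0, np.sqrt(100.0)),   # Gaussian(0, 100): sd = 10
                "sd_A": rng.exponential(1.0 / 0.01),       # Exponential(0.01): mean 100
                "mu_b": rng.normal(0.0, np.sqrt(100.0)),
                "sd_b": rng.exponential(1.0 / 0.01),
                "k_det": rng.exponential(1.0 / 0.01), "lam_det": rng.exponential(1.0 / 10.0),
                "k_ran": rng.exponential(1.0 / 0.01), "lam_ran": rng.exponential(1.0 / 10.0)}

    def draw_subject(g):
        """Sample one subject's condition-specific parameters given the group level."""
        return {"A": rng.normal(g["mu_A"], g["sd_A"]),
                "b": rng.normal(g["mu_b"], g["sd_b"]),
                "sigma_det": rng.gamma(g["k_det"], 1.0 / g["lam_det"]),  # Gamma(shape, 1/rate)
                "sigma_ran": rng.gamma(g["k_ran"], 1.0 / g["lam_ran"])}

    print(draw_subject(draw_group_level()))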

Model fitting using MCMC

The model was fit to the data using a Markov Chain Monte Carlo approach implemented in the JAGS package (Depaoli et al., 2016) via the MATJAGS interface (psiexp.ss.uci.edu/research/programs_data/jags). This package approximates the posterior distribution over model parameters by generating samples from this posterior distribution given the observed behavioral data. In particular, we used 4 independent Markov chains to generate 16000 samples from the posterior distribution over parameters (4000 samples per chain). Each chain had a burn-in period of 2000 samples, which were discarded to reduce the effects of initial conditions, and posterior samples were acquired at a thinning rate of 1. Convergence of the Markov chains was confirmed post hoc by eye.

The generative model shown graphically in Figure 6 is as follows: for each condition i, subject s and game g, deterministic noise n^det_isg ∼ Logistic(0, σ^det_is) is drawn once and shared by the two repeats r = 1, 2 of that game, while random noise n^ran_isgr ∼ Logistic(0, σ^ran_is) is drawn independently on every play; the decision variable is ∆Q_isgr = ∆R_isg + A_is ∆I_isg + b_is + n^ran_isgr + n^det_isg, and the observed choice is c_isgr ∼ Bernoulli(∆Q_isgr > 0). The subject-level parameters A_is, b_is, σ^ran_is and σ^det_is are drawn from the group-level priors and hyperpriors listed in Table 2.

Figure 6: Schematic of the hierarchical Bayesian model, using the notation of Lee and Wagenmakers (2014b).

Parameter recovery

To be sure that our fit parameter values were meaningful, we tested the ability of our model fitting procedure to recover parameters from simulated data. In particular, we simulated choices with the fitted parameters from the hierarchical Bayesian analysis, and then re-fit the simulated choices to see whether we could recover the parameters. Results of this parameter recovery procedure are shown in Figure 7. As is clear from this figure, parameter recovery is good for all parameters. The recovery of the noise parameters, σ_det and σ_ran, is slightly better for horizon 1 than horizon 6. This is because larger noise requires more trials to estimate, so with the same number of choices it is harder to recover the overall larger noise in horizon 6. In addition, we see better recovery for random noise than for deterministic noise, because we effectively have half as many trials for deterministic noise, since only one sample of deterministic noise is generated for each repeated-game pair. Overall, we are able to recover both deterministic and random noise with our model to a satisfactory extent.

Figure 7: Parameter recovery over the subject-level means of the information bonus, A, spatial bias, b, random noise variance, σ_ran, and deterministic noise variance, σ_det, for horizon 1 (left column) and horizon 6 (right column) games.

References

Shipra Agrawal and Navin Goyal. Analysis of thompson sampling for the multi-armed bandit problem, 2011.

Greg Allenby, Peter Rossi, and Robert McCulloch. Hierarchical bayes models: A practitioners guide. 01 2005.

G. Aston-Jones and J. D. Cohen. An integrative theory of locus coeruleus-norepinephrine function: adaptive gain and optimal performance. Annu. Rev. Neurosci., 28:403–450, 2005.

P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(235), 2002. URL https://doi.org/10.1023/A:1013689704352.

J. Banks, M. Olson, and D. Porter. An experimental analysis of the bandit problem. Economic Theory, 10:55, 1997.

Debabrota Basu, Pierre Senellart, and Stéphane Bressan. BelMan: Bayesian bandits on the belief–reward manifold, 2018.

D. H. Brainard. The Psychophysics Toolbox. Spat Vis, 10(4):433–436, 1997.

M. S. Brainard and A. J. Doupe. What songbirds teach us about learning. Nature, 417(6886):351–358, May 2002.

J.S. Bridle. Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimates of parameters. Advances in Neural Information Processing Systems, 2:211– 217, 1990.

Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 24, pages 2249–2257. Curran Associates, Inc., 2011. URL http://papers.nips.cc/paper/4321-an-empirical-evaluation-of-thompson-sampling.pdf.

N. D. Daw, J. P. O’Doherty, P. Dayan, B. Seymour, and R. J. Dolan. Cortical substrates for exploratory decisions in humans. Nature, 441(7095):876–879, Jun 2006.

Sarah Depaoli, James P. Clifton, and Patrice R. Cobb. Just another Gibbs sampler (JAGS): Flexible software for MCMC implementation. Journal of Educational and Behavioral Statistics, 41(6):628–649, 2016. doi: 10.3102/1076998616664876. URL https://doi.org/10.3102/1076998616664876.

J. Drugowitsch, V. Wyart, A. D. Devauchelle, and E. Koechlin. Computational Precision of Mental Inference as Critical Source of Human Choice Suboptimality. Neuron, 92(6):1398–1411, Dec 2016.

B. Ebitz, T. Moore, and T. Buschman. Bottom-up salience drives choice during exploration. Cosyne, 2017.

M. J. Frank, B. B. Doll, J. Oas-Terpstra, and F. Moreno. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci., 12(8):1062–1068, Aug 2009.

Samuel J. Gershman. Deconstructing the human algorithms for exploration. Cognition, 2018. ISSN 18737838. doi: 10.1016/j.cognition.2017.12.014.

J. C. Gittins. Bandit Processes and Dynamic Allocation Indices. J. R. Statist. Soc. B, 41(2):148–177, 1979.

J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. Progress in Statistics, 1974.

M. Jepma, R. G. Verdonschot, H. van Steenbergen, S. A. Rombouts, and S. Nieuwenhuis. Neural mechanisms underlying the induction and relief of perceptual curiosity. Front Behav Neurosci, 6:5, 2012.

M. H. Kao, A. J. Doupe, and M. S. Brainard. Contributions of an avian basal ganglia-forebrain circuit to real-time modulation of song. Nature, 433(7026):638–643, Feb 2005.

Waitsang Keung, Todd A Hagen, and Robert C Wilson. Regulation of evidence accumulation by pupil-linked arousal processes. bioRxiv, 2018. doi: 10.1101/309526. URL https://www.biorxiv.org/content/early/2018/04/28/309526.

J.R. Krebs, A. Kacelnik, and P. Taylor. Test of optimal sampling by foraging great tits. Nature, 275:27–31, 1978. doi: 10.1038/275027a0.

M.D. Lee, S. Zhang, M.N. Munro, and M. Steyvers. Psychological models of human and optimal performance on bandit problems. Cognitive Systems Research, 12:164–174, 2011.

Michael D. Lee and Eric-Jan Wagenmakers. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press, 2014a. doi: 10.1017/CBO9781139087759.

Michael D. Lee and Eric-Jan Wagenmakers. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press, 2014b. doi: 10.1017/CBO9781139087759.

R. Meyer and Y. Shi. Choice under ambiguity: Intuitive solutions to the armed-bandit problem. Manage- ment Science, 41:817, 1995.

S. Nieuwenhuis, D. J. Heslenfeld, N. J. von Geusau, R. B. Mars, C. B. Holroyd, and N. Yeung. Activity in human reward-sensitive brain areas is strongly context dependent. Neuroimage, 25(4):1302–1309, May 2005.

E. Payzan-LeNestour and P. Bossaerts. Risk, unexpected uncertainty, and estimation uncertainty: Bayesian learning in unstable settings. PLoS Comput. Biol., 7(1):e1001048, Jan 2011.

E. Payzan-Lenestour and P. Bossaerts. Do not Bet on the Unknown Versus Try to Find Out More: Estimation Uncertainty and "Unexpected Uncertainty" Both Modulate Exploration. Front Neurosci, 6:150, 2012.

D. G. Pelli. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat Vis, 10(4):437–442, 1997.

M. Steyvers. matjags: An interface for MATLAB to JAGS, version 1.3. 2011. URL http://psiexp.ss.uci.edu/research/programs_data/jags/.

M. Steyvers, M. Lee, and E. Wagenmakers. A Bayesian analysis of human decision making on bandit problems. Journal of Mathematical Psychology, 53:168, 2009.

D. G. R. Tervo, M. Proskurin, M. Manakov, M. Kabra, A. Vollmer, K. Branson, and A. Y. Karpova. Behavioral variability through stochastic choice and its gating by anterior cingulate cortex. Cell, 159 (1):21–32, Sep 2014.

William R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4):285–294, 1933. ISSN 00063444. URL http://www.jstor.org/stable/2332286.

Christopher M. Warren, Robert C. Wilson, Nic J. van der Wee, Eric J. Giltay, Martijn S. van Noorden, Jonathan D. Cohen, and Sander Nieuwenhuis. The effect of atomoxetine on random and directed exploration in humans. PLOS ONE, 12(4):1–17, 04 2017. doi: 10.1371/journal.pone.0176034. URL https://doi.org/10.1371/journal.pone.0176034.

C. J. C. H. Watkins. Learning from delayed rewards. Ph.D thesis, Cambridge University, 1989.

R. C. Wilson, A. Geana, J. M. White, E. A. Ludvig, and J. D. Cohen. Humans use directed and random exploration to solve the explore-exploit dilemma. J Exp Psychol Gen, 143(6):2074–2081, Dec 2014.

S. Zhang and A. J. Yu. Forgetful bayes and myopic planning: Human learning and decision making in a bandit setting. Advances in Neural Information Processing Systems, 26:2607–2615, 2013.

Deep exploration accounts for the stopping threshold and behavioral variability in an optimal stopping task

Siyu Wang1, Ali Gilliland2, Maggie Calder2, and Robert C. Wilson1,3

1Department of Psychology, University of Arizona, Tucson AZ USA 2Neuroscience and Cognitive Science Program, University of Arizona, Tucson AZ USA 3Cognitive Science Program, University of Arizona, Tucson AZ USA

Abstract

Imagine you are on a road trip and looking to refuel. The next gas station is slightly overpriced; do you stop here to refuel or keep driving in the hopes of finding a lower price? This type of question is known as a “Stopping Problem”, and whether one is trying to find the best price on gas or the best person to fill a job, such problems occur frequently in daily life. Theoretical strategies for stopping problems have been extensively studied, but how humans solve these problems has received less attention. In this paper we designed a simple card game, which we refer to as the Card Stopping Task, to investigate how humans solve optimal stopping problems. In this task, a row of face-down cards is flipped one by one in front of the participant, and the participant has to decide when to stop flipping and take the last card. A key factor in the Card Stopping Task is the ‘horizon’, the number of face-down cards remaining, which plays a central role in deciding whether to stop. For example, if there are many gas stations in range, you may be more likely to pass the current gas station that is overpriced. But if your gas light is on, you wouldn’t hesitate to stop and refuel. Behavior in the Card Stopping Task can be quantified with two parameters: the stopping threshold, i.e. the card value above which people were more likely to stop than continue, and the decision noise, i.e. the variability in the stopping threshold. By fitting these parameters to the behavioral data we found that as the horizon decreases (1) the stopping threshold decreases and (2) the decision noise increases. That is, as the game goes on, participants are more likely to accept low-valued cards, but are also more random in their choices. This opposite horizon dependence of threshold and noise can be accounted for by a simple sampling model based on the idea of Deep Exploration (Osband et al., 2016). In this model, we assume that people make the accept/reject decision by simulating a small number (between 1 and 4) of possible futures if they were to reject the card. Comparing this simulated outcome with the current card, they stop if the current card is higher than the simulated outcome and continue otherwise. This model successfully accounts for the simultaneous decrease of threshold and increase of noise with horizon, suggesting a potential mechanism for how humans solve the optimal stopping problem.

Keywords— stopping problem, deep exploration, behavioral variability

Introduction

Imagine you are on a lengthy road trip. As you cross the country, you are constantly exposed to differing gas prices and always on the lookout to refuel for the lowest price. As your car gets low on gas, the next gas station is slightly overpriced; do you stop here to refuel or keep driving in the hopes of finding a lower price? This type of question is known as a ‘Stopping Problem’, where you have to choose whether to stop and take the current offer or continue to try and find something better. There is a rich history in statistics and mathematics of studying optimal stopping as a complex optimization problem. From hiring a secretary (Gilbert and Mosteller, 1966) to hunting for an apartment, stopping problems crop up in all sorts of situations, spawning a rich literature in statistics, mathematics, and economics (Rouder, 2014). In these cases, researchers often are interested in finding optimal solutions to stopping problems in different situations. Two situations, in particular, have received wide attention: rank-based stopping problems and exact-value stopping problems. In rank-based problems (an example of which is the classic ‘secretary problem’), only the relative rank of the current option among those already seen is known, and the goal is to maximize the rank one stops at. Conversely, in the exact-value paradigm, the actual value of the current option is known, and the goal is to maximize the value one stops at (Guan and Lee, 2014, Lee, 2006). Mathematically optimal solutions can be derived for both types of stopping problem under a variety of different conditions (Gilbert and Mosteller, 1966). In the rank-based version, depending on the number of candidates N, the optimal strategy is to reject the first r candidates and then take the first of the following candidates that is better than all of the first r. In the exact-value version, depending on the number of candidates N, the optimal strategy specifies a threshold value, above which you should stop and below which you should continue your search. Of course, how stopping problems should be solved in theory may offer little guidance as to how humans and animals solve stopping problems in practice. Indeed, previous studies in psychology and cognitive science found evidence that people make suboptimal stopping decisions (Bearden et al., 2006, Guan and Lee, 2014, Guan et al., 2015, Lee, 2006, von Helversen and Mata, 2012). However, despite being suboptimal overall, people's behavior does share some features with the optimal model (Guan and Lee, 2014). For example, in an exact-value paradigm, Guan and Lee (2014) found that, like the optimal model, people use a threshold model in which they are more likely to accept an offer if it exceeds their threshold. Moreover, they found that this threshold decreases as the number of remaining options decreases, in a manner that is at least qualitatively consistent with optimal behavior.

Later, in a similar exact-value paradigm, Baumann et al. (2018) argued that rather than applying an internal threshold, stopping behavior arose from a policy of maximizing the likelihood of a better future outcome instead of maximizing the average future outcome, which is what the optimal strategy usually optimizes. Implicit in all of this previous work is the idea that humans are noisy. That is, they do not rigidly adhere to a hard threshold, always rejecting offers below it and always accepting offers above it. Instead the threshold is soft, meaning they are more likely to reject, but will sometimes accept, an offer below it, and are more likely to accept, but will sometimes reject, an offer above it. This variability in behavior usually passes unremarked but, inspired by recent work showing that variability may be adaptive in similar decisions (Wilson et al., 2014), we wondered whether this variability has structure and whether it might serve some purpose. In this paper, we used a tightly controlled version of an exact-value stopping task. In this task, participants were presented with one face-up card and between 1 and 4 face-down cards. Their job was to decide whether to stop and take the value shown on the current face-up card (which could be between 1 and 100 points) or continue and flip the next card. Consistent with Guan and Lee (2014), we found that people used a threshold-like policy, and that their threshold decreased when there were fewer face-down cards. In addition, however, we also found the opposite pattern in people's variability: when there were fewer face-down cards, people's behavior was more variable. Furthermore, this simultaneous adaptation of threshold and behavioral variability can be accounted for by a simple sampling model based on the idea of Deep Exploration (Osband et al., 2016). In this model, we assume that people make the accept versus reject decision by simulating a small number (between 1 and 4) of possible futures if they were to reject the card. Comparing this simulated outcome with the current card, they would stop if the current card is higher than the simulated outcome and continue otherwise. This model successfully accounts for the simultaneous decrease of threshold and increase of noise with horizon, suggesting a potential mechanism for how humans solve the optimal stopping problem.

Methods

Participants

50 participants from the University of Arizona undergraduate subject pool participated in the experiment. 6 were excluded because they were under 18 (per our subject pool policy and Institutional Review Board). This left 44 for the analysis (ages 18-41, 12 male, 32 female).

Card Stopping Task

To investigate behavioral variability in stopping problems, we designed a simple card game that we refer to as the Card Stopping Task (Figure 1). In this task, participants are presented with a row of N face-down cards (N = 2, 3, 4, or 5). Each card carries a random number from 1 to 100 (uniformly distributed), which represents the amount of reward available if that card is chosen. The cards are flipped one by one, and after each card is flipped the participant must decide whether to accept or reject it. If they accept the card, the game stops and they receive the reward value on the accepted card. If they reject the card, the next card in the sequence is flipped and the process repeats. Once a participant elects to reject the current card and flip the next card, they cannot return to a previous card. If they flip the last card, i.e. the Nth card, they automatically accept the value of this last card. Participants are instructed to maximize the total reward they gain throughout the experiment (Figure 1). Participants start each game with N = 2, 3, 4, or 5 cards. Games with differing numbers of cards were interleaved and counterbalanced so that each level of N appeared equally often in the experiment. The first 19 participants played 400 total games in which each N level appeared 100 times; the remaining 25 participants played 360 total games in which each N level appeared 90 times (the game was shortened to decrease the length of the experiment). The task was implemented using Psychtoolbox in MATLAB (Brainard, 1997, Kleiner et al., 2007, Pelli, 1997).

Descriptive model

We used a descriptive model to quantify behavior on the Card Stopping Task. In this model we specify a separate stopping threshold and decision noise as a function of the number of cards remaining, H (H is also referred to as the horizon). We fit the behavior using a simple logistic model by maximum likelihood estimation. The model computes a ∆Q value and makes probabilistic decisions to stop or to continue based on this ∆Q value. ∆Q is the difference between the current card value V and the participant's stopping threshold θ:

∆Q = V − θ


Figure 1: An example of an N-card game (N = 5). At each step, participants choose whether to stop on the current card. If they choose to stop, the game ends and they receive the reward value on that card; if they choose to flip the next card (Go), the process repeats, but they are not allowed to go back to a previous card. If there are no more cards remaining, the participant is forced to accept the last card.

The likelihood function P_H is the probability that the participant stops when there are H cards remaining:

P_H = 1 / (1 + e^{−(V − θ)/σ})

where the experimentally controlled variable is V, the current card value, and the 2 free parameters are θ, the stopping threshold, i.e. the card value above which people were more likely to stop than continue, and σ, the decision noise, i.e. the variability in the stopping threshold. We quantified the stopping threshold and behavioral variability by fitting the θ and σ values for each participant and each horizon condition.
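
A minimal maximum-likelihood fit of this two-parameter model might look like the sketch below; the simulated choices and starting values are placeholders, not the actual data or fitting code.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(6)

    def p_stop(V, theta, sigma):
        """Probability of stopping on a card of value V, given threshold theta and noise sigma."""
        z = np.clip((V - theta) / sigma, -500, 500)   # clip the exponent to avoid overflow
        return 1.0 / (1.0 + np.exp(-z))

    # Placeholder choices for one participant in one horizon condition.
    true_theta, true_sigma = 65.0, 8.0
    V = rng.integers(1, 101, size=200).astype(float)
    stopped = (rng.random(200) < p_stop(V, true_theta, true_sigma)).astype(float)

    def neg_log_lik(params):
        theta, sigma = params
        p = np.clip(p_stop(V, theta, sigma), 1e-9, 1 - 1e-9)
        return -np.sum(stopped * np.log(p) + (1 - stopped) * np.log(1 - p))

    fit = minimize(neg_log_lik, x0=[50.0, 10.0], bounds=[(1.0, 100.0), (0.1, 50.0)])
    print(fit.x)   # recovered (theta, sigma), close to the generating values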

Deep exploration model

The deep exploration model provides a unified account of stopping threshold and behavioral variability in the card stopping problem. This model assumes that people make the stop/continue decision by simulating a small number of possible futures if they were to reject the current card. Comparing the card values gained

in the simulated futures with the current card, participants would choose to stop if the current card is higher than the simulated outcomes on average and continue otherwise. In formal terms, when the current card value is V and the number of remaining cards is H (i.e. the horizon), the participant first simulates the values of the remaining cards, V_1, V_2, ..., V_H, by randomly drawing from a uniform distribution U(1, 100). In the simplest model, the participant then computes the maximal value of these remaining card values, V_max = max(V_1, ..., V_H) (equivalently, V_max is drawn from the distribution of the maximum of H uniformly distributed random numbers from 1 to 100). By repeating the above process n times, the model generates n samples of such V_max: V_max^1, V_max^2, ..., V_max^n. Based on these simulations, when rejecting the current card, the participant expects to earn on average V̄_max = (1/n) Σ_{i=1}^{n} V_max^i points. If V̄_max > V, then the participant will choose to continue and flip the next card; otherwise the participant will stop on the current card. This deep exploration model has two free parameters: (1) the number of simulations, n, and (2) the subjective horizon, T (used in place of H above). Simulating the model for different combinations of n and T shows that the model simultaneously predicts that as H decreases, the stopping threshold θ decreases and the decision noise σ increases (Figure 2). Both n and T are fit to each participant's choices for each horizon condition.
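
The decision rule can be sketched as follows, with the maximum-of-remaining-cards policy assumed as described above; estimating p(continue) over many repetitions traces out the predicted choice curve for each horizon.

    import numpy as np

    rng = np.random.default_rng(7)

    def p_continue(V, H, n, n_reps=20000):
        """Estimated probability of rejecting a card of value V with H cards remaining,
        for an agent that averages n simulated 'best remaining card' values."""
        sims = rng.integers(1, 101, size=(n_reps, n, H))   # n simulated futures of H cards each
        v_max = sims.max(axis=2)                           # best card within each simulated future
        v_bar = v_max.mean(axis=1)                         # average over the n simulations
        return np.mean(v_bar > V)                          # continue if the average beats V

    # Shorter horizons give a lower effective threshold and a shallower (noisier) choice curve.
    for H in (1, 4):
        curve = [round(p_continue(V, H, n=2), 2) for V in (40, 60, 80)]
        print(f"H = {H}: p(continue) at V = 40, 60, 80 -> {curve}")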

Results

Stopping choice depends on the number of cards remaining (horizon) and does not depend on the total number of cards in the game.

Here we show that people's choices depend only on the number of cards remaining, H (also referred to as the horizon), and do not depend on the total number of cards in the game, N. We computed the percentage of stopping choices as a function of the card value for each combination of H and N where 1 ≤ H < N ≤ 5. The choice curves with the same N do not overlap with each other, whereas the choice curves with the same H overlap (Figure 3). This shows that the planning horizon H, i.e. the number of face-down cards remaining, plays a central role in deciding whether to stop in the Card Stopping Task.
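
The choice curves can be computed directly from trial-level data as in the sketch below; the simulated arrays and the 20-point value bins are assumptions for illustration.

    import numpy as np

    rng = np.random.default_rng(8)
    n_trials = 5000
    # Placeholder trial-level data: card value, cards remaining H, game length N, stop (0/1).
    V = rng.integers(1, 101, size=n_trials)
    H = rng.integers(1, 5, size=n_trials)
    N = np.minimum(H + rng.integers(1, 3, size=n_trials), 5)
    stop = (rng.random(n_trials) < 1 / (1 + np.exp(-(V - 55 - 5 * H) / 10))).astype(int)

    bins = np.arange(0, 101, 20)
    for h in range(1, 5):
        for n in range(h + 1, 6):
            mask = (H == h) & (N == n)
            if mask.sum() == 0:
                continue
            idx = np.digitize(V[mask], bins)
            curve = [round(stop[mask][idx == b].mean(), 2) for b in np.unique(idx)]
            print(f"H = {h}, N = {n}: {curve}")   # curves sharing the same H should overlap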

Stopping threshold and behavioral variability as a function of horizon

Since people's choices depend only on the horizon H and not on the total number of cards N, we can collapse the curves across N for each horizon to improve the signal-to-noise ratio (Figure 4, Panel A).

Figure 2: Simulated choice curves using the deep exploration model for n = 1, 2, 3, 4 and H = 1, 2, 3, 4. The deep exploration model predicts the simultaneous decrease of stopping threshold and increase of decision noise as a function of horizon.

Figure 3: The choice curves with the same N do not overlap with each other, whereas the choice curves with the same H overlap.

As the number of cards remaining, H, decreases, the choice curve shifts to the left, showing that on average people are willing to stop at a lower value when the horizon is short. This is consistent with Guan and Lee (2014). Furthermore, the choice curve becomes slightly flatter as H decreases, which qualitatively shows that participants behave more randomly when there are fewer cards left. To quantify the stopping threshold and behavioral variability in the Card Stopping Task, we fitted the two-parameter descriptive model, with free parameters the stopping threshold θ and the decision noise σ, to each participant's behavioral data in each horizon condition. We found that as the horizon decreases (1) the stopping threshold θ decreases and (2) the decision noise σ increases. That is, as the game goes on, participants are more likely to accept low-valued cards, but are also more random in their choices (Figure 4, Panels B and C).

Figure 4: A. Choice curves. B. Threshold increases with horizon. C. Decision noise decreases with horizon.

In addition, comparing participants' stopping thresholds with the optimal threshold shows that participants are clearly suboptimal and used a stopping threshold higher than optimal; this is also consistent with Guan and Lee (2014) (Figure 4, Panel B). After the experiment we interviewed participants about the strategies they used to solve the task, and some voluntarily reported a strategy of using a particular threshold value to decide whether to stop. Our model-fitted stopping threshold θ values correlated strongly with participants' self-reported stopping thresholds. Overall, this simple 2-parameter logistic model fits people's choices well.

Deep exploration provides a unified account for the simultaneous decrease of stopping threshold and increase of behavioral variability as horizon decreases.

This opposite horizon dependence for threshold and noise can be accounted for by a simple sampling model based on the idea of Deep Exploration (Osband et al., 2016). In this model, we assume that people make the stop/continue decision by simulating a small number of possible futures if they were to reject the card. Comparing this simulated outcome with the current card, they would stop if the current card is higher than the simulated outcome and continue otherwise. This model successfully accounts for the simultaneous decrease of threshold and increase of decision noise with horizon, suggesting a potential mechanism for how humans solve the optimal stopping problem.

Discussion

In this paper, we assessed how humans solve the optimal stopping problem using a version of the exact-value paradigm. Consistent with previous findings (Guan and Lee, 2014), we showed that people actively decrease their stopping threshold as the horizon, i.e. the number of remaining options, decreases. In addition, we showed that people's behavioral variability increases as they decrease their stopping threshold. Finally, a simple deep exploration model, built on the idea of making decisions by simulating future events, can qualitatively account for the simultaneous decrease of stopping threshold and increase of behavioral variability as horizon decreases. In the field of explore-exploit decisions, an adaptation of behavioral variability as a function of horizon has also been observed (Wilson et al., 2014). However, in the explore-exploit setting people decrease their behavioral variability as horizon decreases, whereas in the current stopping task people increase their variability as horizon decreases. In the same task, people also simultaneously adapt the threshold and the behavioral variability. The deep exploration model can also account for the simultaneous change with horizon in the explore-exploit task (Wilson et al., 2020). Despite the opposite horizon dependence across tasks, deep exploration is able to account for the relationship between horizon and decision noise in both situations. There can be different deep exploration policies. Some alternative policies are: (a) simulate the best of the remaining cards and compare this highest card with the current card (see Methods); (b) simulate flipping the cards sequentially and stop on the first card that is higher than the current card; (c) nested simulation. All of these exploration policies can qualitatively account for behavior. With the two free parameters n and T, the deep exploration model is also capable of a quantitatively close fit to behavior. However, one limitation of such a model is that it does not capture the irrationally high threshold at short horizons (H = 2). In conclusion, this work suggests that mental simulation can be involved in complex decisions.

References

Christiane Baumann, Henrik Singmann, Vassilios E Kaxiras, Samuel Gershman, and Bettina von Helversen. Explaining Human Decision Making in Optimal Stopping Tasks. pages 1341–1346, 2018.

J. Neil Bearden, Amnon Rapoport, and Ryan O. Murphy. Experimental studies of sequential selection and assignment with relative ranks. Journal of Behavioral Decision Making, 2006. ISSN 10990771. doi: 10.1002/bdm.521.

David H. Brainard. The Psychophysics Toolbox. Spatial Vision, 1997. ISSN 01691015. doi: 10.1163/156856897X00357.

John P. Gilbert and Frederick Mosteller. Recognizing the Maximum of a Sequence. Journal of the American Statistical Association, 1966. ISSN 1537274X. doi: 10.1080/01621459.1966.10502008.

Maime Guan and Michael D Lee. Threshold Models of Human Decision Making on Optimal Stopping Problems in Different Environments. In Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2014.

Maime Guan, Michael D Lee, and Joachim Vandekerckhove. A Hierarchical Cognitive Threshold Model of Human Decision Making on Different Length Optimal Stopping Problems. In Proceedings of the 37th Annual Conference of the Cognitive Science Society, 2015.

Mario Kleiner, David H Brainard, Denis G Pelli, Chris Broussard, Tobias Wolf, and Diederick Niehorster. What’s new in Psychtoolbox-3? Perception, 2007. ISSN 0301-0066. doi: 10.1068/v070821.

Michael D. Lee. A hierarchical Bayesian model of human decision-making on an optimal stopping problem. Cognitive Science, 2006. ISSN 03640213.

Ian Osband, Charles Blundell, Alexander Pritzel, and Benjamin Van Roy. Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems, 2016.

Denis G. Pelli. The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 1997. ISSN 01691015. doi: 10.1163/156856897X00366.

Jeffrey N. Rouder. Optional stopping: no problem for Bayesians. Psychonomic bulletin & review, 2014. ISSN 15315320. doi: 10.3758/s13423-014-0595-4.

Bettina von Helversen and Rui Mata. Losing a dime with a satisfied mind: Positive affect predicts less search in sequential decision making. Psychology and Aging, 2012. ISSN 19391498.

R. C. Wilson, A. Geana, J. M. White, E. A. Ludvig, and J. D. Cohen. Humans use directed and random exploration to solve the explore-exploit dilemma. J Exp Psychol Gen, 143(6):2074–2081, Dec 2014.

Robert C Wilson, Siyu Wang, and Hashem Sadeghiyeh. Deep exploration as a unifying account of explore-exploit behavior. PsyArXiv, 2020.

The importance of action in explore-exploit decisions

Siyu Wang1, Hashem Sadeghiyeh1,3, and Robert C. Wilson1,2

1Department of Psychology, University of Arizona 2Cognitive Science Program, University of Arizona 3Department of Psychological Science, Missouri University of Science and Technology

Abstract

Humans and animals constantly face the tradeoff between exploring something new and exploiting what they have learned to be good. One key factor that moderates such explore-exploit decisions is the planning horizon, i.e. how far ahead one plans when making the decision. Previous work has shown that humans can adapt their level of exploration to the horizon context; specifically, people are more biased towards the less-known option (known as directed exploration) and behave more randomly (known as random exploration) in a longer horizon context (Wilson et al., 2014). However, Sadeghiyeh et al. (2018) showed that this horizon-adaptive exploration depends critically on how the value information about the options is obtained: participants only show horizon-adaptive exploration when the value information is gained through action-triggered responses (Active version), and do not show horizon adaptation when the information is presented without actions to retrieve it (Passive version). In the Passive version, participants showed no horizon-adaptive directed or random exploration. This is true even if the same participant has played the Active version first. To further investigate what kills the horizon-adaptive exploration in the passive condition, in this paper we performed a series of follow-up experiments to assess the influence of individual factors, such as processing time and the sequential presentation of value information, on horizon-adaptive exploration. This work reveals a more complicated nature of explore-exploit decisions and suggests an influence of action on how subjective utility is computed in the brain.

Keywords— explore-exploit decision, directed exploration, random exploration

Introduction

When deciding what to eat, will you order the pizza you always like or the new pasta on the menu? The tradeoff between exploiting the option you know and exploring new options is known as the explore-exploit dilemma. How far ahead one plans for the future, the “horizon”, plays a key role in making such explore-exploit decisions: you will be more likely to try the new pasta if you know you are coming back to the same restaurant. Previous work has shown that humans can adapt their level of exploration to the horizon context (Wilson et al., 2014), a result that has been replicated many times over the past few years (Somerville et al., 2017, Warren et al., 2017, Zajkowski et al., 2017). However, Sadeghiyeh et al. (2018), using a modified version of the same task as in (Wilson et al., 2014), showed that this horizon-adaptive exploration depends critically on how the value information is obtained. In the original task, participants are instructed to choose between two one-armed bandits that give out random rewards from different Gaussian distributions whose means are initially unknown. Sometimes they need to make 1 choice (short horizon) and sometimes 6 choices (long horizon). To give participants some information about the relative value of the two bandits before they make their own decisions, in the original task participants are instructed to press the arrow keys according to a preset sequence to reveal some sample outcomes from the two bandits – these are referred to as “sample plays”. After the sample plays, on their first free choice, people are more biased towards the less-known option (known as directed exploration) and behave more randomly (known as random exploration) in the longer horizon (Wilson et al., 2014). In the modified version, instead of actively pressing arrow keys to reveal the outcomes of the sample plays, all of the sample outcomes are presented passively to participants without key presses. In this Passive condition, people show no horizon-dependent directed or random exploration. This is true even if the same participant has played the Active version, i.e. the original Horizon Task, first. It remains unclear what modulates the ability to adapt exploration to the horizon context between the two conditions described above. It is also surprising that, with practically the same information at hand, people differ in whether they adapt their exploration to the horizon context. To further investigate what kills the horizon-adaptive exploration in the passive condition, we performed a series of follow-up experiments in which: (a) a forced delay was added before participants could make their first choice, (b) the outcomes of the sample plays were shown sequentially without a key press, (c) the outcomes of the sample plays were revealed upon pressing a neutral key (the space bar) instead of the arrow keys, (d) the outcomes of the sample plays were revealed upon pressing a different set of keys (”v” and ”b”), and (e) the active and passive versions were played in an interleaved rather than blocked design. We found increasing levels of horizon-adaptive exploration in versions (a), (b), (c) and (d); however, in none of these cases did we find the same level of horizon-adaptive exploration as in the active condition. In the interleaved version (e), however, people show an increase in exploration with horizon comparable to that in the active condition. This suggests that the same information is registered differently depending on how it is presented and on the mindset of the participants. This work reveals a more complicated nature of explore-exploit decisions and suggests an influence of action on how subjective utility is computed in the brain.

Methods

Participants

Participants in all experiments came from the University of Arizona Psychology Subject Pool. Participants were excluded based on their performance in the task using the same criterion as in (Wilson et al., 2014): those who did not choose the bandit with the higher underlying mean payout significantly above chance (α = 0.001) on the last trial of long horizon games were excluded (one way to implement this criterion is sketched below). The total number of participants and the number excluded for each experiment are listed in the table below.

Study Name total excluded remaining

Study 0: active/passive 92 31 61 Study 1: delayed passive 94 18 76 Study 2: active/sequential 79 32 47 Study 3: active/space bar 73 26 47 Study 3b: active/space bar (separate) 30 11 19 Study 4: left hand/space bar 188 46 142 Study 5: intermixed 42 7 35 Subtotal 640 178 462

Table 1: Participants in all studies.
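The exclusion rule above amounts to a one-sided binomial test on the last free choice of each long-horizon game. The following is a minimal Python sketch of that test, assuming the data have already been reduced to a 0/1 flag per game; the function name and data layout are hypothetical, not the actual analysis code.

```python
# Minimal sketch of the performance-based exclusion criterion (hypothetical
# data layout: one 0/1 flag per long-horizon game, indicating whether the
# last free choice picked the bandit with the higher underlying mean).
import numpy as np
from scipy.stats import binomtest

def passes_inclusion(last_trial_correct, alpha=0.001):
    """Keep a participant only if they pick the better bandit on the last
    trial of long-horizon games significantly above chance (p = 0.5)."""
    n_correct = int(np.sum(last_trial_correct))
    n_games = len(last_trial_correct)
    result = binomtest(n_correct, n_games, p=0.5, alternative="greater")
    return result.pvalue < alpha

# Example: better bandit chosen on the last trial in 62 of 80 long-horizon games
flags = [1] * 62 + [0] * 18
print(passes_inclusion(flags))  # True -> this simulated participant is kept
```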

Behavioral Task

We used different variations of the Horizon Task (Wilson et al., 2014) to investigate what modulates the horizon-dependent adaptation in exploration. In the original Horizon Task (Wilson et al., 2014), participants choose between two slot machines (which we will also refer to as bandits) that give out random points from 1 to 100, drawn from two respective Gaussian distributions with a fixed variance (8 points) but different means. The mean payouts of the two bandits change from game to game but remain constant within a game. Participants are instructed to maximize the total reward points they earn. In each game, one of the two bandits has a higher average payout than the other, and hence is the more rewarding option to choose in that game. In each game, before participants make their own free choices, there are 4 sample trials that show them some outcomes from both bandits. In the biased information condition, people see 1 outcome from one of the bandits and 3 outcomes from the other, whereas in the unbiased information condition, people see 2 outcomes from each bandit (Figure 1G). There are also two horizon conditions: sometimes there is only 1 free choice to make (short horizon condition), and sometimes there are 6 free choices (long horizon condition) (Figure 1G).
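To make the task structure concrete, the following Python sketch generates the reward schedule for a single game. It is illustrative only: the particular mean payouts (40 and 60 points), the interpretation of the 8-point noise as a Gaussian standard deviation, and the clipping of rewards to the 1-100 range are assumptions for illustration, not the exact parameters of the experiments.

```python
# Illustrative sketch of one game's generative structure (not experiment code).
import numpy as np

def make_game(horizon, info_condition, rng):
    """horizon: 1 or 6 free choices; info_condition: [1, 3] (biased) or
    [2, 2] (unbiased) sample plays for the two bandits."""
    # one bandit has a higher mean payout than the other (illustrative values)
    means = rng.permutation([40, 60])
    n_trials = 4 + horizon                     # 4 sample trials plus free choices

    def payout(bandit):
        # noisy reward around the bandit's mean (8-point noise, assumed to be
        # the standard deviation), clipped to the 1-100 point range
        return int(np.clip(round(rng.normal(means[bandit], 8)), 1, 100))

    # forced-choice (sample) trials in the given information condition
    forced = [0] * info_condition[0] + [1] * info_condition[1]
    rng.shuffle(forced)
    sample_outcomes = [(bandit, payout(bandit)) for bandit in forced]
    return {"means": means.tolist(), "horizon": horizon,
            "n_trials": n_trials, "sample_outcomes": sample_outcomes}

rng = np.random.default_rng(0)
print(make_game(horizon=6, info_condition=[2, 2], rng=rng))
```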

Figure 1: Different versions of the Horizon Task. (A) EXP0: active version (original Horizon Task); (B) EXP0b: passive version; (C) EXP1: delayed version; (D) EXP2: sequential version; (E) EXP3: space bar version; (F) EXP4: left hand version; (G) task conditions (biased and unbiased information conditions).

However, the different variants of the Horizon Task differ in how the sample trials are presented.

Experiment 0: active version (Original horizon task)

During the sample trials in the active version of the task (the original task), one of the two boxes is highlighted at each choice point, and participants have to press the corresponding arrow key to reveal the outcome; after that, the next trial starts. After the sample trials are over, on the 5th trial, both boxes are highlighted and participants use the arrow keys again to make their free choices (Figure 1A).

Experiment 0b: Passive version

The outcomes of the 4 sample trials are presented all at once at the beginning of each game (omitting the key presses of Experiment 0), and participants then go directly to their free choices. They make their free decisions using the arrow keys (Figure 1B).

Experiment 1: Delayed version

As in Experiment 0b, the outcomes of the 4 sample trials are presented at the beginning of each game; however, a 3s delay is imposed before participants are allowed to make their 1st decision. They make their free choices using the arrow keys (Figure 1C).

Experiment 2: Sequential version

During the sample trials, one of the two boxes is highlighted at each choice point. The outcome is revealed without a key press after a delay calculated from the median reaction times of previous participants who took part in Experiment 0. After the sample trials are over, on the 5th trial, both boxes are highlighted and participants use the arrow keys to make their free choices (Figure 1D).
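A minimal sketch of how the per-trial delays could be derived from Experiment 0 reaction times is shown below; pooling the median over participants is one reasonable reading of the procedure, and the column names are hypothetical.

```python
# Sketch: per-sample-trial delays from active-condition reaction times
# (hypothetical columns: 'subject', 'sample_trial', 'rt' in seconds).
import pandas as pd

def sequential_delays(active_rts: pd.DataFrame) -> pd.Series:
    """Median reaction time at each of the 4 sample trials, pooled over
    participants from the active condition (Experiment 0)."""
    return active_rts.groupby("sample_trial")["rt"].median()

# Example with made-up reaction times from two participants
df = pd.DataFrame({
    "subject":      [1, 1, 1, 1, 2, 2, 2, 2],
    "sample_trial": [1, 2, 3, 4, 1, 2, 3, 4],
    "rt":           [0.9, 0.7, 0.6, 0.6, 1.1, 0.8, 0.7, 0.5],
})
print(sequential_delays(df))  # one delay per sample-trial position
```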

Experiment 3: Space bar version

During the sample trials, one of the two boxes is highlighted at each choice point; instead of using the arrow keys, participants press a neutral key, the space bar, to reveal the outcome. After the sample trials are over, on the 5th trial, both boxes are highlighted and participants use the arrow keys to make their free choices (Figure 1E).

Experiment 3b: Space bar version (wide spread)

This version is the same as Experiment 3, except that the two bandits are placed at the two ends of the computer screen instead of side by side in the middle of the screen. This allows better precision in acquiring gaze data. Behavior from this version is analyzed together with Experiment 3.

Experiment 4: Left hand version

During the sample trials, one of the two boxes is highlighted at each choice point; instead of using the arrow keys, participants are instructed to press a different pair of keys with their left hand, i.e. "v" for left and "b" for right, to reveal the outcome. After the sample trials are over, on the 5th trial, both boxes are highlighted and participants use the arrow keys to make their free choices (Figure 1F).

Experiment 5: Interleaved version

In this experiment, active games and passive games are interleaved.

Gaze Data

Gaze data are recorded using an EyeTribe eye tracker. Gaze samples recorded when pupil diameter is 0 (i.e., the participant's eyes are closed) are excluded, as are gaze locations that fall outside the bounds of the screen.
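A minimal sketch of this gaze-cleaning step is shown below, assuming raw samples in a table with pupil diameter and x/y screen coordinates; the column names and the monitor resolution are assumptions for illustration.

```python
# Sketch of the gaze-sample exclusion rules (hypothetical column names).
import pandas as pd

SCREEN_W, SCREEN_H = 1920, 1080   # assumed monitor resolution in pixels

def clean_gaze(samples: pd.DataFrame) -> pd.DataFrame:
    """Drop samples where the eyes are closed (pupil diameter 0) or where
    the reported gaze location falls outside the monitor."""
    eyes_open = samples["pupil_diameter"] > 0
    on_screen = (samples["x"].between(0, SCREEN_W)
                 & samples["y"].between(0, SCREEN_H))
    return samples[eyes_open & on_screen]
```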

Results

In the Horizon Task, random exploration is quantified in a model-free way as the probability of choosing the low-mean option, p(low mean), in the equal, or [2 2], condition, whereas directed exploration is measured as the probability of choosing the more informative option, p(high info), in the unequal, or [1 3], condition. In the original task (active version), participants showed that both directed and random exploration increase as a function of horizon (Wilson et al., 2014); see Figure 2A. However, Sadeghiyeh et al. (2018) showed that people no longer show horizon-adaptive directed and random exploration in the passive version of the task (reproduced from Sadeghiyeh et al. (2018), Figure 2A). This finding is surprising because people receive exactly the same information and, on the face of it, should not behave any differently. In this work, we try to pin down what factor drives this behavioral difference in horizon-adaptive exploration between the active and passive conditions.
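These two model-free measures can be computed directly from the first free choice of each game. The sketch below assumes a per-game table with hypothetical column names; it is not the original analysis code.

```python
# Sketch: model-free directed and random exploration, by horizon.
import pandas as pd

def model_free_exploration(first_choices: pd.DataFrame) -> pd.Series:
    """first_choices: one row per game (first free choice only), with columns
    'info_condition' ('[1 3]' or '[2 2]'), 'horizon' (1 or 6),
    'chose_low_mean' and 'chose_high_info' (0/1 flags; the latter is only
    meaningful in the unequal condition)."""
    unequal = first_choices["info_condition"] == "[1 3]"
    equal = first_choices["info_condition"] == "[2 2]"
    # directed exploration: p(high info) in the [1 3] condition, by horizon
    p_high_info = first_choices[unequal].groupby("horizon")["chose_high_info"].mean()
    # random exploration: p(low mean) in the [2 2] condition, by horizon
    p_low_mean = first_choices[equal].groupby("horizon")["chose_low_mean"].mean()
    return pd.concat({"p_high_info": p_high_info, "p_low_mean": p_low_mean})
```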

Figure 2: Directed and random exploration in different versions of the Horizon Task.

Figure 3: Directed exploration, random exploration, and reaction time as a function of the difference in reward points between the two options.

Figure 4: Accuracy, reaction time, directed and random exploration, and percentage of choosing the alternative option from the previous trial.

Do people not have enough time in the passive version?

In the active version of the task, people have to press arrow keys to go through the 4 sample trials; in the passive version, however, people are presented with the information from all 4 sample trials at once. Are people simply overwhelmed by all the information at once, without enough time to process it, in the passive condition? To answer this question, a separate group of participants completed the delayed version of the Horizon Task (Figure 1C), in which we forced a 3s delay before participants could make the first choice. In the delayed condition, compared to the passive condition, people are less ambiguity averse and choose the uncertain option 46.18% of the time on average (as opposed to 38.99% in the passive condition). There is also a decrease in overall error rates (from 24.00% to 16.41%). There is also a slight increase in directed exploration, but it is not comparable to the level of increase in the active condition (Figure 2B), so timing is not the whole story.

Does sequential presentation of the outcomes in the sample trials matter?

The other obvious thing that is missing in the passive condition compared to the active condition is that the 4 pieces of sample-trial information are no longer presented sequentially but instead appear all at once. To test whether sequential presentation is critical, we calculated the median reaction time at each sample trial from previous participants in the active condition, and designed the sequential version of the Horizon Task in which the 4 sample trials are presented one after another, separated by delays equal to these median reaction times. Since participants in this version are not short of time, as in the delayed version, they are not ambiguity averse and their overall error rates are comparable with those in the active condition. However, we did not see significant increases in directed or random exploration (Figure 2C). So sequential presentation alone does not guarantee horizon-adaptive exploration.

Does self-paced timing matter?

In the sequential version, participants saw a movie of the sample trials, but they did not have control over the speed at which the movie played. We therefore asked: does self-paced timing matter? In the space bar version, during the sample trials, participants use a neutral key, the space bar, instead of the arrow keys to reveal the outcome. Here we see a significant increase in both directed and random exploration. However, the magnitude is still not comparable with the active condition (Figures 2D, 3D); the effect size is only about half as large as in the active condition. Furthermore, in the active condition, 69.12% of participants showed an increase in directed exploration and 75.00% showed an increase in random exploration, whereas in the space bar condition the corresponding figures are only 55.42% and 57.83%.
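The participant-level percentages quoted above are simply the fraction of participants whose exploration measure increases from horizon 1 to horizon 6; a minimal sketch, with made-up numbers:

```python
# Sketch: percentage of participants showing a horizon-dependent increase.
import numpy as np

def percent_increasing(horizon1, horizon6):
    """Percentage of participants whose exploration measure (e.g. p(high info)
    or p(low mean)) is larger in horizon 6 than in horizon 1."""
    h1, h6 = np.asarray(horizon1), np.asarray(horizon6)
    return 100.0 * np.mean(h6 > h1)

# Example: per-participant p(high info) in each horizon (made-up numbers)
print(percent_increasing([0.4, 0.5, 0.3], [0.6, 0.45, 0.5]))  # ~66.7
```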

Figure 5: Percentage of participants showing a horizon-dependent increase in directed and random exploration.

In addition, there are important reaction time differences between the space bar and the active condition. In the space bar condition, participants are significantly slower in the long horizon context than in the short horizon context, which we do not see in the active condition (Figure 3D). Participants are also significantly slower than in the active condition on the first free choice (Figure 4D); this is probably due to a hand/key switch, as they have to switch to their right hand to press the arrow keys. This suggests that people are still not processing information in the same way in the space bar condition as in the active condition.

Should it be two keys instead of the same key?

In the space bar condition, participants press the same key to reveal either bandit, whereas in the active condition they use two different keys. If only 1 key is used, participants do not need to pay attention to which side is cued in order to see the outcome. To test whether it makes a difference when participants have to press 2 different keys, we instructed participants to use a different pair of keys ("v" and "b") during the sample trials. In this left hand version, we see a significant increase in both directed and random exploration, and the magnitude is much closer to the active condition (Figure 5). As in the space bar condition, reaction time in horizon 6 is significantly longer than reaction time in horizon 1.

Interleaved active and passive condition

One important question is whether the active vs passive difference is only present when the conditions are blocked. To test this, we ran a version in which active and passive games are interleaved. When active and passive games are interleaved, we do see the horizon-dependent change in the passive condition: the increases in both directed and random exploration are at the same level as in the active condition. We still see significant ambiguity aversion in the passive condition because of the lack of time. Also, from the analysis of later trials, we can see that the active and passive conditions differ only on the first free choice and become the same from the 6th trial in the long horizon context, which we do not see for reaction time in the blocked version. Also, for the interleaved active vs passive dataset, we see a trend towards decreasing reaction time in the long horizon context, which is the opposite of what we observe in the space bar and left hand conditions.

Figure 6: Directed and random exploration in the interleaved version.

Figure 7: Directed and random exploration in the interleaved version.

Figure 8: Directed and random exploration in the interleaved version.

Discussion

In this paper, we examined the behavioral difference in horizon-adaptive exploration between the active condition of the Horizon Task (Somerville et al., 2017, Warren et al., 2017, Wilson et al., 2014, Zajkowski et al., 2017) and the passive condition of the Horizon Task (Sadeghiyeh et al., 2018). To figure out what kills the horizon-adaptive exploration in the passive condition, we performed a series of follow-up experiments in which: (a) a forced delay was added before participants could make the first choice, (b) the outcomes of sample plays were shown to people sequentially without a key press, (c) outcomes of sample plays were revealed upon pressing a neutral key (space bar) instead of the arrow keys, (d) outcomes of sample plays were revealed upon pressing a different set of keys ("v" and "b"), and (e) the active and passive versions were played in an interleaved instead of blocked setup. We found that (a) the delay is strongly related to ambiguity aversion but contributes little to the horizon dependence of exploration, (b) sequential presentation alone also contributes little to the horizon dependence of exploration, and (c/d) when participants have a control button for the sample trials (either the space bar or a pair of different keys, "v" and "b"), they show some level of horizon-adaptive exploration, but not at the level of the active condition. Finally, by interleaving active and passive games, we almost completely recover horizon-adaptive exploration in both the active and the passive condition. First, we know that participants are equally engaged in both versions of the explore-exploit task we use: across all these variations, participants do not differ significantly in the percentage of trials on which they pick the best bandit. The difference between the active and passive conditions is therefore not due to a few bad subjects, but instead reflects a more complicated picture of how explore-exploit decisions are made.

One way of viewing the contrast between the active and passive conditions is as experienced vs described learning. A similar contrast is observed for described vs experienced gambles: when choosing between gambles that are explicitly described, participants are risk averse for gains and risk seeking for losses (Tversky and Kahneman, 1973), whereas when the same gambles are not explicitly described but learned from experience, the pattern reverses and participants become risk seeking for gains and risk averse for losses (Hertwig et al., 2004). In our study, it is clear that timing has a large impact on overall ambiguity aversion regardless of horizon context. If the contrast we see here is indeed one of described vs experienced learning, then our results imply that timing may also affect experienced vs described gambling decisions. Our interleaved experiment shows that the difference is not just about the task setting; the participant's mindset also has a big impact, since the same task yields different results when it is blocked vs when it is interleaved. In the interleaved condition, one potential account is that people simulate the active condition while playing the passive condition. For the space bar condition, the looking strategy differs from the active or left hand condition, since in the latter two participants need to look at which side the cue appears in order to know which key to press. This difference may contribute to the diminished increase in directed and random exploration in the space bar condition. In addition, we observe an increase in decision time in the long horizon context in the space bar and left hand conditions that we do not see in the active or interleaved passive conditions; this delay may also contribute to the reduced scale of directed and random exploration in those tasks compared to the active condition.

References

Ralph Hertwig, Greg Barron, Elke U. Weber, and Ido Erev. Decisions from experience and the effect of rare events in risky choice. Psychological Science, 15(8):534–539, 2004. doi: 10.1111/j.0956-7976.2004.00715.x.

Hashem Sadeghiyeh, Siyu Wang, and Robert C Wilson. Lessons from a "failed" replication: The importance of taking action in exploration. PsyArXiv, 2018.

Leah H Somerville, Stephanie F Sasse, Megan C Garrad, Andrew T Drysdale, Nadine Abi Akar, Catherine Insel, and Robert C Wilson. Charting the expansion of strategic exploratory behavior during adolescence. Journal of Experimental Psychology: General, 146(2):155–164, 2017. doi: 10.1037/xge0000250.

Amos Tversky and Daniel Kahneman. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2):207–232, 1973. doi: 10.1016/0010-0285(73)90033-9.

Christopher M. Warren, Robert C. Wilson, Nic J. van der Wee, Eric J. Giltay, Martijn S. van Noorden, Jonathan D. Cohen, and Sander Nieuwenhuis. The effect of atomoxetine on random and directed exploration in humans. PLOS ONE, 12(4):e0176034, 2017. doi: 10.1371/journal.pone.0176034.

R. C. Wilson, A. Geana, J. M. White, E. A. Ludvig, and J. D. Cohen. Humans use directed and random exploration to solve the explore-exploit dilemma. Journal of Experimental Psychology: General, 143(6):2074–2081, 2014.

Wojciech K. Zajkowski, Malgorzata Kossut, and Robert C. Wilson. A causal role for right frontopolar cortex in directed, but not random, exploration. eLife, 6:e27430, 2017. doi: 10.7554/eLife.27430.
