
Quantum Probabilistic Graphical Models for Cognition and Decision

Catarina Alexandra Pinto Moreira

Supervisor: Doctor Andreas Miroslaus Wichert

Thesis specifically prepared to obtain the PhD Degree in Information Systems and Computer Engineering

Draft

August, 2017

Dedicated to all who contributed to my education.

'If you can dream - and not make dreams your master;
If you can think - and not make thoughts your aim;
If you can meet with Triumph and Disaster,
And treat those two impostors just the same; (...)
If you can force your heart and nerve and sinew
To serve your turn long after they are gone,
And so hold on when there is nothing in you
Except the Will which says to them: "Hold on!" (...)
Then, yours is the Earth and everything that's in it,
And - which is more - you'll be a Man, my son!'

'If-' by Rudyard Kipling

Title: Quantum-Like Probabilistic Graphical Models for Cognition and Decision
Name: Catarina Alexandra Pinto Moreira
PhD in Information Systems and Computer Engineering
Supervisor: Doctor Andreas Miroslaus Wichert

Abstract

Cognitive scientists are mainly focused on developing models and cognitive structures that are able to represent processes of the human mind. One of these processes concerns human decision making. In recent decades, the literature has reported several situations of human decisions that could not be easily modelled by classical models, because humans constantly violate the laws of probability theory in situations with high levels of uncertainty. In this sense, quantum-like models started to emerge as an alternative framework, based on the mathematical principles of quantum mechanics, to model and explain paradoxical findings that cognitive scientists were unable to explain using the laws of classical probability theory. Although quantum-like models have succeeded in explaining many paradoxical decision making scenarios, they still suffer from three main problems. First, they cannot scale to more complex decision scenarios, because the number of quantum parameters grows exponentially large. Second, they cannot be considered predictive, since they require that we know a priori the outcome of a decision problem in order to manually set the quantum parameters. And third, the way one can set these quantum parameters is still an unexplored field and an open research question in the Quantum Cognition literature.
This work focuses on quantum-like probabilistic graphical models by surveying the most important aspects of classical probability theory, quantum-like models applied to human decision making, and probabilistic graphical models. We also propose a Quantum-Like Bayesian Network that can easily scale up to more complex decision making scenarios due to its network structure. To address the problem of the exponential number of quantum parameters, we also propose heuristic functions that can set an exponential number of quantum parameters without a priori knowledge of experimental outcomes. This makes the proposed model general and predictive, in contrast with the current state-of-the-art models, which cannot be generalised to more complex decision making scenarios and can only provide an explanatory account of the observed paradoxes.

Keywords: Quantum Cognition, Quantum-Like Bayesian Networks, Quantum Probability, Quantum Interference Effects, Quantum-Like Models

Título: Modelos Gráficos Probabilísticos Quânticos para Cognição e Decisão
Nome: Catarina Alexandra Pinto Moreira
Doutoramento em Engenharia Informática e de Computadores
Orientador: Doutor Andreas Miroslaus Wichert

Resumo

Cognitive scientists focus mainly on developing models and cognitive structures capable of representing processes of the human mind. One of these processes concerns how humans make decisions. In recent decades, the literature has reported several situations of human decisions that cannot be easily modelled by classical models, because humans constantly violate the laws of classical probability theory in situations with high levels of uncertainty. In this sense, quantum-like models began to emerge as an alternative approach, based on the mathematical principles of quantum mechanics, to model and explain paradoxical situations that cognitive scientists cannot explain using the laws of classical probability theory. Although quantum-like models have succeeded in explaining many paradoxical human decision scenarios, they still suffer from three main problems. First, they cannot scale to more complex decision scenarios, because the number of quantum parameters grows exponentially with the complexity of the decision problem. Second, they cannot be considered predictive, since they require that we know a priori the outcome of a decision problem in order to manually set the quantum parameters that serve to explain the paradoxical results.
And third, the way these quantum parameters can be set is an unexplored field and still an open research question in the quantum cognition literature. This work focuses on quantum-like probabilistic graphical models, consisting of a survey of the most important aspects of classical probability theory, quantum-like models applied to human decision making, and classical probabilistic graphical models. We also propose a quantum-like Bayesian network that can easily scale to more complex decision scenarios due to its network structure. To address the problem of assigning values to an exponential number of quantum parameters, we also propose heuristic functions that can set an exponential set of quantum parameters without a priori knowledge of experimental outcomes. This makes the proposed model general and predictive, in contrast with current state-of-the-art models, which cannot be generalised to more complex decision-making scenarios and can only provide an explanatory account of the observed paradoxes.

Keywords: Quantum Cognition, Quantum-Like Bayesian Networks, Quantum Probability, Quantum Interference Effects, Quantum-Like Models


Resumo Extendido

Quantum cognition is a research field that aims to use the mathematical principles of quantum mechanics to model cognitive systems for human decision making. Given that classical probability theory is very rigid, in the sense that it imposes many restrictions and assumptions (the single trajectory principle, obedience to set theory, etc.), it becomes very limited (or even impossible) to develop simple models that can capture human judgments and decisions, since people can violate the laws of logic and of probability theory [33, 37, 6]. Quantum probability theory enjoys many advantages over the classical theory. It can represent events in vector spaces. Consequently, it can take into account the problem of order effects [202, 188] and represent the amplitudes of experimental outcomes simultaneously through a superposition. Psychologically, the superposition effect can be related to feelings of confusion, uncertainty or ambiguity; that is, it can represent the notion of belief as an indefinite state [34]. Moreover, this vector space representation obeys neither the distributive axiom of Boolean logic nor the law of total probability. This allows the construction of more general models that can mathematically explain cognitive phenomena such as conjunction/disjunction errors [40, 73] or violations of the Sure Thing Principle [164, 110], which is the main focus of this work.

One problem of current probabilistic systems is that they cannot make accurate predictions in situations where the laws of classical probability are violated. These situations frequently occur in systems that try to model human decisions in scenarios where the Sure Thing Principle [170] is violated. This principle is fundamental in classical probability theory and states that if one prefers action A over action B under state of the world X, and if one also prefers action A over B under the complementary state of the world ¬X, then one should always prefer action A over B, even when the state of the world is not known. Violations of the Sure Thing Principle imply violations of the classical law of total probability [193, 196, 198, 9, 26].

In this work, therefore, a quantum-like Bayesian network is proposed, inspired by the formalism of Feynman's path integrals [72]. A Bayesian network can be understood as a directed acyclic graph in which each node represents a random variable and each edge represents a direct influence from the source node to the target node (conditional dependence). Feynman's path integrals, in turn, represent all possible paths that a particle can take to reach a destination point, taking into account that all these paths can produce quantum interference effects among them.
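In symbols, for a decision B and an unknown binary state of the world X, the classical law of total probability that these violations contradict can be written as (a standard formulation, restated here for reference):

```latex
\Pr(B) = \Pr(X)\,\Pr(B \mid X) + \Pr(\neg X)\,\Pr(B \mid \neg X)
```

Since Pr(X) + Pr(¬X) = 1, this forces Pr(B) to lie between Pr(B | X) and Pr(B | ¬X); the experiments cited above report unknown-condition probabilities that fall outside this interval.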

Building this kind of quantum-like Bayesian network, together with the application of Feynman's path integrals, raises some difficulties, namely the exponential number of free parameters that result from the quantum interference effects. These parameters must be assigned values that accommodate the decision scenarios where the Sure Thing Principle is violated.
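As a sketch of where these free parameters come from, for a single unobserved binary variable the quantum-like marginal adds one interference term to the classical expression (a standard form in the quantum cognition literature, where ψ1 and ψ2 denote the amplitudes of the two paths):

```latex
\Pr_q(B) = |\psi_1|^2 + |\psi_2|^2 + 2\,|\psi_1|\,|\psi_2|\cos\theta,
\qquad |\psi_1|^2 = \Pr(X)\Pr(B \mid X),\quad |\psi_2|^2 = \Pr(\neg X)\Pr(B \mid \neg X)
```

With N unobserved binary variables the marginalisation sums over 2^N paths and yields one interference term per pair of paths, which is the source of the exponential growth in free θ parameters.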

To tackle this problem, we also propose a set of similarity heuristics to compute this exponential number of quantum interference parameters. Note that a heuristic is simply a shortcut that usually provides good results in many situations, but at the cost of occasionally not giving very accurate results [173].

Note that the current models in the literature require a manual search for the parameters that can lead to the desired results. That is, one needs to know the outcome of the decision scenario a priori in order to manually assign values to these parameters [93, 96, 101, 164, 44, 41]. With the proposed network, the aim is a scalable and predictive model, in contrast with the current models, which have an explanatory nature.

The heuristics we propose in this work are of three types: (1) based on the probability distribution of the data, (2) based on the contents of the data, and (3) based on semantic relations.

Similarity Heuristic Based on Probability Distributions

The goal of the similarity heuristic is to determine an angle between the probability vectors associated with the marginalisation of the positive and negative assignments of the query variable. In other words, when performing probabilistic inference from a full joint probability distribution table, we select from this table all the probabilities that match the assignments of the query variable and, if given, of the observed variables. If we sum these probabilities, we end up with a final classical probabilistic inference. If we add an interference term to this classical inference, we end up with a quantum probabilistic inference. In this case, we can use these probability vectors to obtain additional information with which to compute the quantum interference parameters. The general idea of the similarity heuristic is to use the marginal probability distributions as probability vectors and measure their similarity through the cosine similarity, a well-known similarity measure in computer science that is widely used in Information Retrieval [23]. According to this degree of similarity, we apply a mapping function of a heuristic nature that produces the value of the quantum interference parameter, taking into account a previous study of the probability distribution of the data from several experiments reported across the literature. The results showed an average error between 6.4% and 7.9% in predicting human decisions in several experiments from the literature in which violations of the Sure Thing Principle were reported.
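A minimal sketch of this pipeline follows: cosine similarity between the two vectors of path probabilities, a phase θ derived from that similarity, and a quantum-like marginal with a pairwise interference term. The numbers and the mapping θ = arccos(similarity) are illustrative assumptions; the thesis derives its actual mapping function empirically from the data of several experiments.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity between two vectors of (unnormalised) path probabilities.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def quantum_marginal(paths_pos, paths_neg, theta):
    # Classical marginal = sum of path probabilities; the quantum-like marginal
    # adds a pairwise interference term weighted by cos(theta), then
    # renormalises over the two assignments of the binary query variable.
    def interfered(paths):
        classical = sum(paths)
        cross = 0.0
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                cross += math.sqrt(paths[i] * paths[j])
        return classical + 2.0 * math.cos(theta) * cross

    a, b = interfered(paths_pos), interfered(paths_neg)
    return a / (a + b)

# Illustrative path probabilities for a binary query variable
# (e.g. defect/cooperate marginalised over the opponent's unknown action).
paths_pos = [0.25, 0.18]
paths_neg = [0.32, 0.25]

# Hypothetical similarity-to-phase mapping (arccos of the similarity);
# the thesis's heuristic mapping function is more elaborate.
theta = math.acos(cosine_similarity(paths_pos, paths_neg))
p = quantum_marginal(paths_pos, paths_neg, theta)
```

With θ = π/2 the interference term vanishes and the model collapses to the classical normalised marginal, which is the sanity check used for this kind of construction.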

Similarity Heuristic Based on the Contents of the Data

This heuristic represents objects (or events) in an N-dimensional vector space. This allows their comparison through similarity functions, and the similarity value is used to compute the quantum interference parameters. As in the work of Pothos et al. [166], we do not restrict our model to a vector in a multidimensional psychological space, but rather to an arbitrary multidimensional space. The similarities computed between two vectors representing the contents of events (in this case, the events are images and their contents are the pixels that compose them) can be used to set quantum interference parameters, since both are obtained by computing the inner product between two random variables. This suggests a mathematical equivalence between the θ parameters computed from the cosine similarity and the quantum θ parameters corresponding to quantum interference effects. This assumption is based on the book by Busemeyer & Bruza [34], which states that the θ parameter that arises in quantum interference effects corresponds to the phase of the angle of the inner product between the projectors of two random variables. The authors also state that the inner product provides a similarity measure between two vectors (where each vector corresponds to a superposition of events). If the vectors have unit length, then the cosine similarity collapses to the inner product. Given all these relations, we can assume that the similarities computed between two vectors representing images (used in the experiment of Busemeyer et al. [41]) can be used to set quantum interference parameters. The results of the simulations applied to the work of Busemeyer et al. [41] showed that the proposed heuristic was able to reproduce the experimental observations of the violations of the Sure Thing Principle with a small error (between 4% and 5%).
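The claimed collapse of cosine similarity into the inner product for unit-length vectors can be checked with a minimal sketch (the vectors below are arbitrary illustrations, not the image data of the experiment):

```python
import math

def inner(u, v):
    # Plain inner product of two real vectors.
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity: inner product divided by the vector norms.
    return inner(u, v) / (math.sqrt(inner(u, u)) * math.sqrt(inner(v, v)))

def normalise(u):
    # Rescale a vector to unit length.
    n = math.sqrt(inner(u, u))
    return [a / n for a in u]

# Two arbitrary "content" vectors (e.g. pixel intensities of two images),
# normalised so that cosine similarity and inner product coincide.
u = normalise([3.0, 1.0, 2.0])
v = normalise([1.0, 0.0, 2.0])

# The phase theta that parameterises the interference term can then be
# read off either measure, since they agree for unit vectors.
theta = math.acos(inner(u, v))
```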

Semantic Similarity Heuristic

This heuristic seeks to determine the impact of semantic dependence relations between events. These semantic similarities add new dependencies between the nodes of the Bayesian network that do not necessarily involve direct causal relations. We use this additional semantic information to compute the quantum interference effects in order to accommodate violations of the Sure Thing Principle.

Under the principle of causality, two events that are not causally connected should produce no effect on each other. When acausal events nevertheless occur together and produce an effect, we call it a coincidence. Carl Jung believed that all events had to be connected to each other, not in a causal setting, but rather through their meaning, suggesting some kind of semantic relation between events. This notion is known as the principle of synchronicity [87].

We define the semantic similarity heuristic in a way similar to the synchronicity principle: two variables are said to be synchronised if they share a semantic connection. This connection can be obtained through a semantic network representation of the variables in question. This allows the emergence of new meaningful dependencies that would be non-existent when considering only cause/effect relations. The quantum parameters are then assigned using this additional information so that the angle formed by these two variables in a Hilbert space is as small as possible (high similarity), thereby forcing the acausal events to be correlated. The results of the simulations applied to the work of Busemeyer et al. [41] showed that the proposed heuristic was able to reproduce the experimental observations of the violations of the Sure Thing Principle with a small error (between 3% and 6%).
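The assignment rule can be sketched as follows, with a toy semantic network; the node names and the fallback angle are illustrative assumptions, not the thesis's actual networks:

```python
import math

# Toy semantic network: undirected edges between variables that share meaning,
# even when no causal edge exists between them in the Bayesian network.
semantic_edges = {
    frozenset({"smoker", "lung_cancer"}),
    frozenset({"alarm", "burglar"}),
}

def synchronicity_theta(a, b, small_angle=0.0, fallback=math.pi / 2):
    # Semantically connected ("synchronised") variables get the smallest
    # possible angle in the Hilbert space, forcing a strong correlation
    # (cos(0) = 1: maximal constructive interference); unconnected pairs
    # get a neutral angle (cos(pi/2) = 0: no interference).
    if frozenset({a, b}) in semantic_edges:
        return small_angle
    return fallback
```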

Keywords: Quantum Cognition, Quantum-Like Bayesian Networks, Quantum Probability, Quantum Interference Effects, Quantum-Like Models

Contents

Abstract
Resumo
Resumo Extendido
List of Tables
List of Figures

1 Introduction
  1.1 Violations to Normative Theories of Rational Choice
  1.2 The Emergence of Quantum Cognition
  1.3 Motivation: Violations to the Sure Thing Principle
  1.4 Why Quantum Cognition?
  1.5 Challenges of Current Quantum-Like Models
  1.6 Thesis Proposal
    1.6.1 Why Bayesian Networks?
    1.6.2 Quantum-Like Bayesian Networks for Disjunction Errors
    1.6.3 Comparison with Existing Quantum-Like Models
  1.7 Advantages of Quantum-Like Models
    1.7.1 Research Questions
  1.8 Contributions
    1.8.1 Conference Papers, Extended Abstracts and Posters
  1.9 Organisation

2 Quantum Cognition Fundamentals
  2.1 Introduction to Quantum Probabilities
    2.1.1 Representation of Quantum States
    2.1.2 Space
    2.1.3 Events
    2.1.4 System State
    2.1.5 State Revision
    2.1.6 Compatibility and Incompatibility
  2.2 Interference Effects
    2.2.1 The Double Slit Experiment
    2.2.2 Derivation of Interference Effects from Complex Numbers
  2.3 Time Evolution
  2.4 Path Diagrams
    2.4.1 Single Path Trajectory Principle
    2.4.2 Multiple Indistinguishable Paths
    2.4.3 Multiple Distinguishable Paths
  2.5 Born's Rule
  2.6 Why Complex Numbers?
  2.7 Summary and Final Discussion

3 Fundamentals of Bayesian Networks
  3.1 The Naïve Bayes Model
  3.2 Bayesian Networks
    3.2.1 Example of Inferences in Bayesian Networks
  3.3 Reasoning Factors
    3.3.1 Causal Reasoning
    3.3.2 Evidential Reasoning
    3.3.3 Intercausal Reasoning
  3.4 Flow of Probabilistic Inference
  3.5 Summary and Final Discussion

4 Paradoxes and Fallacies for Cognition and Decision-Making
  4.1 Utility Functions
    4.1.1 Expected Utility Theory
    4.1.2 Subjective Expected Utility
  4.2 Paradoxes
    4.2.1 The Allais Paradox
    4.2.2 The Ellsberg Paradox
    4.2.3 Three Color Ellsberg Paradox
  4.3 Conjunction and Disjunction Errors
    4.3.1 The Linda Problem
  4.4 Disjunction Effects
    4.4.1 The Two Stage Gambling Game
    4.4.2 The Prisoner's Dilemma Game
  4.5 Order of Effects
  4.6 Summary and Final Discussion

5 Related Work
  5.1 Disjunction Fallacy: The Prisoner's Dilemma Game
  5.2 A Classical Markov Model of the Prisoner's Dilemma Game
  5.3 The Quantum-Like Approach
    5.3.1 Contextual Probabilities: The Växjö Model
    5.3.2 The Hyperbolic Interference
    5.3.3 Quantum-Like Probabilities as an Extension of the Växjö Model
    5.3.4 Modelling the Prisoner's Dilemma using the Quantum-Like Approach
  5.4 The Quantum Dynamical Model
  5.5 The Quantum Prospect Decision Theory
    5.5.1 Choosing the Uncertainty Factor
    5.5.2 The Quantum Prospect Decision Theory Applied to the Prisoner's Dilemma Game
  5.6 Probabilistic Graphical Models
    5.6.1 Classical Bayesian Networks
    5.6.2 Classical Bayesian Networks for the Prisoner's Dilemma Game
    5.6.3 Quantum-Like Bayesian Networks in the Literature
  5.7 Discussion of the Presented Models
    5.7.1 Discussion in Terms of Interference, Parameter Tuning and Scalability
    5.7.2 Discussion in Terms of Parameter Growth
  5.8 The Quantum-Like Approach Over the Literature
  5.9 The Quantum Dynamical Model Over the Literature
  5.10 A Model of Neural Oscillators for Quantum Cognition and Negative Probabilities
  5.11 A Quantum-Like Agent-Based Model
  5.12 Summary and Final Discussion

6 Quantum-Like Bayesian Networks for Cognition and Decision
  6.1 Classical Bayesian Networks
    6.1.1 Classical Conditional Independence
    6.1.2 Classical Random Variables
    6.1.3 Example of Application in the Two-Stage Gambling Game
    6.1.4 Classical Full Joint Distributions
    6.1.5 Classical Marginalization
  6.2 Quantum-Like Bayesian Networks
    6.2.1 Quantum Random Variables
    6.2.2 ......
    6.2.3 Quantum-Like Full Joint Distribution
    6.2.4 Quantum-Like Marginalisation: Exact Inference
  6.3 The Impact of the Phase θ
  6.4 A Cognitive Interpretation of Quantum-Like Bayesian Networks
  6.5 Summary of the Quantum-Like Bayesian Network Model
  6.6 Inference in More Complex Networks: The Burglar/Alarm Network
  6.7 Discussion of Experimental Results
  6.8 Summary and Final Discussion

7 Heuristical Approaches Based on Data Distribution
  7.1 The Vector Similarity Heuristic
    7.1.1 Acquisition of Additional Information
    7.1.2 Definition of the Heuristical Function
    7.1.3 Algorithm
    7.1.4 Summary
  7.2 Example of Application
  7.3 Similarity Heuristic Applied to the Prisoner's Dilemma Game
    7.3.1 The Special Case of Crosson's (2009) Experiments
    7.3.2 Analysing Li et al.'s (2002) Experiments
  7.4 Similarity Heuristic Applied to the Two Stage Gambling Game
  7.5 Comparing the Similarity Heuristic with other Works of the Literature
  7.6 Summary and Final Discussion

8 Heuristical Approaches Based on Contents of the Data
  8.1 A Vector Similarity Model to Extract Quantum Parameters
    8.1.1 Using Cosine Similarity to Determine Quantum Parameters
  8.2 Application to the Categorisation-Decision Experiment
    8.2.1 Categorisation-Decision Making Experiment
    8.2.2 Modelling the Problem using Quantum-Like Bayesian Networks
    8.2.3 Computation of the Probability of Narrow Faces
    8.2.4 Computing Quantum Interference Terms
    8.2.5 The Impact of the Conversion Threshold
    8.2.6 Results and Discussion
  8.3 Algorithm
  8.4 Summary and Final Discussion

9 Heuristical Approaches Based on Semantic Similarities
  9.1 Synchronicity: an Acausal Connectionist Principle
  9.2 Combining Causal and Acausal Principles for Quantum Cognition
    9.2.1 Semantic Networks
    9.2.2 The Semantic Similarity Heuristic
  9.3 The Semantic Similarity Heuristic in the Categorisation/Decision Experiment
    9.3.1 Application of the Synchronicity Heuristic: Narrow Faces
    9.3.2 Results and Discussion
  9.4 Application to More Complex Bayesian Networks: The Lung Cancer Network
    9.4.1 Deriving a Semantic Network
    9.4.2 Inference in Quantum Bayesian Networks
    9.4.3 Results with No Evidence Observed: Maximum Uncertainty
    9.4.4 Results with One Piece of Evidence Observed
  9.5 Application to More Complex Bayesian Networks: The Burglar/Alarm Network
    9.5.1 Semantic Networks: Incorporating Acausal Connections
  9.6 Summary and Final Discussion

10 Classical and Quantum Models for Order Effects
  10.1 The Gallup Poll Problem
  10.2 A Quantum Approach for Order Effects
    10.2.1 The Quantum Projection Model
    10.2.2 Discussion of the Quantum Projection Model
  10.3 The Relativist Interpretation of Parameters
  10.4 Do We Need Quantum Theory for Order Effects?
    10.4.1 A Classical Approach for Order Effects
    10.4.2 Analysis of the Classical Projection Model
    10.4.3 Explaining Several Order Effects using the Classical and Quantum Projection Models
    10.4.4 Occam's Razor
  10.5 Summary and Final Discussion

11 Classical Models with Hidden Variables
  11.1 Latent Variables
  11.2 Classical Bayesian Network with Latent Variables
    11.2.1 Estimating the Parameters
    11.2.2 Increasing the Dimensionality of a Classical Bayesian Network
  11.3 Quantum-Like Bayesian Networks as an Alternative Model
  11.4 Summary and Final Discussions

12 Conclusions

13 Future Work
  13.1 A Quantum-Like Analysis of a Real Life Financial Scenario: The Dutch's Bank Loan Application
  13.2 Quantum-Like Influence Diagrams: Incorporating Expected Utility in Quantum-Like Bayesian Networks
  13.3 Neuroeconomics: Quantum Probabilities towards a Unified Theory of Decision Making

Bibliography

List of Tables

3.1 Summary of all possible active trails in a Bayesian Network.

4.1 Allais Paradox Experiment 1.
4.2 Allais Paradox Experiment 2.
4.3 Three Color Ellsberg Paradox Experiment 1.
4.4 Three Color Ellsberg Paradox Experiment 2.
4.5 Results of the two-stage gambling game reported by different works from the literature.
4.6 Works of the literature reporting the probability of a player choosing to defect under several conditions. a corresponds to the average of the results reported in the first two payoff matrices of the work of Crosson [55]. b corresponds to the average of all seven experiments reported in the work of Li & Taplin [125].
4.7 Summary of the results obtained in the work of Moore [134].
4.8 Results obtained from the medical decision experiment in Bergus et al. [25].
4.9 Results reported by Trueblood & Busemeyer [188] of the experiments performed by McKenzie et al. [132].

5.1 Average results of several different experiments of the Prisoner's Dilemma Game reported in Section 4.4.2.
5.2 Classical full joint probability distribution representation of the Bayesian Network in Figure 5.4.
5.3 Relation between classical and quantum probabilities used in the work of Leifer & Poulin [124].
5.4 Comparison of the different models proposed in the literature.

6.1 Full joint distribution of the Bayesian Network in Figure 6.1 representing the average results reported over the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player of playing/not playing the second gamble.
6.2 Full joint distribution of the Bayesian Network in Figure 6.2 representing the average results reported over the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player of playing/not playing the second gamble.

6.3 Probabilities obtained when performing inference on the classical Bayesian Network of Figure 6.6.
6.4 Probabilities obtained when performing inference on the quantum Bayesian Network of Figure 6.7.
6.5 Optimum θ's found for each variable of the burglar/alarm Bayesian network (Figure 6.6).

7.1 Table representation of a quantum full joint probability distribution.
7.2 Full joint distribution of the Bayesian Network in Figure 6.2 representing the average results reported over the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player of playing/not playing the second gamble.
7.3 Analysis of the quantum θ parameters computed for each work of the literature using the proposed similarity function. Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic. b corresponds to the average of all seven experiments reported.
7.4 Results for the two games reported in the work of Crosson [55] for the Prisoner's Dilemma Game under several conditions: when the action of the second player was guessed to be Defect (Guessed to Defect), when the action of the second player was guessed to be Cooperate (Guessed to Collaborate), and when the action of the second player was not known (Unknown).
7.5 Experimental results reported in the work of Li & Taplin [125] for the Prisoner's Dilemma game under several conditions: when the action of the second player is known to be Defect (Known to Defect), when the action of the second player is known to be Cooperate (Known to Collaborate), and when the action of the second player is not known (Unknown). The column Violations of STP corresponds to determining whether the collected results violate the Sure Thing Principle.
7.6 Experimental results reported in the work of Li & Taplin [125] for the Prisoner's Dilemma game. The highlighted entries correspond to games that do not violate the Sure Thing Principle. Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic.
7.7 Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for different works of the literature reporting violations of the Sure Thing Principle. b corresponds to the average of all seven experiments reported.
7.8 Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for all the different experiments performed in the work of Li & Taplin [125].

8.1 Empirical data collected in the experiment of Busemeyer et al. [41].
8.2 Results from the application of the Quantum-Like Bayesian Network (QLBN) model to the Categorisation/Decision experiment and comparison with the Quantum Dynamical Model (QDM) proposed in the work of Busemeyer et al. [41].

9.1 Full joint probability distribution. Pr(C,D) corresponds to the classical probability and ψ(C,D) corresponds to the respective quantum amplitude.
9.2 Comparison between a Quantum Markov Model and the proposed Bayesian Network.
9.3 Probabilities obtained when performing inference on the Bayesian Network of Figure 9.4.
9.4 Probabilities obtained when performing inference on the Bayesian Network of Figure 9.6.

10.1 Summary of the results obtained in the work of [134] for the Clinton-Gore Poll, showing an Assimilation Effect.
10.2 Summary of the results obtained in the work of [134] for the Gingrich-Dole Poll, showing a Contrast Effect.
10.3 Summary of the results obtained in the work of [134]. The table reports the probability of answering All or Many to the questions. The results show the occurrence of an Additive Effect.
10.4 Summary of the results obtained in the work of [134] for the Rose-Jackson Poll, showing a Subtractive Effect.
10.5 Prediction of the geometric approach using different φ rotation parameters to explain the different types of order effects reported in the work of [134]. The columns Pr(1st ans) vs Pr(1st ans exp) represent the answer to the first question obtained using the projection models and the value reported in [134], respectively. Pr(2nd ans) vs Pr(2nd ans exp) represent the answer to the second question obtained using the projection models and the value reported in [134].

11.1 Full joint probability distribution for the general Bayesian Network from Figure 11.2, which models the Prisoner’s Dilemma game. Note that rs stands for risk seeking, ra for risk averse, d for defect and c for cooperate...... 176 11.2 Full joint probability distribution table of the Quantum-Like Bayesian Network in Figure 11.5.182

11.3 Analysis of the quantum θx parameters computed for each work of the literature in order to reproduce the observed and unobserved conditions of the Prisoner’s Dilemma Game. b corresponds to the average of all seven experiments reported...... 183

List of Figures

1.1 Representation of the knowable conditions of the Two Stage Gambling Game experiment of Tversky & Shafir [198]...... 4 1.2 Representation of the unknowable conditions of the Two Stage Gambling Game experiment of Tversky & Shafir [198]...... 4 1.3 Representation of the unknowable conditions of the Two Stage Gambling Game experiment conducted by Tversky & Shafir [198]...... 5

2.1 Sample Space (classical probability theory)...... 18 2.2 Hilbert Space (quantum probability theory)...... 18 2.3 Example of a representation of an event on a Hilbert Space...... 19 2.4 Example of a quantum system state...... 20 2.5 The double slit experiment. Electrons are fired and they can pass through one of the

slits (either s1 or s2) to reach a detector screen in points d1 or d2. If we measure from which slit the electron went through, then the pattern in the detector will have the shape and size of the two slits, suggesting a particle behaviour of the electron. If we do not measure from which slit the electron is going through, then the electron behaves as a wave and produces an interference pattern in the detector screen, with one point detecting constructive interference and another point detecting destructive interference...... 24 2.6 Classical Principle of Least Action. The path that a particle chooses between a starting and ending position is always the one that requires the least energy (left). Quantum version of the Principle of Least Action. A particle can be on different paths at the same time and use them to find the optimal path (the one that requires the least energy) between a starting and final position (right)...... 27 2.7 Single Path Trajectory (left). Multiple distinguishable paths (center). Multiple indistinguishable paths (right)...... 27

2.8 Representation of the projections, Pi, of a qubit ψ, to either the |0⟩ state subspace S0 or

the |1⟩ state subspace S1...... 30 2.9 Example of a distance between two points in L1-norm, also known as the Manhattan distance...... 31 2.10 Example of a distance between two points in L2-norm, also known as the Euclidean distance...... 31

3.1 Naïve Bayes Model, where node C represents the class variable and the set of random

variables {X1,X2, ..., Xn} represent the features...... 36 3.2 The Burglar Bayesian Network proposed in the book of [168]...... 38

3.3 Difference between causal reasoning and evidential reasoning...... 40

3.4 Indirect Causal Effect...... 43

3.5 Indirect Evidential Effect...... 43

3.6 Common Cause Effect...... 43

3.7 V-Structure...... 43

4.1 Linda is a feminist and a bank teller. Notice that Pr(F ∩ B) always has to be smaller than Pr(B)...... 52

4.2 Linda is a feminist and a bank teller. Notice that Pr(F ∪ B) always has to be bigger than Pr(F). 52

4.3 The two-stage gambling experiment proposed by Tversky & Shafir [198]...... 53

4.4 Example of a payoff matrix for the Prisoner’s Dilemma Game...... 54

4.5 The Prisoner’s Dilemma game experiment proposed by Tversky & Shafir [198]...... 55

5.1 Illustration of the probabilities that can be obtained by varying the parameters γ and µd.. 71

5.2 Illustration of the probabilities that can be obtained by varying the parameters γ and µc.. 71

5.3 Illustration of the probabilities that can be obtained by varying the parameters γ and µB.. 71 5.4 Bayesian Network representation of the Average of the results reported in the literature

(last row of Table 8.2). The random variables that were considered are P1 and P2, corresponding to the actions chosen by the first participant and second participant, respectively. 76

6.1 Classical Bayesian Network representation of the average results reported over the liter- ature for the Two Stage Gambling Game (Section 4.4.1, Table 4.5)...... 90

6.2 Quantum-Like Bayesian Network representation of the average results reported over the literature for the Two Stage Gambling Game (Section 4.4.1, Table 4.5). The ψ(x) represents a complex probability amplitude...... 93

6.3 Example of constructive interference: two waves collide forming a bigger wave...... 98

6.4 Example of destructive interference: two waves collide cancelling each other...... 98

6.5 The various quantum probability values that can be achieved by varying the angle θ in Equation 6.14. Note that quantum probability can achieve much higher/lower values than the classical probability...... 99

6.6 Burglar/Alarm classical Bayesian Network proposed in the book of Russell & Norvig [168] 104

6.7 Quantum-like counterpart of the Burglar/Alarm Bayesian Network proposed in the book of Russell & Norvig [168]...... 104

6.8 Possible probabilities when querying "MaryCalls = t" with no evidence. Parameters used

were: {θ1, θ2, θ3, θ5, θ7, θ8} → {0, 0, 0, 0, 3.1, 0}. Maximum probability for {θ1, θ2} → {0, 3.1}. 106

6.9 Possible probabilities when querying "Burglar = t" with no evidence. Parameters used

were: {θ1, θ2, θ3, θ5, θ6, θ7} → {0, 0, 0, 6.2, 0.1, 3.1}. Maximum probability for {θ4, θ8} → {0, 3.2}...... 106 6.10 Possible probabilities when querying "JohnCalls = t" with no evidence. Parameters used

were: {θ1, θ3, θ4, θ5, θ7, θ8} → {1.9, 0, 2.3, 0.5, 4.5, 2.4}. Maximum probability for {θ2, θ6} → {2.3, 5.5}...... 106 6.11 Possible probabilities when querying "Alarm = t" with no evidence. Parameters used were:

{θ1, θ3, θ4, θ5, θ7, θ8} → {0, 0, 0.8, 6.2, 3.1, 4.3}. Maximum probability for {θ2, θ6} → {0.2, 0.5}.106

7.1 Vector representation of two vectors representing a certain state...... 111 7.2 Illustration of the different 2-dimensional vectors that will be generated for each step of iteration during the computation of the quantum interference term...... 111

7.3 Vector representation of vectors G2play and G2nplay plus the Euclidean distance vector c. 116 7.4 Comparison of the results obtained for different works of the literature concerned with the Prisoner’s Dilemma game...... 118 7.5 Possible probabilities that can be obtained from Game 1 (left), Game 2 (center) and the average of the Games of the work of Crosson [55], using the quantum law of total probability.119 7.6 Comparison of the results obtained for different experiments reported in the work of Li & Taplin [125] in the context of the Prisoner’s Dilemma game...... 120 7.7 Possible probabilities that can be obtained in Game 2 of the work of Li & Taplin [125] (left). Possible probabilities that can be obtained in Game 6 of the work of Li & Taplin [125] (center). Possible probabilities that can be obtained in the work of Busemeyer et al. [39] (right)...... 121 7.8 Comparison of the results obtained for different works of the literature concerned with the Two-Stage Gambling game...... 122 7.9 Error percentage obtained in each experiment of the Two Stage Gambling game...... 122 7.10 Possible probabilities that can be obtained in the work of Lambdin & Burdsal [122]. The probabilities observed in their experiment and the one computed with the proposed quantum-like Bayesian Network are also represented...... 122

8.1 Vector normalization to obtain quantum destructive interferences...... 129 8.2 Example of Wide faces used in the experiment of Busemeyer et al. [41]...... 130 8.3 Example of Narrow faces used in the experiment of Busemeyer et al. [41]...... 130 8.4 Summary of the probability distribution of the Good / Bad faces in the experiment of Busemeyer et al. [41]...... 131 8.5 Representation of the Narrow faces experiment (left) and Wide faces experiment (right) in a Bayesian Network with classical probabilities and quantum amplitudes. The classical

probabilities are given by Pr(X) and the quantum amplitudes by ψx...... 132 8.6 Conversion of a dataset image into a binary image. Conversion with a small threshold (left). Conversion with a high threshold (right)...... 134

8.7 Impact of the threshold when converting an image into a binary image. Threshold ranges from 0.2 (left) to 0.8 (right)...... 135 8.8 Distribution of Pr(Attack) using threshold 0.2...... 135 8.9 Distribution of Pr(Attack) using threshold 0.3...... 135 8.10 Distribution of Pr(Attack) using threshold 0.4...... 135 8.11 Distribution of Pr(Attack) using threshold 0.5...... 135 8.12 Distribution of Pr(Attack) using threshold 0.6...... 136 8.13 Distribution of Pr(Attack) using threshold 0.7...... 136 8.14 Distribution of Pr(Attack) using threshold 0.8...... 136 8.15 Probability distribution of the 100 simulations performed when converting a grayscale image into a binary one with a threshold of 0.4...... 136

9.1 Encoding of the synchronised variables with their respective angles (left). Two synchronised events forming an angle of π/4 between them (right)...... 143 9.2 Representation of the Synchronicity heuristic in the Hilbert Space. Vector i corresponds to the event C = Good, D = Attack. Vector j corresponds to the event C = Bad, D = Attack. The computed angle for the Attack (left) and Withdraw (right) actions is θ = 3π/4...... 145 9.3 Semantic Network for the Lung Cancer Bayesian Network...... 147 9.4 Lung Cancer Bayesian Network...... 148 9.5 Probabilities obtained using classical and quantum inferences for different queries for the Lung Cancer Bayesian Network (Figure 9.4)...... 148 9.6 Example of a Quantum-Like Bayesian Network [168]. ψ represents quantum amplitudes. Pr corresponds to the real classical probabilities...... 150 9.7 Semantic Network representation of the network in Figure 9.6...... 150 9.8 Results for various queries comparing probabilistic inferences using classical and quantum probability when no evidence is observed: maximum uncertainty...... 151

10.1 Example of the application of the quantum projection approach for a sequence of two binary questions A and B. We start in a superposition state and project this state into the yes basis of question A (left). Then, starting in this basis, we project into the basis corresponding to the answer yes of question B (center). We can then have a different result if we reverse the order of the projections (right)...... 157

10.2 Relation between the rotation parameter φ and the quantum probability amplitude s0 of

Equation 10.15. The amplitude s1 was set to s1 = 1 − s0. We can simulate several order effects by varying the parameter φ...... 160

10.3 Relation between the rotation parameter φ and the quantum probability amplitude s0 of

Equation 10.12. The amplitude s1 was set to s1 = 1 − s0. We can simulate several order effects by varying the parameter φ...... 160

10.4 Example of the Relativistic Interpretation of Quantum Parameters. Each person reasons according to an N-dimensional personal basis state without being aware of it. The representation of the beliefs between different people consists in rotating the personal belief state by φ radians...... 161

11.1 Example of a Bayesian Network with a latent variable H and a random variable X..... 171 11.2 A classical Bayesian Network with a latent variable to model the Prisoner’s Dilemma game. P 1 and P 2 are both random variables. P 1 represents the decision of the first player and P 2 represents the decision of the second player (either to cooperate or to defect). H is the hidden state or latent variable and represents some unmeasurable factor that can influence the participant’s decisions...... 173 11.3 Classical Bayesian Network to model the observed conditions for the Prisoner’s Dilemma Game. OutP 1 and P 2 are both random variables that represent the outcome (or decision) of the first player and the decision of the second player. The decisions can either be defect, which is represented by d, or cooperate, represented by c. H2 represents a latent (hidden) unmeasurable variable that corresponds to the personality of the second player: either risk averse (ra) or risk seeking (rs)...... 177 11.4 A general classical Bayesian Network with two latent variables, H1 and H2, to express both unobserved and observed conditions for the Prisoner’s Dilemma game. Random variables P 1U and P 1 represent the first player’s decision according to the unobserved and observed conditions, respectively. Random variables P 2U and P 2 represent the second player’s decision according to the unobserved and observed conditions, respectively. The assignments ra stand for risk averse, rs risk seeking, d defect and c cooperate.... 179 11.5 Example of a Quantum-Like Bayesian Network. The terms ψ correspond to quantum probability amplitudes. The variables P 1 and P 2 correspond to random variables representing the first and the second player, respectively...... 181

Chapter 1

Introduction

It is the purpose of this thesis to explore the applications of the formalisms of quantum mechanics in areas outside of physics. More specifically, we propose a quantum-like decision model based on a network structure to accommodate and predict several paradoxical findings that were reported over the literature [193, 89, 195, 197, 198]. Note that the term quantum-like is simply the designation employed to refer to any model that is applied in domains outside of physics and that makes use of the mathematical formalisms of quantum mechanics, abstracting them from any physical meanings or interpretations. The paradoxes reported over the literature suggest that human behaviour does not follow normative rational choices. In other words, people usually do not choose the preferences that lead to a maximum utility in a decision scenario and consequently are consistently violating the axioms of expected utility functions and the laws of classical probability theory. When observations contradict one of the most significant and predominant decision theories, like Expected Utility Theory, this often suggests that something is missing in the theory. When dealing with preferences under uncertainty, models based on normative theories of rational choice tend to tell how individuals must choose, instead of telling how they actually choose [129]. It is the purpose of this thesis to provide a set of contributions of quantum-based models applied to decision scenarios, as an alternative mathematical approach to human decision-making and cognition, in order to better understand the structure of human behaviour.

1.1 Violations to Normative Theories of Rational Choice

The process of decision-making is a research field that has always triggered a vast amount of interest among several fields of the scientific community. Throughout time, many frameworks for decision-making have been developed. In the beginning of the 1930's, economic models focused on the mathematical structures of preferences, which take choices as primitives and investigate whether these choices can be represented by some utility function. The biggest consequence of this approach was the separation of economics from psychology. This means that human psychological processes became irrelevant as long as human decision-making obeys some set of axioms [77]. According to these strong normative

models, human behaviour is assumed to maximise a utility function and, by doing so, the person is acting in a rational manner. It was in 1944 that Expected Utility theory was axiomatised by the mathematician John von Neumann and the economist Oskar Morgenstern, and became one of the most significant and predominant rational theories of decision-making [201]. The Expected Utility hypothesis is characterised by a specific set of axioms that enable the computation of a person's preferences with regard to choices under risk [74]. By risk, we mean choices that can be measured and quantified; in other words, choices based on objective probabilities. However, in 1953, Allais proposed an experiment showing that human behaviour does not follow these normative rules and violates the axioms of Expected Utility, leading to the well known Allais paradox [13]. Later, in 1954, the mathematician Leonard Savage proposed an extension of the Expected Utility hypothesis, giving origin to Subjective Expected Utility [170]. Instead of dealing with decisions under risk, Subjective Expected Utility theory deals with uncertainty. Uncertainty is specified by subjective probabilities and is understood as choices that cannot be quantified and are not based on objective probabilities. But once more, in 1961, Daniel Ellsberg proposed an experiment showing that human behaviour also contradicts and violates the axioms of Subjective Expected Utility theory, leading to the Ellsberg paradox [70]. In the end, the Ellsberg and Allais paradoxes show that human behaviour is not normative and tends to violate the axioms of rational decision theories.

1.2 The Emergence of Quantum Cognition

Later, in the 70's, cognitive psychologists Amos Tversky and Daniel Kahneman decided to put to test the axioms of the Expected Utility hypothesis. They performed a set of experiments in which they demonstrated that people usually violate the Expected Utility hypothesis and the laws of logic and probability in decision scenarios under uncertainty [193, 195, 197, 90, 89]. From these experiments, several paradoxes were reported, such as disjunction / conjunction fallacies, order effects, etc. Motivated by these findings, researchers started to look for alternative mathematical representations in order to accommodate these violations. Although in the 40's Niels Bohr had defended and was convinced that the general notions of quantum mechanics could be applied in fields outside of physics [150], it was only in the 90's that researchers started to actually apply the formalisms of quantum mechanics to problems concerned with social sciences. It was the pioneering work of Aerts & Aerts [7] that gave rise to the field of Quantum Cognition. In their work, Aerts & Aerts [7] designed a model that was able to represent the evolution from a quantum structure to a classical one, depending on the degree of knowledge regarding the decision scenario. The authors also made several experiments to test the variation of probabilities when posing yes/no questions. According to their experiment, most participants formed their answer at the moment the question was posed. This behaviour goes against classical theories, because in classical probability it would be expected that the participants have a predefined answer to the question (or a prior) and do not form it at the moment of the question. A further discussion about this study can be found in the works of [4, 8, 11, 12, 76]. Quantum cognition has emerged as a research field that aims to build cognitive models using the

mathematical principles of quantum mechanics. Given that classical probability theory is very rigid, in the sense that it poses many constraints and assumptions (single trajectory principle, obedience to set theory, etc.), it becomes too limited (or even impossible) to provide simple models that can capture human judgments and decisions, since people are constantly violating the laws of logic and probability theory [33, 37, 6].

1.3 Motivation: Violations to the Sure Thing Principle

Although there are many paradoxical situations reported all over the literature, in this work we focus on one of the most predominant human decision-making errors that still persists nowadays: the disjunction effect [198]. The disjunction effect occurs whenever the Sure Thing Principle is violated. This principle is fundamental in classical probability theory and states that, if one prefers action A over B under the state of the world X, and if one also prefers A over B under the complementary state of the world (not X), then one should always prefer action A over B even when the state of the world is not known [170]. Violations of the Sure Thing Principle imply violations of the classical law of total probability. In order to put the Sure Thing Principle to test, Tversky & Shafir [198] conducted an experiment called the Two Stage Gambling Game. In this experiment, participants were asked to make a set of two consecutive gambles. At each stage, they were asked to decide whether or not to play a gamble that has an equal chance of winning $200 or losing $100. Three conditions were verified:

1. Participants were informed if they had won the first gamble;

2. Participants were informed if they had lost the first gamble;

3. Participants did not know the outcome of the first gamble.

The results obtained showed that participants who knew they had won the first gamble decided to play the second gamble. Participants who knew they had lost the first gamble also decided to play the second gamble. We will refer to these two conditions as the knowable conditions. Through Savage's Sure Thing Principle, it would be expected that the participants would choose to play the second gamble even when they did not know the outcome of the first gamble. However, the results obtained showed that the majority of the participants became risk averse and chose not to play the second gamble, leading to a violation of the Sure Thing Principle. We will refer to this experimental condition as the unknowable condition. Figures 1.1 and 1.2 represent the knowable and unknowable conditions, respectively. Tversky & Shafir [198] explained these findings in the following way: when the participants knew that they had won, they had extra house money to play with and decided to play the second gamble. When the participants knew that they had lost, they decided to play again with the hope of recovering the lost money. But when the participants did not know if they had won or lost the gamble, these thoughts did not arise in their minds and consequently they ended up not playing the second gamble. From a mathematical point of view, a person acts in a rational and consistent way if, under the unknowable condition, he/she chooses to play the second gamble. Let Pr(G2 = play | G1 = win) and

Figure 1.1: Representation of the knowable conditions of the Two Stage Gambling Game experiment of Tversky & Shafir [198]. Figure 1.2: Representation of the unknowable conditions of the Two Stage Gambling Game experiment of Tversky & Shafir [198].

Pr(G2 = play | G1 = lose) be the probabilities of a player choosing to play the second gamble given that it is known that he won / lost the first gamble, respectively. And let Pr(G2 = play) be the probability of the player choosing to play without knowing the outcome of the first gamble. Assuming a neutral prior and that the gamble is fair and not biased (50% chance of either winning or losing the first gamble), it would be expected that:

Pr(G2 = play | G1 = win) ≥ Pr(G2 = play) ≥ Pr(G2 = play | G1 = lose)

However, this is not consistent with the experimental results reported in the work of Tversky & Shafir [198]. What was observed in their experiments was that the probability of playing under the unknowable condition was much lower than under the knowable conditions:

Pr(G2 = play | G1 = win) ≥ Pr(G2 = play | G1 = lose) ≥ Pr(G2 = play)
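The classical prediction can be checked directly: with a fair first gamble, the law of total probability forces Pr(G2 = play) to be the average of the two knowable conditions. Using the values commonly cited from Tversky & Shafir's experiment [198] (69% play after a win, 59% after a loss, 36% when the outcome is unknown), a short sketch in Python (the variable names are ours) makes the violation explicit:

```python
# Classical law of total probability for the two-stage gambling game.
pr_play_given_win = 0.69   # Pr(G2 = play | G1 = win)
pr_play_given_lose = 0.59  # Pr(G2 = play | G1 = lose)
pr_play_observed = 0.36    # Pr(G2 = play), unknown outcome (observed)

# Fair, unbiased first gamble: Pr(win) = Pr(lose) = 0.5
pr_play_classical = 0.5 * pr_play_given_win + 0.5 * pr_play_given_lose

print(f"classical prediction: {pr_play_classical:.2f}")  # 0.64
print(f"observed:             {pr_play_observed:.2f}")   # 0.36

# The observed value falls outside the classical bounds
# [Pr(play|lose), Pr(play|win)] = [0.59, 0.69]: a Sure Thing
# Principle violation.
assert not (pr_play_given_lose <= pr_play_observed <= pr_play_given_win)
```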

This led to a violation of the laws of classical probability theory. Classical physics was also once unable to accommodate many paradoxical findings that were being observed in several experimental settings, and this gave rise to the axiomatisation of the theory of quantum mechanics. In this thesis, we explore these paradoxical scenarios in the same way, by using quantum probability theory as an alternative mathematical formalism. Under a quantum cognition perspective, the third experimental condition, the unknowable condition, can be mathematically explained by quantum interference effects. In quantum mechanics, electrons that are in an undefined state can interfere with each other. Under a quantum cognitive point of view, if we consider that the beliefs of the participants are in an undefined state, then they can also interfere with each other, causing the final probabilities to be disturbed: either increased (constructive interference) or decreased (destructive interference). The latter is the type of interference that results in violations of the Sure Thing Principle. Figure 1.3 represents the third experimental condition from Tversky & Shafir [198] under a quantum cognitive point of view, with interference effects being generated when the outcome of the first gamble is not known.
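A minimal sketch of how interference accounts for this: the quantum law of total probability adds an interference term 2√(p_win · p_lose) cos θ to the classical sum, and a destructive angle (cos θ < 0) can push the probability below both knowable conditions. The numbers below are the commonly cited values from [198]; solving for θ here is only an illustration, not the parameter-setting procedure developed later in this thesis:

```python
import math

# Quantum law of total probability:
#   Pr(play) = p_win + p_lose + 2 * sqrt(p_win * p_lose) * cos(theta)
# where p_win = Pr(G1 = win) * Pr(G2 = play | G1 = win), etc.
p_win = 0.5 * 0.69
p_lose = 0.5 * 0.59
classical = p_win + p_lose        # 0.64, the classical prediction
observed = 0.36                   # value reported by Tversky & Shafir

# Solve for the interference angle that reproduces the observed value.
cos_theta = (observed - classical) / (2 * math.sqrt(p_win * p_lose))
theta = math.acos(cos_theta)

print(f"cos(theta) = {cos_theta:.3f}")  # negative: destructive interference
print(f"theta      = {theta:.3f} rad")
```

A negative cosine (destructive interference) is exactly what lowers the quantum probability below the classical 0.64 down to the observed 0.36.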

Figure 1.3: Representation of the unknowable conditions of the Two Stage Gambling Game experiment conducted by Tversky & Shafir [198].

1.4 Why Quantum Cognition?

It is not the purpose of this thesis to argue whether quantum-like models should be preferred over classical models. As will be addressed in future chapters of this work, the advantages of applying quantum-like models depend on the type of the decision problem (Chapters 10 and 11). Following the lines of thought of Sloman [179], people have to deal with missing / unknown information. This lack of information can be translated into feelings of ambiguity, uncertainty, vagueness, risk, ignorance, etc. [216], and each of them may require a different mathematical approach to build adequate cognitive / decision models. Quantum probability theory can be seen as an alternative mathematical approach to model such cognitive phenomena. Some researchers argue that quantum-like models do not capture many underlying aspects of human cognition (like reasoning, etc.); they are merely mathematical models used to fit data, and for this reason they are able to accommodate many paradoxical findings [123]. Indeed, quantum-like models provide a more general probability theory that uses quantum interference effects to model decision scenarios; however, they are also consistent with other psychological phenomena (for instance, order effects) [179]. In the book of Busemeyer & Bruza [34], for instance, the feeling of uncertainty or ambiguity is associated with quantum superposition, which assumes that all beliefs of a person occur simultaneously, instead of the classical approach, which considers that a person holds one definite belief at each instant of time. The book of Busemeyer & Bruza [34] provides a set of quantum phenomena that can be associated with psychological processes and that support the application of quantum-like models to cognitive models.

• Violation of Classical Laws: The biggest motivation for the application of quantum formalisms in areas outside of physics is the need to explain paradoxical findings that are hard to explain through classical theory: violations to the Sure Thing Principle, disjunction/conjunction errors, Ellsberg / Allais paradoxes. Quantum theory is a more general framework that allows the accommodation and explanation of scenarios violating the laws of classical probability and logic [33, 31, 38].

• Superposition: Under a classical point of view, cognitive models assume that, at each moment of

time, a person is in a definite state. For instance, while judging whether or not to buy a car, a person is either in a state corresponding to the judgment buy car or in the state not buy car at each instant of time. In quantum cognition, it is assumed that the human thought process works like a wave until a decision is made. A person can be in an indefinite state; that is, due to the wave-like structure, a person can be in a superposition of thoughts. At each instant of time, a person can be in both the state buy car and the state not buy car. This wave-like paradigm enables a clearer representation of conflicting, ambiguous, uncertain and vague thoughts [28].

• The Principle of Unicity: In classical theory, when the path of a particle is unknown, it is assumed that the particle goes through one path or the other, each with probability 1/2. In quantum theory, when the path is unknown, the particle enters a superposition state, taking all paths at the same instant of time and generating interference effects that alter the final probabilities of the particle.

• Sensitivity to Measurement: In quantum mechanics, the act of measuring disturbs a state, making it collapse into one definite state. A measurement on a system creates, rather than records, a property of the system [162]. In the scope of quantum cognition, the measurement process can be used to explain decisions if we assume that human thoughts are represented by a wave in superposition. For instance, if we ask a person whether he/she will buy a car, immediately before the question is posed the person is in a superposition state. When the question is posed, the superposition state collapses into one of two states: one in which the answer is yes, the other in which the answer is no. The answer is created from the interaction of the superposition state and the question. In classical mechanics, this act of creation does not exist: since a state is always considered definite, the properties of a system are recorded rather than created.

• Measurement Incompatibility: In classical theory, asking a sequence of two questions should yield the same answers as posing the questions in reversed order. Empirical experiments have shown that this is not the case: the act of answering the first question changes the context of the second question, leading people to give different answers according to the order in which the questions are posed. Quantum theory explains order effects naturally, since operations in quantum theory are non-commutative.
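The last two properties can be illustrated with two non-commuting projectors. In the sketch below, the belief state and the rotation angle are illustrative choices of ours, not values taken from any experiment; the point is only that reversing the order of the two measurements changes the resulting probability:

```python
import numpy as np

# A belief state in superposition over the answers {yes, no} to a question.
psi = np.array([0.8, 0.6])            # unit vector: 0.8**2 + 0.6**2 = 1

# Question A: projector onto "yes" in the standard basis.
P_a = np.array([[1.0, 0.0],
                [0.0, 0.0]])

# Question B: projector onto "yes" in a basis rotated by phi radians,
# so P_a and P_b do not commute.
phi = np.pi / 6
R = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
P_b = R @ P_a @ R.T

# Probability of answering "yes" to A and then "yes" to B, and vice versa
# (Lueders rule: squared norm of the sequentially projected state).
p_a_then_b = np.linalg.norm(P_b @ P_a @ psi) ** 2
p_b_then_a = np.linalg.norm(P_a @ P_b @ psi) ** 2

print(p_a_then_b, p_b_then_a)  # the two orders give different probabilities
```

Because `P_a @ P_b != P_b @ P_a`, the joint probabilities depend on the question order, which is the quantum account of the order effects mentioned above.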

1.5 Challenges of Current Quantum-Like Models

Although recent research shows the successful application of quantum-like models in many different decision scenarios of the literature [20, 16, 54, 50, 187], there are still many concerns that challenge the acceptance and usage of these models. Some of the current challenges that quantum-like models face can be summarised in the following points.

• Prediction. Although many cognitive models have been successfully applied to accommodate several paradoxical findings, they cannot be considered predictive. Most of the quantum-like models proposed need to have a priori knowledge of the outcome of the probability distributions of

the experiment in order to fit parameters that explain the paradoxical results. For this reason, these models are considered to have an explanatory nature rather than a predictive one.

• Scalability. Although there are many experiments that report paradoxical findings [164, 41, 120, 122], these experiments consist of very small scenarios that are modelled by, at most, two random variables. Therefore, many of the models proposed in the literature are only effective in such small scenarios and become intractable for more complex situations: the number of quantum interference parameters grows exponentially large [93, 96, 101], or there are computational constraints in the computation of very large unitary operators [164, 44, 41].

• What can be considered Quantum-Like? Since the emergence of the Quantum Cognition field, many researchers have been attempting to apply the mathematical formalisms of quantum mechanics in many different research areas, ranging from Biology [20, 16], Economics [102, 82], Finance [81], Perception [54, 50] and Jury duty [187] to domains such as Information Retrieval [133]. Regarding this last field, quantum-like versions of geometric-based projection models have been proposed, which measure the similarity between entities (documents, queries, etc.). However, it is still not clear whether applying a quantum-like projective approach has any advantage over the classical models, since the way these models accommodate the paradoxical findings is through a rotation of the vector space instead of the usage of quantum interference [144].

• Classical vs. Quantum-Like. Recent research shows that quantum-based probabilistic models are able to explain and accommodate decision scenarios that cannot be explained by pure classical models [38, 31]. However, there is still strong resistance in the scientific literature to accepting these quantum-based models [123, 184]. Many researchers believe that one can model scenarios that violate the laws of probability and logic through traditional classical decision models [151]. Classical models can indeed simulate many of the paradoxical findings reported throughout the literature [144]. This raises the question of the advantages, or even the applicability, of quantum models over classical ones.

• Non-Kolmogorovian Models. Quantum-like models make use of quantum interference effects in order to accommodate paradoxical decision scenarios [165]. Since pure classical probabilistic models are constrained by the limitations of set theory, it is difficult (or even impossible) for them to represent these paradoxes. But if the limitations lie in the constraints of set theory, then non-Kolmogorovian theories such as Dempster-Shafer (D-S) theory [171] should also be able to accommodate the same decision problems that quantum-like models can. The Dempster-Shafer theory of evidence differs from classical set theory in that it associates measures of uncertainty with sets of hypotheses, this way enabling the theory to distinguish uncertainty from ignorance [128]. This distinction has been validated throughout the literature and has produced accurate predictions in sensor fusion models [136]. Uncertainty in D-S theory is specified by allowing probabilities to be assigned to sets of events, instead of being constrained to assigning probabilities to atomic events (as in classical probability theory). It is still an open research question whether there are any relations between quantum-like models and other non-Kolmogorovian probability theories. If this turns out to be true, then the accommodation of the paradoxical decision scenarios does not come from the unique characteristics of quantum-like models and quantum interference effects, but from the limitations of set theory.
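The D-S idea of assigning mass to sets of hypotheses, rather than only to atomic events, can be illustrated with a minimal sketch. The mass values and the guilty/innocent frame below are hypothetical, and the `belief` and `plausibility` helpers are written from scratch (they implement the standard D-S measures, not any particular library's API):

```python
# A Dempster-Shafer mass function assigns belief mass to *sets* of
# hypotheses, not only to atomic events as in classical probability.
# Hypothetical frame of discernment: {Guilty, Innocent}.
masses = {
    frozenset({"Guilty"}): 0.5,             # evidence pointing to guilt
    frozenset({"Innocent"}): 0.2,           # evidence pointing to innocence
    frozenset({"Guilty", "Innocent"}): 0.3, # ignorance: mass on the whole frame
}

def belief(hypothesis, masses):
    """Bel(A): total mass of all subsets of A."""
    return sum(m for s, m in masses.items() if s <= hypothesis)

def plausibility(hypothesis, masses):
    """Pl(A): total mass of all sets that intersect A."""
    return sum(m for s, m in masses.items() if s & hypothesis)

guilty = frozenset({"Guilty"})
print(belief(guilty, masses))        # 0.5
print(plausibility(guilty, masses))  # 0.8 (0.5 plus the 0.3 of unassigned mass)
```

The gap between belief and plausibility (here 0.5 vs. 0.8) is precisely the representation of ignorance that classical probability cannot express.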

1.6 Thesis Proposal

In order to overcome the above challenges, this thesis proposes a quantum-like Bayesian Network formalism, which consists in replacing classical probabilities with quantum complex probability amplitudes. However, since this approach also suffers from the problem of an exponential growth in the number of quantum parameters that need to be fit, a similarity heuristic [173] is also proposed that automatically computes this exponential number of quantum parameters through vector similarities. A Bayesian Network can be understood as an acyclic directed graph, in which each node represents a random variable and each edge represents a direct causal influence from the source node to the target node (conditional dependence). Under the proposed network, if a node (event) is unobserved, then it can enter a superposition state and produce interference effects. These effects provide some explanation in terms of cognition, since they can be seen as the feeling of confusion or ambiguity [34].

1.6.1 Why Bayesian Networks?

Bayesian Networks are one of the most powerful structures known to the Computer Science community for deriving probabilistic inferences (for instance, in medical diagnosis, spam filtering, image segmentation, etc.) [116]. Bayesian Networks were chosen because they provide a link between probability theory and graph theory. A fundamental property of graph theory is its modularity: one can build a complex system by combining smaller and simpler parts. It is easier for a person to combine pieces of evidence and to reason about them than to calculate all possible events and their respective beliefs [79]. In the same way, Bayesian Networks represent the decision problem in small modules that can be combined to perform inferences. Only the probabilities that are actually needed to perform the inferences are computed. This process can resemble human cognition [79]. While reasoning, humans cannot process all possible information, because of their limited capacity [90]. Consequently, they combine several smaller pieces of evidence in order to reach a final decision. A Bayesian Network works in exactly the same way. It provides a bridge between human cognition and inductive inference [161].
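The modular inference described above can be sketched with a toy two-node network. The variables and conditional probability values below are hypothetical, chosen only to illustrate how a marginal is obtained by combining small local factors:

```python
# Minimal sketch of exact inference by enumeration in a two-node
# Bayesian Network: Evidence -> Verdict, i.e. Pr(E, V) = Pr(E) * Pr(V | E).
# All numbers are illustrative placeholders.
pr_e = {True: 0.3, False: 0.7}                  # prior Pr(Evidence)
pr_v_given_e = {True: {True: 0.9, False: 0.1},  # Pr(Verdict = guilty | E)
                False: {True: 0.2, False: 0.8}}

def pr_verdict(v):
    """Marginalise the unobserved parent: Pr(V) = sum_e Pr(e) * Pr(V | e)."""
    return sum(pr_e[e] * pr_v_given_e[e][v] for e in (True, False))

print(round(pr_verdict(True), 2))   # 0.3*0.9 + 0.7*0.2 = 0.41
```

The quantum-like variant discussed in this thesis keeps this same modular structure but stores complex amplitudes in place of the real-valued entries, which is what later gives rise to interference terms.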

1.6.2 Quantum-Like Bayesian Networks for Disjunction Errors

In this thesis, the problem of violations of the Sure Thing Principle (which are a consequence of disjunction errors) is addressed by examining two major problems in which these violations were verified: the Prisoner's Dilemma game and the Two Stage Gambling game. These violations were initially reported by Tversky & Shafir [198] and later reproduced in several works in the literature that reported similar results [125, 39, 86]. It will be demonstrated how current classical models fail to explain the paradoxical findings implied in the violations of the Sure Thing Principle, and a new quantum-like structure based on Bayesian Networks will be proposed. This model has the advantage of being flexible enough to be easily extended to more complex decision scenarios (containing more than two random variables) and provides mechanisms to automatically compute the quantum parameters derived from quantum interference effects. In this way, the proposed model distinguishes itself from the models proposed in the literature by not requiring a priori knowledge of the outcome of the empirical experiments in order to accommodate the paradoxical results.

1.6.3 Comparison with Existing Quantum-Like Models

In this thesis, an overview was performed of the most important quantum models in the literature that are used to make predictions in scenarios where the Sure Thing Principle is violated. These models were evaluated in terms of three metrics: interference effects, parameter tuning and scalability. The first examines whether the analysed model makes use of any type of quantum interference to explain human decision-making. The second concerns the assignment of values to a large number of quantum parameters. The last consists of analysing the ability of the models to be extended and generalised to more complex scenarios. We also studied the growth of the number of quantum parameters as the complexity and the levels of uncertainty of the decision scenario increase. Finally, we compared these quantum models with traditional classical models from the literature. We conclude with a discussion of why the models addressed in this thesis can only deal with very small decision problems and why they cannot scale well to larger and more complex decision scenarios.

1.7 Advantages of Quantum-Like Models

Recent research shows that quantum-based probabilistic models are able to explain and predict scenarios that cannot be explained by pure classical models [33, 31]. However, there is still strong resistance in the scientific literature to accepting these quantum-based models. Many researchers believe that one can model scenarios that violate the laws of probability and logic through traditional classical decision models [123, 184]. Although the field of quantum cognition is recent, several different quantum-like models have been proposed in the literature. These models range from dynamical models [44, 41, 164], which make use of unitary operators to describe the time evolution from the moment a participant is given a problem (or asked a question) until he/she makes a decision, to models based on contextual probabilities [7, 105, 215]. Quantum-like dynamical models have also been proposed to accommodate violations in the Prisoner's Dilemma game [164], to study the evolution of the interactions of economic agents in markets [102, 82], or even to specify a formal description of the dynamics of epigenetic states of cells interacting with an environment [19]. On the other hand, quantum-like models based on contextual probabilities explore the application of complex probability amplitudes in order to define contexts that can interfere with the decision-maker [99, 106, 105]. For a survey of the applications of quantum-like models to the Sure Thing Principle, the reader can refer to Moreira & Wichert [142].

1.7.1 Research Questions

With this thesis, a set of research questions is posed. Their answers will be explored throughout the chapters of this work and will be answered in more detail in the final chapter of this thesis (Chapter 12).

• [RQ1] So far, several quantum-like models have been proposed in the literature, ranging from dynamical models [44, 164, 3] to models based on contextual probabilities [105] and even models based on expected utility theories [149, 10]. What is the advantage of the proposed approach?

• [RQ2] Different quantum-like models use different types of quantum interference terms: either from the usage of Hamiltonians or by the usage of Feynman’s Path Integrals. But what is the psychological interpretation of quantum interference effects under these approaches?

• [RQ3] Are quantum projection models really quantum? Or are they merely a representation of a classical model with a rotation of the basis vectors?

• [RQ4] One could legitimately argue that pure classical models fail to accommodate paradoxical decision scenarios because some hidden variables could be influencing the results and, due to physical limitations, their data cannot be gathered. Can a classical model with hidden variables be used to accommodate the paradoxical findings reported in the literature? Or is this accommodation specific to quantum-like models?

1.8 Contributions

During this work, several contributions were proposed to the scientific community:

1. Catarina Moreira and Andreas Wichert, Are Quantum-Like Bayesian Networks More Powerful than Classical Bayesian Networks? (Major Revisions).

2. Diederik Aerts, Suzette Geriente, Catarina Moreira and Sandro Sozzo, Testing Ambiguity and Machina Preferences Within a Quantum-theoretic Framework for Decision-making (under review) [10]

3. Catarina Moreira and Andreas Wichert, Are Quantum Models for Order Effects Quantum?, International Journal of Theoretical Physics, 1-18, 2017 [144]

4. Catarina Moreira and Andreas Wichert, Exploring the Relations Between Quantum-Like Bayesian Networks and Decision-Making Tasks with Regard to Face Stimuli, Journal of Mathematical Psychology, 78, 86-95, 2017 [145]

5. Catarina Moreira and Andreas Wichert, Quantum Probabilistic Models Revisited: the Case of Disjunction Effects in Cognition, Frontiers in Physics: Interdisciplinary Physics, 4, 1-26, 2016 [142]

6. Catarina Moreira and Andreas Wichert, Quantum-Like Bayesian Networks for Modelling Decision Making, Frontiers in Psychology, 7, 2016 [141]

7. Catarina Moreira and Andreas Wichert, The Synchronicity Principle Under Quantum Probabilistic Inferences, NeuroQuantology, 13, 111-133, 2015 [140]

8. Catarina Moreira and Andreas Wichert, Interference Effects in Quantum Belief Networks, Journal of Applied Soft Computing, 25, 64-85, 2014 [137]

1.8.1 Conference Papers, Extended Abstracts and Posters

1. Catarina Moreira, Quantum-Like Influence Diagrams: Incorporating Expected Utility in Quantum-Like Bayesian Networks, International Symposium Worlds of Entanglement, Belgium, 2017 (extended abstract) [135].

2. Catarina Moreira, Emmanuel Haven, Sandro Sozzo, Andreas Wichert, A Quantum-Like Analysis of a Real Financial Scenario: The Dutch's Bank Loan Application, In Proceedings of the 13th Econophysics Colloquium, Poland, 2017 (extended abstract) [146].

3. Catarina Moreira and Andreas Wichert, When to use Quantum Probabilities in Quantum Cognition? A Discussion, In Proceedings of the 12th Biennial International Quantum Structures Association Conference, United Kingdom, 2016 (extended abstract) [143].

4. Catarina Moreira and Andreas Wichert, Application of Quantum-Like Bayesian Networks in Social Sciences, 4th Champalimaud NeuroScience Symposium, Champalimaud Center of the Unknown, Portugal, 2015 (poster)

5. Catarina Moreira and Andreas Wichert, Quantum-Like Bayesian Networks using Feynman's Path Diagram Rules, In Proceedings of the 16th Växjö Conference on Quantum Theory: from Foundations to Technologies, Sweden, 2015 (extended abstract) [138].

6. Catarina Moreira and Andreas Wichert, The Relation Between Acausality and Interference in Quantum-Like Bayesian Networks, In Proceedings of the 9th International Conference on Quantum Interactions, Switzerland, 2015 [139].

1.9 Organisation

The present work is organised as follows:

• Chapter 1 presents an introduction and motivation for the current thesis by giving an overview of the scientific and historical aspects that contributed to the emergence of the field of Quantum Cognition. An example showing violations of the Sure Thing Principle is also presented as motivation for the topics and problems that will be addressed throughout this work.

• Chapter 2 presents the fundamental concepts of quantum probability theory and makes an introduction to the field of Quantum Cognition.

• Chapter 3 gives a general overview of the basic concepts related to Bayesian Networks, which are fundamental for the understanding of this work.

• Chapter 4 gives a brief overview of the most relevant paradoxes and fallacies that occur in decision-making scenarios and provides a brief literature overview of current approaches that attempt to address them.

• Chapter 5 presents the first contribution of this work. It provides an exhaustive overview and discussion of the most important state-of-the-art quantum cognitive models that are able to explain the paradoxical findings of experiments that violate the Sure Thing Principle. It also presents a deep comparison and discussion of several quantum models in terms of three elements: (1) incorporation of quantum interference effects, (2) how to find quantum parameters and (3) scalability of the model to more complex decision problems. This study has been published in Moreira & Wichert [142].

• Chapter 6 presents the second contribution of this work: a quantum-like Bayesian Network formalism, which consists in replacing classical probabilities with quantum probability amplitudes. The proposed model takes advantage of its modular network structure to scale to more complex decision scenarios and generates an exponential number of quantum interference effects. An initial study of the impact of these interference terms is performed. This study has been published in Moreira & Wichert [137].

• Chapter 7 presents the third contribution of this work. We complement the study of the quantum-like Bayesian network by proposing a vector similarity heuristic that is based on the probability distributions of data collected from many different experiments reported in the literature. This heuristic takes into account the vector similarity between random variables and, based on this information, attempts to derive a value for the quantum interference term. This heuristic makes the proposed model predictive, contrary to current state-of-the-art approaches, which have an explanatory nature. This study has been published in Moreira & Wichert [141].

• Chapter 8 presents the fourth contribution of this work. We complement the study of the quantum-like Bayesian network by proposing a heuristic to model the Categorisation / Decision experiment of Busemeyer et al. [41]. The model consists in representing objects (or events) in an arbitrary n-dimensional vector space, enabling their comparison through similarity functions. The computed similarity value is used to set the quantum parameters of the quantum-like Bayesian Network model. The difference between this approach and the vector similarity heuristic is that the former is based on the contents of the data objects used to perform inference, whereas the latter makes use of a prior probability distribution analysis of several experiments reported in the literature. This study has been published in Moreira & Wichert [145].

• Chapter 9 presents the fifth contribution of this work. A quantum-like Bayesian Network is analysed that combines cause/effect relationships with semantic similarities between events. These semantic similarities constitute acausal connections according to the Synchronicity principle and provide new relationships for quantum-like Bayesian networks. As a consequence, beliefs (or any other events) can be represented in vector spaces, in which quantum parameters are determined by the similarities that these vectors share. Events connected by a semantic meaning do not need to have an explanation in terms of cause and effect. This study has been published in Moreira & Wichert [140, 139].

• Chapter 10 presents the sixth contribution of this work. We analysed several order-effect situations (additive, subtractive, assimilation and contrast effects) using the Gallup reports collected in the work of Moore [134]. In the end, we show that order effects can be explained intuitively by both classical and quantum frameworks, since both models are similar and take advantage of the fact that matrix multiplication is non-commutative. Depending on how one sets the rotation operator, one can simulate any effect reported in Moore [134]. This study has been published in Moreira & Wichert [144].

• Chapter 11 presents the seventh contribution of this work. This chapter discusses how, although classical models with latent variables can explain the paradoxical findings in the Prisoner's Dilemma game, the same model cannot simulate the choice of the player when a piece of evidence is given, that is, when the action chosen by the first player is known. This leads to a dilemma: either one creates a classical model just to account for observed evidence, or one creates the model just to explain the paradoxical findings. This study is currently under review.

• Chapter 12 makes a general summary and some final discussions about this work by answering the research questions posed in Section 1.7.1.

• Chapter 13 makes a discussion of some promising directions for future work that we have already established in preliminary research [135, 146].

Chapter 2

Quantum Cognition Fundamentals

Since the paradoxical findings of Tversky and Kahneman [193, 195, 197, 90, 89], researchers have looked for alternative frameworks to accommodate decision scenarios that violate the laws of probability theory and logic. In the same way quantum physics was created to explain several paradoxical findings that could not be explained through classical physics, quantum cognition emerged as a research field that aims to explain paradoxical decision scenarios by building cognitive models with the underlying mathematical principles of quantum mechanics. In this sense, psychological (and cognitive) models benefit from the usage of quantum probability principles because they have many advantages over their classical counterparts [42]. In quantum theory, events are represented as multidimensional vectors in a Hilbert space. This vector representation comprises the occurrence of all events at the same time; in quantum mechanics, this property is called the superposition principle. From a psychological point of view, a quantum superposition can be related to the feeling of confusion, uncertainty or ambiguity [34]. This vector representation obeys neither the distributive axiom of Boolean logic nor the law of total probability. It also enables the construction of more general models that can mathematically explain cognitive phenomena such as violations of the Sure Thing Principle [110, 131], which is the focus of this thesis. Quantum probability principles have also been successfully applied in many different fields, namely in biology [20, 16], economics [102, 82], perception [54, 50], jury duty [187], game theory [148, 30], order effects [206] and opinion polls [111, 109].

This chapter presents the fundamental concepts of quantum cognition that are necessary for the understanding of this work. In Section 2.1, we introduce the main concepts of quantum probability theory by comparing it with the classical theory axiomatised by Kolmogorov [117], giving illustrative examples of how to apply this theory. The derivation of quantum interference terms from complex probability amplitudes is detailed in Section 2.2. A parallel analysis of how classical and quantum systems evolve through time is presented in Section 2.3. Section 2.4 compares the calculation of probabilities over path trajectories in classical and quantum settings. In Sections 2.5 and 2.6, we explain how Born's rule was derived and why complex amplitudes are important for quantum mechanics and quantum cognitive models. We also discuss these subjects, since they are still open research questions in the scientific community. Finally, Section 2.7 summarises all the important concepts addressed throughout this chapter.

2.1 Introduction to Quantum Probabilities

In this section, the main differences between classical and quantum probability theory are presented. The concepts will be introduced by an example concerning jury duty. Suppose you are a juror and you must decide whether a defendant is guilty or innocent. The following sections describe how to represent this problem according to classical probability theory and quantum probability theory. This comparison is based on the book of Busemeyer & Bruza [34].

2.1.1 Representation of Quantum States

The notation adopted in quantum theory is the Dirac notation [67], also known as the "bra-ket" notation. Instead of writing states as column vectors, in quantum theory column vectors are written in a linear and compact way. For instance, a k-dimensional column vector A:

$$A = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{k-1} \end{pmatrix},$$

can be written in ket notation, as $|A\rangle$, in the following way:

$$|A\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle + \dots + \alpha_{k-1}|k-1\rangle$$

where,

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \dots, \quad |k-1\rangle = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$

The bra notation of the same vector, $\langle A|$, corresponds to the conjugate transpose (represented by the symbol $\dagger$) of the ket representation $|A\rangle$, and vice versa:

$$\langle A|^{\dagger} = |A\rangle$$

This means that the inner product of vector A can be written in Dirac notation as:

$$\langle A|A\rangle = \begin{pmatrix} \alpha_0^{*} & \alpha_1^{*} & \dots & \alpha_{k-1}^{*} \end{pmatrix} \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{k-1} \end{pmatrix} = |\alpha_0|^2 + |\alpha_1|^2 + \dots + |\alpha_{k-1}|^2.$$

In the same way, the outer product of vector A, which is known as the quantum projection operator, can be written as:

$$|A\rangle\langle A| = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \vdots \\ \alpha_{k-1} \end{pmatrix} \begin{pmatrix} \alpha_0^{*} & \alpha_1^{*} & \dots & \alpha_{k-1}^{*} \end{pmatrix} = \begin{pmatrix} |\alpha_0|^2 & \alpha_0\alpha_1^{*} & \dots & \alpha_0\alpha_{k-1}^{*} \\ \alpha_1\alpha_0^{*} & |\alpha_1|^2 & \dots & \alpha_1\alpha_{k-1}^{*} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{k-1}\alpha_0^{*} & \alpha_{k-1}\alpha_1^{*} & \dots & |\alpha_{k-1}|^2 \end{pmatrix}$$

The main advantage of using Dirac notation is that it enables the explicit labelling of the basis vectors. In quantum theory, this is a great benefit, since it allows the representation of a quantum system by a vector while at the same time explicitly writing the physical quantity of interest (the position, spin, etc., of the electron). Writing vectors in Dirac notation also often saves space. In particular, when writing sparse vectors, Dirac notation enables a compact representation of a basis vector through a binary string of length k, instead of a column vector representation with $2^k$ components [2].
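The ket/bra machinery described above can be sketched numerically. Assuming NumPy as the implementation vehicle, and with arbitrary illustrative amplitudes, kets are column vectors, bras are their conjugate transposes, and the inner and outer products follow directly:

```python
import numpy as np

# Basis kets |0> and |1> as complex column vectors.
ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)

# |A> = a0|0> + a1|1> with illustrative complex amplitudes.
a0, a1 = (1 + 1j) / 2, (1 - 1j) / 2
ketA = a0 * ket0 + a1 * ket1

braA = ketA.conj().T          # <A| is the conjugate transpose of |A>
inner = (braA @ ketA).item()  # <A|A> = |a0|^2 + |a1|^2
outer = ketA @ braA           # |A><A|, the projection operator

print(inner.real)             # 1.0 -> this particular state is normalised
```

Because this $|A\rangle$ is normalised, the outer product $|A\rangle\langle A|$ is idempotent, as a projection operator should be.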

2.1.2 Space

In classical probability theory, events are contained in a Sample Space. A sample space Ω corresponds to the set of all possible outcomes of an experiment or random trial [64]. For example, when judging whether a defendant is guilty or innocent, the sample space is given by Ω = {Guilty, Innocent} (Figure 2.1). In quantum probability theory, events are contained in a Hilbert Space. A Hilbert space can be defined as a generalisation and extension of the Euclidean space to spaces with any finite or infinite number of dimensions. It is a vector space over the complex numbers and offers the structure of an inner product, enabling the measurement of angles and lengths [84]. The space is spanned by a set of orthonormal basis vectors. In the jury duty example, we can define the set of orthonormal basis vectors as $H = \{|Guilty\rangle, |Innocent\rangle\}$, where

$$|Guilty\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad |Innocent\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Figure 2.2 presents a diagram showing the Hilbert space of a defendant being guilty or innocent [34]. Inferences are then calculated through similarities between vectors, and events are defined in feature vectors just as in many cognitive systems [194, 119, 35]. Since a Hilbert space admits complex numbers, representing the events Guilty and Innocent would require two dimensions for each event (one for the real part and another for the imaginary part). In quantum decision theory, one usually ignores the imaginary component in order to be able to visualise all vectors geometrically in a 2-dimensional space.

Figure 2.1: Sample Space (classical probability theory)
Figure 2.2: Hilbert Space (quantum probability theory)

2.1.3 Events

In classical probability theory, events are defined as sets of outcomes to which a probability is assigned. They correspond to subsets of the sample space Ω in which they are contained. Events can be mutually exclusive and they obey set theory: operations such as the intersection or union of events are well defined, as is the distributive axiom between sets. In the jury duty example, since a person cannot be both guilty and innocent, these events are defined as two mutually exclusive sets, that is, Guilty ∩ Innocent = ∅. The union of the sets is also defined: Guilty ∪ Innocent = {Guilty, Innocent}. And we can also apply the distributive axiom. Let Z be some event (such as the person carrying the crime scene weapon); then the distributive axiom allows us to compute: Z ∩ (Guilty ∪ Innocent) = (Z ∩ Guilty) ∪ (Z ∩ Innocent). In quantum probability theory, events are defined geometrically and correspond to subspaces spanned by a subset of the basis vectors contained in the Hilbert space. This geometric representation enables the definition of events as a superposition state vector $|S\rangle$, which comprises the occurrence of all events. This representation of events is an important difference between classical and quantum probability and a consequence of the quantum vector space representation. While in classical theory we can only represent each event at each time frame as a set, in quantum probability we can represent all possible events at the same time through a vector. It follows that mutually exclusive events are represented by orthonormal vectors. Also, operations such as the intersection and union of events can be represented geometrically and are defined if and only if the events are contained in the same subspace [34]. If two events are represented by different basis vectors (that is, they are contained in different subspaces), then the intersection and union of those events are not defined.
It also follows that the distributive axiom does not hold [32]. This fact constitutes another major difference between classical and quantum probability theory.
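The failure of the distributive axiom is tied to the fact that projectors onto subspaces expressed in different (incompatible) bases need not commute, whereas set operations always do. A minimal sketch, assuming NumPy; the 45-degree rotation for the second "question" is an arbitrary illustrative choice:

```python
import numpy as np

# Set intersection is commutative: A ∩ B == B ∩ A, always.
A, B = {"guilty"}, {"guilty", "weapon"}
assert A & B == B & A

# Projectors onto subspaces of *different* bases need not commute.
# P_G projects onto |Guilty>; P_R projects onto a basis vector rotated
# by 45 degrees (an arbitrary incompatible "question").
P_G = np.array([[1.0, 0.0], [0.0, 0.0]])
c = np.cos(np.pi / 4)
r = np.array([[c], [c]])   # rotated, normalised basis vector
P_R = r @ r.T              # projector |r><r|

print(np.allclose(P_G @ P_R, P_R @ P_G))   # False: the order matters
```

This non-commutativity is exactly what the geometric treatment of events exploits, and what makes an analogue of the set-theoretic distributive law unavailable.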

Figure 2.3: Example of a representation of an event on a Hilbert Space

Concerning the jury duty example, the superposition state $|S\rangle$ can be defined as a superposition of the defendant being both Guilty and Innocent. Figure 2.3 presents a geometric visualisation of the event $|S\rangle$ in a Hilbert space.

$$|S\rangle = \frac{e^{i\theta_G}}{\sqrt{2}}|Guilty\rangle + \frac{e^{i\theta_I}}{\sqrt{2}}|Innocent\rangle \quad (2.1)$$

In Equation 2.1, the values $\frac{e^{i\theta}}{\sqrt{2}}$ are quantum probability amplitudes. They correspond to the amplitudes of a wave and are described by complex numbers. A complex number is a number that can be expressed in the form $z = a + ib$, where $a$ and $b$ are real numbers and $i$ corresponds to the imaginary unit, such that $i^2 = -1$. Alternatively, a complex number can be described in the polar form $z = |r|e^{i\theta}$, where $|r| = \sqrt{a^2 + b^2}$. The term $e^{i\theta}$ is defined as the phase of the amplitude, where $\theta$ corresponds to the angle between the line from the origin of the plane to the point $(a, b)$ and the real axis. These amplitudes are related to classical probabilities by taking their squared magnitude through Born's rule (more details will be given in Section 2.5). This is achieved by multiplying the amplitude by its complex conjugate.

$$Pr(Guilty) = \left| \frac{e^{i\theta_G}}{\sqrt{2}} \right|^2 = \frac{e^{i\theta_G}}{\sqrt{2}} \cdot \left( \frac{e^{i\theta_G}}{\sqrt{2}} \right)^{*} = \frac{e^{i\theta_G}}{\sqrt{2}} \cdot \frac{e^{-i\theta_G}}{\sqrt{2}} = e^{i(\theta_G - \theta_G)}\,\frac{1}{2} = 0.5 \quad (2.2)$$

In quantum theory, it is required that the sum of the squared magnitudes of all amplitudes equals 1. This axiom is called the normalisation axiom and corresponds to the classical constraint that the probabilities of all events in a sample space must sum to one.

$$\left| \frac{e^{i\theta_G}}{\sqrt{2}} \right|^2 + \left| \frac{e^{i\theta_I}}{\sqrt{2}} \right|^2 = 1 \quad (2.3)$$

If we consider the situation where the jury needs to reason about the guilt of two defendants accused of conducting a crime together, then one can write their combined state using an operation called the tensor product, represented by the symbol $\otimes$. The tensor product is a mathematical operation that enables the construction of a Hilbert space from the combination of individual spaces. If we have two defendants who can each be either Guilty or Innocent, then we can represent them as a complex linear combination of these states:

$$(\alpha_0|G\rangle + \beta_0|I\rangle) \otimes (\alpha_1|G\rangle + \beta_1|I\rangle) = \alpha_0\alpha_1|GG\rangle + \alpha_0\beta_1|GI\rangle + \beta_0\alpha_1|IG\rangle + \beta_0\beta_1|II\rangle$$

where $G$ and $I$ represent the basis states for Guilty and Innocent, respectively, and $\alpha_0, \beta_0$ and $\alpha_1, \beta_1$ represent the quantum complex amplitudes associated with the first and second defendant, respectively.
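The two-defendant construction above can be checked numerically. The sketch below assumes NumPy, whose `kron` function implements the tensor (Kronecker) product, and uses equal amplitudes for both defendants purely for illustration:

```python
import numpy as np

G = np.array([1, 0], dtype=complex)   # |Guilty>
I = np.array([0, 1], dtype=complex)   # |Innocent>

# Illustrative equal-superposition amplitudes for each defendant.
a0 = b0 = a1 = b1 = 1 / np.sqrt(2)
defendant1 = a0 * G + b0 * I
defendant2 = a1 * G + b1 * I

joint = np.kron(defendant1, defendant2)  # amplitudes over GG, GI, IG, II
probs = np.abs(joint) ** 2               # Born rule: squared magnitudes

print(probs.round(2))                    # each of the four outcomes: 0.25
print(round(probs.sum(), 6))             # normalisation axiom: sums to 1
```

Note that the four joint amplitudes are exactly the products $\alpha_0\alpha_1, \alpha_0\beta_1, \beta_0\alpha_1, \beta_0\beta_1$ from the expansion above.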

2.1.4 System State

A system state is a probability function P r which maps events into probability numbers, i.e., positive real numbers between 0 and 1.

In classical theory, the system state corresponds exactly to this definition: a function assigns a probability value to the outcome of an event. If the event corresponds to the entire sample space, then the system state assigns it a probability of 1. If the event is empty, then it assigns a probability of 0. In our example, if nothing else is told to the juror, then the probability of the defendant being guilty is Pr(Guilty) = 0.5.

In quantum theory, the probability of the defendant being $|Guilty\rangle$ is given by the squared magnitude of the projection of the superposition state $|S\rangle$ onto the subspace containing the observed event $|Guilty\rangle$. Figure 2.4 shows an example. If nothing is told to the juror about the guilt of the defendant, then, according to quantum theory, the decision-maker is unsure and is in a superposition state $|S\rangle$ between these events.

Figure 2.4: Example of a quantum system state.

When someone asks whether the defendant is guilty, the decision-maker needs to reason in order to answer the question. Under the geometric representation of quantum events, a decision-maker makes up his mind by projecting the superposition state $|S\rangle$ onto the subspace that is related to this answer. If the decision-maker thinks the defendant is guilty, then a projection operator $P_G$ is applied, which maps the superposition vector $|S\rangle$ onto the $|Guilty\rangle$ subspace. In the same way, if the decision-maker thinks the defendant is innocent, then another projection operator $P_I$ is applied, mapping the superposition vector $|S\rangle$ onto the $|Innocent\rangle$ subspace (Figure 2.4). This act of projecting the state into some subspace corresponds to the projection postulate of quantum mechanics.

|S⟩ = (e^{iθ_G}/√2) |Guilty⟩ + (e^{iθ_I}/√2) |Innocent⟩

The projection operators P_G and P_I correspond to the outer products of the basis states |G⟩ and |I⟩, respectively.

P_G = |Guilty⟩⟨Guilty| = [[1, 0], [0, 0]]        P_I = |Innocent⟩⟨Innocent| = [[0, 0], [0, 1]]

In order to compute the probability of a defendant being guilty or innocent under the geometric approach of quantum theory, we project the superposition vector |S⟩ onto the desired subspace and measure the length of the resulting vector. The squared magnitude of this projection is interpreted as the probability of the event. The interpretation of the squared magnitude of a wave function as the probability of finding a particle in a given region of space was proposed by Max Born in 1926 [185]. This relation is known as Born's rule. We will give more details about this rule in Section 2.5.

Pr(Guilty) = ‖P_G|S⟩‖² = (P_G|S⟩)†(P_G|S⟩) = ⟨S|P_G|S⟩ = | e^{iθ_G}/√2 |² = 0.5

Pr(Innocent) = ‖P_I|S⟩‖² = (P_I|S⟩)†(P_I|S⟩) = ⟨S|P_I|S⟩ = | e^{iθ_I}/√2 |² = 0.5

Note that, in quantum mechanics, the collapse of the wave function onto a certain subspace, with some probability, corresponds to the act of measurement. Although the evolution of the wave function is fully deterministic, when we make a measurement the outcome is random.
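As a numerical illustration of these projections (an illustrative sketch assuming NumPy is available; the phase values below are arbitrary choices, not from the text), one can verify that the phases drop out under Born's rule:

```python
import numpy as np

# The juror's superposition state in the {|Guilty>, |Innocent>} basis.
theta_G, theta_I = 0.3, 1.1   # arbitrary phases; any values give the same probabilities
S = np.array([np.exp(1j * theta_G) / np.sqrt(2),
              np.exp(1j * theta_I) / np.sqrt(2)])

P_G = np.array([[1, 0], [0, 0]])   # projector onto the |Guilty> subspace
P_I = np.array([[0, 0], [0, 1]])   # projector onto the |Innocent> subspace

# Born's rule: probability = squared length of the projected state.
pr_guilty = np.linalg.norm(P_G @ S) ** 2
pr_innocent = np.linalg.norm(P_I @ S) ** 2
print(pr_guilty, pr_innocent)      # both 0.5: the phases drop out
```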

2.1.5 State Revision

State revision corresponds to the situation where, after observing an event, we are interested in the probability of other events given that the previous one has occurred.

In classical theory, this is addressed through the conditional probability formula Pr(B|A) = Pr(A ∩ B)/Pr(A). So, returning to our example, suppose that some evidence has been given to the juror proving that the defendant is actually guilty. What is the probability of the defendant being innocent, given that it has been proved that he is guilty? This is computed in the following way.

Pr(Innocent|Guilty) = Pr(Innocent ∩ Guilty) / Pr(Guilty) = 0

Since the events Guilty and Innocent are mutually exclusive, their intersection is empty, leading to a zero probability value. In quantum probability theory, the state revision is given by first projecting the superposition state |S⟩ onto the subspace representing the observed event. Then, the projection is normalised such that the resulting vector has unit length. Again, if we want to determine the probability of a defendant being innocent, given he was found guilty, the calculations are performed as follows. We first start in the superposition state vector |S⟩.

|S⟩ = (1/√2) |Guilty⟩ + (1/√2) |Innocent⟩

Then, we observe that the defendant is guilty, so we project the state vector |S⟩ onto the Guilty subspace and normalise the resulting projection.

|S_g⟩ = P_G|S⟩ / ‖P_G|S⟩‖ = ( (1/√2) |Guilty⟩ ) / (1/√2) = 1·|Guilty⟩ + 0·|Innocent⟩

From the resulting state, we extract the probability of being innocent by simply squaring the respective probability amplitude, and we find that we achieve the same result as in classical theory.

P r(Innocent|Guilty) = 02 = 0
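These revision steps can be sketched numerically (an illustrative NumPy snippet of the project-then-normalise procedure, not code from the thesis):

```python
import numpy as np

# |S> in the {|Guilty>, |Innocent>} basis, as in the juror example.
S = np.array([1 / np.sqrt(2), 1 / np.sqrt(2)])
P_G = np.array([[1.0, 0.0], [0.0, 0.0]])   # projector onto |Guilty>

# Project onto the observed (Guilty) subspace, then renormalise.
S_g = P_G @ S
S_g = S_g / np.linalg.norm(S_g)            # revised state after observing Guilty

# Probability of Innocent = squared amplitude on |Innocent> in the revised state.
pr_innocent_given_guilty = abs(S_g[1]) ** 2
print(S_g, pr_innocent_given_guilty)       # [1. 0.] and 0.0, as in the classical case
```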

2.1.6 Compatibility and Incompatibility

So far, although classical theory and quantum probability theory are different, they achieved the same results under our juror/defendant example. This happened because the events were compatible.

Classical theory follows the principle of unicity [78], which states that, for any given experiment, a single sample space contains all events. This means that a single probability function is enough to assign probabilities to all events. As a consequence, operations such as intersection or union between events are always possible and well defined. Quantum theory only follows the unicity principle if and only if the events are spanned by the same basis vectors, that is, if the events are compatible. When events are spanned by a common basis, intersection and union operations are also possible and well defined, and a single probability function is enough to assign probability values to outcomes. This means that, when we are dealing with compatible events, quantum theory converges to classical probability theory [34].

The incompatibility phenomenon occurs only under quantum probability theory. If two events are spanned by different basis vectors, then operations such as intersection and union are not defined [34]. This is the major difference between the classical theory and the quantum theory. In order to compute probabilities of incompatible events, one should turn to Lüders' rule [153]. This rule states that, in order to compute the probability of two incompatible events A and B, one should compute the probability of the sequence of first observing event A and then observing event B. Following the steps of Nielsen & Chuang [152] and Busemeyer & Bruza [34], the probability of two incompatible events is given as follows. Let A be an event spanned by the basis vectors {|V_i⟩}, i = 1, ..., N, and let B be an event spanned by the basis vectors {|W_i⟩}, i = 1, ..., N. After observing event A, we obtain the revised state:

|S_A⟩ = P_A|S⟩ / ‖P_A|S⟩‖

The probability of event A is simply the squared magnitude of the projection of the state onto its subspace: Pr(A) = ‖P_A|S⟩‖². The probability of event B, given that we observed event A, is the squared magnitude of the projection of the revised state |S_A⟩ onto the subspace related to event B: Pr(B|A) = ‖P_B|S_A⟩‖². So, according to Lüders' rule, the probability of A followed by B is given by Pr(A)·Pr(B|A), which equals the following [34].

Pr(A)·Pr(B|A) = ‖P_A|S⟩‖² · ‖P_B|S_A⟩‖²

= ‖P_A|S⟩‖² · ‖ P_B ( P_A|S⟩ / ‖P_A|S⟩‖ ) ‖²

= ‖P_A|S⟩‖² · (1/‖P_A|S⟩‖²) · ‖P_B P_A|S⟩‖²

= ‖P_B P_A|S⟩‖²
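Lüders' rule can be checked numerically. Below is an illustrative sketch (assuming NumPy; the state and the rotated basis are arbitrary choices, not from the thesis) verifying that Pr(A)·Pr(B|A) equals ‖P_B P_A|S⟩‖² for two incompatible projectors:

```python
import numpy as np

S = np.array([0.6, 0.8])                   # arbitrary unit-length state

P_A = np.array([[1.0, 0.0], [0.0, 0.0]])   # projector for event A (standard basis)
v = np.array([1.0, 1.0]) / np.sqrt(2)      # basis vector of a rotated (incompatible) basis
P_B = np.outer(v, v)                       # projector for event B

pr_A = np.linalg.norm(P_A @ S) ** 2
S_A = (P_A @ S) / np.linalg.norm(P_A @ S)  # revised state after observing A
pr_B_given_A = np.linalg.norm(P_B @ S_A) ** 2

lhs = pr_A * pr_B_given_A                  # Pr(A) * Pr(B|A)
rhs = np.linalg.norm(P_B @ P_A @ S) ** 2   # || P_B P_A |S> ||^2
print(lhs, rhs)                            # identical values
```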

2.2 Interference Effects

Quantum interference is a challenging principle which essentially states that an individual elementary particle, such as a photon, can cross its own trajectory and interfere with the direction of its path, causing destructive or constructive effects. Mapping these effects to quantum cognition, the reasoning process of a decision-maker can be modelled by a set of waves moving across time over a state space until a final decision is made. Under this perspective, interference can be regarded as a chain of waves in a superposition state, coming from different directions. When these waves collide, one can experience a destructive effect (one wave destroys the other) or a constructive effect (one wave merges with another). In either case, the final probability of each wave is affected.

2.2.1 The Double Slit Experiment

The most famous example of quantum interference is the double slit experiment [56]. The experiment consists in firing electrons towards a barrier with two slits. A detector screen then records the pattern originated by the electrons. What makes the experiment interesting is that, when a detector was put near the slits to check through which slit each electron was crossing, the detector screen showed a pattern with the size and shape of the two slits. This means that the electrons behaved like particles. When no detector was put near the slits (that is, no measurement was being made), the screen revealed an interference pattern, characteristic of waves, which can collide with each other causing destructive interference (when they cancel each other) or constructive interference (when they merge and their amplitude increases). Mathematically, this can be represented as the application of a unitary matrix to a superposition state. A unitary matrix can be seen as a rotation of the vector space that preserves the lengths of vectors; its eigenvectors are orthonormal and its eigenvalues have unit magnitude. Let's consider a unitary matrix, U, that rotates the vector space 45° clockwise and simply represents the electron passing from slit s1 or s2 to the top or bottom of the detector, d1 and d2, respectively. Let's also define the initial

superposition vector of the electron, which can either go through slit s1 or s2, just like it is represented in Figure 2.5.

Figure 2.5: The double slit experiment. Electrons are fired and can pass through one of the slits (either s1 or s2) to reach a detector screen in points d1 or d2. If we measure through which slit the electron went, then the pattern in the detector will have the shape and size of the two slits, suggesting a particle behaviour of the electron. If we do not measure through which slit the electron is going, then the electron behaves as a wave and produces an interference pattern in the detector screen, with one point detecting constructive interference and another point detecting destructive interference.

Then, if we observe (or measure) that an electron passed through the first slit, s1, one can compute the resulting quantum state corresponding to the electron reaching the detector as:

 √ √     √  1/ 2 1/ 2 1 1/ 2 √ 2

U|Ss1i =  √ √   √  =  √  → P r(d1) = 1/ 2 = 0.5 = P r(d2), 1/ 2 −1/ 2 0/ 2 1/ 2 which gives a 50% chance of either reaching the top or bottom part of the detector and shows that the electron behaved like a particle. If, however, we do not observe from which slit the electron passed through, then

 √ √   √    1/ 2 1/ 2 1/ 2 1 U|Si =  √ √   √  =   → P r(d1) = 1 and P r(d2) = 0, 1/ 2 −1/ 2 1/ 2 0 which shows a destructive interference of waves for the bottom of the detector and a constructive inter- ference effect when reaching its top. This means that applying a unitary matrix that designates a random effect over the electron (we do not know, which slit that the electron went through) to a random initial superposition state of the electron, leads to a well deterministic outcome Aaronson [2]. This effect does not occur under a classical setting, because we cannot specify negative probabilities in classical proba- bility theory. But in quantum probability theory, we can specify positive numbers, negative numbers and complex numbers through quantum probability amplitudes. These negative amplitudes are the reason why waves cancel each other or interfere constructively and are the core of quantum mechanics.

2.2.2 Derivation of Interference Effects from Complex Numbers

Interference effects can be naturally derived from the rules of complex numbers. The relation between the classical probability of some event A, Pr(A), and its quantum probability amplitude, e^{iθ_A} ψ(A), is given by Born's rule in Equation 2.4. A quantum amplitude corresponds to the amplitude of a wave and is described by a complex number. The term e^{iθ_A} is the phase of the amplitude.

P r(A) = | eiθA ψ(A) |2 (2.4)

Suppose that events A1, A2, ..., AN form a set of mutually disjoint events whose union is the entire sample space Ω. Then, for any event B, the classical law of total probability can be formulated as in Equation 2.5.

Pr(B) = Σ_{i=1}^{N} Pr(A_i) Pr(B|A_i)   where:  Σ_{i=1}^{N} Pr(A_i) = 1    (2.5)

The quantum law of total probability can be derived from Equation 2.5 by applying Born's rule (Equation 2.4):

Pr(B) = | Σ_{x=1}^{N} e^{iθ_x} ψ(A_x) ψ(B|A_x) |²   where:  Σ_{x=1}^{N} | e^{iθ_x} ψ(A_x) |² = 1    (2.6)

For simplicity, we will expand Equation 2.6 for N = 3 and only later find the general formula for N events:

Pr(B) = | e^{iθ_1} ψ(A_1) ψ(B|A_1) + e^{iθ_2} ψ(A_2) ψ(B|A_2) + e^{iθ_3} ψ(A_3) ψ(B|A_3) |²    (2.7)

Next, we compute the squared magnitude in Equation 2.7 by multiplying it with its complex conjugate:

Pr(B) = ( e^{iθ_1} ψ(A_1) ψ(B|A_1) + e^{iθ_2} ψ(A_2) ψ(B|A_2) + e^{iθ_3} ψ(A_3) ψ(B|A_3) ) ·
        ( e^{−iθ_1} ψ(A_1) ψ(B|A_1) + e^{−iθ_2} ψ(A_2) ψ(B|A_2) + e^{−iθ_3} ψ(A_3) ψ(B|A_3) )    (2.8)

Simplifying Equation 2.8, we obtain:

Pr(B) = |ψ(A_1) ψ(B|A_1)|² + |ψ(A_2) ψ(B|A_2)|² + |ψ(A_3) ψ(B|A_3)|² + Interference    (2.9)

In Equation 2.9, one can see that the probability is composed of the classical law of total probability plus an interference term. This interference term corresponds to:

Interference = e^{i(θ_1−θ_2)} ψ(A_1) ψ(B|A_1) ψ(A_2) ψ(B|A_2) + e^{i(θ_2−θ_1)} ψ(A_2) ψ(B|A_2) ψ(A_1) ψ(B|A_1) +

+ e^{i(θ_1−θ_3)} ψ(A_1) ψ(B|A_1) ψ(A_3) ψ(B|A_3) + e^{i(θ_3−θ_1)} ψ(A_3) ψ(B|A_3) ψ(A_1) ψ(B|A_1) +

+ e^{i(θ_2−θ_3)} ψ(A_2) ψ(B|A_2) ψ(A_3) ψ(B|A_3) + e^{i(θ_3−θ_2)} ψ(A_3) ψ(B|A_3) ψ(A_2) ψ(B|A_2)    (2.10)

Knowing that

cos(θ_1 − θ_2) = ( e^{i(θ_1−θ_2)} + e^{i(θ_2−θ_1)} ) / 2

Then, Equation 2.9 becomes

Pr(B) = |ψ(A_1) ψ(B|A_1)|² + |ψ(A_2) ψ(B|A_2)|² + |ψ(A_3) ψ(B|A_3)|² +

+ 2 ψ(A_1) ψ(B|A_1) ψ(A_2) ψ(B|A_2) cos(θ_1 − θ_2) + 2 ψ(A_1) ψ(B|A_1) ψ(A_3) ψ(B|A_3) cos(θ_1 − θ_3) +

+ 2 ψ(A_2) ψ(B|A_2) ψ(A_3) ψ(B|A_3) cos(θ_2 − θ_3)    (2.11)

Generalising Equation 2.11 for N events, the final probabilistic interference formula, derived from the law of total probability, is given by:

Pr(B) = Σ_{i=1}^{N} |ψ(A_i) ψ(B|A_i)|² + 2 Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} |ψ(A_i) ψ(B|A_i)| · |ψ(A_j) ψ(B|A_j)| cos(θ_i − θ_j)    (2.12)

Following Equation 2.12, when cos(θ_i − θ_j) equals zero, quantum probability theory converges to its classical counterpart, because the interference term is also zero.

For non-zero values, Equation 2.12 will produce interference effects that can affect the classical probability destructively (when the interference term is smaller than zero) or constructively (when it is bigger than zero). Additionally, Equation 2.12 will lead to a large number of θ parameters as the number of events increases: for N binary random variables, we end up with 2^N parameters to tune.
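The equivalence between the amplitude form (Equation 2.6) and the cosine form (Equation 2.12) can be checked numerically for N = 2. This is an illustrative sketch with arbitrary amplitudes and phases, assuming NumPy:

```python
import numpy as np

psi_A = np.array([np.sqrt(0.5), np.sqrt(0.5)])     # amplitude magnitudes |psi(A_i)|
psi_BgA = np.array([np.sqrt(0.4), np.sqrt(0.7)])   # amplitude magnitudes |psi(B|A_i)|
theta = np.array([0.0, 2.0])                       # phases (arbitrary)

# Direct computation: squared magnitude of the summed amplitudes (Eq. 2.6).
direct = abs(np.sum(np.exp(1j * theta) * psi_A * psi_BgA)) ** 2

# Cosine form (Eq. 2.12): classical terms plus the interference term.
classical = np.sum((psi_A * psi_BgA) ** 2)
interference = (2 * psi_A[0] * psi_BgA[0] * psi_A[1] * psi_BgA[1]
                * np.cos(theta[0] - theta[1]))
print(direct, classical + interference)            # equal; cos < 0 here, so destructive
```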

2.3 Time Evolution

In classical theory, time evolution can be modelled by a dynamical Markov model, which uses a transition function T(t) to model the probability of transiting from one state to another in time t. The transition function T(t) is represented by a matrix containing positive real numbers, with the constraint that each row must sum to one (normalisation axiom). In other words, this matrix represents the new probability distribution over some time period t and is given by Equation 2.13 [164], where K corresponds to an intensity matrix. A solution to this equation is T(t) = e^{K·t}, which allows one to construct a transition matrix for any time point from the fixed intensity matrix. In other words, going back to the jury duty example, the intensity matrix performs a transformation on the probabilities of the current state to favour either the guiltiness or the innocence of the defendant. These intensities can be defined in terms of the evidence and payoffs for actions in the task (for instance, how strong the evidence of the defendant being guilty is).

d/dt T(t) = K · T(t)  ⇒  T(t) = e^{K·t}    (2.13)

In quantum theory, time evolution is modelled according to Schrödinger's equation and results in a unitary operator. The unitary matrix restricts the allowed evolution of quantum systems by ensuring that the sum of the probabilities of all possible outcomes of any event is always 1. This means that the matrix of the squared magnitudes of its entries is doubly stochastic (all rows and columns sum to 1). From the quantum cognition point of view, this matrix encodes all state transitions that a person can experience while reasoning about a decision. Schrödinger's equation is given by Equation 2.14 and makes use of a Hamiltonian matrix H, which has to be Hermitian (that is, it needs to be equal to its conjugate transpose) and, like the intensity matrix, represents the evidence and payoffs for actions in a task.

d/dt U(t) = −i · H · U(t)  ⇒  U(t) = e^{−i·H·t}    (2.14)
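Under the stated assumptions (an arbitrary, illustrative Hermitian H, not one taken from the thesis), the unitary U(t) = e^{−iHt} can be built from the eigendecomposition of H, and one can check that the squared magnitudes of its entries form a doubly stochastic matrix:

```python
import numpy as np

H = np.array([[1.0, 0.5],
              [0.5, -1.0]])   # illustrative Hermitian (real symmetric) Hamiltonian
t = 0.8

# U(t) = exp(-i H t) via the eigendecomposition of the Hermitian H.
evals, evecs = np.linalg.eigh(H)
U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T

# Transition probabilities: squared magnitudes of the unitary's entries.
T = np.abs(U) ** 2
print(T.sum(axis=0), T.sum(axis=1))   # every row and column sums to 1
```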

2.4 Path Diagrams

The superposition principle and interference effects lead to different ways of interpreting the quantum mechanics framework. Feynman et al. [72] proposed the Feynman path diagrams, which correspond to another way of interpreting quantum particle trajectories. In classical physics, the path that a particle takes between two positions corresponds to the one that requires spending the least energy (Figure 2.6 (left)). This was introduced by Pierre Maupertuis and is known as the Principle of Least Action.

Figure 2.6: Classical Principle of Least Action: the path that a particle chooses between a starting and an ending position is always the one that requires the least energy (left). Quantum version of the Principle of Least Action: a particle can be on different paths at the same time and use them to find the optimal path (the one that requires the least energy) between a starting and a final position (right).

When turning to a quantum perspective, Maupertuis’ Principle of Least Action poses several chal- lenges, since in quantum mechanics, a particle can take a superposition of paths to reach a final position (Figure 2.6 (right)). Since in quantum mechanics particles are represented by waves, then the major challenge is to find a theory that can take into account multiple paths and that can represent these paths in a probabilistic manner. A solution to this challenge corresponds to Feynman’s Path Integrals.

Figure 2.7: Single path trajectory (left). Multiple distinguishable paths (centre). Multiple indistinguishable paths (right).

Very generally, Feynman's path integrals can be used for computing quantum probabilities in graphical models in the following way.

2.4.1 Single Path Trajectory Principle

In classical probability, the computation of the probability of transiting from an initial state A to a final state C, through an intermediate state B, can be achieved with a classical Markov model. The probability is computed as the product of the individual probabilities of each transition from one state to the next, through the usage of conditional probabilities. According to the left graph of Figure 2.7, the probability of transiting from state A, followed by state B, and ending in state C, that is, Pr(A → B → C), is given by:

P r( A → B → C ) = P r( A ) · P r( B | A ) · P r( C | B ) (2.15)

In quantum probability, the calculation can be performed using Feynman’s first rule, which asserts that the probability of a single path trajectory consists in the product of the squared magnitudes of the amplitudes for each transition from one state to the next along the path. This means that the quantum probability value of a single path trajectory is the same as the classical Markov probability for the same path. In Equation 2.16, ψ corresponds to a quantum amplitude.

Pr(A → B → C) = |ψ(A)|² · |ψ(B|A)|² · |ψ(C|B)|² = Pr(A) · Pr(B|A) · Pr(C|B)    (2.16)

2.4.2 Multiple Indistinguishable Paths

In classical probability, multiple indistinguishable paths consist in moving from an initial state A to a final state D through multiple possible paths, without knowing for certain which path was taken to reach the goal state (Figure 2.7 (right)). In a classical probabilistic graphical model, if one does not observe which path was taken to reach the final state D, then one simply computes this probability by summing the individual probabilities of each path, in accordance with the law of total probability. In the right graph of Figure 2.7, the final probability is given by Equation 2.17.

P r( A → D ) = P r( A ) · P r( B | A ) · P r( D | B ) + P r( A ) · P r( C | A ) · P r( D | C ) (2.17)

In quantum probability, when a path is unobserved, then the goal state can be reached through a superposition of path trajectories. This is known as Feynman’s second rule, which states that the amplitude of transiting from an initial state A to a final state D, taking multiple indistinguishable paths, is given by the sum of all amplitudes for each path. This rule is in accordance with the law of total amplitude and the probability is computed by taking the squared magnitude of this sum. This rule will generate the

quantum interference term, which was already presented in Section 2.2.

Pr(A → D) = | ψ(A) · ψ(B|A) · ψ(D|B) + ψ(A) · ψ(C|A) · ψ(D|C) |² =

= | ψ(A) · ψ(B|A) · ψ(D|B) |² + | ψ(A) · ψ(C|A) · ψ(D|C) |² + Interference    (2.18)

Interference = 2 · | ψ(A) · ψ(B|A) · ψ(D|B) | · | ψ(A) · ψ(C|A) · ψ(D|C) | · cos(θ_1 − θ_2)
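The contrast between summed probabilities and summed amplitudes can be sketched numerically. The amplitudes and phases below are arbitrary illustrative values, not taken from the thesis:

```python
import numpy as np

# Path amplitudes for A -> B -> D and A -> C -> D (psi(A) = 1 for simplicity).
psi_BgA, psi_DgB = np.sqrt(0.5), np.sqrt(0.6)
psi_CgA, psi_DgC = np.sqrt(0.5), np.sqrt(0.3)
theta1, theta2 = 0.0, 2.5                       # phases attached to each path

amp1 = np.exp(1j * theta1) * psi_BgA * psi_DgB
amp2 = np.exp(1j * theta2) * psi_CgA * psi_DgC

classical = abs(amp1) ** 2 + abs(amp2) ** 2     # distinguishable paths: sum probabilities
quantum = abs(amp1 + amp2) ** 2                 # indistinguishable paths: sum amplitudes first
interference = 2 * abs(amp1) * abs(amp2) * np.cos(theta1 - theta2)
print(classical, quantum)                       # quantum = classical + interference
```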

2.4.3 Multiple Distinguishable Paths

In multiple distinguishable paths, that is, when the paths are observed, quantum probability theory collapses to the classical Markov model. This is known as Feynman's third rule, which states that, for observed multiple path trajectories, the probability of each individual path is the squared magnitude of its amplitude, and the final probability corresponds to the sum of the probabilities of the individual paths (Figure 2.7 (centre)).

Pr(A → D) = | ψ(A) · ψ(B|A) · ψ(D|B) |² + | ψ(A) · ψ(C|A) · ψ(D|C) |² =

= Pr(A) · Pr(B|A) · Pr(D|B) + Pr(A) · Pr(C|A) · Pr(D|C)    (2.19)

2.5 Born’s Rule

Born's rule states that the probability of finding a particle at position x in a generalised quantum state representing a wave function, ψ(x), is given by Equation 2.20, and corresponds to the squared length of the projection of the state |ψ⟩ onto the subspace of |x⟩ (in other words, a projection onto the eigenvector |x⟩).

P r(x) = |ψ(x)|2 (2.20)

Born's rule plays an important role in describing and interpreting the results of a quantum measurement. It was a rule introduced into quantum theory in order to give an interpretation to the theory (a postulate), rather than a rule derived from the theory itself. Although it has been validated by extensive experimental studies (Sinha et al. [178]), there is still no convincing analytical proof of why this relation is true, although there have been many attempts to explain Born's rule (Deutsch [65], Zurek [219]). To give a more precise explanation, consider Figure 2.8. If we represent a bit in a vector space by a superposition vector (called a qubit), then this vector, ψ, will represent a complex linear combination of a 0 state and a 1 state. But, when we measure the qubit, a projection of the superposition vector onto a subspace, S_i, is performed, and the system will randomly return either state 0 or state 1 with a probability Pr(S_i). The question is: onto which subspace will it be projected, the 0 subspace, S_0, or the 1 subspace, S_1? Since we do not know which one, we need to specify this uncertainty in terms of probabilities. And now we are left with the question of how to describe this probability: to which magnitude should the probability of the projection onto a selected subspace S_i be proportional? In other words, what value of n should

Figure 2.8: Representation of the projections, P_i, of a qubit ψ, onto either the |0⟩ state subspace S_0 or the |1⟩ state subspace S_1.

n be chosen such that Pr(S_i) = |⟨S_i|ψ⟩|ⁿ [66]? Born's rule is the answer to this question, with n = 2, and experimental findings indeed confirm this observation. Born's rule constitutes an important postulate in quantum mechanics and also means that L2-norms are used under this theory.

2.6 Why Complex Numbers?

From the previous section, experimental observations show that the type of norm applied in quantum mechanics has to be the L2-norm, also known as the Euclidean norm. In his book, Aaronson [2] raises an important question: if complex numbers were created by humans as a convenient tool to always give a solution to a quadratic equation, then why is quantum mechanics modelled by complex numbers? In other words, in L2-norms we can use real numbers, so why do we need complex amplitudes in quantum mechanics? This is a fundamental question, because it is what will allow us to differentiate a problem modelled in a vector space as being quantum or classical: if the solution can only be given using complex probability amplitudes, then it is quantum; otherwise, if real numbers can lead to a solution, then it is classical. Quantum physics is the only discipline that uses complex numbers to specify probabilities. The consequence is that the theory of quantum mechanics allows us to specify negative probability amplitudes, which lead to quantum interference effects, an effect that is exclusive to quantum mechanics. Suppose that we want to build an abstract probability model. We start by defining an event |S⟩ as a vector of N possible outcomes. Since probabilities cannot be negative, these values can only be positive real numbers:

|S⟩ = [ pr_1, ..., pr_N ]ᵀ

Since probabilities need to sum to unity, we can state that we need to apply an Lp-norm such that the probability vector has unit length. The two most used norms are the L1-norm, also known as the Manhattan distance (Figure 2.9), and the L2-norm, known as the Euclidean distance (Figure 2.10). This means that,

Figure 2.9: Example of a distance between two points in the L1-norm, also known as the Manhattan distance.
Figure 2.10: Example of a distance between two points in the L2-norm, also known as the Euclidean distance.

in the L1-norm, we compute the length of a vector by summing the absolute value of each entry, ‖|S⟩‖₁. This represents classical probability theory. In the L2-norm, we compute the square root of the sum of the squares of the entries, ‖|S⟩‖₂. This represents quantum probability theory.

‖|S⟩‖₁ = |pr_1| + ... + |pr_N|        ‖|S⟩‖₂ = √⟨S|S⟩ = √( pr_1² + ... + pr_N² )

If we consider a single bit, we can state that the bit has a probability p of being in state 0 and a probability 1 − p of being in state 1. When we move from the L1-norm to the L2-norm, we need the squares of the new entries to sum to unity. That is, √p² + √(1−p)² = 1. Setting α = √p and β = √(1−p), the condition α² + β² = 1 describes a circle comprising the set of all unit length vectors. One can already see the relation and understand why quantum events are represented on a circle of radius 1 in a Hilbert space. We used this representation for the jury duty example in Section 2.1.2. Following the works of Aaronson [1, 2], in order to be able to select the event that we want to measure from this vector representation of probabilities, we need to specify an operation that always maps a probability vector into another probability vector, preserving the vector's length. In the L1-norm, we want to find a matrix composed of non-negative real numbers where every column sums to unity. The class of matrices that satisfy these conditions are the stochastic matrices. Using the example given in the book of Aaronson [2], the operation of flipping a bit can be described by the following stochastic matrix:

[[0, 1], [1, 0]] · [pr, 1−pr]ᵀ = [1−pr, pr]ᵀ ,

and the probability of observing the bit in state 0 would be |1 − pr|. When we move to the L2-norm, the matrices that map one vector to another and preserve the lengths of the vectors are the unitary matrices, which we have already addressed in this chapter in two different situations: time evolution (Section 2.3) and the double slit experiment (Section 2.2). If the entries of this matrix are all real values, then the

matrix is called an orthogonal matrix, a matrix whose inverse equals its transpose. If the matrix contains complex numbers, then it is called a unitary matrix, a matrix whose conjugate transpose equals its inverse. Such a matrix corresponds to a rotation of the vector space.

   √   √  0 1 pr 1 − pr    √  =  √  , 1 0 1 − pr pr

The probability of observing the bit in state 0 is now equal to |√(1−pr)|² = 1 − pr. The question from the previous section arises again: why the exponent 2 and not some other value of p? And, most importantly, when using the L2-norm, one can still use real numbers. So, why use quantum complex probability amplitudes?
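The contrast between the two norms can be sketched as follows (an illustrative NumPy snippet, not taken from Aaronson's book): the bit-flip matrix, acting as a stochastic matrix, preserves the L1 norm of a probability vector; the same matrix, acting as an orthogonal operator on amplitudes, preserves the L2 norm.

```python
import numpy as np

pr = 0.3
flip = np.array([[0.0, 1.0], [1.0, 0.0]])       # bit flip; stochastic AND orthogonal

p = np.array([pr, 1 - pr])                      # L1-normalised probability vector
a = np.array([np.sqrt(pr), np.sqrt(1 - pr)])    # L2-normalised amplitude vector

p2, a2 = flip @ p, flip @ a
print(p2.sum())                                 # L1 norm still 1
print(np.linalg.norm(a2))                       # L2 norm still 1
print(abs(a2[0]) ** 2)                          # Pr(state 0) = 1 - pr = 0.7
```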

According to Aaronson [1], an argument for the usage of complex probability amplitudes is concerned with the number of parameters required to define a mixed state. Suppose we have a mixed state composed of two quantum states, AB. Then, the number of parameters needed to describe it, d_AB, should equal the product of the number of parameters needed to describe state A, d_A, and the number of parameters needed to describe state B, d_B:

d_AB = d_A · d_B

Letting N_A and N_B be the numbers of dimensions of A and B respectively, if amplitudes were real, the number of independent real parameters in an N × N symmetric (real Hermitian) matrix would be N(N + 1)/2, so we would require:

N(N + 1)/2 = [ N_A(N_A + 1)/2 ] · [ N_B(N_B + 1)/2 ]

This equality does not hold for N = N_A·N_B, because the left-hand side is a bigger number:

N_A N_B (N_A N_B + 1)/2 > [ N_A(N_A + 1)/2 ] · [ N_B(N_B + 1)/2 ]

For example, for N_A = N_B = 2, the left-hand side gives 10 parameters while the right-hand side gives 9. The same reasoning applies to quaternions, but in the opposite direction: with quaternionic amplitudes, the left-hand side of the equation would be smaller than expected.

If, however, the numbers are complex probability amplitudes, then the number of independent parameters of an N × N Hermitian matrix consists of N real numbers on the diagonal plus N(N − 1) real parameters for the complex entries below the diagonal (the entries above the diagonal are determined, since the matrix must equal its conjugate transpose). In other words, we need N + N(N − 1) = N² parameters. Again, letting N_A and N_B be the numbers of dimensions of A and B respectively, and N = N_A·N_B, we get the exact relation, which means that complex numbers are indeed the type of numbers that define what makes a model quantum / quantum-like:

d_AB = (N_A N_B)² = N_A² · N_B² = d_A · d_B

Any system or model which does not use complex probability amplitudes cannot be considered quantum. This statement will play a major role in later chapters, where we will compare classical models with quantum-like models that have been proposed throughout the literature, especially to deal with order effects problems.
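The parameter-counting argument can be verified with a short sketch (the helper functions d_real and d_complex are our own illustrative names, using the standard counts for real symmetric and complex Hermitian matrices):

```python
def d_real(n):
    """Free real parameters of an n x n real symmetric matrix."""
    return n * (n + 1) // 2

def d_complex(n):
    """Free real parameters of an n x n complex Hermitian matrix."""
    return n * n

na, nb = 2, 3
n = na * nb

print(d_real(n), d_real(na) * d_real(nb))          # 21 vs 18: reals fail d_AB = d_A * d_B
print(d_complex(n), d_complex(na) * d_complex(nb)) # 36 vs 36: complex amplitudes satisfy it
```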

2.7 Summary and Final Discussion

Quantum probability theory provides another method of computing probabilities without falling into the restrictions that classical probability has in modelling cognitive systems of decision-making. Quantum probability theory can also be seen as a generalisation of classical probability theory, since it includes the classical probabilities as a special case (when the interference term is zero).

From an abstract mathematical point of view, the key difference between classical and quantum probability theory is that the latter makes use of complex quantum amplitudes to specify probabilities. These amplitudes can be negative and produce quantum interference effects, which will either cancel outcomes of events in a destructive fashion or increase their probability amplitudes constructively.

In the realm of quantum cognition, the main difference between classical and quantum probability is the way information is processed. According to classical decision-making, a person changes beliefs at each moment in time, but can only be in one precise state with respect to some judgement. So, at each moment, a person is favouring a specific belief. The process of human inference deterministically either jumps between definite states or stays in a single definite state across time [34]. In quantum information processing, on the other hand, information is modelled via wave functions and therefore cannot be in definite states. Instead, it is in an indefinite quantum state called the superposition state. That is, all beliefs are occurring in the human mind at the same time. According to cognitive scientists, this effect is responsible for making people experience uncertainties, ambiguities or even confusion before making a decision. At each moment, one belief can be more favoured than another, but all beliefs are available at the same time.
In this sense, quantum theory enables the modelling of the cognitive system as if it were a wave moving across time over a state space until a final decision is made. From this superposed state, uncertainty can produce different waves coming from opposite directions that can crash into each other, causing an interference distribution. This phenomenon can never be obtained in a classical setting. When the final decision is made, there is no more uncertainty: the wave collapses into a definite state. Thus, quantum information processing deals with both definite and indefinite states [34].

The non-commutative nature of quantum probability theory enables the exploration of methods capable of explaining violations caused by order effects. An order effect occurs when querying a person with questions in one order, and then posing the same questions in the reverse order, leads to different answers. Through classical probability theory, it would be expected that a person would give the same answers independently of the order of the questions. However, empirical findings show that this is not the case and that people are influenced by the context of the previous questions. Moreover, the existence of quantum interference effects also enables the exploration of models that accommodate other types of paradoxical findings. These topics

33 will be addressed with more detail in Chapters4 and5 respectively. In summary, quantum probability theory is a general framework, which can naturally explain various decision-making paradoxes without falling into the restrictions of classical probability theory.
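The relation between the two frameworks can be illustrated numerically. The following is a minimal sketch (the 0.5/0.7/0.2 values are hypothetical, chosen only for illustration): the quantum-like total probability adds an interference term to the classical law of total probability, and recovers the classical value when that term is zero.

```python
from math import sqrt, cos, pi

def total_probability(p1, a1, p2, a2, theta):
    """Probability of a final event reachable through two mutually exclusive paths.

    Classically, Pr = p1*a1 + p2*a2 (law of total probability). In the
    quantum-like case, amplitudes are summed before squaring, which adds the
    interference term 2*sqrt(p1*a1*p2*a2)*cos(theta).
    """
    classical = p1 * a1 + p2 * a2
    quantum = classical + 2 * sqrt(p1 * a1 * p2 * a2) * cos(theta)
    return classical, quantum

# Two beliefs held in superposition with weights 0.5/0.5, each leading to the
# same final decision with probability 0.7 and 0.2, respectively.
c, q = total_probability(0.5, 0.7, 0.5, 0.2, theta=pi / 2)
print(c, q)  # with theta = pi/2 the interference term vanishes: q equals c
```

For θ ≠ π/2, the quantum-like value deviates from the classical one, constructively (cos θ > 0) or destructively (cos θ < 0).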

Chapter 3

Fundamentals of Bayesian Networks

The explicit representation of joint probability distributions is very demanding. Even in the simplest case, when variables take binary values, a joint distribution requires 2^N entries, where N is the number of random variables. So, from a computational point of view, the manipulation of joint probability distributions is very expensive and the joint probability table is too large to store in memory. Moreover, it is humanly impossible to assign probability values manually to all events of the joint probability table. These difficulties were the main motivation for researchers to develop new methodologies which explore independence properties in distributions and alternative parameterisations that enable the exploitation of these independences [116]. An example of such a methodology is the Bayesian Network.

In this chapter, we present the fundamental concepts related to Bayesian Networks that are necessary for the understanding of this work. We start the chapter by introducing the naïve Bayes model, which represents the underlying model behind Bayesian Networks (Section 3.1). Next, we present the definition of Bayesian Networks (Section 3.2). In Section 3.3, we analyse the three types of patterns that can arise when performing inferences on Bayesian Networks. Section 3.4 analyses how the observation of one random variable changes our beliefs regarding other variables. Finally, in Section 3.5, we summarise the important concepts addressed throughout this chapter.

3.1 The Naïve Bayes Model

The aim of the Naïve Bayes model is to combine conditional parameterisations with conditional independence assumptions in order to obtain a more compact representation of the high-dimensional joint probability distribution table. In this model, all random variables Xi are instances that fall into one of a number of mutually exclusive and exhaustive classes C. The naïve assumption is that the random variables (also called features in this model) are conditionally independent given the instance's class. Figure 3.1 shows a Naïve Bayes model represented as a Bayesian Network.

In the Naïve Bayes model, the probability of a class C given a set of random variables {X1, X2, ..., Xn} is given by the conditional probabilistic formula:

Pr(C | X1, X2, ..., Xn)

Figure 3.1: Naïve Bayes Model, where node C represents the class variable and the set of random variables {X1, X2, ..., Xn} represents the features.

Computing the above formula implies computing the full joint distribution table. When the number of random variables is large, the manipulation of the joint probability distribution is, from a computational point of view, very expensive and the table is too large to store in memory. So the above formula needs to be reformulated, using Bayes' rule, in order to become computationally tractable.

Pr(C | X1, X2, ..., Xn) = Pr(C) · Pr(X1, X2, ..., Xn | C) / Pr(X1, X2, ..., Xn)

Since the denominator does not depend on the class variable C, one can treat it as a constant and consequently discard it. The remaining expression is identical to a joint distribution.

Pr(C | X1, X2, ..., Xn) ∝ Pr(C) · Pr(X1, X2, ..., Xn | C) = Pr(C, X1, X2, ..., Xn)

Given that the joint probability distribution can be fully described by the chain rule, it can be computed by:

Pr(C, X1, X2, ..., Xn) = Pr(C) · Pr(X1 | C) · Pr(X2 | C, X1) · Pr(X3 | C, X1, X2) · ... · Pr(Xn | C, X1, X2, ..., Xn−1)

The naïve assumption states that each feature Xi is conditionally independent of every other feature Xj, for j ≠ i, given the class variable C. This means that:

Pr(Xi | C, Xj) = Pr(Xi | C), for i ≠ j

With the naïve assumption, the joint probability distribution can be denoted by:

Pr(C | X1, ..., Xn) ∝ Pr(C) · Pr(X1 | C) · Pr(X2 | C) · Pr(X3 | C) · ... · Pr(Xn | C)

Pr(C | X1, ..., Xn) ∝ Pr(C) ∏_{i=1}^{n} Pr(Xi | C)

In order to obtain a probability value, the previous formula needs to be normalised. Using the naïve independence assumption, the conditional distribution over the class variable C is given by Equation 3.1, where Z is a normalisation factor.

Pr(C | X1, ..., Xn) = (1/Z) · Pr(C) ∏_{i=1}^{n} Pr(Xi | C)    (3.1)

The model represented by Equation 3.1 is more adequate for computation than the full joint distribution. To compute the full joint probability, if all variables are binary, one would need to store 2^n entries. With the naïve assumption, this cost is reduced to 2n + 1 entries, where again n is the number of binary random variables.
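Equation 3.1 can be turned into a short computation. The sketch below is illustrative only; the class names, feature values and probabilities are hypothetical:

```python
def naive_bayes_posterior(prior, likelihoods, observation):
    """Pr(C | X1..Xn) = (1/Z) * Pr(C) * prod_i Pr(Xi | C)   (Equation 3.1).

    prior:       dict class -> Pr(C = class)
    likelihoods: dict class -> list of dicts; likelihoods[c][i][x] = Pr(Xi = x | C = c)
    observation: list with the observed feature values x1..xn
    """
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for i, x in enumerate(observation):
            score *= likelihoods[c][i][x]
        scores[c] = score
    z = sum(scores.values())  # normalisation factor Z
    return {c: s / z for c, s in scores.items()}

# Hypothetical binary example: two classes and two binary features,
# i.e. 2n + 1 = 5 entries instead of the 2^n of the full joint table.
prior = {"c1": 0.3, "c2": 0.7}
likelihoods = {
    "c1": [{0: 0.2, 1: 0.8}, {0: 0.6, 1: 0.4}],
    "c2": [{0: 0.9, 1: 0.1}, {0: 0.5, 1: 0.5}],
}
posterior = naive_bayes_posterior(prior, likelihoods, [1, 0])
print(posterior)  # a normalised distribution over {c1, c2}
```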

3.2 Bayesian Networks

A Bayesian Network is a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node. The graph represents independence relationships between variables, and each node is associated with a conditional probability table which specifies a distribution over the values of the node given each possible joint assignment of values of its parents. This means that each node is a stochastic function of its parents. This idea of a node depending directly on its parent nodes is the core of Bayesian Networks: once the values of the parents are known, no information relating directly or indirectly to its parents or other ancestors can influence the beliefs about it. However, information about its descendants can still change the beliefs about it [116]. This leads to conditional independence and, consequently, to the Markov assumption.

The Markov assumption can be defined as follows. Let X = {X1, X2, ..., XN} be the set of N random variables of a Bayesian Network graph structure. Let Parents(Xi) denote the parents of the random variable Xi and NonDescendants(Xi) denote the variables in the graph that are not descendants of Xi. Then, the Markov assumption states that each variable Xi is independent of its non-descendants given its parent nodes [116].

Xi ⊥ NonDescendants(Xi) | Parents(Xi)

Each node Xi is conditionally independent of its non-descendants given its parents. Since we are dealing with conditional independences, the Naïve Bayes formula can be used to reduce the computational cost of calculating all the entries of the full joint distribution table. Therefore, combining the Naïve Bayes formula from Equation 3.1 with the definition of local independences, one can factorise a Bayesian Network using Equation 3.2. The parameter α corresponds to the normalisation factor.

Pr(X1, ..., Xn) = α ∏_{i=1}^{n} Pr(Xi | Parents(Xi))    (3.2)

The formula for computing classical exact inferences on Bayesian Networks is based on the full joint distribution (Equation 3.2). Let e be the list of observed variables and let Y be the remaining unobserved variables in the network. For some query X, the inference is given by:

Pr_c(X | e) = α · Pr_c(X, e) = α · Σ_{y ∈ Y} Pr_c(X, e, y)    (3.3)

where α = 1 / Σ_{x ∈ X} Pr_c(X = x, e). The summation is over all possible y, i.e., over all possible combinations of values of the unobserved variables Y. The parameter α corresponds to the normalisation factor for the distribution Pr(X | e) [168].

Bayesian Networks have the advantage of representing any full joint distribution in a factorised and compact way: instead of storing the full joint distribution, using Equation 3.2 one can compute it from the probability distributions of each node given its parent nodes. They do not require exponential storage to hold all the probabilities of a distribution, and the probability of an atomic event can be computed in time linear in the number of nodes in the network. The computation of conditional probabilities is more costly, because it requires enumerating all the relevant joint probabilities. This takes time exponential in the number of variables, making inference problems NP-hard in Bayesian Networks. This means that exact inference becomes intractable as the size of the decision scenario grows and will not produce an outcome in a reasonable time.

3.2.1 Example of Inferences in Bayesian Networks

Figure 3.2 illustrates an example of a Bayesian Network, originally proposed in [168].

Figure 3.2: The Burglar Bayesian Network proposed in the book of [168]

In Figure 3.2, we have five random variables: Burglar, Earthquake, Alarm, JohnCalls and MaryCalls.

This network describes that an alarm can be activated either by a burglar or by an earthquake. If John thinks that he heard the alarm, then he might call the police. The same happens with Mary. If we want to compute the probability of John calling the police, given that he heard the alarm, there was no earthquake, there was no burglar and Mary did not call the police, then, using Equation 3.2, we obtain the following without computing the full joint probability distribution table.

Pr(B, E, A, J, M) = Pr(B) · Pr(E) · Pr(A | B, E) · Pr(J | A) · Pr(M | A)

Pr(B = f, E = f, A = t, J = t, M = f) =

= Pr(B = f) · Pr(E = f) · Pr(A = t | B = f, E = f) · Pr(J = t | A = t) · Pr(M = f | A = t)

= 0.99 × 0.98 × 0.001 × 0.05 × 0.3 ≈ 0.0015%

This value means that, under the above circumstances, there is a very low probability of John calling the police: given that there was neither a burglary nor an earthquake, the probability of the alarm ringing, and consequently of John calling the police, is very low.
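The worked example above can be reproduced with a few lines of code. The five factors below are the CPT entries quoted in the text (Pr(B = f), Pr(E = f), Pr(A = t | B = f, E = f), Pr(J = t | A = t) and Pr(M = f | A = t)):

```python
from math import prod

def atomic_event_probability(factors):
    """Probability of a full assignment to every variable: the product of one
    CPT entry Pr(Xi | Parents(Xi)) per variable (Equation 3.2; no
    normalisation is needed for a complete assignment)."""
    return prod(factors)

# Pr(B=f), Pr(E=f), Pr(A=t | B=f, E=f), Pr(J=t | A=t), Pr(M=f | A=t)
p = atomic_event_probability([0.99, 0.98, 0.001, 0.05, 0.3])
print(f"{p:.4e} ({p:.4%})")  # about 1.4553e-05, i.e. the 0.0015% from the text
```

The computation is linear in the number of nodes, as the discussion of Equation 3.2 states.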

3.3 Reasoning Factors

When performing reasoning in Bayesian Networks, three types of patterns may arise: a reasoning process can be causal, evidential or intercausal. This section describes how each pattern performs during reasoning in a Bayesian Network. All the patterns in this section will be described by example, based on the Bayesian Network illustrated in Figure 3.2.

3.3.1 Causal Reasoning

This pattern is concerned with the presence of causal relationships among events. Causal reasoning goes in the causal direction, from top to bottom, and rests on the idea that any cause leads to a certain effect. It may be possible to alter the environment by preventing the occurrence of certain kinds of events. Two events do not become relevant to each other merely by virtue of predicting a common consequence, but they do become relevant when the consequence is actually observed. Example: Using the Bayesian Network illustrated in Figure 3.2, suppose that Sam is interested in knowing whether John called the police (based on the possibility of a burglary or an earthquake occurring). In causal reasoning, we observe how the probabilities change as evidence is obtained.

• How likely is John to call the police, knowing nothing else?

Pr(J = t) = 0.0637

• But there was no earthquake:

Pr(J = t | E = f) = 0.0588

• Next, we find out that there was no burglar in the house:

Pr(J = t | E = f, B = f) = 0.0508

One can notice that the probabilities for John calling the police keep changing in the presence of new evidence.

3.3.2 Evidential Reasoning

In evidential reasoning, the system is modelled from effects to causes. The task goes in the direction from bottom to top. Some parts of the system are observed and one wants to perform an inference about the other, hidden parts; therefore, given an observation of effects, one wants to determine the causes. Figure 3.3 shows the differences between the causal reasoning and evidential reasoning strategies.

Figure 3.3: Difference between causal reasoning and evidential reasoning

Example: Using the Bayesian Network illustrated in Figure 3.2, suppose that a policeman is investigating the crime rate in a street.

• A priori, the policeman knows that the probability of a burglary occurring is:

Pr(B = t) = 0.0100

• Then, the police finds out that, recently, an alarm fired:

Pr(B = t | A = t) = 0.5835

Notice that, with the Bayesian Network in Figure 3.2, if the police had information about the alarms which were triggered and about which neighbours called the police (either John or Mary), one would notice that the probabilities are the same:

Pr(B = t | A = t) = Pr(B = t | A = t, J = t) =

= Pr(B = t | A = t, M = t) = Pr(B = t | A = t, J = t, M = t) = 0.5835

This phenomenon occurs because of the independence structures present in the Bayesian Network. Section 3.4 addresses this phenomenon more carefully.

3.3.3 Intercausal Reasoning

Intercausal reasoning is an inference pattern that involves probabilistic dependence between the causes of an observed common effect. Explaining away [83, 161] is probably the most common form of intercausal reasoning. Explaining away can be illustrated in the following way. Given an observed effect, an increase in the probability of one cause decreases the probabilities of all the other causes of that effect, making them less likely to occur [69]. In other words, when there are competing possible causes for some event, and the chance of one of those causes increases, then the chances of the other causes must decline, since they are being "explained away" by the first explanation. Two random variables A and B are in an intercausal relationship given a third random variable C if they are conditionally independent given some set of variables that does not contain C, but conditionally dependent given every set that contains C.

3.4 Flow of Probabilistic Inference

This section analyses the situations in which a random variable can influence another in a Bayesian Network. By influence, we mean that an observation regarding a random variable X can possibly change our beliefs about another random variable Y. This analysis will enable the understanding of when an independence holds in a distribution associated with a Bayesian Network. There are two possible cases in which X can influence Y: they can influence each other either directly or indirectly.

Direct Connection This is the simplest and most intuitive case of influence. In this situation, a random variable X is directly connected to a random variable Y through an edge, X → Y. In direct connections, one can always build a distribution where X and Y are correlated, regardless of the observation of any other random variable in the network.

Indirect Connection In this case, the variables X and Y are not directly connected, but there is a trail between them that may enable influence to flow between the two variables. The simplest case occurs when X and Y are not directly connected but share a trail via a third random variable Z (e.g. X → Z → Y). Indirect connections are the most common in Bayesian Network structures and are the ones that enable the exploitation of independence properties to reduce computation costs during the inference process.

There are four situations in a Bayesian Network where two random variables are indirectly connected. Two of them correspond to causal chains, another corresponds to common causes and the last one corresponds to common effects. They are described as follows.

Indirect Causal Effect Considering our burglar Bayesian Network example in Figure 3.2, one can see that there are two causal trails: B → A → J and E → A → M. Considering that the variable alarm is not observed, if a burglar is observed, then there is a higher probability of the alarm being triggered and, consequently, a higher chance of John or Mary calling the police. Transposing this situation to the general case, one can see that a random variable X can influence a random variable Y via Z as long as Z remains unobserved, that is, X → Z → Y. Figure 3.4 illustrates an example.

Indirect Evidential Effect Given that the notions of dependence and independence share the symmetry property, this situation is similar to the indirect causal effect previously explained. Returning to the burglar example, if John called the police, then he probably thinks he heard the alarm and, consequently, there is a higher probability of a burglary having occurred. Once the alarm variable is observed, finding out whether John called the police or not does not change our beliefs about whether a burglary occurred. So again, Y can influence X via a third random variable Z, as long as Z remains unobserved, that is, Y → Z → X. Figure 3.5 illustrates an example of an indirect evidential effect.

Common Cause This situation is similar to the Naïve Bayes model already presented in Figure 3.1. In the burglar Bayesian Network, the variable alarm is a parent of both JohnCalls and MaryCalls. Mary and John are highly correlated: if we observe that Mary called the police, then there is a higher probability that John also called the police and, consequently, a higher probability of the alarm having been triggered. If the alarm fired, then the probabilities of a burglary or an earthquake having occurred also increase. However, when the variable alarm is observed, these variables become uncorrelated, and knowing whether Mary called the police does not bring any additional information to the model. So, in the general case, X can influence Y via Z as long as Z remains unobserved, that is, X ← Z → Y. Figure 3.6 illustrates an example of a common cause effect.

Common Effect (v-structure) So far, the previous examples demonstrated that X influences Y via Z as long as the random variable Z is not observed. Although one might think that this pattern is universal, it does not hold in common effect structures, also known as v-structures. Again considering the burglar Bayesian Network, the variables burglar and earthquake are parents of the alarm variable. When the variable alarm is not observed, burglar is independent of earthquake. In this case, there is no information flow along the trail X → Z ← Y if the intermediate random variable Z is not observed. However, when we observe the intermediate variable, the alarm, the variables burglar and earthquake become correlated. If the alarm was triggered, then the probability of a burglary or an earthquake having occurred increases. In the same way, if we do not observe the alarm but we know that Mary called the police, then there is a high probability that Mary heard an alarm. If there is a high probability of the alarm having been triggered, then there is also a high probability of either a burglary or an earthquake having occurred. So, in the general case, X can influence Y via Z if Z is observed or a descendant of Z is observed. Figure 3.7 illustrates an example of a v-structure.

Figure 3.4: Indirect Causal Effect. Figure 3.5: Indirect Evidential Effect. Figure 3.6: Common Cause Effect. Figure 3.7: V-Structure.
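The contrast between the chain/common-cause patterns and the v-structure can be checked by enumeration. In the sketch below, the priors Pr(B = t) = 0.01 and Pr(E = t) = 0.02 follow the evidential-reasoning example above, while the alarm CPT rows are an assumption borrowed from the textbook network of [168], since the full table is not reproduced in this chapter; with these values the code also recovers Pr(B = t | A = t) ≈ 0.5835:

```python
from itertools import product

# Priors from Section 3.3.2; the alarm CPT rows below are assumed from the
# textbook network of [168] (not reproduced in the text).
pB = {True: 0.01, False: 0.99}
pE = {True: 0.02, False: 0.98}
pA_given_BE = {(True, True): 0.95, (True, False): 0.94,
               (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """Pr(B, E, A) = Pr(B) * Pr(E) * Pr(A | B, E)  (Equation 3.2)."""
    pa = pA_given_BE[(b, e)]
    return pB[b] * pE[e] * (pa if a else 1.0 - pa)

def prob(query, evidence):
    """Pr(query | evidence) by enumerating all worlds (Equation 3.3)."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        world = {"B": b, "E": e, "A": a}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if all(world[k] == v for k, v in query.items()):
            num += p
    return num / den

# B and E are marginally independent...
print(prob({"B": True}, {}), prob({"B": True}, {"E": True}))
# ...but become dependent once the common effect A is observed (explaining away):
print(prob({"B": True}, {"A": True}), prob({"B": True}, {"A": True, "E": True}))
```

Observing that an earthquake occurred "explains away" the alarm, so the probability of a burglary drops well below Pr(B = t | A = t).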

An active trail can be defined as follows. Let X1 ⋯ Xn be a trail in a Bayesian Network structure and let Z be a subset of observed variables. The trail X1 ⋯ Xn is active given Z if [116]:

• Whenever there is a v-structure Xi−1 → Xi ← Xi+1 along the trail, Xi or one of its descendants is in Z;

• No other node along the trail is in Z.

Table 3.1 summarises all dependence properties presented in this section together with active trails.

Trail            | Notation   | Activation
Causal trail     | X → Z → Y  | active if and only if Z is not observed
Evidential trail | X ← Z ← Y  | active if and only if Z is not observed
Common cause     | X ← Z → Y  | active if and only if Z is not observed
Common effect    | X → Z ← Y  | active if and only if either Z or one of Z's descendants is observed

Table 3.1: Summary of all possible active trails in a Bayesian Network

3.5 Summary and Final Discussion

Probabilistic systems are among the most powerful frameworks for extracting knowledge from data in order to perform predictions. Probability theory is a formal framework that is capable of representing multiple outcomes and their likelihoods under uncertainty. Uncertainty is a consequence of various factors: limitations in our ability to observe the world, limitations in our ability to model it and possibly even innate nondeterminism [116].

If we take into account the basic structures behind probabilistic systems, we find that at one extreme there is the full joint probability distribution and at the other extreme there is the Naïve Bayes model. Bayesian Networks can be seen as a framework that falls between the full joint probability distribution and the Naïve Bayes model. A Bayesian Network is a compact representation of high-dimensional probability distributions, obtained by combining conditional parameterisations with conditional independences in graph structures [116]. It is a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node.

When performing reasoning in Bayesian Networks, three types of patterns may arise: a reasoning process can be causal, evidential or intercausal. Causal reasoning implies the presence of causal relationships between events, that is, any cause leads to a certain effect. Evidential reasoning is the opposite notion: the system is modelled from effects to causes, that is, given an observation of effects, one wants to determine the causes. Finally, intercausal reasoning combines causal and evidential reasoning: given an observed effect, an increase in the probability of one cause decreases the probabilities of all the other causes of that effect, making them less likely to occur.

In Bayesian Networks, random variables can influence each other in several different ways. A random variable X can be directly connected to some random variable Y, which implies that they are correlated independently of any other observation in the network. They can also be connected indirectly, through a third random variable. In this case, several types of connections are possible: indirect causal effects, where a random variable X can influence a random variable Y via Z as long as Z remains unobserved, X → Z → Y; indirect evidential effects, where a random variable Y can influence X via a third random variable Z, as long as Z remains unobserved, Y → Z → X; a common cause, where a random variable X can influence Y via Z as long as Z remains unobserved, X ← Z → Y; and a v-structure or common effect, where a random variable X can influence Y via Z if Z or a descendant of Z is observed, X → Z ← Y.

Chapter 4

Paradoxes and Fallacies for Cognition and Decision-Making

Throughout time, many frameworks for decision-making have been developed. Since the 1930s, economic models have represented a person's preferences through utility functions. The underlying principle was that people should always choose the preferences that maximise a utility function [74]. This assumption set the foundation for the axiomatisation of one of the most important decision theories developed in the last decades: Expected Utility theory. As long as a decision-maker obeys a set of axioms and chooses the preference that maximises the utility function, the decision-maker is considered to act in a rational way [177]. All decision theories based on these assumptions are called normative: they say what people should choose. However, at the end of the 1970s, Daniel Kahneman and Amos Tversky showed in a set of experiments that, in many real-life situations, the predictions of expected utility were completely inaccurate [193, 195, 197, 90, 89]. This means that a decision theory should be predictive, in the sense that it should say what people actually do choose, instead of what they must choose. Citing one of the lectures of Richard Feynman, the 'paradox' is only a conflict between reality and your feeling of what reality 'ought to be' [72].

Since the axiomatisation of expected utility, many experiments have been carried out which showed violations of the axioms of rational choice. Examples of such paradoxes are the well-known Allais [13] and Ellsberg [70] paradoxes. In this chapter, we make a brief overview of the most important paradoxes and human decision errors reported in the literature. This chapter is organised as follows. In Sections 4.1.1 and 4.1.2, we present the expected utility theory and the subjective expected utility theory, respectively. In Section 4.2, we present a set of paradoxes that violate both the expected and subjective expected utility theories.
In Section 4.3, we present human judgement errors by introducing the notions of conjunction and disjunction errors in the Linda problem. In Section 4.4, we present probability judgement errors through two representative experiments from the literature: the two-stage gambling game and the prisoner's dilemma. Next, in Section 4.5, we present the problem of order effects, in which we analyse the results obtained in the work of Moore [134], where the author collected public opinions about two important people: Bill Clinton and Al Gore. We also analyse the work of Bergus et al. [25], in which the authors investigated how order effects in clinical data can influence decisions in diagnostics. Finally, the chapter ends with a brief overview of the most important concepts presented, in Section 4.6.

4.1 Utility Functions

In this section, we present the two main decision theories that have been proposed in the literature: Expected Utility theory and Subjective Expected Utility theory. Note that both theories are highly normative, and many other decision theories have therefore been proposed throughout the literature (for instance, Choquet Expected Utility). This work focuses on the paradoxes derived from the expected and subjective expected utility theories, and therefore we give primary focus to them.

4.1.1 Expected Utility Theory

The expected utility hypothesis corresponds to a function designed to take into account decisions under risk. It consists of a choice among a set of possible actions, each represented by a probability distribution over a set of possible payoffs. The traditional expected utilities are also referred to as von Neumann–Morgenstern utilities [201] and are given by Equation 4.1.

EU[D(a)] = Σ_{o ∈ O} πa(o) U(o)    (4.1)

where O is the set of outcomes, O = {o1, o2, ..., oN}, A is the set of possible actions that an agent can take, A = {a1, a2, ..., aK}, πa(o) is the probability of outcome o given that action a was taken, and U(o) is the utility function, which specifies the agent's preferences over the outcomes. The four axioms that describe the expected utility framework are completeness, transitivity, independence and continuity. They are defined as follows:

Completeness Assumes that an agent has well-defined preferences and is always able to compare two gambles. That is, the agent either prefers gamble A over B, prefers gamble B over A, or is indifferent between them:

For every A and B, either A ⪰ B or B ⪰ A

Transitivity Extends the completeness axiom by stating that an agent makes consistent decisions. That is, if the agent prefers gamble A over B, and a third gamble C is added such that B is preferred over C, then the agent also prefers A over C:

For every A, B and C with A ⪰ B and B ⪰ C, then A ⪰ C

Independence Is similar to the transitivity axiom. It states that when two gambles are mixed with a third one, the agent should maintain the same preference order as when the first two gambles are presented independently of the third. That is:

Let A, B and C be three different gambles with A ⪰ B, and let t ∈ [0, 1]; then tA + (1 − t)C ⪰ tB + (1 − t)C

Continuity Let A, B and C be three different gambles for which an agent has the preference order A ⪰ B ⪰ C; then there should be a possible combination of A and C such that the agent is indifferent between this mix and gamble B:

Let A, B and C be gambles with preferences A ⪰ B ⪰ C; then there exists a probability p such that B is equally good as pA + (1 − p)C

4.1.2 Subjective Expected Utility

In 1954, Savage extended von Neumann–Morgenstern's expected utility, creating the Subjective Expected Utility (SEU). The extension refers to decisions under uncertainty; that is, this new extension defines functions that can map uncertain events into payoffs. This leads to a personal utility function and a personal probability distribution. A major difference between the expected and subjective expected utility functions concerns the type of decisions. Expected utility deals with decisions under risk, that is, choices are defined by an objective probability distribution over possible preferences. Subjective expected utility, on the other hand, deals with decisions under uncertainty, which means that choices are defined over uncertain events and the probability distribution of the events may be subjective. Equation 4.2 describes the Subjective Expected Utility.

SEU(a) = Σ_{o ∈ O} πa(o) U(o)    (4.2)

Although the mathematical formulas of EU and SEU are the same, their probabilities are computed differently. The Expected Utility proposed by von Neumann–Morgenstern uses objective probabilities, whereas Savage's Subjective Expected Utility uses subjective probabilities, that is, probabilities that are not based on the frequentist approach (counting events and possible outcomes).
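Since Equations 4.1 and 4.2 share the same form, a single routine computes both; only the interpretation of the probabilities (objective versus subjective) differs. A minimal sketch with hypothetical payoffs and a linear utility:

```python
def expected_utility(pi_a, U):
    """EU[D(a)] = sum over outcomes o of pi_a(o) * U(o)  (Equations 4.1 / 4.2).

    pi_a: dict outcome -> probability of outcome o when action a is taken
    U:    dict outcome -> utility U(o) of outcome o
    """
    return sum(p * U[o] for o, p in pi_a.items())

# Hypothetical choice between a risky gamble and a sure payoff, with a
# linear utility U(o) = o (monetary payoffs used directly as utilities):
U = {0: 0, 60: 60, 100: 100}
eu_gamble = expected_utility({100: 0.8, 0: 0.2}, U)  # 0.8 * 100 = 80.0
eu_sure = expected_utility({60: 1.0}, U)             # 60.0
print(eu_gamble, eu_sure)  # a rational agent picks the action with higher EU
```

Under SEU, the dictionary `pi_a` would hold the agent's personal probability estimates instead of objective frequencies; the computation is unchanged.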

4.2 Paradoxes

In this section, we present a small overview of the most relevant paradoxes and fallacies that occur in decision-making scenarios. We also provide a brief overview of current approaches that attempt to address them.

4.2.1 Ellsberg Paradox

The disjunction fallacy corresponds to any situation in which Savage's Sure Thing Principle is violated. The Sure Thing Principle is a concept widely used in game theory and was originally introduced by Savage [170]. This principle states that if one prefers action A over B under state of the world X, and one also prefers A over B under the complementary state of X, then one should always prefer A over B even when the state of the world is unspecified. An example of the disjunction fallacy is the Ellsberg Paradox. The Ellsberg paradox was developed to illustrate inconsistencies of actual observed choices against the predictions of the subjective expected utility function, and violations of the sure thing principle. It is described as follows. A ball is taken randomly from an urn. The ball can have one of three colours: red, blue or green. This colour determines the payoff of each action. The only information given about the composition of the urn is that it contains 100 red balls and 200 balls that are blue or green, in unknown proportion. In each experiment, the player must pick one of two options:

• Experiment 1:

Option A If a red ball is picked, then receive $ 100, else win nothing;

Option B If a blue ball is picked, then receive $ 100, else win nothing;

• Experiment 2:

Option A’ If a red or a green ball is picked, then receive $ 100, else win nothing;

Option B’ If a blue or a green ball is picked, then receive $ 100, else win nothing;

Under the Ellsberg paradox, the sure thing principle implies that if option A is preferred over B, then option A′ must also be preferred over B′. The same happens when choosing option B: if B is preferred over A, then B′ should also be preferred over A′. However, the results reported by Camerer & Weber [46] suggest that the majority of people pick option A over B, but also choose option B′ over A′, violating Savage's sure thing principle. In order to explain these findings, Tversky & Fox [192] developed a model based on prospect theory, i.e. one that assumes people make their decisions focused on their potential losses and gains rather than on the final outcome. These gains/losses are evaluated through heuristics. They also incorporated non-additive decision weights in order to accommodate the Ellsberg paradox. Other examples that fall under the disjunction fallacy, or disjunction effects, are the Two Stage Gambling Game (Section 4.4.1) and the Prisoner's Dilemma Game (Section 4.4.2).
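The violation can be made concrete by checking every possible urn composition. The sketch below shows that, for any assumed number of blue balls, the preference orders in the two experiments must agree; the commonly observed pattern (A together with B′) is therefore inconsistent with any single subjective probability assignment:

```python
def win_probabilities(blue):
    """Winning probability of each option, assuming the urn holds 100 red
    balls and `blue` blue balls out of its 200 non-red balls
    (so green = 200 - blue), for a total of 300 balls."""
    green = 200 - blue
    return {"A": 100 / 300, "B": blue / 300,
            "A'": (100 + green) / 300, "B'": (blue + green) / 300}

# Whatever composition a subjective-expected-utility agent assumes, preferring
# A over B (i.e. believing blue < 100) forces preferring A' over B', and
# vice versa -- exactly the Sure Thing Principle.
for blue in range(201):
    p = win_probabilities(blue)
    assert (p["A"] > p["B"]) == (p["A'"] > p["B'"])
print("no single subjective urn composition explains choosing A and B'")
```

Note that B′ always wins with probability 200/300 regardless of the composition, which is precisely why the ambiguity-averse choice of B′ together with A is so common.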

4.2.2 Allais Paradox

Human judgements obey neither the axioms of the Expected Utility function nor those of Subjective Expected Utility. Researchers found a specific task that constantly violates Expected Utility functions: the Allais Paradox [13].

The Allais paradox was developed to illustrate inconsistencies of actually observed choices against predictions of the expected utility function. It is described as follows. A participant must pick one of two choices. The first consists in choosing between winning $ 1 million with 100% probability (option A) or an 80% chance of winning $ 5 million and a 20% chance of winning nothing (option B). The second choice consists in a 10% chance of winning $ 1 million and a 90% chance of winning nothing (option A′), or winning $ 5 million with a probability of 8% and a 92% probability of winning nothing (option B′). Tables 4.1 and 4.2 summarise these bets.

Option A: winning $ 1 million with 100% probability
Option B: 80% chance of winning $ 5 million; 20% chance of winning nothing

Table 4.1: Allais Paradox Experiment 1

Option A′: 10% chance of winning $ 1 million; 90% chance of winning nothing
Option B′: 8% chance of winning $ 5 million; 92% chance of winning nothing

Table 4.2: Allais Paradox Experiment 2

Following the book of Busemeyer & Bruza [34], the expected utility for the first choice is given by:

Option A: EU(A) = 1 × U($1M)
Option B: EU(B) = 0.80 × U($5M)

The above statement means that option A is preferred over B if U($1M) > 0.80 × U($5M). Multiplying both sides by 0.10 yields the following inequality, which represents options A′ and B′: 0.10 × U($1M) > 0.08 × U($5M)

This means that people who choose option A over B should also choose option A′ over B′. Results reported by Allais [13] describe that the majority of people chose option A over option B, but they also chose option B′ over option A′, violating the EU independence axiom. EU theory makes the assumption that preferences are independent of a probability factor applied to both options (in the example given, the probability factor was 0.10). However, human decision-making does not obey this axiom. In the work of Mura [149], the authors extended the von Neumann-Morgenstern expected utility framework in order to accommodate the Allais paradox. The expected utility hypothesis corresponds to a theory in which people’s gambling preferences are represented by a function of payouts. The framework proposed by von Neumann-Morgenstern could not be adapted to avoid the Allais paradox, since it treats notions such as subjective uncertainty and risk as being the same. In Mura [149], the authors generalised the expected utility framework by adding the idea that the relevant dimensions of risk (subjective consequences) may be perceived differently by different subjects, while the probabilities with respect to the natural basis may be taken to represent the relevant dimensions of risk as perceived by the modeller or external observer (objective outcomes).
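The independence argument above can be checked numerically. The following is our sketch, with a few arbitrary utility functions standing in for U:

```python
# Sketch: the second Allais gamble is the first one scaled by 0.10, so any
# expected-utility maximiser must rank (A, B) and (A', B') the same way.
def eu_first(U):
    return 1.00 * U(1_000_000), 0.80 * U(5_000_000)

def eu_second(U):
    return 0.10 * U(1_000_000), 0.08 * U(5_000_000)

for U in (lambda x: x, lambda x: x ** 0.5, lambda x: x ** 0.2):
    a, b = eu_first(U)       # expected utilities of A and B
    a2, b2 = eu_second(U)    # expected utilities of A' and B'
    assert (a > b) == (a2 > b2)   # the independence axiom in action
print("preference direction is preserved for every utility tried")
```

Whatever the shape of U, the ranking cannot flip, which is exactly what the reported choices do.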

4.2.3 Three Color Ellsberg Paradox

In the Ellsberg Three Ball Paradox, there are 90 balls in an urn: 30 are red and the remaining 60 are either black or yellow. One ball will be drawn at random from the urn. The following preferences over acts are typical (Zhang [217]):

Option 1A: If the ball is Red, then you win $100; if the ball is either Black or Yellow, you win nothing.
Option 1B: If the ball is Black, then you win $100; if the ball is either Red or Yellow, you win nothing.

Table 4.3: Three colour Ellsberg paradox experiment 1.

Option 2A: If the ball is either Red or Yellow, you win $100; if the ball is Black, you win nothing.
Option 2B: If the ball is either Black or Yellow, you win $100; if the ball is Red, you win nothing.

Table 4.4: Three colour Ellsberg paradox experiment 2.

In the three colour Ellsberg paradox, the decision maker will prefer option 1A, because he knows that there is a 1/3 chance of winning $ 100. Option 1B is ambiguous, since the probability of winning ranges from 0 to 2/3. The same happens in experiment 2. The decision maker will prefer option 2B, because it gives a known 2/3 chance of winning, over option 2A, which gives an ambiguous chance of winning between 1/3 and 1. The core of Ellsberg’s paradox is the existence of two types of events: ambiguous and unambiguous events. Ambiguous events are events whose outcome probabilities are unknown. Unambiguous events, on the other hand, are events whose outcome probability distributions are known. The term ambiguity aversion denotes a preference for events with known odds over events with unknown odds (even if these unknown odds could lead to a higher utility value). Subjective expected utility fails to explain Ellsberg’s preferences, because it cannot distinguish between ambiguous and unambiguous bets.
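The impossibility of rationalising both choices with a single probability assignment can be sketched as follows (our illustration, assuming payoff-proportional utility):

```python
# Sweep every admissible belief about Black (0..60 of the 90 balls) and test
# whether expected-value reasoning can produce both observed preferences.
def choices_consistent(p_black):
    p_red = 30 / 90
    p_yellow = 60 / 90 - p_black
    ev_1a = 100 * p_red                    # Option 1A: Red wins
    ev_1b = 100 * p_black                  # Option 1B: Black wins
    ev_2a = 100 * (p_red + p_yellow)       # Option 2A: Red or Yellow wins
    ev_2b = 100 * (p_black + p_yellow)     # Option 2B: Black or Yellow wins
    return ev_1a > ev_1b and ev_2b > ev_2a

matches = [k for k in range(61) if choices_consistent(k / 90)]
print(matches)  # -> []: no single belief explains preferring both 1A and 2B
```

Preferring 1A forces Pr(Black) < 1/3, while preferring 2B forces Pr(Black) > 1/3, hence the empty sweep.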

4.3 Conjunction and Disjunction Errors

One of the paradoxical findings in cognitive decision-making corresponds to conjunction errors. A conjunction fallacy occurs when people judge the probability of the intersection of two events to be greater than the probability of a single event. This fallacy was first reported by Tversky & Kahneman [196]. It is a violation of probability theory, because the likelihood of a subset can never be bigger than the probability of the entire set. On the other hand, there is also the disjunction fallacy, which occurs when people judge the probability of a single set to be bigger than the probability of the union of two sets [47].
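As a quick sanity check of the classical constraint (a sketch of ours, not from the thesis), no joint distribution over two events can ever reproduce the fallacious orderings:

```python
# For ANY classical joint distribution over two events, the conjunction can
# never exceed either marginal and the disjunction can never fall below
# either marginal -- exactly the bounds human judgements are reported to violate.
import random

random.seed(0)
for _ in range(1000):
    # random joint distribution over the four atoms (A,B), (A,~B), (~A,B), (~A,~B)
    w = [random.random() for _ in range(4)]
    s = sum(w)
    p_ab, p_anb, p_nab, p_nanb = (x / s for x in w)
    p_a = p_ab + p_anb
    p_b = p_ab + p_nab
    p_union = p_a + p_b - p_ab
    assert p_ab <= min(p_a, p_b) + 1e-12          # conjunction bound
    assert p_union >= max(p_a, p_b) - 1e-12       # disjunction bound
print("classical bounds hold for every sampled joint distribution")
```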

4.3.1 The Linda Problem

It was the pioneering work of Tversky & Kahneman [196] that first noticed these fallacies. The authors attempted to explain this phenomenon through heuristics, more specifically through the representativeness heuristic already presented in their previous work [193]. The representativeness heuristic can be defined as the degree to which an event is similar to its parent population. When people judge something through this heuristic, they tend to consider the event more representative; however, being more representative does not mean being more likely. This can be easily perceived in the Linda example [196]. ”Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.” What is Linda more likely to be:

• a. Active in the feminist movement (F)

• b. Bank teller (B)

• c. Active in the feminist movement and bank teller (F ∩ B)

• d. Active in the feminist movement or bank teller (F ∪ B)

The representativeness heuristic is present in the Linda problem, because the conjunction (Linda is active in the feminist movement and a bank teller) is more representative of the story than the less likely event (Linda is a bank teller). Therefore, individuals following this heuristic tend to give higher preference to option c than to option b, which leads to the conjunction fallacy. On the other hand, the disjunction fallacy occurs when individuals judge option a more likely than option d. In Morier & Borgida [147], the authors posed this problem to a set of 64 participants, who were asked to judge Linda according to the above story. The results reported are the following:

P r(F ) > P r(F ∪ B) > P r(F ∩ B) > P r(B)

In this example, the conjunction fallacy emerges when option c is preferred over option b. In the reported results, one can see that P r(F ∩ B) > P r(B). This is clearly a violation of classical probability theory, since the probability of a subset can never be bigger than the probability of the entire set. In the same way, the disjunction fallacy emerges when option a is judged to be more likely than option d, that is, when the probability of a single event is considered higher than the probability of the union of that event with another: P r(F ∪ B) < P r(F ). This also violates classical probability theory, because the probability of a union can never be smaller than the probability of an event it comprises. Figures 4.1 and 4.2 demonstrate this explanation through Venn diagrams for the conjunction and disjunction fallacies, respectively. There is a lot of related literature attempting to replicate the empirical findings for conjunction errors (Moore [134], Sides et al. [176]).

Figure 4.1: Linda is feminist and bank teller. Notice that P r(F ∩ B) always has to be smaller than P r(B).

Figure 4.2: Linda is feminist and bank teller. Notice that P r(F ∪ B) always has to be bigger than P r(F ).

More recent works in the literature started to use quantum mechanics formalisms to explain conjunction errors. The models proposed are various, but they all agree in one aspect: one needs quantum interference effects in order to explain conjunction errors. For example, Franco [73] proposed a two-dimensional quantum model to accommodate the violations perceived in the conjunction fallacy, which is very similar to Khrennikov’s Quantum-Like Approach [96]. The author proposed to represent two binary questions A and B by operators in a Hilbert space. When computing the probability of question B, the author makes use of the law of total probability, with the difference that classical probabilities are replaced by quantum amplitudes. The probability is given by taking the squared magnitude of the sum of probability amplitudes. Quantum interference effects emerge from this operation, as already presented in Section 2.2. To explain the empirical results of the work of Tversky & Kahneman [196], one needs to know a priori the outcome of the experiment in order to manually set the quantum parameters that result from interference effects. This is a concern that affects many quantum-like models in the literature and will be analysed in more detail in Chapter 5. Other similar quantum versions of this problem were also developed by Khrennikov [107], Yukalov & Sornette [213] and Aerts [5]. These models contain variations, but they share the same quantum formalisms and assumptions.
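The amplitude version of the law of total probability can be sketched in a few lines (our illustration, not the thesis’ code; the input probabilities and phase are arbitrary):

```python
# Replacing classical probabilities with complex amplitudes in the law of
# total probability introduces a cross term 2*sqrt(...)*cos(theta).
import cmath, math

def quantum_total_probability(p_a, p_b_given_a, p_b_given_na, theta):
    # amplitudes carry the square roots of the classical probabilities;
    # theta is the quantum phase parameter discussed in the text
    amp = (math.sqrt(p_a * p_b_given_a)
           + cmath.exp(1j * theta) * math.sqrt((1 - p_a) * p_b_given_na))
    return abs(amp) ** 2

classical = 0.5 * 0.6 + 0.5 * 0.7                                 # = 0.65
recovers = quantum_total_probability(0.5, 0.6, 0.7, math.pi / 2)  # cos = 0
shifted  = quantum_total_probability(0.5, 0.6, 0.7, math.pi)      # destructive

print(recovers)  # matches the classical value (~0.65)
print(shifted)   # pushed well below 0.65 by the interference term
```

At θ = π/2 the cosine term vanishes and the classical law is recovered; other phases raise or lower the probability, which is how these models accommodate the fallacies.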

Another more recent model that also uses quantum formalisms to explain conjunction and disjunction errors corresponds to Busemeyer et al. [40]. In their work, a person’s beliefs are modelled by quantum state vectors, and quantum projections are used to simulate the person’s answers. For instance, to represent the person’s belief about Linda being active in the feminist movement and a bank teller, the proposed model starts in a superposition state |S⟩, which then collapses to the feminist subspace. The revised state thus changes from |S⟩ to the new state |F⟩. Then, to represent the belief that Linda is also a bank teller, the state changes from |F⟩ to the revised state |B⟩. The final probability is obtained through the squared length of the projection of Linda being believed to be a bank teller, given that the person also believed she was part of the feminist movement, ||P_B |F⟩||². The results obtained were able to simulate the conjunction fallacy without violating any law of quantum probability.
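A minimal sketch of this projection sequence follows (ours; the angles are illustrative assumptions, not fitted values):

```python
# Belief state and answer directions as unit vectors in the plane;
# probabilities are squared projection lengths.
import math

def unit(angle_deg):
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def overlap(u, v):
    return u[0] * v[0] + u[1] * v[1]

S = unit(85)   # initial belief: almost aligned with "feminist"
F = unit(45)   # "feminist = yes" direction
B = unit(0)    # "bank teller = yes" direction (incompatible with F)

p_bank_alone = overlap(S, B) ** 2                          # judge B directly
p_fem_then_bank = overlap(S, F) ** 2 * overlap(F, B) ** 2  # project onto F, then B

print(p_bank_alone, p_fem_then_bank)
# the F-then-B sequence comes out far more likely than B alone, mirroring
# the conjunction-fallacy pattern without breaking quantum probability rules
```

With these angles, stepping through the "feminist" subspace first dramatically raises the judged probability, which is the qualitative effect the model captures.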

4.4 Disjunction Effects

While the disjunction errors reported in the previous section were concerned with probability judgement errors, in this section we present examples of disjunction effects, which are concerned with decision-making errors [34]. In the Linda problem, disjunction errors occurred when a decision-maker expressed more preference for a single event than for a union comprising that very same event with another. In this section, we will cover decision scenarios concerned with violations of Savage’s Sure Thing Principle. We will refer to these types of paradoxes as disjunction effects.

4.4.1 The Two Stage Gambling Game

The two stage gambling game was presented as a motivation for this work in the introductory section. We will repeat the explanation of the experiment and complement it with the several works in the literature that tried to reproduce the experimental findings of Tversky & Shafir [198]. In that work, participants were asked at each stage to make the decision of whether or not to play a gamble that has an equal chance of winning $200 or losing $100. Three conditions were tested:

1. Participants were informed that they had won the first round;

2. Participants were informed that they had lost the first round;

3. Participants did not know the outcome of the first round;

The results obtained showed that participants who knew they had won the first gamble decided to play again. Participants who knew that they had lost the first gamble also decided to play again. By Savage’s Sure Thing Principle, it was expected that participants would choose to play again even without knowing the outcome of the first gamble. However, the results showed that the majority of participants did not want to play again when the outcome of the first gamble was unknown, violating the Sure Thing Principle.

Figure 4.3: The two-stage gambling experiment proposed by Tversky & Shafir [198]

Literature                  Win    Lose   Unknown
Tversky & Shafir [198]      0.69   0.58   0.37
Kuhberger et al. [120]      0.72   0.47   0.48
Lambdin & Burdsal [122]     0.63   0.45   0.41
Averaged Results            0.68   0.50   0.42

Table 4.5: Results of the two-stage gambling game reported by different works from the literature.

Tversky & Shafir [198] explained these findings in the following way: when the participants knew that they had won, they had extra ”house money” to play with and decided to play the second round. If the participants knew that they had lost, they chose to play again in the hope of recovering the lost money. But when the participants did not know whether they had won or lost the round, these thoughts did not arise in their minds and consequently they chose not to play. Other works in the literature also replicated this two-stage gambling experiment [172, 120, 122]. Table 4.5 shows the results obtained in these works.
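A two-line check (our sketch) makes the violation in the averaged row of Table 4.5 explicit: under the law of total probability, the unknown-condition probability must be a convex mixture of the two known-outcome probabilities.

```python
# Averaged results from Table 4.5 (win / lose / unknown conditions)
win, lose, unknown = 0.68, 0.50, 0.42

# Total probability: Pr(play) = p*Pr(play|win) + (1-p)*Pr(play|lose), p in [0,1],
# so the unknown-condition value must lie between the two conditionals.
lo, hi = min(win, lose), max(win, lose)
classically_possible = lo <= unknown <= hi
print(classically_possible)  # -> False: 0.42 falls below both conditionals
```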

4.4.2 The Prisoner’s Dilemma Game

The Prisoner’s Dilemma game is another example of a violation of the Sure Thing Principle. In this game, two prisoners are held in separate solitary confinement with no means of speaking to or exchanging messages with each other. The police offer each prisoner a deal: they can either betray each other (defect) or remain silent (cooperate). The dilemma of this game is the following. Taking into account the payoff matrix, the best joint choice for both players would be to cooperate. However, the action that yields a bigger individual reward is to defect. If player A has to make a choice, he has two options: if B has chosen to cooperate, the best option for A is to defect, because he will be set free; if B has chosen to defect, then the best action for A is also to defect, because he will spend less time in jail than if he cooperates [198].

Figure 4.4: Example of a payoff matrix for the Prisoner’s Dilemma Game.

To test the veracity of the Sure Thing Principle under the Prisoner’s Dilemma game, several experiments were performed in the literature in which three conditions were tested:

• Participants were informed that the other participant chose to defect.

• Participants were informed that the other participant chose to cooperate.

• Participants had no information about the other participant’s decision.

Tversky & Shafir [198] investigated the Prisoner’s Dilemma game with 80 participants. The results obtained showed that when a player knew that the opponent had defected, the majority of participants (97%) chose to also defect. When the player was told that the opponent had cooperated, the majority (84%) still decided to defect. However, when the player did not know the action chosen by the opponent, only 63% of people decided to defect. This means that 37% of the participants were violating the Sure Thing Principle. Many other works in the literature replicated the Tversky & Shafir [198] experiment. Examples are the works of Crosson [55], Li & Taplin [125], Busemeyer et al. [39], Hristova & Grinberg [86] and Pothos & Busemeyer [164].

Figure 4.5: The Prisoner’s Dilemma game experiment proposed by Tversky & Shafir [198]

Figure 4.5 describes the experiment and Table 4.6 summarises the results of several works in the literature that have performed this experiment using different payoffs. Note that all entries of Table 4.6 show a violation of the Sure Thing Principle and, consequently, of the law of total probability. In a classical setting, assuming neutral priors, it is expected that:

P r ( P2 = Defect | P1 = Defect ) ≥ P r ( P2 = Defect ) ≥ P r ( P2 = Defect | P1 = Cooperate )

However, this is not consistent with the experimental results reported in Table 4.6. Note that P r( P2 = Defect | P1 = Defect ) corresponds to the probability of the second player choosing the Defect action given that he knows that the first player chose to Defect. In Table 4.6, this corresponds to the entry Known to Defect. In the same manner, P r( P2 = Defect | P1 = Cooperate ) corresponds to the entry Known to Cooperate. The probability observed during the experiments of player 2 choosing to defect, P r( P2 = Defect ), corresponds to the Unknown entry of Table 4.6, because there is no evidence regarding the first player’s actions. Finally, the entry Classical Probability corresponds to the classical probability P r( P2 = Defect ) computed through the law of total probability, assuming neutral priors (a 50% chance of a player choosing either to cooperate or to defect):

P r(P2 = Defect) = P r(P1 = Defect) · P r(P2 = Defect | P1 = Defect) + P r(P1 = Cooperate) · P r(P2 = Defect | P1 = Cooperate)
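The Classical Probability column of Table 4.6 can be reproduced as follows (a sketch; the function name is ours):

```python
# Neutral priors: the first player defects or cooperates with probability 0.5.
def classical_defect_probability(p_defect_given_defect, p_defect_given_coop,
                                 prior_defect=0.5):
    # law of total probability for Pr(P2 = Defect)
    return (prior_defect * p_defect_given_defect
            + (1 - prior_defect) * p_defect_given_coop)

# Shafir & Tversky's observed conditionals from Table 4.6
predicted = classical_defect_probability(0.97, 0.84)
print(predicted)   # ~0.905, far above the observed unknown-condition 0.63
```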

Literature                  Known to Defect   Known to Cooperate   Unknown   Classical Probability
Shafir & Tversky [172]      0.9700            0.8400               0.6300    0.9050
Crosson [55] a              0.6700            0.3200               0.3000    0.4950
Li & Taplin [125] b         0.8200            0.7700               0.7200    0.7950
Busemeyer et al. [39]       0.9100            0.8400               0.6600    0.8750
Hristova & Grinberg [86]    0.9700            0.9300               0.8800    0.9500
Average                     0.8700            0.7400               0.6400    0.8050

Table 4.6: Works of the literature reporting the probability of a player choosing to defect under several conditions. a corresponds to the average of the results reported in the first two payoff matrices of the work of Crosson [55]. b corresponds to the average of all seven experiments reported in the work of Li & Taplin [125].

Many works in the literature have been proposed to address the violations of the Sure Thing Principle under a quantum approach. For instance, Pothos & Busemeyer [164] proposed a quantum dynamical model in which the players’ beliefs and actions were represented as a superposition in a quantum state vector. When the player knew the information about the opponent’s action, the superposition state collapsed to a new state that was compatible with the action chosen by the opponent. When the player did not know about the opponent’s move, the system evolved through the application of unitary transformations. In Asano et al. [21], the authors focus on irrational choices and developed a model based on the quantum superposition and interference principles for the Prisoner’s Dilemma game. After each play, the state is updated until it reaches an equilibrium. In Cheon & Takahashi [48], the authors analysed quantum conditional probabilities in the Prisoner’s Dilemma game. In Accardi et al. [3], the authors explored the impact of violations of probability theory in economics and analysed the two-stage gambling game and the Prisoner’s Dilemma game from a quantum probabilistic point of view, proposing a quantum Markov model to explain the paradoxical observations in these games. Finally, Asano et al. [15] proposed a quantum Bayesian updating scheme, based on the formalisms of quantum mechanics, that represents mental states in a Hilbert space. Through the use of projections, they were able to introduce a model that could explain the paradoxical findings of the two-stage gambling game [198]. Another similar work that compares Bayes’ rule to its quantum counterpart corresponds to Busemeyer & Trueblood [36].

4.5 Order of Effects

In social sciences and behaviour research, there is also the problem of order effects. Classical probability theory cannot account for order effects because, given two events A and B, the probability of confirming a hypothesis H is the same regardless of the order in which the events are presented, i.e. P r(H|AB) = P r(H|BA). However, empirical experiments performed in social and behaviour research show that this equality does not hold and that, in fact, the order in which the events are presented influences human judgements [175, 182]. In quantum mechanics, this phenomenon also occurs: when performing measurements, the order in which one performs the measurements leads to different outcomes.

                                             Clinton-First     Clinton-Last      Difference
Pr after first question (non-comparative)    P r(C) = 50%      P r(G) = 68%      18%
Pr after second question (comparative)       P r(G|C) = 60%    P r(C|G) = 57%    3%

Table 4.7: Summary of the results obtained in the work of Moore [134].

P r(H|A ∩ B) = P r(H|A) · P r(B|H ∩ A) / P r(B|A) = P r(H|B) · P r(A|H ∩ B) / P r(A|B) = P r(H|B ∩ A)
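Classically, the chain above forces order invariance; with non-commuting projections it fails. A minimal sketch (ours; the vectors and angles are arbitrary illustrations):

```python
# Two incompatible questions as projections onto different axes in the plane:
# projecting in a different order gives a different final probability.
import math

def unit(angle_deg):
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def project(state, direction):
    # orthogonal projection of `state` onto the ray spanned by `direction`
    c = state[0] * direction[0] + state[1] * direction[1]
    return (c * direction[0], c * direction[1])

def sq_len(v):
    return v[0] ** 2 + v[1] ** 2

psi = unit(10)              # initial belief state
A, B = unit(40), unit(80)   # two incompatible question axes

p_ab = sq_len(project(project(psi, A), B))   # answer A first, then B
p_ba = sq_len(project(project(psi, B), A))   # answer B first, then A
print(p_ab, p_ba)  # different values: the projections do not commute
```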

The first works in the literature that tried to explain the order-effects phenomenon were mainly based on heuristics: the averaging model [174] and the belief adjustment model [85]. The main problem of these approaches is that they lack axiomatic foundations and cannot be generalised to other question-order problems. One of the best-known works in this area corresponds to Moore [134], where the author collected public opinions about two prominent politicians: Bill Clinton and Al Gore. In this poll, half of the participants were asked if they thought that Bill Clinton was an honest and trustworthy person and were next asked the same about Al Gore. The other half of the participants were asked the same questions, but in reversed order. Results showed that there was a big gap in the non-comparative context (the first question) and a small gap in the comparative context (the second question). The author called this phenomenon the assimilation effect, and classical probability theory was unable to explain it. Table 4.7 summarises the results obtained by Moore [134]. In the work of Wang & Busemeyer [202], the authors managed to develop a quantum model for question order effects, which could accommodate Moore’s findings and explain the assimilation effect. In their model, they represented the beliefs about Bill Clinton and Al Gore in two different bases. When the participants answered a question about Bill Clinton, a projection was made, creating a new state. From this new state, another projection was made regarding the participants’ answer about Al Gore. When the questions were asked in the reversed order, the measurements led to different projections, causing different outcomes. Bergus et al. [25] also investigated how order effects in clinical data can influence diagnostic decisions. In their experiment, a set of medical professionals was divided in two.
In one half, participants had to estimate the probability of a patient having a urinary tract infection (UTI) after receiving first the patient’s medical history and physical examination (H & P - First) and then the laboratory data. The other half received the laboratory data first and then the information about the patient’s medical history and physical examination (H & P - Last). The authors noticed that, in the presence of different pieces of evidence, the beliefs of the professionals were continually changing and, in the end, the estimates made by the two separate groups were distinct, violating the mathematical principle that P r(H|AB) = P r(H|BA). Table 4.8 summarises the findings.

                           H & P First                      H & P Last
Prior Probability          P r(UTI) = 0.6740                P r(UTI) = 0.6780
First Set of Evidence      P r(UTI|H&P) = 0.7780            P r(UTI|Lab) = 0.4400
Final Set of Evidence      P r(UTI|H&P, Lab) = 0.5090       P r(UTI|Lab, H&P) = 0.5910

Table 4.8: Results obtained from the medical decision experiment in Bergus et al. [25].

McKenzie et al. [132] investigated the changing of beliefs in a scenario involving jury duty in a burglary case. Participants were divided into two groups. One had to rate their confidence in the defendant being guilty after hearing the prosecution and after hearing the defence. The other group had to rate the strength of the prosecution and of the defence. In the first case, one group of participants was presented with a strong prosecution (SP) followed by a weak defence (WD) and the other group was presented with a weak defence followed by a strong prosecution. In the second experiment, the strength of the prosecution varied between groups. One group rated their confidence in the defendant’s guilt after being presented with a strong prosecution followed by a weak defence. The other group rated their belief in the defendant’s guilt after a weak prosecution (WP) followed by the same weak defence. The results are summarised in Table 4.9 and, again, the order of the evidence changes the final outcome, violating classical probability theory. McKenzie et al. [132] proposed a model that could explain these findings; however, the proposed solution could not be generalised and lacked axiomatic foundations.

After First Case           After Second Case
P r(G|SP) = 0.6720         P r(G|SP, WD) = 0.7190
P r(G|WD) = 0.5100         P r(G|WD, SP) = 0.7500
P r(G|WP) = 0.6000         P r(G|WP, WD) = 0.5250

Table 4.9: Results reported by Trueblood & Busemeyer [188] of the experiments performed by McKenzie et al. [132].

In Trueblood & Busemeyer [188], the authors focused on the experiments of Bergus et al. [25] and McKenzie et al. [132] and developed a quantum-like model of causal reasoning. They applied their model to the medical decision-making experiment of Bergus et al. [25] and to the juror decision task of McKenzie et al. [132]. The authors modelled participants’ beliefs by quantum state vectors. In order to compute the probability of an event, they used quantum projections to project the initial superposition state onto the subspace representing the observed event, and then computed the squared magnitude of this projection to extract the probabilities using Born’s rule. In their model, Trueblood & Busemeyer [188] also deal with incompatible events by applying unitary transformations to the belief states, rotating one incompatible basis vector into another.
The authors realised that the quantum model enabled a more accurate fit to the data than the classical model and was able to provide different probability values for different question orders for a fixed hypothesis. The model was also shown to be general, since it was applied to different decision scenarios. Later, in Trueblood & Busemeyer [189], the authors explored a quantum dynamical model for causal reasoning. This model can be applied to judgments of a predictive nature of the form P r(Effect|Cause) = ||P_e|ψ⟩||², where P_e is a projection operator and |ψ⟩ is the initial belief state. These judgments can also be seen as causal reasoning, since we are determining the probability of an effect given the occurrence of some causal event. The general idea of the model is similar to their previous work. The initial belief state, |ψ⟩, is formalised as a superposition of the participant’s beliefs. When the participant gains knowledge of the causal event, the initial belief state is updated: the superposition state suffers a rotation through the application of a unitary operator and is then projected onto the subspace representing the causal event. The model can also take into account judgments with a diagnostic nature, which correspond to evidential reasoning of the form P r(Cause|Effect) = ||P_c U|ψ⟩||².
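A minimal numerical sketch of the rotate-then-project recipe (our illustration; the angles are arbitrary and not fitted to any experiment):

```python
# Belief state as a plane vector; a rotation plays the role of the unitary
# update U, and the Born rule gives the judged probability.
import math

def unit(angle_deg):
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))

def rotate(state, angle_deg):
    # unitary evolution U: a plane rotation of the belief state
    a = math.radians(angle_deg)
    x, y = state
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

def born_probability(state, axis_deg):
    # squared length of the projection onto the answer axis (Born rule)
    axis = unit(axis_deg)
    return (state[0] * axis[0] + state[1] * axis[1]) ** 2

psi = unit(20)                 # initial superposed belief state |psi>
evolved = rotate(psi, 30)      # belief update after new evidence: U|psi>

p_predictive = born_probability(evolved, 0)    # ~ ||P_e U|psi>||^2 (effect axis)
p_diagnostic = born_probability(evolved, 70)   # ~ ||P_c U|psi>||^2 (cause axis)
print(p_predictive, p_diagnostic)
```

Changing the rotation angle or the answer axes changes the probabilities, which is how the dynamical model accommodates different judgment directions with the same state.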

4.6 Summary and Final Discussion

This chapter addressed the preliminary experiments of Tversky & Shafir [198] about violations of classical probability theory and of the Sure Thing Principle. This principle states that if one chooses action A over B under a state of the world X, and if one also chooses action A over B under the complementary state of the world X, then one should always choose action A over B, even when the state of the world is unspecified. When humans need to make decisions under risk, several heuristics are used, since humans cannot process large amounts of data. These decisions, coupled with heuristics, lead to violations of the law of total probability, to the occurrence of several paradoxes and to the disjunction effect. Examples from the literature of disjunction effects are the two stage gambling game and the Prisoner’s Dilemma game.

The Ellsberg and Allais paradoxes were created to illustrate inconsistencies of actually observed choices against the predictions of the subjective expected utility and expected utility functions, respectively. The Ellsberg paradox is also known as a disjunction fallacy or disjunction error and leads to violations of the Sure Thing Principle. Several different quantum-like models have been proposed in the literature that make use of quantum interference effects or time evolution using unitary matrices to accommodate these inconsistencies.

Conjunction errors occur when people judge the probability of the intersection of two events to be greater than the probability of a single event. The disjunction error, on the other hand, corresponds to the situation where individuals give more preference to a single event than to the union of that very same event with another. The well-known Linda problem created by Tversky & Kahneman [196] is an example of such a fallacy. Several quantum-like models have been proposed in the literature to accommodate violations of the conjunction fallacy.
Although all the proposed models are different, they all agree that quantum interference effects play an important role in addressing this issue. Finally, in classical theory, the act of asking a sequence of two questions should yield the same answers as in the situation where the questions are posed in reversed order. Empirical experiments have shown that this is not the case: the act of answering the first question changes the context of the second question, leading people to give different answers according to the order in which the questions are posed. These are called order effects. Quantum theory can explain order effects intuitively, since operations in quantum theory are non-commutative.

Chapter 5

Related Work

In this chapter, we provide an overview and discussion of the most important state-of-the-art quantum cognitive models that are able to explain the paradoxical findings of experiments that violate the Sure Thing Principle (e.g., the Prisoner’s Dilemma game [172]). We conduct an in-depth comparison and discussion of several quantum models: the Quantum-Like Approach [104], the Quantum Dynamical Model [41], the Quantum Prospect Decision Theory [215] and Quantum Bayesian Networks [191, 124, 140, 141]. We discuss these models in terms of three elements: (1) incorporation of quantum interference effects, (2) how to find quantum parameters and (3) scalability of the model to more complex decision problems.

The first measure attempts to check how preferences are chosen under uncertainty. Following the work of Yukalov & Sornette [215], under uncertainty human beings tend to have aversion preferences: they prefer to choose an action that brings them a certain but lower propensity/utility instead of an action that is uncertain but can yield a higher propensity/utility [71]. This can be simulated through quantum interference effects, in which one outcome is enhanced (or diminished) relative to the opposite outcome.

The second measure takes into account the problem of quantum parameters. In quantum mechanics, a quantum state is modelled by probability amplitudes [207]. These amplitudes describe the behaviour of a wave function that represents the quantum state. Associated with each probability amplitude is a quantum parameter representing the phase of the wave. The interpretation of this parameter under the psychology literature is still not clear, although various works have presented interpretations [164]. Moreover, when applying quantum principles to cognition (or to any other subject), one will need to set these quantum parameters in such a manner that they will lead to accurate predictions. In this metric, we will check how easy it is for the analysed models to set these parameters.

The third and last metric consists of determining whether the model can be extended to more complex scenarios. Although there are many experiments that report violations of the Sure Thing Principle [164, 41, 120, 122], these experiments consist of very small scenarios that are modelled by, at most, two random variables. Therefore, many of the models proposed in the literature are only effective in such small scenarios and become intractable (or even inapplicable) in more complex situations. These elements will be analysed in more detail in Section 5.7 of the present work.

We have collected a set of models from the literature that attempt to tackle violations of the Sure Thing Principle in a quantum fashion, and we then compare the collected models. For this comparison, we show, through a mathematical description of each model, their advantages and disadvantages. That is, we compare these models with respect to the three elements proposed: the number of parameters involved in the model, the scalability of the model, and the usage of quantum interference effects. We will also show that classical models suffer from the same parameter-growth problem as quantum approaches. However, because these models must obey set theory and the laws of classical probability, it is not possible to use them to make predictions in situations where the Sure Thing Principle is being violated. In the end, we give a brief overview of other important models in the literature.

5.1 Disjunction Fallacy: The Prisoner’s Dilemma Game

The Prisoner's Dilemma Game is an example of a disjunction fallacy, as already presented in Section 4.4.2. In this game, there are two prisoners who are in separate solitary confinement with no means of speaking to or exchanging messages with each other. The police offer each prisoner a deal: they can either betray each other (defect) or remain silent (cooperate). Table 5.1 shows the average results obtained in the literature. In the next sections, we will introduce the most representative models in the quantum cognition literature that are able to address violations of the Sure Thing Principle, and we will also show that a classical model cannot accommodate such violations. We will also demonstrate how quantum models work when trying to predict the probabilities of the average results of the Prisoner's Dilemma Game reported in Table 5.1.

                             Known to Defect   Known to Cooperate   Unknown   Classical Probability
Literature (Section 4.4.2)        0.8700             0.7400          0.6400          0.8050

Table 5.1: Average results of several different experiments of the Prisoner’s Dilemma Game reported in Section 4.4.2.

5.2 A Classical Markov Model of the Prisoner’s Dilemma Game

A Markov Model can be generally defined as a stochastic model that satisfies the Markov property: the process evolves (and performs predictions) based only on the present state, so the future is independent of the past given the present. These probabilistic models are very useful for modelling systems that change states according to a transition matrix that specifies some probability distribution or transition rules that depend solely on the current state. One can apply a dynamical Markov process to model the Prisoner's Dilemma Game in the following manner. Following the work of Pothos & Busemeyer [164], the Prisoner's Dilemma is a two-person game and can be modelled by a four-dimensional classical Markov model. Initially, the states are given by all possible combinations of the players' actions, Cooperate (C) and Defect (D), represented in a state vector in which all possible actions are equally likely to be chosen:

P_I = \frac{1}{4} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \quad \text{with rows ordered as } (DD,\ DC,\ CD,\ CC)

The probability of the second player choosing to Defect given that the action of the other player is unknown is given by Equation 5.1 and consists of the multiplication of this initial probability state PI by a transition function T (t):

PF = T (t) · PI (5.1)

The transition function T(t) is represented by a matrix containing positive real numbers, with the constraint that each column must sum to one (normalisation axiom), so that it maps probability distributions to probability distributions. In other words, this matrix represents the new probability distribution across the player's possible actions over some time period t [164].

\frac{d}{dt} T(t) = K \cdot T(t) \;\Rightarrow\; T(t) = e^{K t} \qquad (5.2)

In Equation 5.2, the matrix K corresponds to an intensity matrix: a matrix representation of all payoffs of the players. A solution to the above differential equation is given by T(t) = e^{Kt}, which allows one to construct a transition matrix for any time point from the fixed intensity matrix. These intensities can be defined in terms of the evidence and payoffs for actions in the task. In other words, the intensity matrix transforms the probabilities of the current state to favour defection or cooperation, represented by the parameters µd and µc, respectively [164].

K_{Ad} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \otimes \begin{bmatrix} -1 & \mu_d \\ 1 & -\mu_d \end{bmatrix} \qquad K_{Ac} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} -1 & \mu_c \\ 1 & -\mu_c \end{bmatrix} \qquad (5.3)

K_A = K_{Ad} + K_{Ac} = \begin{bmatrix} -1 & \mu_D & 0 & 0 \\ 1 & -\mu_D & 0 & 0 \\ 0 & 0 & -1 & \mu_C \\ 0 & 0 & 1 & -\mu_C \end{bmatrix} \qquad (5.4)

The payoffs are represented by Equation 5.5. In the work of Pothos & Busemeyer [164], the authors proposed the incorporation of dissonance effects to simulate the change, uncertainty and contradictory beliefs that a player can experience. This is given by the parameter γ.

K_B = \begin{bmatrix} -1 & 0 & \gamma & 0 \\ 0 & -\gamma & 0 & 1 \\ 1 & 0 & -\gamma & 0 \\ 0 & \gamma & 0 & -1 \end{bmatrix} \qquad (5.5)

Thus, the final intensity matrix K is given by:

K = K_A + K_B = \begin{bmatrix} -2 & \mu_D & \gamma & 0 \\ 1 & -\gamma - \mu_D & 0 & 1 \\ 1 & 0 & -1 - \gamma & \mu_C \\ 0 & \gamma & 1 & -\mu_C - 1 \end{bmatrix} \qquad (5.6)

To compute the final probability of a player defecting, we need to sum the dimensions of the column vector P_F that correspond to the second player choosing the action Defect. Note that the four dimensions of the column vector P_F correspond to [DD, DC, CD, CC], where C corresponds to Cooperate and D to Defect. The first letter represents the action chosen by the first player, and the second letter corresponds to the action of the second player. Thus, the probability of player 2 choosing the action Defect corresponds to the summation of the first and the third dimensions of the column vector P_F:

P r( P2 = Defect ) = PF [1st dim] + PF [3rd dim] (5.7)

In Equation 5.7, we do not need to perform any normalisation in the end, because the operation in Equation 5.1 together with the intensity matrix K ensures that the computed values are already probability values. Moreover, there is no possible combination of parameters in Equation 5.7 that will reproduce the results observed in Table 5.1. This occurs because, although we have parameterized the Markov Model, the computed values must obey the laws of classical probability theory. Thus, there is no possible optimization that can predict the violation of the Sure Thing Principle in such situations. This was already noticed in the previous works of Pothos & Busemeyer [164] and Busemeyer et al. [41]. Other works in the literature focus on the differences between quantum and classical models [203, 121, 42]. In the next sections, we explain several quantum approaches proposed in the literature that can accommodate violations of the Sure Thing Principle.
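The Markov computation above can be sketched numerically. The snippet below is a minimal illustration, not the authors' original code: it builds the intensity matrix K of Equation 5.6, evolves the uniform initial state with T(t) = e^{Kt}, and reads out Pr(P2 = Defect) via Equation 5.7. The parameter values and the evolution time t are arbitrary choices for illustration only.

```python
import numpy as np
from scipy.linalg import expm

def markov_final_state(mu_d, mu_c, gamma, t=1.0):
    """Evolve the uniform initial state P_I with T(t) = e^{Kt}
    (Equations 5.1-5.2); basis order is (DD, DC, CD, CC)."""
    # Intensity matrix K = K_A + K_B from Equation 5.6
    # (each column sums to zero, so T(t) is column-stochastic).
    K = np.array([
        [-2.0,           mu_d,         gamma,          0.0],
        [ 1.0, -gamma - mu_d,            0.0,          1.0],
        [ 1.0,            0.0, -1.0 - gamma,          mu_c],
        [ 0.0,          gamma,           1.0, -mu_c - 1.0],
    ])
    P_I = np.full(4, 0.25)       # all four action pairs equally likely
    return expm(K * t) @ P_I     # P_F = T(t) . P_I, Equation 5.1

# Pr(P2 = Defect): sum of the DD and CD dimensions (Equation 5.7)
P_F = markov_final_state(mu_d=0.5, mu_c=0.5, gamma=0.5)
p_defect = P_F[0] + P_F[2]
```

Because T(t) is stochastic, P_F is always a proper probability distribution; no choice of parameters can escape the law of total probability, which is why the model cannot reproduce the pattern of Table 5.1.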

5.3 The Quantum-Like Approach

The Quantum-Like Approach has its roots in contextual probabilities. This model was proposed by A. Khrennikov and corresponds to a general contextual probability space from which the classical and quantum probability models can be derived [108, 104].

5.3.1 Contextual Probabilities: The Växjö Model

In the Växjö Model, the context relates to the circumstances that form the setting for an event, in terms of which the event can be fully understood, clarifying its meaning. For instance, in domains outside of physics one can have mental contexts; in the social sciences, we can have a social context. The same idea applies to many other domains, such as economics, politics, game theory, and biology. Associated with a context, there is a set of observables. In quantum mechanics, an observable corresponds to a self-adjoint operator on a complex Hilbert space. Under the Växjö Model, these observables correspond to the set of possible events with their respective values.

Pr_{context} = (\mathcal{C}, \mathcal{O}, \pi) \qquad (5.8)

For instance, for a context C ∈ C and an observable a ∈ O taking values α, the probability of a value of one observable is expressed as a conditional (contextual) probability given the context. That is, the probability distribution π is given by:

π(O, C) = P r( a = α | C ) (5.9)

If we move into the quantum mechanics realm, Equation 5.9 can be interpreted as the selection, with respect to the result a = α, of a measurement performed on a. The contextual probability model of the Växjö Model corresponds to M = (C, O, π(O, C)). Again, C is a set of contexts, O is the set of observables, and π(O, C) corresponds to a probability distribution of some observables belonging to a specific context. In addition, assume for a context C ∈ C that there are two dichotomous observables a, b ∈ O and that each of these observables can take values α ∈ a and β ∈ b, respectively. The Växjö Model can be built from the general structure of the quantum law of total probability. That is, the formula is a combination of classical probability theory with a supplementary term called the interference term (Equation 5.10). This term does not exist in classical probability and enables the representation of interference between quantum states.

Pr(b = \beta) = \text{Classical Probability}(b = \beta) + \text{Interference Term} \qquad (5.10)

Under this representation, we can replace the Classical Probability term by the classical law of total probability and the quantum Interference Term by a supplementary measure, represented by δ(β | a, C). Under the Växjö Model, the term δ(β | a, C) corresponds to:

\delta(\beta\,|\,a, C) = Pr(b = \beta) - \sum_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C) \qquad (5.11)

Equation 5.11 can be written in a form similar to the classical law of total probability:

Pr(b = \beta\,|\,C) = \sum_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C) + \delta(\beta\,|\,a, C) \qquad (5.12)

If we normalise the supplementary measure δ(β | a, C) by twice the square root of the product of all the probabilities involved, we obtain:

\lambda_\theta = \frac{\delta(\beta\,|\,a, C)}{2 \sqrt{\prod_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C)}} \qquad (5.13)

From Equation 5.13, the general probability formula of the Växjö Model can be derived. For two variables, it is given by:

Pr(b = \beta\,|\,C) = \sum_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C) + 2 \lambda_\theta \sqrt{\prod_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C)} \qquad (5.14)

If we look closely at Equation 5.14, we can see that the first summation of the formula corresponds to the classical law of total probability. The second term of the formula (the one that contains the λθ parameter) does not exist in the classical model and is called the interference term.
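As a quick numerical illustration of Equation 5.14 (a sketch for this text, not part of the original model specification), the function below computes the contextual probability for a dichotomous observable; with λθ = 0 it reduces to the classical law of total probability:

```python
import math

def vaxjo_probability(p_alpha, p_beta_given_alpha, lambda_theta):
    """Contextual probability of Equation 5.14 for a dichotomous
    observable a: the classical law of total probability plus the
    interference term scaled by the indicator lambda_theta."""
    classical = sum(pa * pb for pa, pb in zip(p_alpha, p_beta_given_alpha))
    product = 1.0
    for pa, pb in zip(p_alpha, p_beta_given_alpha):
        product *= pa * pb
    return classical + 2.0 * lambda_theta * math.sqrt(product)

# Prisoner's Dilemma numbers (Table 5.1, neutral priors):
# lambda_theta = 0 recovers the classical prediction 0.805.
p_classical = vaxjo_probability([0.5, 0.5], [0.87, 0.74], 0.0)
```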

5.3.2 The Hyperbolic Interference

Although the Quantum-Like Approach offers greater possibilities than the classical one, it appears that it cannot completely cover data from psychology: a quantum formalism was not enough to explain some paradoxical findings (see Khrennikov et al. [111]), so hyperbolic spaces were proposed [100, 156, 155]. From Equation 5.14, if Pr(b = β) − Σ_{α∈a} Pr(a = α | C) Pr(b = β | a = α, C) is different from zero, then interference effects occur. To determine which type of interference occurred, one tests the Växjö Model for quantum probabilities. This can be done by normalising the supplementary measure in a quantum fashion, just as presented in Equation 5.13. If we are in a quantum context, then the quantum interference term will be:

\delta(\beta\,|\,a, C) = 2 \sqrt{\prod_{\alpha \in a} Pr(a = \alpha\,|\,C)\, Pr(b = \beta\,|\,a = \alpha, C)}\, \cos(\theta) \qquad (5.15)

In a quantum context, because the supplementary term δ(β | a, C) is normalised in a quantum fashion, we automatically know that the indicator λθ can be no larger than 1 in magnitude, |λθ| ≤ 1. Thus, under trigonometric contexts, the Växjö Model for quantum probabilities becomes:

\lambda_\theta = \cos(\theta) \;\rightarrow\; Pr(\beta\,|\,C) = \sum_{\alpha \in a} Pr(\alpha\,|\,C)\, Pr(\beta\,|\,\alpha, C) + 2 \sqrt{\prod_{\alpha \in a} Pr(\alpha\,|\,C)\, Pr(\beta\,|\,\alpha, C)}\, \cos(\theta) \qquad (5.16)

If, however, the probability Pr(b = β) was not computed in a trigonometric space (that is, it is not quantum), then the quantum normalisation applied in Equation 5.13 will yield a value larger than 1: the quantum normalisation factor fails to normalise the interference term. Under these circumstances, the Växjö Model incorporates the generalisation to hyperbolic probabilities, arguing that the context in which these probabilities were computed was hyperbolic [103, 108, 155].

Under hyperbolic contexts, the Växjö Model contextual probability formula becomes:

\lambda_\theta = \cosh(\theta) \;\rightarrow\; Pr(\beta\,|\,C) = \sum_{\alpha \in a} Pr(\alpha\,|\,C)\, Pr(\beta\,|\,\alpha, C) \pm 2 \sqrt{\prod_{\alpha \in a} Pr(\alpha\,|\,C)\, Pr(\beta\,|\,\alpha, C)}\, \cosh(\theta) \qquad (5.17)

In summary, according to the values computed by the indicator function λθ, the Växjö Model enables the computation of probabilities in the following contexts:

• If |λθ| = 0, then there is no interference, and the Växjö Model collapses to classical probability theory.

• If 0 < |λθ| ≤ 1, then we fall into the realm of quantum mechanics, and the context becomes a Hilbert space. The indicator function is then replaced by the trigonometric function cos(θ).

• If | λθ | > 1, then we fall into the realm of hyperbolic numbers, and the context becomes a hyperbolic space. The indicator function is then replaced by the hyperbolic function cosh(θ).
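These three cases can be mechanised directly from Equations 5.11 and 5.13. The helper below is an illustrative sketch (the thresholds follow the list above): it computes λθ from an observed marginal and classifies the interference context.

```python
import math

def classify_context(p_b, p_alpha, p_beta_given_alpha, eps=1e-12):
    """Compute the indicator lambda_theta (Equation 5.13) and
    classify the interference context of the Växjö Model."""
    classical = sum(pa * pb for pa, pb in zip(p_alpha, p_beta_given_alpha))
    delta = p_b - classical                      # Equation 5.11
    product = 1.0
    for pa, pb in zip(p_alpha, p_beta_given_alpha):
        product *= pa * pb
    lam = delta / (2.0 * math.sqrt(product))     # Equation 5.13
    if abs(lam) < eps:
        return "classical"
    return "quantum" if abs(lam) <= 1.0 else "hyperbolic"

# Prisoner's Dilemma: the observed 0.64 deviates from the classical
# 0.805, but |lambda_theta| < 1, so a trigonometric space suffices.
context = classify_context(0.64, [0.5, 0.5], [0.87, 0.74])
```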

5.3.3 Quantum-Like Probabilities as an Extension of the Växjö Model

The probabilities that emerge from the Växjö Model for trigonometric spaces (i.e., quantum probabilities) do not provide a complete description of a quantum system, because they can violate the positivity axiom of probability theory [108]. In this sense, an algorithm that extends the Växjö Model and accommodates the positivity axiom was proposed in the literature: the Quantum-Like Representation Algorithm (QLRA), proposed by A. Khrennikov [91, 92, 94, 97, 98]. As already mentioned, quantum complex amplitudes can be obtained from classical probability by using Born's rule [220, 219]. In the QLRA, for any trigonometric context C, one can write Born's rule for two dichotomous variables as in Equation 5.18 and simplify it into Equation 5.19 [108].

Pr(\beta\,|\,C) = Pr(\alpha_1\,|\,C)\, Pr(\beta\,|\,\alpha_1, C) + Pr(\alpha_2\,|\,C)\, Pr(\beta\,|\,\alpha_2, C) + 2 \sqrt{Pr(\alpha_1\,|\,C)\, Pr(\beta\,|\,\alpha_1, C)}\, \sqrt{Pr(\alpha_2\,|\,C)\, Pr(\beta\,|\,\alpha_2, C)}\, \cos\theta \qquad (5.18)

Equation 5.18 can be simplified in the following manner:

Pr(\beta\,|\,C) = \left|\, \sqrt{Pr(\alpha_1\,|\,C)\, Pr(\beta\,|\,\alpha_1, C)} + e^{i \theta_{\beta|\alpha,C}} \sqrt{Pr(\alpha_2\,|\,C)\, Pr(\beta\,|\,\alpha_2, C)} \,\right|^2 \qquad (5.19)

Equation 5.19 corresponds to the representation of the quantum law of total probability through the Växjö Model. In this equation, the angle θ_{β|α,C} corresponds to the phase of a random variable and combines the phases of both a = α1 and a = α2 as θ_{β|α,C} = θ_{β|α1} − θ_{β|α2}. One should note that the Quantum-Like Approach can be extended to more complex decision scenarios, that is, with more than two random variables. However, this leads to the very difficult task of tuning an exponential number of quantum θ parameters. Peter Nyman noticed this problem when he generalised the Quantum-Like Approach to three dichotomous variables [154, 156, 158, 157].

5.3.4 Modelling the Prisoner’s Dilemma using the Quantum-Like Approach

If we want to compute the average probabilities reported in Table 5.1 for the Prisoner's Dilemma game, then we need to make the following substitutions in Equation 5.18:

Pr(\alpha_1\,|\,C) \cdot Pr(\beta\,|\,\alpha_1, C) = Pr(P_1 = Defect\,|\,C) \cdot Pr(P_2 = Defect\,|\,P_1 = Defect) = 0.5 \times 0.87 = 0.435

Pr(\alpha_2\,|\,C) \cdot Pr(\beta\,|\,\alpha_2, C) = Pr(P_1 = Cooperate\,|\,C) \cdot Pr(P_2 = Defect\,|\,P_1 = Cooperate) = 0.5 \times 0.74 = 0.37

The main problem of the Växjö Model and the Quantum-Like Approach is that they can only address very small decision scenarios, and the fitting of the θ parameter has to be done manually. To compute the probability of the second player choosing to defect, Pr(P2 = Defect), one proceeds as follows:

Pr(P_2 = Defect) = 0.435 + 0.37 + 2 \cdot \sqrt{0.435} \cdot \sqrt{0.37} \cdot \cos(\theta)

To reproduce the observed final probability Pr(P2 = Defect) = 0.64, θ must be set to 1.7779. However, this method does not provide any means of finding this θ parameter other than extrapolating it from the observed data.
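The value θ = 1.7779 can be recovered by inverting Equation 5.18 directly, a simple numerical check using the Table 5.1 averages with neutral priors:

```python
import math

p1 = 0.5 * 0.87    # Pr(P1 = Defect) * Pr(P2 = Defect | P1 = Defect)
p2 = 0.5 * 0.74    # Pr(P1 = Cooperate) * Pr(P2 = Defect | P1 = Cooperate)
observed = 0.64    # observed Pr(P2 = Defect) when P1's action is unknown

# Invert: observed = p1 + p2 + 2 * sqrt(p1 * p2) * cos(theta)
cos_theta = (observed - (p1 + p2)) / (2.0 * math.sqrt(p1 * p2))
theta = math.acos(cos_theta)     # ≈ 1.7779 rad, as reported in the text
```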

5.4 The Quantum Dynamical Model

In the works of Pothos & Busemeyer [164] and Busemeyer et al. [44, 41], the authors present a model that performs quantum time evolution. This model requires the creation of a doubly stochastic matrix, which represents the rotation of the participants' beliefs. The double stochasticity is a requirement to preserve unit-length operations and to obtain probability values that do not require normalisation. The doubly stochastic matrix that the model requires can only be computed through an auxiliary Hamiltonian matrix, which needs to be self-adjoint. For instance, to explain the average results of the Prisoner's Dilemma game, the Hamiltonian matrix is given by Equation 5.20, where µD and µC correspond to parameters representing the defect and cooperate actions, respectively.

H_{Ad} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \otimes \frac{1}{\sqrt{1+\mu_D^2}} \begin{bmatrix} \mu_D & 1 \\ 1 & -\mu_D \end{bmatrix} \qquad H_{Ac} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \otimes \frac{1}{\sqrt{1+\mu_C^2}} \begin{bmatrix} \mu_C & 1 \\ 1 & -\mu_C \end{bmatrix}

H_A = H_{Ad} + H_{Ac} = \begin{bmatrix} \frac{\mu_D}{\sqrt{1+\mu_D^2}} & \frac{1}{\sqrt{1+\mu_D^2}} & 0 & 0 \\ \frac{1}{\sqrt{1+\mu_D^2}} & -\frac{\mu_D}{\sqrt{1+\mu_D^2}} & 0 & 0 \\ 0 & 0 & \frac{\mu_C}{\sqrt{1+\mu_C^2}} & \frac{1}{\sqrt{1+\mu_C^2}} \\ 0 & 0 & \frac{1}{\sqrt{1+\mu_C^2}} & -\frac{\mu_C}{\sqrt{1+\mu_C^2}} \end{bmatrix} \qquad (5.20)

The dynamical model also takes dissonance effects into account: participants might be confronted with information that conflicts with their existing beliefs when deciding on an action. Thus, the Quantum Dynamical Model makes use of a second Hamiltonian matrix, H_B.

    +1 0 +1 0 0 0 0 0          0 0 0 0 −γ 0 −1 0 +1 −γ HBd =   · √ HBc =   · √   2   2 +1 0 −1 0 0 0 0 0      0 0 0 0 0 +1 0 +1   −√γ 0 −√γ 0  2 2   0 √γ 0 −√γ   2 2  HB = HBd + HBc =   (5.21)  −γ γ   √ 0 √ 0   2 2  0 −√γ 0 −√γ 2 2

The general Hamiltonian matrix combines the matrices H_A and H_B. The final matrix needs to be self-adjoint and, being real, symmetric. To explain the average results of the Prisoner's Dilemma game, the final Hamiltonian matrix is given by:

H = H_A + H_B = \begin{bmatrix} -\frac{\gamma}{\sqrt{2}} + \frac{\mu_D}{\sqrt{1+\mu_D^2}} & \frac{1}{\sqrt{1+\mu_D^2}} & -\frac{\gamma}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{1+\mu_D^2}} & \frac{\gamma}{\sqrt{2}} - \frac{\mu_D}{\sqrt{1+\mu_D^2}} & 0 & -\frac{\gamma}{\sqrt{2}} \\ -\frac{\gamma}{\sqrt{2}} & 0 & \frac{\gamma}{\sqrt{2}} + \frac{\mu_C}{\sqrt{1+\mu_C^2}} & \frac{1}{\sqrt{1+\mu_C^2}} \\ 0 & -\frac{\gamma}{\sqrt{2}} & \frac{1}{\sqrt{1+\mu_C^2}} & -\frac{\gamma}{\sqrt{2}} - \frac{\mu_C}{\sqrt{1+\mu_C^2}} \end{bmatrix} \qquad (5.22)

Next, we need to create a unitary matrix. In quantum mechanics, unitarity restricts the allowed evolution of quantum systems, ensuring that the probabilities of all possible outcomes always sum to 1; the matrix of squared magnitudes of a unitary matrix is doubly stochastic (all of its rows and columns sum to 1). In the Quantum Dynamical Model, this matrix encodes all the state transitions that a person can experience while making a decision. The unitary matrix is computed from a differential equation, Schrödinger's equation:

\frac{d}{dt} U(t) = -i \cdot H \cdot U(t) \;\Rightarrow\; U(t) = e^{-i \cdot H \cdot t} \qquad (5.23)

The parameter t corresponds to the time evolution. Under the Dynamical Quantum Model, this parameter is set to π/2, corresponding to the average time that a participant takes to make a decision (approximately 2 seconds).

The initial belief state corresponds to a quantum state representing a superposition of the partici- pant’s beliefs.

Q_i = \frac{1}{2} \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \qquad (5.24)

By multiplying the unitary matrix with the initial superposition belief state, one can compute the transition of the participants' beliefs at each time step. The final vector Q_F represents the amplitude distribution across states after deliberation:

  1      1  1 QF = U · Qi = U ·   · (5.25)   2  1    1

Having the final state Q_F, one can compute probabilistic inferences by summing the squared magnitudes of the rows of interest in the final belief state. Note that the four dimensions of the column vector Q_F correspond to [DD, DC, CD, CC], where C corresponds to Cooperate and D to Defect. The first letter represents the action chosen by the first player, and the second letter corresponds to the action of the second player. Thus, the probability of player 2 choosing the action Defect corresponds to the sum of the squared magnitudes of the first and third dimensions of the column vector Q_F:

Pr(P_2 = Defect) = |Q_F[1]|^2 + |Q_F[3]|^2 \qquad Pr(P_2 = Cooperate) = |Q_F[2]|^2 + |Q_F[4]|^2 \qquad (5.26)

To explain the average results observed in the Prisoner's Dilemma Game, in the work of Pothos & Busemeyer [164], the authors chose the following parameters:

• µD = 0.51. This parameter corresponds to a participant choosing the defect action.

• µC = 0.51. This parameter corresponds to a participant choosing the cooperate action.

• γ = 0.6865. This parameter corresponds to the simulation of the dissonance effect.

Using the above parameters, one can estimate the average result of Table 5.1 as Pr(P2 = Defect) = 0.64. The Quantum Dynamical Model shows that quantum probability is a very general framework and can lead to many different probabilities, which depend on how one chooses to fit the free parameters. This has also been shown in the previous study of Moreira & Wichert [137]. To illustrate this concept, we fixed one of the parameters µD, µC or γ and varied the others. Figures 5.1 to 5.3 show all possible probabilities that can be obtained with the presented Quantum Dynamical Model for the Prisoner's Dilemma game.

Figure 5.1: Probabilities obtained by varying the parameters γ and µd. Figure 5.2: Probabilities obtained by varying the parameters γ and µc. Figure 5.3: Probabilities obtained by varying the parameters γ and µB.

In the Quantum Dynamical Model, the parameters used are based on a psychological setting. The incorporation of parameters to model dissonance effects and the payoffs of the players provides an approximation to the psychology of the problem that is not observed in other quantum cognitive models in the literature. However, one great disadvantage of the Quantum Dynamical Model is related to the Hamiltonian matrices. Creating a Hamiltonian manually is a very hard problem, because all possible interactions of the decision problem must be known, and the specification must be made in such a way that the resulting evolution is doubly stochastic. A recent work by Yearsley & Busemeyer [208] describes how to construct Hamiltonians for quantum models of cognition. The Hamiltonian matrix grows exponentially with the complexity of the decision problem, and the computation of a unitary operator from such matrices is a very complex process. Most of the time, approximations are used because of the complexity of the calculations involved in the matrix exponentiation operation.
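The whole pipeline of Equations 5.22 to 5.26 fits in a few lines. The sketch below is illustrative (not the authors' code): it builds the Hamiltonian, exponentiates it into the unitary U(t) with t = π/2, evolves the uniform belief state, and reads out the defect probability.

```python
import numpy as np
from scipy.linalg import expm

def final_belief_state(mu_d, mu_c, gamma, t=np.pi / 2):
    """Quantum Dynamical Model: Q_F = U(t) . Q_i with U(t) = e^{-iHt}
    (Equations 5.22-5.25); basis order is (DD, DC, CD, CC)."""
    g = gamma / np.sqrt(2.0)
    d, od = mu_d / np.sqrt(1 + mu_d**2), 1.0 / np.sqrt(1 + mu_d**2)
    c, oc = mu_c / np.sqrt(1 + mu_c**2), 1.0 / np.sqrt(1 + mu_c**2)
    H = np.array([                      # Hamiltonian of Equation 5.22
        [-g + d,     od,    -g,     0.0],
        [    od,  g - d,   0.0,     -g],
        [    -g,    0.0, g + c,     oc],
        [   0.0,     -g,    oc, -g - c],
    ])
    U = expm(-1j * H * t)               # Schrödinger evolution, Equation 5.23
    return U @ (0.5 * np.ones(4))       # Q_F, Equation 5.25

# Parameters reported by Pothos & Busemeyer [164]
q_f = final_belief_state(mu_d=0.51, mu_c=0.51, gamma=0.6865)
p_defect = abs(q_f[0])**2 + abs(q_f[2])**2   # Equation 5.26
```

Since H is real and symmetric, U is unitary, so the squared magnitudes of Q_F always sum to 1 without further normalisation.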

5.5 The Quantum Prospect Decision Theory

The Quantum Prospect Decision Theory was introduced by Yukalov & Sornette [210, 215] and developed throughout many other works [211, 212, 213, 214]. The foundations of this theory are very similar to the previously presented Quantum-Like Approach. In the Quantum-Like Approach, we start with two dichotomous observables. In the Quantum Prospect Decision Theory, these observables are referred to as intentions. An intention is defined by an intended action, and a set of intended actions is called a prospect. Each prospect can contain a set of action modes, which are concrete representations of an intention. Making a comparison with the Quantum-Like Approach, a prospect can be seen as a random variable, and the action modes are the assignments that the random variable can take. For instance, the intention to play can have two representations: play action A or play action B. Following the work of Yukalov & Sornette [215], two intentions A and B have the respective representations A = x, where x ∈ {a1, a2}, and B = y, where y ∈ {b1, b2}. The corresponding state of mind is given by:

|\psi_s(t)\rangle = \sum_{i,j} c_{ij}(t)\, |A_i B_j\rangle \qquad (5.27)

Equation 5.27 represents a linear combination of the prospect basis states. From a psychological perspective, the state of mind is a fixed vector characterizing a particular decision-maker with his/her beliefs, habits, principles, etc. That is, it describes each decision-maker as a unique subject.

The prospect states corresponding to the intentions A and B are given by Equation 5.28. The ψ symbols correspond to the quantum amplitudes associated with the prospect states. Under the Quantum Prospect Decision Theory, these amplitudes represent the weights of the intended actions while a person is still deliberating about them.

|\pi_{A=a_1}\rangle = \psi_{11}\,|A=a_1, B=b_1\rangle + \psi_{12}\,|A=a_1, B=b_2\rangle \qquad |\pi_{A=a_2}\rangle = \psi_{21}\,|A=a_2, B=b_1\rangle + \psi_{22}\,|A=a_2, B=b_2\rangle \qquad (5.28)

The probabilities of the prospects can be obtained by computing the squared magnitude of the prospect states (just as in the Quantum-Like Approach and the Quantum Dynamical Model). Consequently, the final probabilities are given by:

Pr(\pi_{A=a_1}) = Pr(A=a_1, B=b_1) + Pr(A=a_1, B=b_2) + q(\pi_{A=a_1}) = |\psi_{11}|^2 + |\psi_{12}|^2 + q(\pi_{A=a_1}) \qquad (5.29)
Pr(\pi_{A=a_2}) = Pr(A=a_2, B=b_1) + Pr(A=a_2, B=b_2) + q(\pi_{A=a_2}) = |\psi_{21}|^2 + |\psi_{22}|^2 + q(\pi_{A=a_2})

where the interference term q is defined by:

q(\pi_{A=a_1}) = 2 \cdot \varphi(\pi_{A=a_1}) \sqrt{Pr(A=a_1, B=b_1)} \cdot \sqrt{Pr(A=a_1, B=b_2)} \qquad (5.30)
q(\pi_{A=a_2}) = 2 \cdot \varphi(\pi_{A=a_2}) \sqrt{Pr(A=a_2, B=b_1)} \cdot \sqrt{Pr(A=a_2, B=b_2)}

In Equation 5.30, the symbol ϕ corresponds to the uncertainty factor and is given by Equation 5.31.

\varphi(\pi_{A=a_1}) = \cos\left(\arg\left(\psi_{11} \cdot \psi_{12}\right)\right) \qquad \varphi(\pi_{A=a_2}) = \cos\left(\arg\left(\psi_{21} \cdot \psi_{22}\right)\right) \qquad (5.31)

The interference term corresponds to the effects that emerge during the process of deliberation, that is, while a person is making a decision. These interference effects result from conflicting interests, ambiguity, emotions, etc. [215].

One can notice that the Quantum Prospect Decision Theory is very similar to the Quantum-Like Approach proposed by Khrennikov [105]. Both theories end up with the same quantum probability formula. However, the Quantum Prospect Decision Theory provides some heuristics for how to choose the uncertainty factors. This information will be addressed in the next section.

5.5.1 Choosing the Uncertainty Factor

To accommodate the violations of the Sure Thing Principle, the uncertainty factor must be set in such a way that it will enable accurate predictions. Two methods were proposed by Yukalov & Sornette [215] to estimate the uncertainty factor: the Interference Alternation method and the Interference Quarter Law.

72 • Interference Alternation - Under normalised conditions, the probabilities of the prospects p (πj) must sum to 1. This normalisation only occurs if one characterizes the interference term as an alternation such that the interference effects disappear while summing the probability of the prospects. This results in the property of the interference alternation, given by:

\sum_j q(\pi_j) = 0 \qquad (5.32)

The interference alternation property is in accordance with the findings of Epstein [71]: destructive interference effects can be associated with uncertainty aversion, which makes an action less probable under uncertainty. In contrast, the probabilities of other actions that involve less uncertainty are enhanced through constructive quantum interference effects. This uncertainty aversion happens quite frequently in situations where the Sure Thing Principle is violated. This implies that one of the prospect probabilities must be enhanced, whereas the other must be diminished.

\mathrm{sign}\left[\varphi(\pi_{A=a_1})\right] = -\mathrm{sign}\left[\varphi(\pi_{A=a_2})\right], \quad \text{where } |\varphi(\pi_{A=a_i})| \in [0, 1] \qquad (5.33)

• Interference Quarter Law - The interference terms generated by quantum probabilistic inferences have a free quantum parameter: the uncertainty factor (Equation 5.31). The Interference Quarter Law provides a quantitative estimate of this parameter. The modulus of the interference term q can be estimated by computing the expected value of a random variable ξ, with probability distribution pr(ξ), over the interval [0, 1].

\bar{q} \equiv \int_0^1 \xi \cdot pr(\xi)\, d\xi = \frac{1}{4} \qquad (5.34)

The probability distribution pr(ξ) is given by Equation 5.35 and is the average of two probability distributions.

pr(\xi) = \frac{1}{2}\left[pr_1(\xi) + pr_2(\xi)\right] = \delta(\xi) + \frac{1}{2}\Theta(1 - \xi) \qquad (5.35)

One of the probability distributions, pr1(ξ), is concentrated at the origin and is described by a Dirac function δ(ξ).

pr1 (ξ) = 2 · δ (ξ) (5.36)

The other probability distribution, pr2(ξ), is a uniform distribution on the interval [0, 1].

pr_2(\xi) = \Theta(1 - \xi), \quad \text{where } \Theta(\xi) = \begin{cases} 0, & \text{if } \xi < 0 \\ 1, & \text{if } \xi \geq 0 \end{cases} \qquad (5.37)

For a more detailed proof of the Interference Quarter Law, the reader should refer to Yukalov & Sornette [215].
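The quarter value itself is easy to verify numerically. In Equation 5.35, the Dirac term sits at ξ = 0 and contributes nothing to the expectation, so E[ξ] reduces to the integral of ξ times the uniform half-weight 1/2 over [0, 1]. The midpoint-rule check below is included only as an illustration:

```python
# E[xi] under pr(xi) = delta(xi) + (1/2) * Theta(1 - xi), Equation 5.35:
# the delta at the origin adds 0, leaving the uniform part with weight 1/2.
n = 100_000
dx = 1.0 / n
expectation = sum((i + 0.5) * dx * 0.5 * dx for i in range(n))  # midpoint rule
# expectation equals 1/4, the Interference Quarter Law of Equation 5.34
```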

5.5.2 The Quantum Prospect Decision Theory Applied to the Prisoner’s Dilemma Game

In this section, we apply the Quantum Prospect Decision Theory to predict the average results for the Prisoner's Dilemma Game reported in Table 5.1. The probability of a player defecting (or cooperating), given that one does not know the action of the other player, is given by Equation 5.38. For simplicity, we use the notation Defect (D) and Cooperate (C).

Pr(P_2 = D) = Pr(P_1 = D, P_2 = D) + Pr(P_1 = C, P_2 = D) + \text{Interference}_d \qquad (5.38)
Pr(P_2 = C) = Pr(P_1 = D, P_2 = C) + Pr(P_1 = C, P_2 = C) + \text{Interference}_c

The interference terms are given by:

\text{Interference}_d = 2 \cdot \varphi(P_2 = D) \cdot \sqrt{Pr(P_1 = D, P_2 = D) \cdot Pr(P_1 = C, P_2 = D)} \qquad (5.39)
\text{Interference}_c = 2 \cdot \varphi(P_2 = C) \cdot \sqrt{Pr(P_1 = D, P_2 = C) \cdot Pr(P_1 = C, P_2 = C)}

The uncertainty factors are given by:

\varphi(P_2 = D) = \frac{\text{Interference}_d}{2 \sqrt{Pr(P_1 = D, P_2 = D) \cdot Pr(P_1 = C, P_2 = D)}} \qquad (5.40)
\varphi(P_2 = C) = \frac{\text{Interference}_c}{2 \sqrt{Pr(P_1 = D, P_2 = C) \cdot Pr(P_1 = C, P_2 = C)}}

According to the Interference Quarter Law and to the Alternation Law, the probabilities for acting under uncertainty are given by:

Pr(P_2 = D) = Pr(P_1 = D, P_2 = D) + Pr(P_1 = C, P_2 = D) - 0.25 \qquad (5.41)
Pr(P_2 = C) = Pr(P_1 = D, P_2 = C) + Pr(P_1 = C, P_2 = C) + 0.25

For the Prisoner’s Dilemma Game,

Pr(P_1 = D, P_2 = D) = Pr(P_1 = D) \cdot Pr(P_2 = D \,|\, P_1 = D) = 0.5 \times 0.87 = 0.435

Pr(P_1 = C, P_2 = D) = Pr(P_1 = C) \cdot Pr(P_2 = D \,|\, P_1 = C) = 0.5 \times 0.74 = 0.37

Then, the final probabilities are given by:

Pr(P_2 = D) = 0.435 + 0.37 - 0.25 = 0.555 \qquad (5.42)
Pr(P_2 = C) = 0.065 + 0.13 + 0.25 = 0.445

The average probability for the Prisoner's Dilemma Game in Table 5.1 when the first player's action is unknown is 0.64. This means that, with the Interference Quarter Law together with the Interference Alternation property, the Quantum Prospect Decision Theory obtained a relative error of about 13%.
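The arithmetic behind this prediction can be written out explicitly, using the Table 5.1 averages with neutral priors (the cooperate conditionals 0.13 and 0.26 are the complements of the defect ones):

```python
# Joint probabilities from Table 5.1 with neutral (0.5) priors
p_dd = 0.5 * 0.87   # Pr(P1 = D, P2 = D)
p_cd = 0.5 * 0.74   # Pr(P1 = C, P2 = D)
p_dc = 0.5 * 0.13   # Pr(P1 = D, P2 = C)
p_cc = 0.5 * 0.26   # Pr(P1 = C, P2 = C)

# Quarter Law with Interference Alternation (Equations 5.41-5.42)
p_defect = p_dd + p_cd - 0.25       # destructive interference
p_cooperate = p_dc + p_cc + 0.25    # constructive interference

# Relative error against the observed 0.64 (~13%)
error = abs(p_defect - 0.64) / 0.64
```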

5.6 Probabilistic Graphical Models

In this section, we introduce the concepts of classical and Quantum-Like Bayesian Networks, as well as some approaches in the literature that formalised traditional Bayesian Networks into a Quantum-Like Approach.

5.6.1 Classical Bayesian Networks

A classical Bayesian Network can be defined by a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node. The graph represents independence relationships between variables, and each node is associated with a conditional probability table that specifies a distribution over the values of a node given each possible joint assignment of values of its parents [116]. The full joint distribution [168] of a Bayesian Network, where X is the list of variables, is given by:

Pr(X1, ..., Xn) = ∏_{i=1}^{n} Pr(Xi | Parents(Xi))    (5.43)

The formula for computing classical exact inferences on Bayesian Networks is based on the full joint distribution (Equation 5.43). Let e be the list of observed variables and let Y be the remaining unobserved variables in the network. For some query X, the inference is given by:

Pr(X | e) = α Pr(X, e) = α ∑_{y ∈ Y} Pr(X, e, y)    (5.44)

where α = 1 / ∑_{x ∈ X} Pr(X = x, e). The summation is over all possible y, i.e., all possible combinations of values of the unobserved variables Y. The α parameter corresponds to the normalisation factor for the distribution Pr(X | e) [168]. This normalisation factor comes from assumptions made in Bayes' rule.
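As an illustration, exact inference by enumeration (Equations 5.43 and 5.44) can be sketched in a few lines of Python for a two-node network; the CPT values are the Prisoner's Dilemma averages from Table 8.2, and the variable names and dictionary layout are our own:

```python
values = ["D", "C"]  # Defect, Cooperate
pr_p1 = {"D": 0.5, "C": 0.5}                   # prior of the parent node
pr_p2 = {("D", "D"): 0.87, ("D", "C"): 0.13,   # Pr(P2 | P1), keyed (p1, p2)
         ("C", "D"): 0.74, ("C", "C"): 0.26}

def joint(p1, p2):
    # Full joint distribution (Equation 5.43): product over the CPTs.
    return pr_p1[p1] * pr_p2[(p1, p2)]

def query_p2(evidence_p1=None):
    # Pr(P2 | e) (Equation 5.44): sum the joint over the unobserved
    # variables, then normalise by alpha.
    scores = {}
    for p2 in values:
        p1_range = [evidence_p1] if evidence_p1 is not None else values
        scores[p2] = sum(joint(p1, p2) for p1 in p1_range)
    alpha = 1.0 / sum(scores.values())
    return {p2: alpha * s for p2, s in scores.items()}

marginal = query_p2()
print(round(marginal["D"], 3))  # 0.805 when P1 is unobserved
```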

5.6.2 Classical Bayesian Networks for the Prisoner’s Dilemma Game

We represent the Prisoner's Dilemma Game as a Bayesian Network in which we assume neutral priors: there is a 50% chance of a player choosing the action Defect or Cooperate (Figure 5.4). The decision of the first participant is then followed by the decision of the second participant. The probability distribution of the second player is obtained (or learned) from the experimental data for the averaged results in Table 8.2 when the actions of the first player are observed. Using this data, the goal is to determine the probability of the second player choosing to defect given that it is not known what action the first player chose.

Figure 5.4: Bayesian Network representation of the Average of the results reported in the literature (last row of Table 8.2). The random variables that were considered are P1 and P2, corresponding to the actions chosen by the first participant and second participant, respectively.

To compute the probability Pr(P2 = Defect), two operations are required: the computation of the full joint probability distribution (Equation 5.43) and the computation of the marginal probability. The full joint probability distribution can be easily computed by multiplying all possible assignments of the network with each other. Table 5.2 shows the computation of these probabilities.

P1         P2          Pr(P1, P2)
Defect     Defect      0.5 × 0.87 = 0.4350
Defect     Cooperate   0.5 × 0.13 = 0.0650
Cooperate  Defect      0.5 × 0.74 = 0.3700
Cooperate  Cooperate   0.5 × 0.26 = 0.1300

Table 5.2: Classical full joint probability distribution representation of the Bayesian Network in Figure 5.4.

The marginalisation formula is used when we want to perform queries to the network. For instance, in the Prisoner’s Dilemma Game, we want to know what the probability is of the second player choosing to defect given that we do not know what the other player has chosen, P r(P2 = Defect). This is obtained by summing the entries of the full joint probability (Table 5.2) that have P2 = Defect. That is, we sum up the first and third rows of this table. Equation 5.45 shows this operation. For simplicity, we have used the following notation: D = Defect and C = Cooperate.

Pr(P2 = D) = Pr(P1 = D) · Pr(P2 = D | P1 = D) + Pr(P1 = C) · Pr(P2 = D | P1 = C) = 0.4350 + 0.3700 = 0.8050    (5.45)

In Equation 5.45, one can see that the classical Bayesian Network was not able to predict the observed results in Table 8.2 using classical inference. One might think that, if we parameterise the Bayesian Network to take into account the player's actions and dissonance effects, there could be a possibility of obtaining the required results. This line of thought is legitimate, but one must take into account that, in the end, the probabilistic inferences computed through the Bayesian Network must obey set theory and the law of total probability. This means that, even if we parameterise the network, we cannot find any closed-form optimisation that could lead to the desired results. This happened with the previous example of the Markov Model in Section 5.2. Although we parameterised the player's actions and dissonance effects, we could not arrive at the desired results, because they go against the laws of probability theory, and Markov Models (as well as Bayesian Networks) must obey these laws.

5.6.3 Quantum-Like Bayesian Networks in the Literature

There are two main works in the literature that have contributed to the development and understanding of Quantum Bayesian Networks. One belongs to Tucci [191] and the other to Leifer & Poulin [124]. In the work of Tucci [191], it is argued that any classical Bayesian Network can be extended to a quantum one by replacing real probabilities with complex quantum amplitudes. This means that the factorisation should be performed in the same manner as in a classical Bayesian Network. Thus, the Bayesian Network of Figure 5.4 could be represented by a Quantum Bayesian Network with the following matrices replacing the conditional probability tables:

P1 = [ a·e^(iθ1)    √(1 − |a·e^(iθ1)|²)·e^(iθ2) ]

P2 = [ b·e^(iθ3)    √(1 − |b·e^(iθ3)|²)·e^(iθ4) ]
     [ c·e^(iθ5)    √(1 − |c·e^(iθ5)|²)·e^(iθ6) ]
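Each of these entries squares, via the Born rule, to the corresponding classical probability regardless of the phase θ; a short Python sketch (our own illustration, not code from Tucci [191]) makes this phase freedom explicit, together with the point that phases only start to matter once amplitudes are summed before squaring:

```python
import cmath
import math

def amplitude(p, theta):
    # Tucci-style mapping: classical probability -> complex amplitude.
    return math.sqrt(p) * cmath.exp(1j * theta)

p = 0.87
for theta in (0.0, 0.5, math.pi):
    # Born rule: |amplitude|^2 recovers p for any choice of theta.
    print(round(abs(amplitude(p, theta)) ** 2, 10))  # 0.87 each time

# Phases matter once amplitudes are *added* before squaring:
a = amplitude(0.435, 0.0) + amplitude(0.370, 0.0)       # constructive
b = amplitude(0.435, 0.0) + amplitude(0.370, math.pi)   # destructive
print(round(abs(a) ** 2, 3), round(abs(b) ** 2, 3))
```

This is why one classical Bayesian Network corresponds to infinitely many quantum ones: every assignment of phases reproduces the same classical conditional probability tables.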

One significant problem with Tucci's work is related to the non-existence of any method to set the phase parameters e^(iθ). The author states that one could have infinitely many Quantum Bayesian Networks representing the same classical Bayesian Network, depending on the values that one chooses for the parameters. This requires that one knows a priori which parameters would lead to the desired solution for each node queried in the network (which we never know). Thus, for these experiments, Tucci's model cannot predict the observed results, because one does not have any information about the quantum parameters. In the work of Leifer & Poulin [124], the authors argue that, to develop a Quantum Bayesian Network, quantum versions of probability distributions, quantum marginal probabilities and quantum conditional probabilities are required (Table 5.3). The authors performed a preliminary study of these concepts. Generally speaking, a quantum probability distribution corresponds to a density matrix contained in a Hilbert space, with the constraint that the trace of this matrix must be 1. In quantum probability theory, a full joint distribution is given by a density matrix, ρ. This matrix provides the probability distribution of all states that a Bayesian Network can have. The marginalisation operation corresponds to a quantum partial trace [152, 167].

                                   Classical Probability                 Quantum Probability
State                              Pr(A)                                 |e^(iθ) ψ_A|²
Joint Probability Distribution     Pr(A, B)                              ρ_AB
Marginal Probability Distribution  Pr(B) = ∑_A Pr(A, B)                  ρ_B = Tr_A(ρ_AB)
Conditional State                  Pr(B|A), with ∑_{b∈B} Pr(b|A) = 1     ρ_{B|A}, with Tr(ρ_{B|A}) = I_A

Table 5.3: Relation between classical and quantum probabilities used in the work of Leifer & Poulin [124].

In the end, the models of Tucci [191] and Leifer & Poulin [124] fail to provide any advantage relative to the classical models, because they cannot take into account interference effects between random variables. Thus, they provide no advantages in modelling decision-making problems that try to predict decisions that violate the laws of total probability. There are other works in the literature that also discuss joint probability distributions in the context of quantum cognition and violations of the Sure Thing Principle, but more focused on the brain as a classical system that can be described by the mathematical principles of quantum theory [63]. For instance, de Barros [58] suggests that these violations can occur because of contextuality. That is, when one cannot find a full joint probability distribution that is consistent with the experimentally observed marginal probability distributions, then some interferences might have occurred that changed the values of the random variables [57]. Based on these ideas, a neural oscillator model was proposed that was able to explain violations of the Sure Thing Principle using the interferences between neurons as a way to produce quantum-like interference effects during decision-making [59].

5.7 Discussion of the Presented Models

The purpose of this section is to present a discussion of, and a comparison between, the existing quantum models in terms of the proposed evaluation elements: interference terms, parameter tuning and scalability. The discussion will be mainly focused on the set of parameters that the current quantum cognitive models have that need to be manually tuned to match the desired predictions. For instance, the Quantum Dynamical Model requires three parameters for such small decision scenarios, whereas the Quantum-Like Approach only needs one, and the Quantum Prospect Decision Theory does not need any parameters, because it has a static heuristic to replace the interference term. In the end, we will see that the problems that we note for the quantum models are similar to those of many other classical cognitive models.

5.7.1 Discussion in Terms of Interference, Parameter Tuning and Scalability

In this section, we analyze the presented works in the literature regarding three different elements: interference effects, parameter tuning, and scalability.

• Interference Effects. Many works in the literature state that, through quantum interference effects, one could simulate the paradoxical decisions found across many experiments in the literature. Without interference effects, quantum probability converges to its classical counterpart. This metric examines whether the analysed model makes use of any type of quantum interference to explain human decision-making.

• Parameter Tuning. The problem of applying quantum formalisms to cognition is concerned with the number of quantum parameters that one needs to find. These parameters grow exponentially with the complexity of the decision problem, and thus far, very few works in the literature have suggested ways to automatically find these parameters to make accurate predictions.

• Scalability. Most problems of the current models of the literature are concerned with their inability to scale to more complex decision scenarios. Most of these models are built to explain very small paradoxical findings (for example, the Prisoner’s Dilemma Game and the Two-Stage Gambling Game). Therefore, this metric consists of analysing the presented models with respect to their ability to extend and generalise to more complex scenarios.

Table 5.4 presents a summary of the evaluation of the models presented in this work with respect to the three elements described above.

Approach                                         Model                             Interference Effects  Parameter Tuning  Parameter Growth          Comments
Bayesian / Classical Theory                      Bayesian / Markov Networks        No                    Manual            N_actions^N_person        Number of parameters varies for different models
Khrennikov [105]                                 Quantum-Like Approach             Yes                   Manual            N_actions^N_person        Grows exponentially large
Pothos & Busemeyer [164], Busemeyer et al. [41]  Quantum Dynamical Model           Yes                   Manual            N_actions                 Hamiltonian size exponential: N_actions^N_person
Yukalov & Sornette [215]                         Quantum Prospect Decision Theory  Yes                   Automatic         N_actions^N_person        Static heuristic: Quarter Law of Interference
Moreira & Wichert [137, 140, 139, 141]           Quantum-Like Bayesian Networks    Yes                   Automatic         N_actions^N_person        Dynamic heuristic

Table 5.4: Comparison of the different models proposed in the literature.

Starting the discussion with the classical models presented in Sections 5.2 and 5.6.1, the probabilistic inferences computed through Bayesian / Markov Networks must obey set theory and the law of total probability. This means that, even if we parameterise the networks, we cannot find any closed-form optimisation that could lead to the desired results. These networks can be modelled with no parameters (just as was presented in Sections 5.2 and 5.6.1), or they can be parameterised. This parameterisation can end up with the same size as the full joint probability distribution of the networks. Although these models do not make use of any quantum interference effects, and consequently cannot accommodate violations of the Sure Thing Principle, it is worth noting that one can always classically explain behavioural results through appropriate conditionalisations and extensions of classical probabilistic models [34].

The Quantum-Like Approach of Khrennikov [105] is based on the direct mapping of classical probabilities to quantum probability amplitudes through Born's rule. This means that one can perform inferences for more complex decision-making scenarios by using the quantum counterpart of the classical marginal probability formula. Thus, the model generates quantum interference effects. The main problem of the Quantum-Like Approach concerns the quantum parameters. The current works in the literature do not provide any means to assign values to these quantum parameters. They have to be manually set to explain the observed outcome. Thus, the Quantum-Like Approach, although it can be (mathematically) extended to more complex decision scenarios, does not provide any means to assign quantum parameters. It is an explicative model rather than a predictive one.

The Quantum Dynamical Model proposed by Busemeyer et al. [41] and Pothos & Busemeyer [164] incorporates quantum interference effects not from the quantum law of probability but through the usage of unitary operators and Hamiltonians. One of the main disadvantages of this model concerns the definition of the Hamiltonian matrices. Creating a Hamiltonian manually is a very hard problem: it is required that all possible interactions of the decision problem are known, and this specification must be made in such a way that the matrix is doubly stochastic. A recent work by Yearsley & Busemeyer [208] describes how to construct Hamiltonians for quantum models of cognition. The unitary matrix also grows exponentially with the complexity of the decision problem, and the computation of a unitary operator from such matrices is a very complex process. Most of the time, approximations are used because of the complexity of the calculations involved in the matrix exponentiation operation. Just as in the Quantum-Like Approach, one needs to manually set the quantum parameters so that the final model gives the observed outcome. It is important to note that, in the Quantum Dynamical Model, the parameters used are based on a psychological setting. The incorporation of parameters to model dissonance effects and the payoffs of the players provides an approximation to the psychology of the problem that is not observed in other quantum cognitive models in the literature.

Finally, the Quantum Prospect Decision Theory proposed by Yukalov & Sornette [215] also incorporates quantum interference effects from the quantum law of total probability. This model is very similar, from a mathematical point of view, to the Quantum-Like Approach, with the difference that it proposes laws to compute the quantum interference parameters: the alternation and the quantum quarter laws. Although the model is very precise for very small decision problems (such as the Prisoner's Dilemma), it is not clear how the quantum quarter law and the alternation law would work for more complex problems. For this reason, the Quantum Prospect Decision Theory is a model that enables the usage of quantum interference terms to make predictions in paradoxical scenarios and also provides an automatic mechanism to set the quantum parameters in very small scenarios, using a static interference term (± 0.25). That is, the interference term is always the same, even for different contextual problems. For this reason, the model is not able to generalise well to more complex decision scenarios.

Regarding Bayesian Networks, it is hard to apply the model proposed in the work of Tucci [191] to paradoxical findings that violate the Sure Thing Principle, because the author makes no mention of how to set these parameters. He even argues that a classical Bayesian Network can be represented by an infinite number of quantum Bayesian Networks, depending on how one tunes the quantum parameters. Because the model is a Bayesian Network, one is able to perform inferences for any scenario by using the quantum counterpart of the classical marginal probability formula. Thus, in the end, the quantum Bayesian Network proposed by Tucci [191] is scalable and takes into account quantum interference effects; however, it does not give any insights into how to set the quantum parameters that result from the interference.

In the work of Leifer & Poulin [124], the authors create a direct mapping from classical probability to quantum theory. Because they made a quantum Bayesian Network, this model enables probabilistic inference and, consequently, can be generalised to any number of random variables through the use of the quantum counterpart of the marginal probability formula. By making the direct mapping from classical to quantum probabilities, the full joint probability distribution is mapped into a density matrix. This means that the interference terms are cancelled. The authors also take into account the order in which the operations are performed. Because the commutativity axiom is not valid in quantum mechanics, we obtain different outcomes if the calculations are performed in a different order. Thus, the quantum Bayesian Network proposed by Leifer & Poulin [124] is scalable and takes into account quantum interference effects; however, by making a direct mapping from classical to quantum, these interference effects cancel, because the network collapses into its classical counterpart. Thus, in the end, this model does not take advantage of quantum interferences to explain paradoxical decision scenarios.

In the work of Moreira & Wichert [137], the authors also make a direct mapping from classical theory to quantum probability by replacing classical real probability values with complex quantum probability amplitudes using Born's rule. They also applied the same mechanism to derive a quantum-like full joint probability distribution formula and a quantum-like marginal probability distribution for exact inference. In the end, the model is very similar to the Quantum-Like Approach, and it can be modelled for more complex decision-making scenarios very easily due to its graphical structure. Because this model uses quantum probability amplitudes, quantum interference effects arise from the quantum-like exact inference formula. However, these parameters grow exponentially large when the levels of uncertainty are high, that is, when there are many unobserved nodes in the network. Although the authors have proposed some dynamic heuristics to address this problem in recent works (Moreira & Wichert [140, 139, 141]), one needs to take into account that a heuristic can lead to the expected outcome, but it can also lead to completely inaccurate results. Note that we are aware that the problems that we note in this discussion section about the quantum models are the same in many cognitive science models.
However, we are not claiming that it is difficult to find the parameters for a game such as the Prisoner's Dilemma. What we are claiming is that the several models analysed in this work (Quantum-Like Approach, Quantum Dynamical Model, Quantum-Like Bayesian Networks) contain a set of parameters that need to be manually tuned to match the desired predictions. For instance, the Quantum Dynamical Model requires three parameters for such a small decision scenario, whereas the Quantum-Like Approach only needs one, and the Quantum Prospect Decision Theory does not need any parameters, because it has a static heuristic to replace the interference term. The purpose of this discussion section is simply to compare the existing quantum models in terms of the evaluation elements specified in Table 5.4.

5.7.2 Discussion in Terms of Parameter Growth

All models analysed in this work present different growth rates in terms of parameters. For instance, the Dynamical Model parameterises the player's actions plus an additional parameter to model cognitive dissonance effects. Thus, the number of parameters would be static if we consider the N-Person Prisoner's Dilemma Game, that is, if instead of having only 2 players, the game is extended to N players. In the case of the Quantum-Like Approach, we would have 2^N parameters for the N-Person Prisoner's Dilemma Game. The number 2 comes from the fact that each player has two actions (either defect or cooperate). The same applies to the Classical Networks, the Quantum-Like Bayesian Networks and the Quantum Prospect Decision Theory Model. However, because the authors of this last model presented the Quantum Quarter Law of Interference as a static heuristic, this model does not require any parameters.

At this point, the reader might be thinking that the Quantum Dynamical Model provides great advantages over the existing models, because the number of parameters required corresponds to the player's actions plus an additional cognitive dissonance parameter. Although this line of thought is correct, one should also take into account how the model unfolds. Although the number of parameters does not grow exponentially large as in the Quantum-Like Approach, the size of the Hamiltonian does. In fact, it grows exponentially large, with size N_actions^N_players × N_actions^N_players, where N_actions represents the number of actions of the players and N_players corresponds to the number of players.

We conclude this section by clarifying that most of the quantum cognitive models proposed in the literature have been directed towards small decision scenarios because of the scarcity of datasets representing complex decision scenarios and violations of the Sure Thing Principle. Consequently, the models proposed are simply overfitting simple decision scenarios. Moreover, we believe that violations of the Sure Thing Principle tend to diminish with the complexity of the decision scenario. Imagine, for instance, a Three-Stage Gambling game. It will be very hard to find significant data showing a player wishing to play the last gamble given that he has lost the two previous gambles. More experimental data and more studies are needed for more complex decision scenarios to test the viability of quantum models against their classical counterparts.
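These growth rates can be made concrete with a small Python sketch (the function names are ours):

```python
def quantum_like_parameters(n_players, n_actions=2):
    # Quantum-Like Approach / Quantum-Like Bayesian Networks: one quantum
    # parameter per entry of the full joint distribution.
    return n_actions ** n_players

def hamiltonian_shape(n_players, n_actions=2):
    # Quantum Dynamical Model: few free parameters, but the Hamiltonian is
    # a square matrix over the full state space.
    dim = n_actions ** n_players
    return (dim, dim)

for n in (2, 3, 5, 10):
    print(n, quantum_like_parameters(n), hamiltonian_shape(n))
# For the 10-person game the Hamiltonian is already a 1024 x 1024 matrix.
```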

5.8 The Quantum-Like Approach Over the Literature

Since its development, the Quantum-Like Approach has been rapidly adopted by several researchers in order to test violations of probability theory in many different subjects outside the domain of physics [96]. One of the pioneering experiments is given by Conte et al. [53], in which the authors performed an experiment in order to analyse quantum-like behaviour in perception. The experiment consisted of using ambiguous figures (in the style of Gestalt images) in order to show that the probabilities computed through the law of total probability diverged from the same probabilities obtained directly from the experiments. Given these results, one could use the Quantum-Like Approach to compute the probabilities and recover the observed probability values. A similar experiment is given by the works of Conte et al. [54], Conte [50] and Conte et al. [51], but to test whether the mathematical principles of quantum mechanics can be used to analyse the nature of cognitive entities during the perception of ambiguous figures. This would suggest quantum-like behaviour of cognitive entities. Moreover, in Conte et al. [52], the authors showed that cognitive data can not only exhibit interference effects, but also violate Bell's inequality. Also concerning Bell's inequalities, Khrennikov [102] proposed an analogue of Bell's inequalities for conditional probabilities that could be applied to economic agents. That is, when the proposed inequality is violated, there is strong support that a financial market evolves in a quantum fashion rather than in a classical way. The formalisms of quantum mechanics are so general that they can be applied in other disciplines that try to model biological processes. For instance, in the work of Basieva et al. [24] the authors suggest that a cell can exhibit quantum-like behaviour with respect to the levels of lactose and glucose in E. coli. These effects were observed to suffer from a highly destructive interference, which trigonometric quantum spaces were not sufficient to model. For this reason, the authors turned to hyperbolic quantum-like interference effects in order to simulate the observed values of lactose and glucose. The interference coefficient (and, consequently, the quantum parameters) was found by looking at the experimental data. Another study that suggests the application of quantum theory to model the cell dynamics of the process of diauxie in E. coli was proposed by Asano et al. [17, 16, 18]. In microbiology, diauxie consists of a batch culture of microorganisms that exhibits two growth phases, one on glucose followed by one on a less common sugar (such as lactose). The authors propose a non-unitary decision-making process, since the usage of quantum unitary transformations is too restrictive, in the sense that information processing in cognitive systems (as well as cell-based systems) cannot be described using unitary transformations [24]. In the end, this model makes use of non-unitary transition matrices, and the quantum interference effects appear in the same way as proposed in the Quantum-Like Approach. More recently, the biological works mentioned have been put together in a book by Asano et al. [20], with more insights concerning the application of quantum theory to model other biological phenomena.

5.9 The Quantum Dynamical Model Over the Literature

Quantum dynamical models have also been used in the domain of economics and finance. These models had their motivation in the reflexivity theory of Soros [180]. This theory states that an economic agent consists of two main functions: (i) the cognitive function (which understands reality) and (ii) the manipulative function (which makes an impact on a scenario). According to the reflexivity theory, if both these functions work independently, then one can obtain precise results. If, however, these functions work simultaneously, they can produce reflexes, that is, they produce interferences. These interferences could be represented by a quantum-like theory [82]. The book [95, Chapter 11] presents a brief explanation of how one could use a Bohmian approach to model price dynamics in the scope of financial processes. It is explained that a Hamiltonian formalism can be used to describe the evolution of the hard conditions (industrial production, services, etc.) of prices in financial markets. On the other hand, the evolution of soft conditions, related to the psychology of the players, can be modelled using a Bohmian pilot wave [107]. A similar work was proposed by Choustova [49] in order to describe some degree of predictability in the variation of asset prices. Also, in the field of biology, Asano et al. [19] explored the implications of a quantum-like dynamical process to model a formal description of the dynamics of epigenetic states of cells interacting with an environment.

Regarding the application of quantum-like dynamical models to cognition, in the work of Busemeyer et al. [41] the authors also analysed the formalism of quantum mechanics in order to describe the evolution of the cognitive process from the presentation of a decision problem to the actual decision. They performed an empirical experiment based on interactions between categorisation and decision making. This experiment served as an empirical test to compare classical Markov models with quantum models. The experiment showed a violation of the laws of probability theory when comparing the probability of choosing a decision with the probability of making a categorisation followed by a decision. The authors were able to accommodate such violations by applying the quantum dynamical model in the same way as described in Section 5.4. In a later work, Wang & Busemeyer [204]

5.10 A Model of Neural Oscillators for Quantum Cognition and Negative Probabilities

In de Barros & Suppes [63], the authors argue that contextuality plays an important role in explaining the violations of classical probability in several paradoxical decision scenarios. Of the three core features of quantum mechanics (non-determinism, non-locality and contextuality), they argue that only contextuality makes sense to map to quantum cognitive models, and that there are classical systems that make use of contextuality without the need for other underlying principles of quantum mechanics [61]. These classical models are based on neural oscillators and negative probabilities [60]. Neural oscillators consist of repetitive neural activity in the nervous system. This activity is represented by waves and is therefore characterised by its phase, frequency and amplitude. Investigations of neural oscillators suggest that cognitive processes are the result of the activity of a large ensemble of synchronised neurons [75]. Some works have used the synchronisation between neurons as models for pattern recognition [200]. In de Barros & Suppes [63], the authors propose the usage of neural oscillators as a classical example that makes use of the property of contextuality to model, in a mathematical framework, the behavioural stimulus-response (SR) theory. The general idea is to synchronise the firing of a collection of neurons, which represent stimuli and responses. When a stimulus oscillator fires, the corresponding response oscillators also fire, activating the collection of neurons. An example of how to apply this model to accommodate the paradoxical findings reported for the Two-Stage Gambling game can be found in the work of de Barros [59].

Consider three oscillators: one for the stimulus (Os) and two for the responses (Or1, Or2). Since neural oscillators have a periodic wave behaviour, the simplest mathematical function proposed to describe them is the cosine function, as illustrated in Equations 5.46 to 5.48. The phases of the waves describing the stimulus oscillator and the response oscillators are given by ψs(t), φr1(t) and φr2(t), respectively. The variable A corresponds to the amplitude of the wave and is assumed to be the same for the three oscillators.

Os(t) = A cos(ψs(t))    (5.46)

Or1(t) = A cos(φr1(t))    (5.47)

Or2(t) = A cos(φr2(t))    (5.48)
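A minimal Python sketch (our own illustration) shows how superposing two such cosine oscillators produces constructive or destructive interference depending on their relative phase, via the cross term of the squared sum:

```python
import math

A = 1.0  # common amplitude of the three oscillators

def superposed_intensity(phi1, phi2):
    # Squared sum of the two response oscillators (Equations 5.47-5.48);
    # the cross term 2 * A * A * cos(phi1) * cos(phi2) is the
    # interference part that a single active oscillator lacks.
    o1 = A * math.cos(phi1)
    o2 = A * math.cos(phi2)
    return (o1 + o2) ** 2

print(superposed_intensity(0.0, 0.0))      # in phase: constructive, 4*A**2
print(superposed_intensity(0.0, math.pi))  # opposite phase: destructive, 0
```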

Since the oscillators have a wave behaviour, they will also exhibit superposition and, consequently, interference. Interference effects in oscillators are very similar to the effects found in the double slit experiment [61]. If only one oscillator is active, the response probabilities collapse to classical theory. If, however, both oscillators are active, then they produce interference effects, which disturb the classical probability outcome. This means that neural oscillators produce quantum-like effects, and the specification of negative probabilities is required to give rise to interference effects [59]. Negative probabilities were introduced by Dirac [68], who justified the concept of negative probabilities in the same way as one applies the concept of negative money. Moreover, the entries of a joint probability distribution can have negative values as long as we do not perform any observation. From the moment of observation, these probabilities become positive (de Barros & Oas [62]). In other words, one can interpret negative probabilities as a way to specify that there is no possible joint probability distribution consistent with some given marginal probability distributions. In de Barros [58], the author provides an example of such a situation. Let X, Y and Z be binary random variables taking the discrete values −1 and 1, with E(X) = E(Y) = E(Z) = 0, where E(·) is the expected value. Also, let the pairwise expectations E(XY), E(XZ) and E(YZ) be known. Then, according to Suppes & Zanotti [183], one can specify a full joint probability distribution if and only if:

− 1 ≤ E(XY ) + E(XZ) + E(YZ) ≤ 1 + 2min{E(XY ),E(XZ),E(YZ)} (5.49)

In the scenario where we measure X = 1, since X is anti-correlated with Y, in order to have the expected value E(XY) = −1 one needs to set Y = −1. In the same way, since Y is anti-correlated with Z, it follows that Z = 1. And, given that Z is anti-correlated with X, in order to have expected value E(XZ) = −1 one would need to set X = −1, which leads to a contradiction. So there is no joint probability distribution for the three variables that would satisfy all the marginals. One can model this problem using neural oscillators by correlating the stimulus and response oscillators as proposed in the works of de Barros & Oas [62] and de Barros [58, 59], by parameterising the stimulus and response oscillators defined in Equations 5.46 to 5.48. To model violations of the Sure Thing Principle under the Prisoner's Dilemma game, one would model the oscillators to represent the dynamics of the observed conditions (where it is known that the first participant defected/cooperated) by activating one of the stimulus oscillators. This would lead to the selection of the random variable Defect or Cooperate. To model the unobserved conditions (not knowing the actions of the first participant), both oscillators corresponding to both random variables would be activated, producing interference effects. The parameters to fit the model would then need to be found by running several computer-simulated trials.
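This contradiction can be checked mechanically. The sketch below (a hypothetical illustration, not code from the cited works) enumerates the eight joint outcomes and applies the Suppes & Zanotti condition of Equation 5.49 with E(XY) = E(XZ) = E(YZ) = −1:

```python
from itertools import product

# Pairwise expectations of -1 force the support of any joint distribution
# onto outcomes where every pairwise product equals -1.
support = [(x, y, z) for x, y, z in product([-1, 1], repeat=3)
           if x * y == -1 and x * z == -1 and y * z == -1]
print(support)  # [] -> no joint distribution can realise these marginals

# The Suppes-Zanotti condition (Equation 5.49) detects the same impossibility:
e_xy = e_xz = e_yz = -1
satisfiable = -1 <= e_xy + e_xz + e_yz <= 1 + 2 * min(e_xy, e_xz, e_yz)
print(satisfiable)  # False
```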

5.11 A Quantum-Like Agent-Based Model

In the works of Kitto et al. [115] and Kitto & Boschetti [113, 114], the quantum principles of physics are extended to model agents' attitudes. By attitude, we mean actions that need to be taken under decision

scenarios. Their model is motivated by the fact that people's attitudes are not static: they change according to different contexts and evolve through time. The constant context updates and the different attitudes towards different contexts pose difficult challenges to classical models [112]. In Kitto et al. [115] and Kitto & Boschetti [113], the authors model an agent A under some context p by a superposition of states, where |0p⟩ corresponds to the decision Not Act and |1p⟩ corresponds to the decision Act:

|A⟩ = α0 |0p⟩ + α1 |1p⟩,  where |α0|² + |α1|² = 1

In order to perform a measurement, in other words, to make a decision, a projection operator V is required:

V = |0p⟩⟨0p| + |1p⟩⟨1p| = V0 + V1

Thus, the probability of an agent A choosing to Act under a decision scenario can be computed by the squared magnitude of the probability amplitude associated with the state Act, that is, |α1|². Moreover, in this model, the authors assume that people tend to choose a state which reduces the total uncertainty levels. This statement is supported by Epstein [71]: towards uncertainty, human beings tend to have aversion preferences. They prefer to choose an action that brings them a certain but lower propensity/utility instead of an action that is uncertain but can yield a higher propensity/utility. The Binary Entropy formula is used as a measure of uncertainty:

Hb(p) = −p log2 p − (1 − p) log2 (1 − p)    (5.50)

Equation 5.50 can be rewritten as:

Hb(p) = −|α1|² log2 |α1|² − |α0|² log2 |α0|²
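The amplitude form of the entropy can be sketched as follows (the amplitude values below are illustrative choices for the example, not taken from the cited works):

```python
import math

def binary_entropy(p):
    """H_b(p) = -p*log2(p) - (1 - p)*log2(1 - p), with the convention 0*log2(0) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# For an agent |A> = a0|0_p> + a1|1_p>, the probability of Act is p = |a1|^2.
a1 = 1 / math.sqrt(2)                          # maximal superposition
print(round(binary_entropy(abs(a1) ** 2), 6))  # 1.0 -> maximal uncertainty

a1 = math.sqrt(0.1)                            # agent strongly favours Not Act
print(round(binary_entropy(abs(a1) ** 2), 3))  # 0.469 -> lower uncertainty
```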

An agent acts according to a local and individual state and also according to a global state reflecting the attitudes and norms of a society of agents. In this case, the entropy formula is extended in order to weight each of these states. An agent can either favour an internal state, with weight wi, or a global and social state, with weight ws:

H(|A⟩, θ) = wi Hb(p(θ)) + ws Hb(p(Θ))    (5.51)

The total entropy of a society of agents and its evolution through time is given by the summation of the binary entropy of all agents that belong to a society:

H_Total = Σ_{i=1}^{N} H(|i⟩, θi, Θi) = Σ_{i=1}^{N} [wi(i) Hb(P(|i⟩, θ)) + ws(i) Hb(P(|i⟩, Θ))]    (5.52)

5.12 Summary and Final Discussion

Recent work in cognitive psychology has revealed that quantum probability theory provides another method of computing probabilities without falling into the restrictions that classical probability has in

modelling cognitive systems of decision-making. Quantum probability theory can also be seen as a generalisation of classical probability theory, because it includes the classical probabilities as a special case (when the interference term is zero).

Quantum probability has the particularity of enabling the representation of events in a geometric structure. The main advantage of this geometrical representation is the ability to rotate from one basis to another to contextualise and interpret events. This ability does not exist in classical probability theory and provides great flexibility for decision-making systems. Consequently, quantum probability can be more expressive than its classical counterpart. Under quantum theory, paradoxical findings such as violations of the Sure Thing Principle can simply be seen as consequences of the geometric flexibility that quantum probability theory offers.

We have collected a set of models from the literature that attempt to tackle violations of the Sure Thing Principle in a quantum fashion, and we compared the collected models. To illustrate this comparison, we provided a mathematical description of each model and of how it could be applied in a decision scenario. We compared the models in terms of three proposed elements: the number of parameters involved in the model, the scalability and the usage of quantum interference effects. We also performed a more detailed study concerning the growth of the quantum parameters when the complexity and the levels of uncertainty of the decision scenario increase, and we extended this comparison to classical models, namely a Markov Model and a Bayesian Network. The main statement of this work is not that quantum models are preferable to classical models. With this work, we have concluded that purely classical models suffer from the same exponential parameter growth as quantum models, with the added difficulty that they are not capable of simulating results that violate the Sure Thing Principle. It is worth noting that one can always classically explain behavioural results through appropriate conditionalisations of the classical law of total probability. In the end, classical models are constrained to obey set theory and the laws of probability theory, so there is no closed optimisation form that could lead to the paradoxical results found in the experiments violating the Sure Thing Principle.

The models proposed in the literature only work for very small decision problems. Most of them do not provide any means to tune the quantum parameters that they require. They can only explain the paradoxical violations reported in the literature and are not able to predict the decisions of the players. One should also note that it is very difficult to validate these types of models, especially when the complexity of the decision problem increases. Thus far, the literature contains almost no demonstrations of violations of the Sure Thing Principle for more complex decision scenarios. More studies are needed in this direction to validate the viability of quantum models.

This work provides a technical overview of the quantum models proposed in the literature and a discussion of many key aspects of the original studies. With the proposed evaluation elements, we were able to discuss many key aspects that have been ignored in the literature, namely how the quantum interference terms affect the complexity of the decision problems. Most of the quantum cognitive models proposed in the literature cannot be considered predictive because they cannot predict the results observed in several experiments of the literature without manually fitting the quantum parameters. Instead,

these models have an explanatory nature, because their primary goal is to accommodate the violations of the Sure Thing Principle. The discussions addressed make this work a complement to the study of the original works.

Chapter 6

Quantum-Like Bayesian Networks for Cognition and Decision

The present chapter presents a new model to make predictions in paradoxical situations where the Sure Thing Principle is being violated. In the previous chapter, it was shown that although many models have been proposed in the literature, most of them cannot be considered predictive, since, so far, the only way these models can fit their parameters is to use a priori knowledge of the outcome of the decision scenario to set the parameters that explain that outcome. Moreover, it was shown that these models present severe difficulties when scaling to more complex decision scenarios: either the number of parameters grows exponentially large [93, 96, 101] or computational constraints arise in the computation of very large unitary operators [164, 44, 41]. In order to overcome these difficulties, this chapter presents the first contribution of this thesis. We present a quantum-like Bayesian Network formalism, which consists of replacing classical probabilities by quantum probability amplitudes and which can easily scale to more complex decision scenarios due to its network structure. However, this approach also suffers from the problem of exponential growth of quantum parameters. The study of the impact of the quantum interference terms is the main focus of this work, as well as the derivation of heuristic approaches that can dynamically set these parameters without the need for a priori knowledge of the outcome of the decision scenario. In this chapter, we will present the quantum-like Bayesian Network approach proposed in Moreira & Wichert [137, 141] and make an introductory study of the impact of the quantum interference terms in more complex decision scenarios. In the next chapters, we will present different heuristics to compute these parameters.

6.1 Classical Bayesian Networks

Recalling Chapter 3, a classical Bayesian Network can be defined by a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node. The graph represents independence relationships between variables, and each node is associated with a conditional probability table which

specifies a distribution over the values of a node given each possible joint assignment of values of its parents. This idea of a node depending directly on its parent nodes is the core of Bayesian Networks. Once the values of the parents are known, no information relating directly or indirectly to its parents or other ancestors can influence the beliefs about it [116]. Figure 6.1 corresponds to the classical Bayesian Network representation of the Two Stage Gambling problem presented in Section 4.4.1.

Figure 6.1: Classical Bayesian Network representation of the average results reported over the literature for the Two Stage Gambling Game (Section 4.4.1, Table 4.5).

6.1.1 Classical Conditional Independence

Associated with Bayesian Networks there is always the concept of conditional independence. Two random variables X and Y are conditionally independent given a third random variable Z if and only if they are independent in their conditional probability distribution given Z. In other words, X and Y are conditionally independent given Z, written (X ⊥ Y | Z), if and only if, given any value of Z, the probability distribution of X is the same for all values of Y and the probability distribution of Y is the same for all values of X. This means that an independence statement over random variables is a universal quantification over all possible values of the random variables [116]. Therefore, a probability distribution Pr satisfies (X ⊥ Y | Z) if and only if:

Pr(X, Y | Z) = Pr(X | Z) Pr(Y | Z)    (6.1)
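As an illustration, Equation 6.1 can be verified exhaustively for binary variables. The toy distribution below is a hypothetical construction chosen for this example:

```python
from itertools import product

# Hypothetical toy CPTs constructed so that X and Y are independent given Z:
# Pr(x, y, z) = Pr(z) * Pr(x|z) * Pr(y|z).
pz   = {0: 0.3, 1: 0.7}
px_z = {(0, 0): 0.2, (1, 0): 0.8, (0, 1): 0.6, (1, 1): 0.4}  # Pr(x | z)
py_z = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.1, (1, 1): 0.9}  # Pr(y | z)
pxyz = {(x, y, z): pz[z] * px_z[(x, z)] * py_z[(y, z)]
        for x, y, z in product([0, 1], repeat=3)}

# Equation 6.1 holds for every assignment: Pr(x, y | z) = Pr(x | z) * Pr(y | z).
for (x, y, z), p in pxyz.items():
    assert abs(p / pz[z] - px_z[(x, z)] * py_z[(y, z)]) < 1e-12
print("(X ⊥ Y | Z) holds for all assignments")
```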

6.1.2 Classical Random Variables

In classical probability theory, a random variable X is defined by a function that associates a real value to each outcome in the sample space Ω, X :Ω → R.

6.1.3 Example of Application in the Two-Stage Gambling Game

In the two-stage gambling game, the random variables correspond to the nodes and their respective conditional probability tables of the Bayesian Network in Figure 6.1. Variable G1 corresponds to the event of the participant winning or losing the first gamble. Variable G2 corresponds to the player deciding whether or not to play the second gamble.

90 6.1.4 Classical Full Joint Distributions

In classical probability theory, the full joint distribution over a set of N random variables, Pr(X1, X2, ..., XN), corresponds to the probability distribution assigned to all of these random variables occurring together in the same sample space [116]. The full joint distribution of a Bayesian Network, where Xi is the list of random variables and Parents(Xi) corresponds to all parent nodes of Xi, is given by [168]:

Pr(X1, ..., Xn) = ∏_{i=1}^{n} Pr(Xi | Parents(Xi))    (6.2)

Considering the classical Bayesian Network represented in Figure 6.1, its full joint probability distribution corresponds to the application of Equation 6.2 to the network (Table 6.1). Note that this Bayesian Network assumes that the gambles are not biased and that there is a 50% chance of winning/losing the first gamble.

G1    G2          Pr(G1, G2)
win   play        0.5 × 0.68 = 0.34
win   not play    0.5 × 0.32 = 0.16
lose  play        0.5 × 0.50 = 0.25
lose  not play    0.5 × 0.50 = 0.25

Table 6.1: Full joint distribution of the Bayesian Network in Figure 6.1, representing the average results reported in the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player to play/not play the second gamble.

6.1.5 Classical Marginalization

Given a query random variable X, and letting Y be the unobserved variables in the network, the marginal distribution of X is simply the probability distribution of X averaging over the information about Y. The marginal probability for discrete random variables can be defined by Equation 6.3. The summation is over all possible y, i.e., all possible combinations of values of the unobserved variables Y. The term α corresponds to a normalisation factor for the distribution Pr(X) [168].

Pr(X = x) = α Σ_y Pr(X = x, Y = y) = α Σ_y Pr(X = x | Y = y) Pr(Y = y),  where α = 1 / Σ_{x∈X} Pr(X = x)    (6.3)

Considering the classical Bayesian Network represented in Figure 6.1, after computing the full joint distribution, we need to sum out all the variables that are unknown, in this case the variable corresponding to the outcome of the first gamble, G1. This is achieved by applying the marginal probability formula in Equation 6.3. We will compute the probabilities of the observed conditions (when the participant knows the outcome of the first gamble) and of the unobserved conditions (when the outcome of the first gamble is unknown).

The probability of a participant playing the second gamble, given that he knows that he won the first gamble, Pr(G2 = play | G1 = win), is:

Pr(G2 = play | G1 = win) = α Pr(G1 = win, G2 = play) = α · 0.34 = 0.34 / (0.34 + 0.16) = 0.68

Pr(G2 = not play | G1 = win) = α Pr(G1 = win, G2 = not play) = α · 0.16 = 0.16 / (0.34 + 0.16) = 0.32

In the same way, the probability of a participant playing the second gamble, given that he knows that he lost the first gamble, Pr(G2 = play | G1 = lose), is:

Pr(G2 = play | G1 = lose) = α Pr(G1 = lose, G2 = play) = α · 0.25 = 0.25 / 0.5 = 0.5

Pr(G2 = not play | G1 = lose) = α Pr(G1 = lose, G2 = not play) = α · 0.25 = 0.25 / 0.5 = 0.5

We can also compute the probability of the participant playing the second gamble, without knowing the outcome of the first gamble:

Pr(G2 = Play) = α Σ_{g∈G1} Pr(G1 = g, G2 = Play)

Pr(G2 = Play) = α Σ_{g∈G1} Pr(G1 = g) Pr(G2 = Play | G1 = g)

Pr(G2 = Play) = α [Pr(G1 = win) Pr(G2 = Play | G1 = win) + Pr(G1 = lose) Pr(G2 = Play | G1 = lose)]

Pr(G2 = Play) = α (0.34 + 0.25) = 0.59

Pr(G2 = Not Play) = α (0.16 + 0.25) = 0.41

The probabilities computed for the unobserved conditions are not in accordance with the probabilistic findings reported by Tversky & Shafir [198]. It was empirically observed that Pr(G2 = play) = 0.42, instead of the computed probability of 0.59. A classical Bayesian Network representing the Two Stage Gambling game in its most straightforward way cannot predict these paradoxical findings because of the limitations of set theory.
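The classical computation above can be reproduced in a few lines (a sketch; the variable names are ours, and the probabilities come from Figure 6.1 / Table 6.1):

```python
# Classical Bayesian Network for the Two Stage Gambling Game.
p_g1 = {'win': 0.5, 'lose': 0.5}
p_g2_given_g1 = {('win', 'play'): 0.68, ('win', 'not play'): 0.32,
                 ('lose', 'play'): 0.50, ('lose', 'not play'): 0.50}

# Full joint (Equation 6.2): Pr(G1, G2) = Pr(G1) * Pr(G2 | G1).
joint = {(g1, g2): p_g1[g1] * p for (g1, g2), p in p_g2_given_g1.items()}

# Marginalisation over the unobserved G1 (Equation 6.3).
p_play = sum(joint[(g1, 'play')] for g1 in p_g1)
print(round(p_play, 2))  # 0.59, far from the 0.42 observed by Tversky & Shafir

# Observed condition: condition on G1 = win and renormalise.
alpha = 1 / (joint[('win', 'play')] + joint[('win', 'not play')])
print(round(alpha * joint[('win', 'play')], 2))  # 0.68
```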

6.2 Quantum-Like Bayesian Networks

A quantum-like Bayesian Network can be defined by a directed acyclic graph structure in which each node represents a different quantum random variable and each edge represents a direct influence from the source node to the target node. The graph can represent independence relationships between variables, and each node is associated with a conditional probability table that specifies a distribution of

quantum complex probability amplitudes over the values of a node given each possible joint assignment of values of its parents. In other words, a quantum-like Bayesian Network is defined in the same way as a classical network, with the difference that real probability values are replaced by complex probability amplitudes [191].

Figure 6.2 corresponds to the quantum-like Bayesian Network representation of the Two Stage Gam- bling problem presented in Section 4.4.1.

Figure 6.2: Quantum-Like Bayesian Network representation of the average results reported over the literature for the Two Stage Gambling Game (Section 4.4.1, Table 4.5). The ψ(x) represents a complex probability amplitude.

6.2.1 Quantum Random Variables

In quantum theory, random variables are associated with a set of N quantum systems V = {v1, v2, ..., vN}, each associated with a Hilbert space of a specific dimension [124]. Consequently, all values contained in the conditional probability tables associated with the random variables are complex numbers.

Considering the quantum-like Bayesian Network represented in Figure 6.2, the node G1 can be represented in the following way:

|G1⟩ = e^{iθ_win} (1/√2) |Win⟩ + e^{iθ_lose} (1/√2) |Lose⟩

where |Win⟩ and |Lose⟩ are column vectors corresponding to the basis of the subspace defined by the random variable G1:

|Win⟩ = [1, 0]ᵀ,  |Lose⟩ = [0, 1]ᵀ

Since our goal is to compare the quantum Bayesian Network with its classical counterpart, we will convert the conditional probability tables in Figure 6.2 into conditional amplitude tables. That is, we simply convert classical probabilities into complex amplitudes using Born’s rule.

Pr(A) = |e^{iθ_A} ψ_A|²  →  ψ_A = e^{iθ_A} √(Pr(A))    (6.4)
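A minimal sketch of this conversion (the phase value below is an arbitrary illustrative choice, since Equation 6.4 leaves θ_A as a free parameter):

```python
import cmath
import math

def to_amplitude(prob, theta):
    """Equation 6.4: psi_A = e^{i*theta_A} * sqrt(Pr(A)); theta_A is a free parameter."""
    return cmath.exp(1j * theta) * math.sqrt(prob)

psi = to_amplitude(0.68, 0.7)   # 0.7 rad is an arbitrary phase choice
print(round(abs(psi) ** 2, 2))  # 0.68 -> Born's rule recovers the probability
```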

6.2.2 Quantum State

The representation of a general state in a Bayesian Network can be described by a bipartite state.

Suppose that H = HX ⊗ HY is a Hilbert space defined by the composition of two Hilbert spaces, HX and HY. A quantum state SXY is designated bipartite if it can be specified with respect to the random variables X and Y. For Xi and Yj basis vectors in HX and HY, respectively, the bipartite state is given by Equation 6.5.

S_XY = Σ_{i,j} ψ_Xi ψ_Yj Xi ⊗ Yj    (6.5)

In Equation 6.5, ψ_Xi and ψ_Yj correspond to the complex probability amplitudes of the states Xi and Yj, respectively. The states Xi and Yj correspond to column vectors representing basis vectors:

X0 = Y0 = [1, 0]ᵀ,  X1 = Y1 = [0, 1]ᵀ

Considering the quantum-like Bayesian Network represented in Figure 6.2, the general quantum state is given by the quantum bipartite state:

S_{G1,G2} = Σ_{i,j} ψ(G1 = i) ψ(G2 = j) |i⟩ ⊗ |j⟩    (6.6)

Expanding Equation 6.6,

|S_{G1,G2}⟩ = ψ(G1 = win, G2 = play) e^{i(θ_win + θ_play)} |Win Play⟩
            + ψ(G1 = win, G2 = nplay) e^{i(θ_win + θ_nplay)} |Win NotPlay⟩
            + ψ(G1 = lose, G2 = play) e^{i(θ_lose + θ_play)} |Lose Play⟩
            + ψ(G1 = lose, G2 = nplay) e^{i(θ_lose + θ_nplay)} |Lose NotPlay⟩    (6.7)

Substituting the values of ψ(G1 = i, G2 = j) with the complex probability amplitudes from the conditional probability tables of the quantum-like Bayesian Network in Figure 6.2, we obtain:

|S_{G1,G2}⟩ = (√0.5 × √0.68) e^{i(θ_win + θ_play)} |Win Play⟩ + (√0.5 × √0.32) e^{i(θ_win + θ_nplay)} |Win NotPlay⟩
            + (√0.5 × √0.5) e^{i(θ_lose + θ_play)} |Lose Play⟩ + (√0.5 × √0.5) e^{i(θ_lose + θ_nplay)} |Lose NotPlay⟩    (6.8)

Simplifying the products of amplitudes, we obtain:

|S_{G1,G2}⟩ = √0.34 e^{i(θ_win + θ_play)} |Win Play⟩ + √0.16 e^{i(θ_win + θ_nplay)} |Win NotPlay⟩
            + √0.25 e^{i(θ_lose + θ_play)} |Lose Play⟩ + √0.25 e^{i(θ_lose + θ_nplay)} |Lose NotPlay⟩    (6.9)

Note that |Win Play⟩, |Win NotPlay⟩, |Lose Play⟩ and |Lose NotPlay⟩ are basis vectors of the Hilbert space:

|Win Play⟩ = [1, 0, 0, 0]ᵀ,  |Win NotPlay⟩ = [0, 1, 0, 0]ᵀ,  |Lose Play⟩ = [0, 0, 1, 0]ᵀ,  |Lose NotPlay⟩ = [0, 0, 0, 1]ᵀ

In the end, the resulting state |S_{G1,G2}⟩ represents a quantum superposition over all possible states of the Bayesian Network. This superposition can be seen as a representation of the quantum-like full joint probability distribution.

6.2.3 Quantum-Like Full Joint Distribution

The quantum-like full joint probability distribution can be defined in the same way as in a classical setting, with two main differences: (1) the real probability values are replaced by complex probability amplitudes and (2) the probability value is given by the squared magnitude of a projection. In this sense, the quantum-like full joint complex probability amplitude distribution over a set of N random variables,

ψ(X1, X2, ..., XN), corresponds to the probability distribution assigned to all of these random variables occurring together in a Hilbert space. The full joint complex probability amplitude distribution of a quantum-like Bayesian Network is given by:

ψ(X1, ..., XN) = ∏_{j=1}^{N} ψ(Xj | Parents(Xj))    (6.10)

Note that, in Equation 6.10, Xj is the list of random variables (or nodes of the network), Parents(Xj) corresponds to all parent nodes of Xj and ψ(Xj) is the complex probability amplitude associated with the random variable Xj. The probability value is extracted by applying Born's rule, that is, by taking the squared magnitude of the joint probability amplitude ψ(X1, ..., XN):

Pr(X1, ..., XN) = |ψ(X1, ..., XN)|²    (6.11)

Considering the quantum Bayesian Network represented in Figure 6.2, its full joint distribution is already represented in the superposition state |S_{G1,G2}⟩ in Equation 6.9. This superposition state was computed according to the quantum-like joint distribution in Equation 6.10. We can, however, represent this superposition vector in a table, as was done for the classical theory. Table 6.2 shows the application of Equation 6.10 as a complex probability amplitude distribution table, rather than a superposition state. Note that the quantum-like theory needs to obey the unity axiom, that is, the squared magnitudes of the complex probability amplitudes need to sum to 1.

G1    G2          ψ(G1, G2)
win   play        √0.5 e^{iθ_win} × √0.68 e^{iθ_play} = √0.34 e^{i(θ_win + θ_play)} = √0.34 e^{iθ1}
win   not play    √0.5 e^{iθ_win} × √0.32 e^{iθ_nplay} = √0.16 e^{i(θ_win + θ_nplay)} = √0.16 e^{iθ2}
lose  play        √0.5 e^{iθ_lose} × √0.50 e^{iθ_play} = √0.25 e^{i(θ_lose + θ_play)} = √0.25 e^{iθ3}
lose  not play    √0.5 e^{iθ_lose} × √0.50 e^{iθ_nplay} = √0.25 e^{i(θ_lose + θ_nplay)} = √0.25 e^{iθ4}

Table 6.2: Full joint distribution of the quantum-like Bayesian Network in Figure 6.2, representing the average results reported in the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player to play/not play the second gamble.
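The amplitude table can be generated from the classical values, and the unity axiom checked, as in the following sketch (the phases are set to zero purely for illustration, since they are free parameters):

```python
import cmath
import math

# Classical full joint probabilities from Table 6.1.
prob = {('win', 'play'): 0.34, ('win', 'not play'): 0.16,
        ('lose', 'play'): 0.25, ('lose', 'not play'): 0.25}

# Born-rule conversion (Equation 6.4); the phases theta_k are free
# parameters, set to 0 here for illustration.
theta = {k: 0.0 for k in prob}
psi = {k: cmath.exp(1j * theta[k]) * math.sqrt(p) for k, p in prob.items()}

# Unity axiom: the squared magnitudes of the amplitudes must sum to 1.
print(round(sum(abs(a) ** 2 for a in psi.values()), 10))  # 1.0
```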

6.2.4 Quantum-Like Marginalisation: Exact Inference

The quantum-like marginalisation formula is the same as the classical one with two main differences: (1) the real probability values are replaced by complex probability amplitudes, and (2) the probability is obtained by applying Born's rule to the equation. More formally, given a query random variable X, and letting Y be the unobserved variables in the network, the marginal distribution of X is simply the amplitude probability distribution of X averaging over the information about Y. The quantum-like marginal probability for discrete random variables can be defined by Equation 6.12. The summation is over all possible y, i.e., all possible combinations of values of the unobserved variables Y. The term γ corresponds to a normalisation factor. Since the conditional probability tables used in Bayesian Networks are not unitary operators with the constraint of double stochasticity (as is required in other works of the literature [44, 164]), we need to normalise the final scores. This normalisation is consistent with the notion of normalisation of wave functions used in Feynman's path diagrams. In classical Bayesian inference, on the other hand, normalisation is performed due to the independence assumptions made in Bayes rule.

Pr(X | e) = γ | Σ_y ∏_{k=1}^{N} ψ(Xk | Parents(Xk), e, y) |²    (6.12)

Expanding Equation 6.12 leads to the quantum marginalisation formula [137], which is composed of two parts: one representing the classical probability and the other representing the quantum interference term (which corresponds to the emergence of destructive/constructive interference effects):

Pr(X | e) = γ [ Σ_{i=1}^{|Y|} | ∏_{k}^{N} ψ(Xk | Parents(Xk), e, y = i) |² + 2 · Interference ]    (6.13)

Interference = Σ_{i=1}^{|Y|−1} Σ_{j=i+1}^{|Y|} | ∏_{k}^{N} ψ(Xk | Parents(Xk), e, y = i) | · | ∏_{k}^{N} ψ(Xk | Parents(Xk), e, y = j) | · cos(θi − θj)

Note that, in Equation 6.13, if one sets (θi − θj) to π/2, then cos(θi − θj) = 0. This means that the quantum interference term is cancelled and the quantum-like Bayesian Network collapses to its

classical counterpart. In other words, one can see the quantum-like Bayesian Network as a more general and abstract model than the classical network, since it represents both classical and quantum behaviour. Setting the angles to right angles means that all cosine terms become either 0 or 1, transforming a continuous-valued system into a Boolean-valued one. Moreover, in Equation 6.13, if the Bayesian Network has N binary random variables, we end up with 2^N free quantum θ parameters, which is the size of the full joint probability distribution. Considering the quantum Bayesian Network represented in Figure 6.2, in order to compute the probability of the participant playing the second gamble, Pr(G2 = play), we need to sum out all the variables that are unknown, in this case the variable corresponding to the outcome of the first gamble, G1. This is achieved by applying the marginal probability formula in Equation 6.13. We can thus compute the probability of the participant playing the second gamble without knowing the outcome of the first gamble, Pr(G2 = play):

Pr(G2 = play) = γ | Σ_{g∈G1} ψ(G1 = g, G2 = play) |²

Pr(G2 = play) = γ |ψ(G1 = win, G2 = play) + ψ(G1 = lose, G2 = play)|²

              = γ |√0.34 e^{iθ1} + √0.25 e^{iθ3}|²

              = γ [ (√0.34)² + (√0.25)² + 2 √0.34 √0.25 cos(θ1 − θ3) ]

              = γ [ 0.59 + 0.5831 cos(θ1 − θ3) ]

In the same way, we can compute the probability of the participant not playing the second gamble, Pr(G2 = nplay):

Pr(G2 = nplay) = γ |ψ(G1 = win, G2 = nplay) + ψ(G1 = lose, G2 = nplay)|²

               = γ [ 0.41 + 0.4 cos(θ2 − θ4) ]

For normalisation purposes, we will assume that the decision-maker is subject to the same quantum interference term, (θ1 − θ3) = (θ2 − θ4) = θ, simply because we need to satisfy Pr(G2 = play) = 1 − Pr(G2 = nplay), and consequently it would not make sense to give a different interference term to the probability Pr(G2 = nplay). The normalisation factor γ corresponds to:

γ = 1 / (1 + 0.5831 cos(θ) + 0.4 cos(θ)) = 1 / (1 + 0.9831 cos(θ))

In order to explain the paradoxical results reported in Table 4.5, we need to set θ to 3.0934.

Pr(G2 = play) = γ [0.59 + 0.5831 cos(3.0934)] = 0.42    (6.14)

Pr(G2 = nplay) = γ [0.41 + 0.4 cos(3.0934)] = 0.58    (6.15)
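These values can be verified numerically. The sketch below sums the amplitudes of Table 6.2 over the unobserved G1 and normalises, with θ1 − θ3 = θ2 − θ4 = 3.0934:

```python
import cmath
import math

theta = 3.0934  # fitted phase difference (theta1 - theta3 = theta2 - theta4)

# Amplitude sums over the unobserved G1 (Equation 6.12), using Table 6.2;
# only the phase difference matters, so the first term carries phase 0.
psi_play  = math.sqrt(0.34) + math.sqrt(0.25) * cmath.exp(1j * theta)
psi_nplay = math.sqrt(0.16) + math.sqrt(0.25) * cmath.exp(1j * theta)

# Normalise so that the two probabilities sum to 1.
gamma = 1 / (abs(psi_play) ** 2 + abs(psi_nplay) ** 2)
print(round(gamma * abs(psi_play) ** 2, 2))   # 0.42, matching Tversky & Shafir
print(round(gamma * abs(psi_nplay) ** 2, 2))  # 0.58
```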

In the same way, the probability of the participant playing the second gamble, given that he knows the outcome of the first gamble, can be computed with the same formula. However, in this case, since there is no uncertainty, the quantum-like Bayesian Network collapses to its classical counterpart in the following way:

Pr(G2 = play | G1 = win) = γ |√0.34 e^{iθ1}|² = γ (√0.34 e^{iθ1} × √0.34 e^{−iθ1}) = γ (√0.34)² = γ · 0.34

Setting the normalisation constant γ = 1 / (0.16 + 0.34) = 2, then

Pr(G2 = play | G1 = win) = γ · 0.34 = 0.68

In a similar way, when the participant knows that he lost the first gamble, then the probability of playing the second gamble is:

Pr(G2 = play | G1 = lose) = γ |√0.25 e^{iθ3}|² = γ (√0.25 e^{iθ3} × √0.25 e^{−iθ3}) = γ (√0.25)² = γ · 0.25 = 0.5

Summarising, under uncertainty the quantum-like Bayesian Network behaves like a Feynman path diagram. If we do not observe certain nodes, then they enter into a superposition state. One can think of each node of the network as a propagation of waves (under this decision scenario, these waves correspond to a participant's beliefs). While performing probabilistic inferences, these waves can interfere with each other in two different ways: either through constructive or through destructive interference. For example, the probability Pr(G2 = play) suffered a destructive interference effect (Figure 6.4), because the probability observed in the experiments is smaller than its classical counterpart: Pr_quantum(G2 = play) < Pr_classic(G2 = play). On the other hand, the opposite event suffered a constructive interference (Figure 6.3): Pr_quantum(G2 = nplay) > Pr_classic(G2 = nplay). If, however, we observe all nodes in the network, then there is no uncertainty, the waves do not propagate, and the quantum-like model collapses to the classical probability model.

Figure 6.3: Example of constructive interference: two waves collide, forming a bigger wave.

Figure 6.4: Example of destructive interference: two waves collide, cancelling each other.

For this example, we manually found the quantum parameters θi that lead to the paradoxical outcomes found in the Two Stage Gambling Game. It is still an open research question how to set these parameters in a dynamic way without needing to know a priori the outcome of the experimental decision

scenario. For this reason, the rest of this work will be dedicated to the study of different ways of setting these quantum interference terms dynamically in such a way that the smallest possible error is produced. This way, we will turn the proposed model into a predictive framework, in contrast with the current models of the literature, which provide an explanatory notion of these paradoxical findings.

6.3 The Impact of the Phase θ

In the previous section, we manually set the quantum interference term in order to obtain the results that explain the paradoxical findings of the Two Stage Gambling Game. But finding this interference term dynamically is a hard task and still an open research question. By dynamically, we mean that the interference term is computed in such a way that it always predicts the outcome of the decision scenario, without the need for a priori knowledge of the outcome of the experimental setting. The quantum interference term reaches its maximum when cos(θ) = 1, that is, when θ = 0 + 2kπ, for all k ∈ Z. This corresponds to the maximum constructive interference that a quantum-like probabilistic inference can achieve. In the same way, the largest destructive interference is obtained when cos(θ) reaches its minimum, which occurs when θ = π + 2kπ, for k ∈ Z. Within the interval θ ∈ [0, π], the probabilistic inferences computed using quantum probability theory can take a whole range of possible probability values. For instance, in the Two Stage Gambling Game, the probability of the player playing the second gamble without knowing the outcome of the first gamble, Pr(G2 = play), can take any value in the interval Pr(G2 = play) ∈ [0.4083, 0.5915]. The central question is: from this interval of possible probability outcomes, how can one choose the quantum interference term dynamically, such that it produces the results observed in the experimental findings? In this case, how can one find a dynamic way to compute θ such that it gives the 42% observed in the Two Stage Gambling game? Figure 6.5 shows all possible probabilities that can be obtained for the probabilistic inference Pr(G2 = play).

Figure 6.5: The various quantum probability values that can be achieved by varying the angle θ in Equation 6.14. Note that the quantum probability can achieve much higher/lower values than the classical probability.

Figure 6.5 also supports the quantum information processing theory already mentioned in Chapter 1 of this work: information is modelled via wave functions and, therefore, cannot be in a definite state (only when a final decision is made does a definite state emerge). One can look at the values of the parameter θ as all possible probabilities (or outcomes) that a player has when deciding whether or not to play the second gamble. Since in quantum theory we model the participant's beliefs by wave functions, the superposed states can produce different waves coming from opposite directions that can crash into each other. When they crash, the waves can either unite or be destroyed. When they unite, a constructive interference effect occurs, producing a bigger wave and leading to a maximum or minimum quantum probability value, depending on the phase of the wave. When the waves crash and are destroyed, a destructive interference effect occurs. In conclusion, quantum probability enables the free choice of parameters in order to obtain a desired probability value. The Two Stage Gambling Game is just a small example where the proposed quantum-like Bayesian Network can be applied. In the next section, we will analyse a more complex Bayesian Network that determines the probability of a burglary occurring, given that the neighbours think that they heard an alarm.

6.4 A Cognitive Interpretation of Quantum-Like Bayesian Networks

The proposed quantum-like Bayesian Network can integrate human thoughts by representing a person's beliefs in an N-dimensional unit length quantum state vector. In the same way, the proposed quantum structure is general enough to represent any other context in which there is a need to formalise uncertainty, including prediction problems in data fusion. In the context of Bayesian Networks, data fusion was introduced in the work of Pearl [160]. The author argues that, just like people, Bayesian Networks are structures that integrate data from multiple sources of evidence and enable the generation of a coherent interpretation of that data through a reasoning process. The fusion of all these data sources can be done using Bayes' theorem. When a data source is unknown, Bayes' rule is extended to sum out all possible values of the probability distribution representing the unknown data source. The proposed quantum-like Bayesian Network takes advantage of these uncertainties by representing them in a superposition state, which enables the fusion of the data sources through quantum interference effects. These effects produce changes in the final likelihoods of the outcomes and provide a promising way to make predictions that are more accurate with respect to reality. So, the quantum-like Bayesian Network proposed in this work is potentially relevant and applicable in any behavioural situation in which uncertainty is involved.

6.5 Summary of the Quantum-Like Bayesian Network Model

Summarising, the proposed quantum-like Bayesian Network is built in the same way as a classical network, with the difference that it uses complex quantum amplitudes to specify the conditional probability tables, instead of real probability values. As a consequence, the quantum-like Bayesian Network will give

rise to quantum interference effects, which can act destructively or constructively, depending on whether the interference terms are negative or positive, respectively. Algorithm 1 describes the main steps to compute quantum-like inferences. Basically, a probabilistic inference consists of two major steps: the computation of the full joint probability distribution of the network and the computation of the marginal probability distribution with respect to the variable being queried. The algorithm starts by receiving a Bayesian Network represented by a set of factors specified by complex probability amplitudes instead of real probability values. A factor is a function that takes as input a set of random variables and returns all the assignments corresponding to those random variables. For instance, the full joint probability distribution of a network can be seen as a factor. The previous Burglar/Alarm network consists of four different factors, each one associated with a random variable. The algorithm also receives a set of observed variables, if some conditional probability is being queried, and the random variable to be queried. It also receives the type of heuristic to be applied to compute the quantum interference terms, but this will be addressed in more detail in the next chapter. Given a Bayesian Network represented as a set of factors, the algorithm first checks whether there are any observed variables. More specifically, if the probabilistic inference is conditioned on some observed variable(s), then, for computational reasons, we set to 0 the entries of the conditional probability tables that are not consistent with the observed variables. By doing this, we compute just the probabilities of the joint probability distribution that matter for the inference process, instead of computing the entire full joint probability distribution table. Next, we compute the full joint probability distribution.
This corresponds to the application of the full joint probability distribution formula described in Equation 11.25. Basically, this function performs the product over each assignment of all random variables of the network. One needs to guarantee that the full joint probability distribution obeys the normalisation axiom, making all entries of the distribution sum to one. Having the full joint distribution factor, we can perform the probabilistic inference by computing the classical marginal probability distribution and the quantum interference term. The function FactorMarginalization corresponds to the selection of all entries of the full joint probability distribution that match the query variable and the evidence variables (if given). It returns two vectors: (1) one corresponding to the entries of the full joint probability where the query variable is observed to occur (we refer to these probabilities as PositiveProb), and (2) another corresponding to the entries of the full joint probability where the query variable is observed not to occur (NegativeProb). The classical probability corresponds to a normalised summation of these vectors. Having the vectors with the positive and negative probabilities resulting from the marginalisation process, we can also compute the quantum-like probabilities (Algorithm 2). The quantum interference formula in Equation 11.27 is given by a set of two summations over the marginal probability vector. For normalisation purposes, we need to compute the quantum interference terms corresponding to both the positive and the negative probability measures (when the query variable occurs and when it does not). The quantum interference parameter θ is computed according to a given heuristic. These heuristics will be

Algorithm 1 Quantum-Like Bayesian Network
Input: F, factor structure; ObservedVars, list of observed variables; QueryVar, identifier of the variable to be queried; HEURISTIC NAME, heuristic to be applied to compute the quantum interferences

Output: Factor Q, corresponding to the quantum inferences, Factor C, corresponding to the classical inferences

1: /* A factor is a structure containing three lists: var, an identifier of a random variable together with the list of its parent variables; card, the cardinality of each random variable in var; val, the respective conditional probability table. */

2: Q ← struct('var', QueryVar, 'card', 2, 'val', {}); // initialise output factor structure for quantum network
3: C ← struct('var', QueryVar, 'card', 2, 'val', {}); // initialise output factor structure for classical network

4: // Observe evidence: set to 0 all factors in F that do not correspond to the evidence variables
5: F ← ObserveEvidence(F, ObservedVars);

6: // Compute the Full Joint Probability Distribution of the Network:
7: \psi(X_1, \ldots, X_N) = \prod_{j=1}^{N} \psi(X_j \mid Parents(X_j))

8: Joint ← ComputeFullJointDistribution(F);

9: // Marginalise the full joint probability distribution by selecting the positive and the negative assignments of QueryVar:
10: [PositiveProb, NegativeProb] ← FactorMarginalization(Joint, QueryVar);

11: // Compute classical probability factor:
12: Pr(X \mid e) = \alpha \sum_{i=1}^{|Y|} \prod_{k=1}^{N} Pr(X_k \mid Parents(X_k), e, y = i)

13: C.val ← ComputeClassicalProb(PositiveProb, NegativeProb);

14: // Compute quantum probability factor
15: Q.val ← ComputeQuantumProb(PositiveProb, NegativeProb, HEURISTIC NAME);

16: return [Q, C];

addressed in more detail in the next chapters. After computing the quantum interference term, the final quantum-like probabilities are returned.
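The two inference steps just described (building the full joint distribution, then marginalising it into the positive and negative vectors) can be sketched on a toy two-node network A → B; the tables below are hypothetical, and with complex amplitudes the same structure applies unchanged:

```python
from itertools import product

# Hypothetical tables for a toy network A -> B
P_A = {True: 0.3, False: 0.7}
P_B_given_A = {(True, True): 0.9, (True, False): 0.1,
               (False, True): 0.2, (False, False): 0.8}

# Step 1: full joint distribution, psi(A, B) = psi(A) * psi(B | A)
joint = {(a, b): P_A[a] * P_B_given_A[(a, b)]
         for a, b in product([True, False], repeat=2)}

# Step 2: marginalise on the query variable B, splitting the joint
# entries into the positive and negative vectors
positive_prob = [v for (a, b), v in joint.items() if b]
negative_prob = [v for (a, b), v in joint.items() if not b]

# The classical inference is the normalised sum of the positive vector;
# the quantum inference adds interference terms to both vectors first
classical = sum(positive_prob) / (sum(positive_prob) + sum(negative_prob))
```

With real probabilities the joint already sums to one, so the normalisation is a no-op; with complex amplitudes it is where the interference terms enter.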

In the next sections, we will show some experiments where we applied the quantum-like Bayesian Network to complex decision scenarios.

Algorithm 2 ComputeQuantumProbability
Input: PositiveProb, vector of marginal probabilities when QueryVar occurs; NegativeProb, vector of marginal probabilities when QueryVar does not occur; HEURISTIC NAME, heuristic to be applied to compute the quantum interferences
Output: List Q with the probabilistic inference using quantum theory

1: interference_pos ← 0; interference_neg ← 0;
2: length_assign ← length(PositiveProb);

3: // For all probability assignments,
4: for i = 1; i ≤ length_assign − 1; i = i + 1 do
5: for j = i + 1; j ≤ length_assign; j = j + 1 do
6: // Compute the quantum interference parameter θ according to a given heuristic function
7: heurs ← ComputeHeuristic(HEURISTIC NAME, PosAssign, NegAssign)

8: // Apply quantum interference formula: 9:

\sum_{i=1}^{|Y|-1} \sum_{j=i+1}^{|Y|} \prod_{k=1}^{N} \psi(X_k \mid Parents(X_k), e, y = i) \cdot \prod_{k=1}^{N} \psi(X_k \mid Parents(X_k), e, y = j) \cdot \cos(\theta_i - \theta_j);

10: // Compute the interference term related to the positive assignments
11: interference_pos ← interference_pos + 2 · PosAssign[i] · PosAssign[j] · heurs
12:
13: // Compute the interference term related to the negative assignments (for normalisation)
14: interference_neg ← interference_neg + 2 · NegAssign[i] · NegAssign[j] · heurs

15: end for
16: end for
17: // Compute quantum-like probabilities: classicalProb + interference.
18: α ← (sum(PosAssign) + sum(NegAssign))⁻¹
19: classicalProb ← [α · PosAssign, α · NegAssign];
20: probPos ← classicalProb[1] + interference_pos;
21: probNeg ← classicalProb[2] + interference_neg;

22: // Normalise the results in order to obtain a probability value
23: γ ← (probPos + probNeg)⁻¹
24: Q ← [γ · probPos, γ · probNeg]

25: return Q;
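A direct Python transcription of Algorithm 2 might look as follows; the `heuristic` argument is a hypothetical callable standing in for ComputeHeuristic and is assumed to return the cos(θi − θj) weight for a pair of entries:

```python
import math

def compute_quantum_prob(positive_prob, negative_prob, heuristic):
    """Sketch of Algorithm 2: pairwise interference terms are added
    to the classical marginals and the result is renormalised."""
    interference_pos = 0.0
    interference_neg = 0.0
    n = len(positive_prob)
    for i in range(n - 1):
        for j in range(i + 1, n):
            heurs = heuristic(i, j)  # cos(theta_i - theta_j) from the heuristic
            interference_pos += 2 * positive_prob[i] * positive_prob[j] * heurs
            interference_neg += 2 * negative_prob[i] * negative_prob[j] * heurs
    alpha = 1.0 / (sum(positive_prob) + sum(negative_prob))
    prob_pos = alpha * sum(positive_prob) + interference_pos
    prob_neg = alpha * sum(negative_prob) + interference_neg
    gamma = 1.0 / (prob_pos + prob_neg)  # final normalisation
    return [gamma * prob_pos, gamma * prob_neg]
```

With a heuristic that always returns 0 (that is, θi − θj = π/2), the interference vanishes and the output reduces to the classical inference.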

6.6 Inference in More Complex Networks: The Burglar/Alarm Network

In this section, we compare classical inferences in Bayesian Networks with the proposed quantum model. The example that we examine corresponds to a modified version of the Burglar/Alarm network, which is widely used in Artificial Intelligence [161, 168].

The network proposed in the book of Russell & Norvig [168] is inspired by the following scenario. Imagine that a person installed a new burglar alarm in his house. The alarm is highly reliable at detecting burglaries. The person has two neighbours: Mary and John. They have promised to call the person whenever they think they heard the alarm. John always calls the police when he hears

the alarm; however, he sometimes confuses the sound of the alarm with the ringtone of his phone, resulting in a misleading phone call to the police. Mary, on the other hand, often cannot hear the alarm, because she likes to listen to very loud music. Given the evidence of who has or has not called the police, we want to represent in a Bayesian Network the probability of a burglary occurring [168]. Figure 6.6 represents a classical Bayesian Network to account for burglary detection. Its quantum counterpart corresponds to Figure 6.7. In order to make a fair comparison between both networks, the quantum Bayesian Network was built in the same way as the classical one, but with the real probability values replaced by complex quantum amplitudes, as proposed in Tucci [191] and Leifer & Poulin [124].

Figure 6.6: Burglar/Alarm classical Bayesian Network proposed in the book of Russell & Norvig [168].
Figure 6.7: Quantum-like counterpart of the Burglar/Alarm Bayesian Network proposed in the book of Russell & Norvig [168].

We also performed a set of queries so we could compare the classical Bayesian Network with the proposed quantum interference Bayesian Network. Table 6.3 presents the final probabilities computed over the classical Bayesian Network of Figure 6.6 and Table 6.4 presents the results for the quantum Bayesian network (Figure 6.7).

Evidences                  Pr(Alarm = t)  Pr(Burglar = t)  Pr(JohnCalls = t)  Pr(MaryCalls = t)
No Evidence                0.0347         0.0200           0.0795             0.0339
Alarm = t                  1.0000         0.5479           0.9000             0.7000
Burglar = t                0.9500         1.0000           0.8575             0.6655
JohnCalls = t              0.3927         0.2158           1.0000             0.2810
MaryCalls = t              0.7155         0.3923           0.6582             1.0000
Alarm = t, Burglar = t     1.0000         1.0000           0.7000             0.9000
Alarm = t, JohnCalls = t   1.0000         0.5479           1.0000             0.7000
Alarm = t, MaryCalls = t   1.0000         0.5479           0.9000             1.0000
Burglar = t, JohnCalls = t 0.9971         1.0000           1.0000             0.6980
Burglar = t, MaryCalls = t 0.9992         1.0000           0.8994             1.0000
JohnCalls = t, MaryCalls = t 0.9784       0.5360           1.0000             1.0000

Table 6.3: Probabilities obtained when performing inference on the classical Bayesian Network of Figure 6.6.
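As a sanity check, the classical entry Pr(Burglar = t | JohnCalls = t) = 0.2158 in Table 6.3 can be reproduced by enumeration. The conditional tables below were reverse-engineered from the reported marginals and should be read as assumptions, not as the thesis's official figures:

```python
# Assumed tables: P(Burglar), P(Alarm | Burglar), P(JohnCalls | Alarm)
P_B = 0.02
P_A = {True: 0.95, False: 0.016}
P_J = {True: 0.90, False: 0.05}

def joint_b_j(b):
    """Pr(Burglar = b, JohnCalls = t), summing out the Alarm variable
    (MaryCalls is an unobserved leaf, so it sums out to 1)."""
    p_b = P_B if b else 1 - P_B
    total = 0.0
    for a in (True, False):
        p_a = P_A[b] if a else 1 - P_A[b]
        total += p_a * P_J[a]
    return p_b * total

posterior = joint_b_j(True) / (joint_b_j(True) + joint_b_j(False))  # ~0.2158
```

The same enumeration with these assumed tables also recovers the no-evidence marginals in the first row of Table 6.3 to the reported precision.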

In the Two Stage Gambling Game, we saw that the parameter θ plays an important role in the determination of the final probability value of a variable. For the Burglar/Alarm Bayesian Network, in order to find the best parameter for each query, we varied θ between 0 and 2π, in steps of 0.1 radians, and

Evidences                  Pr(Alarm = t)  Pr(Burglar = t)  Pr(JohnCalls = t)  Pr(MaryCalls = t)
No Evidence                0.2122         0.0694           0.1124             0.1138
Alarm = t                  1.0000         0.5479           0.9000             0.7000
Burglar = t                0.9896         1.0000           0.9999             0.9791
JohnCalls = t              0.6596         0.8380           1.0000             0.8138
MaryCalls = t              0.9018         0.9998           0.9758             1.0000
Alarm = t, Burglar = t     1.0000         1.0000           0.9000             0.7000
Alarm = t, JohnCalls = t   1.0000         0.5479           1.0000             0.7000
Alarm = t, MaryCalls = t   1.0000         0.5479           0.9000             1.0000
Burglar = t, JohnCalls = t 0.9982         1.0000           1.0000             0.7390
Burglar = t, MaryCalls = t 0.9993         1.0000           0.9138             1.0000
JohnCalls = t, MaryCalls = t 0.9883       0.6632           1.0000             1.0000

Table 6.4: Probabilities obtained when performing inference on the quantum Bayesian Network of Figure 6.7.

collected the θ that would maximise most variables in the network. The parameters found are detailed in Table 6.5.

Variables       θ1    θ2    θ3    θ4    θ5    θ6    θ7    θ8
Alarm = t       0.00  0.20  0.00  0.80  6.20  0.50  3.10  4.30
Burglar = t     0.00  0.00  0.00  0.00  6.20  0.10  3.10  3.20
JohnCalls = t   1.90  2.30  0.00  2.30  0.50  5.50  4.50  2.40
MaryCalls = t   0.00  0.00  0.00  0.00  0.00  3.10  3.10  0.00

Table 6.5: Optimum θ's found for each variable of the Burglar/Alarm Bayesian Network (Figure 6.6).
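The grid search described above (θ from 0 to 2π in 0.1-radian steps) can be sketched generically; `prob_fn` is a hypothetical callable mapping a candidate θ to the resulting inference probability:

```python
import math

def grid_search_theta(prob_fn, step=0.1):
    """Sweep theta over [0, 2*pi] in fixed steps and keep the value
    that maximises the probability returned by prob_fn."""
    best_theta, best_p = 0.0, float("-inf")
    steps = int(2 * math.pi / step) + 1
    for k in range(steps):
        theta = k * step
        p = prob_fn(theta)
        if p > best_p:
            best_theta, best_p = theta, p
    return best_theta, best_p
```

For a probability curve peaking at θ = 3.1, the search returns the closest grid point; in the thesis's setting, `prob_fn` would be the quantum inference for one query of the Burglar/Alarm network.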

In the next section, we will analyse the results reported in Tables 6.3 and 6.4 for each query.

6.7 Discussion of Experimental Results

Tables 6.3 and 6.4 present the results obtained for different queries performed over the classical Bayesian Network (Figure 6.6) and the proposed quantum-like Bayesian Network (Figure 6.7), respectively. Due to quantum constructive interference, the proposed model was able to increase, on average, the probabilities of the query variables by about 270.625%. In the case where nothing is observed, the proposed network reaches its maximum level of uncertainty: all variables interfere with each other, causing constructive interference effects on the final probabilities, because the quantum parameters were maximised. It is interesting to note that when the classical probabilities computed for each query variable are very low, the quantum-like Bayesian Network keeps those probabilities low as well, not allowing significant deviations from the classical setting. If events are rare under a classical decision scenario, then they continue to be rare in a quantum-like decision scenario. In order to get some notion of the impact of the quantum interference parameters θ, we analysed all possible values that each query variable could take by varying the θ parameters. Figures 6.8-6.11 were obtained in the following way: for each query, we found the set of all θ's that would lead to a maximum and a minimum probability value. We then compared the sets of θ's and realised that, in the majority of the cases, there were components that shared the same parameters. Given this situation, we fixed the 6 parameters that were common and varied the remaining ones, leading to a 3-dimensional graph.

Figure 6.8: Possible probabilities when querying "MaryCalls = t" with no evidence. Parameters used were: {θ1, θ2, θ3, θ5, θ7, θ8} → {0, 0, 0, 0, 3.1, 0}. Maximum probability for {θ1, θ2} → {0, 3.1}.
Figure 6.9: Possible probabilities when querying "Burglar = t" with no evidence. Parameters used were: {θ1, θ2, θ3, θ5, θ6, θ7} → {0, 0, 0, 6.2, 0.1, 3.1}. Maximum probability for {θ4, θ8} → {0, 3.2}.

Figure 6.10: Possible probabilities when querying "JohnCalls = t" with no evidence. Parameters used were: {θ1, θ3, θ4, θ5, θ7, θ8} → {1.9, 0, 2.3, 0.5, 4.5, 2.4}. Maximum probability for {θ2, θ6} → {2.3, 5.5}.
Figure 6.11: Possible probabilities when querying "Alarm = t" with no evidence. Parameters used were: {θ1, θ3, θ4, θ5, θ7, θ8} → {0, 0, 0.8, 6.2, 3.1, 4.3}. Maximum probability for {θ2, θ6} → {0.2, 0.5}.

Regarding the results in Table 6.6 with observed nodes, the moment we start to provide pieces of evidence to the network, the uncertainty starts to decrease. Analysing the situation where we observe that the Alarm variable is true, an interesting phenomenon occurs: the probabilities of the proposed quantum Bayesian Network collapse to the same values as in its classical counterpart. If one observes the variable Alarm, then the variables Burglar, MaryCalls and JohnCalls become independent of each other. This means that there is no way for these variables to interfere with one another. Since the variables do not provoke any interference among them, the interference term is zero and the inference collapses to the classical probability. This means that the conditional independence properties of classical Bayesian Networks are maintained in quantum-like Bayesian Networks. This independence phenomenon in quantum Bayesian Networks has also been noted in the work of Leifer & Poulin [124], which provides a mathematical proof of why it also occurs in a quantum setting. When we observe the Burglar variable, then, according to the scenario of the Bayesian Network, the probability of the Alarm variable should also increase. The variable JohnCalls is highly correlated with the variable Alarm: in its conditional probability table, there is a 90% chance of JohnCalls occurring when Alarm is true. So, in this situation, the quantum-like Bayesian Network is able to give more strength to that correlation. On average, the probabilities of all queries increased by 22.8% when the

variable Burglar is observed, compared to the respective classical setting. When we observe the variable JohnCalls, it is expected that the probability of the Alarm variable increases as well, according to the scenario of the Bayesian Network. When we observe the variable MaryCalls, the probability of the variable Alarm increases even more than in the situation where we observe JohnCalls. Since, in a classical setting, MaryCalls is not highly dependent on the variable Alarm, the quantum-like model was able to produce larger fluctuations than in the situation with JohnCalls. Through quantum constructive interference, the quantum-like Bayesian Network was able to strengthen the correlations between variables when compared to its classical counterpart. Note that one cannot state that the quantum-like Bayesian Network performed better than the classical network. We could compare the models if data were collected over some time period; then we could infer the probability of each variable occurring in the collected data and check which model represents it with fewer errors. Without such data, which is practically impossible to obtain for the given Bayesian Network example, we can only study the impact these quantum parameters have on probabilistic inferences. Finally, when we start to provide two pieces of evidence to the network, the uncertainty levels start to decrease. Consequently, the probabilities computed in a quantum setting start to converge to the ones computed in a classical Bayesian Network. This means that there are two situations where the proposed quantum Bayesian Network converges to its classical counterpart: (1) when the variables of the network become independent of each other, and (2) when there are very low levels of uncertainty, because too much evidence was provided to the network.
In the next chapter, we will present a set of heuristics that are able to dynamically assign values to the quantum parameters according to different decision scenarios.

6.8 Summary and Final Discussion

Recent work in cognitive psychology revealed that quantum probability theory provides another method of computing probabilities without falling into the restrictions that classical probability theory has when modelling cognitive systems of decision making. Quantum probability theory can also be seen as a generalisation of classical probability theory, since it includes the classical probabilities as a special case (when the interference term is zero). In this chapter, we presented a Bayesian Network model based on quantum probabilities. The proposed model can accommodate puzzling observations that the classical network fails to explain (for instance, the Two Stage Gambling Game). When the nodes of the quantum-like Bayesian Network are unobserved, they enter a superposition state. One can look at this state as many waves moving in different directions. These waves can crash into each other, causing the resulting waves to become bigger or to cancel each other out. This is the interference phenomenon that the proposed Bayesian Network offers, and it has direct implications when making inferences. Therefore, the proposed network represents and simulates quantum aspects motivated by Feynman's path integrals. Experimental results revealed that the proposed quantum Bayesian Network enables many degrees

of freedom in choosing the final outcome of the probabilities. If we had a real scenario, with real observations, one could fit the present model to the observed data by simply tuning the parameter θ. The overall results also suggested that when the classical probability of some variable is already high, the quantum probability tends to increase it even more. When the classical probability is very low, the proposed model tends to keep it low as well, not allowing big variations. When there are many unobserved nodes in the network, the levels of uncertainty are very high. In the opposite scenario, when there are very few unobserved nodes, the proposed quantum model tends to collapse into its classical counterpart, since the uncertainty levels are very low.

Chapter 7

Heuristical Approaches Based on Data Distribution

So far, we have presented a general quantum-like Bayesian Network model, which performs quantum-like probabilistic inferences. However, we left as an open research question how to compute an exponential number of quantum interference terms in a dynamic way, without a priori knowledge of the outcome of the experimental decision scenario. In this chapter, we propose a similarity heuristic that automatically computes this exponential number of quantum parameters through vector similarities, taking into account a probabilistic analysis of the distribution of the data of several experiments reported in the literature. From this data distribution, we were able to design a general heuristic that is able to accommodate violations of the Sure Thing Principle without any a priori knowledge of the context of the decision scenario. A heuristic is a mathematical function that generally provides good results in many decision scenarios, but at the cost of occasionally not giving very accurate results [173]. It is important to keep in mind that it is very hard (or even impossible) to create a universal heuristic that can assign quantum parameters for different applications. That is why, in this work, we focus on disjunction errors, more specifically on violations of the Sure Thing Principle.

7.1 The Vector Similarity Heuristic

The goal of this similarity heuristic is to determine an angle between the probabilistic vectors associated with the marginalisation of the positive and negative assignments of the query variable. In other words, when performing a probabilistic inference from a full joint probability distribution table, we select from this table all probabilities that match the assignments of the query variable and, if given, any observed variables. If we sum these probabilities, we end up with a final classical probability inference. If we add an interference term to this classical inference, we will end up with a quantum-like inference. In this case, we can use these probability vectors to obtain additional information to compute the quantum interference parameters. The general idea of the similarity heuristic is to use the marginal probability distributions as probability vectors and measure their similarity through the law of cosines formula, which

is a similarity measure well known in the Computer Science domain and widely used in Information Retrieval [23]. According to this degree of similarity, we then apply a mapping function of a heuristic nature, which outputs the value of the quantum interference parameter θ by taking into consideration a previous study of the probabilistic distribution of the data of several experiments reported in the literature. When performing quantum-like probabilistic inferences, two steps are required: (1) the computation of a quantum-like full joint probability distribution and (2) the computation of the quantum-like marginal distribution. The quantum superposition vector, comprising all possible events, is given by the quantum full joint probability distribution already presented in Equation 11.25. The full joint probability distribution can be illustrated in table form, as presented in Table 7.1.

X1    ...    XN    ψ(X1, ..., XN)
T     ···    T     ψ1 · e^{iθ1}
T     ···    F     ψ2 · e^{iθ2}
...   ...    ...   ...
F     ...    F     ψM · e^{iθM}

Table 7.1: Table representation of a quantum full joint probability distribution.

As presented in the previous chapter, the quantum probability inference formula is composed of two parts: one representing the classical probability and the other representing the quantum interference term. The interference term performs a summation over several combinations of the entries of the full joint probability distribution, in groups of two random variables in each computational cycle: \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} |\psi_i| |\psi_j| \cos(\theta_i - \theta_j). For each pair of variables, we represent them as a 2-dimensional vector: one component represents the probability of ψi and the other corresponds to ψj. Moreover, the different probabilities represented in the full joint probability distribution table can be seen as the different beliefs/actions that one might have available before making a decision.

 2  2  iθi iθi ψi · e ψi · e a(X = T) =  2 b(X = F) =  2 (7.1) iθj iθj ψi · e ψj · e

Note that we always have two vectors, because the proposed Quantum-Like Bayesian Network only supports binary random variables; that is, the query performed on the network corresponds to a yes or no answer. In other words, one vector corresponds to the probability of the query random variable returning a positive answer, and the other corresponds to the probability of the query random variable returning a negative one. In a geometric space, these vectors are represented as in Figure 7.1. From these two vectors, similarity measures such as the angles between the vectors or the distances between them can be computed. These similarity measures will be addressed in more detail in Section 7.1.1. One could ask why these feature vectors are represented by probabilities. In our model, the goal is to find a quantum parameter that can be used to compute quantum probability inferences. The only information that one has are the probability distributions of a given scenario, which are encoded in the Bayesian Network. In quantum mechanics, quantum states are always represented by unit length vectors. Since the

Figure 7.1: Vector representation of two vectors representing a certain state.

proposed model is inspired by quantum formalisms, one might wonder why the vectors are not unit length as well. There are two reasons for this choice. First, this representation of beliefs/actions as probabilities in feature vectors is not new; it is common practice in the literature [159]. Second, since our model is represented by a Bayesian Network and the vectors are extracted directly from the network (through the representation of the full joint probability distribution), we do not need unit length vectors. Instead, this normalisation is performed during the inference process through the computation of the normalisation factor γ. In the end, the quantum interference term is computed by constructing a different vector representation for each pair of random variables being processed (Figure 7.2). These vectors are essential, since they enable the calculation of different quantum θ parameters.

Figure 7.2: Illustration of the different 2-dimensional vectors generated at each iteration step during the computation of the quantum interference term.

7.1.1 Acquisition of Additional Information

It is important to note that, in the current literature, quantum parameters must be assigned manually in order to obtain a prediction. So, for different experiments, we will have disparate quantum parameters. For this reason, it is very hard to create a universal heuristic that can assign quantum parameters for different applications. In this work, we propose a heuristic that is able to perform accurate predictions for the several different experiments reported in the literature related to the Prisoner's Dilemma Game and the Two Stage Gambling Game. The goal of this similarity heuristic is to determine an angle between the vectors a and b (Equation 7.1) that can be used as the θ parameter in Equation 6.13. Moreover, by computing the Euclidean distance between vectors a and b, one can obtain vector c. Equation 7.2 shows how to obtain the

norm of vector c through vectors a and b (Figure 7.1). Additional information is gained by comparing the similarity between the two vectors. This new information allows one to infer hidden properties of a participant's beliefs/actions from visible ones. This vector representation is similar to the approach proposed in [166], where the authors represent a person's beliefs/actions in an n-dimensional vector space and the similarity between the vectors is measured by a projection operator, which corresponds to the computation of the squared length of the projected vector. This is similar to our approach, since we are also computing the length between the vectors a and b.

||c|| = ||a - b|| = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_n - b_n)^2} \qquad (7.2)

Since we are interested in the angles that these vectors make with each other, we use trigonometric laws, such as the law of cosines, to determine them. The law of cosines is given by Equations 7.3 to 7.5, where θA corresponds to the angle between vectors b and c, θB corresponds to the angle between vectors a and c, and θC corresponds to the angle between vectors a and b. Since we know the coordinates of vectors a and b, one could also compute angle θC through the cosine similarity measure: cos(θC) = (a · b) / (||a|| · ||b||). However, since we only know the length of vector c, we need to compare the similarity of the vectors through the law of cosines.

||a||² = ||b||² + ||c||² − 2 · ||b|| · ||c|| · cos(θA) ⇔ θA = cos⁻¹((||b||² + ||c||² − ||a||²) / (2 · ||b|| · ||c||))    (7.3)

||b||² = ||a||² + ||c||² − 2 · ||a|| · ||c|| · cos(θB) ⇔ θB = cos⁻¹((||a||² + ||c||² − ||b||²) / (2 · ||a|| · ||c||))    (7.4)

||c||² = ||a||² + ||b||² − 2 · ||a|| · ||b|| · cos(θC) ⇔ θC = cos⁻¹((||a||² + ||b||² − ||c||²) / (2 · ||a|| · ||b||))    (7.5)
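The law of cosines above can be sketched in a few lines of Python; `triangle_angles` is an illustrative helper name, not part of the thesis:

```python
import math

def triangle_angles(a, b):
    """Inner angles of the triangle formed by vectors a, b and c = a - b,
    via the law of cosines (Equations 7.3 to 7.5)."""
    n_a, n_b = math.hypot(*a), math.hypot(*b)
    n_c = math.dist(a, b)  # ||c|| = ||a - b||, Equation 7.2
    theta_a = math.acos((n_b**2 + n_c**2 - n_a**2) / (2 * n_b * n_c))
    theta_b = math.acos((n_a**2 + n_c**2 - n_b**2) / (2 * n_a * n_c))
    theta_c = math.acos((n_a**2 + n_b**2 - n_c**2) / (2 * n_a * n_b))
    return theta_a, theta_b, theta_c
```

For orthogonal vectors such as (3, 0) and (0, 4), θC is π/2, and the three angles always sum to π.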

7.1.2 Definition of the Heuristical Function

Violations of the Sure Thing Principle imply a decrease in the final probability values when compared to classical theory. This suggests that, somehow, we need to force the quantum parameters to have a destructive interference effect. This can be obtained by setting the quantum parameter to π (the angle that provides the smallest cosine value). The additional information that we incorporated in Figure 7.1, namely the Euclidean distance between the vectors and their similarities, is translated into a triangle. This shape has the well-known property that its inner angles must sum to 180 degrees, or π radians. Moreover, we would like to have a destructive interference effect that takes into account the similarity of the original vectors. Equation 7.6 shows how one can obtain this relationship.

θA + θB + θC = π ⇔ π − θC = θA + θB ⇔ π − θC/2 = (θA + θB + π) / 2    (7.6)

When θC is very small, that is, when the vectors are very similar, we can add a third relationship:

θA + θB + θC = π ⇔ π ≈ θA + θB

In this sense, we can formulate the general formula of the proposed similarity heuristic:

h(a, b) =  π            if φ < 0
           π − θC / 2   if φ > 0.2    (7.7)
           π − θC       otherwise

We also propose a similarity measure φ, which is given by ratios between the angles that the vectors make with each other. In other words, it represents the similarity between the additional information found by manipulating the original vectors and is given by Equation 7.8.

φ = θC/θA − θB/θA    (7.8)

The thresholds in the proposed similarity heuristic were obtained by observing the data from several experiments violating the Sure Thing Principle, including several experiments in the literature of the Prisoner's Dilemma Game and the Two Stage Gambling Game. Yukalov & Sornette [215] did something similar: they analysed the experiments violating the Sure Thing Principle and came up with a static interference term (the Interference Quarter Law) that allows them to apply their model without knowing a priori the outcome of a specific experiment. The proposed model works under similar conditions. We analysed several experiments from the literature, from different games, and mapped the trends of the data into a dynamic heuristic. So, in the end, the proposed model works under rules that enable a dynamic behaviour (after all, each experiment is unique, so there should be freedom for different quantum interferences) while also enabling the application of the model without a priori knowledge of a specific experiment. In quantum mechanics, the θ parameter corresponds to the phase of a wave. When representing a quantum state in a Hilbert space, this phase is given by the inner product between two quantum states [34]. The proposed similarity heuristic is motivated by the same idea: for two vectors representing a person's belief/action, we find the angle (or, in this case, a combination of angles) that can lead to the observed probabilities for the Prisoner's Dilemma and the Two Stage Gambling game.

7.1.3 Algorithm

Algorithm 3 presents the pseudocode of the proposed heuristic. It is given two vectors: (1) one corresponding to the entries of the full joint probability where the query variable is observed to occur (we address these probabilities as PositiveProb), and (2) another corresponding to the entries of the full joint probability where the query variable is observed not to occur (NegativeProb). One can then compute the similarity heuristic in the following way. First, one computes the Euclidean distances between the vectors, as specified in Figure 7.1.

Having the distances, one can use the law of cosines to determine the angles between all these vectors. With all this information, one can compute the heuristic measure φ of the vectors and obtain the quantum interference parameter. In the end, the algorithm returns the cosine of this value.

Algorithm 3 SimilarityHeuristic
Input: PositiveProb, vector of marginal probabilities when QueryVar occurs; NegativeProb, vector of marginal probabilities when QueryVar does not occur
Output: interf, quantum interference term

1:  // Compute Euclidean distances between vectors
2:  normc ← norm(PositiveProb − NegativeProb, 2)
3:  norma ← norm(PositiveProb, 2)
4:  normb ← norm(NegativeProb, 2)
5:  // Compute angles between vectors using the law of cosines
6:  θa ← ACos((normb² − norma² + normc²) / (2 · normc · normb))
7:  θb ← ACos((norma² − normb² + normc²) / (2 · normc · norma))
8:  θc ← ACos((norma² + normb² − normc²) / (2 · norma · normb))
9:  // Compute φ
10: φ ← θc/θa − θb/θa
11: // Apply heuristic
12: interf ← π − θc
13: if φ < 0 then
14:     interf ← π
15: end if
16: if φ > 0.2 then
17:     interf ← π − θc/2
18: end if
19: return Cos(interf)
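The algorithm can be sketched in Python as follows; this is a minimal rendition assuming 2-dimensional probability vectors, with illustrative function and parameter names:

```python
import math

def similarity_heuristic(positive_prob, negative_prob):
    """Sketch of Algorithm 3. Inputs are the vectors of full-joint entries
    where the query variable occurs / does not occur; returns cos(theta)."""
    # Euclidean distances between the vectors (Figure 7.1)
    norm_c = math.dist(positive_prob, negative_prob)
    norm_a = math.hypot(*positive_prob)
    norm_b = math.hypot(*negative_prob)
    # Inner angles of the triangle, via the law of cosines (Equations 7.3 to 7.5)
    theta_a = math.acos((norm_b**2 - norm_a**2 + norm_c**2) / (2 * norm_c * norm_b))
    theta_b = math.acos((norm_a**2 - norm_b**2 + norm_c**2) / (2 * norm_c * norm_a))
    theta_c = math.acos((norm_a**2 + norm_b**2 - norm_c**2) / (2 * norm_a * norm_b))
    # Similarity measure phi (Equation 7.8) and the heuristic rules (Equation 7.7)
    phi = theta_c / theta_a - theta_b / theta_a
    if phi < 0:
        interf = math.pi
    elif phi > 0.2:
        interf = math.pi - theta_c / 2
    else:
        interf = math.pi - theta_c
    return math.cos(interf)
```

Applied to the worked example of the next section (vectors (0.34, 0.25) and (0.16, 0.25)), φ is negative, so the function returns cos(π) = −1.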

7.1.4 Summary

The proposed model is built on observed data to perform quantum probabilistic inferences. We use a similarity heuristic, which relies on the data of the Bayesian Network to indicate the parameters that will allow us to perform quantum probabilistic inferences. One should keep in mind that this function is a heuristic: it generally provides good results in many situations (in this case, the Two Stage Gambling game and the Prisoner's Dilemma), but at the cost of occasionally not giving very accurate results [173]. In sum, the proposed model works as follows:

• Definition of a quantum-like Bayesian Network containing cause/effect relationships of a given scenario. Each node of the Bayesian Network corresponds to a binary random variable and is associated with a conditional probability table. These tables represent conditional probability distributions, which can be converted into quantum amplitudes through Born's rule.

• When performing a query on the quantum-like Bayesian Network, a set of quantum parameters emerges from the application of Equation 6.13. These parameters can be determined with the similarity heuristic, which takes into account similarities between vectors.

• The proposed similarity heuristic takes into account two 2-dimensional vectors. Each vector corresponds to one assignment of the query variable (for instance, the probability of the query being true or the probability of the query being false).

• The two features of each vector correspond to each entry of the full joint probability distribution of the Bayesian Network that has the same assignment of the query variable. For instance, all entries of the distribution that have the assignment of the query variable set to true.

• After knowing the similarities that the vectors share between them, we can apply the proposed similarity heuristic given in Equation 7.7 to obtain a θ parameter that enables the computation of the final probability value of the query.

In the next sections, we will present a full example of how the proposed Quantum-Like Bayesian Network can be applied (Section 7.2). We will also present experimental results of the proposed model applied to several works of the literature concerned with the Prisoner's Dilemma game (Section 7.3) and the Two Stage Gambling game (Section 7.4).

7.2 Example of Application

In Chapter 6, we described the proposed quantum-like Bayesian Network using the Two Stage Gambling Game as an example. In this section, we demonstrate how the quantum-like Bayesian Network, together with the similarity heuristic, can be applied to predict the average results presented in Table 4.5 for the Two Stage Gambling game. In the previous chapter, we computed the quantum-like full joint probability distribution for the Two Stage Gambling game network (Figure 6.2). The resulting full joint probability table is summarised in Table 7.2.

G1     G2         ψ(G1, G2)
win    play       √0.5 · e^{iθwin} × √0.68 · e^{iθplay} = √0.34 · e^{i(θwin+θplay)} = √0.34 · e^{iθ1}
win    not play   √0.5 · e^{iθwin} × √0.32 · e^{iθnplay} = √0.16 · e^{i(θwin+θnplay)} = √0.16 · e^{iθ2}
lose   play       √0.5 · e^{iθlose} × √0.5 · e^{iθplay} = √0.25 · e^{i(θlose+θplay)} = √0.25 · e^{iθ3}
lose   not play   √0.5 · e^{iθlose} × √0.5 · e^{iθnplay} = √0.25 · e^{i(θlose+θnplay)} = √0.25 · e^{iθ4}

Table 7.2: Full joint distribution of the Bayesian Network in Figure 6.2, representing the average results reported in the literature for the Two Stage Gambling Game (Table 4.5). The random variable G1 corresponds to the outcome of the first gamble and the variable G2 corresponds to the decision of the player to play/not play the second gamble.
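The amplitude construction of Table 7.2 can be sketched as follows; the phase values below are arbitrary placeholders, since Born's rule discards them when taking the squared magnitude:

```python
import cmath
import math

# Hypothetical phase values (any values would do for the squared magnitudes).
th_win, th_lose, th_play, th_nplay = 0.3, 0.7, 1.1, 1.9

def amp(prob, phase):
    # Convert a classical probability into a quantum amplitude: sqrt(p) * e^{i*phase}
    return math.sqrt(prob) * cmath.exp(1j * phase)

# Full joint amplitudes of Table 7.2: psi(G1, G2) = psi(G1) * psi(G2 | G1)
psi = {
    ("win", "play"):   amp(0.5, th_win) * amp(0.68, th_play),
    ("win", "nplay"):  amp(0.5, th_win) * amp(0.32, th_nplay),
    ("lose", "play"):  amp(0.5, th_lose) * amp(0.50, th_play),
    ("lose", "nplay"): amp(0.5, th_lose) * amp(0.50, th_nplay),
}
```

Taking |ψ|² of each entry recovers the classical probabilities 0.34, 0.16, 0.25 and 0.25, which sum to 1.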

After computing the full joint probability distribution, we need to marginalise the entries of this table in order to satisfy our query. Remember that in the Two Stage Gambling game, we want to compute the probability of the participant wanting to play the second gamble given that the outcome of the first gamble is unknown, Pr(G2 = play). The marginalisation consists in selecting the entries of the full joint distribution that contain the assignment play for the variable G2. This means that the first and the third rows will be selected and their squared magnitudes will be put in a vector, G2play. For normalisation purposes, we will also need to perform the marginalisation for Pr(G2 = nplay). This corresponds to the second and fourth rows of the full joint probability table, and the results will be put in another vector, G2nplay (Equation 7.9).

 √ 2    √ 2   0.34 · ei·θA 0.34 0.16 · ei·θB 0.16 G2 = = G2 = = (7.9) play  √ 2   nplay  √ 2   0.25 · ei·θC 0.25 0.25 · ei·θD 0.25

Having these vectors, we can compute all the information needed for the heuristic similarity measure φ. That is, we can compute the Euclidean distances between the vectors and their respective angles. Since we only have two random variables, we only need to compute one θ parameter. This parameter can be obtained directly by first computing the Euclidean distance between G2play and G2nplay, and then computing the inner angles of the resulting triangle (Figure 7.3).

Figure 7.3: Vector representation of vectors G2play and G2nplay, plus the Euclidean distance vector c.

The norm of vector c is given by the Euclidean distance between G2play and G2nplay.

||c|| = ||G2play − G2nplay|| = √((0.34 − 0.16)² + (0.25 − 0.25)²) = 0.1800    (7.10)

The norms of vectors G2play and G2nplay are given by:

||G2play|| = √(0.34² + 0.25²) = 0.4220    ||G2nplay|| = √(0.16² + 0.25²) = 0.2968    (7.11)

The inner angles of the triangle formed by vectors G2play, G2nplay and c can be computed from the law of cosines, as presented in Equations 7.12 to 7.14.

θA = cos⁻¹((||G2nplay||² − ||G2play||² + ||c||²) / (2 · ||c|| · ||G2nplay||)) = 2.1401    (7.12)

θB = cos⁻¹((||G2play||² − ||G2nplay||² + ||c||²) / (2 · ||c|| · ||G2play||)) = 0.6340    (7.13)

θC = cos⁻¹((||G2play||² + ||G2nplay||² − ||c||²) / (2 · ||G2play|| · ||G2nplay||)) = 0.3675    (7.14)
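The numbers in Equations 7.10 to 7.14 can be checked with a short script (tolerances account for the rounding in the text):

```python
import math

# Worked example: the two marginal vectors of Equation 7.9.
g_play, g_nplay = (0.34, 0.25), (0.16, 0.25)

c = math.dist(g_play, g_nplay)    # Equation 7.10: ||c|| = 0.1800
n_play = math.hypot(*g_play)      # Equation 7.11: ||G2_play|| ~ 0.4220
n_nplay = math.hypot(*g_nplay)    # Equation 7.11: ||G2_nplay|| ~ 0.2968

# Inner angles via the law of cosines (Equations 7.12 to 7.14)
theta_A = math.acos((n_nplay**2 - n_play**2 + c**2) / (2 * c * n_nplay))
theta_B = math.acos((n_play**2 - n_nplay**2 + c**2) / (2 * c * n_play))
theta_C = math.acos((n_play**2 + n_nplay**2 - c**2) / (2 * n_play * n_nplay))
```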

Given that φ = θC/θA − θB/θA = −0.124555 < 0, the final quantum θ parameter is computed using the first condition of Equation 7.7:

θ = π    (7.15)

In order to compute the inference Pr(G2 = play), we also need to compute the opposite probability, that is, Pr(G2 = nplay). In Equation 7.16, quantum amplitudes are represented by the symbol ψ.

Pr(G2 = play) = γ [Pr(G1 = win, G2 = play) + Pr(G1 = lose, G2 = play)
    + 2 · |ψ(G1 = win, G2 = play)| · |ψ(G1 = lose, G2 = play)| · cos(π)]    (7.16)

Pr(G2 = play) = γ [0.34 + 0.25 − 2 · √0.34 · √0.25]    (7.17)

Computing Pr(G2 = nplay) in the same way, we obtain:

Pr(G2 = play) = γ · 0.0069    Pr(G2 = nplay) = γ · 0.0100    (7.18)

The normalisation factor corresponds to:

γ = 1 / (0.0069 + 0.0100) = 1 / 0.0169 = 59.1716    (7.19)

The final probabilities are given by Equation 7.20. Note that in Table 4.5 the observed probability of a participant choosing to play was 0.42. The proposed quantum-like Bayesian Network estimated this probability to be approximately 0.41, which corresponds to a fit error percentage of 1.15%.

Pr(G2 = play) = 0.4085    Pr(G2 = nplay) = 0.5915    (7.20)
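The whole inference step (Equations 7.16 to 7.20) can be reproduced numerically; a short sketch using the values from the worked example:

```python
import math

# theta = pi, as produced by the heuristic for this example (Equation 7.15).
theta = math.pi
play = 0.34 + 0.25 + 2 * math.sqrt(0.34) * math.sqrt(0.25) * math.cos(theta)
nplay = 0.16 + 0.25 + 2 * math.sqrt(0.16) * math.sqrt(0.25) * math.cos(theta)
gamma = 1.0 / (play + nplay)   # normalisation factor, Equation 7.19
pr_play = gamma * play         # ~0.41, Equation 7.20
pr_nplay = gamma * nplay
```

Note that the two normalised probabilities sum to one by construction.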

7.3 Similarity Heuristic Applied to the Prisoner’s Dilemma Game

In this section, we apply our model to predict the results obtained for the Prisoner's Dilemma game in several works of the literature. It is common (and good) practice in cognitive science to compare the results of one's model with the results of leading comparable models. The fit error percentages that we present in the following sections would be much easier to interpret if there were other models to compare with. However, we cannot perform this comparison directly, because the current models of the literature only work for isolated experiments, as was shown for the Quantum Dynamical Model (Section 5.4) and the Quantum-Like Approach (Section 5.3). That is, each time there is a new experiment, the parameters of these models need to be tuned manually in order to perform correct predictions. We propose a general and scalable framework that is able to perform predictions in several different settings with small fit errors. In this sense, we modelled each result reported in Table 8.2 with the proposed Bayesian Network, using the proposed similarity heuristic, and obtained the results presented in Figure 7.4.

Figure 7.4: Comparison of the results obtained for different works of the literature concerned with the Prisoner's Dilemma game.

For a more detailed analysis of Figure 7.4, Table 7.3 shows the quantum θ parameters that were computed for each experiment, together with the quantum parameter that would be expected in order to achieve a 0% fit error. The fit error is a percentage value and was computed in the following way: (1 − computed probability / observed probability) × 100. In Table 7.3, the term computed probability corresponds to the column Pr(Defect) predicted and the term observed probability corresponds to the column Pr(Defect).

Literature                  Expected θ   Computed θ   Pr(Defect)   Pr(Defect) predicted   Fit Error
Shafir & Tversky [172]      2.8151       2.8057       0.6300       0.6408                 1.71
Li & Taplin [125]b          3.3033       3.0121       0.7200       0.7122                 1.09
Busemeyer et al. [39]       2.9738       3.3628       0.6600       0.7995                 21.13
Hristova & Grinberg [86]    2.8255       2.7400       0.8800       0.8968                 3.01
Average                     2.8718       2.7393       0.6400       0.7208                 12.63

Table 7.3: Analysis of the quantum θ parameters computed for each work of the literature using the proposed similarity function. Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic. b corresponds to the average of all seven experiments reported.
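The fit-error computation described above can be written as a one-line helper (`fit_error_pct` is an illustrative name) and checked against the first row of Table 7.3:

```python
def fit_error_pct(computed, observed):
    """Fit error percentage: (1 - computed/observed) * 100, in absolute value."""
    return abs(1 - computed / observed) * 100

# First row of Table 7.3: observed 0.6300, predicted 0.6408 -> ~1.71%
error = fit_error_pct(0.6408, 0.6300)
```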

In Table 7.3, one can see that the proposed similarity heuristic was able to produce good approximations to the data. The dynamic heuristic enabled different estimations of quantum interference effects for different decision problems. However, since it is a heuristic, it can sometimes lead to overestimations, which was the case for the work of Busemeyer et al. [39]. These overestimations occur due to the sensitivity of the quantum parameters: a small change in a quantum parameter can lead to a completely different probability value. This will be discussed in more depth in Section 7.3.2. As one might have noticed, the work of Crosson [55] was not taken into account in the analysis of these results. We decided to analyse those results in the next section, because they have properties that differ from the remaining works. In Crosson [55], the participants were never told the actions of the other player. The author asked the participants to first try to guess what action the other player chose and then make a decision. In another setting, participants were just asked to make a decision.

7.3.1 The Special Case of Crosson’s (2009) Experiments

In the work of Crosson [55], we used the results reported for the first two payoff matrices tested and averaged them. When trying to compute the optimum quantum θ parameter that would lead to the computation of the probability with a 0% fit error, we could not find any. There was no possible parameter that could be obtained from the two feature vectors representing the probability of choosing either a Defect action or a Cooperate action.

Crosson [55]   Guessed to Defect   Guessed to Cooperate   Unknown   Unknown Predicted   Violation of STP
Game 1         0.1700              0.6800                 0.2250    0.5877              no
Game 2         0.4700              0.6500                 0.3750    0.4390              yes
Average        0.6700              0.32                   0.3000    0.5053              yes

Table 7.4: Results for the two games reported in the work of Crosson [55] for the Prisoner's Dilemma Game under several conditions: when the action of the second player was guessed to be Defect (Guessed to Defect), when the action of the second player was guessed to be Cooperate (Guessed to Cooperate), and when the action of the second player was not known (Unknown).

As a first thought, we noticed that averaging the results could be the cause of this impossibility, because the averages are not the true probabilities of the events reported. So, we decided to analyse the outcome of each experiment of the work of Crosson [55] individually. Table 7.4 specifies those results. We analysed the individual results of Table 7.4 and, again, we could not find any quantum θ parameter that would lead to the computation of probabilities with a 0% fit error. On the contrary, the minimum fit errors found were 64.89%, 83.25% and 17.06% for Game 1, Game 2 and the average of these games, respectively. Figure 7.5 presents all possible probabilities that can be computed using the quantum law of total probability.

Figure 7.5: Possible probabilities that can be obtained from Game 1 (left), Game 2 (center) and the average of the Games of the work of Crosson [55], using the quantum law of total probability.

Analysing Game 1 (Figure 7.5, left), the probability that leads to the smallest fit error is obtained when both θ parameters are set to zero, yielding a probability of 0.4123. The observed probability reported in this experiment corresponds to 0.2250, leading to a computed fit error of 64.69%.

For Game 2 (Figure 7.5, center), when θ1 = 0 and θ2 = π, we obtain the probability that leads to the smallest fit error, 0.4390, with a fit error of 83.25%. When computing the average of both games (Figure 7.5, right), the quantum θ parameters found were θ1 = 0 and θ2 = 0. This leads to a probability of 0.4947, corresponding to a fit error of 17.06%.

7.3.2 Analysing Li & Taplin's (2002) Experiments

Table 7.5 specifies the results collected by Li & Taplin [125], which correspond to the average of the results obtained in seven different experiments for the Prisoner's Dilemma game. In this section, we analyse each of these seven experiments by trying to predict their outcome using the proposed Bayesian Network.

Li & Taplin [125]   Known Defect   Known Cooperate   Unknown   Classical Probability   Violation of STP
Game 1              0.7333         0.6670            0.6000    0.7000                  yes
Game 2              0.8000         0.7667            0.6300    0.7833                  yes
Game 3              0.9000         0.8667            0.8667    0.8834                  no
Game 4              0.8333         0.8000            0.7000    0.8167                  yes
Game 5              0.8333         0.7333            0.7000    0.7833                  yes
Game 6              0.7667         0.8333            0.8000    0.8000                  no
Game 7              0.8667         0.7333            0.7667    0.8000                  no
Average             0.8200         0.7700            0.7200    0.7950                  yes

Table 7.5: Experimental results reported in the work of Li & Taplin [125] for the Prisoner's Dilemma game under several conditions: when the action of the second player is known to be Defect (Known Defect), when the action of the second player is known to be Cooperate (Known Cooperate), and when the action of the second player is not known (Unknown). The column Violation of STP indicates whether the collected results violate the Sure Thing Principle.

The results reported for the experiments conducted by Li & Taplin [125] are presented in Table 7.5. Note that Games 3, 6 and 7 do not violate the Sure Thing Principle, because Pr(Defect) ≥ Pr(Unknown) ≥ Pr(Cooperate) or Pr(Cooperate) ≥ Pr(Unknown) ≥ Pr(Defect), that is, the probability under the unknown condition lies between the two known conditions. Additionally, the results reported for the unknown condition in Games 3, 6 and 7 are very close to classical probability theory. The goal of the study performed by Li and Taplin was to question whether there really were violations of the Sure Thing Principle in the Prisoner's Dilemma game. According to Table 7.5, three of the seven experiments did not show a violation and reported results very similar to classical probability theory. By applying the proposed quantum-like Bayesian Network to each game in Table 7.5, we obtained the results illustrated in Figure 7.6.
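The no-violation condition can be checked programmatically; a sketch using three rows of Table 7.5 (the dictionary layout is illustrative):

```python
# Rows of Table 7.5: game -> (known Defect, known Cooperate, unknown).
games = {
    1: (0.7333, 0.6670, 0.6000),
    3: (0.9000, 0.8667, 0.8667),
    6: (0.7667, 0.8333, 0.8000),
}

def violates_stp(defect, cooperate, unknown):
    # No violation when Pr(Unknown) lies between the two known-condition values.
    return not (min(defect, cooperate) <= unknown <= max(defect, cooperate))
```

Game 1 is flagged as a violation, while Games 3 and 6 are not, matching the last column of Table 7.5.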

Figure 7.6: Comparison of the results obtained for different experiments reported in the work of Li & Taplin [125] in the context of the Prisoner's Dilemma game.

The experiments with the highest fit error rates correspond to Games 2 and 6. Game 6 corresponds to a situation where the Sure Thing Principle was not being violated. This leads to the conclusion that the proposed Bayesian Network can also predict classical probabilities, but with some fit errors.

Li & Taplin [125]   Expected θ   Computed θ   Unknown   Unknown Predicted   Fit Error %
Game 1              3.0170       2.9845       0.6000    0.6313              5.21
Game 2              3.0758       3.0436       0.6300    0.7011              11.28
Game 3              2.8052       2.9810       0.8667    0.8113              6.39
Game 4              3.2313       3.0306       0.7000    0.7341              4.87
Game 5              2.8519       2.8511       0.7000    0.7006              0.08
Game 6              1.5708       2.9350       0.8000    0.7169              10.39
Game 7              3.7812       2.7365       0.7667    0.7159              6.63
Average             3.3033       2.9888       0.7200    0.7122              1.09

Table 7.6: Quantum θ parameters for the experiments reported in the work of Li & Taplin [125] for the Prisoner's Dilemma game. The highlighted entries correspond to games that do not violate the Sure Thing Principle. Expected θ corresponds to the quantum parameter that leads to the observed probability value in the experiment. Computed θ corresponds to the quantum parameter computed with the proposed heuristic.

Table 7.6 shows the quantum parameters that were computed and compares them with the parameters that would be expected in order to obtain the smallest fit error percentage. One thing worth mentioning in the computation of these quantum parameters is their sensitivity. Consider the row of Table 7.6 addressing the results of Game 2. The difference between the expected quantum parameter and the one computed using the similarity heuristic is just 0.0322. However, this small difference introduced a fit error of 11.28% in the computation of the final probabilities. Figure 7.7 illustrates the relation between the quantum θ parameter and the final probabilities that can be obtained in Li & Taplin's Game 2 and Game 6 and in the work of Busemeyer et al. [39].

Figure 7.7: Possible probabilities that can be obtained in Game 2 of the work of Li & Taplin [125] (left). Possible probabilities that can be obtained in Game 6 of the work of Li & Taplin [125] (center). Possible probabilities that can be obtained in the work of Busemeyer et al. [39] (right).

Small changes in the θ parameters can lead to completely different probability outcomes. This bears some relation to deterministic chaos, in which small differences in initial conditions yield widely diverging outcomes in a system. This sensitivity suggests how difficult the task of predicting human decisions is and how random it can be [181].

7.4 Similarity Heuristic Applied to the Two Stage Gambling Game

We applied the proposed Bayesian Network to each of the works reported in Table 4.5, and obtained the results illustrated in Figures 7.8 and 7.9.

Figure 7.8: Comparison of the results obtained for different works of the literature concerned with the Two Stage Gambling game.

Figure 7.9: Error percentage obtained in each experiment of the Two Stage Gambling game.

The overall results show very small errors. The highest error percentage was 16.3% and corresponds to the work of Kuhberger et al. [120]. Once again, this work does not show a violation of the Sure Thing Principle, reinforcing the previous conclusion that the proposed quantum-like Bayesian Network works best in situations where this violation exists. Concerning the work of Lambdin & Burdsal [122], just like with the work of Crosson [55], we could not predict the observed probabilities using the quantum framework. Figure 7.10 shows all possible probabilities that can be obtained by varying the quantum parameters. As one can see, the minimum value that can be obtained corresponds to 0.4593. However, the observed probability reported by Lambdin & Burdsal [122] corresponds to 0.41. This leads to an error of 12.02%.

Figure 7.10: Possible probabilities that can be obtained in the work of Lambdin & Burdsal [122]. The probability observed in their experiment and the one computed with the proposed quantum-like Bayesian Network are also represented.

In the work of Busemeyer et al. [45], the authors applied the quantum dynamical model to reproduce the results obtained for the Two Stage Gambling Game and also explored the use of hierarchical Bayesian methods to estimate the values of quantum parameters that simulate the player's personal profile: risk aversion, loss aversion, memory and choice. In the recent work of Busemeyer et al. [43], the authors also compare the quantum model with a classical model using the Bayes factor. They concluded that the quantum approach was preferred.

7.5 Comparing the Similarity Heuristic with other Works of the Literature

In this section, we compare the results obtained with the proposed Quantum-Like Bayesian Network against the Quantum Prospect Decision Theory [215]. Of all the analysed models, this is the only one that can be called predictive, due to its static heuristic: the Interference Quarter Law. The reason why we propose a dynamic heuristic is that every decision problem is different and, consequently, quantum interference effects should also be different and not static. In the Quantum Prospect Decision Theory, the quantum interference term in the law of total probability is fixed to 0.25 by the Interference Quarter Law. In the current model, since each decision problem is different, the proposed heuristic computes a quantum θ parameter through the similarities that the vectors make with each other, and these vectors are constructed from the experimental data. So, the vectors take into account the properties of each experiment, making it possible to compute different quantum interference terms for different decision problems. Table 7.7 shows the results obtained with the Quantum Prospect Decision Theory and with the Quantum-Like Bayesian Network for the different works of the literature that tested violations of the Sure Thing Principle in the Prisoner's Dilemma Game and the Two Stage Gambling Game.

Literature                  Pr(Defect) Observed   Pr(Defect) Computed (QPDT)   Fit Error   Pr(Defect) Computed (QLBN)   Fit Error
Shafir & Tversky [172]      0.6300                0.6550                       0.0397      0.6408                       0.0171
Li & Taplin [125]b          0.7200                0.5450                       0.2431      0.7122                       0.0108
Busemeyer et al. [39]       0.6600                0.6250                       0.0531      0.7995                       0.2113
Hristova & Grinberg [86]    0.8800                0.7000                       0.2045      0.8968                       0.0191
Tversky & Shafir [198]      0.3700                0.3850                       0.0405      0.3641                       0.0159
Kuhberger et al. [120]      0.4800                0.3450                       0.2813      0.4018                       0.1629
Lambdin & Burdsal [122]     0.4100                0.2900                       0.2927      0.4085                       0.0037
Average Fit Error           –                     –                            0.1651      –                            0.0630

Table 7.7: Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for different works of the literature reporting violations of the Sure Thing Principle. b corresponds to the average of all seven experiments reported.

In the end, the results in Table 7.7 demonstrate that, in general, the proposed Quantum-Like Bayesian Network, together with the dynamic heuristic, managed to fit the observed results of the several different experiments with an average fit error of 6.3%, whereas the Quantum Prospect Decision Theory achieved an average fit error of 16.51%. One needs to take into account that, in both the Quantum Prospect Decision Theory and the proposed Quantum-Like Bayesian Network, heuristics are used to estimate the quantum interference effects. This means that the heuristic can lead to a good fit of the data most of the time but, in some cases, can lead to completely wrong results. In the Quantum Prospect Decision Theory, for instance, one can see that the static Interference Quarter Law heuristic produced several estimations with large errors. The same applies to the proposed Quantum-Like Bayesian Network. The difference is that the latter model makes use of a dynamic heuristic. Table 7.7 shows that the proposed dynamic heuristic overestimated the results in the works of Busemeyer et al. [39] and Kuhberger et al. [120]. This also happens due to the sensitivity of the θ parameters, already discussed in Figure 7.7.

Li & Taplin [125]   Pr(Defect) Observed   Pr(Defect) Computed (QPDT)   Fit Error   Pr(Defect) Computed (QLBN)   Fit Error
Game 1              0.6000                0.4502                       0.2497      0.6313                       0.0522
Game 2              0.6300                0.5333                       0.1535      0.7011                       0.1129
Game 3              0.8667                0.6334                       0.2692      0.8113                       0.0639
Game 4              0.7000                0.5667                       0.1904      0.7341                       0.0487
Game 5              0.7000                0.5333                       0.2381      0.7006                       0.0009
Game 6              0.8000                0.5500                       0.3125      0.7169                       0.1039
Game 7              0.7667                0.5500                       0.2826      0.7159                       0.0663
Average Fit Error   –                     –                            0.2423      –                            0.0641

Table 7.8: Comparison between the Quantum Prospect Decision Theory (QPDT) model and the proposed Quantum-Like Bayesian Network (QLBN) for all the different experiments performed in the work of Li & Taplin [125].

We also applied the Quantum Prospect Decision Theory and the proposed Quantum-Like Bayesian Network to all experiments performed in the work of Li & Taplin [125]. Table 7.8 again shows a great discrepancy between the average fit errors obtained with the static heuristic of the Quantum Prospect Decision Theory and with the proposed dynamic heuristic. In general, the proposed model manages to fit all seven experiments with an average fit error of 6.41%, whereas the Quantum Prospect Decision Theory achieved an error of 24.23%. Most of the time, the Interference Quarter Law underestimated the results observed in the several experiments. This shows that having a dynamic heuristic that is able to adapt to different decision problems brings advantages in terms of predictive effectiveness.

7.6 Summary and Final Discussion

So far, we have proposed an alternative quantum structure to perform quantum probabilistic inferences that accommodates the paradoxical findings surrounding the Sure Thing Principle. We proposed a quantum-like Bayesian Network, which consists in replacing classical probabilities by quantum probability amplitudes. However, since this approach suffers from the problem of exponential growth of quantum parameters, in this chapter we proposed a similarity heuristic that automatically fits quantum parameters through vector similarities. This makes the proposed model general and predictive, in contrast to the current state-of-the-art models, which cannot be generalised to more complex decision scenarios and only provide an explanatory account of the observed paradoxes. With so many models proposed in the literature, one might ask why we need another quantum-like model to explain violations of the Sure Thing Principle. The answer can be summarised by the fact that most of the models that have been proposed in the literature cannot be considered predictive. Most of these models require a set of quantum parameters to be fitted and, so far, the only way these models have to fit the parameters is to use the final outcome of the experiment to set the parameters in order to explain that outcome. There is, however, one model in the literature that proposed a static heuristic to compute the quantum interference effects and can be called predictive: the Quantum Prospect Decision Theory, proposed by Yukalov & Sornette [215].

Since each decision problem is different, we believe that a quantum decision model would benefit from a dynamic heuristic that takes the decision problem's settings into account and produces estimates for the quantum interference parameters. In the proposed model, quantum parameters are found based on the correlations that the vectors share between them. These correlations are explored through vector similarities computed using the Law of Cosines in a vector space. In this sense, we suggest that the quantum parameters that arise from interference effects might represent some degree of similarity between events. The previous work of Moreira & Wichert [140] points out this semantic relation between vectors.

In the end, the proposed model can be seen, from a statistical point of view, as a nonparametric method for estimating interference effects. It is a statistical model that is simpler than the Quantum Dynamical Model [164] and the Quantum-Like Approach [108] previously proposed in the literature. The method makes use of the principles of Bayesian Networks in order to obtain a more general and scalable model that produces competitive results against the current state-of-the-art models. Experimental data demonstrated that the proposed heuristic produces accurate fits to the data, outperforming the previously proposed Quantum Prospect Decision Theory. This suggests that a dynamic estimation of quantum parameters is a good direction for building quantum-like predictive models.

Chapter 8

Heuristic Approaches Based on the Contents of the Data

In this chapter, we present a heuristic to model the Categorisation / Decision experiment from Busemeyer et al. [41] with a Quantum-Like Bayesian Network by representing objects (or events) in an arbitrary n-dimensional vector space, enabling their comparison through similarity functions. The computed similarity value is used to set the quantum parameters in the Quantum-Like Bayesian Network model. Just like in the work of Pothos et al. [166], we do not restrict our model to a vector in a two-dimensional space, but work in an arbitrary multidimensional space.

Previously in the literature, Busemeyer et al. [41] studied the differences between a classical Markov model and a quantum dynamical model in order to explain some violations of the laws of classical probability theory in a categorisation experiment. Participants were presented with a set of digitally modified images of faces. They first had to categorise each face as Good or Bad and then decide to either Withdraw or Attack. In the end, the proposed quantum dynamical model was able to accommodate the violations of the laws of classical probability theory by fitting the quantum parameters. This work demonstrated that quantum theory could be applied to build more general models to explain paradoxical situations found in cognitive psychology. More recently, further experiments investigating the impact of quantum interference effects on the categorisation experiment have been performed in the work of Wang & Busemeyer [205].

In this chapter, we propose an alternative way to accommodate the paradoxical findings detected in the experiments of Busemeyer et al. [41] and Townsend et al. [186] that takes into account only the contents of the images and their vector similarities. The similarity is used to fit the quantum interference parameters in the Quantum-Like Bayesian Network model.
The main advantage of the proposed Quantum-Like Bayesian Network over other cognitive models is its predictive nature and its scalability. By scalability we mean that the network structure of the proposed model is able to model more complex decision scenarios (scenarios that are modelled with several random variables). Moreover, through the representation of objects (or events) by their contents, one is able to compute vector similarities in an n-dimensional vector space and derive quantum parameters from them.

Approaching this Categorisation / Decision experiment from a quantum probabilistic point of view is also important for several reasons. For instance, in the work of Pothos & Busemeyer [164], the authors showed that a classical Markov model could not explain the violations of the Sure Thing Principle found in the experiment. Of course, one could always augment a Markov model with extra hidden states and parameterisations to model these violations. However, this would lead to an exponential increase in complexity. Quantum probability theory is important for this reason. The geometric representation of events, which is present in quantum probability, does not exist in a classical setting. The main advantage of this geometric representation is that it allows the rotation from one basis into another in order to contextualise and interpret events, providing great flexibility to decision-making systems. Moreover, since quantum probability deals with probability amplitudes, the computation of probabilities introduces a nonlinearity. In the end, the nonlinearity introduced in the computation of quantum probabilities through quantum interference effects leads to a more general, flexible and scalable model that enables the representation of violations of the Sure Thing Principle.

8.1 A Vector Similarity Model to Extract Quantum Parameters

In the current literature, quantum parameters must be assigned manually in order to accommodate fallacies and obtain a prediction. In this work, we attempt to extend the paradigm of Quantum-Like Bayesian Networks towards a predictive model by representing events (in this case, images) as n-dimensional vectors and using these vector similarities to find the quantum θ parameters. This vector representation is similar to the approach proposed in the work of Pothos et al. [166], where the authors represent a person's beliefs/actions in an n-dimensional vector space and the similarity between vectors is measured by a projection operator, which corresponds to the computation of the squared length of the projected vector.

The reader might wonder why we should expect the θ parameter computed from vector similarities to correspond to the interference term in the Quantum-Like Bayesian Network. In the book of Busemeyer & Bruza [34], it is stated that the θ parameter that arises in quantum interference effects corresponds to the phase of the angle of the inner product between the projectors of two random variables. They also state that the inner product provides a measure of similarity between two vectors (where each vector corresponds to a superposition of events). If the vectors are of unit length, then the Cosine Similarity collapses to the inner product. Also, in the work of Trueblood et al. [190], the authors mention that similarity is understood as a function of the distance between two concepts in a psychological space. Given all these relations, we assume that the similarities computed between two vectors representing images of faces can be used to set the quantum interference parameters, since both amount to computing the inner product between two random variables; consequently, we assume a mathematical equivalence between the θ parameters computed from similarities and the quantum θ parameters corresponding to the interference terms in the quantum Bayesian model.
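The claim that, for unit-length vectors, the Cosine Similarity collapses to the inner product can be checked numerically. The vectors below are illustrative and not taken from the face dataset:

```python
import math

# Two illustrative belief vectors in a 4-dimensional space.
a = [0.3, 0.8, 0.1, 0.5]
b = [0.7, 0.2, 0.9, 0.4]

norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(x * x for x in b))

# Cosine similarity of the raw vectors.
cos_sim = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

# Inner product of the unit-length versions of the same vectors.
inner_unit = sum((x / norm_a) * (y / norm_b) for x, y in zip(a, b))

# For unit-length vectors the Cosine Similarity equals the inner product.
assert abs(cos_sim - inner_unit) < 1e-12
```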

8.1.1 Using Cosine Similarity to Determine Quantum Parameters

Cosine Similarity is a metric that measures the similarity between two n-dimensional vectors through the cosine of the angle between them. It is a widely used metric in several research fields, especially in Information Retrieval [199]. Given two n-dimensional vectors A and B, the Cosine Similarity measure is given by:

cosine_sim(A, B) = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}   (8.1)

When mapping an image into an n-dimensional space, since the pixels of the images are always positive numbers between 0 and 1 (or between 0 and 255, depending on the scale), these vectors will always share a similarity angle with cos(θ) ∈ [0, 1]. This implies that, when using this value in the quantum interference term of the Quantum-Like Bayesian Network inference formula (Equation 11.27), the quantum probabilities will converge (or be very close) to the classical probability. In the previous work of Yukalov & Sornette [215], the authors noticed that, in order to accommodate the violations of the Sure Thing Principle, the quantum interference term was required to be negative. In this sense, we applied a normalisation to the vector representation of the images of the faces such that the new re-scaled vectors yield cos(θ′) ∈ [−1, 0]. By doing this, we re-scale the vectors so that they cover the negative part of the vector space, enabling the occurrence of destructive quantum interference. The re-scaling formula applied corresponds to the Min-Max Normalisation formula, which is widely used in many different research fields, especially in Information Retrieval [199]. Note that the applied renormalisation also enables constructive interference: it spans the entire vector space, producing both negative and positive interference terms for each image representation. In this chapter, we focus on destructive interference, because these are the interferences necessary to accommodate violations of the Sure Thing Principle [213].

Y_i = MinMaxNorm(X_i) = \frac{X_i - \min(X)}{\max(X) - \min(X)} \cdot (new_{max} - new_{min}) + new_{min}   (8.2)

Equation 8.2 transforms a value X_i into Y_i, which falls in the range [new_min, new_max]. Figure 8.1 illustrates the re-scaling process.
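A sketch of how Equations 8.1 and 8.2 work together: the cosine similarity of two positive pixel vectors falls in [0, 1], and the Min-Max formula re-scales it into [−1, 0] so that destructive interference becomes possible. The function names and example vectors are ours, for illustration only:

```python
import math

def cosine_sim(a, b):
    """Cosine Similarity between two n-dimensional vectors (Equation 8.1)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def min_max_norm(x, x_min, x_max, new_min=-1.0, new_max=0.0):
    """Min-Max Normalisation (Equation 8.2), defaulting to the [-1, 0] range."""
    return (x - x_min) / (x_max - x_min) * (new_max - new_min) + new_min

# Two illustrative pixel vectors with entries in [0, 1].
v1 = [0.2, 0.9, 0.4, 0.7]
v2 = [0.8, 0.1, 0.6, 0.3]

sim = cosine_sim(v1, v2)
assert 0.0 <= sim <= 1.0          # positive pixels: cos(theta) in [0, 1]

rescaled = min_max_norm(sim, 0.0, 1.0)
assert -1.0 <= rescaled <= 0.0    # negative range enables destructive interference
```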

Figure 8.1: Vector normalisation to obtain quantum destructive interferences.

This vector representation of images (and events) also opens a new direction for the exploration of

semantic similarities between concepts [140, 139]. From the quantum mechanics point of view, quantum parameters that arise from interference effects represent the shift of energy waves. From a quantum cognitive perspective, through this vector representation, they can be interpreted as correlations between events (beliefs) and the semantic relationships that those events share. In the next section, we present the experiment from Busemeyer et al. [41] and show how to apply the Quantum-Like Bayesian Network with the proposed vector similarity model in order to accommodate and predict violations of the Sure Thing Principle.

8.2 Application to the Categorisation-Decision Experiment

In this section, we show how the Quantum-Like Bayesian Network model can be applied to predict the results obtained in the empirical experiments of Busemeyer et al. [41].

8.2.1 Categorisation - Decision Making Experiment

In the work of Busemeyer et al. [41], the authors analysed the formalisms of quantum mechanics in order to describe the evolution of the cognitive process from the presentation of a decision problem to the actual decision. They performed an empirical experiment based on interactions between categorisation and decision making. This experiment served as an empirical test to compare classical Markov models with quantum models. The experiment showed a violation of the laws of probability theory when comparing the probability of choosing a decision alone with the probability of making a categorisation followed by a decision. The authors proposed a Quantum Dynamical Model that takes time evolution into account through the usage of Schrödinger's equation and unitary operators. A recent article by Yearsley & Pothos [209] provides an interesting discussion of the classical notion of time under a quantum mechanical perspective.

Figure 8.2: Example of Wide faces used in the experiment of Busemeyer et al. [41].
Figure 8.3: Example of Narrow faces used in the experiment of Busemeyer et al. [41].

The proposed experiment was the following. Given a set of images of faces, the participants had to categorise them as Good / Bad and then make a decision towards that face: either Attack or Withdraw. The Narrow faces had a 60% chance of belonging to the Lork group and a 40% chance of belonging to the Adok group. The Wide faces had a 40% chance of belonging to the Lork group and a 60% chance of belonging to the Adok group. Moreover, the Lork group is considered to be more hostile and therefore

for 70% of them the right decision was to Attack; for the remaining 30%, the right decision was to Withdraw. For the Adok group, since they are considered more friendly, for 70% of the faces the right decision was to Withdraw and for the remaining 30% it was to Attack. Figure 8.4 illustrates the distribution of the faces.

Figure 8.4: Summary of the probability distribution of the Good / Bad faces in the experiment of Busemeyer et al. [41].

The participants were divided into groups and had to perform four different tasks:

• One group had to perform first a categorisation and then make a decision (the "C-then-D" condition);

• One group had to first make a decision and then perform a categorisation (the "D-then-C" condition);

• One group had to just make the decision (the "D-Alone" condition);

• One group had to just perform the categorisation (the "C-Alone" condition).

The main results obtained with this experiment are presented in Table 8.1. In this table, Pr(G) is the probability of a participant categorising a face as Good. Pr(A | G) is the probability of a participant deciding to Attack, given that the face was categorised as Good. Pr(B) is the probability of a participant categorising a face as Bad. Pr(A | B) is the probability of choosing an Attack action, given that the face was categorised as Bad. Total Prob corresponds to the total probability computed through the formula Pr(G)Pr(A|G) + Pr(B)Pr(A|B). Finally, Pr(A) corresponds to the total probability observed in the experiment. For the experiment to be consistent with the law of total probability, the values in the columns Total Prob and Pr(A) should be similar.

Empirical Experiment [41]     C-then-D                                          D-Alone
                              Pr(G)    Pr(A|G)    Pr(B)    Pr(A|B)    Total Prob    Pr(A)
Wide Face                     0.82     0.36       0.18     0.53       0.39          0.39
Narrow Face                   0.19     0.43       0.81     0.63       0.59          0.69

Table 8.1: Empirical data collected in the experiment of Busemeyer et al. [41].

For the Wide faces, the classical law of total probability was not violated, since the probability of choosing an Attack action alone is the same as the probability of Attack computed using the law of total probability. However, when we look at the results obtained with the Narrow faces, one can see that these probabilities are significantly different. When computing the probability of making an Attack with the law of total probability, the computed probability ended up at 59%. When computing

the same probability in the D-Alone condition, this probability increased to 69%. This deviation in the results suggests a violation of the Sure Thing Principle and leads to a violation of the law of total probability. Next, we present an experimental simulation of the Categorisation / Decision experiment performed by Busemeyer et al. [41] using Quantum-Like Bayesian Networks and the proposed vector similarity model.
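The deviation can be reproduced arithmetically from the entries of Table 8.1:

```python
# Values from Table 8.1 (empirical experiment of Busemeyer et al. [41]).
# Law of total probability: Pr(A) = Pr(G)Pr(A|G) + Pr(B)Pr(A|B).

# Wide faces: the law holds (computed total matches observed Pr(A) = 0.39).
total_wide = 0.82 * 0.36 + 0.18 * 0.53
assert abs(total_wide - 0.39) < 0.005

# Narrow faces: the law is violated.
total_narrow = 0.19 * 0.43 + 0.81 * 0.63   # computed total probability, ~0.59
assert abs(total_narrow - 0.59) < 0.005

observed_narrow = 0.69                     # observed Pr(A) in the D-Alone condition
assert abs(observed_narrow - total_narrow) > 0.09   # roughly a 10-point deviation
```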

8.2.2 Modelling the Problem using Quantum-Like Bayesian Networks

The results observed in the empirical experiments of Busemeyer et al. [41] can be represented in a Bayesian Network, as shown on the left side of Figure 8.5 for Narrow faces and on the right side of the same figure for Wide faces. The Bayesian Network specifies both classical probabilities (Pr(X)) and quantum probability amplitudes (ψ_x).

Figure 8.5: Representation of the Narrow faces experiment (left) and Wide faces experiment (right) in a Bayesian Network with classical probabilities and quantum amplitudes. The classical probabilities are given by Pr(X) and the quantum amplitudes by ψ_x.

8.2.3 Computation of the Probability of Narrow Faces

The first step in computing Bayesian inference consists of calculating the quantum version of the full joint probability distribution, which corresponds to Equation 11.26. From the quantum-like full joint probability distribution, one can easily compute the probability of Attack in the following way (using Equation 11.27). Note that, in order to simplify the notation, we use the letter a instead of Attack, g for Good, b for Bad and w for Withdraw. Also, the γ parameter is the normalisation factor and corresponds to γ = [Pr(Attack) + Pr(Withdraw)]^{-1}:

Pr_narrow(Attack) = γ |ψ(C = g)ψ(D = a|C = g) + ψ(C = b)ψ(D = a|C = b)|²   (8.3)

Pr_narrow(Attack) = γ [ |ψ(C = g)ψ(D = a|C = g)|² + |ψ(C = b)ψ(D = a|C = b)|² + Interf_A ]   (8.4)

132 Where,

Interf_A = 2 · |ψ(C = g)ψ(D = a|C = g)| · |ψ(C = b)ψ(D = a|C = b)| · cos(θ_{a,g} − θ_{a,b})

In order to determine the normalisation factor γ, one also needs to compute the probability Pr_narrow(Withdraw) in the same way:

Pr_narrow(Withdraw) = γ |ψ(C = g)ψ(D = w|C = g) + ψ(C = b)ψ(D = w|C = b)|²   (8.5)

Pr_narrow(Withdraw) = γ [ |ψ(C = g)ψ(D = w|C = g)|² + |ψ(C = b)ψ(D = w|C = b)|² + Interf_W ]   (8.6)

Where,

Interf_W = 2 · |ψ(C = g)ψ(D = w|C = g)| · |ψ(C = b)ψ(D = w|C = b)| · cos(θ_{w,g} − θ_{w,b})

The computation of the probabilities for the Wide faces is performed in an analogous way, but using the Quantum-Like Bayesian Network in Figure 8.5.
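Equations 8.3 to 8.6 can be sketched as follows, assuming the quantum amplitudes are the square roots of the classical probabilities in Figure 8.5 and, for simplicity, a single phase difference per branch. The function names are ours, not the thesis's notation:

```python
import math

def two_path_prob(pr_g, pr_d_given_g, pr_d_given_b, theta):
    """Unnormalised quantum probability of one decision branch
    (Equations 8.4 / 8.6): |psi_g psi_{d|g}|^2 + |psi_b psi_{d|b}|^2 + Interf."""
    amp_g = math.sqrt(pr_g * pr_d_given_g)          # |psi(C=g) psi(D=d|C=g)|
    amp_b = math.sqrt((1.0 - pr_g) * pr_d_given_b)  # |psi(C=b) psi(D=d|C=b)|
    interf = 2.0 * amp_g * amp_b * math.cos(theta)
    return amp_g ** 2 + amp_b ** 2 + interf

def pr_attack(pr_g, pr_a_g, pr_a_b, theta_a, theta_w):
    """Normalised Pr(Attack), with gamma = [Pr(Attack) + Pr(Withdraw)]^{-1}."""
    attack = two_path_prob(pr_g, pr_a_g, pr_a_b, theta_a)
    withdraw = two_path_prob(pr_g, 1.0 - pr_a_g, 1.0 - pr_a_b, theta_w)
    return attack / (attack + withdraw)

# Narrow-face probabilities from Table 8.1 (C-then-D condition).
# With theta = pi/2 the interference terms vanish and the model recovers
# the classical law of total probability: 0.19*0.43 + 0.81*0.63 = 0.592.
classical = pr_attack(0.19, 0.43, 0.63, math.pi / 2, math.pi / 2)
assert abs(classical - 0.592) < 1e-9
```

A negative cosine in the Attack branch lowers Pr(Attack) below the classical value, while a phase difference of zero raises it, which is how the interference term accommodates the observed deviations.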

In order to determine the quantum parameters Interf_A and Interf_W, we converted all images of the dataset used in the experiment of Busemeyer et al. [41] into binary images, using the conversion threshold of 0.4, which is analysed in Section 8.2.5. The quantum interference term was set using the cosine similarity between the images of the dataset. In the next section, we present in more detail how the quantum interference terms were computed using the images of the dataset from the experiments of Busemeyer et al. [41] and the proposed similarity measure.

8.2.4 Computing Quantum Interference Terms

The quantum interference terms were obtained through vector similarities over a dataset of images of faces. The dataset used in our simulations is the same as the one used in the experiments of Busemeyer et al. [41]: it consists of 17 digitally modified Narrow faces (face shape narrowed and lips enhanced) and 17 digitally modified Wide faces (face shape widened and lips thickened). The images of the dataset used in the experiments of Busemeyer et al. [41] were in grayscale; that is, they are represented by three matrices containing pixel information for the RGB colour scheme. In order to represent an image as an n-dimensional vector, we converted the images into black and white to keep the main information of the image simple. This means that the image is represented by a single matrix in which the pixels are either 0 or 1. We made this conversion in order to obtain two different types of images: one that enhances the thickness of the lips and other features of the face (such as eyes and nose), and another that diminishes the impact of these features. Figure 8.6 shows an example of the conversion of a dataset image into binary (black/white) images with the features either enhanced or reduced. The main motivation was to test whether the content information of the image plays any role in finding the quantum θ parameters.
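A minimal sketch of the grayscale-to-binary conversion described above. This is not the thesis's preprocessing code; the patch values are illustrative:

```python
import numpy as np

def to_binary(gray_image, threshold=0.4):
    """Convert a grayscale image (pixel values in [0, 1]) into a binary
    0/1 matrix: pixels above the threshold become white (1)."""
    return (np.asarray(gray_image) > threshold).astype(int)

# Illustrative 2x3 grayscale patch, not taken from the face dataset.
patch = [[0.1, 0.5, 0.9],
         [0.3, 0.45, 0.7]]
binary = to_binary(patch, threshold=0.4)
assert binary.tolist() == [[0, 1, 1], [0, 1, 1]]
```

The binary matrix can then be flattened into the n-dimensional vector used by the similarity computation.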

Figure 8.6: Conversion of a dataset image into a binary image. Conversion with a small threshold (left). Conversion with a high threshold (right).

In the work of Busemeyer et al. [41], the authors randomly selected a set of faces to be categorised as Good or as Bad according to some digitally modified features. In our work, since we did not have access to the information of which faces were classified as Good or as Bad, we decided to perform a simulation similar to the work of Busemeyer et al. [41]. For each simulation performed, we created several samples of this dataset, in which we randomly selected 70% of the Narrow faces to be considered Bad and 30% to be considered Good, just as presented in Figure 8.4. In the same way, we randomly selected 70% of the Wide faces to be Good and 30% to be Bad. For each image of the dataset, we converted the black/white face images into n-dimensional vectors. From this, we measured the similarity between every single image of the dataset and each of the faces classified as Bad, and each of the images classified as Good. This represents the categorisation performed by each participant: given a face, he/she would have to categorise it as either Good or Bad. The similarity was computed through the Cosine Similarity function (Equation 8.1). The computed value was used to set the quantum interference terms in the Quantum-Like Bayesian Network, and the final probability of the participant deciding to Attack was computed through Equation 11.27. In the experiment performed by Busemeyer et al. [41], this information was randomly assigned to each face of the dataset and, based on that information, the authors measured the probability of a participant answering the questions correctly. So, each face already had a classification attached to it a priori. We simulated this classification by randomising the dataset 100 times and considering different faces to be either Good or Bad according to the percentages used in the original study. The mean value of the probabilities over all simulations was computed.
Originally, this randomisation was the cause of the levels of uncertainty in the study of Busemeyer et al. [41], leading to a violation of the laws of probability theory. In other words, one can see this uncertainty as a cause for the emergence of quantum interference effects. In the same way, we use the differences between the contents of the images, when compared to an image previously classified as either Good or Bad (since we do not know which classifications were attributed to each face), in order to raise uncertainty and to measure the quantum effects. Note that some readers might think that the quantum interference effects occur because two images can be incompatible. However, one should take into account that incompatibility only means that (under a quantum-like point of view) each of the questions has to be represented in a different basis; it does not mean that the two images are incompatible (in the experiment, people answered two incompatible questions about the same image).

8.2.5 The Impact of the Conversion Threshold

In this experiment, we also wanted to verify how the conversion of the images influences the probability of an Attack action for the Narrow faces. We varied the conversion threshold within the interval [0.2, 0.8]. Figure 8.7 shows an example of how the images vary according to the conversion threshold.

Figure 8.7: Impact of the threshold when converting an image into a binary image. The threshold ranges from 0.2 (left) to 0.8 (right).

For each threshold, we analysed the respective probability distributions using histograms. We also fitted a normal probability distribution curve in order to check whether the frequency of choosing the action Attack is distributed around the mean value. This plays an important role in choosing the conversion threshold that best describes the probability distribution of our data. Figures 8.8 to 8.14 show the histograms of the experiments for each threshold with their respective normal probability density curves.

Regarding this model, a legitimate question that one can pose concerns the assumption that the angles computed through vector similarities can be used as quantum interference terms. This concern can be addressed by taking into account how the dataset was built. In the experiment of Pothos & Busemeyer [164], the authors performed digital modifications on the dataset in order to enhance the properties that they wanted participants to perceive during the experimental setup: (1) enhance the narrowness or wideness of the shape of the faces and (2) enhance the lips, making them thicker for narrow faces or thinner for wide faces. By making these digital modifications to the dataset, the authors were already introducing some kind of categorisation in the images of the faces: narrow faces tend to be part of the Lork group, which has a higher probability of being aggressive; and wide faces tend to be part of the Adok group, which is more friendly. By manipulating the images' pixels, the authors were already encoding some kind of semantic / psychological information. By comparing the image vector information of a face categorised as Good with that of a face categorised as Bad, we are measuring the phase of the angle of the inner product between the projectors of two random variables, and this is precisely the definition of the quantum interference term given in the book of Busemeyer & Bruza [34].
Moreover, the usage of vector similarities to represent quantum interference has already been applied in previous literature, such as in the work of Pothos et al. [166], where the authors represent concepts in a multidimensional vector space and measure similarities between them.

From these histograms, one can observe that the probability distributions of Figures 8.8 and 8.13 are slightly skewed. That is, the highest occurrence of the probability of choosing the Attack action is shifted either to the left or to the right of the mean value of the distribution. On the other hand, the histograms that are best described by a normal distribution fit are those of Figures 8.10, 8.11 and 8.12. Note that,

Figure 8.8: Distribution of Pr(Attack) using threshold 0.2.
Figure 8.9: Distribution of Pr(Attack) using threshold 0.3.
Figure 8.10: Distribution of Pr(Attack) using threshold 0.4.
Figure 8.11: Distribution of Pr(Attack) using threshold 0.5.

Figure 8.12: Distribution of Pr(Attack) using threshold 0.6.
Figure 8.13: Distribution of Pr(Attack) using threshold 0.7.
Figure 8.14: Distribution of Pr(Attack) using threshold 0.8.

in the end, for each threshold, the mean values are located between 0.64 and 0.65. This means that the choice of the conversion threshold does not have a significant impact on the final outcome of the results. However, some thresholds lead to a probability distribution closer to a normal probability density function (which is the case of Figures 8.10, 8.11 and 8.12). For this reason, we chose the threshold that leads to the highest mean value of choosing an Attack action: Figure 8.10, which corresponds to a conversion threshold of 0.4, that is, a threshold that slightly diminishes the features of the images. Figure 8.15 illustrates how the probabilities are distributed for the 100 samples tested using a conversion threshold of 0.4.

8.2.6 Results and Discussion

The results obtained after running the simulations described in the previous sections are presented in Table 8.2. In the experiments performed by Busemeyer et al. [41], only the Narrow faces experiment presented a violation of the Sure Thing Principle; the Wide faces experiment presented the same results as the classical probability, so no violation occurred.

In the previous work of Busemeyer et al. [41], the authors present a Quantum Dynamical Model (QDM) to perform quantum time evolution. This model requires the creation of a doubly stochastic matrix, which represents the rotation of the participants' beliefs (it can favour either an Attack action or a Withdraw action). The double stochasticity is a requirement in order to preserve unit-length operations

Figure 8.15: Probability distribution of the 100 simulations performed when converting a grayscale image into a binary one with a threshold of 0.4.

Literature       Pr(Attack) Observed   Pr(Attack) Classical   Pr(Attack) QDM   Error QDM   Pr(Attack) QLBN   Error QLBN
Narrow Faces     0.69                  0.59                   0.74             5.00%       0.65              3.94%
Wide Faces       0.39                  0.39                   0.39             0.00%       0.35              4.81%

Table 8.2: Results from the application of the Quantum-Like Bayesian Network (QLBN) model to the Categorisation / Decision experiment, compared with the Quantum Dynamical Model (QDM) proposed in the work of Busemeyer et al. [41].

and to obtain a probability value that does not require normalisation. The time evolution of the model simulates the participants' deliberation process until a final decision is reached, and is modelled using Schrödinger's equation. To obtain the observed results depicted in Table 8.2, the authors had to fit the parameters of their model to this outcome. In the end, the Quantum Dynamical Model proposed in Busemeyer et al. [41] obtained an error percentage of 5.00% for the Narrow faces and 0.00% for the Wide faces experiment. Note that the Quantum Dynamical Model fits three parameters in order to estimate four data points (the first four entries of Table 8.1).

With the proposed Quantum-Like Bayesian Network, together with the geometric representation of events, we were able to build a quantum-like model with a predictive nature. We also make use of quantum interference effects in order to explain the violations of the Sure Thing Principle, but the model is predictive because the parameter fitting of the quantum model is obtained through geometric similarities. By predictive we mean that the model does not require any a priori knowledge about the outcome of the experiment in order to accommodate the violations of the Sure Thing Principle. In the proposed model, the contents of the images (the pixels) are represented in an n-dimensional vector space. From this representation, we computed the geometric similarity between them through the cosine similarity measure. Since the contents of the images (pixels) are always positive (ranging between 0 and 1), it was required to renormalise this information in order to obtain quantum interference effects.
Taking this normalisation into account in the computation of the final quantum probabilities, we could predict the observed results with an error percentage of 3.94%. This preliminary result suggests that there could be a relation between quantum parameters and the semantic and geometric representation of events. We are aware that this is just a preliminary conclusion, and more experiments in this direction need to be conducted. Concerning the Wide faces, no violations of the Sure Thing

Principle were reported, since the probability of Attack in the D-Alone condition was the same as the one computed using the law of total probability. In this case, the proposed model was able to predict the result with an error percentage of 4.81%. In the end, the proposed similarity model tends to be more effective in decision scenarios that violate the Sure Thing Principle. Moreover, the Quantum-Like Bayesian Network did not obtain significantly different error rates when compared to the Quantum Dynamical Model. This means that the proposed model tends to have a similar performance to state-of-the-art models, with the advantage of being able to estimate the quantum interference parameters. This makes the model general, scalable and predictive.

Regarding scalability, it is well known that the computational cost of performing probabilistic inference in Bayesian Networks grows with the number of random variables [116]. That is why, for very complex decision scenarios, approximate methods are used to perform probabilistic inference [151]. Bayesian Networks are decision support systems that are frequently used to model complex decision scenarios: in bioinformatics, where many scenarios involve dealing with an exponential number of genes [218]; in medical decision support, where Bayesian Networks are used to compute the probability of a patient having cancer given several conditions [88]; in spam filters, where the probability of some textual content being spam is computed [169]; and so on. This is what we mean by generalisation: Bayesian Networks are widely accepted structures in the literature because they can deal with a large number of random variables and be applied in different decision scenarios.

8.3 Algorithm

Algorithm 4 presents the pseudocode of the proposed heuristic. The algorithm receives 5 elements as input. From the analysis of the dataset, we need to provide the upper and lower bounds to be used in the min/max normalisation formula presented in Equation 8.2. These upper and lower bounds represent the mean of the maximum / minimum degree of similarity between all the images of the dataset and the reference images representing the categorisation of a Good face and a Bad face, respectively. In addition, we need to provide three vectors to the algorithm: one corresponding to the contents of a base image representing an example of a face categorised as Good (ReferenceVecA), another vector representing an example of a face categorised as Bad (ReferenceVecB), and a vector representation of an image we wish to categorise (DatabaseVector).

Note that the quantum-like interference term in Equation 6.13 consists of the cosine of the difference between two interference parameters: cos(θi − θj). In our approach, we obtain the interference term by first computing the cosine similarities between the input image (DatabaseVector) and an example of an image categorised as Good and as Bad, and then by taking their difference. Since the pixels of images are always positive, we need to renormalise the vector space using the min/max Equation 8.2 in order to cover negative interferences. For this, we use the general information about the distribution of the dataset given as input: DB_UPPERBOUND and DB_LOWERBOUND. After renormalising the results, the computed interference is returned.

Algorithm 4 Content-Based Heuristic
Input: ReferenceVecA, N-dimensional vector of an example of an object of type A;
ReferenceVecB, N-dimensional vector of an example of an object of type B;
DatabaseVector, N-dimensional vector representing an instance of an object of unknown type;
DB_UPPERBOUND, maximum similarity value obtained from prior analysis of the dataset;
DB_LOWERBOUND, minimum similarity value obtained from prior analysis of the dataset
Output: interf, quantum interference term

1: // To normalise values between 0 and −1
2: MIN_NORM ← 0
3: MAX_NORM ← −1
4: // Apply cosine similarity between the database object and reference object A
5: Cos_vecA ← (Σ_{i=1}^{N} ReferenceVecA[i] · DatabaseVector[i]) / (√(Σ_{i=1}^{N} ReferenceVecA[i]²) · √(Σ_{i=1}^{N} DatabaseVector[i]²))
6: // Apply cosine similarity between the database object and reference object B
7: Cos_vecB ← (Σ_{i=1}^{N} ReferenceVecB[i] · DatabaseVector[i]) / (√(Σ_{i=1}^{N} ReferenceVecB[i]²) · √(Σ_{i=1}^{N} DatabaseVector[i]²))
8: // Re-normalise the similarities so that they cover the entire vector space
9: Cos_vec_diff ← Cos_vecA − Cos_vecB
10: interf ← (Cos_vec_diff − DB_LOWERBOUND) / (DB_UPPERBOUND − DB_LOWERBOUND) · (MAX_NORM − MIN_NORM) + MIN_NORM
11: // Return the re-normalised cosine similarity measure as the quantum interference term
12: return interf
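The heuristic can be sketched in Python as follows; the function and variable names mirror the algorithm's inputs but are otherwise our own, and the bounds passed in the example call are illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def content_based_interference(reference_vec_a, reference_vec_b, database_vector,
                               db_upper_bound, db_lower_bound,
                               min_norm=0.0, max_norm=-1.0):
    """Difference of cosine similarities, min/max-renormalised (Algorithm 4 sketch)."""
    cos_diff = (cosine(reference_vec_a, database_vector)
                - cosine(reference_vec_b, database_vector))
    # Standard min/max normalisation into the [min_norm, max_norm] range
    return ((cos_diff - db_lower_bound) / (db_upper_bound - db_lower_bound)
            * (max_norm - min_norm) + min_norm)

# Illustrative call: an input identical to reference A, with bounds assumed to be [-1, 1]
interf = content_based_interference([1, 0], [0, 1], [1, 0],
                                    db_upper_bound=1.0, db_lower_bound=-1.0)
```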

8.4 Summary and Final Discussion

In this chapter, we model the Categorisation/Decision experiment of Busemeyer et al. [41] with a Quantum-Like Bayesian Network, which represents objects (or events) in an arbitrary N-dimensional vector space. This enables their comparison through similarity functions. The computed similarity value is used to compute the quantum interference terms. Just like in the work of Pothos et al. [166], we are not restricting our model to a vector in a multidimensional psychological space, but to an arbitrary multidimensional space. In this model, we are assuming that the similarities computed between two vectors representing the images of faces can be used to set quantum interference parameters, since both consist in the computation of the inner product between two random variables. This suggests a mathematical equivalence between the θ parameters computed from Cosine similarity and the quantum θ parameters corresponding to the quantum interference effects. This assumption comes from the book of Busemeyer & Bruza [34], where it is stated that the θ parameter that arises in quantum interference effects corresponds to the phase of the angle of the inner product between the projectors of two random variables. They also state that the inner product provides a measure of similarity between two vectors (where each vector corresponds to a superposition of events). If the vectors are unit length, then the Cosine Similarity collapses to the inner product. Given all these relations, we can assume that the similarities computed between two vectors representing the images of faces can be used to set quantum interference parameters. The results of the simulations of the experiment of Busemeyer et al. [41] demonstrated that the

proposed heuristic was able to reproduce the experimental observations of the violations of the Sure Thing Principle with a small error percentage. We are aware that this is just a preliminary result, and more experiments and studies are needed in this direction in order to verify the applicability of this type of modelling in decision problems that violate the Sure Thing Principle.

Chapter 9

Heuristical Approaches Based on Semantic Similarities

In this chapter, we explore the implications of causal/dependency relationships in quantum-like Bayesian Networks and also the implications of semantic similarities between events [140]. These semantic similarities add new relationships to the graphical models, which do not necessarily include direct causal relationships, but introduce semantic dependencies between random variables. We will use this additional semantic information to compute quantum interference effects in order to accommodate the violations of the Sure Thing Principle. In Nature, most events are reduced to the principle of causality, which is the connection of phenomena where the cause gives rise to some effect. This is the philosophical principle that underlies our conception of natural law [87]. In other words, the essence of causality is the generation and determination of one phenomenon by another. Causality enables the representation of our knowledge regarding a given context through experience. By experience, we mean that the observation of the relationships between events enables the detection of irrelevancies in the domain. This will lead to the construction of causal models with minimised relationships between events (Bayesian Networks are examples of causal models) [161]. Under the principle of causality, two events that are not causally connected should not produce any effects. When some acausal events occur by producing an effect, it is called a coincidence. Carl Jung believed that all events had to be connected to each other, not in a causal setting, but rather in a meaningful way, suggesting some kind of semantic relationship between events. This notion is known as the Synchronicity principle [87]. The proposed heuristic is inspired by the Synchronicity principle and enables the computation of an exponential number of quantum parameters in a meaningful (semantic) way.
Under this representation, quantum interferences can code semantical similarities that generate acausal connections between events, thereby accommodating the paradoxical findings related to violations of the Sure Thing Principle. Note that the idea of combining Synchronicity with quantum theory is not new. For instance, Blutner & Hochnadel [27] proposed a quantum model to represent Jung's theory of personality. A four-dimensional Hilbert space representation with two qubits was used together with quantum projection operators. Limar [126] studied Jung's Synchronicity and established a relation between this principle and the phenomena.

9.1 Synchronicity: an Acausal Connectionist Principle

The Synchronicity principle may occur as a single event or a chain of related events and can be defined by a significant coincidence which appears between a mental state and an event occurring in the external world [130]. Jung believed that two acausal events did not occur by chance, but rather by a shared meaning. Therefore, in order to experience a synchronised event, one needs to extract the meaning of its symbols for the interpretation of the synchronicity. So, the Synchronicity principle can be seen as a correlation between two acausal events which are connected through meaning [87]. Jung defended that the connection between a mental state and matter is due to the energy that emerges from the emotional state associated with the synchronicity event [87]. This metaphysical assertion was based on the fact that it is the person's interpretation that defines the meaning of a synchronous event. This implies a strong relation between the extraction of the semantic meaning of events and how one interprets them. If there is no semantic extraction, then there is no meaningful interpretation of the event, and consequently, there is no synchronicity [127]. It is important to mention that the Synchronicity principle is a concept that does not question or compete with the notion of causality. Instead, it maintains that just as events may be connected by a causal line, they may also be connected by meaning. A grouping of events attached by meaning does not need to have an explanation in terms of cause and effect. In this work, we explore the consequences of the Synchronicity principle applied to quantum states with high levels of uncertainty as a way to provide additional information to quantum-like Bayesian Networks, which mainly contain cause/effect relationships.
Although, under the well-established principles of probability, synchronicity might be dismissed as the mere occurrence of coincidences, in the quantum mechanics realm, given the high levels of uncertainty that describe quantum states, such coincidences or improbable occurrences happen quite often.

9.2 Combining Causal and Acausal Principles for Quantum Cognition

The heuristic that we propose in this chapter comprises two main stages:

• Extraction of a Semantical Network: Given the structure of a Bayesian Network representing the decision problem, we extract synchronised events through their meaning. The reason why we need to derive a semantic network comes from Jung's definition of Synchronicity: if there is no semantic extraction of the meaning of the events, then there can be no meaningful connection and no Synchronicity experienced. Under this paradigm, the quantum parameters can be interpreted as the semantic relation between these acausal events.

Figure 9.1: Encoding of the synchronised variables with their respective angles (left). Two synchronised events forming an angle of π/4 between them (right).

• The Semantical Similarity Heuristic: After designing the Semantical Network, one can identify which events of the Bayesian Network share meaningful connections. The semantical similarity heuristic will enable the computation of the parameter θ associated with a pair of synchronised events.

9.2.1 Semantic Networks

A semantic network is often used for knowledge representation. It corresponds to a directed or undirected graph in which nodes represent concepts and edges reflect semantic relations. The extraction of the semantic network from the context of the decision scenario is a necessary step in order to find variables that are only connected in a meaningful way (and not necessarily connected by cause/effect relationships).

9.2.2 The Semantic Similarity Heuristic

The heuristic that we propose in this chapter seeks to correlate acausal events in quantum-like Bayesian Networks through the computation of vector similarities with semantic information. We define the semantical similarity heuristic in a similar way to the Synchronicity principle: two variables are said to be synchronised if they share a meaningful connection between them. This meaningful connection can be obtained through a semantic network representation of the variables in question. This will enable the emergence of new meaningful connections that would be nonexistent when considering only cause/effect relationships. The quantum parameters are then tuned in such a way that the angle formed by these two variables, in a Hilbert space, is the smallest possible (high similarity), thereby forcing acausal events to be correlated. In the case of binary variables, the semantical similarity heuristic is associated with a set of two variables, which can be in one of four possible states. The Hilbert space is partitioned according to these four states, as exemplified in Figure 9.1. The angles formed by the combination of these four possible states are detailed in the table also in Figure 9.1. The right extreme of the Hilbert space represented in Figure 9.1 is encoded as the occurrence

of a pair of synchronised variables. So, when two synchronised variables occur, the smallest angle that these vectors make between each other corresponds to θ = 0. The most dissimilar vector corresponds to the situation where two synchronised variables do not occur. So, we set θ to be the largest angle possible, that is, π.

The other situations correspond to the scenarios where one synchronised variable occurs and the other one does not. In Figure 9.1, the parameter θ is chosen according to the smallest angle that these two vectors, i and j, make between each other, that is, π/4. We are choosing the smallest angle because we want to correlate these two acausal events by forcing the occurrence of coincidences between them, just like described in the Synchronicity principle. The axes corresponding to π/2 and 3π/2 were ignored, because they correspond to classical probabilities (cos(π/2) = cos(3π/2) = 0). The reason why we are taking steps of π/4 has its motivation in the interference quarter law proposed by Yukalov & Sornette [215] for their static heuristic. In their work, Yukalov & Sornette [215] computed the expectation value of the distribution of a random variable with a result of 1/4, which they named the interference quarter law. A step-by-step application example of the synchronicity heuristic can be found in Section 9.3.

9.3 The Semantic Similarity Heuristic in the Categorisation/Decision Experiment

In the work of Busemeyer et al. [41], the authors performed an empirical experiment based on interactions between categorisation and decision making. This experiment served as an empirical test to compare Classical and Quantum Markov models.

The Categorisation/Decision experiment has already been presented in Chapter 8 and can be summarised as follows. On each trial, participants were shown digitally modified pictures of faces. These faces were digitally modified under two characteristics: face width and lip thickness. This gave rise to two different face distributions: a narrow face distribution (with a narrow width and thick lips) and a wide face distribution (with a wide width and thin lips).

The participants had to categorise the faces as Good guy or Bad guy and/or choose the actions Attack or Withdraw. The following test conditions were performed: the C-then-D condition, the D-then-C condition, the D-alone and the C-alone conditions.

In the C-then-D condition, participants had to first perform a categorisation of the face and then choose an action decision. In the D-then-C condition, participants had to select an action decision and then perform the categorisation of the face. In the C-alone condition, participants only performed the categorisation of the face, whereas in the D-alone condition, participants had to choose an action decision towards the given face. For more details of how this experiment was conducted, please refer to the work of Busemeyer et al. [41]; the main results have already been presented in Table 8.1. The Bayesian Networks corresponding to these experiments were presented in Figure 8.5.

Figure 9.2: Representation of the Synchronicity heuristic in the Hilbert Space. Vector i corresponds to the event C = Good, D = Attack. Vector j corresponds to the event C = Bad, D = Attack. The computed angle for the Attack (left) and Withdraw (right) actions is θ = 3π/4.

9.3.1 Application of the Synchronicity Heuristic: Narrow Faces

The full joint probability distribution table for the Bayesian Network corresponding to the Narrow faces experiment (left side of Figure 8.5) is given by Table 9.1.

Categorisation (C)  Decision (D)  Pr(C, D)               ψ(C, D)
Good                Attack        0.19 × 0.43 = 0.0817   √0.19 e^{iθ1} × √0.43 e^{iθ3} = √0.0817 e^{iθA}
Good                Withdraw      0.19 × 0.57 = 0.1083   √0.19 e^{iθ1} × √0.57 e^{iθ5} = √0.1083 e^{iθB}
Bad                 Attack        0.81 × 0.63 = 0.5103   √0.81 e^{iθ2} × √0.63 e^{iθ4} = √0.5103 e^{iθC}
Bad                 Withdraw      0.81 × 0.37 = 0.2997   √0.81 e^{iθ2} × √0.37 e^{iθ6} = √0.2997 e^{iθD}

Table 9.1: Full joint probability distribution. Pr(C,D) corresponds to the classical probability and ψ(C,D) corresponds to the respective quantum amplitude.

Since in this small example the semantic network has the same structure as the Bayesian Network, there will be no additional edges between variables. This means the random variables corresponding to Categorisation and to Decision are semantically correlated. In order to compute the probability of the participant choosing Attack, we used the first and the third rows of the full joint probability distribution (Table 9.1). The first row corresponds to the situation where both variables occur: Categorisation = Good and Decision = Attack. The third row corresponds to the situation where only one of the variables occurs: Categorisation = Bad (not Good) and Decision = Attack. According to the proposed synchronicity heuristic, these situations can be represented by the Hilbert space on the left side of Figure 9.1. In order to compute the θ associated with the probability of making a Withdraw action, the same process is used and corresponds to the Hilbert space representation on the right of Figure 9.1. Following Figure 9.2, θ should be equal to 3π/4:

Pr_quantum(Attack) = γ (0.5920 + 0.4084 cos(3π/4)) = γ · 0.3032    (9.1)

Pr_quantum(Withdraw) = γ (0.4080 + 0.3603 cos(3π/4)) = γ · 0.1532    (9.2)

Then, we can compute the normalisation factor γ and arrive at the final probabilities.

γ = 1 / (Pr_quantum(Attack) + Pr_quantum(Withdraw)) = 1 / (0.3032 + 0.1532) = 1 / 0.4564    (9.3)

Pr_quantum(Attack) = 0.3032 / 0.4564 = 0.6643        Pr_quantum(Withdraw) = 0.1532 / 0.4564 = 0.3357    (9.4)
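These calculations can be checked numerically; the sketch below simply replays Equations 9.1 to 9.4 with θ = 3π/4:

```python
import math

theta = 3 * math.pi / 4

# Unnormalised quantum probabilities (Equations 9.1 and 9.2)
attack = 0.5920 + 0.4084 * math.cos(theta)
withdraw = 0.4080 + 0.3603 * math.cos(theta)

# Normalisation factor (Equation 9.3) and final probabilities (Equation 9.4)
gamma = 1.0 / (attack + withdraw)
p_attack = gamma * attack      # ≈ 0.6643
p_withdraw = gamma * withdraw  # ≈ 0.3357
```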

The calculations for the wide faces Bayesian Network are performed in the same way, but using the values of the quantum-like Bayesian Network in Figure 8.5 (right).

9.3.2 Results and Discussion

In the experiment of Busemeyer et al. [41], the observed probability of choosing Attack was 0.69. Using a quantum Bayesian Network together with the synchronicity heuristic, we were able to predict the probability of Attack to be 0.66, corresponding to a percentage error of 2.57%. In what concerns the wide faces experiment, the proposed heuristic was able to predict the probability of a decision-maker choosing the Attack option to be 0.33, corresponding to a percentage error of 6%. When presented with a face, the decision-maker performs a subjective and personal semantic categorisation of the image. Some features of the face will trigger a response based on a personal meaningful connection between the image and the decision-maker. This response will also be enriched by the cause/effect dependencies of the Bayesian Network, which correspond to the information that is given to the decision-maker regarding the probability distribution of Good and Bad faces. By changing the interference term in this small example according to the assignments of the variables regarding their presence / absence, variables that were verified to occur together (which were synchronised) suffered constructive interference, while the variables that were verified not to occur together suffered destructive interference. Using this information, we were able to make a prediction with a small error percentage of the categorisation/decision experiment.

Literature Works                      Computed Pr(A)   Observed Pr(A)   Error (%)
Quantum Dynamical Model [41]
  Wide Face                           0.39             0.39             0.00
  Narrow Face                         0.74             0.69             5.00
Proposed Algorithm (this work)
  Wide Face (with synchronicity)      0.33             0.39             6.00
  Narrow Face (with synchronicity)    0.66             0.69             3.00

Table 9.2: Comparison between a Quantum Markov Model and the proposed Bayesian Network.

As far as we know, there are no data in the literature concerned with complex decision making scenarios that end up violating the Sure Thing Principle. By complex, we mean scenarios that can be modelled by at least three random variables. In the next sections, we will study the impact of applying the semantic similarity heuristic to a medical decision problem (Section 9.4) and to the well known Burglar/Alarm network (Section 9.5).

9.4 Application to More Complex Bayesian Networks: The Lung Cancer Network

In this experiment, we apply the proposed quantum model in a medical decision making scenario using a Lung Cancer Bayesian Network model (Figure 9.4) inspired by the book of Korb & Nicholson [118].

9.4.1 Deriving a Semantical Network

Consider the following scenario regarding Lung Cancer. Lung Cancer is a disease characterised by an uncontrolled cell growth in tissues of the lung, which can spread to other parts of the body. It is well known that environmental destruction, such as long-term air pollution or smoking, can cause Lung Cancer [163]. In the semantical network in Figure 9.3, Air Pollution and Smoking both derive from the same concept, Environmental Destruction. Although these variables are not causally connected, they share a meaningful connection. So, these two variables become synchronised. Moreover, a person with Lung Cancer manifests several symptoms, such as chest pain, coughing, dyspnea, etc. Coughing and dyspnea, although not causally connected, share a meaningful connection, since they both derive from the concept Symptom. Therefore, these variables will also constitute another synchronised pair.

Figure 9.3: Semantic Network for the Lung Cancer Bayesian Network.
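The extraction of synchronised pairs from such a semantic network can be sketched as grouping variables under shared concepts; the concept and variable names below follow Figure 9.3 but the representation itself is our own illustration:

```python
from itertools import combinations

# Variables grouped by the semantic concept they derive from (Figure 9.3)
concepts = {
    "Environmental Destruction": ["Pollution", "Smoker"],
    "Symptom": ["Cough", "Dyspnea"],
}

# Any two variables sharing a concept form a synchronised (acausal) pair
synchronised_pairs = [pair
                      for variables in concepts.values()
                      for pair in combinations(variables, 2)]
```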

9.4.2 Inference in Quantum Bayesian Networks

After designing the semantic representation of our Bayesian Network and after extracting the synchronised variables, the full Bayesian Network for Lung Cancer is given by Figure 9.4. In this network, the classical probabilities and quantum probability amplitudes are represented by the functions Pr and ψ, respectively. In order to make a fair comparison between classical and quantum inferences, the quantum probability amplitudes were obtained by converting the classical real probabilities into complex numbers (Equation 2.4). This mapping of probabilities has also been used in other works in the literature [191, 124].

9.4.3 Results with No Evidences Observed: Maximum Uncertainty

We queried each variable of the network in Figure 9.4 without providing any observation. We performed the following queries: Pr(Smoker = true), Pr(Pollution = high), Pr(Lung Cancer = positive), Pr(Cough = high) and Pr(Dyspnea = true). We then extracted both classical and quantum inferences and represented the results in the graph in Figure 9.5. When nothing is observed, quantum probabilities tend to increase and overcome their classical counterparts. All nodes of the Bayesian Network are in a superposition state and the probabilities of the nodes start to be modified due to the interference effects. If one looks at the nodes as waves crossing the network from different locations, these waves can crash into each other, causing them to be either destroyed or merged.

Figure 9.4: Lung Cancer Bayesian Network.

Figure 9.5: Probabilities obtained using classical and quantum inferences for different queries for the Lung Cancer Bayesian Network (Figure 9.4).

This interference is controlled by the Synchronicity principle: acausal events which are meaningfully connected become correlated. The synchronised variable pair (Cough, Dyspnea) suffered constructive interference. On the other hand, for the other pair of synchronised variables (Pollution, Smoker), the Pollution variable suffered destructive interference, whereas Smoker suffered constructive interference. This means that, when nothing is known, this pair of variables should not be synchronised at all, since they could not be correlated effectively. The variable Lung Cancer also suffered constructive interference, since it is influenced by the increase of the probabilities in its parent nodes.

9.4.4 Results with One Piece of Evidence Observed

When one starts to provide information to the Bayesian Network, the superposition state collapses into another superposition quantum state, affecting the configuration of the remaining possible states of the network. Moreover, by making some observation to the network, we are reducing the total amount of uncertainty and, consequently, the number of waves crossing the network, as well as the number of collisions.

         Evidences        Pr(Cough=H)  Pr(Pollution=H)  Pr(Dyspnea=T)  Pr(Cancer=Pos)  Pr(Smoker=T)
CLASSIC  Cough = H        1.0000       0.2929           0.5286         0.7000          0.2791
         Pollution = H    0.4019       1.0000           0.3468         0.4272          0.1500
         Dyspnea = T      0.8334       0.3985           1.0000         0.9000          0.3815
         Cancer = Pos     0.9217       0.4100           0.7517         1.0000          0.3900
         Smoker = T       0.7659       0.3000           0.6640         0.8128          1.0000
QUANTUM  Cough = H        1.0000       0.3233           0.5534         0.7000          0.3118
         Pollution = H    0.4273       1.0000           0.3895         0.4730          0.1727
         Dyspnea = T      0.8463       0.5113           1.0000         0.9000          0.4833
         Cancer = Pos     0.9325       0.5728           0.8048         1.0000          0.5320
         Smoker = T       0.7792       0.3507           0.7395         0.8167          1.0000

Table 9.3: Probabilities obtained when performing inference on the Bayesian Network of Figure 9.4.

Table 9.3 shows that synchronised pairs tend to increase together. When variable Dyspnea is observed, the synchronised pair (Pollution, Smoker) tends to increase 12.31% and 11.37% towards their classical counterparts, respectively. So, under this scenario, acausal events such as Pollution and Smoker became highly correlated due to the Synchronicity principle and had a similar growth. The same phenomenon can be verified when the variable Smoker is observed: Dyspnea and Cough tend to increase 26.68% and 11.72% towards their classical counterparts, respectively.

When the variable Lung Cancer is observed, the quantum probabilities of Dyspnea and Cough converged to their classical counterparts. This is a phenomenon called conditional independence and constitutes a major property of Bayesian Networks. When Lung Cancer is observed, the values of its child nodes do not matter, because they cannot affect the probabilities of their parents. Thus, conditional independence properties of classical Bayesian Networks are preserved during quantum inference. In the work of Leifer & Poulin [124], the authors present a mathematical proof of why the conditional independence principle should also be consistent with quantum theory.
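This preservation of conditional independence can be illustrated classically on a tiny chain network Smoker → Cancer → Cough; the conditional probability tables below are made up for the sketch and exact enumeration stands in for a full inference engine:

```python
from itertools import product

# Illustrative CPTs for the chain Smoker -> Cancer -> Cough (numbers are invented)
p_smoker = {True: 0.3, False: 0.7}
p_cancer_given_smoker = {True: 0.2, False: 0.01}   # Pr(Cancer = true | Smoker)
p_cough_given_cancer = {True: 0.9, False: 0.1}     # Pr(Cough = true | Cancer)

def joint(s, c, k):
    """Joint probability Pr(Smoker=s, Cancer=c, Cough=k) via the chain rule."""
    pc = p_cancer_given_smoker[s] if c else 1 - p_cancer_given_smoker[s]
    pk = p_cough_given_cancer[c] if k else 1 - p_cough_given_cancer[c]
    return p_smoker[s] * pc * pk

def pr_smoker_true(evidence):
    """Pr(Smoker = true | evidence) by exact enumeration of the joint."""
    worlds = [dict(zip(("s", "c", "k"), w)) for w in product((True, False), repeat=3)]
    match = [w for w in worlds if all(w[v] == val for v, val in evidence.items())]
    den = sum(joint(w["s"], w["c"], w["k"]) for w in match)
    num = sum(joint(w["s"], w["c"], w["k"]) for w in match if w["s"])
    return num / den

p_given_cancer = pr_smoker_true({"c": True})
p_given_cancer_and_cough = pr_smoker_true({"c": True, "k": True})
# Once Cancer is observed, also observing Cough changes nothing about Smoker
```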

9.5 Application to More Complex Bayesian Networks: The Burglar / Alarm Network

Consider the Bayesian Network in Figure 9.6 [168, 161]. In order to extract its semantic meaning, we need to take into account the context of the network. Suppose that you have a new burglar alarm installed at home. It can detect burglary, but also sometimes responds to earthquakes. John and Mary are two neighbours who promised to call you when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm and calls too. Mary likes loud music and sometimes misses the alarm.

Figure 9.6: Example of a Quantum-Like Bayesian Network [168]. ψ represents quantum amplitudes. Pr corresponds to the real classical probabilities.

9.5.1 Semantic Networks: Incorporating Acausal Connections

A semantic network is often used for knowledge representation. It corresponds to a directed or undirected graph in which nodes represent concepts and edges reflect semantic relations. The extraction of the semantic network from the original Bayesian Network is a necessary step in order to find variables that are only connected in a meaningful way (and not necessarily connected by cause/effect relationships), just like it is stated in the Synchronicity principle. We extracted the semantic network, illustrated in Figure 9.7, which represents the meaningful connections between concepts. The following knowledge was extracted. It is well known that catastrophes cause panic among people and, consequently, increase crime rates, more specifically burglaries. So, a new pair of synchronised variables between Earthquake and Burglar emerges. Moreover, John and Mary both derive from the same concept, Person, so these two nodes will also be synchronised. These synchronised variables mean that, although there is no explicit causal connection between these nodes in the Bayesian Network, they can become correlated through their meaning.

Figure 9.7: Semantic Network representation of the network in Figure 9.6.

We queried each variable of the network in Figure 9.6 without providing any observation. We performed the following queries: Pr(JohnCalls = true), Pr(MaryCalls = true), Pr(Alarm = true),

Pr(Burglar = true) and Pr(Earthquake = true). We then extracted both classical and quantum inferences and represented the results in the graph in Figure 9.8.

Figure 9.8: Results for various queries comparing probabilistic inferences using classical and quantum probability when no evidences are observed: maximum uncertainty.

Figure 9.8 shows that, when nothing is known about the state of the world, quantum probabilities tend to increase and overcome their classical counterparts. In quantum theory, when nothing is observed, all nodes of the Bayesian Network are in a superposition state. For each possible configuration in this superposition state, a probability amplitude is associated to it. During the superposition state, the amplitudes of the probabilities of the nodes of the Bayesian Network start to be modified due to the interference effects. If one looks at the nodes as waves crossing the network from different locations, these waves can crash into each other, causing them to be either destroyed or merged together. This interference of the waves is controlled through the Synchronicity principle by linking acausal events. When one starts to provide information to the Bayesian Network, the superposition state collapses into another quantum state, affecting the configuration of the remaining possible states of the network. Moreover, by making some observation to the network, we are reducing the total amount of uncertainty and, consequently, the number of waves crossing the network (Table 9.4).

         Evidences        Pr(Alarm=t)  Pr(Earthquake=t)  Pr(Burglar=t)  Pr(JohnCalls=t)  Pr(MaryCalls=t)
CLASSIC  JohnCalls = t    0.2277       0.0949            0.1333         1.0000           0.1671
         MaryCalls = t    0.5341       0.2033            0.3119         0.5040           1.0000
         Earthquake = t   0.2966       1.0000            0.0100         0.3021           0.2147
         Burglar = t      0.9402       0.0200            1.0000         0.8492           0.6587
         Alarm = t        1.0000       0.3581            0.5835         0.9000           0.7000
QUANTUM  JohnCalls = t    0.3669       0.1484            0.2124         1.0000           0.2321
         MaryCalls = t    0.6598       0.2239            0.3474         0.6032           1.0000
         Earthquake = t   0.4389       1.0000            0.0124         0.4012           0.2403
         Burglar = t      0.9611       0.0200            1.0000         0.8583           0.6337
         Alarm = t        1.0000       0.3431            0.5560         0.9000           0.7000

Table 9.4: Probabilities obtained when performing inference on the Bayesian Network of Figure 9.6.

In Table 9.4 there are two pairs of synchronised variables: (Earthquake, Burglar) and (MaryCalls, JohnCalls). The quantum probability Pr(Earthquake = t | JohnCalls = t) has increased by almost the same amount as the probability Pr(Burglar = t | JohnCalls = t) (56.37% for Earthquake and 59.34% for Burglar). In the same way, when we observe that MaryCalls = t, then the percentage

of a Burglary increased 11.38%, whereas Earthquake increased 10.13% towards its classical counterpart.

9.6 Summary and Final Discussion

In this work, we analysed a quantum-like Bayesian Network that puts together cause/effect relationships and semantic similarities between events. These similarities constitute acausal connections according to the Synchronicity principle and provide new relationships to the graphical models. As a consequence, events can be represented in vector spaces, in which quantum parameters are determined by the similarities that these vectors share between them. In the realm of quantum cognition, quantum parameters might represent the correlation between events (beliefs) in a meaningful acausal relationship. The proposed quantum-like Bayesian Network benefits from the same advantages as classical Bayesian Networks: (1) it enables a visual representation of all relationships between all random variables of a given decision scenario; (2) it can perform inferences over unobserved variables, that is, it can deal with uncertainty; (3) it enables the detection of independent and dependent variables more easily. Moreover, the mapping to a quantum-like approach leads to a new mathematical formalism for computing inferences in Bayesian Networks that takes into account quantum interference effects. These effects can accommodate puzzling phenomena that could not be explained through a classical Bayesian Network. This is probably the biggest advantage of the proposed model: a network structure that can combine different sources of knowledge in order to model a more complex decision scenario and accommodate violations of the Sure Thing Principle. With this work, we argue that, when presented with a problem, we perform a semantic categorisation of the symbols that we extract from the given problem through our thoughts, as argued by Osherson [159]. Since our thoughts are abstract, cause/effect relationships might not be the most appropriate mechanisms to simulate interferences between them.
The Synchronicity principle seems to fit more in this context, since our thoughts can relate to each other from meaningful connections, rather than cause/effect relation- ships [87].

Chapter 10

Classical and Quantum Models for Order Effects

In this chapter, we explore the application of quantum-like projection models to accommodate paradoxical situations concerned with order effects. By order effects, we mean that the probability of, for instance, asking some question A followed by question B usually gives different results if we pose these questions in reverse order. In purely classical models, this poses a problem and cannot be directly explained. Since classical probability is based on set theory, it is commutative, that is, for some hypothesis H and two events A and B: Pr(A ∩ B | H) = Pr(B ∩ A | H). This commutativity poses great challenges when accommodating situations such as order effects, because in order to have Pr(H | A ∩ B) = Pr(H | B ∩ A), then, using Bayes rule, one would need to satisfy the following relationship [188]:

Pr(H | A) · [ Pr(B | H ∩ A) / Pr(B | A) ] = Pr(H | A ∩ B) = Pr(H | B ∩ A) = [ Pr(A | H ∩ B) / Pr(A | B) ] · Pr(H | B). (10.1)

One of the quantum approaches that is widely used to explain order effects is the quantum projection model (also known as a quantum geometric model) [166, 190]. In this approach, we represent a person's beliefs in an N-dimensional superposition state (for the case of binary questions, N = 2). To model a sequence of answers, we start by projecting this superposition state onto the basis with the desired answer. From this basis, we perform a second projection onto another basis, which represents the desired answer to the second question and corresponds to a rotation of the initial basis by φ radians. The final probability is given by the squared magnitude of the product of all these projections. Since projections are given by matrices and matrix multiplication is non-commutative, order effects can be naturally explained under this framework.

The questions that we address in this chapter are the following. Given that the quantum approach consists in a geometric model that performs projections, can we obtain the same results using a classical approach also based on projections? Is quantum theory really necessary and advantageous to explain paradoxical findings such as order effects? And what makes this quantum projection approach

quantum, since no complex probability amplitudes are being used in the model [34, 206]? Where do the quantum interferences arise from and what do they mean in this context?

In this chapter, we intend to discuss these questions. We also propose an alternative interpretation for the parameters involved in these geometric projection models that can be applied to both classical and quantum models. Current literature interprets these parameters as similarities between questions. Under the proposed Relativistic Interpretation, these parameters emerge due to the lack of knowledge about a personal basis state and also due to uncertainties towards the state of the world and the context of the questions. So, with a relativistic interpretation of parameters, we can give both classical and quantum approaches an interpretation for the rotation of the basis vectors and for why this rotation is necessary from a cognitive point of view.

In the end, we argue that the choice between the classical and quantum models should be based on Occam's Razor: in the presence of two competing hypotheses, the one that makes the fewest assumptions (or the one that is simpler) should be chosen. This depends greatly on the problem and on what knowledge we want to extract from it. If we are mainly focused on a mathematical approach that can perform predictions for order effects, abstracting the model from any interpretations or theories that are intrinsic to the problem, then the classical approach is the one that makes the fewer assumptions and should be applied. If, on the other hand, we want a model that leverages a more general theory to explain its predictions, then the quantum model is more appropriate.

10.1 The Gallup Poll Problem

One example of question order effects that has been widely reported in the literature corresponds to the work of [134], where the author collected public opinions about two important people: Bill Clinton and Al Gore. In this poll, half of the participants were asked if they thought that Bill Clinton was an honest and trustworthy person, and were next asked the same about Al Gore. The other half of the participants were asked the same questions, but in reverse order. In the end, there was a total of 1002 respondents.

Results showed that there was a big gap in the non-comparative context (the first questions) and a small gap in the comparative context (the second questions). Asking about Clinton first made the probability of the second question increase. On the other hand, asking about Al Gore first made the probability of the second question decrease. This phenomenon is usually referred to as the assimilation effect, and pure classical probability models cannot explain it, because they are based on set theory and, consequently, they are commutative. Table 10.1 summarises the results obtained in the work of [134].

                                        Clinton First     Gore First        Differences
Pr After First Question (non-comp.)     Pr(C) = 50%       Pr(G) = 68%       18%
Pr After Second Question (comp.)        Pr(G|C) = 57%     Pr(C|G) = 60%     3%
Effect                                  +7%               −8%               Assimilation Effect

Table 10.1: Summary of the results obtained in the work of [134] for the Clinton-Gore Poll, showing an Assimilation Effect

In the same work, [134] also reported a similar experiment, using different politicians in the questions. The same questions as above were asked, but regarding the honesty and trustworthiness of Gingrich and Dole. The questions were posed in different orders. The total number of respondents in this experiment was 1015. In contrast to the assimilation effect, asking about Gingrich first made the probability of the question regarding Dole decrease. On the other hand, asking about Dole first made the probability of the question regarding Gingrich increase. This phenomenon is usually referred to as the contrast effect. Table 10.2 summarises the results obtained in the work of [134].

                                        Gingrich First    Dole First        Differences
Pr After First Question (non-comp.)     Pr(G) = 41%       Pr(D) = 60%       19%
Pr After Second Question (comp.)        Pr(D|G) = 33%     Pr(G|D) = 64%     31%
Effect                                  −8%               +4%               Contrast Effect

Table 10.2: Summary of the results obtained in the work of [134] for the Gingrich-Dole Poll, showing a Contrast Effect.

The third poll reported in the work of [134] corresponds to a set of questions concerned with racial hostility. In a group of 1004 respondents, two questions were asked. One was: Do you think that only a few white people dislike blacks, many white people dislike blacks, or almost all white people dislike blacks? The other question was the same, but about black people. These questions were posed in sequence and in different orders. Table 10.3 summarises the results. In this case, the order in which the questions were posed did not matter, since both orders contributed to an increase of the probability of the second question. This phenomenon is usually referred to as the additive effect.

                                        White People First   Black People First   Differences
Pr After First Question (non-comp.)     Pr(W) = 41%          Pr(B) = 46%          5%
Pr After Second Question (comp.)        Pr(B|W) = 53%        Pr(W|B) = 56%        3%
Effect                                  +12%                 +10%                 Additive Effect

Table 10.3: Summary of the results obtained in the work of [134]. The table reports the probability of answering All or Many to the questions. The results show the occurrence of an Additive Effect.

The last example in the work of [134] is concerned with a poll about Peter Rose and Shoeless Joe Jackson. Again, a set of 1061 respondents was gathered and two questions were posed in sequence. The questions were: do you think Peter Rose / Shoeless Joe Jackson should or should not be eligible for admission to the Hall of Fame? These questions were posed in different orders and the results obtained are summarised in Table 10.4. In this last example, the order in which the questions were posed did not matter, since both orders contributed to a decrease of the probability of the second question. This phenomenon is usually referred to as the subtractive effect.

All data reported in the work of Moore (2002) [134] constitute violations of order effects. By order effects, we mean that the probability of, for instance, asking some question A followed by question B usually gives different results if we pose the questions in reverse order. One way to approach this problem is through quantum cognition models. Quantum cognition is a research field that aims to explain

                                        Peter Rose First   Shoeless Joe Jackson First   Differences
Pr After First Question (non-comp.)     Pr(R) = 64%        Pr(J) = 45%                  19%
Pr After Second Question (comp.)        Pr(J|R) = 52%      Pr(R|J) = 33%                19%
Effect                                  −12%               −12%                         Subtractive Effect

Table 10.4: Summary of the results obtained in the work of [134] for the Rose-Jackson Poll, showing a Subtractive Effect.

paradoxical findings (such as order effects) using the laws of quantum probability theory. The models provide several advantages over their classical counterparts. They can model events in a superposition state, which is a vector that comprises the occurrence of all possible events. They enable events in superposition to interfere with each other, in this way disturbing the final probability outcome. These quantum interference effects do not exist in a classical setting and constitute the major advantage of these models, since we can use these interference effects to accommodate the paradoxical findings.

10.2 A Quantum Approach for Order Effects

In the Quantum Projection Model [34, 166, 190], a state is represented by a unit vector in a k-dimensional complex vector space. A quantum superposition state is represented by a vector |ψ⟩ = α0|0⟩ + α1|1⟩ + ··· + α_{k−1}|k−1⟩, where α0, . . . , α_{k−1} are quantum amplitudes whose squared magnitudes must sum to 1, Σ_{i=0}^{k−1} |αi|² = 1. In the case of the Gallup Poll presented in Section 10.1, the quantum states are binary, since they correspond to a person's yes/no answer. That is, we would represent a superposition state |S⟩ regarding the answer to the question about Clinton's honesty and trustworthiness as

|S⟩ = s0|Cy⟩ + s1|Cn⟩, (10.2)

where Cy and Cn correspond to the answers yes and no, respectively, to Clinton's honesty question. The variables s0 and s1 are complex quantum probability amplitudes, which represent the initial beliefs of a participant before answering the question.

Consider Figure 10.1, where two sets of basis states are represented: {Ay, An} and {By, Bn}. Generally speaking, one can look at these bases as the representation of two yes/no questions. Basis state Ay corresponds to a yes answer to question A and is given by the state vector |Ay⟩ = [1, 0]ᵀ, and |An⟩ = [0, 1]ᵀ corresponds to a no answer to question A. In the same way, for the yes/no answers to question B, we can represent the states as a rotation of question A's basis states. When an answer is given, we project the superposition state |S⟩ onto the basis state corresponding to the desired answer. In Figure 10.1 (right), we perform two sets of projections: (1) starting in the superposition state |S⟩, we first make an orthogonal projection onto the basis state representing the yes answer to question B, giving rise to the projector PBy, and (2) from the basis state By we perform another orthogonal projection onto the yes basis vector of question A, resulting in the projector PAy. In Figure 10.1 (center), we perform the same projections, but in reverse order: (1) starting in the superposition state |S⟩, we first make an orthogonal projection onto the basis state representing the yes answer to question A, PAy, and then (2) from the basis state Ay we perform another orthogonal projection onto the yes basis vector of question B, PBy.

Figure 10.1: Example of the application of the quantum projection approach for a sequence of two binary questions A and B. We start in a superposition state and project this state onto the yes basis of question A (left). Then, starting in this basis, we project onto the basis corresponding to the answer yes to question B (center). We can obtain a different result if we reverse the order of the projections (right).

In the end, the final probability of answering yes to B, given that yes was previously answered to question A, is given by the squared magnitude of this sequence of projections. We obtain a different probability value by posing the questions in the inverse order. The probabilities computed using the Quantum Projection approach give different results, matching the experimental findings of Moore (2002) [134], in which it is shown that the order in which the questions are posed matters and has an impact on the results.

Pr(B = yes) = ||PAy PBy |S⟩||² + ||PAn PBy |S⟩||² ≠ ||PBy PAy |S⟩||² + ||PBy PAn |S⟩||² (10.3)
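The inequality in Equation 10.3 can be checked numerically with a minimal numpy sketch; the belief state and the rotation angle below are illustrative choices, not values taken from the experiments:

```python
# Minimal sketch of the non-commutativity in Eq. 10.3.
# phi and the amplitudes of |S> are illustrative assumptions.
import numpy as np

phi = 0.9                                    # illustrative rotation angle
S = np.array([np.sqrt(0.7), np.sqrt(0.3)])   # illustrative belief state |S>

Ay, An = np.array([1.0, 0.0]), np.array([0.0, 1.0])
By = np.array([np.cos(phi), np.sin(phi)])    # yes basis of question B
P_Ay, P_An, P_By = np.outer(Ay, Ay), np.outer(An, An), np.outer(By, By)

# Question B projected first, then A (left-hand side of Eq. 10.3) ...
lhs = np.linalg.norm(P_Ay @ P_By @ S) ** 2 + np.linalg.norm(P_An @ P_By @ S) ** 2
# ... versus question A projected first, then B (right-hand side).
rhs = np.linalg.norm(P_By @ P_Ay @ S) ** 2 + np.linalg.norm(P_By @ P_An @ S) ** 2

print(float(lhs), float(rhs))   # two different values: projections do not commute
assert not np.isclose(lhs, rhs)
```

Because the projectors are matrices and matrix products do not commute, the two orders of questioning yield different probabilities, which is exactly the behaviour needed to model order effects.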

10.2.1 The Quantum Projection Model

For a 2-dimensional decision scenario, the projection model can be described by the following steps:

• Start by defining two orthonormal basis vectors for the questions, which implies that ⟨An|Ay⟩ = 0. One set of basis vectors that is commonly chosen is

|Ay⟩ = [1, 0]ᵀ    |An⟩ = [0, 1]ᵀ,

the other set of vectors corresponds to a rotation of the above basis vectors by φ radians. The new basis can be found by multiplying a rotation matrix Rφ with each of the basis vectors:

Rφ = [ cos(φ)  −sin(φ) ; sin(φ)  cos(φ) ]    |By⟩ = Rφ|Ay⟩ = [cos(φ), sin(φ)]ᵀ    |Bn⟩ = Rφ|An⟩ = [−sin(φ), cos(φ)]ᵀ. (10.4)

• Then, we define a superposition vector |S⟩, which comprises a person's beliefs (or features) about some object. In this case, it corresponds to a superposition of the possible answers to a given question: No (An) or Yes (Ay). Although the model can be generalised to N random variables, we will describe it for two random variables, which is what we need to describe the different order effects found in the work of [134].

|S⟩ = √s0 e^{iθ0}|Ay⟩ + √s1 e^{iθ1}|An⟩ such that Σ_{j=0}^{1} |√sj e^{iθj}|² = 1. (10.5)

The variables √s0 e^{iθ0} and √s1 e^{iθ1} represent the quantum probability amplitudes, and θ0 and θ1 correspond to their respective phases.

• When we ask question A first, this corresponds to a projection of the belief state |S⟩ onto the subspace with the desired answer. In Figure 10.1 (left), this vector is projected onto the subspace Ay, which corresponds to the answer yes to question A. This projection produces the state PAy|S⟩, which is given by

PAy = |Ay⟩⟨Ay| = [ 1  0 ; 0  0 ]    PAy|S⟩ = [√s0 e^{iθ0}, 0]ᵀ (10.6)

• The probability of answering yes to the first question is given by the squared magnitude of the projected state:

||PAy|S⟩||² = |√s0 e^{iθ0}|² = √s0 e^{iθ0} · (√s0 e^{iθ0})* = s0 (10.7)

• When we ask question B first, this corresponds to a projection operator PBy that projects the superposition state |S⟩ onto the yes basis of question B. For this, we need to define the projector PBy as

PBy = |By⟩⟨By| = [ cos²(φ)  cos(φ)sin(φ) ; sin(φ)cos(φ)  sin²(φ) ] (10.8)

So, PBy|S⟩ and its squared magnitude are given by

PBy|S⟩ = (√s0 e^{iθ0} cos(φ) + √s1 e^{iθ1} sin(φ)) [cos(φ), sin(φ)]ᵀ    ||PBy|S⟩||² = (|cos(φ)|² + |sin(φ)|²) |√s0 e^{iθ0} cos(φ) + √s1 e^{iθ1} sin(φ)|² (10.9)

• In order to compute the probability of the sequence of answers By → Ay, we need to compute the squared magnitude of the corresponding sequence of projections, that is, ||PAy PBy |S⟩||². This sequence of projections is illustrated in Figure 10.1 and corresponds to answering yes to question B and then yes to question A:

Pr(ByAy) = ||PAy PBy|S⟩||² = cos²(φ) |√s0 e^{iθ0} cos(φ) + √s1 e^{iθ1} sin(φ)|² (10.10)

• In the same way, we can compute the probability of the sequence of answers Bn → Ay, representing the answer no to question B followed by yes to question A:

Pr(BnAy) = ||PAy PBn|S⟩||² = sin²(φ) |√s1 e^{iθ1} cos(φ) − √s0 e^{iθ0} sin(φ)|² (10.11)

• The final probability of A being yes after question B, Pr(B → Ay), is given by the sum Pr(B → Ay) = Pr(ByAy) + Pr(BnAy):

Pr(B → Ay) = Pr(ByAy) + Pr(BnAy) = sin²(φ) |√s1 e^{iθ1} cos(φ) − √s0 e^{iθ0} sin(φ)|² + cos²(φ) |√s0 e^{iθ0} cos(φ) + √s1 e^{iθ1} sin(φ)|² (10.12)

• If, however, we want to compute the probability of By as the second question, that is, the probability of answering yes to B after answering question A, then we have to compute the sequence of projections

Pr(AyBy) = ||PBy PAy|S⟩||² = |√s0 e^{iθ0} cos(φ)|² = s0 cos²(φ) (10.13)

• The probability of answering no to the first question A and yes to the second question B is given by

Pr(AnBy) = ||PBy PAn|S⟩||² = |√s1 e^{iθ1} sin(φ)|² = s1 sin²(φ) (10.14)

• The final probability of answering yes to question B after question A is given by the sum of the probabilities Pr(A → By) = Pr(AyBy) + Pr(AnBy):

Pr(A → By) = |√s0 e^{iθ0} cos(φ)|² + |√s1 e^{iθ1} sin(φ)|² = s0 cos²(φ) + s1 sin²(φ) (10.15)

• In the end, setting the parameters as suggested in [34], that is, φ = π/4, θ = 0, s0 = 0.7 and s1 = 0.3, leads to a big gap in the non-comparative context and a small gap in the comparative context, simulating the Assimilation Effect presented in Table 10.1:

Pr(Ay) = 0.7    Pr(By) = 0.96    Pr(A → By) = Pr(B → Ay) = 0.5
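The steps above can be sketched numerically. The following is a minimal numpy sketch of the 2-dimensional projection model, using the parameter values suggested in [34] (φ = π/4, s0 = 0.7, s1 = 0.3, interference phases set to zero); variable names are our own:

```python
# Minimal sketch of the 2-D quantum projection model (Eqs. 10.4-10.15).
import numpy as np

s0, s1, phi = 0.7, 0.3, np.pi / 4
theta0 = theta1 = 0.0                        # interference phases, set to 0 as in [34]

# Superposition state |S> (Eq. 10.5), with complex amplitudes.
S = np.array([np.sqrt(s0) * np.exp(1j * theta0),
              np.sqrt(s1) * np.exp(1j * theta1)])

Ay, An = np.array([1.0, 0.0]), np.array([0.0, 1.0])
By = np.array([np.cos(phi), np.sin(phi)])    # rotated basis (Eq. 10.4)
Bn = np.array([-np.sin(phi), np.cos(phi)])

def proj(v):
    return np.outer(v, np.conj(v))           # projector |v><v|

def pr(ops):
    return float(np.linalg.norm(ops @ S) ** 2)

pr_Ay = pr(proj(Ay))                                                    # Eq. 10.7
pr_By = pr(proj(By))                                                    # Eq. 10.9
pr_A_then_By = pr(proj(By) @ proj(Ay)) + pr(proj(By) @ proj(An))        # Eq. 10.15
pr_B_then_Ay = pr(proj(Ay) @ proj(By)) + pr(proj(Ay) @ proj(Bn))        # Eq. 10.12

print(round(pr_Ay, 2), round(pr_By, 2),
      round(pr_A_then_By, 2), round(pr_B_then_Ay, 2))
# 0.7 0.96 0.5 0.5
```

The output reproduces the big non-comparative gap (0.7 versus 0.96) and the small comparative gap (0.5 versus 0.5) of the Assimilation Effect.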

10.2.2 Discussion of the Quantum Projection Model

In the quantum projection model, it is interesting to notice that Equation 10.15, the probability of answering yes to question B after question A, does not depend on the quantum parameter θ. This means that the quantum model collapses to its classical counterpart, since it depends only on the rotation parameter φ and on the initial beliefs (given by s0 and s1). A deeper analysis of Equation 10.15 reveals more information. If we fix the rotation parameter φ to π/4 and set s1 = 1 − s0 (as suggested in the example in [34]), then we can conclude that varying the initial belief state s0 plays no role in the outcome of the final probabilities. When making a sequence of questions, only the rotation parameter φ can change the outcome. In another analysis, we tested how the function would evolve if we varied the rotation parameter φ (between 0 and 2π) and the initial belief state s0 (between 0 and 1). The outcome is shown in Figures 10.2 and 10.3, from which one can conclude that it is possible to simulate the several order effects reported in the work of Moore (2002) [134] by setting s0 and varying the rotation parameter φ.
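This analysis can be sketched directly from Equation 10.15 with θ = 0 and s1 = 1 − s0 (the function and variable names are our own):

```python
# Sketch of the analysis above: Pr(A -> By) from Eq. 10.15 with s1 = 1 - s0.
import numpy as np

def pr_A_then_By(s0, phi):
    return s0 * np.cos(phi) ** 2 + (1 - s0) * np.sin(phi) ** 2

# With phi fixed at pi/4, the initial belief s0 plays no role: constant 0.5.
vals = [pr_A_then_By(s0, np.pi / 4) for s0 in np.linspace(0, 1, 11)]
assert np.allclose(vals, 0.5)

# Varying the rotation parameter phi, however, sweeps the probability range.
sweep = [pr_A_then_By(0.7, phi) for phi in np.linspace(0, np.pi / 2, 5)]
print([round(float(v), 2) for v in sweep])
# [0.7, 0.64, 0.5, 0.36, 0.3]
```

This makes the claim concrete: once the rotation is fixed at π/4, the belief state cannot change the outcome, while the rotation parameter alone can move the probability over the whole interval between s0 and s1.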

In the same way, we performed a similar analysis of Equation 10.12, which corresponds to answering yes to the sequence of questions B → Ay. The difference is that Equation 10.12 does depend on the quantum interference parameter θ. By fixing the rotation parameter φ and the initial belief state s0 together with the quantum interference parameter θ, this analysis showed results similar to the previous question: when we fix the rotation parameter, the function becomes constant and both the initial state s0 and the quantum interference term θ play no role in the final probability outcome. This means that, when computing these probabilities, only the rotation parameter has a direct impact on the calculations. Moreover, if we fix the quantum interference term θ = 0 (as suggested in the example contained in [34]), then we can see that varying the rotation parameter φ also enables the possibility of representing the different types of order effects reported in [134]: assimilation, contrast, additive and subtractive effects. In the next section, we will address these rotation parameters more closely and propose an alternative interpretation under the scope of quantum cognition.

10.3 The Relativistic Interpretation of Parameters

In the book of [34], quantum parameters in the Quantum Projection Model represent the similarity between questions. This parameter represents the angle of the inner product between the two projections. In other works in the literature, the quantum interference parameter represents not only the similarity between two random variables (through their inner product), but also the semantic relationship between them [141, 145].

However, under the Quantum Projection Model, we have seen in the previous section that the quantum interference parameters play no role in the computation of the final probabilities. So, where do the quantum interference effects come from in order to accommodate the violations of order effects? One could argue that this interference comes from the rotation of the basis vectors by φ radians. In the previous section, the rotation operator Rφ changed the basis state Ax into Bx by applying a rotation of φ radians. From a cognitive point of view, this rotation is used to model the quantum interference effects

Figure 10.2: Relation between the rotation parameter φ and the quantum probability amplitude s0 of Equation 10.15. The amplitude s1 was set to s1 = 1 − s0. We can simulate several order effects by varying the parameter φ.

Figure 10.3: Relation between the rotation parameter φ and the quantum probability amplitude s0 of Equation 10.12. The amplitude s1 was set to s1 = 1 − s0. We can simulate several order effects by varying the parameter φ.

and corresponds to the change of a person's mental beliefs [202, 206]. However, this interpretation is independent of a quantum perspective, and it holds for purely classical projection models, as we will show in Section 10.4.

In this section, we present an alternative interpretation of quantum interference parameters for this projection-based approach to order effects that can accommodate both classical and quantum projection models. We call it the Relativistic Interpretation of Parameters.

Under the Relativistic Interpretation of Parameters, when we pose a question to different people, each person represents his or her preferred answer in an N-dimensional vector space. For the case of binary questions, this representation is performed in a 2-dimensional psychological space. However, the individual is not aware of the basis in which he or she is making this representation. Each person has his or her own basis, but it is possible to relate the beliefs of different individuals by performing a rotation of the basis state by φ radians.

Figure 10.4: Example of the Relativistic Interpretation of Quantum Parameters. Each person reasons according to a N-dimensional personal basis state without being aware of it. The representation of the beliefs between different people consists in rotating the personal belief state by φ radians.

Take as an example Figure 10.4. There are three individuals, A, B and C, to whom the same binary question is posed. Each person represents their answer in their own 2-dimensional psychological space, without knowing in which basis they are representing their beliefs. However, since we are assuming that each person has their own vector space, one can describe the beliefs of each individual in terms of another's by performing a rotation of their basis by some φ radians. For instance, assuming that individual A is in the common |0⟩ / |1⟩ basis, then individual C's beliefs are described, from person A's perspective, as a rotation of φA,C radians. In the same way, person A's beliefs can be described from person C's point of view by performing the inverse rotation, that is, by rotating the basis −φA,C radians (or φC,A radians). The same line of thought applies to person B. In the end, quantum interferences arise in quantum cognition due to this lack of knowledge regarding each person's own basis states and due to uncertainties towards the state of the world.

In the next section, we will discuss whether or not we need a quantum approach to

model order effects.
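The basis-change relation underlying the Relativistic Interpretation can be sketched with ordinary rotation matrices; the angle below is an illustrative value, not one estimated from data:

```python
# Sketch of the Relativistic Interpretation: person C's basis is person A's
# basis rotated by phi_AC, and the inverse rotation maps it back.
import numpy as np

def R(phi):
    """2-D rotation matrix by phi radians."""
    return np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

phi_AC = 0.6                     # hypothetical angle between A's and C's bases
basis_A = np.eye(2)              # person A's |0>/|1> basis (columns)
basis_C = R(phi_AC) @ basis_A    # person C's basis, seen from A's perspective

# Rotating by -phi_AC recovers A's basis: R(-phi) is the inverse of R(phi).
assert np.allclose(R(-phi_AC) @ basis_C, basis_A)

# Rotations are orthogonal, so squared lengths (probabilities) are preserved.
v = np.array([0.6, 0.8])
assert np.isclose(np.linalg.norm(R(phi_AC) @ v), np.linalg.norm(v))
```

Since R(−φ) inverts R(φ), the descriptions from A's and C's points of view are symmetric, exactly as argued above for φA,C and φC,A.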

10.4 Do We Need Quantum Theory for Order Effects?

So far, we have presented the problem of order effects, in which the probability of a sequence of events differs if we switch the order of that sequence. This cannot be simulated by pure classical probabilistic models, because classical probability theory is based on set theory and, consequently, events commute. We also presented a quantum model that is widely used in the literature and can account for order effects in a general and natural way. However, we showed that the quantum interferences do not play any role in this quantum projection model and that only the rotation of the basis vectors is necessary to accommodate paradoxes derived from order effects.

At this point, the reader might be thinking: Do we really need quantum theory to explain order effects? Can we achieve the same results using a classical projection counterpart of the quantum model? The answers to these questions are no and yes, respectively.

One can argue that the main difference between the quantum approach and a classical projection model is that the former uses complex probability amplitudes while the latter uses real numbers. One could argue that what makes the model quantum is the usage of these quantum amplitudes, which in turn generate quantum interference effects, which can accommodate several paradoxical findings reported in the literature [193, 196, 198, 26]. However, that does not hold in the quantum projection model, since quantum interferences end up playing no role in the computation of the probabilities. Moreover, even if the quantum interference terms did matter, the complexity of the quantum projection model would increase, since we would require an additional 2^N free quantum interference parameters for binary questions that would need to be fit. For the case of M possible answers, the complexity grows to M^N, where N is the number of questions.

10.4.1 A Classical Approach for Order Effects

The classical projection approach works just like the previously described quantum model, with the difference that real numbers are used instead of quantum probability amplitudes. The model works in an N-dimensional vector space; however, in order to simulate the results obtained in the Clinton / Gore experiment from the work of Moore (2002), we will describe the model for a 2-dimensional decision scenario:

• Start by defining two orthonormal basis vectors for the questions. One set of basis vectors that is commonly chosen is

|Ay⟩ = [1, 0]ᵀ    |An⟩ = [0, 1]ᵀ,

the other set of vectors corresponds to a rotation of the above basis vectors by φ radians. The new basis can be found by multiplying a rotation matrix Rφ with each of the basis vectors:

Rφ = [ cos(φ)  −sin(φ) ; sin(φ)  cos(φ) ]    |By⟩ = Rφ|Ay⟩ = [cos(φ), sin(φ)]ᵀ    |Bn⟩ = Rφ|An⟩ = [−sin(φ), cos(φ)]ᵀ. (10.16)

• Since we are in a Euclidean space, we can define a vector |S⟩, where s0 and s1 are real-valued variables:

|S⟩ = √s0|Ay⟩ + √s1|An⟩ such that Σ_{i=0}^{1} |√si|² = 1. (10.17)

• The probability of answering yes to the first question A corresponds to the squared magnitude of the projection of the vector |S⟩ onto the yes basis of A:

PAy = |Ay⟩⟨Ay| = [ 1  0 ; 0  0 ]    Pr(Ay) = ||PAy|S⟩||² = s0 (10.18)

• On the other hand, if we want to compute the probability of posing question B first, then the same paradigm applies. We start with the vector |S⟩ and project this state onto the yes basis of question B:

PBy = |By⟩⟨By| = [ cos²(φ)  cos(φ)sin(φ) ; sin(φ)cos(φ)  sin²(φ) ] (10.19)

Pr(By) = ||PBy|S⟩||² = (cos²(φ) + sin²(φ)) |√s0 cos(φ) + √s1 sin(φ)|² (10.20)

• To compute the probability of the sequence of answers By → Ay, we need to compute the squared magnitude of the corresponding sequence of projections, that is, ||PAy PBy |S⟩||²:

Pr(ByAy) = ||PAy PBy |S⟩||² = cos²(φ) |√s0 cos(φ) + √s1 sin(φ)|² (10.21)

• In the same way, we can compute the probability of the sequence of answers Bn → Ay, representing the answer no to question B followed by yes to question A:

Pr(BnAy) = ||PAy PBn |S⟩||² = |sin(φ)(√s0 sin(φ) − √s1 cos(φ))|² (10.22)

• The final probability of A being yes after question B, Pr(B → Ay), is given by the sum Pr(B → Ay) = Pr(ByAy) + Pr(BnAy):

Pr(B → Ay) = Pr(ByAy) + Pr(BnAy) = |sin(φ)(√s0 sin(φ) − √s1 cos(φ))|² + |cos(φ)(√s0 cos(φ) + √s1 sin(φ))|² (10.23)

• If, however, we want to compute the probability of By as the second question, that is, the probability of answering yes to B after answering question A, then we have to compute the sequence of projections

Pr(AyBy) = ||PBy PAy |S⟩||² = s0 cos²(φ) (cos²(φ) + sin²(φ)) = s0 cos²(φ) (10.24)

• The probability of answering no to the first question A and yes to the second question B is given by

Pr(AnBy) = ||PBy PAn |S⟩||² = s1 sin²(φ) (cos²(φ) + sin²(φ)) = s1 sin²(φ) (10.25)

• The final probability of answering yes to question B after question A is given by the sum of the probabilities Pr(A → By) = Pr(AyBy) + Pr(AnBy):

Pr(A → By) = s0 cos²(φ) + s1 sin²(φ) (10.26)

• In the end, setting the parameters as suggested in [34], that is, φ = π/4, s0 = 0.7 and s1 = 0.3, leads to a big gap in the non-comparative context and a small gap in the comparative context, simulating the Assimilation Effect presented in Table 10.1:

Pr(Ay) = 0.7    Pr(By) = 0.96    Pr(A → By) = Pr(B → Ay) = 0.5
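As with the quantum model, the classical steps above can be sketched numerically. This minimal numpy sketch uses the same parameter values (φ = π/4, s0 = 0.7, s1 = 0.3) and purely real amplitudes; variable names are our own:

```python
# Minimal sketch of the classical projection model (Eqs. 10.16-10.26).
import numpy as np

s0, s1, phi = 0.7, 0.3, np.pi / 4
S = np.array([np.sqrt(s0), np.sqrt(s1)])     # real belief vector |S> (Eq. 10.17)

Ay, An = np.array([1.0, 0.0]), np.array([0.0, 1.0])
By = np.array([np.cos(phi), np.sin(phi)])
Bn = np.array([-np.sin(phi), np.cos(phi)])

def proj(v):
    return np.outer(v, v)                    # projector |v><v| with real entries

def pr(ops):
    return float(np.linalg.norm(ops @ S) ** 2)

pr_Ay = pr(proj(Ay))
pr_By = pr(proj(By))
pr_A_then_By = pr(proj(By) @ proj(Ay)) + pr(proj(By) @ proj(An))
pr_B_then_Ay = pr(proj(Ay) @ proj(By)) + pr(proj(Ay) @ proj(Bn))

print(round(pr_Ay, 2), round(pr_By, 2),
      round(pr_A_then_By, 2), round(pr_B_then_Ay, 2))
# 0.7 0.96 0.5 0.5 -- the same values as the quantum model with theta = 0
```

The identical output supports the argument of this section: with the interference phases at zero, the quantum projection model and its real-valued classical counterpart are indistinguishable.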

It is important to note that, in this work, the main difference between a quantum-like model and a classical model comes down to the fact that the former uses amplitudes (which are complex numbers) while the latter uses real numbers [44]. The general idea is that, by using complex numbers, when we perform a measurement and take the squared magnitude of the projection, quantum interference effects emerge. If, on the other hand, we use real numbers (classical model), then when we measure the length of the projection, we never obtain any kind of interference terms.

10.4.2 Analysis of the Classical Projection Model

In the quantum model, we concluded that the quantum interference term θ plays no role in a 2-dimensional model. In this section, we are interested in knowing whether there is any relation between the classical model and the quantum model. To do this, we began by analysing the probability of answering yes to B as the second question, Pr(A → By). With the classical projection model, one can achieve the same results reported for the quantum model. Since in the quantum model the quantum interference terms do not play any role in the computation of the probability of the second question (the one that has the rotated basis vectors), it is straightforward that it can only accommodate the paradoxical findings through the rotation parameter φ. In the same way, the classical model only depends on this rotation parameter in order to simulate the several order effects reported in the work of Moore (2002) [134].

We also decided to fix the rotation parameter φ in the classical model and verify how the input state s0 would affect the final probabilities. The analysis showed a constant function, which means that, from the moment we specify a rotation in the model, it does not matter what the input state s0 is, because it does not affect the final probability outcome. Just like in the quantum model, it is only the rotation parameter that enables the accommodation of the several order effects reported in the work of Moore (2002) [134]. We also made a similar analysis with respect to the probability of answering A in the second question. The results obtained reinforce the evidence that the quantum projection model has the same performance as the classical model.

10.4.3 Explaining Several Order Effects using the Classical and Quantum Projection Models

So far, we have seen that both the classical and quantum models applied to the Clinton-Gore example give similar results. In this section, we fit the different parameters of both models in order to simulate all order effects presented in the work of Moore (2002) [134]. Table 10.5 presents the results obtained. Since quantum interference terms did not play any role in the quantum model, we can see that the values used to fit the classical model are the same ones used to fit the quantum model. This reinforces the conclusion that, for a 2-dimensional decision scenario, the quantum projection model collapses to the classical model and does not provide any advantage over the classical approach.

Quantum Projection Model:

Experiment        s0              θ         φ        Pr(1st ans) vs Pr(1st ans exp)   Pr(2nd ans) vs Pr(2nd ans exp)
Clinton / Gore    √0.50 e^{iθ0}   [0, 2π]   0.7133   0.50 / 0.50                      0.57 / 0.57
Gore / Clinton    √0.68 e^{iθ0}   [0, 2π]   2.6516   0.68 / 0.68                      0.60 / 0.60
Gingrich / Dole   √0.41 e^{iθ0}   [0, 2π]   1.4858   0.41 / 0.41                      0.33 / 0.33
Dole / Gingrich   √0.60 e^{iθ0}   [0, 2π]   π        0.60 / 0.60                      0.60 / 0.64
White / Black     √0.41 e^{iθ0}   [0, 2π]   0.0510   0.41 / 0.41                      0.46 / 0.46
Black / White     √0.53 e^{iθ0}   [0, 2π]   π        0.53 / 0.53                      0.53 / 0.56
Rose / Jackson    √0.64 e^{iθ0}   [0, 2π]   3.0216   0.64 / 0.64                      0.52 / 0.52
Jackson / Rose    √0.45 e^{iθ0}   [0, 2π]   π        0.45 / 0.45                      0.45 / 0.33

Classical Projection Model:

Experiment        s0       φ        Pr(1st ans) vs Pr(1st ans exp)   Pr(2nd ans) vs Pr(2nd ans exp)
Clinton / Gore    √0.50    0.7133   0.50 / 0.50                      0.57 / 0.57
Gore / Clinton    √0.68    2.6516   0.68 / 0.68                      0.60 / 0.60
Gingrich / Dole   √0.41    1.4858   0.41 / 0.41                      0.33 / 0.33
Dole / Gingrich   √0.60    π        0.60 / 0.60                      0.60 / 0.64
White / Black     √0.41    0.0510   0.41 / 0.41                      0.46 / 0.46
Black / White     √0.53    π        0.53 / 0.53                      0.53 / 0.56
Rose / Jackson    √0.64    3.0216   0.64 / 0.64                      0.52 / 0.52
Jackson / Rose    √0.45    π        0.45 / 0.45                      0.45 / 0.33

Table 10.5: Prediction of the geometric approach using different φ rotation parameters to explain the different types of order effects reported in the work of [134]. The columns Pr(1st ans) vs Pr(1st ans exp) represent the answer to the first question obtained using the projection models and the value reported in [134], respectively. Pr(2nd ans) vs Pr(2nd ans exp) represent the answer to the second question obtained using the projection models and the value reported in [134].

From Table 10.5, one can also note that it was possible to fit all parameters of both models in order to accommodate the paradoxical findings reported for the different order effects, namely assimilation, contrast, additive and subtractive effects. One can also note that, for the subtractive effects, in the experiment regarding Jackson / Rose, the final probability computed achieved an error of 36% when compared to the results reported in the work of Moore (2002) [134]. This error occurred because there was no possible way to minimise Equation 10.26 to such values. The minimum value of the function was achieved by setting the rotation parameter φ to π. Despite this problem, in general, both classical and quantum projection approaches proved to be similar for a 2-dimensional decision problem and were both able to accommodate all order effects. In the next section, we will briefly discuss whether to use quantum models or classical models to accommodate order effects.

10.4.4 Occam’s Razor

Given that in the quantum projection approach the quantum interference parameter θ ends up not playing any role in the calculations, the interference effects come only from the rotation of the basis (belief) vectors. In a true quantum setting, interference effects would emerge naturally due to the nature of complex numbers. For some event α ∈ A followed by a finite set of N partition events βj ∈ B, where α and βj are represented by complex numbers, the total probability of event α considering just two events, N = 2, is given by Equation 10.27. The resulting interference is different from the interference that is produced by rotating the belief vectors by φ radians.

\[
\Pr(\alpha) = \left| \sqrt{\beta_1}\sqrt{\alpha|\beta_1}\,e^{i\theta_1} + \sqrt{\beta_2}\sqrt{\alpha|\beta_2}\,e^{i\theta_2} \right|^2 = \left|\sqrt{\beta_1}\sqrt{\alpha|\beta_1}\right|^2 + \left|\sqrt{\beta_2}\sqrt{\alpha|\beta_2}\right|^2 + 2\,\sqrt{\beta_1}\sqrt{\alpha|\beta_1}\,\sqrt{\beta_2}\sqrt{\alpha|\beta_2}\,\cos(\theta_1 - \theta_2) \tag{10.27}
\]

For N random variables, Equation 10.27 generalises to Equation 10.28 [141], where one can notice an exponential growth of the quantum interference parameters θ.

\[
\Pr(\alpha) = \sum_{j=1}^{N} \left|\sqrt{\beta_j}\sqrt{\alpha|\beta_j}\,e^{i\theta_j}\right|^2 + 2\sum_{j=1}^{N-1}\sum_{k=j+1}^{N} \sqrt{\beta_j}\sqrt{\alpha|\beta_j}\,\sqrt{\beta_k}\sqrt{\alpha|\beta_k}\,\cos(\theta_j - \theta_k) \tag{10.28}
\]
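Equation 10.28 can be sketched directly. The following Python snippet (with arbitrary illustrative priors, likelihoods and phases, not taken from any experiment) computes the total probability with interference; the double sum carries one θ_j − θ_k term per pair of events.

```python
import math

def total_probability(priors, likelihoods, thetas):
    """Total probability with interference, as in Equation 10.28.
    priors[j] plays the role of Pr(beta_j), likelihoods[j] of Pr(alpha|beta_j),
    and thetas[j] is the phase attached to path j."""
    n = len(priors)
    amps = [math.sqrt(priors[j] * likelihoods[j]) for j in range(n)]
    classical = sum(a * a for a in amps)      # classical law of total probability
    interference = 2 * sum(
        amps[j] * amps[k] * math.cos(thetas[j] - thetas[k])
        for j in range(n - 1) for k in range(j + 1, n)
    )
    return classical + interference

# Hypothetical numbers for N = 2: the classical value is 0.6*0.7 + 0.4*0.3 = 0.54.
# A phase difference of pi/2 zeroes the cosine, recovering the classical law:
print(round(total_probability([0.6, 0.4], [0.7, 0.3], [0.0, math.pi / 2]), 4))  # 0.54
# Any other phase difference shifts the probability away from 0.54:
print(round(total_probability([0.6, 0.4], [0.7, 0.3], [0.0, 0.0]), 4))  # 0.989
```

Setting all cosine terms to zero collapses the expression back to the classical law of total probability, which is why the θ parameters act as the free degrees of freedom of the quantum-like model.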

For the case of order effects, interference terms end up not playing any role in determining the final probabilities in a decision scenario. However, this represents an exception. Many experiments have been conducted throughout the literature which show that humans violate the laws of probability theory and logic under scenarios with high levels of uncertainty. Experiments such as The Prisoner's Dilemma [172, 55, 125], The Two Stage Gambling Game [198, 120, 122] and categorization experiments [41] show that pure classical models cannot simulate human decisions, whereas using quantum probability it is possible. The quantum interference terms that emerge from the application of complex numbers in the representation of mental states give rise to a free parameter that can be used to fit and explain these experiments. So far, the literature has proposed dynamical quantum models [44] based on Hamiltonian matrices and Schrödinger's equation, Quantum-Like Bayesian Networks [141] that can explain both classical and quantum phenomena in a single model, and many more. Summarising, both classical and quantum models are similar. Order effects can be explained by these two frameworks intuitively, since both models take advantage of the fact that matrix multiplications are non-commutative. One could argue that what makes the model quantum is the usage of these quantum amplitudes, which in turn generate quantum interference effects, which can be used to accommodate paradoxical situations, such as order effects. However, that does not hold in the quantum projection model, since quantum interferences end up playing no role in the computation of the probabilities. Moreover, even if the quantum interference terms did matter, then the complexity of the quantum projection model would increase, since we would require an additional 2^N free quantum interference parameters for binary questions that would need to be fit. To each input state s_x there is an additional

phase parameter e^{iθ_x}. For the case of M possible answers, the complexity grows to M^N, where N is the number of questions. In the end, one can ask which approach should be used. We can either use a classical projection approach to simulate inconsistencies of order effects or we can use a quantum projection approach. According to Occam's razor, in the presence of two competing hypotheses, the one that has the fewest assumptions (or the one that is simpler) should be chosen. This, of course, is a very vague statement and we should take into account the nature of the problem itself and what we intend to prove with it. If we are merely interested in simulating the results of some experiments, abstracting ourselves from the interpretation and meaning of the nature of the experiment, then according to Occam's razor, the classical approach would be the right choice. If, however, we are interested in developing a more general theory that requires additional interpretations, then the quantum approach would be the one to go with. So, in the end, the application of a quantum or classical approach depends on the nature of the problem and on what we intend to explain (or prove) with the application of the model.

10.5 Summary and Final Discussion

Quantum probability theory has been gaining increasing attention in fields outside of physics [142]. Its general framework enables the representation of beliefs in a superposition state, which is a vector that comprises the occurrence of all possible beliefs. Moreover, this vector representation decouples itself from classical probability theory and is not limited to the constraints of set theory. This means that empirical findings, such as order effects, can be easily explained by the non-commutativity of matrix operations under a quantum approach. In purely classical models, these order effects cannot be directly explained, because set theory is commutative. In this chapter, we showed how one can take advantage of the geometry-based quantum theory in order to accommodate several order effect situations (additive, subtractive, assimilation and contrast effects) using the Gallup reports collected in the work of Moore [134]. However, we also showed that the exact same results can be obtained using a pure classical projection model. In the end, order effects can be explained by both frameworks intuitively, since both models are similar and take advantage of the fact that matrix multiplications are non-commutative. Depending on how one sets the rotation operator, one can simulate any effect reported in the work of Moore [134]. Additionally, we also proposed an alternative interpretation for the rotation parameters used in these models, which we call the relativistic interpretation of parameters. This interpretation states that each person makes an inference by projecting a point in their personal N-dimensional psychological space; however, the person is not aware of the basis in which this point is projected.
Thus, instead of interpreting the rotation of the parameters as a measure of similarity (or an inner product) between questions, we state that this parameter emerges from this lack of knowledge about the basis state and also from uncertainties towards the state of the world and the context of the questions. With the relativistic interpretation of parameters, we can give both the classical and the quantum approach an interpretation for the rotation of the basis vectors and for why this rotation is necessary from a cognitive point of view.

In the end, we argue that the application of these models should be based on Occam's Razor: in the presence of two competing hypotheses, the one that has the fewest assumptions (or the one that is simpler) should be chosen. This depends much on the problem and on what knowledge we want to extract from it. If we are mainly focused on a mathematical approach that can perform predictions for order effects, then the classical approach should be used. If, on the other hand, we want a model that leverages theories and interpretations to explain its predictions, then the quantum model is more suitable.

Chapter 11

Classical Models with Hidden Variables

So far, it was shown that quantum-based probabilistic models are able to explain and predict scenarios that cannot be explained by pure classical models [33, 31, 43]. However, there is still considerable resistance in the scientific literature to accepting these quantum-based models. Many researchers believe that one can model scenarios that violate the laws of probability and logic through traditional classical decision models [151]. These violations of the laws of probability theory are hard to explain through classical theory and can be of different types: violations of the Sure Thing Principle [170], disjunction / conjunction errors [196], the Ellsberg [70] / Allais [13] paradoxes, order effects [182], etc. Although the quantum cognition field is recent, several different quantum-like models have been proposed in the literature. These models range from dynamical models [44, 41, 164], which make use of unitary operators to describe the time evolution from the moment a participant is given a problem (or asked a question) until he/she makes a decision, to models that are based on contextual probabilities [7, 105, 215]. Quantum-like dynamical models have also been proposed in the literature to accommodate violations in the Prisoner's Dilemma Game [164], to study the evolution of the interaction of economic agents in markets [102, 82] or even to specify a formal description of the dynamics of epigenetic states of cells interacting with an environment [19]. On the other hand, quantum-like models based on contextual probabilities explore the application of complex probability amplitudes in order to define contexts that can interfere with the decision-maker [99, 106, 105]. In this chapter, we take a different approach. In the literature, it is clear and accepted that simple and pure probabilistic models cannot accommodate human decisions that violate the laws of classical probability theory and logic [34].
But can a more complex classical model simulate the paradoxical findings reported in the literature? In order to answer this question, we propose the application of latent variables in classical models to accommodate these paradoxical findings. By latent variables, we mean random variables that are hidden, that is, they cannot be directly measured in an experimental setting, but can be indirectly inferred from experimental data. These variables bring great advantages to cognitive models, because many observed variables can be condensed into a smaller number of

hidden variables, enabling a dimensionality reduction of the model. For instance, in Psychology or the Social Sciences, one can use latent variables to summarise the influence of several variables, such as beliefs, personality, social attitudes, etc., over the concept of human behaviour [29, 80]. A well known classical model that can incorporate such dependencies is the Bayesian Network [161]. This model represents relationships between random variables (such as causal and conditional dependencies) in an acyclic probabilistic graphical structure. Bayesian Networks are powerful inference models that have been successfully applied over the years in many different fields, mainly in artificial intelligence, genetics, medical decision-making, economics, etc. In this chapter, we present a classical Bayesian Network that makes use of latent variables and compare it against its quantum counterpart, the Quantum-Like Bayesian Network, which was previously proposed in Moreira & Wichert [141]. In the end, we conclude that it is possible to simulate the violations of the Sure Thing Principle using the classical Bayesian Network with latent variables at the cost of an exponential increase in its complexity; however, this model cannot predict both observed events and unobserved experimental conditions from Tversky & Shafir [198]. On the other hand, the quantum-like model is shown to be able to accommodate both situations, for observed and unobserved events, in a single and general model. Note that the Sure Thing Principle is a concept widely used in game theory and was originally introduced by Savage [170]. This principle is fundamental in Bayesian probability theory and states that if one prefers action A over B under state of the world X, and if one also prefers A over B under the complementary state of the world ¬X, then one should always prefer action A over B even when the state of the world is unspecified.

11.1 Latent Variables

Most of the time, the data that is recorded (or observed) does not provide all the information that is needed to model a decision scenario. In these situations, latent variables are used to model complex patterns for which we do not have complete data. There is no general and formal definition of latent variables [29]. Since it is a concept that is widely used across different multidisciplinary areas, it can be defined differently according to its application. However, a very simple and informal definition can be given: variables that are not directly observed from data, but can be inferred using the information of the variables that were recorded [14]. Instead of specifying concrete relationships between variables, latent variables enable the abstraction of these relationships, allowing a more general representation, which can be inferred from the observed variables. In this work, we will use latent variables in a probabilistic graphical model, more specifically in a Bayesian Network. Generally speaking, a Bayesian Network is an acyclic probabilistic graphical model which provides an intuitive way of specifying probabilistic relationships and dependencies between random variables [80]. These relationships are specified through a joint distribution over the set of all random variables in the model, and each node specifies conditional dependencies over its parent nodes. Under this representation, a random variable becomes latent when it is unobserved (or unknown), which suggests a local independence definition, according to Bollen [29]. When a latent variable is constant

Figure 11.1: Example of a Bayesian Network with a latent variable H and a random variable X.

(for instance, a prior probability representing a person's cognitive bias towards some topic), the observed variables become independent. More formally, the independence between random variables and the latent variables is given by Equation 11.1.

\[
\Pr(X_1, X_2, \dots, X_n) = \Pr(X_1 \mid h)\,\Pr(X_2 \mid h) \cdots \Pr(X_n \mid h) \tag{11.1}
\]

Given a set of observed random variables X1, X2, ..., Xn and some vector of latent (hidden) variables h, the joint probability Pr(X1, X2, ..., Xn) corresponds to the product of the conditional probabilities of each random variable Xi given the associated latent variable, Pr(X1|h) Pr(X2|h) ··· Pr(Xn|h). Consider Figure 11.1. Suppose you have a parameterized acyclic probabilistic graphical model over the parameter φ_{h=i}. We will assume that node H represents a latent variable, because it is not directly observed (or it is hidden) for some given reason: it might be too expensive to collect its data, it might not have been recorded, or we simply might not have access to the process generating the data we can observe. Given a dataset of collected data D of size M, the above network consists of tuples ⟨h[m], x[m]⟩, where h is a parameterized instance of the latent variable H and x an instance of the random variable X. The likelihood (a measure similar to a probability, which provides support for particular values of a parameter in a parametric model) of the network is given by the joint distribution:

\[
L(\phi : D) = \prod_{m=1}^{M} \Pr(h[m], x[m] : \phi) \tag{11.2}
\]

In a Bayesian Network, the full joint probability distribution can be described in terms of the chain rule. So, Equation 11.2 can be rewritten as:

\[
L(\phi : D) = \prod_{m=1}^{M} \Pr(h[m] : \phi_H) \cdot \Pr(x[m] \mid h[m] : \phi_{X|H}) \tag{11.3}
\]

Note that Equation 11.3 is composed of two terms, because the network in Figure 11.1 has two random variables (more specifically, one random variable and one latent variable). For N random variables, this model would have N terms. Each term is called a local likelihood function and determines how well a random variable predicts its parents [151]. In Equation 11.3, one can decompose the local likelihood function regarding the random variable X into two sets: one for each assignment that the random variable can take. In this case, for the sake of simplicity, it is assumed that X is a binary random variable that can be assigned the values True or False.

\[
\prod_{m} \Pr(x[m] \mid h[m] : \phi_{X|H}) = \prod_{m:\, h[m]=True} \Pr(x[m] \mid h[m] : \phi_{X|H=True}) \cdot \prod_{m:\, h[m]=False} \Pr(x[m] \mid h[m] : \phi_{X|H=False}) \tag{11.4}
\]

Note that for random variables with A assignments, the decomposition in Equation 11.4 would consist of A terms. Moreover, since X is a binary random variable, one can also rewrite Equation 11.4 as

\[
\prod_{m:\, h[m]=True} \Pr(x[m] \mid h[m] : \phi_{X|H=True}) = \phi_{X=True|H=True} \cdot \phi_{X=False|H=True} \tag{11.5}
\]

From Equation 11.5, we conclude

\[
\phi_{X=False|H=True} = \frac{\#\langle h=True,\ x=False\rangle}{\#\langle h=True,\ x=True\rangle + \#\langle h=True,\ x=False\rangle} = \frac{\#\langle h=True,\ x=False\rangle}{\#\langle h=True\rangle} \tag{11.6}
\]

In Equation 11.6, the symbol # represents the cardinality. More specifically, #⟨h = True, x = False⟩ represents the number of instances in which the latent variable H has the value True and the random variable X has the value False. This means that, for a given network structure, we simply count the number of instances in which each assignment of X and H appears. The goal is to estimate the parameters of the latent variable in order to accommodate the violations of the several paradoxes reported in the literature; for the case of this work, to accommodate violations of the Sure Thing Principle.
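The counting estimator of Equation 11.6 can be sketched as follows (the dataset below is hypothetical; note that in a genuine latent-variable setting h is unobserved and would have to be inferred, for instance by expectation-maximisation, so we pretend here that it was recorded purely to illustrate the counting formula):

```python
from collections import Counter

# Hypothetical dataset of (h, x) pairs.
data = [(True, False), (True, True), (True, False),
        (False, True), (True, True), (False, False)]

counts = Counter(data)
n_h_true = sum(c for (h, _), c in counts.items() if h)  # #<h = True>

# phi_{X=False | H=True} = #<h=True, x=False> / #<h=True>
phi_x_false_given_h_true = counts[(True, False)] / n_h_true
print(phi_x_false_given_h_true)  # 2 of the 4 records with h=True have x=False -> 0.5
```

For a fixed network structure, every conditional probability table entry is estimated by exactly this kind of count ratio.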

11.2 Classical Bayesian Network with Latent Variables

In order to address the idea that some hidden variable(s) might influence the participants' mental states, leading to the paradoxical findings reported in Table 4.5, in this section we introduce a classical Bayesian Network model with latent variables to model the Prisoner's Dilemma game. A classical Bayesian Network can be defined by a directed acyclic graph structure in which each node represents a different random variable from a specific domain and each edge represents a direct influence from the source node to the target node. The graph represents relationships between variables, and each node is associated with a conditional probability table which specifies a distribution over the values of a node given each possible joint assignment of values of its parents. This idea of a node depending directly on its parent nodes is the core of Bayesian Networks. Once the values of the parents are known, no information relating directly or indirectly to its parents or other ancestors can influence the beliefs about it [116]. Consider Figure 11.2, which represents a classical Bayesian Network with a latent variable to model the Prisoner's Dilemma game. In Figure 11.2, P1 and P2 are both random variables. P1 represents the decision of the first player and P2 represents the second participant's decision (either defecting or cooperating). H is the hidden state, or latent variable, and represents some unmeasurable factor that can influence the players' decisions.

Figure 11.2: A classical Bayesian Network with a latent variable to model the Prisoner's Dilemma game. P1 and P2 are both random variables. P1 represents the decision of the first player and P2 represents the decision of the second player (either to cooperate or to defect). H is the hidden state or latent variable and represents some unmeasurable factor that can influence the participants' decisions.

For the sake of simplicity, let's assume that the latent variable has two states: risk seeking and risk averse. In the proposed classical model, the latent variable represents the personality of a player towards risk and suggests that a risk seeking player will tend to cooperate more, whereas a risk averse player will tend to defect more often. The main reason for making this assumption resides in two factors. First, in the literature, the Prisoner's Dilemma game is modelled under these two conditions when the purpose is to represent individual risk in decision-making tasks [22, 164]. Second, the complexity of the Bayesian Network would grow exponentially and unnecessarily large with the incorporation of more random variables or of random variables with multiple assignments. Adding extra assignments to the latent variable will not bring any advantage in this decision-making problem, as the reader will notice at the end of this section. Since a latent variable is a variable that is not directly measured, it can only be inferred from the observed data. The two random variables, P1 and P2, are the observed data from the experiment and are represented, respectively, by the functions F(P1_i, H_j) and G(P2_i, H_j) for i ∈ {defect, cooperate} and j ∈ {risk averse, risk seeking}. These functions depend on the hidden and unmeasured variable H, which is parameterized over the parameter K(H = j). One can note that this model is in accordance with the definition of latent variables from [29]: if H is known, then the random variables P1 and P2 become independent:

\[
\Pr(P1, P2) = \Pr(P1 \mid H) \cdot \Pr(P2 \mid H)
\]

The goal of this model is to find the parameter K(H = j) from the observed experimental data such that all conditions of the Prisoner’s Dilemma Game are satisfied. In other words:

1. When it is known that the first player chose to defect, then the participant should defect.

2. When it is known that the first player chose to cooperate, then the participant should defect.

3. When it is not known if the first player chose to defect or cooperate, then the second player should cooperate.

Assuming that m corresponds to the mth element of the dataset D, the maximum likelihood estimate

of this network with variable φ is given by the full joint probability distribution,

\[
L(\phi : D) = \prod_{m=1}^{M} \Pr(H[m], P1[m], P2[m] : \phi) \tag{11.7}
\]

Remember that H is the hidden state or latent variable that represents the personality of the player and is parameterized over the parameter K(H = j), with j ∈ {risk averse, risk seeking}. P1 and P2 are both random variables. P1 represents the decision of the first player and P2 represents the second participant's decision (either defecting or cooperating). P1 and P2 are the observed data from the experiment and are represented, respectively, by the functions F(P1_i, H_j) and G(P2_i, H_j) for i ∈ {defect, cooperate} and j ∈ {risk averse, risk seeking}.

Given that we have three random variables in the network, the full joint probability distribution consists of the product of the nodes given their parents,

\[
L(\phi : D) = \left(\prod_{m=1}^{M} \Pr(H[m] : \phi_H)\right)\left(\prod_{m=1}^{M} \Pr(P1[m] \mid H[m] : \phi_{P1|H})\right)\left(\prod_{m=1}^{M} \Pr(P2[m] \mid H[m] : \phi_{P2|H})\right) \tag{11.8}
\]

The first term of Equation 11.8 is the latent variable and corresponds to the prior probability about a player's personality. This hidden variable needs to be inferred from the observed data. The second and third terms correspond to the actions of players P1 and P2, respectively, and are the terms that we need to expand and compute. Since we are more interested in computing the probability of the second player, P2, we will proceed with the calculations for this term. The calculations for P1 are performed in a similar way. Expanding the term regarding P2, the probability of the second player can be computed in terms of his personality (either risk seeking or risk averse), which is represented by the latent variable H.

\[
\prod_{m} \Pr(P2[m] \mid H[m] : \phi_{P2|H}) = \prod_{m:\, h[m]=risk\ averse} \Pr(P2[m] \mid H[m] : \phi_{P2|H=risk\ averse}) \cdot \prod_{m:\, h[m]=risk\ seeking} \Pr(P2[m] \mid H[m] : \phi_{P2|H=risk\ seeking}) \tag{11.9}
\]

Analysing each term, we see that the latent variable can indeed be estimated from the random variables P2 and P1 through the functions G(P2 = i, H = j) and F(P1 = i, H = j), for i ∈ {defect, cooperate} and j ∈ {risk averse, risk seeking}, which correspond to the number of times each assignment of H appears together with each assignment of P2 and P1. More formally,

\[
\prod_{m:\, h[m]=risk\ averse} \Pr(P2[m] \mid H[m] : \phi_{P2|H=risk\ averse}) = \phi_{P2=defect|H=risk\ averse} \cdot \phi_{P2=cooperate|H=risk\ averse} \tag{11.10}
\]

\[
\phi_{P2=defect|H=risk\ averse} = \frac{\#\langle P2=defect,\ H=risk\ averse\rangle}{\#\langle H=risk\ averse\rangle} = G(P2=defect,\ H=risk\ averse) \tag{11.11}
\]

In the same way, we can make the same calculations for a risk seeking player:

\[
\phi_{P2=defect|H=risk\ seeking} = \frac{\#\langle P2=defect,\ H=risk\ seeking\rangle}{\#\langle H=risk\ seeking\rangle} = G(P2=defect,\ H=risk\ seeking) \tag{11.12}
\]

For the random variable P1 the calculations are similar:

\[
\phi_{P1=defect|H=risk\ averse} = \frac{\#\langle P1=defect,\ H=risk\ averse\rangle}{\#\langle H=risk\ averse\rangle} = F(P1=defect,\ H=risk\ averse) \tag{11.13}
\]

\[
\phi_{P1=defect|H=risk\ seeking} = \frac{\#\langle P1=defect,\ H=risk\ seeking\rangle}{\#\langle H=risk\ seeking\rangle} = F(P1=defect,\ H=risk\ seeking) \tag{11.14}
\]

This leads to the parameterization made in the Bayesian Network model in Figure 11.2. In the next section, we will try to find an estimation for the functions F(P1 = i, H = j) and G(P2 = i, H = j) and for the parameter K(H = j) that can explain both the observed and unobserved conditions of the Prisoner's Dilemma game.

11.2.1 Estimating the Parameters

In order to find a classical Bayesian Network that can explain the paradoxical results in Table 4.5, one needs to fit the conditional probabilities F(P1 = i, H = j) and G(P2 = i, H = j) and the prior probability K(H = j). In order to simulate the three conditions of the Prisoner's Dilemma game experiment, we need to satisfy two sets of conditions: one when the player does not know the decision of the first player and another when the player knows the decision of the first player. These conditions can be taken from the computation of the full joint probability of the Bayesian Network in Figure 11.2. The full joint probability distribution of a Bayesian Network corresponds to the multiplication of each assignment of a random variable with its parents. More specifically, for a set of random variables X that make up a Bayesian Network, the full joint probability is given by [168]:

\[
\Pr(X_1, \dots, X_n) = \prod_{i=1}^{n} \Pr(X_i \mid Parents(X_i)) \tag{11.15}
\]

We can specify the full joint probability of the Bayesian Network in Figure 11.2, through Equation 11.15, in Table 11.1. Note that, throughout this work, we will refer to risk averse as ra, risk seeking as rs, defect as d and cooperate as c. The unobserved condition corresponds to the third condition of the experiment, in which the participant does not know the decision of the first player. That is the same as computing the probability of a participant choosing to defect without further information: Pr(P2 = defect). This results in the following

H             P1          P2          Pr(H, P1, P2)
risk averse   defect      defect      K(H=ra) F(P1=d, H=ra) G(P2=d, H=ra)
risk averse   defect      cooperate   K(H=ra) F(P1=d, H=ra) G(P2=c, H=ra)
risk averse   cooperate   defect      K(H=ra) F(P1=c, H=ra) G(P2=d, H=ra)
risk averse   cooperate   cooperate   K(H=ra) F(P1=c, H=ra) G(P2=c, H=ra)
risk seeking  defect      defect      K(H=rs) F(P1=d, H=rs) G(P2=d, H=rs)
risk seeking  defect      cooperate   K(H=rs) F(P1=d, H=rs) G(P2=c, H=rs)
risk seeking  cooperate   defect      K(H=rs) F(P1=c, H=rs) G(P2=d, H=rs)
risk seeking  cooperate   cooperate   K(H=rs) F(P1=c, H=rs) G(P2=c, H=rs)

Table 11.1: Full joint probability distribution for the general Bayesian Network from Figure 11.2, which models the Prisoner’s Dilemma game. Note that rs stands for risk seeking, ra for risk averse, d for defect and c for cooperate.

conditions:

\[
\begin{aligned}
\Pr(P2 = defect) = {} & \Pr(H=ra, P1=d, P2=d) + \Pr(H=ra, P1=c, P2=d) \\
& + \Pr(H=rs, P1=d, P2=d) + \Pr(H=rs, P1=c, P2=d)
\end{aligned} \tag{11.16}
\]

In the same way, one can specify the probability of the participant choosing to cooperate, P r(P 2 = cooperate), as

\[
\begin{aligned}
\Pr(P2 = cooperate) = {} & \Pr(H=ra, P1=d, P2=c) + \Pr(H=ra, P1=c, P2=c) \\
& + \Pr(H=rs, P1=d, P2=c) + \Pr(H=rs, P1=c, P2=c)
\end{aligned} \tag{11.17}
\]
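The enumeration of Table 11.1 and the marginalisations of Equations 11.16 and 11.17 can be reproduced mechanically. A minimal Python sketch, with arbitrary placeholder values for K, F and G (not fitted to any experimental data):

```python
import itertools

# Full joint Pr(H, P1, P2) = K(H) * F(P1, H) * G(P2, H), enumerated as in
# Table 11.1. The numerical values below are arbitrary placeholders.
K = {'ra': 0.3, 'rs': 0.7}                    # prior over the latent variable H
F = {('d', 'ra'): 0.8, ('c', 'ra'): 0.2,      # Pr(P1 | H)
     ('d', 'rs'): 0.4, ('c', 'rs'): 0.6}
G = {('d', 'ra'): 0.7, ('c', 'ra'): 0.3,      # Pr(P2 | H)
     ('d', 'rs'): 0.2, ('c', 'rs'): 0.8}

joint = {(h, p1, p2): K[h] * F[(p1, h)] * G[(p2, h)]
         for h, p1, p2 in itertools.product(('ra', 'rs'), ('d', 'c'), ('d', 'c'))}

assert abs(sum(joint.values()) - 1.0) < 1e-9  # the eight rows sum to 1
# Marginalising out H and P1 gives the unobserved condition Pr(P2 = d):
print(round(sum(p for (h, p1, p2), p in joint.items() if p2 == 'd'), 4))  # 0.35
```

Fitting the model amounts to searching this parameter space for values that make the marginals match the experimentally reported probabilities.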

In the same way, using the full joint probability in Table 11.1, one can rewrite Equations 11.16 and 11.17 and set the probabilities Pr(P2 = defect) and Pr(P2 = cooperate) to the experimental values of the Prisoner's Dilemma game. For the case of the work of [198], we should guarantee that Pr(P2 = defect) = 0.63 and Pr(P2 = cooperate) = 0.37.

\[
\begin{aligned}
\Pr(P2 = defect) = {} & K(H=ra)\,F(P1=d, H=ra)\,G(P2=d, H=ra) \\
& + K(H=ra)\,F(P1=c, H=ra)\,G(P2=d, H=ra) \\
& + K(H=rs)\,F(P1=d, H=rs)\,G(P2=d, H=rs) \\
& + K(H=rs)\,F(P1=c, H=rs)\,G(P2=d, H=rs) = 0.63
\end{aligned} \tag{11.18}
\]

\[
\begin{aligned}
\Pr(P2 = cooperate) = {} & K(H=ra)\,F(P1=d, H=ra)\,G(P2=c, H=ra) \\
& + K(H=ra)\,F(P1=c, H=ra)\,G(P2=c, H=ra) \\
& + K(H=rs)\,F(P1=d, H=rs)\,G(P2=c, H=rs) \\
& + K(H=rs)\,F(P1=c, H=rs)\,G(P2=c, H=rs) = 0.37
\end{aligned} \tag{11.19}
\]

Equations 11.18 and 11.19 specify the unobserved conditions for the Prisoner’s Dilemma Game and in order to satisfy them, we need to set their parameters in the following way (note that the following

Figure 11.3: Classical Bayesian Network to model the observed conditions for the Prisoner's Dilemma Game. OutP1 and P2 are both random variables that represent the outcome (or decision) of the first player and the decision of the second player. The decisions can either be defect, which is represented by d, or cooperate, represented by c. H2 represents a latent (hidden) unmeasurable variable that corresponds to the personality of the second player: either risk averse (ra) or risk seeking (rs).

values represent one possible solution, however this solution is not unique).

K(H = rs) = 0.5, F(P1 = d, H = rs) = 0.1, G(P2 = d, H = rs) = 0.36
K(H = ra) = 0.5, F(P1 = d, H = ra) = 0.9, G(P2 = d, H = ra) = 0.9   (11.20)
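As a quick numerical check, the parameterisation above can be verified programmatically (a sketch; it assumes, as required for valid probability tables, that F(P1 = c, H) = 1 − F(P1 = d, H)):

```python
# Parameters from Eq. 11.20 (one possible, non-unique, solution).
K   = {"rs": 0.5, "ra": 0.5}    # prior over the latent personality H
F_d = {"rs": 0.1, "ra": 0.9}    # F(P1 = d, H)
G_d = {"rs": 0.36, "ra": 0.9}   # G(P2 = d, H)

# Eq. 11.18: marginalise over H and P1, with F(P1 = c, H) = 1 - F(P1 = d, H).
p_defect = sum(K[h] * f * G_d[h]
               for h in ("rs", "ra")
               for f in (F_d[h], 1 - F_d[h]))
print(round(p_defect, 2))  # 0.63
```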

This means that there is indeed a classical model that explains the paradoxical findings of the Prisoner's Dilemma game, which violate the laws of classical probability theory. However, we also need to satisfy the conditions where the player knows the decision of the first player. For this, we will need to change the model. Consider the Bayesian Network in Figure 11.3, which represents a classical Bayesian Network to model the observed conditions for the Prisoner's Dilemma game. In this network, OutP1 and P2 are both random variables that represent the outcome (or decision) of the first player and the decision of the second player. H2 represents a latent unmeasurable variable that corresponds to the personality of the second player: either risk averse or risk seeking.

Since the second player will have access to the first player's decision, we will need to add a dependency between the nodes OutP1 and P2. Using the same line of thought, since we know the outcome of the first player, we do not need any dependency between this node and the latent variable (since, for this experimental condition, it will only affect the second player). Using this model, the observed conditions for the Prisoner's Dilemma game are given by the probabilities Pr(P2 | OutP1 = defect, H2), when the player is informed that the first player chose to defect, and Pr(P2 | OutP1 = cooperate, H2), when the player is informed that the first player chose to cooperate.

\[
\begin{aligned}
Pr(P2 = d \mid P1 = d) ={}& \alpha_1\, [\, K2(H2 = ra)\, F2(OutP1 = d)\, G2(P2 = d, OutP1 = d, H2 = ra) \\
&+ K2(H2 = rs)\, F2(OutP1 = d)\, G2(P2 = d, OutP1 = d, H2 = rs)\,] = 0.97 \\
Pr(P2 = c \mid P1 = d) ={}& \alpha_1\, [\, K2(H2 = ra)\, F2(OutP1 = d)\, G2(P2 = c, OutP1 = d, H2 = ra) \\
&+ K2(H2 = rs)\, F2(OutP1 = d)\, G2(P2 = c, OutP1 = d, H2 = rs)\,] = 0.03
\end{aligned}
\tag{11.21}
\]

\[
\begin{aligned}
Pr(P2 = d \mid P1 = c) ={}& \alpha_2\, [\, K2(H2 = ra)\, F2(OutP1 = c)\, G2(P2 = d, OutP1 = c, H2 = ra) \\
&+ K2(H2 = rs)\, F2(OutP1 = c)\, G2(P2 = d, OutP1 = c, H2 = rs)\,] = 0.84 \\
Pr(P2 = c \mid P1 = c) ={}& \alpha_2\, [\, K2(H2 = ra)\, F2(OutP1 = c)\, G2(P2 = c, OutP1 = c, H2 = ra) \\
&+ K2(H2 = rs)\, F2(OutP1 = c)\, G2(P2 = c, OutP1 = c, H2 = rs)\,] = 0.16
\end{aligned}
\tag{11.22}
\]

In Equations 11.21 and 11.22, the variables α1 and α2 are normalisation factors due to the Naïve Bayes independence assumptions and are defined by:

\[
\alpha_1 = \frac{1}{\sum_{p \in \{d,c\}} \sum_{h \in \{ra,rs\}} K2(H2 = h)\, F2(OutP1 = d)\, G2(P2 = p, OutP1 = d, H2 = h)}
\]

\[
\alpha_2 = \frac{1}{\sum_{p \in \{d,c\}} \sum_{h \in \{ra,rs\}} K2(H2 = h)\, F2(OutP1 = c)\, G2(P2 = p, OutP1 = c, H2 = h)}
\]

In order to satisfy the observed conditions, we need to set the parameters in the following way (note that these values represent one possible solution; the solution is not unique).

K2(H2 = rs) = 0.5, F2(OutP1 = d) = 0.9, G2(P2 = d, OutP1 = d, H2 = rs) = 0.97
K2(H2 = ra) = 0.5, F2(OutP1 = c) = 0.1, G2(P2 = d, OutP1 = c, H2 = ra) = 0.84   (11.23)

This means that there is also a classical model that explains the observed findings of the Prisoner's Dilemma game. But is there a single classical model that can accommodate both observed and unobserved conditions? Let us go back to the Bayesian Network in Figure 11.2 (the one we used to compute the unobserved conditions). In order to satisfy the experimental results for the Prisoner's Dilemma game, one needs to find values for the parameters K(H = j), F(P1 = d, H = j) and G(P2 = d, H = j), for j ∈ {ra, rs}, such that all observed and unobserved conditions are satisfied. Making the calculations, one finds that the only way to satisfy both conditions would be to set the parameters to:

K(H = rs) = 0.9999, F(P1 = d, H = rs) = −3442.8, G(P2 = d, H = rs) = 0.6301
K(H = ra) = 0.0001, F(P1 = d, H = ra) = −3443.8, G(P2 = d, H = ra) = 0.3699   (11.24)

This is an impossible statement: F(P1 = d, H = ra) = −3443.8 violates the positivity axiom of classical probability theory. Moreover, the parameter G(P2 = d, H = ra) = 0.3699 does not make sense, because it would imply that a player with a risk averse personality would mostly choose to cooperate, which is a contradiction. A risk averse player would prefer to defect, the action that leads to a higher utility. This shows that, in a classical Bayesian Network with latent variables, we can find conditional probability tables that accommodate either the paradoxical findings of the Prisoner's Dilemma game or the observed conditions. Satisfying both conditions in a single model is not possible.

Figure 11.4: A general classical Bayesian Network with two latent variables, H1 and H2, to express both unobserved and observed conditions for the Prisoner's Dilemma game. Random variables P1U and P1 represent the first player's decision according to the unobserved and observed conditions, respectively. Random variables P2U and P2 represent the second player's decision according to the unobserved and observed conditions, respectively. The assignments ra, rs, d and c stand for risk averse, risk seeking, defect and cooperate, respectively.

11.2.2 Increasing the Dimensionality of a Classical Bayesian Network

One can argue that adding another layer of hidden variables might solve the problem at hand, and that we would be able to simulate both observed and unobserved conditions. Although this line of thought is legitimate, it still does not solve the problem. Consider Figure 11.4, which presents a classical Bayesian Network in which we have introduced a latent variable H2 that joins the models that can address the paradoxical findings and the models that can address the observed conditions of the Prisoner's Dilemma game. This means that, by increasing the dimensionality of the model, we can obtain a network that takes into account both observed and unobserved conditions. In Figure 11.4, H1 and H2 are latent variables that express the unobserved and observed conditions for the Prisoner's Dilemma game. Random variables P1U and P1 represent the first player's decision according to the unobserved and observed conditions, respectively. Random variables P2U and P2 represent the second player's decision according to the unobserved and observed conditions, respectively. The assignments ra, rs, d and c stand for risk averse, risk seeking, defect and cooperate, respectively.

In order to address all conditions for the Prisoner's Dilemma game, we need to find values for the parameter φ that would lead to the experimental outcomes of Tversky & Shafir's [198] work, reported in Table 4.5. There are two possible ways to obtain a value for φ, but both lead to a contradiction in the decision model.

1. Setting φ = 1 would make Pr(P2 = defect | P1 = defect) = 0.97, Pr(P2 = defect | P1 = cooperate) = 0.84 and Pr(P2U = defect) = 0.63. This reflects the observations of Tversky & Shafir's [198] experiments; however, we have two problems. First, setting a latent variable to a 100% probability goes against its definition, since we would be implying that this hidden variable that affects the player's decisions is always present. This leads to the second problem: under such a parameterization, we would not be able to justify the probability Pr(P2U = defect) = 0.63, because we would always be under an observed condition (and P2U represents the actions of the second player under unobserved conditions).

2. Setting φ = 2/3, which represents the experimental setup: two experiments for observed conditions and one experiment for the unobserved condition. With this parameterization, the only way to achieve the results for the observed conditions would be if we knew that H2 = obs. That is, only by computing the probability Pr(P2 = defect | P1 = defect, H2 = obs) would we obtain the experimental values from Tversky & Shafir's [198] work. This again is a contradiction, because a latent variable can never be used as a piece of absolute information during an inference process. That is, since it is a hidden variable, we cannot measure it.

One should also take into account that, by adding extra hidden variables to a Bayesian Network model, one is exponentially increasing the complexity of the model. For N binary random variables, if no information is observed, we would need to compute a full joint probability distribution with 2^N entries. The Prisoner's Dilemma game is just a small decision scenario and, in order to attempt to accommodate all experimental observations, we required 6 binary random variables, which leads to a full joint probability distribution with 2^6 = 64 entries to be stored in memory. For more complex decision scenarios, these computations grow exponentially large and the inference process becomes intractable. In the next section, we propose an alternative model based on quantum probability theory that can take into account both observed and unobserved conditions in a general and compact way.
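To make this growth concrete, the sketch below simply enumerates the full-joint assignments for a few network sizes (illustration only):

```python
from itertools import product

# A full joint distribution over N binary variables has 2**N entries.
for n in (2, 6, 20):
    entries = sum(1 for _ in product((0, 1), repeat=n))
    print(n, entries)
# 6 binary variables (the augmented Prisoner's Dilemma network) -> 64 entries
```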

11.3 Quantum-Like Bayesian Networks as an Alternative Model

A more recent work from [137] suggested defining the Quantum-Like Bayesian Network in the same manner as a classical Bayesian Network, but replacing real probability numbers by quantum probability amplitudes. In this sense, we can construct a Quantum-Like Bayesian Network by applying Born's rule [65], that is, by replacing the classical full joint probability distribution and the classical marginal probability distribution by quantum complex amplitudes. Then, we just apply the squared magnitude to the equation. The quantum-like full joint probability distribution is given by Equation 11.25.

\[
Pr(X_1, \ldots, X_N) = \left| \prod_{i=1}^{N} \psi(X_i \mid Parents(X_i)) \right|^2
\tag{11.25}
\]

As mentioned in the work of [141], the general idea of a Quantum-Like Bayesian Network is that, when performing probabilistic inference, the probability amplitudes of each assignment of the network are propagated and influence the probabilities of the remaining nodes, causing quantum interference effects to occur. In other words, every assignment of every node of the network is propagated until the node representing the query variable is reached.

By applying Born’s rule, one can obtain the quantum counterpart of the classical marginal probability distribution. In other words, one can obtain a quantum-like version of the classical exact inference formula in the following way:

\[
Pr(X \mid e) = \alpha \left| \sum_{y} \prod_{x=1}^{N} \psi(X_x \mid Parents(X_x), e, y) \right|^2
\tag{11.26}
\]

Expanding Equation 11.26 leads to the quantum interference formula:

\[
Pr(X \mid e) = \alpha \left[ \sum_{i=1}^{|Y|} \left| \prod_{x} \psi(X_x \mid Parents(X_x), e, y{=}i) \right|^2 + 2 \cdot Interference \right]
\]

\[
Interference = \sum_{i=1}^{|Y|-1} \sum_{j=i+1}^{|Y|} \left| \prod_{x} \psi(X_x \mid Parents(X_x), e, y{=}i) \right| \cdot \left| \prod_{x} \psi(X_x \mid Parents(X_x), e, y{=}j) \right| \cdot \cos(\theta_i - \theta_j)
\tag{11.27}
\]

In the end, we need to normalise the final scores that are computed in order to achieve a probability value, because we do not have the constraints of doubly stochastic operators. In classical Bayesian inference, normalisation of the inference scores is also necessary, due to the assumptions made in Bayes' rule. The normalisation factor corresponds to α in Equation 11.27.

Note that, in Equation 11.27, if one sets (θi − θj) to π/2, then cos(θi − θj) = 0, which means that the Quantum-Like Bayesian Network collapses to its classical counterpart. That is, it behaves in a classical way if one sets the interference term to zero. Moreover, in Equation 11.27, if the Bayesian Network has N binary random variables, we will end up with an exponential number (2^N) of free quantum θ parameters. Approaches to tune these parameters under a Quantum-Like Bayesian Network are still an open research question. In the model of Moreira & Wichert [137, 141], if there are many unobserved nodes in the network, then the levels of uncertainty are very high and the interference effects produce changes in the final likelihoods of the outcomes, making it possible to explain the paradoxical results found in the literature.
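Equation 11.27 can be sketched in a few lines of code. The version below is a simplification that assumes a single phase difference θ shared by all pairs of terms (in general, each pair (i, j) carries its own θi − θj):

```python
import math

def interference_score(amplitudes, theta):
    """Un-normalised Eq. 11.27 score for one query assignment.

    `amplitudes` holds the |Y| amplitude products (one per assignment of
    the unobserved variables); `theta` is a single phase difference
    assumed shared by all pairs (a simplification of Eq. 11.27).
    """
    classical = sum(a * a for a in amplitudes)
    pairs = sum(abs(amplitudes[i]) * abs(amplitudes[j])
                for i in range(len(amplitudes))
                for j in range(i + 1, len(amplitudes)))
    return classical + 2 * pairs * math.cos(theta)

amps = [math.sqrt(0.5 * 0.97), math.sqrt(0.5 * 0.84)]
# Setting theta = pi/2 zeroes the interference term: the network
# collapses to its classical counterpart (0.485 + 0.42 = 0.905).
print(round(interference_score(amps, math.pi / 2), 3))  # 0.905
```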

The full joint probability distribution of the Quantum-Like Bayesian Network in Figure 11.5 is given by Table 11.2.

Figure 11.5: Example of a Quantum-Like Bayesian Network. The terms ψ correspond to quantum probability amplitudes. The variables P1 and P2 correspond to random variables representing the first and the second player, respectively.

P1          P2          ψ(P1, P2)
defect      defect      ψ(P1 = d) ψ(P2 = d | P1 = d) = √0.5 · √0.97 = 0.6964
defect      cooperate   ψ(P1 = d) ψ(P2 = c | P1 = d) = √0.5 · √0.03 = 0.1225
cooperate   defect      ψ(P1 = c) ψ(P2 = d | P1 = c) = √0.5 · √0.84 = 0.6481
cooperate   cooperate   ψ(P1 = c) ψ(P2 = c | P1 = c) = √0.5 · √0.16 = 0.2828

Table 11.2: Full joint probability distribution table of the Quantum-Like Bayesian Network in Figure 11.5.

Using the Quantum-Like Bayesian Network in Figure 11.5, one can compute the probability of P2 = defect in the following way:

\[
Pr(P2 = defect) = \alpha \left[ \sum_{p \in P1} |\psi(P1{=}p, P2{=}d)|^2 + 2\,|\psi(P1{=}d, P2{=}d)|\,|\psi(P1{=}c, P2{=}d)| \cos(\theta_d - \theta_c) \right]
\tag{11.28}
\]

\[
= \alpha \left( |\psi(P1{=}d, P2{=}d)|^2 + |\psi(P1{=}c, P2{=}d)|^2 + 2\,|\psi(P1{=}d, P2{=}d)|\,|\psi(P1{=}c, P2{=}d)| \cos(\theta_d - \theta_c) \right)
\tag{11.29}
\]

The quantum interference term θd − θc can either be set manually, according to the experimental observations, or it can be estimated using the similarity heuristic proposed in [141]. In this work, in order to be fair to the previously presented classical model, we will set the quantum parameter directly according to the experimental setting. That is, the quantum interference parameter is given by θd − θc = 2.8151. Continuing the calculations,

\[
Pr(P2 = defect) = \alpha \left( (\sqrt{0.5}\,\sqrt{0.97})^2 + (\sqrt{0.5}\,\sqrt{0.84})^2 + 2\,\sqrt{0.5}\,\sqrt{0.97}\,\sqrt{0.5}\,\sqrt{0.84}\,\cos(2.8151) \right) = \alpha \cdot 0.05
\tag{11.30}
\]

In the same way, we compute the probability of the second player choosing to cooperate:

\[
Pr(P2 = cooperate) = \alpha \left[ \sum_{p \in P1} |\psi(P1{=}p, P2{=}c)|^2 + 2\,|\psi(P1{=}d, P2{=}c)|\,|\psi(P1{=}c, P2{=}c)| \cos(\theta_d - \theta_c) \right]
\tag{11.31}
\]

\[
Pr(P2 = cooperate) = \alpha \left( (\sqrt{0.5}\,\sqrt{0.03})^2 + (\sqrt{0.5}\,\sqrt{0.16})^2 + 2\,\sqrt{0.5}\,\sqrt{0.03}\,\sqrt{0.5}\,\sqrt{0.16}\,\cos(2.8151) \right) = \alpha \cdot 0.0294
\tag{11.32}
\]

Making the calculations, we end up with the probabilities

Pr(P2 = defect) = 0.63 and Pr(P2 = cooperate) = 0.37,

which simulate the results obtained in the work of Tversky & Shafir [198], described in Table 4.5, under unobserved events. If the action of Player 1 is known, then the probability of Player 2 choosing to defect is given by:

\[
Pr(P2 = defect \mid P1 = defect) = \alpha\, |\psi(P1{=}d, P2{=}d)|^2 = 0.97
\]

\[
Pr(P2 = defect \mid P1 = cooperate) = \alpha\, |\psi(P1{=}c, P2{=}d)|^2 = 0.84
\]
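These computations can be checked numerically. The sketch below assumes the amplitudes of Table 11.2 and θd − θc = 2.8151, and reproduces both the unobserved marginals and the observed conditionals:

```python
import math

# Amplitude moduli from Table 11.2: psi[(p1, p2)] = psi(P1=p1) psi(P2=p2 | P1=p1)
psi = {("d", "d"): math.sqrt(0.5 * 0.97), ("d", "c"): math.sqrt(0.5 * 0.03),
       ("c", "d"): math.sqrt(0.5 * 0.84), ("c", "c"): math.sqrt(0.5 * 0.16)}
theta = 2.8151  # quantum interference parameter, theta_d - theta_c

def unobserved(p2):
    # Eq. 11.28 / 11.31: sum over the unobserved P1 plus the interference term.
    a, b = psi[("d", p2)], psi[("c", p2)]
    return a * a + b * b + 2 * a * b * math.cos(theta)

s_d, s_c = unobserved("d"), unobserved("c")
alpha = 1.0 / (s_d + s_c)  # normalisation
print(round(alpha * s_d, 2), round(alpha * s_c, 2))  # 0.63 0.37

def observed(p2, p1):
    # P1 is known: no superposition over P1, hence no interference term.
    return psi[(p1, p2)] ** 2 / sum(psi[(p1, q)] ** 2 for q in ("d", "c"))

print(round(observed("d", "d"), 2), round(observed("d", "c"), 2))  # 0.97 0.84
```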

This shows that the Quantum-Like Bayesian Network is a general and suitable model to be applied in scenarios with high levels of uncertainty that violate the laws of classical probability theory, since it can account for both observed and unobserved events without requiring hidden variables. Table 11.3 shows which values the quantum interference parameter must take in order to simulate several works in the literature that report violations of the Sure Thing Principle. A method to estimate these parameters through heuristic functions has been proposed in the work of Moreira & Wichert [141].

Literature                   θi − θj   Pr(P2 = Defect)   Pr(P2 = Defect | P1 = Defect)   Pr(P2 = Defect | P1 = Cooperate)
Tversky & Shafir [198]       2.8151    0.6300            0.9700                          0.8400
Li & Taplin [125]^b          3.3033    0.7200            0.8200                          0.7700
Busemeyer et al. [39]        2.9738    0.6600            0.9100                          0.8400
Hristova & Grinberg [86]     2.8255    0.8800            0.9700                          0.9300

Table 11.3: Analysis of the quantum θ parameters computed for each work in the literature in order to reproduce the observed and unobserved conditions of the Prisoner's Dilemma game. ^b corresponds to the average of all seven experiments reported.
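The θ values in Table 11.3 can be recovered in closed form: for fixed observed conditionals and a target unobserved marginal, the normalised score of Equation 11.27 is linear in cos(θ). The sketch below assumes, as in Figure 11.5, a uniform prior ψ(P1 = d) = ψ(P1 = c) = √0.5; note that `math.acos` returns the solution in [0, π], and 2π − θ yields the same probabilities (which accounts for the Li & Taplin row).

```python
import math

def fit_theta(p_dd, p_dc, target):
    """Solve for theta so the normalised interference marginal hits `target`.

    p_dd = Pr(P2=d | P1=d), p_dc = Pr(P2=d | P1=c),
    target = Pr(P2=d) under the unobserved condition.
    """
    a, b = math.sqrt(0.5 * p_dd), math.sqrt(0.5 * p_dc)              # defect branch
    c, d = math.sqrt(0.5 * (1 - p_dd)), math.sqrt(0.5 * (1 - p_dc))  # cooperate branch
    # target = S_d / (S_d + S_c), with S = (sum of squares) + 2*product*cos(theta);
    # rearranging gives a linear equation in cos(theta):
    num = target * (c * c + d * d) - (1 - target) * (a * a + b * b)
    den = 2 * ((1 - target) * a * b - target * c * d)
    return math.acos(num / den)

print(round(fit_theta(0.97, 0.84, 0.63), 4))  # 2.8151 (Tversky & Shafir row)
```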

In Table 11.3, the column θi − θj represents the value of the quantum parameters of Equation 11.27 that need to be set in order to explain the paradoxical findings reported in the several works of the literature. The column Pr(P2 = Defect) corresponds to the probability of the second player choosing the action defect (the unobserved condition). The columns Pr(P2 = Defect | P1 = Defect) and Pr(P2 = Defect | P1 = Cooperate) correspond to the probability of the second player choosing to defect, given that it is known that the first player chose the actions defect and cooperate, respectively (the observed conditions).

It is important to note that the Quantum-Like Bayesian Network is just one example of a quantum-like model that is able to express both observed and unobserved conditions of the Prisoner's Dilemma game; it is not the only model capable of achieving this. We chose this model because it represents the quantum counterpart of the classical Bayesian Network with latent variables that we analyse in this work. There are other quantum-like models which are able to accommodate the paradoxical situations found in the Prisoner's Dilemma game; the most representative ones correspond to the application of the quantum dynamical model [44, 164] and the quantum-like approach [105].

The quantum dynamical model takes into account time evolution to express the participants' beliefs and decisions throughout time, using unitary operators [41, 205]. This dynamical representation also enables the simulation of dissonance effects, that is, situations where participants are confronted with information that conflicts with their existing beliefs. In the end, the quantum dynamical model shows that quantum probability is a very general framework that can also accommodate both observed and unobserved experimental conditions of the Prisoner's Dilemma game.

The quantum-like approach [91, 92, 94, 97, 98] is another example of a quantum model that can represent both paradoxical findings and observed conditions in a single model. The quantum-like approach makes use of contexts to model decision scenarios. A context relates to the circumstances that form the setting for an event, in terms of which the event can be fully understood; more specifically, it is a complex of conditions under which a measurement is performed. In domains outside of physics, such as cognitive science, one can have mental contexts; in the social sciences, a social context; and the same idea applies to many other domains, such as economics, politics, game theory and biology. These contexts enable the representation of interferences between quantum states, which allows the accommodation of the paradoxical findings.

11.4 Summary and Final Discussions

The application of quantum principles to model decision-making scenarios emerged in the scientific literature as a way to explain and understand human behaviour in situations with high levels of uncertainty that lead to the violation of the classical laws of probability theory and logic. However, many researchers are still resistant to accepting the promising advantages of these quantum-like models for modelling decision-making scenarios. It is often argued that classical models can simulate these decision scenarios under high levels of uncertainty by adding extra variables to the model that are not directly observed through data. That is, by including extra latent (or hidden) variables, it was believed that the model could represent uncertainty in the same way as a quantum-like model, despite the added complexity of the classical model.

In this chapter, we study this classical conception and make a mathematical comparison between a classical Bayesian Network with latent variables and the quantum-like Bayesian Network previously proposed in the work of Moreira & Wichert [141]. Latent variables can be defined as variables that are not directly observed in the data, but that can be inferred using the information of the variables that were recorded. For a complete dataset and given the full network structure, latent variables can be estimated by simply counting how many times they can be inferred from each assignment of the observed random variables.

We also validated these two models against the Prisoner's Dilemma game, which is widely discussed in the Cognitive Psychology domain [198, 55, 125, 39, 86, 54]. This experiment is suitable to validate both classical and quantum-like models, because it violates the laws of classical probability theory and, consequently, cannot be simulated by pure classical models.
Experimental results show that, although the classical model with latent variables could explain the paradoxical findings in the Prisoner's Dilemma game, the same model could not simulate the choice of the player when a piece of evidence was given, that is, when it was known which action the first player chose. This leads to a dilemma: either one creates a classical model just to account for observed evidence, or one creates the model just to explain the paradoxical findings. Of course, one could argue that, by adding another hidden latent variable to the network, one could re-estimate the conditional probability tables in order to account for both observed and unobserved phenomena. However, one must also take into account the exponential increase in the complexity of the model. For N binary random variables, if no information is observed, we would need to compute a full joint probability distribution with 2^N entries, which means that the number of computations required grows exponentially large and the inference process becomes intractable.

On the other hand, it was already shown in previous literature that the quantum-like model can account for both observed and unobserved phenomena in a single model with a low error percentage [141]. Since no extra nodes are incorporated in the network (when compared to the classical latent variable model), the quantum model also has the advantage of a reduced complexity compared to its classical counterpart.

Summarising, in this work we conclude that the Quantum-Like Bayesian Network model has advantages over the classical model with latent variables, since it can simulate both observed and unobserved phenomena in a single network, whereas the classical model would require extra hidden nodes (contributing to a decrease in efficiency) and cannot simulate both phenomena in the same model.

Chapter 12

Conclusions

In this chapter, we present the final remarks of this work. The emergence of Quantum Cognition can be traced back to the time when Niels Bohr suggested that the mathematical principles underlying quantum mechanics were so general that they could be applied in fields outside of physics [150]. However, what truly triggered the emergence of this field were the paradoxical experimental findings of Amos Tversky and Daniel Kahneman [193, 194, 195]. Their findings showed that humans violate the laws of classical probability theory and logic in decision scenarios with high levels of uncertainty. This motivated the development of new and alternative mathematical frameworks that tried to explain the paradoxical results. It was in the 90's, with the pioneering work of Aerts & Aerts [7], that the mathematical principles underlying quantum mechanics started to be used as a framework for human decision-making.

Since then, many works have been proposed, and the literature has been mainly dominated by the Physics, Economics and Psychology/Social Science communities. This is understandable. The findings of Tversky and Kahneman resulted in the development of a set of heuristics for human decision-making, which had a big impact on the psychology community and on how people make their choices under uncertainty. Moreover, by showing that humans do not follow the strong normative theories of rational choice, they also influenced the economics community. Finally, the physics community has the necessary mathematical background to apply the formalisms of quantum mechanics in other fields.

The scientific community, on the other hand, is still very skeptical about the applications of quantum-like approaches in fields such as cognition and decision-making. For this reason, Quantum Cognition still struggles to be accepted as an alternative mathematical approach for problem-solving. As a computer scientist and engineer, I attempted to bring to the scientific community a new perspective towards the current quantum-like models of the literature and towards this skepticism. I attempted to create a bridge between the fundamental principles of quantum-like models and their applications to more general decision problems, by proposing a decision model that is general, predictive and abstracts its users from any quantum background. It is my hope that this different perspective can contribute to the field and open new doors for research, especially towards the application of these quantum-like models to real world problems, such as medical decision-making, finance, etc.

Four years ago, after coming across the influential works of Busemeyer et al. [44] and Pothos & Busemeyer [164], this thesis started with the question of whether one could apply these quantum-like principles in one of the most powerful structures known to the Computer Science community for deriving probabilistic inferences: Bayesian Networks [116]. We looked over the literature and asked whether we could make current quantum-like models scalable to more complex decision scenarios. Indeed, the modular network structure of Bayesian Networks contributed to the development of this approach, but with consequences: the exponential growth of quantum interference parameters, and the task of setting these parameters in such a way that a priori knowledge of the outcome of the participants' choices is not required. The answer to these questions consisted in the development of a set of heuristics, all approached from different perspectives, in an attempt to better understand the underlying structures of human cognition; however, the lack of high-dimensional data in the literature makes it hard to validate these models. This thesis began with a set of questions (Section 1.7.1), which emerged throughout these four years of work and research. We will end this document with the answers that we found.

[ RQ1 ] What is the advantage of the proposed approach?

Many of the models that have been proposed in the literature cannot be considered predictive. Most of these models require a set of quantum parameters to be fitted and, so far, the only way these models have to fit the parameters is to use the final outcome of the experiment to set the parameters in order to explain that outcome. This means that a priori knowledge of the outcome of the experimental decision scenario is required in order to set these quantum interference parameters. There is, however, one model in the literature that proposed a static heuristic to compute the quantum interference effects and can be called predictive: the Quantum Prospect Decision Theory, proposed by Yukalov & Sornette [215].

In this work we take a different approach. Since each decision problem is different, we believe that a quantum decision model would benefit from a dynamic heuristic that takes into account the decision problem's settings and comes up with estimations for the quantum interference parameters. In the proposed model, quantum parameters are found based on the correlations that the vectors share between them. These correlations are explored through vector similarities that are computed using the Law of Cosines in a vector space. In this sense, we suggest that the quantum parameters that arise from interference effects might represent some degree of similarity between events.

In the end, the proposed model can be seen as a nonparametric method for estimating interference effects from a statistical point of view. It is a statistical model that is simpler than the previously proposed Quantum Dynamical Model [164] and Quantum-Like Approach [108]. The method makes use of the principles of Bayesian Networks, in order to obtain a more general and scalable model that can produce competitive results over the current state of the art models.
Experimental data demonstrated that the proposed heuristic managed to produce accurate fits to the data, outperforming the previously proposed Quantum Prospect Decision Theory. This suggests that taking into account a dynamic estimation of quantum parameters is a good direction for building quantum-like predictive models.

[ RQ2 ] What is the psychological interpretation of the proposed quantum-like model?

The proposed quantum-like Bayesian Network is a general model that can be applied in a wide range of fields. While reasoning, humans cannot process large amounts of information, because of their limited capacity [90]. In order to reason and process information, humans do it in a modular fashion: they combine small pieces of information in order to reach a final decision. This is exactly what the proposed network does. Since it is not possible to have access to the full joint probability distribution that explains all possible events in the universe, we make use of the Naïve Bayes independence assumption and represent knowledge (or beliefs) in a modular fashion, through conditional independence relationships. This way, each node represents a belief, and its respective conditional probability table represents the decision-maker's degree of knowledge towards that belief. This knowledge is obtained through experience.

The core of the proposed model is that, under uncertainty, that is, if we do not give the decision-maker any information about his beliefs, then those beliefs enter into a superposition state. As mentioned in the book of Busemeyer & Bruza [34], this superposition represents the feelings of ambiguity and confusion in a person. If we see these beliefs in superposition as waves, then they can interfere with each other, generating destructive or constructive effects. These effects are responsible for the differences between what classical probability theory predicts and the events actually observed, and they can be interpreted as a semantic categorisation of the symbols that we extract from the decision problem [159].

Regarding order effects, we proposed a relativistic interpretation for quantum interferences. When we pose a question to different people, each person will represent their preferred answer in an N-dimensional vector space; however, the individual is not aware in which basis they are making this representation. In other words, each person makes an inference by projecting a point in their personal N-dimensional psychological space, without being aware in which basis this point is being projected. So, quantum interference effects, instead of being interpreted as a measure of similarity (or an inner product) between questions, can be interpreted as this lack of knowledge regarding each person's own basis states and as uncertainty towards the state of the world.

[ RQ3 ] Are classical projection models better than quantum-like models?

In this work, we analysed several order effect situations (additive, subtractive, assimilation and contrast effects) using the Gallup reports collected in the work of Moore [134]. We showed that we could accommodate the violations reported in Moore [134] using both a quantum-like projection model and a pure classical projection model. In the end, order effects can be explained intuitively by both frameworks, since the two models are similar and take advantage of the fact that matrix multiplication is non-commutative. Depending on how one sets the rotation operator, one can simulate any effect reported in Moore [134]. We argue that the choice between these models should be based on Occam's Razor: in the presence of two competing hypotheses, the one with the fewest assumptions (the simpler one) should be chosen. This depends very much on the problem and on what knowledge we want to extract from it. If we are mainly focused on a mathematical approach that can perform predictions for order effects, then the classical approach should be used. If, on the other hand, we want a model that leverages theories and interpretations to explain its predictions, then the quantum model is more appropriate.

[ RQ4 ] Can a classical model with hidden variables be used to accommodate violations to the Sure Thing Principle?

The studies and experiments conducted in this work show that, although a classical model with latent variables could explain the paradoxical findings in the Prisoner’s Dilemma game, the same model could not simulate the choice of the player when a piece of evidence was given, that is, when it was known which action the first player chose. This leads to a dilemma: either one creates a classical model just to account for observed evidence, or one creates a model just to explain the paradoxical findings. Of course, one could argue that, by adding another hidden latent variable to the network, one could re-estimate the conditional probability tables in order to account for both observed and unobserved phenomena. However, one must also take into account the exponential increase in the complexity of the model. For N binary random variables, if no information is observed, we would need to compute a full joint probability distribution with 2^N entries, which means that the number of computations required grows exponentially and the inference process becomes intractable. On the other hand, the proposed quantum-like Bayesian network can account for both observed and unobserved phenomena in a single model with a low error percentage. Since no extra hidden nodes are incorporated in the network (when compared to the classical latent variable model), the quantum model also has the advantage of reduced complexity relative to its classical counterpart. Note that, with the proposed heuristics, the quantum interference terms are set in polynomial time, which grows slowly when compared to the exponential growth of the classical model.
In the end, the quantum-like Bayesian network offers advantages over the classical model with latent variables, since it can simulate both observed and unobserved phenomena in a single network, whereas the classical model would require extra hidden nodes (contributing to a decrease in efficiency) and cannot simulate both phenomena in the same model.
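The growth rates at stake can be illustrated directly. The quadratic pairwise count below is only a stand-in for the polynomial number of interference terms mentioned above; the exact polynomial depends on the heuristic used:

```python
def joint_entries(n):
    """Entries of a full joint distribution over n binary variables."""
    return 2 ** n

def pairwise_terms(n):
    """Illustrative quadratic count, n*(n-1)/2, of pairwise interference
    parameters -- a stand-in for the polynomial number of parameters
    set by the proposed heuristics."""
    return n * (n - 1) // 2

for n in (4, 8, 16, 32):
    print(f"N={n:2d}  joint entries={joint_entries(n):>12}  "
          f"pairwise terms={pairwise_terms(n):>4}")
```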

Chapter 13

Future Work

In this chapter, I present some ideas on which I am currently working, as well as new directions of research. At the moment, I am interested in exploring the capabilities of quantum-like Bayesian Networks in real-world scenarios, such as bank loan applications (Section 13.1). I am also interested in extending the quantum-like Bayesian Network paradigm in such a way that its probabilistic inferences influence some given utility function (Section 13.2). Finally, we intend to introduce quantum-like models to the Neuroeconomics community, which seeks a unified theory of decision-making by combining information from many different fields (Section 13.3).

13.1 A Quantum-Like Analysis of a Real-Life Financial Scenario: The Dutch Bank Loan Application

Quantum-Like Bayesian Networks (QLBN) are used in quantum cognition to explain human decision problems. In this work, we apply a QLBN to human decision tasks in the financial domain, with the aim of modelling a real-life financial log of a loan application process belonging to a bank in the Netherlands. The log is robust in terms of data, containing a total of 262 200 event logs belonging to 13 087 credit applications. A customer selects a certain amount of money and submits his request to the bank’s web platform. Some automatic tasks are triggered, and it is verified whether an application is eligible for credit. The dataset is heterogeneous and consists of a mixture of computer-generated automatic processes and manual human tasks. We investigate the capabilities of QLBN in this real-life financial scenario in order not only to assess potential areas of improvement of the institution’s internal operations, but also to use the information acquired during the analysis of the business process to make predictions about the outcome of certain events related to the loan application. However, this poses some challenging and interesting problems. First, there is the need to process the large amount of log events and extract the necessary information. Second, a visualisation tool is necessary in order to understand and determine the structure, order and dependencies of each operational task. Third, given a structure, an automatic machine learning algorithm is required in order to learn the conditional probabilities associated with each task given its

parent tasks. Only after these steps are completed is it possible to analyse and perform quantum-like probabilistic inferences and predictions on the data.
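As a sketch of the third step, conditional probability tables can be learned from the pre-processed log by maximum-likelihood counting. The record fields below (`eligible`, `offer`) are hypothetical placeholders, not the actual task names in the Dutch bank log:

```python
from collections import Counter

def learn_cpt(records, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from a list of
    dict-shaped event records (a stand-in for the pre-processed log)."""
    joint = Counter()
    parent_counts = Counter()
    for r in records:
        key = tuple(r[p] for p in parents)
        parent_counts[key] += 1
        joint[key + (r[child],)] += 1
    # Divide each joint count by the count of its parent configuration.
    return {k: v / parent_counts[k[:-1]] for k, v in joint.items()}

# Toy records mimicking two tasks of a loan process (hypothetical fields).
log = [
    {"eligible": "yes", "offer": "made"},
    {"eligible": "yes", "offer": "made"},
    {"eligible": "yes", "offer": "declined"},
    {"eligible": "no",  "offer": "declined"},
]
cpt = learn_cpt(log, child="offer", parents=["eligible"])
print(cpt[("yes", "made")])  # 2/3
```

In practice, smoothing would be added for sparse parent configurations, and the same counting applies to both the classical and the quantum-like versions of the network before any interference terms are introduced.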

In this work, we give primary focus to human tasks, since they are more susceptible to errors. We will also introduce uncertainty by disturbing the learning dataset (making some events unknown) and verify how the Quantum-Like and Classical Bayesian networks predict the data.

13.2 Quantum-Like Influence Diagrams: Incorporating Expected Utility in Quantum-Like Bayesian Networks

It has been established in the literature that quantum-like models provide an alternative way of explaining and accommodating paradoxical findings that are unexplainable through classical probability models [34, 31]. Quantum-like models tend to explain the probability distributions in several decision scenarios where the agent (or decision-maker) tends to act irrationally. By irrational, we mean that the agent chooses strategies that do not maximise expected utility or that violate the axioms of expected utility theory. However, it is not enough to explain these irrational decisions through probability distributions. It would be desirable to use these probability distributions to help us act upon a real-world decision scenario. For instance, it is not enough for a doctor to find out a patient’s disease. The doctor needs to decide which treatment to give to the patient, based on the disease and on the patient’s tolerance towards different medications. Following this line of thought, in this work, we extend the Quantum-Like Bayesian Network previously proposed by Moreira & Wichert [141] by incorporating the framework of expected utility, in this way presenting a graphical decision model called the Quantum-Like Influence Diagram. Generally speaking, an influence diagram is a compact graphical representation of a decision scenario, which consists of three types of nodes: random variables (nodes) of a Bayesian Network, action nodes representing a decision that we need to make, and a utility function. The goal is to make the decision that maximises the expected utility function, taking into account probabilistic inferences performed on the Bayesian Network. However, since influence diagrams are based on classical Bayesian Networks, they cannot cope with the paradoxical findings reported in the literature. It is the focus of this work to study the implications of incorporating Quantum-Like Bayesian Networks in the context of influence diagrams.
By doing so, we introduce quantum interference effects that can disturb the final probability outcomes of a set of actions and affect the final expected utility. We will study how one can use influence diagrams to explain the paradoxical findings of the Prisoner’s Dilemma game in terms of expected utilities. Moreover, since influence diagrams are widely used in the literature (for instance, in finance, to determine the net present value of a project), we will also study the implications of using quantum probabilistic inferences in scenarios where violations of classical probability theory are not evident (or not present).
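A minimal sketch of the intended combination follows; the amplitudes, phase and utilities are toy values, and since the Quantum-Like Influence Diagram is future work, this is only an assumed shape: Born-rule inference produces quantum-like probabilities, which then feed a standard expected-utility computation:

```python
import numpy as np

def quantum_probs(amplitudes):
    """Born rule over path amplitudes: one row of amplitudes per
    outcome, normalised so the outcome probabilities sum to one."""
    raw = np.abs(amplitudes.sum(axis=1)) ** 2
    return raw / raw.sum()

def expected_utility(probs, utilities):
    """Standard expected utility, here fed with quantum-like
    probabilities instead of classical ones."""
    return float(np.dot(probs, utilities))

# Two outcomes, each reachable through two unobserved paths.
theta = np.pi / 3  # illustrative interference phase
amps = np.array([
    [np.sqrt(0.3), np.sqrt(0.2) * np.exp(1j * theta)],  # outcome 0
    [np.sqrt(0.1), np.sqrt(0.4) * np.exp(1j * theta)],  # outcome 1
])
p = quantum_probs(amps)
eu = expected_utility(p, utilities=[100.0, 20.0])
print(p, eu)
```

Varying θ disturbs the outcome probabilities and therefore the expected utility, which is precisely the effect the Quantum-Like Influence Diagram is meant to study.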

13.3 Neuroeconomics: quantum probabilities towards a unified theory of decision making

A new trend in cognitive science is the interdisciplinary field of Neuroeconomics. Neuroeconomics is a research domain that seeks a universal theory of decision-making by combining several different research areas: psychology, economics, neuroscience, biology, etc. By combining different fields, we can avoid the limitations of single-perspective approaches. For instance, psychologists tend to work in circles by creating models of human behaviour that were built by observing human behaviour. Researchers in economics tend to assume that humans always make optimal choices. Neuroscientists focus more on electrical signals from the brain to explain human behaviour. This combination of different fields is what makes Neuroeconomics a rich and promising direction for research. One of the most discussed research topics in Neuroeconomics is decision-making under risk. Before the revolutionary work of [193], the principle of maximum utility was used to develop economic and decision-making models. However, it was shown that humans violate the laws of logic and probability theory in decisions under uncertainty (which is also the scope of this work). So far, the literature connecting quantum cognition to Neuroeconomics is shallow and almost nonexistent. The literature in the quantum cognition domain, although still at an early stage, has been showing promising applications of quantum probability theory to model decision scenarios that violate the Sure Thing Principle. It is our intention to pursue this direction and introduce the quantum cognition field as a promising approach that can contribute to the unified theory of decision-making.

Bibliography

[1] Aaronson, S. (2004), Is quantum mechanics an island in theoryspace, in ‘Proceedings of the Växjö Conference Quantum Theory: Reconsideration of Foundations’.

[2] Aaronson, S. (2013), Quantum Computing since Democritus, Cambridge University Press.

[3] Accardi, L., Khrennikov, A. & Ohya, M. (2009), ‘Quantum markov model for data from shafir-tversky experiments in cognitive psychology’, Open Systems and Information Dynamics 14, 371–385.

[4] Aerts, D. (1995), ‘Quantum structures: An attempt to explain the origin of their appearance in nature’, International Journal of Theoretical Physics 34, 1–22.

[5] Aerts, D. (2009), ‘Quantum structure in cognition’, Journal of Mathematical Psychology 53, 314–348.

[6] Aerts, D. (2014), ‘Quantum theory and human perception of the macro-world’, Frontiers in Psychology 5, 1–19.

[7] Aerts, D. & Aerts, S. (1994), ‘Applications of quantum statistics in psychological studies of decision processes’, Journal of Foundations of Science 1, 85–97.

[8] Aerts, D., Broekaert, J. & Gabora, L. (2011), ‘A case for applying an abstracted quantum formalism to cognition’, New Ideas in Psychology 29, 136–146.

[9] Aerts, D., Broekaert, J. & Smets, S. (2004), ‘A quantum structure description of the liar paradox’, International Journal of Theoretical Physics 38, 3231–3239.

[10] Aerts, D., Geriente, S., Moreira, C. & Sozzo, S. (2017), Testing ambiguity and machina preferences within a quantum-theoretic framework for decision-making.

[11] Aerts, S. (1996), ‘Conditional probabilities with a quantal and kolmogorovian limit’, International Journal of Theoretical Physics 35, 2245.

[12] Aerts, S. (1998), ‘Interactive probability models: Inverse problems on the sphere’, International Journal of Theoretical Physics 37, 305–309.

[13] Allais, M. (1953), ‘Le comportement de l’homme rationnel devant le risque: Critique des postulats et axiomes de l’École américaine’, Econometrica 21, 503–546.

[14] Anandkumar, A., Hsu, D., Javanmard, A. & Kakade, S. M. (2013), Learning linear bayesian networks with latent variables, in ‘Proceedings of the 30th International Conference on Machine Learning’.

[15] Asano, M., Basieva, I., Khrennikov, A., Ohya, M. & Tanaka, Y. (2012a), ‘Quantum-like generalization of the bayesian updating scheme for objective and subjective mental uncertainties’, Journal of Mathematical Psychology 56, 166–175.

[16] Asano, M., Basieva, I., Khrennikov, A., Ohya, M., Tanaka, Y. & Yamato, I. (2012b), ‘Quantum-like model for the adaptive dynamics of the genetic regulation of e. coli’s metabolism of glucose/lactose’, Journal of Systems and Synthetic Biology 6, 1–7.

[17] Asano, M., Basieva, I., Khrennikov, A., Ohya, M., Tanaka, Y. & Yamato, I. (2012c), ‘Quantum- like model of diauxie in escherichia coli: operational description of precultivation effect’, Journal of theoretical Biology 314, 130–137.

[18] Asano, M., Basieva, I., Khrennikov, A., Ohya, M., Tanaka, Y. & Yamato, I. (2012d), A quantum-like model of escherichia coli’s metabolism based on adaptive dynamics, in ‘Proceedings of the 6th International Symposium on Quantum Interactions’.

[19] Asano, M., Basieva, I., Khrennikov, A., Ohya, M., Tanaka, Y. & Yamato, I. (2013), ‘A model of epigenetic evolution based on theory of open quantum systems’, Systems and Synthetic Biology 7, 161–173.

[20] Asano, M., Khrennikov, A., Ohya, M., Tanaka, Y. & Yamato, I. (2015), Quantum Adaptivity in Biology: From Genetics to Cognition, Springer.

[21] Asano, M., Ohya, M. & Khrennikov, A. (2010), ‘Quantum-like model for decision making process in two players game, a non-kolmogorovian model’, Journal of Foundations of Physics 41, 538–548.

[22] Au, W. T., Lu, S., Leung, H., Yam, P. & Fung, J. M. Y. (2011), ‘Risk and prisoner’s dilemma: A reinterpretation of coombs’ re-parameterization’, Journal of Behavioral Decision Making 25, 476–490.

[23] Baeza-Yates, R. & Ribeiro-Neto, B. (2010), Modern Information Retrieval: The Concepts and Technology Behind Search, Addison Wesley.

[24] Basieva, I., Khrennikov, A., Ohya, M. & Yamato, I. (2011), ‘Quantum-like interference effect in gene expression: glucose-lactose destructive interference’, Systems and Synthetic Biology 5, 59–68.

[25] Bergus, G., Chapman, G., Levy, B., Ely, J. & Oppliger, R. (1998), ‘Clinical diagnosis and the order of information’, Medical Decision Making 18, 412–417.

[26] Birnbaum, M. (2008), ‘New paradoxes of risky decision making’, Psychological Review 115, 463–501.

[27] Blutner, R. & Hochnadel, E. (2010), ‘Two qubits for c.g. jung’s theory of personality’, Cognitive Systems Research 11, 243–259.

[28] Blutner, R., Pothos, E. & Bruza, P. (2013), ‘A quantum probability perspective on borderline vagueness’, Topics in Cognitive Science 5, 711–736.

[29] Bollen, K. (2002), ‘Latent variable in psychology and the social sciences’, Annual Review in Psychology 53, 605–634.

[30] Brandenburger, A. (2010), ‘The relationship between quantum and classical correlation in games’, Games and Economic Behavior 69, 175-183.

[31] Bruza, P., Wang, Z. & Busemeyer, J. (2015), ‘Quantum cognition: a new theoretical approach to psychology’, Trends in Cognitive Sciences 19, 383–393.

[32] Bruza, P., Widdows, D. & Woods, J. (2006), ‘A quantum logic of down below’, In Handbook of quantum logic, quantum structures, and quantum computation 2, 625–660.

[33] Busemeyer, J. (2015), ‘Cognitive science contributions to decision science’, Cognition 135, 43–46.

[34] Busemeyer, J. & Bruza, P. (2012), Quantum Models of Cognition and Decision, Cambridge University Press.

[35] Busemeyer, J. & Diederich, A. (2010), Cognitive Modeling, SAGE Publications, Inc.

[36] Busemeyer, J. & Trueblood, J. (2009), Comparison of quantum and bayesian inference models, in ‘Proceedings of the 3rd International Symposium on Quantum Interaction’.

[37] Busemeyer, J. & Wang, Z. (2014), ‘Quantum cognition: Key issues and discussion.’, Topics in Cognitive Science 6, 43–46.

[38] Busemeyer, J. & Wang, Z. (2015), ‘What is quantum cognition, and how is it applied to psychology?’, Current Directions in Psychological Science 24, 163–169.

[39] Busemeyer, J., Matthew, M. & Wang, Z. (2006a), A quantum information processing explanation of disjunction effects, in ‘Proceedings of the 28th Annual Conference of the Cognitive Science Society’.

[40] Busemeyer, J., Pothos, E., Franco, R. & Trueblood, J. (2011), ‘A quantum theoretical explanation for probability judgment errors’, Psychology Review 118, 193–218.

[41] Busemeyer, J., Wang, Z. & Lambert-Mogiliansky, A. (2009), ‘Empirical comparison of markov and quantum models of decision making’, Journal of Mathematical Psychology 53, 423–433.

[42] Busemeyer, J., Wang, Z. & Shiffrin, R. (2015a), ‘Bayesian model comparison favors quantum over standard decision theory account of dynamic inconsistencies’, Decision 2, 1–12.

[43] Busemeyer, J., Wang, Z. & Shiffrin, R. (2015b), ‘Bayesian model comparison favours quantum over standard decision theory account of dynamic inconsistency’, Decision 2, 1–12.

[44] Busemeyer, J., Wang, Z. & Townsend, J. (2006b), ‘Quantum dynamics of human decision making’, Journal of Mathematical Psychology 50, 220–241.

[45] Busemeyer, J., Wang, Z. & Trueblood, J. (2012), Hierarchical bayesian estimation of quantum decision model parameters, in ‘Proceedings of the 6th International Symposium on Quantum Interactions’.

[46] Camerer, C. & Weber, M. (1992), ‘Recent advancements in modelling preferences: Uncertainty and ambiguity’, Journal of Risk and Uncertainty 5, 325–370.

[47] Carlson, B. & Yates, J. (1989), ‘Disjunction errors in qualitative likelihood judgment’, Organizational Behavior and Human Decision Processes 44, 368–379.

[48] Cheon, T. & Takahashi, T. (2010), ‘Interference and inequality in quantum decision theory’, Journal of Physics Letters A 375, 100–104.

[49] Choustova, O. (2009), Quantum-like viewpoint on the complexity and randomness of the financial market, in ‘Coping with the Complexity of Economics’, Springer.

[50] Conte, E. (2008), ‘Testing quantum consciousness’, NeuroQuantology 6, 126–139.

[51] Conte, E., Khrennikov, A., Todarello, O., Federici, A., Mendolicchio, L. & Zbilut, J. (2009), ‘Mental states follow quantum mechanics during perception and cognition of ambiguous figures’, Open Systems and Information Dynamics 16, 1–17.

[52] Conte, E., Khrennikov, A., Todarello, O., Robertis, R. D., Federici, A. & Zbilut, J. (2008), ‘A preliminary experimental verification on the possibility of bell inequality violation in mental states’, Neuroquantology 6, 214–221.

[53] Conte, E., Todarello, O., Federici, A., Vitiello, F. & Lopane, M. (2004), A preliminary evidence of quantum like behaviour in measurements of mental states, in ‘Proceedings of the 3rd International Conference on Quantum Theory: Reconsideration of Foundations’.

[54] Conte, E., Todarello, O., Federici, A., Vitiello, F., Lopane, M., Khrennikov, A. & Zbilut, J. (2007), ‘Some remarks on an experiment suggesting quantum like behavior of cognitive entities and formulation of an abstract quantum mechanical formalism to describe cognitive entity and its dynamics’, Chaos, Solitons and Fractals 31, 1076–1088.

[55] Crosson, R. (1999), ‘The disjunction effect and reason-based choice in games’, Organizational and Human Decision Processes 80, 118–133.

[56] Davisson, C. J. & Germer, L. H. (1928), ‘Reflection of electrons by a crystal of nickel’, Proceedings of the National Academy of Sciences of the United States of America 14, 317-322.

[57] de Barros, A. (2013), Decision making for inconsistent expert judgments using negative probabilities, in ‘Proceedings of the 7th International Symposium on Quantum Interactions’.

[58] de Barros, J. A. (2012a), Joint probabilities and quantum cognition, in ‘Proceedings of the 6th American Institute of Physics Conference Series on Quantum Theory: Reconsideration of Foundations’.

[59] de Barros, J. A. (2012b), ‘Quantum-like model of behavioral response computation using neural oscillators’, BioSystems 110, 171–182.

[60] de Barros, J. A. & Oas, G. (2014), ‘Negative probabilities and counter-factual reasoning in quantum cognition’, Physica Scripta.

[61] de Barros, J. A. & Oas, G. (2015), Some examples of contextuality in physics: Implications to quantum cognition, in ‘Contextuality From Quantum Physics to Psychology’, World Scientific.

[62] de Barros, J. A. & Oas, G. (2017), Quantum cognition, neural oscillators, and negative probabilities, in ‘The Palgrave Handbook of Quantum Models in Social Science’, Palgrave Macmillan UK.

[63] de Barros, J. A. & Suppes, P. (2009), ‘Quantum mechanics, interference, and the brain’, Journal of Mathematical Psychology 53, 306–313.

[64] DeGroot, M. & Schervish, M. (2011), Probability and Statistics, Pearson Education (4th Edition).

[65] Deutsch, D. (1999), Quantum theory of probability and decisions, in ‘Proceedings of the Royal Society A’.

[66] Dikshit, B. (2017), ‘A simple proof of born’s rule for statistical interpretation of quantum mechanics’, Journal for Foundations and Applications of Physics.

[67] Dirac, P. A. (1930), Principles of Quantum Mechanics, Oxford University Press.

[68] Dirac, P. A. (1942), ‘Bakerian lecture - the physical interpretation of quantum mechanics’, Proceedings of the Royal Society A 180, 1–40.

[69] Druzdzel, M. J. & Henrion, M. (1993), Intercausal reasoning with uninstantiated ancestor nodes, in ‘Proceedings of the 9th Annual Conference on Uncertainty in Artificial Intelligence’.

[70] Ellsberg, D. (1961), ‘Risk, ambiguity and the savage axioms’, Quaterly Economics 75, 643–669.

[71] Epstein, L. (1999), ‘A definition of uncertainty aversion’, The Review of Economic Studies 66, 579–608.

[72] Feynman, R., Leighton, R. & Sands, M. (1965), The Feynman Lectures on Physics: Quantum Mechanics, Addison-Wesley.

[73] Franco, R. (2009), ‘The conjunction fallacy and interference effects’, Journal of Mathematical Psychology 53, 415–422.

[74] Friedman, M. & Savage, L. (1952), ‘The expected-utility hypothesis and the measurability of utility’, Journal of Political Economy 50, 463–474.

[75] Fries, P. (2005), ‘A mechanism for cognitive dynamics: neuronal communication through neuronal coherence’, Trends in Cognitive Sciences 9, 474–480.

[76] Gabora, L. & Aerts, D. (2002), ‘Contextualizing concepts using a mathematical generalization of the quantum formalism’, Experimental & Theoretical Artificial Intelligence 14, 327–358.

[77] Glimcher, P. & Fehr, E., eds (2014), Neuroeconomics: Decision Making and the Brain, Academic Press, Elsevier.

[78] Griffiths, R. (2003), Consistent Quantum Theory, Cambridge University Press.

[79] Griffiths, T., Kemp, C. & Tenenbaum, J. (2008), Bayesian models of inductive learning, in ‘Proceedings of the Annual Conference of the Cognitive Science Society’.

[80] Griffiths, T., Steyvers, M. & Tenenbaum, J. (2007), ‘Topics in semantic representation’, Psychological Review 114, 211–244.

[81] Haven, E. (2005), ‘Pilot-wave theory and financial option pricing’, International Journal of Theoretical Physics 44, 1957–1962.

[82] Haven, E. & Khrennikov, A. (2013), Quantum Social Science, Cambridge University Press.

[83] Henrion, M. & Fischhoff, B. (1986), ‘Assessing uncertainty in physical constants’, American Physics 54, 791–798.

[84] Hirvensalo, M. (2003), Quantum Computing (Second Edition), Springer.

[85] Hogarth, R. & Einhorn, H. (1992), ‘Order effects in belief updating: The belief adjustment model’, Cognitive Psychology 24, 1–55.

[86] Hristova, E. & Grinberg, M. (2008), Disjunction effect in prisoner’s dilemma: Evidences from an eye-tracking study, in ‘Proceedings of the 30th Annual Conference of the Cognitive Science Society’.

[87] Jung, C. & Pauli, W. (2012), The Interpretation of Nature and the Psyche, Ishi Press.

[88] Kahn, C., Roberts, L., Shaffer, K. & Haddawy, P. (1997), ‘Construction of a bayesian network for mammographic diagnosis of breast cancer’, Computers in Biology and Medicine 27, 19–29.

[89] Kahneman, D. & Tversky, A. (1979), ‘Prospect theory - an analysis of decision under risk’, Econometrica 47, 263–292.

[90] Kahneman, D., Slovic, P. & Tversky, A. (1982), Judgment Under Uncertainty: Heuristics and Biases, Cambridge University Press.

[91] Khrennikov, A. (1999), ‘Classical and quantum mechanics on information spaces with applications to cognitive psychological, social and anomalous phenomena’, Foundations of Physics 29, 1065–1098.

[92] Khrennikov, A. (2001), ‘Linear representations of probabilistic transformations induced by context transitions’, Journal of Physics A: Mathematical and General 34, 9965–9981.

[93] Khrennikov, A. (2003a), ‘Quantum-like formalism for cognitive measurements’, Journal of BioSystems 70, 211–233.

[94] Khrennikov, A. (2003b), ‘Representation of the kolmogorov model having all distinguishing features of quantum probabilistic model’, Physics Letters A 316, 279–296.

[95] Khrennikov, A. (2004a), Information Dynamics in Cognitive, Psychological, Social, and Anomalous Phenomena, Springer.

[96] Khrennikov, A. (2004b), ‘On quantum-like probabilistic structure of mental information’, Journal of Open Systems and Information Dynamics 11, 267–275.

[97] Khrennikov, A. (2005a), From classical statistical model to quantum model through ignorance of information, in ‘Proceedings of the Third Conference on the Foundations of Information Science’.

[98] Khrennikov, A. (2005b), ‘Linear and nonlinear analogues of the Schrödinger equation in the contextual approach to quantum mechanics’, Journal of Doklady Mathematics 72, 791–794.

[99] Khrennikov, A. (2005c), ‘The principle of supplementarity: A contextual probabilistic viewpoint to complementarity, the interference of probabilities, and the incompatibility of variables in quantum mechanics.’, Journal of Foundations of Physics 35, 1655–1693.

[100] Khrennikov, A. (2005d), ‘Representation of the contextual statistical model by hyperbolic amplitudes’, Mathematical Physics 46, 1–13.

[101] Khrennikov, A. (2006), ‘Quantum-like brain: Interference of minds’, Journal of BioSystems 84, 225–241.

[102] Khrennikov, A. (2009a), Classical and quantum-like randomness and the financial market, in ‘Coping with the Complexity of Economics’, Springer.

[103] Khrennikov, A. (2009b), ‘Description of composite quantum systems by means of classical random fields’, Foundations of Physics 40, 1051–1064.

[104] Khrennikov, A. (2009c), Interpretations of Probability, Springer.

[105] Khrennikov, A. (2009d), ‘Quantum-like model of cognitive decision making and information processing’, BioSystems 95, 179–187.

[106] Khrennikov, A. (2009e), Quantum-like representation of macroscopic configurations, in ‘Proceedings of the 3rd International Symposium on Quantum Interactions’.

[107] Khrennikov, A. (2009f), Ubiquitous Quantum Structures: From Psychology to Finance, Springer.

[108] Khrennikov, A. (2010), Contextual Approach to Quantum Formalism, Springer.

[109] Khrennikov, A. & Basieva, I. (2014), ‘Possibility to agree on disagree from quantum information and decision making’, Journal of Mathematical Psychology 62-63, 1–15.

[110] Khrennikov, A. & Haven, E. (2009), ‘Quantum mechanics and violations of the sure-thing principle: The use of probability interference and other concepts’, Journal of Mathematical Psychology 53, 378–388.

[111] Khrennikov, A., Basieva, I., Dzhafarov, E. & Busemeyer, J. (2014), ‘Quantum models for psychological measurements: An unsolved problem’, PLOS One 9, 1–8.

[112] Kitto, K. (2014), ‘A contextualised general systems theory’, Systems 2, 541–565.

[113] Kitto, K. & Boschetti, F. (2013a), Attitudes, ideologies and self-organization: information load minimization in multi-agent decision making, in ‘Proceedings of the 35th Annual Conference of the Cognitive Science Society’.

[114] Kitto, K. & Boschetti, F. (2013b), The effects of personality in a social context, in ‘Proceedings of the 35th Annual Conference of the Cognitive Science Society’.

[115] Kitto, K., Boschetti, F. & Bruza, P. (2012), The quantum inspired modelling of the changing attitudes and self-organising societies, in ‘Proceedings of the 6th International Conference on Quantum Interactions’.

[116] Koller, D. & Friedman, N. (2009), Probabilistic Graphical Models: Principles and Techniques, The MIT Press.

[117] Kolmogorov, A. (1933), Foundations of the Probability Theory, Chelsea Publishing Company.

[118] Korb, K. B. & Nicholson, A. E. (2011), Bayesian Artificial Intelligence, CRC Press.

[119] Krumhansl, C. (1978), ‘Concerning the applicability of geometric models to similarity data: The interrelationship between similarity and spatial density’, Psychological Review 85, 445–463.

[120] Kuhberger, A., Komunska, D. & Josef, P. (2001), ‘The disjunction effect: does it exist for two-step gambles?’, Organizational Behavior and Human Decision Processes 85, 250–264.

[121] Kvam, P., Pleskac, T., Yu, S. & Busemeyer, J. (2015), ‘Interference effects of choice on confidence: Quantum characteristics of evidence accumulation’, Proceedings of the National Academy of Sciences.

[122] Lambdin, C. & Burdsal, C. (2007), ‘The disjunction effect reexamined: Relevant methodological issues and the fallacy of unspecified percentage comparisons’, Organizational Behavior and Human Decision Processes 103, 268–276.

[123] Lee, M. D. & Vanpaemel, W. (2013), ‘Quantum models of cognition as orwellian newspeak’, Behavioral and Brain Sciences 36, 295–296.

[124] Leifer, M. & Poulin, D. (2008), ‘Quantum graphical models and belief propagation’, Annals of Physics Journal 323, 1899–1946. eprint arXiv:0708.1337 [quant-ph].

[125] Li, S. & Taplin, J. (2002), ‘Examining whether there is a disjunction effect in prisoner’s dilemma game’, Chinese Journal of Psychology 44, 25–46.

[126] Limar, I. (2012), ‘A version of carl jung’s synchronicity in the event of correlation of mental processes in the past and the future: Possible role of quantum entanglement in quantum vacuum’, Journal of NeuroQuantology 10, 1–3.

[127] Lindorff, D. (2004), Pauli and Jung: The Meeting of Two Great Minds, Quest Books.

[128] Lucas, P. & van der Gaag, L. (1991), Principles of Expert Systems, Addison Wesley.

[129] Machina, M. (2009), ‘Risk, ambiguity, and the rank-dependence axioms’, Journal of Economic Review 99, 385–392.

[130] Martin, F., Carminati, F. & Carminati, G. G. (2009), ‘Synchronicity, quantum information and the psyche’, Cosmology 3, 580–589.

[131] Martínez-Martínez, I. & Sánchez-Burillo, E. (2016), ‘Quantum stochastic walks on networks for decision-making’, Scientific Reports.

[132] McKenzie, C., Lee, S. & Chen, K. (2002), ‘When negative evidence increases confidence: Change in belief after hearing two sides of a dispute’, Behavioural Decision Making 15, 1–18.

[133] Melucci, M. (2015), Introduction to Information Retrieval and Quantum Mechanics, Springer.

[134] Moore, D. (2002), ‘Measuring new types of question-order effects: Additive and subtractive’, Public Opinion Quarterly 66, 80–91.

[135] Moreira, C. (2017), Quantum-like influence diagrams: Incorporating expected utility in quantum- like bayesian networks, in ‘International Symposium Worlds of Entanglement’.

[136] Moreira, C. & Wichert, A. (2013), ‘Finding academic experts on a multisensor approach using shannon’s entropy’, Expert Systems with Applications 40, 5740–5754.

[137] Moreira, C. & Wichert, A. (2014), ‘Interference effects in quantum belief networks’, Applied Soft Computing 25, 64–85.

[138] Moreira, C. & Wichert, A. (2015a), Quantum-like bayesian networks using feynman’s path diagram rules, in ‘Proceedings of the 16th Växjö Conference on Quantum Theory: from Foundations to Technologies’.

[139] Moreira, C. & Wichert, A. (2015b), The relation between acausality and interference in quantum-like bayesian networks, in ‘Proceedings of the 9th International Conference on Quantum Interactions’.

[140] Moreira, C. & Wichert, A. (2015c), ‘The synchronicity principle under quantum probabilistic inferences’, NeuroQuantology 13, 111–133.

[141] Moreira, C. & Wichert, A. (2016a), ‘Quantum-like bayesian networks for modeling decision mak- ing’, Frontiers in Psychology.

[142] Moreira, C. & Wichert, A. (2016b), ‘Quantum probabilistic models revisited: the case of disjunction effects in cognition’, Frontiers in Physics: Interdisciplinary Physics 4, 1–26.

[143] Moreira, C. & Wichert, A. (2016c), When to use quantum probabilities in quantum cognition? a discussion, in ‘Proceedings of the 12th Biennial International Quantum Structures Association Conference’.

[144] Moreira, C. & Wichert, A. (2017a), ‘Are quantum models for order effects quantum?’, International Journal of Theoretical Physics.

[145] Moreira, C. & Wichert, A. (2017b), ‘Exploring the relations between quantum-like bayesian networks and decision-making tasks with regard to face stimuli’, Journal of Mathematical Psychology 78, 86–95.

[146] Moreira, C., Haven, E., Sozzo, S. & Wichert, A. (2017), A quantum-like analysis of a real financial scenario: The dutch bank loan application, in ‘Proceedings of the 13th Econophysics Colloquium’.

[147] Morier, D. & Borgida, E. (1984), ‘The conjunction fallacy: A task specific phenomenon?’, Personality and Social Psychology Bulletin 10, 243–252.

[148] Mura, P. L. (2005), ‘Correlated equilibria of classical strategic games with quantum signals’, International Journal of Quantum Information 3, 183–188.

[149] Mura, P. L. (2009), ‘Projective expected utility’, Journal of Mathematical Psychology 53, 408–414.

[150] Murdoch, D. (1989), Niels Bohr’s Philosophy of Physics, Cambridge University Press.

[151] Murphy, K. (2012), Machine Learning: A Probabilistic Perspective, MIT.

[152] Nielsen, M. A. & Chuang, I. L. (2000), Quantum Computation and Quantum Information, Cambridge University Press.

[153] Niestegge, G. (2008), ‘An approach to quantum mechanics via conditional probabilities’, Foundations of Physics 38, 241–256.

[154] Nyman, P. (2010), ‘On consistency of the quantum-like representation algorithm’, Journal of Theoretical Physics 49, 1–9.

[155] Nyman, P. (2011a), ‘On hyperbolic interferences in the quantum-like representation algorithm for the case of triple valued observables’, Journal of Foundations of Physics.

[156] Nyman, P. (2011b), ‘On the consistency of the quantum-like representation algorithm for hyperbolic interference’, Journal of Advances in Applied Clifford Algebras 21, 799–811.

[157] Nyman, P. & Basieva, I. (2011a), ‘Quantum-like representation algorithm for trichotomous observables’, Journal of Theoretical Physics 50, 3864–3881.

[158] Nyman, P. & Basieva, I. (2011b), Representation of probabilistic data by complex probability amplitudes - the case of triple-valued observables, in ‘Proceedings of the International Conference on Advances in Quantum Theory’.

[159] Osherson, D. (1995), Thinking, MIT Press.

[160] Pearl, J. (1986), ‘Fusion, propagation, and structuring in belief networks’, Journal of Artificial Intelligence 29, 241–288.

[161] Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers.

[162] Peres, A. (1998), Quantum Theory: Concepts and Methods, Kluwer Academic.

[163] Peto, R. & Darby, S. (2000), ‘Smoking, smoking cessation, and lung cancer in the uk since 1950: combination of national statistics with two case-control studies’, British Medical Journal 7257, 323– 329.

[164] Pothos, E. & Busemeyer, J. (2009), ‘A quantum probability explanation for violations of rational decision theory’, Proceedings of the Royal Society B 276, 2171–2178.

[165] Pothos, E. & Busemeyer, J. (2013), ‘Can quantum probability provide a new direction for cognitive modeling?’, Behavioral and Brain Sciences 36, 255–327.

[166] Pothos, E., Busemeyer, J. & Trueblood, J. (2013), ‘A quantum geometric model of similarity’, Psychological Review 120, 679–696.

[167] Rieffel, E. & Polak, W. (2011), Quantum Computing: A Gentle Introduction, MIT Press.

[168] Russell, S. & Norvig, P. (2010), Artificial Intelligence: A Modern Approach, Pearson Education (3rd Edition).

[169] Sahami, M., Dumais, S., Heckerman, D. & Horvitz, E. (1998), A bayesian approach to filtering junk e-mail, in ‘Proceedings of the 15th National Conference on Artificial Intelligence, Workshop on Learning for Text Categorization’.

[170] Savage, L. (1954), The Foundations of Statistics, John Wiley.

[171] Shafer, G. (1976), A Mathematical Theory of Evidence, Princeton University Press.

[172] Shafir, E. & Tversky, A. (1992), ‘Thinking through uncertainty: nonconsequential reasoning and choice’, Cognitive Psychology 24, 449–474.

[173] Shah, A. & Oppenheimer, D. (2008), ‘Heuristics made easy: an effort-reduction framework.’, Psy- chological Bulletin 134, 207–222.

[174] Shanteau, J. (1970), ‘An additive model for sequential decision making’, Journal of Experimental Psychology 85, 181–191.

[175] Schuman, H. & Presser, S. (1981), Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Content, Academic Press.

[176] Sides, A., Osherson, D., Bonini, N. & Viale, R. (2002), ‘On the reality of the conjunction fallacy’, Memory and Cognition 30, 191–198.

[177] Simon, H. (1955), ‘A behavioral model of rational choice’, Quarterly Journal of Economics 69, 99–118.

[178] Sinha, U., Couteau, C., Jennewein, T., Laflamme, R. & Weihs, G. (2010), ‘Ruling out multi-order interference in quantum mechanics’, Science 329, 418–421.

[179] Sloman, S. (2014), ‘Comments on quantum probability theory’, Topics in Cognitive Science 6, 47–52.

[180] Soros, G. (1987), The Alchemy of Finance: Reading the Mind of the Market, John Wiley.

[181] Sterman, J. (1989), ‘Deterministic chaos in an experimental economic system’, Economic Behavior & Organization 12, 1–28.

[182] Sudman, S. & Bradburn, N. (1974), Response Effects in Surveys: A Review and Synthesis, Aldine Publishers Co.

[183] Suppes, P. & Zanotti, M. (1981), ‘When are probabilistic explanations possible?’, Synthese.

[184] Tentori, K. & Crupi, V. (2013), ‘Why quantum probability does not explain the conjunction fallacy’, Behavioral and Brain Sciences.

[185] Born, M. (1954), ‘The Statistical Interpretation of Quantum Mechanics’, Nobel Lecture.

[186] Townsend, J., Silva, K., Spencer-Smith, J. & Wenger, M. (2000), ‘Exploring the relations between categorization and decision making with regard to realistic face stimuli’, Pragmatics and Cognition 8, 83–105.

[187] Trueblood, J. & Busemeyer, J. (2011a), ‘A comparison of the belief-adjustment model and the quantum inference model as explanations of order effects in human inference’, Cognitive Science.

[188] Trueblood, J. & Busemeyer, J. (2011b), ‘A quantum probability account of order effects in inference’, Cognitive Science 35, 1518–1552.

[189] Trueblood, J. & Busemeyer, J. (2012), ‘A quantum probability model of causal reasoning’, Frontiers in Psychology.

[190] Trueblood, J., Pothos, E. & Busemeyer, J. (2014), ‘Quantum probability theory as a common framework for reasoning and similarity’, Frontiers in Psychology.

[191] Tucci, R. (1995), ‘Quantum bayesian nets’, International Journal of Modern Physics B 9, 295–337.

[192] Tversky, A. & Fox, C. (1995), ‘Weighting risk under uncertainty’, Psychological Review 102, 269–283.

[193] Tversky, A. & Kahneman, D. (1974), ‘Judgment under uncertainty: Heuristics and biases’, Science 185, 1124–1131.

[194] Tversky, A. & Kahneman, D. (1977), Causal schemata in judgments under uncertainty, Technical report, Cybernetics Technology Office of the Defense Advanced Research Projects Agency.

[195] Tversky, A. & Kahneman, D. (1981), ‘The framing of decisions and the psychology of choice’, Science 211, 453–458.

[196] Tversky, A. & Kahneman, D. (1983), ‘Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment’, Psychological Review 90, 293–315.

[197] Tversky, A. & Kahneman, D. (1986), ‘Rational choice and the framing of decisions’, The Journal of Business 59, 251–278.

[198] Tversky, A. & Shafir, E. (1992), ‘The disjunction effect in choice under uncertainty’, Psychological Science 3, 305–309.

[199] van Rijsbergen, C. J. (2004), The Geometry of Information Retrieval, Cambridge University Press.

[200] Vassilieva, E., Pinto, G., de Barros, J. A. & Suppes, P. (2011), ‘Learning pattern recognition through quasi-synchronization of phase oscillators’, IEEE Transactions on Neural Networks.

[201] von Neumann, J. & Morgenstern, O. (1953), Theory of Games and Economic Behavior, Princeton University Press.

[202] Wang, Z. & Busemeyer, J. (2013), ‘A quantum question order model supported by empirical tests of an a priori and precise prediction’, Journal of Topics in Cognitive Science 5, 689–710.

[203] Wang, Z. & Busemeyer, J. (2016a), ‘Comparing quantum versus markov random walk models of judgments measured by rating scales’, Philosophical Transactions of the Royal Society A 374, 20150098.

[204] Wang, Z. & Busemeyer, J. (2016b), ‘Interference effects of categorization on decision making’, Cognition 150, 133–149.

[205] Wang, Z. & Busemeyer, J. (2016c), ‘Interference effects of categorization on decision making’, Cognition 150, 133–149.

[206] Wang, Z., Solloway, T., Shiffrin, R. & Busemeyer, J. (2014), ‘Context effects produced by question orders reveal quantum nature of human judgments’, Proceedings of the National Academy of Sciences 111, 9431–9436.

[207] Wichert, A. (2013), Principles of Quantum Artificial Intelligence, World Scientific Publishing.

[208] Yearsley, J. & Busemeyer, J. (in press), ‘Quantum cognition and decision theories: A tutorial’, Journal of Mathematical Psychology.

[209] Yearsley, J. & Pothos, E. (2014), ‘Challenging the classical notion of time in cognition: a quantum perspective’, Proceedings of the Royal Society B 281, 20133056–8.

[210] Yukalov, V. & Sornette, D. (2008), ‘Quantum decision theory as quantum theory of measurement’, Physics Letters A 372, 6867–6871.

[211] Yukalov, V. & Sornette, D. (2009a), ‘Physics of risk and uncertainty in quantum decision making’, The European Physical Journal B 71, 533–548.

[212] Yukalov, V. & Sornette, D. (2009b), ‘Processing information in quantum decision theory.’, Entropy 11, 1073–1120.

[213] Yukalov, V. & Sornette, D. (2010a), ‘Entanglement production in quantum decision making’, Physics of Atomic Nuclei 73, 559–562.

[214] Yukalov, V. & Sornette, D. (2010b), ‘Mathematical structure of quantum decision theory’, Advances in Complex Systems 13, 659–698.

[215] Yukalov, V. & Sornette, D. (2011), ‘Decision theory with prospect interference and entanglement’, Theory and Decision 70, 283–328.

[216] Zadeh, L. (2006), ‘Generalized theory of uncertainty (gtu) - principal concepts and ideas’, Computational Statistics and Data Analysis 51, 15–46.

[217] Zhang, J. (2002), ‘Subjective ambiguity, expected utility and choquet expected utility’, Economic Theory 20, 159–181.

[218] Zou, M. & Conzen, S. (2005), ‘A new dynamic bayesian network (dbn) approach for identifying gene regulatory networks from time course microarray data’, Bioinformatics 21, 71–79.

[219] Zurek, W. (2005), ‘Probabilities from entanglement, born’s rule from envariance’, Physical Review A 71, 1–29.

[220] Zurek, W. (2011), ‘Entanglement symmetry, amplitudes, and probabilities: Inverting born’s rule’, Physical Review Letters 106, 1–5.
