A Quantum Interference Inspired Neural Matching Model for Ad-hoc Retrieval

Yongyu Jiang, Peng Zhang*, Hui Gao — College of Intelligence and Computing, Tianjin University, Tianjin, China — [email protected]
Dawei Song — School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China — [email protected]
*The corresponding author.

ABSTRACT
An essential task of information retrieval (IR) is to compute the probability of relevance of a document given a query. If we regard a query term or n-gram fragment as a relevance matching unit, most retrieval models first calculate the relevance evidence between the given query and the candidate document separately, and then accumulate these evidences as the final document relevance prediction. This kind of approach obeys the classical probability, which is not fully consistent with human cognitive rules in the actual retrieval process, due to the possible existence of interference effects between relevance matching units. In our work, we propose a Quantum Interference inspired Neural Matching model (QINM), which can apply the interference effects to guide the construction of additional evidence generated by the interaction between matching units in the retrieval process. Experimental results on two benchmark collections demonstrate that our approach outperforms the quantum-inspired retrieval models, and some well-known neural retrieval models, in the ad-hoc retrieval task.

CCS CONCEPTS
• Information systems → Retrieval models and ranking.

KEYWORDS
Information Retrieval, Neural Matching Models, Quantum Interference, Learning-to-Rank

ACM Reference Format:
Yongyu Jiang, Peng Zhang, Hui Gao and Dawei Song. 2020. A Quantum Interference Inspired Neural Matching Model for Ad-hoc Retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), July 25–30, 2020, Virtual Event, China. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3397271.3401070

[Figure 1: Illustrative examples of different retrieval models for document relevance judgments: (a) probability of relevance, (b) classical probabilistic model, (c) dependency-based model, (d) neural matching model. Q represents the user query, which is composed of two terms q_1 and q_2, u_i denotes a query-document matching unit, and R_D represents the relevance probability of document D.]

1 INTRODUCTION
The aim of an IR system is to find the optimum retrieval mechanism, which is achieved when candidate documents are ranked according to decreasing values of the probability of relevance (with respect to the current query) [25]. One of the essential steps is how to calculate the probability of relevance of a candidate document based on a user query (see Figure 1(a)). Some classical probabilistic models (e.g., BM25 [26] and the binary independence model (BIM) [28]) make an assumption that each query term is independent. They first calculate the co-occurrence information between each query term and the candidate document as the relevance evidence separately, and then accumulate these evidences as the final relevance probability prediction. From the description in Figure 1(b), we can find that the above process actually obeys the classical law of total probability (LTP). However, this independence assumption ignores the dependencies between terms, which play a crucial role in document relevance judgment [11, 28].

Many dependency-based models [8, 10, 18, 27] have focused on modeling the term dependencies in the retrieval process. Typically, Metzler and Croft [18] adopt a Markov random field (MRF) to represent three variants (i.e., occurrences of single terms, ordered phrases, and unordered phrases) for capturing different dependencies between query terms. The final document relevance prediction of this method is obtained by a weighted summation of these three variables (see an example in Figure 1(c) for a combination of q_1 and q_2 as a kind of dependency). Compared with classical probabilistic models,
this idea calculates the co-occurrence information of phrases and applies it as additional evidence for the prediction of document relevance. However, different kinds of relevance evidences are still calculated separately. Considering this process in terms of probability theory, dependency-based models still conform to the classical probability.

With the development of deep learning technology, the IR task can be formalized as a text matching task [6], and many neural matching models have been proposed, such as MP [22], Conv-KNRM [9], DRMM [13] and MIX [6]. These models usually regard a term or n-gram fragment embedding as a query-document matching unit. They calculate the relevance matching evidences of a document for each matching unit separately, and then accumulate these relevance evidences for a final relevance prediction, as shown in Figure 1(d). The idea of calculating the relevance evidence separately is similar to the aforementioned probabilistic and dependency-based models, so we investigate whether or not neural matching models are consistent with the classical probability.

In our work, we first re-formalize a representative neural matching model (i.e., DRMM) in a probabilistic form, and find that its matching idea does obey the classical law of total probability (LTP). The main reason is that for different matching units, the relevance evidences are calculated separately. However, if we re-visit Figure 1(a) and consider the human relevance judgement, the judgement process regards each query Q as a whole, rather than treating each query term q_i (or matching unit) separately. Some research [1, 2, 7, 32] has shown that human cognition laws in real decision-making do not conform to the classical probability, due to quantum-like interference effects. In the next section, by using the projection measurement in quantum mechanics, we will show that the LTP is violated by an interference term in the process of calculating the probability of relevance.

In the IR literature, there are many works inspired by quantum mechanics. Sordoni et al. [27] propose a quantum language model (QLM) for mapping the dependencies between words or phrases in a single text into a density matrix, but do not take into account the interference effects. Zuccon and Azzopardi [38] propose a quantum probability ranking principle (QPRP), which encodes quantum interference effects, but only explore how the user's document relevance judgment process is affected by previously retrieved documents. Wang et al. [30] aim to explore and model the quantum interference effects in users' relevance judgment caused by the presentation order of documents, but only carry out a user study. To our knowledge, existing work has not introduced interference effects into neural matching models.

In order to model the interference effects in neural matching models, we propose a Quantum Interference inspired Neural Matching model (QINM), which can effectively construct additional matching features provided by interference between matching units. QINM regards a query and its candidate document as two quantum subsystems defined in the vector space, constructs a query-document composite system, and then encodes the probability distribution of a document into the reduced density operator, which is a key step in modeling interference effects. Through an N-gram Window Convolution Network and a Query Attention mechanism, we select the effective matching features in the operator. Finally, the ranking score is calculated by a Multi-Layer Perceptron (MLP).

Evaluation results on a series of systematic experiments show that the proposed QINM performs well on two TREC collections, Robust-04 and ClueWeb-09-Cat-B. Our major contributions are summarized as follows:
1. We analyze the neural matching model from a probabilistic relevance point of view, and show that a typical neural model is consistent with the classical law of total probability (LTP).
2. Using the projection measurement, we show that there is an interference term in the process of measuring the probability of relevance, which can violate the LTP.
3. We further propose a Quantum Interference inspired Neural Matching model (QINM), which can model the interference effect in the neural matching method. Systematic evaluation also shows the effectiveness of this proposed method.

2 RELATED WORK
The proposed QINM formulates quantum interference in a neural matching model. In this part, we summarize the related work on neural matching models and quantum-inspired retrieval models in Section 2.1 and Section 2.2, respectively.

2.1 Neural Matching Models
Neural matching models can be categorized as representation-based models and interaction-based models, according to their architectures [14, 23]. The representation-based models map a single document to a low-dimensional semantic space by a neural network (e.g., CNN, RNN or a self-attention mechanism [17]), and calculate its distance from the query representation [15]. This kind of model is mainly concerned with semantic matching, and is highly dependent on the contextual representations of individual tokens.
However, our work does not focus on exploring the contextual representation, but on the essential relevance judgment process of IR.

The interaction-based models construct the matching information (e.g., similarity) through the local interaction between a query and a document, and then calculate the matching degree through a neural network. This kind of model is mainly about relevance matching, which is more suitable for ad-hoc retrieval. Examples include DRMM [13], K-NRM [33], and Conv-KNRM [9]. Compared with the representation-based models, it is easier to re-formalize an interaction-based model to explicitly analyze the relevance judgment process. The idea of this paper is to introduce quantum interference theory to construct a neural retrieval model that could be more in line with the human relevance judgment process.

2.2 Quantum-inspired Retrieval Model
van Rijsbergen for the first time proposes that Quantum Theory (QT) can be used to axiomatize the geometric, probabilistic and logic-based IR models within a single mathematical formalism in complex Hilbert vector spaces [29], and various key concepts from quantum mechanics find their analogy in the IR field. Following this pioneering work, on the one hand, a growing body of literature [1, 2, 7, 32] studies the mechanism of quantum interference in the user retrieval process. On the other hand, the study of the quantum language model (QLM) has been carried out in [27], as aforementioned.

The interference phenomenon in IR has been studied through dynamic relevance judgment [24] and topic interference [31]. Zhang et al.
[24] adopt the quantum finite automaton (QFA) to represent the transition of the measurement states (the relevance degrees of the document judged by users) and dynamically model the cognitive interference of users when they are judging a list of documents. Wang et al. [31] show that the relevance of a topic to a document is greatly affected by the companion topic's relevance, and that the degree of the impact differs with respect to different companion topics. They argue that the judgment of a document may be interfered with only by a different reference point of another topic's relevance degrees. However, the above works are limited to the user's cognitive level in the retrieval process, and have not introduced the interference effect into the IR modeling process.

After QLM [27] provided a general modeling idea for the development of quantum theory in the field of IR, Zhang et al. [36] propose a neural network based QLM (NNQLM), which uses two different approaches to density matrix optimization and learning architecture. Zhang et al. [37] establish a quantum many-body wave function inspired language modeling approach (QMWF-LM), based on the fundamental connection between quantum theory and the convolutional neural network (CNN). However, these two models do not consider interference effects in the relevance judgment process. In addition, these two models are only applied to the question answering task, rather than ad-hoc retrieval, where the relevance estimation is more important. In this paper, we design a new neural retrieval model and integrate the interference effect into the modeling process of the relevance estimation.

3 ANALYSIS OF INTERFERENCE EFFECTS IN RELEVANCE PROBABILITY ESTIMATION
In this section, based on probability theory, we first analyze and re-formalize a representative neural matching model. Next, we describe the methodology of how to explore the existence of quantum interference in the process of document relevance judgment, and how to interpret such interference effects based on the projection measurement method in quantum mechanics.

We assume that the event R_D represents that the document D is judged as "relevant" with query Q, and we are interested in estimating the probability P(R_D). A query and a document can be represented by sets Q = {q_1, q_2, ..., q_n} and D = {d_1, d_2, ..., d_m}, where q_i and d_j represent the i-th query unit and the j-th document unit, respectively.¹

¹ Generally speaking, a query unit can be a term, a term combination, or an n-gram fragment, subject to the context.

3.1 Re-Formalization of DRMM
In this section, we take DRMM [13] as an example (as shown in Figure 2), and apply probability theory to analyze the neural matching ideas. DRMM can be divided into two steps. The first step is to generate n matching histograms from the local interaction (i.e., cosine similarity) between each pair of matching units from a query and a document; each matching histogram is used to calculate the relevance score of the document to a query unit q_i by a neural network, which can be expressed as a function f(q_i, D). The second step is to apply the term gating network that produces a weight (e.g., an IDF value) for each query unit; the final document relevance score is the weighted sum of all relevance scores.

[Figure 2: Illustrative matching example of DRMM: local interactions between query Q and document D are binned into matching histograms, passed through a feed forward network, weighted by a term gating network, and aggregated into the relevance score.]

We can regard the normalized f(q_i, D) as the conditional probability of document relevance under the i-th query unit, that is, P(R_D | q_i) = exp(f(q_i, D)) / \sum_{j=1}^{n} exp(f(q_j, D)). Provided by the term gating network, the global weight information can be considered as P(q_i). Without loss of generality, we set n to 2 as an example, and the process of calculating the final relevance score can be denoted as the following probability formula:

P(R_D) = P(q_1, R_D) + P(q_2, R_D) = P(q_1) P(R_D | q_1) + P(q_2) P(R_D | q_2)    (1)

Therefore, the matching idea of DRMM obeys the classical law of total probability (LTP).
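To make this LTP reading concrete, here is a minimal sketch of the aggregation in Eq. (1) (not the authors' released code; the scores f(q_i, D) and gating weights below are hypothetical stand-ins for the outputs of the feed forward and term gating networks):

```python
import numpy as np

def drmm_style_score(f_scores, gate_weights):
    # P(R_D | q_i): softmax-normalize the per-unit scores f(q_i, D).
    p_rd_given_q = np.exp(f_scores) / np.exp(f_scores).sum()
    # LTP-style accumulation: P(R_D) = sum_i P(q_i) * P(R_D | q_i).
    return float(np.dot(gate_weights, p_rd_given_q))

f_scores = np.array([1.2, 0.4])   # hypothetical f(q_1, D), f(q_2, D)
gate = np.array([0.7, 0.3])       # hypothetical gating weights P(q_i), summing to 1
print(drmm_style_score(f_scores, gate))
```

Note that each relevance evidence is computed independently of the others; no cross term between q_1 and q_2 ever enters the sum.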
3.2 Basic Concepts of Quantum Mechanics
In this section, we introduce the basic knowledge of quantum mechanics needed to analyze the existence of quantum interference in the process of document relevance judgment.

3.2.1 State Vector and Density Operator. A quantum system is usually represented by a state vector in a Hilbert space H^n, and we usually limit the problem to the real space (denoted as R^n) for practical reasons. Given a set of orthonormal basis vectors e_i (i = 1, ..., n), the state vector Ψ can be represented as:

\Psi = \sum_{i=1}^{n} a_i e_i    (2)

where Ψ is a superposition state and the a_i are probability amplitudes: each a_i^2 represents a probability, and these probabilities sum to 1 (i.e., \sum_i |a_i|^2 = 1).

For modeling the uncertainty of states, a density matrix ρ can be defined as:

\rho = \sum_{i=1}^{n} b_i \Psi_i (\Psi_i)^T    (3)

where Ψ_i is a pure state vector with probability b_i, and the rank-one outer product Ψ_i (Ψ_i)^T can represent a quantum elementary event. ρ is symmetric, positive semi-definite, and of trace 1 (tr(ρ) = 1). If the rank of the density operator is greater than one, the corresponding state is called a mixed state.

3.2.2 Projection Measurement. Quantum probability theory is defined by von Neumann [29] with projective geometry, and the probability space is naturally defined in a vector space. Below we introduce a common method of quantum measurement, namely projection measurement, to calculate the probability of the occurrence of a state. Suppose there are two unit column vectors u and v that represent events U and V, respectively. The conditional probability of the event V given the event U is:

P(V|U) = (\cos(v, u))^2 = u^T v v^T u    (4)

where the outer product \Pi_v = v v^T is called a "projective operator", which is a Hermitian operator. Meanwhile, the projective operator satisfies the condition \Pi_v^T \Pi_v = \Pi_v.² Therefore, Eq. 4 can be written as:

P(V|U) = u^T \Pi_v u = u^T \Pi_v^T \Pi_v u = (\Pi_v u)^T \Pi_v u = \|\Pi_v u\|^2    (5)

where ||·|| is the norm of a vector, and the projection (\Pi_v u): u → v is a projection of the vector u onto the vector v. The process of projection calculation is shown in Figure 3.

[Figure 3: The process of projection measurement: the vector u is projected onto the vector v, yielding \Pi_v u.]

For example, if the two basis vectors in Eq. 2 are e_1 = (1, 0) and e_2 = (0, 1), the state can be represented as Ψ = (a_1, a_2)^T, and the corresponding projective operators of e_1 and e_2 are:

\Pi_{e_1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \Pi_{e_2} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}    (6)

The probability of observing the elementary event e_1 from the perspective of the state Ψ can be calculated as P(e_1) = \|\Pi_{e_1} \Psi\|^2 = (a_1)^2, and similarly we can show that P(e_2) = \|\Pi_{e_2} \Psi\|^2 = (a_2)^2.

² \Pi_v^T \Pi_v = (v v^T)^T (v v^T) = v (v^T v) v^T = v v^T = \Pi_v.
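As a small numerical check of Eqs. (4)–(6), the following sketch computes projection-measurement probabilities in numpy (the amplitudes are illustrative values chosen so that a_1^2 + a_2^2 = 1):

```python
import numpy as np

def projector(v):
    """Projective operator Pi_v = v v^T for a (normalized) column vector v."""
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def prob(event_vec, state_vec):
    """P = ||Pi_v u||^2, the squared length of the projection (Eq. 5)."""
    return float(np.linalg.norm(projector(event_vec) @ state_vec) ** 2)

# Basis events e1, e2 and a superposed state Psi = a1*e1 + a2*e2 (Eq. 2).
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
a1, a2 = 0.6, 0.8
psi = a1 * e1 + a2 * e2

print(prob(e1, psi))   # 0.36 = a1^2, as below Eq. (6)
print(prob(e2, psi))   # 0.64 = a2^2
```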
3.3 Interference Effects in the Relevance Judgment Process
We want to explore the interference effect in the judgment process of document relevance by means of the projection measurement. In the framework of quantum probability, the probability events Q, q_1, q_2 and R_D correspond to vectors Q, q_1, q_2 and R_D, respectively.

Firstly, as shown in Figure 4(a), we assume that the query Q can be represented by a state Q = α q_1 + β q_2. Hence, the probability P(q_1) can be expressed as P(q_1) = \|\Pi_{q_1} Q\|^2, where the projection (\Pi_{q_1} Q): Q → q_1 means that the query vector Q is projected onto the basis q_1, and P(q_1) is the weight of the first query matching unit in the query. Neural matching models calculate relevance degrees by the local interaction between each query matching unit and the entire document. As shown in Figure 4(b), we can model this process as a conditional probability P(R_D | q_1) = \|\Pi_{R_D} q_1\|^2, where the projection (\Pi_{R_D} q_1): q_1 → R_D means that the query unit q_1 is projected onto the basis R_D.

According to the introduction in Section 3.1, the matching model DRMM assumes that each query unit is independent and its importance is determinate, and we summarize the matching process applied in DRMM as the following formula, i.e., the probability of the event R_D is:

P(R_D) = P(q_1) P(R_D|q_1) + P(q_2) P(R_D|q_2) = \|\Pi_{R_D}\Pi_{q_1} Q\|^2 + \|\Pi_{R_D}\Pi_{q_2} Q\|^2    (7)

where P(q_1, R_D) = P(q_1) P(R_D|q_1) = \|\Pi_{R_D}\Pi_{q_1} Q\|^2, and the projection (\Pi_{R_D}\Pi_{q_1} Q): Q → q_1 → R_D yields the matching feature vector provided by the first query matching unit q_1 and the whole document. As shown in Figure 4(c), the process is that the query vector Q is first projected onto the basis q_1, and then projected onto the basis R_D.

However, in the process of document relevance judgement, the user usually considers the interaction between text matching units. Different from the idea of current neural matching models, the query can first be expressed as a state vector Q, which means that the query is considered as a whole during the matching process. Therefore, the probability that document D is relevant to the query can be calculated as follows:

P'(R_D) = \|\Pi_{R_D} Q\|^2 = \|\Pi_{R_D}(\Pi_{q_1} Q + \Pi_{q_2} Q)\|^2
        = \|\Pi_{R_D}\Pi_{q_1} Q + \Pi_{R_D}\Pi_{q_2} Q\|^2
        = \|\Pi_{R_D}\Pi_{q_1} Q\|^2 + \|\Pi_{R_D}\Pi_{q_2} Q\|^2 + 2 (q_1^T R_D)(q_1^T Q)(q_2^T R_D)(q_2^T Q)
        = P(R_D) + I(Q, R_D, q_1, q_2)    (8)

where the projection (\Pi_{R_D} Q): Q → R_D directly yields the matching features provided by the whole query, which can be directly applied for relevance prediction. This process is shown in Figure 4(d): the query vector Q is directly projected onto the basis state R_D, that is, the relevance degree is calculated directly from the perspective of the whole query. Compared with Eq. 7, the additional term I(Q, R_D, q_1, q_2) in Eq. 8 is called the interference term [3, 4, 20, 30], which allows quantum probability theory to explain the violation of the LTP in the process of document relevance judgment.
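The gap between Eq. (7) and Eq. (8) can be verified numerically. In this sketch (all vectors are illustrative choices, not learned representations), the whole-query probability differs from the two-path sum exactly by the cross term:

```python
import numpy as np

def projector(v):
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

# Orthonormal query-unit basis q1, q2; a query state Q; a relevance direction R_D.
q1, q2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
Q = 0.6 * q1 + 0.8 * q2                       # Q = alpha*q1 + beta*q2
R = np.array([1.0, 1.0]) / np.sqrt(2.0)       # hypothetical basis R_D

Pi_q1, Pi_q2, Pi_R = projector(q1), projector(q2), projector(R)

# Classical path, Eq. (7): project onto each unit, then onto R_D, and add.
p_classical = (np.linalg.norm(Pi_R @ Pi_q1 @ Q) ** 2
               + np.linalg.norm(Pi_R @ Pi_q2 @ Q) ** 2)

# Quantum path, Eq. (8): project the whole query state onto R_D at once.
p_quantum = np.linalg.norm(Pi_R @ Q) ** 2

print(p_classical)              # 0.50
print(p_quantum)                # 0.98
print(p_quantum - p_classical)  # 0.48 = interference term I(Q, R_D, q1, q2)
```

The printed difference matches the analytic cross term 2(q_1^T R_D)(q_1^T Q)(q_2^T R_D)(q_2^T Q) of Eq. (8).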

[Figure 4: Analogy of two neural matching processes in quantum probability, and analysis of the existence of the interference effect in the document relevance judgment: (a) formalization of the term weight process, (b) formalization of the matching process, (c) classical probability case, (d) quantum probability case. The unit vector Q denotes the user information need.]

4 THE QUANTUM INTERFERENCE INSPIRED NEURAL MATCHING MODEL
In this section, based on the above analysis, we model quantum interference in a neural network and propose the Quantum Interference inspired Neural Matching Model (QINM) for ad-hoc retrieval. For clarity, we divide this model into two components: document probability distribution representation and relevance prediction. In the following, we describe these components in detail, as shown in Figure 5.

[Figure 5: The detailed architecture of the QINM model.]

4.1 Document Probability Distribution Representation
The input to our model is a query-document pair. A query is represented as a set of query term vectors denoted by Q = {q_1, ..., q_n}, and a document is represented as a set of document term vectors denoted by D = {d_1, ..., d_m}, where q_i and d_j represent the i-th query term vector and the j-th document term vector, respectively. All term vectors are normalized and encoded in an embedding matrix E ∈ R^{|V| × d}, where |V| denotes the size of the vocabulary set and d is the dimension of a term embedding.

4.1.1 Query-Document Composite System Representation. In our work, we regard the query and the document as two quantum subsystems in a vector space. In order to construct interference effects in the matching process, we first construct the input query-document pair into a query-document composite system according to the definition (see Appendix 8.1). Specifically, we can derive the state vector φ of a query-document composite system as:

\varphi = \sum_{i,j=1}^{n,m} (g_i^Q q_i) \otimes (g_j^D d_j)    (9)

where the state vector φ (with dimensions d² × 1, where d is the dimension of a term vector) is composed of all composite state vectors³, which are obtained by the tensor product operation ⊗⁴. The works [35, 37] show that constructing text representations through the tensor product lets every dimension of one matching unit vector interact with every dimension of the other, modeling all possible combinatorial semantics.

The coefficient g_j^D is the tf-idf value of the j-th document term in its query candidate document set; this calculation process is denoted in Figure 5 as TWG_D (Term Weight Gating, TWG). g_i^Q is a trainable parameter for the i-th query term, and this process is denoted in Figure 5 as TWG_Q. These weight coefficients denote the salience of terms, and can also be regarded as a kind of global matching information. Meanwhile, these coefficients satisfy the conditions \sum_{i=1}^{n} |g_i^Q|^2 = 1 and \sum_{j=1}^{m} |g_j^D|^2 = 1.

³ In Figure 5, the vector V^{q_i, d_j} = (g_i^Q q_i) ⊗ (g_j^D d_j) represents a composite state vector composed of the i-th query term and the j-th document term.
⁴ Suppose there are two vectors a = (0, 1) and b = (1, 0). The tensor product can be expressed as a ⊗ b = (0, 0, 1, 0).
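A minimal sketch of the composite-state construction of Eq. (9), using numpy's Kronecker product for ⊗ (the term vectors and gating weights below are illustrative stand-ins for trained embeddings and TWG outputs, not the released implementation):

```python
import numpy as np

def composite_state(q_vecs, d_vecs, g_q, g_d):
    """phi = sum_{i,j} (g_i^Q * q_i) kron (g_j^D * d_j), as in Eq. (9)."""
    dim = q_vecs.shape[1]
    phi = np.zeros(dim * dim)
    for gq, q in zip(g_q, q_vecs):
        for gd, dv in zip(g_d, d_vecs):
            phi += np.kron(gq * q, gd * dv)   # one composite vector V^{q_i, d_j}
    return phi

dim = 4                                        # toy embedding dimension
rng = np.random.default_rng(0)
q_vecs = rng.normal(size=(2, dim)); q_vecs /= np.linalg.norm(q_vecs, axis=1, keepdims=True)
d_vecs = rng.normal(size=(3, dim)); d_vecs /= np.linalg.norm(d_vecs, axis=1, keepdims=True)
g_q = np.array([0.8, 0.6])                     # satisfies sum |g_i^Q|^2 = 1
g_d = np.sqrt(np.array([0.5, 0.3, 0.2]))       # satisfies sum |g_j^D|^2 = 1

phi = composite_state(q_vecs, d_vecs, g_q, g_d)
print(phi.shape)                               # (16,), i.e., d^2 x 1
```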

4.1.2 Construction of Document Probability Distribution. Due to the high dimension of the vector φ, subsequent calculations would require considerable computing resources. Therefore, we can study only one subsystem of the query-document composite system through the partial trace operation (see Appendix 8.1) to improve the computational performance. In our work, we choose to apply the composite state vector φ to compute the probability distribution of the document subsystem as the reduced density operator ρ^D (with dimensions d × d):

\rho^D = tr_Q(\varphi \varphi^T) = C^Q \left( \sum_{i=1}^{m} (g_i^D)^2 \Pi^D_{i,i} + \sum_{j,k=1}^{m,m} (g_j^D g_k^D) \Pi^D_{j,k} \right) = M_S + M_I, \quad (j \neq k)    (10)

where φφ^T is the density operator of the query-document composite system and represents the probability distribution of the entire composite system. The partial trace operation tr_Q(·) is an important tool for studying quantum composite systems [21], and can be understood as excluding the information of the query subsystem from the composite system in order to obtain the probability distribution of the document subsystem.

For the same query, the coefficient C^Q = \sum_{i,j}^{n,n} (g_i^Q g_j^Q) tr(\Pi^Q_{i,j}) indicates the overall interaction between query terms. Meanwhile, we can notice that the reduced density operator ρ^D can be represented by two parts: the first part is called the similarity feature matrix M_S, which can be used to calculate the similarity matching features commonly used in some neural matching models (e.g., MP and KNRM); the other part is called the interference feature matrix M_I, which is obtained from the outer products of any two different document terms and contributes matching features generated by the interaction between document terms.
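A sketch of the reduction ρ^D = tr_Q(φφ^T) in Eq. (10), reusing the composite_state sketch above (illustrative only; it assumes the np.kron index layout, where the query index is the major axis of the d²-dimensional composite vector):

```python
import numpy as np

def reduced_density_document(phi, dim):
    """rho^D = tr_Q(phi phi^T): trace out the query subsystem (Eq. 10)."""
    rho_full = np.outer(phi, phi)            # density operator of the composite system
    rho4 = rho_full.reshape(dim, dim, dim, dim)  # axes: (query, doc, query', doc')
    return np.einsum('iaib->ab', rho4)       # contract query = query' -> (doc, doc')

# Example with the phi from the previous sketch (dim = 4):
# rho_D = reduced_density_document(phi, 4)
# rho_D is symmetric with dimensions d x d; its trace equals ||phi||^2.
```

Reshaping the d² × d² matrix to four axes exposes the (query, document) index pairs, so contracting the two query axes implements the partial trace of Appendix 8.1 directly.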

Further, we can calculate the probability of document D being relevant to query Q (i.e., P'(R_D)) by applying the document probability distribution ρ^D:

P'(R_D | q_i) = (q_i)^T \rho^D q_i = tr(\rho^D \Pi^Q_{i,i}) = P(R_D | q_i) + I(Q, D, q_i)
P'(R_D) = P(q_1) P'(R_D | q_1) + P(q_2) P'(R_D | q_2) = P(R_D) + I(Q, D, q_1, q_2)    (11)

The specific process is to calculate the relevance probability P'(R_D | q_i) provided by the document probability distribution and each query projection operator separately, and then accumulate the final document relevance probability P'(R_D). P(q_1) = (g_1^Q)^2 denotes the importance of the first query term. Compared with Eq. 7, the joint probability P'(q_i, R_D) calculated by Eq. 11 has an extra interference term that can be applied to explain some non-classical phenomena. Meanwhile, compared with Eq. 8, the interference term in the probability P'(R_D) calculated by Eq. 11 is related to the interaction between all query matching units.

4.1.3 Effective Matching Feature Extraction. Now, we apply Eq. 11 to generate matching features. We propose the Query Attention mechanism, and the specific process is given by the following formula:

x_{att}^i = (g_i^Q)^2 \, diag(CNN(\rho^D) \Pi^Q_{i,i}) = (g_i^Q)^2 \, diag(G \Pi^Q_{i,i}),
x_{att} = x_{att}^1 \oplus ... \oplus x_{att}^n    (12)

where x_att^i denotes the matching feature provided by the i-th query term in candidate document D, and all the matching features are combined into the final matching tensor x_att by the concatenation operation ⊕. Different from Eq. 11, this process first extracts the effective features from the operator ρ^D through an N-gram Window Convolution Network (similar to [9]), which is represented by G = CNN(ρ^D). Meanwhile, in order to keep as many matching features as possible for the final relevance scoring process, we do not apply the trace operation but instead use the diagonal elements of the attention operator G \Pi^Q_{i,i}, where the projection operator \Pi^Q_{i,i} = q_i (q_i)^T.

4.2 Relevance Prediction and Model Training
4.2.1 Ranking Score Prediction. The matching features x_att are combined by an MLP to produce the final ranking score:

f(x_{att}) = 2 \cdot tanh(W^T x_{att} + b)    (13)

where W and b are the linear ranking parameters to learn, and tanh(·) is the activation function, which is used to limit the ranking score to between -2 and 2 according to the range of relevance labels of the data sets used in our experiments.

4.2.2 Model Training. Since the ad-hoc retrieval task is fundamentally a ranking problem, we employ the ListNet [5] algorithm for learning the relative position information between documents. Given a query Q and its candidate document set {D_1, D_2, ..., D_M}, the loss function is defined as:

L(y, f) = - \sum_{i=1}^{M} P_y(Q, D_i) \log(P_s(Q, D_i))    (14)

where P_y(Q, D_i) = exp(y(Q, D_i)) / \sum_{k=1}^{M} exp(y(Q, D_k)), the function y(Q, D_i) returns the relevance label for document D_i with respect to query Q, and similarly for P_s(Q, D_i), where the function s(Q, D_i) denotes the predicted matching score for (Q, D_i) from QINM.
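Here is a minimal sketch of the ListNet-style loss of Eq. (14) for one query's candidate list (the labels and scores are placeholders; the released code linked in Section 5 is the authoritative version):

```python
import numpy as np

def listnet_loss(labels, scores):
    """L(y, f) = -sum_i P_y(Q, D_i) * log P_s(Q, D_i), Eq. (14)."""
    def softmax(x):
        x = x - x.max()                  # shift for numerical stability
        e = np.exp(x)
        return e / e.sum()

    p_y = softmax(np.asarray(labels, dtype=float))   # target top-one probabilities
    p_s = softmax(np.asarray(scores, dtype=float))   # predicted top-one probabilities
    return float(-(p_y * np.log(p_s + 1e-12)).sum())

# One query with four candidate documents (hypothetical labels and scores).
print(listnet_loss(labels=[2, 0, 1, 0], scores=[1.3, -0.2, 0.8, 0.1]))
```

Because the loss compares the whole score distribution against the whole label distribution, it learns the relative order of all candidates at once, which is what distinguishes list-wise training from the point-wise and pair-wise alternatives discussed in Section 5.5.1.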
5 EXPERIMENTS
In this section, we conduct experiments to demonstrate the effectiveness of our proposed model. The model code has been released⁵, which shows the experimental details and parameter settings.

5.1 Data Sets
To conduct experiments, we use two TREC⁶ collections, ClueWeb-09-Cat-B and Robust-04. The details of the two collections are provided in Table 1. As we can see, they represent different sizes and genres of heterogeneous text collections. ClueWeb-09-Cat-B⁷ is a large English Web collection, and our training and testing topics are accumulated from the TREC Web Tracks in 2009, 2010, and 2013. Robust-04 is a small news dataset, whose topics are collected from the TREC Robust Track 2004. The topics provided by TREC are generally composed of a title and a description, and we only apply the title as the user query in our work. In this experiment, the candidate document set is obtained through the Galago Search Engine⁸. During indexing and retrieval, both documents and query words are white-space tokenized, lowercased, and stemmed using the Krovetz stemmer [16].

Table 1: Statistics of the TREC collections used in this study. The ClueWeb-09-Cat-B collection has been filtered to the set of documents in the 60th percentile of spam scores.

                    Robust-04   ClueWeb-09-Cat-B
Vocabulary          0.6M        38M
Document Count      0.5M        34M
Collection Length   252M        26B
Query Count         250         150

5.2 Baselines
We take classical retrieval models, some retrieval models based on quantum theory, and well-known Neural IR models as baselines. First, we select two classical retrieval models as baselines: QL [34] and BM25 [26]. For the Neural IR models, we choose five interaction-based retrieval models as baselines: MP [22], DRMM [13], K-NRM [33], Conv-KNRM [9] and MIX [6]. Finally, for the retrieval models inspired by quantum theory, we select three cutting-edge research works as baselines: QLM [27], NNQLM [36] and QMWF-LM [37].

5.3 Experimental Settings
All experiments in this paper are implemented in TensorFlow, and the experimental environment is a server with TITAN X graphics cards. For all baseline models that require word vectors as input, we apply 300-dimensional word vectors trained with the Continuous Bag-of-Words (CBOW) Model [19] on the Robust-04 and ClueWeb-09-Cat-B collections, where the context window size is set to 10 to be consistent with the baseline experimental settings. We discard from the vocabulary all the terms that occur less than 10 times in the corpus. The out-of-vocabulary terms are randomly initialized by a uniform distribution in the range of (-0.25, 0.25).

For all baseline models, we try both the default configurations in their original papers and other settings. In our model, the N-gram Window Convolution Network applies two convolution layers: the first layer has convolution kernels of sizes 3, 4 and 5, with 16 kernels of each size, followed by a Max-pooling layer; the second convolution layer is composed of a convolution kernel of size 5 and a Max-pooling layer. In the scoring layer, the fully connected network has three hidden layers, with 512, 128 and 64 nodes respectively. In order to reduce over-fitting, there is a Dropout layer before the output layer, and the output layer has one node producing the final document relevance score.

For all the matching models, we adopt a re-ranking strategy. That is, we perform an initial retrieval with the QL model [34] to get the top 1000 documents as candidates, then score each candidate document with the matching model, and finally return a re-ordered list of the candidate documents. We conduct 5-fold cross-validation to minimize over-fitting without reducing the number of learning instances, and each displayed evaluation statistic is the average of the five fold-level evaluation values.

During the evaluation phase, the top-ranked 1,000 documents are compared using the mean average precision (MAP), normalized discounted cumulative gain at rank 20 (NDCG@20), precision at rank 20 (P@20), and expected reciprocal rank at rank 20 (ERR@20).

⁵ https://github.com/TJUIRLAB/SIGIR20_QINM
⁶ https://trec.nist.gov
⁷ http://lemurproject.org/clueweb09
⁸ http://www.lemurproject.org/galago.php

5.4 Retrieval Performance and Analysis
This section presents the performance results of the different retrieval models over the two benchmark datasets. A summary of the results is displayed in Table 2.

As we can see, our QINM performs significantly better than the two traditional retrieval models, demonstrating that the re-ranking approach by QINM is effective and can greatly improve the retrieval performance.

Table 2: Comparison of different retrieval models over the ClueWeb-09-Cat-B and Robust-04 collections. (∗, ¶, §, † and ‡ mean a significant improvement over BM25, DRMM, Conv-KNRM, NNQLM-II and QMWF-LM respectively, using the Wilcoxon signed-rank test, p < 0.05.)

                 ClueWeb-09-Cat-B                              Robust-04
Model Name   | MAP        | NDCG@20    | P@20       | ERR@20     | MAP        | NDCG@20    | P@20       | ERR@20
QL           | 0.100†     | 0.224†     | 0.328†‡    | 0.139      | 0.253†‡    | 0.415†‡    | 0.369†‡    | 0.213
BM25         | 0.101†     | 0.225†     | 0.326†‡    | 0.141      | 0.255†‡    | 0.418†‡    | 0.370†‡    | 0.220
QLM          | 0.082      | 0.164      | 0.167      | 0.112      | 0.103      | 0.247      | 0.208      | 0.193
NNQLM-I      | 0.089      | 0.181      | 0.169      | 0.128      | 0.134      | 0.278      | 0.237      | 0.210
NNQLM-II     | 0.091      | 0.203      | 0.216      | 0.132      | 0.150      | 0.290      | 0.249      | 0.236
QMWF-LM      | 0.103†     | 0.223†     | 0.237†     | 0.151†     | 0.164†     | 0.314†     | 0.257†     | 0.243†
CDSSM        | 0.064      | 0.153      | 0.214      | 0.117      | 0.067      | 0.146      | 0.125      | 0.185
MP           | 0.066      | 0.158      | 0.222      | 0.124      | 0.189†‡    | 0.330†‡    | 0.290†‡    | 0.207
DRMM         | 0.113∗†‡   | 0.258∗†‡   | 0.365∗†‡   | 0.142†     | 0.279∗†‡   | 0.431∗†‡   | 0.382∗†‡   | 0.342∗†‡
K-NRM        | 0.109†     | 0.273∗¶†‡  | 0.361∗†‡   | 0.153∗¶†‡  | 0.262∗†‡   | 0.407∗†‡   | 0.364∗†‡   | 0.353∗†‡
Conv-KNRM    | 0.121∗¶†‡  | 0.285∗¶†‡  | 0.367∗†‡   | 0.177∗¶†‡  | 0.274∗¶†   | 0.432∗†    | 0.376∗†    | 0.367∗¶†
MIX-weight   | 0.119∗¶†   | 0.297∗¶†   | 0.349∗†    | 0.215∗¶†   | 0.281∗¶†   | 0.438∗†    | 0.383∗†    | 0.372∗¶†
QINM         | 0.134∗¶§†‡ | 0.338∗¶§†‡ | 0.375∗¶†‡  | 0.267∗¶§†‡ | 0.294∗¶§†‡ | 0.453∗¶§†‡ | 0.408∗¶§†‡ | 0.396∗¶§†‡

Compared with the classical QLM retrieval model, QINM greatly improves the retrieval effect, showing that applying advanced word vector technology and neural network technology, with their strong learning and generalization abilities, can not only take semantic information into consideration in the retrieval process but also extract effective matching features, which play a key role in relevance matching. Meanwhile, both NNQLM and QMWF-LM perform worse than QINM, indicating that global matching information is also a key feature in ad-hoc retrieval, and that it is also extremely important to apply an appropriate learning-to-rank algorithm to learn the relative position information between documents.

When we look at the Neural IR baselines, we find that QINM works better than MP, K-NRM and Conv-KNRM, showing that matching models designed only for local matching cannot handle the matching requirements of ad-hoc retrieval. At the same time, we notice that Conv-KNRM and MIX-weight perform better than the other Neural IR baselines, which indicates that local interaction based on n-gram fragments as matching units can construct the dependency information of text terms to a certain extent. Compared with DRMM and MIX-weight, QINM improves markedly on ClueWeb-09-Cat-B, showing that QINM, based on quantum probability theory, can effectively improve the retrieval performance, since the interference feature matrix can provide additional relevance matching signals that construct the interaction between text matching units on top of the existing neural matching model.

Finally, we can see that our proposed QINM is better than all the retrieval models inspired by quantum theory, as well as most of the existing neural matching models. For example, on the ClueWeb-09-Cat-B topic titles, the relative improvement of our model over Conv-KNRM is about 1.3%, 5.3%, 0.8% and 9.0% in terms of MAP, NDCG@20, P@20 and ERR@20, respectively. At the same time, we find that compared with K-NRM and Conv-KNRM, QINM improves by about 2% on average on the four evaluation indicators, indicating that the matching features constructed by QINM can still play a certain role during the process of document relevance judgment in the case of relatively short candidate documents.

5.5 Analysis on QINM Model
We design comparative experiments to explore the impact of Learning-to-Rank methods and of the relevance matching information provided by the interference term. Through these experiments, we try to gain a deeper understanding of the QINM.

5.5.1 The Impact of Learning-to-Rank Algorithms. Due to the importance of relative position information between documents in IR tasks, most retrieval models need a Learning-to-Rank algorithm to learn this information. Typically, ranking models can be classified into three categories: point-wise, pair-wise and list-wise, and the ability of these three types to learn relative position information improves in that order [5]. Therefore, the selection of the Learning-to-Rank algorithm can have a significant effect on the performance of the retrieval model. We explore the performance of several representative models with different types of Learning-to-Rank algorithms, as shown in Table 3.

Table 3: Comparison of model performance under different Learning-to-Rank algorithms on ClueWeb-09-Cat-B.

Model name        | MAP   | NDCG@20 | P@20
NNQLM-II_point    | 0.091 | 0.203   | 0.216
NNQLM-II_pair     | 0.105 | 0.217   | 0.228
NNQLM-II_list     | 0.112 | 0.244   | 0.263
Conv-KNRM_point   | 0.115 | 0.264   | 0.348
Conv-KNRM_pair    | 0.121 | 0.285   | 0.367
Conv-KNRM_list    | 0.130 | 0.317   | 0.372
QINM_point        | 0.120 | 0.271   | 0.352
QINM_pair         | 0.128 | 0.306   | 0.369
QINM_list         | 0.134 | 0.338   | 0.375
In the comparative experiment, we adopt the point-wise algorithm of NNQLM-II [36], the pair-wise algorithm of Conv-KNRM [9], and the list-wise algorithm used in this paper. Through the comparison, we find that our model achieves better performance than the other two methods under the same ranking algorithm. We also note that applying the list-wise algorithm also improves the performance of the other two models. At the same time, this reflects the validity of QINM from another angle: the ranking algorithm is only a complementary means of improving the retrieval performance. We also observe a similar phenomenon on the other data set.

5.5.2 The Impact of Interference Matching Features. Compared with QLM and NNQLM, QINM encodes the interference effects, which play an important role in the process of document relevance judgment. Combined with Eq. 10, we know that the interference term in the operator ρ^D disappears when we ignore the interference feature matrix M_I (i.e., ρ^D = C^Q (\sum_{i=1}^{m} (g_i^D)^2 \Pi^D_{i,i}) = M_S). We set up two sets of comparative experiments: one shows the superiority of QINM under the same Learning-to-Rank algorithm, and the other shows that the performance of QINM is weaker after removing the interference information, which also shows the effectiveness of the interference matching information.

[Figure 6: A comparative experiment to verify the effectiveness of interference effects on two benchmark collections, reporting MAP, NDCG@20, P@20 and ERR@20 for QINM and QINM_N on ClueWeb-09-Cat-B and Robust-04.]

We compare the model QINM with QINM_N, which only calculates the similarity feature matrix M_S in the process of constructing the operator ρ^D. According to the results depicted in Figure 6, the relative MAP, NDCG@20, P@20 and ERR@20 drops of QINM_N compared with QINM are about 1.4%, 7.7%, 9.1% and 5.5% on ClueWeb-09-Cat-B; on Robust-04, the relative MAP, NDCG@20, P@20 and ERR@20 drops of QINM_N compared with QINM are about 6.3%, 8.0%, 5.9% and 6.4%. These results demonstrate that the interference matching features used in QINM are important and effective in document relevance judgement. We also note that QINM_N is similar in its results to K-NRM, which conforms to the relevance judgment process based on classical axioms.
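A sketch of this ablation in terms of the decomposition ρ^D = M_S + M_I of Eq. (10): QINM_N keeps only the similarity part, which amounts to dropping the cross terms between different document terms (illustrative code under that assumption, not the released implementation):

```python
import numpy as np

def split_rho(d_vecs, g_d, c_q=1.0):
    """Decompose rho^D into M_S (diagonal terms) and M_I (cross terms), per Eq. (10)."""
    dim = d_vecs.shape[1]
    m_s, m_i = np.zeros((dim, dim)), np.zeros((dim, dim))
    for j in range(len(g_d)):
        for k in range(len(g_d)):
            block = c_q * g_d[j] * g_d[k] * np.outer(d_vecs[j], d_vecs[k])  # Pi^D_{j,k}
            if j == k:
                m_s += block   # similarity feature matrix M_S
            else:
                m_i += block   # interference feature matrix M_I
    return m_s, m_i

# QINM scores from rho_D = m_s + m_i; the QINM_N ablation keeps only m_s.
```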

6 CONCLUSIONS AND FUTURE WORK
In this paper, we try to extend the boundary of relevance probability estimation by introducing interference effects into neural matching models. First, we re-visited three kinds of retrieval models (especially neural matching models) from a probabilistic point of view and found that they are consistent with the classical probability rules. Then, by using the projection measurement (a probability measurement in quantum theory), we showed the necessity of modeling the interference effects in the estimation process of relevance probability. After that, we proposed a Quantum Interference inspired Neural Matching model (QINM).

To our knowledge, the proposed QINM model for the first time applies quantum interference theory to a neural matching model for ad-hoc retrieval. Specifically, we first construct the probability distribution of a document into the reduced density operator, then apply an N-gram Window Convolution Network to extract the effective probability distribution, and finally the matching features calculated by the Query Attention mechanism are used together to calculate the final matching score.

Systematic experiments on ClueWeb-09-Cat-B and Robust-04 have demonstrated that our model achieves a significant improvement over quantum-inspired models and some well-known neural retrieval models, which suggests that QINM can effectively improve the retrieval performance by constructing interference matching information.

For the future, we would like to further explore the practical use of quantum interference information in the field of natural language processing (NLP). We can also model the quantum interference effects as part of BERT's fine-tuning component. Moreover, due to the high spatial complexity of extracting the interference information, we will optimize the model to make it more practical.

7 ACKNOWLEDGMENTS
This work is supported in part by the state key development program of China (grant No. 2018YFC0831704, 2017YFE0111900), the Natural Science Foundation of China (grant No. 61772363, U1636203), and the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 721321.

8 APPENDIX
8.1 Quantum Composite System
In quantum mechanics, a composite system consists of two or more different quantum subsystems. The state space of the composite system is the tensor product of the state spaces of the subsystems. Assume that there are two different quantum systems A and B, denoted as Hilbert spaces H_A and H_B. We can get a composite system of A and B by the tensor product operation, denoted as:

H_{A,B} = H_A \otimes H_B    (15)

Assuming that α is a state of system A and β is a state of system B, the system H_{A,B} has a state Ψ denoted as:

\Psi = \alpha \otimes \beta    (16)

The partial trace operation is a special matrix trace operation in quantum mechanics: the tracing does not extend over the whole space, but over one subsystem. Suppose we have systems A and B, whose state is described by a density operator ρ^{AB}. The partial trace [21] is defined by

\rho^A = tr_B(\rho^{AB})    (17)

where tr_B is a map of operators known as the partial trace over system B, and ρ^A is the reduced density operator for system A. The reduced density matrix, which excludes the influence of the rest of the subsystems, is an indispensable tool for analyzing composite quantum systems. It provides the correct measurement statistics for measurements made on system A.

If a_1 and a_2 are any two states in system A, and b_1 and b_2 are any two vectors in system B, the partial trace satisfies

tr_B(a_1 (a_2)^T \otimes b_1 (b_2)^T) \equiv a_1 (a_2)^T \, tr(b_1 (b_2)^T)    (18)

where the trace operation on the right-hand side is the usual trace operation [21] for system B, so tr(b_1 (b_2)^T) = (b_1)^T b_2 [12]. We have defined the partial trace operation only for a special subclass of operators on the composite system AB; the specification is completed by requiring, in addition to Eq. 18, that the partial trace be linear in its input.

REFERENCES
[1] Peter Bruza, Kirsty Kitto, Douglas Nelson, and Cathy McEvoy. 2009. Is there something quantum-like about the human mental lexicon? Journal of Mathematical Psychology 53, 5 (2009), 362–377.
[2] P. D. Bruza and R. J. Cole. 2006. Quantum Logic of Semantic Space: An Exploratory Investigation of Context Effects in Practical Reasoning. Physics 9, 10 (2006), 329–332.
[3] Jerome R. Busemeyer and Peter D. Bruza. 2012. Quantum Models of Cognition and Decision. Cambridge University Press.
[4] Jerome R. Busemeyer, Zheng Wang, and Ariane Lambert-Mogiliansky. 2009. Empirical comparison of Markov and quantum models of decision making. Journal of Mathematical Psychology 53, 5 (2009), 423–433.
[5] Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. 2007. Learning to rank: from pairwise approach to listwise approach. In International Conference on Machine Learning. 129–136.
[6] Haolan Chen, Fred X. Han, Di Niu, Dong Liu, Kunfeng Lai, Chenglin Wu, and Yu Xu. 2018. MIX: Multi-channel information crossing for text matching. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 110–119.
[7] Elio Conte, Andrei Yuri Khrennikov, Orlando Todarello, Roberta De Robertis, Antonio Federici, and Joseph P. Zbilut. 2011. On the possibility that we think in a quantum mechanical manner: An experimental verification of existing quantum interference effects in cognitive anomaly of conjunction fallacy. Chaos and Complexity Letters 4 (2011), 123–136.
[8] W. Bruce Croft, Howard R. Turtle, and David D. Lewis. 1991. The use of phrases and structured queries in information retrieval. In Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 32–45.
[9] Zhuyun Dai, Chenyan Xiong, Jamie Callan, and Zhiyuan Liu. 2018. Convolutional neural networks for soft-matching n-grams in ad-hoc search. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining. ACM, 126–134.
[10] Joel L. Fagan. 2017. Automatic Phrase Indexing for Document Retrieval: An Examination of Syntactic and Non-syntactic Methods. In ACM SIGIR Forum, Vol. 51. ACM New York, NY, USA, 51–61.
[11] Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu, and Guihong Cao. 2004. Dependence language model for information retrieval. In Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 170–177.
[12] Andrew M. Gleason. 1975. Measures on the Closed Subspaces of a Hilbert Space. Springer Netherlands. 123–133 pages.
[13] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A deep relevance matching model for ad-hoc retrieval. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management. ACM, 55–64.
[14] Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In ACM International Conference on Information and Knowledge Management. 55–64.
[15] Kai Hui, Andrew Yates, Klaus Berberich, and Gerard De Melo. 2017. PACRR: A Position-Aware Neural IR Model for Relevance Matching. arXiv preprint arXiv:1704.03940 (2017).
[16] Robert Krovetz. 1993. Viewing morphology as an inference process. Artificial Intelligence 118, 1 (1993), 277–294.
[17] Christina Lioma, Birger Larsen, Casper Petersen, and Jakob Grue Simonsen. 2016. Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it). CoRR abs/1606.07660 (2016). http://arxiv.org/abs/1606.07660
[18] Donald Metzler and W. Bruce Croft. 2005. A Markov random field model for term dependencies. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 472–479.
[19] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In International Conference on Neural Information Processing Systems. 3111–3119.
[20] Catarina Moreira and Andreas Wichert. 2016. Quantum-like Bayesian networks for modeling decision making. Frontiers in Psychology 7 (2016), 11.
[21] M. A. Nielsen and I. L. Chuang. 2000. Quantum Computation and Quantum Information. Cambridge University Press.
[22] Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, and Xueqi Cheng. 2016. A Study of MatchPyramid Models on Ad-hoc Retrieval. CoRR abs/1606.04648 (2016). http://arxiv.org/abs/1606.04648
[23] Liang Pang, Yanyan Lan, Jiafeng Guo, Jun Xu, and Xueqi Cheng. 2017. A Deep Investigation of Deep IR Models. arXiv preprint arXiv:1707.07700 (2017).
[24] Peng Zhang, Dawei Song, Yuexian Hou, Jun Wang, and Peter Bruza. 2010. Automata modeling for cognitive interference in users' relevance judgment. In Proc. of QI (2010), 125–133.
[25] Stephen E. Robertson. 1977. The probability ranking principle in IR. Journal of Documentation 33, 4 (1977), 294–304.
[26] Stephen E. Robertson and Steve Walker. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR '94. Springer, 232–241.
[27] Alessandro Sordoni, Jian-Yun Nie, and Yoshua Bengio. 2013. Modeling term dependencies with quantum language models for IR. In International ACM SIGIR Conference on Research and Development in Information Retrieval. 653–662.
[28] Cornelis Joost Van Rijsbergen. 1977. A theoretical basis for the use of co-occurrence data in information retrieval. Journal of Documentation (1977).
[29] Cornelis Joost Van Rijsbergen. 2004. The Geometry of Information Retrieval. Cambridge University Press.
[30] Benyou Wang, Peng Zhang, Jingfei Li, Dawei Song, Yuexian Hou, and Zhenguo Shang. 2016. Exploration of quantum interference in document relevance judgement discrepancy. Entropy 18, 4 (2016), 144.
[31] Jun Wang, Dawei Song, Peng Zhang, Yuexian Hou, and Peter Bruza. 2010. Explanation of relevance judgement discrepancy with quantum interference. In 2010 AAAI Fall Symposium Series.
[32] Zheng Wang, Jerome R. Busemeyer, Harald Atmanspacher, and Emmanuel M. Pothos. 2013. The potential of using quantum theory to build models of cognition. Topics in Cognitive Science 5, 4 (2013), 672–688.
[33] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, and Russell Power. 2017. End-to-end neural ad-hoc ranking with kernel pooling. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 55–64.
[34] Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to ad hoc information retrieval. In International ACM SIGIR Conference on Research and Development in Information Retrieval. 334–342.
[35] Lipeng Zhang, Peng Zhang, Xindian Ma, Shuqin Gu, and Dawei Song. 2019. A Generalized Language Model in Tensor Space. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19).
[36] Peng Zhang, Jiabin Niu, Zhan Su, Benyou Wang, Liqun Ma, and Dawei Song. 2018. End-to-end quantum-like language models with application to question answering. In Thirty-Second AAAI Conference on Artificial Intelligence.
[37] Peng Zhang, Zhan Su, Lipeng Zhang, Benyou Wang, and Dawei Song. 2018. A Quantum Many-body Wave Function Inspired Language Modeling Approach. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 1303–1312.
[38] Guido Zuccon, Leif A. Azzopardi, and Keith Van Rijsbergen. 2009. The Quantum Probability Ranking Principle for Information Retrieval. In International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory.