METHODOLOGY FOR THE SECRETARY PROBLEM WITH RANDOM QUERIES

BY GEORGE V. MOUSTAKIDES1,XUJUN LIU2,* AND OLGICA MILENKOVIC2,†

1Department of ECE, University of Patras, 26500 Rion, GREECE. [email protected]

2Department of ECE, University of Illinois, Urbana-Champaign, IL, USA. *[email protected]; †[email protected]

Candidates arrive sequentially for an interview process which results in them being ranked relative to their predecessors. Based on the ranks available at each time, one must develop a decision mechanism that selects or dismisses the current candidate in an effort to maximize the chance to select the best. This classical version of the “Secretary problem” has been studied in depth using mostly combinatorial approaches, along with numerous other variants. In this work we consider a particular new version where during reviewing one is allowed to query an external expert to improve the probability of making the correct decision. Unlike existing formulations, we consider experts that are not necessarily infallible and may provide suggestions that can be faulty. For the solution of our problem we adopt a probabilistic methodology and view the querying times as consecutive stopping times which we optimize with the help of optimal stopping theory. For each querying time we must also design a mechanism to decide whether we should terminate the search at the querying time or not. This decision is straightforward under the usual assumption of infallible experts but, when experts are faulty, it has a far more intricate structure.

1. Introduction. The secretary problem, also known as the game of Googol, the beauty contest problem, and the dowry problem, was formally introduced in [6] while the first so- lution was obtained in [10]. A widely used version of the secretary problem can be stated as follows: n individuals are ordered without ties according to their qualifications. They apply for a “secretary” position, and are interviewed one by one, in a random order. When the t- th candidate appears, s/he can only be ranked (again without ties) with respect to the t ´ 1 already interviewed individuals. At the time of the t-th interview, the employer must make the decision to hire the person present or continue with the interview process by rejecting the candidate; rejected candidates cannot be revisited at a later time. The employer succeeds only if the best candidate is hired. If only one selection is to be made, the question of interest is to determine a strategy (i.e., final rule) that maximizes the probability of selecting the best candidate.

arXiv:2107.07513v2 [cs.DS] 4 Aug 2021 In [10] the problem was solved using algebraic methods with backwards recursions while [3] considered the process as a Markov chain. An extension of the secretary problem, known as the dowry problem with multiple choices (for simplicity, we refer to it as the dowry prob- lem), was studied in [7]. In the dowry problem, one is allowed to select s ą 1 candidates during the interview process, and the condition for success is that the selected group includes the best candidate. This review process can be motivated in many different ways: for ex- ample, one may view the s-selection to represent candidates invited for a second round of interviews. In [7] we find a heuristic solution to the dowry problem while [17] solved the problem using a functional-equation approach of the dynamic programming method.

MSC2020 subject classifications: Primary 60G40, Stopping Times, Optimal Stopping; secondary 62L15, Op- timal Stopping in . Keywords and phrases: Multiple stopping times, Secretary problem, Dowry problem. 1 2

In [7] the authors also offer various extensions to the secretary problem in many different directions. For example, they examined the secretary problem (single choice) when the ob- jective is to maximize the probability of selecting the best or the second-best candidate, while the more generalized version of selecting one of the top ` candidates was considered in [8]. Additionally in [7] the authors studied the full information game where the interviewer is al- lowed to observe the actual values of the candidates, which are chosen independently from a known distribution. Several other extensions have been considered in the literature: the post- doc problem, for which the objective is to find the second-best candidate [7, 16]; the problem of selecting all or one of the top ` candidates when ` choices are allowed [14, 19, 20]. More recently, the problem is considered under a model where interviews are performed according to a nonuniform distribution such as the Mallows or Ewens, [2,9, 11, 12]. For more details regarding the history of the secretary problem, the interested reader may refer to [4,5]. The secretary problem with genie-queries, an extension of the secretary problem with multiple choices, was introduced in [12] and solved using combinatorial methods for a gen- eralized collection of distributions that includes the Mallows distribution. In this extended version it is assumed that the decision making entity has access to a limited number of in- fallible genies (experts). When faced with a candidate in the interviewing process a genie, if queried, provides a binary answer of the form “The best” or “Not the best”. Queries of experts are frequently used in interviews as additional source of information and the feed- back is usually valuable but it does not mean that it is necessarily accurate. This motivates the investigation of the secretary problem when the response of the genie is not deterministic (infallible) but it may also be false. This can be modeled by assuming that the response of the genie is random. Actually, the response does not even have to be binary as in the random query model employed in [1, 13] for the completely different problem of clustering. In our analysis we consider more than two possibilities which could reflect the level of confidence of the genie in its knowledge about the sample being the best or not. For example a four-level response could be of the form: “The best with high confidence”, “The best with low confi- dence”, “Not the best with low confidence” and “Not the best with high confidence”. As we will see, the analysis of the problem under a randomized genie model requires stochastic op- timization techniques and in particular results we are going to borrow from optimal stopping theory [15, 18].

2. Background. We begin by formally introducing the problem of interest along with our notation. Suppose the set tξ1, . . . , ξnu contains samples that are random uniform draws n without replacement from the set of integers t1, . . . , nu. The sequence tξtut“1 becomes avail- able sequentially and we are interested in identifying the sample with value equal to 1, which is regarded as “the best”. The difficulty of the problem stems from the fact that the value ξt is not observable. Instead, at each time t, we observe the relative rank zt of the sample ξt after it is being compared to all the previous samples tξ1, . . . , ξt´1u. If zt “ m (where 1 ď m ď t) this signifies that in the set tξ1, . . . , ξt´1u there are m ´ 1 samples with values strictly smaller than ξt. As mentioned, at each time t we are interested in deciding whether tξt “ 1u or tξt ą 1u based on the information provided by the relative ranks tz1, . . . , ztu. Consider now the existence of an expert/genie which we may query. The purpose of query- ing at any time t is to obtain from the genie the information about the current sample being the best or not. Unlike all articles in the literature that treat the case of deterministic genie responses here, as mentioned in the Introduction, we adopt a random response model. In the deterministic case the genie provides the exact answer to the question of interest and we ob- viously terminate the search if the answer is “tξt “ 1u”. In our approach the corresponding response is assumed to be a random number ζt that can take M different values. The reason we allow more than two values is to model the possibility of different levels of confidence in THE SECRETARY PROBLEM WITH RANDOM QUERIES 3 the genie response. Without loss of generality we may assume that ζt P t1,...,Mu and the probabilistic model we adopt is the following

(1) Ppζt “ m|ξt “ 1q “ ppmq, Ppζt “ m|ξt ą 1q “ qpmq, m “ 1,...,M, M M where m“1 ppmq “ m“1 qpmq “ 1, to assure that the genie responds with probability 1. As we can see, the probability of the genie generating a specific response depends on whether ř ř the true sample value is 1 or not. Additionally, we assume that ζt conditioned on tξt “ 1u or tξt ą 1u is statistically independent from any other past or future responses, relative ranks and sample values. It is clear that the random model is more general than its deterministic counterpart. Indeed we can emulate the deterministic case by simply selecting M “ 2 and pp1q “ 1, pp2q “ 0, qp1q “ 0, qp2q “ 1, with ζt “ 1 corresponding to “tξt “ 1u” and ζt “ 2 to “tξt ą 1u” with probability 1. In the case of deterministic responses it is clear that we gain no extra information by querying the genie more than once per sample (the genie simply repeats the same response). Motivated by this observation we adopt the same principle for the random response model as well, namely, we allow at most one query per sample. Of course, we must point out that under the random response model, querying multiple times for the same sample makes perfect sense since the corresponding responses may be different. However, as mentioned, we do not adopt this possibility in our current analysis. We discuss this point in more detail in Remark5 at the end of Section3. We study the case where we have available a maximal number K of queries. This means that for the selection process we need to define the querying times T1,..., TK with T1 ă T2 ă ¨ ¨ ¨ ă TK (inequalities are strict since we are allowed to query at most once per sample) and a final time Tf where we necessarily terminate the search. It is understood that when Tf occurs, if there are any remaining queries, we simply discard them. As we pointed out, in the classical case of an infallible genie, when the genie informs that the current sample is the best we terminate the search while in the opposite case we continue with the next sample. Under the random response model stopping at a querying time or continuing the search requires a more sophisticated decision mechanism. For this reason, with each querying time Tk we associate a decision function DTk P t0, 1u where DTk “ 1 means that we terminate the search at Tk while

DTk “ 0 that we continue the search beyond Tk. Let us now summarize our components: The search strategy is comprised of the querying times T1,..., TK , the final time Tf and the decision functions DT1 ,..., DTK which need to be properly optimized. Before starting our analysis let us make the following important remarks.

REMARK 1. It makes no sense to query or final-stop the search at any time t if we do not observe zt “ 1. Indeed, since our goal is to capture the sample ξt “ 1, if this sample occurs at t then it forces the corresponding relative rank zt to become 1.

REMARK 2. If we have queried at times tk ą ¨ ¨ ¨ ą t1 and there are still queries available (i.e. k ă K), then we have the following possibilities: 1) Terminate the search at tk; 2) Make another query after tk; and 3) Terminate the search after tk without making any additional queries. Regarding case 3) we can immediately dismiss it from the possible choices. Indeed, if we decide to terminate at some point t ą tk, then it is understandable that our overall performance will not change if at t we first make a query, ignore the genie response and then terminate our search. Of course, if we decide to use the genie response optimally then we cannot perform worse than terminating at t without querying. Hence, we can conclude that, if we make the k-th query at tk then we must obtain the genie response ζtk and use it to decide whether we should terminate at tk or make another query after tk. Of course if k “ K, namely, if we have exhausted all queries, then we decide between terminating at tK and employing the final time Tf to terminate after tK . We thus conclude that T1 ă ¨ ¨ ¨ ă TK ă Tf . 4

REMARK 3. Based on the previous remarks we may now specify the information each search component is related to. Denote with Zt “ sigmatz1, . . . , ztu the sigma-algebra gen- erated by the relative ranks available at time t and let Z0 be the trivial sigma-algebra. We then n have that the querying time T1 is a tZtut“0-adapted where tZtu denotes the fil- tration generated by the sequence of the corresponding sigma-algebras. Practically this means that the events tT1 “ tu, tT1 ą tu, tT1 ď tu belong to Zt. More generally, suppose that we fix k Tk “ tk,..., T1 “ t1 and t0 “ 0 and for t ą tk we define Zt “ sigmatz1, . . . , zt, ζt1 , . . . , ζtk u with 0 then, the querying time T is a k n -adapted stopping time. Indeed Zt “ Zt k`1 tZt ut“tk`1 we can see that the event tTk`1 “ tu depends on the relative ranks Zt but also on the genie responses tζt1 , . . . , ζtk u available at time t. If we apply this definition for k “ K then TK`1 simply denotes the final time Tf . With the first k querying times fixed as before, we can also D k see that the decision function tk is measurable with respect to Ztk . This is true because at tk in order to decide whether to stop or continue the search we use all available information up to time tk which consists of the relative ranks and the existing genie responses.

We begin our analysis by presenting certain basic probabilities. They are listed in the following lemma.

LEMMA 1. For n ě t ě t1 ě 1 we have pn ´ tq! 1 1 (2) Ppξt, . . . , ξ1q “ , PpZtq “ , Ppzt|Zt 1q “ n! t! ´ t 1 (3) Ppξt “ 1, Ztq “ I , pt ´ 1q!n tzt“1u 1 I 1t (4) Ppξt1 “ 1, Ztq “ tzt “1u t 1, pt ´ 1q!n 1 1` I 1b where A denotes the indicator function of the event A and where for b ě a we define a “ b I , while for b a we let 1b 1. `“a tz`ą1u ă a “

śPROOF. The first equality is well known and corresponds to the probability of selecting t values from the set t1, . . . , nu without replacement. The second and third equality in (2) show that the ranks tztu are independent and each zt is uniformly distributed in the set t1, . . . , tu. The equality in (3) computes the probability of the event tξt “ 1u in combination with rank values observed up to time t. Because the event tξt1 “ 1u forces the corresponding rank zt1 to become 1 and all subsequent ranks to be greater than 1, this fact is captured in (4) by the product of the indicators I 1t . The details of the proof can be found in the tzt1 “1u t1`1 Appendix.

In the next lemma, we present a collection of more advanced equalities as compared to the ones appeared in Lemma1 where we also include genie responses. In these identities we k will encounter the event tZt , ztk “ 1u that has the following meaning: When t ě tk then k ztk is part of Zt which in turn is part of Zt . By including explicitly the event tztk “ 1u we k simply state that we fix ztk to 1, while the remaining variables comprising Zt or Zt are free to assume any value.

LEMMA 2. For n ě t ą tk ą ¨ ¨ ¨ ą t1 ě 1 we have

k t (5) Ppξt “ 1|Z q “ I t n tzt“1u THE SECRETARY PROBLEM WITH RANDOM QUERIES 5

k ppζtk qtk (6) Ppξt “ 1|Z q “ I k tk tztk “1u ppζtk qtk ` qpζtk qpn ´ tkq

k´1 tk tk (7) Ppζt |Z , zt “ 1q “ ppζt q ` qpζt q 1 ´ k tk k k n k n ´ ¯ 1 ppζt q ´ qpζt q pt ´ 1q (8) P z 1 k , z 1 1 k k 1t´1 . p t “ |Zt´1 tk “ q “ ´ tk`1 t ppζtk qpt ´ 1q ` pn ´ t ` 1qqpζtk q ! ` ˘ ) PROOF. Equality (5) expresses the fact that the probability of interest depends only on the current rank while it is independent from previous ranks and genie responses. Equalities (6), (7), (8) suggest that the corresponding probabilities are functions of only the most recent genie response and do not depend on the previous responses. In particular in (8) we note the dependency structure that exists between zt and past information which is captured by the indicator 1t´1 . As we argue in the proof of Lemma1, this indicator is a result of the fact tk`1 that if ξtk “ 1 then the ranks for times t ą tk can no longer assume the value 1. The complete proof is presented in the Appendix.

As we will see in the subsequent analysis, compared to the deterministic case, a more chal- lenging decision structure will emerge under the random response model. In the deterministic model [7, 12], deciding to stop at a querying time is straightforward. If the genie responds with “tξt “ 1u” we stop, otherwise we continue our search. This strategy is not optimum in the case of random responses since the genie can provide faulty information. As we are going to show, the optimum decision functions DT1 ,..., DTK have a more intriguing form which depends on the values of the genie response and their corresponding probabilities of occurrence. Identifying the optimum search components will be the focus of the next section.

3. Optimizing the Success Probability. To simplify our presentation we make a final definition. For tk ą tk´1 ą ¨ ¨ ¨ ą t1 ą t0 “ 0 define the event tk z 1, D 0, . . . , z 1, D 0 , Bt1 “ t tk “ tk “ t1 “ t1 “ u with t0 denoting the whole sample space. From the definition we conclude Bt1 k I I I I I I (9) tk “ “ tk´1 . B tzt` “1u tDt` “0u tztk “1u tDtk “0u t1 Bt1 `“1 ź Basically tk captures the event of querying at t , . . . , t , after observing relative ranks equal Bt1 k 1 to 1 (required by Remark1) while deciding not to terminate the search at any of these query- I ing instances. It is clear from Remark3 that for t ě tk the indicator tk is measurable with Bt1 k respect to Zt , because this property applies to each individual indicator participating in the product in (9). Consider now a collection of querying times and a final time satisfying Tf ą TK ą ¨ ¨ ¨ ą

T2 ą T1 ą T0 “ 0 and a corresponding collection of decision functions DT1 ,..., DTK . De- note with Psucc the success probability delivered by this combination, namely the probability to select the best sample, then we have the following decomposition

K T (10) P “ Ppξ “ 1, D “ 1, k´1 q ` Ppξ “ 1, TK q, succ Tk Tk BT1 Tf BT1 k“1 ÿ b where, as usual, we define k“a xk “ 0 when a ą b. The general term in the sum expresses the probability of the event where we did not terminate at the first k ´1 querying times (this is ř 6 captured by Tk´1 which contains the event of all previous decisions being 0) and we decided BT1 to terminate at the k-th query (indicated by DTk “ 1). The single last term in (10) is the probability of the event where we did not terminate at any querying time (captured by TK ) BT1 and we make use of the final time Tf to terminate the search. Please note, that in (10) we did not include the events tzTf “ 1u, tzTk “ 1u although, as pointed out in Remark1, we query or final-stop only at points that satisfy this property. The reason is that these events are implied by the events tξTk “ 1u and tξTf “ 1u respectively, since tξTf “ 1u X tzTf “ 1u “ tξTf “ 1u, with a similar equality being true for any querying time. Let us now focus on the last term in (10) and apply the following manipulations

n n (11) Ppξ “ 1, TK q “ ¨ ¨ ¨ Ppξ “ 1, T “ t, T “ t ,..., T “ t , tK q Tf BT1 t f K K 1 1 Bt1 tątK ą¨¨¨ąt1“1 ÿ ÿ n n

I I I I I t “ ¨ ¨ ¨ Er tξt“1u tTf “tu tTK “tku ¨ ¨ ¨ tT1“t1u K s Bt1 tątK ą¨¨¨ąt1“1 ÿ ÿ n n K I I I I I t “ ¨ ¨ ¨ ErEr tξt“1u|Zt s tTf “tu tTK “tku ¨ ¨ ¨ tT1“t1u K s Bt1 tątK ą¨¨¨ąt1“1 ÿ ÿ n n K I I I I t “ ¨ ¨ ¨ ErPpξt “ 1|Zt q tTf “tu tTK “tku ¨ ¨ ¨ tT1“t1u K s Bt1 tątK ą¨¨¨ąt1“1 ÿ ÿ n n t Tf I I I I I t I I “ ¨ ¨ ¨ E z 1 T t T t ¨ ¨ ¨ T t K “ E z TK , t t“ u t f “ u t K “ ku t 1“ 1u Bt t Tf “1u B n 1 n T1 tątK ą¨¨¨ąt1“1 ÿ ÿ ” ı ” ı I t I I where for the third equality we used the fact that the indicators K , tTf “tu, tTk“tku, k “ Bt1 K 1,...,K, are measurable with respect to Zt and can therefore be placed outside the inner expectation, while for the second last equality we used (5). If we substitute (11) into (10) we can rewrite the success probability as

K Tk´1 Tf I I T (12) Psucc “ PpξTk “ 1, DTk “ 1, B q ` E z 1 K . T1 t Tf “ u B n T1 k“1 ÿ ” ı We can now continue with the task of optimizing (12) over all querying times T1,..., TK , the final time Tf and the decision functions DT1 ,..., DTK . We will achieve this goal step- by-step. We start by conditioning on T t ,..., T t , tK and first optimize over t K “ K 1 “ 1 Bt1 u

Tf ą tK , followed by a second optimization over DTK . This will result in an expression that depends on T1,..., TK and DT1 ,..., DTK´1 with a form which will turn out to be similar to (12) but with the sum reduced by one term. Continuing this idea of first conditioning on the previous querying times and the corresponding event B we are going to optimize succes- sively over the pairs pTf , DTK q, pTK , DTK´1 q,..., pT2, DT1 q and then, finally, over T1. The outcome of this sequence of optimizations is presented in the next theorem which constitutes our main result. As explained, in this theorem we will identify the optimum version of all search components and also specify the overall optimum success probability.

THEOREM 1. For t “ n, n ´ 1,..., 1, and k “ K,K ´ 1,..., 0, define recursively in t k k and k the deterministic sequences tAt u, tUt u according to 1 1 (13) Ak “ Ak 1 ´ ` max Uk`1, Ak , t´1 t t t t t ´ ¯ ( THE SECRETARY PROBLEM WITH RANDOM QUERIES 7

M t (14) Uk “ max ppmq , qpmqAk , t n t m“1 ÿ ( k K`1 t initializing with An “ 0 and Ut “ n . Then, for any collection of querying times and

final time T1 ă ¨ ¨ ¨ ă TK ă Tf and any collection of decision functions DT1 ,..., DTK that conform with Remark3, if we define for k “ K,K ´ 1,..., 0 the sequence tPku with k T`´1 k`1 I I T (15) Pk “ PpξT` “ 1, DT` “ 1, BT q ` E UT tzT “1u k , 1 k`1 k`1 BT `“1 1 ÿ “ ‰ then 0 (16) Psucc “ PK ď PK´1 ď ¨ ¨ ¨ ď P0 ď A0. 0 The upper bound A0 in (16) is independent from any search strategy and constitutes the maximal achievable success probability. This optimal performance can be attained if we select the querying times according to k`1I k (17) Tk`1 “ min t ą Tk : Ut tzt“1u ě At , and the decision functions to satisfy (

1 if ppζ q Tk ě qpζ qAk Tk n Tk Tk (18) DT “ k Tk k 0 if ppζT q ă qpζT qA , " k n k Tk where ζTk is the response of the genie at querying time Tk.

PROOF. As mentioned, Theorem1 constitutes our main result because we identify the optimum version of all the search components and the corresponding maximal success prob- ability. In particular, we have (17) for the optimum querying times T1,..., TK and final time Tf (we recall that Tf “ TK`1), while the optimum version of the decision functions is de- picted in (18). The complete proof is presented in the Appendix.

4. Simplified Form of the Optimum Components. Theorem1 offers explicit formulas for the optimum version of all search components. We recall that in the existing literature, querying and final stopping are defined in terms of very simple rules involving thresholds. For this reason in this section our goal is to develop similar rules for our search strategy. The next lemma identifies certain key monotonicity properties of the sequences introduced in Theorem1 that will help us achieve this goal.

k k LEMMA 3. For each fixed t the two sequences tAt u and tUt u are decreasing in k, while k k for each fixed k we have tAt u decreasing and tUt u increasing in t. Finally, at the two end k k`1 k k`1 points we have An ď Un and A0 ě U0 .

PROOF. With the help of this lemma we will be able to produce simpler versions of the optimum components. We can find the complete proof in the Appendix.

Let us use the results of Lemma3 to study (17) and (18). We observe that (17) can be true k`1 only if zt “ 1. Under this assumption and because of the increase of tUt u and decrease k of tAt u with respect to t, combined with their corresponding values at the two end points k`1 k t “ 0, n, we understand that there exists a time threshold rk such that Ut ě At for t ě rk while the inequality is reversed for t ă rk. The threshold rk can be identified beforehand by 8

k`1 k comparing the terms of the two deterministic sequences tUt u, tAt u. With the help of rk we can then equivalently write Tk as follows

(19) Tk “ min t ě maxtTk´1 ` 1, rku : zt “ 1 , namely, we make the k-th query the first time after the previous querying( time Tk´1, but no sooner than the time threshold rk, we encounter zt “ 1. Similar conclusion applies to the final time Tf where the corresponding threshold is rf “ rK`1. Regarding now (18), namely the decision whether to stop at the k-th querying time or not, k t again because of the decrease of tAt u (from Lemma3) and the increase of t n u with respect k k to t and also the fact that An “ 0 and A0 ą 0, we can conclude that there exist thresholds t k skpmq, m “ 1,...,M, that depend on the genie response ζTk “ m so that ppmq n ě qpmqAt for t ě skpmq while the inequality is reversed when t ă skpmq. The precise definition of skpmq is t (20) s pmq “ min t ě 1 : ppmq ě qpmqAk . k n t

With the help of the thresholds skpmq which can be computed beforehand,( we can equiva- lently write the optimal decision as

1 if Tk ě skpζTk q (21) DT “ k 0 if T ă s pζ q, " k k Tk where ζTk is the genie response at querying time Tk. In other words if we make the k-th query at time Tk and the genie responds with ζTk then if Tk is no smaller than the time threshold skpζTk q we terminate the search. If Tk is strictly smaller than skpζTk q then we continue to the next query (or final-stop if k “ K). At this point we have identified the optimum version of all components of the search strategy. In Table1 we summarize the formulas we need to apply in order to compute the

TABLE 1 Optimum Search Strategy.

k K`1 t Define An “ 0 and Ut “ n . For t “ n, n ´ 1,..., 1, and k “ K,..., 1, 0, compute: k k 1 k`1 k 1 At´1 “ At 1 ´ t ` max Ut , At t k M t k Ut “ m“´1 maxt¯ppmq n , q!pmqAt u ) 0 then Ař0 is the maximal success probability.

For k “ 1,...,K, find the thresholds for the final time and the querying times: t K rf “ mintt ě 1 : n ě At u k`1 k rk “ mintt ě 1 : Ut ě At u

With T0 “ 0, the optimum querying times and the final time are defined as:

Tk “ mintt ě maxtTk´1 ` 1, rku : zt “ 1u, if we did not terminate at Tk´1 Tf “ mintt ě maxtTK ` 1, rf u : zt “ 1u, if we did not terminate at TK .

For k “ 1,...,K and m “ 1,...,M, find the decision thresholds: t k skpmq “ mintt ě 1 : ppmq n ě qpmqAt u

The optimum decision whether to terminate at Tk or not is defined as: ζ 1,...,M T T s ζ For a genie response Tk P t u, stop at k if k ě kp Tk q otherwise proceed to the next query (or final time if k “ K). THE SECRETARY PROBLEM WITH RANDOM QUERIES 9 corresponding thresholds and also present the way these thresholds must be employed to implement the optimal search strategy.

REMARK 4. Even though it is not immediately evident from the previous analysis, the probabilistic description of the querying times, final time and decision functions enjoy a no- table stationarity characteristic (also pointed out in [11] for the infallible genie case). In particular, the form of the final time Tf is independent from the maximal number K of queries. This means that the threshold rf does not depend on K and it is in fact the same as the unique threshold of the classical secretary problem. The same observation applies to any querying time TK´k and decision function DTK´k . Their corresponding thresholds rK´k and sK´kpmq do not depend on K but only on k. This observation basically suggests that if we have identified the optimum components for some maximal value K and we are interested in decreasing K then we do not need to recompute the components. We simply start from the thresholds of the last querying time and decision function and go towards the first and we stop when we have collected the desired number of components. Similarly, if we increase K then we keep the optimum components computed for the original smaller K and add more components in the beginning by applying the formulas of Table1.

REMARK 5. Once more, we would like to emphasize that the search strategy presented in Table1 is optimum under the assumption that we allow at most one query per sample. We should however point out that when the genie provides faulty answers it clearly makes sense to query more than once per sample in order to improve our trust in the genie responses. Un- fortunately the corresponding analysis turns out to be significantly more involved compared to our current results, as one can easily confirm by considering the simple example of K “ 2 queries. For this reason, we believe, this more general setting requires separate consideration.

REMARK 6. Being able to query does not necessarily guarantee a success probability that can approach 1. This limit is accessible only in the case of an infallible genie. Unfortunately, when responses may be wrong, we can improve the success probability but we can only reach a maximal value which is strictly smaller than 1 even if we query with every sample (i.e. K “ n). To see this fact consider the extreme case where the genie outputs M “ 2 values with uniform probabilities pp1q “ pp2q “ qp1q “ qp2q “ 0.5. It is clear that responses under this probabilistic model are completely useless for any number of queries. Hence, we expect the resulting optimum scheme to be equivalent to the classical secretary problem (with no queries) which (see [7]) enjoys a success probability that approximates the value e´1 « 0.3679 for large n. Under the probabilistic response model, in order to experience success probabilities that approach 1, we conjecture that we must allow multiple queries per sample and a maximal number K of queries which exceeds the number of samples, that is, K ą n. Of course, again, we need to exclude the uniform probability model because it continues to be equivalent to the classical secretary problem with no queries even when multiple queries per sample are permitted.

5. Numerical Example. Let us now apply the formulas of Table1 to a particular exam- ple. We consider the case of n “ 100 samples where the genie outputs M “ 2 values. This means that the random response model contains the probabilities ppiq, qpiq, i “ 1, 2. We focus on the symmetric case pp1q “ qp2q, meaning that pp1q “ 1 ´ pp2q “ 1 ´ qp1q “ qp2q, which can be parametrized with the help of a single parameter p “ pp1q “ qp2q. We assign to p the following values p “ 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 1 and allow a maximum of K “ 10 queries in order to observe the effectiveness of the optimum scheme. For p “ 1, as mentioned, we are in the case of the infallible genie, consequently we expect to match the existing results 10

TABLE 2 Thresholds and optimum success probability for n “ 100 samples and K “ 10 queries.

p rf r1 ˜ r10 s1pmq ˜ s10pmq, m “ 1 top, m “ 2 bottom Psucc 0.50 38 1 1 1 1 1 1 1 1 1 1 38 38 38 38 38 38 38 38 38 38 0.3710 38 38 38 38 38 38 38 38 38 38 0.60 38 27 27 27 27 27 27 27 27 27 29 27 27 27 27 27 27 27 27 27 27 0.3952 52 52 52 52 52 52 52 52 52 52 0.70 38 20 20 20 20 20 20 20 20 22 25 20 20 20 20 20 20 20 20 20 19 0.4568 66 66 66 66 66 66 66 66 66 66 0.80 38 14 14 14 14 14 14 15 16 18 24 14 14 14 14 14 14 14 14 14 13 0.5548 78 78 78 78 78 78 78 78 78 78 0.90 38 8 8 8 8 9 9 10 12 16 23 8 8 8 8 8 8 8 8 7 6 0.7055 90 90 90 90 90 90 90 90 90 90 0.95 38 5 5 5 5 6 7 8 11 15 23 5 5 5 5 5 5 5 4 4 3 0.8173 95 95 95 95 95 95 95 95 95 95 0.98 38 2 3 3 3 4 5 7 10 15 23 2 2 2 2 2 2 2 2 2 2 0.9095 98 98 98 98 98 98 98 98 98 98 1.00 38 1 1 2 2 3 4 6 10 15 23 1 1 1 1 1 1 1 1 1 1 0.9983 100 100 100 100 100 100 100 100 100 100 in the literature. We also note that p “ 0.5 corresponds to the uniform model therefore genie responses contain no useful information and we expect our scheme to be equivalent to the optimum scheme of the classical secretary problem. Using the formulas of Table1 we compute the thresholds rf , rk, k “ 1,...,K and the decision thresholds skpmq, m “ 1, 2, k “ 1,...,K. We can see the corresponding values in Table2 accompanied by the optimum performance delivered by the optimum scheme for K “ 10 queries. In Fig.1 we depict the evolution of the optimum performance for values of K ranging from K “ 0 to K “ 10 where K “ 0 corresponds to the classical secretary

n=100 1

p=1.0 p=0.98 0.9

p=0.95 0.8

p=0.9 0.7

0.6 p=0.8

0.5 p=0.7

p=0.6 0.4

p=0.5

0.3 012345678910 Q

FIG 1. Success probability as a function of number of queries K when n “ 100 samples and M “ 2 responses with symmetric probabilities pp1q “ 1 ´ pp2q “ 1 ´ qp1q “ qp2q “ p for p “ 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.98, 1. THE SECRETARY PROBLEM WITH RANDOM QUERIES 11 problem. Indeed, as we can see from Fig.1 all curves start from the same point which is equal to Psucc “ 0.37104 namely the maximal success probability in the classical case for n “ 100 (see [7, Table 2]). In Fig.1 we note the performance of the uniform case p “ 0.5 which is constant, not changing with the number of queries. As we discussed, this is to be expected since, under the uniform model, genie responses contain no information. It is interesting in this case to compare our optimum scheme to the optimum scheme of the classical version. In the classical case with no queries we recall that the optimum search strategy consists in stopping the first time, but no sooner than rf “ 38, that we observe zt “ 1. When we allow queries with p “ 0.5, as we can see, the querying thresholds r1 ˜ rK are all equal to 1. This means that the first K times we encounter a relative rank equal to zt “ 1 we must query. However, stopping at any of the querying times happens only if the querying time is no smaller than skpmq “ rf “ 38. If all K queries are exhausted before time 38, then we use the terminal time Tf that has a threshold equal to rf “ 38 as well. Consequently final stopping occurs if we encounter a rank equal to 1 no sooner than rf . Combining carefully all the possibilities we conclude that we stop at the first time after and including rf that we encounter a rank equal to 1. In other words, we match the classical optimum scheme. In the last row of Table2 we have the case of an infallible genie. We know that the optimum decision with an infallible genie requires the termination of the search if the genie responds with “tξt “ 1u” and continuation of the search if the response is “tξt ą 1u”. In our setup, instead, we compare the querying time Tk to the threshold skpmq. From the table we see that skp1q “ 1, skp2q “ n for all k “ 1,...,K. According to our model ζTk “ 1 corresponds with certainty (because p “ 1) to ξTk “ 1, therefore if ζTk “ 1 we see that we necessarily stop at

Tk since Tk ě skp1q “ 1. On the other hand, ζTk “ 2 corresponds with certainty to ξTk ą 1 and when ζTk “ 2 occurs we can stop only if Tk ě skp2q “ n which is impossible (unless Tk “ n where we necessarily stop since we have exhausted all samples). Therefore, when p “ 1 our optimum scheme matches the optimum scheme of an infallible genie. This can be further corroborated by comparing our thresholds rf “ 38, r10 “ 23 with the corresponding thresholds s˚, r˚ in [7, Table 3] and verifying that they are the same and they deliver the same success probability (in [7], there are tables only for K “ 0, 1). From Fig.1 we can also observe the fact we described in Remark6, namely that there is an improvement in the success probability, however the optimal value “saturates” with the limiting value being strictly less than 1. An additional conclusion we can draw from this example is that the thresholds also converge to some limiting value. This means that, after some point, increasing K results in repeating the last thresholds and, experiencing the same optimum performance. The only case which does not follow this rule and is capable of improving the success probability until it converges to 1 is when p “ 1 which, as mentioned, corresponds to an infallible genie. A final observation regarding our example is the case where the value of the parameter p satisfies p ă 0.5. Using the formulas of Table1 we can show that we obtain exactly the same results as using, instead of p, the value 1 ´ p ą 0.5. The only modification we need to make is to exchange the roles of m “ 1 and m “ 2 in the thresholds skpmq. We can see why this modification is necessary by considering the extreme case p “ 0 corresponding to Ppζt “ 1|ξt “ 1q “ 0. Then, when M “ 2, it is of course true that Ppζt “ 2|ξt “ 1q “ 1 and, therefore, we now have that the value ζt “ 2 corresponds with certainty to tξt “ 1u. This exchange of roles between m “ 1 and m “ 2 continues to apply when 0 ď p ă 0.5.

APPENDIX: PROOF OF LEMMA1

For any collection of values tξt, . . . , ξ1u under the uniform model without replacement it is well known the validity of (2) since ξ` can select its value among n ´ ` ` 1 possibilities, 12

1 each with the same probability n´``1 . Suppose now that the collection tξt, . . . , ξ1u when occurring sequentially produces the sequence of ranks tzt, . . . , z1u where 1 ď zt ď t ď n. If we fix a collection tzt, . . . , z1u of t ranks, making sure that they conform with the constraint 1 ď z` ď ` and also select t integers 1 ď i1 ă i2 ă ¨ ¨ ¨ ă it ď n as possible sample values, then there is a unique way to assign these values to tξt, . . . , ξ1u in order to produce the specified ranks. Indeed, we start with ξt to which we assign the zt-th value from the set ti1, . . . , itu, that is, the value izt . We remove this element from the set of values and then we proceed to ξt´1 to which we assign the zt´1-th element from the new list of values, etc. This procedure generates the specified ranks from any subset of t1, . . . , nu of size t. As we just mentioned, for fixed ranks tzt, . . . , z1u any subset of t integers from the set t1, . . . , nu can be uniquely rearranged and assigned to tξt, . . . , ξ1u in order to generate the n specified rank sequence. There are t such possible combinations with each combination pn´tq! having a probability of occurrence equal` ˘ to n! . Multiplying the two quantities yields the Ppzt,...,z1q 1 second equality in (2), from which we can then deduce that Ppzt|Zt´1q “ “ Ppzt´1,...,z1q t which proves the third equality in (2).

Consider now (4). If ξt1 “ 1 this will force zt1 “ 1 but also all ranks for times larger than t1 are going to be necessarily larger than 1. This is expressed through the indicator t t I 1 “ I p I q. With ξt “ 1 and zt “ 1 let us fix the remaining tzt1 “1u t1`1 tzt1 “1u `“t1`1 tz`ą1u 1 1 ranks in Zt assuring they are consistent with the constraint imposed for times larger than t1 ś and also recalling that z` must take values in the set t1, . . . , `u. We can now see that we are allowed to select the values of t´1 samples from a pool of n´1 integers (since the value 1 is ξ n´1 already assigned to t1 ). This generates t´1 combination and each combination, including also the fact that ξ 1, has probability of occurrence equal to pn´tq! . If we multiply the two t1 “ ` ˘ n! quantities then we obtain (4). This concludes the proof of the lemma.

APPENDIX: PROOF OF LEMMA2 We begin with (5) and we note k k Ppξt “ 1, Z q Ppξt “ 1, Z q (22) P ξ 1 k t t I , p t “ |Zt q “ k “ k tzt“1u PpZt q PpZt q where the last equality is due to the fact that ξt “ 1 forces zt to become 1 as well, therefore I the numerator is 0 if zt ‰ 1. This fact is captured with the indicator tzt“1u. We can now write

k k k k Ppξt “ 1, Zt q “ Ppξt “ 1, Zt , ξtk “ 1q`Ppξt “ 1, Zt , ξtk ą 1q “ Ppξt “ 1, Zt , ξtk ą 1q k´1 k´1 “ Ppξt “ 1, ζtk , Zt , ξtk ą 1q “ Ppξt “ 1, ζtk , Zt |ξtk ą 1qPpξtk ą 1q k´1 k´1 “ Ppζtk |ξtk ą 1qPpξt “ 1, Zt |ξtk ą 1qPpξtk ą 1q “ qpζtk qPpξt “ 1, Zt , ξtk ą 1q k´1 k´1 k´1 “ qpζtk q Ppξt “ 1, Zt q ´ Ppξt “ 1, Zt , ξtk “ 1q “ qpζtk qPpξt “ 1, Zt q, where qpζtk q, following the model in (1), denotes the probability( of the genie responding with the value ζtk given that ξtk ą 1. We also observe that the second and last equality are true due to the fact that it is impossible for two samples at different time instances to have the same value, while the forth equality is true because, according to our model when we condition on tξtk ą 1u then ζtk is independent from all ranks, other responses and other sample values. I In order to modify the denominator in (22), due to the indicator tzt“1u in the numerator it is sufficient to analyze the denominator by fixing zt “ 1. Specifically

k k k k PpZt , zt “ 1q “ PpZt , zt “ 1, ξtk “ 1q ` PpZt , zt “ 1, ξtk ą 1q “ PpZt , zt “ 1, ξtk ą 1q THE SECRETARY PROBLEM WITH RANDOM QUERIES 13

k´1 k´1 “ Ppζtk , Zt , zt “ 1, ξtk ą 1q “ Ppζtk |ξtk ą 1qPpZt , zt “ 1, ξtk ą 1q k´1 k´1 k´1 “ qpζtk q PpZt , zt “ 1q ´ PpZt , zt “ 1, ξtk “ 1q “ qpζtk qPpzt “ 1, Zt q, where in the third and the last equality we used the fact that for(t ą tk we cannot have zt “ 1 when ξtk “ 1 because this requires ξt ă ξtk which is impossible since ξtk “ 1. Dividing the numerator with the denominator proves that

k´1 k´1 Ppξt “ 1, Z q Ppξt “ 1, Z q Ppξ “ 1| kq “ t I “ t I t Zt k´1 tzt“1u k´1 tzt“1u Ppzt “ 1, Zt q PpZt q k´1 I “ Ppξt “ 1|Zt q tzt“1u. In other words, we remove the most recent genie response. Applying this equality repeatedly we conclude that k 0 I I (23) Ppξt “ 1|Zt q “ Ppξt “ 1|Zt q tzt“1u “ Ppξt “ 1|Ztq tzt“1u namely, the conditional probability is independent from all past genie responses. Combining the second equality in (2) with (3) we can now show 1 I Ppξt “ 1, Ztq pt´1q!n tzt“1u t P ξ 1 I , p t “ |Ztq “ “ 1 “ tzt“1u PpZtq t! n which if substituted in (23) proves (5) and suggests that the desired conditional probability k k Ppξt “ 1|Zt q depends only on t and zt and not on Zt´1, namely previous ranks and previous genie responses. To show (6) we observe that P ξ 1, k k p tk “ Ztk q Ppξt “ 1|Z q “ . k tk P k pZtk q For the numerator using similar steps as before, we can write

P ξ 1, k P ξ 1, ζ , k´1 q ζ P ξ 1, k´1 p tk “ Ztk q “ p tk “ tk Ztk q “ p tk q p tk “ Ztk q t k´1 k´1 k I k´1 “ qpζtk qPpξtk “ 1|Zt qPpZt q “ qpζtk q tzt “1uPpZt q k k n k k where for the second equality we first conditioned on tξtk “ 1u and used the fact that ζtk is independent from any other information, while for the last equality we applied (5). Similarly, for the denominator we have

(24) P k P ζ , k´1 P ξ 1, ζ , k´1 P ξ 1, ζ , k´1 pZtk q “ p tk Ztk q “ p tk “ tk Ztk q ` p tk ą tk Ztk q p ζ P ξ 1, k´1 q ζ P ξ 1, k´1 “ p tk q p tk “ Ztk q ` p tk q p tk ą Ztk q p ζ P ξ 1 k´1 q ζ P ξ 1 k´1 P k´1 “ p tk q p tk “ |Ztk q ` p tk q p tk ą |Ztk q pZtk q t t k I k(I k´1 “ ppζtk q tzt “1u ` qpζtk q 1 ´ tzt “1u PpZt q n k n k k where for the last equality we applied! the same idea we used in´ the last equality¯) of the numer- ator. Dividing the numerator with the denominator and using the fact that in the numerator we have the indicator I , it is easy to verify that we obtain the expression appearing tztk “1u in (6). 14

To prove (7) we have

Ppζ , k´1, z “ 1q P k, z 1 k´1 tk Ztk tk pZtk tk “ q Ppζt |Z , zt “ 1q “ “ k tk k P k´1, z 1 P k´1, z 1 pZtk tk “ q pZtk tk “ q t t “ ppζ q k ` qpζ q 1 ´ k , tk n tk n ´ ¯ where for the last equality we used (24) and applied it for ztk “ 1. Let us now demonstrate (8) which is the relationship that distinguishes our random model from the classical infallible genie case. We observe that

k Ppzt “ 1, Zt´1, ztk “ 1, ζtk , . . . , ζt1 q Ppzt “ 1|Zt´1, ztk “ 1q “ PpZt´1, ztk “ 1, ζtk , . . . , ζt1 q As before we can write for the numerator

Ppzt “ 1, Zt´1, ztk “ 1, ζtk , . . . , ζt1 q “ Ppzt “ 1, Zt´1, ztk “ 1, ζtk , . . . , ζt1 , ξt1 ą 1q

“ Ppzt “ 1, Zt´1, ztk “ 1, ζtk , . . . , ζt2 , ξt1 ą 1qqpζt1 q

“ Ppzt “ 1, Zt´1, ztk “ 1, ζtk , . . . , ζt2 qqpζt1 q.

For the denominator, due to the constraint ztk “ 1, we can follow similar steps as in the numerator and show

PpZt´1, ztk “ 1, ζtk , . . . , ζt1 q “ PpZt´1, ztk “ 1, ζtk , . . . , ζt2 qqpζt1 q. Dividing the numerator with the denominator yields

Ppzt “ 1, Zt 1, zt “ 1, ζt , . . . , ζt q Ppzt “ 1, Zt 1, zt “ 1, ζt , . . . , ζt q ´ k k 1 “ ´ k k 2 , PpZt´1, ztk “ 1, ζtk , . . . , ζt1 q PpZt´1, ztk “ 1, ζtk , . . . , ζt2 q which suggests that the first ratio does not depend on ζt1 . Following similar steps we can remove all previous genie responses one-by-one and prove that

k Ppzt “ 1, Zt´1, ztk “ 1, ζtk q Ppzt “ 1|Zt´1, ztk “ 1q “ , PpZt´1, ztk “ 1, ζtk q namely, the conditional probability depends only on the most recent genie response. It is possible now for the numerator and the denominator to obtain more suitable expressions. We start with the numerator and apply similar steps as above. Specifically

Ppzt “ 1, Zt´1, ztk “ 1, ζtk q “ Ppzt “ 1, Zt´1, ztk “ 1, ζtk , ξtk ą 1q 1 “ Ppzt “ 1, Zt 1, zt “ 1qqpζt q “ qpζt q, ´ k k t! k where for the last expression we used the second equality in (2) after observing that tzt “

1, Zt´1, ztk “ 1u is simply Zt with two of its elements fixed to 1. For the denominator we can similarly write

PpZt´1, ztk “ 1, ζtk q “ PpZt´1, ztk “ 1, ζtk , ξtk “ 1q ` PpZt´1, ztk “ 1, ζtk , ξtk ą 1q

“ PpZt´1, ztk “ 1, ξtk “ 1qppζtk q ` PpZt´1, ztk “ 1, ξtk ą 1qqpζtk q

“ PpZt´1, ztk “ 1, ξtk “ 1qppζtk q ` PpZt´1, ztk “ 1q ´ PpZt´1, ztk “ 1, ξtk “ 1q qpζtk q 1 1 1 “ 1t´1 ` ´ 1t´(1 , pt ´ 2q!n tk`1 pt ´ 1q! pt ´ 2q!n tk`1 ! ) THE SECRETARY PROBLEM WITH RANDOM QUERIES 15 where to obtain the last expression we applied the second equality of (2) combined with (4) t´1 after observing that by fixing zt “ 1 the product I 1 produced by (4) becomes k tztk “1u tk`1 1t´1 . Dividing the numerator with the denominator we can verify that the resulting ratio tk`1 matches the right hand side of (8) for the two possible values of 1t´1 , namely 0 or 1. This tk`1 completes the proof of Lemma2.

APPENDIX: PROOF OF THEOREM1 K`1 t We first note that Psucc “ PK where PK satisfies (15) for k “ K and Ut “ n . Consider now the general form of Pk defined in (15). We focus on the last term which we intend to optimize with respect to Tk`1. We note that

k`1 I I (25) E U z 1 Tk Tk`1 t Tk`1 “ u B T1 “ n n ‰ k`1 “ ¨ ¨ ¨ E U I I I ¨ ¨ ¨ I I tk Tk 1 tzT “1u tTk`1ątku tTk“tku tT1“t1u ` k`1 Bt1 tką¨¨¨ąt1ě1 ÿ ÿ “ ‰ n n k`1 k “ ¨ ¨ ¨ E E U I | I I ¨ ¨ ¨ I I tk , Tk 1 tzT “1u Ztk tTk`1ątku tTk“tku tT1“t1u ` k`1 Bt1 tką¨¨¨ąt1ě1 ÿ ÿ ” “ ‰ ı I I where in the last equality, as we point out in Remark3, the indicators tTk`1ątku, tTk“tku,..., k I I tk tT1“t1u, are measurable with respect to Ztk and, consequently, can be placed outside Bt1 the inner expectation. We could isolate the inner expectation and optimize it by solving the optimal stopping problem k`1 k (26) max E U I z 1 |Z , Tk`1 t Tk`1 “ u tk Tk`1ątk “ ‰ with respect to Tk`1. Unfortunately the resulting optimal reward turns out to be a complicated k I tk expression of the information Ztk . After careful examination, and recalling from (9) that Bt1 contains the indicator I , it is sufficient to consider the case where zt is fixed to the tztk “1u k value 1. This constraint simplifies considerably our analysis and it is the main reason we have developed equalities (7), (8) in Lemma2. We also recall that ztk “ 1, according to Remark1, is a prerequisite for querying at tk. After this observation, we replace (25) with the alternative relationship

k`1 I I (27) E U z 1 Tk Tk`1 t Tk`1 “ u B T1 “n n ‰ k`1 k “ ¨ ¨ ¨ E E U I | , z “ 1 I I ¨ ¨ ¨ I I tk . Tk 1 tzT “1u Ztk tk tTk`1ątku tTk“tku tT1“t1u ` k`1 Bt1 tką¨¨¨ąt1ě1 ÿ ÿ ” “ ‰ ı Again, we emphasize that we are allowed to make this specific conditioning because the value

I I t ztk “ 1 is imposed by the indicator tzt “1u contained in k . Let us now isolate the inner k Bt1 expectation in (27) and consider the following optimal stopping problem in place of (26) k`1 I k (28) max E U z 1 |Z , ztk “ 1 . Tk`1 t Tk`1 “ u tk Tk`1ątk “ ‰ k Following [15, 18] we need to define, for t ą tk, the sequence of optimal rewards tRt u where k k`1 I k (29) R “ max ErU z 1 |Z , ztk “ 1s. t Tk`1 t Tk`1 “ u t Tk`1ět 16

k From optimal stopping theory we have that tRt u satisfies the backward recursion k k`1I k k (30) Rt “ max Ut tzt“1u, ErRt`1|Zt , ztk “ 1s , k which must be applied for t “ n, n ´ 1, . . . , tk ` 1 and initialized with( Rn`1 “ 0. We recall that tk is excluded from the possible values of Tk`1 since we require Tk`1 ą tk. In order to find an explicit formula for the reward, we use the definition of the sequence k k tAt u from (13) and we introduce a second sequence tBt pmqu satisfying the following back- ward recursion

k (31) Bt´1pmq “ 1 1 ppmq ´ qpmq pt ´ 1q Bkpmq 1 ´ ` Ak `Bkpmq ´ maxtUk`1, Aku t t t t t t t p m t 1 q m n t 1 p qp` ´ q ` p ˘qp ´ ` q ´ ¯ ` ˘ k t “ n, . . . , tk, m “ 1,...,M, which is initialized with Bnpmq “ 0. Actually, we are inter- k k k ested in the expected reward Vt “ ErRt`1|Zt , ztk “ 1s for which we intend to show, using (backward) induction, that Vk Ak Bk ζ 1t . (32) t “ t ` t p tk q tk`1 Indeed, we have that (32) is true for t “ n since both the right and left sides are 0. Assume our claim is true for t, then we will show that it is also valid for t ´ 1. Using (30) we can write

k k`1 k I kI Rt “ maxtUt , Vt u tzt“1u ` Vt tztą1u max Uk`1, Ak Bk ζ 1t I Ak Bk ζ 1t I “ t t t ` t p tk q tk`1u tzt“1u ` t ` t p tk q tk`1 tztą1u max Uk`1, Ak I ` Ak Bk ζ 1˘ t´1 I . “ t t t u tzt“1u ` t ` t p tk q tk`1 tztą1u k Taking expectation on both sides conditioned on tZt´1, ztk `“ 1u, using (8) and rearranging˘ terms, it is not complicated to verify that Vk is also equal to Ak Bk ζ 1t´1 , t´1 t´1 ` t´1p tk q tk`1 k k provided that tAt u and tBt pmqu are defined through (13) and (31), respectively. Let us now return to the optimization problem in (28). According to our analysis, the optimal reward satisfies (33) k`1 I k k k k 1tk k k max ErU z 1 |Z , ztk “ 1s “ V “ A ` B pζtk q “ A ` B pζtk q, Tk`1 t Tk`1 “ u tk tk tk tk tk`1 tk tk Tk`1ątk 1b since, according to our definition, a “ 1 when a ą b. k k The next step consists in finding a more convenient expression for the sum At ` Bt pmq. Again, using backward induction we prove that qpmq ´ ppmq t (34) Bkpmq “ Ak . t t p m t q m n t p` q ` p qp ˘´ q Clearly, for t “ n this expression is true since both sides are 0. We assume it is true for t and we will show that it is valid for t ´ 1. Indeed, if we substitute (34) in the definition in (31) then, after some straightforward manipulations, we end up with the equality 1 1 qpmq ´ ppmq pt ´ 1q Bk pmq “ Ak 1 ´ ` maxtUk`1, Aku , t´1 t t t t t ppmqpt ´ 1q ` qpmqpn ´ t ` 1q ! ´ ¯ ) ` ˘ which, with the help of (13), proves the induction. Substituting (34) in (33) provides a more concise expression for the optimal reward qpζ qn k`1 I k k tk (35) max ErU z 1 |Z , ztk “ 1s “ A , Tk`1 t Tk`1 “ u tk tk Tk`1ątk ppζtk qtk ` qpζtk qpn ´ tkq THE SECRETARY PROBLEM WITH RANDOM QUERIES 17

which depends only on the most recent genie response ζtk . The optimal performance, according to optimal stopping theory, can be achieved by the following stopping time T min t t : Uk`1I Ak Bk ζ 1t . k`1 “ t ą k t tzt“1u ě t ` t p tk q tk`1u

The previous stopping rule gives the impression that the optimum Tk`1 depends on the genie response value ζtk . However, we observe that the only way we can stop is if zt “ 1 which forces the indicator 1t to become 0. Consequently, the optimum version of T is equiv- tk`1 k`1 alent to k`1I k Tk`1 “ mintt ą tk : Ut tzt“1u ě At u, which is independent from ζtk and proves (17). Using (35) we obtain the following (attainable) upper bound for (25)

k`1 k qpζTk qn E U I I T E A I T (36) T tzT “1u k ď Tk k k`1 k`1 BT BT 1 ppζTk qTk ` qpζTk qpn ´ Tkq 1 ” ı “ ‰ qpζ qn k Tk I I I “ E AT tz “1u tD “0u Tk´1 . k Tk Tk B ppζTk qTk ` qpζTk qpn ´ Tkq T1 ” ı We conclude that the solution of the optimization problem introduced in (28) resulted in the identification of the optimum querying times T1,..., TK and the optimum final time Tf (since Tf “ TK`1). Let us now see how we can optimize the remaining elements of our search strategy, namely, the decision functions DT1 ,..., DTK . Consider the last component of the sum in (15) which can be written as follows

(37) Ppξ “ 1, D “ 1, Tk´1 q Tk Tk BT1 n n I I I I I “ ¨ ¨ ¨ Er ¨ ¨ ¨ tk´1 s tξtk “1u tTk“tku tT1“t1u tDtk “1u Bt1 tką¨¨¨ąt1ě1 ÿ ÿ n n k I I I I “ ¨ ¨ ¨ E Ppξt “ 1|Z q ¨ ¨ ¨ tk´1 k tk tTk“tku tT1“t1u tDtk “1u Bt1 tką¨¨¨ąt1ě1 ÿ ÿ “ ‰ n n ppζtk qtk I I I I I t “ ¨ ¨ ¨ E tzt “1u tTk“tku ¨ ¨ ¨ tT1“t1u tDt “1u k´1 k k Bt ppζtk qtk ` qpζtk qpn ´ tkq 1 tką¨¨¨ąt1ě1 ÿ ÿ ” ı ppζ qT Tk k I I I “ E tz “1u tD “1u Tk´1 . Tk Tk B ppζTk qTk ` qpζTk qpn ´ Tkq T1 ” ı k The second equality is true because we condition on Zt and since all indicator functions are measurable with respect to this sigma algebra they can be placed outside the inner expectation which gives rise to the conditional probability. For the third equality we simply apply (6). If we now add the two parts analyzed in (36) and (37), we can optimize the sum with respect to

DTk . In particular

Tk´1 k`1 I I T (38) PpξTk “ 1, DTk “ 1, B q ` E U z 1 k T1 Tk`1 t Tk`1 “ u B T1 ppζ qT“ ‰ Tk k I I I ď E tz “1u tD “1u Tk´1 Tk Tk B ppζTk qTk ` qpζTk qpn ´ Tkq T1 ” ı qpζ qn k Tk I I I ` E AT tz “1u tD “0u Tk´1 k Tk Tk B ppζTk qTk ` qpζTk qpn ´ Tkq T1 ” ı 18

Tk k maxtppζT q , qpζT qA u k n k Tk I I ď E tz “1u Tk´1 . Tk Tk Tk B ppζT q ` qpζT qp1 ´ q T1 ” k n k n ı We can see that we attain the last upper bound if we select DTk “ 1 (i.e. stop) when ppζ q Tk ě qpζ qAk and D “ 0 (i.e. continue to the next query or final time if k “ K) Tk n Tk Tk Tk when the inequality is reversed. This clearly establishes (18) and identifies the optimum ver- sion of the decision functions.

As we can see the upper bound in (38) is written in terms of the genie response ζTk . In order to obtain an expression which has the same form as the one in (15) we need to average out this . We note

Tk k maxtppζT q , qpζT qA u k n k Tk I I (39) E tz “1u Tk´1 Tk Tk Tk B ppζT q ` qpζT qp1 ´ q T1 ” k n k n ı n n tk k maxtppζt q , qpζt qA u k n k tk I I I I “ ¨ ¨ ¨ E ¨ ¨ ¨ tk´1 tk tk tztk “1u tTk“tku tT1“t1u Bt1 ppζtk q ` qpζtk qp1 ´ q tką¨¨¨ąt1ě1 n n ÿ ÿ ” ı n n M maxtppmq tk , qpmqAk u n tk I I I I I “ ¨ ¨ ¨ E ζ m z 1 T t ¨ ¨ ¨ T t tk´1 tk tk t tk “ u t tk “ u t k“ ku t 1“ 1u B ppmq ` qpmqp1 ´ q t1 tką¨¨¨ąt1ě1 m“1 n n ÿ ÿ ÿ ” ı n n M maxtppmq tk , qpmqAk u n tk I I I I I “ ¨ ¨ ¨ Er ζ m z 1 T t ¨ ¨ ¨ T t tk´1 s, tk tk t tk “ u t tk “ u t k“ ku t 1“ 1u B ppmq ` qpmqp1 ´ q t1 tką¨¨¨ąt1ě1 m“1 n n ÿ ÿ ÿ with the last equality being true because the ratio is deterministic. Consider the last expecta- tion separately. Again, we observe the existence of the indicator I which allows us to tztk “1u write I I I I I Er ¨ ¨ ¨ tk´1 s tζtk “mu tztk “1u tTk“tku tT1“t1u Bt1 k´1 I I I I “ ErPpζt “ m|Z , zt “ 1q ¨ ¨ ¨ tk´1 s k tk k tztk “1u tTk“tku tT1“t1u Bt1 tk tk I I I I “ E ppmq ` qpmq 1 ´ ¨ ¨ ¨ tk´1 tztk “1u tTk“tku tT1“t1u n n Bt1 ”! ´ ¯) ı tk tk I I I I “ ppmq ` qpmq 1 ´ Er ¨ ¨ ¨ tk´1 s. tztk “1u tTk“tku tT1“t1u n n Bt1 We note that according! to Remark3 all´ indicators¯) are k´1, z 1 -measurable, which tZtk tk “ u allowed us in the first equation to position them outside the inner expectation which resulted in the conditional probability. For the second equation we used (7). Substituting in (39) we obtain

Tk k maxtppζT q , qpζT qA u k n k Tk I I (40) E tz “1u Tk´1 Tk Tk Tk B ppζT q ` qpζT qp1 ´ q T1 ” k n k n ı n n M tk k I I I I “ ¨ ¨ ¨ max ppmq , qpmqA Er ¨ ¨ ¨ tk´1 s tk tztk “1u tTk“tku tT1“t1u n Bt1 tką¨¨¨ąt1ě1 m“1 ÿ ÿ ´ ÿ ! )¯ n n k I I I I “ ¨ ¨ ¨ U Er ¨ ¨ ¨ tk´1 s tk tztk “1u tTk“tku tT1“t1u Bt1 tką¨¨¨ąt1ě1 ÿ ÿ n n k k E U I I I I tk 1 E U I I T , “ ¨ ¨ ¨ r tk tzt “1u tTk“tku ¨ ¨ ¨ tT1“t1u ´ s “ Tk tzT “1u k´1 s k Bt k B 1 T1 tką¨¨¨ąt1ě1 ÿ ÿ “ THE SECRETARY PROBLEM WITH RANDOM QUERIES 19

k where Ut is deterministic and defined in (14). As we have seen, the sum of the two terms in (38) is optimized in (40). A direct conse- quence of this optimization is the following inequality

k T`´1 k`1 I I T Pk “ PpξT` “ 1, DT` “ 1, BT q ` E UT tzT “1u k 1 k`1 k`1 BT `“1 1 ÿ “ ‰ k´1 T`´1 Tk´1 k`1 I I T “ PpξT` “ 1, DT` “ 1, BT q`PpξTk “ 1, DTk “ 1, BT q`E UT tzT “1u k 1 1 k`1 k`1 BT `“1 1 ÿ “ ‰ k´1 T`´1 k I I ď Ppξ “ 1, D “ 1, q ` E U Tk 1 “ P , T` T` BT1 Tk tzT “1u ´ k´1 k BT `“1 1 ÿ “ ‰ namely Pk ď Pk´1. Repeated application of this fact for k “ K,..., 1, proves (16) except for the last inequality. In other words, we have P P P P E U1 . succ “ K ď K´1 ď ¨ ¨ ¨ ď 0 “ r T1 s

The last expectation can be further optimized with respect to T1 using optimal stopping and, 0 if we call At the corresponding optimum average reward at t we can show that it follows the backward recursion in (13) with k “ 0. The steps are straightforward presenting no particular difficulty and are therefore omitted. This establishes the last inequality in (16) and concludes the proof of our main theorem.

APPENDIX: PROOF OF LEMMA3 k k´1 Let us first prove At ď At . We will show this fact using backward induction. To show its validity for k “ K we note from (14) that M t M t t UK “ max ppmq , qpmqAK ě ppmq “ “ UK`1. t n t n n t m“1 m“1 ÿ ( ÿ Applying now (13) for k “ K and k “ K ´ 1, using the previous inequality and the fact K K´1 K K´1 that An “ An “ 0, we can easily show using backward induction in t that At ď At . k k´1 k´1 k´2 Suppose now it is true for k, that is, At ď At , then we will show that At ď At . k k´1 k k´1 k´1 k´2 From At ď At and (14) we conclude that Ut ď Ut . Expressing At and At with k k´1 k´1 k´2 the help of (13), using that Ut ď Ut and the fact that An “ An “ 0, we can again k´1 k´2 prove using backward induction in t that At ď At therefore completing the induction. k The monotonicity in k of Ut is a direct consequence of (14) and of the same monotonicity of k At . k To establish the decrease of tAt u in t we use (13) and observe that 1 Ak ´ Ak “ max Uk`1, Ak ´ Ak ě 0 t´1 t t t t t ´ ! ) ¯ k which proves the desired inequality. Demonstrating the increase of tUt u in t requires more k work. From the definition in (14) and using (13) to replace At´1 we have that

M t ´ 1 Uk “ max ppmq , qpmqAk t´1 n t´1 m“1 ÿ ! ) M t 1 1 1 “ max ppmq 1 ´ , qpmq Ak 1 ´ ` maxtUk`1, Aku n t t t t t t m“1 ÿ ! ´ ¯ ” ´ ¯ ı) 20

M t 1 1 ď max ppmq , qpmqAk 1 ´ ` qpmq maxtUk`1, Aku n t t t t t m“1 ÿ ˆ ! )´ ¯ ˙ 1 1 1 “ Uk 1 ´ ` max Uk`1, Ak “ Uk ` max Uk`1, Ak ´ Uk , t t t t t t t t t t ´ ¯ ´ ¯ with the inequality being true because maxtad,( bd ` cu ď maxta, bud ` c when( c, d ě 0. k k k k`1 k To establish that Ut´1 ď Ut it suffices to prove that Ut ě maxtUt , At u, namely that k k`1 k k Ut ě Ut (which we already know to be valid) and Ut ě At . To show the latter, from its k M k k definition in (14) we can see that Ut ě m“1 qpmqAt “ At and this establishes the desired increase in t. ř k k`1 k k`1 To complete our proof we still need to show that An ď Un and A0 ě U0 . We recall k k`1 that An “ 0. On the other hand from (14) we can see that Un “ 1, therefore the first k`1 k`1 inequality is true. For the second, again from (14), we observe that U0 “ A0 and since k we previously established the decrease in k of At for fixed t, this proves the second inequality and concludes our proof.

Acknowledgements. This work was supported by the US National Science Foundation under Grant CIF 1513373, through Rutgers University, also in part by the US National Sci- ence Foundation grants CCF 15-26875, The Center for Science of Information at Purdue Uni- versity, under contract number 239 SBC PURDUE 4101-38050 and by the DARPA Molecu- lar Informatics Program.

REFERENCES

[1]C HIEN,I.,PAN, C. and MILENKOVIC, O. (2018). Query K-means clustering and the double dixie cup prob- lem. Proceedings of 32nd Conference on Neural Information Processing Systems, Montréal, Canada, 6649–6658. [2]C REWS,M.,JONES,B.,MYERS,K.,TAALMAN,L.,URBANSKI, M. and WILSON, B. (2019). Opportu- nity costs in the game of best choice. Elect. J. Comb., 26(1). [3]D YNKIN, E. B. (1963). The optimal choice of the stopping moment for a markov process. Dokl. Akad. Nauk. SSSR, 150 238–240. [4]F ERGUSON, T. S. (1989). Who solved the secretary problem? Stat. Science, 4(3) 282–289. [5]F REEMAN, P. R. (1983). The secretary problem and its extensions - a review. Internat. Statist. Rev., 51 189–206. [6]G ARDNER, M. (1960). Mathematical games. Scient. Amer., 202(3) 178–179. [7]G ILBERT, J. and MOSTELLER, F. (1966). Recognizing the maximum of a sequence. J. Am. Statist. Assoc., 61(3). [8]G USEIN-ZADE, S. (1966). The problem of choice and the optimal stopping rule for a sequence of indepen- dent trials. Th. Prob. Appl., 11(3) 472–476. [9]J ONES, B. (2020). Weighted games of best choice. SIAM J. Discr. Math., 34(1) 399–414. [10]L INDLEY, D. (1961). Dynamic programming and . Appl. Statist., 10 39–52. [11]L IU, X. and MILENKOVIC, O. (2020). Finding the second-best candidate under the mallows model. arXiv: 2011.07666, 2020. [12]L IU,X.,MILENKOVIC, O. and MOUSTAKIDES, G. V. (2021). Query-based selection of optimal candidates under the Mallows model. arXiv: 2101.07250. [13]M AZUMDAR A. and SAHA B. (2017). Clustering with noisy queries. Proceedings of 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA., 5789–5800. [14]N IKOLAEV, M. (1977). On a generalization of the best choice problem. Th. Prob. Appl., 22 187–190. [15]P ESKIR, P. and SHIRYAEV, A. (2006). Optimal Stopping and Free-Boundary Problems. Springer. [16]R OSE, J. S. (1982). A problem of optimal choice and assignment. Oper. Res,, 30 172–181. [17]S AKAGUCHI, M. (1978). Dowry problems and ola policies. Rep. Stat. Appl. Res., Juse, 25 124–128. [18]S HIRYAEV, A. (1978). Optimal Stopping Rules. Springer. [19]T AMAKI, M. (1979). A secretary problem with double choices. J. Oper. Res. Soc. Japan, 22 257–264. [20]T AMAKI, M. (1979). Recognizing both the maximum and the second maximum of a sequence. J. Appl. Prob., 16 803–812.