Probability in the Engineering and Informational Sciences (2021), 1–9. doi:10.1017/S0269964821000395

RESEARCH ARTICLE

A regret lower bound for assortment optimization under the capacitated MNL model with arbitrary revenue parameters

Yannik Peeters¹ and Arnoud V. den Boer¹,²

¹ Amsterdam Business School, University of Amsterdam, Muidergracht 12, 1018 TV Amsterdam, The Netherlands. E-mail: [email protected]
² Korteweg-de Vries Institute for Mathematics, University of Amsterdam, Park 105-107, 1098 XG Amsterdam, The Netherlands.

Keywords: Assortment optimization, Incomplete information, Multinomial logit model, Regret lower bound

Abstract

In this note, we consider dynamic assortment optimization with incomplete information under the capacitated multinomial logit choice model. Recently, it has been shown that the regret (the cumulative expected revenue loss caused by offering suboptimal assortments) that any decision policy endures is bounded from below by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T$ denotes the time horizon. This result is shown under the assumption that the product revenues are constant, and thus leaves the question open whether a lower regret rate can be achieved for nonconstant revenue parameters. In this note, we show that this is not the case: we show that, for any vector of product revenues, there is a positive constant such that the regret of any policy is bounded from below by this constant times $\sqrt{NT}$. Our result implies that policies that achieve $O(\sqrt{NT})$ regret are asymptotically optimal for all product revenue parameters.

1. Introduction

In this note, we consider the problem of assortment optimization under the multinomial logit (MNL) choice model with a capacity constraint on the size of the assortment and incomplete information about the model parameters. This problem has recently received considerable attention in the literature (see, e.g., [1–7,9,10]). Two notable recent contributions are from Agrawal et al. [1,2], who construct decision policies based on Thompson Sampling and Upper Confidence Bounds, respectively, and show that the regret of these policies—the cumulative expected revenue loss compared with the benchmark of always offering an optimal assortment—is, up to logarithmic terms, bounded by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T \geq N$ denotes the length of the time horizon. These upper bounds are complemented by the recent work from Chen and Wang [3], who show that the regret of any policy is bounded from below by a positive constant times $\sqrt{NT}$, implying that the policies by Agrawal et al. [1,2] are (up to logarithmic terms) asymptotically optimal. The lower bound by Chen and Wang [3] is proven under the assumption that the product revenues are constant—that is, each product generates the same amount of revenue when sold. In practice, it often happens that different products have different marginal revenues, and it is a priori not completely clear whether the policies by Agrawal et al. [1,2] are still asymptotically optimal or whether a lower regret can be achieved. In addition, Chen and Wang [3] assume that $K$, the maximum number of products allowed in an assortment, is bounded by $\frac{1}{4} \cdot N$, but point out that this constant $\frac{1}{4}$ can probably be increased. In this note, we settle this open question by proving a $\sqrt{NT}$ regret lower bound for any given vector of product revenues. This implies that policies with $O(\sqrt{NT})$ regret are asymptotically optimal regardless

© The Author(s), 2021. Published by Cambridge University Press. This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.

of the product revenue parameters. Furthermore, our result is valid for all $K < \frac{1}{2} N$, thereby confirming the intuition of Chen and Wang [3] that the constraint $K \leq \frac{1}{4} N$ is not tight.

2. Model and main result

We consider the problem of dynamic assortment optimization under the MNL choice model. In this model, the number of products is $N \in \mathbb{N}$. Henceforth, we abbreviate the set of products $\{1, \ldots, N\}$ as $[N]$. Each product $i \in [N]$ yields a known marginal revenue to the seller of $w_i > 0$. Without loss of generality due to scaling, we can assume that $w_i \leq 1$ for all $i \in [N]$. Each product $i \in [N]$ is associated with a preference parameter $v_i \geq 0$, unknown to the seller. Each offered assortment $S \subseteq [N]$ must satisfy a capacity constraint, that is, $|S| \leq K$ for capacity constraint $K \in \mathbb{N}$, $K \leq N$. For notational convenience, we write

$$\mathcal{A}_K := \{S \subseteq [N] : |S| \leq K\}$$

for the collection of all assortments of size at most 𝐾, and

$$\mathcal{S}_K := \{S \subseteq [N] : |S| = K\}$$

for the collection of all assortments of exact size $K$. Let $T \in \mathbb{N}$ denote a finite time horizon. Then, at each time $t \in [T]$, the seller selects an assortment $S_t \in \mathcal{A}_K$ based on the purchase information available up to and including time $t - 1$. Thereafter, the seller observes a single purchase $Y_t \in S_t \cup \{0\}$, where product $0$ indicates a no-purchase. The purchase probabilities within the MNL model are given by

$$\mathbb{P}(Y_t = i \mid S_t = S) = \frac{v_i}{1 + \sum_{j \in S} v_j},$$

for all $t \in [T]$ and $i \in S \cup \{0\}$, where we write $v_0 := 1$. The assortment decisions of the seller are described by his/her policy: a collection of probability distributions $\pi = (\pi(\cdot \mid h) : h \in H)$ on $\mathcal{A}_K$, where
$$H := \bigcup_{t \in [T]} \{(S, Y) : Y \in S \cup \{0\},\ S \in \mathcal{A}_K\}^{t-1}$$

is the set of possible histories. Then, conditionally on $h = (S_1, Y_1, \ldots, S_{t-1}, Y_{t-1})$, assortment $S_t$ has distribution $\pi(\cdot \mid h)$, for all $h \in H$ and all $t \in [T]$. Let $\mathbb{E}_v^\pi$ be the expectation operator under policy $\pi$ and preference vector $v \in \mathcal{V} := [0, \infty)^N$. The objective for the seller is to find a policy $\pi$ that maximizes the total accumulated revenue or, equivalently, minimizes the accumulated regret:

$$\Delta_\pi(T, v) := T \cdot \max_{S \in \mathcal{A}_K} r(S, v) - \sum_{t=1}^{T} \mathbb{E}_v^\pi[r(S_t, v)],$$

where $r(S, v)$ is the expected revenue of assortment $S \subseteq [N]$ under preference vector $v \in \mathcal{V}$:
$$r(S, v) := \frac{\sum_{i \in S} v_i w_i}{1 + \sum_{i \in S} v_i}.$$
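For concreteness, the following minimal Python sketch computes these quantities numerically. It is purely illustrative: the instance size, parameter values, and function names are ours and do not come from the paper.

```python
import numpy as np

def choice_probabilities(S, v):
    """MNL purchase probabilities over the offered set S and the no-purchase option 0 (v_0 = 1)."""
    denom = 1.0 + sum(v[i] for i in S)
    probs = {0: 1.0 / denom}
    probs.update({i: v[i] / denom for i in S})
    return probs

def expected_revenue(S, v, w):
    """r(S, v) = sum_{i in S} v_i w_i / (1 + sum_{i in S} v_i)."""
    denom = 1.0 + sum(v[i] for i in S)
    return sum(v[i] * w[i] for i in S) / denom

# Illustrative instance: N = 5 products, capacity K = 2 (values are ours, not from the paper).
rng = np.random.default_rng(0)
N, K = 5, 2
w = rng.uniform(0.5, 1.0, size=N)   # marginal revenues, scaled into (0, 1]
v = rng.uniform(0.0, 1.0, size=N)   # preference parameters (unknown to the seller)
S = (0, 3)                          # an assortment with |S| <= K
print(choice_probabilities(S, v))
print(expected_revenue(S, v, w))
```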

In addition, we define the worst-case regret:

$$\Delta_\pi(T) := \sup_{v \in \mathcal{V}} \Delta_\pi(T, v).$$
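Continuing the sketch above (and reusing expected_revenue, v, w, K, and rng from it), the regret objective can be illustrated by a Monte Carlo estimate for a naive benchmark policy that offers a uniformly random size-$K$ assortment each round; this policy and the brute-force maximization over $\mathcal{A}_K$ (feasible only for small $N$) are our own illustrative choices.

```python
from itertools import combinations

def best_revenue(v, w, K):
    """max_{S in A_K} r(S, v), by brute force over all assortments of size at most K."""
    N = len(v)
    return max(expected_revenue(S, v, w)
               for k in range(K + 1)
               for S in combinations(range(N), k))

def estimated_regret_random_policy(T, v, w, K, rng):
    """Monte Carlo estimate of Delta_pi(T, v) for the policy that offers a uniformly
    random size-K assortment in every round, ignoring all purchase feedback."""
    opt = best_revenue(v, w, K)
    N = len(v)
    return sum(opt - expected_revenue(tuple(rng.choice(N, size=K, replace=False)), v, w)
               for _ in range(T))

print(estimated_regret_random_policy(T=1000, v=v, w=w, K=K, rng=rng))
```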

The main result, presented below, states that the regret of any policy can uniformly be bounded from below by a constant times $\sqrt{NT}$.

Theorem 1. Suppose that $K < N/2$. Then, there exists a constant $c_1 > 0$ such that, for all $T \geq N$ and for all policies $\pi$,
$$\Delta_\pi(T) \geq c_1 \sqrt{NT}.$$

3. Proof of Theorem 1

3.1. Proof outline

The proof of Theorem 1 can be broken up into four steps. First, we define a baseline preference vector $v^0 \in \mathcal{V}$ and we show that under $v^0$ any assortment $S \in \mathcal{S}_K$ is optimal. Second, for each $S \in \mathcal{A}_K$, we define a preference vector $v^S \in \mathcal{V}$ by
$$v_i^S := \begin{cases} v_i^0 (1 + \epsilon), & \text{if } i \in S, \\ v_i^0, & \text{otherwise}, \end{cases} \qquad (3.1)$$

for some $\epsilon \in (0, 1]$. For each such $v^S$, we show that the instantaneous regret from offering a suboptimal assortment $S_t$ is bounded from below by a constant times the number of products $|S \setminus S_t|$ of $S$ that are not offered in $S_t$; cf. Lemma 1 below. This lower bound takes into account how much the assortments $S_1, \ldots, S_T$ overlap with $S$ when the preference vector is $v^S$. Third, let $N_i$ denote the number of times product $i \in [N]$ is contained in $S_1, \ldots, S_T$, that is,
$$N_i := \sum_{t=1}^{T} \mathbb{1}\{i \in S_t\}.$$

Then, we use the Kullback–Leibler (KL) divergence and Pinsker's inequality to upper bound the difference between the expected value of $N_i$ under $v^S$ and $v^{S \setminus \{i\}}$; see Lemma 2. Fourth, we apply a randomization argument over $\{v^S : S \in \mathcal{S}_K\}$, we combine the previous steps, and we set $\epsilon$ accordingly to conclude the proof. The novelty of this work is concentrated in the first two steps. The third and fourth step closely follow the work of Chen and Wang [3]. These last steps are included (1) because of slight deviations in our setup, (2) for the sake of completeness, and (3) since the proof techniques are extended to the case where $K/N < 1/2$. In the work of Chen and Wang [3], the lower bound is shown for $K/N \leq 1/4$, but the authors already mention that this constraint can probably be relaxed. Our proof confirms that this is indeed the case.

3.2. Step 1: Construction of the baseline preference vector

Let $w := \min_{i \in [N]} w_i > 0$ and define the constant

$$s := \frac{w^2}{3 + 2w}.$$

Note that $s < w$. The baseline preference vector is formally defined as
$$v_i^0 := \frac{s}{K(w_i - s)}, \quad \text{for all } i \in [N].$$

Now, the expected revenue for any $S \in \mathcal{A}_K$ under $v^0$ can be rewritten as

$$r(S, v^0) = \frac{\sum_{i \in S} v_i^0 w_i}{1 + \sum_{i \in S} v_i^0} = \frac{\frac{s}{K} \sum_{i \in S} \frac{w_i}{w_i - s}}{1 + \frac{s}{K} \sum_{i \in S} \frac{1}{w_i - s}} = \frac{s \sum_{i \in S} \frac{w_i}{w_i - s}}{K - |S| + \sum_{i \in S} \frac{w_i}{w_i - s}}.$$

The expression on the right-hand side is only maximized by assortments 푆 with maximal size |푆| = 퐾, in which case

$$r(S, v^0) = \max_{S' \in \mathcal{A}_K} r(S', v^0) = s.$$

It follows that all assortments 푆 with size 퐾 are optimal.
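This identity is easy to verify numerically. The following self-contained Python sketch (with illustrative, arbitrarily chosen parameters) builds $v^0$ from a random revenue vector and checks that every size-$K$ assortment attains expected revenue exactly $s$, while strictly smaller assortments fall short.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
N, K = 6, 3
w = rng.uniform(0.5, 1.0, size=N)             # product revenues in (0, 1]
w_min = w.min()
s = w_min**2 / (3 + 2 * w_min)                # the constant s from Step 1
v0 = s / (K * (w - s))                        # baseline preferences v_i^0 = s / (K (w_i - s))

def r(S, v):
    """Expected revenue r(S, v) under the MNL model."""
    return sum(v[i] * w[i] for i in S) / (1.0 + sum(v[i] for i in S))

full = [r(S, v0) for S in combinations(range(N), K)]
small = [r(S, v0) for S in combinations(range(N), K - 1)]
print(np.allclose(full, s))                   # True: every size-K assortment yields exactly s
print(max(small) < s)                         # True: strictly smaller assortments are suboptimal
```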

3.3. Step 2: Lower bound on the instantaneous regret of $v^S$

For the second step, we bound the instantaneous regret under $v^S$.

Lemma 1. Let 푆 ∈S퐾 . Then, there exists a constant 푐2 > 0, only depending on 푤 and 푠, such that, for all 푡 ∈[푇] and 푆푡 ∈A퐾 ,

$$\max_{S' \in \mathcal{A}_K} r(S', v^S) - r(S_t, v^S) \geq c_2 \frac{\epsilon |S \setminus S_t|}{K}.$$

As a consequence,
$$T \cdot \max_{S' \in \mathcal{A}_K} r(S', v^S) - \sum_{t=1}^{T} r(S_t, v^S) \geq c_2 \epsilon \left( T - \frac{1}{K} \sum_{i \in S} N_i \right). \qquad (3.2)$$

Proof. Fix $S \in \mathcal{S}_K$. First, note that since $\epsilon \leq 1$, for any $S' \in \mathcal{A}_K$, it holds that

$$\sum_{i \in S'} v_i^S \leq \frac{2s}{w - s}. \qquad (3.3)$$

Second, let $S^* \in \arg\max_{S' \in \mathcal{A}_K} r(S', v^S)$ and $\varrho^* := r(S^*, v^S)$. By rewriting the inequality $\varrho^* \geq r(S', v^S)$ for all $S' \in \mathcal{A}_K$, we find that for all $S' \in \mathcal{A}_K$,
$$\varrho^* \geq \sum_{i \in S'} v_i^S (w_i - \varrho^*). \qquad (3.4)$$

Let 푡 ∈[푇] and 푆푡 ∈A퐾 . Then, it holds that

$$\begin{aligned}
r(S^*, v^S) - r(S_t, v^S) &= \varrho^* - \frac{\sum_{i \in S_t} v_i^S w_i}{1 + \sum_{i \in S_t} v_i^S} \\
&= \frac{1}{1 + \sum_{i \in S_t} v_i^S} \left( \varrho^* + \sum_{i \in S_t} v_i^S \varrho^* - \sum_{i \in S_t} v_i^S w_i \right) \\
&\geq \frac{w - s}{w + s} \left( \varrho^* - \sum_{i \in S_t} v_i^S (w_i - \varrho^*) \right) \\
&\geq \frac{w - s}{w + s} \left( \sum_{i \in S} v_i^S (w_i - \varrho^*) - \sum_{i \in S_t} v_i^S (w_i - \varrho^*) \right) \\
&= \frac{w - s}{w + s} \Bigg( \underbrace{\sum_{i \in S} v_i^S (w_i - s) - \sum_{i \in S_t} v_i^S (w_i - s)}_{(a)} - \underbrace{(\varrho^* - s) \left( \sum_{i \in S} v_i^S - \sum_{i \in S_t} v_i^S \right)}_{(b)} \Bigg).
\end{aligned}$$

Here, the first inequality is due to (3.3) and the second inequality follows from (3.4) with $S' = S$. Next, note that since $|S_t| \leq K$ and $|S| = K$, we find that

$$|S_t \setminus S| \leq |S \setminus S_t|. \qquad (3.5)$$

Now, term $(a)$ can be bounded from below as
$$\begin{aligned}
(a) &= \sum_{i \in S \setminus S_t} v_i^S (w_i - s) - \sum_{i \in S_t \setminus S} v_i^S (w_i - s) \\
&= \frac{s}{K} \left( (1 + \epsilon) |S \setminus S_t| - |S_t \setminus S| \right) \\
&\geq s \frac{\epsilon |S \setminus S_t|}{K}. \qquad (3.6)
\end{aligned}$$

Here, at the final inequality, we used (3.5). Next, term (푏) can be bounded from above as

$$(b) \leq \underbrace{|\varrho^* - s|}_{(c)} \cdot \underbrace{\left| \sum_{i \in S} v_i^S - \sum_{i \in S_t} v_i^S \right|}_{(d)}.$$

Now, for term $(c)$, we note that $v_i^S \geq v_i^0$ for all $i \in [N]$. In addition, since $r(S^*, v^0) \leq s$,
$$\begin{aligned}
\varrho^* - s &\leq \frac{\sum_{i \in S^*} v_i^S w_i}{1 + \sum_{i \in S^*} v_i^S} - \frac{\sum_{i \in S^*} v_i^0 w_i}{1 + \sum_{i \in S^*} v_i^0} \\
&\leq \frac{1}{1 + \sum_{i \in S^*} v_i^0} \sum_{i \in S^*} (v_i^S - v_i^0) w_i \\
&\leq \sum_{i=1}^{N} (v_i^S - v_i^0) = \epsilon \sum_{i \in S} v_i^0 \leq \epsilon \frac{s}{w - s} \leq \epsilon.
\end{aligned}$$

This entails an upper bound for $(c)$. Term $(d)$ is bounded from above as
$$\begin{aligned}
(d) &\leq \sum_{i \in S \setminus S_t} v_i^S + \sum_{i \in S_t \setminus S} v_i^S \\
&\leq (1 + \epsilon) \sum_{i \in S \setminus S_t} v_i^0 + \sum_{i \in S_t \setminus S} v_i^0 \\
&\leq (1 + \epsilon) \frac{s}{K(w - s)} |S \setminus S_t| + \frac{s}{K(w - s)} |S_t \setminus S| \\
&\leq 3 \frac{s}{w - s} \frac{|S \setminus S_t|}{K}.
\end{aligned}$$


Here, at the final inequality, we used (3.5) and the fact that $\epsilon \leq 1$. Now, we combine the upper bounds of $(c)$ and $(d)$ to find that
$$(b) \leq \frac{3 s^2}{(w - s)^2} \cdot \frac{\epsilon |S \setminus S_t|}{K}. \qquad (3.7)$$

It follows from (3.6) and (3.7) that
$$r(S^*, v^S) - r(S_t, v^S) \geq \frac{w - s}{w + s} \left( s - \frac{3 s^2}{(w - s)^2} \right) \frac{\epsilon |S \setminus S_t|}{K} \geq c_2 \frac{\epsilon |S \setminus S_t|}{K},$$

where
$$c_2 := \frac{w - s}{w + s} \left( s - \frac{3 s^2}{(w - s)^2} \right).$$

Note that the constant $c_2$ is positive if $(w - s)^2 > 3s$. This follows from $s = w^2/(3 + 2w)$ since
$$(w - s)^2 - 3s > w^2 - s(3 + 2w) = 0.$$

Statement (3.2) follows from the additional observation
$$\sum_{t=1}^{T} |S \setminus S_t| = TK - \sum_{t=1}^{T} |S \cap S_t| = TK - \sum_{i \in S} N_i. \qquad \square$$
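As a numerical sanity check on Lemma 1 (illustrative only; the instance, helper names, and tolerance are ours), one can draw a random instance, build $v^S$ as in (3.1), and verify that the instantaneous regret of every $S_t \in \mathcal{A}_K$ is at least $c_2 \epsilon |S \setminus S_t| / K$.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
N, K, eps = 6, 2, 0.5                         # K < N/2 as in Theorem 1, eps in (0, 1]
w = rng.uniform(0.5, 1.0, size=N)             # revenues in (0, 1]
w_min = w.min()
s = w_min**2 / (3 + 2 * w_min)
v0 = s / (K * (w - s))                        # baseline preference vector
c2 = (w_min - s) / (w_min + s) * (s - 3 * s**2 / (w_min - s)**2)

def r(S, v):
    """Expected revenue r(S, v) under the MNL model."""
    return sum(v[i] * w[i] for i in S) / (1.0 + sum(v[i] for i in S))

def assortments(max_size):
    """All assortments of size at most max_size."""
    for k in range(max_size + 1):
        yield from combinations(range(N), k)

S = (0, 1)                                    # a fixed assortment S in S_K
vS = v0.copy()
vS[list(S)] *= 1 + eps                        # the perturbed vector v^S from (3.1)

opt = max(r(Sp, vS) for Sp in assortments(K))
ok = all(opt - r(St, vS) >= c2 * eps * len(set(S) - set(St)) / K - 1e-12
         for St in assortments(K))
print(c2 > 0, ok)                             # both True on this instance
```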

3.4. Step 3: KL divergence and Pinsker's inequality

We denote the dependence of the expected value and the probability on the preference vector $v^S$ as $\mathbb{E}_S[\cdot]$ and $\mathbb{P}_S(\cdot)$ for $S \in \mathcal{A}_K$. In addition, we write $S \setminus i$ instead of $S \setminus \{i\}$. The lemma below states an upper bound on the KL divergence of $\mathbb{P}_S$ and $\mathbb{P}_{S \setminus i}$ and uses Pinsker's inequality to derive an upper bound on the absolute difference between the expected value of $N_i$ under $v^S$ and $v^{S \setminus i}$.

Lemma 2. Let $S \in \mathcal{S}_K$, $S' \in \mathcal{A}_K$, and $i \in S$. Then, there exists a constant $c_3$, only depending on $w$ and $s$, such that
$$\mathrm{KL}(\mathbb{P}_S(\cdot \mid S') \,\|\, \mathbb{P}_{S \setminus i}(\cdot \mid S')) \leq c_3 \frac{\epsilon^2}{K}.$$
As a consequence,
$$|\mathbb{E}_S[N_i] - \mathbb{E}_{S \setminus i}[N_i]| \leq \sqrt{2 c_3}\, \frac{\epsilon T^{3/2}}{\sqrt{K}}. \qquad (3.8)$$

Proof. Let $P$ and $Q$ be arbitrary probability measures on $S' \cup \{0\}$. It can be shown (see, e.g., Lemma 3 from Chen and Wang [3]) that

$$\mathrm{KL}(P \,\|\, Q) \leq \sum_{j \in S' \cup \{0\}} \frac{(p_j - q_j)^2}{q_j},$$

where $p_j$ and $q_j$ are the probabilities of outcome $j$ under $P$ and $Q$, respectively. We apply this result for $p_j$ and $q_j$ defined as
$$p_j := \frac{v_j^S}{1 + \sum_{\ell \in S'} v_\ell^S} \quad \text{and} \quad q_j := \frac{v_j^{S \setminus i}}{1 + \sum_{\ell \in S'} v_\ell^{S \setminus i}},$$

for $j \in S' \cup \{0\}$. First, note that by (3.3), for all $j \in S' \cup \{0\}$,

$$q_j \geq \frac{v_j^0}{1 + \frac{2s}{w - s}} = \frac{w - s}{w + s} v_j^0.$$

Now, we bound $|p_j - q_j|$ from above for $j \in S' \cup \{0\}$. Note that for $j = 0$ it holds that

$$|p_0 - q_0| = \frac{\left| \sum_{\ell \in S'} v_\ell^S - \sum_{\ell \in S'} v_\ell^{S \setminus i} \right|}{\left( 1 + \sum_{\ell \in S'} v_\ell^S \right) \left( 1 + \sum_{\ell \in S'} v_\ell^{S \setminus i} \right)} \leq |(1 + \epsilon) v_i^0 - v_i^0| = v_i^0 \epsilon.$$

For $j \neq i$, since $\epsilon \leq 1$, we find that

$$|p_j - q_j| = v_j^S |p_0 - q_0| \leq 2 v_j^0 v_i^0 \epsilon.$$

For 푗 = 푖, we find that

$$\begin{aligned}
|p_i - q_i| &= v_i^0 |p_0 - q_0 + \epsilon p_0| \\
&\leq v_i^0 (|p_0 - q_0| + \epsilon p_0) \\
&\leq v_i^0 (v_i^0 + 1) \epsilon.
\end{aligned}$$

Therefore, we conclude that

$$\begin{aligned}
\mathrm{KL}(\mathbb{P}_S(\cdot \mid S') \,\|\, \mathbb{P}_{S \setminus i}(\cdot \mid S')) &\leq \sum_{j \in S' \cup \{0\}} \frac{(p_j - q_j)^2}{q_j} \\
&\leq \frac{(p_0 - q_0)^2}{q_0} + \sum_{j \in S' : j \neq i} \frac{(p_j - q_j)^2}{q_j} + \frac{(p_i - q_i)^2}{q_i} \\
&\leq \frac{w + s}{w - s} \left( (v_i^0 \epsilon)^2 + 4 (v_i^0 \epsilon)^2 \sum_{j \in S' : j \neq i} v_j^0 + v_i^0 (v_i^0 + 1)^2 \epsilon^2 \right) \\
&\leq \frac{s(w + s)}{(w - s)^2} \left( \frac{s}{w - s} + \frac{4 s^2}{(w - s)^2} + \frac{w^2}{(w - s)^2} \right) \frac{\epsilon^2}{K} \\
&= c_3 \frac{\epsilon^2}{K},
\end{aligned}$$

where
$$c_3 := \frac{s(w + s)}{(w - s)^2} \left( \frac{s}{w - s} + \frac{4 s^2 + w^2}{(w - s)^2} \right).$$

Next, note that the entire probability measures P푆 and P푆\푖 depend on 푇. Then, as a consequence of the chain rule of the KL divergence, we find that

$$\mathrm{KL}(\mathbb{P}_S \,\|\, \mathbb{P}_{S \setminus i}) \leq c_3 \frac{\epsilon^2 T}{K}.$$

Now, statement (3.8) follows from

$$\begin{aligned}
|\mathbb{E}_S[N_i] - \mathbb{E}_{S \setminus i}[N_i]| &\leq \sum_{n=0}^{T} n\, |\mathbb{P}_S(N_i = n) - \mathbb{P}_{S \setminus i}(N_i = n)| \\
&\leq T \sum_{n=0}^{T} |\mathbb{P}_S(N_i = n) - \mathbb{P}_{S \setminus i}(N_i = n)| \\
&= 2T \max_{B \subseteq \{0, \ldots, T\}} |\mathbb{P}_S(N_i \in B) - \mathbb{P}_{S \setminus i}(N_i \in B)| \\
&\leq 2T \sup_{A} |\mathbb{P}_S(A) - \mathbb{P}_{S \setminus i}(A)| \\
&\leq T \sqrt{2\, \mathrm{KL}(\mathbb{P}_S \,\|\, \mathbb{P}_{S \setminus i})}, \qquad (3.9)
\end{aligned}$$

where the step in (3.9) follows from, for example, Proposition 4.2 from Levin et al. [8] and we used Pinsker’s inequality at the final inequality. 
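The single-period KL bound of Lemma 2 can be checked numerically in the same spirit (again with our own illustrative values and helper names): for a fixed $S \in \mathcal{S}_K$ and $i \in S$, the divergence between the purchase distributions under $v^S$ and $v^{S \setminus i}$ stays below $c_3 \epsilon^2 / K$ for every offered assortment $S' \in \mathcal{A}_K$.

```python
import numpy as np
from itertools import combinations

def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions with full support."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log(p / q)))

def purchase_dist(offered, v):
    """Distribution of the single purchase over the offered assortment plus the no-purchase option 0."""
    denom = 1.0 + sum(v[i] for i in offered)
    return np.array([1.0 / denom] + [v[i] / denom for i in offered])

rng = np.random.default_rng(3)
N, K, eps = 6, 2, 0.5
w = rng.uniform(0.5, 1.0, size=N)
w_min = w.min()
s = w_min**2 / (3 + 2 * w_min)
v0 = s / (K * (w - s))
c3 = s * (w_min + s) / (w_min - s)**2 * (s / (w_min - s) + (4 * s**2 + w_min**2) / (w_min - s)**2)

S, i = (0, 1), 0                              # S in S_K and a product i in S
vS = v0.copy();  vS[list(S)] *= 1 + eps       # v^S
vSi = v0.copy(); vSi[[j for j in S if j != i]] *= 1 + eps   # v^{S \ i}

ok = all(kl(purchase_dist(Sp, vS), purchase_dist(Sp, vSi)) <= c3 * eps**2 / K + 1e-12
         for k in range(K + 1) for Sp in combinations(range(N), k))
print(ok)                                     # True: the per-round bound of Lemma 2 holds
```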

3.5. Step 4: Proving the main result

With all the established ingredients, we can finalize the proof of the lower bound on the regret.

Proof of Theorem 1. Since $v^S \in \mathcal{V}$ for all $S \in \mathcal{S}_K$ and by Lemma 1, we know that
$$\begin{aligned}
\Delta_\pi(T) &\geq \frac{1}{|\mathcal{S}_K|} \sum_{S \in \mathcal{S}_K} \Delta_\pi(T, v^S) \\
&\geq c_2 \epsilon \Bigg( T - \underbrace{\frac{1}{|\mathcal{S}_K|} \sum_{S \in \mathcal{S}_K} \frac{1}{K} \sum_{i \in S} \mathbb{E}_S[N_i]}_{(a)} \Bigg). \qquad (3.10)
\end{aligned}$$

We decompose $(a)$ into two terms:
$$(a) = \underbrace{\frac{1}{|\mathcal{S}_K|} \sum_{S \in \mathcal{S}_K} \frac{1}{K} \sum_{i \in S} \mathbb{E}_{S \setminus i}[N_i]}_{(b)} + \underbrace{\frac{1}{|\mathcal{S}_K|} \sum_{S \in \mathcal{S}_K} \frac{1}{K} \sum_{i \in S} \left( \mathbb{E}_S[N_i] - \mathbb{E}_{S \setminus i}[N_i] \right)}_{(c)}.$$

Recall that $c = K/N \in (0, 1/2)$. By summing over $S' = S \setminus i$ instead of over $S$, we bound $(b)$ from above by
$$(b) = \frac{1}{|\mathcal{S}_K|} \sum_{S' \in \mathcal{S}_{K-1}} \frac{1}{K} \sum_{i \notin S'} \mathbb{E}_{S'}[N_i] \leq \frac{|\mathcal{S}_{K-1}|}{|\mathcal{S}_K|}\, T \leq \frac{c}{1 - c}\, T,$$
where the first inequality follows from $\sum_{i \in [N]} \mathbb{E}_{S'}[N_i] \leq TK$, and the second inequality from
$$\frac{|\mathcal{S}_{K-1}|}{|\mathcal{S}_K|} = \frac{\binom{N}{K-1}}{\binom{N}{K}} = \frac{K}{N - K + 1} \leq \frac{K/N}{1 - K/N}.$$
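The binomial identity behind the last display is easily verified numerically; the values below are arbitrary illustrative choices.

```python
from math import comb

N, K = 20, 7
print(comb(N, K - 1) / comb(N, K), K / (N - K + 1))      # both equal 0.5
print(K / (N - K + 1) <= (K / N) / (1 - K / N))          # True, since N - K + 1 >= N - K
```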

Now, $(c)$ can be bounded by applying Lemma 2:
$$(c) \leq \sqrt{2 c_3}\, \frac{\epsilon T^{3/2}}{\sqrt{K}} = \frac{\sqrt{2 c_3}\, \epsilon T^{3/2}}{\sqrt{c}\sqrt{N}}.$$


By plugging the upper bounds on $(b)$ and $(c)$ in (3.10), we obtain
$$\begin{aligned}
\Delta_\pi(T) &\geq c_2 \epsilon \left( T - \frac{c}{1 - c}\, T - \frac{\sqrt{2 c_3}\, \epsilon T^{3/2}}{\sqrt{c}\sqrt{N}} \right) \\
&= c_2 \epsilon \left( \frac{1 - 2c}{1 - c}\, T - \frac{\sqrt{2 c_3}\, \epsilon T^{3/2}}{\sqrt{c}\sqrt{N}} \right).
\end{aligned}$$
Now, we set $\epsilon$ as
$$\epsilon = \min\left\{ 1,\ \frac{(1 - 2c)\sqrt{c}}{2(1 - c)\sqrt{2 c_3}} \sqrt{N/T} \right\}.$$
This yields, for all $T \geq N$,
$$\Delta_\pi(T) \geq \min\left\{ \frac{c_2 \sqrt{2 c_3}}{\sqrt{c}}\, T,\ \frac{c_2 (1 - 2c)^2 \sqrt{c}}{8 (1 - c) \sqrt{2 c_3}} \sqrt{NT} \right\}.$$
Finally, note that for $T \geq N$ it follows that $T \geq \sqrt{NT}$ and therefore
$$\Delta_\pi(T) \geq c_1 \sqrt{NT},$$

where
$$c_1 := \min\left\{ \frac{c_2 \sqrt{2 c_3}}{\sqrt{c}},\ \frac{c_2 (1 - 2c)^2 \sqrt{c}}{8 (1 - c) \sqrt{2 c_3}} \right\} > 0. \qquad \square$$
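To see that these constants are strictly positive for a given revenue vector, one can evaluate them directly; the short sketch below (with an illustrative choice of $w = \min_i w_i$ and $c = K/N$) computes $s$, $c_2$, $c_3$, and $c_1$ as defined above.

```python
def lower_bound_constants(w_min, c):
    """Evaluate s, c2, c3, c1 from the proof for w_min = min_i w_i in (0, 1] and c = K/N < 1/2."""
    s = w_min**2 / (3 + 2 * w_min)
    c2 = (w_min - s) / (w_min + s) * (s - 3 * s**2 / (w_min - s)**2)
    c3 = s * (w_min + s) / (w_min - s)**2 * (s / (w_min - s) + (4 * s**2 + w_min**2) / (w_min - s)**2)
    c1 = min(c2 * (2 * c3)**0.5 / c**0.5,
             c2 * (1 - 2 * c)**2 * c**0.5 / (8 * (1 - c) * (2 * c3)**0.5))
    return s, c2, c3, c1

print(lower_bound_constants(w_min=1.0, c=0.25))   # all four constants are positive
```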

Acknowledgments. The authors thank the Editor in Chief and an anonymous reviewer for their positive remarks and useful comments.

Competing interest. The authors declare no conflict of interest.

References

[1] Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2017). Thompson sampling for the MNL-bandit. In Conference on Learning Theory (COLT), pp. 76–78.
[2] Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2019). MNL-bandit: a dynamic learning approach to assortment selection. Operations Research 67(5): 1453–1485.
[3] Chen, X. & Wang, Y. (2018). A note on a tight lower bound for MNL-bandit assortment selection models. Operations Research Letters 46(5): 534–537.
[4] Chen, X., Wang, Y., & Zhou, Y. (2018). An optimal policy for dynamic assortment planning under uncapacitated multinomial logit models. ArXiv e-print. https://arxiv.org/abs/1805.04785.
[5] Cheung, W.C. & Simchi-Levi, D. (2017). Assortment optimization under unknown multinomial logit choice models. ArXiv e-print. https://arxiv.org/abs/1704.00108.
[6] Farias, V.F., Jagabathula, S., & Shah, D. (2013). A nonparametric approach to modeling choice with limited data. Management Science 59(2): 305–322.
[7] Kallus, N. & Udell, M. (2016). Dynamic assortment personalization in high dimensions. ArXiv e-print. https://arxiv.org/abs/1610.05604.
[8] Levin, D.A., Peres, Y., & Wilmer, E.L. (2017). Markov chains and mixing times, 2nd ed. Providence, RI: American Mathematical Society.
[9] Rusmevichientong, P., Shen, Z.-J.M., & Shmoys, D.B. (2010). Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research 58(6): 1666–1680.
[10] Sauré, D. & Zeevi, A. (2013). Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management 15(3): 387–404.

