Lai–Robbins lower bound

The Lai–Robbins lower bound gives an asymptotic lower bound on the regret that any uniformly good algorithm must incur in the stochastic [[Multi-armed bandit|multi-armed bandit problem]]. The original result was proved by [[Tze Leung Lai]] and [[Herbert Robbins]] in 1985 for parametric [[Exponential family|exponential families]]. Later work extended the statement to more general classes of distributions.


== Multi-armed bandit problem ==
# One may consider a ''fixed-horizon'' setting (known T) or an ''anytime'' setting (unknown T).


== Lai–Robbins lower bound ==


The theorem characterizes how many times a suboptimal arm k must be pulled in order to distinguish the instance with \nu_k from an alternative instance with \tilde{\nu}_k, where \tilde{\nu}_k is such that \tilde{\mu}_k > \mu^*.
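The bound is commonly stated as follows (a standard formulation; here N_k(T) denotes the number of pulls of arm k up to time T and \Delta_k = \mu^* - \mu_k the suboptimality gap, notation assumed here, and \operatorname{KL}(\nu_k, \nu^*) is, in the original parametric setting, the divergence between the reward distribution of arm k and that of an optimal arm):

```latex
% For every uniformly good algorithm and every suboptimal arm k:
\liminf_{T \to \infty} \frac{\mathbb{E}[N_k(T)]}{\log T}
  \;\ge\; \frac{1}{\operatorname{KL}(\nu_k, \nu^*)},
% which, summing the per-arm contributions, yields the regret bound
\liminf_{T \to \infty} \frac{R_T}{\log T}
  \;\ge\; \sum_{k \,:\, \Delta_k > 0} \frac{\Delta_k}{\operatorname{KL}(\nu_k, \nu^*)}.
```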
Consistency imposes that, for every \nu, the number of pulls of an optimal arm must be large. This means that \mu^* is estimated very accurately. The goal is to determine, for a suboptimal arm k, how many samples are needed to be confident, with the appropriate level of confidence, that \mu_k < \mu^*. To do so, we use what is called the '''most confusing instance''': an instance close to \nu such that arm k is optimal. We define it as \tilde{\nu} such that, for all a \neq k, \tilde{\nu}_a = \nu_a, and \tilde{\nu}_k is chosen so that \tilde{\mu}_k > \mu^*. The objective is to determine how many samples of arm k are required to distinguish whether we are in the instance with \nu_k or with \tilde{\nu}_k in terms of \operatorname{KL} distance.
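For concreteness, a minimal sketch of the resulting sample count in the Bernoulli case, where \operatorname{KL}(\nu_k, \nu^*) has a closed form (the specific means and horizon below are illustrative assumptions, not from the article):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Ber(p) || Ber(q)), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Illustrative instance: suboptimal arm with mean 0.4, optimal mean 0.5.
mu_k, mu_star = 0.4, 0.5
T = 10_000

# Per the lower bound, a uniformly good algorithm must pull arm k
# roughly log(T) / KL(mu_k, mu_star) times over a horizon of T rounds.
pulls_needed = math.log(T) / kl_bernoulli(mu_k, mu_star)
print(round(pulls_needed, 1))
```

The closer \mu_k is to \mu^*, the smaller the divergence and hence the more pulls are needed to tell the two instances apart.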


== Algorithms achieving the Lai–Robbins lower bound ==


Several algorithms are known to achieve the Lai–Robbins asymptotic lower bound
under specific assumptions on the reward distribution class \mathcal{D}.
The following is a non-exhaustive list of algorithms matching the lower bound.
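One well-known example is KL-UCB, which matches the bound for Bernoulli rewards. A minimal sketch of its upper-confidence index (the exploration level log(t) is a simplified choice; practical variants add lower-order terms):

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Ber(p) || Ber(q)), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, n_pulls, t, precision=1e-6):
    """KL-UCB index for a Bernoulli arm: the largest q >= mean such that
    n_pulls * KL(mean, q) <= log(t), found by bisection (KL is increasing
    in q on [mean, 1])."""
    level = math.log(t) / n_pulls
    lo, hi = mean, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

# At each round, KL-UCB pulls the arm with the largest index; indices
# shrink toward the empirical mean as an arm accumulates pulls.
print(klucb_index(0.4, 100, 1000))
```

As n_pulls grows, the index concentrates around the empirical mean, so suboptimal arms are pulled only on the order of log(t) / KL times, matching the lower bound.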