|
The Lai–Robbins lower bound gives an asymptotic lower bound on the regret that any uniformly good algorithm must incur in the stochastic [[Multi-armed bandit|multi-armed bandit problem]]. The original result was proved by [[Tze Leung Lai]] and [[Herbert Robbins]] in 1985 for parametric [[Exponential family|exponential families]]. Later work extended the statement to more general classes of distributions.
|
The theorem gives the right amount of time we should pull a suboptimal arm <math>a</math> to distinguish whether we are in the instance with <math>\nu_a</math> or with <math>\nu'_a</math>, where <math>\nu'_a</math> is such that <math>\mu'_a > \mu^*</math>.
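In its usual form (the notation here, with <math>N_a(T)</math> denoting the number of pulls of arm <math>a</math> up to time <math>T</math> and <math>D</math> the [[Kullback–Leibler divergence]], is one standard convention rather than fixed by this section), this amount of time is captured by the per-arm bound

:<math>\liminf_{T \to \infty} \frac{\mathbb{E}[N_a(T)]}{\log T} \;\ge\; \frac{1}{D(\nu_a \,\|\, \nu'_a)},</math>

so an arm whose distribution is hard to distinguish from the most confusing alternative (small divergence) must be pulled more often.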
|
Consistency imposes that, for every <math>\alpha > 0</math>, the number of pulls of an optimal arm must be large. This means that <math>\mu^*</math> is estimated very accurately. The goal is to determine, for a suboptimal arm <math>a</math>, how many samples are needed to be confident, with the appropriate level of confidence, that <math>\mu_a < \mu^*</math>. To do so, we use what is called the '''most confusing instance''': an instance <math>\nu'</math> close to <math>\nu</math> such that arm <math>a</math> is optimal. We define it as <math>\nu' = (\nu'_1, \ldots, \nu'_K)</math> such that, for all <math>b \neq a</math>, <math>\nu'_b = \nu_b</math>, and <math>\nu'_a</math> is chosen so that <math>\mu'_a > \mu^*</math>. The objective is to determine how many samples of arm <math>a</math> are required to distinguish whether we are in the instance with <math>\nu_a</math> or with <math>\nu'_a</math> in terms of [[Kullback–Leibler divergence|Kullback–Leibler]] distance.
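As a concrete illustration (a minimal sketch, not part of the original result: the restriction to Bernoulli arms and the function names are assumptions made here), the per-arm sample requirement <math>\log T / D(\mu_a, \mu^*)</math> can be evaluated numerically, where for Bernoulli distributions the Kullback–Leibler divergence has a simple closed form:

```python
import math

def bernoulli_kl(p, q):
    """Kullback-Leibler divergence D(Bernoulli(p) || Bernoulli(q))."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def lai_robbins_pulls(means, horizon):
    """Asymptotic lower bound log(T) / D(mu_a, mu*) on the expected
    number of pulls of each suboptimal arm, for Bernoulli arms."""
    mu_star = max(means)
    return {a: math.log(horizon) / bernoulli_kl(mu, mu_star)
            for a, mu in enumerate(means) if mu < mu_star}

# Example: the arm closer to mu* = 0.9 is harder to distinguish
# from the optimal arm, so the bound forces more pulls of it.
bounds = lai_robbins_pulls([0.9, 0.8, 0.5], horizon=10_000)
```

Note that the bound grows only logarithmically in the horizon but blows up as a suboptimal mean approaches <math>\mu^*</math>, mirroring the intuition that nearly-optimal arms are the hardest to rule out.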