Finding All Epsilon-Good Arms in Stochastic Bandits
The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near-largest) means. Examples include finding an ε-good arm, best-arm identification, top-k arm identification, and finding all arms with means above a specified threshold. However, the problem of finding all ε-good arms has been overlooked in past work, although it is arguably the most natural objective in many applications. For example, a virologist may conduct preliminary laboratory experiments on a large candidate set of treatments and move all ε-good treatments into more expensive clinical trials. Since the ultimate clinical efficacy is uncertain, it is important to identify all ε-good candidates. Mathematically, the all-ε-good arm identification problem presents significant new challenges and surprises that do not arise in the pure-exploration objectives studied in the past. We introduce two algorithms to overcome these challenges and demonstrate their strong empirical performance on a large-scale crowd-sourced dataset of 2.2M ratings collected by the New Yorker Caption Contest, as well as on a dataset testing hundreds of possible cancer drugs.
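To make the objective concrete, the following is a minimal sketch of the target set only, not of either algorithm from the paper. It assumes the additive notion of ε-goodness (an arm is ε-good if its mean is at least the largest mean minus ε) and assumes the true means are known, which is never the case in an actual bandit problem where they must be estimated from samples. The function name `all_epsilon_good_arms` and the example values are hypothetical.

```python
from typing import List, Sequence


def all_epsilon_good_arms(means: Sequence[float], epsilon: float) -> List[int]:
    """Return indices of all arms whose mean is within epsilon of the best mean.

    Assumption: additive epsilon-goodness, i.e. mean_i >= max(means) - epsilon.
    The means are treated as known here purely for illustration.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    best = max(means)
    return [i for i, mu in enumerate(means) if mu >= best - epsilon]


# Hypothetical means for four candidate treatments.
example_means = [0.9, 0.85, 0.6, 0.82]
good = all_epsilon_good_arms(example_means, epsilon=0.1)
print(good)
```

With these illustrative values, three of the four arms lie within 0.1 of the best mean, so an objective that returns only a single ε-good arm or only the best arm would discard candidates that this objective keeps.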
B. Mason et al., "Finding All ε-Good Arms in Stochastic Bandits," Advances in Neural Information Processing Systems, Neural Information Processing Systems Foundation, Dec 2020.
34th Conference on Neural Information Processing Systems, NeurIPS 2020 (2020: Dec. 6-12, Vancouver, Canada)
Article - Conference proceedings
© 2020 Neural Information Processing Systems Foundation, All rights reserved.
12 Dec 2020