Improved regret for zeroth-order adversarial bandit convex optimisation

  • Tor Lattimore

    Google UK, London, UK

Abstract

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the bound of $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.

Cite this article

Tor Lattimore, Improved regret for zeroth-order adversarial bandit convex optimisation. Math. Stat. Learn. 2 (2019), no. 3/4, pp. 311–334

DOI 10.4171/MSL/17