Improving reinforcement learning algorithms: towards optimal learning rate policies

Published on arXiv.

Abstract: This paper investigates to what extent we can improve reinforcement learning algorithms. Our study is split into three parts. First, our analysis shows that the classical asymptotic convergence rate O(1/\sqrt{N}) is pessimistic and can be replaced by O((\log(N)/N)^{\beta}) with 1/2 \le \beta \le 1 and N the number of iterations. Second, we propose a dynamic optimal policy for the choice of the learning rate (\gamma_k)_{k \ge 0} used in stochastic algorithms. We decompose our policy into two interacting levels: the inner and the outer level. In the inner level, we present the PASS algorithm (for "PAst Sign Search") which, based on a predefined sequence (\gamma^o_k)_{k \ge 0}, constructs a new sequence (\gamma^i_k)_{k \ge 0} whose error decreases faster. In the outer level, we propose an optimal methodology for the selection of the predefined sequence (\gamma^o_k)_{k \ge 0}. Third, we show empirically that our learning-rate selection methodology significantly outperforms standard reinforcement learning (RL) algorithms in the following three applications: the estimation of a drift, the optimal placement of limit orders, and the optimal execution of a large number of shares.
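To make the inner-level idea concrete, here is a minimal sketch of a sign-based step-size rule in the spirit of PASS, applied to the paper's drift-estimation example. The abstract does not spell out the update rule, so the specifics below are assumptions for illustration: the outer sequence is taken as \gamma^o_j = 1/(j+1)^{\beta}, its index only advances when the sign of the stochastic increment flips, and the `pass_sgd` helper and Gaussian sampler are hypothetical.

```python
import numpy as np

def pass_sgd(sample, theta0=0.0, n_iter=10_000, beta=1.0, seed=0):
    """Sign-based step-size adaptation in the spirit of PASS (PAst Sign Search).

    Assumed rule (not given in the abstract): the predefined outer sequence
    gamma^o_j = 1/(j+1)**beta is consumed one index at a time, and the index
    only advances when the stochastic increment changes sign, so the step
    stays large while updates keep pushing in a consistent direction.
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    j = 0             # index into the predefined (outer) sequence
    last_sign = 0.0
    for _ in range(n_iter):
        x = sample(rng)           # one noisy observation
        incr = x - theta          # stochastic innovation for drift estimation
        sign = np.sign(incr)
        if sign != 0 and sign == -last_sign:
            j += 1                # sign flip: oscillating, take a smaller step
        gamma = 1.0 / (j + 1) ** beta   # effective (inner) learning rate
        theta += gamma * incr
        if sign != 0:
            last_sign = sign
    return theta

# Toy version of the paper's first application, estimating a drift:
# recover the mean of a Gaussian stream (mu and the noise model are illustrative).
mu = 0.7
theta_hat = pass_sgd(lambda rng: mu + rng.normal(), n_iter=50_000)
print(f"estimated drift: {theta_hat:.4f}")   # expected to be close to 0.7
```

Keeping the step large between sign flips is one plausible reading of the speed-up the abstract claims for the inner level: the error of the adapted sequence (\gamma^i_k)_{k \ge 0} decreases faster than it would by following the predefined (\gamma^o_k)_{k \ge 0} blindly.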