A dual reoptimization scheme for solving large-scale stochastic optimization problems
Abstract
Markov decision processes (MDPs) arising in many applications are high dimensional and subject to well-known curses of dimensionality. A variety of approximate dynamic programming methods have been developed to solve different classes of MDPs. Nevertheless, methods to tackle MDPs that are high dimensional in both the endogenous and exogenous components of their state are limited, relying on problem structure (e.g., convexity) and sophisticated value function approximations. In this paper, we propose a more general framework for approaching this intractable problem class. We consider the information relaxation method that is typically used to obtain dual bounds on the MDP optimal value [1], and use it to develop a novel dual reoptimization heuristic (DRH) that extracts non-anticipative decision rules from sample action distributions. Specifically, we obtain a distribution of actions at a given MDP stage and state by combining Monte Carlo simulation with the solution of dual mathematical programs on sample paths. We develop some theoretical support for our DRH method and apply it to an emerging energy real option problem in which a company must construct a dynamic power procurement portfolio to meet a renewable power target by a future date at minimum cost. That is, the company must supply a specified percentage of its electricity demand from renewable generators and has access to multiple short- and long-term procurement options. We find that DRH outperforms commonly used primal reoptimization methods and simple heuristics on realistic problem instances. DRH thus emerges as a promising framework for tackling high-dimensional MDPs.
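To fix ideas, the minimal Python sketch below illustrates the kind of decision rule the abstract describes: simulate sample paths of the exogenous process, solve a path-wise (dual) problem on each, and aggregate the resulting first-stage actions into a single non-anticipative decision. This is only an illustration under our assumptions, not the paper's implementation; the names drh_action, simulate_path, and solve_dual_path_problem are hypothetical, and the toy solver stands in for the dual mathematical program (with its information-relaxation penalty) that the actual method would solve.

```python
from collections import Counter
import random

def drh_action(state, t, simulate_path, solve_dual_path_problem, n_samples=100):
    """Pick a non-anticipative stage-t action from a sample action distribution.

    simulate_path(state, t): returns one Monte Carlo sample path of the
        exogenous process from stage t to the horizon.
    solve_dual_path_problem(state, t, path): solves a (penalized) deterministic
        program on that path and returns its first-stage action.
    """
    actions = []
    for _ in range(n_samples):
        path = simulate_path(state, t)                        # sample one exogenous path
        actions.append(solve_dual_path_problem(state, t, path))
    # Aggregate the empirical action distribution into a single decision;
    # the sample mode is one simple choice for discrete action sets.
    return Counter(actions).most_common(1)[0][0]

# Toy usage with placeholder dynamics and a placeholder path solver
# (a real instance would solve a dual mathematical program here).
def toy_path(state, t):
    return [random.gauss(50.0, 10.0) for _ in range(5)]  # e.g., a short price path

def toy_solver(state, t, path):
    return "procure" if min(path) >= 45.0 else "wait"

if __name__ == "__main__":
    random.seed(0)
    print(drh_action(state=0.0, t=0, simulate_path=toy_path,
                     solve_dual_path_problem=toy_solver, n_samples=200))
```

Taking the sample mode is just one way to collapse the action distribution into a decision; with continuous action sets one would more plausibly average the sampled actions and project onto the feasible region.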