Optimising the Beer Distribution Game: a Reinforcement Learning and Monte Carlo Tree Search Approach.

  • Thu 9 Dec 21

    14:00 - 15:00

  • Online


  • Event speaker

    Felipe Maldonado

  • Event type

    Lectures, talks and seminars

  • Event organiser

    Mathematical Sciences, Department of

  • Contact details

    Osama Mahmoud

These Departmental Seminars are for everyone in Maths. We encourage anyone interested in the subject in general, or in the particular subject of the seminar, to come along. It's a great opportunity to meet people in the Maths Department and join in with our community.

The Beer Distribution Game (BDG) was originally introduced by Professor Jay Wright Forrester at MIT in 1960. Ever since, it has served as a model for multi-echelon supply chains, where it has mainly been studied using classic Operational Research techniques.

Early attempts at applying AI methodologies consisted of tabular Q-learning approaches in simplified settings (Chaharsooghi et al. 2008; Mortazavi et al. 2015), but these reached their limits in more realistic environments because of the large state spaces and the resulting "Curse of Dimensionality". One idea to break this curse is to introduce function approximation for the action value Q of any given action in any given state, instead of storing the value for each state-action pair in a huge table. This was done, for example, by Oroojlooyjadid et al. (2017) and Geevers et al. (2020), who introduced Deep Neural Networks for this task.
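To make the contrast concrete, the following is a minimal sketch, not the method of any of the cited papers. It shows one tabular Q-learning update, how quickly a table grows with the number of actors, and a linear function approximator standing in for the Deep Neural Networks mentioned above. The feature map, the learning rate, the discount factor and the discretisation (41 inventory levels, 21 order quantities) are all assumptions chosen for illustration.

```python
def q_learning_update(q_table, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """One tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q_table[(state, action)]


def table_size(levels_per_actor, n_actors, n_actions):
    """Number of Q-values a full table needs if every actor's level is part of the state."""
    return (levels_per_actor ** n_actors) * n_actions


def features(state, action):
    """Hypothetical feature map: a bias term, the raw state entries and the action."""
    return [1.0, *map(float, state), float(action)]


def q_approx(weights, state, action):
    """Linear approximation of Q(s,a) as a dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features(state, action)))


def semi_gradient_update(weights, state, action, reward, next_state, actions,
                         alpha=0.01, gamma=0.95):
    """One semi-gradient Q-learning step on the weights; returns the new weight list."""
    target = reward + gamma * max(q_approx(weights, next_state, a) for a in actions)
    td_error = target - q_approx(weights, state, action)
    return [w + alpha * td_error * f
            for w, f in zip(weights, features(state, action))]


if __name__ == "__main__":
    # Assumed discretisation: 4 actors, 41 inventory levels each, 21 order quantities.
    print("table entries:", table_size(41, 4, 21))
    print("linear weights:", len(features((0, 0, 0, 0), 0)))
```

Under these assumptions the table needs tens of millions of entries while the linear approximator needs six weights, at the price of only being able to represent value functions that are linear in the chosen features.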

In our research we focus on another approach: (traditional) Reinforcement Learning and Monte Carlo tree search (MCTS) algorithms. While the most famous application of these methods is building intelligent agents for board games such as Chess or Go (Silver, Schrittwieser, et al. 2017), they have also strongly influenced other domains that can be modelled as trees of sequential decisions (Browne et al. 2012).
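As a rough illustration of how MCTS treats a problem as a tree of sequential decisions, here is a minimal sketch of two of its building blocks in the common UCT variant: choosing which child to descend into, and propagating a simulated reward back up the tree. This is not the implementation discussed in the talk; the class name, the function names and the exploration constant are assumptions.

```python
import math


class Node:
    """A tree node holding visit statistics; children are keyed by action."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = {}       # action -> Node
        self.visits = 0
        self.total_reward = 0.0


def uct_score(child, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward plus an exploration bonus for rarely visited children."""
    if child.visits == 0:
        return math.inf          # always try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_action(node, c=math.sqrt(2)):
    """Pick the action whose child has the highest UCT score."""
    return max(node.children,
               key=lambda a: uct_score(node.children[a], node.visits, c))


def backpropagate(node, reward):
    """Add a simulated reward to the node and to every ancestor up to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

In an inventory setting the actions would be order quantities and the reward a negative cost, so a larger exploration constant `c` makes the search try more order quantities before settling on the one with the best average.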

In a recent study, Preil and Krapp (2021) applied MCTS to inventory management for the first time and found that it performed even better than the AI-based approaches explored previously. In their research, the authors consider an adapted version of the BDG in which the states are fully observable and all decisions are made centrally. In this talk I will present a model for the classical BDG environment with imperfect information. This setting models a multi-echelon supply chain in which actors make decentralised decisions about their own order quantities without being able to observe the inventory levels of the other actors.
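The imperfect-information setting can be pictured with a small simulation sketch. It is a simplified illustration and not the model presented in the talk: the class and function names, the starting inventory of 12, the two-period shipping delay, the absence of an order-information delay and the cost rates of 0.5 per unit held and 1.0 per unit backlogged are all assumptions. The point it shows is that each actor's observation contains only its own inventory, backlog and most recent incoming demand.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Actor:
    inventory: int = 12                      # assumed starting stock
    backlog: int = 0
    # Assumed two-period shipping delay, pre-filled with 4 units per period.
    incoming_shipments: deque = field(default_factory=lambda: deque([4, 4]))
    last_demand: int = 0

    def observe(self):
        """Local observation only: nothing about the other actors is included."""
        return (self.inventory, self.backlog, self.last_demand)


def step(chain, customer_demand, orders, holding_cost=0.5, backlog_cost=1.0):
    """Advance one period; orders[i] is the quantity actor i orders from actor i + 1.

    chain[0] faces the customer and chain[-1] orders from an unlimited source.
    Returns the total cost of the period and the list of local observations.
    """
    demand = customer_demand
    total_cost = 0.0
    for i, actor in enumerate(chain):
        # Receive the shipment that arrives this period.
        actor.inventory += actor.incoming_shipments.popleft()
        # Serve the new demand plus any backlog as far as stock allows.
        actor.last_demand = demand
        owed = demand + actor.backlog
        shipped = min(owed, actor.inventory)
        actor.inventory -= shipped
        actor.backlog = owed - shipped
        # Goods shipped by actor i travel towards the downstream actor i - 1.
        if i > 0:
            chain[i - 1].incoming_shipments.append(shipped)
        total_cost += holding_cost * actor.inventory + backlog_cost * actor.backlog
        # The order placed by actor i is the demand seen by actor i + 1.
        demand = orders[i]
    # The most upstream actor's order is always filled in full, after the delay.
    chain[-1].incoming_shipments.append(orders[-1])
    return total_cost, [actor.observe() for actor in chain]


if __name__ == "__main__":
    chain = [Actor() for _ in range(4)]
    cost, observations = step(chain, customer_demand=4, orders=[4, 4, 4, 4])
    print(cost, observations)
```

A decentralised agent would choose `orders[i]` from `observations[i]` alone, whereas a central planner in the fully observable variant would see the whole list.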


Felipe Maldonado, University of Essex

How to attend

If you are not a member of the Department of Mathematical Sciences at the University of Essex, you can register your interest in attending the seminar and request the Zoom meeting password by emailing Dr Osama Mahmoud (o.mahmoud@essex.ac.uk).
