# MDP Solver for an Inverted Rotary Pendulum

In this project, we applied reinforcement learning and planning to large-scale systems. We developed an optimistic planning algorithm, SOOP (Simultaneous Optimistic Optimization for Planning), described in our publication. SOOP is an online Markov Decision Process (MDP) solver implemented in C++. It was tested in simulation and on an inverted rotary pendulum. The hardware is from Quanser, and communication with the hardware goes through the Quanser Hardware in the Loop C API. The mathematical model and its parameters can be found here, and the equation is:

The control loop implementation has three threads:

1) computing the control sequence,

2) applying the control sequence, and

3) data logging.

The method can efficiently compute longer control sequences even when little computation time is available. All threads are synchronized with a barrier. The Compute U and Apply U loops run at 20 Hz, so the sampling time is 50 ms. We applied a control sequence of length one, while the computed sequence had length five.
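The compute-five, apply-one pattern described above can be sketched as a receding-horizon loop. This is only an illustration under stated assumptions: `toy_plan` and `toy_plant` are hypothetical stand-ins for the SOOP planner and the pendulum, and none of the names come from the project's source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; this is not the SOOP code.
constexpr double kLoopRateHz = 20.0;                      // loop frequency from the text
constexpr double kSamplePeriodMs = 1000.0 / kLoopRateHz;  // 50 ms sampling time
constexpr std::size_t kComputedLength = 5;                // actions planned per step
constexpr std::size_t kAppliedLength = 1;                 // actions actually applied

// Toy stand-in for the planner: repeats one proportional action.
std::vector<double> toy_plan(double x, std::size_t length) {
    return std::vector<double>(length, -0.5 * x);
}

// Toy stand-in for the plant: a scalar integrator.
double toy_plant(double x, double u) { return x + u; }

// Each step plans kComputedLength actions, applies the first kAppliedLength
// of them, then replans from the resulting state.
std::vector<double> run_receding_horizon(double x, std::size_t steps) {
    std::vector<double> applied;
    for (std::size_t k = 0; k < steps; ++k) {
        const std::vector<double> u = toy_plan(x, kComputedLength);
        assert(u.size() >= kAppliedLength);
        for (std::size_t i = 0; i < kAppliedLength; ++i) {
            x = toy_plant(x, u[i]);
            applied.push_back(u[i]);
        }
    }
    return applied;
}
```

Calling `run_receding_horizon(1.0, 3)` with these toy stand-ins applies one action per step, so three steps yield three applied actions even though fifteen were planned in total.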

The logger thread ran at a higher sampling frequency, 40 Hz, to save the lambda and theta angles, the control signal, and the reward.
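A minimal sketch of three barrier-synchronized threads follows. It assumes a hand-written C++11 barrier (`std::barrier` would be the C++20 alternative) and assumes the 40 Hz logger takes two samples per 50 ms control period; the actual scheme in the project may differ. `Barrier`, `Shared`, `worker`, and `run_cycles` are hypothetical names, and the real loop would also wait on a timer, which is omitted here.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>

// Reusable barrier: the last thread to arrive releases the others.
class Barrier {
public:
    explicit Barrier(std::size_t count) : threshold_(count) {}
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            waiting_ = 0;
            ++generation_;  // start a new cycle
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t threshold_;
    std::size_t waiting_ = 0;
    std::size_t generation_ = 0;
};

struct Shared {
    Barrier barrier{3};               // compute, apply, logger
    std::atomic<int> arrivals{0};     // total barrier arrivals so far
    std::atomic<bool> in_step{true};  // cleared if a thread runs ahead
};

// Placeholder worker: does its per-cycle work, then waits for the others.
void worker(Shared& s, int cycles, int work_per_cycle, int& done) {
    for (int k = 0; k < cycles; ++k) {
        done += work_per_cycle;  // stands in for compute / apply / log
        ++s.arrivals;
        s.barrier.arrive_and_wait();
        // After the barrier, all three threads must have finished cycle k.
        if (s.arrivals.load() < 3 * (k + 1)) s.in_step = false;
    }
}

struct CycleCounts {
    int computed = 0;
    int applied = 0;
    int logged = 0;
    bool in_step = false;
};

CycleCounts run_cycles(int cycles) {
    Shared s;
    CycleCounts c;
    std::thread compute(worker, std::ref(s), cycles, 1, std::ref(c.computed));
    std::thread apply(worker, std::ref(s), cycles, 1, std::ref(c.applied));
    // Assumption: two log samples per control period (40 Hz vs 20 Hz).
    std::thread logger(worker, std::ref(s), cycles, 2, std::ref(c.logged));
    compute.join();
    apply.join();
    logger.join();
    c.in_step = s.in_step.load();
    return c;
}
```

With this layout no thread can start cycle k+1 before all three have finished cycle k, which is the property the barrier is meant to provide.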

Thread synchronization

It can be observed that the system stabilizes in less than 0.6 s. This is not an optimal solution, because the pendulum required more than one swing to reach the stable upright position. Fine-tuning the control parameters and optimizing the code would move the outcome toward an optimal solution.
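One way to read a stabilization time off the logged data is sketched below. It assumes the log holds the pendulum's angle error from upright at a fixed sample period (25 ms for a 40 Hz logger) and uses a hypothetical tolerance band; the function name and the band are illustrative, not taken from the project.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the time in seconds of the first sample after which every sample
// stays within +/- tol of zero. Returns 0.0 if the signal never leaves the
// band and -1.0 if the last sample is still outside it.
double settling_time(const std::vector<double>& error, double dt, double tol) {
    const std::size_t none = error.size();  // sentinel: no sample outside band
    std::size_t last_outside = none;
    for (std::size_t i = 0; i < error.size(); ++i) {
        if (std::fabs(error[i]) > tol) last_outside = i;
    }
    if (last_outside == none) return 0.0;
    if (last_outside + 1 == error.size()) return -1.0;
    return static_cast<double>(last_outside + 1) * dt;
}
```

For a 40 Hz log, `dt` would be 0.025 s; the tolerance should be chosen to match the sensor resolution and the accuracy required of the upright position.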

Experimental results

**GIT source** can be found here: https://bitbucket.org/ElodP/soopirp/

**Reference publication:** Lucian Busoniu, Elod Páll, Remi Munos, "Discounted near-optimal control of general nonlinear systems using optimistic planning", American Control Conference (ACC-16) 2016