MDP Solver for an Inverted Rotary Pendulum

In this project, we applied reinforcement learning and planning to large-scale systems. We developed an optimistic planning algorithm, SOOP (Simultaneous Optimistic Optimization for Planning), described in our publication (see the reference below). SOOP is an online Markov Decision Process (MDP) solver implemented in C++. It was tested in simulation and on an inverted rotary pendulum. The hardware is from Quanser, and communication with it goes through the Quanser Hardware-in-the-Loop (HIL) C API. The mathematical model and its parameters can be found here, and the equation is:

The control loop implementation has three threads:

1) computing the control sequence,

2) applying the control sequence, and

3) logging data.

The method can compute long control sequences efficiently even with a short computation budget. All threads are synchronized with a barrier. The Compute U and Apply U loops run at 20 Hz, so the sampling time is 50 ms. Although each computed sequence had length five, only its first action was applied before replanning (an applied sequence of length one).

The logger thread ran at a higher sampling frequency, 40 Hz, and saved the lambda and theta angles, the control signal, and the reward.
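The three-thread structure can be sketched as below. This is a simplified illustration under stated assumptions, not the project's code: the `Barrier` class, `run_control_loop`, and the way the loop is paced with `sleep_for` are hypothetical, and the planner call, the hardware write and the actual logging are replaced by placeholders. All three threads meet at the barrier once per sampling period, and the logger takes two samples per period to reach twice the control rate.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Minimal reusable barrier for C++11 (C++20 provides std::barrier instead).
class Barrier {
public:
    explicit Barrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {   // last thread releases the others
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

struct LoopCounts { int computed; int applied; int logged; };

// Runs `cycles` sampling periods with three synchronized threads.
LoopCounts run_control_loop(int cycles, std::chrono::milliseconds period) {
    Barrier barrier(3);
    std::atomic<int> computed{0}, applied{0}, logged{0};
    std::atomic<double> u{0.0};  // control value handed from planner to actuator

    std::thread compute([&] {
        for (int k = 0; k < cycles; ++k) {
            barrier.arrive_and_wait();            // start of the sampling period
            u.store(0.1 * k);                     // placeholder for the planner call
            ++computed;
        }
    });
    std::thread apply([&] {
        for (int k = 0; k < cycles; ++k) {
            barrier.arrive_and_wait();
            (void)u.load();                       // placeholder for the hardware write
            ++applied;
            std::this_thread::sleep_for(period);  // paces the whole loop
        }
    });
    std::thread logger([&] {
        for (int k = 0; k < cycles; ++k) {
            barrier.arrive_and_wait();
            ++logged;                             // first sample of the period
            std::this_thread::sleep_for(period / 2);
            ++logged;                             // second sample, half a period later
        }
    });

    compute.join();
    apply.join();
    logger.join();
    return {computed.load(), applied.load(), logged.load()};
}
```

With `period` set to 50 ms, the compute and apply threads run at 20 Hz and the logger at 40 Hz. A real-time implementation would more likely pace the loop with absolute deadlines (for example `sleep_until`) to avoid drift.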

Thread synchronization

The experimental results show that the system stabilizes in less than 0.6 s. This is not an optimal solution, because the pendulum needed more than one swing to reach the stable upright position. Fine-tuning the control parameters and optimizing the code should bring the result closer to optimal.
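A stabilization time like the one quoted above could be read off the logged data with a small helper such as the following. This is a hypothetical sketch: the function name `settling_time`, the tolerance band and the assumption that the log holds the pendulum's angular deviation from upright at a fixed sample period (25 ms at 40 Hz) are illustrative choices, not part of the project's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the time (in seconds) after which |error| stays within `tolerance`
// until the end of the record, or -1.0 if the record is empty or the last
// sample is still outside the band.
double settling_time(const std::vector<double>& error,
                     double sample_period,
                     double tolerance) {
    std::size_t first_settled = error.size();
    // Walk backwards from the last sample while the signal stays in the band.
    for (std::size_t i = error.size(); i-- > 0;) {
        if (std::fabs(error[i]) > tolerance) break;
        first_settled = i;
    }
    if (first_settled == error.size()) return -1.0;
    return static_cast<double>(first_settled) * sample_period;
}
```

For example, `settling_time(log, 0.025, 0.05)` would report when the logged deviation last entered a band of 0.05 rad, assuming `log` holds the 40 Hz angle samples.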

Experimental results

The Git source can be found here: https://bitbucket.org/ElodP/soopirp/

Reference publication: Lucian Busoniu, Elod Páll, Remi Munos, "Discounted near-optimal control of general nonlinear systems using optimistic planning", American Control Conference (ACC-16) 2016

