Autonomous inverted helicopter flight via reinforcement learning
Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger, and Eric Liang
Presented by Varun Grover
Outline! Helicopter flying! Objective! Setup! Model Identification! Controller Design! Experiments! Decision Making! Strengths and Weaknesses! Applicability to my project
Helicopter flying! From How Stuff Works, helicopters can do three things that an airplane cannot: " Fly backward " Rotate in the air " Hover! Helicopters can also do one more thing: " Inverted hover
Helicopter flying! Main rotor - the rotating wing assembly! Tail rotor - produces thrust just like an airplane's propeller does
Helicopter flying! If you give the main rotor wings a slight angle of attack on the shaft and spin the shaft, the wings start to develop lift.! In order to actually control the machine, both the main rotor and the tail rotor need to be adjustable.
Helicopter flying! Hovering in a helicopter requires experience and skill. The pilot adjusts the cyclic to maintain the helicopter's position over a point on the ground. The pilot adjusts the collective to maintain a fixed altitude (especially important when close to the ground). The pilot adjusts the foot pedals to maintain the direction that the helicopter is pointing.! Hovering an inverted helicopter is quite challenging
Objective! Perform autonomous inverted hovering! Stochastic, non-linear! The problem is high-dimensional: " X, Y, Z coordinates for position and velocity " Main rotor " Tail rotor
Helicopter setup! Modified Bergen Industrial Twin helicopter http://www.bergenrc.com/industrialtwin.asp! Also equipped with " PC104 flight computer " Inertial Science ISIS-IMU accelerometers and turning-rate gyroscopes " Novatel GPS unit " MicroStrain 3D magnetic compass
Machine Learning for controller design! Flown using four controls: " a[1] and a[2] for forward/backward or sideways motion " a[3] pitch angle, which changes the angle of the main rotor blades " a[4] tail rotor pitch! Pitch control rods shown in orange
Model Identification! To learn the dynamics of the helicopter, collect data while a human flies the helicopter upside-down! Record position (x, y, z), orientation, velocity, and angular velocity! A total of 391 such states were collected
Model Identification! This 12-dimensional state is reduced to an 8-dimensional state using body coordinates! In body coordinates, a state can be represented using only 6 dimensions
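The body-coordinate idea can be illustrated with a minimal sketch (my notation and conventions, not the paper's): rotating the world-frame velocity by the heading angle makes the dynamics independent of absolute position and heading, which is what allows those state dimensions to be dropped from the model.

```python
import math

def world_to_body_velocity(vx, vy, yaw):
    """Rotate a world-frame planar velocity into the body frame."""
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * vx + s * vy, -s * vx + c * vy)

# Example: a velocity of 2 m/s along the world y-axis, with the body
# yawed 90 degrees, becomes pure forward motion in body coordinates.
bvx, bvy = world_to_body_velocity(0.0, 2.0, math.pi / 2)
```

However the frames are defined, the learned model sees only the body-frame quantities, so it never needs to know where in the world the helicopter is.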
Model Identification! The time difference between s_t and s_t+1 is 0.1 seconds! Used linear regression to learn to predict s_t+1 from s_t! Errors in one-step prediction were modeled as Gaussian! Estimated the noise variance via maximum likelihood
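A minimal sketch of this identification step, shown in 1-D for clarity (the paper fits the full multi-dimensional state): fit a linear one-step model by least squares, then take the maximum-likelihood estimate of the Gaussian noise variance, which is simply the mean squared residual.

```python
import random

def fit_linear_dynamics(states, next_states):
    """Fit s_{t+1} = a * s_t + b by least squares; return (a, b, noise_var)."""
    n = len(states)
    mean_s = sum(states) / n
    mean_ns = sum(next_states) / n
    cov = sum((s - mean_s) * (ns - mean_ns) for s, ns in zip(states, next_states))
    var = sum((s - mean_s) ** 2 for s in states)
    a = cov / var
    b = mean_ns - a * mean_s
    # MLE of the Gaussian noise variance is the mean squared residual.
    noise_var = sum((ns - (a * s + b)) ** 2
                    for s, ns in zip(states, next_states)) / n
    return a, b, noise_var

# Synthetic check: data generated with a = 0.9, b = 0.1, noise std 0.05.
random.seed(0)
states = [random.uniform(-1, 1) for _ in range(500)]
next_states = [0.9 * s + 0.1 + random.gauss(0, 0.05) for s in states]
a, b, noise_var = fit_linear_dynamics(states, next_states)
```

The recovered coefficients and noise variance come back close to the values used to generate the data, which is exactly the check one would want before trusting such a model in a simulator.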
Model Identification! Using the above model, create a simulator! A human pilot tests the model by flying the helicopter in simulation! The simulation is then used to test the controller
Controller design! Used reinforcement learning! Reward function: a penalty for deviation from the desired position and orientation
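A hedged sketch of the kind of quadratic reward the slide describes: a weighted penalty for deviation from the target state. The state layout and weights here are illustrative, not the paper's actual cost function.

```python
def reward(state, target, weights):
    """Negative weighted squared deviation from the target state."""
    return -sum(w * (s - t) ** 2 for s, t, w in zip(state, target, weights))

# Perfect hover at the target earns zero reward; any deviation is negative.
at_target = reward((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0))
off_target = reward((0.1, 0.0, 1.2), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0))
```

Because the maximum reward is zero at the target, the learned controller is pushed toward holding position and orientation as tightly as possible.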
Controller design! The policy is represented as a neural network! Once a policy is defined, calculate its expected utility: the expected sum of rewards accumulated while following it! Choose the gains for the controller so that we obtain a policy which maximizes this value
Controller design! Use the Monte Carlo method on the simulated model to sample state transitions! Use these sampled transitions to estimate the policy's expected utility! Repeat m times to get an average
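The Monte Carlo evaluation can be sketched as follows: roll the stochastic simulator forward for a fixed horizon under the policy, sum the rewards, and average over m independent rollouts. The simulator, policy, and reward below are toy 1-D stand-ins, not the learned helicopter model.

```python
import random

def evaluate_policy(policy, simulate_step, reward, s0, horizon, m, rng):
    """Average return of the policy over m stochastic rollouts."""
    total = 0.0
    for _ in range(m):
        s = s0
        ret = 0.0
        for _ in range(horizon):
            s = simulate_step(s, policy(s), rng)
            ret += reward(s)
        total += ret
    return total / m

rng = random.Random(0)
step = lambda s, a, rng: 0.5 * s + a + rng.gauss(0, 0.01)  # noisy dynamics
policy = lambda s: -0.4 * s                                # push state to zero
rew = lambda s: -s * s                                     # penalize deviation
v = evaluate_policy(policy, step, rew, 1.0, horizon=20, m=50, rng=rng)
```

The catch, as the next slide notes, is that v is itself a noisy random quantity: re-running the evaluation gives a slightly different number each time.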
Controller design! But since the Monte Carlo method works with stochastic values, estimating the best policy is hard because of noise in the result! Use the PEGASUS method to convert the stochastic problem into a deterministic one
PEGASUS! Stands for Policy Evaluation-of-Goodness And Search Using Scenarios! We can reduce the problem of policy search in an arbitrary POMDP to one in which all the transitions are deterministic! This reduction is achieved by transforming the original POMDP into an equivalent one that has only deterministic transitions! In this paper, the random number sequence used by the simulator is fixed
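A minimal sketch of the fixed-randomness trick on the same kind of toy 1-D system (illustrative, not the paper's model): by reseeding the simulator with the same "scenarios" on every call, the policy's value becomes a deterministic function of its parameters, so an ordinary deterministic search can be run on it.

```python
import random

def make_deterministic_eval(seed, horizon):
    """Return an evaluation function that reuses the same random scenarios."""
    def evaluate(gain):
        rng = random.Random(seed)      # identical noise sequence every call
        s, ret = 1.0, 0.0
        for _ in range(horizon):
            s = 0.5 * s - gain * s + rng.gauss(0, 0.01)
            ret += -s * s
        return ret
    return evaluate

evaluate = make_deterministic_eval(seed=7, horizon=20)
```

Calling evaluate twice with the same gain now returns exactly the same number, which is precisely what plain Monte Carlo evaluation fails to do.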
Controller design! Now apply a hill-climbing algorithm to search for the best policy:

Function Hill-Climbing(problem) {
    current_node = initial node of problem
    loop do
        next_node = a highest-valued successor of current_node
        if valueof(next_node) < valueof(current_node)
            return current_node
        current_node = next_node
    end do
}
Hill-Climbing Algorithm properties! Moves in the direction of increasing value! Only needs to store the current node and its evaluation! Hill-climbing can get stuck in local maxima: once in a local maximum, the algorithm halts even though the solution may be far from satisfactory! On a plateau, the hill-climbing algorithm conducts a random walk! Why hill-climbing was chosen is not made clear
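As a runnable illustration of this search (a toy one-dimensional objective, not the helicopter controller), discrete hill-climbing over a single gain steps to the best neighboring value until no neighbor improves:

```python
def hill_climb(f, start, step=0.01):
    """Climb f by stepping to the better neighbor; stop at a local maximum."""
    current = start
    while True:
        neighbors = [current - step, current + step]
        best = max(neighbors, key=f)
        if f(best) <= f(current):
            return current
        current = best

# Toy objective with its single maximum at gain = 0.3.
f = lambda g: -(g - 0.3) ** 2
best = hill_climb(f, 0.0)
```

On this unimodal objective the climb reaches the true optimum; on a multimodal one it would stop at whichever local maximum lies uphill from the start, which is exactly the weakness noted above.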
Experiments! Were able to learn how to hover an inverted helicopter! It took 72 hours to design and demonstrate a stable inverted flight controller
Decision Making! Choosing probability distribution models for the modeling error! The number of iterations to average over to pick the best policy! Given a state, which action will maximize reward
Strengths and Weaknesses! + Solves a very complex problem! + Fast solution (72 hours)! - Lots of high-level information, not many concrete details! - The constructed model does not take weather conditions into account! - Uses hill-climbing
Applicability to my project! Improving performance on the MTS problem by modeling the opponent! Liuyang and I! We are estimating the target's movements through a linear function! Similar to policy iteration, we can use hill-climbing over parameter settings to find optimal parameter values
References! [Ng, Coates, Diel, Ganapathi, Schulte, Tse, Berger, Liang 2004] Autonomous inverted helicopter flight via reinforcement learning. International Symposium on Experimental Robotics.! [How Stuff Works] http://travel.howstuffworks.com/helicopter.htm! [Ng, Jordan 2000] PEGASUS: A policy search method for large MDPs and POMDPs. Uncertainty in Artificial Intelligence, Proceedings of the Sixteenth Conference, pages 406-415.! [Russell, Norvig 1995] Artificial Intelligence: A Modern Approach. Prentice Hall Series in Artificial Intelligence, Englewood Cliffs, New Jersey.! [Ng, Kim, Jordan, Sastry 2004] Autonomous helicopter flight via reinforcement learning. Neural Information Processing Systems 16.