Humanoid learns standing push recovery via PPO with Beta policy in OpenAI/MuJoCo environment

Similar to the previous video (Position-controlled hu...), but now the humanoid attempts to learn to recover from disturbances.
Each run began with a phase of 1000 episodes during which the humanoid learned undisturbed static standing. In the second phase, disturbances of random magnitude and direction were applied at t = 5 s and t = 10 s, each lasting 0.15 s. The video shows each force's magnitude and direction (0 radians denotes a forward-pointing force, equivalent to a push forward from behind).
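The two-phase disturbance schedule can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the +x forward axis, the counterclockwise angle convention, the uniform direction sampling over [-pi, pi), and all function names are assumptions. Only the 1000-episode warmup, the push onsets at 5 s and 10 s, and the 0.15 s duration come from the description.

```python
import math
import random

WARMUP_EPISODES = 1000    # phase 1 length: undisturbed standing
PUSH_TIMES = (5.0, 10.0)  # push onset times (s)
PUSH_DURATION = 0.15      # each push lasts 0.15 s


def push_direction(angle):
    """Unit (x, y) vector for a push angle.

    Assumes 0 rad points forward along +x and angles increase
    counterclockwise when viewed from above (a hypothetical convention).
    """
    return (math.cos(angle), math.sin(angle))


def plan_pushes(episode, sample_magnitude, rng):
    """Return [(start_time, fx, fy), ...] for one episode."""
    if episode < WARMUP_EPISODES:
        return []  # phase 1: no disturbances
    pushes = []
    for start in PUSH_TIMES:
        magnitude = sample_magnitude(rng)
        angle = rng.uniform(-math.pi, math.pi)  # assumed uniform direction
        ux, uy = push_direction(angle)
        pushes.append((start, magnitude * ux, magnitude * uy))
    return pushes


def force_at(t, pushes):
    """Horizontal push force (fx, fy) at time t; zero outside every window."""
    for start, fx, fy in pushes:
        if start <= t < start + PUSH_DURATION:
            return (fx, fy)
    return (0.0, 0.0)


rng = random.Random(0)
# Run 1 style magnitudes: uniform between 40 and 50 N.
pushes = plan_pushes(1200, lambda r: r.uniform(40.0, 50.0), rng)
print(force_at(5.05, pushes), force_at(5.2, pushes))
```

Keeping the schedule (`plan_pushes`) separate from the lookup (`force_at`) lets the same episode plan be queried at every simulation timestep.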
In the first run, the disturbance magnitudes were sampled from a uniform distribution between 40 and 50 N. In the second run, the maximum disturbance magnitude was increased gradually each episode, starting at 20 N and eventually reaching nearly 50 N. The minimum disturbance magnitude was set 10 N below the maximum, so the sampling range stayed a constant 10 N wide.
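The two magnitude schedules can be sketched as below. The description does not give run 2's exact increment, so this sketch assumes a hypothetical linear ramp of 0.009 N per disturbed episode, capped at 50 N, which would reach about 48.8 N after roughly 3200 disturbed episodes. The `disturbed_episode` counter (episodes after the warmup phase) and all function names are also assumptions.

```python
import random

RANGE_WIDTH = 10.0  # run 2: minimum is 10 N below the maximum


def run1_magnitude(rng):
    """Run 1: magnitude uniform in [40, 50] N."""
    return rng.uniform(40.0, 50.0)


def run2_max(disturbed_episode, start=20.0, step=0.009, cap=50.0):
    """Run 2: upper bound that grows each disturbed episode from 20 N.

    `step` (N per episode) and `cap` are hypothetical; the description only
    says the maximum started at 20 N and eventually reached nearly 50 N.
    """
    return min(start + step * disturbed_episode, cap)


def run2_magnitude(disturbed_episode, rng):
    """Run 2: magnitude uniform in [max - 10, max] N."""
    hi = run2_max(disturbed_episode)
    return rng.uniform(hi - RANGE_WIDTH, hi)


rng = random.Random(0)
# Upper bounds at the start of phase 2 and after 3200 disturbed episodes.
print(run2_max(0), run2_max(3200))
```

Tying the minimum to the maximum keeps the spread of run 2 disturbances fixed while the curriculum shifts the whole range upward.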
Both runs lasted roughly 3 million timesteps, but run 2 completed about 4200 episodes compared to over 5500 for run 1, so run 2 episodes were longer on average. This does not necessarily mean that run 2 produced the more effective final policy, since run 1's "disturbance curriculum" was more difficult. Still, near the end of run 2 the humanoid was evidently often able to recover from both large disturbances more effectively than near the end of run 1, despite having been exposed to fewer large-disturbance episodes.

Comments: 2

  • @krishnaprakashyadav9013 (1 year ago)

    How are you applying the push force? Is it continuous? If so, where is the push force applied?

  • @jerrysweaffordjr (1 year ago)

    In MuJoCo, you can apply external forces directly to any body part. The simulation runs in discrete timesteps rather than continuously, so at each timestep during a disturbance interval (e.g., 5.0 to 5.15 seconds), the push force is applied to the torso. Once the interval is over (e.g., after t = 5.15 seconds), the external force on the torso is set back to zero to end the disturbance.
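    The per-timestep approach in the reply maps onto MuJoCo's `xfrc_applied` array, which holds one row of six values per body (a Cartesian force followed by a torque, in world coordinates) and is exposed as `data.xfrc_applied` in the official `mujoco` Python bindings. The sketch below uses a plain nested list as a stand-in for that array so it runs without MuJoCo. `DT`, `TORSO_ID`, and the function names are hypothetical, not taken from the author's setup.

```python
DT = 0.015          # hypothetical control timestep (s)
PUSH_START = 5.0    # disturbance interval from the reply: 5.0 to 5.15 s
PUSH_END = 5.15
TORSO_ID = 1        # hypothetical torso body index


def update_torso_push(xfrc_applied, t, force_xyz):
    """Write the push into the torso row during the interval, else zero it.

    Each row mirrors MuJoCo's xfrc_applied layout: 3 force components
    followed by 3 torque components.
    """
    active = PUSH_START <= t < PUSH_END
    fx, fy, fz = force_xyz if active else (0.0, 0.0, 0.0)
    xfrc_applied[TORSO_ID][:] = [fx, fy, fz, 0.0, 0.0, 0.0]
    return active


def run_episode(n_steps, nbody=3, force_xyz=(45.0, 0.0, 0.0)):
    """Step a stand-in buffer and count the steps that carried the push."""
    xfrc_applied = [[0.0] * 6 for _ in range(nbody)]
    pushed_steps = 0
    for step in range(n_steps):
        t = step * DT
        if update_torso_push(xfrc_applied, t, force_xyz):
            pushed_steps += 1
        # A real loop would advance the simulation here.
    return pushed_steps, xfrc_applied


steps, buf = run_episode(400)
print(steps, buf[TORSO_ID])
```

    With the assumed 0.015 s step, the 0.15 s interval covers 10 discrete steps, and the torso row is back to zeros once the interval ends.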