This example shows an actor-critic model, consisting of a critic, which estimates how good the chosen action is, and an actor, which controls the agent's behavior. In this example, the actor and the critic share the same network:
from mxnet import gluon
from mxnet.gluon import nn
import mxnet.ndarray as F

class Policy(gluon.Block):
    def __init__(self, **kwargs):
        super(Policy, self).__init__(**kwargs)
        with self.name_scope():
            # Shared hidden layer over the 4-dimensional CartPole observation
            self.dense = nn.Dense(16, in_units=4, activation='relu')
            # Actor head: logits over the 2 possible actions
            self.action_pred = nn.Dense(2, in_units=16)
            # Critic head: scalar state-value estimate
            self.value_pred = nn.Dense(1, in_units=16)

    def forward(self, x):
        x = self.dense(x)
        probs = self.action_pred(x)
        values = self.value_pred(x)
        return F.softmax(probs), values
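As a quick sanity check, the shared network can be instantiated and run on a dummy observation. This is a minimal sketch, assuming the default CPU context; the observation values are made up for illustration:

import mxnet as mx

net = Policy()
net.collect_params().initialize(mx.init.Uniform(0.02))

# A CartPole observation has 4 features; batch size of 1.
obs = mx.nd.array([[0.01, -0.02, 0.03, 0.04]])
probs, value = net(obs)
print(probs.shape, value.shape)  # (1, 2) action probabilities, (1, 1) state value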
The example uses Gym, a toolkit for developing and comparing reinforcement learning algorithms. The model runs an instance of the CartPole-v0 environment, which simulates a pole attached by an un-actuated joint to a cart that moves along a frictionless track. The goal is to prevent the pole from falling over.
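For reference, interacting with CartPole-v0 follows the standard Gym reset/step loop. Here is a minimal sketch using random actions (this assumes the classic Gym API, where step returns a 4-tuple; it is not the training code from the example):

import gym

env = gym.make('CartPole-v0')
state = env.reset()
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()  # random action, for illustration only
    state, reward, done, info = env.step(action)
    total_reward += reward
print('episode reward:', total_reward)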
The example provides the following command-line options:
MXNet actor-critic example

optional arguments:
  -h, --help          show this help message and exit
  --gamma G           discount factor (default: 0.99)
  --seed N            random seed (default: 1)
  --render            render the environment
  --log-interval N    interval between training status logs (default: 10)
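The --gamma option sets the discount factor used when turning per-step rewards into returns. A minimal sketch of the standard discounted-return computation (not necessarily the exact code in actor_critic.py):

def discounted_returns(rewards, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1}, working backwards from the last step."""
    returns = []
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
        returns.insert(0, R)
    return returns

# Example: three steps with reward 1 each and gamma=0.99
print(discounted_returns([1, 1, 1]))  # approximately [2.9701, 1.99, 1.0]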
To run the model, type:
python actor_critic.py --render
You will get output like the following: