I am trying to create an actor policy: a neural network that maps observations (the state space) to actions (the action space) using tf_agents. Here is my implementation, heavily inspired by their tutorial (https://www.tensorflow.org/agents/tutorials/3_policies_tutorial):
import tensorflow as tf

from tf_agents.networks import network
from tf_agents.policies import actor_policy
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

input_tensor_spec = tensor_spec.TensorSpec((5,), tf.float32)
time_step_spec = ts.time_step_spec(input_tensor_spec)
action_spec = tensor_spec.BoundedTensorSpec((),
                                            tf.int32,
                                            minimum=0,
                                            maximum=9)
class ActionNet(network.Network):

  def __init__(self, input_tensor_spec, output_tensor_spec):
    super(ActionNet, self).__init__(
        input_tensor_spec=input_tensor_spec,
        state_spec=(),
        name='ActionNet')
    self._output_tensor_spec = output_tensor_spec
    self._sub_layers = [
        tf.keras.layers.Dense(100, activation=tf.nn.relu),
        tf.keras.layers.Dense(
            output_tensor_spec.shape.num_elements(),
            activation=tf.nn.sigmoid),
    ]

  def call(self, observations, step_type, network_state):
    del step_type  # unused
    output = tf.cast(observations, dtype=tf.float32)
    for layer in self._sub_layers:
      output = layer(output)
    actions = tf.reshape(
        output, [-1] + self._output_tensor_spec.shape.as_list())
    # Scale the sigmoid output to [0, 9] and round to the nearest integer.
    actions *= 9
    print(actions)  # debug
    actions = tf.math.round(actions)
    return actions, network_state
action_net = ActionNet(input_tensor_spec, action_spec)

my_actor_policy = actor_policy.ActorPolicy(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=action_net)
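For reference, this is how I intend to drive the policy once it constructs, adapted from the tutorial (the batch size and the all-ones observations are just placeholders):

# Hypothetical usage, adapted from the policies tutorial;
# batch_size and the all-ones observations are placeholders.
batch_size = 2
observations = tf.ones([batch_size] + input_tensor_spec.shape.as_list(),
                       dtype=tf.float32)
time_step = ts.restart(observations, batch_size=batch_size)
action_step = my_actor_policy.action(time_step)
print(action_step.action)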
I never get that far, though; constructing the ActorPolicy fails with the following error:
ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.float32, name=None)
vs.
BoundedTensorSpec(shape=(), dtype=tf.int32, name=None, minimum=array(0), maximum=array(9))
In call to configurable 'ActorPolicy' (<class 'tf_agents.policies.actor_policy.ActorPolicy'>)
This is essentially saying that the output of my neural network is not a bounded tensor. How do I convert the network's output into a bounded tensor? In my case, since I want the output to be between 0 and 9, I simply multiply the sigmoid output by 9 and round the result. That doesn't work: the inferred output spec is still an unbounded tf.float32 TensorSpec, while the action spec is a bounded tf.int32 one.
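My best guess is that the network's output dtype at least has to match the spec's tf.int32, for example by casting after rounding. A sketch of the tail of call with the cast I have in mind (I'm not sure this is the right approach, since round and cast are not differentiable):

  def call(self, observations, step_type, network_state):
    ...
    actions = tf.math.round(actions)
    # Guess: cast to the spec's dtype so the inferred output spec is tf.int32.
    actions = tf.cast(actions, self._output_tensor_spec.dtype)
    return actions, network_state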
Many thanks.