
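I'm trying to create an actor policy with tf_agents: a neural network that maps observations (the state space) to actions (the action space). Here is my implementation (heavily inspired by their tutorial: https://www.tensorflow.org/agents/tutorials/3_policies_tutorial):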

```python
import tensorflow as tf
from tf_agents.networks import network
from tf_agents.policies import actor_policy
from tf_agents.specs import tensor_spec
from tf_agents.trajectories import time_step as ts

input_tensor_spec = tensor_spec.TensorSpec((5,), tf.float32)
time_step_spec = ts.time_step_spec(input_tensor_spec)
# Discrete action: a scalar int32 in [0, 9]
action_spec = tensor_spec.BoundedTensorSpec((),
                                            tf.int32,
                                            minimum=0,
                                            maximum=9)


class ActionNet(network.Network):

    def __init__(self, input_tensor_spec, output_tensor_spec):
        super(ActionNet, self).__init__(
            input_tensor_spec=input_tensor_spec,
            state_spec=(),
            name='ActionNet')
        self._output_tensor_spec = output_tensor_spec
        self._sub_layers = [
            tf.keras.layers.Dense(100, activation=tf.nn.relu),
            # One sigmoid unit per action element, squashed to (0, 1)
            tf.keras.layers.Dense(
                output_tensor_spec.shape.num_elements(), activation=tf.nn.sigmoid),
        ]

    def call(self, observations, step_type, network_state):
        del step_type  # unused

        output = tf.cast(observations, dtype=tf.float32)
        for layer in self._sub_layers:
            output = layer(output)
        # Reshape to [batch_size] + action shape
        actions = tf.reshape(output, [-1] + self._output_tensor_spec.shape.as_list())

        # Scale (0, 1) up to (0, 9), then round to the nearest whole number
        actions *= 9
        print(actions)
        actions = tf.math.round(actions)

        return actions, network_state


action_net = ActionNet(input_tensor_spec, action_spec)

my_actor_policy = actor_policy.ActorPolicy(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    actor_network=action_net)
```
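To see what the post-processing at the end of `call` produces, the sketch below mirrors it for a single pre-activation value in plain Python. `sigmoid` stands in for `tf.nn.sigmoid` and `round(v, 0)` stands in for `tf.math.round`, assuming both round half to even and both keep a floating-point result. `sigmoid` and `scaled_action` are hypothetical helper names used only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Plain-Python stand-in for tf.nn.sigmoid on a single value."""
    return 1.0 / (1.0 + math.exp(-x))


def scaled_action(pre_activation: float) -> float:
    # Same steps as ActionNet.call: sigmoid -> multiply by 9 -> round.
    # round(v, 0) rounds half to even and returns a float, like tf.math.round.
    return round(sigmoid(pre_activation) * 9, 0)


for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    a = scaled_action(x)
    print(x, a, type(a).__name__)
```

The values stay within 0 to 9, but each one is still a float (for example `4.0` for an input of `0.0`), which is consistent with the `dtype=tf.float32` reported for the network output in the error.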

I get the following error:

```
ValueError: actor_network output spec does not match action spec:
TensorSpec(shape=(), dtype=tf.float32, name=None)
vs.
BoundedTensorSpec(shape=(), dtype=tf.int32, name=None, minimum=array(0), maximum=array(9))
  In call to configurable 'ActorPolicy' (<class 'tf_agents.policies.actor_policy.ActorPolicy'>)
```

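This is basically saying that the output of my neural network is not a bounded tensor. How do I convert the network's output into a bounded tensor? In my case, since I want the output to lie between 0 and 9, I just multiply the sigmoid output by 9 and round the result. That doesn't work, because the type is still an unbounded tensor.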
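The error compares the network's output spec against the action spec. Below is a simplified, hypothetical pure-Python model of that comparison, assuming it behaves like `tf.TensorSpec.is_compatible_with`, which checks dtype and shape but not `minimum`/`maximum`. `Spec`, `BoundedSpec` and `is_compatible` are illustrative names, not tf_agents APIs. Under that assumption, the check fails on the float32 vs int32 dtype, and the bounds are never compared.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for tf.TensorSpec: just a shape and a dtype name."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class BoundedSpec(Spec):
    """Hypothetical stand-in for BoundedTensorSpec: adds inclusive bounds."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None


def is_compatible(output_spec: Spec, action_spec: Spec) -> bool:
    # Assumed behaviour: only dtype and shape are compared; bounds are ignored.
    # Simplification: shapes use plain equality, whereas TensorFlow also
    # accepts unknown dimensions as compatible.
    return (output_spec.dtype == action_spec.dtype
            and output_spec.shape == action_spec.shape)


action_spec = BoundedSpec(shape=(), dtype="int32", minimum=0, maximum=9)

float_output = Spec(shape=(), dtype="float32")  # what ActionNet currently produces
int_output = Spec(shape=(), dtype="int32")      # unbounded, but int32

print(is_compatible(float_output, action_spec))  # False
print(is_compatible(int_output, action_spec))    # True
```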

Thanks a lot.
