python - Shape of _observation_spec and shape of _action_spec in the TF-Agents environment example

Question

The TensorFlow documentation for TF-Agents Environments includes an example environment for a simple (blackjack-inspired) card game.

The initialization looks like this:

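import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec

class CardGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    # Scalar action: 0 or 1.
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
    # One-element observation: the running sum of the cards.
    self._observation_spec = array_spec.BoundedArraySpec(
        shape=(1,), dtype=np.int32, minimum=0, name='observation')
    self._state = 0
    self._episode_ended = False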

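The action spec allows only 0 (do not ask for a card) or 1 (ask for a card), so it makes sense that its shape is shape=() (only a single integer is needed).

However, I don't quite understand why the observation spec has shape=(1,), since it only represents the sum of the cards in the current round (so it is also a single integer).

What explains the difference in shapes?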

score 0 · Accepted Answer

At first I thought they were the same. To test them, I ran the following code in the W3Schools Python "Try it" editor:

import numpy as np

# shape=() gives a 0-D array; shape=(1,) gives a 1-D array with one element.
arr1 = np.zeros((), dtype=np.int32)
arr2 = np.zeros((1,), dtype=np.int32)

print("This is the first array:", arr1, "\n")
print("This is the second array:", arr2, "\n")

The output I got was:

This is the first array: 0

This is the second array: [0]

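This led me to conclude that shape=() is a plain integer treated as a 0-D array, whereas shape=(1,) is a 1-D array containing a single integer. I hope this is accurate, since I would like some confirmation myself. I checked further in a second test: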

import numpy as np

arr1 = np.array(42)
arr2 = np.array([1])
arr3 = np.array([1, 2, 3, 4])

print(arr1.shape)
print(arr2.shape)
print(arr3.shape)

The output was:

()
(1,)
(4,)

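This seems to agree with my first conclusion: arr1 is a 0-D array, arr3 is a 1-D array with 4 elements (as described in the W3Schools tutorial), and arr2 has the same kind of shape as arr3, just with a different number of elements.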
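Beyond the printed shapes, the two kinds of array also behave differently when indexed. The following is a minimal sketch using only NumPy (the names scalar and vector are mine, not from the tutorial) that shows the difference in ndim and indexing while both still hold exactly one value:

```python
import numpy as np

scalar = np.zeros((), dtype=np.int32)    # 0-D array, shape ()
vector = np.zeros((1,), dtype=np.int32)  # 1-D array, shape (1,)

print(scalar.ndim, scalar.size)  # 0 1
print(vector.ndim, vector.size)  # 1 1

# A 1-D array can be indexed by position; a 0-D array cannot.
print(vector[0])
try:
    scalar[0]
except IndexError as err:
    print("0-D arrays cannot be indexed:", err)

# Both hold a single value, which item() returns as a plain Python int.
print(scalar.item(), vector.item())
```

So the two specs describe the same amount of data (one integer) and differ only in whether that integer is wrapped in an extra axis.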

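As for why the action and the observation are represented as an integer and a one-element array respectively, it may be because TensorFlow works with tensors (n-dimensional arrays), and treating the observation as an array may make it easier to compute with.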
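One way to see what the extra axis buys is to look at what happens when several steps are stacked into a batch. This is only a NumPy sketch with made-up values, and it is my assumption (not something the tutorial states) that the batched layout is the reason for the choice:

```python
import numpy as np

# Made-up observations of shape (1,) and actions of shape ().
observations = [np.array([s], dtype=np.int32) for s in (3, 12, 19)]
actions = [np.array(a, dtype=np.int32) for a in (1, 1, 0)]

obs_batch = np.stack(observations)
act_batch = np.stack(actions)

print(obs_batch.shape)  # (3, 1): one row per step, one feature per row
print(act_batch.shape)  # (3,): one action index per step
```

A (batch, features) layout is the usual input for a dense layer, while a flat vector of action indices is a common layout for discrete actions, which would be consistent with the shapes chosen in the example.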

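The action was probably declared as a single integer to simplify the flow inside the _step() function, since handling an array in an if/elif/else structure would be somewhat tedious. There are other examples of action_specs with more elements and with discrete or continuous values, so nothing else comes to mind.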
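To illustrate that point, here is a rough sketch of how a scalar action and a one-element observation could fit together. The step function and the fixed card value of 7 are hypothetical placeholders of mine; this is not the tutorial's _step() implementation:

```python
import numpy as np

def step(state, action):
    """Hypothetical stand-in for _step(): returns (new_state, observation)."""
    if action == 1:   # a 0-D action compares like a plain integer
        state += 7    # placeholder for the value of the drawn card
    # Wrap the running total in a 1-element array to match shape=(1,).
    observation = np.array([state], dtype=np.int32)
    return state, observation

state, obs = step(0, np.array(1, dtype=np.int32))
print(state, obs, obs.shape)
```

With a multi-element action array, the comparison in the if statement would be ambiguous and would need explicit indexing, which is one plausible reason to keep a two-choice action as a scalar.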

I'm not sure all of this is correct, but it at least seems like a good starting point for discussion.
