python - Benefits of storing the state as a list/integer in TensorFlow Agents

Question

In the TF-Agents environments tutorial (https://www.tensorflow.org/agents/tutorials/2_environments_tutorial), the state is stored as an integer. Whenever the state is needed, it is converted to a NumPy array:

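from tf_agents.environments import py_environment
from tf_agents.trajectories import time_step as ts
import numpy as np

class CardGameEnv(py_environment.PyEnvironment):

    def __init__(self):
        # State kept as a plain Python integer.
        self._state = 0

    def _step(self, action):
        # Convert to a NumPy array only when building the observation.
        state_array = np.array([self._state], dtype=np.int32)
        return ts.transition(state_array, reward=1.0, discount=0.9)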

Is there a reason they do it this way instead of storing the state directly as a NumPy array? Something like this:

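from tf_agents.environments import py_environment
from tf_agents.trajectories import time_step as ts
import numpy as np

class CardGameEnv(py_environment.PyEnvironment):

    def __init__(self):
        # State kept as a NumPy array from the start.
        self._state = np.array([0], dtype=np.int32)

    def _step(self, action):
        # No conversion needed; the stored array is the observation.
        return ts.transition(self._state, reward=1.0, discount=0.9)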

Are there any drawbacks to the second approach, or is it equally valid?

score 1 · Accepted Answer

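For convenience, I often don't store the data as a NumPy array. Sometimes I use a pandas DataFrame and sometimes a list; it depends on how you update the current state.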

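That said, storing the state as a NumPy array is more efficient, because the state does not have to be converted to a NumPy array each time the observation is returned in the transition.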
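As a rough illustration of the two storage styles, here is a minimal sketch that uses only NumPy. The classes `IntStateCounter` and `ArrayStateCounter` and their `step` method are hypothetical stand-ins made up for this example, not part of TF-Agents. It assumes the state is a single counter that grows by the action value. Both styles yield the same observations; the only difference is where the array is created.

```python
import numpy as np


class IntStateCounter:
    """Hypothetical stand-in: state kept as a plain Python int."""

    def __init__(self):
        self._state = 0

    def step(self, action):
        self._state += action
        # A fresh array is built on every call.
        return np.array([self._state], dtype=np.int32)


class ArrayStateCounter:
    """Hypothetical stand-in: state kept as a NumPy array."""

    def __init__(self):
        self._state = np.array([0], dtype=np.int32)

    def step(self, action):
        # In-place update of the stored array.
        self._state[0] += action
        # Return a copy so observations handed out earlier
        # are not changed by later in-place updates.
        return self._state.copy()


int_env = IntStateCounter()
arr_env = ArrayStateCounter()

int_obs = [int_env.step(a) for a in (1, 2, 3)]
arr_obs = [arr_env.step(a) for a in (1, 2, 3)]
```

One thing to keep in mind with the array-backed style: if the stored array is updated in place and returned without copying, every observation returned so far refers to the same array and will show the latest value. If the state is reassigned to a new array on each update instead, the copy is not needed.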
