
I'm new to TensorFlow and Keras. I've loaded a dataset from a CSV file and created a train_dataset:

column_names = ['a', 'b', 'date', 'd', 'e', 'f']
label_name = column_names[0]
feature_names = column_names[1:]
class_names = ['good', 'bad']

train_dataset = tf.data.experimental.make_csv_dataset(
    train_dataset_fp,
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)

features, labels = next(iter(train_dataset))
print(features)

My features are an OrderedDict, which prints as:

OrderedDict([
  ('b', <tf.Tensor: shape=(32,), dtype=int32, numpy=array([1, 1, 0, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1], dtype=int32)>),
  ('date', <tf.Tensor: shape=(32,), dtype=int64, numpy=array([-9223372036855, 1262478794000, 1262426153000, 1262431717000, 1262425334000, 1262588520000, 1262425515000, 1262418072000, 1262420797000, 1262428601000, 1262590037000, 1262421322000, 1262433023000, 1262390762000, 1262590200000, 1262432769000, 1262427397000, -9223372036855, 1262425996000, 1262430050000, 1262431867000, 1262424427000, 1262420906000, 1262391208000, 1262590114000, -9223372036855, 1262589645000, 1262424306000, 1262428178000, 1262421300000, 1262423456000, 1262515569000])>),
  ('d', <tf.Tensor: shape=(32,), dtype=int32, numpy=array([357, 313, 557, 691, 292, 557, 605, 605, 48, 295, 81, 656, 321, 734, 584, 652, 575, 465, 71, 453, 196, 48, 689, 591, 676, 271, 67, 229, 740, 713, 230, 664], dtype=int32)>),
  ('e', <tf.Tensor: shape=(32,), dtype=int32, numpy=array([519, 537, 610, 178, 552, 610, 240, 240, 343, 643, 481, 340, 362, 143, 511, 167, 5, 685, 436, 105, 659, 343, 427, 242, 30, 717, 531, 492, 433, 452, 645, 303], dtype=int32)>),
  ('f', <tf.Tensor: shape=(32,), dtype=int32, numpy=array([345, 545, 1663, 1426, 2065, 1017, 1655, 47, 2070, -1, 1191, 191, 1569, 547, 1295, 1776, 1620, 680, 1990, 1642, 1930, 1465, 1887, 2128, 999, 447, 844, 1851, 1586, 1742, 2079, 729], dtype=int32)>)])

As you can see, one of them has dtype=int64. I then use the following function to pack the features into a single array:

def pack_features_vector(features, labels):
  features = tf.stack(list(features.values()), axis=1)
  return features, labels
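For reference, here is a minimal sketch of what tf.stack with axis=1 does to a list of same-length, same-dtype column tensors. The tensors col_a, col_b and col_c are made up for illustration and are not part of the dataset above.

```python
import tensorflow as tf

# Three hypothetical feature columns, each holding one value per row.
col_a = tf.constant([1, 2], dtype=tf.int32)
col_b = tf.constant([3, 4], dtype=tf.int32)
col_c = tf.constant([5, 6], dtype=tf.int32)

# Stacking along axis=1 turns N columns of shape (rows,) into one
# tensor of shape (rows, N): each row holds one value from every column.
stacked = tf.stack([col_a, col_b, col_c], axis=1)
print(stacked.shape)
print(stacked.numpy())
```

Every tensor in the list must share one dtype for this to work, which is why a single int64 column among int32 columns is a problem.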

However, when I run this:

train_dataset = train_dataset.map(pack_features_vector)

I get the following error:

"TypeError: Tensors in list passed to 'values' of 'Pack' Op have types [int32, int64, int32, int32, int32] that don't all match."

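I know the problem is the stack function. My second feature is a date in epoch format, which is read as int64. I think casting all the tensors to the same dtype would be the simplest fix, but I'm not sure how to do it. I can see that the feature collection is an OrderedDict of NumPy arrays, but I don't know how to change the dtype of its items. I tried the following, which didn't produce a traceback, but when I print my features again the dtypes are all unchanged: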

for k,v in train_dataset:
  tf.dtypes.cast(v, tf.int64)
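A likely reason the loop above changes nothing is that tf.cast returns a new tensor instead of modifying its argument, and the result is discarded; the dataset itself is never touched. One possible approach is to do the cast inside the mapped function, before tf.stack. The sketch below uses a small in-memory dataset (toy_dataset, with made-up values) as a stand-in for the CSV data, so it only illustrates the idea:

```python
import collections
import tensorflow as tf

# Hypothetical stand-in for the CSV batch: mixed int32/int64 columns.
features = collections.OrderedDict([
    ("b", tf.constant([1, 0, 2], dtype=tf.int32)),
    ("date", tf.constant([1262478794000, 1262426153000, 1262431717000],
                         dtype=tf.int64)),
    ("d", tf.constant([357, 313, 557], dtype=tf.int32)),
])
labels = tf.constant([0, 1, 0], dtype=tf.int32)
toy_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(3)

def pack_features_vector(features, labels):
    # Keep the tensors returned by tf.cast; the inputs are left unchanged.
    casted = [tf.cast(v, tf.int64) for v in features.values()]
    # All columns now share one dtype, so they can be stacked.
    return tf.stack(casted, axis=1), labels

packed_dataset = toy_dataset.map(pack_features_vector)
packed_features, packed_labels = next(iter(packed_dataset))
print(packed_features.dtype, packed_features.shape)
```

int64 is used here because it is the widest dtype among the columns, so the epoch timestamps are not truncated.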

I would appreciate any help. Thank you.


1 Answer


I think I've figured it out. I needed to add column_defaults:

column_names = ['a', 'b', 'date', 'd', 'e', 'f']
column_defaults=[tf.int64, tf.int64, tf.int64, tf.int64, tf.int64, tf.int64]
label_name = column_names[0]
feature_names = column_names[1:]
class_names = ['good', 'bad']

batch_size = 32

train_dataset = tf.data.experimental.make_csv_dataset(
    train_dataset_fp,
    batch_size,
    column_names=column_names,
    column_defaults=column_defaults,
    label_name=label_name,
    num_epochs=1)

features, labels = next(iter(train_dataset))

def pack_features_vector(features, labels):
  features = tf.stack(list(features.values()), axis=1)
  return features, labels

train_dataset = train_dataset.map(pack_features_vector)
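To check the effect of column_defaults in isolation, the sketch below writes a tiny CSV to a temporary file and loads it with every column declared as tf.int64. The file contents, csv_path and toy_dataset are invented for illustration; only the column names come from the code above.

```python
import os
import tempfile
import tensorflow as tf

# Hypothetical three-row CSV with the same column layout as above.
csv_text = (
    "a,b,date,d,e,f\n"
    "0,1,1262478794000,357,519,345\n"
    "1,0,1262426153000,313,537,545\n"
    "0,2,1262431717000,557,610,1663\n"
)
csv_path = os.path.join(tempfile.mkdtemp(), "toy.csv")
with open(csv_path, "w") as f:
    f.write(csv_text)

column_names = ['a', 'b', 'date', 'd', 'e', 'f']
toy_dataset = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=3,
    column_names=column_names,
    # One entry per column; a bare dtype fixes that column's type.
    column_defaults=[tf.int64] * len(column_names),
    label_name=column_names[0],
    num_epochs=1,
    shuffle=False)

toy_features, toy_labels = next(iter(toy_dataset))
for name, tensor in toy_features.items():
    print(name, tensor.dtype)
```

Note that, per the make_csv_dataset documentation, passing a bare dtype (rather than a default-value tensor) also marks the column as required, so rows with an empty field in that column would raise an error.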

But now I'm trying to create a model and make predictions:

model = tf.keras.Sequential([
  tf.keras.layers.Dense(128, activation=tf.nn.relu, input_shape=(5,)),
  tf.keras.layers.Dense(128, activation=tf.nn.relu),
  tf.keras.layers.Dense(2)
])

predictions = model(features)

And I get "KeyError: 'dense_input'", and I don't know why.
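One possible explanation, offered as a guess: features was pulled from train_dataset before the map call, so it is still an OrderedDict rather than a packed tensor, and Keras then tries to look up a dictionary entry named after the model's input ('dense_input'). The sketch below fetches the batch after map and casts it to float32 before calling the model. The in-memory toy_dataset and its values are made up, and tf.keras.Input is used in place of input_shape purely as a choice for this sketch.

```python
import tensorflow as tf

# Hypothetical stand-in for one batch of CSV features: five int64 columns.
features = {
    "b": tf.constant([1, 0, 2], dtype=tf.int64),
    "date": tf.constant([1262478794000, 1262426153000, 1262431717000],
                        dtype=tf.int64),
    "d": tf.constant([357, 313, 557], dtype=tf.int64),
    "e": tf.constant([519, 537, 610], dtype=tf.int64),
    "f": tf.constant([345, 545, 1663], dtype=tf.int64),
}
labels = tf.constant([0, 1, 0], dtype=tf.int64)
toy_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(3)

def pack_features_vector(features, labels):
    stacked = tf.stack(list(features.values()), axis=1)
    # Dense layers compute in floating point, so cast the packed features.
    return tf.cast(stacked, tf.float32), labels

toy_dataset = toy_dataset.map(pack_features_vector)

# Fetch the batch AFTER map so it is a (batch, 5) tensor, not a dict.
packed_features, packed_labels = next(iter(toy_dataset))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(2),
])

predictions = model(packed_features)
print(predictions.shape)
```

Raw epoch timestamps are very large compared with the other features, so scaling them before training may be worth considering; that is outside what this sketch shows.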

answered 2020-08-01T19:38:52.397