I'm new to TensorFlow and Keras. I've loaded a dataset from a CSV file and created a train_dataset:
import tensorflow as tf

column_names = ['a', 'date', 'c', 'd', 'e', 'f']
label_name = column_names[0]
feature_names = column_names[1:]
class_names = ['good', 'bad']
train_dataset = tf.data.experimental.make_csv_dataset(
    train_dataset_fp,
    batch_size,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1)
features, labels = next(iter(train_dataset))
print(features)
My features are an OrderedDict, which prints as:
OrderedDict([('b', <tf.Tensor: shape=(32,), dtype=int32, numpy=
array([1, 1, 0, 0, 0, 1, 0, 1, 0, 2, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1], dtype=int32)>),
('date', <tf.Tensor: shape=(32,), dtype=int64, numpy=
array([-9223372036855, 1262478794000, 1262426153000, 1262431717000, 1262425334000, 1262588520000, 1262425515000, 1262418072000, 1262420797000, 1262428601000, 1262590037000, 1262421322000, 1262433023000, 1262390762000, 1262590200000, 1262432769000, 1262427397000, -9223372036855, 1262425996000, 1262430050000, 1262431867000, 1262424427000, 1262420906000, 1262391208000, 1262590114000, -9223372036855, 1262589645000, 1262424306000, 1262428178000, 1262421300000, 1262423456000, 1262515569000])>),
('d', <tf.Tensor: shape=(32,), dtype=int32, numpy=
array([357, 313, 557, 691, 292, 557, 605, 605, 48, 295, 81, 656, 321, 734, 584, 652, 575, 465, 71, 453, 196, 48, 689, 591, 676, 271, 67, 229, 740, 713, 230, 664], dtype=int32)>),
('e', <tf.Tensor: shape=(32,), dtype=int32, numpy=
array([519, 537, 610, 178, 552, 610, 240, 240, 343, 643, 481, 340, 362, 143, 511, 167, 5, 685, 436, 105, 659, 343, 427, 242, 30, 717, 531, 492, 433, 452, 645, 303], dtype=int32)>),
('f', <tf.Tensor: shape=(32,), dtype=int32, numpy=
array([345, 545, 1663, 1426, 2065, 1017, 1655, 47, 2070, -1, 1191, 191, 1569, 547, 1295, 1776, 1620, 680, 1990, 1642, 1930, 1465, 1887, 2128, 999, 447, 844, 1851, 1586, 1742, 2079, 729], dtype=int32)>)])
As you can see, one of them has dtype=int64. I then use the following function to pack the features into a single array:
def pack_features_vector(features, labels):
    # Stack the per-column tensors into one (batch_size, num_features) tensor.
    features = tf.stack(list(features.values()), axis=1)
    return features, labels
However, when I run this:
train_dataset = train_dataset.map(pack_features_vector)
I get the following error:
"TypeError: Tensors in list passed to 'values' of 'Pack' Op have types [int32, int64, int32, int32, int32] that don't all match."
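The dtype list in the error follows the order of the feature columns, so the int64 entry corresponds to the second column. As a minimal sketch for checking each column's dtype without iterating over the data, a dataset's element_spec can be inspected. The toy dataset below is a hypothetical stand-in for the CSV data (made-up values, only three columns), not the real file:

```python
import collections

import tensorflow as tf

# Hypothetical mini-batch with the same int32/int64 mix as in the error message.
features = collections.OrderedDict([
    ("b", tf.constant([1, 0], dtype=tf.int32)),
    ("date", tf.constant([1262478794000, 1262426153000], dtype=tf.int64)),
    ("d", tf.constant([357, 313], dtype=tf.int32)),
])
labels = tf.constant([0, 1], dtype=tf.int32)
toy_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# element_spec describes every component of an element (here: features, labels).
feature_spec, label_spec = toy_dataset.element_spec
column_dtypes = {name: spec.dtype for name, spec in feature_spec.items()}
for name, dtype in column_dtypes.items():
    print(name, dtype.name)
```

The same lookup should work on the dataset returned by make_csv_dataset, since it also yields (features, labels) pairs.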
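I know the problem is the stack call. My second feature is a date in epoch format, and it is read as int64. I think casting all the tensors to the same dtype would be the simplest fix, but I'm not sure how to do it. I can see that the features collection is an OrderedDict of NumPy arrays, but I don't know how to change the dtype of its items. I tried the following, which doesn't produce a traceback, but when I print my features again all the dtypes are unchanged: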
for k, v in train_dataset:
    tf.dtypes.cast(v, tf.int64)
I would appreciate any help. Thank you.
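One possible approach, as a minimal sketch rather than a definitive fix: tf.cast returns a new tensor and does not modify its input, so a loop that discards the result leaves the dataset unchanged. In that loop, k is also the whole features dict and v is the labels tensor, so the cast is applied to the labels. Casting inside the mapped function, just before tf.stack, avoids both issues. The toy dataset below is a hypothetical stand-in with made-up values. The choice of int64 as the common dtype is an assumption, made because epoch milliseconds do not fit in int32:

```python
import collections

import tensorflow as tf

# Hypothetical stand-in for one batch from make_csv_dataset (values made up).
features = collections.OrderedDict([
    ("b", tf.constant([1, 0, 2], dtype=tf.int32)),
    ("date", tf.constant([1262478794000, 1262426153000, 1262431717000],
                         dtype=tf.int64)),
    ("d", tf.constant([357, 313, 557], dtype=tf.int32)),
])
labels = tf.constant([0, 1, 0], dtype=tf.int32)
toy_dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(3)


def pack_features_vector(features, labels):
    """Cast every column to int64, then stack into a (batch, n_features) tensor."""
    # tf.cast returns a new tensor, so its result has to be kept and used.
    columns = [tf.cast(column, tf.int64) for column in features.values()]
    return tf.stack(columns, axis=1), labels


packed_dataset = toy_dataset.map(pack_features_vector)
packed_features, packed_labels = next(iter(packed_dataset))
print(packed_features.dtype, packed_features.shape)
```

If the model needs floating-point inputs, note that casting epoch milliseconds to float32 loses precision, so rescaling the date column or using float64 may be worth considering.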