
I have one serialized file for each class in my dataset. I would like to use queues to load up each of these files and then place them in a RandomShuffleQueue that will pull them off so I get a random mix of examples from each class. I thought this code would work.

In this example each file has 10 examples.

filenames = ["a", "b", ...]

with self.test_session() as sess:
  # for each file open a queue and get that
  # queue's results. 
  strings = []
  rq = tf.RandomShuffleQueue(1000, 10, [tf.string], shapes=())
  for filename in filenames:
    q = tf.FIFOQueue(99, [tf.string], shapes=())
    q.enqueue([filename]).run()
    q.close().run()
    # read_string just pulls a string from the file
    key, out_string = input_data.read_string(q, IMAGE_SIZE, CHANNELS, LABEL_BYTES)
    strings.append(out_string)

    rq.enqueue([out_string]).run()

  rq.close().run()
  qs = rq.dequeue()
  label, image = input_data.string_to_data(qs, IMAGE_SIZE, CHANNELS, LABEL_BYTES)
  for i in range(11):
    l, im = sess.run([label, image])
    print("L: {}".format(l))

This works fine for 10 calls, but on the 11th it says that the queue is empty.

I believe this is due to a misunderstanding on my part of what these queues operate on. I add 10 variables to the RandomShuffleQueue, but each of those variables is itself pulling from a queue, so I assumed the queue would not be emptied until each of the file queues was empty.

What am I doing wrong here?


1 Answer


The right answer to this question depends on how many files you have, how large they are, and how their sizes are distributed.

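The immediate problem with your example is that rq gets only one element for each filename in filenames, and then the queue is closed. I'm assuming there are 10 filenames, since each call to sess.run([label, image]) runs rq.dequeue(), which consumes one element of rq. Because the queue is closed, no more elements can be added, and the 11th run of the rq.dequeue() operation fails.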

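The general solution is to create additional threads that keep running rq.enqueue([out_string]) in a loop. TensorFlow includes a QueueRunner class designed to simplify this, along with some other functions that handle common cases. The documentation on threading and queues explains how they are used, and there is also some good information on using queues to read from files.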
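As a rough illustration of the "extra threads keep the enqueue running" idea, here is a plain-Python analogy built on the standard library's threading and queue modules. It is not TensorFlow code and does not use QueueRunner; the function name start_producers and the sample data are made up for this sketch, and the stop event only loosely plays the role of a shutdown signal.

```python
import itertools
import queue
import threading


def start_producers(sources, out_queue, stop_event):
    """Start one daemon thread per source; each keeps enqueuing its items in a loop.

    Hypothetical helper for this sketch, not part of any library.
    """
    def produce(items):
        # cycle() restarts the source when it is exhausted, so the shared
        # queue keeps being refilled until the stop event is set.
        for item in itertools.cycle(items):
            while True:
                if stop_event.is_set():
                    return
                try:
                    out_queue.put(item, timeout=0.05)
                    break
                except queue.Full:
                    pass  # queue is full: retry, re-checking the stop flag

    threads = [
        threading.Thread(target=produce, args=(items,), daemon=True)
        for items in sources
    ]
    for t in threads:
        t.start()
    return threads


def take(out_queue, n):
    """Dequeue n items, waiting briefly for the producers if needed."""
    return [out_queue.get(timeout=1.0) for _ in range(n)]


# Two made-up "files" with two examples each (assumed data).
sources = [["a0", "a1"], ["b0", "b1"]]
shared = queue.Queue(maxsize=4)
stop = threading.Event()
threads = start_producers(sources, shared, stop)

# More dequeues than there are distinct examples: this works because the
# producers are still running, unlike a queue filled once and then closed.
taken = take(shared, 11)

stop.set()
for t in threads:
    t.join()
```

The point of the sketch is only that the consumer never drains the queue for good while something keeps feeding it; in TensorFlow that job is handled by the threads a QueueRunner starts.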

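As for your particular problem, one way to handle it would be to create N readers (one for each of the N files). You could then tf.pack() N elements (one from each reader) into a batch, and use enqueue_many to add a batch at a time to a tf.RandomShuffleQueue with a sufficiently large capacity and min_after_dequeue to ensure that there is sufficient mixing between the classes. Calling dequeue_many(k) on the RandomShuffleQueue would give you a batch of k elements sampled from each file with equal probability.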
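The mixing scheme described above can be modelled in a few lines of plain Python. This is only a toy model under stated assumptions, not TensorFlow code: ShuffleBuffer, class_reader and fill_and_sample are hypothetical names, the readers loop over made-up (label, index) pairs instead of real files, and the buffer refuses to dequeue below min_after_dequeue where a real queue would block.

```python
import random


class ShuffleBuffer:
    """Toy stand-in for a shuffle queue with a minimum-fill threshold."""

    def __init__(self, min_after_dequeue, seed=None):
        self.min_after_dequeue = min_after_dequeue
        self._items = []
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._items)

    def enqueue_many(self, batch):
        self._items.extend(batch)

    def can_dequeue(self, k):
        # At least min_after_dequeue elements must remain after taking k.
        return len(self._items) - k >= self.min_after_dequeue

    def dequeue_many(self, k):
        if not self.can_dequeue(k):
            raise RuntimeError("not enough buffered elements")
        out = []
        for _ in range(k):
            # Pick a uniformly random element, swap it to the end, pop it.
            i = self._rng.randrange(len(self._items))
            self._items[i], self._items[-1] = self._items[-1], self._items[i]
            out.append(self._items.pop())
        return out


def class_reader(label, n_examples):
    """Hypothetical reader for one class file: yields (label, index) forever."""
    while True:
        for i in range(n_examples):
            yield (label, i)


def fill_and_sample(readers, buf, k):
    """Add one element per reader at a time until k elements can be dequeued."""
    while not buf.can_dequeue(k):
        # One element from every reader forms a batch of N (the "pack" step),
        # so every class contributes equally to the buffer.
        buf.enqueue_many([next(r) for r in readers])
    return buf.dequeue_many(k)


readers = [class_reader(c, 10) for c in ("a", "b", "c")]
buf = ShuffleBuffer(min_after_dequeue=30, seed=0)
batch = fill_and_sample(readers, buf, 6)
print(batch)
```

Because each enqueued batch holds exactly one element per class, the buffer holds the same number of elements from every class when sampling starts, which is what keeps the draw balanced. A larger min_after_dequeue means more elements to choose from at each draw.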

answered 2015-11-19T07:14:17.753