python - Reading files in a directory in parallel with aiofiles in Python

Question

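I am looking at uploading some files from a local directory to cloud storage. The directory contains multiple files, some of which may be very large, so I want to upload them in parallel, reading each file in chunks so that a whole file is never held in memory.

Right now I can do this sequentially, and it works: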

position=0
<methodtocreatezerokbfilecloudstorage>
with file_chunks(localfilefullpath,chunk_size) as chunks:
    for chunk in chunks:
        <methodforappendingdatatocloudstorage>
        position=position + len(chunk)
<finalizethefileincloudstorage>

file_chunks is defined as

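from contextlib import contextmanager

import smbclient

@contextmanager
def file_chunks(filename, chunk_size):
    # Open the file on the SMB share in binary mode
    f = smbclient.open_file(filename, 'rb')
    try:
        def gen():
            # Yield chunk_size bytes at a time until read() returns b''
            b = f.read(chunk_size)
            while b:
                yield b
                b = f.read(chunk_size)
        yield gen()
    finally:
        # Close the file even if the caller's with-block raises
        f.close()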
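
To try the chunked reader without an SMB share, here is a minimal sketch that swaps `smbclient.open_file` for the built-in `open` (an assumption for illustration only; the original reads over SMB). `local_file_chunks`, `count_bytes`, and `demo_count` are hypothetical names, and `count_bytes` stands in for the append-to-cloud-storage step by tracking the position only.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def local_file_chunks(filename, chunk_size):
    # Same shape as file_chunks, but using the built-in open() (assumption)
    f = open(filename, "rb")
    try:
        def gen():
            b = f.read(chunk_size)
            while b:
                yield b
                b = f.read(chunk_size)
        yield gen()
    finally:
        f.close()


def count_bytes(filename, chunk_size):
    # Hypothetical stand-in for the upload loop: only tracks the position
    position = 0
    with local_file_chunks(filename, chunk_size) as chunks:
        for chunk in chunks:
            position += len(chunk)
    return position


def demo_count():
    # Write a 1000-byte temporary file and read it back in 64-byte chunks
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"a" * 1000)
    try:
        return count_bytes(tmp.name, 64)
    finally:
        os.remove(tmp.name)
```

At most one chunk of `chunk_size` bytes is held in memory at a time, which is the property the sequential version relies on.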

Now that I plan to upload several files in parallel, I was hoping this could be done with aiofiles and asyncio. So I tried rewriting the above as follows:

pathlist = Path(localdirectory).glob('*.'+ extension) 
for path in pathlist:
    async with aiofiles.open(f'{localdirectory}/{path.name}', mode='rb') as f:
        position=0
        filename=path.name
        <methodtocreatezerokbfilecloudstorage>
        async with file_chunks(f, chunk_size) as chunks:
            async for chunk in chunks:
                <methodforappendingdatatocloudstorage>
                print(f"Writing to {filename} at position: {position}")
                position=position + len(chunk)
    <finalizethefileincloudstorage>

The definition of file_chunks changed to

@asynccontextmanager
async def file_chunks(f, chunk_size):
    try:
        async def gen():
            b = await f.read(chunk_size)
            while b:
                yield b
                b = await f.read(chunk_size)
        yield gen()
    finally:
        f.close()

I currently have 2 files to upload, and I expected the print statements to produce something like

Writing to File1 at position: 0
Writing to File2 at position: 0
Writing to File1 at position: 41943240
Writing to File2 at position: 41943240

However, I see it running sequentially:

Writing to File1 at position: 0
Writing to File1  at position: 20971620
Writing to File1  at position: 41943240
Writing to File1  at position: 62914860

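File2 only starts once File1 has finished.

I went through some Stack Overflow posts, and they mention using multiprocessing. Is that the only possible way?

In my case there is no real "processing" (heavy computation) involved; I am just trying to upload files. Can asyncio and aiofiles not handle this, so that while one file is being read the program does not wait but (asynchronously) starts reading the next file? Or am I simply doing something wrong in my code?

Happy to elaborate if anything is unclear.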
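
One likely reason for the sequential output: the `for` loop in the asyncio version awaits each file to completion before the next iteration starts, so only one file is ever in flight. Below is a minimal sketch of the alternative, assuming each file gets its own coroutine and all of them are scheduled together with `asyncio.gather`. To stay runnable with only the standard library it uses `asyncio.to_thread` (Python 3.9+) in place of aiofiles, which also delegates file operations to a thread pool. `read_chunks`, `upload_file`, `upload_all`, and `demo` are hypothetical names, and the cloud upload step is replaced by appending to a log.

```python
import asyncio
import tempfile
from pathlib import Path


async def read_chunks(path, chunk_size):
    # Async generator: each blocking file call runs in a worker thread
    f = await asyncio.to_thread(open, path, "rb")
    try:
        while True:
            chunk = await asyncio.to_thread(f.read, chunk_size)
            if not chunk:
                break
            yield chunk
    finally:
        await asyncio.to_thread(f.close)


async def upload_file(path, chunk_size, log):
    # Hypothetical stand-in for the upload: records (name, position) only
    position = 0
    async for chunk in read_chunks(path, chunk_size):
        log.append((path.name, position))
        position += len(chunk)
    return position


async def upload_all(paths, chunk_size):
    log = []
    # One coroutine per file, scheduled together; the event loop can
    # switch between them at every await instead of finishing one first.
    sizes = await asyncio.gather(
        *(upload_file(p, chunk_size, log) for p in paths)
    )
    return sizes, log


def demo():
    # Create two small files and "upload" them concurrently
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name in ("File1", "File2"):
            p = Path(tmp) / name
            p.write_bytes(b"x" * 10)
            paths.append(p)
        return asyncio.run(upload_all(paths, 4))
```

The order in which the two files interleave is not guaranteed, but neither file has to wait for the other to finish. For I/O-bound work like this, multiprocessing should not be necessary.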
