
My program uses aiohttp to concurrently download roughly 10 million records, and then writes the data to about 4,000 files on disk.

I'm using the aiofiles library because I want my program to be able to do other things while it reads/writes files.

But I'm worried that if the program tries to write to all 4,000 files at the same time, the hard drive won't be able to keep up with all those writes.

Is it possible to limit the number of concurrent writes using aiofiles (or some other library)? Does aiofiles already do this?

Thanks.

Test code:

import aiofiles
import asyncio


async def write_to_disk(fname):
    async with aiofiles.open(fname, "w+") as f:
        await f.write("asdf")


async def main():
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i)) 
             for i in range(10)]
    await asyncio.gather(*tasks)


asyncio.run(main())

1 Answer


You can use asyncio.Semaphore to limit the number of concurrent tasks. Just have your write_to_disk function acquire the semaphore before writing:

import aiofiles
import asyncio


async def write_to_disk(fname, sema):
    # Edit to address comment: acquire semaphore after opening file
    async with aiofiles.open(fname, "w+") as f, sema:
        print("Writing", fname)
        await f.write("asdf")
        print("Done writing", fname)


async def main():
    sema = asyncio.Semaphore(3)  # Allow 3 concurrent writers
    tasks = [asyncio.create_task(write_to_disk("%d.txt" % i, sema)) for i in range(10)]
    await asyncio.gather(*tasks)


asyncio.run(main())

Note the sema = asyncio.Semaphore(3) line, as well as the addition of sema to the async with statement.
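To see that the semaphore really does cap concurrency, here is a minimal, self-contained sketch (plain asyncio, no file I/O) that tracks the peak number of coroutines inside the semaphore at once. The worker name and the state dict are illustrative, not part of the answer's code:

```python
import asyncio


async def worker(sema, state):
    # Only Semaphore(3) workers can be inside this block at once.
    async with sema:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for an I/O-bound write
        state["active"] -= 1


async def main():
    sema = asyncio.Semaphore(3)
    state = {"active": 0, "peak": 0}
    await asyncio.gather(*(worker(sema, state) for _ in range(10)))
    return state["peak"]


peak = asyncio.run(main())
print(peak)  # never exceeds 3
```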

Output:

"""
Writing 1.txt
Writing 0.txt
Writing 2.txt
Done writing 1.txt
Done writing 0.txt
Done writing 2.txt
Writing 3.txt
Writing 4.txt
Writing 5.txt
Done writing 3.txt
Done writing 4.txt
Done writing 5.txt
Writing 6.txt
Writing 7.txt
Writing 8.txt
Done writing 6.txt
Done writing 7.txt
Done writing 8.txt
Writing 9.txt
Done writing 9.txt
"""
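As for whether aiofiles already limits anything: as far as I know, aiofiles delegates file operations to a thread-pool executor, so the pool size bounds how many operations truly run in parallel, but it won't throttle 4,000 queued tasks for you the way an explicit semaphore does. If you'd rather not manage a semaphore, an alternative stdlib-only sketch is to dispatch blocking writes to a bounded ThreadPoolExecutor yourself (file names, pool size, and the write_file helper below are illustrative assumptions):

```python
import asyncio
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_file(fname, data):
    # Plain blocking write; it runs inside a worker thread.
    with open(fname, "w") as f:
        f.write(data)


async def main():
    tmpdir = tempfile.mkdtemp()
    names = [os.path.join(tmpdir, "%d.txt" % i) for i in range(10)]
    loop = asyncio.get_running_loop()
    # A pool of 3 threads means at most 3 writes touch the disk at once;
    # the executor queues the rest, so no semaphore is needed.
    with ThreadPoolExecutor(max_workers=3) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, write_file, n, "asdf") for n in names)
        )
    return names


names = asyncio.run(main())
```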
Answered 2019-12-13T18:13:55.053