I'm trying to optimize a simple web scraper I made. It grabs a list of URLs from a table on a main page, then goes to each of those "sub" URLs and scrapes information from those pages. I was able to write it successfully both synchronously and with concurrent.futures.ThreadPoolExecutor(). However, I'm now trying to optimize it to use asyncio and httpx, since these seem very fast for making hundreds of HTTP requests.

I wrote the following script using asyncio and httpx, but I keep getting the following errors:
httpcore.RemoteProtocolError: Server disconnected without sending a response.
RuntimeError: The connection pool was closed while 4 HTTP requests/responses were still in-flight.
It seems like I keep losing the connection while running the script. I even tried running a synchronous version of it and got the same error. I was thinking the remote server might be blocking my requests; however, I am able to run my original program and visit each of the same URLs from the same IP address without a problem.

What would cause this exception, and how do you fix it?
import httpx
import asyncio

async def get_response(client, url):
    resp = await client.get(url, headers=random_user_agent())  # Gets a random user agent.
    html = resp.text
    return html

async def main():
    async with httpx.AsyncClient() as client:
        tasks = []

        # Get list of urls to parse.
        urls = get_events('https://main-url-to-parse.com')

        # Get the responses for the detail page for each event.
        for url in urls:
            tasks.append(asyncio.ensure_future(get_response(client, url)))

        detail_responses = await asyncio.gather(*tasks)

        for resp in detail_responses:
            event = get_details(resp)  # Parse url and get desired info.

asyncio.run(main())
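For what it's worth, one thing I considered is that firing off hundreds of requests at once might be overwhelming the server. Below is a minimal sketch of capping the number of in-flight requests with asyncio.Semaphore; the fetch here is simulated with asyncio.sleep rather than a real httpx call, and MAX_CONCURRENCY is an arbitrary value I picked for illustration:

```python
import asyncio

# Hypothetical cap on simultaneous in-flight requests.
MAX_CONCURRENCY = 10

async def fetch(semaphore, url):
    # At most MAX_CONCURRENCY coroutines enter this block at once.
    async with semaphore:
        await asyncio.sleep(0.01)  # Stand-in for `await client.get(url)`.
        return f"response for {url}"

async def main():
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    urls = [f"https://example.com/page/{i}" for i in range(100)]
    # gather still schedules all tasks; the semaphore throttles execution.
    return await asyncio.gather(*(fetch(semaphore, u) for u in urls))

if __name__ == "__main__":
    responses = asyncio.run(main())
    print(len(responses))  # → 100
```

I believe httpx can also bound its connection pool directly via httpx.Limits (e.g. httpx.AsyncClient(limits=httpx.Limits(max_connections=10))), which would achieve a similar effect, but I haven't confirmed whether either approach fixes this particular error.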