C# - Connection closed unexpectedly after running for a long time

Question

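Hi, I am building a crawler for a website. After about 3 hours of crawling, my application stopped on a WebException. Below is my C# code. client is a predefined WebClient object that is disposed each time a gameDoc has been processed. gameDoc is an HtmlDocument object (from HtmlAgilityPack).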

while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();

        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

I tried the code below, taken from this answer, to keep the WebClient fresh:

while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line caused the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

But the result was the same. Then I switched to HttpWebRequest with a StreamReader, and the result was still the same! Below is my code using StreamReader.

while (retrygamedoc)
{
    try
    {
        // use HttpWebRequest directly to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this line caused the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(responsestring); // parse the text already read instead of downloading the page a second time

        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

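What should I do, and what should I check? I am confused because I can crawl some pages on the same site, and then after about 1000 results the exception is thrown. The only message from the exception is "The request was aborted: The connection was closed unexpectedly." and the status is ConnectionClosed.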

PS. The application is a desktop forms application.

Update:

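For now I am skipping these values and setting them to null so that the crawl can continue. But if the data is really needed, I still have to update the crawl results by hand, which is tiring because the results contain thousands of records. Please help me.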

Example:

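It is as if you had downloaded about 1300 records from the website, and then the application stops with "The request was aborted: The connection was closed unexpectedly." while your internet connection is still running at a good speed.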

score 4 · Accepted Answer

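ConnectionClosed may indicate (and probably does) that the server you are downloading from is closing the connection. Perhaps it has noticed a large number of requests from your client and is denying you further service.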

Since you cannot control server-side shenanigans, I recommend adding some logic that retries the download later.
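As a rough sketch of what "retry later" could look like: the helper below retries a download a limited number of times and doubles the wait after each WebException, so a server that is throttling you gets progressively more breathing room. The names RetryDownload and DownloadWithRetry, the attempt limit, and the delays are all made up for illustration and are not from the question; tune them to what the site tolerates.

```csharp
using System;
using System.Net;
using System.Threading;

static class RetryDownload
{
    // Hypothetical helper (not part of the question's code): calls `download`
    // until it succeeds or `maxAttempts` is reached, doubling the wait after
    // every WebException. `sleep` can be replaced in tests; it defaults to Thread.Sleep.
    public static string DownloadWithRetry(
        Func<string> download,
        int maxAttempts = 5,
        int initialDelayMs = 500,
        Action<int> sleep = null)
    {
        if (sleep == null)
        {
            sleep = ms => Thread.Sleep(ms);
        }

        int delayMs = initialDelayMs;

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return download();
            }
            catch (WebException)
            {
                // Out of attempts: let the caller see the last failure.
                if (attempt >= maxAttempts)
                {
                    throw;
                }

                sleep(delayMs);
                delayMs *= 2; // 500 ms, 1 s, 2 s, ...
            }
        }
    }
}
```

With the question's variables, usage would be something like gameDoc.LoadHtml(RetryDownload.DownloadWithRetry(() => client.DownloadString(url)));. Unlike the bare catch in the question, this only swallows WebException and eventually gives up, so an unrelated bug or a permanently blocked URL does not spin forever.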

score 0 · Accepted Answer


I got this error because the server was returning a 404.
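To tell the two situations apart (the server answered with an error code versus the connection simply dropped), you can look at the WebException itself. This is only a sketch; WebErrorInfo and Describe are names I made up. When the server did send an HTTP reply such as a 404, ex.Response is normally an HttpWebResponse carrying the status code; when the connection was closed before any reply arrived, Response is typically null and only ex.Status tells you what happened.

```csharp
using System;
using System.Net;

static class WebErrorInfo
{
    // Hypothetical helper: turns a WebException into a short description
    // that is suitable for logging next to the failing URL.
    public static string Describe(WebException ex)
    {
        // Response is only populated when the server actually sent an HTTP reply.
        var http = ex.Response as HttpWebResponse;
        if (http != null)
        {
            return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription;
        }

        // No reply at all (for example ConnectionClosed or Timeout).
        return "No HTTP response, status: " + ex.Status;
    }
}
```

Calling this from a catch (WebException ex) block and logging the result together with the URL should show whether the failures cluster on particular pages (pointing at something like a 404) or appear after a certain request volume (pointing at the server dropping connections).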

Answered 2021-03-18T18:34:27.500
