Can I download a web page that declares charset=iso-8859-1 and decode it with encoding=utf-8? Will the text come out correctly?

Can I always decode with UTF-8, whatever encoding a page on the web uses?

My code:

The HTML page on the web:

<html debug="true">
<head/>
<body>
<%@LANGUAGE="JAVASCRIPT" CODEPAGE="1252"%>
<title>Untitled Document</title>
<meta name="robots" content="noindex"/>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
............

Function:

void download() {
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8;
    client.DownloadDataCompleted += new DownloadDataCompletedEventHandler(client_DownloadDataCompleted);
    worker.ReportProgress(i);
    client.DownloadDataAsync(new Uri(link), i);
}

void client_DownloadDataCompleted(object sender, DownloadDataCompletedEventArgs e) {
    Encoding enc = Encoding.UTF8;
    string myString = enc.GetString(e.Result);
}
1 Answer

No, this doesn’t work. The documentation of WebClient.Encoding clearly says:

When a string is downloaded using the DownloadString or DownloadStringAsync methods, WebClient uses the Encoding returned by this to convert the downloaded Byte array into a string.

And why should it work? Your web page uses an encoding other than UTF-8, so decoding it as UTF-8 makes no sense. Only bytes in the ASCII range decode identically under both encodings; every other character will come out wrong. The document is encoded as ISO 8859-1, so that is the encoding you need to use to read it.
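To make the difference concrete, here is a minimal sketch that decodes the same bytes both ways. The class name CharsetDemo, the helper Decode, and the sample bytes are made up for illustration; they are not part of the question's code. It assumes the charset name is already known (here it is hard-coded from the page's meta tag).

```csharp
using System;
using System.Text;

static class CharsetDemo
{
    // Hypothetical helper: decode raw response bytes with the charset the page declares.
    public static string Decode(byte[] body, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(body);
    }

    static void Main()
    {
        // "café" encoded as ISO 8859-1: 'é' is the single byte 0xE9.
        byte[] body = { 0x63, 0x61, 0x66, 0xE9 };

        string right = Decode(body, "iso-8859-1");
        string wrong = Encoding.UTF8.GetString(body);

        Console.WriteLine(right == "caf\u00E9");     // True
        Console.WriteLine(wrong.Contains("\uFFFD")); // True: 0xE9 alone is not valid UTF-8
    }
}
```

In the handler from the question, the equivalent change would presumably be to replace Encoding.UTF8 with Encoding.GetEncoding("iso-8859-1") before calling GetString(e.Result).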

answered 2012-05-21T08:45:46.420