Use HtmlAgilityPack (available from NuGet) to parse the HTML document. Here is an example that prints table data to the console:
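// Requires: using System.Linq; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load(path_to_html);
// Each row becomes a list holding the inner HTML of its cells
var rows =
    doc.DocumentNode.SelectNodes("//table/tbody/tr")
       .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
       .ToList();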
Printing the output:
var headers = rows[0];
// skip first row which contains headers
foreach (var row in rows.Skip(1))
{
    for (int i = 0; i < row.Count; i++)
        if (headers.Count > i) // you can remove this check if data is valid
            Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
Result:
Header1 = 1
Header2 = 2
Header3 = 3
Header4 = 4
Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44
If you need to define headers for the columns, I suggest using the thead tag:
<table>
  <thead>
    <tr>
      <td>Header1</td>
      <td>Header2</td>
      <td>Header3</td>
      <td>Header4</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2</td>
      <td>3</td>
      <td>4</td>
    </tr>
    <tr>
      <td>11</td>
      <td>22</td>
      <td>33</td>
      <td>44</td>
    </tr>
  </tbody>
</table>
In that case, parsing and output look like this:
var headers = doc.DocumentNode.SelectNodes("//table/thead/tr/td")
                 .Select(td => td.InnerHtml).ToList();
var rows =
    doc.DocumentNode.SelectNodes("//table/tbody/tr")
       .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
       .ToList();
foreach (var row in rows)
{
    for (int i = 0; i < row.Count; i++)
        Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
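One thing to watch for: SelectNodes returns null rather than an empty collection when the XPath matches nothing, so the snippets above throw a NullReferenceException if the markup has no thead or tbody, or a row has no td cells. Below is a minimal, more defensive sketch of the thead variant. The class name TableDump and the inline sample HTML are made up for illustration, and using InnerText with Trim() instead of InnerHtml is my own choice, assuming the cells hold plain text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class TableDump // hypothetical name, for illustration only
{
    static void Main()
    {
        // Inline sample markup stands in for the file loaded via doc.Load(path_to_html).
        const string html = @"<table>
<thead><tr><td>Header1</td><td>Header2</td></tr></thead>
<tbody><tr><td>1</td><td>2</td></tr></tbody>
</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null when nothing matches, so check before using LINQ.
        var headerNodes = doc.DocumentNode.SelectNodes("//table/thead/tr/td");
        var rowNodes = doc.DocumentNode.SelectNodes("//table/tbody/tr");
        if (headerNodes == null || rowNodes == null)
        {
            Console.WriteLine("No matching table found.");
            return;
        }

        var headers = headerNodes.Select(td => td.InnerText.Trim()).ToList();

        foreach (var tr in rowNodes)
        {
            var cells = tr.SelectNodes("td");
            if (cells == null) continue; // row without td cells

            var values = cells.Select(td => td.InnerText.Trim()).ToList();

            // Stop at the shorter list so a ragged row cannot index past the headers.
            for (int i = 0; i < Math.Min(headers.Count, values.Count); i++)
                Console.WriteLine("{0} = {1}", headers[i], values[i]);
        }
    }
}
```

If the real document has no explicit tbody element in its source, the XPath "//table/tbody/tr" will likely match nothing, since as far as I know HtmlAgilityPack does not add the implied tbody the way browsers do; "//table//tr" may be a safer query in that case.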