python - Associative list in Python

Question

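I'm parsing some HTML forms with Beautiful Soup. Basically I have about 60 input fields, mostly radio buttons and checkboxes. So far this works with the following code: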

from BeautifulSoup import BeautifulSoup

# read the whole HTML file into a string, then close the file handle
html_file = open('myfile.html', 'r')
x = html_file.read()
html_file.close()
out = open('outfile.csv', 'w')
soup = BeautifulSoup(x)
values = soup.findAll('input', checked="checked")
# echoes some output like ('name',1) and ('value',4)

for cell in values:
    # the following line is my problem!
    statement = cell.attrs[0][1] + ';' + cell.attrs[1][1] + ';\r'
    out.write(statement)

out.close()

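As noted in the code, my problem is picking attributes by position: the HTML template is ugly and mixes up the order of the attributes on each input field. I'm interested in name="somenumber" value="someothernumber". Unfortunately my attrs[1] approach doesn't work, because name and value don't appear in the same order throughout my HTML.

Is there any way to access the resulting BeautifulSoup list associatively?

Thanks in advance for any suggestions!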

score 2 · Accepted Answer

I'm fairly sure you can use the attribute name as a dictionary key:

print cell['name']

answered 2010-06-29T14:19:43.400

score 2 · Accepted Answer

My suggestion is to make values a dict. If soup.findAll returns a list of tuples, as you seem to imply, then it's as simple as:

values = dict(soup.findAll('input',checked="checked"))

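After that, you can simply refer to the values by attribute name, as Peter said.

Of course, if soup.findAll doesn't return a list of tuples as you imply, or if the tuples themselves come back in some odd form (for example (1, 'name') instead of ('name', 1)), then it may be a bit more complicated.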
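The tuple-to-dict idea can be sketched on a single tag's attribute pairs, which is where the ('name', 1) style output in the question comes from. This is only an illustration in plain Python: the sample pairs and the helper names row_from_pairs and flip are made up, and no BeautifulSoup call is involved.

```python
# Hypothetical attribute pairs for two checked inputs, in different orders.
first = [('name', '1'), ('value', '4'), ('checked', 'checked')]
second = [('checked', 'checked'), ('value', '2'), ('name', '7')]


def row_from_pairs(pairs):
    """Turn (attribute, value) pairs into a 'name;value;' row."""
    attrs = dict(pairs)  # once this is a dict, position no longer matters
    return attrs['name'] + ';' + attrs['value'] + ';'


def flip(pairs):
    """Swap reversed (value, attribute) pairs back to (attribute, value)."""
    return [(key, val) for val, key in pairs]


rows = [row_from_pairs(first), row_from_pairs(second)]

# The "odd form" case: pairs that arrive as (value, attribute).
reversed_first = [('1', 'name'), ('4', 'value')]
rows.append(row_from_pairs(flip(reversed_first)))

print(rows)
```

Both orderings produce a row with name first and value second, because the lookup is by key rather than by index.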

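On the other hand, if soup.findAll returns one of a specific set of data types (a dict or a list of dicts, a namedtuple or a list of namedtuples), then you're actually better off, because you don't have to do any conversion in the first place.

...And yes, after checking the BeautifulSoup documentation, it appears that findAll returns an object that can be treated like a list of dictionaries, so you can just do what Peter said.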

http://www.crummy.com/software/BeautifulSoup/documentation.html#The%20attributes%20of%20Tags

Oh, and if you want to enumerate the attributes, just do something like this:

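for cell in values:
    # iterating over a tag itself yields its child nodes, not its attributes;
    # in BeautifulSoup 3, cell.attrs is a list of (name, value) tuples
    for attribute, value in cell.attrs:
        out.write(attribute + ';' + str(value) + ';\r')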
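For readers who want something runnable without installing BeautifulSoup, here is a rough end-to-end sketch of the same lookup-by-name approach. It swaps in the standard library's html.parser and csv modules instead of BeautifulSoup, so it is not the API discussed above. The class name CheckedInputs and the sample markup are invented for illustration, and the rows are written as name;value without the trailing semicolon used in the question.

```python
import csv
import io
from html.parser import HTMLParser


class CheckedInputs(HTMLParser):
    """Collect the attributes of every checked <input> as a dict."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in source order
        attributes = dict(attrs)
        if tag == 'input' and attributes.get('checked') == 'checked':
            self.cells.append(attributes)


# Made-up sample markup; note the differing attribute order.
SAMPLE = '''
<input type="radio" name="1" value="4" checked="checked">
<input value="2" checked="checked" type="checkbox" name="7">
<input type="radio" name="9" value="0">
'''

parser = CheckedInputs()
parser.feed(SAMPLE)
parser.close()

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';', lineterminator='\n')
for cell in parser.cells:
    writer.writerow([cell['name'], cell['value']])

output = buffer.getvalue()
print(output)
```

Letting csv.writer build each row avoids hand-concatenating separators, and the dict lookup keeps name and value in a fixed column order however the template orders them.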
