python - Getting a list of XML tags in a file using xml.etree.ElementTree

Question

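As the title says, I need to get a list of the XML tags in a file using the xml.etree.ElementTree library.

I know there are some attributes and methods, such as ETVar.child, ETVar.getroot(), ETVar.tag, ETVar.attrib.

But to use them and get at least the tag names down to level 2, I had to use nested for loops.

Currently I have something like this: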

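import xml.etree.ElementTree as ET
xmlRootTag = ET.parse('myXMLFile.xml').getroot()

for xmlChild in xmlRootTag:  # direct children of the root only
    if xmlChild.tag:
        print(xmlChild.tag)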
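If you want to stay with plain child iteration, the nested loops can be replaced by recursion. This is a minimal sketch, assuming the document is small enough that the recursion depth stays modest; `collect_tags` is a hypothetical helper and the XML string is made up for illustration (with a file you would start from `ET.parse(path).getroot()` instead):

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for the file
root = ET.fromstring("<root><a><b><c/></b></a><d/></root>")

def collect_tags(elem, found):
    """Hypothetical helper: add elem's tag, then recurse into each child."""
    found.add(elem.tag)
    for child in elem:  # iterating an Element yields its direct children
        collect_tags(child, found)
    return found

tags = collect_tags(root, set())
print(sorted(tags))  # ['a', 'b', 'c', 'd', 'root']
```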

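The goal is to get a list of all the XML tags in the file, including deeply nested ones, with duplicates removed.

To give a better idea, here is a possible sample of the XML: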

<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>
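Parsing a well-formed copy of the sample shows why a single loop over the root only reaches one level: iterating an Element yields its direct children only. The string below stands in for the file, an assumption made so the snippet runs on its own:

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample, used instead of reading a file
SAMPLE = """<root>
 <firstLevel>
  <secondlevel level="2">
    <thirdlevel>
      <fourth>text</fourth>
      <fourth2>text</fourth2>
    </thirdlevel>
  </secondlevel>
 </firstLevel>
</root>"""

root = ET.fromstring(SAMPLE)

# One loop over the root: direct children only
level1 = [child.tag for child in root]
print(level1)  # ['firstLevel']

# Each further level needs another nested loop
level2 = [grandchild.tag for child in root for grandchild in child]
print(level2)  # ['secondlevel']
```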

score 31 · Accepted Answer

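I did some more research on the topic and found a suitable solution. Since this is probably a common task, I'm answering it myself in the hope that it helps others.

What I was looking for is the etree method iter.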

import xml.etree.ElementTree as ET
# load and parse the file
xmlTree = ET.parse('myXMLFile.xml')

elemList = []

for elem in xmlTree.iter():
    elemList.append(elem.tag)

# now remove duplicates by converting to a set and back to a list
elemList = list(set(elemList))

# Just printing out the result
print(elemList)
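To see what `iter()` actually yields, here is a small self-contained run on a made-up document. `ElementTree.iter()` iterates starting from the root element, so calling `iter()` on the parsed root behaves the same way:

```python
import xml.etree.ElementTree as ET

# Made-up document with a repeated <b> tag
root = ET.fromstring("<root><a><b/><b/></a><c/></root>")

# iter() yields the element itself first, then every descendant in
# document (depth-first) order, so repeated tags appear once per occurrence
all_tags = [elem.tag for elem in root.iter()]
print(all_tags)  # ['root', 'a', 'b', 'b', 'c']

# Passing a tag name restricts the walk to matching elements
b_count = sum(1 for _ in root.iter('b'))
print(b_count)  # 2
```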

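Important notes

xml.etree.ElementTree is part of the Python standard library
The sample was written for Python v3.2.3
The mechanism used to remove duplicates converts the list to a set, which keeps only unique values, and then converts it back to a list. Sets are unordered, so the order of the resulting list is arbitrary.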
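If document order matters, an alternative to the set round trip (not part of the original answer) is `dict.fromkeys`, which drops duplicates but keeps each tag at its first-seen position. It relies on dicts preserving insertion order, which is guaranteed only from Python 3.7, so it would not apply to the Python 3.2.3 setup mentioned above. The inline document is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up document: <b> and <a> both occur more than once
root = ET.fromstring("<root><a><b/><b/></a><c/><a/></root>")

# dict keys keep insertion order (guaranteed from Python 3.7), so duplicates
# are dropped while the first occurrence of each tag keeps its position
unique_tags = list(dict.fromkeys(elem.tag for elem in root.iter()))
print(unique_tags)  # ['root', 'a', 'b', 'c']
```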

score 8 · Accepted Answer

You could use the built-in Python set comprehension:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('myXMLFile.xml')
tags = {elem.tag for elem in xmlTree.iter()}

If you specifically need a list, you can cast it to a list:

import xml.etree.ElementTree as ET

xmlTree = ET.parse('myXMLFile.xml')
tags = list({elem.tag for elem in xmlTree.iter()})
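Because a set has no defined order, `list(...)` returns the tags in an arbitrary order. If a stable, alphabetical list is wanted, `sorted()` can take the set comprehension directly; the inline document below is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for the file
root = ET.fromstring("<root><b/><a><b/></a></root>")

# sorted() accepts the set and returns a new list in alphabetical order
tags = sorted({elem.tag for elem in root.iter()})
print(tags)  # ['a', 'b', 'root']
```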
