I have a list of XML strings, for example:

    xmls = [
        """<note>
    <to>John</to>
    <from>Janet</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
    </note>""",
        """<note>
    <to>Tom</to>
    <from>Jennifer</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
    </note>""",
    ]

I would like to turn this list into a Pandas DataFrame like this:

to          from             heading             body
John        Janet            Reminder            Don't forget me this weekend!
Tom         Jennifer         Reminder            Don't forget me this weekend!

Thanks a lot!

2 Answers

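You can use the xml.etree.ElementTree module. First, iterate over the list of XML strings and parse the fields you need. Then build a dictionary whose keys are the column names and whose values are lists holding one entry per row. Finally, pass that dictionary to pd.DataFrame to create the DataFrame.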

>>> import pandas as pd
>>> import xml.etree.ElementTree as ET
>>> from pprint import pprint
>>> 
>>> xmls = [
...     """<note>
... <to>John</to>
... <from>Janet</from>
... <heading>Reminder</heading>
... <body>Don't forget me this weekend!</body>
... </note>""",
...     """<note>
... <to>Tom</to>
... <from>Jennifer</from>
... <heading>Reminder</heading>
... <body>Don't forget me this weekend!</body>
... </note>""",
... ]
>>> 
>>> fields = {}
>>> for i in xmls:
...     root = ET.fromstring(i)
...     for child in root:
...         fields.setdefault(child.tag, []).append(child.text)
... 
>>> pprint(fields)
{'body': ["Don't forget me this weekend!", "Don't forget me this weekend!"],
 'from': ['Janet', 'Jennifer'],
 'heading': ['Reminder', 'Reminder'],
 'to': ['John', 'Tom']}
>>> df = pd.DataFrame(fields)
>>> df
     to      from   heading                           body
0  John     Janet  Reminder  Don't forget me this weekend!
1   Tom  Jennifer  Reminder  Don't forget me this weekend!
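
The dictionary of lists above relies on every note containing the same tags. If a note lacked one, the lists would end up with different lengths or fall out of alignment. A row-oriented variant avoids that; the following is only a minimal sketch, where notes_to_rows is a hypothetical helper name and the shortened sample data is made up for illustration:

```python
import xml.etree.ElementTree as ET


def notes_to_rows(xml_strings):
    """Parse each XML string into one dict mapping child tag -> text."""
    rows = []
    for xml_string in xml_strings:
        root = ET.fromstring(xml_string)
        # One dict per <note>; a tag that is absent simply is not a key.
        rows.append({child.tag: child.text for child in root})
    return rows


# Illustrative data (an assumption): the second note has no <heading>.
xmls = [
    "<note><to>John</to><from>Janet</from><heading>Reminder</heading></note>",
    "<note><to>Tom</to><from>Jennifer</from></note>",
]

rows = notes_to_rows(xmls)
print(rows)
```

Passing rows to pd.DataFrame(rows) then builds one row per dictionary, and a key missing from a row shows up as NaN in that column.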
answered 2020-04-03T08:38:58.280

You can try xmltodict, since it simplifies the parsing for you:

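import pandas as pd
import xmltodict

# xmltodict.parse turns each XML string into a nested, dict-like structure
# that you can manipulate like a regular Python dict. Wrapping each value
# in a list lets pd.DataFrame build a one-row frame from each entry.
res = [
    {key: [value] for key, value in xmltodict.parse(xml)["note"].items()}
    for xml in xmls
]

print(res)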

 [{'to': ['John'],
   'from': ['Janet'],
   'heading': ['Reminder'],
   'body': ["Don't forget me this weekend!"]},
 {'to': ['Tom'],
  'from': ['Jennifer'],
  'heading': ['Reminder'],
  'body': ["Don't forget me this weekend!"]}]

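# Merge the one-row frames into a single DataFrame.
# Each frame keeps its own index 0; pass ignore_index=True to pd.concat
# if you want a 0..n-1 index instead.
pd.concat([pd.DataFrame(entry) for entry in res])

     to      from   heading                           body
0  John     Janet  Reminder  Don't forget me this weekend!
0   Tom  Jennifer  Reminder  Don't forget me this weekend!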
answered 2020-04-03T09:05:00.220