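I want to merge two XML files. I have read many solutions, but each is specific to the files it was written for. I am using xml.etree.ElementTree and lxml to parse the files, compare them, and find the differences. I know my next step is: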

for element in file2.xml:
    if element present in file1.xml:
        append to output_file.xml
    else:
        copy element to the output_file

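However, I have not worked with XML much, and the available merge tools are licensed, so I need to write a generic script that merges the files into the format I want.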

file1.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

</great_grands>

file2.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>


    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

Desired output:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<great_grands>

    <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
    <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>

    <grandpa>
        <grandpa_name>grandpa_name_one_1</grandpa_name>
    </grandpa>
    <grandpa>
        <grandpa_name>grandpa_name_two_1</grandpa_name>
    </grandpa>

    <grandpa>
        <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
    </grandpa>

    <grandma>
        <grandma_name>grandma_name_one_1</grandma_name>
    </grandma>
    <grandma>
        <grandma_name>grandma_name_two_1</grandma_name>
    </grandma>

    <grandma>
        <grandma_name_2>grandma_name_one_2</grandma_name_2>
    </grandma>

</great_grands>

1 Answer

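Consider XSLT, a special-purpose declarative language and a sibling of XPath, designed to transform XML files. With its document() function, a stylesheet can parse external XML files referenced by relative path. Python's lxml module can run XSLT 1.0 scripts.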

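And because XSLT scripts are well-formed XML files, you can parse them from a file or from an embedded string. The following assumes all files and scripts are saved in the same directory: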

XSLT script (save as an .xsl file, here XSLTScript.xsl; note that only file2.xml is referenced)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

 <xsl:template match="/great_grands">
   <xsl:copy>
     <xsl:copy-of select="great_grandpa_name_one"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/great_grandpa_name_two"/>
     <xsl:copy-of select="grandpa"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandpa"/>
     <xsl:copy-of select="grandma"/>
     <xsl:copy-of select="document('file2.xml')/great_grands/grandma"/>
   </xsl:copy>
 </xsl:template>

</xsl:transform>

Python script (note that only file1.xml is referenced)

from lxml import etree

# Load the first XML file and the XSLT stylesheet
xml = etree.parse('file1.xml')
xsl = etree.parse('XSLTScript.xsl')

# Compile the stylesheet and apply it to file1.xml
transform = etree.XSLT(xsl)
newdom = transform(xml)

# Save the result tree to a file
with open('Output.xml', 'wb') as f:
    f.write(newdom)

Output

<?xml version="1.0" encoding="UTF-8"?>
<great_grands>
  <great_grandpa_name_one>great_grandpa_name</great_grandpa_name_one>
  <great_grandpa_name_two>great_grandpa_name</great_grandpa_name_two>
  <grandpa>
    <grandpa_name>grandpa_name_one_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name>grandpa_name_two_1</grandpa_name>
  </grandpa>
  <grandpa>
    <grandpa_name_2>grandpa_name_one_2</grandpa_name_2>
  </grandpa>
  <grandma>
    <grandma_name>grandma_name_one_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name>grandma_name_two_1</grandma_name>
  </grandma>
  <grandma>
    <grandma_name_2>grandma_name_one_2</grandma_name_2>
  </grandma>
</great_grands>
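
For comparison, the loop sketched in the question can also be written without XSLT, using only xml.etree.ElementTree from the standard library. This is a rough sketch under stated assumptions, not a general-purpose merge tool: "present in file1.xml" is read as "has the same tag name", only the direct children of the root are merged, and merge_roots and merge_files are names invented here.

```python
import copy
import xml.etree.ElementTree as ET


def merge_roots(root1, root2):
    """Return a copy of root1 with root2's children merged in, grouped by tag."""
    merged = copy.deepcopy(root1)
    incoming = list(root2)
    for i, elem in enumerate(incoming):
        tags = [child.tag for child in merged]
        if elem.tag in tags:
            # Tag already present: place the element after the last sibling with that tag
            index = len(tags) - tags[::-1].index(elem.tag)
        else:
            # New tag: place it before the group of the next known tag from
            # the second file, or at the end if there is none
            anchor = next((e.tag for e in incoming[i + 1:] if e.tag in tags), None)
            index = tags.index(anchor) if anchor is not None else len(tags)
        merged.insert(index, copy.deepcopy(elem))
    return merged


def merge_files(path1, path2, out_path):
    """Merge two XML files that share the same root element name."""
    root1 = ET.parse(path1).getroot()
    root2 = ET.parse(path2).getroot()
    if root1.tag != root2.tag:
        raise ValueError("root elements differ: %s vs %s" % (root1.tag, root2.tag))
    merged = merge_roots(root1, root2)
    if hasattr(ET, "indent"):  # ET.indent exists from Python 3.9 on
        ET.indent(merged, space="    ")
    ET.ElementTree(merged).write(out_path, encoding="UTF-8", xml_declaration=True)
```

With the two sample files, merge_files('file1.xml', 'file2.xml', 'Output.xml') should place the elements in the same order as the XSLT result, but unlike the stylesheet it does not name any element explicitly.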
answered 2017-01-20T15:55:20.397