python - Parsing ONIX XML with conditions using Python lxml

Question

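I am trying to extract some information from an ONIX XML file using Python's lxml parser.

Among other things, the part of the document I am interested in looks like this: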

<?xml version="1.0" encoding="UTF-8"?>
<ProductSupply>
       <SupplyDetail>
          <Supplier>
             <SupplierRole>03</SupplierRole>
             <SupplierName>EGEN</SupplierName>
          </Supplier>
          <ProductAvailability>40</ProductAvailability>
          <Price>
             <PriceType>01</PriceType>
             <PriceAmount>0.00</PriceAmount>
             <Tax>
                <TaxType>01</TaxType>
                <TaxRateCode>Z</TaxRateCode>
                <TaxRatePercent>0</TaxRatePercent>
                <TaxableAmount>0.00</TaxableAmount>
                <TaxAmount>0.00</TaxAmount>
             </Tax>
             <CurrencyCode>NOK</CurrencyCode>
          </Price>
          <Price>
             <PriceType>02</PriceType>
             <PriceQualifier>05</PriceQualifier>
             <PriceAmount>0.00</PriceAmount>
             <Tax>
                <TaxType>01</TaxType>
                <TaxRateCode>Z</TaxRateCode>
                <TaxRatePercent>0</TaxRatePercent>
                <TaxableAmount>0.00</TaxableAmount>
                <TaxAmount>0.00</TaxAmount>
             </Tax>
             <CurrencyCode>NOK</CurrencyCode>
          </Price>
       </SupplyDetail>
</ProductSupply>

I need to extract the price amount under the following conditions:

PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05'

I tried:

price = p.find(
"ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02' \
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount").text

For some reason, my XPath with the `and` operator does not work and fails with the following error:

File "<string>", line unknown
    SyntaxError: invalid predicate

Any idea how to handle this? Any help is much appreciated!

score 0 · Accepted Answer

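TL;DR: Use `xpath()`, because the `find*()` methods do not support boolean operators such as `and`.

As Daniel suggested, you should use lxml's `xpath()` method for your (rather complex) XPath expression.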

XPath

Your XPath expression contains node tests and predicates that use the boolean operator `and` (XPath 1.0):

ProductSupply/SupplyDetail[Supplier/SupplierRole='03']/Price[PriceType='02'
and CurrencyCode='NOK' and PriceQualifier='05']/PriceAmount

Tip: test it online (see the Xpather demo). It confirms that the expression finds the single element <PriceAmount>0.00</PriceAmount>, as expected.

Using the `find()` method

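According to the Python documentation, the following find methods accept a match expression (e.g. XPath) as an argument: `find()`, `findall()`, `findtext()` and `iterfind()`.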
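For illustration, here is a minimal sketch of those four methods using the standard library's `xml.etree.ElementTree`. It parses a trimmed copy of the question's `<ProductSupply>` snippet (the `Tax` elements are left out) and uses only a single-condition predicate of the form `[tag='text']`, which this restricted syntax does support.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's snippet (Tax elements left out for brevity)
ONIX = """<ProductSupply>
  <SupplyDetail>
    <Supplier>
      <SupplierRole>03</SupplierRole>
      <SupplierName>EGEN</SupplierName>
    </Supplier>
    <ProductAvailability>40</ProductAvailability>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>"""

root = ET.fromstring(ONIX)  # root is the <ProductSupply> element

# find(): first matching element, or None
price = root.find("SupplyDetail/Price[PriceType='02']")
# findall(): list of all matching elements
prices = root.findall("SupplyDetail/Price")
# findtext(): text of the first match, or the default
amount = root.findtext("SupplyDetail/Price[PriceType='02']/PriceAmount", default=None)
# iterfind(): lazy iterator over the matches
types = [p.findtext("PriceType") for p in root.iterfind("SupplyDetail/Price")]

print(price.findtext("PriceQualifier"), len(prices), amount, types)
```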

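Problem: limited XPath syntax support in `find()`

However, the XPath syntax they support is limited!

This limitation includes logical operators such as your `and`. Karl Thornton explains this on his page XML parsing: Python ~ XPath ~ logical AND | 诗织.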
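The limitation is easy to reproduce. The sketch below uses the standard library's `xml.etree.ElementTree`, whose `find()` accepts a similarly restricted path syntax; the tiny XML string is a hypothetical stand-in for the question's document, and `try_find` is a helper made up for this illustration. With lxml's `find()`, the same kind of path fails with the `invalid predicate` error quoted in the question.

```python
import xml.etree.ElementTree as ET

# Minimal stand-in for the question's document
root = ET.fromstring(
    "<ProductSupply><SupplyDetail>"
    "<Price><PriceType>02</PriceType><CurrencyCode>NOK</CurrencyCode></Price>"
    "</SupplyDetail></ProductSupply>"
)

def try_find(path):
    """Return the find() result, or the SyntaxError message if the path is rejected."""
    try:
        return root.find(path)
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"

# A single-condition predicate is part of the supported subset
print(try_find("SupplyDetail/Price[PriceType='02']"))
# A predicate combining conditions with `and` is rejected
print(try_find("SupplyDetail/Price[PriceType='02' and CurrencyCode='NOK']"))
```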

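On the other hand, a note in the lxml documentation favors them:

The .find*() methods are usually faster than the full-blown XPath support. They also support incremental tree processing through the .iterfind() method, whereas XPath always collects all results before returning them. They are therefore recommended over XPath for both speed and memory reasons, whenever there is no need for highly selective XPath queries.

(emphasis mine)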
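If you want to stay with the faster `find*()` methods anyway, one common workaround (not part of the original answer) is to write the `and` as consecutive predicates, since each `[...]` further filters the previous matches. A nested path inside a predicate, such as `[Supplier/SupplierRole='03']`, is also outside the supported subset, so this sketch checks that condition in plain Python. It uses the standard library, assumes `<ProductSupply>` is the root, and the helper name `find_price_amount` is hypothetical; the stacked-predicate syntax should work the same way with lxml's `find*()`, but only the standard library is exercised here.

```python
import xml.etree.ElementTree as ET

ONIX = """<ProductSupply>
  <SupplyDetail>
    <Supplier><SupplierRole>03</SupplierRole></Supplier>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>"""

# Stacked predicates: each [...] narrows the previous result, acting like `and`
PRICE_PATH = "Price[PriceType='02'][CurrencyCode='NOK'][PriceQualifier='05']"

def find_price_amount(product_supply):
    """Hypothetical helper: text of the first matching PriceAmount, or None."""
    for detail in product_supply.iterfind("SupplyDetail"):
        # nested paths inside a predicate are not supported, so test the role here
        if detail.findtext("Supplier/SupplierRole") != "03":
            continue
        amount = detail.findtext(PRICE_PATH + "/PriceAmount")
        if amount is not None:
            return amount
    return None

root = ET.fromstring(ONIX)
print(find_price_amount(root))
print(root.find("SupplyDetail/" + PRICE_PATH).findtext("PriceType"))
```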

Using lxml's `xpath()`

So let's start with the safer and richer `xpath()` functionality (before optimizing prematurely). For example:

from lxml import etree

tree = etree.parse("onix.xml")  # hypothetical filename: point this at your ONIX file

# the node predicates to apply within XPath
sd_predicate = "[Supplier/SupplierRole='03']"
p_predicate = "[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"

# build the XPath, including the predicates, with an f-string
pa_xpath = f"ProductSupply/SupplyDetail{sd_predicate}/Price{p_predicate}/PriceAmount"
print("Using XPath:", pa_xpath)  # remove after debugging

# the relative path assumes <ProductSupply> is a direct child of the root element
root = tree.getroot()
price_amount = root.xpath(pa_xpath)  # a list of matching <PriceAmount> elements
print("XPath evaluated to:", price_amount)  # remove after debugging
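To try this without an external file, here is a self-contained sketch that parses a trimmed copy of the question's snippet (without the `Tax` elements) with `etree.fromstring()`. Because `<ProductSupply>` is the root of that snippet, the path starts at `SupplyDetail`; in a full ONIX file you would adjust it to wherever `<ProductSupply>` sits. Appending `/text()` returns the amounts as strings, and lxml also accepts XPath variables as keyword arguments, so the values do not have to be pasted into the expression. The variable names `role`, `ptype`, `cur` and `qual` are arbitrary choices for this example.

```python
from lxml import etree

# Trimmed copy of the question's snippet; here <ProductSupply> itself is the root
ONIX = b"""<ProductSupply>
  <SupplyDetail>
    <Supplier><SupplierRole>03</SupplierRole></Supplier>
    <Price>
      <PriceType>01</PriceType>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
    <Price>
      <PriceType>02</PriceType>
      <PriceQualifier>05</PriceQualifier>
      <PriceAmount>0.00</PriceAmount>
      <CurrencyCode>NOK</CurrencyCode>
    </Price>
  </SupplyDetail>
</ProductSupply>"""

root = etree.fromstring(ONIX)

# Path relative to <ProductSupply>, so it starts at SupplyDetail
pa_xpath = (
    "SupplyDetail[Supplier/SupplierRole='03']"
    "/Price[PriceType='02' and CurrencyCode='NOK' and PriceQualifier='05']"
    "/PriceAmount"
)

elements = root.xpath(pa_xpath)           # list of matching <PriceAmount> elements
texts = root.xpath(pa_xpath + "/text()")  # list of their text content as strings

# The same query with XPath variables instead of literals in the expression
texts_vars = root.xpath(
    "SupplyDetail[Supplier/SupplierRole=$role]"
    "/Price[PriceType=$ptype and CurrencyCode=$cur and PriceQualifier=$qual]"
    "/PriceAmount/text()",
    role="03", ptype="02", cur="NOK", qual="05",
)

print(len(elements), elements[0].getparent().findtext("PriceType"), texts, texts_vars)
```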

See also:

Official lxml guide: XPath and XSLT with lxml
Using XPath and LXML in Python
