php - Escaping HTML characters inside <pre> tags

Question

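I've installed a syntax highlighter, but for it to work, the tags have to be written as &lt; and &gt;. What I need to do is replace every < with &lt; and every > with &gt;, but only inside PRE tags.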

So, in short, I want to escape all HTML characters inside <pre> tags.
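For reference, PHP's built-in htmlspecialchars() already performs this replacement on a plain string. The snippet below only illustrates what "escaping" produces; the hard part of the question is applying it to the contents of <pre> and nowhere else.

```php
<?php
// With ENT_QUOTES, htmlspecialchars() converts &, ", ', < and > to entities.
$escaped = htmlspecialchars('<em>interesting</em>', ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```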

Thanks in advance.

score 2 · Accepted Answer

tl;dr

You need to parse the input HTML. Use the DOMDocument class to represent your document, parse the input, find all <pre> tags (using getElementsByTagName) and escape their contents.

Code

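Unfortunately, the DOM model is very low-level and forces you to iterate over the child nodes of each <pre> tag yourself in order to escape them. That looks like this: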

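// Rebuild the markup of $node as a string.
// Note: this simplified version only handles text and element nodes,
// and it drops attributes (<em class="x"> comes back as <em>).
function escapeRecursively($node) {
    // Text nodes are returned as plain text (entities already decoded).
    if ($node instanceof DOMText)
        return $node->textContent;

    $children = $node->childNodes;
    $content = "<$node->nodeName>";
    for ($i = 0; $i < $children->length; $i += 1) {
        $child = $children->item($i);
        $content .= escapeRecursively($child);
    }

    return "$content</$node->nodeName>";
}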

escapeRecursively() can now be used to escape every <pre> node in the document:

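function escapePreformattedCode($html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $pres = $doc->getElementsByTagName('pre');
    for ($i = 0; $i < $pres->length; $i += 1) {
        $node = $pres->item($i);

        // Serialize the current children of <pre> back into markup.
        $children = $node->childNodes;
        $content = '';
        for ($j = 0; $j < $children->length; $j += 1) {
            $child = $children->item($j);
            $content .= escapeRecursively($child);
        }

        // Replace the children with a single text node holding that markup;
        // saveHTML() escapes < and > in text nodes as &lt; and &gt;.
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($doc->createTextNode($content));
    }

    return $doc->saveHTML();
}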
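If the <pre> blocks can contain elements with attributes, escapeRecursively() loses them. A minimal alternative sketch, assuming PHP 5.3.6 or later (where DOMDocument::saveHTML() accepts a node argument), is to let the serializer produce each child's markup. innerHTML() is a hypothetical helper name, not part of the original answer.

```php
<?php
// Hypothetical helper: serialize a node's children back to markup using
// DOMDocument::saveHTML($node). Unlike escapeRecursively(), this keeps attributes.
function innerHTML(DOMNode $node) {
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<pre>Some <em class="x">interesting</em> text</pre>');
$pre = $doc->getElementsByTagName('pre')->item(0);
echo innerHTML($pre), "\n";
```

Inside escapePreformattedCode(), the inner loop over $children could then be replaced by $content = innerHTML($node);.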

Test

$string = '<h1>Test</h1> <pre>Some <em>interesting</em> text</pre>';
echo escapePreformattedCode($string);

Output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><h1>Test</h1> <pre>Some &lt;em&gt;interesting&lt;/em&gt; text</pre></body></html>

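Note that a DOM always represents a complete document. So when the DOM parser is given a document fragment, it fills in the missing parts (here a DOCTYPE plus <html> and <body> wrappers), which means the output can differ from the input.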
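If you need the fragment back without those wrappers, one option (a sketch, not part of the original answer) is to serialize only the children of <body>. escapePreFragment() is a hypothetical name; the sketch assumes PHP 5.3.6+ for saveHTML($node) and that loadHTML() places the fragment inside <body>, as in the output above.

```php
<?php
// Hypothetical helper: escape the contents of every <pre>, then return only
// the children of <body>, dropping the DOCTYPE/<html>/<body> added by loadHTML().
function escapePreFragment($html) {
    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Copy the live node list into an array before modifying the tree.
    $pres = iterator_to_array($doc->getElementsByTagName('pre'));
    foreach ($pres as $pre) {
        // Serialize the current children (attributes included) as markup.
        $markup = '';
        foreach ($pre->childNodes as $child) {
            $markup .= $doc->saveHTML($child);
        }
        // Replace them with one text node; saveHTML() escapes its < and >.
        while ($pre->firstChild !== null) {
            $pre->removeChild($pre->firstChild);
        }
        $pre->appendChild($doc->createTextNode($markup));
    }

    // Serialize only what sits inside <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo escapePreFragment('<h1>Test</h1> <pre>Some <em>interesting</em> text</pre>'), "\n";
```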
