2

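I'm traversing an HTML document using SGML and XmlDocument. When I find an XmlNode whose type is Text, I need to change its value to something that contains XML elements. I can't change InnerXml because it is read-only. I tried changing InnerText instead, but then the tag delimiters < and > get encoded as &lt; and &gt;. For example: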

<p>
    This is a text that will be highlighted.
    <anothertag />
    <......>
</p>

I'm trying to change it to:

<p>
    This is a text that will be <span class="highlighted">highlighted</span>.
    <anothertag />
    <......>
</p>

What is the simplest way to modify the value of a text XmlNode?


3 Answers

2

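I have a workaround. I don't know whether it counts as a real solution, but it produces the result I want. Please comment on whether this code is a reasonable way to solve the problem.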

    private void traverse(ref XmlNode node)
    {
        XmlNode prevOldElement = null;
        XmlNode prevNewElement = null;
        var element = node.FirstChild;
        do
        {
            if (prevNewElement != null && prevOldElement != null)
            {
                prevOldElement.ParentNode.ReplaceChild(prevNewElement, prevOldElement);
                prevNewElement = null;
                prevOldElement = null;
            }
            if (element.NodeType == XmlNodeType.Text)
            {
                var el = doc.CreateElement("text");
                // Here is the manipulation of the InnerXml.
                el.InnerXml = element.Value.Replace(a_search_term, "<b>" + a_search_term + "</b>");
                //I don't replace element right now, because element.NextSibling will be null.
                //So I replace the new element after getting the next sibling.
                prevNewElement = el;
                prevOldElement = element;
            }
            else if (element.HasChildNodes)
                traverse(ref element);
        }
        while ((element = element.NextSibling) != null);
        if (prevNewElement != null && prevOldElement != null)
        {
            prevOldElement.ParentNode.ReplaceChild(prevNewElement, prevOldElement);
        }

    }

Also, after the traverse function has run, I strip the <text> and </text> strings from the output:

        doc = new XmlDocument();
        doc.PreserveWhitespace = true;
        doc.XmlResolver = null;
        doc.Load(sgmlReader);
        var html = doc.FirstChild;
        traverse(ref html);
        textBox1.Text = doc.OuterXml.Replace("<text>", String.Empty).Replace("</text>", String.Empty);
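
The workaround above wraps the new markup in a temporary <text> element and strips it from the serialized string afterwards. A minimal sketch of a variant that avoids the wrapper, assuming an XmlDocumentFragment is acceptable here: the fragment's InnerXml is writable, and replacing a node with a fragment inserts the fragment's children in its place. The class and method names (FragmentHighlighter, Highlight) are hypothetical, and the text is escaped with SecurityElement.Escape before being parsed, since a text node's Value is unescaped.

```csharp
using System;
using System.Collections.Generic;
using System.Security;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class FragmentHighlighter
{
    // Wraps every occurrence of 'term' found in text nodes under 'node' in <b>...</b>.
    // 'node' is assumed to be an element (for example doc.DocumentElement),
    // so that node.OwnerDocument is not null.
    public static void Highlight(XmlNode node, string term)
    {
        if (string.IsNullOrEmpty(term)) return;

        // Snapshot the children first: ReplaceChild changes the sibling chain.
        var children = new List<XmlNode>();
        foreach (XmlNode child in node.ChildNodes) children.Add(child);

        foreach (XmlNode child in children)
        {
            if (child.NodeType == XmlNodeType.Text)
            {
                if (!child.Value.Contains(term)) continue;

                // Value is plain text; escape it before it is parsed as markup.
                string escapedTerm = SecurityElement.Escape(term);
                string markup = SecurityElement.Escape(child.Value)
                    .Replace(escapedTerm, "<b>" + escapedTerm + "</b>");

                XmlDocumentFragment fragment = node.OwnerDocument.CreateDocumentFragment();
                fragment.InnerXml = markup;

                // The fragment's children take the place of the old text node.
                node.ReplaceChild(fragment, child);
            }
            else if (child.HasChildNodes)
            {
                Highlight(child, term);
            }
        }
    }
}
```

With this, calling FragmentHighlighter.Highlight(doc.DocumentElement, a_search_term) would replace the traverse call, and doc.OuterXml could be used without the Replace calls. Keeping doc.PreserveWhitespace = true, as in the code above, matters when two matches are separated only by whitespace.
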
answered 2011-10-28T13:35:23.420
1
using System;
using System.Xml;

public class Sample {

  public static void Main() {
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(
    "<p>" +
    "This is a text that will be highlighted." +
    "<br />" +
    "<img />" +
    "</p>");
    string ImpossibleMark = "_*_";
    XmlNode elem = doc.DocumentElement.FirstChild;
    string theWord = "highlighted";
    if(elem.NodeType == XmlNodeType.Text){
        string OriginalXml = elem.ParentNode.InnerXml;
        // Grow the marker until it cannot collide with anything already in the markup.
        while(OriginalXml.Contains(ImpossibleMark)) ImpossibleMark += ImpossibleMark;
        elem.InnerText = elem.InnerText.Replace(theWord, ImpossibleMark);
        string replaceString = "<span class=\"highlighted\">" + theWord + "</span>";
        // Re-parse the parent's markup with the marker swapped for the real element.
        elem.ParentNode.InnerXml = elem.ParentNode.InnerXml.Replace(ImpossibleMark, replaceString);
    }

    Console.WriteLine(doc.DocumentElement.InnerXml);
  }
}
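
The marker trick above works by re-parsing the parent's whole InnerXml. As a hedged alternative, here is a sketch that stays inside the DOM: XmlText.SplitText cuts the text node around the word, and the middle piece is moved into a newly created span element. DomHighlighter and HighlightFirst are hypothetical names, and the sketch only handles the first occurrence of the word.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the original answer.
public static class DomHighlighter
{
    // Wraps the first occurrence of 'word' inside 'text' in
    // <span class="highlighted">, without re-parsing any markup.
    // 'text' is assumed to be attached to a parent node.
    public static bool HighlightFirst(XmlText text, string word)
    {
        if (string.IsNullOrEmpty(word)) return false;

        int start = text.Value.IndexOf(word, StringComparison.Ordinal);
        if (start < 0) return false;

        // 'text' keeps everything before the word; 'match' starts at the word.
        XmlText match = text.SplitText(start);
        // Cut off whatever follows the word; it becomes the next sibling.
        match.SplitText(word.Length);

        XmlElement span = text.OwnerDocument.CreateElement("span");
        span.SetAttribute("class", "highlighted");

        // Put the span where the word was, then move the word into it.
        match.ParentNode.ReplaceChild(span, match);
        span.AppendChild(match);
        return true;
    }
}
```

In the Main method above this would be called as DomHighlighter.HighlightFirst((XmlText)elem, "highlighted") once elem is known to be a text node. Because nothing is serialized and parsed again, the sibling nodes keep their identity and no marker string is needed.
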
answered 2011-10-26T15:26:56.573
0

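The InnerText property gives you the text content of all child nodes of the XmlNode. What you really want to set is the InnerXml property, which is interpreted as XML rather than as text.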
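
A small sketch of the difference, assuming the property is set on an element (the parent of the text node) rather than on the text node itself. InnerXmlDemo, SetAsText and SetAsXml are hypothetical names used only for illustration.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of the original answer.
public static class InnerXmlDemo
{
    // Assigns 'markup' through InnerText: it is stored as a single text node,
    // so the angle brackets are escaped when the document is serialized.
    public static string SetAsText(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<p/>");
        doc.DocumentElement.InnerText = markup;
        return doc.OuterXml;
    }

    // Assigns 'markup' through InnerXml: it is parsed into child nodes,
    // so the tags become real elements.
    public static string SetAsXml(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<p/>");
        doc.DocumentElement.InnerXml = markup;
        return doc.OuterXml;
    }
}
```

Passing the same string to both methods, for example "be <b>highlighted</b>.", shows the two behaviours side by side: the first result contains &lt;b&gt;, the second contains an actual b element.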

answered 2011-10-26T12:40:32.110