这是我第一次使用 SAXParser,(我在 Android 中使用它,但我认为这对这个特定问题没有影响)并且我正在尝试从 RSS 提要中读取数据。到目前为止,它在大多数情况下对我来说都很好,但是当它到达包含 HTML 编码文本(例如<a href="http://...
)的标签时我遇到了麻烦。该characters()
方法仅读取<
as a <
,然后将下一组字符视为单独的实体,而不是一次获取全部内容。我宁愿它只是按原样读入,而不实际翻译 HTML。我用于文档处理程序(缩短)的代码发布在下面:
@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) throws SAXException {
if (localName.equalsIgnoreCase("channel")) {
inChannel = true;
}
if (inChannel) {
if (newFeed == null) newFeed = new Feed();
if (localName.equalsIgnoreCase("image")) {
if (feedImage == null) feedImage = new Image();
inImage = true;
}
if (localName.equalsIgnoreCase("item")) {
if (newItem == null) newItem = new Item();
if (itemList == null) itemList = new ArrayList<Item>();
inItem = true;
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(!inItem) {
if(!inImage) {
if(inChannel) {
//Reached end of feed
if(localName.equalsIgnoreCase("channel")) {
newFeed.setItems((ArrayList<Item>)itemList);
finalFeed = newFeed;
newFeed = null;
inChannel = false;
return;
} else if(localName.equalsIgnoreCase("title")) {
newFeed.setTitle(currentValue); return;
} else if(localName.equalsIgnoreCase("link")) {
newFeed.setLink(currentValue); return;
} else if(localName.equalsIgnoreCase("description")) {
newFeed.setDescription(currentValue); return;
} else if(localName.equalsIgnoreCase("language")) {
newFeed.setLanguage(currentValue); return;
} else if(localName.equalsIgnoreCase("copyright")) {
newFeed.setCopyright(currentValue); return;
} else if(localName.equalsIgnoreCase("category")) {
newFeed.addCategory(currentValue); return;
}
}
}
else { //is inImage
//finished with feed image
if(localName.equalsIgnoreCase("image")) {
newFeed.setImage(feedImage);
feedImage = null;
inImage = false;
return;
} else if (localName.equalsIgnoreCase("url")) {
feedImage.setUrl(currentValue); return;
} else if (localName.equalsIgnoreCase("title")) {
feedImage.setTitle(currentValue); return;
} else if (localName.equalsIgnoreCase("link")) {
feedImage.setLink(currentValue); return;
}
}
}
else { //is inItem
//finished with news item
if (localName.equalsIgnoreCase("item")) {
itemList.add(newItem);
newItem = null;
inItem = false;
return;
} else if (localName.equalsIgnoreCase("title")) {
newItem.setTitle(currentValue); return;
} else if (localName.equalsIgnoreCase("link")) {
newItem.setLink(currentValue); return;
} else if (localName.equalsIgnoreCase("description")) {
newItem.setDescription(currentValue); return;
} else if (localName.equalsIgnoreCase("author")) {
newItem.setAuthor(currentValue); return;
} else if (localName.equalsIgnoreCase("category")) {
newItem.addCategory(currentValue); return;
} else if (localName.equalsIgnoreCase("comments")) {
newItem.setComments(currentValue); return;
} /*else if (localName.equalsIgnoreCase("enclosure")) {
To be implemented later
}*/ else if (localName.equalsIgnoreCase("guid")) {
newItem.setGuid(currentValue); return;
} else if (localName.equalsIgnoreCase("pubDate")) {
newItem.setPubDate(currentValue); return;
}
}
}
@Override
public void characters(char[] ch, int start, int length) {
currentValue = new String(ch, start, length);
}
我试图解析的 RSS 提要的一个例子就是这个。
有任何想法吗?