python-2.7 - Converting a Russian string to a datetime

Question

I am trying to scrape a Russian website, but I am stuck on converting a Russian (Cyrillic) string into a datetime object.

Take this HTML snippet as an example:

<div class="medium-events-list_datetime">22 января весь день</div>

I can get the contents of this div using lxml, i.e.:

date = root.xpath('/html/body/div[1]/div/div[2]/text()')[0].strip()

So the relevant part of this string is 22 января, i.e. the day and the month.

To extract this part, I am using the .split() method.
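A minimal sketch of that split step, assuming the div text is exactly the string from the snippet above and that the day and month are always the first two whitespace-separated tokens (the variable names are illustrative, not from the original code):

```python
# -*- coding: utf-8 -*-
# Hypothetical input: the text of the div from the snippet above.
text = u"22 января весь день"

# Assumption: day and month are always the first two tokens.
tokens = text.split()
day_and_month = u" ".join(tokens[:2])

print(day_and_month)
```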

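Now here is the problem: I am trying to convert this string to a datetime. I tried using DateParser (https://dateparser.readthedocs.org/en/latest/), which is supposed to support Russian.

However, I get None when I pass this string to dateparser.parse().

Has anyone run into a similar problem? I'm banging my head against the wall. Any help is appreciated :)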

score 3 · Accepted Answer

Try running this example:

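# -*- coding: utf-8 -*-
# The encoding declaration above lets Python 2 read the Cyrillic literal below.
import dateparser

s = u"22 января"  # unicode literal, not a byte string
print(dateparser.parse(s))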

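It should output 2016-01-22 00:00:00 (the string contains no year, so the year is taken from the current date, 2016 at the time of writing).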

Important: Make sure that you're actually using utf-8 strings. More info: https://www.python.org/dev/peps/pep-0263/

Otherwise your parsing/splitting might be wrong, so try having a look at the results after the split().
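One way to inspect what comes out of the split is sketched below. It assumes the scraped text may arrive as a UTF-8 byte string; raw_bytes is a hypothetical stand-in for that value, not something taken from the original code:

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for a UTF-8 byte string as it might arrive from the page.
raw_bytes = u"22 января весь день".encode("utf-8")

# Decode to a Unicode string before splitting, so the Cyrillic text stays intact.
text = raw_bytes.decode("utf-8")
date_part = u" ".join(text.split()[:2])

# repr() makes it visible whether this is a Unicode string or raw bytes.
print(repr(date_part))
```

If the printed value is not a Unicode string holding the expected day and month, the problem is in the decoding or splitting step rather than in dateparser.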
