python - GMail API decoding messages from everywhere

Question

I am using the GMail API from Python to retrieve emails written in French, and I am running into problems with accented characters.

I retrieve a message with:

 message = service.users().messages().get(userId="me", id=i, format="raw").execute()

All I want is the body of the email, so I started with this:

base64.urlsafe_b64decode(message['raw'].encode('ASCII'))

For some emails this works: I get all the message data, including the French text, for example:

"Cette semaine, vous vous êtes servis du module de révision 0 fois"

For others, I get quoted-printable encoding, like this:

"Salut, =E7a farte?"

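Quoted-printable itself is not a problem, since I built a simple decoding function with the quopri module. The main problem is that this last sentence looks wrong for quoted-printable: the character ç should have been encoded like this: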

"Salut, =C3=A7a farte?"

So with the wrongly encoded sentence, I end up with this:

Salut, �a farte?

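I suspect the cause is the different mail clients: my first example is a message sent from the Gmail client to an Outlook address, and the second is the reverse, an Outlook message sent to a Gmail address.

My question: is there a way to handle decoding for every possible case?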

score 2 · Accepted Answer

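The problem is that while quopri correctly converts the message body from 7-bit data to 8-bit data, the encoding you use to turn that byte string into a unicode string is not the right one. In your example it appears to be ISO-8859-1: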

In [1]: import quopri

In [2]: quopri.decodestring('Salut, =E7a farte?').decode('iso-8859-1')
Out[2]: 'Salut, ça farte?'
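To see where the � in the question comes from, the two decodings can be compared side by side. This is a minimal sketch, assuming (as above) that the Outlook message was sent as ISO-8859-1; the `=E7` byte suggests that, but the headers are not shown in the question:

```python
import quopri

# Quoted-printable only restores the raw bytes; it carries no charset information.
raw_bytes = quopri.decodestring(b"Salut, =E7a farte?")

# A lone 0xE7 byte is not valid UTF-8, so it becomes U+FFFD (the replacement character).
as_utf8 = raw_bytes.decode("utf-8", errors="replace")

# The same byte is "ç" in ISO-8859-1 (the charset assumed for this message).
as_latin1 = raw_bytes.decode("iso-8859-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

So `=E7` is a valid quoted-printable encoding of ç in ISO-8859-1, while `=C3=A7` is the same character in UTF-8. Neither is wrong; they just declare different charsets.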

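In general, you should be able to get the correct encoding from the Content-Type header. This is what it looks like in a message that uses quoted-printable UTF-8: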

Content-Type: text/plain;charset=UTF-8
Content-Transfer-Encoding: quoted-printable
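Rather than reading those headers by hand, the standard library's `email` package can do both steps (transfer decoding and charset decoding) for you. Here is a rough sketch, assuming you fetched the message with `format="raw"` as in the question; `extract_body` is a hypothetical helper name and `sample` is a made-up message standing in for the API response:

```python
import base64
import email
from email import policy


def extract_body(raw_b64url: str) -> str:
    """Return the text body of a base64url-encoded RFC 2822 message."""
    # The raw field is base64url text, so use the URL-safe decoder.
    raw_bytes = base64.urlsafe_b64decode(raw_b64url)
    # policy.default yields an EmailMessage, which has get_body()/get_content().
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    # Prefer the plain-text part, fall back to HTML.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    # get_content() undoes the Content-Transfer-Encoding and then decodes
    # the bytes with the charset declared in the part's Content-Type.
    return part.get_content()


# Hypothetical message standing in for message['raw'] from the API.
sample = (
    b"From: a@example.com\r\n"
    b"To: b@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Salut, =E7a farte?\r\n"
)
encoded = base64.urlsafe_b64encode(sample).decode("ascii")
body = extract_body(encoded)
print(body)
```

With the real API you would call `extract_body(message['raw'])`. This should cover messages from different clients as long as each one declares its charset correctly; a message with a missing or wrong charset header would still need a fallback guess.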

score 0 · Answer

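Try this:

import base64
message = service.users().messages().get(userId='me', id=i).execute()
# body.data is base64url-encoded, so use the URL-safe decoder
content = message['payload']['body']['data']
print(base64.urlsafe_b64decode(content).decode('utf-8'))

This retrieves the content of the email, provided it is a single-part message whose body is UTF-8. Multipart messages keep their bodies under message['payload']['parts'] instead.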
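For multipart messages, or bodies that are not UTF-8, something along these lines may help. It is only a sketch, assuming the usual payload shape (`mimeType`, `headers`, `body.data`, `parts`) and assuming that `body.data` holds base64url-encoded bytes still in the part's declared charset. `find_text_part`, `part_charset` and `decode_part` are hypothetical helper names, and `sample_payload` is a hand-built stand-in for `message['payload']`:

```python
import base64
from email.message import Message


def find_text_part(payload: dict):
    """Depth-first search for the first text/plain part that has body data."""
    if payload.get("mimeType") == "text/plain" and payload.get("body", {}).get("data"):
        return payload
    for part in payload.get("parts", []):
        found = find_text_part(part)
        if found is not None:
            return found
    return None


def part_charset(part: dict, default: str = "utf-8") -> str:
    """Read the charset parameter from the part's Content-Type header."""
    for header in part.get("headers", []):
        if header["name"].lower() == "content-type":
            # Let the stdlib parse the header parameters.
            parsed = Message()
            parsed["Content-Type"] = header["value"]
            return parsed.get_content_charset() or default
    return default


def decode_part(part: dict) -> str:
    data = part["body"]["data"]
    # Re-add base64 padding in case it was stripped.
    padded = data + "=" * (-len(data) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return raw.decode(part_charset(part), errors="replace")


# Hand-built stand-in for message['payload'] (not real API output).
sample_payload = {
    "mimeType": "multipart/alternative",
    "headers": [{"name": "Content-Type", "value": "multipart/alternative; boundary=x"}],
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "text/plain",
            "headers": [{"name": "Content-Type", "value": "text/plain; charset=iso-8859-1"}],
            "body": {
                "data": base64.urlsafe_b64encode(
                    "Salut, ça farte?".encode("iso-8859-1")
                ).decode("ascii")
            },
        }
    ],
}

text = decode_part(find_text_part(sample_payload))
print(text)
```

The `utf-8` default is a guess for parts that declare no charset, and `find_text_part` returns `None` when there is no text/plain part, so real code should check for that before calling `decode_part`.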
