The headers
'headers': [{'name': 'Content-Type', 'value': 'text/html; charset="UTF-8"'},
{'name': 'Content-Transfer-Encoding', 'value': 'quoted-printable'}]
are telling you that the message consists of text encoded as UTF-8, then quoted-printable encoded so that it can be processed by systems that only support 7-bit characters.
To decode, you need to decode from quoted-printable first, and then decode the resulting bytes from UTF-8.
Something like this ought to work:
import quopri
utf8 = quopri.decodestring(htmlpart)
text = utf8.decode('utf-8')
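As a slightly fuller sketch of the same two-step decode, the helper below wraps both steps in one function. The name decode_html_part, the charset parameter, and the sample bytes are all made up for illustration; in practice the charset would come from the part's Content-Type header.

```python
import quopri


def decode_html_part(raw, charset="utf-8"):
    """Undo quoted-printable, then decode the bytes with the declared charset."""
    if isinstance(raw, str):
        # Quoted-printable data should be pure ASCII, so this is lossless.
        raw = raw.encode("ascii")
    decoded_bytes = quopri.decodestring(raw)
    return decoded_bytes.decode(charset)


# Hypothetical sample: "=C2=A5" is the UTF-8 encoding of the yen sign,
# and "=\n" is a soft line break that the decoder removes.
sample = b"CN=C2=A510 off each of your next 3 =\nrides"
print(decode_html_part(sample))
```

If the body arrives as a str rather than bytes, the helper encodes it to ASCII first, which assumes the quoted-printable text has not already been altered.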
HTML email bodies may contain character entities. These can be converted to individual characters using html.unescape (available in Python 3.4+).
>>> import html
>>> h = """</tr><tr><td class="m_4364729876101169671Uber18_text_p1" align="left" style="color:rgb(0,0,0);font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px;direction:ltr;text-align:left"> Give friends free ride credit to try Uber. You'll get CN¥10 off each of your next 3 rides when they start riding. <span class="m_4364729876101169671Uber18_text_p1" style="color:#000000;font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px">Share code: 20ccv</span></td>"""
>>> print(html.unescape(h))
</tr><tr><td class="m_4364729876101169671Uber18_text_p1" align="left" style="color:rgb(0,0,0);font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px;direction:ltr;text-align:left"> Give friends free ride credit to try Uber. You'll get CN¥10 off each of your next 3 rides when they start riding. <span class="m_4364729876101169671Uber18_text_p1" style="color:#000000;font-family:'Uber18-text-Regular','HelveticaNeue-Light','Helvetica Neue Light',Helvetica,Arial,sans-serif;font-size:16px;line-height:28px">Share code: 20ccv</span></td>
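For a shorter illustration of what html.unescape changes, here is a small sketch using a made-up fragment that contains a numeric and a named entity. The string is an assumption for demonstration, not taken from the original message.

```python
import html

# Hypothetical fragment: "&#39;" is an apostrophe, "&yen;" is the yen sign.
fragment = "You&#39;ll get CN&yen;10 off each of your next 3 rides"

unescaped = html.unescape(fragment)
print(unescaped)
```

Markup such as tags and attributes is left untouched; only the character references are replaced.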
Answered 2019-03-23T07:48:42.880