python - Encoding in in-memory streams, or how does TextIOBase work?

Question

I'm currently reading the documentation of the io module: https://docs.python.org/3.5/library/io.html?highlight=stringio#io.TextIOBase

Maybe it's because I don't know Python well enough, but most of the time I simply don't understand their documentation.

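I need to save the data in addresses_list to a csv file and serve it to the user via https, so all of this has to happen in memory. Here is the code for it, and it currently works fine: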

import csv
import io
import zipfile

# Abonnent (a Django model) and fieldnames are defined elsewhere in the project
addresses = Abonnent.objects.filter(exemplare__gt=0)
addresses_list = list(addresses.values_list(*fieldnames))

csvfile = io.StringIO()
csvwriter_unicode = csv.writer(csvfile)
csvwriter_unicode.writerow(fieldnames)

for a in addresses_list:
    csvwriter_unicode.writerow(a)
csvfile.seek(0)

export_data = io.BytesIO()
myzip = zipfile.ZipFile(export_data, "w", zipfile.ZIP_DEFLATED)
myzip.writestr("output.csv", csvfile.read())
myzip.close()
csvfile.close()
zip_bytes = export_data.getvalue()  # grab the zip contents before the buffer is closed
export_data.close()

# serve the file via https

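The problem now is that I need the contents of the csv file to be encoded as cp1252 instead of utf-8. Traditionally I would just write f = open("output.csv", "w", encoding="cp1252") and dump all the data into it. With in-memory streams that doesn't work: neither io.StringIO() nor io.BytesIO() takes an encoding= argument.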

This is where I get stuck with the documentation:

The text stream API is described in detail in the documentation of TextIOBase.

The documentation of TextIOBase says:

encoding

The name of the encoding used to decode the stream's bytes into strings, and to encode strings into bytes.

But io.StringIO(encoding="cp1252") just throws: TypeError: 'encoding' is an invalid keyword argument for this function.

So how do I use the encoding argument of TextIOBase with StringIO? Or how does this work in general? I'm really confused.
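A quick check in Python 3 shows where the documentation and the StringIO constructor part ways: `encoding` is documented on TextIOBase as an attribute, not as a constructor parameter shared by every subclass. Only a subclass that sits on top of a byte stream, such as io.TextIOWrapper, accepts it. This is a small sketch; the exact wording of the TypeError message varies between Python versions, so it is printed rather than asserted.

```python
import io

# StringIO stores str objects directly, so its constructor has no encoding parameter.
try:
    io.StringIO(encoding="cp1252")
    stringio_accepts_encoding = True
except TypeError as err:
    stringio_accepts_encoding = False
    print("StringIO rejected encoding=:", err)

# The encoding attribute documented on TextIOBase exists on StringIO, but it is None:
# there are no bytes underneath to decode or encode.
stringio_encoding = io.StringIO().encoding
print("StringIO().encoding ->", stringio_encoding)

# TextIOWrapper is the TextIOBase subclass that wraps a byte stream,
# so it takes an encoding argument and reports it through the same attribute.
wrapper = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
print("TextIOWrapper(...).encoding ->", wrapper.encoding)
```

Both classes are TextIOBase subclasses, which is why the attribute shows up on both, but only TextIOWrapper actually converts between str and bytes.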

score 1 · Accepted Answer

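StringIO only deals with strings/text. It knows nothing about encodings or bytes. The simplest way to do what you want is probably something like this: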

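from io import StringIO

f = StringIO()
f.write("Some text")

# Old-ish way: rewind to the start, then read everything back
f.seek(0)
my_bytes = f.read().encode("cp1252")

# Alternatively: getvalue() returns the whole buffer regardless of the current position
my_bytes = f.getvalue().encode("cp1252")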
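Applied to the zip export from the question, a minimal sketch might look like this. The helper name build_zip and the sample rows are hypothetical. The point is that the str produced by StringIO is encoded to cp1252 bytes before it reaches ZipFile.writestr, which would otherwise encode a str as UTF-8.

```python
import csv
import io
import zipfile


def build_zip(fieldnames, rows, encoding="cp1252"):
    """Hypothetical helper: CSV text in memory -> encoded bytes -> zip bytes."""
    csvfile = io.StringIO()
    writer = csv.writer(csvfile)
    writer.writerow(fieldnames)
    writer.writerows(rows)

    # StringIO only holds text; the conversion to cp1252 happens explicitly here.
    csv_bytes = csvfile.getvalue().encode(encoding)

    export_data = io.BytesIO()
    with zipfile.ZipFile(export_data, "w", zipfile.ZIP_DEFLATED) as myzip:
        # Passing bytes stores them unchanged; a str would be encoded as UTF-8.
        myzip.writestr("output.csv", csv_bytes)
    # getvalue() must be called while the BytesIO is still open.
    return export_data.getvalue()


# Example rows, made up for illustration
zip_bytes = build_zip(["name", "city"], [("Jörg", "Köln")])
```

Characters that cp1252 cannot represent make encode() raise UnicodeEncodeError; passing errors="replace" to encode() substitutes them with "?", if that loss is acceptable.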

score 0 · Accepted Answer

Use io.TextIOWrapper to read text from an io.BytesIO (an in-memory stream), with encoding and error handling (Python 3).

This does what io.StringIO can't do.

Example code

>>> import io
>>> import chardet
>>> # my bytes, single german umlaut
... bts = b'\xf6'
>>> 
>>> # try reading as utf-8 text and on error replace
... my_encoding = 'utf-8'
>>> fh_bytes = io.BytesIO(bts)
>>> fh = io.TextIOWrapper(fh_bytes, encoding=my_encoding, errors='replace')
>>> fh.read()
'�'
>>> 
>>> # try reading as utf-8 text with strict error handling
... fh_bytes = io.BytesIO(bts)
>>> fh = io.TextIOWrapper(fh_bytes, encoding=my_encoding, errors='strict')
>>> # catch exception
... try:
...     fh.read()
... except UnicodeDecodeError as err:
...     print('"%s"' % err)
...     # try to get encoding
...     my_encoding = chardet.detect(err.object)['encoding']
...     print("correct encoding is %s" % my_encoding)
... 
"'utf-8' codec can't decode byte 0xf6 in position 0: invalid start byte"
correct encoding is windows-1252
>>> # retry with detected encoding
... fh_bytes = io.BytesIO(bts)
>>> fh = io.TextIOWrapper(fh_bytes, encoding=my_encoding, errors='strict')
>>> fh.read()
'ö'
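The same TextIOWrapper approach also works in the writing direction, which is closest to the open(..., "w", encoding="cp1252") habit from the question: wrap a BytesIO, write text to the wrapper, and the wrapper encodes it on the way down. A sketch, with csv_to_bytes as a hypothetical helper name:

```python
import csv
import io


def csv_to_bytes(fieldnames, rows, encoding="cp1252", errors="strict"):
    """Hypothetical helper: write CSV through a TextIOWrapper and return the bytes."""
    buffer = io.BytesIO()
    # newline="" leaves line endings to the csv module, as the csv docs recommend for files.
    text = io.TextIOWrapper(buffer, encoding=encoding, errors=errors, newline="")
    writer = csv.writer(text)
    writer.writerow(fieldnames)
    writer.writerows(rows)
    text.flush()  # push any text still buffered in the wrapper down into buffer
    data = buffer.getvalue()
    text.detach()  # unhook the wrapper so it will not close buffer when discarded
    return data


strict_bytes = csv_to_bytes(["name"], [("Jörg",)])
# "ł" has no cp1252 code point; errors="replace" turns it into "?" instead of raising.
replaced_bytes = csv_to_bytes(["name"], [("Włodek",)], errors="replace")
```

Compared with calling .encode() on getvalue() afterwards, encoding problems surface at write time here, and the errors argument plays the same role as in the reading example.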
