python - Handling non-UTF-8 Posix filenames using Python pathlib?

Question

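I am trying to use the pathlib module, which became part of the standard library in Python 3.4+, to find and manipulate file paths. Although it improves on the os.path style functions by letting me handle paths in an object-oriented way, I have run into trouble with some of the more exotic filenames on Posix filesystems; specifically, files whose names contain bytes that cannot be decoded as UTF-8: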

>>> pathlib.PosixPath(b'\xe9')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/pathlib.py", line 969, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib/python3.5/pathlib.py", line 651, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib/python3.5/pathlib.py", line 643, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

>>> b'\xe9'.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 0: unexpected end of data

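The problem is that such files can exist on a Posix filesystem, and I want my application to handle any filename that is valid on the filesystem, rather than causing errors and/or unpredictable behaviour.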

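I can get a PosixPath object for such a file by calling the .iterdir() method of its parent directory. But I have not found a way to get one from a full path supplied as a variable of type 'bytes', which is hard to avoid when loading paths from another source that fully supports every raw byte value valid on the filesystem (for example a database, or a file containing nul-separated paths).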

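Is there a way to do this that I am not aware of? Or, if it really is not possible: is this by design, or would it be considered a defect in the standard library that might warrant a bug report?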

I did find a related bug report, but that issue concerns the documentation incorrectly stating that arguments of class 'bytes' are allowed.

score 3 · Accepted Answer

I think you can get what you want like this:

import os
from pathlib import PosixPath
PosixPath(os.fsdecode(b'\xe9'))
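
For context, here is a minimal sketch of why this works, assuming a Posix system whose filesystem encoding is UTF-8 (the usual case on modern Linux, but worth checking with sys.getfilesystemencoding()). On Posix, os.fsdecode decodes with the surrogateescape error handler, so each undecodable byte becomes a lone surrogate code point that os.fsencode can turn back into the original byte. The variable names are only illustrative.

```python
import os
from pathlib import PurePosixPath

raw = b'\xe9'  # not valid UTF-8 on its own

# Explicit equivalent of what os.fsdecode does when the filesystem
# encoding is UTF-8 (an assumption about the running system).
escaped = raw.decode('utf-8', errors='surrogateescape')
print(ascii(escaped))  # the byte 0xE9 is carried as the code point U+DCE9

# The round trip through os.fsdecode / os.fsencode preserves the bytes.
name = os.fsdecode(raw)
assert os.fsencode(name) == raw

# bytes() on a pure path goes through os.fsencode as well.
path = PurePosixPath(name)
assert bytes(path) == raw
```

PurePosixPath is used here only so the sketch does not touch the filesystem; the same conversion applies to PosixPath.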

Demo:

>>> import os, pathlib
>>> b = b'\xe9'
>>> p = pathlib.Path(os.fsdecode(b))
>>> p.exists()
False
>>> with open(b, mode='w') as f:
...     f.write('wacky filename')
...     
>>> p.exists()
True
>>> p.read_bytes()
b'wacky filename'
>>> os.listdir(b'.')
[b'\xe9']
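
Since the question mentions paths loaded from a file of nul-separated entries, here is a rough sketch of how that case might be handled with the same approach. The helper name paths_from_nul_separated and the sample data are made up for illustration, and the sketch assumes a Posix system.

```python
import os
from pathlib import PurePosixPath


def paths_from_nul_separated(data: bytes) -> list:
    """Split NUL-separated raw byte paths and wrap each in a path object.

    Hypothetical helper, not part of pathlib.
    """
    # Empty chunks (for example after a trailing NUL) are skipped.
    return [PurePosixPath(os.fsdecode(chunk))
            for chunk in data.split(b'\0') if chunk]


# Sample input shaped like the output of `find -print0`.
data = b'/tmp/\xe9\0/tmp/plain.txt\0'
paths = paths_from_nul_separated(data)

# Converting back with os.fsencode recovers the original bytes.
assert [os.fsencode(p) for p in paths] == [b'/tmp/\xe9', b'/tmp/plain.txt']
```

To actually open or stat the files, PurePosixPath could be swapped for pathlib.Path; the pure variant is used here so the sketch runs without touching the filesystem.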
