encoding - Inconsistent 'utf-16' encoding when converting Lisp strings to/from C strings

Question

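I found that when 'utf-16' is given as the encoding for converting a Lisp string to a C string with cffi, the encoding actually used is 'utf-16le'. However, when the C string is converted back to a Lisp string, the encoding actually used is 'utf-16be'. Since I'm not yet familiar with `babel' (which provides the encoding facilities for `cffi'), I'm not sure whether this is a bug.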

(defun convtest (str to-c from-c)
  (multiple-value-bind (ptr size)
      (cffi:foreign-string-alloc str :encoding to-c)
    (declare (ignore size))
    (prog1
        (cffi:foreign-string-to-lisp ptr :encoding from-c)
      (cffi:foreign-string-free ptr))))

(convtest "hello" :utf-16   :utf-16)     ;=> garbage string
(convtest "hello" :utf-16   :utf-16le)   ;=> "hello"
(convtest "hello" :utf-16   :utf-16be)   ;=> garbage string
(convtest "hello" :utf-16le :utf-16be)   ;=> garbage string
(convtest "hello" :utf-16le :utf-16le)   ;=> "hello"

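`convtest' converts a Lisp string to a C string and then back to a Lisp string, using `to-c' and `from-c' as the encodings. All the garbage strings in the output are identical. The tests show that if we use 'utf-16' as both `to-c' and `from-c', the round trip fails.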

score 2 · Accepted Answer

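Here the to-c conversion defaults to little-endian (le), while the from-c conversion then takes big-endian (be) as its default.

The platform itself (x86) is little-endian. UTF-16, on the other hand, prefers big-endian when no byte order mark is present, and otherwise takes the byte order from the mark.

Might this depend on the platform you are running on? Different platforms seem to have different defaults.

It would be best to look at the source code to see why these encodings are chosen. You could also ask on the CFFI mailing list about the choice of encodings and how it depends on the platform (if at all).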
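
To see why such a byte-order mismatch produces the same garbage every time, here is a minimal sketch in plain Common Lisp that does not use cffi or babel at all. The helpers `utf-16-octets' and `octets-utf-16' are hypothetical names made up for this illustration; they handle only characters in the Basic Multilingual Plane, write no byte order mark, and assume an implementation with Unicode characters (such as SBCL). They are not how babel implements its encoders.

```lisp
;; Hypothetical helpers for illustration only (not part of cffi or babel).
(defun utf-16-octets (string endianness)
  "Encode STRING (BMP characters only) as a list of UTF-16 octets, no BOM."
  (loop for ch across string
        for code = (char-code ch)
        for hi = (ldb (byte 8 8) code)   ; high byte of the 16-bit code unit
        for lo = (ldb (byte 8 0) code)   ; low byte of the 16-bit code unit
        append (ecase endianness
                 (:le (list lo hi))
                 (:be (list hi lo)))))

(defun octets-utf-16 (octets endianness)
  "Decode a list of UTF-16 octets (BMP only, no BOM) into a string."
  (coerce (loop for (a b) on octets by #'cddr
                collect (code-char (ecase endianness
                                     (:le (+ a (ash b 8)))
                                     (:be (+ (ash a 8) b)))))
          'string))

;; Matching byte order round-trips cleanly.
(octets-utf-16 (utf-16-octets "hello" :le) :le)   ;=> "hello"

;; Mismatched byte order swaps the bytes of every code unit:
;; #\h is #x0068, so reading its little-endian bytes as big-endian gives #x6800.
(char-code (char (octets-utf-16 (utf-16-octets "hello" :le) :be) 0))   ;=> 26624
```

If the goal is simply a reliable round trip, the question's own tests suggest naming the byte order explicitly on both sides, as in (convtest "hello" :utf-16le :utf-16le).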
