python - Problem reading text with python-docx when the document contains images

Question

I am having trouble parsing text from a document that contains images.

I am using python-docx 0.7.0 on a machine running Ubuntu 12.04.4 LTS (GNU/Linux 3.2.0-60-generic x86_64).

I am using this logic:

```
from docx import Document


def read_text(path):
    document = Document(path)
    # Get all paragraphs
    paras = document.paragraphs

    text = ""

    # Append the text of each paragraph to a single string
    for para in paras:
        # Don't forget the line break
        text += "\n" + para.text

    return text.strip()
```

This process fails when the document contains an image.

Am I doing something wrong?

score 0 · Accepted Answer

python-docx should support what you are trying to do here. If you post the stack trace you get when the error is raised, I will take a look.
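In case it helps with collecting that stack trace, here is a minimal sketch using only the standard library `traceback` module. The helper name `run_and_capture` is hypothetical, not part of python-docx; it simply calls whatever function you pass in and returns the formatted traceback as a string if an exception is raised.

```python
import traceback


def run_and_capture(func, *args):
    """Call func(*args); return (result, None) or (None, traceback_text).

    Hypothetical helper for illustration only, not a python-docx API.
    """
    try:
        return func(*args), None
    except Exception:
        # format_exc() returns the same text Python would print to stderr
        return None, traceback.format_exc()
```

Passing the function that wraps the parsing logic together with the path of the failing document should give back the full trace as text, which can then be pasted into the question.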

By the way, you can write this more elegantly as:

from docx import Document

document = Document(path)
text = '\n'.join([para.text for para in document.paragraphs])
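If python-docx keeps failing on a particular file, one possible workaround (a different technique from the one above, not something python-docx itself does for you) is to read the paragraph text straight out of the .docx archive with the standard library. This is a rough sketch under a few assumptions: the main document part is stored at `word/document.xml`, which is the usual location but is not guaranteed, and the function name `read_docx_text` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_text(path_or_file):
    """Return paragraph text joined by newlines, ignoring images.

    Assumes the main part lives at word/document.xml (the usual case).
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    lines = []
    for para in root.iter(W_NS + "p"):
        # Only w:t elements carry text; drawings and pictures are skipped
        lines.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(lines).strip()
```

Note that this walks every `w:p` element, so unlike `document.paragraphs` it also picks up paragraphs inside tables, and text inside a text box nested in a paragraph may appear twice. It is meant as a diagnostic fallback rather than a replacement for python-docx.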
