nlp - How to combine the results of multiple OCR tools to get better text recognition

Source: https://stackoverflow.com/questions/55367637 2019-03-26T23:28:31.580

Viewed 209 times

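Imagine you have several OCR tools that read text from images, but none of them gives you 100% accurate output. Combined, however, their results could come very close to the ground truth. What is the best technique for "fusing" the texts together to get a good result?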

Example:

Actual text

§ 5.1: The contractor is obliged to announce the delay by 01.01.2019 at the latest. The identification-number to be used is OZ-771LS.

OCR tool 1

5 5.1 The contractor is obliged to announce the delay by O1.O1.2019 at the latest. The identification-number to be used is OZ77lLS.

OCR tool 2

§5.1: The contract or is obliged to announce theedelay by 01.O1. 2O19 at the latest. The identification number to be used is O7-771LS

OCR tool 3

§ 5.1: The contractor is oblige to do announced he delay by 01.01.2019 at the latest. T he identification-number ti be used is OZ-771LS.

What is a promising algorithm for fusing the outputs of OCR tools 1, 2 and 3 to recover the actual text?

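My first idea was to create a "tumbling window" of arbitrary length, compare the words inside the window, and for every position take the word that 2 out of the 3 tools predict.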

For example, with a window size of 3:

[5 5.1 The]

[§5.1: The contract]

[§ 5.1: The]

As you can see, the algorithm does not work, because all three tools have a different candidate for position one (5, §5.1: and §).
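To make the failure concrete, here is a minimal sketch of the position-wise 2-out-of-3 vote, assuming plain whitespace tokenization. The function name `positional_vote` and the shortened sample strings are only illustrative.

```python
from collections import Counter

def positional_vote(outputs):
    """Naive fusion: split each OCR output on whitespace and vote per index."""
    token_lists = [text.split() for text in outputs]
    # Only positions present in every output can be compared.
    length = min(len(tokens) for tokens in token_lists)
    fused = []
    for i in range(length):
        candidates = [tokens[i] for tokens in token_lists]
        word, count = Counter(candidates).most_common(1)[0]
        # Keep a word only if a strict majority of the tools agree on it.
        fused.append(word if count * 2 > len(outputs) else None)
    return fused

# Shortened beginnings of the three OCR outputs from the example.
ocr = [
    "5 5.1 The contractor is",
    "§5.1: The contract or is",
    "§ 5.1: The contractor is",
]
print(positional_vote(ocr))
```

Positions where no two tools agree come back as `None`, which is what happens at the very start of the sentence. Any word that a tool splits or merges also shifts every later index for that tool.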

Of course, some tricks could be added, such as Levenshtein distance to allow for some deviation, but I fear this would not really be robust enough.
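For reference, the kind of tolerance I have in mind could be sketched like this. The `similar` helper and its `max_ratio` threshold are hypothetical choices for illustration, not a tuned rule.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

def similar(a: str, b: str, max_ratio: float = 0.34) -> bool:
    """Hypothetical tolerance rule: distance relative to the longer word."""
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= max_ratio

print(levenshtein("O1.O1.2019", "01.01.2019"))  # two substituted characters
print(similar("O1.O1.2019", "01.01.2019"))
print(similar("5", "§5.1:"))
```

This tolerates character confusions such as O versus 0 inside a word, but it does not help when the tools disagree on where the word boundaries are, as with "5" versus "§5.1:".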

0 answers