python - Why isn't Tesseract digit recognition working properly?

Question

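I've been working with pytesseract for the past few days, and I've noticed that the library is very bad at recognizing digits. I don't know whether I'm doing something wrong, but I keep getting ♀ as the output.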

import cv2
import pytesseract
from PIL import ImageGrab


class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706,226,1200,726))
        screen.save(r'tmp\tmp.png')

        # read the image file
        img = cv2.imread(r'tmp\tmp.png', 2)
        
        # convert to binary image
        [ret, bw_img] = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)

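Input:

(Converted to a binary image to make the digits easier to read.)

Output:

♀

Let me know if something is wrong, or just suggest another way to handle text recognition.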

score 1 · Accepted Answer

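First, please read the article Improving the quality of the output, especially the section on the page segmentation method. Also, you can limit the characters to look for to the digits 0-9.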
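
As a rough sketch of those two suggestions, the helper below assembles a Tesseract config string that sets a page segmentation mode and restricts recognition to digits. The function name `build_digit_config` is hypothetical and only for illustration; `--psm` and `tessedit_char_whitelist` are the same options the full code further down uses.

```python
def build_digit_config(psm=6, whitelist="0123456789"):
    """Return a Tesseract config string limited to the given characters.

    Hypothetical helper for illustration; psm must be one of
    Tesseract's page segmentation modes (0 to 13).
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"unsupported page segmentation mode: {psm}")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


# Mode 6 assumes a single uniform block of text; mode 7 a single text line.
print(build_digit_config())
print(build_digit_config(psm=7))
```

The resulting string would be passed as the `config` argument of `pytesseract.image_to_string`. Which mode works best for a single tile is something to try out, not something this sketch decides.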

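You have quite a small image, which makes it very challenging to extract all digits at once, especially with the mixture of bright text on a dark background and vice versa. However, you can quite easily crop the individual tiles and extract the numbers one by one. That way, there's no need to differentiate between the two types of tiles.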
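
To make the cropping idea concrete, here is a minimal sketch that computes the pixel box of every tile in a regular grid. It assumes the board fills the whole image and that all tiles have the same size; the name `tile_boxes` is hypothetical, and the 494 x 500 size is simply what the screenshot box in the question works out to.

```python
def tile_boxes(width, height, rows=4, cols=4):
    """Yield (row, col, x1, y1, x2, y2) for each tile of a regular grid.

    Assumes the board fills the whole image and all tiles are equal
    in size; leftover pixels from integer division are ignored.
    """
    tile_w = width // cols
    tile_h = height // rows
    for row in range(rows):
        for col in range(cols):
            x1, y1 = col * tile_w, row * tile_h
            yield row, col, x1, y1, x1 + tile_w, y1 + tile_h


boxes = list(tile_boxes(494, 500))
print(len(boxes))   # number of tiles
print(boxes[0])     # top-left tile
print(boxes[-1])    # bottom-right tile
```

With an image loaded by OpenCV, each box would be cropped as `img[y1:y2, x1:x2]`, since the array is indexed by row (y) first.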

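Also, you know that the number must be a power of 2 (I guess most people will know 2048). So, if no such number is found, try upscaling the cropped tile and repeat. (Eventually, give up after a few attempts.)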
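
The retry idea can be sketched independently of any OCR library. In the sketch below, `recognize` and `upscale` are placeholder callables standing in for the OCR call and the resize call, and `read_tile` and `max_attempts` are hypothetical names chosen for illustration.

```python
def is_power_of_2(n):
    """True for 1, 2, 4, 8, ... using the single-set-bit property."""
    return n > 0 and (n & (n - 1)) == 0


def read_tile(recognize, upscale, tile, max_attempts=4):
    """Return the first power of 2 that `recognize` reports, else -1.

    `recognize(tile)` returns a string and `upscale(tile)` returns an
    enlarged tile; both are placeholders for the OCR and resize calls.
    """
    for _ in range(max_attempts):
        text = recognize(tile).strip()
        if text.isdecimal() and is_power_of_2(int(text)):
            return int(text)
        tile = upscale(tile)
    return -1


# Stand-in callables: the "tile" is just a scale factor here, and the
# fake OCR only succeeds once the tile has been enlarged to 4x or more.
fake_ocr = lambda scale: "16" if scale >= 4 else ""
double = lambda scale: scale * 2

print(read_tile(fake_ocr, double, 1))        # 16
print(read_tile(lambda s: "7", double, 1))   # -1
```

Limiting the number of attempts, instead of looping until a size threshold is hit, is a design choice of this sketch. Returning -1 marks a tile that could not be read.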

That would be my full code:

import cv2
import math
import pytesseract


# Adapted from
# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
def is_power_of_2(n):
    # OCR may return 0, for which the logarithm is undefined
    if n <= 0:
        return False
    # math.log2 is more accurate than log10(x) / log10(2)
    return math.log2(n).is_integer()


# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]

# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)

# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
# https://stackoverflow.com/q/4944830/11089932
config = '--psm 6 -c tessedit_char_whitelist=0123456789'

# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):
        
        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]

        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
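            text = text.replace('\n', '').replace('\f', '').strip()
            # Anything that is not purely digits (e.g. empty or containing spaces) counts as a miss
            if (not text.isdigit()) or (not is_power_of_2(int(text))):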
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break

print(a)

For the given image, I get the following output:

[[ 8 16  4  2]
 [ 2  8 32  8]
 [ 2  4 16  4]
 [ 4  2  4  2]]

For another, similar image, I get:

[[ 4 -1 -1 -1]
 [ 2  2 -1 -1]
 [-1 -1 -1 -1]
 [ 2 -1 -1 -1]]

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.3
OpenCV:        4.5.3
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
