Project source

https://github.com/PaddlePaddle/PaddleOCR

Environment setup

Create a Python 3.8 environment

conda create -n ocr python=3.8
# Activate the environment
conda activate ocr
# Leave the environment
conda deactivate

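Install PaddlePaddle

# The server has a newer CUDA version, so 2.2.2 was installed
# (note: this index URL is the Windows wheel index; use the one matching your OS)
python -m pip install paddlepaddle-gpu==2.2.2.post110 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
# The local development environment has CUDA 10.0, so 2.0.2 was installed
python -m pip install paddlepaddle-gpu==2.0.2.post100 -f https://paddlepaddle.org.cn/whl/mkl/stable.html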
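
The two commands above suggest a naming pattern: the suffix after the PaddlePaddle version encodes the CUDA version (post110 for CUDA 11.0, post100 for CUDA 10.0). A minimal sketch of that pattern follows. The helper name paddle_gpu_requirement is hypothetical, and not every version/CUDA combination is actually published, so check the wheel index before relying on the result.

```python
def paddle_gpu_requirement(paddle_version, cuda_version):
    """Build a pip requirement string such as 'paddlepaddle-gpu==2.2.2.post110'.

    Assumption: the 'postXYZ' suffix is the CUDA major and minor version
    written without the dot, as in the two install commands above.
    """
    major, minor = cuda_version.split(".")[:2]
    return f"paddlepaddle-gpu=={paddle_version}.post{major}{minor}"


if __name__ == "__main__":
    print(paddle_gpu_requirement("2.2.2", "11.0"))
    print(paddle_gpu_requirement("2.0.2", "10.0"))
```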

Verify the PaddlePaddle installation by running its built-in check:

import paddle
paddle.utils.run_check()

The check failed with:

TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
(ocr) [root@iZbp1100kme19bqwzx10b6Z paddleocr]# pip install protobuf==3.20.0

# So downgrading protobuf fixes it

# Installed on the server
pip install "protobuf<=3.19.0"

pip install protobuf==3.20.0
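
Before re-running the check, it can help to confirm which protobuf version is actually installed. The sketch below uses only the standard library (importlib.metadata, available from Python 3.8). The helper names version_tuple and protobuf_ok are hypothetical, and the "3.20.x or lower" threshold is taken from the error message above.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '3.20.0' into (3, 20, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def protobuf_ok(version):
    """True if the version is 3.20.x or lower, as the error message suggests."""
    return version_tuple(version)[:2] <= (3, 20)


if __name__ == "__main__":
    try:
        installed = metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        print("protobuf is not installed")
    else:
        status = "OK" if protobuf_ok(installed) else "too new, downgrade"
        print(installed, status)
```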

Install PaddleOCR

# Installed in the local development environment
pip install "paddleocr==2.0.1"
# Installed on the server; version 2.0.1 could not be found on Linux
pip install "paddleocr==2.1.1"

Test OCR

paddleocr --image_dir C:\Users\13110\Desktop\12.png  --use_angle_cls true --use_gpu true

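The test run reported dependency version conflicts; pip check also lists every conflict in the current environment.
Note: if no conflict is reported, the downgrades below can be skipped.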

AttributeError: module 'numpy' has no attribute 'int'.
`np.int` was a deprecated alias for the builtin `int`. To avoid this error in existing code, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

# Downgrading numpy fixes it
pip install "numpy==1.19.5"

After downgrading numpy, new conflicts appear because other libraries that depend on numpy are now too new; downgrade those as well.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
contourpy 1.2.0 requires numpy<2.0,>=1.20, but you have numpy 1.19.5 which is incompatible.
matplotlib 3.8.3 requires numpy<2,>=1.21, but you have numpy 1.19.5 which is incompatible.
pandas 2.2.1 requires numpy<2,>=1.22.4; python_version < "3.11", but you have numpy 1.19.5 which is incompatible.
scikit-image 0.22.0 requires numpy>=1.22, but you have numpy 1.19.5 which is incompatible.
scipy 1.12.0 requires numpy<1.29.0,>=1.22.4, but you have numpy 1.19.5 which is incompatible.
contourpy 1.2.0 requires numpy<2.0,>=1.20, but you have numpy 1.19.5 which is incompatible.
pywavelets 1.5.0 requires numpy<2.0,>=1.22.4, but you have numpy 1.19.5 which is incompatible.

# Downgrade the affected packages
pip install pandas==1.3.3
pip install matplotlib==3.4.3
pip install scikit-image==0.18.3
pip install scipy==1.9.0
pip install contourpy==1.1.0
pip install pywavelets==1.4.1
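
To work out which packages need downgrading, the resolver messages shown earlier can be parsed instead of read by eye. This is a minimal sketch that only understands the line format shown in that log ("X 1.0 requires Y, but you have Y 1.0 which is incompatible."). The function name parse_conflicts is hypothetical, and other pip versions may word the message differently.

```python
import re

# Matches one conflict line in the format shown in the log above.
CONFLICT_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<ver>\S+) requires (?P<req>.+?), "
    r"but you have (?P<dep>\S+) (?P<dep_ver>\S+) which is incompatible\.$"
)


def parse_conflicts(log_text):
    """Return (package, installed_version, requirement) per conflicting package.

    Repeated lines for the same package are reported only once.
    """
    conflicts = []
    seen = set()
    for line in log_text.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match and match.group("pkg") not in seen:
            seen.add(match.group("pkg"))
            conflicts.append(
                (match.group("pkg"), match.group("ver"), match.group("req")))
    return conflicts


if __name__ == "__main__":
    sample = (
        "matplotlib 3.8.3 requires numpy<2,>=1.21, "
        "but you have numpy 1.19.5 which is incompatible."
    )
    for pkg, ver, req in parse_conflicts(sample):
        print(f"{pkg} {ver} needs {req}")
```

Each reported package is then a candidate for pinning to an older release that still accepts the installed numpy.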

Usage

Command line

paddleocr --image_dir C:\Users\13110\Desktop\12.png  --use_angle_cls true --use_gpu true

Calling from code

import time

import cv2
import numpy as np
import paddle
from PIL import Image
from paddleocr import PaddleOCR

from utils.utility import draw_ocr_box_txt, draw_ocr

ocr = PaddleOCR(use_angle_cls=True,
                lang="ch",
                det_model_dir='model/4.0/ch_PP-OCRv4_det_infer',
                rec_model_dir='model/4.0/ch_PP-OCRv4_rec_infer',
                cls_model_dir='model/4.0/ch_ppocr_mobile_v2.0_cls_infer',
                )  # need to run only once to download and load model into memory


def detect_ocr(img_path):
    # The languages PaddleOCR supports can be switched with the lang parameter,
    # for example `ch`, `en`, `fr`, `german`, `korean`, `japan`

    img = cv2.imread(img_path)
    img_origin = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    height, width = img_origin.shape[:2]
    img_origin = img_origin[int(height / 2):height, int(width / 2):width]

    height, width = img_origin.shape[:2]
    max_length = 1000.0
    scale = 1
    if max(height, width) > max_length:
        scale = max_length / max(height, width)
        img = cv2.resize(img_origin, (int(width * scale), int(height * scale)))
    else:
        img = img_origin
    result = ocr.ocr(img, det=True, rec=True, cls=True)
    # for idx in range(len(result)):
    #     res = result[idx]
    #     for line in res:
    #         print(line)

    # Collect the results for visualization
    boxes = []
    txts = []
    scores = []

    for one_res in result:
        boxes.append(one_res[0])
        txts.append(one_res[1][0])
        scores.append(one_res[1][1])

    boxes = [[[int(element / scale) for element in sublist] for sublist in subsublist] for subsublist in boxes]
    im_show1 = draw_ocr(img_origin, boxes, txts, scores, font_path='simfang.ttf')
    img_origin = Image.fromarray(img_origin)
    im_show2 = draw_ocr_box_txt(img_origin, boxes, txts, scores, font_path='simfang.ttf')

    im_show = im_show2

    im_show = Image.fromarray(im_show)
    # Strip only the file extension so paths such as './imgs/1.jpg' still work
    im_show.save(f"{img_path.rsplit('.', 1)[0]}_detect.png")


if __name__ == '__main__':
    detect_ocr('imgs/1.jpg')
    detect_ocr('imgs/2.jpg')
    detect_ocr('imgs/3.jpg')
    detect_ocr('imgs/4.jpg')
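
The loop in detect_ocr assumes the OCR result is a flat list of [box, (text, score)] entries, and it divides box coordinates by scale to undo the resize. Some PaddleOCR releases instead return one such list per image, nested one level deeper (as the commented-out loop hints), and may return None for an image with no text. The sketch below shows one way to tolerate both layouts and to map boxes back to original pixels. The helper names unpack_ocr_result and rescale_boxes are hypothetical, and the layout detection is an assumption to verify against the installed version.

```python
def _is_entry(item):
    """True if item looks like [box, (text, score)]."""
    return (isinstance(item, (list, tuple)) and len(item) == 2
            and isinstance(item[1], (list, tuple)) and len(item[1]) == 2
            and isinstance(item[1][0], str))


def unpack_ocr_result(result):
    """Split an OCR result into parallel lists of boxes, texts and scores."""
    boxes, txts, scores = [], [], []
    if not result:
        return boxes, txts, scores
    # Flat layout: the items are entries. Otherwise: one list of entries per image.
    pages = [result] if _is_entry(result[0]) else result
    for page in pages:
        for box, (text, score) in page or []:
            boxes.append(box)
            txts.append(text)
            scores.append(score)
    return boxes, txts, scores


def rescale_boxes(boxes, scale, offset=(0, 0)):
    """Undo a resize by `scale`, then shift by the crop offset (x, y)."""
    dx, dy = offset
    return [[[int(x / scale) + dx, int(y / scale) + dy] for x, y in box]
            for box in boxes]


if __name__ == "__main__":
    flat = [[[[5, 5], [10, 5], [10, 10], [5, 10]], ("abc", 0.9)]]
    boxes, txts, scores = unpack_ocr_result(flat)
    print(rescale_boxes(boxes, 0.5), txts, scores)
```

In detect_ocr the boxes are drawn on the cropped image, so the offset stays at (0, 0). To draw on the full image instead, pass the crop's top-left corner, here (int(width / 2), int(height / 2)) of the uncropped image.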

I copied the utility module from the PaddleOCR source and modified the drawing logic inside draw_ocr_box_txt and draw_ocr.

# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import sys
import cv2
import numpy as np
import json
from PIL import Image, ImageDraw, ImageFont
import math


def parse_args():
    def str2bool(v):
        return v.lower() in ("true", "t", "1")

    parser = argparse.ArgumentParser()
    # params for prediction engine
    parser.add_argument("--use_gpu", type=str2bool, default=True)
    parser.add_argument("--ir_optim", type=str2bool, default=True)
    parser.add_argument("--use_tensorrt", type=str2bool, default=False)
    parser.add_argument("--gpu_mem", type=int, default=8000)

    # params for text detector
    parser.add_argument("--image_dir", type=str)
    parser.add_argument("--det_algorithm", type=str, default='DB')
    parser.add_argument("--det_model_dir", type=str)
    parser.add_argument("--det_limit_side_len", type=float, default=960)
    parser.add_argument("--det_limit_type", type=str, default='max')

    # DB params
    parser.add_argument("--det_db_thresh", type=float, default=0.3)
    parser.add_argument("--det_db_box_thresh", type=float, default=0.5)
    parser.add_argument("--det_db_unclip_ratio", type=float, default=1.6)

    # EAST params
    parser.add_argument("--det_east_score_thresh", type=float, default=0.8)
    parser.add_argument("--det_east_cover_thresh", type=float, default=0.1)
    parser.add_argument("--det_east_nms_thresh", type=float, default=0.2)

    # SAST params
    parser.add_argument("--det_sast_score_thresh", type=float, default=0.5)
    parser.add_argument("--det_sast_nms_thresh", type=float, default=0.2)
    # type=bool would treat any non-empty string (even "false") as True
    parser.add_argument("--det_sast_polygon", type=str2bool, default=False)

    # params for text recognizer
    parser.add_argument("--rec_algorithm", type=str, default='CRNN')
    parser.add_argument("--rec_model_dir", type=str)
    parser.add_argument("--rec_image_shape", type=str, default="3, 32, 320")
    parser.add_argument("--rec_char_type", type=str, default='ch')
    parser.add_argument("--rec_batch_num", type=int, default=6)
    parser.add_argument("--max_text_length", type=int, default=25)
    parser.add_argument(
        "--rec_char_dict_path",
        type=str,
        default="./ppocr/utils/ppocr_keys_v1.txt")
    parser.add_argument("--use_space_char", type=str2bool, default=True)
    parser.add_argument(
        "--vis_font_path", type=str, default="./doc/simfang.ttf")
    parser.add_argument("--drop_score", type=float, default=0.5)

    # params for text classifier
    parser.add_argument("--use_angle_cls", type=str2bool, default=False)
    parser.add_argument("--cls_model_dir", type=str)
    parser.add_argument("--cls_image_shape", type=str, default="3, 48, 192")
    parser.add_argument("--label_list", type=list, default=['0', '180'])
    parser.add_argument("--cls_batch_num", type=int, default=30)
    parser.add_argument("--cls_thresh", type=float, default=0.9)

    parser.add_argument("--enable_mkldnn", type=str2bool, default=False)
    parser.add_argument("--use_zero_copy_run", type=str2bool, default=False)

    parser.add_argument("--use_pdserving", type=str2bool, default=False)

    return parser.parse_args()


def draw_text_det_res(dt_boxes, img_path):
    src_im = cv2.imread(img_path)
    for box in dt_boxes:
        box = np.array(box).astype(np.int32).reshape(-1, 2)
        cv2.polylines(src_im, [box], True, color=(255, 255, 0), thickness=2)
    return src_im


def resize_img(img, input_size=600):
    """
    resize img and limit the longest side of the image to input_size
    """
    img = np.array(img)
    im_shape = img.shape
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(input_size) / float(im_size_max)
    img = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
    return img


def draw_ocr(image,
             boxes,
             txts=None,
             scores=None,
             drop_score=0.5,
             font_path="./doc/simfang.ttf"):
    """
    Visualize the results of OCR detection and recognition
    args:
        image(Image|array): RGB image
        boxes(list): boxes with shape(N, 4, 2)
        txts(list): the texts
        scores(list): the scores corresponding to txts
        drop_score(float): only scores greater than drop_score will be visualized
        font_path: the path of font which is used to draw text
    return(array):
        the visualized img
    """
    if scores is None:
        scores = [1] * len(boxes)
    box_num = len(boxes)
    for i in range(box_num):
        if scores is not None and (scores[i] < drop_score or
                                   math.isnan(scores[i])):
            continue
        box = np.reshape(np.array(boxes[i]), [-1, 1, 2]).astype(np.int64)
        image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
    if txts is not None:
        # img = np.array(resize_img(image, input_size=600))
        img = image
        txt_img = text_visual(
            txts,
            scores,
            img_h=img.shape[0],
            img_w=img.shape[1],
            threshold=drop_score,
            font_path=font_path)
        img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
        return img
    return image


def draw_ocr_box_txt(image,
                     boxes,
                     txts,
                     scores=None,
                     drop_score=0.5,
                     font_path=r"C:\Windows\Fonts\simfang.ttf"):
    h, w = image.height, image.width
    img_left = image.copy()
    img_right = Image.new('RGB', (w, h), (255, 255, 255))

    import random

    random.seed(0)
    draw_left = ImageDraw.Draw(img_left)
    draw_right = ImageDraw.Draw(img_right)
    for idx, (box, txt) in enumerate(zip(boxes, txts)):
        if scores is not None and scores[idx] < drop_score:
            continue
        color = (random.randint(0, 255), random.randint(0, 255),
                 random.randint(0, 255))
        box0 = [item for sublist in box for item in sublist]
        draw_left.polygon(box0, fill=color)
        draw_right.polygon(
            [
                box[0][0], box[0][1], box[1][0], box[1][1], box[2][0],
                box[2][1], box[3][0], box[3][1]
            ],
            outline=color)
        box_height = math.sqrt((box[0][0] - box[3][0]) ** 2 + (box[0][1] - box[3][
            1]) ** 2)
        box_width = math.sqrt((box[0][0] - box[1][0]) ** 2 + (box[0][1] - box[1][
            1]) ** 2)
        if box_height > 2 * box_width:
            font_size = max(int(box_width * 0.9), 10)
            font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
            cur_y = box[0][1]
            for c in txt:
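                # getbbox returns (left, top, right, bottom); advance by the
                # bottom edge so consecutive characters do not overlap
                char_size = font.getbbox(c)
                draw_right.text(
                    (box[0][0] + 3, cur_y), c, fill=(0, 0, 0), font=font)
                cur_y += char_size[3]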
        else:
            font_size = max(int(box_height * 0.8), 10)
            font = ImageFont.truetype(font_path, font_size, encoding="utf-8")
            draw_right.text(
                [box[0][0], box[0][1]], txt, fill=(0, 0, 0), font=font)
    img_left = Image.blend(image, img_left, 0.5)
    img_show = Image.new('RGB', (w * 2, h), (255, 255, 255))
    img_show.paste(img_left, (0, 0, w, h))
    img_show.paste(img_right, (w, 0, w * 2, h))
    return np.array(img_show)


def str_count(s):
    """
    Count the number of Chinese characters,
    a single English character and a single number
    equal to half the length of Chinese characters.
    args:
        s(string): the input of string
    return(int):
        the number of Chinese characters
    """
    import string
    count_zh = count_pu = 0
    s_len = len(s)
    en_dg_count = 0
    for c in s:
        if c in string.ascii_letters or c.isdigit() or c.isspace():
            en_dg_count += 1
        elif c.isalpha():
            count_zh += 1
        else:
            count_pu += 1
    return s_len - math.ceil(en_dg_count / 2)


def text_visual(texts,
                scores,
                img_h=400,
                img_w=600,
                threshold=0.,
                font_path="./doc/simfang.ttf"):
    """
    create new blank img and draw txt on it
    args:
        texts(list): the text will be draw
        scores(list|None): corresponding score of each txt
        img_h(int): the height of blank img
        img_w(int): the width of blank img
        font_path: the path of font which is used to draw text
    return(array):
    """
    if scores is not None:
        assert len(texts) == len(
            scores), "The number of txts and corresponding scores must match"

    def create_blank_img():
        # uint8 holds 255 without overflow (int8 tops out at 127)
        blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
        blank_img[:, img_w - 1:] = 0
        blank_img = Image.fromarray(blank_img).convert("RGB")
        draw_txt = ImageDraw.Draw(blank_img)
        return blank_img, draw_txt

    blank_img, draw_txt = create_blank_img()

    # font_size = 20
    font_size = int(min(img_w, img_h) / 20)
    txt_color = (0, 0, 0)
    font = ImageFont.truetype(font_path, font_size, encoding="utf-8")

    gap = font_size + 5
    txt_img_list = []
    count, index = 1, 0
    for idx, txt in enumerate(texts):
        index += 1
        if scores[idx] < threshold or math.isnan(scores[idx]):
            index -= 1
            continue
        first_line = True
        while str_count(txt) >= img_w // font_size - 4:
            tmp = txt
            txt = tmp[:img_w // font_size - 4]
            if first_line:
                new_txt = str(index) + ': ' + txt
                first_line = False
            else:
                new_txt = '    ' + txt
            draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
            txt = tmp[img_w // font_size - 4:]
            if count >= img_h // gap - 1:
                txt_img_list.append(np.array(blank_img))
                blank_img, draw_txt = create_blank_img()
                count = 0
            count += 1
        if first_line:
            new_txt = str(index) + ': ' + txt + '   ' + '%.3f' % (scores[idx])
        else:
            new_txt = "  " + txt + "  " + '%.3f' % (scores[idx])
        draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
        # whether add new blank img or not
        if count >= img_h // gap - 1 and idx + 1 < len(texts):
            txt_img_list.append(np.array(blank_img))
            blank_img, draw_txt = create_blank_img()
            count = 0
        count += 1
    txt_img_list.append(np.array(blank_img))
    if len(txt_img_list) == 1:
        blank_img = np.array(txt_img_list[0])
    else:
        blank_img = np.concatenate(txt_img_list, axis=1)
    return np.array(blank_img)


def base64_to_cv2(b64str):
    import base64
    data = base64.b64decode(b64str.encode('utf8'))
    # np.fromstring is deprecated for binary data; frombuffer is the replacement
    data = np.frombuffer(data, np.uint8)
    data = cv2.imdecode(data, cv2.IMREAD_COLOR)
    return data


def draw_boxes(image, boxes, scores=None, drop_score=0.5):
    if scores is None:
        scores = [1] * len(boxes)
    for (box, score) in zip(boxes, scores):
        if score < drop_score:
            continue
        box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
        image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
    return image


if __name__ == '__main__':
    test_img = "./doc/test_v2"
    predict_txt = "./doc/predict.txt"
    with open(predict_txt, 'r') as f:
        data = f.readlines()
    img_path, anno = data[0].strip().split('\t')
    img_name = os.path.basename(img_path)
    img_path = os.path.join(test_img, img_name)
    image = Image.open(img_path)

    data = json.loads(anno)
    boxes, txts, scores = [], [], []
    for dic in data:
        boxes.append(dic['points'])
        txts.append(dic['transcription'])
        scores.append(round(dic['scores'], 3))

    new_img = draw_ocr(image, boxes, txts, scores)

    cv2.imwrite(img_name, new_img)

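Installing with conda

I recently upgraded the CUDA version on my development machine and found that conda can isolate CUDA and cuDNN versions, so each project can use its own CUDA and cuDNN versions.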

conda install cudatoolkit==11.8
conda install cudnn==8.8.0.121
python -m pip install paddlepaddle-gpu==2.6.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install "paddleocr>=2.0.1"

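Note: install cudatoolkit in the conda environment first and only then install cudnn, so that the cuDNN build matches the CUDA Toolkit version the project needs. If cudnn is installed on its own, conda chooses the CUDA pairing for you, which may not be the version you intended.
As far as I can tell, conda installs packages into the active environment's own directory rather than into a system-wide CUDA installation, so the CUDA and cuDNN already on the host should be left untouched. Still, make sure the intended environment is activated before installing, to avoid confusion with the host's CUDA setup.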

Now check the versions of the relevant dependencies; they differ from those on the host machine.

import paddle
if __name__ == '__main__':
    print(paddle.__version__)
    print(paddle.is_compiled_with_cuda())
    print(paddle.get_cudnn_version())
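
To double-check where cuDNN was installed, the environment directory can be searched for cuDNN files. This is a rough sketch under a stated assumption: conda places shared libraries under <prefix>/lib on Linux and <prefix>/Library/bin on Windows. The function name find_cudnn_files is hypothetical.

```python
import glob
import os
import sys


def find_cudnn_files(prefix=None):
    """List files with 'cudnn' in their name inside an environment prefix.

    Assumption: libraries live in <prefix>/lib (Linux) or
    <prefix>/Library/bin (Windows).
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    patterns = [
        os.path.join(prefix, "lib", "*cudnn*"),
        os.path.join(prefix, "Library", "bin", "*cudnn*"),
    ]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    files = find_cudnn_files()
    if files:
        for path in files:
            print(path)
    else:
        print("no cuDNN files found in this environment")
```

An empty result inside the activated environment suggests that paddle is picking up cuDNN from somewhere else, such as the host installation.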

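Installing conda on Linux

Installing Miniconda is enough (Miniconda3 contains only conda, Python, the packages they depend on, and a small number of other useful packages).
See: https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html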
