# Fine-tuning Text Detection Model of OpenOCR System

1. [Data and Weights Preparation](#1-data-and-weights-preparation)
   - [1.1 Data Preparation](#11-data-preparation)
   - [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
2. [Training](#2-training)
   - [2.1 Start Training](#21-start-training)
   - [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
3. [Evaluation and Test](#3-evaluation-and-test)
   - [3.1 Evaluation](#31-evaluation)
   - [3.2 Test](#32-test)
4. [ONNX Inference](#4-onnx-inference)

______________________________________________________________________

## Installation

#### Dependencies:

- [PyTorch](http://pytorch.org/) version >= 1.13.0
- Python version >= 3.7

```shell
conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```

#### Clone this repository:

```shell
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install albumentations
pip install -r requirements.txt
```

This guide uses the ICDAR2015 dataset as an example to walk through training, evaluating, and testing the detection model in OpenOCR.

## 1. Data and Weights Preparation

### 1.1 Data Preparation

**Note:** If you want to use your own dataset, please follow the format of the [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088).

Download the dataset from [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088) or [Google Drive](https://drive.google.com/file/d/1nfsYj-JzAqVouZPBDqmuP0Rkj6J6XFUJ/view?usp=sharing).

#### File Directory

```
OpenOCR/
icdar2015/text_localization/
  └─ icdar_c4_train_imgs/         Training data of the icdar dataset
  └─ ch4_test_images/             Testing data of the icdar dataset
  └─ train_icdar2015_label.txt    Training annotations of the icdar dataset
  └─ test_icdar2015_label.txt     Testing annotations of the icdar dataset
```

The provided annotation file format is as follows, where the two fields on each line are separated by a tab character (`\t`):

```
"Image file name                   json.dumps encoded image annotation information"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```

Before being encoded with `json.dumps`, the image annotation information is a list containing multiple dictionaries. In each dictionary, the field `points` represents the coordinates (x, y) of the four corners of the text bounding box, arranged in a clockwise order starting from the top-left corner. The field `transcription` indicates the text content within the current bounding box.
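The label format described above can be sketched in a few lines of Python. This is only an illustration of the format, not code from OpenOCR; the helper names `format_label_line` and `parse_label_line` are hypothetical, and the example keeps only the `transcription` and `points` fields shown above.

```python
import json


def format_label_line(image_path, annotations):
    """Build one label line: image path, a tab, then the JSON-encoded annotations."""
    return f"{image_path}\t{json.dumps(annotations)}"


def parse_label_line(line):
    """Split a label line back into the image path and its list of box dictionaries."""
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(encoded)


# One text box: four (x, y) corners, clockwise from the top-left corner.
boxes = [{"transcription": "MASA",
          "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]

line = format_label_line("ch4_test_images/img_61.jpg", boxes)
path, parsed = parse_label_line(line)
print(path, parsed[0]["transcription"], parsed[0]["points"])
```

When preparing your own dataset, writing each line this way keeps the tab separator and the JSON encoding consistent with the ICDAR2015 label files.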

Modify the training and evaluation dataset paths in the configuration file `./configs/det/dbnet/repvit_db.yml` to point to your own dataset, for example:

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ../icdar2015/text_localization/  # Root directory of the training dataset
    label_file_list: ["../icdar2015/text_localization/train_icdar2015_label.txt"]  # Path to the training label file
    ......
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ../icdar2015/text_localization/  # Root directory of the evaluation dataset
    label_file_list: ["../icdar2015/text_localization/test_icdar2015_label.txt"]  # Path to the evaluation label file
```
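Before training, it can help to confirm that every image listed in a label file actually exists under `data_dir`. The following is a minimal sketch under the assumption that image paths in the label file are relative to `data_dir` (as in the example line `ch4_test_images/img_61.jpg`); `find_missing_images` is a hypothetical helper, not part of OpenOCR.

```python
import os


def find_missing_images(data_dir, label_file):
    """Return image paths listed in label_file that do not exist under data_dir."""
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # The image path is the first tab-separated field of each line.
            image_path = line.rstrip("\n").split("\t", 1)[0]
            if not os.path.isfile(os.path.join(data_dir, image_path)):
                missing.append(image_path)
    return missing
```

For instance, calling `find_missing_images("../icdar2015/text_localization/", "../icdar2015/text_localization/test_icdar2015_label.txt")` should return an empty list if the dataset was extracted as shown in the file directory above.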

### 1.2 Download Pre-trained Model

First download the pre-trained model.

```bash
cd OpenOCR/
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
```

______________________________________________________________________

## 2. Training

### 2.1 Start Training

```bash
# multi-GPU training
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
# single GPU training
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
```

### 2.2 Load Trained Model and Continue Training

To resume training from a previously trained model, set the parameter `Global.checkpoints` to the path of the model to load.

For example:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.checkpoints=./your/trained/model
```

**Note**: `Global.checkpoints` takes priority over `Global.pretrained_model`. When both parameters are specified, the model at `Global.checkpoints` is loaded first; if that path is invalid, the model specified by `Global.pretrained_model` is loaded instead.
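The priority rule in the note can be pictured with the short sketch below. It is not OpenOCR's actual loading code; `select_weights` is a hypothetical function, and treating a "wrong" path as one that does not exist on disk is an assumption.

```python
import os


def select_weights(checkpoints=None, pretrained_model=None):
    """Return (source, path) for the weights to load, preferring checkpoints."""
    # Global.checkpoints wins when it points to an existing path.
    if checkpoints and os.path.exists(checkpoints):
        return "checkpoints", checkpoints
    # Otherwise fall back to Global.pretrained_model.
    if pretrained_model and os.path.exists(pretrained_model):
        return "pretrained_model", pretrained_model
    # Neither path is usable.
    return None, None
```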

______________________________________________________________________

## 3. Evaluation and Test

### 3.1 Evaluation

OpenOCR reports three metrics for evaluating the text detection task: Precision, Recall, and Hmean (F-Score).

```bash
python tools/eval_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model="{path/to/weights}/best.pth"
```
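As a reminder of how the three metrics relate, here is a minimal sketch using their standard definitions. It takes the number of matched boxes as an input and does not reproduce how OpenOCR matches detections to ground truth; the function name `detection_metrics` and its parameters are hypothetical.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Compute precision, recall, and their harmonic mean from box counts."""
    # Precision: fraction of detected boxes that match a ground-truth box.
    precision = num_matched / num_detected if num_detected else 0.0
    # Recall: fraction of ground-truth boxes that were detected.
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    # Hmean (F-Score): harmonic mean of precision and recall.
    total = precision + recall
    hmean = 2 * precision * recall / total if total else 0.0
    return precision, recall, hmean


precision, recall, hmean = detection_metrics(8, 10, 16)
print(f"precision={precision:.3f} recall={recall:.3f} hmean={hmean:.3f}")
```

Because Hmean is a harmonic mean, it stays low unless both Precision and Recall are high.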

### 3.2 Test

Test the detection result on all images in the folder or a single image:

```bash
# Global.infer_img accepts either an image folder (/path/img_fold) or a single image file (/path/img_file)
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold Global.pretrained_model={path/to/weights}/best.pth
```

______________________________________________________________________

## 4. ONNX Inference

First, convert the detection model to an ONNX model:

```bash
pip install onnx
python tools/toonnx.py --c ./configs/det/dbnet/repvit_db.yml --o Global.device=cpu Global.pretrained_model={path/to/weights}/best.pth
```

The ONNX model is saved to `./output/det_repsvtr_db/export_det/det_model.onnx`.

Then run inference with the detection ONNX model:

```bash
pip install onnxruntime
# Global.infer_img accepts either an image folder (/path/img_fold) or a single image file (/path/img_file)
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold Global.onnx_model_path=./output/det_repsvtr_db/export_det/det_model.onnx
```