Spaces:
Sleeping
Sleeping
File size: 6,338 Bytes
5de2f8f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 |
# Fine-tuning Text Detection Model of OpenOCR System
1. [Data and Weights Preparation](#1-data-and-weights-preparation)
- [1.1 Data Preparation](#11-data-preparation)
- [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
2. [Training](#2-training)
- [2.1 Start Training](#21-start-training)
- [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
3. [Evaluation and Test](#3-evaluation-and-test)
- [3.1 Evaluation](#31-evaluation)
- [3.2 Test](#32-test)
4. [ONNX Inference](#4-onnx-inference)
______________________________________________________________________
## Installation
#### Dependencies:
- [PyTorch](http://pytorch.org/) version >= 1.13.0
- Python version >= 3.7
```shell
conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
#### Clone this repository:
```shell
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install albumentations
pip install -r requirements.txt
```
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in OpenOCR.
## 1. Data and Weights Preparation
### 1.1 Data Preparation
**Note:** If you want to use your own dataset, please following the format of [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088).
Downloading datasets from [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088)/[Google Drive](https://drive.google.com/file/d/1nfsYj-JzAqVouZPBDqmuP0Rkj6J6XFUJ/view?usp=sharing).
#### File Directory
```
OpenOCR/
icdar2015/text_localization/
ββ icdar_c4_train_imgs/ Training data of the icdar dataset
ββ ch4_test_images/ Testing data of the icdar dataset
ββ train_icdar2015_label.txt Training annotations of the icdar dataset
ββ test_icdar2015_label.txt Testing annotations of the icdar dataset
```
The provided annotation file format is as follows, where the fields are separated by "\\t":
```
"Image file name json.dumps encoded image annotation information"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before being encoded with `json.dumps`, the image annotation information is a list containing multiple dictionaries. In each dictionary, the field `points` represents the coordinates (x, y) of the four corners of the text bounding box, arranged in a clockwise order starting from the top-left corner. The field `transcription` indicates the text content within the current bounding box.
To modify the training and evaluation dataset paths in the configuration file `./configs/det/dbnet/repvit_db.yml` to your own dataset paths, for example:
```yaml
Train:
dataset:
name: SimpleDataSet
data_dir: ../icdar2015/text_localization/ # Root directory of the training dataset
label_file_list: ["../icdar2015/text_localization/train_icdar2015_label.txt"] # Path to the training label file
......
Eval:
dataset:
name: SimpleDataSet
data_dir: ../icdar2015/text_localization/ # Root directory of the evaluation dataset
label_file_list: ["../icdar2015/text_localization/test_icdar2015_label.txt"] # Path to the evaluation label file
```
### 1.2 Download Pre-trained Model
First download the pre-trained model.
```bash
cd OpenOCR/
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
```
______________________________________________________________________
## 2. Training
### 2.1 Start Training
```bash
# multi-GPU training
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
# single GPU training
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
```
### 2.2 Load Trained Model and Continue Training
If you expect to load trained model and continue the training again, you can specify the parameter `Global.checkpoints` as the model path to be loaded.
For example:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.checkpoints=./your/trained/model
```
**Note**: The priority of `Global.checkpoints` is higher than that of `Global.pretrained_model`, that is, when two parameters are specified at the same time, the model specified by `Global.checkpoints` will be loaded first. If the model path specified by `Global.checkpoints` is wrong, the one specified by `Global.pretrained_model` will be loaded.
______________________________________________________________________
## 3. Evaluation and Test
### 3.1 Evaluation
OpenOCR calculates three indicators for evaluating performance of OCR detection task: Precision, Recall, and Hmean(F-Score).
```bash
python tools/eval_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model="{path/to/weights}/best.pth"
```
### 3.2 Test
Test the detection result on all images in the folder or a single image:
```bash
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold or /path/img_file Global.pretrained_model={path/to/weights}/best.pth
```
______________________________________________________________________
## 4. ONNX Inference
Firstly, we can convert Detection model to onnx model:
```bash
pip install onnx
python tools/toonnx.py --c ./configs/det/dbnet/repvit_db.yml --o Global.device=cpu Global.pretrained_model={path/to/weights}/best.pth
```
The onnx model is saved in `./output/det_repsvtr_db/export_det/det_model.onnx`.
The detection onnx model inference:
```bash
pip install onnxruntime
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold or /path/img_file Global.onnx_model_path=./output/det_repsvtr_db/export_det/det_model.onnx
```
|