# Fine-tuning Text Detection Model of OpenOCR System
1. [Data and Weights Preparation](#1-data-and-weights-preparation)
- [1.1 Data Preparation](#11-data-preparation)
- [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
2. [Training](#2-training)
- [2.1 Start Training](#21-start-training)
- [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
3. [Evaluation and Test](#3-evaluation-and-test)
- [3.1 Evaluation](#31-evaluation)
- [3.2 Test](#32-test)
4. [ONNX Inference](#4-onnx-inference)
______________________________________________________________________
## Installation
#### Dependencies:
- [PyTorch](http://pytorch.org/) version >= 1.13.0
- Python version >= 3.7
```shell
conda create -n openocr python==3.8
conda activate openocr
# install gpu version torch
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# or cpu version
conda install pytorch torchvision torchaudio cpuonly -c pytorch
```
#### Clone this repository:
```shell
git clone https://github.com/Topdu/OpenOCR.git
cd OpenOCR
pip install albumentations
pip install -r requirements.txt
```
This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in OpenOCR.
## 1. Data and Weights Preparation
### 1.1 Data Preparation
**Note:** If you want to use your own dataset, please follow the format of the [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088).
Download the dataset from the [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088) page or [Google Drive](https://drive.google.com/file/d/1nfsYj-JzAqVouZPBDqmuP0Rkj6J6XFUJ/view?usp=sharing).
#### File Directory
```
OpenOCR/                          Repository root
icdar2015/text_localization/      Dataset directory (a sibling of OpenOCR/)
  ├─ icdar_c4_train_imgs/         Training data of the icdar dataset
  ├─ ch4_test_images/             Testing data of the icdar dataset
  ├─ train_icdar2015_label.txt    Training annotations of the icdar dataset
  └─ test_icdar2015_label.txt     Testing annotations of the icdar dataset
```
The provided annotation file format is as follows, where the two fields are separated by a tab (`\t`):
```
"Image file name json.dumps encoded image annotation information"
ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
Before being encoded with `json.dumps`, the image annotation information is a list containing multiple dictionaries. In each dictionary, the field `points` represents the coordinates (x, y) of the four corners of the text bounding box, arranged in a clockwise order starting from the top-left corner. The field `transcription` indicates the text content within the current bounding box.
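For illustration, here is a minimal Python sketch of reading such a label file (the file name is an example, not a fixed path):

```python
import json

# Each line: "<image path>\t<json.dumps-encoded list of annotations>"
with open('train_icdar2015_label.txt', encoding='utf-8') as f:
    for line in f:
        img_path, anno_str = line.rstrip('\n').split('\t', 1)
        annotations = json.loads(anno_str)  # a list of dicts, one per text box
        for anno in annotations:
            points = anno['points']                # four (x, y) corners, clockwise from top-left
            transcription = anno['transcription']  # text content of the box
```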
Modify the training and evaluation dataset paths in the configuration file `./configs/det/dbnet/repvit_db.yml` to point to your own dataset, for example:
```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: ../icdar2015/text_localization/ # Root directory of the training dataset
    label_file_list: ["../icdar2015/text_localization/train_icdar2015_label.txt"] # Path to the training label file
  # ...
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: ../icdar2015/text_localization/ # Root directory of the evaluation dataset
    label_file_list: ["../icdar2015/text_localization/test_icdar2015_label.txt"] # Path to the evaluation label file
```
### 1.2 Download Pre-trained Model
First download the pre-trained model.
```bash
cd OpenOCR/
wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth
```
______________________________________________________________________
## 2. Training
### 2.1 Start Training
```bash
# multi-GPU training
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
# single GPU training
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth
```
### 2.2 Load Trained Model and Continue Training
If you want to load a trained model and continue training, specify the parameter `Global.checkpoints` as the path of the model to be loaded.
For example:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.checkpoints=./your/trained/model
```
**Note**: `Global.checkpoints` has higher priority than `Global.pretrained_model`; when both parameters are specified, the model specified by `Global.checkpoints` is loaded first. If the path specified by `Global.checkpoints` is invalid, the model specified by `Global.pretrained_model` is loaded instead.
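The priority rule can be summarized by the following sketch (a hypothetical helper, not OpenOCR's actual loading code):

```python
import os

def resolve_weights(checkpoints=None, pretrained_model=None):
    """Mirror the documented priority: Global.checkpoints first, then Global.pretrained_model."""
    if checkpoints and os.path.exists(checkpoints):
        return checkpoints        # a valid Global.checkpoints wins
    if pretrained_model and os.path.exists(pretrained_model):
        return pretrained_model   # fallback when checkpoints is missing or wrong
    return None                   # neither given: train from scratch
```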
______________________________________________________________________
## 3. Evaluation and Test
### 3.1 Evaluation
OpenOCR calculates three metrics for evaluating the performance of the text detection task: precision, recall, and Hmean (F-score).
```bash
python tools/eval_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model="{path/to/weights}/best.pth"
```
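For reference, Hmean is the harmonic mean of precision and recall. A minimal sketch of how the three metrics relate, assuming the box-matching counts have already been computed (the matching protocol itself follows the ICDAR 2015 evaluation):

```python
def det_metrics(num_matches, num_detections, num_ground_truths):
    """Precision, recall, and Hmean (F-score) from raw match counts."""
    precision = num_matches / num_detections if num_detections else 0.0
    recall = num_matches / num_ground_truths if num_ground_truths else 0.0
    hmean = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, hmean

print(det_metrics(90, 100, 110))  # e.g. precision 0.90, recall ~0.818, Hmean ~0.857
```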
### 3.2 Test
Test the detection results on all images in a folder, or on a single image:
```bash
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold or /path/img_file Global.pretrained_model={path/to/weights}/best.pth
```
______________________________________________________________________
## 4. ONNX Inference
First, convert the detection model to an ONNX model:
```bash
pip install onnx
python tools/toonnx.py --c ./configs/det/dbnet/repvit_db.yml --o Global.device=cpu Global.pretrained_model={path/to/weights}/best.pth
```
The ONNX model is saved to `./output/det_repsvtr_db/export_det/det_model.onnx`.
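Optionally, the exported model can be sanity-checked directly with `onnxruntime` before running the full inference script. The input shape below is an assumption (DBNet-style detectors typically expect a normalized NCHW image whose height and width are multiples of 32):

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('./output/det_repsvtr_db/export_det/det_model.onnx',
                            providers=['CPUExecutionProvider'])
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # assumed 1x3x640x640 input
outputs = sess.run(None, {input_name: dummy})
print(input_name, [o.shape for o in outputs])  # inspect the output map shape(s)
```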
Run inference with the detection ONNX model:
```bash
pip install onnxruntime
python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold or /path/img_file Global.onnx_model_path=./output/det_repsvtr_db/export_det/det_model.onnx
```