| # Fine-tuning Text Detection Model of OpenOCR System | |
| 1. [Data and Weights Preparation](#1-data-and-weights-preparation) | |
| - [1.1 Data Preparation](#11-data-preparation) | |
| - [1.2 Download Pre-trained Model](#12-download-pre-trained-model) | |
| 2. [Training](#2-training) | |
| - [2.1 Start Training](#21-start-training) | |
| - [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training) | |
| 3. [Evaluation and Test](#3-evaluation-and-test) | |
| - [3.1 Evaluation](#31-evaluation) | |
| - [3.2 Test](#32-test) | |
| 4. [ONNX Inference](#4-onnx-inference) | |
| ______________________________________________________________________ | |
| ## Installation | |
| #### Dependencies: | |
| - [PyTorch](http://pytorch.org/) version >= 1.13.0 | |
| - Python version >= 3.7 | |
| ```shell | |
| conda create -n openocr python==3.8 | |
| conda activate openocr | |
| # install gpu version torch | |
| conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia | |
| # or cpu version | |
| conda install pytorch torchvision torchaudio cpuonly -c pytorch | |
| ``` | |
| #### Clone this repository: | |
| ```shell | |
| git clone https://github.com/Topdu/OpenOCR.git | |
| cd OpenOCR | |
| pip install albumentations | |
| pip install -r requirements.txt | |
| ``` | |
| This section uses the icdar2015 dataset as an example to introduce the training, evaluation, and testing of the detection model in OpenOCR. | |
| ## 1. Data and Weights Preparation | |
| ### 1.1 Data Preparation | |
| **Note:** If you want to use your own dataset, please follow the format of the [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088). | |
| Download the dataset from [icdar2015 dataset](https://aistudio.baidu.com/datasetdetail/46088) or [Google Drive](https://drive.google.com/file/d/1nfsYj-JzAqVouZPBDqmuP0Rkj6J6XFUJ/view?usp=sharing). | |
| #### File Directory | |
| ``` | |
| OpenOCR/ | |
| icdar2015/text_localization/ | |
|   └─ icdar_c4_train_imgs/         Training data of the icdar dataset | |
|   └─ ch4_test_images/             Testing data of the icdar dataset | |
|   └─ train_icdar2015_label.txt    Training annotations of the icdar dataset | |
|   └─ test_icdar2015_label.txt     Testing annotations of the icdar dataset | |
| ``` | |
| The provided annotation files use the following format, with the two fields on each line separated by a tab character (`\t`): | |
| ``` | |
| "Image file name json.dumps encoded image annotation information" | |
| ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}] | |
| ``` | |
| Before being encoded with `json.dumps`, the image annotation information is a list containing multiple dictionaries. In each dictionary, the field `points` represents the coordinates (x, y) of the four corners of the text bounding box, arranged in a clockwise order starting from the top-left corner. The field `transcription` indicates the text content within the current bounding box. | |
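The annotation format described above can be written and read with the standard library alone. The following is a minimal sketch, not code from OpenOCR: the helper names `format_label_line` and `parse_label_line` are hypothetical, and the sample values are taken from the example line shown earlier.

```python
import json


def format_label_line(image_path, boxes):
    """Build one annotation line: image path, a tab, then the JSON-encoded box list."""
    return f"{image_path}\t{json.dumps(boxes, ensure_ascii=False)}"


def parse_label_line(line):
    """Split one annotation line into (image_path, list of box dictionaries)."""
    # Split only on the first tab so the JSON part is kept intact.
    image_path, encoded = line.rstrip("\n").split("\t", 1)
    return image_path, json.loads(encoded)


# Each box holds the text and its four corner points,
# clockwise starting from the top-left corner.
boxes = [{"transcription": "MASA",
          "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]

line = format_label_line("ch4_test_images/img_61.jpg", boxes)
image_path, parsed = parse_label_line(line)
```

When converting your own dataset, writing one such line per image should produce a label file in the same layout as `train_icdar2015_label.txt`.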
| Modify the training and evaluation dataset paths in the configuration file `./configs/det/dbnet/repvit_db.yml` to point to your own dataset, for example: | |
| ```yaml | |
| Train: | |
|   dataset: | |
|     name: SimpleDataSet | |
|     data_dir: ../icdar2015/text_localization/ # Root directory of the training dataset | |
|     label_file_list: ["../icdar2015/text_localization/train_icdar2015_label.txt"] # Path to the training label file | |
|     ...... | |
| Eval: | |
|   dataset: | |
|     name: SimpleDataSet | |
|     data_dir: ../icdar2015/text_localization/ # Root directory of the evaluation dataset | |
|     label_file_list: ["../icdar2015/text_localization/test_icdar2015_label.txt"] # Path to the evaluation label file | |
| ``` | |
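Before starting a long training run, it can help to confirm that the paths in the configuration actually match the files on disk. The sketch below assumes that the image path in each label line is relative to `data_dir`, which is consistent with the example paths above but is not confirmed by the OpenOCR source; `find_missing_images` is a hypothetical helper, and the file names in the self-check are made up.

```python
import os
import tempfile


def find_missing_images(data_dir, label_file):
    """Return image paths listed in label_file that do not exist under data_dir."""
    missing = []
    with open(label_file, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # The image path is the first tab-separated field.
            rel_path = line.split("\t", 1)[0]
            if not os.path.isfile(os.path.join(data_dir, rel_path)):
                missing.append(rel_path)
    return missing


# Small self-check on a temporary directory (file names are made up for the demo).
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "imgs"))
    open(os.path.join(root, "imgs", "a.jpg"), "wb").close()
    label_file = os.path.join(root, "label.txt")
    with open(label_file, "w", encoding="utf-8") as f:
        f.write("imgs/a.jpg\t[]\n")
        f.write("imgs/b.jpg\t[]\n")  # this image is never created
    missing = find_missing_images(root, label_file)
```

For the configuration above, the call would be `find_missing_images("../icdar2015/text_localization/", "../icdar2015/text_localization/train_icdar2015_label.txt")`; an empty result means every listed image was found.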
| ### 1.2 Download Pre-trained Model | |
| First download the pre-trained model. | |
| ```bash | |
| cd OpenOCR/ | |
| wget https://github.com/Topdu/OpenOCR/releases/download/develop0.0.1/openocr_det_repvit_ch.pth | |
| ``` | |
| ______________________________________________________________________ | |
| ## 2. Training | |
| ### 2.1 Start Training | |
| ```bash | |
| # multi-GPU training | |
| CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth | |
| # single GPU training | |
| CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model=./openocr_det_repvit_ch.pth | |
| ``` | |
| ### 2.2 Load Trained Model and Continue Training | |
| To resume training from a previously trained model, set the parameter `Global.checkpoints` to the path of the model to load. | |
| For example: | |
| ```bash | |
| CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 tools/train_det.py --c configs/det/dbnet/repvit_db.yml --o Global.checkpoints=./your/trained/model | |
| ``` | |
| **Note**: `Global.checkpoints` takes priority over `Global.pretrained_model`. When both parameters are specified, the model given by `Global.checkpoints` is loaded first. If the path given by `Global.checkpoints` is invalid, the model given by `Global.pretrained_model` is loaded instead. | |
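The priority rule in the note can be pictured as a simple fallback. This sketch only illustrates the described behavior and is not OpenOCR's actual loading code: `choose_weights` is a hypothetical function, and treating a "wrong" path as "file does not exist" is an assumption.

```python
import os
import tempfile


def choose_weights(checkpoints=None, pretrained_model=None):
    """Pick which weights to load: checkpoints first, then pretrained_model."""
    if checkpoints and os.path.exists(checkpoints):
        return "checkpoints", checkpoints
    if pretrained_model and os.path.exists(pretrained_model):
        return "pretrained_model", pretrained_model
    return None, None  # nothing usable was given


with tempfile.TemporaryDirectory() as root:
    pretrained = os.path.join(root, "pretrained.pth")
    open(pretrained, "wb").close()

    # The checkpoint path does not exist, so the pretrained weights are chosen.
    fallback = choose_weights(os.path.join(root, "missing.pth"), pretrained)

    # Both files exist, so the checkpoint wins.
    ckpt = os.path.join(root, "latest.pth")
    open(ckpt, "wb").close()
    preferred = choose_weights(ckpt, pretrained)
```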
| ______________________________________________________________________ | |
| ## 3. Evaluation and Test | |
| ### 3.1 Evaluation | |
| OpenOCR reports three metrics for evaluating the text detection task: Precision, Recall, and Hmean (F-score). | |
| ```bash | |
| python tools/eval_det.py --c configs/det/dbnet/repvit_db.yml --o Global.pretrained_model="{path/to/weights}/best.pth" | |
| ``` | |
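For reference, the three metrics named above are related as follows: Precision is the fraction of detected boxes that match a ground-truth box, Recall is the fraction of ground-truth boxes that were matched, and Hmean is their harmonic mean. The sketch below uses these standard definitions. It does not reproduce how OpenOCR decides whether two boxes match, and the function name and counts are made up for illustration.

```python
def detection_metrics(num_matched, num_detected, num_ground_truth):
    """Compute precision, recall, and their harmonic mean from match counts."""
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "hmean": hmean}


# Example: 100 boxes detected, 160 ground-truth boxes, 80 correct matches.
metrics = detection_metrics(num_matched=80, num_detected=100, num_ground_truth=160)
```

Because Hmean is a harmonic mean, it stays low whenever either Precision or Recall is low, so it is usually the number to compare between checkpoints.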
| ### 3.2 Test | |
| Test the detection result on all images in the folder or a single image: | |
| ```bash | |
| # Global.infer_img accepts either an image folder (/path/img_fold) or a single image file (/path/img_file) | |
| python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold Global.pretrained_model={path/to/weights}/best.pth | |
| ``` | |
| ______________________________________________________________________ | |
| ## 4. ONNX Inference | |
| First, convert the detection model to an ONNX model: | |
| ```bash | |
| pip install onnx | |
| python tools/toonnx.py --c ./configs/det/dbnet/repvit_db.yml --o Global.device=cpu Global.pretrained_model={path/to/weights}/best.pth | |
| ``` | |
| The ONNX model is saved to `./output/det_repsvtr_db/export_det/det_model.onnx`. | |
| Then run inference with the detection ONNX model: | |
| ```bash | |
| pip install onnxruntime | |
| # Global.infer_img accepts either an image folder (/path/img_fold) or a single image file (/path/img_file) | |
| python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.backend=onnx Global.device=cpu Global.infer_img=/path/img_fold Global.onnx_model_path=./output/det_repsvtr_db/export_det/det_model.onnx | |
| ``` | |